QuantiSlav - Quantitative Methods in Historical Slavic Studies

The QuantiSlav project, a cooperation between the Digital Humanities Department of the BAdW and the Department of Slavic Languages and Literatures of the University of Freiburg, Germany, targets several complex issues in historical Slavic Studies and historical Natural Language Processing, including the development of sustainable resources and infrastructure aimed at equipping scholars with domain-specific technical skills. The three-year project (2022-2025) focuses on the Church Slavic written heritage, investigating the time and place of origin of individual textual witnesses.

The Church Slavic language and its orthography were subject to various cultural influences over the centuries. Artificially created in the 9th century, during the Christianization process, as a language for translating biblical and liturgical texts, Church Slavic remains the liturgical language of the Orthodox Slavic world of Byzantine heritage to the present day (while the Roman Catholic Slavic world reverted to Latin). Church Slavic was first written in a script created specifically for this purpose. That script was soon replaced by one based on the Greek writing system, which, with only a few modifications, is still known today as Cyrillic. Particularly with the emergence of additional non-liturgical texts and their regional copies and translations, scribes incorporated vernacular elements, giving rise to specific redactions of Church Slavic.

In the QuantiSlav project, we are preparing a large body of previously digitized pre-modern Church Slavic Cyrillic texts from the Great Menaion Reader (16th-17th centuries). One of our research foci is therefore diagnosing and improving the output of Handwritten Text Recognition tools. Using language technology approaches, we are working towards a benchmark corpus of these texts, most of them unedited, comprising several million tokens. These data allow for investigating properties such as orthographic and grammatical variation.
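
As a rough illustration of what diagnosing Handwritten Text Recognition output can involve, the sketch below computes a character error rate (CER): the edit distance between a recognized line and a manually corrected reference, divided by the length of the reference. This is a generic, commonly used metric and is not meant to describe the project's actual evaluation pipeline. The function names and the toy strings are hypothetical, and the Unicode normalization step is an assumption about how Cyrillic letters with combining diacritics might be made comparable.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length, after NFC normalization.

    Assumption: NFC makes precomposed and decomposed spellings of the
    same letter compare as equal."""
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Toy example (made-up strings): one wrong character in five.
print(character_error_rate("слово", "слова"))
```

A score of 0 means the recognized line matches the reference exactly. Values can exceed 1 when the recognized line is much longer than the reference.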

An important research question is how to assign these texts to specific geographic locations and time periods, both semi-automatically and by means of machine learning methods. An applied end task in the project is therefore the automatic dating and localization of texts written in pre-modern variants of Church Slavic.

Funding

The QuantiSlav project is funded by the EU's Recovery and Resilience Facility and by the Federal Ministry of Research, Technology and Space, in accordance with the guidelines for funding projects to strengthen the digital competences of young scholars (grant number: 16DKWN123B).