Principles and Methods of Digital Lexicography

Abstract

The article describes the principles and methods of digital lexicography. It begins by defining the four main stages of the lexicographic process: 1) writing the dictionary, 2) editing and developing the book layout, 3) publishing, and 4) the post-publication period. The following section focuses on stage 1, comparing how collections of examples were compiled for dictionary preparation in the past (using millions of index cards) with the modern tools for lexical analysis provided by web corpora such as the Russian National Corpus (ruscorpora.ru). This overview of advances in finding examples that illustrate word usage is followed by an exploration of how dictionary writing methods have evolved.

The analysis of computer-based dictionary writing methods starts with a discussion of the two most popular approaches: file-based and tabular. The former involves composing dictionary files with thousands of entries in text editors like Microsoft Word, which results in poorly structured entries with inconsistent markup. The latter represents each entry as a row, with entry zones (forms, meanings, examples, etc.) arranged in separate columns. The section outlines the challenges of these methods, emphasizing their limited publishing options and their difficulty in handling complex linguistic data, which often involve many-to-one relationships. Alternatives such as Text Encoding Initiative (TEI) formats and databases are discussed, highlighting their capacity for structured data representation.
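
The many-to-one relationships mentioned here (several meanings belonging to one headword) are hard to fit into a single row of a table but map naturally onto two linked database tables. The following is a minimal sketch of that idea, not the schema of any dictionary described in the article. The column names meaning_id, unit_id, meaning and rank follow the caption of Fig. 3; the units table, its headword column and the sample entries are assumptions made purely for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per headword (hypothetical table, not taken from the article).
        CREATE TABLE units (
            unit_id  INTEGER PRIMARY KEY,
            headword TEXT NOT NULL
        );
        -- Many meanings may point at the same unit_id (many-to-one).
        CREATE TABLE meanings (
            meaning_id INTEGER PRIMARY KEY,
            unit_id    INTEGER NOT NULL REFERENCES units(unit_id),
            meaning    TEXT NOT NULL,
            "rank"     INTEGER NOT NULL
        );
        """
    )
    # Invented sample data for illustration only.
    conn.executemany(
        "INSERT INTO units (unit_id, headword) VALUES (?, ?)",
        [(1, "bank"), (2, "run")],
    )
    conn.executemany(
        'INSERT INTO meanings (unit_id, meaning, "rank") VALUES (?, ?, ?)',
        [
            (1, "edge of a river", 2),
            (1, "financial institution", 1),
            (2, "move quickly on foot", 1),
        ],
    )
    conn.commit()
    return conn


def meanings_for(conn, headword):
    """Return the meanings of a headword, ordered by their rank."""
    rows = conn.execute(
        "SELECT m.meaning FROM meanings AS m "
        "JOIN units AS u ON u.unit_id = m.unit_id "
        'WHERE u.headword = ? ORDER BY m."rank"',
        (headword,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(meanings_for(db, "bank"))
```

In a tabular file, a second meaning would require either an extra column per meaning or a duplicated row; here it is simply one more row in the meanings table that refers to the same unit_id.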

Subsequently, dictionary writing systems (DWS) are introduced, with the OnLex platform serving as the primary example of their functionality. This example demonstrates how online editing interfaces streamline lexicographic processes, from data input to publication and feedback collection. By analyzing DWS features, the article emphasizes their efficacy in simplifying the editorial workflow and enhancing user experience.

A critical appraisal of the advantages of online DWS is provided, highlighting their role in addressing key challenges faced by traditional publishing methods. Notable advantages include seamless integration of search functionalities, support for multiple languages, and real-time error reporting mechanisms after publication.

In conclusion, the article advocates for the wider adoption of digital lexicography methods, particularly within the Russian tradition, emphasizing their potential to facilitate every stage of the dictionary creation process.

About the authors

Yu. Yu. Makarov

V.V. Vinogradov Russian Language Institute of the Russian Academy of Sciences; Institute of Linguistics of the Russian Academy of Sciences; National Research University “Higher School of Economics”

Author for correspondence.
Email: yurmak@iling-ran.ru

Research Fellow at the V.V. Vinogradov Russian Language Institute of the Russian Academy of Sciences, Junior Researcher at the Institute of Linguistics of the Russian Academy of Sciences, Visiting Scholar at the National Research University “Higher School of Economics”

Russian Federation, Moscow; Moscow; Moscow

References

  1. Belyaev, O.I., Makarov, Y., Novokshanov, D.A., Sinitsyna, Ju.V., Khomchenkova, I.A. Onlajn-slovari iranskikh jazykov [Online Dictionaries of Iranian Languages]. 1-aja Mezhdunarodnaja nauchno-obrazovatelnaja konferentsija “Pejsikovskie chtenija: problemy sovremennogo akademicheskogo vostokovedenija”: materialy konferentsii [1st International Scientific and Educational Conference “Peisikov Readings: Problems of Modern Academic Oriental Studies”: Conference Materials]. Ed. A.A. Maslov. Moscow: ISAA MGU imeni M.V. Lomonosova Publ., 2023, pp. 7–11. URL https://elibrary.ru/item.asp?id=58073241&pff=1 (In Russ.)
  2. Belyaev, O.I., Khomchenkova, I.A., Sinitsyna, J.V., Dyachkov, V.V., Byzova, A.A., Badeev, A.O., Alekseev, D.A., Makarov, Y. Istoriko-etimologicheskij slovar osetinskogo jazyka V.I. Abaeva: problemy sozdanija tsifrovoj dvujazychnoj versii [V.I. Abaev’s Historical-Etymological Dictionary: Issues in the Development of a Digital Bilingual Edition]. Vestn. Mosk. un-ta. Seriya 9. Filologiya [Lomonosov Philology Journal. Series 9. Philology]. 2024, No. 2, pp. 75–86. (In Russ.) http://dx.doi.org/10.55959/MSU0130-0075-9-2024-47-02-4
  3. Dragićević, R., Makarov, Y., Ryzhova, D., Shapich, Y., Yakushkina, E. A new bilingual Serbian–Russian dictionary. (Eds.) K. Despot, I. Brač, A. Ostroški Anić. Lexicography and Semantics: Proceedings of the XXI EURALEX International Congress. Zagreb: Institute for the Croatian Language, 2024, pp. 93–100.
  4. Plungian, V. A. Korpus kak instrument i kak ideologija: o nekotorykh urokakh sovremennoj korpusnoj lingvistiki [A Corpus as a Research Tool and Ideology: Some Lessons from Modern Corpus Linguistics]. Russkij jazyk v nauchnom osveshchenii [Russian Language and Linguistic Theory]. 2008, No. 16(2), pp. 7–20. (In Russ.)
  5. Belikov, V.I., Kopylov, N.Ju., Piperski, A.Ch., Selegey, V.P., Sharoff, S.A. Korpus kak yazyk: ot masshtabiruemosti k differencialnoj polnote [Corpus as Language: From Scalability to Register Variation]. Kompiuternaia lingvistika i intellektualnye tekhnologii [Computational Linguistics and Intelligent Technologies]. 2013, No. 12(1), p. 19. (In Russ.)
  6. Piperski, A., Belikov, V., Kopylov, N., Morozov, E., Selegey, V., Monakhov, S. Big and diverse is beautiful: A large corpus of Russian to study linguistic variation. (Eds.) S. Evert, E. Stemle, P. Rayson. Proceedings of the 8th Web as Corpus Workshop (WAC-8) @ Corpus Linguistics 2013. 2013, pp. 24–28.
  7. Magomedgazhieva, P., Daniel, M. Dictionary of Tukita (v2.0.0). Linguistic Convergence Laboratory, HSE University, Moscow, 2023. https://doi.org/10.5281/zenodo.7803955
  8. Belyaev, O., Khomchenkova, I., Sinitsyna, J., Djachkov, V. Digitizing print dictionaries using TEI: The Abaev Dictionary Project. Proceedings of the Seventh International Workshop on Computational Linguistics of Uralic Languages, Syktyvkar, Russia (Online): Association for Computational Linguistics, 2021, pp. 57–64. URL: https://aclanthology.org/2021.iwclul-1.7
  9. Ivanov, V.B. Bolshoj persidsko-russkij slovar [Persian-Russian Dictionary]. Vol. 1. Moscow: Nauka Publ., 2020. (In Russ.)
  10. Abel, A. Dictionary writing systems and beyond. Electronic Lexicography. (Eds.) S. Granger, M. Paquot. Oxford University Press, 2012, pp. 83–106. https://doi.org/10.1093/acprof:oso/9780199654864.003.0005
  11. Makarov, Y., Melenchenko, M., Novokshanov, D. Digital Resources for the Shughni Language. Proceedings of The Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, Marseille, France: European Language Resources Association, 2022, pp. 61–64. URL: https://aclanthology.org/2022.eurali-1.9
  12. Ivanov, V.B. Bolshoj persidsko-russkij slovar [Persian-Russian Dictionary]. Vol. 2. Moscow: Fond Ibn Siny Publ., 2023. (In Russ.)
  13. Ivanov, V.B. Bolshoj persidsko-russkij slovar [Persian-Russian Dictionary]. Vol. 3. Moscow: OOO “Sadra” Publ., 2024. (In Russ.)
  14. Krysin, L.P. (ed.) Akademicheskij tolkovyj slovar russkogo jazyka. Tom 1: A – VILIAT’ [Academic Explanatory Dictionary of Russian. Vol. 1]. Moscow: Izdatelskij dom IASK Publ., 2016. (In Russ.)
  15. Krysin, L.P. (ed.) Akademicheskij tolkovyj slovar russkogo jazyka. Tom 2: VINA – GIAUR [Academic Explanatory Dictionary of Russian. Vol. 2]. Moscow: Izdatelskij dom IASK Publ., 2016. (In Russ.)
  16. Tsumarev, A.E., Shestakova, L.L., Nechaeva, I.V., Kuleva, A.S., Grunchenko, O.M. “Akademicheskij tolkovyj slovar russkogo jazyka”: traditsionnoe i novoe [“Academic Explanatory Dictionary of the Russian Language”: the Traditional and the New]. Izvestiâ Rossijskoj akademii nauk. Seriâ literatury i âzyka [Bulletin of the Russian Academy of Sciences: Studies in Literature and Language]. 2017, Vol. 76, No. 5, pp. 5–21. (In Russ.)

Supplementary files

1. JATS XML
2. Fig. 1. Dictionary of the Tukita dialect of the Karata language (lingconlab.ru/TukitaDict/; [7])
3. Fig. 2. Examples of a) TEI markup (left) and b) TEI-Lex-0 markup (right) of a fragment of a dictionary entry
4. Fig. 3. Screenshot of the SQLite DBMS showing the table of meanings of the online Persian dictionary iranic.space [1], [9]; meaning_id is the ID of the meaning, unit_id is the ID of the headword lexeme, meaning is the text of the meaning, and pos_id and rank are technical indexes
5. Fig. 4. Fragment of the editorial interface of the OnLex platform, example from the pamiri.online website [11]
6. Fig. 5. The entry dod from the Shughni-Russian dictionary by D. Karamshoev, as presented on the pamiri.online website [11]

Copyright (c) 2024 Russian Academy of Sciences