Digital Humanities Research ›› 2022, Vol. 2 ›› Issue (4): 74-92.


A Data-driven Approach to Studying Changing Vocabularies in Historical Newspaper Collections

  

  • Online: 2022-11-08 Published: 2023-03-06

Abstract: Nation and nationhood are among the most frequently studied concepts in the field of intellectual history. At the same time, the word ‘nation’ and its historical usage are very vague. The aim of this article was to develop a data-driven method, using dependency parsing and neural word embeddings, to clarify some of the vagueness in the evolution of this concept. To this end, we propose the following two-step method. First, using linguistic processing, we create a large set of words pertaining to the topic of nation. Second, we train diachronic word embeddings and use them to quantify the strength of the semantic similarity between these words and thereby create meaningful clusters, which are then aligned diachronically. To illustrate the robustness of the method across languages, time spans, and large datasets, we apply it to the entirety of five historical newspaper archives in Dutch, Swedish, Finnish, and English. To our knowledge, thus far there have been no large-scale comparative studies of this kind that purport to grasp long-term developments in as many as four different languages in a data-driven way. A particular strength of the method we describe in this article is that, by design, it is not limited to the study of nationhood, but rather extends beyond it to other research questions and is reusable in different contexts.
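The second step of the method (similarity, clustering, diachronic alignment) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors are invented toy values standing in for trained embeddings, and the threshold-based grouping and Jaccard-overlap alignment are simple stand-ins chosen for illustration. The names `cluster`, `align`, `PERIOD_1`, and `PERIOD_2` are hypothetical. Dependency parsing and embedding training are not shown.

```python
from math import sqrt

# Toy 2-d "embeddings" per time period (invented values, not real data).
PERIOD_1 = {"nation": (1.0, 0.1), "people": (0.9, 0.2),
            "state": (0.1, 1.0), "government": (0.2, 0.9)}
PERIOD_2 = {"nation": (0.2, 1.0), "people": (1.0, 0.1), "citizen": (0.9, 0.2),
            "state": (0.1, 1.0), "government": (0.15, 0.95)}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.8):
    """Group words into connected components of the graph whose edges
    join word pairs with cosine similarity >= threshold (assumed rule)."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the component root, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    # Sort clusters by their sorted members so the output order is stable.
    return sorted(groups.values(), key=sorted)


def jaccard(a, b):
    """Share of words two clusters have in common."""
    return len(a & b) / len(a | b)


def align(earlier, later, min_overlap=0.3):
    """Link each earlier cluster to the later cluster it overlaps most,
    keeping the link only if the overlap reaches min_overlap."""
    links = []
    for i, a in enumerate(earlier):
        if not later:
            break
        best = max(range(len(later)), key=lambda j: jaccard(a, later[j]))
        if jaccard(a, later[best]) >= min_overlap:
            links.append((i, best))
    return links


clusters_1 = cluster(PERIOD_1)
clusters_2 = cluster(PERIOD_2)
links = align(clusters_1, clusters_2)
```

In this toy data, ‘nation’ sits with ‘people’ in the first period and with ‘state’ and ‘government’ in the second, so the alignment links show a word moving between clusters over time. With real embeddings, the vectors, the similarity threshold, and the clustering algorithm would all need to be chosen and validated against the corpus.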

Key words:

digital humanities, data-driven, historical newspapers, vocabulary change
