Digital Humanities Research ›› 2021, Vol. 1 ›› Issue (1): 48-64.

Lexical Tagging of Historical Texts and Its Applications

  

  • Online: 2021-01-20 Published: 2021-01-22

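Abstract (Chinese version):

[Background/Significance] Historical texts are the basic material of historical research. By combing through their content, historians sort, piece together, and contextualize the meaningful information they contain. History is the study of the traces of human activity over time; once the concept of geographic space is added, historical texts become more three-dimensional. Moving beyond the linear reading of paper sources, historical texts in the information age can be enriched with lexical tags with the help of technology, and analysis and visualization of the tagged terms then give a bird's-eye view of the contexts implicit in those texts. [Process/Method] This paper discusses the significance for historical research of tagging person, time, place, and object terms in historical texts, and describes the purpose and characteristics of each kind of tag. In particular, it points out that lexical tagging must do more than recognize terms: it must also achieve "disambiguation" and "aggregation". The paper also introduces two automatic tagging tools, the "MARKUS (码库思) semi-automatic tagging platform for classical texts" (MARKUS) and the "Batch Tagging Tool" (CT Tool). These two tools make it possible to tag people, times, places, and objects quickly and in large quantities. [Results/Conclusions] Using examples of actual research results, the paper shows how tagged texts can be used. Through the practical benefits of tagging time, person, geographic, and object terms, it illustrates lexical tagging of historical texts and its application in historical research. Finally, it discusses the problem of event tagging and points out that event tags differ in nature from other lexical tags.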

Keywords (Chinese version):

Lexical tagging; Digital humanities; Historical texts; DocuSky; MARKUS

Abstract:

[Background/Significance] Historical texts are the basic material of historical research. By combing through the content of these texts, historians organize, piece together, and contextualize the meaningful information in them. History is a discipline that studies the trajectory of human activity in time. Once the concept of geographic space is added, historical texts become more three-dimensional. Instead of the linear reading of paper sources practiced in the past, historical texts in the information age can be given lexical tags with the assistance of technology, and the analysis and visualization of the tagged vocabulary can then provide a bird's-eye view of the implicit context of the texts. [Process/Method] The paper discusses what tagging persons, times, place names, and objects in historical texts means for historical research, and describes the purpose and characteristics of each type of tag. It stresses that lexical tagging should not only identify words but also perform "disambiguation" and "aggregation". It also introduces two automatic tagging tools: the "MARKUS Semi-automatic Tagging Platform for Classical Texts" (MARKUS) and the "Batch Tagging Tool" (Content Tagging Tool, CT Tool). These two tools make it possible to tag large numbers of people, times, places, and things quickly. [Results/Conclusions] Actual research cases illustrate how tagged texts can be used, and the practical benefits of tagging time, person, geographic, and object vocabulary illustrate how lexical tagging of historical texts is applied in historical research. Finally, the paper discusses the issue of event tagging and points out that event tags are essentially different from other lexical tags.

Key words:

Text Annotation, Digital Humanities, Historical Text, DocuSky, MARKUS
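
The abstract's point that tagging must achieve "disambiguation" and "aggregation", not just term recognition, can be illustrated with a minimal Python sketch. It is not the implementation of MARKUS or the CT Tool. The gazetteer, the entity identifiers, the context cues, and the sample sentence are all hypothetical, and the cue-word rule stands in for whatever disambiguation method a real tool applies.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> candidate (entity_id, context cues).
# Several surface forms pointing to one ID models aggregation (aliases);
# one surface form with several candidates models the need to disambiguate.
GAZETTEER = {
    "Su Shi": [("person:su_shi", ())],
    "Su Dongpo": [("person:su_shi", ())],
    "Zizhan": [("person:su_shi", ())],
    "Xincheng": [
        ("place:xincheng_1", ("Hebei",)),     # illustrative candidate
        ("place:xincheng_2", ("Zhejiang",)),  # illustrative candidate
    ],
}


def resolve(surface, text):
    """Pick the candidate whose cue word occurs in the text (naive rule)."""
    candidates = GAZETTEER[surface]
    for entity_id, cues in candidates:
        if any(cue in text for cue in cues):
            return entity_id
    return candidates[0][0]  # fall back to the first candidate


def tag(text):
    """Return (surface form, entity_id, offset) for every gazetteer match."""
    # Longer forms come first so that they win over any shorter prefix.
    forms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    return [(m.group(), resolve(m.group(), text), m.start())
            for m in pattern.finditer(text)]


def aggregate(tags):
    """Group the tagged surface forms under the entity they refer to."""
    grouped = defaultdict(list)
    for surface, entity_id, _offset in tags:
        grouped[entity_id].append(surface)
    return dict(grouped)


# Invented sentence, used only to exercise the functions.
SAMPLE = "Su Shi, also called Su Dongpo, passed through Xincheng in Zhejiang."
```

Calling `aggregate(tag(SAMPLE))` groups both name variants under one person identifier and resolves the ambiguous place name through the nearby cue word. Recognition alone would have produced three unrelated strings.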