数字人文研究 ›› 2023, Vol. 3 ›› Issue (2): 39-49.

• 攻玉以石 •

基于关键词提取的文化遗产信息资源知识抽取方法

  

  • Publication date: 2023-08-17 Release date: 2023-08-17

Knowledge Extraction of Cultural Heritage Information Resources Based on Keywords

  • Online:2023-08-17 Published:2023-08-17

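摘要 (Abstract):

As the volume of cultural heritage information resources grows explosively, whether knowledge can be extracted efficiently from the unstructured data that makes up these resources affects how effectively fine traditional culture can be disseminated and promoted. Taking the texts of cultural heritage information resources as its research object, this article proposes the following knowledge extraction path: classify the texts by source, select a targeted keyword extraction method for each class according to its knowledge distribution characteristics, and, once keywords are obtained, retrieve cultural heritage entities and relations in a knowledge graph. Experimental results show that the source-classified keyword extraction method established in this study improves substantially on other methods under multiple threshold conditions and extracts cultural heritage knowledge from unstructured data well.

关键词 (Keywords): knowledge extraction, keyword extraction, cultural heritage, TF-IDF, LDA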

Abstract:

With the explosive growth of cultural heritage information resources, the ability to efficiently extract knowledge from the unstructured data they contain affects how effectively Chinese traditional culture can be disseminated. This article takes the text of cultural heritage information resources as its research object. It proposes a knowledge extraction path: classify the texts by source, select a keyword extraction method suited to the knowledge distribution characteristics of each class, and then use the extracted keywords to retrieve cultural heritage entities and relationships from a knowledge graph. The experimental results show that the proposed source-classified keyword extraction method improves on other methods by more than 50% and extracts cultural heritage knowledge from unstructured data more effectively.

Keywords: knowledge extraction, keyword extraction, cultural heritage, TF-IDF, LDA
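
The path described in the abstract (classify by source, pick a keyword extractor per class, look the keywords up in a knowledge graph) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the article: the source classes `encyclopedia` and `news`, the mapping `EXTRACTOR_BY_SOURCE`, the toy graph `TOY_GRAPH`, and the pre-tokenized English input. The second extractor is a plain term-frequency stand-in that only marks where a topic-model method such as LDA might be plugged in; it is not an LDA implementation.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Rank the tokens of one document by TF-IDF against a small corpus."""
    n_docs = len(corpus_tokens)
    term_counts = Counter(doc_tokens)
    scores = {}
    for term, count in term_counts.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus_tokens if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def frequency_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Term-frequency stand-in for a topic-model extractor (NOT LDA)."""
    counts = Counter(doc_tokens)
    return sorted(counts, key=lambda t: (-counts[t], t))[:top_k]


# Hypothetical mapping from source class to keyword extractor.
EXTRACTOR_BY_SOURCE = {
    "encyclopedia": tfidf_keywords,
    "news": frequency_keywords,
}

# Toy knowledge graph: keyword -> list of (head, relation, tail) triples.
TOY_GRAPH = {
    "kunqu": [("kunqu", "originated_in", "kunshan")],
    "guqin": [("guqin", "instance_of", "musical_instrument")],
}


def extract_knowledge(doc_tokens, source, corpus_tokens, graph, top_k=3):
    """Pick the extractor for the source class, then look keywords up in the graph."""
    extractor = EXTRACTOR_BY_SOURCE[source]
    keywords = extractor(doc_tokens, corpus_tokens, top_k)
    triples = [triple for kw in keywords for triple in graph.get(kw, [])]
    return keywords, triples


# Illustrative, pre-tokenized corpus (made-up data).
corpus = [
    ["kunqu", "opera", "stage", "kunqu", "history"],
    ["guqin", "music", "history", "stage"],
    ["opera", "music", "history"],
]
keywords, triples = extract_knowledge(corpus[0], "encyclopedia", corpus, TOY_GRAPH)
print(keywords, triples)
```

In this sketch the source class only selects which extractor runs; a real pipeline over Chinese text would also need word segmentation before either extractor, which is assumed to have happened already.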

CLC number: