Digital Humanities Research ›› 2023, Vol. 3 ›› Issue (4): 49-62.
Challenges and Thoughts in Making Text Ground Truth for Republican Chinese Newspaper: Taking Jing Bao as an Example
Abstract:
Many researchers have explored the use of machine learning for optical character recognition (OCR), particularly in Europe and North America, and many projects are producing ground truth (GT) data for this purpose. The situation is different for non-Latin script (NLS) material. In 2021, the Early Chinese Periodicals Online (ECPO) project at the University of Heidelberg began working on ways to produce machine-readable full text from historical Chinese newspapers. ECPO uses several machine-learning approaches, including convolutional neural networks, to develop a semi-automatic pipeline for this task. We chose the entertainment newspaper Jing Bao (The Crystal, 1919-1940) as the basis for our experiments.
Key words:
ground truth, Republican Chinese newspapers, Jing Bao, OCR
CLC Number:
H127
TP391.1
K26
Xie Jia, Yip SukMan.
Challenges and Thoughts in Making Text Ground Truth for Republican Chinese Newspaper: Taking Jing Bao as an Example [J]. Digital Humanities Research, 2023, 3(4): 49-62.
URL: http://dhr.ruc.edu.cn/EN/
http://dhr.ruc.edu.cn/EN/Y2023/V3/I4/49