Chinese Text Project
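The Chinese Text Project (CTP; Chinese: 中国哲学书电子化计划) is an open-source online digital library project that provides tools for transcribing, reading, and searching Chinese classics and other premodern texts. Its aim is to provide accessible and accurate versions of these texts,[1] particularly those relating to Chinese philosophy. The site has been described as the largest and most accurate collection of classical Chinese texts on the web,[2][3] and as one of the most useful textual databases for the study of early Chinese literature.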
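The site integrates a range of features and builds on external tools such as its API, allowing users to carry out various kinds of digital analysis on premodern texts. Through the CTP API and the site's plugin system, the Chinese Text Project interoperates with other digital projects such as Text Tools, TextRef, and MARKUS.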
Site contents
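The core content of the site consists of digitized facsimiles of collated printed editions and manuscript copies of premodern Chinese texts, together with electronic transcriptions generated from those editions. The page images and the transcriptions are linked to each other. A side-by-side interface combining the two lets users consult the source image while reading the transcription, which makes proofreading and transcription work more efficient. The same interface also lets crowdsourcing contributors correct and annotate the text, for example by fixing OCR-generated transcriptions or adding punctuation to unpunctuated text.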
Texts are divided into pre-Qin and Han texts, and post-Han texts, with the former categorized by school of thought and the latter by dynasty. The ancient (pre-Qin and Han) section of the database contains over 5 million Chinese characters, the post-Han database over 20 million characters, and the publicly editable wiki section over 5 billion characters.[4] Many texts also have English and Chinese translations, which are paired with the original text paragraph by paragraph as well as phrase by phrase for ease of comparison; this makes the system a useful scholarly research tool even for students with little or no knowledge of Chinese.[5] Many texts are available on the site in multiple editions; each transcription is based on a specific edition, and the system records which edition that is.
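In addition to advanced search functions adapted to Chinese texts,[6][7] the site offers a number of features designed for specialists, including an integrated dictionary, word lists, parallel passage information,[8] scanned source texts, concordance and index data,[9] a metadata system, display of commentary,[10] a database listing publicly available digital resources, and a discussion forum in which threads can be linked to any item of data on the site.[11][12] The "Library" section of the Chinese Text Project contains over 25 million pages of scanned images of premodern Chinese texts,[13][14] linked line by line to the full-text database. Many of the transcriptions are generated by OCR[15] and can be edited and maintained through an online crowdsourced wiki system.[16][17] The textual data and metadata can be exported through the API, which allows integration with other online tools and supports use in text mining and digital humanities projects.[16][18]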
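As a rough illustration of what API export can look like, the following Python sketch builds a request URL for one text and parses a JSON response into a title and a list of paragraphs. It assumes a base URL of `https://api.ctext.org`, a `gettext` endpoint that takes a `urn` parameter, and a response containing `title` and `fulltext` fields; these details, the `error` key, and the helper names are assumptions made for illustration and should be checked against the API documentation.[18] The sample response is hand-written so that the sketch runs without network access.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the API; verify against the official documentation.
API_BASE = "https://api.ctext.org"


def build_gettext_url(urn, base=API_BASE):
    """Build a request URL for the (assumed) gettext endpoint."""
    return f"{base}/gettext?{urlencode({'urn': urn})}"


def paragraphs_from_response(payload):
    """Parse a JSON payload into (title, list of paragraphs).

    The 'title', 'fulltext', and 'error' keys are assumptions about
    the response shape, not confirmed field names.
    """
    data = json.loads(payload)
    if "error" in data:
        raise ValueError(f"API returned an error: {data['error']}")
    return data.get("title", ""), list(data.get("fulltext", []))


# Hand-written sample in the assumed response shape (not real API output).
SAMPLE_RESPONSE = json.dumps(
    {"title": "學而", "fulltext": ["子曰:「學而時習之,不亦說乎?」"]},
    ensure_ascii=False,
)

url = build_gettext_url("ctp:analects/xue-er")
title, paragraphs = paragraphs_from_response(SAMPLE_RESPONSE)
print(url)
print(title, len(paragraphs))
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `paragraphs_from_response`, subject to the API's usage terms.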
Features
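A number of tools are built into the system, and more can be added through plugins and the API. The basic tools include a dictionary function that draws on the system's own data to show information about a given character or word, such as source citations recorded in dictionaries, phonological information attested in various texts, historical usage, and translations where available. The dictionary also supports searching for characters that are not encoded in Unicode.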
The system makes extensive use of optical character recognition (OCR) specially adapted for the project, which achieves much lower error rates than alternative methods, to provide transcriptions of many texts and editions not previously available in digital form.[19] These OCR transcriptions enable full-text search of scanned images of early editions, including those provided by university libraries and by large-scale scanning projects such as the Harvard Yenching Chinese Rare Books Digitization Project. Users of the system collaboratively edit the resulting transcriptions to correct OCR errors and to add modern punctuation and other annotations to the texts.
As well as providing a mechanism for close integration with external tools and projects, the CTP API and plugin system also provide a powerful means for programmatic access to textual data for use in text mining research and digital humanities teaching. External tools such as Text Tools facilitate browser-based analyses of word usage, text reuse, document similarity, and other aspects of texts contained in the system as well as interactive visualization of results. A Python module interfacing with the same API allows for more specialized data mining research.
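To give a sense of the kind of text reuse and similarity analysis mentioned above, the following Python sketch compares two short passages by their shared character n-grams and computes a Jaccard similarity. This is one simple, generic technique chosen for illustration; it is not claimed to be the method used by Text Tools or by any published CTP study, and the function names and punctuation list are hypothetical.

```python
# Punctuation to ignore before comparison (an illustrative, incomplete set).
PUNCTUATION = set(",。?!:;、「」『』 ")


def char_ngrams(text, n):
    """Return the set of character n-grams in text, ignoring punctuation."""
    chars = [c for c in text if c not in PUNCTUATION]
    return {"".join(chars[i:i + n]) for i in range(len(chars) - n + 1)}


def shared_ngrams(a, b, n=3):
    """Return the n-grams that occur in both passages."""
    return char_ngrams(a, n) & char_ngrams(b, n)


def jaccard_similarity(a, b, n=3):
    """Jaccard similarity of the two passages' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


# Two passages that differ in a single character.
passage_a = "學而時習之,不亦說乎?"
passage_b = "學而時習之,不亦樂乎?"

shared = shared_ngrams(passage_a, passage_b)
similarity = jaccard_similarity(passage_a, passage_b)
print(sorted(shared))
print(round(similarity, 3))
```

Character n-grams suit unsegmented classical Chinese because they need no word segmentation; larger values of `n` reduce chance matches at the cost of missing shorter parallels.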
References
- [1] Elman, Benjamin A. Classical Historiography for Chinese History: Databases & electronic texts. Princeton University. Retrieved June 3, 2016.
- [2] Association of Chinese Philosophers in North America.
- [3] Chris Fraser, Department of Philosophy, University of Hong Kong.
- [4] http://ctext.org/system-statistics
- [5] Connolly, Tim. Learning Chinese Philosophy with Commentaries. Teaching Philosophy (Philosophy Documentation Center). 2012, 35 (1): 1–18. Retrieved March 19, 2017.
- [6] http://ctext.org/instructions/advanced-search
- [7] http://ctext.org/faq/normalization
- [8] Sturgeon, Donald. Unsupervised identification of text reuse in early Chinese literature. Digital Scholarship in the Humanities (Oxford University Press). 2017. Retrieved November 21, 2017.
- [9] Xu, Jiajin. Corpus-based Chinese studies: A historical review from the 1920s to the present. Chinese Language & Discourse (John Benjamins Publishing Company). 2015, 6 (2): 218–244. Retrieved June 3, 2016.
- [10] Adkins, Martha A. Web Review: Online Resources for the Study of Chinese Religion and Philosophy. Theological Librarianship (American Theological Library Association). 2016, 9 (2): 5–8. Retrieved November 7, 2016.
- [11] Holger Schneider and Jeff Tharsen, http://dissertationreviews.org/archives/9213
- [12] http://ctext.org/introduction
- [13] http://ctext.org/library.pl?if=en
- [14] http://ctext.org/system-statistics
- [15] [conference citation; details missing in source]
- [16] https://cpianalysis.org/2016/06/08/crowdsourcing-apis-and-a-digital-library-of-chinese/, China Policy Institute, University of Nottingham
- [17] http://ctext.org/instructions/ocr
- [18] http://ctext.org/tools/api
- [19] [conference citation; details missing in source]