Chinese Text Project
The Chinese Text Project (CTP; 中國哲學書電子化計劃) is an open-access digital library project that provides tools for transcribing, navigating, and searching early Chinese texts. It aims to provide accessible and accurate versions of a wide range of texts,[1] particularly those relating to Chinese philosophy. The site is credited with providing one of the most comprehensive and accurate collections of classical Chinese texts on the Internet,[2][3] and with being one of the most useful textual databases for scholars of early Chinese texts.[4][5]
Through integrated functionality and external tools connected via an application programming interface (API), the system supports a wide range of digital analyses of pre-modern textual works. Together with the CTP API, a plugin system provides direct connections from within the user interface to other projects including Text Tools, TextRef, and MARKUS.
Site contents
Texts are divided into pre-Qin and Han texts, and post-Han texts, with the former categorized by school of thought and the latter by dynasty. The ancient (pre-Qin and Han) section of the database contains over 5 million Chinese characters, the post-Han database over 20 million characters, and the publicly editable wiki section over 5 billion characters.[6] Many texts also have English and Chinese translations, which are paired with the original text paragraph by paragraph as well as phrase by phrase for ease of comparison; this allows the system to serve as a scholarly research tool even for students with little or no knowledge of Chinese.[7]
As well as providing customized search functionality suited to Chinese texts,[8][9] the site also attempts to make use of the unique format of the web to offer a range of features relevant to sinologists, including an integrated dictionary, word lists, parallel passage information,[10] scanned source texts, concordance and index data,[11] a metadata system, Chinese commentary display,[12] a published resources database, and a discussion forum in which threads can be linked to specific data on the site.[13][14] The "Library" section of the site also includes scanned copies of over 25 million pages of early Chinese texts,[15][16] linked line by line to transcriptions in the full-text database. Many of these transcriptions were created using Optical Character Recognition,[17] and are edited and maintained using an online crowd-sourcing wiki system.[18][19] Textual data and metadata can also be exported using an Application Programming Interface, allowing integration with other online tools as well as use in text mining and digital humanities projects.[18][20]
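As an illustration of how exported textual data might be consumed in a text-mining workflow, the following Python sketch builds a request for one chapter and extracts its passages from the decoded JSON. The base URL, the gettext function name, the urn parameter, and the fulltext and error response keys are assumptions made for this example and are not stated in the article; the API documentation cited above[20] should be consulted for the actual interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the CTP API; verify against the official API documentation.
API_BASE = "https://api.ctext.org"


def build_gettext_url(urn: str) -> str:
    """Build a request URL for the (assumed) `gettext` function and `urn` parameter."""
    return f"{API_BASE}/gettext?{urlencode({'urn': urn})}"


def extract_paragraphs(payload: dict) -> list[str]:
    """Return the passages from a decoded JSON response.

    Assumes passages arrive as a list of strings under a `fulltext` key,
    and that failures are reported under an `error` key (both hypothetical).
    """
    if "error" in payload:
        raise ValueError(f"API reported an error: {payload['error']}")
    return list(payload.get("fulltext", []))


def fetch_paragraphs(urn: str, timeout: float = 10.0) -> list[str]:
    """Fetch one textual unit over HTTP and return its passages."""
    with urlopen(build_gettext_url(urn), timeout=timeout) as response:
        return extract_paragraphs(json.load(response))
```

Under these assumptions, a call such as fetch_paragraphs("ctp:analects/xue-er") would return the passages of a single chapter as a list of strings, ready for tokenization or text-reuse analysis. Keeping URL construction and response parsing separate from the network call lets both be checked without access to the live service.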
References
- ↑ Elman, Benjamin A. "Classical Historiography for Chinese History: Databases & electronic texts". Princeton University. Retrieved June 3, 2016.
- ↑ Association of Chinese Philosophers in North America (北美中国哲学学者协会)
- ↑ Chris Fraser, Department of Philosophy, University of Hong Kong
- ↑ http://warpweftandway.com/support-the-chinese-text-project/
- ↑ http://languagehat.com/chinese-text-project/
- ↑ http://ctext.org/system-statistics
- ↑ Connolly, Tim (2012). "Learning Chinese Philosophy with Commentaries". Teaching Philosophy. Philosophy Documentation Center. 35 (1): 1–18. Retrieved March 19, 2017.
- ↑ http://ctext.org/instructions/advanced-search
- ↑ http://ctext.org/faq/normalization
- ↑ Sturgeon, Donald (2017). "Unsupervised identification of text reuse in early Chinese literature". Digital Scholarship in the Humanities. Oxford University Press. Retrieved November 21, 2017.
- ↑ Xu, Jiajin (2015). "Corpus-based Chinese studies: A historical review from the 1920s to the present". Chinese Language & Discourse. John Benjamins Publishing Company. 6 (2): 218–244. Retrieved June 3, 2016.
- ↑ Adkins, Martha A. (2016). "Web Review: Online Resources for the Study of Chinese Religion and Philosophy". Theological Librarianship. American Theological Library Association. 9 (2): 5–8. Retrieved November 7, 2016.
- ↑ Holger Schneider and Jeff Tharsen, http://dissertationreviews.org/archives/9213
- ↑ http://ctext.org/introduction
- ↑ http://ctext.org/library.pl?if=en
- ↑ http://ctext.org/system-statistics
- ↑ Sturgeon, Donald (2017). Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. The Thirtieth International Flairs Conference. AAAI. Retrieved November 21, 2017.
- ↑ https://cpianalysis.org/2016/06/08/crowdsourcing-apis-and-a-digital-library-of-chinese/, China Policy Institute, University of Nottingham
- ↑ http://ctext.org/instructions/ocr
- ↑ http://ctext.org/tools/api