Difference between revisions of "中國哲學書電子化計劃" (Chinese Text Project)

From Digital Sinology

Revision as of 10:55, 5 June 2018 (Tuesday)

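The Chinese Text Project (中國哲學書電子化計劃, CTP) is an open-source online digital library project that provides functions for transcribing, reading, and searching Chinese classics and other pre-modern texts. Its aim is to provide accessible and accurate versions of texts,[1] particularly those relating to Chinese philosophy. The site has been described as the largest and most accurate collection of classical Chinese texts on the Internet,[2][3] and as one of the most useful textual databases for research on early Chinese texts.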

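The site builds on the integration of many functions, together with external tools such as its API, to let users carry out a variety of digital analyses of pre-modern texts. Through the CTP API and the site's plugin system, the Chinese Text Project interoperates with other digital projects such as Text Tools, TextRef, and MARKUS.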

Site contents

Transcription interface

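The site's core content comprises digitized editions of pre-modern Chinese works, both collated editions and manuscript copies, together with electronic transcriptions generated from these editions. The scanned images and the transcriptions are linked to each other. A side-by-side image-and-text view that combines the two lets users consult the source image while reading the transcription, and so proofread and transcribe efficiently. The same view also lets crowdsourcing users revise the text and add annotations, for example by correcting OCR-generated transcriptions or by adding punctuation to unpunctuated text.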

Image-and-text comparison interface

Texts are divided into pre-Qin and Han texts, and post-Han texts, with the former categorized by school of thought and the latter by dynasty. The ancient (pre-Qin and Han) section of the database contains over 5 million Chinese characters, the post-Han database over 20 million characters, and the publicly editable wiki section over 5 billion characters.[4] Many texts also have English and Chinese translations, which are paired with the original text paragraph by paragraph as well as phrase by phrase for ease of comparison; this allows the system to serve as a scholarly research tool even for students with little or no knowledge of Chinese.[5] Many works are available in multiple versions, with each transcription following (and often linked to images of) a particular historical edition of the text.

As well as providing customized search functionality suited to Chinese texts,[6][7] the site also attempts to make use of the unique format of the web to offer a range of features relevant to sinologists, including an integrated dictionary, word lists, parallel passage information,[8] scanned source texts, concordance and index data,[9] a metadata system, Chinese commentary display,[10] a published resources database, and a discussion forum in which threads can be linked to specific data on the site.[11][12] The "Library" section of the site also includes scanned copies of over 25 million pages of early Chinese texts,[13][14] linked line by line to transcriptions in the full-text database. Many of these transcriptions were created using Optical Character Recognition,[15] and they are edited and maintained using an online crowd-sourcing wiki system.[16][17] Textual data and metadata can also be exported using an Application Programming Interface, allowing integration with other online tools as well as use in text mining and digital humanities projects.[16][18]
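Search functionality adapted to Chinese texts generally has to treat variant forms of a character as equivalent. The following is a minimal sketch of that idea, not the site's actual implementation: the `VARIANTS` table and the function names `normalize` and `find_all` are hypothetical, and the real normalization rules are those described in the site's own documentation.

```python
# Hypothetical variant table for illustration only; it maps a variant
# character to a single canonical form. The site's real rules may differ.
VARIANTS = {
    "説": "說",
    "爲": "為",
    "眞": "真",
}


def normalize(text):
    """Replace each known variant character with its canonical form."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def find_all(query, passage):
    """Return the start offsets of every match of query in passage,
    comparing normalized forms.

    The mapping is one character to one character, so offsets in the
    normalized passage are also valid offsets in the original passage.
    """
    normalized_query = normalize(query)
    normalized_passage = normalize(passage)
    if not normalized_query:
        return []
    hits = []
    start = normalized_passage.find(normalized_query)
    while start != -1:
        hits.append(start)
        start = normalized_passage.find(normalized_query, start + 1)
    return hits


if __name__ == "__main__":
    # The passage uses the variant 説; the query uses 說.
    print(find_all("不亦說乎", "學而時習之,不亦説乎"))
```

A one-to-one character table keeps match offsets aligned with the original text, which matters when search hits have to be highlighted in a transcription or linked back to a scanned page.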

Functions

Dictionary page for non-Unicode character

A number of functions are integrated directly into the system itself, with many more accessible through external plugins and APIs. Core functionality includes an integrated dictionary, which summarizes available information about words and characters from knowledge encoded throughout the system itself, such as citations from historical dictionaries, phonetic annotations from various historical sources, and attested usage of the term, together with translations (where available). The dictionary also allows lookup of non-Unicode characters where these have been attested in specific locations within the database.

Comparison of typical error rates on a page of pre-modern Chinese text

Specially adapted optical character recognition, developed for the project and achieving greatly reduced error rates compared with alternative methods, is used extensively within the system to provide transcriptions of many texts and editions not previously available in digital form.[19] Transcriptions created through optical character recognition are used to enable full-text search of scanned images of early editions, including those provided by university libraries and other large-scale scanning projects such as the Harvard Yenching Chinese Rare Books Digitization Project. Users of the system collaboratively edit the resulting transcriptions to correct OCR errors as well as to add modern punctuation and other annotations to the texts.
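OCR error rates of the kind compared above are commonly reported as a character error rate: the edit distance between the OCR output and a corrected reference transcription, divided by the length of the reference. The sketch below illustrates that general measure only; it is an assumption that a comparable metric is meant here, and the function names are chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    reference = "學而時習之不亦說乎"   # corrected transcription
    ocr_output = "學而時習之不亦悅乎"  # one character misrecognized
    print(character_error_rate(reference, ocr_output))
```

Because each Chinese character is a single string element in Python, the same function applies unchanged to pre-modern Chinese text with no word segmentation step.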

Besides enabling close integration with external tools and projects, the CTP API and plugin system provide a powerful means of programmatic access to textual data for use in text mining research and digital humanities teaching. External tools such as Text Tools facilitate browser-based analyses of word usage, text reuse, document similarity, and other aspects of texts contained in the system, as well as interactive visualization of results. A Python module interfacing with the same API allows for more specialized data mining research.
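The programmatic access described above can be sketched roughly as follows. This is a hedged illustration rather than a description of the documented interface: it assumes a JSON endpoint at `https://api.ctext.org/gettext` that takes a `urn` parameter and returns an object with `title` and `fulltext` fields, and the sample response is hard-coded so that the example runs without network access. The endpoint, parameter, and field names should be checked against the CTP API documentation before use.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the API; verify against the official documentation.
API_BASE = "https://api.ctext.org"


def build_gettext_url(urn):
    """Build a request URL for the text identified by urn (assumed endpoint)."""
    return f"{API_BASE}/gettext?{urlencode({'urn': urn})}"


def extract_paragraphs(response_text):
    """Parse a JSON response and return (title, list of paragraphs).

    Assumes the response has a 'title' string and a 'fulltext' list;
    missing fields fall back to empty values.
    """
    data = json.loads(response_text)
    return data.get("title", ""), list(data.get("fulltext", []))


# Illustrative response in the assumed shape, not real API output.
SAMPLE_RESPONSE = json.dumps(
    {
        "title": "學而",
        "fulltext": [
            "子曰:「學而時習之,不亦說乎?」",
            "有子曰:「其為人也孝弟,而好犯上者,鮮矣。」",
        ],
    },
    ensure_ascii=False,
)

if __name__ == "__main__":
    print(build_gettext_url("ctp:analects/xue-er"))
    title, paragraphs = extract_paragraphs(SAMPLE_RESPONSE)
    print(title, len(paragraphs))
```

Separating URL construction from response parsing keeps the parsing step testable offline; a real client would fetch the URL with an HTTP library, or use the Python module mentioned above instead.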

References

  1. Elman, Benjamin A. Classical Historiography for Chinese History: Databases & electronic texts. Princeton University. [June 3, 2016]. 
  2. Association of Chinese Philosophers in North America (北美中国哲学学者协会)
  3. Chris Fraser, Department of Philosophy, University of Hong Kong
  4. http://ctext.org/system-statistics
  5. Connolly, Tim. Learning Chinese Philosophy with Commentaries. Teaching Philosophy (Philosophy Documentation Center). 2012, 35 (1): 1–18 [March 19, 2017]. 
  6. http://ctext.org/instructions/advanced-search
  7. http://ctext.org/faq/normalization
  8. Sturgeon, Donald. Unsupervised identification of text reuse in early Chinese literature. Digital Scholarship in the Humanities (Oxford University Press). 2017 [November 21, 2017]. 
  9. Xu, Jiajin. Corpus-based Chinese studies: A historical review from the 1920s to the present. Chinese Language & Discourse (John Benjamins Publishing Company). 2015, 6 (2): 218–244 [June 3, 2016]. 
  10. Adkins, Martha A. Web Review: Online Resources for the Study of Chinese Religion and Philosophy. Theological Librarianship (American Theological Library Association). 2016, 9 (2): 5–8 [November 7, 2016]. 
  11. Holger Schneider and Jeff Tharsen, http://dissertationreviews.org/archives/9213
  12. http://ctext.org/introduction
  13. http://ctext.org/library.pl?if=en
  14. http://ctext.org/system-statistics
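  15. Sturgeon, Donald. Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. The Thirtieth International Flairs Conference. AAAI. 2017 [November 21, 2017].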
  16. 16.0 16.1 https://cpianalysis.org/2016/06/08/crowdsourcing-apis-and-a-digital-library-of-chinese/, China Policy Institute, University of Nottingham
  17. http://ctext.org/instructions/ocr
  18. http://ctext.org/tools/api
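  19. Sturgeon, Donald. Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. The Thirtieth International Flairs Conference. AAAI. 2017 [November 21, 2017].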

External links

zh:中國哲學書電子化計劃