Difference between revisions of “中國哲學書電子化計劃” (Chinese Text Project)

From Digital Sinology
 
(3 intermediate revisions by one user not shown)
Line 1: Line 1:
 
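The '''Chinese Text Project''' (中國哲學書電子化計劃, CTP) is an open-source online [[電子圖書館|digital library]] project providing functionality for transcribing, reading, and searching [[中國經典和古籍|Chinese classics and pre-modern texts]]. Its aim is to provide accessible and accurate versions of texts<ref>{{cite web | url = https://www.princeton.edu/chinese-historiography/electronic-resources/databases-electronic-text/ | title = Classical Historiography for Chinese History: Databases & electronic texts | last = Elman | first = Benjamin A. | publisher = Princeton University | access-date = June 3, 2016}}</ref>, especially those relating to Chinese philosophy. The site has been described as the largest and most accurate online collection of classical Chinese texts,<ref>[http://www.acpa-net.org/scholarship.html Association of Chinese Philosophers in North America (北美中国哲学学者协会)]</ref><ref>[http://cjfraser.net/links/#etext Chris Fraser], Department of Philosophy, University of Hong Kong</ref> and as one of the most useful textual databases for the study of early Chinese texts.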
 
  
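 Through the integration of its many features, together with external tools built on its API, the site lets users carry out a range of digital analyses of pre-modern texts. Via the [https://ctext.org/tools/api CTP API] and the site's [https://ctext.org/tools/plugins plugin system], the '''Chinese Text Project''' interoperates with other digital projects such as [[Text Tools]], [[TextRef]], and [[MARKUS]].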
+
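 Through the integration of its many features, together with external tools built on its API, the site lets users carry out a range of digital analyses of pre-modern texts. Via the [https://ctext.org/tools/api CTP API] and the site's [https://ctext.org/tools/plugins plugin system], the '''Chinese Text Project''' interoperates with other digital projects such as [[Text Tools]], [[TextRef]], and [[碼庫思]] (MARKUS).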
  
 
==Site contents==
 
Line 11: Line 11:
 
[[File:Liezi-page.png|200px|thumb|right|Side-by-side image and transcription view]]
 
  
Texts are divided into pre-Qin and Han texts, and post-Han texts, with the former categorized by [[Hundred Schools of Thought|school of thought]] and the latter by [[Dynasties in Chinese history|dynasty]]. The ancient (pre-Qin and Han) section of the database contains over 5 million Chinese characters, the post-Han database over 20 million characters, and the publicly editable [[wiki]] section over 5 billion characters.<ref>http://ctext.org/system-statistics</ref> Many texts also have English and Chinese translations, which are paired with the original text paragraph by paragraph as well as phrase by phrase for ease of comparison; this makes it possible for the system to be used as a useful scholarly research tool even by students with little or no knowledge of Chinese.<ref>{{cite journal | last = Connolly | first = Tim | date = 2012 | title = Learning Chinese Philosophy with Commentaries | url = https://www.pdcnet.org/teachphil/content/teachphil_2012_0035_0001_0001_0018 | journal = Teaching Philosophy | publisher = Philosophy Documentation Center | volume = 35 | issue = 1 | pages = 1-18 | access-date= March 19, 2017}}</ref> 許多文獻在這個網站上都有多種版本,其錄文是根據不同的具體的版本,系統會一一記錄。
+
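Texts on the site are divided into pre-Qin and Han texts and post-Han texts. The former are categorized by [[諸子百家|school of thought]] and the latter by [[朝代|dynasty]]. The pre-Qin and Han section contains over 5 million characters, and the post-Han section over 20 million characters. The user-editable [[維基|wiki]] section contains over 500 million characters.<ref>http://ctext.org/system-statistics</ref> Many texts are also accompanied by English and modern Chinese translations, displayed paired with the original paragraph by paragraph or sentence by sentence for ease of comparison. This adds to the site's scholarly value, since it can be used even by users with little or no knowledge of Chinese.<ref>{{cite journal | last = Connolly | first = Tim | date = 2012 | title = Learning Chinese Philosophy with Commentaries | url = https://www.pdcnet.org/teachphil/content/teachphil_2012_0035_0001_0001_0018 | journal = Teaching Philosophy | publisher = Philosophy Documentation Center | volume = 35 | issue = 1 | pages = 1-18 | access-date= March 19, 2017}}</ref> Many texts are available on the site in multiple editions; each transcription is based on a specific edition, and the system records which one.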
  
 
 In addition to advanced search functions tailored to Chinese texts<ref>http://ctext.org/instructions/advanced-search</ref><ref>http://ctext.org/faq/normalization</ref>, the site offers a number of features designed for specialists, including an integrated dictionary, word lists, parallel passage information<ref>{{cite journal | last = Sturgeon | first = Donald | date = 2017 | title = Unsupervised identification of text reuse in early Chinese literature | url = https://dsturgeon.net/text-reuse-chinese-literature/ | journal = Digital Scholarship in the Humanities | publisher = Oxford University Press | access-date= November 21, 2017 }}</ref>, scans of source texts, concordance and index data<ref>{{cite journal | last = Xu | first = Jiajin | date = 2015 | title = Corpus-based Chinese studies: A historical review from the 1920s to the present | url = http://www.ingentaconnect.com/content/jbp/cld/2015/00000006/00000002/art00006 | journal = Chinese Language & Discourse | publisher = John Benjamins Publishing Company | volume = 6 | issue = 2 | pages = 218-244 | access-date= June 3, 2016 }}</ref>, metadata, display of commentary<ref>{{cite journal | last = Adkins | first = Martha A. | date = 2016 | title = Web Review: Online Resources for the Study of Chinese Religion and Philosophy | url = https://theolib.atla.com/theolib/article/view/435/1515 | journal = Theological Librarianship | publisher = American Theological Library Association | volume = 9 | issue = 2 | pages = 5-8 | access-date= November 7, 2016 }}</ref>, a database listing publicly available digital resources, and a discussion forum whose threads can be linked to any item of data on the site.<ref>Holger Schneider and Jeff Tharsen, http://dissertationreviews.org/archives/9213</ref><ref>http://ctext.org/introduction</ref> The "Library" section of the '''Chinese Text Project''' contains over 25 million pages of scanned pre-modern Chinese texts,<ref>http://ctext.org/library.pl?if=en</ref><ref>http://ctext.org/system-statistics</ref> linked line by line to the full-text database. Many of the transcriptions were generated by OCR<ref>{{cite conference | last = Sturgeon | first = Donald | date = 2017 | title = Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. | url = https://aaai.org/ocs/index.php/FLAIRS/FLAIRS17/paper/view/15490/15011 | conference = The Thirtieth International Flairs Conference | publisher = AAAI | access-date= November 21, 2017 }}</ref> and are edited and maintained through an online, crowdsourced wiki system.<ref name="cpi">https://cpianalysis.org/2016/06/08/crowdsourcing-apis-and-a-digital-library-of-chinese/, China Policy Institute, University of Nottingham</ref><ref>http://ctext.org/instructions/ocr</ref> The textual data and metadata can be exported through the API, which allows integration with other online tools and use in [[文本挖掘|text mining]] and [[數位人文|digital humanities]] projects.<ref name="cpi" /><ref>http://ctext.org/tools/api</ref>
 
Line 23: Line 23:
 
[[File:Ctext-ocr.png|200px|thumb|right|Comparison of typical error rates on a page of pre-modern Chinese text]]
 
  
Specially adapted optical character recognition developed for the project and achieving greatly reduced error rates compared with alternative methods is used extensively within the system to provide transcriptions of many texts and editions not previously available in digital form.<ref>{{cite conference | last = Sturgeon | first = Donald | date = 2017 | title = Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. | url = https://aaai.org/ocs/index.php/FLAIRS/FLAIRS17/paper/view/15490/15011 | conference = The Thirtieth International Flairs Conference | publisher = AAAI | access-date= November 21, 2017 }}</ref> Transcriptions created through optical character recognition are used to enable full-text search of scanned images of early editions, including those provided by university libraries and other large-scale scanning projects such as the Harvard Yenching Chinese Rare Books Digitization Project. Users of the system collaboratively edit the resulting transcriptions to correct OCR errors as well as add modern punctuation and other annotations to the texts.
+
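Using OCR designed specifically for the site, the system greatly reduces the error rate of transcriptions of pre-modern texts and provides transcriptions of many editions not previously available in digital form.<ref>{{cite conference | last = Sturgeon | first = Donald | date = 2017 | title = Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. | url = https://aaai.org/ocs/index.php/FLAIRS/FLAIRS17/paper/view/15490/15011 | conference = The Thirtieth International Flairs Conference | publisher = AAAI | access-date= November 21, 2017 }}</ref> These OCR-generated transcriptions make scanned images of early editions searchable, including material from the Harvard Yenching Chinese Rare Books Digitization Project and holdings provided by other university libraries. Users of the site collaboratively revise the transcriptions, correcting errors and adding punctuation and annotations.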
  
As well as providing a mechanism for close integration with external tools and projects, the [https://ctext.org/tools/api CTP API] and [https://ctext.org/tools/plugins plugin system] also provide a powerful means for programmatic access to textual data for use in text mining research and digital humanities teaching. External tools such as [[Text Tools]] facilitate browser-based analyses of word usage, text reuse, document similarity, and other aspects of texts contained in the system as well as interactive visualization of results. A [https://pypi.org/project/ctext/ Python module] interfacing with the same API allows for more specialized data mining research.
+
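As well as providing a mechanism for integration with external tools and other projects, the Chinese Text Project's [https://ctext.org/tools/api CTP API] and [https://ctext.org/tools/plugins plugin system] offer powerful functionality that makes the textual data usable in text mining research and digital humanities teaching. External tools such as [[Text Tools]] provide browser-based analysis of word usage, text reuse, and document similarity, along with interactive visualization of these and other aspects of the texts in the system. A [https://pypi.org/project/ctext/ Python package] that interfaces with the same API supports more specialized data mining work.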
  
 
==References==
 

Latest revision as of 08:52, 15 June 2018 (Fri)

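The Chinese Text Project (中國哲學書電子化計劃, CTP) is an open-source online digital library project providing functionality for transcribing, reading, and searching Chinese classics and pre-modern texts. Its aim is to provide accessible and accurate versions of texts[1], especially those relating to Chinese philosophy. The site has been described as the largest and most accurate online collection of classical Chinese texts,[2][3] and as one of the most useful textual databases for the study of early Chinese texts.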

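Through the integration of its many features, together with external tools built on its API, the site lets users carry out a range of digital analyses of pre-modern texts. Via the CTP API and the site's plugin system, the Chinese Text Project interoperates with other digital projects such as Text Tools, TextRef, and 碼庫思 (MARKUS).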

Site contents

Transcription view

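The site's core content consists of digital versions of collated editions and manuscripts of pre-modern Chinese texts, together with digital transcriptions generated from those editions. The images and the transcriptions are linked to each other. A side-by-side view containing both lets users consult the source image while reading the transcription, which supports efficient proofreading and transcription work. The same view lets crowdsourcing users correct and annotate the text, for example by fixing OCR-generated transcriptions or adding punctuation to unpunctuated text.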

Side-by-side image and transcription view

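Texts on the site are divided into pre-Qin and Han texts and post-Han texts. The former are categorized by school of thought and the latter by dynasty. The pre-Qin and Han section contains over 5 million characters, and the post-Han section over 20 million characters. The user-editable wiki section contains over 500 million characters.[4] Many texts are also accompanied by English and modern Chinese translations, displayed paired with the original paragraph by paragraph or sentence by sentence for ease of comparison. This adds to the site's scholarly value, since it can be used even by users with little or no knowledge of Chinese.[5] Many texts are available on the site in multiple editions; each transcription is based on a specific edition, and the system records which one.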

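In addition to advanced search functions tailored to Chinese texts[6][7], the site offers a number of features designed for specialists, including an integrated dictionary, word lists, parallel passage information[8], scans of source texts, concordance and index data[9], metadata, display of commentary[10], a database listing publicly available digital resources, and a discussion forum whose threads can be linked to any item of data on the site.[11][12] The "Library" section of the Chinese Text Project contains over 25 million pages of scanned pre-modern Chinese texts,[13][14] linked line by line to the full-text database. Many of the transcriptions were generated by OCR[15] and are edited and maintained through an online, crowdsourced wiki system.[16][17] The textual data and metadata can be exported through the API, which allows integration with other online tools and use in text mining and digital humanities projects.[16][18]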
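As a rough illustration of what exporting text through the API can look like, the sketch below builds a request URL and fetches the result as JSON. The base address `https://api.ctext.org`, the `gettext` endpoint name, and the example URN `ctp:analects/xue-er` are assumptions made for this sketch and should be checked against the API documentation at https://ctext.org/tools/api. The helper names `build_gettext_url` and `fetch_text` are hypothetical and are not part of any CTP library.

```python
import json
import urllib.parse
import urllib.request

# Assumed base address of the CTP API; verify against https://ctext.org/tools/api.
API_BASE = "https://api.ctext.org"


def build_gettext_url(urn, base=API_BASE):
    """Build the request URL for an assumed `gettext` call on one text URN."""
    query = urllib.parse.urlencode({"urn": urn})
    return f"{base}/gettext?{query}"


def fetch_text(urn, timeout=10):
    """Fetch one text as parsed JSON (hypothetical helper).

    Requires network access and is subject to whatever usage limits
    the site applies to API requests.
    """
    with urllib.request.urlopen(build_gettext_url(urn), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_text() to make a real request.
    print(build_gettext_url("ctp:analects/xue-er"))
```

Separating URL construction from the network call keeps the part that can be checked offline apart from the part that depends on the live service.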

Features

Dictionary page

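The system has many built-in features, and more can be added through plugins and the API. The basic tools include a dictionary that draws on data from the system to give information about a character or word, such as source citations recorded in dictionaries, phonological information recorded in various texts, historical usage, and translations where available. The dictionary also supports searching for characters outside Unicode.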

Comparison of typical error rates on a page of pre-modern Chinese text

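Using OCR designed specifically for the site, the system greatly reduces the error rate of transcriptions of pre-modern texts and provides transcriptions of many editions not previously available in digital form.[19] These OCR-generated transcriptions make scanned images of early editions searchable, including material from the Harvard Yenching Chinese Rare Books Digitization Project and holdings provided by other university libraries. Users of the site collaboratively revise the transcriptions, correcting errors and adding punctuation and annotations.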
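To make the notion of a transcription error rate concrete, the sketch below computes a character error rate as the edit distance between a trusted reference transcription and an OCR output, divided by the reference length. This is a common way to measure OCR quality and is given only as an illustration; the source does not say which metric the project itself uses, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in characters."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "學而時習之"   # trusted transcription
    ocr_output = "學而畤習之"  # OCR output with one misread character
    print(character_error_rate(reference, ocr_output))
```

Because pre-modern Chinese text is normally unsegmented, counting errors per character rather than per word avoids depending on a word segmenter.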

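As well as providing a mechanism for integration with external tools and other projects, the Chinese Text Project's CTP API and plugin system offer powerful functionality that makes the textual data usable in text mining research and digital humanities teaching. External tools such as Text Tools provide browser-based analysis of word usage, text reuse, and document similarity, along with interactive visualization of these and other aspects of the texts in the system. A Python package that interfaces with the same API supports more specialized data mining work.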
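One simple way to look for text reuse between two passages, once their text has been obtained, is to compare overlapping character n-grams. The sketch below is a minimal illustration under that assumption; it is not the method used by Text Tools or by the text reuse study cited above, and the function names are hypothetical. It assumes punctuation has already been stripped from both passages.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every overlapping run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def shared_ngrams(text_a, text_b, n=4):
    """Count the character n-grams that occur in both passages.

    The Counter intersection keeps, for each n-gram, the smaller of its
    two counts, so an n-gram missing from either passage is dropped.
    """
    counts_a = Counter(char_ngrams(text_a, n))
    counts_b = Counter(char_ngrams(text_b, n))
    return counts_a & counts_b


if __name__ == "__main__":
    passage_a = "學而時習之不亦說乎"
    passage_b = "子曰學而時習之"
    for ngram, count in shared_ngrams(passage_a, passage_b).items():
        print(ngram, count)
```

Working on characters rather than words suits unsegmented classical Chinese, and a larger `n` trades recall for fewer chance matches.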

References

  1. Elman, Benjamin A. Classical Historiography for Chinese History: Databases & electronic texts. Princeton University. [June 3, 2016]. 
  2. Association of Chinese Philosophers in North America (北美中国哲学学者协会)
  3. Chris Fraser, Department of Philosophy, University of Hong Kong
  4. http://ctext.org/system-statistics
  5. Connolly, Tim. Learning Chinese Philosophy with Commentaries. Teaching Philosophy (Philosophy Documentation Center). 2012, 35 (1): 1–18 [March 19, 2017]. 
  6. http://ctext.org/instructions/advanced-search
  7. http://ctext.org/faq/normalization
  8. Sturgeon, Donald. Unsupervised identification of text reuse in early Chinese literature. Digital Scholarship in the Humanities (Oxford University Press). 2017 [November 21, 2017]. 
  9. Xu, Jiajin. Corpus-based Chinese studies: A historical review from the 1920s to the present. Chinese Language & Discourse (John Benjamins Publishing Company). 2015, 6 (2): 218–244 [June 3, 2016]. 
  10. Adkins, Martha A. Web Review: Online Resources for the Study of Chinese Religion and Philosophy. Theological Librarianship (American Theological Library Association). 2016, 9 (2): 5–8 [November 7, 2016]. 
  11. Holger Schneider and Jeff Tharsen, http://dissertationreviews.org/archives/9213
  12. http://ctext.org/introduction
  13. http://ctext.org/library.pl?if=en
  14. http://ctext.org/system-statistics
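  15. Sturgeon, Donald. Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. The Thirtieth International Flairs Conference (AAAI). 2017 [November 21, 2017]. 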
  16. https://cpianalysis.org/2016/06/08/crowdsourcing-apis-and-a-digital-library-of-chinese/, China Policy Institute, University of Nottingham
  17. http://ctext.org/instructions/ocr
  18. http://ctext.org/tools/api
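  19. Sturgeon, Donald. Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR. The Thirtieth International Flairs Conference (AAAI). 2017 [November 21, 2017]. 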

Links

zh:中國哲學書電子化計劃