China Biographical Database

From Digital Sinology

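The China Biographical Database is an online relational database whose long-term goal is to systematically include all significant biographical material from Chinese history and to make its contents freely available, without restriction, for academic use. As of August 2017 the database held biographical information on approximately 417,000 individuals, primarily from the seventh through the nineteenth centuries, and it is currently working to add more biographical material on Tang and Ming-Qing figures. Besides serving as a biographical reference, the database is intended to support statistical and spatial analysis.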

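The China Biographical Database (CBDB) is a relational database of Chinese historical figures from the 7th to the 19th century. As of August 2017, CBDB contained biographical information on more than 417,000 individuals, including names, birth and death years, native place, entry into office, office postings, kinship, and social associations.[1]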

Project History

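CBDB began with the work of the social historian Robert M. Hartwell (郝若贝, 1932-1996).[2] Hartwell was the first to use a relational database to study the social and family networks of Song dynasty officials. Recognizing that the field lacked large datasets for studying the social history of middle-period China, he took the first steps in collecting data and, through data analysis, sought meaningful answers to questions about change in Chinese history. Hartwell structured the data around fields such as personal names, place names, the bureaucratic system, kinship, and social associations.

On his death, Hartwell bequeathed the data and the associated software to the Harvard-Yenching Institute. At that time the data covered more than 25,000 historical figures and 4,500 bibliographic entries, together with the results of his work on historical geographic information systems. The Harvard-Yenching Institute subsequently lost interest in the data, so beginning in 2005 Professor Peter K. Bol (包弼德) of Harvard University set about publicly releasing and expanding Hartwell's work. Michael A. Fuller (傅君劢), professor of Chinese literature at the University of California, Irvine, took charge of redesigning the software. Under the direction of Professor Deng Xiaonan (邓小南), graduate students at the Center for Research on Ancient Chinese History at Peking University revised and checked the data. Professor Lau Nap-yin (柳立言) of the Institute of History and Philology, Academia Sinica, provided digitized materials to the project. Thanks to the participation and joint efforts of several database projects, CBDB has expanded greatly in both its chronological coverage and its types of data. CBDB is currently jointly owned by the Fairbank Center for Chinese Studies at Harvard University, the Institute of History and Philology at Academia Sinica, and the Center for Research on Ancient Chinese History at Peking University. For more on the project's history, funders, and contributors, see the CBDB project website.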

Sources

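CBDB draws extensively on biographical texts for information about individuals. These include biographical indexes compiled by scholars, the biography sections of the dynastic histories, funerary texts and epitaphs, local gazetteers, selected material from individual literary collections, and a large body of official records.

CBDB is a long-term, open-ended project. It has so far collected biographical data from the following sources: 《宋人传记资料索引》 (Index to Biographical Materials of Song Figures), 《元人传记资料索引》 (Index to Biographical Materials of Yuan Figures), 《明人传记资料索引》 (Index to Biographical Materials of Ming Figures), 《清代人物生卒年表》 (Table of Birth and Death Dates of Qing Figures), 《宋代郡守年表》 (Chronological Tables of Song Prefects), jinshi records for the Song (the years 1148 and 1256 only), Ming, and Qing dynasties, and kinship data for Ming jinshi candidates, among others. In 2018, project members were collecting data on individuals from the major Tang historical sources and indexes.

CBDB also collaborates and exchanges data with other databases. Its collaborators include the Ming Qing Women's Writings project (明清妇女著作), the Name Authority biographical database (人名权威–人物传记资料查询), and Kyoto University's Pers-DB Knowledge Base of Tang Persons (唐代人物知识库).[3] The project is currently collecting data on office holders systematically from local gazetteers and from the jinshen lu (缙绅录).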

Limitations and Strengths

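CBDB uses data mining techniques to extract data from large bodies of source material. Because CBDB first selects texts that are relatively systematic and formulaic, the mining proceeds systematically. This means that, although CBDB's collaborators may manually enter detailed information on certain historical figures, the CBDB team itself does not research individual figures in depth. The main aim of data mining is to extract data accurately from the sources and code it, not to verify the accuracy of the sources themselves (which is typically the work of humanities researchers). As a result, errors and contradictory information from different sources are sometimes retained in the data as found. Although CBDB distinguishes a person's main biographical sources from information about that person mentioned in other people's biographies, it does not privilege any one source as more reliable than all others. CBDB therefore preserves the historical record as it has accumulated over time, and the earlier the period, the fewer the sources. CBDB's person data currently comes mainly from the seventh through the twentieth centuries (from the Tang to the Qing dynasty). These data are a sample of the historical record, not the whole of it. For example, epitaphs are an important source for kinship, but only a few tens of thousands of them survive. Similarly, the literary collections of only some individuals have come down to us, and the project team will mine and process these systematically over time. Because of the nature of the surviving sources, CBDB's data concerns officials more than other kinds of people.[4]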

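Although users can look up information on an individual in CBDB, it is more than a biographical dictionary. It is in fact a large and growing dataset about persons, covering names, occupations, modes of entry into office, kinship, social associations, and writings. By querying these data we can analyze broad trends across time and space. When large volumes of data are analyzed, neither isolated errors in the sources nor the small number of errors introduced in coding will have much effect on the conclusions. A relational database gives users powerful means of querying and of setting query parameters, which a biographical dictionary cannot provide.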
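The kind of aggregate question a relational database can answer, and a name dictionary cannot, might be sketched as follows. The table name `person`, its columns, and the sample rows are hypothetical stand-ins, not the actual CBDB schema or data; the sketch uses an in-memory SQLite database purely for illustration.

```python
import sqlite3

# Hypothetical, simplified schema and made-up rows; not the real CBDB tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, "
    "index_year INTEGER, entry_mode TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        (1, "Person A", 1050, "examination"),
        (2, "Person B", 1080, "yin privilege"),
        (3, "Person C", 1120, "examination"),
        (4, "Person D", 1190, "examination"),
        (5, "Person E", 1195, "recommendation"),
    ],
)


def entry_modes_by_century(connection):
    """Count persons per (century start, mode of entry into office)."""
    rows = connection.execute(
        "SELECT (index_year / 100) * 100 AS century, entry_mode, COUNT(*) "
        "FROM person GROUP BY century, entry_mode "
        "ORDER BY century, entry_mode"
    )
    return rows.fetchall()


for century, mode, count in entry_modes_by_century(conn):
    print(century, mode, count)
```

Changing the grouping columns or adding a `WHERE` clause is what "setting query parameters" amounts to in practice: the same data can be regrouped by period, place, or mode of entry without re-reading the sources.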

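In the long run, CBDB aims to mine the extant Chinese historical sources comprehensively and to reflect the biographical data they contain with increasing accuracy.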

CBDB Contents

[Figure: CBDB 2017 Data by Period]

The figure on the right shows persons in CBDB distributed across dynastic periods as of January 2018. The variation across dynastic periods has much to do with the sources used. For example, the high number of persons for the Ming period is the result of mining the nearly complete record of Ming jinshi degree holders, which includes not only the names of M (mother), F (father), FF, and FFF, but also the names of B+ (older brother) and B- (younger brother).

By rule, CBDB assigns a person to a single dynastic period based on their date of death, although much of their career may have taken place during the previous dynasty. The date of death is lacking for a majority of figures. In these cases we rely on the index year. The index year is a heuristic that represents the surmised year in which a person was in the sixtieth year of life (60 sui in Chinese terms, or 59 years old in Western terms), or the year of death if the person died before reaching 60. The index year is estimated using a variety of rules based on averages of all CBDB data. For example, on average men obtain the jinshi degree in their thirtieth year, a wife is two and a half years younger than her husband, the first surviving son is born in his father's thirtieth year, and so on. Thus, if one date is certain within a family, index years can be estimated for other family members. Generally this works well, but if it is extended across more than two generations up or down, the reliability of the index year decreases greatly. The index year is essential for queries with temporal parameters.
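The index-year rules described above can be illustrated with a minimal sketch. It covers only the rules stated in the text (sixtieth year of life, death before 60, jinshi in the thirtieth year, first surviving son born in the father's thirtieth year); the function names are hypothetical, and the actual CBDB procedure uses more rules than are shown here.

```python
from typing import Optional

# Offsets follow the text: the sixtieth year of life (60 sui) is
# birth year + 59, and the thirtieth year (30 sui) is birth year + 29.
SIXTIETH_YEAR_OFFSET = 59
THIRTIETH_YEAR_OFFSET = 29


def index_year_from_birth(birth_year: int, death_year: Optional[int] = None) -> int:
    """Index year when the birth year is known.

    Returns the year of death if the person died before the sixtieth
    year of life, otherwise the year of the sixtieth year of life.
    """
    sixtieth = birth_year + SIXTIETH_YEAR_OFFSET
    if death_year is not None and death_year < sixtieth:
        return death_year
    return sixtieth


def index_year_from_jinshi(jinshi_year: int) -> int:
    """Estimate from the average rule: jinshi degree in the thirtieth year."""
    estimated_birth = jinshi_year - THIRTIETH_YEAR_OFFSET
    return index_year_from_birth(estimated_birth)


def index_year_of_first_son(father_index_year: int) -> int:
    """Estimate from the average rule: first surviving son born in the
    father's thirtieth year. Assumes the father's index year reflects his
    sixtieth year of life rather than an early death.
    """
    return father_index_year + THIRTIETH_YEAR_OFFSET


print(index_year_from_birth(1310, 1381))
print(index_year_from_jinshi(1148))
```

Chaining `index_year_of_first_son` across several generations compounds the averaging error, which is the loss of reliability the text warns about.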

[Figure: CBDB 2017 Data by table]

CBDB collects many kinds of data on individuals; the number of data points by category is given in the figure on the left. For each category there is a code table in the database. The main biographical data table assigns each person a unique ID that can be used in the various data tables. CBDB codes 235 kinds of Social Associations, which are further categorized by type, the main ones being Writings, Politics, and Scholarship. There are 20 Biographical Address codes, including place of birth, death, and burial; basic affiliation (jiguan 籍贯); ancestral address; membership in the Eight Banner system of the Qing dynasty; former address; etc. The seventeen Alternate Name codes include courtesy name (zi 字), studio names, posthumous name, dharma name, birth order name, childhood names, etc. Every possible kinship relationship in the sources is coded. However, the goal is to reduce these relations to the shortest distance (e.g. F-S(on), H(usband)-W(ife)) and rely on computation to generate family trees on demand. Entry into office codes a wide variety of modes of entry, including many types of examination, recommendation, yin privilege, purchase, etc. Office postings include all office titles and ranks in a dynasty, which in turn can be accessed through a hierarchical tree (allowing one to query all holders of positions within a part of the bureaucratic structure), and places of service for local officials. Social distinction is used in particular to identify the reputation of persons irrespective of office (e.g. poet, artist, monk, merchant). Texts include the titles of both extant and lost works of a person; when possible the bibliographic class is included.
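The idea of storing only the shortest kinship relations and computing longer ones on demand can be sketched as follows. The person IDs and the `FATHER_OF` mapping are hypothetical illustration data, and the sketch handles only the paternal line; it is not CBDB's own implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical person IDs. Only the shortest relation is stored:
# each key is a person, each value is that person's father.
FATHER_OF: Dict[int, int] = {1: 2, 2: 3, 3: 4}


def paternal_line(
    person_id: int, father_of: Dict[int, int], max_depth: int = 3
) -> List[Tuple[str, int]]:
    """Derive F, FF, FFF, ... by chaining stored father links."""
    line: List[Tuple[str, int]] = []
    current = person_id
    for depth in range(1, max_depth + 1):
        if current not in father_of:
            break  # no recorded father; the chain ends here
        current = father_of[current]
        line.append(("F" * depth, current))
    return line


print(paternal_line(1, FATHER_OF))
```

Storing only F-S and H-W links keeps the data free of redundant, possibly contradictory entries; a relation such as FFF is then a matter of following three stored links rather than a separately coded fact.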

Visualizations

CBDB serves as a data resource for prosopographical research.[5] The data can be queried and then copied into a tool for statistical analysis and visualization.
[Figure: CBDB median age of death, all persons]
[Figure: CBDB median age of death, women]
This is illustrated by the two figures contrasting median age of death for all persons in CBDB with the median age of death for CBDB women. The difference, obscured when gender is not differentiated, is due to the higher mortality of women in child-bearing ages. About ten percent of CBDB persons are women.
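A comparison of this kind can be reproduced on exported query results with a few lines of code. The sketch below uses made-up records, not CBDB data, and the tuple layout `(birth_year, death_year, is_female)` is an assumption chosen for illustration.

```python
from statistics import median

# Hypothetical records: (birth_year, death_year, is_female). Not real CBDB data.
records = [
    (1000, 1062, False),
    (1010, 1080, False),
    (1020, 1075, False),
    (1005, 1035, True),
    (1015, 1083, True),
]


def median_age_at_death(rows, female=None):
    """Median of (death year - birth year); female=None means all persons."""
    ages = [
        death - birth
        for birth, death, is_female in rows
        if female is None or is_female == female
    ]
    return median(ages)


print(median_age_at_death(records))               # all persons
print(median_age_at_death(records, female=True))  # women only
```

Because women make up only a small share of the records, their pattern is easily masked in the pooled median, which is why the two groups need to be computed separately.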

[Figure: Song Lian's (1310-81) literary and scholarly network]
[Figure: Song Lian's social network mapped]
The results of querying CBDB data can be exported for use in two other kinds of analytic visualization: geographic information systems and social network analysis. To illustrate the value of both, consider three visualizations of the same dataset. Song Lian 宋濂 (1310-81) has 452 social associations in CBDB. The visualization of his social network includes only his literary associations (e.g. exchanging letters, writing inscriptions for someone) and scholarly associations (e.g. student-teacher relationships), filtered to keep only those persons who have at least one connection to Song Lian and at least one connection to another person in the network. Pajek, free network analysis software, was used for the graph on the left. Gephi was used for the graph in the center; in this case colors identify subgroups within the network. CBDB also supports export to UCINET.

Mapping Song's associations shows that his network is national in reach but overwhelmingly local. This visualization used Quantum GIS, freeware for PCs and Macs, along with prefectural boundaries, rivers, and a digital elevation model freely available from the China Historical GIS (CHGIS) v. 6. CBDB can also export in KML to Google Earth.

CBDB makes it possible to generate data from both simple and complex queries. One could find, for example, all those who came from a certain place and then discover the social and kinship connections among all those who entered government through the civil service examination from that place within a certain span of years.
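The filtering rule described for Song Lian's network (keep only persons connected both to the central figure and to at least one other person in the network) can be sketched as below. The edge list and names are hypothetical, and applying the rule repeatedly until nothing changes is one possible reading of the description, not necessarily how the published graphs were prepared.

```python
from typing import Dict, Iterable, Set, Tuple


def filter_ego_network(ego: str, edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return persons linked to `ego` and to at least one other kept person."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:  # treat associations as undirected
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Start from everyone directly connected to the central figure.
    kept = set(neighbors.get(ego, set()))
    kept.discard(ego)

    # Repeatedly drop persons with no tie to another kept person.
    changed = True
    while changed:
        changed = False
        for person in list(kept):
            others = (neighbors[person] & kept) - {person}
            if not others:
                kept.remove(person)
                changed = True
    return kept


# Hypothetical associations: C knows only the ego (and D, who is outside
# the ego's circle), so C is filtered out.
edges = [("ego", "A"), ("ego", "B"), ("ego", "C"), ("A", "B"), ("C", "D")]
print(sorted(filter_ego_network("ego", edges)))
```

The surviving node set can then be written out as an edge list for tools such as Pajek or Gephi.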
[Figure: The Spatial Distribution of Persons in CBDB]


References

  1. China Biographical Database Project (CBDB). Projects.iq.harvard.edu. 2016-11-07 [2016-12-11]. 
  2. Smith, Paul J. "Obituary: Robert M. Hartwell (1932-1996)". Journal of Song Yuan Studies 27. 1997. 
  3. Reviews of Internet resources for Asian Studies. Resource: China Biographical Database Project (CBDB) [New Release] (Jan 2011, Vol. 18, No. 1, 320). The Asian Studies WWW Monitor. 
  4. New Approaches in Chinese Digital Humanities - CBDB and Digging into Data Workshop. Peking University. Office of International Relations. 2016-01-11. 
  5. Gerritsen, Anne. Prosopography and its Potential for Middle Period Research (Workshop on the Prosopography of Middle Period China: Using the China Biographical Database). Journal of Song-Yuan Studies. 2008, 38: 161–201. 

Further reading

  • Peter K. Bol, Chao-Lin Liu, and Hongsu Wang, Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach[1]
  • Peter K. Bol, "The Late Robert M. Hartwell 'Chinese Historical Studies, Ltd.' Software Project," 1999[2]
  • Anne Gerritsen, "Using the CBDB for the study of women and gender? Some of the pitfalls" December 2007[3]
  • Fuller, Michael A. "The China Biographical Database User's Guide," February 28, 2015[4]
  • "Online Guide to Querying and Reporting System," Academia Sinica[5]


External links

  • Mining and Discovering Biographical Information in Difangzhi with a Language-Model-based Approach (PDF). arXiv.org. [2016-12-11]. 
  • Peter Bol. The Late Robert M. Hartwell "Chinese Historical Studies, Ltd." Software Project (PDF). Pnclink.org. [2016-12-11]. 
  • Anne Gerritsen. Using the CBDB for the study of women and gender? Some of the pitfalls (PDF). Humanities.uci.edu. [2016-12-11]. 
  • Michael A. Fuller. The China Biographical Database : User's Guide (PDF). Projects.iq.harvard.edu. February 28, 2015 [2016-12-11]. 
  • CBDB Querying and Reporting System - Online Help. Db1.ihp.sinica.edu.tw. [2016-12-11]. 