
    Language tool helps decipher ancient texts

    By YANG YANG | China Daily | Updated: 2023-12-23 00:00

    Xunzi, a pioneering large language tool designed specifically for the processing and study of ancient texts, was launched earlier this month by Professor Wang Dongbo and his research team from the College of Information Management at Nanjing Agricultural University.

    Xunzi, the first intelligent tool of its kind in China, features a vast corpus of more than 2 billion words from ancient texts, including the Siku Quanshu (The Complete Library in the Four Branches of Literature).

    Xunzi, a language model that can understand natural language, translate automatically, generate poems, and index texts automatically, has been open-sourced on platforms such as GitHub and ModelScope.

    The research team named the language model after Xun Zi, an ancient Chinese philosopher and master of prose from the Warring States Period (475-221 BC).

    During its research, the team found that he was not only a great philosopher, but also a pioneer in linguistics.

    Nowadays, readers often find it difficult to understand ancient texts due to challenges such as complex traditional Chinese characters, vertical layout, and the absence of punctuation marks.

    Against this backdrop, the launch of Xunzi makes it possible to engage with ancient texts in the era of smart media, Wang says.

    In a demonstration, Wang instructed the model to generate a five-character quatrain on the theme of Jinling (the ancient name of Nanjing, in East China's Jiangsu province). The system promptly produced a well-written original quatrain.

    Xunzi can also easily handle challenging tasks involving ancient texts, such as reading comprehension, punctuation marking, and translation into modern Chinese.

    Experts in ancient Chinese studies can leverage Xunzi for tasks like analyzing word structure, recognizing linguistic entities, and classifying and summarizing ancient texts.

    The model can complete all the tasks thanks to high-performance computing facilities provided by Nanjing Agricultural University and a substantial corpus of annotated and refined data accumulated over a long time, Wang says.

    "Our team has fed the model with a massive 4 billion-word mixed corpus," he says.

    Many factors can influence the building of a language model, such as computing power or application scenarios, but it essentially relies on the precise, high-quality data fed to it, Wang says. Since 2013, his research team has been focusing on painstaking manual data annotation to establish a solid foundation for Xunzi.

    Wang takes the essay In Praise of Yueyang Tower by Fan Zhongyan, a politician and writer from the Song Dynasty (960-1279), as an example.

    "To train the machine to mark all the adjective words in this ancient essay, we need to first train people to do the work, and afterward let the machine learn the marked text," he says.
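
The annotate-first, learn-second workflow Wang describes can be illustrated with a toy sketch. This is not Xunzi's actual training method: the tiny annotated sample, the tag names `ADJ` and `O`, and the helper functions `train` and `mark_adjectives` are all assumptions made up for illustration. The "machine" here simply memorizes the most frequent human-assigned tag for each character and reuses it on new text.

```python
from collections import Counter, defaultdict

# Hypothetical hand-annotated sample: (character, tag) pairs.
# "ADJ" marks an adjective, "O" marks everything else.
# The tags are illustrative only, not the research team's data.
ANNOTATED = [
    [("春", "O"), ("和", "ADJ"), ("景", "O"), ("明", "ADJ")],
    [("波", "O"), ("澜", "O"), ("不", "O"), ("惊", "O")],
]


def train(sentences):
    """Learn, for each character, the tag human annotators used most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def mark_adjectives(model, tokens):
    """Return the tokens the learned model marks as adjectives.

    Characters never seen in the annotated data default to "O".
    """
    return [token for token in tokens if model.get(token, "O") == "ADJ"]


model = train(ANNOTATED)
print(mark_adjectives(model, list("景明春和")))
```

A real system would replace the lookup table with a trained neural model, but the dependency is the same: the quality of the machine's output is bounded by the quality of the human-marked text it learns from.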

    Wang says the research is expected to benefit both the training of interdisciplinary talent in related fields and everyday users of ancient texts. The ultimate goal is to engage a broader audience with ancient texts, promoting innovation in traditional Chinese culture.

    While enabling general users to smoothly use ancient text content and advancing the organization and digitalization of ancient texts, Xunzi is poised for extensive applications in AI writing and teaching, digital entertainment, and various other domains.

     

     

     
