PanGu_sample_dataset

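This dataset contains 8 files in total: encyclopedia (baike.txt), e-books 1 (books1.txt), e-books 2 (books2.txt), Common Crawl (common_crawl_2019.txt), news data (data-news.txt), open dataset 1 (openData-1.txt), open dataset 2 (openData-2.txt), and Sogou-T (Sogou-T.txt). Together they make up the training corpus for the Pengcheng PanGu series of models. All of the corpora were assessed with a data quality evaluation method that combines manual and model-based assessment.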
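
As a rough illustration of the file layout described above, the following sketch checks whether a local copy of the dataset contains all eight files. The file names come from the description; the directory argument and the helper name `missing_files` are assumptions made for this example, not part of the dataset itself.

```python
from pathlib import Path

# File names as listed in the dataset description.
PANGU_SAMPLE_FILES = [
    "baike.txt",
    "books1.txt",
    "books2.txt",
    "common_crawl_2019.txt",
    "data-news.txt",
    "openData-1.txt",
    "openData-2.txt",
    "Sogou-T.txt",
]


def missing_files(dataset_dir):
    """Return the expected file names that are not present in dataset_dir.

    dataset_dir is a hypothetical local path to the unzipped dataset.
    """
    root = Path(dataset_dir)
    return [name for name in PANGU_SAMPLE_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/unzipped/dataset")` returns an empty list when all eight files are present.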
natural language processing, language modeling
* Please back up or migrate the files listed as "Non-upgradable" in the "Upgrade status" column on this page in a timely manner. These files will be cleaned up after this page is taken offline.
Size: 20 MiB
Upload Time: 2022-10-11 18:42:44
Upgrade status: Upgraded
Unzip status: Unzip succeeded
Downloads: 21