【Cofacts 真的假的】Open Datasets

To facilitate academic research and analysis of fact-checking on closed messaging platforms, Cofacts releases all instant messages and replies in its database into the public domain under the CC0 license. Anyone may freely distribute and use the dataset.

Format

All CSV files are UTF-8 encoded.

Fields across different entities

  • userIdsha (string) Hashed user identifier.
  • appId (string) Possible values:
    • LEGACY_APP: Articles collected before 2017-03.
    • RUMORS_LINE_BOT: Articles collected with the current LINE bot client after 2017-03.

Together, these two fields identify a unique user across the different CSV files. For instance, if one row (a reply) in replies.csv and another row (a feedback) in article_reply_feedbacks.csv have identical userIdsha and appId values, the reply and the feedback were submitted by the same user.
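The matching rule above can be sketched in Python with only the standard library. The sample rows are made up for illustration, and the column names follow the field names listed above; check the header row of the real files before relying on them.

```python
import csv
import io

# Made-up sample rows standing in for replies.csv and
# article_reply_feedbacks.csv; the real files contain more columns.
REPLIES_CSV = """id,userIdsha,appId
reply-1,abc123,RUMORS_LINE_BOT
reply-2,def456,LEGACY_APP
"""

FEEDBACKS_CSV = """articleId,replyId,userIdsha,appId,score
article-1,reply-2,abc123,RUMORS_LINE_BOT,1
article-1,reply-2,abc123,LEGACY_APP,-1
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def user_key(row):
    """A user is identified by the (userIdsha, appId) pair, not by either field alone."""
    return (row["userIdsha"], row["appId"])


replies = read_rows(REPLIES_CSV)
feedbacks = read_rows(FEEDBACKS_CSV)

# Users who authored at least one reply.
reply_authors = {user_key(row) for row in replies}

# Feedback rows submitted by someone who also authored a reply.
feedback_by_reply_authors = [row for row in feedbacks if user_key(row) in reply_authors]

for row in feedback_by_reply_authors:
    print(row["replyId"], row["score"])
```

In this sample, the second feedback row shares a userIdsha with a reply author but has a different appId, so it is treated as a different user.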

Fields

articles.csv

The instant messages that LINE bot users submitted to the database.

| Field | Data type | Description |
| --- | --- | --- |
| id | String | |
| references | Enum string | Where the message is from. Currently the only possible value is LINE. |
| userIdsha | String | Author of the article. |
| appId | String | |
| tags | Text | Reserved for category labels; currently empty. |
| normalArticleReplyCount | Integer | The number of replies associated with this article, excluding deleted reply associations. |
| text | Text | The instant message text. |
| hyperlinks | Text | Reserved; currently empty. |
| createdAt | ISO time string | When the article was submitted to the database. |
| updatedAt | ISO time string | Reserved; currently identical to createdAt. |
| lastRequestedAt | ISO time string | The submission time of the last reply_request sent for the article before the article was replied to. |

article_replies.csv

Articles and replies are in a has-and-belongs-to-many relationship. That is, an article can have multiple replies, and a reply can be connected to multiple similar articles.

article_replies is the "join table" between articles and replies, bringing articleId and replyId together, along with other useful properties related to this connection between an article and a reply.

Each (articleId, replyId) pair maps to exactly one article_reply.

| Field | Data type | Description |
| --- | --- | --- |
| articleId | String | Relates to the id field of articles. |
| replyId | String | Relates to the id field of replies. |
| userId | String | The user who connected the reply with the article. |
| negativeFeedbackCount | Integer | Number of article_reply_feedbacks that have a score of -1. |
| positiveFeedbackCount | Integer | Number of article_reply_feedbacks that have a score of 1. |
| replyType | Enum string | Duplicated from the type field of replies. |
| appId | String | |
| status | Enum string | NORMAL: The reply and article are connected. DELETED: The reply is no longer connected to the article. |
| createdAt | ISO time string | The time when the reply was connected to the article. |
| updatedAt | ISO time string | The most recent time the status was updated. |
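As a rough illustration of how the join table can be used, the sketch below pairs each article with its currently connected replies. The sample rows are invented and only include the columns needed here; it assumes the CSV headers match the field names in the tables above.

```python
import csv
import io

# Invented sample data; real files contain more columns and rows.
ARTICLES_CSV = """id,text
a1,First forwarded message
a2,Second forwarded message
"""

REPLIES_CSV = """id,type,text
r1,RUMOR,Reply explaining the rumor
"""

ARTICLE_REPLIES_CSV = """articleId,replyId,status
a1,r1,NORMAL
a2,r1,DELETED
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Index articles and replies by id so each join-table row can look them up.
articles_by_id = {row["id"]: row for row in read_rows(ARTICLES_CSV)}
replies_by_id = {row["id"]: row for row in read_rows(REPLIES_CSV)}

# Keep only connections that are still active (status NORMAL).
connected = [
    (articles_by_id[link["articleId"]], replies_by_id[link["replyId"]])
    for link in read_rows(ARTICLE_REPLIES_CSV)
    if link["status"] == "NORMAL"
]

for article, reply in connected:
    print(article["id"], reply["type"], reply["text"])
```

The same reply (r1) appears in two join rows here, which reflects the many-to-many relationship; only the NORMAL connection is kept.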

replies.csv

Editors' replies to articles.

| Field | Data type | Description |
| --- | --- | --- |
| id | String | |
| type | Enum string | Type of the reply chosen by the editor. RUMOR: The article contains a rumor. NON_RUMOR: The article contains facts. OPINIONATED: The article contains personal opinions. NOT_ARTICLE: The article should not be processed by Cofacts. |
| reference | Text | For RUMOR and NON_RUMOR replies: the references that support the chosen type and text. For OPINIONATED replies: references containing perspectives that differ from the article. For NOT_ARTICLE: empty string. |
| userId | String | The editor who authored this reply. |
| appId | String | |
| text | Text | Reply text written by the editor. |
| createdAt | ISO time string | When the reply was written. |

reply_requests.csv

Before an article is replied to, users may submit reply_requests to indicate that they want the article to be answered.

When an article is first submitted to the database, a reply request is also created. Any further query for the same article submits a new reply_request.

A user can submit only one reply request per article.

| Field | Data type | Description |
| --- | --- | --- |
| articleId | String | The target of the request. |
| createdAt | ISO time string | When the reply request was issued. |
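One possible use of this file is to estimate how many users asked about each article. A minimal sketch, assuming the two columns listed above; the rows and the exact timestamp format are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the timestamp format shown is only illustrative.
REPLY_REQUESTS_CSV = """articleId,createdAt
a1,2018-01-01T00:00:00.000Z
a1,2018-01-02T08:30:00.000Z
a2,2018-01-03T12:00:00.000Z
"""

# Each row is one reply request, so counting rows per articleId gives
# the number of requests each article received.
request_counts = Counter(
    row["articleId"] for row in csv.DictReader(io.StringIO(REPLY_REQUESTS_CSV))
)

for article_id, count in request_counts.most_common():
    print(article_id, count)
```

Because a user can submit only one reply request per article, the count per article also approximates the number of distinct users who requested a reply.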

article_reply_feedbacks.csv

Editors and LINE bot users can indicate whether a reply is useful by submitting article_reply_feedbacks toward an article_reply with a score of 1 or -1.

The feedback is submitted toward an article_reply, that is, the connection between an article and a reply, rather than toward the reply itself. This is because a reply can be connected to multiple articles, and a reply that makes sense for one article is not necessarily useful in answering another. Therefore, the feedback for a reply connected to different articles is counted separately for each article.

| Field | Data type | Description |
| --- | --- | --- |
| articleId | String | Relates to the articleId of the target article_reply. |
| replyId | String | Relates to the replyId of the target article_reply. |
| score | Integer | 1: Useful. -1: Not useful. |
| createdAt | ISO time string | When the feedback was submitted. |
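To illustrate why feedback is counted per article_reply rather than per reply, the sketch below tallies scores by (articleId, replyId) pair. The rows are invented, and the column names are assumed to match the table above.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows: the same reply (r1) is attached to two articles.
FEEDBACKS_CSV = """articleId,replyId,score,createdAt
a1,r1,1,2018-01-01T00:00:00.000Z
a1,r1,-1,2018-01-02T00:00:00.000Z
a1,r1,1,2018-01-03T00:00:00.000Z
a2,r1,-1,2018-01-04T00:00:00.000Z
"""

# One tally per (articleId, replyId) pair, i.e. per article_reply.
tallies = defaultdict(lambda: {"positive": 0, "negative": 0})

for row in csv.DictReader(io.StringIO(FEEDBACKS_CSV)):
    key = (row["articleId"], row["replyId"])
    score = int(row["score"])
    if score == 1:
        tallies[key]["positive"] += 1
    elif score == -1:
        tallies[key]["negative"] += 1

for (article_id, reply_id), counts in sorted(tallies.items()):
    print(article_id, reply_id, counts)
```

In this sample, reply r1 is rated mostly useful for article a1 but not useful for article a2, and the two tallies stay separate.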

License


To the extent possible under law, g0v Cofacts Project has waived all copyright and related or neighboring rights to Cofacts Dataset. This work is published from: Taiwan.


Cofacts project homepage