# Migrating from DMI-TCAT to 4CAT
## Why migrate?
[Twitter's API v1.1 will be deprecated on March 9, 2023](https://web.archive.org/web/20230109225630/https://twittercommunity.com/t/announcing-the-deprecation-of-v1-1-statuses-filter-endpoint/182960), and DMI-TCAT will not be adapted to the latest API version. [4CAT](https://4cat.nl) is now the recommended tool for conducting Twitter research, as it has a modern code base in Python, is actively maintained, and comes with many other features, such as retrieving and analyzing data from 4chan, BitChute, Reddit, Telegram, Tumblr, Instagram, TikTok, and LinkedIn.
## What does the migration entail?
- setting up and familiarizing yourself with 4CAT
- getting new credentials for Twitter API v2
- backing up existing DMI-TCAT databases
- moving data from DMI-TCAT to 4CAT
- setting up new Twitter collections with 4CAT
## Migrating
### Setting up and familiarizing yourself with 4CAT
- The website https://4cat.nl bundles all information, videos, and exercises related to 4CAT
- We have instructions for [installing 4CAT](https://github.com/digitalmethodsinitiative/4cat#installation)
- We have a list of [equivalent DMI-TCAT and 4CAT functionalities](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2)
- There is also a [playlist](https://www.youtube.com/playlist?list=PLWukutaRyIn31H0uPfkYlmbWvo83PnXXo) that contains a few short videos on how to install 4CAT via Docker, create a data set, and analyse it using processors
- We have a one-and-a-half-hour [video introducing the basic functionalities of 4CAT](https://www.youtube.com/watch?v=VRMWuJYOKHQ) and showing how it can be used for academic research
### Getting new credentials for Twitter API v2
Access to the [academic track of the Twitter API](https://blog.twitter.com/developer/en_us/topics/tools/2021/enabling-the-future-of-academic-research-with-the-twitter-api.html), which allows ‘full-archive search’ – searching the full archive of all tweets posted since the platform started – is only available by request. You can read more about the process [here](https://developer.twitter.com/en/solutions/academic-research/application-info). To request access, you can follow these steps:
1) Start the process by going to the relevant page in the developer portal. You need to be logged in to Twitter to start the process.
2) If you match the criteria listed on the page, click ‘Start Academic Research Application’.
3) You will be asked to fill in a series of questions about how you plan to use the API. It is recommended that you keep a copy of your answers to these questions in a separate document, since you will not be able to see them after submitting!
4) If you are a student requesting access for your MA thesis, ask your supervisor to add a statement to their university profile page confirming that you are their thesis student; a link to this page can then serve as proof that you indeed qualify for access.
5) After filling in the form, Twitter will manually vet your request. This process takes a few days, or sometimes up to one week. They may ask you to clarify some of your answers before granting access.
6) If you have been granted access, you will receive an e-mail saying so at the address you provided.
### Backing up existing DMI-TCAT databases
TBD
### Moving data from DMI-TCAT to 4CAT
There are three ways to use your DMI-TCAT data in 4CAT:
- Use the existing DMI-TCAT database and frontend and query it with 4CAT. 4CAT can interface directly with the DMI-TCAT frontend, allowing 4CAT users to access DMI-TCAT's data.
- Use the existing TCAT database and query it with 4CAT. This option is for users who wish to retain the DMI-TCAT database but analyze it using 4CAT. Instructions for doing so can be found [here](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2).
- Export data from DMI-TCAT into 4CAT. This option is for users who wish to migrate all data and functionality to 4CAT and completely abandon the DMI-TCAT code and database. (TBD)
### Gathering Twitter data with 4CAT
Data sets migrated from DMI-TCAT into 4CAT can no longer be added to. If you have a DMI-TCAT bin that you would like to keep capturing, you will need to start a new data set in 4CAT.
For "live" data collection, such as with DMI-TCAT's v1.1 track, follow, or 1% endpoints, you can use 4CAT's filtered stream endpoint. (TBD)
For archive searches, you can access the full Twitter archive in 4CAT via Twitter's academic track. To do so, log into your 4CAT instance, create a new data set, select "Twitter API (v2) search" as the data source, and choose either the academic or standard API track. A short worksheet outlining how to obtain and use the Twitter v2 API with 4CAT can be found [here](https://docs.google.com/document/d/17v6xX805AGFZDLiv1S35dziMc9LlmL8IOMr5OFc6YQM/edit).
Note that academic API access is generally considered easier to work with than v1.1 access, since you can query the archive after the fact rather than having to keep track of emerging issues as they happen. Refer to Pfeffer et al. ([2022](http://arxiv.org/abs/2204.02290)) for a description of how v2 academic access may work for you. The downside is that it does not allow for gathering deleted tweets. For more information on use cases for researching deleted tweets, refer to Bastos & Mercea ([2019](https://doi.org/10.1177/0894439317734157)) and Bastos ([2021](https://doi.org/10.1177/0002764221989772)).
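4CAT handles the archive search for you through its web interface, but it can help to see what a full-archive search against Twitter's v2 API involves under the hood. The sketch below paginates through results using the `next_token` cursor; the endpoint URL and the `query`, `max_results`, and `next_token` parameter names come from Twitter's public v2 documentation, while the injected `fetch` callable is a placeholder so the sketch can be tried without credentials or network access:

```python
from typing import Callable, Iterator

# Full-archive search endpoint (academic track), per Twitter's v2 docs
SEARCH_ALL = "https://api.twitter.com/2/tweets/search/all"

def search_archive(query: str, fetch: Callable[[str, dict], dict],
                   max_results: int = 500) -> Iterator[dict]:
    """Yield tweets matching `query`, following `next_token` pagination.

    `fetch(url, params)` should perform an authenticated GET and return
    the decoded JSON body; a real implementation would send an
    'Authorization: Bearer <token>' header and respect rate limits.
    """
    params = {"query": query, "max_results": max_results}
    while True:
        page = fetch(SEARCH_ALL, params)
        yield from page.get("data", [])
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break
        params["next_token"] = next_token

# Try it with a stub fetch that serves two canned pages (no network needed):
pages = [
    {"data": [{"id": "1"}], "meta": {"next_token": "abc"}},
    {"data": [{"id": "2"}], "meta": {}},
]
tweets = list(search_archive("from:user -is:retweet", lambda url, params: pages.pop(0)))
# tweets now holds the results from both pages
```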
For more information on the differences between DMI-TCAT and 4CAT, see the [functionality comparison table](https://github.com/digitalmethodsinitiative/4cat/wiki/TCAT-4CAT-Comparison).
## Resources
- The website https://4cat.nl provides a comprehensive guide with all the information, videos, and exercises related to 4CAT.
- There is a YouTube [playlist](https://www.youtube.com/playlist?list=PLWukutaRyIn31H0uPfkYlmbWvo83PnXXo) that includes several short videos on how to install 4CAT through Docker, create a data set, and analyze it using processors.
- Bernhard Rieder has produced a one-and-a-half hour [video introducing the basics of 4CAT](https://www.youtube.com/watch?v=VRMWuJYOKHQ), and demonstrating its use in academic research.
- You can find [4CAT installation instructions](https://github.com/digitalmethodsinitiative/4cat#installation) on GitHub.
- There is a list of [equivalent functionalities of DMI-TCAT and 4CAT](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2).
## Support
If you have any questions or are unsure about anything, don't hesitate to reach out by creating an [issue](https://github.com/digitalmethodsinitiative/dmi-tcat/issues) on the DMI-TCAT GitHub. We'll do our best to assist you in migrating your existing data sets.
As academics, we provide open-source software to the research community for free. The best way to support us is by citing our papers in your academic work. The accompanying paper for 4CAT is Peeters & Hagen ([2022](https://doi.org/10.5117/CCR2022.2.007.HAGE)), but you're also welcome to continue citing Borra & Rieder ([2014](https://doi.org/10.1108/AJIM-09-2013-0094)).
## Farewell
It has been a great journey. Thank you for all your support and feedback. So long and thanks for all the fish.
## Open questions
- how to best make a back-up?
- I would think helpers/archive_export.php, which is a bit more comprehensive than export.php (it includes error data and such). You cannot import that directly into 4CAT, but you could import it back into a TCAT instance and from there import into 4CAT.
- We need to provide more clarity in the section 'Moving data from DMI-TCAT to 4CAT'
- What is the status of https://github.com/digitalmethodsinitiative/dmi-tcat/wiki/The-Future-of-TCAT. Might need a page rename, and rewrite to show best options. May replace the section "Migrating from DMI-TCAT to 4CAT"
- how to best use existing tcat database and query it with 4cat
- export data from tcat into 4cat
- helpers/archive_export.php?
- helpers/export.php
- helpers/migrate.php
- TBD: bin by bin? All bins at the same time?
- Currently, the use of an existing DMI-TCAT database and frontend is the most straightforward way to import to 4CAT. It works bin by bin and completely copies the data to 4CAT.
- bin by bin!!
- double check section 'Gathering Twitter data with 4CAT'
- rewrite the query: streaming or academic search (past tweets). Any query that Twitter allows; you will have to rewrite it per search/bin.
- after import, are queries ported and transformed into new Twitter rules?
- They are transformed to match the Twitter API v2 format as closely as possible; the data sets can then be combined, and all Twitter processors can run on both (TCAT-collected and new API v2 tweets)
- does live capture / stream work? i.e. replicating 'track'
- Martin can speak to this better, but there is a Filtered stream using rules (like track or follow) and a Sample stream with x% of tweets
- how to capture user accounts / timelines / follow relations / id
- 4CAT currently only collects tweets (though each tweet contains all user account data of the tweeter) and allows you to create any query allowed by Twitter. You can therefore collect the full timelines of certain users. There is also a function (deactivated by default) that allows you to "rehydrate" a list of tweet IDs and recollect them. The Twitter API does allow you to collect follower relationships, but we do not currently have a data source set up for that (did TCAT?).
- how to do subselection / filter in 4cat? (missing from tcat-4cat-comparison)
- There is a list of available "Filter" processors/analyses, which includes a custom filter that allows you to select any mapped attribute of a tweet. These filters create a subset: a new data set to work with in 4CAT.
- Rather than the notion of 'bins' we now use ...? collection of query rules?
- 4CAT collects a single "query" and creates a data set from that query. This is different from live capture/streaming, which is rule-based. There will be some way to interact with the live-captured tweets that will involve querying that database; data sets created this way will have both a set of rules and a query (which could be all tweets matching those rules).
- query bin -> translates to data collection in 4cat.
- diff search and streaming
- streaming -> tag is bin name, own rule.
- search -> own rule, but static data set of tweets, matching parameters. Then you can run filters to make subsets.
- [Equivalence page](https://github.com/digitalmethodsinitiative/4cat/wiki/TCAT-4CAT-Comparison)
- ✔️ [Count Values](https://github.com/digitalmethodsinitiative/4cat/blob/master/processors/metrics/rank_attribute.py) with urls column -> does this count media frequency? or URL frequency?
- Counts URLs (tweets come with a list of URLs). You could also use our URL extractor processor, select which columns you are interested in (such as the text `body`, `images`, `urls`, etc.), and then rank the newly collected group of URLs.
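Several notes above map a DMI-TCAT query bin onto a v2 filtered-stream rule whose `tag` is the bin name, so that incoming tweets can later be routed back to their "bin". As an illustration, the sketch below builds the "add rules" payload for Twitter's v2 `POST /2/tweets/search/stream/rules` endpoint (payload shape per Twitter's public v2 documentation); the bin names and keywords are made up, and no request is actually sent:

```python
import json

def bins_to_stream_rules(bins: dict) -> dict:
    """Turn {bin_name: query} pairs into a v2 filtered-stream 'add rules' payload.

    Each former TCAT query bin becomes one rule; the bin name goes into
    the rule's `tag`, so matched tweets can be routed back to their "bin".
    """
    return {"add": [{"value": query, "tag": name} for name, query in bins.items()]}

# Hypothetical bins, for illustration only:
payload = bins_to_stream_rules({
    "climate": '("climate change" OR #climatecrisis) lang:en',
    "elections": "elections (vote OR ballot)",
})
print(json.dumps(payload, indent=2))
```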
## Todo
- add meta-data from query bin when importing into 4cat. I.e. date, keywords, etc.
- add pointers to how to translate queries from v1 to v2, e.g. from follow to from:@user OR from:@user. The limit per rule is 1024 characters (4096 for academic access). It is possible to use multiple rules.
- Do not expire tcat bins
- Streaming (by 9 March)
- pull request is ready. Stijn needs a day or so to go through the code.
- getting 4cat scrapers into 4cat
- ppl can already set up rule logic
- Even though it would be possible to merge data sets, we don't do it for you, as v1.1 keyword queries are different from v2 rules.
- So: about half a day to install and configure 4CAT; depending on bin size, importing may take a while.
- overview of archived data sets
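The to-do item about translating v1.1 queries to v2 rules can be illustrated with a small sketch: a v1.1 `follow` user list becomes one or more `from:` rules joined with OR, split so that no single rule exceeds the API's length limit (1024 characters on the standard track, 4096 on the academic track, per the notes above). The helper below is hypothetical, not part of 4CAT, and the handles in the example are made up:

```python
def follow_to_rules(handles, limit=1024):
    """Translate a v1.1 `follow` user list into v2 `from:` rules.

    Joins `from:handle` clauses with OR and starts a new rule whenever
    adding another clause would push the rule past `limit` characters.
    (A single clause longer than `limit` would still produce an
    over-long rule; real input should not get near that.)
    """
    rules, current = [], ""
    for handle in handles:
        clause = f"from:{handle.lstrip('@')}"
        candidate = f"{current} OR {clause}" if current else clause
        if len(candidate) > limit:
            rules.append(current)
            current = clause
        else:
            current = candidate
    if current:
        rules.append(current)
    return rules

# e.g. follow_to_rules(["@alice", "@bob"]) -> ["from:alice OR from:bob"]
```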
## Bibliography
- Bastos, M. (2021). This Account Doesn’t Exist: Tweet Decay and the Politics of Deletion in the Brexit Debate. _American Behavioral Scientist_, _65_(5), 757–773. [https://doi.org/10.1177/0002764221989772](https://doi.org/10.1177/0002764221989772)
- Bastos, M. T., & Mercea, D. (2019). The Brexit Botnet and User-Generated Hyperpartisan News. _Social Science Computer Review_, _37_(1), 38–54. [https://doi.org/10.1177/0894439317734157](https://doi.org/10.1177/0894439317734157)
- Borra, E., & Rieder, B. (2014). Programmed Method: Developing a Toolset for Capturing and Analyzing Tweets. _Aslib Journal of Information Management_, _66_(3), 262–278. [https://doi.org/10.1108/AJIM-09-2013-0094](https://doi.org/10.1108/AJIM-09-2013-0094)
- Peeters, S., & Hagen, S. (2022). The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research. _Computational Communication Research_, _4_(2), 571–589. [https://doi.org/10.5117/CCR2022.2.007.HAGE](https://doi.org/10.5117/CCR2022.2.007.HAGE)
- Pfeffer, J., Mooseder, A., Hammer, L., Stritzel, O., & Garcia, D. (2022). _This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter’s Academic API_ (arXiv:2204.02290). arXiv. [http://arxiv.org/abs/2204.02290](http://arxiv.org/abs/2204.02290)