erikb
# Migrating from DMI-TCAT to 4CAT

## Why migrate?

[Twitter's API v1.1 will be deprecated on March 9, 2023](https://web.archive.org/web/20230109225630/https://twittercommunity.com/t/announcing-the-deprecation-of-v1-1-statuses-filter-endpoint/182960), and DMI-TCAT will not be adapted to the latest API version. [4CAT](https://4cat.nl) is now the recommended tool for conducting Twitter research, as it has a modern code base in Python, is actively maintained, and comes with many other features, such as retrieving and analyzing data from sources such as 4chan, BitChute, Reddit, Telegram, Tumblr, as well as Instagram, TikTok, and LinkedIn.

## What does the migration entail?

- setting up and familiarizing yourself with 4CAT
- getting new credentials for Twitter API v2
- backing up existing DMI-TCAT databases
- moving data from DMI-TCAT to 4CAT
- setting up new Twitter collections with 4CAT

## Migrating

### Setting up and familiarizing yourself with 4CAT

- https://4cat.nl bundles all information, videos, and exercises for 4CAT
- We have instructions for [installing 4CAT](https://github.com/digitalmethodsinitiative/4cat#installation)
- We have a list of [equivalent DMI-TCAT and 4CAT functionalities](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2)
- There is also a [playlist](https://www.youtube.com/playlist?list=PLWukutaRyIn31H0uPfkYlmbWvo83PnXXo) that contains a few short videos on how to install 4CAT via Docker, create a data set, and analyse it using processors
- We have a one-and-a-half-hour [video introducing the basic functionalities of 4CAT](https://www.youtube.com/watch?v=VRMWuJYOKHQ) and showing how it can be used for academic research

### Getting new credentials for Twitter API v2

Access to the [academic track of the Twitter API](https://blog.twitter.com/developer/en_us/topics/tools/2021/enabling-the-future-of-academic-research-with-the-twitter-api.html), which allows ‘full-archive search’ (searching the full archive of all tweets posted since the platform started), is only available by request. You can read more about the process [here](https://developer.twitter.com/en/solutions/academic-research/application-info). To request access, you can follow these steps:

1. Start the process by going to the relevant page in the developer portal. You need to be logged in to Twitter to start the process.
2. If you match the criteria listed on the page, click ‘Start Academic Research Application’.
3. You will be asked to fill in a series of questions about how you plan to use the API. It is recommended that you keep a copy of your answers to these questions in a separate document, since you will not be able to see them after submitting!
4. If you are a student requesting access for your MA thesis, ask your supervisor to add a statement to their university profile page confirming that you are their thesis student; a link to this page can then serve as proof that you indeed qualify for access.
5. After you fill in the form, Twitter will manually vet your request. This process takes a few days, or sometimes up to one week. They may ask you to clarify some of your answers before granting access.
6. If you have been granted access, you will receive an e-mail saying so at the address you provided.

### Backing up existing DMI-TCAT databases

TBD

### Moving data from DMI-TCAT to 4CAT

There are three ways to use your DMI-TCAT data in 4CAT:

- Use the existing DMI-TCAT database and frontend and query it with 4CAT. 4CAT can interface directly with the DMI-TCAT frontend, allowing 4CAT users to access DMI-TCAT's data.
- Use the existing TCAT database and query it with 4CAT. This option is for users who wish to retain the DMI-TCAT database but analyze it using 4CAT.
  Instructions for doing so can be found [here](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2).
- Export data from DMI-TCAT into 4CAT. This option is for users who wish to migrate all data and functionality to 4CAT and completely abandon the DMI-TCAT code and database. (TBD)

### Gathering Twitter data with 4CAT

Once data sets have been migrated from DMI-TCAT into 4CAT, they can no longer be added to. If you have a DMI-TCAT bin that you would like to continue capturing in 4CAT, you will need to start a new data set in 4CAT.

For "live" data collection, such as with DMI-TCAT's v1.1 track, follow, or 1% endpoints, you can use 4CAT's filtered stream endpoint. (TBD)

For archive searches, you can access the full Twitter archive in 4CAT via Twitter's academic track. To do so, log into your 4CAT instance, create a new data set, select "Twitter API (v2) search" as the data source, and choose either the academic or standard API track. A short worksheet outlining how to obtain and use the Twitter v2 API with 4CAT can be found [here](https://docs.google.com/document/d/17v6xX805AGFZDLiv1S35dziMc9LlmL8IOMr5OFc6YQM/edit).

Note that academic API access is considered easier than v1.1 access, as you can query after the fact and therefore do not need to keep track of emerging issues. Refer to Pfeffer et al. ([2022](http://arxiv.org/abs/2204.02290)) for a description of how v2 academic access may work for you. The downside, however, is that it does not allow for gathering deleted tweets. For more information on use cases for researching deleted tweets, refer to Bastos & Mercea ([2019](https://doi.org/10.1177/0894439317734157)) and Bastos ([2021](https://doi.org/10.1177/0002764221989772)).

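Because a DMI-TCAT bin cannot simply be continued, its v1.1 `track` or `follow` parameters have to be rewritten as v2 rules or queries. The following is a minimal sketch of that rewriting, not code from 4CAT or DMI-TCAT. The function names are hypothetical. The default limit of 1024 characters per rule is an assumption, so check the limit that applies to your access level. The sketch also assumes that a multi-word `track` phrase means "all terms present" (expressed in v2 by grouping the terms in parentheses) and that the v2 `from:` operator takes a username without the `@` sign or a numeric user ID.

```python
from typing import Iterable, List


def build_rules(clauses: Iterable[str], max_length: int) -> List[str]:
    """Join clauses with OR, starting a new rule whenever the limit would be exceeded."""
    rules: List[str] = []
    current = ""
    for clause in clauses:
        if len(clause) > max_length:
            raise ValueError(f"clause exceeds {max_length} characters: {clause!r}")
        candidate = f"{current} OR {clause}" if current else clause
        if len(candidate) > max_length:
            # The current rule is full: store it and start a new one.
            rules.append(current)
            current = clause
        else:
            current = candidate
    if current:
        rules.append(current)
    return rules


def track_to_rules(phrases: Iterable[str], max_length: int = 1024) -> List[str]:
    """Rewrite v1.1 'track' phrases (OR between phrases, AND within a phrase)."""
    clauses = []
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue
        # Assumption: a multi-word phrase matches when all its terms are present,
        # which in a v2 rule is a space-separated group in parentheses.
        clauses.append(f"({phrase})" if " " in phrase else phrase)
    return build_rules(clauses, max_length)


def follow_to_rules(users: Iterable[str], max_length: int = 1024) -> List[str]:
    """Rewrite a v1.1 'follow' list (usernames or numeric IDs) as from: clauses."""
    clauses = [f"from:{u.strip().lstrip('@')}" for u in users if u.strip()]
    return build_rules(clauses, max_length)


if __name__ == "__main__":
    print(track_to_rules(["brexit", "no deal"]))
    print(follow_to_rules(["@alice", "12345"]))
```

Each returned string is one rule; a long `follow` list is spread over several rules rather than truncated. Note that `from:` only matches tweets posted by the listed accounts, so the result may not cover everything a v1.1 `follow` bin captured.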
For more information on the differences between DMI-TCAT and 4CAT, see the [functionality comparison table](https://github.com/digitalmethodsinitiative/4cat/wiki/TCAT-4CAT-Comparison).

## Resources

- The website https://4cat.nl provides a comprehensive guide with all the information, videos, and exercises related to 4CAT.
- There is a YouTube [playlist](https://www.youtube.com/playlist?list=PLWukutaRyIn31H0uPfkYlmbWvo83PnXXo) that includes several short videos on how to install 4CAT through Docker, create a data set, and analyze it using processors.
- Bernhard Rieder has produced a one-and-a-half-hour [video introducing the basics of 4CAT](https://www.youtube.com/watch?v=VRMWuJYOKHQ) and demonstrating its use in academic research.
- You can find [4CAT installation instructions](https://github.com/digitalmethodsinitiative/4cat#installation) on GitHub.
- There is a list of [equivalent functionalities of DMI-TCAT and 4CAT](https://github.com/digitalmethodsinitiative/4cat/tree/tcat-datasource/datasources/dmi-tcatv2).

## Support

If you have any questions or are unsure about anything, don't hesitate to reach out by creating an [issue](https://github.com/digitalmethodsinitiative/dmi-tcat/issues) on the DMI-TCAT GitHub. We'll do our best to assist you in migrating your existing data sets.

As academics, we provide open-source software to the research community for free. The best way to support us is by citing our papers in your academic work. The accompanying paper for 4CAT is written by Peeters & Hagen ([2022](https://doi.org/10.5117/CCR2022.2.007.HAGE)), but you're also welcome to continue citing Borra and Rieder ([2014](https://doi.org/10.1108/AJIM-09-2013-0094)).

## Farewell

It has been a great journey. Thank you for all your support and feedback. So long and thanks for all the fish.

## Open questions

- how to best make a back-up?
  - I would think `helpers/archive_export.php`, which is a bit more comprehensive than `export.php` (including error data and such).
    You cannot import that directly into 4CAT, but you could import it back into a TCAT instance and from there import it into 4CAT.
- We need to provide more clarity in the section 'Moving data from DMI-TCAT to 4CAT'
  - What is the status of [The Future of TCAT](https://github.com/digitalmethodsinitiative/dmi-tcat/wiki/The-Future-of-TCAT)? It might need a page rename and a rewrite to show the best options. It may replace the section "Migrating from DMI-TCAT to 4CAT".
  - how to best use existing tcat database and query it with 4cat
  - export data from tcat into 4cat
    - helpers/archive_export.php?
    - helpers/export.php
    - helpers/migrate.php
    - TBD: bin by bin? All bins at the same time?
      - Currently, the use of an existing DMI-TCAT database and frontend is the most straightforward way to import to 4CAT. It works bin by bin and completely copies the data to 4CAT.
      - bin by bin!!
- double check section 'Gathering Twitter data with 4CAT'
  - rewrite query. Streaming or academic search (past tweets). Any query that Twitter allows; you will have to rewrite it per search / bin.
  - after import, are queries ported and transformed into new Twitter rules?
    - They are transformed to match the Twitter API v2 format as closely as possible; the data sets can then be combined, and all Twitter processors can run on both (TCAT-collected and new API v2 tweets).
  - does live capture / stream work? i.e. replicating 'track'
    - Martin can speak to this better, but there is a Filtered stream using rules (like track or follow) and a Sample stream with x% of tweets.
  - how to capture user accounts / timelines / follow relations / id
    - 4CAT currently only collects tweets (though each tweet contains all user account data of the tweeter) and allows you to create any query allowed by Twitter. You can therefore collect all timelines of certain users. There is also a function (deactivated by default) that allows you to "rehydrate" a list of Tweet IDs and recollect them.
      The Twitter API does allow you to collect follower relationships, but we do not currently have a datasource set up for that (did TCAT?).
  - how to do subselection / filter in 4cat? (missing from tcat-4cat-comparison)
    - There is a list of available "Filter" processors/analyses, which includes a custom filter that allows you to select on any mapped attribute of a Tweet. These filters create a subset as a new Dataset to work with in 4CAT.
  - Rather than the notion of 'bins' we now use ...? collection of query rules?
    - 4CAT collects a single "query" and creates a Dataset from that query. This is different from the live capture/stream, which is rule-based. There will be some way to interact with the live capture tweets that will involve querying that database; Datasets created this way will have both a set of rules and a query (which could be all tweets with those rules).
    - query bin -> translates to data collection in 4cat.
    - diff search and streaming
      - streaming -> tag is bin name, own rule.
      - search -> own rule, but static data set of tweets, matching parameters. Then you can run filters to make subsets.
- [Equivalence page](https://github.com/digitalmethodsinitiative/4cat/wiki/TCAT-4CAT-Comparison)
  - ✔️ [Count Values](https://github.com/digitalmethodsinitiative/4cat/blob/master/processors/metrics/rank_attribute.py) with urls column -> does this count media frequency? or URL frequency?
    - Counts URLs (tweets come with a list of URLs). You could also use our URL extractor processor, select which columns you are interested in (such as the text `body`, `images`, `urls` etc.), and then rank the newly collected group of URLs.

## Todo

- add meta-data from query bin when importing into 4cat, i.e. date, keywords, etc.
- add pointers on how to translate queries from v1 to v2, e.g. from follow to from:@user OR from:@user. The limit per rule is 1024 characters; for academic: 4096. It is possible to use multiple rules.
- Do not expire tcat bins
- Streaming (by 9 March)
  - pull request is ready. Stijn needs a day or so to go through the code.
- getting 4cat scrapers into 4cat
- people can already set up rule logic
- Even though it would be possible to merge data sets, we don't do it for you, as v1.1 and keywords are different from v2 and rules.
- So half a day to install and configure 4cat; depending on bin size, the import may take a while.
- overview of archived data sets

## Bibliography

- Bastos, M. (2021). This Account Doesn’t Exist: Tweet Decay and the Politics of Deletion in the Brexit Debate. _American Behavioral Scientist_, _65_(5), 757–773. [https://doi.org/10.1177/0002764221989772](https://doi.org/10.1177/0002764221989772)
- Bastos, M. T., & Mercea, D. (2019). The Brexit Botnet and User-Generated Hyperpartisan News. _Social Science Computer Review_, _37_(1), 38–54. [https://doi.org/10.1177/0894439317734157](https://doi.org/10.1177/0894439317734157)
- Borra, E., & Rieder, B. (2014). Programmed Method: Developing a Toolset for Capturing and Analyzing Tweets. _Aslib Journal of Information Management_, _66_(3), 262–278. [https://doi.org/10.1108/AJIM-09-2013-0094](https://doi.org/10.1108/AJIM-09-2013-0094)
- Peeters, S., & Hagen, S. (2022). The 4CAT Capture and Analysis Toolkit: A Modular Tool for Transparent and Traceable Social Media Research. _Computational Communication Research_, _4_(2), 571–589. [https://doi.org/10.5117/CCR2022.2.007.HAGE](https://doi.org/10.5117/CCR2022.2.007.HAGE)
- Pfeffer, J., Mooseder, A., Hammer, L., Stritzel, O., & Garcia, D. (2022). _This Sample seems to be good enough! Assessing Coverage and Temporal Reliability of Twitter’s Academic API_ (arXiv:2204.02290). arXiv. [http://arxiv.org/abs/2204.02290](http://arxiv.org/abs/2204.02290)
