# Design a Twitter crawler

> This document records the development process.

## 1: Use cases and constraints

> Gather requirements and scope the problem.

### Use cases

#### We'll scope the problem to handle the following use cases

* **Monitor**:
    * Monitor the activity (`tweet`, `retweet`, `delete`) of multiple Twitter accounts
* **Crawler**:
    * Given a user account, collect the `first 5` `tweets` (and retweets), along with their `replies` and `likes` counts
* **User app**:
    * Add/delete multiple user accounts to/from the system
    * Access/download all data that has been collected

### Constraints and assumptions

#### State assumptions

* Support only anonymous users
* The web crawler should not get stuck in an infinite loop
    * We get stuck in an infinite loop if the graph contains a cycle
* `500` users to monitor
* Monitor each user every `15 min`

#### Calculate usage

* ~`2 MB` of stored content per user per day
    * Each account posts about 1 tweet per day, but each crawl collects the 5 most recent tweets
    * `7 kB` per tweet
    * `100` replies per tweet
    * `4 kB` per reply
    * 5 tweets * (7 kB + 100 replies * 4 kB) = 2,035 kB ≈ 2 MB

## 2: Create a high level design

> Outline a high level design with all important components.

![Imgur](https://i.imgur.com/ePcve6A.png)

## 3: Development environment

* Python 3.7
* MongoDB
* MySQL
* JavaScript
* Tweepy, twarc
* Ubuntu 14.04

## 4: Hardware requirements

* 2 CPUs
* 8 GB RAM
* 40+ GB hard-drive space

## 5: Next step

> The next step is to develop each core component. The sketches below outline possible starting points for the storage estimate, the monitor, the crawler, and the user app.
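The storage math in section 1 is easy to double-check in Python. This is only a sanity check, not part of the system; all sizes are the assumptions stated in the list above.

```python
# Back-of-the-envelope storage estimate, using the assumed sizes
# from the "Calculate usage" list.
TWEETS_PER_CRAWL = 5      # 5 most recent tweets per user
REPLIES_PER_TWEET = 100   # assumed average
KB_PER_TWEET = 7          # assumed average tweet size
KB_PER_REPLY = 4          # assumed average reply size
USERS = 500

per_user_kb = TWEETS_PER_CRAWL * (KB_PER_TWEET + REPLIES_PER_TWEET * KB_PER_REPLY)
print(f"Per user per day: {per_user_kb} kB (~{per_user_kb / 1024:.1f} MB)")
# Per user per day: 2035 kB (~2.0 MB)

total_gb = USERS * per_user_kb / 1024 / 1024
print(f"All {USERS} users per day: ~{total_gb:.2f} GB")
# All 500 users per day: ~0.97 GB
```

At roughly 1 GB of raw data per day, the 40+ GB disk from section 4 covers about 40 days of collected data before compression or pruning becomes necessary.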
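For the monitor component, a single polling worker should be enough: 500 users checked every 15 minutes is 500 / 900 s ≈ 0.56 checks per second. Below is a minimal sketch assuming Tweepy 3.x; the credentials are placeholders (the Twitter API requires an authenticated app even when the crawled accounts are public), and the snapshot-diff change detection is one illustrative approach, not a settled design decision.

```python
import time

import tweepy

# Placeholder credentials -- replace with a registered app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

MONITOR_INTERVAL = 15 * 60   # monitor each user every 15 min
last_seen = {}               # screen_name -> tweet ids from the last poll

def check_user(screen_name):
    """Fetch the user's 5 most recent tweets and diff them against the
    previous snapshot to detect tweet/retweet/delete activity."""
    statuses = api.user_timeline(screen_name=screen_name, count=5)
    current = {s.id for s in statuses}
    previous = last_seen.get(screen_name, set())
    new_ids = current - previous    # newly posted tweets/retweets
    gone_ids = previous - current   # disappeared: deleted, or merely
                                    # pushed out of the 5-tweet window
    last_seen[screen_name] = current
    return new_ids, gone_ids

def monitor_loop(users):
    # Spread the checks evenly over the 15-minute window so the
    # request rate stays constant (~0.56 requests/s for 500 users).
    delay = MONITOR_INTERVAL / len(users)
    while True:
        for name in users:
            check_user(name)
            time.sleep(delay)
```

Telling a genuine delete apart from a tweet that merely scrolled out of the 5-tweet window would require comparing against the full tweet store in MongoDB rather than this in-memory snapshot.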
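The crawler can reuse the same authenticated `api` object and write into MongoDB via `pymongo`. The database and collection names (`twitter_crawler`, `tweets`) below are arbitrary choices. Note that Twitter's standard v1.1 API does not expose a reply count on a tweet, so collecting replies would need a separate search-based step; this sketch stores the counts that are directly available (`likes` and `retweets`).

```python
from pymongo import MongoClient

# Reuses the authenticated `api` object from the monitor sketch.
client = MongoClient("mongodb://localhost:27017")
db = client["twitter_crawler"]

def crawl_user(screen_name):
    """Store the user's 5 most recent tweets with their like and
    retweet counts, keyed by tweet id so re-crawls are idempotent."""
    statuses = api.user_timeline(
        screen_name=screen_name, count=5, tweet_mode="extended"
    )
    for status in statuses:
        db.tweets.update_one(
            {"_id": status.id},
            {"$set": {
                "user": screen_name,
                "text": status.full_text,
                "likes": status.favorite_count,
                "retweets": status.retweet_count,
                "created_at": status.created_at,
            }},
            upsert=True,
        )
```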
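Finally, the user app's add/delete use case reduces to managing the list of monitored accounts. A sketch keeping that list in the same MongoDB instance; the `accounts` collection and the helper names are assumptions:

```python
def add_account(screen_name):
    """Register an account for monitoring (idempotent)."""
    db.accounts.update_one(
        {"_id": screen_name}, {"$set": {"active": True}}, upsert=True
    )

def delete_account(screen_name):
    """Stop monitoring an account."""
    db.accounts.delete_one({"_id": screen_name})

def monitored_accounts():
    """Screen names to feed into monitor_loop()."""
    return [doc["_id"] for doc in db.accounts.find()]
```

With this in place, the access/download use case becomes a query over the tweet store, e.g. `db.tweets.find({"user": screen_name})` exported to JSON.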