# Design a Twitter crawler
> This document is used to record the development process.
## 1. Use cases and constraints
> Gather requirements and scope the problem.
### Use cases
#### We'll scope the problem to handle the following use cases
* **Monitor**:
    * Monitor the activities (`tweet`, `retweet`, `delete`) of multiple Twitter accounts
* **Crawler**:
    * Collect the `first 5` `tweets` (including retweets) of a given user account, together with each tweet's `replies` and `likes` counts (a Tweepy sketch follows this list)
* **User app**:
    * Add/delete multiple user accounts to/from the system
    * Access/download all data that has been collected
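Under the Python 3.7 / Tweepy stack listed later in this document, the crawler use case could look roughly like the sketch below. It assumes Tweepy 3.x against the Twitter v1.1 timeline endpoint; the credentials, the helper name `collect_first_tweets`, and the account name `example_account` are placeholders. Note that the v1.1 API exposes like and retweet counts directly, but no per-tweet reply count.

```python
# A minimal sketch of the "Crawler" use case using Tweepy (v3.x, Twitter API v1.1).
# The credentials and helper name are assumptions, not part of the original design.
import tweepy

# Placeholder credentials; a real deployment would load these from configuration.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

def collect_first_tweets(screen_name, count=5):
    """Return the first `count` tweets (including retweets) of one account."""
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    tweets = []
    for status in api.user_timeline(screen_name=screen_name, count=count,
                                    include_rts=True, tweet_mode="extended"):
        tweets.append({
            "tweet_id": status.id_str,
            "text": status.full_text,
            "likes": status.favorite_count,      # like count exposed by the v1.1 API
            "retweets": status.retweet_count,
            # The v1.1 API has no direct reply count; it would have to be
            # approximated separately (e.g. by searching for replies), which is
            # out of scope for this sketch.
        })
    return tweets

if __name__ == "__main__":
    for tweet in collect_first_tweets("example_account"):
        print(tweet)
```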
### Constraints and assumptions
#### State assumptions
* Support only anonymous users
* The web crawler should not get stuck in an infinite loop
    * We get stuck in an infinite loop if the crawl graph contains a cycle, so already-seen content must be skipped
* `500` users to monitor
* Monitor each user every `15 min` (a polling sketch follows this list)
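One way to satisfy the `15 min` polling constraint while avoiding re-processing loops is a fixed-interval scheduler with a set of already-seen tweet IDs, sketched below. `collect_first_tweets` is the hypothetical helper from the crawler sketch above, and `ACCOUNTS` stands in for the roughly `500` monitored accounts.

```python
# A minimal polling sketch for the monitoring constraints above: revisit every
# account on a fixed 15-minute cycle and skip tweets that were already seen, so
# repeated crawls of the same timeline cannot loop over the same content forever.
import time

POLL_INTERVAL_SECONDS = 15 * 60           # monitor each user every 15 min
ACCOUNTS = ["account_1", "account_2"]     # placeholder for the ~500 monitored accounts

def monitor_forever(collect_first_tweets):
    seen_tweet_ids = set()                # deduplication: a "visited set" over tweet IDs
    while True:
        cycle_start = time.monotonic()
        for screen_name in ACCOUNTS:
            for tweet in collect_first_tweets(screen_name):
                if tweet["tweet_id"] in seen_tweet_ids:
                    continue              # already processed; avoids re-crawling the same content
                seen_tweet_ids.add(tweet["tweet_id"])
                # Hand the new tweet to storage / further processing here.
                print(screen_name, tweet["tweet_id"])
        # Sleep for the remainder of the 15-minute window before the next cycle.
        elapsed = time.monotonic() - cycle_start
        time.sleep(max(0, POLL_INTERVAL_SECONDS - elapsed))
```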
#### Calculate usage
* `~2 MB` of stored content per monitored account per day (verified in the sketch after this list)
    * Assume each account posts about `1` new tweet per day, while each crawl collects its `first 5` tweets
    * `7 KB` per tweet
    * `100` replies per tweet
    * `4 KB` per reply
    * `5 tweets * 7 KB + 5 tweets * 100 replies * 4 KB ≈ 2 MB`
* `500` monitored accounts -> `~1 GB` of stored content per day
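A quick arithmetic check of the estimate, assuming one crawl's worth of content per account is stored each day:

```python
# Back-of-the-envelope check of the storage estimate, using the assumptions above.
TWEETS_PER_CRAWL = 5
TWEET_SIZE_KB = 7
REPLIES_PER_TWEET = 100
REPLY_SIZE_KB = 4
MONITORED_ACCOUNTS = 500

per_account_kb = TWEETS_PER_CRAWL * (TWEET_SIZE_KB + REPLIES_PER_TWEET * REPLY_SIZE_KB)
print(per_account_kb)                                    # 2035 KB ≈ 2 MB per account per day
print(per_account_kb * MONITORED_ACCOUNTS / 1_000_000)   # ≈ 1 GB per day for 500 accounts
```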
## 2. Create a high level design
> Outline a high level design with all important components.

## 3. Development environment
* Python 3.7
* MongoDB (see the storage sketch after this list)
* MySQL
* Javascript
* Tweepy, twarc
* Ubuntu 14.04
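Since MongoDB is part of the stack, crawled tweets could be persisted with `pymongo` along the lines of the sketch below. The connection string, database name `twitter_crawler`, and collection name `tweets` are assumptions for illustration; the unique index on `tweet_id` is one way to keep repeated crawls idempotent.

```python
# A minimal sketch of persisting crawled tweets into MongoDB with pymongo.
# Database and collection names are placeholders, not part of the original design.
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient("mongodb://localhost:27017")
db = client["twitter_crawler"]

# A unique index on the tweet ID makes repeated crawls of the same timeline idempotent.
db.tweets.create_index("tweet_id", unique=True)

def save_tweet(tweet):
    """Insert one crawled tweet document; silently skip tweets already stored."""
    try:
        db.tweets.insert_one(tweet)
    except DuplicateKeyError:
        pass  # this tweet was stored by a previous crawl

save_tweet({"tweet_id": "12345", "screen_name": "example_account",
            "text": "hello", "likes": 3, "retweets": 1})
```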
## 4. Hardware requirements
* 2 CPUs
* 8 GB RAM
* 40+ GB hard-drive space
## 5. Next step
> The next step is to develop each core component.