---
tags: AS 2022
title: Course Project
---
# Course Project description
The project teaches you how to perform a data analysis using the techniques covered in the course (e.g., statistical tests and deriving descriptive statistics from raw data).
The project requires synthesizing all the material from the course, and it may also require additional material provided by the instructors. Hence, it is one of the best ways to solidify your understanding of statistical methods. It can also stimulate your intellectual curiosity in the field of Computer Science, and a well-executed project might lead to a Scopus-indexed publication.
# Project Deliverables
Projects are tracked and reviewed in depth every week using the supplied Overleaf files, which are also the primary form of submission. Each group will also give a 10-minute final presentation. All participants in a project are expected to contribute equally and to know every aspect of the project; lacking such knowledge amounts to failing the project. Each week, a minimum of 50 newly extracted metrics should be presented together with the corresponding statistical analysis.
# Data Extraction
To collect all the necessary metrics, you will need to run a metric-extraction tool inside a Docker container. The prerequisites are:
1. [Docker](https://www.docker.com)
2. PostgreSQL (installed together with pgAdmin)
## Installing prerequisites
It is recommended to use Linux or Mac. However, it is still possible to use Windows.
### Installing Docker
You already know how to install Docker from the Big Data Analytics course.
### Installing PostgreSQL (together with pgAdmin)
All the retrieved data will be stored in a PostgreSQL database on your local machine: the data extraction tool running in the Docker container writes to it. pgAdmin is used to manage the PostgreSQL database and its services. You will use it to monitor the progress of the data extraction and to specify the list of repositories to extract.
* [**Installing pgAdmin on Ubuntu**](https://www.pgadmin.org/download/pgadmin-4-apt/)
* [**Installing pgAdmin on Windows 10**](https://www.postgresqltutorial.com/install-postgresql/)
* [**Installing pgAdmin on Mac**](https://www.postgresqltutorial.com/install-postgresql-macos/)
* [**Installing pgAdmin on Mac - Using homebrew**](https://www.sqlshack.com/setting-up-a-postgresql-database-on-mac/)
:::warning
Make sure that you [create a password for postgres](https://stackoverflow.com/questions/14035742/pgadmin-gives-me-the-error-no-password-supplied).
:::
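
Before moving on, it can help to confirm that the local PostgreSQL server is actually accepting connections. The sketch below only checks that something is listening on a TCP port; it assumes the server runs on `localhost` with PostgreSQL's default port 5432, and the function name `postgres_is_reachable` is made up for this illustration. It does not verify the password.

```python
import socket


def postgres_is_reachable(host: str = "localhost", port: int = 5432,
                          timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `postgres_is_reachable()` with no arguments checks the assumed default address; pass another port if your installation uses one.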
# Steps
1. Clone the [repository](https://github.com/xavzelada/tom): `git clone https://github.com/xavzelada/tom.git`
2. Navigate to the `./tom/radar/` sub-folder of the cloned repository.
3. Edit the Dockerfile and fill in your PostgreSQL database URL and password on lines 12 and 13:
``12 ENV DATABASE_URL #DATABASE_URL `` <br>``13 ENV TOM_DATABASE_PASSWORD #DATABASE_PASSWORD``
4. Build the Docker image using: `docker build . -t tom-radar`
5. Run the Docker container using: `docker run --name tom-radar --net=host -p 3000:3000 --mount type=bind,source=/etc/hosts,dst=/etc/hosts -d tom-radar`
6. Open a shell inside the container using: `docker exec -it tom-radar sh -c "export TERM=xterm && bash"`
7. Set up the database from inside the container using: `rake db:setup`
8. Run the migrations inside the container using: `rake db:migrate`
9. Open the Query Tool in pgAdmin and paste in the query from [**this file**](https://www.dropbox.com/s/z7l44kszpdlghge/tom_settings.sql?dl=0) to set up the local database.
10. Add the list of repositories to extract to the PostgreSQL database (using the Query Tool).
11. In pgAdmin, add a GitHub token to the `tom_tokens_queues` table.
12. From the browser, start the data collection process by visiting: `http://localhost:3000/api/v1/repos_update?source=github`
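
The Dockerfile step above asks for a `DATABASE_URL`. The sketch below assumes the tool accepts the common `postgres://user:password@host:port/dbname` form, which this document does not confirm, and shows one way to build such a URL so that special characters in the password do not break it. The helper name `build_database_url`, the database name `tom_db`, and the example password are all hypothetical.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str,
                       port: int, dbname: str) -> str:
    """Build a postgres:// connection URL with the credentials percent-encoded."""
    # safe="" forces characters such as "@", ":" and "/" to be encoded too,
    # so they cannot be mistaken for URL delimiters.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# Hypothetical values for illustration only; use your own credentials.
print(build_database_url("postgres", "p@ss:word/1", "localhost", 5432, "tom_db"))
```

If your password contains only letters and digits, the encoded URL is identical to the plain one, so the encoding step is harmless in the simple case.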

All the metrics will be stored in table ____ and can be exported to a .csv file for further analysis.
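
Once the metrics are exported to a .csv file, a first pass of descriptive statistics might look like the following sketch. It uses only the Python standard library; the function name `describe_column`, the column names `stars` and `forks`, and the sample rows are invented for illustration and do not reflect the tool's actual schema.

```python
import csv
import io
import statistics


def describe_column(csv_file, column: str) -> dict:
    """Compute basic descriptive statistics for one numeric CSV column."""
    values = []
    for row in csv.DictReader(csv_file):
        cell = (row.get(column) or "").strip()
        if not cell:
            continue  # skip missing values
        values.append(float(cell))
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical export: column names and values are made up for illustration.
sample = io.StringIO("repo,stars,forks\na,10,2\nb,20,\nc,30,4\n")
print(describe_column(sample, "stars"))
```

To run it on a real export, pass an open file instead of the in-memory sample, for example `open("metrics.csv", newline="")`, where `metrics.csv` is a placeholder file name.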
# Grading Criteria
The project will be evaluated throughout the course.
Overall, the project is required to exhibit a correct analysis; failing to do so will result in a failing grade. In addition, the following weighted criteria will be considered:
* Local correctness of the whole discussion (40%)
* Completeness of the analysis (40%)
* Resulting written documentation (15%)
* The presentation with Q&A (5%)
Please note that the final video will be evaluated only at the end of the project.