# Overview
The main objective of this LiveSD module is to implement an application that traverses a software repository efficiently, analyzes the quality of its code, and provides useful metrics, such as warnings for duplicated code, code smells, and large blocks of code. The results are stored in a database-like system so that they can later be reused and presented in a user-friendly way by other modules of this project. This can then be extended to analyzing multiple repositories at once. The crawler program runs on a virtual machine on Google Cloud, and several instances of it can run at the same time.
In terms of programming languages, we found that crawlers can be developed in most programming languages. To report code smells and quality metrics, we chose a specific language to analyze. This module is designed in such a way that, in the future, it can easily be extended to support multiple languages.
# Requirements
## Functional
- As a User, I want to receive code smells about my code in an organized fashion so that I am informed about their presence.
- As a User, I want to receive quality metrics about my code so I can measure its quality.
## Non-Functional
- As a User, I want to be able to scan multiple repositories at once, so that I can compare repositories without having to schedule multiple scans at different times.
- As a User, I want repositories to be scanned in real time, so that I can anticipate when the scan of a repository will be finished.
- As a User, I want the metrics shown to be of relevance to the repository, so that I can act upon them to improve the repository.
# Design
## Design Decisions
### 1. Should the crawler have a dedicated interface to show the user the results of crawling a repository?
**Answer: No.**
A graphical interface is not necessary, since there are already a few other modules that will handle the display of analyzed information.
### 2. Should the group use analyzers for code smells and metrics that already exist?
**Answer: Yes.**
The possibility of building our own analyzers was considered, but after some discussion we realized this would take too much time, so the decision was made to reuse existing analyzers widely available on the internet.
### 3. What programming language to use?
**Answer: TypeScript**
The programming language should be compatible with the other projects and must also be closely related to the languages used by the metric and smell analyzers. After some research, we found that crawlers can be implemented in many programming languages, so we ended up writing the crawler in TypeScript, as this language is already well known by all members of the group.
### 4. What programming language(s) should the crawler support?
**Answer: JavaScript**
This depended on which analyzers the group chose. The smell analyzer works for both Java and JavaScript, while the metric analyzer only works for JavaScript, so JavaScript was the only language chosen to be supported.
### 5. Should the output of the crawler be saved anywhere? If so, Where?
**Answer: Yes. In a virtual machine on Google Cloud whose single function is to store all the data produced by all groups in the project (a shared repository).**
It was decided that, instead of having a single database to save all the data from all groups, there would be one virtual machine on Google Cloud that provides storage and allows the groups to easily save their data there, as well as share data with other modules.
## Issues/challenges
- Integration with other groups' projects. It is important to coordinate with other groups to make sure this integration can go as smoothly as possible.
- Other groups have similar needs (e.g., pulling a git repository), so it is imperative not to create modules with duplicated functionality.
- What should be the chosen smell analyzer and metric analyzer? (There are a lot of analyzers available, each with their own characteristics)
- What relevant smells and metrics should the crawler give out? (Depends on the smell analyzer and the metric analyzer)
- How should the information given out by the crawler be saved/encoded?
## Adopted Patterns
- **Adapter**
The adapter pattern was used to standardize communication so that the Analysis Builder class does not need to know the specific parameters required by each Analyser. This way, whenever a new Analyser is added to the pool, it can easily be used by the builder without much setup in that class.
- **Builder**
The builder pattern was used to build a list of Reports for each type of Analysis without having to know which Analyser to use in each case. This allows the Reports to be generated for each Analysis and only then accessed all at once, which is also in line with how the Git Crawler is supposed to work.
- **Template method**
This pattern was used for the Report Maker interface, so that each Analyser can have its own Report Maker class while still producing a general Report that other classes can use without knowing where or how it was generated. A minimal sketch of how these three patterns fit together is shown below.
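To make the collaboration between these patterns concrete, here is a minimal TypeScript sketch. The class and interface names mirror the components described in this document, but the signatures and bodies are illustrative assumptions rather than the actual implementation:

```typescript
// Illustrative sketch only: names mirror the design, bodies are placeholders.

interface Report {
  analyser: string;
  findings: string[];
}

// Template method: `make` fixes the overall report structure, while each
// concrete Report Maker supplies only the analyser-specific parsing step.
abstract class ReportMaker {
  make(rawOutput: string, analyser: string): Report {
    return { analyser, findings: this.parse(rawOutput) };
  }
  protected abstract parse(rawOutput: string): string[];
}

class JsSmellReportMaker extends ReportMaker {
  protected parse(rawOutput: string): string[] {
    return rawOutput.split("\n").filter((line) => line.length > 0);
  }
}

// Adapter: hides each analyser's specific parameters behind one method,
// so the Analysis Builder never needs to know how to invoke each tool.
interface AnalysisAdapter {
  analyse(repoPath: string): Report;
}

class JsSmellAdapter implements AnalysisAdapter {
  private readonly maker = new JsSmellReportMaker();
  analyse(repoPath: string): Report {
    const rawOutput = `long method in ${repoPath}/index.js`; // stand-in for the real analyser call
    return this.maker.make(rawOutput, "jscent");
  }
}

// Builder: accumulates Reports from whichever adapters were requested and
// hands them back all at once, matching how the Git Crawler is used.
class AnalysisBuilder {
  private readonly reports: Report[] = [];
  add(adapter: AnalysisAdapter, repoPath: string): this {
    this.reports.push(adapter.analyse(repoPath));
    return this;
  }
  build(): Report[] {
    return this.reports;
  }
}

const reports = new AnalysisBuilder()
  .add(new JsSmellAdapter(), "/repos/example")
  .build();
console.log(reports);
```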
# Implementation
## Interfaces
The communication with the application is done through a **Web Server** without a visual interface, which exposes a **REST API** prepared to receive requests without requiring any authentication.
- **Address**: `35.205.24.142:8011`
- **API routes:**
- `/` - GET
- **Input**
N/A
- **Output**
A string explaining how to use the API
- `/download` - GET
- **Input**
**Required**
`source` - query parameter - Should contain a public git repository *url* to be cloned
- **Output** - A JSON object with a success value.
- **Success**
`success` is set to true
- **Error**
`success` is set to false; also has a reason.
- `/crawl` - GET
- **Input**
**Required**
`repoName` - query parameter - Should contain the name of a repository previously downloaded
**Optional**
`js` - query parameter - Whether the JavaScript analyser should run. This parameter acts as a boolean: if it is set, it is considered true; otherwise false.
`metrics` - query parameter - Whether the metrics analyser should run. Acts as a boolean in the same way.
`java` - query parameter - Whether the Java analyser should run **(not implemented yet)**. Acts as a boolean in the same way.
`ts` - query parameter - Whether the TypeScript analyser should run **(not implemented yet)**. Acts as a boolean in the same way.
- **Output** - A JSON object with a success value.
- **Success**
`success` is set to true; there is also a `result` object that contains all the data generated by the selected analysers.
- **Error**
`success` is set to false; also has a reason.
- `/repository-list` - GET
- **Input**
N/A
- **Output** - a JSON object with:
- Success value
- List of names of all repositories previously downloaded using the `/download` route.
- `/remove-repository` - GET
- **Input**
**Required**
`repoName` - query parameter - Should contain the name of the repository to be removed, previously downloaded using the `/download` route.
- **Output** - a JSON object with a success value.
- **Success**
`success` is set to true
- **Error**
`success` is set to false; also has a string with the reason for the error.
A normal flow is to first check whether the wanted repository has already been downloaded (`/repository-list`). If it has, the next step is to call `/crawl`. Otherwise, a `/download` should be done before `/crawl`. If the repository version already stored is outdated, a `/remove-repository` call should be made, followed by a `/download`, before crawling. A sketch of this flow is shown below.
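For illustration, this flow could look as follows from a client written in TypeScript (Node.js 18+ for the built-in `fetch`). The repository name and the `repositories` field of the list response are assumptions based on the route descriptions above:

```typescript
const base = "http://35.205.24.142:8011";
const repoUrl = "https://github.com/some-user/some-repo"; // example repository
const repoName = "some-repo"; // assumed to match the cloned directory name

async function analyse(): Promise<void> {
  // 1. Check whether the repository was already downloaded.
  const list = await (await fetch(`${base}/repository-list`)).json();

  // `repositories` is an assumed field name for the returned list.
  if (!list.repositories?.includes(repoName)) {
    // 2. Clone it first if it is not there yet.
    await fetch(`${base}/download?source=${encodeURIComponent(repoUrl)}`);
  }

  // 3. Run the JavaScript smell analyser and the metrics analyser.
  //    `js` and `metrics` act as booleans: present means true.
  const crawl = await (
    await fetch(`${base}/crawl?repoName=${repoName}&js&metrics`)
  ).json();

  if (crawl.success) {
    console.log(crawl.result); // data from the selected analysers
  } else {
    console.error(crawl.reason);
  }
}

analyse();
```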
## Smells and metrics
**Code smells detectable by JScent ([https://github.com/moskirathe/JScent](https://github.com/moskirathe/JScent)) :**
- **Long message chain** - In code, you see a series of calls resembling `a->b()->c()->d()`
- **Feature envy** - A method accesses the data of another object more than its own data.
- **Long parameter list** - A method with more than 3 parameters
- **Large objects** - A class/object that is doing too much
- **Dead code** - A variable, parameter, field, method or class is no longer used (usually because it is obsolete).
- **Long Methods** - A method contains too many lines of code
- **Switch statement** - A switch statement with too many cases
- **Method comments** - A method is filled with explanatory comments
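For illustration, here are two contrived TypeScript snippets (not from any real repository) that would exhibit smells from this list:

```typescript
// Long parameter list: a method with more than 3 parameters.
function createUser(
  name: string,
  email: string,
  age: number,
  country: string,
  wantsNewsletter: boolean
): void {
  /* ... */
}

// Long message chain: a series of calls resembling a->b()->c()->d(),
// coupling the caller to every intermediate object in the chain.
declare const order: {
  getCustomer(): { getAddress(): { getStreet(): { getName(): string } } };
};
const streetName = order.getCustomer().getAddress().getStreet().getName();
```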
**Code metrics given out by Nocuous ([https://github.com/h-o-t/nocuous](https://github.com/h-o-t/nocuous)) :**
|Metric|Label|Description|Threshold|
|---|---|---|---|
|File Length|L|The number of lines in a file|500|
|Class fan-out complexity|CFAC|The number of classes or interfaces in the dependency chain for a given class|30|
|Class data abstraction coupling|CDAC|The number of instances of other classes that are "new"ed in a given class|10|
|Anon Inner Length|AIL|The length, in lines, of class expressions or arrow functions|35|
|Function Length|FL|The number of statements in a function declaration, function expression, or method declaration|30|
|Parameter Number|P|The number of parameters for a function or method|6|
|Cyclomatic Complexity|CC|The cyclomatic complexity for a function or method|10|
|Nested `if` Depth|ID|The number of nested `if` statements|3|
|Nested `try` Depth|TD|The number of nested `try` statements|2|
|Binary Expression Complexity|BEC|How complex a binary expression is (e.g. how many `&&` and `\|\|`)|0|
|Missing Switch Default|MSD|Any switch statements that are missing the default case|1|
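As a contrived example, the following function would score against several of these metrics at once:

```typescript
// Contrived function touching several of the metrics above.
function classify(a: number, b: number, c: number): string {
  // Nested `if` depth (ID): three levels, which reaches the threshold of 3.
  if (a > 0) {
    if (b > 0) {
      if (c > 0) {
        return "all positive";
      }
    }
  }
  // Binary expression complexity (BEC): counts the `&&`/`||` operators.
  if (a > 0 && (b > 0 || c > 0)) {
    return "mostly positive";
  }
  // Missing switch default (MSD): this switch has no `default` case.
  switch (a) {
    case 0:
      return "zero";
    case 1:
      return "one";
  }
  // Each branch above also adds to the cyclomatic complexity (CC).
  return "other";
}
```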
## Diagrams
### Logical Architecture

Below, we describe the purpose and key behaviors of each component and detail collaborations between them:
- **Core (Class)**: responsible for setting up the server's routes (see the sketch after this list).
- **Repository Retriever (Class)**: clones repositories.
- **Database Controller (Class)**: deletes and lists repositories and saves the reports generated by the analysers in the correct location.
- **Analysis Builder (Class)**: delegates the multiple analyses requested to the different analysers available.
- **Report (Class)**: stores the information relative to the analysis done on each file by each different Analyser class.
- **Analysis Adapter (Interface)**: standardizes communication with the Analysers and generates Reports from their output.
- **Analyser (Interface)**: analyses the requested repository.
- **Report Maker (Interface)**: creates a Report based on the output of the related Analyser.
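As a hypothetical illustration (assuming an Express-style server; the real Core class may be organized differently), the wiring between the routes and these components could look like this:

```typescript
import express from "express";

const app = express();

app.get("/download", async (req, res) => {
  const source = req.query.source as string | undefined;
  if (!source) {
    res.json({ success: false, reason: "missing `source` query parameter" });
    return;
  }
  // The Repository Retriever would clone `source` here.
  res.json({ success: true });
});

app.get("/crawl", async (req, res) => {
  // The Analysis Builder would delegate to the requested analysers here,
  // and the Database Controller would persist the generated Reports.
  res.json({ success: true, result: {} });
});

// The container listens on 8081 (mapped to 8011 on the host, see Set Up).
app.listen(8081);
```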
The following **Class Diagram** aims to better explain the structure of the developed program by showing the system's classes, their attributes, operations, and the relationships among objects:

# Development
## Set Up
The procedure described below uses Docker to make the application available at `localhost:8011`:
- Build our image:
`docker build -t feupasso/crawler_st --build-arg CRAWLER_INPUT=<input_path> --build-arg CRAWLER_OUTPUT=<output_path> crawler_st`,
where `<input_path>` is the location where the downloaded repositories are saved, and `<output_path>` is the location where the JSON file containing the analysis results is saved.
- Run the built image, publishing container port 8081 on host port 8011:
`docker run -p 8011:8081 feupasso/crawler_st`
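For example, a hypothetical local run with placeholder paths (not the ones used in production) would be:
`docker build -t feupasso/crawler_st --build-arg CRAWLER_INPUT=/data/repos --build-arg CRAWLER_OUTPUT=/data/reports crawler_st`
followed by
`docker run -p 8011:8081 feupasso/crawler_st`
after which `http://localhost:8011/` should answer with the string explaining how to use the API.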
## Technologies
- **Node.js**
<div align="center"><img width="250" src="https://i.imgur.com/1RgsDhm.png" /></div>
- **Typescript**
<div align="center"><img width="100" src="https://i.imgur.com/tXNNcFI.png" /></div>
- **Docker**
<div align="center"><img width="150" src="https://i.imgur.com/2Tx7pUE.png" /></div>
- **Google Cloud**
<div align="center"><img width="250" src="https://i.imgur.com/zCsIpXz.png" /></div>
- **JScent** (smell analyzer) - [https://github.com/moskirathe/JScent](https://github.com/moskirathe/JScent)
- JScent is a program analyzer that detects code smells. Code smells are potential issues with source code that can correspond to a deeper problem in the program.
- **Nocuous** (metric analyzer) - [https://github.com/h-o-t/nocuous](https://github.com/h-o-t/nocuous)
- Nocuous is a static code analysis tool for JavaScript and TypeScript.
# Operations
Our module is being deployed automatically with the rest of the modules using the CI operations of GitLab, which runs the script `deploy.sh` found at the root of this repository.
As mentioned in the Development section above, we can run the program locally for debugging using Docker; instructions can be found in that section.
When deployed, our application is available at `35.205.24.142:8011`. We can use Postman, for example, to send API requests to this address for testing and monitoring of the program in production.
# Group
- José Guerra - up201706421@edu.up.pt
- Gaspar Pinheiro - up201704700@edu.up.pt
- Pedro Baptista - up201705255@edu.up.pt
- Luís Ramos - up201706253@edu.up.pt