# OOP2 Assignment 1 - Group 321
[[_TOC_]]
## Overview
This section gives a quick overview of the server and the client and of how the server-side and client-side classes relate to each other. The details are described in the respective sections below. We extended the class `Util` with methods for printing the output tables on the client side.
### Server
The following class diagram shows the associations of all classes and interfaces on the server-side:

### Client
The following class diagram shows the associations of all classes and interfaces on the client-side:

## Socket connection
Once the analysis server has started, clients can connect to it over a socket connection. For each connecting client, the `AnalysisServer` creates a new thread (a so-called `ConnectionThread`). Within this thread, the client and the server communicate over the established socket connection. This allows the server to communicate with several clients simultaneously.
Every message between the client and the server is sent with an `ObjectOutputStream` and received with an `ObjectInputStream`. These two classes make it possible to serialize and deserialize objects, e.g. `DataSeries`.
When the client tries to execute a command and the server is unreachable, a message is shown on the client side and the application closes.
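The thread-per-client idea described above can be sketched roughly as follows. This is not the project's `AnalysisServer` or `ConnectionThread` code: the class `ConnectionSketch` and its methods `handle` and `roundTrip` are hypothetical names, and the "protocol" simply echoes one object back. It only illustrates how object streams are opened on both ends of a socket.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical sketch: one thread per accepted client, objects exchanged via object streams. */
final class ConnectionSketch {

    /** Handles a single client: reads one object and sends it straight back. */
    static void handle(Socket socket) {
        try (Socket s = socket;
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.flush(); // push the stream header so the peer's ObjectInputStream does not block
            ObjectInputStream in = new ObjectInputStream(s.getInputStream());
            Object request = in.readObject();
            out.writeObject(request);
            out.flush();
        } catch (IOException | ClassNotFoundException e) {
            // A real server would log this; the sketch just ends the handler thread.
        }
    }

    /** Starts a server on a free local port, sends one message as a client and returns the reply. */
    static Object roundTrip(Serializable message) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    Socket client = server.accept();
                    new Thread(() -> handle(client)).start(); // one thread per client
                } catch (IOException e) {
                    // Server socket was closed before a client connected.
                }
            });
            acceptor.start();

            try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                socket.setSoTimeout(5000); // do not wait forever for a reply
                out.flush();
                ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
                out.writeObject(message);
                out.flush();
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Creating and flushing the `ObjectOutputStream` before opening the `ObjectInputStream` on both sides avoids a deadlock, because the `ObjectInputStream` constructor blocks until it has read the peer's stream header.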
### Open Issues
* Each client can execute only one command at a time, so the client has to wait for the server's response before executing another command.
* For some reason, if an `ls` or a `data` command (this includes `linechart`, `multilinechart` and `scatterplot`) has been sent to the server and the client then closes the application using `exit`, it takes a moment until the application has actually closed.
## `ls` Command
When the command `ls` is executed by the client, the `ConnectionThread` calls `readSensors` from the class `DataReader`. This returns a list of sensors, which is then sent to the client. The client prints the sensor data as a table in this format:
| ID | TYPE | LATITUDE | LONGITUDE | LOCATION | METRIC |
| - | - | - | - | - | - |
| <SensorID> | <Type> | <Latitude> | <Longitude> | <Location> | <Metric> |
The following example output shows how the data is displayed:
```
------------------------------------------------------------------------------------
| ID | TYPE | LATITUDE | LONGITUDE | LOCATION | METRIC |
------------------------------------------------------------------------------------
| 1779 | SDS011 | 46.670000 | 14.260000 | 10019 | P1 |
| 1779 | SDS011 | 46.670000 | 14.260000 | 10019 | P2 |
| ... | ... | ... | ... | ... | ... |
```
### `DataReader`
In order to retrieve data from CSV files and sample this data, we introduced a new class `DataReader`. `DataReader` implements the following methods:
* `readSensors(String path)`: Returns a list of the sensors in the given data path (which is usually `data`).
* `readData(String path, DataQueryParameters dqp)`: Returns a DataSeries with the processed DataPoints considering the operation and interval.
* `getAllDataPoints(String path, DataQueryParameters dqp)`: Returns a DataSeries of all DataPoints of the given sensor in the given timespan.
* `interpolate(DataPoint dp1, DataPoint dp2)`: Returns the mean of two DataPoints.
This class is used in `ConnectionThread` whenever an `ls` or a `data` command is received.
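As a rough illustration of what "the mean of two DataPoints" could look like, here is a minimal sketch. The type `SamplePoint` is a hypothetical stand-in for the project's `DataPoint`, and placing the result at the midpoint between the two timestamps is an assumption that the description above does not confirm.

```java
/** Hypothetical sketch of interpolating between two data points. */
final class InterpolationSketch {

    /** Stand-in for the project's DataPoint: a timestamp (epoch seconds) and a value. */
    static final class SamplePoint {
        final long epochSeconds;
        final double value;

        SamplePoint(long epochSeconds, double value) {
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    /** Returns a point with the mean value, placed halfway between the two timestamps (assumption). */
    static SamplePoint interpolate(SamplePoint a, SamplePoint b) {
        long midTime = a.epochSeconds + (b.epochSeconds - a.epochSeconds) / 2;
        double meanValue = (a.value + b.value) / 2.0;
        return new SamplePoint(midTime, meanValue);
    }
}
```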
## `data` Command
When the command `data` is executed by the client, the `ConnectionThread` calls `readData` from the class `DataReader`.
The following shows the data command syntax:
```
data <SensorId> <Metric> <From> <To> <Operation> <Interval>
```
This returns a `DataSeries` (`TreeSet<DataPoint>`), which is then sent to the client. The client prints the result as tables.
First, it prints the Data Query Parameters that were read from the `data` command:
```
---------------------------------
| Data Query Parameters |
---------------------------------
| Sensor ID: 1000 |
| Type: P1 |
| From: 2019-10-01T0:00:00 |
| To: 2019-10-02T0:00:00 |
| Operation: MIN |
| Interval: 86400 |
---------------------------------
```
Below that, the values of the sensor's metric in the given time range are listed. Optionally, the values are processed with an operation (MIN, MAX, MEDIAN, MEAN), or with an operation and an interval (in seconds).
```
-----------------------------------------------------------
| FROM | TO | VALUE |
-----------------------------------------------------------
| 2019-10-01T00:00:00 | 2019-10-02T00:00:00 | 0.400 |
-----------------------------------------------------------
```
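How an operation and an interval might be combined can be sketched as follows. This is only an assumption about the processing, not the code of `readData`: the class `IntervalAggregation`, its methods and the use of a `TreeMap` keyed by epoch seconds are hypothetical. Samples are grouped into buckets of `intervalSeconds` starting at `from`, and the operation is applied to each bucket. Empty intervals are simply omitted in this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: aggregate samples per interval with MIN, MAX, MEAN or MEDIAN. */
final class IntervalAggregation {

    enum Operation { MIN, MAX, MEAN, MEDIAN }

    /** Applies the operation to a non-empty list of values. */
    static double apply(Operation op, List<Double> values) {
        switch (op) {
            case MIN:
                return Collections.min(values);
            case MAX:
                return Collections.max(values);
            case MEAN: {
                double sum = 0.0;
                for (double v : values) {
                    sum += v;
                }
                return sum / values.size();
            }
            default: { // MEDIAN
                List<Double> sorted = new ArrayList<>(values);
                Collections.sort(sorted);
                int mid = sorted.size() / 2;
                return sorted.size() % 2 == 1
                        ? sorted.get(mid)
                        : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
            }
        }
    }

    /** Maps each interval start (epoch seconds) to the aggregated value of its samples. */
    static TreeMap<Long, Double> aggregate(TreeMap<Long, Double> samples, long from,
                                           long intervalSeconds, Operation op) {
        TreeMap<Long, List<Double>> buckets = new TreeMap<>();
        for (Map.Entry<Long, Double> e : samples.tailMap(from, true).entrySet()) {
            long bucketStart = from + ((e.getKey() - from) / intervalSeconds) * intervalSeconds;
            buckets.computeIfAbsent(bucketStart, k -> new ArrayList<>()).add(e.getValue());
        }
        TreeMap<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, List<Double>> e : buckets.entrySet()) {
            result.put(e.getKey(), apply(op, e.getValue()));
        }
        return result;
    }
}
```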
## `scatterplot` Command
When the `scatterplot` command is called, a query containing the command-line values is sent to the `data` command. If the command specifies `width` followed by a number greater than 600 and `height` followed by a number greater than 600, these values are used as the width and height of the resulting image.
```
> scatterplot 1000 P1 7646 P2 2018-01-01 2019-03-30 graphs/scatterplots/image1.png MAX 2d width 5000 height 3000
```
If the requested image path contains subfolders that do not exist, these folders are created, provided the client has the required access rights.
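Creating the missing folders could look roughly like this. It is a minimal sketch, and the class `ImagePathSketch` and the method `ensureParentDirectories` are hypothetical names, not taken from the project. `Files.createDirectories` creates all missing parent folders and does nothing if they already exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: make sure the folder for an output image exists. */
final class ImagePathSketch {

    /** Creates all missing parent folders of the image path; returns false if that is not possible. */
    static boolean ensureParentDirectories(Path imagePath) {
        Path parent = imagePath.toAbsolutePath().getParent();
        if (parent == null) {
            return true; // the path has no parent folder, nothing to create
        }
        try {
            Files.createDirectories(parent);
            return true;
        } catch (IOException | SecurityException e) {
            return false; // e.g. missing access rights
        }
    }
}
```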
The two DataSeries requested by `scatterplot` are used to populate the scatterplot. The values are scaled according to the highest and lowest value on each axis, and the axes are divided into correctly numbered segments, which should make the plotted data points easier to read and interpret.
The scatterplot class and the linechart class both inherit from the picture class in order to share common functionality, such as labeling, creating and numbering the segments, and drawing the outline of the coordinate system.
A few new classes have been introduced to make it easier to process and transfer the values needed to create the scatterplot:
### `Point`
The `Point` class simply holds two double values, `x` and `y`, and performs a few simple operations on them. To make creating the Line of Best Fit easier, it can return the squared x-value and the product of both values.
### `LineBestFit`
The `LineBestFit` class contains methods to create the values for a Line of Best Fit. Upon initialization, the constructor calculates the slope `m` and the intercept `b` from the list of points the line should describe. Furthermore, the class can return the lowest and highest x-values within a given bound, as well as the value `y` for any given `x`.
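The slope and intercept of such a line are commonly obtained with ordinary least squares. The following sketch assumes that this is the method used, which the description above does not state explicitly. The class `BestFitSketch` is a hypothetical stand-in for `LineBestFit` and takes plain arrays instead of `Point` objects. The sums of `x * x` and `x * y` correspond to the squared and multiplied values that `Point` provides.

```java
/** Hypothetical sketch of a least-squares line y = m * x + b. */
final class BestFitSketch {
    final double m; // slope
    final double b; // intercept

    BestFitSketch(double[] xs, double[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        int n = xs.length;
        double sumX = 0.0;
        double sumY = 0.0;
        double sumXY = 0.0;
        double sumXX = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += xs[i];
            sumY += ys[i];
            sumXY += xs[i] * ys[i];
            sumXX += xs[i] * xs[i];
        }
        double denominator = n * sumXX - sumX * sumX;
        if (n == 0 || denominator == 0.0) {
            throw new IllegalArgumentException("need at least two distinct x-values");
        }
        // m = (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
        m = (n * sumXY - sumX * sumY) / denominator;
        // b = (sum(y) - m * sum(x)) / n
        b = (sumY - m * sumX) / n;
    }

    /** Returns the y-value of the line for a given x. */
    double valueAt(double x) {
        return m * x + b;
    }
}
```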
### `Scaler`
The `Scaler` class contains information on how the coordinate system is to be drawn and annotated, as well as the scaling values needed to place all `Point`s in the correct position.
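Scaling a value to a position on an axis can be illustrated with a simple min-max mapping. This is a sketch under assumptions: the class `AxisScalerSketch`, its fields and the rounding to whole pixels are hypothetical and not taken from the project's `Scaler`.

```java
/** Hypothetical sketch: map a data value linearly onto a pixel range of an axis. */
final class AxisScalerSketch {
    final double min;      // lowest data value on this axis
    final double max;      // highest data value on this axis
    final int pixelStart;  // pixel position of the lowest value
    final int pixelLength; // length of the axis in pixels

    AxisScalerSketch(double min, double max, int pixelStart, int pixelLength) {
        this.min = min;
        this.max = max;
        this.pixelStart = pixelStart;
        this.pixelLength = pixelLength;
    }

    /** Returns the pixel position of the value; a flat range is placed in the middle of the axis. */
    int toPixel(double value) {
        if (max == min) {
            return pixelStart + pixelLength / 2;
        }
        double fraction = (value - min) / (max - min);
        return pixelStart + (int) Math.round(fraction * pixelLength);
    }
}
```

For a y-axis, where pixel coordinates usually grow downwards, the same mapping could be applied with the result subtracted from the lower edge of the drawing area.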
## `linechart` Command
Calling the `linechart` command saves a linechart picture in the desired folder. The size of the linechart depends on the command-line input. The only restriction is that height and width must be at least 600. If no size is given in the command, the picture has a default size of 1920 x 1080.
```
> linechart 1000 P2 2018-01-01 2019-03-30 data/images/image.png MEAN width 3000 height 2000
```
The heading of the chart shows the processed time frame, the sensor and the operation (MEAN, MAX, MIN). The axis labels show the time unit and the unit of measurement of the metric.
The linechart inherits from the Picture class to access the functions it needs. The scale factor is calculated from the earliest and the latest date of the DataSeries and from the difference between the highest and the lowest value of the metric.
The following classes are used to simplify the creation:
### `Annotation`
The `Annotation` class saves the important data needed for labelling the chart. It contains the mode of time (min, secs, h etc.), value of metric and number of segments for the time-axis.
### `LCPoint`
The `LCPoint` class contains a value and a timestamp, so that each `LCPoint` contains a datapoint with its date.
### `LCScaler`
The `LCScaler` class inherits all attributes from `Scaler` and adds a value for the time mode in order to label the time-axis correctly.
## Cache
The Analysis Server caches sensor data from previous requests in RAM. For this purpose, we created a new class `DataCache`, which extends `TreeMap<String, DataSeries>`. `DataCache` implements our own interface `Cache` with the following method:
* `keyExists(String key)`: Returns whether the DataCache contains the key.
### Open Issues
* We implemented a very simple cache, which stores all data of a command unless exactly the same command has been executed before. Only if an exactly identical request was already made does the `DataReader` fetch the data from the cache.
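The caching behavior can be sketched as follows. The names `CacheSketch` and `DataCacheSketch`, the generic value type (standing in for `DataSeries`) and the helper `getOrLoad` are hypothetical. Only `keyExists` and the `TreeMap` base class come from the description above.

```java
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical stand-in for the Cache interface. */
interface CacheSketch {
    boolean keyExists(String key);
}

/** Hypothetical stand-in for DataCache; V takes the place of DataSeries. */
final class DataCacheSketch<V> extends TreeMap<String, V> implements CacheSketch {
    private static final long serialVersionUID = 1L;

    @Override
    public boolean keyExists(String key) {
        return containsKey(key);
    }

    /** Returns the cached value for the exact command string, loading and storing it on a miss. */
    V getOrLoad(String command, Function<String, V> loader) {
        if (!keyExists(command)) {
            put(command, loader.apply(command));
        }
        return get(command);
    }
}
```

Because the whole command string is the key, two requests that differ only in, say, the interval are treated as unrelated entries, which matches the limitation listed above.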
## Bonus: `history` Command
We implemented an additional command `history` using the Singleton design pattern. The most recent commands (at most 10) executed by the client are stored in the history file `history.cli` and can be viewed by using `history`. An output of this command could look like this:
```
> history
1 data 1000 P2 2019-01-01 2019-03-01 MIN 1d
2 data 1000 P1 2020-02-01 2019-02-02
3 help
```
If the client wants to load a command as the current command, `history` can be used with the number that comes before the command. So if we continue the previous example, the command
```
> history 2
```
would lead to
```
> data 1000 P1 2020-02-01 2019-02-02
```
so the client can add an operation and/or an interval and only has to press enter to execute this command. Unfortunately, it's not trivial to use the arrow keys together with the Java console, so we had to implement the history as a separate command.
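A Singleton that keeps only the most recent commands might look like the sketch below. The class `HistorySketch` and its methods are hypothetical. Persisting the entries to `history.cli` is left out, and listing the oldest entry as number 1 is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a history Singleton that keeps at most 10 commands. */
final class HistorySketch {
    private static final int MAX_ENTRIES = 10;
    private static final HistorySketch INSTANCE = new HistorySketch();

    private final Deque<String> commands = new ArrayDeque<>();

    private HistorySketch() {
        // private constructor: the only instance is INSTANCE
    }

    static HistorySketch getInstance() {
        return INSTANCE;
    }

    /** Appends a command and drops the oldest one once the limit is exceeded. */
    synchronized void add(String command) {
        commands.addLast(command);
        if (commands.size() > MAX_ENTRIES) {
            commands.removeFirst();
        }
    }

    /** Returns a copy of the stored commands, oldest first. */
    synchronized List<String> entries() {
        return new ArrayList<>(commands);
    }

    /** Returns the command with the given 1-based number, as shown in the listing. */
    synchronized String get(int number) {
        return entries().get(number - 1);
    }

    synchronized void clear() {
        commands.clear();
    }
}
```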
## Bonus: `multilinechart` Command
The `multilinechart` command, as the name suggests, allows creating multiple linecharts of different sensors in one picture.
```
multilinechart 8 1000 P1 1503 P1 1665 P1 1693 P1 2043 P1 3993 P1 4021 P1 7646 temperature 2019-01-01 2019-03-30 images/image.png MAX 5d width 5000 height 3500
```
For better readability, at most 8 sensors are allowed at once, but it is always possible to use fewer.
Since the `MultiLineChart` class inherits from the linechart class, it shares all its common traits, like scaling the picture by specifying width and height.
In the end, it uses the same procedure as `linechart` and applies the calculation to all `DataSeries` elements it receives.
### `SensorInfo`
The `SensorInfo` class helps to simplify the process of creating the necessary `DataSeries` by containing the `id` and the `metric` of each sensor.
# OOP2 Assignment 2 - Group 321
## Overview
The second part of this assignment deals with clustering data. For this reason, we implemented the `cluster` command, which uses the implementation of the `data` command from Assignment 1 (with a few more features regarding interpolation). The requested data is clustered with the SOM algorithm, which is executed on the server (see [`cluster` command](#cluster-command) for a more detailed description). The intermediate results and the final result of the clustering are then sent back to the client in JSON format. The client stores the JSON files of the intermediate results and the final result in the corresponding directory `clusteringResults/<resultID>`. After storing the JSON files, the client can print information about a specific cluster using the `inspectcluster` command or plot the result(s) of a specific cluster using the `plotcluster` command. The client can also list all clustering results using the `listresults` command or delete a specific result using the `rm` command.
### Done
* `cluster` command
* SOM algorithm
* `listresults` command
* `rm` command
* `inspectcluster` command
* `plotcluster` command
* Different Kernel Functions (Bonus)
### Missing
* Operations for heatmaps
## `cluster` Command
The `cluster` command creates a Self-Organizing Map (SOM) according to the input parameters. Either all sensors are used for a given metric or just the specified IDs. DataPoints are selected in the same way as for the `data` command, where data is read either from disk or from the cache. The `length` specifies the number of weights of each node and the size of each input vector. If the resulting DataSeries of a sensor cannot be divided evenly by `length`, clustering is aborted. `gridHeight` and `gridWidth` specify the height and the width (number of nodes) of the cluster, respectively. `updateRadius` defines how large the initial neighborhood radius should be in relation to the cluster diagonal. `learningRate` specifies the initial learning rate, `iterationPerCurve` specifies the number of training cycles for the SOM algorithm, `resultID` is the identifier for the storage directory and the files, and `amountOfIntermediateResults` defines the number of intermediate results that should be sent to the client.
The implemented SOM algorithm initializes the weights of each node by randomly selecting from all input values. In each iteration, the best-matching unit (BMU) of the SOM is calculated for an input vector, and the input vector is added to the member list of the BMU. Based on the current neighborhood radius, neighboring nodes are selected and updated; the adjustment is reduced with increasing distance to the BMU. When the SOM has been trained on all input vectors, the learning rate and the neighborhood radius are updated and another round of training starts.
In order to send intermediate results to the client, a worker thread is created on the server, so that the server's main thread is able to send the intermediate results to the client. Upon receiving a result, the client transforms it to JSON and stores it in the correct directory.
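One training step as described above can be sketched like this. It is not the project's implementation: the class `SomStepSketch` and its methods are hypothetical, the BMU is chosen by Euclidean distance, and the linear falloff of the adjustment with grid distance is an assumption chosen for brevity. Member lists as well as the decay of the learning rate and radius between rounds are left out.

```java
/** Hypothetical sketch of a single SOM training step on a grid of weight vectors. */
final class SomStepSketch {

    /** Squared Euclidean distance between a node's weights and an input vector. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns {row, column} of the best-matching unit, i.e. the node closest to the input. */
    static int[] findBmu(double[][][] weights, double[] input) {
        int[] best = {0, 0};
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double d = squaredDistance(weights[r][c], input);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = new int[] {r, c};
                }
            }
        }
        return best;
    }

    /** Moves the BMU and its neighbors within the radius towards the input vector. */
    static void trainOnInput(double[][][] weights, double[] input, double learningRate, double radius) {
        int[] bmu = findBmu(weights, input);
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                double gridDistance = Math.hypot(r - bmu[0], c - bmu[1]);
                if (gridDistance > radius) {
                    continue; // outside the current neighborhood
                }
                // Assumption: linear falloff, 1.0 at the BMU and smaller further away.
                double influence = 1.0 - gridDistance / (radius + 1.0);
                for (int i = 0; i < input.length; i++) {
                    weights[r][c][i] += learningRate * influence * (input[i] - weights[r][c][i]);
                }
            }
        }
    }
}
```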
## `listresults` Command
This command lists all finished clustering results. This is technically realized by listing all directory names inside the folder `clusteringResults`.
The command below provides an example of `listresults`:
```
listresults
```
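Listing the directory names could be done roughly as follows. The class `ListResultsSketch` and the method `listResults` are hypothetical, and sorting the names alphabetically is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: result IDs are the names of the sub-directories of the results folder. */
final class ListResultsSketch {

    /** Returns the names of all directories directly inside the given folder, sorted by name. */
    static List<String> listResults(File resultsRoot) {
        List<String> ids = new ArrayList<>();
        File[] children = resultsRoot.listFiles(File::isDirectory);
        if (children != null) { // null if the folder does not exist or cannot be read
            for (File child : children) {
                ids.add(child.getName());
            }
        }
        Collections.sort(ids);
        return ids;
    }
}
```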
## `rm` Command
This command removes the results of a finished clustering query. More precisely, it removes the whole directory of the finished clustering query given by the result ID.
The command below provides an example of `rm`:
```
rm 0xCAFE
```
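Removing a whole result directory requires deleting its contents first. A minimal sketch, with the hypothetical names `RemoveResultSketch`, `deleteRecursively` and `removeResult`:

```java
import java.io.File;

/** Hypothetical sketch: remove the directory of a finished clustering result. */
final class RemoveResultSketch {

    /** Deletes a file, or a directory together with everything inside it. */
    static boolean deleteRecursively(File target) {
        File[] children = target.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                if (!deleteRecursively(child)) {
                    return false;
                }
            }
        }
        return target.delete();
    }

    /** Removes the directory of the given result ID; returns false if it does not exist. */
    static boolean removeResult(File resultsRoot, String resultId) {
        File directory = new File(resultsRoot, resultId);
        return directory.isDirectory() && deleteRecursively(directory);
    }
}
```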
## `inspectcluster` Command
The `inspectcluster` command shows information about the entered cluster, given by the height and the width index. The boolean defines whether all members should be listed or not.
The command below provides an example of `inspectcluster`:
```
inspectcluster 0xCAFE 1 1 true
```
## `plotcluster` Command
Depending on the `<resultID>` (first argument), the command draws the associated final result of the query or, depending on `<boolPlotAllFrames>` (last argument), the intermediate results. The `<boolPlotClusterMember>` (fourth argument) defines whether all members or only the weights of the nodes are drawn. The video is created on the basis of this command.
To clarify `clusterPlotHeight` and `clusterPlotWidth`: in our implementation, these two arguments define the dimensions of a single cluster. This means that for, e.g., 5x6 clusters in one single plot, the height of the whole image is `5 * clusterPlotHeight (+ some margin)` and the width of the whole image is `6 * clusterPlotWidth (+ some margin)`.
The command below provides an example of `plotcluster`:
```
plotcluster 0xCAFE 300 400 true NONE false
```
## Bonus: Different Kernel Functions
A Gaussian kernel has been implemented. It updates nodes within the specified radius less the further they are away from the BMU.
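A Gaussian neighborhood function is typically of the form `exp(-d² / (2σ²))`, where `d` is the grid distance to the BMU. The sketch below assumes that `σ` equals the current radius and that nodes outside the radius are not updated at all. Both are assumptions, and the class `GaussianKernelSketch` is a hypothetical name.

```java
/** Hypothetical sketch of a Gaussian neighborhood kernel for SOM updates. */
final class GaussianKernelSketch {

    /** Returns a factor in [0, 1]: 1 at the BMU, smaller with distance, 0 outside the radius. */
    static double influence(double gridDistance, double radius) {
        if (gridDistance > radius) {
            return 0.0; // outside the neighborhood
        }
        if (radius <= 0.0) {
            return 1.0; // only the BMU itself (distance 0) reaches this point
        }
        return Math.exp(-(gridDistance * gridDistance) / (2.0 * radius * radius));
    }
}
```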