# TX-7093 Notes
## 1. Research
### 1.1 Map API
#### OpenLayers
- Measure example: https://openlayers.org/en/latest/examples/measure.html
- Can be used for zone decomposition
- Not very well maintained
- Very customizable and flexible, but complicated
#### Leaflet
- More active and popular
- Leaflet.draw docs: http://leaflet.github.io/Leaflet.draw/docs/leaflet-draw-latest.html
- Simpler, but less flexible
### 1.2 Python modules
- Working with coordinates: https://pyproj4.github.io/pyproj/stable/
- Measuring: https://pygeos.readthedocs.io/en/stable/index.html
- PostGIS is also an option for working with a large database
### 1.3 Frameworks to use
Multiple options are available for the UI:
- Angular
  - Better dev experience
  - May lack some libraries, but so far everything needed has been available
- React
  - Larger community, better library support
- Plain HTML/JS
  - Might be complicated
  - Not an option

Both frameworks support Leaflet and OpenLayers; Angular is preferred for now.

For the server side (the data is too large to process on the client):
- Python (Django, Flask)
  - Flask is better for small applications
  - Python is good for data processing
## 2. Brainstorming
Server-side rendering is not required since we don't need SEO. Client-side rendering is a better fit: Angular and React support it well, which makes it easier to deliver good UI/UX.
### 2.1 Client side
Provide a simple UI with a centered, fullscreen map.
- Toolbar capable of:
  - Zone division: click on the map to draw polygons and lines
  - Calculating data on demand or in real time (real time seems overcomplicated, since we would have to recalculate on every change)
  - Choosing which kind of data to display
### 2.2 Server side
- Store data in a database
- Provide APIs for calculation:
  - The user submits a zone division and gets back the resulting data
  - The request can specify which data is wanted, to avoid unnecessary calculation
  - Some features/indicators can be enabled or disabled
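
A minimal sketch of the "calculate only what was asked for" idea, assuming a hypothetical registry of indicator functions. The names `INDICATORS`, `compute_indicators`, both indicators, and the `"F"` value for `sex` are illustrative assumptions, not part of the project.

```python
from typing import Callable, Dict, List

# Hypothetical kid record: a dict holding only the fields used below.
Kid = Dict[str, object]


def count_kids(kids: List[Kid]) -> float:
    """Number of kids in the submitted zone."""
    return float(len(kids))


def share_of_girls(kids: List[Kid]) -> float:
    """Share of kids whose sex is "F" (assumed encoding)."""
    if not kids:
        return 0.0
    return sum(1 for k in kids if k["sex"] == "F") / len(kids)


# Registry of enabled indicators; disabling one means removing it here.
INDICATORS: Dict[str, Callable[[List[Kid]], float]] = {
    "kid_count": count_kids,
    "girl_share": share_of_girls,
}


def compute_indicators(kids: List[Kid], requested: List[str]) -> Dict[str, float]:
    """Compute only the requested indicators for one zone."""
    unknown = [name for name in requested if name not in INDICATORS]
    if unknown:
        raise KeyError(f"unknown or disabled indicators: {unknown}")
    return {name: INDICATORS[name](kids) for name in requested}
```

A request that asks only for `"kid_count"` never triggers the other computations, which is the point of letting the client choose.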
### 2.3 Deployment
The final application can be hosted on a single server. A simple CI/CD job would be configured to automatically run and deploy it.
### 2.4 Problems
This project is mostly about how we can process data quickly and efficiently on the server.
- How to store data? In memory or persistently?
- Read from file or store in database?
- Calculate multiple indicators/features in one query.
- How powerful does the server need to be?

Also, how can we handle local data?
- City-level place markers (restaurants, bars, bus stops, etc.)
- Means of transport? If we want to calculate the travel time to school, do we have to consider the transport system? => Assume walking
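
As a rough sketch of the walking assumption, the travel time can be estimated from the great-circle (haversine) distance between a house and a school. The walking speed and the detour factor below are assumptions chosen for illustration; a real estimate would need the street network.

```python
import math

EARTH_RADIUS_M = 6_371_000        # mean Earth radius in metres
WALKING_SPEED_M_PER_MIN = 80.0    # assumption: roughly 4.8 km/h


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walking_minutes(house, school, detour_factor: float = 1.3) -> float:
    """Estimated walking time; detour_factor (assumed) inflates the straight line."""
    straight = haversine_m(house[0], house[1], school[0], school[1])
    return straight * detour_factor / WALKING_SPEED_M_PER_MIN
```

Both `house` and `school` are `(lat, lon)` tuples, matching the `lat`/`lon` columns of the database design.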
## 3. Application design
### 3.1 Simple database
```plantuml
class Street {
#street_id VARCHAR
street_name VARCHAR
}
class Family {
#family_id INTEGER
-house_id VARCHAR
quotient_familial DOUBLE PRECISION
}
class House {
#house_id VARCHAR
address VARCHAR
house_number INTEGER
multiplicative VARCHAR
-street_id VARCHAR
lat DOUBLE PRECISION
lon DOUBLE PRECISION
}
class School {
#school_id SERIAL
school_name VARCHAR
school_address VARCHAR
school_type VARCHAR
lat DOUBLE PRECISION
lon DOUBLE PRECISION
color VARCHAR
}
class SchoolStreetAssoc {
#assoc_id SERIAL
-decomposition_id INTEGER
-school_id INTEGER
-street_id VARCHAR
parity VARCHAR
n_start INTEGER
n_start_multiplicative VARCHAR
n_end INTEGER
n_end_multiplicative VARCHAR
}
class Kid {
#kid_id INTEGER
-family_id INTEGER
-current_school_id INTEGER
dob DATE
level VARCHAR
sex VARCHAR
}
class Decomposition {
#decomposition_id SERIAL
decomposition_name VARCHAR
created_at TIMESTAMPTZ
}
class Boundary {
#boundary_id SERIAL
-decomposition_id INTEGER
-school_id INTEGER
polygons JSON
}
Family "*" -- "1" House : live_in
Family "1" -right- "*" Kid : has
Kid "*" -right- "1" School : register_in
House "*" -- "1" Street : belong to
(School, Street) .. SchoolStreetAssoc
Decomposition "1" -- "*" SchoolStreetAssoc
(Decomposition, School) .. Boundary
```
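
One possible reading of `SchoolStreetAssoc` is that each row assigns a range of house numbers on a street, filtered by parity, to a school. The sketch below works under that assumption; the parity values `"even"`, `"odd"`, and `"all"` are hypothetical, and the `multiplicative` fields are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SchoolStreetAssoc:
    school_id: int
    street_id: str
    parity: str   # assumed values: "even", "odd", "all"
    n_start: int
    n_end: int


def parity_matches(parity: str, house_number: int) -> bool:
    """Check a house number against the (assumed) parity encoding."""
    if parity == "all":
        return True
    if parity == "even":
        return house_number % 2 == 0
    if parity == "odd":
        return house_number % 2 == 1
    raise ValueError(f"unexpected parity: {parity!r}")


def find_school(street_id: str, house_number: int,
                assocs: Iterable[SchoolStreetAssoc]) -> Optional[int]:
    """Return the school of the first association covering the house, if any."""
    for a in assocs:
        if (a.street_id == street_id
                and a.n_start <= house_number <= a.n_end
                and parity_matches(a.parity, house_number)):
            return a.school_id
    return None
```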
### 3.2 APIs
#### Houses
```
/houses/
/houses/?house_number[gte]=10&house_number[lte]=100&street_id[exact]=100000
```
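
The `field[operator]=value` filter syntax above could be parsed on the server roughly as follows. This is a sketch using only the standard library; `parse_filters`, `matches`, and the operator table are hypothetical helpers, not the project's actual implementation.

```python
import operator
import re
from urllib.parse import parse_qsl, urlsplit

# Matches keys such as "house_number[gte]".
FILTER_KEY = re.compile(r"^(\w+)\[(\w+)\]$")

# Supported operators (only the three seen in the example URL).
OPS = {"gte": operator.ge, "lte": operator.le, "exact": operator.eq}


def parse_filters(url: str):
    """Return a list of (field, operator, raw_value) triples from the query string."""
    filters = []
    for key, value in parse_qsl(urlsplit(url).query):
        m = FILTER_KEY.match(key)
        if m and m.group(2) in OPS:
            filters.append((m.group(1), m.group(2), value))
    return filters


def matches(house: dict, filters) -> bool:
    """Check one house record against every filter."""
    for field, op, raw in filters:
        value = house[field]
        # Cast the raw string to the type of the stored value before comparing.
        if not OPS[op](value, type(value)(raw)):
            return False
    return True
```

In a real service the same triples would more likely be translated into a database query than applied in Python.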
#### Schools
```
# List of schools
/schools/
```
#### Decompositions
```
# List of saved decompositions
/decompositions/
# List of associations in a decomposition
/decompositions/{id}/schoolstreetassocs/
# List of boundaries in a decomposition
/decompositions/{id}/boundaries/
```
## 4. Problems & Solutions
### 4.1 Decomposition Visualisation (polygons)
The first problem is how to visualize the zone associated with a school. A simple convex-hull approach does not work well, because the points of different schools are not necessarily separable by convex polygons.

__Solution (not very good):__
1) Build a Voronoi diagram
2) Merge the cells associated with the same school (`sjoin` is a geopandas function rather than a shapely one; geopandas `dissolve` or shapely `unary_union` can do the merge)

__Solution (potential):__
1) Build an overfitted classifier (SVM, DecisionTree)
2) Find the decision boundary and convert it to GeoJSON
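
To illustrate the classifier idea without external libraries, the sketch below swaps in a 1-nearest-neighbour classifier (not the SVM or decision tree named above). It fits the training points perfectly, and its decision regions are exactly the merged Voronoi cells of the first solution. The grid resolution and the use of plain coordinate differences as distance are simplifying assumptions, and the conversion to GeoJSON is left out.

```python
def nearest_school(point, labelled_houses):
    """1-NN: return the school label of the closest (x, y, school) house."""
    x, y = point
    closest = min(labelled_houses, key=lambda h: (h[0] - x) ** 2 + (h[1] - y) ** 2)
    return closest[2]


def label_grid(labelled_houses, x_range, y_range, steps):
    """Classify the centre of every cell of a steps x steps grid."""
    (xmin, xmax), (ymin, ymax) = x_range, y_range
    grid = []
    for j in range(steps):
        y = ymin + (ymax - ymin) * (j + 0.5) / steps
        row = []
        for i in range(steps):
            x = xmin + (xmax - xmin) * (i + 0.5) / steps
            row.append(nearest_school((x, y), labelled_houses))
        grid.append(row)
    return grid


def boundary_cells(grid):
    """Cells (i, j) whose right or lower neighbour has a different label."""
    cells = set()
    for j, row in enumerate(grid):
        for i, label in enumerate(row):
            if i + 1 < len(row) and row[i + 1] != label:
                cells.add((i, j))
            if j + 1 < len(grid) and grid[j + 1][i] != label:
                cells.add((i, j))
    return cells
```

The cells returned by `boundary_cells` approximate the decision boundary; a finer grid gives a smoother outline at a higher cost.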
### 4.2 Save & load decomposition
How to efficiently save and load a decomposition for visualisation (cf. the database diagram above):
- Each decomposition should be saved separately.
- The polygons used for visualisation should be pre-computed on each save (static).
- To visualize, fetch `/schools/{id}/decompositions/{id}/boundaries/`
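
A sketch of how pre-computed `Boundary` rows could be served as GeoJSON for the map. It assumes, hypothetically, that the `polygons` JSON column already stores MultiPolygon coordinates in GeoJSON order (`[lon, lat]`); the function names are illustrative.

```python
import json


def boundary_to_feature(boundary: dict) -> dict:
    """Convert one Boundary row (as a dict) into a GeoJSON Feature."""
    return {
        "type": "Feature",
        "properties": {
            "boundary_id": boundary["boundary_id"],
            "school_id": boundary["school_id"],
        },
        # Assumption: `polygons` holds MultiPolygon coordinates as stored on save.
        "geometry": {"type": "MultiPolygon", "coordinates": boundary["polygons"]},
    }


def boundaries_to_geojson(boundaries) -> str:
    """Serialize all boundaries of a decomposition as a FeatureCollection."""
    features = [boundary_to_feature(b) for b in boundaries]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Because the polygons are computed at save time, loading a decomposition only involves reading rows and serializing them.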
### 4.3 Address mismatches
The addresses fetched from Overpass and those in the given data do not fully overlap: some addresses in the given data are missing from Overpass, so we can't use Overpass.

**The street_id field is not guaranteed to be correct. Don't use it.**

Another problem: the given data contains addresses that are far outside Les Lilas. Should they all be eliminated?
### 4.4 Anomalies
Some addresses contain multiple house numbers, such as:
```
16-18 rue des bruyeres 93260 les lilas
16/ 18 rue des bruyeres 93260 les lilas
18- 20 rue romain rolland 93260 les lilas
```
Use this regex to detect and fix them, taking the second number (group 5) as the house number:
```
^(\s*)(\d*)(\s*)[\-\\\/](\s*)(\d*)
```
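
A minimal sketch of applying the regex above in Python. The helper name `keep_second_number` is hypothetical; addresses where either number is missing are returned unchanged so they can be fixed manually.

```python
import re

# Same pattern as above: group 2 is the first number, group 5 the second.
MULTI_NUMBER = re.compile(r"^(\s*)(\d*)(\s*)[\-\\\/](\s*)(\d*)")


def keep_second_number(address: str) -> str:
    """Replace a leading "N1-N2" / "N1/ N2" prefix with N2 only."""
    m = MULTI_NUMBER.match(address)
    if m is None or not m.group(2) or not m.group(5):
        # Not a complete number pair: leave it for manual review.
        return address
    return m.group(5) + address[m.end():]


for raw in [
    "16-18 rue des bruyeres 93260 les lilas",
    "16/ 18 rue des bruyeres 93260 les lilas",
    "18- 20 rue romain rolland 93260 les lilas",
]:
    print(keep_second_number(raw))
```

Because the pattern uses `\d*`, it also matches incomplete prefixes such as `46-`, which is why the helper checks that both groups are non-empty.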
Some addresses also need a manual fix:
```
46- rue de paris 93260 les lilas
70/a 78 rue de l egalite 93260 les lilas
chez mr seror richard
julia beaute
```
These will be fixed to:
```
46 rue de paris 93260 les lilas
78 rue de l egalite 93260 les lilas
5 RUE DU HUIT MAI 1945 93260 LES LILAS
120 RUE DE PARIS 93260 LES LILAS
```
> Should addresses without a postal code be treated as Les Lilas?
>
> Do we show all the houses, or only houses that have kids in the current type of school?

- Add a duplicate button for decompositions; rename the save button and make it more visible