# TX-7093 Notes

## 1. Research

### 1.1 Map API

#### OpenLayers

https://openlayers.org/en/latest/examples/measure.html

- This can be used for zone decomposition
- Not very well maintained
- Very customizable & flexible, but complicated

#### Leaflet

http://leaflet.github.io/Leaflet.draw/docs/leaflet-draw-latest.html

- More active & popular
- Simpler, less flexible

### 1.2 Python modules

- Working with coordinates: https://pyproj4.github.io/pyproj/stable/
- Measuring: https://pygeos.readthedocs.io/en/stable/index.html
- PostGIS can also be an option for working with a large database

### 1.3 Frameworks to use

Multiple options are available for the UI:

- Angular
  - Better dev experience
  - Might lack libraries, but so far everything is fine
- React
  - Larger community, better library support
- Plain HTML/JS
  - Might be complicated
  - Not an option

Both frameworks support Leaflet and OpenLayers; Angular is preferred for now.

For the server side (the data is large, so it can't be processed on the client side):

- Python (Django, Flask)
  - Flask is better for small applications
  - Python is good for data processing

## 2. Brainstorming

Server-side rendering is not required since we don't need SEO. Client-side rendering is better since we have powerful support from the Angular/React frameworks, which makes it easier to deliver good UI/UX.

### 2.1 Client side

Provide a simple UI with a centered, full-screen map.

- Toolbar capable of
  - Zone division: click on the map to draw polygons, lines
  - Option to calculate data on demand/real-time (real-time seems overcomplicated since we have to recalculate every time)
  - Choose which kind of data to display

### 2.2 Server side

- Store data in a database
- Provide APIs for calculation:
  - User submits a zone division, then gets back the resulting data
  - Can specify which data we want, to avoid over-calculating
  - Some features/indicators can be disabled/enabled

### 2.3 Deployment

The final application can be hosted on a single server. A simple CI/CD job would be configured to automatically run and deploy it.

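
As a rough illustration of the server-side calculation described in 2.2 (the user submits a zone division and gets data back), here is a minimal standard-library sketch that counts houses per submitted zone using a ray-casting point-in-polygon test. The function names, the `lon`/`lat` dictionary keys, and the zone layout are assumptions made for this example only; a real implementation would more likely rely on pygeos or PostGIS as listed in 1.2.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_houses_per_zone(houses, zones):
    """Count houses per zone.

    `houses` is a list of dicts with "lon" and "lat" keys (hypothetical layout);
    `zones` maps a zone name to its polygon. A house is counted in the first
    zone that contains it.
    """
    counts = {name: 0 for name in zones}
    for house in houses:
        for name, polygon in zones.items():
            if point_in_polygon(house["lon"], house["lat"], polygon):
                counts[name] += 1
                break
    return counts
```

Planar ray casting is a reasonable approximation at city scale; points lying exactly on a zone edge may fall on either side, which is one reason to prefer a dedicated geometry library once the prototype grows.
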
### 2.4 Problems

This project is more about how we can process data quickly and efficiently on the server.

- How to store data? In memory or permanently?
- Read from file or store in a database?
- Calculate multiple indicators/features in one query.
- How powerful does the server need to be?

Also, how can we handle local data?

- City-level place markers (restaurants, bars, bus stops, etc.)
- Means of transport? If we want to calculate the time to go to school, do we have to consider the transport system? => Walking

## 3. Application design

### 3.1 Simple database

```plantuml
class Street {
  #street_id VARCHAR
  street_name VARCHAR
}

class Family {
  #family_id INTEGER
  -house_id VARCHAR
  quotient_familial DOUBLE PRECISION
}

class House {
  #house_id VARCHAR
  address VARCHAR
  house_number INTEGER
  multiplicative VARCHAR
  -street_id VARCHAR
  lat DOUBLE PRECISION
  lon DOUBLE PRECISION
}

class School {
  #school_id SERIAL
  school_name VARCHAR
  school_address VARCHAR
  school_type VARCHAR
  lat DOUBLE PRECISION
  lon DOUBLE PRECISION
  color VARCHAR
}

class SchoolStreetAssoc {
  #assoc_id SERIAL
  -decomposition_id INTEGER
  -school_id INTEGER
  -street_id VARCHAR
  parity VARCHAR
  n_start INTEGER
  n_start_multiplicative VARCHAR
  n_end INTEGER
  n_end_multiplicative VARCHAR
}

class Kid {
  #kid_id INTEGER
  -family_id INTEGER
  -current_school_id INTEGER
  dob DATE
  level VARCHAR
  sex VARCHAR
}

class Decomposition {
  #decomposition_id SERIAL
  decomposition_name VARCHAR
  created_at TIMESTAMPTZ
}

class Boundary {
  #boundary_id SERIAL
  -decomposition_id INTEGER
  -school_id INTEGER
  polygons JSON
}

Family "*" -- "1" House : live_in
Family "1" -right- "*" Kid : has
Kid "*" -right- "1" School : register_in
House "*" -- "1" Street : belong to
(School, Street) .. SchoolStreetAssoc
Decomposition "1" -- "*" SchoolStreetAssoc
(Decomposition, School) .. Boundary
```

### 3.2 APIs

#### Houses

```
/houses/
/houses/?house_number[gte]=10&house_number[lte]=100&street_id[exact]=100000
```

#### Schools

```
# List of schools
/schools/
```

#### Decompositions

```
# List of saved decompositions
/decompositions/

# List of associations in a decomposition
/decompositions/{id}/schoolstreetassocs/

# List of boundaries in a decomposition
/decompositions/{id}/boundaries/
```

## 4. Problems & Solutions

### 4.1 Decomposition Visualisation (polygons)

The first problem is how to visualize the zone associated with a school. A simple convex-hull approach does not work well because the points are not necessarily separable by convex polygons.

![Convex-hull approach](./screenshots/convex-hull-boundary.png)

__Solution (not very good):__

1) Build a Voronoi diagram
2) Use shapely to `sjoin` polygons associated with the same school

__Solution (potential):__

1) Build an overfitted classifier (SVM, DecisionTree)
2) Find the decision boundary and turn it into GeoJSON

### 4.2 Save & load decomposition

How to efficiently save & load a decomposition for visualisation (cf. database diagram above):

- Each decomposition should be saved separately.
- The polygon used for visualisation should be pre-computed on each save. (static)
- To visualize, fetch /schools/{id}/decompositions/{id}/boundaries/

### Address mismatches

The addresses fetched from Overpass and the given data don't line up: some addresses in the given data are missing from Overpass, so we can't use Overpass.

**The street_id field is not guaranteed to be correct. Don't use it.**

Another problem is that some addresses in the given data are way too far from Les Lilas (outside of Les Lilas). Should I eliminate all of them?

### Anomalies

There are addresses where multiple numbers are provided, such as:

```
16-18 rue des bruyeres 93260 les lilas
16/ 18 rue des bruyeres 93260 les lilas
18- 20 rue romain rolland 93260 les lilas
```

Use this regex to filter and fix it.
Take the second number as the house number (group 5):

```
^(\s*)(\d*)(\s*)[\-\\\/](\s*)(\d*)
```

There are also some addresses that need a manual fix:

```
46- rue de paris 93260 les lilas
70/a 78 rue de l egalite 93260 les lilas
chez mr seror richard
julia beaute
```

These will be fixed to:

```
46 rue de paris 93260 les lilas
78 rue de l egalite 93260 les lilas
5 RUE DU HUIT MAI 1945 93260 LES LILAS
120 RUE DE PARIS 93260 LES LILAS
```

> Should addresses without a postal code be treated as Les Lilas?

> Do we show all the houses, or only houses that have kids in the current type of school?

- Add a duplicate button for decompositions; rename the save button and make it more visible
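
The double-number rule from the Anomalies section (keep group 5, the second number) could be applied along these lines. This is only a sketch: the function name and the returned `needs_manual_fix` flag are assumptions made for the example, and entries with no leading number pair (such as `chez mr seror richard`) are not detected by the regex and pass through unchanged.

```python
import re

# Same pattern as in the notes: optional number, separator (- \ /), optional number.
NUMBER_PAIR = re.compile(r"^(\s*)(\d*)(\s*)[\-\\\/](\s*)(\d*)")


def normalise_house_number(address):
    """Return (address, needs_manual_fix).

    When the address starts with a pair such as "16-18", the pair is
    replaced by its second number (group 5). When the separator is present
    but the second number is missing (e.g. "46- rue ..."), the address is
    returned untouched and flagged for a manual fix.
    """
    match = NUMBER_PAIR.match(address)
    if match is None:
        return address, False
    second = match.group(5)
    if not second:
        return address, True
    return second + address[match.end():], False
```

For example, `normalise_house_number("16/ 18 rue des bruyeres 93260 les lilas")` keeps `18` as the house number, while `46- rue de paris 93260 les lilas` comes back flagged for manual review.
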