{%hackmd D5Ke1wgzREKAMTq40fd_Ww %}
# BSS dataset README

## npy_files
* sclar_inflow_and_outflow_array:
    * shape: (2976, 100, 2)
    * 2976 timeslots: 2020/7/1~2020/8/29, 30 minutes per interval
    * 100 major-flow stations
    * last axis: 0 is inflow, 1 is outflow
* scalar_volume_array:
    * shape: (2976, 100, 2)
    * 2976 timeslots: 2020/7/1~2020/8/29, 30 minutes per interval
    * 100 major-flow stations
    * last axis: 0 is the start of the timeslot, 1 is the end of the timeslot
* grid_inflow_and_outflow_array:
    * shape: (2976, 100, 2, 7, 7)
    * 2976 timeslots: 2020/7/1~2020/8/29, 30 minutes per interval
    * 100 major-flow stations
    * third axis: 0 is inflow, 1 is outflow
    * last two axes (7, 7): x, y grid coordinates
* grid_volume_array:
    * shape: (2976, 100, 2, 7, 7)
    * 2976 timeslots: 2020/7/1~2020/8/29, 30 minutes per interval
    * 100 major-flow stations
    * third axis: 0 is the start of the timeslot, 1 is the end of the timeslot
    * last two axes (7, 7): x, y grid coordinates
* Training / test arrays:
    * same layout as described above, but with a different number of timeslots
    * training covers 1920 timeslots (40 days × 48 timeslots per day), and test covers 960 timeslots (20 days × 48 timeslots per day)
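
A minimal NumPy sketch of how the documented axes might be indexed. It assumes that timeslot 0 starts at 2020/7/1 00:00 and that slots are contiguous 30-minute steps; the helper names (`slot_to_time`, `split_scalar`, `split_grid`) are hypothetical, and small stand-in arrays replace the real files, which would normally be read with `np.load`.

```python
from datetime import datetime, timedelta

import numpy as np

SLOT_MINUTES = 30                        # 30 minutes per timeslot
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 timeslots per day
START = datetime(2020, 7, 1)             # assumption: slot 0 starts at 2020/7/1 00:00
TRAIN_SLOTS = 40 * SLOTS_PER_DAY         # 1920
TEST_SLOTS = 20 * SLOTS_PER_DAY          # 960


def slot_to_time(slot: int) -> datetime:
    """Start time of a timeslot index, assuming slots are contiguous from START."""
    return START + timedelta(minutes=SLOT_MINUTES * slot)


def split_scalar(arr: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """(T, 100, 2) -> two (T, 100) arrays, split on the last axis.

    Flow array: (inflow, outflow). Volume array: (start of slot, end of slot).
    """
    return arr[..., 0], arr[..., 1]


def split_grid(arr: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """(T, 100, 2, 7, 7) -> two (T, 100, 7, 7) arrays, split on axis 2."""
    return arr[:, :, 0], arr[:, :, 1]


# Stand-in data covering one day; the real arrays would come from np.load(...).
T = SLOTS_PER_DAY
scalar = np.arange(T * 100 * 2).reshape(T, 100, 2)
grid = np.zeros((T, 100, 2, 7, 7))

inflow, outflow = split_scalar(scalar)
grid_in, grid_out = split_grid(grid)
print(inflow.shape, grid_in.shape)  # (48, 100) (48, 100, 7, 7)
print(slot_to_time(1))              # 2020-07-01 00:30:00
```

With 48 timeslots per day, the 2976 slots of the full arrays correspond to 62 days, and the 1920 + 960 training/test slots to 60 days.
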
## csv_files

* station_info.csv
    * station_id: this station's ID in the BSS system
    * station_name: this station's name in the BSS system
    * station_lat: this station's latitude
    * station_lng: this station's longitude
* major_flow_station_info.csv
    * data for the 100 major-flow stations, selected by total-flow rank
    * index: this station's position in station_info.csv
    * total_flow: this station's inflow + outflow over the two months
    * station_id: this station's ID in the BSS system
    * station_name: this station's name in the BSS system
* major_flow.csv
    * for every major-flow station, its inflow from and outflow to every other major-flow station
    * split into two months (July and August)
    * the index is the current station's name
* data checker:
    * Not split:
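
To attach coordinates from station_info.csv to the rows of major_flow_station_info.csv, the `index` column can serve as a row lookup. This is a minimal standard-library sketch; it assumes `index` is a 0-based position among the data rows of station_info.csv (the README does not say whether it is 0- or 1-based) and that both files have header rows with the column names listed above. `load_rows`, `attach_coordinates`, and the inline CSV text are hypothetical placeholders, not real data.

```python
import csv
import io


def load_rows(f) -> list[dict[str, str]]:
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(f))


def attach_coordinates(stations: list[dict[str, str]],
                       major: list[dict[str, str]]) -> list[dict]:
    """Copy station_lat / station_lng from station_info rows onto each major-flow row.

    Assumes `index` is a 0-based position among the data rows of station_info.csv.
    """
    merged = []
    for row in major:
        info = stations[int(row["index"])]
        # Sanity check that the lookup landed on the same station.
        if info["station_id"] != row["station_id"]:
            raise ValueError(f"index {row['index']} does not point to station {row['station_id']}")
        merged.append({**row,
                       "station_lat": float(info["station_lat"]),
                       "station_lng": float(info["station_lng"])})
    return merged


# Placeholder contents (not real data); in practice pass open("station_info.csv") etc.
station_info = io.StringIO(
    "station_id,station_name,station_lat,station_lng\n"
    "S1,Station A,40.70,-74.00\n"
    "S2,Station B,40.75,-73.98\n"
)
major_info = io.StringIO(
    "index,total_flow,station_id,station_name\n"
    "1,500,S2,Station B\n"
)

merged = attach_coordinates(load_rows(station_info), load_rows(major_info))
print(merged[0]["station_name"], merged[0]["station_lat"])  # Station B 40.75
```

The station_id comparison raises early if the 0-based assumption is wrong for the real files.
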

### Min/max values for training and testing
* station level:

* region level (Manhattan)

* region level (all of New York City)
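
The min/max values above are split into station-level and region-level statistics. A minimal sketch of computing both from a training array shaped like the scalar arrays, (T, N, 2), and of using them for min-max scaling, which is one common use of such values (an assumption, not something this README states). The function names are hypothetical; restricting to a region such as Manhattan would need a per-station mask (e.g. `train[:, is_manhattan]`, with `is_manhattan` a hypothetical boolean array of length 100) that this README does not provide.

```python
import numpy as np


def minmax_stats(train: np.ndarray, per_station: bool) -> tuple[np.ndarray, np.ndarray]:
    """Min and max of a (T, N, 2) training array.

    per_station=True  -> station level: reduce over time only, results shaped (N, 2)
    per_station=False -> region level: reduce over time and all N stations, results shaped (2,)
    """
    axes = (0,) if per_station else (0, 1)
    return train.min(axis=axes), train.max(axis=axes)


def minmax_scale(x: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Scale x with training min/max so training values fall in [0, 1].

    Where hi == lo the span is replaced by 1 to avoid division by zero.
    Test values outside the training range land outside [0, 1].
    """
    span = np.where(hi > lo, hi - lo, 1)
    return (x - lo) / span


# Stand-in training data: 4 timeslots, 3 stations, 2 channels (e.g. inflow, outflow).
train = np.arange(24, dtype=float).reshape(4, 3, 2)

lo, hi = minmax_stats(train, per_station=True)     # station level
rlo, rhi = minmax_stats(train, per_station=False)  # region level (all stations passed in)
scaled = minmax_scale(train, lo, hi)
print(lo.shape, rlo.shape)  # (3, 2) (2,)
```

Broadcasting lets the same `minmax_scale` call work with either the (N, 2) station-level or the (2,) region-level statistics.
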
