# [LATEST] Oralytics Simulation Environment Reproducibility
## Data Preprocessing
### Pulling Raw Data and Parsing for Brush Time and Phone Engagement
[Code here](https://colab.research.google.com/drive/1TU9U7vGjt8pgxwHNa2DkDQca1wXcJBMp?usp=sharing)
---
All users' raw brushing and phone engagement data are stored in Box. We use the Box API to pull the data, parse it, and reformat it into a dictionary.
#### Setup for Box API
Before using the Box API, we have to register an app and authenticate.
1. Go to https://app.box.com/developers/console and click 'Create New App', then 'Custom App'. You can name it whatever you want, but I named it "Oralytics".

2. Click on your app and go to `Configuration` and make sure App Access Level is set to `App + Enterprise Access`.

3. The Box API will prompt you for a `Developer Token`, which is found under `your app` > `Configuration` > `Developer Token`. Click `Generate Developer Token` to generate a temporary token that you can enter to authenticate your session.
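
The authentication step above could look roughly like the sketch below. It assumes the legacy `boxsdk` Python package (3.x-style API); this document does not state which client library the notebook uses. The helper name `make_box_client` and the placeholder credentials are hypothetical.

```python
def make_box_client(client_id, client_secret, developer_token):
    """Return a Box client authenticated with a temporary developer token.

    Assumption: the legacy `boxsdk` package is installed (pip install boxsdk).
    The import is done inside the function so this file loads without it.
    """
    from boxsdk import Client, OAuth2

    auth = OAuth2(
        client_id=client_id,            # from the app's Configuration page
        client_secret=client_secret,    # from the app's Configuration page
        access_token=developer_token,   # the generated Developer Token
    )
    return Client(auth)
```

A call would then be `client = make_box_client("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "YOUR_DEVELOPER_TOKEN")`, with all three values copied from the developer console. Because the token is temporary, a new one has to be generated once it expires.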


#### Pulling Data
##### Query Info
We use a Box search query to find the specific files and extract their data.
For Brush Time we use:
`accelerometer_query = '"' + "ACCELEROMETER--org.md2k.motionsense--MOTION_SENSE_HRV_PLUS--LEFT_WRIST" + '"'`
For Phone Engagement we use:
``'"' + "TOUCH_SCREEN--org.md2k.phonesensor--PHONE" + '"'``
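
As a rough illustration of how these query strings might be used, the sketch below wraps a stream name in double quotes (presumably so that Box searches for the exact phrase) and passes it to a search call. It assumes `client` is an authenticated client from the legacy `boxsdk` package, whose search endpoint is exposed as `client.search().query(...)`. The names `build_query`, `find_files`, and the `limit` default are hypothetical and not taken from the notebook.

```python
ACCELEROMETER_STREAM = "ACCELEROMETER--org.md2k.motionsense--MOTION_SENSE_HRV_PLUS--LEFT_WRIST"
TOUCH_SCREEN_STREAM = "TOUCH_SCREEN--org.md2k.phonesensor--PHONE"


def build_query(stream_name):
    """Wrap the stream name in double quotes, as in the queries above."""
    return '"' + stream_name + '"'


def find_files(client, stream_name, limit=100):
    """Return (id, name) pairs for Box items matching the quoted stream name.

    Assumption: `client` behaves like boxsdk.Client, so that
    client.search().query(query=..., limit=...) yields items with
    `.id` and `.name` attributes.
    """
    results = client.search().query(query=build_query(stream_name), limit=limit)
    return [(item.id, item.name) for item in results]
```

`find_files(client, ACCELEROMETER_STREAM)` would then list the brush time files, and `find_files(client, TOUCH_SCREEN_STREAM)` the phone engagement files.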
##### Raw Data Parsing
* We consider the morning session to run from 4:00 AM to 4:00 PM and the evening session from 4:01 PM to 3:59 AM (next day).
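
A minimal sketch of this session rule is shown below. It makes two assumptions that this document does not state: timestamps are compared at minute resolution, and a reading taken before 4:00 AM is labeled with the previous calendar day's evening. The function name `session_label` is hypothetical.

```python
from datetime import datetime, timedelta


def session_label(ts):
    """Map a timestamp to a key such as '2018-07-09 Morning'."""
    minutes = ts.hour * 60 + ts.minute
    # Morning: 4:00 AM through 4:00 PM, compared at minute resolution.
    if 4 * 60 <= minutes <= 16 * 60:
        return f"{ts.date().isoformat()} Morning"
    # Evening: 4:01 PM through 3:59 AM. Assumption: readings after
    # midnight are attributed to the previous day's evening session.
    day = ts.date() - timedelta(days=1) if minutes < 4 * 60 else ts.date()
    return f"{day.isoformat()} Evening"
```

For example, `session_label(datetime(2018, 7, 10, 2, 30))` is attributed to the evening of 2018-07-09 under the second assumption.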
After running the script located [here](https://colab.research.google.com/drive/1TU9U7vGjt8pgxwHNa2DkDQca1wXcJBMp#scrollTo=_J-NOyKjkGZq), we obtain a `data_dictionary` of the following form:
`{'robas_1': {'2018-07-09 Evening': 25268.813,
'2018-07-09 Morning': 11.704,
...
'2019-02-28 Evening': 0.0},
'robas_10': {'2018-06-21 Evening': 2748.423,
'2018-06-22 Evening': 3.502,
...}`
where each key is a user and each value is a dictionary mapping a date and session (Morning or Evening) to either brush time or phone engagement time.
#### Formatting Data
After obtaining the dictionary, we use `pandas` to convert it into a CSV file and download it.
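
One way this conversion might look is sketched below. The long-format layout, the column names (`user`, `date`, `session`, `value`), and the function name `dictionary_to_frame` are assumptions made for illustration; the notebook may arrange the CSV differently.

```python
import pandas as pd


def dictionary_to_frame(data_dictionary):
    """Flatten {user: {'YYYY-MM-DD Session': value}} into one row per entry."""
    rows = []
    for user, sessions in data_dictionary.items():
        for key, value in sessions.items():
            # Keys look like '2018-07-09 Evening': a date, a space, a session.
            date, session = key.split(" ")
            rows.append({"user": user, "date": date, "session": session, "value": value})
    return pd.DataFrame(rows, columns=["user", "date", "session", "value"])


sample = {
    "robas_1": {"2018-07-09 Evening": 25268.813, "2018-07-09 Morning": 11.704},
    "robas_10": {"2018-06-21 Evening": 2748.423, "2018-06-22 Evening": 3.502},
}
frame = dictionary_to_frame(sample)
csv_text = frame.to_csv(index=False)  # pass a file path instead to write to disk
```

In Colab, a file written with `frame.to_csv("some_name.csv", index=False)` can then be downloaded from the file browser.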
Unfortunately, we have not yet written a procedure that converts the CSV into the desired format, which uses day in trial instead of raw date.
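
As a starting point only, such a conversion could be sketched as follows. It assumes a long-format table with `user` and `date` columns (hypothetical names) and assumes that day 1 of the trial is each user's earliest recorded date; if the actual trial start dates differ from the first recorded date, they would need to be supplied instead.

```python
import pandas as pd


def add_day_in_trial(df):
    """Add a 1-based 'day_in_trial' column counted from each user's first date."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Assumption: the earliest date seen for a user is that user's day 1.
    first_day = out.groupby("user")["date"].transform("min")
    out["day_in_trial"] = (out["date"] - first_day).dt.days + 1
    return out


example = pd.DataFrame(
    {
        "user": ["robas_1", "robas_1", "robas_10", "robas_10"],
        "date": ["2018-07-09", "2018-07-09", "2018-06-21", "2018-06-22"],
        "session": ["Evening", "Morning", "Evening", "Evening"],
        "value": [25268.813, 11.704, 2748.423, 3.502],
    }
)
converted = add_day_in_trial(example)
```

Here both `robas_1` rows fall on that user's first day, while the two `robas_10` rows fall on consecutive days.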