# [LATEST] Oralytics Simulation Environment Reproducibility

## Data Preprocessing

### Pulling Raw Data and Parsing for Brush Time and Phone Engagement

[Code here](https://colab.research.google.com/drive/1TU9U7vGjt8pgxwHNa2DkDQca1wXcJBMp?usp=sharing)

---

All users' raw brushing and phone engagement data are stored in Box. We use the Box API to pull the data, parse it, and reformat it into a dictionary.

#### Setup for Box API

Before using the Box API, we have to register an app and authenticate.

1. Go to https://app.box.com/developers/console and click `Create New App`, then `Custom App`. You can name it whatever you want; I named it "Oralytics".
   ![](https://i.imgur.com/TYMrpNa.png)
2. Click on your app, go to `Configuration`, and make sure App Access Level is set to `App + Enterprise Access`.
   ![](https://i.imgur.com/hXv68zX.png)
3. When you run the Box API code, it will prompt you for a `Developer Token`. The token is under `your app` > `Configuration` > `Developer Token`. Click `Generate Developer Token` to generate a temporary token (valid for 60 minutes) that you can enter to authenticate your session.
   ![](https://i.imgur.com/n5sq4Hl.png) ![](https://i.imgur.com/vJ6cIxe.png)

#### Pulling Data

##### Query Info

We use Box's search (query) API to find the specific file for each sensor stream and extract its data. Wrapping the string in double quotes asks Box search for an exact-phrase match.

For Brush Time we use:

`accelerometer_query = '"' + "ACCELEROMETER--org.md2k.motionsense--MOTION_SENSE_HRV_PLUS--LEFT_WRIST" + '"'`

For Phone Engagement we use:

`'"' + "TOUCH_SCREEN--org.md2k.phonesensor--PHONE" + '"'`

##### Raw Data Parsing

* We consider the morning session to run from 4:00 AM to 4:00 PM and the evening session from 4:01 PM to 3:59 AM (the next day).

After running the script located [here](https://colab.research.google.com/drive/1TU9U7vGjt8pgxwHNa2DkDQca1wXcJBMp#scrollTo=_J-NOyKjkGZq), we obtain a `data_dictionary` of the following form:

`{'robas_1': {'2018-07-09 Evening': 25268.813, '2018-07-09 Morning': 11.704, ... '2019-02-28 Evening': 0.0}, 'robas_10': {'2018-06-21 Evening': 2748.423, '2018-06-22 Evening': 3.502, ...}}`

where each key is a user and each value is a dictionary mapping a date and session (e.g. `'2018-07-09 Evening'`) to either brush time or phone engagement time.

#### Formatting Data

After obtaining the dictionary, we use `pandas` to turn it into a CSV file and download it. Unfortunately, we have not yet written an automated procedure for converting the CSV into the desired format, which is indexed by day in trial instead of raw date.
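The morning/evening rule used in Raw Data Parsing can be sketched as below. This is not the Colab script's code: it assumes minute resolution (so 4:00 PM still counts as morning and anything before 4:00 AM is credited to the previous calendar day's evening), and `events` is a hypothetical list of `(user, timestamp, duration)` tuples standing in for already-parsed Box data. It does not show how brush time is detected from the accelerometer stream, and unlike the real `data_dictionary` it does not emit zero-valued sessions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MORNING_START_MIN = 4 * 60  # 4:00 AM, in minutes after midnight
MORNING_END_MIN = 16 * 60   # 4:00 PM, last minute still counted as morning


def session_key(ts: datetime) -> str:
    """Map a timestamp to a 'YYYY-MM-DD Morning' / 'YYYY-MM-DD Evening' label."""
    minutes = ts.hour * 60 + ts.minute
    if MORNING_START_MIN <= minutes <= MORNING_END_MIN:
        return f"{ts.date().isoformat()} Morning"
    if minutes > MORNING_END_MIN:
        return f"{ts.date().isoformat()} Evening"
    # Before 4:00 AM: belongs to the evening session that started the previous day.
    return f"{(ts.date() - timedelta(days=1)).isoformat()} Evening"


def build_data_dictionary(events):
    """Sum durations per user and session.

    `events` is a HYPOTHETICAL iterable of (user_id, timestamp, duration) tuples.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for user_id, ts, duration in events:
        totals[user_id][session_key(ts)] += duration
    # Convert the nested defaultdicts to plain dicts for printing or export.
    return {user: dict(sessions) for user, sessions in totals.items()}


# Usage example with made-up timestamps and durations.
events = [
    ("robas_1", datetime(2018, 7, 9, 21, 15), 60.0),
    ("robas_1", datetime(2018, 7, 10, 2, 30), 30.0),  # 2:30 AM -> previous day's evening
    ("robas_1", datetime(2018, 7, 10, 8, 0), 45.0),
]
print(build_data_dictionary(events))
# {'robas_1': {'2018-07-09 Evening': 90.0, '2018-07-10 Morning': 45.0}}
```

Keying on a date string plus session name keeps the output in the same shape as the `data_dictionary` shown above.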
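One possible way to produce the day-in-trial format described under Formatting Data is sketched below. It is a hypothetical approach, not an existing procedure in this project: it ASSUMES day 1 is each user's earliest date present in the dictionary (not necessarily the official enrollment date), and it uses the standard-library `csv` module instead of `pandas` to stay dependency-free. The column names `user`, `day_in_trial`, `session` and `value` are illustrative.

```python
import csv
import io
from datetime import date

# Morning precedes Evening within a day; plain string sorting would reverse them.
SESSION_ORDER = {"Morning": 0, "Evening": 1}


def to_day_in_trial(data_dictionary):
    """Flatten {user: {'YYYY-MM-DD Session': value}} into rows indexed by day in trial.

    ASSUMPTION: day 1 is the user's earliest date in the dictionary.
    """
    rows = []
    for user, sessions in data_dictionary.items():
        parsed = []
        for key, value in sessions.items():
            date_str, session = key.rsplit(" ", 1)
            parsed.append((date.fromisoformat(date_str), session, value))
        if not parsed:
            continue  # skip users with no recorded sessions
        parsed.sort(key=lambda item: (item[0], SESSION_ORDER[item[1]]))
        first_day = parsed[0][0]
        for day, session, value in parsed:
            rows.append({
                "user": user,
                "day_in_trial": (day - first_day).days + 1,
                "session": session,
                "value": value,
            })
    return rows


def write_csv(rows, fileobj):
    """Write the flattened rows as CSV with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["user", "day_in_trial", "session", "value"])
    writer.writeheader()
    writer.writerows(rows)


# Usage example on a subset of the values shown above.
example = {
    "robas_1": {"2018-07-09 Evening": 25268.813, "2018-07-09 Morning": 11.704},
    "robas_10": {"2018-06-21 Evening": 2748.423, "2018-06-22 Evening": 3.502},
}
buffer = io.StringIO()
write_csv(to_day_in_trial(example), buffer)
print(buffer.getvalue())
```

If `pandas` is preferred, the same list of row dictionaries can be passed to `pandas.DataFrame(rows)` and saved with `to_csv`.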