# Meeting Notes II
Jamboard [here](https://jamboard.google.com/d/18BJRppOy5K_qR-wI91dyQMBcqG0_jFQzhj3_71SiaHo/edit?usp=sharing)
## Plan
- [ ] Pick analysis things to do
- [ ] Do them next time
- [ ] Discuss and present in colab
- [ ] Collate results
## ML Terminology (Why?)
- Rhiannon -- Depends on how it is clustered on the data-graph
- Trevor -- More information (readability)
- Miles -- unsupervised --> harder to see and analyze
- Labeled Input data --> Supervised learning
- Unlabeled data --> Unsupervised
Classification problems require a measure to decide which class is assigned; that measure can be obtained through **regression**.
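As a rough illustration of that idea, the sketch below fits a regression to 0/1 class labels and then thresholds the continuous score to pick a class. The data is made up, and the 0.5 cutoff is an assumption, not something decided in the meeting. In practice a model such as logistic regression is often used for this instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up 1-D data: class 0 sits near 0 on the feature, class 1 near 1.
X = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a regression to the 0/1 labels; its output is a continuous score.
regr = LinearRegression().fit(X, y)
scores = regr.predict(np.array([[0.05], [0.95]]))

# Assign a class by thresholding the score (0.5 is an assumed cutoff).
labels = (scores >= 0.5).astype(int)
print(scores, labels)
```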
## What's with the dimensions?
## For next time
- Go over this: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py
- Load all the data
- Split the data into three parts
- 60% / 20% / 20%: train / test / validation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html)
- df$prot1, df$totprot
- A randomly sampled protein (`prot1`) and the sum of all 77 proteins (`totprot`)
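One possible way to do the 60/20/20 split with `DataFrame.sample` is sketched here. The DataFrame is a hypothetical stand-in with random values, since the real data is not loaded in these notes; only the column names `prot1` and `totprot` are taken from the list above, and the `random_state` values are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 100 rows of random values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"prot1": rng.normal(size=100),
                   "totprot": rng.normal(size=100)})

# Draw 60% of the rows at random for training.
train = df.sample(frac=0.6, random_state=0)
rest = df.drop(train.index)

# Split the remaining 40% in half: 20% test, 20% validation.
test = rest.sample(frac=0.5, random_state=0)
validation = rest.drop(test.index)

print(len(train), len(test), len(validation))
```

Dropping the sampled index each time keeps the three parts disjoint, so no row ends up in more than one set.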
```python
from sklearn import linear_model

# The diabetes_* arrays are assumed to come from the data split described above
# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training set
regr.fit(diabetes_X_train, diabetes_y_train)
# Make predictions using the validation set
diabetes_y_pred = regr.predict(diabetes_X_validation)
```
Based on the protein expression levels, can we determine whether the mouse received the shock treatment?