# Meeting Notes II

Jamboard [here](https://jamboard.google.com/d/18BJRppOy5K_qR-wI91dyQMBcqG0_jFQzhj3_71SiaHo/edit?usp=sharing)

## Plan

- [ ] Pick analysis things to do
- [ ] Do them next time
- [ ] Discuss and present in colab
- [ ] Collate results

## ML Terminology (Why?)

- Rhiannon -- Depends on how it is clustered on the data-graph
- Trevor -- More information (readability)
- Miles -- unsupervised --> harder to see and analyze

- Labeled input data --> Supervised learning
- Unlabeled data --> Unsupervised learning

Classification problems require a measure to decide which class is assigned; that measure can be obtained through **regression**.

## What's with the dimensions?

## For next time

- Go over this: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py
- Load all the data
- Split the data into three parts
  - 60% 20% 20% : Train test validation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html)
- df$prot1, df$totprot
  - Random sample of protein, and sum of 77 proteins

```python
from sklearn import linear_model

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the held-out validation set
diabetes_y_pred = regr.predict(diabetes_X_validation)
```

On the basis of the protein expression levels, can we determine whether the mouse had received the shock treatment?
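
As a starting point for the "For next time" items, here is a minimal sketch of the 60/20/20 split followed by a regression of `totprot` on `prot1`. It assumes the data ends up in a pandas DataFrame with 77 protein columns; the column names `prot1` ... `prot77`, the helper `split_60_20_20`, and the random placeholder data are all made up for illustration and are not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model


def split_60_20_20(df, seed=0):
    """Shuffle the rows, then cut them into train/test/validation (60/20/20)."""
    # sample(frac=1.0) returns every row in random order
    shuffled = df.sample(frac=1.0, random_state=seed)
    n = len(shuffled)
    n_train = (6 * n) // 10
    n_test = (2 * n) // 10
    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:n_train + n_test]
    validation = shuffled.iloc[n_train + n_test:]
    return train, test, validation


# Placeholder data (assumption): 100 rows, 77 random "protein" columns
rng = np.random.default_rng(0)
protein_cols = [f"prot{i}" for i in range(1, 78)]
df = pd.DataFrame(rng.random((100, 77)), columns=protein_cols)
df["totprot"] = df[protein_cols].sum(axis=1)  # sum of the 77 proteins

train, test, validation = split_60_20_20(df)

# Fit totprot as a linear function of a single protein column
regr = linear_model.LinearRegression()
regr.fit(train[["prot1"]], train["totprot"])

# Predict on the validation part, mirroring the snippet in the notes
totprot_pred = regr.predict(validation[["prot1"]])
```

The `test` part is left untouched here so it stays available for a final check once the analysis is settled.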