---
title: "Bike Sharing Demand"
author: 'Philippe Ferdinand Tadger Viloria, Ivo Kubam, Irene García-Fogeda, Nur A Ferdous'
date: "12/3/2019"
output:
  word_document: default
tags: Year 2.1 done
---

# Bike Sharing Demand

# Introduction

A bike sharing system is a service in which people can rent a bicycle on a short-term basis, either for a fee or for free. The whole process, from membership through rental to return, has become automated. The number of bike-friendly cities is increasing day by day, since bikes reduce traffic and are environmentally friendly. With a bike sharing system, people can rent a bike at one station and return it at another. There are currently over 500 bike-sharing programs around the world, and the top three bike-friendly countries are Spain, Italy and China.

Bike sharing demand can be predicted using modern statistical techniques known as machine learning. In machine learning we work with two data sets, a training set and a test set: the training data is used to fit the model and the test data is used to validate it.

# Data Management

The train and test data sets were both restructured, and the following variables were created:

| Variable   | Feature                                                                                                        |
|------------|----------------------------------------------------------------------------------------------------------------|
| Season     | Four levels: autumn, summer, winter, spring                                                                     |
| Holiday    | Two levels: yes or no                                                                                           |
| Workingday | Two levels: yes or no                                                                                           |
| Weekend    | Yes or no, indicating a weekend                                                                                 |
| Timeperiod | Four values: morning (04:00-11:59), afternoon (12:00-16:59), evening (17:00-21:59) and night (22:00-03:59)      |
| Weather    | Four distinct weather conditions                                                                                |
| Category   | The count variable in the train data set, categorized into 20 equal classes with interval width 50              |

Count, humidity, windspeed and atemp were treated as continuous variables.

**Plan of analysis.** The best methods for prediction and classification will be compared and chosen by means of the test MSE, cross-validation and the misclassification matrix.

# Importing the restructured data sets

During this process, two more data sets were created from the train data set: bikeTrainRelm has count as the response, while bikeTrainRelda has the categorized count as the response.

```{r}
library(tidyverse); library(pander); library(broom); library(e1071)
bikeTestRe <- na.omit(read.table("biketest.txt", header = T, sep = ","))
bikeTrainRe <- na.omit(read.csv("biketrain.txt", header = T, sep = ","))
str(bikeTrainRe)
bikeTrainRe <- bikeTrainRe[, c(2:11, 13)]
bikeTestRe <- bikeTestRe[, c(2:10)]
```

# Exploratory Data Analysis

Fig 1 shows a very common pattern of peak hours (probably related to common schedules: going to work or school, returning for lunch, going back home). This peak-hour pattern changes when it interacts with weekday (Fig 2), working day and holiday (not shown). On non-working days, holidays and weekends the hourly distribution is unimodal and left-skewed, with a peak around 12-15 hours; on the remaining days it is bimodal, with peaks at 8, 12 and 17 hours. An exploratory scatter plot (not shown) of the continuous variables against the continuous response (count) suggests that count follows a zero-inflated Poisson-type distribution. The most obvious linear predictors are the numbers of registered and casual users, since they show a linear relationship with count.
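Fig 1 and Fig 2 below were produced in JMP (the code is in the appendix). The restructured sets only keep the four-level period variable rather than the raw hour, so only a coarser analogue of the hourly pattern can be sketched in R; a minimal sketch, assuming the `period` and `count` columns of the training set loaded above:

```{r}
# Coarse analogue of the hourly-pattern figures: rentals by time period.
# Assumes bikeTrainRe has the four-level `period` factor and `count`.
library(ggplot2)
ggplot(bikeTrainRe, aes(x = period, y = count)) +
  geom_boxplot() +
  labs(x = "Time period", y = "Rentals")
```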
*Fig 1. Renting vs hours (colour: January to December)*

![](https://i.imgur.com/kU6OCqK.png)

*Fig 2. Renting vs hours (colour: weekdays)*

![](https://i.imgur.com/SvhvB0q.png)

```{r}
# Train and test data set summary statistics
summary(bikeTrainRe) %>% pander()
summary(bikeTestRe) %>% pander()
```

The weather variable had only 2 observations at the Weather4 level in both the train and test data sets, so this level was merged into Weather3, leaving a weather variable with 3 levels.

```{r}
# Merging the fourth level of weather into the third
bikeTrainRe$weather[bikeTrainRe$weather == "Weather4"] <- "Weather3"
bikeTrainRe$weather <- factor(bikeTrainRe$weather)
bikeTestRe$weather[bikeTestRe$weather == "Weather4"] <- "Weather3"
bikeTestRe$weather <- factor(bikeTestRe$weather)

# Creating separate data sets for regression and classification
bikeTrainRelm <- bikeTrainRe[, -c(9)]
bikeTrainRelda <- bikeTrainRe[, c(-8)]
str(bikeTrainRelm)
```

# Exploratory Analysis Using Unsupervised Learning

K-means clustering and principal component analysis were used to discover unknown subgroups and to visualize the data. Hierarchical clustering was also tried, since it was suspected that there could be four clusters corresponding to the seasons. Only the continuous variables in the data sets were considered for these techniques. The idea was to keep a minimum number of groups that are as homogeneous as possible within themselves and as different as possible between them. Nonetheless, it should be taken into account that only three continuous variables were ultimately kept for this analysis.

```{r}
# K-means clustering
str(bikeTestRe)
kmdata <- bikeTestRe[, c(5, 6, 7, 8)]
kmdata_scaled <- scale(kmdata)
km.out <- kmeans(kmdata_scaled, 6, nstart = 20)
par(mfrow = c(1, 2))
plot(bikeTestRe$humidity, bikeTestRe$windspeed, col = (km.out$cluster),
     main = "K-Means Clustering, K = 6", xlab = "Humidity", ylab = "Windspeed",
     pch = 20, cex = 2)

# Hierarchical clustering
hc.average <- hclust(dist(kmdata_scaled), method = "average")
plot(hc.average, main = "Average Linkage", xlab = "", sub = "", cex = .9)
```

```{r}
# PCA on the continuous variables
biketestpca <- bikeTestRe[, c(5, 6, 7, 8)]
apply(biketestpca, 2, var)
pr.out <- prcomp(biketestpca, scale = TRUE)
# Flip signs for easier interpretation of the biplot
pr.out$rotation[, 1] <- -pr.out$rotation[, 1]
pr.out$x <- -pr.out$x
pr.out$sdev
pr.var <- pr.out$sdev^2
pve <- pr.var / sum(pr.var)
biplot(pr.out, scale = 0, var.axes = T, col = c("grey", "black"), main = "PCA biplot")
par(mfrow = c(1, 2))
plot(pve, xlab = "Principal component", ylab = "Proportion of variance explained",
     ylim = c(0, 1), type = 'b')
plot(cumsum(pve), xlab = "Principal component",
     ylab = "Cumulative proportion of variance explained", ylim = c(0, 1), type = 'b')
```

In the biplot, the first component loading gives the observations on the right of the plot high humidity and high windspeed relative to the observations on the left; the second component loading gives the observations towards the top of the plot high adjusted temperature. K-means with 6 clusters does not lead to clearly distinct clusters, although some subgroups can be seen in the left-hand plot. PCA was initially performed with both temperature variables, "temp" and "atemp"; since both gave the same interpretation on both dimensions, only "atemp" was retained for the rest of the analysis. The first two principal components explain more than 80% of the variability in the data; these two components are sufficient and could be used to model the response variable.
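One way to judge whether six clusters is a reasonable choice is an elbow plot of the total within-cluster sum of squares; a small sketch reusing `kmdata_scaled` from the chunks above:

```{r}
# Elbow plot: total within-cluster sum of squares for K = 1..10
set.seed(1)
wss <- sapply(1:10, function(K) kmeans(kmdata_scaled, K, nstart = 20)$tot.withinss)
plot(1:10, wss, type = "b", xlab = "Number of clusters K",
     ylab = "Total within-cluster SS", main = "Elbow plot")
```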
# Methodology

1. K-Nearest Neighbours (KNN): This classifier estimates the conditional distribution of Y given X and then classifies a given observation to the class with the highest estimated probability. This probability is given as $\Pr(Y=j \mid X=x_{0})=\frac{1}{K}\sum_{i\in \mathcal{N}_{0}}I(y_{i}=j)$, where $K$ is the number of neighbours, $\mathcal{N}_{0}$ is the set of the $K$ training observations closest to $x_{0}$, and $j$ indexes the classes. As K grows, the method becomes less flexible and produces a decision boundary that is close to linear.

2. Generalized Additive Models: A GAM is a natural way of extending multiple linear regression to accommodate non-linear relationships between the covariates and the response while maintaining its additive nature. A GAM for regression is of the form $y_{i} = \beta_{0}+f_{1}(x_{i1})+f_{2}(x_{i2})+\dots+f_{p}(x_{ip})+\epsilon_{i}$. The $f_{j}$ are smooth functions (transformations) of the covariates; they can take the form of polynomials, step functions, splines, etc.

3. K-Fold Cross-Validation: This approach randomly divides the set of observations into k groups, or folds, of approximately equal size. The first fold is treated as a validation set and the method is fit on the remaining (k−1) folds. The mean squared error is computed on the observations in the held-out fold. This procedure is repeated k times, each time treating a different fold as the validation set. The k-fold CV estimate is computed by $CV_{(k)}=\frac{1}{k}\sum_{i=1}^{k}MSE_{i}$.

4. The Lasso: The lasso performs variable selection for linear regression models and shrinks the coefficient estimates towards zero. The lasso coefficients $\hat{\beta}^{L}_{\lambda}$ minimize the quantity $\sum_{i=1}^{n}(y_{i}-\beta_{0}-\sum_{j=1}^{p}\beta_{j}x_{ij})^2 + \lambda\sum_{j=1}^{p}|\beta_{j}|= RSS+\lambda\sum_{j=1}^{p}|\beta_{j}|$, where $\lambda$ is the tuning parameter.

## Model Variable Selection

10-fold cross-validation, the lasso and the best subset technique were used to select variables for the regression and classification models.

5. Best subset selection: This method of selecting the best model consists of the following algorithm.
    i. Start with the null model $M_{0}$, which contains no predictors.
    ii. For $k = 1, 2, \dots, p$:
        a. Fit all $\binom{p}{k}$ models that contain exactly $k$ predictors.
        b. Select the best among these $\binom{p}{k}$ models and call it $M_{k}$: the model with the smallest RSS or, equivalently, the largest $R^{2}$.
    iii. Select a single best model from among $M_{0}, \dots, M_{p}$ using cross-validated prediction error, $C_{p}$ (AIC), BIC or adjusted $R^{2}$.

```{r}
### Model selection by cross-validation
library(leaps)

# Predict method for regsubsets objects
pred.predict <- function(object, newdata, id, ...) {
  form <- as.formula(object$call[[2]])
  mat <- model.matrix(form, newdata)
  coefi <- coef(object, id = id)
  xvars <- names(coefi)
  mat[, xvars] %*% coefi
}

k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelm), replace = TRUE)
cv.errors <- matrix(NA, k, 13, dimnames = list(NULL, paste(1:13)))
for (j in 1:k) {
  best.fit <- regsubsets(count ~ ., data = bikeTrainRelm[folds != j, ], nvmax = 13)
  for (i in 1:13) {
    pred <- pred.predict(best.fit, bikeTrainRelm[folds == j, ], id = i)
    cv.errors[j, i] <- mean((bikeTrainRelm$count[folds == j] - pred)^2)
  }
}
mean.cv.errors <- apply(cv.errors, 2, mean)
plot(mean.cv.errors, type = 'b', xlab = "Number of variables",
     ylab = "Mean squared error", main = "10-fold cross-validation variable selection")
reg.best <- regsubsets(count ~ ., data = bikeTrainRelm, nvmax = 15)
coef(reg.best, 8)
```

The plot reaches its lowest MSE at around 10 variables, so a regression model with about 10 variables is suitable.
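Rather than reading the minimum off the plot, the best model size can be extracted directly; a short sketch reusing `mean.cv.errors` and `reg.best` from the chunk above:

```{r}
# Pick the model size with the smallest cross-validated MSE
best.size <- which.min(mean.cv.errors)
best.size
coef(reg.best, best.size)
```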
```{r}
## Lasso regression
library(glmnet)
x <- model.matrix(count ~ ., data = bikeTrainRelm)
y <- bikeTrainRelm$count
fit.lasso <- glmnet(x, y)
# plot(fit.lasso, xvar = "dev", label = T)
cv.lasso <- cv.glmnet(x, y)
coef(cv.lasso)
par(mfrow = c(1, 2))
plot(fit.lasso, xvar = "lambda", label = T)
plot(cv.lasso)
```

In the left figure (coefficients vs log(lambda)), none of the coefficients are shrunk to zero at the left of the plot (lambda near 0); at that point we are essentially fitting a least squares regression. As lambda grows, the coefficients shrink and some become exactly zero. The right figure shows the best value of log(lambda) at around -1, with the one-standard-error value at around 2. According to the lasso selection method, a model with between 8 and 13 variables should therefore perform best.

```{r}
# Model selection: best subset for the categorized response
library(leaps); library(flextable)
reg <- regsubsets(cat ~ ., data = bikeTrainRelda, nvmax = 15)
reg.summary <- summary(reg)
which.max(reg.summary$adjr2)
which.min(reg.summary$cp)
coef(reg, 10) %>% tidy() %>% flextable()
par(mfrow = c(1, 2))
plot(reg.summary$adjr2, xlab = "Number of variables", ylab = "Adjusted R-squared", type = "l")
points(10, reg.summary$adjr2[10], col = "red", cex = 2, pch = 20)
plot(reg.summary$cp, xlab = "Number of variables", ylab = "Cp", type = 'l')
points(9, reg.summary$cp[9], col = "red", cex = 2, pch = 20)
```

For the classification model, the best subset selection technique was used, with adjusted R-squared and Mallows' Cp as the criteria. The two criteria selected models with 10 and 9 variables respectively; all variables were included except windspeed and weekend. Even though the variable weekend was dropped by the selection, it was kept in the classification models because of the variability seen in the plot of count versus weekend.

```{r}
# Creating new data sets with the selected variables
# Regression models
bikeTrainRelm2 <- bikeTrainRelm[, -c(7)]
str(bikeTrainRelm2)
# Classification models
bikeTrainRelda2 <- bikeTrainRelda[, -c(2, 3, 7)]
```

# Regression Models

## Linear and non-linear regression models

```{r}
fitglm <- glm(count ~ ., data = bikeTrainRelm2)
summary(fitglm)
fitglmrstd <- rstandard(fitglm)
par(mfrow = c(1, 2))
qqnorm(fitglmrstd, ylab = "Standardized residuals", xlab = "Normal scores")
qqline(fitglmrstd)
plot(bikeTrainRelm2$count, fitglm$residuals, xlab = "Counts", ylab = "Residuals")
```

The diagnostic plots show that the model violates the assumptions of normality, homoscedasticity and independence. Variables such as holiday, workingday and weekend were clearly non-significant based on their p-values and were therefore removed from the model; backward elimination was used.

```{r}
# Estimating the test mean squared error using 10-fold cross-validation
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelm2), replace = TRUE)
acc <- NULL
for (j in 1:10) {
  fitk <- glm(count ~ ., data = bikeTrainRelm2[folds != j, ])
  pred <- predict(fitk, bikeTrainRelm2[folds == j, ])
  acc[j] <- mean((bikeTrainRelm2$count[folds == j] - pred)^2)
}
mseTest <- mean(acc, na.rm = TRUE)
mseTest
```

Using the 10-fold cross-validation method, the test mean squared error was found to be 18,581.75.
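The backward elimination described above can also be automated; a minimal sketch using base R's `step()` on the fitted model (note that `step()` selects by AIC rather than by p-values, so the result may differ slightly):

```{r}
# Backward elimination on the linear model, using AIC as the criterion
fitback <- step(fitglm, direction = "backward", trace = 0)
summary(fitback)
```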
Below, a generalized Poisson model with the significant variables was fitted in order to respect the distribution of count; its predictions were then back-transformed from the log scale in order to compute the MSE.

```{r}
# Poisson regression
fitglm <- glm(count ~ ., data = bikeTrainRelm2, family = "poisson")
summary(fitglm)
```

```{r}
# Estimating the test mean squared error using 10-fold cross-validation
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelm2), replace = TRUE)
acc <- NULL
for (j in 1:10) {
  fitk <- glm(count ~ ., data = bikeTrainRelm2[folds != j, ], family = "poisson")
  pred <- predict(fitk, bikeTrainRelm2[folds == j, ])  # predictions on the log scale
  acc[j] <- mean((bikeTrainRelm2$count[folds == j] - exp(pred))^2)
}
mseTest <- mean(acc, na.rm = TRUE)
mseTest
```

Fitting a Poisson model and estimating the mean squared error with 10-fold cross-validation gave an MSE of 17,721.71; a decrease in the error measure was therefore achieved.

```{r}
# Random forest
# install.packages("randomForest")
library(randomForest)
fitrandom <- randomForest(count ~ ., data = bikeTrainRelm2, mtry = 3, ntree = 100)
fitrandom %>% pander()
```

The random forest, with 100 trees and three variables tried at each split, gave a mean squared error of 13,445.33 and explained 59.02% of the variability. Further below, a random forest is also fitted to the categorized count variable.

```{r}
# Predictions on the test set
predrandom <- predict(fitrandom, bikeTestRe)
```

## Boosting

```{r}
# Boosting
library(gbm)
set.seed(1)
boost.bike <- gbm(count ~ ., data = bikeTrainRelm2, distribution = "poisson",
                  n.trees = 5000, interaction.depth = 4)
summary(boost.bike)

# Estimating the test mean squared error using 10-fold cross-validation
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelm2), replace = TRUE)
acc <- NULL
for (j in 1:10) {
  fitk <- gbm(count ~ ., data = bikeTrainRelm2[folds != j, ], distribution = "poisson",
              n.trees = 5000, interaction.depth = 4)
  # type = "response" back-transforms the Poisson predictions to the count scale
  pred <- predict(fitk, bikeTrainRelm2[folds == j, ], n.trees = 5000, type = "response")
  acc[j] <- mean((bikeTrainRelm2$count[folds == j] - pred)^2)
}
mseTest <- mean(acc, na.rm = TRUE)
mseTest
```

The boosting method, with number of trees = 5000, the default shrinkage parameter of 0.1 and interaction depth d = 4, produced the variable importance plot above. The variable period turns out to be the most important, followed by atemp; the least important variable was holiday. To estimate the test error, a 10-fold cross-validation was carried out on the training data, which led to an estimated test MSE of 13,943.05, higher than the value for the random forest.

## Classification Models

Several classification techniques were considered to model the categorical outcome (20 levels). Two categorization approaches were considered: by quantiles and by equal interval size. In order to improve LDA, which was one of the best methods, the response variable was also categorized into 20 quantile-based classes, giving a balanced amount of information per category and a fair comparison with the previously categorized variable.
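The difference between the two categorization approaches can be inspected directly; a small sketch comparing class sizes under the equal-width and quantile-based cuts (assuming, as in the `cut2` call below, that the counts fall within (0, 1000]):

```{r}
# Equal-width classes (width 50) versus quantile-based classes:
# the quantile cut yields roughly equal class sizes by construction
table(cut(bikeTrainRe$count, breaks = seq(0, 1000, by = 50), include.lowest = TRUE))
table(cut(bikeTrainRe$count,
          breaks = quantile(bikeTrainRe$count, probs = seq(0, 1, by = 0.05), na.rm = TRUE),
          include.lowest = TRUE))
```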
### Linear Discriminant Analysis (LDA)

```{r}
## Categories
# This produces the same categories as the earlier lines and gives the
# flexibility to change the number of splits
bikeTrainRe$cat2 <- with(bikeTrainRe,
  cut(count, breaks = quantile(count, probs = seq(0, 1, by = 0.05), na.rm = TRUE),
      include.lowest = TRUE))
library(MASS)
bikeTrainRelda2 <- bikeTrainRe[, c(-8)]

## Fitting LDA on the quantile categories
fitlda2 <- lda(cat2 ~ ., data = bikeTrainRelda2)

# Accuracy by 10-fold cross-validation (fold 4 is skipped)
k <- 10
set.seed(1)
folds2 <- sample(1:k, nrow(bikeTrainRelda2), replace = TRUE)
acc <- NULL
for (j in c(1, 2, 3, 5:10)) {
  fitk2 <- lda(cat2 ~ ., data = bikeTrainRelda2[folds2 != j, ])
  pred2 <- predict(fitk2, bikeTrainRelda2[folds2 == j, ])
  acc[j] <- mean(pred2$class == bikeTrainRelda2$cat2[folds2 == j])
}
acclda <- mean(acc, na.rm = TRUE)
acclda

## Fitting LDA with equal interval sizes
library(Hmisc)
bikeTrainRelda2 <- bikeTrainRe
bikeTrainRelda2 <- bikeTrainRelda2[, -c(8)]
bikeTrainRelda2$cat <- cut2(bikeTrainRe$count, seq(0, 1000, by = 50))
str(bikeTrainRelda2)
fitlda <- lda(cat ~ ., data = bikeTrainRelda2)

# Accuracy by 10-fold cross-validation (fold 4 is skipped)
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelda2), replace = TRUE)
acc <- NULL
for (j in c(1, 2, 3, 5:10)) {
  fitk <- lda(cat ~ ., data = bikeTrainRelda2[folds != j, ])
  pred <- predict(fitk, bikeTrainRelda2[folds == j, ])
  acc[j] <- mean(pred$class == bikeTrainRelda2$cat[folds == j])
}
acclda <- mean(acc, na.rm = TRUE)
acclda
```

Fitting LDA with the quantile-based categories gave an accuracy of 50.39%; with the equal-interval categories the accuracy was about 80%. After fitting LDA, 10-fold cross-validation was used to estimate the misclassification error, which was found to be 0.66.

### K-Nearest Neighbour (KNN)

```{r}
# 10-fold cross-validation to determine the best value of k and the accuracy
library(caret)
ctrl <- trainControl(method = "repeatedcv", repeats = 1)
cvknn <- train(cat ~ ., data = bikeTrainRelda2, method = "knn",
               tuneLength = 12, trControl = ctrl)
cvknn

## Fitting KNN (factor predictors are coerced to numeric codes by cbind)
library(class)
x <- cbind(bikeTrainRelda2$season, bikeTrainRelda2$weather, bikeTrainRelda2$atemp,
           bikeTrainRelda2$humidity, bikeTrainRelda2$period, bikeTrainRelda2$weekend)
y <- cbind(bikeTestRe$season, bikeTestRe$weather, bikeTestRe$atemp,
           bikeTestRe$humidity, bikeTestRe$period, bikeTestRe$weekend)
x <- na.omit(x)
y <- na.omit(y)
fitknn <- knn(x, y, bikeTrainRelda2$cat, k = 21)
```

KNN with k = 21 gave an accuracy of 0.31 when estimating the Bayes classifier boundary. This high value of k leads to a less flexible model whose decision boundary is close to linear.
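The `knn()` call above predicts on the unlabelled test set, so its accuracy cannot be checked there; a minimal sketch that instead evaluates k = 21 on a simple 70/30 holdout split of the labelled training data, mirroring the numeric coercion of the factor predictors used above:

```{r}
# Holdout check of KNN accuracy on the labelled training data
library(class)
xall <- cbind(as.numeric(bikeTrainRelda2$season), as.numeric(bikeTrainRelda2$weather),
              bikeTrainRelda2$atemp, bikeTrainRelda2$humidity,
              as.numeric(bikeTrainRelda2$period), as.numeric(bikeTrainRelda2$weekend))
set.seed(1)
train.idx <- sample(nrow(xall), round(0.7 * nrow(xall)))
predh <- knn(xall[train.idx, ], xall[-train.idx, ], bikeTrainRelda2$cat[train.idx], k = 21)
mean(predh == bikeTrainRelda2$cat[-train.idx])  # holdout accuracy
```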
## Generalized Additive Models

### Polynomial Models

A third-degree polynomial was fitted for each continuous variable. The fitted model was then used in a 10-fold cross-validation to estimate the misclassification error.

```{r}
# Polynomial model (third-degree polynomials in atemp and humidity)
fitpoly <- lda(cat ~ season + poly(atemp, 3) + poly(humidity, 3) + weather + period,
               data = bikeTrainRelda2)

# 10-fold cross-validation for the accuracy (fold 4 is skipped)
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelda2), replace = TRUE)
acc <- NULL
for (j in c(1, 2, 3, 5:10)) {
  fitk <- lda(cat ~ season + poly(atemp, 3) + poly(humidity, 3) + weather + period,
              data = bikeTrainRelda2[folds != j, ])
  pred <- predict(fitk, bikeTrainRelda2[folds == j, ])
  acc[j] <- mean(pred$class == bikeTrainRelda2$cat[folds == j])
}
accpoly <- mean(acc, na.rm = TRUE)
accpoly
```

The misclassification error was found to be 0.66.

### Smoothing Splines

```{r}
# Smoothing splines for the continuous variables; df chosen by LOOCV
library(splines)
fit1 <- smooth.spline(bikeTrainRelda2$atemp, bikeTrainRelm$count, cv = TRUE); fit1
fit2 <- smooth.spline(bikeTrainRelda2$humidity, bikeTrainRelm$count, cv = TRUE); fit2

# Fitting the model with smooth terms for the continuous variables
library(gam)
fitgam <- lda(cat ~ season + s(atemp, df = 20.6) + s(humidity, df = 10.2) + weather + period,
              data = bikeTrainRelda2)

# 10-fold cross-validation for the accuracy (fold 4 is skipped)
k <- 10
set.seed(1)
folds <- sample(1:k, nrow(bikeTrainRelda2), replace = TRUE)
acc <- NULL
for (j in c(1, 2, 3, 5:10)) {
  fitk <- lda(cat ~ season + s(atemp, df = 20.6) + s(humidity, df = 10.2) + weather + period,
              data = bikeTrainRelda2[folds != j, ])
  pred <- predict(fitk, bikeTrainRelda2[folds == j, ])
  acc[j] <- mean(pred$class == bikeTrainRelda2$cat[folds == j])
}
accgam <- mean(acc, na.rm = TRUE)
accgam
```

Fitting these separate splines led to 20.6 and 10.2 effective degrees of freedom for atemp and humidity respectively. Using these effective degrees of freedom, the GAM consisting of two smoothing splines gave a misclassification error of 0.66.

### Random Forest

#### Random Forest, Equal Interval Size

```{r}
# Random forest for the equal-width categories
library(randomForest)
fitrandom2 <- randomForest(cat ~ ., data = bikeTrainRelda2, mtry = 3)
fitrandom <- randomForest(cat ~ ., data = bikeTrainRelda2)
```

For this approach the estimated error rate was 62.59%, quite similar to the next approach and to the results with the count variable.

#### Random Forest, Quantile Method

```{r}
# Random forest for the quantile categories (cat2)
library(randomForest)
bikeTrainRelda2 <- bikeTrainRelda2[, c(-8, -9)]
fitrandom2.quan <- randomForest(cat2 ~ ., data = bikeTrainRelda2, mtry = 3)
```

Using 500 trees and three variables tried at each split, the out-of-bag estimate of the error rate is 62.42%.
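As a quick check of which predictors drive this classification, the forest's variable importance can be plotted; a one-line sketch using `fitrandom2.quan` from the chunk above:

```{r}
# Variable importance (mean decrease in Gini) for the quantile-category forest
varImpPlot(fitrandom2.quan, main = "Variable importance")
```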
# Discussion and Conclusion

Regression models seem to be the most natural choice over classification models for predicting the number of bikes rented. Multiple linear regression, Poisson regression, random forest and boosting were considered for predicting the outcome variable. The random forest performed best in terms of mean squared error, with a value of 13,445.33. Even though the random forest seems the better choice here, boosting could be improved further by tuning its parameters with k-fold cross-validation, and so could the regression models if the predictors were transformed or if the information on registered and casual users were used in estimating the response. Random forest and boosting are more flexible than multiple linear regression and Poisson regression, but the cost of this high flexibility is a loss of interpretability.

Even though classification models do not seem well suited to predicting counts of bikes rented, 20 classes of width 50 were used in order to explore classification models such as linear discriminant analysis, K-nearest neighbours, generalized additive models and random forest. Based on the misclassification error rates, the random forest turned out to be the best, with an accuracy of 38%, which is not much of a difference from the other classifiers.

| Category       | Model                        | Test MSE  | Misclassification |
|----------------|------------------------------|-----------|-------------------|
| Regression     | Multiple linear regression   | 18,581.75 | -                 |
|                | Poisson regression           | 17,721.71 | -                 |
|                | Random forest                | 13,445.33 | -                 |
|                | Boosting                     | 13,943.05 | -                 |
| Classification | Linear discriminant analysis | -         | 0.66              |
|                | K-nearest neighbour          | -         | 0.69              |
|                | GAM - polynomial             | -         | 0.66              |
|                | GAM - smoothing splines      | -         | 0.66              |
|                | Random forest                | -         | 0.62              |

## References

1. Earth Policy Institute. Bike-sharing programs hit the streets in over 500 cities worldwide. http://www.earth-policy.org/plan_b_updates/2013/update112 (2013).
2. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer, New York.

#### Exploratory Plots Code (JMP Pro)

```
Graph Builder(
  Variables( X( :H ), Y( :count ), Overlay( :p.Reg ) ),
  Elements( Smoother( X, Y, Legend( 72 ) ) )
);
Graph Builder(
  Variables( X( :H ), Y( :count ), Overlay( :Mes_num ) ),
  Elements( Smoother( X, Y, Legend( 72 ) ) )
);
Graph Builder(
  Variables( X( :H ), Y( :count ), Overlay( :workingday ) ),
  Elements( Smoother( X, Y, Legend( 72 ) ) )
);
```