
# Geometric Moving Average and Wild Binary Segmentation

## Tried-out methods

- Naive methods:
    - Simple estimation techniques, e.g., predicting each value as the mean of the preceding values of the time-dependent variable, or as the previous actual value. They serve as baselines for comparison with more sophisticated modelling techniques.
    - Not very useful here: the sentiment scores in our dataset do not change much over time, so the RMSE is always low.
- Autoregression:
    - The reasoning behind autoregression (predicting a value from its own autocorrelated past values) is not well suited to time series that lack characteristics such as seasonality.
    - ![](https://i.imgur.com/YyNrIIj.png)
- https://github.com/chenhaotian/Changepoints
    - Learns a model and does the prediction on other data; parametric.

## Moving average

### Simple Moving Average (SMA)

- The simple moving average takes the average over a fixed number of time periods, i.e., it is an equally weighted mean of the previous n data points.

![](https://i.imgur.com/6E0Xsv3.png)

### Exponential Moving Average (EMA)

- Smoother; it weights recent observations more heavily, with exponentially decaying weights.

![](https://i.imgur.com/PX5dRVM.png)

### Naive moving window

- Choose a window size; at each position, take the mean of the second half of the window minus the mean of the first half. The larger the magnitude of this difference, the more abrupt the change.

![](https://i.imgur.com/a4QOyda.png)

## Wild Binary Segmentation

- Paper: https://arxiv.org/pdf/1411.0858.pdf
- Compared to standard binary segmentation, WBS:
    - is non-parametric (no need to choose a window or span parameter);
    - works even for very small jump magnitudes;
    - gives consistent estimation of the number and locations of multiple change points in the data.

### WBS

- On all data, without smoothing:
  ![](https://i.imgur.com/Msa59l6.png)
- On exponentially weighted data (span 10):
  ![](https://i.imgur.com/zcfU7DF.png)
- On all hotel-topic-related data:
  ![](https://i.imgur.com/cuYS2vP.png)
- On all renovation-related data:
  ![](https://i.imgur.com/O3jFlj4.png)

### SBS (Standard Binary Segmentation) for comparison

- On all data, without smoothing:
  ![](https://i.imgur.com/YcR5ssA.png)
- On exponentially weighted data (span 10):
  ![](https://i.imgur.com/A4CtlzV.png)

### Retrospect

- WBS can be used as the baseline algorithm to select the reviews.
- Experimenting with the filtered derivative algorithm.
- Next step: select all the reviews using this algorithm.
- Still working on implementing RuLSIF.
- Filter out the renovation-related reviews after detecting the change points.

### Meeting with Martin (20200525)

- Try fastText for misspellings.
- Use review-based aggregation of the mean_sentiment score; this is better justified than weighting all sentences equally.
- Create an HTML template for annotation.
- Reasoning behind using word2vec to train embeddings for keywords, and extracting the aspect of each sentence using those keywords.
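A minimal sketch of the difference between the two aggregations discussed in the meeting. It assumes each review comes with a list of per-sentence sentiment scores, and that "review-based aggregation" means averaging the sentences within each review first and then averaging over reviews. The function names and the dict-of-lists input are hypothetical.

```python
from statistics import mean

def sentence_weighted_mean(reviews):
    """Every sentence counts equally, so long reviews dominate the average."""
    return mean(score for scores in reviews.values() for score in scores)

def review_weighted_mean(reviews):
    """Average each review's sentences first, then average over reviews,
    so every review counts once regardless of how many sentences it has."""
    return mean(mean(scores) for scores in reviews.values())

# Hypothetical data: one long positive review and one short negative one.
reviews = {"r1": [1.0, 1.0, 1.0], "r2": [0.0]}
print(sentence_weighted_mean(reviews))
print(review_weighted_mean(reviews))
```

With this toy data the sentence-weighted mean is pulled towards the long review (0.75), while the review-weighted mean gives both reviews equal say (0.5).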
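The two naive baselines from "Tried-out methods" (previous actual value, and mean of the preceding values) can be sketched as one-step-ahead forecasts scored by RMSE. This is a pure-Python illustration; the function names and the toy series are hypothetical, not the code used for the experiments.

```python
import math

def last_value_forecast(y):
    """Naive forecast: predict y[t] as the previous actual value y[t-1], for t >= 1."""
    return list(y[:-1])

def running_mean_forecast(y):
    """Naive forecast: predict y[t] as the mean of all preceding values y[0..t-1]."""
    preds, total = [], 0.0
    for t in range(1, len(y)):
        total += y[t - 1]
        preds.append(total / t)
    return preds

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

y = [0.2, 0.4, 0.3, 0.5]  # hypothetical sentiment scores over time
print(rmse(y[1:], last_value_forecast(y)))
print(rmse(y[1:], running_mean_forecast(y)))
```

When the series barely moves, both baselines already give a small RMSE, which is why they are of limited use for comparing models on this dataset.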
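The SMA and EMA from the "Moving average" section can be sketched as follows. The EMA uses the span parameterisation alpha = 2 / (span + 1), initialised at the first value; this is the same recursion as pandas `Series.ewm(span=..., adjust=False).mean()`, while pandas' default `adjust=True` weights the early points differently. Whether the "span 10" smoothing in the plots was computed this way is an assumption.

```python
def sma(values, n):
    """Simple moving average: equally weighted mean of the last n points.
    The first n-1 positions have too little history and are returned as None."""
    out, window_sum = [], 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= n:
            window_sum -= values[i - n]  # drop the point that left the window
        out.append(window_sum / n if i >= n - 1 else None)
    return out

def ema(values, span):
    """Exponential moving average: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1]."""
    alpha = 2.0 / (span + 1)
    out = []
    for i, v in enumerate(values):
        out.append(v if i == 0 else alpha * v + (1 - alpha) * out[-1])
    return out

series = [1, 2, 3, 4, 5]
print(sma(series, 2))
print(ema(series, span=10))
```

The SMA forgets a point completely once it leaves the window, whereas the EMA keeps a geometrically shrinking share of every past point, which is why it is also known as the geometric moving average.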
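A sketch of the naive moving-window detector described under "Moving average": slide a window across the series and score each split point by the mean of the second half minus the mean of the first half. The window size is assumed to be even so the halves are equal; the function name and toy series are hypothetical.

```python
def window_change_scores(values, window):
    """For each split point t, score = mean(values[t:t+h]) - mean(values[t-h:t]),
    with h = window // 2. A larger |score| suggests a more abrupt change at t."""
    if window < 2 or window % 2:
        raise ValueError("window must be an even integer >= 2")
    half = window // 2
    scores = {}
    for t in range(half, len(values) - half + 1):
        first_half = values[t - half:t]
        second_half = values[t:t + half]
        scores[t] = sum(second_half) / half - sum(first_half) / half
    return scores

values = [0.0] * 10 + [1.0] * 10  # hypothetical series with one jump at index 10
scores = window_change_scores(values, window=4)
most_abrupt = max(scores, key=lambda t: abs(scores[t]))
print(most_abrupt, scores[most_abrupt])
```

The sign of the score tells the direction of the change (positive for an increase); the window size trades off sensitivity to short changes against robustness to noise, which is the parameter WBS avoids.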
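To make the WBS vs. SBS comparison concrete, here is a minimal pure-Python sketch of both, using the CUSUM contrast for a change in mean. Standard binary segmentation maximises the CUSUM over the current segment only; WBS draws random intervals once and maximises over every drawn interval that fits inside the current segment. Assumptions: the function names are hypothetical, `threshold` is supplied by the user (the paper ties it to the noise level and a √(2 log T) rate), `n_intervals=500` is an arbitrary default, and the current segment itself is always added to WBS's candidates, an implementation choice here. This is not the paper authors' reference implementation.

```python
import math
import random

def best_split(x, s, e):
    """Largest |CUSUM| over split points b of x[s:e] (left x[s:b], right x[b:e]).
    Returns (stat, b); b is None if no split scores above 0."""
    n, total, left = e - s, sum(x[s:e]), 0.0
    best_stat, best_b = 0.0, None
    for b in range(s + 1, e):
        left += x[b - 1]
        nl, nr = b - s, e - b
        stat = abs(math.sqrt(nr / (n * nl)) * left
                   - math.sqrt(nl / (n * nr)) * (total - left))
        if stat > best_stat:
            best_stat, best_b = stat, b
    return best_stat, best_b

def binary_segmentation(x, threshold):
    """SBS: split at the CUSUM maximum of the current segment while it exceeds
    `threshold`, then recurse on both sides."""
    found = []

    def recurse(s, e):
        if e - s < 2:
            return
        stat, b = best_split(x, s, e)
        if b is not None and stat > threshold:
            found.append(b)
            recurse(s, b)
            recurse(b, e)

    recurse(0, len(x))
    return sorted(found)

def wild_binary_segmentation(x, threshold, n_intervals=500, seed=0):
    """WBS: like SBS, but the CUSUM maximum is taken over all random intervals
    lying inside the current segment (plus the segment itself)."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        s, e = sorted(rng.sample(range(len(x) + 1), 2))  # 0 <= s < e <= len(x)
        if e - s >= 2:
            intervals.append((s, e))
    found = []

    def recurse(s, e):
        if e - s < 2:
            return
        candidates = [(lo, hi) for lo, hi in intervals if s <= lo and hi <= e]
        candidates.append((s, e))
        stat, b = max((best_split(x, lo, hi) for lo, hi in candidates),
                      key=lambda r: r[0])
        if b is not None and stat > threshold:
            found.append(b)
            recurse(s, b)
            recurse(b, e)

    recurse(0, len(x))
    return sorted(found)

# Hypothetical piecewise-constant signal with small Gaussian noise.
noise = random.Random(1)
signal = [0.0] * 40 + [3.0] * 30 + [0.0] * 40
x = [v + noise.gauss(0, 0.05) for v in signal]
print(binary_segmentation(x, threshold=1.0))
print(wild_binary_segmentation(x, threshold=1.0))
```

Because WBS also looks at short intervals containing a single change point, it can pick up small or closely spaced changes that the whole-segment CUSUM used by SBS averages away.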