---
title: 'CNN TimeSeries Predictions using Tensorflow'
disqus: hackmd
---

CNN TimeSeries Predictions using Tensorflow
===

## Table of Contents
[TOC]

## Reference
https://www.tensorflow.org/tutorials/structured_data/time_series#inspect_and_cleanup

**Transforming data for better timeseries prediction**
https://otexts.com/fpp2/combinations.html

Data Introduction
---

### Call Libraries
```python=
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf  # TF 2.x: eager execution is enabled by default
```

### Stock Data Pull
[Quantconnect](https://www.quantconnect.com/docs/home/home) provides equities data. We will use their API.

```python=
# Call quantbook
qb = QuantBook()

# top 10 small-cap penny stocks from stockfetcher's small-cap screen
smallCap = ["NEXT", "XYF", "CLMT", "HCHC", "STON",
            "ABEO", "MTNB", "EMX", "UK", "EMAN"]

# Add each equity to qb
for i in smallCap:
    qb.AddEquity(i)

# Pull the history of every security in qb
allHistory = qb.History(qb.Securities.Keys, timedelta(days=360), Resolution.Minute)

# Access each symbol, extract its close, rename the column, and concat on time
workingData = pd.DataFrame()
for i in smallCap:
    workingData['close{0}'.format(i)] = allHistory.loc[i].close
```

### Inspect and Clean up
Next we graph the data. It is best to graph percent returns so the series are comparable in scale. The timestamps should later be converted into human-readable time.

```python=
# Graph cumulative percent returns
test = workingData.reset_index(drop=True)
test.pct_change().cumsum().plot(figsize=(16, 8));
```

### Transforming Data to Simple MA
```python=
# smooth the NEXT close with a 15-minute simple moving average
workingData['closeNEXT'] = workingData['closeNEXT'].rolling(window=15).mean()

# remove NaN (fillna returns a copy, so reassign)
workingData = workingData.fillna(0)
```

### Split Data
Split the data into training, validation, and test sets (70%, 20%, 10%).
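As a quick sanity check, the proportions can be sketched on a toy DataFrame first (hypothetical data, not the QuantConnect pull above):

```python=
import numpy as np
import pandas as pd

# toy frame standing in for workingData
toy = pd.DataFrame({'closeA': np.arange(100.0)})
n = len(toy)

# index-based 70/20/10 split, preserving time order
train = toy[0:int(n * 0.7)]
val = toy[int(n * 0.7):int(n * 0.9)]
test = toy[int(n * 0.9):]

print(len(train), len(val), len(test))  # 70 20 10
```

Note the split is by position, not random sampling, so the temporal ordering of the series is preserved.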
```python=
# save column names
column_indices = {name: i for i, name in enumerate(workingData)}

# number of rows
n = len(workingData)

# create sets
train_df = workingData[0:int(n*0.7)]
val_df = workingData[int(n*0.7):int(n*0.9)]
test_df = workingData[int(n*0.9):]

# shape returns the tuple (rows, cols); the index is not counted
num_features = workingData.shape[1]
```

### Normalize Data
Scale the features so all variables have comparable magnitudes. Use the training statistics only, so the validation and test sets do not leak information into training.

```python=
train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
```

### Data Windowing
Make a set of predictions based on windows of consecutive samples. The windows are characterized by:

1) the width (number of timesteps) of the input and label windows
2) the time offset between them
3) which features are used as inputs and which as labels

*These choices determine the model type: single-output or multi-output, and single-timestep or multi-timestep.*

**Predicting 24hr into the future, given 24hr of history**
![](https://i.imgur.com/vv90Ze1.png)

**Predicting 1hr into the future, given 6hr of history**
![](https://i.imgur.com/JAzT8nf.png)

For this research I am more interested in predicting one timestep ahead (for daytrading purposes), because minute-resolution data carries more short-term information than coarser resolutions.

#### Index and Offset
Create a class to describe the window.
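Before wrapping it in a class, the core index arithmetic can be sketched in isolation (the parameter values here are the ones chosen later for this research):

```python=
import numpy as np

input_width, label_width, shift = 30, 1, 30
total_window_size = input_width + shift          # 60 timesteps per window

# the inputs are the first 30 timesteps of each window
input_slice = slice(0, input_width)
input_indices = np.arange(total_window_size)[input_slice]   # 0 .. 29

# the label is the last timestep of the window
label_start = total_window_size - label_width
labels_slice = slice(label_start, None)
label_indices = np.arange(total_window_size)[labels_slice]  # [59]

print(total_window_size, input_indices[-1], label_indices)
```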
```python=
## Create Data Windows: index and offset
class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # store the raw data
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # work out the label column indices
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # work out the window parameters
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])

#### example
w1 = WindowGenerator(input_width=30, label_width=1, shift=30,
                     label_columns=['closeNEXT'])
w1
'''
Total window size: 60
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]
Label indices: [59]
Label column name(s): ['closeNEXT']
'''
```

What we are trying to predict: the labels. The attributes used to make the prediction: the features.

##### Custom Parameters
Because we are working with minute data, I'm interested in pattern identification, and I think patterns show up more clearly when parsed in 30-minute chunks. Therefore, my parameters are:

:::info
**input_width=30
label_width=1
shift=30
label_columns=['closeNEXT']**
:::

We are trying to predict closeNEXT, specifically the structure of the next 30 minutes.
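One practical consequence of these parameters is how few training windows a day of data yields. A rough count, assuming one window per valid starting row (stride 1) and the standard 390-minute US trading day:

```python=
# illustrative window-count check; 390 is an assumed one-day row count
total_window_size = 30 + 30       # input_width + shift
rows = 390                        # one US trading day of minute bars

# with a stride of 1, every row that leaves room for a full window starts one
num_windows = rows - total_window_size + 1
print(num_windows)  # 331
```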
*example*
:::info
```python=
w1 = WindowGenerator(input_width=30, label_width=1, shift=30,
                     label_columns=['closeNEXT'])
w1
```
:::

#### Split
Given a list of consecutive inputs, `split_window` converts each full window into a pair of input and label windows.

![](https://i.imgur.com/lrdd3eT.png)

The image does not show the `features` axis, but the function handles the label columns, so it works for both single-output and multi-output examples (predicting one variable or several).

```python=
def split_window(self, features):
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]] for name in self.label_columns],
            axis=-1)

    # slicing doesn't preserve static shape information, so set the shapes
    # manually; this makes the `tf.data.Dataset`s easier to inspect
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

WindowGenerator.split_window = split_window
```

**Windows example.**
:::info
```python=
# stack three slices, each the length of the total window
example_window = tf.stack([np.array(train_df[:w1.total_window_size]),
                           np.array(train_df[100:100+w1.total_window_size]),
                           np.array(train_df[200:200+w1.total_window_size])])

example_inputs, example_labels = w1.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')
```
:::

Typically, data in TensorFlow is packed into arrays where the outermost index runs across examples (the batch dimension), the middle indices are the time or space dimensions (width, height), and the innermost indices are the features. Here the code took a batch of three 30-timestep windows, each timestep carrying 10 features (one per ticker), and split out 1-timestep, 1-feature label windows.

#### Plot
Visualize the **split window**: inputs, labels, and predictions.
```python=
w1.example = example_inputs, example_labels
```

Plotting function:

```python=
def plot(self, model=None, plot_col='closeNEXT', max_subplots=3):
    inputs, labels = self.example
    plt.figure(figsize=(12, 8))
    plot_col_index = self.column_indices[plot_col]
    max_n = min(max_subplots, len(inputs))
    for n in range(max_n):
        plt.subplot(3, 1, n+1)
        plt.ylabel(f'{plot_col} [normed]')
        plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                 label='Inputs', marker='.', zorder=-10)

        if self.label_columns:
            label_col_index = self.label_columns_indices.get(plot_col, None)
        else:
            label_col_index = plot_col_index

        if label_col_index is None:
            continue

        plt.scatter(self.label_indices, labels[n, :, label_col_index],
                    edgecolors='k', label='Labels', c='#2ca02c', s=64)
        if model is not None:
            predictions = model(inputs)
            plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                        marker='X', edgecolors='k', label='Predictions',
                        c='#ff7f0e', s=64)

        if n == 0:
            plt.legend()

    plt.xlabel('Time [min]')

WindowGenerator.plot = plot
```

**example**
```python=
# plot the split window
w1.plot()
```

#### Create tf.data.Datasets
Convert each **DataFrame** into a **tf.data.Dataset** of **(input_window, label_window)** pairs.
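To preview the shapes such a dataset will emit, the same windowing can be sketched with plain NumPy on toy data (illustrative only; the actual conversion below uses a Keras utility):

```python=
import numpy as np

total_window_size, input_width, label_width = 60, 30, 1
data = np.random.rand(200, 10).astype(np.float32)   # (rows, features), e.g. 10 tickers

# stack every run of 60 consecutive rows, stride 1
windows = np.stack([data[i:i + total_window_size]
                    for i in range(len(data) - total_window_size + 1)])

inputs = windows[:, :input_width, :]                 # first 30 steps, all features
labels = windows[:, -label_width:, :1]               # last step, one label column
print(inputs.shape, labels.shape)                    # (141, 30, 10) (141, 1, 1)
```

Every element the real dataset yields has this same `(batch, time, features)` layout, just batched 32 windows at a time and shuffled.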
```python=
def make_dataset(self, data):
    data = np.array(data, dtype=np.float32)
    ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,)

    ds = ds.map(self.split_window)

    return ds

WindowGenerator.make_dataset = make_dataset
```

Add properties that hold the training, validation, and test data:

```python=
@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting"""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset
        result = next(iter(self.train))
        # and cache it for next time
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
```

*inspect the structure and dtype*

```python=
# Each element is an (inputs, labels) pair
w1.train.element_spec
```

## Appendix and FAQ
:::info
**Find this document incomplete?** Leave a comment!
:::

###### tags: `CNN` `Machine Learning`