---
title: 'CNN TimeSeries Predictions using Tensorflow'
disqus: hackmd
---
CNN TimeSeries Predictions using Tensorflow
===
## Table of Contents
[TOC]
## Reference
https://www.tensorflow.org/tutorials/structured_data/time_series#inspect_and_cleanup
**Transforming data for better timeseries prediction**
https://otexts.com/fpp2/combinations.html
Data Introduction
---
### Call Libraries
```python=
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from datetime import timedelta  # lookback period for qb.History
# eager execution is enabled by default in TensorFlow 2.x
```
### Stock Data Pull
[QuantConnect](https://www.quantconnect.com/docs/home/home) provides equities data. We will use their research API.
```python=
# Call quantbook
qb = QuantBook()
# top 10 small-penny stock universe from stockfetcher's small-cap
smallCap = ["NEXT", "XYF", "CLMT", "HCHC",
"STON", "ABEO", "MTNB", "EMX", "UK", "EMAN"]
# Add instance in qb
for i in smallCap: qb.AddEquity(i)
# Add History of all in qbInstance
allHistory = qb.History(qb.Securities.Keys, timedelta(days=360), Resolution.Minute)
# Accessing each symbol, extracting close, renaming close, and concat based on time
workingData = pd.DataFrame()
for i in smallCap: workingData['close{0}'.format(i)] = allHistory.loc[i].close
```
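Outside of the QuantConnect research environment, the concatenation step above can be sketched with hypothetical data (the symbols and prices below are made up, standing in for `allHistory`):

```python=
import pandas as pd

# Hypothetical per-symbol close series sharing one minute-level time index
idx = pd.date_range('2021-01-04 09:31', periods=3, freq='min')
histories = {'NEXT': [1.0, 1.1, 1.2], 'XYF': [5.0, 5.1, 5.2]}

# Same pattern as above: one renamed close column per symbol
workingData = pd.DataFrame(index=idx)
for sym, closes in histories.items():
    workingData['close{0}'.format(sym)] = pd.Series(closes, index=idx)

print(list(workingData.columns))  # ['closeNEXT', 'closeXYF']
```

Because each column is assigned against the shared index, pandas aligns the rows by timestamp automatically.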
### Inspect and Clean up
In this part we inspect the data by graphing it. It is best to plot cumulative percent returns, so that symbols with different price levels are comparable. Time is kept as timestamps on the index, which can later be converted into human-readable form.
```python=
# Graphing data
test = workingData.reset_index(drop=True)
test.pct_change().cumsum().plot(figsize=(16,8));
```
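To make the cumulative-return transformation concrete, here it is on a tiny hypothetical close series:

```python=
import pandas as pd

# Hypothetical closes: up 10%, then back down to the starting price
close = pd.Series([10.0, 11.0, 9.9])
cum = close.pct_change().cumsum()

# the first value is NaN (no previous bar), then +10%, then back near 0%
print([round(x, 6) for x in cum.tolist()[1:]])  # [0.1, 0.0]
```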
### Transforming Data to Simple MA
```python=
# smooth the NEXT close with a 15-minute simple moving average
workingData['NEXTclose'] = workingData['closeNEXT'].rolling(window=15).mean()
# remove the NaNs produced by the first window-1 rows
# (fillna returns a copy, so assign the result back)
workingData = workingData.fillna(0)
```
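A minimal sketch of what `rolling(...).mean()` followed by `fillna(0)` does, on hypothetical values:

```python=
import pandas as pd

# Hypothetical price series
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# 3-bar simple moving average; the first window-1 rows come out as NaN
sma = prices.rolling(window=3).mean()

# replace the leading NaNs, as in the cell above
sma = sma.fillna(0)
print(sma.tolist())  # [0.0, 0.0, 2.0, 3.0, 4.0, 5.0]
```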
### Split Data
Split the data into training, validation, and test sets (70%, 20%, 10%).
```python=
# save column names
column_indices = {name: i for i, name in enumerate(workingData.columns)}
# find row of data
n = len(workingData)
# create sets
train_df = workingData[0:int(n*0.7)]
val_df = workingData[int(n*0.7):int(n*0.9)]
test_df = workingData[int(n*0.9):]
# return tuple (row, col)
num_features = workingData.shape[1] # ignores index
```
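The boundary arithmetic can be checked on a hypothetical 100-row frame:

```python=
import numpy as np
import pandas as pd

# Hypothetical frame with 100 rows
df = pd.DataFrame({'close': np.arange(100.0)})
n = len(df)

# same slicing as above: 70% / 20% / 10%
train = df[0:int(n*0.7)]
val = df[int(n*0.7):int(n*0.9)]
test = df[int(n*0.9):]

print(len(train), len(val), len(test))  # 70 20 10
```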
### Normalize Data
Scale the features so that all variables have comparable magnitudes. The mean and standard deviation are computed on the training set only, so no information from the validation and test sets leaks into training.
```python=
train_mean = train_df.mean()
train_std = train_df.std()
train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std
```
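A quick sanity check of the scaling on hypothetical data: after subtracting the training mean and dividing by the training standard deviation, the training set itself ends up with zero mean and unit standard deviation.

```python=
import numpy as np
import pandas as pd

# Hypothetical single-feature frame; the first 7 rows act as the training split
df = pd.DataFrame({'x': np.arange(10.0)})
train = df[:7]

train_mean = train.mean()
train_std = train.std()
train_n = (train - train_mean) / train_std

print(round(float(train_n['x'].mean()), 6))  # 0.0
print(round(float(train_n['x'].std()), 6))   # 1.0
```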
### Data Windowing
Make a set of predictions based on a window of consecutive samples from the data.
The main features of the input windows are:
1) the width (number of timesteps) of the input and label windows
2) the time offset between them
3) which features are used as inputs, labels, or both
*These choices determine whether the model is single-output or multi-output, and single-timestep or multi-timestep.*
**Predicting 24hr in future w/ 24hr history**

**Predicting 1hr into the future, given 6hr of history**

In this research I'm more interested in predicting one timestep ahead (for day-trading purposes), since minute-resolution data carries more usable information at that horizon than longer bars do.
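Before building the class, the index/offset arithmetic for the parameters chosen later in this note (input_width=30, label_width=1, shift=30) can be sketched directly:

```python=
import numpy as np

# Assumed window parameters (the ones used later in this note)
input_width, label_width, shift = 30, 1, 30

total_window_size = input_width + shift            # inputs plus offset
input_indices = np.arange(total_window_size)[:input_width]
label_start = total_window_size - label_width
label_indices = np.arange(total_window_size)[label_start:]

print(total_window_size)   # 60
print(input_indices[-1])   # 29
print(label_indices)       # [59]
```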
#### Index and Offset
Create a class for window size.
```python=
## Create Data Windows; index and offset
class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # store the raw data
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # work out the label column indices
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # work out the window parameters
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])
```
The values we are trying to predict are the **labels**; the attributes used to make the prediction are the **features**.
##### Custom Parameters
Because we are working with minute data, I'm interested in pattern identification, and I think patterns show up more clearly when the data is parsed into 30-minute chunks. Therefore, my parameters should be:
:::info
**input_width=30
label_width=1
shift=30
label_columns=['closeNEXT']**
:::
We are trying to predict closeNEXT: given 30 minutes of history, predict its value 30 minutes ahead.
*example*
:::info
```python=
w1 = WindowGenerator(input_width=30, label_width=1, shift=30,
                     label_columns=['closeNEXT'])
w1
'''
Total window size: 60
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]
Label indices: [59]
Label column name(s): ['closeNEXT']
'''
```
:::
#### Split
We have a list of consecutive inputs. The `split_window` method converts it into a window of inputs and a window of labels.

The illustrating figure is not shown here, but the function handles `label_columns`, so it works for both single-output and multi-output examples (predicting one variable or several).
```python=
def split_window(self, features):
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]] for name in self.label_columns],
            axis=-1)

    # slicing doesn't preserve static shape information, so set the shapes
    # manually; this makes the `tf.data.Dataset`s easier to inspect
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

WindowGenerator.split_window = split_window
```
**Windows example.**
:::info
```python=
# stack three slices, the length of the total window
example_window = tf.stack([
np.array(train_df[:w1.total_window_size]),
np.array(train_df[100:100+w1.total_window_size]),
np.array(train_df[200:200+w1.total_window_size])
])
example_inputs, example_labels = w1.split_window(example_window)
print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')
```
:::
Typically, data in TensorFlow is packed into arrays where the outermost index runs across examples (the "batch" dimension), the middle indices are the time or space dimensions (width, height), and the innermost indices hold the features.
The code above took a batch of three 30-timestep input windows (the timestep count is set by `input_width`), with one feature per column at each timestep, and produced label windows of 1 timestep and 1 feature (`closeNEXT`).
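The same slicing can be checked with a plain NumPy array. The 11-feature count below is an assumption (ten `close*` columns plus the smoothed `NEXTclose` column); adjust it to your own frame:

```python=
import numpy as np

# Hypothetical batch: 3 windows of 60 timesteps, with 11 features each
batch = np.zeros((3, 60, 11), dtype=np.float32)

inputs = batch[:, 0:30, :]     # (batch, input_width, features)
labels = batch[:, 59:, 0:1]    # (batch, label_width, 1 label column)

print(inputs.shape, labels.shape)  # (3, 30, 11) (3, 1, 1)
```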
#### Plot
Visualize **split window**: inputs, labels, predictions.
```python=
w1.example = example_inputs, example_labels
```
Plotting Function
```python=
def plot(self, model=None, plot_col='closeNEXT', max_subplots=3):
    inputs, labels = self.example
    plt.figure(figsize=(12, 8))
    plot_col_index = self.column_indices[plot_col]
    max_n = min(max_subplots, len(inputs))  # number of windows in the batch
    for n in range(max_n):
        plt.subplot(3, 1, n+1)
        plt.ylabel(f'{plot_col} [normed]')
        plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                 label='Inputs', marker='.', zorder=-10)

        if self.label_columns:
            label_col_index = self.label_columns_indices.get(plot_col, None)
        else:
            label_col_index = plot_col_index

        if label_col_index is None:
            continue

        plt.scatter(self.label_indices, labels[n, :, label_col_index],
                    edgecolors='k', label='Labels', c='#2ca02c', s=64)
        if model is not None:
            predictions = model(inputs)
            plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                        marker='X', edgecolors='k', label='Predictions',
                        c='#ff7f0e', s=64)

        if n == 0:
            plt.legend()

    plt.xlabel('Time [min]')

WindowGenerator.plot = plot
```
**example**
```python=
# plot
w1.plot()
### displays the split window: inputs, labels, (predictions)
```
#### Create tf.data.Datasets
Convert **DataFrame** into **tf.data.Dataset (input_window, label_window)** pairs.
```python=
def make_dataset(self, data):
    data = np.array(data, dtype=np.float32)
    ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,)

    ds = ds.map(self.split_window)

    return ds

WindowGenerator.make_dataset = make_dataset
```
The `WindowGenerator` object holds the training, validation, and test data.
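A pure-NumPy sketch of what `timeseries_dataset_from_array` produces before shuffling and batching: overlapping windows of length `total_window_size`, taken with stride 1.

```python=
import numpy as np

# Hypothetical series of 10 values, window length 4
data = np.arange(10.0)
total_window_size = 4

# overlapping windows with stride 1, like sequence_stride=1 above
windows = np.stack([data[i:i + total_window_size]
                    for i in range(len(data) - total_window_size + 1)])

print(windows.shape)        # (7, 4)
print(windows[0].tolist())  # [0.0, 1.0, 2.0, 3.0]
```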
```python=
@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting"""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset
        result = next(iter(self.train))
        # and cache it for next time
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
```
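The `example` property uses a compute-once caching pattern that is worth noting on its own; a standalone sketch (the `_load_batch` helper is a hypothetical stand-in for `next(iter(self.train))`):

```python=
class Cached:
    """Minimal sketch of the compute-once property used above."""

    def _load_batch(self):
        # hypothetical stand-in for `next(iter(self.train))`
        return ('inputs', 'labels')

    @property
    def example(self):
        result = getattr(self, '_example', None)
        if result is None:
            # nothing cached yet: fetch a batch and remember it
            result = self._load_batch()
            self._example = result
        return result

c = Cached()
print(c.example is c.example)  # True: the second access hits the cache
```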
*inspect structure and dtype*
```python=
# Each element is an (inputs, label) pair
w1.train.element_spec
```
## Appendix and FAQ
:::info
**Find this document incomplete?** Leave a comment!
:::
###### tags: `CNN` `Machine Learning`