A good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between variables across time. The authors propose a novel framework for generating realistic time series data that combines the flexibility of the unsupervised paradigm with the control afforded by supervised training. Through a learned embedding space jointly optimized with both supervised and adversarial objectives, the authors encourage the network to adhere to the dynamics of the training data during sampling.
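The temporal setting poses a unique challenge to generative modeling. A model must capture the distributions of features within each time point, and it must also capture the potentially complex dynamics of those variables across time. In modeling multivariate sequential data $\mathbf{x}_{1:T} = (\mathbf{x}_1, \dots, \mathbf{x}_T)$, the model should therefore also accurately capture the conditional distribution $p(\mathbf{x}_t \mid \mathbf{x}_{1:t-1})$ of temporal transitions.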
First, in addition to the unsupervised adversarial loss on both real and synthetic sequences, the authors introduce a stepwise supervised loss that uses the original data as supervision. This explicitly encourages the model to capture the stepwise conditional distributions in the data.
Second, the authors introduce an embedding network to provide a reversible mapping between features and latent representations, thereby reducing the dimensionality of the adversarial learning space. This capitalizes on the fact that the temporal dynamics of even complex systems are often driven by fewer, lower-dimensional factors of variation.
By extending the "train on synthetic, test on real" (TSTR) framework to the sequence-prediction task, the authors also evaluate how well the generated data preserve the predictive characteristics of the original.
Autoregressive recurrent networks trained via the maximum likelihood principle are prone to potentially large prediction errors when performing multi-step sampling, due to the discrepancy between closed-loop training (i.e. conditioned on ground truths) and open-loop inference (i.e. conditioned on previous guesses).
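The gap between closed-loop training and open-loop inference shows up even in a deliberately tiny example. The sketch below is an illustration under stated assumptions, not something from the source: the "true" process is the noiseless scalar recursion $x_t = 0.95\,x_{t-1}$, and a hypothetical learned model uses the slightly wrong coefficient 0.90. When the model is conditioned on ground truth, each prediction carries only one step of model error. When it is conditioned on its own guesses, those errors compound.

```python
from typing import List

# Toy assumption: true dynamics x_t = A_TRUE * x_{t-1}; the "learned" model
# uses a hypothetical, slightly wrong coefficient A_HAT.
A_TRUE = 0.95
A_HAT = 0.90


def true_sequence(x0: float, steps: int) -> List[float]:
    """Ground-truth trajectory x_0, x_1, ..., x_steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(A_TRUE * xs[-1])
    return xs


def closed_loop_predictions(xs: List[float]) -> List[float]:
    """Teacher forcing: each prediction is conditioned on the true previous value."""
    return [A_HAT * x_prev for x_prev in xs[:-1]]


def open_loop_predictions(x0: float, steps: int) -> List[float]:
    """Free running: each prediction is conditioned on the model's own previous guess."""
    preds, x = [], x0
    for _ in range(steps):
        x = A_HAT * x
        preds.append(x)
    return preds


def mean_abs_error(preds: List[float], targets: List[float]) -> float:
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


xs = true_sequence(1.0, 20)
closed_err = mean_abs_error(closed_loop_predictions(xs), xs[1:])
open_err = mean_abs_error(open_loop_predictions(1.0, 20), xs[1:])
print(f"closed-loop MAE: {closed_err:.4f}, open-loop MAE: {open_err:.4f}")
```

The per-step model is identical in both modes. Only the conditioning differs, yet the open-loop error grows with the horizon.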
Multiple studies have straightforwardly inherited the GAN framework within the temporal setting. The first (C-RNN-GAN) directly applied the GAN architecture to sequential data, using LSTM networks for both the generator and the discriminator.
Representation learning in the time-series setting primarily concerns learning compact encodings that benefit downstream tasks such as prediction, forecasting, and classification.
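Let $\mathcal{S}$ be a vector space of static features and $\mathcal{X}$ a vector space of temporal features, and let $\mathbf{S} \in \mathcal{S}$ and $\mathbf{X} \in \mathcal{X}$ be random vectors that can be instantiated with specific values $\mathbf{s}$ and $\mathbf{x}$. Each instance is a tuple $(\mathbf{S}, \mathbf{X}_{1:T})$ with joint distribution $p$, where the sequence length $T$ is itself a random variable whose distribution is absorbed into $p$. The training dataset is $\mathcal{D} = \{(\mathbf{s}_n, \mathbf{x}_{n,1:T_n})\}_{n=1}^{N}$.
The goal is to use training data $\mathcal{D}$ to learn a density $\hat{p}(\mathbf{S}, \mathbf{X}_{1:T})$ that best approximates $p(\mathbf{S}, \mathbf{X}_{1:T})$. Because this may be difficult to optimize directly, the authors also use the autoregressive decomposition $p(\mathbf{S}, \mathbf{X}_{1:T}) = p(\mathbf{S}) \prod_t p(\mathbf{X}_t \mid \mathbf{S}, \mathbf{X}_{1:t-1})$, which yields the complementary objective of learning $\hat{p}(\mathbf{X}_t \mid \mathbf{S}, \mathbf{X}_{1:t-1})$ at every time $t$.
Two objectives. Importantly, this breaks down the sequence-level objective (matching the joint distribution) into a series of stepwise objectives (matching the conditionals). The first is global:
$$\min_{\hat{p}} D\big(p(\mathbf{S}, \mathbf{X}_{1:T}) \,\|\, \hat{p}(\mathbf{S}, \mathbf{X}_{1:T})\big)$$
where $D$ is some appropriate measure of distance between distributions. The second is local:
$$\min_{\hat{p}} D\big(p(\mathbf{X}_t \mid \mathbf{S}, \mathbf{X}_{1:t-1}) \,\|\, \hat{p}(\mathbf{X}_t \mid \mathbf{S}, \mathbf{X}_{1:t-1})\big)$$
for any $t$.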
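A small discrete example makes it possible to check numerically how a joint distribution decomposes into stepwise conditionals. In this sketch all probabilities are hypothetical. The static feature and each temporal feature are binary, and the conditionals depend on the history only through the last value, which is a special case of conditioning on the full history. Multiplying the static marginal by every stepwise conditional produces a valid joint distribution that sums to one over all static values and sequences.

```python
import itertools
from typing import Sequence

# Hypothetical toy distributions (binary static feature s, binary x_t).
P_S = {0: 0.3, 1: 0.7}                              # p(S = s)
P_X1 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}   # p(X_1 = x | S = s), indexed [s][x]
P_TRANS = {                                          # p(X_t = x | S = s, X_{t-1} = x_prev)
    (0, 0): {0: 0.7, 1: 0.3},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}


def conditional(x_t: int, s: int, history: Sequence[int]) -> float:
    """Stepwise conditional p(x_t | s, x_{1:t-1}); this toy only looks at the last value."""
    if len(history) == 0:
        return P_X1[s][x_t]
    return P_TRANS[(s, history[-1])][x_t]


def joint(s: int, xs: Sequence[int]) -> float:
    """Chain rule: p(s, x_{1:T}) = p(s) * prod_t p(x_t | s, x_{1:t-1})."""
    prob = P_S[s]
    for t, x_t in enumerate(xs):
        prob *= conditional(x_t, s, xs[:t])
    return prob


T = 3
total = sum(
    joint(s, xs) for s in P_S for xs in itertools.product([0, 1], repeat=T)
)
print(f"sum of joint over all sequences: {total:.12f}")
```

The static marginal and the full set of conditionals together determine the joint. This is why the local, stepwise objective is a meaningful complement to the global one.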
TimeGAN consists of four network components: an embedding function, recovery function, sequence generator, and sequence discriminator. The key insight is that the autoencoding components (first two) are trained jointly with the adversarial components (latter two), such that TimeGAN simultaneously learns to encode features, generate representations, and iterate across time.
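Let $\mathcal{H}_S$ and $\mathcal{H}_X$ denote the latent vector spaces corresponding to the feature spaces $\mathcal{S}$ and $\mathcal{X}$. The embedding function $e: \mathcal{S} \times \prod_t \mathcal{X} \to \mathcal{H}_S \times \prod_t \mathcal{H}_X$ takes static and temporal features to their latent codes $\mathbf{h}_S, \mathbf{h}_{1:T} = e(\mathbf{s}, \mathbf{x}_{1:T})$ and is implemented via a recurrent network:
$$\mathbf{h}_S = e_S(\mathbf{s}), \qquad \mathbf{h}_t = e_X(\mathbf{h}_S, \mathbf{h}_{t-1}, \mathbf{x}_t)$$
where $e_S: \mathcal{S} \to \mathcal{H}_S$ is an embedding network for static features and $e_X: \mathcal{H}_S \times \mathcal{H}_X \times \mathcal{X} \to \mathcal{H}_X$ is a recurrent embedding network for temporal features.
In the opposite direction, the recovery function $r: \mathcal{H}_S \times \prod_t \mathcal{H}_X \to \mathcal{S} \times \prod_t \mathcal{X}$ takes static and temporal codes back to their feature representations $\tilde{\mathbf{s}}, \tilde{\mathbf{x}}_{1:T} = r(\mathbf{h}_S, \mathbf{h}_{1:T})$ and is implemented through a feedforward network at each step:
$$\tilde{\mathbf{s}} = r_S(\mathbf{h}_S), \qquad \tilde{\mathbf{x}}_t = r_X(\mathbf{h}_t)$$
where $r_S: \mathcal{H}_S \to \mathcal{S}$ and $r_X: \mathcal{H}_X \to \mathcal{X}$ are recovery networks for static and temporal embeddings.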
Note that the embedding and recovery functions can be parameterized by any architecture of choice. The only stipulation is that they be autoregressive and obey causal ordering (i.e., output(s) at each step can only depend on preceding information). For example, the former could just as well be implemented with temporal convolutions, or the latter with an attention-based decoder.
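Below is a minimal sketch of an embedding function and a recovery function. The embedding has a static network (`e_S` in the comments) and a recurrent temporal network (`e_X`), and the recovery applies a feedforward map at each step. The sketch assumes plain Elman-style tanh cells with random, untrained weights, a zero initial hidden state, and features scaled to [0, 1], which is why the outputs pass through a sigmoid. The source fixes none of these choices. The final lines check the causal-ordering requirement: changing the last input leaves earlier latent codes untouched.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def affine(weights: List[Vector], v: Vector) -> Vector:
    """Matrix-vector product (biases omitted for brevity)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class Embedder:
    """e_S: s -> h_S, and recurrent e_X: (h_S, h_{t-1}, x_t) -> h_t."""

    def __init__(self, static_dim: int, feat_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_s = rand_matrix(hidden_dim, static_dim, rng)
        # e_X acts on the concatenation [h_S, h_{t-1}, x_t].
        self.W_x = rand_matrix(hidden_dim, 2 * hidden_dim + feat_dim, rng)

    def __call__(self, s: Vector, xs: List[Vector]) -> Tuple[Vector, List[Vector]]:
        h_s = [math.tanh(v) for v in affine(self.W_s, s)]
        h_prev = [0.0] * self.hidden_dim  # assumed zero initial state
        hs = []
        for x_t in xs:  # h_t depends only on x_1..x_t (causal ordering)
            h_prev = [math.tanh(v) for v in affine(self.W_x, h_s + h_prev + x_t)]
            hs.append(h_prev)
        return h_s, hs


class Recovery:
    """r_S: h_S -> s~, and per-step feedforward r_X: h_t -> x~_t."""

    def __init__(self, static_dim: int, feat_dim: int, hidden_dim: int, seed: int = 1):
        rng = random.Random(seed)
        self.V_s = rand_matrix(static_dim, hidden_dim, rng)
        self.V_x = rand_matrix(feat_dim, hidden_dim, rng)

    def __call__(self, h_s: Vector, hs: List[Vector]) -> Tuple[Vector, List[Vector]]:
        s_tilde = [sigmoid(v) for v in affine(self.V_s, h_s)]
        x_tilde = [[sigmoid(v) for v in affine(self.V_x, h_t)] for h_t in hs]
        return s_tilde, x_tilde


embed, recover = Embedder(2, 3, 4), Recovery(2, 3, 4)
s = [0.1, 0.9]
xs = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7], [0.1, 0.1, 0.1]]
h_s, hs = embed(s, xs)
s_tilde, x_tilde = recover(h_s, hs)

# Causality check: perturbing the last step changes only the last latent code.
_, hs_perturbed = embed(s, xs[:2] + [[0.9, 0.9, 0.9]])
print(hs_perturbed[:2] == hs[:2], hs_perturbed[2] != hs[2])
```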
Instead of producing synthetic output directly in feature space, the generator first outputs into the embedding space.
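Let $\mathcal{Z}_S$ and $\mathcal{Z}_X$ denote vector spaces over which known distributions are defined, and from which random vectors are drawn as input for generating into $\mathcal{H}_S$ and $\mathcal{H}_X$. The generating function $g: \mathcal{Z}_S \times \prod_t \mathcal{Z}_X \to \mathcal{H}_S \times \prod_t \mathcal{H}_X$ takes a tuple of static and temporal random vectors to synthetic latent codes $\hat{\mathbf{h}}_S, \hat{\mathbf{h}}_{1:T} = g(\mathbf{z}_S, \mathbf{z}_{1:T})$ and is implemented through a recurrent network:
$$\hat{\mathbf{h}}_S = g_S(\mathbf{z}_S), \qquad \hat{\mathbf{h}}_t = g_X(\hat{\mathbf{h}}_S, \hat{\mathbf{h}}_{t-1}, \mathbf{z}_t)$$
where $g_S: \mathcal{Z}_S \to \mathcal{H}_S$ is a generator network for static features and $g_X: \mathcal{H}_S \times \mathcal{H}_X \times \mathcal{Z}_X \to \mathcal{H}_X$ is a recurrent generator for temporal features. The random vector $\mathbf{z}_S$ is drawn from a Gaussian distribution, and $\mathbf{z}_t$ follows a Wiener process.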
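The generator mirrors the embedding network but reads noise instead of features. The sketch below uses untrained tanh cells as a stand-in for learned networks, which is an assumption. The noise follows the TimeGAN paper's choices: a standard Gaussian static vector, and a temporal sequence formed as a discretized Wiener process, i.e. cumulative sums of independent Gaussian increments with variance `dt`. The step size `dt` and all names are illustrative.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def affine(weights: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def sample_noise(static_dim: int, temporal_dim: int, T: int, rng: random.Random,
                 dt: float = 1.0) -> Tuple[Vector, List[Vector]]:
    """z_S ~ N(0, I); z_{1:T} is a discretized Wiener path started at 0."""
    z_s = [rng.gauss(0.0, 1.0) for _ in range(static_dim)]
    w = [0.0] * temporal_dim
    z_seq = []
    for _ in range(T):
        w = [w_i + rng.gauss(0.0, math.sqrt(dt)) for w_i in w]  # independent increments
        z_seq.append(w)
    return z_s, z_seq


class Generator:
    """g_S: z_S -> h^_S, and recurrent g_X: (h^_S, h^_{t-1}, z_t) -> h^_t."""

    def __init__(self, static_noise_dim: int, temporal_noise_dim: int,
                 hidden_dim: int, seed: int = 2):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_s = rand_matrix(hidden_dim, static_noise_dim, rng)
        self.W_x = rand_matrix(hidden_dim, 2 * hidden_dim + temporal_noise_dim, rng)

    def __call__(self, z_s: Vector, z_seq: List[Vector]) -> Tuple[Vector, List[Vector]]:
        h_s = [math.tanh(v) for v in affine(self.W_s, z_s)]
        h_prev = [0.0] * self.hidden_dim
        hs = []
        for z_t in z_seq:  # open loop: conditions on its own previous output
            h_prev = [math.tanh(v) for v in affine(self.W_x, h_s + h_prev + z_t)]
            hs.append(h_prev)
        return h_s, hs


z_s, z_seq = sample_noise(2, 3, T=5, rng=random.Random(42))
gen = Generator(2, 3, hidden_dim=4)
h_hat_s, h_hat = gen(z_s, z_seq)
print(len(h_hat), len(h_hat[0]))
```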
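Finally, the discriminator also operates from the embedding space. The discrimination function $d: \mathcal{H}_S \times \prod_t \mathcal{H}_X \to [0,1] \times \prod_t [0,1]$ receives the static and temporal codes and returns classifications $\tilde{y}_S, \tilde{y}_{1:T} = d(\tilde{\mathbf{h}}_S, \tilde{\mathbf{h}}_{1:T})$. Here $\tilde{\mathbf{h}}_*$ denotes either real ($\mathbf{h}_*$) or synthetic ($\hat{\mathbf{h}}_*$) embeddings, and $\tilde{y}_*$ likewise denotes classifications of real ($y_*$) or synthetic ($\hat{y}_*$) data. The discriminator is implemented as a bidirectional recurrent network with a feedforward output layer:
$$\tilde{y}_S = d_S(\tilde{\mathbf{h}}_S), \qquad \tilde{y}_t = d_X(\overrightarrow{\mathbf{u}}_t, \overleftarrow{\mathbf{u}}_t)$$
where $\overrightarrow{\mathbf{u}}_t = \overrightarrow{c}_X(\tilde{\mathbf{h}}_S, \tilde{\mathbf{h}}_t, \overrightarrow{\mathbf{u}}_{t-1})$ and $\overleftarrow{\mathbf{u}}_t = \overleftarrow{c}_X(\tilde{\mathbf{h}}_S, \tilde{\mathbf{h}}_t, \overleftarrow{\mathbf{u}}_{t+1})$ denote the forward and backward hidden states, $\overrightarrow{c}_X$ and $\overleftarrow{c}_X$ are recurrent functions, and $d_S$ and $d_X$ are output-layer classification functions.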
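A bidirectional discriminator can be sketched the same way. Below, a forward recurrent pass and a backward recurrent pass each read the static code and the per-step code. Both use untrained tanh cells, which is an assumption. At each step, a sigmoid output layer combines the two hidden states. Unlike the embedding and the generator, the discriminator need not be causal, so a change at the last step can alter the classification at the first step. The final lines check this.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def affine(weights: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class Discriminator:
    """d_S on the static code; d_X on forward/backward recurrent states."""

    def __init__(self, hidden_dim: int, state_dim: int, seed: int = 3):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.w_s = rand_matrix(1, hidden_dim, rng)  # d_S
        self.C_fwd = rand_matrix(state_dim, 2 * hidden_dim + state_dim, rng)
        self.C_bwd = rand_matrix(state_dim, 2 * hidden_dim + state_dim, rng)
        self.w_x = rand_matrix(1, 2 * state_dim, rng)  # d_X

    def _scan(self, C: List[Vector], h_s: Vector, hs: List[Vector]) -> List[Vector]:
        u, states = [0.0] * self.state_dim, []
        for h_t in hs:
            u = [math.tanh(v) for v in affine(C, h_s + h_t + u)]
            states.append(u)
        return states

    def __call__(self, h_s: Vector, hs: List[Vector]) -> Tuple[float, List[float]]:
        y_s = sigmoid(affine(self.w_s, h_s)[0])
        fwd = self._scan(self.C_fwd, h_s, hs)              # forward state uses step t-1
        bwd = self._scan(self.C_bwd, h_s, hs[::-1])[::-1]  # backward state uses step t+1
        ys = [sigmoid(affine(self.w_x, f + b)[0]) for f, b in zip(fwd, bwd)]
        return y_s, ys


disc = Discriminator(hidden_dim=4, state_dim=3)
h_s = [0.1, -0.2, 0.3, 0.0]
hs = [[0.5, 0.1, -0.4, 0.2], [0.0, 0.3, 0.3, -0.1], [-0.2, 0.4, 0.1, 0.6]]
y_s, ys = disc(h_s, hs)
_, ys_changed = disc(h_s, hs[:2] + [[0.9, -0.9, 0.9, -0.9]])
print(y_s, ys, ys_changed[0] != ys[0])
```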
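First, purely as a reversible mapping between feature and latent spaces, the embedding and recovery functions should enable accurate reconstructions $\tilde{\mathbf{s}}, \tilde{\mathbf{x}}_{1:T}$ of the original data $\mathbf{s}, \mathbf{x}_{1:T}$ from their latent representations $\mathbf{h}_S, \mathbf{h}_{1:T}$. The first objective is therefore the reconstruction loss:
$$\mathcal{L}_R = \mathbb{E}_{\mathbf{s}, \mathbf{x}_{1:T} \sim p}\Big[\|\mathbf{s} - \tilde{\mathbf{s}}\|_2 + \sum_t \|\mathbf{x}_t - \tilde{\mathbf{x}}_t\|_2\Big]$$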
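The reconstruction loss sums a static distance and per-step Euclidean distances, then averages over a batch as a sample estimate of the expectation. The sketch assumes that each example is given as a tuple `(s, xs, s_tilde, xs_tilde)` of Python lists. It uses the unsquared norm $\|\cdot\|_2$, as in the TimeGAN paper's reconstruction loss, rather than a squared error.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, List[Vector], Vector, List[Vector]]  # (s, x_{1:T}, s~, x~_{1:T})


def l2(a: Vector, b: Vector) -> float:
    """Euclidean (unsquared) distance ||a - b||_2."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reconstruction_loss(batch: List[Example]) -> float:
    """Sample estimate of L_R over a batch of examples."""
    total = 0.0
    for s, xs, s_tilde, xs_tilde in batch:
        total += l2(s, s_tilde) + sum(l2(x, x_rec) for x, x_rec in zip(xs, xs_tilde))
    return total / len(batch)


# Static error ||(0,0) - (3,4)|| = 5; temporal errors 0 and 0.5.
batch = [([0.0, 0.0], [[1.0], [2.0]], [3.0, 4.0], [[1.0], [2.5]])]
print(reconstruction_loss(batch))  # 5.5
```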
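During training, the generator is exposed to two types of inputs. In pure open-loop mode, the generator, which is autoregressive, receives synthetic embeddings $\hat{\mathbf{h}}_S, \hat{\mathbf{h}}_{1:t-1}$ (i.e., its own previous outputs) in order to generate the next synthetic vector $\hat{\mathbf{h}}_t$. Gradients are then computed on the unsupervised loss. This is as one would expect: it allows maximizing (for the discriminator) or minimizing (for the generator) the likelihood of providing correct classifications $\tilde{y}_S, \tilde{y}_{1:T}$ for both the training data $\mathbf{h}_S, \mathbf{h}_{1:T}$ and the synthetic output $\hat{\mathbf{h}}_S, \hat{\mathbf{h}}_{1:T}$ from the generator:
$$\mathcal{L}_U = \mathbb{E}_{\mathbf{s}, \mathbf{x}_{1:T} \sim p}\Big[\log y_S + \sum_t \log y_t\Big] + \mathbb{E}_{\mathbf{s}, \mathbf{x}_{1:T} \sim \hat{p}}\Big[\log(1 - \hat{y}_S) + \sum_t \log(1 - \hat{y}_t)\Big]$$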
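The discriminator ascends this log-likelihood and the generator descends it. Only the synthetic-data term depends on the generator. The sketch below computes a batch estimate of the loss from discriminator outputs given as `(y_S, [y_1, ..., y_T])` pairs. The clipping constant `eps` is an assumption added for numerical safety.

```python
import math
from typing import List, Tuple

Output = Tuple[float, List[float]]  # (y_S, [y_1, ..., y_T]), values in (0, 1)


def safe_log(p: float, eps: float = 1e-12) -> float:
    return math.log(max(p, eps))  # clip to avoid log(0)


def unsupervised_loss(real: List[Output], fake: List[Output]) -> float:
    """Sample estimate of L_U: log-likelihood of correct real/synthetic classifications."""
    real_term = sum(
        safe_log(y_s) + sum(safe_log(y) for y in ys) for y_s, ys in real
    ) / len(real)
    fake_term = sum(
        safe_log(1.0 - y_s) + sum(safe_log(1.0 - y) for y in ys) for y_s, ys in fake
    ) / len(fake)
    return real_term + fake_term


# A discriminator that outputs 0.5 everywhere (chance level) on T = 2 steps.
chance = [(0.5, [0.5, 0.5])]
print(unsupervised_loss(chance, chance))  # approximately 6 * log(0.5)

# A confident, correct discriminator pushes L_U toward its maximum of 0.
confident_real = [(0.999, [0.999, 0.999])]
confident_fake = [(0.001, [0.001, 0.001])]
print(unsupervised_loss(confident_real, confident_fake))
```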
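The authors also train in closed-loop mode, where the generator receives sequences of embeddings of actual data $\mathbf{h}_{1:t-1}$ (i.e., computed by the embedding network) to generate the next latent vector. Gradients can now be computed on a loss that captures the discrepancy between $p(\mathbf{H}_t \mid \mathbf{H}_S, \mathbf{H}_{1:t-1})$ and $\hat{p}(\mathbf{H}_t \mid \mathbf{H}_S, \mathbf{H}_{1:t-1})$. Applying maximum likelihood yields the familiar supervised loss:
$$\mathcal{L}_S = \mathbb{E}_{\mathbf{s}, \mathbf{x}_{1:T} \sim p}\Big[\sum_t \|\mathbf{h}_t - g_X(\mathbf{h}_S, \mathbf{h}_{t-1}, \mathbf{z}_t)\|_2\Big]$$
where $g_X(\mathbf{h}_S, \mathbf{h}_{t-1}, \mathbf{z}_t)$ approximates $\mathbb{E}_{\mathbf{z}_t \sim \mathcal{N}}[\hat{p}(\mathbf{H}_t \mid \mathbf{H}_S, \mathbf{H}_{1:t-1}, \mathbf{z}_t)]$ with a single sample $\mathbf{z}_t$, as is standard in stochastic gradient descent.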
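In code, closed-loop training differs from open-loop generation only in what the generator step is conditioned on. The sketch below assumes that the generator's per-step transition is exposed as a callable `g_x(h_s, h_prev, z_t)`, which is a hypothetical interface, and it uses one noise sample per step. The "persistence" transition used in the example is a toy stand-in for a trained network.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
NextStep = Callable[[Vector, Vector, Vector], List[float]]  # g_X(h_S, h_{t-1}, z_t)


def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def supervised_loss(h_s: Vector, hs: List[Vector], zs: List[Vector], g_x: NextStep) -> float:
    """Closed-loop stepwise loss for one sequence.

    The generator step is fed the *real* previous embedding hs[t-1], and its
    one-step prediction is compared to the real next embedding hs[t]. The sum
    starts at index 1 because hs[0] has no real predecessor (an assumed
    boundary choice).
    """
    return sum(l2(hs[t], g_x(h_s, hs[t - 1], zs[t])) for t in range(1, len(hs)))


def persistence(h_s: Vector, h_prev: Vector, z_t: Vector) -> List[float]:
    """Hypothetical g_X that predicts 'no change' and ignores h_S and z_t."""
    return list(h_prev)


hs = [[0.0], [1.0], [3.0]]
zs = [[0.0], [0.0], [0.0]]
print(supervised_loss([0.0], hs, zs, persistence))  # 1.0 + 2.0 = 3.0
```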
In sum, at any step in a training sequence, the authors assess the difference between the actual next-step latent vector (from the embedding function) and the synthetic next-step latent vector (from the generator, conditioned on the actual historical sequence of latents). While $\mathcal{L}_U$ pushes the generator to create realistic sequences (evaluated by an imperfect adversary), $\mathcal{L}_S$ further ensures that it produces similar stepwise transitions (evaluated by ground-truth targets).
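Optimization. Figure 1(b) illustrates the mechanics of the approach at training. Let $\theta_e, \theta_r, \theta_g, \theta_d$ denote the parameters of the embedding, recovery, generator, and discriminator networks, respectively. The first two components are trained on both the reconstruction and supervised losses:
$$\min_{\theta_e, \theta_r} \big(\lambda \mathcal{L}_S + \mathcal{L}_R\big)$$
where $\lambda \geq 0$ is a hyperparameter that balances the two losses.
Next, the generator and discriminator networks are trained adversarially as follows:
$$\min_{\theta_g} \big(\eta \mathcal{L}_S + \max_{\theta_d} \mathcal{L}_U\big)$$
where $\eta \geq 0$ is another hyperparameter that balances the two losses.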
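The two optimization problems differ only in which losses each parameter group sees. The sketch below is a hypothetical helper that assembles the scalar objective for each group from loss values that have already been computed. The default weights `lam=1.0` and `eta=1.0` are placeholders, not values from the source. In a real training loop, each objective would be differentiated only with respect to its own group's parameters (for the first entry, the embedding and recovery networks).

```python
from typing import Dict


def timegan_objectives(loss_r: float, loss_s: float, loss_u: float,
                       lam: float = 1.0, eta: float = 1.0) -> Dict[str, float]:
    """Scalar objective per parameter group, given precomputed L_R, L_S, L_U."""
    if lam < 0 or eta < 0:
        raise ValueError("lam and eta must be non-negative")
    return {
        # Embedding + recovery: minimize lam * L_S + L_R.
        "embedder_recovery_min": lam * loss_s + loss_r,
        # Generator: minimize eta * L_S + L_U (only L_U's synthetic term depends on it).
        "generator_min": eta * loss_s + loss_u,
        # Discriminator: maximize L_U.
        "discriminator_max": loss_u,
    }


objs = timegan_objectives(loss_r=0.5, loss_s=0.2, loss_u=-1.0, lam=0.1, eta=10.0)
print(objs)
```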
Figure 2: (a) TimeGAN instantiated with RNNs, (b) C-RNN-GAN, and (c) RCGAN. Solid lines denote function application, dashed lines denote recurrence, and orange lines indicate loss computation.
Benchmarks and Evaluation. The authors compare TimeGAN with RCGAN and C-RNN-GAN, the two most closely related methods. Among purely autoregressive approaches, they compare against RNNs trained with teacher forcing (T-Forcing) and with professor forcing (P-Forcing). For additional comparison, they also consider WaveNet and its GAN counterpart, WaveGAN. To assess the quality of generated data, the authors observe three desiderata: (1) diversity: samples should be distributed to cover the real data; (2) fidelity: samples should be indistinguishable from the real data; and (3) usefulness: samples should be just as useful as the real data when used for the same predictive purposes (i.e., train-on-synthetic, test-on-real).
(1) Visualization. The authors apply t-SNE and PCA to both the original and synthetic datasets (flattening the temporal dimension).
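One way to read "flattening" is to turn each sequence of $T$ steps with $d$ features into a single vector of length $T \cdot d$ before applying a standard dimensionality-reduction method. The exact flattening the authors use is not specified here, so this reading is an assumption. The sketch computes the leading principal component with power iteration, a swapped-in, dependency-free stand-in for a library PCA, and then projects sequences onto it. In practice one would fit the component on the original data and project both datasets for plotting.

```python
import math
import random
from typing import List

Sequence2D = List[List[float]]  # one sequence: T steps x d features


def flatten(sequences: List[Sequence2D]) -> List[List[float]]:
    """Concatenate the T x d values of each sequence into one vector of length T*d."""
    return [[v for step in seq for v in step] for seq in sequences]


def column_mean(rows: List[List[float]]) -> List[float]:
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def first_principal_component(rows: List[List[float]], iters: int = 200,
                              seed: int = 0) -> List[float]:
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = column_mean(rows)
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows: List[List[float]], component: List[float],
            mean: List[float]) -> List[float]:
    """1-D coordinates of each row along the component, after centering."""
    return [sum((r[j] - mean[j]) * component[j] for j in range(len(r))) for r in rows]


# Toy data: T = 1 step with d = 2 features lying on the line x1 = x2.
seqs = [[[t, t]] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
rows = flatten(seqs)
pc = first_principal_component(rows)
scores = project(rows, pc, column_mean(rows))
print(pc, scores)
```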
(2) Discriminative Score. For a quantitative measure of similarity, the authors train a post-hoc time-series classification model (by optimizing a 2-layer LSTM) to distinguish between sequences from the original and generated datasets.
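The text does not spell out how the classifier's performance becomes a score. One common convention, assumed in the sketch below, reports how far the held-out classification accuracy is from chance, $|\text{accuracy} - 0.5|$. A score of 0 means the classifier cannot tell real from synthetic sequences (best), and 0.5 means it separates them perfectly (worst). The 2-layer LSTM classifier itself is omitted; the hypothetical function only takes its predicted probabilities of "real".

```python
from typing import List


def discriminative_score(p_real_on_real: List[float], p_real_on_synth: List[float],
                         threshold: float = 0.5) -> float:
    """|accuracy - 0.5| of a post-hoc real-vs-synthetic classifier on held-out data.

    Inputs are the classifier's predicted probabilities of "real" for held-out
    original sequences and held-out synthetic sequences, respectively.
    """
    correct = sum(p > threshold for p in p_real_on_real)
    correct += sum(p <= threshold for p in p_real_on_synth)
    accuracy = correct / (len(p_real_on_real) + len(p_real_on_synth))
    return abs(accuracy - 0.5)


# A perfect classifier (synthetic data easy to spot) gives 0.5, the worst score.
print(discriminative_score([0.9, 0.8], [0.1, 0.2]))
# A classifier at chance level gives 0.0, the best score.
print(discriminative_score([0.4, 0.6], [0.4, 0.6]))
```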
(3) Predictive Score. To be useful, the sampled data should inherit the predictive characteristics of the original. In particular, the authors expect TimeGAN to excel at capturing conditional distributions over time. Therefore, using the synthetic dataset, the authors train a post-hoc sequence-prediction model (by optimizing a 2-layer LSTM) to predict next-step temporal vectors over each input sequence. They then evaluate the trained model on the original dataset, measuring performance by mean absolute error (MAE).
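The train-on-synthetic, test-on-real protocol can be sketched with a much simpler predictor than the authors' 2-layer LSTM. Below, a least-squares linear next-step model on univariate sequences, a deliberate swapped-in simplification, is fitted to synthetic data and scored by mean absolute error on real data. The toy sequences are hypothetical. Synthetic data that share the real dynamics give near-zero error, while data with the wrong dynamics do not.

```python
from typing import List, Tuple


def next_step_pairs(sequences: List[List[float]]) -> List[Tuple[float, float]]:
    """(x_t, x_{t+1}) pairs from univariate sequences."""
    return [(seq[t], seq[t + 1]) for seq in sequences for t in range(len(seq) - 1)]


def fit_linear_predictor(sequences: List[List[float]]) -> Tuple[float, float]:
    """Least-squares fit of x_{t+1} ~ a * x_t + b (a stand-in for the 2-layer LSTM)."""
    pairs = next_step_pairs(sequences)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def predictive_score(synthetic: List[List[float]], real: List[List[float]]) -> float:
    """Train on synthetic, test on real: MAE of next-step predictions on real data."""
    a, b = fit_linear_predictor(synthetic)
    pairs = next_step_pairs(real)
    return sum(abs(a * x + b - y) for x, y in pairs) / len(pairs)


def simulate(x0: float, a: float, b: float, T: int) -> List[float]:
    """Hypothetical toy process x_{t+1} = a * x_t + b."""
    xs = [x0]
    for _ in range(T - 1):
        xs.append(a * xs[-1] + b)
    return xs


real = [simulate(x0, 0.5, 1.0, 10) for x0 in (0.0, 4.0)]
good_synth = [simulate(x0, 0.5, 1.0, 10) for x0 in (1.0, 3.0)]  # same dynamics
bad_synth = [simulate(x0, 0.9, 0.0, 10) for x0 in (1.0, 3.0)]   # wrong dynamics
print(predictive_score(good_synth, real), predictive_score(bad_synth, real))
```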
Figure 3: t-SNE visualization on Sines (1st row) and Stocks (2nd row). Each column provides the visualization for each of the 7 benchmarks. Red denotes original data, and blue denotes synthetic.