# The Signature: A Feature for Sequential Data Analysis
## Introduction
The signature method is a mathematical framework with roots in stochastic analysis and algebra that is increasingly used in machine learning and data science. It offers a principled approach to sequential data: the data is turned into a path, and features are extracted from that path through iterated integrals. The construction goes back to the work of K.T. Chen in the 1950s. It describes a time series through the order and interaction of its changes rather than through individual observations.
### Why Study Signatures?
The signature method addresses key challenges in machine learning, offering several advantages:
* **Automatic Feature Extraction:** Systematically extracts relevant features from sequential data, eliminating the need for manual feature engineering.
* **Nonlinearity:** Captures both linear and nonlinear relationships within the data.
* **Interpretability:** Provides geometric interpretations of patterns, enhancing understanding of the underlying dynamics.
* **Invariance Properties:** Naturally encodes important invariances like time warping and translation.
* **Robustness to Irregular Sampling:** Handles missing data and varying sampling rates effectively.
* **Dimensionality Reduction:** The truncated signature compresses a sequence of any length into a fixed number of features, though that number grows rapidly with the path dimension and the truncation level.
### Historical Context
The journey of the signature method from pure mathematics to practical applications spans several decades:
* **1950s:** Introduced by K.T. Chen for studying paths in algebraic topology.
* **1990s:** Terry Lyons extends the theory to rough paths, enabling applications in stochastic analysis.
* **2000s:** First applications emerge in financial mathematics for analyzing price movements.
* **2010s:** Gains traction in machine learning for diverse tasks like character recognition and time series classification.
* **Present:** Rapidly expanding applications in healthcare, signal processing, computer vision, and more.
### Applications Spectrum
The signature method has proven successful in various domains:

| Domain | Example Applications |
|---|---|
| Finance | Price prediction, risk management, algorithmic trading |
| Healthcare | Patient monitoring, disease prediction, drug discovery |
| Signal Processing | Speech recognition, anomaly detection |
| Computer Vision | Action recognition, gesture analysis |

### Prerequisites
A basic understanding of the following is helpful:
* Basic linear algebra and calculus
* Fundamental probability theory
* Elementary machine learning concepts
* Basic programming skills (Python/C++)
## The Signature: Capturing the Essence of a Path
The signature of a path provides a complete and concise summary of its essential information. It achieves this by encoding the path's incremental changes and their order in a hierarchical structure.
### Definition
Given a path $X: [a,b] \rightarrow \mathbb{R}^d$, its signature $S(X)_{a,b}$ is an infinite sequence of real numbers:
$$
S(X)_{a,b} = (1, S(X)^1_{a,b}, \dots, S(X)^d_{a,b}, S(X)^{1,1}_{a,b}, S(X)^{1,2}_{a,b}, \dots)
$$
Each element in this sequence is an iterated integral of the path, capturing increasingly complex interactions between its dimensions:
$$
S(X)^{i_1,...,i_k}_{a,t} = \int_{a<s<t} S(X)^{i_1,...,i_{k-1}}_{a,s} dX^{i_k}_s
$$
This recursive definition starts with the basic increments of the path ($S(X)^i_{a,b} = X^i_b - X^i_a$) and builds up higher-order terms that capture more intricate patterns.
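
As a worked instance of the recursion, consider the simplest case of a straight line. This is an illustrative special case, with $\Delta = X_b - X_a$ introduced here as shorthand for the total increment. For $X_t = X_a + \frac{t-a}{b-a}\Delta$ we have $dX^j_s = \frac{\Delta^j}{b-a}\,ds$, so

$$
S(X)^{i}_{a,t} = \frac{t-a}{b-a}\,\Delta^i, \qquad
S(X)^{i,j}_{a,b} = \int_a^b \frac{s-a}{b-a}\,\Delta^i \cdot \frac{\Delta^j}{b-a}\,ds = \frac{\Delta^i \Delta^j}{2}
$$

Repeating the same step $k$ times gives

$$
S(X)^{i_1,\dots,i_k}_{a,b} = \frac{1}{k!}\prod_{m=1}^{k} \Delta^{i_m}
$$

so for a straight line every term is determined by the increment alone. Higher-order terms only carry new information once the path bends.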
### A Simple Analogy
Imagine a path as a road trip. The signature is like a detailed travel log that records not just the starting and ending points, but also:
* **Individual legs of the journey:** The distances traveled in each direction (north, south, east, west).
* **Combinations of legs:** The effect of traveling north then east, versus east then north.
* **More complex combinations:** The impact of going north, then east, then south, and so on.
This multi-layered record provides a rich description of the journey, capturing its overall shape and complexity.
### Truncated Signature
In practice, we often work with a truncated signature $S_N(X)_{a,b}$, which includes only the terms up to a certain level $N$:
$$
S_N(X)_{a,b} = (1, S(X)^i_{a,b}, S(X)^{i,j}_{a,b}, \dots, S(X)^{i_1,\dots,i_N}_{a,b})
$$
This truncation provides a finite representation of the signature while still capturing a significant portion of the path's information.
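
To get a feel for how large the truncated signature is, the following sketch counts its terms. Level $k$ contributes $d^k$ terms (one per multi-index $i_1,\dots,i_k$), so $S_N(X)$ has $1 + d + d^2 + \dots + d^N$ entries. The function name is illustrative and not taken from any library.

```python
def truncated_signature_size(d: int, N: int) -> int:
    """Number of terms in S_N(X) for a d-dimensional path, including the leading 1."""
    # Level k has one term per multi-index (i_1, ..., i_k), i.e. d**k terms.
    return sum(d**k for k in range(N + 1))


size_2d_level3 = truncated_signature_size(2, 3)   # 1 + 2 + 4 + 8
size_3d_level4 = truncated_signature_size(3, 4)   # 1 + 3 + 9 + 27 + 81
```

The rapid growth in $d$ and $N$ is one reason low truncation levels are common in practice.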
### Key Properties
The signature possesses remarkable properties that make it a powerful tool for analysis:
* **Reparametrization Invariance:** The signature ignores speed, focusing only on the path's shape. For any continuous, non-decreasing surjection $\phi: [a,b] \rightarrow [a,b]$: $$S(X \circ \phi)_{a,b} = S(X)_{a,b}$$
* **Shuffle Product:** An algebraic rule governs how signature terms combine. $$S(X)^I_{a,b}S(X)^J_{a,b} = \sum_{K\in I \sqcup\sqcup J} S(X)^K_{a,b}$$
* **Chen's Identity:** The signature of a concatenated path is built from the signatures of its parts. $$S(X * Y)_{a,c} = S(X)_{a,b} \otimes S(Y)_{b,c}$$
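* **Time-Reversal:** The signature of the reversed path is the inverse of the original signature, where $\mathbf{1} = (1, 0, 0, \dots)$ is the unit of the tensor algebra. $$S(X)_{a,b} \otimes S(\overleftarrow{X})_{a,b} = \mathbf{1}$$
* **Uniqueness:** The signature determines the path up to tree-like equivalence, that is, up to sections where the path exactly retraces itself.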
* **Log Signature:** An additive representation of the signature, simplifying some calculations. $$\log S(X)_{a,b} = \sum_{k\geq 1} \sum_{i_1,\dots,i_k} \lambda_{i_1, \dots,i_k}[e_{i_1}, [e_{i_2}, \dots, [e_{i_{k-1}}, e_{i_k}] \dots]]$$
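
Several of these properties can be checked numerically at the first two signature levels. The sketch below is a minimal illustration for piecewise linear paths, not a reference implementation: `sig2` is a helper written for this example, and it uses the fact that a straight segment with increment $\Delta$ has level-two term $\frac{1}{2}\Delta \otimes \Delta$.

```python
import numpy as np


def sig2(path):
    """Level-1 and level-2 signature terms of a piecewise linear path.

    path: array of shape (n_points, d). Returns (s1, s2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one straight segment with increment delta.
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2


X = np.array([[0.0, 0.0], [1.0, 0.5], [1.5, 2.0], [3.0, 1.0]])
Y = np.array([[3.0, 1.0], [2.0, 2.5], [4.0, 3.0]])
x1, x2 = sig2(X)
y1, y2 = sig2(Y)

# Shuffle product at the lowest level: S^i S^j = S^{i,j} + S^{j,i}.
shuffle_ok = np.allclose(np.outer(x1, x1), x2 + x2.T)

# Chen's identity at level 2 for the concatenation X * Y.
c1, c2 = sig2(np.vstack([X, Y[1:]]))
chen_ok = np.allclose(c1, x1 + y1) and np.allclose(c2, x2 + np.outer(x1, y1) + y2)

# Reparametrization: adding the midpoint of every segment changes the sampling, not the shape.
midpoints = 0.5 * (X[:-1] + X[1:])
X_dense = np.vstack([np.column_stack([X[:-1], midpoints]).reshape(-1, 2), X[-1]])
d1, d2 = sig2(X_dense)
reparam_ok = np.allclose(d1, x1) and np.allclose(d2, x2)

# Time reversal: the level-2 part of S(X) tensor S(reversed X) vanishes.
r1, r2 = sig2(X[::-1])
reversal_ok = np.allclose(r1, -x1) and np.allclose(x2 + np.outer(x1, r1) + r2, 0.0)
```

All four flags should evaluate to `True` for any choice of sample points, since the identities hold exactly for piecewise linear paths.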
## Basic Operations: Preparing Data for Signature Analysis
Before we can leverage the power of signatures, we need to transform our data into a suitable format. This involves representing the data as a continuous path and potentially applying transformations to enhance its information content.
### From Discrete to Continuous
Real-world data often comes in discrete form, as a sequence of measurements at specific time points. To apply the signature method, we need to convert this discrete data into a continuous path. Two common approaches are:
* **Linear Interpolation:** This method simply connects consecutive data points with straight lines. For a sequence of data points $\{(t_i, x_i)\}_{i=1}^n$, the linear interpolation $X(t)$ is defined as: $$X(t) = x_i + (t - t_i) \cdot \frac{x_{i+1} - x_i}{t_{i+1} - t_i}, \quad t \in [t_i, t_{i+1})$$ This creates a piecewise linear path that captures the basic trend of the data.
* **Rectilinear Path (Axis Path):** This method constructs a path that moves parallel to the coordinate axes. For example, to represent the point $(x_1,x_2)$ in a 2-dimensional space, we first move horizontally from $(0,0)$ to $(x_1,0)$, and then vertically to $(x_1,x_2)$. This approach is particularly useful for financial time series data, where each dimension represents a different asset or indicator.
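
Both constructions can be sketched in a few lines. The function names below are illustrative. `linear_interpolation` relies on NumPy's `np.interp` and handles one coordinate at a time, while `axis_path` returns the corner points of the rectilinear path, updating one coordinate per move.

```python
import numpy as np


def linear_interpolation(t_obs, x_obs, t_query):
    """Evaluate the piecewise linear path through (t_i, x_i) at the times t_query."""
    return np.interp(t_query, t_obs, x_obs)


def axis_path(points):
    """Corner points of the rectilinear path through `points` (shape (n, d))."""
    points = np.asarray(points, dtype=float)
    path = [points[0].copy()]
    for target in points[1:]:
        current = path[-1].copy()
        for k in range(points.shape[1]):
            # Move parallel to axis k until coordinate k matches the target.
            current[k] = target[k]
            path.append(current.copy())
    return np.array(path)


values = linear_interpolation([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], [0.5, 2.0])
corners = axis_path([[0.0, 0.0], [2.0, 3.0]])   # (0,0) -> (2,0) -> (2,3)
```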

### Lead-Lag Transformation
This transformation enhances a one-dimensional time series by pairing it with a lagged copy of itself. Specifically, for a time series $\{X(t_i)\}$, we create a new path in two dimensions:
$$
t \mapsto (X^{\text{lead}}(t), X^{\text{lag}}(t))
$$
where $X^{\text{lead}}(t_i) = X(t_i)$ and $X^{\text{lag}}(t_i) = X(t_{i-1})$.

This lead-lag transformation captures information about the volatility and temporal dependencies in the data. For instance, when the lead coordinate is updated before the lag coordinate, the signed area enclosed between the lead-lag path and its chord equals half the quadratic variation of the original time series:
$$
\text{Area enclosed} = \frac{1}{2}QV(X)
$$
where the quadratic variation $QV(X)$ is defined as:
$$
QV(X) = \sum_i (X(t_i) - X(t_{i-1}))^2
$$
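
The area identity can be checked directly. The sketch below assumes one common discretization of the lead-lag path, in which the lead coordinate jumps to the new value first and the lag coordinate follows. The helper names are illustrative.

```python
import numpy as np


def lead_lag(x):
    """Lead-lag path of a 1-D series: the lead moves first, then the lag catches up."""
    x = np.asarray(x, dtype=float)
    lead = np.repeat(x, 2)[1:]    # x0, x1, x1, x2, x2, ...
    lag = np.repeat(x, 2)[:-1]    # x0, x0, x1, x1, x2, ...
    return np.column_stack([lead, lag])


def signed_area(path):
    """Signed area between a 2-D piecewise linear path and its chord."""
    p = path - path[0]
    d = np.diff(p, axis=0)
    # 0.5 * integral of (x dy - y dx); on a straight segment the integrand is constant.
    return 0.5 * np.sum(p[:-1, 0] * d[:, 1] - p[:-1, 1] * d[:, 0])


x = np.array([1.0, 3.0, 2.0, 5.0])
qv = np.sum(np.diff(x) ** 2)          # quadratic variation: 4 + 1 + 9
area = signed_area(lead_lag(x))       # equals qv / 2 under this convention
```

If the lag were updated before the lead, the area would keep its magnitude but change sign.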

### Computing the Signature
Once we have a continuous path representation of our data, we can compute its signature. This involves calculating the iterated integrals that define the signature terms. In practice, we typically compute a truncated signature up to a certain level $N$, denoted by $S_N(X)$. The choice of truncation level depends on factors like data complexity, computational resources, and desired accuracy.
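
For piecewise linear paths the iterated integrals never need to be evaluated numerically. Each straight segment has a closed-form signature (level $k$ is $\Delta^{\otimes k}/k!$), and Chen's identity combines the segments. The following is a minimal sketch of that idea with illustrative function names. It favors readability over speed, and dedicated signature libraries would normally be preferred for real workloads.

```python
import numpy as np


def segment_signature(delta, depth):
    """Signature of a straight segment: level k is the k-fold tensor power of delta over k!."""
    levels = [np.array(1.0)]
    for k in range(1, depth + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / k)
    return levels


def chen_product(a, b):
    """Truncated tensor product of two signatures (Chen's identity)."""
    depth = len(a) - 1
    return [
        sum(np.multiply.outer(a[i], b[k - i]) for i in range(k + 1))
        for k in range(depth + 1)
    ]


def signature(path, depth):
    """Truncated signature of a piecewise linear path of shape (n_points, d).

    Returns a list whose k-th entry is an array with k axes of length d.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    sig = [np.array(1.0)] + [np.zeros((d,) * k) for k in range(1, depth + 1)]
    for delta in np.diff(path, axis=0):
        sig = chen_product(sig, segment_signature(delta, depth))
    return sig


# L-shaped path: one unit along the first axis, then one unit along the second.
sig = signature([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]], depth=3)
```

Here `sig[2][0, 1]` holds $S(X)^{1,2}$ and `sig[2][1, 0]` holds $S(X)^{2,1}$ (indices are zero-based in the code). Their difference reflects the order in which the two moves happen.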
### Extracting Statistical Moments
The signature also encodes statistical moments of the data, provided the path is constructed appropriately. The level-one term $S(X)^1_{0,T} = X_T - X_0$ is only the total increment, so the observations $x_1, \dots, x_n$ are first turned into the cumulative-sum path $C_k = \sum_{i \le k} x_i$ (with $C_0 = 0$), whose increments are the observations themselves. Writing $S$ for the signature of the lead-lag transform of $C$, with the lead in coordinate 1 and the lag in coordinate 2:
* **Mean:** $\mu = \frac{1}{n} S^1$
* **Variance:** $$\sigma^2 = \frac{1}{n}\left(S^{1,2} - S^{2,1}\right) - \left(\frac{1}{n}S^1\right)^2$$
The variance formula uses the lead-lag area identity: $S^{1,2} - S^{2,1}$ is twice the enclosed area, which equals $QV(C) = \sum_i x_i^2$. Higher-order moments like skewness and kurtosis can be obtained in a similar way from higher-level terms.
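
One way to make the link to moments concrete is sketched below. It assumes the observations are first accumulated into a cumulative-sum path starting at zero and then lead-lag transformed, so that the level-one term is the sum of the observations and the antisymmetric level-two term is the sum of their squares. The function name and this particular construction are choices made for the example.

```python
import numpy as np


def moments_from_signature(x):
    """Sample mean and (population) variance read off a level-2 signature."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Cumulative-sum path starting at 0: its increments are the observations.
    c = np.concatenate([[0.0], np.cumsum(x)])
    # Lead-lag transform: lead in column 0, lag in column 1.
    path = np.column_stack([np.repeat(c, 2)[1:], np.repeat(c, 2)[:-1]])
    # Level-1 and level-2 terms via Chen's identity, one straight segment at a time.
    s1, s2 = np.zeros(2), np.zeros((2, 2))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    mean = s1[0] / n                               # S^1 is the sum of the observations
    second_moment = (s2[0, 1] - s2[1, 0]) / n      # twice the area is the sum of squares
    return mean, second_moment - mean**2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance = moments_from_signature(data)
```

The results agree with `np.mean(data)` and `np.var(data)`, which use the same population normalization.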
## Machine Learning Applications: Signatures in Action
The signature method provides a powerful and versatile toolkit for tackling a wide range of machine learning problems involving sequential data.
### Time Series Analysis
Here's a typical pipeline for using signatures in time series analysis:
1. **Path Construction:** Transform the raw time series data $\{(t_i, x_i)\}_{i=1}^n$ into a continuous path $X:[0,T] \rightarrow \mathbb{R}^d$. You can augment the path with time, lead-lag information, or other relevant features: $$\begin{split}\text{Time}: & \quad t \mapsto (t, X_t) \\ \text{Lead-Lag}: & \quad t \mapsto (X_t, X_{t-1}) \\ \text{Multi-modal}: & \quad t \mapsto (X^1_t, X^2_t, \ldots, X^d_t) \end{split}$$
2. **Signature Computation:** Compute the truncated signature $S_N(X)$ of the path up to a chosen level $N$.
3. **Feature Selection:** If necessary, apply feature selection techniques like LASSO regularization to identify the most relevant signature features. $$\min_\beta \|y - \beta S_N(X)\|^2_2 + \lambda\|\beta\|_1$$
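
The three steps can be sketched end to end on synthetic data. Everything here is illustrative: the random-walk data, the toy target, the truncation level of 2, and the value of `lam` are arbitrary choices, and the LASSO problem is solved with a plain proximal-gradient loop (ISTA) instead of a library solver so that the example depends only on NumPy.

```python
import numpy as np


def time_augment(x):
    """Step 1: t -> (t, X_t), with time rescaled to [0, 1]."""
    return np.column_stack([np.linspace(0.0, 1.0, len(x)), x])


def sig2_features(path):
    """Step 2: flattened S_2(X) of a piecewise linear path, without the leading 1."""
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])


def lasso_ista(A, y, lam, n_iter=2000):
    """Step 3: minimise ||y - A @ beta||_2^2 + lam * ||beta||_1 by proximal gradient."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = beta - step * 2.0 * A.T @ (A @ beta - y)
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding
    return beta


rng = np.random.default_rng(0)
series = [np.cumsum(rng.normal(size=50)) for _ in range(40)]   # synthetic random walks
features = np.array([sig2_features(time_augment(x)) for x in series])
y = np.array([x[-1] - x[0] for x in series])                   # toy target: total increment
beta = lasso_ista(features, y, lam=0.1)
```

With a two-dimensional path and level 2, each series yields $2 + 4 = 6$ features. In a real application the features would usually be standardized before fitting, and $\lambda$ chosen by cross-validation.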
### Classification Methods
Signatures can be used with various classification methods:
* **Distance-based:** Define a distance metric between signatures, e.g., $$d(X,Y) = \|S_N(X) - S_N(Y)\|$$ and use it for k-NN classification or clustering.
* **Kernel methods:** Construct a kernel based on the signature, e.g., $$K(X,Y) = \langle S(X), S(Y) \rangle$$ and use it with Support Vector Machines.
* **Neural networks:** Incorporate a signature layer within a neural network architecture.
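
The distance-based and kernel-based options can be illustrated with level-2 signatures. In this sketch the kernel is the inner product of truncated signatures, used here as a simple stand-in for the full signature kernel, and classification is a bare 1-nearest-neighbour rule. All function names and the toy paths are made up for the example.

```python
import numpy as np


def sig2_vector(path):
    """Truncated signature S_2(X) of a piecewise linear path as a flat vector."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([[1.0], s1, s2.ravel()])


def signature_distance(p, q):
    """d(X, Y) = ||S_2(X) - S_2(Y)|| in the Euclidean norm."""
    return np.linalg.norm(sig2_vector(p) - sig2_vector(q))


def signature_kernel(p, q):
    """K(X, Y) = <S_2(X), S_2(Y)>, a truncated version of the signature kernel."""
    return float(sig2_vector(p) @ sig2_vector(q))


def nearest_neighbour_label(query, train_paths, train_labels):
    """1-NN classification under the signature distance."""
    distances = [signature_distance(query, p) for p in train_paths]
    return train_labels[int(np.argmin(distances))]


square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
ccw, cw = square, square[::-1]        # same shape, opposite orientation
query = 1.2 * square + 0.3            # larger, shifted, counter-clockwise loop
label = nearest_neighbour_label(query, [ccw, cw], ["ccw", "cw"])
```

The two training loops have identical level-one terms (both are closed), so the classifier separates them purely through the level-two terms, which record orientation.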

### Handling Missing Data
Signatures can effectively handle missing data by augmenting the path with an indicator process $R(t)$:
$$
R(t) =
\begin{cases}
1 & \text{if } X(t) \text{ is missing} \\
0 & \text{otherwise}
\end{cases}
$$
The signature of the augmented path $t \mapsto (X(t), R(t))$ then captures the information about both the data and the missingness pattern.
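
A minimal sketch of this augmentation is shown below. The text does not say which value $X(t)$ should take where it is missing, so the example assumes forward-filling with the last observed value and assumes the first observation is present. Both are choices made for the illustration.

```python
import numpy as np


def augment_with_missingness(x):
    """Return the path t -> (X(t), R(t)) as an array of shape (n, 2).

    Missing entries are given as NaN. They are forward-filled (an assumption
    of this sketch), and R is 1 where the original value was missing.
    """
    x = np.asarray(x, dtype=float)
    r = np.isnan(x).astype(float)
    filled = x.copy()
    for i in range(1, len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]   # carry the last observed value forward
    return np.column_stack([filled, r])


path = augment_with_missingness([1.0, np.nan, np.nan, 2.5, np.nan])
```

The resulting two-dimensional path can be passed to any signature computation in place of the raw series.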
### Domain-Specific Applications
#### Finance
* **Trading strategy detection:** Analyze price, volume, and order imbalance data to identify patterns and develop trading strategies.
* **Market microstructure:** Extract features related to volatility, price-volume relationships, and order flow.

#### Character Recognition
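Represent handwritten characters as paths (pen trajectories) and use the signature to extract features. Signature terms are invariant to translation by construction. Invariance to scale and rotation requires additional steps, such as normalizing the path or combining terms into rotation-invariant quantities.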
#### Healthcare
* **Time series analysis:** Analyze physiological signals like MEG data to extract features and identify patterns.
* **Longitudinal studies:** Model patient trajectories using clinical, lab, and medication data to predict outcomes.

## References
* Chevyrev, I., & Kormilitzin, A. (2016). A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788.
* Levin, D., Lyons, T., & Ni, H. (2013). Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv preprint arXiv:1309.0260.
* Lyons, T. (2023). Signatures of streams [link](https://youtu.be/GtJMLJqTUFc?si=mKszKMA3OLmPirvS)
* Cuchiero, C. (2024). Signatures methods in finance-I [link](https://staff.fnwi.uva.nl/a.khedher/winterschool/21slidesCuchiero1.pdf)
## Further Reading
### Signature Methods in Machine Learning
* Lyons, T., & McLeod, A. D. (2022). Signature methods in machine learning. arXiv preprint arXiv:2206.14674.
### Signature Methods in Finance
* Gyurkó, L. G., Lyons, T., Kontkowski, M., & Field, J. (2013). Extracting information from the signature of a financial data stream. arXiv preprint arXiv:1307.7244.
* Kalsi, J., Lyons, T., & Pérez Arribas, I. (2020). Optimal execution with rough path signatures. SIAM Journal on Financial Mathematics, 11(2), 470–493. https://doi.org/10.1137/19M1259778
* Lyons, T., Nejad, S., & Perez Arribas, I. (2019). Numerical Method for Model-free Pricing of Exotic Derivatives in Discrete Time Using Rough Path Signatures. Applied Mathematical Finance, 26(6), 583–597. https://doi.org/10.1080/1350486X.2020.1726784
* Lyons, T., Nejad, S., & Perez Arribas, I. (2020). Non-parametric Pricing and Hedging of Exotic Derivatives. Applied Mathematical Finance, 27(6), 457–494. https://doi.org/10.1080/1350486X.2021.1891555