# Signatures: Features for Sequential Data Analysis

## Introduction

The signature method is a versatile mathematical framework with roots in stochastic analysis and algebra that has found widespread applications in machine learning and data science. It offers a powerful and principled approach to analyzing sequential data: the data is transformed into a path, and meaningful features are extracted through iterated integrals. The method, originally developed by K.T. Chen in the 1950s, provides a unique perspective on time series data, capturing its dynamic nature and revealing hidden patterns.

### Why Study Signatures?

The signature method addresses key challenges in machine learning, offering several advantages:

* **Automatic Feature Extraction:** Systematically extracts relevant features from sequential data, eliminating the need for manual feature engineering.
* **Nonlinearity:** Captures both linear and nonlinear relationships within the data.
* **Interpretability:** Provides geometric interpretations of patterns, enhancing understanding of the underlying dynamics.
* **Invariance Properties:** Naturally encodes important invariances such as reparametrization (time warping) and translation.
* **Robustness to Irregular Sampling:** Handles missing data and varying sampling rates effectively.
* **Dimensionality Reduction:** Compresses high-dimensional sequential data while preserving essential information.

### Historical Context

The journey of the signature method from pure mathematics to practical applications spans several decades:

* **1950s:** Introduced by K.T. Chen for studying paths in algebraic topology.
* **1990s:** Terry Lyons extends the theory to rough paths, enabling applications in stochastic analysis.
* **2000s:** First applications emerge in financial mathematics for analyzing price movements.
* **2010s:** Gains traction in machine learning for diverse tasks like character recognition and time series classification.
* **Present:** Rapidly expanding applications in healthcare, signal processing, computer vision, and more.

### Applications Spectrum

The signature method has proven successful in various domains:

| Domain            | Example Applications                                    |
|-------------------|---------------------------------------------------------|
| Finance           | Price prediction, risk management, algorithmic trading  |
| Healthcare        | Patient monitoring, disease prediction, drug discovery  |
| Signal Processing | Speech recognition, anomaly detection                   |
| Computer Vision   | Action recognition, gesture analysis                    |

### Prerequisites

A basic understanding of the following is helpful:

* Basic linear algebra and calculus
* Fundamental probability theory
* Elementary machine learning concepts
* Basic programming skills (Python/C++)

## The Signature: Capturing the Essence of a Path

The signature of a path provides a complete and concise summary of its essential information. It achieves this by encoding the path's incremental changes and their order in a hierarchical structure.

### Definition

Given a path $X: [a,b] \rightarrow \mathbb{R}^d$, its signature $S(X)_{a,b}$ is an infinite sequence of real numbers:

$$
S(X)_{a,b} = (1, S(X)^1_{a,b}, \dots, S(X)^d_{a,b}, S(X)^{1,1}_{a,b}, S(X)^{1,2}_{a,b}, \dots)
$$

Each element in this sequence is an iterated integral of the path, capturing increasingly complex interactions between its dimensions:

$$
S(X)^{i_1,\dots,i_k}_{a,t} = \int_{a<s<t} S(X)^{i_1,\dots,i_{k-1}}_{a,s} \, dX^{i_k}_s
$$

This recursive definition starts with the basic increments of the path ($S(X)^i_{a,b} = X^i_b - X^i_a$) and builds up higher-order terms that capture more intricate patterns.

### A Simple Analogy

Imagine a path as a road trip. The signature is like a detailed travel log that records not just the starting and ending points, but also:

* **Individual legs of the journey:** The distances traveled in each direction (north, south, east, west).
* **Combinations of legs:** The effect of traveling north then east, versus east then north.
* **More complex combinations:** The impact of going north, then east, then south, and so on.

This multi-layered record provides a rich description of the journey, capturing its overall shape and complexity.

### Truncated Signature

In practice, we often work with a truncated signature $S_N(X)_{a,b}$, which includes only the terms up to a certain level $N$:

$$
S_N(X)_{a,b} = (1, S(X)^i_{a,b}, S(X)^{i,j}_{a,b}, \dots, S(X)^{i_1,\dots,i_N}_{a,b})
$$

This truncation provides a finite representation of the signature while still capturing a significant portion of the path's information.

### Key Properties

The signature possesses remarkable properties that make it a powerful tool for analysis:

* **Reparametrization Invariance:** The signature ignores speed, focusing only on the path's shape. For any reparametrization $\phi$:
  $$S(X \circ \phi)_{a,b} = S(X)_{a,b}$$
* **Shuffle Product:** An algebraic rule governs how signature terms combine:
  $$S(X)^I_{a,b} \, S(X)^J_{a,b} = \sum_{K \in I \mathbin{\sqcup\!\sqcup} J} S(X)^K_{a,b}$$
* **Chen's Identity:** The signature of a concatenated path is built from the signatures of its parts:
  $$S(X * Y)_{a,c} = S(X)_{a,b} \otimes S(Y)_{b,c}$$
* **Time-Reversal:** Relates the signature of a path to that of its reversed version:
  $$S(X)_{a,b} \otimes S(\overleftarrow{X})_{a,b} = 1$$
* **Uniqueness:** Under certain conditions, the signature uniquely identifies the path.
* **Log Signature:** An additive representation of the signature, simplifying some calculations:
  $$\log S(X)_{a,b} = \sum_{k\geq 1} \sum_{i_1,\dots,i_k} \lambda_{i_1,\dots,i_k} [e_{i_1}, [e_{i_2}, \dots, [e_{i_{k-1}}, e_{i_k}] \dots]]$$

## Basic Operations: Preparing Data for Signature Analysis

Before we can leverage the power of signatures, we need to transform our data into a suitable format. This involves representing the data as a continuous path and potentially applying transformations to enhance its information content.
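The recursive definition and Chen's identity above can be checked numerically. The sketch below is a minimal NumPy illustration, not an optimized implementation (dedicated libraries such as `esig` or `iisignature` exist for production use): a straight segment with increment $\Delta$ has level-$k$ term $\Delta^{\otimes k}/k!$, and segments of a piecewise-linear path are combined with Chen's identity.

```python
import numpy as np

def segment_signature(delta, depth):
    """Truncated signature of a straight-line segment with increment
    `delta`: the level-k term is the tensor power delta^(x)k / k!."""
    terms = [np.array(1.0)]  # level 0 is always 1
    for k in range(1, depth + 1):
        # delta^(x)k / k! = (delta^(x)(k-1) / (k-1)!) (x) delta / k
        terms.append(np.multiply.outer(terms[-1], delta) / k)
    return terms

def chen_product(S, T):
    """Chen's identity: the signature of a concatenation is the
    (truncated) tensor product of the two signatures."""
    depth = len(S) - 1
    return [sum(np.multiply.outer(S[j], T[k - j]) for j in range(k + 1))
            for k in range(depth + 1)]

def path_signature(path, depth):
    """Truncated signature of the piecewise-linear path through the
    points in `path` (shape (n, d)), built segment by segment."""
    path = np.asarray(path, dtype=float)
    sig = segment_signature(path[1] - path[0], depth)
    for i in range(1, len(path) - 1):
        sig = chen_product(sig, segment_signature(path[i + 1] - path[i], depth))
    return sig
```

For the axis path $(0,0) \to (1,0) \to (1,1)$ truncated at level 2, this yields $S^1 = S^2 = 1$, $S^{1,1} = S^{2,2} = 0.5$, $S^{1,2} = 1$, and $S^{2,1} = 0$, reflecting that the horizontal move happens entirely before the vertical one.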
### From Discrete to Continuous

Real-world data often comes in discrete form, as a sequence of measurements at specific time points. To apply the signature method, we need to convert this discrete data into a continuous path. Two common approaches are:

* **Linear Interpolation:** This method simply connects consecutive data points with straight lines. For a sequence of data points $\{(t_i, x_i)\}_{i=1}^n$, the linear interpolation $X(t)$ is defined as:
  $$X(t) = x_i + (t - t_i) \cdot \frac{x_{i+1} - x_i}{t_{i+1} - t_i}, \quad t \in [t_i, t_{i+1})$$
  This creates a piecewise linear path that captures the basic trend of the data.
* **Rectilinear Path (Axis Path):** This method constructs a path that moves parallel to the coordinate axes. For example, to represent the point $(x_1, x_2)$ in a 2-dimensional space, we first move horizontally from $(0,0)$ to $(x_1, 0)$, and then vertically to $(x_1, x_2)$. This approach is particularly useful for financial time series data, where each dimension represents a different asset or indicator.

![Rectilinear (axis) path construction](https://hackmd.io/_uploads/rJKoGWLeyl.png)

### Lead-Lag Transformation

This transformation enhances a one-dimensional time series by pairing it with a lagged version of itself. Specifically, for a time series $\{X(t)\}$, we create a new path in two dimensions:

$$
t \mapsto (X^{\text{lead}}(t), X^{\text{lag}}(t))
$$

where $X^{\text{lead}}(t_i) = X(t_i)$ and $X^{\text{lag}}(t_i) = X(t_{i-1})$.

![Lead-lag path](https://hackmd.io/_uploads/ByLODe8x1x.png)

This lead-lag transformation captures important information about the volatility and temporal dependencies in the data.
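A minimal sketch of the discrete lead-lag construction (the function name and the point-duplication scheme below are one common convention, not the only one):

```python
import numpy as np

def lead_lag(x):
    """Lead-lag transform of a 1-D series into a 2-D path.
    Each point is duplicated so that the lead coordinate jumps
    first and the lag coordinate follows one step behind."""
    x = np.asarray(x, dtype=float)
    doubled = np.repeat(x, 2)             # x0, x0, x1, x1, ...
    lead = doubled[1:]                    # jumps at odd steps
    lag = doubled[:-1]                    # jumps at even steps
    return np.stack([lead, lag], axis=1)  # shape (2n - 1, 2)
```

Applied to the series $[0, 1, 3]$, this produces the staircase path $(0,0), (1,0), (1,1), (3,1), (3,3)$: the lead coordinate always moves before the lag coordinate catches up.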
For instance, the area enclosed by the lead-lag path is related to the quadratic variation of the original time series:

$$
\text{Area enclosed} = \frac{1}{2} QV(X)
$$

where the quadratic variation $QV(X)$ is defined as:

$$
QV(X) = \sum_i (X(t_i) - X(t_{i-1}))^2
$$

![Lead-lag path and its enclosed area](https://hackmd.io/_uploads/ryehwe8xyl.png)

### Computing the Signature

Once we have a continuous path representation of our data, we can compute its signature by calculating the iterated integrals that define the signature terms. In practice, we typically compute a truncated signature up to a certain level $N$, denoted by $S_N(X)$. The choice of truncation level depends on factors like data complexity, computational resources, and desired accuracy.

### Extracting Statistical Moments

The signature not only captures the dynamic information of a path but also encodes its statistical moments. For instance, given a path $X$ over the time interval $[0,T]$, its mean and variance can be extracted directly from its signature:

* **Mean:** $\mu = \frac{1}{T} S(X)^1_{0,T}$
* **Variance:** $\sigma^2 = \frac{1}{T} S(X)^{1,1}_{0,T} - \left(\frac{1}{T} S(X)^1_{0,T}\right)^2$

Higher-order moments like skewness and kurtosis can also be obtained from the signature of the centered path $\tilde{X}_t = X_t - \mu t$. This connection between the signature and statistical moments highlights the richness of information contained in the signature.

## Machine Learning Applications: Signatures in Action

The signature method provides a powerful and versatile toolkit for tackling a wide range of machine learning problems involving sequential data.

### Time Series Analysis

Here's a typical pipeline for using signatures in time series analysis:

1. **Path Construction:** Transform the raw time series data $\{(t_i, x_i)\}_{i=1}^n$ into a continuous path $X: [0,T] \rightarrow \mathbb{R}^d$.
   You can augment the path with time, lead-lag information, or other relevant features:
   $$
   \begin{split}
   \text{Time}: & \quad t \mapsto (t, X_t) \\
   \text{Lead-Lag}: & \quad t \mapsto (X_t, X_{t-1}) \\
   \text{Multi-modal}: & \quad t \mapsto (X^1_t, X^2_t, \ldots, X^d_t)
   \end{split}
   $$
2. **Signature Computation:** Compute the truncated signature $S_N(X)$ of the path up to a chosen level $N$.
3. **Feature Selection:** If necessary, apply feature selection techniques such as LASSO regularization to identify the most relevant signature features:
   $$\min_\beta \|y - \beta^\top S_N(X)\|^2_2 + \lambda \|\beta\|_1$$

### Classification Methods

Signatures can be combined with various classification methods:

* **Distance-based:** Define a distance between signatures, e.g.
  $$d(X,Y) = \|S_N(X) - S_N(Y)\|,$$
  and use it for k-NN classification or clustering.
* **Kernel methods:** Construct a kernel based on the signature, e.g.
  $$K(X,Y) = \langle S(X), S(Y) \rangle,$$
  and use it with Support Vector Machines.
* **Neural networks:** Incorporate a signature layer within a neural network architecture.

![Signature layer in a neural network architecture](https://hackmd.io/_uploads/rymogXrgkl.png)

### Handling Missing Data

Signatures can effectively handle missing data by augmenting the path with an indicator process $R(t)$:

$$
R(t) = \begin{cases} 1 & \text{if } X(t) \text{ is missing} \\ 0 & \text{otherwise} \end{cases}
$$

The signature of the augmented path $t \mapsto (X(t), R(t))$ then captures information about both the data and the missingness pattern.

### Domain-Specific Applications

#### Finance

* **Trading strategy detection:** Analyze price, volume, and order-imbalance data to identify patterns and develop trading strategies.
* **Market microstructure:** Extract features related to volatility, price-volume relationships, and order flow.
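The indicator process $R(t)$ described under *Handling Missing Data* above can be sketched as follows. This is a toy illustration: the function name is hypothetical, and filling missing values by carrying the last observation forward is one common choice among several (the indicator channel records where filling occurred).

```python
import numpy as np

def augment_with_missingness(x):
    """Augment a 1-D series (NaN = missing) with an indicator
    channel R(t) = 1 where X(t) is missing, 0 otherwise.
    Missing values are forward-filled so the path stays defined."""
    x = np.asarray(x, dtype=float)
    r = np.isnan(x).astype(float)       # the indicator process R(t)
    filled = x.copy()
    for i in range(len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1] if i > 0 else 0.0
    return np.stack([filled, r], axis=1)  # path in R^2: (X(t), R(t))
```

The signature of the resulting 2-D path then mixes the observed values with the missingness pattern, so downstream models can exploit both.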
![Finance applications of signatures](https://hackmd.io/_uploads/ByJmsl8xkg.png)

#### Character Recognition

Represent handwritten characters as paths and use the signature to extract features that are invariant to translation and time parametrization and, with suitable normalization, robust to scale and rotation.

#### Healthcare

* **Time series analysis:** Analyze physiological signals such as MEG data to extract features and identify patterns.
* **Longitudinal studies:** Model patient trajectories using clinical, lab, and medication data to predict outcomes.

![Healthcare applications of signatures](https://hackmd.io/_uploads/S1m5slLl1e.png)

## References

* Chevyrev, I., & Kormilitzin, A. (2016). A primer on the signature method in machine learning. arXiv preprint arXiv:1603.03788.
* Levin, D., Lyons, T., & Ni, H. (2013). Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv preprint arXiv:1309.0260.
* Lyons, T. (2023). Signatures of streams. [link](https://youtu.be/GtJMLJqTUFc?si=mKszKMA3OLmPirvS)
* Cuchiero, C. (2024). Signature methods in finance I. [link](https://staff.fnwi.uva.nl/a.khedher/winterschool/21slidesCuchiero1.pdf)

## Further Reading

### Signature Methods in Machine Learning

* Lyons, T., & McLeod, A. D. (2022). Signature methods in machine learning. arXiv preprint arXiv:2206.14674.

### Signature Methods in Finance

* Gyurkó, L. G., Lyons, T., Kontkowski, M., & Field, J. (2013). Extracting information from the signature of a financial data stream. arXiv preprint arXiv:1307.7244.
* Kalsi, J., Lyons, T., & Pérez Arribas, I. (2020). Optimal execution with rough path signatures. SIAM Journal on Financial Mathematics, 11(2), 470–493. https://doi.org/10.1137/19M1259778
* Lyons, T., Nejad, S., & Perez Arribas, I. (2019). Numerical method for model-free pricing of exotic derivatives in discrete time using rough path signatures. Applied Mathematical Finance, 26(6), 583–597. https://doi.org/10.1080/1350486X.2020.1726784
* Lyons, T., Nejad, S., & Perez Arribas, I. (2020). Non-parametric pricing and hedging of exotic derivatives. Applied Mathematical Finance, 27(6), 457–494. https://doi.org/10.1080/1350486X.2021.1891555