# Annotation APIs

Looking at API designs for annotation that already exist, i.e. [pyod](https://github.com/yzhao062/Pyod), [adtk](https://github.com/arundo/adtk), [luminaire](https://github.com/zillow/luminaire), ruptures.

Questions:

1. What is the structure of the base class or classes? Do they have fit/predict or something else? What are the precise inputs and outputs of the base methods?
2. What is the format of the annotation data?
3. What kind of data can they handle (e.g. univariate vs multivariate)?
4. What 'learning tasks' can they handle, e.g. supervised vs unsupervised? Offline versus online? Any other variations?

## pyod

![](https://i.imgur.com/3fCzpV6.jpg)

1. [base class](https://github.com/yzhao062/pyod/blob/master/pyod/models/base.py)

   - `__init__(self, contamination=0.1)`: `contamination` is the proportion of outliers in the data set
   - `fit(self, X, y=None)`: body is `pass` in the base class
   - `predict(self, X, return_confidence=False)`: returns the prediction, plus the confidence when `return_confidence=True`
   - `decision_function(self, X)`: body is `pass` in the base class
   - `predict_confidence(self, X)`: returns the confidence (confidence in making the same prediction under slightly different training sets)
   - `predict_proba(self, X, method='linear', return_confidence=False)`: returns the probabilities (the probability of a sample being an outlier), plus the confidence when `return_confidence=True`

2. annotation labels

   `labels_`: int, either 0 or 1. The binary labels of the data, where 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying `threshold_` on `decision_scores_`.

3. data

   - `X_train`: numpy array of shape (n_samples, n_features). The training samples.
   - `y_train`: list or array of shape (n_samples,). The ground truth of the training samples.
   - `y_train_pred`: numpy array of shape (n_samples,). The predicted binary labels of the training samples.

   The same applies to `X_test`, `y_test` and `y_test_pred`, respectively.

4. models

   - [Angle-based Outlier Detector (ABOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/abod.py) | unsupervised, offline
   - [Using Auto Encoder with Outlier Detection](https://github.com/yzhao062/pyod/blob/master/pyod/models/auto_encoder.py) | unsupervised, offline
   - [Clustering Based Local Outlier Factor (CBLOF)](https://github.com/yzhao062/pyod/blob/master/pyod/models/cblof.py) | unsupervised, offline
   - [Cook's distance outlier detection (CD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/cd.py) | supervised, offline
   - [Connectivity-Based Outlier Factor (COF)](https://github.com/yzhao062/pyod/blob/master/pyod/models/cof.py) | unsupervised, offline
   - [Copula Based Outlier Detector (COPOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/copod.py) | unsupervised, offline
   - [Deep One-Class Classification for outlier detection](https://github.com/yzhao062/pyod/blob/master/pyod/models/deep_svdd.py) | unsupervised, offline
   - [Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions (ECOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/ecod.py) | unsupervised, offline
   - [Feature bagging detector](https://github.com/yzhao062/pyod/blob/master/pyod/models/feature_bagging.py) | unsupervised, offline
   - [Outlier detection based on Gaussian Mixture Model (GMM)](https://github.com/yzhao062/pyod/blob/master/pyod/models/gmm.py) | unsupervised, offline
   - [Histogram-based Outlier Detection (HBOS)](https://github.com/yzhao062/pyod/blob/master/pyod/models/hbos.py) | unsupervised, offline
   - [Isolation-based anomaly detection using nearest-neighbor ensembles](https://github.com/yzhao062/pyod/blob/master/pyod/models/inne.py) | unsupervised, offline
   - [Kernel Density Estimation (KDE) for Unsupervised Outlier Detection](https://github.com/yzhao062/pyod/blob/master/pyod/models/kde.py) | unsupervised, offline
   - [k-Nearest Neighbors Detector (kNN)](https://github.com/yzhao062/pyod/blob/master/pyod/models/knn.py) | unsupervised, offline
   - [Linear Model Deviation-base outlier detection (LMDD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/lmdd.py) | unsupervised, offline
   - [Local Correlation Integral (LOCI)](https://github.com/yzhao062/pyod/blob/master/pyod/models/loci.py) | unsupervised, offline
   - [Lightweight on-line detector of anomalies (LODA)](https://github.com/yzhao062/pyod/blob/master/pyod/models/loda.py) | unsupervised, offline
   - [Locally Selective Combination of Parallel Outlier Ensembles (LSCP)](https://github.com/yzhao062/pyod/blob/master/pyod/models/lscp.py) | unsupervised, offline
   - [Unifying Local Outlier Detection Methods via Graph Neural Networks (LUNAR)](https://github.com/yzhao062/pyod/blob/master/pyod/models/lunar.py) | supervised, offline
   - [Median Absolute deviation (MAD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/mad.py) | unsupervised, offline
   - [Outlier Detection with Minimum Covariance Determinant (MCD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/mcd.py) | unsupervised, offline
   - [Multiple-Objective Generative Adversarial Active Learning](https://github.com/yzhao062/pyod/blob/master/pyod/models/mo_gaal.py) | unsupervised, offline
   - [Principal Component Analysis (PCA) Outlier Detector](https://github.com/yzhao062/pyod/blob/master/pyod/models/pca.py) | unsupervised, offline
   - [Outlier detection by R-graph (random walks on the representation graph)](https://github.com/yzhao062/pyod/blob/master/pyod/models/rgraph.py) | unsupervised, offline
   - [Rotation-based Outlier Detector (ROD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/rod.py) | unsupervised, offline
   - [Outlier detection based on Sampling (SP)](https://github.com/yzhao062/pyod/blob/master/pyod/models/sampling.py) | unsupervised, offline
   - [Subspace Outlier Detection (SOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/sod.py) | unsupervised, offline
   - [Stochastic Outlier Selection (SOS)](https://github.com/yzhao062/pyod/blob/master/pyod/models/sos.py) | unsupervised, offline
   - [Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection (SUOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/suod.py) | unsupervised, offline
   - [Variational Auto Encoder (VAE) and beta-VAE for Unsupervised Outlier Detection](https://github.com/yzhao062/pyod/blob/master/pyod/models/vae.py) | unsupervised, offline
   - [Improving Supervised Outlier Detection with Unsupervised Representation Learning (XGBOD)](https://github.com/yzhao062/pyod/blob/master/pyod/models/xgbod.py) | semi-supervised, offline

## adtk

Anomaly Detection Toolkit (ADTK) is a Python package for unsupervised / rule-based time series anomaly detection.

![](https://i.imgur.com/0hVyn4D.png)

1. [base class](https://github.com/arundo/adtk/blob/develop/src/adtk/_detector_base.py)

   - `_NonTrainableUnivariateDetector()` with methods `predict` and `score`
   - `_TrainableUnivariateDetector()` with methods `fit`, `predict`, `fit_predict` and `score`
   - `_TrainableMultivariateDetector()` with methods `fit`, `predict`, `fit_predict` and `score`

2. annotation labels

   ```
   pandas.Series or list
       Detected anomalies.
       - If return_list=False, return a binary series;
       - If return_list=True, return a list of events where an event is a
         pandas Timestamp if it is instantaneous or a 2-tuple of pandas
         Timestamps if it is a closed time interval.
   ```

3. data

   Univariate base class,

   ```
   ts: pandas Series or pandas.DataFrame
       Time series to detect anomalies from. If a DataFrame with k columns,
       it is treated as k independent univariate time series, and the
       detector will be applied to each series independently.
   ```

   Multivariate base class,

   ```
   df: pandas.DataFrame
       Time series to be used to train the detector.
   ```

4. models

   [Univariate](https://github.com/arundo/adtk/blob/develop/src/adtk/detector/_detector_1d.py)

   - `ThresholdAD`: identifies anomalies when values are beyond thresholds
   - `QuantileAD`: detects anomalies based on quantiles of historical data
   - `InterQuartileRangeAD`: detects anomalies based on the inter-quartile range of historical data
   - `GeneralizedESDTestAD`: performs the generalized extreme Studentized deviate (ESD) test on historical data and identifies normal values vs. outliers for training
   - `PersistAD`: detects anomalies based on values in a preceding period
   - `LevelShiftAD`: detects level shifts of time series values
   - `VolatilityShiftAD`: detects shifts of volatility in time series
   - `AutoregressionAD`: detects anomalous autoregression properties in time series
   - `SeasonalAD`: detects anomalous values away from the seasonal pattern

   [Multivariate](https://github.com/arundo/adtk/blob/develop/src/adtk/detector/_detector_hd.py)

   - `MinClusterDetector`: detects anomalies based on clustering of historical data
   - `OutlierDetector`: performs time-independent outlier detection using a given model
   - `RegressionAD`: detects anomalous inter-series relationships; performs regression and identifies a time point as anomalous based on the residual
   - `PcaAD`: detects outlier points with principal component analysis

## luminaire

![](https://i.imgur.com/A5e1Kc5.png)

1. [base class](https://github.com/zillow/luminaire/blob/master/luminaire/model/base_model.py)

   `BaseModel()` with methods `train`, `score`, `get_info` and `run`

2. annotation labels

   - `_train` returns a tuple containing a flag indicating whether the datapoint on the given date is an anomaly, the prediction, and the standard error of the prediction
   - `train` returns the trained model object with the result from `_train`

3. data

   ```
   data: input data to a single model. May be a (pickled) pandas series/dataframe, a list, or a single number.
   ```

4. models

   - [LADFilteringModel](https://github.com/zillow/luminaire/blob/master/luminaire/model/lad_filtering.py): Markovian state space model; detects anomalies based on the residual process obtained through Kalman Filter based model estimation
   - [LADStructuralModel](https://github.com/zillow/luminaire/blob/master/luminaire/model/lad_structural.py): ?
   - [WindowDensityModel](https://github.com/zillow/luminaire/blob/master/luminaire/model/window_density.py): detects anomalous windows using KL divergence (for high frequency data) and the Wilcoxon sign rank test (for low frequency data)
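
Across the libraries surveyed, annotations come in two broad shapes: per-point binary labels (pyod's `labels_`, adtk with `return_list=False`) and a list of events (adtk with `return_list=True`). To make the relationship between the two concrete, here is a minimal sketch that collapses binary labels into an event list. The helper `labels_to_events` is hypothetical and is not part of any of these libraries. It uses plain `datetime` objects instead of pandas Timestamps, and it assumes an interval ends at the last flagged point, which may differ from how adtk itself defines interval boundaries.

```python
from datetime import datetime, timedelta


def labels_to_events(timestamps, labels):
    """Collapse per-point binary labels into a list of events.

    Hypothetical helper for illustration only. A single flagged point
    becomes a bare timestamp; a run of consecutive flagged points becomes
    a (start, end) tuple with both ends inclusive.
    """
    if len(timestamps) != len(labels):
        raise ValueError("timestamps and labels must have the same length")

    events = []
    run_start = None  # first timestamp of the current run of 1s
    run_end = None    # most recent timestamp seen in the current run

    def close_run():
        # Emit a point event for a run of length one, else an interval.
        events.append(run_start if run_start == run_end else (run_start, run_end))

    for ts, label in zip(timestamps, labels):
        if label == 1:
            if run_start is None:
                run_start = ts
            run_end = ts
        elif run_start is not None:
            close_run()
            run_start = None

    # A run that reaches the end of the series still has to be emitted.
    if run_start is not None:
        close_run()
    return events


# Example: eight daily observations with one isolated anomaly and one run.
start = datetime(2022, 1, 1)
timestamps = [start + timedelta(days=i) for i in range(8)]
labels = [0, 1, 0, 0, 1, 1, 1, 0]
events = labels_to_events(timestamps, labels)
```

In this example `events` holds one point event (2 January) followed by one interval (5 to 7 January). Going the other way, from events back to per-point labels, needs the original time index, which is one reason an API may prefer to treat the binary series as the primary format.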