
Time Series and Anomaly Detection

Types of Anomalies

A time series is a sequence of data points ordered by time. In time series anomaly detection, anomalies are typically classified into three categories:

  1. Point anomalies (a): Single data points that deviate from the rest of the distribution, regardless of temporal or contextual information
  2. Contextual anomalies (b): Data points that fall within the globally expected range but deviate from the regular pattern in their specific context (e.g. a temperature that is normal in summer but abnormal in winter)
  3. Collective anomalies (c): Sequences of data points that appear abnormal only when observed together, even if individual points may look normal (e.g. many small bank transfers)
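
As a rough illustration of the three categories, the sketch below injects one anomaly of each kind into a synthetic seasonal series. The period, the injected indices, and all thresholds are assumptions chosen for the demonstration, and the three checks are simple stand-ins rather than recommended detectors.

```python
import math
from statistics import mean, median, pstdev

PERIOD = 24            # assumed season length (e.g. hours per day)
N = PERIOD * 10        # ten seasons of data

# Clean seasonal signal with values in [-1, 1]
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(N)]

series[50] = 5.0                 # (a) point anomaly: far outside the global range
series[114] = 0.9                # (b) contextual: in range, but the season expects about -1 here
for t in range(180, 192):        # (c) collective: a flat run, each value ordinary on its own
    series[t] = 0.0

# Check 1: global z-score, ignores time and context
mu, sigma = mean(series), pstdev(series)
point_flags = [t for t, x in enumerate(series) if abs(x - mu) / sigma > 3]

# Check 2: deviation from the typical value at the same seasonal phase
phase_median = [median(series[p::PERIOD]) for p in range(PERIOD)]
residual = [abs(x - phase_median[t % PERIOD]) for t, x in enumerate(series)]
contextual_flags = [t for t, r in enumerate(residual) if r > 1.5]

# Check 3: accumulate the same residual over a sliding window
WINDOW = 12
window_sums = [sum(residual[s:s + WINDOW]) for s in range(N - WINDOW + 1)]
collective_starts = [s for s, w in enumerate(window_sums) if w > 6.0]

print("global z-score flags:", point_flags)
print("seasonal residual flags:", contextual_flags)
print("window starts flagged:", collective_starts)
```

In this setup the global check sees only the extreme value, the seasonal check also catches the in-range value at the wrong phase, and the flat run is flagged only once its residuals are summed over a window.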

Figure 1: Types of Anomalies

Point-based vs. Sequence-based Anomalies

  • Point-based anomalies: Includes point anomalies and contextual anomalies
  • Sequence-based anomalies: Includes collective anomalies
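
In labeled data, this distinction often shows up as the length of each anomalous run. The sketch below groups a binary label vector into index ranges and separates single-point ranges from longer ones. Treating a length-one range as point-based is a simplifying assumption for illustration (the grouping above is by anomaly type, not by label length), and the helper name `anomaly_ranges` is hypothetical.

```python
from itertools import groupby

def anomaly_ranges(labels):
    """Group consecutive 1-labels into (start, end) index ranges, end inclusive."""
    ranges = []
    index = 0
    for is_anomalous, run in groupby(labels):
        length = len(list(run))
        if is_anomalous:
            ranges.append((index, index + length - 1))
        index += length
    return ranges

# Toy label vector: 1 marks an anomalous time step
labels = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
ranges = anomaly_ranges(labels)

# Assumption: a range covering one time step is treated as point-based
point_based = [r for r in ranges if r[0] == r[1]]
sequence_based = [r for r in ranges if r[0] != r[1]]

print("point-based ranges:", point_based)
print("sequence-based ranges:", sequence_based)
```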

Univariate vs. Multivariate Anomalies

  • Univariate anomalies: Deviations detected in a single variable over time
  • Multivariate anomalies: Anomalies that are only evident when considering relationships among multiple variables
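
A minimal sketch of the difference, assuming two synthetic channels where the second is roughly twice the first. One sample keeps both values inside their usual ranges but breaks the relationship between them. The channel definitions, the injected index, and the z-score threshold of 3 are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

N = 200
x = [math.sin(0.1 * t) for t in range(N)]                           # channel 1
y = [2.0 * v + 0.05 * math.cos(1.7 * t) for t, v in enumerate(x)]   # channel 2, roughly 2 * x

# Inject a relationship break: 1.0 is an ordinary value for y,
# but x[120] is about -0.54, so y "should" be near -1.07
y[120] = 1.0

def zscores(values):
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma for v in values]

# Univariate view: each channel checked on its own
flags_x = [t for t, z in enumerate(zscores(x)) if z > 3]
flags_y = [t for t, z in enumerate(zscores(y)) if z > 3]

# Multivariate view: fit y as a linear function of x, then score the residual
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
residual = [b - (my + slope * (a - mx)) for a, b in zip(x, y)]
flags_joint = [t for t, z in enumerate(zscores(residual)) if z > 3]

print("univariate flags (x, y):", flags_x, flags_y)
print("joint flags:", flags_joint)
```

Neither per-channel check flags anything here, while the residual of the fitted relationship isolates the injected sample.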

Figure 2: Univariate vs Multivariate

Detector Categories:

Distance-based:
- Proximity-based (e.g. LOF)
- Clustering-based (e.g. NormA, SAND)
- Discord-based (e.g. Matrix Profile, DAMP)
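
To make the discord idea concrete, the sketch below computes, for every subsequence, the z-normalized Euclidean distance to its nearest non-overlapping neighbor and reports the subsequence whose nearest neighbor is farthest away. This is a brute-force illustration of the definition only, not the Matrix Profile or DAMP algorithms, which compute comparable results far more efficiently. The signal, the subsequence length `m`, and the choice to exclude every overlapping window are assumptions.

```python
import math

def znorm(window):
    """Z-normalize a subsequence; constant windows map to all zeros."""
    mu = sum(window) / len(window)
    sd = math.sqrt(sum((v - mu) ** 2 for v in window) / len(window))
    return [(v - mu) / sd if sd > 0 else 0.0 for v in window]

def nearest_neighbor_profile(series, m):
    """Distance from each length-m subsequence to its nearest non-overlapping one."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    profile = []
    for i, a in enumerate(windows):
        best = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) < m:          # skip overlapping (trivial) matches
                continue
            best = min(best, math.dist(a, b))
        profile.append(best)
    return profile

# Periodic signal with one cycle replaced by a double-frequency pattern
series = [math.sin(2 * math.pi * t / 20) for t in range(240)]
for t in range(100, 120):
    series[t] = math.sin(2 * math.pi * t / 10)

m = 20
profile = nearest_neighbor_profile(series, m)
discord_start = profile.index(max(profile))
print("discord starts at index", discord_start)
```

The quadratic loop is fine for a few hundred points but does not scale, which is the problem the dedicated discord algorithms address.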

Density-based:
- Distribution-based (e.g. HBOS, OCSVM)
- Graph-based (e.g. Series2Graph)
- Tree-based (e.g. Isolation Forest)
- Encoding-based (e.g. GrammarViz, POLY, PCA)
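
As one small example from this family, the sketch below scores values with a simplified histogram-based outlier score (HBOS): values falling into sparsely populated bins get higher scores. It handles a single feature with fixed-width bins and ignores time order entirely, so it is an illustration of the idea rather than a full implementation. The bin count and the toy data are assumptions.

```python
import math

def hbos_scores(values, n_bins=10):
    """Simplified single-feature HBOS: log(peak bin count / own bin count)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # fall back to 1.0 if all values are equal

    def bin_of(v):
        # The maximum value would land in bin n_bins, so clip it into the last bin
        return min(int((v - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for v in values:
        counts[bin_of(v)] += 1
    peak = max(counts)
    # Every value sits in a bin with count >= 1, so the division is safe
    return [math.log(peak / counts[bin_of(v)]) for v in values]

# Toy data: values clustered near 10, plus one isolated value
values = [10 + 0.1 * (t % 7) for t in range(100)]
values[60] = 25.0

scores = hbos_scores(values)
most_anomalous = scores.index(max(scores))
print("highest score at index", most_anomalous)
```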

Prediction-based:
- Forecasting-based (e.g. LSTM, CNN)
- Reconstruction-based (e.g. AutoEncoders)
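
The forecasting-based idea can be sketched without a neural network: predict each point from its recent history and use the absolute forecast error as the anomaly score. Below, a plain moving average stands in for the LSTM or CNN forecaster, purely to keep the example self-contained. The window size `k`, the synthetic signal, and the threshold are assumptions.

```python
import math

def moving_average_errors(series, k=5):
    """Absolute error of a k-step moving-average forecast (0.0 for the first k points)."""
    errors = [0.0] * len(series)
    for t in range(k, len(series)):
        forecast = sum(series[t - k:t]) / k   # stand-in for a learned forecaster
        errors[t] = abs(series[t] - forecast)
    return errors

# Smooth signal with one injected spike
series = [math.sin(2 * math.pi * t / 50) for t in range(200)]
series[130] += 3.0

errors = moving_average_errors(series)
flags = [t for t, e in enumerate(errors) if e > 2.0]
print("flagged indices:", flags)
```

Forecasts made just after the spike are also somewhat off, because the spike contaminates their input window. A reconstruction-based detector follows the same scoring pattern, with the reconstruction error of a window in place of the forecast error.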

Figure 3: Types of Anomaly Detection Methods

Leaderboard

Below is the ranking of models from The Elephant in the Room study.

Figure 4: Leaderboard

More details can be found in the original paper.
Website

Benchmarks

  • TSB-AD: 1070 curated time series, with 870 univariate and 200 multivariate

| Name | Domain | Origin | Dim. | Category | # Datasets | # Channels |
|---|---|---|---|---|---|---|
| UCR | Multidomain | both | uni | P&Seq | 228 | 1 |
| NAB | Multidomain | both | uni | Seq | 28 | 1 |
| YAHOO | Multidomain | both | uni | P&Seq | 259 | 1 |
| IOPS | Business | real | uni | Seq | 17 | 1 |
| MGAB | Medical and health | synthetic | uni | Seq | 9 | 1 |
| WSD | Web / Online Services | both | uni | Seq | 111 | 1 |
| SED | Industrial Control Systems | real | uni | Seq | 3 | 1 |
| TODS | NA | synthetic | uni | P&Seq | 15 | 1 |
| NEK | Computer Networks | real | uni | P&Seq | 9 | 1 |
| Stock | Finance | real | uni | P&Seq | 20 | 1 |
| Power | Energy | real | uni | Seq | 1 | 1 |
| Daphnet (U) | Health and Medicine | real | uni | Seq | 1 | 1 |
| CATSv2 (U) | Industrial Control Systems | synthetic | uni | Seq | 1 | 1 |
| SWaT (U) | Industrial control systems | real | uni | Seq | 1 | 1 |
| LTDB (U) | Medical and health | real | uni | Seq | 9 | 1 |
| TAO (U) | Object Tracking in Videos | real | uni | P&Seq | 3 | 1 |
| Exathlon (U) | Server machines monitoring | real | uni | Seq | 32 | 1 |
| MITDB (U) | Medical and health | real | uni | Seq | 8 | 1 |
| MSL (U) | Aerospace | real | uni | Seq | 9 | 1 |
| SMAP (U) | Aerospace | real | uni | Seq | 19 | 1 |
| SMD (U) | Server machines monitoring | real | uni | Seq | 38 | 1 |
| SVDB (U) | Medical and health | real | uni | Seq | 20 | 1 |
| OPPORTUNITY (U) | Computer networks | real | uni | Seq | 29 | 1 |

| Name | Domain | Origin | Dim. | Category | # Datasets | # Channels |
|---|---|---|---|---|---|---|
| GHL | Industrial control systems | synthetic | multi | Seq | 25 | 19 |
| Daphnet | Health and Medicine | real | multi | Seq | 1 | 9 |
| Exathlon | Server machines monitoring | real | multi | Seq | 27 | 21 |
| Genesis | Industrial control systems | real | multi | Seq | 1 | 18 |
| OPPORTUNITY | Computer networks | real | multi | Seq | 8 | 248 |
| SMD | Server machines monitoring | real | multi | Seq | 22 | 38 |
| SWaT | Industrial control systems | real | multi | Seq | 2 | 59 |
| PSM | Server machines monitoring | real | multi | P&Seq | 1 | 25 |
| SMAP | Aerospace | real | multi | Seq | 27 | 25 |
| MSL | Aerospace | real | multi | Seq | 16 | 55 |
| CreditCard | Fraud detection | real | multi | P&Seq | 1 | 29 |
| GECCO | Internet of things (IoT) | real | multi | Seq | 1 | 9 |
| MITDB | Medical and health | real | multi | Seq | 13 | 2 |
| SVDB | Medical and health | real | multi | Seq | 31 | 2 |
| LTDB | Medical and health | real | multi | Seq | 5 | 2 |
| CATSv2 | Industrial Control Systems | synthetic | multi | Seq | 6 | 17 |
| TAO | Object Tracking in Videos | real | multi | P&Seq | 13 | 3 |

  • TimeEval: 976 labeled time series with synthetic tuning

| Name | Domain | Origin | Dim. | Category | # Datasets | # Channels |
|---|---|---|---|---|---|---|
| CalIt2 | Urban events management | real | multi | Seq | 1 | 2 |
| Daphnet | Medical and health | real | multi | Seq | 35 | 9 |
| Dodgers | Urban events management | real | uni | P&Seq | 1 | 1 |
| Exathlon | Server machines monitoring | real | multi | Seq | 39 | 45 |
| GHL | Industrial control systems | synthetic | multi | Seq | 48 | 16 |
| Genesis | Industrial control systems | real | multi | Seq | 1 | 18 |
| GutenTAG | Not specified | synthetic | uni/multi | P&Seq | 193 | 2 |
| IOPS | Business | real | uni | Seq | 29 | 1 |
| KDD-TSAD | Multidomain | synthetic | uni | P&Seq | 250 | 1 |
| Kitsune | Computer networks | real | multi | P&Seq | 9 | 116 |
| LTDB | Medical and health | real | multi | Seq | 7 | 3 |
| MGAB | Medical and health | synthetic | uni | Seq | 10 | 1 |
| MITDB | Medical and health | real | multi | Seq | 48 | 2 |
| Metro | Urban events management | real | multi | P | 1 | 5 |
| NAB | Multidomain | both | uni | Seq | 58 | 1 |
| MSL | Aerospace | real | uni | Seq | 27 | 1 |
| SMAP | Aerospace | real | uni | Seq | 54 | 1 |
| NormA | Multidomain | both | uni | Seq | 21 | 1 |
| OPPORTUNITY | Computer networks | real | multi | Seq | 24 | 133 |
| Occupancy | Energy | real | multi | P&Seq | 2 | 5 |
| SMD | Server machines monitoring | real | multi | Seq | 28 | 38 |
| SSA | Environmental Monitoring | real | uni | Seq | 23 | 1 |
| SVDB | Medical and health | real | multi | Seq | 78 | 2 |
| YAHOO | Multidomain | both | uni | P&Seq | 367 | 1 |

  • HEX/UCR: 250 labeled time series

References

  1. The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark
  2. TimeEval: A Benchmarking Toolkit for Time Series Anomaly Detection Algorithms
  3. Dive into Time-Series Anomaly Detection: A Decade Review
  4. VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection