Foundation Models for Time Series
What are Foundation Models?
Foundation models are large pretrained models that can be adapted to a variety of tasks, such as forecasting, anomaly detection, classification, and imputation. Originally developed for natural language processing, they are rapidly gaining traction in time series analysis.
Examples of Foundation Models
Univariate models:
- TimesFM
- Chronos
- Timer
Multivariate models:
- Moirai
- MOMENT
- TTM
- UniTS
Adaptation techniques
- Zero-shot learning: Direct use of a pretrained model without additional training.
- Fine-tuning: Updating all or part of the model parameters using the target dataset.
- Few-shot learning: A fine-tuning variant where only a small subset of the target dataset is used.
- Prompt-tuning: Adjusting only the prompts while keeping model weights frozen.
Notes:
LLM-based models use textual prompts, while other models like UniTS use learnable embeddings as prompts.
Fine-tuning modifies model weights, while prompt-tuning leaves them unchanged.
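The practical difference between fine-tuning and prompt-tuning is which parameters get updated. A minimal plain-Python sketch of the three adaptation modes; the class and attribute names here are illustrative, not a real library API:

```python
# Toy stand-in for a pretrained foundation model. The "trainable" set records
# which parameter groups an optimizer would be allowed to update.
class PretrainedModel:
    def __init__(self):
        # weights learned during pretraining (toy values)
        self.weights = {"encoder": [0.5, -0.2, 0.1], "head": [1.0, 0.3]}
        self.prompts = [0.0, 0.0]          # learnable prompt embeddings
        self.trainable = set()

    def zero_shot(self):
        self.trainable = set()             # no training at all

    def fine_tune(self):
        # update all (or part) of the model parameters on the target dataset
        self.trainable = {"encoder", "head", "prompts"}

    def prompt_tune(self):
        # weights stay frozen; only the prompt embeddings adapt
        self.trainable = {"prompts"}

model = PretrainedModel()
model.prompt_tune()
print(model.trainable)  # only the prompts would be optimized
```

Few-shot learning follows the fine-tuning path but feeds it only a small subset of the target dataset, so it changes the data, not the set of trainable parameters.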
Advantages
- Zero-shot and few-shot capabilities: direct use without extensive training.
- Transfer learning: adaptable using similar datasets, even when target datasets are unavailable.
- Multivariate & contextual awareness: Many foundation models are Transformer-based, allowing them to handle sparse and noisy time series while capturing complex multivariate dependencies and nonlinear patterns.
- No labeling requirement: Self-supervised learning eliminates the need for costly labeling by uncovering relationships directly from the input data.
Challenges
- Interpretability and explainability remain limited despite advances in explainable AI.
- High computational demands compared to simpler task-specific models.
- Even with zero-shot capabilities, performance typically improves with few-shot learning or fine-tuning; domain-specific adaptation may be necessary for real-world applications.
Benchmark Results
Below are the results from "FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting".

More details can be found in the original paper.
What is Self-supervised Learning?
Many foundation models use self-supervised learning, which...
- Is a subset of unsupervised learning
- Supervises itself using the structure or properties of the data
- Generates output labels from input data examples by revealing the relationships between data components
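The masked-reconstruction pretext task used by many of these models illustrates the idea: hide some data points and treat the hidden values themselves as the training targets. A minimal NumPy sketch (mask ratio and seed are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.arange(10, dtype=float)      # toy time series 0..9

# Mask ~30% of the points. The model would see `inputs` and be trained
# to reconstruct `targets` -- labels generated from the data itself,
# with no human annotation involved.
mask = rng.random(series.shape) < 0.3
inputs = series.copy()
inputs[mask] = np.nan                    # what the model observes
targets = series[mask]                   # self-generated supervision signal

print(inputs)
print(targets)
```

The same pattern underlies forecasting-style pretraining, where the "masked" region is simply the future portion of the series.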

Window vs Patch?
Patching is a tokenization technique that divides time series data into smaller segments called patches. It is one of the most widely used techniques in foundation models for time series. How does it differ from the traditional windowing approach?
Window (sequence)
- Input unit: data point
- Processing: one data point at a time

Patch
- Input unit: patch (a collection of data points)
- Processing: all patches at once
- Each patch contains multiple data points and represents them collectively.
- Multiple patches can form a window.

Why Use Patching?
- Efficient Representation: Reduces sequence length by grouping data points.
- Better Contextual Learning: Each patch captures local temporal patterns.
- Widely Adopted: Common in transformer-based foundation models, similar to image patching in vision transformers.
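The sequence-length reduction is easy to see in code. A short NumPy sketch contrasting the two input units (window and patch lengths are arbitrary example values):

```python
import numpy as np

series = np.arange(512, dtype=float)

# Windowing: the model consumes one data point at a time, so the
# input sequence length equals the window length.
window = series[:128]                    # 128 tokens, one per data point

# Patching: split the same window into non-overlapping patches.
# Each patch is a single token that represents 16 points collectively.
patch_len = 16
patches = window.reshape(-1, patch_len)  # shape (8, 16): 8 tokens of 16 points

print(window.shape, patches.shape)       # sequence is 16x shorter with patching
```

With a patch length of 16, the Transformer attends over 8 tokens instead of 128, while each token still carries the local temporal pattern of its 16 points.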
References
- FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting
- Time-Series Large Language Models: A Systematic Review of State-of-the-Art
- Foundation Models for Time Series: A Survey
- Foundation Models for Time Series Analysis: A Tutorial and Survey
- A Survey on Time-Series Pre-Trained Models
- Large Language Models for Time Series: A Survey