Time Series

5.8. Time Series

5.8.1. Check Seasonality automatically with `darts`

Seasonality describes a pattern that repeats regularly over time.

Identifying and understanding the seasonality in time series can boost the performance of your model.

But you don’t have to find the seasonality effect and period by yourself.

Instead, you can use check_seasonality() from darts in Python.

It checks whether the time series is seasonal and also returns the period, which is inferred from the autocorrelation function (ACF).

In the example below, it returns a seasonal period of 12, since the Air Passengers dataset has a monthly frequency.

!pip install darts

from darts.utils.statistics import check_seasonality
from darts.datasets import AirPassengersDataset

ts = AirPassengersDataset().load()

is_seasonal, period = check_seasonality(ts)
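To make the idea concrete, here is a minimal NumPy sketch of how a seasonal period can be read off the autocorrelation function. It is not darts' implementation; the helper names `acf` and `guess_period` are hypothetical, and the series is a synthetic sine wave without trend or noise.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


def guess_period(x, max_lag=24):
    """Return the lag of the highest local maximum of the ACF, or None."""
    r = acf(x, max_lag)
    peaks = [k for k in range(2, max_lag) if r[k] > r[k - 1] and r[k] > r[k + 1]]
    if not peaks:
        return None
    return max(peaks, key=lambda k: r[k])


# Synthetic "monthly" series: 12 years of a pattern that repeats every 12 steps
t = np.arange(144)
series = np.sin(2 * np.pi * t / 12)

period = guess_period(series)
print(period)
```

On real data you would usually remove the trend first, because a strong trend dominates the autocorrelation at every lag.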

5.8.2. Cross-validation for Time Series Data with `TimeSeriesSplit`

How to do Cross-Validation with Time Series?

Standard K-Fold Cross-Validation is not suitable for time series.

With K-Fold, you partition the data into k folds, and then train and evaluate the model k times, each time using a different fold as the test set and the rest of the data as the training set.

But this can lead to issues because the model will be trained on data from both before and after the test data.

This can result in overfitting or biased estimates of model performance, because information from the future leaks into training.

Instead, use TimeSeriesSplit from scikit-learn.

TimeSeriesSplit ensures that the model is only trained on the past values and tested on future data.

This gives you a more accurate and less biased assessment of the model’s performance.

from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.ensemble import GradientBoostingRegressor

X, y = ...
model = GradientBoostingRegressor()

ts_cv = TimeSeriesSplit(n_splits=3)

scores = cross_validate(model, X, y, cv=ts_cv, scoring='neg_mean_squared_error')
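The difference between the two splitters is easiest to see on the indices they produce. The sketch below uses a toy array of 8 time-ordered observations (an assumption for illustration) and compares `TimeSeriesSplit` with an unshuffled `KFold`.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(8).reshape(-1, 1)  # 8 time-ordered observations

# TimeSeriesSplit: the training window grows, the test fold always comes later
ts_folds = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in ts_folds:
    print("TimeSeriesSplit train:", train_idx, "test:", test_idx)

# KFold: flag every fold whose training set contains indices after the test start
kfold_folds = list(KFold(n_splits=4).split(X))
kfold_leaks = [
    bool(train_idx.max() > test_idx.min()) for train_idx, test_idx in kfold_folds
]
print("KFold folds that train on future data:", kfold_leaks)
```

With `TimeSeriesSplit`, every training index precedes every test index in each fold, while most of the `KFold` folds train on observations that come after the start of the test fold.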

5.8.3. More Cross-Validation with `tscv`

How to do Cross-Validation with Time Series?

Standard K-Fold Cross-Validation is not suitable for time series.

With K-Fold, you partition the data into k folds, and then train and evaluate the model k times, each time using a different fold as the test set and the rest of the data as the training set.

But this can lead to issues because the model will be trained on data from both before and after the test data.

This can result in overfitting or biased estimates of model performance, because information from the future leaks into training.

Instead, use the tscv package for Python.

tscv offers three splitter classes that leave a gap between the training and test sets:

GapLeavePOut
GapKFold
GapRollForward

This gives you a more accurate and less biased assessment of the model’s performance.

!pip install tscv

from tscv import GapRollForward
cv = GapRollForward(min_train_size=3, gap_size=1, max_test_size=2)
for train, test in cv.split(range(10)):
    print("train:", train, "test:", test)
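The idea behind a roll-forward split with a gap can be sketched in a few lines of plain Python. This is an illustration of the concept only, not tscv's implementation; the function `gap_roll_forward` and its parameters are hypothetical.

```python
def gap_roll_forward(n_samples, min_train_size, gap_size, test_size):
    """Yield (train, test) index lists separated by `gap_size` skipped samples."""
    train_end = min_train_size
    while train_end + gap_size + test_size <= n_samples:
        test_start = train_end + gap_size
        train = list(range(train_end))
        test = list(range(test_start, test_start + test_size))
        yield train, test
        train_end += test_size  # roll the training window forward


splits = list(gap_roll_forward(n_samples=10, min_train_size=3, gap_size=1, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

The skipped samples act as a buffer: when neighbouring observations are strongly correlated, the points right before the test set would otherwise carry information about it.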

5.8.4. Time Series Forecasting with Machine Learning with `mlforecast`

Do you want to perform powerful time series forecasting?

Try mlforecast by Nixtla.

mlforecast lets you run Machine Learning models for time series forecasting, even distributed across remote clusters with frameworks like Ray or Spark.

Feature Engineering, support for exogenous variables, and probabilistic forecasting are also included.

!pip install mlforecast

import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

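# df is assumed to be a long-format DataFrame with the columns
# unique_id (series identifier), ds (timestamp), and y (target)
mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],
    freq='M',
)
mlf.fit(df)
mlf.predict(12)  # forecast the next 12 periods for each series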
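The `lags` argument turns forecasting into a tabular regression problem: each row gets past values of the target as features. Below is a small pandas sketch of that idea on a made-up two-series frame. It only illustrates lag features and is not how mlforecast computes them internally; lags 1 and 2 are used instead of 1 and 12 to keep the toy data short.

```python
import pandas as pd

# Toy long-format data with two series, "a" and "b" (values are invented)
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 5 + ["b"] * 5,
        "ds": list(pd.date_range("2024-01-01", periods=5, freq="D")) * 2,
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# Shift within each series so values never leak from one series into another
df["lag1"] = df.groupby("unique_id")["y"].shift(1)
df["lag2"] = df.groupby("unique_id")["y"].shift(2)

# Rows without a complete lag history cannot be used for training
train = df.dropna()
print(train)
```

Any scikit-learn style regressor can then be fit on the `lag1` and `lag2` columns to predict `y`.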

5.8.5. Lightning Fast Time Series Forecasting with `statsforecast`

Do you want to perform lightning fast time series forecasting?

Try statsforecast by Nixtla.

statsforecast lets you run statistical models on your time series data.

It’s up to 20x faster than existing libraries like pmdarima and statsmodels.

!pip install statsforecast

from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF
sf = StatsForecast(
    models=[AutoARIMA(season_length=12)],  # monthly data with a yearly cycle
    freq='M',
)

sf.fit(df)
sf.predict(h=12, level=[95])  # 12-step forecast with 95% prediction intervals
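To get a feel for what `season_length` means, it helps to look at the simplest seasonal model: repeating the last observed season. The NumPy sketch below implements such a seasonal naive baseline. It is not AutoARIMA, and the function name `seasonal_naive` and the toy data are assumptions for illustration.

```python
import numpy as np


def seasonal_naive(y, season_length, h):
    """Forecast h steps by repeating the last observed season."""
    y = np.asarray(y, dtype=float)
    last_season = y[-season_length:]
    return np.array([last_season[i % season_length] for i in range(h)])


# Toy series with a season of length 4
y = [1, 2, 3, 4, 1, 2, 3, 5]
forecast = seasonal_naive(y, season_length=4, h=6)
print(forecast)
```

A baseline like this is a useful sanity check: a statistical model that cannot beat it is probably not capturing the seasonality.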

5.8.6. Time Series with Polars Backend with `functime`

Fast time-series forecasting with functime.

functime is a Python library for time series forecasting and feature extraction, built with Polars.

Since it uses lazy Polars dataframes, functime speeds up forecasting and feature engineering.

Backtesting, cross-validation splitters and metrics are included too.

It even comes with an LLM agent to analyze and describe your forecasts.

Check it out!

!pip install functime

import polars as pl
from functime.cross_validation import train_test_split
from functime.forecasting import linear_model
from functime.metrics import mase

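# y is assumed to be a Polars DataFrame of panel data
# (entity, time, and value columns); it is not defined in this snippet
y_train, y_test = y.pipe(train_test_split(test_size=3))

# Fit, then predict 3 steps ahead
forecaster = linear_model(freq="1mo", lags=24)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=3)

# Equivalent fit-predict in a single call
y_pred = linear_model(freq="1mo", lags=24)(y=y_train, fh=3)

# Score the forecasts
scores = mase(y_true=y_test, y_pred=y_pred, y_train=y_train)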
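The MASE metric used above needs the training data because it scales the forecast error by the in-sample error of a naive forecast. A minimal NumPy sketch of the textbook definition for a single series follows; the function name `mase_single` and the `sp` (seasonal period) parameter are assumptions and do not mirror functime's implementation.

```python
import numpy as np


def mase_single(y_true, y_pred, y_train, sp=1):
    """Mean absolute scaled error for one series (textbook definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # In-sample MAE of the naive forecast that repeats the value sp steps back
    scale = np.mean(np.abs(y_train[sp:] - y_train[:-sp]))
    return np.mean(np.abs(y_true - y_pred)) / scale


score = mase_single(y_true=[8, 10], y_pred=[7, 12], y_train=[1, 2, 4, 7])
print(score)
```

A value below 1 means the forecast has a smaller average error than the naive forecast had on the training data.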

5.8.7. Time Series Forecasting with Deep Learning with `neuralforecast`

Do you want to perform powerful time series forecasting?

Try neuralforecast by Nixtla.

neuralforecast lets you run Deep Learning models for time series forecasting with models like N-BEATS or N-HiTS.

Support for exogenous variables and probabilistic forecasting is also included.

Check the example below!

!pip install neuralforecast

import pandas as pd

from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS
from neuralforecast.utils import AirPassengersDF

Y_df = AirPassengersDF
# Hold out the last 12 months (1960) for testing
Y_train_df = Y_df[Y_df.ds <= '1959-12-31']
Y_test_df = Y_df[Y_df.ds > '1959-12-31']

horizon = 12
models = [NBEATS(input_size=2 * horizon, h=horizon, max_steps=50),
          NHITS(input_size=2 * horizon, h=horizon, max_steps=50)]

nf = NeuralForecast(models=models, freq='M')
nf.fit(df=Y_train_df)
Y_hat_df = nf.predict().reset_index()
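The `input_size` and `h` arguments describe the windows the models learn from: a stretch of past values as input and the following `h` values as the target. The NumPy sketch below builds such windows from a toy series. It is a rough illustration of the windowing idea, not neuralforecast's data pipeline, and `make_windows` is a hypothetical helper.

```python
import numpy as np


def make_windows(y, input_size, h):
    """Slice a series into (input, target) pairs for window-based models."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) - input_size - h + 1
    inputs = np.stack([y[i : i + input_size] for i in range(n_windows)])
    targets = np.stack(
        [y[i + input_size : i + input_size + h] for i in range(n_windows)]
    )
    return inputs, targets


series = np.arange(10)
inputs, targets = make_windows(series, input_size=4, h=2)
print(inputs.shape, targets.shape)
print(inputs[0], "->", targets[0])
```

Choosing `input_size` as a multiple of the horizon, as in the example above, gives the model at least one full forecast-length stretch of context per window.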

5.8.8. Efficient Preprocessing and Feature Engineering with `temporian`

temporian is a Python library for preprocessing and feature engineering temporal data to feed into ML libraries like XGBoost, Scikit-learn or PyTorch.

It handles various types of temporal data like single- and multivariate data or flat- and multi-index data.

!pip install temporian

import temporian as tp

sales = tp.from_csv("sales.csv")

sales_per_store = sales.add_index("store")

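# Daily ticks at 22:00, filtered by day of the week
days = sales_per_store.tick_calendar(hour=22)
work_days = (days.calendar_day_of_week() <= 5).filter()

# Revenue per store, summed over the trailing day and sampled at those ticks
daily_revenue = sales_per_store["revenue"].moving_sum(
    tp.duration.days(1),
    sampling=work_days,
)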
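For comparison, a trailing one-day sum per store can also be written in plain pandas. The sketch below is a simplified analogue on invented data: it evaluates the moving sum at every sale event instead of at calendar ticks, so it is not equivalent to the temporian pipeline above.

```python
import pandas as pd

# Invented sales events, sorted by time
sales = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2024-01-01 09:00",
                "2024-01-01 15:00",
                "2024-01-01 18:00",
                "2024-01-02 10:00",
                "2024-01-02 16:00",
            ]
        ),
        "store": ["A", "B", "A", "A", "B"],
        "revenue": [10, 5, 20, 30, 7],
    }
)

# Sum of revenue over the trailing 24 hours, computed separately per store
daily_revenue = (
    sales.set_index("timestamp").groupby("store")["revenue"].rolling("1D").sum()
)
print(daily_revenue)
```

The result is indexed by store and timestamp, which is roughly what adding an index on `store` achieves in temporian.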

5.8.9. Change Point Detection with `ruptures`

Change point detection has never been easier in Python than with `ruptures`.

ruptures is a library that provides methods for detecting and displaying change points offline, that is, on a fully recorded signal.

It offers multiple exact and approximate detection methods.

!pip install ruptures

import matplotlib.pyplot as plt
import ruptures as rpt

# Generate signal
n_samples, dim, sigma = 1000, 3, 4
n_breakpoints = 4
signal, bkps = rpt.pw_constant(n_samples, dim, n_breakpoints, noise_std=sigma)

# Detection
algo = rpt.Pelt(model="rbf").fit(signal)
result = algo.predict(pen=10)

# Display
rpt.display(signal, bkps, result)
plt.show()
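The core idea behind cost-based change point detection is to find the split that makes each segment as homogeneous as possible. The NumPy sketch below searches for a single change in mean by brute force. It is a didactic simplification, not the PELT algorithm, and `best_single_split` is a hypothetical helper.

```python
import numpy as np


def best_single_split(signal):
    """Return the index that minimizes the summed squared error of two segments."""
    signal = np.asarray(signal, dtype=float)
    best_k, best_cost = None, np.inf
    for k in range(1, len(signal)):
        left, right = signal[:k], signal[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Noise-free toy signal: the mean jumps from 0 to 5 at index 50
signal = np.concatenate([np.zeros(50), np.full(30, 5.0)])
change_point = best_single_split(signal)
print(change_point)
```

With several change points, a penalty per change (the `pen` argument in the example above) keeps the search from splitting the signal into ever smaller segments.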

5.8.10. Probabilistic Machine Learning with `skpro`

Use supervised probabilistic prediction like a pro with skpro.

skpro is a scikit-learn-like library for probabilistic predictions and evaluations.

It supports tabular regressors, survival prediction, and reductions to turn scikit-learn regressors into probabilistic skpro regressors.

!pip install skpro

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from skpro.regression.residual import ResidualDouble

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_new, y_train, _ = train_test_split(X, y)

reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)

reg_proba.fit(X_train, y_train)

# Full predictive distribution
y_pred_proba = reg_proba.predict_proba(X_new)

# 90% prediction interval
y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)

# Selected quantiles
y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])

# Predictive variance
y_pred_var = reg_proba.predict_var(X_new)

# Point prediction (mean)
y_pred_mean = reg_proba.predict(X_new)
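The two-regressor setup above can be pictured as a two-stage procedure: one model predicts the mean, a second model predicts the size of the residuals, and both are combined into an interval. The sketch below shows that idea with scikit-learn and a normal assumption on synthetic data. It is a simplified illustration, not skpro's implementation; in particular, it uses the predicted absolute residual directly as the scale.

```python
from statistics import NormalDist

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a linear signal with unit-variance noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=200)

# Stage 1: model for the mean
mean_model = LinearRegression().fit(X, y)

# Stage 2: model for the magnitude of the residuals
abs_resid = np.abs(y - mean_model.predict(X))
scale_model = LinearRegression().fit(X, abs_resid)

# Combine both into a 90% interval under a normal assumption
X_new = np.array([[2.0], [8.0]])
mu = mean_model.predict(X_new)
sigma = np.clip(scale_model.predict(X_new), 1e-6, None)
z = NormalDist().inv_cdf(0.95)
lower, upper = mu - z * sigma, mu + z * sigma
print(np.column_stack([lower, mu, upper]))
```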

5.8.11. Evaluating Forecasts with `fev`

Time Series Forecasting without evaluation is guessing, not knowing.

Make your life easier and use fev.

fev is a new Python library aiming to benchmark forecasting models easily.

Because it is a wrapper on top of Hugging Face Datasets, defining custom forecasting benchmarks is easy.

It supports both point and probabilistic forecasts, so you can also evaluate forecast uncertainty.

!pip install fev

import fev

# Create Task
task = fev.Task(
    dataset_path="autogluon/chronos_datasets",
    dataset_config="monash_kdd_cup_2018",
    horizon=12,
)
# Load data
past_data, future_data = task.get_input_data()

def naive_forecast(y: list, horizon: int) -> list:
    return [y[-1] for _ in range(horizon)]

# Make predictions
predictions = []
for ts in past_data:
    predictions.append(
        {"predictions": naive_forecast(y=ts[task.target_column], horizon=task.horizon)}
    )
    

# Evaluate
task.evaluation_summary(predictions, model_name="naive")
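Probabilistic forecasts are commonly scored with the quantile (pinball) loss, which penalizes under- and over-prediction asymmetrically. The NumPy sketch below shows the textbook formula on made-up numbers. It does not reproduce the metrics computed by fev, and `pinball_loss` is a hypothetical helper.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Mean quantile loss of a forecast for quantile level q (0 < q < 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


# A 0.9-quantile forecast should rarely be exceeded, so
# under-prediction (second point) costs more than over-prediction (first point)
loss = pinball_loss(y_true=[10, 12], y_pred=[11, 11], q=0.9)
print(loss)
```

For `q=0.5` the loss reduces to half the mean absolute error, which links it back to point forecast evaluation.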
