5.7. Time Series

5.7.1. Check Seasonality automatically with darts

Seasonality describes a pattern that repeats regularly over time.

Identifying and understanding the seasonality in time series can boost the performance of your model.

But you don’t have to find the seasonality effect and period by yourself.

Instead, you can use check_seasonality() from darts in Python.

It checks whether the time series is seasonal and also returns the period, which is inferred from the autocorrelation function (ACF).

In the example below, it returns a seasonal period of 12, which matches the monthly frequency of the Air Passengers dataset.

!pip install darts
from darts.utils.statistics import check_seasonality
from darts.datasets import AirPassengersDataset

ts = AirPassengersDataset().load()

is_seasonal, period = check_seasonality(ts)
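To get a feeling for what "inferring the period from the autocorrelation function" means, here is a minimal pure-Python sketch. It is only an illustration of the idea, not how darts implements check_seasonality(): the helper names autocorrelation and infer_period are made up for this example, and the data is a synthetic sine wave with a known period of 12.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def infer_period(series, max_lag=24):
    """Return the lag of the highest local maximum of the ACF (hypothetical helper)."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    # A candidate period is a lag whose ACF value is larger than both neighbours
    peaks = [
        lag
        for lag in range(2, max_lag)
        if acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]
    ]
    if not peaks:
        return None
    return max(peaks, key=lambda lag: acf[lag])


# Synthetic monthly-like series: 10 full cycles with a period of 12
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
period = infer_period(series)
print("inferred period:", period)
```

On real data the ACF is noisy, so a library routine that also tests whether the peak is statistically meaningful is the safer choice.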

5.7.2. Cross-validation for Time Series Data with TimeSeriesSplit

How to do Cross-Validation with Time Series?

Standard K-Fold Cross-Validation is a poor fit for time series.

With K-Fold, you partition the data into k folds, then train and evaluate the model k times, each time using a different fold as the test set and the rest of the data as the training set.

But this means the model is trained on data from both before and after the test data, so information from the future leaks into training.

This can result in overfitting or biased estimates of model performance.

Instead, use TimeSeriesSplit from scikit-learn.

TimeSeriesSplit ensures that the model is only trained on the past values and tested on future data.

This gives you a more accurate and less biased assessment of the model’s performance.

from sklearn.model_selection import TimeSeriesSplit, cross_validate
from sklearn.ensemble import GradientBoostingRegressor

X, y = ...
model = GradientBoostingRegressor()

ts_cv = TimeSeriesSplit(n_splits=3)

scores = cross_validate(model, X, y, cv=ts_cv, scoring='neg_mean_squared_error')
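To make the "train on the past, test on the future" idea concrete, here is a small hand-rolled sketch of an expanding-window split. It mimics what I understand to be the default behaviour of TimeSeriesSplit (equal-sized test blocks at the end of the data), but it is an illustration only; the function name expanding_window_splits is hypothetical and not part of scikit-learn.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists where train always precedes test."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    splits = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # everything before the test block
        test = list(range(test_start, test_start + test_size))
        splits.append((train, test))
    return splits


splits = expanding_window_splits(n_samples=12, n_splits=3)
for train, test in splits:
    print("train:", train, "test:", test)
```

Each training set grows with every split, and no test index is ever smaller than a training index.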

5.7.3. More Cross-Validation with tscv

How to do Cross-Validation with Time Series?

Standard K-Fold Cross-Validation is a poor fit for time series.

With K-Fold, you partition the data into k folds, then train and evaluate the model k times, each time using a different fold as the test set and the rest of the data as the training set.

But this means the model is trained on data from both before and after the test data, so information from the future leaks into training.

This can result in overfitting or biased estimates of model performance.

Instead, use the tscv package for Python.

tscv provides three splitter classes, each of which leaves a gap between the training and test sets:

  • GapLeavePOut

  • GapKFold

  • GapRollForward

This gives you a more accurate and less biased assessment of the model’s performance.

!pip install tscv
from tscv import GapRollForward
cv = GapRollForward(min_train_size=3, gap_size=1, max_test_size=2)
for train, test in cv.split(range(10)):
    print("train:", train, "test:", test)
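Why leave a gap at all? If your features are built from lagged or rolling values, the observations right before the test set can still carry information about it. The sketch below shows the roll-forward-with-gap idea in plain Python. It is not tscv's implementation; the function gap_roll_forward and its parameters are hypothetical names chosen for this illustration.

```python
def gap_roll_forward(n_samples, min_train_size, gap_size, test_size):
    """Roll the training window forward one step at a time,
    skipping gap_size samples between train and test."""
    splits = []
    train_end = min_train_size
    while train_end + gap_size < n_samples:
        test_start = train_end + gap_size
        test_end = min(test_start + test_size, n_samples)
        train = list(range(train_end))
        test = list(range(test_start, test_end))
        splits.append((train, test))
        train_end += 1
    return splits


gap_splits = gap_roll_forward(n_samples=10, min_train_size=3, gap_size=1, test_size=2)
for train, test in gap_splits:
    print("train:", train, "test:", test)
```

In every split, the sample directly after the training set is dropped, so the test set never starts right where training ends.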

5.7.4. Time Series Forecasting with Machine Learning with mlforecast

Do you want to perform powerful time series forecasting?

Try mlforecast by Nixtla.

mlforecast lets you run Machine Learning models for time series forecasting, even on remote clusters like Ray or Spark.

Feature Engineering, support for exogenous variables, and probabilistic forecasting are also included.

!pip install mlforecast
import lightgbm as lgb

from mlforecast import MLForecast
from sklearn.linear_model import LinearRegression

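# df is assumed to be a long-format pandas DataFrame with the columns
# unique_id (series identifier), ds (timestamp), and y (target value)
mlf = MLForecast(
    models=[LinearRegression(), lgb.LGBMRegressor()],
    lags=[1, 12],  # use the values 1 and 12 steps back as features
    freq='M'
)
mlf.fit(df)
mlf.predict(12)  # forecast the next 12 periods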
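The lags argument is what turns a time series into a regular supervised learning problem. Here is a minimal sketch of that transformation in plain Python, assuming a single series and no other features. The helper make_lag_features is hypothetical and only illustrates the idea; mlforecast does this for you, and far more efficiently.

```python
def make_lag_features(series, lags):
    """Build (features, target) rows where each feature is a lagged value."""
    max_lag = max(lags)
    rows = []
    # The first max_lag points have no complete set of lags, so skip them
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows


series = list(range(100, 124))  # 24 made-up monthly values
rows = make_lag_features(series, lags=[1, 12])

features, target = rows[0]
print("first row features:", features, "target:", target)
```

Any regressor that accepts a feature matrix, such as LinearRegression or LGBMRegressor, can then be trained on these rows.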

5.7.5. Lightning Fast Time Series Forecasting with statsforecast

Do you want to perform lightning fast time series forecasting?

Try statsforecast by Nixtla.

statsforecast lets you run statistical models on your time series data.

According to Nixtla's benchmarks, it can be up to 20x faster than existing libraries like pmdarima and statsmodels.

!pip install statsforecast
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
from statsforecast.utils import AirPassengersDF

df = AirPassengersDF
sf = StatsForecast(
    models = [AutoARIMA(season_length = 12)],
    freq = 'M'
)

sf.fit(df)
sf.predict(h=12, level=[95])
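If you wonder what season_length actually controls, a seasonal naive forecast is the simplest statistical model that uses it: every forecast repeats the value from one season earlier. The sketch below is a plain-Python illustration with a made-up function name (seasonal_naive_forecast) and toy data; it is not AutoARIMA and not statsforecast code.

```python
def seasonal_naive_forecast(history, season_length, h):
    """Forecast h steps ahead by repeating the last observed season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


history = list(range(1, 25))  # two "years" of made-up monthly values
forecast = seasonal_naive_forecast(history, season_length=12, h=3)
print(forecast)
```

A baseline like this is also a handy sanity check: a more complex model should at least beat it.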

5.7.6. Time Series with Polars Backend with functime

Fast time-series forecasting with functime.

functime is a Python library for time series forecasting and feature extraction, built with Polars.

Since it uses lazy Polars dataframes, functime speeds up forecasting and feature engineering.

Backtesting, cross-validation splitters and metrics are included too.

It even comes with an LLM agent to analyze and describe your forecasts.

Check it out!

!pip install functime
import polars as pl
from functime.cross_validation import train_test_split
from functime.forecasting import linear_model
from functime.metrics import mase

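# y is assumed to be a Polars DataFrame of panel data
# (entity, time, and value columns) that was loaded beforehand
y_train, y_test = y.pipe(train_test_split(test_size=3))

# Option 1: fit and predict with a forecaster object
forecaster = linear_model(freq="1mo", lags=24)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=3)

# Option 2: the same forecast as a single functional call
y_pred = linear_model(freq="1mo", lags=24)(y=y_train, fh=3)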

scores = mase(y_true=y_test, y_pred=y_pred, y_train=y_train)
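MASE (mean absolute scaled error) divides the forecast error by the in-sample error of a naive forecast, which is why it needs y_train as well. Here is a minimal single-series sketch in plain Python. The name mase_sketch and the season_length parameter are assumptions for this illustration; functime's own mase works on Polars panel data.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase_sketch(y_true, y_pred, y_train, season_length=1):
    """Forecast MAE scaled by the in-sample MAE of a (seasonal) naive forecast."""
    # Naive forecast: each training value is predicted by the value
    # season_length steps before it
    naive_actual = y_train[season_length:]
    naive_pred = y_train[:-season_length]
    scale = mean_absolute_error(naive_actual, naive_pred)
    return mean_absolute_error(y_true, y_pred) / scale


y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_true = [6.0, 7.0]
y_pred = [5.0, 7.0]
score = mase_sketch(y_true, y_pred, y_train)
print("MASE:", score)
```

A value below 1 means the forecast beats the naive baseline on average; a value above 1 means it does worse.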

5.7.7. Time Series Forecasting with Deep Learning with neuralforecast

Do you want to perform powerful time series forecasting?

Try neuralforecast by Nixtla.

neuralforecast lets you run Deep Learning models for time series forecasting with models like N-BEATS or N-HiTS.

Support for exogenous variables and probabilistic forecasting is also included.

Check the example below!

!pip install neuralforecast
import pandas as pd

from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS
from neuralforecast.utils import AirPassengersDF

Y_df = AirPassengersDF
Y_train_df = Y_df[Y_df.ds<='1959-12-31']
Y_test_df = Y_df[Y_df.ds>'1959-12-31']

horizon = 12
models = [NBEATS(input_size=2 * horizon, h=horizon, max_steps=50),
          NHITS(input_size=2 * horizon, h=horizon, max_steps=50)]

nf = NeuralForecast(models=models, freq='M')
nf.fit(df=Y_train_df)
Y_hat_df = nf.predict().reset_index()
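The parameters input_size and h describe the shape of what the network sees: a window of past values goes in, and the next h values come out. The plain-Python sketch below shows that windowing on a toy series of 132 points (the same length as the 11 years of monthly training data above). It is a conceptual illustration with a hypothetical helper, make_windows; neuralforecast's actual window sampling is more involved.

```python
def make_windows(series, input_size, horizon):
    """Slice a series into (inputs, targets) pairs for multi-step forecasting."""
    windows = []
    last_start = len(series) - input_size - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + input_size]
        targets = series[start + input_size:start + input_size + horizon]
        windows.append((inputs, targets))
    return windows


horizon = 12
series = list(range(132))  # stand-in for 132 monthly observations
windows = make_windows(series, input_size=2 * horizon, horizon=horizon)

inputs, targets = windows[0]
print(len(windows), "windows;", len(inputs), "inputs ->", len(targets), "targets")
```

With input_size=2 * horizon, each prediction of the next 12 months is based on the previous 24 months.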

5.7.8. Efficient Preprocessing and Feature Engineering with temporian

temporian is a Python library for preprocessing and feature engineering temporal data to feed into ML libraries like XGBoost, Scikit-learn or PyTorch.

It handles various types of temporal data like single- and multivariate data or flat- and multi-index data.

!pip install temporian
import temporian as tp

sales = tp.from_csv("sales.csv")

sales_per_store = sales.add_index("store")

days = sales_per_store.tick_calendar(hour=22)
work_days = (days.calendar_day_of_week() <= 5).filter()

daily_revenue = sales_per_store["revenue"].moving_sum(
    tp.duration.days(1),
    sampling=work_days,
)
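To see what a moving sum with a separate sampling does, here is a tiny plain-Python version for a single store. It assumes the window covers the interval (t - 1 day, t] and treats Monday to Friday as work days; the function moving_sum and the toy events are made up for this sketch and do not use temporian.

```python
from datetime import datetime, timedelta


def moving_sum(events, window, sampling):
    """For each sampling time s, sum the values with s - window < timestamp <= s."""
    return [
        (s, sum(value for t, value in events if s - window < t <= s))
        for s in sampling
    ]


# Toy sales events: (timestamp, revenue)
events = [
    (datetime(2024, 1, 5, 10), 10.0),  # Friday
    (datetime(2024, 1, 6, 12), 4.0),   # Saturday
    (datetime(2024, 1, 8, 9), 6.0),    # Monday
    (datetime(2024, 1, 8, 15), 3.0),   # Monday
]

# One tick per day at 22:00, then keep only Monday (0) to Friday (4)
days = [datetime(2024, 1, 5, 22) + timedelta(days=i) for i in range(4)]
work_days = [d for d in days if d.weekday() < 5]

daily_revenue = moving_sum(events, timedelta(days=1), work_days)
for timestamp, revenue in daily_revenue:
    print(timestamp, revenue)
```

The Saturday sale never shows up in the result, because no sampling point falls within one day after it.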

5.7.9. Change Point Detection with ruptures

Change point detection in Python has never been easier than with ruptures.

ruptures is a library that provides methods for off-line change point detection and for displaying the detected change points.

It offers multiple exact and approximation detection methods.

!pip install ruptures
import matplotlib.pyplot as plt
import ruptures as rpt

# Generate signal
n_samples, dim, sigma = 1000, 3, 4
n_breakpoints = 4
signal, bkps = rpt.pw_constant(n_samples, dim, n_breakpoints, noise_std=sigma)

# Detection
algo = rpt.Pelt(model="rbf").fit(signal)
result = algo.predict(pen=10)

# Display
rpt.display(signal, bkps, result)
plt.show()
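What does a change point detector optimize? A common idea is to pick the breakpoints that minimize the total within-segment cost. The sketch below does this by brute force for a single breakpoint with a squared-error cost, in plain Python. It is a deliberately simple stand-in, not the PELT algorithm and not the rbf cost used above; the function name best_single_breakpoint is hypothetical.

```python
def squared_error(segment):
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def best_single_breakpoint(signal):
    """Try every split position and keep the one with the lowest total cost."""
    best_index, best_cost = None, float("inf")
    for k in range(1, len(signal)):
        cost = squared_error(signal[:k]) + squared_error(signal[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index


# Piecewise constant toy signal: the mean jumps from 0 to 5 at index 5
signal = [0.0] * 5 + [5.0] * 5
breakpoint_index = best_single_breakpoint(signal)
print("breakpoint at index:", breakpoint_index)
```

Exhaustive search gets expensive quickly with several breakpoints, which is where the exact and approximate methods in ruptures come in.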

5.7.10. Probabilistic Machine Learning with skpro

Use supervised probabilistic prediction like a pro with skpro.

skpro is a scikit-learn-like library for probabilistic predictions and evaluations.

It supports tabular regressors, survival prediction, and reductions to turn scikit-learn regressors into probabilistic skpro regressors.

!pip install skpro
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from skpro.regression.residual import ResidualDouble

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_new, y_train, _ = train_test_split(X, y)

reg_mean = RandomForestRegressor()
reg_resid = LinearRegression()
reg_proba = ResidualDouble(reg_mean, reg_resid)

reg_proba.fit(X_train, y_train)

y_pred_proba = reg_proba.predict_proba(X_new)

y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)

y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])

y_pred_var = reg_proba.predict_var(X_new)

y_pred_mean = reg_proba.predict(X_new)
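The two-regressor setup above follows a simple recipe: one model predicts the mean, a second one models the size of the residuals, and together they define a predictive distribution. Here is a heavily simplified plain-Python sketch of that recipe. It assumes a straight-line mean model, a constant residual scale that is used directly as the standard deviation of a normal distribution, and toy data; skpro's ResidualDouble is more general and differs in the details.

```python
from statistics import NormalDist, mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    slope = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sum(
        (a - x_mean) ** 2 for a in x
    )
    return slope, y_mean - slope * x_mean


# Toy data: a line with alternating +1 / -1 noise
x = list(range(10))
y = [2 * a + 1 + (1 if a % 2 == 0 else -1) for a in x]

# Stage 1: model for the mean
slope, intercept = fit_line(x, y)

# Stage 2: model for the residual size (here just a constant)
abs_residuals = [abs(b - (slope * a + intercept)) for a, b in zip(x, y)]
scale = mean(abs_residuals)


def predict_interval(x_new, coverage=0.9):
    """Symmetric interval around the mean, assuming normal residuals."""
    center = slope * x_new + intercept
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return center - z * scale, center, center + z * scale


lower, center, upper = predict_interval(4.5)
print(f"{lower:.2f} <= {center:.2f} <= {upper:.2f}")
```

Swapping the constant in stage 2 for a real regressor lets the interval width change with the input.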