kats.utils.backtesters module

This file defines the BackTester classes for Kats.

Kats supports multiple types of backtesters, including BackTesterSimple, BackTesterFixedWindow, BackTesterRollingWindow, and BackTesterExpandingWindow.

This module also supports CrossValidation with both expanding and rolling windows.

For more information, check out the Kats tutorial notebook on backtesting!

class kats.utils.backtesters.BackTesterExpandingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, start_train_percentage: float, end_train_percentage: float, test_percentage: float, expanding_steps: int, model_class: Type, multi=True, **kwargs)

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute an expanding window backtest.

An expanding window backtest runs over multiple iterations. In each iteration, the training dataset grows by a fixed amount while the test dataset "slides" forward to accommodate it. Iterations continue until the complete dataset has been used for either training or testing in the final iteration.

For more information, check out the Kats tutorial notebooks!
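
As a rough illustration of the iteration scheme described above, the following sketch computes the train and test index ranges for each fold. It is not the Kats implementation: the function name expanding_window_splits, the even spacing of the steps, and the rounding rule are assumptions made for this example.

```python
from typing import List, Tuple

# ((train_start, train_end), (test_start, test_end)), end indices exclusive
Split = Tuple[Tuple[int, int], Tuple[int, int]]


def expanding_window_splits(
    size: int,
    start_train_pct: float,
    end_train_pct: float,
    test_pct: float,
    steps: int,
) -> List[Split]:
    """Hypothetical helper: index ranges for an expanding window backtest."""
    start_train = int(size * start_train_pct / 100)
    end_train = int(size * end_train_pct / 100)
    test_size = int(size * test_pct / 100)
    if steps < 1 or test_size <= 0 or not (0 < start_train <= end_train):
        raise ValueError("invalid train, test, or expanding steps params")
    if end_train + test_size > size:
        raise ValueError("train and test windows exceed the data size")

    splits = []
    for i in range(steps):
        # Assumption: grow the training window in evenly spaced increments.
        growth = 0 if steps == 1 else (end_train - start_train) * i // (steps - 1)
        train_end = start_train + growth
        # Training always starts at 0; the test window slides forward with it.
        splits.append(((0, train_end), (train_end, train_end + test_size)))
    return splits
```

With 100 data points and the percentages from the sample usage (50, 75, 25, three steps), the training range grows from (0, 50) to (0, 75) while the test range moves from (50, 75) to (75, 100).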

start_train_percentage

A float for the initial percentage of data used for training.

end_train_percentage

A float for the final percentage of data used for training.

test_percentage

A float for the percentage of data used for testing.

expanding_steps

An integer for the number of expanding steps (i.e. number of folds).

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

multi

A boolean flag to toggle multiprocessing (default True).

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

error_funcs

Dictionary mapping error name to function that calculates it.

freq

A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors

List storing raw errors (truth - predicted).

Raises

ValueError – One or more of the train, test, or expanding steps params were invalid, or the time series is empty.

Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterExpandingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterExpandingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     start_train_percentage=50,
...     end_train_percentage=75,
...     test_percentage=25,
...     expanding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape") # Retrieve MAPE error
class kats.utils.backtesters.BackTesterFixedWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, window_percentage: int, model_class: Type, **kwargs)

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute a fixed window ahead backtest.

A fixed window ahead backtest is similar to a standard (i.e. simple) backtest, except that there is a gap between the train and test datasets. The purpose of this type of backtest is to focus on the long-range forecasting ability of the model.
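
The gap between the train and test datasets can be pictured with a small index calculation. This is a sketch under stated assumptions, not the Kats implementation: the function name fixed_window_split is hypothetical, and the percentages are converted to sizes by simple truncation.

```python
from typing import Tuple

Range = Tuple[int, int]  # (start, end), end index exclusive


def fixed_window_split(
    size: int, train_pct: float, test_pct: float, window_pct: float
) -> Tuple[Range, Range]:
    """Hypothetical helper: train/test ranges separated by a fixed gap."""
    train_size = int(size * train_pct / 100)
    test_size = int(size * test_pct / 100)
    gap = int(size * window_pct / 100)
    if train_size <= 0 or test_size <= 0 or gap < 0:
        raise ValueError("invalid train, test, or fixed window params")
    if train_size + gap + test_size > size:
        raise ValueError("train, window, and test exceed the data size")

    train = (0, train_size)
    # The test set starts only after the gap, so the model is always
    # evaluated at least `gap` steps beyond the end of its training data.
    test_start = train_size + gap
    return train, (test_start, test_start + test_size)
```

For a series of 144 points with the values from the sample usage (50, 25, 25), training covers (0, 72), the next 36 points are skipped, and testing covers (108, 144).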

train_percentage

A float for the percentage of data used for training.

test_percentage

A float for the percentage of data used for testing.

window_percentage

A float for the percentage of data used for the fixed window.

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

error_funcs

Dictionary mapping error name to the function that calculates it.

freq

A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors

List storing raw errors (truth - predicted).

Raises

ValueError – One or more of the train, test, or fixed window params were invalid, or the time series is empty.

Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterFixedWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterFixedWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     window_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape") # Retrieve MAPE error
class kats.utils.backtesters.BackTesterParent(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, model_class: Type, multi: bool, offset=0, **kwargs)

Bases: abc.ABC

This class defines the parent functions for various backtesting methods.

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

multi

Boolean flag to use multiprocessing (if set to True).

offset

Gap between train/test datasets (default 0).

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

error_funcs

Dictionary mapping error name to the function that calculates it.

freq

A string representing the (inferred) frequency of the pandas.DataFrame.

Raises

ValueError – The time series is empty or an invalid error type was passed.

calc_error() -> Optional[float]

Calculates all errors in self.error_methods and stores them in the errors dict.

Returns

The error value. None if the error value does not exist.

get_error_value(error_name: str) -> float

Gets requested error value.

Parameters

error_name – A string of the error whose value should be returned.

Returns

A float of the error value.

Raises

ValueError – The error name is invalid.

run_backtest() -> None

Executes backtest.
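
To make the errors dictionary more concrete, the sketch below derives a few of the metric names used in this module from raw errors (truth - predicted). The helper error_metrics is hypothetical and uses textbook definitions; the exact formulas Kats applies (for example, whether MAPE is scaled to a percentage, or how folds are combined) may differ.

```python
import math
from typing import Dict, Sequence


def error_metrics(
    truth: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical helper: common forecast errors from truth and predictions."""
    if len(truth) == 0 or len(truth) != len(predicted):
        raise ValueError("truth and predicted must be non-empty and equal length")

    # Raw errors follow the convention used in this module: truth - predicted.
    raw_errors = [t - p for t, p in zip(truth, predicted)]
    n = len(raw_errors)
    mse = sum(e * e for e in raw_errors) / n
    return {
        "mae": sum(abs(e) for e in raw_errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        # MAPE as a fraction (not a percentage); undefined if a true value is 0.
        "mape": sum(abs(e / t) for e, t in zip(raw_errors, truth)) / n,
    }
```

The result has the same shape as the errors attribute: a dictionary mapping each error name to a single float.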

class kats.utils.backtesters.BackTesterRollingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, sliding_steps: int, model_class: Type, multi=True, **kwargs)

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute a rolling window backtest.

A rolling window backtest runs over multiple iterations. In each iteration, the start of the training dataset moves forward by a fixed amount while the test dataset "slides" forward to accommodate it. Iterations continue until the end of the test set meets the end of the full dataset.

For more information, check out the Kats tutorial notebooks!
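
The difference from the expanding window is that the training window keeps a constant length and moves as a whole. The sketch below shows one way to compute the index ranges; it is not the Kats implementation, and the function name rolling_window_splits and the even spacing of the steps are assumptions made for this example.

```python
from typing import List, Tuple

# ((train_start, train_end), (test_start, test_end)), end indices exclusive
Split = Tuple[Tuple[int, int], Tuple[int, int]]


def rolling_window_splits(
    size: int, train_pct: float, test_pct: float, steps: int
) -> List[Split]:
    """Hypothetical helper: index ranges for a rolling window backtest."""
    train_size = int(size * train_pct / 100)
    test_size = int(size * test_pct / 100)
    # Largest shift that still keeps the test window inside the data.
    max_offset = size - train_size - test_size
    if steps < 1 or train_size <= 0 or test_size <= 0 or max_offset < 0:
        raise ValueError("invalid train, test, or sliding steps params")

    splits = []
    for i in range(steps):
        # Assumption: shift the whole window in evenly spaced increments.
        offset = 0 if steps == 1 else max_offset * i // (steps - 1)
        train_end = offset + train_size
        splits.append(((offset, train_end), (train_end, train_end + test_size)))
    return splits
```

With 100 data points, 50% training, 25% testing, and three steps, the training range moves from (0, 50) to (25, 75), and the last test range ends at index 100, the end of the data.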

train_percentage

A float for the percentage of data used for training.

test_percentage

A float for the percentage of data used for testing.

sliding_steps

An integer for the number of rolling steps (i.e. number of folds).

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

multi

A boolean flag to toggle multiprocessing (default True).

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

error_funcs

Dictionary mapping error name to the function that calculates it.

freq

A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors

List storing raw errors (truth - predicted).

Raises

ValueError – One or more of the train, test, or sliding steps params were invalid, or the time series is empty.

Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterRollingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterRollingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     sliding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape") # Retrieve MAPE error
class kats.utils.backtesters.BackTesterSimple(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, model_class: Type, **kwargs)

Bases: kats.utils.backtesters.BackTesterParent

Defines the functions to execute a simple train/test backtest.

train_percentage

A float for the percentage of data used for training.

test_percentage

A float for the percentage of data used for testing.

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

error_funcs

Dictionary mapping error name to the function that calculates it.

freq

A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors

List storing raw errors (truth - predicted).

Raises

ValueError – Invalid train and/or test params were passed, or the time series is empty.

Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterSimple
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterSimple(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=75,
...     test_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape") # Retrieve MAPE error
class kats.utils.backtesters.CrossValidation(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, num_folds: int, model_class: Type, rolling_window=False, multi=True)

Bases: object

Defines class to execute cross validation.

Cross validation is a useful technique to use multiple folds of the training and testing data to help optimize the performance of the model (e.g. hyperparameter tuning). For more info on cross validation, see https://en.wikipedia.org/wiki/Cross-validation_(statistics)
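
The sketch below illustrates the idea of scoring a model over several folds, with a flag that switches between expanding and rolling training windows. It is only a toy under stated assumptions: cross_validate_mae is a hypothetical name, the "model" simply predicts the mean of its training data, folds are evenly spaced, and the per-fold MAE values are averaged. None of this is taken from the Kats source.

```python
from statistics import mean
from typing import Sequence


def cross_validate_mae(
    series: Sequence[float],
    train_pct: float,
    test_pct: float,
    num_folds: int,
    rolling_window: bool = False,
) -> float:
    """Hypothetical helper: average MAE of a mean-forecast over several folds."""
    size = len(series)
    train_size = int(size * train_pct / 100)
    test_size = int(size * test_pct / 100)
    max_offset = size - train_size - test_size
    if num_folds < 1 or train_size <= 0 or test_size <= 0 or max_offset < 0:
        raise ValueError("invalid train, test, or num_folds params")

    fold_errors = []
    for i in range(num_folds):
        # Assumption: folds are evenly spaced across the available range.
        offset = 0 if num_folds == 1 else max_offset * i // (num_folds - 1)
        train_end = offset + train_size
        # Rolling: the window start moves with the fold. Expanding: it stays at 0.
        train_start = offset if rolling_window else 0
        train = series[train_start:train_end]
        test = series[train_end:train_end + test_size]
        forecast = mean(train)  # stand-in model: predict the training mean
        fold_errors.append(mean(abs(y - forecast) for y in test))
    return mean(fold_errors)
```

On an upward-trending series the rolling variant scores better with this toy model, because older and lower values drop out of the training window; comparing such scores across candidate settings is the kind of tuning the paragraph above refers to.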

train_percentage

A float for the percentage of data used for training.

test_percentage

A float for the percentage of data used for testing.

num_folds

An integer for the number of folds to use.

error_methods

List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data

kats.consts.TimeSeriesData object to perform backtest on.

params

Parameters to train model with.

model_class

Defines the model type to use for backtesting.

rolling_window

A boolean flag to use the rolling window method instead of the expanding window method (default False).

multi

A boolean flag to toggle multiprocessing (default True).

results

List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors

Dictionary mapping the error type to value.

size

An integer for the total number of datapoints.

raw_errors

List storing raw errors (truth - predicted).

Raises

ValueError – One or more of the train, test, or num_folds params were invalid, or the time series is empty.

Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import CrossValidation
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> cv = CrossValidation(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     num_folds=3,
...     model_class=ARIMAModel,
...     rolling_window=True,
... )
>>> cv.run_cv()
>>> mape = cv.get_error_value("mape") # Retrieve MAPE error
get_error_value(error_name: str) -> float

Gets requested error value.

Parameters

error_name – A string of the error whose value should be returned.

Returns

A float of the error value.

Raises

ValueError – The error name is invalid.

run_cv() -> None

Runs the cross validation.