kats.utils.backtesters module¶

This file defines the BackTester classes for Kats.

Kats supports multiple types of backtesters, including:

BackTesterSimple (basic train & test backtesting).
BackTesterFixedWindow (discontinuous train & test data).
BackTesterExpandingWindow (increasing train window size over multiple iterations).
BackTesterRollingWindow (sliding train & test windows over multiple iterations).

This module also supports CrossValidation with both expanding and rolling windows.

For more information, check out the Kats tutorial notebook on backtesting!

class kats.utils.backtesters.BackTesterExpandingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, start_train_percentage: float, end_train_percentage: float, test_percentage: float, expanding_steps: int, model_class: Type, multi=True, **kwargs)[source]¶

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute an expanding window backtest.

An expanding window backtest runs over multiple iterations. In each iteration, the training dataset grows by a fixed amount while the test dataset “slides” forward to accommodate it. Iterations continue until the final iteration, in which the complete data set is used for either training or testing.

For more information, check out the Kats tutorial notebooks!
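The expanding window scheme described above can be sketched with plain index arithmetic. This is a minimal illustration, not the Kats implementation: the helper name `expanding_window_splits`, the truncating percentage-to-size conversion, and the evenly spaced training end points are all assumptions made for the example.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open index range [start, end)


def expanding_window_splits(
    size: int,
    start_train_percentage: float,
    end_train_percentage: float,
    test_percentage: float,
    expanding_steps: int,
) -> List[Tuple[Window, Window]]:
    """Return (train, test) index windows, one pair per expanding step.

    Hypothetical helper for illustration only; not part of Kats.
    """
    start_train = int(size * start_train_percentage / 100)
    end_train = int(size * end_train_percentage / 100)
    test = int(size * test_percentage / 100)
    if size <= 0 or start_train <= 0 or test <= 0 or expanding_steps <= 0:
        raise ValueError("size, train size, test size and steps must be positive")
    if end_train < start_train or end_train + test > size:
        raise ValueError("train/test percentages do not fit inside the data")

    growth = end_train - start_train
    splits = []
    for step in range(expanding_steps):
        # Spread the training end points evenly from start_train to end_train.
        extra = growth * step // (expanding_steps - 1) if expanding_steps > 1 else 0
        train_end = start_train + extra
        # Training always starts at index 0; the test window follows directly.
        splits.append(((0, train_end), (train_end, train_end + test)))
    return splits


splits = expanding_window_splits(100, 50, 75, 25, 3)
for train, test in splits:
    print(train, test)
```

With 100 points, a 50% to 75% training range, a 25% test window, and 3 steps, the training window grows from 50 to 75 points and the last test window ends at the final data point.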

start_train_percentage¶: A float for the initial percentage of data used for training.

end_train_percentage¶: A float for the final percentage of data used for training.

test_percentage¶: A float for the percentage of data used for testing.

expanding_steps¶: An integer for the number of expanding steps (i.e. number of folds).

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

multi¶: A boolean flag to toggle multiprocessing (default True).

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

error_funcs¶: Dictionary mapping error name to function that calculates it.

freq¶: A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors¶: List storing raw errors (truth - predicted).

Raises: ValueError – One or more of the train, test, or expanding steps params were invalid, or the time series is empty.

Sample Usage:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterExpandingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterExpandingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     start_train_percentage=50,
...     end_train_percentage=75,
...     test_percentage=25,
...     expanding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error

class kats.utils.backtesters.BackTesterFixedWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, window_percentage: int, model_class: Type, **kwargs)[source]¶

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute a fixed window ahead backtest.

A fixed window ahead backtest is similar to a standard (i.e. simple) backtest, with the caveat that there is a gap between the train and test data sets. The purpose of this type of backtest is to focus on the long-range forecasting ability of the model.
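The gap between the train and test sets can be pictured as index ranges. The sketch below is illustrative only: `fixed_window_split` is a hypothetical helper, and the truncating conversion from percentages to sizes is an assumption rather than the documented Kats behavior.

```python
from typing import Tuple

Window = Tuple[int, int]  # half-open index range [start, end)


def fixed_window_split(
    size: int,
    train_percentage: float,
    test_percentage: float,
    window_percentage: float,
) -> Tuple[Window, Window]:
    """Return (train, test) index windows separated by a fixed gap.

    Hypothetical helper for illustration only; not part of Kats.
    """
    train = int(size * train_percentage / 100)
    test = int(size * test_percentage / 100)
    gap = int(size * window_percentage / 100)
    if size <= 0 or train <= 0 or test <= 0 or gap < 0:
        raise ValueError("size, train size and test size must be positive")
    if train + gap + test > size:
        raise ValueError("train, window and test do not fit inside the data")
    # The points in [train, train + gap) are skipped: neither trained on nor tested.
    test_start = train + gap
    return (0, train), (test_start, test_start + test)


train_window, test_window = fixed_window_split(100, 50, 25, 25)
print(train_window, test_window)
```

Here the model would train on the first 50 points, skip the next 25, and be evaluated on the last 25, so every test point lies at least 25 steps past the end of the training data.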

train_percentage¶: A float for the percentage of data used for training.

test_percentage¶: A float for the percentage of data used for testing.

window_percentage¶: A float for the percentage of data used for the fixed window.

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

error_funcs¶: Dictionary mapping error name to the function that calculates it.

freq¶: A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors¶: List storing raw errors (truth - predicted).

Raises: ValueError – One or more of the train, test, or fixed window params were invalid, or the time series is empty.

Sample Usage:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterFixedWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterFixedWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     window_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error

class kats.utils.backtesters.BackTesterParent(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, model_class: Type, multi: bool, offset=0, **kwargs)[source]¶

Bases: abc.ABC

This class defines the parent functions for various backtesting methods.

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

multi¶: Boolean flag to use multiprocessing (if set to True).

offset¶: Gap between train/test datasets (default 0).

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

error_funcs¶: Dictionary mapping error name to the function that calculates it.

freq¶: A string representing the (inferred) frequency of the pandas.DataFrame.

Raises: ValueError – The time series is empty or an invalid error type was passed.

calc_error() → Optional[float][source]¶

Calculates all errors in self.error_methods and stores them in the errors dict.

Returns: The error value. None if the error value does not exist.
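The name-to-function dispatch that `error_methods` and `error_funcs` describe can be sketched as follows. This is a simplified stand-in, not the Kats code: the function names are hypothetical, only four of the listed metrics are shown, and they use common textbook definitions (MAPE as a fraction rather than a percentage), which is an assumption about how Kats scales them.

```python
import math
from typing import Callable, Dict, List, Sequence

ErrorFunc = Callable[[Sequence[float], Sequence[float]], float]


def mae(truth: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def mse(truth: Sequence[float], pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def rmse(truth: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(mse(truth, pred))


def mape(truth: Sequence[float], pred: Sequence[float]) -> float:
    # Undefined when any true value is zero; this sketch does not guard against it.
    return sum(abs((t - p) / t) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical registry mapping each error name to the function that computes it.
ERROR_FUNCS: Dict[str, ErrorFunc] = {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape}


def calc_errors(
    error_methods: List[str], truth: Sequence[float], pred: Sequence[float]
) -> Dict[str, float]:
    """Compute every requested error and return a name -> value dict."""
    if len(truth) == 0 or len(truth) != len(pred):
        raise ValueError("truth and pred must be non-empty and the same length")
    unknown = [name for name in error_methods if name not in ERROR_FUNCS]
    if unknown:
        raise ValueError(f"unsupported error methods: {unknown}")
    return {name: ERROR_FUNCS[name](truth, pred) for name in error_methods}


truth = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 400.0]
raw_errors = [t - p for t, p in zip(truth, pred)]  # truth - predicted
errors = calc_errors(["mae", "mse", "rmse", "mape"], truth, pred)
print(raw_errors, errors)
```

Keeping the metrics in a dictionary means an unknown name can be rejected up front with a `ValueError`, before any model is trained.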

get_error_value(error_name: str) → float [source]¶

Gets requested error value.

Parameters: error_name – A string of the error whose value should be returned.
Returns: A float of the error value.
Raises: ValueError – The error name is invalid.

run_backtest() → None [source]¶: Executes backtest.

class kats.utils.backtesters.BackTesterRollingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, sliding_steps: int, model_class: Type, multi=True, **kwargs)[source]¶

Bases: kats.utils.backtesters.BackTesterParent

Defines functions to execute a rolling window backtest.

A rolling window backtest runs over multiple iterations. In each iteration, the start of the training dataset moves forward by a fixed amount while the test dataset “slides” forward to accommodate it. Iterations continue until the end of the test set meets the end of the full data set.

For more information, check out the Kats tutorial notebooks!
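The rolling window scheme described above differs from the expanding one in that the training window keeps a constant length and its start moves forward. A minimal sketch, assuming evenly spaced offsets and truncating percentage conversion (the helper `rolling_window_splits` is hypothetical and not the Kats implementation):

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open index range [start, end)


def rolling_window_splits(
    size: int,
    train_percentage: float,
    test_percentage: float,
    sliding_steps: int,
) -> List[Tuple[Window, Window]]:
    """Return (train, test) index windows, one pair per sliding step.

    Hypothetical helper for illustration only; not part of Kats.
    """
    train = int(size * train_percentage / 100)
    test = int(size * test_percentage / 100)
    if size <= 0 or train <= 0 or test <= 0 or sliding_steps <= 0:
        raise ValueError("size, train size, test size and steps must be positive")
    if train + test > size:
        raise ValueError("train and test do not fit inside the data")

    max_offset = size - train - test  # largest shift that keeps the test set in range
    splits = []
    for step in range(sliding_steps):
        # Spread the window start points evenly from 0 to max_offset.
        offset = max_offset * step // (sliding_steps - 1) if sliding_steps > 1 else 0
        train_end = offset + train
        splits.append(((offset, train_end), (train_end, train_end + test)))
    return splits


splits = rolling_window_splits(100, 50, 25, 3)
for train, test in splits:
    print(train, test)
```

Every training window here holds 50 points, and the final test window ends exactly at the last data point.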

train_percentage¶: A float for the percentage of data used for training.

test_percentage¶: A float for the percentage of data used for testing.

sliding_steps¶: An integer for the number of rolling steps (i.e. number of folds).

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

multi¶: A boolean flag to toggle multiprocessing (default True).

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

error_funcs¶: Dictionary mapping error name to the function that calculates it.

freq¶: A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors¶: List storing raw errors (truth - predicted).

Raises: ValueError – One or more of the train, test, or sliding steps params were invalid, or the time series is empty.

Sample Usage:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterRollingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterRollingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     sliding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error

class kats.utils.backtesters.BackTesterSimple(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, model_class: Type, **kwargs)[source]¶

Bases: kats.utils.backtesters.BackTesterParent

Defines the functions to execute a simple train/test backtest.

train_percentage¶: A float for the percentage of data used for training.

test_percentage¶: A float for the percentage of data used for testing.

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

error_funcs¶: Dictionary mapping error name to the function that calculates it.

freq¶: A string representing the (inferred) frequency of the pandas.DataFrame.

raw_errors¶: List storing raw errors (truth - predicted).

Raises: ValueError – Invalid train and/or test params were passed, or the time series is empty.

Sample Usage:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterSimple
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterSimple(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=75,
...     test_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error

class kats.utils.backtesters.CrossValidation(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, num_folds: int, model_class: Type, rolling_window=False, multi=True)[source]¶

Bases: object

Defines class to execute cross validation.

Cross validation is a useful technique to use multiple folds of the training and testing data to help optimize the performance of the model (e.g. hyperparameter tuning). For more info on cross validation, see https://en.wikipedia.org/wiki/Cross-validation_(statistics)
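As a rough illustration of how fold-level errors can feed hyperparameter tuning, the sketch below picks the candidate with the lowest mean error across folds. Everything in it is hypothetical: the helper `pick_best_candidate`, the candidate labels, and the error values are made up for the example, and averaging across folds is one common aggregation choice, not a statement of how Kats combines fold errors.

```python
from statistics import mean
from typing import Dict, List, Tuple


def pick_best_candidate(fold_errors: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the candidate label with the lowest mean error over its folds.

    Hypothetical helper for illustration only; not part of Kats.
    """
    if not fold_errors or any(len(errs) == 0 for errs in fold_errors.values()):
        raise ValueError("every candidate needs at least one fold error")
    scores = {name: mean(errs) for name, errs in fold_errors.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, scores[best]


# Made-up per-fold MAPE values for two hypothetical ARIMA settings.
fold_errors = {
    "p=1,d=1,q=1": [0.10, 0.20, 0.30],
    "p=2,d=1,q=1": [0.10, 0.10, 0.10],
}
best_name, best_score = pick_best_candidate(fold_errors)
print(best_name, best_score)
```

In practice each list would hold one error per fold, for example the value retrieved with `get_error_value("mape")` from a separate run per candidate.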

train_percentage¶: A float for the percentage of data used for training.

test_percentage¶: A float for the percentage of data used for testing.

num_folds¶: An integer for the number of folds to use.

error_methods¶: List of strings indicating which errors to calculate (see ALLOWED_ERRORS for exhaustive list).

data¶: kats.consts.TimeSeriesData object to perform backtest on.

params¶: Parameters to train model with.

model_class¶: Defines the model type to use for backtesting.

rolling_window¶: A boolean flag to use the rolling window method instead of the expanding window method (default False).

multi¶: A boolean flag to toggle multiprocessing (default True).

results¶: List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.

errors¶: Dictionary mapping the error type to value.

size¶: An integer for the total number of datapoints.

raw_errors¶: List storing raw errors (truth - predicted).

Raises: ValueError – One or more of the train, test, or num_folds params were invalid, or the time series is empty.

Sample Usage:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import CrossValidation
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> cv = CrossValidation(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     num_folds=3,
...     model_class=ARIMAModel,
...     rolling_window=True,
... )
>>> cv.run_cv()
>>> mape = cv.get_error_value("mape")  # Retrieve MAPE error

get_error_value(error_name: str) → float [source]¶

Gets requested error value.

Parameters: error_name – A string of the error whose value should be returned.
Returns: A float of the error value.
Raises: ValueError – The error name is invalid.

run_cv() → None [source]¶: Runs the cross validation.
