kats.utils.backtesters module
This file defines the BackTester classes for Kats.
Kats supports multiple types of backtesters, including:
- BackTesterSimple (basic train & test backtesting).
- BackTesterFixedWindow (discontinuous train & test data).
- BackTesterExpandingWindow (increasing train window size over multiple iterations).
- BackTesterRollingWindow (sliding train & test windows over multiple iterations).
This module also supports CrossValidation with both expanding and rolling windows.
For more information, check out the Kats tutorial notebook on backtesting!
- class kats.utils.backtesters.BackTesterExpandingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, start_train_percentage: float, end_train_percentage: float, test_percentage: float, expanding_steps: int, model_class: Type, multi=True, **kwargs)
Bases: kats.utils.backtesters.BackTesterParent
Defines functions to execute an expanding window backtest.
An expanding window backtest runs over multiple iterations. In each iteration, the training dataset grows by a fixed amount, and the test dataset "slides" forward to accommodate it. Iterations continue until, in the final iteration, the complete data set is used for either training or testing.
For more information, check out the Kats tutorial notebooks!
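To make the fold layout concrete, here is a minimal sketch of how expanding-window boundaries could be computed. It assumes percentages on a 0-100 scale (as in the sample usage), floor rounding to whole datapoints, and a training percentage that moves linearly from start_train_percentage to end_train_percentage. The helper `expanding_window_splits` is hypothetical and not part of Kats, whose exact rounding may differ.

```python
import math
from typing import List, Tuple


def expanding_window_splits(
    size: int,
    start_train_percentage: float,
    end_train_percentage: float,
    test_percentage: float,
    expanding_steps: int,
) -> List[Tuple[range, range]]:
    """Return (train_indices, test_indices) for each expanding-window fold.

    Hypothetical helper: percentages are on a 0-100 scale and converted to
    whole datapoints by rounding down.
    """
    if expanding_steps < 1:
        raise ValueError("expanding_steps must be at least 1")
    if not 0 < start_train_percentage <= end_train_percentage:
        raise ValueError("train percentages must satisfy 0 < start <= end")
    if end_train_percentage + test_percentage > 100:
        raise ValueError("final train window plus test window exceeds the data")
    test_size = math.floor(size * test_percentage / 100)
    splits = []
    for i in range(expanding_steps):
        # Move the train percentage linearly from start to end across the folds.
        frac = i / (expanding_steps - 1) if expanding_steps > 1 else 0.0
        pct = start_train_percentage + (end_train_percentage - start_train_percentage) * frac
        train_end = math.floor(size * pct / 100)
        # Training always starts at index 0, so the window grows each fold.
        splits.append((range(0, train_end), range(train_end, train_end + test_size)))
    return splits


# A hypothetical 144-point series with the sample usage settings (50 -> 75, test 25, 3 steps).
for train, test in expanding_window_splits(144, 50, 75, 25, 3):
    print(f"train [0, {train.stop})  test [{test.start}, {test.stop})")
```

With these inputs the three folds train on [0, 72), [0, 90) and [0, 108), and the last test window ends at index 144, so the final iteration covers the whole series as described above.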
- start_train_percentage
A float for the initial percentage of data used for training.
- end_train_percentage
A float for the final percentage of data used for training.
- test_percentage
A float for the percentage of data used for testing.
- expanding_steps
An integer for the number of expanding steps (i.e. number of folds).
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- multi
A boolean flag to toggle multiprocessing (default True).
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- error_funcs
Dictionary mapping each error name to the function that calculates it.
- freq
A string representing the (inferred) frequency of the pandas.DataFrame.
- raw_errors
List storing raw errors (truth - predicted).
- Raises
ValueError – One or more of the train, test, or expanding steps params are invalid, or the time series is empty.
- Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterExpandingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterExpandingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     start_train_percentage=50,
...     end_train_percentage=75,
...     test_percentage=25,
...     expanding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error
- class kats.utils.backtesters.BackTesterFixedWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, window_percentage: int, model_class: Type, **kwargs)
Bases: kats.utils.backtesters.BackTesterParent
Defines functions to execute a fixed window ahead backtest.
A fixed window ahead backtest is similar to a standard (i.e. simple) backtest, except that there is a gap between the train and test data sets. The purpose of this type of backtest is to focus on the long-range forecasting ability of the model.
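A minimal sketch of the fixed window layout, assuming 0-100 percentages rounded down to whole datapoints. `fixed_window_split` is a hypothetical helper that only computes index ranges; it is not Kats's implementation.

```python
import math
from typing import Tuple


def fixed_window_split(
    size: int,
    train_percentage: float,
    test_percentage: float,
    window_percentage: float,
) -> Tuple[range, range]:
    """Return (train_indices, test_indices) separated by a fixed-size gap.

    Hypothetical helper: percentages are on a 0-100 scale and rounded down.
    """
    train_size = math.floor(size * train_percentage / 100)
    window_size = math.floor(size * window_percentage / 100)
    test_size = math.floor(size * test_percentage / 100)
    if train_size <= 0 or test_size <= 0 or window_size < 0:
        raise ValueError("invalid train, test, or window percentage")
    if train_size + window_size + test_size > size:
        raise ValueError("train, gap, and test windows do not fit in the data")
    # window_size points are skipped between the last training point
    # and the first test point.
    test_start = train_size + window_size
    return range(0, train_size), range(test_start, test_start + test_size)


train, test = fixed_window_split(144, train_percentage=50, test_percentage=25, window_percentage=25)
print(train, test)  # range(0, 72) range(108, 144)
```

Because the gap is never shown to the model, a model trained on the first range would need to forecast window_size + test_size steps ahead and be scored only on the last test_size of them, which is what makes this setup a test of long-range accuracy.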
- train_percentage
A float for the percentage of data used for training.
- test_percentage
A float for the percentage of data used for testing.
- window_percentage
The percentage of data used for the fixed window (the gap between the train and test sets).
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- error_funcs
Dictionary mapping each error name to the function that calculates it.
- freq
A string representing the (inferred) frequency of the pandas.DataFrame.
- raw_errors
List storing raw errors (truth - predicted).
- Raises
ValueError – One or more of the train, test, or fixed window params are invalid, or the time series is empty.
- Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterFixedWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterFixedWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     window_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error
- class kats.utils.backtesters.BackTesterParent(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, model_class: Type, multi: bool, offset=0, **kwargs)
Bases: abc.ABC
This class defines the parent functions for various backtesting methods.
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- multi
Boolean flag to use multiprocessing (if set to True).
- offset
Gap between the train and test datasets (default 0).
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- error_funcs
Dictionary mapping each error name to the function that calculates it.
- freq
A string representing the (inferred) frequency of the pandas.DataFrame.
- Raises
ValueError – The time series is empty or an invalid error type was passed.
- calc_error() → Optional[float]
Calculates all errors in self.error_methods and stores them in the errors dict.
- Returns
The error value. None if the error value does not exist.
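calc_error fills the errors dict by applying the functions in error_funcs to each name in error_methods. The sketch below shows one plausible set of definitions for the metric names used in the sample usage, plus up-front validation that raises a ValueError for unknown names. The formulas are common textbook versions and are assumptions: Kats may, for example, express MAPE as a percentage, use a different sMAPE convention, or scale MASE differently. `calc_errors` and `ERROR_FUNCS` are hypothetical names.

```python
from typing import Callable, Dict, List

import numpy as np

ErrorFunc = Callable[[np.ndarray, np.ndarray, np.ndarray], float]


def _mae(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    return float(np.mean(np.abs(truth - pred)))


def _mse(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    return float(np.mean((truth - pred) ** 2))


def _rmse(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    return float(np.sqrt(_mse(truth, pred, train)))


def _mape(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    # Returned as a fraction (0.1 == 10%); undefined if truth contains zeros.
    return float(np.mean(np.abs((truth - pred) / truth)))


def _smape(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    # One common sMAPE convention; other definitions exist.
    return float(np.mean(2 * np.abs(truth - pred) / (np.abs(truth) + np.abs(pred))))


def _mase(truth: np.ndarray, pred: np.ndarray, train: np.ndarray) -> float:
    # Scale the test MAE by the in-sample MAE of a one-step naive forecast
    # (a constant training series would make this scale zero).
    scale = np.mean(np.abs(np.diff(train)))
    return float(_mae(truth, pred, train) / scale)


ERROR_FUNCS: Dict[str, ErrorFunc] = {
    "mae": _mae, "mse": _mse, "rmse": _rmse,
    "mape": _mape, "smape": _smape, "mase": _mase,
}


def calc_errors(error_methods: List[str], truth, pred, train) -> Dict[str, float]:
    """Compute every requested metric, rejecting unknown names up front."""
    unknown = [m for m in error_methods if m not in ERROR_FUNCS]
    if unknown:
        raise ValueError(f"Unsupported error methods: {unknown}")
    truth, pred, train = (np.asarray(a, dtype=float) for a in (truth, pred, train))
    return {m: ERROR_FUNCS[m](truth, pred, train) for m in error_methods}


errors = calc_errors(
    ["mae", "rmse", "mase"],
    truth=[100, 200, 300],
    pred=[110, 190, 330],
    train=[90, 95, 100, 105],
)
print({k: round(v, 3) for k, v in errors.items()})  # {'mae': 16.667, 'rmse': 19.149, 'mase': 3.333}
```

MASE is the only metric here that needs the training data, which is one reason a backtester has to keep the training split alongside each forecast.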
- get_error_value(error_name: str) → float
Gets the requested error value.
- Parameters
error_name – A string naming the error whose value should be returned.
- Returns
A float of the error value.
- Raises
ValueError – The error name is invalid.
- class kats.utils.backtesters.BackTesterRollingWindow(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, sliding_steps: int, model_class: Type, multi=True, **kwargs)
Bases: kats.utils.backtesters.BackTesterParent
Defines functions to execute a rolling window backtest.
A rolling window backtest runs over multiple iterations. In each iteration, the start of the training dataset moves forward by a fixed amount, and the test dataset "slides" forward to accommodate it. Iterations continue until the end of the test set reaches the end of the full data set.
For more information, check out the Kats tutorial notebooks!
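A minimal sketch of rolling-window fold boundaries, assuming 0-100 percentages rounded down and offsets spread evenly so that the last test window ends at the final datapoint. `rolling_window_splits` is hypothetical; Kats's own splitting code may round differently.

```python
import math
from typing import List, Tuple


def rolling_window_splits(
    size: int, train_percentage: float, test_percentage: float, sliding_steps: int
) -> List[Tuple[range, range]]:
    """Return (train_indices, test_indices) for each rolling-window fold."""
    train_size = math.floor(size * train_percentage / 100)
    test_size = math.floor(size * test_percentage / 100)
    # How far the windows can slide before the test set hits the end of the data.
    max_offset = size - train_size - test_size
    if sliding_steps < 1 or train_size <= 0 or test_size <= 0 or max_offset < 0:
        raise ValueError("invalid train, test, or sliding steps params")
    splits = []
    for i in range(sliding_steps):
        # With more than one step, offsets are spread evenly so the
        # last test window ends exactly at `size`.
        offset = max_offset * i // (sliding_steps - 1) if sliding_steps > 1 else 0
        train = range(offset, offset + train_size)
        test = range(offset + train_size, offset + train_size + test_size)
        splits.append((train, test))
    return splits


for train, test in rolling_window_splits(144, 50, 25, 3):
    print(train, test)
```

Unlike the expanding window, the training window here keeps a constant length (72 points in this example); only its start moves, from 0 to 18 to 36.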
- train_percentage
A float for the percentage of data used for training.
- test_percentage
A float for the percentage of data used for testing.
- sliding_steps
An integer for the number of rolling steps (i.e. number of folds).
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- multi
A boolean flag to toggle multiprocessing (default True).
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- error_funcs
Dictionary mapping each error name to the function that calculates it.
- freq
A string representing the (inferred) frequency of the pandas.DataFrame.
- raw_errors
List storing raw errors (truth - predicted).
- Raises
ValueError – One or more of the train, test, or sliding steps params are invalid, or the time series is empty.
- Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterRollingWindow
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterRollingWindow(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     sliding_steps=3,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error
- class kats.utils.backtesters.BackTesterSimple(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, model_class: Type, **kwargs)
Bases: kats.utils.backtesters.BackTesterParent
Defines the functions to execute a simple train/test backtest.
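The core loop of a simple backtest is: split once, fit on the training part, forecast the test horizon, and compare. The sketch below assumes 0-100 percentages rounded down and replaces model_class with a toy last-value forecast so it runs without Kats; `simple_backtest` is a hypothetical helper, not the class's actual code.

```python
import math
from typing import List, Sequence, Tuple


def simple_backtest(
    values: Sequence[float], train_percentage: float, test_percentage: float
) -> Tuple[List[float], float]:
    """Return (raw_errors, mae) for a single train/test split."""
    n = len(values)
    if n == 0:
        raise ValueError("the time series is empty")
    train_size = math.floor(n * train_percentage / 100)
    test_size = math.floor(n * test_percentage / 100)
    if train_size <= 0 or test_size <= 0 or train_size + test_size > n:
        raise ValueError("invalid train and/or test percentages")
    train = values[:train_size]
    test = values[train_size:train_size + test_size]
    # Toy stand-in for model_class: repeat the last training value.
    forecast = [train[-1]] * len(test)
    raw_errors = [t - f for t, f in zip(test, forecast)]  # truth - predicted
    mae = sum(abs(e) for e in raw_errors) / len(raw_errors)
    return raw_errors, mae


raw_errors, mae = simple_backtest(list(range(1, 21)), train_percentage=75, test_percentage=25)
print(raw_errors, mae)  # [1, 2, 3, 4, 5] 3.0
```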
- train_percentage
A float for the percentage of data used for training.
- test_percentage
A float for the percentage of data used for testing.
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- error_funcs
Dictionary mapping each error name to the function that calculates it.
- freq
A string representing the (inferred) frequency of the pandas.DataFrame.
- raw_errors
List storing raw errors (truth - predicted).
- Raises
ValueError – Invalid train and/or test params were passed, or the time series is empty.
- Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import BackTesterSimple
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> backtester = BackTesterSimple(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=75,
...     test_percentage=25,
...     model_class=ARIMAModel,
... )
>>> backtester.run_backtest()
>>> mape = backtester.get_error_value("mape")  # Retrieve MAPE error
- class kats.utils.backtesters.CrossValidation(error_methods: List[str], data: kats.consts.TimeSeriesData, params: kats.consts.Params, train_percentage: float, test_percentage: float, num_folds: int, model_class: Type, rolling_window=False, multi=True)
Bases: object
Defines class to execute cross validation.
Cross validation evaluates a model over multiple folds of training and testing data, which helps optimize the model's performance (e.g. through hyperparameter tuning). For more information on cross validation, see https://en.wikipedia.org/wiki/Cross-validation_(statistics)
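As an illustration of the hyperparameter-tuning use case, the sketch below scores each candidate setting by its error averaged over folds and keeps the best one. It rests on stated assumptions: fold boundaries follow the expanding and rolling window descriptions above (the last test window ends at the final datapoint), a toy moving-mean model with a single `window` hyperparameter stands in for model_class, and per-fold MAEs are averaged without weighting. None of `cv_folds`, `mean_forecast`, `cv_score`, or `tune_window` is part of Kats, and this does not show how Kats itself aggregates fold errors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cv_folds(size: int, train_percentage: float, test_percentage: float,
             num_folds: int, rolling_window: bool = False) -> List[Tuple[int, int, int]]:
    """(train_start, train_end, test_end) for each fold; 0-100 percentages."""
    train_size = math.floor(size * train_percentage / 100)
    test_size = math.floor(size * test_percentage / 100)
    max_offset = size - train_size - test_size
    if num_folds < 1 or train_size <= 0 or test_size <= 0 or max_offset < 0:
        raise ValueError("invalid train, test, or num_folds params")
    folds = []
    for i in range(num_folds):
        offset = max_offset * i // (num_folds - 1) if num_folds > 1 else 0
        # Expanding: training always starts at 0. Rolling: it slides with the offset.
        start = offset if rolling_window else 0
        folds.append((start, train_size + offset, train_size + offset + test_size))
    return folds


def mean_forecast(train: Sequence[float], horizon: int, window: int) -> List[float]:
    # Toy model: predict the mean of the last `window` training points.
    recent = train[-window:]
    return [sum(recent) / len(recent)] * horizon


def cv_score(series: Sequence[float], window: int, rolling_window: bool = False) -> float:
    # Mean absolute error averaged over folds (50% train, 25% test, 3 folds).
    fold_maes = []
    for start, split, end in cv_folds(len(series), 50, 25, 3, rolling_window):
        train, test = series[start:split], series[split:end]
        pred = mean_forecast(train, len(test), window)
        fold_maes.append(sum(abs(t - p) for t, p in zip(test, pred)) / len(test))
    return sum(fold_maes) / len(fold_maes)


def tune_window(series: Sequence[float], candidates: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    # Keep the candidate with the lowest cross-validated error.
    scores = {w: cv_score(series, w) for w in candidates}
    return min(scores, key=scores.get), scores


best, scores = tune_window(list(range(40)), candidates=[1, 5, 10])
print(best, scores)  # 1 {1: 5.5, 5: 7.5, 10: 10.0}
```

On a steadily rising series the shortest window tracks the trend best, so cross validation selects `window=1`; with real data and a real model_class the same loop would compare, for example, different ARIMAParams settings.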
- train_percentage
A float for the percentage of data used for training.
- test_percentage
A float for the percentage of data used for testing.
- num_folds
An integer for the number of folds to use.
- error_methods
List of strings indicating which errors to calculate (see ALLOWED_ERRORS for the exhaustive list).
- data
kats.consts.TimeSeriesData object to perform the backtest on.
- params
Parameters to train the model with.
- model_class
Defines the model type to use for backtesting.
- rolling_window
A boolean flag to use the rolling window method instead of the expanding window method (default False).
- multi
A boolean flag to toggle multiprocessing (default True).
- results
List of tuples (training_data, testing_data, trained_model, forecast_predictions) storing forecast results.
- errors
Dictionary mapping each error type to its value.
- size
An integer for the total number of datapoints.
- raw_errors
List storing raw errors (truth - predicted).
- Raises
ValueError – One or more of the train, test, or num_folds params are invalid, or the time series is empty.
- Sample Usage:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.models.arima import ARIMAModel, ARIMAParams
>>> from kats.utils.backtesters import CrossValidation
>>> df = pd.read_csv("kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df)
>>> params = ARIMAParams(p=1, d=1, q=1)
>>> all_errors = ["mape", "smape", "mae", "mase", "mse", "rmse"]
>>> cv = CrossValidation(
...     error_methods=all_errors,
...     data=ts,
...     params=params,
...     train_percentage=50,
...     test_percentage=25,
...     num_folds=3,
...     model_class=ARIMAModel,
...     rolling_window=True,
... )
>>> cv.run_cv()
>>> mape = cv.get_error_value("mape")  # Retrieve MAPE error
- get_error_value(error_name: str) → float
Gets the requested error value.
- Parameters
error_name – A string naming the error whose value should be returned.
- Returns
A float of the error value.
- Raises
ValueError – The error name is invalid.