kats.models.metalearner.metalearner_modelselect module

A module for meta-learner model selection.

This module contains:
  • MetaLearnModelSelect for meta-learner model selection, which recommends a forecasting model based on a time series or its features;

  • RandomDownSampler for creating balanced dataset via downsampling.

class kats.models.metalearner.metalearner_modelselect.MetaLearnModelSelect(metadata: Optional[List[Dict[str, Any]]] = None, load_model: bool = False)

Bases: object

Meta-learner framework on forecasting model selection. This framework uses classification algorithms to recommend suitable forecasting models. For training, it uses time series features as inputs and the best forecasting models as labels. For prediction, it takes time series or time series features as inputs to predict the most suitable forecasting model. The class provides count_category, preprocess, plot_feature_comparison, get_corr_mtx, plot_corr_heatmap, train, pred, pred_by_feature, pred_fuzzy, load_model and save_model.

metadata

Optional; A list of dictionaries representing the meta-data of time series (e.g., the meta-data generated by GetMetaData object). Each dictionary d must contain at least 3 components: ‘hpt_res’, ‘features’ and ‘best_model’. d[‘hpt_res’] represents the best hyper-parameters for each candidate model and the corresponding errors; d[‘features’] are time series features, and d[‘best_model’] is a string representing the best candidate model of the corresponding time series data. metadata should not be None unless load_model is True. Default is None.
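To make the expected layout concrete, the sketch below builds one meta-data entry by hand and checks that the three required keys are present. The helper check_metadata, the feature names, the model names and the hyper-parameter values are all hypothetical, and the inner layout of 'hpt_res' as a (hyper-parameters, error) pair per model is an assumption drawn from the description above. In practice the entries would come from a GetMetaData object.

```python
from typing import Any, Dict, List

# The three keys that every meta-data dictionary must contain.
REQUIRED_KEYS = ("hpt_res", "features", "best_model")


def check_metadata(metadata: List[Dict[str, Any]]) -> None:
    """Raise ValueError if any entry lacks a required key (hypothetical helper)."""
    for idx, entry in enumerate(metadata):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"metadata[{idx}] is missing keys: {missing}")


# Hand-written entry; all names and numbers are made up for illustration.
example_entry = {
    # Assumed layout: model name -> (best hyper-parameters, error).
    "hpt_res": {
        "arima": ({"p": 1, "d": 1, "q": 1}, 0.21),
        "prophet": ({"seasonality_mode": "additive"}, 0.17),
    },
    "features": {"length": 120, "mean": 0.43, "trend_strength": 0.81},
    "best_model": "prophet",
}

check_metadata([example_entry])  # passes silently
```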

load_model

Optional; A boolean to specify whether or not to load a trained model. Default is False.

Sample Usage:
>>> mlms = MetaLearnModelSelect(data)
>>> mlms.train(n_trees=200, test_size=0.1, eval_method='mean') # Train a meta-learner model selection model.
>>> mlms.pred(TSdata) # Predict/recommend a forecasting model for a new time series.
>>> mlms.pred(TSdata, n_top=3) # Predict/recommend the top 3 most suitable forecasting models.
>>> mlms.save_model("mlms.pkl") # Save the trained model.
>>> mlms2 = MetaLearnModelSelect(metadata=None, load_model=True) # Create a new object and then load a pre-trained model.
>>> mlms2.load_model("mlms.pkl")
count_category() → Dict[str, int]

Count the number of observations of each candidate model in meta-data.

Returns

A dictionary storing the number of observations of each candidate model in meta-data.
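The counting step can be pictured with a few lines of standard-library Python. This is a minimal sketch of the idea rather than the Kats implementation; the function count_best_models and the toy entries are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def count_best_models(metadata: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count how often each candidate model appears as 'best_model'."""
    return dict(Counter(entry["best_model"] for entry in metadata))


# Toy meta-data holding only the key needed for counting.
toy_metadata = [
    {"best_model": "prophet"},
    {"best_model": "arima"},
    {"best_model": "prophet"},
]
counts = count_best_models(toy_metadata)
```

Strongly unequal counts are a sign that downsampling in preprocess may be worth enabling.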

get_corr_mtx() → pandas.core.frame.DataFrame

Calculate correlation matrix of feature matrix.

Returns

A pd.DataFrame representing the correlation matrix of time series features.
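For readers unfamiliar with the quantity, here is a dependency-free sketch of a pairwise correlation matrix over feature columns. It assumes the Pearson coefficient (the default of pandas DataFrame.corr); whether get_corr_mtx uses exactly this variant is an assumption, and the helper names and toy features are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length columns; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")
    return cov / (norm_x * norm_y)


def corr_matrix(features: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Nested dict holding the correlation of every pair of feature columns."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Toy feature columns: one perfectly correlated pair, one perfectly anti-correlated pair.
toy_features = {
    "length": [1.0, 2.0, 3.0, 4.0],
    "twice_length": [2.0, 4.0, 6.0, 8.0],
    "reversed": [4.0, 3.0, 2.0, 1.0],
}
corr = corr_matrix(toy_features)
```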

load_model(file_name: str) → None

Load a pre-trained model.

Parameters

file_name – A string representing the path to load the pre-trained model.

Returns

None.

plot_corr_heatmap(camp: str = 'RdBu_r') → None

Generate heat-map for correlation matrix of feature matrix.

Parameters

camp – Optional; A string representing the color map used to generate the heat-map. Default is “RdBu_r”.

Returns

None

plot_feature_comparison(i: int, j: int) → None

Generate the time series features comparison plot.

Parameters
  • i – An integer representing the index of the first feature vector in the feature matrix to be compared.

  • j – An integer representing the index of the second feature vector in the feature matrix to be compared.

Returns

None

pred(source_ts: kats.consts.TimeSeriesData, ts_scale: bool = True, n_top: int = 1) → Union[str, List[str]]

Predict the best forecasting model for a new time series data.

Parameters
  • source_ts – A kats.consts.TimeSeriesData object representing the new time series data.

  • ts_scale – Optional; A boolean to specify whether or not to rescale time series data (i.e., normalizing it with its maximum value) before calculating features. Default is True.

  • n_top – Optional; An integer for the number of top model names to return. Default is 1.

Returns

A string or a list of strings of the names of forecasting models.
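The rescaling controlled by ts_scale is described as normalizing the series by its maximum value. A minimal sketch of that operation on a plain list follows; the function scale_by_max is hypothetical, and the guard for a zero maximum is an assumption, since the text does not say how that case is handled.

```python
from typing import List


def scale_by_max(values: List[float]) -> List[float]:
    """Divide every observation by the series maximum (hypothetical helper)."""
    peak = max(values)
    if peak == 0:
        # Assumption: leave the series unchanged rather than divide by zero.
        return list(values)
    return [v / peak for v in values]


scaled = scale_by_max([2.0, 4.0, 8.0])
```

After this rescaling, series that differ only by a constant positive factor map to the same values, so features computed from them no longer depend on the original units.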

pred_by_feature(source_x: Union[numpy.ndarray, List[numpy.ndarray], pandas.core.frame.DataFrame], n_top: int = 1) → numpy.ndarray

Predict the best forecasting models given a list/dataframe of time series features.

Parameters
  • source_x – The time series features of the time series that one wants to predict; can be a np.ndarray, a list of np.ndarray or a pd.DataFrame.

  • n_top – Optional; An integer for the number of top model names to return. Default is 1.

Returns

An array of strings representing the forecasting models. If n_top=1, a 1-d np.ndarray will be returned. Otherwise, a 2-d np.ndarray will be returned.
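The difference between the two return shapes can be sketched with plain lists: for each row of features a classifier yields one probability per candidate model, and the n_top most probable names are kept. The probabilities, the model names and the helper top_n_models below are hypothetical; the sketch only illustrates the ranking idea, not the Kats code.

```python
from typing import Dict, List


def top_n_models(probabilities: Dict[str, float], n_top: int = 1) -> List[str]:
    """Return the n_top model names ordered by decreasing class probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:n_top]


# One dictionary of class probabilities per time series (made-up numbers).
rows = [
    {"arima": 0.2, "prophet": 0.5, "holtwinters": 0.3},
    {"arima": 0.6, "prophet": 0.1, "holtwinters": 0.3},
]

# n_top=1: one name per series, i.e. a flat (1-d) result.
top1 = [top_n_models(row)[0] for row in rows]

# n_top=2: two names per series, i.e. a nested (2-d) result.
top2 = [top_n_models(row, n_top=2) for row in rows]
```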

pred_fuzzy(source_ts: kats.consts.TimeSeriesData, ts_scale: bool = True, sig_level: float = 0.2) → Dict[str, Any]

Predict a forecasting model for a new time series data using fuzzy method.

The fuzzy method returns the best candidate model, and additionally returns the second best model if there is no statistically significant difference between the two. The statistical test is based on bootstrap samples drawn from the fitted random forest model. This function is only available for the random forest classifier.

Parameters
  • source_ts – A kats.consts.TimeSeriesData object representing the new time series data.

  • ts_scale – Optional; A boolean to specify whether or not to rescale time series data (i.e., normalizing it with its maximum value) before calculating features. Default is True.

  • sig_level – Optional; A float representing the significance level for the bootstrap test. If the p-value is >= sig_level, the best and the second best models are deemed not to differ. Default is 0.2.

Returns

A dictionary of prediction results, including the forecasting models, their probabilities of being the best forecasting model and the p-values of the bootstrap tests.
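The decision rule can be illustrated with a small bootstrap sketch. It assumes that each tree of the forest contributes one probability for the best and for the second best model, resamples the per-tree differences with replacement, and takes the share of resamples whose mean difference is not positive as the p-value. The exact statistic used by Kats is not given above, so the inputs, the statistic, n_boot and both function names are assumptions made for illustration.

```python
import random
from typing import List


def bootstrap_pvalue(best: List[float], second: List[float],
                     n_boot: int = 2000, seed: int = 0) -> float:
    """Share of bootstrap resamples in which 'best' does not beat 'second'."""
    rng = random.Random(seed)
    diffs = [b - s for b, s in zip(best, second)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_boot):
        # Resample the per-tree differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return not_better / n_boot


def fuzzy_choice(names: List[str], best: List[float], second: List[float],
                 sig_level: float = 0.2) -> List[str]:
    """Keep the runner-up as well when the p-value is >= sig_level."""
    p_value = bootstrap_pvalue(best, second)
    return names[:1] if p_value < sig_level else names[:2]


# Clear winner: every tree favours the first model.
clear = fuzzy_choice(["prophet", "arima"],
                     [0.7, 0.8, 0.75, 0.9, 0.85], [0.2, 0.1, 0.15, 0.05, 0.1])

# Tie: the trees give both models identical probabilities.
tie = fuzzy_choice(["prophet", "arima"],
                   [0.5, 0.4, 0.6, 0.5], [0.5, 0.4, 0.6, 0.5])
```

In this sketch a larger sig_level makes it easier for the best model to be returned alone, because the runner-up is kept only when the p-value reaches sig_level.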

preprocess(downsample: bool = True, scale: bool = False) → None

Pre-process meta-data before training a classifier.

There are 2 options in this function: 1) whether or not to downsample meta-data to ensure each candidate model has the same number of observations; and 2) whether or not to rescale the time series features to zero-mean and unit-variance.

Parameters
  • downsample – Optional; A boolean to specify whether or not to downsample meta-data to ensure each candidate model has the same number of observations. Default is True.

  • scale – Optional; A boolean to specify whether or not to rescale the time series features to zero-mean and unit-variance. Default is False.

Returns

None
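The effect of the scale option on a single feature column can be sketched as follows. The helper standardize is hypothetical and uses the population standard deviation (dividing by n), which is what scikit-learn's StandardScaler does; that Kats relies on the same convention is an assumption.

```python
import math
from typing import List


def standardize(column: List[float]) -> List[float]:
    """Rescale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in column]


z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
z_mean = sum(z) / len(z)
z_var = sum(v * v for v in z) / len(z)
```

Scaling mainly matters for distance-based or margin-based classifiers such as KNN and SVM; tree-based classifiers are largely insensitive to it.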

save_model(file_name: str) → None

Save the trained model.

Parameters

file_name – A string representing the path to save the trained model.

Returns

None.

train(method: str = 'RandomForest', eval_method: str = 'mean', test_size: float = 0.1, n_trees: int = 500, n_neighbors: int = 5) → Dict[str, Any]

Train a meta-learner model selection model (i.e., a classifier).

Parameters
  • method – Optional; A string representing the name of the classification algorithm. Can be ‘RandomForest’, ‘GBDT’, ‘SVM’, ‘KNN’ or ‘NaiveBayes’. Default is ‘RandomForest’.

  • eval_method – Optional; A string representing the aggregation method used for computing errors. Can be ‘mean’ or ‘median’. Default is ‘mean’.

  • test_size – Optional; A float representing the proportion of test set, which should be within (0, 1). Default is 0.1.

  • n_trees – Optional; An integer representing the number of trees in random forest model. Default is 500.

  • n_neighbors – Optional; An integer representing the number of neighbors in KNN model. Default is 5.

Returns

A dictionary summarizing the performance of the trained classifier on both the training and validation sets.
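The choice between the two eval_method values matters when a few series have very large errors. The sketch below aggregates a list of per-series forecasting errors either way; the helper aggregate_errors and the error values are hypothetical, and which errors Kats aggregates internally is not specified above.

```python
from statistics import mean, median
from typing import List


def aggregate_errors(errors: List[float], eval_method: str = "mean") -> float:
    """Summarize per-series errors with the requested aggregation."""
    if eval_method == "mean":
        return mean(errors)
    if eval_method == "median":
        return median(errors)
    raise ValueError("eval_method must be 'mean' or 'median'")


# Made-up errors: three ordinary series and one badly forecast outlier.
errors = [0.1, 0.2, 0.3, 2.0]
mean_error = aggregate_errors(errors, "mean")
median_error = aggregate_errors(errors, "median")
```

Here the mean is pulled up by the single outlier while the median stays close to the typical error, which is the usual reason to prefer 'median' on heavy-tailed error distributions.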

class kats.models.metalearner.metalearner_modelselect.RandomDownSampler(hpt: pandas.core.series.Series, dataX: pandas.core.frame.DataFrame, dataY: pandas.core.series.Series)

Bases: object

A helper class for MetaLearnModelSelect that performs random downsampling.

RandomDownSampler provides methods for creating a balanced dataset via downsampling. It contains fit_resample.

hpt

A pandas.Series object storing the best hyper-parameters and the corresponding errors for each model.

dataX

A pandas.DataFrame object representing the time series features matrix.

dataY

A pandas.Series object representing the best models for the corresponding time series.

fit_resample() → Tuple[pandas.core.series.Series, pandas.core.frame.DataFrame, pandas.core.series.Series]

Create balanced dataset via random downsampling.

Returns

A tuple containing the pandas.Series object of the best hyper-parameters and the corresponding errors, the pandas.DataFrame object of the downsampled time series features, and the pandas.Series object of the downsampled best models for the corresponding time series.
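Random downsampling itself is simple to sketch: group the observations by label, find the size of the smallest group, and draw that many observations at random from every group. The version below works on a plain list of labels and returns the indices to keep, which could then be used to subset the hyper-parameter, feature and label containers alike. The function name, the seed argument and the toy labels are hypothetical, and this is not the Kats implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def downsample_indices(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Return row indices that give every label the same number of rows."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # Every class is cut down to the size of the rarest class.
    smallest = min(len(indices) for indices in by_label.values())
    kept: List[int] = []
    for indices in by_label.values():
        kept.extend(rng.sample(indices, smallest))
    return sorted(kept)


# Imbalanced toy labels: 5 x arima, 2 x prophet, 3 x holtwinters.
labels = ["arima"] * 5 + ["prophet"] * 2 + ["holtwinters"] * 3
kept = downsample_indices(labels)
balanced = [labels[i] for i in kept]
```

The price of a balanced dataset is that observations from the larger classes are discarded, so the training set shrinks to the number of classes times the size of the rarest class.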