kats.tsfeatures.tsfeatures module

class kats.tsfeatures.tsfeatures.TsFeatures(window_size: int = 20, spectral_freq: int = 1, stl_period: int = 7, nbins: int = 10, lag_size: int = 30, acfpacf_lag: int = 6, decomp: str = 'additive', iqr_mult: float = 3.0, threshold: float = 0.8, window: int = 5, n_fast: int = 12, n_slow: int = 21, selected_features: Optional[List[str]] = None, **kwargs)[source]

Bases: object

Process time series data into features for machine learning models, with the option to opt in or out of individual features and feature groups in the calculations.

window_size

int; Length of the sliding window for getting level shift features, lumpiness, and stability of time series.

spectral_freq

int; Frequency parameter in getting periodogram through scipy for calculating Shannon entropy.

stl_period

int; Period parameter for performing seasonality trend decomposition using LOESS with statsmodels.

nbins

int; Number of bins to equally segment time series array for getting flat spot feature.

lag_size

int; Maximum number of lag values for calculating Hurst Exponent.

acfpacf_lag

int; Largest lag number for returning ACF/PACF features via statsmodels.

decomp

str; Additive or Multiplicative mode for performing outlier detection using Kats.Detectors.outlier.OutlierDetector.

iqr_mult

float; IQR range for determining outliers through Kats.Detectors.outlier.OutlierDetector.

threshold

float; threshold for trend intensity; higher threshold gives trend with high intensity (0.8 by default). If we only want to use the p-value to determine changepoints, set threshold = 0.

window

int; length of window for all nowcasting features.

n_fast

int; length of “fast” or short period exponential moving average in the MACD algorithm in the nowcasting features.

n_slow

int; length of “slow” or long period exponential moving average in the MACD algorithm in the nowcasting features.

selected_features

None or List[str]; list of feature/feature group name(s) selected to be calculated. Where possible, only the selected features are calculated, which improves efficiency. Because some features are bundled in the calculations, other features from the same bundle may be computed as well, but only the selected features are output.

feature_group_mapping

The dictionary with the mapping from individual features to their bundled feature groups.

final_filter

A dictionary with boolean values used to filter out features that were not selected but were calculated because of the underlying bundles.

stl_features

Switch for calculating/outputting stl features.

level_shift_features

Switch for calculating/outputting level shift features.

acfpacf_features

Switch for calculating/outputting ACF/PACF features.

special_ac

Switch for calculating/outputting special_ac features.

holt_params

Switch for calculating/outputting holt parameter features.

hw_params

Switch for calculating/outputting holt-winters parameter features.

statistics

Switch for calculating/outputting raw statistics features.

cusum_detector

Switch for calculating/outputting features using cusum detector in Kats.

robust_stat_detector

Switch for calculating/outputting features using robust stat detector in Kats.

bocp_detector

Switch for calculating/outputting features using bocp detector in Kats.

outlier_detector

Switch for calculating/outputting features using outlier detector in Kats.

trend_detector

Switch for calculating/outputting features using trend detector in Kats.

nowcasting

Switch for calculating/outputting features using nowcasting in Kats.

seasonalities

Switch for calculating/outputting features using seasonality detectors in Kats.

default

The default status of the switch for opt-in/out feature calculations.
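The interplay between selected_features, feature_group_mapping and final_filter described above can be sketched as follows. This is a minimal illustration in plain Python, not the Kats implementation: the mapping below is hypothetical (the feature and group names are borrowed from this page), and plan_selection is a helper invented for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mapping from individual features to their bundled group.
# The real mapping lives in TsFeatures.feature_group_mapping and may differ.
FEATURE_TO_GROUP: Dict[str, str] = {
    "level_shift_idx": "level_shift_features",
    "level_shift_size": "level_shift_features",
    "firstmin_ac": "special_ac",
    "firstzero_ac": "special_ac",
}


def plan_selection(
    selected: Optional[List[str]],
) -> Tuple[Set[str], Dict[str, bool]]:
    """Return (groups that must run, filter of features to output)."""
    groups = set(FEATURE_TO_GROUP.values())
    if selected is None:
        # Nothing selected explicitly: compute and keep everything.
        return groups, {name: True for name in FEATURE_TO_GROUP}

    groups_to_run: Set[str] = set()
    final_filter = {name: False for name in FEATURE_TO_GROUP}
    for name in selected:
        if name in groups:
            # A whole group was requested: keep all of its features.
            groups_to_run.add(name)
            for feature, group in FEATURE_TO_GROUP.items():
                if group == name:
                    final_filter[feature] = True
        elif name in FEATURE_TO_GROUP:
            # A single feature was requested: its bundle still has to run.
            groups_to_run.add(FEATURE_TO_GROUP[name])
            final_filter[name] = True
        else:
            raise ValueError(f"unknown feature or group: {name}")
    return groups_to_run, final_filter


groups_to_run, final_filter = plan_selection(["level_shift_idx"])
```

Here the level shift bundle has to run even though only level_shift_idx was requested, and the filter then drops level_shift_size from the output.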

static get_acf_features(extra_args: Dict[str, bool], default_status: bool, y_acf_list: List[float], diff1y_acf_list: List[float], diff2y_acf_list: List[float])[source]

Aggregating extracted ACF features from get_acfpacf_features function.

Parameters
  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

  • y_acf_list – List of ACF values acquired from original time series.

  • diff1y_acf_list – List of ACF values acquired from differenced time series.

  • diff2y_acf_list – List of ACF values acquired from twice differenced time series.

Returns

Auto-correlation function (ACF) features.

static get_acfpacf_features(x: numpy.ndarray, acfpacf_lag: int = 6, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Calculate ACF and PACF based features, including seasonal ACF and PACF based features.

Reference: https://stackoverflow.com/questions/36038927/whats-the-difference-between-pandas-acf-and-statsmodel-acf

R code: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html

Paper: Meta-learning how to forecast time series

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • acfpacf_lag – int; Largest lag number for returning ACF/PACF features via statsmodels. Default value is 6.

  • period – int; Seasonal period. Default value is 7.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Aggregated ACF, PACF features.

static get_binarize_mean(x: numpy.ndarray)[source]

Converts time series array into a binarized version. Time-series values above its mean are given 1, and those below the mean are 0. Return the average value of the binarized vector. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

The average value of the binarized time series array.
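The binarization described above can be sketched in plain Python. This is an illustration of the definition rather than the Kats code; the function name binarize_mean is made up for the example, and treating values exactly equal to the mean as 0 is an assumption.

```python
from statistics import fmean


def binarize_mean(x):
    """Average of the 0/1 vector marking values above the series mean."""
    mean = fmean(x)
    # Values strictly above the mean become 1; everything else becomes 0
    # (how ties with the mean are handled is an assumption of this sketch).
    binarized = [1 if value > mean else 0 for value in x]
    return fmean(binarized)


share_above_mean = binarize_mean([1.0, 2.0, 3.0, 10.0])
```

The result is the share of observations above the mean, so it always lies between 0 and 1.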

static get_bocp_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats BOCP Detector on the Time Series, extract features from the outputs of the detection.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

tuple containing:

(1) Number of changepoints detected by BOCP Detector, (2) Max value of the confidence of the changepoints detected, (3) Mean value of the confidence of the changepoints detected.

Return type

(tuple)

static get_crossing_points(x: numpy.ndarray)[source]

Calculating crossing points: the number of times a time series crosses the median line. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

The number of times a time series crosses the median line.
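A minimal sketch of the crossing-point count, assuming a crossing is any pair of consecutive observations that lie on different sides of the median. The function name and the treatment of values equal to the median (counted as not above) are assumptions of this example, not details taken from Kats.

```python
from statistics import median


def crossing_points(x):
    """Count how often consecutive values switch sides of the median."""
    med = median(x)
    above = [value > med for value in x]
    # A crossing happens whenever the "above median" flag changes.
    return sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)


zigzag_crossings = crossing_points([0, 4, 1, 5, 2])
monotone_crossings = crossing_points([1, 2, 3, 4])
```

A zigzag series crosses the median at almost every step, while a monotone series crosses it once.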

static get_cusum_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats CUSUM Detector on the Time Series, extract features from the outputs of the detection.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Outputs of the CUSUM Detector, which include (1) Number of changepoints, either 1 or 0, (2) Confidence of the changepoint detected, 0 if no changepoint, (3) index, or position of the changepoint detected within the time series, (4) delta of the mean levels before and after the changepoint, (5) log-likelihood ratio of changepoint, (6) Boolean - whether regression is detected by CUSUM, (7) Boolean - whether changepoint is stable, (8) p-value of changepoint.

static get_flat_spots(x: numpy.ndarray, nbins: int = 10)[source]

Getting flat spots: Maximum run-lengths across equally-sized segments of time series

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • nbins – int; Number of bins to segment time series data into.

Returns

Maximum run-lengths across segmented time series array.
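One way to read the flat spot definition is sketched below: the value range is cut into nbins equal-width intervals, each observation is labelled with its interval, and the longest run of identical labels is returned. This is an assumption-laden illustration in plain Python (the function name and the exact bin edge handling are choices made here), not the Kats implementation.

```python
from itertools import groupby


def flat_spots(x, nbins=10):
    """Longest run of consecutive values falling into the same value bin."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return len(x)  # a constant series is one single run
    width = (hi - lo) / nbins
    # Bin label per observation; the maximum is clamped into the last bin.
    labels = [min(int((value - lo) / width), nbins - 1) for value in x]
    return max(len(list(run)) for _, run in groupby(labels))


longest_run = flat_spots([0, 0.1, 0.2, 5, 9.8, 9.9, 10, 10, 3], nbins=10)
```

In the example the last four large values share the top bin, so they form the longest flat spot.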

static get_het_arch(x: numpy.ndarray)[source]

Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH). Reference: https://www.statsmodels.org/dev/generated/statsmodels.stats.diagnostic.het_arch.html

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

Lagrange multiplier test statistic

static get_histogram_mode(x: numpy.ndarray, nbins: int = 10)[source]

Measures the mode of the data vector using histograms with a given number of bins. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • nbins – int; Number of bins to get the histograms. Default value is 10.

Returns

Mode of the data vector using histograms.
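A histogram-based mode can be sketched as below. The sketch assumes equal-width bins over the value range and reports the left edge of the fullest bin; which point of the bin is reported, and the function name, are assumptions of this example rather than documented behavior.

```python
def histogram_mode(x, nbins=10):
    """Left edge of the most populated equal-width histogram bin."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return lo  # all values identical: the mode is that value
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for value in x:
        # The maximum value is clamped into the last bin.
        counts[min(int((value - lo) / width), nbins - 1)] += 1
    fullest = counts.index(max(counts))  # first bin wins on ties
    return lo + fullest * width


mode_estimate = histogram_mode([0, 1, 1.1, 1.2, 1.3, 10], nbins=10)
```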

static get_holt_params(x: numpy.ndarray, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Estimates the smoothing parameter for the level (alpha) and the smoothing parameter for the trend (beta) of Holt’s linear trend method. ‘alpha’: Level parameter of the Holt model. ‘beta’: Trend parameter of the Holt model.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Level and trend parameter of a fitted Holt model.

static get_hurst(x: numpy.ndarray, lag_size: int = 30)[source]

Getting the Hurst Exponent. Wiki: https://en.wikipedia.org/wiki/Hurst_exponent

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • lag_size – int; Size for getting lagged time series data. Default value is 30.

Returns

The Hurst Exponent of the time series array
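One common way to estimate the Hurst exponent from lagged differences is sketched below: for each lag, take the standard deviation of x[t + lag] - x[t], then fit a line to the log-log relationship and use its slope. Whether Kats uses exactly this estimator is an assumption; the helper names are invented for the example, and lags start at 2 by choice.

```python
import math
import random


def _std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def _slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def hurst_exponent(x, lag_size=30):
    """Slope of log(std of lagged differences) against log(lag)."""
    lags = list(range(2, lag_size))
    tau = [
        _std([x[i + lag] - x[i] for i in range(len(x) - lag)])
        for lag in lags
    ]
    return _slope([math.log(lag) for lag in lags], [math.log(t) for t in tau])


# A Gaussian random walk should give an estimate in the vicinity of 0.5.
random.seed(0)
walk = [0.0]
for _ in range(1999):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
h = hurst_exponent(walk)
```

Values well above 0.5 suggest a trending series and values well below 0.5 suggest mean reversion. The sketch fails on series whose lagged differences are constant, because the logarithm of a zero standard deviation is undefined.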

static get_hw_params(x: numpy.ndarray, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Estimates the smoothing parameters of the Holt-Winters (HW) method: level (alpha), trend (beta), and additive seasonal component (gamma).

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • period – int; Seasonal period for fitting exponential smoothing model. Default value is 7.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Level, trend and seasonal parameters of a fitted Holt-Winters model.

static get_length(x: numpy.ndarray)[source]

Getting the length of time series array.

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

Length of the time series array.

static get_level_shift(x: numpy.ndarray, window_size: int = 20, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True) → pandas.core.frame.DataFrame[source]

Calculating level shift features.

  • level_shift_idx: Location of the maximum mean value difference, between two consecutive sliding windows

  • level_shift_size: Size of the maximum mean value difference, between two consecutive sliding windows

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • window_size – int; Length of the sliding window. Default value is 20.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Level shift features including level_shift_idx, and level_shift_size
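The two level shift features can be illustrated with a small plain-Python sketch. It assumes that "two consecutive sliding windows" means a window of length window_size and the non-overlapping window that immediately follows it, and that the difference is taken in absolute value; both points, along with the function name, are assumptions of this example.

```python
def level_shift(x, window_size=20):
    """Return (index, size) of the largest jump in consecutive window means."""
    n = len(x)
    # Mean of every full sliding window x[i : i + window_size].
    means = [
        sum(x[i:i + window_size]) / window_size
        for i in range(n - window_size + 1)
    ]
    # Compare each window with the adjacent window that starts where it ends.
    diffs = [
        abs(means[i] - means[i + window_size])
        for i in range(len(means) - window_size)
    ]
    if not diffs:
        return None, None  # series too short for two consecutive windows
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return idx, diffs[idx]


# A step from 0 to 10 at position 30.
step_series = [0.0] * 30 + [10.0] * 30
shift_idx, shift_size = level_shift(step_series, window_size=5)
```

For the step series the largest difference is found when the first window ends right before the step, so the reported index is the start of that window.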

static get_linearity(x: numpy.ndarray)[source]

Getting linearity feature: R square from a fitted linear regression.

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

R square from a fitted linear regression.
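The linearity feature can be sketched as the squared correlation between the series and its time index, which equals the R square of a simple linear regression of the values on 0, 1, ..., n - 1. Using the integer index as the regressor is an assumption of this example, as is the function name.

```python
def linearity(x):
    """R square of a simple linear regression of x on its time index."""
    n = len(x)
    mean_t = (n - 1) / 2
    mean_x = sum(x) / n
    sxy = sum((t - mean_t) * (v - mean_x) for t, v in enumerate(x))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    syy = sum((v - mean_x) ** 2 for v in x)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: a single point or a constant series
    return sxy * sxy / (sxx * syy)


r2_line = linearity([2 * t + 1 for t in range(10)])
r2_zigzag = linearity([0, 1, 0, 1])
```

A perfectly linear series scores 1, while an oscillating series scores close to 0.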

static get_lumpiness(x: numpy.ndarray, window_size: int = 20)[source]

Calculating the lumpiness of time series. Lumpiness is defined as the variance of the chunk-wise variances.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • window_size – int; Window size to split the data into chunks for getting variances. Default value is 20.

Returns

Lumpiness of the time series array.
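Lumpiness as defined above can be sketched in a few lines. The sketch assumes non-overlapping chunks of window_size observations (the last chunk may be shorter) and population variances; the chunking rule and the function name are assumptions, and Kats may split the series differently.

```python
from statistics import pvariance


def lumpiness(x, window_size=20):
    """Variance of the variances of consecutive, non-overlapping chunks."""
    chunks = [x[i:i + window_size] for i in range(0, len(x), window_size)]
    chunk_variances = [pvariance(chunk) for chunk in chunks]
    return pvariance(chunk_variances)


# The first chunk is flat (variance 0), the second alternates (variance 1).
lumpy = lumpiness([0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0], window_size=4)
```

A series whose volatility changes over time has chunk variances that differ a lot, which gives a high lumpiness.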

static get_mean(x: numpy.ndarray)[source]

Getting the average value of time series array.

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

Average of the time series array.

static get_nowcasting(x: numpy.ndarray, window: int = 5, n_fast: int = 12, n_slow: int = 21, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats Nowcasting transformer on the Time Series, extract aggregated features from the outputs of the transformation.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • window – int; Length of window size for all Nowcasting features. Default value is 5.

  • n_fast – int; length of “fast” or short period exponential moving average in the MACD algorithm in the nowcasting features. Default value is 12.

  • n_slow – int; length of “slow” or long period exponential moving average in the MACD algorithm in the nowcasting features. Default value is 21.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Mean values of the Kats Nowcasting algorithm time series outputs using the parameters (window, n_fast, n_slow) indicated above. These outputs include: (1) Mean of Rate of Change (ROC) time series, (2) Mean of Moving Average (MA) time series, (3) Mean of Momentum (MOM) time series, (4) Mean of LAG time series, (5) Means of MACD, MACDsign, and MACDdiff from Kats Nowcasting.
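The role of n_fast and n_slow in the MACD output can be illustrated with a simplified sketch: MACD is the difference between a fast and a slow exponential moving average. The recursion below starts each average at the first observation and uses alpha = 2 / (span + 1); these details and the function names are assumptions of this example, so the numbers will generally differ from what Kats Nowcasting returns.

```python
from statistics import fmean


def ema(x, span):
    """Simple recursive exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [float(x[0])]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def macd(x, n_fast=12, n_slow=21):
    """Fast EMA minus slow EMA, one value per observation."""
    fast = ema(x, n_fast)
    slow = ema(x, n_slow)
    return [f - s for f, s in zip(fast, slow)]


# On a steadily rising series the fast average tracks the data more
# closely than the slow one, so MACD is positive.
rising = [float(t) for t in range(100)]
macd_rising = macd(rising)
mean_macd = fmean(macd_rising)
```

The feature reported by get_nowcasting is a mean over such a series, which is what mean_macd mimics here.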

static get_outlier_detector(ts: kats.consts.TimeSeriesData, decomp: str = 'additive', iqr_mult: float = 3.0, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats Outlier Detector on the Time Series, extract features from the outputs of the detection.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • decomp – str; Additive or Multiplicative mode for performing outlier detection using OutlierDetector. Default value is ‘additive’.

  • iqr_mult – float; IQR range for determining outliers through OutlierDetector. Default value is 3.0.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Number of outliers by the Outlier Detector.

static get_pacf_features(extra_args: Dict[str, bool], default_status: bool, y_pacf_list: List[float], diff1y_pacf_list: List[float], diff2y_pacf_list: List[float])[source]

Aggregating extracted PACF features from get_acfpacf_features function.

Parameters
  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

  • y_pacf_list – List of PACF values acquired from original time series.

  • diff1y_pacf_list – List of PACF values acquired from differenced time series.

  • diff2y_pacf_list – List of PACF values acquired from twice differenced time series.

Returns

Partial auto-correlation function (PACF) features.

static get_robust_stat_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats Robust Stat Detector on the Time Series, extract features from the outputs of the detection.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

(1) Number changepoints detected by the Robust Stat Detector, and (2) Mean of the Metric values from the Robust Stat Detector.

static get_seasonalities(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats seasonality detectors to get the estimated seasonal period, then extract trend, seasonality and residual magnitudes.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

(1) The detected seasonality period, (2) the slope acquired via fitting a simple linear regression model on the trend component, as the trend magnitude, (3) the difference between the 95th percentile and the 5th percentile of the seasonal component, as the seasonality magnitude, (4) the standard deviation of the residual component.

static get_special_ac(x: numpy.ndarray, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Getting special_ac features. firstmin_ac: the time of the first minimum in the autocorrelation function. firstzero_ac: the time of the first zero crossing of the autocorrelation function.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Special autocorrelation features described above.

static get_spectral_entropy(x: numpy.ndarray, freq: int = 1)[source]

Getting normalized Shannon entropy of power spectral density. PSD is calculated using scipy’s periodogram.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • freq – int; Frequency for calculating the PSD via scipy periodogram. Default value is 1.

Returns

Normalized Shannon entropy.
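The idea behind the normalized spectral entropy can be sketched without scipy: compute a power spectrum, normalize it to sum to 1, and divide its Shannon entropy by the maximum possible entropy. The sketch below uses a naive one-sided DFT on the mean-removed series instead of scipy's periodogram, so its scaling differs and its values are only indicative. The frequency parameter is omitted because it does not change the normalized spectrum in this simplified version.

```python
import cmath
import math


def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    n = len(x)
    mean = sum(x) / n
    centered = [value - mean for value in x]
    # Naive one-sided DFT power spectrum (O(n^2), fine for short series).
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0  # constant series: no spectral content
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(probs))


# A pure sinusoid concentrates power in one frequency: entropy near 0.
sine = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
entropy_sine = spectral_entropy(sine)
# A single impulse spreads power evenly over frequencies: entropy near 1.
impulse = [1.0] + [0.0] * 63
entropy_impulse = spectral_entropy(impulse)
```

Low entropy indicates a series dominated by a few frequencies, which tends to be easier to forecast than a series with a flat spectrum.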

static get_stability(x: numpy.ndarray, window_size: int = 20)[source]

Calculating the stability of time series. Stability is defined as the variance of chunk-wise means.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • window_size – int; Window size to split the data into chunks for getting means. Default value is 20.

Returns

Stability of the time series array.
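Stability as defined above can be sketched as follows, assuming non-overlapping chunks of window_size observations (the last chunk may be shorter) and a population variance. The chunking rule and the function name are assumptions of this example.

```python
from statistics import fmean, pvariance


def stability(x, window_size=20):
    """Variance of the means of consecutive, non-overlapping chunks."""
    chunk_means = [
        fmean(x[i:i + window_size]) for i in range(0, len(x), window_size)
    ]
    return pvariance(chunk_means)


# Two flat chunks at different levels: chunk means 0 and 4.
stability_value = stability([0.0] * 4 + [4.0] * 4, window_size=4)
```

A series whose level drifts between chunks gets a high value, while a series that fluctuates around a fixed level gets a value near 0.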

static get_std1st_der(x: numpy.ndarray)[source]

Calculating std1st_der: the standard deviation of the first derivative of the time series. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

The standard deviation of the first derivative of the time series.
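A minimal sketch of this feature, assuming the first derivative is approximated numerically with central differences in the interior and one-sided differences at the two ends, and that a population standard deviation is used. Both choices and the function name are assumptions of this example.

```python
from statistics import pstdev


def std_first_derivative(x):
    """Standard deviation of a finite-difference first derivative."""
    n = len(x)
    if n < 2:
        return 0.0  # no derivative can be formed from a single point
    gradient = (
        [x[1] - x[0]]                                     # forward difference
        + [(x[i + 1] - x[i - 1]) / 2 for i in range(1, n - 1)]  # central
        + [x[-1] - x[-2]]                                 # backward difference
    )
    return pstdev(gradient)


std_quadratic = std_first_derivative([0.0, 1.0, 4.0, 9.0])
std_linear = std_first_derivative([0.0, 2.0, 4.0, 6.0])
```

A straight line has a constant derivative and therefore a standard deviation of 0, while a curved series gives a positive value.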

static get_stl_features(x: numpy.ndarray, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Calculate STL based features for a time series, including strength of trend, seasonality, spikiness, peak/trough.

Parameters
  • x – The univariate time series array in the form of 1d numpy array.

  • period – int; Period parameter for performing seasonality trend decomposition using LOESS with statsmodels. Default value is 7.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

Seasonality features including strength of trend, seasonality, spikiness, peak/trough.

static get_trend_detector(ts: kats.consts.TimeSeriesData, threshold: float = 0.8, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]

Run the Kats Trend Detector on the Time Series, extract features from the outputs of the detection.

Parameters
  • ts – The univariate time series array in the form of Kats TimeSeriesData object.

  • threshold – float; threshold for trend intensity; higher threshold gives trend with high intensity (0.8 by default). If we only want to use the p-value to determine changepoints, set threshold = 0.

  • extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.

  • default_status – Default status of the switch for calculating the features or not. Default value is True.

Returns

(1) Number of trends detected by the Kats Trend Detector, (2) Number of increasing trends, (3) Mean of the absolute values of Taus of the trends detected.

static get_unitroot_kpss(x: numpy.ndarray)[source]

Getting a test statistic based on the KPSS test, which tests the null hypothesis that an observable time series is stationary around a deterministic trend. Returns a vector comprising the statistic for the KPSS unit root test with linear trend and lag one. Wiki: https://en.wikipedia.org/wiki/KPSS_test

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

Test statistics acquired using KPSS test.

static get_var(x: numpy.ndarray)[source]

Getting the variance of time series array.

Parameters

x – The univariate time series array in the form of 1d numpy array.

Returns

Variance of the time series array.

transform(x: kats.consts.TimeSeriesData)[source]

The overall high-level function for transforming time series into a number of features

Parameters

x – Kats TimeSeriesData object.

Returns

A map (dictionary) of feature name and value pairs. For univariate input, returns a map of {feature: value}. For multivariate input, returns a list of maps.