kats.tsfeatures.tsfeatures module¶
- class kats.tsfeatures.tsfeatures.TsFeatures(window_size: int = 20, spectral_freq: int = 1, stl_period: int = 7, nbins: int = 10, lag_size: int = 30, acfpacf_lag: int = 6, decomp: str = 'additive', iqr_mult: float = 3.0, threshold: float = 0.8, window: int = 5, n_fast: int = 12, n_slow: int = 21, selected_features: Optional[List[str]] = None, **kwargs)[source]¶
Bases:
object
Process time series data into features for machine learning models, with the option to opt individual features and feature groups in or out of the calculations.
- window_size¶
int; Length of the sliding window for getting level shift features, lumpiness, and stability of time series.
- spectral_freq¶
int; Frequency parameter in getting periodogram through scipy for calculating Shannon entropy.
- stl_period¶
int; Period parameter for performing seasonality trend decomposition using LOESS with statsmodels.
- nbins¶
int; Number of bins to equally segment time series array for getting flat spot feature.
- lag_size¶
int; Maximum number of lag values for calculating Hurst Exponent.
- acfpacf_lag¶
int; Largest lag number for returning ACF/PACF features via statsmodels.
- decomp¶
str; Additive or multiplicative mode for performing outlier detection using kats.detectors.outlier.OutlierDetector.
- iqr_mult¶
float; Multiplier applied to the interquartile range (IQR) when determining outliers through kats.detectors.outlier.OutlierDetector.
- threshold¶
float; threshold for trend intensity; higher threshold gives trend with high intensity (0.8 by default). If we only want to use the p-value to determine changepoints, set threshold = 0.
- window¶
int; length of window for all nowcasting features.
- n_fast¶
int; length of “fast” or short period exponential moving average in the MACD algorithm in the nowcasting features.
- n_slow¶
int; length of “slow” or long period exponential moving average in the MACD algorithm in the nowcasting features.
- selected_features¶
None or List[str]; list of feature/feature group name(s) selected to be calculated. Only the selected features are output. Because some features are bundled in the calculations, other features in the same bundle may still be computed internally; bundles with no selected feature are skipped, which improves efficiency.
- feature_group_mapping¶
The dictionary with the mapping from individual features to their bundled feature groups.
- final_filter¶
A dictionary with boolean values, used to filter out features that were not selected but were still calculated because of the underlying bundles.
- stl_features¶
Switch for calculating/outputting stl features.
- level_shift_features¶
Switch for calculating/outputting level shift features.
- acfpacf_features¶
Switch for calculating/outputting ACF/PACF features.
- special_ac¶
Switch for calculating/outputting special_ac features.
- holt_params¶
Switch for calculating/outputting holt parameter features.
- hw_params¶
Switch for calculating/outputting holt-winters parameter features.
- statistics¶
Switch for calculating/outputting raw statistics features.
- cusum_detector¶
Switch for calculating/outputting features using cusum detector in Kats.
- robust_stat_detector¶
Switch for calculating/outputting features using robust stat detector in Kats.
- bocp_detector¶
Switch for calculating/outputting features using bocp detector in Kats.
- outlier_detector¶
Switch for calculating/outputting features using outlier detector in Kats.
- trend_detector¶
Switch for calculating/outputting features using trend detector in Kats.
- nowcasting¶
Switch for calculating/outputting nowcasting features in Kats.
- seasonalities¶
Switch for calculating/outputting seasonality features using the seasonality detectors in Kats.
- default¶
The default status of the switch for opt-in/out feature calculations.
- static get_acf_features(extra_args: Dict[str, bool], default_status: bool, y_acf_list: List[float], diff1y_acf_list: List[float], diff2y_acf_list: List[float])[source]¶
Aggregating extracted ACF features from get_acfpacf_features function.
- Parameters
extra_args – A dictionary containing information for disabling calculation of a certain feature.
default_status – Default status of the switch for whether to calculate the features.
y_acf_list – List of ACF values acquired from original time series.
diff1y_acf_list – List of ACF values acquired from differenced time series.
diff2y_acf_list – List of ACF values acquired from twice differenced time series.
- Returns
Auto-correlation function (ACF) features.
- static get_acfpacf_features(x: numpy.ndarray, acfpacf_lag: int = 6, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Calculate ACF- and PACF-based features, including seasonal ACF- and PACF-based features.
Reference: https://stackoverflow.com/questions/36038927/whats-the-difference-between-pandas-acf-and-statsmodel-acf
R code: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html
Paper: Meta-learning how to forecast time series
- Parameters
x – The univariate time series array in the form of 1d numpy array.
acfpacf_lag – int; Largest lag number for returning ACF/PACF features via statsmodels. Default value is 6.
period – int; Seasonal period. Default value is 7.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Aggregated ACF, PACF features.
- static get_binarize_mean(x: numpy.ndarray)[source]¶
Converts time series array into a binarized version. Time-series values above its mean are given 1, and those below the mean are 0. Return the average value of the binarized vector. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
The average value of the binarized version of the time series array.
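The description above can be illustrated with a minimal, dependency-free sketch. It uses plain Python lists instead of numpy arrays, and the function name `binarize_mean` is hypothetical; it is not the Kats implementation.

```python
from statistics import fmean


def binarize_mean(x):
    """Fraction of observations strictly above the series mean (illustrative sketch)."""
    mu = fmean(x)
    # Values above the mean map to 1, all others map to 0.
    binary = [1 if v > mu else 0 for v in x]
    return fmean(binary)


# The mean of this series is 4.0, and only 10.0 lies above it.
print(binarize_mean([1.0, 2.0, 3.0, 10.0]))  # 0.25
```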
- static get_bocp_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats BOCP Detector on the Time Series, extract features from the outputs of the detection.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
tuple containing:
(1) Number of changepoints detected by BOCP Detector, (2) Max value of the confidence of the changepoints detected, (3) Mean value of the confidence of the changepoints detected.
- Return type
(tuple)
- static get_crossing_points(x: numpy.ndarray)[source]¶
Calculating crossing points: the number of times a time series crosses the median line. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
The number of times a time series crosses the median line.
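As a rough sketch of the idea, a crossing can be counted whenever two consecutive observations fall on different sides of the median. The function name `crossing_points` and the choice to treat values equal to the median as "not above" are assumptions of this example, not details taken from Kats.

```python
from statistics import median


def crossing_points(x):
    """Count how often consecutive values switch sides of the median (illustrative sketch)."""
    med = median(x)
    # True where the observation lies strictly above the median.
    above = [v > med for v in x]
    # A crossing happens whenever the side changes between neighbours.
    return sum(a != b for a, b in zip(above, above[1:]))


print(crossing_points([1, 2, 3, 4, 5]))  # 1
```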
- static get_cusum_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats CUSUM Detector on the Time Series, extract features from the outputs of the detection.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Outputs of the CUSUM Detector, which include (1) Number of changepoints, either 1 or 0, (2) Confidence of the changepoint detected, 0 if no changepoint, (3) index, or position of the changepoint detected within the time series, (4) delta of the mean levels before and after the changepoint, (5) log-likelihood ratio of changepoint, (6) Boolean - whether regression is detected by CUSUM, (7) Boolean - whether changepoint is stable, (8) p-value of changepoint.
- static get_flat_spots(x: numpy.ndarray, nbins: int = 10)[source]¶
Getting flat spots: Maximum run-lengths across equally-sized segments of time series
- Parameters
x – The univariate time series array in the form of 1d numpy array.
nbins – int; Number of bins to segment time series data into.
- Returns
Maximum run-lengths across segmented time series array.
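One possible reading of this feature, following the R tsfeatures description of flat spots, is sketched below: the value range is cut into `nbins` equal-width intervals and the longest run of consecutive observations in the same interval is returned. The equal-width binning and the name `flat_spots` are assumptions of this sketch; Kats may segment the data differently.

```python
from itertools import groupby


def flat_spots(x, nbins=10):
    """Longest run of consecutive values falling in the same value bin (illustrative sketch)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return len(x)  # a constant series is one single run
    width = (hi - lo) / nbins
    # Bin index per observation; the maximum value is placed in the last bin.
    labels = [min(int((v - lo) / width), nbins - 1) for v in x]
    # groupby yields one group per run of identical consecutive labels.
    return max(len(list(run)) for _, run in groupby(labels))


print(flat_spots([0.0, 0.1, 0.2, 5.0, 10.0, 10.0, 10.0, 10.0]))  # 4
```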
- static get_het_arch(x: numpy.ndarray)[source]¶
Engle’s Test for Autoregressive Conditional Heteroscedasticity (ARCH). Reference: https://www.statsmodels.org/dev/generated/statsmodels.stats.diagnostic.het_arch.html
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
Lagrange multiplier test statistic
- static get_histogram_mode(x: numpy.ndarray, nbins: int = 10)[source]¶
Measures the mode of the data vector using histograms with a given number of bins. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html
- Parameters
x – The univariate time series array in the form of 1d numpy array.
nbins – int; Number of bins to get the histograms. Default value is 10.
- Returns
Mode of the data vector using histograms.
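A histogram-based mode can be sketched as follows: build an equal-width histogram and report the fullest bin. Reporting the left edge of that bin (rather than its midpoint) and breaking ties in favour of the first bin are assumptions of this example, and `histogram_mode` is a hypothetical name.

```python
def histogram_mode(x, nbins=10):
    """Left edge of the fullest equal-width histogram bin (illustrative sketch)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return lo  # all values identical
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in x:
        # The maximum value is counted in the last bin.
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    fullest = counts.index(max(counts))  # the first bin wins ties
    return lo + fullest * width


print(histogram_mode([0, 1, 1, 1, 2, 4], nbins=4))  # 1.0
```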
- static get_holt_params(x: numpy.ndarray, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Estimates the smoothing parameters of Holt’s linear trend method. ‘alpha’: Level parameter of the Holt model. ‘beta’: Trend parameter of the Holt model.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Level and trend parameter of a fitted Holt model.
- static get_hurst(x: numpy.ndarray, lag_size: int = 30)[source]¶
Getting the Hurst Exponent. Wiki: https://en.wikipedia.org/wiki/Hurst_exponent
- Parameters
x – The univariate time series array in the form of 1d numpy array.
lag_size – int; Size for getting lagged time series data. Default value is 30.
- Returns
The Hurst Exponent of the time series array
- static get_hw_params(x: numpy.ndarray, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Estimates the smoothing parameters of the Holt-Winters method: alpha for the level, beta for the linear trend, and gamma for the additive seasonal component.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
period – int; Seasonal period for fitting exponential smoothing model. Default value is 7.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Level, trend and seasonal parameters of a fitted Holt-Winters model.
- static get_length(x: numpy.ndarray)[source]¶
Getting the length of time series array.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
Length of the time series array.
- static get_level_shift(x: numpy.ndarray, window_size: int = 20, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True) → pandas.core.frame.DataFrame[source]¶
Calculating level shift features.
level_shift_idx: Location of the maximum mean value difference, between two consecutive sliding windows
level_shift_size: Size of the maximum mean value difference, between two consecutive sliding windows
- Parameters
x – The univariate time series array in the form of 1d numpy array.
window_size – int; Length of the sliding window. Default value is 20.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Level shift features including level_shift_idx, and level_shift_size
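The two level shift features can be sketched by comparing the mean of each sliding window with the mean of the window that starts right where it ends. The index convention (start of the first window of the pair) and the name `level_shift` are assumptions of this example and may not match what Kats reports.

```python
from statistics import fmean


def level_shift(x, window_size=20):
    """Location and size of the largest mean shift between consecutive windows (sketch)."""
    # Mean of every full window x[i : i + window_size].
    means = [fmean(x[i:i + window_size]) for i in range(len(x) - window_size + 1)]
    # Absolute difference between each window and the window directly after it.
    diffs = [abs(means[i + window_size] - means[i]) for i in range(len(means) - window_size)]
    if not diffs:
        return None, None  # series too short for two consecutive windows
    idx = max(range(len(diffs)), key=diffs.__getitem__)
    return idx, diffs[idx]


# A step from 0 to 10 in the middle of the series.
print(level_shift([0, 0, 0, 0, 10, 10, 10, 10], window_size=2))  # (2, 10.0)
```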
- static get_linearity(x: numpy.ndarray)[source]¶
Getting linearity feature: R square from a fitted linear regression.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
R square from a fitted linear regression.
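For a simple linear regression of the series on its time index, R square equals the squared correlation between the two. The sketch below computes it directly; the name `linearity`, the use of `0, 1, 2, ...` as the regressor, and the fallback value for degenerate input are assumptions of this example.

```python
from statistics import fmean


def linearity(x):
    """R^2 of a least-squares line fitted to x against its index (illustrative sketch)."""
    t = range(len(x))
    t_bar, x_bar = fmean(t), fmean(x)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    syy = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0 or syy == 0:
        return 0.0  # R^2 is undefined here; 0.0 is an arbitrary choice for this sketch
    # In simple linear regression, R^2 is the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)


print(linearity([1, 3, 5, 7]))  # 1.0 for a perfectly linear series
```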
- static get_lumpiness(x: numpy.ndarray, window_size: int = 20)[source]¶
Calculating the lumpiness of time series. Lumpiness is defined as the variance of the chunk-wise variances.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
window_size – int; Window size to split the data into chunks for getting variances. Default value is 20.
- Returns
Lumpiness of the time series array.
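The definition above (variance of chunk-wise variances) can be sketched in a few lines. The chunking scheme (non-overlapping chunks of `window_size`, with a shorter final chunk kept) and the use of population variance are assumptions of this example; `lumpiness` is a hypothetical standalone name.

```python
from statistics import pvariance


def lumpiness(x, window_size=20):
    """Variance of the per-chunk variances (illustrative sketch)."""
    # Non-overlapping chunks; a shorter final chunk is kept as-is.
    chunks = [x[i:i + window_size] for i in range(0, len(x), window_size)]
    return pvariance([pvariance(c) for c in chunks])


# First chunk is flat (variance 0), second chunk alternates (variance 1).
print(lumpiness([0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 2.0], window_size=4))  # 0.25
```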
- static get_mean(x: numpy.ndarray)[source]¶
Getting the average value of time series array.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
Average of the time series array.
- static get_nowcasting(x: numpy.ndarray, window: int = 5, n_fast: int = 12, n_slow: int = 21, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats Nowcasting transformer on the Time Series, extract aggregated features from the outputs of the transformation.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
window – int; Window length for all Nowcasting features. Default value is 5.
n_fast – int; length of “fast” or short period exponential moving average in the MACD algorithm in the nowcasting features. Default value is 12.
n_slow – int; length of “slow” or long period exponential moving average in the MACD algorithm in the nowcasting features. Default value is 21.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Mean values of the Kats Nowcasting algorithm time series outputs using the parameters (window, n_fast, n_slow) indicated above. These outputs include: (1) Mean of Rate of Change (ROC) time series, (2) Mean of Moving Average (MA) time series, (3) Mean of Momentum (MOM) time series, (4) Mean of LAG time series, (5) Means of MACD, MACDsign, and MACDdiff from Kats Nowcasting.
- static get_outlier_detector(ts: kats.consts.TimeSeriesData, decomp: str = 'additive', iqr_mult: float = 3.0, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats Outlier Detector on the Time Series, extract features from the outputs of the detection.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
decomp – str; Additive or Multiplicative mode for performing outlier detection using OutlierDetector. Default value is ‘additive’.
iqr_mult – float; Multiplier applied to the interquartile range (IQR) when determining outliers through OutlierDetector. Default value is 3.0.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Number of outliers by the Outlier Detector.
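To illustrate what an IQR multiplier such as `iqr_mult` typically controls, the sketch below flags values that lie more than `iqr_mult` interquartile ranges outside the first or third quartile. It is applied to raw values for simplicity; the decomposition step implied by the `decomp` parameter is not reproduced, and the function name and quartile method are assumptions of this example.

```python
from statistics import quantiles


def iqr_outlier_count(values, iqr_mult=3.0):
    """Count values outside [Q1 - m * IQR, Q3 + m * IQR] (illustrative sketch)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - iqr_mult * iqr, q3 + iqr_mult * iqr
    return sum(1 for v in values if v < lower or v > upper)


print(iqr_outlier_count([1, 2, 3, 4, 5, 100]))  # 1
```

A larger `iqr_mult` widens the accepted band and therefore flags fewer points.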
- static get_pacf_features(extra_args: Dict[str, bool], default_status: bool, y_pacf_list: List[float], diff1y_pacf_list: List[float], diff2y_pacf_list: List[float])[source]¶
Aggregating extracted PACF features from get_acfpacf_features function.
- Parameters
extra_args – A dictionary containing information for disabling calculation of a certain feature.
default_status – Default status of the switch for whether to calculate the features.
y_pacf_list – List of PACF values acquired from original time series.
diff1y_pacf_list – List of PACF values acquired from differenced time series.
diff2y_pacf_list – List of PACF values acquired from twice differenced time series.
- Returns
Partial auto-correlation function (PACF) features.
- static get_robust_stat_detector(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats Robust Stat Detector on the Time Series, extract features from the outputs of the detection.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
(1) Number of changepoints detected by the Robust Stat Detector, and (2) Mean of the Metric values from the Robust Stat Detector.
- static get_seasonalities(ts: kats.consts.TimeSeriesData, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats seasonality detectors to get the estimated seasonal period, then extract trend, seasonality and residual magnitudes.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
(1) The detected seasonality period, (2) trend magnitude: slope acquired by fitting a simple linear regression model on the trend component, (3) seasonality magnitude: difference between the 95th and 5th percentiles of the seasonal component, (4) standard deviation of the residual component.
- static get_special_ac(x: numpy.ndarray, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Getting special_ac features. firstmin_ac: the time of the first minimum in the autocorrelation function. firstzero_ac: the time of the first zero crossing of the autocorrelation function.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Special autocorrelation features described above.
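The two features can be sketched on top of a plain sample autocorrelation function. Here "first minimum" is taken to mean the first lag whose autocorrelation is smaller than both neighbours, and "first zero crossing" the first lag at which it is zero or negative; both conventions, and the names `acf` and `special_ac`, are assumptions of this example.

```python
from statistics import fmean


def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (illustrative sketch)."""
    m = fmean(x)
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    return [
        sum(d[t] * d[t + k] for t in range(len(x) - k)) / denom
        for k in range(max_lag + 1)
    ]


def special_ac(x):
    r = acf(x, len(x) - 1)
    # First lag whose autocorrelation is below both of its neighbours.
    firstmin = next(
        (k for k in range(1, len(r) - 1) if r[k] < r[k - 1] and r[k] < r[k + 1]), None
    )
    # First lag at which the autocorrelation reaches or drops below zero.
    firstzero = next((k for k in range(1, len(r)) if r[k] <= 0), None)
    return firstmin, firstzero


# A period-4 wave: the autocorrelation hits zero at lag 1 and bottoms out at lag 2.
print(special_ac([1, 0, -1, 0] * 3))
```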
- static get_spectral_entropy(x: numpy.ndarray, freq: int = 1)[source]¶
Getting normalized Shannon entropy of power spectral density. PSD is calculated using scipy’s periodogram.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
freq – int; Frequency for calculating the PSD via scipy periodogram. Default value is 1.
- Returns
Normalized Shannon entropy.
- static get_stability(x: numpy.ndarray, window_size: int = 20)[source]¶
Calculating the stability of time series. Stability is defined as the variance of chunk-wise means.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
window_size – int; Window size to split the data into chunks for getting means. Default value is 20.
- Returns
Stability of the time series array.
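The definition above (variance of chunk-wise means) can be sketched as follows. As with any such sketch, the chunking scheme (non-overlapping chunks, shorter final chunk kept) and the use of population variance are assumptions, and `stability` is a hypothetical standalone name.

```python
from statistics import fmean, pvariance


def stability(x, window_size=20):
    """Variance of the per-chunk means (illustrative sketch)."""
    chunks = [x[i:i + window_size] for i in range(0, len(x), window_size)]
    return pvariance([fmean(c) for c in chunks])


# Chunk means are 1.0 and 3.0, so their population variance is 1.0.
print(stability([1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0], window_size=4))  # 1.0
```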
- static get_std1st_der(x: numpy.ndarray)[source]¶
Calculating std1st_der: the standard deviation of the first derivative of the time series. Reference: https://cran.r-project.org/web/packages/tsfeatures/vignettes/tsfeatures.html
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
The standard deviation of the first derivative of the time series.
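One way to discretize the "first derivative" is with central differences in the interior and one-sided differences at the two endpoints; a plain first difference would be another valid choice. The sketch below uses the former with population standard deviation, both of which are assumptions of this example, as is the name `std1st_der` used as a standalone function.

```python
from statistics import pstdev


def std1st_der(x):
    """Standard deviation of a finite-difference derivative (illustrative sketch)."""
    n = len(x)  # requires at least two observations
    grad = [x[1] - x[0]]  # one-sided difference at the start
    grad += [(x[i + 1] - x[i - 1]) / 2 for i in range(1, n - 1)]  # central differences
    grad.append(x[-1] - x[-2])  # one-sided difference at the end
    return pstdev(grad)


# A straight line has a constant slope, so the derivative does not vary.
print(std1st_der([0, 2, 4, 6]))  # 0.0
```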
- static get_stl_features(x: numpy.ndarray, period: int = 7, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Calculate STL based features for a time series, including strength of trend, seasonality, spikiness, peak/trough.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
period – int; Period parameter for performing seasonality trend decomposition using LOESS with statsmodels. Default value is 7.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
Seasonality features including strength of trend, seasonality, spikiness, peak/trough.
- static get_trend_detector(ts: kats.consts.TimeSeriesData, threshold: float = 0.8, extra_args: Optional[Dict[str, bool]] = None, default_status: bool = True)[source]¶
Run the Kats Trend Detector on the Time Series, extract features from the outputs of the detection.
- Parameters
ts – The univariate time series array in the form of Kats TimeSeriesData object.
threshold – float; threshold for trend intensity; higher threshold gives trend with high intensity (0.8 by default). If we only want to use the p-value to determine changepoints, set threshold = 0.
extra_args – A dictionary containing information for disabling calculation of a certain feature. Default value is None, i.e. no feature is disabled.
default_status – Default status of the switch for whether to calculate the features. Default value is True.
- Returns
(1) Number of trends detected by the Kats Trend Detector, (2) Number of increasing trends, (3) Mean of the absolute values of Taus of the trends detected.
- static get_unitroot_kpss(x: numpy.ndarray)[source]¶
Getting a test statistic based on the KPSS test, which tests the null hypothesis that an observable time series is stationary around a deterministic trend. The statistic is that of the KPSS unit root test with linear trend and lag one. Wiki: https://en.wikipedia.org/wiki/KPSS_test
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
Test statistics acquired using KPSS test.
- static get_var(x: numpy.ndarray)[source]¶
Getting the variance of time series array.
- Parameters
x – The univariate time series array in the form of 1d numpy array.
- Returns
Variance of the time series array.
- transform(x: kats.consts.TimeSeriesData)[source]¶
The overall high-level function for transforming time series into a number of features.
- Parameters
x – Kats TimeSeriesData object.
- Returns
Maps (dictionaries) of feature name and value pairs. For univariate input, returns a map of {feature: value}. For multivariate input, returns a list of maps.