kats.detectors.stat_sig_detector module¶
- class kats.detectors.stat_sig_detector.MultiStatSigDetectorModel(n_control: Optional[int] = None, n_test: Optional[int] = None, serialized_model: Optional[bytes] = None, time_unit: Optional[str] = None, method: str = 'fdr_bh')[source]¶
Bases:
kats.detectors.stat_sig_detector.StatSigDetectorModel
MultiStatSigDetectorModel is a multivariate version of the StatSigDetector. It applies a univariate t-test to each component of the multivariate time series to check whether the means of the control and test periods differ significantly. It then uses a false discovery rate (FDR) controlling procedure (https://en.wikipedia.org/wiki/False_discovery_rate#Controlling_procedure) to adjust the p-values, reducing the noise in the alerts triggered by the detector. The default FDR controlling procedure is the Benjamini-Hochberg procedure, but this can be changed when initializing the model.
As with the StatSigDetector, we start with the history data; for the current data, we apply a rolling window, adding one data point at a time and detecting significant change. The t-statistics returned here are based on the adjusted p-values from the FDR controlling procedure.
We suggest using n_control >= 30 to get good estimates.
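The per-component t-test followed by Benjamini-Hochberg adjustment can be sketched as follows. This is an illustrative sketch of the statistical procedure described above, not the model's internal code; the component values here are synthetic.

```python
# Sketch: run a univariate t-test per component, then adjust the
# resulting p-values with the Benjamini-Hochberg ('fdr_bh') procedure.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
num_seq, n_control, n_test = 5, 30, 7

p_values = []
for _ in range(num_seq):
    control = rng.normal(100, 10, n_control)  # history for one component
    test = rng.normal(100, 10, n_test)        # recent points for that component
    _, p = stats.ttest_ind(test, control, equal_var=False)
    p_values.append(p)

# FDR-adjust the raw p-values; 'rejected' marks components flagged at alpha.
rejected, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
```

Other values of the `method` argument (e.g. `'fdr_by'`) map directly onto the same `multipletests` call.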
- n_control¶
int, number of data points (or time units) of history to compare with
- n_test¶
int, number of data points (or time units) to compare the history with
- serialized_model¶
Optional, serialized JSON containing the parameters
- time_unit¶
str, unit of time used to measure the intervals. If not provided, it is inferred from the provided data
- method¶
str, indicates the FDR controlling method used for adjusting the p-values. Defaults to ‘fdr_bh’ for Benjamini-Hochberg. Inputs for other FDR controlling methods can be found at https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
>>> # Example usage:
>>> # hist_ts and data_ts are TimeSeriesData objects, and hist_ts is longer
>>> # than (n_control + n_test) so that we have sufficient history to
>>> # run the detector
>>> import numpy as np
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.detectors.stat_sig_detector import MultiStatSigDetectorModel
>>> n_control = 28
>>> n_test = 7
>>> num_seq = 5
>>> control_time = pd.date_range(start='2018-01-01', freq='D', periods=(n_control + n_test))
>>> test_time = pd.date_range(start='2018-02-05', freq='D', periods=n_test)
>>> control_val = [np.random.randn(len(control_time)) for _ in range(num_seq)]
>>> test_val = [np.random.randn(len(test_time)) for _ in range(num_seq)]
>>> hist_ts = TimeSeriesData(
...     pd.DataFrame(
...         {
...             **{"time": control_time},
...             **{f"ts_{i}": control_val[i] for i in range(num_seq)},
...         }
...     )
... )
>>> data_ts = TimeSeriesData(
...     pd.DataFrame(
...         {
...             **{"time": test_time},
...             **{f"ts_{i}": test_val[i] for i in range(num_seq)},
...         }
...     )
... )
>>> ss_detect = MultiStatSigDetectorModel(n_control=n_control, n_test=n_test)
>>> anom = ss_detect.fit_predict(data=data_ts, historical_data=hist_ts)
- fit_predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶
This is the main working function. It returns an AnomalyResponse object of length equal to the length of the data. We require len(historical_data) > (n_control + n_test).
- Parameters
data – TimeSeriesData, A multivariate TimeSeriesData for which we are running the MultiStatSigDetectorModel
historical_data – Optional[TimeSeriesData], historical data used to run detection on the initial points of data
- class kats.detectors.stat_sig_detector.StatSigDetectorModel(n_control: Optional[int] = None, n_test: Optional[int] = None, serialized_model: Optional[bytes] = None, time_unit: Optional[str] = None)[source]¶
Bases:
kats.detectors.detector.DetectorModel
StatSigDetectorModel is a simple detector, which compares a control and test period. The detector assumes that the time series data comes from an iid normal distribution, and applies a t-test to check if the means of the control and test periods are significantly different.
We start with the history data; for the current data, we apply a rolling window, adding one data point at a time and detecting significant change. We return the t-statistic as a score, which reflects the severity of the change. We suggest using n_control >= 30 to get good estimates.
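The core control-vs-test comparison can be sketched with a plain two-sample t-test. This is a minimal illustration of the statistic described above, assuming unequal variances (Welch's t-test); it is not the model's internal code and omits the rolling window.

```python
# Sketch: compare a test window against a control (history) window with a
# two-sample t-test; the t-statistic serves as the severity score.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(100, 10, 30)  # n_control >= 30 history points
test = rng.normal(120, 10, 7)      # n_test recent points with a shifted mean

t_stat, p_value = stats.ttest_ind(test, control, equal_var=False)
```

In the detector, this comparison is repeated as each new point from the current data enters the test window.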
- n_control¶
number of data points (or time units) of history to compare with
- n_test¶
number of data points (or time units) to compare the history with
- serialized_model¶
serialized JSON containing the parameters
- time_unit¶
unit of time used to measure the intervals. If not provided, it is inferred from the provided data.
>>> # Example usage:
>>> # hist_ts and data_ts are TimeSeriesData objects, and hist_ts is longer
>>> # than (n_control + n_test) so that we have sufficient history to
>>> # run the detector
>>> import random
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> from kats.detectors.stat_sig_detector import StatSigDetectorModel
>>> n_control = 28
>>> n_test = 7
>>> control_time = pd.date_range(start='2018-01-01', freq='D', periods=(n_control + n_test))
>>> test_time = pd.date_range(start='2018-02-05', freq='D', periods=n_test)
>>> control_val = [random.normalvariate(100, 10) for _ in range(n_control + n_test)]
>>> test_val = [random.normalvariate(120, 10) for _ in range(n_test)]
>>> hist_ts = TimeSeriesData(time=control_time, value=pd.Series(control_val))
>>> data_ts = TimeSeriesData(time=test_time, value=pd.Series(test_val))
>>> ss_detect = StatSigDetectorModel(n_control=n_control, n_test=n_test)
>>> anom = ss_detect.fit_predict(data=data_ts, historical_data=hist_ts)
- fit(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → None[source]¶
Fit can be called during priming. It is a no-op for us.
- fit_predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶
This is the main working function. The function returns an AnomalyResponse object of length equal to the length of the data. We require len(historical_data) > (n_control + n_test).
- Parameters
data – TimeSeriesData, A univariate TimeSeriesData for which we are running the StatSigDetectorModel
historical_data – Optional[TimeSeriesData], historical data used to run detection on the initial points of data
- predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶
Predict is not implemented.
- kats.detectors.stat_sig_detector.to_datetime(dt: numpy.datetime64) → datetime.datetime[source]¶
Helper function to convert from np.datetime64 (the type pandas uses internally) to datetime.datetime from the standard datetime library