kats.detectors.stat_sig_detector module¶

class kats.detectors.stat_sig_detector.MultiStatSigDetectorModel(n_control: Optional[int] = None, n_test: Optional[int] = None, serialized_model: Optional[bytes] = None, time_unit: Optional[str] = None, method: str = 'fdr_bh')[source]¶

Bases: kats.detectors.stat_sig_detector.StatSigDetectorModel

MultiStatSigDetectorModel is a multivariate version of the StatSigDetector. It applies a univariate t-test to each of the components of the multivariate time series to see if the means between the control and test periods are significantly different. Then it uses a false discovery rate (FDR) controlling procedure (https://en.wikipedia.org/wiki/False_discovery_rate#Controlling_procedure) to adjust the p-values, reducing the noise in the alerts that are triggered by the detector. The default FDR controlling procedure is the Benjamini-Hochberg procedure, but this can be adjusted when initializing the model.

Like with the StatSigDetector, we start with the history data, and then for the current data we apply a rolling window, adding one data point at a time and detecting significant change. The t-statistics we return here are based on the adjusted p-values from the FDR controlling procedure.

We suggest using n_control >= 30 to get good estimates.
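The procedure described above can be sketched with scipy and statsmodels; this is an illustrative approximation of the mechanism, not the detector's actual implementation:

>>> import numpy as np
>>> from scipy import stats
>>> from statsmodels.stats.multitest import multipletests
>>> rng = np.random.default_rng(0)
>>> control = rng.normal(size=(30, 5))  # 30 control points, 5 components
>>> test = rng.normal(size=(7, 5))  # 7 test points, 5 components
>>> # one univariate t-test per component
>>> p_values = [stats.ttest_ind(test[:, i], control[:, i]).pvalue for i in range(5)]
>>> # Benjamini-Hochberg adjustment of the component-level p-values
>>> reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')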

n_control¶

int, number of data points (or time units) of history to compare with

n_test¶

int, number of points (or time units) to compare the history with

serialized_model¶

Optional, serialized JSON containing the parameters

time_unit¶

str, units of time used to measure the intervals. If not provided, we infer it from the provided data.

method¶

str, indicates the FDR controlling method used for adjusting the p-values. Defaults to ‘fdr_bh’ for Benjamini-Hochberg. Inputs for other FDR controlling methods can be found at https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
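For example, to use the Benjamini-Yekutieli procedure instead (one of the method names accepted by statsmodels' multipletests), pass it when constructing the model:

>>> detector = MultiStatSigDetectorModel(n_control=28, n_test=7, method='fdr_by')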

>>> # Example usage:
>>> # hist_ts and data_ts are TimeSeriesData objects, and hist_ts is larger
>>> # than (n_control + n_test) so that we have sufficient history to
>>> # run the detector
>>> n_control = 28
>>> n_test = 7
>>> import numpy as np
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> control_time = pd.date_range(start='2018-01-01', freq='D', periods=(n_control + n_test))
>>> test_time = pd.date_range(start='2018-02-05', freq='D', periods=n_test)
>>> num_seq = 5
>>> control_val = [np.random.randn(len(control_time)) for _ in range(num_seq)]
>>> test_val = [np.random.randn(len(test_time)) for _ in range(num_seq)]
>>> hist_ts = TimeSeriesData(
...     pd.DataFrame(
...         {
...             **{"time": control_time},
...             **{f"ts_{i}": control_val[i] for i in range(num_seq)},
...         }
...     )
... )
>>> data_ts = TimeSeriesData(
...     pd.DataFrame(
...         {
...             **{"time": test_time},
...             **{f"ts_{i}": test_val[i] for i in range(num_seq)},
...         }
...     )
... )
>>> ss_detect = MultiStatSigDetectorModel(n_control=n_control, n_test=n_test)
>>> anom = ss_detect.fit_predict(data=data_ts, historical_data=hist_ts)
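The returned AnomalyResponse (see fit_predict below) carries the per-point detection output; assuming the usual layout of that object, where its scores attribute is itself a TimeSeriesData, the adjusted t-statistic scores can be pulled into a DataFrame for inspection:

>>> scores_df = anom.scores.to_dataframe()  # one score column per input component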
fit_predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶

This is the main working function. The function returns an AnomalyResponse object of length equal to the length of the data. We require len(historical_data) > (n_control + n_test).

Parameters
  • data – TimeSeriesData, a multivariate TimeSeriesData for which we are running the MultiStatSigDetectorModel

  • historical_data – Optional[TimeSeriesData], historical data used to do detection for the initial points in data

class kats.detectors.stat_sig_detector.StatSigDetectorModel(n_control: Optional[int] = None, n_test: Optional[int] = None, serialized_model: Optional[bytes] = None, time_unit: Optional[str] = None)[source]¶

Bases: kats.detectors.detector.DetectorModel

StatSigDetectorModel is a simple detector, which compares a control and a test period. The detector assumes that the time series data comes from an i.i.d. normal distribution, and applies a t-test to check if the means between the control and test period are significantly different.

We start with the history data, and then for the current data we apply a rolling window, adding one data point at a time and detecting significant change. We return the t-statistic as a score, which reflects the severity of the change. We suggest using n_control >= 30 to get good estimates.
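The core comparison is a two-sample t-test between the control and test windows; a minimal sketch with scipy, illustrating the idea rather than the detector's exact code path:

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng(0)
>>> control = rng.normal(100, 10, size=30)  # history window
>>> test = rng.normal(120, 10, size=7)  # current (rolling) window
>>> t_stat, p_value = stats.ttest_ind(test, control)
>>> # t_stat plays the role of the severity score described above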

n_control¶

number of data points (or time units) of history to compare with

n_test¶

number of points (or time units) to compare the history with

serialized_model¶

serialized JSON containing the parameters

time_unit¶

units of time used to measure the intervals. If not provided, we infer it from the provided data.

>>> # Example usage:
>>> # hist_ts and data_ts are TimeSeriesData objects, and hist_ts is larger
>>> # than (n_control + n_test) so that we have sufficient history to
>>> # run the detector
>>> n_control = 28
>>> n_test = 7
>>> import random
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> control_time = pd.date_range(start='2018-01-01', freq='D', periods=(n_control + n_test))
>>> test_time = pd.date_range(start='2018-02-05', freq='D', periods=n_test)
>>> control_val = [random.normalvariate(100,10) for _ in range(n_control + n_test)]
>>> test_val = [random.normalvariate(120,10) for _ in range(n_test)]
>>> hist_ts = TimeSeriesData(time=control_time, value=pd.Series(control_val))
>>> data_ts = TimeSeriesData(time=test_time, value=pd.Series(test_val))
>>> ss_detect = StatSigDetectorModel(n_control=n_control, n_test=n_test)
>>> anom = ss_detect.fit_predict(data=data_ts, historical_data=hist_ts)
fit(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → None[source]¶

Fit can be called during priming. It's a no-op for us.

fit_predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶

This is the main working function. The function returns an AnomalyResponse object of length equal to the length of the data. We require len(historical_data) > (n_control + n_test).

Parameters
  • data – TimeSeriesData, a univariate TimeSeriesData for which we are running the StatSigDetectorModel

  • historical_data – Optional[TimeSeriesData], historical data used to do detection for the initial points in data

predict(data: kats.consts.TimeSeriesData, historical_data: Optional[kats.consts.TimeSeriesData] = None) → kats.detectors.detector_consts.AnomalyResponse[source]¶

Predict is not implemented.

serialize() → bytes[source]¶

Serializes the model by encoding its parameters as JSON.
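The serialized bytes can be passed back through the serialized_model constructor argument to rebuild the detector; a short round-trip sketch, reusing hist_ts and data_ts from the example above:

>>> serialized = ss_detect.serialize()
>>> restored = StatSigDetectorModel(serialized_model=serialized)
>>> anom = restored.fit_predict(data=data_ts, historical_data=hist_ts)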

visualize()[source]¶

Function to visualize the result of the StatSigDetectorModel.

kats.detectors.stat_sig_detector.to_datetime(dt: numpy.datetime64) → datetime.datetime[source]¶

Helper function to convert from np.datetime64 (the type used by pandas and pd.to_datetime) to the standard library datetime.datetime.
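A common way to make this conversion, shown only to illustrate the behavior the helper provides (not necessarily its implementation), goes through pd.Timestamp:

>>> import numpy as np
>>> import pandas as pd
>>> dt64 = np.datetime64('2018-01-01T00:00:00')
>>> pd.Timestamp(dt64).to_pydatetime()
datetime.datetime(2018, 1, 1, 0, 0)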