kats.detectors.bocpd module

This module contains classes and functions used for implementing the Bayesian Online Changepoint Detection algorithm.

class kats.detectors.bocpd.BOCPDMetadata(model: kats.detectors.bocpd.BOCPDModelType, ts_name: Optional[str] = None)[source]

Bases: object

Metadata for the BOCPD model.

This gives information about the type of detector, the name of the time series and the model used for detection.

model

The kind of predictive model used.

ts_name

string, name of the time series for which the detector is being run.

class kats.detectors.bocpd.BOCPDModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random')[source]

Bases: abc.ABC

Data class containing data for predictive models used in BOCPD.

Particular predictive models derive from this class.

prior_choice

list of changepoint probability priors over which we will search hyperparameters

Type

Dict[str, List[float]]

cp_prior

default prior for probability of changepoint.

Type

float

search_method

string, representing the search method for the hyperparameter tuning library. Allowed values are ‘random’ and ‘gridsearch’.

Type

str

set_prior(param_dict: Dict[str, float])[source]

Setter method, which sets the value of the parameters.

Currently, this sets the value of the prior probability of changepoint.

Parameters

param_dict – dictionary of the form {param_name: param_value}.

Returns

None.
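As a rough illustration of how prior_choice and set_prior() can fit together, the sketch below uses a stand-in dataclass rather than the real BOCPDModelParameters. The class name SketchParameters, the helper grid_candidates, the candidate values, and the "cp_prior" dictionary key are all assumptions made for this example. It enumerates every combination in prior_choice, which is what a grid search would visit, and applies one candidate through set_prior().

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchParameters:
    """Hypothetical stand-in for a BOCPD parameter class (not the Kats class)."""

    # Candidate values per tunable parameter; these numbers are made up.
    prior_choice: Dict[str, List[float]] = field(
        default_factory=lambda: {"cp_prior": [0.001, 0.01, 0.1]}
    )
    cp_prior: float = 0.1

    def set_prior(self, param_dict: Dict[str, float]) -> None:
        # Assumed key name: only the changepoint prior is updated here.
        if "cp_prior" in param_dict:
            self.cp_prior = param_dict["cp_prior"]


def grid_candidates(prior_choice: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Enumerate every {param_name: param_value} combination in the grid."""
    names = sorted(prior_choice)
    value_lists = [prior_choice[name] for name in names]
    return [dict(zip(names, values)) for values in itertools.product(*value_lists)]


params = SketchParameters()
candidates = grid_candidates(params.prior_choice)
params.set_prior(candidates[0])
print(candidates, params.cp_prior)
```

A random search would instead draw a subset of these candidates; how the tuning library scores each one is outside this sketch.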

class kats.detectors.bocpd.BOCPDModelType(value)[source]

Bases: enum.Enum

Bayesian Online Change Point Detection model type.

Describes the type of predictive model used by the BOCPD algorithm.

class kats.detectors.bocpd.BOCPDetector(data: kats.consts.TimeSeriesData)[source]

Bases: kats.detectors.detector.Detector

Bayesian Online Changepoint Detection.

Given a univariate time series, this class performs changepoint detection, i.e. it tells us when the time series shows a change. This is online, which means it gives the best estimate based on a lookahead number of time steps (which is the lag).

This faithfully implements the algorithm in Adams & MacKay, 2007. “Bayesian Online Changepoint Detection” https://arxiv.org/abs/0710.3742

The basic idea is to see whether the new values are improbable when compared to a Bayesian predictive model built from the previous observations.

Attributes:
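The idea above can be sketched in pure Python as a run-length recursion in the spirit of Adams & MacKay. This is a minimal illustration, not the Kats implementation: it assumes a normal model with known observation variance, a constant changepoint prior (hazard), and the hypothetical names normal_pdf and run_length_posteriors. Each candidate run length keeps its own posterior over the segment mean, and a new value that is improbable under the long run shifts probability mass to shorter runs.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def run_length_posteriors(xs, hazard=0.01, mu0=0.0, var0=10.0, obs_var=1.0):
    """Return, for each step, the posterior over the current run length.

    Entry r of a row is P(run length == r | data seen so far).
    """
    run_probs = [1.0]        # before any data the run length is 0
    params = [(mu0, var0)]   # per run length: (mean, variance) of the segment mean
    history = []
    for x in xs:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, v + obs_var) for m, v in params]
        # Either the run grows by one, or a changepoint resets it to zero.
        growth = [p * q * (1.0 - hazard) for p, q in zip(run_probs, pred)]
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))
        joint = [change] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]
        # Conjugate update of every run's posterior; run length 0 restarts at the prior.
        updated = [(mu0, var0)]
        for m, v in params:
            post_prec = 1.0 / v + 1.0 / obs_var
            updated.append(((m / v + x / obs_var) / post_prec, 1.0 / post_prec))
        params = updated
        history.append(run_probs)
    return history


# A level shift from 0 to 5 after 30 points.
series = [0.0] * 30 + [5.0] * 30
final = run_length_posteriors(series)[-1]
most_likely_run = max(range(len(final)), key=final.__getitem__)
print(most_likely_run)
```

With a constant hazard, the mass on run length zero always equals the hazard itself, so the signal lies in the rest of the run-length distribution. That is one reason a detector reports changepoints only after a lag.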

data: TimeSeriesData, data on which we will run the BOCPD algorithm.

detector(model: kats.detectors.bocpd.BOCPDModelType = <BOCPDModelType.NORMAL_KNOWN_MODEL: 1>, model_parameters: Union[None, kats.detectors.bocpd.BOCPDModelParameters] = None, lag: int = 10, choose_priors: bool = True, changepoint_prior: float = 0.01, threshold: float = 0.5, debug: bool = False, agg_cp: bool = True)List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]][source]

The main detector method.

This function runs the BOCPD detector and returns the list of changepoints, along with some metadata.

Parameters
  • model – This specifies the probabilistic model that generates the data within each segment. The user can input several model types depending on the behavior of the time series. Currently allowed models are:
      • NORMAL_KNOWN_MODEL: Normal model with known variance. Use this to find level shifts in normally distributed data.
      • TREND_CHANGE_MODEL: This model assumes each segment is generated from ordinary linear regression. Use this model to understand changes in slope, or trend, in a time series.
      • POISSON_PROCESS_MODEL: This assumes a Poisson generative model. Use this for count data, where most of the values are close to zero.

  • model_parameters – Model Parameters correspond to specific parameters for a specific model. They are defined in the NormalKnownParameters, TrendChangeParameters, PoissonModelParameters classes.

  • lag – integer referring to the lag in reporting the changepoint. We report the changepoint after seeing “lag” number of data points. Higher lag gives greater certainty that this is indeed a changepoint. Lower lag will detect the changepoint faster. This is the tradeoff.

  • choose_priors – If True, the hyperparameter tuning library (HPT) is used to choose the priors that maximize the posterior predictive.

  • changepoint_prior – This is a Bayesian algorithm. Hence, this parameter specifies the prior belief on the probability that a given point is a changepoint. For example, if you believe 10% of your data will be a changepoint, you can set this to 0.1.

  • threshold – We report the probability of observing the changepoint at each instant. The actual changepoints are obtained by marking the points whose probability exceeds this threshold as changepoints.

  • debug – This surfaces additional information, such as the plots of predicted means and variances, which allows the user to debug why changepoints were not properly detected.

  • agg_cp – It is tested and believed that by aggregating run-length posterior, we may have a stronger signal for changepoint detection. When setting this parameter as True, posterior will be the aggregation of run-length posterior by fetching maximum values diagonally.

Returns

Each element in this list is a changepoint, an object of the TimeSeriesChangePoint class. The start_time gives the time that the change was detected. The metadata contains data about the name of the time series (useful when multiple time series are run simultaneously), and the predictive model used.

Return type

List[Tuple[TimeSeriesChangePoint, BOCPDMetadata]]
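To make the roles of lag and threshold concrete, here is a small sketch of one way they can interact. It is an assumption for illustration, not the Kats code path: the function names, the indexing convention, and the hand-written posterior rows are all hypothetical. The probability that a change happened right after position t is read from the run-length posterior lag steps later, at run length lag, and positions above the threshold are reported.

```python
def lagged_change_probs(run_length_rows, lag):
    """run_length_rows[s][r] is P(run length == r) after observations 0..s.

    probs[t] is the probability, judged lag steps later, that a new segment
    started right after position t. The last lag positions stay at 0.0.
    """
    n = len(run_length_rows)
    probs = [0.0] * n
    for t in range(n - lag):
        row = run_length_rows[t + lag]
        probs[t] = row[lag] if lag < len(row) else 0.0
    return probs


def pick_changepoints(probs, threshold=0.5):
    """Indices whose change probability exceeds the threshold."""
    return [t for t, p in enumerate(probs) if p > threshold]


# Hand-written posterior rows for five observations (illustrative numbers only).
rows = [
    [0.1, 0.9],
    [0.1, 0.1, 0.8],
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.05, 0.8, 0.03, 0.02],
    [0.1, 0.05, 0.05, 0.78, 0.02],
]
probs = lagged_change_probs(rows, lag=1)
print(pick_changepoints(probs, threshold=0.5))
```

A larger lag reads the posterior further from the candidate point, which delays the report but lets more evidence accumulate.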

get_change_prob()Dict[str, numpy.ndarray][source]

Returns the probability of being a changepoint.

Parameters

None.

Returns

The changepoint probability for every point in the time series. The return type is a dict, with the name of the time series as the key, and the value is an array of probabilities of the same length as the time series data.

get_run_length_matrix()Dict[str, numpy.ndarray][source]

Returns the entire run-length posterior.

Parameters

None.

Returns

The return type is a dict, with the name of the timeseries as the key, and the value is an array of probabilities of the same length as the timeseries data.

group_changepoints_by_timeseries(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]])Dict[str, List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]]][source]

Helper function to group changepoints by time series.

For multivariate inputs, all changepoints are output in a list and the time series they correspond to is referenced in the metadata. This function is a helper function to group these changepoints by time series.

Parameters

change_points – List of changepoints, with metadata containing the time series names. This is the return value of the detector() method.

Returns

Dictionary mapping time series names to their corresponding changepoints.
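The grouping described here can be sketched with plain Python containers. The two NamedTuple classes below are hypothetical stand-ins for TimeSeriesChangePoint and BOCPDMetadata, carrying only the fields this example needs, and group_by_ts_name is an illustrative name rather than the library method.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class SketchChangePoint(NamedTuple):
    """Stand-in for a changepoint object; only start_time is modelled."""
    start_time: str


class SketchMetadata(NamedTuple):
    """Stand-in for the detector metadata; only ts_name is modelled."""
    ts_name: str


Pair = Tuple[SketchChangePoint, SketchMetadata]


def group_by_ts_name(change_points: List[Pair]) -> Dict[str, List[Pair]]:
    """Group (changepoint, metadata) pairs by the time series name in the metadata."""
    grouped = defaultdict(list)
    for change_point, metadata in change_points:
        grouped[metadata.ts_name].append((change_point, metadata))
    return dict(grouped)


flat = [
    (SketchChangePoint("2021-01-05"), SketchMetadata("cpu")),
    (SketchChangePoint("2021-01-09"), SketchMetadata("memory")),
    (SketchChangePoint("2021-02-01"), SketchMetadata("cpu")),
]
grouped = group_by_ts_name(flat)
print(sorted(grouped))
```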

plot(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]], ts_names: Optional[List[str]] = None)None[source]

Plots the change points, along with the time series.

Use this function to visualize the results of the changepoint detection.

Parameters
  • change_points – List of changepoints, which are the return value of the detector() function.

  • ts_names – List of names of the time series, useful in case multiple time series are used.

Returns

None.

class kats.detectors.bocpd.NormalKnownParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', empirical: bool = True, mean_prior: Optional[float] = None, mean_prec_prior: Optional[float] = None, known_prec: Optional[float] = None, known_prec_multiplier: float = 1.0)[source]

Bases: kats.detectors.bocpd.BOCPDModelParameters

Data class containing the parameters for Normal predictive model.

This assumes that the data comes from a normal distribution with known precision.

empirical

Boolean, whether to derive the prior empirically. When this is True, the mean_prior, mean_prec_prior and known_prec are derived from the data, and don’t need to be specified.

Type

bool

mean_prior

float, mean of the prior normal distribution.

Type

Optional[float]

mean_prec_prior

float, precision of the prior normal distribution.

Type

Optional[float]

known_prec

float, known precision of the data.

Type

Optional[float]

known_prec_multiplier

float, a multiplier of the known precision. This variable is used in the hyperparameter search to multiply the known_prec value.

Type

float

prior_choice

List of parameters to search, for hyperparameter tuning.

Type

Dict[str, List[float]]

set_prior(param_dict: Dict[str, float])[source]

Sets priors.

Sets the value of the prior based on the parameter dictionary passed.

Parameters

param_dict – Dictionary of parameters required for setting the prior value.

Returns

None.
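For intuition about mean_prior, mean_prec_prior, known_prec and known_prec_multiplier, the following sketch shows the textbook conjugate update for a normal likelihood with known precision. It is an illustration under stated assumptions, not the Kats code: the function names are hypothetical, and the multiplier is assumed to simply scale known_prec before the update.

```python
def update_mean_posterior(mean_prior, mean_prec_prior, known_prec,
                          observations, known_prec_multiplier=1.0):
    """Posterior (mean, precision) of the segment mean after the observations."""
    data_prec = known_prec * known_prec_multiplier
    n = len(observations)
    post_prec = mean_prec_prior + n * data_prec
    post_mean = (mean_prec_prior * mean_prior + data_prec * sum(observations)) / post_prec
    return post_mean, post_prec


def predictive_variance(post_prec, data_prec):
    """Variance of the next observation: uncertainty in the mean plus observation noise."""
    return 1.0 / post_prec + 1.0 / data_prec


post_mean, post_prec = update_mean_posterior(0.0, 1.0, 1.0, [2.0, 2.0, 2.0])
print(post_mean, post_prec, predictive_variance(post_prec, 1.0))
```

A larger multiplier makes the model trust each observation more, so the predictive distribution narrows and small level shifts look more surprising.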

class kats.detectors.bocpd.PoissonModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', alpha_prior: float = 1.0, beta_prior: float = 0.05)[source]

Bases: kats.detectors.bocpd.BOCPDModelParameters

Parameters for the Poisson predictive model.

Here, the data is generated from a Poisson distribution.

alpha_prior

prior value of the alpha value of the Gamma prior.

Type

float

beta_prior

prior value of the beta value of the Gamma prior.

Type

float
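The role of alpha_prior and beta_prior can be illustrated with the standard Gamma-Poisson conjugate update. The sketch assumes that beta is a rate parameter (so the prior mean of the Poisson rate is alpha / beta); that convention and the function names are assumptions of this example, not statements about the Kats internals. Under this assumption the posterior predictive is a negative binomial distribution.

```python
import math


def gamma_posterior(alpha_prior, beta_prior, counts):
    """Gamma posterior (alpha, beta) for a Poisson rate after the observed counts."""
    return alpha_prior + sum(counts), beta_prior + len(counts)


def predictive_pmf(k, alpha, beta):
    """Probability of the next count being k (negative binomial), computed in log space."""
    log_p = (
        math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - k * math.log(beta + 1.0)
    )
    return math.exp(log_p)


alpha, beta = gamma_posterior(1.0, 0.05, [0, 1, 0, 2])
print(alpha, beta, predictive_pmf(0, alpha, beta))
```

A count that has low probability under this predictive distribution is evidence that the current segment has ended.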

class kats.detectors.bocpd.TrendChangeParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', mu_prior: Optional[numpy.ndarray] = None, num_likelihood_samples: int = 100, num_points_prior: int = 10, readjust_sigma_prior: bool = False, plot_regression_prior: bool = False)[source]

Bases: kats.detectors.bocpd.BOCPDModelParameters

Parameters for the trend change predictive model.

This model assumes that the data is generated from a Bayesian linear model.

mu_prior

array, mean of the normal priors on the slope and intercept

Type

Optional[numpy.ndarray]

num_likelihood_samples

int, number of samples generated, to calculate the posterior.

Type

int

num_points_prior

int,

Type

int

readjust_sigma_prior

Boolean, whether we should readjust the Inv. Gamma prior for the variance, based on the data.

Type

bool

plot_regression_prior

Boolean, whether to plot the prior. Set to False, unless trying to debug.

Type

bool

kats.detectors.bocpd.check_data(data: kats.consts.TimeSeriesData)[source]

Small helper function to check if the data is in the appropriate format.

Currently, this only checks if we have enough data points to run the algorithm meaningfully.

Parameters

data – TimeSeriesData object, on which to run the algorithm.

Returns

None.