kats.detectors.bocpd module¶

This module contains classes and functions used for implementing the Bayesian Online Changepoint Detection algorithm.

class kats.detectors.bocpd.BOCPDMetadata(model: kats.detectors.bocpd.BOCPDModelType, ts_name: Optional[str] = None)[source]¶

Bases: object

Metadata for the BOCPD model.

This gives information about the type of detector, the name of the time series and the model used for detection.

model¶: The kind of predictive model used.

ts_name¶: string, name of the time series for which the detector is being run.

class kats.detectors.bocpd.BOCPDModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random')[source]¶

Bases: abc.ABC

Data class containing data for predictive models used in BOCPD.

Particular predictive models derive from this class.

prior_choice¶

list of changepoint probability priors over which we will search hyperparameters

Type: Dict[str, List[float]]

cp_prior¶

default prior for probability of changepoint.

Type: float

search_method¶

string, representing the search method for the hyperparameter tuning library. Allowed values are ‘random’ and ‘gridsearch’.

Type: str

set_prior(param_dict: Dict[str, float])[source]¶

Setter method, which sets the value of the parameters.

Currently, this sets the value of the prior probability of changepoint.

Parameters: param_dict – dictionary of the form {param_name: param_value}.
Returns: None.
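
As a rough illustration of the two search_method values, the sketch below enumerates candidate parameter dictionaries from a prior_choice mapping. It is not the hyperparameter tuning library Kats uses internally; the helper names grid_candidates and random_candidates and the example key are assumptions made for this example only.

```python
import itertools
import random


def grid_candidates(prior_choice):
    """Yield every combination of the listed values (exhaustive grid search)."""
    names = sorted(prior_choice)
    for values in itertools.product(*(prior_choice[n] for n in names)):
        yield dict(zip(names, values))


def random_candidates(prior_choice, n_trials, seed=0):
    """Yield n_trials combinations, drawing each value independently at random."""
    rng = random.Random(seed)
    names = sorted(prior_choice)
    for _ in range(n_trials):
        yield {n: rng.choice(prior_choice[n]) for n in names}


# Hypothetical search space: the key name is an assumption for illustration.
example_choice = {"cp_prior": [0.001, 0.01, 0.1]}
grid = list(grid_candidates(example_choice))
sampled = list(random_candidates(example_choice, n_trials=5))
```

Each yielded dictionary has the {param_name: param_value} shape that set_prior accepts. Grid search visits every combination once, while random search draws a fixed number of trials and may repeat a combination.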

class kats.detectors.bocpd.BOCPDModelType(value)[source]¶

Bases: enum.Enum

Bayesian Online Change Point Detection model type.

Describes the type of predictive model used by the BOCPD algorithm.

class kats.detectors.bocpd.BOCPDetector(data: kats.consts.TimeSeriesData)[source]¶

Bases: kats.detectors.detector.Detector

Bayesian Online Changepoint Detection.

Given a univariate time series, this class performs changepoint detection, i.e. it tells us when the time series shows a change. This is online, which means it gives the best estimate based on a lookahead number of time steps (which is the lag).

This faithfully implements the algorithm in Adams & MacKay, 2007. “Bayesian Online Changepoint Detection” https://arxiv.org/abs/0710.3742

The basic idea is to see whether the new values are improbable when compared to a Bayesian predictive model built from the previous observations.
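
The idea can be sketched in a few lines of plain Python. The code below is a minimal illustration of the run-length recursion from the paper, assuming a normal likelihood with known precision, a normal prior on the segment mean, and a constant changepoint probability (hazard). It is not the Kats implementation; the function names and argument names are assumptions for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bocpd_run_length(xs, hazard=0.01, mu0=0.0, tau0=1.0, known_prec=1.0):
    """Return, for each time step, the posterior over the current run length.

    hazard is the constant prior probability that a point starts a new
    segment; mu0 and tau0 are the prior mean and precision of the segment mean.
    """
    probs = [1.0]   # P(run length = r) before any data is seen
    means = [mu0]   # posterior mean of the segment mean, per run length
    precs = [tau0]  # posterior precision of the segment mean, per run length
    history = []
    for x in xs:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, 1.0 / p + 1.0 / known_prec)
                for m, p in zip(means, precs)]
        # Growth: the run continues. Changepoint: the run resets to zero.
        growth = [pr * pd * (1.0 - hazard) for pr, pd in zip(probs, pred)]
        cp = sum(pr * pd * hazard for pr, pd in zip(probs, pred))
        joint = [cp] + growth
        total = sum(joint)
        probs = [j / total for j in joint]
        # Conjugate update of each hypothesis; run length 0 restarts at the prior.
        new_means = [(m * p + known_prec * x) / (p + known_prec)
                     for m, p in zip(means, precs)]
        new_precs = [p + known_prec for p in precs]
        means = [mu0] + new_means
        precs = [tau0] + new_precs
        history.append(probs)
    return history


# Synthetic series with a level shift from about 0 to about 10 at index 50.
xs = [0.5 if i % 2 == 0 else -0.5 for i in range(50)]
xs += [10.5 if i % 2 == 0 else 9.5 for i in range(50)]
history = bocpd_run_length(xs, hazard=0.01, mu0=0.0, tau0=0.01)
last = history[-1]
most_likely_run = max(range(len(last)), key=last.__getitem__)
```

In this sketch the most likely run length at the end of the series equals the number of points seen since the level shift. With a constant hazard the probability of run length zero always equals the hazard itself, which is one reason a lag is useful: evidence for a changepoint only accumulates in the steps after it.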

Attributes: data: TimeSeriesData, data on which we will run the BOCPD algorithm.

detector(model: kats.detectors.bocpd.BOCPDModelType = <BOCPDModelType.NORMAL_KNOWN_MODEL: 1>, model_parameters: Union[None, kats.detectors.bocpd.BOCPDModelParameters] = None, lag: int = 10, choose_priors: bool = True, changepoint_prior: float = 0.01, threshold: float = 0.5, debug: bool = False, agg_cp: bool = True) → List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]][source]¶

The main detector method.

This function runs the BOCPD detector and returns the list of changepoints, along with some metadata.

Parameters

model – This specifies the probabilistic model that generates the data within each segment. The user can choose among several model types depending on the behavior of the time series. Currently allowed models are: NORMAL_KNOWN_MODEL: Normal model with known variance. Use this to find level shifts in normally distributed data. TREND_CHANGE_MODEL: This model assumes each segment is generated from ordinary linear regression. Use this model to understand changes in slope, or trend, in a time series. POISSON_PROCESS_MODEL: This assumes a Poisson generative model. Use this for count data, where most of the values are close to zero.
model_parameters – Model Parameters correspond to specific parameters for a specific model. They are defined in the NormalKnownParameters, TrendChangeParameters, PoissonModelParameters classes.
lag – integer referring to the lag in reporting the changepoint. We report the changepoint after seeing “lag” number of data points. Higher lag gives greater certainty that this is indeed a changepoint. Lower lag will detect the changepoint faster. This is the tradeoff.
choose_priors – If True, then the hyperparameter tuning library (HPT) is used to choose the priors that maximize the posterior predictive.
changepoint_prior – This is a Bayesian algorithm. Hence, this parameter specifies the prior belief on the probability that a given point is a changepoint. For example, if you believe 10% of your data will be a changepoint, you can set this to 0.1.
threshold – We report the probability of observing the changepoint at each instant. The actual changepoints are obtained by marking the points whose probability is above this threshold as changepoints.
debug – This surfaces additional information, such as the plots of predicted means and variances, which allows the user to debug why changepoints were not properly detected.
agg_cp – Aggregating the run-length posterior has been found in testing to give a stronger signal for changepoint detection. When this parameter is True, the posterior is the aggregation of the run-length posterior, obtained by fetching maximum values diagonally.

Returns

Each element in this list is a changepoint, an object of the TimeSeriesChangePoint class. The start_time gives the time that the change was detected. The metadata contains data about the name of the time series (useful when multiple time series are run simultaneously), and the predictive model used.

Return type

List[Tuple[TimeSeriesChangePoint, BOCPDMetadata]]
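
The interplay between lag and threshold can be illustrated with a small sketch. It assumes that the changepoint probability for a point is read from the run-length posterior lag steps later, as the probability that the run length equals lag, and that points above threshold are then reported. The exact indexing inside Kats may differ; the function names and the toy posterior are assumptions for this example.

```python
def change_prob_with_lag(run_length_posterior, lag):
    """Changepoint probability per point, evaluated lag steps later.

    run_length_posterior[t][r] is assumed to hold P(run length = r) after
    seeing points 0..t. The last lag points have no estimate yet and stay 0.0.
    """
    n = len(run_length_posterior)
    probs = [0.0] * n
    for t in range(n - lag):
        row = run_length_posterior[t + lag]
        probs[t] = row[lag] if lag < len(row) else 0.0
    return probs


def points_above_threshold(change_prob, threshold=0.5):
    """Indices whose changepoint probability exceeds the threshold."""
    return [t for t, p in enumerate(change_prob) if p > threshold]


# Toy run-length posterior for five points (each row sums to 1).
toy_posterior = [
    [0.1, 0.9],
    [0.1, 0.1, 0.8],
    [0.1, 0.8, 0.1, 0.0],
    [0.1, 0.1, 0.8, 0.0, 0.0],
    [0.1, 0.1, 0.1, 0.7, 0.0, 0.0],
]
probs = change_prob_with_lag(toy_posterior, lag=1)
detected = points_above_threshold(probs, threshold=0.5)
```

A larger lag means each probability is based on more points after the candidate changepoint, at the cost of reporting it later, which is the tradeoff described for the lag parameter.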

get_change_prob() → Dict[str, numpy.ndarray][source]¶

Returns the probability of being a changepoint, for every point in the time series.

Parameters: None.
Returns: A dict, with the name of the timeseries as the key, and the value is an array of probabilities of the same length as the timeseries data.

get_run_length_matrix() → Dict[str, numpy.ndarray][source]¶

Returns the entire run-length posterior.

Parameters: None.
Returns: The return type is a dict, with the name of the timeseries as the key, and the value is an array of probabilities of the same length as the timeseries data.

group_changepoints_by_timeseries(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]]) → Dict[str, List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]]][source]¶

Helper function to group changepoints by time series.

For multivariate inputs, all changepoints are output in a list and the time series they correspond to is referenced in the metadata. This function is a helper function to group these changepoints by time series.

Parameters: change_points – List of changepoints, with metadata containing the time series names. This is the return value of the detector() method.
Returns: Dictionary mapping each time series name to its corresponding changepoints.
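
The grouping described above amounts to bucketing the (changepoint, metadata) tuples by the ts_name stored in the metadata. A minimal sketch follows; Metadata is a hypothetical stand-in for BOCPDMetadata, plain strings stand in for the changepoint objects, and group_by_ts_name is an illustrative name, not the Kats method.

```python
from collections import defaultdict
from typing import Any, Dict, List, NamedTuple, Tuple


class Metadata(NamedTuple):
    """Hypothetical stand-in for BOCPDMetadata; only ts_name is used here."""
    ts_name: str


def group_by_ts_name(
    change_points: List[Tuple[Any, Metadata]]
) -> Dict[str, List[Tuple[Any, Metadata]]]:
    """Bucket (changepoint, metadata) tuples by the time series name."""
    grouped = defaultdict(list)
    for cp, meta in change_points:
        grouped[meta.ts_name].append((cp, meta))
    return dict(grouped)


# Strings stand in for TimeSeriesChangePoint objects in this sketch.
flat = [("cp1", Metadata("a")), ("cp2", Metadata("b")), ("cp3", Metadata("a"))]
by_name = group_by_ts_name(flat)
```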

plot(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]], ts_names: Optional[List[str]] = None) → None [source]¶

Plots the change points, along with the time series.

Use this function to visualize the results of the changepoint detection.

Parameters

change_points – List of changepoints, which are the return value of the detector() function.
ts_names – List of names of the time series, useful in case multiple time series are used.

Returns

None.

class kats.detectors.bocpd.NormalKnownParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', empirical: bool = True, mean_prior: Optional[float] = None, mean_prec_prior: Optional[float] = None, known_prec: Optional[float] = None, known_prec_multiplier: float = 1.0)[source]¶

Bases: kats.detectors.bocpd.BOCPDModelParameters

Data class containing the parameters for Normal predictive model.

This assumes that the data comes from a normal distribution with known precision.

empirical¶

Boolean, whether the prior should be derived empirically. When this is True, the mean_prior, mean_prec_prior and known_prec are derived from the data, and don’t need to be specified.

Type: bool

mean_prior¶

float, mean of the prior normal distribution.

Type: Optional[float]

mean_prec_prior¶

float, precision of the prior normal distribution.

Type: Optional[float]

known_prec¶

float, known precision of the data.

Type: Optional[float]

known_prec_multiplier¶

float, a multiplier of the known precision. This variable is used in the hyperparameter search, where it is multiplied with the known_prec value.

Type: float

prior_choice¶

List of parameters to search, for hyperparameter tuning.

Type: Dict[str, List[float]]

set_prior(param_dict: Dict[str, float])[source]¶

Sets priors.

Sets the value of the prior based on the parameter dictionary passed.

Parameters: param_dict – Dictionary of parameters required for setting the prior value.
Returns: None.
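
For reference, the standard conjugate update for a normal likelihood with known precision is sketched below, using argument names that mirror mean_prior, mean_prec_prior and known_prec. This is the textbook normal-normal update and is offered as an assumption about how these quantities relate, not as a copy of the Kats code.

```python
def update_normal_mean(mean_prior, mean_prec_prior, known_prec, xs):
    """Posterior mean and precision of the segment mean after observing xs."""
    n = len(xs)
    post_prec = mean_prec_prior + n * known_prec
    post_mean = (mean_prec_prior * mean_prior + known_prec * sum(xs)) / post_prec
    return post_mean, post_prec


def predictive_variance(post_prec, known_prec):
    """Variance of the next observation: parameter uncertainty plus noise."""
    return 1.0 / post_prec + 1.0 / known_prec


post_mean, post_prec = update_normal_mean(0.0, 1.0, 1.0, [2.0, 4.0])
next_var = predictive_variance(post_prec, 1.0)
```

A larger known_prec makes the predictive distribution narrower, so the same deviation from the segment mean looks less probable and is more likely to be flagged.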

class kats.detectors.bocpd.PoissonModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', alpha_prior: float = 1.0, beta_prior: float = 0.05)[source]¶

Bases: kats.detectors.bocpd.BOCPDModelParameters

Parameters for the Poisson predictive model.

Here, the data is generated from a Poisson distribution.

alpha_prior¶

alpha parameter of the Gamma prior.

Type: float

beta_prior¶

beta parameter of the Gamma prior.

Type: float
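
The Gamma prior is conjugate to the Poisson likelihood, so the update has a closed form. The sketch below assumes beta is the rate parameter of the Gamma distribution (an assumption, since the parameterization is not stated here); under that assumption the posterior predictive is a negative binomial distribution. The function names are illustrative.

```python
import math


def update_gamma(alpha, beta, counts):
    """Posterior Gamma parameters after observing Poisson counts."""
    return alpha + sum(counts), beta + len(counts)


def predictive_pmf(k, alpha, beta):
    """Posterior predictive P(next count = k): negative binomial form."""
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(beta / (beta + 1.0))
             - k * math.log(beta + 1.0))
    return math.exp(log_p)


# Start from the default prior values listed in the class signature.
alpha_post, beta_post = update_gamma(1.0, 0.05, [3, 0, 2])
p_zero_prior = predictive_pmf(0, 1.0, 0.05)
```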

class kats.detectors.bocpd.TrendChangeParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', mu_prior: Optional[numpy.ndarray] = None, num_likelihood_samples: int = 100, num_points_prior: int = 10, readjust_sigma_prior: bool = False, plot_regression_prior: bool = False)[source]¶

Bases: kats.detectors.bocpd.BOCPDModelParameters

Parameters for the trend change predictive model.

This model assumes that the data is generated from a Bayesian linear model.

mu_prior¶

array, mean of the normal priors on the slope and intercept

Type: Optional[numpy.ndarray]

num_likelihood_samples¶

int, number of samples generated, to calculate the posterior.

Type: int

num_points_prior¶

int,

Type: int

readjust_sigma_prior¶

Boolean, whether we should readjust the Inv. Gamma prior for the variance, based on the data.

Type: bool

plot_regression_prior¶

Boolean, whether to plot the prior. Set to False unless trying to debug.

Type: bool

kats.detectors.bocpd.check_data(data: kats.consts.TimeSeriesData)[source]¶

Small helper function to check if the data is in the appropriate format.

Currently, this only checks if we have enough data points to run the algorithm meaningfully.

Parameters: data – TimeSeriesData object, on which to run the algorithm.
Returns: None.
