kats.detectors.bocpd module¶
This module contains classes and functions used for implementing the Bayesian Online Changepoint Detection algorithm.
- class kats.detectors.bocpd.BOCPDMetadata(model: kats.detectors.bocpd.BOCPDModelType, ts_name: Optional[str] = None)[source]¶
Bases:
object
Metadata for the BOCPD model.
This gives information about the type of detector, the name of the time series and the model used for detection.
- model¶
The kind of predictive model used.
- ts_name¶
string, name of the time series for which the detector is being run.
- class kats.detectors.bocpd.BOCPDModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random')[source]¶
Bases:
abc.ABC
Data class containing data for predictive models used in BOCPD.
Particular predictive models derive from this class.
- prior_choice¶
list of changepoint probability priors over which we will search hyperparameters
- search_method¶
string, representing the search method for the hyperparameter tuning library. Allowed values are ‘random’ and ‘gridsearch’.
- Type
str
- class kats.detectors.bocpd.BOCPDModelType(value)[source]¶
Bases:
enum.Enum
Bayesian Online Change Point Detection model type.
Describes the type of predictive model used by the BOCPD algorithm.
- class kats.detectors.bocpd.BOCPDetector(data: kats.consts.TimeSeriesData)[source]¶
Bases:
kats.detectors.detector.Detector
Bayesian Online Changepoint Detection.
Given a univariate time series, this class performs changepoint detection, i.e. it tells us when the time series shows a change. The detection is online, which means it gives the best estimate based on a lookahead of a fixed number of time steps (the lag).
This faithfully implements the algorithm in Adams & MacKay, 2007. “Bayesian Online Changepoint Detection” https://arxiv.org/abs/0710.3742
The basic idea is to check whether new values are improbable under a Bayesian predictive model built from the previous observations.
- Attributes:
data: TimeSeriesData, data on which we will run the BOCPD algorithm.
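The recursion described above can be illustrated with a minimal, standard-library sketch. This is not the Kats implementation: the function names (`run_length_posterior`, `lagged_change_prob`), the constant hazard rate, and the normal model with known precision are assumptions chosen for illustration, following the message-passing scheme of Adams & MacKay.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def run_length_posterior(xs, hazard=0.01, mean0=0.0, prec0=1.0, known_prec=1.0):
    """Return rows where rows[t][r] = P(run length == r | xs[:t]).

    Hypothetical helper: normal data with known precision and a normal
    prior (mean0, prec0) on the segment mean.
    """
    means, precs = [mean0], [prec0]  # one posterior over the mean per run length
    post = [1.0]                     # before any data, the run length is 0
    rows = [post]
    for x in xs:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, 1.0 / p + 1.0 / known_prec)
                for m, p in zip(means, precs)]
        weighted = [q * d for q, d in zip(post, pred)]
        # Either a changepoint occurs (run length resets to 0)
        # or every existing run grows by one step.
        joint = [hazard * sum(weighted)] + [(1.0 - hazard) * w for w in weighted]
        total = sum(joint)
        post = [j / total for j in joint]
        rows.append(post)
        # Conjugate update of each hypothesis; a fresh prior is
        # prepended for the new run of length 0.
        means = [mean0] + [(p * m + known_prec * x) / (p + known_prec)
                           for m, p in zip(means, precs)]
        precs = [prec0] + [p + known_prec for p in precs]
    return rows


def lagged_change_prob(rows, lag):
    """probs[k] = P(a segment started at index k), read `lag` steps later."""
    return [rows[k + lag][lag] for k in range(len(rows) - lag)]


# Deterministic toy series with a level shift at index 30.
xs = ([0.5 if i % 2 == 0 else -0.5 for i in range(30)]
      + [5.5 if i % 2 == 0 else 4.5 for i in range(30)])
rows = run_length_posterior(xs)
probs = lagged_change_prob(rows, lag=5)
# Index 0 is skipped: the first segment trivially "starts" there.
best = max(range(1, len(probs)), key=probs.__getitem__)
```

With a constant hazard, the normalized probability of run length 0 always equals the hazard itself, which is one way to see why the estimate has to be read a few steps later, at run length equal to the lag.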
- detector(model: kats.detectors.bocpd.BOCPDModelType = <BOCPDModelType.NORMAL_KNOWN_MODEL: 1>, model_parameters: Union[None, kats.detectors.bocpd.BOCPDModelParameters] = None, lag: int = 10, choose_priors: bool = True, changepoint_prior: float = 0.01, threshold: float = 0.5, debug: bool = False, agg_cp: bool = True) → List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]][source]¶
The main detector method.
This function runs the BOCPD detector and returns the list of changepoints, along with some metadata.
- Parameters
model – Specifies the probabilistic model that generates the data within each segment. The user can choose among several model types, depending on the behavior of the time series. Currently allowed models are: NORMAL_KNOWN_MODEL: Normal model with known variance. Use this to find level shifts in normally distributed data. TREND_CHANGE_MODEL: Assumes each segment is generated by ordinary linear regression. Use this model to detect changes in slope, or trend, in the time series. POISSON_PROCESS_MODEL: Assumes a Poisson generative model. Use this for count data, where most of the values are close to zero.
model_parameters – Model Parameters correspond to specific parameters for a specific model. They are defined in the NormalKnownParameters, TrendChangeParameters, PoissonModelParameters classes.
lag – integer referring to the lag in reporting the changepoint. We report the changepoint after seeing “lag” number of data points. Higher lag gives greater certainty that this is indeed a changepoint. Lower lag will detect the changepoint faster. This is the tradeoff.
choose_priors – If True, the hyperparameter tuning library (HPT) is used to choose the priors that maximize the posterior predictive.
changepoint_prior – This is a Bayesian algorithm. Hence, this parameter specifies the prior belief on the probability that a given point is a changepoint. For example, if you believe 10% of your data points are changepoints, you can set this to 0.1.
threshold – We report the probability of observing a changepoint at each instant. The actual changepoints are obtained by marking the points whose probability is above this threshold as changepoints.
debug – This surfaces additional information, such as plots of the predicted means and variances, which allows the user to debug why changepoints were not properly detected.
agg_cp – Aggregating the run-length posterior has been tested and is believed to give a stronger signal for changepoint detection. When this parameter is True, the posterior is aggregated by taking maximum values along the diagonals of the run-length posterior.
- Returns
Each element in this list is a tuple of a changepoint, an object of the TimeSeriesChangePoint class, and its metadata. The start_time gives the time at which the change was detected. The metadata contains the name of the time series (useful when multiple time series are run simultaneously) and the predictive model used.
- Return type
List[Tuple[TimeSeriesChangePoint, BOCPDMetadata]]
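The threshold and agg_cp descriptions above can be made concrete with a small sketch. It is one possible reading, not the Kats code: the helper names (`diagonal_max`, `points_above`) are hypothetical, and the matrix layout, where `rows[t][r]` is the probability of run length `r` after `t` observations, is an assumption. Under that layout, the hypothesis "a segment started at index k" lies on the diagonal `rows[k + r][r]`.

```python
def diagonal_max(rows):
    """agg[k] = strongest evidence, over all run lengths, that a segment started at k.

    Assumes rows[t] has t + 1 entries (a lower-triangular layout).
    """
    n = len(rows)
    return [max(rows[k + r][r] for r in range(n - k)) for k in range(n)]


def points_above(probs, threshold=0.5):
    """Indices whose probability is strictly above the threshold."""
    return [i for i, p in enumerate(probs) if p > threshold]


# Hand-made toy posterior, purely illustrative.
toy_rows = [
    [1.0],
    [0.1, 0.9],
    [0.1, 0.2, 0.7],
    [0.1, 0.6, 0.1, 0.2],
]
agg = diagonal_max(toy_rows)
detected = points_above(agg, threshold=0.5)
```

In this toy matrix, index 0 is flagged only because the series itself starts there. A practical detector would typically ignore that trivial case.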
- get_change_prob() → Dict[str, numpy.ndarray][source]¶
Returns the probability of being a changepoint.
- Parameters
None.
- Returns
The changepoint probability for every point in the time series. The return type is a dict, with the name of the time series as the key and, as the value, an array of probabilities of the same length as the time series data.
- get_run_length_matrix() → Dict[str, numpy.ndarray][source]¶
Returns the entire run-length posterior.
- Parameters
None.
- Returns
The return type is a dict, with the name of the timeseries as the key, and the value is an array of probabilities of the same length as the timeseries data.
- group_changepoints_by_timeseries(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]]) → Dict[str, List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]]][source]¶
Helper function to group changepoints by time series.
For multivariate inputs, all changepoints are output in a list and the time series they correspond to is referenced in the metadata. This function is a helper function to group these changepoints by time series.
- Parameters
change_points – List of changepoints, with metadata containing the time series names. This is the return value of the detector() method.
- Returns
Dictionary mapping each time series name to its corresponding changepoints.
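The grouping described here can be sketched as follows. The `ChangePoint` and `Metadata` classes are simplified, hypothetical stand-ins for TimeSeriesChangePoint and BOCPDMetadata, and `group_by_timeseries` is an illustrative function, not the method's actual source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ChangePoint:
    """Hypothetical stand-in for TimeSeriesChangePoint."""
    start_time: str


@dataclass
class Metadata:
    """Hypothetical stand-in for BOCPDMetadata."""
    model: str
    ts_name: Optional[str] = None


def group_by_timeseries(
    change_points: List[Tuple[ChangePoint, Metadata]],
) -> Dict[Optional[str], List[Tuple[ChangePoint, Metadata]]]:
    """Group (changepoint, metadata) tuples by the time series name in the metadata."""
    grouped = defaultdict(list)
    for change_point, metadata in change_points:
        grouped[metadata.ts_name].append((change_point, metadata))
    return dict(grouped)


flat = [
    (ChangePoint("2021-01-10"), Metadata("NORMAL_KNOWN_MODEL", "cpu")),
    (ChangePoint("2021-02-03"), Metadata("NORMAL_KNOWN_MODEL", "memory")),
    (ChangePoint("2021-03-15"), Metadata("NORMAL_KNOWN_MODEL", "cpu")),
]
by_name = group_by_timeseries(flat)
```

The order of changepoints within each group follows their order in the input list.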
- plot(change_points: List[Tuple[kats.consts.TimeSeriesChangePoint, kats.detectors.bocpd.BOCPDMetadata]], ts_names: Optional[List[str]] = None) → None[source]¶
Plots the change points, along with the time series.
Use this function to visualize the results of the changepoint detection.
- Parameters
change_points – List of changepoints, which are the return value of the detector() function.
ts_names – List of names of the time series, useful in case multiple time series are used.
- Returns
None.
- class kats.detectors.bocpd.NormalKnownParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', empirical: bool = True, mean_prior: Optional[float] = None, mean_prec_prior: Optional[float] = None, known_prec: Optional[float] = None, known_prec_multiplier: float = 1.0)[source]¶
Bases:
kats.detectors.bocpd.BOCPDModelParameters
Data class containing the parameters for Normal predictive model.
This assumes that the data comes from a normal distribution with known precision.
- empirical¶
Boolean, whether to derive the prior empirically. When this is true, the mean_prior, mean_prec_prior and known_prec are derived from the data, and don’t need to be specified.
- Type
bool
- known_prec_multiplier¶
float, a multiplier of the known precision. This variable is used in the hyperparameter search, where it is multiplied with the known_prec value.
- Type
float
- class kats.detectors.bocpd.PoissonModelParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', alpha_prior: float = 1.0, beta_prior: float = 0.05)[source]¶
Bases:
kats.detectors.bocpd.BOCPDModelParameters
Parameters for the Poisson predictive model.
Here, the data is generated from a Poisson distribution.
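For intuition about how a Poisson predictive model can use alpha_prior and beta_prior, the sketch below assumes a conjugate Gamma(alpha, beta) prior on the Poisson rate with beta as a rate parameter. This parameterization is an assumption suggested by the parameter names and is not confirmed by this page. The function names are hypothetical. Under that assumption, the posterior predictive is a negative binomial distribution.

```python
import math


def gamma_poisson_update(alpha, beta, observations):
    """Posterior Gamma parameters after observing Poisson counts."""
    return alpha + sum(observations), beta + len(observations)


def predictive_pmf(x, alpha, beta):
    """Posterior predictive P(x) under a Gamma(alpha, beta) belief about the rate.

    This is a negative binomial pmf, computed in log space for stability.
    """
    log_p = (
        math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - x * math.log(beta + 1.0)
    )
    return math.exp(log_p)


# Start from the default values shown in the signature, then observe four counts.
alpha, beta = gamma_poisson_update(1.0, 0.05, [2, 3, 1, 4])
total_mass = sum(predictive_pmf(x, alpha, beta) for x in range(200))
predictive_mean = sum(x * predictive_pmf(x, alpha, beta) for x in range(200))
```

In this parameterization the predictive mean equals alpha / beta, so the observed counts quickly pull the prediction away from the prior mean of 1.0 / 0.05 = 20.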
- class kats.detectors.bocpd.TrendChangeParameters(data: Optional[kats.consts.TimeSeriesData] = None, prior_choice: Dict[str, List[float]] = <factory>, cp_prior: float = 0.1, search_method: str = 'random', mu_prior: Optional[numpy.ndarray] = None, num_likelihood_samples: int = 100, num_points_prior: int = 10, readjust_sigma_prior: bool = False, plot_regression_prior: bool = False)[source]¶
Bases:
kats.detectors.bocpd.BOCPDModelParameters
Parameters for the trend change predictive model.
This model assumes that the data is generated from a Bayesian linear model.
- mu_prior¶
array, mean of the normal priors on the slope and intercept
- Type
Optional[numpy.ndarray]
- prior for the variance, based on the data.
- kats.detectors.bocpd.check_data(data: kats.consts.TimeSeriesData)[source]¶
Small helper function to check if the data is in the appropriate format.
Currently, this only checks if we have enough data points to run the algorithm meaningfully.
- Parameters
data – TimeSeriesData object, on which to run the algorithm.
- Returns
None.