kats.consts module

This module contains some of the key data structures in the Kats library, including TimeSeriesData, TimeSeriesChangePoint, and TSIterator.

TimeSeriesChangePoint is the return type of many of the Kats detection algorithms.

TimeSeriesData is the fundamental data structure in the Kats library. It gives users access to a host of forecasting, detection, and utility algorithms.

class kats.consts.ModelEnum(value)[source]

Bases: enum.Enum

This enum lists the options of models to be set for default search space in hyper-parameter tuning.

class kats.consts.OperationsEnum(value)[source]

Bases: enum.Enum

This enum lists all the mathematical operations that can be performed on TimeSeriesData objects.

class kats.consts.SearchMethodEnum(value)[source]

Bases: enum.Enum

This enum lists the options of search algorithms to be used in hyper-parameter tuning.

class kats.consts.TSIterator(ts: kats.consts.TimeSeriesData)[source]

Bases: object

Iterates through the values of a single timeseries.

Produces a timeseries with a single point for a univariate time series, or a timeseries with an array of the values at the given location for a multivariate time series.

ts

The input timeseries.
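The iteration pattern described above can be sketched in plain Python. This is an illustration only, not the Kats implementation: the class name `PointIterator` is hypothetical, and it yields `(time, value)` tuples from plain lists rather than single-point timeseries objects.

```python
class PointIterator:
    """Yield one (time, value) observation at a time.

    Hypothetical stand-in for the behaviour described for TSIterator.
    """

    def __init__(self, times, values):
        self.times = times
        self.values = values
        self._position = 0

    def __iter__(self):
        # Restart from the first observation each time iteration begins.
        self._position = 0
        return self

    def __next__(self):
        if self._position >= len(self.times):
            raise StopIteration
        point = (self.times[self._position], self.values[self._position])
        self._position += 1
        return point


# Univariate: each value is a scalar.
univariate = list(PointIterator(["2021-01-01", "2021-01-02"], [1.0, 2.0]))

# Multivariate: each value is a list holding one entry per variable.
multivariate = list(
    PointIterator(["2021-01-01", "2021-01-02"], [[1.0, 10.0], [2.0, 20.0]])
)
```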

class kats.consts.TimeSeriesChangePoint(start_time, end_time, confidence: float)[source]

Bases: object

Object returned by detector classes.

start_time

Start time of the change.

end_time

End time of the change.

confidence

The confidence of the change point.
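A common way to consume such objects is to filter them by confidence. The sketch below uses a hypothetical `ChangePoint` dataclass that only mirrors the three documented attributes; it is not the Kats class, and the 0.9 threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChangePoint:
    """Hypothetical stand-in mirroring the three documented attributes."""

    start_time: datetime
    end_time: datetime
    confidence: float


def confident_changes(change_points, threshold=0.9):
    """Keep only change points whose confidence meets the threshold."""
    return [cp for cp in change_points if cp.confidence >= threshold]


detected = [
    ChangePoint(datetime(2021, 1, 5), datetime(2021, 1, 5), 0.97),
    ChangePoint(datetime(2021, 2, 1), datetime(2021, 2, 1), 0.42),
]
kept = confident_changes(detected, threshold=0.9)
```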

class kats.consts.TimeSeriesData(df: Optional[pandas.core.frame.DataFrame] = None, sort_by_time: bool = True, time: Optional[Union[pandas.core.series.Series, pandas.core.indexes.datetimes.DatetimeIndex]] = None, value: Optional[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]] = None, time_col_name: str = 'time', date_format: Optional[str] = None, use_unix_time: bool = False, unix_time_units: str = 'ns', tz: Optional[str] = None, tz_ambiguous: Union[str, numpy.ndarray] = 'raise', tz_nonexistent: str = 'raise')[source]

Bases: object

The fundamental Kats data structure to store a time series.

In order to access much of the functionality in the Kats library, users must initialize the TimeSeriesData class with their data first.

Initialization. TimeSeriesData can be initialized from the following data sources:

  • pandas.DataFrame

  • pandas.Series

  • pandas.DatetimeIndex

Typical usage example for initialization:

>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> df = pd.read_csv("/kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df, time_col_name="ds")

Initialization arguments (all are optional, but you must choose one way to initialize, e.g. from a pandas.DataFrame):

  • df: A pandas.DataFrame storing the time series (default None).

  • sort_by_time: A boolean indicating whether the TimeSeriesData should be sorted by time (default True).

  • time: A pandas.Series or pandas.DatetimeIndex storing the time values (default None).

  • value: A pandas.Series or pandas.DataFrame storing the series value(s) (default None).

  • time_col_name: A string representing the name of the time column (default “time”).

  • date_format: A string specifying the format of the date/time in the time column. Useful for faster parsing, and required if pandas.to_datetime() cannot parse the column otherwise (default None).

  • use_unix_time: A boolean indicating if the time is represented as unix time (default False).

  • unix_time_units: A string indicating the units of the unix time, only used if use_unix_time=True (default “ns”).

  • tz: A string representing the timezone of the time values (default None).

  • tz_ambiguous: A string or numpy.ndarray representing how to handle ambiguous timezone values (default “raise”).

  • tz_nonexistent: A string representing how to handle nonexistent timezone values (default “raise”).

Raises

ValueError – Invalid params passed when trying to create the TimeSeriesData.
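To make the role of unix_time_units concrete, the sketch below converts integer Unix timestamps in different units to the same point in time. It is a plain-Python illustration, not what Kats does internally: the helper `from_unix` and the `UNITS_PER_SECOND` table are hypothetical, and the unit strings are assumed to follow the "s", "ms", "us", "ns" spellings used by pandas.to_datetime().

```python
from datetime import datetime, timedelta, timezone

# How many of each unit make up one second.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix(value, unit="ns"):
    """Convert an integer Unix timestamp in the given unit to a UTC datetime.

    Hypothetical helper; sub-microsecond precision is truncated because
    datetime stores microseconds at most.
    """
    per_second = UNITS_PER_SECOND[unit]
    seconds, remainder = divmod(value, per_second)
    microseconds = remainder * 10**6 // per_second
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


# The same instant expressed in seconds and in nanoseconds.
as_seconds = from_unix(1_600_000_000, unit="s")
as_nanoseconds = from_unix(1_600_000_000 * 10**9, unit="ns")
```

Passing the wrong unit silently shifts every timestamp by orders of magnitude, which is why the unit has to match the data when use_unix_time=True.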

Operations. Many operations that you can do with pandas.DataFrame objects are also applicable to TimeSeriesData. For example:

>>> ts[0:2] # Slicing
>>> ts_1 == ts_2 # Equality
>>> ts_1.extend(ts_2) # Extend
>>> ts.plot(cols=["y"]) # Visualize

Utility Functions. Many utility functions for converting TimeSeriesData objects to other common data structures exist. For example:

>>> ts.to_dataframe() # Convert to pandas.DataFrame
>>> ts.to_array() # Convert to numpy.ndarray

time

A pandas.Series object storing the time values of the time series.

value

A pandas.Series (if univariate) or pandas.DataFrame (if multivariate) object storing the values of each field in the time series.

min

A float or pandas.Series representing the min value(s) of the time series.

max

A float or pandas.Series representing the max value(s) of the time series.

extend(other: object, validate: bool = True) → None[source]

Extends TimeSeriesData with another TimeSeriesData object.

Parameters
  • other – The other TimeSeriesData object (currently only other TimeSeriesData objects are supported).

  • validate (optional) – A boolean representing if the new TimeSeriesData should be validated (default True).

Raises

ValueError – The object passed was not an instance of TimeSeriesData.

freq_to_timedelta()[source]

Returns a pandas.Timedelta representation of the TimeSeriesData frequency.

Returns

A pandas.Timedelta object representing the frequency of the TimeSeriesData.

infer_freq_robust() → pandas._libs.tslibs.timedeltas.Timedelta[source]

This method is a more robust way to infer the frequency of the time series in the presence of missing data. It looks at the diff of the time series, and decides the frequency by majority voting.

Returns

A pandas.Timedelta object representing the frequency of the series.

Raises

ValueError – The TimeSeriesData has fewer than 2 data points.
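The majority-voting idea can be sketched in plain Python. This is an illustration only, not the Kats implementation: the helper name `infer_freq_by_majority` is hypothetical, and it works on a list of `datetime` objects rather than on a TimeSeriesData.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_freq_by_majority(times):
    """Return the most common gap between consecutive sorted timestamps.

    Hypothetical helper, not part of Kats.
    """
    if len(times) < 2:
        raise ValueError("Need at least 2 data points to infer a frequency.")
    ordered = sorted(times)
    # Differences between neighbouring timestamps (the "diff" of the series).
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Majority vote: the gap that occurs most often wins.
    most_common_gap, _count = Counter(gaps).most_common(1)[0]
    return most_common_gap


# Daily data with 2021-01-04 missing: the gaps are 1, 1, 2 and 1 days.
days = [datetime(2021, 1, d) for d in (1, 2, 3, 5, 6)]
inferred = infer_freq_by_majority(days)
```

Because a single missing observation produces only one outlying gap, the vote still recovers the one-day spacing here.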

interpolate(freq: Optional[Union[str, pandas._libs.tslibs.timedeltas.Timedelta]] = None, method: str = 'linear', remove_duplicate_time=False) → kats.consts.TimeSeriesData[source]

Interpolates missing dates if the time values do not have a constant frequency.

The following options are available:
  • linear

  • backward fill

  • forward fill

See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html for more detail on these options.

Parameters
  • freq – A string or pandas.Timedelta representing the pre-defined frequency of the time series (default None).

  • method – A string representing the method to impute the missing time and data. See the above options (default “linear”).

  • remove_duplicate_time – A boolean to auto-remove any duplicate time values, as interpolation cannot proceed with duplicates due to the need to index on time (default False).

Returns

A new TimeSeriesData object with interpolated data.
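The three fill options can be illustrated with a small plain-Python sketch. It is not the Kats implementation: the helper `fill_on_regular_grid` and the method names "ffill" and "bfill" are hypothetical, and it assumes the observed timestamps are sorted, unique, and all fall on the regular grid defined by `freq`.

```python
from datetime import datetime, timedelta


def fill_on_regular_grid(times, values, freq, method="linear"):
    """Rebuild a series on a regular grid and fill the gaps.

    Hypothetical helper, not part of Kats. `method` is one of
    "linear", "ffill" (forward fill) or "bfill" (backward fill).
    """
    known = dict(zip(times, values))

    # Build every timestamp from the first to the last observation.
    grid, current = [], times[0]
    while current <= times[-1]:
        grid.append(current)
        current += freq

    filled = []
    for t in grid:
        if t in known:
            filled.append(known[t])
            continue
        # Nearest observed neighbours on each side of the missing timestamp.
        prev_t = max(k for k in times if k < t)
        next_t = min(k for k in times if k > t)
        if method == "ffill":
            filled.append(known[prev_t])
        elif method == "bfill":
            filled.append(known[next_t])
        elif method == "linear":
            weight = (t - prev_t) / (next_t - prev_t)
            filled.append(known[prev_t] + weight * (known[next_t] - known[prev_t]))
        else:
            raise ValueError(f"Unknown method: {method}")
    return grid, filled


# Daily series with 2021-01-03 missing.
observed_times = [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 4)]
observed_values = [1.0, 2.0, 4.0]
one_day = timedelta(days=1)

grid, linear = fill_on_regular_grid(observed_times, observed_values, one_day, "linear")
_, forward = fill_on_regular_grid(observed_times, observed_values, one_day, "ffill")
_, backward = fill_on_regular_grid(observed_times, observed_values, one_day, "bfill")
```

Linear interpolation places the missing value between its neighbours, forward fill repeats the last observed value, and backward fill copies the next observed value.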

is_data_missing() → bool[source]

Checks if data is missing from the time series.

This is very similar to validate_data() but will not raise an error.

Returns

True when data is missing from the time series. Otherwise False.

is_empty() → bool[source]

Checks if the TimeSeriesData is empty.

Returns

True if the TimeSeriesData does not have any datapoints. Otherwise False.

is_univariate()[source]

Returns whether the TimeSeriesData is univariate.

Returns

True if the TimeSeriesData is univariate. False otherwise.

property max: Union[pandas.core.series.Series, float]

Returns the max value(s) of the series.

Returns

A pandas.Series or float representing the max value(s) of the time series.

property min: Union[pandas.core.series.Series, float]

Returns the min value(s) of the series.

Returns

A pandas.Series or float representing the min value(s) of the time series.

plot(cols: List[str]) → None[source]

Plots the time series.

Parameters

cols – List of variables (strings) to plot (against time).

property time: pandas.core.series.Series

Returns the time values of the series.

Returns

A pandas.Series representing the time values of the time series.

time_to_index() → pandas.core.indexes.datetimes.DatetimeIndex[source]

Utility function converting the time in the TimeSeriesData object to a pandas.DatetimeIndex.

Returns

A pandas.DatetimeIndex representation of the time values of the series.

to_array() → numpy.ndarray[source]

Converts the TimeSeriesData object to a numpy.ndarray.

Returns

A numpy.ndarray representation of the time series.

to_dataframe(standard_time_col_name: bool = False) → pandas.core.frame.DataFrame[source]

Converts the TimeSeriesData object into a pandas.DataFrame.

Parameters

standard_time_col_name (optional) – True if the DataFrame’s time column name should be “time”. To keep the same time column name as the current TimeSeriesData object, leave as False (default False).

tz() → Optional[Union[datetime.tzinfo, dateutil.tz.tz.tzfile]][source]

Returns the timezone of the TimeSeriesData.

Returns

A timezone-aware object representing the timezone of the TimeSeriesData. Returns None when there is no timezone present.

For more info, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.tz.html.

validate_data(validate_frequency: bool, validate_dimension: bool) → None[source]

Validates the time series for correctness (on both frequency and dimension).

Parameters
  • validate_frequency – A boolean indicating whether the TimeSeriesData should be validated for constant frequency.

  • validate_dimension – A boolean indicating whether the TimeSeriesData should be validated for having the same number of timesteps and values.

Raises

ValueError – The frequency and/or dimensions were invalid.
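The two checks, and the difference between a raising validator and a boolean check in the style of is_data_missing(), can be sketched in plain Python. The helpers `check_series` and `has_problem` are hypothetical and are not the Kats implementation. In particular, this sketch treats "constant frequency" as "all gaps between consecutive timestamps are equal", which is a simplifying assumption that would reject calendar frequencies such as monthly data.

```python
from datetime import datetime, timedelta


def check_series(times, values, validate_frequency=True, validate_dimension=True):
    """Raise ValueError if the series fails the requested checks.

    Hypothetical helper, not part of Kats.
    """
    if validate_dimension and len(times) != len(values):
        raise ValueError("Number of timesteps and number of values differ.")
    if validate_frequency:
        # Collect the distinct gaps between consecutive timestamps.
        gaps = {later - earlier for earlier, later in zip(times, times[1:])}
        if len(gaps) > 1:
            raise ValueError("Frequency is not constant.")


def has_problem(times, values):
    """Non-raising variant: report a failed check as a boolean."""
    try:
        check_series(times, values)
    except ValueError:
        return True
    return False


start = datetime(2021, 1, 1)
regular = [start + timedelta(days=i) for i in range(4)]
gappy = [start + timedelta(days=i) for i in (0, 1, 3)]
```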

property value: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Returns the value(s) of the series.

Returns

A pandas.Series or pandas.DataFrame representing the value(s) of the time series.