kats.consts module¶
This module contains some of the key data structures in the Kats library,
including TimeSeriesData, TimeSeriesChangePoint, and
TimeSeriesIterator.
TimeSeriesChangePoint is the return type of many of the Kats detection
algorithms.
TimeSeriesData is the fundamental data structure in the Kats library. It
gives users access to a host of forecasting, detection, and utility
algorithms.
- class kats.consts.ModelEnum(value)[source]¶
Bases: enum.Enum
This enum lists the models that can be selected for the default search space in hyper-parameter tuning.
- class kats.consts.OperationsEnum(value)[source]¶
Bases: enum.Enum
This enum lists all the mathematical operations that can be performed on
TimeSeriesData objects.
- class kats.consts.SearchMethodEnum(value)[source]¶
Bases: enum.Enum
This enum lists the options of search algorithms to be used in hyper-parameter tuning.
- class kats.consts.TSIterator(ts: kats.consts.TimeSeriesData)[source]¶
Bases: object
Iterates through the values of a single time series.
Produces a time series with a single point for a univariate time series, or a time series with an array of the values at the given location for a multivariate time series.
- ts¶
The input timeseries.
- class kats.consts.TimeSeriesChangePoint(start_time, end_time, confidence: float)[source]¶
Bases: object
Object returned by detector classes.
- start_time¶
Start time of the change.
- end_time¶
End time of the change.
- confidence¶
The confidence of the change point.
- class kats.consts.TimeSeriesData(df: Optional[pandas.core.frame.DataFrame] = None, sort_by_time: bool = True, time: Optional[Union[pandas.core.series.Series, pandas.core.indexes.datetimes.DatetimeIndex]] = None, value: Optional[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]] = None, time_col_name: str = 'time', date_format: Optional[str] = None, use_unix_time: bool = False, unix_time_units: str = 'ns', tz: Optional[str] = None, tz_ambiguous: Union[str, numpy.ndarray] = 'raise', tz_nonexistent: str = 'raise')[source]¶
Bases: object
The fundamental Kats data structure to store a time series.
In order to access much of the functionality in the Kats library, users must first initialize the
TimeSeriesData class with their data.
Initialization.
TimeSeriesData can be initialized from the following data sources:
- pandas.DataFrame
- pandas.Series
- pandas.DatetimeIndex
Typical usage example for initialization:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> df = pd.read_csv("/kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df, time_col_name="ds")
Initialization arguments (all optional, but one way of initializing must be chosen, e.g. passing a pandas.DataFrame):
- df: A pandas.DataFrame storing the time series (default None).
- sort_by_time: A boolean indicating whether the
TimeSeriesData should be sorted by time (default True).
- time: A pandas.Series or pandas.DatetimeIndex storing the time
values (default None).
- value: A pandas.Series or pandas.DataFrame storing the series value(s)
(default None).
- time_col_name: A string representing the name of the time column
(default “time”).
- date_format: A string specifying the format of the date/time in the
time column. Useful for faster parsing, and required if pandas.to_datetime() cannot parse the column otherwise (default None).
- use_unix_time: A boolean indicating if the time is represented as
unix time (default False).
- unix_time_units: A string indicating the units of the unix time; only
used if use_unix_time=True (default “ns”).
- tz: A string representing the timezone of the time values (default None).
- tz_ambiguous: A string representing how to handle ambiguous timezones
(default “raise”).
- tz_nonexistent: A string representing how to handle nonexistent timezone
values (default “raise”).
- Raises
ValueError – Invalid params passed when trying to create the
TimeSeriesData.
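The time-related arguments (use_unix_time, unix_time_units, tz, tz_ambiguous, tz_nonexistent) have close counterparts in pandas. The sketch below uses pandas directly rather than Kats, and it assumes the arguments are forwarded to pandas.to_datetime and tz_localize in roughly this way. The sample values and the chosen timezone are made up for illustration.

```python
import pandas as pd

# Hypothetical unix timestamps in seconds (compare unix_time_units="s").
epoch_seconds = pd.Series([0, 86400, 172800])
times = pd.to_datetime(epoch_seconds, unit="s")

# Attach a timezone to naive timestamps (compare tz="America/Los_Angeles").
localized = times.dt.tz_localize(
    "America/Los_Angeles", ambiguous="raise", nonexistent="raise"
)

# 02:30 on 2021-03-14 does not exist in this timezone because clocks jump
# from 02:00 to 03:00. With nonexistent="raise" pandas raises an error;
# "shift_forward" moves the value to the next valid time instead.
skipped = pd.Series(pd.to_datetime(["2021-03-14 02:30:00"]))
shifted = skipped.dt.tz_localize("America/Los_Angeles", nonexistent="shift_forward")
```

Whether Kats accepts values other than “raise” for tz_ambiguous and tz_nonexistent is not stated here; the alternative shown is a pandas option.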
Operations. Many operations that you can do with pandas.DataFrame objects are also applicable to
TimeSeriesData. For example:
>>> ts[0:2] # Slicing
>>> ts_1 == ts_2 # Equality
>>> ts_1.extend(ts_2) # Extend
>>> ts.plot(cols=["y"]) # Visualize
Utility Functions. Many utility functions for converting
TimeSeriesData objects to other common data structures exist. For example:
>>> ts.to_dataframe() # Convert to pandas.DataFrame
>>> ts.to_array() # Convert to numpy.ndarray
- time¶
A pandas.Series object storing the time values of the time series.
- value¶
A pandas.Series (if univariate) or pandas.DataFrame (if multivariate) object storing the values of each field in the time series.
- min¶
A float or pandas.Series representing the min value(s) of the time series.
- max¶
A float or pandas.Series representing the max value(s) of the time series.
- extend(other: object, validate: bool = True) → None[source]¶
Extends TimeSeriesData with another TimeSeriesData object.
- Parameters
other – The other TimeSeriesData object (currently only other TimeSeriesData objects are supported).
validate (optional) – A boolean representing if the new TimeSeriesData should be validated (default True).
- Raises
ValueError – The object passed was not an instance of TimeSeriesData.
- freq_to_timedelta()[source]¶
Returns a pandas.Timedelta representation of the TimeSeriesData frequency.
- Returns
A pandas.Timedelta object representing the frequency of the TimeSeriesData.
- infer_freq_robust() → pandas._libs.tslibs.timedeltas.Timedelta[source]¶
This method is a more robust way to infer the frequency of the time series in the presence of missing data. It looks at the diff of the time series, and decides the frequency by majority voting.
- Returns
A pandas.Timedelta object representing the frequency of the series.
- Raises
ValueError – The TimeSeriesData has fewer than 2 data points.
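The majority-voting idea described for infer_freq_robust() can be sketched in plain Python. This is only an illustration under stated assumptions, not the Kats implementation: the helper name majority_frequency and the sample dates are hypothetical, and it works on datetime objects instead of a TimeSeriesData.

```python
from collections import Counter
from datetime import datetime, timedelta


def majority_frequency(times):
    """Hypothetical helper: return the most common gap between consecutive timestamps."""
    if len(times) < 2:
        raise ValueError("Need at least 2 data points to infer a frequency.")
    ordered = sorted(times)
    # Differences between neighbouring timestamps (the "diff" of the series).
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    # Majority vote: the gap that occurs most often wins.
    most_common_gap, _count = Counter(gaps).most_common(1)[0]
    return most_common_gap


# Daily data with 2021-01-04 missing: the gaps are 1, 1, 2, 1, 1 days.
times = [datetime(2021, 1, day) for day in (1, 2, 3, 5, 6, 7)]
freq = majority_frequency(times)
```

Because one missing day produces only a single two-day gap, the one-day gap still wins the vote, which is why this approach tolerates missing data.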
- interpolate(freq: Optional[Union[str, pandas._libs.tslibs.timedeltas.Timedelta]] = None, method: str = 'linear', remove_duplicate_time=False) → kats.consts.TimeSeriesData[source]¶
Interpolates missing time points and their values when the time values do not have a constant frequency.
The following options are available:
- linear
- backward fill
- forward fill
See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html for more detail on these options.
- Parameters
freq – A string representing the pre-defined freq of the time series.
method – A string representing the method to impute the missing time and data. See the above options (default “linear”).
remove_duplicate_time – A boolean indicating whether to automatically remove any duplicate time values, since interpolation needs to index on time (default False).
- Returns
A new TimeSeriesData object with interpolated data.
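To make the three options concrete, here is a minimal pandas-only sketch of what they do to a gap in a daily series. It does not call Kats; the sample series is made up, and the mapping from the interpolate() options to these pandas calls is an assumption based on the linked pandas documentation.

```python
import pandas as pd

# Hypothetical daily series with 2021-01-03 missing.
values = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04"]),
)

# Reindex onto a constant daily grid; the missing day becomes NaN.
regular = values.asfreq("D")

linear = regular.interpolate(method="linear")  # fills the gap between neighbours
forward = regular.ffill()                      # repeats the previous value
backward = regular.bfill()                     # uses the next value
```

For the missing day, linear interpolation gives 3.0, forward fill gives 2.0, and backward fill gives 4.0.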
- is_data_missing() → bool[source]¶
Checks if data is missing from the time series.
This is very similar to validate_data() but will not raise an error.
- Returns
True when data is missing from the time series. Otherwise False.
- is_empty() → bool[source]¶
Checks if the TimeSeriesData is empty.
- Returns
True if TimeSeriesData does not have any datapoints. Otherwise False.
- is_univariate()[source]¶
Returns whether the TimeSeriesData is univariate.
- Returns
True if the TimeSeriesData is univariate. False otherwise.
- property max: Union[pandas.core.series.Series, float]¶
Returns the max value(s) of the series.
- Returns
A pandas.Series or float representing the max value(s) of the time series.
- property min: Union[pandas.core.series.Series, float]¶
Returns the min value(s) of the series.
- Returns
A pandas.Series or float representing the min value(s) of the time series.
- plot(cols: List[str]) → None[source]¶
Plots the time series.
- Parameters
cols – List of variables (strings) to plot (against time).
- property time: pandas.core.series.Series¶
Returns the time values of the series.
- Returns
A pandas.Series representing the time values of the time series.
- time_to_index() → pandas.core.indexes.datetimes.DatetimeIndex[source]¶
Utility function converting the time in the TimeSeriesData object to a pandas.DatetimeIndex.
- Returns
A pandas.DatetimeIndex representation of the time values of the series.
- to_array() → numpy.ndarray[source]¶
Converts the TimeSeriesData object to a numpy.ndarray.
- Returns
A numpy.ndarray representation of the time series.
- to_dataframe(standard_time_col_name: bool = False) → pandas.core.frame.DataFrame[source]¶
Converts the TimeSeriesData object into a pandas.DataFrame.
- Parameters
standard_time_col_name (optional) – True if the DataFrame’s time column name should be “time”. To keep the same time column name as the current TimeSeriesData object, leave as False (default False).
- tz() → Optional[Union[datetime.tzinfo, dateutil.tz.tz.tzfile]][source]¶
Returns the timezone of the TimeSeriesData.
- Returns
A timezone aware object representing the timezone of the TimeSeriesData. Returns None when there is no timezone present.
For more info, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.tz.html.
- validate_data(validate_frequency: bool, validate_dimension: bool) → None[source]¶
Validates the time series for correctness (on both frequency and dimension).
- Parameters
validate_frequency – A boolean indicating whether the TimeSeriesData should be validated for constant frequency.
validate_dimension – A boolean indicating whether the TimeSeriesData should be validated for having the same number of timesteps and values.
- Raises
ValueError – The frequency and/or dimensions were invalid.
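The two checks that validate_data() describes, and the non-raising variant offered by is_data_missing(), can be sketched in plain Python as follows. The function names check_series and has_problem are hypothetical stand-ins, and the logic is an assumption about what "constant frequency" and "same number of timesteps and values" mean, not the Kats source.

```python
from datetime import datetime


def check_series(times, values, validate_frequency=True, validate_dimension=True):
    """Hypothetical stand-in for the checks described above; raises ValueError."""
    if validate_dimension and len(times) != len(values):
        raise ValueError("Number of timesteps and values differ.")
    if validate_frequency and len(times) > 2:
        # A constant frequency means every consecutive gap is identical.
        gaps = {later - earlier for earlier, later in zip(times, times[1:])}
        if len(gaps) > 1:
            raise ValueError("Time values do not have a constant frequency.")


def has_problem(times, values):
    """Same checks, but reports the result as a boolean instead of raising."""
    try:
        check_series(times, values)
    except ValueError:
        return True
    return False


regular_times = [datetime(2021, 1, day) for day in (1, 2, 3, 4)]
gappy_times = [datetime(2021, 1, day) for day in (1, 2, 4, 5)]
```

Here has_problem(regular_times, [1, 2, 3, 4]) is False, while the series with a skipped day or with mismatched lengths is flagged.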
- property value: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶
Returns the value(s) of the series.
- Returns
A pandas.Series or pandas.DataFrame representing the value(s) of the time series.