kats.consts module
This module contains some of the key data structures in the Kats library, including TimeSeriesData, TimeSeriesChangePoint, and TimeSeriesIterator.
TimeSeriesChangePoint is the return type of many of the Kats detection algorithms.
TimeSeriesData is the fundamental data structure in the Kats library; it gives users access to a host of forecasting, detection, and utility algorithms.
- class kats.consts.ModelEnum(value)
Bases: enum.Enum
This enum lists the models that can be selected when setting a default search space for hyper-parameter tuning.
- class kats.consts.OperationsEnum(value)
Bases: enum.Enum
This enum lists all the mathematical operations that can be performed on TimeSeriesData objects.
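As a rough illustration of how an operations enum can drive element-wise arithmetic, the sketch below defines a hypothetical Op enum and a combine helper over plain Python lists. The member names (ADD, SUBTRACT, MUL, DIV) and the rule that both operands must share identical time values are assumptions made for this example, not a description of the Kats implementation.

```python
import operator
from enum import Enum
from typing import List, Sequence


class Op(Enum):
    """Hypothetical stand-in for an operations enum; member names are assumptions."""
    ADD = "add"
    SUBTRACT = "subtract"
    MUL = "mul"
    DIV = "div"


# Map each enum member to the element-wise function it represents.
_FUNCS = {
    Op.ADD: operator.add,
    Op.SUBTRACT: operator.sub,
    Op.MUL: operator.mul,
    Op.DIV: operator.truediv,
}


def combine(times_a: Sequence[str], values_a: Sequence[float],
            times_b: Sequence[str], values_b: Sequence[float], op: Op) -> List[float]:
    """Apply `op` point by point to two series that share the same time values."""
    if list(times_a) != list(times_b):
        raise ValueError("Both series must have identical time values.")
    func = _FUNCS[op]
    return [func(a, b) for a, b in zip(values_a, values_b)]


times = ["2021-01-01", "2021-01-02", "2021-01-03"]
print(combine(times, [1.0, 2.0, 3.0], times, [10.0, 20.0, 30.0], Op.ADD))
# [11.0, 22.0, 33.0]
```

Keying the dispatch table on enum members, rather than on raw strings, means a misspelled operation fails as an attribute error on Op instead of silently falling through.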
- class kats.consts.SearchMethodEnum(value)
Bases: enum.Enum
This enum lists the search algorithms available for hyper-parameter tuning.
- class kats.consts.TSIterator(ts: kats.consts.TimeSeriesData)
Bases: object
Iterates through the values of a single time series.
Produces a time series with a single point for a univariate time series, or a time series with an array of the values at the given location for a multivariate time series.
- ts
The input time series.
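The iteration behaviour described above can be sketched with plain Python objects. MiniTS and MiniTSIterator are hypothetical names for this example only; the sketch yields one single-point series per time step, holding a scalar for a univariate series or the row of values for a multivariate one, and is not the Kats source.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MiniTS:
    """Hypothetical minimal time series: one time list plus univariate or multivariate values."""
    time: List[str]
    # Univariate: one float per time step. Multivariate: one list (row) of floats per time step.
    value: Union[List[float], List[List[float]]]


class MiniTSIterator:
    """Yields one single-point MiniTS per time step, in the spirit of TSIterator."""

    def __init__(self, ts: MiniTS) -> None:
        self.ts = ts
        self.pos = 0

    def __iter__(self) -> "MiniTSIterator":
        return self

    def __next__(self) -> MiniTS:
        if self.pos >= len(self.ts.time):
            raise StopIteration
        i = self.pos
        self.pos += 1
        # The yielded point holds a scalar (univariate) or the row of values at step i (multivariate).
        return MiniTS(time=[self.ts.time[i]], value=[self.ts.value[i]])


uni = MiniTS(time=["t0", "t1"], value=[1.0, 2.0])
multi = MiniTS(time=["t0", "t1"], value=[[1.0, 5.0], [2.0, 6.0]])
print([p.value for p in MiniTSIterator(uni)])    # [[1.0], [2.0]]
print([p.value for p in MiniTSIterator(multi)])  # [[[1.0, 5.0]], [[2.0, 6.0]]]
```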
- class kats.consts.TimeSeriesChangePoint(start_time, end_time, confidence: float)
Bases: object
Object returned by detector classes.
- start_time
Start time of the change.
- end_time
End time of the change.
- confidence
The confidence of the change point.
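To show how a detector might populate the three documented fields, here is a toy sketch. ChangePoint is a hypothetical stand-in with the same fields as TimeSeriesChangePoint, and detect_jumps is a deliberately naive detector that flags large single-step jumps; its confidence score is an ad hoc value in [0, 1] and is not how any Kats detector computes confidence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class ChangePoint:
    """Hypothetical stand-in with the same three fields as TimeSeriesChangePoint."""
    start_time: datetime
    end_time: datetime
    confidence: float


def detect_jumps(times: Sequence[datetime], values: Sequence[float],
                 threshold: float) -> List[ChangePoint]:
    """Toy detector: flag each step whose absolute jump exceeds `threshold` (> 0)."""
    points = []
    for i in range(1, len(values)):
        jump = abs(values[i] - values[i - 1])
        if jump > threshold:
            # Ad hoc score: grows with the jump size and saturates at 1.0.
            confidence = min(1.0, jump / (2 * threshold))
            # A single-step change starts and ends at the same time point.
            points.append(ChangePoint(times[i], times[i], confidence))
    return points


start = datetime(2021, 1, 1)
times = [start + timedelta(days=d) for d in range(6)]
values = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
for cp in detect_jumps(times, values, threshold=1.0):
    print(cp.start_time.date(), cp.confidence)
# 2021-01-04 1.0
```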
- class kats.consts.TimeSeriesData(df: Optional[pandas.core.frame.DataFrame] = None, sort_by_time: bool = True, time: Optional[Union[pandas.core.series.Series, pandas.core.indexes.datetimes.DatetimeIndex]] = None, value: Optional[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]] = None, time_col_name: str = 'time', date_format: Optional[str] = None, use_unix_time: bool = False, unix_time_units: str = 'ns', tz: Optional[str] = None, tz_ambiguous: Union[str, numpy.ndarray] = 'raise', tz_nonexistent: str = 'raise')
Bases: object
The fundamental Kats data structure to store a time series.
In order to access much of the functionality in the Kats library, users must first initialize the TimeSeriesData class with their data.
Initialization.
TimeSeriesData can be initialized from the following data sources:
- pandas.DataFrame
- pandas.Series
- pandas.DatetimeIndex
Typical usage example for initialization:
>>> import pandas as pd
>>> from kats.consts import TimeSeriesData
>>> df = pd.read_csv("/kats/data/air_passengers.csv")
>>> ts = TimeSeriesData(df=df, time_col_name="ds")
Initialization arguments (all are optional, but you must choose one way to initialize, e.g., from a pandas.DataFrame):
- df: A pandas.DataFrame storing the time series (default None).
- sort_by_time: A boolean indicating whether the TimeSeriesData should be sorted by time (default True).
- time: A pandas.Series or pandas.DatetimeIndex storing the time values (default None).
- value: A pandas.Series or pandas.DataFrame storing the series value(s) (default None).
- time_col_name: A string giving the name of the time column (default “time”).
- date_format: A string specifying the format of the date/time in the time column. Useful for faster parsing, and required if pandas.to_datetime() cannot otherwise parse the column (default None).
- use_unix_time: A boolean indicating whether the time is represented as unix time (default False).
- unix_time_units: A string indicating the units of the unix time; only used if use_unix_time=True (default “ns”).
- tz: A string representing the timezone of the time values (default None).
- tz_ambiguous: A string (or numpy.ndarray, per the signature) representing how to handle ambiguous timezones (default “raise”).
- tz_nonexistent: A string representing how to handle nonexistent timezone values (default “raise”).
- Raises
ValueError – Invalid params passed when trying to create the TimeSeriesData.
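The use_unix_time, unix_time_units, and date_format arguments decide how raw time values become timestamps. Kats relies on pandas for this; the stdlib sketch below only illustrates the idea. The unit names "s", "ms", "us", and "ns" are assumed here (they match the units pandas.to_datetime accepts), and from_unix is a hypothetical helper, not part of Kats.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumed unit names and how many of each unit make up one second.
_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def from_unix(value: int, unit: str = "ns") -> datetime:
    """Convert an integer unix timestamp expressed in `unit` to a UTC datetime."""
    if unit not in _PER_SECOND:
        raise ValueError(f"Unsupported unit: {unit!r}")
    # Integer arithmetic avoids float rounding; datetime stops at microseconds,
    # so any sub-microsecond part of a nanosecond timestamp is truncated.
    micros = value * 10**6 // _PER_SECOND[unit]
    return EPOCH + timedelta(microseconds=micros)


print(from_unix(1_609_459_200_000_000_000))       # 2021-01-01 00:00:00+00:00
print(from_unix(1_609_459_200, unit="s"))          # 2021-01-01 00:00:00+00:00
# An explicit format plays the role of date_format for non-standard strings.
print(datetime.strptime("01/31/2021", "%m/%d/%Y"))  # 2021-01-31 00:00:00
```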
Operations. Many operations that you can do with pandas.DataFrame objects are also applicable to TimeSeriesData. For example:
>>> ts[0:2]  # Slicing
>>> ts_1 == ts_2  # Equality
>>> ts_1.extend(ts_2)  # Extend
>>> ts.plot(cols=["y"])  # Visualize
Utility Functions. Many utility functions exist for converting TimeSeriesData objects to other common data structures. For example:
>>> ts.to_dataframe()  # Convert to pandas.DataFrame
>>> ts.to_array()  # Convert to numpy.ndarray
- time
A pandas.Series object storing the time values of the time series.
- value
A pandas.Series (if univariate) or pandas.DataFrame (if multivariate) object storing the values of each field in the time series.
- min
A float or pandas.Series representing the min value(s) of the time series.
- max
A float or pandas.Series representing the max value(s) of the time series.
- extend(other: object, validate: bool = True) → None
Extends TimeSeriesData with another TimeSeriesData object.
- Parameters
other – The other TimeSeriesData object (currently only other TimeSeriesData objects are supported).
validate (optional) – A boolean indicating whether the new TimeSeriesData should be validated (default True).
- Raises
ValueError – The object passed was not an instance of TimeSeriesData.
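The extend contract (accept only the same type, optionally validate, modify in place, return None) can be sketched on a toy class. MiniSeries is hypothetical, and the two validation checks shown (matching lengths, strictly increasing time) are assumptions for this example; the documentation above does not specify which checks validate=True performs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MiniSeries:
    """Hypothetical univariate series used only to illustrate an extend() contract."""
    time: List[int]
    value: List[float]

    def extend(self, other: object, validate: bool = True) -> None:
        # As documented for TimeSeriesData.extend: reject anything that is not the same type.
        if not isinstance(other, MiniSeries):
            raise ValueError("extend() only accepts another MiniSeries object.")
        new_time = self.time + other.time
        new_value = self.value + other.value
        if validate:
            # Assumed checks for this sketch: matching lengths and strictly increasing time.
            if len(new_time) != len(new_value):
                raise ValueError("time and value have different lengths.")
            if any(b <= a for a, b in zip(new_time, new_time[1:])):
                raise ValueError("time values must be strictly increasing.")
        # Mutate only after validation succeeds, and return None like the documented method.
        self.time = new_time
        self.value = new_value


series = MiniSeries(time=[1, 2], value=[10.0, 20.0])
series.extend(MiniSeries(time=[3], value=[30.0]))
print(series.time, series.value)  # [1, 2, 3] [10.0, 20.0, 30.0]
```

Building the combined lists before assigning them keeps the object unchanged when validation fails.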
- freq_to_timedelta()
Returns a pandas.Timedelta representation of the TimeSeriesData frequency.
- Returns
A pandas.Timedelta object representing the frequency of the TimeSeriesData.
- infer_freq_robust() → pandas._libs.tslibs.timedeltas.Timedelta
A more robust way to infer the frequency of the time series in the presence of missing data. It computes the differences between consecutive time values and chooses the frequency by majority vote.
- Returns
A pandas.Timedelta object representing the frequency of the series.
- Raises
ValueError – The TimeSeriesData has fewer than 2 data points.
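The majority-vote idea can be shown with the standard library alone. The sketch below assumes the time stamps are already sorted and picks the most common gap between consecutive values, so a few missing observations do not change the result. infer_freq_majority is a hypothetical name, and the real method works on pandas objects.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Sequence


def infer_freq_majority(times: Sequence[datetime]) -> timedelta:
    """Infer the sampling interval as the most common gap between sorted time stamps."""
    if len(times) < 2:
        raise ValueError("Need at least 2 data points to infer a frequency.")
    diffs = [b - a for a, b in zip(times, times[1:])]
    # most_common(1) returns [(gap, count)] for the modal gap.
    return Counter(diffs).most_common(1)[0][0]


start = datetime(2021, 1, 1)
# Hourly series with the 03:00 and 04:00 observations missing.
times = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6, 7)]
print(infer_freq_majority(times))  # 1:00:00
```

Here the gaps are 1h, 1h, 3h, 1h, 1h, so the single 3-hour gap is outvoted.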
- interpolate(freq: Optional[Union[str, pandas._libs.tslibs.timedeltas.Timedelta]] = None, method: str = 'linear', remove_duplicate_time=False) → kats.consts.TimeSeriesData
Interpolates missing data if the time values do not have a constant frequency.
The following options are available:
- linear
- backward fill
- forward fill
See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html for more detail on these options.
- Parameters
freq – A string or pandas.Timedelta representing the pre-defined frequency of the time series (default None).
method – A string representing the method used to impute the missing time and data. See the above options (default “linear”).
remove_duplicate_time – A boolean to automatically remove any duplicate time values, since interpolation indexes on time and therefore cannot handle duplicates (default False).
- Returns
A new TimeSeriesData object with interpolated data.
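To make the three options concrete, here is a stdlib sketch that resamples a sorted, duplicate-free series onto a regular grid. The function name interpolate_regular and the method strings "linear", "ffill", and "bfill" are this example's own choices and are not guaranteed to match the strings Kats accepts; Kats itself delegates to pandas.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def interpolate_regular(times: Sequence[datetime], values: Sequence[float],
                        freq: timedelta, method: str = "linear"
                        ) -> Tuple[List[datetime], List[float]]:
    """Resample a sorted, duplicate-free series onto a regular grid with step `freq`."""
    grid: List[datetime] = []
    t = times[0]
    while t <= times[-1]:
        grid.append(t)
        t += freq
    out: List[float] = []
    j = 0  # index of the last original point at or before the current grid time
    for g in grid:
        while j + 1 < len(times) and times[j + 1] <= g:
            j += 1
        if times[j] == g:
            out.append(values[j])          # an observed point is kept as-is
        elif method == "ffill":
            out.append(values[j])          # carry the previous observation forward
        elif method == "bfill":
            out.append(values[j + 1])      # pull the next observation backward
        elif method == "linear":
            w = (g - times[j]) / (times[j + 1] - times[j])  # timedelta / timedelta -> float
            out.append(values[j] + w * (values[j + 1] - values[j]))
        else:
            raise ValueError(f"Unsupported method: {method!r}")
    return grid, out


obs = [datetime(2021, 1, 1, h) for h in (0, 1, 3)]  # the 02:00 value is missing
print(interpolate_regular(obs, [0.0, 10.0, 30.0], timedelta(hours=1))[1])
# [0.0, 10.0, 20.0, 30.0]
```

Duplicate time stamps would make the "previous" and "next" neighbours ambiguous, which is consistent with the need for remove_duplicate_time documented above.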
- is_data_missing() → bool
Checks if data is missing from the time series.
This is very similar to validate_data() but will not raise an error.
- Returns
True when data is missing from the time series. Otherwise False.
- is_empty() → bool
Checks if the TimeSeriesData is empty.
- Returns
True if the TimeSeriesData does not have any data points. Otherwise False.
- is_univariate()
Returns whether the TimeSeriesData is univariate.
- Returns
True if the TimeSeriesData is univariate. False otherwise.
- property max: Union[pandas.core.series.Series, float]
Returns the max value(s) of the series.
- Returns
A pandas.Series or float representing the max value(s) of the time series.
- property min: Union[pandas.core.series.Series, float]
Returns the min value(s) of the series.
- Returns
A pandas.Series or float representing the min value(s) of the time series.
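The min and max properties return a float for a univariate series and one value per column for a multivariate series. The sketch below mimics that shape difference with plain Python: a list stands in for a univariate series and a dict of columns stands in for the multivariate case, where Kats returns a pandas.Series. The summarize helper is hypothetical.

```python
from typing import Callable, Dict, List, Union

# A list stands in for a univariate series; a dict of named columns for a multivariate one.
Values = Union[List[float], Dict[str, List[float]]]


def summarize(value: Values,
              func: Callable[[List[float]], float]) -> Union[float, Dict[str, float]]:
    """Apply `func` (e.g. min or max) to the whole series, or to each column separately."""
    if isinstance(value, dict):
        # Multivariate: one result per column, analogous to a pandas.Series indexed by column.
        return {name: func(col) for name, col in value.items()}
    # Univariate: a single float.
    return func(value)


print(summarize([3.0, 1.0, 2.0], max))                     # 3.0
print(summarize({"y": [3.0, 1.0], "z": [0.5, 9.0]}, min))  # {'y': 1.0, 'z': 0.5}
```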
- plot(cols: List[str]) → None
Plots the time series.
- Parameters
cols – List of variables (strings) to plot (against time).
- property time: pandas.core.series.Series
Returns the time values of the series.
- Returns
A pandas.Series representing the time values of the time series.
- time_to_index() → pandas.core.indexes.datetimes.DatetimeIndex
Utility function converting the time in the TimeSeriesData object to a pandas.DatetimeIndex.
- Returns
A pandas.DatetimeIndex representation of the time values of the series.
- to_array() → numpy.ndarray
Converts the TimeSeriesData object to a numpy.ndarray.
- Returns
A numpy.ndarray representation of the time series.
- to_dataframe(standard_time_col_name: bool = False) → pandas.core.frame.DataFrame
Converts the TimeSeriesData object into a pandas.DataFrame.
- Parameters
standard_time_col_name (optional) – True if the DataFrame’s time column name should be “time”. To keep the same time column name as the current TimeSeriesData object, leave as False (default False).
- tz() → Optional[Union[datetime.tzinfo, dateutil.tz.tz.tzfile]]
Returns the timezone of the TimeSeriesData.
- Returns
A timezone-aware object representing the timezone of the TimeSeriesData. Returns None when there is no timezone present.
For more info, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.tz.html.
- validate_data(validate_frequency: bool, validate_dimension: bool) → None
Validates the time series for correctness (on both frequency and dimension).
- Parameters
validate_frequency – A boolean indicating whether the TimeSeriesData should be validated for constant frequency.
validate_dimension – A boolean indicating whether the TimeSeriesData should be validated for having the same number of time steps and values.
- Raises
ValueError – The frequency and/or dimensions were invalid.
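The two checks can be sketched with the standard library. has_constant_frequency is a non-raising check in the spirit of is_data_missing (with the opposite sense), and validate raises ValueError like validate_data. Both names are hypothetical; treating "constant frequency" as "every consecutive gap is identical" is an assumption of this sketch rather than a statement of how Kats implements it.

```python
from datetime import datetime, timedelta
from typing import Sequence


def has_constant_frequency(times: Sequence[datetime]) -> bool:
    """Return True when every gap between consecutive time stamps is the same."""
    gaps = {b - a for a, b in zip(times, times[1:])}
    return len(gaps) <= 1


def validate(times: Sequence[datetime], values: Sequence[float],
             validate_frequency: bool, validate_dimension: bool) -> None:
    """Raise ValueError on mismatched lengths or an irregular frequency."""
    if validate_dimension and len(times) != len(values):
        raise ValueError("The number of time steps and values differ.")
    if validate_frequency and not has_constant_frequency(times):
        raise ValueError("The time values do not have a constant frequency.")


start = datetime(2021, 1, 1)
hourly = [start + timedelta(hours=h) for h in range(4)]
gappy = hourly[:2] + hourly[3:]          # the 02:00 observation is missing
print(has_constant_frequency(hourly))    # True
print(has_constant_frequency(gappy))     # False
validate(hourly, [1.0, 2.0, 3.0, 4.0], validate_frequency=True, validate_dimension=True)
```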
- property value: Union[pandas.core.series.Series, pandas.core.frame.DataFrame]
Returns the value(s) of the series.
- Returns
A pandas.Series or pandas.DataFrame representing the value(s) of the time series.