kats.utils.simulator module

This module implements a simulator for generating synthetic time series data.

class kats.utils.simulator.Simulator(n: int = 100, freq: str = 'D', start: Optional[Any] = None)

Bases: object

TimeSeriesData simulator, to generate synthetic timeseries data.

The Simulator currently supports generating synthetic time series with STL-based and ARIMA models, and can also add level and trend changepoints.

n

length of the time series.

freq

desired frequency (e.g. daily, weekly) of the time series.

start

start date of the time series.

add_noise(magnitude: float = 1.0, multiply: bool = False)

Add noise to the generated time series for STL-based simulator.

The noise is normal: it is drawn from an iid normal distribution. Noise generated by an ARMA process may be added in the future if use cases arise.

Parameters
  • magnitude – magnitude (scale) of the noise, float.

  • multiply – True if the noise is multiplicative, otherwise additive.

Returns

Generated timeseries.

add_seasonality(magnitude: float = 0.0, period: Union[datetime.timedelta, float, str] = '1D', multiply: bool = False) → kats.consts.TimeSeriesData

Add a seasonality component to the time series for STL-based simulator.

Parameters
  • magnitude – magnitude of the seasonality, float.

  • period – period of the seasonality; a timedelta, float, or string (default '1D').

  • multiply – True if the seasonality is multiplicative, otherwise additive.

Returns

Generated timeseries.
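
As a rough illustration of what the period and multiply arguments mean, the sketch below builds a sinusoidal seasonal component over plain integer time steps and combines it with a base series either additively or multiplicatively. It uses only the standard library. The helper names `seasonal_component` and `apply_component`, the sinusoidal shape, and the `base * (1 + component)` multiplicative convention are assumptions made for illustration; they are not taken from the Kats implementation, which may differ.

```python
import math


def seasonal_component(n, magnitude, period):
    """Return n samples of a sine wave with the given amplitude and period (in steps)."""
    return [magnitude * math.sin(2 * math.pi * t / period) for t in range(n)]


def apply_component(base, component, multiply=False):
    """Combine a component with a base series additively or multiplicatively."""
    if multiply:
        # Assumed convention: the component acts as a relative change of the base.
        return [b * (1 + c) for b, c in zip(base, component)]
    return [b + c for b, c in zip(base, component)]


base = [10.0] * 14
season = seasonal_component(n=14, magnitude=0.5, period=7)
additive = apply_component(base, season)
multiplicative = apply_component(base, season, multiply=True)
```

With a period of 7 steps the component repeats every 7 samples, which is the behavior one would expect from a weekly seasonality on daily data.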

add_trend(magnitude: float, trend_type: str = 'linear', multiply: bool = False)

Add a trend component to the target time series for STL-based simulator.

trend_type - shape of the trend, one of {"linear", "sigmoid"}.

Parameters
  • magnitude – slope of the trend, float.

  • trend_type – linear or sigmoid, string.

  • multiply – True if the trend is multiplicative, otherwise additive.

Returns

The timeseries generated.
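
To make the two trend_type shapes concrete, here is a minimal standard-library sketch of a linear and a sigmoid trend over integer time steps. The function names and the particular sigmoid parameterization (centered mid-series and scaled so that it rises toward `magnitude`) are assumptions chosen for illustration; the Kats implementation may scale or center the curve differently.

```python
import math


def linear_trend(n, magnitude):
    """Straight line whose slope per step is `magnitude`."""
    return [magnitude * t for t in range(n)]


def sigmoid_trend(n, magnitude):
    """S-shaped curve rising from near 0 to near `magnitude`, centered mid-series."""
    mid = (n - 1) / 2
    return [magnitude / (1 + math.exp(-(t - mid))) for t in range(n)]


lin = linear_trend(n=21, magnitude=0.5)
sig = sigmoid_trend(n=21, magnitude=10.0)
```

The linear trend grows without bound, while the sigmoid trend flattens out at both ends, which is useful when simulating growth that saturates.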

arima_sim(ar: List[float], ma: List[float], mu: float = 0, sigma: float = 1, burnin: int = 10, d: int = 0, t: int = 0) → kats.consts.TimeSeriesData

Simulate data from ARIMA model.

Data generation includes two steps:

(1) Simulate data from an ARMA(p', q) model. The configuration of the ARMA(p', q) model is:

X_t = alpha_1 * X_{t-1} + … + alpha_{p'} * X_{t-p'} + 1 * epsilon_t + theta_1 * epsilon_{t-1} + … + theta_q * epsilon_{t-q}

(2) Add drift d. Here d is the order of differencing, and p = p' - d for ARIMA(p, d, q).

Parameters
  • ar – [alpha_1, …, alpha_{p'}], coefficients of the AR terms. p' = len(ar)

  • ma – [theta_1, …, theta_q], coefficients of the MA terms. q = len(ma)

  • epsilon – error terms, drawn from a normal distribution with mean mu and standard deviation sigma (described here for reference; it is not an argument in the signature).

  • mu – mean of normal distribution for epsilon.

  • sigma – standard dev of normal distribution for epsilon.

  • burnin – number of initial data points that are dropped because lagged values are not yet available at the beginning.

  • d – number of unit roots for non-stationary data.

  • t – linear trend constant.

Returns

TimeSeries generated.

Return type

ts

Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from kats.utils.simulator import Simulator
>>> sim = Simulator(n=100, freq="MS", start=pd.to_datetime("2011-01-01 00:00:00"))
>>> np.random.seed(100)
>>> ts = sim.arima_sim(ar=[0.1, 0.05], ma=[0.04, 0.1], d=1)
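
The two-step idea described above (simulate a stationary ARMA series, then undo the differencing) can be sketched with the standard library alone. This is a generic textbook construction, not the Kats code: the function names `arma_sample` and `integrate`, the zero-padding of missing lags, and the use of a repeated cumulative sum for the order d are all assumptions, and the linear trend constant t is left out.

```python
import random


def arma_sample(ar, ma, n, mu=0.0, sigma=1.0, burnin=10, seed=None):
    """Draw n points from X_t = sum(ar_i * X_{t-i}) + eps_t + sum(ma_j * eps_{t-j})."""
    rng = random.Random(seed)
    total = n + burnin
    eps = [rng.gauss(mu, sigma) for _ in range(total)]
    x = []
    for t in range(total):
        # Lags that fall before the start of the series are treated as zero.
        ar_part = sum(a * x[t - i] for i, a in enumerate(ar, start=1) if t - i >= 0)
        ma_part = sum(m * eps[t - j] for j, m in enumerate(ma, start=1) if t - j >= 0)
        x.append(ar_part + eps[t] + ma_part)
    # Drop the burn-in points, whose values depend on the zero-padded start.
    return x[burnin:]


def integrate(series, d):
    """Apply a cumulative sum d times, the inverse of d-th order differencing."""
    out = list(series)
    for _ in range(d):
        running = 0.0
        summed = []
        for value in out:
            running += value
            summed.append(running)
        out = summed
    return out


stationary = arma_sample(ar=[0.1, 0.05], ma=[0.04, 0.1], n=100, seed=100)
integrated = integrate(stationary, d=1)
```

Differencing `integrated` once recovers `stationary`, which is what "d unit roots" means in practice.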

level_shift_multivariate_indep_sim(cp_arr: Optional[List[int]] = None, level_arr: Optional[List[float]] = None, noise: float = 30.0, seasonal_period: int = 7, seasonal_magnitude: float = 3.0, anomaly_arr: Optional[List[int]] = None, z_score_arr: Optional[List[float]] = None, dim: int = 3) → kats.consts.TimeSeriesData

Produces a multivariate time series with level shifts.

The positions of the level shifts are given by the beginning and end changepoints: the first change spans [first_cp_begin, first_cp_end], and the second spans [second_cp_begin, self.n]. The number of dimensions is given by dim.

Parameters
  • cp_arr – Array of changepoint locations.

  • level_arr – Array containing the level of each segment. Because there is one more segment than there are changepoints, level_arr should be longer than cp_arr by one.

  • noise – standard deviation of the random Gaussian noise added.

  • seasonal_period – periodicity of the time series.

  • seasonal_magnitude – amplitude of the seasonality. Set this to 0 for a time series without seasonality.

  • anomaly_arr – locations where an anomalous point is introduced.

  • z_score_arr – same length as anomaly_arr. Each entry is the z-score of the anomaly introduced at the corresponding location in anomaly_arr.

  • dim – number of dimensions of the timeseries.

Returns

Generated timeseries.
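
The relationship between cp_arr, level_arr, and dim can be illustrated with a small standard-library sketch. It reads "independent" as meaning that each dimension is simulated separately with its own random seed; that reading, the helper names, the column names `value1`, `value2`, ..., and the rule that a new level starts exactly at the changepoint index are assumptions for illustration, not a description of the Kats internals. Seasonality and anomalies are omitted to keep the sketch short.

```python
import random


def level_shift_series(n, cp_arr, level_arr, noise, seed):
    """Piecewise-constant levels plus Gaussian noise; the level changes at each changepoint."""
    if len(level_arr) != len(cp_arr) + 1:
        raise ValueError("level_arr must have one more entry than cp_arr")
    rng = random.Random(seed)
    bounds = [0] + list(cp_arr) + [n]
    values = []
    for level, start, end in zip(level_arr, bounds[:-1], bounds[1:]):
        values.extend(level + rng.gauss(0.0, noise) for _ in range(end - start))
    return values


def multivariate_indep(n, cp_arr, level_arr, noise, dim):
    """Simulate each dimension separately, with a distinct seed per dimension."""
    return {
        f"value{i + 1}": level_shift_series(n, cp_arr, level_arr, noise, seed=i)
        for i in range(dim)
    }


columns = multivariate_indep(n=450, cp_arr=[100, 200], level_arr=[3, 20, 2], noise=3.0, dim=3)
```

All columns share the same changepoints and levels but have different noise realizations.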

level_shift_sim(random_seed: int = 100, cp_arr: Optional[List[int]] = None, level_arr: Optional[List[float]] = None, noise: float = 30.0, seasonal_period: int = 7, seasonal_magnitude: float = 3.0, anomaly_arr: Optional[List[int]] = None, z_score_arr: Optional[List[float]] = None) → kats.consts.TimeSeriesData

Produces a time series with level shifts.

The positions of the level shifts are given by the beginning and end changepoints: the first change spans [first_cp_begin, first_cp_end], and the second spans [second_cp_begin, self.n].

Parameters
  • cp_arr – Array of changepoint locations.

  • level_arr – Array containing the level of each segment. Because there is one more segment than there are changepoints, level_arr should be longer than cp_arr by one.

  • noise – standard deviation of the random Gaussian noise added.

  • seasonal_period – periodicity of the time series.

  • seasonal_magnitude – amplitude of the seasonality. Set this to 0 for a time series without seasonality.

  • anomaly_arr – locations where an anomalous point is introduced.

  • z_score_arr – same length as anomaly_arr. Each entry is the z-score of the anomaly introduced at the corresponding location in anomaly_arr.

Returns

Generated timeseries.

Example usage:
>>> from kats.utils.simulator import Simulator
>>> sim2 = Simulator(n=450, start="2018-01-01")
>>> ts2 = sim2.level_shift_sim(
...     cp_arr=[100, 200],
...     level_arr=[3, 20, 2],
...     noise=3,
...     seasonal_period=7,
...     seasonal_magnitude=3,
...     anomaly_arr=[50, 150, 250],
...     z_score_arr=[10, -10, 20],
... )
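
To show how anomaly_arr and z_score_arr relate to the noise level, the following standard-library sketch builds a noise-free level-shift series with seasonality and then displaces the listed points by z times the noise standard deviation. The helper names, the additive `z * noise` rule, and the choice that a new level starts exactly at the changepoint index are assumptions for illustration; Kats may place or scale anomalies differently.

```python
import math


def piecewise_levels(n, cp_arr, level_arr):
    """Noise-free level for every index; level_arr[k] applies to the k-th segment."""
    bounds = [0] + list(cp_arr) + [n]
    base = []
    for level, start, end in zip(level_arr, bounds[:-1], bounds[1:]):
        base.extend([float(level)] * (end - start))
    return base


def inject_anomalies(series, anomaly_arr, z_score_arr, noise):
    """Shift each listed index by z * noise, i.e. z standard deviations of the noise."""
    if len(anomaly_arr) != len(z_score_arr):
        raise ValueError("anomaly_arr and z_score_arr must have the same length")
    out = list(series)
    for index, z in zip(anomaly_arr, z_score_arr):
        out[index] += z * noise
    return out


n = 450
base = piecewise_levels(n, cp_arr=[100, 200], level_arr=[3, 20, 2])
seasonal = [3 * math.sin(2 * math.pi * t / 7) for t in range(n)]
clean = [b + s for b, s in zip(base, seasonal)]
with_anomalies = inject_anomalies(
    clean, anomaly_arr=[50, 150, 250], z_score_arr=[10, -10, 20], noise=3
)
```

Under this convention a z-score of 10 with noise 3 moves the point 30 units above its expected value, well outside the normal noise band.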

stl_sim() → kats.consts.TimeSeriesData

Simulate time series data with seasonality, trend, and noise.

Parameters

None.

Returns

Generated timeseries.

Example usage:
>>> from datetime import timedelta
>>> import pandas as pd
>>> from kats.utils.simulator import Simulator
>>> sim = Simulator(n=100, freq="1D", start=pd.to_datetime("2011-01-01"))
>>> sim.add_trend(magnitude=10)
>>> sim.add_seasonality(5, period=timedelta(days=7))
>>> sim.add_noise(magnitude=2)
>>> sim_ts = sim.stl_sim()
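
Conceptually, an STL-style simulation is the combination of a trend, a seasonal component, and noise. The standalone sketch below adds the three together using only the standard library. It does not call Kats; the function name `stl_like_series`, the linear trend per step, the sinusoidal seasonality, and the purely additive composition are assumptions chosen to keep the illustration small.

```python
import math
import random


def stl_like_series(n, trend_slope, season_magnitude, season_period, noise_magnitude, seed=0):
    """Sum a linear trend, a sinusoidal seasonality, and iid normal noise."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        trend = trend_slope * t
        season = season_magnitude * math.sin(2 * math.pi * t / season_period)
        noise = noise_magnitude * rng.gauss(0.0, 1.0)
        series.append(trend + season + noise)
    return series


values = stl_like_series(
    n=100, trend_slope=0.1, season_magnitude=5, season_period=7, noise_magnitude=2
)
```

Setting the seasonal and noise magnitudes to 0 leaves only the trend, which is a convenient way to inspect each component in isolation.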

trend_shift_sim(random_seed: int = 15, cp_arr: Optional[List[int]] = None, trend_arr: Optional[List[float]] = None, intercept: float = 100.0, noise: float = 30.0, seasonal_period: int = 7, seasonal_magnitude: float = 3.0, anomaly_arr: Optional[List[int]] = None, z_score_arr: Optional[List[int]] = None) → kats.consts.TimeSeriesData

Produces a time series with multiple trend shifts and seasonality.

This can be used as synthetic data for testing trend changepoints. first_cp_begin is where the trend change begins, and it continues until the end of the series.

Parameters
  • random_seed – Seed, to reproduce the same time series.

  • cp_arr – Array of changepoint locations.

  • trend_arr – Array containing the trend of each segment. Because there is one more segment than there are changepoints, trend_arr should be longer than cp_arr by one.

  • noise – standard deviation of the random Gaussian noise added.

  • seasonal_period – periodicity of the time series.

  • seasonal_magnitude – amplitude of the seasonality. Set this to 0 for a time series without seasonality.

  • anomaly_arr – locations where an anomalous point is introduced.

  • z_score_arr – same length as anomaly_arr. Each entry is the z-score of the anomaly introduced at the corresponding location in anomaly_arr.

Returns

Generated timeseries.

Example usage:
>>> from kats.utils.simulator import Simulator
>>> sim2 = Simulator(n=450, start="2018-01-01")
>>> ts2 = sim2.trend_shift_sim(
...     cp_arr=[100, 200],
...     trend_arr=[3, 20, 2],
...     intercept=30,
...     noise=30,
...     seasonal_period=7,
...     seasonal_magnitude=3,
...     anomaly_arr=[50, 150, 250],
...     z_score_arr=[10, -10, 20],
... )
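
The way cp_arr, trend_arr, and intercept fit together can be sketched as a piecewise-linear curve. The standard-library example below assumes that the series starts at the intercept and stays continuous across changepoints, with each segment simply changing the slope; the function name and the continuity assumption are illustrative and may not match how Kats joins the segments. Noise, seasonality, and anomalies are left out.

```python
def piecewise_linear(n, cp_arr, trend_arr, intercept):
    """Continuous piecewise-linear series; slope trend_arr[k] applies on the k-th segment."""
    if len(trend_arr) != len(cp_arr) + 1:
        raise ValueError("trend_arr must have one more entry than cp_arr")
    bounds = [0] + list(cp_arr) + [n]
    values = []
    current = float(intercept)
    for slope, start, end in zip(trend_arr, bounds[:-1], bounds[1:]):
        for _ in range(end - start):
            values.append(current)
            current += slope  # the next point continues from the previous one
    return values


trend = piecewise_linear(n=450, cp_arr=[100, 200], trend_arr=[3, 20, 2], intercept=30)
```

With these inputs the slope is 3 per step up to index 100, 20 per step up to index 200, and 2 per step afterwards.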