ARIMA is a class of statistical models for analyzing and forecasting time series data. It's an acronym that stands for AutoRegressive Integrated Moving Average.
ARIMA models are particularly suited to series that are non-stationary (for example, because of a trend) but become stationary after one or more rounds of differencing.
1. The Components of ARIMA: (p,d,q)
ARIMA is defined by three parameters:
- p (Autoregressive - AR): This is the number of lag observations included in the model. It captures the relationship between an observation and a number of lagged observations. For example, if p=2, we are using the values from the previous two time steps to predict the current value.
- d (Integrated - I): This is the number of times the series is differenced (each observation replaced by its change from the previous one) to make it stationary. If the data is already stationary, then d=0. If it needs one round of differencing, d=1.
- q (Moving Average - MA): This is the number of lagged forecast errors included in the model. It captures the relationship between an observation and the residual errors from previous time steps. Despite the name, it is not a rolling-average window over the observations themselves.
The model is expressed as ARIMA(p,d,q).
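To make the three parameters concrete, here is a minimal sketch of a single ARIMA(2,1,1) prediction computed by hand. The observations, coefficients, and past error below are hypothetical values chosen for illustration; in a real model the coefficients are estimated during fitting.

```python
import numpy as np

# Hypothetical observations with an upward trend (illustrative values only).
y = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# d = 1: difference once to remove the trend.
dy = np.diff(y, n=1)  # [2., 3., 4., 5.]

# Hypothetical coefficients (normally estimated when the model is fitted).
phi = [0.5, 0.2]      # p = 2 autoregressive coefficients
theta = [0.3]         # q = 1 moving-average coefficient
past_errors = [0.4]   # most recent one-step forecast error, assumed known

# AR part: weighted sum of the last p differenced values, most recent first.
ar_part = sum(phi[i] * dy[-1 - i] for i in range(len(phi)))
# MA part: weighted sum of the last q forecast errors, most recent first.
ma_part = sum(theta[j] * past_errors[-1 - j] for j in range(len(theta)))

# Predicted next difference: 0.5*5 + 0.2*4 + 0.3*0.4
next_diff = ar_part + ma_part
# Undo the differencing ("integrate") to get a forecast on the original scale.
next_value = y[-1] + next_diff

print(round(next_diff, 2), round(next_value, 2))
```

The AR and MA parts operate on the differenced series; adding the result back to the last observed level is the "Integrated" step in reverse.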
2. Handling Seasonality with SARIMA
ARIMA is great, but it doesn't explicitly support data with a seasonal component. For that, we use SARIMA (Seasonal ARIMA).
SARIMA adds another set of parameters to model the seasonality: (P,D,Q,m).
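- (P,D,Q): These are the seasonal equivalents of (p,d,q). They work the same way, but on lags that are multiples of m instead of consecutive time steps.
- m: The number of time steps in a single seasonal period (e.g., m=12 for monthly data with a yearly cycle, m=4 for quarterly data with a yearly cycle).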
The full model is expressed as SARIMA(p,d,q)(P,D,Q)m. In statsmodels, the class that implements it is called SARIMAX. (The 'X' means it can also include exogenous variables, which are external factors).
3. The Forecasting Pipeline
Here's a standard workflow for building an ARIMA/SARIMA model:
- Visualize the Data: Plot your time series to check for trends, seasonality, and outliers.
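- Check for Stationarity: Use the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root (i.e., is non-stationary). If the test fails to reject that hypothesis, apply differencing and test again. The number of regular and seasonal differences you end up applying gives your d and D values.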
- Determine Parameters (p,q,P,Q): The traditional way to do this is by analyzing the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the stationary time series.
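  - ACF plot: Helps determine the q (MA) parameter. For a pure MA(q) process, the ACF drops to near zero after lag q.
  - PACF plot: Helps determine the p (AR) parameter. For a pure AR(p) process, the PACF drops to near zero after lag p.
  - Note: In practice, reading these plots can be tricky, especially when both AR and MA terms are present. Often, a "grid search" is used instead: fit each candidate parameter combination and keep the one with the lowest AIC (Akaike Information Criterion).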
- Fit the Model: Instantiate and fit the SARIMA model with the chosen parameters.
- Evaluate the Model: Check the model's summary and diagnostic plots to ensure the residuals are well-behaved (e.g., resemble white noise).
- Forecast: Use the fitted model to predict future values.
4. Code Example
Let's fit a simple ARIMA model using statsmodels.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
# Create synthetic sample data: an upward trend plus a repeating
# three-step fluctuation whose size grows over time
data = {'value': [i + i/10 * (i%3-1) for i in range(100)]}
df = pd.DataFrame(data, index=pd.date_range(start='2018-01-01', periods=100, freq='W'))
# Split data into train and test sets
train_data = df.iloc[:80]
test_data = df.iloc[80:]
# Define and fit the ARIMA model
# We'll use p=5, d=1, q=0 as an example.
# These would typically be determined through analysis (ACF/PACF, grid search).
model = ARIMA(train_data['value'], order=(5, 1, 0))
model_fit = model.fit()
# Print model summary
print(model_fit.summary())
# Generate forecasts
forecast_steps = len(test_data)
forecast = model_fit.get_forecast(steps=forecast_steps)
# Get the predicted values and confidence intervals
forecast_mean = forecast.predicted_mean
confidence_intervals = forecast.conf_int()
# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(train_data.index, train_data['value'], label='Training Data')
plt.plot(test_data.index, test_data['value'], label='Actual Test Data', color='orange')
plt.plot(test_data.index, forecast_mean, label='Forecast', color='green')
plt.fill_between(test_data.index,
                 confidence_intervals.iloc[:, 0],
                 confidence_intervals.iloc[:, 1], color='k', alpha=0.1)
plt.title('ARIMA Forecast')
plt.legend()
plt.show()