While models like ARIMA and Prophet are powerful, they have limitations. Machine learning and deep learning models, such as Gradient Boosting, LSTMs (Long Short-Term Memory networks), or Transformers, can often capture more complex, non-linear patterns in data.
To use these models, we must first reframe our time series problem as a supervised learning problem. This involves creating a dataset where we have a set of input features (X) and a target variable (y) to predict.
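The core idea can be shown with a tiny, hypothetical series: each input row is a past value, and the target is the value that follows it.

```python
import numpy as np

# A minimal sketch: turn the series [10, 20, 30, 40, 50] into a
# supervised dataset using one lag feature (the previous value).
series = np.array([10, 20, 30, 40, 50])

X = series[:-1].reshape(-1, 1)  # inputs: the value at time t
y = series[1:]                  # targets: the value at time t+1

print(X.ravel())  # [10 20 30 40]
print(y)          # [20 30 40 50]
```

Every row of X is a feature vector, and y is the next observation, which is exactly the shape scikit-learn or any other supervised library expects.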
1. Feature Engineering for Time Series
This is the process of creating new, informative features from your existing time series data. These features will become the columns in your X matrix.
- Time-Based Features: These are features you can extract directly from the timestamp.
- Day of the week, week of the year, month, quarter
- Is it a weekend? Is it the beginning/end of a month?
- These help the model learn seasonal and cyclical patterns.
- Lag Features: These are the values of the series at previous time steps. A lag of 1 (lag_1) means each row also contains the previous day's value.
- Lags are usually the most important features, as they directly inform the model about the recent history of the series.
- Rolling Window Features: These are statistics calculated over a moving window of past data.
- Rolling mean (e.g., 7-day moving average) to capture the recent trend.
- Rolling standard deviation to capture recent volatility.
- Rolling min/max, and more.
Here's how you can create these features using pandas:
Python
import pandas as pd
import numpy as np
# Create sample data
data = {'value': np.random.randn(500).cumsum() + 50}
df = pd.DataFrame(data, index=pd.date_range(start='2023-01-01', periods=500, freq='D'))
# Time-based features
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
# Lag features (e.g., value from 1, 2, and 3 days ago)
df['lag_1'] = df['value'].shift(1)
df['lag_2'] = df['value'].shift(2)
df['lag_3'] = df['value'].shift(3)
# Rolling window features (e.g., 7-day rolling mean)
# shift(1) excludes the current value, so each window uses only past data (no target leakage)
df['rolling_mean_7'] = df['value'].shift(1).rolling(window=7).mean()
df['rolling_std_7'] = df['value'].shift(1).rolling(window=7).std()
# Drop rows with NaN values created by shifts and rolls
df.dropna(inplace=True)
print(df.head())
2. Creating Sequences for Deep Learning
Deep learning models like LSTMs are sequence models: instead of treating each row independently, they learn from ordered windows of data. To use an LSTM, we need to create input "windows" of historical data.
We take a fixed-length sequence of past data (e.g., the last 30 days) as our input X, and the value at the next time step (the 31st day) as our target y. We then slide this window across our entire time series to generate many training examples.
Python
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Assume 'df' is our feature-engineered DataFrame
# Select features and target
features = ['lag_1', 'lag_2', 'lag_3', 'rolling_mean_7', 'rolling_std_7']
target = 'value'
# Scale the data (important for neural networks)
scaler_features = MinMaxScaler()
scaler_target = MinMaxScaler()
X_scaled = scaler_features.fit_transform(df[features])
y_scaled = scaler_target.fit_transform(df[[target]])
# Function to create sequences
def create_sequences(X, y, time_steps=10):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        v = X[i:(i + time_steps)]
        Xs.append(v)
        ys.append(y[i + time_steps])
    return np.array(Xs), np.array(ys)
TIME_STEPS = 30 # Use the last 30 days of features to predict the next day's value
X_seq, y_seq = create_sequences(X_scaled, y_scaled, TIME_STEPS)
print('Shape of input sequences (X):', X_seq.shape)
print('Shape of target values (y):', y_seq.shape)
# Expected shapes (500 rows, minus 7 dropped by dropna, minus 30-step windows):
# (number_of_samples, time_steps, number_of_features) -> (463, 30, 5)
# (number_of_samples, 1) -> (463, 1)
This X_seq and y_seq data is now perfectly formatted to be fed into a deep learning sequence model (like an LSTM in Keras or PyTorch) for training. This approach allows the model to learn complex temporal patterns from multiple engineered features simultaneously.
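One detail worth remembering: because the target was scaled with MinMaxScaler, the model's predictions come out in the scaled [0, 1] range and must be mapped back with scaler_target.inverse_transform. A minimal sketch, using a stand-in array in place of real model output:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in target values (in place of a real 'value' column)
y = np.linspace(40, 60, 100).reshape(-1, 1)

scaler_target = MinMaxScaler()
y_scaled = scaler_target.fit_transform(y)

# Pretend these are the model's predictions on the scaled target
preds_scaled = y_scaled[-5:]

# Map predictions back to the original units
preds = scaler_target.inverse_transform(preds_scaled)
print(preds.ravel())  # values back on the original 40-60 scale
```

Keeping a separate scaler for the target (as done above with scaler_features and scaler_target) is what makes this inversion straightforward.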