Walk through TIME — Part-2 (Modelling of Analysis in Python)

Hitesh Tripathi
13 min read · Dec 27, 2020

Walk Forward Validation:

In time series modelling, predictions become less and less accurate the further ahead we forecast, so a more realistic approach is to re-train the model with actual data as it becomes available before making further predictions. Since training statistical models is not time consuming, walk-forward validation is the preferred way to get the most accurate results.
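A minimal sketch of the idea (the fit_and_forecast callable below is an illustrative stand-in for whichever model you use): fit on everything seen so far, forecast one step, then fold the actual observation back into the history.

import pandas as pd

def walk_forward(series: pd.Series, n_test: int, fit_and_forecast):
    """Walk-forward validation: re-train after every new observation."""
    history = list(series[:-n_test])      # initial training window
    actuals = list(series[-n_test:])      # held-out observations
    predictions = []
    for actual in actuals:
        predictions.append(fit_and_forecast(history))  # one-step forecast
        history.append(actual)            # next step trains on the true value
    return predictions, actuals

For example, walk_forward(series, 7, lambda h: h[-1]) would give the persistence forecast used later in this article.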

Random Walk Model:

A random walk is one in which future steps or directions cannot be predicted on the basis of past history. When the term is applied to the stock market, it means that short-run changes in stock prices are unpredictable.

When faced with a time series that shows irregular growth, such as X2 analyzed earlier, the best strategy may not be to try to directly predict the level of the series at each period (i.e., the quantity Yt). Instead, it may be better to try to predict the change that occurs from one period to the next (i.e., the quantity Yt − Yt-1). That is, it may be better to look at the first difference of the series, to see if a predictable pattern can be found there. For purposes of one-period-ahead forecasting, it is just as good to predict the next change as to predict the next level of the series, since the predicted change can be added to the current level to yield a predicted level. The simplest case of such a model is one that always predicts that the next change will be zero, as if the series is equally likely to go up or down in the next period regardless of what it has done in the past.

Figure: a random process for which this model is appropriate

In each time period, going from left to right, the value of the variable takes an independent random step up or down, a so-called random walk. If up and down movements are equally likely at each intersection, then every possible left-to-right path through the grid is equally likely a priori. See this link for a nice simulation. A commonly-used analogy is that of a drunkard who staggers randomly to the left or right as he tries to go forward: the path he traces will be a random walk.

Random Walk Model
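A quick simulation (not from the original article) makes the point concrete: for a pure random walk, "no change" is the best one-step forecast.

import numpy as np

rng = np.random.default_rng(42)
steps = rng.choice([-1, 1], size=500)   # equally likely up/down moves
walk = np.cumsum(steps)                 # the random walk itself
# Random walk forecast: the predicted change is zero, so each forecast
# is simply the previous observed level
forecast = walk[:-1]
actual = walk[1:]
print(np.mean((actual - forecast) ** 2))   # MSE is exactly 1.0 for unit steps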

Test-Train Split

Split the data frame (or arrays) into two subsets: one for training and one for testing.

import pandas as pd

temp_df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
train_size = int(temp_df.shape[0] * 0.8)   # first 80% for training
train = temp_df[0:train_size]
test = temp_df[train_size:]                # remaining 20% for testing

1. Persistence Model (Naive forecast):

In the persistence (or naive forecast) model, we assume that the value from the last time period is the forecast for the current period.

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
df['t'] = df['Temp'].shift(1)                            # lag-1 value: yesterday's temperature
train, test = df[1:df.shape[0]-7], df[df.shape[0]-7:]    # hold out the last 7 days
train_X, train_y = train['t'], train['Temp']
test_X, test_y = test['t'], test['Temp']

Walk-forward validation (of the persistence model)

predictions = test_X.copy()   # the persistence forecast is simply the lagged value
print(predictions)
print(test_y)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test_y, predictions)
mse
3.4228571428571413
import matplotlib.pyplot as plt
plt.plot(test_y)
plt.plot(predictions, color='red')

2. Autoregression Model (AR Model)

[Not applicable to time series that have trend or seasonality]
An autoregression model (AR model for short) relies only on past period values to predict current ones: as the name says, it is a regression of the series on its own previous values. The AR model runs a regression on the lagged values and uses it to predict future values. It is a linear model in which the current period value is a sum of past outcomes, each multiplied by a numeric coefficient. This model can only be used on series without trend and seasonality.

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
train, test = df.Temp[1:df.shape[0]-7], df.Temp[df.shape[0]-7:]
# Note: AR has since been deprecated in newer statsmodels in favour of AutoReg
from statsmodels.tsa.ar_model import AR
model = AR(train)
model_fit = model.fit()
# Number of lag variables chosen by the model
model_fit.k_ar
# Coefficients of the lag variables
model_fit.params
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test, predictions)
mse
1.501525231006946
plt.plot(test)
plt.plot(predictions, color='red')

Walk-forward validation (for the AR model)

import numpy as np
from statsmodels.tsa.ar_model import AR
from sklearn.metrics import mean_squared_error

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
train, test = df.Temp[1:df.shape[0]-7], df.Temp[df.shape[0]-7:]
data = train
predict = []
for t in test:
    # Re-fit the AR model on all data observed so far
    model = AR(data)
    model_fit = model.fit()
    # One-step-ahead forecast
    y = model_fit.predict(start=len(data), end=len(data))
    print(y.values[0])
    predict.append(y.values[0])
    # Fold the actual observation back into the training data
    data = pd.Series(np.append(data, t))
mse = mean_squared_error(test.values, predict)
mse
1.4524568425185247
plt.plot(test.values)
plt.plot(predict, color='red')

Smoothing

Smoothing is usually done to help us better see patterns, such as trends, in a time series. Generally, we smooth out the irregular roughness to see a clearer signal. For seasonal data, we might smooth out the seasonality so that we can identify the trend. Smoothing doesn't provide us with a model, but it can be a good first step in describing various components of the series.

The term filter is sometimes used to describe a smoothing procedure. For instance, if the smoothed value for a particular time is calculated as a linear combination of observations for surrounding times, it might be said that we’ve applied a linear filter to the data (not the same as saying the result is a straight line, by the way).

3. Moving Average Model (MA Model)

Moving averages are of two types: centered MA and trailing MA. MA is a method used for smoothing. Trailing MA is used in:

  • Feature Engineering
  • Forecasting

The traditional use of the term moving average is that at each point in time we determine (possibly weighted) averages of observed values that surround a particular time.

For instance, at time t, a “centered moving average of length 3” with equal weights would be the average of values at times t−1,t, and t+1.

To take away seasonality from a series so we can better see trend, we would use a moving average with a length = seasonal span. Thus in the smoothed series, each smoothed value has been averaged across all seasons. This might be done by looking at a “one-sided” moving average in which you average all values for the previous year’s worth of data or a centered moving average in which you use values both before and after the current time.

For quarterly data, for example, we could define a smoothed value for time t as

x̂t = (xt + xt-1 + xt-2 + xt-3) / 4,

the average of this time and the previous 3 quarters; in R code this would be a one-sided filter. A centered moving average creates a bit of a difficulty when we have an even number of time periods in the seasonal span (as we usually do). To smooth away seasonality in quarterly data, in order to identify trend, the usual convention is the centered moving average smoothed at time t,

x̂t = ((1/2)xt-2 + xt-1 + xt + xt+1 + (1/2)xt+2) / 4.

To smooth away seasonality in monthly data, in order to identify trend, the usual convention is the centered moving average smoothed at time t,

x̂t = (1/24)xt-6 + (1/12)(xt-5 + xt-4 + … + xt+4 + xt+5) + (1/24)xt+6.

That is, we apply weight 1/24 to values at times t−6 and t+6 and weight 1/12 to all values at all times between t−5 and t+5.
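A sketch of how this weighted centered moving average could be computed with pandas (assuming `series` is a monthly pd.Series; the function name is illustrative):

import numpy as np
import pandas as pd

# Weights for the 12-month centered moving average described above:
# 1/24 at the two ends (t-6 and t+6), 1/12 for the eleven values between
weights = np.r_[1/24, np.repeat(1/12, 11), 1/24]   # sums to 1

def centered_ma_12(series: pd.Series) -> pd.Series:
    # A window of 13 values centered on t spans t-6 .. t+6
    return series.rolling(window=13, center=True).apply(
        lambda w: np.dot(w, weights), raw=True)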

Single Exponential Smoothing

The basic forecasting equation for single exponential smoothing is often given as

x̂t+1 = αxt + (1 − α)x̂t.

We forecast the value of x at time t+1 to be a weighted combination of the observed value at time t and the forecasted value at time t. Although the method is called a smoothing method, it is principally used for short-run forecasting.

The value of α is called the smoothing constant. For whatever reason, α = 0.2 is a popular default choice in software. This puts a weight of 0.2 on the most recent observation and a weight of 1 − 0.2 = 0.8 on the most recent forecast. With a relatively small value of α, the smoothing is relatively more extensive. With a relatively large value of α, the smoothing is relatively less extensive, as more weight is put on the observed value.

This is a simple one-step-ahead forecasting method that at first glance seems not to require a model for the data. In fact, the method is equivalent to using an ARIMA(0,1,1) model with no constant.

The optimal procedure is to fit an ARIMA (0,1,1) model to the observed dataset and use the results to determine the value of α. This is “optimal” in the sense of creating the best α for the data already observed.

Although the goal is smoothing and one step ahead forecasting, the equivalence to the ARIMA(0,1,1) model does bring up a good point. We shouldn’t blindly apply exponential smoothing because the underlying process might not be well modeled by an ARIMA(0,1,1).
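In statsmodels, single exponential smoothing is available as SimpleExpSmoothing; a minimal sketch on the same temperature data used throughout this article:

import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
train = df.Temp[:-7]
# smoothing_level is alpha; optimized=False keeps our fixed choice of 0.2
fit = SimpleExpSmoothing(train).fit(smoothing_level=0.2, optimized=False)
forecast = fit.forecast(7)   # a flat forecast: the last smoothed level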

ARIMA(0,1,1) and Exponential Smoothing Equivalence

Consider an ARIMA(0,1,1) model with mean μ = 0 for the first differences, xt − xt-1:

xt − xt-1 = wt + θ1wt-1,

where wt is white noise. The one-step-ahead forecast is then x̂t+1 = xt + θ1wt = xt + θ1(xt − x̂t) = (1 + θ1)xt − θ1x̂t, which is the exponential smoothing equation above with α = 1 + θ1.

Why the Method is Called Exponential Smoothing

Start with the basic equation x̂t+1 = αxt + (1 − α)x̂t and substitute the same equation for x̂t. This yields the following:

x̂t+1 = αxt + (1 − α)[αxt-1 + (1 − α)x̂t-1] = αxt + α(1 − α)xt-1 + (1 − α)²x̂t-1.

Continue in this fashion by successively substituting for the forecasted value on the right side of the equation. This leads to:

x̂t+1 = αxt + α(1 − α)xt-1 + α(1 − α)²xt-2 + α(1 − α)³xt-3 + …    (Equation 2)

Equation 2 shows that the forecasted value is a weighted average of all past values of the series, with exponentially changing weights as we move back in the series.
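It is easy to verify the exponential decay of these weights numerically:

import numpy as np

alpha = 0.2
k = np.arange(10)
weights = alpha * (1 - alpha) ** k   # weight on the value k periods back
print(weights.round(4))              # 0.2, 0.16, 0.128, 0.1024, ...
print(weights.sum())                 # tends to 1 as more terms are included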

Double Exponential Smoothing

Double exponential smoothing might be used when there’s trend (either long run or short run), but no seasonality.

Essentially the method creates a forecast by combining exponentially smoothed estimates of the trend (slope of a straight line) and the level (basically, the intercept of a straight line).

Two different weights, or smoothing parameters, are used to update these two components at each time.

The smoothed “level” is more or less equivalent to a simple exponential smoothing of the data values and the smoothed trend is more or less equivalent to a simple exponential smoothing of the first differences.

The procedure is equivalent to fitting an ARIMA(0,2,2) model, with no constant; it can be carried out with an ARIMA(0,2,2) fit.
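statsmodels implements double exponential smoothing as Holt; a minimal sketch (parameter names as in recent statsmodels, and the smoothing values here are only illustrative):

import pandas as pd
from statsmodels.tsa.holtwinters import Holt

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
train = df.Temp[:-7]
# smoothing_level updates the level, smoothing_trend updates the slope
fit = Holt(train).fit(smoothing_level=0.8, smoothing_trend=0.2)
forecast = fit.forecast(7)   # extrapolates the smoothed level + trend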

The following code implements the MA-style model used in this article: fit an AR model to the one-step changes (Temp − lagged Temp), then add the predicted change back onto the last known level.

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
df['t'] = df['Temp'].shift(1)
df['Resid'] = df['Temp'] - df['t']   # one-step change (first difference)
train, test = df.Resid[1:df.shape[0]-7], df.Resid[df.shape[0]-7:]
from statsmodels.tsa.ar_model import AR
model = AR(train)
model_fit = model.fit()
# k_ar is the lag length of the residual variables
model_fit.k_ar
pred_resid = model_fit.predict(start=len(train), end=len(train)+len(test)-1)
df.t[df.shape[0]-7:]
# Forecast = last known level + predicted change
predictions = df.t[df.shape[0]-7:] + pred_resid
from sklearn.metrics import mean_squared_error
# test_y (the actual temperatures) comes from the persistence split above
mse = mean_squared_error(test_y, predictions)
mse
2.049398556648208
plt.plot(test_y)
plt.plot(predictions, color='red')

Walk-forward validation (for the MA model)

df = pd.read_csv('daily-min-temperatures.csv', header=0, parse_dates=[0])
df['t'] = df['Temp'].shift(1)
df['resid'] = df['Temp'] - df['t']
train, test = df.resid[1:df.shape[0]-7], df.resid[df.shape[0]-7:]
from statsmodels.tsa.ar_model import AR
data = train
predict = []
for t in test:
    # Re-fit the AR model on all residuals observed so far
    model = AR(data)
    model_fit = model.fit()
    # One-step-ahead forecast of the residual
    y = model_fit.predict(start=len(data), end=len(data))
    print(y.values[0])
    predict.append(y.values[0])
    # Fold the actual residual back into the training data
    data = pd.Series(np.append(data, t))
# Forecast = last known level + walk-forward predicted changes
predictions = df.t[df.shape[0]-7:] + np.array(predict)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test_y, predictions)
mse
2.049398556648208

plt.plot(test_y)
plt.plot(predictions, color='red')

4. What Is ARIMA?

ARIMA stands for Auto Regressive Integrated Moving Average. ARIMA is a simple stochastic time series model that we can train and then use to forecast future time points. ARIMA can capture complex relationships because it takes into account both error terms and lagged observations. These models rely on regressing a variable on its own past values.

ARIMA is Auto Regressive — (AR)

The Auto Regressive (AR) property of ARIMA is referred to as p.

Past time points of a time series can impact its current and future time points. ARIMA models take this concept into account when forecasting current and future values. ARIMA uses a number of lagged observations of the time series to forecast new observations. A weight is applied to each past term, and the weights can vary based on how recent the term is.

AR(x) means x lagged observations of the series are used in the ARIMA model.

ARIMA relies on autoregression: the process of regressing a variable on past values of itself. In an autoregressive series, the autocorrelations decay gradually, and they estimate the degree to which white noise characterizes the series.

ARIMA is Integrated — (I)

If a trend exists, the time series is considered non-stationary. Integrated is the property that removes trend from a time series and makes it stationary: ARIMA models apply a degree of differencing, which eliminates the trend. (Seasonality is handled by seasonal differencing, covered under SARIMA below.)

The d property of ARIMA represents the degree of differencing.

ARIMA is Moving Average — (MA)

Error terms from previous time points are used to predict the current and future points' observations. The moving average (MA) component removes non-determinism, or random movements, from a time series. The property q represents the moving average order in ARIMA. It is expressed as MA(x), where x is the number of previous error terms used to calculate the current observation.

Moving average models have a fixed window, and the weights are relative to time. This implies that MA models are more responsive to current events and are more volatile.

p (autoregressive), d (integrated) and q (moving average) are the three properties of an ARIMA model.

Coefficients are calculated recursively: the model is chosen so that the estimated results are as close as possible to the actual observed values. This process is iterative in nature.

df = pd.read_csv('shampoo.csv', header=0, parse_dates=[0])
# The shampoo dataset stores dates like '1-01'; prefixing '190' yields '1901-01'
df['Date'] = pd.to_datetime('190' + df.Month, format='%Y-%m')
df = df.dropna()
df['Sales'].plot()
The line plot shows a polynomial trend, so we take the degree of differencing d = 2.

Autocorrelation Plot

ACF is the (complete) auto-correlation function, which gives us values of the auto-correlation of a series with its lagged values. We plot these values along with the confidence band and tada! We have an ACF plot. In simple terms, it describes how well the present value of the series is related to its past values. A time series can have components like trend, seasonality, cyclic behaviour and residuals. ACF considers all these components while finding correlations, hence it is a 'complete' auto-correlation plot.

from pandas.plotting import autocorrelation_plot
autocorrelation_plot(df['Sales'])

In the ACF plot, the line crosses the upper 95% confidence interval at around lag 5, where it also meets the 99% C.I. (the dotted line), so we take the order of the moving average as
q = 5

Partial Autocorrelation Graph

PACF is the partial auto-correlation function. Instead of finding correlations of the present with its lags as ACF does, it finds the correlation of the residuals (what remains after removing the effects already explained by earlier lags) with the next lag value, hence 'partial' rather than 'complete': we remove already-found variations before we find the next correlation. So if there is any hidden information in the residual that can be modeled by the next lag, we will get a good correlation and keep that lag as a feature while modeling. Remember, while modeling we don't want to keep too many correlated features, as that can create multicollinearity issues; hence we retain only the relevant lags.

from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(df['Sales'], lags=15)
# Here the blue band is our CI
Since lag 2 in the PACF plot is above our CI band, the order of autoregression is p = 2.
# Note: statsmodels.tsa.arima_model.ARIMA has since been replaced by
# statsmodels.tsa.arima.model.ARIMA in newer statsmodels
from statsmodels.tsa.arima_model import ARIMA
# Fit an ARIMA(5,1,0): 5 AR lags, first-order differencing, no MA terms
model = ARIMA(df['Sales'], order=(5,1,0))
model_fit = model.fit()
model_fit.summary()
# Inspect the residual errors of the fit
residuals = model_fit.resid
residuals.plot()
plt.title('ARIMA Fit Residual Error Line Plot')
plt.show()
residuals.plot(kind='kde')
plt.title('ARIMA Fit Residual Error Density Plot')
plt.show()
print(residuals.describe())

5. SARIMA or Seasonal ARIMA

SARIMA(p,d,q)(P,D,Q)m

  • p: Trend autoregression order.
  • d: Trend difference order.
  • q: Trend moving average order.

Seasonal Elements -
There are four seasonal elements that are not part of ARIMA and must be configured; they are:

  • P: Seasonal autoregressive order.
  • D: Seasonal difference order.
  • Q: Seasonal moving average order.
  • m: The number of time steps for a single seasonal period.

from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose   # needed for the decomposition below

df = pd.read_csv('us-airlines-monthly-aircraft-miles-flown.csv', header=0, parse_dates=[0])
df.index = df['Month']
# Decompose to inspect the seasonal structure (multiplicative here)
result_a = seasonal_decompose(df['MilesMM'], model='multiplicative')
result_a.plot()
# m = 12: monthly data with a yearly seasonal period
model = SARIMAX(df['MilesMM'], order=(5,1,3), seasonal_order=(1,1,1,12))
model_fit = model.fit()
residuals = model_fit.resid
residuals.plot()
output = model_fit.forecast()   # one-step-ahead forecast
model_fit.forecast(12)          # 12 steps ahead
yhat = model_fit.predict()      # in-sample predictions
yhat.head()
plt.plot(df['MilesMM'])
plt.plot(yhat, color='red')
Model fitting with SARIMA

If you found this article interesting and it contributed to your learning, give it a few claps, or better still, share it with your friends or colleagues.

You can find me on LinkedIn: https://www.linkedin.com/in/hiteshtripathi/
