Financial time series forecasting using ARIMA

Chamundeswari Koppisetti
7 min read · Apr 6, 2020

There is a common notion that the stock market is a high-risk gamble. But it isn't! Successful investors have shown that building trading strategies around micro- and macro-level financial and economic factors can make this "gamble" far more predictable. To build a trading strategy, one needs to predict stock price movements in order to take a position in the market.

To predict stock prices, there are three broad approaches: fundamental analysis, technical analysis, and statistical methods (which include machine learning).

Fundamental analysis focuses on a company's financials, while technical analysis uses various technical indicators (Bollinger Bands, RSI, MACD) under the assumption that all information about the market and its future fluctuations is already contained in the price series. Both of these methods ignore short-term market variations and lack predictive power. Statistical methods (time-series methodologies or machine learning), on the other hand, consider both long-term and short-term data to extract meaningful information from the series and predict the future.

In this blog we will focus on one specific model, ARIMA, a time series forecasting technique, by answering the following questions:

1. Can stock prices be modeled by Time series forecasting methods?

2. What are the different Time series methods available?

3. Why is ARIMA a better time series model?

4. How can the ARIMA model be implemented on financial data?

5. Implementing ARIMA on Alphabet Inc. stock data in Python

Can stock prices be modeled by Time series forecasting methods?

Time series data is simply a time-ordered sequence of observations, and it can be used to analyze any variable that changes over time. Data like daily stock closing prices, collected at successive, regular intervals, is therefore a time series. Here time is the independent variable, and the goal is usually to make a forecast for the future. Future stock prices depend on current values and on their short- and long-term historical behavior. Since time series forecasting methods assume that the observations are dependent (correlated), they are well suited to modeling stock prices by exploiting the historical values and associated patterns in the price data.
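As a minimal sketch of what such data looks like in Python, the snippet below builds a small pandas Series of hypothetical daily closing prices indexed by trading date (the values are illustrative, not real market data):

```python
import pandas as pd

# Hypothetical daily closing prices indexed by business-day dates.
# The time index is the independent variable; each observation
# depends on its recent history, which is what ARIMA exploits.
dates = pd.date_range("2020-01-01", periods=5, freq="B")  # business days
prices = pd.Series([1432.5, 1441.2, 1437.8, 1450.1, 1448.9], index=dates)

print(prices)
```

In practice the index would come directly from the data source (e.g., the date column of a downloaded price history).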

What are the different Time series methods available?

Some traditional statistical methods used for time series forecasting are:

• Random walk: predicts the future value as the previous value plus a random change.

• Simple moving average: predicts the future value as the average of the previous few observations.

• Simple exponential smoothing: makes a prediction by calculating a weighted sum of past observations, with weights decaying for older observations.

• Holt's linear trend method: an extension of simple exponential smoothing that allows forecasting of data with a trend. The forecasts it generates display a constant (increasing or decreasing) trend indefinitely into the future.

• Holt-Winters method: an extension of Holt's linear method that also accounts for seasonality in the time series.

• Autoregressive Integrated Moving Average (ARIMA): builds a linear regression model (AR, Auto-Regression) on data that has been prepared (I, Integrated, and MA, Moving Average) to remove the trend and seasonal structures that would otherwise hurt the regression model.
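To make the simpler methods in this list concrete, here is a minimal sketch of the one-step-ahead forecasts for a random walk, a simple moving average, and simple exponential smoothing, on a toy series (the numbers and the smoothing factor are illustrative choices, not library code):

```python
import numpy as np

# Toy series of five observations.
y = np.array([10.0, 10.4, 10.1, 10.7, 11.0])

# Random walk: the next value is the last value (plus an unpredictable shock).
rw_forecast = y[-1]

# Simple moving average over the last 3 observations.
sma_forecast = y[-3:].mean()

# Simple exponential smoothing: a weighted sum of past observations,
# with weights decaying geometrically via the smoothing factor alpha.
alpha = 0.5
ses = y[0]
for obs in y[1:]:
    ses = alpha * obs + (1 - alpha) * ses
ses_forecast = ses
```

Each method uses more of the series' history than the one before it; ARIMA goes further still by modeling the correlation structure explicitly.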

Why is ARIMA a better time series model?

ARIMA models are known to be robust and efficient for financial time series forecasting, since they use both past values of the series and previous error terms. ARIMA also does not assume knowledge of any underlying structural model or relationships, and for short-term forecasting it is often a better predictor than more complex structural models.

If a time series has seasonal patterns, seasonal terms need to be added, and the model becomes SARIMA, short for "Seasonal ARIMA".

A detailed explanation of how ARIMA works and how it can be implemented is mentioned below.

What is the ARIMA model and how can it be implemented on financial data?

The ARIMA model has three main components: (1) Auto-Regression (p), (2) Integrated (d), and (3) Moving Average (q).

Auto-Regression: models the relationship between the series and its lagged observations (a lag is a delay: one set of observations in the series is compared against an earlier set of the same series);

Moving Average: Uses the dependency between an observation and a residual error from a moving average model applied to lagged observations;

Integrated: a component that makes the series stationary (e.g., subtracting the previous observation from the current one in order to make the time series stationary).

On a high level, ARIMA models the time series by capturing the autocorrelation in the series. For accurate forecasting it requires the series to be stationary, which means that the mean, variance, and autocorrelation structure of the series do not change with time.

How ARIMA handles non-stationarity: the Integrated part of ARIMA turns a non-stationary series into a stationary one via the "differencing" procedure. This 'I' component can handle two types of non-stationarity: hidden trends (linear, polynomial, seasonal, etc.) and unit roots (random walk with drift). Differencing removes any polynomial trend, and the higher the degree of the polynomial, the more differencing is required to make the series stationary. The level of differencing is denoted by d in the ARIMA(p, d, q) parameters.

So ARIMA can convert a non-stationary series to a stationary one, provided the parameters are specified correctly. (To use the ARMA model, which has no Integrated (I) component, the series must first be made stationary.)
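The differencing idea can be sketched in a few lines: a series with a linear trend is non-stationary in the mean, and taking first differences (d = 1) removes that trend. The slope and noise level below are illustrative choices on simulated data:

```python
import numpy as np

# A linear trend plus noise: non-stationary, since the mean grows with time.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.5 * t + rng.normal(0, 1, size=200)

# d = 1: differences of consecutive observations.
diff1 = np.diff(series)

# After differencing, the values fluctuate around the constant trend
# slope (0.5) instead of growing with time.
print(diff1.mean())
```

A quadratic trend would similarly need d = 2 (differencing twice), and so on for higher-degree polynomials.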

Construction of the ARIMA model by finding the 3 parameters p,d,q:

1.A. Check stationarity:

The term 'Auto-Regressive' in ARIMA means it is a linear regression model that uses its own lags as predictors. Linear regression models generally work best when the predictors are uncorrelated and independent of each other, and hence the Auto-Regression (AR) part of ARIMA works best on a stationary series. So the first step is to check whether the series is stationary. Rather than looking at plots and eyeballing trends or changing variance, we can apply statistical tests.

There are two widely used tests for detecting stationarity.

1. KPSS Test

2. Augmented Dickey-Fuller test (ADF)

The stationarity test, KPSS, has the null hypothesis that the series is stationary, whereas the unit-root test, the Augmented Dickey-Fuller (ADF) test, takes the opposite null hypothesis: that the series possesses a unit root and hence is not stationary. It is safer to run a test (ADF) whose null hypothesis is non-stationarity, because this reduces the risk of failing to reject the null hypothesis (a type 2 error) for a series that is actually stationary. For the ADF test, a p-value below 0.05 indicates that the series is stationary. If the p-value is higher, the series is non-stationary, and the differencing parameter d in ARIMA must be tuned as explained below to convert it to stationary.

1.B. Differencing (‘d’):

The differencing method computes the differences between consecutive observations, i.e., it subtracts the previous value from the current value. The level of differencing is denoted by the parameter d. In the stationarity tests above, the differencing level is 0. Next, difference the series one or more times and recompute the p-values of the statistical tests to check for stationarity. Whenever the series is found to be stationary, choose that differencing level as the parameter d in the ARIMA model.

Note: in this process of differencing, the series might end up slightly over-differenced or slightly under-differenced. Add an additional MA term to handle over-differencing, and try adding one or more additional AR terms for under-differencing.

2. Auto-Regressive (‘p’):

Once the time series is stationary, we need to find the required number of AR (lag) terms for forecasting the series. The p parameter in ARIMA incorporates the effect of past values into the model. To choose the right number of lags (the p parameter), we use the Partial Autocorrelation Function (PACF) plot. Partial autocorrelation is the correlation between the series and its lag after excluding the contributions of the intermediate lags, so the PACF gives the pure correlation between a lag and the series.

Any autocorrelation in a stationarized series can be rectified by adding enough AR terms, so initially take the order of the AR term to be the number of lags that cross the significance limit in the PACF plot. Having too many lag terms can create multicollinearity issues, so choose the p parameter carefully from the PACF plot.

3. Moving Average (‘q’):

The q parameter in ARIMA lets us model the error of the series as a linear combination of the error values observed at previous time points, which makes the forecasts from the auto-regression (AR) part more accurate. So, similar to using the PACF plot for the number of AR terms, check the ACF plot (autocorrelation: the correlation of the series with its own values at previous times) for the number of MA terms. An MA term is, technically, the error of a lagged forecast. Check the lags at which the ACF is above the significance line and choose that value as the q component of the ARIMA parameters.

Finally, tune the p and q parameters using AIC and BIC to check whether they agree with the values derived above from the ACF and PACF plots.

Now the ARIMA model can be fitted on the time series, differenced as needed (at least once here) to make it stationary, by combining the chosen AR and MA terms.

So, predicted Yt = constant + linear combination of the lags of Y (up to p lags) + linear combination of lagged forecast errors (up to q lags).

Implementing ARIMA for forecasting stock prices (Python code)

Data: Alphabet Inc. adjusted close price data covering over 16 years, pulled from the Yahoo Finance website. I applied the ARIMA model described above to predict the next day's adjusted close price in Python.

Conclusion:

Although ARIMA is a simple and powerful method for many time series forecasting problems, it has an inherent weakness: each predicted value is a regression against prior values. ARIMA can use the past to forecast the next value, but it is not a good means of seeing far into the future, since long-term forecasts are built on previously forecasted values. Errors therefore accumulate, and long-term forecasts can be misleading.
