Machine Learning Newsletter

Trend Estimation via Hodrick-Prescott Filter

Trend Estimation Methods

As sensors become more readily available, data collection becomes more ubiquitous, and machine-to-machine communication (a.k.a. the Internet of Things) spreads, time series signals play an increasingly important role both in data collection and, naturally, in data analysis. Aggregating data from many sources and many people makes time series analysis crucial in these settings. Detecting trends and patterns in time series signals lets people respond to these changes and act intelligently. Historically, trend estimation has been used in macroeconomics, financial time series analysis, revenue management, and many other fields to reveal the underlying trends in time series signals.

Trend estimation is a family of methods for detecting and predicting tendencies and regularities in time series signals without any a priori knowledge about the signal. Besides trends, it can also reveal the seasonality (cycles) of the data. Robust estimation of increasing and decreasing trends not only extracts useful information from the signal but also lets us act on it, which matters whenever response time is important.

When a signal is quite volatile and you want to see its mid-to-long-term behavior while ignoring short-term or seasonal effects, you can use a band-pass filter to isolate the medium-range changes over time. Alternatively, there are trend filters that economists have long used to separate seasonality from the mid-to-long-term component. Consider the sales of a product over time: the series will almost certainly show seasonality (during the holiday season, the product sells much more than in any other period). When you plot it over several years, you see a recurring cycle in the time series signal.
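As a minimal sketch of the band-pass idea, the snippet below keeps only the frequency components whose period falls in a chosen medium range, using an ideal filter in the frequency domain built on NumPy's FFT. The helper ideal_bandpass and the synthetic series are illustrative assumptions, not part of the original analysis; economists more often use approximate band-pass filters such as Baxter-King or Christiano-Fitzgerald, which statsmodels provides as sm.tsa.filters.bkfilter and sm.tsa.filters.cffilter.

```python
import numpy as np


def ideal_bandpass(y, min_period, max_period):
    """Illustrative helper: keep only the components of y whose period
    (in samples) lies between min_period and max_period; zero the rest."""
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y))  # frequencies in cycles per sample
    keep = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    return np.fft.irfft(spectrum * keep, n=len(y))


# Synthetic series: slow drift + medium cycle + fast wiggle
t = np.arange(240)
slow = np.sin(2 * np.pi * t / 240)      # period 240 samples
medium = np.sin(2 * np.pi * t / 24)     # period 24 samples
fast = 0.5 * np.sin(2 * np.pi * t / 4)  # period 4 samples
y = slow + medium + fast

# Keep periods between 6 and 60 samples: only the medium cycle survives
filtered = ideal_bandpass(y, min_period=6, max_period=60)
print(np.allclose(filtered, medium))  # True
```

The ideal filter recovers the medium cycle exactly here because each sinusoid completes a whole number of cycles in the sample; on real data it can ring near the ends of the series, which is one reason the approximate filters are usually preferred.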

Hodrick-Prescott Filter

There are various ways to estimate a trend. One is to decompose the signal into two components: a cyclical part (short term) and a trend part (medium to long term). This is what the Hodrick-Prescott filter does.

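The Hodrick-Prescott (HP) filter decomposes a time series signal $y_t$ into a trend $x_t$ (medium-term growth) and a cyclical component $c_t$ (recurring and seasonal fluctuations). In signal-processing terms, the trend is a low-pass filtered version of $y_t$ and the cyclical component is the high-pass remainder: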

$$y_t = x_t + c_t$$

The loss function that it tries to minimize is the following:

$$\min_{\{x_t\}} \sum_{t=1}^{T} \left(y_t - x_t\right)^{2} + \lambda \sum_{t=3}^{T} \left[\left(x_t - x_{t-1}\right) - \left(x_{t-1} - x_{t-2}\right)\right]^{2}$$

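The first term is the sum of squared deviations of the original signal from the trend, that is, the energy of the cyclical component $c_t = y_t - x_t$. The second term penalizes the squared second differences of the trend, that is, how much its slope changes from one period to the next. The smoothing parameter $\lambda$ sets the trade-off between fitting the signal closely and keeping the trend smooth.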
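Stacking the observations into vectors $y = (y_1, \dots, y_T)^\top$ and $x = (x_1, \dots, x_T)^\top$ turns the loss into a penalized least-squares problem, which shows that the trend is a linear function of the signal. Here $D$ is the $(T-2) \times T$ second-difference matrix, a notation introduced only for this derivation, whose row for period $t$ computes $x_t - 2x_{t-1} + x_{t-2}$:

$$D = \begin{pmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{pmatrix}, \qquad \min_{x} \; \lVert y - x \rVert_2^2 + \lambda \lVert D x \rVert_2^2$$

Setting the gradient with respect to $x$ to zero gives the first-order condition and a closed-form trend:

$$-2\,(y - \hat{x}) + 2\lambda D^\top D \hat{x} = 0 \quad\Longrightarrow\quad \hat{x} = \left(I + \lambda D^\top D\right)^{-1} y, \qquad \hat{c} = y - \hat{x}$$

Because $I + \lambda D^\top D$ is symmetric positive definite for every $\lambda \ge 0$, this solution is unique.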

The smoothing parameter controls which effects the trend captures. If you want the trend to keep some of the short-term variation and volatility, use a smaller smoothing parameter, which gives a less smooth trend. If you only care about the long-term behavior, you can make the smoothing parameter very large. The two extremes show the limits: with $\lambda = 0$ the trend equals the signal itself, and as $\lambda \to \infty$ the trend approaches the least-squares straight line through the data, because only a straight line has zero second differences. So if you still want the trend to reflect changes in direction, do not choose an extremely large smoothing parameter.
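A minimal NumPy sketch of the filter, assuming a dense linear solve is acceptable for a short series: it finds the trend by solving $(I + \lambda D^\top D)\,x = y$, where $D$ is the second-difference matrix, which is the first-order condition of the loss above. The helpers hp_filter and roughness and the synthetic monthly-style series are illustrative assumptions; a production implementation (statsmodels' hpfilter among them) solves the same system with sparse matrices. The sketch checks the two limits and shows how the trend gets smoother, and the cyclical component larger, as $\lambda$ grows.

```python
import numpy as np


def hp_filter(y, lamb=1600.0):
    """Dense Hodrick-Prescott filter (illustrative helper, not statsmodels).

    Solves (I + lamb * D.T @ D) @ trend = y, where D is the (T-2) x T
    second-difference matrix, and returns (cycle, trend) in the same order
    as statsmodels' hpfilter.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]  # row computes x[i] - 2*x[i+1] + x[i+2]
    trend = np.linalg.solve(np.eye(T) + lamb * (D.T @ D), y)
    return y - trend, trend


def roughness(x):
    """Sum of squared second differences: the HP penalty without lambda."""
    return float(np.sum(np.diff(x, n=2) ** 2))


# Hypothetical series: linear growth + cycle of period 12 + noise
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)
y = 0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.3, size=t.size)

# lambda = 0: no smoothness penalty, so the trend reproduces the signal
_, trend_zero = hp_filter(y, lamb=0.0)

# Very large lambda: the trend approaches the least-squares straight line
_, trend_big = hp_filter(y, lamb=1e8)
line = np.polyval(np.polyfit(t, y, 1), t)

# In between: larger lambda gives a smoother trend and a larger cycle
results = []
for lamb in [1e1, 1e3, 1e5]:
    cycle, trend = hp_filter(y, lamb=lamb)
    results.append((lamb, roughness(trend), float(np.mean(cycle ** 2))))
    print(f"lambda={lamb:.0e}  roughness={results[-1][1]:.4f}  "
          f"mean squared cycle={results[-1][2]:.4f}")

print(np.allclose(trend_zero, y), np.max(np.abs(trend_big - line)))
```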

In the following section, I look at Apple's revenue and its stock price to see whether trend estimation reveals an increase or a decrease in these time series signals over time.

In [2]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import quandl  # the current package name; older releases were imported as "Quandl"
import statsmodels.api as sm

plt.style.use('fivethirtyeight')
In [3]:
_FIG_SIZE = (16, 12)
In [4]:
# Quarterly net sales revenue and daily stock prices for Apple (AAPL)
df_aapl_revenue = quandl.get("SEC/AAPL_SALESREVENUENET_Q", start_date="2009-06-27", end_date="2014-06-28")
df_aapl = quandl.get("WIKI/AAPL", start_date="1980-12-12", end_date="2014-10-16")
In [5]:
df_aapl.head()
Out[5]:
             Open   High    Low  Close   Volume  Ex-Dividend  Split Ratio  Adj. Open  Adj. High  Adj. Low  Adj. Close  Adj. Volume
Date
1980-12-12  28.75  28.88  28.75  28.75  2093900            0            1   0.475256   0.477405  0.475256    0.475256    117258400
1980-12-15  27.38  27.38  27.25  27.25   785200            0            1   0.452609   0.452609  0.450460    0.450460     43971200
1980-12-16  25.38  25.38  25.25  25.25   472000            0            1   0.419547   0.419547  0.417398    0.417398     26432000
1980-12-17  25.88  26.00  25.88  25.88   385900            0            1   0.427813   0.429796  0.427813    0.427813     21610400
1980-12-18  26.62  26.75  26.62  26.62   327900            0            1   0.440045   0.442194  0.440045    0.440045     18362400
In [6]:
df_aapl_revenue.head()
Out[6]:
Value
Date
2009-06-27 8337000000
2009-12-26 15683000000
2010-03-27 13499000000
2010-06-26 15700000000
2010-09-25 20343000000
In [7]:
fig, ax = plt.subplots(figsize=_FIG_SIZE)
ax.plot(df_aapl_revenue.index, df_aapl_revenue.Value)
ax.set_title('Apple Revenue');
In [8]:
fig, ax = plt.subplots(figsize=_FIG_SIZE)
ax.plot(df_aapl.index, df_aapl.Close)
ax.set_title('Closing Value of Apple Stock');
In [9]:
# hpfilter returns (cycle, trend); lamb defaults to 1600
revenue_cycle, revenue_trend = sm.tsa.filters.hpfilter(df_aapl_revenue.Value)
stock_cycle, stock_trend = sm.tsa.filters.hpfilter(df_aapl.Close)

# Work on copies so that adding columns does not modify the original frames
revenue_df = df_aapl_revenue.copy()
revenue_df['cycle'] = revenue_cycle
revenue_df['trend'] = revenue_trend

stock_df = df_aapl[['Close']].copy()
stock_df['cycle'] = stock_cycle
stock_df['trend'] = stock_trend

revenue_df.plot(figsize=_FIG_SIZE, title='Revenue Plot of Cycle and Trend');

In this plot, the trend and cycle components are easy to see. The trend follows the overall growth in revenue closely and is not affected by short-term volatility. Trend estimation can also be used the other way around: if you are more interested in the seasonality or periodicity of the signal than in its trend, the cyclical component captures it.
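To illustrate the seasonality point without depending on the Quandl data, here is a sketch on a hypothetical quarterly series: steady growth plus a fixed bump every fourth quarter, loosely mimicking holiday sales. It uses a dense NumPy HP filter (an illustrative helper, not the statsmodels implementation) with the conventional quarterly $\lambda = 1600$. The bump repeats too quickly to enter the trend, so it should show up almost entirely in the cyclical component.

```python
import numpy as np


def hp_filter(y, lamb=1600.0):
    """Dense Hodrick-Prescott filter (illustrative helper): returns (cycle, trend)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]  # second difference of three consecutive points
    trend = np.linalg.solve(np.eye(T) + lamb * (D.T @ D), y)
    return y - trend, trend


# Hypothetical quarterly "revenue": steady growth plus a bump every fourth quarter
quarters = np.arange(24)
growth = 10.0 + 0.5 * quarters
is_holiday = quarters % 4 == 3
y = growth + np.where(is_holiday, 4.0, 0.0)

cycle, trend = hp_filter(y, lamb=1600.0)

# The bump lands in the cycle; the trend stays close to growth plus the average bump
print(f"mean cycle, holiday quarters: {cycle[is_holiday].mean():.2f}")
print(f"mean cycle, other quarters:   {cycle[~is_holiday].mean():.2f}")
print(f"max |trend - (growth + 1)|:   {np.max(np.abs(trend - (growth + 1.0))):.2f}")
```

The average bump is 1.0 per quarter (4.0 once a year), which is why the trend is compared against growth + 1.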

In [10]:
stock_df.plot(figsize=_FIG_SIZE, title='Stock Price of Cycle and Trend by HP Filter');

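When the signal has no obvious cyclical behavior, the filter does not do as good a job as it does on a signal with a clear cycle. Here it removes some of the volatility, but it is much less effective at extracting the medium-range signal than in the revenue plot above. Part of the reason is that the default smoothing parameter of 1600 is calibrated for quarterly data, while these stock prices are daily, so the trend follows the price closely.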

In [18]:
lamb = 1e6
stock_cycle_annual, stock_trend_annual = sm.tsa.filters.hpfilter(df_aapl.Close, lamb=lamb)
stock_df_annual = pd.DataFrame(df_aapl['Close'])
stock_df_annual['cycle'] = stock_cycle_annual
stock_df_annual['trend'] = stock_trend_annual
stock_df_annual.plot(figsize=_FIG_SIZE, title='Stock Price of Cycle and Trend by HP Filter with smoothing parameter {}'.format(int(lamb)));

This trend fits well and still captures some of the subtle short-term changes. As you increase the smoothing parameter, the trend becomes smoother and the cyclical component absorbs more of the variation, including slower fluctuations that the trend no longer follows.

In [17]:
lamb = 1e8
stock_cycle_annual, stock_trend_annual = sm.tsa.filters.hpfilter(df_aapl.Close, lamb=lamb)
stock_df_annual = pd.DataFrame(df_aapl['Close'])
stock_df_annual['cycle'] = stock_cycle_annual
stock_df_annual['trend'] = stock_trend_annual
stock_df_annual.plot(figsize=_FIG_SIZE, title='Stock Price of Cycle and Trend by HP Filter with smoothing parameter {}'.format(int(lamb)));

The trend is now much smoother, and the cyclical component captures more and more of the short-term volatility. Let's increase the smoothing parameter once more to make the trend a little smoother still.

In [15]:
lamb = 1e9
stock_cycle_annual, stock_trend_annual = sm.tsa.filters.hpfilter(df_aapl.Close, lamb=lamb)
stock_df_annual = pd.DataFrame(df_aapl['Close'])
stock_df_annual['cycle'] = stock_cycle_annual
stock_df_annual['trend'] = stock_trend_annual
stock_df_annual.plot(figsize=_FIG_SIZE, title='Stock Price of Cycle and Trend by HP Filter with smoothing parameter {}'.format(int(lamb)));

This trend follows only the medium-range changes, and only when they persist over time. Even significant jumps (e.g. around 2012) do not affect it much.

In general, if the signal has no clear cyclical behavior and contains a lot of short-term volatility, choose a larger smoothing parameter to smooth the signal further before reading off the trend.
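A common rule of thumb for choosing the smoothing parameter at other sampling frequencies is Ravn and Uhlig's suggestion to scale the quarterly value of 1600 by the fourth power of the change in observation frequency. The sketch below applies that rule. The helper name ravn_uhlig_lambda and the figure of about 63 trading days per quarter (roughly 252 per year) are illustrative assumptions, and the rule was derived for macroeconomic data, so treat the result as a starting point rather than a prescription for stock prices.

```python
def ravn_uhlig_lambda(periods_per_quarter, quarterly_lambda=1600.0):
    """Illustrative helper: scale the quarterly HP smoothing parameter to
    another sampling frequency with the fourth-power rule."""
    return quarterly_lambda * periods_per_quarter ** 4


print(ravn_uhlig_lambda(1 / 4))         # annual data: 6.25
print(ravn_uhlig_lambda(3))             # monthly data: 129600.0
# Daily stock prices, assuming about 63 trading days per quarter
print(f"{ravn_uhlig_lambda(63):.3e}")   # 2.520e+10
```

For daily prices the rule suggests a value on the order of $10^{10}$, larger than any of the smoothing parameters tried above.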
