Machine Learning Newsletter

Trend Estimation Methods for Time Series Signals

Trend Estimation

Trend estimation in a very broad sense could be defined as a family of methods to be able to detect and predict tendencies and regularities in time series signals. It is generallly highly dependent on the problem domain and application. Therefore, it begs different approaches for every other problem rather than a single solution that works across different domains.

Why do we do it?

It is very useful when the raw signal is very volatile and not necessarily useful for processing. What do I mean by that? If you want to compute slope of a noisy signal, derivative of the function will amplify the noise where it attenuates the useful signal. For this type of problem, it would be much better to preprocess the signal first(either a low-pass filter or some other preprocessing step) and then compute the slope on top of "smoothed" signal. We could of course use trend estimation to compute the trend first and then compute the slope on top of the trend, which will provide more accurate results.

Tell me more

What else? If you have a noisy signal that actually has the definition of volatility inside of it, then you would have hard time to communicate how the signal actually behaves in the long term to other people. When you compute a trend which is right and up, people would understand it is trending or has a certain amount of traction over time.

So, trend estimation is useful, but how are we going to do that?

Here is how we are going to do that

There are many ways on how one could do trend estimation in the time series data. Since it is very signal dependent and various ways, instead of choosing "one" algorithm over others, in this blog post, I will intrdocuce 6 different methods and try to summarize their properties/advantages. They are:

  • Moving average filtering
  • Exponential Weighted Moving Average (EWMA)
  • Median filtering
  • Bandpass filtering
  • Hodrick Prescott Filter
  • $l_1$ trend filtering

Let's get started.

In [1]:
%matplotlib inline
from datetime import date
from l1 import l1
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import signal as sp_signal
from sklearn import ensemble
import statsmodels.api as sm

plt.style.use('fivethirtyeight')

_DATA_DIR = 'data'
_FIG_DIR = 'figures'

if not os.path.exists(_FIG_DIR):
    os.makedirs(_FIG_DIR)

_SNP_500_PATH = os.path.join(_DATA_DIR, 'snp500.csv')
_BEGINNING_DATE = date(2014, 1, 1)

## Matplotlib Variables
_FIG_SIZE = (16, 12)
_FIG_FORMAT = 'png'
_FIG_DPI = 200
_FIG_LEGEND_LOCATION = 4

_ORIGINAL_SIGNAL_LABEL = 'Original Signal'
_MEAN_AVERAGE_SIGNAL_LABEL = 'Mean Averaged Signal'
_MEDIAN_FILTERED_SIGNAL_LABEL = 'Median Filtered Signal'
_EXPONENTIAL_SMOOTHED_AVERAGE_SIGNAL_LABEL = 'Exponential Moving Averaged Signal'
_SNP_CYCLE_SIGNAL_LABEL = 'Cycle Signal'
_SNP_TREND_SIGNAL_LABEL = 'Trend'
_L1_TREND_SIGNAL_LABEL = '$l_1$ Trend'
_BAND_PASS_FILTER_LABEL = 'Band-pas filtered'


_BUTTERWORTH_FILTER_TITLE = 'Butterworth filter frequency response'
_BUTTERWORTH_FILTER_XLABEL = 'Frequency [radians / second]'
_BUTTERWORTH_FILTER_YLABEL = 'Amplitude [dB]'

_MEAN_AVERAGE_SIGNAL_TITLE_TEMPLATE = 'Mean Average signal with window = {}'.format
_MEDIAN_FILTER_SIGNAL_TITLE_TEMPLATE = 'Median Filtered signal with window = {}'.format
_EXPONENTIAL_SMOOTHING_TITLE_TEMPLATE = 'Exponential Smoothed signal with span = {}'.format
_HODRICK_PRESSCOTT_TITLE_TEMPLATE = 'Trend and Cycle Signal with smoothing parameter: {}'.format
_L1_TITLE_TEMPLATE = '$l_1$ Trend Signal with regularizer: {}'.format
_BAND_PASS_TITLE_TEMPLATE = 'Band-pass order: {}, low - high cutoff frequency: {} - {}'.format

Data Explanation

  • The S&P 500, or the Standard & Poor's 500, is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ.
  • The S&P 500 index components and their weightings are determined by S&P Dow Jones Indices.
  • It differs from other U.S. stock market indices, such as the Dow Jones Industrial Average or the Nasdaq Composite index, because of its diverse constituency and weighting methodology.
  • It is one of the most commonly followed equity indices, and many consider it one of the best representations of the U.S. stock market, and a bellwether for the U.S. economy.
  • The National Bureau of Economic Research has classified common stocks as a leading indicator of business cycles.

Reference: http://www.albany.edu/cer/bc/bc_essays.html

In [2]:
def _file_format(string_):
    string_ = string_.replace('-', '_').replace(' ', '_').replace('$', '')
    string_ += '.' + _FIG_FORMAT
    return string_
In [3]:
df = pd.read_csv(_SNP_500_PATH, parse_dates=['Date'])
df = df.sort(['Date'])
df = df[df.Date > _BEGINNING_DATE]
In [4]:
df.head()
Out[4]:
Date Open High Low Close Volume Adj Close
382 2014-01-02 1845.859985 1845.859985 1827.739990 1831.979980 3080600000 1831.979980
381 2014-01-03 1833.209961 1838.239990 1829.130005 1831.369995 2774270000 1831.369995
380 2014-01-06 1832.310059 1837.160034 1823.729980 1826.770020 3294850000 1826.770020
379 2014-01-07 1828.709961 1840.099976 1828.709961 1837.880005 3511750000 1837.880005
378 2014-01-08 1837.900024 1840.020020 1831.400024 1837.489990 3652140000 1837.489990
In [5]:
title = 'SNP 500 Close Price between {} - 2015-07-11'.format(_BEGINNING_DATE)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-');
plt.title(title);
plt.savefig(fig_path, dpi=_FIG_DPI)

What is the simplest possible trend estimation method? In signal processing, if someone says simplest, you'd just respond "moving average". No matter what follows simplest, it always is the simplest. Not necessarily is the best for anything, but provides a good nice baseline which would be very useful.

Moving Average Filtering

For a window size: $N$, the computation is pretty straightforward. Applying window on top of the original signal and average the signal for a given time window. Following formula assumes that $N$ being odd.

$$y(t) = \frac{\displaystyle\sum_{i=-\frac{N+1}{2}}^{\frac{N+1}{2}} x(t + i)}{w} $$
In [6]:
window = 11
title = _MEAN_AVERAGE_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, pd.rolling_mean(df.Close, window), '-', label=_MEAN_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)
In [7]:
window = 21
title = _MEAN_AVERAGE_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, pd.rolling_mean(df.Close, window), '-', label=_MEAN_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)
In [8]:
window = 61
title = _MEAN_AVERAGE_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, pd.rolling_mean(df.Close, window), '-', label=_MEAN_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)

Good to Know for Moving Average

  • Linear
  • Not really a trend estimation method, but provides baseline
  • If the window size is small, it removes high volatility part in the signal
  • If the window size is large, it exposes the long-term trend
  • Not robust to outliers and abrupt changes for small and medium window sizes

Median Filter

Median Filter is very similar to the moving average filter. It only differs from computation of median of a given window rather than computing a mean. This approach and the next approach are very similar in terms of computing observations in a given window. It computes the median rather than computing the average.

$$ y(t) = median( x[t-\frac{w}{2}, t+\frac{w}{2}])$$

where $w$ is the window size whose median will replace the original data point

In [9]:
window = 11
median_filtered_signal = sp_signal.medfilt(df.Close, window)
title = _MEDIAN_FILTER_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, median_filtered_signal, '-', label=_MEAN_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)
In [10]:
window = 21
median_filtered_signal = sp_signal.medfilt(df.Close, window)
title = _MEDIAN_FILTER_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, median_filtered_signal, '-', label=_MEDIAN_FILTERED_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)
In [11]:
window = 61
median_filtered_signal = sp_signal.medfilt(df.Close, window)
title = _MEDIAN_FILTER_SIGNAL_TITLE_TEMPLATE(window)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, median_filtered_signal, '-', label=_MEDIAN_FILTERED_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)

Good to Know for Median Filter

  • Nonlinear
  • Very robust to noise
  • If the window size is very large, it could shadow mid-term change
  • Trend signal may not be smooth(actually rarely is in practice)

Exponential Weighted Moving Average (EWMA)

$$ y_t = \frac{\displaystyle\sum_i^t w_i x_{t-i}}{\displaystyle\sum_i^t w_i} $$$$ w_i = \left\{ \begin{array}{lr} \alpha (1 - \alpha)^i & i \lt t\\ (1 - \alpha)^i & i = t \end{array} \right. $$$$ \alpha = \left\{ \begin{array}{lr} \frac{2}{s+1} & \text{s=span}\\ \frac{1}{c+1} & \text{c=center of mass}\\ (1 - e^{\frac{log 0.5}{h}}) & \text{h = half life} \end{array} \right. $$

One can think the exponential weights in terms of span and half life. Half life signifies when the weight becomes half of the max and span is how many days it spans when you get the average of signal with weights.

In [12]:
## Exponential Weighted Moving Average
span = 11
title = _EXPONENTIAL_SMOOTHING_TITLE_TEMPLATE(span)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, pd.stats.moments.ewma(df.Close, span=span), '-', label=_EXPONENTIAL_SMOOTHED_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)
In [13]:
## Exponential Weighted Moving Average
span = 21
title = _EXPONENTIAL_SMOOTHING_TITLE_TEMPLATE(span)
fig_path = os.path.join(_FIG_DIR, _file_format(title))
plt.figure(figsize=_FIG_SIZE)
plt.plot_date(df.Date, df.Close, '-', label=_ORIGINAL_SIGNAL_LABEL)
plt.plot_date(df.Date, pd.stats.moments.ewma(df.Close, span=span), '-', label=_EXPONENTIAL_SMOOTHED_AVERAGE_SIGNAL_LABEL)
plt.title(title)
plt.legend(loc=_FIG_LEGEND_LOCATION);
plt.savefig(fig_path, format=_FIG_FORMAT, dpi=_FIG_DPI)