Trend Estimation
in Time Series Signals

Hi!

Bugra Akyildiz

Data Scientist at Axial

@bugraa


Machine Learning Newsletter | mln.io


bugra@nyu.edu

http://bit.ly/pydata-seattle-2015

Axial

A network that brings private companies and investors together

Gives business owners access to private capital markets

We are hiring! | axial.net

Trend Estimation

A family of methods for detecting and predicting tendencies and regularities in time series signals


  • Depends on the problem and domain
  • Medium- to long-term trend
  • Removes seasonality (cycles) from the data


Why?

  • Trends are very interpretable
  • Trends are easy to work with when the original signal is not very useful for processing

## Trend Estimation Methods

- Moving average filtering
- Exponential Weighted Moving Average (EWMA)
- Median filtering
- Bandpass filtering
- Hodrick-Prescott Filter
- $l_1$ trend filtering
## Data

- The S&P 500, or the Standard & Poor's 500, is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ.
- The S&P 500 index components and their weightings are determined by S&P Dow Jones Indices.
- The National Bureau of Economic Research has classified common stocks as a leading indicator of __business cycles__.
- We will come to these _cycles_ later.

```python
import pandas as pd

# _SNP_500_PATH is the path to the S&P 500 CSV file (defined elsewhere)
df = pd.read_csv(_SNP_500_PATH, parse_dates=['Date'])
# Sort chronologically; DataFrame.sort was removed from pandas, use sort_values
df = df.sort_values('Date')
```

SNP 500 Data

## Moving Average Filtering

- Average the signal over a window of $w$ samples centered on $t$ (with $w$ odd)

$$ y(t) = \frac{1}{w} \sum_{i=-\frac{w-1}{2}}^{\frac{w-1}{2}} x(t + i) $$
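### In Python

```python
import pandas as pd

window = 11
# pd.rolling_mean was removed from pandas; use the .rolling() accessor.
# The default window is trailing; pass center=True for a window centered on t.
averaged_signal = df.Close.rolling(window).mean()
```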
## Good to Know

- Linear
- Not really a trend estimation method, but provides a baseline
- With a small window, it removes the high-volatility (short-term) part of the signal
- With a large window, it exposes the long-term trend
- Not robust to outliers and abrupt changes for small and medium window sizes
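
A centered window average can also be written out by hand, which makes the sensitivity to outliers easy to see. The following is a minimal sketch in plain Python, assuming an odd window; `moving_average` is a hypothetical helper (not a pandas function), and it returns `None` at the edges where the window does not fit.

```python
def moving_average(x, window):
    """Centered moving average; window must be odd.

    Hypothetical helper for illustration. Positions where the window
    would run past either end of the signal are returned as None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        out[t] = sum(x[t - half:t + half + 1]) / window
    return out


signal = [1.0, 2.0, 3.0, 4.0, 50.0, 6.0, 7.0, 8.0, 9.0]  # one outlier at index 4
smoothed = moving_average(signal, 3)
```

With a window of 3, the single outlier at index 4 leaks into the three outputs at indices 3, 4 and 5, which is the lack of robustness noted in the list.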
## Median Filtering

$$ y(t) = \mathrm{median}\\{ x[t-\frac{w}{2}, t+\frac{w}{2}] \\} $$

where $w$ is the size of the window whose median replaces the original data point
### In Python

```python
from scipy import signal as sp_signal

window = 11  # medfilt requires an odd kernel size
median_filtered_signal = sp_signal.medfilt(df.Close.values, window)
```
## Good to Know

- Nonlinear
- Very robust to noise
- If the window size is very large, it could _shadow_ mid-term changes
- The trend signal may not be smooth (in practice it rarely is)
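
To see the robustness on a concrete case, here is a minimal sketch of a median filter in plain Python applied to a signal with one spike. `median_filter` is a hypothetical helper, and its edge handling (shrinking the window) is an assumption that differs from `scipy.signal.medfilt`, which pads with zeros.

```python
from statistics import median


def median_filter(x, window):
    """Centered median filter; window must be odd.

    Hypothetical helper for illustration. At the edges the window is
    shrunk to the samples that exist (scipy.signal.medfilt zero-pads).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [median(x[max(0, t - half):t + half + 1]) for t in range(len(x))]


signal = [1.0, 2.0, 3.0, 4.0, 50.0, 6.0, 7.0, 8.0, 9.0]  # one outlier at index 4
filtered = median_filter(signal, 3)
```

The spike at 50 does not appear anywhere in `filtered`. The output also repeats the value 7 at two neighboring positions, a small instance of the trend not being smooth.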

EWMA

### In Python

```python
import pandas as pd

span = 20
# pd.stats.moments.ewma was removed from pandas; use the .ewm() accessor
ewma_signal = df.Close.ewm(span=span).mean()
```
## Good to Know

- Linear
- Can provide a better estimate than a simple moving average because the weights decay with age instead of being uniform
- Not robust to outliers and abrupt changes
- Very flexible in terms of weights; puts more emphasis on the most recent samples in the signal
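
The weighting can be made explicit with the recursive form of the EWMA. This is a minimal sketch in plain Python: `ewma` is a hypothetical helper, `alpha = 2 / (span + 1)` is the span-to-alpha mapping pandas uses, and the recursion corresponds to pandas' `adjust=False` mode rather than its default.

```python
def ewma(x, span):
    """Exponentially weighted moving average, recursive form.

    Hypothetical helper for illustration. Each output is
    alpha * current sample + (1 - alpha) * previous output,
    so older samples are weighted by powers of (1 - alpha).
    """
    alpha = 2.0 / (span + 1)
    out = [x[0]]
    for value in x[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


signal = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]  # abrupt step at index 3
smoothed = ewma(signal, span=3)  # alpha = 0.5
```

After the step, the output moves halfway toward the new level at each sample, so it lags the abrupt change instead of following it immediately.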
## Bandpass Filtering

A bandpass filter is defined by its __frequency response__: it attenuates the very low frequencies (long-term) and the very high frequencies (short-term, volatility), and exposes the mid-term trend in the signal.
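### In Python

```python
from scipy import signal as sp_signal

# Filter construction
filter_order = 2
# Cutoffs are normalized to the Nyquist frequency (1.0 = half the sampling rate)
low_cutoff_frequency = 0.001
high_cutoff_frequency = 0.15
b, a = sp_signal.butter(filter_order,
                        [low_cutoff_frequency, high_cutoff_frequency],
                        btype='bandpass', output='ba')
# filtfilt runs the filter forward and backward, so the output has no phase shift
bandpass_filtered = sp_signal.filtfilt(b, a, df.Close.values)
```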
## Good to Know

- Passes the frequencies between the `low cutoff frequency` and the `high cutoff frequency` and attenuates the others.
- This provides a flexible way to remove or attenuate the low-frequency (very long-term) and high-frequency (short-term) parts of the signal.
- A different filter can be designed to stop a particular band instead (a band-stop filter).
- Similar to the Hodrick-Prescott filter, it extracts the mid-term trend by removing the very slow changes (bias) and the short-term changes (cycle).
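
It can help to translate the two cutoffs into periods. When `butter` is called without a sampling rate, its cutoffs are normalized to the Nyquist frequency, so a cutoff of 1.0 means a period of 2 samples. The sketch below does the conversion for the values 0.001 and 0.15 used in the filter; `cutoff_to_period` is a hypothetical helper, not part of scipy.

```python
def cutoff_to_period(normalized_cutoff):
    """Convert a Nyquist-normalized cutoff into a period in samples.

    Hypothetical helper for illustration. A normalized cutoff of 1.0 is
    the Nyquist frequency (0.5 cycles per sample), i.e. a 2-sample period.
    """
    if not 0 < normalized_cutoff <= 1:
        raise ValueError("normalized cutoff must be in (0, 1]")
    return 2.0 / normalized_cutoff


longest_period = cutoff_to_period(0.001)   # low cutoff
shortest_period = cutoff_to_period(0.15)   # high cutoff
```

Under this reading, the filter keeps oscillations with periods between roughly 13 and 2000 samples. If the data has one sample per trading day (an assumption about the CSV), that is about two and a half weeks up to several years.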
## Hodrick-Prescott (HP) Filter

- Decomposes the time-series signal into a trend $x_t$ (mid-term growth) and a cyclical component $c_t$ (recurring and seasonal signal).

$$y_t = x_t + c_t$$

HP Minimization Function
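
In its standard textbook form, and written with the symbols above ($y_t$ the observed signal, $x_t$ the trend), the HP filter picks the trend that minimizes:

$$ \sum_{t=1}^{n} (y_t - x_t)^2 + \lambda \sum_{t=2}^{n-1} (x_{t-1} - 2x_t + x_{t+1})^2 $$

The first term keeps the trend close to the data, the second penalizes curvature of the trend, and the regularizer $\lambda$ trades one off against the other. The cycle is then $c_t = y_t - x_t$.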

## Good to Know

- Linear
- Decomposes the signal into two distinct components (trend and cycle)
- Cycle part => short term, seasonal
- Trend part => medium to long term
- Changing the regularizer adjusts how smooth the trend is
- A bandpass filter is at its heart
- Well suited to signals that show seasonality
- Yields good results when the noise is normally distributed
### In Python

```python
import statsmodels.api as sm

lamb = 10  # Regularizer, lambda
# hpfilter returns the cyclical component first, then the trend
snp_cycle, snp_trend = sm.tsa.filters.hpfilter(df.Close, lamb=lamb)
```
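
Because the HP objective is quadratic, the trend has a closed form: it solves the linear system $(I + \lambda D^T D)\,x = y$, where $D$ is the second-order difference matrix. The following is a minimal sketch of that solve in plain Python, not a replacement for `hpfilter`; `hp_trend` is a hypothetical helper that uses a dense solve, so it is only practical for short signals.

```python
def hp_trend(y, lamb):
    """HP trend via the closed form (I + lamb * D^T D) x = y.

    Hypothetical helper for illustration; dense O(n^3) solve.
    D is the (n - 2) x n second-order difference matrix with rows [1, -2, 1].
    """
    n = len(y)
    # Build A = I + lamb * D^T D, accumulating one row of D at a time.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for i in range(3):
            for j in range(3):
                a[r + i][r + j] += lamb * stencil[i] * stencil[j]
    # Gaussian elimination; A is symmetric positive definite,
    # so no pivoting is needed.
    b = [float(v) for v in y]
    for k in range(n):
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            if factor == 0.0:
                continue
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


line = [2.0 * t + 1.0 for t in range(10)]
trend = hp_trend(line, lamb=10.0)
```

A straight line has zero second differences, so the filter returns it unchanged (up to rounding) for any regularizer; only curvature in the signal is penalized.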
## $l_1$ Trend Filtering

Explanation: the HP minimization function penalizes the squared ($l_2$) second differences of the trend. What if we penalize their $l_1$ norm instead? We get a very robust way to measure the trend in the signal.

- Optimization function:

$$ \frac{1}{2} \lVert x - y \rVert_2^2 + \lambda \lVert Dx \rVert_1$$

where $x, y \in \mathbf{R}^n$ ($y$ is the observed signal, $x$ the estimated trend) and $D$ is the second-order difference matrix
## Good to Know

- Nonlinear
- Trend is piecewise linear, generally very smooth
- The kinks, or changes in slope of the estimated trend, show abrupt events
- Changes in trend could be used for outlier detection
- Computationally a little bit expensive
- Yields good results when noise is exponentially distributed
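
The $l_1$ penalty drives most entries of $Dx$ to exactly zero, so the kinks are the positions where the second difference of the trend is non-zero. Here is a minimal sketch in plain Python of locating them on a hand-made piecewise-linear trend; `second_differences` and `kink_indices` are hypothetical helpers, and `tol` is an assumed tolerance for numerical noise.

```python
def second_differences(x):
    """Apply the second-order difference matrix D to x (length n -> n - 2)."""
    return [x[t - 1] - 2 * x[t] + x[t + 1] for t in range(1, len(x) - 1)]


def kink_indices(trend, tol=1e-8):
    """Indices where the slope of a piecewise-linear trend changes.

    Hypothetical helper for illustration; tol is an assumed tolerance
    below which a second difference is treated as zero.
    """
    return [t + 1 for t, d in enumerate(second_differences(trend)) if abs(d) > tol]


# A piecewise-linear trend: slope +1 up to index 4, then slope -2.
trend = [0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 0.0, -2.0]
kinks = kink_indices(trend)
```

Applied to an estimated trend, the returned indices are candidate dates of abrupt events; how large a slope change counts as an event is a choice left to the analyst.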
### Get the library

```bash
# See the source code: https://github.com/bugra/l1
# PRs are more than welcome!
git clone https://github.com/bugra/l1
cd l1
python setup.py install
```

### In Python

```python
from l1 import l1  # Get the library from: https://github.com/bugra/l1

regularizer = 1
l1_trend = l1(df.Close.values, regularizer)
```

Questions?