**Trend Estimation**

in Time Series Signals

**Hi!**

**Bugra Akyildiz**

Data Scientist at Axial

@bugraa

**Machine Learning Newsletter** | **mln.io**

##### bugra@nyu.edu

##### http://bit.ly/pydata-seattle-2015

**Axial**

A network that brings private companies and investors together

Gives business owners access to private capital markets

#### We are hiring! | axial.net

### Trend Estimation

A family of methods for detecting and predicting tendencies and regularities in time series signals

- Depends on problem and domain
- Medium to Long Term Trend
- Mitigates seasonality (cycles) in the data

#### Why?

- Trends are very interpretable
- Trends are easier to work with when the original signal is not very useful for processing directly

## Trend Estimation Methods
- Moving average filtering
- Exponentially Weighted Moving Average (EWMA)
- Median filtering
- Bandpass filtering
- Hodrick-Prescott (HP) filter
- $l_1$ trend filtering
## Data
- The S&P 500, or the Standard & Poor's 500, is an American stock market index based on the market capitalizations of 500 large companies having common stock listed on the NYSE or NASDAQ.
- The S&P 500 index components and their weightings are determined by S&P Dow Jones Indices.
- The National Bureau of Economic Research has classified common stocks as a leading indicator of __business cycles__.
- We will come to these _cycles_ later.
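```python
import pandas as pd
# _SNP_500_PATH is the path to a CSV of S&P 500 prices with a 'Date' column
df = pd.read_csv(_SNP_500_PATH, parse_dates=['Date'])
df = df.sort_values('Date')  # DataFrame.sort was removed from pandas
```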
#### SNP 500 Data

## Moving Average Filtering
- Average the signal over a window of $w$ samples ($w$ odd) centered on $t$
$$ y(t) = \frac{1}{w} \displaystyle\sum_{i=-\frac{w-1}{2}}^{\frac{w-1}{2}} x(t + i) $$
### In Python
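```python
import pandas as pd
window = 11
# pd.rolling_mean was removed from pandas; use the .rolling() accessor.
# The default window is trailing; pass center=True for a centered window.
averaged_signal = df.Close.rolling(window).mean()
```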
## Good to Know
- Linear
- Not really a trend estimation method, but provides a baseline
- If the window size is small, it removes the high-volatility part of the signal
- If the window size is large, it exposes the long-term trend
- Not robust to outliers and abrupt changes for small and medium window sizes
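To make the outlier sensitivity concrete, here is a minimal NumPy sketch of a centered moving average. The helper name `centered_moving_average` and the synthetic ramp with one spike are illustrative assumptions, not part of the talk's code.

```python
import numpy as np

def centered_moving_average(x, window):
    """Average each point with its (window - 1) / 2 neighbours on each side."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits,
    # so the output is window - 1 samples shorter than the input.
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

# Synthetic example (assumption): a linear ramp with a single outlier at t = 10.
signal = np.arange(20, dtype=float)
signal[10] += 50.0
smoothed = centered_moving_average(signal, window=5)
```

Every window that contains the spike is pulled upward by 50 / 5 = 10 units, which is why small and medium windows are not robust to outliers.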
## Median Filtering
$$ y(t) = \operatorname{median}\lbrace x(t-\tfrac{w-1}{2}), \dots, x(t+\tfrac{w-1}{2}) \rbrace $$
where $w$ is the (odd) window size; each data point is replaced by the median of the window centered on it
### In Python
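```python
from scipy import signal as sp_signal
window = 11  # kernel size must be odd
# medfilt zero-pads the edges, so the first and last few values are biased
median_filtered_signal = sp_signal.medfilt(df.Close.values, window)
```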
## Good to Know
- Nonlinear
- Very robust to noise, especially isolated outliers
- If the window size is very large, it could _shadow_ mid-term changes
- The trend signal may not be smooth (in practice it rarely is)
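The robustness claim can be checked with a small NumPy sketch. The helper `centered_median` and the synthetic ramp are assumptions made for illustration; unlike `medfilt`, this version drops the edges instead of zero-padding them.

```python
import numpy as np

def centered_median(x, window):
    """Replace each point by the median of the window centered on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(x, dtype=float)
    half = window // 2
    # Only positions where the full window fits are returned.
    return np.array([np.median(x[i - half:i + half + 1])
                     for i in range(half, len(x) - half)])

# Synthetic example (assumption): a linear ramp with a single outlier at t = 10.
signal = np.arange(20, dtype=float)
signal[10] += 50.0
median_trend = centered_median(signal, window=5)
```

The spike never becomes the median of any 5-point window, so the estimate stays within one unit of the underlying ramp. A 5-point mean at the spike's position would be off by 10 units.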
## EWMA

### In Python
```python
import pandas as pd
span = 20
# pd.stats.moments.ewma was removed from pandas; use the .ewm() accessor
ewma_signal = df.Close.ewm(span=span).mean()
```
## Good to Know
- Linear
- Could provide a better estimate than a simple moving average because the weights
decay gradually with age instead of being uniform over a hard window
- Not robust to outliers and abrupt changes
- Very flexible in terms of weights: the span controls how much emphasis is put on
the most recent samples in the signal
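As a sketch of what the span controls, the recursive form of EWMA is $y(t) = \alpha x(t) + (1 - \alpha) y(t-1)$ with $\alpha = 2 / (span + 1)$. The code below assumes that recursive form, which corresponds to pandas `ewm(span=span, adjust=False)`; the pandas default (`adjust=True`) weights the first samples differently. The function name and the step signal are illustrative.

```python
import numpy as np

def ewma(x, span):
    """Recursive EWMA: y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)       # span-to-alpha mapping used by pandas
    y = np.empty_like(x)
    y[0] = x[0]                      # assumption: start from the first sample
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1]
    return y

# Synthetic example (assumption): a unit step after five zeros.
step = np.concatenate([np.zeros(5), np.ones(45)])
smoothed_step = ewma(step, span=9)   # alpha = 0.2
```

The output approaches the new level geometrically, so an abrupt change is smeared over many samples rather than tracked immediately.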
## Bandpass Filtering
It filters the signal according to the filter's __frequency response__. It attenuates
very low frequencies (long-term) and very high frequencies (short-term, volatility)
and exposes the mid-term trend in the signal.
### In Python
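```python
from scipy import signal as sp_signal

# Filter construction (Butterworth band-pass)
filter_order = 2
# Cutoffs are normalized to the Nyquist frequency (1.0 = half the sample rate)
low_cutoff_frequency = 0.001
high_cutoff_frequency = 0.15
b, a = sp_signal.butter(filter_order, [low_cutoff_frequency, high_cutoff_frequency],
                        btype='bandpass', output='ba')
# filtfilt runs the filter forward and backward, so the result has zero phase lag
bandpass_filtered = sp_signal.filtfilt(b, a, df.Close.values)
```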
## Good to Know
- Passes the frequencies of the signal between `low_cutoff_frequency` and `high_cutoff_frequency` and attenuates the others.
- This provides a flexible way to remove or attenuate the low-frequency (very long-term) and high-frequency (short-term) parts of the signal.
- A different filter could be designed to stop a particular band instead (called a band-stop filter).
- Similar to the Hodrick-Prescott filter, it extracts the mid-term trend by removing very slow changes (bias) and filtering out short-term changes (cycle).
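The idea of keeping only a band of frequencies can be illustrated without a Butterworth design by masking FFT bins directly. This is a different and cruder technique than the `butter` and `filtfilt` pipeline above: it has a perfectly sharp cutoff and assumes the signal is periodic over the window. The function name and the three synthetic sinusoids are assumptions; frequencies here are in cycles per sample (0 to 0.5), not normalized to Nyquist.

```python
import numpy as np

def fft_bandpass(x, low, high):
    """Zero every FFT bin whose frequency (cycles/sample) is outside [low, high]."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))        # 0 ... 0.5 cycles per sample
    keep = (freqs >= low) & (freqs <= high)
    return np.fft.irfft(spectrum * keep, n=len(x))

# Synthetic example (assumption): slow, mid and fast components added together.
n = 1000
t = np.arange(n)
slow = np.sin(2 * np.pi * 1 * t / n)     # 0.001 cycles/sample (long term)
mid = np.sin(2 * np.pi * 20 * t / n)     # 0.020 cycles/sample (mid term)
fast = np.sin(2 * np.pi * 300 * t / n)   # 0.300 cycles/sample (volatility)
band = fft_bandpass(slow + mid + fast, low=0.005, high=0.1)
```

Only the mid-term component falls inside the pass band, so `band` reproduces `mid` while the slow and fast components are removed.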
## Hodrick-Prescott (HP) Filter
- Decomposes the time-series signal $y_t$ into a trend $x_t$ (mid-term growth) and a
cyclical component $c_t$ (recurring and seasonal signal).
$$y_t = x_t + c_t$$
## HP Minimization Function
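The formula for this slide did not survive in the text. The HP objective is commonly written as follows, in the same symbols as the decomposition above (this is the standard textbook form, assumed here rather than taken from the slide; conventions differ on constant factors such as a leading $\frac{1}{2}$):

$$ \min_x \sum_{t=1}^{n} (y_t - x_t)^2 + \lambda \sum_{t=2}^{n-1} (x_{t+1} - 2x_t + x_{t-1})^2 $$

Because the objective is quadratic, the trend has a closed form, $x = (I + \lambda D^T D)^{-1} y$, where $D$ is the second-order difference matrix. A minimal dense NumPy sketch, with illustrative function names and a synthetic signal:

```python
import numpy as np

def second_difference_matrix(n):
    """(n - 2) x n matrix D whose rows are [1, -2, 1] shifted along the diagonal."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def hp_trend(y, lamb):
    """Trend minimizing sum((y - x)^2) + lamb * sum((D x)^2)."""
    y = np.asarray(y, dtype=float)
    D = second_difference_matrix(len(y))
    # Setting the gradient to zero gives (I + lamb * D^T D) x = y.
    return np.linalg.solve(np.eye(len(y)) + lamb * D.T @ D, y)

# Synthetic example (assumption): linear growth plus a 10-sample cycle.
t = np.arange(100, dtype=float)
y = 0.5 * t + np.sin(2 * np.pi * t / 10)
trend = hp_trend(y, lamb=1600.0)
cycle = y - trend
```

The dense solve costs $O(n^3)$ and is only meant to show the structure; for long series a sparse or banded solver is the usual choice.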

## Good to Know
- Linear
- Decomposes the signal into two distinct components (trend and cycle)
- Cycle part => short term, seasonal
- Trend part => medium to long term
- The amount of smoothing is adjusted by changing the regularizer $\lambda$
- A bandpass filter is at its heart
- Well suited to signals that show seasonality
- Yields good results when noise is normally distributed
### In Python
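```python
import statsmodels.api as sm
lamb = 10 # Regularizer, lambda; larger values give a smoother trend
# hpfilter returns the cyclical component first, then the trend
snp_cycle, snp_trend = sm.tsa.filters.hpfilter(df.Close, lamb=lamb)
```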
## $l_1$ Trend Filtering
Explanation: Instead of penalizing the squared ($l_2$) second differences of the trend
as in the HP minimization function, what if we penalize their $l_1$ norm? We could get
a very robust way to measure the trend in the signal.
- Optimization function:
$$ \frac{1}{2} \lVert x - y \rVert_2^2 + \lambda \lVert Dx \rVert_1$$
where $x,y \in \mathbf{R}^n$, $y$ is the observed signal, $x$ is the estimated trend, and $D$ is the second-order difference matrix
## Good to Know
- Nonlinear
- The trend is piecewise linear and generally very smooth
- The kinks (changes in slope) of the estimated trend mark abrupt events
- Changes in the trend could be used for outlier detection
- Somewhat expensive to compute
- Yields good results when noise is exponentially distributed
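To show why the objective needs an iterative solver, here is a minimal ADMM sketch in NumPy. It is an illustration under stated assumptions, not the method used by the `l1` library in this talk: the function name, the penalty parameter `rho`, the fixed iteration count, and the synthetic signal are all hypothetical choices.

```python
import numpy as np

def l1_trend_admm(y, lamb, rho=1.0, iterations=500):
    """Approximately minimize 0.5 * ||x - y||^2 + lamb * ||D x||_1 with ADMM."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.zeros((n - 2, n))                 # second-order difference matrix
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    A = np.eye(n) + rho * D.T @ D
    z = np.zeros(n - 2)                      # auxiliary copy of D x
    u = np.zeros(n - 2)                      # scaled dual variable
    x = y.copy()
    for _ in range(iterations):
        # x-update: a ridge-like linear solve
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))
        v = D @ x + u
        # z-update: soft thresholding, which is what makes D x sparse
        z = np.sign(v) * np.maximum(np.abs(v) - lamb / rho, 0.0)
        u = v - z                            # dual update
    return x

# Synthetic example (assumption): slope flips at t = 30, plus alternating noise.
t = np.arange(60, dtype=float)
kinked = np.where(t < 30, t, 60.0 - t)
noisy = kinked + np.where(t % 2 == 0, 0.5, -0.5)
trend = l1_trend_admm(noisy, lamb=5.0)
```

The soft-thresholding step drives most second differences toward zero, which is what produces a piecewise linear trend with a few kinks. The repeated linear solves are also where the extra computational cost comes from.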
### Get the library
```bash
# See the source code: https://github.com/bugra/l1
# PRs are more than welcome!
git clone https://github.com/bugra/l1
cd l1
python setup.py install
```
### In Python
```python
from l1 import l1 # Get the library from: https://github.com/bugra/l1
regularizer = 1
l1_trend = l1(df.Close.values, regularizer)
```
### Questions?