Bugra Akyildiz
@bugraa
bugra.github.io

# Outliers

## What is it?

The problem, as always, is what the heck does one mean by "outlier" in these contexts. Seems to be like pornography -- "I know it when I see it." - Berton Gunter, R-help, December 2009

# Outlier Definition

## An observation in a data set which appears to be inconsistent with the remainder of that set of data

## Median Filter $$y[m] = median\\{ x[m-\frac{w}{2}, m+\frac{w}{2}] \\}$$ where $w$ is the window size whose median will replace the original data point. - Not linear - Does not smooth the signal. (Does not blur the images) - Robust to outliers!

## Toy Example

• Window size: $w$ = 3
• Extend left-most and right-most values to compute boundaries

# Fast Fourier Transform (FFT)

## Noise Removal

• Not necessarily distant than main signal
• Not necessarily random
• May have dependency to the original signal
• ## Outlier Detection

• Distant observations than main signal
• ## DFT(Discrete Fourier Transform) Definition $N$ point FFT of $x[n]$ sequence $$X[k] = \sum_{n=0}^{N-1} x[n] e^{\frac{-j 2 \pi n k}{N}}$$ where $w$(frequency) is sampled by $\frac{2\pi k}{N}$ for $k = 0,1, \ldots, N-1$. ## IDFT(Inverse DFT) Definition $$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{\frac{j 2\pi n k}{N}}$$ for $n=0, 1, \ldots, N-1$

# FFT Block

# Euler's Formula ## $$e^{jx} = cos x + j sin x$$

## Outlier Detection via Monte Carlo Markov Chain

• Monte Carlo Sampling: To estimate characteristics of a distribution
• Mean
• Variance
• Kurtosis
• Markov Chain: Stochastic process to sample different _states_ from a stationay distribution
• MCMC: Design a Markov Chain in such a way that the stationary distribution is the distribution that we want to estimate (target distribution)
## Metropolis-Hastings Sampler We want to sample from $p(x)$, creating the following Markov Chain: $$x^{(1)} \rightarrow x^{(2)} \rightarrow \ldots \rightarrow x^{(t)} \rightarrow \ldots$$ where $t$ is the iteration number in Markov Chain - Initialize $x^{(1)}$ - Use proposal distribution $q(x_s|x_t)$ to sample candidate value $x_s$ given the current value of $x_t$ - Assuming the transition matrix is symmetric, compute $a_{prob}(x_t, x_s) = min [1, \frac{p(x_s)}{p(x_t)}]$ - Based on acceptance criterion($a_{prob}$), accept or reject the observation.

## Basic Metropolis Sampler


n_iterations = 100000
burn_in = 20000
t = np.zeros(n_iterations)
ii = 0
sigma = 1.

t_previous = np.random.randn()
posterior_distribution = lambda x: np.exp(-x * x/2.)
proposal_distribution = lambda mu, sigma: sigma * np.random.randn() + mu

while ii < n_iterations:
t_proposed = proposal_distribution(t_previous, sigma)
proposed_posterior = posterior_distribution(t_proposed)
previous_posterior = posterior_distribution(t_previous)
acceptance_criterion = np.min([proposed_posterior / previous_posterior, 1.0])
temp = np.random.rand()
if temp < acceptance_criterion : # acceptance criteria
t[ii]=t_proposed
t_previous=t_proposed
else:
t[ii]=t_previous
ii += 1


## Outliers are captured via MCMC

### Advantages

• As long as the model is nicely captured, the priors do not have to be known beforehand. The data leads the way.
• If priors are known, they could be incorporated into the model whereas other models cannot use prior information.
• Probabilistic, you could reason about probabilities whereas frequentist approaches do not provide a similar functionality like Bayesain methods.
• One can build complex models in Bayesian setting whereas this is not possible for Frequentist methods as you depend on the method rather than models.

### Disadvantages

• You need to design priors, models and inference schema beforehand whereas you do not need to do these things in frequentist methods.
• Generally both computation and algorithm-wise, frequentist approaches tend to be faster and simpler.

## Credits

• http://xkcd.com/
• http://1ucasvb.tumblr.com/