Machine Learning Newsletter

First and Second Derivatives, Convolution, Quadratic Fitting, and All That via MCMC

Agenda

In this post, I will look at:

  • First, how we can fit a curve to a perfect quadratic function using its first- and second-order derivatives.
  • Second, how we can do curve fitting on a quadratic function via Markov Chain Monte Carlo (MCMC) using PyMC.
  • Last, how convolution can be used for numerical differentiation to estimate the coefficients of a quadratic function.
In [101]:
%matplotlib inline
In [102]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc
import random
import seaborn as sns
from scipy import signal as ssignal

The following signal will be our perfect signal, in the sense that it is exactly quadratic; we want to estimate its parameters first through the gradients (first and second derivatives) of the function, and then through MCMC. The range of $x$ is 0 to 999 and the coefficients are chosen arbitrarily.
To see the advantages of MCMC over taking simple derivatives, I will also add some noise to the signal. A noisy signal is more realistic and closer to a real-world signal, since we almost never observe processes that produce a perfect second-order quadratic signal; even systems that are supposed to produce such signals cannot, due to environmental and component-dependent noise.

In [103]:
## Constructing our perfect signal
a = 5
b = 10
c = 50
x = np.arange(1000)
signal = a * x ** 2 + b * x + c
noisy_signal = signal + 10000 * np.random.normal(0, 9, x.size)
In [104]:
plt.figure(figsize=(12, 12));
plt.plot(signal);
plt.title(r'$5 x^2 + 10 x + 50$');
In [105]:
plt.figure(figsize=(12, 12));
plt.plot(noisy_signal);
plt.title('Noisy Signal');
In [106]:
first_derivative = np.gradient(signal)
second_derivative = np.gradient(first_derivative)
In [107]:
plt.figure(figsize=(12, 12))
plt.scatter(range(second_derivative.size), second_derivative, c='k');

This plot shows $2a$: the second derivative of $ax^2 + bx + c$ is $2a$ (note that the first derivative is $2ax + b$). Since the second derivative is 10, $a$ is indeed 5. So for a quadratic function, instead of curve fitting, we can recover the coefficient of the second-order term simply by taking the second derivative.

Note that the borders do not follow the rest of the second derivative (boundary conditions are not well-defined for the derivative of the function), but this is easy to fix by taking the median of the whole second derivative. If you do not want to deal with boundaries, you can truncate the first and second derivative arrays without loss of generality.
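As a quick sanity check, the minimal sketch below (reusing the second_derivative array from the cell above; a_hat is just an illustrative name) recovers $a$ from the median of the second derivative:

a_hat = np.median(second_derivative) / 2.0  # second derivative of a*x^2 + b*x + c is 2a
print('Estimated a: {}'.format(a_hat))      # expected to be close to 5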

In [108]:
plt.figure(figsize=(12, 12))
plt.scatter(range(first_derivative.size), first_derivative, c='k');

What about the coefficient of $x$, namely $b$? Since we already found $a$, we can find $b$ from the first derivative. As mentioned earlier, the first derivative is $10x + b$, so by looking at any $(x, y)$ pair on it we can easily find the value of $b$.

In [109]:
first_derivative[500]
Out[109]:
5010.0

For $(x, y) = (500, 5010)$, we can see that $b=10$, but does that hold for every point? We could check by computing an error term, or we could directly look at the value of $b$ at every point, which I will do in the next graph.

In [110]:
b_ = [jj - 10 * ii for ii, jj in enumerate(first_derivative.tolist())]
plt.figure(figsize=(12, 12))
plt.plot(b_);
# To get rid of boundary
plt.xlim(10, 990);

Apparently this holds for every point of the first derivative (apart from the boundaries), so it is fair to conclude that $b=10$. After finding the $a$ and $b$ values, we can estimate $c$ in the same way from the $(x, y)$ pairs of the original signal.

In [111]:
signal[50]
Out[111]:
13050

The signal is now $5x^2 + 10x + c$, and for $x=50$ that corresponds to $13000 + c$. Plugging in the $(x, y)$ pair above, we find that $c$ is 50.

Similarly, we could plot the value of $c$ for different $x$ values:

In [112]:
c_ = [jj - (5 * ii ** 2 + 10 * ii) for ii, jj in enumerate(signal.tolist())]
plt.figure(figsize=(12, 12));
plt.plot(c_);

$c$ seems to be 50.

Differentiation of the Noisy Signal

Most real-world datasets do not contain perfect signals; they contain noise, and taking derivatives of a noisy signal does not work well. Take the noisy_signal example:

In [113]:
noisy_first_derivative = np.gradient(noisy_signal)
noisy_second_derivative = np.gradient(noisy_first_derivative)
In [114]:
plt.figure(figsize=(12, 12));
plt.scatter(range(second_derivative.size), second_derivative, label='Second derivative of noiseless signal');
plt.scatter(range(noisy_second_derivative.size), noisy_second_derivative, label='Second derivative of noisy signal', c='k');
plt.legend();

There is not a single coherent value in the second derivative of the noisy signal. This is because differentiation amplifies the noise relative to the signal (it acts as a high-pass filter, which will become clearer in the convolution section), as I have mentioned briefly before as well.
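To make this concrete, here is a minimal sketch (the variable names noise, d1 and d2 are mine, not part of the cells above) showing that repeated differencing barely tames the white noise, while it flattens the quadratic part of the signal down to the constant $2a = 10$:

# same noise model as noisy_signal, in isolation
noise = 10000 * np.random.normal(0, 9, 1000)
d1 = np.gradient(noise)
d2 = np.gradient(d1)
print('std of noise:             {:.1f}'.format(noise.std()))
print('std of second difference: {:.1f}'.format(d2.std()))
# the second difference of the noise stays on the order of tens of thousands,
# completely swamping the 2a = 10 we are trying to read off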

Curve Fitting via MCMC

Priors

Since we do not have any informed opinion about the priors, we can simply select uniform priors to start unbiased and let the data update the posterior.

In [115]:
sig = pymc.Uniform('sig', 0.0, 100.0, value=1.)
a = pymc.Uniform('a', 0.0, 100.0, value= 0.0)
b = pymc.Uniform('b', 0.0, 100.0, value= 0.0)
c = pymc.Uniform('c', 0.0, 100.0, value= 0.0)

Quadratic Model

In [116]:
@pymc.deterministic
def quadratic_model(x=x, a=a, b=b, c=c):
    return a * x ** 2 + b * x + c

quad_y = pymc.Normal('quad_y', mu=quadratic_model, tau=1.0 / (sig ** 2), value=noisy_signal, observed=True)
quad_model = pymc.Model([a, b, c, quad_y])
quadratic_mcmc = pymc.MCMC(quad_model)    

The quadratic model is defined as a straightforward second-order polynomial, and the likelihood is a Gaussian whose mean is that quadratic function.

Inference

In [117]:
quadratic_mcmc.sample(100000, 20000)
 [-----------------100%-----------------] 100000 of 100000 complete in 40.4 sec
In [118]:
# Quadratic Function Coefficients
a = quadratic_mcmc.trace('a')[:]
b = quadratic_mcmc.trace('b')[:]
c = quadratic_mcmc.trace('c')[:]
In [119]:
print('Last value of a: {}, and the median value is {}'.format(a[-1], np.median(a)))
print('Last value of b: {}, and the median value is {}'.format(b[-1], np.median(b)))
print('Last value of c: {}, and the median value is {}'.format(c[-1], np.median(c)))
Last value of a: 5.01246794716, and the median value is 5.01247082111
Last value of b: 1.39603794911e-05, and the median value is 5.95839920691e-06
Last value of c: 4.05537704958e-05, and the median value is 0.00119246364763

The meaningful values are the medians, as they reflect the estimates better than the last sample. Looking at the numbers, $a$ is recovered quite closely, while $b$ and $c$ stay essentially at zero: their contribution to the signal is small compared to the noise, so the data gives the sampler little reason to pull them away from their starting values. Let's see how the samples are distributed to get a better idea of the posterior distributions.
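Beyond point estimates, the traces also carry uncertainty information. A small sketch (reusing the a, b, c trace arrays from the cell above) that prints 95% credible intervals with np.percentile:

for name, trace in [('a', a), ('b', b), ('c', c)]:
    low, high = np.percentile(trace, [2.5, 97.5])
    print('{}: median {:.4f}, 95% interval [{:.4f}, {:.4f}]'.format(
        name, np.median(trace), low, high))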

Posterior Plots

In [120]:
plt.figure(figsize=(12, 12))
plt.hist(a, color='k', bins=100);
In [121]:
plt.figure(figsize=(12, 12))
plt.hist(b, color='k', bins=100);
In [122]:
plt.figure(figsize=(12, 12))
plt.hist(c, color='k', bins=100);
In [123]:
aa = np.median(a)
bb = np.median(b)
cc = np.median(c)

constructed_signal = aa * x ** 2 + bb * x + cc
plt.figure(figsize=(12, 12));
plt.plot(noisy_signal, label='Noisy Signal');
plt.plot(constructed_signal, label='Constructed Signal');
plt.legend();

Note that the constructed signal handles the noise quite well and is very close to the original signal that we started with.
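To quantify how close the fit is, here is a minimal sketch (reusing signal, noisy_signal and constructed_signal from the cells above) comparing root-mean-square errors against the noiseless signal:

rmse_fit = np.sqrt(np.mean((constructed_signal - signal) ** 2))
rmse_noisy = np.sqrt(np.mean((noisy_signal - signal) ** 2))
print('RMSE of constructed signal: {:.1f}'.format(rmse_fit))
print('RMSE of noisy signal:       {:.1f}'.format(rmse_noisy))
# the constructed signal should be far closer to the noiseless signal than the noisy one is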

Linear Model

Similarly, we could run a linear regression on the first derivative of the function in order to estimate the second-order and first-order coefficients.

Priors

As before, the priors can be chosen from a uniform distribution since we do not have prior information about the parameters. The linear model is basic linear regression, except that it uses MCMC to estimate the slope and intercept of the linear function.

In [132]:
m = pymc.Uniform('m', -100.0, 100.0, value= 0.0)
n = pymc.Uniform('n', -100.0, 100.0, value= 0.0)
In [133]:
@pymc.deterministic
def linear_model(x=x, m=m, n=n):
    return m * x + n
linear_y = pymc.Normal('linear_y', mu=linear_model, tau=1.0 / (sig ** 2), value=first_derivative, observed=True)
linear_model = pymc.Model([m, n, linear_y])
linear_mcmc = pymc.MCMC(linear_model)  

Inference

In [134]:
linear_mcmc.sample(100000, 20000)
 [-----------------100%-----------------] 100000 of 100000 complete in 29.5 sec
In [135]:
# Linear Function Coefficients
m = linear_mcmc.trace('m')[:]
n = linear_mcmc.trace('n')[:]
print('Last value of m: {}, and the median value is {}'.format(m[-1], np.median(m)))
print('Last value of n: {}, and the median value is {}'.format(n[-1], np.median(n)))
Last value of m: 9.99190707125, and the median value is 10.0003068209
Last value of n: 12.6452367342, and the median value is 9.8141133337

In the above, note that $m$ corresponds to $2a$ and $n$ corresponds to $b$ in the original signal. $c$ can be found the same way as earlier, by checking the $(x, y)$ pairs of the original function.

The approach of first taking the derivative of the function and then doing linear regression is useful for estimating the non-constant coefficients when curve fitting either takes too long to converge or the function's second derivative is not meaningful. Note that one does not have to do curve fitting on a function if it is differentiable: we can always use the first derivative to estimate the parameters (albeit not the constant term), and then use those estimates to recover the constant.
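Concretely, here is a minimal sketch (reusing m, n, x and the noiseless signal from above; a_hat, b_hat, c_hat are just illustrative names) that recovers all three coefficients from the linear fit on the first derivative:

a_hat = np.median(m) / 2.0  # slope of the first derivative is 2a
b_hat = np.median(n)        # intercept of the first derivative is b
# subtract the estimated non-constant terms and read off the constant
c_hat = np.median(signal - a_hat * x ** 2 - b_hat * x)
print('a ~ {:.3f}, b ~ {:.3f}, c ~ {:.3f}'.format(a_hat, b_hat, c_hat))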

Taking Derivatives through convolution

The first and second derivatives were obtained with numpy's convenient gradient function, but that is not the only way to differentiate the function. We can also apply a high-pass filter to get the gradient of the function; arguably the simplest high-pass filter, [1, -1], does exactly that.

For the second-derivative filter, we could apply the same filter to the signal one more time. However, convolution is a linear operation, so instead of applying the same filter twice to the signal, we can convolve the filter with itself to get the second-derivative filter, namely the discrete Laplace filter.

Theory

Differentiation, in the usual sense, is defined for continuous signals. To take derivatives of finite series, we have to rely on numerical differentiation: the derivative is not computed over infinitesimal parts of the function, but as a discrete difference. More formally, we can write the derivative as: $$ f^{\prime}(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} $$

This equation is also called Newton's difference quotient. In finite series, the operation can be implemented by convolution with a [1, -1] filter, as it just takes the difference between two consecutive values. The quotient can also be written in the following form without loss of generality (this is how we compute the slope, the secant, of higher-order functions): $$ f^{\prime}(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x-h)}{2h} $$ Note that even though we are computing the derivative at $x$, we are not using that individual entry. The reason for switching to this representation becomes clear when we take one more derivative: $$f^{\prime\prime}(x) = \lim_{h \rightarrow 0} \frac{f(x+2h) - f(x) - [f(x) - f(x-2h)]}{2h \cdot 2h}$$

$$f^{\prime\prime}(x) = \lim_{h \rightarrow 0} \frac{f(x+2h) - 2f(x) + f(x-2h)}{4h^2}$$

If we look at the numerator coefficients of the first and second derivatives, we see something coherent: the first-derivative filter is [1, -1] and the second-derivative filter is [1, -2, 1]. The second-derivative filter is also called the Laplace operator.

In [128]:
### High-Pass Filtering
def get_first_and_second_derivative_through_convolution(signal):
    # Simple High Pass Filter
    high_pass_filter = np.array([1.0, -1.0], np.float32)
    first_derivative = ssignal.convolve(signal, high_pass_filter, mode='valid')
    # If we convolve the first derivative filter with itself, we get 2nd derivative filter
    # Laplacian Operator => 2nd Derivative Filter 
    laplacian = ssignal.convolve(high_pass_filter, high_pass_filter)
    # The gain is that we apply the combined filter to the signal only once to get the 2nd derivative
    second_derivative = ssignal.convolve(signal, laplacian, mode='valid')
    return (first_derivative, second_derivative)

# For noiseless signal
f_signal, s_signal = get_first_and_second_derivative_through_convolution(signal)
# For noisy signal
f_noisy_signal, s_noisy_signal = get_first_and_second_derivative_through_convolution(noisy_signal)
In [129]:
## Laplacian Filter
high_pass_filter = np.array([1.0, -1.0], np.float32)
laplacian_filter = ssignal.convolve(high_pass_filter, high_pass_filter)

print('High Pass Filter: {}'.format(high_pass_filter))
print('Laplacian: {}'.format(laplacian_filter))
High Pass Filter: [ 1. -1.]
Laplacian: [ 1. -2.  1.]

Nothing unexpected. We do one convolution with the large signal instead of two, based on the assumption that convolving the signal with a filter is significantly more computationally expensive than convolving the filter with itself.
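As a quick sanity check of that claim, the sketch below (reusing signal, high_pass_filter and laplacian_filter from the cells above) verifies that filtering twice and filtering once with the combined filter agree:

twice = ssignal.convolve(ssignal.convolve(signal, high_pass_filter, mode='valid'),
                         high_pass_filter, mode='valid')
once = ssignal.convolve(signal, laplacian_filter, mode='valid')
print(np.allclose(twice, once))  # True: convolution is associative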

In [130]:
plt.figure(figsize=(12, 12))
plt.scatter(range(s_signal.size), s_signal);
In [131]:
plt.figure(figsize=(12, 12))
plt.plot(f_signal, label='First Derivative of Noiseless Signal')
plt.plot(f_noisy_signal, label='First Derivative of Noisy Signal')
plt.legend();

Takeaways

  • If you have a perfect quadratic (or nearly quadratic) signal, instead of doing MCMC you may want to just take its derivatives for parameter estimation.
  • Convolution is powerful: you can filter the signal by frequency, smooth it, or high-pass filter it. In this particular implementation, convolution is also used for the second-order derivative. Since convolution is linear, you can combine a bunch of filters (by convolving them together) and, instead of applying the filters separately to the signal, apply the single combined filter to the whole signal to reduce the computation.
  • For noisy signals, the high-pass filter response deteriorates the signal, since the noise response is also quite high. Before applying high-pass filters, you need to smooth the signal to reduce the noise to a sane level; then you may get a better Signal-to-Noise Ratio (SNR).
  • MCMC handles the noise quite well when estimating parameters; for real-world datasets it is quite good at rejecting the noise and capturing the main signal. But it is on the computationally expensive side. You may want to try Scipy's curve_fit or Numpy's polyfit, where the former uses non-linear least squares and the latter linear least squares; they will run faster than MCMC (albeit not as accurate). See the sketch after this list.
  • Generally, linear regression is the de-facto method for regression, but if you want to estimate the change at different points in the signal, linear regression will give a constant slope for every point. In order to allow the slope to vary across time, you need to fit a higher-order function. It is more computationally expensive, but as long as you do not overfit, the fit will be better than linear regression and you also get a slope function that estimates how the signal changes over time.
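As a reference point for the least-squares alternatives mentioned in the takeaways, here is a minimal sketch (reusing x and noisy_signal from above) fitting the same quadratic with np.polyfit:

coeffs = np.polyfit(x, noisy_signal, 2)  # returns [a, b, c], highest order first
print('a ~ {:.3f}, b ~ {:.3f}, c ~ {:.3f}'.format(*coeffs))
# linear least squares: much faster than sampling, though it gives no posterior distribution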