Gaussian Discriminant Analysis
Preface
There are many classification algorithms, such as Logistic Regression, SVM, and Decision Trees. Today we'll talk about the Gaussian Discriminant Analysis (GDA) algorithm, which is not so popular. Actually, Logistic Regression often performs better than GDA because it can fit any distribution from the exponential family. However, from this algorithm we can learn more about the Gaussian distribution, which is the most important distribution in statistics. Furthermore, if you want to understand the Gaussian Mixture Model or Factor Analysis, GDA is a good starting point.
We will first talk about the Gaussian distribution and the multivariate Gaussian distribution; in that section you'll see plots of Gaussian distributions with different parameters. Then we will learn the GDA classification algorithm, apply GDA to a dataset, and see how it performs.
Multivariate Gaussian Distribution
Gaussian Distribution
As we know, the pdf (probability density function) of the Gaussian distribution is a bell curve, which is determined by two parameters, $\mu$ and $\sigma^2$.
Figure 1. Gaussian distribution, a bell-shaped curve determined by $\mu$ and $\sigma^2$.
Actually, the parameter $\mu$ is the mean of the distribution and $\sigma^2$ is its variance. You may want to know how these two parameters influence the shape of the Gaussian pdf. First of all, when we change $\mu$, the curve shifts along the $x$-axis while its shape stays unchanged.
Figure 2. Probability density function curves with different values of $\mu$.
So, what if we change $\sigma^2$? A larger $\sigma^2$ makes the curve lower and wider, while a smaller $\sigma^2$ makes it taller and narrower.
Figure 3. Probability density function curves with different values of $\sigma^2$.
Some may wonder what the form of the Gaussian pdf actually is. It is written as:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
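To see how $\mu$ and $\sigma^2$ shape the curve, here is a small Octave sketch (our own code, not from the original post) that reproduces plots like Figures 2 and 3:

```octave
% Plot the univariate Gaussian pdf for several (mu, sigma^2) pairs.
x = linspace(-6, 6, 400);
npdf = @(x, mu, s2) exp(-(x - mu).^2 ./ (2 * s2)) ./ sqrt(2 * pi * s2);
plot(x, npdf(x, 0, 1), x, npdf(x, 2, 1), x, npdf(x, 0, 4));
legend('mu=0, s2=1', 'mu=2, s2=1', 'mu=0, s2=4');
```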
Multivariate Gaussian
For convenience, we first look at the form of the multivariate Gaussian distribution:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

where $\mu \in \mathbb{R}^n$ is the mean vector, $\Sigma \in \mathbb{R}^{n \times n}$ is the covariance matrix (symmetric and positive semi-definite), and $|\Sigma|$ denotes its determinant.
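The formula translates directly into code. A minimal helper, as a sketch of ours rather than the post's original code:

```octave
% Density of an n-dimensional Gaussian at a column vector x.
function p = gauss_pdf(x, mu, Sigma)
  n = numel(mu);
  d = x - mu;
  p = exp(-0.5 * d' * (Sigma \ d)) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)));
end
```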
In order to get an intuition for the multivariate Gaussian distribution, we first take a look at a distribution with $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\Sigma = I$, the $2 \times 2$ identity matrix.

Figure 4. 2-dimensional Gaussian distribution with $\mu = \mathbf{0}$ and $\Sigma = I$.
Notice that the figure is no longer a curve but a 3-dimensional surface. Just like the univariate Gaussian pdf, the density is centered at $\mu$, and its shape is determined by $\Sigma$.
1. change $\mu$
Rather than changing $\Sigma$, we first move $\mu$ around: the whole density translates in the plane, but its shape does not change.

Figure 5. Contours when changing $\mu$.
2. change diagonal entries of $\Sigma$
If we scale the diagonal entries down, we can see from Figure 6 that the probability mass is concentrated in a smaller range when the diagonal entries are scaled from 1 to 0.3.
Figure 6. Density when scaling diagonal entries to 0.3.
What if we change only one entry of the diagonal? Figure 7 shows the variation of the density when a single diagonal entry is scaled: the density contracts along the corresponding axis only.
Figure 7. Density when scaling one of the diagonal entries.
3. change secondary diagonal entries of $\Sigma$
We now try to change the entries along the secondary diagonal. Figure 8 demonstrates that the variation of the density is no longer parallel to the coordinate axes; instead, the contours stretch along the diagonal direction $x_1 = x_2$.
Figure 8. Density when scaling secondary diagonal entries to 0.5
When we alter the secondary-diagonal entries to $-0.5$, the contours mirror those of the positive case, stretching along the line $x_1 = -x_2$ instead.
Figure 9. Density when scaling secondary diagonal entries to -0.5
In light of the importance of the determinant of $\Sigma$, Figure 10 shows the density when $|\Sigma|$ is close to zero: the distribution degenerates, with nearly all of the probability mass squeezed onto a line.
Figure 10. Density when determinant is close to zero.
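These pictures are easy to reproduce. Here is a minimal Octave sketch (our own code, with an illustrative $\Sigma$; swap in the matrices discussed above to recreate Figures 5 through 10):

```octave
% Contour plot of a 2-D Gaussian density for a chosen Sigma.
mu = [0; 0];
Sigma = [1 0.5; 0.5 1];            % try [1 0; 0 0.3], [1 -0.5; -0.5 1], ...
[xx, yy] = meshgrid(linspace(-3, 3, 100));
d1 = xx - mu(1);  d2 = yy - mu(2);
P = inv(Sigma);
% quadratic form (x - mu)' * inv(Sigma) * (x - mu), evaluated on the grid
q = P(1,1) * d1.^2 + 2 * P(1,2) * d1 .* d2 + P(2,2) * d2.^2;
Z = exp(-0.5 * q) / (2 * pi * sqrt(det(Sigma)));
contour(xx, yy, Z);
```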
Gaussian Discriminant Analysis
Intuition
When the input features $x$ are continuous random variables, we can model the class-conditional density $p(x \mid y)$ of each class as a multivariate Gaussian, so the two classes correspond to two Gaussian distributions; a new point is assigned to the class under whose distribution it is more probable.

Figure 11. Two Gaussian distributions, one for each of the classes $y=0$ and $y=1$.
Specifically, let's look at a concrete example. Figure 12 shows samples drawn from two Gaussian distributions: 100 blue '+'s and 100 red 'o's. Assume that we have such data to classify; we can apply GDA to solve the problem.
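The original Octave listing was lost in extraction; below is a minimal sketch of the sampling step, assuming the statistics package for `mvnrnd` and purely illustrative parameter values:

```octave
% Hypothetical reconstruction: draw 100 samples per class from two Gaussians.
pkg load statistics;                 % provides mvnrnd
mu0 = [1 1];  mu1 = [4 4];           % assumed class means
Sigma = [1 0.5; 0.5 1];              % assumed shared covariance
X0 = mvnrnd(mu0, Sigma, 100);        % class y = 0 (blue '+')
X1 = mvnrnd(mu1, Sigma, 100);        % class y = 1 (red 'o')
plot(X0(:,1), X0(:,2), 'b+', X1(:,1), X1(:,2), 'ro');
```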
Figure 12. 200 samples drawn from two Gaussian distributions with parameters $\mu_0$, $\mu_1$, and a shared covariance $\Sigma$.
Definition
Now, let's define the algorithm. First, we assume the discrete random variable for the class, $y$, follows a Bernoulli distribution with parameter $\phi$:

$$y \sim \mathrm{Bernoulli}(\phi)$$

Concretely, the probability of $y$ can be written as

$$p(y) = \phi^y (1-\phi)^{1-y}$$

Apparently, $p(y=1) = \phi$ and $p(y=0) = 1 - \phi$.
Another assumption is that we consider $x$ conditioned on each class to follow a multivariate Gaussian distribution with its own mean but a shared covariance matrix: $x \mid y=0 \sim \mathcal{N}(\mu_0, \Sigma)$, i.e.

$$p(x \mid y=0) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_0)^T \Sigma^{-1} (x-\mu_0)\right)$$

Vice versa, $x \mid y=1 \sim \mathcal{N}(\mu_1, \Sigma)$:

$$p(x \mid y=1) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_1)^T \Sigma^{-1} (x-\mu_1)\right)$$
Here the parameters of the model are $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$. To fit them, we maximize the joint likelihood of the training data:

$$L(\phi, \mu_0, \mu_1, \Sigma) = \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}) = \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)})\, p(y^{(i)})$$

It's tough for us to maximize $L$ directly, so we take the logarithm of both sides, turning the product into a sum:

$$\ell(\phi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)} \mid y^{(i)}) + \sum_{i=1}^{m} \log p(y^{(i)})$$

By now, we have a log-likelihood function of the parameters $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$ that can be maximized analytically.
Solution
To estimate these four parameters, we take the partial derivative of $\ell$ with respect to each parameter, set it to zero, and solve.

Note that $1\{\cdot\}$ below denotes the indicator function: $1\{\text{true}\} = 1$ and $1\{\text{false}\} = 0$.

We have

$$\phi = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\}$$

Similarly,

$$\mu_0 = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 0\}}, \qquad \mu_1 = \frac{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{y^{(i)} = 1\}}$$

Before calculating $\Sigma$, we need two facts about matrix derivatives: $\frac{\partial}{\partial A} \log|A| = (A^{-1})^T$ and $\frac{\partial}{\partial A}\, b^T A^{-1} b = -(A^{-1})^T b\, b^T (A^{-1})^T$. Applying them yields

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu_{y^{(i)}}\right)\left(x^{(i)} - \mu_{y^{(i)}}\right)^T$$
In spite of the harshness of the derivation, the outcome is pretty beautiful. Next, we will plug in these parameter estimates and see how well the estimation performs.
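The four estimators translate directly into Octave. A minimal sketch of ours, assuming the data sits in an $m \times n$ matrix `X` with 0/1 labels in `y` (the names are illustrative, not from the original post):

```octave
% Closed-form GDA estimates; X is m x n, y is m x 1 with labels 0/1.
function [phi, mu0, mu1, Sigma] = gda_fit(X, y)
  m = rows(X);
  phi = mean(y == 1);                       % fraction of class-1 examples
  mu0 = mean(X(y == 0, :), 1)';             % class-0 mean, n x 1
  mu1 = mean(X(y == 1, :), 1)';             % class-1 mean, n x 1
  Mu = (y == 0) * mu0' + (y == 1) * mu1';   % per-example class mean, m x n
  D = X - Mu;
  Sigma = (D' * D) / m;                     % shared covariance
end
```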
Apply GDA
Notice that the data drawn from the two Gaussian distributions are random; thus, if you run the code, the outcome may differ. In most cases, however, the distributions drawn from the estimated parameters are roughly the same as the original ones.
From Figure 13, we can see the contours of the two estimated Gaussian distributions, and most of the samples are correctly classified.
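The original plotting listing was also lost; here is a sketch under the same assumptions, reusing `X0`, `X1` from the sampling sketch, `gda_fit` from above, and `mvnpdf` from the statistics package:

```octave
% Fit the model, then draw contours of the two estimated densities.
[phi, mu0, mu1, Sigma] = gda_fit([X0; X1], [zeros(100,1); ones(100,1)]);
[xx, yy] = meshgrid(linspace(-2, 7, 100));
G = [xx(:), yy(:)];
p0 = reshape(mvnpdf(G, mu0', Sigma), size(xx));
p1 = reshape(mvnpdf(G, mu1', Sigma), size(xx));
hold on;
contour(xx, yy, p0, 'b');
contour(xx, yy, p1, 'r');
hold off;
```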
Figure 13. Contours drawn from the estimated parameters.
In fact, we can compute the probability of each data point to predict which distribution it more likely belongs to. For example, if we want to predict the label of a point $x$, we compare the posteriors $p(y=1 \mid x)$ and $p(y=0 \mid x)$. By Bayes' rule,

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

and since the evidence $p(x)$ is the same for both classes, comparing the posteriors is equivalent to comparing the joint probabilities $p(x \mid y=1)\,p(y=1)$ and $p(x \mid y=0)\,p(y=0)$.
You may wonder why there is a blue line. It turns out that every data point below the blue line is classified as the blue class, while every point above it is classified as the red class. How does this work?

The blue line is the decision boundary; if we know the expression for this line, the decision becomes easy to make. In fact, GDA is a linear classifier, which we will prove shortly. Consider the data point mentioned above: if we divide one posterior probability by the other, we only need to check whether the ratio is larger or smaller than 1. For our example, the ratio is roughly 0.68, so the data point is classified as the blue class.
Figure 14. Decision Boundary
If we can obtain an expression for that ratio, we are done. So, given a new $x$, we examine

$$\frac{p(y=1 \mid x)}{p(y=0 \mid x)} = \frac{p(x \mid y=1)\, p(y=1)}{p(x \mid y=0)\, p(y=0)}$$

which is equal to

$$\frac{\phi \exp\left(-\frac{1}{2}(x-\mu_1)^T \Sigma^{-1}(x-\mu_1)\right)}{(1-\phi) \exp\left(-\frac{1}{2}(x-\mu_0)^T \Sigma^{-1}(x-\mu_0)\right)}$$

since the normalizing constants cancel under the shared $\Sigma$. Then, taking the logarithm and setting the ratio to 1, the boundary satisfies

$$\log\frac{\phi}{1-\phi} - \frac{1}{2}(x-\mu_1)^T \Sigma^{-1}(x-\mu_1) + \frac{1}{2}(x-\mu_0)^T \Sigma^{-1}(x-\mu_0) = 0$$

Here, the quadratic terms $x^T \Sigma^{-1} x$ cancel, leaving a linear equation in $x$:

$$(\mu_1 - \mu_0)^T \Sigma^{-1} x + \frac{1}{2}\left(\mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1\right) + \log\frac{\phi}{1-\phi} = 0$$
If you plug the estimated parameters into this equation, you will get the formula of a straight line: it is the decision boundary (Figure 14). Since you now have the decision boundary formula, it is convenient to use it to predict which class a new data point belongs to.
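For completeness, here is how the linear boundary can be used for prediction in Octave (a sketch reusing the fitted parameters from above; `x` is a new column vector):

```octave
% Linear decision function derived above: w' * x + b > 0  =>  class 1.
w = Sigma \ (mu1 - mu0);
b = 0.5 * (mu0' * (Sigma \ mu0) - mu1' * (Sigma \ mu1)) + log(phi / (1 - phi));
y_hat = (w' * x + b) > 0;     % 1 -> red class (y = 1), 0 -> blue class (y = 0)
```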
Conclusion
Today, we have talked about the Gaussian distribution and its multivariate form. We then assumed two groups of data drawn from Gaussian distributions and applied Gaussian Discriminant Analysis to them. Out of 200 data points, only one is misclassified. In fact, GDA can be reduced to a Logistic Regression (LR) model, but LR cannot be reduced to GDA; i.e., LR makes weaker assumptions and is the more robust classifier, especially when we do not know the distribution of the data. However, if you know the data is drawn from Gaussian distributions, GDA is the better choice.
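The reduction mentioned above can be made precise (it is a standard result, covered in Ng's notes): under the GDA assumptions, the posterior takes exactly the logistic form

$$p(y=1 \mid x;\ \phi, \mu_0, \mu_1, \Sigma) = \frac{1}{1 + \exp\left(-\theta^T x - \theta_0\right)}$$

where $\theta$ and $\theta_0$ are appropriate functions of $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$; the converse implication does not hold.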
Reference
- Andrew Ng, CS229 Lecture Notes 2 (Generative Learning Algorithms): http://cs229.stanford.edu/notes/cs229-notes2.pdf
- https://en.wikipedia.org/wiki/Normal_distribution
- https://en.wikipedia.org/wiki/Multivariate_normal_distribution
- http://www.cnblogs.com/emituofo/archive/2011/12/02/2272584.html
- http://m.blog.csdn.net/article/details?id=52190572
- 张贤达, 《矩阵分析与应用》 (Matrix Analysis and Applications), pp. 156-158
- http://www.tk4479.net/hujingshuang/article/details/46357543
- http://www.chinacloud.cn/show.aspx?id=24927&cid=22
- http://www.cnblogs.com/jcchen1987/p/4424436.html
- http://www.xlgps.com/article/139591.html
- http://www.matlabsky.com/thread-10308-1-1.html
- http://classes.engr.oregonstate.edu/eecs/fall2015/cs534/notes/GaussianDiscriminantAnalysis.pdf