A tutorial on Canonical Correlation Analysis (CCA)

Introduction

Suppose we have two sets of variables corresponding to two aspects of the same subjects, such as height and weight, and we want to analyze the relationship between these two sets. There are several ways to measure the relationship between them. However, it is sometimes hard to handle datasets with different dimensions: if \(X\in \mathbb{R}^m\) and \(Y\in \mathbb{R}^n\), how do we characterize the relationship?

Basics of CCA

Assume there are two sets of data \(X\) and \(Y\), where the size of \(X\) is \(n \times p\) and the size of \(Y\) is \(n\times q\). That is, \(X\) and \(Y\) share the same number of rows but differ in the number of columns. The idea of CCA is simple: find the best match between \(X w_x\) and \(Y w_y\). Let us set: \[X w_x = z_x\qquad\text{and}\qquad Y w_y = z_y\]

where \(X\in \mathbb{R}^{n\times p}\), \(w_x \in \mathbb{R}^{p}\), \(z_x\in \mathbb{R}^n\), \(Y\in \mathbb{R}^{n\times q}\), \(w_y \in \mathbb{R}^{q}\), \(z_y\in \mathbb{R}^n\). The vectors \(w_x\) and \(w_y\) are often referred to as canonical weight vectors, while \(z_x\) and \(z_y\) are called images, canonical variates, or canonical scores. To simplify the problem, we assume \(X\) and \(Y\) are standardized to zero mean and unit variance. Our task is to maximize the cosine of the angle between \(z_x\) and \(z_y\), meaning:

\[\max_{z_x, z_y \in \mathbb{R}^n} \langle z_x, z_y\rangle=\max \cos(z_x, z_y)=\max\frac{\langle z_x, z_y\rangle}{\|z_x\|\,\|z_y\|}\]

subject to: \(\|z_x\|_{2}=1\) and \(\|z_y\|_{2}=1\).

In fact, our task is simply to project \(X\) and \(Y\) into a new coordinate system via the linear transformations \(w_x\) and \(w_y\).
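To make the objective concrete, here is a minimal NumPy sketch, using hypothetical random data and arbitrary (not yet optimal) weight vectors, that forms the canonical variates and evaluates the cosine that CCA tries to maximize:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical standardized data: n samples, p and q variables.
n, p, q = 100, 3, 2
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)

# Arbitrary (not yet optimal) weight vectors w_x and w_y.
w_x = rng.standard_normal(p)
w_y = rng.standard_normal(q)

# Canonical variates (images): z_x = X w_x, z_y = Y w_y.
z_x = X @ w_x
z_y = Y @ w_y

# The quantity CCA maximizes: the cosine of the angle between z_x and z_y.
cos_xy = z_x @ z_y / (np.linalg.norm(z_x) * np.linalg.norm(z_y))
print(cos_xy)
```

The sketches in the following sections reuse `X`, `Y`, `n`, `p`, and `q` from this example.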

Solving CCA

There are many ways to solve this problem. Before starting, we need to make some assumptions: each column vector of \(X\) is orthogonal to the others, which means \(X^\top X= I\); the same assumption holds for \(Y\) and for \(w_x, w_y\). We can find \(\min(p,q)\) canonical components, and the \(r\)-th component is orthogonal to all of the previous \(r-1\) components.

Solving CCA through SVD

To solve the CCA problem using SVD, we first introduce the joint covariance matrix \(C\) such that: \[\begin{equation} C = \begin{pmatrix} C_{xx} & C_{xy}\\ C_{yx} & C_{yy}\\ \end{pmatrix} \end{equation}\] where \(C_{xx}=\frac{1}{n-1}X^\top X\) and \(C_{yy}=\frac{1}{n-1}Y^\top Y\) are the empirical covariance matrices of \(X\) and \(Y\) respectively, and \(C_{xy}=\frac{1}{n-1} X^\top Y\) is the cross-covariance matrix between \(X\) and \(Y\).
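Continuing the sketch above, the covariance blocks can be assembled directly from the standardized data; the variable names simply mirror the notation in the text:

```python
# Empirical covariance blocks (X and Y are centered, as above).
C_xx = X.T @ X / (n - 1)
C_yy = Y.T @ Y / (n - 1)
C_xy = X.T @ Y / (n - 1)
C_yx = C_xy.T

# The full joint covariance matrix C, built here only for illustration.
C = np.block([[C_xx, C_xy],
              [C_yx, C_yy]])
```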

We can now reformulate the CCA problem in terms of two linear transformations \(w_x\) and \(w_y\) such that:

\[\begin{equation} w_x^\top C_{xx} w_x = I_p, \quad w_y^\top C_{yy} w_y = I_q, \quad w_x^\top C_{xy} w_y = D \end{equation}\] where \(I_p\) and \(I_q\) are the \(p\)-dimensional and \(q\)-dimensional identity matrices respectively. The diagonal matrix \(D = \text{diag}(\gamma_i)\) is such that:

\[\begin{equation} \begin{pmatrix} {w}_x^\top & { 0}\\ { 0} & {w}_y^\top \end{pmatrix} \begin{pmatrix} C_{xx} & C_{xy}\\ C_{yx} & C_{yy} \end{pmatrix} \begin{pmatrix} {w}_x & { 0}\\ { 0} & {w}_y \end{pmatrix} = \begin{pmatrix} I_p & D\\ D^\top & I_q \end{pmatrix}, \end{equation}\]

The canonical variates are: \[\begin{equation} Z_x = Xw_x, \quad Z_y = Y w_y \end{equation}\] The diagonal elements \(\gamma_i\) of \(D\) are the canonical correlations. Thus we find the linear compounds \({Z}_x\) and \({Z}_y\) that maximize the cross-correlations. Since both \(C_{xx}\) and \(C_{yy}\) are symmetric positive definite, we can perform a Cholesky decomposition on them to get: \[\begin{equation} C_{xx} = C_{xx}^{\top/2} C_{xx}^{1/2}, \quad C_{yy} = C_{yy}^{\top/2} C_{yy}^{1/2} \end{equation}\]

where \(C_{xx}^{\top/2}\) is the transpose of \(C_{xx}^{1/2}\). Applying the inverses of the square-root factors symmetrically to the joint covariance matrix \(C\), the matrix is transformed into: \[\begin{equation} \begin{pmatrix} C_{xx}^{-\top/2} & {\mathbf 0}\\ {\mathbf 0} & C_{yy}^{-\top/2} \end{pmatrix} \begin{pmatrix} C_{xx} & C_{xy}\\ C_{yx} & C_{yy} \end{pmatrix} \begin{pmatrix} C_{xx}^{-1/2} & {\mathbf 0}\\ {\mathbf 0} & C_{yy}^{-1/2} \end{pmatrix} = \begin{pmatrix} I_p & C_{xx}^{-\top/2}C_{xy}C_{yy}^{-1/2}\\ C_{yy}^{-\top/2}C_{yx}C_{xx}^{-1/2} & I_q \end{pmatrix}. \end{equation}\]
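As a sketch of this whitening step (continuing the example above), the Cholesky factors and the whitened cross-covariance block might be computed as follows; `R_x` and `R_y` play the role of \(C_{xx}^{1/2}\) and \(C_{yy}^{1/2}\):

```python
# Upper-triangular Cholesky factors with C_xx = R_x.T @ R_x and
# C_yy = R_y.T @ R_y (np.linalg.cholesky returns the lower-triangular
# factor, so we transpose it).
R_x = np.linalg.cholesky(C_xx).T
R_y = np.linalg.cholesky(C_yy).T

# Whitened cross-covariance block C_xx^{-T/2} C_xy C_yy^{-1/2}.
M = np.linalg.solve(R_x.T, C_xy) @ np.linalg.inv(R_y)
```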

The canonical correlation problem is thus reduced to finding an SVD of a triple product: \[\begin{equation} U^{\top} (C_{xx}^{-\top/2}C_{xy}C_{yy}^{-1/2}) V = D. \end{equation}\] Applying \(U\) and \(V\) as a two-sided transformation (the SVD itself can be computed, for instance, with a two-sided Jacobi method), the whitened joint covariance matrix is reduced to: \[\begin{equation} \begin{pmatrix} U^\top & {\mathbf 0}\\ {\mathbf 0} & V^\top \end{pmatrix} \begin{pmatrix} I_p & C_{xx}^{-\top/2}C_{xy}C_{yy}^{-1/2}\\ C_{yy}^{-\top/2}C_{yx}C_{xx}^{-1/2} & I_q \end{pmatrix} \begin{pmatrix} U & {\mathbf 0}\\ {\mathbf 0} & V \end{pmatrix} = \begin{pmatrix} I_p & D\\ D^\top & I_q \end{pmatrix} \end{equation}\]

with the desired transformations \({w}_x\) and \({w}_y\): \[\begin{equation} {w}_x = C_{xx}^{-1/2} U, \quad {w}_y = C_{yy}^{-1/2}V \end{equation}\] where the singular values \(\gamma_i\) are in descending order such that: \[\begin{equation} \gamma_1 \geq \gamma_2 \geq \cdots \geq 0. \end{equation}\]
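Putting the pieces together, a minimal sketch of the SVD step and the back-transformation to the canonical weights (continuing the example above) could look like this; the final lines check the constraints numerically:

```python
# SVD of the whitened cross-covariance: U^T M V = D.
U, gammas, Vt = np.linalg.svd(M)
V = Vt.T

# Back-transform: w_x = C_xx^{-1/2} U and w_y = C_yy^{-1/2} V,
# i.e. R_x^{-1} U and R_y^{-1} V with the Cholesky factors above.
W_x = np.linalg.solve(R_x, U)
W_y = np.linalg.solve(R_y, V)

# The singular values are the canonical correlations, in descending order.
print(gammas)

# Sanity check of the constraints:
# W_x^T C_xx W_x = I, W_y^T C_yy W_y = I, W_x^T C_xy W_y = D.
print(np.round(W_x.T @ C_xx @ W_x, 3))
print(np.round(W_x.T @ C_xy @ W_y, 3))
```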

Solving CCA through a Standard Eigenvalue Problem

The problem can also be reformulated as: \[\begin{equation} \underset{w_x \in \mathbb{R}^p, w_y\in \mathbb{R}^q}{\arg \max}\; w_x^\top C_{xy} w_y \end{equation}\] subject to \(w_x^\top C_{xx} w_x = 1\) and \(w_y^\top C_{yy} w_y = 1\). This constrained problem can be solved with the Lagrange multiplier technique. Let us construct the Lagrangian \(L\) such that: \[\begin{equation} L = w_x^\top C_{xy} w_y - \frac{\rho_1}{2} \left(w_x^\top C_{xx} w_x - 1\right) - \frac{\rho_2}{2} \left(w_y^\top C_{yy} w_y - 1\right) \end{equation}\]

Differentiating \(L\) with respect to \(w_x\) and \(w_y\) and setting the derivatives to zero gives: \[\begin{equation} \begin{aligned} \frac{\partial L}{\partial w_x} = C_{xy} w_y - \rho_1 C_{xx}w_x = \mathbf{0}\\ \frac{\partial L}{\partial w_y} = C_{yx} w_x - \rho_2 C_{yy}w_y = \mathbf{0} \end{aligned} \end{equation}\]

Left-multiplying the above equations by \(w_x^\top\) and \(w_y^\top\) respectively, we have:

\[\begin{equation} \begin{aligned} w_x^\top C_{xy} w_y -\rho_1 w_x^\top C_{xx} w_x = 0\\ w_y^\top C_{yx} w_x -\rho_2 w_y^\top C_{yy} w_y = 0 \end{aligned} \end{equation}\] Since \(w_x^\top C_{xx} w_x = 1\) and \(w_y^\top C_{yy} w_y = 1\), we obtain \(\rho_1 = \rho_2 = \rho\). Substituting \(\rho\) back into the first-order conditions, we get: \[\begin{equation} w_x = \frac{C_{xx}^{-1}C_{xy}w_y}{\rho} \end{equation}\] Eventually we arrive at the equation: \[\begin{equation} C_{yx} C_{xx}^{-1} C_{xy} w_y = \rho^2 C_{yy} w_y \end{equation}\] This is a generalized eigenvalue problem whose eigenvalues are all greater than or equal to zero; left-multiplying by \(C_{yy}^{-1}\) turns it into a standard eigenvalue problem. Solving it yields \(w_y\), and \(w_x\) then follows from the relation above.
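As a rough numerical check of this route (continuing the same example as before), one can solve the standard eigenvalue problem obtained after left-multiplying by \(C_{yy}^{-1}\); the square roots of the eigenvalues should match the singular values from the SVD approach:

```python
# Solve C_yx C_xx^{-1} C_xy w_y = rho^2 C_yy w_y as a standard
# eigenvalue problem by left-multiplying with C_yy^{-1}.
A = np.linalg.inv(C_yy) @ C_yx @ np.linalg.inv(C_xx) @ C_xy
eigvals, eigvecs = np.linalg.eig(A)

# Sort by decreasing eigenvalue; rho_i = sqrt(lambda_i) are the
# canonical correlations and should match the singular values above.
order = np.argsort(eigvals.real)[::-1]
rhos = np.sqrt(np.clip(eigvals.real[order], 0.0, None))
print(rhos)

# Recover the paired weight vector for the leading component:
# w_x = C_xx^{-1} C_xy w_y / rho.
w_y1 = eigvecs.real[:, order[0]]
w_x1 = np.linalg.inv(C_xx) @ C_xy @ w_y1 / rhos[0]
```

Note that `np.linalg.eig` normalizes eigenvectors to unit Euclidean norm, so `w_y1` and `w_x1` may differ from the SVD-based weights by a scale factor; the canonical correlations themselves are unaffected.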