X\sim\mathcal{N}(\mu,\sigma^2)\qquad f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}
f(x;0,1)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}
68.268949% of the area under the curve lies within one standard deviation of the mean; 95.449974% lies within two standard deviations (2\sigma); 99.730020% lies within three standard deviations (3\sigma).
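These coverage figures can be checked directly from the normal CDF. A minimal sketch using only Python's standard library (`math.erf`); the helper name `coverage` is illustrative:

```python
import math

def coverage(k):
    """P(|X - mu| < k*sigma) for any normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} sigma: {coverage(k):.6%}")
# 1 sigma: 68.268949%
# 2 sigma: 95.449974%
# 3 sigma: 99.730020%
```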
X\sim\mathcal{N}(2,1)\quad \text{vs.}\quad X\sim\mathcal{N}(-2,1)
\mathcal{N}(0,1),\ \mathcal{N}(0,0.1),\ \mathcal{N}(0,4),\ \mathcal{N}(0,8)
X\sim\mathcal{N}(\mu,\sigma^2),\quad x\in\mathbb{R}^d
\Downarrow
f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}
X\sim\mathcal{N}(\mu,\Sigma)
\Downarrow
f(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}
\mu=E(x)\qquad \Sigma=E\left[(x-\mu)(x-\mu)^T\right]
X\sim\mathcal{N}\left(\begin{bmatrix}0\\0\end{bmatrix},\begin{bmatrix}1&0\\0&1\end{bmatrix}\right)
X\sim\mathcal{N}\left(\begin{bmatrix}0\\0\end{bmatrix},\begin{bmatrix}1&0\\0&1\end{bmatrix}\right)
\Downarrow
f(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{2}{2}}\left|\begin{matrix}1&0\\0&1\end{matrix}\right|^{\frac{1}{2}}}e^{-\frac{1}{2}\left(x-\begin{bmatrix}0\\0\end{bmatrix}\right)^T\begin{bmatrix}1&0\\0&1\end{bmatrix}^{-1}\left(x-\begin{bmatrix}0\\0\end{bmatrix}\right)}
x=\begin{bmatrix}0\\0\end{bmatrix}:\qquad f\left(\begin{bmatrix}0\\0\end{bmatrix};\begin{bmatrix}0\\0\end{bmatrix},\begin{bmatrix}1&0\\0&1\end{bmatrix}\right)=\frac{1}{2\pi}
p(x;\mu,\Sigma)=\frac{1}{2\pi}=0.15915
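A minimal NumPy sketch of this density formula (the function name is illustrative), reproducing the value \frac{1}{2\pi}\approx 0.15915 at x=[0,0]^T:

```python
import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, implementing the formula above directly."""
    d = mu.shape[0]
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm_const

print(multivariate_normal_pdf(np.zeros(2), np.zeros(2), np.eye(2)))  # 0.15915... = 1/(2*pi)
```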
\mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}4&0\\0&4\end{bmatrix}\ ?
\mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&0\\0&5\end{bmatrix}\ ?\qquad \mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}5&0\\0&1\end{bmatrix}\ ?
\mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&0.5\\0.5&1\end{bmatrix}\ ?\qquad \mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&-0.5\\-0.5&1\end{bmatrix}\ ?
\mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&-0.0001\\-0.0001&1\end{bmatrix}\ ?\qquad \mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&0.99\\0.99&1\end{bmatrix}\ ?
\mu=\begin{bmatrix}0\\0\end{bmatrix},\ \Sigma=\begin{bmatrix}1&1\\1&1\end{bmatrix}\ ?
f(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{2}{2}}\left|\begin{matrix}1&1\\1&1\end{matrix}\right|^{\frac{1}{2}}}e^{-\frac{1}{2}\left(x-\begin{bmatrix}0\\0\end{bmatrix}\right)^T\begin{bmatrix}1&1\\1&1\end{bmatrix}^{-1}\left(x-\begin{bmatrix}0\\0\end{bmatrix}\right)}
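This last case breaks down: \left|\begin{matrix}1&1\\1&1\end{matrix}\right|=0, so \Sigma is not invertible and the density above is undefined. A quick NumPy check:

```python
import numpy as np

Sigma = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
print(np.linalg.det(Sigma))   # 0.0 -> the 1/sqrt(|Sigma|) factor blows up
try:
    np.linalg.inv(Sigma)
except np.linalg.LinAlgError as e:
    print("Sigma is not invertible:", e)  # "Singular matrix"
```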
f(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}
x=\begin{bmatrix}x_1\\x_2\end{bmatrix},\quad \mu=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix},\quad \Sigma=\begin{bmatrix}\sigma_1^2&0\\0&\sigma_2^2\end{bmatrix}
\begin{aligned}f(x;\mu,\Sigma)&=\frac{1}{2\pi\left|\begin{matrix}\sigma_1^2&0\\0&\sigma_2^2\end{matrix}\right|^{\frac{1}{2}}}e^{-\frac{1}{2}\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}-\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)^T\begin{bmatrix}\frac{1}{\sigma_1^2}&0\\0&\frac{1}{\sigma_2^2}\end{bmatrix}\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}-\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)}\\&=\frac{1}{2\pi\sigma_1\sigma_2}e^{-\frac{1}{2\sigma_1^2}(x_1-\mu_1)^2-\frac{1}{2\sigma_2^2}(x_2-\mu_2)^2}\end{aligned}
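So with a diagonal \Sigma the joint density factors into the product of the two univariate densities. A quick numerical check of this factorization, using `scipy.stats` for convenience (the parameter values here are arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, 2.0])
sigma1, sigma2 = 0.5, 3.0
x = np.array([0.3, 4.2])

# Joint density with diagonal covariance vs. product of the two marginals.
joint = multivariate_normal.pdf(x, mean=mu, cov=np.diag([sigma1**2, sigma2**2]))
product = norm.pdf(x[0], loc=mu[0], scale=sigma1) * norm.pdf(x[1], loc=mu[1], scale=sigma2)
print(np.isclose(joint, product))  # True
```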
\mu_0=\begin{bmatrix}1\\1\end{bmatrix},\quad \mu_1=\begin{bmatrix}4\\4\end{bmatrix},\quad \Sigma=\begin{bmatrix}1&0\\0&1\end{bmatrix}
p(y=\text{blue})=\phi\qquad p(y=\text{red})=1-\phi
p(y=1)=\phi\qquad p(y=0)=1-\phi
p(y;\phi)=\phi^y(1-\phi)^{1-y}
As an analogy, suppose there are two boxes, one iron and one plastic, containing red balls and blue balls respectively; we have drawn a pile of red and blue balls from them.
p(x|y=0)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)}
p(x|y=1)=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)}
p(X,y;\phi,\mu_0,\mu_1,\Sigma)=p(x^{(1)},x^{(2)},\cdots,x^{(m)},y^{(1)},y^{(2)},\cdots,y^{(m)};\phi,\mu_0,\mu_1,\Sigma)\triangleq\prod_{i=1}^{m}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma)
\begin{aligned}&\arg\max p(X,y;\phi,\mu_0,\mu_1,\Sigma)\\ =&\arg\max\prod_{i=1}^{m}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma)\\ =&\arg\max\prod_{i=1}^{m}p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)p(y^{(i)};\phi)\\ =&\arg\max\sum_{i=1}^{m}\log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\sum_{i=1}^{m}\log p(y^{(i)};\phi)\end{aligned}
\begin{aligned}&\frac{\partial\sum_{i=1}^{m} \log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\sum_{i=1}^{m} \log p(y^{(i)};\phi)}{\partial \phi}=0\\ \Rightarrow&\frac{\partial\sum_{i=1}^{m}\log p(y^{(i)};\phi)}{\partial \phi}=0\\ \Rightarrow&\frac{\partial\sum_{i=1}^{m}\log \phi^{y^{(i)}}(1-\phi)^{(1-y^{(i)})}}{\partial \phi}=0\\ \Rightarrow&\frac{\partial\sum_{i=1}^{m}y^{(i)}\log \phi+(1-y^{(i)})\log(1-\phi)}{\partial \phi}=0\\ \Rightarrow&\frac{\partial\sum_{i=1}^{m}1\{y^{(i)}=1\}\log \phi+1\{y^{(i)}=0\}\log(1-\phi)}{\partial \phi}=0\\ \Rightarrow&\sum_{i=1}^{m}\frac{1\{y^{(i)}=1\}}{\phi}-\frac{1\{y^{(i)}=0\}}{1-\phi}=0\\ \Rightarrow&(1-\phi)\sum_{i=1}^{m}1\{y^{(i)}=1\}=\phi\sum_{i=1}^{m}1\{y^{(i)}=0\}\\ \Rightarrow&\phi=\frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\}\end{aligned}
\phi=\frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\}
Only the terms with y^{(i)}=0 depend on \mu_0:
\begin{aligned}&\frac{\partial\sum_{i=1}^{m} \log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\sum_{i=1}^{m} \log p(y^{(i)};\phi)}{\partial \mu_0}=0\\ \Rightarrow&\frac{\partial\sum_{i=1}^{m} \log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)}{\partial \mu_0}=0\\ \Rightarrow&\frac{\partial \sum_{i=1}^{m}1\{y^{(i)}=0\}\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)}}{\partial \mu_0}=0\\ \Rightarrow&0+\frac{\partial \sum_{i=1}^{m}-\frac{1}{2}1\{y^{(i)}=0\}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)}{\partial \mu_0}=0 \end{aligned}
Given that \frac{\partial X^TAX}{\partial X}=(A+A^T)X, let X=(x^{(i)}-\mu_0) and A=\Sigma^{-1}:
\begin{aligned}&0+\frac{\partial \sum_{i=1}^{m}-\frac{1}{2}1\{y^{(i)}=0\}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)}{\partial \mu_0}=0\\ \Rightarrow&\sum_{i=1}^{m}-\frac{1}{2}1\{y^{(i)}=0\}\left((\Sigma^{-1})^T+\Sigma^{-1}\right)(x^{(i)}-\mu_0)\cdot(-1)=0\\ \Rightarrow&\sum_{i=1}^{m}1\{y^{(i)}=0\}x^{(i)}=\sum_{i=1}^{m}1\{y^{(i)}=0\}\mu_0\\ \Rightarrow&\mu_0=\frac{\sum_{i=1}^{m}1\{y^{(i)}=0\}x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}} \end{aligned}
\mu_0=\frac{\sum_{i=1}^{m}1\{y^{(i)}=0\}x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}} \mu_1=\frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=1\}}
\begin{aligned}&\frac{\partial\sum_{i=1}^{m} \log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\sum_{i=1}^{m} \log p(y^{(i)};\phi)}{\partial \Sigma}=0\\ \Rightarrow&\frac{\partial \sum_{i=1}^{m}\log\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})}}{\partial \Sigma}=0\\ \Rightarrow&\frac{\partial \sum_{i=1}^{m}-\frac{d}{2}\log2\pi}{\partial \Sigma}+\frac{\partial \sum_{i=1}^{m}-\frac{1}{2}\log|\Sigma|}{\partial \Sigma}\\&+\frac{\partial \sum_{i=1}^{m}{-\frac{1}{2}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})}}{\partial \Sigma}=0\\\\ \Rightarrow&\frac{\partial \sum_{i=1}^{m}-\frac{1}{2}\log|\Sigma|}{\partial \Sigma}+\frac{\partial \sum_{i=1}^{m}{-\frac{1}{2}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})}}{\partial \Sigma}\\ &=0 \end{aligned}
Using \frac{\partial \log|\Sigma|}{\partial \Sigma}=\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}=\Sigma^{-1} and \frac{\partial\, a^T\Sigma^{-1}a}{\partial \Sigma}=-\Sigma^{-1}aa^T\Sigma^{-1} (for symmetric \Sigma):
\begin{aligned}&\frac{\partial \sum_{i=1}^{m}\log|\Sigma|}{\partial \Sigma}+\frac{\partial \sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})}{\partial \Sigma}=0\\ \Rightarrow&m\frac{1}{|\Sigma|}|\Sigma|\Sigma^{-1}-\sum_{i=1}^m\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}=0\\ \Rightarrow&m\Sigma=\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T\\ \Rightarrow&\Sigma=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T \end{aligned}
\Sigma=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T
\begin{aligned}&\phi=\frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\}\\\\ &\mu_0=\frac{\sum_{i=1}^{m}1\{y^{(i)}=0\}x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}}\\\\ &\mu_1=\frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=1\}}\\\\ &\Sigma=\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T \end{aligned}
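These four closed-form estimators translate directly into code. A minimal NumPy sketch (the function and variable names are illustrative, not from the original notes), together with a synthetic-data run roughly matching the running example:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA: X is (m, d), y is (m,) with values in {0, 1}."""
    m = X.shape[0]
    phi = np.mean(y == 1)                       # fraction of y = 1
    mu0 = X[y == 0].mean(axis=0)                # class-0 mean
    mu1 = X[y == 1].mean(axis=0)                # class-1 mean
    # Shared covariance: center each example by its own class mean.
    centered = X - np.where((y == 1)[:, None], mu1, mu0)
    Sigma = centered.T @ centered / m
    return phi, mu0, mu1, Sigma

# Illustrative usage: data drawn near the two class centers of the running example.
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([4, 4], np.eye(2), size=500)   # class y = 0
X1 = rng.multivariate_normal([1, 1], np.eye(2), size=500)   # class y = 1
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(500), np.ones(500)])
phi, mu0, mu1, Sigma = fit_gda(X, y)   # estimates close to the numbers reported below
```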
\begin{aligned} &\phi=0.5\\\\ &\mu_0=\begin{bmatrix}4.0551\\4.1008\end{bmatrix}\\\\ &\mu_1=\begin{bmatrix}0.85439\\1.03622\end{bmatrix}\\\\ &\Sigma=\begin{bmatrix}1.118822&-0.058976\\-0.058976&1.023049\end{bmatrix} \end{aligned}
\phi=0.5,\quad \mu_0=\begin{bmatrix}4.055\\4.101\end{bmatrix},\quad \Sigma=\begin{bmatrix}1.1188&-0.059\\-0.059&1.023\end{bmatrix},\quad x=\begin{bmatrix}0.88\\3.95\end{bmatrix}
\begin{aligned} &\frac{1}{2\pi|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}{\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}^T\Sigma^{-1}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}}}\\=&\frac{1}{{2\pi}\left|\begin{matrix}1.1188&-0.059\\-0.059&1.023\end{matrix}\right|^{\frac{1}{2}}}e^{-\frac{1}{2}{\begin{bmatrix}-3.175\\-0.151\end{bmatrix}^T\begin{bmatrix}0.896&0.052\\0.052&0.98\end{bmatrix}\begin{bmatrix}-3.175\\-0.151\end{bmatrix}}}\\ =&\frac{1}{2\pi\sqrt{1.141}}e^{-\frac{1}{2}\times 9.11} \approx 0.149\times 0.01\approx 0.0015 \end{aligned}
\phi=0.5,\quad \mu_1=\begin{bmatrix}0.85\\1.036\end{bmatrix},\quad \Sigma=\begin{bmatrix}1.1188&-0.059\\-0.059&1.023\end{bmatrix},\quad x=\begin{bmatrix}0.88\\3.95\end{bmatrix}
\begin{aligned} &\frac{1}{2\pi|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}{\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}^T\Sigma^{-1}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}}}\\=&\frac{1}{{2\pi}\left|\begin{matrix}1.1188&-0.059\\-0.059&1.023\end{matrix}\right|^{\frac{1}{2}}}e^{-\frac{1}{2}{\begin{bmatrix}0.03\\2.91\end{bmatrix}^T\begin{bmatrix}0.896&0.052\\0.052&0.98\end{bmatrix}\begin{bmatrix}0.03\\2.91\end{bmatrix}}}\\ =&\frac{1}{2\pi\sqrt{1.141}}e^{-\frac{1}{2}\times 8.336} \approx 0.149\times 0.015\approx 0.0022 \end{aligned}
\quad p\left(x=\begin{bmatrix}0.88\\3.95\end{bmatrix}\Bigg| y=0\right)=0.0015
\quad p\left(x=\begin{bmatrix}0.88\\3.95\end{bmatrix}\Bigg| y=1\right)=0.0022
\frac{p\left(x=\begin{bmatrix}0.88\\3.95\end{bmatrix} \Bigg| y=0\right)p(y=0)}{p\left(x=\begin{bmatrix}0.88\\3.95\end{bmatrix}\Bigg| y=1\right)p(y=1)}=\frac{0.0015}{0.0022}=0.68182<1
Since the ratio is below 1, this point is classified as y=1.
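The same comparison can be reproduced numerically from the fitted parameters. A small sketch using `scipy.stats.multivariate_normal` for the densities; up to rounding it matches the figures above:

```python
import numpy as np
from scipy.stats import multivariate_normal

phi = 0.5
mu0 = np.array([4.0551, 4.1008])
mu1 = np.array([0.85439, 1.03622])
Sigma = np.array([[1.118822, -0.058976],
                  [-0.058976, 1.023049]])
x = np.array([0.88, 3.95])

p_x_given_0 = multivariate_normal.pdf(x, mean=mu0, cov=Sigma)
p_x_given_1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
print(p_x_given_0, p_x_given_1)  # ~0.0016 and ~0.0023 (the 0.0015 / 0.0022 above use coarser rounding)
print((p_x_given_0 * (1 - phi)) / (p_x_given_1 * phi))  # ~0.68 < 1  ->  predict y = 1
```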
If x|y follows a multivariate Gaussian distribution (with a shared \Sigma, as above), then the posterior p(y|x) follows a logistic function; the converse is not true. (This is illustrated by the sketch after these notes.)
If x|y really is Gaussian, GDA is a good choice; otherwise logistic regression is usually the better one.
If x|y=0 and x|y=1 both follow Poisson distributions, then p(y|x) is also logistic.
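The first point can be made concrete: with a shared \Sigma, the GDA posterior p(y=1|x) is exactly a sigmoid of a linear function of x, with weights \theta=\Sigma^{-1}(\mu_1-\mu_0). A hedged sketch (scipy is used only for the density; all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gda_posterior(x, phi, mu0, mu1, Sigma):
    """p(y=1|x) computed directly from Bayes' rule."""
    p1 = multivariate_normal.pdf(x, mean=mu1, cov=Sigma) * phi
    p0 = multivariate_normal.pdf(x, mean=mu0, cov=Sigma) * (1 - phi)
    return p1 / (p0 + p1)

def sigmoid_posterior(x, phi, mu0, mu1, Sigma):
    """The same posterior rewritten as a sigmoid of a linear function of x."""
    Sinv = np.linalg.inv(Sigma)
    theta = Sinv @ (mu1 - mu0)
    theta0 = 0.5 * (mu0 @ Sinv @ mu0 - mu1 @ Sinv @ mu1) + np.log(phi / (1 - phi))
    return 1.0 / (1.0 + np.exp(-(theta @ x + theta0)))

phi = 0.5
mu0, mu1 = np.array([4.055, 4.101]), np.array([0.854, 1.036])
Sigma = np.array([[1.1188, -0.059], [-0.059, 1.023]])
x = np.array([0.88, 3.95])
print(gda_posterior(x, phi, mu0, mu1, Sigma))      # ~0.60
print(sigmoid_posterior(x, phi, mu0, mu1, Sigma))  # same value, up to floating point
```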