
MCMC

By ChengMingbo

2017-11-30

powered by remark.js

1 / 42

I am a note, and will only be shown in presentation mode

Criminal ciphertext


2 / 42

Stanford statistics professor Persi Diaconis is a legendary figure. Diaconis became a magician at 14, and at 24 he entered college in order to understand the probability text of the mathematician Feller. He submitted an article on his card-shuffling methods to Scientific American, and Martin Gardner, the famous popular-mathematics writer who ran the magazine's mathematical games column for many years, wrote him a letter of recommendation to Harvard. The Harvard statistician Mosteller was studying magic at the time, and so Diaconis became Mosteller's student. (Readers interested in this legendary story can look at the history of statistics told in The Lady Tasting Tea.) The story below is the prisoner-cipher example that Diaconis gives in his article The Markov Chain Monte Carlo Revolution. One day, a psychologist who studies criminal psychology visited Diaconis at Stanford. He brought a coded message written by a prisoner and hoped Diaconis could recover the information hidden in it. Each symbol in the cipher should correspond to some letter, but how can those letters be found accurately? Diaconis and his student Marc solved the problem with a method called MCMC (Markov chain Monte Carlo), which is the topic of this section.

Outline

  • Monte Carlo Approximation
  • Markov Chains
  • MCMC
  • Applications
3 / 42

How to solve π

4 / 42

Infinite series

How to solve π


\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \dfrac{1}{9} - \cdots

5 / 42

How to solve π


\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \dfrac{1}{9} - \cdots

% Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
N = 100000;
sig = 1;                          % alternating sign
total = 0;
for i = 1:N
    total += sig/(2*i-1);         % add the i-th term +-1/(2i-1)
    sig = -sig;
    if i==100 || i==1000 || i==10000 || i==100000
        fprintf('N=%d, pi=%.4f\n', i, total*4)
    end
end
6 / 42

How to solve π


\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \dfrac{1}{9} - \cdots

% Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
N = 100000;
sig = 1;                          % alternating sign
total = 0;
for i = 1:N
    total += sig/(2*i-1);         % add the i-th term +-1/(2i-1)
    sig = -sig;
    if i==100 || i==1000 || i==10000 || i==100000
        fprintf('N=%d, pi=%.4f\n', i, total*4)
    end
end
N=100, pi=3.1316
N=1000, pi=3.1406
N=10000, pi=3.1415
N=100000, pi=3.1416
7 / 42

How to solve π

8 / 42

How to solve π

rejection method
9 / 42

How to solve π

% Estimate pi by sampling points uniformly in the square [-1,1] x [-1,1]
% and counting how many fall inside the unit circle
N = 100000;
RN = rand(N, 2)*2 - 1;
inner = 0;
outer = 0;
for i = 1:N
    if RN(i, 1)^2 + RN(i, 2)^2 < 1
        inner += 1;
    else
        outer += 1;
    end
    if i==100 || i==1000 || i==10000 || i==100000
        fprintf('N=%d, pi=%.4f\n', i, inner*4/i)
    end
end
N=100, pi=3.1200
N=1000, pi=3.0920
N=10000, pi=3.1436
N=100000, pi=3.1498
10 / 42

Monte Carlo Approximation

11 / 42

Monte Carlo Integration

12 / 42

Monte Carlo Integration

13 / 42

Monte Carlo Integration

14 / 42

Monte Carlo Integration




15 / 42

Monte Carlo Integration


\langle F^N \rangle = (b-a)\dfrac{1}{N}\sum_{i=0}^{N-1} f(X_i)

\Pr\left(\lim_{N\to\infty} \langle F^N \rangle = F\right) = 1
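
As a minimal illustration (not taken from the slides), the estimator above translates directly into Octave/MATLAB; here it is applied to f(x) = \sin(x) on [0, \pi], whose exact integral is 2:

% Basic Monte Carlo estimator: <F^N> = (b-a) * (1/N) * sum_i f(X_i)
% Illustrated on f(x) = sin(x) over [0, pi]; the exact integral is 2
N = 100000;
a = 0; b = pi;
X = a + (b - a)*rand(N, 1);      % X_i ~ Unif(a, b)
Fhat = (b - a)*mean(sin(X))      % Monte Carlo estimate of the integral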

Monte Carlo Integration

\begin{aligned} E[\langle F^N \rangle] &= E \left[ (b-a) \dfrac{1}{N} \sum_{i=0}^{N-1} f(x_i)\right]\\ &= (b-a)\dfrac{1}{N} \sum_{i=0}^{N-1} E[f(x_i)]\\ &= (b-a)\dfrac{1}{N} \sum_{i=0}^{N-1} \int_a^b f(x)\,pdf(x)\:dx\\ &= \dfrac{1}{N} \sum_{i=0}^{N-1} \int_a^b f(x)\:dx \qquad \left(pdf(x)=\tfrac{1}{b-a}\right)\\ &= \int_a^b f(x)\:dx\\ &= F \end{aligned}

Mathematical Expectation

16 / 42

Monte Carlo Integration





Evaluate I = \int_0^1 xe^x \,{\rm d}x

17 / 42

Monte Carlo Integration


\begin{aligned} I &= uv\Big|_0^1 - \int_0^1 v\, du \qquad (u = x,\; dv = e^x\,dx)\\ &= \left. xe^x \right|_0^1 - \int_0^1 e^x\, dx\\ &= \left. e^x(x-1)\right|_0^1\\ &= 0 - (-1) = 1 \end{aligned}

18 / 42

Monte Carlo Integration


% Monte Carlo estimate of I = \int_0^1 x e^x dx using Unif(0,1) samples
rand('seed',12345);
N1 = 100;
x = rand(N1,1);                % N1 samples from Unif(0,1)
Ihat1 = sum(x.*exp(x))/N1      % sample mean of f(x) = x e^x
N2 = 5000;
x = rand(N2,1);
Ihat2 = sum(x.*exp(x))/N2      % more samples give a better estimate


Ihat1 = 0.95668
Ihat2 = 1.0063
19 / 42

Monte Carlo Integration

\mathbb E_{p(x)}[f(x)] = \int f(x)\, p(x)\, dx \quad \text{where} \quad f(x)=xe^x,\quad p(x) = \mathrm{Unif}(0,1)




\begin{aligned} I &= \mathbb E_{p(x)} f(x)\\ &\approx \frac{1}{N}\sum_{i=1}^N x_i e^{x_i} \end{aligned}

20 / 42

Monte Carlo Integration

\begin{aligned} &\begin{array}{|c|}\hline g(x) \geq 0,\ x \in (a,b), \qquad \int_a^b g(x)\,dx = C < \infty\\ \hline\end{array}\\ &\text{Let}\quad p(x) = \frac{g(x)}{C}\\ &I = \int_a^b h(x)\,g(x)\,dx = C \int_a^b h(x)\, p(x)\,dx = C\, \mathbb E_{p(x)}[h(x)] \end{aligned}


21 / 42

Monte Carlo Integration

\begin{aligned} &\begin{array}{|c|}\hline g(x) \geq 0,\ x \in (a,b), \qquad \int_a^b g(x)\,dx = C < \infty\\ \hline\end{array}\\ &\text{Let}\quad p(x) = \frac{g(x)}{C}\\ &I = \int_a^b h(x)\,g(x)\,dx = C \int_a^b h(x)\, p(x)\,dx = C\, \mathbb E_{p(x)}[h(x)] \end{aligned}




\begin{array}{ll} \hline 1 & \text{Identify } h(x)\\ 2 & \text{Identify } g(x)\text{, determine } p(x) \text{ and } C \\ 3 & \text{Draw } N \text{ independent samples from } p(x)\\ 4 & \text{Estimate } I = C\,\mathbb E_{p(x)}[h(x)] \approx \frac{C}{N} \sum_{i=1}^{N} h(x_i)\\ \hline \end{array}
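
As a minimal sketch of these four steps on a made-up integral (not one from the slides), take I = \int_0^\infty x e^{-x/2}\,dx, whose exact value is 4: here h(x) = x, g(x) = e^{-x/2}, C = 2, and p(x) = \frac{1}{2}e^{-x/2} is an exponential density that can be sampled by inverse transform.

% Step 1: h(x) = x
% Step 2: g(x) = exp(-x/2) >= 0 on (0, inf), C = integral of g = 2,
%         so p(x) = g(x)/C is the exponential density with mean 2
% Step 3: draw N independent samples from p(x) by inverse transform
% Step 4: I = C*E[h(x)], estimated by (C/N)*sum(h(x_i)); exact answer is 4
N = 100000;
C = 2;
u = rand(N, 1);
x = -C*log(u);          % inverse-CDF samples from p(x)
Ihat = C*mean(x)        % Monte Carlo estimate of I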

22 / 42

Markov Chains

23 / 42

Markov Chains


x^{(0)} \rightarrow x^{(1)} \rightarrow x^{(2)} \dots \rightarrow x^{(t)} \rightarrow \dots


\begin{array}{ll} \hline 1 & state\,space:\, \color{blue}{x}\\ 2 & transition\,operator:\, \color{blue}{p(x^{(t+1)} | x^{(t)})}\\ 3 & initial\,condition\,distribution:\, \color{blue}{\pi^{(0)} }\\ \hline \end{array}

24 / 42

Markov Chains


Weather of Berkeley, CA:

\begin{array}{|c|c|c|c|} \hline \text{this week} \rightarrow \text{next week} & \text{sunny} & \text{foggy} & \text{rainy}\\ \hline \text{sunny} & 0.8 & 0.15 & 0.05\\ \hline \text{foggy} & 0.4 & 0.5 & 0.1\\ \hline \text{rainy} & 0.1 & 0.3 & 0.6\\ \hline \end{array}


What will the weather be next month? Next year?

25 / 42

Markov Chains


1. state space: {sunny, foggy, rainy}
2. transition operator: P = \begin{bmatrix} 0.8 & 0.15 & 0.05\\ 0.4 & 0.5 & 0.1\\ 0.1& 0.3 & 0.6 \end{bmatrix}
3. initial condition distribution: {sunny, foggy, rainy} = {0, 0, 1}

26 / 42

Markov Chains


P = [.8 .15 .05;   % SUNNY
     .4 .5  .1;    % FOGGY
     .1 .3  .6];   % RAINY
nWeeks = 52;
% INITIAL STATE IS RAINY
X(1,:) = [0 0 1];
% RUN MARKOV CHAIN
for iB = 2:nWeeks
    X(iB,:) = X(iB-1,:)*P; % TRANSITION
end
% PREDICTIONS
fprintf('\np(weather) in 1 week -->'), disp(X(2,:));
fprintf('\np(weather) in 2 weeks -->'), disp(X(3,:));
fprintf('\np(weather) in 6 months -->'), disp(X(25,:));
fprintf('\np(weather) in 1 year -->'), disp(X(52,:));
p(weather) in 1 week --> 0.10000 0.30000 0.60000
p(weather) in 2 weeks --> 0.26000 0.34500 0.39500
p(weather) in 6 months --> 0.59649 0.26316 0.14035
p(weather) in 1 year --> 0.59649 0.26316 0.14035
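
The 6-month and 1-year rows agree because the chain has already converged to its stationary distribution \pi, which satisfies \pi P = \pi. A small sketch (same P as above) recovers it directly as the left eigenvector of P for eigenvalue 1:

% Stationary distribution: solve pi*P = pi, i.e. P'*v = v with v = pi'
P = [.8 .15 .05; .4 .5 .1; .1 .3 .6];
[V, D] = eig(P');                  % eigen-decomposition of P transposed
[~, idx] = max(real(diag(D)));     % eigenvalue 1 is the largest for a stochastic P
piStat = real(V(:, idx))';
piStat = piStat/sum(piStat)        % approx. 0.59649  0.26316  0.14035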
27 / 42

Markov Chains


28 / 42

How about continuous state space ?

29 / 42

Markov Chain Monte Carlo(MCMC)

30 / 42

The Metropolis Sampler


Markov chain

1. state space: depends on the target distribution, e.g. (-\infty, \infty)
2. proposal distribution: q(x|x^{(t-1)})
3. initial state: x^{(0)} \sim \pi^{(0)}

Monte Carlo

1. sample from q(x|x^{(t-1)})
2. accept or reject the sample according to an acceptance rule

31 / 42

The Metropolis Sampler


Draw samples from p(x) \propto (1+x^2)^{-1}


1. \quad\pi^{(0)} \sim \mathcal{N}(0,1)

2. \quad q(x|x^{(t-1)}) \sim \mathcal{N}(x^{(t-1)},1)

32 / 42

The Metropolis Sampler


randn('seed',12345);
p = inline('(1 + x.^2).^-1','x');    % (unnormalized) target density
nSamples = 5000; sigma = 1;          % sigma: std. dev. of the Gaussian proposal
minn = -20; maxx = 20;
xx = 3*minn:.1:3*maxx;               % grid (only needed if plotting the target)
x = zeros(1, nSamples); x(1) = randn; t = 1;
while t < nSamples
    t = t + 1;
    xStar = normrnd(x(t-1), sigma);            % propose from q(x|x^(t-1))
    alpha = min([1, p(xStar)/p(x(t-1))]);      % acceptance probability
    u = rand;
    if u < alpha
        x(t) = xStar;          % accept the proposal
        str = 'Accepted';
    else
        x(t) = x(t-1);         % reject: keep the previous state
        str = 'Rejected';
    end
end
hist(x, 500);
33 / 42

The Metropolis Sampler


34 / 42

The Metropolis Sampler


\begin{array}{|l|} \hline \text{1. set } t = 0\\ \text{2. generate an initial state }x^{(0)}\text{ from a prior distribution }\pi^{(0)}\text{ over initial states}\\ \text{3. repeat until }t = M\\ \qquad\text{set }t = t+1\\ \qquad\text{generate a proposal state }x^*\text{ from }q(x | x^{(t-1)})\\ \qquad\text{calculate the acceptance probability }\alpha = \min \left(1, \frac{p(x^*)}{p(x^{(t-1)})}\right)\\ \qquad\text{draw a random number u from }\text{Unif}(0,1)\\ \qquad\qquad\text{if }u \leq \alpha\text{, accept the proposal and set }x^{(t)} = x^*\\ \qquad\qquad\text{else set }x^{(t)} = x^{(t-1)}\\ \hline \end{array}

35 / 42

MCMC family

  • Metropolis Sampler
  • Metropolis-Hastings Sampler (acceptance rule noted below)
  • Hybrid Monte Carlo
  • Gibbs Sampling
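
For reference (a standard statement, not spelled out in these slides): the Metropolis-Hastings sampler drops the requirement that the proposal q be symmetric by weighting the acceptance probability with the proposal ratio,

\alpha = \min\left(1,\ \frac{p(x^*)\, q(x^{(t-1)} \mid x^*)}{p(x^{(t-1)})\, q(x^* \mid x^{(t-1)})}\right)

With a symmetric q the proposal terms cancel and this reduces to the Metropolis acceptance rule above.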
36 / 42

Applications

37 / 42

Applications




LDA

Gibbs Sampling

38 / 42

LDA

\begin{aligned} p(z_i = k|\overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}) & \propto p(z_i = k, w_i = t |\overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) \\ &= \int p(z_i = k, w_i = t, \overrightarrow{\theta}_m,\overrightarrow{\varphi}_k | \overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) d \overrightarrow{\theta}_m d \overrightarrow{\varphi}_k \\ &= \int p(z_i = k, \overrightarrow{\theta}_m|\overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) \cdot p(w_i = t, \overrightarrow{\varphi}_k | \overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) d \overrightarrow{\theta}_m d \overrightarrow{\varphi}_k \\ &= \int p(z_i = k |\overrightarrow{\theta}_m) p(\overrightarrow{\theta}_m|\overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) \cdot p(w_i = t |\overrightarrow{\varphi}_k) p(\overrightarrow{\varphi}_k|\overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}_{\neg i}) d \overrightarrow{\theta}_m d \overrightarrow{\varphi}_k \\ &= \int p(z_i = k |\overrightarrow{\theta}_m) Dir(\overrightarrow{\theta}_m| \overrightarrow{n}_{m,\neg i} + \overrightarrow{\alpha}) d \overrightarrow{\theta}_m \\ & \hspace{0.2cm} \cdot \int p(w_i = t |\overrightarrow{\varphi}_k) Dir( \overrightarrow{\varphi}_k| \overrightarrow{n}_{k,\neg i} + \overrightarrow{\beta}) d \overrightarrow{\varphi}_k \\ &= \int \theta_{mk} Dir(\overrightarrow{\theta}_m| \overrightarrow{n}_{m,\neg i} + \overrightarrow{\alpha}) d \overrightarrow{\theta}_m \cdot \int \varphi_{kt} Dir( \overrightarrow{\varphi}_k| \overrightarrow{n}_{k,\neg i} + \overrightarrow{\beta}) d \overrightarrow{\varphi}_k \\ &= \color{red}{E(\theta_{mk}) \cdot E(\varphi_{kt})} \\ \end{aligned}
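
Taking the two expectations in red as Dirichlet means yields the usual collapsed Gibbs sampling update (a standard result, stated here for completeness; notation follows the derivation above):

p(z_i = k \mid \overrightarrow{\mathbf{z}}_{\neg i}, \overrightarrow{\mathbf{w}}) \propto \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} \left(n_{m,\neg i}^{(k')} + \alpha_{k'}\right)} \cdot \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} \left(n_{k,\neg i}^{(t')} + \beta_{t'}\right)}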

39 / 42

Summary

  • Monte Carlo Approximation
  • Markov Chains
  • MCMC
  • Applications
40 / 42

Reference

  • https://en.wikipedia.org/wiki/Monte_Carlo_integration
  • http://blog.csdn.net/baimafujinji/article/details/53869358
  • https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice
  • https://theclevermachine.wordpress.com/2012/11/19/a-gentle-introduction-to-markov-chain-monte-carlo-mcmc/
  • https://site.douban.com/182577/widget/notes/10567181/note/292072927/
  • http://www.52nlp.cn/lda-math-lda-%E6%96%87%E6%9C%AC%E5%BB%BA%E6%A8%A1
41 / 42

Thanks!

Q&A

42 / 42
