tag, which leads to a MathJax parsing failure.
To solve the problem, escape every underscore in this math block with a backslash.
---
## Optimization Algorithms
* With momentum
.footnote[From chapter 2. week 2.6]
--
### compute dW, db on current mini-batch
$$\begin{aligned}
&v\_{\rm{d}W} = \beta v\_{\rm{d}W} + (1-\beta) \rm{d}W\\\\
&v\_{\rm{d}b} = \beta v\_{\rm{d}b} + (1-\beta) \rm{d}b\\\\
&W := W - \alpha v\_{\rm{d}W}\\\\
&b := b - \alpha v\_{\rm{d}b}
\end{aligned}$$
.footnote[From chapter 2. week 2.6]
---
## Optimization Algorithms
* With momentum
### compute dW, db on current mini-batch
$$\begin{aligned}
&v\_{\rm{d}W} = \beta v\_{\rm{d}W} + (1-\beta) \rm{d}W\\\\
&v\_{\rm{d}b} = \beta v\_{\rm{d}b} + (1-\beta) \rm{d}b\\\\
&W := W - \alpha v\_{\rm{d}W}\\\\
&b := b - \alpha v\_{\rm{d}b}
\end{aligned}$$
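A minimal NumPy sketch of the momentum update above; the shapes and the stand-in gradients are illustrative assumptions, not values from the course:
```python
import numpy as np

def momentum_step(W, b, dW, db, vdW, vdb, alpha=0.01, beta=0.9):
    # exponentially weighted average of the gradients (the "velocity")
    vdW = beta * vdW + (1 - beta) * dW
    vdb = beta * vdb + (1 - beta) * db
    # step along the velocity rather than the raw gradient
    W = W - alpha * vdW
    b = b - alpha * vdb
    return W, b, vdW, vdb

# toy usage with made-up shapes; velocities start at zero
W, b = np.random.randn(4, 3), np.zeros((4, 1))
vdW, vdb = np.zeros_like(W), np.zeros_like(b)
W, b, vdW, vdb = momentum_step(W, b, dW=0.1 * W, db=0.1 * b, vdW=vdW, vdb=vdb)
```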
.footnote[From chapter 2. week 2.6]
---
## Optimization Algorithms
* RMSProp
--
### compute dW, db on current mini-batch
$$\begin{aligned}
&s\_{\rm{d}W} = \beta s\_{\rm{d}W} + (1-\beta) \rm{d}W^2\\\\
&s\_{\rm{d}b} = \beta s\_{\rm{d}b} + (1-\beta) \rm{d}b^2\\\\
&W := W - \alpha \frac{\rm{d}W}{\sqrt{s\_{\rm{d}W}} + \epsilon}\\\\
&b := b - \alpha \frac{\rm{d}b}{\sqrt{s\_{\rm{d}b}} + \epsilon}
\end{aligned}$$
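A minimal NumPy sketch of the RMSProp update above; the default hyperparameter values are illustrative assumptions:
```python
import numpy as np

def rmsprop_step(W, b, dW, db, sdW, sdb, alpha=0.001, beta=0.9, eps=1e-8):
    # exponentially weighted average of the squared gradients
    sdW = beta * sdW + (1 - beta) * dW ** 2
    sdb = beta * sdb + (1 - beta) * db ** 2
    # damp the update in directions with large, oscillating gradients
    W = W - alpha * dW / (np.sqrt(sdW) + eps)
    b = b - alpha * db / (np.sqrt(sdb) + eps)
    return W, b, sdW, sdb
```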
.footnote[From chapter 2. week 2.7]
---
## Optimization Algorithms
* Adam optimization
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
$$\begin{aligned}
&v\_{\rm{d}W} = \beta\_1 v\_{\rm{d}W} + (1-\beta\_1) \rm{d}W&v\_{\rm{d}b} = \beta\_1 v\_{\rm{d}b} + (1-\beta\_1) \rm{d}b\\\\
\end{aligned}$$
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
$$\begin{aligned}
&v\_{\rm{d}W} = \beta\_1 v\_{\rm{d}W} + (1-\beta\_1) \rm{d}W&v\_{\rm{d}b} = \beta\_1 v\_{\rm{d}b} + (1-\beta\_1) \rm{d}b\\\\
&s\_{\rm{d}W} = \beta\_2 s\_{\rm{d}W} + (1-\beta\_2) \rm{d}W^2 & s\_{\rm{d}b} = \beta\_2 s\_{\rm{d}b} + (1-\beta\_2) \rm{d}b^2\\\\
\end{aligned}$$
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
$$\begin{aligned}
&v\_{\rm{d}W} = \beta\_1 v\_{\rm{d}W} + (1-\beta\_1) \rm{d}W
&v\_{\rm{d}b} = \beta\_1 v\_{\rm{d}b} + (1-\beta\_1) \rm{d}b\\\\
&s\_{\rm{d}W} = \beta\_2 s\_{\rm{d}W} + (1-\beta\_2) \rm{d}W^2
&s\_{\rm{d}b} = \beta\_2 s\_{\rm{d}b} + (1-\beta\_2) \rm{d}b^2\\\\
&v\_{\rm{d}W}^{\rm corrected} = \frac{v\_{\rm{d}W} }{1-\beta\_1^t}
&v\_{\rm{d}b}^{\rm corrected} = \frac{v\_{\rm{d}b} }{1-\beta\_1^t}
\end{aligned}$$
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
$$\begin{aligned}
&v\_{\rm{d}W} = \beta\_1 v\_{\rm{d}W} + (1-\beta\_1) \rm{d}W
&v\_{\rm{d}b} = \beta\_1 v\_{\rm{d}b} + (1-\beta\_1) \rm{d}b\\\\
&s\_{\rm{d}W} = \beta\_2 s\_{\rm{d}W} + (1-\beta\_2) \rm{d}W^2
&s\_{\rm{d}b} = \beta\_2 s\_{\rm{d}b} + (1-\beta\_2) \rm{d}b^2\\\\
&v\_{\rm{d}W}^{\rm corrected} = \frac{v\_{\rm{d}W} }{1-\beta\_1^t}
&v\_{\rm{d}b}^{\rm corrected} = \frac{v\_{\rm{d}b} }{1-\beta\_1^t}\\\\
&s\_{\rm{d}W}^{\rm corrected} = \frac{s\_{\rm{d}W} }{1-\beta\_2^t}
&s\_{\rm{d}b}^{\rm corrected} = \frac{s\_{\rm{d}b} }{1-\beta\_2^t}
\end{aligned}$$
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Adam optimization (Momentum + RMSProp)
$$\begin{aligned}
&v\_{\rm{d}W} = \beta\_1 v\_{\rm{d}W} + (1-\beta\_1) \rm{d}W
&v\_{\rm{d}b} = \beta\_1 v\_{\rm{d}b} + (1-\beta\_1) \rm{d}b\\\\
&s\_{\rm{d}W} = \beta\_2 s\_{\rm{d}W} + (1-\beta\_2) \rm{d}W^2
&s\_{\rm{d}b} = \beta\_2 s\_{\rm{d}b} + (1-\beta\_2) \rm{d}b^2\\\\
&v\_{\rm{d}W}^{\rm corrected} = \frac{v\_{\rm{d}W} }{1-\beta\_1^t}
&v\_{\rm{d}b}^{\rm corrected} = \frac{v\_{\rm{d}b} }{1-\beta\_1^t}\\\\
&s\_{\rm{d}W}^{\rm corrected} = \frac{s\_{\rm{d}W} }{1-\beta\_2^t}
&s\_{\rm{d}b}^{\rm corrected} = \frac{s\_{\rm{d}b} }{1-\beta\_2^t}\\\\
&W := W - \alpha\frac{v\_{\rm{d}W}^{\rm corrected}}{\sqrt{s\_{\rm{d}W}^{\rm corrected}} + \epsilon}
&b := b - \alpha\frac{v\_{\rm{d}b}^{\rm corrected}}{\sqrt{s\_{\rm{d}b}^{\rm corrected}} + \epsilon}
\end{aligned}$$
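A minimal NumPy sketch of one Adam step for W (b is updated identically); the default hyperparameter values are illustrative assumptions:
```python
import numpy as np

def adam_step(W, dW, vdW, sdW, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # momentum-style and RMSProp-style moving averages
    vdW = beta1 * vdW + (1 - beta1) * dW
    sdW = beta2 * sdW + (1 - beta2) * dW ** 2
    # bias correction compensates for the zero initialization (t counts updates from 1)
    vdW_corr = vdW / (1 - beta1 ** t)
    sdW_corr = sdW / (1 - beta2 ** t)
    # combined update; b is updated with exactly the same formulas
    W = W - alpha * vdW_corr / (np.sqrt(sdW_corr) + eps)
    return W, vdW, sdW
```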
.footnote[From chapter 2. week 2.8]
---
## Optimization Algorithms
* Learning rate decay
$$\begin{aligned}
\alpha = \frac{1}{1+\rm{decayRate}\times\rm{epochNum}} \alpha\_0
\end{aligned}$$
.footnote[From chapter 2. week 2.9]
---
## Optimization Algorithms
* Learning rate decay
$$\begin{aligned}
&\alpha = \frac{1}{1+\rm{decayRate}\times\rm{epochNum}} \alpha\_0\\\\\\\\
&\alpha\_0=0.2 \qquad \rm{decayRate}=1\\\\
&\begin{array}{c|c}
Epoch & \alpha \\\\
\\hline
1 & 0.1\\\\
2 & 0.067\\\\
3 & 0.05\\\\
4 & 0.04\\\\
\vdots & \vdots\\\\
\end{array}
\end{aligned}$$
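A short sketch that reproduces the table above:
```python
# learning rate decay with alpha0 = 0.2 and decayRate = 1, as in the table
alpha0, decay_rate = 0.2, 1
for epoch_num in range(1, 5):
    alpha = alpha0 / (1 + decay_rate * epoch_num)
    print(epoch_num, round(alpha, 3))
# 1 0.1
# 2 0.067
# 3 0.05
# 4 0.04
```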
.footnote[From chapter 2. week 2.9]
---
## Hyperparameter Tuning
.footnote[From chapter 2. week 3]
--
* $\alpha$, $\beta$
* $\beta_1, \beta_2, \epsilon$
* \#layers
* \#hidden units
* Learning rate decay
* mini-batch size
.footnote[From chapter 2. week 3]
???
the meanings of these hyperparameters
---
## Hyperparameter Tuning
* Coarse to fine
* Panda vs. Caviar
.footnote[From chapter 2. week 3]
---
## Hyperparameter Tuning
* Panda vs. Caviar
.left[
Panda
]
.footnote[From chapter 2. week 3.3]
--
.right[
Caviar
]
.footnote[From chapter 2. week 3.3]
---
## Hyperparameter Tuning
* Panda vs. Caviar
.left[
Panda
]
.right[
Caviar
]
.footnote[From chapter 2. week 3.3]
---
## Batch Normalization
$$\begin{aligned}
&\mu = \frac{1}{m} \sum\_i z^{(i)} \\\\
&\sigma^2 = \frac{1}{m} \sum\_i (z^{(i)} -\mu)^2\\\\
&z\_{norm}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}\\\\
&\tilde{z}^{(i)} = \gamma z\_{norm}^{(i)} + \beta \qquad\color{blue}{\gamma \text{ and } \beta \text{ are learnable parameters}}
\end{aligned}$$
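A minimal NumPy sketch of the batch-norm forward pass above; the `(n_units, m)` layout is an assumption for illustration:
```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-8):
    # z: shape (n_units, m), one column per example in the mini-batch
    mu = np.mean(z, axis=1, keepdims=True)
    sigma2 = np.var(z, axis=1, keepdims=True)
    z_norm = (z - mu) / np.sqrt(sigma2 + eps)
    # gamma and beta are learned, so the network can choose the mean and
    # variance of the output instead of being forced to mean 0, variance 1
    return gamma * z_norm + beta
```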
.footnote[From chapter 2. week 3.4 3.5 3.6]
???
You should watch the video to learn how to implement Batch Normalization.
---
template: inverse
## Structuring Machine Learning Projects
---
## ML Strategy
### 1. Collect more data
--
### 2. Collect more diverse training set
--
### 3. Train algorithm longer with gradient descent
--
### 4. Try bigger network
--
### 5. Try smaller network
--
### 6. Try dropout
--
### 7. Add L2 regularization
--
### 8. Network architecture
.footnote[From chapter 3. week 1.1]
???
activation functions, hidden units
---
## ML Strategy More...
.footnote[From chapter 3. week 1.2]
--
### 1. Orthogonalization
.footnote[From chapter 3. week 1.2]
---
## Setting up Your Goal
* Single number evaluation metric
* Satisficing and optimizing metrics
* Train/dev/test distribution
* Size of dev and test sets
* When to change dev/test sets and metrics
* Why human-level performance?
* Avoidable bias
.footnote[From chapter 4. week 1]
---
## Error Analysis
* Carrying out error analysis
* Cleaning up incorrectly labeled data
* Build your first system quickly, then iterate
.footnote[From chapter 4. week 2.1 2.2 2.3]
---
## Mismatched training and dev/test data
* Training and testing on different distributions
* Bias and Variance with mismatched data distributions
* Addressing data mismatch
.footnote[From chapter 4. week 2.4 2.5 2.6]
---
## Learning from multiple tasks
* Transfer learning
* Multi-task learning
.footnote[From chapter 4. week 2.7 2.8]
---
## End-to-end deep learning
* What is end-to-end deep learning
* Whether to use end-to-end learning
.footnote[From chapter 4. week 2.9 2.10]
---
template: inverse
## Convolutional Neural Network
---
## Computer Vision Problems
* Image Classification
* Neural Style Transfer
* Object Detection
* ... ...
.footnote[From chapter 5. week 1.1]
---
## Convolutional Neural Network
* Edge Detection
* Padding
* Strides
* Pooling
.footnote[From chapter 5. week 1]
---
## Convolutional Neural Network
* Edge Detection
.footnote[From chapter 5. week 1]
---
## Convolutional Neural Network
* Multiple filters
.footnote[From chapter 5. week 1.6]
---
## Convolutional Neural Network
* A typical convolutional neural network
.footnote[from https://sefiks.com/2017/11/03/a-gentle-introduction-to-convolutional-neural-networks/]
---
## Why Convolution?
.footnote[From chapter 5. week 1.11]
--
* Parameter sharing
* Sparsity of connections
.footnote[From chapter 5. week 1.11]
---
## Why Convolution?
* Parameter sharing
.subblock[
A feature detector (such as a vertical edge detector) that's useful in one part of the image is probably useful in another part of the image.
]
* Sparsity of connections
.subblock[
In each layer, each output value depends only on a small number of inputs.
]
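As a rough illustration of how much parameter sharing and sparse connections save, here is a back-of-the-envelope comparison; the layer sizes are assumptions for illustration, not taken from the slide:
```python
# Hypothetical sizes: a 32x32x3 input, six 5x5 filters, versus a dense
# layer that produces the same 28x28x6 output volume.
conv_params = (5 * 5 * 3 + 1) * 6             # shared filter weights + biases
dense_params = (32 * 32 * 3) * (28 * 28 * 6)  # every input connected to every output
print(conv_params)   # 456
print(dense_params)  # 14450688
```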
.footnote[From chapter 5. week 1.11]
---
## Case Studies
* LeNet-5
* AlexNet
* VGG
* ResNet
* Inception
.footnote[From chapter 5. week 2.1]
---
## Case Studies
* LeNet-5
.footnote[From chapter 5. week 2.2]
---
## Case Studies
* AlexNet
.footnote[From chapter 5. week 2.2]
---
## Case Studies
* VGG-16
.footnote[From chapter 5. week 2.2]
---
## Case Studies
* ResNet
.footnote[From chapter 5. week 2.3 2.4]
---
## Case Studies
* Inception network
.footnote[From chapter 5. week 2.5]
---
## Transfer Learning
.footnote[From chapter 5. week 2.9]
---
## Data augmentation
* Mirroring
--
* Random cropping
--
* Rotation
--
* Shearing
--
* Local warping
--
* Color shifting
--
* PCA color augmentation
.footnote[From chapter 5. week 2.10]
---
## The State of Computer Vision
---
## Object Detection
* Object localization
.footnote[From chapter 5. week 3.1]
---
## Object Detection
* Object localization
.footnote[From chapter 5. week 3.1]
--
.left[
1 - pedestrian
2 - car
3 - motorcycle
4 - background
]
.footnote[From chapter 5. week 3.1]
---
## Object Detection
* Classification with Object localization
.left[
1 - pedestrian
2 - car
3 - motorcycle
4 - background
]
.footnote[From chapter 5. week 3.1]
---
## Object Detection
* Classification with Object localization
$$\begin{aligned}y = \begin{bmatrix}P\_c \\\\ b\_x \\\\ b\_y \\\\ b\_h \\\\ b\_w \\\\ C\_1 \\\\ C\_2 \\\\ C\_3 \end{bmatrix}
\quad y = \begin{bmatrix}1 \\\\ b\_x \\\\ b\_y \\\\ b\_h \\\\ b\_w \\\\ 0 \\\\ 1 \\\\ 0 \end{bmatrix}
\qquad y = \begin{bmatrix}0 \\\\ ? \\\\ ? \\\\ ? \\\\ ? \\\\ ? \\\\ ? \\\\ ? \end{bmatrix}
\end{aligned}$$
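A minimal sketch of building this label vector in NumPy; the helper name, class ordering, and box numbers are illustrative assumptions:
```python
import numpy as np

# Builds the 8-dimensional label above: [Pc, bx, by, bh, bw, c1, c2, c3].
def make_label(object_present, bx=0.0, by=0.0, bh=0.0, bw=0.0, class_id=0):
    y = np.zeros(8)
    if object_present:
        y[0] = 1.0
        y[1:5] = [bx, by, bh, bw]
        y[5 + class_id] = 1.0  # 0 = pedestrian, 1 = car, 2 = motorcycle
    # when Pc = 0, the remaining entries are "don't care" (the "?" in the slide)
    return y

print(make_label(True, 0.5, 0.7, 0.3, 0.4, class_id=1))  # a car
print(make_label(False))                                  # background
```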
.footnote[From chapter 5. week 3.1]
---
## Landmark detection
.footnote[From chapter 5. week 3.2]
---
## Sliding windows object Detection
### Different size windows
.left[
]
.right[
]
.footnote[From chapter 5. week 3.3]
---
## Turning FC layers into convolutional layers
.footnote[From chapter 5. week 3.4]
---
## Convolution implementation of sliding windows
.footnote[From chapter 5. week 3.4]
---
## Convolution implementation of sliding windows
.footnote[From chapter 5. week 3.4]
---
## YOLO algorithm
.left[
]
.right[
$3\times 3 \times 8\qquad\qquad$
]
.footnote[From chapter 5. week 3.5]
--
$\quad 10\times 10 \times 3 \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \quad 3 \times 3 \times 8$
.footnote[From chapter 5. week 3.5]
---
## Specify the bounding boxes
.left[
]
.right[
$$y=\begin{bmatrix}1 \\\\ b\_x \\\\ b\_y \\\\ b\_h \\\\ b\_w \\\\ 0 \\\\ 1 \\\\ 0 \end{bmatrix} \begin{aligned} b\_x, b\_y = \begin{cases} 0.4 \\\\ 0.3 \end{cases}\\\\\\\\ b\_h, b\_w=\begin{cases}0.9 \\\\ 0.5\end{cases} \end{aligned}$$
]
.footnote[From chapter 5. week 3.5]
---
## Evaluating object localization
$$\text{intersection over union} = \frac{\text{size of intersection}}{\text{size of union}}$$
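A minimal sketch of computing IoU for two boxes; the corner-coordinate `(x1, y1, x2, y2)` box format is an assumption of this sketch:
```python
def iou(box_a, box_b):
    # intersection rectangle (zero area if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143
```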
.footnote[From chapter 5. week 3.6]
---
template: inverse
## Sequence Models
---
## Why sequence models?
* Speech recognition
* Music generation
* Sentiment classification
* DNA sequence analysis
* Machine Translation
* Video activity recognition
* Named entity recognition
---
## Notation
### Motivating example
x: Harry Potter and Hermione Granger invented a new spell
$\quad x^{<1>}\enspace x^{<2>}\enspace x^{<3>}\quad\qquad\qquad\cdots\qquad\qquad\qquad x^{<9>}$
one-hot indices in the vocabulary (a minimal sketch follows):
* Harry = 4075
* Potter = 6830
* And = 367
* Hermione = 4200
* Gran... = 4000
* Invented = 4700
* A = 1
* New = 5976
* Spell = 8376
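A minimal sketch of turning one of these vocabulary indices into a one-hot column vector; the 10,000-word vocabulary size is an assumption for illustration:
```python
import numpy as np

def one_hot(index, vocab_size=10000):
    # indices above are 1-based, so shift by one
    x = np.zeros((vocab_size, 1))
    x[index - 1] = 1.0
    return x

x1 = one_hot(4075)                 # "Harry"
print(x1.shape, int(x1[4074, 0]))  # (10000, 1) 1
```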
---
## Recurrent Neural Networks
---
# Summary
* Neural Networks and Deep Learning
* Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
* Structuring Machine Learning Projects
* Convolutional Neural Networks
* Sequence Models
---
# Reference
- http://mooc.study.163.com/smartSpec/detail/1001319001.htm
- https://www.coursera.org/specializations/deep-learning
- https://liam0205.me/2017/03/25/bias-variance-tradeoff/
---
template: inverse
# Thanks!
## Q&A