Neural Network ABC
Background
Deep learning has become very popular recently. It is based on the Neural Network, an old algorithm that had faded for years but is now resurging. Today we will cover some basic concepts of Neural Networks, hoping to give you an intuitive perspective on them.
Before we begin, I'd like to introduce an exciting product that helps people who are blind see the world. BrainPort, invented by Wicab, lets you see the world with your tongue. A tongue array containing 400 electrodes is connected to a pair of glasses, and the device converts light into electrical signals. In experiments, more than 80% of blind participants were able to get past obstacles.
In fact, Wicab takes advantage of the neural mechanism of our brain. There are about 86 billion neurons in the brain; we can smell, see, and hear the world only because of these neurons. They are connected to each other to help us sense the world, and the Neural Network algorithm is a way of mimicking this mechanism.
Intuition
Let's start from the simplest model. We get \(a_1\) in two steps: step 1: \(z_1=w_1x_1+w_2x_2+w_3x_3\); step 2: \(a_1=\frac{1}{1+e^{-z_1}}\). In addition, we add a bias \(w_0\) to the calculation.
After letting \(x_0=1\), we have: \[z_1=w_0x_0+w_1x_1+w_2x_2+w_3x_3\] In a Neural Network, we always add a bias unit to every layer except the last. If we contrast this model with the logistic regression model, we find that the two models are exactly the same: each input \(x\) represents a feature, and in logistic regression we want to train a model \(h_w(x)=\frac{1}{1+e^{-W^Tx}}\). In the simplest Neural Network the model looks a little more complex, but if we do not take the hidden layer into account, the model is just logistic regression.
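To make this concrete, here is a minimal sketch of the two-step computation for a single neuron (the weight and input values are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    # activation function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical weights and features, chosen only for illustration
w = np.array([-1.0, 0.5, 2.0, -0.3])   # w_0 (bias), w_1, w_2, w_3
x = np.array([1.0, 0.2, 0.4, 0.6])     # x_0 is fixed to 1

z = w @ x          # step 1: weighted sum of the inputs
a = sigmoid(z)     # step 2: squash z through the sigmoid
print(a)           # exactly the logistic regression hypothesis h_w(x)
```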
Neural Network
To approach a real Neural Network, we add two more neurons (\(a_2^{(2)}\) and \(a_1^{(3)}\)) to the logistic regression model. Notice that the part of the model inside the green triangle box is just the logistic regression demonstrated above. Logistic Regression has only two layers; in contrast, we can add more layers such as the L2 layer. In a Neural Network we call these layers hidden layers: they are neither the input layer (the one holding \(x_1, x_2, x_3\)) nor the output \(h(x)\).
The figure below has only one hidden layer, though we can add many
hidden layers to the model.
Looking at the figure above, let's go through the notation of a Neural Network. Take \(w_{12}^{(1)}\) for example: the subscript \(_{12}\) means the weight connects the \(2nd\) unit of the previous layer to the \(1st\) unit of the current layer, and the superscript \(^{(1)}\) means the previous layer is layer L1. These \(w\) are called the weights of the Neural Network. The sigmoid function \(f(x)=\frac{1}{1+e^{-x}}\) is the activation function; we can also choose other activation functions, such as the symmetric sigmoid \(S(x)=\frac{1-e^{-x}}{1+e^{-x}}\). Now let's think about how to calculate \(h(x)\). For the L2 layer, we have: \[\begin{align}
& z_1^{(2)}=w_{10}^{(1)}x_0 + w_{11}^{(1)}x_1 + w_{12}^{(1)}x_2
+ w_{13}^{(1)}x_3\\
& z_2^{(2)}=w_{20}^{(1)}x_0 + w_{21}^{(1)}x_1 + w_{22}^{(1)}x_2
+ w_{23}^{(1)}x_3\\
& a_1^{(2)} = g(w_{10}^{(1)}x_0 + w_{11}^{(1)}x_1 + w_{12}^{(1)}x_2
+ w_{13}^{(1)}x_3)=g(z_1^{(2)})\\
& a_2^{(2)} = g(w_{20}^{(1)}x_0 + w_{21}^{(1)}x_1 + w_{22}^{(1)}x_2
+ w_{23}^{(1)}x_3)=g(z_2^{(2)})
\end{align}\] Here, \(g()\) is the activation function. Notice that if we use matrices to represent the equations, the result becomes simpler: \[a^{(2)} = g(z^{(2)}) = g(W^{(1)} a^{(1)})\] Here we let \(a_i^{(1)}=x_i\). We can go one step further: for layer \(k\), we have \[a^{(k)} = g(z^{(k)}) = g(W^{(k-1)} a^{(k-1)})\] Then for the L3 layer, which has only one neuron, we have: \[\begin{align}
h(x) = a_1^{(3)}=g(w_{10}^{(2)}a_0^{(2)} + w_{11}^{(2)}a_1^{(2)} +
w_{12}^{(2)}a_2^{(2)})=g(z_1^{(3)})
\end{align}\] If we substitute the expressions for \(a_1^{(2)}\) and \(a_2^{(2)}\) into \(h(x)\), we have: \[\begin{align}
h(x)=a_1^{(3)}=g(w_{10}^{(2)}\cdot 1 + w_{11}^{(2)}
\cdot g(z_1^{(2)})+ w_{12}^{(2)}\cdot g(z_2^{(2)}))
\end{align}\] The formula shows that we apply the \(g()\) function again and again to nest the inputs and eventually compute the output. This makes it a non-linear classifier, rather than a linear classifier such as Linear Regression or Logistic Regression.
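As a minimal sketch of this forward computation (the layer sizes and random weight values below are assumptions for illustration, not values from the figure):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # repeatedly apply a^(k) = g(W^(k-1) a^(k-1)),
    # adding the bias unit a_0 = 1 to every layer except the output
    a = np.asarray(x, dtype=float)
    for W in weights:
        a = np.concatenate(([1.0], a))
        a = sigmoid(W @ a)
    return a

# hypothetical 3-2-1 network: 3 inputs, 2 hidden units, 1 output
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 4))   # layer L1 -> L2 (3 inputs + bias)
W2 = rng.standard_normal((1, 3))   # layer L2 -> L3 (2 hidden units + bias)

h = forward([0.5, -0.2, 0.1], [W1, W2])
print(h)   # h(x) = g(W^(2) g(W^(1) x)), the nested composition above
```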
More Complicated Network
A Neural Network can be made arbitrarily complex simply by adding more hidden layers. The figure below shows a neural network with 20 layers, meaning it has 1 input layer, 1 output layer and 18 hidden layers. From the connecting weights we can imagine how many weights we would have to compute to train such a big Neural Network. Notice that we add a bias unit (with subscript zero) to every layer except the output layer, and each layer can contain a different number of neurons. If we want to recognize digit images in zip codes from 0 to 9, we can design the Neural Network with 10 units in the output layer.
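To get a feel for how fast the number of weights grows, here is a small sketch that counts them for a fully connected network (the 400-25-25-10 architecture is only an assumed example of a 10-output digit recognizer, not taken from the figure):

```python
def count_weights(layer_sizes):
    # each layer except the output gets one extra bias unit,
    # and every unit connects to every unit of the next layer
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# hypothetical digit recognizer: 400 input pixels, two hidden layers, 10 outputs
print(count_weights([400, 25, 25, 10]))   # 10935 weights to train
```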
Simple Applications
In this section, I'd like to construct a Neural Network to simulate a logic gate. Remember that the bias \(x_0\) is always \(1\). Now let's set \(w_{10}\), \(w_{11}\) and \(w_{12}\) and see what \(h(x)\) becomes: \[w_{10}=-30\,,w_{11}=20\,,w_{12}=20\,\]
\(x_1\) | \(x_2\) | \(z_1\) | \(a_1\) |
---|---|---|---|
0 | 0 | -30 | 0 |
0 | 1 | -10 | 0 |
1 | 0 | -10 | 0 |
1 | 1 | 10 | 1 |
Here we take advantage of a property of the sigmoid function: \(g(-10)=4.5\times 10^{-5}\approx 0\) and \(g(10)=0.99995\approx 1\). From the table we can see that we have constructed an \(AND\) logic gate. It is just as easy to construct an \(OR\) logic gate; we simply set: \[w_{10}=-30\,,w_{11}=50\,,w_{12}=50\,\] Then we get an \(OR\) logic gate. We can construct a \(NOR\) gate as well, by setting: \[w_{10}=10\,,w_{11}=-20\,,w_{12}=-20\,\]
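A minimal sketch to verify these three gates, rounding the sigmoid output to 0 or 1:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(w, x1, x2):
    # one sigmoid neuron with bias x_0 = 1: g(w_10 + w_11*x1 + w_12*x2)
    return round(sigmoid(w[0] + w[1] * x1 + w[2] * x2))

AND = (-30, 20, 20)
OR  = (-30, 50, 50)
NOR = (10, -20, -20)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, gate(AND, x1, x2), gate(OR, x1, x2), gate(NOR, x1, x2))
# rows: (0,0) -> 0 0 1, (0,1) -> 0 1 0, (1,0) -> 0 1 0, (1,1) -> 1 1 0
```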
Question: can we construct an \(XOR\) gate? In fact, we can build more powerful logic gates by adding more hidden layers. A 2-layer Neural Network cannot implement an \(XOR\) gate, but a 3-layer one can. The Neural Network shown below implements an \(XOR\) logic gate.
The weight matrices are as follows; we can verify them against the table below. \[\begin{align}
&W^{(1)}=\begin{bmatrix}-30&20&20\\
10&-20&-20
\end{bmatrix}\\
&W^{(2)}=\begin{bmatrix}10&-20&-20
\end{bmatrix}
\end{align}\]
\(x_1\) | \(x_2\) | \(a_1^{(2)}\) | \(a_2^{(2)}\) | \(a_1^{(3)}\) |
---|---|---|---|---|
0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 |
1 | 1 | 1 | 0 | 0 |
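The same kind of check can be run for the XOR network; a minimal sketch using the \(W^{(1)}\) and \(W^{(2)}\) given above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the weight matrices W^(1) and W^(2) from above
W1 = np.array([[-30.0,  20.0,  20.0],
               [ 10.0, -20.0, -20.0]])
W2 = np.array([[10.0, -20.0, -20.0]])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a2 = sigmoid(W1 @ np.array([1.0, x1, x2]))       # hidden layer (AND and NOR units)
    a3 = sigmoid(W2 @ np.concatenate(([1.0], a2)))   # output layer
    print(x1, x2, np.round(a2).astype(int), int(np.round(a3[0])))
# outputs 0, 1, 1, 0 for the four rows, matching the XOR table above
```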
I hope the examples we have seen give you some intuition about Neural Networks. By adding hidden layers we can generate more and more abstract features.
Summary
Today we turned Logistic Regression into a Neural Network by adding hidden layers. Then we talked about how to represent a Neural Network, and in the end we found that a Neural Network can simulate logic gates. We did not talk about how to train a Neural Network here; usually the Backpropagation Algorithm is used for that.