PyTorch ABC

Fundamentals

torch.tensor

t = torch.tensor([[1, 2, 4], [4, 5, 6]])
t.shape  # torch.Size([2, 3])
t.ndim   # 2
type: scalar, vector, matrix, tensor
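
A quick sketch of those four kinds and their ndim values (variable names are illustrative):

scalar = torch.tensor(7)                   # ndim = 0
vector = torch.tensor([7, 7])              # ndim = 1
matrix = torch.tensor([[7, 8], [9, 10]])   # ndim = 2
tensor = torch.tensor([[[1, 2, 3],
                        [4, 5, 6]]])       # ndim = 3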

Creating tensors

random_tensor = torch.rand(size=(3, 4, 5))
zeros = torch.zeros(size=(3, 4))
ones = torch.ones(size=(3, 4))
zero_to_ten = torch.arange(start=0, end=10, step=1)
ten_zeros = torch.zeros_like(input=zero_to_ten) # same shape but all zeros

Float

torch.float32 / torch.float    # default floating-point dtype
torch.float16 / torch.half
torch.float64 / torch.double
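
A small sketch of setting and converting dtypes (variable names are illustrative):

f32 = torch.tensor([1.0, 2.0], dtype=torch.float32)  # float32 is the default float dtype
f16 = f32.to(torch.float16)                           # same as f32.half()
f64 = f32.type(torch.float64)                         # same as f32.double()
f32.dtype, f16.dtype, f64.dtype                       # (torch.float32, torch.float16, torch.float64)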

Device-specific tensors (GPU or CPU)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
t = torch.tensor([1, 2, 3], device=device)

tensor operations

tensor_A = torch.tensor([[1, 2], [3, 4], [5, 6]],
                        dtype=torch.float32)
multiply
tensor = torch.tensor([1, 2, 3])
tensor + 10                    # tensor([11, 12, 13])
torch.multiply(tensor, 10)     # tensor([10, 20, 30])
tensor * tensor                # element-wise -> tensor([1, 4, 9])
tensor @ tensor                # matrix multiplication -> tensor(14)
torch.matmul(tensor, tensor)   # 1*1 + 2*2 + 3*3 = tensor(14); torch.mm is the 2-D matrix version
tensor_A.T                     # transpose, shape (3, 2) -> (2, 3)
layer
# torch.nn.Linear implements y = x @ A.T + b
torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2,   # in_features matches the inner dimension of the input
                         out_features=6)  # out_features sets the size of the output dimension
x = tensor_A                              # shape: (3, 2)
output = linear(x)
x.shape, output, output.shape             # shapes: (3, 2) -> (3, 6)
other operations
tensor = torch.arange(10, 100, 10)       # tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
tensor.argmax()                          # tensor(8)
tensor.argmin()                          # tensor(0)
tensor.type(torch.float16)               # tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)
tensor.reshape(new_shape)                # -1 asks PyTorch to infer that dimension automatically
tensor.view(new_shape)                   # returns a view with a new shape (shares the same data)
torch.stack(tensors, dim=0)              # concatenates a sequence of tensors along a new dimension (dim)
tensor.squeeze()                         # removes all dimensions of size 1
tensor.unsqueeze(dim)                    # adds a dimension of size 1 at position dim
torch.clamp(tensor, min=min, max=max)    # limits values to the range [min, max]
tensor.permute(2, 0, 1)                  # reorders dimensions, e.g. torch.Size([224, 224, 3]) -> torch.Size([3, 224, 224])
x.unsqueeze_(dim), x.squeeze_()          # trailing underscore -> in-place operation
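
A minimal shape-manipulation sketch combining a few of the calls above (shapes in comments):

x = torch.rand(224, 224, 3)
x = x.permute(2, 0, 1)     # torch.Size([3, 224, 224])
x = x.unsqueeze(dim=0)     # torch.Size([1, 3, 224, 224])
x = x.squeeze()            # torch.Size([3, 224, 224]) - all size-1 dims removed
flat = x.reshape(3, -1)    # torch.Size([3, 50176]) - the -1 is inferred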
slice
 
x = torch.arange(1, 10).reshape(1, 3, 3)  # tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]])
x[:, 0]       # tensor([[1, 2, 3]])
x[:, :, 1]    # tensor([[2, 5, 8]])
x[:, 1, 1]    # tensor([5])
x[0, 0, :]    # tensor([1, 2, 3]), same as x[0][0]
NumPy
tensor = torch.from_numpy(array)
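
A small round-trip sketch (torch.from_numpy shares memory with the source array; convert the dtype explicitly if you want PyTorch's default float32):

import numpy as np

array = np.arange(1.0, 8.0)              # NumPy defaults to float64
tensor = torch.from_numpy(array)         # dtype torch.float64, shares memory with array
back = tensor.numpy()                    # back to a NumPy array (CPU tensors only)
tensor_f32 = tensor.type(torch.float32)  # explicit cast to PyTorch's default float dtype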
random seed
RANDOM_SEED = 42                             # any fixed value works
torch.manual_seed(seed=RANDOM_SEED)          # seed the global CPU RNG
torch.random.manual_seed(seed=RANDOM_SEED)   # equivalent call via torch.random
Variable
from torch.autograd import Variable
# key attributes: .data, .grad, .grad_fn
x_tensor = torch.randn(10, 5)
y_tensor = torch.randn(10, 5)
x = Variable(x_tensor, requires_grad=True)
y = Variable(y_tensor, requires_grad=True)
z = torch.sum(x + y)
print(z.data)     # e.g. tensor(-2.1379)
print(z.grad_fn)  # <SumBackward0 object at 0x10da636a0>
z.backward()
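
Note that Variable has been deprecated since PyTorch 0.4; plain tensors with requires_grad=True behave the same way. A minimal modern equivalent of the snippet above:

x = torch.randn(10, 5, requires_grad=True)
y = torch.randn(10, 5, requires_grad=True)
z = torch.sum(x + y)
z.backward()
x.grad.shape  # torch.Size([10, 5]) - gradients accumulate in .grad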
GPU
if torch.cuda.is_available():
    device = "cuda"  # Use NVIDIA GPU (if available)
elif torch.backends.mps.is_available():
    device = "mps"   # Use Apple Silicon GPU (if available)
else:
    device = "cpu"   # Default to CPU if no GPU is available

tensor = tensor.to(device)    # .to returns a new tensor on the chosen device
tensor_on_gpu.cpu().numpy()   # move back to CPU before converting to NumPy

Neural network

torch.nn

Contains all of the building blocks for computational graphs (essentially a series of computations executed in a particular way).

torch.nn.Parameter

Stores tensors that can be used with nn.Module. If requires_grad=True, gradients (used for updating model parameters via gradient descent) are calculated automatically; this is often referred to as "autograd".
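
A tiny sketch of what wrapping a tensor in nn.Parameter does (assumes torch and torch.nn as nn are imported; the variable name is just an example):

w = nn.Parameter(torch.randn(3))  # a learnable tensor; requires_grad defaults to True
w.requires_grad                   # True - gradients are tracked automatically ("autograd")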

torch.nn.Module

The base class for all neural network modules; all the building blocks for neural networks are subclasses of it. If you're building a neural network in PyTorch, your models should subclass nn.Module. It requires a forward() method to be implemented.

torch.optim

Contains various optimization algorithms (these tell the model parameters stored in nn.Parameter how to change, via gradient descent, in order to reduce the loss).

def forward()

All nn.Module subclasses require a forward() method; it defines the computation that will take place on the data passed to the particular nn.Module (e.g. the linear regression formula in the model below).

Define a net

# skeleton: subclass nn.Module and implement forward()
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        ...  # define layers / parameters here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weights * x + self.bias

# example
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weights * x + self.bias
Check the model
torch.manual_seed(42)
model_0 = LinearRegressionModel()
list(model_0.parameters()) # tensor([0.3367], requires_grad=True)
model_0.state_dict() # OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])
with torch.inference_mode(): y_preds = model_0(X_test) # run inference

Training

loss_fn = nn.L1Loss()  # MAE loss is the same as L1Loss
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)  # lr = learning rate
The steps of a PyTorch training loop:
1. Forward pass - the model performs its forward() computation on the training data: y_pred = model(X_train)
2. Calculate the loss - the model's predictions are compared to the ground truth to measure how wrong they are: loss = loss_fn(y_pred, y_train)
3. Zero gradients - the optimizer's gradients are set to zero (they accumulate by default) so they can be recalculated for this training step: optimizer.zero_grad()
4. Perform backpropagation on the loss - computes the gradient of the loss with respect to every parameter with requires_grad=True: loss.backward()
5. Update the optimizer (gradient descent) - the optimizer updates the parameters with requires_grad=True using the loss gradients: optimizer.step()
Training example
for epoch in range(epochs):
    model.train()
    y_pred = model(X_train)          # 1. forward pass
    loss = loss_fn(y_pred, y_train)  # 2. calculate loss
    optimizer.zero_grad()            # 3. zero gradients
    loss.backward()                  # 4. backpropagation
    optimizer.step()                 # 5. gradient descent step

Test

1. Forward pass - the model performs its forward() computation on the test data: test_pred = model(X_test)
2. Calculate the loss - the model's predictions are compared to the test ground truth: test_loss = loss_fn(test_pred, y_test)
3. Calculate evaluation metrics (optional) - alongside the loss value you may want other metrics such as accuracy on the test set (custom functions; see the sketch below).
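
For step 3, a hedged sketch of one such custom metric, an accuracy helper for classification models (a hypothetical function, not used by the regression examples below):

def accuracy_fn(y_true, y_pred):
    # percentage of predictions that exactly match the labels
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100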

Inference and saving the model

Inference

model_0.eval()  # Set the model in evaluation mode
with torch.inference_mode():
    y_preds = model_0(X_test)

torch.save

Saves a serialized object to disk using Python's pickle utility. Models, tensors and various other Python objects like dictionaries can be saved using torch.save.

torch.load

Uses pickle's unpickling features to deserialize and load pickled Python object files (like models, tensors or dictionaries) into memory. You can also set which device to load the object to (CPU, GPU etc).

torch.nn.Module.load_state_dict (recommended)

Loads a model's parameter dictionary (model.state_dict()) using a saved state_dict() object.
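
A minimal save/load sketch using the recommended state_dict workflow (the file path is just an example):

from pathlib import Path

MODEL_PATH = Path("models/01_model_0.pth")            # example path
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)

# save only the learned parameters
torch.save(obj=model_0.state_dict(), f=MODEL_PATH)

# load: create a fresh model instance, then load the saved state_dict into it
loaded_model_0 = LinearRegressionModel()
loaded_model_0.load_state_dict(torch.load(f=MODEL_PATH))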

Examples

Example 1

torch.manual_seed(42)
epochs = 100  # Set the number of epochs

# Create empty loss lists to track values
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    ### Training
    model_0.train()  # Put model in training mode (this is the default state of a model)

    # 1. Forward pass on train data using the forward() method inside
    y_pred = model_0(X_train)
    # 2. Calculate the loss (how different are our model's predictions to the ground truth)
    loss = loss_fn(y_pred, y_train)
    optimizer.zero_grad()  # 3. Zero grad of the optimizer
    loss.backward()        # 4. Loss backwards
    optimizer.step()       # 5. Progress the optimizer

    ### Testing
    # Put the model in evaluation mode
    model_0.eval()

    with torch.inference_mode():
        # 1. Forward pass on test data
        test_pred = model_0(X_test)

        # 2. Calculate loss on test data
        # (predictions come in torch.float, so comparisons need tensors of the same type)
        test_loss = loss_fn(test_pred, y_test.type(torch.float))

        # Print out what's happening
        if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_values.append(loss.detach().numpy())
            test_loss_values.append(test_loss.detach().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss}")

Example 2

torch.manual_seed(42)

# Set the number of epochs
epochs = 1000

# Put data on the available device
# Without this, error will happen (not all model/data on device)
X_train = X_train.to(device)
X_test = X_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
    ### Training
    model_1.train()  # train mode is on by default after construction

    # 1. Forward pass
    y_pred = model_1(X_train)
    # 2. Calculate loss
    loss = loss_fn(y_pred, y_train)

    # 3. Zero grad optimizer
    optimizer.zero_grad()

    # 4. Loss backward
    loss.backward()

    # 5. Step the optimizer
    optimizer.step()

    ### Testing
    model_1.eval()  # put the model in evaluation mode for testing (inference)
    # 1. Forward pass
    with torch.inference_mode():
        test_pred = model_1(X_test)

        # 2. Calculate the loss
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}")

VAE

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=200, device=device):
        super(VAE, self).__init__()

        # encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, latent_dim),
            nn.LeakyReLU(0.2)
        )

        # latent mean and variance
        self.mean_layer = nn.Linear(latent_dim, 2)
        self.logvar_layer = nn.Linear(latent_dim, 2)

        # decoder
        self.decoder = nn.Sequential(
            nn.Linear(2, latent_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(latent_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )

    def encode(self, x):
        x = self.encoder(x)
        mean, logvar = self.mean_layer(x), self.logvar_layer(x)
        return mean, logvar

    def reparameterization(self, mean, var):
        epsilon = torch.randn_like(var).to(device)
        z = mean + var * epsilon
        return z

    def decode(self, x):
        return self.decoder(x)

    def forward(self, x):
        mean, logvar = self.encode(x)
        z = self.reparameterization(mean, logvar)
        x_hat = self.decode(z)
        return x_hat, mean, logvar
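
For training, the usual VAE objective adds a KL divergence term to a reconstruction term. A hedged sketch (the binary cross-entropy reconstruction assumes inputs scaled to [0, 1], matching the Sigmoid output above):

def vae_loss(x, x_hat, mean, logvar):
    # reconstruction term: how well x_hat matches x
    reconstruction = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    # KL divergence between N(mean, exp(logvar)) and the standard normal prior
    kld = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return reconstruction + kld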

CNN

import torch.nn as nn
import torch.nn.functional as F

# Create a neural net class
class Net(nn.Module):
    # Constructor
    def __init__(self, num_classes=3):
        super(Net, self).__init__()

        # Our images are RGB, so input channels = 3. We'll apply 12 filters in the first convolutional layer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)

        # We'll apply max pooling with a kernel size of 2
        self.pool = nn.MaxPool2d(kernel_size=2)

        # A second convolutional layer takes 12 input channels, and generates 12 outputs
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)

        # A third convolutional layer takes 12 inputs and generates 24 outputs
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)

        # A dropout layer drops 20% of the features to help prevent overfitting
        self.drop = nn.Dropout2d(p=0.2)

        # Our 128x128 image tensors will be pooled twice with a kernel size of 2. 128/2/2 is 32.
        # So our feature tensors are now 32 x 32, and we've generated 24 of them
        # We need to flatten these and feed them to a fully-connected layer
        # to map them to the probability for each class
        self.fc = nn.Linear(in_features=32 * 32 * 24, out_features=num_classes)

    def forward(self, x):
        # Use a relu activation function after layer 1 (convolution 1 and pool)
        x = F.relu(self.pool(self.conv1(x)))

        # Use a relu activation function after layer 2 (convolution 2 and pool)
        x = F.relu(self.pool(self.conv2(x)))

        # Select some features to drop after the 3rd convolution to prevent overfitting
        x = F.relu(self.drop(self.conv3(x)))

        # Only drop the features if this is a training pass
        x = F.dropout(x, training=self.training)

        # Flatten
        x = x.view(-1, 32 * 32 * 24)
        # Feed to fully-connected layer to predict class
        x = self.fc(x)
        # Return log_softmax tensor
        return F.log_softmax(x, dim=1)
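
A quick shape check for the network above (the random batch of 128x128 RGB images is purely illustrative):

model = Net(num_classes=3)
dummy = torch.rand(4, 3, 128, 128)  # batch of 4 RGB images, 128x128
out = model(dummy)
out.shape                           # torch.Size([4, 3]) - log-probabilities per class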

LSTM

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_size):
        super(LSTMClassifier, self).__init__()
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=1)
        self.hidden2out = nn.Linear(hidden_dim, output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        self.dropout_layer = nn.Dropout(p=0.2)

    def init_hidden(self, batch_size):
        return (autograd.Variable(torch.randn(1, batch_size, self.hidden_dim)),
                autograd.Variable(torch.randn(1, batch_size, self.hidden_dim)))

    def forward(self, batch, lengths):
        self.hidden = self.init_hidden(batch.size(-1))
        embeds = self.embedding(batch)
        packed_input = pack_padded_sequence(embeds, lengths)
        outputs, (ht, ct) = self.lstm(packed_input, self.hidden)
        # ht is the last hidden state of the sequences
        # ht = (1 x batch_size x hidden_dim)
        # ht[-1] = (batch_size x hidden_dim)
        output = self.dropout_layer(ht[-1])
        output = self.hidden2out(output)
        output = self.softmax(output)

        return output
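
A usage sketch with illustrative sizes (input is (seq_len, batch) token ids; pack_padded_sequence expects lengths sorted in decreasing order by default):

model = LSTMClassifier(vocab_size=1000, embedding_dim=50, hidden_dim=64, output_size=5)
batch = torch.randint(0, 1000, (12, 4))  # seq_len=12, batch_size=4
lengths = [12, 10, 7, 5]                 # true sequence lengths, sorted descending
log_probs = model(batch, lengths)
log_probs.shape                          # torch.Size([4, 5])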

References

1. https://github.com/mrdbourke/pytorch-deep-learning
2. https://readmedium.com/@rekalantar/variational-auto-encoder-vae-pytorch-tutorial-dce2d2fe0f5f
3. https://github.com/MicrosoftDocs/ml-basics/blob/master/05b%20-%20Convolutional%20Neural%20Networks%20(PyTorch).ipynb
4. https://github.com/ritchieng/the-incredible-pytorch?tab=readme-ov-file
5. https://github.com/claravania/lstm-pytorch
6. https://machinelearningmastery.com/pytorch-tutorial-develop-deep-learning-models/