# 71 Tutorial on R torch package

Wenbo Zhao

## 71.1 Introduction

Deep learning has proven effective on data-intensive tasks such as image recognition and natural language processing. In Python, there are two well-known deep learning frameworks: `tensorflow` [1] and `pytorch` [2]. While `tensorflow` has been accessible from R since 2017, `torch` was not available until the end of 2019 [3]. Given the user-friendliness of `pytorch` and the strong visualization ability of R, combining the two is appealing. This tutorial shows how to use `torch` in R.

The `torch` ecosystem includes several packages. The `torch` package is the core one: it provides the basic data structure `tensor`, an N-d array of numbers, together with `nn_` modules, which hold trainable weights, and `nnf_` functions, which are stateless. The `luz` package provides a compact implementation of the training procedure, so that we do not need to write the loops ourselves. The `torchvision` package is designed for vision tasks; it can crop or transform images into a suitable size to feed into the network.

## 71.2 Background

### 71.2.1 Deep Learning

Like linear regression, support vector machines and k-nearest neighbours, deep learning is a machine learning method that approximates an output value or classifies a sample. All of these methods need a training dataset to fit their parameters before they can give a result for a new sample. One of the most important benefits of deep learning is the capacity of its models, which can abstract the inner features of samples and give precise predictions. It is said that, with enough parameters, a deep learning model can approximate any function between input and output. This flexibility also brings a risk of overfitting, so the size of the model should be chosen to match the amount of training data.

However, expressing such a large model and its training process directly in terms of matrix multiplications is not easy. This is where `tensorflow` and `pytorch` come in as higher-level APIs. With them, we can easily define a deep learning block, and stacking such blocks together gives the complete network.

### 71.2.2 Convolution Neural Network

As mentioned in the previous section, a deep learning network consists of multiple blocks. For image classification, the most frequently used block is the convolutional layer, the building block of a convolutional neural network (CNN). Suppose we have an input of size \((N, C_{\mbox{in}}, H, W)\) and a weight tensor \(W\) (not to be confused with the input width \(W\)) of size \((C_{\mbox{out}}, C_{\mbox{in}}, k, k)\), with stride \(s\) and padding \(p\). The output size will be \((N, C_{\mbox{out}}, H_{\mbox{out}}, W_{\mbox{out}})\), where

\[ \begin{aligned} H_{\mbox{out}} &= \left\lfloor\frac{H + 2p - k}{s}\right\rfloor + 1 \\ W_{\mbox{out}} &= \left\lfloor\frac{W + 2p - k}{s}\right\rfloor + 1 \end{aligned} \]

and, for \(p = 0\) and with 1-based, inclusive index ranges,

\[ \mbox{out}[n, j, x, y] = \mbox{bias}_j + \mbox{inner_product}\left(W[j,:,:,:],\ \mbox{in}[n, :, 1+(x-1)s:(x-1)s+k,\ 1+(y-1)s:(y-1)s+k]\right) \]

\(i.e.\) each output value is the inner product of two 4-d arrays of size \([1, C_{\mbox{in}}, k, k]\), plus a bias. This block is widely used in image tasks, has achieved good results, and is the basic unit of foundational networks such as AlexNet [4] and ResNet [5].
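The two formulas can be checked with a few lines of base R. The following is only a minimal sketch for a single sample without padding; the function names `conv_out_size` and `naive_conv2d` are illustrative and are not part of `torch`, which uses a far more efficient implementation.

```r
# Output length along one spatial dimension, following the size formula.
conv_out_size <- function(size, k, s = 1, p = 0) {
  floor((size + 2 * p - k) / s) + 1
}

# Naive convolution for one sample (padding p = 0), written as explicit loops.
# input:  array of dim (C_in, H, W)
# weight: array of dim (C_out, C_in, k, k)
# bias:   numeric vector of length C_out
naive_conv2d <- function(input, weight, bias, s = 1) {
  c_out <- dim(weight)[1]
  k     <- dim(weight)[3]
  h_out <- conv_out_size(dim(input)[2], k, s)
  w_out <- conv_out_size(dim(input)[3], k, s)
  out   <- array(0, dim = c(c_out, h_out, w_out))
  for (j in seq_len(c_out)) {
    for (x in seq_len(h_out)) {
      for (y in seq_len(w_out)) {
        # 1-based, inclusive window of k rows and k columns
        rows  <- (1 + (x - 1) * s):((x - 1) * s + k)
        cols  <- (1 + (y - 1) * s):((y - 1) * s + k)
        patch <- input[, rows, cols, drop = FALSE]
        # Inner product of the j-th filter with the patch, plus the bias
        filt  <- weight[j, , , , drop = FALSE]
        out[j, x, y] <- bias[j] + sum(as.vector(filt) * as.vector(patch))
      }
    }
  }
  out
}

# Toy example: one 4*4 input channel of ones, two 3*3 filters of ones.
input  <- array(1, dim = c(1, 4, 4))
weight <- array(1, dim = c(2, 1, 3, 3))
out    <- naive_conv2d(input, weight, bias = c(0, 1))
dim(out)
```

Each 3*3 window of ones sums to 9, so the first output channel holds 9 everywhere and the second, with bias 1, holds 10.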

## 71.3 Implementation

### 71.3.1 Installation

To use torch in RStudio, we first install the `torch` package. Our example also needs `luz` and `torchvision`, so we run

```
install.packages("torch")
install.packages("torchvision")
install.packages("luz")
```

and then load the three packages with `library()` before running the code in the following sections.

### 71.3.2 Fetching Dataset

In this example, we identify the digits in the well-known handwritten digit dataset `MNIST` using a plain convolutional neural network.

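```
library(torch)
library(torchvision)

# Download MNIST (if needed) and convert each image to a tensor
dir <- "./dataset/mnist"
train_ds <- mnist_dataset(
  dir,
  download = TRUE,
  transform = transform_to_tensor
)
test_ds <- mnist_dataset(
  dir,
  train = FALSE,
  transform = transform_to_tensor
)

# Wrap the datasets in loaders that yield batches of 128 images
train_dl <- dataloader(train_ds, batch_size = 128, shuffle = TRUE)
test_dl <- dataloader(test_ds, batch_size = 128)
```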

The `batch_size` parameter is the number of inputs processed together in one step; its value is usually limited by the memory and computing capacity of the processor. An example of the input is shown as follows:

### 71.3.3 Building up the network

```
net <- nn_module(
  "Net",
  initialize = function() {
    self$conv1 <- nn_conv2d(1, 32, 3, 1)
    self$conv2 <- nn_conv2d(32, 64, 3, 1)
    self$dropout1 <- nn_dropout2d(0.25)
    self$dropout2 <- nn_dropout2d(0.5)
    self$fc1 <- nn_linear(9216, 128)
    self$fc2 <- nn_linear(128, 10)
  },
  forward = function(x) {
    x %>%                                # N * 1 * 28 * 28
      self$conv1() %>%                   # N * 32 * 26 * 26
      nnf_relu() %>%
      self$conv2() %>%                   # N * 64 * 24 * 24
      nnf_relu() %>%
      nnf_max_pool2d(2) %>%              # N * 64 * 12 * 12
      self$dropout1() %>%
      torch_flatten(start_dim = 2) %>%   # N * 9216
      self$fc1() %>%                     # N * 128
      nnf_relu() %>%
      self$dropout2() %>%
      self$fc2()                         # N * 10
  }
)
```

The `nn_module` function takes three arguments here: an optional name for the module, an `initialize` function that creates all the components needed in the network (in this example, the convolution, dropout and linear layers), and a `forward` function that defines how the input flows through these components. The parameters of `nn_conv2d` are the number of input channels \(C_{\mbox{in}}\), the number of output channels \(C_{\mbox{out}}\), the kernel size \(k\) and the stride \(s\), which are described in the previous sections. The padding \(p\) is set to 0 by default.

As shown in the code comments, and disregarding the batch size \(N\), the input is a 28*28 image with only 1 input channel. After `conv1`, using the equations for \(H_{\mbox{out}}\) and \(W_{\mbox{out}}\), the size is 26*26 with 32 channels. After `conv2`, the shape is 24*24 with 64 channels. The `nnf_max_pool2d(2)` function keeps the largest value from each 2*2 region, which halves the spatial size and helps to limit overfitting. After that, the shape is 12*12 with 64 channels. The `torch_flatten` function arranges these values into a vector of length 9216, and the two linear layers finally produce 10 values as the network output. The index of the largest of these values gives the predicted digit.
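The shape bookkeeping in this paragraph can be reproduced in base R. The sketch below is for illustration only: `conv_out_size` and `max_pool_2x2` are hypothetical helpers, not `torch` functions, and the `logits` values are made up rather than taken from a trained network.

```r
# One spatial dimension after a convolution (same formula as in section 71.2.2).
conv_out_size <- function(size, k, s = 1, p = 0) floor((size + 2 * p - k) / s) + 1

# Trace the spatial size through the network.
after_conv1 <- conv_out_size(28, k = 3)           # 26
after_conv2 <- conv_out_size(after_conv1, k = 3)  # 24
after_pool  <- after_conv2 %/% 2                  # 12, non-overlapping 2*2 windows
flat_size   <- 64 * after_pool^2                  # 9216, the input size of fc1

# Base-R stand-in for 2*2 max pooling on a single channel.
max_pool_2x2 <- function(m) {
  rows <- seq(1, nrow(m), by = 2)
  cols <- seq(1, ncol(m), by = 2)
  outer(rows, cols, Vectorize(function(r, j) max(m[r:(r + 1), j:(j + 1)])))
}
pooled <- max_pool_2x2(matrix(1:16, nrow = 4))    # 4*4 input, 2*2 output

# The predicted digit is the position of the largest output, shifted to 0-9.
logits <- c(-1.2, 0.3, 4.1, 0.0, -0.5, 1.1, -2.0, 0.7, 0.2, -0.9)  # made-up values
predicted_digit <- which.max(logits) - 1
```

The value of `flat_size` is the reason the first linear layer is declared as `nn_linear(9216, 128)`: changing the kernel size, stride or image size changes this number and the layer has to be adjusted accordingly.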

### 71.3.4 Training

```
library(luz)

# Configure loss, optimizer and metrics, then train for 10 epochs
fitted <- net %>%
  setup(
    loss = nn_cross_entropy_loss(),
    optimizer = optim_adam,
    metrics = list(
      luz_metric_accuracy()
    )
  ) %>%
  fit(train_dl, epochs = 10, valid_data = test_dl)
```

```
Epoch 1/10
Train metrics: Loss: 0.2813 - Acc: 0.9146
Valid metrics: Loss: 0.0565 - Acc: 0.982
Epoch 2/10
Train metrics: Loss: 0.1055 - Acc: 0.9687
Valid metrics: Loss: 0.0424 - Acc: 0.985
Epoch 3/10
Train metrics: Loss: 0.0782 - Acc: 0.9756
Valid metrics: Loss: 0.0359 - Acc: 0.9872
Epoch 4/10
Train metrics: Loss: 0.0626 - Acc: 0.9815
Valid metrics: Loss: 0.0364 - Acc: 0.989
Epoch 5/10
Train metrics: Loss: 0.0563 - Acc: 0.983
Valid metrics: Loss: 0.0362 - Acc: 0.9889
Epoch 6/10
Train metrics: Loss: 0.0522 - Acc: 0.9831
Valid metrics: Loss: 0.0345 - Acc: 0.9892
Epoch 7/10
Train metrics: Loss: 0.0448 - Acc: 0.9861
Valid metrics: Loss: 0.029 - Acc: 0.9901
Epoch 8/10
Train metrics: Loss: 0.0415 - Acc: 0.9863
Valid metrics: Loss: 0.0307 - Acc: 0.9905
Epoch 9/10
Train metrics: Loss: 0.0361 - Acc: 0.9883
Valid metrics: Loss: 0.0289 - Acc: 0.9905
Epoch 10/10
Train metrics: Loss: 0.0337 - Acc: 0.9892
Valid metrics: Loss: 0.0286 - Acc: 0.9904
```
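The logged metrics can also be inspected visually. Below is a minimal sketch in base R, with the accuracy values copied by hand from the log above; the vector names are illustrative, and a real workflow would more likely read the metrics from the fitted object.

```r
# Accuracy per epoch, copied by hand from the training log
epochs    <- 1:10
train_acc <- c(0.9146, 0.9687, 0.9756, 0.9815, 0.9830,
               0.9831, 0.9861, 0.9863, 0.9883, 0.9892)
valid_acc <- c(0.9820, 0.9850, 0.9872, 0.9890, 0.9889,
               0.9892, 0.9901, 0.9905, 0.9905, 0.9904)

# Plot both curves on a common y-axis
plot(epochs, train_acc, type = "b", col = "steelblue", pch = 16,
     ylim = range(train_acc, valid_acc),
     xlab = "Epoch", ylab = "Accuracy", main = "MNIST CNN accuracy")
lines(epochs, valid_acc, type = "b", col = "firebrick", pch = 17)
legend("bottomright", legend = c("Train", "Valid"),
       col = c("steelblue", "firebrick"), pch = c(16, 17), lty = 1)

# Epoch with the best validation accuracy (the first one in case of ties)
best_epoch <- which.max(valid_acc)
```

The validation curve flattens after about epoch 7, which suggests that the model has largely converged. The validation accuracy is higher than the training accuracy in every epoch, most likely because dropout is active only during training.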

### 71.3.6 Saving the model

The fitted model can be saved to disk with `luz_save(fitted, "mnist-cnn.pt")`.

## 71.4 Conclusion

Training the model in R has real merits, since R can visualize data more easily and attractively than Python. For example, we can plot the accuracy curve during training to see whether the model has converged, or plot the distribution of the trained weights to compare the layers. Although some of these visualizations are already available in Python through tensorboardX, I think training deep learning models in R is still a worthwhile attempt.

## 71.5 Citation

[1] https://www.tensorflow.org/

[2] https://pytorch.org/

[3] https://torch.mlverse.org/

[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems 25 (2012): 1097-1105.

[5] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.