In [1]: x = 4
In [2]: y = 2
In [3]: x + y
Out[3]: 6
x and y will not result in the sum of these numbers, but in a descriptor of the computation graph, which yields the desired value only once the graph is executed:

In [1]: import tensorflow as tf
In [2]: x = tf.constant(4)
In [3]: y = tf.constant(2)
In [4]: x + y
Out[4]: <tf.Tensor 'add:0' shape=() dtype=int32>
PyTorch, in contrast, executes operations eagerly and returns the result right away:

In [1]: import torch
In [2]: x = torch.ones(1) * 4
In [3]: y = torch.ones(1) * 2
In [4]: x + y
Out[4]:
 6
[torch.FloatTensor of size 1]
tf.cond(), which accepts three subgraphs as input: a condition subgraph and two subgraphs for the if and else branches of the condition. Similarly, loops in TensorFlow graphs must be represented as tf.while_loop() operations, taking a condition and a body subgraph as input. With a dynamic graph all of this is simplified. Since the graph is traced from the Python code as it is written, control flow can be implemented natively in the language, using ordinary if statements and while loops as in any other program. Thus, the clumsy and confusing TensorFlow code:

import tensorflow as tf

x = tf.constant(2, shape=[2, 2])
w = tf.while_loop(
    lambda x: tf.reduce_sum(x) < 100,
    lambda x: tf.nn.relu(tf.square(x)),
    [x])
becomes natural, readable PyTorch code:

import torch.nn
from torch.autograd import Variable

x = Variable(torch.ones([2, 2]) * 2)
while x.sum() < 100:
    x = torch.nn.ReLU()(x**2)
print statements (rather than tf.Print() nodes) or in a debugger is a big plus. Of course, dynamism can improve programmability, but it can also hurt performance, since dynamic graphs are harder to optimize. The differences and trade-offs between PyTorch and TensorFlow are therefore much the same as those between a dynamic interpreted language such as Python and a static compiled language such as C or C++. The former is easier and faster to work with, while the latter can be turned into well-optimized artifacts. It is a trade-off between flexibility and performance.

The most fundamental data type in PyTorch is the tensor.
The tensor data type is very similar, in both meaning and function, to NumPy's ndarray. Moreover, since PyTorch aims for sensible interoperability with NumPy, the tensor API also resembles the ndarray API (though it is not identical to it). PyTorch tensors can be created with the torch.Tensor constructor, which takes the tensor's dimensions as input and returns a tensor occupying an uninitialized region of memory:

import torch
x = torch.Tensor(4, 4)
In practice, however, one of the following initializing constructors is usually used:

- torch.rand: values initialized from a random uniform distribution,
- torch.randn: values initialized from a random normal distribution,
- torch.eye(n): an n×n identity matrix,
- torch.from_numpy(ndarray): a PyTorch tensor built from a NumPy ndarray,
- torch.linspace(start, end, steps): a 1-D tensor with steps values spaced evenly between start and end,
- torch.ones: a tensor filled with ones,
- torch.zeros_like(other): a tensor with the same shape as other, filled with zeros,
- torch.arange(start, end, step): a 1-D tensor with values taken from the given range.
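For illustration, here is a brief sketch of a few of these constructors in use; the shapes and ranges are arbitrary:

import torch
import numpy as np

u = torch.rand(2, 3)                   # uniform samples in [0, 1)
n = torch.randn(2, 3)                  # samples from a standard normal distribution
i = torch.eye(3)                       # 3x3 identity matrix
f = torch.from_numpy(np.ones((2, 3)))  # tensor backed by the ndarray's memory
s = torch.linspace(0, 1, steps=5)      # 0.00, 0.25, 0.50, 0.75, 1.00
r = torch.arange(0, 10, 2)             # 0, 2, 4, 6, 8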
Like NumPy's ndarray, PyTorch tensors provide a very rich API for combining a tensor with other tensors and for modifying it in place. As in NumPy, unary and binary operations can usually be performed via functions from the torch module, such as torch.add(x, y), or directly via methods on the tensor objects, such as x.add(y). For the most common operations, operator overloads such as x + y are available. Moreover, many functions have in-place alternatives that do not create a new tensor but instead modify the receiving instance. They are named like the standard variants but end with an underscore, for example x.add_(y). Examples of available operations include:
- torch.add(x, y): elementwise addition,
- torch.mm(x, y): matrix multiplication (note: not matmul or dot),
- torch.mul(x, y): elementwise multiplication,
- torch.exp(x): elementwise exponential,
- torch.pow(x, power): elementwise exponentiation,
- torch.sqrt(x): elementwise square root,
- torch.sqrt_(x): in-place elementwise square root,
- torch.sigmoid(x): elementwise sigmoid,
- torch.cumprod(x): cumulative product of the values,
- torch.sum(x): sum of all values,
- torch.std(x): standard deviation of all values,
- torch.mean(x): mean of all values.

Tensors also support NumPy-style mask indexing (x[x > 5]) and element-wise relational operators (x > y). PyTorch tensors can be converted to NumPy ndarrays directly via the torch.Tensor.numpy() function. Finally, since the main advantage of PyTorch tensors over NumPy ndarrays is GPU acceleration, there is also the torch.Tensor.cuda() function, which copies the tensor's memory to a CUDA-capable GPU device, if one is available.
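As a quick illustrative sketch of these operations (values are arbitrary, and the last line is commented out because it assumes a CUDA-capable GPU):

import torch

x = torch.rand(3, 3)
y = torch.rand(3, 3)

z = torch.add(x, y)     # same as x + y; returns a new tensor
x.add_(y)               # in-place variant: modifies x, no new tensor is created

mask = x > 0.5          # element-wise comparison yields a ByteTensor
selected = x[x > 0.5]   # mask indexing returns a 1-D tensor of matching values

a = x.numpy()           # convert to a NumPy ndarray
# x = x.cuda()          # copy to the GPU, if CUDA is available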
On its own, a PyTorch Tensor does not yet participate in automatic differentiation. For its operations to be recorded, the tensor must be wrapped in a torch.autograd.Variable. The Variable class provides almost the same API as Tensor, but augments it with the ability to interact with torch.autograd.Function, precisely for the sake of automatic differentiation. More precisely, a Variable records the history of operations performed on the Tensor.

Creating a torch.autograd.Variable is very simple. You just pass it a Tensor and tell torch whether the variable requires gradients:

x = torch.autograd.Variable(torch.ones(4, 4), requires_grad=True)
requires_grad may need to be False, for example for input data or labels, since such values are usually not differentiated. However, they still need to be Variables to be usable in automatic differentiation. Note that requires_grad defaults to False, so it must be set to True for any parameters that are to be trained.
Automatic differentiation is performed by calling the backward() function on a Variable. This computes the gradient of that tensor with respect to the leaves of the computation graph (all the input values that influenced it). The gradients are then collected in the grad member of those Variables:

In [1]: import torch
In [2]: from torch.autograd import Variable
In [3]: x = Variable(torch.ones(1, 5))
In [4]: w = Variable(torch.randn(5, 1), requires_grad=True)
In [5]: b = Variable(torch.randn(1), requires_grad=True)
In [6]: y = x.mm(w) + b  # mm = matrix multiply
In [7]: y.backward()  # perform automatic differentiation
In [8]: w.grad
Out[8]:
Variable containing:
 1
 1
 1
 1
 1
[torch.FloatTensor of size (5,1)]
In [9]: b.grad
Out[9]:
Variable containing:
 1
[torch.FloatTensor of size (1,)]
In [10]: x.grad
None
Since every Variable except input values is the result of an operation, each Variable is associated with a grad_fn, the torch.autograd.Function used to compute the backward step. For input values it is None:

In [11]: y.grad_fn
Out[11]: <AddBackward1 at 0x1077cef60>
In [12]: x.grad_fn
None
The torch.nn module provides users with functionality specific to neural networks. One of its most important members is torch.nn.Module, which represents a reusable block of operations and the associated (trainable) parameters, and is most often used for the layers of a neural network. Modules may contain other modules and implicitly get a backward() function for backpropagation. An example of a module is torch.nn.Linear(), which represents a linear (dense / fully connected) layer, i.e. the affine transformation Wx + b:

In [1]: import torch
In [2]: from torch import nn
In [3]: from torch.autograd import Variable
In [4]: x = Variable(torch.ones(5, 5))
In [5]: x
Out[5]:
Variable containing:
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size (5,5)]
In [6]: linear = nn.Linear(5, 1)
In [7]: linear(x)
Out[7]:
Variable containing:
 0.3324
 0.3324
 0.3324
 0.3324
 0.3324
[torch.FloatTensor of size (5,1)]
You can then call backward() on a module's output in order to compute gradients for its variables. Since calling backward() accumulates gradients in the grad member of the Variables involved, there is also the nn.Module.zero_grad() method, which resets the grad member of all the module's Variables to zero. Your training loop usually calls zero_grad() at the very beginning, or just before calling backward(), to reset the gradients for the next optimization step.
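As a small sketch of why this matters (the Linear module and input here are placeholders chosen for illustration): gradients accumulate across backward() calls until they are explicitly zeroed.

import torch
from torch import nn
from torch.autograd import Variable

model = nn.Linear(5, 1)
x = Variable(torch.randn(3, 5))

model(x).sum().backward()
print(model.weight.grad)   # gradients from the first backward pass

model(x).sum().backward()  # without zero_grad(), gradients accumulate
print(model.weight.grad)   # now exactly twice as large

model.zero_grad()          # reset the grad member of all parameters to zero
print(model.weight.grad)   # all zeros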
To write your own layers or models, you subclass torch.nn.Module and give it a forward method. For example, here is a module that I wrote for one of my models, which adds Gaussian noise to its input:

class AddNoise(torch.nn.Module):
    def __init__(self, mean=0.0, stddev=0.1):
        super(AddNoise, self).__init__()
        self.mean = mean
        self.stddev = stddev

    def forward(self, input):
        noise = input.clone().normal_(self.mean, self.stddev)
        return input + noise
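A quick usage sketch, assuming the module is applied to a Variable like any other nn.Module (the shape is arbitrary):

import torch
from torch.autograd import Variable

add_noise = AddNoise(mean=0.0, stddev=0.05)
x = Variable(torch.zeros(3, 4))
y = add_noise(x)  # same shape as x, with Gaussian noise added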
To combine modules into full-fledged models, you can use torch.nn.Sequential(), to which you pass a sequence of modules; it then acts as a module itself, sequentially evaluating the modules it was given on each call. For example:

In [1]: import torch
In [2]: from torch import nn
In [3]: from torch.autograd import Variable
In [4]: model = nn.Sequential(
   ...:     nn.Conv2d(1, 20, 5),
   ...:     nn.ReLU(),
   ...:     nn.Conv2d(20, 64, 5),
   ...:     nn.ReLU())
   ...:
In [5]: image = Variable(torch.rand(1, 1, 32, 32))
In [6]: model(image)
Out[6]:
Variable containing:
(0 ,0 ,.,.) =
  0.0026  0.0685  0.0000  ...   0.0000  0.1864  0.0413
  0.0000  0.0979  0.0119  ...   0.1637  0.0618  0.0000
  0.0000  0.0000  0.0000  ...   0.1289  0.1293  0.0000
           ...             ⋱             ...
  0.1006  0.1270  0.0723  ...   0.0000  0.1026  0.0000
  0.0000  0.0000  0.0574  ...   0.1491  0.0000  0.0191
  0.0150  0.0321  0.0000  ...   0.0204  0.0146  0.1724
torch.nn also provides a number of loss functions, which are naturally important for machine learning applications. Examples of such functions:

- torch.nn.MSELoss: mean squared error loss,
- torch.nn.BCELoss: binary cross-entropy loss,
- torch.nn.KLDivLoss: Kullback-Leibler divergence loss.

For example, here is torch.nn.CrossEntropyLoss used with per-class weights:

In [1]: import torch
In [2]: import torch.nn
In [3]: from torch.autograd import Variable
In [4]: x = Variable(torch.randn(10, 3))
In [5]: y = Variable(torch.ones(10).type(torch.LongTensor))
In [6]: weights = Variable(torch.Tensor([0.2, 0.2, 0.6]))
In [7]: loss_function = torch.nn.CrossEntropyLoss(weight=weights)
In [8]: loss_value = loss_function(x, y)
Out[8]:
Variable containing:
 1.2380
[torch.FloatTensor of size (1,)]
Besides modules (nn.Module) and loss functions, the only remaining piece is the optimizer, which performs (a variant of) stochastic gradient descent. PyTorch provides a number of optimizers in the torch.optim package, for example:

- torch.optim.SGD: plain stochastic gradient descent,
- torch.optim.Adam: adaptive moment estimation,
- torch.optim.RMSprop: the optimizer proposed by Geoffrey Hinton in his Coursera course,
- torch.optim.LBFGS: the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm.

Each of these optimizers is constructed with a list of parameters, usually obtained via the parameters() method of an nn.Module, which it updates on every step. In addition, each optimizer takes its own hyperparameters, such as the learning rate. For example:

In [1]: import torch
In [2]: import torch.optim
In [3]: from torch.autograd import Variable
In [4]: x = Variable(torch.randn(5, 5))
In [5]: y = Variable(torch.randn(5, 5), requires_grad=True)
In [6]: z = x.mm(y).mean()  # perform an operation
In [7]: opt = torch.optim.Adam([y], lr=2e-4, betas=(0.5, 0.999))
In [8]: z.backward()  # calculate gradients
In [9]: y.data
Out[9]:
-0.4109 -0.0521  0.1481  1.9327  1.5276
-1.2396  0.0819 -1.3986 -0.0576  1.9694
 0.6252  0.7571 -2.2882 -0.1773  1.4825
 0.2634 -2.1945 -2.0998  0.7056  1.6744
 1.5266  1.7088  0.7706 -0.7874 -0.0161
[torch.FloatTensor of size 5x5]
In [10]: opt.step()  # update y according to the Adam update rule
In [11]: y.data
Out[11]:
-0.4107 -0.0519  0.1483  1.9329  1.5278
-1.2398  0.0817 -1.3988 -0.0578  1.9692
 0.6250  0.7569 -2.2884 -0.1775  1.4823
 0.2636 -2.1943 -2.0996  0.7058  1.6746
 1.5264  1.7086  0.7704 -0.7876 -0.0163
[torch.FloatTensor of size 5x5]
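To tie these pieces together, here is a rough sketch of a single training step; the model, data, and hyperparameter values are placeholders chosen only for illustration:

import torch
import torch.optim
from torch import nn
from torch.autograd import Variable

model = nn.Linear(10, 2)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = Variable(torch.randn(16, 10))
targets = Variable((torch.rand(16) * 2).long())

optimizer.zero_grad()                   # reset gradients from the previous step
outputs = model(inputs)                 # forward pass
loss = loss_function(outputs, targets)  # compute the loss
loss.backward()                         # backpropagate to fill .grad
optimizer.step()                        # update the parameters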
For loading data, PyTorch provides the torch.utils.data module. It has two main classes: Dataset, which represents a source of samples, and DataLoader, which takes care of loading the samples, possibly in parallel, as well as shuffling and batching them. To create your own dataset, you subclass torch.utils.data.Dataset and implement __len__, which returns the number of samples, and __getitem__, which returns the sample at a given index. For example, here is a simple dataset that iterates over a range of numbers:

import math
import torch.utils.data

class RangeDataset(torch.utils.data.Dataset):
    def __init__(self, start, end, step=1):
        self.start = start
        self.end = end
        self.step = step

    def __len__(self):
        return math.ceil((self.end - self.start) / self.step)

    def __getitem__(self, index):
        value = self.start + index * self.step
        assert value < self.end
        return value
Here, __init__ is where you would usually configure paths or otherwise set up the collection of samples. __len__ tells the DataLoader how many samples there are, and __getitem__ returns the sample with the given index. Since datasets are indexable, you could iterate over one with a for i in range(len(dataset)) loop and __getitem__ calls, or simply with for sample in dataset. In practice, however, it is much more convenient to hand the dataset to a DataLoader. The DataLoader takes a dataset and a number of options that determine how samples are retrieved; for example, samples can be fetched in parallel by several worker processes, controlled by the DataLoader's num_workers argument. Note that the DataLoader always returns batches, whose size is set with the batch_size parameter. A simple example:

dataset = RangeDataset(0, 10)
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, num_workers=2, drop_last=True)

for i, batch in enumerate(data_loader):
    print(i, batch)
Here we set batch_size to 4, so the returned tensors contain exactly four values each. Passing shuffle=True permutes the indices with which the data is accessed, so individual samples are returned in random order. We also passed drop_last=True, so that if the number of samples in the dataset is not divisible by batch_size, the last, incomplete batch is dropped. Finally, num_workers is set to two, meaning the data is fetched in parallel by two worker processes. Once the DataLoader has been created, iterating over the dataset, and thereby retrieving batches, is simple and natural.
The DataLoader also has some fairly sophisticated logic for how to collate the individual samples returned by your dataset's __getitem__ method into the batch that the DataLoader yields on each iteration. For example, if __getitem__ returns a dictionary, the DataLoader aggregates the values of those dictionaries into a single mapping for the entire batch, using the same keys. That is, if __getitem__ returns dict(example=example, label=label), then the batch returned by the DataLoader will look like dict(example=[example1, example2, ...], label=[label1, label2, ...]), i.e. the values of the individual samples are unpacked and re-packed under a single key for the whole batch. To override this behavior, you can pass a function for the collate_fn parameter of the DataLoader.
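As a minimal sketch of a custom collate function, hypothetically assuming a dataset whose __getitem__ returns dict(example=<1-D tensor>, label=<int>):

import torch
import torch.utils.data

def collate_pairs(samples):
    # 'samples' is the list of values returned by __getitem__ for one batch
    examples = torch.stack([s['example'] for s in samples])   # stack into one tensor
    labels = torch.LongTensor([s['label'] for s in samples])  # collect the labels
    return dict(example=examples, label=labels)

# my_dataset is a placeholder for such a dataset:
# loader = torch.utils.data.DataLoader(my_dataset, batch_size=4, collate_fn=collate_pairs)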
Note that the torchvision package already provides a number of ready-made datasets, such as torchvision.datasets.CIFAR10. The same is true of the torchaudio and torchtext packages.

Source: https://habr.com/ru/post/354912/