# PyTorch

Deepwave provides wave propagation as a differentiable operation in PyTorch. Having a good understanding of PyTorch will therefore help you to get the greatest benefit from Deepwave. PyTorch has a lot of features and is very flexible. The tutorials on the PyTorch website are a good place to learn about these, but I include a quick overview of some of the most important features here.

PyTorch is similar to the popular Python package NumPy, providing the tools to do numerical work in Python. Many NumPy functions have an equivalent PyTorch version. There are two important differences, however: Tensors and backpropagation.

NumPy stores data in a multi-dimensional array known as an ndarray. PyTorch calls them Tensors. Unlike ndarrays, which are restricted to CPUs, a Tensor can be on a CPU or a GPU (if you have an appropriate one), providing the opportunity for substantially better performance. You can create a Tensor, move it around, and apply operations to it, like this:

a = torch.arange(3)  # [0, 1, 2] on CPU
b = torch.ones(3, device=torch.device('cuda'))  # [1, 1, 1] on GPU
c = a.cuda()**2 + 2 * b  # [2, 3, 6] on GPU
d = c.cpu()  # d = [2, 3, 6] on CPU, c is still on GPU
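The snippet above assumes a GPU is present. A common pattern, sketched below, is to select the device once at runtime so the same code runs with or without a GPU (the variable names here are just for illustration):

```python
import torch

# Pick the device once; fall back to the CPU when no GPU is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.arange(3, device=device)  # [0, 1, 2]
b = torch.ones(3, device=device)   # [1, 1, 1]
c = a**2 + 2 * b                   # [2, 3, 6], computed on whichever device was chosen
print(c.device)
```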


The second difference, backpropagation, is even more important. It allows you to backpropagate gradients through chains of operations, using automatic differentiation, enabling you to perform inversion/optimisation for complicated calculations after just coding the forward pass.

Let’s demonstrate this by starting with a simple example:

>>> import torch
>>> a = torch.arange(3.0, requires_grad=True)
>>> b = (2 * a).sum()
>>> print(b)
tensor(6., grad_fn=<SumBackward0>)
>>> b.backward()
>>> print(a.grad)
tensor([2., 2., 2.])


Here we created a Tensor, a, containing [0, 1, 2], and indicated that we will require gradients with respect to it (requires_grad=True). Multiplying each element by 2 and summing gives a Tensor containing the number 6, which we store in b. But b also contains a reference to the adjoint of the operation that generated it (SumBackward0), which is needed when we call backward on b to calculate gradients. This backpropagates gradients through the computational graph until they reach all of the Tensors in the calculation that had requires_grad set to True. Calling backward on b thus calculates the gradient of b with respect to a, and the result is stored in a.grad. We expect that if we change the value of any of the elements of a by $\delta$, the effect on b will be $2\delta$ (since we obtained b by multiplying a by two and summing), so the gradient with respect to every element of a should be 2, which we see it is.
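We can check that statement numerically: perturbing one element of a by $\delta$ should change b by $2\delta$. A quick finite-difference sketch (the perturbed values here are arbitrary choices for illustration):

```python
import torch

# Autograd's answer: the gradient of b with respect to every element of a is 2.
a = torch.arange(3.0, requires_grad=True)
b = (2 * a).sum()
b.backward()
print(a.grad)  # tensor([2., 2., 2.])

# Finite-difference check: perturb one element of a by delta and recompute b.
delta = 0.5
a_perturbed = torch.tensor([0.0, 1.0 + delta, 2.0])
b_perturbed = (2 * a_perturbed).sum()
print(b_perturbed - b.detach())  # tensor(1.) = 2 * delta
```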

We were able to easily work out what that gradient was ourselves. The power of PyTorch becomes more evident when calculations become more complicated. In those cases it makes life much easier when we only have to write the “forward” part of the calculation, and leave working out the gradient to PyTorch’s automatic differentiation.

This time we also compute gradients with respect to a second Tensor, c, containing [3, 3, 3]. Since gradients accumulate in .grad, we create a fresh copy of a:

>>> a = torch.arange(3.0, requires_grad=True)
>>> c = 3 * torch.ones(3, requires_grad=True)
>>> b = (2 * a).sum()
>>> d = (a * c).sum() + b
>>> d.backward()
>>> print(a.grad)
tensor([5., 5., 5.])
>>> print(c.grad)
tensor([0., 1., 2.])
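One caveat worth seeing in action before writing optimisation loops: backward adds to any existing values in .grad rather than overwriting them, which is why optimisers provide zero_grad. A small demonstration:

```python
import torch

a = torch.ones(3, requires_grad=True)

# First backward pass: a.grad is set to the gradient, 2 for every element.
(2 * a).sum().backward()
print(a.grad)  # tensor([2., 2., 2.])

# Second backward pass: the new gradient is ADDED to the existing one, so
# a.grad must be reset between optimisation steps (a.grad = None, or
# opt.zero_grad() when using an optimiser).
(2 * a).sum().backward()
print(a.grad)  # tensor([4., 4., 4.])
```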


Now that we know how to calculate gradients, we can use these to perform optimisation/inversion. PyTorch provides several optimisers, or you can use the gradients to create your own. As an example, I will use the simple Stochastic Gradient Descent method.

>>> def f(x):
...     return 3 * (torch.sin(5 * x) + 2 * torch.exp(x))
...
>>> x_true = torch.tensor([0.123, 0.321])
>>> y_true = f(x_true)
>>>
>>> x = torch.zeros(2, requires_grad=True)  # initial guess
>>> opt = torch.optim.SGD([x], lr=1e-4)
>>> loss_fn = torch.nn.MSELoss()
>>>
>>> for i in range(2000):