PyTorch is a powerful and versatile library for numerical computing and machine learning. It is built around Python, a language known for its simplicity and vast ecosystem. While AI can be implemented in other programming languages, the overwhelming majority of open-source tools and frameworks, such as TensorFlow, Scikit-learn, and PyTorch, are designed to be used from Python. This makes Python the de facto standard for artificial intelligence and machine learning.
This note introduces three key aspects of PyTorch: its capabilities for numerical programming, its use of automatic differentiation, and its features for building neural networks and differentiable programs.
PyTorch is, first and foremost, a numerical programming package, akin to a "calculator app" for Python. It provides tools to perform mathematical operations efficiently and is especially optimized for large-scale computations like those in machine learning. Another example of a numerical programming library is NumPy, but PyTorch goes beyond it by offering features like GPU support and automatic differentiation, which make it more suitable for deep learning.
The primary data structure in PyTorch is the tensor, a generalization of scalars, vectors, and matrices. Tensors can have arbitrary dimensions and shapes:
- [] for a scalar (a single number)
- [n] for a vector of length n
- [m, n] for an m-by-n matrix
- [batch_size, height, width] for higher-dimensional data, such as a batch of images
PyTorch supports a wide range of operations, from elementwise functions (addition, multiplication, exponentiation) to matrix multiplication using the @ operator.
For example, here is a minimal sketch of creating tensors and operating on them (the values are arbitrary):
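```python
import torch

# Two small 2x2 tensors with illustrative values.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

print(a + b)         # elementwise addition
print(a * b)         # elementwise multiplication
print(torch.exp(a))  # elementwise exponentiation
print(a @ b)         # matrix multiplication
```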
One of the key principles in PyTorch is vectorization: performing operations directly on entire tensors instead of iterating through their elements. This approach is much faster because PyTorch uses highly optimized C libraries for tensor operations. For example, compare a Python loop against the equivalent vectorized call (a rough sketch; timings vary by machine):
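```python
import time
import torch

x = torch.rand(100_000)

# Slow: iterate over the elements in a Python loop.
start = time.time()
total = 0.0
for value in x:
    total += value.item()
print("loop:      ", time.time() - start)

# Fast: a single vectorized call that runs in optimized C code.
start = time.time()
total = x.sum()
print("vectorized:", time.time() - start)
```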
PyTorch supports computations on multiple platforms, most importantly CPUs and GPUs.
GPUs excel at operations like large matrix multiplications and other massively parallel elementwise computations.
To use a GPU in PyTorch, you move tensors to the GPU with .cuda(), as in this minimal sketch (it requires a CUDA-capable GPU):
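```python
import torch

x = torch.rand(1000, 1000)
x = x.cuda()      # move the tensor to the default GPU
print(x.device)   # cuda:0
```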
To bring tensors back to the CPU, use .cpu():
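```python
import torch

x = torch.rand(1000, 1000).cuda()  # a tensor currently on the GPU
x = x.cpu()                        # copy it back to main memory
print(x.device)                    # cpu
```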
In Google Colab, you can request a GPU backend by selecting Runtime > Change runtime type > GPU. This allows you to compare the speed of computations on the CPU and GPU. For example, try multiplying two large matrices and observe the difference.
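A rough sketch of such a comparison (exact numbers depend on the hardware; torch.cuda.synchronize() is used because GPU kernels run asynchronously):

```python
import time
import torch

a = torch.rand(4000, 4000)
b = torch.rand(4000, 4000)

# CPU timing
start = time.time()
c = a @ b
print("CPU:", time.time() - start)

# GPU timing (only if a GPU is available; the first GPU call also
# pays a one-time initialization cost)
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the GPU kernel to finish
    print("GPU:", time.time() - start)
```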
To fully benefit from the GPU, ensure that your tensors are large enough for the parallelism to pay off, and that data stays on the GPU rather than being repeatedly copied between CPU and GPU memory.
PyTorch builds a computational graph in the background whenever you perform tensor operations. This graph tracks the sequence of operations and is essential for automatic differentiation, a method for computing gradients (derivatives). Gradients are critical for training machine learning models, as they indicate how to adjust model parameters to reduce error.
PyTorch uses reverse-mode automatic differentiation, commonly referred to as backpropagation, to calculate gradients efficiently. Backpropagation uses the chain rule of calculus and is akin to dynamic programming, where intermediate results are reused.
For more details, see this note on automatic differentiation.
To enable gradient tracking, set requires_grad=True when creating a tensor:
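```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2; operations on x are recorded in the graph
```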
Then call .backward() on a scalar result to compute gradients:
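```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # compute dy/dx via backpropagation
print(x.grad)        # tensor([4., 6.]), i.e. 2 * x
```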
Use torch.no_grad for inference to save memory and computation, for example (the weight tensor below is just a stand-in for real model parameters):
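```python
import torch

weight = torch.rand(10, 10, requires_grad=True)
x = torch.rand(10)

with torch.no_grad():             # no graph is built inside this block
    prediction = weight @ x
print(prediction.requires_grad)   # False
```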
After calling .backward(), the computational graph is destroyed by default. Use .detach() to keep a tensor without tracking its history:
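```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()

y_value = y.detach()   # same value, but cut off from the graph
y.backward()           # computes x.grad and frees the graph

print(y_value)                # tensor(13.)
print(y_value.requires_grad)  # False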
PyTorch simplifies the creation of neural networks, which are composed of layers performing differentiable computations. These layers are combined to process inputs, transform data, and generate outputs.
Here’s a simple example of a multi-layer perceptron (MLP), built with torch.nn (the layer sizes below are just an illustration):
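```python
import torch
import torch.nn as nn

# A small MLP: 784 inputs -> 128 hidden units -> 10 outputs.
mlp = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.rand(32, 784)   # a batch of 32 flattened 28x28 images
logits = mlp(x)
print(logits.shape)       # torch.Size([32, 10])
```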
PyTorch provides tools to streamline these workflows: torch.nn for predefined layers and loss functions, torch.optim for optimizers such as SGD and Adam, and torch.utils.data for datasets and data loading.
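As a rough sketch of how these pieces fit together (the data here is random, purely for illustration):

```python
import torch
import torch.nn as nn

# Random data standing in for a real dataset.
inputs = torch.rand(100, 784)
targets = torch.randint(0, 10, (100,))

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):
    optimizer.zero_grad()                       # reset gradients from the previous step
    loss = loss_fn(model(inputs), targets)      # forward pass and loss
    loss.backward()                             # compute gradients
    optimizer.step()                            # update the parameters
    print(epoch, loss.item())
```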
For more advanced details, explore the Elements of Differentiable Programming.
PyTorch is a flexible and efficient framework for numerical programming, automatic differentiation, and building neural networks. With these tools, you can create powerful machine learning models and explore the exciting world of AI.