
xformers

Auto-generated from facebookresearch/xformers by Mutable.ai Auto Wiki
GitHub Repository
Developer: facebookresearch
Written in: Python
Stars: 6.8k
Watchers: 71
Created: 10/13/2021
Last updated: 01/07/2024
License: Other
Homepage: facebookresearch.github.io/xformers
Repository: facebookresearch/xformers

Auto Wiki
Revision: 0
Software Version: 0.0.4 (Basic)
Generated from: Commit 660000
Generated at: 01/07/2024

XFormers is a Python library focused on building, optimizing and evaluating Transformer neural network models. It provides a comprehensive set of modular components, optimizations, tools and framework integrations to simplify Transformer construction, accelerate training/inference, and rigorously benchmark model performance.

At the core, XFormers enables configurable model architecture construction from reusable components like attention mechanisms and feedforward networks. The …/factory module provides a model factory: classes and utilities that flexibly instantiate Transformer encoder-decoder models from configuration files. Components such as attention mechanisms are registered by name and selected based on the config.

A key focus is providing bleeding-edge optimizations to push efficiency of Transformers. The …/ops module contains optimized operators for memory-efficient multi-head attention computations on GPUs. Techniques like split key attention are used to save memory. Low-level CUDA kernels in …/cuda provide additional optimizations.

The library integrates well with PyTorch for ease-of-use while enabling custom kernels. Utilities handle operator registration, conversion and dispatching between frameworks. Autograd support is provided for custom kernels.

The …/benchmarks module provides utilities to rigorously benchmark model performance. Classes generate different attention patterns and transformer configurations to evaluate across tasks defined in …/LRA. Profilers measure runtime, memory usage, and hardware metrics.

Overall XFormers focuses on providing an extensible, optimized Transformer toolkit with modular components, efficient kernels, benchmarking, and deep framework integration. The design centers around configurability, performance and cutting-edge techniques.

Transformer Model Construction

The …/factory directory implements a factory pattern to construct modular Transformer blocks and models from configurable components in an extensible way. Its key classes take configuration objects and compose attention, feedforward, and other modules into reusable blocks.
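The registration-and-selection mechanism behind such a factory can be sketched in plain Python. The names below (`ATTENTION_REGISTRY`, `register`, `build_attention`) are illustrative, not the library's actual API:

```python
# Illustrative registry + factory sketch; names are hypothetical,
# not xformers' real classes or functions.
ATTENTION_REGISTRY = {}

def register(name):
    # Decorator that records a component class under a string key
    def deco(cls):
        ATTENTION_REGISTRY[name] = cls
        return cls
    return deco

@register("scaled_dot_product")
class ScaledDotProduct:
    def __init__(self, dropout=0.0):
        self.dropout = dropout

def build_attention(config):
    # Look the component up by name, pass the rest of the config as kwargs
    cls = ATTENTION_REGISTRY[config["name"]]
    return cls(**{k: v for k, v in config.items() if k != "name"})

attn = build_attention({"name": "scaled_dot_product", "dropout": 0.1})
```

New attention variants become available to configuration files simply by registering under a new name, which is what makes the design extensible.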

Read more

Modular Components

The XFormers library provides modular components that enable flexible construction of transformer models. The base classes defined in …/base.py provide reusable components that can be combined to build models.

Read more

Model Factory

The model factory class builds the full Transformer model. It handles parsing the configuration, constructing encoder and decoder blocks, and implementing the overall forward pass.
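The overall flow of parsing a configuration into blocks and chaining them in a forward pass can be sketched as follows; the builder names and toy blocks are hypothetical stand-ins for the real factory classes:

```python
# Conceptual sketch of a config-driven model builder; not the real factory API.
def build_model(block_configs, builders):
    # Instantiate each block from its config entry, in order
    blocks = [builders[cfg["type"]](cfg) for cfg in block_configs]

    def forward(x):
        # The model's forward pass chains the blocks sequentially
        for block in blocks:
            x = block(x)
        return x

    return forward

# Toy "blocks": each builder returns a callable acting on a list of floats
builders = {
    "scale": lambda cfg: (lambda x: [cfg["factor"] * v for v in x]),
    "shift": lambda cfg: (lambda x: [v + cfg["offset"] for v in x]),
}
model = build_model(
    [{"type": "scale", "factor": 2.0}, {"type": "shift", "offset": 1.0}],
    builders,
)
```

For example, `model([1.0, 2.0])` applies the scale block then the shift block, yielding `[3.0, 5.0]`.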

Read more

Configurable Architectures

References: xformers

The …/build_model directory contains examples of building transformer models from configurations. The configuration specifies the overall model architecture. The class contains the core logic for building out each component of the model from the configuration.

Read more

Optimized Operators

The optimized operators in XFormers focus on efficiently implementing attention mechanisms and reducing memory usage. A key aspect is leveraging optimized linear algebra primitives for operations central to attention. The …/ops directory contains several important operator implementations.

Read more

Optimized Attention Kernels

The XFormers library provides highly optimized CUDA implementations of key algorithms for efficient self-attention and related linear algebra operations on NVIDIA GPUs. The core building blocks leverage CUDA, shared memory, and compiler-generated kernels to efficiently map these algorithms to GPU hardware.

Read more

Memory-Efficient Attention

References: xformers/ops/fmha

The …/fmha directory contains classes that implement memory-efficient multi-head attention. Concrete subclasses defined in …/cutlass.py, …/triton.py, and …/flash.py implement the forward and backward passes by overriding base-class methods.
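The memory saving in such kernels comes from processing the keys in chunks with an online (streaming) softmax, so the full attention score matrix is never materialized. A pure-Python sketch of the idea, not the library's actual kernels:

```python
import math

def naive_attention(Q, K, V):
    # Reference: softmax(Q K^T / sqrt(d)) V, materializing all scores per query
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / z
                    for j in range(len(V[0]))])
    return out

def chunked_attention(Q, K, V, chunk=2):
    # Online softmax over key chunks: only `chunk` scores are live at a time
    d = len(Q[0])
    out = []
    for q in Q:
        m_run, z_run = float("-inf"), 0.0
        acc = [0.0] * len(V[0])
        for start in range(0, len(K), chunk):
            Kc, Vc = K[start:start + chunk], V[start:start + chunk]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in Kc]
            m_new = max(m_run, max(scores))
            scale = math.exp(m_run - m_new)  # rescale old state to the new max
            z_run *= scale
            acc = [a * scale for a in acc]
            for s, v in zip(scores, Vc):
                w = math.exp(s - m_new)
                z_run += w
                acc = [a + w * vj for a, vj in zip(acc, v)]
            m_run = m_new
        out.append([a / z_run for a in acc])
    return out
```

Both functions produce the same output; the chunked version is the algorithmic core of memory-efficient attention, where the running max and normalizer keep the streaming softmax numerically stable.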

Read more

Framework Integrations

References: xformers

The …/csrc directory contains implementations that integrate core XFormers operators and computations with PyTorch. It provides registrations, dispatching logic, and conversions between the C++/CUDA backend and the Python frontend through PyTorch.

Read more

Normalization Kernels

The file …/rmsnorm_kernels.py contains optimized CUDA kernels for row-wise root mean square (RMS) normalization, a form of normalization commonly applied before or after linear transformations in neural networks. The kernels perform RMS normalization on GPUs in a memory-efficient, blocked manner.
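RMS normalization divides each row by the root mean square of its entries and applies a learned per-feature scale. A single-row reference implementation (the blocked GPU kernels compute the same quantity tile by tile):

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    # Row-wise RMS normalization: x / sqrt(mean(x^2) + eps) * weight
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

out = rmsnorm([3.0, 4.0], [1.0, 1.0])
```

Unlike LayerNorm, no mean is subtracted, which saves one reduction pass over the row.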

Read more

Quantization

The …/triton.py file implements quantization support for fast multi-head attention using Triton. It allows the Triton attention operator to work with half and bfloat16 data types, reducing memory usage compared to float32.

Read more

Split Key Attention

The code in …/triton_splitk.py implements split-K attention to reduce memory usage during self-attention. It defines a class that handles the overall forward pass, splitting the keys across work units and combining the partial results into the final output.
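In split-K attention, each split processes a slice of the keys and emits a partial result (running max, sum of exponentials, weighted value sum) that a final reduction merges. A conceptual pure-Python sketch of that reduction, not the Triton kernel itself:

```python
import math

def partial_attention(q, K, V):
    # One split's contribution: (max score, sum of exps, weighted V sum)
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    z = sum(math.exp(s - m) for s in scores)
    acc = [sum(math.exp(s - m) * v[j] for s, v in zip(scores, V))
           for j in range(len(V[0]))]
    return m, z, acc

def merge_partials(partials):
    # Numerically stable reduction of per-split (max, sum_exp, acc) triples
    m = max(p[0] for p in partials)
    z = sum(pz * math.exp(pm - m) for pm, pz, _ in partials)
    out = [0.0] * len(partials[0][2])
    for pm, _, pa in partials:
        s = math.exp(pm - m)
        out = [o + s * a for o, a in zip(out, pa)]
    return [o / z for o in out]

q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
split = merge_partials([partial_attention(q, K[:2], V[:2]),
                        partial_attention(q, K[2:], V[2:])])
full = merge_partials([partial_attention(q, K, V)])
```

Because the merge rescales each split to a common max, the split result matches attention computed over all keys at once.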

Read more

Autograd Support

The …/autograd directory implements autograd support for optimized attention operations. It contains a C++ implementation of the forward and backward passes for matrix multiplication with an optional mask in the file …/matmul.cpp.

Read more

Benchmarking

The core functionality of the …/benchmarks code is to provide tools and frameworks for evaluating attention patterns, models, and hardware accelerators. This is done through several main components:

Read more

Benchmark Utilities

The …/utils.py file contains utility functions for running and analyzing benchmarks of PyTorch models. One function handles the core benchmarking workflow: it accepts flags such as the number of warmup and timed iterations, loads the model, runs the warmups, then times iterations to collect performance data. Results are returned as a list that can be passed to further processing functions. Another utility post-processes the raw timing data when multiple algorithms are compared, for example by computing relative performance metrics. Both average time and samples processed per second can be reported.
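The warmup-then-time workflow can be sketched as follows; the function name and returned keys are illustrative, not the utilities' real signatures:

```python
import time

def benchmark(fn, warmup=3, iters=10):
    # Warmup passes let caches, JIT compilation, etc. settle before timing
    for _ in range(warmup):
        fn()
    # Time `iters` iterations and report average seconds and throughput
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - t0
    return {"avg_time_s": elapsed / iters, "iters_per_s": iters / elapsed}

res = benchmark(lambda: sum(range(1000)))
```

Real GPU benchmarks additionally need device synchronization around the timed region, which this CPU sketch omits.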

Read more

Long Range Arena

The Long Range Arena (LRA) benchmark suite contains code and scripts for evaluating Transformer models on a standardized set of tasks. The suite implements several benchmark tasks designed to test long-range dependencies, including sequence modeling, question answering, and language understanding. Models are evaluated based on their ability to capture dependencies between elements in an input that are distant from each other.

Read more

Attention Benchmarking

This section focuses on benchmarking different attention patterns. The …/benchmark_blocksparse_transformers.py file contains relevant code.

Read more

Model Benchmarking

The …/benchmark_transformer.py module integrates Transformer models with performance tests. When running benchmarks, it replaces a model's standard attention and MLP modules with efficient operator implementations from elsewhere in the library. Functions in the module generate model configurations, optional precision changes, and test cases; benchmarking then runs the models with different inputs and modifiers, measures the forward and backward pass times, and returns the results.

Read more

Profiling

References: xformers/profiler

The profilers in xformers provide tools to measure model performance and hardware usage during training and inference. The …/profiler directory contains several profiler implementations that can operate sequentially or individually.

Read more

Sparse Attention

The core functionality provided by the code under Sparse Attention is enabling efficient computations and representations for sparse attention patterns in transformer models. This is achieved through classes, utilities, and algorithms for working with sparse tensors and sparse attention computations.

Read more

Sparse Tensor Classes

The xformers library provides classes for working with sparse tensors.

Read more

Sparse Linear Algebra

The …/_csr_ops.py file contains utilities and autograd functions for performing sparse linear algebra operations on CSR (compressed sparse row) matrices in an efficient manner. It handles dispatching between sparse tensor formats like COO and CSR depending on the sparsity and shape of input matrices. Key functionality includes:
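In CSR format, `row_offsets[r]` to `row_offsets[r+1]` delimits row r's nonzeros within the parallel `values` and `col_indices` arrays. A reference sparse matrix-vector product over that layout:

```python
def csr_matvec(values, col_indices, row_offsets, x):
    # y = A @ x for a CSR matrix given by (values, col_indices, row_offsets)
    y = []
    for r in range(len(row_offsets) - 1):
        # Row r's nonzeros occupy values[row_offsets[r]:row_offsets[r+1]]
        y.append(sum(values[i] * x[col_indices[i]]
                     for i in range(row_offsets[r], row_offsets[r + 1])))
    return y

# CSR encoding of [[1, 0, 2], [0, 3, 0]]
y = csr_matvec([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3], [1.0, 1.0, 1.0])
```

Only the nonzero entries are touched, which is why CSR pays off as sparsity grows.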

Read more

Sparse Utilities

The utilities in the …/utils.py file provide important functionality for working with sparse tensors, with a focus on the compressed sparse row (CSR) format. The file contains functions for sorting indices and converting between sparse formats. Overall, these utilities provide a set of important sparse tensor manipulation functions focused on operations in the CSR format used throughout the library.
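A COO-to-CSR conversion of the kind these utilities perform, sorting entries and counting per-row occupancy, can be sketched as (the function name is illustrative):

```python
def coo_to_csr(rows, cols, vals, n_rows):
    # Sort entries by (row, col), then build row_offsets via prefix sums
    order = sorted(range(len(rows)), key=lambda i: (rows[i], cols[i]))
    values = [vals[i] for i in order]
    col_indices = [cols[i] for i in order]
    row_offsets = [0] * (n_rows + 1)
    for r in rows:
        row_offsets[r + 1] += 1          # count nonzeros per row
    for r in range(n_rows):
        row_offsets[r + 1] += row_offsets[r]  # prefix sum -> offsets
    return values, col_indices, row_offsets

# Entries (1,1)=3, (0,2)=2, (0,0)=1 of a 2-row matrix
values, col_indices, row_offsets = coo_to_csr([1, 0, 0], [1, 2, 0],
                                              [3.0, 2.0, 1.0], 2)
```

The sort step is why index-sorting utilities matter: CSR requires entries grouped by row (and typically ordered by column within a row).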

Read more

Block Sparse Attention

The file …/attention_patterns.py contains implementations for various attention patterns. It provides functions for block-sparse attention patterns and converting between patterns and block-sparse layouts.
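Converting a dense attention pattern to a block-sparse layout amounts to marking each tile that contains at least one nonzero entry. A small sketch of that conversion (not the file's actual functions), assuming a square mask whose side is divisible by the block size:

```python
def pattern_to_layout(mask, block_size):
    # A block is kept iff any entry in its block_size x block_size tile is nonzero
    n = len(mask) // block_size
    layout = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if any(mask[i * block_size + a][j * block_size + b]
                   for a in range(block_size) for b in range(block_size)):
                layout[i][j] = 1
    return layout

# Block-diagonal 4x4 pattern collapses to a 2x2 diagonal layout
layout = pattern_to_layout([[1, 1, 0, 0],
                            [1, 1, 0, 0],
                            [0, 0, 1, 0],
                            [0, 0, 0, 1]], block_size=2)
```

Block-sparse kernels then skip whole tiles where the layout is zero, rather than masking individual entries.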

Read more

Long Range Attention

The XFormers library includes two attention mechanisms that model long-range dependencies without quadratic attention cost: one defined in …/lambda_layer.py and one defined in …/linformer.py.

Read more

Visual Attention

This section covers attention mechanisms designed for visual inputs like images. The file …/visual.py contains implementations of visual attention in transformers.

Read more

Integrations

This section discusses integrations between XFormers and other frameworks like PyTorch, Triton, CUDA, as well as libraries like Sputnik.

Read more

PyTorch Integration

The …/swiglu directory implements core XFormers operators and computations using PyTorch. It contains functionality in the …/swiglu_op.cpp file to register operators with PyTorch.

Read more

CUDA Kernels

The …/cuda directory contains highly optimized CUDA implementations of multi-head attention and related linear algebra operations for NVIDIA GPUs. This includes kernels for both the forward and backward passes.

Read more

Autograd Support

The …/autograd directory implements autograd support for optimized CUDA attention operations. It contains a C++ implementation of the forward and backward passes for matrix multiplication with an optional mask.

Read more

Sputnik Integration

The Sputnik library provides optimized GPU primitives through functions that leverage low-level CUDA features for performance and portability across NVIDIA GPU hardware. The main components of Sputnik integration are:

Read more

Model Training & Inference

The examples directory contains several examples that demonstrate how to use the XFormers library for model training, evaluation, and inference.

Read more

Model Training

This section covers code and examples for training transformer models. The key functionality is:

Read more

Inference

This section details how trained models can be used for text generation and prediction through inference. The core functionality is handled by the code in …/generate.py.
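A text-generation loop of the kind such an example implements typically decodes greedily: repeatedly score the sequence so far and append the highest-scoring next token. This sketch abstracts the model behind a hypothetical `next_token_logits_fn`:

```python
def greedy_generate(next_token_logits_fn, prompt, max_new_tokens, eos=None):
    # Repeatedly pick the argmax token and append it to the sequence
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits_fn(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eos:          # stop early at end-of-sequence
            break
    return tokens

# Toy "model" whose logits always favor token 1
out = greedy_generate(lambda toks: [0.0, 1.0, 0.5], [0], 3)
```

Sampling-based decoding replaces the argmax with a draw from the softmax distribution, optionally after temperature or top-k filtering.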

Read more

Utilities

The utilities in this section are used for common tasks during model training and inference.

Read more

Testing

References: tests

The xformers library provides a comprehensive suite of tests for validating components using parameterization and utilities that support writing robust and reusable tests. The tests are organized into directories under tests that match the component structure under test. This allows grouping related tests together in an intuitive way.

Read more

Test Organization

Tests are organized into subdirectories by component under the tests folder. Tests for attention functionality can be found in …/test_core_attention.py.

Read more

Functionality Testing

The …/test_feedforward.py file contains tests for feedforward neural network components in Xformers. These tests validate that feedforward layers function correctly across different device configurations by constructing each layer with random parameters, passing dummy data through, and checking the outputs.

Read more

Gradient Checking

The …/test_feedforward.py file also contains tests that validate gradient correctness for feedforward layers, in addition to checking forward-pass results. The tests construct feedforward layers from the component registry with different configurations, including varying the number of experts, and pass dummy data through a forward pass. The gradients produced by the backward pass are then compared against finite-difference approximations; any discrepancy would indicate a bug in the backward implementation. This check is repeated across layers and configurations to ensure gradients are computed correctly during training.
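The finite-difference check described above can be sketched as a central-difference approximation compared against the analytic gradient:

```python
def finite_diff_grad(f, x, eps=1e-6):
    # Central difference: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((f(xp) - f(xm)) / (2 * eps))
    return grads

# Check against the analytic gradient of f(x) = sum(x_i^2), which is 2*x
g = finite_diff_grad(lambda v: sum(t * t for t in v), [1.0, 2.0, 3.0])
```

In the real tests, `f` is the layer's scalar loss and the analytic side comes from the backward pass; agreement within a tolerance validates the implementation.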

Read more

Correctness Testing

The unit tests in the …/test_core_attention.py file validate the correctness of the sparse attention kernels by comparing their results to the dense implementation under different conditions. Tests are run with different attention masks and input shapes specified in the test functions.

Read more

Performance Testing

References: xformers

The XFormers library contains comprehensive tests to ensure components achieve their expected speed and memory usage goals. The tests directory contains rigorous performance tests that benchmark key components across a variety of configurations.

Read more

Model Testing

References: xformers

The tests directory contains tests for complete models constructed from Xformers components in different configurations. Tests validate models by passing data through and checking outputs and gradients.

Read more