Auto-generated from huggingface/peft by Mutable.ai Auto Wiki
Apache License 2.0
PEFT provides a framework for parameter-efficient fine-tuning of large pretrained models like BERT and GPT-2. It implements techniques like adapters, prompts, and quantization to efficiently specialize models for new tasks and datasets without retraining the entire model.
At the core, PEFT allows injecting small trainable modules called adapters into specific layers of a pretrained model. For example, LoRA adapters implemented in
…/lora add low-rank "update matrices" to attention blocks that can be tuned on new data. This keeps the original weights fixed while adapting to new tasks. Modules like
…/peft_model.py handle loading, saving, and executing these adapted models.
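The low-rank update idea behind LoRA can be sketched in a few lines of NumPy. This is an illustrative toy, not PEFT's implementation; the shapes, the alpha/r scaling, and the zero-initialized B matrix follow the common LoRA convention.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init
alpha = 16                           # LoRA scaling hyperparameter

def forward(x):
    # Base output plus the low-rank update; only A and B would be trained.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapted model starts out identical to the base.
assert np.allclose(forward(x), x @ W.T)
```

Because `B` starts at zero, fine-tuning begins exactly at the pretrained model, and only `2 * d * r` parameters per adapted layer are updated.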
In addition to adapters, PEFT supports prompt tuning by adding special prompt token embeddings that guide model behavior.
Configuration classes like
…/config.py define model schemas. Base classes provide common interfaces to load tuners and dispatch execution. Utilities handle initialization and I/O. Together these components allow easily applying techniques like LoRA and prompt tuning to HuggingFace models with just a few lines of configuration.
The examples directory contains end-to-end training examples for tasks like conditional text generation, image classification, and causal language modeling. Tests in
tests validate functionality. The documentation in
docs provides conceptual overviews of techniques and guides to customizing models.
In summary, PEFT provides a toolkit enabling efficient adaptation and specialization of large pretrained models for new datasets and tasks. By keeping most original weights fixed and only tuning small adapters and prompts, it allows much more efficient fine-tuning than full retraining.
The PEFT library provides implementations of various model tuning techniques to efficiently specialize pretrained models for downstream tasks. Key techniques implemented include:
Adapters: Lightweight modules inserted into models to capture task-specific knowledge. Classes in
…/layer.py implement different approaches to tuning and quantizing adapters.
Prompts: Learned embeddings injected into models to subtly guide their behavior.
Quantization: Methods for reducing model size, such as quantized LoRA layers built on classes defined in
…/bnb.py. This improves efficiency.
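As a rough illustration of the kind of quantization such layers rely on, the following NumPy sketch performs per-row absmax int8 quantization; a simplified stand-in for the bitsandbytes kernels, not PEFT's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)).astype(np.float32)

# Per-row absmax scaling: map each row onto the int8 range [-127, 127].
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.round(W / scale).astype(np.int8)

# Dequantize on the fly whenever the layer is applied.
W_deq = W_int8.astype(np.float32) * scale

# Rounding error is at most half a quantization step per element.
assert np.max(np.abs(W - W_deq)) <= 0.5 * scale.max() + 1e-6
```

Storing `W_int8` plus one scale per row takes roughly a quarter of the memory of the float32 weights.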
The core functionality is encapsulated in the
…/tuners directory. It contains subdirectories implementing different tuners, with common utilities providing standardized APIs and functionality.
The core functionality implemented in the code related to adapters is the ability to efficiently tune models using lightweight adapter modules. Several classes and techniques work together to achieve this.
The
…/adalora directory implements the AdaLoRA algorithm, which uses SVD-based low-rank updates and adaptively allocates the rank budget across them during fine-tuning. The key aspects are handled by classes and files in this directory:
The
…/layer.py file contains functionality for initializing and inserting adapter modules.
The
…/model.py file defines the class used to create an AdaLoRA model from a pretrained model by overriding methods to insert modules.
The
…/lora implementation also uses adapters. The
…/layer.py file defines classes for layers that store and apply adapter parameters. The
…/model.py file handles injecting these modules into a base model.
The core functionality implemented for prompts involves injecting trainable prompt embeddings to guide model behavior. This is handled through dedicated configuration and embedding classes.
The configuration object stores hyperparameters for initializing prompt embeddings. This includes settings like the initialization method and text. Standardizing the configuration helps ensure prompts are initialized correctly.
The embedding class implements an embedding layer that maps prompt tokens to vectors. During initialization, it loads embeddings from the base model specified in the config so that prompts are encoded in the same embedding space. Its forward method simply applies this embedding layer to input indices to produce prompt embeddings.
The initialization logic for prompt-tuning models ensures that the prompt embeddings and the model weights are set up properly for joint training.
By defining standardized configuration and embedding classes, the code provides a clean interface for integrating prompts into models. The prompt embeddings can be added to model inputs as learnable parameters to subtly guide the model during fine-tuning based on the prompt text. This allows natural language prompts to be represented within the model's parameters.
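The mechanism can be sketched as follows; a hypothetical toy, not PEFT's classes, but it shows how learned prompt vectors are prepended to the frozen token embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, d, n_prompt = 100, 16, 4
token_emb = rng.normal(size=(vocab, d))             # frozen input embeddings
prompt_emb = rng.normal(size=(n_prompt, d)) * 0.01  # trainable soft prompt

def embed_with_prompt(input_ids):
    # Look up the frozen token embeddings, then prepend the learned prompt
    # vectors; only prompt_emb would receive gradients during tuning.
    tokens = token_emb[input_ids]
    return np.concatenate([prompt_emb, tokens], axis=0)

ids = np.array([5, 17, 42])
out = embed_with_prompt(ids)
assert out.shape == (n_prompt + len(ids), d)
```

Only the `n_prompt * d` prompt parameters are trained, while the rest of the model stays frozen.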
The PEFT framework implements several quantization techniques under
…/lora to reduce model size for efficient deployment. Quantization is applied after tuning via other methods to compress the model further.
The
…/lora directory contains implementations of LoRA layers that operate on quantized base weights. Key classes for quantization include:
- Classes in
…/__init__.py inherit from a base class and implement quantized versions of layers.
Quantization is configured via dedicated configuration classes.
Quantized layers are implemented by:
…/__init__.py, which applies quantization to weights and activations, and
…/bnb.py, which implements quantized linear algebra.
A class in
…/model.py injects quantized modules by calling a function that instantiates the correct quantized layer class for each target module specified in the config. This allows quantizing full models.
The
…/oft directory implements Orthogonal Finetuning (OFT) for efficiently tuning models. OFT works by learning orthogonal transformation matrices that are applied to the weights of existing layers. These transformations are stored as additional parameters called "adapters" that are much smaller than full replacement layers.
The core implementation of OFT uses adapter parameters that store the transformation for each layer. During the forward pass, the transformation is applied through matrix multiplication.
The model is constructed by replacing target layers with adapter-wrapped subclasses according to the configuration.
The configuration defines the hyperparameters for OFT models, including the layers to transform and adapter ranks. It uses inheritance and provides default values and types.
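OFT constrains its transformations to be orthogonal, which preserves the angles between the neurons of the pretrained weight. A common parameterization is the Cayley transform of a skew-symmetric matrix; the NumPy sketch below is illustrative only, not PEFT's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
W = rng.normal(size=(d, d))   # frozen pretrained weight

# The adapter's trainable parameter: an unconstrained matrix that is made
# skew-symmetric, then mapped to an orthogonal matrix via the Cayley
# transform R = (I + S) @ inv(I - S).
P = rng.normal(size=(d, d)) * 0.1
S = 0.5 * (P - P.T)           # skew-symmetric: S.T == -S
I = np.eye(d)
R = (I + S) @ np.linalg.inv(I - S)

assert np.allclose(R.T @ R, I, atol=1e-6)  # R is orthogonal
W_adapted = R @ W             # the orthogonally transformed weight
```

Because `R` is orthogonal by construction, training the adapter cannot distort the pairwise angles between the rows of `W`.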
The core functionality implemented in the PEFT library for training and evaluation includes components for training loops, distributed scaling using frameworks like Accelerate, and loading/saving models.
Some of the key implementation details include:
Training loops are implemented in Python script files under
examples. These contain the main logic for running training on different tasks.
…/peft_lora_clm_accelerate_ds_zero3_offload.py shows an example of distributed training using LoRA adapters on a causal language modeling task with Accelerate.
The base class in
…/peft_model.py provides the base interface for loading, saving, and executing models. It handles dispatching the forward pass to tuner-specific model classes.
Subclasses of the base class implement task-specific logic.
The configuration classes define the schemas and hyperparameters needed for PEFT models.
A base mixin class defines common configuration functionality like saving and loading configs to and from dictionaries, and handles serialization.
Another class inherits from the base class and defines the core fields required for any PEFT model.
A third class inherits from the second class and adds prompt-specific fields needed for models that use prompting techniques.
All classes use dataclasses for easy serialization and deserialization. The base class handles common serialization logic that the other classes inherit. The second class defines the base schema for any PEFT model. The third class extends this for prompting models by adding additional fields.
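A minimal sketch of that hierarchy, using hypothetical class and field names rather than PEFT's actual ones:

```python
from dataclasses import asdict, dataclass

@dataclass
class ConfigMixin:
    """Shared (de)serialization logic inherited by every config class."""
    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

@dataclass
class BaseConfig(ConfigMixin):
    """Core fields required for any PEFT-style model."""
    peft_type: str = "LORA"
    task_type: str = "CAUSAL_LM"

@dataclass
class PromptConfig(BaseConfig):
    """Extends the base schema with prompt-specific fields."""
    num_virtual_tokens: int = 8
    prompt_init_text: str = ""

cfg = PromptConfig(num_virtual_tokens=20)
restored = PromptConfig.from_dict(cfg.to_dict())
assert restored == cfg  # dataclasses round-trip through dictionaries
```

Each subclass inherits the mixin's serialization for free, so only the schema fields differ between config types.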
The base class in
…/peft_model.py handles dispatching the forward pass to the appropriate model class based on the PEFT configuration. It loads and saves adapters, and provides utilities like getting prompt embeddings.
The class initializes the model class specified in the PEFT configuration. For prompt tuning models, it initializes a module to generate prompt embeddings, stores the prompt tokens, and overrides the forward logic for prompt-based methods.
Task-specific model classes inherit from this base class and add task logic. For example, they set the classification layer name, overload functions to add labels, and return the appropriate output type.
The
…/__init__.py file imports classes, functions, and mappings that provide the core PEFT functionality. It imports the model classes and configuration classes. The mapping functions in
…/mapping.py provide a mapping between task types and model/configuration classes, and tuner types and tuner classes.
The
…/helpers.py file contains functions for updating method signatures, ensuring that signatures are properly inherited when subclassing models.
The
…/__init__.py file collects various utility functions and classes. This includes enums defining model and task types, and constants that map models to target modules for techniques like prefix tuning.
The
…/constants.py file centralizes mappings of models to adapter insertion points and naming conventions.
The
…/save_and_load.py file provides functions for saving and loading PEFT model states. It handles loading weights from local or HuggingFace Hub paths onto the correct device.
The
…/loftq_utils.py file implements LoFTQ weight initialization. It handles quantization and dequantization of tensors using lookup tables, and initializes the adapter weights in a loop that alternates quantization with SVD decomposition of the residual.
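The alternating loop can be sketched as follows. The sketch substitutes simple uniform rounding for LoFTQ's lookup-table quantization, so it illustrates the structure of the algorithm rather than the real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))   # pretrained weight to approximate
r = 4                           # target LoRA rank

def quantize(M, levels=16):
    # Stand-in for lookup-table quantization: uniform rounding.
    scale = np.abs(M).max() / (levels / 2)
    return np.round(M / scale) * scale

# Alternate between quantizing the corrected weight and fitting a rank-r
# term to the quantization residual via truncated SVD.
A = np.zeros((r, W.shape[1]))
B = np.zeros((W.shape[0], r))
for _ in range(5):
    Q = quantize(W - B @ A)
    U, s, Vt = np.linalg.svd(W - Q)
    B, A = U[:, :r] * s[:r], Vt[:r]   # best rank-r fit of the residual

err = np.linalg.norm(W - (Q + B @ A))
assert err < np.linalg.norm(W - quantize(W))  # beats plain quantization
```

The result is a quantized base weight `Q` plus LoRA factors `B`, `A` whose sum approximates `W` more closely than quantization alone.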
Utilities are also provided for tasks like initialization, quantization configuration, shifting tokens, and prompt learning setup. These common utilities provide reusable functionality throughout PEFT.
The testing utilities provide common functionality for writing and running tests of PEFT models. This includes configuration for pytest, common testing components like fixtures and base classes, as well as general utilities.
The
…/conftest.py file contains pytest configuration. It defines functions that configure how pytest runs tests.
The
…/testing_common.py file defines a base class with shared helper methods.
The
…/testing_utils.py file contains utilities for writing tests in PyTorch. It includes decorators to skip tests if certain hardware or dependencies are missing.
These testing utilities provide common infrastructure for writing tests, running them consistently across environments, and testing critical functionalities of PEFT models in a modular way. The base classes and utilities handle common tasks so that model implementations can focus on their specific testing logic.
The examples directory contains example training loops for tasks like language modeling, sequence classification, conditional text generation, and more. These demonstrate how to use PEFT for various natural language processing tasks.
The
…/causal_language_modeling directory contains examples of causal language modeling, where a model is fine-tuned for next-token prediction. The main file
…/peft_lora_clm_accelerate_ds_zero3_offload.py loads a Twitter complaints dataset, tokenizes it, and prepares PyTorch datasets and dataloaders for training. It initializes a causal language model and trains it on the dataset for multiple epochs while tracking metrics like loss and accuracy. During training, it evaluates the model on a validation set and makes predictions on a test set if one exists. Training is distributed across GPUs/TPUs using the Accelerate library to speed up training.
The
…/conditional_generation directory contains examples of conditional text generation using sequence-to-sequence models. For example,
…/peft_adalora_seq2seq.py fine-tunes a BART model for financial sentiment analysis, loading a pretrained BART and training it on a financial sentiment dataset.
…/peft_lora_seq2seq_accelerate_ds_zero3_offload.py trains a BART model on Twitter complaints data using PEFT and distributed training with Accelerate.
…/peft_lora_seq2seq_accelerate_fsdp.py trains a T5 model on a financial phrase bank dataset using PEFT and distributed training with Accelerate.
The
…/sequence_classification directory contains an example in
…/peft_no_lora_accelerate.py, which loads a pretrained model and the GLUE MRPC dataset, wraps the model with PEFT, and performs distributed training using gradient accumulation. It defines training and evaluation loops over epochs.
The
…/regression directory contains a comprehensive test suite for validating the core functionality of PEFT and preventing unintended breaks. This suite is run regularly to ensure all models and components continue working as expected.
The main test runner is in the file
…/test_regression.py. This handles both normal regression testing where outputs are compared to stored outputs, as well as "creation mode" where new outputs are saved for each test/version combination. Clean git/tag checks are done in creation mode to avoid accidentally overwriting tests.
Individual test classes define tests as methods. They load models from
…/tuners using the full model API, calculate outputs on sample data, and return results. This validates models are built correctly and operate as intended.
In normal mode, the runner compares fresh outputs to the stored ones. In creation mode, it saves new output files, then validates that the current commit is clean before finalizing. This process is version-controlled so that tests only change when the code changes, preventing unintended breaks.
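The compare-or-create control flow can be sketched like this (a hypothetical helper, not the actual test runner):

```python
import json
import os
import tempfile

STORE = tempfile.mkdtemp()  # stand-in for the versioned output directory

def check_regression(name, output, creation_mode=False):
    """Compare `output` against a stored reference, or record a new one."""
    path = os.path.join(STORE, f"{name}.json")
    if creation_mode:
        # The real suite additionally verifies a clean git checkout and a
        # version tag here before writing any reference output.
        with open(path, "w") as f:
            json.dump(output, f)
        return True
    with open(path) as f:
        expected = json.load(f)
    return expected == output

# Creation mode records the reference; normal mode compares against it.
check_regression("lora_linear", [0.1, 0.2], creation_mode=True)
assert check_regression("lora_linear", [0.1, 0.2])
assert not check_regression("lora_linear", [0.1, 0.3])
```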
The Conceptual & Developer Guides section provides documentation on different conceptual approaches and techniques for developing models using the PEFT library. This includes both theoretical explanations of parameter-efficient methods and practical guides for customizing models.
The
…/conceptual_guides directory contains documentation files that describe various approaches at a high level. The file
…/adapter.md discusses adapter-based methods for efficiently fine-tuning pretrained models using lightweight modules. The file
…/prompting.md provides an overview of soft prompting methods.
The
…/developer_guides directory contains guides for developing with PEFT. The file
…/contributing.md outlines best practices for code quality and testing when contributing. The file
…/custom_models.md demonstrates applying techniques like LoRA to custom models by identifying modules to target.
The file
…/lora.md documents the LoRA technique and its implementation. It discusses initialization options.
The base class in
…/layer.py provides common functionality inherited by layer-specific subclasses that apply the technique to different layer types. The class in
…/model.py handles injecting and managing modules.
The base class in
…/layer.py stores adapter parameters and implements weight merging. Layer-specific subclasses apply LoRA to different layer types, and quantization is managed via dedicated modules.
The
…/config.py file defines a configuration class to store hyperparameters for prompt tuning models.
The
…/model.py file defines a class which implements an embedding layer.
The
/__init__.py file imports core components for implementing prompt tuning. It provides standardized configuration and embedding components that offer a flexible approach for incorporating prompts into models.
The PEFT framework implements several quantization techniques to reduce model size under the
Quantization section. Quantization aims to compress models while maintaining performance by representing values with fewer bits.
The
…/lora directory provides quantization support for LoRA. LoRA represents the weight update as a low-rank approximation, stored efficiently in factored form; combined with quantized base weights, this reduces model size without losing much accuracy.
The
…/layer.py file implements LoRA functionality for different layer types. Key aspects include storing LoRA parameters, initializing weights, computing the adapter contribution to the weights, and merging weights. Layer-specific subclasses implement these aspects for their layer type.
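Merging is the step that folds the adapter into the base weight so that inference incurs no extra cost. A NumPy sketch of the equivalence (illustrative, not PEFT's layer code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 8, 2, 16
W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # trained LoRA factors
B = rng.normal(size=(d, r))
scaling = alpha / r

x = rng.normal(size=(5, d))

# Unmerged: base path plus adapter path on every forward call.
y_unmerged = x @ W.T + scaling * (x @ A.T @ B.T)

# Merged: fold the update into the base weight once, then run a plain
# linear layer with no inference-time overhead.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

assert np.allclose(y_unmerged, y_merged)
```

Merging is also reversible: subtracting `scaling * (B @ A)` restores the original base weight.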
Configuration for LoRA is defined in a dedicated configuration class.
This section discusses guidelines for customizing models with PEFT techniques. The
…/custom_models.md file demonstrates how to apply PEFT's LoRA adapter technique to custom models that are not standard Transformer architectures.
It shows an example of applying LoRA to a simple sequential model. To apply LoRA, the guide prints the named modules to identify the layers to target, creates a configuration specifying these as the target modules along with the modules to save, then wraps the model and checks that only a small fraction of parameters need to be trained.
The file also discusses applying LoRA to models from a computer vision library. It loads a model, prints the named modules, and identifies the layers to target by matching their names with a regex. The configuration specifies the target modules and the modules to save; the guide again wraps the model and checks the fraction of trainable parameters.
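The target-identification step boils down to filtering module names. A sketch with hypothetical module names (not from any particular model):

```python
import re

# Hypothetical output of model.named_modules() for a small vision model.
module_names = [
    "features.0.conv", "features.0.bn",
    "features.1.conv", "features.1.bn",
    "classifier.fc",
]

# Select every conv layer plus the final classifier as LoRA targets.
pattern = re.compile(r"(features\.\d+\.conv|classifier\.fc)")
target_modules = [n for n in module_names if pattern.fullmatch(n)]

assert target_modules == [
    "features.0.conv", "features.1.conv", "classifier.fc",
]
```

The resulting list of names is what would be passed as the target-module setting in the adapter configuration.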
It also describes how to leverage PEFT's low-level API to inject trainable adapters into any module. Currently this supports injecting adapters such as LoRA in place by directly modifying the model's modules, allowing users to fine-tune models without relying on PEFT's modeling classes.
The file concludes with tips for applying PEFT techniques like LoRA to new architectures: check the existing model mappings, and identify the layers to specify as targets.
This section focuses on using PEFT together with other libraries like Accelerate and PyTorch.
The PEFT library integrates closely with Accelerate to enable distributed training of large models across multiple GPUs and TPUs. Several examples show using Accelerate for distributed training, including
…/peft_no_lora_accelerate.py, where gradients from multiple batches are accumulated before the model is updated (gradient accumulation). Another example is using DeepSpeed ZeRO in
…/peft_lora_clm_accelerate_ds_zero3_offload.py along with CPU offloading to reduce memory usage during distributed training.
PEFT also integrates closely with PyTorch functionality. Functions are provided to load PyTorch models and extract their parameters. Tests in
tests validate the PEFT transformations by comparing the model structure and parameters before and after applying techniques like adapters.
The PEFT library provides efficient methods for distributed training of large models using parameter-efficient techniques. PEFT integrates with libraries like Accelerate to allow distributed training of models augmented with its methods.
Some key aspects of using PEFT with distributed training include:
The training loops in files such as
…/peft_lora_clm_accelerate_ds_zero3_offload.py calculate losses on batches and accumulate gradients across devices during optimization.
…/peft_lora_seq2seq_accelerate_fsdp.py shows using techniques like Fully Sharded Data Parallel (FSDP) for efficient distributed training of large sequence models.
…/accelerate provides guidance on using these techniques with PEFT to train the largest models at scale.
The PEFT library leverages PyTorch functionality through classes defined in key files. The
…/peft_model.py file handles loading and saving adapters and dispatching the forward pass.
…/config.py file defines configuration classes that inherit attributes to serialize hyperparameters.
…/mapping.py file contains a class that maps configurations to model classes, enabling automatic loading of different PyTorch model types.
Tests in the
tests directory leverage PyTorch modules, optimizers, and utilities to validate models can be trained end-to-end with PyTorch. The
…/regression subdirectory contains tests that compare or store outputs.
The
…/regression directory contains a comprehensive suite of tests to validate the core functionality of PEFT and prevent unintended breaks. These regression tests cover a range of models, components, and use cases.
The main test runner is in the file
…/test_regression.py. This handles both normal regression testing, where outputs are compared to stored outputs, and "creation mode", where new outputs are saved for each test/version combination. Clean git/tag checks are done in creation mode to avoid accidentally overwriting stored reference outputs.
Individual test classes define tests as methods, load models from
…/peft, calculate outputs on sample data, and return results. This allows testing specific parts of the code in a modular way. Models are built and called directly in the tests to validate core functionality works as expected.
Creation mode guards existing tests against accidental corruption: its git/tag checks ensure that reference outputs are only recorded from a clean, tagged state, so stored outputs change only when the code itself has changed between tags. This validates that changes do not inadvertently cause existing tests to fail without reason.