peft

Auto-generated from huggingface/peft by Mutable.ai Auto Wiki

GitHub Repository
Developer: huggingface
Written in: Python
Stars: 13k
Watchers: 105
Created: 11/25/2022
Last updated: 04/03/2024
License: Apache License 2.0
Homepage: huggingface.co/docs/peft
Repository: huggingface/peft

Auto Wiki Revision
Software Version: 0.0.8 Basic
Generated from: Commit 02b5ae
Generated at: 04/03/2024

The PEFT (Parameter-Efficient Fine-Tuning) library fine-tunes large pre-trained language models on specific tasks while updating only a small fraction of their parameters. This reduces the compute and memory cost of fine-tuning, which makes it practical in resource-constrained environments.

The core functionality of the PEFT library is implemented across several key directories:

  1. …/tuners: This directory contains the implementations of the tuning techniques, including adapter-based methods such as LoRA, LoHa, LoKr, OFT, and AdaLora, and prompt-based methods built on classes such as PromptEmbedding, PrefixEncoder, and MultitaskPromptEmbedding. Each technique lives in its own subdirectory, with the core functionality defined in files such as __init__.py, config.py, layer.py, and model.py.

  2. …/utils: This directory provides a collection of utility functions and classes that support various aspects of the PEFT framework, including configuration management, Transformer model mappings, model state management, quantization, and integration with other libraries like DeepSpeed and bitsandbytes.

The adapter-based tuning methods in the …/tuners directory add small trainable modules to a frozen pre-trained model, so that fine-tuning updates only a small number of parameters. For example, LoRA keeps each targeted weight matrix (typically the attention projections) frozen and learns its update as the product of two much smaller low-rank matrices, significantly reducing the number of parameters that need to be fine-tuned. AdaLora extends this by adjusting the rank of each low-rank update during training, so that the parameter budget goes to the matrices where it is most useful.

The prompt-based tuning methods, on the other hand, condition frozen language models to perform specific downstream tasks by adding task-specific prompts to the input. The PromptEmbedding class is responsible for encoding the virtual tokens into prompt embeddings, a crucial component of the Prompt Tuning technique.

The utility functions and classes in the …/utils directory play a crucial role in supporting the various tuning techniques implemented in the PEFT library. For example, the get_peft_model_state_dict() and set_peft_model_state_dict() functions handle the retrieval and setting of the state dictionary for PEFT models, accounting for the different PEFT types and prompt learning configurations. The loftq_utils.py file provides functionality for quantizing and dequantizing tensors using the "normal" or "uniform" quantization methods, which is a key component of the LoftQ technique.

Overall, the PEFT library provides a comprehensive set of tools for efficiently fine-tuning large language models, with a focus on reducing the computational and memory requirements of the fine-tuning process. The modular design of the library, with separate directories and files for the various tuning techniques and utility functions, allows for easy extensibility and integration with other projects.

Adapter-based Tuning Methods

The PEFT library supports several adapter-based tuning methods, which introduce low-rank matrix adaptations to pre-trained models to enable efficient fine-tuning.

Low-Rank Adaptation (LoRA)

The LoRA technique introduces low-rank matrix adaptations to pre-trained models to enable efficient fine-tuning. It includes key components such as configuration classes, layer classes, and a LoRA model wrapper.
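As a rough illustration of the low-rank adaptation idea, the sketch below adds a scaled low-rank product to a frozen weight matrix using plain Python lists. The helper names and the `alpha / r` scaling convention are assumptions made for this example, not code taken from the PEFT source.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    r = len(lora_a)  # rank = number of rows of A
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Parameters in B (d_out x r) plus A (r x d_in)."""
    return r * (d_out + d_in)


# Frozen 2x3 weight with a rank-1 adapter (illustrative values).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]      # d_out x r
A = [[0.5, 0.0, 1.0]]   # r x d_in
W_eff = lora_effective_weight(W, B, A, alpha=2.0)
```

For a 4096 x 4096 projection, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values against 16,777,216 in the full matrix.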

Adaptive Low-Rank Adaptation (AdaLora)

AdaLora (Adaptive Low-Rank Adaptation) is a variant of LoRA. Where LoRA uses a fixed rank for every adapted matrix, AdaLora adjusts the rank of each low-rank update during training, allowing for more efficient use of the parameter budget.
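The rank adjustment can be pictured as spending a fixed budget of rank components across matrices. The sketch below is a simplified stand-in: it assumes importance scores are already available and keeps the highest-scoring components globally. The function name, the module names, and the scores are hypothetical, and the way the library actually computes importance is not shown.

```python
def allocate_ranks(importance, budget):
    """Keep the `budget` highest-scoring components across all matrices.

    `importance` maps a matrix name to a list of per-component scores.
    Returns one boolean mask per matrix; True means the component is kept.
    """
    scored = [(score, name, idx)
              for name, scores in importance.items()
              for idx, score in enumerate(scores)]
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = {(name, idx) for _, name, idx in scored[:budget]}
    return {name: [(name, idx) in keep for idx in range(len(scores))]
            for name, scores in importance.items()}


# Hypothetical scores for two adapted matrices, three components each.
importance = {"q_proj": [0.9, 0.1, 0.05], "v_proj": [0.8, 0.7, 0.6]}
masks = allocate_ranks(importance, budget=4)
ranks = {name: sum(mask) for name, mask in masks.items()}
```

With a budget of four components, the matrix whose components score higher ends up with the larger rank.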

Orthogonal Finetuning (OFT)

Orthogonal Finetuning (OFT) is a parameter-efficient fine-tuning method that multiplies a layer's frozen pre-trained weight matrix by a learned orthogonal matrix, so the pre-trained weights are rotated rather than overwritten. The PEFT library implements OFT in its own subdirectory under …/tuners, following the same layout as the other adapter-based methods.
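One common way to keep a learned transform orthogonal is the Cayley transform of a skew-symmetric matrix. The sketch below works it out in closed form for a single 2 x 2 block and applies it to a frozen weight. This page does not state which parametrization the library uses, so treat the choice of the Cayley transform, and all names here, as assumptions for illustration.

```python
import math


def cayley_2x2(s):
    """Orthogonal matrix (I + S)(I - S)^-1 for S = [[0, s], [-s, 0]]."""
    d = 1.0 + s * s
    return [[(1.0 - s * s) / d, 2.0 * s / d],
            [-2.0 * s / d, (1.0 - s * s) / d]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norm(m, j):
    """Euclidean norm of column j."""
    return math.sqrt(sum(row[j] ** 2 for row in m))


W = [[3.0, 0.0], [4.0, 1.0]]   # frozen weight (illustrative values)
R = cayley_2x2(0.5)            # s is the single trainable parameter here
W_new = matmul(R, W)           # rotated weight; column norms are unchanged
```

Setting `s` to zero yields the identity matrix, so training can start from the unmodified pre-trained weight.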

Low-Rank Kronecker Product (LoKr)

Low-Rank Kronecker Product (LoKr) is a parameter-efficient fine-tuning method that builds each weight update as a Kronecker product of smaller matrices, so a large update can be stored with few parameters. The PEFT library implements LoKr in its own subdirectory under …/tuners.
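To make the Kronecker idea concrete, the sketch below builds a 4 x 4 update from two 2 x 2 factors. It is a minimal illustration with made-up values and a hypothetical helper name, and it omits any further low-rank factoring of the factors.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in a_row for y in b_row]
            for a_row in a for b_row in b]


A = [[1, 2], [3, 4]]   # 2x2 factor, 4 parameters
B = [[0, 1], [1, 0]]   # 2x2 factor, 4 parameters
delta_w = kron(A, B)   # 4x4 update, 16 entries from 8 parameters

stored = len(A) * len(A[0]) + len(B) * len(B[0])
full = len(delta_w) * len(delta_w[0])
```

The saving grows with size: two 64 x 64 factors describe a 4096 x 4096 update.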

Low-Rank Hadamard Product (LoHa)

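LoHa (Low-Rank Hadamard Product) is a parameter-efficient fine-tuning method that builds each weight update as the element-wise (Hadamard) product of two low-rank matrices, which can reach a higher effective rank than a single low-rank product of the same rank. The PEFT library implements LoHa in its own subdirectory under …/tuners.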
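LoHa forms its weight update from a Hadamard (element-wise) product of two low-rank products. The sketch below shows that construction on tiny matrices. The variable names are chosen for this example only and do not mirror the library's parameter names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(a_row, b_row)] for a_row, b_row in zip(a, b)]


# Two rank-1 factor pairs for a 2x2 update (illustrative values).
B1, A1 = [[1.0], [2.0]], [[1.0, 1.0]]
B2, A2 = [[1.0], [1.0]], [[1.0, 2.0]]

delta_w = hadamard(matmul(B1, A1), matmul(B2, A2))
```

The rank of a Hadamard product is bounded by the product of the two ranks, so two rank-r factors can express an update of rank up to r squared.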

Prompt-based Tuning Methods

The PEFT library also provides prompt-based tuning methods, which condition frozen language models to perform specific downstream tasks by adding task-specific prompts to the input. These methods focus on efficiently fine-tuning large language models by updating only a small number of parameters, such as the prompt embeddings, while keeping the rest of the model parameters frozen.

Prompt Tuning

The Prompt Tuning technique efficiently fine-tunes large language models by updating only the prompt embeddings while keeping the rest of the model parameters frozen. This is achieved through the core functionality provided in the …/prompt_tuning directory.
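The mechanics can be sketched in a few lines: a small table of trainable vectors, one per virtual token, is placed in front of the frozen model's input embeddings. The class below is a hypothetical stand-in written with plain lists. It is not the library's PromptEmbedding class, and the initialization scheme is an assumption.

```python
import random


class PromptEmbeddingSketch:
    """Trainable table with one embedding vector per virtual token."""

    def __init__(self, num_virtual_tokens, hidden_size, seed=0):
        rng = random.Random(seed)
        # Assumption: small random initialization.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                       for _ in range(num_virtual_tokens)]

    def num_parameters(self):
        return len(self.weight) * len(self.weight[0])

    def prepend(self, input_embeddings):
        """Return the prompt vectors followed by the real token embeddings."""
        return self.weight + input_embeddings


prompt = PromptEmbeddingSketch(num_virtual_tokens=4, hidden_size=8)
tokens = [[0.0] * 8 for _ in range(3)]   # embeddings of a 3-token input
extended = prompt.prepend(tokens)        # sequence length 4 + 3 = 7
```

Only the `weight` table would receive gradient updates, so the trainable parameter count is the number of virtual tokens times the hidden size.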

Prefix Tuning

The Prefix Tuning technique is a parameter-efficient fine-tuning method for large language models that conditions the model's attention mechanism on a learned prefix. The core implementation of this technique is found in the …/prefix_tuning directory of the PEFT library.
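Conditioning attention on a learned prefix amounts to giving each attention layer extra key and value vectors to attend to. The single-query sketch below illustrates this with scaled dot-product attention. The function names and the numbers are hypothetical, and batching and multiple heads are left out.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]


def attention_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Attend over the learned prefix entries followed by the real ones."""
    return attention(query, prefix_keys + keys, prefix_values + values)


query = [1.0, 0.0]
keys, values = [[0.0, 1.0]], [[0.0, 1.0]]
prefix_keys, prefix_values = [[10.0, 0.0]], [[1.0, 0.0]]   # trainable in practice

baseline = attention(query, keys, values)
steered = attention_with_prefix(query, keys, values, prefix_keys, prefix_values)
```

Here the prefix key aligns strongly with the query, so the output moves toward the prefix value even though the model's own keys and values are untouched.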

P-Tuning

The P-Tuning technique is a type of Prompt Learning used for efficiently fine-tuning large language models by adding learnable prompt tokens to the input sequence. The PEFT library provides an implementation of the P-Tuning technique in the …/p_tuning directory.

Multitask Prompt Tuning

The Multitask Prompt Tuning (MPT) technique is a method for fine-tuning language models on multiple tasks simultaneously while maintaining parameter efficiency. The PEFT (Parameter-Efficient Fine-Tuning) library provides an implementation of the MPT technique, which is centered around the MultitaskPromptEmbedding class.
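One way to share a prompt across tasks while keeping per-task parameters small is to scale a shared prompt matrix element-wise by a rank-one, task-specific matrix. The sketch below assumes that formulation; the function names and values are hypothetical and do not reproduce the MultitaskPromptEmbedding code.

```python
def outer(u, v):
    """Rank-one matrix u v^T."""
    return [[u_i * v_j for v_j in v] for u_i in u]


def task_prompt(shared_prompt, task_col, task_row):
    """Scale the shared prompt element-wise by a rank-one task matrix."""
    scale = outer(task_col, task_row)
    return [[p * s for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(shared_prompt, scale)]


shared = [[1.0, 2.0], [3.0, 4.0]]   # 2 virtual tokens x hidden size 2
u, v = [1.0, 2.0], [1.0, 0.5]       # per-task vectors: 2 + 2 parameters
prompt_for_task = task_prompt(shared, u, v)
```

Each additional task costs only the two vectors, while the shared prompt is stored once.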

Other Tuning Methods

In addition to the adapter-based and prompt-based tuning methods, the PEFT library supports other techniques for efficient fine-tuning, such as IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) and Polytropon (a multitask model with a LoRA adapter inventory).

IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)

The IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) tuning method fine-tunes a model by learning small vectors that rescale its inner activations element-wise, typically the attention keys and values and the intermediate feed-forward activations, while the pre-trained weights stay frozen. The core functionality of the IA3 implementation is provided in the …/ia3 directory.
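IA3 works by rescaling activations with a learned vector. The sketch below shows that operation on a small batch. The function name is hypothetical, and starting the vector at ones (so the model is initially unchanged) is stated here as an assumption about typical usage.

```python
def ia3_rescale(activations, scale):
    """Multiply every activation row element-wise by the learned vector."""
    return [[a * s for a, s in zip(row, scale)] for row in activations]


acts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # 2 tokens x 3 features

initial = ia3_rescale(acts, [1.0, 1.0, 1.0])   # vector of ones: no change
trained = ia3_rescale(acts, [1.0, 0.0, 2.0])   # inhibit feature 1, amplify feature 2
```

The only trainable values are the entries of the scaling vector, one per feature.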

Polytropon (Multitask Model with LoRA Adapter Inventory)

Polytropon is a multitask model that uses a LoRA adapter inventory to enable efficient fine-tuning on multiple tasks simultaneously. The core implementation of Polytropon is located in the …/poly directory.
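An adapter inventory can be pictured as a set of weight updates that each task mixes in its own proportions. The sketch below assumes the per-task routing weights are given and simply normalizes them. How the library learns or normalizes the routing is not covered on this page, so the function and its inputs are hypothetical.

```python
def mix_inventory(skill_deltas, routing_weights):
    """Weighted average of the inventory's weight updates for one task."""
    total = sum(routing_weights)
    norm = [w / total for w in routing_weights]
    rows, cols = len(skill_deltas[0]), len(skill_deltas[0][0])
    return [[sum(w * delta[i][j] for w, delta in zip(norm, skill_deltas))
             for j in range(cols)]
            for i in range(rows)]


inventory = [
    [[1.0, 0.0], [0.0, 1.0]],   # update contributed by adapter 0
    [[0.0, 2.0], [2.0, 0.0]],   # update contributed by adapter 1
]
task_delta = mix_inventory(inventory, routing_weights=[1.0, 3.0])
```

Different tasks reuse the same inventory and differ only in their routing weights.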

Adaption Prompt

The Adaption Prompt tuning method fine-tunes large language models by injecting trainable prompt embeddings into the attention mechanism of the base model. It is implemented in the AdaptionPromptModel class, which combines the base language model with the Adapted Attention mechanism.
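A common way to inject a new attention signal without disturbing a pre-trained model at the start of training is to add it through a gate that begins at zero. The sketch below assumes that design; the function name and the scalar gate are illustrative and are not taken from the AdaptionPromptModel code.

```python
def gated_attention_output(base_output, prompt_output, gate):
    """Add the prompt-conditioned attention output, scaled by a learned gate."""
    return [b + gate * p for b, p in zip(base_output, prompt_output)]


base = [0.2, -0.1, 0.4]     # output of the frozen attention layer
prompt = [1.0, 1.0, -1.0]   # output of attending to the trainable prompt

at_start = gated_attention_output(base, prompt, gate=0.0)   # equals base
later = gated_attention_output(base, prompt, gate=0.5)
```

With the gate at zero the layer reproduces the base model exactly, and the prompt's influence grows as the gate is trained.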

Utilities and Integration

References: src/peft/utils

The PEFT library provides a range of utility functions and classes that support various aspects of the fine-tuning process, including configuration management, model state management, quantization, and integration with other libraries like DeepSpeed and bitsandbytes.

Configuration Management

The PEFT (Parameter-Efficient Fine-Tuning) library provides a centralized set of utility functions and constants related to managing PEFT configurations, including mapping Transformer models to target modules for different fine-tuning techniques.
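The mapping from model type to target modules can be thought of as a lookup table with a user override. The sketch below is a hypothetical illustration: the table name, the helper, and the listed entries are examples chosen for this page and should be checked against the library's own constants.

```python
# Illustrative defaults only; the real tables live in the library's utilities.
DEFAULT_TARGET_MODULES = {
    "llama": ["q_proj", "v_proj"],
    "gpt2": ["c_attn"],
    "t5": ["q", "v"],
}


def resolve_target_modules(model_type, user_target_modules=None):
    """Prefer the user's choice, otherwise fall back to the per-model default."""
    if user_target_modules is not None:
        return list(user_target_modules)
    if model_type not in DEFAULT_TARGET_MODULES:
        raise ValueError(
            f"No default target modules for {model_type!r}; specify them explicitly."
        )
    return list(DEFAULT_TARGET_MODULES[model_type])


llama_targets = resolve_target_modules("llama")
custom_targets = resolve_target_modules("llama", ["q_proj", "k_proj", "v_proj"])
```

Centralizing the defaults means a configuration can omit the target modules for known architectures while unknown ones fail with a clear message.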

Model State Management

The PEFT library provides utility functions for saving, loading, and managing the state of PEFT models, handling the different PEFT types and prompt learning configurations.
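Saving a PEFT model mostly means saving the adapter parameters rather than the whole network. The sketch below filters a plain dictionary in that spirit. It is a simplified, hypothetical helper: the key layout, the `lora_` marker, and the adapter name are assumptions, and the library's own functions handle more PEFT types than this.

```python
def extract_adapter_state(state_dict, adapter_name="default", marker="lora_"):
    """Keep only adapter parameters and drop the adapter name from their keys."""
    adapter_state = {}
    for key, value in state_dict.items():
        if marker in key and f".{adapter_name}" in key:
            adapter_state[key.replace(f".{adapter_name}", "")] = value
    return adapter_state


# Stand-in values; a real state dict would hold tensors.
state = {
    "model.q_proj.weight": "frozen base weight",
    "model.q_proj.lora_A.default.weight": "A for adapter 'default'",
    "model.q_proj.lora_B.default.weight": "B for adapter 'default'",
    "model.q_proj.lora_A.other.weight": "A for adapter 'other'",
}
adapter_state = extract_adapter_state(state)
```

The resulting dictionary is small because it leaves out every frozen base weight and every other adapter.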

Quantization and Optimization

The peft/utils/loftq_utils.py file in the PEFT library provides utilities for LoftQ (LoRA-Fine-Tuning-aware Quantization), a method that quantizes the weights of a network and initializes the LoRA adapters so that they compensate for the quantization error.
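The quantize and dequantize round trip, and the error it leaves behind, can be sketched with evenly spaced levels. The helpers below are hypothetical and operate on a flat list; they are not the functions in loftq_utils.py, and the low-rank fitting step that would absorb the residual is not shown.

```python
def quantize_uniform(values, num_bits):
    """Map each value to the nearest of 2**num_bits evenly spaced levels."""
    levels = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize_uniform(codes, lo, step):
    """Recover approximate values from integer codes."""
    return [lo + c * step for c in codes]


weights = [-1.0, -0.2, 0.3, 1.0]
codes, lo, step = quantize_uniform(weights, num_bits=2)
restored = dequantize_uniform(codes, lo, step)

# The residual is what a low-rank adapter would be initialized to approximate.
residual = [w - r for w, r in zip(weights, restored)]
```

Each residual entry is at most half a quantization step in magnitude, which bounds what the adapter has to correct.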

Task Tensor Manipulation

The merge_utils.py file in the …/ directory provides a set of utility functions for merging, pruning, and performing arithmetic operations on task tensors, which supports combining several trained adapters into one through schemes such as task arithmetic, TIES, and DARE.
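Two of the simplest operations on task tensors are a weighted sum and magnitude-based pruning. The sketch below shows both on flat lists. The `_sketch` suffix marks these as hypothetical helpers whose signatures are assumptions, not the functions defined in merge_utils.py.

```python
def task_arithmetic_sketch(task_tensors, weights):
    """Weighted sum of several task tensors of equal length."""
    length = len(task_tensors[0])
    return [sum(w * t[i] for w, t in zip(weights, task_tensors))
            for i in range(length)]


def magnitude_prune_sketch(tensor, density):
    """Keep the largest-magnitude fraction `density` of entries, zero the rest."""
    k = int(density * len(tensor))
    by_magnitude = sorted(range(len(tensor)), key=lambda i: abs(tensor[i]), reverse=True)
    keep = set(by_magnitude[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(tensor)]


merged = task_arithmetic_sketch([[1.0, 2.0], [3.0, 4.0]], weights=[0.5, 0.5])
pruned = magnitude_prune_sketch([0.1, -2.0, 0.5, 3.0], density=0.5)
```

Pruning before summing reduces interference between tasks, since small entries from one task no longer dilute large entries from another.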

Miscellaneous Utilities

The peft/utils/other.py file in the PEFT library provides a variety of utility functions and classes that support various aspects of the fine-tuning process, including model preparation, quantization, adapter management, and other miscellaneous tasks.
