Auto-generated from huggingface/peft by Mutable.ai Auto Wiki
Apache License 2.0
PEFT provides a framework for parameter-efficient fine-tuning of large pretrained models like BERT and GPT-2. It implements techniques like adapters, prompts, and quantization to efficiently specialize models for new tasks and datasets without retraining the entire model.
At the core, PEFT allows injecting small trainable modules called adapters into specific layers of a pretrained model. For example, LoRA adapters implemented in
…/lora add low-rank "update matrices" to attention blocks that can be tuned on new data. This keeps the original weights fixed while adapting to new tasks. Modules like
…/peft_model.py handle loading, saving, and executing these adapted models.
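The low-rank update idea behind LoRA can be sketched in a few lines of NumPy. This is an illustrative toy, not PEFT's implementation; the shapes, the alpha/r scaling, and the zero-initialized B matrix follow the common LoRA convention.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init
alpha = 16                           # LoRA scaling hyperparameter

def forward(x):
    # Base output plus the low-rank update; only A and B would be trained.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapted model starts out identical to the base.
assert np.allclose(forward(x), x @ W.T)
```

Because `B` starts at zero, fine-tuning begins exactly at the pretrained model, and only `2 * d * r` parameters per adapted layer are updated.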
In addition to adapters, PEFT supports prompt tuning by adding special prompt token embeddings that guide model behavior.
Configuration classes like
…/config.py define model schemas. Base classes provide common interfaces to load tuners and dispatch execution. Utilities handle initialization and I/O. Together these components allow easily applying techniques like LoRA and prompt tuning to HuggingFace models with just a few lines of configuration.
The examples directory contains end-to-end training examples for tasks like conditional text generation, image classification, and causal language modeling. Tests in
tests validate functionality. The documentation in
docs provides conceptual overviews of techniques and guides to customizing models.
In summary, PEFT provides a toolkit enabling efficient adaptation and specialization of large pretrained models for new datasets and tasks. By keeping most original weights fixed and only tuning small adapters and prompts, it allows much more efficient fine-tuning than full retraining.
The PEFT library provides implementations of various model tuning techniques to efficiently specialize pretrained models for downstream tasks. Key techniques implemented include:
Adapters: Lightweight modules inserted into models to capture task-specific knowledge. Classes in
…/layer.py implement different approaches to tuning and quantizing adapters.
Prompts: Learned embeddings injected into models to subtly guide their behavior.
Quantization: Methods for reducing model size, such as quantized LoRA layers built on classes defined in
…/bnb.py. This improves efficiency.
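As a rough illustration of the kind of quantization such layers rely on, the following NumPy sketch performs per-row absmax int8 quantization; a simplified stand-in for the bitsandbytes kernels, not PEFT's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)).astype(np.float32)

# Per-row absmax scaling: map each row onto the int8 range [-127, 127].
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.round(W / scale).astype(np.int8)

# Dequantize on the fly whenever the layer is applied.
W_deq = W_int8.astype(np.float32) * scale

# Rounding error is at most half a quantization step per element.
assert np.max(np.abs(W - W_deq)) <= 0.5 * scale.max() + 1e-6
```

Storing `W_int8` plus one scale per row takes roughly a quarter of the memory of the float32 weights.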
The core functionality is encapsulated in the
…/tuners directory. It contains subdirectories implementing different tuners, with common utilities providing standardized APIs and functionality.
The core functionality implemented in the code related to adapters is the ability to efficiently tune models using lightweight adapter modules. Several classes and techniques work together to achieve this.
The
…/adalora directory implements the AdaLoRA algorithm, which uses SVD-based low-rank updates and adaptively allocates the rank budget across them during fine-tuning. The key aspects are handled by classes and files in this directory:
The
…/layer.py file contains functionality for initializing and inserting adapter modules.
The
…/model.py file defines the class used to create an AdaLoRA model from a pretrained model by overriding methods to insert modules.
The
…/lora implementation also uses adapters. The
…/layer.py file defines classes for layers that store and apply adapter parameters. The
…/model.py file handles injecting these modules into a base model.
The core functionality implemented for prompts involves injecting trainable prompt embeddings to guide model behavior. This is handled through dedicated configuration and embedding classes.
The configuration object stores hyperparameters for initializing prompt embeddings. This includes settings like the initialization method and text. Standardizing the configuration helps ensure prompts are initialized correctly.
The embedding class implements an embedding layer that maps prompt tokens to vectors. During initialization, it loads embeddings from the base model specified in the config so that prompts are encoded in the same embedding space. Its forward method simply applies this embedding layer to input indices to produce prompt embeddings.
The initialization logic for prompt-tuning models ensures that the prompt embeddings and the model weights are set up properly for joint training.
By defining standardized configuration and embedding classes, the code provides a clean interface for integrating prompts into models. The prompt embeddings can be added to model inputs as learnable parameters to subtly guide the model during fine-tuning based on the prompt text. This allows natural language prompts to be represented within the model's parameters.
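The mechanism can be sketched as follows; a hypothetical toy, not PEFT's classes, but it shows how learned prompt vectors are prepended to the frozen token embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab, d, n_prompt = 100, 16, 4
token_emb = rng.normal(size=(vocab, d))             # frozen input embeddings
prompt_emb = rng.normal(size=(n_prompt, d)) * 0.01  # trainable soft prompt

def embed_with_prompt(input_ids):
    # Look up the frozen token embeddings, then prepend the learned prompt
    # vectors; only prompt_emb would receive gradients during tuning.
    tokens = token_emb[input_ids]
    return np.concatenate([prompt_emb, tokens], axis=0)

ids = np.array([5, 17, 42])
out = embed_with_prompt(ids)
assert out.shape == (n_prompt + len(ids), d)
```

Only the `n_prompt * d` prompt parameters are trained, while the rest of the model stays frozen.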
The PEFT framework implements several quantization techniques under
…/lora to reduce model size for efficient deployment. Quantization is applied after tuning via other methods to compress the model further.
The
…/lora directory contains implementations of LoRA layers that operate on quantized base weights. Key classes for quantization include:
- Classes in
…/__init__.py inherit from a base class and implement quantized versions of layers.
Quantization is configured via dedicated configuration classes.
Quantized layers are implemented by:
…/__init__.py, which applies quantization to weights and activations, and
…/bnb.py, which implements quantized linear algebra.
A class in
…/model.py injects quantized modules by calling a function that instantiates the correct quantized layer class for each target module specified in the config. This allows quantizing full models.
The
…/oft directory implements Orthogonal Finetuning (OFT) for efficiently tuning models. OFT works by learning orthogonal transformation matrices that are applied to the weights of existing layers. These transformations are stored as additional parameters called "adapters" that are much smaller than full replacement layers.
The core implementation of OFT uses adapter parameters that store the transformation for each layer. During the forward pass, the transformation is applied through matrix multiplication.
The model is constructed by replacing target layers with adapter-wrapped subclasses according to the configuration.
The configuration defines the hyperparameters for OFT models, including the layers to transform and adapter ranks. It uses inheritance and provides default values and types.
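OFT constrains its transformations to be orthogonal, which preserves the angles between the neurons of the pretrained weight. A common parameterization is the Cayley transform of a skew-symmetric matrix; the NumPy sketch below is illustrative only, not PEFT's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
W = rng.normal(size=(d, d))   # frozen pretrained weight

# The adapter's trainable parameter: an unconstrained matrix that is made
# skew-symmetric, then mapped to an orthogonal matrix via the Cayley
# transform R = (I + S) @ inv(I - S).
P = rng.normal(size=(d, d)) * 0.1
S = 0.5 * (P - P.T)           # skew-symmetric: S.T == -S
I = np.eye(d)
R = (I + S) @ np.linalg.inv(I - S)

assert np.allclose(R.T @ R, I, atol=1e-6)  # R is orthogonal
W_adapted = R @ W             # the orthogonally transformed weight
```

Because `R` is orthogonal by construction, training the adapter cannot distort the pairwise angles between the rows of `W`.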
The core functionality implemented in the PEFT library for training and evaluation includes components for training loops, distributed scaling using frameworks like Accelerate, and loading/saving models.
Some of the key implementation details include:
Training loops are implemented in Python script files under
examples. These contain the main logic for running training on different tasks.
…/peft_lora_clm_accelerate_ds_zero3_offload.py shows an example of distributed training using LoRA adapters on a causal language modeling task with Accelerate.
The base class in
…/peft_model.py provides the base interface for loading, saving, and executing models. It handles dispatching the forward pass to tuner-specific model classes.
Subclasses of the base class implement task-specific logic.
The configuration classes define the schemas and hyperparameters needed for PEFT models.
A base mixin class defines common configuration functionality like saving and loading configs to and from dictionaries, and handles serialization.
Another class inherits from the base class and defines the core fields required for any PEFT model.
A third class inherits from the second class and adds prompt-specific fields needed for models that use prompting techniques.
All classes use dataclasses for easy serialization and deserialization. The base class handles common serialization logic that the other classes inherit. The second class defines the base schema for any PEFT model. The third class extends this for prompting models by adding additional fields.
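A minimal sketch of that hierarchy, using hypothetical class and field names rather than PEFT's actual ones:

```python
from dataclasses import asdict, dataclass

@dataclass
class ConfigMixin:
    """Shared (de)serialization logic inherited by every config class."""
    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)

@dataclass
class BaseConfig(ConfigMixin):
    """Core fields required for any PEFT-style model."""
    peft_type: str = "LORA"
    task_type: str = "CAUSAL_LM"

@dataclass
class PromptConfig(BaseConfig):
    """Extends the base schema with prompt-specific fields."""
    num_virtual_tokens: int = 8
    prompt_init_text: str = ""

cfg = PromptConfig(num_virtual_tokens=20)
restored = PromptConfig.from_dict(cfg.to_dict())
assert restored == cfg  # dataclasses round-trip through dictionaries
```

Each subclass inherits the mixin's serialization for free, so only the schema fields differ between config types.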
The base class in
…/peft_model.py handles dispatching the forward pass to the appropriate model class based on the PEFT configuration. It loads and saves adapters, and provides utilities like getting prompt embeddings.
The class initializes the model class specified in the PEFT configuration. For prompt tuning models, it initializes a module to generate prompt embeddings, stores the prompt tokens, and overrides the forward logic for prompt-based methods.
Task-specific model classes inherit from this base class and add task logic. For example, they set the classification layer name, overload functions to add labels, and return the appropriate output type.
The
…/__init__.py file imports classes, functions, and mappings that provide the core PEFT functionality. It imports the model classes and configuration classes. The mapping functions in
…/mapping.py provide a mapping between task types and model/configuration classes, and tuner types and tuner classes.
The
…/helpers.py file contains functions for updating method signatures, ensuring that signatures are properly inherited when subclassing models.
The
…/__init__.py file collects various utility functions and classes. This includes enums defining model and task types, and constants that map models to target modules for techniques like prefix tuning.
The
…/constants.py file centralizes mappings of models to adapter insertion points and naming conventions.
The
…/save_and_load.py file provides functions for saving and loading PEFT model states. It handles loading weights from local or HuggingFace Hub paths onto the correct device.
The
…/loftq_utils.py file implements LoFTQ weight initialization. It handles quantization and dequantization of tensors using lookup tables, and initializes the adapter weights in a loop that alternates quantization with SVD decomposition of the residual.
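The alternating loop can be sketched as follows. The sketch substitutes simple uniform rounding for LoFTQ's lookup-table quantization, so it illustrates the structure of the algorithm rather than the real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))   # pretrained weight to approximate
r = 4                           # target LoRA rank

def quantize(M, levels=16):
    # Stand-in for lookup-table quantization: uniform rounding.
    scale = np.abs(M).max() / (levels / 2)
    return np.round(M / scale) * scale

# Alternate between quantizing the corrected weight and fitting a rank-r
# term to the quantization residual via truncated SVD.
A = np.zeros((r, W.shape[1]))
B = np.zeros((W.shape[0], r))
for _ in range(5):
    Q = quantize(W - B @ A)
    U, s, Vt = np.linalg.svd(W - Q)
    B, A = U[:, :r] * s[:r], Vt[:r]   # best rank-r fit of the residual

err = np.linalg.norm(W - (Q + B @ A))
assert err < np.linalg.norm(W - quantize(W))  # beats plain quantization
```

The result is a quantized base weight `Q` plus LoRA factors `B`, `A` whose sum approximates `W` more closely than quantization alone.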
Utilities are also provided for tasks like initialization, quantization configuration, shifting tokens, and prompt learning setup. These common utilities provide reusable functionality throughout PEFT.
The testing utilities provide common functionality for writing and running tests of PEFT models. This includes configuration for pytest, common testing components like fixtures and base classes, as well as general utilities.
The
…/conftest.py file contains pytest configuration. It defines functions that configure how pytest runs tests.
The
…/testing_common.py file defines a base class with shared helper methods.
The
…/testing_utils.py file contains utilities for writing tests in PyTorch. It includes decorators to skip tests if certain hardware or dependencies are missing.
These testing utilities provide common infrastructure for writing tests, running them consistently across environments, and testing critical functionalities of PEFT models in a modular way. The base classes and utilities handle common tasks so that model implementations can focus on their specific testing logic.
The examples directory contains example training loops for tasks like language modeling, sequence classification, conditional text generation, and more. These demonstrate how to use PEFT for various natural language processing tasks.
The
…/causal_language_modeling directory contains examples of causal language modeling, where a model is fine-tuned for next-token prediction. The main file
…/peft_lora_clm_accelerate_ds_zero3_offload.py loads a Twitter complaints dataset, tokenizes it, and prepares PyTorch datasets and dataloaders for training. It initializes a causal language model and trains it on the dataset for multiple epochs while tracking metrics like loss and accuracy. During training, it evaluates the model on a validation set and makes predictions on a test set if one exists. Training is distributed across GPUs/TPUs using the Accelerate library to speed up training.
The
…/conditional_generation directory contains examples of conditional text generation using sequence-to-sequence models. For example,
…/peft_adalora_seq2seq.py fine-tunes a BART model for financial sentiment analysis, loading a pretrained BART and training it on a financial sentiment dataset.
…/peft_lora_seq2seq_accelerate_ds_zero3_offload.py trains a BART model on Twitter complaints data using PEFT and distributed training with Accelerate.
…/peft_lora_seq2seq_accelerate_fsdp.py trains a T5 model on a financial phrase bank dataset using PEFT and distributed training with Accelerate.
The
…/sequence_classification directory contains an example in
…/peft_no_lora_accelerate.py, which loads a pretrained model and the GLUE MRPC dataset, wraps the model with PEFT, and performs distributed training using gradient accumulation. It defines training and evaluation loops over epochs.
The
…/regression directory contains a comprehensive test suite for validating the core functionality of PEFT and preventing unintended breaks. This suite is run regularly to ensure all models and components continue working as expected.
The main test runner is in the file
…/test_regression.py. This handles both normal regression testing where outputs are compared to stored outputs, as well as "creation mode" where new outputs are saved for each test/version combination. Clean git/tag checks are done in creation mode to avoid accidentally overwriting tests.
Individual test classes define tests as methods. They load models from
…/tuners using the full model API, calculate outputs on sample data, and return results. This validates models are built correctly and operate as intended.
In normal mode, the runner compares fresh outputs to the stored ones. In creation mode, it saves new output files, then validates that the current commit is clean before finalizing. This process is version-controlled so that tests only change when the code changes, preventing unintended breaks.
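The compare-or-create control flow can be sketched like this (a hypothetical helper, not the actual test runner):

```python
import json
import os
import tempfile

STORE = tempfile.mkdtemp()  # stand-in for the versioned output directory

def check_regression(name, output, creation_mode=False):
    """Compare `output` against a stored reference, or record a new one."""
    path = os.path.join(STORE, f"{name}.json")
    if creation_mode:
        # The real suite additionally verifies a clean git checkout and a
        # version tag here before writing any reference output.
        with open(path, "w") as f:
            json.dump(output, f)
        return True
    with open(path) as f:
        expected = json.load(f)
    return expected == output

# Creation mode records the reference; normal mode compares against it.
check_regression("lora_linear", [0.1, 0.2], creation_mode=True)
assert check_regression("lora_linear", [0.1, 0.2])
assert not check_regression("lora_linear", [0.1, 0.3])
```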
The Conceptual & Developer Guides section provides documentation on different conceptual approaches and techniques for developing models using the PEFT library. This includes both theoretical explanations of parameter-efficient methods and practical guides for customizing models.
The
…/conceptual_guides directory contains documentation files that describe various approaches at a high level. The file
…/adapter.md discusses adapter-based methods for efficiently fine-tuning pretrained models using lightweight modules. The file
…/prompting.md provides an overview of soft prompting methods.
The
…/developer_guides directory contains guides for developing with PEFT. The file
…/contributing.md outlines best practices for code quality and testing when contributing. The file
…/custom_models.md demonstrates applying techniques like LoRA to custom models by identifying modules to target.
The file
…/lora.md documents the LoRA technique and its implementation. It discusses initialization options.
The base class in
…/layer.py provides common functionality inherited by layer-specific subclasses that apply the technique to different layer types. The class in
…/model.py handles injecting and managing modules.
The base class in
…/layer.py stores adapter parameters and implements weight merging. Layer-specific subclasses apply LoRA to different layer types, and quantization is managed via dedicated modules.
The
…/config.py file defines a configuration class to store hyperparameters for prompt tuning models.
The
…/model.py file defines a class which implements an embedding layer.
The
/__init__.py file imports core components for implementing prompt tuning. It provides standardized configuration and embedding components that offer a flexible approach for incorporating prompts into models.
The PEFT framework implements several quantization techniques to reduce model size under the
Quantization section. Quantization aims to compress models while maintaining performance by representing values with fewer bits.
The
…/lora directory provides quantization support for LoRA. LoRA represents the weight update as a low-rank approximation, stored efficiently in factored form; combined with quantized base weights, this reduces model size without losing much accuracy.
The
…/layer.py file implements LoRA functionality for different layer types. Key aspects include storing LoRA parameters, initializing weights, computing the adapter contribution to the weights, and merging weights. Layer-specific subclasses implement these aspects for their layer type.
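Merging is the step that folds the adapter into the base weight so that inference incurs no extra cost. A NumPy sketch of the equivalence (illustrative, not PEFT's layer code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 8, 2, 16
W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # trained LoRA factors
B = rng.normal(size=(d, r))
scaling = alpha / r

x = rng.normal(size=(5, d))

# Unmerged: base path plus adapter path on every forward call.
y_unmerged = x @ W.T + scaling * (x @ A.T @ B.T)

# Merged: fold the update into the base weight once, then run a plain
# linear layer with no inference-time overhead.
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

assert np.allclose(y_unmerged, y_merged)
```

Merging is also reversible: subtracting `scaling * (B @ A)` restores the original base weight.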
Configuration for LoRA is defined in a dedicated configuration class.
This section discusses guidelines for customizing models with PEFT techniques. The
…/custom_models.md file demonstrates how to apply PEFT's LoRA adapter technique to custom models that are not standard Transformer architectures.
It shows an example of applying LoRA to a simple sequential model. To apply LoRA, the guide prints the named modules to identify the layers to target, creates a configuration specifying these as the target modules along with the modules to save, then wraps the model and checks that only a small fraction of parameters need to be trained.
The file also discusses applying LoRA to models from a computer vision library. It loads a model, prints the named modules, and identifies the layers to target by matching their names with a regex. The configuration specifies the target modules and the modules to save; the guide again wraps the model and checks the fraction of trainable parameters.
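The target-identification step boils down to filtering module names. A sketch with hypothetical module names (not from any particular model):

```python
import re

# Hypothetical output of model.named_modules() for a small vision model.
module_names = [
    "features.0.conv", "features.0.bn",
    "features.1.conv", "features.1.bn",
    "classifier.fc",
]

# Select every conv layer plus the final classifier as LoRA targets.
pattern = re.compile(r"(features\.\d+\.conv|classifier\.fc)")
target_modules = [n for n in module_names if pattern.fullmatch(n)]

assert target_modules == [
    "features.0.conv", "features.1.conv", "classifier.fc",
]
```

The resulting list of names is what would be passed as the target-module setting in the adapter configuration.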
It also describes how to leverage PEFT's low-level API to inject trainable adapters into any module. Currently this supports injecting adapters such as LoRA in place by directly modifying the model's modules, allowing users to fine-tune models without relying on PEFT's modeling classes.
The file concludes with tips for applying PEFT techniques like LoRA to new architectures: check the existing model mappings, and identify the layers to specify as targets.
This section focuses on using PEFT together with other libraries like Accelerate and PyTorch.
The PEFT library integrates closely with Accelerate to enable distributed training of large models across multiple GPUs and TPUs. Several examples show using Accelerate for distributed training, including
…/peft_no_lora_accelerate.py, where gradients from multiple batches are accumulated before the model is updated (gradient accumulation). Another example is using DeepSpeed ZeRO in
…/peft_lora_clm_accelerate_ds_zero3_offload.py along with CPU offloading to reduce memory usage during distributed training.
PEFT also integrates closely with PyTorch functionality. Functions are provided to load PyTorch models and extract their parameters. Tests in
tests validate the PEFT transformations by comparing the model structure and parameters before and after applying techniques like adapters.
The PEFT library provides efficient methods for distributed training of large models using parameter-efficient techniques. PEFT integrates with libraries like Accelerate to allow distributed training of models augmented with its methods.
Some key aspects of using PEFT with distributed training include:
The training loops in files such as
…/peft_lora_clm_accelerate_ds_zero3_offload.py calculate losses on batches and accumulate gradients across devices during optimization.
…/peft_lora_seq2seq_accelerate_fsdp.py shows using techniques like Fully Sharded Data Parallel (FSDP) for efficient distributed training of large sequence models.
…/accelerate provides guidance on using these techniques with PEFT to train the largest models at scale.
The PEFT library leverages PyTorch functionality through classes defined in key files. The
…/peft_model.py file handles loading and saving adapters and dispatching the forward pass.
…/config.py file defines configuration classes that inherit attributes to serialize hyperparameters.
…/mapping.py file contains a class that maps configurations to model classes, enabling automatic loading of different PyTorch model types.
Tests in the
tests directory leverage PyTorch modules, optimizers, and utilities to validate models can be trained end-to-end with PyTorch. The
…/regression subdirectory contains tests that compare or store outputs.
The
…/regression directory contains a comprehensive suite of tests to validate the core functionality of PEFT and prevent unintended breaks. These regression tests cover a range of models, components, and use cases.
The main test runner is in the file
…/test_regression.py. This handles both normal regression testing, where outputs are compared to stored outputs, and "creation mode", where new outputs are saved for each test/version combination. Clean git/tag checks are done in creation mode to avoid accidentally overwriting stored reference outputs.
Individual test classes define tests as methods, load models from
…/peft, calculate outputs on sample data, and return results. This allows testing specific parts of the code in a modular way. Models are built and called directly in the tests to validate core functionality works as expected.
Creation mode guards existing tests against accidental corruption: its git/tag checks ensure that reference outputs are only recorded from a clean, tagged state, so stored outputs change only when the code itself has changed between tags. This validates that changes do not inadvertently cause existing tests to fail without reason.