Auto Wiki by Mutable.ai

llama

Auto-generated from facebookresearch/llama by Mutable.ai Auto Wiki

llama
GitHub Repository
Developer: facebookresearch
Written in: Python
Stars: 48k
Watchers: 469
Created: 2023-02-14
Last updated: 2024-01-05
License: Other
Repository: facebookresearch/llama
Auto Wiki
Generated at: 2024-01-05
Generated from: Commit ef351e
Version: 0.0.4

Llama provides APIs and tooling for generating natural language text using large language models. The key functionality centers around the Llama class, which handles loading a pretrained model and providing methods to generate text continuations and conversational responses.

The Llama generator is built from model weights and a tokenizer located in llama. The model implements an efficient transformer architecture, while the tokenizer handles encoding and decoding text using SentencePiece.

Once built, the Llama instance exposes methods that take a prompt, feed it to the model, and sample tokens one at a time to produce a completion. Higher-level methods provide easy-to-use interfaces for text completion and dialog responses.
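A minimal usage sketch (the checkpoint and tokenizer paths are placeholders, and the hyperparameter values are illustrative):

    from llama import Llama

    # Build a generator from downloaded weights and a SentencePiece model.
    generator = Llama.build(
        ckpt_dir="llama-2-7b/",            # placeholder checkpoint directory
        tokenizer_path="tokenizer.model",  # placeholder tokenizer path
        max_seq_len=512,
        max_batch_size=4,
    )

    # Complete a prompt; temperature and top_p control sampling.
    results = generator.text_completion(
        ["The theory of relativity states that"],
        max_gen_len=64,
        temperature=0.6,
        top_p=0.9,
    )
    print(results[0]["generation"])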

The llama examples show building a Llama generator and calling its methods to produce text for various prompts. Governance docs provide guidelines around intended use cases and contributions. The model card and documentation provide details on model training, evaluation, and deployment.

Overall, Llama focuses on providing a simple interface through the Llama class to generate text in a scalable and efficient way. The examples, governance, and documentation aim to promote safe and responsible application of the models.

Text Generation

References: llama, example_chat_completion.py, example_text_completion.py

This section covers text generation functionality for producing text continuations and chat responses. The library provides two main interfaces: the Llama class's generation methods for open-ended text, and example scripts for text and chat completion that build on the Llama class.

The Llama class in the …/generation.py file is the primary interface for text generation. It handles initializing the model from …/model.py, preprocessing text with the tokenizer from …/tokenizer.py, and generating responses. Responses are generated autoregressively: the model is fed the prompt and then samples one token at a time, as sketched below.
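A simplified sketch of this decoding loop (a hypothetical greedy_decode helper, not the repository's generate method, which also handles batching, key/value caching, top-p sampling, and padding):

    import torch

    def greedy_decode(model, tokenizer, prompt: str, max_gen_len: int) -> str:
        # Encode the prompt with a leading BOS token.
        tokens = tokenizer.encode(prompt, bos=True, eos=False)
        for _ in range(max_gen_len):
            # Recompute logits over the whole sequence; the real code
            # caches keys/values and only feeds the newest tokens.
            logits = model(torch.tensor([tokens]))  # (1, seq_len, vocab_size)
            next_token = int(torch.argmax(logits[0, -1]))  # greedy pick
            if next_token == tokenizer.eos_id:
                break
            tokens.append(next_token)
        return tokenizer.decode(tokens)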

The example chat completion script in example_chat_completion.py builds a Llama generator from checkpoint and tokenizer paths, then passes a batch of dialogs to the generator's chat_completion method along with generation hyperparameters such as temperature and top-p.
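Each dialog is a list of messages with role and content fields. Given a generator built as in the earlier sketch, the call looks roughly like this (values are illustrative):

    dialogs = [
        [
            {"role": "system", "content": "Always answer concisely."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
    ]

    results = generator.chat_completion(
        dialogs,
        max_gen_len=128,
        temperature=0.6,
        top_p=0.9,
    )
    # Each result holds an assistant message with role and content fields.
    print(results[0]["generation"]["content"])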

The example text completion script in example_text_completion.py likewise builds a Llama generator, defines sample prompts, and calls the generator's text_completion method to return a completion for each prompt. A CLI entry point runs the script.
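The overall shape of such a script (a sketch; the actual example defines additional options and prompts, and uses the fire library for its CLI):

    import fire
    from llama import Llama

    def main(
        ckpt_dir: str,
        tokenizer_path: str,
        temperature: float = 0.6,
        top_p: float = 0.9,
        max_seq_len: int = 128,
        max_gen_len: int = 64,
    ):
        generator = Llama.build(
            ckpt_dir=ckpt_dir,
            tokenizer_path=tokenizer_path,
            max_seq_len=max_seq_len,
            max_batch_size=4,
        )
        prompts = ["I believe the meaning of life is"]
        results = generator.text_completion(
            prompts, max_gen_len=max_gen_len, temperature=temperature, top_p=top_p
        )
        for prompt, result in zip(prompts, results):
            print(prompt + result["generation"])

    if __name__ == "__main__":
        fire.Fire(main)  # exposes main's arguments as command-line flags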

Model

References: llama/model.py

The model code is contained in the …/model.py file. Hyperparameters such as the hidden dimension, number of layers, and number of attention heads are defined in the ModelArgs dataclass.
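A sketch of that container (field names follow the repository; the defaults shown are illustrative and are overridden by each checkpoint's configuration):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ModelArgs:
        dim: int = 4096                    # hidden size
        n_layers: int = 32                 # number of transformer blocks
        n_heads: int = 32                  # attention heads
        n_kv_heads: Optional[int] = None   # key/value heads (grouped-query attention)
        vocab_size: int = -1               # filled in from the tokenizer at build time
        multiple_of: int = 256             # rounds the feed-forward hidden size
        ffn_dim_multiplier: Optional[float] = None
        norm_eps: float = 1e-5             # epsilon used by RMSNorm
        max_batch_size: int = 32
        max_seq_len: int = 2048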

The core of the model is implemented in the Transformer class. It embeds the input tokens, passes them through a stack of TransformerBlock modules, and projects the final hidden states to vocabulary logits. Each block contains an attention layer followed by a feed-forward layer, each preceded by normalization and wrapped in a residual connection.
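In simplified form, a block looks like this (a sketch that omits the position, rotary-frequency, and mask arguments the real forward methods take):

    import torch.nn as nn

    class TransformerBlock(nn.Module):
        """Sketch of one block: pre-norm attention plus feed-forward."""

        def __init__(self, args):
            super().__init__()
            self.attention = Attention(args)        # multi-head self-attention
            self.feed_forward = FeedForward(args)   # SwiGLU MLP
            self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
            self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)

        def forward(self, x):
            h = x + self.attention(self.attention_norm(x))    # residual 1
            return h + self.feed_forward(self.ffn_norm(h))    # residual 2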

The Attention layer implements multi-head attention, and the apply_rotary_emb function applies rotary position embeddings to the query and key tensors using precomputed frequencies. The linear projections use fairscale model-parallel layers, which allows for efficient parallel computation across GPUs.

Other important functionality includes RMSNorm, a root-mean-square layer normalization, and precompute_freqs_cis, a function for precomputing the rotary embedding frequencies.
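Both are small enough to sketch (consistent with the file's definitions, lightly condensed):

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        """Root-mean-square normalization with a learned scale (sketch)."""

        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            norm = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
            return norm * self.weight

    def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
        """Complex rotation factors for rotary embeddings (sketch)."""
        freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(end, dtype=torch.float32)
        angles = torch.outer(t, freqs)                        # (end, dim/2)
        return torch.polar(torch.ones_like(angles), angles)   # complex64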

Tokenization

References: llama/tokenizer.py

The …/tokenizer.py file provides functionality for encoding and decoding text. It loads a SentencePiece model for tokenization from a specified file path on initialization.

The file defines a Tokenizer class that handles the encoding and decoding of text. The class stores important attributes from the loaded SentencePiece model, such as the vocabulary size and the IDs of the beginning-of-sentence (BOS), end-of-sentence (EOS), and padding tokens. Its encode method takes a string, optionally prepends the BOS token and appends the EOS token, and encodes the string to a list of token IDs using the SentencePiece encoder. Its decode method takes a list of token IDs and decodes them back to a string.
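A usage sketch (the model path is a placeholder, and the printed token IDs are illustrative since they depend on the loaded vocabulary):

    from llama.tokenizer import Tokenizer

    tok = Tokenizer(model_path="tokenizer.model")  # placeholder path

    ids = tok.encode("Hello, world!", bos=True, eos=False)
    print(ids)              # e.g. [1, 15043, 29892, 3186, 29991]
    print(tok.decode(ids))  # "Hello, world!"

    # Attributes exposed from the SentencePiece model.
    print(tok.n_words, tok.bos_id, tok.eos_id, tok.pad_id)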

SentencePiece is used for tokenization as it provides both efficiency and support for Unicode characters. The BOS and EOS tokens allow full sentences to be encoded and decoded. Token IDs rather than raw strings are used to represent the text, and padding is supported via a padding token ID.

The class provides a clean interface for encoding and decoding text that abstracts away the SentencePiece model specifics. Programmers interacting with the tokenizer only need to know how to call its encoding and decoding methods rather than handling the SentencePiece model directly.

Text Generation Interface

References: llama/generation.py

The Llama class in the file …/generation.py provides a high-level API for generating text. Its build method constructs a generator: it initializes the distributed environment, loads the checkpoint weights and tokenizer, and instantiates the model. The text_completion and chat_completion methods wrap a lower-level generate method that performs the sampling loop.
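When sampling with a nonzero temperature, the next token is drawn with nucleus (top-p) sampling. A sketch in the spirit of the file's sample_top_p helper:

    import torch

    def sample_top_p(probs: torch.Tensor, p: float) -> torch.Tensor:
        """Sample from the smallest token set whose probability mass exceeds p."""
        probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
        cumulative = torch.cumsum(probs_sort, dim=-1)
        # Drop tokens once the mass accumulated before them exceeds p.
        probs_sort[cumulative - probs_sort > p] = 0.0
        probs_sort.div_(probs_sort.sum(dim=-1, keepdim=True))  # renormalize
        next_token = torch.multinomial(probs_sort, num_samples=1)
        return torch.gather(probs_idx, -1, next_token)  # map back to vocab indices

Sorting the probabilities first makes the cumulative-mass cutoff a single vectorized comparison rather than a per-token loop.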

Governance

References: CODE_OF_CONDUCT.md, CONTRIBUTING.md, USE_POLICY.md

This section outlines the guidelines and policies for using and contributing to the Llama project. The CODE_OF_CONDUCT.md file establishes a code of conduct for inclusive, harassment-free participation and describes maintainers' responsibilities to clarify standards and enforce the policy.

The CONTRIBUTING.md file provides guidelines for contributing code through pull requests and filing issues on GitHub. Contributors must complete a Contributor License Agreement to submit pull requests. The file outlines the pull request workflow and expectations for tests, documentation, and code quality. It also notes security bugs should be reported privately rather than through public issues.

The USE_POLICY.md file defines Meta's Acceptable Use Policy for Llama 2. It prohibits illegal or harmful uses such as violating laws, harassment, deception, or failing to disclose known dangers. Reporting mechanisms are in place for policy violations, risky model outputs, and software issues.

Documentation

References: README.md, MODEL_CARD.md, UPDATES.md

The README.md file provides documentation on downloading and using the various Llama 2 models for text generation. It discusses important aspects like model sizes, required resources, quick start instructions, formatting requirements, licensing, and references for additional technical documentation.

The MODEL_CARD.md file contains a detailed model card following the ML Metadata Profile format. It describes the Llama 2 model architecture, training methodology, intended uses, benchmarks, and important considerations for developing applications with the models. Performance is benchmarked on commonsense reasoning, math, safety, and other skills.

The UPDATES.md file summarizes changes made to the Llama system. One update removed the default system prompt to reduce false refusals; another sanitized the special strings used in prompt formatting to prevent prompt manipulation while still allowing experimentation. The file gives high-level descriptions of the issues addressed and the approaches taken.