Auto-generated from openai/openai-cookbook by Mutable.ai Auto Wiki
The openai-cookbook repository provides code examples and documentation for building natural language applications using OpenAI's APIs and models. It demonstrates techniques like question answering, text generation, semantic search, prompt engineering, and responsible AI practices.
Some of the key functionality includes:
Fine-tuned question answering models to answer questions based on a context string, as shown in
…/fine-tuned_qa. The directory contains the main logic to call the OpenAI Search API to build a context string from a search file, and the Completion API with a fine-tuned model to generate an answer based on that context.
Leveraging vector databases like Pinecone, Weaviate, and others for semantic search, as demonstrated in
…/vector_databases. The vector databases allow storing embeddings to find semantically similar matches and provide relevant context to reduce hallucinations. Specific techniques like question answering, hybrid search, tuning search parameters, managed cloud services, and SQL support are shown.
Prompt engineering through careful prompt design, priming, and chaining to improve model outputs, explained in
articles. For example, one article demonstrates incremental reasoning through selection-inference prompting.
Generating code snippets from natural language descriptions.
Writing high-quality documentation and following code conventions, testing practices, and licensing guidelines as described in the repository documentation.
Responsible AI practices like evaluating model outputs, considering potential harms, and not automating without human oversight.
The repository aims to demonstrate production-ready implementations using OpenAI's capabilities through modular, well-tested code following software engineering best practices. The documentation focuses on explaining concepts, providing actionable examples and tutorials, and linking to additional resources.
This code demonstrates using fine-tuned question answering models from OpenAI to answer questions based on a generated context string.
A function retrieves relevant passages from a search file given a question. It calls the OpenAI Search API to find the most similar context, returning passages within a maximum total length by appending them to a list.
Another function takes a question, a generated context string, and a fine-tuned QA model ID. It first calls the context retrieval function to get the context, then calls the OpenAI Completion API to answer the question based on that context, setting completion parameters and returning the first choice from the response.
An ArgumentParser is defined to accept command line arguments, and the question answering function is called to demonstrate answering a question from the command line input. Try/except is used to catch any errors.
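The retrieve-then-answer flow described above can be sketched roughly as follows. The Search API used by the original script is deprecated, so this sketch stands in a simple word-overlap ranking for retrieval and only assembles the final completion prompt; the function names, passage separator, and 2000-character budget are illustrative, not taken from the repository.

```python
def create_context(question, passages, max_len=2000):
    """Rank passages by word overlap with the question and
    concatenate the best ones until max_len is reached."""
    q_words = set(question.lower().split())
    ranked = sorted(passages,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    context, total = [], 0
    for p in ranked:
        if total + len(p) > max_len:
            break
        context.append(p)
        total += len(p)
    return "\n###\n".join(context)

def build_qa_prompt(question, context):
    """Assemble the prompt a fine-tuned completion model would receive."""
    return (f"Answer the question based on the context below.\n\n"
            f"Context: {context}\n\n---\n\nQuestion: {question}\nAnswer:")

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts light energy into chemical energy.",
]
prompt = build_qa_prompt("Where is the Eiffel Tower?",
                         create_context("Where is the Eiffel Tower?", passages))
```

In the real script, the assembled prompt would be sent to the Completion API with the fine-tuned model ID and the first choice from the response returned as the answer.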
The examples directory contains examples demonstrating how to generate text using OpenAI APIs. This includes generating text snippets based on prompts as well as machine translation capabilities.
The …/Backtranslation_of_SQL_queries.py file contains code to generate SQL queries from natural language instructions.
The …/embeddings_utils.py file contains functionality for working with text embeddings from OpenAI API models.
The …/weaviate directory contains examples using the Weaviate vector database combined with OpenAI models. Weaviate allows storing and retrieving embeddings to perform semantic vector search.
Vector similarity allows finding semantically related objects by calculating distances between vector embeddings in multidimensional space. Many databases support vector search through specialized functions and indexing. The
…/vector_databases directory contains code demonstrating semantic search on various databases.
The …/redisqna subdirectory shows building a question answering system by storing documents and their embeddings in Redis.
The …/weaviate directory contains examples using Weaviate, an open-source vector search engine. Notebooks demonstrate tasks like question answering on stored context.
The …/typesense examples show indexing embeddings into Typesense, an in-memory search engine.
The …/kusto directory provides examples storing embeddings in Azure Data Explorer and performing searches on them.
Vector databases like Pinecone and Weaviate allow storing embeddings or vectors that represent complex data, enabling efficient semantic search through vector similarity comparisons. Both services support storing, searching, and retrieving vectors at scale.
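The vector similarity comparisons these services perform typically come down to cosine similarity between embedding vectors. A minimal pure-Python sketch (real OpenAI embeddings are lists of 1536+ floats; the toy 3-dimensional vectors here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings".
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]
doc_far = [0.0, 1.0, 0.0]

assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
```

Vector databases apply the same idea at scale, using approximate nearest neighbor indexes so every comparison does not have to be computed exactly.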
Pinecone is a managed vector database service that offers high performance, accuracy, and scalability for complex search tasks. The
…/pinecone directory contains examples using Pinecone with OpenAI models. The
…/README.md file describes three natural language applications: augmenting models with retrieval, question answering by retrieving relevant passages, and building a semantic search process. These leverage Pinecone features like single-stage filtering.
Weaviate is an open-source vector search engine that can be run locally or in the cloud. The
…/weaviate directory contains Jupyter notebooks demonstrating use with OpenAI. Jupyter notebooks in the directory show semantic search, hybrid keyword-vector search, and question answering workflows using Weaviate.
The code in
…/redisqna contains examples for building a question answering system. Documents are first preprocessed and embedded into vectors. These embeddings are stored in a Redis database.
To answer questions, the system finds similar documents using Redis search commands. It analyzes the top documents to extract answers from relevant contexts. Techniques like named entity recognition and part-of-speech tagging help identify potential answer spans. Candidate answers are scored and the top answer is returned along with relevant contexts and metadata.
The modular design allows replacing components like the embedding or storage models. Storing embeddings in Redis enables fast semantic searches without extensive feature engineering on queries.
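The storage-and-retrieval pattern can be sketched with an in-memory stand-in for the Redis index; a real deployment would use the redis client and its vector search features, and the class and method names below are illustrative, not from the repository.

```python
import math

class EmbeddingStore:
    """In-memory stand-in for a Redis index mapping document id -> embedding."""
    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text, embedding):
        self._docs[doc_id] = (text, embedding)

    def search(self, query_embedding, top_k=1):
        """Return the top_k most similar documents by cosine similarity."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        scored = sorted(self._docs.items(),
                        key=lambda kv: cos(query_embedding, kv[1][1]),
                        reverse=True)
        return [(doc_id, text) for doc_id, (text, _) in scored[:top_k]]

store = EmbeddingStore()
store.add("doc1", "Redis is an in-memory data store.", [1.0, 0.0])
store.add("doc2", "Paris is the capital of France.", [0.0, 1.0])
results = store.search([0.9, 0.1], top_k=1)
```

Swapping this stand-in for a Redis-backed implementation illustrates the modular design the section describes: the storage layer changes while the retrieval interface stays the same.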
The Weaviate class in the
…/weaviate module handles data storage and retrieval. It stores objects containing text fields and embedded vectors.
To perform hybrid search, a query is passed to the hybrid search method of the Weaviate class. This method handles both keyword matching on text fields and nearest neighbor search on vectors. Keywords are matched using full-text search on the stored text fields. Vectors are searched using ANN to find semantically similar objects.
The results from both searches are combined and ranked based on relevance to the original query. Objects that match both keywords and are a vector nearest neighbor receive higher scores. This allows more accurate retrieval of objects that are related both textually and semantically to the query.
An embedding method in the module converts text fields into vectors. These are stored within Weaviate objects alongside the original text. On retrieval, the same model embeds the query to find nearest neighbors in the vector space.
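The combination of keyword and vector results described above is often implemented as a weighted score fusion. A toy sketch (the alpha weighting is illustrative, not Weaviate's exact formula):

```python
def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Fuse keyword (BM25-like) and vector similarity scores per object id.
    Objects appearing in both result sets naturally rank higher."""
    ids = set(keyword_scores) | set(vector_scores)
    fused = {i: alpha * keyword_scores.get(i, 0.0)
                + (1 - alpha) * vector_scores.get(i, 0.0)
             for i in ids}
    return sorted(fused, key=fused.get, reverse=True)

# Object "a" matches both searches, so it outranks objects matching only one.
keyword_scores = {"a": 0.9, "b": 0.4}
vector_scores = {"a": 0.7, "c": 0.8}
ranking = hybrid_rank(keyword_scores, vector_scores)
```

The alpha parameter trades off keyword relevance against semantic similarity; Weaviate exposes a comparable knob on its hybrid queries.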
This section covers optimizing parameters for vector search performance in Typesense. Typesense allows tuning the k parameter to control the number of results returned from a vector search query. As the Typesense documentation describes, a lower k value returns fewer results faster, while a higher k searches more documents to find better matches.
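A vector query with an explicit k might look like the following request body, based on the vector_query syntax in Typesense's multi_search endpoint; the collection and field names are made up for illustration.

```python
# Request body for Typesense's multi_search endpoint. A lower k returns
# fewer (faster) results; a higher k scans more candidate documents.
query_embedding = [0.12, 0.45, 0.87]
search_request = {
    "searches": [
        {
            "collection": "articles",   # hypothetical collection name
            "q": "*",
            "vector_query": f"embedding:({query_embedding}, k:10)",
        }
    ]
}
```

The request would normally be sent with a Typesense client or plain HTTP; only the shape of the payload is shown here.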
Weaviate also allows tuning search performance. The
…/weaviate directory contains Jupyter notebooks demonstrating Weaviate usage, such as storing text and performing semantic searches on stored vectors. Weaviate's search functions likewise accept a parameter to limit the number of results returned.
Elasticsearch allows indexing vector embeddings for efficient semantic search and retrieval at scale. The
/elasticsearch directory contains Jupyter notebooks that demonstrate indexing the OpenAI Wikipedia embeddings and performing semantic searches. One notebook encodes a question into vectors and searches Elasticsearch, while another selects the top hit and uses it to prime the OpenAI Chat Completions API for retrieval augmented generation.
MongoDB Atlas provides the Vector Search capability, which simplifies indexing and searching high-dimensional vector data stored in MongoDB collections. The
/mongodb_atlas directory contains notebooks that show end-to-end workflows for building a vector database with MongoDB Atlas, including indexing a vector dataset, performing searches on the indexed data, and evaluating the results.
Supabase leverages a Postgres extension that allows storing embeddings directly in a Postgres database. This provides a unified data store for both structured data and embeddings. Supabase adds additional services like auto-generated REST and GraphQL APIs, realtime APIs, authentication, file storage, and edge functions. These services can be used to build full-stack semantic search applications with minimal configuration or deployment work.
The extension allows embedding vectors directly into Postgres columns and indexing them with approximate nearest neighbor index types such as IVFFlat. This enables fast similarity searches on the embedded vectors. Supabase provides documentation on using the extension's vector columns and indexes through its Postgres API. Role-based access control can be configured through Supabase's permissions system to restrict access to different vector data.
MongoDB Atlas also provides a fully managed vector search service. It simplifies deploying and managing vector indexes in MongoDB collections hosted on Atlas. Users can augment existing MongoDB data with vectors or use MongoDB solely as a vector database. The service automatically handles scaling the vector indexes as data grows. Its documentation linked from
…/README.md provides details on performing searches, indexing vectors, and evaluating results.
Both Supabase and MongoDB Atlas eliminate the need to provision and manage infrastructure for vector databases. Their auto-scaling capabilities and access controls integrate vectors seamlessly into application databases with minimal configuration or maintenance overhead.
This section covers self-hosted open source vector databases that can be run on-premises. The examples in the
…/vector_databases directory demonstrate how to use Weaviate and Typesense for semantic vector search with OpenAI models.
Weaviate allows storing and retrieving embeddings to perform semantic searches. As shown in the
…/weaviate directory, Weaviate can index embeddings from text and supports different "connectors" to run locally or on cloud platforms.
Typesense is an open source in-memory search engine that stores the entire index in RAM for fast performance. As demonstrated in the
…/typesense directory, Typesense allows indexing documents containing text and embedding vectors. It supports vector searches to return the most relevant documents based on cosine similarity between the query vector and document vectors. The
…/README.md file provides instructions for using Typesense with OpenAI embeddings.
Both Weaviate and Typesense provide self-hosted options to run the vector databases on-premises. Weaviate supports different deployment configurations as described in its documentation, while Typesense focuses on performance and supports deployment either self-hosted or on Typesense Cloud. The examples in this section demonstrate how to index OpenAI embeddings into Weaviate and Typesense and perform semantic vector searches on the stored data.
SQL databases with vector support like SingleStoreDB allow for fast vector search and similarity calculations through vector data types and functions. SingleStoreDB is an SQL database that supports vector types and provides functions for nearest neighbor search and similarity calculations between vectors.
The vector similarity functions allow for similarity calculations between vectors stored in the database. These functions enable fast nearest neighbor search to find semantically similar vectors. This allows applications to improve accuracy by finding related vectors through semantic search.
SingleStoreDB is described as a high-performance and scalable SQL database, though no lower-level implementation details are provided about how it achieves this performance. The
…/SingleStoreDB directory contains Jupyter notebooks demonstrating how to insert vector data into SingleStoreDB, query vectors using the vector similarity functions, and combine the results with OpenAI models. For example, one notebook shows improving a question answering algorithm's accuracy by using the vector similarity functions to find related context from the database before attempting to answer.
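A similarity query of this kind can be sketched in SQL built from Python. DOT_PRODUCT and JSON_ARRAY_PACK are real SingleStoreDB functions for scoring against a packed vector column, but the table and column names below are illustrative:

```python
import json

def build_similarity_query(query_embedding, table="articles", top_k=5):
    """Build a SingleStoreDB query ranking rows by dot-product similarity
    between a packed query vector and a stored vector column."""
    vec = json.dumps(query_embedding)
    return (
        f"SELECT id, text, "
        f"DOT_PRODUCT(embedding, JSON_ARRAY_PACK('{vec}')) AS score "
        f"FROM {table} ORDER BY score DESC LIMIT {top_k};"
    )

sql = build_similarity_query([0.1, 0.2, 0.3])
```

The top rows returned by such a query supply the related context that the notebook feeds to the OpenAI model before answering.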
This section demonstrates leveraging Cassandra's vector similarity features through the examples in the
…/cassandra_astradb directory. The examples show implementing a quote generation workflow backed by Cassandra and Astra DB for semantic vector searches.
The examples each load existing quote vectors from a Cassandra database table and calculate their similarity to a given input quote. This allows retrieving the most semantically similar quotes based on their vector embeddings. The top results are then used to generate new quotes via prompt tuning techniques from OpenAI. The newly created quotes and their vectors are indexed back into the Cassandra database for future semantic retrieval.
The examples demonstrate three different methods for interfacing with the Cassandra database: the synchronous Python driver, an asynchronous abstraction layer over the driver, and raw CQL queries. Each approach implements the same basic workflow with a different API.
The …/README.md file provides helpful documentation, including a visualization of vector embeddings to explain how searches in multidimensional vector space relate to semantics.
The …/redis directory contains examples of storing vector embeddings in Redis and performing semantic searches using those embeddings.
The …/redisqna subdirectory contains a question answering system that indexes documents into vectors, including a class for embedding texts. These embeddings are stored in Redis for efficient retrieval and comparison.
The question answering logic retrieves similar document vectors from Redis. These documents are analyzed for answers, which are returned to the user. By storing embeddings in Redis, similar documents can be retrieved quickly for semantic matching.
The …/nbutils.py file contains functions for downloading and preprocessing Wikipedia article embedding data.
Azure Data Explorer (Kusto) allows storing vector embeddings as dynamic columns in tables. This enables semantic searches over corpus data. The
…/kusto directory contains examples demonstrating this functionality.
The …/README.md file provides an overview of using Kusto for vector search. It shows how to take precomputed embeddings from OpenAI and store them in a Kusto table as dynamic columns. A text query can then be converted to an embedding with OpenAI and compared to the stored embeddings. This allows performing semantic searches over the dataset, such as a Wikipedia corpus.
The examples store Wikipedia article embeddings from OpenAI in a Kusto table. The embeddings are stored as dynamic columns, taking advantage of Kusto's support for dynamic data types. To search, a query text is passed to OpenAI to generate an embedding vector, which is then compared to the stored embeddings to return the closest matches based on semantic similarity.
The articles directory contains several files that provide guidance on designing effective prompts for large language models. The
…/techniques_to_improve_reliability.md file discusses using selection-inference prompting to improve model performance on multi-step reasoning problems. Selection-inference prompting alternates between having the model select relevant facts and make inferences to generate multiple reasoning steps. When applied to benchmarks requiring long reasoning chains, this technique significantly outperformed single-step prompting according to results cited in the file.
The file also describes using a model to evaluate its own inferences. After each step, the model is asked if it has enough information to answer the question. It can then choose to continue reasoning or stop. This helps avoid incorrect early guesses. Another technique trains a model to classify solutions to problems as correct or incorrect. This model is then used to select the best solution from 100 generations. This substantially improved accuracy on math problems compared to generative-only models according to results cited in the file.
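The best-of-n selection described above can be sketched with a stubbed verifier. In practice both the generator and the verifier are model calls; here they are toy functions, and the names are illustrative:

```python
def select_best(candidates, verifier):
    """Score every candidate solution with the verifier, return the best."""
    return max(candidates, key=verifier)

# Stub verifier: pretend solutions ending in the correct answer score higher.
# A real verifier would be a fine-tuned model returning a correctness score.
def toy_verifier(solution):
    return 1.0 if solution.endswith("= 42") else 0.1

candidates = ["6 * 7 = 42", "6 + 7 = 13", "6 * 8 = 48"]
best = select_best(candidates, toy_verifier)
```

With 100 model generations in place of the three toy candidates, this is the selection step the cited results evaluate.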
The …/text_comparison_examples.md file explains how to use text embeddings for tasks like semantic search, question answering, and recommendations. It provides examples of embedding text, storing embeddings in a database, and computing cosine similarity between embeddings. For question answering, semantic search is first used to find relevant documents, which are then included in a prompt to GPT-3 to generate an answer. Recommendations work similarly, comparing item embeddings. The file also describes customizing embeddings for a domain by training a matrix modification.
The …/related_resources.md file centralizes libraries, tools, guides, courses, and papers related to improving model outputs and prompt engineering. It lists over 20 prompting libraries and platforms, guides on techniques, video courses, and over a dozen papers on advanced prompting methods.
Prompt structure refers to how prompts are formatted and organized when interacting with large language models. The structure of a prompt can help guide the model's behavior and ensure it understands the task. Some important aspects of prompt structure include:
- Using clear section headers and paragraph breaks to separate different parts of the prompt, such as an introduction, examples, and instructions. This helps the model easily parse the relevant information.
- Providing context, background details, and examples up front to prime the model before giving instructions. For complex tasks, a scenario or story can help illustrate the desired behavior.
- Being specific yet concise when giving instructions to avoid ambiguity. It is best to tell the model the expected behavior directly rather than implying it or asking open-ended questions.
- Structuring prompts incrementally, such as first showing examples and then providing a template for the model to complete, or breaking a multi-step problem into clear sub-tasks.
- Using consistent formatting and language throughout the prompt so the model understands the context and task requirements uniformly.
The …/fine-tuned_qa directory demonstrates structuring a prompt for question answering. It uses functions to first retrieve relevant passages from a search file and concatenate them into a context string. Then it takes the question, context, and fine-tuned QA model to generate an answer. This clearly separates the steps for the model.
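Put together, a structured prompt following these guidelines might be assembled like this; the section labels and template text are illustrative, not taken from the repository:

```python
def structured_prompt(context, examples, instruction, question):
    """Assemble a prompt with clearly separated sections, examples up front."""
    example_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"Background:\n{context}\n\n"
        f"Examples:\n{example_block}\n\n"
        f"Instruction: {instruction}\n\n"
        f"Q: {question}\nA:"
    )

prompt = structured_prompt(
    context="You answer questions about European capitals.",
    examples=[("What is the capital of France?", "Paris")],
    instruction="Answer with a single city name.",
    question="What is the capital of Spain?",
)
```

Each section has a consistent label and ordering, so the model sees context and examples before the instruction and the final question.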
Prompt priming involves warming up language models with relevant examples and context before asking them to complete tasks. This helps the models better understand the types of inputs, outputs, and reasoning required. When priming prompts, it is important to provide a variety of representative examples and contextual information.
Some key files related to prompt priming include:
The …/answers_with_ft.py file contains functions for retrieving context and answering questions. Providing context helps prime the model to understand the topic and question domain before generating an answer.
The …/embeddings_utils.py file contains functions for working with text embeddings.
Providing a variety of high-quality examples through functions helps language models better understand the types of inputs, contexts and expected outputs for different tasks, which can improve their accuracy and reliability when completing prompts. The model is primed to have a base understanding before responding.
Prompt chaining involves breaking down prompts into multiple steps by iteratively selecting relevant facts and making inferences based on those facts. As described in the article on improving reliability techniques, this technique significantly outperformed single prompts on benchmarks requiring long reasoning chains according to the results cited.
The selection-inference prompting approach discussed in the article guides models through problems incrementally. It involves alternating between prompting the model to select important facts from a list, and then prompting the model to make an inference based on the selected facts. This process is repeated in a loop. Between each iteration, the model can be asked if it needs additional steps to solve the problem or if it can provide the answer. By getting feedback from the model, it helps avoid incorrect guesses compared to single prompts.
Selection-inference prompting provides an example of how prompt chaining can improve reliability on multi-step problems. By breaking the reasoning process into discrete steps and obtaining feedback from the model, it helps models reason through problems incrementally.
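The selection-inference loop can be sketched with stubbed model calls. A real implementation would send each selection and inference prompt to the API; select_fact, make_inference, and is_done below are toy stand-ins supplied by the caller:

```python
def selection_inference(facts, question, select_fact, make_inference, is_done,
                        max_steps=5):
    """Alternate fact selection and inference until the model signals done."""
    selected, inferences = [], []
    for _ in range(max_steps):
        fact = select_fact(facts, selected, question)          # "selection" prompt
        selected.append(fact)
        inferences.append(make_inference(selected, question))  # "inference" prompt
        if is_done(inferences, question):                      # halting check
            break
    return inferences

facts = ["All birds have feathers.", "A robin is a bird."]
result = selection_inference(
    facts,
    "Does a robin have feathers?",
    select_fact=lambda fs, sel, q: fs[len(sel)],
    make_inference=lambda sel, q: " ".join(sel),
    is_done=lambda infs, q: len(infs) == 2,
)
```

The halting check is the feedback step the article describes: after each inference the model is asked whether it can answer yet, which keeps it from guessing early.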
Large language models can generate harmful, toxic, dangerous or unethical responses if not carefully guided. Prompt engineering techniques can help avoid problematic content and ensure models respond respectfully and helpfully. When designing prompts, it is important to consider how the prompt may affect the types of responses generated and whether certain topics could lead discussions in an unsafe direction. Prompts should be reviewed and tested to minimize the risk of content that could cause real-world harm.
Some techniques for safe prompting include:
- Avoiding prompts that could promote harm, such as those relating to illegal or dangerous acts. This reduces the chance of a model providing a response that endorses or enables unsafe behavior.
- Framing discussions around abstract, hypothetical scenarios rather than real people or events to prevent responses related to specific individuals or groups.
- Priming models with examples that demonstrate respectful, non-toxic discussions to encourage continued safe and thoughtful responses. Examples can show respect for all people and acknowledge that complex issues often have multiple valid perspectives.
- Using prompts that steer discussions towards constructive topics and solutions rather than those that may divide or anger. Focusing on shared hopes and bringing people together helps prevent escalation.
- Reviewing model outputs for potentially harmful assumptions, biases, or lack of understanding before deploying. Providing feedback and adding corrective examples to model training can help address issues and improve future responses.
This section provides resources for optimizing prompts used with large language models. The file
…/related_resources.md centralizes relevant open-source tools, educational materials, and research papers. It contains several useful sections:
The section on "Prompting libraries & tools" lists Python libraries for tasks like building chatbot interfaces, managing model data, automating model tuning, validating outputs, and more.
The "Prompting guides" section links to online courses and documentation on prompt engineering basics and techniques. This includes introductions to template-based prompting, priming models with examples, and designing prompts for specific tasks.
"Papers on advanced prompting" surveys academic papers exploring techniques such as chain-of-thought prompting, self-consistency, tree-based reasoning, and multi-agent debate to improve model reasoning abilities. These provide a starting point for implementing more complex prompting paradigms.
This section covers generating code snippets from natural language instructions using AI models. The file
…/Backtranslation_of_SQL_queries.py contains code that takes a natural language query as input, generates multiple candidate SQL queries to represent that query, evaluates them, and selects the best candidate.
The code generates candidates from a prompt, evaluates them by reconstructing the original query, and includes main logic to run the backtranslation on a sample query.
This implementation shows how to leverage AI models to map between natural language and formal representations like SQL by generating candidates and selecting the best one. This technique could generate code snippets from plain English descriptions.
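The backtranslation loop can be sketched with stubbed model calls. In the real file, generate_sql and describe_sql would be Completion API calls; here they are toy functions, and the similarity score is simple word overlap rather than a model log-probability:

```python
def backtranslate_best(instruction, generate_sql, describe_sql, n=3):
    """Generate n candidate SQL queries, translate each back to English,
    and keep the candidate whose description best matches the instruction."""
    def overlap(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)
    candidates = [generate_sql(instruction) for _ in range(n)]
    return max(candidates,
               key=lambda sql: overlap(instruction, describe_sql(sql)))

# Toy stand-ins: one candidate round-trips faithfully, another does not.
sqls = iter(["SELECT name FROM users", "DROP TABLE users",
             "SELECT name FROM users"])
best = backtranslate_best(
    "show the name of every user",
    generate_sql=lambda instr: next(sqls),
    describe_sql=lambda sql: ("show the name of every user"
                              if sql.startswith("SELECT")
                              else "delete the users table"),
)
```

Selecting the candidate whose backtranslation best matches the original instruction is what filters out plausible-looking but wrong queries.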
The articles directory contains several files that provide tutorials on important concepts for working with large language models. The
…/how_to_work_with_large_language_models.md file discusses how to control models through different prompt styles like instructions, scenarios, demonstrations and fine-tuning. It also covers using models for code generation.
The …/text_comparison_examples.md file describes how to use text embeddings for tasks like semantic search, question answering, and recommendations. It explains how to generate embeddings with OpenAI's API, store them in a database, and compute cosine similarity between embeddings. Pseudocode is provided for the semantic search process. Question answering is done by searching for relevant documents and prompting a model with them. Recommendations work similarly but compare item embeddings. The file also discusses customizing embeddings for specific domains.
The …/what_makes_documentation_good.md file provides best practices for writing documentation that is easy to understand and helpful to many readers. It recommends structuring documentation with sections, titles, and short paragraphs. Writing titles as informative sentences rather than nouns, using simple language, and avoiding jargon are also covered. The file focuses on empowering readers and reducing obstacles to finding information.
This section provides an overview of techniques for designing effective prompts and priming models as discussed in the
…/related_resources.md file. The file lists several guides and papers related to prompt engineering that can help improve model performance and reliability. Some key areas covered include prompt structure, priming with examples, and advanced prompting techniques.
The …/related_resources.md file discusses libraries and tools for tasks like building chatbot interfaces, managing model data, automating model tuning, and validating outputs. It also links to guides on prompt engineering basics and techniques. Papers listed explore prompting methods such as self-consistency and tree-based reasoning to strengthen models' logical abilities.
This section discusses techniques for improving reliability on problems requiring multiple reasoning steps according to the file
…/techniques_to_improve_reliability.md. Selection-inference prompting alternates between prompting the model to select relevant facts and make inferences to generate multiple reasoning steps. When applied to benchmarks requiring long reasoning chains, selection-inference prompting significantly outperformed chain-of-thought prompting alone according to the cited results.
After each inference step, the model is asked whether the inferences so far are sufficient to answer the question. This helps avoid incorrect guesses by telling the process to continue or stop reasoning as needed.
To pick the best output from multiple model generations, the file discusses training a model to evaluate outputs. This model is fine-tuned to classify solutions as correct or incorrect. The evaluating model is then used to select the best solution from 100 generations according to its evaluation, substantially improving accuracy compared to generative-only models according to the cited results.
Evaluating model outputs accurately is important for building trustworthy applications. Some key practices include testing outputs against validation data, manually reviewing a sample of outputs, and using automated evaluation metrics.
The …/answers_with_ft.py file contains code demonstrating question answering. It contains a function for answering questions based on a context string retrieved from a search file ID using the OpenAI Completion API.
The same file contains a function for retrieving relevant passages from the specified search file ID using the OpenAI search API. It appends passages to a list and tracks total length. This generated context string can then be used to evaluate answers against the original search file contents.
The …/embeddings_utils.py file contains functions helpful for evaluating against validation data. It includes a function for plotting precision-recall curves to quantitatively measure a model's ability to separate classes of data.
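Precision and recall at a given threshold can be computed directly from scored predictions. A minimal sketch of the kind of evaluation such plotting utilities perform (sweeping the threshold traces out the full curve):

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall for binary labels at a score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))         # true positives
    fp = sum(p and not y for p, y in zip(predicted, labels))     # false positives
    fn = sum((not p) and y for p, y in zip(predicted, labels))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.4, 0.2]
labels = [True, False, True, False]
p, r = precision_recall(scores, labels, threshold=0.5)
```

Evaluating at several thresholds shows the precision-recall trade-off that the plotting function visualizes.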
This section provides guidance on writing high-quality documentation and examples for code in the repository. The file
…/what_makes_documentation_good.md outlines best practices for documentation. It recommends organizing documentation with sections, titles, and tables of contents to make it easy to skim. Titles should be informative sentences rather than abstract nouns.
The file also suggests writing in a clear style using simple sentences, avoiding jargon, and maintaining consistency. Documentation should be broadly helpful by explaining concepts simply and prioritizing common use cases. Key information should be up front, and topics introduced with a broad overview. Bullets and tables can clarify complex ideas.
Some important tips from the file include:
- Use sections, titles and tables of contents to organize documentation and aid skimming
- Write titles as informative sentences rather than abstract nouns
- Write in a clear style using simple language, avoiding jargon or undefined terms
- Explain concepts simply for all levels of experience
- Prioritize documenting common use cases and tasks
- Put important information up front rather than buried deep in pages
- Introduce topics with a broad overview before delving into details
- Clarify complex ideas using bulleted lists, tables or diagrams
Following these best practices will empower readers and reduce obstacles to finding and understanding information in the documentation.
When working with large language models, it is important to consider how to ensure the technology is developed and applied responsibly. Some key responsibilities for AI practitioners include evaluating model outputs for unintended harms, incorporating feedback loops, and designing systems with human oversight.
The OpenAI Cookbook provides guidance on safety practices for AI assistants through its documentation on Prompt Engineering. Proper prompt design is crucial for aligning models with human values and priorities. The section on Content Safety Prompts discusses techniques for avoiding harmful, toxic, dangerous or unethical responses through careful consideration of how the model is framed and primed.
The Priming with Examples documentation explains how exposing models to diverse, fact-checked examples during initial conversations can help establish appropriate boundaries and expectations before they are deployed widely. The Chained Prompts approach of breaking interactions into discrete steps with feedback at each stage allows correcting models if needed before moving to the next topic.
The Prompt Engineering Resources section aggregates various tools and best practices that can help evaluate model outputs, identify unintended harms, and refine prompt design iteratively based on feedback. Techniques like comparing responses to similar prior examples or getting models to justify their reasoning in their own words may reveal undesirable biases or lack of factual grounding.
By drawing on these resources and testing models extensively in controlled environments before releasing them, developers can help ensure AI systems are beneficial, harmless and honest. Ongoing monitoring and opportunities for public input are also important aspects of accountability. When handled carefully and conscientiously, language models have potential to be developed safely and for the benefit of humanity.
…/utils directory contains utility modules for working with AI models from the OpenAI API. The module
…/embeddings_utils.py provides functionality for generating and analyzing embeddings, including functions for computing distances between embeddings. Dimensionality reduction techniques are implemented as classes that project embeddings into 2D for visualization.
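A distance computation of the kind those helpers provide can be sketched in a few lines. The function name here is illustrative; the exact names in `embeddings_utils.py` may differ.

```python
# Sketch of an embedding distance helper: cosine distance is
# 1 - cosine similarity, so 0.0 means the vectors point the same way.
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

d_same = cosine_distance([1.0, 0.0], [2.0, 0.0])  # parallel vectors -> 0.0
d_orth = cosine_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors -> 1.0
```

Because cosine distance ignores magnitude, two embeddings of different lengths but the same direction are treated as identical, which is usually what semantic search wants.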
The OpenAI Cookbook aims to provide high quality examples and documentation for natural language applications. This relies on contributors sharing code, tutorials, and other content that follows certain standards.
CONTRIBUTING.md file outlines criteria for submissions to ensure they will be helpful for other users. Contributions are assessed on attributes like relevance, clarity, correctness, and completeness. A score from 1-4 is given for each attribute, and contributions scoring below a 3 on any criterion are generally rejected.
…/answers_with_ft.py file contains examples of using fine-tuned question answering models.
Content is expected to follow standards like consistent code style as outlined in
CONTRIBUTING.md. Thorough documentation also allows others to understand, use, and build upon examples. Overall these guidelines help maintain a high bar for quality contributions.
The OpenAI Cookbook provides examples of building natural language applications using OpenAI APIs and models. The code is open source and hosted on GitHub to encourage contributions. Code quality and readability are important for others to easily understand, use, and improve upon the examples. As such, the code follows standard style guides and best practices.
examples directory contains Python code examples demonstrating various capabilities. Code in this directory and throughout the repo adheres to PEP 8 style guidelines for formatting and structure. This includes following naming conventions like using lowercase with underscores for variables and functions, and CamelCase for classes. Comments are included to explain complex logic or non-intuitive choices.
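The naming conventions mentioned above can be shown in a short sketch. The names below are illustrative, not taken from the cookbook:

```python
# PEP 8 naming conventions in miniature:
# UPPER_CASE for constants, CamelCase for classes,
# lowercase_with_underscores for functions and variables.

MAX_RETRIES = 3  # module-level constant

class SearchClient:  # class names use CamelCase
    def __init__(self, api_key: str):
        self.api_key = api_key

def build_context(snippets: list[str]) -> str:  # functions use snake_case
    # Join retrieved snippets into a single context string.
    return "\n".join(snippets)

context = build_context(["first snippet", "second snippet"])
```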
…/answers_with_ft.py file contains functions for question answering tasks. Parameter names are self-documenting.
Custom modules in
…/utils provide reusable helper functions. Code reuses well-tested packages like NumPy and Pandas wherever possible rather than duplicating functionality.
Overall, adhering to established Python style guides and principles of clean coding like separating concerns, intentional naming, and documentation helps ensure the examples are understandable, maintainable and extensible as the project grows. This allows new contributors to easily understand and build upon the work.
This section focuses on writing clear and helpful documentation and docstrings. Documentation is crucial for others to understand how to use code effectively. Good documentation explains concepts simply and puts useful information prominently.
Key aspects covered include writing docstrings that specify parameters and return values. Docstrings should give a brief overview of the purpose and usage of functions, classes, and modules. Formatting docstrings with reStructuredText makes them easy to read across platforms and tools.
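A docstring in that style might look like the following. The function itself is a hypothetical example, not cookbook code; the point is the reStructuredText field list describing parameters and return value.

```python
# Example of a reStructuredText-formatted docstring with :param:,
# :returns:, and :rtype: fields, which tools like Sphinx can render.

def score_answer(answer: str, reference: str) -> float:
    """Return a crude overlap score between an answer and a reference.

    :param answer: the model-generated answer text
    :param reference: the ground-truth answer to compare against
    :returns: fraction of reference words that appear in the answer
    :rtype: float
    """
    ref_words = reference.lower().split()
    if not ref_words:
        return 0.0
    hits = sum(1 for word in ref_words if word in answer.lower())
    return hits / len(ref_words)
```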
Writing comments is also important. Inline comments clarify tricky logic, while high-level comments provide context around sections of code. Comments do not duplicate what the code already shows but explain intent behind design decisions.
Testing examples show how functionality works concretely. Short code snippets demonstrate a function's capabilities without needing to read extensive docs. Examples give confidence that code works as intended. Longer tutorials illustrate how to accomplish real tasks step-by-step.
examples directory contains many well-documented code samples. Files specify parameters and returns clearly in docstrings. Code is split into small testable functions with descriptive names. Comments provide rationale for implementation choices. Notebooks give live examples of using functions and classes. This serves as a model for documentation.
Testing practices are important for any codebase. The OpenAI Cookbook provides examples of integrating AI models into applications, so testing is crucial to ensure outputs are reliable and safe. Unit tests isolate components to validate functionality, while integration tests validate interactions between components.
…/utils directory contains utility modules used across examples. The
…/fine-tuned_qa directory demonstrates question answering. Tests should validate functions retrieve the correct context and answer for different inputs.
…/api_request_parallel_processor.py file contains code to call APIs concurrently while throttling requests. Tests are needed to confirm requests are properly throttled and failures are correctly retried. Integration tests could validate end-to-end request processing.
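A retry test of the kind described above can be sketched with a test double that fails a fixed number of times. The helper below is illustrative and is not the actual code in `api_request_parallel_processor.py`:

```python
# Sketch of a unit test for retry behavior: a test double fails twice,
# then succeeds, and we assert the helper made exactly three attempts.
import time

def with_retries(func, max_attempts=3, backoff=0.0):
    """Call func, retrying on exception up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff)  # real code would back off exponentially

calls = {"n": 0}

def flaky():
    """Test double: raise on the first two calls, succeed on the third."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_retries(flaky)  # succeeds on the third attempt
```

Counting calls through a test double, rather than timing real requests, keeps the test fast and deterministic.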
…/vector_databases directory contains examples of storing embeddings in databases like
…/redis. Each database directory should contain tests to validate embeddings can be correctly stored, retrieved, and searched based on similarity. Tests in the database directories integrate and validate interactions between libraries and utilities.
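A store/retrieve/search test of that shape can be sketched against an in-memory dict standing in for a real vector database such as Redis; the function names are illustrative:

```python
# Sketch of a vector-store test: upsert embeddings, retrieve them,
# and search by cosine similarity, using a plain dict as the "database".
import math

store: dict = {}

def upsert(key: str, embedding: list) -> None:
    store[key] = embedding

def search(query: list) -> str:
    """Return the stored key whose embedding is most similar to the query."""
    def sim(vec):
        dot = sum(a * b for a, b in zip(query, vec))
        return dot / (math.hypot(*query) * math.hypot(*vec))
    return max(store, key=lambda k: sim(store[k]))

upsert("cat", [1.0, 0.1])
upsert("car", [0.1, 1.0])
assert store["car"] == [0.1, 1.0]    # retrieval round-trips
assert search([0.9, 0.2]) == "cat"   # nearest neighbor by cosine similarity
```

A real integration test would run the same assertions against the actual database client, validating the interaction between the client library and the embedding utilities.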
Unit tests for individual components validate correct outputs for different inputs in isolation, while integration tests combine components to validate workflows. Both help catch bugs and regressions to maintain quality. Tests should be automated, run on each change, and catch failures early. Following testing best practices helps ensure examples are reliable and safe.
To contribute changes to the OpenAI Cookbook, contributors should follow a specific workflow. First, the contributor should fork the main cookbook repository on GitHub. This creates their own copy of the repository that they can make changes to without affecting the main repository.
Next, the contributor should clone their forked repository to their local machine. They can then make any desired changes locally, such as adding new code examples, documentation pages, or fixing bugs. It is recommended to make changes on branches other than main to avoid conflicts.
Once the changes are ready, the contributor pushes their local changes to their forked repository on GitHub. Then, they can create a pull request in the main cookbook repository to submit their changes for review. The pull request should include a description of the changes made.
The project maintainers will review the pull request changes. This involves running any automated tests, checking for code quality issues, verifying documentation is clear, and ensuring the changes meet the project's goals. Feedback may be provided to the contributor for improvements or requested changes before merging.
Once approved, the maintainers will merge the pull request, integrating the changes into the main repository. Regular contributors may also be given write access to directly push and merge their own changes. By following this workflow, contributors can help improve the cookbook through code, examples, and documentation while maintaining quality and avoiding conflicts with the main codebase.
examples directory contains code examples demonstrating tasks with various OpenAI APIs and models. Under the section
Licensing Information, programmers can find information on open source licensing used by the cookbook code.
This section discusses the open source licenses that portions of the cookbook codebase may be released under, which matters for programmers deciding how they can use and modify the code in their own projects. The licenses discussed include permissive licenses such as the MIT license, which allows free use and modification of the code, as well as copyleft licenses such as the GPL, which requires that any modified code also be released as open source.
CONTRIBUTING.md file establishes standards for inclusive, welcoming behavior and quality contributions to the documentation. It defines criteria for assessing submissions to ensure they maintain a constructive tone and follow writing best practices.
Contributions are evaluated on attributes like relevance, clarity, correctness, and completeness. A higher score in these categories means the content is more useful, understandable, and factually accurate. Submissions scoring below a 3 on any criterion may be rejected to preserve the quality of the documentation.
Maintaining inclusive language and respectful discussion is important. The criteria provide objective guidelines for moderators to evaluate submissions and encourage improvements, helping cultivate a collaborative community around the cookbook.