agents
Language: Python
Created: 10/19/2023
Last updated: 09/17/2024
License: Apache License 2.0
autowiki Software Version: u-0.0.1Basic
Generated from Commit: 6c3125
Generated on: 09/18/2024

agents

The LiveKit Agents Framework provides a toolkit for building real-time multimodal AI applications that integrate speech recognition, text-to-speech synthesis, and natural language processing. Engineers can use this framework to create voice assistants, chatbots, and other AI-powered applications that interact with users through audio and text.

The core of the framework is implemented in the livekit-agents directory, which contains the main components for speech-to-text (STT), text-to-speech (TTS), and language model integration. The VoiceAssistant class in …/voice_assistant serves as the central component, orchestrating the interaction between user input, language model processing, and speech output.

Key functionality of the framework includes:

• Speech-to-Text: The framework supports multiple STT providers through a plugin system. The STT class in …/stt defines the interface for speech recognition, while specific implementations like Google Cloud Speech-to-Text and OpenAI Whisper are available as plugins in livekit-plugins.

• Text-to-Speech: Similar to STT, the framework supports various TTS providers. The TTS class in …/tts defines the interface, with implementations for services like Google Cloud TTS and OpenAI TTS available as plugins.

• Language Model Integration: The LLM class in …/llm provides an interface for interacting with large language models. The OpenAI plugin in …/livekit-plugins-openai offers integration with models like GPT-3.5 and GPT-4.

• Inter-Process Communication: The framework uses a custom IPC system implemented in …/ipc to manage communication between different components, including process pools and supervised processes.

The framework is designed with extensibility in mind, utilizing a plugin architecture that allows for easy integration of new STT, TTS, and NLP services. This is evident in the livekit-plugins directory, which contains various plugin implementations.

For developers looking to get started, the examples directory provides sample implementations of voice assistants, speech-to-text, and text-to-speech applications using the framework.

Key design choices in the framework include:

• Asynchronous programming: The framework extensively uses Python's asyncio for handling concurrent operations, as seen in the utility functions in …/aio.

• Streaming interfaces: Both STT and TTS components support streaming, allowing for real-time processing of audio data.

• Modular architecture: The use of abstract base classes and plugins allows for easy swapping of components and addition of new functionality.

• Command-line interface: The framework provides a CLI for managing agent processes, implemented in …/cli.

For more detailed information on specific components, refer to the relevant sections in this wiki, such as Voice Assistant, Speech-to-Text, and Text-to-Speech.

Voice Assistant

References: livekit-agents/livekit/agents/voice_assistant, examples/voice-assistant

The VoiceAssistant class serves as the central component for implementing voice-based interactions. It integrates modules for speech recognition, natural language processing, and speech synthesis.

Core Voice Assistant Functionality

References: livekit-agents/livekit/agents/voice_assistant

The VoiceAssistant class in …/voice_assistant.py serves as the central component for managing voice-based interactions between users and AI assistants. It integrates various modules to handle speech recognition, natural language processing, and speech synthesis.

Human Input Processing

References: livekit-agents/livekit/agents/voice_assistant

The HumanInput class in …/human_input.py manages audio input processing from a participant in a LiveKit room.

Agent Output and Playback

References: livekit-agents/livekit/agents/voice_assistant

The AgentOutput class manages speech synthesis and playback for the assistant's responses.

Example Implementations

References: examples/voice-assistant

The …/voice-assistant directory contains implementations of voice assistants with varying levels of complexity.

Minimal Assistant Setup

References: examples/voice-assistant/minimal_assistant.py

The minimal_assistant.py script initializes and manages a voice assistant using the LiveKit framework.

Function Calling Weather Assistant

References: examples/voice-assistant/function_calling_weather.py

The AssistantFnc class encapsulates the weather-related functionality for the voice assistant. Its key method, get_weather(), retrieves weather information for a given location by making an asynchronous HTTP GET request to the wttr.in API. The method handles successful responses (status code 200) by returning the weather data as a string, and raises an exception for failed requests.
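
The request-and-check flow described above can be sketched as follows. This is a minimal illustration, not the example's actual code: the injected fetch callable, the fake_fetch stand-in, the query format, and the error type are all assumptions made so the sketch runs without network access or third-party packages.

```python
import asyncio
from typing import Awaitable, Callable, Tuple
from urllib.parse import quote

# Hypothetical transport signature: takes a URL, returns (status_code, body).
Fetch = Callable[[str], Awaitable[Tuple[int, str]]]


async def get_weather(location: str, fetch: Fetch) -> str:
    """Return a weather summary for `location`, or raise on a non-200 reply."""
    # The query format below is an assumption about wttr.in's compact output.
    url = f"https://wttr.in/{quote(location)}?format=%C+%t"
    status, body = await fetch(url)
    if status == 200:
        return f"The weather in {location} is {body}."
    raise RuntimeError(f"Failed to get weather data, status code: {status}")


async def fake_fetch(url: str) -> Tuple[int, str]:
    """Stand-in transport so the sketch runs offline."""
    return (200, "Sunny +21C") if "Paris" in url else (404, "")
```

In a real assistant the fetch callable would wrap an asynchronous HTTP client. Injecting it keeps the status-code handling testable on its own.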

Simple RAG Assistant

References: examples/voice-assistant/simple-rag

The Simple RAG Assistant is implemented in …/assistant.py. It uses the LiveKit framework to create a voice assistant that enhances its responses with Retrieval-Augmented Generation (RAG).

Speech-to-Text

References: livekit-agents/livekit/agents/stt, livekit-plugins/livekit-plugins-google, livekit-plugins/livekit-plugins-openai, examples/speech-to-text

The STT class in …/stt.py defines the core interface for speech-to-text functionality. It includes methods for recognizing speech from an AudioBuffer and streaming audio data for real-time transcription.

Google Speech-to-Text Integration

References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py

The STT class in …/stt.py provides integration with Google's Speech-to-Text API.

OpenAI Speech-to-Text Integration

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py

The STT class in …/stt.py implements speech recognition using OpenAI's Whisper model.

Speech Recognition Configuration

References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py

The STTOptions dataclass encapsulates configuration options for Google's Speech-to-Text service.

Speech Event Processing

References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py

The SpeechStream class handles the processing of speech events from the Google Cloud Speech-to-Text API.

Text-to-Speech

References: livekit-agents/livekit/agents/tts, livekit-plugins/livekit-plugins-google, livekit-plugins/livekit-plugins-openai, examples/text-to-speech

The TTS class serves as the primary interface for text-to-speech functionality. It provides methods for synthesizing audio from text input, including synthesize() for generating complete audio segments and stream() for incremental audio generation.
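
To make the distinction between the two methods concrete, here is a simplified sketch of an abstract interface with one toy provider. The class names SketchTTS and FakeTTS, the method signatures, and the use of raw bytes as "audio" are assumptions for illustration. The framework's real TTS class has its own signatures and return types.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, List


class SketchTTS(ABC):
    """Hypothetical stand-in for a TTS interface; not the framework's class."""

    @abstractmethod
    async def synthesize(self, text: str) -> bytes:
        """Return the complete audio for `text` in one piece."""

    @abstractmethod
    def stream(self, text: str) -> AsyncIterator[bytes]:
        """Yield audio incrementally as it becomes available."""


class FakeTTS(SketchTTS):
    """Toy provider: the 'audio' is just the UTF-8 bytes of each word."""

    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")

    async def stream(self, text: str) -> AsyncIterator[bytes]:
        for word in text.split():
            await asyncio.sleep(0)  # yield control, as a network call would
            yield word.encode("utf-8")


async def collect(tts: SketchTTS, text: str) -> List[bytes]:
    """Drain the incremental stream into a list of chunks."""
    return [chunk async for chunk in tts.stream(text)]
```

Because callers depend only on the abstract class, a different provider can be swapped in without touching the calling code, which is the point of the plugin-style design.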

Google Text-to-Speech Integration

References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py

The TTS class in …/tts.py provides an interface to Google's Text-to-Speech service.

OpenAI Text-to-Speech Integration

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py

The TTS class in …/tts.py implements text-to-speech functionality using OpenAI's API.

Audio Encoding and Streaming

References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py

The ChunkedStream class handles audio encoding and streaming of synthesized speech. It supports different audio encodings, including MP3, and streams the generated audio content.

Cartesia Text-to-Speech Integration

References: examples/text-to-speech/cartesia_tts.py

The Cartesia TTS integration is implemented in …/cartesia_tts.py. This example demonstrates how to use the Cartesia text-to-speech library within a LiveKit application.

Natural Language Processing

References: livekit-agents/livekit/agents/llm, livekit-plugins/livekit-plugins-openai

The LLM class in …/llm.py serves as the primary interface for interacting with OpenAI-based language models. It provides static methods that create instances configured for specific models and services, including Azure, Fireworks, Groq, Octo, Ollama, Perplexity, and Together. The with_deepseek method creates instances that use a DeepSeek LLM model.

LLM Integration

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py

The LLM class in …/llm.py serves as the primary interface for integrating various Large Language Model providers. It offers a unified API for interacting with different LLM services.

Speech-to-Text Processing

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py

The STT class in …/stt.py implements speech-to-text functionality using OpenAI's Whisper model.

Text-to-Speech Synthesis

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py

The TTS class in …/tts.py implements text-to-speech functionality using OpenAI's API.

File Upload Handling

References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py

The AssistantLLM class in …/assistant_llm.py manages file uploads for vision-enabled AI assistants.

Inter-Process Communication

References: livekit-agents/livekit/agents/ipc

The Inter-Process Communication (IPC) system in the LiveKit agent framework is centered around the ProcPool class in …/proc_pool.py. This class is responsible for managing a pool of job executors that are ready to execute tasks.

Job Executor Interface

References: livekit-agents/livekit/agents/ipc/job_executor.py

The JobExecutor protocol serves as a blueprint for job execution within the LiveKit Agents Framework, defining essential properties and asynchronous methods that facilitate the lifecycle management of jobs. This protocol is crucial for ensuring that different types of job executors can be implemented while adhering to a consistent interface.
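
As a rough illustration of how a protocol can pin down a shared interface, the sketch below defines a structural contract with typing.Protocol and one trivial implementation. The member names (started, start, launch_job, aclose) and the InlineExecutor class are hypothetical. The actual JobExecutor members are not listed in this section.

```python
import asyncio
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class JobExecutorSketch(Protocol):
    """Hypothetical executor contract; member names are illustrative only."""

    @property
    def started(self) -> bool: ...

    async def start(self) -> None: ...

    async def launch_job(self, job_id: str) -> None: ...

    async def aclose(self) -> None: ...


class InlineExecutor:
    """Runs 'jobs' in the current task; satisfies the protocol structurally."""

    def __init__(self) -> None:
        self._started = False
        self.jobs: List[str] = []

    @property
    def started(self) -> bool:
        return self._started

    async def start(self) -> None:
        self._started = True

    async def launch_job(self, job_id: str) -> None:
        if not self._started:
            raise RuntimeError("executor not started")
        self.jobs.append(job_id)

    async def aclose(self) -> None:
        self._started = False


async def run_one(executor: JobExecutorSketch, job_id: str) -> None:
    """Drive any conforming executor through a start/launch/close cycle."""
    await executor.start()
    try:
        await executor.launch_job(job_id)
    finally:
        await executor.aclose()
```

A thread-based or process-based executor would satisfy the same contract without inheriting from a common base class, since protocols are checked structurally.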

Thread-based Job Execution

References: livekit-agents/livekit/agents/ipc/thread_job_executor.py

The ThreadJobExecutor class facilitates the execution of jobs within separate threads, ensuring that each job is isolated and runs concurrently without blocking the main execution flow. This class is a key component in the agent framework's ability to handle multiple tasks simultaneously, providing a robust solution for lifecycle management, inter-thread communication, and health monitoring of jobs.

Process-based Job Execution

References: livekit-agents/livekit/agents/ipc/proc_job_executor.py

The ProcJobExecutor class manages the execution of jobs in separate processes. Each job runs in isolation, which enhances the stability and scalability of the system.

Job Process Main Functionality

References: livekit-agents/livekit/agents/ipc/job_main.py, livekit-agents/livekit/agents/ipc/proc_lazy_main.py

In the LiveKit Agents Framework, the job process's responsibilities include managing the execution of tasks, facilitating communication between different components, and handling logging. The …/job_main.py file provides the core functionality for these processes.

Communication Protocol

References: livekit-agents/livekit/agents/ipc/proto.py

In the LiveKit Agents Framework, inter-process communication (IPC) is a critical component that enables the main process to coordinate with its subprocesses. The …/proto.py file is central to this functionality, as it defines the message protocols that facilitate this coordination. The file establishes a suite of dataclasses, each representing a specific type of message that can be exchanged between processes.
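
The idea of one dataclass per message type can be sketched as below. The message names (PingRequest, ShutdownRequest), the numeric MSG_ID tags, and the JSON wire encoding are all assumptions chosen for illustration. The section does not describe the real message set or serialization format.

```python
import json
from dataclasses import asdict, dataclass
from typing import ClassVar, Dict, Type


@dataclass
class PingRequest:
    MSG_ID: ClassVar[int] = 1  # hypothetical type tag
    timestamp_ms: int = 0


@dataclass
class ShutdownRequest:
    MSG_ID: ClassVar[int] = 2  # hypothetical type tag
    reason: str = ""


# Registry mapping the numeric id on the wire back to a message class.
MESSAGES: Dict[int, Type] = {
    cls.MSG_ID: cls for cls in (PingRequest, ShutdownRequest)
}


def encode(msg) -> bytes:
    """Serialize a message as JSON together with its type id."""
    return json.dumps({"id": msg.MSG_ID, "body": asdict(msg)}).encode("utf-8")


def decode(data: bytes):
    """Rebuild the dataclass instance named by the id in `data`."""
    payload = json.loads(data.decode("utf-8"))
    return MESSAGES[payload["id"]](**payload["body"])
```

Tagging each payload with a type id lets the receiving process dispatch on message type without inspecting the fields.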

Command-Line Interface

References: livekit-agents/livekit/agents/cli

The CLI for managing and interacting with agent processes is implemented in the …/cli directory. The main entry point is the run_app() function, which is exposed in the __init__.py file.

CLI Structure and Commands

References: livekit-agents/livekit/agents/cli/cli.py

The command-line interface is defined using the click library, with run_app() serving as the main entry point. It offers several commands for managing agent processes.

Logging Configuration

References: livekit-agents/livekit/agents/cli/log.py

The setup_logging() function in …/log.py configures logging for both development and production environments. It creates a StreamHandler and attaches either a JsonFormatter or ColoredFormatter based on the devmode flag.
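
A minimal sketch of that selection logic, using only the standard logging module, might look like this. The two formatter classes here are simplified local stand-ins that share the names used above, and the mapping (colored output in development, JSON otherwise) plus the function signature are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Simplified JSON formatter (illustrative; the fields are an assumption)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "name": record.name,
                "message": record.getMessage(),
            }
        )


class ColoredFormatter(logging.Formatter):
    """Simplified colored formatter using ANSI escape codes."""

    COLORS = {"WARNING": "\033[33m", "ERROR": "\033[31m"}
    RESET = "\033[0m"

    def format(self, record: logging.LogRecord) -> str:
        color = self.COLORS.get(record.levelname, "")
        text = f"{record.levelname} {record.name}: {record.getMessage()}"
        return f"{color}{text}{self.RESET}"


def setup_logging(log_level: str, devmode: bool) -> logging.Handler:
    """Attach one StreamHandler to the root logger and return it."""
    handler = logging.StreamHandler()
    # Assumption: colored output in development, JSON otherwise.
    handler.setFormatter(ColoredFormatter() if devmode else JsonFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(log_level)
    return handler
```

Structured JSON lines are easier for log collectors to parse, while colored text is easier to scan in a terminal, which is a common reason to switch formatters by environment.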

Protocol and Data Structures

References: livekit-agents/livekit/agents/cli/proto.py

The CliArgs dataclass defines the configuration options for the LiveKit agent CLI, including worker options, log level, development mode, AsyncIO debug mode, watch mode, and drain timeout. It also contains an mp_cch attribute for inter-process communication.
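
For orientation, a dataclass holding the options listed above could be laid out as follows. Apart from mp_cch, every field name, type, and default here is an assumption, and CliArgsSketch and describe are hypothetical names. Only the list of options comes from the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CliArgsSketch:
    """Illustrative mirror of the options listed above; names are assumed."""

    opts: Any                     # worker options object
    log_level: str = "INFO"
    devmode: bool = False
    asyncio_debug: bool = False
    watch: bool = False
    drain_timeout: int = 60       # seconds; the default is an assumption
    mp_cch: Optional[Any] = None  # channel used to talk to another process


def describe(args: CliArgsSketch) -> str:
    """Summarize the effective configuration in one line."""
    mode = "development" if args.devmode else "production"
    return (
        f"{mode} mode, log level {args.log_level}, "
        f"drain timeout {args.drain_timeout}s"
    )
```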

Utility Functions

References: livekit-agents/livekit/agents/utils

The …/utils directory contains various utility modules and classes used throughout the agent framework.

Asynchronous Utilities

References: livekit-agents/livekit/agents/utils/aio

The gracefully_cancel() function in …/__init__.py provides a way to cancel multiple asynchronous futures while ensuring proper release of associated callbacks. This is particularly useful for graceful shutdown of complex asynchronous systems.
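
The general pattern of cancelling several tasks and waiting for them to finish can be sketched with plain asyncio. This is a simplified stand-in under the hypothetical name cancel_and_wait, not the implementation of gracefully_cancel(), and it does not reproduce the callback handling mentioned above.

```python
import asyncio
from typing import List


async def cancel_and_wait(*tasks: asyncio.Task) -> List[object]:
    """Cancel every task, then wait until each one has actually finished."""
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects CancelledError instead of raising it.
    return await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> List[bool]:
    async def worker() -> None:
        await asyncio.sleep(3600)

    tasks = [asyncio.create_task(worker()) for _ in range(3)]
    await asyncio.sleep(0)  # let the workers start running
    await cancel_and_wait(*tasks)
    return [t.cancelled() for t in tasks]
```

Awaiting the tasks after calling cancel() matters: cancel() only requests cancellation, and cleanup code inside each task runs when the task is next scheduled.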

Miscellaneous Utilities

References: livekit-agents/livekit/agents/utils/misc.py

The …/misc.py file contains utility functions for audio processing, time operations, and unique identifier generation.

Plugin System

References: livekit-plugins

The plugin system in LiveKit provides an extensible architecture for adding new capabilities to agents. The core of this system is the Plugin class from the livekit.agents module, which serves as the base for all plugins.

Plugin System Architecture

References: livekit-plugins

The LiveKit plugin system is built around the Plugin base class, which provides a foundation for extending functionality. Plugins are registered using a decorator-based mechanism, allowing for easy integration of new capabilities.

Google Plugin

References: livekit-plugins/livekit-plugins-google

The Google plugin integrates Google Cloud services for speech-to-text (STT) and text-to-speech (TTS) functionality within the LiveKit Agents framework. It is implemented in the …/livekit-plugins-google directory.

OpenAI Plugin

References: livekit-plugins/livekit-plugins-openai

The OpenAI plugin integrates OpenAI's language models and AI services into the LiveKit ecosystem. It provides implementations for speech-to-text (STT), text-to-speech (TTS), and language model (LLM) functionalities.

RAG Plugin

References: livekit-plugins/livekit-plugins-rag

The RAG plugin implements Retrieval-Augmented Generation capabilities for enhanced natural language processing tasks within the LiveKit Agents framework.

Anthropic Plugin

References: livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py, livekit-plugins/livekit-plugins-anthropic/setup.py

The Anthropic plugin, located at …/anthropic, enables the integration of Anthropic's advanced language models into the LiveKit ecosystem. This integration facilitates natural language understanding and generation, allowing developers to leverage the capabilities of Anthropic's AI for various applications within the LiveKit framework.

Clova Plugin

References: livekit-plugins/livekit-plugins-clova/livekit/plugins/clova, livekit-plugins/livekit-plugins-clova/setup.py

The LiveKit Agents Framework expands its speech recognition capabilities through the integration of the Clova speech-to-text service, facilitated by the …/clova directory. The integration is encapsulated within the ClovaSTTPlugin class, which adheres to the LiveKit plugin architecture, allowing seamless addition of Clova's STT functionality into the ecosystem.

Deepgram Plugin

References: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py, livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py, livekit-plugins/livekit-plugins-deepgram/setup.py

The STT class in …/stt.py is the central component of the plugin, enabling applications to utilize Deepgram's speech recognition capabilities. It provides methods for both batch and real-time speech processing, allowing for flexible integration into various use cases.

ElevenLabs Plugin

References: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py

The ElevenLabs plugin is integrated into the LiveKit ecosystem to provide text-to-speech (TTS) synthesis capabilities. It supports Speech Synthesis Markup Language (SSML) parsing and phoneme handling, enabling developers to create more natural and varied speech outputs. The plugin leverages the ElevenLabs API to offer both chunked and streaming audio synthesis from text, accommodating different use cases and performance requirements.

Transcription Management

References: livekit-agents/livekit/agents/transcription

The STTSegmentsForwarder and TTSSegmentsForwarder classes in …/stt_forwarder.py and …/tts_forwarder.py handle the forwarding of speech-to-text and text-to-speech transcription data respectively.

STT Forwarding

References: livekit-agents/livekit/agents/transcription/stt_forwarder.py

The STTSegmentsForwarder class manages the forwarding of speech-to-text transcription data to a user's room.

TTS Forwarding

References: livekit-agents/livekit/agents/transcription/tts_forwarder.py

The TTSSegmentsForwarder class in …/tts_forwarder.py manages the synchronization of text-to-speech transcription with audio playback.

Text Processing

References: livekit-agents/livekit/agents/tokenize, livekit-plugins/livekit-plugins-rag

The SentenceChunker class in …/chunking.py handles text chunking for NLP tasks.

Paragraph Tokenization

References: livekit-agents/livekit/agents/tokenize/_basic_paragraph.py

The split_paragraphs() function in …/_basic_paragraph.py segments a given text into distinct paragraphs.

Sentence Tokenization

References: livekit-agents/livekit/agents/tokenize/_basic_sent.py

The split_sentences() function in …/_basic_sent.py is designed to segment text into individual sentences. It returns a list of tuples, with each tuple containing a sentence along with its start and end positions in the original text. This is crucial for applications that require sentence-level analysis or processing.
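
The shape of that return value can be shown with a deliberately naive splitter. The regular expression below is an assumption and not the algorithm used by split_sentences(): it ends a sentence at any run of '.', '!' or '?', so abbreviations such as "e.g." would be split incorrectly. End positions are treated as exclusive here, which is also an assumption.

```python
import re
from typing import List, Tuple


def split_sentences_sketch(text: str) -> List[Tuple[str, int, int]]:
    """Return (sentence, start, end) tuples; `end` is exclusive."""
    # Naive rule: a sentence is a run of text ending in '.', '!' or '?',
    # or whatever trails at the end of the input.
    pattern = re.compile(r"[^.!?]+(?:[.!?]+|$)")
    out: List[Tuple[str, int, int]] = []
    for match in pattern.finditer(text):
        raw = match.group()
        stripped = raw.strip()
        if not stripped:
            continue
        # Skip leading whitespace so the offsets point at the sentence itself.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        out.append((stripped, start, start + len(stripped)))
    return out


text = "Hello there. How are you? Fine"
sentences = split_sentences_sketch(text)
```

With offsets attached, a caller can slice the original string (text[start:end]) to recover each sentence exactly, which helps when aligning text with audio or transcripts.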

Word Tokenization

References: livekit-agents/livekit/agents/tokenize/_basic_word.py

The split_words() function in …/_basic_word.py is designed to tokenize a given text into individual words. It returns a list of tuples, with each tuple containing a word from the text along with its starting and ending index positions. This allows for precise tracking of where each word is located within the original string, which is essential for tasks that require word-level analysis or manipulation.
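
A whitespace-based sketch shows the same (word, start, end) structure. It is only an approximation under stated assumptions: the real split_words() may treat punctuation differently, and end positions are taken as exclusive here.

```python
import re
from typing import List, Tuple


def split_words_sketch(text: str) -> List[Tuple[str, int, int]]:
    """Return (word, start, end) tuples; `end` is exclusive."""
    # Each maximal run of non-whitespace characters counts as one word.
    return [
        (m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)
    ]


words = split_words_sketch("hello  big world")
```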

Tokenization Utilities

References: livekit-agents/livekit/agents/tokenize/token_stream.py, livekit-agents/livekit/agents/tokenize/tokenizer.py

The LiveKit Agents Framework employs utility classes to facilitate the tokenization process, which is crucial for natural language processing tasks. The …/token_stream.py file introduces the BufferedTokenStream class, which serves as a foundation for handling streams of tokens. This class buffers incoming text and utilizes a tokenization function, tokenize_fnc, to produce either a list of tokens or tuples with tokens and their respective start and end indices.
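
The buffering idea can be illustrated with a small sketch that holds back the last, possibly incomplete token until more text arrives. The class name, the push_text and flush methods, and the rule for deciding which tokens are complete are assumptions for illustration. Only the tokenize_fnc parameter name comes from the description, and this version handles plain token lists, not the indexed tuples.

```python
from typing import Callable, List


class BufferedTokenStreamSketch:
    """Buffers pushed text and emits tokens once they look complete."""

    def __init__(self, tokenize_fnc: Callable[[str], List[str]]) -> None:
        self._tokenize_fnc = tokenize_fnc
        self._buffer = ""
        self.tokens: List[str] = []

    def push_text(self, text: str) -> None:
        self._buffer += text
        pieces = self._tokenize_fnc(self._buffer)
        if len(pieces) <= 1:
            return  # the only token may still be growing
        # Everything except the last piece is complete; keep the tail buffered.
        # Assumes each token is a verbatim substring of the buffer.
        self.tokens.extend(pieces[:-1])
        self._buffer = self._buffer[self._buffer.rfind(pieces[-1]):]

    def flush(self) -> None:
        """Emit whatever remains in the buffer."""
        self.tokens.extend(self._tokenize_fnc(self._buffer))
        self._buffer = ""


stream = BufferedTokenStreamSketch(str.split)
for chunk in ["hel", "lo wor", "ld again"]:
    stream.push_text(chunk)
```

Holding back the trailing piece is what allows text to arrive in arbitrary fragments, for example from a streaming language model, without emitting half-finished words.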

Tokenization Integration

References: livekit-agents/livekit/agents/tokenize/basic.py

Tokenization within the LiveKit Agents Framework is a critical step in preparing text for further natural language processing tasks. The …/basic.py file encapsulates the integration of tokenization functionality, providing a streamlined interface for converting large text blocks into structured data forms such as sentences, words, and paragraphs.
