agents
The LiveKit Agents Framework provides a toolkit for building real-time multimodal AI applications that integrate speech recognition, text-to-speech synthesis, and natural language processing. Engineers can use this framework to create voice assistants, chatbots, and other AI-powered applications that interact with users through audio and text.
The core of the framework is implemented in the livekit-agents directory, which contains the main components for speech-to-text (STT), text-to-speech (TTS), and language model integration. The VoiceAssistant class in …/voice_assistant serves as the central component, orchestrating the interaction between user input, language model processing, and speech output.
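The orchestration pattern described above can be sketched with stub components. This is an illustrative analog, not the actual LiveKit API: the class and method names (MiniAssistant, handle_turn, recognize, chat, synthesize) are hypothetical stand-ins for the framework's richer streaming interfaces.

```python
import asyncio
from dataclasses import dataclass

# Stub components standing in for the framework's STT, LLM, and TTS
# plugins; the real classes expose richer, streaming interfaces.
class EchoSTT:
    async def recognize(self, audio: bytes) -> str:
        return audio.decode()

class EchoLLM:
    async def chat(self, text: str) -> str:
        return f"You said: {text}"

class EchoTTS:
    async def synthesize(self, text: str) -> bytes:
        return text.encode()

@dataclass
class MiniAssistant:
    stt: EchoSTT
    llm: EchoLLM
    tts: EchoTTS

    async def handle_turn(self, audio: bytes) -> bytes:
        # One conversational turn: audio in -> transcript -> reply -> audio out.
        transcript = await self.stt.recognize(audio)
        reply = await self.llm.chat(transcript)
        return await self.tts.synthesize(reply)

async def main() -> None:
    assistant = MiniAssistant(EchoSTT(), EchoLLM(), EchoTTS())
    out = await assistant.handle_turn(b"hello")
    print(out.decode())  # You said: hello

asyncio.run(main())
```

The value of this shape is that any component can be swapped for another implementation of the same interface, which is exactly what the plugin system below enables.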
Key functionality of the framework includes:
• Speech-to-Text: The framework supports multiple STT providers through a plugin system. The STT class in …/stt defines the interface for speech recognition, while specific implementations such as Google Cloud Speech-to-Text and OpenAI Whisper are available as plugins in livekit-plugins.
• Text-to-Speech: Similarly, the framework supports various TTS providers. The TTS class in …/tts defines the interface, with implementations for services like Google Cloud TTS and OpenAI TTS available as plugins.
• Language Model Integration: The LLM class in …/llm provides an interface for interacting with large language models. The OpenAI plugin in …/livekit-plugins-openai offers integration with models like GPT-3.5 and GPT-4.
• Inter-Process Communication: The framework uses a custom IPC system, implemented in …/ipc, to manage communication between components, including process pools and supervised processes.
The framework is designed with extensibility in mind, using a plugin architecture that allows for easy integration of new STT, TTS, and NLP services. This is evident in the livekit-plugins directory, which contains the various plugin implementations.
For developers looking to get started, the examples directory provides sample implementations of voice assistants, speech-to-text, and text-to-speech applications built with the framework.
Key design choices in the framework include:
• Asynchronous programming: The framework makes extensive use of Python's asyncio for handling concurrent operations, as seen in the utility functions in …/aio.
• Streaming interfaces: Both STT and TTS components support streaming, allowing for real-time processing of audio data.
• Modular architecture: The use of abstract base classes and plugins allows for easy swapping of components and addition of new functionality.
• Command-line interface: The framework provides a CLI for managing agent processes, implemented in …/cli.
For more detailed information on specific components, refer to the relevant sections in this wiki, such as Voice Assistant, Speech-to-Text, and Text-to-Speech.
Voice Assistant
References: livekit-agents/livekit/agents/voice_assistant, examples/voice-assistant
The VoiceAssistant class serves as the central component for implementing voice-based interactions. It integrates modules for speech recognition, natural language processing, and speech synthesis, described in the subsections below.
Core Voice Assistant Functionality
References: livekit-agents/livekit/agents/voice_assistant
The VoiceAssistant class in …/voice_assistant.py manages voice-based interactions between users and AI assistants, integrating the modules that handle speech recognition, natural language processing, and speech synthesis.
Human Input Processing
References: livekit-agents/livekit/agents/voice_assistant
The HumanInput class in …/human_input.py manages audio input processing from a participant in a LiveKit room.
Agent Output and Playback
References: livekit-agents/livekit/agents/voice_assistant
The AgentOutput class manages speech synthesis and playback for the assistant's responses.
Example Implementations
References: examples/voice-assistant
The …/voice-assistant directory contains implementations of voice assistants with varying levels of complexity:
Minimal Assistant Setup
References: examples/voice-assistant/minimal_assistant.py
The minimal_assistant.py script initializes and manages a voice assistant using the LiveKit framework.
Function Calling Weather Assistant
References: examples/voice-assistant/function_calling_weather.py
The AssistantFnc class encapsulates the weather-related functionality for the voice assistant. Its key method, get_weather(), retrieves weather information for a given location by making an asynchronous HTTP GET request to the wttr.in API. The method returns the weather data as a string on a successful response (status code 200) and raises an exception for failed requests.
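The flow just described can be sketched as follows. This is a hedged approximation, not the example's exact code: the actual example uses an async HTTP client, while this sketch pushes a blocking urllib call onto a worker thread so the event loop stays responsive; the helper name build_wttr_url and the query format are illustrative assumptions.

```python
import asyncio
import urllib.parse
import urllib.request

# Illustrative helper (not from the example): build the wttr.in query URL.
# format=3 asks wttr.in for a one-line plain-text weather report.
def build_wttr_url(location: str) -> str:
    return f"https://wttr.in/{urllib.parse.quote(location)}?format=3"

async def get_weather(location: str) -> str:
    def fetch() -> str:
        with urllib.request.urlopen(build_wttr_url(location)) as resp:
            if resp.status != 200:
                # Mirror the example's behavior: raise on failed requests.
                raise RuntimeError(f"weather request failed: {resp.status}")
            return resp.read().decode()
    # Run the blocking call off the event loop thread.
    return await asyncio.to_thread(fetch)
```

In the real example this method is exposed to the LLM as a callable function, so the model can decide when to invoke it during a conversation.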
Simple RAG Assistant
References: examples/voice-assistant/simple-rag
The Simple RAG Assistant is implemented in …/assistant.py. It leverages the LiveKit framework to create a voice assistant that uses Retrieval-Augmented Generation (RAG) to enhance its responses.
Speech-to-Text
References: livekit-agents/livekit/agents/stt, livekit-plugins/livekit-plugins-google, livekit-plugins/livekit-plugins-openai, examples/speech-to-text
The STT class in …/stt.py defines the core interface for speech-to-text functionality. It includes methods for recognizing speech from an AudioBuffer and for streaming audio data for real-time transcription.
Google Speech-to-Text Integration
References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
The STT class in …/stt.py provides integration with Google's Speech-to-Text API.
OpenAI Speech-to-Text Integration
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
The STT class in …/stt.py implements speech recognition using OpenAI's Whisper model.
Speech Recognition Configuration
References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
The STTOptions dataclass encapsulates the configuration options for Google's Speech-to-Text service.
Speech Event Processing
References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
The SpeechStream class handles the processing of speech events from the Google Cloud Speech-to-Text API.
Text-to-Speech
References: livekit-agents/livekit/agents/tts, livekit-plugins/livekit-plugins-google, livekit-plugins/livekit-plugins-openai, examples/text-to-speech
The TTS class serves as the primary interface for text-to-speech functionality. It provides methods for synthesizing audio from text input, including synthesize() for generating complete audio segments and stream() for incremental audio generation.
Google Text-to-Speech Integration
References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py
The TTS class in …/tts.py provides an interface to Google's Text-to-Speech service.
OpenAI Text-to-Speech Integration
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py
The TTS class in …/tts.py implements text-to-speech functionality using OpenAI's API.
Audio Encoding and Streaming
References: livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py
The ChunkedStream class handles audio encoding and streaming of synthesized speech. It supports several audio encodings, including MP3, and streams the generated audio content efficiently.
Cartesia Text-to-Speech Integration
References: examples/text-to-speech/cartesia_tts.py
The Cartesia TTS integration is implemented in …/cartesia_tts.py. This example demonstrates how to use the Cartesia text-to-speech library within a LiveKit application.
Natural Language Processing
References: livekit-agents/livekit/agents/llm, livekit-plugins/livekit-plugins-openai
The LLM class in …/llm.py serves as the primary interface for interacting with OpenAI-based language models. It provides static methods for creating instances configured for specific models and services, including Azure, Fireworks, Groq, Octo, Ollama, Perplexity, and Together, as well as a with_deepseek method for creating instances backed by a DeepSeek model.
LLM Integration
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py
The LLM class in …/llm.py serves as the primary interface for integrating various large language model providers, offering a unified API for interacting with different LLM services.
Speech-to-Text Processing
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py
The STT class in …/stt.py implements speech-to-text functionality using OpenAI's Whisper model.
Text-to-Speech Synthesis
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py
The TTS class in …/tts.py implements text-to-speech functionality using OpenAI's API.
File Upload Handling
References: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py
The AssistantLLM class in …/assistant_llm.py manages file uploads for vision-enabled AI assistants.
Inter-Process Communication
References: livekit-agents/livekit/agents/ipc
The Inter-Process Communication (IPC) system in the LiveKit Agents Framework is centered around the ProcPool class in …/proc_pool.py, which manages a pool of job executors that stand ready to execute tasks.
Job Executor Interface
References: livekit-agents/livekit/agents/ipc/job_executor.py
The JobExecutor protocol defines the properties and asynchronous methods needed to manage a job's lifecycle, ensuring that different types of job executors can be implemented behind a consistent interface.
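A structural protocol of this kind can be sketched with typing.Protocol. The member names below (started, start, aclose) are illustrative assumptions, not the actual JobExecutor definition; the point is that any class with matching members satisfies the protocol without inheriting from it.

```python
import asyncio
from typing import Protocol, runtime_checkable

# Hedged sketch of a lifecycle protocol; member names are illustrative.
@runtime_checkable
class Executor(Protocol):
    @property
    def started(self) -> bool: ...
    async def start(self) -> None: ...
    async def aclose(self) -> None: ...

# Satisfies the protocol structurally, with no inheritance required.
class DummyExecutor:
    def __init__(self) -> None:
        self._started = False

    @property
    def started(self) -> bool:
        return self._started

    async def start(self) -> None:
        self._started = True

    async def aclose(self) -> None:
        self._started = False

ex = DummyExecutor()
asyncio.run(ex.start())
print(ex.started, isinstance(ex, Executor))  # True True
```

This is what lets thread-based and process-based executors coexist behind one interface: the pool cares only that the protocol's members exist.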
Thread-based Job Execution
References: livekit-agents/livekit/agents/ipc/thread_job_executor.py
The ThreadJobExecutor class runs each job in its own thread, so jobs execute concurrently without blocking the main flow. It handles lifecycle management, inter-thread communication, and health monitoring of jobs.
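A minimal analog of thread-based execution, far simpler than the real ThreadJobExecutor: the job runs on its own daemon thread and reports its result (or failure) over a queue, keeping the caller unblocked. All names here are illustrative.

```python
import queue
import threading

class ThreadExecutor:
    """Run one callable on a dedicated thread; collect the result via a queue."""

    def __init__(self, fn, *args):
        self._result: queue.Queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(fn, *args), daemon=True
        )

    def _run(self, fn, *args):
        try:
            self._result.put(("ok", fn(*args)))
        except Exception as exc:  # surface failures to the caller
            self._result.put(("err", exc))

    def start(self) -> None:
        self._thread.start()

    @property
    def alive(self) -> bool:
        # Simple health check: is the worker thread still running?
        return self._thread.is_alive()

    def join(self, timeout=None):
        self._thread.join(timeout)
        return self._result.get_nowait()
```

The queue doubles as the inter-thread channel: the worker only ever writes, the owner only ever reads, so no extra locking is needed.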
Process-based Job Execution
References: livekit-agents/livekit/agents/ipc/proc_job_executor.py
The ProcJobExecutor class manages the execution of jobs in separate processes. Running each job in isolation gives a clean separation of concerns and improves the stability and scalability of the system.
Job Process Main Functionality
References: livekit-agents/livekit/agents/ipc/job_main.py, livekit-agents/livekit/agents/ipc/proc_lazy_main.py
In the LiveKit Agents Framework, the job process is responsible for managing the execution of tasks, facilitating communication between components, and handling logging. The …/job_main.py file provides the core functionality for these processes.
Communication Protocol
References: livekit-agents/livekit/agents/ipc/proto.py
Inter-process communication enables the main process to coordinate with its subprocesses. The …/proto.py file defines the message protocol for this coordination: a suite of dataclasses, each representing a specific type of message that can be exchanged between processes.
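The dataclass-per-message pattern can be sketched as follows. The message types and the pickle wire format here are illustrative assumptions, not the actual contents of proto.py; the point is that each message is a typed value that round-trips cleanly across a process boundary.

```python
import pickle
from dataclasses import dataclass

# Illustrative message types in the spirit of the protocol described above.
@dataclass
class StartJobRequest:
    job_id: str

@dataclass
class JobStatus:
    job_id: str
    running: bool

def encode(msg) -> bytes:
    """Serialize a message for transport over a pipe or socket."""
    return pickle.dumps(msg)

def decode(data: bytes):
    """Reconstruct the typed message on the other side."""
    return pickle.loads(data)
```

Typed messages make the coordination logic easy to audit: a receiver can dispatch on the message class rather than parsing an ad-hoc byte format.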
Command-Line Interface
References: livekit-agents/livekit/agents/cli
The CLI for managing and interacting with agent processes is implemented in the …/cli directory. The main entry point is the run_app() function, which is exposed in the __init__.py file.
CLI Structure and Commands
References: livekit-agents/livekit/agents/cli/cli.py
The command-line interface is defined using the click library, with run_app() serving as the main entry point. It offers several commands for managing agent processes.
Logging Configuration
References: livekit-agents/livekit/agents/cli/log.py
The setup_logging() function in …/log.py configures logging for both development and production environments. It creates a StreamHandler and attaches either a JsonFormatter or a ColoredFormatter, depending on the devmode flag.
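The devmode switch can be sketched with the standard logging module. This JsonFormatter is a hand-rolled stand-in, not the framework's formatter, and the dev-mode format string is an assumption; only the shape of the decision mirrors the description above.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Stand-in for a production JSON formatter."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname, "msg": record.getMessage()})

def setup_logging(log_level: str, devmode: bool) -> logging.Logger:
    handler = logging.StreamHandler()
    if devmode:
        # Human-readable output for local development.
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s - %(message)s"))
    else:
        # Structured JSON lines for log aggregation in production.
        handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("agents-demo")
    logger.setLevel(log_level)
    logger.addHandler(handler)
    return logger
```

JSON lines are trivial for log collectors to ingest, while the colored/plain formatter keeps local output scannable; switching on a single flag keeps both paths in one place.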
Protocol and Data Structures
References: livekit-agents/livekit/agents/cli/proto.py
The CliArgs dataclass defines the configuration options for the LiveKit agent CLI, including worker options, log level, development mode, asyncio debug mode, watch mode, and drain timeout. It also contains an mp_cch attribute used for inter-process communication.
Utility Functions
References: livekit-agents/livekit/agents/utils
The …/utils directory contains various utility modules and classes used throughout the agent framework:
Asynchronous Utilities
References: livekit-agents/livekit/agents/utils/aio
The gracefully_cancel() function in …/__init__.py cancels multiple asynchronous futures while ensuring that associated callbacks are properly released. This is particularly useful for gracefully shutting down complex asynchronous systems.
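The essence of such a helper can be sketched in a few lines: cancel every future, then await them all so cancellation actually completes (and cleanup handlers run) before returning. This is an illustrative reimplementation, not the framework's code.

```python
import asyncio

async def gracefully_cancel(*futures: asyncio.Future) -> None:
    for fut in futures:
        fut.cancel()
    # Await the cancelled futures so their cleanup runs before we return.
    # return_exceptions=True swallows the CancelledError raised by each.
    await asyncio.gather(*futures, return_exceptions=True)

async def main() -> list[bool]:
    async def sleeper():
        await asyncio.sleep(60)

    tasks = [asyncio.create_task(sleeper()) for _ in range(3)]
    await asyncio.sleep(0)  # let the tasks start running
    await gracefully_cancel(*tasks)
    return [t.cancelled() for t in tasks]

print(asyncio.run(main()))  # [True, True, True]
```

Merely calling cancel() only requests cancellation; awaiting afterwards is what guarantees the tasks have actually unwound, which is the difference between a graceful and a racy shutdown.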
Miscellaneous Utilities
References: livekit-agents/livekit/agents/utils/misc.py
The …/misc.py file contains utility functions for audio processing, time operations, and unique identifier generation.
Plugin System
References: livekit-plugins
The plugin system in LiveKit provides an extensible architecture for adding new capabilities to agents. The core of this system is the Plugin class from the livekit.agents module, which serves as the base for all plugins.
Plugin System Architecture
References: livekit-plugins
The LiveKit plugin system is built around the Plugin base class, which provides a foundation for extending functionality. Plugins are registered through a decorator-based mechanism, allowing new capabilities to be integrated easily.
Google Plugin
References: livekit-plugins/livekit-plugins-google
The Google plugin integrates Google Cloud services for speech-to-text (STT) and text-to-speech (TTS) functionality within the LiveKit Agents Framework. It is implemented in the …/livekit-plugins-google directory.
OpenAI Plugin
References: livekit-plugins/livekit-plugins-openai
The OpenAI plugin integrates OpenAI's language models and AI services into the LiveKit ecosystem. It provides implementations for speech-to-text (STT), text-to-speech (TTS), and language model (LLM) functionalities.
RAG Plugin
References: livekit-plugins/livekit-plugins-rag
The RAG plugin implements Retrieval-Augmented Generation capabilities for enhanced natural language processing tasks within the LiveKit Agents framework. Key components include:
Anthropic Plugin
References: livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py, livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py, livekit-plugins/livekit-plugins-anthropic/setup.py
The Anthropic plugin, located at …/anthropic, integrates Anthropic's language models into the LiveKit ecosystem, letting developers use them for natural language understanding and generation within the LiveKit framework.
Clova Plugin
References: livekit-plugins/livekit-plugins-clova/livekit/plugins/clova, livekit-plugins/livekit-plugins-clova/setup.py
The framework's speech recognition capabilities are extended by the Clova speech-to-text integration in the …/clova directory. The integration is encapsulated in the ClovaSTTPlugin class, which follows the LiveKit plugin architecture so that Clova's STT functionality slots in alongside the other providers.
Deepgram Plugin
References: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py, livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py, livekit-plugins/livekit-plugins-deepgram/setup.py
The STT class in …/stt.py is the central component of the plugin, enabling applications to use Deepgram's speech recognition. It provides methods for both batch and real-time speech processing, allowing flexible integration into a variety of use cases.
ElevenLabs Plugin
References: livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py
The ElevenLabs plugin is integrated into the LiveKit ecosystem to provide text-to-speech (TTS) synthesis capabilities. It supports Speech Synthesis Markup Language (SSML) parsing and phoneme handling, enabling developers to create more natural and varied speech outputs. The plugin leverages the ElevenLabs API to offer both chunked and streaming audio synthesis from text, accommodating different use cases and performance requirements.
Transcription Management
References: livekit-agents/livekit/agents/transcription
The STTSegmentsForwarder and TTSSegmentsForwarder classes in …/stt_forwarder.py and …/tts_forwarder.py handle the forwarding of speech-to-text and text-to-speech transcription data, respectively.
STT Forwarding
References: livekit-agents/livekit/agents/transcription/stt_forwarder.py
The STTSegmentsForwarder class manages the forwarding of speech-to-text transcription data to a user's room.
TTS Forwarding
References: livekit-agents/livekit/agents/transcription/tts_forwarder.py
The TTSSegmentsForwarder class in …/tts_forwarder.py keeps text-to-speech transcription synchronized with audio playback.
Text Processing
References: livekit-agents/livekit/agents/tokenize, livekit-plugins/livekit-plugins-rag
The SentenceChunker class in …/chunking.py handles text chunking for NLP tasks.
Paragraph Tokenization
References: livekit-agents/livekit/agents/tokenize/_basic_paragraph.py
The split_paragraphs() function in …/_basic_paragraph.py segments a given text into distinct paragraphs.
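A paragraph splitter of this shape can be sketched with a regular expression: paragraphs are runs of non-blank lines separated by blank lines, and each result carries its span in the original text. This is an illustrative reimplementation under that assumption, not the actual function, which may normalize whitespace differently.

```python
import re

def split_paragraphs(text: str) -> list[tuple[str, int, int]]:
    """Return (paragraph, start, end) tuples over the original text."""
    # A paragraph: a non-empty line, optionally followed by more lines,
    # stopping at the first blank line (a newline followed by a newline).
    pattern = r"[^\n]+(?:\n(?!\n)[^\n]*)*"
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]
```

Returning positions alongside the text lets downstream code (e.g. transcription forwarding) map each paragraph back to its location in the source.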
Sentence Tokenization
References: livekit-agents/livekit/agents/tokenize/_basic_sent.py
The split_sentences() function in …/_basic_sent.py segments text into individual sentences. It returns a list of tuples, each containing a sentence along with its start and end positions in the original text, which is useful for applications that require sentence-level analysis or processing.
Word Tokenization
References: livekit-agents/livekit/agents/tokenize/_basic_word.py
The split_words() function in …/_basic_word.py tokenizes text into individual words. It returns a list of tuples, each containing a word along with its starting and ending index positions, allowing precise tracking of where each word is located within the original string for word-level analysis or manipulation.
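The word-plus-indices return shape can be sketched in one line. This whitespace-delimited approximation is illustrative; the real tokenizer may treat punctuation differently.

```python
import re

def split_words(text: str) -> list[tuple[str, int, int]]:
    """Return (word, start, end) for each whitespace-delimited token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
```

Note that `end` is exclusive, matching Python slicing: `text[start:end]` recovers the word exactly.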
Tokenization Utilities
References: livekit-agents/livekit/agents/tokenize/token_stream.py, livekit-agents/livekit/agents/tokenize/tokenizer.py
The …/token_stream.py file introduces the BufferedTokenStream class, a foundation for handling streams of tokens. The class buffers incoming text and applies a tokenization function, tokenize_fnc, to produce either a list of tokens or tuples of tokens with their start and end indices.
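The buffering idea can be sketched as follows: text arrives in arbitrary chunks, complete tokens are emitted as soon as the tokenize function can produce them, and a possibly-incomplete trailing token stays buffered. This is a simplified analog of BufferedTokenStream, not its actual implementation; the class and method names here are illustrative.

```python
class BufferedStream:
    """Emit complete tokens from text that arrives in arbitrary chunks."""

    def __init__(self, tokenize_fnc):
        self._tokenize = tokenize_fnc
        self._buf = ""

    def push(self, chunk: str) -> list[str]:
        self._buf += chunk
        tokens = self._tokenize(self._buf)
        if not tokens:
            return []
        if not self._buf[-1].isspace():
            # The last token may still be growing; keep it buffered.
            self._buf = tokens.pop()
        else:
            # Buffer ends on a separator, so every token is complete.
            self._buf = ""
        return tokens

    def flush(self) -> list[str]:
        """End of input: whatever remains buffered is a complete token."""
        tokens = self._tokenize(self._buf)
        self._buf = ""
        return tokens
```

This is exactly the shape needed when streaming LLM output into a sentence-level TTS pipeline: tokens can be forwarded the moment they are complete, without waiting for the full response.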
Tokenization Integration
References: livekit-agents/livekit/agents/tokenize/basic.py
The …/basic.py file provides a streamlined interface for converting large blocks of text into structured forms such as sentences, words, and paragraphs, as a first step for further natural language processing.