open-interpreter

Auto-generated from KillianLucas/open-interpreter by Mutable.ai Auto Wiki

GitHub Repository

Developer: KillianLucas
Written in: Python
Stars: 36k
Watchers: 273
Created: 2023-07-14
Last updated: 2024-01-08
License: GNU Affero General Public License v3.0
Homepage: openinterpreter.com
Repository: KillianLucas/open-interpreter

Auto Wiki
Generated at: 2024-01-08
Generated from: Commit c607f7
Version: 0.0.4

Open Interpreter provides an open-source implementation of ChatGPT's conversational code interpreter that runs locally on a user's machine. It allows executing code in languages like Python, JavaScript, and shell through natural language conversations with large language models (LLMs) such as GPT-4.

At the core is the interpreter Python module, which provides a simple interface for chatting and running code via the chat() method. Connections to LLMs are handled through the litellm library, while code execution happens in the user's local environment: a language name like "python" and a code string are passed to the computer object's run interface, letting the model act on the machine rather than just respond with text.

Open Interpreter streams the model's responses and any code execution output back to the user in the terminal as Markdown. It uses the rich library for displaying code with syntax highlighting, system messages, warnings, and so on. Configuration, such as changing models, is supported through YAML files and command-line arguments.
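
To make this concrete, a minimal session might look like the following sketch, based on the chat() method and self.messages state described in this wiki (exact attribute names can differ between versions; see the Migration Documentation section):

    from interpreter import interpreter

    # Stream a conversational response; any code the model produces is
    # executed locally and its output is fed back into the conversation
    interpreter.chat("What operating system am I running?")

    # The full conversation history accumulates on the interpreter object
    print(interpreter.messages)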

Some key design choices include:

  • Executing code locally on the user's machine through the computer object rather than through remote execution APIs
  • Modular components for managing conversation state, computer interactions, terminal rendering etc.
  • Reusable blocks for handling messages, code snippets etc.
  • Configuration and requirement management through the poetry dependency manager
  • Comprehensive testing via pytest

For more details on the terminal interface, see the Terminal Interface section. The Language Model Integration section covers the OpenAI integration, and the Computer Interaction section discusses the programmatic control of the local computer.

Core Conversational AI System

References: interpreter, interpreter/core

The core conversational AI system handles interfacing with AI models, managing state, executing code, and processing responses. At the heart of this is the Core class defined in …/core.py. This class manages the overall conversation state and flow. It initializes attributes like the conversation history, model instance, and computer object.

The Core class' main responsibilities include:

  • Sending messages to the model via respond() and storing the responses in self.messages
  • Managing flags for the start and end of different message types
  • Truncating long output texts
  • Saving conversations to disk if enabled

The respond() function in …/respond.py is the main entry point for messaging the model, running code, and processing responses.

It prepares the messages for the model by copying self.messages and prepending the extended system message. After messaging the model, it checks whether the last message was code; if so, it runs the code on the computer object held by Core.

That computer object is represented by the Computer class in …/computer.py. This class initializes and coordinates other components like Terminal, Display, and Os, providing a simplified interface to them.

The Terminal class in …/terminal.py handles running code across different languages through a common interface.

State Management

References: interpreter/core, interpreter/core/core.py

The Core class in the …/core.py file manages the overall conversation state and execution flow across messages. It initializes attributes such as the self.messages list which stores the full conversation history.

The key chat() method sends messages to the LLM via respond() and processes the responses. It handles start/end flags, truncates long outputs, and appends messages to self.messages to maintain conversation context.

The Core class encapsulates interactions with the LLM through its LLM attribute.

The reset() method resets the state of the interpreter by clearing self.messages and other attributes, allowing a new conversation to begin.

Response Handling

References: interpreter/core, interpreter/core/respond.py

The core functionality of the respond() function in …/respond.py is to handle messaging the LLM, running code on the local machine, and processing responses from the LLM. Within the main loop of respond(), it first extends the system message using the interpreter object. This extended message is prepended to the messages from the interpreter, which are then sent to functions in …/llm.py like run().

Any responses from the LLM are processed - if a budget exceeded error occurs, a message is displayed to the user. Otherwise, the responses are further handled. If the last message was a code block, the code is run on the computer object defined in …/computer.py.

The key components involved in response handling are:

  • The interpreter object, which contains the conversation state and messages
  • The computer object, defined in …/computer.py, which represents the local machine and enables running code
  • Functions in …/llm.py like run(), which interface with the LLM
  • The respond() function itself, which coordinates the overall response process
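
Putting these components together, the control flow can be sketched roughly as follows. This is illustrative pseudocode only: extend_system_message is a hypothetical helper, while interpreter.messages, the run() function, and the computer object's run interface come from the descriptions above.

    def respond(interpreter):
        # Illustrative sketch of the response loop, not the actual source
        while True:
            # Extend the system message and prepend it to a copy of the history
            system_message = extend_system_message(interpreter)  # hypothetical
            messages = [{"role": "system", "content": system_message}]
            messages += interpreter.messages.copy()

            # Message the LLM through the run() interface in llm.py
            for chunk in interpreter.llm.run(messages):
                yield chunk

            # If the model's last message was not code, the turn is complete
            last = interpreter.messages[-1]
            if last.get("type") != "code":
                break

            # Otherwise run the code locally and stream the output back
            for output in interpreter.computer.run(last["format"], last["content"]):
                yield output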

Language Model Integration

References: interpreter/core/llm

This section covers code related to integrating with Large Language Models (LLMs) to enable conversational capabilities. The main classes and functions involved are:

The …/llm.py file contains functions for interacting with LLMs.

The …/convert_to_openai_messages.py file contains the convert_to_openai_messages function. This function takes messages in the internal format and converts them to the OpenAI format, handling different message types like "code", "console", and "image".

The …/merge_deltas.py file contains the merge_deltas function for combining response deltas to reconstruct complete messages.

The convert_to_openai_messages function translates between formats, handling different message types and encoding images. For "code" it constructs function calls or uses markdown depending on settings.

The merge_deltas function recursively merges a delta dictionary into the original to reconstruct messages from streaming responses.

Procedures Integration

References: interpreter/core/rag

The core functionality encapsulated in the …/rag directory is querying the Open Procedures database for coding tutorials based on a natural language conversation. This is done through the get_relevant_procedures_string function.

The get_relevant_procedures_string function formats the last two messages from the conversation using convert_to_openai_messages. It then constructs a query string and runs a search on the Open Procedures database. Depending on the implementation used, this search is done by making an API request to the online database or by searching a local cached copy. Relevant procedures are extracted from the response and formatted into a readable string containing code snippets.


Terminal Interface

References: interpreter/terminal_interface, interpreter/terminal_interface/components

The …/terminal_interface directory contains the core logic for displaying conversations in the terminal interface. The BaseBlock, CodeBlock, and MessageBlock classes provide a reusable framework for building and displaying different rich block types in the terminal.

BaseBlock handles common terminal rendering logic using Rich's Live object. Subclasses like CodeBlock and MessageBlock focus on specific display logic and data. CodeBlock renders code snippets with syntax highlighting using the Syntax component. It tracks the active line number and only refreshes that portion when needed. MessageBlock displays message text, converting Markdown code blocks to plain text before rendering the message to a Panel.

These block classes leverage Rich components like Syntax, Panel, and Group to structure and style content. CodeBlock formats code and output into a nested structure of these components for a readable display. The render_past_conversation function intelligently maps past conversation data to CodeBlock and MessageBlock objects to simulate a conversation.

The terminal_interface function provides the core logic, yielding different block types and handling safe code execution. It delegates rendering to the block classes. The terminal_interface directory also contains utilities for tasks like Configuration, Displaying Output, and more.

Conversation Display

References: interpreter/terminal_interface, interpreter/terminal_interface/components

The render_past_conversation function handles displaying past conversations by mapping message data to the appropriate block components. It intelligently routes different message types like code snippets, output, and messages to either the CodeBlock or MessageBlock classes.

When called, render_past_conversation takes a list of past messages as input. It then iterates through each message chunk and determines the message type by checking fields like "role" and "content".

If the chunk indicates a code snippet, the function either creates a new CodeBlock object or continues an existing one. It stores relevant data from the fields on the block, like the code, language, and any output.

For message chunks, it simply prints the message text. When a new block is created or an existing one is continued, the function calls the block's refresh() method to display it, passing cursor=True/False to control the cursor.

After processing all chunks, it calls end() on the final block to cleanly terminate display. The CodeBlock and MessageBlock classes defined in …/components are used to represent code snippets and messages respectively.

The CodeBlock class initializes properties for the code, language, output, and active line number. Its refresh() method builds a syntax-highlighted Table for the code and displays any output below it using nested Panel and Group components. This structures the content clearly.

The MessageBlock class stores the message text and its refresh() method first converts Markdown code blocks to plain text before rendering the message to a Panel. This allows code snippets within messages to display properly.

Command Parsing

References: interpreter/terminal_interface/magic_commands.py

The handle_magic_command function in …/magic_commands.py is the main entry point for routing user commands. It splits the command on whitespace, gets the corresponding handler function from a switch dictionary based on the command name, and executes it, passing any arguments. This provides a clean way to route commands to the proper handlers without long if/else blocks.

The main handlers include:

The default_handle function displays an error if an unknown command is provided, then calls handle_help.
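
A sketch of this dispatch pattern is shown below; handler names other than default_handle and handle_help are hypothetical.

    def handle_help(interpreter, *args):
        print("Available magic commands: %help, %reset, ...")

    def handle_reset(interpreter, *args):  # hypothetical handler
        interpreter.reset()

    def default_handle(interpreter, *args):
        print("Unknown command.")
        handle_help(interpreter)

    def handle_magic_command(interpreter, user_input):
        # "%reset" -> command name "reset" with no arguments
        command, *arguments = user_input[1:].split()
        switch = {
            "help": handle_help,
            "reset": handle_reset,
        }
        # Route to the matching handler, falling back to default_handle
        handler = switch.get(command, default_handle)
        handler(interpreter, *arguments)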

Interface Navigation

References: interpreter/terminal_interface/conversation_navigator.py

The conversation_navigator function handles navigating and resuming past conversations by displaying a list of saved conversation filenames for the user to select from. When a conversation is chosen, the function loads the stored conversation data from the corresponding JSON file. It then renders the past messages and sets the interpreter state to the selected conversation to resume it.

The key business logic implemented in …/conversation_navigator.py includes:

  • The get_storage_path function retrieves the path to the conversations directory.

  • Files are checked for in this directory and sorted by modification date.

  • A list of conversation filenames is formatted into human-readable strings for display.

  • Inquirer's prompt function is used to prompt the user to select a conversation.

  • The selected conversation's JSON data is loaded from file.

  • The render_past_conversation function displays the past messages.

  • The interpreter state is set to the selected conversation.

  • If requested, open_folder opens the conversations folder for the user's platform.

This allows seamlessly navigating between past conversations by loading and presenting the stored conversation history, while also setting relevant state behind the scenes. The implementation handles data loading, user input, and state management across operating systems.
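
The selection prompt itself needs only a few lines of inquirer; the following sketch uses illustrative conversation names:

    import inquirer

    # Human-readable names built from the saved conversation filenames
    choices = ["plot_aapl__January_5", "fix_csv_parser__January_3"]

    questions = [
        inquirer.List(
            "conversation",
            message="Select a conversation to resume",
            choices=choices,
        )
    ]
    answers = inquirer.prompt(questions)
    print(answers["conversation"])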

Interface Initialization

References: interpreter/terminal_interface/start_terminal_interface.py, interpreter/terminal_interface/utils

The start_terminal_interface function in …/start_terminal_interface.py is responsible for applying configurations, checking dependencies, and launching the terminal interface. It takes the interpreter object and path to the configuration file as arguments.

Among the key steps it performs, check_for_package imports the requested package and verifies it is installed by finding its spec, loading the module, and executing it.

Computer Interaction

References: interpreter/core/computer

The …/computer directory contains classes and functions for programmatically interacting with and controlling a computer. It provides an interface for simulating keyboard, mouse, clipboard, display, and operating system interactions through classes like Os, Terminal, and Display.

The main classes include:

  • The Display class in …/display.py handles capturing screenshots and interacting with the display. Its screenshot() method can capture the entire screen, a quadrant, or just the active window as a PIL image.

Key implementation details:

  • The Display class centralizes functions for capturing, processing and analyzing screenshots to interact with the computer display programmatically. It supports both online and offline modes.

Keyboard Control

References: interpreter/core/computer/keyboard

The …/keyboard directory provides functionality for simulating keyboard input. The main class is defined in …/keyboard.py.

This class allows simulating keyboard actions like typing strings and pressing individual keys or key combinations. It uses libraries to implement these inputs. The class is initialized with a computer object, representing the computer being simulated. This allows accessing other computer components like the clipboard.

The write() method simulates typing a string, entering short strings key by key and pasting the full text at once for long strings. It handles line breaks and returns properly, and adds time delays between actions to mimic human typing. The clipboard is restored afterwards to return the computer to its original state.

The press() and hotkey() methods simulate pressing a single key or key combinations, respectively.
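
The text does not name the underlying input libraries, so the following is only an assumed sketch using pyautogui and pyperclip; the 50-character threshold and the paste shortcut are likewise assumptions.

    import time

    import pyautogui
    import pyperclip

    def write(text):
        if len(text) < 50:
            # Short strings: type key by key with a small delay to mimic a human
            pyautogui.write(text, interval=0.02)
        else:
            # Long strings: paste in one action via the clipboard, then
            # restore the clipboard's previous contents afterwards
            previous = pyperclip.paste()
            pyperclip.copy(text)
            pyautogui.hotkey("ctrl", "v")
            time.sleep(0.1)
            pyperclip.copy(previous)

    pyautogui.press("enter")                # a single key press
    pyautogui.hotkey("ctrl", "shift", "t")  # a key combination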

Clipboard Interaction

References: interpreter/core/computer/clipboard

The clipboard class provides functionality for copying, pasting, and triggering clipboard shortcuts. It initializes with the computer object, which gives it access to keyboard functions for triggering shortcuts.

One method returns the current clipboard contents; another can either copy a passed string or trigger a shortcut sequence. This provides a cross-platform way to interact with the clipboard and trigger actions through shortcuts.

The class centralizes clipboard interactions and provides a simple API through its methods. It uses the computer object for access to keyboard functions and delegates shortcut triggering to them. This separation of concerns makes the code modular, reusable across platforms, and able to support different implementations.

Display Handling

References: interpreter/core/computer/display

The Display class in the …/display directory handles capturing screenshots and interacting with the computer display programmatically. It initializes with the computer object and gets the screen size.

The main methods of the Display class are:

  • size() and center() - Get screen dimensions and center point
  • screenshot() - Capture the entire screen, a quadrant, or just the active window region as a PIL Image object

The screenshot() method saves the captured image temporarily then loads it into a PIL Image object for further processing. It supports both online and offline modes. Optional packages like OpenCV, NumPy, and PyAutoGUI can be used for capturing screenshots and computer vision tasks. The Display class centralizes functions for capturing, processing and analyzing screenshots to interact with and understand the computer display programmatically.
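
A minimal sketch of full-screen and quadrant capture, assuming pyautogui is available (the quadrant numbering here is an assumption):

    from typing import Optional

    import pyautogui

    def screenshot(quadrant: Optional[int] = None):
        # Capture the full screen as a PIL Image
        image = pyautogui.screenshot()
        if quadrant is not None:
            width, height = image.size
            boxes = {
                1: (0, 0, width // 2, height // 2),           # top-left
                2: (width // 2, 0, width, height // 2),       # top-right
                3: (0, height // 2, width // 2, height),      # bottom-left
                4: (width // 2, height // 2, width, height),  # bottom-right
            }
            image = image.crop(boxes[quadrant])
        return image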

Operating System Interface

References: interpreter/core/computer/os

The Os class in the …/os.py file provides an interface for abstracting operating system level interactions. The class handles notifications across operating systems via different approaches. On Linux and macOS it constructs a notification command, while on Windows it tries common GUI libraries before falling back to printing to the console. This method catches any exceptions to prevent crashes.

Terminal Execution

References: interpreter/core/computer/terminal

The …/terminal directory provides a unified interface for running code across different programming languages through terminals. The Terminal class handles initializing and running various language-specific classes to execute code.

The main language classes are defined in …/languages. For example, the Python class allows running Python code by leveraging the Jupyter kernel. It preprocesses code using the AddLinePrints AST transformer to insert line number print statements. These are used by detect_active_line() to determine the currently executing line.

The Shell class runs shell scripts through subprocesses. It calls preprocess_shell() to add active line markers by looping through lines and prepending echo statements. This allows detect_active_line() to check for echo output and detect the active line number.
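
Based on that description, the shell preprocessing step might look roughly like this (the exact marker format is an assumption):

    def preprocess_shell(code):
        # Prepend an echo marker before every line; whichever marker was
        # printed most recently reveals the currently executing line
        processed = []
        for number, line in enumerate(code.split("\n"), start=1):
            processed.append(f'echo "##active_line{number}##"')
            processed.append(line)
        return "\n".join(processed)

    print(preprocess_shell("cd /tmp\nls -la"))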

Classes for other languages like JavaScript, R, and PowerShell similarly preprocess code before execution. They insert line number or execution markers, wrap in try/catch blocks, and postprocess output lines. This enables features like active line detection, error handling, and detecting execution completion.

The Terminal initializes these language classes then coordinates running code across them through a common interface. Its run() method executes preprocessed code from the appropriate class while formatting and yielding standardized output. This provides a unified way to run multiple programming languages with consistent interactivity and error handling.

Utility Functions

References: interpreter/core/computer/utils

The …/utils directory contains utility functions that provide common functionality for tasks like computer vision, retrieving window metadata, and converting file formats. Functions here aim to be portable across operating systems through platform-specific implementations abstracted behind consistent interfaces.

The file …/computer_vision.py contains a function that takes an image and a text string as arguments. It detects text bounding boxes in the image, filters for the provided string, and returns the adjusted box centers.

The file …/get_active_window.py defines the function that returns a dictionary of metadata like the title and bounding box for the currently active window. It uses platform-specific APIs and conditional logic to retrieve this information consistently across Windows, macOS and Linux.

The file …/html_to_png_base64.py contains a function that converts an HTML string to a PNG image using the Html2Image class. It encodes the image to base64 and returns the string.

The file …/recipient_utils.py contains a pair of functions: one takes a text and a recipient string and returns a formatted string, while the other parses such a formatted string and returns the extracted recipient and content.

Language Model Integration

References: interpreter/core/llm

The code in the …/llm directory handles all interactions with Large Language Models (LLMs) to enable conversational responses. This includes functionality for formatting messages, making API calls to LLMs, and parsing the responses.

The convert_to_openai_messages() function, defined in …/convert_to_openai_messages.py, translates between internal and OpenAI formats. It handles different message types, encoding images, and wrapping code blocks.

The …/utils directory contains important utility functions. merge_deltas() in …/merge_deltas.py recursively merges response deltas into the original message to reconstruct the full response.

parse_partial_json() in …/parse_partial_json.py parses "almost JSON" by tracking state and fixing issues character by character.

Message Formatting

References: interpreter/core/llm/utils/convert_to_openai_messages.py

The convert_to_openai_messages function in the …/convert_to_openai_messages.py file handles formatting user messages from the LMC format into the OpenAI format expected by the language model. It iterates through each message, determines the message type, and constructs a new message dictionary in the OpenAI format.

The main message types it handles are "message", "code", "console", and "image". For "code" messages, if the function_calling argument is True, it constructs a function_call dictionary with execute as the name and the code as a JSON argument. If False, it wraps the code in Markdown code fences.

For "console" output messages, if function_calling is True it constructs a function message with execute as the name and output as content. If False, it adds the output and a question to the user message.

For "image" messages, if the vision argument is False it skips them. If the format is base64, it decodes and resizes the image if width > 1024 pixels before re-encoding. If the format is a file path, it reads and encodes the image. It checks the encoded size is < 20MB.

The convert_to_openai_messages function takes the primary arguments messages, function_calling, and vision. It returns the list of converted messages in the OpenAI format.
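
As an illustration, a "code" message and its function-calling translation might look like the pair below; field names beyond those described above (role, type, content, and the execute function name) are assumptions.

    import json

    # An internal "code" message...
    lmc_message = {
        "role": "assistant",
        "type": "code",
        "format": "python",
        "content": "print(2 + 2)",
    }

    # ...and its OpenAI-format counterpart when function_calling=True
    openai_message = {
        "role": "assistant",
        "content": "",
        "function_call": {
            "name": "execute",
            "arguments": json.dumps({"language": "python", "code": "print(2 + 2)"}),
        },
    }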

Response Parsing

References: interpreter/core/llm/utils/merge_deltas.py, interpreter/core/llm/utils/parse_partial_json.py

The merge_deltas() function in the …/merge_deltas.py file is used to parse OpenAI response deltas and merge them back into the original response object. It recursively merges a delta dictionary into the original response dictionary to reconstruct the full response.

merge_deltas() takes two dictionaries as arguments - the original response dictionary and a delta dictionary containing changes. It iterates through each key-value pair in the delta, and either appends or sets the value in the original dictionary depending on if the key already exists.

For string values, it will append to any existing value under that key. For dictionary values, it recursively calls itself to merge the nested delta dictionary. This allows merging deltas at any level of nesting to reconstruct the full modified response.
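
A faithful sketch of that merging behavior:

    def merge_deltas(original, delta):
        # Recursively fold a streaming delta into the accumulated response
        for key, value in delta.items():
            if isinstance(value, dict):
                original[key] = merge_deltas(original.get(key, {}), value)
            elif isinstance(value, str) and key in original:
                original[key] += value  # append to the existing partial string
            else:
                original[key] = value
        return original

    message = {}
    for delta in [{"role": "assistant"}, {"content": "Hel"}, {"content": "lo"}]:
        merge_deltas(message, delta)
    print(message)  # {'role': 'assistant', 'content': 'Hello'}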

The …/parse_partial_json.py file contains the parse_partial_json() function, which is used to parse response strings that may not be valid JSON. It processes the string character by character, tracking state like being inside a string. It handles escaped characters and uses a stack to track opened/closed brackets.

As it processes each character, it appends to a new string and fixes any issues. It then attempts to load the new string as JSON using json.loads(). If that fails, it returns None. This allows data that is "almost JSON" to still be successfully parsed into a Python object.
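
A simplified sketch of that repair-then-parse approach (the real implementation may fix more cases than shown here):

    import json

    def parse_partial_json(s):
        repaired = []
        stack = []        # closing brackets still owed
        in_string = False
        escaped = False
        for ch in s:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            else:
                if ch == '"':
                    in_string = True
                elif ch == "{":
                    stack.append("}")
                elif ch == "[":
                    stack.append("]")
                elif ch in "}]" and stack:
                    stack.pop()
            repaired.append(ch)
        if in_string:
            repaired.append('"')          # close an unterminated string
        repaired.extend(reversed(stack))  # close any open brackets/braces
        try:
            return json.loads("".join(repaired))
        except json.JSONDecodeError:
            return None

    print(parse_partial_json('{"language": "python", "code": "pri'))
    # -> {'language': 'python', 'code': 'pri'}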

LLM Interaction

References: interpreter/core/llm/llm.py, interpreter/core/llm/run_function_calling_llm.py, interpreter/core/llm/run_text_llm.py

The main logic for communicating with the LLM service is handled in the …/llm.py file. The file contains code to represent a stateless LLM model that can be used for conversational responses. Its main methods are __init__() which initializes the class, and run() which takes a list of messages, formats them for the LLM, runs the LLM to get responses, and returns the responses.

The run() method first checks the message format, then detects whether the LLM supports function calling. It trims images, formats the messages, and trims them to respect the context window and max tokens. The messages are then routed to either the function-calling or plain-text path depending on function support.

The code handles all the necessary processing such as formatting user messages, calling the underlying LLM via LiteLLM, and returning the responses back to the interpreter. It detects LLM capabilities and ensures the messages respect settings like context window and max tokens.

Documentation

References: docs

The Open Interpreter documentation covers a wide range of topics to help users and developers effectively utilize the system. At the core is documentation of usage through guides in the …/usage directory. This includes instructions for common tasks as well as support information in …/help.md.

When upgrading to new versions, detailed migration documentation is provided. The prime example is the guide for updating to version 0.2.0 located at …/NCU_MIGRATION_GUIDE.md. It explains changes to how Open Interpreter is instantiated and used, such as moving stateless attributes to a namespace. Examples demonstrate handling the new streaming response structure.

Project roadmaps outlining future plans can be found in …/ROADMAP.md. This file proposes new features, ways to future-proof the project through testing on GAIA, and using different language models with the .supports_functions attribute. The scope of the core and terminal interface code is also defined.

Security policies and procedures are documented in …/SECURITY.md. This describes the process for responsibly reporting vulnerabilities privately through draft advisories before public disclosure. It emphasizes that fixes should be coordinated and vulnerabilities not publicly disclosed until patches are released.

Additional utility files help implement documentation functionality. The …/style.css file contains CSS styles that are applied to documentation HTML elements for formatting. README files in other languages like …/README_JA.md explain usage to an international audience.

Usage Documentation

References: docs/usage, docs/usage/desktop

The …/usage directory contains documentation for using the Open Interpreter system. Within this directory is the …/desktop subdirectory, which holds documentation specific to the desktop application.

The sole file in …/desktop is help.md, a plain text file containing a single line directing users to email [email protected] for any support needs regarding the desktop application. No other documentation or code is present in this subdirectory.

Migration Documentation

References: docs/NCU_MIGRATION_GUIDE.md

The file …/NCU_MIGRATION_GUIDE.md provides guidance on migrating code to version 0.2.0 of Open Interpreter. It details changes to how Open Interpreter is instantiated and used in Python code, moving to a standard class format where Open Interpreter is instantiated via a class. All stateless LLM attributes have also been moved to a namespace.

The structure of static messages and streaming responses has been updated to use a flat list of message objects with standardized keys like role, type, and content. This new structure provides more modularity and ability to handle different media types. Examples show how to handle the new streaming structure in Python and TypeScript code.

Some key points covered include:

  • Open Interpreter is now instantiated via a class rather than a pre-instantiated object
  • All stateless LLM attributes have been moved to a namespace
  • Static messages and streaming responses now use a standardized message object format
  • Examples demonstrate processing the new streaming response format in various languages
  • Best practices like adding IDs, running code locally, and stopping streams are discussed

Support Documentation

References: docs/usage/desktop/help.md

The file …/help.md contains documentation on getting support for the desktop applications. It provides a plain text message directing users to email [email protected] for any support questions or issues.

This file serves as a central location for users to find support contact information. By centralizing support details in a documentation file, it ensures all users can easily access the appropriate channel to get assistance. The simple text format keeps the file lightweight and easily readable.

The help.md file contains no code, but provides a key resource for users needing support on desktop apps. Centralizing support contact information in documentation helps users quickly get the help they need from the development team.

Project Roadmaps

References: docs/ROADMAP.md

The …/ROADMAP.md file outlines the future plans and roadmap for Open Interpreter. It is split into several sections including documentation, new features, future-proofing, and completed tasks.

The roadmap discusses implementing new features such as multi-line input, displaying images, and data collection to improve the user experience. It also covers future-proofing the project through activities like testing, benchmarking on GAIA, and evaluating different language models. Ensuring the code works with various LLMs is important for continued support.

The scope of the core and terminal_interface projects is defined, with custom LLMs specified using a .supports_functions attribute. The interpreter.computer module exposes tools for running code in different languages via the interpreter.computer.run() function. Browser automation is discussed, along with an example of launching Chrome with remote debugging.

Security Information

References: docs/SECURITY.md

The file …/SECURITY.md details Open Interpreter's security policies and procedures for vulnerability reporting. It provides guidelines for responsibly disclosing any vulnerabilities privately to the developers first before public disclosure. This allows vulnerabilities to be validated and fixes coordinated, ensuring user privacy and protection.

The file states that responsible vulnerability disclosure is important. Any security advisories will be published on the GitHub Security Advisories page. To report a vulnerability, a security advisory draft should be created on GitHub following their guidelines. If a fix is discovered, a pull request should not be submitted - instead, the vulnerability should be reported and a temporary private fork may be used to collaborate on a patch.

The file links to relevant GitHub documentation for writing security advisories and collaborating privately on fixes. It does not define any code, but provides documentation on the process for responsibly reporting vulnerabilities.

Testing

References: tests

The comprehensive test suite in tests validates the core functionality of the Open Interpreter project. The main test file is …/test_interpreter.py, which contains a wide range of unit tests for the Interpreter class.

This file initializes an Interpreter instance and exercises the chat() method and computer interface to validate that message handling, code execution, I/O functionality, and markdown generation all operate as expected. Tests like test_generator() ensure messages are properly formatted with start/end flags, test_write_to_file() validates writing text to files, and test_markdown() exercises generating different markdown structures.

Helper utilities like setup_function() and teardown_function() isolate tests by resetting the interpreter state between runs, and count_tokens() measures token usage. The suite confirms the interpreter handles a variety of inputs and tasks correctly.

Utilities

References: interpreter/core/utils

The utilities modules provide common functionality used throughout the Open Interpreter project. These include modules for tasks like embedding text, searching embeddings, collecting debug information, handling temporary files, and scanning code snippets.

Many utility modules in …/utils implement reusable functionality. For example, temporary_file.py contains the create_temporary_file() and cleanup_temporary_file() functions for safely creating and deleting temporary files. This is used by scan_code.py to scan code snippets stored in temporary files.

system_debug_info.py defines several functions that aggregate system debugging information. This includes get_python_version(), get_pip_version(), get_oi_version(), get_os_version(), get_cpu_info(), get_ram_info(), and interpreter_info(). It centralizes collection of this data from various libraries.

The embed_function() in ARCHIVE_embed.py takes a query string and embeds it using the pretrained chroma_embedding_function model. This allows embedding text for tasks like search. ARCHIVE_vector_search.py contains the search() function, which calculates cosine distances between an embedded query and database to find the most similar matches.

Utility Functions

References: interpreter/core/utils/system_debug_info.py, interpreter/core/utils/get_user_info_string.py

The …/utils directory contains several common reusable utility functions used throughout the project, in modules like system_debug_info.py and get_user_info_string.py.

These utility modules abstract away differences between platforms and provide reusable implementations for common debugging and environment tasks. Programmers can leverage functions like system_info() and get_user_info_string() without worrying about specific OS or library details.

Code Scanning

References: interpreter/core/utils/scan_code.py

The scan_code function in the …/scan_code.py module is used to scan code snippets for issues. It takes in the code to scan, the language of the code, and an interpreter object. This allows it to get the language class for the provided language from the interpreter.

The language class provides properties like the file extension for that language, which scan_code uses to create a temporary file containing the code snippet. It stores the temporary file path using the create_temporary_file function from …/temporary_file.py.

If verbose mode is enabled, it will print messages about scanning the code. It then runs semgrep on the temporary file to scan for issues. A loading spinner from the yaspin module is displayed to provide feedback to the user while semgrep is running.

If semgrep returns no errors, a success message is printed indicating no issues were found. It cleans up the temporary file using cleanup_temporary_file after scanning. This removes the temporary file and handles cleanup.

scan_code provides a clean interface to leverage semgrep for code scanning. It handles lower level details like I/O and configuration behind the scenes, while presenting a simple API to scan code snippets for various languages. The use of temporary files allows it to scan code for any language supported by the interpreter.
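
A hedged sketch of that flow, reusing the temporary-file helpers described under File Handling Utilities (the semgrep arguments shown are the standard CLI invocation, not necessarily the module's exact one):

    import subprocess

    from yaspin import yaspin

    def scan_code(code, file_extension, verbose=False):
        path = create_temporary_file(code, extension=file_extension)
        try:
            if verbose:
                print(f"Scanning {path} with semgrep...")
            with yaspin(text="Scanning code..."):
                result = subprocess.run(
                    ["semgrep", "scan", "--config", "auto", path],
                    capture_output=True,
                    text=True,
                )
            if result.returncode == 0:
                print("No issues found in the code.")
        finally:
            cleanup_temporary_file(path)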

Embedding Utilities

References: interpreter/core/utils/ARCHIVE_embed.py

The …/ARCHIVE_embed.py file contains code for embedding text snippets. It imports the DefaultEmbeddingFunction from an external module to define the embedding function. Any errors when initializing the embedding function are ignored.

The main functionality is provided by the embed_function, which takes a query string as input. It passes the query to the chroma_embedding_function to get the embedding as a NumPy array. This array is then squeezed and converted to a list before being returned. So embed_function provides the core capability of embedding input text using the pretrained chroma_embedding_function model.

Search Utilities

References: interpreter/core/utils/ARCHIVE_vector_search.py

This section covers implementations of semantic search over embedded texts. The main functionality is provided by the search function in the …/ARCHIVE_vector_search.py file.

The search function takes a query string, a database mapping texts to their embeddings, an embedding function, and an optional number of results. It first embeds the query using the provided embedding function. It then calculates the cosine distance between the query embedding and each embedding in the database using the cosine distance function imported from chromadb.utils.distance_functions.

The distances are stored in a dictionary mapping each text to its distance from the query. This distances dictionary is then sorted by value to order the results from closest to furthest. Finally, it returns the top results by slicing the sorted distances dictionary. If the number of results is not provided, it defaults to the top 2 matches.

This implements a basic nearest neighbors search that finds the most similar matches in the database to the query based on cosine similarity between embeddings. The embedding function parameter allows embedding any text, making the search flexible for different models. This provides the core functionality for semantic search over embedded representations.
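
A self-contained sketch of this search, with a NumPy cosine distance standing in for the chromadb.utils.distance_functions import:

    import numpy as np

    def cosine_distance(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def search(query, db, embed_function, num_results=2):
        # Embed the query, measure its distance to every stored embedding,
        # and return the closest matches first
        query_embedding = embed_function(query)
        distances = {text: cosine_distance(query_embedding, emb) for text, emb in db.items()}
        ranked = sorted(distances, key=distances.get)
        return ranked[:num_results]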

File Handling Utilities

References: interpreter/core/utils

The …/utils directory contains utility modules for common file handling tasks. This includes functions for creating and deleting temporary files, as well as embedding and searching text.

The …/temporary_file.py module contains two key functions - create_temporary_file() and cleanup_temporary_file(). create_temporary_file() takes file contents and metadata like the file extension and uses the tempfile module to safely create a temporary file on disk. It writes the contents and returns the file path. cleanup_temporary_file() takes a temporary file path and attempts to delete the file, printing messages if successful or if an error occurs. These functions provide a simple and robust way to work with temporary files.
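
A sketch of what these two helpers might look like (exact defaults and messages are assumptions):

    import os
    import tempfile

    def create_temporary_file(contents, extension="txt"):
        # Write the contents to a named temp file and hand back its path
        with tempfile.NamedTemporaryFile(
            mode="w", suffix=f".{extension}", delete=False
        ) as f:
            f.write(contents)
            return f.name

    def cleanup_temporary_file(path):
        try:
            os.remove(path)
        except OSError as error:
            print(f"Could not remove temporary file {path}: {error}")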

The …/ARCHIVE_embed.py file initializes an embedding model for embedding text queries. It imports DefaultEmbeddingFunction and tries to initialize a chroma_embedding_function object, ignoring errors. The embed_function() takes a query string and uses chroma_embedding_function to embed it, returning the embedding as a list. This allows embedding text for tasks like search.

The …/ARCHIVE_vector_search.py file contains the search() function, which takes a query, database of embedded texts mapped to values, and an embedding function. It embeds the query, calculates cosine distances to each database embedding, sorts the distances to find the most similar values, and returns them. This implements basic nearest neighbors search over text embeddings.

Configuration

References: interpreter/terminal_interface/utils

The configuration module handles applying settings defined in YAML configuration files to customize various parameters and behaviors of the Open Interpreter application. These settings are loaded from configuration files located in the user's configuration directory, which is determined in a cross-platform way using the appdirs module.

The get_storage_path() function in …/local_storage_path.py is used to retrieve the base configuration directory path. It uses appdirs to determine the standard location, then returns full paths by joining subdirectories when requested.

The get_config() and get_config_path() functions in …/get_config.py handle loading the YAML configuration file contents. get_config_path() determines the file location, handling different cases like custom paths or defaults. get_config() retrieves the actual configuration after the path is resolved.

The key functionality is in applying the configuration settings. It:

  • Loads the YAML configuration file
  • Checks for any needed migrations between formats
  • Loops through the configuration keys
  • Uses setattr() to dynamically set attributes on the interpreter object matching each key name

This allows great flexibility in customizing various parts of the interpreter through YAML settings. Configuration values get mapped directly onto Python attributes at runtime.
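
The core of that mapping fits in a few lines; apply_config is a hypothetical name, and the migration check described above is omitted for brevity:

    import yaml

    def apply_config(interpreter, config_path):
        # Load the YAML file and set each key as an attribute on the
        # interpreter object, e.g. "auto_run: true" -> interpreter.auto_run
        with open(config_path) as f:
            config = yaml.safe_load(f) or {}
        for key, value in config.items():
            setattr(interpreter, key, value)
        return interpreter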

Checking for Updates

References: interpreter/terminal_interface/utils/check_for_update.py

The …/check_for_update.py file contains code to check PyPI for newer versions of the "open-interpreter" package. Encapsulating this logic in the single check_for_update() function gives the rest of the codebase a clean way to programmatically check for updates.

The check_for_update() function performs the following steps:

  • It makes a GET request to the PyPI API URL to fetch package metadata as JSON. This uses the requests module.

  • The response JSON is parsed, and the "version" field is extracted from the "info" key to obtain the latest latest_version string.

  • The pkg_resources module is used to get the currently installed current_version string.

  • A string comparison of latest_version and current_version is done to check if the latest available version is newer.

By keeping the version checking logic self-contained in this single function, it provides an easy way to reuse this check from other parts of the codebase. The use of third party modules like requests and pkg_resources also avoids needing to replicate version fetching or parsing logic.
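
A sketch of the described flow, using a version-aware comparison in place of the raw string comparison mentioned above:

    import pkg_resources
    import requests

    def check_for_update():
        # Fetch metadata for the latest published release from the PyPI JSON API
        response = requests.get("https://pypi.org/pypi/open-interpreter/json")
        latest_version = response.json()["info"]["version"]

        # Look up the locally installed version
        current_version = pkg_resources.get_distribution("open-interpreter").version

        # Report whether a newer release is available
        return pkg_resources.parse_version(latest_version) > pkg_resources.parse_version(current_version)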

Procedures Integration

References: interpreter/core/rag

The Open Procedures integration code allows the conversational AI system to query an external database of coding tutorials and procedures to retrieve recommendations that are relevant to the user's conversation. This is handled through functions that construct queries from conversation messages, interface with the Open Procedures API or local database, run searches, and format the results.

The …/rag directory contains the key logic. The get_relevant_procedures_string function handles querying the database based on the conversation. It uses the convert_to_openai_messages function to format messages before making a request to the Open Procedures API. Alternatively, it can search a local cached copy of the database stored in _procedures_db using the search function from the vector_search module. Relevant procedures are extracted from the response and returned as a formatted string.

The ARCHIVE_local_get_relevant_procedures_string.py file contains similar logic but additionally handles downloading and updating the _procedures_db cache if needed. It constructs a query string from messages and runs search on the local database copy. This allows querying when offline by caching the database files.

The key aspects are:

  1. Formatting conversation messages with convert_to_openai_messages

  2. Querying the Open Procedures database semantically based on message content using get_relevant_procedures_string, which interfaces with either the online API or the local cached database

  3. Running searches on the database with search

  4. Extracting and formatting relevant procedures from the response