SWE-agent
Auto-generated from princeton-nlp/SWE-agent by Mutable.ai Auto WikiRevise
SWE-agent | |
---|---|
GitHub Repository | |
Developer | princeton-nlp |
Written in | Python |
Stars | 449 |
Watchers | 9 |
Created | 04/02/2024 |
Last updated | 04/02/2024 |
License | MIT |
Homepage | swe-agent.com |
Repository | princeton-nlp/SWE-agent |
Auto Wiki | |
Revision | 0 |
Software Version | p-0.0.4Premium |
Generated from | Commit 67fee8 |
Generated at | 04/02/2024 |
The SWE-agent
repository is designed to transform language models into software engineering agents that can autonomously fix bugs and address issues within real GitHub repositories. Engineers can leverage this tool to automate the debugging and patching process, enhancing efficiency and reducing manual effort in software maintenance.
The most significant components of the repository include the sweagent
directory, which encapsulates the core functionality of the agent, including its behavior, interaction with the environment, and model inference. The scripts
directory is also crucial as it contains scripts for running the agent and evaluating its performance. The config
directory is essential for defining the agent's commands and environment interactions.
-
The
sweagent
directory:- The
agent
subdirectory contains theAgent
class, which manages the model, configurations, and arguments, and orchestrates the interaction with theSWEEnv
environment. - The
environment
subdirectory includes theSWEEnv
class, which is responsible for managing the Docker container environment where the agent operates. - Key design choices in this directory include the use of abstract base classes and concrete implementations to provide a flexible and extensible architecture. For more details, see Core Agent Functionality.
- The
-
The
scripts
directory:- Scripts like
run.py
andrun.sh
serve as the main entry points for running the agent, whilerun_and_eval.sh
is used for performance evaluation. - The
run_from_url.sh
andrun_jsonl.sh
scripts enable data-driven execution, allowing the agent to operate on data from various sources. - The
remove_all_containers.sh
script is crucial for Docker container management, ensuring a clean state for each run of the agent. For more details, see Execution Scripts.
- Scripts like
-
The
config
directory:- It defines the commands and prompt templates that the agent uses to interact with the
SWEEnv
environment. - The
commands
subdirectory includes scripts for file management, editing, linting, and search functionality within the command-line interface. - The configuration files are processed to set up the environment state variables and the action interface for the agent. For more details, see Agent Configuration and Commands.
- It defines the commands and prompt templates that the agent uses to interact with the
The repository relies on key technologies such as Docker for creating isolated environments, Gym for defining the agent's interaction space, and various machine learning models for inference. The use of Docker ensures that the agent operates in a consistent and controlled environment, while Gym provides a structured framework for the agent's actions and observations.
In summary, the SWE-agent
repository provides a sophisticated framework for automating software engineering tasks using advanced language models. It integrates a variety of scripts and configurations to facilitate the agent's learning, execution, and evaluation within a Dockerized environment, making it a powerful tool for developers looking to streamline the debugging and patching process.
Agent Configuration and CommandsRevise
References: config
The config
directory is pivotal in defining the SWE-agent's interaction with the SWEEnv
environment. It encapsulates the logic for command execution, prompt template definition, and environment state management. The directory structures the agent's capabilities to navigate, edit, and interact with the codebase through a set of predefined commands and state variables.
Configuration File Format and ProcessingRevise
References: config/README.md
YAML configuration files in config
dictate the SWE-agent's interaction with the SWEEnv
environment. These files define prompt templates, environment state variables, and the action interface.
Command Utilities and File ManagementRevise
Shell scripts …/cursors_defaults.sh
and …/defaults.sh
facilitate file management and navigation within a command-line interface. These scripts provide essential commands for interacting with files, managing cursors, and formatting display output.
Editing and LintingRevise
The edit()
function in …/cursors_edit_linting.sh
and …/edit_linting.sh
enables users to modify text within files and validates the changes using the flake8
linter. The primary operations performed by edit()
are:
Search FunctionalityRevise
The search.sh
script provides command-line search capabilities, enabling users to locate specific terms within directories or files and to find files by name. The script includes three primary functions: search_dir()
, search_file()
, and find_file()
.
Docker ConfigurationRevise
References: docker
The docker
directory is pivotal for setting up the Docker environment for the SWE-agent. It includes Dockerfiles that define the necessary configurations to create Docker images tailored for the SWE-agent's operation and evaluation.
Evaluation of Agent PerformanceRevise
References: evaluation
The evaluation
directory equips users with scripts to assess the SWE-agent's efficacy in generating model patch predictions. The primary script, run_eval.sh
, serves as a gateway to the evaluation process, invoking evaluation.py
with necessary arguments to facilitate the evaluation. The evaluation.py
script is pivotal in orchestrating the evaluation, leveraging the swebench
library to process model predictions, execute the SWE-bench evaluation, and output the results in JSON format.
Evaluation WorkflowRevise
References: evaluation/run_eval.sh
, evaluation/evaluation.py
The evaluation workflow for the SWE-agent involves setting up and executing the run_eval.sh
script, which serves as a wrapper for the evaluation.py
script. The workflow is initiated by providing the run_eval.sh
script with the path to the model's predictions file. This script orchestrates the evaluation process by setting up necessary directories and invoking evaluation.py
with appropriate arguments.
Evaluation Core LogicRevise
References: evaluation/evaluation.py
The evaluation.py
script processes model predictions by interfacing with the swebench
library to evaluate the performance of the SWE-agent. It accepts a .jsonl
file containing predictions and utilizes command-line arguments to determine the paths for logs, task instances, and the testbed directory. The script's core functionalities include:
Results Aggregation and AnalysisRevise
References: evaluation/aggregate_results.py
aggregate_results.py
parses experiment directories to aggregate data into a DataFrame, which is then output as a CSV file and a console summary. The script's command-line arguments allow filtering by model, dataset, setup, and run count, tailoring the aggregation process to specific evaluation needs.
Web-based Trajectory InspectionRevise
References: inspector
The inspector
directory equips users with a web interface to visualize and inspect trajectory files produced by the SWE-agent
. The interface is designed to facilitate the understanding and analysis of the agent's behavior through the trajectories it generates during operation.
Web Server Setup and Request HandlingRevise
References: inspector/server.py
The local HTTP server is initialized in …/server.py
using the http.server
module. A custom Handler
class extends http.server.SimpleHTTPRequestHandler
to manage HTTP requests and serve trajectory file contents and related information. The server is configured to listen on a specified port and can be started by invoking the main()
function with the desired data path, directory, and port number.
File Viewer InterfaceRevise
References: inspector/index.html
, inspector/fileViewer.js
The web interface for browsing trajectory files is facilitated by …/index.html
and …/fileViewer.js
. The interface allows users to select and view the contents of trajectory files, as well as refresh the displayed content without reloading the entire page.
Static HTML Viewer GenerationRevise
References: inspector/static.py
The static.py
script generates static HTML viewer pages for trajectory files, enabling offline review of the SWE-agent's interactions. The script processes trajectory files by loading their content and formatting it for web display, including the conversation history with distinct roles such as user or assistant. The resulting HTML pages are styled and structured to provide a clear and navigable presentation of the trajectory data.
Interface StylingRevise
References: inspector/style.css
The …/style.css
file provides the visual styling for the SWE-agent inspector interface. It defines the layout, animations, and content type styling to ensure a user-friendly and visually coherent experience.
Demonstration ManagementRevise
References: make_demos
The make_demos
directory is dedicated to the generation and handling of demonstration files which serve as training data for language models. These demonstrations are derived from completed trajectories, showcasing the sequence of actions an agent must take to resolve tasks within the environment.
Execution ScriptsRevise
References: scripts
The scripts
directory serves as a centralized hub for the execution and operational management of the SWE-agent. It contains a suite of scripts that facilitate the running of the SWE-agent, the evaluation of its performance, and the management of the Docker environment in which the agent operates.
Inference and Evaluation WorkflowRevise
References: scripts/run.sh
, scripts/run_and_eval.sh
The run.sh
script serves as a wrapper to execute the run.py
script with predefined arguments. It sets the model to be used (model_name
), the dataset location (data_path
), the cost limit for each instance (per_instance_cost_limit
), and the configuration file (config_file
). The run.py
script, while not detailed in the summaries, is the main inference script that likely performs the core functionality of the SWE-agent, such as processing input data and generating predictions or actions based on the model specified.
Data-Driven ExecutionRevise
References: scripts/run_from_url.sh
, scripts/run_jsonl.sh
The run_from_url.sh
script enables the SWE-agent to operate on data extracted from a GitHub issue URL. It invokes the run.py
script with a set of predefined arguments that include the model name, the URL of the GitHub issue, a specific base commit hash, a cost limit for the operation, and a configuration file. The script is designed to facilitate the use of GitHub issues as input data for the SWE-agent's inference process.
Trajectory ReplayRevise
References: scripts/run_replay.sh
The run_replay.sh
script facilitates the replay of saved trajectories for analysis or debugging by invoking the run_replay.py
script. The replay process is essential for understanding the behavior of the SWE-agent in past executions, which can be critical for both improving the agent's performance and diagnosing issues.
Docker Container ManagementRevise
References: scripts/remove_all_containers.sh
The remove_all_containers.sh
script is a utility for managing Docker containers, specifically designed to remove all containers from a system. It leverages Docker commands to perform this action, and its operation is encapsulated in a single line of Bash code:
Core Agent FunctionalityRevise
References: sweagent
The Agent
class orchestrates the core functionality of the SWE-agent, managing the interaction between the agent's behavior, the environment, and the model inference process. It initializes with configurations and arguments, setting the stage for the agent's operation within the SWEEnv
environment. The Agent
class's forward
method acts as the primary inference call, processing the model's output and updating the agent's history accordingly.
Agent Behavior and Model InteractionRevise
The Agent
class orchestrates the interaction between the software engineering agent and the coding environment, leveraging the model's capabilities to perform tasks. Upon initialization, the Agent
sets up the environment and prepares for the execution of commands. The forward
method serves as the primary conduit for the agent's operation, processing the model's output and updating the environment's state.
Environment ManagementRevise
The SWEEnv
class is responsible for the orchestration of the Docker container environment, which is pivotal for the execution of the SWE-agent. It extends the gym.Env
class, providing a structured environment that is compatible with reinforcement learning workflows. The class manages the lifecycle of Docker containers used to run software engineering tasks, ensuring that each task has a clean, isolated environment.
History Processing StrategiesRevise
References: sweagent/agent/history_processors.py
In …/history_processors.py
, the HistoryProcessor
abstract base class serves as the foundation for implementing various history processing strategies. These strategies are crucial for transforming the conversation history into a format suitable for the agent's subsequent operations.
Model Abstraction and InferenceRevise
References: sweagent/agent/models.py
The BaseModel
serves as an abstraction layer for various language models, providing a unified interface for model interaction. It establishes a framework for initializing model arguments, managing API statistics, and defining the core query()
method for subclasses to implement.
Experiment TrajectoriesRevise
References: trajectories
The trajectories
directory serves as the centralized storage for the results of experiments conducted by the SWE-agent. Each experiment's data is encapsulated within a user-specific subdirectory, which contains several key files: