v1.1.0: 10s of thousands of training trajectories

@klieret

v1.1.0: 10s of thousands of training trajectories

We're very excited to announce our new project SWE-smith, generating 10s of thousands of training trajectories for SWE agents.
Using this training data, our LM SWE-agent-LM-32b achieves open-weights SotA on SWE-bench verified with SWE-agent!

Apart from that, v1.1.0 is mostly a fix release with minor improvements, in particular adding compatibility with SWE-bench multilingual/multimodal, and SWE-smith. However, please pay attention to the breaking changes below.

Breaking changes

Changes to trajectory data format. The messages field is replaced by query by @klieret in #1107
Renamed many tool bundles that used "windowed" file viewer (defaults and more) by @klieret in #1147
Removed review_on_submit tool bundle (replaced by review_on_submit_m) by @klieret in #1148
Change in windowed tools (formerly default): Don't append \n to new file by @klieret in #1114

Added

New dataset support:

Feat: Support multilingual evaluation by @kabirgh in #1090
Feat: SWE-smith & multimodal base support by @klieret in #1092

New utilities:

Feat: Add quick-stats tool by @klieret in #1125

Enhanced

Feat: Config/override max_output_tokens by @klieret in #1036
Enh: [#1042] fix(run_batch): handle JSON parsing errors in trajectory check by @FRAOTIAC in #1043
Enh: Allow to override tools dirs etc. by @klieret in #1046
Enh: Allow to override path to swe-bench dataset by @klieret in #1093
Enh: Allow to disable python-standalone for batch by @klieret in #1115
Enh: More information on skipped exit status by @klieret in #1117

Fixed

Fix: Setting max_input_tokens to 0 by @klieret in #999
Fix: Explicitly set log file encoding by @klieret in #1013
Fix: Ensure pydantic-settings env prefix set by @klieret in #1018
Fix: run batch processing with modal by @vsee in #1023
Fix: Catch exit forfeit by @klieret in #1024
Fix: Use 'latest' image tag for SWE-Bench images by @klieret in #1029
Fix: Show tenacity retry reasons by @klieret in #1032
Fix: Compatibility with textual 2.0 by @klieret in #1033
Fix: Use default trajectories dir according to ENV by @vsee in #1054
Fix: fix Windows path error, replace Path with PurePosixPath or string by @alwaysgoodtime in #1052
Fix: Ensure tools PATH takes precedence by @klieret in #1058
Fix: Ensure state exists by @klieret in #1065
Fix spelling of 'agent' in hello world by @edspencer in #1077
Fix: Inspector needs to handle new message format by @klieret in #1094
Fix: SWEBenchInstances with path and no subset initiated as other instance type by @klieret in #1096
Fix: Token limit exceeded for PR body issue by @klieret in #1098
Fix: Work around litellm claude 3.7 tokens to 128k by @klieret in #1106
Fix(repo): Ensure absolute path for copy repo by @klieret in #1116
Fix execution time timeouts by @klieret in #1118
Fix: Hierarchical merge of multiple configs by @klieret in #1123
fix message type missing by @klieret in #1127
Fix: Conditional for warning about empty template by @klieret in #1137

New Contributors

@vsee made their first contribution in #1023
@FRAOTIAC made their first contribution in #1043
@jpaodev made their first contribution in #1050
@alwaysgoodtime made their first contribution in #1052
@alexgshaw made their first contribution in #1056
@talorabr made their first contribution in #1026
@katia-sentry made their first contribution in #1070
@edspencer made their first contribution in #1077
@kabirgh made their first contribution in #1090

Full Changelog: v1.0.1...v1.1.0

@klieret

SWE-agent 1.0.1

News: After our announcements for SOTA on SWE-Bench Lite and Verified, we now can claim SOTA on the full set of 2k GitHub issues of SWE-Bench full:

Interestingly, the improvement on the issues that are not also in the Lite/Verified subsets is much higher than the gain for the Lite/Verified subsets. Evaluating only on Lite/Verified doesn’t tell the whole story! -

What's Changed

This fixup release brings fixes mostly to the compatibility with local models. We have also significantly expanded the documentation in that aspect (models & keys documentation).

Changed

Change: Make anthropic_filemap the new default config by @klieret in #927

Added

Enh: Set timeout for post_startup_commands by @klieret in #973
Enh: Allow to override max_input_tokens for local models by @klieret in #992

Fixes

Fix: Handling local models cost lookup issues by @klieret in #937
Fix: Requires-python >= 3.11 by @klieret in #940
traj inspector viewport reset by @klieret in #946
Fix: Reset viewport when next/prev step/traj by @klieret in #948
Fix: Disable highlighting of model outputs by @klieret in #949
Fix: Create PRs by @klieret in #954
Fix: Add init,py to agent/hooks by @RNabel in #961
Fix: Pin textual to version 1.0.0 by @RNabel in #960
Fix: OpenAI API: Don't pass None tool_calls to the OpenAI API by @RNabel in #967
Fix: Forces platform to be linux/amd64 for swe-bench batch runs by @carlosejimenez in #942
Fix "TypeError: Cannot read properties of null (reading 'replace')" in Trajectory viewer by @0xba1a in #989
Fix: No retries if costs cannot be calculated by @klieret in #990
Fix: Race condition/size change during iteration by @klieret in #993
Fix: Handle total cost limit exceeded by @klieret in #994

New Contributors

@RNabel made their first contribution in #961
@dhruvji made their first contribution in #963
@0xba1a made their first contribution in #989

Full Changelog: v1.0.0...v1.0.1

@manya706

SWE-agent 1.0

News

So much new stuff! Here's a quick rundown of the cool new things you can do:

✨ Fast, massively parallel code execution with SWE-ReX.
✨ Run SWE-agent locally but execute code in the cloud (using modal, AWS, or anything else that runs SWE-ReX).
✨ Configurable retry mechanisms: Try multiple agent configurations, models, parameters, etc., then choose the best one.
✨ Flexible tool definitions with tool bundles.
✨ All language models supported using litellm (see models).
✨ Override any configuration option from the command line (see command line basics).
✨ New command line trajectory inspector to scroll few hundreds of trajectories with ease.
✨ New command line interface with subcommands for running over single issues, batches, and various utility commands.
✨ Greatly simplified and cleaned up codebase. In particular, the Agent class is now much easier to modify.

Read more about this in our 1.0 features & migration guide.

New Contributors

@manya706 made their first contribution in #787
@Prathamesh010 made their first contribution in #796
@magnimusprime made their first contribution in #813
@dependabot made their first contribution in #817
@Mefisto04 made their first contribution in #824
@acheshkov made their first contribution in #857
@yu-iskw made their first contribution in #881

Full Changelog: v0.7.0...v1.0.0

@samuela

SWE-agent is SOTA on offensive cybersecurity

SWE-agent EnIGMA (Enhanced Interactive Generative Model Agent) is SOTA on offensive cybersecurity challenges, with a 3.3x improvement over previous agents on the NYU CTF challenge dataset. The EnIGMA project introduces multiple novelties that are available to all use cases of SWE-agent, such as Interactive Agent Tools and a Summarizer to handle long outputs.

Major additions

Capability to run over CTF challenges
Interactive Agent Tools, including gdb
Summarizers to handle long outputs

Smaller additions

Add filemap command in the spirit of repomap by @samuela in #619
Create config to run human eval style challenges by @ofirpress in #658
Add claude 3.5 sonnet to models by @carlosejimenez in #601
Enh: Warn if scrolling >= 3 times by @klieret in #626
feat: support deepseek-coder LLM by @jcraftsman in #638
Enh: Make timeout for agent commands configurable by @klieret in #674
Add support for new gpt-4o-mini model by @ivan4722 in #693
Groq Models Integration by @MohammedNagdy in #721
Make log level configurable; add TRACE level by @klieret in #612

Fixes

Compatibility with SWE-bench 2.0 by @klieret in #671
ensure variables work in special command docstring by @forresty in #628
Important fix: Catch CostLimitExceeded in retry because of format/block by @klieret in #682
Fix: Handle empty traj in should_skip by @klieret in #616
Fix for end-marker communicate: Exit status always 0/invalid by @klieret in #644
Fix: Insufficient quoting of git commit message by @klieret in #646
Fix nonsensical trajectory formatting for PRs by @klieret in #647
Fix: sweunexpected keyword 'python_version' by @klieret in #692
Fix: Use LONG_TIMEOUT for pre_install commands by @klieret in #695
Fix: UnboundLocalError when catching decoding issue by @klieret in #709
Also create empty patch files for completeness by @klieret in #725
Fix: Raise ContextWindowExceeded instead of exit_cost by @klieret in #727
Fix: Deal with non-utf8 encoded bytes in comm by @klieret in #731
Fix: Handle spaces in repo names by @klieret in #734
Fix: Ensure utils is part of package by @klieret in #742
Fix: Submitting ' ' in human mode crashes container by @klieret in #749
Fix: Block su as command by @klieret in #752
Fix: SWE_AGENT_MODEL_MAX_RETRIES needs casting by @klieret in #757

New Contributors

🎉 @talorabr, @udiboy1209, @haoranxi, @NickNameInvalid, @rollingcoconut joined the team to build EnIGMA 🎉

@carlosejimenez made their first contribution in #601
@samefarrar made their first contribution in #606
@hubstrauss made their first contribution in #625
@samuela made their first contribution in #619
@forresty made their first contribution in #628
@jcraftsman made their first contribution in #638
@ivan4722 made their first contribution in #693
@JoshuaPurtell made their first contribution in #703
@MohammedNagdy made their first contribution in #721
@pdemro made their first contribution in #729

@klieret

This is (mostly) a patch release, in particular fixing several issues that had been introduced by the speed improvements of v0.7.0.
We also solve a bug where existing linter errors in a file left SWE-agent unable to edit (because of our lint-retry-loop).

Breaking changes

Change: sparse clone method is now correctly called "shallow" by @klieret in #591

Improved

Enh: Show commands when encountering timeout error by @klieret in #582
Enh: Configuration option to show time in log by @klieret in #583
Enh: Allow to configure LONG_TIMEOUT for SWEEnv by @klieret in #584
Enh: Always write log to traj directory by @klieret in #588

Fixed

fix docker.errors.NotFound by @klieret in #587
Fix: Revert to full clone method when needed by @klieret in #589
Fix: Refresh container_obj before querying status by @klieret in #590
Fixed #571 - show message that model arg is ignored in case of using Azure OpenAI by @jank in #592
Fix: Linting blocks for existing lint errors by @klieret in #593
Fix: Process done marker not found in read with timeout by @klieret in #596

@klieret

What's Changed

We sped up SWE-agent by 2x (timed with GPT4o). This is mostly due to faster communication with the running processes inside of the Docker container and other container setup & installation related improvements. Here are a few relevant PRs:

Switch to fast communicate and shallow clone by default by @klieret in #530
Change: Only wait 1s for docker to start by @klieret in #541
Feat: experimental shallow cloning by @klieret in #498
Enh: Start from clone of python conda environment for speedup by @klieret in #548
Enh: Use uv for editable install by default by @klieret in #547

Fixed

Web UI: Remove -n option to wait by @klieret in #487
Web UI: Kill the Flask server on exit. by @kwight in #479
Web UI: Avoid proxy errors on MacOS by @klieret in #506
Ensure container_name is reset for non-persistent containers by @klieret in #463
Fix: Do not allow persistent container with cache task imgs by @klieret in #551

Improved

Improve scrolling behavior in web UI by @anishfish2 in #420
Web UI: Render Markdown in agent feed messages. by @kwight in #486
Enh: Remove redundant 'saved traj to X' messages by @klieret in #528
Allow to disable config dump to log by @klieret in #537
Resolve relative paths to demonstrations and commands by @klieret in #444

New Contributors

@panozzaj made their first contribution in #476
@kwight made their first contribution in #482
@anishfish2 made their first contribution in #420
@ofirpress made their first contribution in #489
@milaiwi made their first contribution in #469
@burnettk made their first contribution in #533

Full Changelog: v0.5.0...v0.6.0

@ollmer

What's Changed

✨ The big news is our brand new documentation ✨

Secondly, @ollmer added a new flag --cache_task_images that will significantly speed up SWE-agent when running on the same environment/repository multiple times (no more waiting for cloning and installation!)

Breaking changes

We have reformatted our codebase. If you create a PR based on a previous commit, make sure you install our pre-commit hook to avoid merge-conflicts because of formatting. See our docs for more information.
Remove direct imports in __init__.py (you can no longer from sweagent import Agent by @klieret in #436

Added

Running the web UI is now supported when running swe-agent completely in docker
Speed up evaluation by caching task environments as docker images by @ollmer in #317

Improved

Add gpt-4o model by @raymyers in #344
Web: Allow to specify commit hash by @klieret in #358
Add default environment_setup config by @klieret in #351
Enh: Suppress openai logging; improve formatting of stats by @klieret in #416
Remove signal dependency by @klieret in #428
Do not use select if running on Windows by @klieret in #429
Use custom Config class to support env and keys.cfg (this allows passing keys as environment variables) by @klieret in #430

Fixes

Web: Fix script_path input by @klieret in #334
Fix: Don't print patch msg for exit_cost patch by @klieret in #343
Fix: Do not request job control in bash by @klieret in #345
Fix: --base_commit not used for gh urls by @klieret in #346
Fix: Separate data path/traj dir cause exception by @klieret in #348
Add docker-py lower bound by @klieret in #406
Fix: IndexError when replaying incomplete trajectories by @klieret in #410

New Contributors

@raymyers made their first contribution in #344
@nims11 made their first contribution in #332
@khangich made their first contribution in #274
@ollmer made their first contribution in #317

Full Changelog: v0.4.0...v0.5.0

@RainRat

What's Changed

We’re excited to launch the SWE-agent web UI! Specify a bug, press start and watch SWE-agent do the magic ✨

New Contributors

@tam-ng0905 made their first contribution in #321
@nonparibus made their first contribution in #310
@RainRat made their first contribution in #320

Full Changelog: v0.3.0...v0.4.0

@zgrannan

What's Changed

✨ Features

Run SWE-agent in the cloud using GitHub Codespaces
Add GPT4-turbo model by @zgrannan in #252
feat: Amazon Bedrock support (Claude models) by @JGalego in #207

🐛 Fixes

Better error handling for --open_pr by @klieret in #239
Fixed a potential error by @DanjieTang in #242
fix: TARGETARCH not set on some OS/docker setups by @mspronesti in #249
Pass Python version to get_environment_yml by @waterson in #271
Fix Together model validation error by @mikanfactory in #236
Doc: Avoid invalid github token by @klieret in #292

❤️ New Contributors

@DanjieTang made their first contribution in #242
@zgrannan made their first contribution in #252
@nfedyashev made their first contribution in #254
@JGalego made their first contribution in #207
@Borda made their first contribution in #256
@waterson made their first contribution in #271

Full Changelog: v0.2.0...v0.3.0

@klieret

What's Changed

Added

Allow to run on local repos (new flag: --repo_path) by @klieret in #193
Patch files are now saved separately to a patch directory by @klieret in #126
Allow to supply custom installation commands when running on gh issues or locally (--environment_setup) by @klieret in #153
Allow to specify openapi base url in keys.cfgby @bvandorf in #118

Improved

Improve error handling of docker issues by @klieret in #165
Make github token fully optional by @klieret in #189

Fixed

Fix opening PR from fork by @klieret in #229
Fix: Choosing TogetherAI models by @klieret in #130

New Contributors

@bvandorf made their first contribution in #118
@pre-commit-ci made their first contribution in #141
@moresearch made their first contribution in #147
@brandco made their first contribution in #155
@YeonwooSung made their first contribution in #72
@foragerr made their first contribution in #212
@zhipengzuo made their first contribution in #210
@mikanfactory made their first contribution in #218
@mspronesti made their first contribution in #216

Full Changelog: v0.1.2...v0.2.0

Releases: SWE-agent/SWE-agent