Releases: SWE-agent/SWE-agent
v1.1.0: 10s of thousands of training trajectories
v1.1.0: 10s of thousands of training trajectories
We're very excited to announce our new project SWE-smith, generating 10s of thousands of training trajectories for SWE agents.
Using this training data, our LM SWE-agent-LM-32b achieves open-weights SotA on SWE-bench verified with SWE-agent!
Apart from that, v1.1.0 is mostly a fix release with minor improvements, in particular adding compatibility with SWE-bench multilingual/multimodal, and SWE-smith. However, please pay attention to the breaking changes below.
Breaking changes
- Changes to trajectory data format. The
messages
field is replaced byquery
by @klieret in #1107 - Renamed many tool bundles that used "windowed" file viewer (
defaults
and more) by @klieret in #1147 - Removed
review_on_submit
tool bundle (replaced byreview_on_submit_m
) by @klieret in #1148 - Change in
windowed
tools (formerlydefault
): Don't append \n to new file by @klieret in #1114
Added
New dataset support:
- Feat: Support multilingual evaluation by @kabirgh in #1090
- Feat: SWE-smith & multimodal base support by @klieret in #1092
New utilities:
Enhanced
- Feat: Config/override max_output_tokens by @klieret in #1036
- Enh: [#1042] fix(run_batch): handle JSON parsing errors in trajectory check by @FRAOTIAC in #1043
- Enh: Allow to override tools dirs etc. by @klieret in #1046
- Enh: Allow to override path to swe-bench dataset by @klieret in #1093
- Enh: Allow to disable python-standalone for batch by @klieret in #1115
- Enh: More information on skipped exit status by @klieret in #1117
Fixed
- Fix: Setting max_input_tokens to 0 by @klieret in #999
- Fix: Explicitly set log file encoding by @klieret in #1013
- Fix: Ensure pydantic-settings env prefix set by @klieret in #1018
- Fix: run batch processing with modal by @vsee in #1023
- Fix: Catch exit forfeit by @klieret in #1024
- Fix: Use 'latest' image tag for SWE-Bench images by @klieret in #1029
- Fix: Show tenacity retry reasons by @klieret in #1032
- Fix: Compatibility with textual 2.0 by @klieret in #1033
- Fix: Use default trajectories dir according to ENV by @vsee in #1054
- Fix: fix Windows path error, replace Path with PurePosixPath or string by @alwaysgoodtime in #1052
- Fix: Ensure tools PATH takes precedence by @klieret in #1058
- Fix: Ensure state exists by @klieret in #1065
- Fix spelling of 'agent' in hello world by @edspencer in #1077
- Fix: Inspector needs to handle new message format by @klieret in #1094
- Fix: SWEBenchInstances with path and no subset initiated as other instance type by @klieret in #1096
- Fix: Token limit exceeded for PR body issue by @klieret in #1098
- Fix: Work around litellm claude 3.7 tokens to 128k by @klieret in #1106
- Fix(repo): Ensure absolute path for copy repo by @klieret in #1116
- Fix execution time timeouts by @klieret in #1118
- Fix: Hierarchical merge of multiple configs by @klieret in #1123
- fix message type missing by @klieret in #1127
- Fix: Conditional for warning about empty template by @klieret in #1137
New Contributors
- @vsee made their first contribution in #1023
- @FRAOTIAC made their first contribution in #1043
- @jpaodev made their first contribution in #1050
- @alwaysgoodtime made their first contribution in #1052
- @alexgshaw made their first contribution in #1056
- @talorabr made their first contribution in #1026
- @katia-sentry made their first contribution in #1070
- @edspencer made their first contribution in #1077
- @kabirgh made their first contribution in #1090
Full Changelog: v1.0.1...v1.1.0
v1.0.1: SOTA on SWE-Bench Full
SWE-agent 1.0.1
News: After our announcements for SOTA on SWE-Bench Lite and Verified, we now can claim SOTA on the full set of 2k GitHub issues of SWE-Bench full:
Interestingly, the improvement on the issues that are not also in the Lite/Verified subsets is much higher than the gain for the Lite/Verified subsets. Evaluating only on Lite/Verified doesnโt tell the whole story! -
What's Changed
This fixup release brings fixes mostly to the compatibility with local models. We have also significantly expanded the documentation in that aspect (models & keys documentation).
Changed
Added
- Enh: Set timeout for post_startup_commands by @klieret in #973
- Enh: Allow to override max_input_tokens for local models by @klieret in #992
Fixes
- Fix: Handling local models cost lookup issues by @klieret in #937
- Fix: Requires-python >= 3.11 by @klieret in #940
- traj inspector viewport reset by @klieret in #946
- Fix: Reset viewport when next/prev step/traj by @klieret in #948
- Fix: Disable highlighting of model outputs by @klieret in #949
- Fix: Create PRs by @klieret in #954
- Fix: Add init,py to agent/hooks by @RNabel in #961
- Fix: Pin textual to version 1.0.0 by @RNabel in #960
- Fix: OpenAI API: Don't pass None tool_calls to the OpenAI API by @RNabel in #967
- Fix: Forces platform to be linux/amd64 for swe-bench batch runs by @carlosejimenez in #942
- Fix "TypeError: Cannot read properties of null (reading 'replace')" in Trajectory viewer by @0xba1a in #989
- Fix: No retries if costs cannot be calculated by @klieret in #990
- Fix: Race condition/size change during iteration by @klieret in #993
- Fix: Handle total cost limit exceeded by @klieret in #994
New Contributors
- @RNabel made their first contribution in #961
- @dhruvji made their first contribution in #963
- @0xba1a made their first contribution in #989
Full Changelog: v1.0.0...v1.0.1
v1.0.0
SWE-agent 1.0
News
So much new stuff! Here's a quick rundown of the cool new things you can do:
โจ Fast, massively parallel code execution with SWE-ReX.
โจ Run SWE-agent locally but execute code in the cloud (using modal, AWS, or anything else that runs SWE-ReX).
โจ Configurable retry mechanisms: Try multiple agent configurations, models, parameters, etc., then choose the best one.
โจ Flexible tool definitions with tool bundles.
โจ All language models supported using litellm (see models).
โจ Override any configuration option from the command line (see command line basics).
โจ New command line trajectory inspector to scroll few hundreds of trajectories with ease.
โจ New command line interface with subcommands for running over single issues, batches, and various utility commands.
โจ Greatly simplified and cleaned up codebase. In particular, the Agent class is now much easier to modify.
Read more about this in our 1.0 features & migration guide.
New Contributors
- @manya706 made their first contribution in #787
- @Prathamesh010 made their first contribution in #796
- @magnimusprime made their first contribution in #813
- @dependabot made their first contribution in #817
- @Mefisto04 made their first contribution in #824
- @acheshkov made their first contribution in #857
- @yu-iskw made their first contribution in #881
Full Changelog: v0.7.0...v1.0.0
SWE-agent EnIGMA (0.7.0)
SWE-agent is SOTA on offensive cybersecurity
SWE-agent EnIGMA (Enhanced Interactive Generative Model Agent) is SOTA on offensive cybersecurity challenges, with a 3.3x improvement over previous agents on the NYU CTF challenge dataset. The EnIGMA project introduces multiple novelties that are available to all use cases of SWE-agent, such as Interactive Agent Tools and a Summarizer to handle long outputs.
Major additions
- Capability to run over CTF challenges
- Interactive Agent Tools, including
gdb
- Summarizers to handle long outputs
Smaller additions
- Add filemap command in the spirit of repomap by @samuela in #619
- Create config to run human eval style challenges by @ofirpress in #658
- Add claude 3.5 sonnet to models by @carlosejimenez in #601
- Enh: Warn if scrolling >= 3 times by @klieret in #626
- feat: support deepseek-coder LLM by @jcraftsman in #638
- Enh: Make timeout for agent commands configurable by @klieret in #674
- Add support for new gpt-4o-mini model by @ivan4722 in #693
- Groq Models Integration by @MohammedNagdy in #721
- Make log level configurable; add TRACE level by @klieret in #612
Fixes
- Compatibility with SWE-bench 2.0 by @klieret in #671
- ensure variables work in special command docstring by @forresty in #628
- Important fix: Catch CostLimitExceeded in retry because of format/block by @klieret in #682
- Fix: Handle empty traj in should_skip by @klieret in #616
- Fix for end-marker communicate: Exit status always 0/invalid by @klieret in #644
- Fix: Insufficient quoting of git commit message by @klieret in #646
- Fix nonsensical trajectory formatting for PRs by @klieret in #647
- Fix: sweunexpected keyword 'python_version' by @klieret in #692
- Fix: Use LONG_TIMEOUT for pre_install commands by @klieret in #695
- Fix: UnboundLocalError when catching decoding issue by @klieret in #709
- Also create empty patch files for completeness by @klieret in #725
- Fix: Raise ContextWindowExceeded instead of exit_cost by @klieret in #727
- Fix: Deal with non-utf8 encoded bytes in comm by @klieret in #731
- Fix: Handle spaces in repo names by @klieret in #734
- Fix: Ensure utils is part of package by @klieret in #742
- Fix: Submitting ' ' in human mode crashes container by @klieret in #749
- Fix: Block su as command by @klieret in #752
- Fix: SWE_AGENT_MODEL_MAX_RETRIES needs casting by @klieret in #757
New Contributors
๐ @talorabr, @udiboy1209, @haoranxi, @NickNameInvalid, @rollingcoconut joined the team to build EnIGMA ๐
- @carlosejimenez made their first contribution in #601
- @samefarrar made their first contribution in #606
- @hubstrauss made their first contribution in #625
- @samuela made their first contribution in #619
- @forresty made their first contribution in #628
- @jcraftsman made their first contribution in #638
- @ivan4722 made their first contribution in #693
- @JoshuaPurtell made their first contribution in #703
- @MohammedNagdy made their first contribution in #721
- @pdemro made their first contribution in #729
v0.6.1
This is (mostly) a patch release, in particular fixing several issues that had been introduced by the speed improvements of v0.7.0.
We also solve a bug where existing linter errors in a file left SWE-agent unable to edit (because of our lint-retry-loop).
Breaking changes
Improved
- Enh: Show commands when encountering timeout error by @klieret in #582
- Enh: Configuration option to show time in log by @klieret in #583
- Enh: Allow to configure LONG_TIMEOUT for SWEEnv by @klieret in #584
- Enh: Always write log to traj directory by @klieret in #588
Fixed
- fix
docker.errors.NotFound
by @klieret in #587 - Fix: Revert to full clone method when needed by @klieret in #589
- Fix: Refresh container_obj before querying status by @klieret in #590
- Fixed #571 - show message that model arg is ignored in case of using Azure OpenAI by @jank in #592
- Fix: Linting blocks for existing lint errors by @klieret in #593
- Fix: Process done marker not found in read with timeout by @klieret in #596
v0.6.0
What's Changed
We sped up SWE-agent by 2x (timed with GPT4o). This is mostly due to faster communication with the running processes inside of the Docker container and other container setup & installation related improvements. Here are a few relevant PRs:
- Switch to fast communicate and shallow clone by default by @klieret in #530
- Change: Only wait 1s for docker to start by @klieret in #541
- Feat: experimental shallow cloning by @klieret in #498
- Enh: Start from clone of python conda environment for speedup by @klieret in #548
- Enh: Use uv for editable install by default by @klieret in #547
Fixed
- Web UI: Remove -n option to wait by @klieret in #487
- Web UI: Kill the Flask server on exit. by @kwight in #479
- Web UI: Avoid proxy errors on MacOS by @klieret in #506
- Ensure container_name is reset for non-persistent containers by @klieret in #463
- Fix: Do not allow persistent container with cache task imgs by @klieret in #551
Improved
- Improve scrolling behavior in web UI by @anishfish2 in #420
- Web UI: Render Markdown in agent feed messages. by @kwight in #486
- Enh: Remove redundant 'saved traj to X' messages by @klieret in #528
- Allow to disable config dump to log by @klieret in #537
- Resolve relative paths to demonstrations and commands by @klieret in #444
New Contributors
- @panozzaj made their first contribution in #476
- @kwight made their first contribution in #482
- @anishfish2 made their first contribution in #420
- @ofirpress made their first contribution in #489
- @milaiwi made their first contribution in #469
- @burnettk made their first contribution in #533
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
โจ The big news is our brand new documentation โจ
Secondly, @ollmer added a new flag --cache_task_images
that will significantly speed up SWE-agent when running on the same environment/repository multiple times (no more waiting for cloning and installation!)
Breaking changes
- We have reformatted our codebase. If you create a PR based on a previous commit, make sure you install our
pre-commit
hook to avoid merge-conflicts because of formatting. See our docs for more information. - Remove direct imports in
__init__.py
(you can no longerfrom sweagent import Agent
by @klieret in #436
Added
- Running the web UI is now supported when running swe-agent completely in docker
- Speed up evaluation by caching task environments as docker images by @ollmer in #317
Improved
- Add gpt-4o model by @raymyers in #344
- Web: Allow to specify commit hash by @klieret in #358
- Add default environment_setup config by @klieret in #351
- Enh: Suppress openai logging; improve formatting of stats by @klieret in #416
- Remove signal dependency by @klieret in #428
- Do not use select if running on Windows by @klieret in #429
- Use custom Config class to support env and keys.cfg (this allows passing keys as environment variables) by @klieret in #430
Fixes
- Web: Fix script_path input by @klieret in #334
- Fix: Don't print patch msg for exit_cost patch by @klieret in #343
- Fix: Do not request job control in bash by @klieret in #345
- Fix: --base_commit not used for gh urls by @klieret in #346
- Fix: Separate data path/traj dir cause exception by @klieret in #348
- Add docker-py lower bound by @klieret in #406
- Fix: IndexError when replaying incomplete trajectories by @klieret in #410
New Contributors
- @raymyers made their first contribution in #344
- @nims11 made their first contribution in #332
- @khangich made their first contribution in #274
- @ollmer made their first contribution in #317
Full Changelog: v0.4.0...v0.5.0
0.4.0 Web UI
What's Changed
Weโre excited to launch the SWE-agent web UI! Specify a bug, press start and watch SWE-agent do the magic โจ
New Contributors
- @tam-ng0905 made their first contribution in #321
- @nonparibus made their first contribution in #310
- @RainRat made their first contribution in #320
Full Changelog: v0.3.0...v0.4.0
0.3.0
What's Changed
โจ Features
- Run SWE-agent in the cloud using GitHub Codespaces
- Add GPT4-turbo model by @zgrannan in #252
- feat: Amazon Bedrock support (Claude models) by @JGalego in #207
๐ Fixes
- Better error handling for --open_pr by @klieret in #239
- Fixed a potential error by @DanjieTang in #242
- fix: TARGETARCH not set on some OS/docker setups by @mspronesti in #249
- Pass Python version to get_environment_yml by @waterson in #271
- Fix Together model validation error by @mikanfactory in #236
- Doc: Avoid invalid github token by @klieret in #292
โค๏ธ New Contributors
- @DanjieTang made their first contribution in #242
- @zgrannan made their first contribution in #252
- @nfedyashev made their first contribution in #254
- @JGalego made their first contribution in #207
- @Borda made their first contribution in #256
- @waterson made their first contribution in #271
Full Changelog: v0.2.0...v0.3.0
v0.2.0
What's Changed
Added
- Allow to run on local repos (new flag:
--repo_path
) by @klieret in #193 - Patch files are now saved separately to a patch directory by @klieret in #126
- Allow to supply custom installation commands when running on gh issues or locally (
--environment_setup
) by @klieret in #153 - Allow to specify openapi base url in
keys.cfg
by @bvandorf in #118
Improved
- Improve error handling of docker issues by @klieret in #165
- Make github token fully optional by @klieret in #189
Fixed
New Contributors
- @bvandorf made their first contribution in #118
- @pre-commit-ci made their first contribution in #141
- @moresearch made their first contribution in #147
- @brandco made their first contribution in #155
- @YeonwooSung made their first contribution in #72
- @foragerr made their first contribution in #212
- @zhipengzuo made their first contribution in #210
- @mikanfactory made their first contribution in #218
- @mspronesti made their first contribution in #216
Full Changelog: v0.1.2...v0.2.0