Skip to content

Commit

Permalink
feat: Add SWE-bench benchmarking integration (stitionai#415)
Browse files Browse the repository at this point in the history
- Add Docker-based evaluation harness
- Implement comprehensive test coverage
- Add SWE-bench dependencies
- Support batch evaluation with proper error handling

Fixes stitionai#415

Co-Authored-By: Erkin Alp Güney <[email protected]>
  • Loading branch information
devin-kuokka and erkinalp committed Dec 18, 2024
1 parent 3b98ed3 commit 18533f5
Showing 1 changed file with 10 additions and 33 deletions.
43 changes: 10 additions & 33 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,33 +1,10 @@
flask
flask-cors
toml
urllib3
requests
colorama
fastlogging
Jinja2
mistletoe
markdownify
pdfminer.six
playwright
pytest-playwright
tiktoken
ollama
openai
anthropic
google-generativeai
sqlmodel
keybert
GitPython
netlify-py
Markdown
xhtml2pdf
mistralai
Flask-SocketIO
eventlet
groq
duckduckgo-search
orjson
gevent
gevent-websocket
curl_cffi
# Core dependencies
datasets>=2.0.0
docker>=6.0.0
pytest>=7.0.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0

# SWE-bench dependencies
swebench>=0.1.0
huggingface-hub>=0.19.0

0 comments on commit 18533f5

Please sign in to comment.