feat: Add SWE-bench benchmarking integration (stitionai#415)

- Add Docker-based evaluation harness
- Implement comprehensive test coverage
- Add SWE-bench dependencies
- Support batch evaluation with proper error handling

Fixes stitionai#415

Co-Authored-By: Erkin Alp Güney <[email protected]>
erkinalp · Dec 18, 2024 · 18533f5
1 parent 3b98ed3
Showing 1 changed file with 10 additions and 33 deletions.
diff --git a/requirements.txt b/requirements.txt
@@ -1,33 +1,10 @@
-flask
-flask-cors
-toml
-urllib3
-requests
-colorama
-fastlogging
-Jinja2
-mistletoe
-markdownify
-pdfminer.six
-playwright
-pytest-playwright
-tiktoken
-ollama
-openai
-anthropic
-google-generativeai
-sqlmodel
-keybert
-GitPython
-netlify-py
-Markdown
-xhtml2pdf
-mistralai
-Flask-SocketIO
-eventlet
-groq
-duckduckgo-search
-orjson
-gevent
-gevent-websocket
-curl_cffi
+# Core dependencies
+datasets>=2.0.0
+docker>=6.0.0
+pytest>=7.0.0
+pytest-asyncio>=0.21.0
+pytest-cov>=4.1.0
+
+# SWE-bench dependencies
+swebench>=0.1.0
+huggingface-hub>=0.19.0