promptfoo is an open-source framework for testing and evaluating LLM outputs. It lets you compare DeepSeek models against other LLMs (such as o1, GPT-4o, Claude 3.5, Llama 3.3, and Gemini) and test LLMs and LLM applications for security vulnerabilities. You can:
- Run side-by-side comparisons between models
- Check output quality and consistency
- Generate test reports
- Install promptfoo:

  ```sh
  npm install -g promptfoo
  # or
  brew install promptfoo
  ```
- Configure API keys:

  ```sh
  export DEEPSEEK_API_KEY=your_api_key
  # Add other API keys as needed
  ```
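Before running an eval, it can save a failed run to confirm the required environment variables are actually set. The snippet below is a small pre-flight check, not part of promptfoo itself; the `REQUIRED_KEYS` list is an assumption based on the two providers used in this guide.

```python
import os

# Pre-flight check (not part of promptfoo): verify that the API-key
# environment variables for the providers in this guide are set.
REQUIRED_KEYS = ["DEEPSEEK_API_KEY", "OPENAI_API_KEY"]

def missing_keys(required=REQUIRED_KEYS):
    """Return the names of required key variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
    else:
        print("All provider keys are set.")
```

Add any other provider keys (e.g. for Anthropic or Google models) to the list if your config references them.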
Create a configuration file `promptfooconfig.yaml`:

```yaml
providers:
  - deepseek:deepseek-reasoner # DeepSeek-R1
  - openai:o1

prompts:
  - 'Solve this step by step: {{math_problem}}'

tests:
  - vars:
      math_problem: 'What is the derivative of x^3 + 2x with respect to x?'
    assert:
      - type: contains
        value: '3x^2' # Check for the correct answer
      - type: llm-rubric
        value: 'Response shows clear steps'
      - type: cost
        threshold: 0.05 # Maximum cost per request
```
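Conceptually, each assertion grades a model response independently. The sketch below illustrates what the `contains` and `cost` checks from the config do; it is not promptfoo's actual implementation, and the `llm-rubric` check is omitted because it would call a grading LLM in practice.

```python
# Illustrative sketch (not promptfoo's code) of how two of the
# assertions above conceptually grade a model response.

def check_contains(output: str, value: str) -> bool:
    # type: contains -- pass if the expected substring appears in the output
    return value in output

def check_cost(request_cost: float, threshold: float) -> bool:
    # type: cost -- pass if the request cost stays at or below the threshold
    return request_cost <= threshold

def grade(output: str, request_cost: float) -> dict:
    """Grade one response against the config's contains and cost assertions."""
    return {
        "contains_answer": check_contains(output, "3x^2"),
        "under_cost_cap": check_cost(request_cost, 0.05),
    }
```

For example, a response of `"The derivative is 3x^2 + 2."` that cost $0.01 passes both checks, while any request costing more than $0.05 fails the cost assertion regardless of content.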
Run the tests:

```sh
promptfoo eval
```
View the results in your browser:

```sh
promptfoo view
```
Check out our example that compares DeepSeek-R1 and o1 on MMLU.