A React-based A/B testing platform for comparing images using LLM-based evaluation with statistical analysis.
- Image Comparison: Upload two images side-by-side for comparison
- LLM Evaluation: Ask questions about images and get ratings (1-10 scale)
- Multiple Providers: Support for Mock (simulated) and Ollama (local LLM) providers
- Statistical Analysis: Welch's t-test with configurable sample sizes
- Real-time Progress: Track experiment progress with live updates
- Node.js 18+
- (Optional) Ollama with a vision model for real LLM evaluation
npm install
npm run dev
Open http://localhost:5173 in your browser.
To use real LLM evaluation instead of mock responses:
- Install Ollama from https://ollama.ai/
- Pull a vision model:
ollama pull llama3.2-vision
- Start Ollama:
ollama serve
- Select "Ollama (Local)" from the provider dropdown in the app
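For orientation, a provider could talk to a local Ollama instance roughly as follows. This is a minimal sketch, not the code in `src/services/llm/providers/ollama.js`. It assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint; the helper names `buildOllamaRequest`, `parseRating`, and `rateImage`, and the wording appended to the prompt, are hypothetical.

```javascript
// Assumption: Ollama is listening on its default local port.
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Build the JSON body for a single, non-streaming vision request.
// `imageBase64` is the raw base64 payload, without a "data:image/...;base64," prefix.
function buildOllamaRequest(question, imageBase64, model = "llama3.2-vision") {
  return {
    model,
    prompt: `${question} Answer with a single integer from 1 to 10.`,
    images: [imageBase64],
    stream: false,
  };
}

// Extract the first integer between 1 and 10 from free-form model output.
// Returns null when no usable rating is found.
function parseRating(text) {
  const match = String(text).match(/\b(10|[1-9])\b/);
  return match ? Number(match[1]) : null;
}

// Send one question about one image and return the parsed rating (or null).
async function rateImage(question, imageBase64) {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOllamaRequest(question, imageBase64)),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return parseRating(data.response);
}
```

Because model output is free-form text, a real provider would likely need to retry or discard responses for which `parseRating` returns `null`.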
- Upload two images by clicking or dragging into the upload areas
- Enter a question (e.g., "How much on a scale from 1 to 10 do you like this image?")
- Select your LLM provider (Mock or Ollama)
- Choose a sample size (10, 30, or 50 queries per image)
- Click "Run Experiment"
- View results including mean ratings, statistical analysis, and significance conclusion
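The steps above amount to asking the same question repeatedly for each image and collecting the ratings. A minimal sketch of that loop is shown here, assuming a provider object with an async `rate(image, question)` method. The names `createMockProvider`, `runExperiment`, and `onProgress` are hypothetical and are not taken from `useExperiment.js`.

```javascript
// Hypothetical mock provider: returns a pseudo-random integer rating from 1 to 10.
// `random` is injectable so runs can be made reproducible in tests.
function createMockProvider(random = Math.random) {
  return {
    async rate(_image, _question) {
      return 1 + Math.floor(random() * 10);
    },
  };
}

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Ask the same question `sampleSize` times per image and collect the ratings.
// `onProgress(done, total)` is called after every completed query.
async function runExperiment({
  provider,
  imageA,
  imageB,
  question,
  sampleSize,
  onProgress = () => {},
}) {
  const ratings = { A: [], B: [] };
  const total = sampleSize * 2;
  let done = 0;
  for (const [key, image] of [["A", imageA], ["B", imageB]]) {
    for (let i = 0; i < sampleSize; i++) {
      ratings[key].push(await provider.rate(image, question));
      onProgress(++done, total);
    }
  }
  return { ratings, meanA: mean(ratings.A), meanB: mean(ratings.B) };
}
```

Queries run sequentially in this sketch, which keeps the progress counter simple and avoids overloading a local model; a real implementation might batch them instead.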
The platform uses Welch's t-test (independent samples) with:
- Alpha level: 0.05 (5% significance)
- Effect size: Cohen's d
- No assumption of equal variances
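As an illustration of these quantities, the sketch below computes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, and Cohen's d for two arrays of ratings. It is not the code in `statistics.js`: the function names are hypothetical, Cohen's d is assumed to use the pooled standard deviation, and the p-value is omitted because it requires the Student t distribution's CDF.

```javascript
function sampleMean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const m = sampleMean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t-test quantities for two independent samples.
// Note: if both samples have zero variance, t, df, and d are not finite.
function welchTest(a, b) {
  const [n1, n2] = [a.length, b.length];
  const [m1, m2] = [sampleMean(a), sampleMean(b)];
  const [v1, v2] = [sampleVariance(a), sampleVariance(b)];

  // Squared standard error contributed by each sample.
  const se1 = v1 / n1;
  const se2 = v2 / n2;

  const t = (m1 - m2) / Math.sqrt(se1 + se2);

  // Welch-Satterthwaite approximation of the degrees of freedom.
  const df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1));

  // Cohen's d with the pooled standard deviation (an assumed variant).
  const pooledSd = Math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2));
  const d = (m1 - m2) / pooledSd;

  return { t, df, d };
}
```

For example, `welchTest([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` gives t ≈ -1.897, df ≈ 5.88, and d = -1.2. The result is significant at alpha = 0.05 when the two-sided p-value derived from t and df falls below 0.05.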
src/
├── App.jsx                  # Main UI component
├── App.css                  # Styles
├── hooks/
│   └── useExperiment.js     # Experiment orchestration hook
└── services/
    ├── llm/
    │   ├── index.js         # LLM provider factory
    │   └── providers/
    │       ├── mock.js      # Mock provider
    │       └── ollama.js    # Ollama provider
    └── statistics.js        # Statistical functions
MIT
