Ad Copy Pipeline

DSPy + LLM-as-Judge Optimization

Automatic prompt optimization for ad copy generation. DSPy modules, synthetic data, LLM-as-judge evaluation, and MIPROv2 optimization.

Overview

A production pipeline for ad copy that optimizes itself. DSPy modules define the generation logic, synthetic data provides training examples, an LLM judge scores quality, and MIPROv2 rewrites prompts to maximize scores.

Challenge

Prompt engineering is manual and slow. You write a prompt, test it, tweak it, repeat. I wanted to automate the optimization loop: define what good looks like, then let the system find prompts that produce it.

Approach

Built DSPy signatures defining inputs (product, audience, brand voice, objective) and outputs (ad copy). This separates logic from prompt text.

Created an LLM-as-judge metric scoring ads on relevance, brand alignment, persuasion, and compliance (1-5 each). This is the optimization target.
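One way the four 1-5 scores could be folded into a single number for the optimizer is sketched below. The equal weighting, the rescaling to 0..1, and the `judge` callable (a stand-in for the actual LLM call) are all assumptions; the write-up does not say how the scores are combined.

```python
from typing import Callable, Dict

CRITERIA = ("relevance", "brand_alignment", "persuasion", "compliance")


def combine_scores(scores: Dict[str, int]) -> float:
    """Average the four 1-5 scores and rescale to the range 0..1.

    Equal weighting is an assumption; a real metric might weight
    compliance more heavily or treat it as a hard gate.
    """
    for name in CRITERIA:
        value = scores[name]
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    mean = sum(scores[name] for name in CRITERIA) / len(CRITERIA)
    return (mean - 1) / 4


def make_metric(judge: Callable[[object, object], Dict[str, int]]):
    """Build a metric with the (example, pred, trace) shape DSPy optimizers expect.

    `judge` is a hypothetical callable that asks an LLM to rate the
    predicted ad and returns one integer score per criterion.
    """

    def metric(example, pred, trace=None) -> float:
        return combine_scores(judge(example, pred))

    return metric
```

Keeping the aggregation separate from the LLM call makes the scoring rule testable without spending tokens.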

Generated 50+ synthetic training examples with GPT-4o-mini at high temperature. The high temperature increases variety in the examples, which helps the optimized prompts generalize beyond a narrow set of products and audiences.
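The generation loop might look roughly like the sketch below. The prompt wording, the JSON format, the deduplication step, and the `complete` callable (a stand-in for a GPT-4o-mini call at high temperature) are assumptions for illustration, not the project's actual code.

```python
import json
import random
from typing import Callable, Dict, List, Sequence

FIELDS = ("product", "audience", "brand_voice", "objective")


def build_prompt(topic: str) -> str:
    """Ask the model for one ad brief as a JSON object (wording is illustrative)."""
    return (
        f"Invent one realistic advertising brief in the category '{topic}'. "
        f"Reply with a JSON object containing the keys: {', '.join(FIELDS)}."
    )


def parse_brief(raw: str) -> Dict[str, str]:
    """Parse the model reply and check that every expected field is present."""
    data = json.loads(raw)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {f: str(data[f]).strip() for f in FIELDS}


def generate_briefs(
    complete: Callable[[str], str],
    topics: Sequence[str],
    n: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Collect up to n distinct briefs, skipping malformed or duplicate replies."""
    rng = random.Random(seed)
    briefs: List[Dict[str, str]] = []
    seen = set()
    attempts = 0
    while len(briefs) < n and attempts < n * 5:  # cap retries so the loop always ends
        attempts += 1
        raw = complete(build_prompt(rng.choice(topics)))
        try:
            brief = parse_brief(raw)
        except ValueError:  # also covers json.JSONDecodeError
            continue
        key = tuple(brief[f].lower() for f in FIELDS)
        if key in seen:
            continue
        seen.add(key)
        briefs.append(brief)
    return briefs
```

Each brief can then be wrapped in a `dspy.Example` with the four fields marked as inputs before being handed to the optimizer.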

Ran MIPROv2 optimization to automatically rewrite prompt instructions. The optimizer finds prompts that maximize judge scores across the training set.

Outcome

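The optimized pipeline produces higher-quality ads than hand-tuned prompts, as scored by the LLM judge. The optimized program is saved as a frozen artifact, so results are reproducible: reloading it runs the same prompts and yields consistent quality for the same inputs. The project shows that automatic prompt optimization can replace the manual write-test-tweak loop.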

Python, DSPy, OpenAI, Claude, Jupyter

GitHub