TypeScript library for evaluating prompts against coding agents (Claude Code, Cursor, etc.) with multi-iteration testing and scoring