A production-grade CLI for testing and benchmarking LLM applications with support for GPT-5, Claude Opus 4, Gemini 2.5, and 65+ models