A Model Context Protocol (MCP) server that uses large language models (LLMs) as judges to evaluate the responses of other LLMs.