Agent Performance Manager
Scaled Cognition
United States
Posted on Oct 16, 2025
Scaled Cognition is the world’s only model lab dedicated exclusively to customer experience, pioneering agentic models purpose-built for reliable, action-taking enterprise applications. Backed by Khosla Ventures, the company’s flagship Agentic Pretrained Transformer (APT) eliminates hallucinations, enforces enterprise policies, and increases reliability in real-world CX workflows. Founded by serial AI entrepreneurs Dan Roth, former Microsoft Corporate Vice President of Conversational AI, and Dan Klein, UC Berkeley AI Professor, and built by a team of world-class PhD researchers and engineers, Scaled Cognition advances the science of agentic AI to deliver safe, policy-aligned automation that enterprises can trust.
As an Agent Performance Manager at Scaled Cognition, you will:
- Develop and implement scalable QA plans for evaluating AI agents, defining key performance metrics to measure progress over time.
- Collaborate with product and engineering teams to document findings, test fixes, and recommend improvements to the underlying models and conversational flows.
- Lead and mentor a team of QA engineers, establishing best practices and processes for testing conversational AI agents.
Example projects could include:
- Building test sets for regression tracking, agent robustness, and end-to-end testing.
- Reviewing and analyzing voice and chat transcripts, quickly identifying conversational gaps, and providing data for faster iteration on customer deployments.
- Designing and automating testing pipelines to scale QA capacity across a diverse portfolio of customers and to continuously evaluate the performance of our AI agents.
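To give a flavor of the first project above, here is a minimal sketch of a transcript regression check. The `Case` structure, the `must_contain` substring check, and the `stub_agent` function are all illustrative assumptions for this sketch, not Scaled Cognition's actual tooling or the APT API:

```python
# Minimal sketch of a regression test set for a conversational agent.
# All names here (Case, run_regression, stub_agent) are hypothetical.
from dataclasses import dataclass


@dataclass
class Case:
    prompt: str
    must_contain: str  # substring the agent's reply should include


def run_regression(agent_respond, cases):
    """Run each case through the agent and collect failures for review."""
    failures = []
    for case in cases:
        reply = agent_respond(case.prompt)
        if case.must_contain.lower() not in reply.lower():
            failures.append((case.prompt, reply))
    return failures


# Usage with a stub standing in for a real model endpoint:
def stub_agent(prompt: str) -> str:
    return "Your refund will be processed within 5 business days."


cases = [Case("Where is my refund?", must_contain="refund")]
print(run_regression(stub_agent, cases))  # an empty list means no regressions
```

In practice a harness like this would be parameterized over customer deployments and wired into CI so that every model or flow change is scored against the accumulated test set.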
Preferred Qualifications:
- Intermediate-level proficiency in Python and experience building and testing conversational AI/LLM systems.
- Background in implementing evaluation benchmarks and production monitoring metrics.
- Experience working with libraries and tooling common in the AI/LLM ecosystem.
- Demonstrated precision in documenting test plans, test cases, and bug reports, ensuring data is accurate and easily understandable by cross-functional teams.
- Experience with leveraging AI-powered assistants/tooling to enable rapid iteration, prototyping, and accelerated delivery.