Our client, Vertical Falls LLC, is seeking the following. Apply via Dice today!
Job Title: Lead / Senior QA Engineer – Agentic AI Systems (Langfuse, Temporal)
Location: 100% Remote
Interview Mode: Video (2 rounds)
Duration: 6–12 month contract
We are looking for a highly skilled QA professional to build and scale a next-generation Agentic AI Quality Engineering function. This role goes beyond traditional QA: it focuses on validating autonomous AI systems, designing evaluation frameworks, and ensuring high-quality outputs across multiple AI-driven products.
You will play a critical role in shaping how quality is defined, measured, and improved for agentic systems that operate with minimal human intervention.
Key Responsibilities
• Agentic QA Strategy & Scaling
  • Design and scale an agentic QA model for autonomous AI systems
  • Move QA from human-driven validation to AI-led evaluation and continuous quality monitoring
  • Establish best practices for testing AI agents across lifecycle stages
• Product Quality Ownership
  • Own QA for three core AI products:
    • AI Contact Center solutions
    • AI Chat & Form-based interaction systems
    • AI Assistants (autonomous and semi-autonomous agents)
  • Define quality benchmarks, SLAs, and success metrics for each product
  • Proactively identify quality gaps before they affect customers
• Metrics, Observability & Evaluation
  • Define and track performance metrics for agentic systems (e.g., accuracy, latency, resolution quality, hallucination rate)
  • Build frameworks for:
    • Evals and graders (LLM evaluation pipelines)
    • Output scoring and benchmarking
    • Continuous feedback loops
  • Use tools such as Langfuse for:
    • LLM observability and tracing
    • Prompt monitoring and performance analysis
    • Debugging agent behavior in production
  • Analyze:
    • Downstream issues
    • Production tickets
    • Failure patterns
• Automation & Testing Frameworks
  • Build and scale automation across:
    • Regression testing
    • Smoke testing
    • End-to-end agent workflows
  • Develop and maintain Playwright-based automation scripts
  • Integrate QA into CI/CD pipelines for continuous validation
• Agentic Testing & Validation
  • Design testing approaches for:
    • Multi-step agent workflows
    • Context retention and reasoning
    • Tool usage by agents
  • Work with orchestration frameworks such as Temporal to:
    • Validate long-running workflows
    • Test retries, state transitions, and failure handling in agent pipelines
  • Account for non-deterministic behavior in AI systems
  • Invest additional effort in agentic validation, recognizing its higher complexity compared with traditional QA
• Continuous Improvement & Innovation
  • Define frameworks to predict and prevent failures before they reach customers
  • Continuously improve QA processes using AI and automation
  • Partner with Product, Engineering, and AI teams to improve system quality
Required Skills & Experience
• 5–10+ years in QA / Quality Engineering, with strong automation experience
• Hands-on experience with:
  • Test automation tools (Playwright preferred)
  • API and system testing
• Strong understanding of:
  • AI/ML systems (LLMs; conversational AI preferred)
  • Evaluation frameworks and benchmarking
• Experience with:
  • Temporal (workflow orchestration, stateful systems testing)
  • Langfuse (LLM observability, tracing, and evaluation)
• Experience in:
  • Building QA frameworks from scratch
  • Working with production data, logs, and issue triaging
Good to Have
• Experience with LLM eval frameworks, prompt testing, or AI red-teaming
• Familiarity with agentic architectures / autonomous systems
• Exposure to observability and analytics platforms
Working Model
• Candidates with EST time zone overlap are preferred
• Ability to work closely with global product and engineering teams
What Success Looks Like
• A scalable, automated QA system for agentic products
• Measurable improvement in AI output quality and reliability
• Reduced production issues and faster detection of failures
• QA evolving from reactive testing to proactive quality intelligence