Posted Jun 2, 2026

Applied AI Engineer

Dear applicants, please note that applications without salary expectations and an active LinkedIn profile will not be considered. Thank you for your understanding.

We are hiring a Senior Applied AI Engineer to own the reliability, evaluation, and production stability of advanced multi-agent AI systems operating at real production scale. The role focuses on turning LLM-powered workflows from “demo-ready” prototypes into resilient, observable, production-grade systems that can handle non-deterministic model behavior, complex routing logic, and human-in-the-loop escalation flows.

You will work closely with technical leadership and product stakeholders to design, evaluate, optimize, and maintain agentic AI systems across multiple communication channels and workflows. This is a highly hands-on engineering role for someone who thrives in production environments and understands the realities of deploying AI systems under live traffic.

Details

Location: LATAM
Work Model: Fully Remote
Employment Type: Full-time
Seniority Level: Senior
Industry: AI / Agentic Systems / SaaS
Start Date: ASAP
English Level: Fluent English Required
Time Zone: LATAM-friendly collaboration preferred

About the Role

This position is dedicated to AI agent reliability, evaluation pipelines, observability, and continuous optimization of production LLM systems. The ideal candidate combines strong backend engineering expertise with deep practical experience operating AI products in real-world environments.

You will take ownership of evaluation frameworks, scoring systems, tracing infrastructure, production debugging, and the iterative optimization loop between prompts, architecture decisions, and system behavior. The role requires both technical depth and product intuition, especially around how evaluation systems directly affect product quality and user experience.

Key Responsibilities

- Design, build, and maintain evaluation pipelines for production AI agent systems
- Instrument multi-agent workflows with tracing and observability tooling
- Build evaluation datasets using real production traffic and interaction logs
- Develop quality scoring and robustness scoring systems for LLM outputs
- Improve reliability of AI systems handling non-deterministic model behavior
- Implement and optimize HITL (Human-in-the-Loop) escalation workflows
- Analyze production failures and drive architectural improvements
- Own the full feedback loop between evaluations, prompt optimization, architecture updates, and re-testing
- Contribute to prompt engineering and model optimization strategies
- Collaborate on multi-agent orchestration and workflow reliability decisions
- Work across backend systems, deployment pipelines, monitoring, and operational sustainment
- Participate in production support and on-call responsibilities
- Maintain high engineering standards around scalability, observability, and maintainability
- Operate independently across development, testing, deployment, and production ownership

Requirements

- 5+ years of backend or AI engineering experience in production environments
- Strong hands-on experience with production LLM or agentic AI systems
- Proven experience debugging and maintaining non-deterministic AI workflows under live traffic
- Experience building or operating evaluation/evals pipelines for AI systems
- Strong understanding of scorer design, feedback loops, and AI system evaluation methodologies
- Excellent Python backend engineering skills
- Production experience with FastAPI, Django, Celery, and LangGraph or similar orchestration frameworks
- Experience with observability and tracing tools such as Langfuse, Grafana Loki, OpenTelemetry, or equivalent
- Experience deploying and operating distributed backend systems
- Strong understanding of AI reliability, prompt behavior, and model failure handling
- Ability to independently own projects end-to-end
- Experience working in asynchronous remote teams
- Strong written communication skills in English

Nice to Have

- Experience with DSPy, DPO, and RLHF-related optimization workflows
- Experience with multi-agent orchestration systems
- Production experience with GPT-4.x, Claude, Whisper, and multi-model AI stacks
- Experience building AI tooling for communication or workflow automation
- Background in high-growth startups or product-focused engineering teams
- Experience with distributed systems and event-driven architectures
- Familiarity with AI observability and experiment tracking frameworks
- Exposure to vector databases, retrieval systems, or memory architectures
- Experience scaling AI products with real customer usage

Tech Stack: Python, FastAPI, Django, Celery, LangGraph, Langfuse, Grafana Loki, LLM APIs (OpenAI / Anthropic / multi-model stacks)

What Success Looks Like

- AI agents reliably handle real production traffic with measurable quality improvements
- Evaluation pipelines provide actionable scoring and monitoring insights
- Observability systems surface failures before they impact users
- Human escalation triggers operate accurately and consistently
- Prompt and architecture iterations measurably improve production outcomes
- AI systems become resilient, scalable, and maintainable over time

Interview Process

1. HR / Introductory Call
2. Technical Deep Dive
3. Take-Home Technical Assessment
4. Final Team & Culture Interview
5. Offer Stage
