Money Forward Vietnam

E-Town Central, 11 Doan Van Bo, Ho Chi Minh City

Company Size: 25-99

Job Summary

Product
Vietnam

Agent Ops Engineer

District 4, Ho Chi Minh City
Negotiable
Full Time

English
Experienced (Non-Manager)

Posted: 24/06/2026

Technical Skill: AI (Artificial Intelligence), DevOps, AWS, Java, Spring, Python, Regression Testing, API, Architecture, Observability, Protocol, OpenAI, Apache Kafka, Caching, Spring Boot, Kotlin, Kubernetes, Cloud Infrastructure, IaC, MLOps, CI/CD, LLM, LangChain, RAG, LangGraph, OpenTelemetry, MCP

Job description

Overview of job

We're hiring an Agent Ops Engineer to scale AI agent capabilities across the HRS Domain and its products. This is a high-impact role at the intersection of AI engineering, platform operations, and knowledge enablement. You'll set direction and make AI agents reliable in production across teams by owning the lifecycle, quality gates, observability, and operational standards, while embedding with teams to accelerate adoption. The larger goal of this centralized Agent Ops model is to enable the AI enablers and product builders within each product team to develop agents, while contributing common best practices and guardrails to MFBS adoption across other domains such as ERP and SMB.

What you will do:

1) Agent Engineering & Operations

Design, build, and maintain production-grade AI agent systems, including: context engineering and instruction architecture, prompt hardening and safe execution boundaries, tool integrations and multi-step orchestration, memory strategies and reliability patterns.
Own the full agent lifecycle: prototype → evaluate → deploy → monitor → iterate.
Build and maintain an evaluation pipeline to measure agent quality, catch regressions, and enforce deployment gates (golden datasets, scenario suites, automated checks).
Instrument agents and agent platforms for production observability: structured logging, tracing, and metrics; latency and cost monitoring; tool-call success rates and failure analysis.
Define operational readiness standards, including rollback criteria, incident response playbooks, and recovery paths for common failure modes.

2) Team Enablement & Coaching

Embed with product engineering teams to identify high-value use cases ready for agent automation. You will operate in a central Agent Ops role, enabling AI product builders through AI enablers.
Translate business workflows into agent-executable tasks with clear contract boundaries/interfaces, assumptions and inputs/outputs, and failure modes and safe fallbacks.
Deliver targeted coaching to engineers on: context engineering best practices, harness design and regression testing patterns, agent skill design and tool-contract discipline.
Reduce onboarding time for teams adopting AI capabilities—from first conversation to a production-ready agent.
Train product engineers to extend and maintain agent skills independently.

3) Standards & Knowledge Operations

Author and maintain org-level standards for agents, including: naming conventions, context file structures and ownership rules, skill interface contracts (inputs/outputs, invariants, error handling), evaluation criteria and release quality bars.
Establish and enforce “repo-as-discipline” practices so agent knowledge is versioned, reviewable, discoverable, and reusable, not trapped in prompt snippets or individual heads.
Build and grow a shared agent skills library that teams can reuse and extend.
Track and aggregate AI tooling/framework updates and external best practices, serving as a central intake so product teams don’t each have to follow the entire AI landscape.
Run internal knowledge-sharing sessions, showcases, and retrospectives to propagate learnings efficiently.

Caring Mental & Physical Recreation:

Hybrid working
Full salary during probation & 13th-month salary
Social insurance on full salary from probation
Premium Health insurance from probation
Flexible start 8AM-9AM from Mon-Fri
16 days off annually + 1 Birthday Leave
Extra 5 days of paternity leave
Annual company trip; Quarterly team building activities
Club activities
Annual health check

Caring Career & Development:

Clear Career path
Sponsorship for foreign-language and international technology-related certifications
Well-equipped workstation: MacBook Pro, additional monitor, etc.
Soft skill workshops
Tech seminars
Monthly and biannual recognition awards
Performance review twice/year

Job Requirement

What you bring:

Must have

12+ years of experience in the software development industry
Hands-on experience building and deploying production AI agents using modern frameworks (LangGraph, LangChain, OpenAI Agents SDK, trueAI, or equivalent).
Strong understanding of context engineering, including instruction architecture, token management, caching strategies, and latency-aware design.
Experience building evaluation pipelines: golden datasets and scenario libraries; automated quality gates and regression detection.
Familiarity with agent observability: tracing, structured logging, latency, and cost monitoring; tool-call reliability metrics and failure analysis.
Ability to design guardrails: output validation; prompt injection mitigation; safe execution boundaries for tools/actions.
Solid backend engineering skills; comfortable owning services/APIs end-to-end.
Strong communicator who can coach engineers, facilitate cross-team discussions, and write clear technical documentation.
Experience with production reliability and platform operations, including: event-driven architectures (Kafka and/or message queues); retries/backoff, DLQs, idempotency, ordering, backpressure; CDC/outbox-style patterns (or similar asynchronous reliability patterns); Kubernetes-based deployment and day-2 operations; CI/CD pipelines and infrastructure as code; on-call, incident response, postmortems, and SRE-style practices (SLOs/SLIs, runbooks).

Nice to have

Experience with RAG systems: ingestion, chunking, embeddings, hybrid search, retrieval evaluation.
Familiarity with MCP / Model Context Protocol or similar agent tooling standards (e.g., “MPTV”), and tool integration ecosystems.
Proficiency across Java/Kotlin (Spring Boot) and Python in production environments.

Who thrives in this role?

Engineers with an SRE/DevOps background pivoting into AI who naturally think about reliability, observability, and incident response.
Backend engineers with hands-on LLM/agent framework experience who want to work cross-functionally and enable multiple teams.
MLOps/LLM engineers who want to embed in product orgs and ship applied systems (not only model infrastructure).
Engineers who treat documentation, standards, and knowledge transfer as first-class engineering outputs.

What you can expect

A greenfield mandate to define what “good AI operations” looks like at scale inside an engineering organization.
Direct influence on the standards, patterns, and tooling multiple product teams will adopt.
A role that grows from team-level impact to organization-wide impact as the practice matures.
Work at the frontier of applied AI engineering, where best practices are still being written.

Our stack

Agent frameworks and LLM APIs, OpenTelemetry, Kafka/event-driven systems, Kubernetes, Spring Boot, Java, Kotlin, Python, CI/CD pipelines, AWS/cloud infrastructure.

Languages

- English (Speaking: Intermediate, Reading: Intermediate, Writing: Intermediate)

COMPETENCES

Reliable
Analytical Skills
Documentation

BUSINESS PROFILE

Money Forward Vietnam aims to solve money-related issues of all individuals and businesses through building an open and fair financial platform and providing essential services.