
One Mount Group

Times City, 458 Minh Khai, Hà Nội

Company Size: 25-99

Job Summary

  • 25-99
  • Product
  • Việt Nam

Senior AI Platform Engineer

One Mount Group

  • Hoàn Kiếm, Hà Nội
  • Negotiable
  •  Full Time
  •  English
  •  Experienced (Non-Manager)

  • Posted: 05/04/2026

  • Expired
Technical Skill: Python , AI (Artificial Intelligence) , Golang , System Administration , Linux , Networking , Distributed Systems , Docker , Apache Spark , Observability , DevOps , Apache , Big Data , Kubernetes , Rust , Google BigQuery

Job description

Overview of job

We are looking for a Senior AI Platform Engineer to architect and maintain the backbone of our AI training and inference clusters. You will be responsible for the entire lifecycle of our infrastructure—from automated provisioning of GPU nodes to deep observability of distributed systems. In this role, you will ensure our platform is resilient, scalable, and optimized for the unique demands of large-scale AI workloads.

KEY RESPONSIBILITIES

  • Kubernetes Orchestration: Design, deploy, and manage production-grade Kubernetes clusters (using EKS, GKE, or bare-metal) specifically optimized for high-performance AI computing.
  • Observability & Telemetry: Build and scale comprehensive monitoring stacks using Prometheus, Grafana, and LangSmith. Implement distributed tracing and logging to provide deep visibility into model performance and infrastructure health.
  • Infrastructure as Code (IaC): Automate the provisioning and configuration of global infrastructure using tools like Terraform, Ansible, or Pulumi.
  • MLOps Integration: Collaborate with AI Engineers to integrate Kubeflow, Airflow, or SageMaker into the platform, enabling seamless model training and deployment pipelines.
  • System Reliability: Implement self-healing mechanisms, automated scaling, and robust CI/CD pipelines to ensure 99.9% uptime for critical AI microservices.
  • Security & Governance: Manage identity, access, and data compliance within regulated and large-scale environments.

Salary & Allowances

  • 13-month salary with annual performance bonus, project incentives, sales incentives (based on position)
  • Lunch allowance: 730.000 VND/month
  • Special occasion bonus: 3.000.000 - 5.000.000 VND/year
  • Annual leave: Up to 20 days/year (based on level)
  • Health: Social insurance, premium health insurance, yearly health check
  • Laptop, monitor, and other equipment, accounts, and tools needed for work

Career Growth

  • Yearly salary review and promotion
  • Diverse career paths: management or expert tracks, with opportunities to rotate across functions
  • Free learning resources on Udemy, Coursera, and O'Reilly; internal workshops, certification sponsorship, and exclusive mentoring from C-level executives
  • Recognition and awards at team and organizational levels.

Working Environment

  • Open, collaborative workspace that fosters both individual focus and teamwork
  • Young, dynamic, and collaborative working atmosphere
  • Unwind zones: gaming, table tennis, yoga, gym, bathrooms, sleep corner.
  • Quarterly and yearly team-building activities and engaging internal events.

Job Requirement

  • Cloud & DevOps: Strong background in Docker and Linux system administration.
  • Programming: Proficiency in Python, Go, Rust for building custom automation tools and operators.
  • Experience: 4+ years of experience (or equivalent) in Platform or Site Reliability Engineering (SRE), ideally within AI or data-heavy organizations.

Preferred Skills

  • Kubernetes Mastery: Deep expertise in K8s internals, CNI, CSI, and managing complex workloads (StatefulSets, Operators).
  • Telemetry Expert: Proven experience in building Observability frameworks (logging, metrics, tracing) for distributed systems.
  • Experience with inference engines such as vLLM, SGLang, or Triton Inference Server.
  • Knowledge of big data processing frameworks like Apache Spark or BigQuery.
  • Background in high-performance networking (RDMA, InfiniBand) is a major plus.

Languages

  • English (Speaking: Intermediate, Reading: Intermediate, Writing: Intermediate)

COMPETENCES

  • Reliable

BUSINESS PROFILE

One Mount Group's (1MG) goal is to build Vietnam’s largest technology ecosystem.

1MG was established with the vision of improving the efficiency of the economy, creating a technology infrastructure that helps Vietnamese businesses increase their added value, and providing products to consumers at a more competitive cost of goods sold.

1MG is committed to building a strong and sustainable Vietnamese business and to creating a broad playing field that nurtures and grows future start-ups. We believe that Vietnam’s next “giant” businesses will grow from our core infrastructure. To that end, 1MG builds solutions that link, optimize, and close gaps in the value chains of Vietnam’s fast-growing key economic sectors.

With a sound financial position and strong business administration, 1MG is well placed to attract and retain the best Vietnamese talent from around the world.

