Job Summary
Job Purpose:
The DevOps Team Lead sits at the intersection of technical expertise, operational reliability, and project delivery. This role is responsible for leading a team of Systems/Platform engineers to design, implement, and maintain secure, scalable, and highly available infrastructure across AWS, Azure, Google Cloud, and on‑premises environments. The position owns the end‑to‑end application delivery platform (CI/CD, Kubernetes, GitLab, ArgoCD, Helm), the observability stack, and continuous ISO/IEC 27001 compliance within the team, ensuring timely delivery of high‑quality infrastructure services that support business objectives.
Key Responsibilities
Infrastructure & IaC Management
- Lead the design, implementation, and maintenance of infrastructure across AWS, Azure, Google Cloud, and on‑premises servers.
- Champion Infrastructure as Code (IaC) practices using tools such as Terraform, Terragrunt, CloudFormation, or equivalent to provision, configure, and manage infrastructure in a repeatable and auditable way.
- Ensure environments are standardized, secure, cost‑optimized, and aligned with architecture and security guidelines.
Application Delivery & Platform Engineering
- Own and evolve the application delivery platform using GitLab CI, ArgoCD, Helm charts, and Kubernetes.
- Design and maintain CI/CD pipelines to support reliable, frequent, and automated application deployments across environments.
- Establish best practices and guardrails for Kubernetes cluster configuration, namespace management, Helm chart management, and deployment strategies (e.g., blue/green, canary).
- Collaborate closely with development teams to ensure smooth, predictable, and observable releases.
Monitoring, Logging & Alerting
- Lead the design, implementation, and continuous improvement of the observability stack, including Prometheus, Thanos, Alertmanager, Grafana, Kibana, and Elasticsearch.
- Define and maintain monitoring standards, SLOs/SLIs, dashboards, and alerting rules to ensure early detection and rapid resolution of incidents.
- Ensure logs, metrics, and traces are consistently collected, stored, and accessible for troubleshooting, performance tuning, and capacity planning.
Compliance & Information Security (ISO/IEC 27001)
- Lead the implementation, documentation, and continuous maintenance of the ISO/IEC 27001 Information Security Management System (ISMS) within the team.
- Ensure infrastructure, platforms, and operational processes adhere to information security policies, controls, and audit requirements.
- Collaborate with Information Security, Risk, and Compliance stakeholders to support audits, risk assessments, and corrective actions.
- Promote a culture of security and compliance awareness within the team and across collaborating functions.
Team Leadership & People Management
- Lead, mentor, and develop a team of Systems/Platform engineers; provide regular feedback, support career growth, and foster a high‑performance culture.
- Plan and prioritize team workload, ensuring timely delivery of projects, BAU tasks, and incident resolution.
- Promote knowledge sharing, documentation, and cross‑training to reduce single points of failure.
Collaboration
- Work closely with software development, security, network, and service desk teams to ensure infrastructure and platforms meet business and operational requirements.
- Translate business needs into technical solutions, set expectations, and communicate clearly on progress, risks, and timelines.
- Participate in architecture and design discussions, contributing infrastructure and operations perspectives.
Reliability, Incident & Problem Management
- Oversee incident response, including triage, communication, and coordination with relevant teams to minimize downtime and impact.
- Drive root cause analysis (RCA) and implement corrective and preventive actions for recurring issues.
- Continuously improve operational processes, runbooks, and standard operating procedures.
(*) BONUSES & REWARDS
- Competitive Salary
- 13th Month Salary & Performance Bonus
- Employee of the Year Award
(*) TRAINING & DEVELOPMENT
- In-house & Overseas Training
- Full reimbursement for international technical certifications
- Global career opportunities
(*) ANNUAL PAID LEAVE
- Vacation Leave: 12 days per year
- Medical Leave: 8 days per year
- 1 extra seniority day for every 3 years of service
(*) HEALTHCARE
- Annual Routine Check-up
- Premium Healthcare Insurance
- Comprehensive Insurance
(*) WELLNESS AND LEISURE ACTIVITIES
- Annual Team Building
- Soccer Club and Badminton Club
- Entertainment activities: Music band, Karaoke
- Celebrations of special events: Birthdays, Christmas, New Year/Year-end Party
(*) PERKS
- Fruit Day every week
- Unlimited snacks & beverages