Summary
Job Description
Job Summary
Job Purpose:
The DevOps Team Lead sits at the intersection of technical expertise, operational reliability, and project delivery. This role is responsible for leading a team of Systems/Platform engineers to design, implement, and maintain secure, scalable, and highly available infrastructure across AWS, Azure, Google Cloud, and on‑premise environments. The position owns the end‑to‑end application delivery platform (CI/CD, Kubernetes, GitLab, ArgoCD, Helm), observability stack, and continuous ISO/IEC 27001 compliance within the team, ensuring timely delivery of high‑quality infrastructure services that support business objectives.
Key Responsibilities
Infrastructure & IaC Management
- Lead the design, implementation, and maintenance of infrastructure across AWS, Azure, Google Cloud, and on‑premise servers.
- Champion Infrastructure as Code (IaC) practices using tools such as Terraform, Terragrunt, CloudFormation, or equivalent to provision, configure, and manage infrastructure in a repeatable and auditable way.
- Ensure environments are standardized, secure, cost‑optimized, and aligned with architecture and security guidelines.
Application Delivery & Platform Engineering
- Own and evolve the application delivery platform using GitLab CI, ArgoCD, Helm charts, and Kubernetes.
- Design and maintain CI/CD pipelines to support reliable, frequent, and automated application deployments across environments.
- Establish best practices and guardrails for Kubernetes cluster configuration, namespace management, Helm chart management, and deployment strategies (e.g., blue/green, canary).
- Collaborate closely with development teams to ensure smooth, predictable, and observable releases.
Monitoring, Logging & Alerting
- Lead the design, implementation, and continuous improvement of the observability stack, including Prometheus, Thanos, Alertmanager, Grafana, Kibana, and Elasticsearch.
- Define and maintain monitoring standards, SLOs/SLIs, dashboards, and alerting rules to ensure early detection and rapid resolution of incidents.
- Ensure logs, metrics, and traces are consistently collected, stored, and accessible for troubleshooting, performance tuning, and capacity planning.
Compliance & Information Security (ISO/IEC 27001)
- Lead the implementation, documentation, and continuous maintenance of the ISO/IEC 27001 Information Security Management System (ISMS) within the team.
- Ensure infrastructure, platforms, and operational processes adhere to information security policies, controls, and audit requirements.
- Collaborate with Information Security, Risk, and Compliance stakeholders to support audits, risk assessments, and corrective actions.
- Promote a culture of security and compliance awareness within the team and across collaborating functions.
Team Leadership & People Management
- Lead, mentor, and develop a team of Systems/Platform engineers; provide regular feedback, support career growth, and foster a high‑performance culture.
- Plan and prioritize team workload, ensuring timely delivery of projects, BAU tasks, and incident resolution.
- Promote knowledge sharing, documentation, and cross‑training to reduce single points of failure.
Collaboration
- Work closely with software development, security, network, and service desk teams to ensure infrastructure and platforms meet business and operational requirements.
- Translate business needs into technical solutions, set expectations, and communicate clearly on progress, risks, and timelines.
- Participate in architecture and design discussions, contributing infrastructure and operations perspectives.
Reliability, Incident & Problem Management
- Oversee incident response, including triage, communication, and coordination with relevant teams to minimize downtime and impact.
- Drive root cause analysis (RCA) and implement corrective and preventive actions for recurring issues.
- Continuously improve operational processes, runbooks, and standard operating procedures.
(*) BONUSES & REWARDS
- Competitive Salary
- 13th Month Salary & Performance Bonus
- Employee of the Year Award
(*) TRAINING & DEVELOPMENT
- In-house & Overseas Training
- Full reimbursement for international Technical Certification
- Global career opportunities
(*) ANNUAL PAID LEAVES
- Vacation Leave: 12 days per year
- Medical Leave: 8 days per year
- 1 extra seniority day for every 3 years of service
(*) HEALTHCARE
- Annual Routine Check-up
- Premium Healthcare Insurance
- Comprehensive Insurance
(*) WELLNESS AND LEISURE ACTIVITIES
- Annual Team Building
- Soccer Club and Badminton Club
- Entertainment activities: Music band, Karaoke
- Celebrations of special events: Birthdays, Christmas, New Year/Year-end Party
(*) PERKS
- Fruit Day every week
- Unlimited snacks & beverages
Job Requirements
Skills & Qualifications
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field; an advanced degree is a plus.
- 5+ years of hands‑on experience in systems, platform, or infrastructure engineering, with at least 2 years in a technical leadership or team lead role.
- Strong communication skills in English, both written and verbal, with the ability to explain complex technical topics to non‑technical stakeholders.
- Demonstrated ability to provide high‑quality customer service, manage expectations, and build strong relationships with internal stakeholders.
- Proven experience leading and mentoring technical teams.
Knowledge & Experience
- Deep expertise in managing and configuring public cloud environments (AWS required; Azure and Google Cloud strongly preferred).
- Strong experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or equivalent.
- Proven experience designing and maintaining CI/CD pipelines, ideally with GitLab CI; familiarity with other CI tools is a plus.
- Hands‑on experience with Kubernetes, ArgoCD, and Helm charts for application deployment and configuration management.
- Solid understanding of networking concepts within cloud and containerized environments (VPCs, subnets, security groups, ingress/egress, load balancers).
- Strong background in Linux administration, system hardening, patch management, and performance optimization.
- Practical experience with observability stacks: Prometheus, Thanos, Alertmanager, Grafana, Kibana, and Elasticsearch (or equivalent tools).
- Proven experience implementing, operating, or maintaining ISO/IEC 27001 controls and processes within an organization.
- Experience with configuration management, cluster management, or automation tools (e.g., Ansible, Rancher, or equivalent).
- Relevant cloud certifications (e.g., AWS Certified Solutions Architect, Azure Administrator, Google Professional Cloud Architect) are an advantage.
Languages
- English
Speaking: Intermediate - Reading: Intermediate - Writing: Intermediate
Technical Requirements
- DevOps
- MS Azure
- AWS
- Linux
- Networking
- Elasticsearch
- Observability
- Kibana
- Grafana
- Ansible
- Kubernetes
- GitLab
- ISO
- GCP
- Terraform
- Rancher
- AWS CloudFormation
- Prometheus
- Helm
- IaC
- ArgoCD
- CI/CD
COMPETENCIES
- Team Leadership
- Communication Skills
- Customer Care
Company Information
CODE88 provides IT solutions by developing software platforms for business operations, serving local and overseas small and medium-sized enterprises (SMEs). We are committed to delivering strategic and innovative solutions that add business value for our customers. What sets us apart is our ability to provide scalable, high-quality, and cost-effective solutions. We deliver a suite of integrated web-based solutions and also develop customized Internet applications based on predefined work orders from our clients. CODE88 is committed to transforming your business through web applications that raise productivity and give you competitive advantages in speed, cost, and adaptability.