Techcombank

Techcombank Tower, 191 Ba Trieu, Hà Nội

Company size: 100-499

Summary

Job description

Job summary

1. About the Role:

We are seeking a highly skilled Site Reliability Engineer with experience applying GenAI to automate and enhance the reliability of complex data platforms in the Data Division. You will be responsible for building self-healing infrastructure and AI-powered observability, and for automating incident response across data pipelines (e.g., Databricks, Glue, Kafka, Flink). This is a high-impact role in which you will shape the future of data reliability at Techcombank, mentor engineers, and lead initiatives that span multiple teams and domains.

2. Key Responsibilities:

Platform Reliability & Automation
• Design, implement, and operate reliable, scalable, and observable data platforms.
• Automate incident triage, remediation, and postmortems using GenAI-powered tools.
• Develop intelligent runbooks and self-healing workflows using LLMs.
GenAI-Enabled SRE Practices
• Build and integrate GenAI copilots for on-call support, anomaly detection, and RCA (root cause analysis).
• Fine-tune or prompt-engineer LLMs for specific use cases such as summarizing logs, interpreting metrics, or generating remediation steps.
• Leverage vector databases (e.g., FAISS, Weaviate) to retrieve telemetry and incident history for GenAI prompts.
Observability & Anomaly Detection
• Integrate GenAI with observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).
• Build systems for natural language querying of platform health and pipeline performance.
• Collaborate with data engineers to monitor SLIs/SLOs across ingestion, transformation, and delivery layers.
CI/CD & Risk Management
• Integrate GenAI into CI/CD pipelines to generate blast radius analyses and deployment guardrails.
• Use LLMs to assess the risk of configuration or schema changes before production rollout.
• Automate validation and rollback strategies based on historical outcomes.

WHY BECOME IT/DATA EXPERTS AT TECHCOMBANK?

  • Investing over 500 million USD in large-scale IT projects, Techcombank is one of the leading banks in technology trends in Vietnam.
  • You will grow with Techcombank by having the opportunity to learn from top experts from across the world.
  • Techcombank provides a rewarding remuneration structure that is commensurate with your achievements and contributions.
  • Techcombank is ranked in the Top 2 Best Places to Work in the banking industry, where you can experience various exciting activities throughout the year: Company Anniversary, Team Building, Active Saturday, Year End Party, etc.

Job requirements

• Bachelor's degree in computer science, software engineering, or information technology.
• Good command of English.

• 5+ years in SRE, DevOps, or Data Engineering roles with a strong focus on automation and observability.
• Solid experience with cloud-native data platforms (e.g., Databricks, Glue, Kafka, Flink, S3, Lambda).
• Proven experience using or integrating GenAI tools (OpenAI, Claude, HuggingFace Transformers).

• Proficiency in Python or Scala; experience with Spark and Airflow is a plus.
• Familiarity with LLM techniques: prompt engineering, embeddings, retrieval-augmented generation (RAG).
• Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
• Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
Preferred:
• Experience fine-tuning LLMs or integrating GenAI agents into production systems.
• Familiarity with vector databases (e.g., Pinecone, Qdrant, FAISS).
• Knowledge of data quality frameworks and lineage tools (e.g., Deequ, Great Expectations, Amundsen, Unity Catalog).
• Understanding of ITIL/incident management frameworks.
• Strong communication and documentation skills, especially in on-call and postmortem environments.

Languages

  • English

    Speaking: Intermediate - Reading: Intermediate - Writing: Intermediate

Technical requirements

  • Java
  • Python
  • Apache Spark
  • Unity Catalog
  • AWS Lambda
  • Observability
  • Scala
  • DevOps
  • Grafana
  • OpenAI
  • Amazon S3
  • Apache Kafka
  • ITIL
  • Datadog
  • Terraform
  • AWS CloudFormation
  • Prometheus
  • Apache Airflow
  • Apache Flink
  • Databricks
  • AWS Glue
  • IaC
  • FAISS
  • LLM
  • Vector
  • GenAI
  • RAG
  • Claude

COMPETENCIES

  • Reliable
  • Communication Skills
  • Documentation

Company information

Techcombank aspires to be the best bank and a leading business in Vietnam.

MISSION:

• To be the preferred and most trusted financial partner of our customers, providing them with a full range of financial products and services through a personalized, customer-centric relationship.

• To provide our employees with a great working environment where they have multiple opportunities to develop, contribute, and build a successful career.

• To offer our shareholders superior long-term returns by executing a fast-growth strategy while enforcing rigorous corporate governance and risk management best practices.

CORE VALUES:

1. Customer first: What we do is only valued if it is truly beneficial to our customers and colleagues.

2. Innovation: Make improvements to lead the way.

3. Teamwork: At Techcombank, you cannot perform well without cooperation.

4. People development: People with proven capability will bring the organization competitive advantages and remarkable successes.

5. Accountability: Be committed to overcoming difficulties and achieving great successes.