Summary
- Technical requirements:
- Java
- Python
- Apache Spark
- Unity
- AWS Lambda
- Observability
- Scala
- DevOps
- Grafana
- OpenAI
- Amazon S3
- Apache Kafka
- ITIL
- Datadog
- Terraform
- AWS CloudFormation
- Prometheus
- Apache Airflow
- Apache Flink
- Databricks
- AWS Glue
- IaC
- FAISS
- LLM
- Vector
- GenAI
- RAG
- Claude
Job Description
Job Summary
1. About the Role:
We are seeking a highly skilled Site Reliability Engineer with experience applying GenAI to automate and enhance the reliability of complex data platforms in the Data Division. You will be responsible for building self-healing infrastructure and AI-powered observability, and for automating incident response across data pipelines (e.g., Databricks, Glue, Kafka, Flink). This is a high-impact role in which you will shape the future of data reliability at Techcombank, mentor engineers, and lead initiatives that span multiple teams and domains.
2. Key Responsibilities:
Platform Reliability & Automation
• Design, implement, and operate reliable, scalable, and observable data platforms.
• Automate incident triage, remediation, and postmortems using GenAI-powered tools.
• Develop intelligent runbooks and self-healing workflows using LLMs.
GenAI-Enabled SRE Practices
• Build and integrate GenAI copilots for on-call support, anomaly detection, and RCA (root cause analysis).
• Fine-tune LLMs or apply prompt engineering for specific use cases such as summarizing logs, interpreting metrics, or generating remediation steps.
• Leverage vector databases (e.g., FAISS, Weaviate) to retrieve telemetry and incident history for GenAI prompts.
Observability & Anomaly Detection
• Integrate GenAI with observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).
• Build systems for natural language querying of platform health and pipeline performance.
• Collaborate with data engineers to monitor SLIs/SLOs across ingestion, transformation, and delivery layers.
CI/CD & Risk Management
• Integrate GenAI into CI/CD pipelines to generate blast radius analyses and deployment guardrails.
• Use LLMs to assess the risk of configuration or schema changes before production rollout.
• Automate validation and rollback strategies based on historical outcomes.
WHY BECOME AN IT/DATA EXPERT AT TECHCOMBANK?
- Having invested over 500 million USD in large-scale IT projects, Techcombank is one of the leading banks in technology trends in Vietnam
- You will grow with Techcombank by having the opportunity to learn from top experts from across the world
- Techcombank provides a rewarding remuneration structure that is commensurate with your achievements and contributions
- Techcombank is ranked among the top 2 best places to work in the banking industry, where you can experience various exciting activities throughout the year: Company Anniversary, Team Building, Active Saturday, Year End Party, etc.
Job Requirements
• Bachelor's degree in computer science, software engineering or information technology
• Good command of English
• 5+ years in SRE, DevOps, or Data Engineering roles with a strong focus on automation and observability.
• Solid experience in cloud-native data platforms (e.g., Databricks, Glue, Kafka, Flink, S3, Lambda).
• Proven experience using or integrating GenAI tools (OpenAI, Claude, HuggingFace Transformers).
• Proficiency in Python or Scala; experience with Spark and Airflow is a plus.
• Familiarity with LLM techniques: prompt engineering, embeddings, retrieval-augmented generation (RAG).
• Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
• Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
Preferred:
• Experience fine-tuning LLMs or integrating GenAI agents into production systems.
• Familiarity with vector databases (e.g., Pinecone, Qdrant, FAISS).
• Knowledge of data quality frameworks and lineage tools (e.g., Deequ, Great Expectations, Amundsen, Unity Catalog).
• Understanding of ITIL/incident management frameworks.
• Strong communication and documentation skills, especially in on-call and postmortem environments.
Languages
- English
Speaking: Intermediate - Reading: Intermediate - Writing: Intermediate
COMPETENCIES
- Reliability
- Communication Skills
- Documentation
Company Information
Techcombank aspires to be the best bank and a leading business in Vietnam.
MISSION:
• To be the preferred and most trusted financial partner of our customers, providing them with a full range of financial products and services through a personalized, customer-centric relationship.
• To provide our employees with a great working environment where they have multiple opportunities to develop, contribute, and build a successful career.
• To offer our shareholders superior long-term returns by executing a fast growth strategy while enforcing rigorous corporate governance and risk management best practices.
CORE VALUES:
1. Customer first: What we do is only valued if it is truly beneficial to our customers and colleagues.
2. Innovation: Make improvements to lead the way.
3. Teamwork: At Techcombank, good performance is not possible without cooperation.
4. People development: People with proven capability will bring the organization competitive advantages and remarkable successes.
5. Accountability: Be committed to overcoming difficulties and achieving great successes.