Senior Site Reliability Engineer
7 days ago
- Top-tier banking environment in Vietnam
- Challenging opportunities for the "Greater" You
- Attractive career path and benefits
1. About the Role:
We are seeking a highly skilled Site Reliability Engineer with experience applying GenAI to automate and enhance the reliability of complex data platforms in the Data Division. You will be responsible for building self-healing infrastructure and AI-powered observability, and for automating incident response across data pipelines (e.g., Databricks, Glue, Kafka, Flink). This is a high-impact role in which you will shape the future of data reliability at Techcombank, mentor engineers, and lead initiatives that span multiple teams and domains.
2. Key Responsibilities:
Platform Reliability & Automation
- Design, implement, and operate reliable, scalable, and observable data platforms.
- Automate incident triage, remediation, and postmortems using GenAI-powered tools.
- Develop intelligent runbooks and self-healing workflows using LLMs.
GenAI-Enabled SRE Practices
- Build and integrate GenAI copilots for on-call support, anomaly detection, and RCA (root cause analysis).
- Fine-tune or prompt-engineer LLMs for specific use cases such as summarizing logs, interpreting metrics, or generating remediation steps.
- Leverage vector databases (e.g., FAISS, Weaviate) to retrieve telemetry and incident history for GenAI prompts.
Observability & Anomaly Detection
- Integrate GenAI with observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).
- Build systems for natural language querying of platform health and pipeline performance.
- Collaborate with data engineers to monitor SLIs/SLOs across ingestion, transformation, and delivery layers.
CI/CD & Risk Management
- Integrate GenAI into CI/CD pipelines to generate blast radius analyses and deployment guardrails.
- Use LLMs to assess the risk of configuration or schema changes before production rollout.
- Automate validation and rollback strategies based on historical outcomes.
3. Requirements:
- Bachelor's degree in computer science, software engineering, or information technology.
- Good command of English.
- 5+ years in SRE, DevOps, or Data Engineering roles with a strong focus on automation and observability.
- Solid experience in cloud-native data platforms (e.g., Databricks, Glue, Kafka, Flink, S3, Lambda).
- Proven experience using or integrating GenAI tools (OpenAI, Claude, HuggingFace Transformers).
- Proficiency in Python or Scala; experience with Spark and Airflow is a plus.
- Familiarity with LLM techniques: prompt engineering, embeddings, retrieval-augmented generation (RAG).
- Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
- Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
Preferred:
- Experience fine-tuning LLMs or integrating GenAI agents into production systems.
- Familiarity with vector databases (e.g., Pinecone, Qdrant, FAISS).
- Knowledge of data quality frameworks and lineage tools (e.g., Deequ, Great Expectations, Amundsen, Unity Catalog).
- Understanding of ITIL/incident management frameworks.
- Strong communication and documentation skills, especially in on-call and postmortem environments.
WHY BECOME IT/DATA EXPERTS AT TECHCOMBANK?
- Investing over 500 million USD in large-scale IT projects, Techcombank is one of the leading banks in technology trends in Vietnam.
- You will grow with Techcombank by having the opportunity to learn from top experts from across the world.
- Techcombank provides a rewarding remuneration structure that is commensurate with your achievements and contributions.
- Techcombank is ranked among the Top 2 Best Places to Work in the banking industry, where you can experience various exciting activities throughout the year: Company anniversary, Team building, Active Saturday, Year End Party, etc.