Senior Site Reliability Engineer
1 day ago
**Top 3 reasons to join us**:
- Top-tier banking environment in Vietnam
- Challenging opportunities for the “Greater” You
- Attractive career path and benefits
**Job description**:
**1. About the Role**:
**2. Key Responsibilities**:
**Platform Reliability & Automation**
- Design, implement, and operate reliable, scalable, and observable data platforms.
- Automate incident triage, remediation, and postmortems using GenAI-powered tools.
- Develop intelligent runbooks and self-healing workflows using LLMs.
**GenAI-Enabled SRE Practices**
- Build and integrate GenAI copilots for on-call support, anomaly detection, and RCA (root cause analysis).
- Fine-tune or prompt engineer LLMs for specific use cases like summarizing logs, interpreting metrics, or generating remediation steps.
- Leverage vector databases (e.g., FAISS, Weaviate) to retrieve telemetry and incident history for GenAI prompts.
**Observability & Anomaly Detection**
- Integrate GenAI with observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).
- Build systems for natural language querying of platform health and pipeline performance.
- Collaborate with data engineers to monitor SLIs/SLOs across ingestion, transformation, and delivery layers.
**CI/CD & Risk Management**
- Integrate GenAI into CI/CD pipelines to generate blast radius analyses and deployment guardrails.
- Use LLMs to assess the risk of configuration or schema changes before production rollout.
- Automate validation and rollback strategies based on historical outcomes.
**Your skills and experience**:
- Bachelor's degree in Computer Science, Software Engineering, or Information Technology.
- Good command of English.
- 5+ years in SRE, DevOps, or Data Engineering roles with strong focus on automation and observability.
- Solid experience in cloud-native data platforms (e.g., Databricks, Glue, Kafka, Flink, S3, Lambda).
- Proven experience using or integrating GenAI tools (OpenAI, Claude, HuggingFace Transformers).
- Proficiency in Python or Scala; experience with Spark and Airflow a plus.
- Familiarity with LLM techniques: prompt engineering, embeddings, retrieval-augmented generation (RAG).
- Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog).
- Experience with Infrastructure as Code (e.g., Terraform, CloudFormation).
**Preferred**:
- Experience fine-tuning LLMs or integrating GenAI agents into production systems.
- Familiarity with vector databases (e.g., Pinecone, Qdrant, FAISS).
- Knowledge of data quality frameworks and lineage tools (e.g., Deequ, Great Expectations, Amundsen, Unity Catalog).
- Understanding of ITIL/incident management frameworks.
- Strong communication and documentation skills, especially in on-call and postmortem environments.
**Why you'll love working here**:
**WHY BECOME IT/DATA EXPERTS AT TECHCOMBANK?**
- Investing **over 500 million USD** in large-scale IT projects, Techcombank is one of the leading banks in technology trends in Vietnam.
- You will grow with Techcombank by having the opportunity to learn from **top experts** from across the world.
- Techcombank provides a **rewarding remuneration structure** that is commensurate with your achievements and contributions.
- Techcombank ranks among the **Top 2 Best Places to Work** in the banking industry, where you can enjoy various exciting activities throughout the year: Company Anniversary, Team Building, Active Saturday, Year End Party, etc.
-
Site Reliability Engineer
5 days ago
Ho Chi Minh City, Vietnam Pizza Hut Digital & Technology Full time. Pizza Hut Digital & Technology - Waseco Building - 10 Pho Quang Street, Ward 02, Tan Binh, Ho Chi Minh - Hybrid - Posted 11 minutes ago - Skills: AWS, English, Azure **Top 3 reasons to join us**: - Flexible Friday afternoon - 18 Annual Leave + 5 Recharge Days/Year - Hybrid working model **Job description**: **Role Overview** - As a site reliability...
-
Site Reliability Engineer
7 days ago
Ho Chi Minh City, Vietnam Zalo Full time. A Database Reliability Engineer (DRE) at Zalo holds a crucial role, responsible for ensuring the constant availability, optimal performance, and robust scalability of ZA's in-house database systems. This position blends the skills of a traditional database administrator with the principles of software engineering and site reliability...
-
Site Reliability Engineer
5 days ago
Ho Chi Minh City, Vietnam Zalo Full time. A Backend Reliability Engineer (BRE) at Zalo holds a crucial role, responsible for ensuring the constant availability, optimal performance, and robust scalability of ZA's in-house database systems. This position blends the skills of a traditional database administrator with the principles of software engineering and site reliability...
-
Senior Site Reliability Engineer
6 days ago
Ho Chi Minh City, Ho Chi Minh, Vietnam Zalopay Full time $40,000 - $120,000 per year. We are seeking a Senior Site Reliability Engineer (SRE) with a strong DevOps mindset to drive automation, delivery excellence, and infrastructure scalability for our high-throughput payment platform. You will partner with engineering teams to streamline CI/CD pipelines, implement GitOps workflows, and build internal tools that improve developer productivity...
-
Site Reliability Engineer
3 days ago
Ho Chi Minh City, Vietnam HRS Full time. **City**: Ho Chi Minh **Job Function**: Tech **Job Area**: Product & IT **Seniority Level**: Mid-Senior level **Date**: Apr 23, 2025 **HRS AS A COMPANY** - HRS, a pioneer in business travel, aims to elevate every stay through innovative technology. With over 50 years of experience, their digital platform, driven by ProcureTech, TravelTech, and FinTech,...
-
Senior Site Reliability Engineer
2 days ago
Ho Chi Minh City, Ho Chi Minh, Vietnam VNG Full time $30,000 - $120,000 per year. We are looking for a Senior Site Reliability Engineer (SRE) with deep expertise in deploying, operating, and optimizing database systems on Kubernetes (K8s). In this role, you will play a critical part in ensuring the data infrastructure is highly reliable, high-performance, scalable, and proactively monitored through modern observability systems. Key...
-
Site Reliability Engineer
3 days ago
Ho Chi Minh City, Vietnam Wizeline Full time. **Site Reliability Engineer / DevOps**: Wizeline - Cloud System Admin AWS - 285 Cách Mạng Tháng 8, District 10, Ho Chi Minh - At office - Posted 4 hours ago **Top 3 reasons to join us**: - Leading Technologies to Deliver Great Solutions - Enjoy Competitive &...
-
Senior Site Reliability Engineer
7 days ago
Ho Chi Minh City, Vietnam VNG Corporation Full time. **Top 3 reasons to join us**: - Attractive salary & benefits you'll love - Building large-scale products - Working in one of the best places to work in VN **Job description**: We are looking for a **Senior Site Reliability Engineer (SRE)** with in-depth experience in deploying, operating, and optimizing database systems on...
-
Senior Site Reliability Engineer
1 week ago
Ho Chi Minh City, Ho Chi Minh, Vietnam EPAM Systems Full time $60,000 - $120,000 per year. At EPAM Vietnam, EPAM is hiring a Senior Site Reliability Engineer to join the team in Vietnam. You'll design and optimize infrastructure, automate processes and ensure the reliability of our education platforms. More than that, at EPAM, engineering is in our DNA. So, when you join our growing team, you will work with top global clients and make significant...
-
Site Reliability Engineer
2 weeks ago
Ho Chi Minh City, Vietnam Tyme Full time. **Site Reliability Engineer**: Tyme - AWS Python DevOps - HIU Tower, 215 Điện Biên Phủ, Ward 15, Binh Thanh, Ho Chi Minh - Flexible - Posted 2 hours ago **Top 3 reasons to join us**: - Excellent environment and team to help you grow. - Competitive salary and...