Job Description
Manage and improve system reliability through SLO, SLI, and SLA practices.
Design and implement observability systems (metrics, logs, tracing, alerting) using tools like Prometheus, Grafana, ELK, etc.
Build and automate CI/CD pipelines and Infrastructure as Code (IaC) using tools such as Terraform, Ansible, Pulumi, and Helm.
Collaborate in the analysis, design, and deployment of systems and processes to ensure reliability, observability, and scalability.
Optimize system cost, performance (latency, throughput), and security.
Operate and optimize Kubernetes clusters (EKS); strong knowledge of Docker, Kubernetes, and Helm is required.
Develop internal tools to automate workflows and support other teams.
Participate in incident response, root cause analysis, postmortem reviews, and improve incident handling processes.
Support and coordinate with the Network Operations Center (NOC) teams.
Be part of the on-call rotation when needed.
Benefits
You'll find this place irresistible
Enjoy top-tier compensation, including:
Monthly net take-home pay that leaves you smiling
13th-month salary
Performance bonuses that could boost your income by up to 2 months' salary
24 remote working days per year
12 days of annual paid leave
Flexible working hours, Monday to Friday; weekends are yours
Company trips and team bonding activities
Elevate your creativity and productivity in our modern workspace
Especially:
Shine like a rock star in our fast-growing global B2B SaaS squad
Blaze a trail to success with our super-fast career track
Collaborate with the brightest and coolest minds from across the globe
Be yourself, knowing you're valued and groomed to be your absolute best.
Job Requirements
2–5 years of experience in SRE / DevOps / Platform Engineering.
Hands-on experience with monitoring and alerting systems (Prometheus, Grafana, ELK, Loki, etc.).
Proficient in CI/CD tools (GitLab CI, Jenkins) and familiar with Git workflows.
Experience in deploying and managing Kubernetes (EKS is a plus).
Understanding of gRPC and the ability to optimize nginx connections and the network stack.
Strong Linux background with deep knowledge of kernel, network stack, file system, and processes.
Excellent troubleshooting skills, with the ability to analyze issues from the OS to the application layer.
Systems-thinking mindset, a focus on automation, and the ability to mentor teammates.
Proactive, responsible, and able to work under pressure during incident response.
Nice to Have
Experience with AWS (EKS, EC2, RDS, CloudWatch).
Strong understanding of networking concepts (TCP/IP, DNS, Load Balancing, CDN).
Experience with high availability and distributed systems.
Experience building a complete observability stack.
Experience in building or optimizing Golang SDKs or internal frameworks.
Knowledge of cloud-native networking (CNI, overlay, BGP, eBPF-based load balancing).
Application Documents
- CV (must include a formal photo)
- Citizen identity card (CCCD), notarized copy
- Personal history statement
- Health certificate (issued within the last 6 months)
- Relevant degrees and certificates (notarized copies)
Contact Information
Hà Nội: 36 Hoàng Cầu, Đống Đa Ward