Infrastructure Senior SRE Engineer

Hong Kong, Hong Kong SAR China OKX Full time

Who We Are
At OKX, we believe that the future will be reshaped by crypto and that crypto will ultimately contribute to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions of people access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves.

Across our offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.

OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink, and more.

About The Team
The Service Stability Engineering Team sees service stability as one of the core competitive strengths of the company's products. The team is building end-to-end, link-level risk management capabilities. Its goal is to identify and analyze stability risks automatically and sustainably, moving from "reactive governance" to "proactive governance." This approach shifts stability work earlier in the lifecycle, so that issues are addressed before they arise and the user experience improves.

Job Responsibilities
• Design and lead the stability architecture for large-scale distributed systems, including big data platforms, data warehouses, and core middleware infrastructure.
• Develop and optimize comprehensive stability strategies covering capacity planning, performance optimization, fault prevention, and disaster recovery.
• Spearhead chaos engineering practices, designing complex fault injection scenarios to validate system resilience and self-healing capabilities.
• Build and refine comprehensive monitoring and alerting systems for rapid fault detection, localization, and recovery.
• Lead root cause analysis for major incidents and formulate long-term improvement plans to continuously enhance system availability and reliability.
• Drive infrastructure intelligence and automation by designing and implementing AIOps solutions.
• Collaborate closely with product, development, and operations teams to integrate stability requirements throughout the product lifecycle.
• Lead the development of stability-related technical standards and best practices, and promote their adoption across the organization.

Qualifications
• Bachelor's degree or above in Computer Science or a related field, with 10+ years of architectural design experience on large-scale internet or cloud computing platforms.
• Expert knowledge of distributed system architectures, with a deep understanding of, and rich practical experience in, big data, cloud-native, and microservice technologies.
• In-depth understanding of infrastructure components (e.g., Kubernetes, Kafka, databases) and the ability to perform advanced tuning.
• Strong systems thinking, with the ability to analyze and solve complex stability issues from a holistic perspective.
• Extensive experience handling large-scale system failures, with the ability to quickly locate and resolve challenging problems.
• Mastery of Linux systems and networking, and familiarity with the architecture and services of mainstream cloud platforms (e.g., Alibaba Cloud, AWS).
• Excellent technical leadership skills, with the ability to guide teams and drive cross-departmental collaboration.
• Strong communication and documentation skills, with the ability to engage in technical discussions in both Chinese and English.
• Passion for continuous learning and the ability to quickly grasp new technologies and apply them in practical work scenarios.

Perks & Benefits
• Competitive total compensation
• Comprehensive insurance coverage for employees and their dependents
• More that we'd love to tell you about during the process

Seniority level: Mid-Senior level
Employment type: Full-time
Industries: IT Services and IT Consulting

Information collected and processed as part of the recruitment process for any job application you choose to submit is subject to OKX's Candidate Privacy Notice.


  • Senior SRE Engineer

    Hong Kong Island, Hong Kong SAR China Grvt Full time

    Senior Site Reliability Engineer. Design, implement, and maintain scalable infrastructure for a high-performance, low-latency crypto trading platform. Operate and enhance GRVT’s Kubernetes and Nomad-based environments to ensure system stability, scalability, and security. Build infrastructure automation...


  • Hong Kong Island, Hong Kong SAR China OSL Full time

    Infrastructure Site Reliability Engineering (Infra SRE) Lead OSL is Hong Kong’s first SFC‑licensed digital asset platform. We are seeking an Infrastructure SRE Lead to own the bedrock of our regulated trading, custody, and payments platforms. You will architect systems that are not just robust, but resilient, scalable, and secure enough to power the...

  • DevOps/SRE

    Hong Kong Island, Hong Kong SAR China ioTech Solutions Full time

    We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and high-availability infrastructure. As a DevOps/SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the...

  • Senior SRE Engineer

    Hong Kong Island, Hong Kong SAR China Grvt Full time

    A dynamic technology firm in Hong Kong is seeking a talented Senior SRE Engineer. This role focuses on designing and maintaining scalable infrastructure for a crypto trading platform, ensuring system stability and reliability. The ideal candidate should have strong expertise in Kubernetes, Terraform, and cloud-native systems. Experience with observability...

  • Senior SRE

    Hong Kong Island, Hong Kong SAR China IO Tech Solutions Limited Full time

    Senior SRE (VP-Grade) - Leading Crypto HFT We are a global leader in digital assets and data center infrastructure, providing solutions that drive progress in finance and artificial intelligence. We believe that innovations in blockchain and digital assets will revolutionize the movement of value worldwide, and we are dedicated to crafting the products and...

  • Senior SRE

    Hong Kong Island, Hong Kong SAR China IO Tech Solutions Limited Full time

    A leading technology firm in Hong Kong is seeking a Senior Site Reliability Engineer to ensure the reliability and performance of their critical infrastructure. You will be responsible for managing Kubernetes deployments, optimizing AWS cloud services, and maintaining Infrastructure as Code using Terraform. The ideal candidate has over 8 years of experience...


  • Hong Kong Island, Hong Kong SAR China Galaxy Full time

    Who We Are Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. Our institutional digital assets platform spans trading, investment banking, asset management, staking, self‑custody, and tokenization technology. We also invest in and operate...


  • Hong Kong, Hong Kong SAR China ioTech Solutions Full time

    Join my client as a Site Reliability Engineer and drive the future of infrastructure! You'll design robust automations, optimize performance, and ensure system resilience while minimizing manual toil. From deployments to incident response, your work will directly impact our platform's reliability and scalability. Key Responsibilities Develop automations for...