Senior Site Reliability Engineer
7 days ago
Who We Are
At OKX, we believe that the future will be reshaped by crypto, and ultimately contribute to every individual's freedom. OKX is a leading crypto exchange, and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also a brand trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our multiple offices globally, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er. OKX is part of OKG, a group that brings the value of Blockchain to users around the world, through our leading products OKX, OKX Wallet, OKLink and more.

About the Team
The Service Stability Engineering Team envisions service stability as one of the core competitive strengths of the company's products. By building end-to-end, link-level risk management capabilities, the team aims to achieve sustainable automatic identification and analysis of stability risks, transforming from "reactive governance" to "proactive governance." This approach shifts more stability-related matters forward and addresses them early, preventing issues before they arise and enhancing user experience.

Job Responsibilities:
Design and lead the stability architecture for large-scale distributed systems, including big data platforms, data warehouses, and core middleware infrastructure.
Develop and optimize comprehensive stability strategies covering capacity planning, performance optimization, fault prevention, and disaster recovery.
Spearhead chaos engineering practices, designing complex fault injection scenarios to validate system resilience and self-healing capabilities.
Build and refine comprehensive monitoring and alerting systems for rapid fault detection, localization, and recovery.
Lead root cause analysis for major incidents and formulate long-term improvement plans to continuously enhance system availability and reliability.
Drive infrastructure intelligence and automation, designing and implementing AIOps solutions.
Collaborate closely with product, development, and operations teams to integrate stability requirements throughout the product lifecycle.
Lead the development of stability-related technical standards and best practices, promoting their adoption across the organization.

Qualifications:
Bachelor's degree or above in Computer Science or related field, with 10+ years of architectural design experience in large-scale internet or cloud computing platforms.
Expert knowledge of distributed system architectures, with deep understanding and rich practical experience in big data, cloud-native, and microservice technologies.
In-depth understanding of various infrastructure components (e.g., Kubernetes, Kafka, Database) and ability to perform advanced tuning.
Strong systems thinking capability, able to analyze and solve complex stability issues from a holistic perspective.
Extensive experience in handling large-scale system failures, with the ability to quickly locate and resolve challenging problems.
Mastery of Linux systems and network technologies, familiarity with mainstream cloud platforms (e.g., Alibaba Cloud, AWS) architecture and services.
Excellent technical leadership skills, able to guide teams and drive cross-departmental collaboration.
Strong communication and documentation skills, with the ability to engage in technical discussions in both Chinese and English.
Passion for continuous learning, able to quickly grasp new technologies and apply them in practical work scenarios.

Perks & Benefits
Competitive total compensation
Comprehensive insurance coverage for employees and their dependants
More that we love to tell you along the process

Notice: All official OKX vacancies are posted on this site. We are not affiliated with other third-party job boards except Linkedin.com; listings on other sites may be inaccurate or outdated. This is the only source of truth for applications. Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.
-
Site Reliability Engineer
1 day ago
Hong Kong Island, Hong Kong SAR China Goliath Partners Full time
A global financial technology leader is seeking Site Reliability Engineers to help scale and support the systems behind high-performance investment strategies. You’ll join a collaborative engineering team focused on automation, reliability, and solving complex technical problems at scale. What You’ll Do Ensure the reliability and performance of...
-
Site Reliability Engineer
2 weeks ago
Hong Kong, Hong Kong SAR China Goliath Partners Full time
A global financial technology leader is seeking Site Reliability Engineers to help scale and support the systems behind high-performance investment strategies. You’ll join a collaborative engineering...
-
Site Reliability Engineer
1 week ago
Hong Kong, Hong Kong SAR China Qube Research & Technologies Full time
Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all liquid asset classes across the world. We are a technology and data driven group implementing a scientific approach to investing. Combining data, research, technology, and trading expertise has shaped our...
-
Site Reliability Engineer
5 days ago
Hong Kong, Hong Kong SAR China Tek Systems Full time
Fintech Innovative project Leading digital transformation project
Site Reliability Engineer (SRE)
Experience: 2-3 years
Location: HK Island
About the Role
We are looking for a skilled and motivated Site Reliability Engineer (SRE) to join a dynamic, forward-thinking team working on high-impact projects that shape the future of technology in the financial...
-
Site Reliability Engineer
3 days ago
Hong Kong, Hong Kong SAR China Ashford Benjamin Ltd Full time
We are exclusively representing a global investment firm that stands at the intersection of advanced mathematics, cutting-edge technology, and global finance. They operate a world-class, high-performance technology stack (including C++, Python, KDB+, and FPGA) to identify and execute on systematic opportunities. They are now seeking to add a strategic Site...
-
Site Reliability Engineer
7 days ago
Hong Kong Island, Hong Kong SAR China NLS Executive Search Full time
Overview
Site Reliability Engineer - Global Hedge Fund - Hong Kong
My client, a global hedge fund, is actively seeking a hands-on, highly skilled and motivated SRE to join their team. As an SRE, you will play a critical role in driving the adoption of Site Reliability Engineering practices within their organization. The ideal candidate will have a strong...
-
Site Reliability Engineer
5 days ago
Hong Kong Island, Hong Kong SAR China Flow Traders Full time
Flow Traders is looking for an experienced Site Reliability Engineer to join our team and play a key role in building, maintaining, and scaling our cloud-based platform. The ideal candidate possesses a great sense of ownership and passion for working with the latest technologies. This is a unique opportunity to join a leading proprietary trading firm with an...
-
Lead Site Reliability Engineer
4 days ago
Hong Kong, Hong Kong SAR China IO TECH SOLUTIONS LIMITED Full time
My client is seeking a Site Reliability Engineer (SRE) to join their growing infrastructure team in Asia. As an SRE, you will play a critical role in ensuring the stability, scalability, and resilience of our production systems. You'll help our client scale our services, define and enforce reliability standards, and continuously improve our engineering...
-
Lead Site Reliability Engineer
5 days ago
Hong Kong Island, Hong Kong SAR China IO TECH SOLUTIONS LIMITED Full time
My client is seeking a Site Reliability Engineer (SRE) to join their growing infrastructure team in Asia. As an SRE, you will play a critical role in ensuring the stability, scalability, and resilience of our production systems. You'll help our client scale our services, define and enforce reliability standards, and continuously improve our engineering...
-
Site Reliability Engineer
1 day ago
Hong Kong Island, Hong Kong SAR China Goliath Partners Full time
A global financial technology leader is looking for Site Reliability Engineers in Hong Kong. This full-time role focuses on ensuring the reliability of production applications, automating tasks, and conducting incident management. Ideal candidates should have a degree in Computer Science and skills in Python, CI/CD, and problem-solving. Compensation goes up...