Infrastructure Operations Engineer

4 days ago


Hong Kong, Central and Western District, Hong Kong SAR China Aethir Full time
Infrastructure Operations Engineer (GPU Computing) - Enterprise AI

Aethir is a pioneering technology company at the forefront of GPU-based compute infrastructure, specializing in cutting-edge solutions for diverse industries ranging from AI and machine learning to high-performance computing (HPC). We're dedicated to pushing the boundaries of what's possible, leveraging the latest advancements in hardware and software to empower our clients with unparalleled computational capabilities.

About the Role:

We are seeking a highly skilled and motivated Infrastructure Operations Engineer to join our dynamic team. As an integral member of the InfraOps team, you will play a key role in managing and optimizing our GPU-based compute infrastructure (across multiple locations and partners), ensuring maximum performance, scalability, and reliability.

Responsibilities:

  • Infrastructure Management: Deploy, configure, and maintain GPU-based compute infrastructure, including servers, storage, networking, and the associated software stack. Aethir facilitates compute from dozens of providers around the world, with GPUs ranging from 4090s to H200s.
  • Monitoring and Optimization: Implement robust monitoring and alerting systems to proactively identify performance bottlenecks, resource constraints, and potential failures. Continuously optimize infrastructure to improve performance, efficiency, and cost-effectiveness.
  • Automation and Orchestration: Develop automation scripts and tools to streamline deployment, configuration, and management of infrastructure components. Implement infrastructure as code (IaC) principles to enable rapid provisioning and scaling.
  • Security and Compliance: Implement and enforce security best practices to safeguard sensitive data and ensure compliance with relevant regulations and industry standards. Conduct regular security audits and vulnerability assessments.
  • Incident Response and Troubleshooting: Provide tier-3 support for infrastructure-related issues, investigating root causes and implementing timely resolutions. Participate in an on-call rotation to respond to critical incidents outside regular business hours.
  • Capacity Planning and Scaling: Collaborate with cross-functional teams to forecast resource requirements, plan capacity upgrades, and scale infrastructure to accommodate growing workloads and user demands.
  • Documentation and Knowledge Sharing: Maintain comprehensive documentation of infrastructure configurations, procedures, and troubleshooting guidelines. Share knowledge and best practices with team members to foster continuous learning and skill development.

Requirements:

  • Experience in infrastructure operations, preferably in a DevOps, SRE, Sales Engineering, or Solution Architect role focused on GPU compute.
  • Proficiency in managing GPU-based compute infrastructure, including NVIDIA GPUs and CUDA programming.
  • Strong expertise in Linux system administration and scripting (e.g., Bash, Python).
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet) and version control systems (e.g., Git).
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Solid understanding of networking concepts, protocols, and troubleshooting techniques.
  • Excellent analytical and problem-solving skills, with a proactive and results-oriented mindset.
  • Strong communication skills and the ability to collaborate with cross-functional teams. We operate in English, but Mandarin is a big bonus, as we have engineering teams in China and Southeast Asia.
  • Experience with cloud computing platforms (e.g., AWS, Azure, GCP) and hybrid cloud architectures.
  • Knowledge of HPC frameworks and job scheduling systems (e.g., Slurm, PBS Pro).
  • Familiarity with GPU-accelerated libraries and frameworks (e.g., TensorFlow, PyTorch, CUDA Toolkit).
  • Understanding of cybersecurity principles and practices, including encryption, access controls, and threat detection/prevention.
  • Bonus if you know Web3 (cryptocurrency, tokenization of RWAs, mining/staking, etc.).

Benefits:

  • Competitive compensation structure (and flexible on fiat/token mix).
  • Benefits and salary are flexible, depending on location and setup.
  • Flexible work hours and remote work options.

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: IT Services and IT Consulting

