Engineering Manager, HPC Kubernetes Platform

GTN Technical Staffing
Dallas, TX

Overview

This organization is backed by dedicated leadership and investment and operates at the bleeding edge of technology. Its mission is to scale and enhance high-performance computing (HPC) and cloud infrastructure that supports clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.

We are seeking an experienced Engineering Manager, HPC Kubernetes Platform, to lead the team responsible for designing and scaling a bare-metal Kubernetes environment: the orchestration layer powering GPU- and CPU-intensive machine-learning and HPC workloads across global datacenters.

This is a hands-on leadership role focused on platform performance, reliability, and automation. You will define the technical roadmap, guide system architecture and optimization, and ensure the Kubernetes platform delivers top-tier reliability and throughput for distributed ML and HPC environments. The ideal candidate is a strong technical leader who thrives at the intersection of infrastructure engineering, AI systems, and high-performance computing.

Key Responsibilities

- Lead and mentor engineers designing and scaling a bare-metal Kubernetes platform for HPC and ML workloads.
- Architect and optimize GPU/CPU scheduling, resource management, and performance across multi-tenant compute clusters.
- Drive automation and observability using Infrastructure-as-Code, CI/CD, and SRE best practices.
- Collaborate with Research, Storage, and Network teams to integrate distributed filesystems, high-speed interconnects (InfiniBand, RoCE), and custom runtimes.
- Partner with hardware and software vendors to improve tooling, influence product roadmaps, and streamline deployment.
- Oversee platform reliability, capacity forecasting, and performance KPIs across thousands of nodes.

Required Experience

- 7+ years in infrastructure, platform, or SRE engineering, including 2+ years in technical leadership.
- Proven experience operating Kubernetes environments tailored for HPC or ML training workloads, including GPU scheduling, resource isolation, and workload optimization.
- Deep knowledge of Linux systems, networking, and performance engineering on bare-metal hardware.
- Experience managing large-scale, multi-tenant clusters and integrating distributed storage or high-speed networking.
- Strong automation experience (Terraform, Ansible, or similar) and familiarity with observability tools (Prometheus, Grafana, Loki).
- Excellent communication and stakeholder management skills; ability to translate complex technical direction into clear, actionable plans.
- Bachelor's degree or equivalent experience.

Preferred Experience

- Familiarity with HPC schedulers (Slurm, Flux) and container runtimes (containerd, CRI-O).
- Contributions to open-source Kubernetes or ML infrastructure projects.

Posted 2026-04-06
