HPC Kubernetes Architect

GTN Technical Staffing
Dallas, TX

Overview

This organization is backed by dedicated leadership and investment and operates at the leading edge of technology. Its mission is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports clients' research, production, and delivery, enabling breakthroughs that will shape tomorrow's industries. Its engineers build critical infrastructure that removes friction from scientific research, simulation, analysis, and decision-making, accelerating discovery and innovation.

As an HPC Kubernetes Solutions Architect, you will act as a trusted advisor to customers, guiding them through the design, integration, and adoption of GPU-accelerated Kubernetes platforms purpose-built for HPC, AI/ML training, simulation, and scientific workloads.

This is a customer-facing architecture role with accountability across the entire solution lifecycle: from early discovery and requirements analysis, through reference architecture design, proof-of-concept delivery, and deployment, to long-term optimization and platform evolution. You will create architectural blueprints and integration strategies that help customers achieve measurable performance and scalability outcomes while preparing them for future growth and technology shifts. You will also collaborate closely with product, engineering, and operations teams, ensuring that customer feedback informs roadmap priorities and helping define the next generation of Kubernetes-based HPC orchestration. This role is ideal for someone who combines deep technical expertise in Kubernetes and GPU orchestration with the ability to engage customers as a solution strategist, aligning today's workloads with tomorrow's innovation.

Key Responsibilities

Customer Engagement & Architecture
- Act as the primary architectural point of contact for customers adopting GPU-accelerated Kubernetes platforms for HPC and AI/ML workloads.
- Partner with customers to capture workload requirements, performance objectives, scaling needs, and integration constraints, translating them into reference architectures and actionable solution designs.
- Lead proof-of-concept and benchmarking engagements, using profiling tools, workload characterization, and telemetry to validate solution performance and scalability.
- Provide architectural leadership during onboarding and deployment, ensuring successful integration of Kubernetes clusters with HPC schedulers and enterprise IT systems.
- Represent the organization at customer design sessions, technical workshops, and industry conferences, positioning yourself as a thought leader in Kubernetes for HPC.
- Share forward-looking insights with customers on GPU roadmaps, interconnect advancements (e.g., InfiniBand, RoCE, NVLink), and container orchestration trends.

Platform & Infrastructure
- Architect and operate Kubernetes clusters optimized for GPU workloads, leveraging NVIDIA GPU Operator, Network Operator, DCGM, and device plugins.
- Integrate and tune Multi-Instance GPU (MIG), GPU sharing, and scheduler extensions (e.g., Volcano, Slurm integration, kube-scheduler plugins) to maximize efficiency in multi-tenant environments.
- Develop or extend custom Kubernetes operators and controllers in Go/Python to automate HPC infrastructure services.
- Design and recommend secure multi-tenant Kubernetes environments, implementing RBAC, OPA/Gatekeeper policies, namespace isolation, and workload quotas.
- Define and document integration strategies across compute, storage, networking, and orchestration layers, including CNI plugins (NVIDIA CNI, Multus, Cilium), storage systems (Lustre, GPFS, Ceph, VAST), and container runtimes (containerd, NVIDIA Container Toolkit).
- Drive observability and monitoring solutions with Prometheus, Grafana, DCGM Exporter, and OpenTelemetry, ensuring visibility into GPU health, cluster utilization, and workload performance.
- Support GitOps-driven CI/CD pipelines for Kubernetes infrastructure using ArgoCD, FluxCD, Helm, and Kustomize.
- Collaborate with HPC, ML, and DevOps teams to validate performance and scalability in hybrid or on-premises environments.

Vendor & Product Collaboration
- Build and maintain strategic relationships with ecosystem vendors (e.g., NVIDIA, Cisco, storage partners), incorporating emerging technologies into customer environments.
- Collaborate closely with product, engineering, and operations teams to ensure customer feedback informs roadmap priorities.

Required Experience

- Bachelor's degree or equivalent experience.
- Extensive experience in Kubernetes architecture and operations for HPC or GPU-intensive environments.
- Strong technical expertise in the NVIDIA GPU stack (GPU Operator, device plugins, MIG, NVML, DCGM).
- Strong technical expertise in Kubernetes internals (CRDs, RBAC, scheduler extensions, custom operators/controllers).
- Strong technical expertise in distributed and parallel storage integration with Kubernetes for HPC workloads.
- Strong technical expertise in high-performance networking (InfiniBand, RDMA, RoCE) in containerized environments.
- Proven ability to design scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases.
- Proficiency in Go or Python for Kubernetes operator or controller development.
- Experience with workload profiling, benchmarking, and performance tuning.
- Strong customer engagement skills, capable of translating requirements into actionable architectures and presenting solutions effectively.
- Collaborative mindset with experience working across engineering, product, and operations teams.

Preferred Experience

- Demonstrated success in end-to-end customer solution delivery, from requirements discovery to deployment and adoption.
- Familiarity with containerized HPC environments (e.g., Singularity/Apptainer).
- Exposure to automation and GitOps practices for Kubernetes platform management (e.g., ArgoCD, FluxCD).
- Contributions to open-source projects in the Kubernetes or NVIDIA ecosystem.
- Experience advising on future adoption strategies, helping customers prepare for emerging GPU, interconnect, and orchestration technologies.
- Bachelor's or Master's degree in Computer Science, Engineering, Physics, or related technical field.
- Relevant Kubernetes and container certifications such as CKA, CKAD, or CKS, alongside cloud certifications like AWS Solutions Architect or Azure Solutions Architect Expert.

Posted 2026-04-03
