Compute Platform Engineer

GTN Technical Staffing
Dallas, TX

Location: Dallas, TX (Hybrid)

Type: Direct Hire

• Competitive base salary + performance bonus
• 100% company-paid benefits

Overview

We are seeking a Compute Platform Engineer to maintain the reliability, performance, and operational health of large-scale, high-performance compute infrastructure that runs critical research and production workloads.

This role is responsible for maintaining and troubleshooting CPU- and GPU-based compute platforms, ensuring consistent performance at scale, and driving operational excellence across the environment. The position works closely with platform engineering, infrastructure, and operations teams, as well as hardware vendors, to support a stable and highly available compute ecosystem.

The ideal candidate brings strong hands-on experience with HPC or AI infrastructure, deep knowledge of server hardware, and a proactive approach to troubleshooting, automation, and continuous improvement.

Key Responsibilities

Compute Infrastructure Engineering

• Design, configure, and manage high-performance compute infrastructure composed of CPU and GPU nodes
• Support large-scale HPC and AI platforms, ensuring systems are stable, performant, and production-ready
• Perform diagnostics, tuning, and capacity planning to support efficient scale-out of compute environments

Hardware Reliability & Lifecycle Management

• Manage the full firmware and BIOS lifecycle across compute infrastructure, including baselines, validation, rollout, and compliance
• Troubleshoot complex hardware issues across CPU, GPU, DPU, NVSwitch, NICs, memory, PSU, and BMC components
• Drive root cause analysis and implement solutions to improve system reliability and reduce recovery time
• Analyze hardware lifecycle processes and recommend improvements for optimization and efficiency

Automation & Platform Operations

• Automate health checks, onboarding workflows, and operational processes to improve deployment efficiency
• Leverage Infrastructure-as-Code (IaC) methodologies to enable scalable and repeatable infrastructure management
• Recommend and implement tooling and process improvements to enhance platform operations

Vendor & Cross-Functional Collaboration

• Collaborate with hardware vendors to resolve firmware and system issues, providing detailed diagnostics, logs, and impact analysis
• Work closely with infrastructure, platform, and operations teams to align on system performance and reliability goals
• Support integration of hardware improvements across the broader environment

Monitoring, Performance & Security

• Monitor hardware performance and identify opportunities for optimization
• Implement best practices for platform security and system hardening
• Ensure adherence to operational standards and data center processes

Technical Leadership

• Act as a subject matter expert for compute infrastructure and hardware-related issues
• Mentor junior engineers and contribute to a culture of continuous improvement and technical excellence

Required Experience

• 3+ years of hands-on experience supporting large-scale compute platforms, HPC, or AI infrastructure
• Strong experience with HPE server platforms such as ProLiant and Apollo
• Experience working with NVIDIA GPUs, including A100, H100/H200, or similar
• Solid understanding of server architecture including UEFI/BIOS, PCIe devices, and out-of-band management systems (iLO, BMC)
• Proven ability to troubleshoot complex hardware issues and coordinate with vendors for resolution
• Experience with Linux in high-performance or latency-sensitive environments
• Familiarity with core networking concepts including DNS, DHCP, VLANs, switching, and routing
• Experience working within data center environments and operational processes

Technical Skills

• Experience with automation tools such as Ansible, Terraform, and CI/CD pipelines
• Exposure to Infrastructure-as-Code (IaC) practices
• Working knowledge of Kubernetes and/or OpenStack (preferred)
• Strong problem-solving and analytical skills with the ability to operate in complex environments

Preferred Experience

• Experience supporting AI platforms or next-generation GPU architectures
• Exposure to large-scale distributed compute environments
• Experience working in mission-critical or high-availability infrastructure environments

Posted 2026-03-31
