Compute Platform Engineer
Location: Dallas, TX (Hybrid)
Type: Direct Hire
• Competitive base salary + performance bonus
• 100% company-paid benefits
Overview
We are seeking a Compute Platform Engineer to support the reliability, performance, and operational health of large-scale, high-performance compute infrastructure that runs critical research and production workloads.
This role is responsible for maintaining and troubleshooting CPU and GPU-based compute platforms, ensuring consistent performance at scale, and driving operational excellence across the environment. The position works closely with platform engineering, infrastructure, operations teams, and hardware vendors to support a stable and highly available compute ecosystem.
The ideal candidate brings strong hands-on experience with HPC or AI infrastructure, deep knowledge of server hardware, and a proactive approach to troubleshooting, automation, and continuous improvement.
Key Responsibilities
Compute Infrastructure Engineering
• Design, configure, and manage high-performance compute infrastructure composed of CPU and GPU nodes
• Support large-scale HPC and AI platforms, ensuring systems are stable, performant, and production-ready
• Perform diagnostics, tuning, and capacity planning to support efficient scale-out of compute environments
Hardware Reliability & Lifecycle Management
• Manage the full firmware and BIOS lifecycle across compute infrastructure, including baselines, validation, rollout, and compliance
• Troubleshoot complex hardware issues across CPU, GPU, DPU, NVSwitch, NIC, memory, PSU, and BMC components
• Drive root cause analysis and implement solutions that improve system reliability and reduce recovery time
• Analyze hardware lifecycle processes and recommend improvements to optimize efficiency
Automation & Platform Operations
• Automate health checks, onboarding workflows, and operational processes to improve deployment efficiency
• Leverage Infrastructure-as-Code (IaC) methodologies to enable scalable and repeatable infrastructure management
• Recommend and implement tooling and process improvements to enhance platform operations
Vendor & Cross-Functional Collaboration
• Collaborate with hardware vendors to resolve firmware and system issues, providing detailed diagnostics, logs, and impact analysis
• Work closely with infrastructure, platform, and operations teams to align on system performance and reliability goals
• Support integration of hardware improvements across the broader environment
Monitoring, Performance & Security
• Monitor hardware performance and identify opportunities for optimization
• Implement best practices for platform security and system hardening
• Ensure adherence to operational standards and data center processes
Technical Leadership
• Act as a subject matter expert for compute infrastructure and hardware-related issues
• Mentor junior engineers and contribute to a culture of continuous improvement and technical excellence
Required Experience
• 3+ years of hands-on experience supporting large-scale compute platforms, HPC, or AI infrastructure
• Strong experience with HPE server platforms such as ProLiant and Apollo
• Experience working with NVIDIA GPUs such as the A100 and H100/H200, or similar
• Solid understanding of server architecture, including UEFI/BIOS, PCIe devices, and out-of-band management systems (iLO, BMC)
• Proven ability to troubleshoot complex hardware issues and coordinate with vendors for resolution
• Experience with Linux in high-performance or latency-sensitive environments
• Familiarity with core networking concepts, including DNS, DHCP, VLANs, switching, and routing
• Experience working within data center environments and operational processes
Technical Skills
• Experience with automation tools such as Ansible, Terraform, and CI/CD pipelines
• Exposure to Infrastructure-as-Code (IaC) practices
• Working knowledge of Kubernetes and/or OpenStack (preferred)
• Strong problem-solving and analytical skills with the ability to operate in complex environments
Preferred Experience
• Experience supporting AI platforms or next-generation GPU architectures
• Exposure to large-scale distributed compute environments
• Experience working in mission-critical or high-availability infrastructure environments