Lead Test Engineer, Compute Server & Storage - AI Data Center
Req ID: 129193
Region: Americas
Country: USA
State/Province: Texas
City: Austin
General Overview
Functional Area: Engineering
Career Stream: Design - Software Engineering
SAP Short Name: LEN-ENG-DSE
Job Level: Level 08
IC/MGR: Individual Contributor
Direct/Indirect Indicator: Indirect
Summary
The Lead Storage and Server Test Engineer will support the design, development, and execution of test strategies for AI data center storage and server infrastructure. This role is hands-on and focused on test execution, automation, and troubleshooting across hardware and software components.
This position requires strong working knowledge of enterprise storage systems, server architectures, and networking, along with an understanding of the performance and reliability considerations of AI/ML workloads.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.
Working knowledge of storage technologies including NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (e.g., Ceph, Lustre, GPFS), SAN, and NAS.
Strong understanding of server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, and power management.
Proficiency in scripting languages (e.g., Python, Bash) for test automation and data analysis.
Experience with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command-line tools.
Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.
Experience with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.
Excellent problem-solving, analytical, and debugging skills.
Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.
Knowledge/Skills/Competencies
Test Strategy:
Contribute to the development and implementation of test plans and strategies for storage, server hardware, firmware, and software components in our AI data center.
Execute and analyze complex test cases, including functional, performance, reliability, stress, and endurance testing.
Collaborate with senior engineers, adopting best practices and contributing to team knowledge.
Keep up to date with industry trends, emerging technologies, and best practices in storage, server, and AI infrastructure testing.
Technical Execution & Automation:
Contribute to the development and enhancement of automated test frameworks and scripts using languages like Python, Go, or similar, to improve efficiency and coverage of testing.
Conduct performance analysis and support bottleneck identification for server platforms (e.g., CPU, GPU, memory, PCIe, networking).
Troubleshoot hardware and software issues, working closely with development and operations teams to identify root causes and propose solutions.
Support the development and maintenance of testbeds and infrastructure for continuous integration and validation.
Utilize open-source and commercial test tools relevant to storage and server validation.
Collaboration & Communication:
Collaborate closely with hardware design, software development, infrastructure, and AI/ML engineering teams to understand requirements and integrate testing throughout the product lifecycle.
Communicate test progress, results, and critical issues effectively to stakeholders.
Participate in design reviews and technical discussions to ensure testability and quality are considered throughout development.
AI/ML Specific Considerations:
Support testing methodologies to validate performance and reliability under AI/ML workloads (e.g., model training, inference, data ingestion).
Assist in assessing the impact of storage and server configurations on system performance.
Understand and test interactions between GPU-accelerated computing, networking, and storage systems.
Preferred Qualifications
Familiarity with OCP (Open Compute Project).
Experience with cloud environments (AWS, Azure, GCP) and virtualization technologies.
Knowledge of containerization technologies (Docker, Kubernetes).
Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and their infrastructure requirements.
Experience with performance profiling tools (e.g., fio, Iometer, Perf, VTune).
Contributions to open-source projects related to storage, servers, or testing.
Certifications in relevant technologies (e.g., NetApp, Dell EMC, HPE, NVIDIA).
Typical Experience
- 3 to 8 years
Typical Education
Bachelor's degree, or an equivalent combination of education and experience.
Notes
This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Celestica's policy on equal employment opportunity prohibits discrimination based on race, color, creed, religion, national origin, gender, sexual orientation, gender identity, age, marital status, veteran or disability status, or other characteristics protected by law.
This policy applies to hiring, promotion, discharge, pay, fringe benefits, job training, classification, referral and other aspects of employment and also states that retaliation against a person who files a charge of discrimination, participates in a discrimination proceeding, or otherwise opposes an unlawful employment practice will not be tolerated. All information will be kept confidential according to EEO guidelines.
COMPANY OVERVIEW:
Celestica (NYSE, TSX: CLS) enables the world's best brands. Through our recognized customer-centric approach, we partner with leading companies in Aerospace and Defense, Communications, Enterprise, HealthTech, Industrial, Capital Equipment and Energy to deliver solutions for their most complex challenges. As a leader in design, manufacturing, hardware platform and supply chain solutions, Celestica brings global expertise and insight at every stage of product development – from drawing board to full-scale production and after-market services for products from advanced medical devices, to highly engineered aviation systems, to next-generation hardware platform solutions for the Cloud. Headquartered in Toronto, with talented teams spanning 40+ locations in 13 countries across the Americas, Europe and Asia, we imagine, develop and deliver a better future with our customers.
Celestica would like to thank all applicants; however, only qualified applicants will be contacted.
Celestica does not accept unsolicited resumes from recruitment agencies or fee-based recruitment services.
This location is a US ITAR facility and these positions will involve the release of export controlled goods either directly to employees or through the employee's movement within the facility. As such, Celestica will require necessary information from all applicants upon an applicant's acceptance of employment to determine if any export control exemptions or licenses must be filed.