Supervisor - Server Repair Engineering
Description
This is a foundational role responsible for architecting, defining, and continuously improving the entire technical framework for diagnosing and repairing our complex, high-value AI server infrastructure.
More than a traditional supervisor, you are the lead repair engineer and process owner.
You will leverage your deep hardware expertise to develop systematic, data-driven, and scalable repair processes from the ground up.
You will not only lead a team of technicians and junior engineers but also act as their primary technical mentor and the engineering liaison to our core Product Design and Quality teams.
Your mission is to transform our repair facility into a center of excellence by embedding engineering discipline into every aspect of our service operations.
Key Responsibilities
1. Process Architecture & Definition (Primary Focus):
* Architect and Author: Design, document, and deploy the end-to-end technical workflow for AI server repair. This includes creating detailed Standard Operating Procedures (SOPs), diagnostic flowcharts, decision trees, and work instructions.
* Test Plan Development: Define and validate comprehensive test plans and validation criteria for all repaired components and full systems, ensuring they meet strict performance and reliability standards before being returned to service.
* Tooling & Automation: Identify, develop, and implement diagnostic scripts, software tools, and physical fixtures to improve the accuracy, consistency, and efficiency of the troubleshooting and repair process.
* Process Control: Establish critical control points within the repair process to ensure quality and gather vital failure data.
2. Advanced Engineering Support & Failure Analysis (Primary Focus):
* Technical Authority: Serve as the ultimate escalation point for the most complex hardware failures that elude standard diagnostic procedures.
* Root Cause Analysis (RCA): Lead systematic deep dives into new and recurring failure modes. Perform board-level analysis, interpret schematics, and collaborate with the team to isolate the root cause.
* Engineering Feedback Loop: Act as the primary technical interface between the repair center and core Hardware Engineering/R&D. Consolidate, analyze, and present failure data and RCA findings to influence future product design for improved serviceability and reliability (Design for Serviceability).
3. Operational Leadership & Team Enablement:
* Technical Mentorship: Lead and develop the technical capabilities of the repair team. Provide hands-on training on new products, advanced diagnostic techniques, and established repair processes.
* Enablement, Not Just Delegation: Empower the team by ensuring they have the processes, tools, and knowledge required to succeed. Focus on removing technical roadblocks and fostering an environment of structured problem-solving.
* Performance Management: Set clear technical objectives, manage workflow priorities based on engineering needs, and guide the professional growth of team members.
4. Data-Driven Continuous Improvement:
* Analyze Repair Data: Systematically collect and analyze repair data (failure modes, component usage, test yields) to identify trends and opportunities for process optimization.
* Drive Improvements: Initiate and lead engineering change requests (ECRs) and process improvement projects based on data analysis to enhance repair quality, reduce turnaround time, and lower costs.
Qualifications
Required Qualifications (Must-Haves):
* Education: Bachelor's degree in Electrical Engineering, Computer Engineering, Manufacturing Engineering, or a closely related field.
* Experience:
* 4+ years in a technical engineering role such as Test Engineering, Manufacturing Engineering, Hardware Sustaining, or high-level Repair Engineering.
* Proven track record of developing and documenting technical processes (SOPs, test plans, work instructions) from scratch in a manufacturing or repair environment.
* 3+ years in a technical leadership role, mentoring junior engineers or technicians.
* Technical Expertise:
* Expert-level ability to read and interpret electronic schematics, board layout files, and product specifications.
* Strong, hands-on experience with systematic hardware troubleshooting methodologies for complex systems (e.g., servers, networking equipment).
* Demonstrated proficiency in scripting (Python, Bash, or similar) to automate diagnostic tests and parse data logs.
* Deep knowledge of server components and architecture, including GPUs, high-speed interconnects (InfiniBand/Ethernet), CPUs, and power systems.
Working Conditions
Must be able to tolerate moderate to high noise levels in production and testing rooms. Work takes place in office settings and in a warehouse exposed to outside conditions: hot in the summer and cold in the winter. The role may require walking the facilities for extended periods; reaching above shoulder height; bending or stooping below the waist; repetitive wrist, hand, or finger movement; and occasionally lifting up to 50 pounds.
To apply, send your resume to [email protected]