Supervisor - Server Repair Engineering
Description
This is a foundational role responsible for architecting, defining, and continuously improving the entire technical framework for diagnosing and repairing our complex, high-value AI server infrastructure.
More than a traditional supervisor, you are the lead repair engineer and process owner.
You will leverage your deep hardware expertise to develop systematic, data-driven, and scalable repair processes from the ground up.
You will not only lead a team of technicians and junior engineers but also act as their primary technical mentor and the engineering liaison to our core Product Design and Quality teams.
Your mission is to transform our repair facility into a center of excellence by embedding engineering discipline into every aspect of our service operations.
Key Responsibilities
1. Process Architecture & Definition (Primary Focus):
* Architect and Author: Design, document, and deploy the end-to-end technical workflow for AI server repair. This includes creating detailed Standard Operating Procedures (SOPs), diagnostic flowcharts, decision trees, and work instructions.
* Test Plan Development: Define and validate comprehensive test plans and validation criteria for all repaired components and full systems, ensuring they meet strict performance and reliability standards before being returned to service.
* Tooling & Automation: Identify, develop, and implement diagnostic scripts, software tools, and physical fixtures to improve the accuracy, consistency, and efficiency of the troubleshooting and repair process.
* Process Control: Establish critical control points within the repair process to ensure quality and gather vital failure data.
2. Advanced Engineering Support & Failure Analysis (Primary Focus):
* Technical Authority: Serve as the ultimate escalation point for the most complex hardware failures that elude standard diagnostic procedures.
* Root Cause Analysis (RCA): Lead systematic deep dives into new and recurring failure modes. Perform board-level analysis, interpret schematics, and collaborate with the team to isolate the root cause.
* Engineering Feedback Loop: Act as the primary technical interface between the repair center and core Hardware Engineering/R&D. Consolidate, analyze, and present failure data and RCA findings to influence future product design for improved serviceability and reliability (Design for Serviceability).
3. Operational Leadership & Team Enablement:
* Technical Mentorship: Lead and develop the technical capabilities of the repair team. Provide hands-on training on new products, advanced diagnostic techniques, and established repair processes.
* Enablement, Not Just Delegation: Empower the team by ensuring they have the processes, tools, and knowledge required to succeed. Focus on removing technical roadblocks and fostering an environment of structured problem-solving.
* Performance Management: Set clear technical objectives, manage workflow priorities based on engineering needs, and guide the professional growth of team members.
4. Data-Driven Continuous Improvement:
* Analyze Repair Data: Systematically collect and analyze repair data (failure modes, component usage, test yields) to identify trends and opportunities for process optimization.
* Drive Improvements: Initiate and lead engineering change requests (ECRs) and process improvement projects based on data analysis to enhance repair quality, reduce turn-around time, and lower costs.
Qualifications & Skills
Required Qualifications (Must-Haves):
* Education: Bachelor's degree in Electrical Engineering, Computer Engineering, Manufacturing Engineering, or a closely related field.
* Experience:
* 4+ years in a technical engineering role such as Test Engineering, Manufacturing Engineering, Hardware Sustaining, or high-level Repair Engineering.
* Proven track record of developing and documenting technical processes (SOPs, test plans, work instructions) from scratch in a manufacturing or repair environment.
* 3+ years in a technical leadership role, mentoring junior engineers or technicians.
* Technical Expertise:
* Expert-level ability to read and interpret electronic schematics, board layout files, and product specifications.
* Strong, hands-on experience with systematic hardware troubleshooting methodologies for complex systems (e.g., servers, networking equipment).
* Demonstrated proficiency in scripting (Python, Bash, or similar) to automate diagnostic tests and parse data logs.
* Deep knowledge of server components and architecture, including GPUs, high-speed interconnects (InfiniBand/Ethernet), CPUs, and power systems.
Working Conditions
Must be able to tolerate moderate to high noise levels in production and testing rooms. The role involves both office conditions and the outdoor environmental conditions found in the warehouse, which can be hot in the summer and cold in the winter. Individuals may need to walk for extended periods while working and moving through the facilities; reach above shoulder height; bend or stoop below the waist; perform repetitive wrist, hand, or finger movements; and occasionally lift up to 50 pounds.
To apply, send your resume to [email protected]