Site Reliability Engineer (SRE)
Location: Hybrid (split between home and Worlds' Plano HQ), with preference given to candidates near the Dallas-Fort Worth Metroplex.
Reporting to: Leader of Client Delight
The Client Delight team is responsible for comprehensive service delivery and ensuring ultimate customer satisfaction, encompassing Implementation, Operations, Security, Customer Support, Change Management, and IT. This role involves close collaboration with Worlds Engineering & Development, Sales & Solutions Architecture, and other cross-functional teams to provide holistic management of the Worlds customer experience.
About Worlds: Worlds is an AI Platform that enhances visibility and automates physical operations by applying AI across existing camera networks. Our end-to-end solution enables enterprises to model, train, and build automation into their physical environments, helping them develop applications that measure, detect, and track objects in real time to improve efficiency, safety, and security. Learn more at worlds.io.
Job Summary: The Site Reliability Engineer (SRE) is a critical hands-on role responsible for the deployment, monitoring, and operational support of the Worlds AI platform for our customers. The SRE ensures the reliability, scalability, and security of customer solutions, acting as the primary technical resource for implementation and incident management. This role requires a blend of cloud infrastructure expertise, automation skills, and a passion for customer success, ensuring our Fortune-500 clientele receive best-in-class service and support.
Key Responsibilities:
- Solution Implementation: Deploy, configure, and update new customer solution environments in Azure Kubernetes Service (AKS) and other cloud platforms (AWS, GCP, private cloud), utilizing infrastructure-as-code tools like Bicep and Helm charts.
- Custom Solution Integration: Work with our Forward Deployment Engineering team on custom development and integration of the Worlds app within the customer's operation. Provide monitoring guidance, support documentation, and solution health dashboards as required to manage the custom solution.
- Monitoring & Alerting: Implement, tune, and manage monitoring and alerting solutions using Prometheus and Grafana to meet customer SLAs and ensure optimal performance. Collaborate with core engineering to define and integrate application telemetry.
- Incident Management & Support: Provide production support and lead incident management processes following ITIL guidelines. Troubleshoot and resolve issues, escalating to DevOps for tool-related issues or to Core Engineering for any Worlds app stack issues (functionality or performance), with the goal of gaining knowledge to reduce escalations over time.
- Knowledge Management: Develop and maintain comprehensive customer runbooks in Confluence, documenting unique solution architectures and return-to-service procedures to ensure operational readiness.
- System & Performance Testing: Test new configurations, including performance and load testing, to validate solution stability and scalability.
- Security & Compliance: Adhere strictly to Worlds’ Acceptable Use Policy (AUP) and Access Control Policy, operating with the principle of least privilege to ensure the security and compliance of customer environments.
- Customer Communication: Serve as a key technical point of contact for customers, communicating effectively on project status, incidents, and operational performance.
Qualifications & Experience:
- Networking (5+ years): Deep experience configuring and troubleshooting TCP/IP networks, including subnetting, routing, firewalls, and VPN solutions (OpenVPN, WireGuard).
- Linux Administration (3+ years): Proficient in building, troubleshooting, and managing Linux servers, including remote access, service verification, and log analysis.
- Cloud Administration (2+ years): Demonstrable experience managing cloud solutions in Azure (required); familiarity with AWS or Google Cloud is a bonus. Expertise in containerized solutions (Docker, Kubernetes), IaaS (VMs), DNS, IAM, and logging services is essential.
- Automation & Scripting: Experience with configuration management and container orchestration tools such as Ansible, Docker, Kubernetes, and Helm is highly preferred. Proficiency in scripting with Bash and Python for automation is required.
- Database: Ability to write and execute basic SQL SELECT queries for troubleshooting and data verification.
- IT Service Management: Experience with ITSM frameworks (ITIL) and tools (e.g., Jira Service Management) for incident and problem management.
- AI/ML: Experience with Artificial Intelligence (AI) and Machine Learning (ML) concepts is a plus; Worlds will provide training on our specific platform.
Personal Qualities:
- Passionate about delivering Client Delight and taking ownership of the customer experience.
- Ability to thrive in a fast-paced startup environment, iterating quickly on solutions and processes while driving the maturation of operations for security and efficiency.
- A proactive and collaborative mindset, with excellent problem-solving and communication skills.
Perks and Benefits:
- 100% employer-paid medical coverage for employees and dependents.
- Comprehensive benefits including dental, vision, 401k, and disability.
- Flexible PTO policy.
- Employee stock options.
Qualified candidates should send a cover letter and resume to [email protected].
The above statements are intended to describe the general nature and level of work performed by employees assigned to this job. They are not intended to be an exhaustive list of all duties, responsibilities, and qualifications.
Join us at Worlds and help shape the future of industrial operations with cutting-edge technology!