Senior Site Reliability Engineer

Qode
Texas

Job Title: Senior Site Reliability Engineer (Observability & Transaction Reliability)

Location: Austin, TX

Type: Full-Time

About Incedo

Incedo is a global AI and data transformation firm helping organizations drive measurable business impact from digital investments. We operate at the intersection of business and technology, combining AI, data, and digital engineering to deliver scalable, high-impact solutions.

With over 4,000 professionals across the U.S., Canada, Latin America, and India, Incedo partners with Fortune 500 and high-growth organizations across banking, payments, wealth management, telecom, and life sciences.

Role Overview

We are seeking a Senior Site Reliability Engineer (SRE) to drive reliability, observability, and performance across business-critical distributed systems.

This is a hands-on engineering role with strong ownership, focused on building and scaling observability platforms, improving transaction visibility, and enhancing system resilience. You will work closely with engineering, platform, and infrastructure teams to ensure high availability, performance, and operational excellence across microservices, APIs, and cloud-native systems.

The ideal candidate combines deep technical expertise in SRE practices with a passion for automation, monitoring, and continuous improvement.

Key Responsibilities

Observability & Monitoring

  • Design, implement, and maintain observability solutions across distributed systems
  • Build and optimize logging, metrics, and tracing pipelines using tools like Dynatrace, Datadog, Splunk, ELK, Grafana, and OpenTelemetry
  • Enable end-to-end transaction tracing across microservices and APIs
  • Develop dashboards and alerting strategies for proactive issue detection

Reliability & Incident Management

  • Own service reliability, uptime, and operational performance for critical systems
  • Lead incident response, root cause analysis (RCA), and postmortems
  • Reduce MTTD and MTTR through automation and improved observability
  • Create and maintain runbooks and incident response playbooks

Performance Engineering

  • Monitor and optimize system performance (latency, throughput, error rates)
  • Partner with application and database teams to troubleshoot bottlenecks
  • Use distributed tracing and telemetry data to identify and resolve issues
  • Implement performance testing and tuning strategies

Resiliency & Automation

  • Build and maintain fault-tolerant, highly available systems
  • Implement resiliency patterns (failover, retries, circuit breakers, self-healing)
  • Drive chaos engineering practices to validate system reliability
  • Automate operational tasks using scripting (Python, Go, etc.)

SRE Best Practices & Governance

  • Define and enforce SLOs, SLIs, and error budgets aligned to business goals
  • Promote SRE principles across engineering teams
  • Partner with DevOps and platform teams to improve CI/CD reliability
  • Contribute to building a culture of operational excellence and accountability

Required Qualifications

  • 7–10+ years of experience in Site Reliability Engineering or Production Support Engineering
  • Strong hands-on experience with observability tools (Dynatrace, Datadog, Splunk, ELK, Grafana, OpenTelemetry, Jaeger)
  • Experience supporting cloud-native environments (AWS, Azure, or GCP)
  • Deep understanding of microservices architecture and distributed systems
  • Proficiency in scripting/programming (Python, Go, Java, or similar)
  • Experience with monitoring, alerting, and incident management in production environments

Preferred Qualifications

  • Experience implementing OpenTelemetry at scale
  • Background in chaos engineering and resiliency testing
  • Familiarity with AIOps or intelligent monitoring platforms
  • Experience in financial services, banking, or wealth management environments
  • Dynatrace certification (Associate or Professional)

What Success Looks Like

  • Measurable reduction in MTTD and MTTR
  • Increased proactive detection of issues through monitoring
  • Improved system uptime, performance, and reliability
  • Strong adoption of SRE best practices across engineering teams

Why Join Us

  • Work on high-impact, mission-critical systems
  • Drive modern SRE and observability practices at scale
  • Collaborate with top-tier engineering and architecture teams
  • Opportunity to influence reliability strategy across the organization

Posted 2026-04-16
