Senior Site Reliability Engineer, DevOps
Company: Alphatec Holdings, Inc.
Location: Carlsbad
Posted on: October 29, 2025
Job Description:
The Senior Site Reliability Engineer (SRE) will be responsible
for ensuring the availability, performance, scalability, and
operational efficiency of the Informatix cloud platform. This role
is focused on reducing manual operations work (toil), automating
system reliability, and ensuring production-grade observability.
The ideal candidate is a systems-focused engineer who is passionate
about uptime, incident response, and continuous improvement through
engineering solutions.

Essential Duties and Responsibilities
• Serve as a primary contributor to the on-call rotation to maintain 24/7 uptime for production systems.
• Proactively monitor and continuously improve SLAs, SLOs, and SLIs across critical services.
• Develop and maintain robust observability tooling, including logging, metrics, and tracing (e.g., Azure Monitor, OpenTelemetry, Prometheus).
• Conduct postmortems and root cause analysis; implement fixes to prevent repeat incidents.
• Identify and eliminate manual operational toil through scripting and automation.
• Design and maintain automated incident detection and response systems.
• Establish and maintain runbooks, playbooks, and escalation protocols for system support.
• Contribute to chaos testing and failure injection to proactively uncover weaknesses.
• Promote a culture of operational excellence through data-driven reliability practices.
• Proactively communicate status.

Requirements
The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
• 5 years of experience in Site Reliability Engineering, systems engineering, or DevOps roles.
• Expertise in monitoring and observability platforms (e.g., Grafana, Prometheus, ELK, Azure Monitor).
• Solid background in incident response, root cause analysis, and on-call rotations.
• Deep knowledge of Microsoft Azure, including containerized services (AKS), networking, and storage.
• Strong automation and scripting experience (e.g., Python, Bash, PowerShell).
• Familiarity with IaC tools such as Terraform, Bicep, or ARM templates.
• Experience implementing SLIs/SLOs, operational dashboards, and error budgets.
• Comfortable designing for resiliency, failover, and graceful degradation.
• Knowledge of compliance frameworks (e.g., SOC 2, HITRUST, IEC 62304) is a plus.
• Strong written and verbal communication with a focus on transparency and learning.

Education and Experience
• BS/MS in Computer Science, Engineering, or related technical field preferred.
• 5 years in production engineering roles with direct ownership of critical systems.
• Microsoft certifications a plus.
Keywords: Alphatec Holdings, Inc., Carlsbad, Senior Site Reliability Engineer, DevOps, IT / Software / Systems, Carlsbad, California