Senior Site Reliability Engineer - Healthcare/Life Sciences
Our Client - Hospital & Health Care company - Sunnyvale, CA
Job description
***Please note that our customer is currently not considering applicants from the following locations: Alabama, Arkansas, Delaware, Florida, Indiana, Iowa, Louisiana, Maryland, Mississippi, Missouri, Oklahoma, Pennsylvania, South Carolina, and Tennessee.***
Our Customer is a corporation that develops, manufactures, and markets robotic products designed to improve patients' clinical outcomes through minimally invasive surgery. Founded in 1995, the company set out to create innovative, robotic-assisted systems that help doctors and hospitals make surgery less invasive than an open approach. Working with top medical professionals, they continue to develop new minimally invasive surgical platforms and future diagnostic tools to help solve complex healthcare challenges around the world.
We are seeking a Senior Site Reliability Engineer - Healthcare/Life Sciences on a contract basis to support our Customer's business needs. This role is on-site in Sunnyvale, CA.
The Senior Site Reliability Engineer serves as a technical leader responsible for architecting, securing, and sustaining production infrastructure for regulated digital health and medical software platforms. This role ensures the reliability, scalability, and compliance of critical systems in alignment with FDA GxP guidelines and HITRUST standards.
You will drive initiatives across incident response, deployment automation, observability, and capacity planning, leveraging modern DevOps/SRE methodologies, cloud-native tools, and cross-functional collaboration to support safe, effective, and compliant patient-focused solutions.
Responsibilities:
- Support the design, implementation, and maintenance of CI/CD pipelines with fully auditable deployment processes.
- Promote infrastructure-as-code practices using Terraform, Helm, and Ansible, embedding HITRUST and GxP controls in reusable modules.
- Architect and maintain highly available, scalable, and compliant systems using Kubernetes and major cloud providers (AWS, GCP, Azure).
- Apply SRE principles by defining, measuring, and improving SLIs/SLOs/SLAs in regulated healthcare environments.
- Lead capacity planning, performance tuning, and infrastructure optimization initiatives aligned with regulatory and data privacy requirements.
- Manage the full incident lifecycle (detection → triage → resolution → postmortem), producing documentation required for FDA compliance and audit readiness.
- Develop and maintain incident response playbooks, including IT and regulatory escalation workflows.
- Implement and manage monitoring solutions (Datadog, Prometheus, Grafana, Elasticsearch) to support rapid, compliant issue identification.
- Integrate and manage SIEM tools (Splunk, Datadog Security, Elastic Security) to support threat detection, log aggregation, and regulatory audits (HITRUST, GxP).
- Collaborate with security, QA, and regulatory teams to monitor and respond to production security incidents.
- Ensure logging, auditing, and reporting meet FDA, HITRUST, ISO 27001, and other healthcare regulatory requirements, including data retention and traceability.
- Document infrastructure processes to support internal knowledge transfer and external audits.
- Plan and manage resource utilization to meet performance, efficiency, and regulatory standards.
- Troubleshoot and resolve cloud and network issues while ensuring secure handling of PHI and device data.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- 7+ years of experience in Production Engineering, DevOps, or SRE roles within healthcare, medical device, or life sciences.
- Expertise in containerization (Kubernetes, Docker), cloud platforms (GCP), and infrastructure-as-code.
- Relevant certifications (e.g., HITRUST, AWS Security, DevOps-related certifications) strongly preferred.
- Direct experience working with systems governed by FDA GxP and HITRUST frameworks; familiarity with HIPAA, SOC 2, and ISO 27001.
- Strong scripting and automation skills (Python, Bash, Go).
- Proven experience managing SIEM and monitoring platforms in regulated environments.
- In-depth knowledge of incident response and reliability engineering in healthcare or medical device settings.
- Experience supporting or deploying medical device software under FDA regulations.
- Familiarity with quality management systems, validation procedures, and documentation for audits and FDA submissions.
- Strong communication and leadership skills for cross-functional work within regulated environments.
- Ability to innovate while adhering to strict compliance frameworks.
We offer a competitive salary range for this position. Most candidates who join our team are hired at the median of this range, ensuring fair and equitable compensation based on experience and qualifications.
Contractor benefits are available through our third-party Employer of Record (available upon completion of a waiting period for eligible engagements). Benefits include medical, dental, vision, and 401(k).
An Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.
All applicants applying for U.S. job openings must be legally authorized to work in the United States and are required to have U.S. residency at the time of application.
If you are a person with a disability needing assistance with the application, or at any point in the hiring process, please contact us at support@themomproject.com.