Site Reliability Engineer

Posted 2 weeks ago by Whitehall Resources Ltd on JobServe

Job Description: The Site Reliability Engineer will work on-site in London for three days a week, focusing on managing cloud infrastructure and ensuring the reliability of production systems. This role involves operational responsibilities, automation, and collaboration across teams to build scalable systems. The position requires participation in on-call rotations and proactive monitoring of systems. The contract is for an initial duration of six months and is classified as inside IR35.

Key Responsibilities:

Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security.
Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures.
Handle change requests, track recurring issues, and work on long-term fixes to improve system stability.
Implement and maintain observability solutions using Prometheus, Grafana, and Splunk.
Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics.
Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies.
Develop and maintain automation scripts in Python, Bash, Go, or SQL for routine infrastructure tasks.
Utilize Git-based workflows for infrastructure changes, version control, and automated deployments.
Operate, troubleshoot, and optimize Kubernetes clusters and containerized workloads.
Participate in a rotating on-call schedule to ensure 24/7 availability of production systems.

Skills Required:

Working knowledge and prior hands-on experience using AWS services at the DevOps Engineer level.
Incident, change & problem management experience.
Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana, and Splunk, including usage of PromQL.
Proficient in one or more of Python, Go, Bash, or SQL.
Familiar with GitHub, GitOps, container orchestration, and Kubernetes operations.
Working experience of configuration and deployment management with CI/CD.
Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation (desirable).
Strong knowledge of Splunk for log analysis and troubleshooting (desirable).
Strong problem-solving skills and analytical thinking (desirable).

Salary (Rate): undetermined

City: City of London

Country: UK

Working Arrangements: on-site

IR35 Status: inside IR35

Seniority Level: undetermined

Industry: IT

Detailed Description From Employer:

Site Reliability Engineer

Whitehall Resources require a Site Reliability Engineer to work with a key client on an initial 6-month contract.

*This role will involve on-site work in London 3 days per week.

*Inside IR35.

*This role will require some on-call work.

The Role
As a Site Reliability/DevOps Engineer, you will play a critical role in managing cloud infrastructure, ensuring the reliability of production systems, and improving end-to-end deployment pipelines. This role combines deep operational responsibilities with a strong focus on automation, observability, and continuous improvement. You will be responsible for maintaining high system availability, enabling rapid delivery through CI/CD, and supporting development teams with robust infrastructure and tooling. A key part of the role includes proactive monitoring using Prometheus, Grafana, and Splunk, as well as participating in on-call rotations to respond to live incidents. Collaboration across engineering, security, and product teams is essential to build scalable and resilient systems.

Your responsibilities:
1. Deploy, configure, and monitor AWS services ensuring high availability, scalability, and security.
2. Respond to and resolve infrastructure and service incidents with root cause analysis and preventive measures.
3. Handle change requests, track recurring issues, and work on long-term fixes to improve system stability.
4. Implement and maintain observability solutions using Prometheus, Grafana, and Splunk.
5. Write PromQL queries for custom monitoring dashboards, alerting, and diagnostics.
6. Manage and optimize CI/CD pipelines for automated testing, deployment, and rollback strategies.
7. Develop and maintain automation scripts in Python, Bash, Go, or SQL for routine infrastructure tasks.
8. Utilize Git-based workflows for infrastructure changes, version control, and automated deployments.
9. Operate, troubleshoot, and optimize Kubernetes clusters and containerized workloads.
10. Participate in a rotating on-call schedule to ensure 24/7 availability of production systems.

Your Profile

Essential skills/knowledge/experience:
1. Working knowledge and prior hands-on experience using AWS services at the DevOps Engineer level.
2. Incident, change & problem management experience. This role is heavily operations-oriented and includes on-call requirements.
3. Strong background in setup & operation of enterprise observability tooling, specifically Prometheus, Grafana, and Splunk, including usage of PromQL.
4. Proficient in one or more of Python, Go, Bash, or SQL.
5. Familiar with GitHub, GitOps, container orchestration, and Kubernetes operations.
6. Working experience of configuration and deployment management with CI/CD.

Desirable skills/knowledge/experience:
1. Hands-on experience with Terraform or CloudFormation for infrastructure provisioning and automation.
2. Strong knowledge of Splunk for log analysis and troubleshooting.
3. Strong problem-solving skills and analytical thinking.

All of our opportunities require that applicants are eligible to work in the specified country/location, unless otherwise stated in the job description.

Whitehall Resources are an equal opportunities employer who value a diverse and inclusive working environment. All qualified applicants will receive consideration for employment without regard to race, religion, gender identity or expression, sexual orientation, national origin, pregnancy, disability, age, veteran status, or other characteristics.

Rate: Negotiable

Location: City of London, UK

IR35 Status: Inside

Remote Status: Onsite

Seniority Level: Not Specified
