Full-time, IT & Engineering, Permanent

Site Reliability Engineer


Job Responsibilities:

  • Collaborate with Software Engineers to improve reliability and performance in their subsystems.
  • Partner with System Administrators in automating toil and eliminating alerts.
  • Evolve observability and monitoring capabilities to identify and solve problems before they impact the business.
  • Support development environments to help us achieve our delivery and quality goals.
  • Research and evaluate technologies, tools, and services to influence buy-vs-build decisions.
  • Develop expertise in diverse technical and business domains.
  • Expand your knowledge of the technical stacks we use.

Requirements:

  • Experience using modern configuration management tools (such as Ansible, Chef or similar).
  • Experience working with Terraform.
  • Experience working with Docker containers and container orchestration tools (such as Kubernetes, OpenShift, or Docker Swarm).
  • Experience both using and maintaining CI/CD tools (such as Jenkins or similar).
  • Experience with monitoring tools such as InfluxDB, Prometheus or Grafana.
  • Experience with event-driven integration using message queues (RabbitMQ or a similar AMQP solution).
  • Good understanding of relational databases and SQL.
  • Proficiency with the Linux command line, system administration, and shell scripting.
  • Working knowledge of network security protocols.
  • Experience using, developing, and maintaining cloud hosting services (ideally AWS EC2, RDS, S3, Lambda).
  • Industry experience writing well-tested code in one of our platform languages (Java, Go, Python, or similar).
  • Knowledge of cross-domain principles and technologies.
  • Experience working in a service management environment.
  • Practical experience applying observability patterns in previous systems.
  • Experience creating and monitoring system availability metrics and using them to drive work that reduces downtime.