Core Site Reliability Engineer

Location: Penycae
Discipline: Financial Technology
Job type: Permanent
Contact name: Polly Muirhead
Contact email: polly.muirhead@venturesearch.com
Job ref: 3648
Published: 1 day ago

Core Site Reliability Engineer – Global Trading Infrastructure

The Company
A rapidly scaling technology-driven trading firm is building the next generation of infrastructure for digital asset markets. The team designs and operates high-performance, globally distributed systems focused on reliability, transparency, and efficiency.
They bring a first-principles engineering culture to liquidity provision, trading solutions, and treasury operations, delivering robust systems that operate at scale.

The Role

As a Core Site Reliability Engineer, you will sit at the centre of the firm's infrastructure operations. Your work will directly influence the reliability, performance, and scalability of a multi-region, low-latency trading environment.

You will collaborate closely with Engineering, Trading, and Data teams while reporting into the firm’s Infrastructure leadership. Your mission: maintain, enhance, and scale the systems that keep the trading platform running at peak performance.

Responsibilities

Infrastructure Reliability & Operations

  • Maintain and optimise critical systems including ultra-low-latency networking stacks, cloud environments, data pipelines, and core trading services

  • Continuously monitor and tune network performance across global regions

  • Ensure high reliability, security, and scalability across mission-critical infrastructure

  • Participate in daily operational duties, real-time incident response, and platform stability improvements

Kubernetes & Orchestration

  • Develop and maintain in-house Kubernetes operators tailored to complex trading workloads

  • Manage containerised services and contribute to smooth, reliable deployment workflows

Automation, Tooling & DevOps

  • Enhance CI/CD tooling (FluxCD, GitHub Actions, Python-based automation)

  • Build monitoring and observability solutions using Prometheus, Grafana, and related technologies

  • Advance DevOps best practices, focusing on application-level reliability, automation, and system visibility

Collaboration & Innovation

  • Partner with cross-functional teams to understand requirements and deliver high-impact technical solutions

  • Provide technical guidance that strengthens operational capabilities across the firm

  • Propose improvements to performance, cost efficiency, and system design

  • Communicate complex engineering concepts effectively to non-technical stakeholders

Requirements

Essential Skills & Experience

  • Strong expertise in Kubernetes architecture and cluster operations

  • Solid background in system and network administration, particularly in low-latency or performance-critical contexts

  • Experience with AWS, GCP, and infrastructure-as-code tools (Terraform, Ansible)

  • Ability to write high-quality tooling in Golang, Rust, and/or Python

  • Hands-on experience with monitoring and observability (Prometheus, Grafana, alerting frameworks)

Highly Advantageous

  • Experience with CI/CD systems such as FluxCD and GitHub Actions

  • Familiarity with DevOps practices and cloud-native automation

  • Background in trading, market-making, or high-frequency environments

  • Exposure to ultra-low-latency networking, performance tuning, and throughput optimisation

  • Understanding of FinOps and cost optimisation strategies

Personal Qualities

  • Clear and confident communicator, able to break down technical ideas for varied audiences

  • Strong organisational skills and the ability to prioritise urgent tasks

  • Disciplined and methodical approach to troubleshooting

  • Eagerness to adopt new technologies and continuously grow skills

  • Comfortable operating in a fast-paced, international environment