Core Site Reliability Engineer – Global Trading Infrastructure

The Company
A rapidly scaling technology-driven trading firm is building the next generation of infrastructure for digital asset markets. The team designs and operates high-performance, globally distributed systems focused on reliability, transparency, and efficiency.
The firm applies a first-principles engineering culture across liquidity provision, trading solutions, and treasury operations, delivering robust systems that operate at scale.

The Role

As a Core Site Reliability Engineer, you will sit at the centre of the firm's infrastructure operations. Your work will directly influence the reliability, performance, and scalability of a multi-region, low-latency trading environment.

You will collaborate closely with Engineering, Trading, and Data teams while reporting into the firm’s Infrastructure leadership. Your mission: maintain, enhance, and scale the systems that keep the trading platform running at peak performance.

Responsibilities

Infrastructure Reliability & Operations

Maintain and optimise critical systems including ultra-low-latency networking stacks, cloud environments, data pipelines, and core trading services
Continuously monitor and tune network performance across global regions
Ensure high reliability, security, and scalability across mission-critical infrastructure
Participate in daily operational duties, real-time incident response, and platform stability improvements

Kubernetes & Orchestration

Develop and maintain in-house Kubernetes operators tailored to complex trading workloads
Manage containerised services and contribute to smooth, reliable deployment workflows

Automation, Tooling & DevOps

Enhance CI/CD tooling (FluxCD, GitHub Actions, Python-based automation)
Build monitoring and observability solutions using Prometheus, Grafana, and related technologies
Advance DevOps best practices, focusing on application-level reliability, automation, and system visibility

Collaboration & Innovation

Partner with cross-functional teams to understand requirements and deliver high-impact technical solutions
Provide technical guidance that strengthens operational capabilities across the firm
Propose improvements to performance, cost efficiency, and system design
Communicate complex engineering concepts effectively to non-technical stakeholders

Requirements

Essential Skills & Experience

Strong expertise in Kubernetes architecture and cluster operations
Solid background in system and network administration, particularly in low-latency or performance-critical contexts
Experience with AWS, GCP, and infrastructure-as-code tools (Terraform, Ansible)
Ability to write high-quality tooling in Golang, Rust, and/or Python
Hands-on experience with monitoring and observability (Prometheus, Grafana, alerting frameworks)

Highly Advantageous

Experience with CI/CD systems such as FluxCD and GitHub Actions
Familiarity with DevOps practices and cloud-native automation
Background in trading, market-making, or high-frequency environments
Exposure to ultra-low-latency networking, performance tuning, and throughput optimisation
Understanding of FinOps and cost optimisation strategies

Personal Qualities

Clear and confident communicator, able to break down technical ideas to varied audiences
Strong organisational skills and the ability to prioritise urgent tasks
Disciplined and methodical approach to troubleshooting
Eagerness to adopt new technologies and continuously grow skills
Comfortable operating in a fast-paced, international environment

Location:	Penycae
Discipline:	Financial Technology
Job type:	Permanent
Contact name:	Polly Muirhead
Contact email:	polly.muirhead@venturesearch.com
Job ref:	3648
Published:	1 day ago
