Core Site Reliability Engineer – Global Trading Infrastructure
The Company
A rapidly scaling technology-driven trading firm is building the next generation of infrastructure for digital asset markets. The team designs and operates high-performance, globally distributed systems focused on reliability, transparency, and efficiency.
They apply a first-principles engineering culture across liquidity provision, trading solutions, and treasury operations, delivering robust systems that operate at scale.
The Role
As a Core Site Reliability Engineer, you will sit at the centre of the firm's infrastructure operations. Your work will directly influence the reliability, performance, and scalability of a multi-region, low-latency trading environment.
You will collaborate closely with Engineering, Trading, and Data teams while reporting to the firm's Infrastructure leadership. Your mission: maintain, enhance, and scale the systems that keep the trading platform running at peak performance.
Responsibilities
Infrastructure Reliability & Operations
- Maintain and optimise critical systems including ultra-low-latency networking stacks, cloud environments, data pipelines, and core trading services
- Continuously monitor and tune network performance across global regions
- Ensure high reliability, security, and scalability across mission-critical infrastructure
- Participate in daily operational duties, real-time incident response, and platform stability improvements
Kubernetes & Orchestration
- Develop and maintain in-house Kubernetes operators tailored to complex trading workloads
- Manage containerised services and contribute to smooth, reliable deployment workflows
Automation, Tooling & DevOps
- Enhance CI/CD tooling (FluxCD, GitHub Actions, Python-based automation)
- Build monitoring and observability solutions using Prometheus, Grafana, and related technologies
- Advance DevOps best practices, focusing on application-level reliability, automation, and system visibility
Collaboration & Innovation
- Partner with cross-functional teams to understand requirements and deliver high-impact technical solutions
- Provide technical guidance that strengthens operational capabilities across the firm
- Propose improvements to performance, cost efficiency, and system design
- Communicate complex engineering concepts effectively to non-technical stakeholders
Requirements
Essential Skills & Experience
- Strong expertise in Kubernetes architecture and cluster operations
- Solid background in system and network administration, particularly in low-latency or performance-critical contexts
- Experience with AWS, GCP, and infrastructure-as-code tools (Terraform, Ansible)
- Ability to write high-quality tooling in Golang, Rust, and/or Python
- Hands-on experience with monitoring and observability (Prometheus, Grafana, alerting frameworks)
Highly Advantageous
- Experience with CI/CD systems such as FluxCD and GitHub Actions
- Familiarity with DevOps practices and cloud-native automation
- Background in trading, market-making, or high-frequency environments
- Exposure to ultra-low-latency networking, performance tuning, and throughput optimisation
- Understanding of FinOps and cost optimisation strategies
Personal Qualities
- Clear and confident communicator, able to break down technical ideas for varied audiences
- Strong organisational skills and the ability to prioritise urgent tasks
- Disciplined and methodical approach to troubleshooting
- Eagerness to adopt new technologies and continuously develop new skills
- Comfortable operating in a fast-paced, international environment