Observability Manager

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm, with offices in London and Dallas.

We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub where we work on the latest technologies in a cutting-edge environment.

The role

The Observability Platform team manages the doors – both entry and exit – to the telemetry services that are managed by the Platform Reliability and Observability Team. We ensure that engineers can effectively produce and consume telemetry for their services. This involves working with the Observability Engineering team to build robust pipelines to ingest and route data in predictable, composable ways as well as visualising that data after the fact to drive insight and action.

Under the umbrella of the Platform Engineering department, our group also has responsibility to mature the reliability of our full HPC stack, from networks and storage up to the compute and application platform layers.

We are seeking a Manager with deep expertise in observability stacks. You will understand the unique problems that come with moving cloud-level volumes of telemetry data, and be excited at the prospect of giving our customers visibility into the same underlying telemetry data so they can run their services as efficiently as possible.

Knowledge of and experience running observability platforms at scale, serving a wide variety of customers with varying degrees of access, is a strong requirement. Knowledge of core SRE principles is highly beneficial.

Key responsibilities of the role include:

Helping to lead the development of our observability and reliability engineering strategy
Defining and driving the roadmap for observability tooling, ensuring alignment with business goals and scalability requirements
Working with telemetry data at enormous scale, ingesting data from industry-leading GPU clusters
Acting as the lead for all observability efforts related to AWS services, ensuring seamless integration with the observability platform
Collaborating with engineering leadership to establish observability as a core function of the development lifecycle
Working closely with application teams to ensure observability systems are fully integrated and providing the necessary insights
Enabling SRE frameworks, promoting SLAs, SLOs and SLIs, and working closely with platform teams to ensure reliability is constantly improving
Growing, adapting and investing in your team, fostering a culture of continuous learning and improvement, encouraging adoption of new observability tools and techniques

Who are we looking for?

The ideal candidate will have the following skills and experience:

Proven experience leading observability or SRE teams in a cloud-native or hybrid-cloud environment, running platforms in production and at scale
Well versed in reliability engineering concepts, including different types of testing, progressive deployments, error budgets, fault-tolerant design and the role observability plays in each
Hands-on experience with modern observability tools and frameworks such as Prometheus, OpenTelemetry (OTel) and Grafana, and with enterprise SaaS observability platforms such as Datadog and Dynatrace
Expertise in designing, building and scaling observability solutions for distributed systems
Customer focused, with an enthusiasm for providing infrastructure as a service and defaulting to a product lens when evaluating platform scale problems
Excellent communication skills and the ability to collaborate with cross-functional teams
Leadership experience with demonstrated success in mentoring and developing technical talent
Experience with cloud platforms, such as AWS, Azure or Google Cloud
Familiarity with microservices architecture and containerized environments, such as Kubernetes and Docker
Knowledge of infrastructure as code (IaC) and automation tools, such as Terraform and Ansible

Why should you apply?

Market-leading compensation plus annual discretionary bonus
Lunch provided in the office (via GrubHub)
Informal dress code and excellent work/life balance
Excellent paid time off allowance of 25 days
Sick days, military leave, and family and medical leave
Generous 401(k) plan
16 weeks’ fully paid parental leave
Medical and Prescription, Dental, and Vision insurance
Life and Accidental Death & Dismemberment (AD&D) insurance
Employee Assistance and Wellness programs
Generous relocation allowance and support
Great selection of office snacks, and hot and cold drinks
Free on-site gym and car parking

This role is employed through our US affiliate.

Location: Dallas

What our people say

Mia Software Engineer

"What I appreciate most about working in G-Research is the supportive and knowledgeable environment. Everyone is incredibly helpful and patient, which ensures there’s a good balance between being challenged and your workload."

Willy Data Services Manager

"My team and I have access to a wide range of training opportunities, which allowed us to get the entire team AWS certified within a quarter. We’re actively working on the latest AI and machine learning projects to stay ahead of industry standards."

Yoga Software Engineering Manager

"The friendly, collaborative atmosphere here is a breath of fresh air and a perfect fit for me."

Mario FPGA Manager

"While some people might think working in finance may not be too exciting, at G-Research, it is, especially if you see it as a problem to solve. How do we solve this algorithm? How do we get faster? This is why I think people are really excited to work here."

Neil Corporate IT Manager

"My favourite part of working for G-Research is that technology is at the heart of everything we do at the company, driving the business forward and enabling us to stay ahead of the competition."

Ross Cloud Engineering Manager

"My favourite thing about working here is the people. G-Research strives to hire not only the brightest minds, but good people, which in turn creates a brilliant collegiate and social atmosphere at the company."

Gabriel Software Engineer

"The problems we solve are often novel in nature, meaning we get to solve the previously unsolved. I find this to be a great way to stay challenged and engaged!"

Garrett Software Engineer

"The willingness to collaborate between both teams and functions has made the transition into my new role as easy as possible."

Michael Software Engineer

"It’s a privilege to be in a place where my curiosity is nurtured and my learning journey is supported!"

Joshua Platform Engineer

"The best thing about working at G-Research is being around such smart people, it motivates you to always want to grow and learn."

Margot HRIS Manager

"I enjoy how dynamic the work environment at G-Research is. It keeps you busy and continuously creates opportunities to develop yourself and your career, too."
