
HPC Validation and Performance Engineer

  • Infrastructure Engineering
  • Dallas

Do you want to tackle the biggest questions in finance with near-infinite compute power at your fingertips?

G-Research is a leading quantitative research and technology firm, with offices in London and Dallas.

We are proud to employ some of the best people in their field and to nurture their talent in a dynamic, flexible and highly stimulating culture where world-beating ideas are cultivated and rewarded.

This is a hybrid role based in our new Dallas infrastructure hub, where we work on the latest technologies in a cutting-edge environment.

The role

As an HPC Validation and Performance Engineer at G-Research, you will take ownership of the validation and optimization of our HPC CPU and GPU calc farms.

This critical role will involve developing a validation and performance baselining framework that ensures system readiness for AI/ML and HPC workloads across multiple architectures. Your role will be essential in providing continuous performance benchmarking, real-time observability and long-term strategic readiness.

You will drive the implementation of advanced tooling and frameworks, maintaining an infrastructure that is crucial to our cutting-edge research efforts. You will be accountable for providing data-driven performance metrics to support architectural design choices as we continue to scale our datacentre footprint globally.

We are looking for someone with deep technical expertise in compute, storage or networking optimizations and performance engineering who can develop solutions that scale with our growing infrastructure.

This role demands a forward-thinking engineer who can anticipate industry trends and adopt emerging architectures and strategies to keep G-Research at the forefront of innovation.

Key responsibilities of the role include:

  • Architecting and implementing a validation framework to certify the readiness of GPU nodes across a large, distributed environment
  • Defining methodologies to continually assess performance and optimise infrastructure for AI/ML workloads
  • Developing and executing comprehensive performance tests using industry-standard benchmarks, ensuring optimal performance across HPC compute, storage and networking
  • Leading efforts to identify and resolve system performance bottlenecks
  • Building robust, scalable tools for automated validation and testing, utilising Python, Go, Kubernetes and CI/CD pipelines to streamline continuous validation and benchmarking processes
  • Implementing monitoring solutions using Prometheus, Grafana and other modern monitoring technologies to track performance metrics and real-time health of the cluster
  • Defining and implementing best practice for continuous performance validation, ensuring that the infrastructure remains reliable and efficient as new technologies emerge
  • Staying informed on industry trends and advancements to ensure long-term strategic alignment
  • Working cross-functionally with engineering, infrastructure and research teams to align validation efforts with the broader business objectives, ensuring that the platform meets evolving research demands

Who are we looking for?

The ideal candidate will have the following skills and experience:

  • Accelerator performance experience, including profiling and tuning on large-scale GPU clusters
  • In-depth understanding of NVIDIA ClusterKit, Nsight, the NVIDIA Validation Suite and DCGM tools for GPUs and DPUs, as well as MLPerf benchmarks
  • Networking and storage performance experience, including profiling and optimisation with NVIDIA ClusterKit, iPerf or equivalent tools across InfiniBand/RoCE network implementations
  • System benchmarking experience on Linux and familiarity with the Phoronix Test Suite or equivalent
  • Experience with HPC workloads across distributed global locations, providing data-driven performance insights to complement key architectural decisions
  • Strong proficiency in developing automation tools and micro-benchmarking frameworks for validation using Python, Go and Kubernetes in an Ubuntu Linux environment
  • Expertise with key monitoring platforms, including OpenTelemetry (OTel), Prometheus, ELK and Grafana, and in defining and implementing the overall observability strategy for HPC validation and performance monitoring
  • A deep understanding of emerging technologies, architectures and strategies, with the ability to assess their potential impact on infrastructure and adopt them as part of a long-term plan
  • Proven ability to lead complex technical projects, influence decisions and engage with stakeholders across technical and research teams

Why should you apply?

  • Market-leading compensation plus annual discretionary bonus
  • Lunch provided in the office (via GrubHub)
  • Informal dress code and excellent work/life balance
  • Excellent paid time off allowance of 25 days
  • Sick days, military leave, and family and medical leave
  • Generous 401(k) plan
  • 16 weeks’ fully paid parental leave
  • Medical and Prescription, Dental, and Vision insurance
  • Life and Accidental Death & Dismemberment (AD&D) insurance
  • Employee Assistance and Wellness programs
  • Generous relocation allowance and support
  • Great selection of office snacks, and hot and cold drinks
  • On-site gym and car parking

This role is employed through our US affiliate.

Location: Dallas

Willy, Data Services Manager

"My team and I have access to a wide range of training opportunities, which allowed us to get the entire team AWS certified within a quarter. We’re actively working on the latest AI and machine learning projects to stay ahead of industry standards."


What our people say

Yoga, Software Engineering Manager

"The friendly, collaborative atmosphere here is a breath of fresh air and a perfect fit for me."

Mario, FPGA Manager

"While some people might think working in finance may not be too exciting, at G-Research, it is, especially if you see it as a problem to solve. How do we solve this algorithm? How do we get faster? This is why I think people are really excited to work here."

Mia, Software Engineer

"What I appreciate most about working in G-Research is the supportive and knowledgeable environment. Everyone is incredibly helpful and patient, which ensures there’s a good balance between being challenged and your workload."

Neil, Corporate IT Manager

"My favourite part of working for G-Research is that technology is at the heart of everything we do at the company, driving the business forward and enabling us to stay ahead of the competition."

Ross, Cloud Engineering Manager

"My favourite thing about working here is the people. G-Research strives to hire not only the brightest minds, but good people, which in turn creates a brilliant collegiate and social atmosphere at the company."

Gabriel, Software Engineer

"The problems we solve are often novel in nature, meaning we get to solve the previously unsolved. I find this to be a great way to stay challenged and engaged!"

Garrett, Software Engineer

"The willingness to collaborate between both teams and functions has made the transition into my new role as easy as possible."

Michael, Software Engineer

"It’s a privilege to be in a place where my curiosity is nurtured and my learning journey is supported!"

Joshua, Platform Engineer

"The best thing about working at G-Research is being around such smart people, it motivates you to always want to grow and learn."

Margot, HRIS Manager

"I enjoy how dynamic the work environment at G-Research is. It keeps you busy and continuously creates opportunities to develop yourself and your career, too."


Interview process

Online Application

Our assessment process kicks off with our Talent Acquisition team, who will review your application and assess your fit for the role.

Stage One: Technical Interview

You will meet with a team member, or take a remote test, so that we can assess your technical abilities.

Stage Two: Behavioural Interview

We will set aside technical skills and focus on you.

Stage Three: Further Technical Interviews

Here, we will take a deeper dive into your technical skills and competencies.

Stage Four: Management Interviews

The final stage of our interview process is where you will meet members of your team, your future manager, and functional leadership.
