Wrapping up the G-Research Crypto Forecasting competition
Running from November 2021 to May 2022, the G-Research Crypto Forecasting Competition was our first foray into sponsoring a Kaggle competition.
As a business that, at its core, develops trading strategies from data and modelling, there is a natural overlap between our day-to-day work and the world of Kaggle. Indeed, we have our own internal Kaggle Club and count a Kaggle Grandmaster among our number.
We were therefore delighted to launch the Crypto Forecasting Competition in November 2021 and to see it conclude in May 2022.
Our main goal was to run a competition that replicated some of the interesting challenges we face in our work at G-Research, giving contestants a sense of the problems we encounter and overcome. The most obvious route would have been a straightforward stock market prediction – but there has been no shortage of such competitions on Kaggle and we wanted to provide the community with something a little different.
Predicting cryptocurrency prices is notoriously difficult. The markets can be wildly volatile – Luna’s collapse being the most egregious recent example – and prices fluctuate incredibly quickly. Such a low-signal, high-noise world, where finding alpha amongst millions of rows of data is exceptionally tough, is very close to the environment in which our Researchers seek out meaningful patterns in their day-to-day work.
With Kaggle having never hosted a crypto-focused competition before, we felt it was the perfect way to introduce ourselves to the Kaggle community.
We asked competitors to forecast the short-term returns of 14 different crypto coins. Participants had three months to develop their models using historical crypto data, and their predictions were evaluated on a weighted Pearson correlation coefficient (for those interested, more details can be found in the ‘Prediction Details and Evaluation’ section of this tutorial notebook).
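For readers who want a feel for the metric, here is a minimal sketch of a weighted Pearson correlation in plain Python. It is an illustration under stated assumptions, not the competition's official scoring code: the function name `weighted_pearson`, the choice of one weight per observation, and the sample numbers are all hypothetical, and the tutorial notebook remains the reference for how scores were actually computed.

```python
import math


def weighted_pearson(predictions, targets, weights):
    """Weighted Pearson correlation between two equal-length sequences.

    Illustrative sketch only: each observation i contributes in proportion
    to weights[i]. With equal weights this reduces to the ordinary Pearson
    correlation coefficient.
    """
    if not (len(predictions) == len(targets) == len(weights)):
        raise ValueError("predictions, targets and weights must have equal length")

    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted means of each series.
    mean_p = sum(w * p for w, p in zip(weights, predictions)) / total_weight
    mean_t = sum(w * t for w, t in zip(weights, targets)) / total_weight

    # Weighted covariance and variances around those means.
    cov = sum(
        w * (p - mean_p) * (t - mean_t)
        for w, p, t in zip(weights, predictions, targets)
    ) / total_weight
    var_p = sum(w * (p - mean_p) ** 2 for w, p in zip(weights, predictions)) / total_weight
    var_t = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, targets)) / total_weight

    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined when a series has zero variance")

    return cov / math.sqrt(var_p * var_t)


# Hypothetical predicted and realised returns, with made-up weights.
example_predictions = [0.010, -0.020, 0.030, 0.000]
example_targets = [0.020, -0.010, 0.010, -0.010]
example_weights = [4.0, 2.0, 1.0, 1.0]
example_score = weighted_pearson(example_predictions, example_targets, example_weights)
```

A score near 1 means the predictions move closely with the realised returns, a score near 0 means they carry almost no linear signal, and a negative score means they tend to point the wrong way. Because correlation ignores scale, a model is rewarded for getting the direction and relative size of moves right rather than their absolute magnitude.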
After the training phase, each team selected their two best models for submission and we ran them for three months against real live market data to get our final leaderboard. At the end of the evaluation phase, the top ten teams shared a prize pool of $125,000.
With such a noisy dataset, it was unsurprising that the leaderboard saw many big jumps and precipitous drops, particularly early in the competition. Nevertheless, by the end we had clear winners: Jose and Eduardo of the team Meme Lord Capital took the $50,000 first prize. Nathaniel Maddux took second spot, with GABA in third.
One of the great privileges of hosting the competition has been sitting down with the winners to hear how they tackled the challenge and came up with their winning submissions.
Interestingly, all of our top three used a LightGBM model, though some experimented with other approaches, including neural networks.
However, all of our top competitors believed it was their feature engineering that contributed most to their wins. This was where the winners spent the most time, and developing and testing different features had a far greater impact on their final scores than model development.
To the moon
Ultimately, we have found the quality of entrants and the level of discourse on the discussion board throughout the competition to be of a very impressive standard. We hope that, in taking part, competitors have gained some real insight into the day-to-day work our Researchers undertake and perhaps developed an interest in high-noise, low-signal prediction.
And if you participated – or even just followed from the sidelines – and are interested in a career in quantitative finance that could turn Kaggling into your day job: we are hiring and would love to hear from you.