Open-source at G-Research: a pragmatic approach
When G-Research was founded in 2001, tools for large-scale data science were few and far between, and open-source tools for the job were rarer still. At the time, it made perfect sense for G-Research to fill the gap with the available third-party technologies and in-house development.
However, as “big data” became commonplace and the tools to manage and analyse that data became more open and widespread, G-Research adjusted and adopted many of the same basic open-source tools that other data science firms use today.
This change of approach is largely a pragmatic choice: the cost of running open-source software is ultimately lower than the cost of running third-party tools. Yes, there is operational overhead when we take on an open-source project: we need to install, update, operate and contribute to the software. But we believe that these costs are, in the long run, far lower than those of the third-party alternatives.
We also believe that there is no amount of money we could pay a company, nor any number of in-house engineers we could hire, that would let us keep up with the current pace of development in machine learning and data science tools. Rather than fight the tide of development in this space, we want to join forces and swim with the current.
There are additional benefits to using open-source software as well. In many situations, we believe that using technologies that have many more eyes on the code will yield more secure and stable software. Likewise, using open technologies means that prospective hires are more likely to be familiar with the tools we use – and more willing to join, because their skills will be transferable.
All of these points together have led us to the conclusion that embracing open-source technologies is the right thing to do. Since we like to live on the cutting edge of development, it wouldn’t work to sit back and be passive consumers of open-source software. Instead, we at G-Research have decided to embrace open source wholeheartedly, by contributing time and resources directly back to the communities that foster the tools we rely upon. Again, this plays to our sense of pragmatism: there’s no sense in us fixing an open-source tool and not giving those fixes back to the community – it only leads to forked code that becomes more of a liability as time goes on. The desire to give back isn’t driven by some lofty, idealistic goal, but by reasoned pragmatism.
As part of our new approach to open source, we are writing a series of blog posts detailing our open-source journey. Some posts are by community members with whom we work directly; some are by engineers from within the company; others are by members of our team dedicated to improving open-source technologies. Catch up with our open-source articles here:
- Four strategies for engaging the open source universe
- The magic of maintaining Sparkmagic
- Overcoming the need to be useful
- Transitioning to Open Source Development
- Dispatch from the Fall 2019 RISECamp
- Tensorboard: What is Tensorboard?
We will continue to update this post as we publish new articles. Hopefully, our experiences will help others who are contemplating the same journey.