By Alexander Scammon, Head of Open Source Development
We all know the image of the lone coder: someone who sits alone, lit only by the glow of a monitor, cut off from human interaction. And who among us hasn’t heard the joke about the extroverted coder at the party who stares at your shoes?*
We have long known these stereotypes to be simple shorthand for a mix of traits and proclivities that may or may not have anything to do with what we actually do.
Such images gain traction because they contain a grain of truth, and that truth was heightened during the pandemic, when the lone coder became a physical reality for all of us. Yet that same isolation thrust the intrinsically collaborative nature of the open source ecosystem into the spotlight.
Here at G-Research, our open source work reflects our ability to collaborate effectively. We champion four key strategies for problem solving, which enabled us to continue working collaboratively during the pandemic and obliterate the myth of the lone coder.
Direct contribution
At G-Research, our core team has directly contributed code to several open source projects that are important to us, because nothing proves that you are engaged with a cause like rolling up your sleeves and fully immersing yourself in it.
For example, we have made major contributions to the Parquet implementation within Apache Arrow. Parquet helps data scientists speed up their analyses by reducing the number of disk seeks and by improving compression and scan rates. It stores data in a columnar file format, which makes queries that touch only a few columns faster and more efficient, and, together with Arrow's in-memory columnar format, it powers a wider range of potential next-generation applications.
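To make the columnar idea concrete, here is a toy sketch in plain Python. It is not Parquet's actual encoding or API; the trade records and the helpers `to_columns` and `run_length_encode` are hypothetical. It shows why storing each column contiguously helps: a scan over one field reads only that field, and values within a column tend to repeat, which makes them easy to compress with techniques such as run-length encoding (Parquet's real encodings are considerably more elaborate).

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical trade records, stored row by row.
ROWS: List[Dict[str, Any]] = [
    {"symbol": "ABC", "price": 10.0, "volume": 100},
    {"symbol": "ABC", "price": 10.5, "volume": 200},
    {"symbol": "ABC", "price": 11.0, "volume": 150},
    {"symbol": "XYZ", "price": 20.0, "volume": 300},
    {"symbol": "XYZ", "price": 19.5, "volume": 250},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(ROWS)

# A scan that only needs "volume" touches a single contiguous list,
# instead of stepping through every field of every row.
total_volume = sum(columns["volume"])

# Values within one column tend to repeat, so they compress well.
encoded_symbols = run_length_encode(columns["symbol"])

print(total_volume)     # 1000
print(encoded_symbols)  # [('ABC', 3), ('XYZ', 2)]
```

The row-wise layout would force the volume scan to walk past every symbol and price as well; the column-wise layout lets it skip them entirely, which is the same intuition behind fewer disk seeks and faster scans.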
We like data, and finding better ways to process and display it is a passion of ours. In several open source projects we use or contribute to, we often relied on another Apache tool to analyse and process this data: Apache Storm.
However, running Storm on Kubernetes wasn’t straightforward, and the existing open source Helm charts for installing Storm were either broken or missing essential features we needed. So, starting from some of those charts, we improved on them and published our own Storm Helm chart, which we believe solves some essential problems that other users will appreciate.
Approaching our passion for data from another angle, we did a fair amount of work on tcollector, a tool that gathers data, processes it and pushes it to the open source time-series database OpenTSDB.
We understand the need for better ways of dealing with data. Modern applications create and process an immense amount of it, so tools that help make sense of it are essential. tcollector takes care of many things that were once difficult: running multiple data collectors, automatically de-duplicating values and handling all of the wire protocol work.
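As a rough illustration of the work tcollector takes off a collector author's hands, here is a simplified Python sketch. It assumes the line format tcollector collectors print to stdout (`metric timestamp value tagk=tagv ...`) and OpenTSDB's telnet-style `put` command. The names `format_line`, `to_put_command` and `Deduplicator`, and the sample metric, are hypothetical, and the de-duplication shown is a simplification rather than tcollector's actual logic.

```python
from typing import Dict, List, Tuple

# A series is identified by its metric name plus its sorted tag pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def format_line(metric: str, timestamp: int, value: float, tags: Dict[str, str]) -> str:
    """Render one data point as 'metric timestamp value tagk=tagv ...'."""
    tag_text = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {timestamp} {value} {tag_text}"


def to_put_command(line: str) -> str:
    """Wrap a collector line in OpenTSDB's telnet-style 'put' command."""
    return f"put {line}\n"


class Deduplicator:
    """Simplified de-duplication: drop a point if its series' last sent value is identical."""

    def __init__(self) -> None:
        self._last_sent: Dict[SeriesKey, float] = {}

    def should_send(self, metric: str, value: float, tags: Dict[str, str]) -> bool:
        key: SeriesKey = (metric, tuple(sorted(tags.items())))
        if self._last_sent.get(key) == value:
            return False  # unchanged since the last point we forwarded
        self._last_sent[key] = value
        return True


# Hypothetical samples from one collector: the second reading repeats the first.
samples = [
    ("sys.cpu.user", 1700000000, 42.0),
    ("sys.cpu.user", 1700000015, 42.0),
    ("sys.cpu.user", 1700000030, 43.5),
]
tags = {"host": "web01"}

dedup = Deduplicator()
sent: List[str] = []
for metric, timestamp, value in samples:
    if dedup.should_send(metric, value, tags):
        sent.append(to_put_command(format_line(metric, timestamp, value, tags)))

for command in sent:
    print(command, end="")
```

Keying the cache on the metric name plus its sorted tags means that two hosts reporting the same metric are de-duplicated independently, so one host's unchanged reading never suppresses another's.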
We’re proud of our contributions, but recognise that we can’t make any piece of quality software without the help, support and collaboration of the wider developer community.
Create something of your own
Our work has focused primarily on making the data we work with easier to see, manipulate and reason about. Our philosophy has been that by building better tools for seeing what’s going on inside systems, we give users the ability to solve their problems themselves or to prevent problems from arising in the first place.
To that end, our researchers have contributed time and code to several major projects, each of which drives better transparency for complex systems.
Thanos is about making the metrics you track more manageable, so that you can more easily store, view and query them. Built on top of Prometheus, it’s a metrics manager: a kind of corral for all the things you want to know about.
It follows the Unix philosophy: each sub-command does one thing and does it well. The components are designed to work together, yet each one can run as its own separate entity, which makes troubleshooting much easier.
TensorBoard helps with the visualisation of data from neural networks, an increasingly important part of modern data science. It lets researchers see what’s going on inside the models they’re trying to debug or optimise. TensorBoard makes the often-opaque analysis of large modern datasets visible, but it has historically been quite difficult to write plugins for. Our team has contributed several useful extensions to the project that make it possible for anyone to write plugins.
Engage the community where it’s at
If you haven’t been paying attention to the open source community recently, you may have missed some of the progress being made in the systems and software that now undergird an ever-larger portion of the modern technology stack.
One of the biggest advances in our ability to engage has been a slew of new approaches to open source funding, such as Open Collective and GitHub Sponsors. One new approach we’re particularly excited about is Tidelift, a startup that helps companies distribute monetary contributions according to their actual dependency trees. This lets companies that benefit from open source software fund its development directly, by supporting the developers who maintain and improve it.
We’re particularly interested in F#, the language Microsoft first released in 2005, and its related ecosystem of tools, which G-Research adopted soon after.
Rather than going through an intermediary, we reached out directly to people in the community to support their work. This is ultimately the most genuine way to engage the community where it’s at. As a result, we’re fortunate to have found a cadre of capable contributors to develop this ecosystem, which we think makes sense both for us and for the wider community of developers using F#.
Work closely with someone else
Nobody can do it all by themselves, and we’re humble enough to know that sometimes the best work happens outside the walls of our building. That’s one of the reasons we chose to help seed the good work at Ursa Labs, now part of Voltron Data.
Our involvement with the Ursa Labs and Voltron Data team works in both directions. We provide guidance on how we use their tools and how we’d like to use them in the future, and they help us with related packages that we maintain, such as ParquetSharp. We offer suggestions based on our real-world experience with financial data analytics, and they provide technical support based on their deep understanding of the tools they build. It’s a very effective partnership.
How do our strategies help?
If our contributions above show anything, it’s that collaboration is key. Many of our most successful contributions to open source have been a byproduct of our collaborative approach:
- Working on what we’re passionate about led to the creation of all sorts of data-related open-source projects from G-Research
- Our focus on problem prevention led to our contributions to Thanos and our TensorBoard plugins
- Using our expertise to support the work of teams outside our own such as Voltron Data means there’s a great bi-directional flow of ideas that helps us continue to expand our knowledge base and capabilities
- We’re able to support the open source communities that support our engineering through new funding models such as Tidelift, Open Collective and GitHub Sponsors
Does this mean we’re single-handedly reducing the concept of a lone coder to a mere myth?
We’re not quite there yet, but we’re working on it.
*How do you tell an introverted computer scientist from an extroverted computer scientist? An extroverted computer scientist looks at your shoes when he talks to you.
See more about our projects and contributions on our website.