Dgraph is Greener on the Open Source Side
Written by Deborah Ong, Developer Relations Engineer at G-Research Open Source Software
The COVID-19 pandemic forced me to make several changes in my life. I was very fortunate: I avoided any major health problems, and my immediate circle of family and friends also came through relatively unscathed. I did, however, lose a lot of social contact and connection with the people around me.
With many of the traditional avenues closed, the pandemic also gave me an opportunity to make new friends in new ways and to look beyond my usual social circle. If I imagined my social network as a graph database, I would make two observations:
- I have fewer people I think of as my friends now – my database has definitely shrunk
- I feel more connected to the people with whom I’m still friendly
It’s easy to be friends in fair weather. But when the clouds darken and the thunder rumbles, that’s when you really learn who is down to ride with you. This emphasis on quality over quantity in relationships seemed like a perfect analogy for what makes a graph database shine: it treats relationships (edges) as just as important as the records (nodes) they connect.
While the relational model remains a perfectly good fit for many business applications, industry leaders like eBay, Amazon and Facebook have shown the power of the graph model for some of their most valuable data. In banking and insurance, graphs help detect fraud; in entertainment, they power recommendation systems. Graphs are also a natural fit for troubleshooting and visualising networks and IT operations.
Inspired by these use cases, I wanted to explore the shift towards graph databases and decided to kick the tyres on a viable open-source option: Dgraph.
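To make the analogy concrete, here is a minimal, hypothetical sketch in plain Python (not Dgraph's API, and with made-up names): people are nodes, friendships are edges that carry their own properties, and answering "who are my friends of friends?" is a two-hop walk along edges rather than a chain of joins over foreign keys.

```python
from collections import defaultdict

# Hypothetical toy social graph: nodes are people, edges are friendships.
# Each edge carries its own properties, mirroring how a graph database
# treats relationships as first-class data rather than as foreign keys.
friendships = [
    ("Alice", "Bob", {"since": 2015}),
    ("Bob", "Carol", {"since": 2020}),
    ("Carol", "Dave", {"since": 2021}),
    ("Alice", "Erin", {"since": 2019}),
]

# Build an undirected adjacency list: node -> {neighbour: edge properties}.
graph = defaultdict(dict)
for a, b, props in friendships:
    graph[a][b] = props
    graph[b][a] = props

def friends_of_friends(person):
    """Return people exactly two hops away who are not already direct friends."""
    direct = set(graph[person])
    two_hops = {fof for friend in direct for fof in graph[friend]}
    return two_hops - direct - {person}

print(sorted(friends_of_friends("Alice")))
```

Because the edge properties live on the relationship itself, a question like "how long have Bob and Carol been friends?" is a direct lookup (`graph["Bob"]["Carol"]["since"]`) instead of a query against a separate join table.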
A quick taster of Dgraph
Let’s say your relational database (e.g. MySQL or SQL Server) is enterprise-scale, with many foreign keys, and editing or creating new endpoints takes forever. Perhaps you are even considering automation, but you are wary of moving to a graph database because it seems like so much work. That is a completely valid concern, but the good news is that there are ways to address it.
Dgraph integrates with both Docker and Kubernetes, so you can set it up without breaking a sweat. From there, to create a native GraphQL back-end with persistent data, you only need to write a GraphQL schema and click “Deploy” on Dgraph Cloud. You can also run Dgraph’s Ratel UI on your local machine from a standalone Docker image.
To play around with it, try the Dgraph Playground, which comes loaded with a Freebase movie dataset of 21 million edges.
Besides working out whether you need a graph database at all (see the FAQ: Why would I use/not use Dgraph?), you might find yourself torn between various graph databases; a comparison will help you decide whether Dgraph suits your needs. Once you are ready to try Dgraph, take one of the many tours and tutorials to see how it works and build your first Dgraph project.
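On a self-hosted instance, the rough equivalent of clicking “Deploy” is posting the GraphQL schema to Dgraph’s `/admin/schema` HTTP endpoint. The sketch below uses only Python’s standard library and assumes a local Dgraph instance serving HTTP on `localhost:8080` (the usual default); the to-do style schema and the `schema_request` helper are hypothetical, and the actual network call is left commented out so the snippet runs without a server. Check the Dgraph documentation for the version you are running.

```python
import urllib.request

# Assumption: a local Dgraph instance exposes its HTTP API on port 8080.
DGRAPH_URL = "http://localhost:8080"

# Hypothetical GraphQL schema for a small to-do app.
SCHEMA = """
type Task {
  id: ID!
  title: String! @search(by: [fulltext])
  completed: Boolean! @search
}
"""

def schema_request(base_url: str, schema: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the schema to /admin/schema."""
    return urllib.request.Request(
        url=f"{base_url}/admin/schema",
        data=schema.encode("utf-8"),
        method="POST",
    )

req = schema_request(DGRAPH_URL, SCHEMA)

# Uncomment to deploy against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Once the schema is accepted, Dgraph generates the queries and mutations for `Task` and serves them from its `/graphql` endpoint, which is what makes the "schema first, back-end for free" workflow possible.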
A graph database with a great community
I had no experience with graph databases until a few months ago, when I built some demo projects for fun that sparked an interest in back-end management systems. When I started, I ran into a whole host of HTTP 400 errors, CORS errors and dependency issues.
Initially, it was frustrating, but every time I posted a support request, I’d get a response within the hour! At every bump in the road, I was buoyed by the speed and determination of the Dgraph community to help, despite my limited experience. As a place to learn and grow, Dgraph Discuss has been truly inclusive and supportive.
While I am still just getting my feet wet with all the features, I have worked through some great tutorials, from one covering a basic to-do app to intermediate ones that integrate authentication (a whole series!). I now feel equipped to use Dgraph in my next project.
However, if you want advanced functionality like graph analytics for fraud detection and recommender systems, you will need a processing powerhouse like Apache Spark. Spark scales out across a cluster as your data grows and supports graph data through dedicated components such as GraphX and the GraphFrames package. To benefit from this added functionality, your Dgraph data needs a connector into Spark.
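As a flavour of the graph analytics involved: both GraphX and GraphFrames provide a connected-components algorithm, which in fraud detection can group accounts that are linked through shared attributes such as devices. The single-machine union-find sketch below is a swapped-in stand-in for what Spark computes in a distributed way; the account-device links are hypothetical.

```python
# Hypothetical account-to-device links: accounts sharing a device are connected.
edges = [
    ("acct1", "deviceA"),
    ("acct2", "deviceA"),
    ("acct3", "deviceB"),
    ("acct2", "deviceC"),
    ("acct4", "deviceC"),
]

def connected_components(edge_list):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edge_list:
        union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Print the accounts in each component; order may vary.
for group in connected_components(edges):
    print(sorted(n for n in group if n.startswith("acct")))
```

Here `acct1`, `acct2` and `acct4` end up in one component because they are chained together through `deviceA` and `deviceC`, even though `acct1` and `acct4` never share a device directly. Spotting such indirect links across millions of edges is where a distributed engine earns its keep.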
The Spark-Dgraph connector
Since adopting Spark, G-Research has been apportioning resources towards building and sustaining open-source tools that explore the viability of graph databases. Connectors are particularly important to G-Research because they link powerful components together to process and distribute large amounts of data. With the goal of increasing the functionality of graph databases, Enrico Minack created the Spark-Dgraph connector to “read graphs from a Dgraph cluster directly into DataFrames, GraphX or GraphFrames (while supporting) filter pushdown, projection pushdown and partitioning by orthogonal dimensions predicates and nodes.”
To put the connector through its paces, Enrico selected DBpedia, a large real-world dataset, and loaded it into a Dgraph cluster. Here’s how he generated the dataset and how it is pre-processed with Spark. This stress testing has gone a long way towards optimising the connector and measuring its performance on large graph datasets.
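To give an intuition for “partitioning by orthogonal dimensions”, the toy Python sketch below splits a graph’s triples along two independent axes, predicate groups and node (uid) ranges, so that each cell can be read in parallel. This is an illustration of the idea only, not the connector’s implementation (its partitioners are written in Scala); the triples, the partition sizes and the `partition` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples in the form (node uid, predicate, object value).
triples = [
    (1, "name", "Star Wars"),
    (2, "name", "Alien"),
    (3, "name", "Blade Runner"),
    (1, "release_year", 1977),
    (2, "release_year", 1979),
    (3, "release_year", 1982),
]

def partition(triples, predicates_per_partition, uids_per_partition):
    """Toy partitioner: split triples along two orthogonal dimensions,
    keyed by (predicate group index, uid range index)."""
    predicates = sorted({pred for _, pred, _ in triples})
    # Assign each predicate to a group of at most `predicates_per_partition`.
    pred_group = {p: i // predicates_per_partition for i, p in enumerate(predicates)}
    partitions = defaultdict(list)
    for uid, pred, obj in triples:
        # uids start at 1 in this toy data, so shift to a 0-based range index.
        uid_range = (uid - 1) // uids_per_partition
        partitions[(pred_group[pred], uid_range)].append((uid, pred, obj))
    return dict(partitions)

parts = partition(triples, predicates_per_partition=1, uids_per_partition=2)
for key in sorted(parts):
    print(key, parts[key])
```

With one predicate per group and two uids per range, the six triples fall into a 2 × 2 grid of partitions. Roughly speaking, a query that only needs `name` would only have to touch the partitions in predicate group 0, which is the intuition behind projection pushdown: read less data from Dgraph in the first place.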
The grass is greener where you water it
The beauty of linking systems, whether Spark and Dgraph or any other combination, is that it fills in the gaps in the data management ecosystem. Integrations are especially crucial for the adoption of graph databases, which have clear practical advantages for large, interconnected datasets.
G-Research benefits from developing tools and building interoperability with Spark, such as Sparkmagic, a Livy integration that the G-Research Open Source team now maintains. To open up new ways of doing research, G-Research recognises and fosters its interdependence with the best open-source software. We need a network of people (learners, builders and thinkers) to help our projects flourish. If this is something you are interested in, reach out by submitting an issue to the Spark-Dgraph project or by joining the discussion here. You can also check out other interesting projects from the G-Research Open Source team at: https://github.com/G-Research.
UPDATE: The Dgraph team has recently gone through some restructuring after its fair share of ups and downs. Despite this, the core team has confirmed that this will not affect the Dgraph product or their support for clients. As of today, there is still an active community of Dgraphers on Discord.