News - Open Source Software

Inspecting Parquet files with Spark

Technology Innovation and Open Source
31 Jul 2023

The Apache Parquet file format is popular for storing and interchanging tabular data. It is self-contained and well compressed, and it supports parallel reading, reading only selected columns, and (to some extent) filtering on values. These properties make Parquet the ideal input file format for distributed data processing platforms like Apache Spark.

FastTrackML: The fastest ML experiment tracker yet

Technology Innovation and Open Source
26 Jul 2023

FastTrackML is an open source collaboration designed to build a bridge to a better MLOps-enabled future.

Armada: Six months in the sandbox

Technology Innovation and Open Source
30 Jun 2023

Armada was accepted into the Cloud Native Computing Foundation (CNCF) sandbox in January 2023; the journey to get there, and what we’ve achieved since, has been no mean feat.

Spark’s groupByKey should be avoided – and here’s why

Technology Innovation and Open Source
13 Jun 2023

Apache Spark is very popular for processing tabular data of arbitrary size. One common operation is to group the data by some columns so that each group can be processed further. Spark has two ways of grouping data: groupBy and groupByKey. While the latter works, it may cause performance issues in some cases.

Aeron – Proof of the benefits of open development

18 May 2023

By Martin Thompson, High-Performance & Distributed Systems Specialist. A precept of open-source work is that “given enough eyeballs, all bugs are shallow,” first said by Eric S. Raymond in The Cathedral and the Bazaar, and dubbed “Linus’ Law” in 1999. Based on my experience of the benefits of developing Aeron – the high throughput, low latency […]

Invisible Work of OpenStack: Testing Matrices

05 Apr 2023

Open source projects enable people from all over the world to collaborate and create high-quality software that benefits everyone - they have, in effect, revolutionised the way we think about software development.

A PySpark bug makes co-grouping with window function partition-key-order-sensitive

Technology Innovation and Open Source
29 Mar 2023

Spark is used to process tabular data of arbitrary size. One common operation is to group the data by some grouping columns.

Un-pivot, sorted groups and many bug fixes: Celebrating the first Spark 3.4 release

Technology Innovation and Open Source
21 Mar 2023

Guaranteeing in-partition order for partitioned-writing in Apache Spark

05 Dec 2022

Learn how to guarantee in-partition order for partitioned-writing in Apache Spark from one of our open-source contributors.

Invisible Work of OpenStack: Security Bugs

30 Nov 2022

Learn more about what happens when a security bug is found from one of our open-source developers.
