News - Open Source Software

Inspecting Parquet files with Spark
  • Technology Innovation and Open Source
  • 31 Jul 2023

The Apache Parquet file format is popular for storing and interchanging tabular data. It is self-contained and well compressed, and it supports parallel reading, reading only selected columns, and (to some extent) filtering on values. These properties make Parquet the ideal input file format for distributed data processing platforms like Apache Spark.

FastTrackML: The fastest ML experiment tracker yet
  • Technology Innovation and Open Source
  • 26 Jul 2023

FastTrackML is an open source collaboration designed to build a bridge to a better MLOps-enabled future.

Armada: Six months in the sandbox
  • Technology Innovation and Open Source
  • 30 Jun 2023

Armada was accepted into the Cloud Native Computing Foundation (CNCF) sandbox in January 2023; the journey to get there, and what we’ve achieved since, has been no mean feat.

Spark’s groupByKey should be avoided – and here’s why
  • Technology Innovation and Open Source
  • 13 Jun 2023

Apache Spark is very popular when it comes to processing tabular data of arbitrary size. One common operation is to group the data by some columns in order to process the grouped data further. Spark has two ways of grouping data: groupBy and groupByKey. While the latter works, it may cause performance issues in some cases.

Aeron – Proof of the benefits of open development
  • 18 May 2023

By Martin Thompson, High-Performance & Distributed Systems Specialist. A precept of open-source work is that “given enough eyeballs, all bugs are shallow,” first said by Eric S. Raymond in The Cathedral and the Bazaar, and dubbed “Linus’ Law” in 1999. Based on my experience of the benefits of developing Aeron – the high throughput, low latency […]

Invisible Work of OpenStack: Testing Matrices
  • 05 Apr 2023

Open source projects enable people from all over the world to collaborate and create high-quality software that benefits everyone - they have, in effect, revolutionised the way we think about software development.

A PySpark bug makes co-grouping with window function partition-key-order-sensitive
  • Technology Innovation and Open Source
  • 29 Mar 2023

Spark is used to process tabular data of arbitrary size. One common operation is to group the data by some grouping columns.

Un-pivot, sorted groups and many bug fixes: Celebrating the first Spark 3.4 release
  • Technology Innovation and Open Source
  • 21 Mar 2023
Guaranteeing in-partition order for partitioned-writing in Apache Spark
  • 05 Dec 2022

Learn how to guarantee in-partition order for partitioned-writing in Apache Spark from one of our open-source contributors.

Invisible Work of OpenStack: Security Bugs
  • 30 Nov 2022

Learn more about what happens when a security bug is found from one of our open-source developers.
