Siembol – Open-Source Anti-Malware for the Cloud
- Alex Scammon, Head of Open-Source Development, G-Research
- Celie Valentiny, Software Developer, G-Research
Go sixty miles north of Brisbane, Australia, and you’ll come to the Shire of Maroochy, in Queensland. The quiet coastal town wouldn’t seem a likely epicentre for a new age in digital weaponry, but it was there that Vitek Boden caused 800,000 litres of raw sewage to be dumped into local parks and rivers using little more than a laptop, a two-way radio and a control device specific to the system. Boden was caught and sentenced to two years in an Australian prison, and his attack has been rehashed in countless computer security case studies over the subsequent two decades. It was real proof that hijacked computers could be a powerful weapon in the wide world outside of offices or movie scripts.
The G-Research algorithms are focused on better future predictions, but we’re vulnerable to the same types of computer security issues that have plagued computer systems in the past. That’s why we’ve not only used best-in-class existing solutions to shore up our defences, but also invested in our own tools to extend and improve our security posture.
Now we’re offering one of our tools for free to the open-source community.
Siembol provides a scalable, advanced security analytics framework based on open-source big data technologies. It helps security teams respond to attacks by supplying enriched alerts and data.
Siembol was developed in-house at G-Research as a security data processing application. We knew that we needed a highly efficient, real-time event processing engine, and we used Splunk and Apache Metron in our early years. However, neither product met all of our needs – we wanted specific features that mattered to G-Research – so we built Siembol.
Siembol has several features that set it apart from other solutions.
It has customisable alert escalations that allow security teams to create advanced correlations and combinations of multiple data sources. Siembol integrates easily with other industry-standard systems such as Jira, TheHive, Cortex, ELK, and LDAP.
Siembol provides a robust framework for normalising and extracting fields from logs, supporting chaining of parsers, field extractors and transformation functions – all with flexible fault-tolerances built right in.
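To make the idea of chained parsers concrete, here is a minimal Python sketch of the general pattern: a parser, a field extractor and a transformation applied in sequence, with a simple fault-tolerance policy that keeps the raw log when a step fails. This is not Siembol's code or configuration format; every function name, the log layout and the `parse_error` field are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

Event = Dict[str, str]
Step = Callable[[Event], Event]

# Assumed log layout for this sketch: "<timestamp> <host> <message>".
HEADER_RE = re.compile(r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<message>.*)$")


def parse_header(event: Event) -> Event:
    """Parser: split the raw line into timestamp, host and message."""
    match = HEADER_RE.match(event["raw"])
    if match is None:
        raise ValueError("line does not match the expected header")
    return {**event, **match.groupdict()}


def extract_key_values(event: Event) -> Event:
    """Field extractor: pull key=value pairs out of the message field."""
    pairs = dict(re.findall(r"(\w+)=(\S+)", event.get("message", "")))
    return {**event, **pairs}


def lowercase_host(event: Event) -> Event:
    """Transformation: normalise the host name to lower case."""
    return {**event, "host": event["host"].lower()}


def run_chain(raw: str, chain: List[Step]) -> Event:
    """Apply each step in order; on failure keep the raw log and record why."""
    event: Event = {"raw": raw}
    for step in chain:
        try:
            event = step(event)
        except (KeyError, ValueError) as exc:
            event["parse_error"] = f"{step.__name__}: {exc}"
            break
    return event


CHAIN = [parse_header, extract_key_values, lowercase_host]
parsed = run_chain("2021-08-05T12:00:00Z FW01 action=deny src=10.0.0.5", CHAIN)
broken = run_chain("garbage", CHAIN)
```

In this sketch `parsed` carries the normalised `timestamp`, `host`, `action` and `src` fields, while `broken` keeps its original `raw` value alongside a `parse_error` note, so nothing is silently dropped.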
When you use Siembol, you can define rules for selecting enrichment logic, joining enrichment tables, and defining how to enrich the processed log with information from user-defined tables. That’s powerful.
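The enrichment idea can be sketched in a few lines of Python. The example below is illustrative only and does not reproduce Siembol's rule schema: the table contents, the rule layout and the field names (`src_zone`, `src_owner`) are all hypothetical. It shows a rule that selects which logs to enrich, names the field used to join against a user-defined table, and lists the columns to copy into the log.

```python
from typing import Dict

Event = Dict[str, str]
Table = Dict[str, Dict[str, str]]

# Hypothetical user-defined enrichment table, keyed by IP address.
NETWORK_ZONES: Table = {
    "10.0.0.5": {"zone": "dmz", "owner": "web-team"},
    "10.0.1.9": {"zone": "internal", "owner": "data-team"},
}

# Hypothetical rule: which logs it selects, which log field joins to the
# table key, and which table columns are copied under which field names.
RULE = {
    "match": {"source_type": "firewall"},
    "join_field": "src",
    "add_fields": {"zone": "src_zone", "owner": "src_owner"},
}


def enrich(event: Event, rule: dict, table: Table) -> Event:
    """Return a copy of the event with table columns added when the rule applies."""
    # Skip events the rule does not select.
    if any(event.get(key) != value for key, value in rule["match"].items()):
        return event
    # Join the log to the table on the configured field.
    row = table.get(event.get(rule["join_field"], ""))
    if row is None:
        return event
    enriched = dict(event)
    for column, target in rule["add_fields"].items():
        enriched[target] = row[column]
    return enriched


firewall_log = enrich({"source_type": "firewall", "src": "10.0.0.5"}, RULE, NETWORK_ZONES)
proxy_log = enrich({"source_type": "proxy", "src": "10.0.0.5"}, RULE, NETWORK_ZONES)
```

Here only `firewall_log` gains the `src_zone` and `src_owner` fields; `proxy_log` is returned unchanged because the rule does not select it.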
Configurations and rules are defined in a modern Angular web application, Siembol UI, and stored in Git repositories. All configurations are stored in JSON format and edited through web forms, which shortens the learning curve, speeds up rule creation and helps avoid mistakes. The Siembol UI also supports validation, testing, and creating and evaluating test cases to mitigate configuration errors in a production environment. That makes configurations easy to write and quick to correct.
Siembol can be used in multi-tenant environments, without having to deploy multiple instances. It supports OAuth2/OIDC for authentication and authorisation in the Siembol UI. All Siembol services can have multiple instances with authorisation based on OIDC group membership.
And you get all of this with the ease of installation that comes from prepared Docker images and Helm charts, which makes it easy for those new to the project to get started.
Siembol can centralise both security data and the monitoring logs of different sources. The format of these logs can vary, especially when they come from third-party tools. Normalisation by a common field, such as timestamp, is an important component of Siembol’s functionality. It can also be useful to enrich log data with metadata provided by a CMDB or other internal system when building out a robust monitoring solution.
For example, data repositories can be enriched by data classification, network devices by a network zone, usernames by Active Directory group, etc. By using Siembol alerting services, CSIRTs can add detections on top of normalised logs. Alerts triggered by the detections are integrated into incident response workflows that are defined and evaluated by the Siembol response service. This allows for integration of Siembol with systems such as Jira, TheHive, or Cortex, and provides additional enrichments by searching ELK or doing LDAP queries – enabling a much richer picture of what’s going on.
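As a rough illustration of what a detection on top of normalised, enriched logs can look like, the Python sketch below matches a set of field patterns against an event and emits an alert when all of them match. It is an assumption-laden toy, not Siembol's alerting engine: the rule name, the matcher layout and the `alert_rule` field are invented for this example.

```python
import re
from typing import Dict, Optional

Event = Dict[str, str]

# Hypothetical detection rule: every listed field must fully match its pattern.
RULE_NAME = "denied_traffic_from_dmz"
RULE_MATCHERS: Dict[str, str] = {
    "action": r"deny|drop",
    "src_zone": r"dmz",
}


def evaluate(event: Event) -> Optional[Event]:
    """Return an alert (the event plus the rule name) if all matchers hit, else None."""
    for field, pattern in RULE_MATCHERS.items():
        value = event.get(field)
        if value is None or re.fullmatch(pattern, value) is None:
            return None
    alert = dict(event)
    alert["alert_rule"] = RULE_NAME
    return alert


alert = evaluate({"host": "fw01", "action": "deny", "src_zone": "dmz"})
no_alert = evaluate({"host": "fw01", "action": "allow", "src_zone": "dmz"})
```

Because the rule works on normalised field names rather than raw log text, the same matcher can apply to any source whose parser produces `action` and `src_zone`.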
Siembol can be used as a tool for detecting attacks or leaks by teams responsible for the system platform. For example, the Big Data team at G-Research is using Siembol to detect leaks and attacks on the Hadoop platform. These detections are then used as another data source within the Siembol SIEM log collection for the CSIRT team handling these incidents.
At G-Research, we use Siembol to parse, normalise and enrich approximately 150,000 events a second, and to run detections on them. Per day, this adds up to approximately 15TB of raw data, or 13 billion events – no small task.
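The daily figures follow from the per-second rate, as this short Python check shows. The average event size is a derived estimate, not a number from G-Research, and it assumes decimal terabytes (10^12 bytes).

```python
EVENTS_PER_SECOND = 150_000          # approximate rate quoted above
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
RAW_BYTES_PER_DAY = 15 * 10**12      # 15TB, assuming decimal terabytes

# 150,000 events/s over a full day is just under 13 billion events.
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

# Derived estimate only: implied average size of one raw event in bytes.
avg_event_bytes = RAW_BYTES_PER_DAY / events_per_day

print(f"{events_per_day:,} events per day")
print(f"about {avg_event_bytes:.0f} bytes per event on average")
```

At these rates the implied average raw event is a little over a kilobyte.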
Fast forward two decades and many computerised industrial control systems like the one Boden attacked remain vulnerable. Some were even exploited to undermine Iran’s attempts at enriching uranium. If you read through the detailed MITRE post-mortem of Boden’s hack, you see that an HWT employee, “Mr. Yagger,” was finally able to run down the malicious code by examining log data from the control system. The truth is always in the logs.
We look forward to seeing how you will use this new data security and network monitoring tool, and would love to have you join us for community conversation at GitHub Discussions.
Addendum: Marian Novotny gave presentations at Black Hat on Thursday 5 August 12-1pm PST (Black Hat video here) and at DEF CON Demo Lab (DEF CON video here) on Friday 6 August 12pm PST. He will be presenting Siembol again at All Things Open, October 17-19, 2021.