2014-02-10 12:00

How to comply with Open Source licenses?

By: Armijn Hemel

It's easier than you may think!

Background

Many of today's software products can no longer be built from scratch by a single company without going vastly over time or over budget, and most likely both. Instead, (partially) finished software components are purchased from third-party companies and combined into a final product. This greatly reduces cost, but it also brings a legal risk that too many companies are not aware of. Failure to comply with the license conditions of these components can mean considerable legal risk for the company that brings a product to market. Carefully checking code that comes from your suppliers (and their suppliers, collectively called the "supply chain") is a must to reduce legal risk.

Checking code from the supply chain

Checking the licenses of source code is a fairly well understood problem: there are many tools on the market, both open source [1] and proprietary, that can help determine the licenses of software, and with a few simple measures an audit of the source code does not even have to cost much time. FOSSology [2] and Ninka [3] are well-known open source scanners. Black Duck [4] is a well-known proprietary suite for code scanning.

It becomes a lot harder to check when suppliers ship components in binary-only form, without stating what is inside or without providing source code. In my experience, a lot of open source software is shipped without complying with the license conditions. This includes GPL-licensed code, for which the source code must be disclosed under the GPL or a compatible license, being shipped as binary-only software.

Just passing on binaries from an upstream vendor is usually not a smart thing to do: it means that you may be forced to take on all the legal risk of a license violation, even if the violation was the fault of the upstream vendor. The only right thing to do is to check everything that a vendor gives you and to act if something is wrong. In its 2013 decision in the case "Welte v. Fantec", the district court of Hamburg also stated that you cannot offload legal responsibilities to a third party.

Verifying all code (source code and binary) can be a lot of work, especially if supply chains are long, the code is a mess and the code is old (in one example, for one of my customers, I discovered issues that had been introduced as long as 15 years before). In some cases companies in the supply chain might not be able to fix issues, because they simply no longer exist.

The Binary Analysis Tool for checking binary files

The Binary Analysis Tool [5], or BAT, is made specifically to address situations where you get binaries from an upstream vendor without source code and you want to know what is inside, for example to verify that all license requirements are met and that there are no surprises in there. It is important to note that BAT does not draw any legal conclusions (it is not an automated lawyer), nor does it solve compliance issues; it only helps with gathering evidence.

BAT is made by Tjaldur Software Governance Solutions and is released under the Apache 2 license (with a few extra components released as public domain or under GPLv2).

BAT works in a few phases: first, a binary is scanned for known markers of file systems, compressed files and media files. If markers are found, the files or file systems are carved from the binary, unpacked and verified. All files that were unpacked are then recursively scanned and (possibly) unpacked as well.
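The scan, carve, verify and recurse cycle described above can be illustrated with a minimal Python sketch. This is not BAT's actual code: the marker table, the function names (find_markers, carve_gzip, unpack) and the restriction to gzip streams are assumptions made purely for illustration. A candidate offset only counts as a hit if the data at that offset really decompresses, which is the "verified" step.

```python
import zlib

# Hypothetical marker table for illustration; a real scanner knows many more formats.
MARKERS = {
    "gzip": b"\x1f\x8b\x08",        # gzip magic followed by the deflate method byte
    "squashfs": b"hsqs",            # little-endian SquashFS superblock magic
    "png": b"\x89PNG\r\n\x1a\n",    # PNG file signature
}


def find_markers(blob):
    """Return sorted (offset, format) pairs for every marker occurrence in blob."""
    hits = []
    for name, magic in MARKERS.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, name))
            start = blob.find(magic, start + 1)
    return sorted(hits)


def carve_gzip(blob, offset):
    """Try to unpack a gzip stream starting at offset.

    Returns the decompressed bytes, or None if the marker was a false
    positive or the stream is incomplete. Trailing bytes are ignored.
    """
    decomp = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    try:
        data = decomp.decompress(blob[offset:])
    except zlib.error:
        return None
    return data if decomp.eof else None


def unpack(blob, depth=0, max_depth=5):
    """Recursively carve gzip streams; return (depth, offset, data) tuples."""
    results = []
    if depth >= max_depth:
        return results
    for offset, name in find_markers(blob):
        if name != "gzip":
            continue  # this sketch only knows how to unpack gzip
        data = carve_gzip(blob, offset)
        if data is None:
            continue  # marker found, but verification failed
        results.append((depth, offset, data))
        results.extend(unpack(data, depth + 1, max_depth))
    return results
```

Calling unpack() on a firmware image read into memory would return every gzip payload found, including payloads nested inside other payloads. The depth limit is a simple guard against maliciously nested archives.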

After unpacking, a wide range of checks is applied to each file that was unpacked. Checks range from extracting license markers from binaries, to finding dependencies of dynamically linked ELF files (to help research license requirements for dynamic linking), to finding out what is inside a binary. BAT has two methods for the latter. The first is a hard-coded list of identifiers that frequently occur in a very limited list of programs; the other uses statistics and a database full of information extracted from source code to determine the most likely used software. The scripts to create the database are open, and a ready-to-use database (including caching databases for speed-ups) with cleaned-up information from over 170,000 packages is available from Tjaldur Software Governance Solutions, the creator of BAT.

The database contains strings, function names, license information and more. When a binary is scanned, human-readable strings, function names, variable names, etcetera, are extracted from it. These identifiers are used to fingerprint the binary by matching them against the information in the database. The main fingerprinting technique uses so-called "string constants" (like output strings, debug strings, and so on). This works because these strings are not removed during translation from source code to binary code. If enough strings from a certain package are found, it becomes very strong proof that this package was used. Because code is copied between packages, the same string can occur in several of them, so an algorithm determines the most likely package that was used.
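To make the idea of string-based fingerprinting concrete, here is a small Python sketch. It is an illustration under stated assumptions, not BAT's implementation: the database is modelled as a plain dictionary from string constant to the set of packages containing it, and the scoring rule (a string shared by n packages contributes 1/n to each) is a simple stand-in for the statistical algorithm the article mentions. The names extract_strings and score_packages are hypothetical.

```python
import re
from collections import Counter


def extract_strings(blob, min_len=5):
    """Return the set of printable ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(blob)}


def score_packages(strings, database):
    """Score packages by the strings they share with the binary.

    database maps a string constant to the set of packages whose source
    code contains it (an assumed, simplified schema). A string unique to
    one package adds 1.0; a string found in n packages adds 1/n to each,
    so code copied between packages carries less weight.
    """
    scores = Counter()
    for text in strings:
        packages = database.get(text, ())
        for package in packages:
            scores[package] += 1.0 / len(packages)
    return scores
```

With such a score table, `score_packages(...).most_common(1)` gives the most likely package. Generic messages that occur in many packages barely move the ranking, while strings unique to one package dominate it.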

As it turns out this method is extremely effective: often there are dozens, hundreds or even thousands of strings that can be matched to open source software in the database, making it virtually impossible to deny reuse. If enough strings are found (and the database is filled with the correct information) it is even possible to reliably determine the exact version number of the software.
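Version detection can be sketched in the same spirit. The following Python example assumes a hypothetical table that maps each released version of one package to the set of string constants in its source code, and it picks the version or versions that explain the most strings found in the binary. How BAT actually narrows down versions is not described here, so this only illustrates the principle.

```python
def likely_versions(found, versions):
    """Return the versions whose string sets overlap most with found.

    found is the set of strings extracted from the binary; versions maps
    a version label to the set of strings in that release (an assumed
    layout). Several versions are returned when the evidence cannot tell
    them apart, and an empty list when nothing matches.
    """
    overlap = {label: len(found & strings) for label, strings in versions.items()}
    best = max(overlap.values(), default=0)
    if best == 0:
        return []
    return sorted(label for label, count in overlap.items() if count == best)
```

A single remaining version is only as reliable as the database behind it: if a release is missing from the table, the nearest neighbouring release will be reported instead.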

In the OSADL License Compliance Audit BAT is used as well, specifically to detect the interactions between ELF binaries found in the firmware.

[1] The Open Source Initiative has a list of widely used open source licenses at http://opensource.org/licenses.

[2] http://www.fossology.org/

[3] https://github.com/dmgerman/ninka/

[4] http://www.blackducksoftware.com/

[5] http://www.binaryanalysis.org/

OSADL would like to offer its grateful thanks to guest author Armijn Hemel for providing this News Article.