2024-11-21 - 07:30

OSSelot – The Open Source Curation Database

Rationale

When people who copy and distribute Open Source software are asked what most hinders and limits the use of such software, they regularly answer, "Clearing a software component for distribution and correctly fulfilling the various license obligations is so much painful work." And they usually add: "It's especially painful because you know that most of the work has been done a thousand times before by others, but you can't get to the results." It therefore seems obvious to share these efforts just as the development of the software itself is shared. To do so, three prerequisites must be fulfilled:

  • A minimal set of clearing information must be defined, and a database must be provided to store curated data.
  • A platform must be established where a community can grow that creates, shares, and makes such curation data generally available.
  • To create trust in the reliability of the provided material, its quality must be undeniably high, requiring experienced and responsible contributors and continuous, rigorous and thorough review.


To make this happen, the OSSelot project was established and a separate homepage was created. The project was launched publicly at the December COOL event. The presentations and videos from this event can be found here.

Since then, another COOL event, on the curation of data and the OSSelot contribution process, took place in March 2024. Presentations and videos are available here. A session on tooling to integrate OSSelot data into OpenEmbedded and Yocto build systems is planned for September 2024. Details and registration can be found on the event page.

Provided artifacts

The project data are provided in a publicly accessible repository for selected versions of software packages such as Coreboot, the Linux kernel or the OpenSSL library. Typically, three artifacts are included per package – a README file with general information, an SPDX tag:value file with curated data for every single source code file and a ready-to-use OSS disclosure file. The tag:value files can be integrated into the build process, so only the licenses of those files that are actually compiled into the build artifact and distributed need to be considered. In addition, the tag:value files contain annotations to the license conclusions to elucidate decisions that are not obvious. The OSS disclosure files contain all applicable licenses and all copyright notices for the entire package. In addition, the OSS disclosure files contain "acknowledgment text" when such acknowledgment is required by the license.
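The build integration described above can be illustrated with a minimal Python sketch. It is not the project's own tooling: the sample data, the file names and the function name concluded_licenses are made up for illustration, and it assumes that the file names reported by the build match the FileName entries of the tag:value file exactly.

```python
# Sketch: collect concluded licenses only for files that end up in the build.
# SAMPLE and all file names are hypothetical illustration data.

SAMPLE = """\
FileName: src/core.c
LicenseConcluded: GPL-2.0-only
FileName: src/util.c
LicenseConcluded: MIT
FileName: tools/gen.py
LicenseConcluded: Apache-2.0
"""


def concluded_licenses(tag_value_text, compiled_files):
    """Return the set of LicenseConcluded values for the given files."""
    licenses = set()
    current_file = None
    for line in tag_value_text.splitlines():
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a tag:value line (e.g. a "##" comment)
        tag, value = tag.strip(), value.strip()
        if tag == "FileName":
            # Each FileName tag starts a new file record.
            current_file = value
        elif tag == "LicenseConcluded" and current_file in compiled_files:
            licenses.add(value)
    return licenses


if __name__ == "__main__":
    built = {"src/core.c", "src/util.c"}
    print(sorted(concluded_licenses(SAMPLE, built)))
    # prints ['GPL-2.0-only', 'MIT']
```

In this sketch, license expressions are kept as plain strings; a real integration would probably also need to split compound expressions and normalize path prefixes.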

Following the principle of Open Source software development, contributions, reviews of existing data and bug reports are encouraged. Feedback can be given via issues in the git repository or by contacting info@osadl.org directly. In return, any inconsistencies or problems found while curating data are reported to the respective projects in the hope that future versions will be improved for everyone.

License

All material that is part of the OSSelot project is licensed under CC0 1.0 Universal.

Relevant links

Presentations

The following presentations show how the curation database can be used to facilitate license clearing, both for packages with matching curation data and for packages with a somewhat different version, e.g. after an upgrade.


 

What is an SPDX tag:value file and what does it look like?

For now, the SPDX tag:value file format has been selected as the primary file format of this curation database. Some conversion tools to and from this format are already available, and some more will be developed during this project. The SPDX tag:value files are normally generated by clearing tools such as FOSSology, and can be imported back into such clearing tools, but are also human readable. The following section provides details about the internal structure of such a file.
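Because the format is line-oriented, reading it programmatically takes little code. The following Python sketch shows one possible way to split a tag:value document into (tag, value) pairs, including values wrapped in <text>...</text> that span several lines. The function name parse_tag_value and the sample values are assumptions made for illustration; this is a simplified reader, not a complete SPDX parser.

```python
# Sketch of a simplified tag:value reader; SAMPLE is hypothetical data.

SAMPLE = """\
SPDXVersion: SPDX-2.2
## Creation Information
Creator: Tool: example-tool
CreatorComment: <text>First line
second line</text>
"""


def parse_tag_value(text):
    """Return a list of (tag, value) pairs from SPDX tag:value text."""
    pairs = []
    lines = iter(text.splitlines())
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        tag, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a tag:value line
        value = value.strip()
        if value.startswith("<text>"):
            # Multi-line value: keep reading until the closing </text>.
            value = value[len("<text>"):]
            while "</text>" not in value:
                nxt = next(lines, None)
                if nxt is None:
                    break  # unterminated block; keep what was read
                value += "\n" + nxt
            value = value.split("</text>", 1)[0]
        pairs.append((tag.strip(), value))
    return pairs


if __name__ == "__main__":
    for tag, value in parse_tag_value(SAMPLE):
        print(f"{tag!r}: {value!r}")
```

Only the first colon separates tag and value, so an entry such as "Creator: Tool: example-tool" yields the tag "Creator" and the value "Tool: example-tool".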

 

SPDX tag:value file template

 

Header

SPDXVersion: SPDX version
DataLicense: Data license

Document Information

##-------------------------
## Document Information
##-------------------------
DocumentNamespace: Document namespace
DocumentName: Document name
SPDXID: SPDXID

Creation Information

##-------------------------
## Creation Information
##-------------------------
Creator: Tool: Creator's tool
Creator: Person: Creator's name
CreatorComment: <text>Creator comment</text>
Created: Date
LicenseListVersion: License list version

Package Information

##-------------------------
## Package Information
##-------------------------
PackageName: Package name
PackageFileName: Package file name
SPDXID: SPDXRef-ID
PackageDownloadLocation: Package download location
PackageVerificationCode: Package verification code
PackageChecksum: SHA1: SHA1 package checksum
PackageChecksum: SHA256: SHA256 package checksum
PackageChecksum: MD5: MD5 package checksum
PackageLicenseConcluded: Package license concluded
PackageLicenseDeclared: Package license declared
PackageLicenseComments: <text>Package license comments</text>
PackageLicenseInfoFromFiles: Package license info from files
PackageCopyrightText: Package copyright text

Relationship: Relationship

File information per file (may occur repeatedly)

##--------------------------
## File Information
##--------------------------

##File

FileName: File name
SPDXID: SPDXRef-item-No.
FileChecksum: SHA1: SHA1 File checksum
FileChecksum: SHA256: SHA256 File checksum
FileChecksum: MD5: MD5 File checksum
LicenseConcluded: LicenseRef-License
LicenseComments: <text>Comments</text>
LicenseInfoInFile: LicenseRef-License
LicenseInfoInFile: LicenseRef-License
LicenseInfoInFile: LicenseRef-License
LicenseInfoInFile: LicenseRef-License
FileCopyrightText: <text>Verbatim copy of File copyright text</text>

License information per license (may occur repeatedly)

##-------------------------
## License Information
##-------------------------
LicenseID: License ID
LicenseName: License name
ExtractedText: <text>Verbatim copy of license text</text>
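The FileChecksum entries in the file information section make it possible to check whether a local source file is the same one the curation data refer to. The following Python sketch, which uses only the standard hashlib module, computes the three checksums and compares them with recorded FileChecksum lines. The helper names file_checksums, checksum_lines and matches are hypothetical and not part of any OSSelot tooling.

```python
import hashlib


def file_checksums(data):
    """Return the SHA1, SHA256 and MD5 hex digests of a bytes object."""
    return {
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
    }


def checksum_lines(data):
    """Format the digests as FileChecksum lines, as in the template."""
    return [
        f"FileChecksum: {algo}: {digest}"
        for algo, digest in file_checksums(data).items()
    ]


def matches(data, recorded_lines):
    """True if every recorded FileChecksum line matches the data."""
    actual = file_checksums(data)
    for line in recorded_lines:
        tag, _, rest = line.partition(":")
        if tag.strip() != "FileChecksum":
            continue  # ignore other tags
        algo, _, digest = rest.partition(":")
        # An unknown algorithm name counts as a mismatch.
        if actual.get(algo.strip()) != digest.strip().lower():
            return False
    return True


if __name__ == "__main__":
    content = b"int main(void) { return 0; }\n"
    recorded = checksum_lines(content)
    print("\n".join(recorded))
    print(matches(content, recorded))          # prints True
    print(matches(content + b"x", recorded))   # prints False
```

In practice, the bytes would be read from the source file in binary mode and the recorded lines taken from the matching file record of the tag:value file.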