
How to secure your software supply chain with Artifact Binary Provenance

Author Mike Long
Published January 13, 2022 in technology
3 min read

At Kosli, we use Artifact Binary Provenance as the foundation for our audit trails. Artifact Binary Provenance is a fancy term, but the idea behind it is quite simple: we can identify exactly which software we have running in production. Let’s take a closer look 👀

How should we identify software?

There are many ways to identify software. In our industry we’ve tried different approaches to version numbers, such as semantic versioning and release names. These are human-centered approaches that involve applying a name to a specific piece of software.

This approach is called version labeling.

The downside of this approach is that it is fallible. Any label can be applied to any software package, so it’s easy to see how mistakes can be made. For example, the version number could be incorrectly bumped, or errors in copying and distributing software could cause a misapplication of identity.

Version labeling also creates a security threat. A malicious actor could label their software so that a system believes it is running qualified software when it is in fact running compromised software.

For compliance and security reasons we need a more reliable approach.

Content Addressable Storage

In high-security environments we need a tamper-proof identity scheme. In plain terms, if the software changes we want it to have a different identity.

Luckily, this is a solved problem in computer science. The solution is Content Addressable Storage.

How this works is really simple. Instead of using a label to define software identity, you use the cryptographic hash of the software itself.

This means that if a single byte in the software changes it will have a different identity.
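As a minimal sketch of this idea, the snippet below derives an identity from the bytes of an artifact and shows that flipping a single bit produces a different identity. The choice of SHA-256, the `fingerprint` helper, and the sample bytes are assumptions made for illustration, not a description of any particular tool.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of the artifact's bytes (assumed hash choice)."""
    return hashlib.sha256(artifact).hexdigest()


# Hypothetical artifact contents, used only for the demonstration.
original = b"\x7fELF example binary contents"

# Copy the artifact and flip one bit in its last byte.
tampered = bytearray(original)
tampered[-1] ^= 0x01

id_original = fingerprint(original)
id_tampered = fingerprint(bytes(tampered))

print(id_original)
print(id_tampered)
print(id_original != id_tampered)  # True: the identities differ
```

Because the identity is computed from the content, nobody has to remember to bump it, and nobody can assign it to different content.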

Can’t I just use the git commit SHA to identify the software?

Git commits define a content addressable snapshot of the source code (and its history). If you are distributing the source repo as your artifact this could be a valid method of identity.

However, in most cases software is not distributed as source but as binaries, typically produced by compilation, packaging, or Docker image builds. This translation process is often non-reproducible or nondeterministic, which removes the hard trace from source to binary. In other words, a binary package could be labeled with a source commit it was not actually built from.

For this reason we use a Secure Hash Algorithm (SHA) to identify the binary.
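To make the difference concrete, here is a toy sketch of a non-reproducible build. The `build` function is hypothetical and simply embeds a timestamp in its output, which is one common source of nondeterminism; the commit identifier is a stand-in, not how git actually computes commit SHAs.

```python
import hashlib


def build(source: bytes, build_timestamp: str) -> bytes:
    """Toy 'build' step (hypothetical): embeds a timestamp in the output."""
    return b"BUILT-AT:" + build_timestamp.encode() + b"\n" + source


source = b"print('hello')\n"

# Stand-in for a git commit SHA. Real git hashes a commit object,
# not the raw file bytes, so this is only an illustration.
commit_id = hashlib.sha1(source).hexdigest()

# Two builds of the same source at different times.
binary_a = build(source, "2022-01-13T10:00:00Z")
binary_b = build(source, "2022-01-13T10:05:00Z")

digest_a = hashlib.sha256(binary_a).hexdigest()
digest_b = hashlib.sha256(binary_b).hexdigest()

print("source identity:", commit_id)
print("binary A:", digest_a)
print("binary B:", digest_b)
print(digest_a == digest_b)  # False: same source, different binaries
```

Both binaries could truthfully carry the same commit label, yet they are different artifacts. Only the hash of the binary tells them apart.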

Storing the provenance

Now that we have a method for identifying software, wouldn’t it be great if we could look this up on demand from our DevOps tools?

A compliance System of Record provides a secure database to store claims to the identity (we have a strong opinion on what that should be 😇). When we create a binary in our secure CI build process we store the identity information in a journal.

[Figure: Artifact Binary Provenance process in Kosli]

As each binary progresses through the value stream you can record evidence against it such as:

  • Source commit
  • Build URL
  • Test results
  • Security analysis
  • Deployments
  • Approvals

And the information is as easy to look up as it is to store. Our deployment processes can perform risk controls to ensure deployments are based on known approved binaries and verified processes. This is why we believe Artifact Binary Provenance is the basis for any compliance-based DevOps approach. It makes it impossible to qualify one piece of software and deploy another.
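The store-and-look-up flow described above can be sketched with a small in-memory journal. Everything here is hypothetical: the `Journal` class, the `may_deploy` check, the evidence names in `REQUIRED`, and the sample values are illustrative assumptions, not Kosli's API or data model.

```python
import hashlib
from collections import defaultdict


class Journal:
    """Hypothetical in-memory system of record, keyed by artifact digest."""

    def __init__(self):
        self._evidence = defaultdict(dict)

    def record(self, digest: str, kind: str, value: str) -> None:
        """Attach one piece of evidence to the artifact with this digest."""
        self._evidence[digest][kind] = value

    def lookup(self, digest: str) -> dict:
        """Return a copy of all evidence recorded for this digest."""
        return dict(self._evidence.get(digest, {}))


# Evidence kinds this sketch assumes must be present before deploying.
REQUIRED = ("source_commit", "test_results", "approval")


def may_deploy(journal: Journal, artifact: bytes) -> bool:
    """Risk control: allow only binaries whose digest has all required evidence."""
    digest = hashlib.sha256(artifact).hexdigest()
    evidence = journal.lookup(digest)
    return all(kind in evidence for kind in REQUIRED)


journal = Journal()
approved = b"release build"  # hypothetical artifact bytes
digest = hashlib.sha256(approved).hexdigest()

journal.record(digest, "source_commit", "abc123")
journal.record(digest, "test_results", "passed")
journal.record(digest, "approval", "approved by release manager")

print(may_deploy(journal, approved))           # True
print(may_deploy(journal, b"release build!"))  # False: unknown binary
```

The gate never asks what the artifact is called. It recomputes the digest from the bytes it is about to deploy, so evidence recorded for one binary cannot be reused for another.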

What about the humans?

Does this mean SemVer is dead? That you shouldn’t use git SHAs to identify your software? Not at all!

These are very useful ways for humans to navigate identity through version control and CI systems. However, since they are fallible, we still need the primary key of identity to be the content-addressable storage, linked to the labels. Labels are for humans and SHAs are for machines.
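One way to picture that relationship, as a rough sketch: the digest is the primary key, and labels are a secondary index that points at it. The `register` and `verify` helpers and the two dictionaries are hypothetical names chosen for this example.

```python
import hashlib

artifacts = {}  # digest -> metadata; the content hash is the primary key
labels = {}     # human-friendly label -> digest


def register(artifact: bytes, label: str) -> str:
    """Record an artifact under its digest and link a label to that digest."""
    digest = hashlib.sha256(artifact).hexdigest()
    artifacts[digest] = {"size": len(artifact)}
    labels[label] = digest
    return digest


def verify(label: str, artifact: bytes) -> bool:
    """Check that the bytes presented under a label match the recorded digest."""
    expected = labels.get(label)
    actual = hashlib.sha256(artifact).hexdigest()
    return expected is not None and expected == actual


register(b"contents of release 1.0.0", "1.0.0")

print(verify("1.0.0", b"contents of release 1.0.0"))  # True
print(verify("1.0.0", b"something else entirely"))    # False: mislabeled
```

Humans keep asking for "1.0.0", while the machine checks that what it received under that name is the content that was originally registered.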

