How Plakar Works

Plakar is built on top of Kloset, an immutable data store engine designed specifically for backup workloads. Understanding how Plakar processes and stores your data helps you make informed decisions about backup strategies and troubleshoot issues when they arise.

This page explains the technical foundation of Plakar without step-by-step instructions. If you’re looking for practical guidance, see the Guides section.

Kloset Store

Kloset is the immutable data store engine at the heart of Plakar. It is the library that Plakar uses to store and manage backups.

The simplest way to see Kloset is as a “storage API” that Plakar uses to store backups. It is not a REST API of the kind you might be familiar with, but rather a library that exposes a set of functions to store and retrieve data. For example, when making a backup, Plakar uses Kloset to retrieve the content and the metadata to be backed up, chunk the content into smaller pieces, compress and encrypt those pieces, group them into larger files called “packfiles”, and finally write those packfiles to a storage backend such as a local filesystem, an object storage service, or a remote server.

Plakar is a tool built on top of Kloset, which provides a command-line and a web interface to manage your backups, with additional features such as scheduling, activity reporting, and more.

Without Plakar, you would have to write your own code to use Kloset. With Plakar, you get an easy-to-use tool to implement a backup strategy, whether for your personal laptop or your large-scale infrastructure.

If you want to dig deeper into Kloset and see all the features it provides, read the Kloset blog post.

Backup steps

When you run a backup command, Plakar will use the integration you specified to retrieve the content to be backed up.

For example, the built-in filesystem integration will scan the directory you specified, and retrieve the content and metadata of the files and directories to be backed up.

There are several steps that Plakar (actually, Kloset) will perform to create a backup:

  1. Chunking: The content is split into smaller pieces called “chunks”. If you attempt to back up a large video file for example, it will be split into smaller chunks to make it easier to store and manage.
  2. Deduplication: The chunks are deduplicated: if a chunk already exists in the store, it is not stored again. This is a key feature of Plakar that saves space and time when backing up large files or directories that contain many duplicate files. It is also the reason why you can create multiple snapshots of the same directory without consuming much more space than a single snapshot. This is why chunking matters so much: without it, adding a single byte to a file would mean storing the whole file again. With chunking, only the chunk containing the changed byte is stored again, and the rest of the file is reused as is.
  3. Compression: The chunks are compressed to save space.
  4. Encryption: The chunks are encrypted. We call the encrypted chunks “blobs”. These blobs are sent to the storage backend, which acts as a “dumb storage”: it does not know anything about the content of the blobs, it just stores them as they are. This is what we call “real end-to-end encryption”: the storage backend does not have access to the content of the backups, and only you can decrypt them.
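
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kloset's implementation: it uses tiny fixed-size chunks instead of CDC, SHA-256 as the chunk identifier, `zlib` as a stand-in for LZ4, and it leaves the encryption step as a comment because the Python standard library has no suitable cipher. The names `backup`, `restore`, and `CHUNK_SIZE` are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only (Kloset uses CDC)


def backup(data, store):
    """Chunk, deduplicate and compress `data`; return the ordered chunk IDs."""
    chunk_ids = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]        # 1. chunking
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in store:                     # 2. deduplication
            store[chunk_id] = zlib.compress(chunk)    # 3. compression (zlib stand-in)
            # 4. encryption would happen here, before the blob leaves the machine.
        chunk_ids.append(chunk_id)
    return chunk_ids


def restore(chunk_ids, store):
    """Rebuild the original bytes from the ordered chunk IDs."""
    return b"".join(zlib.decompress(store[cid]) for cid in chunk_ids)


store = {}
first = backup(b"AAAABBBBAAAACCCC", store)
print(len(first), len(store))  # 4 3
```

The input contains the chunk `AAAA` twice, so four chunks are referenced but only three are stored.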

Independent snapshots

In a Kloset store, each backup is stored as an independent snapshot. This means that you can create multiple snapshots of the same data source without consuming much more space than a single snapshot. Each snapshot contains the content and metadata at the time of the backup, and can be restored independently of other snapshots.

These snapshots are not incremental backups: they do not depend on any other snapshot. You can delete a snapshot without affecting any other snapshot, and you can compare any two snapshots to see the differences between them.
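
One way to picture how snapshots can share space and still be independent is a shared pool of chunks plus one list of chunk IDs per snapshot. The sketch below is a simplified model built on that assumption; the data structures and function names are hypothetical and do not reflect Kloset's actual layout.

```python
import hashlib

chunks = {}      # chunk ID -> chunk bytes, shared by every snapshot
snapshots = {}   # snapshot name -> ordered list of chunk IDs


def snapshot(name, pieces):
    ids = []
    for piece in pieces:
        cid = hashlib.sha256(piece).hexdigest()
        chunks.setdefault(cid, piece)   # stored once, however often it is referenced
        ids.append(cid)
    snapshots[name] = ids


def delete(name):
    del snapshots[name]
    live = {cid for ids in snapshots.values() for cid in ids}
    for cid in list(chunks):
        if cid not in live:             # drop only chunks no snapshot references
            del chunks[cid]


def restore(name):
    return b"".join(chunks[cid] for cid in snapshots[name])


snapshot("monday", [b"report v1", b"photo"])
snapshot("tuesday", [b"report v2", b"photo"])
delete("monday")
print(restore("tuesday"))  # b'report v2photo'
```

Deleting `monday` removes only the chunk that no other snapshot references; `tuesday` still restores in full because it lists every chunk it needs.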

Content Defined Chunking (CDC)

As seen in the Backup steps section, Kloset uses Content Defined Chunking (CDC) to split the content into smaller pieces called “chunks”.

To understand why chunking is important, consider the following: let’s say you have a large video file that you want to back up. If you didn’t chunk the content, then adding a single byte to the end of the file would mean that the whole file would have to be stored again. This would be very inefficient, especially if you have large files that change frequently.

Now, let’s understand why CDC is important. In our video example, what would happen if we added a single byte to the middle of the file? With a fixed-size chunking algorithm, all the subsequent chunks would be considered as changed, and they would have to be stored again.

CDC is a technique that uses the content of the file, rather than fixed offsets, to decide where each chunk ends. This means that if you add a single byte to the middle of a file, typically only the chunk containing that byte is considered changed and stored again. The chunk boundaries after it fall in the same places in the content, so the rest of the file remains unchanged. The “single byte change” is only an example: the same applies to larger changes, such as adding or removing a few lines of text in a text file, or changing a few pixels in an image file.
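
The difference between the two approaches can be measured with a toy experiment. The chunker below is a deliberately simple sketch, not the algorithm used by go-cdc-chunker: it cuts wherever the CRC32 of the previous `WINDOW` bytes has its low bits at zero, and it enforces no minimum or maximum chunk size. All names and constants are assumptions chosen for illustration.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes of context that decide whether a boundary falls here
MASK = 0x3F   # cut when the low 6 bits of the window hash are zero
FIXED = 64    # chunk size for the fixed-size comparison


def fixed_chunks(data):
    return [data[i:i + FIXED] for i in range(0, len(data), FIXED)]


def cdc_chunks(data):
    chunks, start = [], 0
    for end in range(WINDOW, len(data)):
        # The decision depends only on the WINDOW bytes before `end`,
        # never on the absolute offset in the file.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    chunks.append(data[start:])
    return chunks


def new_chunks(before, after, chunker):
    """Count chunks of `after` whose hash is absent from `before`."""
    known = {hashlib.sha256(c).digest() for c in chunker(before)}
    return sum(hashlib.sha256(c).digest() not in known for c in chunker(after))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
edited = original[:4096] + b"!" + original[4096:]   # one byte inserted mid-file

fixed_new = new_chunks(original, edited, fixed_chunks)
cdc_new = new_chunks(original, edited, cdc_chunks)
print("fixed-size chunks to re-store:", fixed_new)
print("content-defined chunks to re-store:", cdc_new)
```

With fixed-size chunks, every chunk after the insertion point shifts by one byte and has to be stored again. With the content-defined chunker, only boundaries whose window overlaps the inserted byte can move, so at most `WINDOW + 1` chunks can change, and usually far fewer.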

To get a better understanding of how CDC works and to know more about go-cdc-chunker, the library we open-sourced to implement CDC in Kloset, read the go-cdc-chunker blog post.

Compression

Kloset uses compression to save space when storing backups. The compression is applied to the chunks before they are encrypted and stored in the storage backend.

Plakar currently uses LZ4, a fast compression algorithm that is well suited for backups.

Backing up encrypted data

When backing up data, you have to make a choice: do you want to back up encrypted data or not?

If you choose to back up encrypted data, you defeat the deduplication and compression features of Kloset. Whenever you change a single byte in an encrypted file, the whole file is considered changed and is stored again. This is because an encryption algorithm produces a completely different output when even a single byte of its input changes. Encrypted output also looks like random data, which leaves compression nothing to work with.

Still, there might be situations where you want to back up encrypted data; just be aware that you will not benefit from all the optimizations that Kloset provides.
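
The effect on compression is easy to observe. The sketch below makes two assumptions: it uses `zlib` from the Python standard library as a stand-in for LZ4, and it uses random bytes as a stand-in for ciphertext, since the output of a sound cipher is designed to be indistinguishable from random data.

```python
import os
import zlib

plain = b"2024-01-01 INFO backup completed successfully\n" * 200
# Stand-in for an encrypted file of the same size.
ciphertext_like = os.urandom(len(plain))

plain_ratio = len(zlib.compress(plain)) / len(plain)
cipher_ratio = len(zlib.compress(ciphertext_like)) / len(ciphertext_like)
print(f"plain text compresses to {plain_ratio:.0%} of its size")
print(f"random bytes compress to {cipher_ratio:.0%} of their size")
```

The repetitive plain text shrinks to a small fraction of its size, while the random bytes do not shrink at all.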

Tamper-evident snapshots

The data stored in Kloset is tamper-evident. This does not mean the storage backend is “immutable” in the sense that it cannot be changed. If you store data on a hard drive, for example, it can be changed by anyone with access to the hard drive, and in any case, data can be lost or corrupted due to hardware failures.

When we say that the data is tamper-evident, we mean that Kloset uses cryptographic techniques to ensure that any change to the data will be detected. Each snapshot is signed with a cryptographic hash, and any change to the data will result in a different hash. This means that if someone tries to change the data, you will be able to detect it by checking the hash of the snapshot.

From there, you can decide what to do with the tampered snapshot: should you distrust the whole store and use another copy, or should you ignore what may be a single tampered item and continue using the store as is? This is up to you, but Kloset will always let you know if something is wrong.
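
The detection principle can be sketched with plain SHA-256. This is an illustration of hash-based tamper evidence in general; the `snapshot_id` function and the way it combines chunk hashes are assumptions for the example, not a description of the structures Kloset actually uses.

```python
import hashlib


def snapshot_id(chunks):
    """Hash the ordered chunk hashes into a single identifier."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(hashlib.sha256(chunk).digest())
    return h.hexdigest()


chunks = [b"config.yaml contents", b"database dump"]
expected = snapshot_id(chunks)   # recorded when the snapshot is created

tampered = [b"config.yaml contents", b"database dumq"]   # one byte altered
print(snapshot_id(chunks) == expected)     # True
print(snapshot_id(tampered) == expected)   # False
```

Any change to any chunk changes the identifier, so comparing the recomputed value with the recorded one reveals the modification without needing a second copy of the data.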

Integration

We designed Plakar to be as flexible as possible. Nowadays, you not only want to back up your filesystem, but also your databases, your cloud storage, your remote servers, your SaaS applications, and more. To achieve this, Plakar uses the concept of “integrations”.

An integration provides a storage connector, a source connector, a destination connector, or any combination of those.

These integrations are implemented as plugins, and we made the process of installing and using them as easy as possible. We also provide an easy way to create your own integration if you need to back up a data source that is not supported by Plakar out of the box. For example, the FTP source connector is about 80 lines of code, imports included.

Check out the list of available integrations to see what is already available.

Storage connector

The storage connector is the part of the integration that allows Plakar to host the Kloset store on a specific storage backend. It is responsible for storing the blobs (the encrypted chunks) in the storage backend, and for retrieving them when needed.

For example, Plakar has a built-in storage connector for filesystems and S3-compatible object storage services, but it is possible to install integrations to host your Kloset store on Google Drive or Dropbox.

Source connector

The source connector is the part of the integration that allows Plakar to retrieve the content to be backed up. It is responsible for scanning the data source, retrieving the content and metadata, and passing it to Kloset for processing.

For example, Plakar has a built-in source connector for SFTP servers, which allows you to back up files from a remote server over SSH. It is also possible to install the integration for Notion to back up your Notion pages, or the integration for Google Photos to make sure your memories are safe.

Destination connector

The destination connector is the part of the integration that allows Plakar to restore the content from a backup. It is responsible for retrieving the content and metadata from Kloset, and restoring them to the target location.

For example, Plakar has a built-in destination connector for filesystems, which allows you to restore files to a local directory. Similarly, it is possible to install the integration for Google Drive, Dropbox, or any other cloud storage service to restore your backups to the cloud.
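
To make the division of responsibilities concrete, here is a purely hypothetical sketch of the three connector roles, with in-memory implementations. None of these class or method names come from Plakar; its real connector interfaces differ. The sketch only shows how a source feeds a store and how a store feeds a destination.

```python
from typing import Iterator, Protocol


class SourceConnector(Protocol):
    def scan(self) -> Iterator[tuple[str, bytes]]:
        """Yield (path, content) pairs from the data source."""


class StorageConnector(Protocol):
    def put(self, key: str, blob: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class DestinationConnector(Protocol):
    def write(self, path: str, content: bytes) -> None: ...


class MemorySource:
    def __init__(self, files):
        self.files = files

    def scan(self):
        yield from self.files.items()


class MemoryStorage:
    def __init__(self):
        self.blobs = {}

    def put(self, key, blob):
        self.blobs[key] = blob

    def get(self, key):
        return self.blobs[key]


class MemoryDestination:
    def __init__(self):
        self.files = {}

    def write(self, path, content):
        self.files[path] = content


def backup(source: SourceConnector, storage: StorageConnector) -> list[str]:
    paths = []
    for path, content in source.scan():
        # A real store would chunk, deduplicate, compress and encrypt here.
        storage.put(path, content)
        paths.append(path)
    return paths


def restore(paths, storage: StorageConnector, destination: DestinationConnector) -> None:
    for path in paths:
        destination.write(path, storage.get(path))


source = MemorySource({"notes.txt": b"hello", "todo.txt": b"backups"})
storage, destination = MemoryStorage(), MemoryDestination()
restore(backup(source, storage), storage, destination)
print(destination.files == source.files)  # True
```

Because each role is a separate interface, a backup taken from one kind of source can be kept on any storage backend and restored to any destination.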
