Restic: Support asymmetric backups

Created on 14 May 2015  ·  44 Comments  ·  Source: restic/restic

This issue should collect use cases for asymmetric backups. In this situation, restic is able to efficiently create new backups, but unable to decrypt/restore and/or modify old backups. Please add your use case to this issue if you have one. I think we have enough use cases, thanks!

Summary (2018-04-28)

At the moment, restic (mostly) interacts with "dumb" storage (local, s3, b2, gcs, azure, everything except for the REST server with --append-only). restic can save, list, get and delete data in a backend. This is required for a backup, so the needed credentials to access the backend need to be present. On the upside, restic can use almost any storage, there are very few restrictions. On the downside, once attackers gain access to a server, they can easily extract the credentials for accessing the backend and the restic password, giving them all possibilities: Decrypt historical data from the repository, modify data, delete the whole repository.

When we add asymmetric cryptography, the only difference for attackers in such a situation is that they cannot decrypt historical data from the repository. Everything else, especially deleting all the data, is still possible. So "just add asymmetric crypto" is not the whole story.

The other idea is to not access the "dumb" storage directly, but indirectly via a custom server implementation. We've played around with this idea and added the --append-only option for the REST server, which can be seen as an "adapter" for accessing the "dumb" storage on the local hard disk.
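To make the "adapter" idea concrete, here is a minimal sketch of a wrapper that sits in front of a dumb store and refuses overwrites and deletes. The class name, the in-memory dict, and the exception for lock files are assumptions made for illustration; this is not the REST server's actual code.

```python
class AppendOnlyStore:
    """Hypothetical adapter: new objects may be added, existing ones are frozen."""

    def __init__(self):
        # A dict stands in for the "dumb" storage (local disk, S3, ...).
        self._blobs = {}

    def save(self, name: str, data: bytes) -> None:
        # Overwriting is as destructive as deleting, so both are refused.
        # Assumption: lock files are exempt so stale locks can be replaced.
        if name in self._blobs and not name.startswith("locks/"):
            raise PermissionError(f"refusing to overwrite {name}")
        self._blobs[name] = data

    def load(self, name: str) -> bytes:
        return self._blobs[name]

    def list(self, prefix: str = "") -> list:
        return sorted(n for n in self._blobs if n.startswith(prefix))

    def delete(self, name: str) -> None:
        # Assumption: only lock files may be removed.
        if not name.startswith("locks/"):
            raise PermissionError(f"refusing to delete {name}")
        del self._blobs[name]
```

A client holding only a handle to such an adapter can still read everything, which is why this addresses deletion but not confidentiality.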

The only exception from the first paragraph of this summary, and an implementation of the "adapter", is the rclone backend: It can be accessed e.g. via SSH (restic -o rclone.program='ssh user@server', with a hard-coded call to rclone via ForceCommand in .ssh/authorized_keys for the SSH key the user logs in with). The cloud access credentials reside on the server; the user and the machine running restic won't have access to them. If --append-only is specified in the call to rclone, data can only be added.

Having "non-dumb" storage alone won't help against attackers reading data from the repository (at least not without changing the repo format), but will prevent deleting all data in the repo.

So, in conclusion, to defend best against attackers taking over a server that uses restic for backups, I think we would need to implement both (non-dumb storage and asymmetric crypto). That's a long-term goal :)

feature enhancement tracking

All 44 comments

A brief recap from the previous discussion:

  • Useful for backups of email servers or remote backups of log data in the event that such a server is compromised. Sensitive data that has been shredded from the server may be present in the backup, and it may be useful to be able to deny an attacker access to such historical data. It was simultaneously discussed that it could be useful to have a "blob network server" with a capability system that allows you to configure append-only / read-only / read-write access as appropriate. Having asymmetric cryptography in place would remove the need for having _yet another_ set of symmetric keys for issuing capabilities, and would likely simplify the implementation of a capability system as well.
  • In a setting with several machines, _one_ backup key can be used to access several backup datasets without having to manage a multitude of symmetric keys. If a cryptosystem such as Daniel Bernstein's Curve25519 is used, such a key may be derived from a passphrase, meaning you can use one master passphrase to manage backups created by a (potentially large) number of different security domains. Although something similar can of course be achieved with n symmetric keys and an encrypted key storage.
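As a rough illustration of the "one master passphrase" point above, the following sketch derives a 32-byte master secret from a passphrase and then one sub-key per security domain. The function names, the salt handling, and the iteration count are assumptions for illustration; a 32-byte value like this could serve as seed material for a Curve25519 private key, but that step is not shown.

```python
import hashlib
import hmac


def master_secret(passphrase: str, salt: bytes) -> bytes:
    # Stretch the passphrase into 32 bytes. PBKDF2 is used here only because
    # it is in the standard library; the parameters are illustrative.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000, dklen=32)


def domain_key(master: bytes, domain: str) -> bytes:
    # Derive an independent-looking key per security domain (hypothetical
    # naming scheme: the domain is just a label such as a host name).
    return hmac.new(master, domain.encode(), hashlib.sha256).digest()


if __name__ == "__main__":
    m = master_secret("correct horse battery staple", b"example-salt")
    print(domain_key(m, "mail.example.org").hex())
    print(domain_key(m, "logs.example.org").hex())
```

The derivation is deterministic, so the same passphrase and salt always reproduce the same keys without storing them anywhere.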

@heipei FYI you might want to subscribe to this issue

well, the most obvious and pressing use case is not having the backups erased by an attacker who has broken into your production system (backup client) and knows how to use restic from there.

That's hardly possible just by using asymmetric crypto. The point here is that old backup contents (not necessarily metadata) should not be available to an attacker who has broken into a server.

Data confidentiality (write-only) would be achievable for these scenarios with the implementation of asymmetric crypto.

Data integrity is a slightly different beast:
While you can use asymmetric cryptography (and a one-time signature scheme) to prove that a given dataset has not been altered, you cannot prevent its complete removal or replacement. That is a problem that requires a storage system that is append-only (and read-only, for some parts).
Thanks to discretionary access control, that is relatively straightforward to implement on *nix with a backend such as ssh (using commands like chmod, chown, chattr +i, chattr +a to configure the access policies). Append-only backups don't even need to be encrypted to prevent attackers from reading them (that's part of what rsyslog does, for example), but that suddenly makes the backup server an interesting target, and cloning sensitive data to more devices doesn't exactly improve the odds of maintaining data confidentiality.
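To illustrate the one-time signature idea mentioned above, here is a small sketch using a Lamport signature over SHA-256. Lamport is my choice of example scheme, not something the thread specifies, and the function names are hypothetical. It shows tamper detection only; as noted, it cannot stop someone from removing the data.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    # The 256 bits of the message digest, most significant bit first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - (i % 8))) & 1 for i in range(256)]


def lamport_keygen():
    # Secret key: 256 pairs of random values. Public key: their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def lamport_sign(sk, message: bytes) -> list:
    # Reveal one secret per digest bit. A key pair must sign only ONE message.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def lamport_verify(pk, message: bytes, signature: list) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(sig) == pk[i][bit]
        for i, (bit, sig) in enumerate(zip(_bits(message), signature))
    )
```

A verifier holding only the public key can check that a dataset is unchanged, but an attacker with write access can still replace both data and signature unless the public key is kept elsewhere.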

Having asymmetric crypto implemented would enable people to do this kind of backups on dumb, untrusted systems, one of the things restic is all about (apart from the improved deduplication and all the other nice features). I hope this elaboration clarifies my point a bit.

This is of interest to us for the two cases described in https://github.com/restic/restic/issues/187#issuecomment-101974306. I'm subscribing to this thread to keep an eye on this.

Regarding data integrity: AFAIK, you can achieve append-only behaviour on Amazon S3 by only granting your IAM credentials the PutObject and GetObject permissions, but withholding the DeleteObject permission.

Regarding data confidentiality, I wanted to mention an attack scenario that is enabled by the use of symmetric crypto:
If an attacker has control over a victim's user account at a point in time where the victim uses restic to perform a backup, he can:

  • steal the victim's restic key
  • steal the victim's IAM credentials/sftp login/...

The attacker can then immediately wipe any trace of his attack from the victim's system making detection less likely. Since he has stolen the restic key and the credentials for the remote storage system, the victim will conveniently deliver any (backed-up) future data to him: Our attacker only needs to download and decrypt new backups from the remote storage system from time to time.

Asymmetric crypto could help prevent this by allowing the victim to store the key for restoring backups offline.
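A toy sketch of what "store the restore key offline" could look like: the backup client holds only a public key and can encrypt, while decryption needs the private key. This uses a bare Diffie-Hellman exchange in a small multiplicative group plus a hash-derived keystream purely to show the data flow. It is NOT secure and not restic's design; all names are hypothetical, and a real implementation would use a vetted primitive such as X25519 with an authenticated cipher.

```python
import hashlib
import secrets

# Toy group parameters (assumption for illustration only, not a safe choice).
P = 2**255 - 19
G = 2


def keygen():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def _keystream(shared: int, length: int) -> bytes:
    # Expand the shared secret into as many bytes as the message needs.
    return hashlib.shake_256(shared.to_bytes(32, "big")).digest(length)


def seal(recipient_pub: int, plaintext: bytes):
    # The client creates a fresh ephemeral key per message and forgets it,
    # so it cannot decrypt its own output later.
    eph_priv, eph_pub = keygen()
    shared = pow(recipient_pub, eph_priv, P)
    stream = _keystream(shared, len(plaintext))
    return eph_pub, bytes(a ^ b for a, b in zip(plaintext, stream))


def open_sealed(recipient_priv: int, eph_pub: int, ciphertext: bytes) -> bytes:
    shared = pow(eph_pub, recipient_priv, P)
    stream = _keystream(shared, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

Usage: generate the key pair on a trusted machine, copy only the public key to the backup client, and keep the private key offline until a restore is needed. The sketch has no integrity protection and says nothing about how deduplication would work without read access.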

> Regarding data integrity: AFAIK, you can achieve append-only behaviour on Amazon S3 by only granting your IAM credentials the PutObject and GetObject permissions, but withholding the DeleteObject permission.

Unfortunately PutObject also gives you privileges to overwrite a previously uploaded file. So, maybe it's not full data integrity?

Asymmetric crypto would also allow the use of an OpenPGP key, for example one stored on a YubiKey.

Today I realized that I don't really want support for asymmetric encryption in restic, but support to NOT store the key in the repository. I understand that using asymmetric encryption efficiently would mean that restic can upload new backups without the ability to read the repository, which is quite tricky.

So, for me, it would be fine if I could handle the symmetric key without restic, and it were not uploaded in any way to the repository, no matter what complex KDF is used.

My use case is as a system administrator for a number of servers.

Asymmetric backups are the only way to protect the backups in the scenario of a server compromise where an attacker maliciously destroys (or ransomware encrypts) all the data (including the backups).

This needs server side support through either an append only storage layer or a server daemon similar to the way rdiff-backup has the --restrict-update-only option. Currently I work around this by using read-only snapshots of the backup repository on the backup server (accessed via sftp).

(Perhaps relevant): Append-only can be achieved on Linux through the append-only flag on a directory (which disables unlinking) along with the immutable flag on the files. The commands for setting those flags are chattr +a /path/to/directory and chattr +i /path/to/directory/myfile01, respectively.

my use case here is #533 - unattended backups. as stated there, asymmetric crypto is only one of the ways of doing this, but it seems like the first obvious solution to the problem.

In a scenario where the repo is on a remote server - only local commands on the repo should be able to forget/prune.

The restic backup should connect with a unique key for that system, with restore/backup privileges only.

> In a scenario where the repo is on a remote server - only local commands on the repo should be able to forget/prune.

This should be accomplished by restricting access to delete/modify files on the repo level. I think it's out of scope (and not even secure) for restic to manage these permissions. After all, someone could delete the repo or even the keys, rendering the whole repo useless, irrespective of whether that action were allowed by the restic client.

Regarding preventing backup data from being destroyed by someone that hacks the server: rest-server has recently gained an "append only mode" with PR https://github.com/restic/rest-server/pull/28 that prevents exactly this.

My use case is having many systems back up to the same repository, getting the advantage of deduplication across all those machines, but a compromise of one system (and its backup script) not allowing the attacker to read the backups of the other systems.

The feature I'm looking for is to have "backup" keys that will enable a system to write (backup) and read (restore) but cannot do any admin (such as prune, add keys, or even see the existence of other keys (users) or snapshots not associated with $backup_key). (Though a side-channel attack might be possible by comparing backup times, it does not matter to me if they can determine the existence of data, only that they cannot ransomware my data and can't view other users.) I would expect the holder of a backup-(only)-key to be able to roll their own passphrase(s) forward. So unlike michbsd's request I would be able to admin from a non-local machine with a privileged key. (Having used SELinux for years I'm now a fan of the granularity of M.A.C.) Thanks for reading. (Sorry if this should have its own issue.) With this feature #ResticKillsRansomware

> With this feature #ResticKillsRansomware

In general, pull-oriented backups (as opposed to push-oriented ones) solve ransomware, right? :)

Possibly, but that then gives the backup server remote access and adds another attack vector. The way I've designed it, my backup server is for preserving data and should never have access to my production environment. Clear demarcation of functional domains.

I'm guessing Restic isn't the solution then, since it's designed to operate on backup repos directly.

Maybe you could do something with a middleman server of sorts. Have your production machines upload tarballs to a server, then have another system download the tarballs, extract them, and back up the contents locally. Each side only has access to the middle server. That would be fairly simple to do without any modification to Restic. It would probably be safer and more robust as well, as any bugs in a receive-only Restic mode could make the backups vulnerable to compromised backup clients.

> The feature I'm looking for is to have "backup" keys that will enable a system to write (backup) and read (restore) but cannot do any admin (such as prune, add keys, or even see the existence of other keys (users) or snapshots not associated with $backup_key). (Though a side-channel attack might be possible by comparing backup times, it does not matter to me if they can determine the existence of data, only that they cannot ransomware my data and can't view other users.) I would expect the holder of a backup-(only)-key to be able to roll their own passphrase(s) forward. So unlike michbsd's request I would be able to admin from a non-local machine with a privileged key. (Having used SELinux for years I'm now a fan of the granularity of M.A.C.) Thanks for reading. (Sorry if this should have its own issue.) With this feature #ResticKillsRansomware

While this might not be exactly what you are asking for, you could take a look at rest-server. It has an append-only mode which prevents deletion and modification of existing backups.

> While this might not be exactly what you are asking for, you could take a look at rest-server. It has an append-only mode which prevents deletion and modification of existing backups.

I didn't even realize that existed. D:

I see two structural things (and slightly different attacker models) that I'd like to dump here.

At the moment, restic (mostly) interacts with "dumb" storage (local, s3, b2, gcs, azure, everything except for the REST server with --append-only). It can save, list, get and delete data in a backend. This is required for a backup, so the needed credentials to access the backend need to be present. On the upside, restic can use almost any storage, there are very few restrictions. On the downside, once attackers gain access to a server, they can easily extract the credentials for accessing the backend and the restic password, giving them all possibilities: Decrypt historical data from the repository, modify data, delete the whole repository.

When we add asymmetric cryptography, the only difference for attackers in such a situation is that they cannot decrypt historical data from the repository. Everything else, especially deleting all the data, is still possible. So "just add asymmetric crypto" is not the whole story.

The other idea is to not access the "dumb" storage directly, but indirectly via a custom server implementation. We've played around with this idea and added the --append-only option for the REST server, which can be seen as an "adapter" for accessing the "dumb" storage on the local hard disk. I have several ideas on how to improve on that idea, not necessarily with the REST server.

For example, I'd like to define a protocol for a backend that is spoken over a pair of file descriptors, e.g. stdin/stdout. We can then implement this in a program which is run over SSH on a remote machine, just like we do for the sftp backend. The server implementation can then decide where to store the data (local, s3, b2, whatever) and which restrictions apply (e.g. "only read old or add new data", without the ability to delete anything besides maybe lock files). The server could for example be started via a ForceCommand on login via SSH with a particular user account or SSH key.
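A minimal sketch of what such a server could look like, assuming a made-up wire format of one JSON object per line (fields "op", "name", "data" are hypothetical, as is the rule that only names under locks/ may be deleted or overwritten). No such protocol is defined in this issue; this only illustrates the shape of the idea.

```python
import base64
import json


def serve(reader, writer, store: dict) -> None:
    """Read one JSON request per line from reader, write one JSON reply per line."""
    for line in reader:
        if not line.strip():
            continue
        req = json.loads(line)
        op, name = req.get("op"), req.get("name", "")
        if op == "save":
            # Adding new data is allowed; replacing existing data is not.
            if name in store and not name.startswith("locks/"):
                reply = {"ok": False, "error": "exists"}
            else:
                store[name] = base64.b64decode(req["data"])
                reply = {"ok": True}
        elif op == "load":
            if name in store:
                reply = {"ok": True, "data": base64.b64encode(store[name]).decode()}
            else:
                reply = {"ok": False, "error": "not found"}
        elif op == "list":
            reply = {"ok": True, "names": sorted(n for n in store if n.startswith(name))}
        elif op == "delete" and name.startswith("locks/"):
            store.pop(name, None)
            reply = {"ok": True}
        else:
            # Everything else, including deletes outside locks/, is refused.
            reply = {"ok": False, "error": "not permitted"}
        writer.write(json.dumps(reply) + "\n")
        writer.flush()
```

Started as a forced command over SSH, this would be invoked roughly as serve(sys.stdin, sys.stdout, store), with the dict replaced by whatever storage the server operator chooses.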

Having "non-dumb" storage alone won't help against attackers reading data from the repository (at least not without changing the repo format), but will prevent deleting all data in the repo.

So, in conclusion, to defend best against attackers taking over a server that uses restic for backups, I think we would need to implement both (non-dumb storage and asymmetric crypto). That's a long-term goal :)

I'm going to copy this text to the first comment in this issue so it is easier to find.

yeah, reflecting on this further, i agree that asym crypto is not that useful to defend against takeovers - it's more useful for unattended backups (#533).

having a native communication protocol could be useful, but i'm not sure what you gain from that over the current REST server - could you expand on that? attic/borg went that way: there's a client-to-server "proprietary" (as in, borg-specific) protocol there and it is possible to implement some restrictions for clients. and yes, this relies on ForceCommand and "borg serve" restricted flags... there are some relevant notes in the borg docs about this and drawbacks you should be aware of.

and of course, the most natural way to protect backups from a compromised client is to simply not allow the client to perform the backups itself, but instead have the server pull files from the backups, "bacula-style" ("It comes in the night and sucks the essence from your computers", for those who remember that catchy phrase). there doesn't seem to be a well documented or elegant way to do this in borg either, the FAQ points to https://github.com/borgbackup/borg/issues/900 as a discussion on the topic. here this is tracked in #299, which hadn't been mentioned here yet.

so long story short, I would keep the focus of asymmetric crypto support simple: make it easier to have offsite key storage and automated backups. there are other ways of securing compromised clients, and I think pull support is the most interesting one. in fact, in my optimal backup solutions, i have all clients pushing their backups to a central server, then an offsite server pulling from the main backup server. this way:

  1. the backup server doesn't need root access on all machines
  2. yet the compromise of a machine is still recoverable, even if they are able to mess with the backups

i actually find it strange that this issue was turned into "i want to secure against client takeover" - maybe we're confusing the solution with the problem here. :)

Hi,

It seems that this issue is not just about asymmetric crypto backups, but more about different attack vectors.
I didn't read the code, so this is a naive question, but my use case is mostly about being able to back up data without disclosing the secret key (by using the public key of an offline private key held by the backup owner). For that use case, is it easy to implement?

My understanding is that right now all the blobs are encrypted with the same key, and it works great.
If we used asymmetric crypto the way OpenPGP works, each snapshot would generate a symmetric key, encrypt it with the public key, and add it to the repository. But I guess the problem is that to be able to discover what to deduplicate and what to back up, you need to be able to read the info first, hence you would also need the private key. Is that right?
If that is the case, could some zero-knowledge proof help along those lines?

@dolanor please don't add new use cases or questions to this issue, use the forum for questions. Also, it is way too early to talk about implementation details.

I've updated the summary in the first post. The rclone backend has been added in the meantime, this can be used as an "adapter" as described above, and accessed e.g. via SSH.

> On the downside, once attackers gain access to a server, they can easily extract the credentials for accessing the backend and the restic password

I hope this is a typo: you mean the encrypted keyfile material here, no? Hopefully an attacker gaining access to the server doesn't have access to the plaintext password. The worst they can do is try to bruteforce or guess the "user password" which is used to decrypt the master encryption and authentication keys to the repository.

If that is correct, I would highly recommend you change the summary again to clarify, because it sure looks bad when stated that way. :)

> Hopefully an attacker gaining access to the server doesn't have access to the plaintext password. The worst they can do is try to bruteforce or guess the "user password" which is used to decrypt the master encryption and authentication keys to the repository.

I'd guess it depends on the exact scenario: If you're manually entering the password, right. If you're doing scheduled automatic backups, on the other hand, the "user password" will have to be stored somewhere on the server.

And, of course, an attacker might swap out the Restic binary with one that leaks the entered password and wait for you to enter it. You can't trust a compromised system.

> the "user password" will have to be stored somewhere on the server.

by "server" do you mean "the machine we are running restic on that we save the data of" or "the machine that receives/stores the data from the backup"?

it's rather ambiguous, and the source of my concern: I do not mind the backup client (the machine we are backing up that is running restic) having the password in cleartext: the whole dataset is there anyways so if that's compromised, the data is compromised anyways. but i sure hope that the backup server doesn't have access to the cleartext!

> by "server" do you mean "the machine we are running restic on that we save the data of" or "the machine that receives/stores the data from the backup"?

The machine we are running restic on that we save the data of.

I see your point, you're right, it's ambiguous. My understanding from everything I know about Restic's model is the same as yours, I am quite certain about that, but I can't give you the definite confirmation you want.

The summary mentions the --append-only option for REST server. Perhaps that should remain as the only officially recommended method of append-only backup, but it might be good to document which files in restic need to be writable for normal operation to help with figuring out how to set up other approaches.

I believe that restic backup would work okay if data, index, keys, and snapshots allowed file creation but not modification or deletion (and config was also protected). However, I think that locks would need to allow deletion so that the repository does not get permanently locked. Also, some append-only implementations (like the attribute for ext4 and xfs file systems) are not recursive, so the 256 two-character subdirectories of data would need to be pre-generated first and then the attribute would need to be set on them.
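The pre-generation step described above could be sketched as follows. The policy table simply restates the guess in this comment (it is not verified against restic's behaviour), the function name is hypothetical, and setting the append-only attribute itself (e.g. with chattr +a, which needs root) is deliberately left out.

```python
import os

# Access policy per top-level repository entry, as suggested in the comment
# above. This is an assumption, not documented restic behaviour.
POLICY = {
    "data": "create-only",
    "index": "create-only",
    "keys": "create-only",
    "snapshots": "create-only",
    "locks": "create-and-delete",
    "config": "read-only",
}


def pregenerate_data_dirs(repo: str) -> list:
    """Create data/00 ... data/ff so a non-recursive attribute can be set on each."""
    created = []
    for i in range(256):
        path = os.path.join(repo, "data", f"{i:02x}")
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

After this runs, an administrator would apply the append-only attribute to each returned directory, since new subdirectories could no longer be given the attribute by the unprivileged backup user.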

Some backends like S3 do not support append-only but do support object versioning which could achieve the same effect. However, this requires careful checking of the access control model. For example, B2 has lifecycle rules which allow object versioning, but the API key necessary to back up to B2 has the capability of modifying the lifecycle rules (B2 does not really have much of a permission system yet).

Aside: I may be missing something, but if asymmetric encryption is just protecting historical data from an attacker who has compromised the client it seems like a low priority. It would be nice to have but in most cases the current data is more valuable than previous versions (though sometimes something valuable is accidentally backed up, deleted, but not purged).

@willsALMANJ good observations. For S3 I wonder if the object versions could be recorded to allow fetching a coherent view of the blobs required to restore a given snapshot (although you can validate them based on their contents, so not super important).

Re: your last paragraph:

  • The main benefit of asymmetric encryption, besides the "decrypt historical things" scenario you mentioned, is being able to store backups from multiple independent machines in the same backup repository without having to provision individual keys (which requires storing the backup key somewhere each time you install a new client machine). If you use a shared key you get the annoying threat scenario where client1 can read client2's data, which is not ideal.

@fd0 I think I have a decent scheme for asymmetric encryption using HMAC addressing with derived shared secrets. Also some ideas about server side garbage collection without leaking data, not sure if you are interested, but if you are I'd be interested to talk about it.

I don't know if I'm missing something here, but I am running restic successfully with this policy setting on S3 storage. It does not prevent an attacker from reading the data, but it prevents him from deleting it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::backup"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::backup/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::backup/locks/*"
    }
  ]
}

The prune/forget commands are then run from a trusted device which has write permissions. I also create two keys in every restic repository. One for the server and one for the trusted device, so that the trusted device can lock out an attacker (but the attacker can't since he cannot delete in keys/*).

Edit: Sorry, overlooked that this has already been discussed. Didn't want to hijack this thread.
PutObject is actually capable of overwriting objects, so this is not a solution to protect backups.
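The overwrite problem can be seen by evaluating the object statements of the policy above with a deliberately simplified model: only "Allow" statements, exact action names, and glob matching on resources. This is an illustrative approximation of IAM evaluation, not the real algorithm, and the helper names are made up.

```python
from fnmatch import fnmatchcase

# The two object-level statements from the policy in this comment.
POLICY = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::backup/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::backup/locks/*",
        },
    ]
}


def _as_list(value):
    return [value] if isinstance(value, str) else list(value)


def allowed(policy: dict, action: str, resource: str) -> bool:
    # Simplification: explicit Deny, conditions and wildcard actions are ignored.
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if action not in _as_list(statement["Action"]):
            continue
        if any(fnmatchcase(resource, pattern) for pattern in _as_list(statement["Resource"])):
            return True
    return False


if __name__ == "__main__":
    blob = "arn:aws:s3:::backup/data/ab/abcdef"
    print("delete blob:", allowed(POLICY, "s3:DeleteObject", blob))
    print("put (overwrite) blob:", allowed(POLICY, "s3:PutObject", blob))
```

Deleting a data blob is denied, but a PutObject on the same key is allowed, and on a bucket without versioning that replaces the existing object.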

@freswa I am not an S3 expert, so I am not certain this is correct, but the point that was made above in this discussion is that the PutObject permission can be used to overwrite your data which is just as bad as deleting it. In my post above, I noted that you could work around this issue by using object versioning (don't give the backup system access to delete versions).

@andrewchambers I'm a bit swamped with other stuff, let's talk about your ideas once we get to actually implement this! Yes, I'm interested ;)

So, this issue is about (eventually) implementing asymmetric backups, not access configuration for the backend storage. Thanks! :)

@fd0 Hopefully this explains what I meant https://packnback.github.io/blog/dedup_and_encryption/

@andrewchambers: Just in case you haven't come across the write-only issue yet (that you mentioned on your site), it's https://github.com/ncw/rclone/issues/2499.

@andrewchambers thanks for writing it down, I'm very interested in what the actual implementation looks like. The blog post left some interesting bits open :)

I like that there will be another contender in the space of free software backup programs, giving users more options is always great!

So an interesting parallel can be made with the two git repository encryption mechanisms.

On the one side you have git-crypt: this uses git smudge/clean filters to (respectively) encrypt/decrypt files between the blob storage and the checked out copy. This works well and is fairly optimal, but it has one glaring hole: the git commits themselves are not encrypted, only the contents of the blobs, which means that filenames, commit logs, authors, dates and other metadata are all stored in the clear. That's a no-go for many use cases and is effective only when you have a public repository where (say) you want to encrypt some bits (but not all).

On the other side you have git-remote-gcrypt: that uses the git remote helpers protocol to encrypt everything that is sent to the server. But that is very inefficient, because the entire repository is re-encrypted on each run, due to the way special remotes work.

Now, those are git-specific implementation challenges, but I think they map well into the problems you might get here. Maybe i'm totally out of my depth here and this parallel is irrelevant, but I figured it might be of interest here...

As an aside, there is a middle ground that could currently be implemented (probably) rather easily: allow keys to be stored outside of the repository.

One of the attack vectors being addressed is that the attacker gets his hands on a key password, and then (since the keys are stored with the repository) he can decrypt a key with ease.

What if we allow specification of a separate key directory, where the key files are stored? This directory could be stored locally on each machine that needs to do backups, and it could itself be backed up to a different cloud provider, or even a QR code (~500 bytes is plenty small to be QR-encoded) for cold offline storage in a safety deposit box, for example.

If the encrypted keys never touch a cloud provider, the attack vector goes away completely. The keys would have to be compromised from the physical premises or exfiltrated with malware, for example.

This _can be done on Restic already_ if a local copy of the repository is kept -- just exclude the keys directory from being synced to the untrusted remote when running rclone. This _cannot_ be done if there is no local copy and restic directly interacts with the untrusted remote.
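As a local stand-in for that sync step, the sketch below copies a repository directory to a mirror while leaving out the top-level keys directory. The function name and paths are hypothetical; with rclone the same effect would come from its exclude filters rather than from this code.

```python
import os
import shutil


def mirror_without_keys(repo: str, mirror: str) -> None:
    """Copy repo to mirror (which must not exist yet), skipping repo/keys."""
    repo_abs = os.path.abspath(repo)

    def ignore(directory, names):
        # Only the top-level "keys" directory is skipped; a directory named
        # "keys" deeper in the tree would still be copied.
        if os.path.abspath(directory) == repo_abs and "keys" in names:
            return ["keys"]
        return []

    shutil.copytree(repo, mirror, ignore=ignore)
```

The mirror then contains config, data, index and snapshots but no key files, so whoever holds the mirror also needs the separately stored keys directory to decrypt anything.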

I think we should apply the single responsibility principle and break things down into 2 tasks:

  1. Keep data safe from decryption.
  2. Keep data safe from unauthorized delete action.

They are 2 different aspects of data safety. Technically they don't have to rely on each other.

For (1), obviously we can "simply add asymmetric encryption support". For (2), I believe there are many possible solutions (for example, as mentioned above, append-only S3 setup).
