Runtime: General low level primitive for ciphers (AES-GCM being the first)

Created on 29 Aug 2017  ·  143 Comments  ·  Source: dotnet/runtime

Rationale

There is a general need for a number of ciphers for encryption. Today's mix of interfaces and classes has become a little disjointed. There is also no support for AEAD-style ciphers, as they need the ability to provide extra authentication information. The current designs are also prone to allocations, which are hard to avoid because the methods return arrays.

Proposed API

A general-purpose abstract base class that will be implemented by concrete classes. This allows for expansion, and by having a class rather than static methods we can add extension methods as well as hold state between calls. The API should allow instances to be recycled to keep allocations low (no need for a new instance each time, and it lets us cater for, say, unmanaged keys). Because the resources being tracked are often unmanaged, the class should implement IDisposable.

public abstract class Cipher : IDisposable
{
    public virtual int TagSize { get; }
    public virtual int IVSize { get; }
    public virtual int BlockSize { get; }
    public virtual bool SupportsAssociatedData { get; }

    public abstract void Init(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv);
    public abstract void Init(ReadOnlySpan<byte> iv);
    public abstract int Update(ReadOnlySpan<byte> input, Span<byte> output);
    public abstract int Finish(ReadOnlySpan<byte> input, Span<byte> output);
    public abstract void AddAssociatedData(ReadOnlySpan<byte> associatedData);
    public abstract int GetTag(Span<byte> span);
    public abstract void SetTag(ReadOnlySpan<byte> tagSpan);
}

Example Usage

(the input/output source is a mythical span based stream like IO source)

using (var cipher = new AesGcmCipher(bitsize: 256))
{
    cipher.Init(myKey, nonce);
    while (!inputSource.EOF)
    {
        var inputSpan = inputSource.ReadSpan(cipher.BlockSize);
        cipher.Update(inputSpan);
        outputSource.Write(inputSpan);
    }
    cipher.AddAssociatedData(extraInformation);
    cipher.Finish(finalBlockData);
    cipher.GetTag(tagData);
}

API Behaviour

  1. If GetTag is called before Finish, an [exception type?] should be thrown and the internal state should be set to invalid
  2. If the tag is invalid on Finish during decryption, an exception should be thrown
  3. Once Finish is called, a call to anything other than one of the Init methods will throw (see the sketch after this list)
  4. Once Init is called, a second call without "finishing" will throw
  5. If the type expects a key to be supplied (a straight "new'd up" instance) and the initial Init call only has an IV, it will throw
  6. If the type was generated, say, from a store-based key and you attempt to change the key via Init rather than just the IV, it will throw
  7. If GetTag is not called before Dispose or Init, should an exception be thrown, to stop the user from accidentally failing to collect the tag?
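
For illustration, a minimal sketch of the Init/Update/Finish state machine implied by behaviours 3 and 4; the guard class and its names are just an assumption, not part of the proposed surface:

```C#
using System;

// Illustrative only: a tiny guard an implementation might keep internally.
public sealed class CipherStateGuard
{
    private enum State { Created, Initialized, Finished }
    private State _state = State.Created;

    public void OnInit()
    {
        if (_state == State.Initialized)
            throw new InvalidOperationException("Init called again without finishing."); // behaviour 4
        _state = State.Initialized;
    }

    public void OnUpdate()
    {
        if (_state != State.Initialized)
            throw new InvalidOperationException("Only Init may be called after Finish."); // behaviour 3
    }

    public void OnFinish()
    {
        OnUpdate();
        _state = State.Finished;
    }
}
```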

Reference dotnet/corefx#7023

Updates

  1. Changed nonce to IV.
  2. Added behaviour section
  3. Removed the single input/output span cases from finish and update, they can just be extension methods
  4. Changed a number of spans to ReadOnlySpan as suggested by @bartonjs
  5. Removed Reset, Init with IV should be used instead
api-suggestion area-System.Security

Most helpful comment

@bartonjs you're literally ignoring all technical and logical analysis and instead using the dates of the Go and libsodium projects as a weak proxy for the real analysis?

No, I am using the advice of professional cryptographers who say it's extraordinarily dangerous and that we should avoid streaming AEAD. Then I'm using information from the CNG team of "many people say they want it in theory, but in practice almost no one does it" (I don't know how much of that is telemetry vs anecdotal from fielding assistance requests). The fact that other libraries have gone the one-shot route simply _reinforces_ the decision.

Why is the demand demonstrated so far on GitHub insufficient?

A few scenarios have been mentioned. Processing fragmented buffers could probably be addressed with accepting ReadOnlySequence, if it seems like there's enough of a scenario to warrant complicating the API instead of having the caller do data reassembly.

Large files are a problem, but large files are already a problem since GCM has a cutoff at just shy of 64GB, which is "not all that big" (okay, it's pretty big, but it's not the "whoa, that's big" that it used to be). Memory-mapped files would allow Spans (of up to 2^31-1) to be utilized without requiring 2GB of RAM. So we've shaved a couple of bits off of the maximum... that would probably happen over time anyways.

You do realize that deciding on a non-streaming interface for AEAD precludes all such implementations down the road, right?

I'm more and more convinced that @GrabYourPitchforks was right (https://github.com/dotnet/corefx/issues/23629#issuecomment-334638891) that there's probably not a sensible unifying interface. GCM _requiring_ a nonce/IV and SIV _forbidding_ it mean that the initialization of an AEAD mode/algorithm already requires knowledge about what's going to happen... there isn't really a "abstracted away" notion to AEAD. SIV dictates where "the tag" goes. GCM/CCM do not. SIV is tag-first, by spec.

SIV can't start encrypting until it has all of the data. So its streaming encrypt is either going to throw (which means you have to know to not call it) or buffer (which could result in n^2 operation time). CCM can't start until the length is known; but CNG doesn't allow a pre-encrypt hint at the length, so it's in the same boat.

We shouldn't design a new component where it's easier to do the wrong thing than the right thing by default. Streaming decryption makes it very easy and tempting to wire up a Stream class (a la your proposal to do so with CryptoStream) which makes it very easy to get a data validation bug before the tag is verified, which almost entirely nullifies the benefit of AE. (IGcmDecryptor => CryptoStream => StreamReader => XmlReader => "wait, that's not legal XML..." => adaptive ciphertext oracle).

It's getting to the point ... customer demand.

As I've, unfortunately, heard way too many times in my life: I'm sorry, but you aren't the customer we have in mind. I'll concede that perhaps you know how to do GCM safely. You know to only stream to a volatile file/buffer/etc until after tag verification. You know what nonce management means, and you know the risks of getting it wrong. You know to pay attention to stream sizes and cut over to a new GCM segment after 2^36-64 bytes. You know that after it's all said and done it's your bug if you get those things wrong.

The customer I have in mind, on the other hand, is someone who knows "I have to encrypt this" because their boss told them to. And they know that when searching for how to do encryption some tutorial said "always use AE" and mentions GCM. Then they find an "encryption in .NET" tutorial which uses CryptoStream. They then hook up the pipeline, not having any idea that they've just done the same thing as choosing SSLv2... checked a box in theory, but not really in practice. And when they do it _that_ bug belongs to everyone who knew better, but let the wrong thing be too easy to do.

All 143 comments

Quick-to-judge feedback on the proposed API (trying to be helpful):

  • Span<T> is not in NetStandard2.
  • "Nonce" is very implementation-specific - ie. smells of GCM. However, even GCM docs (ex. NIST SP800-38D) refer to it as "IV", which - in case of GCM - happens to be a nonce. However, in case of other AEAD implementations the IV does not have to be a nonce (ex. repeating IV under CBC+HMAC is not catastrophic).
  • Streaming AEAD should either work seamlessly with CryptoStream, or provide its own "AEADCryptoStream" which is just as easy to stream in/out as CryptoStream.
  • AEAD API implementations should be allowed to do internal key derivation based on AAD (Associated Data). Using AAD purely for tag calculation/verification is too restrictive and prevents a stronger AEAD model.
  • "Get*" methods should return something (GetTag). If they are void, they must be setting something/changing state.
  • Trying to obtain a tag before "finishing" is probably a bad idea, so "IsFinished" might be helpful.
  • Folks that designed ICryptoTransform thought about reuse, multi-block support, and differently-sized input/output block sizes. These concerns are not captured.

As a proof of AEAD API sanity, the first implementation of such a proposed API should not be AES-GCM, but the classic/default AES-CBC with an HMAC tag. The simple reason for it is that anyone can build an AES-CBC+HMAC AEAD implementation today, with simple, existing, well-known .NET classes. Let's get a boring old [AES-CBC+HMAC] working over the new AEAD APIs first, since that's easy for everyone to MVP and test-drive.
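
For illustration, a rough encrypt-then-MAC sketch using only existing classes (Aes + HMACSHA256); the helper name is made up, and key separation, versioning and constant-time tag comparison are deliberately glossed over:

```C#
using System;
using System.Security.Cryptography;

static class CbcHmacSketch
{
    // Output layout: IV || ciphertext || HMAC(IV || ciphertext). Encrypt-then-MAC.
    public static byte[] Encrypt(byte[] encKey, byte[] macKey, byte[] plaintext)
    {
        using (var aes = Aes.Create())             // CBC + PKCS7 padding by default
        using (var hmac = new HMACSHA256(macKey))
        {
            aes.Key = encKey;
            aes.GenerateIV();

            byte[] cipherText;
            using (var enc = aes.CreateEncryptor())
                cipherText = enc.TransformFinalBlock(plaintext, 0, plaintext.Length);

            byte[] ivAndCipher = new byte[aes.IV.Length + cipherText.Length];
            Buffer.BlockCopy(aes.IV, 0, ivAndCipher, 0, aes.IV.Length);
            Buffer.BlockCopy(cipherText, 0, ivAndCipher, aes.IV.Length, cipherText.Length);

            byte[] tag = hmac.ComputeHash(ivAndCipher);

            byte[] output = new byte[ivAndCipher.Length + tag.Length];
            Buffer.BlockCopy(ivAndCipher, 0, output, 0, ivAndCipher.Length);
            Buffer.BlockCopy(tag, 0, output, ivAndCipher.Length, tag.Length);
            return output;
        }
    }
}
```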

The nonce/IV naming issue was something I was undecided about, happy with a change to IV so will change.

As for Get methods returning something, this avoids any allocations. There could be an overload Get() that returns something. Maybe it requires a naming change, but I am pretty married to the idea that the whole API needs to be basically allocation free.

As for streams etc., I am not overly bothered with those as they are higher-level APIs that can easily be constructed from the lower-level primitives.

Obtaining a tag before you have finished should not be allowed; however, you should know when you have called Finish, so I am not sure an IsFinished property needs to be in the API. It should be a defined behaviour, though, so I have updated the API design to include a behaviour section so we can capture any others that are thought of.

As for which cipher, I don't think any specific cipher should be the sole target, in order to prove out a new general purpose API it needs to fit a number. AES GCM, and CBC should both be covered.

(all on topic feedback good or bad is always helpful! )

  • Class, or interface?
  • How, if at all, do the current SymmetricAlgorithm classes interact with this?
  • How would this be used for persisted keys, like TripleDESCng and AesCng can do?
  • Many of these Spans seem like they could be ReadOnlySpan.

@Drawaes thanks for getting the ball rolling on this API. A few thoughts:

  1. Tag generation and verification is a very important part of this API since misuse of tags can defeat the whole purpose. If possible, I'd like to see tags built into the initialize and finish operations to ensure they can't be accidentally ignored. That likely implies that encrypt and decrypt shouldn't use the same initialize and finalize methods.
  2. I have mixed feelings about outputting blocks during decryption before getting to the end since the data isn't trustworthy until the tag has been checked (which can't be done until all data has been processed). We'll need to evaluate that tradeoff very carefully.
  3. Is Reset necessary? Should finish just reset? We do that on incremental hashes (but they don't need new IVs)

@bartonjs

  1. Class; as has often been seen in the BCL, with an interface you can't expand it later without breaking everything. An interface is like a puppy for life... unless default interface methods can be considered as a solution to that problem.
    Also, a sealed class derived from an abstract type is actually faster (as of now) because the JIT can devirtualise the methods... so it's basically free. Interface dispatch isn't as good (still good, just not as good).
  2. I dunno, how would you like that to work? I have little interest in the current stuff as it's so confusing; I would just patch all reasonable modern algos straight in (leave 3DES in the other classes :) But I don't have all the answers, so do you have any further thoughts on this?
  3. Persisted keys should be easy. Make an extension method on the key type or persistence store (see the sketch below).
MyKeyStore.GetCipher();

It's not initialized. It's disposable, so any refs can be dropped via the normal disposable pattern. If they try to set the key, throw an InvalidOperationException.
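
Something like this hypothetical sketch (an AesGcmCipher constructor taking a CngKey is an assumption for illustration, not part of the proposal yet):

```C#
using System.Security.Cryptography;

// The store hands back an already-keyed cipher, so callers never touch raw key bytes,
// and a later Init(key, iv) on that instance can throw.
public static class KeyStoreExtensions
{
    public static Cipher GetCipher(this CngKey persistedKey)
    {
        // AesGcmCipher accepting a CngKey is assumed here for illustration.
        return new AesGcmCipher(persistedKey);
    }
}
```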

Yes to the read-only spans; I will adjust when I am not on the tube on my phone.

@morganbr no problem... I just want to see it happen more than anything ;)

  1. Can you give a code snippet of how you see that working? I'm not sure it does, but code always brings clarity.
  2. It's unfortunate, but you really have to spit out the blocks early. With HMAC and hashing you don't, but then you have no interim data, just the state. So in this case you would have to buffer an unknown amount of data. Let's take a look at the pipelines example and TLS. We can write 16k of plaintext, but the pipeline buffers today are the size of a 4k page. So we would at best want to encrypt/decrypt 4 x 4k. If you don't give me the answer until the end, you need to allocate internal memory to store all of that, and then I presume throw it away when I get the result? Or will you wipe it? What if I decrypt 10MB and you hold that memory afterwards? I now have to worry about latent memory use.
  3. I'm not 100% on the init/reset thing (not your ideas, my current API shape); it doesn't sit well with me, so I am open to a new suggestion!

I have little interest in the current stuff as it's so confusing; I would just patch all reasonable modern algos straight in (leave 3DES in the other classes :)

The problem would be that container formats like EnvelopedCms (or EncryptedXml) may need to work with 3DES-CBC, AES-CBC, etc. Someone wanting to decrypt something encrypted with ECIES/AES-256-CBC-PKCS7/HMAC-SHA-2-256 probably wouldn't feel that they're doing old and crufty things.

If it's only supposed to be for AE, then that should be reflected somewhere in the name. Right now it's generic "cipher" (which I was/am, at some point, going to sit down with a dictionary/glossary and find out if there's a word for "an encryption algorithm in a mode of operation", since I think that "cipher" == "algorithm", therefore "Aes").

:) I was merely pointing out that it's not my subject area or of much interest to me so I am willing to defer to you and the community on this topic I haven't thought through the implications for this.


After quickly scanning through these, one option is to allow them to take an instance of the "Cipher" (or whatever it's called) class. This might not be done in the first wave, but could quickly follow it. If the API is super efficient then I see no reason for them to do their own thing internally, and this is exactly the use case for this API.

As a side bar on the naming... I must admit it's a tough one, however:
Openssl = cipher
Ruby = cipher
Go = cipher package with ducktyped interfaces for AEAD etc
Java = cipher

Now I am all for being different but... there is a trend. If something better is possible that is cool.

Possibly "BlockModeCipher" ... ?

I have made a few changes, I will change naming if a better name is decided upon.

When I started trying to answer questions, I realized the API is already missing encryption/decryption differentiation, so in your example, it doesn't know whether to encrypt or decrypt data. Getting that put in might add some clarity.

I could imagine a couple ways that the API could enforce proper tag usage (based on the assumption that this is an AEAD API, not just symmetric encryption since we already have SymmetricAlgorithm/ICryptoTransform/CryptoStream). Don't take these as prescriptive, just as an example of enforcing the tagging.
By method:

class Cipher
{
   void InitializeEncryption(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv);
   // Ensures decryptors get a tag
   void InitializeDecryption(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv, ReadOnlySpan<byte> tag);
   // Ensure encryptors produce a tag
    void FinishEncryption(ReadOnlySpan<byte> input, Span<byte> output, Span<byte> tag);
   // Throws if tag didn't verify
   void FinishDecryption(ReadOnlySpan<byte> input, Span<byte> output);
   // Update and properties are unchanged, but GetTag and SetTag are gone
}

By class:

class Cipher
{
    // Has properties and update, but Initialize and Finish aren't present
}
class Encryptor : Cipher
{
    void Initialize(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv);
    void Finish(ReadOnlySpan<byte> input, Span<byte> output, Span<byte> tag);
}
class Decryptor : Cipher
{
   void Initialize(ReadOnlySpan<byte> key, ReadOnlySpan<byte> iv, ReadOnlySpan<byte> tag);
   // Throws if tag didn't verify
   void Finish(ReadOnlySpan<byte> input, Span<byte> output);
}
class AesGCMEncryptor : Encryptor {}
class AesGCMDecryptor : Decryptor {}
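A rough usage sketch of the "by class" shape, just to show the call sites (the key/iv/tag locals and buffer spans are assumed, and the Cipher base is assumed to stay IDisposable as in the top post):

```C#
// Usage sketch only; not a prescriptive API.
using (var enc = new AesGCMEncryptor())
{
    enc.Initialize(key, iv);
    enc.Update(plaintext, ciphertext);
    enc.Finish(ReadOnlySpan<byte>.Empty, ciphertextTail, tag);   // encryptors must produce a tag
}

using (var dec = new AesGCMDecryptor())
{
    dec.Initialize(key, iv, tag);                                // decryptors must be given the tag
    dec.Update(ciphertext, plaintext);
    dec.Finish(ReadOnlySpan<byte>.Empty, plaintextTail);         // throws if the tag didn't verify
}
```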

That said, if it doesn't buffer on decryption, is it practical to ensure decryption actually gets finished and the tag gets checked? Can Update somehow figure out that it's time to check? Is that something Dispose should do? (Dispose is a bit dangerous since you may have already trusted the data by the time you dispose the object)

As far as naming, our precedent is SymmetricAlgorithm and AsymmetricAlgorithm. If this is intended for AEAD, some ideas could be AuthenticatedSymmetricAlgorithm or AuthenticatedEncryptionAlgorithm.

Some thoughts & API ideas:

public interface IAEADConfig
{
    // size of the input block (plaintext)
    int BlockSize { get; }

    // size of the output per input-block;
    // typically a multiple of BlockSize or equal to BlockSize.
    int FeedbackSize { get; }

    // IV size; CAESAR competition uses a fixed-length IV
    int IVSize { get; }

    // CAESAR competition uses a fixed-length key
    int KeySize { get; }

    // CAESAR competition states that typical AEAD ciphers have a constant gap between plaintext length
    // and ciphertext length, but the requirement is to have a constant *limit* on the gap.
    int MaxTagSize { get; }

    // allows for AE-only algorithms
    bool IsAdditionalDataSupported { get; }
}

public interface ICryptoAEADTransform : ICryptoTransform
{
    // new AEAD-specific ICryptoTransform interface will allow CryptoStream implementation
    // to distinguish AEAD transforms.
    // AEAD decryptor transforms should throw on auth failure, but current CryptoStream
    // logic swallows exceptions.
    // Alternatively, we can create a new AEAD_Auth_Failed exception class, and
    // CryptoTransform is modified to catch that specific exception.
}

public interface IAEADAlgorithm : IDisposable, IAEADConfig
{
    // separates object creation from initialization/keying; allows for unkeyed factories
    void Initialize(ArraySegment<byte> key);

    void Encrypt(
        ArraySegment<byte> iv, // readonly; covered by authentication
        ArraySegment<byte> plaintext, // readonly; covered by authentication
        ref ArraySegment<byte> ciphertext, // must be of at least [plaintext_length + MaxTagSize] length. iv is not part of ciphertext.
        ArraySegment<byte> additionalData = default(ArraySegment<byte>) // readonly; optional; covered by authentication
        ); // no failures expected under normal operation - abnormal failures will throw

    bool Decrypt(
        ArraySegment<byte> iv, // readonly
        ArraySegment<byte> ciphertext, // readonly
        ref ArraySegment<byte> plaintext, // must be of at least [ciphertext_length - MaxTagSize] length.
        ArraySegment<byte> additionalData = default(ArraySegment<byte>), // readonly; optional
        bool isAuthenticateOnly = false // supports Authentication-only mode
        );// auth failures expected under normal operation - return false on auth failure; throw on abnormal failure; true on success

    /*  Notes:
        * Array.LongLength should be used instead of Array.Length to accommodate byte arrays longer than 2^32.
        * Ciphertext/Plaintext produced by Encrypt()/Decrypt() must be determined *only* by method inputs (combined with a key).
          - (ie. if randomness or other hidden inputs are needed, they must be a part of iv)
        * Encrypt()/Decrypt() are allowed to write to ciphertext/plaintext segments under all conditions (failure/success/abnormal)
          - some implementations might be more stringent than others, and ex. not leak decrypted plaintext on auth failures
    */

    ICryptoAEADTransform CreateEncryptor(
        ArraySegment<byte> key,
        ArraySegment<byte> iv,
        ArraySegment<byte> additionalData = default(ArraySegment<byte>)
        );

    ICryptoAEADTransform CreateDecryptor(
        ArraySegment<byte> key,
        ArraySegment<byte> iv,
        ArraySegment<byte> additionalData = default(ArraySegment<byte>)
        );

    // Streaming AEAD can be done with good-old CryptoStream
    // (possibly modified to be AEAD-aware).
}

@sdrapkin , thanks for contributing thoughts. Where should tags go in your API? Are they implicitly part of the ciphertext? If so, that implies a protocol that some algorithms don't actually have. I'd like to understand what protocols people might care about to see whether implicit tag placement is acceptable or if it needs to be carried separately.

@morganbr I noticed the encrypt/decrypt issue as well but hadn't had time tonight to fix it, so I'm glad for your design. I prefer the methods over the classes as that allows more aggressive recycling (buffers for keys and IVs can add up).

As for a check before the dispose. It's not possible to tell the end of an operation unfortunately.

@sdrapkin interfaces I would say are a no-go due to the versioning problem mentioned previously, unless we rely on default interface implementations in the future. Also, interface dispatch is slower. The array segments are also out, as Span is the more versatile, lower-level primitive. However, extension methods could be added to convert ArraySegment to Span if there were demand later.

Some of your properties are of interest so will update when I am at a computer rather than on my phone.

Good feedback all round!

@morganbr Tags are part of ciphertext. This is modelled after CAESAR API (which includes AES-GCM).

@Drawaes I've used interfaces to illustrate thoughts only - I'm perfectly ok with static methods/classes. Span does not exist. I don't care about what may or may not be coming - it is not in NetStandard2, and it is not in the normal .NET that serious projects actually use (yeah, yeah, I know it's in Core, but Core is a toy for now). I'd be happy to personally consider Span when I see it - until then ArraySegment is the closest NetStandard API that actually ships.

I will take a deeper look at the CAESAR API, that is useful.

As for Span, it is shipping around the 2.1 time frame I believe which is hopefully the same time the first implementation of this API would ship (or at least the earliest possible time it could).

If you take a look at the current prerelease NuGet package, it supports down to .NET Standard 1.0 and there are no plans to change that on release.

Maybe @stephentoub can confirm that as he is doing work to add Span based API's across the framework as we speak.

[NuGet for Span](https://www.nuget.org/packages/System.Memory/4.4.0-preview2-25405-01)

So I would say it's the only real choice for a brand new API. Extension methods etc. could then be added to take an ArraySegment if you so choose, and if that proves useful enough it could be added to the framework. It's trivial to turn an ArraySegment into a Span, but the other way requires copying data.
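
For example (buffer, offset and count here are just hypothetical locals):

```C#
// ArraySegment<byte> converts to Span<byte> without copying; the reverse
// (getting an array back out of a Span) requires a copy via ToArray().
var segment = new ArraySegment<byte>(buffer, offset, count);
Span<byte> span = segment;        // implicit conversion, zero-copy
byte[] copy = span.ToArray();     // Span -> byte[] copies the data
```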

The problem I see with that API above is that it will be a disaster for performance on any "chunked" data. Take network traffic, for instance: if a single authenticated block is split over multiple reads from an existing stream, I need to buffer it all into a single [insert data structure] and encrypt/decrypt all at once. That defeats all attempts to do any kind of zero-copy on that data.

Networking frameworks such as those that Pipelines provides manage to avoid almost all copies, but then if they hit any kind of crypto in this API all of that is gone.

The separate configuration object (or bag) has actually been involved in a recent discussion on another API I have been having. I am not opposed to it in principle as if it grows in future it can become a mess to have large numbers of properties on the main object.

A couple of things have occurred to me.

  • The current proposal calls TagSize an out value (well, a get-only-property). But for both GCM and CCM it's an input to encryption (and derivable from decryption since you've supplied the actual tag).
  • The proposal assumes that input and output can happen at the same time, and piecemeal.

    • IIRC CCM can't do streaming encryption (the length of the plaintext is an input into the first step of the algorithm).

    • Padded modes lag decryption by (at least one) block, since until Final is called they don't know if more data is coming / if the current block needs the padding removed

  • Some algorithms might consider the AD element to be required at the beginning of the operation, making it more like an Init/ctor parameter than a late-bound association.

I don't know if container formats (EnvelopedCms, EncryptedXml) would need to extract the key, or if it would simply be up to them to generate it and remember it (for as long as they need to to write it down).

(Apparently I didn't hit the "comment" button on this yesterday, so it won't be acknowledging anything after "I have made a few changes" at 1910Z)

True, the tag size would need to be variable. Agreed.

If we just look at encryption for now, to simplify the use case: you are correct that some ciphers will return nothing, less, or more. There is a general question around what happens if you don't provide a big enough buffer.

On the new TextEncoding interfaces using span there was a suggestion around having the return type an enum to define if there was enough space to output or not, and the size actually written in an "out" param instead. This is a possibility.

In the CCM case I would just say it returns nothing and will have to internally buffer until you call finish at which point it would want to dump the whole lot. Nothing precludes you from just calling finish as your first call if you have all the data in a single block (In which case there might be a better name). Or it is possible to throw if you try an update on those ciphers. CNG returns an invalid size error if you try to do a continuation on CCM for instance.

As for when the tag is set on decryption, you often don't know it until you have read the entire packet, if we take TLS as an example we might have to read 8 * 2k network packets to get to the tag at the end of a 16k block. So we now have to buffer the entire 16k before we can start decryption and there is no chance to overlap (I am not saying this would be used for TLS just that an IO bound process is common for these types of ciphers, be it disk or network).

@Drawaes re. chunked streams and buffering limits:
You have to pick your battles. You won't be able to create a unifying API aligned with every single nice-to-have goal in the AE world - and there are many such goals. Ex. there is a decent chunked AEAD streaming in Inferno, but it's not a standard by any stretch, and such a standard does not exist. At a higher level, the goal is "secure channels" (see this, this, and this).

However, we need to think smaller for now. Chunking/buffer-limiting are not even on the radar for standardisation efforts (see the "AEADs with large plaintexts" section).

Encryption/Decryption operations are fundamentally about transforms. These transforms require buffers, and are not in-place (output buffers must be larger than input buffers - at least for Encrypt transform).

RFC 5116 might also be of interest.

@Drawaes , interesting that you bring up TLS. I would argue that SSLStream (were it to use this API) must not return any unauthenticated results to an application since the application won't have any way to defend itself.

Sure, but that is SSLStream's problem. I have prototyped this exact thing (managed TLS at the protocol level, calling out to CNG and OpenSSL for the crypto bits) on top of pipelines. The logic was pretty simple: encrypted data comes in, decrypt the buffer in place, attach it to the outbound and repeat until you get to the tag. At the tag, call Finish...

If it throws close the pipeline. If it doesn't throw then flush allowing the next stage to go to work either on the same thread or via dispatch.

My proof of concept was not ready for primetime, but by using this and avoiding a lot of copies etc. it showed a very decent perf increase ;)

The problem with any networking is where things in the pipeline start allocating their own buffers and not using as much as possible the ones moving through the system already.

OpenSSL's crypt functions and CNG have this same Update, Update, Finish pattern. Finish can output the tag as discussed. Updates must be in block-size multiples (for CNG), and OpenSSL does minimal buffering to get to a block size.

As they are primitives, I am not sure we would expect higher level functionality from them. If we were designing a "user" level API rather than primitives to construct those I would then argue that key generation, IV construction and entire authenticated chunks should all be implemented so I guess it depends what the target level of this API really is.

Wrong button

@blowdart, who had some interesting ideas on nonce management.

So nonce management is basically a user problem and is specific to their setup.

So ... make it a requirement. You must plug in nonce management ... and don't supply a default implementation, or any implementations at all. This, rather than a simple

cipher.Init(myKey, nonce);

forces users to make a specific gesture that they understand the risks.

@blowdart's idea might help with both nonce management problems and differences between algorithms. I agree that it's likely important not to have built-in implementation to ensure that users understand that nonce management is a problem they need to solve. How does something like this look?

interface INonceProvider
{
    void GetNextNonce(Span<byte> writeNonceHere);
}

class AesGcmCipher : Cipher
{
    public AesGcmCipher(ReadOnlySpan<byte> key, INonceProvider nonceProvider);
}

// Enables platform-specific hardware keys
class AesGcmCng : Cipher
{
    public AesGcmCng(CngKey key, INonceProvider nonceProvider);
}

// Example of AEAD that might not need a nonce
class AesCBCHmac : Cipher
{
    public AesCBCHmac(ReadOnlySpan<byte> key);
}

class Cipher
{
    // As above, but doesn't take keys, IVs, or nonces
}

But what's the point of the INonceProvider? It's just an extra interface/type; if Init just takes a nonce and needs to be called before you start any block, isn't that the same thing without the extra interface?

Also, I am no crypto expert, but doesn't AES require an IV (which isn't a nonce but needs to be provided by the user)?

It's just an extra interface/type

That's kind of the point. It essentially says that _nonce management_ is a problem, not just passing in a zeroed or even random byte array. It might also help prevent inadvertent reuse if people interpret GetNextNonce to mean "return something different than you did last time".

It's also helpful to not need it for algorithms that don't have nonce management problems (like AES SIV or perhaps AES+CBC+HMAC).

The exact IV/nonce requirements vary based on mode. For example:

  • AES ECB does not require a nonce or IV
  • AES GCM requires a 96-bit nonce that must never be reused or the security of the key breaks. Low entropy is fine as long as the nonce isn't reused.
  • AES CBC requires a 128-bit IV that must be random. If the IV repeats, it only reveals whether the same message has been sent before.
  • AES SIV doesn't need an explicit IV since it derives it from other inputs.

AES CBC needs an IV, right? So are you going to have an InitializationVectorProvider? It's not a nonce but nonce-like, and reusing the last block led to a TLS attack because the IV could be predicted. You explicitly can't use, say, a sequential nonce for CBC.

Yeah, but an IV isn't a nonce, so you can't use the term nonce provider.

I didn't mean to imply AES CBC doesn't need an IV-- it does. I just meant to speculate about some schemes that derive the IV from other data.

Sure, I guess my point is I like it generally... I can pool the provider ;) but either call it an IV provider or have 2 interfaces to be clear on intent.

@morganbr INonceProvider factories passed into cipher constructors are a bad design. It completely misses the fact that _nonce_ does not exist by itself: the "_...used once_" constraint has a _context_. In cases of CTR and GCM (which uses CTR) modes, the _context_ of the _nonce_ is the _key_. Ie. _nonce provider_ must return a nonce that is used only once within a context of a specific _key_.

Since the INonceProvider in your proposed API is not key-aware, it cannot generate correct nonces (other than via randomness, which is not what a nonce is, even if the bit space was large enough for statistical randomness to work safely).
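
To illustrate, a rough sketch of a key-scoped provider (using the INonceProvider shape proposed above; the counter state lives next to the one key it serves, and the names are made up):

```C#
using System;

// Sketch only: a counter-based nonce provider that belongs to exactly one key.
public sealed class CounterNonceProvider : INonceProvider
{
    private ulong _counter;   // per-key state; must never wrap for the key's lifetime

    public void GetNextNonce(Span<byte> writeNonceHere)
    {
        // Assumes at least 8 bytes of nonce (e.g. a 96-bit GCM nonce); leading bytes stay zero.
        writeNonceHere.Clear();
        BitConverter.GetBytes(_counter).AsSpan()
            .CopyTo(writeNonceHere.Slice(writeNonceHere.Length - 8));
        checked { _counter++; }   // throw rather than ever repeat a nonce under this key
    }
}
```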

I'm not entirely sure what this discussion thread aims to achieve. Various Authenticated-Encryption design ideas are discussed... ok. What about Authenticated Encryption interfaces already built into .NET Core -- specifically into its ASP.NET API? There is IAuthenticatedEncryptor, etc. All these capabilities are already implemented, extensible, and shipping as part of .NET Core today. I'm not saying DataProtection crypto is perfect, but is the plan to ignore them? Change them? Assimilate or refactor?

DataProtection crypto was built by @GrabYourPitchforks (Levi Broderick). He knows the subject matter, and his opinion/input/feedback would be most valuable to this community. I enjoy crypto-themed entertainment as much as anyone, but if someone wants to get serious about crypto API design, then actual experts that are already on MS team should be engaged.

@sdrapkin, nonce providers needing to be key aware is a good point. I wonder if there's a reasonable way to modify these APIs to enforce that.

DataProtection is a fine API, but it's a higher-level construct. It encapsulates key generation and management, IVs and output protocol. If somebody needs to use (say) GCM in an implementation of a different protocol, DataProtection doesn't make that possible.

The .NET crypto team includes @bartonjs, @blowdart and myself. Of course, if @GrabYourPitchforks wants to chime in, he's more than welcome.

I agree with @morganbr in that this is supposed to be a low level primitive ( in fact it says that in the title). While data protection etc are designed to be used directly in usercode and reduces the ability to shoot yourself in the foot, the way I see this primitive is to allow the framework and libraries to build higher level constructs on a common base.

With that thought in mind, the provider is fine if it has to be supplied any time a key is supplied. It does make it a little messy; let me explain using TLS (it's just a well-known use of AES block modes for network traffic).

I get a "frame" (maybe over 2 + TU's with the MTU ~1500 of the internet). It contains the nonce (or part of the nonce with 4 bytes left "hidden") I then have to set this value on a shell "provider" and then call decrypt and go through my cycle of decrypting the buffers to get a single plain text.

If you are happy with that, I can live with it. I am keen to get this moving along so keen to update the design above to something we can agree on.
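
Roughly, that "shell" provider would look something like this (ExplicitNonceProvider is a made-up name; the INonceProvider shape is the one proposed above):

```C#
using System;

// Sketch only: a decryption-side provider that just replays the nonce pulled off the wire.
public sealed class ExplicitNonceProvider : INonceProvider
{
    private byte[] _current = Array.Empty<byte>();

    // Called by the protocol code once the record header has been parsed.
    public void SetNonce(ReadOnlySpan<byte> wireNonce) => _current = wireNonce.ToArray();

    public void GetNextNonce(Span<byte> writeNonceHere) => _current.AsSpan().CopyTo(writeNonceHere);
}
```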

Thanks for forking the discussion; I'm getting some free time to jump onto this. @Drawaes, can you confirm the top post is the gold standard / target of this evolving conversation? If it isn't, can you update it?

I see the current proposal having a fatal issue and then other issues with being too chatty.

// current proposed usage
using (var cipher = new AesGcmCipher(bitsize: 256))
{
    cipher.Init(myKey, nonce);
    while (!inputSource.EOF)
    {
        var inputSpan = inputSource.ReadSpan(cipher.BlockSize);
        cipher.Update(inputSpan);
        outputSource.Write(inputSpan);
    }
    cipher.AddAssociatedData(extraInformation); // <= fatal, one can't just do this
    cipher.Finish(finalBlockData);
    cipher.GetTag(tagData);
}

If you look at a true AEAD primitive, the privacy data and the authenticated data are mixed lock-step. See this for Auth Data 1 and CipherText1. This of course continues for multiple blocks, not just 1.

Since all the world's a meme, can't resist, sorry :)

Also, the API seems chatty with new, init, update etc. I'd propose this programmer's model

// proposed, see detailed comments below
using (var cipher = new AesGcmCipher(myKey, iv, aad)) // 1
{
    // 2
    while (!inputSource.EOF) 
    {
        var inputSpan = inputSource.ReadSpan(16411); // 3
        var outSpan = cipher.Encrypt(inputSpan); // 4
        outputSource.Write(outSpan); 
    }    
    var tag = cipher.Finish(finalBlockData); // 5
}
  1. Typically AAD << plaintext, so I've seen cipher.Init(mykey, nonce, aad); where the entire AAD is passed as a buffer and then the cipher crunches over the rest of the potentially gigabyte+ stream (e.g. BCryptEncrypt's CipherModeInfo param). Also, the size of myKey already establishes AES128, 192 or 256, so no need for another parameter.
  2. Init becomes an optional API in case the caller wishes to reuse an existing instance, reuse existing AES constants, and skip AES subkey generation if the AES key is the same
  3. The cipher's API should shield the caller from block-size management internals, like most other crypto APIs or even existing .NET APIs. The caller is concerned with a buffer size optimized for their use case (e.g. network IO via 16K+ buffers). Demoing with a prime number > 16K to test implementation assumptions
  4. inputSpan is readonly. And input. So we need an outSpan
  5. Does Update() do Encrypt or Decrypt? Just have Encrypt and Decrypt interfaces to match a developer's mental model. The tag is also the most important desired data at this instant, so return that.

Actually going a step further, why not just

using (var cipher = new AesGcmCipher(myKey, iv, aad))
{
    var tag = cipher.EncryptFinal(inputSpan, outputSpan);
}

Also, please steer away from INonceProvider and the sorts. The crypto primitives don't need this, just stick to byte[] iv (my favorite for small data) or Span (the supposed new cool but too much abstraction IMHO). Nonce provider operates at a layer above and the result of it could just be the iv seen here.

The problem with making primitives so primitive is people will simply use them incorrectly. With a provider we can at least force some thought into their use.

We're talking about AEAD in general of which GCM is specific. So first, the generalized case (iv) should drive the design, not the specific case (nonce).

Secondly, how does merely shifting from byte[] iv to GetNextNonce(Span<byte> writeNonceHere) actually solve the nonce issue? You've only changed the name/label on the problem while simultaneously making it more complex than it should be.

Third, since we're getting into policies on iv protection should we also get into key protection policies? What about key distribution policies? Those are obviously higher level concerns.

Finally, nonce is extremely situational on usage at higher layers. You don't want to have a brittle architecture where cross-layer concerns are being coupled together.

Frankly, if we could hide primitives unless someone makes a gesture to say "I know what I'm doing", I'd push for that. But we can't. There are far too many bad crypto implementations out there because people thought "Oh, this is available, I'll use it". Heck, look at AES itself: "I'll just use that with no HMAC."

I want to see APIs be secure by default, and if that means a little more pain then frankly I'm all for it. 99% of developers do not know what they're doing when it comes to crypto and making it easy for the 1% who do should be a lower priority.

Span does not exist. I don't care about what may or may not be coming - it is not in NetStandard2

@sdrapkin as @Drawaes points out, Span<T> is .NET Standard 1.0 so can be used on any framework. It's also safer than ArraySegment<T> as it only lets you access the actual window referenced, rather than the whole array.

Also ReadOnlySpan<T> prevents modification to that window; again unlike array segment where anything passed it can modify and/or retain a reference to the passed array.

Span should be the general go-to for sync APIs (the fact that an API using Span can additionally cope with stackalloc'd and native memory as well as arrays is the icing).

i.e.
With ArraySegment the readonly is suggested via docs; and no out of bounds read/modifications are prevented

void Encrypt(
    ArraySegment<byte> iv, // readonly; covered by authentication
    ArraySegment<byte> plaintext, // readonly; covered by authentication
    ref ArraySegment<byte> ciphertext, // must be of at least [plaintext_length + MaxTagSize] length. iv is not part of ciphertext.
    ArraySegment<byte> additionalData = default(ArraySegment<byte>) // readonly; optional; covered by authentication
    );

However with Span the readonly is enforced by api; as well as out of bounds reads of the arrays being prevented

void Encrypt(
    ReadOnlySpan<byte> iv, // covered by authentication
    ReadOnlySpan<byte> plaintext, // covered by authentication
    Span<byte> ciphertext, // must be of at least [plaintext_length + MaxTagSize] length. iv is not part of ciphertext.
    ReadOnlySpan<byte> additionalData = default // optional; covered by authentication
    );

It conveys intent with the parameters far better; and is less error prone with regards to out of bounds reads/writes.

@benaadams @Drawaes never said that Span<T> was in NetStandard (any shipped NetStandard). What he did say is (1) agree that Span<T> is not in any shipped NetStandard; (2) that Span<T> will be _"shipping around the 2.1 time frame"_.

For this particular Github issue, however, (read-only) Span<T> discussion is bikeshedding right now - there is no clarity on scope or purpose of the API to be designed.

Either we go with raw low-level primitive AEAD API (ex. similar to CAESAR):

  • Pros: nice fit for AES-GCM/CCM, existing test vectors from good sources (NIST, RFC). @sidshetye will be happy. @blowdart will meditate over _"making primitives so primitive"_, but will eventually see the Yin and Yang because primitives are primitive and there is no way to childproof them.
  • Cons: Expert users (the proverbial 1%) will use the low-level APIs responsibly, while the other non-expert users (99%) will misuse it to write broken .NET software that will be responsible for the vast majority of .NET CVEs, which will greatly contribute to the perception that .NET is an insecure platform.

Or we go with high-level misuse-impossible or -resistant AEAD API:

  • Pros: 99% of non-expert users will continue to make mistakes, but at least not in AEAD code. @blowdart 's _"I want to see APIs be secure by default"_ resonates deeply in the ecosystem, and security, prosperity, and good karma befall all. Many good API designs & implementations are already available.
  • Cons: No standards. No test vectors. No consensus on whether AEAD is even the right goal to target for high-level online streaming API (spoiler: it's not - see Rogaway's paper).

Or, we do both. Or we enter analysis paralysis and might as well close this issue right now.

I strongly feel that, being part of the core platform, crypto needs to have a solid, low-level foundational API. Once you have that, creating high-level or "training wheels" APIs can be bridged quickly by the core team or the community. But I challenge anyone to do the reverse elegantly. Plus, the topic is "General low level primitive for ciphers"!

@Drawaes is there a timeline to converge and resolve this? Any plans on involving non-Microsoft folks beyond such GitHub alerts? Like a 30 min conference call? I'm trying to stay out of a rabbit hole but we're betting that .NET core crypto will be at a certain level of maturity and stability .. so can triage for such discussions.

We're still paying attention and working on this. We've met with the Microsoft Cryptography Board (the set of researchers and other experts who advise Microsoft's usage of cryptography) and @bartonjs will have more information to share soon.

Based on a little data flow doodling and the advice of the Crypto Board we came up with the following. Our model was GCM, CCM, SIV and CBC+HMAC (note that we're not talking about doing SIV or CBC+HMAC right now, just that we wanted to prove out the shape).

```C#
public interface INonceProvider
{
    ReadOnlySpan<byte> GetNextNonce(int nonceSize);
}

public abstract class AuthenticatedEncryptor : IDisposable
{
    public int NonceOrIVSizeInBits { get; }
    public int TagSizeInBits { get; }
    public bool SupportsAssociatedData { get; }
    public ReadOnlySpan<byte> LastNonceOrIV { get; }
    public ReadOnlySpan<byte> LastTag { get; }

    protected AuthenticatedEncryptor(
        int tagSizeInBits,
        bool supportsAssociatedData,
        int nonceOrIVSizeInBits) => throw null;

    protected abstract bool TryEncrypt(
        ReadOnlySpan<byte> data,
        ReadOnlySpan<byte> associatedData,
        Span<byte> encryptedData,
        out int bytesWritten,
        Span<byte> tag,
        Span<byte> nonceOrIVUsed);

    public abstract void GetEncryptedSizeRange(
        int dataLength,
        out int minEncryptedLength,
        out int maxEncryptedLength);

    public bool TryEncrypt(
        ReadOnlySpan<byte> data,
        ReadOnlySpan<byte> associatedData,
        Span<byte> encryptedData,
        out int bytesWritten) => throw null;

    public byte[] Encrypt(
        ReadOnlySpan<byte> data,
        ReadOnlySpan<byte> associatedData) => throw null;

    // some variant of the Dispose pattern here.
}

public sealed class AesGcmEncryptor : AuthenticatedEncryptor
{
    public AesGcmEncryptor(ReadOnlySpan<byte> key, INonceProvider nonceProvider)
        : base(128, true, 96)
    {
    }
}

public sealed class AesCcmEncryptor : AuthenticatedEncryptor
{
    public AesCcmEncryptor(
        ReadOnlySpan<byte> key,
        int nonceSizeInBits,
        INonceProvider nonceProvider,
        int tagSizeInBits)
        : base(tagSizeInBits, true, nonceSizeInBits)
    {
        // validate nonceSize and tagSize against the algorithm spec
    }
}

public abstract class AuthenticatedDecryptor : IDisposable
{
    public abstract bool TryDecrypt(
        ReadOnlySpan<byte> tag,
        ReadOnlySpan<byte> nonceOrIV,
        ReadOnlySpan<byte> encryptedData,
        ReadOnlySpan<byte> associatedData,
        Span<byte> data,
        out int bytesWritten);

    public abstract void GetEncryptedSizeRange(
        int encryptedDataLength,
        out int minDecryptedLength,
        out int maxDecryptedLength);

    public byte[] Decrypt(
        ReadOnlySpan<byte> tag,
        ReadOnlySpan<byte> nonceOrIV,
        ReadOnlySpan<byte> encryptedData,
        ReadOnlySpan<byte> associatedData) => throw null;

    // some variant of the Dispose pattern here.
}

public sealed class AesGcmDecryptor : AuthenticatedDecryptor
{
    public AesGcmDecryptor(ReadOnlySpan<byte> key) => throw null;
}

public sealed class AesCcmDecryptor : AuthenticatedDecryptor
{
    public AesCcmDecryptor(ReadOnlySpan<byte> key) => throw null;
}
```

This proposal eliminates data streaming. We don't really have a lot of flexibility on that point. Real-world need (low) combined with the associated risks (extremely high for GCM) or impossibility thereof (CCM) means it's just gone.

This proposal uses an externalized source of nonce for encryption. We will not have any public implementations of this interface. Each application/protocol should make its own tying the key to the context so it can feed things in appropriately. While each call to TryEncrypt will only make one call to GetNextNonce there's no guarantee that that particular TryEncrypt will succeed, so it's still up to the application to understand if that means it should re-try the nonce. For CBC+HMAC we would create a new interface, IIVProvider, to avoid muddying the terminology. For SIV the IV is constructed, so there's no acceptable parameter; and based on the spec the nonce (when used) seems to just be considered as part of the associatedData. So SIV, at least, suggests that having nonceOrIV as a parameter to TryEncrypt is not generally applicable.

TryDecrypt most definitely throws on invalid tag. It only returns false if the destination is too small (per the rules of Try- methods)
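
For illustration, a rough usage sketch of the shape above (the key/plaintext/associatedData locals and MyCounterNonceProvider are assumptions supplied by the caller):

```C#
// Usage sketch; not a prescriptive pattern.
byte[] ciphertext, nonce, tag;
using (var encryptor = new AesGcmEncryptor(key, new MyCounterNonceProvider(key)))
{
    ciphertext = encryptor.Encrypt(plaintext, associatedData);
    nonce = encryptor.LastNonceOrIV.ToArray();   // persist alongside the ciphertext
    tag = encryptor.LastTag.ToArray();
}

using (var decryptor = new AesGcmDecryptor(key))
{
    // Throws on a bad tag; TryDecrypt only returns false when the destination is too small.
    byte[] roundTripped = decryptor.Decrypt(tag, nonce, ciphertext, associatedData);
}
```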

Things that are definitely open for feedback:

  • Should sizes be in bits (like significant parts of the specs) or bytes (since only multiples of 8 are legal anyway, we're always going to divide, and some parts of the specs talk about things like the nonce size in bytes)?
  • Parameter names and ordering.
  • The LastTag/LastNonceOrIV properties. (Making them be (writable) Spans on a public TryEncrypt just means there are three buffers that could be too small, by making them sit on the side the "Try" is more clear; the base class can make the promise that it'll never offer a too-short buffer.).
  • Offering up an AE algorithm for which this doesn't work.
  • Should associatedData be moved to the end with a default of ReadOnlySpan<byte>.Empty?

    • Or overloads made which omit it?

  • Does anyone want to assert love for, or hatred for, the byte[]-returning methods? (Low-allocation can be achieved by using the Span method, this is just for convenience)
  • The size ranges methods were sort of bolted on at the end.

    • Their purpose is that

    • If the destination span is less than min, return false immediately.

    • The byte[]-returning methods will allocate a buffer of max, and then Array.Resize it as needed.

    • Yes, for GCM and CCM min=max=input.Length, but that's not true for CBC+HMAC or SIV

    • Is there an algorithm that would need to take the associatedData length into account?

Definitely bytes - not bits.
Nonce provider that's not key-aware is a big mistake.

Nonce provider that's not key-aware is a big mistake.

You can write your nonce provider however you like. We aren't providing any.

What about deterministic cleanup/IDisposable ?

What about deterministic cleanup/IDisposable ?

Good call. Added it to AuthenticatedEncryptor/AuthenticatedDecryptor. I don't think they should probe for disposability on the nonce provider, the caller can just stack the using statements.

INonceProvider concept/purpose makes no sense to me (echoing others). Let primitives be primitive - pass in the nonce the same way you pass in the key (ie. as bytes - however declared). No AE/AEAD spec forces an algorithm for how nonces are generated/derived - this is a higher-layer responsibility (at least in the let-primitives-be-primitive model).

No streaming? Really? What is the justification to forcibly remove streaming from a stream cipher like AES-GCM at a core foundational level?

For example, what does your crypto board recommend these two recent scenarios we reviewed?

  1. Client has large healthcare files between 10-30GB. The core only sees a data stream though between two machines so it's one pass stream. Obviously a fresh key is issued for each 10GB file but you've just rendered every such workflow useless. You now want us to a) buffer that data (memory, no pipe-lining) b) perform encryption (all machines in the pipeline are now idle!) c) write the data out (first byte written after a and b are 100% done) ? Please tell me you're joking. You guys are knowingly putting "encryption is a burden" back into the game.

  2. Physical security unit has multiple 4K streams which are also encrypted for at-rest scenarios. Fresh key issuance happens at 15GB boundary. You propose buffering the entire clip?

I don't see any input from the community, of people actually building real-world software, asking to remove streaming support. But then the team disappears from the community dialog, huddles internally, and comes back with something nobody asked for, something that kills real applications and reinforces that "encryption is slow and expensive, skip it"?

You can provide Encrypt and EncryptFinal which would support both options instead of imposing your decision for the entire ecosystem.

Elegant design eliminates complexity, not control.

What is the justification to forcibly remove streaming from a stream cipher like AES-GCM at a core foundational level?

I think it was something like

This proposal eliminates data streaming. We don't really have a lot of flexibility on that point. Real-world need (low) combined with the associated risks (extremely high for GCM) or impossibility thereof (CCM) means it's just gone.

GCM has too many oops moments where it allows key recovery. If an attacker can do a chosen ciphertext and watch the streaming output from before tag verification, they can recover the key. (Or so one of the cryptanalysts tells me). Effectively, if any GCM-processed data is observable at any point before tag verification then the key is compromised.

I'm pretty sure that the Crypto Board would recommend NOT using GCM for first scenario, but rather CBC+HMAC.

If your second scenario is 4k framing, and you're encrypting each 4k frame, then that works with this model. Each 4k + nonce + tag frame gets decrypted and verified before you get the bytes back, so you never leak the keystream / key.
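
As a sketch of what that per-frame approach could look like with the proposed one-shot shape (the nonce || tag || ciphertext frame layout, the 4 KB size, and the input/output/encryptor locals are illustrative assumptions):

```C#
// Each frame is encrypted and later verified as a unit, so nothing unverified is ever released.
const int FrameSize = 4096;
byte[] frame = new byte[FrameSize];
int read;
while ((read = input.Read(frame, 0, frame.Length)) > 0)
{
    byte[] ciphertext = encryptor.Encrypt(frame.AsSpan(0, read), associatedData);
    byte[] nonce = encryptor.LastNonceOrIV.ToArray();   // fresh nonce per frame from the provider
    byte[] tag = encryptor.LastTag.ToArray();

    output.Write(nonce, 0, nonce.Length);
    output.Write(tag, 0, tag.Length);
    output.Write(ciphertext, 0, ciphertext.Length);
}
```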

For comparison: I'm currently developing this "let primitives be primitive" crypto API. Here is my class for authenticated encryption.

For me it turned out to be useful to be able to talk about a crypto primitive independently of a key. For example, I often want to plug a specific primitive into a method that works with any AEAD algorithm and leave the generation of keys etc. to that method. Therefore there's an AeadAlgorithm class and a separate Key class.

Another very useful thing that already prevented several bugs is to use distinct types to represent data of different shapes, e.g., a Key and a Nonce, instead of using a plain byte[] or Span<byte> for everything.


AeadAlgorithm API (click to expand)

public abstract class AeadAlgorithm : Algorithm
{
    public int KeySize { get; }

    public int NonceSize { get; }

    public int TagSize { get; }

    public byte[] Decrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> ciphertext);

    public void Decrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> ciphertext,
        Span<byte> plaintext);

    public byte[] Encrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> plaintext);

    public void Encrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> plaintext,
        Span<byte> ciphertext);

    public bool TryDecrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> ciphertext,
        out byte[] plaintext);

    public bool TryDecrypt(
        Key key,
        Nonce nonce,
        ReadOnlySpan<byte> associatedData,
        ReadOnlySpan<byte> ciphertext,
        Span<byte> plaintext);
}

@bartonjs he/she is correct; you need to rely on the program not outputting data until authentication completes. So, for example, if you aren't authenticating (or just not yet), you can control the input for a block and therefore know the output and work backwards from there...

E.g. a man in the middle attack can inject known blocks into a cbc stream and perform a classic bit flipping attack.

Not sure how to solve the large-chunks-of-data issue really, other than to chunk them with serial nonces or similar... a la TLS.

Well, let me rephrase that: I do, but only in the small network-payload case, which isn't enough for a general-purpose lib.

In the spirit of openness, is it possible to reveal who is on the Microsoft Cryptography Review Board (and ideally the comments/opinions of specific members that reviewed this topic)? Brian LaMacchia and who else?

_using reverse psychology:_

I'm happy that streaming AEAD is out. This means that Inferno continues to be the only practical CryptoStream-based streaming AEAD for the average Joe. Thank you MS Crypto Review Board!

Building on @ektrah's comment, his (her?) approach is driven by RFC 5116, which I've referenced earlier. There are many notable quotes in RFC 5116:

3.1. Requirements on Nonce Generation
It is essential for security that the nonces be constructed in a manner that respects the requirement that each nonce value be distinct for each invocation of the authenticated encryption operation, for any fixed value of the key.
...

  4. Requirements on AEAD Algorithm Specifications
    Each AEAD algorithm MUST accept any nonce with a length between N_MIN and N_MAX octets, inclusive, where the values of N_MIN and N_MAX are specific to that algorithm. The values of N_MAX and N_MIN MAY be equal. Each algorithm SHOULD accept a nonce with a length of twelve (12) octets. Randomized or stateful algorithms, which are described below, MAY have an N_MAX value of zero.
    ...
    An Authenticated Encryption algorithm MAY incorporate or make use of a random source, e.g., for the generation of an internal initialization vector that is incorporated into the ciphertext output. An AEAD algorithm of this sort is called randomized; though note that only encryption is random, and decryption is always deterministic. A randomized algorithm MAY have a value of N_MAX that is equal to zero.

An Authenticated Encryption algorithm MAY incorporate internal state information that is maintained between invocations of the encrypt operation, e.g., to allow for the construction of distinct values that are used as internal nonces by the algorithm. An AEAD algorithm of this sort is called stateful. This method could be used by an algorithm to provide good security even when the application inputs zero-length nonces. A stateful algorithm MAY have a value of N_MAX that is equal to zero.

One idea potentially worth exploring is the passing of a zero-length/null Nonce, which might even be the default. Passing such a "special" Nonce value would randomize the actual Nonce value, which would then be available as Encrypt's output.

If INonceProvider stays because "reasons", another idea is to add a Reset() call, which would be triggered every time the AuthenticatedEncryptor is rekey'ed. If, on the other hand, the plan is to never rekey AuthenticatedEncryptor instances, this will thrash the GC if we want to build a streaming chunk-encrypting API (ex. chunk = network packet) where every chunk must be encrypted with a different key (ex. Netflix MSL protocol, Inferno, others). Especially for parallel enc/dec operations where we'd want to maintain a pool of AEAD engines and borrow instances from that pool to do enc/dec. Let's give the GC some love :)

From my point of view the sole purpose of crypto primitives is to implement well-designed higher-level security protocols. Every such protocol insists on generating nonces in its own way. For example:

  • TLS 1.2 follows the recommendations of RFC 5116 and concatenates a 4-byte IV with an 8-byte counter,
  • TLS 1.3 xor's an 8-byte counter padded to 12 bytes with a 12-byte IV (sketched below),
  • Noise uses an 8-byte counter padded to 12 bytes in big-endian byte order for AES-GCM and an 8-byte counter padded to 12 bytes in little-endian byte order for ChaCha/Poly.
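
To make that kind of construction concrete, here is a small sketch of the TLS 1.3-style per-record nonce derivation mentioned above (illustrative only, not a full TLS implementation):

```C#
using System;
using System.Buffers.Binary;

static class Tls13NonceSketch
{
    // Per-record nonce = (8-byte record sequence number, left-padded to 12 bytes) XOR (12-byte write IV).
    public static byte[] Derive(ReadOnlySpan<byte> writeIv, ulong sequenceNumber)
    {
        if (writeIv.Length != 12)
            throw new ArgumentException("TLS 1.3 write IV is 12 bytes.", nameof(writeIv));

        Span<byte> padded = stackalloc byte[12];
        BinaryPrimitives.WriteUInt64BigEndian(padded.Slice(4), sequenceNumber);

        var nonce = new byte[12];
        for (int i = 0; i < 12; i++)
            nonce[i] = (byte)(writeIv[i] ^ padded[i]);
        return nonce;
    }
}
```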

GCM is way too brittle for randomized nonces at typical nonce sizes (96 bit). And I'm not aware of any security protocol that actually supports randomized nonces.

There is not much demand for more APIs providing crypto primitives. 99.9% of developers need high-level recipes for security-related scenarios: storing a password in a database, encrypting a file at rest, securely transferring a software update, etc.

However, APIs for such high-level recipes are rare. The only APIs available are often just HTTPS and the crypto primitives, which forces developers to roll their own security protocols. IMO the solution is not to put a lot of effort into designing APIs for working with primitives, but into APIs for high-level recipes.

Thanks for the feedback, everyone! A couple of questions:

  1. While streaming decryption can fail catastrophically, streaming encryption could be doable. Does streaming encryption (along with a non-streaming option) but only non-streaming decryption sound more useful? If yes, there are a couple of problems to solve:
    a. Some algorithms (CCM, SIV) don't actually support streaming. Should we put streaming encryption on the base class and buffer streamed inputs or throw from the derived classes?
    b. Streaming AAD likely isn't possible due to implementation constraints, but different algorithms need it at different times (some need it at the beginning, some don't need it until the end). Should we require it up-front or have a method for adding it that works when the individual algorithms allow?
  2. We're open to improvements to INonceProvider as long as the point is that users need to write code generating a new nonce. Does anyone have another proposed shape for it?

1a. I think it could be an issue not to warn the user early. Imagine the scenario from someone above, a 10gb file. They think they are getting streaming, then sometime later another dev changes the cipher and next thing the code is buffering 10gb (or trying) before returning a value.

1b. Again with the "streaming" or networking idea: for instance with AES GCM etc. you don't get the AAD information until the end for decryption. As for encryption, I am yet to see a case where you don't have the data upfront. So I would say at least for encryption you should require it at the start; decryption is more complex.

2. I think it's really a non-issue; supplying the "bytes" for the nonce through an interface or just directly is neither here nor there. You can achieve the same thing both ways, I just find it uglier for a primitive but am not vehemently opposed if it makes people sleep better at night. I would just strike this off as a done deal, and move on with the other issues.

Regarding the deliberation process

@bartonjs: We could argue all day about whether closed-door decisions devoid of community involvement are an effective justification, but we'd go off-topic, so I'll let that be. Plus, without richer face-to-face or realtime comms, I don't want to upset anyone there.

Regarding streaming

1. the 'streaming implies no AES-GCM security' argument

Specifically, streaming => return decrypted data to caller before tag verification => no security. This isn't sound. @bartonjs claims 'chosen ciphertext => watch output => recover key' while @drawaes claims 'control input for a block => therefore know output => "work from there" '

Well, in AES-GCM, the only thing the tag does is integrity verification (tamper protection). It has zero impact on privacy. In fact, if you remove the GCM/GHASH tag processing from AES-GCM, you simply get AES-CTR mode; it's this construct that handles the privacy aspect. And CTR is malleable to bit flips, but isn't "broken" in any of the ways you two are asserting (recovering the key or plaintext), because that would mean the fundamental AES primitive is compromised. If your cryptanalyst (who is it?) knows something the rest of us don't know, he/she should be publishing it. The only thing possible is that an attacker can flip bit N and know that bit N of the plaintext was flipped - but they never know what the actual plaintext is (a toy illustration follows the summary below).

So

1) plaintext privacy is always enforced
2) integrity verification is simply deferred (till end of stream) and
3) no key is ever compromised.
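
As a toy illustration of the malleability point (the keystream below is a stand-in for AES-CTR output; this is not real AES-GCM code):

```C#
using System;

static class CtrMalleabilityDemo
{
    static byte[] Xor(byte[] a, byte[] b)
    {
        var r = new byte[a.Length];
        for (int i = 0; i < a.Length; i++) r[i] = (byte)(a[i] ^ b[i]);
        return r;
    }

    static void Main()
    {
        byte[] keystream  = { 0x3A, 0x91, 0x5C, 0x07 };              // stand-in for AES-CTR keystream
        byte[] plaintext  = { (byte)'P', (byte)'A', (byte)'Y', (byte)'1' };

        byte[] ciphertext = Xor(plaintext, keystream);               // "encrypt"
        ciphertext[3] ^= 0x01;                                       // attacker flips one ciphertext bit in transit

        byte[] decrypted  = Xor(ciphertext, keystream);              // "decrypt"
        Console.WriteLine((char)decrypted[3]);                       // prints '0': the same bit flipped in the plaintext
        // The attacker never learns the plaintext itself; the GCM tag is what would
        // catch the modification - but only when the tag is finally checked.
    }
}
```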

For products and systems where streaming is foundational, you can now at least engineer a tradeoff where one momentarily steps down from AEAD to regular AES encryption - then steps back up to AEAD upon tag verification. That unlocks several innovative concepts to embrace security instead of going "You want to buffer all that - are you crazy? We can't do encryption!".

All because you want to implement just EncryptFinal rather than both Encrypt and EncryptFinal (or equivalents).

2. Not specific to GCM!

Now, AES-GCM isn't some magical beast with "oops moments" galore. It's simply AES-CTR + GHASH (a sort of hash, if I may). Nonce considerations related to privacy are inherited from CTR mode, and tag considerations related to integrity come from the variable tag sizes allowed in the spec. Still, AES-CTR + GHASH is very similar to something like AES-CBC + HMAC-SHA256 in that the first algorithm handles privacy and the second handles integrity. In AES-CBC + HMAC-SHA256, bit flips in the ciphertext will corrupt the corresponding block in the decrypted text (unlike CTR) AND also deterministically flip bits in the following decrypted plaintext block (like CTR). Again, an attacker won't know what the resulting plaintext will be - just that bits were flipped (like CTR). Finally, the integrity check (HMAC-SHA256) will catch it, but only after processing the last byte (like GHASH).

So if your argument of holding back ALL decrypted data until the integrity check passes is truly sound, it should be applied consistently. So ALL data coming out of the AES-CBC path should also be buffered (internally by the library) till HMAC-SHA256 passes. That basically means that on .NET, no streaming data can ever benefit from AEAD advances. .NET forces streaming data to downgrade - to pick between no encryption or regular encryption. No AEAD. Where buffering is technically impractical, architects should at least have the option to warn end-users that "drone footage may be corrupt" rather than "no eyes for you".

3. It's the best we have

Data is getting larger and security needs to be stronger. Streaming is also a reality designers have to embrace. Until the world crafts a truly integrated AEAD algorithm which can natively detect mid-stream tampering, we are stuck with encryption + authentication as bolted-on buddies. True AEAD primitives are being researched, but we've just got encryption + authentication for now.

I care less about "AES-GCM" specifically than about a fast, popular AEAD algorithm that can support streaming workloads - super prevalent in a data-rich, hyper-connected world.

4. Use AES-CBC-HMAC, Use (insert workaround)

the Crypto Board would recommend NOT using GCM for first scenario, but rather CBC+HMAC.

Leaving aside everything mentioned above or even the specifics of the scenario - suggesting AES-CBC-HMAC isn't free. It's ~3x slower than AES-GCM, since AES-CBC encryption is non-parallelizable and since GHASH can be accelerated via the PCLMULQDQ instruction. So if you're at 1GB/sec with AES-GCM, you're now going to hit ~300MB/sec with AES-CBC-HMAC. This again perpetuates the "Crypto slows you down, skip it" mindset - one that security folks try hard to fight.

encrypting each 4k frame

So video codecs should suddenly do encryption? Or the encryption layer must now understand video codecs? It's just a bitstream at the data-security layer. The fact that it's video/genomic data/images/a proprietary format etc. shouldn't be a security-layer concern. An overall solution shouldn't co-mingle core responsibilities.

Nonce

NIST allows for randomized IVs for lengths of at least 96 bits - see section 8.2.2 of NIST SP 800-38D. Nothing new here; nonce requirements come from CTR mode, which is also fairly standard across most stream ciphers. I don't understand the sudden fear towards nonces - it's always been "number used once". Still, while the INonce debate makes for a clunky interface, at least it doesn't eliminate innovation like the no-stream-for-you imposition. I'll concede to INonce any day if we can get the AEAD security + streaming workload innovations. I hate calling something as basic as streaming an innovation - but that's where I fear we would regress.
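
For reference, the RBG-based construction from section 8.2.2 boils down to something as simple as the following sketch (note that SP 800-38D additionally bounds the number of encryption invocations per key when IVs are generated this way):

```C#
using System.Security.Cryptography;

static class RandomGcmIvSketch
{
    public static byte[] NewIv()
    {
        // 96-bit IV, fully random, per SP 800-38D section 8.2.2.
        // SP 800-38D then limits the number of invocations per key (2^32)
        // to keep the collision probability negligible.
        byte[] iv = new byte[12];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(iv);
        return iv;
    }
}
```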

I'd love to be proven wrong

I'm just a guy who, after a long day at work, gave up movie night with my kids to type this. I'm tired and could be wrong. But let's at least have an open, fact-based community dialog rather than anecdotes or "committee reasons" or some other voodoo. I'm in the business of promoting secure .NET and Azure innovations. I think we've got aligned goals.

Speaking of community dialog ...

Can we please have a community Skype call? Expressing a complex topic like this blows up into a giant wall of text. Pretty please?

Please don't do a Skype call - that's the very definition of "closed door meeting", with no records available for the community. Github issues are the right vehicle for all parties to have a civil documented discourse (ignoring MS-comment-removal precedents).

MS Crypto Review Board probably did a Skype call too. It's not the fault of the MS folks participating in this thread - they likely have very limited access to & persuasion power over the ivory towers of MS Crypto Review Board (whatever/whoever that is).

Regarding streaming AEAD:

Byte-size streaming _encryption_ is possible for MAC-last modes like GCM and CTR+HMAC, but not possible for MAC-first modes like CCM. Byte-size streaming _decryption_ is fundamentally leaking and therefore is not considered by anyone. Block-size streaming _encryption_ is also possible for CBC+HMAC, but that does not change anything. I.e. byte-size or block-size approaches to streaming AEAD are flawed.

Chunk-size streaming _encryption_ and _decryption_ work great, but they have 2 constraints:

  • they require buffering (beyond-block-size). This can be done by the library/API if buffering is controlled/capped (ex. Inferno), or left to the upper layer (calling layer) to deal with. Either way works.

  • Chunked streaming AEAD is not standardized. Ex. nacl-stream, Inferno, MS-own DataProtection, make-your-own.

This is just a summary of what everyone in this discussion so far already knows.
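
For anyone unfamiliar with the chunk-size approach mentioned above, here is a rough conceptual sketch (the one-shot AEAD call is abstracted behind a delegate; none of the names below are proposed API):

```C#
using System;
using System.Buffers.Binary;

// The one-shot AEAD call is abstracted away; it returns ciphertext || tag for one chunk.
delegate byte[] AeadSeal(byte[] nonce, byte[] associatedData, byte[] plaintextChunk);

static class ChunkedEncryptionSketch
{
    public static void EncryptChunks(byte[] plaintext, int chunkSize, byte[] noncePrefix,
                                     AeadSeal seal, Action<byte[]> writeChunk)
    {
        int chunkCount = Math.Max(1, (plaintext.Length + chunkSize - 1) / chunkSize);
        for (int i = 0; i < chunkCount; i++)
        {
            int offset = i * chunkSize;
            int length = Math.Min(chunkSize, plaintext.Length - offset);
            byte[] chunk = plaintext.AsSpan(offset, length).ToArray();

            // Serial nonce: fixed 4-byte prefix followed by a big-endian 8-byte chunk counter.
            var nonce = new byte[12];
            noncePrefix.AsSpan(0, 4).CopyTo(nonce);
            BinaryPrimitives.WriteUInt64BigEndian(nonce.AsSpan(4), (ulong)i);

            // The AAD binds the chunk index and a last-chunk flag, so chunks cannot be
            // reordered, dropped, or truncated without a tag failure on some chunk.
            var aad = new byte[9];
            BinaryPrimitives.WriteUInt64BigEndian(aad, (ulong)i);
            aad[8] = (byte)(i == chunkCount - 1 ? 1 : 0);

            writeChunk(seal(nonce, aad, chunk));
        }
    }
}
```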

@sdrapkin, to make sure I understand properly, are you ok with this API providing streaming encryption, but no streaming decryption?

@sdrapkin well, humans brainstorming in real time is certainly beneficial, record keeping concerns can be resolved with meeting minutes. Back to the technical side, while chunking works for streaming decryption, that's not a low level security primitive. It's a custom protocol. And a non-standard one like you noted.

@morganbr

_are you ok with this API providing streaming encryption, but no streaming decryption?_

No, I'm not. If such API were available, it would be easy to create a stream-encrypted ciphertext of a size that no buffer-decryption will be able to decrypt (out of memory).

^^^^ This. There hasn't been much agreement so far, but I think we can all agree that, whichever way it goes, an asymmetric API would be a disaster. Both from a "hey, where are the stream decrypt methods I thought I could rely on, because there were encrypt methods" perspective, and because of @sdrapkin's comments above.

@Drawaes Agreed. Asymmetric enc/dec API would be awful.

Any updates folks?

Apparently I conflated a few attacks.

Inherent weaknesses in stream ciphers (which AES-CTR and AES-GCM are) allow chosen-ciphertext attacks to achieve arbitrary plaintext recovery. The defense against chosen-ciphertext attacks is authentication; so AES-GCM is immune... unless you're doing streaming decryption and you can identify from side-channel observations what the plaintext would have been. For example, if the decrypted data is being processed as XML it'll fail very quickly if characters other than whitespace or < are at the beginning of the decrypted data. So that's "streaming decryption re-introduces concerns with stream cipher design" (which, you might have noticed, .NET does not have any of).

While looking for where the key recovery was coming from, there are papers like Authentication weaknesses in GCM (Ferguson/Microsoft), but that one is recovering the authentication key based on short tag sizes (which is part of why the Windows implementation only allows 96-bit tags). I was probably advised about other authentication-key recovery vectors as to why streaming GCM is dangerous.

In an earlier comment @sdrapkin noted "Byte-size streaming decryption is fundamentally leaking and therefore is not considered by anyone. ... Byte-size or Block-size approaches to streaming AEAD are flawed.". That, combined with CCM (and SIV) not being capable of doing streaming encryption, and the comment that it would be weird to have one streaming and not the other, suggests that we're back to the proposal of just having one-shot encrypt and decrypt.

So it seems we're right back at my last API proposal (https://github.com/dotnet/corefx/issues/23629#issuecomment-329202845). Unless there are other outstanding issues that I managed to forget while taking some time off.

Welcome back @bartonjs

I'm going to sleep shortly but briefly:

  1. We've conflated protocol design with primitive design before on this thread. I'll just say that chosen-ciphertext attacks are a protocol design concern, not a primitive concern.

  2. Streaming AEAD decryption at least allows you to have privacy and then immediately upgrades to privacy + authenticity at the last byte. Without streaming support on AEAD (i.e. falling back to traditional, privacy-only ciphers), you're permanently restricting folks to a lower, privacy-only assurance.

If technical merits are insufficient or you're (rightfully) skeptical of the authoritativeness of my arguments, I'll try the outside-authority route. You should know that your actual underlying implementation supports AEAD (including AES GCM) in streaming mode. The Windows core OS (bcrypt) allows for streaming GCM via the BCryptEncrypt or BCryptDecrypt functions. See dwFlags there. Or a user code example. Or a Microsoft-authored CLR wrapper. Or that the implementation has been NIST FIPS 140-2 certified as recently as earlier this year. Or that both Microsoft and NIST spent significant resources around the AES implementation and certified it here and here. And despite all of this, nobody has faulted the primitives. It makes no sense at all for .NET Core to suddenly come around and impose its own crypto-thesis to water down the powerful underlying implementation. Especially when BOTH streaming and one-shot can be supported simultaneously, very trivially.

More? Well, the above is true for OpenSSL, even with their 'newer' evp APIs.

And it's true for BouncyCastle.

And it's true with Java Cryptography Architecture.

cheers!
Sid

@sidshetye ++10. If the crypto board is so concerned, why do they let Windows CNG do this?

If you check Microsoft's NIST FIPS-140-2 AES validation (ex. # 4064), you'll notice the following:

AES-GCM:

  • Plain Text Lengths: 0, 8, 1016, 1024
  • AAD Lengths: 0, 8, 1016, 1024

AES-CCM:

  • Plain Text Length: 0-32
  • AAD Length: 0-65536

There is no validation for streaming. I'm not even sure whether NIST checks that, e.g., an AES-GCM implementation should not be allowed to encrypt more than 64Gb of plaintext (another ridiculous limitation of GCM).

I am not massively wedded to streaming, as my use shouldn't go over 16k; however, fragmented buffers would be nice and should pose no risk at all (I actually suspect that CNG made its interface the way it is for exactly that purpose)... e.g. I want to be able to pass in a number of spans or similar (a linked list, for instance) and have it decrypt in one go. If it decrypts to a contiguous buffer, that's all fine.

So I guess moving the shadowy crypto board on the "streaming style" API is a no-go for now, so let's move forward and make a one-shot API. There is always scope to expand an API IF enough people show a need later.

@sdrapkin the point is that it's the streaming API that's gone through extensive review by NIST labs and MSFT. Each build being validated costs between $50,000 and $80,000, and MSFT (and OpenSSL and Oracle and other crypto heavyweights) have invested HEAVILY in getting these APIs and implementations validated for over 10 years. Let's not get distracted by the test plan's specific plain-text sizes, because I'm confident .NET will support sizes other than 0, 8, 1016, 1024 regardless of streaming or one-shot. The point is that all those battle-tested APIs (literally; on weapon support systems), on all these platforms, support streaming AEAD at the crypto-primitive API level. Unfortunately, every argument so far against it has been an application- or protocol-level concern cited as a pseudo-concern at the crypto primitive level.

I'm all for 'let the best idea win', but unless the .NET Core crypto team (MSFT or community) has some groundbreaking discovery, I just don't see how everyone doing crypto so far, from all these different organizations, is wrong and they are right.

PS: I know we're in disagreement here but we all want what's best for the platform and its customers.

@Drawaes unless the AEAD interface (not necessarily implementation) being defined today supports a streaming API surface, I don't see how folks can extend it without having two interfaces or custom interfaces. That would be a disaster. I'm hoping this discussion leads to an interface that's future proof (or very least, mirrors other AEAD interfaces that have been around for many years!).

I tend to agree. But this issue is going nowhere fast, and when that happens we are likely to hit a crunch point: either it won't make it for 2.1, or it will have to be rammed through with no time left to iron out issues. I'll be honest, I have gone back to my old wrappers and am just revamping them for 2.0 ;)

We've got a few reference APIs: Java, OpenSSL, C# Bouncy Castle or CLR Security. Frankly, any of them will do, and long term I wish for C# to have something like Java's 'Java Cryptography Architecture', where all crypto implementations are written against a well-established interface, allowing one to swap out crypto libraries without impacting user code.

Back here, I think it's best we extend the .NET Core's ICryptoTransform interface as

public interface IAuthenticatedCryptoTransform : ICryptoTransform 
{
    bool CanChainBlocks { get; }
    byte[] GetTag();
    void SetExpectedTag(byte[] tag);
}
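
For illustration, a sketch of the calling pattern such a transform-based shape would imply (assuming the interface above exists; the helper methods below are hypothetical, and how a transform instance is obtained is left out):

```C#
using System;

static class AuthenticatedTransformUsageSketch
{
    public static byte[] EncryptWithTag(IAuthenticatedCryptoTransform enc, byte[] plaintext, out byte[] tag)
    {
        byte[] body = new byte[plaintext.Length];
        int written = enc.TransformBlock(plaintext, 0, plaintext.Length, body, 0);
        byte[] final = enc.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        tag = enc.GetTag();                              // only meaningful after the final block

        byte[] result = new byte[written + final.Length];
        Buffer.BlockCopy(body, 0, result, 0, written);
        Buffer.BlockCopy(final, 0, result, written, final.Length);
        return result;
    }

    public static byte[] DecryptWithTag(IAuthenticatedCryptoTransform dec, byte[] ciphertext, byte[] expectedTag)
    {
        dec.SetExpectedTag(expectedTag);                 // must be supplied before the final block
        byte[] body = new byte[ciphertext.Length];
        int written = dec.TransformBlock(ciphertext, 0, ciphertext.Length, body, 0);
        byte[] final = dec.TransformFinalBlock(Array.Empty<byte>(), 0, 0);   // would throw on tag mismatch

        byte[] result = new byte[written + final.Length];
        Buffer.BlockCopy(body, 0, result, 0, written);
        Buffer.BlockCopy(final, 0, result, written, final.Length);
        return result;
    }
}
```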

If we're Spanifying all byte[]s, that should permeate the entire API in the System.Security.Cryptography namespace for overall consistency.

Edit: Fixed JCA links

If we're Spanifying all byte[]s, that should permeate the entire API in the System.Security.Cryptography namespace for overall consistency.

We did that already. Everything except ICryptoTransform, because we can't change interfaces.

I think it's best we extend the .NET Core's ICryptoTransform ...

The problem with this is the calling pattern is very awkward with getting the tag out at the end (particularly if CryptoStream is involved). I wrote this originally, and it was ugly. There's also the problem of how to get one of these, since the GCM parameters are different than the CBC/ECB parameters.

So, here are my thoughts.

  • Streaming decryption is dangerous for AE.
  • In general, I'm a fan of "give me the primitive, and let me manage my risk"
  • I'm also a fan of ".NET shouldn't (easily) allow completely unsafe things, because that's some of its value proposition"
  • If, as I misunderstood originally, the risks of doing GCM decryption badly were input key recovery then I'd still be at "this is too unsafe". (The difference between .NET and everything else would be "having taken longer to do this the world has learned more")
  • But, since it isn't, if you really want the training wheels to come off, then I guess I'll entertain that notion.

My fairly raw thoughts to that end (adding to the existing suggestions, so the one-shot remains, though I guess as a virtual default impl instead of an abstract):

```C#
partial class AuthenticatedEncryptor
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryEncrypt(ReadOnlySpan<byte> data, Span<byte> encryptedData, out int bytesRead, out int bytesWritten);
    // false if remainingEncryptedData is too small, throws if other inputs are too small, see NonceOrIVSizeInBits and TagSizeInBits properties.
    // NonceOrIvUsed could move to Initialize, but then it might be interpreted as an input.
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingEncryptedData, out int bytesWritten, Span<byte> tag, Span<byte> nonceOrIvUsed);
}

partial class AuthenticatedDecryptor
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> tag, ReadOnlySpan<byte> nonceOrIv, ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryDecrypt(ReadOnlySpan<byte> data, Span<byte> decryptedData, out int bytesRead, out int bytesWritten);
    // throws on bad tag, but might leak the data anyways.
    // (remainingDecryptedData is required for CBC+HMAC, and so may as well add remainingData, I guess?)
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingDecryptedData, out int bytesWritten);
}
```

AssociatedData comes at Initialize, because algorithms that need it last can hold on to it, and algorithms that need it first can’t have it any other way.
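
To sanity-check the ergonomics, here is a hedged sketch of how a caller might drive that streaming shape (the buffer, tag, and nonce sizes are assumptions; a real caller would consult the TagSizeInBits/NonceOrIVSizeInBits properties mentioned above, and the key and nonce/IV provider are assumed to have been supplied to the derived type's constructor per the earlier proposal):

```C#
using System;
using System.IO;

static class StreamingEncryptorUsageSketch
{
    public static void EncryptStream(AuthenticatedEncryptor encryptor, Stream input, Stream output,
                                     ReadOnlySpan<byte> associatedData)
    {
        encryptor.Initialize(associatedData);

        byte[] inBuf = new byte[16 * 1024];
        byte[] outBuf = new byte[16 * 1024 + 16];        // slack for any block-size expansion
        int read;
        while ((read = input.Read(inBuf, 0, inBuf.Length)) > 0)
        {
            if (!encryptor.TryEncrypt(inBuf.AsSpan(0, read), outBuf, out int consumed, out int written))
                throw new InvalidOperationException("Output buffer too small.");
            output.Write(outBuf, 0, written);
            // A real caller would carry any unconsumed input (consumed < read) into the next iteration.
        }

        Span<byte> tag = stackalloc byte[16];            // assumed sizes; use TagSizeInBits /
        Span<byte> nonceUsed = stackalloc byte[12];      // NonceOrIVSizeInBits in real code
        if (!encryptor.TryFinish(ReadOnlySpan<byte>.Empty, outBuf, out int finalWritten, tag, nonceUsed))
            throw new InvalidOperationException("Output buffer too small.");
        output.Write(outBuf, 0, finalWritten);
        // The tag and nonceUsed are then stored or transmitted alongside the ciphertext.
    }
}
```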

Once a shape is decided for what streaming would look like (and whether people think CCM should internally buffer, or should throw, when in streaming encryption mode) then I'll go back to the board.

@bartonjs I know what you mean about plucking and programming the tag from the end of the stream for symmetry across encrypt/decrypt. It’s tricky but worse if left to each user to solve. I have an implementation I can share under MIT; will need to look internally with my team (not at my desk/mobile)

A middle ground could be like OpenSSL or NT’s bcrypt where you need to plug the tag right before the final decrypt call since that’s when the tag comparisons happen. i.e. a SetExpectedTag (before final decrypt) and GetTag (after final encrypt) would work but offloads tag management to the user. Most will simply append the tag to the cipherstream since it’s the natural temporal order.

I do think expecting the tag in Initialize itself (in decrypt) breaks symmetry in space (byte flow) and time (tag check at end, not start) which limits its usefulness. But the above Tag APIs resolve that.

Also for encrypt, Initialize needs the IV before any crypto transforms.

Lastly, for encrypt and decrypt, Initialize needs the AES encryption keys before any transforms. (I’m missing something obvious or you forgot to type that bit?)

I do think expecting the tag in Initialize itself (in decrypt) breaks symmetry

In CBC+HMAC the usual recommendation is to verify the HMAC before starting any decryption, so it's a tag-first decryption algorithm. Similarly, there could be a "pure AE" algorithm which does destructive operations on the tag during computations and merely checks that the final answer was 0. So, like the associated data value, since there could be algorithms which need it first, it has to come first in a fully generalized API.
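
A minimal sketch of why CBC+HMAC is tag-first in the encrypt-then-MAC formulation (illustrative only; it hard-codes algorithm choices and omits key separation and associated data):

```C#
using System;
using System.Security.Cryptography;

static class CbcHmacSketch
{
    public static byte[] VerifyThenDecrypt(byte[] encKey, byte[] macKey,
                                           byte[] iv, byte[] ciphertext, byte[] tag)
    {
        // Verify the HMAC over (IV || ciphertext) before any decryption happens.
        using (var hmac = new HMACSHA256(macKey))
        {
            byte[] macInput = new byte[iv.Length + ciphertext.Length];
            Buffer.BlockCopy(iv, 0, macInput, 0, iv.Length);
            Buffer.BlockCopy(ciphertext, 0, macInput, iv.Length, ciphertext.Length);

            byte[] computed = hmac.ComputeHash(macInput);
            if (!CryptographicOperations.FixedTimeEquals(computed, tag))
                throw new CryptographicException("Authentication tag mismatch.");
        }

        // Only now decrypt.
        using (var aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;
            aes.Padding = PaddingMode.PKCS7;
            using (var dec = aes.CreateDecryptor(encKey, iv))
                return dec.TransformFinalBlock(ciphertext, 0, ciphertext.Length);
        }
    }
}
```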

Floating them out into SetAssociatedData and SetTag has the problem that while the base class was algorithm-independent, the usage becomes algorithm-dependent. Changing AesGcm to AesCbcHmacSha256 or SomeTagDestructiveAlgorithm would now result in TryDecrypt throwing because the tag was not yet provided. To me that is worse than not being polymorphic at all, so allowing the flexibility suggests breaking the model apart to be fully isolated per algorithm. (Yes, it could be controlled by more algorithm-identification characteristic properties like NeedsTagFirst, but that really just leads to it being harder to use.)

Also for encrypt, Initialize needs the IV before any crypto transforms.

Lastly, for encrypt and decrypt, Initialize needs the AES encryption keys before any transforms.

The key was a class ctor parameter. The IV/nonce comes from the IV/nonce provider in the ctor parameter.

The provider model solves SIV, where no IV is given during encrypt, one is generated on behalf of the data. Otherwise SIV has the parameter and requires that an empty value be provided.

or you forgot to type that bit?

The streaming methods were being added to my existing proposal, which already had the key and IV/nonce provider as ctor parameters.

@bartonjs : Good point that some algos could want tag first while others at the end and thanks for the reminder that it's an addition to the original spec. I found that considering a use case makes it easier, so here is a cloud-first example:

We're going to perform analytics on one or more 10GB AES-GCM encrypted files (i.e. tags after ciphertext) kept in storage. An analytics' worker concurrently decrypts multiple inbound streams into separate machine/clusters and after last byte + tag checks, starts off each analysis workload. All storage, worker, analytics VMs are in Azure US-West.

Here, there is no way to fetch the tag at the end of every stream and provide it to AuthenticatedDecryptor's Initialize method. So even if a user volunteers to modify code for GCM usage, they can't even begin to use the API.

Come to think of it, the only way we could have an API that accommodates various AEADs AND requires no user code changes is if the crypto providers for the different AEAD algorithms auto-magically handle the tags. Java does this by putting the tag at the end of the ciphertext for GCM and plucking it out during decryption without user intervention. Other than that, anytime someone changes the algorithm significantly (e.g. CBC-HMAC => GCM), they will have to modify their code because of the mutually exclusive nature of tag-first and tag-last processing.

IMHO, we should first decide if

Option 1) The algorithm providers internally handle tag management (like Java)

or

Option 2) Expose enough on the API for users to do it themselves (like WinNT bcrypt or openssl)

Option 1 would really simplify the overall experience for library consumers because buffer management can get complex. Solve it well in the library and each user won't have to solve it every time. Plus all AEADs get the same interface (tag-first, tag-last, tag-less) and swapping out algorithms is simpler too.

My vote would be for option 1.

Finally, we were able to dig up our implementation allowing ICryptoTransform streaming operations over GCM to automatically pluck out the tag in-stream (source). This was a significant update to CLR Security's own wrapper, and despite the additional buffer copies it's still really fast (~4GB/sec on our test MacBook Pro in Windows 10 Boot Camp). We basically wrapped CLR Security to create option 1 for ourselves so we don't need to do it everywhere else. This visual really helps explain what's going on within TransformBlock and TransformFinalBlock of the ICryptoTransform interface.

@sidshetye I'm not sure why your cloud-first example is blocked. If you're reading from storage you can download the last few tag bytes first and provide that to the decryptor ctor. If using the Azure Storage APIs this would be accomplished via CloudBlockBlob.DownloadRangeXxx.

@GrabYourPitchforks Not to get too sidetracked on that example but that's a specific capability of Azure Blob Storage. In general, VM based storage (IaaS) or non-Azure Storage workloads typically get a network stream that's not seekable.

I, personally, am very excited to see @GrabYourPitchforks - yay!

We're going to perform analytics on one or more 10GB AES-GCM encrypted files (i.e. tags after ciphertext) kept in storage. An analytics' worker concurrently decrypts multiple inbound streams into separate machine/clusters and after last byte + tag checks, starts off each analysis workload. All storage, worker, analytics VMs are in Azure US-West.

@sidshetye , you were so adamant about keeping dumb-n-dangerous primitives and smart-n-huggable protocols separate! I had a dream - and I believed it. And then you throw this at us. This is a protocol - a system design. Whoever designed that protocol you described - messed up. There is no point crying over inability to fit a square peg into a round hole now.

Whoever GCM-encrypted 10Gb files is not only living dangerously close to the primitive edge (GCM is no good after 64Gb), but there was also an implicit assertion that the whole ciphertext will need to be buffered.

Whoever GCM-encrypts 10Gb files is making a protocol mistake with overwhelming probability. The solution: chunked encryption. TLS has variable-length 16k-limited chunking, and there are other, simpler, PKI-free flavors. The "cloud-first" sex appeal of this hypothetical example does not diminish the design mistakes.

(I have a lot of catching up to do on this thread.)

@sdrapkin's raised a point about reusing the IAuthenticatedEncryptor interface from the Data Protection layer. To be honest I don't think that's the right abstraction for a primitive, as the Data Protection layer is quite opinionated in how it performs cryptography. For instance, it forbids self-selection of an IV or nonce, it mandates that a conforming implementation understand the concept of AAD, and it produces a result that's somewhat proprietary. In the case of AES-GCM, the return value from IAuthenticatedEncryptor.Encrypt is the concatenation of a weird almost-nonce-thing used for subkey derivation, the ciphertext resulting from running AES-GCM over the provided plaintext (but not the AAD!), and the AES-GCM tag. So while each step involved in generating the protected payload is secure, the payload itself doesn't follow any type of accepted convention, and you're not going to find anybody aside from the Data Protection library that can successfully decrypt the resulting ciphertext. That makes it a good candidate for an app developer-facing library but a horrible candidate for an interface to be implemented by primitives.

I should also say that I don't see considerable value in having a One True Interface(tm) IAuthenticatedEncryptionAlgorithm that all authenticated encryption algorithms are supposed to implement. These primitives are "complex", unlike simple block cipher primitives or hashing primitives. There are simply too many variables in these complex primitives. Is the primitive AE only, or is it AEAD? Does the algorithm accept an IV / nonce at all? (I've seen some that don't.) Are there any concerns with how the input IV / nonce or data must be structured? IMO the complex primitives should simply be standalone APIs, and higher-level libraries would bake in support for the specific complex primitives they care about. Then the higher-level library exposes whatever uniform API it believes is appropriate for its scenarios.

@sdrapkin We're going off topic again. I'll just say that a system is built using primitives. The crypto primitives here are bare and powerful, while the system/protocol layer handled the buffering - that too at a cluster level, certainly not in the main system memory that one-shot primitives would force. The 'chunking' boundary is X (X = 10GB here) because it's < 64GB, because the buffering capacity of the cluster was nearly limitless, and because nothing would/could start until the last byte is loaded in the cluster. This is exactly the separation of concerns, optimizing each layer for its strengths, that I've been talking about. And this can only happen if the underlying primitives don't handicap higher-layer designs/limitations (note that most real-world apps come with their own legacy handicaps).

NIST 800-38d sec9.1 states:

In order to inhibit an unauthorized party from controlling or influencing the generation of IVs,
GCM shall be implemented only within a cryptographic module that meets the requirements of
FIPS Pub. 140-2. In particular, the cryptographic boundary of the module shall contain a
“generation unit” that produces IVs according to one of the constructions in Sec. 8.2 above.
The documentation of the module for its validation against the requirements of FIPS 140-2 shall
describe how the module complies with the uniqueness requirement on IVs.

That implies to me that GCM IVs must be auto-generated internally (and not passed in externally).

@sdrapkin Good point, but if you read even closer you'll see that for IV lengths of 96 bits and above, section 8.2.2 allows for generating an IV with a random bit generator (RBG) where at least 96 bits are random (you could just zero the other bits). I did mention this last month on this thread itself (here, under nonce).

TL;DR: INonce is a trap leading to non-compliance with NIST and FIPS guidelines.

Section 9.1 simply says that, for FIPS 140-2, the IV generation unit (fully random, i.e. sec 8.2.2, or a deterministic implementation, i.e. sec 8.2.1) must lie within the module boundary undergoing FIPS validation. Since ...

  1. RBGs are already FIPS validated
  2. IV lens >= 96 is recommended
  3. designing an IV generation unit that persists across reboots and indefinite loss of power into a crypto primitive layer is hard
  4. getting 3 above implemented within the crypto library AND getting it certified is hard and expensive ($50K for anything resulting in a non bit-exact build image)
  5. No user code will ever implement 3 and get it certified because of 4 above. (Let's leave aside some exotic military/govt installations.)

... most crypto libraries (see Oracle's Java, WinNT's bcryptprimitives, OpenSSL etc.) undergoing FIPS certification use the RBG route for the IV and simply take a byte array for input. Note that having the INonce interface is actually a trap from NIST's and FIPS' perspective, because it implicitly suggests that a user should pass an implementation of that interface to the crypto function. But any user implementation of INonce is almost guaranteed to have NOT undergone the 9-month+ and $50K+ NIST certification process. Yet, if they had just sent a byte array using the RBG construct (already in the crypto library), they would be fully compliant with the guidelines.
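
For what it's worth, a simple RBG-backed provider is about all most users would ever write. The interface shape below is a hypothetical stand-in (INonceProvider's exact signature was never pinned down in this thread), so treat it as a sketch of the pattern rather than a proposal:

```C#
using System;
using System.Security.Cryptography;

// Hypothetical shape; the single method name and signature are assumptions.
public interface INonceProvider
{
    void GetNextNonce(Span<byte> nonce);
}

// A provider that simply defers to the platform RNG, i.e. the SP 800-38D
// section 8.2.2 RBG-based construction discussed above.
public sealed class RandomNonceProvider : INonceProvider
{
    public void GetNextNonce(Span<byte> nonce)
    {
        if (nonce.Length < 12)
            throw new ArgumentException("GCM nonces should be at least 96 bits.", nameof(nonce));
        byte[] tmp = new byte[nonce.Length];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(tmp);
        tmp.CopyTo(nonce);
    }
}
```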

I've said before - these existing crypto libraries have evolved their API surface and have been battle-tested across multiple scenarios. More than what we've touched upon in this long thread. My vote again is to leverage that knowledge and experience across all those libraries, all those validations and all those installations rather than attempting to reinvent the wheel. Don't reinvent the wheel. Use it to invent the rocket :)

Hi folks,

Any updates on this? Haven't seen any updates at @karelz 's crypto roadmap thread or on the AES GCM thread.

Thanks
Sid

So the last concrete proposal is from https://github.com/dotnet/corefx/issues/23629#issuecomment-334328439:

partial class AuthenticatedEncryptor
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryEncrypt(ReadOnlySpan<byte> data, Span<byte> encryptedData, out int bytesRead, out int bytesWritten);
    // false if remainingEncryptedData is too small, throws if other inputs are too small, see NonceOrIVSizeInBits and TagSizeInBits properties.
    // NonceOrIvUsed could move to Initialize, but then it might be interpreted as an input.
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingEncryptedData, out int bytesWritten, Span<byte> tag, Span<byte> nonceOrIvUsed);
}

partial class AuthenticatedDecryptor 
{
    // throws if an operation is already in progress
    public abstract void Initialize(ReadOnlySpan<byte> tag, ReadOnlySpan<byte> nonceOrIv, ReadOnlySpan<byte> associatedData);
    // true on success, false on “destination too small”, exception on anything else.
    public abstract bool TryDecrypt(ReadOnlySpan<byte> data, Span<byte> decryptedData, out int bytesRead, out int bytesWritten);
    // throws on bad tag, but might leak the data anyways.
    // (remainingDecryptedData is required for CBC+HMAC, and so may as well add remainingData, I guess?)
    public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, Span<byte> remainingDecryptedData, out int bytesWritten);
}

Only a few potential issues have been raised since:

  • The tag is required upfront, which hinders certain scenarios. Either the API must become significantly more complex to allow further flexibility, or this issue must be considered a protocol (i.e. high-level) problem.
  • INonceProvider might be needlessly complex and/or lead to non-compliance with NIST and FIPS guidelines.
  • The intended abstraction of authenticated encryption primitives might be a pipe dream, as differences might be too great. There has not been any further discussion of this suggestion.

I'd like to suggest the following:

  1. The additional complexity of not requiring the tag upfront seems severe, the corresponding problem scenario seems uncommon, and the problem does indeed sound very much like a matter of protocol. Good design can accommodate much, but not everything. Personally I feel comfortable leaving this to the protocol. (Strong counterexamples welcome.)
  2. The discussion has consistently moved towards a flexible, low-level implementation that does not protect against misuse, with the exception of IV generation. Let's be consistent. The general consensus seems to be that a high-level API is an important next step, vital for proper use by the majority of developers - this is how we get away with not protecting against misuse in the low-level API. But it seems that an extra dose of fear has sustained the idea of misuse prevention _in the area of IV generation_. In the context of a low-level API, and to be consistent, I'd lean towards a byte[]-equivalent. But implementation swapping is more seamless with the injected INonceProvider. Is @sidshetye's comment irrefutable, or could a simple INonceProvider implementation that merely calls the RNG still be considered compliant?
  3. The abstractions seem useful, and so much effort has been put into designing them, that by now I am convinced they will do more good than harm. Besides, high-level APIs can still choose to implement low-level APIs that do not conform to the low-level abstractions.
  4. IV is the general term, and a nonce is a specific kind of IV, correct? This begs for renames from INonceProvider to IIVProvider, and from nonceOrIv* to iv*. After all, we are always dealing with an IV, but not necessarily with a nonce.

The tag upfront is a non-starter for my scenario, so I will probably just keep my own implementation. Which is fine; I am not sure it's everyone's cup of tea to write high-perf code in this area.

The problem is it will cause unneeded latency. You have to pre-buffer an entire message to get the tag at the end before you can start decoding the frame. This means you basically can't overlap IO and decryption.

I am not sure why it's so hard to allow it at the end. But I am not going to put up a roadblock for this API; it just won't be of any interest in my scenario.

IV is the general term, and a nonce is a specific kind of IV, correct?

No. A nonce is a number used once. An algorithm which specifies a nonce indicates that reuse violates the guarantees of the algorithm. In the case of GCM, using the same nonce with the same key and a different message can result in the compromise of the GHASH key, reducing GCM to CTR.

From http://nvlpubs.nist.gov/nistpubs/ir/2013/NIST.IR.7298r2.pdf:

Nonce: A value used in security protocols that is never repeated with the same key. For example, nonces used as challenges in challenge-response authentication protocols generally must not be repeated until authentication keys are changed. Otherwise, there is a possibility of a replay attack. Using a nonce as a challenge is a different requirement than a random challenge, because a nonce is not necessarily unpredictable.

An "IV" doesn't have the same stringent requirements. For example, repeating an IV with CBC only leaks whether the encrypted message is the same as, or different than, than a previous one with the same IV. It does not weaken the algorithm.

A nonce is a number used once.
An "IV" doesn't have the same stringent requirements.

@bartonjs Yes. I would reason that, since a nonce is used to initialize the crypto primitive, it is its initialization vector. It adheres perfectly to any definition of IV that I can find. It has more stringent requirements, yes, just as being a cow has more stringent requirements than being an animal. The current wording seems to ask for a "cowOrAnimal" parameter. The fact that different modes have varying requirements of the IV does not change the fact that they are all asking for some form of IV. If there's something I'm missing, by all means keep the current wording, but as far as I can tell, just "iv" or "IIVProvider" are both simple and correct.

To indulge in the nonceOrIv bikeshedding:

The 96bit GCM IV is sometimes defined as a 4-byte salt and 8-byte nonce (ex. RFC 5288). RFC 4106 defines GCM nonce as a 4-byte salt and 8-byte iv. RFC 5084 (GCM in CMS) says that CCM takes a nonce, GCM takes an iv, but _"...to have a common set of terms for AES-CCM and AES-GCM, the AES-GCM IV is referred to as a nonce in the remainder of this document."_ RFC 5647 (GCM for SSH) says _"note: in [RFC5116], the IV is called the nonce."_ RFC 4543 (GCM in IPSec) says _"we refer to the AES-GMAC IV input as a nonce, in order to distinguish it from the IV fields in the packets."_ RFC 7714 (GCM for SRTP) talks about a 12-byte IV and almost maintains its consistency, but then says "_minimum & maximum nonce (IV) length: MUST be 12 octets."_

Given the complete lack of consistency in most GCM specs, nonceOrIv kinda makes sense. $0.02

Tag upfront is a non-starter

Like other customers voicing themselves here, requiring the tag upfront is a non-starter for us too. There is no way .NET can then process concurrent streams with this artificially introduced limitation. Totally kills scalability.

Can you back the assertion that it adds complexity? Because it should actually be trivial. Plus none of the platform-specific crypto implementations (that you'll be wrapping) have this limitation. Specifically, the reason is that the input tag needs to be merely constant-time compared against the computed tag. And the computed tag is available only after the final block has been decrypted during TryFinish. So essentially, when you start your implementation, you'll find that you're merely storing the tag inside your instance until the TryFinish. You could very well have it as an optional input

public abstract bool TryFinish(ReadOnlySpan<byte> remainingData, 
                             Span<byte> remainingDecryptedData, 
                             out int bytesWritten, 
                             ReadOnlySpan<byte> tag = default); // <==

I also think we're trying too hard to normalize to a single interface that will cover all crypto scenarios. I too prefer generalized interfaces, but never at the expense of functionality or scalability - especially at such a foundational layer as the standard crypto library of the language itself. IMHO, if one finds oneself doing so, it usually means the abstraction is faulty.

If a simple consistent interface is needed, I prefer the Java approach - also raised previously here as option 1. It also sidesteps the above issue of tag first/tag last by keeping them within the algorithm implementations (IMHO, as I think it should). My team isn't implementing this, so it's not our decision BUT if we had to make a decision and start implementing - we'd go this route for sure.

Please avoid the INonce interface; a simple byte[] or Span<> should suffice for a compliant low-level interface.

IV vs Nonce - Generalized case is indeed IV. For GCM the IV is required to be a Nonce (e.g. Car vs RedOrCar). And as I'm copy-pasting this, I just noticed @timovzl used a very similar example :)

@sidshetye Can you make a precise proposal that both (1) supports algorithms that need the tag upfront, and (2) only needs the tag as late as TryFinish in all other situations?

I suppose you are thinking something in the following direction?

  • The tag in Initialize is allowed to be null. Only those algorithms that need it upfront will throw on null.
  • The tag in TryFinish is required, or (alternatively) is allowed to be null for those algorithms that have already required it upfront.

I suppose the above only adds complexity in the form of documentation and know-how. For a low-level API this could be considered a small sacrifice, since adequate documentation and know-how are required anyway.

I am starting to become convinced that this should be possible, for compatibility with other implementations, and streaming.

@timovzl Sure, I hope to budget some time tomorrow for this.

@Timovzl , I ended up having time just today and this turned out to be quite the rabbit hole! This is long, but I think it captures most use cases and .NET crypto's strengths (ICryptoTransform) while embracing the .NET Core/Standard direction (Span<>). I've re-read it but hope there are no typos below. I also think some realtime communication (chat, conf call etc.) is vital to speedy brainstorming; I hope you can consider that.

Programming model

I'll first talk about the resulting programming model for users of the API.

Streaming Encrypt

var aesGcm = new AesGcm();
using (var encryptor = aesGcm.CreateAuthenticatedEncryptor(Key, IV, AAD))
{
  using (var cryptoOutStream = new CryptoStream(cipherOutStream, encryptor, CryptoStreamMode.Write))
  {
    clearInStream.CopyTo(cryptoOutStream);
  }
}

Streaming Decrypt

var aesGcm = new AesGcm();
using (var decryptor = aesGcm.CreateAuthenticatedDecryptor(Key, IV, AAD))
{
  using (var decryptStream = new CryptoStream(cipherInStream, decryptor, CryptoStreamMode.Read))
  {
    decryptStream.CopyTo(clearOutStream);
  }
}

Non-streaming

Since non-streaming is a special case of streaming, we can wrap the above user code into helper methods on AuthenticatedSymmetricAlgorithm (defined below) to expose a simpler API. i.e.

public class AesGcm : AuthenticatedSymmetricAlgorithm
{
  ...
  // These return only after consuming the entire input buffer

  // Code like Streaming Encrypt from above within here
  public override bool TryEncrypt(ReadOnlySpan<byte> clearData, Span<byte> encryptedData);

  // Code like Streaming Decrypt from above within here
  public override bool TryDecrypt(ReadOnlySpan<byte> encryptedData, Span<byte> clearData);
  ...
}

This can double as presenting a simpler API like

Non-streaming encrypt

var aesGcm = new AesGcm(Key, IV, AAD);
aesGcm.TryEncrypt(clearData, encryptedData);
var tag = aesGcm.Tag;

Non-streaming decrypt

var aesGcm = new AesGcm(Key, IV, AAD);
aesGcm.Tag = tag;
aesGcm.TryDecrypt(encryptedData, clearData);

Under the hood

Looking at the corefx source, Span<> is everywhere. This includes System.Security.Cryptography.* - except for symmetric ciphers, so let's fix that first and layer authenticated encryption on top.

1. Create ICipherTransform for Span I/O

This is like a Span aware version of ICryptoTransform. I'd just change the interface itself as part of the framework upgrade but since people can get touchy about that, I'm calling it ICipherTransform.

public partial interface ICipherTransform : System.IDisposable
{
  bool CanReuseTransform { get; }
  bool CanTransformMultipleBlocks { get; } // multiple blocks in single call?
  bool CanChainBlocks { get; }             // multiple blocks across Transform/TransformFinal
  int InputBlockSize { get; }
  int OutputBlockSize { get; }
  int TransformBlock(ReadOnlySpan<byte> inputBuffer, int inputOffset, int inputCount, Span<byte> outputBuffer, int outputOffset);
  Span<byte> TransformFinalBlock(ReadOnlySpan<byte> inputBuffer, int inputOffset, int inputCount);
}

Also mark ICryptoTransform as [Obsolete]

To be polite to people with previous knowledge of .NET crypto

[Obsolete("See ICipherTransform")]
public partial interface ICryptoTransform : System.IDisposable { ... }

2. Extend existing SymmetricAlgorithm class for Span I/O

public abstract class SymmetricAlgorithm : IDisposable
{
  ...
  public abstract ICipherTransform CreateDecryptor(ReadOnlySpan<byte> Key, ReadOnlySpan<byte> IV);
  public abstract ICipherTransform CreateEncryptor(ReadOnlySpan<byte> Key, ReadOnlySpan<byte> IV);
  public virtual ReadOnlySpan<byte> KeySpan {...}
  public virtual ReadOnlySpan<byte> IVSpan {...}
  ...
}

3. Extend existing CryptoStream for Span I/O

This is just like Stream in System.Runtime. Plus we'll add a c'tor for our AEAD case to follow.

CRITICAL: CryptoStream will need a mandatory upgrade in FlushFinalBlock to add the tag to the end of the stream during encryption and automatically extract the tag (TagSize bytes) during decryption. This is similar to other battle-tested APIs like the Java Cryptography Architecture or C# BouncyCastle. This is unavoidable, yet it is the best place to do it, since in streaming the tag is produced at the end of encryption yet is not needed until the final block is transformed during decryption. The upside is it vastly simplifies the programming model.

Note: 1) With CBC-HMAC, you can choose to verify the tag first. It's the safer option, but if so it actually becomes a two-pass algorithm: the 1st pass computes the HMAC tag, then the 2nd pass actually does the decryption. So the memory stream or network stream will always have to be buffered in memory, reducing it to the one-shot model; not streaming. True AEAD algorithms like GCM or CCM can stream efficiently.

public class CryptoStream : Stream, IDisposable
{
  ...
  public CryptoStream(Stream stream, IAuthenticatedCipherTransform transform, CryptoStreamMode mode);
  public override int Read(Span<byte> buffer, int offset, int count);
  public override Task<int> ReadAsync(Span<byte> buffer, int offset, int count, CancellationToken cancellationToken);
  public override void Write(ReadOnlySpan<byte> buffer, int offset, int count);
  public override Task WriteAsync(ReadOnlySpan<byte> buffer, int offset, int count, CancellationToken cancellationToken);

  public void FlushFinalBlock()
  {
    ...
    // If IAuthenticatedCipherTransform
    //    If encrypting, `TransformFinalBlock` -> `GetTag` -> append to out stream
    //    If decryption, extract last `TagSize` bytes -> `SetExpectedTag` -> `TransformFinalBlock`
    ...
  }

  ...
}

Layer in Authenticated Encryption

With the above, we can add the missing bits to allow Authenticated Encryption with Associated Data (AEAD)

Extend the new ICipherTransform for AEAD

This allows CryptoStream to do its job properly. We can also use the IAuthenticatedCipherTransform interface to implement our own custom streaming class/usage, but working with CryptoStream makes for a super-cohesive and consistent .NET crypto API.

  public interface IAuthenticatedCipherTransform : ICipherTransform
  {
    Span<byte> GetTag();
    void SetExpectedTag(Span<byte> tag);
  }

Authenticated Encryption base class

Simply expands SymmetricAlgorithm

public abstract class AuthenticatedSymmetricAlgorithm : SymmetricAlgorithm
{
  ...
  // Program Key/IV/AAD via class properties OR CreateAuthenticatedEn/Decryptor() params
  public abstract IAuthenticatedCipherTransform CreateAuthenticatedDecryptor(ReadOnlySpan<byte> Key = default, ReadOnlySpan<byte> IV = default, ReadOnlySpan<byte> AuthenticatedData = default);
  public abstract IAuthenticatedCipherTransform CreateAuthenticatedEncryptor(ReadOnlySpan<byte> Key = default, ReadOnlySpan<byte> IV = default, ReadOnlySpan<byte> AuthenticatedData = default);
  public virtual Span<byte> AuthenticatedData {...}
  public virtual Span<byte> Tag {...}
  public virtual int TagSize {...}
  ...
}

AES GCM class

public class AesGcm : AuthenticatedSymmetricAlgorithm
{
  public AesGcm(ReadOnlySpan<byte> Key = default, ReadOnlySpan<byte> IV = default, ReadOnlySpan<byte> AuthenticatedData = default)

  /* other stuff like valid key sizes etc similar to `System.Security.Cryptography.Aes` */
}

@sidshetye I applaud the effort.

Streaming-Encrypt over GCM is doable. Streaming-Decrypt over GCM is

  • not allowed in NIST 800-38d. Section 5.2.2 "Authenticated Decryption Function" is crystal-clear that return of the decrypted plaintext P must imply correct authentication via tag T.
  • not safe. There is a security notion of algorithms being secure in the "Release of Unverified Plaintext" (RUP) setting. RUP security is formalized in a 2014 paper by Andreeva et al. GCM is not secure in the RUP setting. The CAESAR competition, where every entry is compared against GCM, lists RUP security as a desirable property. Unverified plaintext released from GCM is trivially prone to bit-flipping attacks.

Earlier in this thread the possibility of asymmetrical Encrypt/Decrypt APIs was brought up (conceptually), and I think the consensus was that it would be a very bad idea.

In summary, you cannot have high-level byte-granular streaming API for GCM decryption. I said it many times before, and I'm saying it again. The only way to have streaming API is chunked encryption. I'll spare everyone the merry-go-round on chunked encryption..

Whatever MS decides to do for GCM API, RUP cannot be allowed.

@sdrapkin RUP was discussed here in detail and we’ve already crossed that bridge. Briefly, RUP implies decrypted data mustn't be acted upon till the tag is verified, but like Java JCE, WinNT bcrypt, OpenSSL etc., that doesn't have to be enforced at the method boundary. Like with most crypto primitives, esp. low-level ones: use with caution.

^^^ That, so much. I agree that at the higher-level stream APIs etc., fine, enforce it. But when I want to use a low-level primitive, I need to be able to use things like split buffers etc., and it's up to me to ensure the data isn't used. Throw an exception at the tag calculation/checking point, but don't hamstring low-level primitives.

it's up to me to ensure the data isn't used

Wrong. It's not up to you. AES-GCM has a very _specific_ definition, and that definition ensures that it is not up to you. What you want is a separate AES-CTR primitive, and a separate GHASH primitive, which you could then combine and apply as you see fit. But we are not discussing separate AES-CTR and GHASH primitives, are we? We are discussing AES-GCM. And AES-GCM requires that RUP is not allowed.

I also suggest reviewing Ilmari Karonen's answer from crypto.stackexchange.

@sdrapkin You make a good point. It would be desirable, however, to eventually have an algorithm that is secure under RUP, and to have that algorithm fit the API that is decided upon here. So we have to choose:

  1. The API does not support streaming. Simple, but lacking for a low-level API. We might regret this one day.
  2. The AES-GCM implementation prevents streaming, adhering to the specification without limiting the API.

Can we detect the streaming scenario and throw an exception that explains why this usage is incorrect? Or do we need to resort to making its streaming implementation consume the entire buffer? The latter would be unfortunate: you might think you're streaming, but you're not.

We could add a SupportsStreaming(out string whyNot) method that is checked by the streaming implementation.
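
A minimal sketch of that idea (all names here are hypothetical, purely to illustrate the shape):

public abstract class AuthenticatedEncryptionAlgorithmSketch
{
    // Hypothetical: lets a streaming wrapper ask whether streaming is permitted
    // for this algorithm/direction, and explain why not when it isn't.
    public abstract bool SupportsStreaming(out string whyNot);
}

// A streaming wrapper (e.g. a CryptoStream-style adapter) would then guard itself:
// if (!algorithm.SupportsStreaming(out string whyNot))
//     throw new NotSupportedException(whyNot);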

Do we have a solid argument against streaming/tag-at-the-end in general? If not, then I believe we should aim not to preclude it with the API.

@sdrapkin: Let's take a broader view of RUP, since this is a library, not an application. This is more of a buffering and layer-design issue than actual release/use of unverified data. Looking at NIST Special Publication 800-38D for AES-GCM, we see that:

  1. The GCM specification defines the maximum plaintext length as 2^39-256 bits (~64 GB). Buffering anywhere close to that in system memory is unreasonable.

  2. The GCM specification defines the output as FAIL if the tag check fails. But it's not prescriptive about which layer in an implementation must buffer until tag verification. Let's look at a call stack like:

A => AESGCM_Decrypt_App(key, iv, ciphertext, aad, tag)
B =>  +- AESGCM_Decrypt_DotNet(key, iv, ciphertext, aad, tag)
C =>    +- AESGCM_Decrypt_OpenSSL(key, iv, ciphertext, aad, tag)

Where
A is AES GCM at the application layer
B is AES-GCM at the language layer
C is AES-GCM at the platform layer

The plaintext is released at (A) if the tag checks out, and FAIL is returned otherwise. However, nowhere does the spec suggest that main memory is the only place to buffer plaintext-in-progress, nor that buffering must happen at (B) or (C) or elsewhere. In fact, OpenSSL and Windows BCrypt are examples of (C) where streaming permits buffering at a higher layer. And Java JCA, Microsoft's CLR Security, and my proposal above are examples of (B) where streaming permits buffering at the application layer. It is presumptuous to assume the designers of (A) don't have better buffering capabilities before releasing plaintext. That buffer, in theory and in practice, could be memory, SSDs, or a storage cluster across the network. Or punch cards ;) !
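
As a sketch of what buffering at (A) could look like in practice: decrypted bytes are staged in a temp file that is only handed to the rest of the application once the tag has verified, and deleted otherwise. The chunked decryptor, its Transform method, and verify-on-SetExpectedTag behaviour are assumptions for illustration, not a defined API.

using System;
using System.IO;

// Sketch only: 'decryptor' is a hypothetical chunked GCM decryptor.
static string DecryptToTempFile(IAuthenticatedCipherTransform decryptor,
                                Stream ciphertextSource, byte[] expectedTag)
{
    string tempPath = Path.GetTempFileName();
    try
    {
        using (FileStream staging = File.OpenWrite(tempPath))
        {
            byte[] input = new byte[4096];
            byte[] output = new byte[4096];
            int read;
            while ((read = ciphertextSource.Read(input, 0, input.Length)) > 0)
            {
                decryptor.Transform(input.AsSpan(0, read), output);  // assumed chunk method
                staging.Write(output, 0, read);                      // GCM: out length == in length
            }
        }

        decryptor.SetExpectedTag(expectedTag);  // assumed to throw on tag mismatch
        return tempPath;                        // plaintext only surfaced after verification
    }
    catch
    {
        File.Delete(tempPath);                  // unverified plaintext never leaves this method
        throw;
    }
}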

Even leaving aside buffering, the paper discusses other practical concerns (see Section 9.1, Design Considerations, and 9.2, Operational Considerations), like key freshness or IV non-repetition across indefinite power failures. We obviously won't be baking those into layer (B), i.e. here.

@timovzl the recent proposal above addresses both scenarios: one-shot (the architect doesn't care about buffering) as well as streaming (the architect has better buffering capabilities). As long as the low-level streaming API documentation is clear that the consumer is now responsible for buffering, there is zero reduction in the security proof and no deviation from the specification.

EDIT: grammar, typos and trying to get markdown working

Bingo. Again, it is the decision of the layer designers as to where the tag verification happens. I am in no way advocating releasing unverified data to the application layer.

The discussion comes back around to whether these are "consumer"-level APIs or true primitives. If they are true primitives then they should expose the functionality so that higher-level, "safer" APIs can be built on top. It was already decided above, with the nonce discussion, that these should be true primitives, which means you can shoot yourself in the foot; I think the same applies to streaming/partial ciphertext decoding.

That said, it will be imperative to provide the higher-level, safer APIs quickly, to stop people rolling their own on top of these.

My interest comes from networking/pipelines, and if you can't do partial buffers and have to do "one shot" then there would be no benefit to these APIs, just downside, so I would continue going to BCrypt/OpenSSL etc. directly.

My interest comes from networking/pipelines, and if you can't do partial buffers and have to do "one shot" then there would be no benefit to these APIs, just downside, so I would continue going to BCrypt/OpenSSL etc. directly.

Exactly. The need won't go away, so people will use other implementations or roll their own. That's not necessarily a safer outcome than allowing streaming with good warning documentation.

@Timovzl, I think we've elicited a lot of technical feedback and design requirements around the last proposal. Thoughts on implementation and release?

@Sidshetye has made a detailed proposal that I believe addresses all the requirements. The single criticism, regarding RUP, has been addressed with no further opposition. (Specifically, RUP may be prevented in one of several layers, and the low-level API should not dictate which; and offering _no_ streaming is expected to have worse effects.)

In the interest of progress, I would like to invite anyone with further concerns with the latest proposal to please speak - and, of course, offer alternatives.

I am enthusiastic about this proposal and about an API taking shape.

@Sidshetye, I have some questions and suggestions:

  1. Is it desirable to inherit from the existing SymmetricAlgorithm? Are there any existing components that we want to integrate with? Unless I'm missing some advantage to that approach, I would rather see AuthenticatedEncryptionAlgorithm with no base class. If nothing else, it avoids exposing undesirable CreateEncryptor/CreateDecryptor (non-authenticated!) methods.
  2. None of the components involved are usable with asymmetric crypto, yes? Almost all components omit "Symmetric" from their names, something I agree with. Unless we keep inheriting SymmetricAlgorithm, AuthenticatedSymmetricAlgorithm could be renamed to AuthenticatedEncryptionAlgorithm, adhering to the conventional term Authenticated Encryption.
  3. Change TryEncrypt/TryDecrypt to write to / receive the tag, rather than having a settable Tag property on the algorithm.
  4. What is the purpose of setting key, iv, and authenticatedAdditionalData via public setters? I would steer clear of multiple valid approaches, and of mutable properties as much as possible. Could you create an updated proposal without them?
  5. Do we want any state in AesGcm at all? My instinct is to definitely keep out iv and authenticatedAdditionalData, as these are per-message. The key might be worth having as state, as we generally want to do multiple operations with a single key. Still, it's possible to take the key on a per-call basis as well. The same questions go for CreateAuthenticatedEncryptor. In any case, we should settle on _one way_ to pass the parameters. I'm eager to discuss pros and cons. I'm leaning towards key state in AesGcm, and the rest in CreateAuthenticatedEncryptor or TryEncrypt respectively. If we're already in agreement, please show us an updated proposal. :-)
  6. ICipherTransform should probably be an abstract class, CipherTransform, so that methods can be added without breaking existing implementations.
  7. All function parameters should use camelCase, i.e. start in lowercase. Also, should we say authenticatedData or authenticatedAdditionalData? Additionally, I think we should choose parameter names plaintext and ciphertext.
  8. Wherever the IV is passed, I would like to see it as an optional parameter, making it easier to get a properly generated (cryptorandom) IV than to provide our own. Makes misuse of the low-level API harder, at least, and we get this for free.
  9. I'm still trying to figure out how TryEncrypt's client code can know the required span length to provide for ciphertext! Same for TryDecrypt and the length of plaintext. Surely we're not supposed to try them in a loop until success, doubling the length after each failed iteration?
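
On point 9: GCM has no padding, so the ciphertext is exactly as long as the plaintext, and the tag is a separate, fixed-size value (TagSize bytes). That lets callers size the output spans up front instead of probing in a loop. A minimal sketch (GetMessage and the TryEncrypt call shape are illustrative assumptions, not a defined API):

byte[] plaintext = GetMessage();                 // hypothetical data source
byte[] ciphertext = new byte[plaintext.Length];  // exact fit: GCM output length == input length
byte[] tag = new byte[16];                       // 128-bit tag; use TagSize in practice

// Hypothetical call shape from the proposal:
// if (!aesGcm.TryEncrypt(plaintext, ciphertext, tag, out int bytesWritten))
//     throw new CryptographicException("Output buffer too small.");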

Finally, thinking ahead, what might a high-level API built on top of this look like? Looking purely at the API usage, there seems to be little room for improvement, as both the streaming and the non-streaming APIs are so straightforward already! The main differences I imagine are an automatic IV, automatic output sizes, and possibly a limit on the amount of data encrypted.

Windows allows streaming. OpenSSL does, too. Both of those mostly pigeonholed it into existing concepts (though they both threw in a wrench with "and there's this thing on the side you have to deal with or I'll error out").

Go does not, and libsodium does not.

It seems like the first wave allowed it, and later ones don't. Since we're inarguably in a later wave, I think that we're going to stick with not allowing it. If there's increased demand for streaming after introducing a one-shot model (to encryption / decryption, the key can be maintained across calls), then we can re-evaluate. So an API proposal which adheres to that pattern seems beneficial. Also, neither SIV nor CCM supports streaming encryption, so a streaming API for them would potentially involve heavy buffering. Keeping things clear seems better.

Proposals also should not embed the tag in the payload (GCM and CCM call it out as a separate datum), unless the algorithm itself (SIV) incorporates it into the output of encryption. (E(...) => (c, t) vs E(...) => c || t or E(...) => t || c). Users of the API can certainly treat it as concatenated (just slice the Spans appropriately).
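
For callers who do want the c || t layout as a wire format, the slicing is trivial and stays entirely on the caller's side; a small sketch (ReceiveMessage and the 16-byte tag size are illustrative assumptions):

byte[] receivedBytes = ReceiveMessage();                              // hypothetical: one buffer, ciphertext || tag
ReadOnlySpan<byte> payload = receivedBytes;
ReadOnlySpan<byte> ciphertext = payload.Slice(0, payload.Length - 16); // everything except the trailing tag
ReadOnlySpan<byte> tag = payload.Slice(payload.Length - 16);           // the 16-byte (128-bit) GCM tag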

The GCM specification does not allow release of anything other than FAIL on a tag mismatch. NIST is quite clear about that. The original GCM paper by McGrew & Viega also says:

The decrypt operation would return FAIL rather than the plaintext, and the decapsulation would halt and the plaintext would be discarded rather than forwarded or further processed.

None of the prior comments addressed RUP - they merely hand-waved it away ("the higher layer will take care of it" - yeah, right).

It's simple: GCM encryption can stream. GCM decryption cannot stream. Anything else is no longer GCM.

It seems like the first wave allowed it, and later ones don't. Since we're inarguably in a later wave, I think that we're going to stick with not allowing it.

@bartonjs you're literally ignoring all technical and logical analysis and instead using the dates of the Go and libsodium projects as a weak proxy for the real analysis? Imagine if I make a similar argument based on the names of the projects. Plus we are deciding on the interface AND implementations. You do realize that deciding on a non-streaming interface for AEAD precludes all such implementations down the road, right?

If there's increased demand for streaming after introducing a one-shot model (to encryption / decryption, the key can be maintained across calls), then we can re-evaluate.

Why is the demand demonstrated so far on GitHub insufficient? It's getting to the point where the choice to do less work appears entirely whimsical rather than based on any technical merit or customer demand.

@bartonjs you're literally ignoring all technical and logical analysis and instead using the dates of the Go and libsodium projects as a weak proxy for the real analysis?

No, I am using the advice of professional cryptographers who say it's extraordinarily dangerous and that we should avoid streaming AEAD. Then I'm using information from the CNG team of "many people say they want it in theory, but in practice almost no one does it" (I don't know how much of that is telemetry vs anecdotal from fielding assistance requests). The fact that other libraries have gone the one-shot route simply _reinforces_ the decision.

Why is the demand demonstrated so far on GitHub insufficient?

A few scenarios have been mentioned. Processing fragmented buffers could probably be addressed with accepting ReadOnlySequence, if it seems like there's enough of a scenario to warrant complicating the API instead of having the caller do data reassembly.
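
ReadOnlySequence<byte> (System.Buffers) already makes that caller-side reassembly fairly mechanical; a small sketch, assuming the AEAD API itself only takes contiguous spans:

using System;
using System.Buffers;

// Sketch: flatten a fragmented buffer into one contiguous span before a one-shot call.
// IsSingleSegment, First and CopyTo are existing ReadOnlySequence<byte> members.
static ReadOnlySpan<byte> GetContiguous(ReadOnlySequence<byte> sequence, Span<byte> scratch)
{
    if (sequence.IsSingleSegment)
        return sequence.First.Span;                  // already contiguous, no copy

    sequence.CopyTo(scratch);                        // single copy into caller-provided scratch
    return scratch.Slice(0, (int)sequence.Length);
}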

Large files are a problem, but large files are already a problem since GCM has a cutoff at just shy of 64GB, which is "not all that big" (okay, it's pretty big, but it's not the "whoa, that's big" that it used to be). Memory-mapped files would allow Spans (of up to 2^31-1) to be utilized without requiring 2GB of RAM. So we've shaved a couple of bits off of the maximum... that would probably happen over time anyways.

You do realize that deciding on a non-streaming interface for AEAD precludes all such implementations down the road, right?

I'm more and more convinced that @GrabYourPitchforks was right (https://github.com/dotnet/corefx/issues/23629#issuecomment-334638891) that there's probably not a sensible unifying interface. GCM _requiring_ a nonce/IV and SIV _forbidding_ it mean that the initialization of an AEAD mode/algorithm already requires knowledge about what's going to happen... there isn't really an "abstracted away" notion of AEAD. SIV dictates where "the tag" goes. GCM/CCM do not. SIV is tag-first, by spec.

SIV can't start encrypting until it has all of the data. So its streaming encrypt is either going to throw (which means you have to know to not call it) or buffer (which could result in n^2 operation time). CCM can't start until the length is known; but CNG doesn't allow a pre-encrypt hint at the length, so it's in the same boat.

We shouldn't design a new component where it's easier to do the wrong thing than the right thing by default. Streaming decryption makes it very easy and tempting to wire up a Stream class (a la your proposal to do so with CryptoStream) which makes it very easy to get a data validation bug before the tag is verified, which almost entirely nullifies the benefit of AE. (IGcmDecryptor => CryptoStream => StreamReader => XmlReader => "wait, that's not legal XML..." => adaptive ciphertext oracle).

It's getting to the point ... customer demand.

As I've, unfortunately, heard way too many times in my life: I'm sorry, but you aren't the customer we have in mind. I'll concede that perhaps you know how to do GCM safely. You know to only stream to a volatile file/buffer/etc until after tag verification. You know what nonce management means, and you know the risks of getting it wrong. You know to pay attention to stream sizes and cut over to a new GCM segment after 2^36-64 bytes. You know that after it's all said and done it's your bug if you get those things wrong.

The customer I have in mind, on the other hand, is someone who knows "I have to encrypt this" because their boss told them to. And they know that when searching for how to do encryption some tutorial said "always use AE" and mentions GCM. Then they find an "encryption in .NET" tutorial which uses CryptoStream. They then hook up the pipeline, not having any idea that they've just done the same thing as choosing SSLv2... checked a box in theory, but not really in practice. And when they do it _that_ bug belongs to everyone who knew better, but let the wrong thing be too easy to do.

you aren't the customer we have in mind [...] The customer I have in mind, on the other hand, is someone who knows "I have to encrypt this" because their boss told them to [...]

@bartonjs months back we had already decided that the goal was to target two customer profiles by having a low level API (powerful but unsafe under certain conditions) and a high level API (foolproof). It's even in the title. It's certainly a free country but it's disingenuous to now move the goalpost by claiming otherwise.

The customer I have in mind, on the other hand, is someone who knows "I have to encrypt this" because their boss told them to. And they know that when searching for how to do encryption some tutorial said "always use AE" and mentions GCM. Then they find an "encryption in .NET" tutorial which uses CryptoStream. They then hook up the pipeline, not having any idea that they've just done the same thing as choosing SSLv2... checked a box in theory, but not really in practice. And when they do it that bug belongs to everyone who knew better, but let the wrong thing be too easy to do.

@bartonjs Wait, what happened to a low-level primitive? I thought the aim of this particular issue was flexibility over babysitting. Definitely let us know if the plan has changed, so that we're all talking about the same thing.

Also, are per-block methods still under consideration, or just one-shot methods?

I'm more and more convinced that @GrabYourPitchforks was right (#23629 (comment)) that there's probably not a sensible unifying interface.

Seeing all the examples, this does start to look more and more futile - especially for a low-level API where the implementations have such different restrictions. Perhaps we should merely set a solid example with AES-GCM, rather than a unifying interface. As a side note, the latter might still be interesting for a future high-level API. Its property of being more restrictive will probably make a unifying interface much easier to achieve there.

are per-block methods still under consideration, or just one-shot methods?

As mentioned in https://github.com/dotnet/corefx/issues/23629#issuecomment-378605071, we feel like the risk vs reward vs expressed-use-cases say that we should only allow one-shot versions of AE.
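
For concreteness, here is roughly what a one-shot shape implies (names and signatures below are illustrative, not a confirmed API): the key is the only state on the object, nonce/AAD/tag are per-call parameters, and the tag stays a separate output.

// Illustrative sketch of a one-shot AEAD surface; not a confirmed API.
public sealed class AesGcmOneShotSketch : IDisposable
{
    public AesGcmOneShotSketch(ReadOnlySpan<byte> key) { /* import the key once */ }

    public void Encrypt(ReadOnlySpan<byte> nonce,
                        ReadOnlySpan<byte> plaintext,
                        Span<byte> ciphertext,                          // same length as plaintext
                        Span<byte> tag,                                 // fixed size, e.g. 16 bytes
                        ReadOnlySpan<byte> associatedData = default)
    { /* one call, whole message */ }

    public void Decrypt(ReadOnlySpan<byte> nonce,
                        ReadOnlySpan<byte> ciphertext,
                        ReadOnlySpan<byte> tag,
                        Span<byte> plaintext,                           // only meaningful if the tag verifies
                        ReadOnlySpan<byte> associatedData = default)
    { /* throws on tag mismatch; no partial plaintext is released */ }

    public void Dispose() { }
}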

I haven't read the entire discussion, just random parts. I don't know what direction you are going. Sorry if what I write doesn't make sense in this context. My 2¢:

  • Streams are important. If you cannot support them directly because that would mean security vulnerability, then if possible provide a higher-level wrapper built on top of your low-level API that would expose streams (in an inefficient but safe way).
  • If AES-GCM absolutely cannot use streams in any scenario, then provide a legit implementation of AES-CBC-HMAC that is based on streams. Or some other AE algorithm.
  • The higher the level the better. The fewer the areas to make a mistake by the user the better. Meaning — expose API that would hide as much stuff as possible (for example this authentication tag). Of course, there can (should) be more specific overloads as well.
  • IMHO don't bother with unifying interfaces with other encryption services if they simply don't fit. That is what openssl has done with their CLI and the result is poor (e.g. no possibility to provide the authentication tag).

We (the .NET security team) conferred amongst ourselves and with the broader crypto team within Microsoft. We discussed many of the issues and concerns mentioned in this thread. Ultimately these concerns were not persuasive enough to warrant introducing a streaming GCM API as a core building block within the framework.

This decision can be revisited in the future if the need arises. And in the meantime we've not made things any worse than they are today: developers who are currently using a third party crypto library for streaming GCM support can continue to do so, and they won't be broken by our intended introduction of a non-streaming GCM API.

How to deal with encryption of data that does not fit into memory?

@pgolebiowski You use high-level .NET crypto libraries specifically designed to offer safe streaming encryption.

@sdrapkin this is easier said than done. "safe" is a lot to ask. what is there that is proven and can actually be trusted? you say yourself:

Bouncy Castle c# library (a typical StackOverflow recommendation). Bouncy Castle c# is a huge (145k LOC), poorly-performing museum catalogue of crypto (some of it ancient), with old Java implementations ported to equally-old .NET (2.0?).

ok, so what are the options? maybe your own library? not really. hmm... maybe libsodium-net? not really.

when you actually look for an audited library that comes from a rather trustworthy source (like Microsoft or extensively used by the community), i don't think such a library exists in the .NET Core world.


  • customer: authenticated encryption of data that does not fit into memory?
  • microsoft: I'm sorry, but you aren't the customer we have in mind. use a library that is not audited and its safety is questionable, not our problem if u under side-channel attack.
  • customer:

@pgolebiowski The options are to use an established .NET framework - i.e. the very thing that has been proven and can be trusted, as you desire. Other libraries (including mine) wait for Microsoft to add missing crypto primitives like ECDH into NetStandard.

You can also look at Inferno forks. There are at least 2 forks where a few trivial changes achieved NetStandard20.

Your library was audited 2 years ago, within 2 days, by one guy. I don't trust that, sorry. At Microsoft there would be a dedicated team for that, made up of people I trust more than those from Cure53.

Seriously, we can talk about third-party support for lots of things. But all the needed security-related stuff should be provided by the standard library.

@pgolebiowski Far be it from me to try to convince anyone to trust anything, but your statement is not accurate. Inferno was audited by 2 professionals from the "Cure53" organisation. The audit took 2 days, and the entire library was ~1,000 lines of code. That's ~250 lines of code per auditor per day - quite manageable.

In fact, easy auditability is among the key features of Inferno, precisely for those who don't want to trust.

The goal of this thread is to add support for AES-GCM. Your library does not even support AES-GCM. If you want people to use your code, submit it as a proposal for corefx. In an appropriate thread.

Another thing: even if it supported this algorithm, it has not been reviewed by the .NET crypto board and is not part of corefx. It is not even a candidate for such a review. This means the end of this futile discussion and advertising.

@pgolebiowski I did not advertise anything - merely answered your question, suggested alternatives for your use case, and corrected inaccurate claims. AES-GCM is not suited for streaming encryption, and that is something that has been reviewed and agreed on by the .NET security team, so you can trust that.

am I defending streaming AES-GCM anywhere? can't recall. but can recall saying:

  • Streams are important. If you cannot support them directly because that would mean security vulnerability, then if possible provide a higher-level wrapper built on top of your low-level API that would expose streams (in an inefficient but safe way).
  • If AES-GCM absolutely cannot use streams in any scenario, then provide a legit implementation of AES-CBC-HMAC that is based on streams. Or some other AE algorithm.

or stating an open problem for corefx:

How to deal with encryption of data that does not fit into memory?


bonus:

You can also look at Inferno forks. There are at least 2 forks where a few trivial changes achieved NetStandard20. [...] Inferno was audited by 2 professionals from "Cure53" organisation [...] easy auditability is among the key features of Inferno, precisely for those who don't want to trust.

I did not advertise anything

Just as an aside, I never really wanted "streaming" in the sense most are thinking of. I just wanted "block" processing, for perf and memory-use reasons; I don't actually want to "stream" out the results. This is what OpenSSL and CNG support, so it seems a shame to lose that, and it basically makes the "primitives" useless in any scenario I can think of.

@Drawaes Come to think of it, block operation might be much safer than using streams. A layman may touch streams, but is much more likely to use the one-shot API than block operation. Moreover, block operation cannot _straightforwardly_ be combined with, say, XmlReader. So actually, many of the dangers discussed apply to stream objects, but not to block operation.

Working with block operation when a one-shot API is also available suggests that we know what we are doing and specifically require low-level tweaking. We could protect the layman _and_ have flexibility.

As for avoiding RUP, I'm still pondering how much of an advantage block operation truly is for GCM. Encryption reaps the full benefits, whereas decryption benefits only somewhat. We can avoid storing the full ciphertext, but we must still buffer the full plaintext. A decryptor _could_ choose to store the intermediate plaintext on disk. But in return, we have introduced more room for error. Do we have a convincing argument to not solve this problem at a higher level (e.g. chunk there, or use a truly streaming algorithm)?

TLS and pipelines. Currently (and for the foreseeable future) pipelines uses 4k blocks, but a TLS message can be 16k of ciphertext. With a one-shot you will need to copy the 16k into a single contiguous buffer before you can decrypt. With blocks you might have, say, 4 or 5, and you might need to buffer up to 16 bytes to ensure complete blocks.

@Drawaes 16k is still constant and not huge. Does it make much of a difference in this context?

Yes, it means another copy in the pipeline. This has a major and measurable effect on perf.

What is needed to make this happen? What are the next steps? @Drawaes

As for AES-GCM, I think that its delivery is impaired due to the corresponding issue being locked: https://github.com/dotnet/corefx/issues/7023. @blowdart, could you unlock? It is really hard to have progress when people can't discuss. Or, if that's not an option, maybe propose an alternative solution that allows bringing this feature to the public.

Nope, I'm not unlocking that. A decision has been made, the topic is done.

@blowdart Thanks for the reply. I understand that maybe this was not clear enough:

Or, if that's not an option, maybe propose an alternative solution that allows bringing this feature to the public.

I appreciate that there is a decision for supporting AES-GCM. This is great, I definitely want that algorithm. Thus, now it would be cool to actually have it supported. Would you like the discussion on AES-GCM design and implementation be held here or in a new issue?

Also, if that topic is done, why not close it? And make it more explicit by changing the title of that issue, as right now it suggests that the discussion on the implementation would be held here: https://github.com/dotnet/corefx/issues/7023. Maybe something like Decide which AEAD algorithm to support first.

In other words: I provide feedback that in the current situation it is unclear what is needed to push AES-GCM forward.

@karelz

@pgolebiowski There's already a PR out. It'll probably be available in master Wednesday next week.
