Consul: Allow setting of SecretID and AccessorID on ACL tokens on Consul 1.4+

Created on 19 Nov 2018  ·  18 Comments  ·  Source: hashicorp/consul

Feature Description

Pre-v1.4, the token value of an ACL token could be specified by the client. This allowed for a simple workflow where my SaltStack masters would (securely) generate token UUIDs for every minion as it came up and configure its ACLs later. Those UUIDs were then easy to push to the minion even when the Consul cluster was not yet available.
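For reference, the pre-1.4 legacy ACL API let the caller choose the token value itself. A minimal sketch (the UUID, name, and rules here are placeholders):

```sh
# Legacy (pre-1.4) API: the caller supplies the token value in "ID".
# Requires a management token; all values below are placeholders.
curl -X PUT http://127.0.0.1:8500/v1/acl/create \
  -H "X-Consul-Token: ${MASTER_TOKEN}" \
  -d '{
        "ID": "00000000-0000-0000-0000-0000000000aa",
        "Name": "minion-web01",
        "Type": "client",
        "Rules": "key \"\" { policy = \"read\" }"
      }'
```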

Post-v1.4, the master needs to create a token for the client, saving its SecretID and AccessorID. The SecretID must then be pushed to the minion.
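Under 1.4 the creation goes through the new token endpoint instead, and Consul generates both IDs; a sketch (the policy name and the use of jq are assumptions):

```sh
# Post-1.4 API: Consul generates AccessorID and SecretID; the caller
# must capture the SecretID from the response and distribute it.
curl -s -X PUT http://127.0.0.1:8500/v1/acl/token \
  -H "X-Consul-Token: ${MASTER_TOKEN}" \
  -d '{"Description": "minion-web01", "Policies": [{"Name": "minion-policy"}]}' \
  | jq -r '.SecretID'
```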

This creates a chicken/egg problem where Consul must already exist for new tokens to be created. Bringing up the Consul cluster must now be done in two stages, as tokens can't be pre-generated.

Use Case(s)

I would like to determine SecretID and possibly AccessorID values myself, thereby enabling completely automated cluster setups. Whether I read 128 bits from /dev/urandom to create a hard-to-guess UUID or Consul does it shouldn't matter security-wise. It would also be nice to be able to create other "well-known" SecretIDs and AccessorIDs, like anonymous, for my local cluster.
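For illustration, either source yields the same entropy (a sketch; Consul expects the result to be UUID-shaped):

```sh
# uuidgen draws on the kernel CSPRNG, so the result is as hard to
# guess as a Consul-generated ID.
SECRET_ID=$(uuidgen | tr '[:upper:]' '[:lower:]')

# Or 128 bits straight from /dev/urandom, formatted like a UUID
# (note: UUID-shaped, but not a strict version-4 UUID).
SECRET_ID=$(od -x /dev/urandom | head -1 | awk '{OFS="-"; print $2$3,$4,$5,$6,$7$8$9}')
```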

Labels: theme/acls, type/enhancement

All 18 comments

Thinking about policies... it might also be worthwhile to allow policy AccessorIDs to be deterministic, as some policy API endpoints (unlike the Policies stanza on ACL tokens) can't resolve policy Names.

Since @pearkes added the needs-discussion tag, I thought I'd provide a more detailed write-up about what the problem is here.

Essentially, up to consul 1.3 the following workflow was possible:
(Diagram: consul-acl-13, the pre-1.4 provisioning workflow)

In essence: a configuration management system that could create secure random identifiers was able to provision a consul cluster and agents in parallel. It didn't matter whether the consul cluster came up before or after the configuration management server.

With consul 1.4 this changed. The current process we're using looks like this:
(Diagram: consul-acl-14, the current post-1.4 provisioning workflow)

Here we need to solve the chicken/egg problem of having a consul server available to be able to create ACLs.

  • So on the first configuration management pass we create a temporary ACL that is not persisted.
  • We use this temp ACL to set up service discovery for all subsequent services and all minions that are coming up in parallel.
  • On a second run, the configuration management system detects that the temp ACLs have not been persisted and calls the Consul API to create new ACL tokens that are then actually persisted.
  • If the consul server cluster ever reboots or is recreated, we have to query the ACL tokens through the API, and when we learn that they are invalid, we create new ones and propagate them as necessary through the configuration management system. (A minimal version of that validity check is sketched below.)
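That validity check boils down to resolving the stored secret against the ACL API; a minimal sketch, assuming a local agent on the default port:

```sh
# A 403 ("ACL not found") means the stored token no longer exists in
# the cluster and must be recreated and redistributed.
if ! curl -sf -H "X-Consul-Token: ${STORED_SECRET_ID}" \
     http://127.0.0.1:8500/v1/acl/token/self > /dev/null; then
  echo "token invalid: recreate and propagate via config management"
fi
```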

An implementation of this process now lives in my dynamicsecrets SaltStack plugin and salt configuration. It seems pretty brittle to me: I had to add a lot of states that block the configuration management system and repeatedly poll Consul's HTTP API, waiting for ACLs to become available on the minions so that Consul can answer DNS queries for service discovery and subsequent states can find their dependencies.

I don't think forcing the creation of ACL tokens through Consul leads to any security improvement here, but it makes automation a lot harder than it has to be.

@jdelic Originally it was a specific goal of mine to not have those fields be user-modifiable. However I can see how that doesn't play so nice with config management and orchestrators.

Is it safe to assume that the problem really only exists at the time of initial token creation and that once created you wouldn't need to modify the accessor and secret ids?

@mkeeler
sorry for the late answer, I was away on Christmas break trying to have little internet contact ;).

You are right, at least in my case, setting accessor ids and secret ids at creation time would be completely sufficient.

I'm facing exactly the same chicken/egg problem while creating my own Consul installation and configuration state for SaltStack. At first I thought the only way to get a "root / master / bootstrap" token was to create it with consul acl bootstrap, but then I found that I can simply declare the master token explicitly in the server config file. But then, once again, I found that I still need to create the agent token and write it to the config file, otherwise the server can't update information about itself and fails with the error agent: Coordinate update blocked by ACLs.
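For anyone hitting the same wall, a minimal sketch of such a server config (JSON form; the UUIDs are placeholders, and the agent token still has to be created as a real ACL token with node:write permissions):

```sh
# Sketch: pre-declare the master and agent tokens in the server config.
# The master token is inserted automatically when a server gains
# leadership; the agent token only tells the agent what to present,
# Consul does not create it for you.
cat > /etc/consul.d/acl.json <<'EOF'
{
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "tokens": {
      "master": "00000000-0000-0000-0000-000000000001",
      "agent":  "00000000-0000-0000-0000-000000000002"
    }
  }
}
EOF
```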

I think the ability to provide the SecretID, and probably the AccessorID too, while creating a new token is a "must have" feature.

We use a system similar to @jdelic's at the moment, except with the twist that we have a horrible bash script which creates secrets via the deprecated ACL API and then migrates them. We're in the unusual situation of deploying more than a hundred clusters across which we want uniform ACLs. Being able to specify secret IDs at creation time would make this a lot simpler.

I am running into this same sort of issue just doing some basic automation with bash scripts. Decoupling the UUID generation from the configuration of the token would make this work a lot more straightforward.

To give some updates here:

I have a PR open that would allow setting the various IDs of tokens and policies. However, there are some issues I ran into that I don't have a good solution for yet.

Setting token IDs should be safe enough, as no other parts of the ACL system reference those token IDs.

Setting policy IDs is a little more of a footgun. In the happy path you are only setting IDs like that when they are truly unique and have never been used in the cluster before. However, it would be possible to create a policy, link some tokens to that policy, delete the policy, and recreate it with the same ID but different permissions. In that scenario all the existing tokens would inherit the new policy. This is mostly due to the original assumption that we could lazily handle policy deletions and not update all the tokens that reference a policy when it is deleted. In some ways this sounds like a cool feature for restoring accidentally deleted policies, but my fear is that it will instead result in accidentally granting privileges to tokens that you did not intend to.
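To make the footgun concrete, here is a hypothetical CLI sequence, assuming a -id flag on policy create that is deliberately not shipped:

```sh
# HYPOTHETICAL: the -id flag on policy create does not exist, for
# exactly the reason this sequence demonstrates.
PID=adcfb420-0000-0000-0000-000000000000
consul acl policy create -id "$PID" -name kv-read -rules 'key_prefix "" { policy = "read" }'
consul acl token create -policy-id "$PID"     # token now links to $PID
consul acl policy delete -id "$PID"           # the link is left dangling
consul acl policy create -id "$PID" -name global-admin -rules 'acl = "write"'
# The dangling token silently inherits the far broader replacement policy.
```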

I am thinking a decent middle ground to solve these problems and not introduce any new ones would be to:

  1. Allow setting the ids of tokens.
  2. Provide some secondary endpoints to manipulate policies by name.

I think this would be sufficient for all the use-cases as detailed in this issue.

Additionally, it might help some to know that the Consul Terraform provider recently gained the ability to manage ACL tokens and policies. For automation purposes that might be a route to consider.

Thanks @mkeeler! I think that sounds right. If I can supply the tokens via configuration files for agents, services, etc., and then afterwards associate those tokens with policies, I think that would do the trick. I think the workflow would be something like:

  1. Generate a token using uuid or something
  2. Create service_abc.json that includes the token
  3. Create a policy via consul acl policy create -existing-token=123-abc -rules @service_abc_policy.hcl, where -existing-token=123-abc is the (proposed) switch to include the aforementioned token and associate it with the policy

This would eliminate the need to run consul acl token create and rewrite service_abc.json with the output.

Ideally I would be able to do the same with the agent tokens, setting master, default, etc. via the configuration file and then associating them with the appropriate policy via consul acl policy create, rather than doing the create-policy, create-token dance, feeding the output to set-agent-token, and then rewriting the config if token persistence isn't turned on.
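For contrast, the current dance looks roughly like this (a sketch; the policy name is a placeholder and the SecretID is scraped from the CLI's table output):

```sh
# Pre-1.5 flow: create the token, scrape its generated SecretID,
# then hand it to the running agent.
SECRET=$(consul acl token create -policy-name agent-policy \
           -description "agent token" | awk '/SecretID/ {print $2}')
consul acl set-agent-token agent "$SECRET"
# Without token persistence, the agent's config file must also be
# rewritten, or the token is lost on restart.
```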

Does this make sense?

It mostly makes sense except maybe the part about wanting to create a policy and link it to tokens in one pass.

You would have to create the token, then create the policy and do the linking. Why not just create the policy first, then create the token specifying that policy?

I am envisioning the CLI flow to be:

  1. Generate some token IDs using uuidgen or something similar and pre-populate configuration files with the secrets
  2. Spin up some infrastructure utilizing the consul tokens (that do not exist yet)
  3. consul acl bootstrap
  4. consul acl policy create -name service-abc -rules @service_abc_policy.hcl
  5. consul acl token create -accessor <accessor id> -secret <secret id> -policy-name service-abc (specifying the accessor might not be necessary for you, as only the secret needs to go into the config files used by the consul agents)

After step 5, those tokens which previously were not working should start working correctly. The caveat is that negative token resolution responses may be cached on the Consul clients (in all datacenters) and on Consul servers (in non-primary datacenters if token replication is disabled). So if you distributed some token secret to something like consul-template or another tool pointed at the HTTP API, it could take up to the token_ttl time (which is configurable) for the newly created token to become usable.
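Scripted end to end, that flow might look like this (a sketch; the -accessor and -secret flags are the ones proposed here, which shipped in 1.5.0):

```sh
# 1-2. Pre-generate the IDs, drop the secret into agent/service
#      configs, and spin up infrastructure that references it.
ACCESSOR=$(uuidgen | tr '[:upper:]' '[:lower:]')
SECRET=$(uuidgen | tr '[:upper:]' '[:lower:]')

# 3. Bootstrap the ACL system once the server cluster is up.
consul acl bootstrap

# 4. Create the policy the token will reference.
consul acl policy create -name service-abc -rules @service_abc_policy.hcl

# 5. Materialize the pre-distributed token.
consul acl token create -accessor "$ACCESSOR" -secret "$SECRET" \
  -policy-name service-abc
```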

Got it, I think this sounds good to me @mkeeler.

Running into the same issue here with Chef. We'd like to create the tokens in advance, encrypt them with KMS and store them in cookbook attributes, and finally decrypt them and pass them to the /acl/token endpoint to be created within Consul.

This will be part of our next release (1.5.0)

@mkeeler qq, sort of on topic. You mention full automation, but the bootstrap is still a step, correct? Or is that just automated some other way in Salt?

@matthoey-okta
I'm the guy with the salt config. :)

Consul 1.4.x still allows the initial master token to be specified via the config files, and that's what I use to bootstrap the cluster in lieu of consul acl bootstrap.

@matthoey-okta It was already mentioned, but the acl.tokens.master configuration allows you to set up a token with that secret to be automatically bootstrapped when a particular server gains leadership of raft.

@jdelic
There is no reason I know of why we shouldn't keep that functionality around, and no one has ever mentioned wanting it deprecated/removed. The same thing still works in v1.5.0.

Now that I am thinking about it, I probably should have modified the /v1/acl/bootstrap endpoint to allow setting the IDs too. That way you could create the privileged token with known IDs without ever having to store it in the servers' config files. That is advantageous because if you put a master token in the config files and want to delete it in the future, it requires editing the server configurations and restarting them to prevent them from reinserting it into raft.
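That variant might look something like the following; this is purely a shape for the suggestion above, not a shipped API:

```sh
# HYPOTHETICAL: /v1/acl/bootstrap takes no request body in 1.5.0;
# this sketches the idea of letting the caller pick the bootstrap IDs.
curl -X PUT http://127.0.0.1:8500/v1/acl/bootstrap \
  -d '{"SecretID": "00000000-0000-0000-0000-000000000001"}'
```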

@mkeeler and @jdelic Thank you for the explanations. I didn't understand how it was possible to use that option, as I figured you'd have to somehow store the master token in the datastore first before putting it in the file. It's interesting to note that the server inserts it when it becomes the raft leader.

We currently use all-random token secret IDs, so automation and orchestration have been a challenge. I will look into these new features, which would certainly make our lives easier. Thanks!
