Fabric: v2-compatible 'roles' or similar

Created on 22 Apr 2017 · 7 comments · Source: fabric/fabric

Synopsis

At time of writing, the v2 branch has a Group class that should be capable of serving as the units formerly known as 'roles', aka "a bunch of hosts to do stuff with/on".

However, there's no specific way of organizing or labeling Group objects yet; it's "done" enough for the pure API use case of advanced users who want to roll their own specific way of creating them, but lacks anything for CLI-oriented users or intermediate folks who want something frameworky to build around.

Put another way, unless you're rolling purely with the API, having Group objects lying around somewhere is useless if the CLI or task-calling bits have no way of finding them!

Background

In v1, roles were effectively a single flat namespace mapping simple string labels to what would be Groups in v2. They could be selected on the CLI at runtime (fab --roles=web,db) and/or registered as default targets for tasks (e.g. @roles('db') above def migrate():), much like hosts.

Users defined them in env.roledefs, a simple dict; any intermediate to advanced functionality revolved around modifying it, usually at runtime (via pre-task or subroutine), sometimes at module load time.
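
For comparison, here is a rough sketch of what v1 roles boiled down to: a flat dict of names to host strings, resolved from a comma-separated --roles value. The roledefs contents and the hosts_for_roles helper are made up for illustration. This is not Fabric's actual v1 code, and the de-duplication is just a choice made here.

```python
# Made-up roledefs in the v1 shape: a flat dict of role name -> host strings.
roledefs = {
    "web": ["web1.example.com", "web2.example.com"],
    "db": ["db1.example.com"],
    "lb": ["lb1.example.com", "web1.example.com"],
}


def hosts_for_roles(roledefs, roles_arg):
    """Resolve a comma-separated --roles value into one host list.

    Hypothetical helper: roles are taken in the order given and duplicate
    hosts are dropped (first occurrence wins).
    """
    hosts = []
    for name in (part.strip() for part in roles_arg.split(",")):
        if not name:
            continue
        if name not in roledefs:
            raise KeyError(f"unknown role: {name!r}")
        for host in roledefs[name]:
            if host not in hosts:
                hosts.append(host)
    return hosts


print(hosts_for_roles(roledefs, "web,db"))
# ['web1.example.com', 'web2.example.com', 'db1.example.com']
```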

Specific use cases / needs / subfeatures

  • Basic, naive mapping for use/reference anywhere else in the system: put in a name, get back some iterable of Groups and/or Connections.

    • Aliasing often wants to go along with that, so e.g. a Lexicon instead of a dict.

    • Even deeper constructs, such as 'bundling': e.g. you have direct mappings named db, web and lb, plus a second-tier name, prod, that is always the union of the other three. I forget whether I've added that to Lexicon yet. It's possible there are other map subclasses out there that already do it, too.

    • Additionally/alternately, things like globbing or other string syntaxes, though I personally would prefer to leverage the fact that Python is not "stringly typed"...

  • Useful 'reverse mapping' such that you can identify which groups a given Connection belongs to.

    • Problematic: because there's currently no global shared state, the naive answer (using identity) falls down, since you can technically create multiple identical Connection objects.

    • Especially since Group can create them implicitly on your behalf if you just give it shorthand host strings, though that is only a convenience option.

    • However, given that constraint of no global state, I can't see obvious problems with using equality testing instead, so that should be doable; e.g. a check like cxn in group would succeed even if cxn is a distinct object from the equal member inside group.

    • The only thing that comes to mind is if there were strong, stateful links from a Connection to a Group (would have to be groups, plural) holding it, instead of vice versa, but I can't see great reasons for that offhand.

  • Strongly related to the previous: ability to inspect/display what the "currently running role" is (something folks wanted for a long time in v1 which was nontrivial due to its design)

    • Main issue is that this is really two semi-distinct questions: "what role(s) is the current host part of, generally speaking" (basically the reverse lookup from the previous use case), but also "what role(s) was the execution machinery specifically asked to run against".

    • In other words, given host 'foo' belonging to roles A, B and C: within a given task whose context is 'foo', but which was run because of a request to 'execute on role A', is a user looking for an answer of "A, B and C" (the roles 'foo' is in overall) or just "A" (the currently executing role)?

    • This really feels like two distinct API calls, even though the feature requests I remember getting conflate the two.

  • Target selection on the CLI, globally and/or per-task

    • An extension of Invoke's CLI system to account for "flags that all tasks get on top of what they define" may be useful or required for this. That falls firmly into pyinvoke/invoke#205 territory, in fact, so that ticket just got even higher priority than it already had (which was pretty high).

  • Ditto task-level defaults

    • Though task-level target defaults really want to accept any of: a connection, connections, a group object, group objects, or a name evaluating to group objects (arguably only that last one pertains directly to this ticket).

  • Ditto collection-level defaults (NEW in v2!)

    • I.e. "all tasks in $submodule default to running against the db role"

    • Same deal as previous point - this default wants to allow a number of different values, not just a string key.

  • Anything else new and exciting enabled by an OO approach that really wants to go along with this? Remember emphasis should be on building blocks and enabling advanced users, not on e.g. totally reinventing systems like Chef or Ansible.
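
To make the mapping, aliasing/bundling, reverse-lookup and "current role" ideas above more concrete, here's a stdlib-only sketch under a few assumptions: Conn is a hypothetical stand-in for Connection that compares by value, and RoleRegistry, executing_role and current_role are invented names, not existing Fabric or Lexicon APIs. Reverse lookup uses equality rather than identity, and resolution tracks the current path (not every name visited) so alias chains and overlapping bundles resolve without false cycle errors. Membership ("which roles contain this host") and the executing role ("which role was I asked to run against") are kept as two separate calls.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    """Hypothetical stand-in for a Connection; dataclass equality compares fields."""

    host: str
    user: str = "deploy"
    port: int = 22


class RoleRegistry:
    """Hypothetical name -> targets registry with aliases and bundles."""

    def __init__(self):
        self._groups = {}   # direct name -> tuple of Conn
        self._aliases = {}  # alias -> another name (group, alias or bundle)
        self._bundles = {}  # bundle name -> list of member names

    def define(self, name, *hosts):
        self._groups[name] = tuple(Conn(h) for h in hosts)

    def alias(self, alias, target):
        self._aliases[alias] = target

    def bundle(self, name, *members):
        self._bundles[name] = list(members)

    def names(self):
        return [*self._groups, *self._aliases, *self._bundles]

    def resolve(self, name, _path=()):
        """Return de-duplicated Conns for a name, following aliases and bundles."""
        if name in _path:
            raise ValueError(f"cycle: {' -> '.join(_path + (name,))}")
        path = _path + (name,)
        if name in self._groups:
            return list(self._groups[name])
        if name in self._aliases:
            return self.resolve(self._aliases[name], path)
        if name in self._bundles:
            result = []
            for member in self._bundles[name]:
                for conn in self.resolve(member, path):
                    if conn not in result:  # equality, not identity
                        result.append(conn)
            return result
        raise KeyError(name)

    def roles_for(self, conn):
        """Reverse lookup: every name whose resolved targets contain an equal Conn."""
        return sorted(n for n in self.names() if conn in self.resolve(n))


# "Which role was execution asked to target?" is tracked separately from membership.
current_role = ContextVar("current_role", default=None)


@contextmanager
def executing_role(name):
    token = current_role.set(name)
    try:
        yield name
    finally:
        current_role.reset(token)


roles = RoleRegistry()
roles.define("db", "db1")
roles.define("web", "web1", "web2")
roles.define("lb", "lb1")
roles.bundle("prod", "db", "web", "lb")
roles.alias("database", "db")

cxn = Conn("web1")  # a new object, equal to the member defined under "web"
print(roles.roles_for(cxn))  # ['prod', 'web']
with executing_role("web"):
    print(current_role.get())  # web
```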

Implementation ideas/concerns

  • If we used the config system as the main storage vector, values "want" to be primitives so they can be stored in YAML, JSON, etc., but that's a can of worms ending with "store all Group/Connection kwargs in a big ol' list-o-dicts", and so on.
  • If we expect the definitions to primarily be in Python, we can simply say "instantiate Group objects", and then we have the option of merging that data into the config system or leaving it standalone somehow.

    • I think I prefer the latter because stuffing literally everything into the nested config dicts feels like it'll lead to bad news.

  • The deeper constructs like aliasing and bundling add complexity & ordering issues (i.e. imagine a trivial alias setup where key1's value is a group but key2's value is key1; now you have to crawl the structure twice to resolve or check key2)

    • though if we go for a mostly "do it in-python" approach, it becomes much like the config system's API, where you can start out with a declarative structure but anything more is enabled by method calls after that initial setup. I don't think that's awful? EDIT: and I think that's exactly how Lexicon works anyways.

  • Regardless of format, we have to figure out how advanced users will want to generate it on the fly from external sources or similar. That, plus the issues with aliasing and such, implies we may not want this as a naive structure "stored" somewhere, but as an API on some object or objects that is called to generate it.

    • I suspect we may want to work 'downwards' from the selection of roles/groups, arriving at whatever the highest level API is for "turn what the user supplied into an actionable unit of targets", because the most advanced users will necessarily want to have complete control over the implementation of that API call. Then we can as always supply what feels like a useful common case but which is clearly marked as "just one way to do it".

    • @RedKrieg has a nifty idea along these lines where we have @group like @task, and the functions aren't executable units of work, but instead yield Group objects.

    • This approach natively reuses the task hierarchy (Collection), which is practical (why reinvent the wheel) and elegant (because in real-world cases, role/group definitions frequently DO map very closely to the tasks using them!)

      • It also works well even if your groups DON'T map to your tasks, because you can simply write the definitions at your root collection level. Easy peasy.

    • It's unclear to me whether this is best returning a single Group from each function, or if we want the ability to yield multiple groups (or connections), or if it's best to do it not as decorated functions at all but as just API calls on Collection (like how collection-level configs are stored).

    • For example, the use case where group/role data is dynamic and outside of Fabric still needs solving here (which is why earlier I noted that we first must identify the highest-level API for this space; then we need to see how that meshes with this intermediate-level idea.)
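
A sketch of the @group idea combined with the "turn what the user supplied into targets" step. Assumptions: Conn is a stand-in for Connection, GROUP_FUNCS is a plain module-level dict standing in for storage on a Collection, and group and resolve_targets are hypothetical names, not Fabric or Invoke APIs. It accepts the mix of values mentioned for task- and collection-level defaults: a connection, several connections, group names, or nested lists of those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    """Hypothetical stand-in for a Connection (compares by value)."""

    host: str


# Hypothetical registry; in the @group idea this would hang off a Collection.
GROUP_FUNCS = {}


def group(func):
    """Register a function that yields targets rather than doing work."""
    GROUP_FUNCS[func.__name__] = func
    return func


@group
def web():
    # Called only at resolution time, so this could query an external inventory.
    return [Conn("web1"), Conn("web2")]


@group
def db():
    yield Conn("db1")


def resolve_targets(value):
    """Normalize a target spec into a flat, de-duplicated list of Conns.

    Accepts a Conn, a registered group name, or any iterable mixing those
    (including nested iterables).
    """
    if isinstance(value, Conn):
        candidates = [value]
    elif isinstance(value, str):
        if value not in GROUP_FUNCS:
            raise KeyError(f"no group named {value!r}")
        candidates = GROUP_FUNCS[value]()
    else:
        candidates = value
    result = []
    for item in candidates:
        expanded = [item] if isinstance(item, Conn) else resolve_targets(item)
        for conn in expanded:
            if conn not in result:
                result.append(conn)
    return result


print(resolve_targets(["db", Conn("web1"), "web"]))
# [Conn(host='db1'), Conn(host='web1'), Conn(host='web2')]
```

Because group functions run only when targets are resolved, a group like web could fetch its members from an external source instead of returning a literal list, which is one possible answer to the dynamic-data case.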


All 7 comments

From the mailing list:

We implemented our own internal REST API which populates env.roledefs dynamically depending on the project being deployed, and we rely heavily on not embedding host strings in each project's fabfile or specifying them on the CLI.

Our use cases are:

  1. Environment-free codebase (https://12factor.net/config). Environments (roles) and their respective host strings are stored in a centralized database. Each fabfile.py has something like this, which populates env.roledefs when the file is imported:

         EnvironmentDatabaseAPIClient(
             'https://rest.api.url/schema/',
             env.service_name,
         ).apply_env()

  2. A number of server environments: multiple testing environments (some private, some public) and multiple production environments (for different clients). Each environment consists of one or more hosts and is mapped to a Fabric role.

  3. Each service (env.service_name in the example above) has a different set of environments.

  4. We also have meta-roles (groups of roles). They are prefixed with group-: group-production, group-test, group-external, group-internal, group-all. This allows us to deploy to multiple server roles without specifying them one by one; for example, group-all deploys to all roles, both production and test.

  5. We have special Fabric tasks to print information about role groups, roles and hosts.

  6. We also rely heavily on reverse mapping host strings back to role names (host strings are unique per service_name). This is used for deployment logging and notifications: we log service deployments to each host and send a Slack notification when a service has been deployed to all hosts in a role. The EnvironmentDatabaseAPI server is responsible for this (it keeps logs and deployment state). This is done by decorating Fabric tasks with a decorator which submits env.host, env.port and env.service_name (plus commit info) back to the API server.

  7. We plan to add deployment authentication in the future, and will very likely pull more env variables from the server to make them available within the task context.
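
A minimal sketch of the meta-role and reverse-mapping use cases, with made-up data standing in for what the REST API would return. ROLEDEFS, META_ROLES, HOST_TO_ROLE, expand_role and on_host_deployed are hypothetical names; the reverse map only works because host strings are assumed unique, as described above, and the print call stands in for the Slack notification.

```python
# Made-up data standing in for what the environment database API returns.
ROLEDEFS = {
    "prod-client-a": ["a1.example.com", "a2.example.com"],
    "prod-client-b": ["b1.example.com"],
    "test-public": ["t1.example.com"],
}
META_ROLES = {
    "group-production": ["prod-client-a", "prod-client-b"],
    "group-test": ["test-public"],
    "group-all": ["group-production", "group-test"],
}

# Reverse map host -> role; only valid because host strings are unique per service.
HOST_TO_ROLE = {host: role for role, hosts in ROLEDEFS.items() for host in hosts}


def expand_role(name):
    """Expand a role, or a group-* meta-role (which may nest), into host strings."""
    if name.startswith("group-"):
        hosts = []
        for member in META_ROLES[name]:
            for host in expand_role(member):
                if host not in hosts:
                    hosts.append(host)
        return hosts
    return list(ROLEDEFS[name])


def on_host_deployed(host, deployed):
    """Record a deployed host; return its role once every host in that role is done."""
    deployed.add(host)
    role = HOST_TO_ROLE[host]
    return role if set(ROLEDEFS[role]) <= deployed else None


print(expand_role("group-all"))
# ['a1.example.com', 'a2.example.com', 'b1.example.com', 't1.example.com']

deployed = set()
for host in expand_role("group-production"):
    finished = on_host_deployed(host, deployed)
    if finished:
        print(f"notify: {finished} fully deployed")
```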

Thanks @max-arnold! I recognize many of those from my own use cases in the past as well. The reverse mapping bit in particular I remember coming up in v1 a few times, so I added it to the list.

For Fabric v2 to become useful to me, I would need a way to tell fab which set of hosts to execute a task on.

Previously I defined roles and then ran fab -R ... (Actually, the roles were defined programmatically from an IP address range, but that's not a requirement; a static list inside a YAML file would be fine.)

I can't find an equivalent in Fabric v2, and I also failed to emulate this feature using:

  • a fabric.yaml configuration file containing

        active_hostset: null
        hostsets:
          myhostset:
          - ...

  • active_hostset = config["hostsets"][config["active_hostset"]] in fabfile.py
  • env INVOKE_ACTIVE_HOSTSET=myhostset fab ...

Instead of the expected list of hosts, I get KeyError: 'active_hostset'.
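
A workaround that sidesteps the config lookup entirely is to pick the hostset yourself from data that's already loaded, failing with a clear message. HOSTSETS, the ACTIVE_HOSTSET variable and active_hosts are hypothetical names, and this sketch doesn't explain the KeyError above.

```python
import os

# Hypothetical data, e.g. the hostsets mapping after loading it from YAML yourself.
HOSTSETS = {
    "myhostset": ["app1.example.com", "app2.example.com"],
    "staging": ["stage1.example.com"],
}


def active_hosts(hostsets, environ=None, var="ACTIVE_HOSTSET"):
    """Return the host list named by an environment variable (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    choices = ", ".join(sorted(hostsets))
    name = environ.get(var)
    if not name:
        raise RuntimeError(f"set {var} to one of: {choices}")
    try:
        return list(hostsets[name])
    except KeyError:
        raise RuntimeError(f"unknown hostset {name!r}; expected one of: {choices}") from None


print(active_hosts(HOSTSETS, {"ACTIVE_HOSTSET": "staging"}))  # ['stage1.example.com']
```

The returned list could then be passed to something like fabric.SerialGroup(*hosts).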

In Fabric v1 we map a different set of hosts to each role for each of our environments, and the environment is selected by first running a role.environment:staging task. So that task influences the hosts used by the tasks that follow it.

In v2 we tried using a custom Task, but the problem is that Executor.expand_calls runs before our role.environment task does, so none of the following tasks know the environment they need to dynamically build their host lists.

Making Executor.expand_calls a generator allows task execution to influence the expansion of later tasks. With that change my example above works: we have a custom Task that needs to know its environment to properly expand roles to hosts. For example, with fab role.environment dev deploy.app, the role.environment task now runs before deploy.app is expanded, so deploy.app knows the environment, can configure its hosts, and is then expanded into the correct set of tasks.

I prototyped this in my forks:
https://github.com/pyinvoke/invoke/compare/master...rectalogic:expand-generator
https://github.com/fabric/fabric/compare/master...rectalogic:expand-generator
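
The ordering problem can be simulated in a few lines of plain Python. This is a toy model with hypothetical names (expand, run, execute_eager, execute_lazy, ENV_HOSTS), not Invoke's Executor; it only shows why expanding calls lazily with a generator lets an earlier task's side effects shape how a later task is expanded.

```python
# Toy model: a "call" is (task name, argument); not Invoke's real data structures.
ENV_HOSTS = {"dev": ["dev1"], "prod": ["prod1", "prod2"]}


def expand(call, state):
    """Turn one requested call into concrete calls, possibly reading shared state."""
    task, _arg = call
    if task == "deploy.app":
        # One call per host; the host list depends on an earlier task's side effect.
        return [(task, host) for host in ENV_HOSTS.get(state.get("env"), [])]
    return [call]


def run(call, state, log):
    task, arg = call
    if task == "role.environment":
        state["env"] = arg
    log.append(call)


def execute_eager(calls):
    """Expand everything before running anything (the problem described above)."""
    state, log = {}, []
    expanded = [c for call in calls for c in expand(call, state)]
    for c in expanded:
        run(c, state, log)
    return log


def expand_lazily(calls, state):
    """Generator: each call is expanded only when execution reaches it."""
    for call in calls:
        yield from expand(call, state)


def execute_lazy(calls):
    state, log = {}, []
    for c in expand_lazily(calls, state):
        run(c, state, log)
    return log


calls = [("role.environment", "dev"), ("deploy.app", "")]
print(execute_eager(calls))  # [('role.environment', 'dev')]
print(execute_lazy(calls))   # [('role.environment', 'dev'), ('deploy.app', 'dev1')]
```

In the eager version deploy.app is expanded while no environment is set, so it expands to nothing; in the lazy version the generator is only advanced past role.environment after that task has run.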

Hi, I don't know what happened to this software over the years, but I really missed the "roles" concept in Fabric 2, especially when running $ fab -R dev

We also use roles to represent the same set of operations across different environments. Perhaps separating the concept of a named role and a named environment would be useful? As in, the web role in the dev environment.
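
A minimal sketch of that split, keying hosts by environment first and role second, so "web" always means the same set of operations and the environment only decides where they run. INVENTORY, hosts and roles_in are hypothetical names with made-up data.

```python
# Hypothetical inventory: environment -> role -> host strings.
INVENTORY = {
    "dev": {"web": ["dev-web1"], "db": ["dev-db1"]},
    "prod": {"web": ["prod-web1", "prod-web2"], "db": ["prod-db1"]},
}


def hosts(environment, role):
    """Look up the hosts for one role within one environment."""
    try:
        return list(INVENTORY[environment][role])
    except KeyError:
        raise KeyError(f"no role {role!r} in environment {environment!r}") from None


def roles_in(environment):
    """List the role names defined for an environment."""
    return sorted(INVENTORY[environment])


print(hosts("prod", "web"))  # ['prod-web1', 'prod-web2']
```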
