Moby: Persisting ENV and ARG settings to all later stages in multi-stage builds

Created on 26 Jun 2018  ·  20 Comments  ·  Source: moby/moby

Description

AFAIK, there's no way to share variables between stages (please correct me if I'm wrong). To share data from one build stage to the next, the only option is to COPY files from one stage's directory to the current stage. We could build something like JSON files with a dump of build-time variables, but I think this need comes up too frequently, and it would flood our multi-stage builds with all kinds of crazy JSON parsing issues. Can we automate this?

Steps to reproduce the issue:

FROM alpine:latest as base1
ARG v1=test
ENV v1=$v1
RUN echo ${v1}

FROM alpine:latest as base2
RUN echo ${v1}

Describe the results you received:

docker build --no-cache  .
Step 4/6 : RUN echo ${v1}
---> Running in b60a3079864b
test

...

Step 5/6 : FROM alpine:latest as base2
---> 3fd9065eaf02
Step 6/6 : RUN echo ${v1}
---> Running in 1147977afd60

Describe the results you expected:

docker build --no-cache --multistage-share-env --multistage-share-arg .
Step 4/6 : RUN echo ${v1}
---> Running in b60a3079864b
test

...

Step 5/6 : FROM alpine:latest as base2
---> 3fd9065eaf02
Step 6/6 : RUN echo ${v1}
---> Running in 1147977afd60
test

Or maybe we can use something like namespaces to interpolate ${base1.v1}

area/builder kind/question

All 20 comments

Correct: Dockerfile instructions, including ENV and ARG, are scoped per build stage and are not preserved in the next stage; this is by design.

You _can_, however, set a _global_ ARG (declared before the first build stage) and use that value in each build stage:

ARG version_default=v1

FROM alpine:latest as base1
ARG version_default
ENV version=$version_default
RUN echo ${version}
RUN echo ${version_default}

FROM alpine:latest as base2
ARG version_default
RUN echo ${version_default}

When building without a --build-arg set:

docker build --no-cache -<<'EOF'
ARG version_default=v1

FROM alpine:latest as base1
ARG version_default
ENV version=$version_default
RUN echo ${version}
RUN echo ${version_default}

FROM alpine:latest as base2
ARG version_default
RUN echo ${version_default}
EOF

This produces:

Sending build context to Docker daemon  2.048kB
Step 1/9 : ARG version_default=v1
Step 2/9 : FROM alpine:latest as base1
 ---> 3fd9065eaf02
Step 3/9 : ARG version_default
 ---> Running in 702c05d6f294
Removing intermediate container 702c05d6f294
 ---> 1b2cac6e7585
Step 4/9 : ENV version=$version_default
 ---> Running in 6fb73bc8cdb9
Removing intermediate container 6fb73bc8cdb9
 ---> 656d82ccb6d7
Step 5/9 : RUN echo ${version}
 ---> Running in 403c720d0031
v1
Removing intermediate container 403c720d0031
 ---> d6071c5bd329
Step 6/9 : RUN echo ${version_default}
 ---> Running in d5c76d7d3aaa
v1
Removing intermediate container d5c76d7d3aaa
 ---> 554df1d8584b
Step 7/9 : FROM alpine:latest as base2
 ---> 3fd9065eaf02
Step 8/9 : ARG version_default
 ---> Running in 92400e85c722
Removing intermediate container 92400e85c722
 ---> 5f0cb12f4448
Step 9/9 : RUN echo ${version_default}
 ---> Running in f38802f0d690
v1
Removing intermediate container f38802f0d690
 ---> 4b8caab7870a
Successfully built 4b8caab7870a

And _with_ a --build-arg:

docker build --no-cache --build-arg version_default=v2 -<<'EOF'
ARG version_default=v1

FROM alpine:latest as base1
ARG version_default
ENV version=$version_default
RUN echo ${version}
RUN echo ${version_default}

FROM alpine:latest as base2
ARG version_default
RUN echo ${version_default}
EOF
Sending build context to Docker daemon  2.048kB
Step 1/9 : ARG version_default=v1
Step 2/9 : FROM alpine:latest as base1
 ---> 3fd9065eaf02
Step 3/9 : ARG version_default
 ---> Running in 7f5dd5885859
Removing intermediate container 7f5dd5885859
 ---> 482ffb014095
Step 4/9 : ENV version=$version_default
 ---> Running in b6c6e9aa3489
Removing intermediate container b6c6e9aa3489
 ---> 83f1c0b82986
Step 5/9 : RUN echo ${version}
 ---> Running in 0805ec04fd20
v2
Removing intermediate container 0805ec04fd20
 ---> ef39d4bd6306
Step 6/9 : RUN echo ${version_default}
 ---> Running in f8747a5bfeeb
v2
Removing intermediate container f8747a5bfeeb
 ---> 72d497d25306
Step 7/9 : FROM alpine:latest as base2
 ---> 3fd9065eaf02
Step 8/9 : ARG version_default
 ---> Running in 57aa2e097787
Removing intermediate container 57aa2e097787
 ---> 45e167d234ce
Step 9/9 : RUN echo ${version_default}
 ---> Running in 8615cd6f6ab6
v2
Removing intermediate container 8615cd6f6ab6
 ---> 1674ad8d3b88
Successfully built 1674ad8d3b88

Another way is to use a common base stage for multiple stages:

FROM alpine:latest as base
ARG version_default
ENV version=$version_default

FROM base
RUN echo ${version}

FROM base
RUN echo ${version}

docker build --build-arg=version_default=123 --no-cache .
Sending build context to Docker daemon  92.67kB
Step 1/7 : FROM alpine:latest as base
 ---> 3fd9065eaf02
Step 2/7 : ARG version_default
 ---> Running in a1ebfdf79f07
Removing intermediate container a1ebfdf79f07
 ---> 3e78800ed9ea
Step 3/7 : ENV version=$version_default
 ---> Running in 105d94baac3f
Removing intermediate container 105d94baac3f
 ---> a14276ddc77b
Step 4/7 : FROM base
 ---> a14276ddc77b
Step 5/7 : RUN echo ${version}
 ---> Running in d92f9b48a6cc
123
Removing intermediate container d92f9b48a6cc
 ---> 6505fe2a14bb
Step 6/7 : FROM base
 ---> a14276ddc77b
Step 7/7 : RUN echo ${version}
 ---> Running in 1b748eea4ef3
123
Removing intermediate container 1b748eea4ef3
 ---> f3311d3ad27e
Successfully built f3311d3ad27e
Time: 0h:00m:04s

Let me close this issue, because this is by design, but I hope the examples above help you further; also, feel free to continue the conversation.

If this is by design, then the design is wrong. Without a concise way to share variables between stages, it is impossible to stay DRY. There are unavoidable situations where many stages need access to the same variables. Duplicating definitions is error-prone, and so is duplicating boilerplate for hacking the shared definitions into every stage.

The "base container" approach severely limits expressive power because it fixes the base container for every stage, while there are valid use cases where intermediate stages each need to use a different minimal base image that provides a tool required for the stage.

@thaJeztah
Can this design be reconsidered?

I have 11 base images that are only available internally and primarily used for our various builds. I am trying to make a base "FROM scratch" image that those 11 images will use as part of a multi-stage build, because some of the logic is the same across all 11, and this includes environment variables. So I have environment variables that need to be set within each image, and I want to set these within my base image so that the same logic does not need to be replicated across every other image.

@DMaxfield-BDS The scope of ARG won't change, but you could consider having a separate build target that creates the "FROM scratch" base image; push that to the registry, and use it for your other images:

FROM scratch AS scratch-base
ENV foo=bar
ENV bar=baz

Build it, push it to your (internal) registry, and those 11 base images can use it as their base:

FROM scratch-base
RUN your stuff

There is another proposal to have EXPORT/IMPORT, which might fit some other use-cases; https://github.com/moby/moby/issues/32100

@thaJeztah

Thank you for your response. The one thing I did not point out is that these other 11 images all have different base images (python, npm, postgres, openjdk, etc.). So what I am looking to do is put all the common set-up/prep into one base image, including setting the environment variables used by my company's application, like the following:

FROM scratch AS scratch-base
ARG JAR_DIR='/Input/jars/'
ENV JAR_DOWNLOAD_DIR=$JAR_DIR

Push to my internal registry. Then, an example of one of the other 11 images would do the following:

FROM scratch-base AS scratch-base
FROM openjdk:8-jdk
COPY --from=scratch-base $JAR_DOWNLOAD_DIR .
RUN <stuff that also uses the JAR_DOWNLOAD_DIR environment variable>

If I am able to use environment variables set in the 1st image, I don't then have to configure these same settings in each of the other 11 images. This allows me one configuration point, and allows better automation.

How am I supposed to do something like this?

RUN COMMIT_HASH=$(git rev-parse --short HEAD)

How can I use the COMMIT_HASH variable in subsequent RUN steps? Am I forced to define the variable and use it all in one step?

In my opinion this design is very limited. Maybe one could:

RUN COMMIT_HASH=$(git rev-parse --short HEAD) AS commit-hash
ARG --from=commit-hash COMMIT_HASH
RUN echo ${COMMIT_HASH}
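The behaviour behind this question can be reproduced without Docker. The following is a minimal sketch that assumes each RUN instruction behaves like a fresh shell process, simulated here with `sh -c`. The hash value `abc1234` and the file name `commit_hash` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Make sure the variable is not inherited from the calling environment.
unset COMMIT_HASH

# "RUN" step 1: the assignment lives only inside this shell process.
step1=$(sh -c 'COMMIT_HASH=abc1234; echo "step 1 sees: ${COMMIT_HASH}"')

# "RUN" step 2: a new shell process, so the variable is gone.
step2=$(sh -c 'echo "step 2 sees: ${COMMIT_HASH:-<unset>}"')

# Files do persist between steps, so a value can be handed over on disk.
workdir=$(mktemp -d)   # stands in for the stage's filesystem
(cd "$workdir" && sh -c 'echo abc1234 > commit_hash')
step3=$(cd "$workdir" && sh -c 'COMMIT_HASH=$(cat commit_hash); echo "step 3 sees: ${COMMIT_HASH}"')

echo "$step1"
echo "$step2"
echo "$step3"
```

Under that assumption, a value computed in one step has to be either consumed in the same step or written to a file that later steps read back.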

@dnk8n this is what I do to work around it:

On the build stage I do:

RUN git rev-parse HEAD > commit_hash

Then in the other stage I copy the file with the data and set the environment variable before running the app that consumes it:

COPY --from=builder /<build_folder>/commit_hash /<other_stage_folder>/commit_hash

CMD export COMMIT_HASH=$(cat /<other_stage_folder>/commit_hash); java -jar myApp.jar

You should be able to do something similar to set the ARG.

At the very least we should be able to do something like this to copy the BAR variable from the other stage/image into the current build scope.

ENV --from=stage1 FOO=$BAR

This is even more important when considering using an external image as a stage because there can be important metadata in environment variables.

ENV --from=nginx:latest FOO=$BAR

I'm working-around by saving the environment to a file at each stage, using this pattern:

FROM base1:version1 AS BUILDER1
WORKDIR /build
RUN env | sort > env.BUILDER1

FROM base2:version2 AS BUILDER2
WORKDIR /build
RUN env | sort > env.BUILDER2

FROM finalbase:version AS FINAL
WORKDIR /build
ENV LANG=C.UTF-8
COPY --from=BUILDER1 /build/env.BUILDER1 .
COPY --from=BUILDER2 /build/env.BUILDER2 .
# Use ". ./env.BUILDER" instead of "source" if /bin/sh is true-POSIX
RUN set -eux ; \
        source ./env.BUILDER1 ; \
        source ./env.BUILDER2 ; \
        env | sort

How that can finally be 'saved' into 'docker' such that it is visible when running docker inspect I'm unsure, but it could go into /etc/profile for example.
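One caveat with this pattern is that `env` prints values unquoted, so a value containing spaces or quotes does not survive being sourced. A possible hardening, sketched below, writes only selected variables as quoted `export` lines. The helper name `dump_vars` and the variables `APP_TITLE` and `APP_NOTE` are hypothetical and not part of the original workaround.

```shell
#!/bin/sh
# dump_vars NAME...: print one `export NAME='value'` line per variable,
# single-quoting each value so it survives being sourced later.
dump_vars() {
    for name in "$@"; do
        eval "value=\${$name-}"                                # indirect lookup
        escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")  # ' becomes '\''
        printf "export %s='%s'\n" "$name" "$escaped"
    done
}

# Example values that plain `env` output would not round-trip.
APP_TITLE="hello world"
APP_NOTE="it's quoted"

env_file=$(mktemp)
dump_vars APP_TITLE APP_NOTE > "$env_file"   # what a builder stage would write

unset APP_TITLE APP_NOTE                     # pretend we are in another stage
. "$env_file"                                # what the final stage would run
echo "$APP_TITLE / $APP_NOTE"
```

Naming the variables explicitly also avoids overwriting values such as PATH or HOME in the final stage, which a full `env` dump would do.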


Seeing how recent the last comment is gives me a glimpse of how many developers are questioning the design decisions behind the Dockerfile. Apparently "DRY" was not among the mastered skills. If you want to try, add another layer and start templating the Dockerfiles...

To be honest I am very surprised about this... the "global arg" doesn't seem to work for me. If I move the build args to the top before anything else, it's like they don't exist in following steps after FROM.

Edit: I hadn't noticed that I still have to repeat the ARG lines, without values, after each FROM... it works, but it's not ideal.

An explicit way of copying (getting) an environment variable from another stage is a must. I have a valid use case for it.
Here is a simplified example:

FROM node:lts-alpine AS node
FROM php:7-fpm-alpine AS php

# Here goes some build instructions for PHP image, then

# Install nodejs, npm, yarn
COPY --from=node /usr/local/bin/node /usr/local/bin/node
COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules
COPY --from=node /opt /opt

# Create symlinks to npm, yarn binaries
RUN \
    ln -s "/usr/local/lib/node_modules/npm/bin/npm-cli.js" /usr/local/bin/npm \
    && ln -s "/usr/local/lib/node_modules/npm/bin/npx-cli.js" /usr/local/bin/npx \
    && ln -s /opt/yarn-v1.??.?/bin/yarn /usr/local/bin/yarn \
    && ln -s /opt/yarn-v1.??.?/bin/yarnpkg /usr/local/bin/yarnpkg

# Some other instructions

Now, how am I supposed to know the yarn version if it is defined in the node image as an environment variable (YARN_VERSION)?
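Until something like that exists, one possible workaround is to recover the version from the copied directory name instead of the environment. The sketch below assumes the node image keeps yarn under a directory named `yarn-v<version>`, as the symlink targets above suggest. It uses a temporary directory in place of `/opt`, and the version number is only a placeholder.

```shell
#!/bin/sh
# Stand-in for the /opt directory copied from the node image; the version
# number below is a placeholder, not a statement about any real image.
opt_dir=$(mktemp -d)
mkdir -p "$opt_dir/yarn-v1.22.19/bin"

# Let the shell glob find the versioned directory, and insist on one match.
set -- "$opt_dir"/yarn-v*
if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "expected exactly one yarn-v* directory" >&2
    exit 1
fi
yarn_dir=$1

# Strip everything up to and including "yarn-v" to recover the version.
YARN_VERSION=${yarn_dir##*/yarn-v}
echo "yarn directory: $yarn_dir"
echo "YARN_VERSION=$YARN_VERSION"
```

Inside a RUN step the same glob could feed the `ln -s` commands directly, but the recovered value would still be limited to that one step.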

Multi-stage builds have brought huge benefits to our CI pipelines. A feature like "environment inheritance" would take them to another level, both maintenance-wise and feature-wise.

I have around 5 layers with heavy use of env vars, and every update is a nightmare. The bright side(?): I now think twice before I introduce a new stage.

The current ARG implementation gives some curious behavior:

FROM python
ARG X=os    # or ENV, the same  
RUN echo import ${X:-sys}   # Proof that the syntax works with default SHELL
SHELL ["python", "-c"]
RUN import os
RUN import ${X}  # Fails
RUN import ${X:-sys}  # Fails too, but this is a documented syntax (for ENV)

or

FROM alpine
ARG X=ab.cd
RUN echo ${X%.*}  # it's ok but it looks like magic 
WORKDIR ${X%.*}  # Fails here 

This follows from the fact that ARGs are per-stage environment variables, not templating.

I know that you (and I) have probably never seen this in real life (string manipulation, maybe...).
But it highlights the fact that the Docker syntax relies on the underlying shell syntax. That makes the documentation lie (see https://docs.docker.com/engine/reference/builder/#environment-replacement and the counter-example "RUN import ${X:-sys}").

Why doesn't RUN interpolate ARG values before launching the command, like the legacy 'make' does?

$ cat Makefile
SHELL=python
X=os
test:
    import os
    import ${X}
$ make test
import os
import os
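The difference between an environment variable and templating can be reproduced in plain shell, without Docker. This sketch assumes the builder passes the RUN text unchanged to whatever program SHELL names and only places X in that program's environment. `cat` stands in for an interpreter that performs no `${...}` expansion of its own.

```shell
#!/bin/sh
# The command text as it would appear after RUN; single quotes keep it literal.
command_text='echo import ${X:-sys}'

# Interpreter is a shell: it expands ${X:-sys} itself, using X from its environment.
with_shell=$(env X=os sh -c "$command_text")

# Interpreter is not a shell: X is set in the environment, but nothing expands the text.
without_shell=$(printf '%s\n' "$command_text" | env X=os cat)

echo "shell interpreter:     $with_shell"
echo "non-shell interpreter: $without_shell"
```

Seen this way, the first Python example fails because Python receives the literal text `import ${X}`, not because the variable is missing.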

24 hours are over =)

_Warning_: this solution doesn't work, although I wish it would!

Write the env var to a file in the first stage and copy that one file into the second stage, like this:

FROM golang:1.14 as cm_base
ARG commit_id
RUN echo "$commit_id" > /tmp/env.json
FROM golang:1.14
COPY --from=cm_base "/tmp/env.json" "/tmp/env.json"
ENV cm_cc_commit_id="$(cat /tmp/env.json)"

Boom, except it doesn't work: cm_cc_commit_id becomes the literal $(cat /tmp/env.json), because the expression is not evaluated. So setting dynamic env vars seems impossible?

However what does work is writing to a file, making your entrypoint a bash file, and then doing this:

#!/usr/bin/env bash
echo "the docker entrypoint args:" "$@"
export cm_cc_commit_id="$(cat /tmp/env.json)"
"$@"

Think about it more like this:

A Dockerfile has no means of declaring variables except via a global ARG. These do not automatically propagate to each stage because that would affect caching.
The builder could be smarter and determine whether you are trying to use an ARG in a particular stage and allow it, but then it would have to guess whether you wanted a literal or not.

So yes, you need to explicitly import global ARGs into each stage where you want to use them. Maybe the builder or a linter could implement warnings for cases where it looks like you might be trying to use an ARG that you haven't imported.
This is not broken by design; it is an explicit choice to favor optimal caching over fewer lines of code.

It is definitely understood that the usage of global ARGs is initially confusing.
