Go: proposal: cmd/go: support embedding static assets (files) in binaries

Created on 4 Dec 2019 · 176 comments · Source: golang/go

There are many tools to embed static asset files into Go binaries; https://tech.townsourced.com/post/embedding-static-files-in-go/ lists more than a dozen of them.

Proposal

I think it's time to do this well once and reduce duplication, by adding official support for embedding file resources into the cmd/go tool.

Problems with the current situation:

  • There are too many tools
  • Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.
  • Not using go:generate means not being go install-able or making people write their own Makefiles, etc.

Goals:

  • don't check in generated files
  • don't generate *.go files at all (at least not in user's workspace)
  • make go install / go build do the embedding automatically
  • let user choose per file/glob which type of access is needed (e.g. []byte, func() io.Reader, io.ReaderAt, etc)
  • Maybe store assets compressed in the binary where appropriate (e.g. if user only needs an io.Reader)? (edit: but probably not; see comments below)
  • No code execution at compilation time; that is a long-standing Go policy. go build or go install can not run arbitrary code, just like go:generate doesn't run automatically at install time.

The two main implementation approaches are //go:embed Logo logo.jpg or a well-known package (var Logo = embed.File("logo.jpg")).

go:embed approach

For a go:embed approach, one might say that any go/build-selected *.go file can contain something like:

//go:embed Logo logo.jpg

Which, say, compiles to:

func Logo() *io.SectionReader

(adding a dependency to the io package)

Or:

//go:embedglob Assets assets/*.css assets/*.js

compiling to, say:

var Assets interface {
     Files() []string
     Open(name string) *io.SectionReader
} = runtime.EmbedAsset(123)

Obviously this isn't fully fleshed out. There'd need to be something for compressed files too that yields only an io.Reader.

embed package approach

The other high-level approach is to not have a magic //go:embed syntax and instead just let users write Go code in some new "embed" or "golang.org/x/foo/embed" package:

var Static = embed.Dir("static")
var Logo = embed.File("images/logo.jpg")
var Words = embed.CompressedReader("dict/words")

Then have cmd/go recognize the calls to embed.Foo("foo/*.js") etc. and do the glob expansion in cmd/go, rather than at runtime. Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice for speeding up incremental development where you don't care about linking one big binary.
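
A sketch of what such a runtime fallback could look like, assuming a hypothetical "embeddev" build tag (nothing here is part of the proposal):

// embed_dev.go: compiled only under the hypothetical "embeddev" tag,
// reading assets from disk on each access instead of from data linked
// into the binary.

// +build embeddev

package embed

import "io/ioutil"

// File returns the contents of the named file. In a release build the
// toolchain would substitute data baked into the executable instead.
func File(name string) []byte {
    b, err := ioutil.ReadFile(name)
    if err != nil {
        panic(err)
    }
    return b
}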

Concerns

  • Pick a style (//go:embed* vs a magic package).
  • Block certain files?

    • Probably block embedding ../../../../../../../../../../etc/shadow

    • Maybe block reaching into .git too

Labels: Proposal, Proposal-Hold


All 176 comments

It's worth considering whether embedglob should support a complete file tree, perhaps using the ** syntax supported by some Unix shells.

Some people would need the ability to serve the embedded assets with HTTP using the http.FileServer.

I personally use either mjibson/esc (which does that) or in some cases my own file embedding implementation which renames files to create unique paths and adds a map from the original paths to the new ones, e.g. "/js/bootstrap.min.js": "/js/bootstrap.min.827ccb0eea8a706c4c34a16891f84e7b.js". Then you can use this map in the templates like this: href="{{ static_path "/css/bootstrap.min.css" }}".
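
A sketch of that renaming scheme; the hashName helper below is hypothetical, only to make the idea concrete:

package main

import (
    "crypto/md5"
    "fmt"
    "path"
    "strings"
)

// hashName inserts a digest of the content before the file extension,
// e.g. "/js/app.js" -> "/js/app.<md5>.js", so the path changes whenever
// the content does and aggressive cache headers become safe.
func hashName(p string, content []byte) string {
    sum := md5.Sum(content)
    ext := path.Ext(p)
    return fmt.Sprintf("%s.%x%s", strings.TrimSuffix(p, ext), sum, ext)
}

func main() {
    fmt.Println(hashName("/js/bootstrap.min.js", []byte("...")))
}

The map from original to hashed paths is what a static_path template function would then consult.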

I think a consequence of this would be that it would be nontrivial to figure out what files are necessary to build a program.

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

(Just musing out loud here.)

@opennota,

would need the ability to serve the embedded assets with HTTP using the http.FileServer.

Yes, the first link above is a package I wrote (in 2011, before Go 1) and still use, and it supports using http.FileServer: https://godoc.org/perkeep.org/pkg/fileembed#Files.Open

@cespare,

The //go:embed approach introduces another level of complexity too. You'd have to parse the magic comments in order to even typecheck the code. The "embed package" approach seems friendlier to static analysis.

Yes, good point. That's a very strong argument for using a package. It also makes it more readable & documentable, since we can document it all with regular godoc, rather than deep in cmd/go's docs.

@bradfitz - Do you want to close this https://github.com/golang/go/issues/3035 ?

@agnivade, thanks for finding that! I thought I remembered that but couldn't find it. Let's leave it open for now and see what others think.

If we go with the magic package, we could use the unexported type trick to ensure that callers pass compile-time constants as arguments: https://play.golang.org/p/RtHlKjhXcda.

(This is the strategy referenced here: https://groups.google.com/forum/#!topic/golang-nuts/RDA9Hag8RZw/discussion)
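
For reference, the trick looks roughly like this (the package and function names are placeholders):

package embed

// name is unexported, so code outside this package cannot declare values
// of this type or convert to it explicitly. An untyped string constant
// still converts implicitly, which is what restricts callers to
// compile-time constants.
type name string

// File is a stub standing in for the real accessor; in the proposal the
// toolchain would supply the data.
func File(n name) []byte { return nil }

So embed.File("logo.jpg") compiles, while passing a string variable does not.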

One concern I have is how it would handle individual or all assets being too big to fit into memory, and whether there would maybe be a build tag or per-file access option to choose between prioritizing access time vs memory footprint, or some middle-ground implementation.

the way i've solved that problem (because of course i also have my own implementation :) ) is to provide an http.FileSystem implementation that serves all embedded assets. That way, you don't have to rely on magic comments in order to appease the typechecker, the assets can easily be served by http, a fallback implementation can be provided for development purposes (http.Dir) without changing the code, and the final implementation is quite versatile, as http.FileSystem covers quite a bit, not only reading files, but listing directories as well.

One can still use magic comments or whatever to specify what needs to be embedded, though it's probably easier to specify all the globs via a plain text file.

@AlexRouSg This proposal would only be for files which are appropriate to include directly in the final executable. It would not be appropriate to use this for files that are too big to fit in memory. There's no reason to complicate this tool to handle that case; for that case, just don't use this tool.

@ianlancetaylor, I think the distinction @AlexRouSg was making was between having the files provided as global []bytes (unpageable, potentially writable memory) vs providing a read-only, on-demand view of an ELF section that can normally live on disk (in the executable), like via an Open call that returns an *io.SectionReader. (I don't want to bake in http.File or http.FileSystem into cmd/go or runtime... net/http can provide an adapter.)

@bradfitz http.File itself is an interface with no technical dependencies on the http package. It might be a good idea for any Open method to provide an implementation that conforms to that interface, because both the Stat and Readdir methods are quite useful for such assets

@urandom, it couldn't implement http.FileSystem, though, without referring to the "http.File" name (https://play.golang.org/p/-r3KjG1Gp-8).

@robpike and I talked through a proposal for doing this years ago (before there was a proposal process) and never got back to doing anything. It's been bugging me for years that we never finished doing that. The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes. If users want to store compressed data in that file, great, the details are up to them and there's no API needed on Go's side at all.

A couple thoughts:

  • It should not be possible to embed any file outside the module doing the embedding. We need to make sure files are part of module zip files when we create them, so that also means no symbolic links, case conflicts, etc. We can't change the algorithm that produces zip files without breaking sums.
  • I think it's simpler to restrict embedding to be in the same directory (if //go:embed comments are used) or a specific subdirectory (if static is used). This makes it a lot easier to understand the relationship between packages and embedded files.

Either way, this blocks embedding /etc/shadow or .git. Neither can be included in a module zip.

In general, I'm worried about expanding the scope of the go command too much. However, the fact that there are so many solutions to this problem means there probably ought to be one official solution.

I'm familiar with go_embed_data and go-bindata (of which there are several forks), and this seems to cover those use cases. Are there any important problems the others solve that this doesn't cover?

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find the pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, maybe it might make a bit more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

To expand on https://github.com/golang/go/issues/35950#issuecomment-561703346, there is a puzzle about the exposed API. The obvious ways to expose the data are []byte, string, and Read-ish interfaces.

The typical case is that you want the embedded data to be immutable. However, all interfaces exposing []byte (which includes io.Reader, io.SectionReader, etc.) must either (1) make a copy, (2) allow mutability, or (3) be immutable despite being a []byte. Exposing the data as strings solves that, but at the cost of an API that will often end up requiring copying anyway, since lots of code that consumes embedded files eventually requires byte slices one way or another.

I'd suggest route (3): be immutable despite being a []byte. You can enforce this cheaply by using a readonly symbol for the backing array. This also lets you safely expose the same data as a []byte and a string; attempts to mutate the data will fail. The compiler can't take advantage of the immutability, but that's not too great of a loss. This is something that toolchain support can bring to the table that (as far as I know) none of the existing codegen packages do.

(A third party codegen package could do this by generating a generic assembly file containing DATA symbols that are marked as readonly, and then short arch-specific assembly files exposing those symbols in the form of strings and []bytes. I wrote CL 163747 specifically with this use case in mind, but never got around to integrating it into any codegen packages.)

I'm unsure what you're talking about in terms of immutability. io.Reader already enforces immutability. That's the entire point. When you call Read(buf), it copies data into the buffer that _you_ provided. Changing buf after that has zero effect on the internals of the io.Reader.

I agree with @DeedleFake. I don't want to play games with magic []byte array backings. It's okay to copy from the binary into user-provided buffers.

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n. (We can argue whether this is a dumb thing in DTrace or not -- that's beside the point and it is the situation today.)

It's super useful that backticked strings treat both as \n regardless of how they appear in source, and I rely on this with a go-generate to embed the DTrace.

So if there is an embed file added to the go command, I would gently suggest that options to change the handling of CR/CRLF might come in very handy, particularly for folks who might be developing on different systems where the default line endings can be a gotcha.

Like with compression, I'd really like to stop at "copy the file bytes into the binary". CR/CRLF normalization, Unicode normalization, gofmt'ing, all that belongs elsewhere. Check in the files containing the exact bytes you want. (If your version control can't leave them alone, maybe check in gzipped content and gunzip them at runtime.) There are _many_ file munging knobs we could imagine adding. Let's stop at 0.

It may be too late to introduce a new reserved directory name, as much as I'd like to.
(It wasn't too late back in 2014, but it's probably too late now.)
So some kind of opt-in comment may be necessary.

Suppose we define a type runtime.Files. Then you could imagine writing:

//go:embed *.html (or static/* etc)
var files runtime.Files

And then at runtime you just call files.Open to get back an interface { io.ReadSeeker; io.ReaderAt } with the data. Note that the var is unexported, so one package can't go around grubbing in another package's embedded files.

Names TBD but as far as the mechanism it seems like that should be enough, and I don't see how to make it simpler. (Simplifications welcome of course!)

Whatever we do, it needs to be possible to support with Bazel and Gazelle too. That would mean having Gazelle recognize the comment and write out a Bazel rule saying the globs, and then we'd need to expose a tool (go tool embedgen or whatever) to generate the extra file to include in the build (the go command would do this automatically and never actually show the extra file). That seems straightforward enough.

If various munging won't do the trick, then that's an argument against using this new facility. It's not a stopper for me -- I can use go generate like I've been doing, but it means I cannot benefit from the new feature.

With respect to munging in general -- I can imagine a solution where someone provides an implementation of an interface (something like a Reader() on one side, and something to receive the file on the other -- maybe instantiated with an io.Reader from the file itself) -- which the go cmd would build and run to prefilter the file before embedding. Then folks can provide whatever filter they want. I imagine some folks would provide quasi-standard filters like a dos2unix implementation, compression, etc. (Maybe they should be chainable even.)

I guess there'd have to be an assumption that whatever the embedded processor is, it must be compilable on ~every build system, as go would be building a temporary native tool for this purpose.

It may be too late to introduce a new reserved directory name, as much as I'd like to. [...] some kind of opt-in comment may be necessary.

If the files are only accessible through a special package, say runtime/embed, then importing that package could be the opt-in signal.

The io.Reader approach seems like it could add significant overhead (in terms of both copying and memory footprint) for conceptually-simple linear operations like strings.Contains (such as in cmd/go/internal/cfg) or, critically, template.Parse.

For those use-cases, it seems ideal to allow the caller to choose whether to treat the whole blob as a (presumably memory-mapped) string or an io.ReaderAt.

That seems compatible with the general runtime.Files approach, though: the thing returned from runtime.Files.Open could have a ReadString() string method that returns the memory-mapped representation.
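
In code, that might look something like this (a hypothetical shape, merely restating the suggestion):

package runtime

import "io"

// File is one possible shape for what Files.Open could return.
type File interface {
    io.ReadSeeker
    io.ReaderAt

    // ReadString returns the whole blob as a string backed by the
    // executable's (memory-mapped) read-only data, with no copying.
    ReadString() string
}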

some kind of opt-in comment may be necessary.

We could do that with the go version in the go.mod file. Before 1.15 (or whatever) the static subdirectory would contain a package, and at 1.15 or higher it would contain embedded assets.

(That doesn't really help in GOPATH mode, though.)

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.

While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.

12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.

It's true that one could do the compression as a pre-build step outside go, but that would still require 1) a tool to do the compression 2) checking some kind of assets.zip blob into vcs 3) probably a utility library around the embed api to undo the compression. At which point it is unclear what the benefit is at all.

Three of the goals listed in the initial proposal were:

  • don't check in generated files
  • make go install / go build do the embedding automatically
  • store assets compressed in the binary where appropriate

If we read the second of these as "don't require a separate tool for embedding" then not supporting compressed files directly or indirectly fails to meet all three of these goals.

Does this need to be package level? Module level seems a better granularity since most likely one module = one project.

Since this directory wouldn't contain Go code†, could it be something like _static?

† or, if it is, it would be treated as arbitrary bytes whose name happens to end in ".go" instead of as Go code to be compiled

If it's one special directory, the logic could just be slurp up anything and everything in that directory tree. The magic embed package could let you do something like embed.Open("img/logo.svg") to open a file in a subdirectory of the asset tree.

Strings seem good enough. They can easily be copied into []byte or converted into a Reader. Code generation or libraries could be used to provide fancier APIs and handle things during init. That could include decompression or creating an http.FileSystem.

Doesn't Windows have a special format for embedding assets? Should that be used when building a Windows executable? If so, does that have any implications for the kinds of operations that can be provided?

Don't forget gitfs 😂

Is there a reason it couldn't be part of go build / link... e.g. go build -embed example=./path/example.txt and some package that exposes access to it (e.g. embed.File("example")), instead of using go:embed?

you need a stub for that in your code though

@egonelbre the problem with go build -embed is that all users would need to use it properly. This needs to be fully transparent and automatic; existing go install or go get commands can't stop doing the right thing.

@bradfitz I would recommend https://github.com/markbates/pkger over Packr. It uses the standard library API for working with files.

package main

import (
    "fmt"
    "io"
    "os"

    "github.com/markbates/pkger"
)

func run() error {
    f, err := pkger.Open("/public/index.html")
    if err != nil {
        return err
    }
    defer f.Close()

    info, err := f.Stat()
    if err != nil {
        return err
    }

    fmt.Println("Name: ", info.Name())
    fmt.Println("Size: ", info.Size())
    fmt.Println("Mode: ", info.Mode())
    fmt.Println("ModTime: ", info.ModTime())

    if _, err := io.Copy(os.Stdout, f); err != nil {
        return err
    }
    return nil
}

Or maybe certain build tags or flags could make it fall back to doing things at runtime instead. Perkeep (linked above) has such a mode, which is nice to speed up incremental development where you don't care about linking one big binary.

mjibson/esc does this as well, and it is a big quality-of-life improvement when developing a webapp; you not only save linking time but also avoid having to restart the application, which can take substantial time and/or require repeating extra steps to test your changes, depending on the implementation of the webapp.

Problems with the current situation:

  • Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.

Goals:

  • don't check in generated files

Well, this part is easily solvable by just adding the generated files to the .gitignore file or equivalent. I always did that...

So, alternatively Go could just have its own "official" embed tool that runs by default on go build and ask people to ignore these files as a convention. That would be the less magic solution available (and backward compatible with existing Go versions).

I'm just brainstorming / thinking aloud here... but I actually like the proposed idea overall. 🙂

Also, since //go:generate directives don't run automatically on go build the behavior of go build may seem a bit inconsistent: //go:embed will work automatically but for //go:generate you have to run go generate manually. (//go:generate can already break the go get flow if it generates .go files needed for the build).

//go:generate can already break the go get flow if it generates .go files needed for the build

I think the usual flow for that, and the one that I've generally used, although it took a bit of getting used to, is to use go generate entirely as a development-end tool and just commit the files that it generates.

@bradfitz it doesn't need to implement http.FileSystem itself. If the implementation provides a type that implements http.File, then it would be trivial for anyone, including the stdlib http package to provide a wrapper around the Open function, converting the type to http.File in order to conform to http.FileSystem
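
A sketch of such a wrapper; Open is stubbed out with the OS filesystem here so the example compiles, standing in for whatever the embed mechanism would actually provide:

package main

import (
    "net/http"
    "os"
)

// Open stands in for the embed mechanism's Open; *os.File happens to
// implement http.File, just as an embedded file's type would.
func Open(name string) (http.File, error) { return os.Open(name) }

// embedFS adapts the package-level Open to http.FileSystem, so the embed
// side never has to import net/http itself.
type embedFS struct{}

func (embedFS) Open(name string) (http.File, error) { return Open(name) }

func main() {
    http.ListenAndServe(":8080", http.FileServer(embedFS{}))
}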

@andreynering //go:generate and //go:embed are very different, though. This mechanism can happen seamlessly at build time because it won't run arbitrary code. I believe that makes it similar to how cgo can generate code as part of go build.

I'm not convinced about the complexity of a "compressed vs not" knob. If we do that, then people will want us to add control over which compression, compression level, and so on. All we should need to add is the ability to embed a file of plain bytes.

While i appreciate the drive for simplicity, we should also make sure we're meeting users' needs.

12 out of 14 of the tools listed at https://tech.townsourced.com/post/embedding-static-files-in-go/#comparison support compression, which suggests that it is a pretty common requirement.

I'm not sure I agree with this reasoning.

The compression done by the other libraries is different from adding it to this proposal in that they will not reduce performance on subsequent builds since the alternatives are generally speaking generated ahead of build rather than during build time.

Low build times are a clear added value of Go over other languages, and compression trades CPU time for a reduced storage/transfer footprint. If a lot of Go packages start running compression on go build, we're going to add even more build time than the time added by simply copying assets during builds. I'm skeptical of adding compression just because others do it. As long as the initial design doesn't by design prevent a future extension that adds support for e.g. compression, putting it in there because it might benefit someone seems like unnecessary hedging.

It's not like file embedding would be useless without compression; compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of. Especially not if most of the "heavier" assets are files such as JPEGs or PNGs, which are already pretty well compressed.

What about keeping compression out for now and adding it in if it's actually missed by a lot of people? (and can be done without undue costs)

To add to @sakjur's comment above: compression seems orthogonal to me. I generally want to compress an entire binary or release archive, and not just the assets. Particularly when Go binaries in Go can easily get into the tens of megabytes without any assets.

@mvdan I guess one of my concerns is that quite often when I've seen embedding is together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template. So, in the end, you might end up using some sort of "Makefile" anyways or uploading the pre-processed content. In that sense, I would think a command-line flag would work nicer with other tools than comments.

I guess one of my concerns is that quite often when I've seen embedding is together with some other pre-processing: minification, typescript compilation, data compression, image crunching, image resizing, sprite-sheets. The only exception being websites that only use html/template.

Thanks, that's a useful data point. Perhaps the need for compression is not as common as it looked. If that's the case, i agree that it makes sense to leave it out.

It's not like file embedding would be useless without compression; compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.

Binary size is a big deal for many go developers (https://github.com/golang/go/issues/6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (https://github.com/golang/go/issues/11799, https://github.com/golang/go/issues/26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).

That doesn't really help in GOPATH mode, though

Maybe, if you're in GOPATH mode, this feature simply doesn't apply since I imagine the Go team doesn't plan on doing feature parity for GOPATH forever? There are already features that are not supported in GOPATH (such as security w/ checksum db, downloading dependencies through a proxy server, and semantic import versioning)

As @bcmills mentioned, having the static directory name in a go.mod file is a great way of introducing this feature in Go 1.15 since the feature can be automatically turned off in go.mod files that have a <=go1.14 clause.

That said, this also means users have to manually write what the static directory path is.

I think the vendor directory and the _test.go conventions are great examples of how they made working with Go and those two features a lot easier.

I don't recall many people requesting the option to customize the vendor directory name or having the ability to change the _test.go convention to something else. But if Go had never introduced the _test.go feature, then testing in Go would look a lot different today.

Therefore, maybe a name less generic than static gives better chances of non-collision and so having a conventional directory (similar to vendor and _test.go) could be a better user experience compared to magical comments.

Examples of potentially low-collision names:

  • _embed - follows the _test.go convention
  • go_binary_assets
  • .gobin - follows the .git convention
  • runtime_files - so that it matches the runtime.Files struct

Lastly, the vendor directory was added in Go 1.5. Sooo, maybe it's not that bad to add a new convention now? 😅

I think it should expose a mmap-readonly []byte. Just raw access to pages from the executable, paged in by the OS as needed. Everything else can be provided on top of that, with just bytes.NewReader.

If this is for some reason unacceptable, please provide ReaderAt not just ReadSeeker; the latter is trivial to construct from the former, but the other way isn't as good: it would need a mutex to guard the single offset, and ruin performance.
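
The asymmetry in code: going from an io.ReaderAt to an io.ReadSeeker is one line with io.NewSectionReader, while the reverse would need shared state:

package main

import (
    "bytes"
    "io"
)

func main() {
    data := []byte("embedded bytes")

    // io.ReaderAt -> io.ReadSeeker: stateless and cheap; every caller
    // can hold an independent view of the same underlying bytes.
    var ra io.ReaderAt = bytes.NewReader(data)
    rs := io.NewSectionReader(ra, 0, int64(len(data)))
    _ = rs

    // Going the other way would need a mutex guarding the single seek
    // offset, serializing concurrent readers.
}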

It's not like file embedding would be useless without compression; compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.

Binary size is a big deal for many go developers (#6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (#11799, #26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).

That is definitely a fair point and I can see how my argument can be seen as an argument in favor of carelessness with regards to filesizes. That was not my intention. My point is more in line with shipping this feature without compression which would still be useful for some, and they could provide useful feedback and insights as to how to properly add compression in a way that feels right long-term. The assets might swell in a way that the debug info is unlikely to do and it's easier for developers of packages which are installed/imported by others to reduce build performance needlessly if the implementation makes it easy to do so.

Another option would be to make compression of assets a build-flag and leave the compromise between build size and time to the builder rather than the developer. That would move the decision closer to the end-user of the binary who could make a decision on whether the compression is worthwhile. Otoh, this would risk creating an increased surface area for differences between development and production, so it isn't a clear cut better method than anything else and it's not something I feel like I'd want to advocate for.

My current asset embedding tool loads content from the asset files when built with -tags dev. Some convention like that would probably be useful here too; it shortens the development cycle significantly when e.g. fiddling with HTML or a template.

If not, the caller will have to wrap this lower-level mechanism with some *_dev.go and *_nodev.go wrappers and implement non-embedded loading for the dev scenario. Not even hard, but that road will just lead to a similar explosion of tools that the first comment on this issue describes. Those tools will have to do less than today, but they'll still multiply.

I think -tags dev failing to work when run outside the Go module would be reasonable (can't figure out where to load the assets from).
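
A minimal sketch of the dev half of such a wrapper pair, assuming the assets live under ./static (the tag and names are illustrative):

// assets_dev.go: the development-mode file. A mirror file constrained
// with "// +build !dev" would expose the embedded implementation.

// +build dev

package assets

import "net/http"

// FS reads straight from the working tree, so edits to HTML or templates
// show up without recompiling.
var FS http.FileSystem = http.Dir("static")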

What about just a go tool embed that takes inputs and produces Go output files in a special format recognized by the compiler as embedded files, which could then be accessed through runtime/embed or something. Then you could just do a simple //go:generate gzip -o - static.txt | go tool embed -o static.go.

A big downside, of course, is that you then have to commit the generated files.

@DeedleFake this issue started with

Using a go:generate-based solution bloats the git history with a second (and slightly larger) copy of each file.

Woops. Never mind. Sorry.

It's not like file embedding would be useless without compression; compression is a nice-to-have to reduce the binary size from maybe 100MB to 50MB, which is great, but also not a clear dealbreaker for the functionality for most applications I can think of.

Binary size is a big deal for many go developers (#6853). Go compresses DWARF debug info specifically to reduce binary size, even though this comes at a cost to link time (#11799, #26074). If there were an easy way to cut binary size in half i think the developers would leap at that opportunity (although i doubt the gains here would be nearly that significant).

If there is a need for it, then people will have the compressed data committed and embedded, and there will be packages to provide a layer between runtime.Embed and the end consumer that do the decompression inline.

And then a year or two from now there will be a new issue about adding compression, and it can be sorted then.

I say this as one of the 15 competing standards when I wrote goembed :)

@tv42 wrote:

I think it should expose a mmap-readonly []byte. Just raw access to pages from the executable, paged in by the OS as needed.

This comment is easily missed and amazingly valuable.

@tv42,

I think it should expose a mmap-readonly []byte. Just raw access to pages from the executable, paged in by the OS as needed. Everything else can be provided on top of that, with just bytes.NewReader.

The type that's already read-only is string. Also: it provides a size, unlike io.ReaderAt, and it doesn't depend on the standard library. That's probably what we want to expose.

The type that's already read-only is string.

But the whole ecosystem of Write etc works on []byte. It's simple pragmatism. I don't see that readonly property being any more a problem than io.Writer.Write docs saying

Write must not modify the slice data, even temporarily.

Another potential downside is that when embedding a directory with go:generate I can check the output of git diff and see if any files are there by mistake. With this proposal - ? Perhaps the go command would print the list of files it is embedding?

@tv42

But the whole ecosystem of Write etc works on []byte.

html/template works with strings, though.

Go already lets you use -ldflags -X to set some strings (useful to set git version, compile time, server, user etc.); could this mechanism be extended to set io.Readers instead of strings?

@bradfitz Are you proposing to use strings here even for data that isn't text? It's common enough to embed small binary files like icons and small images, etc.

@tv42 You said Write but I assume you meant Read. You can turn a string into an io.ReaderAt using strings.NewReader, so using a string doesn't seem like a barrier there.

@andreynering A string can hold any sequence of bytes.

A string can hold any sequence of bytes.

Yeah, but its primary intention is to hold text, and not arbitrary data. I suppose this may cause a bit of confusion, in particular for inexperienced Go developers.

I totally got the idea, though. Thanks for clarifying.

@ianlancetaylor

Read is supposed to mutate the slice passed in. Write is not. Hence Write documentation says this is not allowed. I don't see anything more being needed than documenting that users must not write to the returned []byte.

Just because strings.Reader exists doesn't mean io.WriteString will find an efficient implementation of writing strings. For example, TCPConn does not have WriteString.

I'd hate to have Go include a new feature like this only to force all data to be copied just to write it to a socket.

Also, the general assumption is that strings are human-printable and []byte often isn't. Putting JPEGs in strings will cause a lot of messed-up terminals.
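
To make the copying concern concrete (a sketch; countWriter stands in for a writer like *net.TCPConn that has no WriteString method):

package main

import (
    "fmt"
    "io"
)

// countWriter implements io.Writer but not io.StringWriter, so
// io.WriteString must convert the string to a []byte first, copying the
// asset on every call.
type countWriter struct{ n int }

func (w *countWriter) Write(p []byte) (int, error) {
    w.n += len(p)
    return len(p), nil
}

func main() {
    asset := "pretend this is an embedded JPEG"
    w := &countWriter{}
    if _, err := io.WriteString(w, asset); err != nil { // allocates a copy
        fmt.Println(err)
    }
    fmt.Println(w.n, "bytes written")
}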

@opennota

html/template works with strings, though.

Yeah, that's a weird one, it only takes files by pathname not as readers. Two responses:

  1. There's no reason the embedded data couldn't have both methods Bytes() []byte and String() string.

  2. Hopefully you're not parsing a template on every single request; whereas you really do have to send the data of a JPEG into a TCP socket for every request that asks for it.

@tv42 We can add WriteString methods as needed.

I don't think the most common use of this functionality will be to write the unmodified data, so I don't think we should optimize for that case.

I don't think the most common use of this functionality will be to write the unmodified data,

I think the absolute most common use of this functionality will be to serve web assets, images/js/css, unmodified.

But don't take my word for that, let's look at some of the importers of Brad's fileembed:

#fileembed pattern .+\.(js|css|html|png|svg|js.map)$
#fileembed pattern .*\.png
#fileembed pattern .*\.js$

And so on..

For anecdotal data: I know if this were implemented I'd immediately use it two places at work, and both would be to provide unmodified access to static textual files. Right now we use a //go:generate step to convert the files to constant (hexadecimal-format) strings.

I'd vote for the new package as opposed to the directive. Much easier to get a grip on, easier to handle/manage and much easier to document and extend. e.g. Can you easily find the documentation for a Go directive such as "go:generate"? What about the documentation for the "fmt" package? Do you see where I am going with this?

So, alternatively Go could just have its own "official" embed tool that runs by default on go build

@andreynering I know other packages managers and language tools allow it, but running arbitrary code/commands at build time is a security vulnerability (for what I hope are obvious reasons).

Two additional things come to my mind when thinking about this feature:

  • How would embedding files automatically work with the build cache?
  • Does it prevent reproducible builds? If the data is changed in any way (e.g. compressing it), it should take reproducibility into account.

stuffbin, linked in the first comment, was built to primarily let self-hosted web applications embed static (HTML, JS ...) assets. This seems to be a common use case.

Barring the compilation / compression discussion, another pain point is the lack of a filesystem abstraction in the stdlib because:

  • On a developer's machine, the numerous go runs and builds needn't be burdened by the overhead of embedding (while optionally compressing) assets. A filesystem abstraction would make it easy to _fail over_ to the local filesystem during development.

  • Assets can change actively during development, for example, a full Javascript frontend in a web application. The ability to seamlessly switch between the embedded assets and the local filesystem would avoid recompiling and re-running the Go binary just because the assets changed.

Edit: To conclude, if the embed package could expose a filesystem-like interface, something better than http.FileSystem, it'd solve these concerns.

The ability to seamlessly switch between embed and the local filesystem

Surely this can be implemented at the application level, and is beyond the scope of this proposal, no?

Surely this can be implemented at the application level, and is beyond the scope of this proposal, no?

Sorry, just realised, the way I worded it is ambiguous. I wasn't proposing a filesystem implementation inside the embed package, but just an interface, something better than http.FileSystem. That'd enable applications to implement any sort of abstraction.

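One hypothetical shape for such an interface, free of any net/http dependency:

package embed

import (
    "io"
    "os"
)

// FileSystem is a sketch of the abstraction being asked for: rich enough
// for serving and listing files, without depending on net/http.
type FileSystem interface {
    Open(name string) (File, error)
}

// File is what Open hands back.
type File interface {
    io.ReadSeeker
    io.ReaderAt
    io.Closer
    Stat() (os.FileInfo, error)
    Readdir(count int) ([]os.FileInfo, error)
}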

@knadh Totally agree that it should work when you just use go run too; the way Packr handles this is really nice. It knows where your files are; if they're not embedded in the app, it loads them from disk, as it expects that to be "development mode" basically.

The author of Packr has also released a new tool Pkger that's more Go Modules focused. The files there are all relative to the Go Module root. I really like that idea, but Pkger seems to have not implemented the local development loading from disk. A combination of both would be amazing IMO.

I don't know if it's already out of the running, but while the "embed package approach" is pretty magical, it also provides some awesomeness because the tool can deduce what to do to the file based on the call. e.g. the API might be something like

package embed
func FileReader(name string) io.Reader {…}
func FileReaderAt(name string) io.ReaderAt {…}
func FileBytes(name string) []byte {…}
func FileString(name string) string {…}

If the go tool finds a call to FileReaderAt, it knows that the data must be uncompressed. If it only finds FileReader calls, it knows that it can store compressed data. If it finds a call to FileBytes, it knows that it needs to do a copy; if it only finds FileString, it knows it can serve from read-only memory. And so on.

I'm not convinced this is a reasonable way to implement this for the go tool proper. But I wanted to mention this, as it allows to get the benefits of compression and zero-copy embedding while not having any actual knobs.

[edit] also, of course, it lets us add these extra mangling things after the fact, focusing on a more minimal feature-set first [/edit]

If it only finds FileReader calls...

This would preclude the use of the other methods via reflection.

[Edit] Actually, I think the implications are broader than that. If the use of FileReaderAt is an indication that the data must be uncompressed, then the use of FileReaderAt() with any non-const input implies that all files must be stored uncompressed.

I don't know if that's good or bad. I just think the magic heuristic won't be nearly as useful as it might seem at first blush.

One argument in favor of a comment pragma (//go:embed) instead of a special directory name (static/): a comment lets us embed a file in the test archive for a package (or the xtest archive) but not the library under test. The comment just needs to appear in a _test.go file.

I expect this would address a common problem with modules: it's hard to access test data for another package if that package is in another module. A package could provide data for other tests with a comment like //go:embedglob testdata/* in a _test.go file. The package could be imported into a regular non-test binary without pulling in those files.
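
Concretely, using the hypothetical syntax from this thread:

// pkg_test.go: because the directive lives in a _test.go file, the files
// would ship with the test archive only, not with importers of the
// package itself.
package pkg_test

//go:embedglob testdata/*
var testData runtime.Files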

@fd0,

How would embedding files automatically work with the build cache?

It would still work. The embedded file content hashes would be mixed in to the cache key.

Would it be possible (or even a good idea) to have a module/package/mechanism that would be practically transparent, as in from inside your application you'd simply try to open a path like

internal://static/default.css

and the File functions will read data from inside the binary, or from an alternative location
ex Package.Mount("internal[/<folder>.]", binary_path + "/resources/")

to create "internal://" with all files in the binary, fall back to executable path / resources / if in dev mode or if file not found in binary (and maybe throw a warning or something for logging purposes)

This would allow for example to have

Package.Mount("internal", binary_path  + "/resources/private/")
Package.Mount("anotherkeyword", binary_path  + "/resources/content/")

Probably best to lock the alternate location to path of executable when in 'release' mode, but relax this in dev mode (allow only folders in go_path or something like that)

By default, the package "mounts" internal:// or some other keyword but let user rename it if he/she wants.. ex .ReMount("internal","myCustomName") or something like that.

Another thing... would it make sense to check last change/modified time on alternate location and automatically override internal file if there is such file outside the application (maybe have a flag allowing for this, configurable by programmer before build)
This may be wanted for super fast patches of applications, where you don't want to wait for a new build to be made and distributed.. you could just create the folder and copy the file there, and the binary will switch to the new file.

On Windows, would it be possible, or make sense, to use resources (as in a binary data blob in a resource)?
And a bit unrelated, but maybe this package could also deal with bundling icons in the executable, or manifest data, or maybe even other resources? I realize it's Windows only...
I would imagine the builder could log the last modified/change dates of the files in the alternate folders and only trigger a "create blob of data" if a file changes and cache the blob somewhere.
Maybe only create a "cache" file is user chooses to enable compression on these bundled files (if it's decided eventually to compress them) ... if compression is chosen, only the particular file that was modified would have to be recompressed at build time, other files would just be copied into the binary from cache.

One problem I see is if package allows custom names, it would need to have a blacklist of some sort, as in not allow "udp, file, ftp, http, https and various other popular keywords"

As for store as byte array / string or compression ... imho whatever decision is made, it should leave room to easily update in the future... ex you could start with no compression and just have a list of offsets and file sizes and file names but make it possible to easily add compression in the future (ex method zlib, lzma, compressed size, uncompressed size if it's needed to allocate enough memory to unpack chunks etc etc .

I personally would be happy if the executable can be packed with UPX or equivalent, I assume the binary would be unpacked in memory and everything would work.

A few thoughts that relate indirectly:

  • I like the package embed approach for its using Go syntax
  • I think the need for compression and other manipulation isn't about binary size, it's about wanting to store just the most diff-friendly form of content in a repository, for there to be no "out of sync" state where someone forgets to regenerate and commit a compressed form when changing the "source," and for the package to remain "go gettable." Without addressing these points, we're only solving the standardization problem, which may be acceptable, but doesn't seem ideal.
  • I think we could sidestep the need for the toolchain needing to actively support specific compression/transformations if the embed interaction can optionally provide a "codec". Exactly how to define the codec depends on the integration syntax, but I imagine something like
package embed

type Codec interface {
    // Encode transforms a source representation to an in-binary encoded asset.
    Encode(io.Writer, io.Reader) error

    // Decode transforms an in-binary asset to its active representation that the embedded application wants to use.
    Decode(io.Writer, io.Reader) error
}

This can cover very specific use cases, like this contrived one:

package main

func NewJSONShrinker() embed.Codec {
   return jsonShrinker{}
}

type jsonShrinker struct{}
func (_ jsonShrinker)  Encode(io.Writer, io.Reader) error {
    // use json.Compact + gzip.Encode...
}
func (_ jsonShrinker)  Decode(io.Writer, io.Reader) error {
    // use gzip.Decode + json.Indent
}

Using it could look like

// go:embed file.name NewJSONShrinker

func main() {
    embed.NewFileReader("file.name") // codec is implied by the comment above
}

or possibly

func main() {
    f, err := embed.NewFileReaderCodec("file.name", NewJSONShrinker())
    ...
}

In the second form, there's the complication that the toolchain needs to understand statically which Codec to use, because it has to do the Encode step at compile-time. So we'd have to disallow any Codec value that can't be easily determined at compile time.

Given these two options, I think I'd choose the magic comment plus codecs. It results in a more powerful feature that addresses all the stated goals here. Plus, I don't think magic comments are unacceptable here. We already tolerate them via go:generate for this purpose right now. If anything, one might consider the magic package alone more of a departure from current idioms. The Go ecosystem right now doesn't have many features that let a source file instruct the toolchain to use additional source files, and I think the only one that isn't a magic comment right now is the import keyword.

If we do compression, there will be no codec type or compression level knobs at all. That is, having any knobs at all is the biggest argument to not support compression at all.

The only choice I'd want to expose, if any, is: random access or not. If you don't need random access, the tooling and runtime can pick whatever compression's appropriate and not expose it to users. And it'd likely change/improve over time.

But I've come around on @rsc's side of not having compression because of a realization I had: the very content that's most compressible (HTML, JS, CSS, etc) is content you'd still want random access on (to be served via, say, http.FileServer, which supports range requests)

And looking at the combined size of Perkeep's HTML/CSS/JS that we embed, it's 48 KB uncompressed. The Perkeep server binary is 49 MB. (I'm ignoring the size of embedded images because those are already compressed.) So it seems like it's just not worth it, but it could be added later.

From a discussion with @rsc, it seems we could do a mix of the above approaches:

In package runtime,

package runtime

type Files struct {
     // unexported field(s), at least 1 byte long so Files has a unique address
}

func (f *Files) Open(...) (...) { ...}
func (f *Files) Stat(...) (...) { ...}
func (f *Files) EnumerateSomehow(...) { ...}

Then in your code:

package yourcode

//go:embed static/*
//go:embed logo.jpg
var website runtime.Files

func F() {
     ... = website.Open("logo.jpg")
}

Then the cmd/go tool would parse the go:embed comments and glob those patterns + hash those files and register them with the runtime, using &website.

The runtime would effectively have a map of each Files address to what its contents are and where in the file executable those are (or what their ELF/etc section names are). And perhaps whether they do or don't support random access, if we end up doing any compression.
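
A rough sketch of the bookkeeping that implies (all names here are hypothetical):

package runtime

// fileInfo records where one embedded file's bytes live in the
// executable image.
type fileInfo struct {
    offset, size int64
}

// embeddedFiles would be filled in by toolchain-generated registration
// code, keyed by the address of each annotated Files variable; Open and
// Stat on a *Files would consult this map.
var embeddedFiles map[*Files]map[string]fileInfo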

@gdamore,

Just another wrinkle here -- I have a different project which uses DTrace source code (embedded). This is sensitive to differences between \n and \r\n.
...
If various munging won't do the trick, then that's an argument against using this new facility.

You can also munge at runtime to remove any carriage returns that get embedded from Windows users running go install. I've written that io.Reader filter a couple times.
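
Such a filter is only a few lines; a minimal sketch:

package main

import (
    "io"
    "os"
    "strings"
)

// crStripper drops '\r' bytes from the wrapped reader, normalizing CRLF
// line endings to LF.
type crStripper struct{ r io.Reader }

func (c crStripper) Read(p []byte) (int, error) {
    n, err := c.r.Read(p)
    j := 0
    for i := 0; i < n; i++ {
        if p[i] != '\r' {
            p[j] = p[i]
            j++
        }
    }
    // A read that was all '\r' yields (0, nil), which io.Copy and
    // ioutil.ReadAll tolerate.
    return j, err
}

func main() {
    in := strings.NewReader("line one\r\nline two\r\n")
    io.Copy(os.Stdout, crStripper{in})
}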

But I've come around on @rsc's side of not having compression because of a realization I had: the very content that's most compressible (HTML, JS, CSS, etc) is content you'd still want random access on (to be served via, say, http.FileServer, which supports range requests)

Compression and random access are not fully mutually exclusive. See e.g. some discussion here: https://stackoverflow.com/questions/429987/compression-formats-with-good-support-for-random-access-within-archives

Compression and random access are not fully mutually exclusive

Yeah, if we wanted coarse-grained seeking with some overhead to get to the right position. I did some work in this space with CRFS's stargz format. But I fear the overhead would be big enough that we wouldn't want to do that automatically for people. I suppose you could also lazily inflate it into memory (and be able to drop it on GCs, like a sync.Pool), but it just doesn't seem worth it.

I fear the overhead would be big enough that we wouldn't want to do that automatically for people.

Fair enough. The important question is whether we would prefer an API that allows us to cheaply change our minds about this later, if needs change or if experiments show the overhead is acceptable.

@bradfitz good point. And I can certainly do that. FWIW, in my repo I've also configured git to be less toxic when seeing .d files. Still I find the property of embedded strings with backquotes to be useful, in that it's predictable and not subject to the whims of git or the system.

What I was getting at with the Codec idea is that compression isn't the only transformation one might want and that a user-provided Codec type allows the toolchain to ignore flags other than "which codec." Any compression levels, or the algorithm at all, compression or otherwise, would have to be specific to the codec used. I totally agree that trying to "support compression" in the sense of providing some specific set of formats and knobs would be a wild goose chase with all the variations people could ask for. In fact, I would be most excited about the uncommon uses, like preprocessing i18n data, perhaps, or processing datasets like in latlong, so I think it's worth considering options around it still.

I did think of another way to provide the same flexibility that might be more pleasant. The // go:embed directive could be a command invocation, just like // go:generate is. For the most simple case, something like

// go:embed "file.name" go run example.com/embedders/cat file.name

The key difference being, of course, that the stdout of the command invocation is embedded under the provided name. The example also uses a pretend package with go run to show how it'd likely be done to make the command OS independent, since cat may not be available everywhere Go compiles.

That takes care of the "encode" step of the transformation, and perhaps the task of the "decode" step can be left to the user. The runtime/embed package can just provide the bytes that the user asked the toolchain to embed, whatever the encoding. This is fine because the user knows what the decoding process should be.

One big downside of this is I don't see a good way to embed a glob of multiple files this way, beyond the embedded bytes being a zip or something. That might actually be good enough, since a glob could still be used by a zip command, and it's on the defining side where you really care about the glob. But we could also have two features out of this proposal, one to do simple embedded and another to run a generator-embed.

One possible downside that occurred to me is that it adds an open-ended step into the build, assuming embeds should be handled by go build and not require an extra toolchain invocation like go generate does. I think that's okay, though. Perhaps the tool can be expected to manage its own cache to avoid repeating expensive operations, or perhaps it can communicate with the toolchain to use Go's cache. That sounds like a problem that can be solved and fits with the overall theme of go build doing more for us (like fetching modules).

Is one of the goals of this project to ensure Go builds require no external tooling and no go:generate lines?

If not, it seems worth keeping things simple and only supporting a byte slice or string because if the user wants compression with lots of knobs they can do so in their make file (or similar), go generate line, etc. before building anyways, so it doesn't seem worth adding them to whatever the result of this proposal ends up being.

If not requiring Make or similar is a goal, then I suppose it might make sense to use compression, but personally I'd just as soon use Make, go generate, etc. to do the compression then keep embed simple and just embed some bytes.

@SamWhited,

Is one of the goals of this project to ensure Go builds require no external tooling and no go:generate lines?

Yes.

If people want to use go:generate or Makefiles or other tools, they have dozens of choices today.

We want something that's portable and safe and correct that works by default. (and to be clear: safe means we can't run arbitrary code at "go install" time, for the same reason that go:generate doesn't run by default)

@stephens2424

I think we could sidestep the need for the toolchain needing to actively support specific compression/transformations if the embed interaction can optionally provide a "codec".

No arbitrary code execution during go build.

No arbitrary code execution during go build.

Yep, I see that now. I suppose there's no way to reconcile having only "source" files committed to a repo, wanting "processed" files embedded, having the package be "go gettable," _and_ keeping go build simple and safe. I'm still for the push for standardization here, but I hoped to have my cake and eat it too, I guess. Worth a shot! Thanks for catching the problem!

@flimzy

This would preclude the use of the other methods via reflection.

There are no methods in what I mentioned, only functions. They are not discoverable at runtime and there's no way to reference them without mentioning them by name in source. And note that the interface values returned by the different functions don't have to be the same type - indeed, I would expect them to be either unexported types with exactly the methods required to implement that interface, or an instance of *strings.Reader etc., whatever makes sense in the context.

Arguably though, the idea suffers from passing around the exported functions of the embed package as values. Though even that likely wouldn't be a problem - the signature contains an unexported type (see below), so you can't declare a variable, argument, or return of their type. You could pass the functions themselves to reflect.ValueOf, in theory. I don't even know if that would allow you to actually call them (you'd still have to construct a value of their parameter type, which is unexported; I don't know whether reflect allows that).

But be that as it may: It would still be possible (and simplest) to simply be pessimistic in case any top-level function of embed gets used as a value and assume the restrictions it creates on all embedded files. Which would mean that if you decide to do extremely weird and useless things with the embed-package, you lose some optimizations (that we don't necessarily make any promises about anyway). Seems fair.

Actually, I think the implications are broader than that. If the use of FileReaderAt is an indication that the data must be uncompressed, then the use of FileReaderAt() with any non-const input implies that all files must be stored uncompressed.

It makes no sense to allow non-const inputs, as the filename needs to be known statically to do the embedding. It was imprecise of me to use string as the type of the filename parameters, though: they should really have been an unexported type filename string and not be usable as anything but function arguments. That way, it's impossible to pass anything that isn't an untyped string constant.
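To sketch that trick concretely (a toy illustration, not a real API; the toolchain would have to supply the actual file data):

package embed

import "io"

// filename is unexported, so other packages cannot declare variables,
// arguments, or returns of this type. The only values that convert to it
// implicitly are untyped string constants.
type filename string

// FileReaderAt is a stub showing the signature only.
func FileReaderAt(name filename) io.ReaderAt {
    panic("provided at build time")
}

With this, embed.FileReaderAt("logo.jpg") compiles because "logo.jpg" is an untyped constant, while passing a string variable is a compile-time error: the variable cannot be converted to the unexported type from outside the package.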

@Merovius

It makes no sense to allow non-const inputs

I think we're talking about different things. I mean inputs to the accessor functions (i.e. FileReaderAt()). I'm sure you'll agree that non-const input makes sense there.

And my point is: Suppose we've embedded 100 files, but we have a FileReaderAt(filename) call, where filename is not constant; there's no way to know which (if indeed any) of the embedded files will be accessed this way, thus all must be stored uncompressed.

@flimzy we were talking about the same thing, I just seriously didn't think non-const filenames would make sense :) Which, thinking about it, was wrong and an oversight. Sorry about that. Facilities to glob or include whole directories and then iterate over them actually are pretty important, yes. I still think this could be solved - e.g. by making the decision per collection (dir/glob) and only allowing those to be selected by constant names - but as I said: it's not actually an API I would consider super appropriate for the Go tool because of how magic it is. So going into the weeds like this is probably giving the concept more space in the discussion than it deserves :)

Another case I didn't see in the previous messages, and one that made me consider embedding a file into a Go binary, is that it is currently impossible to properly distribute a wrapper package for a C shared library using regular go build/install (the shared library stays in the sources).

I didn't do it in the end, but this would definitely make me reconsider it for this case. The C library has a lot of dependencies that would be easier to distribute as a single shared library, and that shared library could be embedded by the Go bindings.

Wow!!!

@Julio-Guerra
I'm quite sure you would still have to extract them to disk and then use dlopen and dlsym to call the C functions. So there's not much point in embedding the libs if they would already be available in the source directory.

Edit: Misunderstood your post a little; I just realized you're talking about creating a binary for distribution.

Outside of HTTP static assets, for embedded blobs that you need a pointer to in memory, it would be nice to have a function that returns a pointer to the already-in-process embedded memory. Otherwise one would have to allocate new memory and make a copy from the io.Reader, consuming twice the memory.

@glycerine, again, that's a string. A string is a pointer and a length.
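To illustrate that point (logoData stands in for a hypothetical embedded string; only strings.NewReader is real API):

import "strings"

var logoData string // assume: the embedded contents

func readers() {
    r := strings.NewReader(logoData) // wraps the string in place; no copy
    b := []byte(logoData)            // by contrast, this allocates and copies
    _, _ = r, b
}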

Wouldn't it be great to just have some way to mark code to be executed at compile time, with the result made available at runtime? That way you could read any file, compress it at compile time if you like, and access it at runtime. That would work for arbitrary computations as well as for preloading file content.

@burka as said before in the thread, go build will not run arbitrary code.

@burka, that's explicitly out of scope. That decision (no code execution at compilation time) was made a long time ago and this isn't the bug to change that policy.

A side effect of this proposal is that Go proxies can never optimize the files they store down to only Go files. A proxy must store an entire repository, because it won't know whether the Go code embeds any of the non-Go files.

I don't know if proxies already optimize for this, but it is feasible that someday they may want to.

@leighmcculloch I don't think this is the case today either, though. Any non-Go files in a Go package should be included in module archives, because they may be required for go test. You might also have C files for cgo, as another example.

This is an exciting direction, we definitely have need for it for our use-cases.

That said, I feel like there are different use-cases with different requirements, but most everyone commenting about _how_ they think it should be done is implicitly envisioning their own use-cases without explicitly defining them.

It might be helpful (at least it would be really helpful for me) if we could delineate the different use-cases for a file embedding solution and the challenges that each use-case presents.

For example, our primary use-case is to embed HTML+CSS+JS+JPG+etc so that when the Go app is run it can write those files to a directory such that they can be served by an http.FileServer. Given that use-case, most of the comments I have read discussing Readers and Writers have been foreign to me, because we don't need to access the files from Go; we just let go-bindata copy them to disk _(although maybe there is a way to leverage better techniques we simply have not yet realized we should consider)_.

But our challenges are as follows: We typically use GoLand with its debugger and will be working on the web app making continuous changes. So during development we need http.FileServer to load the files directly from our source directory. But when the app runs, http.FileServer needs to read those files from the directory where they have been written by the embedding solution. Which means when we compile we have to run go-bindata to update the files, and then check them into Git. And that is all generally workable with go-bindata, although certainly not ideal.

However, at other times we need to actually run a compiled executable, so we can attach a debugger to the running program and yet still have that program load files from the source directory and not from the directory where embedded files are written by go-bindata. Currently we do not have a good solution for this.
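For what it's worth, the load-from-source-during-development problem can be approximated today with build tags. A rough sketch, in which the dev tag, the assets package, and the web/static path are all assumptions:

// assets_dev.go

// +build dev

package assets

import "net/http"

// Development builds (go build -tags dev) serve straight from the
// source tree, so edits show up without regenerating anything.
var FileSystem http.FileSystem = http.Dir("web/static")

A second file without the tag would expose the go-bindata (or embedded) filesystem under the same FileSystem name, so the rest of the program doesn't care which build it got.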

So those are our use-cases and challenges. Maybe others could explicitly define other use-cases and their related challenges, so these discussions can address the various problem spaces and/or denote that this effort will not address the specific needs of a given problem space?

Thanks in advance for considering.

Since I don't see it mentioned as a use case, we'd also benefit from this for our directory of templates we access via template.ParseFiles.

I would find the cleanest approach to be a method of opting in via go.mod. This would ensure it was backwards compatible (as existing projects would have to opt-in to use it) and allow tooling (such as go proxies) to determine what files are needed. The go mod init command could be updated to include a default version for new projects to make it easier to use in the future.

I can see arguments for having the directory be a standard name (if we require opt-in then it can be a cleaner/simpler name) or having the name of the directory be defined in go.mod itself, allowing users to choose the name (but with a default provided by go mod init).

In my mind a solution like this achieves a balance in ease of use and less "magic".

@jayconrod wrote:

One argument in favor of a comment pragma (//go:embed) instead of a special directory name (static/): a comment lets us embed a file in the test archive for a package (or the xtest archive) but not the library under test.

This is a really nice observation. Although if we wanted to go with the special directory name, we could use a familiar mechanism: static for all builds, static_test for test builds, static_amd64 for amd64 builds, and so on. I don't see an obvious way to provide arbitrary build tag support, though.
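Under that convention, a package directory might look like this (the names follow the hypothetical scheme above):

mypkg/
    mypkg.go
    static/          (embedded in all builds)
    static_test/     (embedded only in the test archive)
    static_amd64/    (embedded only when GOARCH=amd64)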

There could be a manifest file in the static dir (the default, given an empty manifest, being to include everything except the manifest itself) that includes globs and allows specifying build tags, and perhaps later compression, etc.

One upside is that if go list hits a dir containing a manifest it can skip that tree a la #30058

One downside is that it could get very .htaccess-like, and no thanks.

A simple zero-knob mechanism for bundling files into a package could be a special directory go.files in a package directory (similar to go.mod in a module). Access would be limited to that package, unless it chooses to export a symbol.

Edit: single-function runtime/files proposal:

package files

import "io"

func Open(name string) (io.ReadCloser, error) {
    // The runtime opens the embedded file based on the calling package.
    // (Body supplied by the toolchain; sketch only.)
    return rc, nil
}

package foo

import (
    "io/ioutil"

    "runtime/files"
)

func ReadPackageFile(name string) ([]byte, error) {
    rc, err := files.Open(name)
    if err != nil {
        return nil, err
    }
    defer rc.Close()
    return ioutil.ReadAll(rc)
}

The import "C" approach has already set a precedent for "magic" import paths. IMO it has worked out pretty well.

Since I don't see it mentioned as a use case, we'd also benefit from this for our directory of templates we access via template.ParseFiles.

There's another challenge: while the binary might contain all needed files, those files would be the defaults that I, as the developer, provide. However, templates for e.g. an imprint or a privacy policy must be customisable by the end user. As far as I can see, that means there must be some way to export my default files and then either let the binary use the customised files at runtime or replace the embedded versions with the customised ones.

I think that could be done by providing an API with functions to 'export' and 'replace' an embedded resource. The developer could then provide some command line options to the end user (internally using the mentioned API calls).
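A sketch of what the 'replace' side could look like from the developer's seat; overrideDir and embeddedOpen are illustrative assumptions, not a proposed API:

import (
    "io"
    "os"
    "path/filepath"
)

// overrideDir is where the end user may drop customised copies.
var overrideDir = "overrides"

// openAsset prefers a user-supplied file on disk and falls back to the
// compiled-in default.
func openAsset(name string) (io.ReadCloser, error) {
    if f, err := os.Open(filepath.Join(overrideDir, name)); err == nil {
        return f, nil
    }
    return embeddedOpen(name) // hypothetical accessor for the embedded copy
}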

All this, of course, based on the assumption that there will actually be some kind of embedding which would definitely ease the deployment.

Thanks for opening the issue. At work we have thought about the same feature idea, as we need file embedding in pretty much every Go project. The existing libraries work fine, but I think this is a feature that Go is screaming for. It's a language made to be compiled into a single static binary. It should embrace that by allowing us to load required asset files into the binary, with a developer-friendly and universal API.

I just want to quickly provide my favored implementation details. Multiple people have talked about automatically providing an API to read the embedded files from, instead of needing another signal like a magic comment. I think that should be the way to go, as it offers a familiar, programmatic syntax. Going for a special package, possibly runtime/embed as previously mentioned, would satisfy that, and it would allow for easy extensibility in the future. An implementation like the following would make the most sense to me:

type EmbedPackage interface {
    Bytes(filename string) []byte
    BytesCompressed(filename string, config interface{}) []byte // compressed in-binary as configured by some kind of config struct; memoizes decompression at runtime on first access
    Reader(filename string) io.Reader
    File(filename string) os.File // readonly and contains all metadata
    Dir(filepath string) []os.File
    Glob(pattern string) []os.File // like filepath.Glob()

    // maybe? this could allow loading JSON, YAML, INI, TOML, etc. files more easily
    // but would probably be too much for the std lib implementation
    Unmarshal(filename string, config interface{}, ptr interface{})
}

Using that package somewhere in your code should trigger the compiler to provide that file to the runtime by embedding it automatically.

// embed a file that is compressed in-binary and automatically decompressed on first access
var LongText = embed.BytesCompressed("legal.html", embed.Config{ Compression: "gzip", CompressionLevel: "9" })

// loads a single file via a scanner for easy line access
var FewLinesOfText = bufio.NewScanner(embed.Reader("lines.txt"))
for FewLinesOfText.Scan() { line := FewLinesOfText.Text(); /* ... */ }

// embeds all files in the directory
var PdfFontFiles = embed.Dir("/fonts")

// unmarshals file into custom config
var PdfProcessingConfig MyPdfProcessingConfig
embed.Unmarshal("/pdf_conversion.json", embed.Config{ Encoding: "text/json" }, &PdfProcessingConfig)

Also, I think that security issues and reproducibility should not be a problem if we restrict importing to files at, or possibly one directory level below, the go.mod directory, which has also been mentioned previously in the thread. Absolute embed paths would resolve relative to that directory level.

Failure to access the files during compilation would generate a compiler error.

It's also possible to append a zip archive to a binary, so that it effectively becomes a self-extracting binary. Maybe this is useful in some use cases? I did this as an experiment here: https://github.com/sanderhahn/gozip

Go already has "testdata". Unit tests use regular IO to do whatever they wanna do. Test scope means the content is not shipped. That's all there is to know: no frills, no magic, no compressed container logic, no configurable indirections, no META-INF. Beautiful, simple, elegant. Why not have a "data" folder for bundled runtime-scope dependencies?

We can easily scan existing Go projects on GitHub and elsewhere and come up with a number of projects that already use a "data" folder and would therefore require adaptation.

Another thing that is not clear to me: in the discussion of a static directory, are we discussing a static _source_ directory, or a static directory where files will be made available _at runtime_?

And this distinction is particularly important because it relates to the development process and debugging code that is in development.

@mikeschinkel it's pretty clear from the original post that the embedding would happen at build time:

make go install / go build do the embedding automatically

The original post, and some of the comments above, also discuss having a "dev" mode to load files at run-time.

@mvdan Thanks for the answer. So you are thinking this means that the proposed /static/ directory would be relative to the root of the app repo, the package repo, and/or possibly both?

And that the location of runtime files would be totally dependent on where the developer wanted to place them?

If that is all true (and it does seem logical) it would be helpful if programs compiled with debugging information could optionally load files from their source location, to facilitate debugging without a lot of extra (and non-standardized) logic and code.

A couple people above mentioned module proxies. I think it's a great litmus test for good design of this feature.

It seems possible today, without executing user code, to implement a workable module proxy that strips files that aren't used in the build. Some of the designs above would mean module proxies must execute user code to figure out which static files also need to be included.

People also mentioned go.mod as an opt-in.

Idea: specification in go.mod file? Makes it straightforward for other tools to parse.

module github.com/foo/bar

data internal/static ./static/*.tmpl.html

This would create a package at compile time with the file data. Glob syntax might be nice here, but maybe simplifying and only embedding directories is good enough. (Aside: +1 for ** glob syntax.)

import "github.com/foo/bar/internal/static"

f, err := static.Open("static/templates/foo.tmpl")

Something like StripPrefix might be nice here but not necessary. Easy to create a wrapper package that uses whatever file paths you want.

It could be further simplified:

module github.com/foo/bar

data ./static/*.tmpl.html
import "runtime/moddata"

moddata.Open("static/foo.tmpl")

But it's a bit unintuitive that moddata would have different behavior depending on the calling package/module. It would also make it more difficult to write helpers (e.g., an http.FileSystem converter).

It seems possible today, without executing user code, to implement a workable module proxy that strips files that aren't used in the build. Some of the designs above would mean module proxies must execute user code to figure out which static files also need to be included.

I don't think there would be a significant change here. In particular, C code could already include any file in the tree, so a module proxy that wanted to do this would need to parse C. It seems that, at that point, whatever magic comments or API we introduce will be a small step.

Some of the designs above would mean module proxies must execute user code to figure out which static files also need to be included.

I think it's pretty clear that "the go tool must not execute user code during build" is a line drawn in the sand that will not be crossed here. And if the go tool can't execute user code, then it must be possible to tell what files to include without it.

I have been trying to condense my various thoughts on this use-case into something cogent, and so a big +1 to what @broady suggested. I think that for the most part encapsulates what I have been thinking. However, I think the keyword should be the verb embed instead of the noun data.

  1. Embedded files feel like something that should be imported vs. just having a special comment or a magic package. And in a Go project, the go.mod file is where a developer can specify the modules/files that are needed, so it makes sense to extend it to support embedding.

  2. Further, a collection of embedded files feels to me like it would be more valuable and reusable as a package that could be included, rather than something added ad hoc to a Go project using one-off syntax. The idea here is that if embeds were implemented as packages, then people could develop and share them via GitHub and others could use them in their projects. Imagine community-maintained and free-to-use packages on GitHub containing:

    a. Files for countries where each file contains all the postal codes in that country,
    b. A file with all known user-agent strings for identifying browsers,
    c. Images of the flag for each country in the world,
    d. In-depth help info describing the commonly occurring errors in a Go program,
    e. and so on...

  3. A new URL scheme such as goembed:// (or maybe an existing one) that can be used to open and read files from the package, thus allowing _(all?)_ existing file manipulation APIs to be leveraged instead of creating new ones; something like the following, which would be relative to the embed contained in the current package:

    data, err := ioutil.ReadFile("goembed://postal-codes.txt")
    if err != nil {
      fmt.Println(err)
    }
    

With the above concepts, nothing feels like _"magic"_; everything would be handled elegantly by a mechanism that feels like it was intended for the purpose. Very little extension would be required: one new verb in go.mod and one new URL scheme recognized internally by Go. Everything else would be provided as-is by Go.

What I do now

I use code.soquee.net/pkgzip for this right now (it's a fork of statik that changes the API to avoid global state and import side effects). My normal workflow (in a web app at least) is to embed assets bundled up in a ZIP file then serve them using golang.org/x/tools/godoc/vfs/zipfs and golang.org/x/tools/godoc/vfs/httpfs.

go:embed approach

There are two things that would probably prevent me from adopting the go:embed approach:

  1. Generated code won't show up in documentation
  2. Assets might be scattered all over the code base (this is true of using external tools and go:generate as well, which is why I generally prefer to use a Makefile to generate various sets of assets in advance of building; then I can see them all in the Makefile)

There is also an issue that I don't include above because it might be a feature for some: having assets be part of a package (as opposed to the entire module) means that all the complexity of build tags, testing packages, etc. applies to them; we need a way to specify whether they are public or private to that package; and so on. This seems like a lot of extra build complexity.

The thing I like about it is that libraries could be written that just make importing assets easier. E.g. a library with a single Go file that just embeds a font or some icons could be published, and I could import it like any other Go package. In the future, I could get an icon font simply by importing it:

import "forkaweso.me/forkawesome/v2"

embed package approach

While I like the idea of having this all be explicit, normal Go code, I hate the idea that this would be another magic package that can't be implemented outside of the standard library.

Would such a package be defined as part of the language spec? If not it's another place Go code would break between different implementations, which also feels poor. I'd likely continue using an external tool to prevent this breakage.

Also, as others have mentioned, the fact that this is being done at build time means this package can only take string literals or constants as arguments. There is currently no way to represent this in the type system, and I suspect it will be a point of confusion. This could be solved by introducing something like constant functions, but now we're talking major language changes making it a non-starter. I don't see a good way to fix this otherwise.

Hybrid

I like the idea of a hybrid approach. Instead of reusing comments (which end up scattered all over the place, and, on a personal note, just feel gross), I'd like to see all assets put in one place, likely the go.mod file as others have said:

module forkaweso.me/forkawesome/v2

go 1.15

embed (
    fonts/forkawesome-webfont.ttf
    fonts/forkawesome-webfont.woff2
)

This means that assets can't be included or excluded by arbitrary build tags, or in arbitrary packages (e.g. the _testing package), without creating a separate module. I think this reduction in complexity may be desirable (there's no build tag hidden in a library you're trying to import leaving you unable to figure out why you don't have the right asset, because importing the library should have embedded it), but YMMV. If this is desirable, pragma-like comments could still be used, except that they wouldn't generate code and would instead use the same approach as I'm about to describe for the go.mod version.

Unlike the original proposal, this would not generate any code. Instead, functionality for e.g. reading the data section of the ELF file (or however this ends up being stored on whatever OS you're using) would be added where appropriate (e.g. os or debug/elf, etc.), and then, optionally, a new package would be created that behaves exactly like the package described in the OP, except that instead of being magic and doing the embedding itself, it merely reads the embedded files (meaning that it could be implemented outside of the standard library if desired).

This works around issues like having to restrict the magic package to only allow string literals as arguments, but does mean that it's harder to check if the embedded assets are actually used anywhere or end up being dead weight. It also avoids any new dependencies between standard library packages, because the only package that needs to import anything extra is a new package itself.

var IconFont = embed.Dir("forkaweso.me/forkawesome/v2/fonts/")
var Logo = embed.File("images/logo.jpg")

As seen above, putting the resources in the module could still scope them to that particular module if desired. The actual API and how you select an asset may need some work.

Yet another idea: Instead of adding a new kind of verb embed in go.mod, we could introduce a new kind of package, a data package, which gets imported and used in the usual way. Here's a straw-man sketch.

If a package contains exactly one .go file, static.go, and that file contains only comments and a package clause, then a package is a data package. When imported, cmd/go populates the package with exported functions providing access to the files contained therein, which are embedded in the resulting binary.

If it's an actual package, that would mean internal rules would apply and we can have access controls without adding to the API.

What about automatically including all non-.go files and subfolders (following the no-actual-code rules) in the directory, then?

If a package contains exactly one .go file, static.go, and that file contains only comments and a package clause, then a package is a data package.

Would this check be done before the application of build tags? If so, that seems like yet another special case, which may want to be avoided. If not, then it's quite possible that a package might be seen as a standard Go package for some build tags, and as a data package for others. That seems odd, but maybe it's desirable?

@flimzy
That would kind of allow one to use embedded files with one tag, and with another tag define the same funcs/vars as the generated package and serve the files another way (maybe remotely?).

Would be nice if there were a build flag to generate the wrapper functions so one just has to fill in the blanks.

@josharian

If a package contains exactly one .go file, static.go, and that file contains only comments and a package clause, then a package is a data package.

I can envision "data" packages as having their own domain-specific functionality, such as a postal-code lookup. The approach you just proposed would disallow anything other than the raw data, and thus nullify the benefits of being able to package logic with data.

I can envision "data" packages as having their own domain-specific functionality, such as a postal-code lookup.

You could expose functionality in my.pkg/postalcode and put data in my.pkg/postalcode/data (or my.pkg/postalcode/internal/data).

I see the appeal of doing it the way you suggest, but it raises a bunch of questions: How does backwards compatibility work? How do you mark a data package as such? What do you do if the package has functions that would conflict with those cmd/go would add? (I'm not saying these don't have answers, just that it is simpler not to have to answer them.)

@josharian, please consider the type checking comment above (https://github.com/golang/go/issues/35950#issuecomment-561443566).

@bradfitz yes, this would be a language change, and would require go/types support.

Actually, there is a way to do this without it being a language change: require static.go to contain body-less functions exactly matching what cmd/go would fill in.

require static.go to contain body-less functions exactly matching what cmd/go would fill in.

If it generated per-file functions instead of a catch-all embed.File(), that would allow for easy per-asset export controls.

So the generated stuff would look like:

func EmbeddedFoo() embed.Asset { ... }
func embeddedBar() embed.Asset { ... }

A blog post I wrote about static files four months ago. See the last sentence in the conclusions :-)

@josharian

You could expose functionality in my.pkg/postalcode and put data in my.pkg/postalcode/data (or my.pkg/postalcode/internal/data).

That, while inelegant, could address my concerns.

How does backwards compatibility work?

I do not see how BC concerns apply here. Can you elaborate?

How do you mark a data package as such?

With the embed statement in the go.mod?

Maybe I do not follow what you are asking.

But I will turn it around; how do you mark a package with only data as such?

What do you do if the package has functions that would conflict with those cmd/go would add?

  1. Using the proposed approach, I do not think there would be a need for cmd/go to add any functions.

  2. Even if cmd/go does need to add functions, I envision that the behavior of conflicts in an _existing_ package would be _undefined_.

    The proposal assumes the developer follows the single-responsibility principle and thus would only build a data-centric package around data, not add data to an existing logic-centric package.

    Of course a developer _could_ add to an existing package, in which case the behavior would be undefined. IOW, if a developer ignores idiom, they are in uncharted territory.

Iā€™m not saying these donā€™t have answers, just that it is simpler not to have to answer them.

Except that I think the answers are simple. At least for the ones you have posed thus far.

I think that any solution that adds symbols or values for symbols must be package scoped, not module scoped. Because the compilation unit for Go is package, not module.

So this leaves out any use of go.mod to specify the list of files to import.

@dolmen if the final result is in its own package, then the scope itself would be the module.

@urandom No, the scope is the package(s) that imports that generated package. But anyway I don't think that a full generated package is in the scope of this proposal.

@urandom No, the scope is the package(s) that imports that generated package. But anyway I don't think that a full generated package is in the scope of this proposal.

Regardless of how this proposal is implemented, given that various packages of a module will be using the final result, it makes sense that the definition of what gets embedded is specified at the module level. A precedent for this already exists in the Java ecosystem, where embedded files are module-scoped and added from a magic directory.

Furthermore, go.mod presents the cleanest way to add such a feature (no magic comments or magic directories) without breaking existing programs.

Here is something that I have not yet seen mentioned: the API for Go source-processing tools (compilers, static analyzers) is as important as the runtime API. This kind of API is a core value of Go that helps grow the ecosystem (like go/ast / go/format and go mod edit).

This API could be used by pre-processor tools (in go:generate steps in particular) to get the list of files that will be embedded, or it could be used to generate the references.

In the case of a special package, I see nothing to change in go.mod parsing (go mod tools) or the go/ast parser.

@dolmen

_"I think that any solution that adds symbols or values for symbols must be package scoped, not module scoped. Because the compilation unit for Go is package, not module. So this leaves out any use of go.mod to specify the list of files to import."_

What is a module? Modules are _"a collection of related Go packages that are versioned together as a single unit."_ Thus a module can consist of a single package, and a single package can be the entirety of a module.

As such, go.mod is the correct place to specify the list of files to import assuming the Go team adopts go.mod over special comments and magic packages. That is unless and until the Go team decides to add a go.pkg file.

Further, if the Go team were to accept go.mod as the place to specify embed files then anyone wanting to embed files should provide a go.mod with the embed instructions, and the package which is represented by the directory in which the go.mod file resides would be the package that contains the embedded files.

But if that is not what the developer wants, they should create another go.mod file and put it in the directory of the package they want to contain their embedded files.

Is there a legitimate scenario you envision where these constraints would not be workable?

@mikeschinkel, a module is a collection of _related_ packages. However, it is possible (and reasonable!) to use one package from a module without pulling in the transitive dependencies (and data!) of other packages within that module.

Data files are generally per-package dependencies, not per-module, so the information about how to locate those dependencies should be colocated with the package, not stored as separate module-level metadata.

@bcmills

It seems that one can replace 'Data files' in your message with 'modules' and it will still hold true.
It's quite common to have specific modules as dependencies for specific packages of your own.
Yet we put all of them within the go.mod.

@urandom, not all of the packages in the modules indicated in the go.mod file are linked into the final binary. (Putting a dependency in the go.mod file is _not_ equivalent to linking that dependency into the program.)

Meta-point

It's clear that this is something lots of people care about, and Brad's original comment at the top was less a finished proposal than an initial sketch / starting point / call to action. I think it would make sense at this point to wind this specific discussion down, have Brad and maybe a couple other people collaborate on a detailed design doc, and then start a new issue for a discussion of that specific (yet to be written) doc. It seems like that would help focus what has become a bit of a sprawling conversation.

Thoughts?

I'm not sure I agree with closing this one, but we can Proposal-Hold it until there's a design doc and lock comments for a bit. (Almost all the comments are redundant at this point, as there are too many comments for people to read to see whether their comment is redundant...)

Maybe when there's a design doc then this one can be closed.

Or close this one and re-open #3035 (with comments frozen) so at least one open issue tracks it.

Sorry to do this right after the talk about closing, but the close discussion jumped in right after @bcmills' comment and before I could clarify.

"_However, it is possible (and reasonable!) to use one package from a module without pulling in the transitive dependencies (and data!) of other packages within that module."_

Yes, clearly it is _possible._ But just like any best practice, a best practice for data packages could be to create a single package for a module, which resolves your concern. If that means a go.mod in the root and another go.mod in the subdirectory containing the data package, so be it.

I guess I am advocating that you don't let perfect be the enemy of good here, where "good" is go.mod as the place to specify embedded files, given that by its nature it lists the components of a module.

Sorry, but packages are the fundamental concept in Go, not modules.
Modules are simply groups of packages that get versioned as a unit.
Modules do not contribute additional semantics beyond those of the individual packages.
That's part of their simplicity.
Anything we do here should be tied to packages, not modules.

Wherever this goes and however it's done, there should be a way to get a list of all the assets that will be embedded using go list (not just the patterns used).

Putting on hold until Brad and others work up a formal design doc.

Blocking certain files shouldn't be too hard, especially if you use a static or embed directory. Symlinks might complicate that a bit, but you can just prevent it from embedding anything outside of the current module or, if you're on GOPATH, outside of the package containing the directory.

I'm not particularly a fan of a comment that compiles to code, but I also find a pseudo-package that affects compilation to be a bit strange as well. If the directory approach isn't used, maybe it would make more sense to have some kind of embed top-level declaration actually built into the language. It would work similarly to import, but would only support local paths and would require a name for it to be assigned to. For example,

embed ui "./ui/build"

func main() {
  file, err := ui.Open("version.txt")
  if err != nil {
    panic(err)
  }
  version, err := ioutil.ReadAll(file)
  if err != nil {
    panic(err)
  }
  file.Close()

  log.Printf("UI Version: %s\n", bytes.TrimSpace(version))
  http.ListenAndServe(":8080", http.EmbeddedDir(ui))
}

Edit: You beat me to it, @jayconrod.

This is clean and readable; however, I am not sure the Go team would want to introduce a new keyword.

The idea as I remember it was to just have a special directory name like "static" containing the static data and automatically make them available through an API, with no annotations needed.

Using static as a special directory name is a bit confusing, and I'd rather go with assets.
Another idea I didn't see in the thread is to allow importing assets as a package, e.g. import "example.com/internal/assets". The exposed API still needs a design, but at least it looks cleaner than special comments or new runtime/files-style packages.

Another idea I didnā€™t see in the thread is to allow importing assets as a package

This was proposed here: https://github.com/golang/go/issues/35950#issuecomment-562966654

One complication is that to enable typechecking, you either need this to be a language change or need to provide body-less functions to be filled out by cmd/go.

That's a similar idea, but the static.go design allows turning an arbitrary import path into a data package, while an assets directory works more like testdata, internal or vendor in terms of being "special". One of the possible requirements assets could have is to contain no Go packages (or to only allow docs), i.e. for implicit backwards compatibility.

This could also be combined with the runtime/files-style API for getting the files. That is, using raw imports for embedding directory trees with files and then using some runtime package for accessing them. It could even be os.Open, but that's unlikely to be accepted.

The shurcooL/vfsgen package together with shurcooL/httpgzip has a nice feature where the content can be served without decompression.

e.g.

    rsp.Header().Set("Content-Type", "image/png")
    httpgzip.ServeContent(rsp, req, "", time.Time{}, file)

A similar feature is being proposed for C++: std::embed:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1040r0.html
https://mobile.twitter.com/Cor3ntin/status/1208389050698215427

It may be useful as inspiration for a design, and to collect possible use cases.

I'm a bit late to the party, but I had an idea. Instead of special comments, a fixed special directory (static), or the apparently no-no approach of extending go.mod: how about a new per-package manifest file, go.res?

  • Contains a list of files. No paths or globs, just names in the current package directory. Generate it from a glob before committing if you need to.

    • __Edit__ it might need a single package mypackagename line at the top, like a Go file would have. Alternatively, you could include the package name in the filename (e.g. mypackagename.go.res). Personally, I like the package header line better.

  • A new core package called "resource" or maybe "io/resource". It has at least one function: func Read(name string) (io.Reader, bool) to read resources embedded in the current package.

    • __Edit__ Not sure core packages work this way. It might have to be a generated package-private function (e.g. func readresource(name string) (io.Reader, bool)).

  • If you want resources in a subdirectory, then make the subdirectory a package by adding a go.res file and at least one .go file. The .go file exports your own public API for accessing resources in the subdirectory package. The Go file and exported API are necessary because resources from other packages are not automatically exported (by design). You can also customize how they are exported this way.

    • __Edit__ alternatively, if you need a directory structure and/or compression, use a tar resource. This allows for things like webpack bundles, which already require compilation (and might benefit from pre-compression). Taking them one step further to a tar is simple.

  • __Edit__ Need a manifest? Just include the go.res file itself as a resource. We don't even need to create a listresources function.

Extremely simple. One new function. One new file. No paths. No compression. No new syntax. No magic. Extensible. Read-only access via a reader (but open to alternative access patterns in the future). Near-zero possibility of breaking existing packages. Keeps with the package being the core construct in Go.

__Edit__ After a GitHub search (language:go filename:go.res extension:res), it seems like go.res would be a pretty safe filename to use. There are no matches in Go repos, and only a few in non-Go repos.
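A usage sketch of that proposal; the io/resource import path and the Read signature are the ones suggested above and do not exist today:

import (
    "io/ioutil"
    "log"

    "io/resource" // hypothetical package from the proposal above
)

func loadSchema() []byte {
    r, ok := resource.Read("schema.sql") // a name listed in go.res
    if !ok {
        log.Fatal("schema.sql was not bundled")
    }
    data, err := ioutil.ReadAll(r)
    if err != nil {
        log.Fatal(err)
    }
    return data
}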

I like the idea of @chris.ackermanm. But I would prefer a combination:

A go.res file specifying the namespace within a directory.

This allows for

  • multiple includes as long as the namespace differs
  • not needing to know the files beforehand and generate a list

The latter should handle the output of webpack and the like, which may change the layout due to updates, different options, whatever you can think of.

Regarding compression: that, I think, is more a feature for keeping binary sizes from exploding, and it should be transparent to the consuming code.

Later you could allow for rewrites such as

filename => stored-as.png

Just my 2¢

@sascha-andres It seems like ultra-simplicity and zero magic is the tone of this thread. See the edits I made to my comment re your suggestions.

I don't like the mapping; there's no need. That's possible by exposing your own read function from a separate package anyway, and it would require a new file syntax, or something more complex than file-per-line.

Hi

This proposal is awesome!

And I have my own approach to embedding assets; no need to introduce any tools other than GNU binutils. It is sort of dirty, but it works well for me for now. I just want to share it and see if it helps.

My approach is to just embed my assets (compressed with tar and gzip) in an ELF/PE32 section with objcopy, and read them via the debug/elf and debug/pe packages (along with the archive handling) when needed. All I need to remember is not to touch any existing section. All the assets are immutable; the code reads the content and processes it in memory.

I'm pretty inexperienced in language design and compiler design, so I would just use the approach described above, with .goassets or something like that as the section name, and make compression optional.
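For reference, the read-back half of that approach works with the standard library; a sketch for ELF only (.goassets matches the section name suggested above, and a real build would first run something like objcopy --add-section .goassets=assets.tar.gz on the binary):

import (
    "debug/elf"
    "errors"
    "os"
)

// readAssets pulls the raw bytes of the .goassets section back out of
// the running executable.
func readAssets() ([]byte, error) {
    exe, err := os.Executable()
    if err != nil {
        return nil, err
    }
    f, err := elf.Open(exe)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    sec := f.Section(".goassets")
    if sec == nil {
        return nil, errors.New("no .goassets section")
    }
    return sec.Data()
}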

My approach is to just embed my assets (compressed with tar and gzip) in an ELF/PE32 section with objcopy, and read them via the debug/elf and debug/pe packages (along with the archive handling) when needed. All I need to remember is not to touch any existing section. All the assets are immutable; the code reads the content and processes it in memory.

That sounds like it works for ELF/PE32, but what about Mach-O/Plan 9?

Another issue is that it relies on opening a file handle on the executable; if the executable has been overwritten/updated/deleted then this will return different data. Not sure if that's a legitimate problem or an unexpected feature.

I had a bit of a go myself (using debug/macho), but I can't see a way to get this working cross-platform. I'm building on macOS, and the installed GNU binutils just seems to corrupt the mach-o-x86-64 file (that could just be my lack of understanding of the Mach-O structure, and it has been too long since I last looked at objcopy).

Another issue is that it relies on opening a file handle on the executable

I'm pretty sure the program loader will (or could) load the resource section into memory, so there is no need to use the debug packages. Though accessing the data would require much more tinkering with object files than it is worth.

Why not follow what works, e.g. how Java does it? It would require things to be a bit Go-ish, but something along these lines:

  • create a go.res file, or modify go.mod, to point to the directory where the resources are
  • all files from this directory are automatically included by the compiler in the final executable, no exceptions
  • the language provides a path-like API for accessing these resources

Compression, etc. should be outside the scope of this resource bundling and up to any go:generate scripts if needed.

Has anybody looked at markbates/pkger? It's a pretty simple solution that uses the go.mod directory as the current working directory. Assuming an index.html is to be embedded, opening it would be pkger.Open("/index.html"). I think this is a better idea than hardcoding a static/ directory in the project.

It's also worth mentioning that Go doesn't have any significant structure requirements for a project as far as I could see. go.mod is just a file and not a lot of people ever use vendor/. I personally don't think that a static/ directory would be any good.

As we already have a way of injecting (albeit limited) data into a build via the existing ldflags link flag -X importpath.name=value, could that code path be adjusted to accept -X importpath.name=@filename to inject external arbitrary data?

I realise this doesn't cover all of the stated goals of the original issue, but as an extension of the existing -X functionality does it seem a reasonable step forward?

(And if that works out then extending the go.mod syntax as a neater way of specifying ldflags -X values is a next reasonable step?)
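For reference, the existing -X mechanism this builds on looks like the following (the package path and variable name are illustrative); -X can only set string variables, which is why the comment above proposes a new @filename form for arbitrary data:

package main

import "fmt"

// Version is replaced at link time, e.g.:
//   go build -ldflags "-X main.Version=1.2.3"
var Version = "dev"

func main() {
    fmt.Println("version:", Version)
}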

That's a very interesting idea, but I'm worried about the security implications.

It's pretty common to do -X 'pkg.BuildVersion=$(git rev-parse HEAD)', but we wouldn't want to let go.mod run arbitrary commands, would we? (I guess go generate does, but that's not something you typically run for downloaded OSS packages.) If go.mod can't handle that, it ends up missing a major use case, so ldflags would still be very common.

Then there's the other issue of making sure @filename is not a symlink to /etc/passwd or whatever.

Using the linker precludes support for WASM, and possibly other targets that don't use a linker.

Based on the discussion here, @bradfitz and I worked out a design that sits somewhere in the middle of the two approaches considered above, taking what seems to be the best of each. I've posted a draft design doc, video, and code (links below). Instead of comments on this issue, please use the Reddit Q&A for comments on this specific draft design - Reddit supports threading and scales discussions better than GitHub does. Thanks!

Video: https://golang.org/s/draft-embed-video
Design: https://golang.org/s/draft-embed-design
Q&A: https://golang.org/s/draft-embed-reddit
Code: https://golang.org/s/draft-embed-code

@rsc In my opinion, the go:embed proposal is inferior to providing _universal_ sandboxed Go code execution at compile time, which would include reading files and transforming the data read into an _optimal format_ best suited for consumption at runtime.

@atomsymbol That sounds like something waaay outside the scope of this issue.

@atomsymbol That sounds like something waaay outside the scope of this issue.

I am aware of that.

I read through the proposal and scanned the code, but couldn't find an answer to this: Will this embedding scheme contain information about the file on disk (~os.Stat)? Or will these timestamps get reset to the build time? Either way, these are useful pieces of information that get used in various places; e.g. we can send a 304 for unchanged assets based on this.

Thanks!

Edit: found it in the reddit thread.

The modification time for all embedded files is the zero time, for exactly the reproducibility concerns you listed. (Modules don't even record modification times, again for the same reason.)

https://old.reddit.com/r/golang/comments/hv96ny/qa_goembed_draft_design/fytj7my/

Either way, these are useful pieces of information that get used in various places; e.g. we can send a 304 for unchanged assets based on this.

An ETag header based on a hash of the file data would solve that problem without having to know anything about dates. But that would have to be known by http.HandlerFS or something for it to work, and to not waste resources it would have to be done only once per file.

But that would have to be known by http.HandlerFS or something for it to work, and to not waste resources it would have to be done only once per file.

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

How would http.HandlerFS know that the fs.FS was immutable? Should there be an IsImmutable() bool optional interface?

I don't want to get into implementation details, because I'm not the designer of these things, but http.HandlerFS could check whether it's an embed.FS type and treat that as a special case; I don't think anyone wants to expand the FS API right now. There could also be an option argument to HandlerFS specifically telling it to treat a filesystem as immutable. Also, if this is done at application start-up and all ctime/mtime values are zero, HandlerFS could use that info to "know" that the file hasn't changed; but there are also file systems which might not have mtime, or have it disabled, so there might be problems there as well.
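For what it's worth, the hash-based ETag idea from above is a few lines wherever it ends up living; a sketch (the handler wiring is assumed, and whether the standard library does this is up to the final design):

import (
    "crypto/sha256"
    "encoding/hex"
    "net/http"
)

// setETag derives a strong ETag from the asset's contents, so it is
// stable across builds for as long as the bytes are.
func setETag(w http.ResponseWriter, data []byte) {
    sum := sha256.Sum256(data)
    w.Header().Set("ETag", `"`+hex.EncodeToString(sum[:])+`"`)
}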

I wasn't watching the comments on this issue.

@atomsymbol welcome back! It's great to see you commenting here again.
I agree in principle that if we had sandboxing many things would be easier.
On the other hand many things might be harder - builds might never finish.
In any event, we definitely don't have that kind of sandboxing today. :-)

@kokes I am not sure about the details,
but we'll make sure serving an embed.Files over HTTP gets ETags right by default.

I have filed #41191 for accepting the design draft posted back in July.
I am going to close this issue as superseded by that one.
Thanks for the great preliminary discussion here.
