Yaml: Begin work on v3: towards more `encoding/json`-like behaviour

Created on 3 Feb 2016  ·  11Comments  ·  Source: go-yaml/yaml

There are a number of areas where this library differs significantly from behaviours in encoding/json which tend to come as a surprise for many Go developers, and can result in significant time spent hunting down silent failures. Many of these issues will require breaking changes to fix, if they are to be default behaviour. Of course, there are those who think it's possible to do better than encoding/json, and for those people I think v3 should provide options to mimic the default behaviours in v2, and others. However, I am strongly of the opinion that the default out-of-the-box behaviour should probably be as close as possible to the encoding/json behaviour, with other features and options strictly opt-in, as this will be far less surprising for the average Go developer.

I would like to try to compile a list of breaking changes that would go into a potential v3, and see if there is enough support to begin work on that in earnest. So, please add any breaking-changes you would like to see in a v3 in the comments, let's see how much consensus there is, and I will try to get a few days in the coming month to make some serious contributions to this effort if it looks viable.

Here are the things I would really like to see in v3:

  • Simple field-name marshalling. i.e. by default, struct field names will be encoded as they are written in Go, rather than being lower-cased (see #149 and #148)
  • Simple field-name unmarshalling. i.e. by default struct field names will be decoded if they are an exact match (casing and all) with the struct field names, or will be unmarshalled in a case insensitive manner if no exact matches exist, like in encoding/json.Unmarshal (see #149, #148, #123 cc @moul) edited to fix formatting
  • Unmarshalling of anonymous field struct fields should match the way encoding/json handles it, see #63 (cc @bcronin)

It would also be great to get in some bug fixes for this version, although some of them may possibly also be applicable to v2, but certainly safe in a new version e.g. #154 (cc @RichiH), #125 (cc @wrouesnel @3rf).

So, has anyone seen any other differences with encoding/json we should try to fix in a v3? Anyone want to spend a weekend working on it with me? :)

Most helpful comment

As someone who uses the library, I want to say first off thank you for the library and the work you have done. Speaking only for myself, I would really like to see the Encoder and Decoder exported for streaming use. That in my view would solve the issue #154 and would actually behave a lot more like the python and ruby yaml packages ( ie update the underlying keys value with the new value effectively disregarding the former). Let the user implement their own business logic for handling duplicate keys. Anyway I know I am not saying anything you do not know but that is what I would like to see in a v3 go-yaml package. Thanks again.

All 11 comments

Hi Sam. The rationale sounds sensible, and the current list of changes sounds good too. We should talk a bit more about these other issues before spending time on them, to make sure no time is wasted on it.

Before we release v3, we also need to consider a migration tool. It doesn't have to actually migrate the code, but should at least point out where to look for potential incompatibilities.

OK, sounds good @niemeyer, but would a simple changelog listing the breaking changes explicitly not be enough, after all no one would be broken if they keep using v2... Just thinking that writing that tool might be quite a complex endeavour to get it right.

Yes, quite possibly not trivial. But the opposite of that is letting every other developer figure out by themselves what is going to break. Instead, what people will most likely do is stick to v2, which is not great either.

By the way, the actual changes listed above are all trivial by themselves. Under 30 minutes, I'd say.

Hmm, I think that basically anything using v2's Marshal or Unmarshal would be considered broken in the proposed v3 until it is tested. In most cases Unmarshal will still work, but will additionally unmarshal un-inlined anonymous struct members, but Marshal will produce completely different output... I can't see a general way to check brokenness without knowing the what the consumers of the data produced by this library are expecting...

One feature I think will be important will be having the library configurable to treat field names, anonymous structs, etc, in different ways. In fact, it will probably be quite easy to provide 2 default configurations, one which is json/encoding-like, and will be turned on by default, and one which it go-yaml/yaml.v2-like. E.g.:

V2Config := yaml.Config{FieldNameTransform: yaml.LowercaseFieldNameTransform, AutoInlineAnonymousStructFields: false}
yaml.Configure(V2Config)
// or, just to tweak a single parameter away from default...
yaml.Configure(yaml.Config{AutoInlineAnonymousStructFields: false})

EDIT: The code above was edited for clarity

That way, if we added a hard-coded yaml.V2Config, there would be a really easy upgrade path for those wanting the new library, but with the old config without risking breaking anything.

However, the change suggested above is not completely trivial, especially to add enough test cases for the different config combinations. I made a stab at it in #149, and probably I can just tweak that PR to achieve something like the above. Note that it requires an abolition of the global struct cache, and its replacement with a default global codec, with its own cache, that gets invalidated any time the config is changed.

Hmm, I think that basically anything using v2's Marshal or Unmarshal would be considered broken in the proposed v3 until it is tested. (...)

This is not just a technical issue. There's a very natural tendency for code to not be updated, ever. If there's known-for-sure breakage, the inertia increases significantly. If there's unknown known-for-sure breakage, the inertia is so high we may as well not do it.

We can't have a global flag, because in a same application different packages may depend on different behavior. That said, we can have a local flag, for a yaml.Decoder and yaml.Encoder, akin to json.Encoder/Decoder. This is actually where I would start. Introduce into v2 a compatibility flag. This benefits everyone, with no breakage. Eventually we can introduce a v3 which has this mode by default, and kill the old logic.

As someone who uses the library, I want to say first off thank you for the library and the work you have done. Speaking only for myself, I would really like to see the Encoder and Decoder exported for streaming use. That in my view would solve the issue #154 and would actually behave a lot more like the python and ruby yaml packages ( ie update the underlying keys value with the new value effectively disregarding the former). Let the user implement their own business logic for handling duplicate keys. Anyway I know I am not saying anything you do not know but that is what I would like to see in a v3 go-yaml package. Thanks again.

Issue #215 with patch available!! https://github.com/go-yaml/yaml/pull/218 should be part of V3 if not sooner :)

If this issue is still being considered at any level, then another item I would add is to add support for marshalling json.Number objects as numbers rather than as strings to match the behavior of encoding/json.

Here's an example program that demonstrates the difference in behavior between yaml.v2 and encoding/json:

package main

import (
    "encoding/json"
    "fmt"
    "strings"

    "gopkg.in/yaml.v2"
)

func main() {
    jsonObj := `{"num":13}`

    decoder := json.NewDecoder(strings.NewReader(jsonObj))
    decoder.UseNumber()

    var unmarshalled map[string]interface{}
    if err := decoder.Decode(&unmarshalled); err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", unmarshalled)       // map[num:13]
    fmt.Printf("%T\n", unmarshalled["num"]) // json.Number

    bytes, err := json.Marshal(unmarshalled)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(bytes)) // {"num":13}

    bytes, err = yaml.Marshal(unmarshalled)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(bytes)) // num: "13"
}

The desired behavior would be for the last line to output num: 13 rather than num: "13".

231 is another open issue having to do with making the behavior more consistent with encoding/json.

+1 would be nice if yaml --> json --> yaml work's the same way.

Work on v3 has begun, and I should have something ready for testing soon.

The notes above are all being taken into account, but I will close this now as it's becoming just a pointer into other issues.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ldesplat picture ldesplat  ·  6Comments

thallgren picture thallgren  ·  8Comments

zubairhamed picture zubairhamed  ·  8Comments

RichiH picture RichiH  ·  7Comments

michaelsauter picture michaelsauter  ·  28Comments