Yaml: YAML spec violation: Duplicate keys are silently ignored, and parser selects incorrect key

Created on 28 Dec 2015  ·  7 Comments  ·  Source: go-yaml/yaml

As per https://github.com/prometheus/prometheus/issues/1275 :

If you run with a configuration like:

scrape_configs:  # <-- this is discarded
  - job_name: 'prometheus_system'
    target_groups:
    - targets: ['localhost:9100']

scrape_configs:
  - job_name: 'foo_system'
    target_groups:
    - targets: ['foot:9100']

the former scrape_configs is silently discarded. While that's the obvious case, constructs like

scrape_configs:  # <-- this is discarded
  - job_name: 'prometheus_system'
    target_groups:
    - targets: ['localhost:9100']

rule_files:
  - 'prometheus.rules'

scrape_configs:
  - job_name: 'foo_system'
    target_groups:
    - targets: ['foo:9100']

also have the first scrape_configs silently discarded.

Raising an error in this situation would be appreciated, and is what the YAML specs suggest.
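
To illustrate the kind of check being asked for, here is a minimal Go sketch that flags repeated top-level keys in a config like the ones above. It is a rough line-based pre-check, not a YAML parser and not part of go-yaml: `duplicateTopLevelKeys` is a hypothetical helper, and it assumes block-style YAML with unquoted top-level keys.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// duplicateTopLevelKeys returns every top-level key that appears more than
// once, each reported a single time, in the order the repeats are found.
// Hypothetical helper: it only inspects unindented "key:" lines.
func duplicateTopLevelKeys(doc string) []string {
	seen := map[string]bool{}
	reported := map[string]bool{}
	var dups []string

	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		line := sc.Text()
		// Skip blank lines, comments, nested (indented) lines and sequence items.
		if line == "" || strings.HasPrefix(line, "#") ||
			strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") ||
			strings.HasPrefix(line, "-") {
			continue
		}
		i := strings.Index(line, ":")
		if i <= 0 {
			continue // not a "key:" line
		}
		key := strings.TrimSpace(line[:i])
		if seen[key] && !reported[key] {
			dups = append(dups, key)
			reported[key] = true
		}
		seen[key] = true
	}
	return dups
}

func main() {
	doc := `scrape_configs:
  - job_name: 'prometheus_system'

rule_files:
  - 'prometheus.rules'

scrape_configs:
  - job_name: 'foo_system'
`
	fmt.Println(duplicateTopLevelKeys(doc))
}
```

A real fix belongs in the decoder, where nested mappings and quoted keys are handled properly; this only shows that the condition is cheap to detect.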

Also, as @fabxc notes, http://yaml.org/spec/1.1/#id932806 says that the first, not the last, occurrence should be the one that is kept.

All 7 comments

We have created a new pull request to resolve this. It includes tests for the situation described and tests for valid cases (e.g. duplicate values are OK; only duplicate keys are checked).

Can this land on the v2 branch, or does this fix break backwards compatibility and need to land on a v3 branch?

Any update on possibly merging to the v2 branch?

I believe this is addressed in the v2 branch via UnmarshalStrict. Be aware, though, that strict mode also requires the target struct to have a field for every field in the YAML data.

@rogpeppe - this should probably have been tagged as fixed in https://github.com/go-yaml/yaml/pull/307. Unless there are plans to add duplicate key detection to non-strict mode, I suspect this should be closed.

I will review this behavior and perhaps make it standard in v3. That said, the new strict mode already handles the case discussed here today.

@niemeyer in strict mode, yes, this sounds resolved. But if the non-strict mode doesn't error, I'd say it should only be considered resolved once it follows the spec (as mentioned in the original issue):

It is an error for two equal keys to appear in the same mapping node. In such a case the YAML processor may continue, ignoring the second key: value pair and issuing an appropriate warning. This strategy preserves a consistent information model for one-pass and random access applications.

http://yaml.org/spec/1.1/#id932806

i.e. the first, not the last, entry should be the one that is kept.

Edit: actually the latest (1.2) spec just says:

The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique
[…]
JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique, while YAML insists they “MUST” be.

With no suggestion of gracefully handling conflicting keys in any way, I'm inclined to say that for YAML 1.2 compliance, rejecting duplicate keys should be standard in both strict and non-strict modes.
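
To make the options in this comment concrete, here is a small Go sketch of three duplicate-key policies over an ordered list of key: value pairs: keep the first and warn (the YAML 1.1 wording), keep the last silently (the behaviour this issue reports), and reject outright (the strict reading of YAML 1.2). The `pair` type and all three functions are illustrative assumptions, not go-yaml code.

```go
package main

import "fmt"

// pair is one key: value entry, in document order.
type pair struct{ key, value string }

// keepFirst ignores later duplicates and returns a warning for each one.
func keepFirst(pairs []pair) (map[string]string, []string) {
	out := map[string]string{}
	var warnings []string
	for _, p := range pairs {
		if _, dup := out[p.key]; dup {
			warnings = append(warnings, "duplicate key ignored: "+p.key)
			continue
		}
		out[p.key] = p.value
	}
	return out, warnings
}

// keepLast silently overwrites earlier values with later ones.
func keepLast(pairs []pair) map[string]string {
	out := map[string]string{}
	for _, p := range pairs {
		out[p.key] = p.value
	}
	return out
}

// rejectDuplicates fails on the first repeated key.
func rejectDuplicates(pairs []pair) (map[string]string, error) {
	out := map[string]string{}
	for _, p := range pairs {
		if _, dup := out[p.key]; dup {
			return nil, fmt.Errorf("duplicate key %q", p.key)
		}
		out[p.key] = p.value
	}
	return out, nil
}

func main() {
	pairs := []pair{
		{"scrape_configs", "prometheus_system"},
		{"rule_files", "prometheus.rules"},
		{"scrape_configs", "foo_system"},
	}
	first, warnings := keepFirst(pairs)
	fmt.Println(first["scrape_configs"], warnings)
	fmt.Println(keepLast(pairs)["scrape_configs"])
	_, err := rejectDuplicates(pairs)
	fmt.Println(err)
}
```

Only the first and third policies tell the user that something was dropped, which is the core of the complaint here.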

The 1.2 spec actually says that keys are unique, period. So any change in behavior at this point simply means choosing a different way to handle a broken document, and we're not going to change the way we handle broken documents now because that can break actual software that is working today. For that reason, v2 won't change.

I will review the behavior before considering v3 final, but I'm not yet making any promises on the direction we'll go there either.
