Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in YAML support #4910

Open
mweinelt opened this issue Jun 12, 2021 · 24 comments
Open

Built-in YAML support #4910

mweinelt opened this issue Jun 12, 2021 · 24 comments
Labels
language The Nix expression language; parser, interpreter, primops, evaluation, etc significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.

Comments

@mweinelt
Copy link
Member

mweinelt commented Jun 12, 2021

Is your feature request related to a problem? Please describe.
Currently in nixpkgs YAML is often rendered as JSON and pkgs.formats.yaml denotes

# YAML has been a strict superset of JSON since 1.2

which glances over the fact that JSON is only a subset of YAML.

Specifically we have no proper way to denote builtin and user-defined types, which are unquoted strings prefixed with two or one exclamation mark.

Describe the solution you'd like
A more complete mapping from Nix to YAML.

Describe alternatives you've considered
Rendering an attrset to JSON, then using remarshals json2yaml to convert it to YAML and then apply a regular expression to unquote strings that start with an exclamation mark.

Home-assistant uses these to substitute secrets and do other includes at runtime.

{ pkgs, ... }:
let
  configFile = pkgs.runCommand "configuration.yaml" { preferLocalBuild = true; } ''
    ${pkgs.remarshal}/bin/json2yaml -i ${configJSON} -o $out
    # Hack to support custom yaml objects,
    # i.e. secrets: https://www.home-assistant.io/docs/configuration/secrets/
    sed -i -e "s/'\!\([a-z_]\+\) \(.*\)'/\!\1 \2/;s/^\!\!/\!/;" $out
  '';
in
{}

Additional context
https://en.wikipedia.org/wiki/YAML#Advanced_components

@roberth
Copy link
Member

roberth commented Jun 16, 2021

You would have to represent these advanced components in the Nix language somehow; perhaps:

{ type = "yaml-advanced-component"; value = "..."; }

With such a representation, nothing should stop you from converting that to proper YAML in a derivation.

There's some concern about keeping Nix itself easy to bootstrap, so adding a yaml library dependency comes at a cost.

For TOML we only have a parsing function, so we don't need IFD to read those files into Nix values. The same might apply for YAML, but that's not the use case you're describing.

@stale
Copy link

stale bot commented Jan 3, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Jan 3, 2022
@mweinelt
Copy link
Member Author

mweinelt commented Jan 3, 2022

I still care about this feature.

@stale stale bot removed the stale label Jan 3, 2022
@DavHau
Copy link
Member

DavHau commented May 29, 2022

I would be very happy to have a builtins.fromYAML.
YAML is commonly used in other package ecosystems to express package metadata.
This metadata could be converted to derivations via nix without IFD or any extra steps if nix had a parsing function.
To give some examples:

  • yarn.lock (yarn v2) for nodejs
  • stack.yaml and stack.yaml.lock for haskell

@ners
Copy link
Member

ners commented Jun 17, 2022

I'm happy to take ownership of this. I'll get started on Wednesday next week.

@DavHau
Copy link
Member

DavHau commented Aug 13, 2022

How is the status on this @ners ? We just had another use case popping up at nix-community/dream2nix#234

@roberth
Copy link
Member

roberth commented Aug 13, 2022

Adding a YAML parser effectively makes that specific YAML parser behavior part of the Nix language, because the Nix language should be reproducible.

Parsing YAML through IFD is actually a good thing for reproducibility, because it pins the implementation of the YAML parser to a specific version, preventing any repro bugs that will be caused by fixes in the YAML parser, switching to a different YAML library, YAML major version upgrades, etc.

@DavHau
Copy link
Member

DavHau commented Aug 13, 2022

Another approach to solve the logic pinning issue could be: #1491

I think IFD is not a good option for parsing. See #1491 (comment)

Aside from the mentioned issues, the problem with IFD is that it significantly impacts UX in a bad way. For example, if the package name is part of the yaml, and therefore read through IFD, simply listing your packages will become an expensive operation.
If a nix expression depends on IFD, it also looses the capability to be evaluated on arbitrary systems which destroys a lot of value of nix.
Therefore nix flake show will crash by default if IFD is used.

@roberth
Copy link
Member

roberth commented Aug 13, 2022

👀 I wouldn't think of Nix as a project that prefers to sacrifice correctness and reliability for a little performance improvement.

For example, if the package name is part of the yaml, and therefore read through IFD, simply listing your packages will become an expensive operation.

Not taking the package name from IFD doesn't seem too hard.
Of course the outputs will rely on it, but as long as the set of output names is also known statically, only the output attributes of the package need to be strict in the IFD. Assuming that the other attributes aren't taken from the IFD either, the package attrset can be returned without IFD, and all non-output attrs can be evaluated without IFD. Finally there's RFC92 outputOf which can make even the output attrs free of IFD.

If a nix expression depends on IFD, it also looses the capability to be evaluated on arbitrary systems which destroys a lot of value of nix.

It does complicate things a bit. Usually this is not a problem at all. When it is, remote builds can be used to solve it. If we don't consider that to be sufficient, we could add explicit cross-evaluation support to Nixpkgs. Similar to buildPlatform, we'd have evalPlatform (defaulting to buildPlatform), which produces pkgs.evalPackages, which is a Nixpkgs non-cross package set for evalPlatform.

Note that this is only for building. Listing packages and package metadata can be done without IFD.

@roberth
Copy link
Member

roberth commented Aug 13, 2022

I suppose a middle ground is to require the builtins.fromYAML calls to specify an implementation name and version number instead of a proper pin, so that future versions can be bug-for-bug compatible. (Sounds bad, right?, but that's exactly what we need)

@DavHau
Copy link
Member

DavHau commented Aug 13, 2022

My interest comes from building compatibility layers between nix and existing package managers. Reading other serialization formats like JSON, TOML, YAML is crucial for this as other package managers store data usually in these formats. YAML not being supported is a pain point.

I have tried IFD, and I made the experiences that it complicates things a lot.

Sure, one can try to avoid reading the package name via IFD and instead use some regex. Doing the same with package metadata becomes a lot harder already.

Also, requiring users to setup remote builders in order to evaluate nix expressions, I think, is not the kind of UX we should aim for.

we could add explicit cross-evaluation support to Nixpkgs. Similar to buildPlatform, we'd have evalPlatform (defaulting to buildPlatform), which produces pkgs.evalPackages, which is a Nixpkgs non-cross package set for evalPlatform.

Sounds interesting. Maybe I misunderstand, but does this means that there would be a evalPlatforms parameter to my nix expression with the expectation that the value of evalPlatform doesn't impact the evaluation result? This seems based on the assumption that the same tools executed on different evalPlatforms yield the same result. For a yaml parser this might be quite likely, but not for other tools. The mach-nix extractor for example is an IFD based function to extract metadata from a setup.py file using setuptools. Executing it on different platforms will lead to a different set of dependencies. Therefore, integrating a concept of evalPlatforms sounds potentially dangerous and would break reproducible evaluation.

I suppose a middle ground is to require the builtins.fromYAML calls to specify an implementation name and version number instead of a proper pin, so that future versions can be bug-for-bug compatible. (Sounds bad, right?, but that's exactly what we need)

I like that idea. There might be an overhead of maintaining more than one implementation at the same time, but old versions could be deprecated after a while.

@roberth
Copy link
Member

roberth commented Aug 13, 2022

YAML is crucial

I don't want to diminish the importance of the ability to process YAML documents using the evaluator. I want us to make an informed decision and hopefully not sacrifice other important features like reproducibility.

old versions could be deprecated after a while.

Being able to build age-old packages on a current Nix installation is not something I think we should sacrifice after all these years.

evalPlatforms [...] potentially dangerous

Just like with cross, the builder is responsible for creating an output that is appropriate for hostPlatform. So yes, just like you can have package expressions that don't work for cross, you could have IFD that only works for same-platform evaluation.

No more dangerous than cross.
I don't know. Maybe evalPlatform is over-engineering it, because most users will only eval for their own platform anyway. Evaluating for another platform without the intent to build for it is a little odd. Not completely useless, but not a necessity.

I have tried IFD, and I made the experiences that it complicates things a lot.

RFC 92 implementation has not been completed yet, so this may be a premature judgement.

Sure, one can try to avoid reading the package name via IFD and instead use some regex. Doing the same with package metadata becomes a lot harder already.

I agree regexes are problematic. What else do you need for nix flake show, besides the name of the package? Most of the actual meta metadata should be specified in a Nix expression anyway. Any computation that is not required for the package name and list of output names can be deferred with outputsOf.

@DavHau
Copy link
Member

DavHau commented Aug 14, 2022

Most of the actual meta metadata should be specified in a Nix expression anyway.

Some automatic nix packaging approaches benefit from not having to write (or generate) nix expressions. This on-the-fly translation increases eval complexity and might not be the first choice for repos like nixpkgs, but it allows using nix on existing software projects without intermediary steps, which, for example, is valuable for side-by-side usage with other package managers where otherwise, two sets of package expressions would have to be maintained. It also allows to use nix as a standard interface to inspect arbitrary packages of other ecosystems.

What else do you need for nix flake show, besides the name of the package?

nix flakes show only requires the package name AFAICT. Though there are other potential use cases, like inspecting the package license or homepage, dependencies, etc. These things can be of interest, even without the intention to build the package. Alternatively this information could be provided in other ways, too. One could provide a separate tool for inspection. It probably just becomes less integrated with native nix tooling and less re-usable inside other nix expressions.
For everything else, I agree, it can be deferred via IFD/rfc92.

@wmertens
Copy link
Contributor

Actually, @arcnmx implemented a subset of YAML here: https://github.com/arcnmx/nixexprs/blob/f90f7c0a758f2142a448b9e59cc5d6f768b9275b/lib/from-yaml.nix

Maybe it will prove sufficient to parse those yaml lock files.

@domenkozar
Copy link
Member

If there's one thing we learned from @grahamc talk at nixcon, it would be to implement builtins.fromYAML to bridge the gap between what the industry uses and what can be fed into Nix.

@NaN-git
Copy link

NaN-git commented Nov 28, 2022

I suppose a middle ground is to require the builtins.fromYAML calls to specify an implementation name and version number instead of a proper pin, so that future versions can be bug-for-bug compatible. (Sounds bad, right?, but that's exactly what we need)

I would be happy, if this could be discussed in #7340. At the moment I just added builtins.fromYAML without any versioning.

@roberth roberth added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Jun 2, 2023
@FlafyDev
Copy link

This will be useful for Dart / Flutter packaging in Nixpkgs. Currently we either IFD or convert the pubspec.lock file from yaml to json and use lib.importJSON.

@roberth roberth added the language The Nix expression language; parser, interpreter, primops, evaluation, etc label Jul 11, 2024
@Kek5chen
Copy link

Kek5chen commented Nov 4, 2024

I also think that the toYAML function shouldn't produce JSON output. From visuality, one could expect this to output a human-readable YAML file which can be plugged into programs:

configFile = builtins.toFile "config.yaml" (generators.toYAML {} cfg.configuration);

Instead, this gives you a JSON file because it internally calls toJSON. Consequently YAML-specific parsers, might fail to parse this.

You could argue that "I just did it wrong", but the behavior of the example does not align with its syntactic intent and I find that very, very unintuitive.

@quentinmit
Copy link

I also think that the toYAML function shouldn't produce JSON output. From visuality, one could expect this to output a human-readable YAML file which can be plugged into programs:

configFile = builtins.toFile "config.yaml" (generators.toYAML {} cfg.configuration);

Instead, this gives you a JSON file because it internally calls toJSON. Consequently YAML-specific parsers, might fail to parse this.

All valid JSON is also valid YAML; any parser which has trouble with this is not a YAML parser. [1]

You could argue that "I just did it wrong", but the behavior of the example does not align with its syntactic intent and I find that very, very unintuitive.

It sounds like you are expecting toYAML to produce a human-friendly prettified file, instead of the minimal output it uses today. That's a fair ask (for both YAML and JSON), but it feels orthogonal to me.

[1] This was not explicitly required until the YAML 1.2 spec, but 1.2 came out 15 years ago in 2009, so that's mostly a historical curiosity at this point.

@Kek5chen
Copy link

Kek5chen commented Nov 4, 2024

Well, either way, YAML != JSON and therefore shouldn't be treated as such. Take the rust serde parser implementation for YAML as an example. You might find, it's unable to parse the file generated by the current nix generator.

The expectation that every YAML Parser out there can parse JSON is a very far fetch. serde_yaml has been downloaded over 84 million times and there are currently more than 2822 direct dependents published on crates.io not even mentioning private repositories we're not aware of.

For those 2822+ packages the Nix implementation will not suffice and an ugly, overcomplicated and most importantly unintuitive workaround has to be used. (See issue)

Similar things apply to other parser implementations as well.

Another real-world scenario I can imagine off the top of my head is the the Minecraft Spigot ecosystem where plugins conventionally use YAML as the default config format. These plugins usually implement their configurations with the default bundled parser. This parser does not support JSON and configuring a nix server, or a cluster with the plugin configurations in place, I can only imagine as incredibly frustrating.

These are only some examples on why the current toYAML implementation is nonsensical. If toYAML is just an alias for toJSON, then it would be more practical to omit it entirely or have users define an alias themselves if they want such a conversion as the norm for their projects.

You make a valid point about the technicalities, and in theory, these implementations should adhere to spec. But in reality implementations often fall short of spec for various reasons. Practical experience shows that “spec-compliant” and “real-world compatibility” are rarely the same thing, so why treat it as such here if people just stumble over it?

@quentinmit
Copy link

To be clear, I'm not arguing that we shouldn't have support for generating YAML documents that are not representable as JSON - that's what I understand this issue to be about. Nix/nixpkgs should definitely have a way to generate the unique-to-YAML features like tagged values. The rest of this comment only deals with the subset of YAML documents that are also representable as JSON.

Well, either way, YAML != JSON and therefore shouldn't be treated as such. Take the rust serde parser implementation for YAML as an example. You might find, it's unable to parse the file generated by the current nix generator.

Do you have a concrete example of a valid real-world JSON document that serde_yaml can't parse? I Googled serde_yaml json bug and I can't seem to find anyone reporting such a bug.

For those 2822+ packages the Nix implementation will not suffice and an ugly, overcomplicated and most importantly unintuitive workaround has to be used. (See issue)

Did you mean to link an issue here? What is the workaround you're referring to? The sed hack at the top of the current issue is only for adding tagged values into a YAML document, which is outside the scope of valid JSON.

Similar things apply to other parser implementations as well.

Another real-world scenario I can imagine off the top of my head is the the Minecraft Spigot ecosystem where plugins conventionally use YAML as the default config format. These plugins usually implement their configurations with the default bundled parser. This parser does not support JSON and configuring a nix server, or a cluster with the plugin configurations in place, I can only imagine as incredibly frustrating.

Again, where do you see any indication that this does not actually support Nix-generated JSON? Have you tried it yourself?

These are only some examples on why the current toYAML implementation is nonsensical. If toYAML is just an alias for toJSON, then it would be more practical to omit it entirely or have users define an alias themselves if they want such a conversion as the norm for their projects.

You make a valid point about the technicalities, and in theory, these implementations should adhere to spec. But in reality implementations often fall short of spec for various reasons. Practical experience shows that “spec-compliant” and “real-world compatibility” are rarely the same thing, so why treat it as such here if people just stumble over it?

I certainly don't think that every YAML library has zero bugs and implements the spec perfectly. Of course not. But "JSON compatibility" is just a particular special case in the broad category of "correctly parses YAML files". There are plenty of other parsing bugs that YAML libraries can - and do - have. Whether Nix generates JSON-looking YAML or prettified multi-line YAML, it's just as likely to run into a parsing bug in some language's YAML implementation. Arguably more likely, because many of the parsing bugs I've run into are specific to long-form YAML. Is it Nix's job to emit YAML that correctly parses in every YAML library in every language?

I haven't actually seen any "real-world compatibility" issues caused by using a JSON serializer to generate YAML. It's important to realize that YAML parsers don't have some magic "JSON mode" that they need to explicitly support. JSON support just comes automatically as a side effect of the basic YAML syntax rules. For example,

a:
- foo
- bar
- baz

is a valid YAML document (and I assume roughly what you expect to see from a prettified toYAML function). But quotes are optional for strings in YAML, so it's exactly the same as:

"a":
- "foo"
- "bar"
- "baz"

and you can also use [] and {} for arrays and objects, respectively, instead of newlines and - :

{"a": ["foo", "bar", "baz"]}

and now we've arrived at valid YAML that is also valid JSON. It's extremely common to see one or the other of these used in real-world YAML files.

@Kek5chen
Copy link

Kek5chen commented Nov 5, 2024

Mhm, it's good to know that we're following the same goals. I actually just fact checked myself on the serde_yaml part, just to make sure. And it seems that I was wrong. It just so happened that I ran into a bug on my side recently that looked just like as if it wasn't able to take in JSON... I'm not sure about if the spigot implementation can do it, but I'll just assume that it will also work. Thanks for the hint.

I did not mean to link an issue, I was referring to this issue. More specifically to the call to the derivation which turns json into prettified yaml. I find that very ugly and unintuitive.

Next up, no I don't think that it's nix' responsibility to output dynamic and external library-fitted yaml. Of course changing the generator definition on the user side should allow that but that's how it should be with anything nix.

I suppose in the end though it becomes clear that it would be nice to have human readable yaml, although, as discussed, not mandatory. I do now also agree that toYAML should just output minified JSON for performance reasons. I can see that now.

Maybe this could be supported through something along the lines of a JSON to YAML converter? Or maybe options in the currently reserved parameter that make the function output human readable representation of either JSON and YAML?

Thanks for the discussion, it was very insightful.

@quentinmit
Copy link

I did not mean to link an issue, I was referring to this issue. More specifically to the call to the derivation which turns json into prettified yaml. I find that very ugly and unintuitive.

Yeah, totally. But the ugly and unintuitive part is the sed invocation, and that's specifically to generate YAML tags. IIUC it basically generates this JSON:

{"a": "!secret foo"}

which json2yaml turns into

a: '!secret foo'

and then sed is used to remove the quotes and turn it into

a: !secret foo

Incidentally, I tried using this derivation to build a YAML config file with tagged values and discovered that it doesn't work with long strings. What I ended up with is:

(format.generate "out.yaml" value).overrideAttrs {
    buildCommand = ''
      json2yaml --yaml-width inf "$valuePath" | sed -e "
        s/'\!\([A-Za-z_]\+\) \(.*\)'/\!\1 \2/
        s/^\!\!/\!/
        T
        s/'''/'/g
      " > "$out"
    '';
  }

which prevents json2yaml from wrapping long strings in a way that breaks the sed.

@roberth
Copy link
Member

roberth commented Nov 6, 2024

I am not eager to implement toYAML support, because evaluating this builtin must produce the exact same result going into the future. Otherwise evaluation is not reproducible, and even if the new behavior might turn out ok, you won't be able to use previously built binaries (particularly painful if you collaborate using multiple versions of Nix).
This implies:

  • When using a library, the library must not change its formatting of the YAML output. Most libraries don't have hash-for-hash stability in their requirements.
  • When not using a library, we have to basically get it right the first time, or have it as an experimental feature for a long time (which is just expectation management and obviously still sucks)

Most of the time, pkgs.formats.yaml can be used for generating human-friendly YAML.
In cases where YAML-specific syntax must be produced, I would recommend to generate a YAML AST in JSON and convert it to YAML syntax in a derivation. That way, the implementation details are pinned for reproducibility, while keeping the ability to evolve the implementation as desired.

(Side note: parsing YAML should be well-defined and unambiguous, and a fromYAML is far more likely to be added. It is on the good direction of the one-to-many relationship between semantics and syntax.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language The Nix expression language; parser, interpreter, primops, evaluation, etc significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
None yet
Development

No branches or pull requests