I like that in concept, but 1) literally no one does that (prime example - Kubernetes docs) and 2) it looks much more messy with quotes, when you know that they are unnecessary in 95% of cases.
This problem occurs because PyYAML's load() uses the full YAML 1.1 schema by default. There is an alternative loader class, BaseLoader, that interprets everything as a string, which is the workaround the article suggests - just another way to achieve it.
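For the curious, a minimal sketch of the BaseLoader workaround (PyYAML assumed; the config keys are invented):

import yaml

doc = "country: NO\nlogging: no\nversion: 1.10\n"

# The default loader follows YAML 1.1 implicit typing rules:
print(yaml.safe_load(doc))
# {'country': False, 'logging': False, 'version': 1.1}

# BaseLoader skips implicit typing entirely, so every scalar comes back as a string:
print(yaml.load(doc, Loader=yaml.BaseLoader))
# {'country': 'NO', 'logging': 'no', 'version': '1.10'}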
It’s a bit of a sore spot in the YAML community as to why PyYAML can’t / won’t support YAML 1.2. It was in maintenance mode for a while. YAML 1.2 also introduced breaking changes.
From a SO comment: “As long as you're okay with the YAML 1.1 standard, PyYAML is still perfectly fine, secure, etc. If you want to support the YAML 1.2 spec (released in 2009), you can use ruamel.yaml, which started out as a fork of PyYAML.” (CrazyChucky, Mar 26, 2023)
Yeah, it's a problem. I had to put up a PR on a tool I was using because I ran into the Norway problem in YAML I was getting from another team. I did ask them to add quotes, just in case.
A supplier we contracted with and gave requirements to asked me what format we wanted the data export/import to be in, and I said JSON. It's simple, easy, and can be converted into anything else very easily.
In Lisp, if you want to read text into symbols (e.g. file of words), you just switch to a dedicated package in which those symbols are interned. Then if NIL happens to come up, it will be a symbol named "NIL" in that package, unrelated to the special object.
I reckon if this is really a big concern for anybody, then they are probably writing way too much YAML to begin with. If you're being caught out by things like this and need to debug it, then it maps very cleanly to types in most high level languages and you can generate your YAML from that instead.
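A minimal sketch of generating YAML from typed data instead of writing it by hand (PyYAML assumed; the config shape is invented):

from dataclasses import dataclass, asdict
import yaml

@dataclass
class Server:
    name: str
    country: str
    logging: bool

cfg = Server(name="oslo-1", country="NO", logging=False)

# The emitter knows a plain NO would be read back as a boolean, so it quotes it for you.
print(yaml.safe_dump({"server": asdict(cfg)}, default_flow_style=False))
# server:
#   country: 'NO'
#   logging: false
#   name: oslo-1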
Sadly you usually realize you've been writing too much YAML way past the turning point, and it will be a pain to move a single file to JSON, for instance, when you have a whole process and system that otherwise ingests YAML, including keeping track of why this specific part is JSON and not YAML.
So people work around the little paper cuts, while still hitting the traps from time to time as they forget them.
> generate YAML
I have a hard time finding a situation where I'd want to do that. Usually YAML is chosen for human readability, but here we're already in a higher-level language first. JSON sounds like a more appropriate target most of the time?
> I have been pressured multiple times by Brian Ingerson (one of the authors of the YAML specification) to remove this paragraph, despite him acknowledging that the actual incompatibilities exist. As I was personally bitten by this "JSON is YAML" lie, I refused and said I will continue to educate people about these issues, so others do not run into the same problem again and again. After this, Brian called me a (quote)complete and worthless idiot(unquote).
> In my opinion, instead of pressuring and insulting people who actually clarify issues with YAML and the wrong statements of some of its proponents, I would kindly suggest reading the JSON spec (which is not that difficult or long) and finally make YAML compatible to it, and educating users about the changes, instead of spreading lies about the real compatibility for many years and trying to silence people who point out that it isn't true.
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
Are there no cases where well-formed JSON could be subject to the problems covered in the article, when parsed by a compliant YAML parser? I'm asking because I know nothing about YAML and not much more about JSON.
Not that I know of. JSON requires strings to be quoted, which is basically the problem here. Of course it's not a great human-writable configuration format (no comments being a huge problem).
I’m just pointing out that it should be very simple to swap a YAML file for a JSON file in any system that accepts YAML
Configuration files for programs. These tend to be short.
DSLs which are large manifests for things like cloud infrastructure. These tend to be long; they grow over time.
My pet hypothesis is these DSLs exist mostly for neutrality - the vendor can't assume you have Python or something present. But as a user, you can assume just that and gain a lot by authoring in a proper language and generating YAML.
> Configuration files for programs. These tend to be short.
This is where I use YAML and it shines there. IMO easier to read and write by hand than JSON, and short sweet config files don't have the various problems people run into with YAML. It's great.
I can't run the examples right now, but looking at the last "print(template.to_json())" line, it looks like the main use case is JSON?
On cloud infra, yes, having one or two layers of languages is a natural situation. GCP and AWS both accepting (encouraging?) JSON as a subset of YAML makes it a simpler choice when choosing an auto generating target.
You mention people wanting to author the generated files; I think in other situations tweaking the auto-generated files will be seen as riskier, with potential overwriting issues, so lower readability will be seen as a positive.
That's the point really, you can generate JSON or YAML and it doesn't really matter. If you want to include 100 similar objects in that output, you can use a for loop. You can't do that in plain JSON/YAML.
I do a lot of Ansible that needs to run on multiple versions, and its YAML typing is not consistent - whenever I have a variable in a logic statement, I nearly always need to apply the "| bool" filter.
This is likely hair splitting, but you are far more likely getting bitten by the monster amount of variance in jinja2 versions/behaviors than by anything "yaml-y"
For example, yaml does not care about this whatsoever
- name: skip on Tuesdays
  when: ansible_date_time.weekday != "Tuesday"
but different ansible versions are pretty yolo about whether one needs to additionally wrap those fields in jinja2 mustaches
and another common bug is when the user tries to pass in a boolean via "-e" because those are coerced into string key-value pairs as in
$ ansible -e not_today=true -m debug -a var=not_today all
localhost | SUCCESS => {
"not_today": "true"
}
but if one uses the jinja/python compatible flavor, it does the thing
$ ansible -e not_today=True -m debug -a var=not_today all
localhost | SUCCESS => {
"not_today": true
}
It may be more work than you care for, since sprinkling rampant |bool likely doesn't actively hurt anything, but the |type_debug filter[1] can help if it's behaving mysteriously
Google App Engine used to do this to environment variables defined in YAML. IIRC it would convert the string "true" to "Yes", which was a fun surprise when deploying Java and NodeJS apps.
I will die on the hill that writing out ansible.builtin. is characters of my life I'll never get back, and I refuse to. If it's built in, why do I have to qualify it?!
Also, watch out: 0x644 != 0644 which is the mode you meant
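A quick sketch of that pitfall under PyYAML's 1.1 rules (the key name is arbitrary):

import yaml

print(yaml.safe_load("mode: 0644"))    # {'mode': 420}    leading zero means octal
print(yaml.safe_load("mode: 644"))     # {'mode': 644}    decimal, not the permissions you meant
print(yaml.safe_load("mode: '0644'"))  # {'mode': '0644'} quoted string, which is what you want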
It's not a coincidence that YAML is a perfect acronym for "yet another migraine looming".
I mean ok it is technically a coincidence but it definitely feels like the direct result of the "what could possibly go wrong" approach the spec writers apparently took
Even worse is the all-decimal MAC problem.
Some genius decided that, to make time input convenient, YAML would parse HH:MM:SS as SS + 60×MM + 60×60×HH. So you could enter 1:23:45 and it would give you the correct number of seconds in 1 hour, 23 minutes, and 45 seconds.
They neglected to put a maximum on the number of such sexagesimal places, so if you put, say, six numbers separated by colons like this, it would be parsed as a very large integer.
Imagine my surprise when, while working at a networking company, we had some devices which failed to configure their MAC addresses in YAML! After this YAML config file had been working for literal years! (I believe this was via netplan? It's been like a decade, I don't remember.)
Turns out, if an unquoted MAC address had even a single non-decimal hex digit, it would do what we expected (parse as a string). This is not only by FAR the more common case, but also we had an A in our vendor prefix, so we never ran into this "feature" during initial development.
Then one day we ran out of MAC addresses and got a new vendor prefix. This time it didn't have any letters in it. Hilarity ensued.
(This behavior has thankfully been removed in more recent YAML standards.)
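A small sketch of that failure mode under PyYAML's 1.1 rules (the MAC addresses here are made up; it only bites when every colon-separated group is a decimal number below 60):

import yaml

# All-decimal groups under 60: the 1.1 resolver reads this as one base-60 integer.
print(yaml.safe_load("mac: 10:23:45:12:34:56"))
# {'mac': 8083845296}

# A single hex digit anywhere and it no longer matches, so it stays a plain string.
print(yaml.safe_load("mac: a0:23:45:12:34:56"))
# {'mac': 'a0:23:45:12:34:56'}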
Perl has a Poland Problem. The customary file extension for Perl files is *.pl. This worked well until Apache introduced content negotiation and the convention to add a language code as file extension. It had index.html.en, index.html.de, for example.
index.html.pl is where the problem started and the reason why the officially recommended file extension for Perl files used to be (still is?) *.plx.
I don't have the Camel book at hand, but Randal Schwartz's Learning Perl 5th edition says:
"Perl doesn't require any special kind of filename or extension, and it's better not to use an extension at all. But some systems may require an extension like plx (meaning PerL eXecutable); see your system's release notes for more information."
That sounds more like an Apache problem than a Perl problem. It's their mistake and it's not even relevant outside the Apache context.
That should be marked as a breaking change on Apache's side IMO. It would be a security nightmare if server code were leaked to the public.
It should have been an Apache problem, yes. Not only did it turn out that at least the language negotiation part of content negotiation wasn't the best idea but the way Apache handled it was problematic apart from the pl problem. In the end the Perl community took the issue upon them, so historically I'd say it was a Perl problem (of choice).
Also, Prolog has the Perl problem. :)
Related
The YAML document from hell (566 points, 2023, 353 comments) https://news.ycombinator.com/item?id=34351503
That's a Lot of YAML (429 points, 2023, 478 comments) https://news.ycombinator.com/item?id=37687060
No YAML (Same as above) (152 points, 2021, 149 comments) https://news.ycombinator.com/item?id=29019361
And some light commentary a few days ago: https://news.ycombinator.com/item?id=43648263 - Apr 2025 (51 comments)
Programming with string templates, in a highly complex and footgun-rich markup language, is one of the things I find most offputting about the DevOps ecosystem.
I believe Satan itself decided to mix YAML, Jinja and Turing-complete logic when it created Ansible. It truly is the sendmail of the modern era.
Several years ago when I was writing a deployment system for a cloud distributed database, I tried to automate everything with Ansible playbooks and the Ansible "API" (LOL). I pretty quickly gave up on implementing anything but the most trivial logic in templated YAML and switched to Python (wrapping maximally-dumb Ansible playbooks) for everything nontrivial.
You might like pyinfra.
Just about every time someone complains about ansible, there's a comment to plug this project but pyinfra seems to opt-out of the cloud provisioning part, instead delegating to its terraform connector, which drags in all the nonsense that entails. That makes it not only less useful but (IMHO) a horrible name for a project that only does "remote execution" and not infrastructure. The fact that it's even missing @aws @azure @gcp connectors further solidifies "who is the audience for this thing?"
This is why I generally use Terraform for Kubernetes. It's not perfect, but it's miles better than the various different YAML-templating solutions (Kustomize, Helm) popular in the Kubernetes ecosystem.
Two different stateful recordkeeping control planes with disparate opinions about the current state of the world. What can go wrong.
"The limits of my keyboard mean the limits of my programming language."
If only they had had ⊥ and ⊤ somewhere on their keys to work with Booleans directly while designing the languages. In another branch of history, perchance.[1]
[1] https://en.wikipedia.org/wiki/APL_(programming_language)#/me...
⊥ and ⊤ are not entirely congruent to false and true.
Boolean and propositional logic are not the same.
For _ordinary_ two‑valued classical propositional logic, e.g., YAML, they are congruent.
I have an emacs macro for this
Always quote all yaml strings. If you have a yaml file that has something that isn't a simple value (number, boolean) such as for example a date, time, ip-address, mac address, country code, phone number, server name, configuration name, etc. etc. then you are asking for trouble. Just DON'T DO THAT. It's pretty simple.
"Yeah but it's so convenient"
"Yeah but the benefit of yaml is that you don't need quotes everywhere so that it's more human readable"
DON'T
Yeah that.
00,01,02,03,04,05,06,07,OH SHIT
Pandas has a Nigeria problem, where NA -> NaN.
It's not that bad, because you can explicitly turn that behavior off, but ask me how I know =(
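For the curious, a minimal sketch of the pitfall and the opt-out (pandas assumed; the CSV is invented):

import io
import pandas as pd

csv = "country\nNA\nNO\n"

# 'NA' is on read_csv's default list of missing-value markers, so it becomes NaN.
print(pd.read_csv(io.StringIO(csv))["country"].tolist())
# [nan, 'NO']

# keep_default_na=False turns that list off, so 'NA' survives as a string.
print(pd.read_csv(io.StringIO(csv), keep_default_na=False)["country"].tolist())
# ['NA', 'NO']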
That's a Namibia problem, Nigeria is NG.
How?
This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (_e.g._ PyYAML _etc._) which is stuck on 1.1 for reasons.
The 1.2 spec just treats all scalar types as opaque strings, along with a configurable mechanism[0] for auto-converting non-quoted scalars if you so please.
As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.
[0]:https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
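A small side-by-side sketch, assuming PyYAML for the 1.1 behaviour and ruamel.yaml (which defaults to 1.2) for the other:

import yaml                    # PyYAML: YAML 1.1 resolver
from ruamel.yaml import YAML   # ruamel.yaml: YAML 1.2 by default

doc = "country: NO\nlogging: no\n"

print(yaml.safe_load(doc))
# {'country': False, 'logging': False}

print(YAML(typ="safe").load(doc))
# {'country': 'NO', 'logging': 'no'}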
I’m sad that the “fix” was to disallow “no” as a more readable alternative to “false”, rather than to disallow unquoted strings.
It’s silly to have so many keyword synonyms as specified in that earlier regex. I’m also glad we can’t specify numeric literals as roman numerals. KISS
Honestly I’d prefer if “yes” and “no” were the only ways to spell the boolean values. They make sense in pretty much all contexts where booleans are used, whereas “true” and “false” rarely make sense.
Boolean algebra with true and false was well established decades before computers were invented
Boolean algebra deals with logical propositions, not with configuration. The true/false terminology makes sense there.
In boolean logic true/false is ubiquitous and well known. As you can see, if one tries to be cute with it, one will get all sorts of issues. So at this point it doesn't make sense to use anything else.
The true/false terminology makes sense in boolean logic because you’re dealing with the truth of propositions. However, it does not make sense in the context of a configuration language, where there are no propositions that could be true or false.
It makes sense in the context of a configuration language because virtually 100% of programmers and other technical computer users understand “true” and “false” as the canonical Boolean values, and as far as I know that has always been the case. It never would have made sense to invent different unfamiliar terms like “yes” and “no” because of some niche philosophical distinction between “Boolean logic” and “configuration” that almost nobody in the real world cares about.
“yes” and “no” are “unfamiliar terms”? What the fuck? Everyone who knows even the basics of English knows what these words mean.
They are familiar as English words, yes, but unfamiliar as terms of art for Boolean values in computing. It’d be like replacing “if” statements with “whenever” statements.
Don’t give them any ideas! They already tried to make inroads with ruby’s “unless”.
Huh, I never considered this. We take true and false for granted everywhere, but they aren't the most meaningful.
The fix is to make conversion user-controllable. If you want to disallow bare scalars except for booleans and numbers or whatever, it's just a little bit of configuration away.
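A rough sketch of what that can look like with PyYAML: keep the SafeLoader constructors but throw away its implicit resolvers and re-register only the ones you want (the regexes here are deliberately simplified assumptions):

import re
import yaml

class ConfigLoader(yaml.SafeLoader):
    pass

# Drop every inherited implicit resolver, then add back just booleans and integers.
ConfigLoader.yaml_implicit_resolvers = {}
ConfigLoader.add_implicit_resolver(
    'tag:yaml.org,2002:bool', re.compile(r'^(?:true|false)$'), list('tf'))
ConfigLoader.add_implicit_resolver(
    'tag:yaml.org,2002:int', re.compile(r'^-?[0-9]+$'), list('-0123456789'))

print(yaml.load("country: NO\nport: 8080\ndebug: true\n", Loader=ConfigLoader))
# {'country': 'NO', 'port': 8080, 'debug': True}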
Why do you need an alternative spelling of false?
`logging: no` clearly says “I do not want logging”. `logging: false` is less explicit – what exactly is false?
Logging: no could also be log in norwegian. Or log only for the norwegian region. That's the thing with too many keywords and optional quoting, you can't know.
And for this reason, "logging: false" would be clearer than "logging: no" to represent "I do not want logging".
`false` could be a code for something else just as well as `no`. For example, it could mean that I only want to see logs of false information appearing in the system. The only proper solution is to require quotes around strings.
Then it should be on/off, not yes/no.
on/off also doesn’t make sense in many contexts, for example `isRegistered: on`.
I often prefer enums over booleans for this. It seems more readable for most cases, and can be extended with new values.
This:
  logging: false
could be replaced with:
  Logging: ignore/print/file
Don't use bool at all.
This, along with number formats, could be a good argument for strings being the only primitive type in config languages.
I recently learned about NestedText: https://nestedtext.org/
While it has the YAML-like significant whitespace, it looks nice because it doesn't try to be clever.
Options are either:
1. Specify in the key
2. Specify in the value
Yeah, you still get the same issue that 3 is an integer, 3.3 is a float and 3.3.3 is a string.
Why wasn't that a major version bump, like YAML 2.0?
That sounds like a breaking change that caused old YAML documents to be parsed differently.
Absolutely correct! Please correct me if I am wrong, but as far as I know, no one has implemented YAML completely according to spec.
The tag schema used is supposed to be modifiable folks!
And why anyone would still be using 1.1 at this point is just forehead palming foolishness.
How often do people even encounter this issue? I have been using YAML for 5+ years and have never had it before. Further, I use `yamllint` which points this out as a lint issue "truthy value should be one of [false, true]".
I don't recall encountering the norway problem in the wild.
Ansible has a pretty common issue with file permissions, because pretty much every numeric representation of a file mode is a valid number in YAML - and most of them are not what you want.
Sure, we can open up a whole 'nother can of worms if we should be programming infrastructure provisioning in YAML, but it's what we have. Chef with Ruby had much more severe issues once people started to abuse it.
Plus, ansible-lint flags that reliably.
Fractions are discriminatory when they happen to the same individual or group every time, or even just the first time.
See also: the p95 looks fine, but the same couple of users always see the p99 time, due to some bug.
Indeed, based on the comments, it is a scissor-bug. Most people never encountered it while some encountered it a lot.
I have seen it twice but I work in Sweden where we often do things also for the Norwegian market.
I have encountered it once, though I live in Norway and worked in IT there for a decade.
Never experienced it for the past 10+ years since the bug was fixed in the spec.
Has been encountered where I work. A global website with lots of country-specific config.
I don't think false is truthy.
I have, when getting an OpenAPI YAML file from someone else.
I for one did encounter exactly this problem when configuring a list of countries via ansible for geoip whitelisting.
Another solution is to change the country name:)
No way.
Or in New Zealand: Nor way.
Neitherway
Though we have renamed amino acids, I think it was, because Microsoft Excel switched the original names to months.
Genes, not amino acids.
https://www.theverge.com/2020/8/6/21355674/human-genes-renam...
We had this issue many years ago when people from Norway couldn't sign up. Took us a while to figure out
Narrow escape for people from Yemen (YE).
As a Norwegian I'm very curious, where in the pipeline were you using YAML? And why?
I've only seen it used for configuration.
We were using a fork of https://github.com/carmen-ruby/carmen/tree/master/iso_data/b... with our own translations. We used the data in the signup form.
I've seen teams use it as a replacement for JSON because it has the perception of being more "modern"
While JSON is annoying because it lacks some pretty basic features (comments, trailing comma), at least its spec is short. YAML is huuuge - there are way too many ways to do the same thing.
Usually it's locale paths gone wrong.
I usually think of YAML for internal config files; I would never think of it for user data.
Don't ask me why though; it might have something to do with how it's written like a Python file. No user would want to write their data in YAML format.
Probably OP didn't keep user data in YAML, but I think there was a config that listed the countries allowed to sign up.
Or were they from Noway ...
IMO the proposed solution of StrictYAML + schema is the right one here and what we use extensively for human readable configs. StrictYAML (linked to in the post) is essentially a string-type-only restriction of YAML, so you impose your type coercion on the parsed data structure.
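For anyone who hasn't seen it, a rough sketch of the StrictYAML approach (the strictyaml package is assumed; API details from memory, so treat it as illustrative):

from strictyaml import load, Map, Str, Bool, Int

schema = Map({"country": Str(), "logging": Bool(), "port": Int()})

doc = "country: NO\nlogging: no\nport: 8080\n"

# Everything parses as a string first; the schema decides what gets coerced and how.
print(load(doc, schema).data)
# {'country': 'NO', 'logging': False, 'port': 8080}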
If you have a schema, why not directly use something like protobufs?
The comment you replied to talks about human readable configs
Like this? https://protobuf.dev/reference/protobuf/textformat-spec/
https://protobuf.dev/reference/protobuf/textformat-spec/#:~:...
and, setting that aside, the very next paragraph says that this is a legit representation of -2.0 which means something has gone gravely wrong
“Be liberal in what you accept” rears its ugly head once more.
Being liberal in what you accept, also known as the “robustness principle”, doesn’t mean being ambiguous or surprising about how you accept it. If anything, robustness requires a great deal more precision and clarity (at least with your own reasoning, then with how you communicate what to expect from it).
Postel's Law does not deserve to be nicknamed the Robustness Principle.
Robustness has a meaning and it refers to handling bad inputs gracefully. An example of a lack of robustness is allowing a malicious actor to execute arbitrary code by supplying a datum larger than some buffer limit.
Trying to make sense of invalid inputs and do something with them isn't robustness. It's just example of making an extension to a spec. The extension could be robust or not.
Postel's Law amounts to "have extensions and hacks to handle incorrectly formatted data, rather than rejecting them. So, OK, yes, that entails being robust to certain bad inputs that are outside of the spec, but which land into onto one of the extensions. It doesn't entail being robust to inputs that fall outside of the core spec and all hacks/extensions.
Cherry picking certain bad inputs and giving them a meaning isn't, by itself, bona fide robustness; robustness means handling all bad inputs without crashing or allowing security to be compromised.
Postel's law isn't about accepting arbitrary invalid inputs. It's about inputs that are technically invalid but the intent is obvious from looking at it, and handling those according to intent.
In a distributed non-adversarial setting, this is exactly what you want for robustness.
The problem, as we've come to realise in the time since Postel's law was formulated, is that there is no such thing as a distributed non-adversarial setting. So I get what you're saying.
But your definition of robustness is too narrow as well. There's more to robustness than security. When Outlook strips out a certificate from an email for alleged security reasons, then that's not robustness, that's the opposite, brokenness: You had one job, to deliver an attachment from A to B, and you failed.
Robustness and security can be at odds. It's quite OK to say, "on so and so occasion I choose to make the system not robust, because the robust solution would not be sufficiently secure".
> intent is obvious from looking at it
Ouch, no. Dragons be there. Famous last words.
The only area in which it is acceptable to reason this way is graphical user interfaces. (And only if you've already provided an API for reliable automation, so that nobody has to automate the application through its GUI.) I say graphical because, no, not in command interfaces.
Even in the area of GUIs, new heuristics about intent cause annoyances to the users. But only annoyances and nothing more.
Like for example when you update your operating system, and now the window manager thinks that whenever you move a window so that its title bar happens to touch the top of the screen, you must be indicating the intent to maximize it.
I suppose the ship has sailed now that people are deploying LLMs this way and that, and those things intuit intent. They are like Postel's Law on amphetamines. There is a big cost to it, like warming the planet, and the systems become fragile for their lack of specification.
> When Outlook strips out a certificate from an email for alleged security reasons
I would say it's being liberal in what it accepts, if it's an alternative to rejecting the e-mail for security reasons.
It has taken a datum with a security problem and "fixed" it, so that it now looks like a datum without that security problem.
(I can't find references online to this exact issue that you're referring to, so I don't have the facts. Are you talking about incoming or outgoing? Is it a situation like an expired or otherwise invalid certificate not being used when sending an outgoing mail? That would be "conservative in what you send/do".)
You are spot on in regards to consideration of adversarial context; however, it is instructive to review the nuanced difference between the RFC761 (1980) statement, viz. "be conservative in what you do, be liberal in what you accept from others" and the substitution of "send" for "do" in RFC1122 (1989). The latter is, with hindsight, an error, since it refocused the attention of some rigid thinkers entirely onto protocol mechanics and away from implementation behaviour, despite the commentary beneath that admonishes such a mindset and concurs wholly with your point.
Or to put it otherwise, Postel was right to begin with, albeit perhaps just a little too cryptic, and has been frequently misquoted and misinterpreted ever since.
The "liberal in what you accept" part is what is flat out wrong; is that being misquoted?
It’s not wrong, because it doesn’t stand alone, and yes, if you’re cherry picking half a statement, then you’re like the Australian politicians calling it a “lucky country”, perpetuating a gross misrepresentation of both the letter and the sentiment of the original statement.
This is especially ironic given that the constructive argument against Postel’s law is generally based on the value of a strict interpretation of specification. If you’re intentionally omitting half of the law, then you have an implementation problem.
Furthermore, none of this has much to do with YAML being a shitty design.
I think the other half of the law is more or less good, so I don't have any reason to bring it up. I don't exactly know what it means to be conservative, but it sounds safe, like staying away from generating inputs for other programs that exercise dark corner cases. I don't really have anything to discuss about that half of the law.
The world doesn't need principles that are half good.
> I don't exactly know what it means to be conservative
That could undoubtedly lead to misapprehension. As the GP indicated, and as the word itself means, it references systemic stability and self-preservation behaviour. Reciprocally, however, the obligation to be liberal absolutely does not mean absolving faulty inputs of their flaws. For example, it would not excuse a dud response to an SSH handshake like trying to negotiate RC4. Both Steve Crocker and Eric Allman have been at pains to unpack the understanding of robustness, forgiveness, and format canonicalisation in security context, and they're hardly wrong. It's also why I'm particularly an advocate of the "do", not the "send", formulation. This is a much more systemic and contextual verb in its consequences for implementation.
> staying away from generating inputs for other programs that exercise dark corner cases
This is exactly the kind of focus-solely-on-the-wire misdirection that I identified above as a common misinterpretation. Conforming to the most precise and unambiguous interpretation of a protocol, if there is one, in regards to what an implementation puts on the wire, can most certainly be a part of that, but that isn't always what being conservative looks like, and processing is equally if not more important.
The introduction of Explicit Congestion Notification (ECN) aka RFC 3168 (2001) springs to mind. RFC 791 (1981) defined bits 14 & 15 of the IPv4 header as "reserved for future use" and diagrammatically gave them as zero. RFC 1349 (Type of Service, 1992, now obsoleted) named them "MBZ" (Must Be Zero) bits but gave them to be otherwise ignored. RFC 2474 (DSCP, 1998) did much the same with what it termed the "Currently Unused field". When ECN was introduced, making use of those bits as a supposedly backwards-compatible congestion signalling mechanism, we discovered a significant proportion of IP implementations aboard endpoints, routers, and middleboxes were rejecting (by discard or reset) datagrams with nonzero values in those bits. Consequently, ECN has taken two decades to fully enable, and this is where both sides of the principle prove their joint and inseparable necessity; to this day many ECN-aware TCP/IP stacks are passive, stochastic, or incremental with their advertisement of ECN, and equally forgiving if the bits coming back don't conform, because an implementation that resets a connection under the circumstances where the developer comprehends the impedance mismatch would be absurd. Thus fulfilling both sides of the maxim in order to promote systemic stability and practical availability and giving ECN a path to the widespread interoperability it has today.
The exposition on page 13 of RFC 1122 (Requirements for Internet Hosts, 1989) broadly anticipated this entire scenario, even though the same section misquotes Postel (or, rather, uses the "send" restatement that I find too reductive).
The statement of the robustness principle is an integrated whole. A partial reading is, perhaps ironically, nonconformant; as with Popper's paradox of tolerance, one thing it cannot be liberal about is itself.
> It's about inputs that are technically invalid but the intent is obvious from looking at it, and handling those according to intent.
Surely someone at some point thought it was obvious that “No” should mean “false”, and that’s why we’re now in this mess.
Pandering to customers will make you a lot of money today but very narrow margins tomorrow. If you’re in startup mentality your bosses may be 100% fine with that. But you will likely be stuck supporting that crap because you didn’t become wealthy in the IPO/merger.
That works only when everyone is trying in good faith to follow the standard, i.e. basically never. My version of Postel's Law:
If you accept crap, then eventually you will receive only crap.
This has little to do with the robustness principle, however mis-stated. It's just shitty design. But if someone was still hell-bent on invoking it, then if anything, it's a straight-up violation of the adjacent words "be conservative in what you do"¹, and further disregards the commentary in RFC1122²:
    ... assume that the network is
    filled with malevolent entities that will send in packets
    designed to have the worst possible effect ...

[1] https://datatracker.ietf.org/doc/html/rfc761#section-2.10
[2] https://datatracker.ietf.org/doc/html/rfc1122#page-12
Why not just use quotes all the time for strings?
Because that’s annoying. YAML is often written and read by humans. If you want a verbose and more regular way to do it, there is always JSON. But JSON is really annoying to deal with for humans, although it is much better than YAML for several applications.
You don’t actually need quotes to define a string in YAML. Eg the following syntax
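A sketch of the kind of syntax being described (judging from the replies below), with PyYAML to show the parse:

import yaml

doc = """\
User:
  Name: >-
    Bob
  Phone: >-
    01234 56789
"""

print(yaml.safe_load(doc))
# {'User': {'Name': 'Bob', 'Phone': '01234 56789'}}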
That's both readable and parses your records as strings. Edit: this Stack Overflow link provides more details: https://stackoverflow.com/questions/3790454/how-do-i-break-a...
I can't tell if it's irony or not given the sentiment in this thread, but that is not a declaration of a multiline Description field, that's a field of User named "Description:>-" that happens to be missing its trailing ":"
Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes
1: https://github.com/remarshal-project/remarshal#readme and/or $(brew install remarshal)
> I can't tell if it's irony or not given the sentiment in this thread, but that is not a declaration of a multiline Description field, that's a field of User named "Description:>-" that happens to be missing its trailing ":"
The trailing ‘:’ was there right after the ‘n’.
Examples of this syntax:
https://github.com/lmorg/murex/blob/master/builtins/core/arr...
I do agree it’s a bit of a kludge. But if you want data types and unquoted strings then anything you do to the syntax to denote strings over other data types then becomes a bit of a kludge.
The one good thing about this kludge is it allows for string literals (ie no complicated escaping rules).
> Seeing that used systemically, versus just for "risky" fields makes me want to draw attention to the fantastic remarshal tool[1], which offers a "--yaml-style >" (and "|" and the rest) which will render yaml fields quoted as one wishes
I don’t really understand what you’re alluding to here.
Tell us that you didn't try to use that example without telling us you just eyeballed the post
That's because, for better or worse, yaml considers that a legitimate key name, just missing its delimiter. The irony of this exchange, in a thread complaining about whitespace sensitivity, doesn't escape me.
As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive.
The issue is the lack of white space between the : and the >, not a missing : at the end. I’m typing this on my phone so the odd syntax error might creep in but the key pointer is the examples I linked to and the block token I’ve described.
Further to that point, none of the example links I’ve shared have the : at the end and I have production code that works using the formatting I’ve described. So you’re flat out wrong there with your assumption that block keys always terminate with :
> As for remarshal, it was the systemic application of that quoting style that made me think of it, since writing { Name: >- Bob} is the worst of both worlds: not as legible as the plain unquoted version, not suitable for grep, and indentation sensitive
You wouldn’t write code like that because >- denotes a block and you’re now inlining a string.
I mean I’ve shared links explaining how this works and you’re clearly not reading them.
At the end of the day, I’m not going to argue that >- (and its ilk) solves everything. It clearly doesn’t. If you want to write “minimized” YAML using JSON syntax then you’re far far better off quoting the string.
But if you are writing a string in YAML and either don’t want to deal with quotation marks, or need that string to be a string literal (ie not having to escape things like quotation marks) then my suggestion is an option.
It’s not there as a silver bullet but it is a lesser known feature of YAML. Hence me sharing.
Now go read the links and understand it better. You might genuinely find it useful under some scenarios ;)
I enjoy that you're scolding me about 'not reading' after doubling down on the accuracy of your initial post, which, yes, I can easily imagine you did from your phone.
And yet I brought receipts for my claims, and you just bring "reed the manul, n00b"
Firstly, I didn't say "read the manual", I said "read the links I shared". And that's a pretty reasonable comment to make given I took the time to find examples knowing that I couldn't easily type them out on my phone. And if you bothered to open the links you'd realize they were brief and to the point. I was actually trying to be helpful.
Secondly, your "receipts" were incorrect. Neither of your examples follows the examples I cited, and your second example creates a key named "Description:>-", which is clearly wrong. Hence why ">-" needs to be after the colon.
Here are more examples and evidence of how to use >- and why your "receipts" were also incorrect:
https://go.dev/play/p/1B4ba-dUARq
Here you can clearly see that my example produces the correct output, whereas your example produces an incorrect one.
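For anyone following along at home, here is a minimal sketch (using PyYAML; the field name is illustrative) of the correctly delimited form, i.e. with a space between the ':' and the '>-':

    import yaml

    doc = """
    User:
      Description: >-
        plain text that may contain: colons, "quotes" and #hashes
        and even span a second line
    """

    # >- folds the two lines into one space-separated string and strips
    # the trailing newline; no escaping rules apply inside the block.
    print(yaml.safe_load(doc))
    # {'User': {'Description': 'plain text that may contain: colons,
    #           "quotes" and #hashes and even span a second line'}}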
One final point: I don't understand why you're being so argumentative here. I posted a lesser-known YAML feature in case it helps some people and you've turned it into some kind of pissing match based on bad-faith interpretations of my comments. There was no need for you to do that.
I guess sometimes it is out of your control. I work on a workflow manager where users specify their workflows in YAML, so there's little we can do to prevent them from writing things like no, n, or t in a place where it could cause an issue like the one in the article.
Ah, the many places that choose to use YAML for no good reason...
Yeah, can't say much about it as I joined after they had already decided on using YAML.
I like that in concept, but 1) literally no one does that (prime example: the Kubernetes docs) and 2) it looks much messier with quotes when you know they are unnecessary in 95% of cases.
Oh, I did that in Ansible stuff. Using quotes for all strings. Exactly because I know what a mess YAML is.
This problem occurs because pyyaml's load() uses the full YAML 1.1 schema. There is another loader, BaseLoader, that will interpret everything as a string, which is the workaround the article suggests. Just another way to achieve it.
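A small sketch of the difference (the document text here is made up):

    import yaml

    doc = 'country: NO\nports: [80, 443]\n'

    # The default SafeLoader applies the YAML 1.1 implicit-typing rules,
    # so the unquoted NO is resolved to the boolean False.
    print(yaml.safe_load(doc))                     # {'country': False, 'ports': [80, 443]}

    # BaseLoader skips implicit typing entirely: every scalar is a string.
    print(yaml.load(doc, Loader=yaml.BaseLoader))  # {'country': 'NO', 'ports': ['80', '443']}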
It’s a bit of a sore spot in the YAML community as to why PyYAML can’t / won’t support YAML 1.2. It was in maintenance mode for a while. YAML 1.2 also introduced breaking changes.
From a SO comment: “ As long as you're okay with the YAML 1.1 standard, PyYAML is still perfectly fine, secure, etc. If you want to support the YAML 1.2 spec (released in 2009), you can use ruamel.yaml, which started out as a fork of PyYAML. – CrazyChucky Commented Mar 26, 2023 at 20:51”
- https://stackoverflow.com/q/75850232
I wish that ruamel.yaml had better documentation. I've had to dive into the code so many times to find out how to do something.
Yeah, it's a problem. I had to put up a PR on a tool I was using because I ran into the Norway problem in YAML I was getting from another team. I did ask them to add quotes, just in case.
A supplier we contracted with and gave requirements to asked me what format we wanted the export/import of the data to be in, and I said JSON. It's simple, easy, and can be converted into anything else very easily.
In Lisp, if you want to read text into symbols (e.g. file of words), you just switch to a dedicated package in which those symbols are interned. Then if NIL happens to come up, it will be a symbol named "NIL" in that package, unrelated to the special object.
The article mentioned people with the last name "null". I never thought about that. It sounds like really fun in modern days to have that last name.
There have been several write-ups about it: https://news.ycombinator.com/item?id=43113997 https://news.ycombinator.com/item?id=15046223
I like using tags and avoid any doubt
!!bool
https://dev.to/kalkwst/a-gentle-introduction-to-the-yaml-for...
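For example (a minimal PyYAML sketch with made-up keys), explicit tags override the implicit typing:

    import yaml

    doc = """
    country: !!str NO     # forced to the string "NO"
    enabled: !!bool yes   # forced to a boolean
    version: !!str 1.20   # stays "1.20" instead of collapsing to 1.2
    """

    print(yaml.safe_load(doc))
    # {'country': 'NO', 'enabled': True, 'version': '1.20'}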
I reckon if this is really a big concern for anybody, then they are probably writing way too much YAML to begin with. If you're being caught out by things like this and need to debug it, then it maps very cleanly to types in most high level languages and you can generate your YAML from that instead.
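A rough sketch of that approach (PyYAML plus dataclasses; the field names are invented):

    from dataclasses import dataclass, field, asdict
    from typing import List
    import yaml

    @dataclass
    class Service:
        name: str
        country: str                 # always a string, whatever it contains
        ports: List[int] = field(default_factory=list)

    cfg = Service(name="frontend", country="NO", ports=[80, 443])

    # The emitter quotes 'NO' because dumping it plain would re-parse
    # as a boolean, so the generated YAML stays unambiguous.
    print(yaml.safe_dump(asdict(cfg)))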
Sadly you usually realize you've been writing too much YAML way past the turning point, and it will be a pain to move a single file to, for instance, JSON when you have a whole process and system that otherwise ingests YAML, including keeping track of why this specific part is JSON and not YAML.
So people work around the little paper cuts, while still hitting the traps from time to time as they forget them.
> generate YAML
I have a hard time finding a situation where I'd want to do that. Usually YAML is chosen for human readability, but here we're already in a higher-level language first. JSON sounds like a more appropriate target most of the time?
Isn’t yaml a strict superset of JSON? Any compliant YAML parser should be able to ingest a JSON document.
https://metacpan.org/pod/JSON::XS#JSON-and-YAML
> I have been pressured multiple times by Brian Ingerson (one of the authors of the YAML specification) to remove this paragraph, despite him acknowledging that the actual incompatibilities exist. As I was personally bitten by this "JSON is YAML" lie, I refused and said I will continue to educate people about these issues, so others do not run into the same problem again and again. After this, Brian called me a (quote)complete and worthless idiot(unquote).
> In my opinion, instead of pressuring and insulting people who actually clarify issues with YAML and the wrong statements of some of its proponents, I would kindly suggest reading the JSON spec (which is not that difficult or long) and finally make YAML compatible to it, and educating users about the changes, instead of spreading lies about the real compatibility for many years and trying to silence people who point out that it isn't true.
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
Well that’s disappointing.
This explains some things on, like, a mythic level, that I’ve felt about yaml practically since the first time I saw it.
I guess software is human text after all.
Are there no cases where well-formed JSON could be subject to the problems covered in the article, when parsed by a compliant YAML parser? I'm asking because I know nothing about YAML and not much more about JSON.
Not that I know of. JSON requires strings to be quoted, which is basically the problem here. Of course it's not a great human-writable configuration format (no comments being a huge problem).
I’m just pointing out that it should be very simple to swap a YAML file for a JSON file in any system that accepts YAML
JSON is stricter than YAML so that class of issues is avoided.
Yes. Rewriting a YAML file into strict JSON won't have any impact on the ingestion or the processing of it.
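A quick way to convince yourself (a sketch; the usual caveat being the edge cases described in the JSON::XS quote above):

    import json
    import yaml

    doc = '{"country": "NO", "version": "1.20", "enabled": false}'

    # Because JSON strings are always quoted, the YAML 1.1 typing rules
    # never get a chance to reinterpret them.
    assert yaml.safe_load(doc) == json.loads(doc)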
There are probably two use cases.
Configuration files for programs. These tend to be short.
DSLs which are large manifests for things like cloud infrastructure. These tend to be long, and they grow over time.
My pet hypothesis is these DSLs exist mostly for neutrality - the vendor can't assume you have Python or something present. But as a user, you can assume just that and gain a lot by authoring in a proper language and generating YAML.
See https://github.com/cloudtools/troposphere for a great example for AWS CloudFormation.
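Roughly what that looks like (a sketch from memory of troposphere's documented usage; the AMI ID is a placeholder):

    from troposphere import Template, ec2

    t = Template()
    t.add_resource(ec2.Instance(
        "MyInstance",
        ImageId="ami-00000000",   # placeholder
        InstanceType="t3.micro",
    ))

    # Emit CloudFormation as JSON; to_yaml() is also available in
    # recent troposphere versions.
    print(t.to_json())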
> Configuration files for programs. These tend to be short.
This is where I use YAML and it shines there. IMO easier to read and write by hand than JSON, and short sweet config files don't have the various problems people run into with YAML. It's great.
I can't run the examples right now, but looking at the last "print(template.to_json())" line, looks like the main use case is JSON ?
On cloud infra, yes, having one or two layers of languages is a natural situation. GCP and AWS both accepting (encouraging?) JSON as a subset of YAML makes it a simpler choice when choosing an auto generating target.
You mention people wanting to author the generated files; I think in other situations tweaking the auto-generated files will be seen as riskier, with potential overwriting issues, so lower readability will be seen as a positive.
That's the point really, you can generate JSON or YAML and it doesn't really matter. If you want to include 100 similar objects in that output, you can use a for loop. You can't do that in plain JSON/YAML.
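For example, something along these lines (a sketch; the rule structure is made up):

    import json
    import yaml

    # A hundred similar objects: trivial in a loop, tedious to write by hand.
    rules = [{"name": f"rule-{i}", "port": 8000 + i, "enabled": True}
             for i in range(100)]
    doc = {"rules": rules}

    # The same in-memory structure can be emitted as either format.
    print(json.dumps(doc, indent=2))
    print(yaml.safe_dump(doc))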
True. YAML is an intermediate representation between my intention expressed in Dhall and what runs in production.
https://github.com/dhall-lang/dhall-kubernetes
I’ve been working on https://conl.dev, which fixes/removes YAMLs problematic features.
Trying to find a tag-line for it I like, maybe “markdown for config”?
I do a lot of ansible which needs to run on multiple versions, and its YAML typing is not consistent: whenever I have a variable in a logic statement, I nearly always need to apply the "| bool" filter.
This is likely hair splitting, but you are far more likely getting bitten by the monster amount of variance in jinja2 versions/behaviors than by anything "yaml-y"
For example, yaml does not care about this whatsoever, but different ansible versions are pretty yolo about whether one needs to additionally wrap those fields in jinja2 mustaches.
Another common bug is when the user tries to pass in a boolean via "-e", because those get coerced into string key-value pairs, whereas the jinja/python-compatible flavor does the right thing.
It may be more work than you care for, since sprinkling rampant |bool likely doesn't actively hurt anything, but the |type_debug filter[1] can help if it's behaving mysteriously.
1: https://docs.ansible.com/ansible/11/collections/ansible/buil...
Yep. I just want strict yaml:
anything encased in quotes is a string, anything not is not a string (bool, int or float)
Including keys?
Quote your strings
Empty the footgun before firing.
We should use very basic YAML parsers without these kinds of functions.
Related: the YAML exponent problem[0]
TLDR: an unquoted hex hash in YAML is fine until it happens to match \d+E\d+, at which point it gets interpreted as a float in scientific notation.
[0]https://www.brautaset.org/posts/yaml-exponent-problem.html
Google App Engine used to do this to environment variables defined in YAML. IIRC it would convert the string "true" to "Yes", which was a fun surprise when deploying Java and NodeJS apps.
YAML is just doing too much and trying to be too clever.
See also https://noyaml.com (feel free to send in PRs with your gripes/gotchas re: YAML)
Usual reminder that this is not a problem in YAML 1.2 released 15 years ago.
Sadly many libraries still don't support it.
This effectively means that a new version of the specification didn't solve the problem at all.
That edge case sounds like a reasonable tradeoff you would make for such a simple and readable generic data format.
Escaped JSON probably hits that sweet spot: a bit uglier than YAML, but 100 times simpler than XML.
Mh, since I just commented about ansible, you just made XML-based ansible flash in front of my eyes. I think I'm in a bit of pain now.
But you could use XSLT to generate documentation in XHTML from your playbooks about what files are deployed, what services are managed...
I will die on the hill that the characters spent writing out "ansible.builtin." are characters of my life I'll never get back, and I refuse to write them. If it's built in, why do I have to qualify it?!
Also, watch out: 0x644 != 0644 which is the mode you meant
It's not a coincidence that YAML is a perfect acronym for "yet another migraine looming".
I mean ok it is technically a coincidence but it definitely feels like the direct result of the "what could possibly go wrong" approach the spec writers apparently took