2024-03-15 19:14:26 +00:00

# `yaml-rust2`'s first real release

If you are not interested in how this crate was born and just want to know what differs from `yaml-rust`, scroll down to "This release" or [click here](#this-release).

## The why

Sometime in August 2023, an ordinary developer (that's me) felt the urge to start scribbling about an OpenAPI linter. I had worked with the OpenAPI format and tried different linters, but none of them felt right, and needing 3 different linters to lint my OpenAPI files was a pain. Like any sane person, I decided to write my own (author's note: you are not insane if you wouldn't). To get things started, I needed a YAML parser.

On August 14th 2023, I forked `yaml-rust` and started working on it. The crate stated that some YAML features were not yet available, and I felt that was an issue I could tackle. I started by getting to know the code and understanding it, then adding warnings, refactoring, tinkering, documenting, and so on. Anything I could do that made me feel the codebase was better, I did. I wanted this crate to be as clean as it could be.

## Fixing YAML compliance

In my quest to understand YAML better, I found [the YAML test suite](https://github.com/yaml/yaml-test-suite/): a compilation of corner cases and intricate YAML examples with their expected output or behavior. Interestingly enough, there was an [open pull request on yaml-rust](https://github.com/chyh1990/yaml-rust/pull/187) by [tanriol](https://github.com/tanriol) which integrated the YAML test suite into the crate's tests. Comments mentioned that the maintainer wasn't around anymore and that new contributions would probably never be accepted.

That, however, was a problem for future-past-me, as I was determined (somehow) to have `yaml-rust` pass every single test of the YAML test suite. Slowly, over the course of several months (from August 2023 to January 2024), I would pick a test from the test suite, fix it, commit, and start again. On January 23rd, I committed the last fix for a failing test.

According to the [YAML test matrix](https://matrix.yaml.info/), to this day only one library is fully compliant (aside from the Perl parser generated from the reference). This would make `yaml-rust2` the second fully YAML-compliant library. You really wouldn't believe how much you have to stretch YAML before it stops being valid YAML.

## Performance

With so many improvements, the crate was now perfect... except for performance. Adding conditions for every little bit of compliance had made the code much more complex and branchy, which CPUs hate. The code was around 20% slower than when I started.

For a bit over 3 weeks, I stared at flamegraphs and made my CPU repeat the same instructions until it could do them faster. There have been a number of performance improvements since `yaml-rust`'s last commit. Here are a few of them:

* Avoid putting characters in a `VecDeque<char>` buffer when we can push them directly into a `String`.
* Be a bit smarter about reallocating temporaries: it's best if we know the size in advance, but when we don't, we can sometimes avoid pushing characters 1 at a time.
* The scanner skips over characters one at a time. When skipping them, it needs to check whether they're a linebreak to update the location. Sometimes, we know we are skipping over a character that is not a linebreak (a letter, for instance). Several "skip" functions have been added for these specific uses.
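
The first and last points can be sketched in a few lines. This is a minimal, hypothetical scanner, not `yaml-rust2`'s actual code: `Marker`, `Scanner`, `skip`, `skip_non_linebreak` and `scan_word` are names made up for illustration, and for simplicity only `'\n'` is treated as a linebreak. The general `skip` compares each character against a linebreak to keep the location up to date; the specialized variant is only called when the caller already knows the character is not one, and scanned characters go straight into a `String`.

```rust
/// A position in the input, tracked as the scanner advances.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Marker {
    pub index: usize, // character index (not byte index)
    pub line: usize,  // 1-based line
    pub col: usize,   // 0-based column
}

/// Hypothetical scanner, for illustration only.
pub struct Scanner<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    pub mark: Marker,
}

impl<'a> Scanner<'a> {
    pub fn new(input: &'a str) -> Self {
        Scanner {
            chars: input.chars().peekable(),
            mark: Marker { index: 0, line: 1, col: 0 },
        }
    }

    /// General-purpose skip: must check for a linebreak to update the location.
    pub fn skip(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        self.mark.index += 1;
        if c == '\n' {
            self.mark.line += 1;
            self.mark.col = 0;
        } else {
            self.mark.col += 1;
        }
        Some(c)
    }

    /// Specialized skip for when the caller knows the next character is not
    /// a linebreak: no comparison is needed, only the column moves.
    pub fn skip_non_linebreak(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        debug_assert!(c != '\n' && c != '\r');
        self.mark.index += 1;
        self.mark.col += 1;
        Some(c)
    }

    /// Scans a run of ASCII alphanumeric characters directly into `out`,
    /// without going through an intermediate `VecDeque<char>`.
    pub fn scan_word(&mut self, out: &mut String) {
        while let Some(&c) = self.chars.peek() {
            if !c.is_ascii_alphanumeric() {
                break;
            }
            // `c` is alphanumeric, so it cannot be a linebreak.
            self.skip_non_linebreak();
            out.push(c);
        }
    }
}
```

The point of the split is that the hot path (`scan_word`) never pays for a linebreak check it does not need, while `skip` stays correct for arbitrary input.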

And the big winner, with a runtime decrease of around 15%, was using a statically-sized buffer instead of a dynamically allocated one. (Almost) every character goes from the input stream into the buffer and then gets read from the buffer. This means that `VecDeque::push_back` and `VecDeque::pop_front` were called very frequently, and the former always has to check for capacity. Using an `ArrayDeque` removed the need for constant capacity checks, at the cost of a minor decrease in performance if a line is deeply indented. Hopefully, nobody has 42 nested YAML objects.
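
To illustrate the idea behind a statically-sized buffer, here is a minimal, std-only sketch of a fixed-capacity ring buffer in the spirit of an `ArrayDeque`. It is neither the `arraydeque` crate's implementation nor `yaml-rust2`'s code; `FixedDeque`, its capacity parameter `N` and its methods are hypothetical. The storage is an inline array whose size is known at compile time, so pushing never has to grow or reallocate; a full buffer is reported to the caller instead.

```rust
/// Hypothetical fixed-capacity double-ended queue of `char`s stored inline.
pub struct FixedDeque<const N: usize> {
    buf: [char; N],
    head: usize, // index of the front element
    len: usize,  // number of stored elements
}

impl<const N: usize> FixedDeque<N> {
    pub fn new() -> Self {
        FixedDeque { buf: ['\0'; N], head: 0, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_empty(&self) -> bool {
        self.len == 0
    }

    /// Appends at the back. Never reallocates: if the buffer is full,
    /// the character is handed back as an error.
    pub fn push_back(&mut self, c: char) -> Result<(), char> {
        if self.len == N {
            return Err(c);
        }
        let tail = (self.head + self.len) % N;
        self.buf[tail] = c;
        self.len += 1;
        Ok(())
    }

    /// Removes and returns the front character, if any.
    pub fn pop_front(&mut self) -> Option<char> {
        if self.len == 0 {
            return None;
        }
        let c = self.buf[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(c)
    }

    /// Peeks at the `i`-th character from the front without removing it.
    pub fn get(&self, i: usize) -> Option<char> {
        if i < self.len {
            Some(self.buf[(self.head + i) % N])
        } else {
            None
        }
    }
}
```

Choosing `N` is the main design decision in this sketch; what to do when it is exceeded (here, an `Err`) is left to the caller.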

Here is the final performance breakdown:

![Comparison of the performance between `yaml-rust`, `yaml-rust2` and the C `libfyaml`. `yaml-rust2` is faster in every test than `yaml-rust`, but `libfyaml` remains faster overall.](./img/benchmarks-v0.6.svg)

Here is a short description of what the files contain:

* `big`: A large array of records with few fields. One of the fields is a description, a large text block scalar spanning multiple lines. Most of the scanning happens in block scalars.
* `nested`: Very short key-value pairs that nest deeply.
* `small_objects`: A large array of mappings with 2 key-value pairs each.
* `strings_array`: A large array of lipsum one-liners (~150-175 characters in length).

As you can see, `yaml-rust2` performs better than `yaml-rust` on every benchmark. However, when compared against the C [`libfyaml`](https://github.com/pantoniou/libfyaml), we can see that there is still much room for improvement.

I'd like to end this section with a small disclaimer: I am not a benchmarking expert. I tried to have a heterogeneous set of files that would highlight how the parser performs when stressed in different ways. I invite you to take a look at [the code generating the YAML files](https://github.com/Ethiraric/yaml-rust2/tree/master/tools/gen_large_yaml) and, if you are more knowledgeable than I am, improve upon it. `yaml-rust2` performs better with these files because those are the ones I could work with. If you find a file with which `yaml-rust2` is slower than `yaml-rust`, do file an issue!

## This release

### Improvements from `yaml-rust`

This release should improve on `yaml-rust` in 3 major areas:

* Performance: We all love fast software, and I want to help you achieve it. I haven't managed to make this crate twice as fast, but you should notice a 15-20% improvement in performance.
* Compliance: You may not notice it, since I didn't know most of the bugs I fixed were bugs to begin with, but this crate should now be fully YAML-compliant.
* Documentation: The documentation of `yaml-rust` is unfortunately incomplete. Documentation here is not exhaustive, but most items are documented. Notably, private items are documented, making it much easier to understand where something happens. There are also in-code comments that help figure out what is going on under the hood.

Also, last but not least, I do plan on keeping this crate alive as long as I can. Nobody can make promises in that regard, of course, but I have poured hours of work into this, and I would hate to see it go to waste.

### Switching to `yaml-rust2`

This release is `v0.6.0`, chosen to explicitly differ in minor version from `yaml-rust`. There is no `v0.4.x` of this crate, to avoid any confusion between the 2 crates.

Switching to `yaml-rust2` should be a very simple process. Change your `Cargo.toml` to use `yaml-rust2` instead of `yaml-rust`:

```diff
-yaml-rust = "0.4.4"
+yaml-rust2 = "0.6.0"
```

As for your code, you have two options:

* Change your imports from `use yaml_rust::Yaml` to `use yaml_rust2::Yaml` if you import items directly, or change occurrences of `yaml_rust` to `yaml_rust2` if you use fully qualified paths.
* Alternatively, you can alias `yaml_rust2` with `use yaml_rust2 as yaml_rust`. This would keep your code working if you use fully qualified paths.

Which one you choose is up to you.

[Courtesy of davvid](https://github.com/chyh1990/yaml-rust/issues/160#issuecomment-2008931473), there is another solution. You can combine both approaches and tell `Cargo.toml` to add `yaml-rust2` and create a `yaml_rust` alias for your code with the following:

```diff
-yaml-rust = "0.4.4"
+yaml-rust = { version = "0.6", package = "yaml-rust2" }
```

This allows you to switch to `yaml-rust2` while continuing to refer to `yaml_rust` in your code (e.g. `use yaml_rust::YamlLoader;` will continue to work, so no Rust code changes are required).

#### What about API breakage?

Most of what I have changed is in the implementation details. You might notice more documentation appearing in your LSP, but documentation isn't part of the API. There is only one change I made that could lead to compile errors. It is unlikely you used that feature, but I'd hate to leave it undocumented.

If you use the low-level event parsing API (`Parser`, `EventReceiver` / `MarkedEventReceiver`), and in particular the `yaml_rust::Event` enumeration, there is one change that might break your code. It was needed for tests in the YAML test suite: in `yaml-rust`, YAML tags are not forwarded from the lower-level `Scanner` API to the low-level `Parser` API.

Here is the change that was made in the library:

```diff
 pub enum Event {
     // ...
-    SequenceStart(usize),
-    MappingStart(usize),
+    SequenceStart(usize, Option<Tag>),
+    MappingStart(usize, Option<Tag>),
     // ...
 }
```
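
This means that you may now see YAML tags appearing in your code: any pattern or constructor for `Event::SequenceStart` or `Event::MappingStart` needs to account for the new `Option<Tag>` field.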
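
In practice, updating a match over these events mostly means adding one field to two patterns. Here is a minimal, self-contained sketch. `Tag` and `Event` are local stand-ins mirroring the diff of the `Event` enum (only a few variants are reproduced, the `usize` is treated here as the anchor ID, and the `handle`/`suffix` fields of `Tag` are an assumption for illustration, not necessarily the crate's exact definition). `describe` is a hypothetical handler.

```rust
/// Stand-in for the parser's tag type (fields assumed for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tag {
    pub handle: String,
    pub suffix: String,
}

/// Stand-in for part of the `Event` enum after the change.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    SequenceStart(usize, Option<Tag>),
    MappingStart(usize, Option<Tag>),
    SequenceEnd,
    MappingEnd,
}

/// Hypothetical handler. Before the change, the patterns would have been
/// `Event::SequenceStart(anchor_id)` and `Event::MappingStart(anchor_id)`.
fn describe(ev: &Event) -> String {
    match ev {
        // Code that does not care about tags can ignore the new field.
        Event::SequenceStart(anchor_id, _) => format!("seq (anchor {anchor_id})"),
        // Code that does care can now read the tag.
        Event::MappingStart(anchor_id, Some(tag)) => {
            format!("map (anchor {anchor_id}, tag {}{})", tag.handle, tag.suffix)
        }
        Event::MappingStart(anchor_id, None) => format!("map (anchor {anchor_id})"),
        Event::SequenceEnd | Event::MappingEnd => "other".to_string(),
    }
}
```

Ignoring the field with `_` is the smallest possible migration; matching on `Some(tag)` is only needed if your code wants to act on tags.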

## Closing words

YAML is hard, much harder than I had anticipated. If you are exploring dark corners of YAML that `yaml-rust2` supports but `yaml-rust` doesn't, I'm curious to know what they are.

Work on this crate is far from over. I will try to match `libfyaml`'s performance. Today is the first time I benchmarked against it, and I wouldn't have guessed it would outperform `yaml-rust2` by that much.

If you're interested in upgrading your `yaml-rust` crate, please do take a look at [davvid](https://github.com/davvid)'s [fork of `yaml-rust`](https://github.com/davvid/yaml-rust). Very recent developments on this crate were sparked by an [issue on advisory-db](https://github.com/rustsec/advisory-db/issues/1921) about the unmaintained state of `yaml-rust`. I hope that YAML in Rust will improve following this issue.

Thank you for reading through this. If you happen to have issues with `yaml-rust2` or suggestions for it, do [drop an issue](https://github.com/Ethiraric/yaml-rust2/issues)!

If, however, you wanted an OpenAPI linter, I'm afraid you're out of luck, just as much as I'm out of time ;)

-Ethiraric

EDIT(20-03-2024): Add davvid's method of switching to `yaml-rust2` by creating a Cargo alias.