Update to v0.6.

This commit is contained in:
Ethiraric 2024-03-15 20:14:26 +01:00
parent dc88910c23
commit e4ae1d0546
8 changed files with 233 additions and 6 deletions

View file

@ -1,6 +1,6 @@
[package]
name = "yaml-rust2"
-version = "0.5.0"
+version = "0.6.0"
authors = [
"Yuheng Chen <yuhengchen@sensetime.com>",
"Ethiraric <ethiraric@gmail.com>"

View file

@ -15,7 +15,7 @@ Add the following to the Cargo.toml of your project:
```toml
[dependencies]
-yaml-rust2 = "0.5"
+yaml-rust2 = "0.6"
```
Use `yaml_rust2::YamlLoader` to load YAML documents and access them as `Yaml` objects:

View file

@ -0,0 +1,153 @@
# `yaml-rust2`'s first real release
If you are not interested in how this crate was born and just want to know what differs from `yaml-rust`, scroll down to
["This release" or click here](#this-release).
## The why
Sometime in August 2023, an ordinary developer (that's me) felt the urge to start scribbling about an OpenAPI linter. I
had worked with the OpenAPI format and tried different linters, but none of them felt right. Needing 3 different
linters to lint my OpenAPI documents was a pain. Like any sane person would, I decided to write my own (author's note: you are
not not sane if you wouldn't). In order to get things started, I needed a YAML parser.
On August 14th 2023, I forked `yaml-rust` and started working on it. The crate stated that some YAML features were not
yet available and I felt that was an issue I could tackle. I started by getting to know the code, understanding it,
adding warnings, refactoring, tinkering, documenting, and so on. Anything I could do that made me feel the codebase was
better, I would do. I wanted this crate to be as clean as it could be.
## Fixing YAML compliance
In my quest to understand YAML better, I found [the YAML test suite](https://github.com/yaml/yaml-test-suite/): a
compilation of corner cases and intricate YAML examples with their expected output / behavior. Interestingly enough,
there was an [open pull request on yaml-rust](https://github.com/chyh1990/yaml-rust/pull/187) by
[tanriol](https://github.com/tanriol) which integrated the YAML test suite as part of the crate tests. Comments mention
that the maintainer wasn't around anymore and that new contributions would probably never be accepted.
That, however, was a problem for future-past-me, as I was determined (somehow) to have `yaml-rust` pass every single
test of the YAML test suite. Slowly, over the course of multiple months (from August 2023 to January 2024), I would
sometimes pick a test from the test suite, fix it, commit and start again. On the 23rd of January, the last commit
fixing a test was created.
According to the [YAML test matrix](https://matrix.yaml.info/), there is to this day only 1 library that is fully
compliant (aside from the Perl parser generated by the reference). This would make `yaml-rust2` the second library to be
fully YAML-compliant. You really wouldn't believe how much you have to stretch YAML so that it's not valid YAML anymore.
## Performance
With so many improvements, the crate was now perfect! Except for performance. Adding conditions for every little bit
of compliance had led the code to become much more complex and branch-y, which CPUs hate. The crate was around 20%
slower than the code was when I started.
For a bit over 3 weeks, I stared at flamegraphs and made my CPU repeat the same instructions until it could do it
faster. There have been a bunch of improvements for performance since `yaml-rust`'s last commit. Here are a few of them:
* Avoid putting characters in a `VecDeque<char>` buffer when we can push them directly into a `String`.
* Be a bit smarter about reallocating temporaries: it's best if we know the size in advance, but when we don't we can
sometimes avoid pushing characters 1 at a time.
* The scanner skips over characters one at a time. When skipping them, it needs to check whether they're a linebreak to
update the location. Sometimes, we know we skip over a letter (which is not a linebreak). Several "skip" functions
have been added for specific uses.
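To illustrate that last point, here is a minimal sketch of the idea. It is not the crate's actual code: the `Cursor` type and the names `skip` and `skip_non_blank` are hypothetical and only show how a specialized skip can drop the linebreak check when the caller already knows what the next character is.

```rust
/// Hypothetical, simplified position tracker (not the crate's real scanner).
struct Cursor<'a> {
    chars: std::str::Chars<'a>,
    line: usize,
    col: usize,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { chars: input.chars(), line: 1, col: 0 }
    }

    /// General skip: has to check whether the character is a linebreak
    /// in order to keep the location up to date.
    fn skip(&mut self) {
        if let Some(c) = self.chars.next() {
            if c == '\n' {
                self.line += 1;
                self.col = 0;
            } else {
                self.col += 1;
            }
        }
    }

    /// Specialized skip: the caller guarantees the next character is not a
    /// linebreak, so only the column needs updating.
    fn skip_non_blank(&mut self) {
        if self.chars.next().is_some() {
            self.col += 1;
        }
    }
}

fn main() {
    let mut cursor = Cursor::new("ab\ncd");
    cursor.skip_non_blank(); // 'a'
    cursor.skip_non_blank(); // 'b'
    cursor.skip(); // '\n', handled by the general version
    cursor.skip_non_blank(); // 'c'
    println!("line {}, column {}", cursor.line, cursor.col);
}
```

The saving per call is tiny, but these functions sit on the hottest path of the scanner, so it adds up.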
And the big winner, with a decrease in runtime of around 15%, was using a statically-sized buffer instead of a dynamically
allocated one. (Almost) Every character goes from the input stream into the buffer and then gets read from the buffer.
This means that `VecDeque::push` and `VecDeque::pop` were called very frequently. The former always has to check for
capacity. Using an `ArrayDeque` removed the need for constant capacity checks, at the cost of a minor decrease in
performance if a line is deeply indented. Hopefully, nobody has 42 nested YAML objects.
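For readers who have never used a fixed-capacity queue, here is a minimal, std-only sketch of the concept. It is an illustration under my own assumptions, not the `ArrayDeque` type the crate uses: `FixedBuf` and its methods are hypothetical names. The point is that the storage lives inline and is never reallocated, so a full buffer is reported to the caller instead of growing.

```rust
/// Hypothetical fixed-capacity ring buffer of characters.
/// The capacity `N` is known at compile time and the storage never grows.
struct FixedBuf<const N: usize> {
    data: [char; N],
    head: usize,
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    fn new() -> Self {
        FixedBuf { data: ['\0'; N], head: 0, len: 0 }
    }

    /// Appends a character, or hands it back if the buffer is full.
    /// There is no reallocation path at all.
    fn push_back(&mut self, c: char) -> Result<(), char> {
        if self.len == N {
            return Err(c);
        }
        let tail = (self.head + self.len) % N;
        self.data[tail] = c;
        self.len += 1;
        Ok(())
    }

    /// Removes and returns the oldest character, if any.
    fn pop_front(&mut self) -> Option<char> {
        if self.len == 0 {
            return None;
        }
        let c = self.data[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(c)
    }
}

fn main() {
    let mut buf: FixedBuf<4> = FixedBuf::new();
    for c in "key:".chars() {
        buf.push_back(c).expect("buffer has room for 4 characters");
    }
    // A fifth character does not fit and is handed back to the caller.
    println!("{:?}", buf.push_back(' '));
    while let Some(c) = buf.pop_front() {
        print!("{}", c);
    }
    println!();
}
```

The trade-off is the one described above: when more characters are needed than the buffer can hold, the caller has to fall back to a slower path.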
Here is the final performance breakdown:
![Comparison of the performance between `yaml-rust`, `yaml-rust2` and the C `libfyaml`. `yaml-rust2` is faster in every
test than `yaml-rust`, but `libfyaml` remains faster overall.](./img/benchmarks-v0.6.svg)
Here is a short description of what the files contain:
* `big`: A large array of records with few fields. One of the fields is a description, a large text block scalar
spanning multiple lines. Most of the scanning happens in block scalars.
* `nested`: Very short key-value pairs that nest deeply.
* `small_objects`: A large array of 2 key-value mappings.
* `strings_array`: A large array of lipsum one-liners (~150-175 characters in length).
As you can see, `yaml-rust2` performs better than `yaml-rust` on every benchmark. However, when compared against the C
[`libfyaml`](https://github.com/pantoniou/libfyaml), we can see that there is still much room for improvement.
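If you prefer numbers to bars, the relative gain can be worked out from the raw timings recorded in the CSV file added alongside this post. The snippet below is a small sketch of that arithmetic. The helper name `reduction_percent` is mine, and the timings are copied as-is from the CSV.

```rust
/// Percentage by which `new` is lower than `baseline`.
fn reduction_percent(baseline: u64, new: u64) -> f64 {
    (baseline as f64 - new as f64) / baseline as f64 * 100.0
}

fn main() {
    // (file, yaml-rust2 timing, yaml-rust timing), copied from the CSV.
    let results: [(&str, u64, u64); 4] = [
        ("big", 1_644_933_464, 2_097_747_837),
        ("nested", 1_186_706_803, 1_461_738_560),
        ("small_objects", 5_459_915_062, 5_686_715_239),
        ("strings_array", 1_698_194_153, 2_044_921_291),
    ];

    for (name, yaml_rust2, yaml_rust) in results.iter() {
        println!(
            "{}: yaml-rust2 takes {:.1}% less time than yaml-rust",
            name,
            reduction_percent(*yaml_rust, *yaml_rust2)
        );
    }
}
```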
I'd like to end this section with a small disclaimer: I am not a benchmark expert. I tried to have a heterogeneous set
of files that would highlight how the parser performs when stressed in different ways. I invite you to take a look at [the
code generating the YAML files](https://github.com/Ethiraric/yaml-rust2/tree/master/tools/gen_large_yaml) and, if you
are more knowledgeable than I am, improve upon them. `yaml-rust2` performs better with these files because those are the
ones I could work with. If you find a file with which `yaml-rust2` is slower than `yaml-rust`, do file an issue!
## This release
### Improvements from `yaml-rust`
This release should improve over `yaml-rust` on 3 major points:
* Performance: We all love fast software. I want to help you achieve it. I haven't managed to make this crate twice as
fast, but you should notice a 15-20% improvement in performance.
* Compliance: You may not notice it, since I didn't know most of the bugs I fixed were bugs to begin with, but this
crate should now be fully YAML-compliant.
* Documentation: The documentation of `yaml-rust` is unfortunately incomplete. Documentation here is not exhaustive,
but most items are documented. Notably, private items are documented, making it much easier to understand where
something happens. There are also in-code comments that help figure out what is going on under the hood.
Also, last but not least, I do plan on keeping this crate alive as long as I can. Nobody can make promises in that
regard, of course, but I have poured hours of work into this, and I would hate to see this go to waste.
### Switching to `yaml-rust2`
This release is `v0.6.0`, chosen to explicitly differ in minor version from `yaml-rust`. `v0.4.x` does not exist in this crate
to avoid any confusion between the 2 crates.
Switching to `yaml-rust2` should be a very simple process. Change your `Cargo.toml` to use `yaml-rust2` instead of
`yaml-rust`:
```diff
-yaml-rust = "0.4.4"
+yaml-rust2 = "0.6.0"
```
As for your code, you have one of two solutions:
* Change your imports from `use yaml_rust::Yaml` to `use yaml_rust2::Yaml` if you import items directly, or change
occurrences of `yaml_rust` to `yaml_rust2` if you use fully qualified paths.
* Alternatively, you can alias `yaml_rust2` with `use yaml_rust2 as yaml_rust`. This would keep your code working if
you use fully qualified paths.
Whichever you decide is up to you.
#### What about API breakage?
Most of what I have changed is in the implementation details. You might notice more documentation appearing on your LSP,
but documentation isn't bound by the API. There is only one change I made that could lead to compile errors. It is
unlikely you used that feature, but I'd hate to leave this undocumented.
If you use the low-level event parsing API (`Parser`, `EventReceiver` / `MarkedEventReceiver`) and in particular the
`yaml_rust::Event` enumeration, there is one change that might
break your code. This was needed for tests in the YAML test suite. In `yaml-rust`, YAML tags are not forwarded from the
lower-level `Scanner` API to the low-level `Parser` API.
Here is the change that was made in the library:
```diff
 pub enum Event {
     // ...
-    SequenceStart(usize),
-    MappingStart(usize),
+    SequenceStart(usize, Option<Tag>),
+    MappingStart(usize, Option<Tag>),
     // ...
 }
```
This means that you may now see YAML tags appearing in your code.
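In practice, fixing the compile error usually means adding one more binding to your patterns. The sketch below shows the shape of that fix. It does not use the real crate: `Event` and `Tag` here are stand-ins that only mirror the two variants from the diff above, and the `name` field and the `describe` function are hypothetical.

```rust
/// Stand-in for the crate's tag type. The `name` field is hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Tag {
    name: String,
}

/// Stand-in enum mirroring only the two variants shown in the diff.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    SequenceStart(usize, Option<Tag>),
    MappingStart(usize, Option<Tag>),
}

fn describe(event: &Event) -> String {
    match event {
        // Old pattern was `SequenceStart(id)`. Adding `_` ignores the tag
        // and keeps the previous behavior.
        Event::SequenceStart(id, _) => format!("sequence {}", id),
        // Alternatively, inspect the tag when it is present.
        Event::MappingStart(id, Some(tag)) => {
            format!("mapping {} tagged {}", id, tag.name)
        }
        Event::MappingStart(id, None) => format!("mapping {}", id),
    }
}

fn main() {
    let tag = Tag { name: String::from("!point") };
    println!("{}", describe(&Event::SequenceStart(0, None)));
    println!("{}", describe(&Event::MappingStart(1, Some(tag))));
}
```

If you never cared about tags, the `_` form is all you need.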
## Closing words
YAML is hard. Much harder than I had anticipated. If you are exploring dark corners of YAML that `yaml-rust2` supports but
`yaml-rust` doesn't, I'm curious to know what they are.
Work on this crate is far from over. I will try and match `libfyaml`'s performance. Today is the first time I benched
against it, and I wouldn't have guessed it to outperform `yaml-rust2` that much.
If you're interested in upgrading your `yaml-rust` crate, please do take a look at [davvid](https://github.com/davvid)'s
[fork of `yaml-rust`](https://github.com/davvid/yaml-rust). Very recent developments on this crate sparked from an
[issue on advisory-db](https://github.com/rustsec/advisory-db/issues/1921) about the unmaintained state of `yaml-rust`.
I hope that YAML in Rust will improve following this issue.
Thank you for reading through this. If you happen to have issues with `yaml-rust2` or suggestions, do [drop an
issue](https://github.com/Ethiraric/yaml-rust2/issues)!
If however you wanted an OpenAPI linter, I'm afraid you're out of luck. Just as much as I'm out of time ;)
-Ethiraric

View file

@ -0,0 +1,5 @@
,yaml-rust2,yaml-rust,libfyaml
big.yaml,1644933464,2097747837,1642761913
nested.yaml,1186706803,1461738560,1104480120
small_objects.yaml,5459915062,5686715239,4402878726
strings_array.yaml,1698194153,2044921291,924246153

View file

@ -0,0 +1,69 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="177mm" height="92mm" viewBox="0 0 17700 9200" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" stroke-width="28.222" stroke-linejoin="round" xml:space="preserve">
<path fill="rgb(255,255,255)" stroke="none" d="M 8856,9178 L -13,9178 -13,-13 17724,-13 17724,9178 8856,9178 Z"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 8976,8348 L 2926,8348 2926,370 15027,370 15027,8348 8976,8348 Z"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,8347 L 2926,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,7017 L 2926,7017"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,5687 L 2926,5687"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,4358 L 2926,4358"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,3028 L 2926,3028"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,1698 L 2926,1698"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,368 L 2926,368"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2926,8497 L 2926,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2926,8497 L 2926,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 5951,8497 L 5951,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 5951,8497 L 5951,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 8976,8497 L 8976,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 8976,8497 L 8976,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 12001,8497 L 12001,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 12001,8497 L 12001,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,8497 L 15027,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 15027,8497 L 15027,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2926,8347 L 15027,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,8347 L 2926,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,8347 L 2926,8347"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,7017 L 2926,7017"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,7017 L 2926,7017"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,5687 L 2926,5687"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,5687 L 2926,5687"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,4358 L 2926,4358"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,4358 L 2926,4358"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,3028 L 2926,3028"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,3028 L 2926,3028"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,1698 L 2926,1698"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,1698 L 2926,1698"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,368 L 2926,368"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2776,368 L 2926,368"/>
<path fill="none" stroke="rgb(179,179,179)" stroke-linejoin="round" d="M 2926,8347 L 2926,368"/>
<path fill="rgb(248,203,173)" stroke="none" d="M 3304,8347 L 4060,8347 4060,5557 3304,5557 3304,8347 Z"/>
<path fill="rgb(248,203,173)" stroke="none" d="M 6329,8347 L 7086,8347 7086,6403 6329,6403 6329,8347 Z"/>
<path fill="rgb(248,203,173)" stroke="none" d="M 9354,8347 L 10110,8347 10110,785 9354,785 9354,8347 Z"/>
<path fill="rgb(248,203,173)" stroke="none" d="M 12379,8347 L 13136,8347 13136,5627 12379,5627 12379,8347 Z"/>
<path fill="rgb(198,224,180)" stroke="none" d="M 4060,8347 L 4816,8347 4816,6159 4060,6159 4060,8347 Z"/>
<path fill="rgb(198,224,180)" stroke="none" d="M 7086,8347 L 7842,8347 7842,6768 7086,6768 7086,8347 Z"/>
<path fill="rgb(198,224,180)" stroke="none" d="M 10110,8347 L 10866,8347 10866,1087 10110,1087 10110,8347 Z"/>
<path fill="rgb(198,224,180)" stroke="none" d="M 13136,8347 L 13892,8347 13892,6088 13136,6088 13136,8347 Z"/>
<path fill="rgb(189,215,238)" stroke="none" d="M 4816,8347 L 5573,8347 5573,6162 4816,6162 4816,8347 Z"/>
<path fill="rgb(189,215,238)" stroke="none" d="M 7842,8347 L 8598,8347 8598,6878 7842,6878 7842,8347 Z"/>
<path fill="rgb(189,215,238)" stroke="none" d="M 10866,8347 L 11623,8347 11623,2492 10866,2492 10866,8347 Z"/>
<path fill="rgb(189,215,238)" stroke="none" d="M 13892,8347 L 14648,8347 14648,7117 13892,7117 13892,8347 Z"/>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="4213" y="8915"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">big</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="6948" y="8915"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">nested</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="9456" y="8915"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">small_objects</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="12522" y="8915"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">strings_array</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="2491" y="8467"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">0</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="7137"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">1000000</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="5807"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">2000000</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="4478"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">3000000</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="3148"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">4000000</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="1818"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">5000000</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="1353" y="488"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">6000000</tspan></tspan></tspan></text>
<path fill="rgb(248,203,173)" stroke="none" d="M 15603,4190 L 15497,4190 15497,3979 15708,3979 15708,4190 15603,4190 Z"/>
<path fill="rgb(198,224,180)" stroke="none" d="M 15603,4687 L 15497,4687 15497,4477 15708,4477 15708,4687 15603,4687 Z"/>
<path fill="rgb(189,215,238)" stroke="none" d="M 15603,5185 L 15497,5185 15497,4974 15708,4974 15708,5185 15603,5185 Z"/>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="15808" y="4204"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">yaml-rust</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="15808" y="4701"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">yaml-rust2</tspan></tspan></tspan></text>
<text class="SVGTextShape"><tspan class="TextParagraph"><tspan class="TextPosition" x="15808" y="5199"><tspan font-family="Liberation Sans, sans-serif" font-size="353px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">libfyaml</tspan></tspan></tspan></text>
<text class="SVGTextShape" transform="rotate(-90 -15451 4894)"><tspan class="TextParagraph"><tspan class="TextPosition" x="824" y="6394"><tspan font-family="Liberation Sans, sans-serif" font-size="318px" font-weight="400" fill="rgb(0,0,0)" stroke="none" style="white-space: pre">Time in ms (less is better)</tspan></tspan></tspan></text>
</svg>


View file

@ -11,7 +11,7 @@
//!
//! ```toml
//! [dependencies]
-//! yaml-rust2 = "0.5.0"
+//! yaml-rust2 = "0.6.0"
//! ```
//!
//! # Examples

View file

@ -1,6 +1,6 @@
[package]
name = "bench_compare"
-version = "0.5.0"
+version = "0.6.0"
authors = [
"Ethiraric <ethiraric@gmail.com>"
]

View file

@ -1,6 +1,6 @@
[package]
name = "gen_large_yaml"
-version = "0.5.0"
+version = "0.6.0"
authors = [
"Ethiraric <ethiraric@gmail.com>"
]
@ -11,7 +11,7 @@ readme = "README.md"
edition = "2018"
[dependencies]
-yaml-rust2 = { version = "0.5.0", path = "../../" }
+yaml-rust2 = { version = "0.6.0", path = "../../" }
rand = "0.8.5"
lipsum = "0.9.0"