Some cleanup after monorepo-ing.

- Update root `README.md`
- Remove `bench/` tools we no longer need
- Remove `.gitmodules` for `yaml-test-suite`
Ethiraric 2024-10-12 16:29:33 +02:00
parent 3f9b8c22a3
commit 8ee4921e5e
7 changed files with 62 additions and 268 deletions

154
README.md

@@ -1,116 +1,73 @@
# saphyr
# Saphyr libraries
[saphyr](https://github.com/saphyr-rs/saphyr) is a fully compliant YAML 1.2
library written in pure Rust.
This repository is home to `saphyr-parser`, `saphyr` and soon-to-be
`saphyr-serde`. These crates provide fully YAML 1.2 compliant parsing and
manipulation, with a focus on correctness, performance and API friendliness (in that order).
This work is based on [`yaml-rust`](https://github.com/chyh1990/yaml-rust) with
fixes towards being compliant to the [YAML test
suite](https://github.com/yaml/yaml-test-suite/). `yaml-rust`'s parser is
heavily influenced by `libyaml` and `yaml-cpp`.
[`saphyr`](https://docs.rs/saphyr/latest/saphyr/) is the most user-friendly and
high-level crate, providing quick-and-easy YAML importing, exporting and object
manipulation.
`saphyr` is a pure Rust YAML 1.2 implementation that benefits from the
memory safety and other benefits from the Rust language.
```rs
use saphyr::{YamlLoader, YamlEmitter};
## Quick Start
### Installing
Add the following to your Cargo.toml:
let docs = YamlLoader::load_from_str("[1, 2, 3]").unwrap();
let doc = &docs[0]; // select the first YAML document
assert_eq!(doc[0].as_i64().unwrap(), 1); // access elements by index
```toml
[dependencies]
saphyr = "0.0.1"
```
or use `cargo add` to get the latest version automatically:
```sh
cargo add saphyr
```
### Example
Use `saphyr::YamlLoader` to load YAML documents and access them as `Yaml` objects:
```rust
use saphyr::{Yaml, YamlEmitter};
fn main() {
let s =
"
foo:
- list1
- list2
bar:
- 1
- 2.0
";
let docs = Yaml::load_from_str(s).unwrap();
// Multi document support, doc is a yaml::Yaml
let doc = &docs[0];
// Debug support
println!("{:?}", doc);
// Index access for map & array
assert_eq!(doc["foo"][0].as_str().unwrap(), "list1");
assert_eq!(doc["bar"][1].as_f64().unwrap(), 2.0);
// Array/map-like accesses are checked and won't panic.
// They will return `BadValue` if the access is invalid.
assert!(doc["INVALID_KEY"][100].is_badvalue());
// Dump the YAML object
let mut out_str = String::new();
{
let mut emitter = YamlEmitter::new(&mut out_str);
emitter.dump(doc).unwrap(); // dump the YAML object to a String
}
println!("{}", out_str);
}
```
Note that `saphyr::Yaml` implements `Index<&'a str>` and `Index<usize>`:
---
* `Index<usize>` assumes the container is an array
* `Index<&'a str>` assumes the container is a string to value map
* otherwise, `Yaml::BadValue` is returned
[`saphyr-parser`](https://docs.rs/saphyr-parser/latest/saphyr_parser/) is the
parser behind `saphyr`. It provides direct access to the parsing process by
emitting [YAML
events](https://docs.rs/saphyr-parser/latest/saphyr_parser/parser/enum.Event.html).
It does not include YAML to object mapping, but is a lightweight alternative to
`saphyr` for those interested in building directly atop the parser, without
having an intermediate conversion to a Rust object. More details on where to
start are available [on
docs.rs](https://docs.rs/saphyr-parser/latest/saphyr_parser/parser/trait.EventReceiver.html).
Note that `annotated::YamlData` cannot return `BadValue` and will panic.
```rs
/// Sink of events. Collects them into an array.
struct EventSink {
events: Vec<Event>,
}
If your document does not conform to this convention (e.g. map with complex
type key), you can use the `Yaml::as_XXX` family API of functions to access
your objects.
/// Implement `on_event`, pushing into `self.events`.
impl EventReceiver for EventSink {
fn on_event(&mut self, ev: Event) {
self.events.push(ev);
}
}
## Features
* Pure Rust
* `Vec`/`HashMap` access API
## Security
This library does not try to interpret any type specifiers in a YAML document,
so there is no risk of, say, instantiating a socket with fields and
communicating with the outside world just by parsing a YAML document.
/// Load events from a yaml string.
fn str_to_events(yaml: &str) -> Vec<Event> {
let mut sink = EventSink { events: Vec::new() };
let mut parser = Parser::new_from_str(yaml);
// Load events using our sink as the receiver.
parser.load(&mut sink, true).unwrap();
sink.events
}
```
## Specification Compliance
This implementation is fully compatible with the YAML 1.2 specification. The
parser behind this library
([`saphyr-parser`](https://github.com/saphyr-rs/saphyr-parser)) tests against
(and passes) the [YAML test suite](https://github.com/yaml/yaml-test-suite/).
This implementation is fully compatible with the YAML 1.2 specification. The
parser (`saphyr-parser`) tests against (and passes) the [YAML test
suite](https://github.com/yaml/yaml-test-suite/).
## License
Licensed under either of
* Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
* MIT license (http://opensource.org/licenses/MIT)
at your option.
Since this repository was originally maintained by
[chyh1990](https://github.com/chyh1990), there are 2 sets of licenses.
A license of each set must be included in redistributions. See the
[LICENSE](LICENSE) file for more details.
You can find licences in the [`.licenses`](.licenses) subfolder.
Sets of licences are available for each of the crates. Due to this project
being based on a fork of [chyh1990's
`yaml-rust`](https://github.com/chyh1990/yaml-rust), there are 2 licenses to be
included if using `saphyr` or `saphyr-parser`. Refer to the projects' READMEs
for details.
## Contribution
@@ -123,13 +80,16 @@ for inclusion in the work by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.
## Links
* [saphyr source code repository](https://github.com/saphyr-rs/saphyr)
### `saphyr`
* [saphyr source code repository](https://github.com/saphyr-rs/saphyr/tree/master/saphyr)
* [saphyr releases on crates.io](https://crates.io/crates/saphyr)
* [saphyr documentation on docs.rs](https://docs.rs/saphyr/latest/saphyr/)
### `saphyr-parser`
* [saphyr-parser source code repository](https://github.com/saphyr-rs/saphyr/tree/master/parser)
* [saphyr-parser releases on crates.io](https://crates.io/crates/saphyr-parser)
* [saphyr-parser documentation on docs.rs](https://docs.rs/saphyr-parser/latest/saphyr_parser/)
### Other links
* [yaml-test-suite](https://github.com/yaml/yaml-test-suite)
* [YAML 1.2 specification](https://yaml.org/spec/1.2.2/)


@@ -9,11 +9,3 @@ version = { workspace = true }
[dependencies]
saphyr-parser = { workspace = true }
[[bin]]
name = "time_parse"
path = "tools/time_parse.rs"
[[bin]]
name = "run_bench"
path = "tools/run_bench.rs"


@@ -1,14 +1,12 @@
# `yaml-rust2` tools
# `saphyr` tools
This directory contains tools that are used to develop the crate.
Due to dependency management, only some of them are available as binaries from the `yaml-rust2` crate.
Due to dependency management, only some of them are available as binaries from the `saphyr` crate.
| Tool | Invocation |
|------|------------|
| `bench_compare` | `cargo bench_compare` |
| `dump_events` | `cargo run --bin dump_events -- [...]` |
| `gen_large_yaml` | `cargo gen_large_yaml` |
| `run_bench` | `cargo run --bin run_bench -- [...]` |
| `time_parse` | `cargo run --bin time_parse -- [...]` |
## `bench_compare`
See the [dedicated README file](./bench_compare/README.md).
@@ -175,55 +173,3 @@ The generated files are the following:
All generated files are meant to be between 200 and 250 MiB in size.
This tool depends on external dependencies that are not part of `yaml-rust2`'s dependencies or `dev-dependencies` and as such can't be called through `cargo run` directly. A dedicated `cargo gen_large_yaml` alias can be used to generate the benchmark files.
## `run_bench`
This is a benchmarking helper that runs the parser on the given file a given number of times and is able to extract simple metrics out of the results. The `--output-yaml` flag can be specified to make the output a YAML file that can be fed into other tools.
This binary is made to be used by `bench_compare`.
Synopsis: `run_bench input.yaml <iterations> [--output-yaml]`
### Examples
```sh
$> cargo run --release --bin run_bench -- bench_yaml/big.yaml 10
Average: 1.631936191s
Min: 1.629654651s
Max: 1.633045284s
95%: 1.633045284s
$> cargo run --release --bin run_bench -- bench_yaml/big.yaml 10 --output-yaml
parser: yaml-rust2
input: bench_yaml/big.yaml
average: 1649847674
min: 1648277149
max: 1651936305
percentile95: 1651936305
iterations: 10
times:
- 1650216129
- 1649349978
- 1649507018
- 1648277149
- 1649036548
- 1650323982
- 1650917692
- 1648702081
- 1650209860
- 1651936305
```
## `time_parse`
This is a benchmarking helper that times how long it takes for the parser to emit all events. It calls the parser on the given input file, receives parsing events and then immediately discards them. It is advised to run this tool with `--release`.
### Examples
Loading a small file could output the following:
```sh
$> cargo run --release --bin time_parse -- input.yaml
Loaded 0MiB in 14.189µs
```
While loading a larger file could output the following:
```sh
$> cargo run --release --bin time_parse -- bench_yaml/big.yaml
Loaded 220MiB in 1.612677853s
```


@@ -1,68 +0,0 @@
#![allow(clippy::cast_possible_truncation, clippy::cast_precision_loss)]
use std::{env, fs::File, io::prelude::*};
use saphyr_parser::{Event, MarkedEventReceiver, Marker, Parser};
/// A sink which discards any event sent.
struct NullSink {}
impl MarkedEventReceiver for NullSink {
fn on_event(&mut self, _: Event, _: Marker) {}
}
/// Parse the given input, returning elapsed time in nanoseconds.
fn do_parse(input: &str) -> u64 {
let mut sink = NullSink {};
let mut parser = Parser::new_from_str(input);
let begin = std::time::Instant::now();
parser.load(&mut sink, true).unwrap();
let end = std::time::Instant::now();
(end - begin).as_nanos() as u64
}
fn main() {
let args: Vec<_> = env::args().collect();
let iterations: u64 = args[2].parse().unwrap();
let output_yaml = args.len() == 4 && args[3] == "--output-yaml";
let mut f = File::open(&args[1]).unwrap();
let mut s = String::new();
f.read_to_string(&mut s).unwrap();
// Warmup
do_parse(&s);
do_parse(&s);
do_parse(&s);
// Bench
let times: Vec<_> = (0..iterations).map(|_| do_parse(&s)).collect();
let mut sorted_times = times.clone();
sorted_times.sort_unstable();
// Compute relevant metrics.
let sum: u64 = times.iter().sum();
let avg = sum / iterations;
let min = sorted_times[0];
let max = sorted_times[(iterations - 1) as usize];
let percentile95 = sorted_times[((95 * iterations) / 100) as usize];
if output_yaml {
println!("parser: yaml-rust2");
println!("input: {}", args[1]);
println!("average: {avg}");
println!("min: {min}");
println!("max: {max}");
println!("percentile95: {percentile95}");
println!("iterations: {iterations}");
println!("times:");
for time in &times {
println!(" - {time}");
}
} else {
println!("Average: {}s", (avg as f64) / 1_000_000_000.0);
println!("Min: {}s", (min as f64) / 1_000_000_000.0);
println!("Max: {}s", (max as f64) / 1_000_000_000.0);
println!("95%: {}s", (percentile95 as f64) / 1_000_000_000.0);
}
}
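
Review note: the metric computation in the removed `run_bench` tool can be sketched on its own, without the parser, for anyone who wants to reproduce the numbers elsewhere. The `Summary` struct and `summarize` function below are hypothetical names introduced for illustration and are not part of `saphyr`. The sketch assumes timings are given in nanoseconds and keeps the tool's index formula for the 95th percentile, `(95 * n) / 100`, which is always a valid index for non-empty input.

```rust
/// Summary statistics over benchmark timings, in nanoseconds.
/// Hypothetical type, introduced for this sketch only.
#[derive(Debug, PartialEq)]
struct Summary {
    average: u64,
    min: u64,
    max: u64,
    percentile95: u64,
}

/// Compute average, min, max and 95th percentile of `times`.
/// Returns `None` for empty input instead of dividing by zero.
fn summarize(times: &[u64]) -> Option<Summary> {
    if times.is_empty() {
        return None;
    }
    let mut sorted = times.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    let sum: u64 = sorted.iter().sum();
    // Same index formula as the removed tool. Integer division floors,
    // so the result is at most n - 1 for any n >= 1.
    let p95_index = (95 * n) / 100;
    Some(Summary {
        average: sum / n as u64,
        min: sorted[0],
        max: sorted[n - 1],
        percentile95: sorted[p95_index],
    })
}
```

With 10 iterations the percentile index is 9, so the reported 95th percentile equals the maximum, which matches the sample output in the tools README where `95%` and `Max` are identical.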


@@ -1,33 +0,0 @@
use std::env;
use std::fs::File;
use std::io::prelude::*;
use saphyr_parser::{Event, MarkedEventReceiver, Marker, Parser};
/// A sink which discards any event sent.
struct NullSink {}
impl MarkedEventReceiver for NullSink {
fn on_event(&mut self, _: Event, _: Marker) {}
}
fn main() {
let args: Vec<_> = env::args().collect();
let mut f = File::open(&args[1]).unwrap();
let mut s = String::new();
f.read_to_string(&mut s).unwrap();
let mut sink = NullSink {};
let mut parser = Parser::new_from_str(&s);
// Load events using our sink as the receiver.
let begin = std::time::Instant::now();
parser.load(&mut sink, true).unwrap();
let end = std::time::Instant::now();
if args.len() == 3 && args[2] == "--short" {
println!("{}", (end - begin).as_nanos());
} else {
println!("Loaded {}MiB in {:?}", s.len() / 1024 / 1024, end - begin);
}
}

3
parser/.gitmodules vendored

@@ -1,3 +0,0 @@
[submodule "tests/yaml-test-suite"]
path = tests/yaml-test-suite
url = https://github.com/yaml/yaml-test-suite/


@@ -33,9 +33,9 @@ name = "dump_events"
path = "tools/dump_events.rs"
[[bin]]
name = "time_parser_parse"
name = "time_parser"
path = "tools/time_parse.rs"
[[bin]]
name = "run_parser_bench"
name = "run_parser"
path = "tools/run_bench.rs"