Overengineered HTML Codeblocks

Tricks for better code blocks on the web ("better" is subjective).

Published Oct 24, 2024. Revised Mar 16, 2025.

The What

#![feature(allocator_api)]
use std::{
    alloc::{Allocator, Global},
    cell::Cell,
    marker::PhantomData,
    ptr::NonNull,
};

pub struct Rc<T: ?Sized, A: Allocator = Global> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
    alloc: A,
}

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

fn main() {
    println!("This example isn't particularly interesting to run...");
}

By the way, the sample above was created using the following options in my homegrown markup language:

#{language = "rust", file = "alloc/rc.rs", number = 4, highlight = "14,19..=20", hide = "4..=11,17,23..", tool = #["playground", "godbolt", "clipboard"], "playground-version"="nightly"}

Features

Line numbering with optional starting number
Line highlighting, for multiple disjoint ranges
Line hiding, for multiple disjoint ranges
- Allow toggling visibility of the hidden lines
- Show an indicator that some lines are hidden
Codeblock metadata, for example the file name or the programming language used.
Allow copying codeblock contents to the clipboard.
Allow linking to online execution environments that contain the contents of the codeblock
Allow IDE-like hover information for tokens through clicking.

Requirements

HTML used should be semantic and accessible.
Interactive items should work with keyboard navigation.
Generated HTML should look proper in Firefox Reader Mode.
To the greatest extent possible, all features should work without Javascript.

The How

Note: I'm going to assume you have control of the markup -> HTML pipeline in some shape or form. Personally, I use djot and a custom filter in my static site generator.

Line Numbering

Wrap each line of code: <span class="line">.
Add a class to the containing codeblock: <code class="number-lines">.
If the starting number is not 1, use a custom property to set the starting number: style="--number-start: 4".
(Why not data attributes? As much as I would like to just have data-line-number=4 instead of a class and a CSS variable, as of today the attr() function only works with the content property.)
Create a CSS counter, then use the ::before pseudo-element to add the current line number. Using the ::before pseudo-element ensures that selecting the codeblock content with a mouse does not also select the line numbers.

code.number-lines {
  counter-reset: line-number-step;
  counter-increment: line-number-step calc(var(--number-start, 1) - 1);

  & .line::before {
    display: inline-block;
    content: counter(line-number-step, decimal-leading-zero);
    counter-increment: line-number-step;
  }
}
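On the generator side, the steps above can be sketched roughly as follows. This is a minimal illustration, not my actual filter: `escape_html` and `render_numbered` are hypothetical names, and placing the newline after each closing `</span>` is an assumption.

```rust
/// Escape the characters that are significant in HTML text content.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Hypothetical helper: wrap every line in `<span class="line">` and mark the
/// block as numbered. `--number-start` is only emitted when it differs from 1,
/// because the CSS falls back to 1 via `var(--number-start, 1)`.
fn render_numbered(code: &str, start: usize) -> String {
    let mut html = String::from("<code class=\"number-lines\"");
    if start != 1 {
        html.push_str(&format!(" style=\"--number-start: {start}\""));
    }
    html.push('>');
    for line in code.lines() {
        html.push_str("<span class=\"line\">");
        html.push_str(&escape_html(line));
        // Assumption: the line break lives outside the span.
        html.push_str("</span>\n");
    }
    html.push_str("</code>");
    html
}
```

Calling `render_numbered(source, 4)` would produce a `<code>` element whose first line is numbered 04 by the CSS counter above.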

Line Highlighting

Add a class to each highlighted line of code: <span class="line line-highlight">.
Change the styling of the line as desired using CSS:

code .line-highlight {
  background-color: rgb(92 91 94);
  border-color: rgb(12 12 14);
}
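Picking the class for each line could look something like the sketch below. `line_class` is a hypothetical helper, and it assumes the highlight ranges refer to displayed line numbers (which is consistent with the sample options at the top of this post, where numbering starts at 4 and line 14 is highlighted).

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: build the `class` attribute value for one line.
/// `line_no` is the displayed line number; `highlighted` holds inclusive ranges.
fn line_class(line_no: usize, highlighted: &[RangeInclusive<usize>]) -> &'static str {
    if highlighted.iter().any(|range| range.contains(&line_no)) {
        "line line-highlight"
    } else {
        "line"
    }
}
```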

As to how I go from highlight="14,19..=20" to something usable in code, I use nom for parsing and range-set-blaze for integer range sets.

use nom::Parser;
use std::ops::RangeInclusive;

#[derive(Debug, PartialEq, Eq, Default)]
pub struct RangeSet(range_set_blaze::RangeSetBlaze<usize>);

impl RangeSet {
    pub fn contains(&self, idx: usize) -> bool {
        self.0.contains(idx)
    }
}

pub fn from_range_str(s: &str) -> Result<RangeSet, &'static str> {
    let mut rangeset = RangeSet::default();
    if s.is_empty() {
        return Ok(rangeset);
    }

    for range in s.split(',').map(get_range) {
        let range = range?;
        rangeset.0.ranges_insert(range);
    }
    Ok(rangeset)
}

fn get_range(s: &str) -> Result<RangeInclusive<usize>, &'static str> {
    let s = s.trim();
    if let Ok((_, num)) = digits_exact(s) {
        return Ok(num..=num);
    }
    let (_, result) = range(s).map_err(|_| "Invalid range string")?;

    match result {
        (Some(left), "..", Some(right)) => Ok(left..=right.saturating_sub(1)),
        (Some(left), "..=", Some(right)) => Ok(left..=right),
        (None, "..", Some(right)) => Ok(0..=right.saturating_sub(1)),
        (None, "..=", Some(right)) => Ok(0..=right),
        (Some(left), "..", None) => Ok(left..=usize::MAX),
        _ => Err("Invalid range string"),
    }
}

fn range(input: &str) -> nom::IResult<&str, (Option<usize>, &str, Option<usize>)> {
    let sep = nom::branch::alt((
        nom::bytes::complete::tag("..="),
        nom::bytes::complete::tag(".."),
    ));
    nom::sequence::tuple((digits, sep, digits))(input)
}

fn digits_exact(input: &str) -> nom::IResult<&str, usize> {
    nom::combinator::all_consuming(nom::combinator::map_res(
        nom::character::complete::digit1,
        str::parse,
    ))
    .parse(input)
}

fn digits(input: &str) -> nom::IResult<&str, Option<usize>> {
    nom::combinator::map_res(nom::character::complete::digit0, |s: &str| {
        if !s.is_empty() {
            s.parse::<usize>().map(Some)
        } else {
            Ok(None)
        }
    })(input)
}
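If pulling in nom and range-set-blaze is not an option, the same range syntax can be approximated with the standard library alone. The sketch below is a swapped-in, dependency-free variant, not the parser above: `parse_item`, `parse_ranges`, and `contains` are hypothetical names, ranges are kept in a plain `Vec` instead of a merged range set, and an exclusive end of 0 is rejected rather than saturated.

```rust
use std::ops::RangeInclusive;

/// Parse one item such as `17`, `4..=11`, `4..11`, `..=3`, or `23..`.
fn parse_item(item: &str) -> Option<RangeInclusive<usize>> {
    let item = item.trim();
    // Try the longer separator first so `..=` is not mistaken for `..`.
    let (left, right, inclusive) = if let Some((l, r)) = item.split_once("..=") {
        (l, r, true)
    } else if let Some((l, r)) = item.split_once("..") {
        (l, r, false)
    } else {
        // No separator: a single line number.
        let n = item.parse::<usize>().ok()?;
        return Some(n..=n);
    };
    // A missing left bound means "from the start".
    let start = if left.is_empty() { 0 } else { left.parse::<usize>().ok()? };
    let end = match (right.is_empty(), inclusive) {
        // Open-ended `23..` runs to the end of the block.
        (true, false) if !left.is_empty() => usize::MAX,
        // A bare `..` and `N..=` are rejected.
        (true, _) => return None,
        (false, true) => right.parse::<usize>().ok()?,
        // Exclusive end: convert to inclusive; `..0` is rejected.
        (false, false) => right.parse::<usize>().ok()?.checked_sub(1)?,
    };
    Some(start..=end)
}

/// Parse a comma-separated list of items; an empty string yields no ranges.
fn parse_ranges(spec: &str) -> Option<Vec<RangeInclusive<usize>>> {
    if spec.trim().is_empty() {
        return Some(Vec::new());
    }
    spec.split(',').map(parse_item).collect()
}

/// Linear membership test, fine for the handful of ranges a code block has.
fn contains(ranges: &[RangeInclusive<usize>], n: usize) -> bool {
    ranges.iter().any(|range| range.contains(&n))
}
```

With the hide option from the sample at the top of this post, `parse_ranges("4..=11,17,23..")` yields three ranges, and `contains` reports lines 4 through 11, 17, and everything from 23 onward as hidden.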

Line Hiding

1. Add a class to each hidden line of code: <span class="line line-hidden">.
2. Hide the line while still making it accessible to screen readers and non-CSS environments:

code .line-hidden {
  position: absolute;
  left: -99999px;
  top: auto;
}

3. When creating the code block, if the code block contains hidden lines, add a checkbox before the <code> element:

<label class="toggle-line-hidden">
  <input type="checkbox">
  <span>Toggle N hidden lines</span>
</label>
<code>
  <span>Some code</span>
</code>

4. Unhide the hidden lines by resetting their position if the checkbox is checked. You might have noticed that I am using fairly recent CSS features here: at the time of writing, :has is at 92% usage and CSS nesting is at 87% usage. This is done considering my target audience. Consider restructuring your HTML/CSS or using polyfills depending on your requirements.

label.toggle-line-hidden:has(> input:checked) + code {
  & .line {
    --hint-color: transparent;
  }

  & .line-hidden {
    position: relative;
    left: auto;
  }
}

5. Add CSS to show an indicator that lines are hidden:

/* When lines are numbered */
code .line-hidden+.line:not(.line-hidden)::before {
  border-top: 2px dotted var(--hint-color);
}

code .line:not(.line-hidden):has(+ .line-hidden)::before {
  border-bottom: 2px dotted var(--hint-color);
}

/* When lines are not numbered */
code:not(.number-lines) .line-hidden+.line:not(.line-hidden)::before {
  display: block;
  content: "";
  width: 4%;
  margin-top: 1px;
}

code:not(.number-lines) .line:not(.line-hidden):has(+ .line-hidden)::after {
  display: block;
  content: "";
  width: 4%;
  border-bottom: 2px dotted var(--hint-color);
  margin-top: 1px;
}

The --hint-color variable is used to disable the indicator when hidden lines are shown (see the 4th step).
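The markup side of steps 1 and 3 can be sketched as below. `render_with_hidden` is a hypothetical helper, and it assumes each entry in `lines` is already escaped (or syntax-highlighted) HTML. The one structural detail that matters is that the `<label>` is emitted immediately before `<code>`, since the CSS in step 4 relies on the `+` sibling combinator.

```rust
/// Hypothetical helper: emit the toggle (only when something is hidden),
/// followed by the code element with per-line classes.
/// `is_hidden` receives 1-based line numbers.
fn render_with_hidden(lines: &[&str], is_hidden: impl Fn(usize) -> bool) -> String {
    let hidden_count = (1..=lines.len()).filter(|&n| is_hidden(n)).count();
    let mut html = String::new();
    if hidden_count > 0 {
        html.push_str("<label class=\"toggle-line-hidden\">");
        html.push_str("<input type=\"checkbox\">");
        html.push_str(&format!("<span>Toggle {hidden_count} hidden lines</span>"));
        html.push_str("</label>");
    }
    html.push_str("<code>");
    for (idx, line) in lines.iter().enumerate() {
        let class = if is_hidden(idx + 1) { "line line-hidden" } else { "line" };
        // Assumption: `line` is already safe to embed as HTML.
        html.push_str(&format!("<span class=\"{class}\">{line}</span>\n"));
    }
    html.push_str("</code>");
    html
}
```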

Codeblock Metadata

Fairly simple considering the other features: add some sort of <span> or <ul> element before the codeblock and style it using CSS.

Codeblock Copy to Clipboard

Also fairly simple utilizing the browser's Clipboard API. On copy success, the icon is changed to notify the user.

Online Execution Environments

Grab the text content of the code block.
Turn it into a link to the execution environment using the appropriate format.
Add the link to the information block like in Codeblock Metadata.
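As an illustration of the second step, here is a rough sketch for the Rust Playground. It assumes the playground accepts the source in a `code` query parameter and the toolchain channel in `version`; `percent_encode` and `playground_link` are hypothetical helpers written against the standard library only.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Every other byte (including each byte of multi-byte UTF-8) is escaped.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Hypothetical helper: build a playground link for `code`.
/// Assumption: `version` is a channel name such as "stable" or "nightly".
fn playground_link(code: &str, version: &str) -> String {
    format!(
        "https://play.rust-lang.org/?version={version}&code={}",
        percent_encode(code)
    )
}
```

The resulting string can then be pushed onto the same list of links that the metadata block renders.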

Example: Godbolt

Compiler Explorer allows you to open the website with a certain ClientState loaded:

Encode the ClientState JSON body with url-safe base64.
Optional: Compress the JSON with zlib before encoding.
Append the body to https://godbolt.org/clientstate/.

use std::io::Write as _;
use base64::Engine as _;

fn generate_godbolt_link(
    links: &mut Vec<(&str, String)>,
    tool: &str,
    code: &str,
    lang: &str,
) -> bool {
    if tool != "godbolt" {
        return false;
    }

    let args = match lang {
        "c" => {
            serde_json::json!({
                "sessions": [{
                    "id": 1, "language": "c", "source": &code,
                    "compilers": [{
                        "id": "cclang_trunk", "filters": { "binary": true, "execute": true },
                        "options": "-O0 -g -fsanitize=leak"
                    }],
                }],
            })
        }
        "rust" => {
            serde_json::json!({
                "sessions": [{
                    "id": 1, "language": "rust", "source": &code,
                    "compilers": [{
                        "id": "nightly", "filters": { "binary": true, "execute": true },
                    }],
                }],
            })
        }
        _ => return false,
    };

    let mut url = String::from("https://godbolt.org/clientstate/");
    let mut encoder = flate2::write::ZlibEncoder::new(Vec::new(), flate2::Compression::default());
    encoder.write_all(args.to_string().as_bytes()).unwrap();
    let args = encoder.finish().unwrap();
    base64::prelude::BASE64_URL_SAFE_NO_PAD.encode_string(args, &mut url);
    links.push(("godbolt", url));
    true
}

IDE Hover Information

I currently only have this feature for Rust due to relative ease of implementation. It's also not perfect! Here's how I do it:

Preprocessing

As a separate build step, crawl all Rust code blocks, and for each code block place the source code into a temporary directory, then run rust-analyzer lsif to generate Language Server Index Format data.
Deserialize the resulting JSON and grab the Ranges associated with all Hover results.
Store the range -> hover mappings on disk. The mappings have to be associated with their source code blocks (I use a hash).
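The shape of that mapping might look something like the sketch below. Everything here is hypothetical (`HoverRange`, `HoverStore`, `source_hash`), serialization to disk is left out, and the LSIF data is simplified to zero-based (line, character) pairs with an exclusive end. Note that `DefaultHasher` is not guaranteed to be stable across Rust releases, so a real on-disk cache would probably want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A zero-based (line, character) position, compared lexicographically.
type Position = (u32, u32);

/// One hover entry: the range it covers (end exclusive) and its Markdown text.
struct HoverRange {
    start: Position,
    end: Position,
    markdown: String,
}

/// Hypothetical store: hover ranges grouped by a hash of the code block source.
#[derive(Default)]
struct HoverStore {
    by_block: HashMap<u64, Vec<HoverRange>>,
}

/// Hash the source text so ranges can be matched back to their code block.
fn source_hash(source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

impl HoverStore {
    /// Register one hover range for the code block with this source.
    fn insert(&mut self, source: &str, range: HoverRange) {
        self.by_block.entry(source_hash(source)).or_default().push(range);
    }

    /// Find the hover text (if any) covering `pos` in the given code block.
    fn hover_at(&self, source: &str, pos: Position) -> Option<&str> {
        self.by_block
            .get(&source_hash(source))?
            .iter()
            .find(|range| range.start <= pos && pos < range.end)
            .map(|range| range.markdown.as_str())
    }
}
```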

Rendering

1. Read the mappings from disk.
2. When syntax highlighting Rust source code, check if each token produced is contained within any range registered with a hover.
3. If so, wrap the token in a <label> with an <input type="button">. I use HTML popover elements for the "hover" information, so hash the hover content and save it for later, emitting the hash in the <input>. This allows multiple labels to share the same hover content (e.g. documentation for the same type used multiple times throughout the article), greatly reducing the HTML output size.
(Popover elements can be invoked by <button> or <input type="button"> elements. Initially, I went with <button> as it seemed like a no-brainer for semantics, but Firefox Reader Mode does not display any button content, with no workarounds AFAIK.)

<label class="type.builtin lsp-hover-ref">
  usize
  <input type="button" popovertarget="11498124591756402886">
</label>

4. After rendering the page contents, render the popovers. Since rust-analyzer produces hover information in Markdown, my site generator renders them as such so bold/italic text renders properly, code blocks have syntax highlighting, etc.

<div id="11498124591756402886" popover class="lsp-hover">
  <label>
    <input class="fullscreen" type="checkbox">
    <span>Toggle fullscreen</span>
  </label>
  <div class="lsp-hover-content">
   <pre class="highlighted"><code class="lang-rust"><span class="line"><span class="type.builtin">usize</span></span></code></pre>
   <hr>
   <p>The pointer-sized unsigned integer type.</p>
   <p>The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.</p>
  </div>
</div>
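The deduplication in steps 3 and 4 can be sketched as follows. `Popovers`, `wrap_token`, and `render_all` are hypothetical names, the fullscreen toggle is omitted for brevity, and both `token_html` and `hover_html` are assumed to be already rendered HTML.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical collector: hover bodies keyed by the hash of their content,
/// so identical hovers are stored (and later rendered) only once.
#[derive(Default)]
struct Popovers {
    by_id: BTreeMap<u64, String>,
}

impl Popovers {
    /// Wrap a token in a label whose button targets the popover for `hover_html`.
    fn wrap_token(&mut self, token_html: &str, class: &str, hover_html: &str) -> String {
        let mut hasher = DefaultHasher::new();
        hover_html.hash(&mut hasher);
        let id = hasher.finish();
        // Only the first occurrence of a given hover body is stored.
        self.by_id.entry(id).or_insert_with(|| hover_html.to_string());
        format!(
            "<label class=\"{class} lsp-hover-ref\">{token_html}<input type=\"button\" popovertarget=\"{id}\"></label>"
        )
    }

    /// Render every unique popover once, after the page contents.
    fn render_all(&self) -> String {
        self.by_id
            .iter()
            .map(|(id, body)| format!("<div id=\"{id}\" popover class=\"lsp-hover\">{body}</div>\n"))
            .collect()
    }
}
```

Two tokens with the same hover content end up with the same `popovertarget`, and `render_all` emits a single `<div>` for both.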

Styling

1. Popover elements are centered by default and positioning options are limited. Until anchor positioning arrives so I can place the hover information near clicked tokens, I position popups in the bottom right corner:

.lsp-hover {
  inset: unset;
  position: fixed;
  bottom: var(--space-s);
  right: var(--space-s);
  max-width: calc(min(48ch, 95vw) - var(--space-s));
  max-height: 40vh;

  background-color: var(--theme-background-alt);
  border: 2px solid var(--theme-foreground-alt);

  font-size: var(--step--1);
}

2. Using the same checkbox method as Line Hiding, the hover content can be expanded:

.lsp-hover:has(.fullscreen:checked) {
  min-width: calc(100vw - var(--space-s) * 2);
  min-height: calc(100vh - var(--space-s) * 2);
}

3. On smaller viewports, code blocks situated at the end of an article can be covered by the popup, so increase the bottom margin to allow scroll-positioning the code block above the popup:

article:has(> .lsp-hover:popover-open) {
  margin-bottom: calc(40vh + var(--space-s) * 2);
}

4. The <input> elements can be keyboard-focused, but since they do not contain any content we have to style the parent <label> instead.

.lsp-hover-ref {
  user-select: text;
  cursor: pointer;

  &:focus-within {
    position: relative;
    z-index: 1;
    outline: var(--theme-focus) solid 2px;
  }

  & > input:focus-visible {
    outline-color: transparent;
  }
}

5. Since <input> elements can be keyboard-focused, if they are hidden there is no indication to the user. So, the following addendum to the CSS unhides hidden lines if any tokens are keyboard-focused:

label.toggle-line-hidden:has(> input:checked) + code,
code:has(:focus-visible) {
  & .line {
    --hint-color: transparent;
  }

  & .line-hidden {
    position: relative;
    left: auto;
  }
}

6. In Firefox (not Chrome), the use of <input> causes copying and pasting the code block content (with Ctrl-c and Ctrl-v) to insert redundant, aggravating newlines, completely ruining the pasted content:

pub struct Rc

<T

: ?Sized

, A

: Allocator

 = Global

> {

This can be worked around like so: For code blocks that contain hover information, the parent <pre> tag is made focusable using tabindex="0", then the following CSS removes the <input> elements from the layout flow if the code block is mouse-focused but not keyboard-focused:

pre:focus:not(:focus-visible) .lsp-hover-ref > input {
  display: none;
}

Conclusion

Love it? Hate it? Let me know!

Behold, a story in three posts on the orange site

Toggle fullscreen

#[feature]

Valid forms are:

#[feature(name1, name2, …)]

Toggle fullscreen

extern crate std

The Rust Standard Library

The Rust Standard Library is the foundation of portable Rust software, a set of minimal and battle-tested shared abstractions for the broader Rust ecosystem. It offers core types, like Vec<T> and Option<T>, library-defined operations on language primitives, standard macros, I/O and multithreading, among many other things.

std is available to all Rust crates by default. Therefore, the standard library can be accessed in use statements through the path std, as in use std::env.

How to read this documentation

If you already know the name of what you are looking for, the fastest way to find it is to use the search button at the top of the page.

Otherwise, you may want to jump to one of these useful sections:

If this is your first time, the documentation for the standard library is written to be casually perused. Clicking on interesting things should generally lead you to interesting places. Still, there are important bits you don’t want to miss, so read on for a tour of the standard library and its documentation!

Once you are familiar with the contents of the standard library you may begin to find the verbosity of the prose distracting. At this stage in your development you may want to press the “ Summary” button near the top of the page to collapse it into a more skimmable view.

While you are looking at the top of the page, also notice the “Source” link. Rust’s API documentation comes with the source code and you are encouraged to read it. The standard library source is generally high quality and a peek behind the curtains is often enlightening.

What is in the standard library documentation?

First of all, The Rust Standard Library is divided into a number of focused modules, all listed further down this page. These modules are the bedrock upon which all of Rust is forged, and they have mighty names like std::slice and std::cmp. Modules’ documentation typically includes an overview of the module along with examples, and are a smart place to start familiarizing yourself with the library.

Second, implicit methods on primitive types are documented here. This can be a source of confusion for two reasons:

While primitives are implemented by the compiler, the standard libraryimplements methods directly on the primitive types (and it is the onlylibrary that does so), which are documented in the section on primitives.
The standard library exports many modules with the same name as primitive types. These define additional items related to the primitivetype, but not the all-important methods.

So for example there is a page for the primitive type char that lists all the methods that can be called on characters (very useful), and there is a page for the module std::char that documents iterator and error types created by these methods (rarely useful).

Note the documentation for the primitives str and [T] (also called ‘slice’). Many method calls on String and Vec<T> are actually calls to methods on str and [T] respectively, via deref coercions.

Third, the standard library defines The Rust Prelude, a small collection of items - mostly traits - that are imported into every module of every crate. The traits in the prelude are pervasive, making the prelude documentation a good entry point to learning about the library.

And finally, the standard library exports a number of standard macros, and lists them on this page (technically, not all of the standard macros are defined by the standard library - some are defined by the compiler - but they are documented here the same). Like the prelude, the standard macros are imported by default into all crates.

Contributing changes to the documentation

Check out the Rust contribution guidelines here. The source for this documentation can be found on GitHub in the ‘library/std/’ directory. To contribute changes, make sure you read the guidelines first, then submit pull-requests for your suggested changes.

Contributions are appreciated! If you see a part of the docs that can be improved, submit a PR, or chat with us first on Zulip #t-libs.

A Tour of The Rust Standard Library

The rest of this crate documentation is dedicated to pointing out notable features of The Rust Standard Library.

Containers and collections

The option and result modules define optional and error-handling types, Option<T> and Result<T, E>. The iter module defines Rust’s iterator trait, Iterator, which works with the for loop to access collections.

The standard library exposes three common ways to deal with contiguous regions of memory:

Vec<T> - A heap-allocated vector that is resizable at runtime.
[T; N] - An inline array with a fixed size at compile time.
[T] - A dynamically sized slice into any other kind of contiguousstorage, whether heap-allocated or not.

Slices can only be handled through some kind of pointer, and as such come in many flavors such as:

&[T] - shared slice
&mut [T] - mutable slice
Box<[T]> - owned slice

str, a UTF-8 string slice, is a primitive type, and the standard library defines many methods for it. Rust strs are typically accessed as immutable references: &str. Use the owned String for building and mutating strings.

For converting to strings use the format macro, and for converting from strings use the FromStr trait.

Data may be shared by placing it in a reference-counted box or the Rc type, and if further contained in a Cell or RefCell, may be mutated as well as shared. Likewise, in a concurrent setting it is common to pair an atomically-reference-counted box, Arc, with a Mutex to get the same effect.

The collections module defines maps, sets, linked lists and other typical collection types, including the common HashMap<K, V>.

Platform abstractions and I/O

Besides basic data types, the standard library is largely concerned with abstracting over differences in common platforms, most notably Windows and Unix derivatives.

Common types of I/O, including files, TCP, and UDP, are defined in the io, fs, and net modules.

The thread module contains Rust’s threading abstractions. sync contains further primitive shared memory types, including atomic, mpmc and mpsc, which contains the channel types for message passing.

Use before and after `main()`

Many parts of the standard library are expected to work before and after main(); but this is not guaranteed or ensured by tests. It is recommended that you write your own tests and run them on each platform you wish to support. This means that use of std before/after main, especially of features that interact with the OS or global state, is exempted from stability and portability guarantees and instead only provided on a best-effort basis. Nevertheless bug reports are appreciated.

On the other hand core and alloc are most likely to work in such environments with the caveat that any hookable behavior such as panics, oom handling or allocators will also depend on the compatibility of the hooks.

Some features may also behave differently outside main, e.g. stdio could become unbuffered, some panics might turn into aborts, backtraces might not get symbolicated or similar.

Non-exhaustive list of known limitations:

after-main use of thread-locals, which also affects additional features:
- thread::current
under UNIX, before main, file descriptors 0, 1, and 2 may be unchanged(they are guaranteed to be open during main,and are opened to /dev/null O_RDWR if they weren’t open on program start)

docs.rsToggle fullscreen

std

pub mod alloc

Memory allocation APIs.

In a given program, the standard library has one “global” memory allocator that is used for example by Box<T> and Vec<T>.

Currently the default global allocator is unspecified. Libraries, however, like cdylibs and staticlibs are guaranteed to use the System by default.

The `#[global_allocator]` attribute

This attribute allows configuring the choice of global allocator. You can use this to implement a completely custom global allocator to route all^system-alloc default allocation requests to a custom object.

use std::alloc::{GlobalAlloc, System, Layout};

struct MyAllocator;

unsafe impl GlobalAlloc for MyAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: MyAllocator = MyAllocator;

fn main() {
    // This `Vec` will allocate memory through `GLOBAL` above
    let mut v = Vec::new();
    v.push(1);
}

The attribute is used on a static item whose type implements the GlobalAlloc trait. This type can be provided by an external library:

use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {}

The #[global_allocator] can only be used once in a crate or its recursive dependencies.

Note that the Rust standard library internals may still directly call System when necessary (for example for the runtime support typically required to implement a global allocator, see re-entrance on GlobalAlloc for more details).

docs.rsToggle fullscreen

core::alloc

pub unsafe trait Allocator

An implementation of Allocator can allocate, grow, shrink, and deallocate arbitrary blocks of data described via Layout.

Allocator is designed to be implemented on ZSTs, references, or smart pointers. An allocator for MyAlloc([u8; N]) cannot be moved, without updating the pointers to the allocated memory.

In contrast to GlobalAlloc, Allocator allows zero-sized allocations. If an underlying allocator does not support this (like jemalloc) or responds by returning a null pointer (such as libc::malloc), this must be caught by the implementation.

Currently allocated memory

Some of the methods require that a memory block is currently allocated by an allocator. This means that:

the starting address for that memory block was previouslyreturned by allocate, grow, or shrink, and
the memory block has not subsequently been deallocated.

A memory block is deallocated by a call to deallocate, or by a call to grow or shrink that returns Ok. A call to grow or shrink that returns Err, does not deallocate the memory block passed to it.

Memory fitting

Some of the methods require that a layout fit a memory block or vice versa. This means that the following conditions must hold:

the memory block must be currently allocated with alignment of layout.align(), and
layout.size() must fall in the range min ..= max, where:
- min is the size of the layout used to allocate the block, and
- max is the actual size returned from allocate, grow, or shrink.

Safety

Memory blocks that are currently allocated by an allocator, must point to valid memory, and retain their validity until either:

the memory block is deallocated, or
the allocator is dropped.

Copying, cloning, or moving the allocator must not invalidate memory blocks returned from it. A copied or cloned allocator must behave like the original allocator.

A memory block which is currently allocated may be passed to any method of the allocator that accepts such an argument.

Additionally, any memory block returned by the allocator must satisfy the allocation invariants described in core::ptr. In particular, if a block has base address p and size n, then p as usize + n <= usize::MAX must hold.

This ensures that pointer arithmetic within the allocation (for example, ptr.add(len)) cannot overflow the address space. currently allocated: #currently-allocated-memory

docs.rsToggle fullscreen

alloc::alloc

pub struct Global

The global memory allocator.

This type implements the Allocator trait by forwarding calls to the allocator registered with the #[global_allocator] attribute if there is one, or the std crate’s default.

Note: while this type is unstable, the functionality it provides can be accessed through the free functions in alloc.

docs.rsToggle fullscreen

core

pub mod cell

Shareable mutable containers.

Rust memory safety is based on this rule: Given an object T, it is only possible to have one of the following:

Several immutable references (&T) to the object (also known as aliasing).
One mutable reference (&mut T) to the object (also known as mutability).

This is enforced by the Rust compiler. However, there are situations where this rule is not flexible enough. Sometimes it is required to have multiple references to an object and yet mutate it.

Shareable mutable containers exist to permit mutability in a controlled manner, even in the presence of aliasing. Cell<T>, RefCell<T>, and OnceCell<T> allow doing this in a single-threaded way—they do not implement Sync. (If you need to do aliasing and mutation among multiple threads, Mutex<T>, RwLock<T>, OnceLock<T> or atomic types are the correct data structures to do so).

Values of the Cell<T>, RefCell<T>, and OnceCell<T> types may be mutated through shared references (i.e. the common &T type), whereas most Rust types can only be mutated through unique (&mut T) references. We say these cell types provide ‘interior mutability’ (mutable via &T), in contrast with typical Rust types that exhibit ‘inherited mutability’ (mutable only via &mut T).

Cell types come in four flavors: Cell<T>, RefCell<T>, OnceCell<T>, and LazyCell<T>. Each provides a different way of providing safe interior mutability.

`Cell<T>`

Cell<T> implements interior mutability by moving values in and out of the cell. That is, a &T to the inner value can never be obtained, and the value itself cannot be directly obtained without replacing it with something else. This type provides the following methods:

For types that implement Copy, the get method retrieves the currentinterior value by duplicating it.
For types that implement Default, the take method replaces the currentinterior value with Default::default and returns the replaced value.
All types have:
- replace: replaces the current interior value and returns the replacedvalue.
- into_inner: this method consumes the Cell<T> and returns theinterior value.
- set: this method replaces the interior value, dropping the replaced value.

Cell<T> is typically used for more simple types where copying or moving values isn’t too resource intensive (e.g. numbers), and should usually be preferred over other cell types when possible. For larger and non-copy types, RefCell provides some advantages.

`RefCell<T>`

RefCell<T> uses Rust’s lifetimes to implement “dynamic borrowing”, a process whereby one can claim temporary, exclusive, mutable access to the inner value. Borrows for RefCell<T>s are tracked at runtime, unlike Rust’s native reference types which are entirely tracked statically, at compile time.

An immutable reference to a RefCell’s inner value (&T) can be obtained with borrow, and a mutable borrow (&mut T) can be obtained with borrow_mut. When these functions are called, they first verify that Rust’s borrow rules will be satisfied: any number of immutable borrows are allowed or a single mutable borrow is allowed, but never both. If a borrow is attempted that would violate these rules, the thread will panic.

The corresponding Sync version of RefCell<T> is RwLock<T>.

`OnceCell<T>`

OnceCell<T> is somewhat of a hybrid of Cell and RefCell that works for values that typically only need to be set once. This means that a reference &T can be obtained without moving or copying the inner value (unlike Cell) but also without runtime checks (unlike RefCell). However, its value can also not be updated once set unless you have a mutable reference to the OnceCell.

OnceCell provides the following methods:

get: obtain a reference to the inner value
set: set the inner value if it is unset (returns a Result)
get_or_init: return the inner value, initializing it if needed
get_mut: provide a mutable reference to the inner value, only availableif you have a mutable reference to the cell itself.

The corresponding Sync version of OnceCell<T> is OnceLock<T>.

`LazyCell<T, F>`

A common pattern with OnceCell is, for a given OnceCell, to use the same function on every call to OnceCell::get_or_init with that cell. This is what is offered by LazyCell, which pairs cells of T with functions of F, and always calls F before it yields &T. This happens implicitly by simply attempting to dereference the LazyCell to get its contents, so its use is much more transparent with a place which has been initialized by a constant.

More complicated patterns that don’t fit this description can be built on OnceCell<T> instead.

LazyCell works by providing an implementation of impl Deref that calls the function, so you can just use it by dereference (e.g. *lazy_cell or lazy_cell.deref()).

The corresponding Sync version of LazyCell<T, F> is LazyLock<T, F>.

When to choose interior mutability

The more common inherited mutability, where one must have unique access to mutate a value, is one of the key language elements that enables Rust to reason strongly about pointer aliasing, statically preventing crash bugs. Because of that, inherited mutability is preferred, and interior mutability is something of a last resort. Since cell types enable mutation where it would otherwise be disallowed though, there are occasions when interior mutability might be appropriate, or even must be used, e.g.

Introducing mutability ‘inside’ of something immutable
Implementation details of logically-immutable methods.
Mutating implementations of Clone.

Introducing mutability ‘inside’ of something immutable

Many shared smart pointer types, including Rc<T> and Arc<T>, provide containers that can be cloned and shared between multiple parties. Because the contained values may be multiply-aliased, they can only be borrowed with &, not &mut. Without cells it would be impossible to mutate data inside of these smart pointers at all.

It’s very common then to put a RefCell<T> inside shared pointer types to reintroduce mutability:

use std::cell::{RefCell, RefMut};
use std::collections::HashMap;
use std::rc::Rc;

fn main() {
    let shared_map: Rc<RefCell<_>> = Rc::new(RefCell::new(HashMap::new()));
    // Create a new block to limit the scope of the dynamic borrow
    {
        let mut map: RefMut<'_, _> = shared_map.borrow_mut();
        map.insert("africa", 92388);
        map.insert("kyoto", 11837);
        map.insert("piccadilly", 11826);
        map.insert("marbles", 38);
    }

    // Note that if we had not let the previous borrow of the map fall out
    // of scope, the subsequent borrow would panic at runtime.
    // This is the major hazard of using `RefCell`.
    let total: i32 = shared_map.borrow().values().sum();
    println!("{total}");
}

Note that this example uses Rc<T> and not Arc<T>. RefCell<T>s are for single-threaded scenarios. Consider using RwLock<T> or Mutex<T> if you need shared mutability in a multi-threaded situation.
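
For comparison, here is a rough multi-threaded counterpart of the example above, using Arc<T> and Mutex<T> in place of Rc<T> and RefCell<T>. The function name `threaded_total` and the inserted keys and values are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Inserts one entry per worker thread and returns the sum of all values.
fn threaded_total(workers: i32) -> i32 {
    let shared_map = Arc::new(Mutex::new(HashMap::<i32, i32>::new()));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let map = Arc::clone(&shared_map);
            thread::spawn(move || {
                // The lock guard plays the role of `RefMut`: a conflicting
                // access blocks the other thread instead of panicking.
                map.lock().unwrap().insert(id, id * 10);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // The guard is a temporary here, so the lock is released at the end
    // of this statement.
    let total: i32 = shared_map.lock().unwrap().values().sum();
    total
}
```

With four workers the map holds the values 0, 10, 20 and 30, so `threaded_total(4)` returns 60.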

Implementation details of logically-immutable methods

Occasionally it may be desirable not to expose in an API that there is mutation happening “under the hood”. This may be because logically the operation is immutable, but e.g., caching forces the implementation to perform mutation; or because you must employ mutation to implement a trait method that was originally defined to take &self.

use std::cell::OnceCell;

struct Graph {
    edges: Vec<(i32, i32)>,
    span_tree_cache: OnceCell<Vec<(i32, i32)>>
}

impl Graph {
    fn minimum_spanning_tree(&self) -> Vec<(i32, i32)> {
        self.span_tree_cache
            .get_or_init(|| self.calc_span_tree())
            .clone()
    }

    fn calc_span_tree(&self) -> Vec<(i32, i32)> {
        // Expensive computation goes here
        vec![]
    }
}
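
To see that the cache really hides the mutation from callers, the sketch below extends the same Graph with a hypothetical `computations` counter and a `new` constructor (neither is part of the example above), and uses a placeholder in place of the expensive computation.

```rust
use std::cell::{Cell, OnceCell};

struct Graph {
    edges: Vec<(i32, i32)>,
    span_tree_cache: OnceCell<Vec<(i32, i32)>>,
    // Hypothetical counter, present only to observe how often we recompute.
    computations: Cell<u32>,
}

impl Graph {
    fn new(edges: Vec<(i32, i32)>) -> Self {
        Graph {
            edges,
            span_tree_cache: OnceCell::new(),
            computations: Cell::new(0),
        }
    }

    // Takes `&self`, yet fills the cache on the first call.
    fn minimum_spanning_tree(&self) -> Vec<(i32, i32)> {
        self.span_tree_cache
            .get_or_init(|| self.calc_span_tree())
            .clone()
    }

    fn calc_span_tree(&self) -> Vec<(i32, i32)> {
        self.computations.set(self.computations.get() + 1);
        // Placeholder for the expensive computation: echo the edges back.
        self.edges.clone()
    }
}
```

Calling `minimum_spanning_tree` twice on the same graph returns equal results, while `computations` stays at 1.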

Mutating implementations of `Clone`

This is simply a special, but common, case of the previous one: hiding mutability for operations that appear to be immutable. The clone method is expected to not change the source value, and is declared to take &self, not &mut self. Therefore, any mutation that happens in the clone method must use cell types. For example, Rc<T> maintains its reference counts within a Cell<usize>.

use std::cell::Cell;
use std::ptr::NonNull;
use std::process::abort;
use std::marker::PhantomData;

struct Rc<T: ?Sized> {
    ptr: NonNull<RcInner<T>>,
    phantom: PhantomData<RcInner<T>>,
}

struct RcInner<T: ?Sized> {
    strong: Cell<usize>,
    refcount: Cell<usize>,
    value: T,
}

impl<T: ?Sized> Clone for Rc<T> {
    fn clone(&self) -> Rc<T> {
        self.inc_strong();
        Rc {
            ptr: self.ptr,
            phantom: PhantomData,
        }
    }
}

trait RcInnerPtr<T: ?Sized> {

    fn inner(&self) -> &RcInner<T>;

    fn strong(&self) -> usize {
        self.inner().strong.get()
    }

    fn inc_strong(&self) {
        self.inner()
            .strong
            .set(self.strong()
                     .checked_add(1)
                     .unwrap_or_else(|| abort() ));
    }
}

impl<T: ?Sized> RcInnerPtr<T> for Rc<T> {
   fn inner(&self) -> &RcInner<T> {
       unsafe {
           self.ptr.as_ref()
       }
   }
}

core::cell

pub struct Cell<T>
where
    T: ?Sized,
{
    value: UnsafeCell<T>,
}

A mutable memory location.

Memory layout

Cell<T> has the same memory layout and caveats as UnsafeCell<T>. In particular, this means that Cell<T> has the same in-memory representation as its inner type T.

Examples

In this example, you can see that Cell<T> enables mutation inside an immutable struct. In other words, it enables “interior mutability”.

use std::cell::Cell;

struct SomeStruct {
    regular_field: u8,
    special_field: Cell<u8>,
}

let my_struct = SomeStruct {
    regular_field: 0,
    special_field: Cell::new(1),
};

let new_value = 100;

// ERROR: `my_struct` is immutable
// my_struct.regular_field = new_value;

// WORKS: although `my_struct` is immutable, `special_field` is a `Cell`,
// which can always be mutated
my_struct.special_field.set(new_value);
assert_eq!(my_struct.special_field.get(), new_value);

See the module-level documentation for more.

core

pub mod marker

Primitive traits and types representing basic properties of types.

Rust types can be classified in various useful ways according to their intrinsic properties. These classifications are represented as traits.

core::marker

pub struct PhantomData<T>
where
    T: PointeeSized,

Zero-sized type used to mark things that “act like” they own a T.

Adding a PhantomData<T> field to your type tells the compiler that your type acts as though it stores a value of type T, even though it doesn’t really. This information is used when computing certain safety properties.

For a more in-depth explanation of how to use PhantomData<T>, please see the Nomicon.

A ghastly note 👻👻👻

Though they both have scary names, PhantomData and ‘phantom types’ are related, but not identical. A phantom type parameter is simply a type parameter which is never used. In Rust, this often causes the compiler to complain, and the solution is to add a “dummy” use by way of PhantomData.

Examples

Unused lifetime parameters

Perhaps the most common use case for PhantomData is a struct that has an unused lifetime parameter, typically as part of some unsafe code. For example, here is a struct Slice that has two pointers of type *const T, presumably pointing into an array somewhere:

struct Slice<'a, T> {
    start: *const T,
    end: *const T,
}

The intention is that the underlying data is only valid for the lifetime 'a, so Slice should not outlive 'a. However, this intent is not expressed in the code, since there are no uses of the lifetime 'a and hence it is not clear what data it applies to. We can correct this by telling the compiler to act as if the Slice struct contained a reference &'a T:

use std::marker::PhantomData;

struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    phantom: PhantomData<&'a T>,
}

This also in turn infers the lifetime bound T: 'a, indicating that any references in T are valid over the lifetime 'a.

When initializing a Slice you simply provide the value PhantomData for the field phantom:

fn borrow_vec<T>(vec: &Vec<T>) -> Slice<'_, T> {
    let ptr = vec.as_ptr();
    Slice {
        start: ptr,
        end: unsafe { ptr.add(vec.len()) },
        phantom: PhantomData,
    }
}

Unused type parameters

It sometimes happens that you have unused type parameters which indicate what type of data a struct is “tied” to, even though that data is not actually found in the struct itself. Here is an example where this arises with FFI. The foreign interface uses handles of type *mut () to refer to Rust values of different types. We track the Rust type using a phantom type parameter on the struct ExternalResource which wraps a handle.

use std::marker::PhantomData;

struct ExternalResource<R> {
   resource_handle: *mut (),
   resource_type: PhantomData<R>,
}

impl<R: ResType> ExternalResource<R> {
    fn new() -> Self {
        let size_of_res = size_of::<R>();
        Self {
            resource_handle: foreign_lib::new(size_of_res),
            resource_type: PhantomData,
        }
    }

    fn do_stuff(&self, param: ParamType) {
        let foreign_params = convert_params(param);
        foreign_lib::do_stuff(self.resource_handle, foreign_params);
    }
}

Ownership and the drop check

The exact interaction of PhantomData with drop check may change in the future.

Currently, adding a field of type PhantomData<T> indicates that your type owns data of type T in very rare circumstances. This in turn has effects on the Rust compiler’s drop check analysis. For the exact rules, see the drop check documentation.

Layout

For all T, the following are guaranteed:

size_of::<PhantomData<T>>() == 0
align_of::<PhantomData<T>>() == 1
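
These two guarantees can be checked directly. The helper name `phantom_layout` below is illustrative only.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

/// Returns (size, alignment) of `PhantomData<T>` for any `T`,
/// including unsized types such as `str` or `[u64]`.
fn phantom_layout<T: ?Sized>() -> (usize, usize) {
    (
        size_of::<PhantomData<T>>(),
        align_of::<PhantomData<T>>(),
    )
}
```

For example, `phantom_layout::<String>()` and `phantom_layout::<[u64]>()` both return `(0, 1)`, regardless of how large or strictly aligned T itself is.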

core

pub mod ptr

Manually manage memory through raw pointers.

See also the pointer primitive types.

Safety

Many functions in this module take raw pointers as arguments and read from or write to them. For this to be safe, these pointers must be valid for the given access. Whether a pointer is valid depends on the operation it is used for (read or write), and the extent of the memory that is accessed (i.e., how many bytes are read/written) – it makes no sense to ask “is this pointer valid”; one has to ask “is this pointer valid for a given access”. Most functions use *mut T and *const T to access only a single value, in which case the documentation omits the size and implicitly assumes it to be size_of::<T>() bytes.

The precise rules for validity are not determined yet. The guarantees that are provided at this point are very minimal:

A null pointer is never valid for reads/writes.
For memory accesses of size zero, every non-null pointer is valid for reads/writes. The following points are only concerned with non-zero-sized accesses.
For a pointer to be valid for reads/writes, it is necessary, but not always sufficient, that the pointer be dereferenceable. The provenance of the pointer is used to determine which allocation it is derived from; a pointer is dereferenceable if the memory range of the given size starting at the pointer is entirely contained within the bounds of that allocation. Note that in Rust, every (stack-allocated) variable is considered a separate allocation.
All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory.
The result of casting a reference to a pointer is valid for reads/writes for as long as the underlying allocation is live and no reference (just raw pointers) is used to access the same memory. That is, reference and pointer accesses cannot be interleaved.

These axioms, along with careful use of offset for pointer arithmetic, are enough to correctly implement many useful things in unsafe code. Stronger guarantees will be provided eventually, as the aliasing rules are being determined. For more information, see the book as well as the section in the reference devoted to undefined behavior.

Note that some operations such as read and write do allow null pointers if the total size of the access is zero. However, other operations internally convert pointers into references. Therefore, the general notion of “valid for reads/writes” excludes null pointers, and the specific operations that permit null pointers mention that as an exception. Furthermore, read_volatile and write_volatile can be used in even more situations; see their documentation for details.

We say that a pointer is “dangling” if it is not valid for any non-zero-sized accesses. This means out-of-bounds pointers, pointers to freed memory, null pointers, and pointers created with NonNull::dangling are all dangling.

Alignment

Valid raw pointers as defined above are not necessarily properly aligned (where “proper” alignment is defined by the pointee type, i.e., *const T must be aligned to align_of::<T>()). However, most functions require their arguments to be properly aligned, and will explicitly state this requirement in their documentation. Notable exceptions to this are read_unaligned and write_unaligned.

When a function requires proper alignment, it does so even if the access has size 0, i.e., even if memory is not actually touched. Consider using NonNull::dangling in such cases.
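
As a small sketch of the read_unaligned exception mentioned above: the function name `read_u32_at_offset_one` and the five-byte buffer are assumptions chosen for illustration.

```rust
/// Reads a `u32` that starts one byte into the buffer, a position that is
/// not guaranteed to be 4-byte aligned.
fn read_u32_at_offset_one(bytes: &[u8; 5]) -> u32 {
    // SAFETY: bytes 1..5 lie inside `bytes`, so the pointer is valid for a
    // 4-byte read, and `read_unaligned` does not require alignment.
    unsafe { bytes.as_ptr().add(1).cast::<u32>().read_unaligned() }
}
```

For the input `[0, 1, 2, 3, 4]` the result equals `u32::from_ne_bytes([1, 2, 3, 4])`. Using a plain `read` here instead would be undefined behavior whenever the address is not a multiple of 4.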

Pointer to reference conversion

When converting a pointer to a reference (e.g. via &*ptr or &mut *ptr), there are several rules that must be followed:

The pointer must be properly aligned.
It must be non-null.
It must be “dereferenceable” in the sense defined above.
The pointer must point to a valid value of type T.
You must enforce Rust’s aliasing rules. The exact aliasing rules are not decided yet, so we only give a rough overview here. The rules also depend on whether a mutable or a shared reference is being created.
- When creating a mutable reference, then while this reference exists, the memory it points to must not get accessed (read or written) through any other pointer or reference not derived from this reference.
- When creating a shared reference, then while this reference exists, the memory it points to must not get mutated (except inside UnsafeCell).

If a pointer follows all of these rules, it is said to be convertible to a (mutable or shared) reference.

These rules apply even if the result is unused! (The part about being initialized is not yet fully decided, but until it is, the only safe approach is to ensure that they are indeed initialized.)

An example of the implications of the above rules is that an expression such as unsafe { &*(0 as *const u8) } is Immediate Undefined Behavior.
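
One way to handle the non-null rule explicitly is the raw pointer method as_ref, which returns None for a null pointer. The wrapper `checked_ref` below is a hypothetical helper; every other rule in the list above remains the caller's responsibility.

```rust
/// Converts a raw pointer to a shared reference, rejecting null.
///
/// # Safety
/// If `ptr` is non-null, it must be convertible to a shared reference:
/// aligned, dereferenceable, pointing to a valid `i32`, and not mutated
/// while the returned reference is in use.
unsafe fn checked_ref<'a>(ptr: *const i32) -> Option<&'a i32> {
    // SAFETY: forwarded to the caller, see the function-level contract.
    unsafe { ptr.as_ref() }
}
```

Given a live `let value = 7;`, the call `unsafe { checked_ref(&value) }` returns `Some(&7)`, while `unsafe { checked_ref(std::ptr::null()) }` returns `None` instead of invoking undefined behavior.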

Allocation

An allocation is a subset of program memory which is addressable from Rust, and within which pointer arithmetic is possible. Examples of allocations include heap allocations, stack-allocated variables, statics, and consts. The safety preconditions of some Rust operations - such as offset and field projections (expr.field) - are defined in terms of the allocations on which they operate.

An allocation has a base address, a size, and a set of memory addresses. It is possible for an allocation to have zero size, but such an allocation will still have a base address. The base address of an allocation is not necessarily unique. While it is currently the case that an allocation always has a set of memory addresses which is fully contiguous (i.e., has no “holes”), there is no guarantee that this will not change in the future.

Allocations must behave like “normal” memory: in particular, reads must not have side-effects, and writes must become visible to other threads using the usual synchronization primitives.

For any allocation with base address, size, and a set of addresses, the following are guaranteed:

For all addresses a in addresses, a is in the range base .. (base + size) (note that this requires a < base + size, not a <= base + size)
base is not equal to null() (i.e., the address with the numerical value 0)
base + size <= usize::MAX
size <= isize::MAX

As a consequence of these guarantees, given any address a within the set of addresses of an allocation:

It is guaranteed that a - base does not overflow isize
It is guaranteed that a - base is non-negative
It is guaranteed that, given o = a - base (i.e., the offset of a within the allocation), base + o will not wrap around the address space (in other words, will not overflow usize)
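
The arithmetic in these consequences can be sketched on an ordinary array. Here the start of the array stands in for the base address (the array may itself sit inside a larger allocation), and the function name `byte_offset_of` is illustrative.

```rust
/// Returns the byte offset of element `index` from the start of the array.
fn byte_offset_of(values: &[u16; 8], index: usize) -> usize {
    let base = values.as_ptr();
    let a: *const u16 = &values[index];
    // Both addresses belong to the same allocation and `a` is not below
    // `base`, so this subtraction cannot underflow and the result is at
    // most `isize::MAX`.
    a.addr() - base.addr()
}
```

For example, `byte_offset_of(&[0; 8], 5)` returns 10, since each u16 occupies two bytes.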

Provenance

Pointers are not simply an “integer” or “address”. For instance, it’s uncontroversial to say that a Use After Free is clearly Undefined Behavior, even if you “get lucky” and the freed memory gets reallocated before your read/write (in fact this is the worst-case scenario, UAFs would be much less concerning if this didn’t happen!). As another example, consider that wrapping_offset is documented to “remember” the allocation that the original pointer points to, even if it is offset far outside the memory range occupied by that allocation. To rationalize claims like this, pointers need to somehow be more than just their addresses: they must have provenance.

A pointer value in Rust semantically contains the following information:

The address it points to, which can be represented by a usize.
The provenance it has, defining the memory it has permission to access. Provenance can be absent, in which case the pointer does not have permission to access any memory.

The exact structure of provenance is not yet specified, but the permissions defined by a pointer’s provenance have a spatial component, a temporal component, and a mutability component:

Spatial: The set of memory addresses that the pointer is allowed to access.
Temporal: The timespan during which the pointer is allowed to access those memory addresses.
Mutability: Whether the pointer may only access the memory for reads, or also access it for writes. Note that this can interact with the other components, e.g. a pointer might permit mutation only for a subset of addresses, or only for a subset of its maximal timespan.

When an allocation is created, it has a unique Original Pointer. For alloc APIs this is literally the pointer the call returns, and for local variables and statics, this is the name of the variable/static. (This is mildly overloading the term “pointer” for the sake of brevity/exposition.)

The Original Pointer for an allocation has provenance that constrains the spatial permissions of this pointer to the memory range of the allocation, and the temporal permissions to the lifetime of the allocation. Provenance is implicitly inherited by all pointers transitively derived from the Original Pointer through operations like offset, borrowing, and pointer casts. Some operations may shrink the permissions of the derived provenance, limiting how much memory it can access or how long it’s valid for (i.e. borrowing a subfield and subslicing can shrink the spatial component of provenance, and all borrowing can shrink the temporal component of provenance). However, no operation can ever grow the permissions of the derived provenance: even if you “know” there is a larger allocation, you can’t derive a pointer with a larger provenance. Similarly, you cannot “recombine” two contiguous provenances back into one (i.e. with a fn merge(&[T], &[T]) -> &[T]).

A reference to a place always has provenance over at least the memory that place occupies. A reference to a slice always has provenance over at least the range that slice describes. Whether and when exactly the provenance of a reference gets “shrunk” to exactly fit the memory it points to is not yet determined.

A shared reference only ever has provenance that permits reading from memory, and never permits writes, except inside UnsafeCell.

Provenance can affect whether a program has undefined behavior:

It is undefined behavior to access memory through a pointer that does not have provenance over that memory. Note that a pointer “at the end” of its provenance is not actually outside its provenance, it just has 0 bytes it can load/store. Zero-sized accesses do not require any provenance since they access an empty range of memory.
It is undefined behavior to offset a pointer across a memory range that is not contained in the allocation it is derived from, or to offset_from two pointers not derived from the same allocation. Provenance is used to say what exactly “derived from” even means: the lineage of a pointer is traced back to the Original Pointer it descends from, and that identifies the relevant allocation. In particular, it’s always UB to offset a pointer derived from something that is now deallocated, except if the offset is 0.

But it is still sound to:

Create a pointer without provenance from just an address (see without_provenance). Such a pointer cannot be used for memory accesses (except for zero-sized accesses). This can still be useful for sentinel values like null or to represent a tagged pointer that will never be dereferenceable. In general, it is always sound for an integer to pretend to be a pointer “for fun” as long as you don’t use operations on it which require it to be valid (non-zero-sized offset, read, write, etc).
Forge an allocation of size zero at any sufficiently aligned non-null address. i.e. the usual “ZSTs are fake, do what you want” rules apply.
wrapping_offset a pointer outside its provenance. This includes pointers which have “no” provenance. In particular, this makes it sound to do pointer tagging tricks.
Compare arbitrary pointers by address. Pointer comparison ignores provenance and addresses are just integers, so there is always a coherent answer, even if the pointers are dangling or from different provenances. Note that if you get “lucky” and notice that a pointer at the end of one allocation is the “same” address as the start of another allocation, anything you do with that fact is probably going to be gibberish. The scope of that gibberish is kept under control by the fact that the two pointers still aren’t allowed to access the other’s allocation (bytes), because they still have different provenance.
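
Two of the sound operations listed above, wrapping a pointer outside its provenance and comparing pointers by address, can be sketched as follows. The function name `wrap_out_and_back` and the distance of 1000 elements are arbitrary choices for illustration.

```rust
/// Moves a pointer far outside its allocation and back again.
/// Returns (whether the round-tripped pointer compares equal, value read).
fn wrap_out_and_back(values: &[i32; 3]) -> (bool, i32) {
    let start = values.as_ptr();
    // Sound: `wrapping_add` may leave the allocation, as long as we never
    // access memory through the out-of-bounds pointer.
    let outside = start.wrapping_add(1_000);
    // Coming back restores the address; the provenance was never lost.
    let back = outside.wrapping_sub(1_000);
    // Pointer comparison only looks at addresses.
    let same_address = back == start;
    // SAFETY: `back` is in bounds of `values` and derived from it.
    (same_address, unsafe { *back })
}
```

For the input `[7, 8, 9]` this returns `(true, 7)`. Doing the same with `add` and `sub` instead of the wrapping variants would be undefined behavior, because those require every intermediate pointer to stay within the allocation.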

Note that the full definition of provenance in Rust is not decided yet, as this interacts with the as-yet undecided aliasing rules.

Pointers Vs Integers

From this discussion, it becomes very clear that a usize cannot accurately represent a pointer, and converting from a pointer to a usize is generally an operation which only extracts the address. Converting this address back into pointer requires somehow answering the question: which provenance should the resulting pointer have?

Rust provides two ways of dealing with this situation: Strict Provenance and Exposed Provenance.

Note that a pointer can represent a usize (via without_provenance), so the right type to use in situations where a value is “sometimes a pointer and sometimes a bare usize” is a pointer type.
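
A minimal sketch of storing a bare integer in a pointer-typed slot, assuming a toolchain where without_provenance and addr are available; the two helper names are made up for illustration.

```rust
use std::ptr;

/// Stores a plain integer in a pointer-typed slot. The result carries no
/// provenance, so it must never be used for a non-zero-sized memory access.
fn int_in_pointer_slot(value: usize) -> *const u8 {
    ptr::without_provenance(value)
}

/// Recovers the integer; `addr` only extracts the address.
fn int_from_pointer_slot(slot: *const u8) -> usize {
    slot.addr()
}
```

Round-tripping 0x2a through `int_in_pointer_slot` and `int_from_pointer_slot` gives back 0x2a, with no integer-to-pointer cast involved.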

Strict Provenance

“Strict Provenance” refers to a set of APIs designed to make working with provenance more explicit. They are intended as substitutes for casting a pointer to an integer and back.

Entirely avoiding integer-to-pointer casts successfully side-steps the inherent ambiguity of that operation. This benefits compiler optimizations, and it is pretty much a requirement for using tools like Miri and architectures like CHERI that aim to detect and diagnose pointer misuse.

The key insight to making programming without integer-to-pointer casts at all viable is the with_addr method:

/// Creates a new pointer with the given address and the provenance of `self`.
///
/// This is similar to an `addr as *const T` cast,
/// but copies the provenance of `self` to the new pointer.
/// This avoids the inherent ambiguity of the unary cast.
///
/// This is equivalent to using `wrapping_offset` to offset `self` to the given address,
/// and therefore has all the same capabilities and restrictions.
pub fn with_addr(self, addr: usize) -> Self;

So you’re still able to drop down to the address representation and do whatever clever bit tricks you want as long as you’re able to keep around a pointer into the allocation you care about that can “reconstitute” the provenance. Usually this is very easy, because you only are taking a pointer, messing with the address, and then immediately converting back to a pointer. To make this use case more ergonomic, we provide the map_addr method.

To help make it clear that code is “following” Strict Provenance semantics, we also provide an addr method which promises that the returned address is not part of a pointer-integer-pointer roundtrip. In the future we may provide a lint for pointer<->integer casts to help you audit if your code conforms to strict provenance.
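
A small sketch of the addr and with_addr pair in action, assuming a toolchain where the Strict Provenance methods are stable. The function name `third_byte` is illustrative.

```rust
/// Reads the byte at index 2 by doing arithmetic on the bare address.
fn third_byte(bytes: &[u8; 4]) -> u8 {
    let base = bytes.as_ptr();
    // Drop down to the address representation and compute a new address...
    let target: usize = base.addr() + 2;
    // ...then reattach the provenance of `base`, instead of writing
    // `target as *const u8`.
    let ptr = base.with_addr(target);
    // SAFETY: `target` is in bounds of `bytes`, and `ptr` inherited
    // provenance over the whole array from `base`.
    unsafe { *ptr }
}
```

`third_byte(&[10, 20, 30, 40])` returns 30. The same effect could be had with `base.map_addr(|a| a + 2)`, which bundles the two steps.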

Using Strict Provenance

Most code needs no changes to conform to strict provenance, as the only really concerning operation is casts from usize to a pointer. For code which does cast a usize to a pointer, the scope of the change depends on exactly what you’re doing.

In general, you just need to make sure that if you want to convert a usize address to a pointer and then use that pointer to read/write memory, you need to keep around a pointer that has sufficient provenance to perform that read/write itself. In this way all of your casts from an address to a pointer are essentially just applying offsets/indexing.

This is generally trivial to do for simple cases like tagged pointers as long as you represent the tagged pointer as an actual pointer and not a usize. For instance:

unsafe {
    // A flag we want to pack into our pointer
    static HAS_DATA: usize = 0x1;
    static FLAG_MASK: usize = !HAS_DATA;

    // Our value, which must have enough alignment to have spare least-significant-bits.
    let my_precious_data: u32 = 17;
    assert!(align_of::<u32>() > 1);

    // Create a tagged pointer
    let ptr = &my_precious_data as *const u32;
    let tagged = ptr.map_addr(|addr| addr | HAS_DATA);

    // Check the flag:
    if tagged.addr() & HAS_DATA != 0 {
        // Untag and read the pointer
        let data = *tagged.map_addr(|addr| addr & FLAG_MASK);
        assert_eq!(data, 17);
    } else {
        unreachable!()
    }
}

(Yes, if you’ve been using AtomicUsize for pointers in concurrent datastructures, you should be using AtomicPtr instead. If that messes up the way you atomically manipulate pointers, we would like to know why, and what needs to be done to fix it.)

Situations where a valid pointer must be created from just an address, such as baremetal code accessing a memory-mapped interface at a fixed address, cannot currently be handled with strict provenance APIs and should use exposed provenance.

Exposed Provenance

As discussed above, integer-to-pointer casts are not possible with Strict Provenance APIs. This is by design: the goal of Strict Provenance is to provide a clear specification that we are confident can be formalized unambiguously and can be subject to precise formal reasoning. Integer-to-pointer casts do not (currently) have such a clear specification.

However, there exist situations where integer-to-pointer casts cannot be avoided, or where avoiding them would require major refactoring. Legacy platform APIs also regularly assume that usize can capture all the information that makes up a pointer. Bare-metal platforms can also require the synthesis of a pointer “out of thin air” without anywhere to obtain proper provenance from.

Rust’s model for dealing with integer-to-pointer casts is called Exposed Provenance. However, the semantics of Exposed Provenance are on much less solid footing than Strict Provenance, and at this point it is not yet clear whether a satisfying unambiguous semantics can be defined for Exposed Provenance. (If that sounds bad, be reassured that other popular languages that provide integer-to-pointer casts are not faring any better.) Furthermore, Exposed Provenance will not work (well) with tools like Miri and CHERI.

Exposed Provenance is provided by the expose_provenance and with_exposed_provenance methods, which are equivalent to as casts between pointers and integers.

expose_provenance is a lot like addr, but additionally adds the provenance of the pointer to a global list of ‘exposed’ provenances. (This list is purely conceptual, it exists for the purpose of specifying Rust but is not materialized in actual executions, except in tools like Miri.) Memory which is outside the control of the Rust abstract machine (MMIO registers, for example) is always considered to be exposed, so long as this memory is disjoint from memory that will be used by the abstract machine such as the stack, heap, and statics.
with_exposed_provenance can be used to construct a pointer with one of these previously ‘exposed’ provenances. with_exposed_provenance takes only addr: usize as arguments, so unlike in with_addr there is no indication of what the correct provenance for the returned pointer is – and that is exactly what makes integer-to-pointer casts so tricky to rigorously specify! The compiler will do its best to pick the right provenance for you, but currently we cannot provide any guarantees about which provenance the resulting pointer will have. Only one thing is clear: if there is no previously ‘exposed’ provenance that justifies the way the returned pointer will be used, the program has undefined behavior.
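
The round trip through an integer can be sketched like this, assuming a toolchain where the Exposed Provenance APIs are stable. The function name `roundtrip_through_usize` is illustrative.

```rust
use std::ptr;

/// Sends a pointer through a bare `usize` and reads through the result.
fn roundtrip_through_usize(value: &u32) -> u32 {
    // Extract the address and mark the pointer's provenance as exposed.
    let addr: usize = ptr::from_ref(value).expose_provenance();
    // Rebuild a pointer from nothing but the address.
    let rebuilt: *const u32 = ptr::with_exposed_provenance(addr);
    // SAFETY: the provenance of `value` was exposed above, `value` is
    // still live, and we only read through the rebuilt pointer.
    unsafe { *rebuilt }
}
```

`roundtrip_through_usize(&99)` returns 99. Without the preceding expose_provenance call (or an equivalent `as usize` cast), no exposed provenance would justify the read, and the program would have undefined behavior.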

If at all possible, we encourage code to be ported to Strict Provenance APIs, thus avoiding the need for Exposed Provenance. Maximizing the amount of such code is a major win for avoiding specification complexity and for facilitating adoption of tools like CHERI and Miri that can be a big help in increasing the confidence in (unsafe) Rust code. However, we acknowledge that this is not always possible, and offer Exposed Provenance as a way to explicitly “opt out” of the well-defined semantics of Strict Provenance, and “opt in” to the unclear semantics of integer-to-pointer casts.

core::ptr::non_null

pub struct NonNull<T>
where
    T: PointeeSized,
{
    pointer: {unknown},
}

*mut T but non-zero and covariant.

This is often the correct thing to use when building data structures using raw pointers, but is ultimately more dangerous to use because of its additional properties. If you’re not sure if you should use NonNull<T>, just use *mut T!

Unlike *mut T, the pointer must always be non-null, even if the pointer is never dereferenced. This is so that enums may use this forbidden value as a discriminant – Option<NonNull<T>> has the same size as *mut T. However the pointer may still dangle if it isn’t dereferenced.
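
A brief sketch of what the non-null requirement buys: NonNull::new rejects null up front, and the forbidden value is reused for the None case. The function name `non_null_demo` is illustrative.

```rust
use std::mem::size_of;
use std::ptr::{self, NonNull};

/// Returns (null is rejected, a real pointer is accepted,
/// `Option<NonNull<i32>>` is as small as `*mut i32`).
fn non_null_demo() -> (bool, bool, bool) {
    let mut value = 5;
    let from_null = NonNull::new(ptr::null_mut::<i32>());
    let from_value = NonNull::new(&mut value as *mut i32);
    (
        from_null.is_none(),
        from_value.is_some(),
        size_of::<Option<NonNull<i32>>>() == size_of::<*mut i32>(),
    )
}
```

`non_null_demo()` returns `(true, true, true)`.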

Unlike *mut T, NonNull<T> is covariant over T. This is usually the correct choice for most data structures and safe abstractions, such as Box, Rc, Arc, Vec, and LinkedList.

In rare cases, if your type exposes a way to mutate the value of T through a NonNull<T>, and you need to prevent unsoundness from variance (for example, if T could be a reference with a shorter lifetime), you should add a field to make your type invariant, such as PhantomData<Cell<T>> or PhantomData<&'a mut T>.

Example of a type that must be invariant:

use std::cell::Cell;
use std::marker::PhantomData;
struct Invariant<T> {
    ptr: std::ptr::NonNull<T>,
    _invariant: PhantomData<Cell<T>>,
}

Notice that NonNull<T> has a From instance for &T. However, this does not change the fact that mutating through a (pointer derived from a) shared reference is undefined behavior unless the mutation happens inside an UnsafeCell<T>. The same goes for creating a mutable reference from a shared reference. When using this From instance without an UnsafeCell<T>, it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation.

Representation

Thanks to the null pointer optimization, NonNull<T> and Option<NonNull<T>> are guaranteed to have the same size and alignment:

use std::ptr::NonNull;

assert_eq!(size_of::<NonNull<i16>>(), size_of::<Option<NonNull<i16>>>());
assert_eq!(align_of::<NonNull<i16>>(), align_of::<Option<NonNull<i16>>>());

assert_eq!(size_of::<NonNull<str>>(), size_of::<Option<NonNull<str>>>());
assert_eq!(align_of::<NonNull<str>>(), align_of::<Option<NonNull<str>>>());

codeintel::block_77e2389dd8080b8d

pub struct Rc<T, A = Global>
where
    T: ?Sized,
    A: Allocator,
{
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
    alloc: A,
}

codeintel::block_77e2389dd8080b8d::Rc

T: ?Sized

core::marker

pub trait Sized
where
    Self: MetaSized,

Types with a constant size known at compile time.

All type parameters have an implicit bound of Sized. The special syntax ?Sized can be used to remove this bound if it’s not appropriate.

struct Foo<T>(T);
struct Bar<T: ?Sized>(T);

// struct FooUse(Foo<[i32]>); // error: Sized is not implemented for [i32]
struct BarUse(Bar<[i32]>); // OK

The one exception is the implicit Self type of a trait. A trait does not have an implicit Sized bound as this is incompatible with trait objects where, by definition, the trait needs to work with all possible implementors, and thus could be any size.

Although Rust will let you bind Sized to a trait, you won’t be able to use it to form a trait object later:

trait Foo { }
trait Bar: Sized { }

struct Impl;
impl Foo for Impl { }
impl Bar for Impl { }

let x: &dyn Foo = &Impl;    // OK
// let y: &dyn Bar = &Impl; // error: the trait `Bar` cannot be made into an object

codeintel::block_77e2389dd8080b8d::Rc

A: Allocator

codeintel::block_77e2389dd8080b8d::Rc

ptr: NonNull<RcBox<T>>

codeintel::block_77e2389dd8080b8d

struct RcBox<T>
where
    T: ?Sized,
{
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}


codeintel::block_77e2389dd8080b8d::Rc

phantom: PhantomData<RcBox<T>>


codeintel::block_77e2389dd8080b8d::Rc

alloc: A


codeintel::block_77e2389dd8080b8d::RcBox

T: ?Sized


codeintel::block_77e2389dd8080b8d::RcBox

strong: Cell<usize>


usize

The pointer-sized unsigned integer type.

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.
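The byte counts above can be checked on whatever target you compile for. This is a small sketch using only the standard library; the helper name `usize_bytes` is made up for illustration.

```rust
use std::mem::size_of;

/// Width of `usize` in bytes on the compilation target
/// (hypothetical helper, not part of the article's code).
fn usize_bytes() -> usize {
    size_of::<usize>()
}
```

On mainstream targets, `usize_bytes()` matches `size_of::<*const u8>()`, and multiplying it by 8 gives `usize::BITS`.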


codeintel::block_77e2389dd8080b8d::RcBox

weak: Cell<usize>


codeintel::block_77e2389dd8080b8d::RcBox

value: T


codeintel::block_77e2389dd8080b8d

fn main()


std::macros

macro_rules! println

Prints to the standard output, with a newline.

On all platforms, the newline is the LINE FEED character (\n/U+000A) alone (no additional CARRIAGE RETURN (\r/U+000D)).

This macro uses the same syntax as format, but writes to the standard output instead. See std::fmt for more information.

The println! macro will lock the standard output on each call. If you call println! within a hot loop, this behavior may be the bottleneck of the loop. To avoid this, lock stdout with io::stdout().lock():

use std::io::{stdout, Write};

let mut lock = stdout().lock();
writeln!(lock, "hello world").unwrap();

Use println! only for the primary output of your program. Use eprintln! instead to print error and progress messages.

See the formatting documentation in std::fmt for details of the macro argument syntax.

Panics

Panics if writing to io::stdout fails.

Writing to non-blocking stdout can cause an error, which will lead this macro to panic.

Examples

println!(); // prints just a newline
println!("hello there!");
println!("format {} arguments", "some");
let local_variable = "some";
println!("format {local_variable} arguments");


Property accepts one or more names of counters (identifiers), each one optionally followed by an integer. The integer gives the value that the counter is set to on each occurrence of the element.

(Edge 12, Firefox 1, Safari 3, Chrome 2, IE 8, Opera 9.2)

Syntax: [ <counter-name> <integer>? | <reversed-counter-name> <integer>? ]+ | none

MDN Reference


Manipulate the value of existing counters.

(Edge 12, Firefox 1, Safari 3, Chrome 2, IE 8, Opera 9.2)

Syntax: [ <counter-name> <integer>? ]+ | none

MDN Reference


In combination with ‘float’ and ‘position’, determines the type of box or boxes that are generated for an element.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 7)

Syntax: [ <display-outside> || <display-inside> ] | <display-listitem> | <display-internal> | <display-box> | <display-legacy>

MDN Reference


Determines which page-based occurrence of a given element is applied to a counter or string value.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 8, Opera 4)

Syntax: normal | none | [ <content-replacement> | <content-list> ] [/ [ <string> | <counter> ]+ ]?

MDN Reference


Sets the background color of an element.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 3.5)

Syntax: <color>

MDN Reference


The color of the border around all four edges of an element.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 3.5)

Syntax: <color>{1,4}

MDN Reference


extern crate nom

nom, eating data byte by byte

nom is a parser combinator library with a focus on safe parsing, streaming patterns, and as much as possible zero copy.

Example

use nom::{
  IResult,
  bytes::complete::{tag, take_while_m_n},
  combinator::map_res,
  sequence::tuple};

#[derive(Debug,PartialEq)]
pub struct Color {
  pub red:     u8,
  pub green:   u8,
  pub blue:    u8,
}

fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
  u8::from_str_radix(input, 16)
}

fn is_hex_digit(c: char) -> bool {
  c.is_digit(16)
}

fn hex_primary(input: &str) -> IResult<&str, u8> {
  map_res(
    take_while_m_n(2, 2, is_hex_digit),
    from_hex
  )(input)
}

fn hex_color(input: &str) -> IResult<&str, Color> {
  let (input, _) = tag("#")(input)?;
  let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;

  Ok((input, Color { red, green, blue }))
}

fn main() {
  assert_eq!(hex_color("#2F14DF"), Ok(("", Color {
    red: 47,
    green: 20,
    blue: 223,
  })));
}

The code is available on GitHub

There are a few guides with more details about how to write parsers, or the error management system. You can also check out the recipes module that contains examples of common patterns.

Looking for a specific combinator? Read the “choose a combinator” guide

If you are upgrading to nom 5.0, please read the migration document.

Parser combinators

Parser combinators are an approach to parsers that is very different from software like lex and yacc. Instead of writing the grammar in a separate syntax and generating the corresponding code, you use very small functions with very specific purposes, like “take 5 bytes”, or “recognize the word ‘HTTP’”, and assemble them in meaningful patterns like “recognize ‘HTTP’, then a space, then a version”. The resulting code is small, and looks like the grammar you would have written with other parser approaches.

This gives us a few advantages:

The parsers are small and easy to write
The parser components are easy to reuse (if they’re general enough, please add them to nom!)
The parser components are easy to test separately (unit tests and property-based tests)
The parser combination code looks close to the grammar you would have written
You can build partial parsers, specific to the data you need at the moment, and ignore the rest

Here is an example of one such parser, to recognize text between parentheses:

use nom::{
  IResult,
  sequence::delimited,
  // see the "streaming/complete" paragraph lower for an explanation of these submodules
  character::complete::char,
  bytes::complete::is_not
};

fn parens(input: &str) -> IResult<&str, &str> {
  delimited(char('('), is_not(")"), char(')'))(input)
}

It defines a function named parens which will recognize a sequence of the character (, the longest byte array not containing ), then the character ), and will return the byte array in the middle.

Here is another parser, written without using nom’s combinators this time:

use nom::{IResult, Err, Needed};

fn take4(i: &[u8]) -> IResult<&[u8], &[u8]>{
  if i.len() < 4 {
    Err(Err::Incomplete(Needed::new(4)))
  } else {
    Ok((&i[4..], &i[0..4]))
  }
}

This function takes a byte array as input, and tries to consume 4 bytes. Writing all the parsers manually, like this, is dangerous, despite Rust’s safety features. There are still a lot of mistakes one can make. That’s why nom provides a list of functions to help in developing parsers.

With functions, you would write it like this:

use nom::{IResult, bytes::streaming::take};
fn take4(input: &str) -> IResult<&str, &str> {
  take(4u8)(input)
}

A parser in nom is a function which, for an input type I, an output type O and an optional error type E, will have the following signature:

fn parser(input: I) -> IResult<I, O, E>;

Or like this, if you don’t want to specify a custom error type (it will be Error<I> by default):

fn parser(input: I) -> IResult<I, O>;

IResult is an alias for the Result type:

use nom::{Needed, error::Error};

type IResult<I, O, E = Error<I>> = Result<(I, O), Err<E>>;

enum Err<E> {
  Incomplete(Needed),
  Error(E),
  Failure(E),
}

It can have the following values:

A correct result Ok((I,O)) with the first element being the remainder of the input (not parsed yet), and the second the output value;
An error Err(Err::Error(c)) with c an error that can be built from the input position and a parser specific error
An error Err(Err::Incomplete(Needed)) indicating that more input is necessary. Needed can indicate how much data is needed
An error Err(Err::Failure(c)). It works like the Error case, except it indicates an unrecoverable error: We cannot backtrack and test another parser

Please refer to the “choose a combinator” guide for an exhaustive list of parsers. See also the rest of the documentation here.

Making new parsers with function combinators

nom is based on functions that generate parsers, with a signature like this: (arguments) -> impl Fn(Input) -> IResult<Input, Output, Error>. The arguments of a combinator can be direct values (like take which uses a number of bytes or character as argument) or even other parsers (like delimited which takes as argument 3 parsers, and returns the result of the second one if all are successful).

Here are some examples:

use nom::IResult;
use nom::bytes::complete::{tag, take};
fn abcd_parser(i: &str) -> IResult<&str, &str> {
  tag("abcd")(i) // will consume bytes if the input begins with "abcd"
}

fn take_10(i: &[u8]) -> IResult<&[u8], &[u8]> {
  take(10u8)(i) // will consume and return 10 bytes of input
}

Combining parsers

There are higher level patterns, like the alt combinator, which provides a choice between multiple parsers. If one branch fails, it tries the next, and returns the result of the first parser that succeeds:

use nom::IResult;
use nom::branch::alt;
use nom::bytes::complete::tag;

let mut alt_tags = alt((tag("abcd"), tag("efgh")));

assert_eq!(alt_tags(&b"abcdxxx"[..]), Ok((&b"xxx"[..], &b"abcd"[..])));
assert_eq!(alt_tags(&b"efghxxx"[..]), Ok((&b"xxx"[..], &b"efgh"[..])));
assert_eq!(alt_tags(&b"ijklxxx"[..]), Err(nom::Err::Error((&b"ijklxxx"[..], nom::error::ErrorKind::Tag))));

The opt combinator makes a parser optional. If the child parser returns an error, opt will still succeed and return None:

use nom::{IResult, combinator::opt, bytes::complete::tag};
fn abcd_opt(i: &[u8]) -> IResult<&[u8], Option<&[u8]>> {
  opt(tag("abcd"))(i)
}

assert_eq!(abcd_opt(&b"abcdxxx"[..]), Ok((&b"xxx"[..], Some(&b"abcd"[..]))));
assert_eq!(abcd_opt(&b"efghxxx"[..]), Ok((&b"efghxxx"[..], None)));

many0 applies a parser 0 or more times, and returns a vector of the aggregated results:

use nom::{IResult, multi::many0, bytes::complete::tag};
use std::str;

fn multi(i: &str) -> IResult<&str, Vec<&str>> {
  many0(tag("abcd"))(i)
}

let a = "abcdef";
let b = "abcdabcdef";
let c = "azerty";
assert_eq!(multi(a), Ok(("ef",     vec!["abcd"])));
assert_eq!(multi(b), Ok(("ef",     vec!["abcd", "abcd"])));
assert_eq!(multi(c), Ok(("azerty", Vec::new())));

Here are some basic combinators available:

opt: Will make the parser optional (if it returns the O type, the new parser returns Option<O>)
many0: Will apply the parser 0 or more times (if it returns the O type, the new parser returns Vec<O>)
many1: Will apply the parser 1 or more times

There are more complex (and more useful) parsers like tuple, which is used to apply a series of parsers then assemble their results.

Example with tuple:

use nom::{error::ErrorKind, Needed,
number::streaming::be_u16,
bytes::streaming::{tag, take},
sequence::tuple};

let mut tpl = tuple((be_u16, take(3u8), tag("fg")));

assert_eq!(
  tpl(&b"abcdefgh"[..]),
  Ok((
    &b"h"[..],
    (0x6162u16, &b"cde"[..], &b"fg"[..])
  ))
);
assert_eq!(tpl(&b"abcde"[..]), Err(nom::Err::Incomplete(Needed::new(2))));
let input = &b"abcdejk"[..];
assert_eq!(tpl(input), Err(nom::Err::Error((&input[5..], ErrorKind::Tag))));

But you can also use a sequence of combinators written in imperative style, thanks to the ? operator:

use nom::{IResult, bytes::complete::tag};

#[derive(Debug, PartialEq)]
struct A {
  a: u8,
  b: u8
}

fn ret_int1(i:&[u8]) -> IResult<&[u8], u8> { Ok((i,1)) }
fn ret_int2(i:&[u8]) -> IResult<&[u8], u8> { Ok((i,2)) }

fn f(i: &[u8]) -> IResult<&[u8], A> {
  // if successful, the parser returns `Ok((remaining_input, output_value))` that we can destructure
  let (i, _) = tag("abcd")(i)?;
  let (i, a) = ret_int1(i)?;
  let (i, _) = tag("efgh")(i)?;
  let (i, b) = ret_int2(i)?;

  Ok((i, A { a, b }))
}

let r = f(b"abcdefghX");
assert_eq!(r, Ok((&b"X"[..], A{a: 1, b: 2})));

Streaming / Complete

Some of nom’s modules have streaming or complete submodules. They hold different variants of the same combinators.

A streaming parser assumes that we might not have all of the input data. This can happen with some network protocol or large file parsers, where the input buffer can be full and need to be resized or refilled.

A complete parser assumes that we already have all of the input data. This will be the common case with small files that can be read entirely to memory.

Here is how it works in practice:

use nom::{IResult, Err, Needed, error::{Error, ErrorKind}, bytes, character};

fn take_streaming(i: &[u8]) -> IResult<&[u8], &[u8]> {
  bytes::streaming::take(4u8)(i)
}

fn take_complete(i: &[u8]) -> IResult<&[u8], &[u8]> {
  bytes::complete::take(4u8)(i)
}

// both parsers will take 4 bytes as expected
assert_eq!(take_streaming(&b"abcde"[..]), Ok((&b"e"[..], &b"abcd"[..])));
assert_eq!(take_complete(&b"abcde"[..]), Ok((&b"e"[..], &b"abcd"[..])));

// if the input is smaller than 4 bytes, the streaming parser
// will return `Incomplete` to indicate that we need more data
assert_eq!(take_streaming(&b"abc"[..]), Err(Err::Incomplete(Needed::new(1))));

// but the complete parser will return an error
assert_eq!(take_complete(&b"abc"[..]), Err(Err::Error(Error::new(&b"abc"[..], ErrorKind::Eof))));

// the alpha0 function recognizes 0 or more alphabetic characters
fn alpha0_streaming(i: &str) -> IResult<&str, &str> {
  character::streaming::alpha0(i)
}

fn alpha0_complete(i: &str) -> IResult<&str, &str> {
  character::complete::alpha0(i)
}

// if there's a clear limit to the recognized characters, both parsers work the same way
assert_eq!(alpha0_streaming("abcd;"), Ok((";", "abcd")));
assert_eq!(alpha0_complete("abcd;"), Ok((";", "abcd")));

// but when there's no limit, the streaming version returns `Incomplete`, because it cannot
// know if more input data should be recognized. The whole input could be "abcd;", or
// "abcde;"
assert_eq!(alpha0_streaming("abcd"), Err(Err::Incomplete(Needed::new(1))));

// while the complete version knows that all of the data is there
assert_eq!(alpha0_complete("abcd"), Ok(("", "abcd")));

Going further: Read the guides, check out the recipes!


nom::internal

pub trait Parser<I, O, E>

All nom parsers implement this trait


core

pub mod ops

Overloadable operators.

Implementing these traits allows you to overload certain operators.

Some of these traits are imported by the prelude, so they are available in every Rust program. Only operators backed by traits can be overloaded. For example, the addition operator (+) can be overloaded through the Add trait, but since the assignment operator (=) has no backing trait, there is no way of overloading its semantics. Additionally, this module does not provide any mechanism to create new operators. If traitless overloading or custom operators are required, you should look toward macros to extend Rust’s syntax.

Implementations of operator traits should be unsurprising in their respective contexts, keeping in mind their usual meanings and operator precedence. For example, when implementing Mul, the operation should have some resemblance to multiplication (and share expected properties like associativity).

Note that the && and || operators are currently not supported for overloading. Due to their short circuiting nature, they require a different design from traits for other operators like BitAnd. Designs for them are under discussion.

Many of the operators take their operands by value. In non-generic contexts involving built-in types, this is usually not a problem. However, using these operators in generic code, requires some attention if values have to be reused as opposed to letting the operators consume them. One option is to occasionally use clone. Another option is to rely on the types involved providing additional operator implementations for references. For example, for a user-defined type T which is supposed to support addition, it is probably a good idea to have both T and &T implement the traits Add<T> and Add<&T> so that generic code can be written without unnecessary cloning.

Examples

This example creates a Point struct that implements Add and Sub, and then demonstrates adding and subtracting two Points.

use std::ops::{Add, Sub};

#[derive(Debug, Copy, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Add for Point {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        Self {x: self.x + other.x, y: self.y + other.y}
    }
}

impl Sub for Point {
    type Output = Self;

    fn sub(self, other: Self) -> Self {
        Self {x: self.x - other.x, y: self.y - other.y}
    }
}

assert_eq!(Point {x: 3, y: 3}, Point {x: 1, y: 0} + Point {x: 2, y: 3});
assert_eq!(Point {x: -1, y: -3}, Point {x: 1, y: 0} - Point {x: 2, y: 3});

See the documentation for each trait for an example implementation.

The Fn, FnMut, and FnOnce traits are implemented by types that can be invoked like functions. Note that Fn takes &self, FnMut takes &mut self and FnOnce takes self. These correspond to the three kinds of methods that can be invoked on an instance: call-by-reference, call-by-mutable-reference, and call-by-value. The most common use of these traits is to act as bounds to higher-level functions that take functions or closures as arguments.

Taking a Fn as a parameter:

fn call_with_one<F>(func: F) -> usize
    where F: Fn(usize) -> usize
{
    func(1)
}

let double = |x| x * 2;
assert_eq!(call_with_one(double), 2);

Taking a FnMut as a parameter:

fn do_twice<F>(mut func: F)
    where F: FnMut()
{
    func();
    func();
}

let mut x: usize = 1;
{
    let add_two_to_x = || x += 2;
    do_twice(add_two_to_x);
}

assert_eq!(x, 5);

Taking a FnOnce as a parameter:

fn consume_with_relish<F>(func: F)
    where F: FnOnce() -> String
{
    // `func` consumes its captured variables, so it cannot be run more
    // than once
    println!("Consumed: {}", func());

    println!("Delicious!");

    // Attempting to invoke `func()` again will throw a `use of moved
    // value` error for `func`
}

let x = String::from("x");
let consume_and_return_x = move || x;
consume_with_relish(consume_and_return_x);

// `consume_and_return_x` can no longer be invoked at this point


core::ops::range

pub struct RangeInclusive<Idx> {
    pub(crate) start: Idx,
    pub(crate) end: Idx,
    pub(crate) exhausted: bool,
}

A range bounded inclusively below and above (start..=end).

The RangeInclusive start..=end contains all values with x >= start and x <= end. It is empty unless start <= end.

This iterator is fused, but the specific values of start and end after iteration has finished are unspecified other than that .is_empty() will return true once no more values will be produced.

Examples

The start..=end syntax is a RangeInclusive:

assert_eq!((3..=5), std::ops::RangeInclusive::new(3, 5));
assert_eq!(3 + 4 + 5, (3..=5).sum());

let arr = [0, 1, 2, 3, 4];
assert_eq!(arr[ ..  ], [0, 1, 2, 3, 4]);
assert_eq!(arr[ .. 3], [0, 1, 2      ]);
assert_eq!(arr[ ..=3], [0, 1, 2, 3   ]);
assert_eq!(arr[1..  ], [   1, 2, 3, 4]);
assert_eq!(arr[1.. 3], [   1, 2      ]);
assert_eq!(arr[1..=3], [   1, 2, 3   ]); // This is a `RangeInclusive`


core::macros::builtin

macro derive

Attribute macro used to apply derive macros.

See the reference for more info.


core::fmt::macros

macro Debug

Derive macro generating an impl of the trait Debug.


core::cmp

macro PartialEq

Derive macro generating an impl of the trait PartialEq. The behavior of this macro is described in detail here.


core::cmp

macro Eq

Derive macro generating an impl of the trait Eq. The behavior of this macro is described in detail here.


core::default

macro Default

Derive macro generating an impl of the trait Default.


codeintel::block_0be8c6ed0e493506

pub struct RangeSet(RangeSetBlaze<usize>)


extern crate range_set_blaze

range-set-blaze

Integer sets as fast, sorted, integer ranges with full set operations

The integers can be any size (u8 to u128) and may be signed (i8 to i128). The set operations include union, intersection, difference, symmetric difference, and complement.

The crate’s main struct is RangeSetBlaze, a set of integers. See the documentation for details.

Unlike the standard BTreeSet and HashSet, RangeSetBlaze does not store every integer in the set. Rather, it stores sorted & disjoint ranges of integers in a cache-efficient BTreeMap. It differs from other interval libraries – that we know of – by offering full set operations and by being optimized for sets of clumpy integers.

We can construct a RangeSetBlaze from unsorted & redundant integers (or ranges). When the inputs are clumpy, construction will be linear in the number of inputs and set operations will be sped up quadratically.

The crate’s main trait is SortedDisjoint. It is implemented by iterators of sorted & disjoint ranges of integers. See the SortedDisjoint documentation for details.

With any SortedDisjoint iterator we can perform set operations in one pass through the ranges and with minimal (constant) memory. It enforces the “sorted & disjoint” constraint at compile time. This trait is inspired by the SortedIterator trait from the sorted_iter crate. SortedDisjoint differs from its inspiration by specializing on disjoint integer ranges.

The crate supports no_std, WASM, and embedded projects. Use the command:

cargo add range-set-blaze --features "alloc" --no-default-features

Benchmarks

See the benchmarks for performance comparisons with other range-related crates.

Generally, for many tasks involving clumpy integers and ranges, RangeSetBlaze is much faster than alternatives.

The benchmarks are in the benches directory. To run them, use cargo bench.

Articles

Nine Rules for Creating Fast, Safe, and Compatible Data Structures in Rust: Lessons from RangeSetBlaze in Towards Data Science. It provides a high-level overview of the crate and its design.
Nine Rules for Running Rust on the Web and on Embedded: Practical Lessons from Porting range-set-blaze to no_std and WASM in Towards Data Science. It covers porting to “no_std”.
Check AI-Generated Code Perfectly and Automatically: My Experience Applying Kani’s Formal Verification to ChatGPT-Suggested Rust Code. Shows how to prove overflow safety.
Nine Rules to Formally Validate Rust Algorithms with Dafny in Towards Data Science. It shows how to formally validate one of the crate’s algorithms.
Nine Rules for SIMD Acceleration of your Rust Code: General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x in Towards Data Science
Also see: CHANGELOG

Examples

Example 1

Here we take the union (operator “|”) of two RangeSetBlaze’s:


use range_set_blaze::RangeSetBlaze;

// a is the set of integers from 100 to 499 (inclusive) and 501 to 999 (inclusive)
let a = RangeSetBlaze::from_iter([100..=499, 501..=999]);
// b is the set of integers -20 and the range 400 to 599 (inclusive)
let b = RangeSetBlaze::from_iter([-20..=-20, 400..=599]);
// c is the union of a and b, namely -20 and 100 to 999 (inclusive)
let c = a | b;
assert_eq!(c, RangeSetBlaze::from_iter([-20..=-20, 100..=999]));

Example 2

In biology, suppose we want to find the intron regions of a gene but we are given only the transcription region and the exon regions.


We create a RangeSetBlaze for the transcription region and a RangeSetBlaze for all the exon regions. Then we take the difference between the transcription region and exon regions to find the intron regions.

use range_set_blaze::RangeSetBlaze;

let line = "chr15   29370   37380   29370,32358,36715   30817,32561,37380";

// split the line on white space
let mut iter = line.split_whitespace();
let chr = iter.next().unwrap();

// Parse the start and end of the transcription region into a RangeSetBlaze
let trans_start: i32 = iter.next().unwrap().parse().unwrap();
let trans_end: i32 = iter.next().unwrap().parse().unwrap();
let trans = RangeSetBlaze::from_iter([trans_start..=trans_end]);
assert_eq!(trans, RangeSetBlaze::from_iter([29370..=37380]));

// Parse the start and end of the exons into a RangeSetBlaze
let exon_starts = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ends = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ranges = exon_starts
    .zip(exon_ends)
    .map(|(s, e)| s.unwrap()..=e.unwrap());
let exons = RangeSetBlaze::from_iter(exon_ranges);
assert_eq!(
    exons,
    RangeSetBlaze::from_iter([29370..=30817, 32358..=32561, 36715..=37380])
);

// Use 'set difference' to find the introns
let intron = trans - exons;
assert_eq!(intron, RangeSetBlaze::from_iter([30818..=32357, 32562..=36714]));
for range in intron.ranges() {
    let (start, end) = range.into_inner();
    println!("{chr}\t{start}\t{end}");
}


range_set_blaze

pub struct RangeSetBlaze<T>
where
    T: Integer,
{
    len: <T as Integer>::SafeLen,
    btree_map: BTreeMap<T, T>,
}

A set of integers stored as sorted & disjoint ranges.

Internally, it stores the ranges in a cache-efficient BTreeMap.

`RangeSetBlaze` Constructors

You can also create RangeSetBlaze’s from unsorted and overlapping integers (or ranges). However, if you know that your input is sorted and disjoint, you can speed up construction.

Here are the constructors, followed by a description of the performance, and then some examples.

Methods	Input	Notes
`new`/`default`
`from_iter`/`collect`	integer iterator
`from_iter`/`collect`	ranges iterator
`from_slice`	slice of integers	Fast, but nightly-only
`from_sorted_disjoint`/`into_range_set_blaze`	`SortedDisjoint` iterator
`from_sorted_starts`	`SortedStarts` iterator
`from` /`into`	array of integers

Constructor Performance

The from_iter/collect constructors are designed to work fast on ‘clumpy’ data. By ‘clumpy’, we mean that the number of ranges needed to represent the data is small compared to the number of input integers. To understand this, consider the internals of the constructors:

Internally, the from_iter/collect constructors take these steps:

collect adjacent integers/ranges into disjoint ranges, O(n₁)
sort the disjoint ranges by their start, O(n₂ log n₂)
merge adjacent ranges, O(n₂)
create a BTreeMap from the now sorted & disjoint ranges, O(n₃ log n₃)

where n₁ is the number of input integers/ranges, n₂ is the number of disjoint & unsorted ranges, and n₃ is the final number of sorted & disjoint ranges.

For example, an input of

3, 2, 1, 4, 5, 6, 7, 0, 8, 8, 8, 100, 1, becomes
0..=8, 100..=100, 1..=1, and then
0..=8, 1..=1, 100..=100, and finally
0..=8, 100..=100.
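The first three steps can be sketched with the standard library alone. This is an illustrative reconstruction of the steps as described, not the crate's actual implementation; the function names `clump` and `sort_and_merge` are hypothetical, and step 4 (building the BTreeMap) is left out.

```rust
use std::ops::RangeInclusive;

/// Step 1 (hypothetical sketch): fold integers that touch or fall inside
/// the current range into it, emitting ranges in input order.
fn clump(input: &[i32]) -> Vec<RangeInclusive<i32>> {
    let mut out = Vec::new();
    let mut current: Option<(i32, i32)> = None;
    for &x in input {
        current = match current {
            // x touches or lies inside the current range: grow it.
            Some((lo, hi)) if x >= lo.saturating_sub(1) && x <= hi.saturating_add(1) => {
                Some((lo.min(x), hi.max(x)))
            }
            // Otherwise emit the current range and start a new one at x.
            Some((lo, hi)) => {
                out.push(lo..=hi);
                Some((x, x))
            }
            None => Some((x, x)),
        };
    }
    if let Some((lo, hi)) = current {
        out.push(lo..=hi);
    }
    out
}

/// Steps 2 and 3 (hypothetical sketch): sort by start, then merge ranges
/// that overlap or touch.
fn sort_and_merge(mut ranges: Vec<RangeInclusive<i32>>) -> Vec<RangeInclusive<i32>> {
    ranges.sort_by_key(|r| *r.start());
    let mut out: Vec<RangeInclusive<i32>> = Vec::new();
    for r in ranges {
        if let Some(last) = out.last_mut() {
            if *r.start() <= last.end().saturating_add(1) {
                // Extend the previous range only if r reaches further.
                if r.end() > last.end() {
                    *last = *last.start()..=*r.end();
                }
                continue;
            }
        }
        out.push(r);
    }
    out
}
```

Feeding the example input through `clump` yields the three unsorted ranges shown above, and `sort_and_merge` then reduces them to the final two.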

What is the effect of clumpy data? Notice that if n₂ ≈ sqrt(n₁), then construction is O(n₁). (Indeed, as long as n₂ ≤ n₁/ln(n₁), then construction is O(n₁).) Moreover, we’ll see that set operations are O(n₃). Thus, if n₃ ≈ sqrt(n₁) then set operations are O(sqrt(n₁)), a quadratic improvement over an O(n₁) implementation that ignores the clumps.

The from_slice constructor typically provides a constant-time speed up for array-like collections of clumpy integers. On a representative benchmark, the speed up was 7×. The method works by scanning the input for blocks of consecutive integers, and then using from_iter on the results. Where available, it uses SIMD instructions. It is nightly only and enabled by the from_slice feature.

Constructor Examples

use range_set_blaze::prelude::*;

// Create an empty set with 'new' or 'default'.
let a0 = RangeSetBlaze::<i32>::new();
let a1 = RangeSetBlaze::<i32>::default();
assert!(a0 == a1 && a0.is_empty());

// 'from_iter'/'collect': From an iterator of integers.
// Duplicates and out-of-order elements are fine.
let a0 = RangeSetBlaze::from_iter([3, 2, 1, 100, 1]);
let a1: RangeSetBlaze<i32> = [3, 2, 1, 100, 1].into_iter().collect();
assert!(a0 == a1 && a0.to_string() == "1..=3, 100..=100");

// 'from_iter'/'collect': From an iterator of inclusive ranges, start..=end.
// Overlapping, out-of-order, and empty ranges are fine.
#[allow(clippy::reversed_empty_ranges)]
let a0 = RangeSetBlaze::from_iter([1..=2, 2..=2, -10..=-5, 1..=0]);
#[allow(clippy::reversed_empty_ranges)]
let a1: RangeSetBlaze<i32> = [1..=2, 2..=2, -10..=-5, 1..=0].into_iter().collect();
assert!(a0 == a1 && a0.to_string() == "-10..=-5, 1..=2");

// 'from_slice': From any array-like collection of integers.
// Nightly-only, but faster than 'from_iter'/'collect' on integers.
#[cfg(feature = "from_slice")]
let a0 = RangeSetBlaze::from_slice(vec![3, 2, 1, 100, 1]);
#[cfg(feature = "from_slice")]
assert!(a0.to_string() == "1..=3, 100..=100");

// If we know the ranges are already sorted and disjoint,
// we can avoid work and use 'from_sorted_disjoint'/'into_range_set_blaze'.
let a0 = RangeSetBlaze::from_sorted_disjoint(CheckSortedDisjoint::from([-10..=-5, 1..=2]));
let a1: RangeSetBlaze<i32> = CheckSortedDisjoint::from([-10..=-5, 1..=2]).into_range_set_blaze();
assert!(a0 == a1 && a0.to_string() == "-10..=-5, 1..=2");

// For compatibility with `BTreeSet`, we also support
// 'from'/'into' from arrays of integers.
let a0 = RangeSetBlaze::from([3, 2, 1, 100, 1]);
let a1: RangeSetBlaze<i32> = [3, 2, 1, 100, 1].into();
assert!(a0 == a1 && a0.to_string() == "1..=3, 100..=100");

`RangeSetBlaze` Set Operations

You can perform set operations on RangeSetBlazes using operators.

Set Operation	Operator	Multiway Method
union	`a | b`	`[a, b, c].union()`
intersection	`a & b`	`[a, b, c].intersection()`
difference	`a - b`	n/a
symmetric difference	`a ^ b`	n/a
complement	`!a`	n/a

RangeSetBlaze also implements many other methods, such as insert, pop_first and split_off. Many of these methods match those of BTreeSet.

Set Operation Performance

Every operation is implemented as

a single pass over the sorted & disjoint ranges
the construction of a new RangeSetBlaze

Thus, applying multiple operators creates intermediate RangeSetBlaze’s. If you wish, you can avoid these intermediate RangeSetBlaze’s by switching to the SortedDisjoint API. The last example below demonstrates this.
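To make the "single pass" idea concrete, here is a rough sketch of a union over two already sorted and disjoint range lists, written against plain slices. It only illustrates the technique and is not how the crate implements it; `union_sorted_disjoint` is a hypothetical name.

```rust
use std::ops::RangeInclusive;

/// Union of two sorted, disjoint range lists in one pass
/// (hypothetical helper for illustration only).
fn union_sorted_disjoint(
    a: &[RangeInclusive<i32>],
    b: &[RangeInclusive<i32>],
) -> Vec<RangeInclusive<i32>> {
    let mut out: Vec<RangeInclusive<i32>> = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        // Take whichever pending range starts first.
        let take_a = j >= b.len() || (i < a.len() && a[i].start() <= b[j].start());
        let next = if take_a {
            i += 1;
            a[i - 1].clone()
        } else {
            j += 1;
            b[j - 1].clone()
        };
        if let Some(last) = out.last_mut() {
            // Overlapping or touching: grow the last output range.
            if *next.start() <= last.end().saturating_add(1) {
                if next.end() > last.end() {
                    *last = *last.start()..=*next.end();
                }
                continue;
            }
        }
        out.push(next);
    }
    out
}
```

Each input range is visited once, so the work is proportional to the number of ranges rather than the number of integers they cover.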

Set Operation Examples

use range_set_blaze::prelude::*;

let a = RangeSetBlaze::from_iter([1..=2, 5..=100]);
let b = RangeSetBlaze::from_iter([2..=6]);

// Union of two 'RangeSetBlaze's.
let result = &a | &b;
// Alternatively, we can take ownership via 'a | b'.
assert_eq!(result.to_string(), "1..=100");

// Intersection of two 'RangeSetBlaze's.
let result = &a & &b; // Alternatively, 'a & b'.
assert_eq!(result.to_string(), "2..=2, 5..=6");

// Set difference of two 'RangeSetBlaze's.
let result = &a - &b; // Alternatively, 'a - b'.
assert_eq!(result.to_string(), "1..=1, 7..=100");

// Symmetric difference of two 'RangeSetBlaze's.
let result = &a ^ &b; // Alternatively, 'a ^ b'.
assert_eq!(result.to_string(), "1..=1, 3..=4, 7..=100");

// complement of a 'RangeSetBlaze'.
let result = !&a; // Alternatively, '!a'.
assert_eq!(
    result.to_string(),
    "-2147483648..=0, 3..=4, 101..=2147483647"
);

// Multiway union of 'RangeSetBlaze's.
let c = RangeSetBlaze::from_iter([2..=2, 6..=200]);
let result = [&a, &b, &c].union();
assert_eq!(result.to_string(), "1..=200");

// Multiway intersection of 'RangeSetBlaze's.
let result = [&a, &b, &c].intersection();
assert_eq!(result.to_string(), "2..=2, 6..=6");

// Applying multiple operators
let result0 = &a - (&b | &c); // Creates an intermediate 'RangeSetBlaze'.
// Alternatively, we can use the 'SortedDisjoint' API and avoid the intermediate 'RangeSetBlaze'.
let result1 = RangeSetBlaze::from_sorted_disjoint(a.ranges() - (b.ranges() | c.ranges()));
assert!(result0 == result1 && result0.to_string() == "1..=1");

RangeSetBlaze Comparisons

We can compare RangeSetBlazes using the following operators: <, <=, >, >=. Following the convention of BTreeSet, these comparisons are lexicographic. See cmp for more examples.

Use the is_subset and is_superset methods to check if one RangeSetBlaze is a subset or superset of another.

Use ==, != to check if two RangeSetBlazes are equal or not.
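Since the comparison operators follow BTreeSet's convention, a sketch with plain `BTreeSet`s may help show why `<` and `is_subset` answer different questions. The helpers `set` and `compare` are hypothetical names introduced only for this illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: builds a `BTreeSet` from a slice.
fn set(items: &[i32]) -> BTreeSet<i32> {
    items.iter().copied().collect()
}

/// Returns `(a < b, a.is_subset(&b))` for the two sets.
fn compare(a: &[i32], b: &[i32]) -> (bool, bool) {
    let (a, b) = (set(a), set(b));
    // `<` compares element by element in sorted order (lexicographic);
    // `is_subset` checks containment and ignores ordering entirely.
    (a < b, a.is_subset(&b))
}
```

For example, `compare(&[1, 2, 3], &[1, 4])` is `(true, false)` because 2 < 4 at the first differing position, while `compare(&[2, 3], &[1, 2, 3])` is `(false, true)`.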

Additional Examples

See the module-level documentation for additional examples.


codeintel::block_0be8c6ed0e493506::RangeSet

pub fn contains(&self, idx: usize) -> bool


self: &RangeSet


idx: usize


bool

The boolean type.

The bool represents a value, which could only be either true or false. If you cast a bool into an integer, true will be 1 and false will be 0.

Basic usage

bool implements various traits, such as BitAnd, BitOr, Not, etc., which allow us to perform boolean operations using &, | and !.

if requires a bool value as its conditional. assert!, which is an important macro in testing, checks whether an expression is true and panics if it isn’t.

let bool_val = true & false | false;
assert!(!bool_val);

Examples

A trivial example of the usage of bool:

let praise_the_borrow_checker = true;

// using the `if` conditional
if praise_the_borrow_checker {
    println!("oh, yeah!");
} else {
    println!("what?!!");
}

// ... or, a match pattern
match praise_the_borrow_checker {
    true => println!("keep praising!"),
    false => println!("you should praise!"),
}

Also, since bool implements the Copy trait, we don’t have to worry about the move semantics (just like the integer and float primitives).

Now an example of bool cast to integer type:

assert_eq!(true as i32, 1);
assert_eq!(false as i32, 0);


codeintel::block_0be8c6ed0e493506::RangeSet

0: RangeSetBlaze<usize>


range_set_blaze::RangeSetBlaze

impl<T> RangeSetBlaze<T>
pub fn contains(&self, value: T) -> bool
where
    // Bounds from impl:
    T: Integer,

Returns true if the set contains an element equal to the value.

Examples

use range_set_blaze::RangeSetBlaze;

let set = RangeSetBlaze::from_iter([1, 2, 3]);
assert_eq!(set.contains(1), true);
assert_eq!(set.contains(4), false);


codeintel::block_0be8c6ed0e493506

pub fn from_range_str(s: &str) -> Result<RangeSet, &'static str>


s: &str


str

String slices.

See also the std::str module.

The str type, also called a ‘string slice’, is the most primitive string type. It is usually seen in its borrowed form, &str. It is also the type of string literals, &'static str.

Basic Usage

String literals are string slices:

let hello_world = "Hello, World!";

Here we have declared a string slice initialized with a string literal. String literals have a static lifetime, which means the string hello_world is guaranteed to be valid for the duration of the entire program. We can explicitly specify hello_world’s lifetime as well:

let hello_world: &'static str = "Hello, world!";

Representation

A &str is made up of two components: a pointer to some bytes, and a length. You can look at these with the as_ptr and len methods:

use std::slice;
use std::str;

let story = "Once upon a time...";

let ptr = story.as_ptr();
let len = story.len();

// story has nineteen bytes
assert_eq!(19, len);

// We can re-build a str out of ptr and len. This is all unsafe because
// we are responsible for making sure the two components are valid:
let s = unsafe {
    // First, we build a &[u8]...
    let slice = slice::from_raw_parts(ptr, len);

    // ... and then convert that slice into a string slice
    str::from_utf8(slice)
};

assert_eq!(s, Ok(story));

Note: This example shows the internals of &str. unsafe should not be used to get a string slice under normal circumstances. Use as_str instead.

Invariant

Rust libraries may assume that string slices are always valid UTF-8.

Constructing a non-UTF-8 string slice is not immediate undefined behavior, but any function called on a string slice may assume that it is valid UTF-8, which means that a non-UTF-8 string slice can lead to undefined behavior down the road.
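As a small illustration of how safe code upholds this invariant, the sketch below goes through `str::from_utf8`, which validates the bytes before handing out a `&str`. The wrapper name `checked_str` is an assumption made for this example.

```rust
use std::str;

/// Hypothetical helper: returns the text if `bytes` is valid UTF-8, otherwise `None`.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    // `str::from_utf8` checks the bytes, so safe code never sees a non-UTF-8 `&str`.
    str::from_utf8(bytes).ok()
}
```

Only the `unsafe` constructor `from_utf8_unchecked` skips this check, which is why the invariant becomes the caller's responsibility there.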


core::result

pub enum Result<T, E> {
    Ok( /* … */ ),
    Err( /* … */ ),
}

Result is a type that represents either success (Ok) or failure (Err).

See the module documentation for details.


'static


let mut rangeset: RangeSet


codeintel::block_0be8c6ed0e493506::RangeSet

pub fn default() -> Self

Returns the “default value” for a type.

Default values are often some kind of initial value, identity value, or anything else that may make sense as a default.

Examples

Using built-in default values:

let i: i8 = Default::default();
let (x, y): (Option<String>, f64) = Default::default();
let (a, b, (c, d)): (i32, u32, (bool, bool)) = Default::default();

Making your own:

enum Kind {
    A,
    B,
    C,
}

impl Default for Kind {
    fn default() -> Self { Kind::A }
}


core::str

pub const fn is_empty(&self) -> bool

Returns true if self has a length of zero bytes.

Examples

let s = "";
assert!(s.is_empty());

let s = "not empty";
assert!(!s.is_empty());


core::result::Result

Ok(T)

Contains the success value


let range: Result<RangeInclusive<usize>, &'static str>


core::str

pub fn split<P>(&self, pat: P) -> Split<'_, P>
where
    P: Pattern,

Returns an iterator over substrings of this string slice, separated by characters matched by a pattern.

The pattern can be a &str, char, a slice of chars, or a function or closure that determines if a character matches.

If there are no matches the full string slice is returned as the only item in the iterator.

Iterator behavior

The returned iterator will be a DoubleEndedIterator if the pattern allows a reverse search and forward/reverse search yields the same elements. This is true for, e.g., char, but not for &str.

If the pattern allows a reverse search but its results might differ from a forward search, the rsplit method can be used.
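The difference between the two cases can be sketched as follows. The function names are hypothetical; the point is that `.rev()` is available after splitting on a `char`, while a `&str` pattern needs `rsplit` to walk from the end.

```rust
/// Splits on a `char` pattern and walks the pieces from the back.
fn split_reversed(s: &str) -> Vec<&str> {
    // Splitting on a `char` yields a `DoubleEndedIterator`, so `.rev()` compiles.
    s.split(',').rev().collect()
}

/// Splits on a `&str` pattern, searching from the end of the string.
fn rsplit_on_colons(s: &str) -> Vec<&str> {
    // `s.split("::").rev()` would not compile; `rsplit` is the way to go backwards.
    s.rsplit("::").collect()
}
```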

Examples

Simple patterns:

let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
assert_eq!(v, ["Mary", "had", "a", "little", "lamb"]);

let v: Vec<&str> = "".split('X').collect();
assert_eq!(v, [""]);

let v: Vec<&str> = "lionXXtigerXleopard".split('X').collect();
assert_eq!(v, ["lion", "", "tiger", "leopard"]);

let v: Vec<&str> = "lion::tiger::leopard".split("::").collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);

let v: Vec<&str> = "AABBCC".split("DD").collect();
assert_eq!(v, ["AABBCC"]);

let v: Vec<&str> = "abc1def2ghi".split(char::is_numeric).collect();
assert_eq!(v, ["abc", "def", "ghi"]);

let v: Vec<&str> = "lionXtigerXleopard".split(char::is_uppercase).collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);

If the pattern is a slice of chars, split on each occurrence of any of the characters:

let v: Vec<&str> = "2020-11-03 23:59".split(&['-', ' ', ':', '@'][..]).collect();
assert_eq!(v, ["2020", "11", "03", "23", "59"]);

A more complex pattern, using a closure:

let v: Vec<&str> = "abc1defXghi".split(|c| c == '1' || c == 'X').collect();
assert_eq!(v, ["abc", "def", "ghi"]);

If a string contains multiple contiguous separators, you will end up with empty strings in the output:

let x = "||||a||b|c".to_string();
let d: Vec<_> = x.split('|').collect();

assert_eq!(d, &["", "", "", "", "a", "", "b", "c"]);

Contiguous separators are separated by the empty string.

let x = "(///)".to_string();
let d: Vec<_> = x.split('/').collect();

assert_eq!(d, &["(", "", "", ")"]);

Separators at the start or end of a string are neighbored by empty strings.

let d: Vec<_> = "010".split("0").collect();
assert_eq!(d, &["", "1", ""]);

When the empty string is used as a separator, it separates every character in the string, along with the beginning and end of the string.

let f: Vec<_> = "rust".split("").collect();
assert_eq!(f, &["", "r", "u", "s", "t", ""]);

Contiguous separators can lead to possibly surprising behavior when whitespace is used as the separator. This code is correct:

let x = "    a  b c".to_string();
let d: Vec<_> = x.split(' ').collect();

assert_eq!(d, &["", "", "", "", "a", "", "b", "c"]);

It does not give you:

assert_eq!(d, &["a", "b", "c"]);

Use split_whitespace for this behavior.
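For comparison, a minimal sketch of the `split_whitespace` variant on the same input; `words` is just an illustrative wrapper name.

```rust
/// Splits on runs of whitespace, dropping the empty pieces that `split(' ')` keeps.
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}
```

With the input `"    a  b c"` this yields `["a", "b", "c"]`, and an all-whitespace string yields no items at all.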


core::iter::traits::iterator::Iterator

pub trait Iterator
pub fn map<B, F>(self, f: F) -> Map<Self, F>
where
    Self: Sized,
    F: FnMut(Self::Item) -> B,

Takes a closure and creates an iterator which calls that closure on each element.

map() transforms one iterator into another, by means of its argument: something that implements FnMut. It produces a new iterator which calls this closure on each element of the original iterator.

If you are good at thinking in types, you can think of map() like this: If you have an iterator that gives you elements of some type A, and you want an iterator of some other type B, you can use map(), passing a closure that takes an A and returns a B.

map() is conceptually similar to a for loop. However, as map() is lazy, it is best used when you’re already working with other iterators. If you’re doing some sort of looping for a side effect, it’s considered more idiomatic to use for than map().

Examples

Basic usage:

let a = [1, 2, 3];

let mut iter = a.iter().map(|x| 2 * x);

assert_eq!(iter.next(), Some(2));
assert_eq!(iter.next(), Some(4));
assert_eq!(iter.next(), Some(6));
assert_eq!(iter.next(), None);

If you’re doing some sort of side effect, prefer for to map():

// don't do this:
(0..5).map(|x| println!("{x}"));

// it won't even execute, as it is lazy. Rust will warn you about this.

// Instead, use a for-loop:
for x in 0..5 {
    println!("{x}");
}


codeintel::block_0be8c6ed0e493506

fn get_range(s: &str) -> Result<RangeInclusive<usize>, &'static str>


let range: RangeInclusive<usize>


core::result::Result

impl<T, E> ops::Try for Result<T, E>
fn branch(self) -> ControlFlow<Self::Residual, Self::Output>

Used in ? to decide whether the operator should produce a value (because this returned ControlFlow::Continue) or propagate a value back to the caller (because this returned ControlFlow::Break).

Examples

#![feature(try_trait_v2)]
use std::ops::{ControlFlow, Try};

assert_eq!(Ok::<_, String>(3).branch(), ControlFlow::Continue(3));
assert_eq!(Err::<String, _>(3).branch(), ControlFlow::Break(Err(3)));

assert_eq!(Some(3).branch(), ControlFlow::Continue(3));
assert_eq!(None::<String>.branch(), ControlFlow::Break(None));

assert_eq!(ControlFlow::<String, _>::Continue(3).branch(), ControlFlow::Continue(3));
assert_eq!(
    ControlFlow::<_, String>::Break(3).branch(),
    ControlFlow::Break(ControlFlow::Break(3)),
);
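On stable Rust, `branch` cannot be called directly, but its effect on `Result` can be approximated by hand. The sketch below is a simplification of what `?` expands to, not the compiler's exact desugaring, and both function names are made up for the comparison.

```rust
use std::num::ParseIntError;

/// Uses `?` to propagate a parse failure to the caller.
fn double_with_question_mark(s: &str) -> Result<i32, ParseIntError> {
    let n = s.trim().parse::<i32>()?;
    Ok(n * 2)
}

/// Roughly the same logic with the `?` spelled out as a `match`.
fn double_with_match(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.trim().parse::<i32>() {
        // Corresponds to `ControlFlow::Continue`: keep going with the value.
        Ok(value) => value,
        // Corresponds to `ControlFlow::Break`: return early, converting the error.
        Err(e) => return Err(From::from(e)),
    };
    Ok(n * 2)
}
```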


range_set_blaze::RangeSetBlaze

impl<T> RangeSetBlaze<T>
pub fn ranges_insert(&mut self, range: RangeInclusive<T>) -> bool
where
    // Bounds from impl:
    T: Integer,

Adds a range to the set.

Returns whether any values were newly inserted. That is:

If the set did not previously contain some value in the range, true is returned.
If the set already contained every value in the range, false is returned, and the entry is not updated.

Performance

Inserting n items will take O(n log m) time, where n is the number of inserted items and m is the number of ranges in self. When n is large, consider using |, which takes O(n+m) time.

Examples

use range_set_blaze::RangeSetBlaze;

let mut set = RangeSetBlaze::new();

assert_eq!(set.ranges_insert(2..=5), true);
assert_eq!(set.ranges_insert(5..=6), true);
assert_eq!(set.ranges_insert(3..=4), false);
assert_eq!(set.len(), 5usize);


let s: &str


core::str

pub fn trim(&self) -> &str

Returns a string slice with leading and trailing whitespace removed.

‘Whitespace’ is defined according to the terms of the Unicode Derived Core Property White_Space, which includes newlines.

Examples

let s = "\n Hello\tworld\t\n";

assert_eq!("Hello\tworld", s.trim());


let num: usize


codeintel::block_0be8c6ed0e493506

fn digits_exact(input: &str) -> nom::IResult<&str, usize>


let result: (Option<usize>, &str, Option<usize>)


codeintel::block_0be8c6ed0e493506

fn range(input: &str) -> nom::IResult<&str, (Option<usize>, &str, Option<usize>)>


core::result::Result

impl<T, E> Result<T, E>
pub const fn map_err<F, O>(self, op: O) -> Result<T, F>
where
    O: FnOnce(E) -> F + Destruct,

Maps a Result<T, E> to Result<T, F> by applying a function to a contained Err value, leaving an Ok value untouched.

This function can be used to pass through a successful result while handling an error.

Examples

fn stringify(x: u32) -> String { format!("error code: {x}") }

let x: Result<u32, u32> = Ok(2);
assert_eq!(x.map_err(stringify), Ok(2));

let x: Result<u32, u32> = Err(13);
assert_eq!(x.map_err(stringify), Err("error code: 13".to_string()));


core::option::Option

Some(T)

Some value of type T.


let left: usize


let right: usize


core::num

pub const fn saturating_sub(self, rhs: Self) -> Self

Saturating integer subtraction. Computes self - rhs, saturating at the numeric bounds instead of overflowing.

Examples

assert_eq!(100usize.saturating_sub(27), 73);
assert_eq!(13usize.saturating_sub(127), 0);


core::option::Option

None

No value.


core::num

pub const MAX: Self = 18446744073709551615 (0xFFFFFFFFFFFFFFFF)

The largest value that can be represented by this integer type (2⁶⁴ − 1 on 64-bit targets).

Examples

assert_eq!(usize::MAX, 18446744073709551615);


core::result::Result

Err(E)

Contains the error value


input: &str


nom::internal

pub type IResult<I, O, E = error::Error<I>> = Result<(I, O), Err<E>>

Holds the result of parsing functions

It depends on the input type I, the output type O, and the error type E (by default nom::error::Error<I>).

The Ok side is a pair containing the remainder of the input (the part of the data that was not parsed) and the produced value. The Err side contains an instance of nom::Err.

Outside of the parsing code, you can use the Finish::finish method to convert it to a more common result type
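The "remainder first, output second" convention can be sketched without nom at all. The parser below is hand-rolled for illustration: the name `leading_digits` and the plain `&'static str` error type are assumptions, and real nom parsers return `nom::Err` on failure instead.

```rust
/// Hand-rolled parser in the shape nom uses: on success, `(remaining input, output)`.
/// The `&'static str` error type is a simplification chosen for this sketch.
fn leading_digits(input: &str) -> Result<(&str, usize), &'static str> {
    // Find where the leading run of ASCII digits ends.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected at least one digit");
    }
    let (digits, rest) = input.split_at(end);
    let value = digits.parse::<usize>().map_err(|_| "number too large")?;
    // The unparsed remainder comes first so the next parser can pick up from it.
    Ok((rest, value))
}
```

Returning the remainder is what lets combinators chain parsers: each one consumes a prefix and passes the rest along.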


core::option

pub enum Option<T> {
    None,
    Some( /* … */ ),
}

The Option type. See the module level documentation for more.


let sep: impl FnMut(&str) -> Result<(&str, &str), Err<Error<&str>>>


nom

pub mod branch

Choice combinators


nom::branch

pub fn alt<I, O, E, List>(mut l: List) -> impl FnMut(I) -> IResult<I, O, E>
where
    I: Clone,
    E: ParseError<I>,
    List: Alt<I, O, E>,

Tests a list of parsers one by one until one succeeds.

It takes as argument a tuple of parsers. There is a maximum of 21 parsers. If you need more, it is possible to nest them in other alt calls, like this: alt((parser_a, alt((parser_b, parser_c))))

use nom::branch::alt;
use nom::character::complete::{alpha1, digit1};
use nom::error::ErrorKind;
use nom::{error_position, Err, IResult};

fn parser(input: &str) -> IResult<&str, &str> {
  alt((alpha1, digit1))(input)
}

// the first parser, alpha1, recognizes the input
assert_eq!(parser("abc"), Ok(("", "abc")));

// the first parser returns an error, so alt tries the second one
assert_eq!(parser("123456"), Ok(("", "123456")));

// both parsers failed, and with the default error type, alt will return the last error
assert_eq!(parser(" "), Err(Err::Error(error_position!(" ", ErrorKind::Digit))));

With a custom error type, it is possible to have alt return the error of the parser that went the farthest in the input data


nom

pub mod bytes

Parsers recognizing bytes streams


nom::bytes

pub mod complete

Parsers recognizing bytes streams, complete input version


nom::bytes::complete

pub fn tag<T, Input, Error>(tag: T) -> impl Fn(Input) -> IResult<Input, Input, Error>
where
    Error: ParseError<Input>,
    Input: InputTake + Compare<T>,
    T: InputLength + Clone,

Recognizes a pattern

The input data will be compared to the tag combinator’s argument and will return the part of the input that matches the argument

It will return Err(Err::Error((_, ErrorKind::Tag))) if the input doesn’t match the pattern

Example

use nom::bytes::complete::tag;
use nom::error::{Error, ErrorKind};
use nom::{Err, IResult};

fn parser(s: &str) -> IResult<&str, &str> {
  tag("Hello")(s)
}

assert_eq!(parser("Hello, World!"), Ok((", World!", "Hello")));
assert_eq!(parser("Something"), Err(Err::Error(Error::new("Something", ErrorKind::Tag))));
assert_eq!(parser(""), Err(Err::Error(Error::new("", ErrorKind::Tag))));


nom

pub mod sequence

Combinators applying parsers in sequence


nom::sequence

pub fn tuple<I, O, E, List>(mut l: List) -> impl FnMut(I) -> IResult<I, O, E>
where
    E: ParseError<I>,
    List: Tuple<I, O, E>,

Applies a tuple of parsers one by one and returns their results as a tuple. There is a maximum of 21 parsers

use nom::sequence::tuple;
use nom::character::complete::{alpha1, digit1};
let mut parser = tuple((alpha1, digit1, alpha1));

assert_eq!(parser("abc123def"), Ok(("", ("abc", "123", "def"))));
assert_eq!(parser("123def"), Err(Err::Error(("123def", ErrorKind::Alpha))));


codeintel::block_0be8c6ed0e493506

fn digits(input: &str) -> nom::IResult<&str, Option<usize>>


nom

pub mod combinator

General purpose combinators


nom::combinator

pub fn all_consuming<I, O, E, F>(mut f: F) -> impl FnMut(I) -> IResult<I, O, E>
where
    E: ParseError<I>,
    I: InputLength,
    F: Parser<I, O, E>,

Succeeds if all the input has been consumed by its child parser.

use nom::combinator::all_consuming;
use nom::character::complete::alpha1;

let mut parser = all_consuming(alpha1);

assert_eq!(parser("abcd"), Ok(("", "abcd")));
assert_eq!(parser("abcd;"),Err(Err::Error((";", ErrorKind::Eof))));
assert_eq!(parser("123abcd;"),Err(Err::Error(("123abcd;", ErrorKind::Alpha))));


nom::combinator

pub fn map_res<I, O1, O2, E, E2, F, G>(mut parser: F, mut f: G) -> impl FnMut(I) -> IResult<I, O2, E>
where
    I: Clone,
    E: FromExternalError<I, E2>,
    F: Parser<I, O1, E>,
    G: FnMut(O1) -> Result<O2, E2>,

Applies a function returning a Result over the result of a parser.

use nom::character::complete::digit1;
use nom::combinator::map_res;

let mut parse = map_res(digit1, |s: &str| s.parse::<u8>());

// the parser will convert the result of digit1 to a number
assert_eq!(parse("123"), Ok(("", 123)));

// this will fail if digit1 fails
assert_eq!(parse("abc"), Err(Err::Error(("abc", ErrorKind::Digit))));

// this will fail if the mapped function fails (a `u8` is too small to hold `123456`)
assert_eq!(parse("123456"), Err(Err::Error(("123456", ErrorKind::MapRes))));


nom

pub mod character

Character specific parsers and combinators

Functions recognizing specific characters


nom::character

pub mod complete

Character specific parsers and combinators, complete input version.

Functions recognizing specific characters.


nom::character::complete

pub fn digit1<T, E>(input: T) -> IResult<T, T, E>
where
    E: ParseError<T>,
    T: InputTakeAtPosition,
    <T as InputTakeAtPosition>::Item: AsChar,

Recognizes one or more ASCII numerical characters: 0-9

Complete version: Will return an error if there’s not enough input data, or the whole input if no terminating token is found (a non digit character).

Example

fn parser(input: &str) -> IResult<&str, &str> {
    digit1(input)
}

assert_eq!(parser("21c"), Ok(("c", "21")));
assert_eq!(parser("c1"), Err(Err::Error(Error::new("c1", ErrorKind::Digit))));
assert_eq!(parser(""), Err(Err::Error(Error::new("", ErrorKind::Digit))));

Parsing an integer

You can use digit1 in combination with map_res to parse an integer:

fn parser(input: &str) -> IResult<&str, u32> {
  map_res(digit1, str::parse)(input)
}

assert_eq!(parser("416"), Ok(("", 416)));
assert_eq!(parser("12b"), Ok(("b", 12)));
assert!(parser("b").is_err());


core::str

pub fn parse<F>(&self) -> Result<F, F::Err>
where
    F: FromStr,

Parses this string slice into another type.

Because parse is so general, it can cause problems with type inference. As such, parse is one of the few times you’ll see the syntax affectionately known as the ‘turbofish’: ::<>. This helps the inference algorithm understand specifically which type you’re trying to parse into.

parse can parse into any type that implements the FromStr trait.

Errors

Will return Err if it’s not possible to parse this string slice into the desired type.

Examples

Basic usage:

let four: u32 = "4".parse().unwrap();

assert_eq!(4, four);

Using the ‘turbofish’ instead of annotating four:

let four = "4".parse::<u32>();

assert_eq!(Ok(4), four);

Failing to parse:

let nope = "j".parse::<u32>();

assert!(nope.is_err());


nom::internal

impl<'a, I, O, E, F> Parser<I, O, E> for F
fn parse(&mut self, i: I) -> IResult<I, O, E>
where
    // Bounds from impl:
    F: FnMut(I) -> IResult<I, O, E> + 'a,

A parser takes in input type, and returns a Result containing either the remaining input and the output value, or an error


nom::character::complete

pub fn digit0<T, E>(input: T) -> IResult<T, T, E>
where
    E: ParseError<T>,
    T: InputTakeAtPosition,
    <T as InputTakeAtPosition>::Item: AsChar,

Recognizes zero or more ASCII numerical characters: 0-9

Complete version: Will return an error if there’s not enough input data, or the whole input if no terminating token is found (a non digit character).

Example

fn parser(input: &str) -> IResult<&str, &str> {
    digit0(input)
}

assert_eq!(parser("21c"), Ok(("c", "21")));
assert_eq!(parser("21"), Ok(("", "21")));
assert_eq!(parser("a21c"), Ok(("a21c", "")));
assert_eq!(parser(""), Ok(("", "")));


core::ops::bit

fn not(self) -> bool

Performs the unary ! operation.

Examples

assert_eq!(!true, false);
assert_eq!(!false, true);
assert_eq!(!1u8, 254);
assert_eq!(!0u8, 255);


core::result::Result

impl<T, E> Result<T, E>
pub const fn map<U, F>(self, op: F) -> Result<U, E>
where
    F: FnOnce(T) -> U + Destruct,

Maps a Result<T, E> to Result<U, E> by applying a function to a contained Ok value, leaving an Err value untouched.

This function can be used to compose the results of two functions.

Examples

Print the numbers on each line of a string multiplied by two.

let line = "1\n2\n3\n4\n";

for num in line.lines() {
    match num.parse::<i32>().map(|i| i * 2) {
        Ok(n) => println!("{n}"),
        Err(..) => {}
    }
}


The position CSS property sets how an element is positioned in a document. The top, right, bottom, and left properties determine the final location of positioned elements.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 4)

Syntax: static | relative | absolute | sticky | fixed

MDN Reference


Specifies how far an absolutely positioned box’s left margin edge is offset to the right of the left edge of the box’s ‘containing block’.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 5.5, Opera 5)

Syntax: <length> | <percentage> | auto

MDN Reference


Specifies how far an absolutely positioned box’s top margin edge is offset below the top edge of the box’s ‘containing block’.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 5, Opera 6)

Syntax: <length> | <percentage> | auto

MDN Reference


Shorthand property for setting border width, style and color

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 3.5)

Syntax: <line-width> || <line-style> || <color>

MDN Reference



Specifies the width of the content area, padding area or border area (depending on ‘box-sizing’) of certain boxes.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 3.5)

MDN Reference


Shorthand property to set values for the thickness of the margin area. If left is omitted, it is the same as right. If bottom is omitted, it is the same as top. If right is omitted, it is the same as top. Negative values for margin properties are allowed, but there may be implementation-specific limits.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 3, Opera 3.5)

Syntax: <length-percentage> | auto

MDN Reference


std

pub mod io

Traits, helpers, and type definitions for core I/O functionality.

The std::io module contains a number of common things you’ll need when doing input and output. The most core part of this module is the Read and Write traits, which provide the most general interface for reading and writing input and output.

Read and Write

Because they are traits, Read and Write are implemented by a number of other types, and you can implement them for your types too. As such, you’ll see a few different types of I/O throughout the documentation in this module: Files, TcpStreams, and sometimes even Vec<T>s. For example, Read adds a read method, which we can use on Files:

use std::io;
use std::io::prelude::*;
use std::fs::File;

fn main() -> io::Result<()> {
    let mut f = File::open("foo.txt")?;
    let mut buffer = [0; 10];

    // read up to 10 bytes
    let n = f.read(&mut buffer)?;

    println!("The bytes: {:?}", &buffer[..n]);
    Ok(())
}

Read and Write are so important, implementors of the two traits have a nickname: readers and writers. So you’ll sometimes see ‘a reader’ instead of ‘a type that implements the Read trait’. Much easier!
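Because `Read` is a trait, the same code works on in-memory readers too, which is handy when no file is available. The sketch below mirrors the `File` example with a generic function; `read_prefix` is a hypothetical name used only here.

```rust
use std::io::{self, Read};

/// Reads up to 10 bytes from any reader, mirroring the `File` example without touching disk.
fn read_prefix<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buffer = [0u8; 10];
    // A single `read` call may return fewer bytes than the buffer can hold.
    let n = reader.read(&mut buffer)?;
    Ok(buffer[..n].to_vec())
}
```

Both a byte slice such as `&b"hello"[..]` and a `std::io::Cursor` wrapping a `Vec<u8>` can be passed as the reader.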

Seek and BufRead

Beyond that, there are two important traits that are provided: Seek and BufRead. Both of these build on top of a reader to control how the reading happens. Seek lets you control where the next byte is coming from:

use std::io;
use std::io::prelude::*;
use std::io::SeekFrom;
use std::fs::File;

fn main() -> io::Result<()> {
    let mut f = File::open("foo.txt")?;
    let mut buffer = [0; 10];

    // skip to the last 10 bytes of the file
    f.seek(SeekFrom::End(-10))?;

    // read up to 10 bytes
    let n = f.read(&mut buffer)?;

    println!("The bytes: {:?}", &buffer[..n]);
    Ok(())
}

BufRead uses an internal buffer to provide a number of other ways to read, but to show it off, we’ll need to talk about buffers in general. Keep reading!

BufReader and BufWriter

Byte-based interfaces are unwieldy and can be inefficient, as we’d need to be making near-constant calls to the operating system. To help with this, std::io comes with two structs, BufReader and BufWriter, which wrap readers and writers. The wrapper uses a buffer, reducing the number of calls and providing nicer methods for accessing exactly what you want.

For example, BufReader works with the BufRead trait to add extra methods to any reader:

use std::io;
use std::io::prelude::*;
use std::io::BufReader;
use std::fs::File;

fn main() -> io::Result<()> {
    let f = File::open("foo.txt")?;
    let mut reader = BufReader::new(f);
    let mut buffer = String::new();

    // read a line into buffer
    reader.read_line(&mut buffer)?;

    println!("{buffer}");
    Ok(())
}

BufWriter doesn’t add any new ways of writing; it just buffers every call to write:

use std::io;
use std::io::prelude::*;
use std::io::BufWriter;
use std::fs::File;

fn main() -> io::Result<()> {
    let f = File::create("foo.txt")?;
    {
        let mut writer = BufWriter::new(f);

        // write a byte to the buffer
        writer.write(&[42])?;

    } // the buffer is flushed once writer goes out of scope

    Ok(())
}
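Relying on the drop to flush works, but any error that occurs during that implicit flush is ignored. A sketch of the explicit alternative follows; `write_buffered` is a hypothetical helper, and an in-memory `Vec<u8>` can stand in for the file.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `data` through a `BufWriter` and flushes explicitly so errors are reported.
fn write_buffered<W: Write>(sink: W, data: &[u8]) -> io::Result<W> {
    let mut writer = BufWriter::new(sink);
    writer.write_all(data)?;
    // Dropping a `BufWriter` also flushes, but an error at that point is discarded.
    writer.flush()?;
    // Hand the underlying writer back to the caller.
    writer.into_inner().map_err(|e| e.into_error())
}
```

Calling `write_buffered(Vec::new(), &[42])` returns the vector containing the single byte that was written.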

Standard input and output

A very common source of input is standard input:

use std::io;

fn main() -> io::Result<()> {
    let mut input = String::new();

    io::stdin().read_line(&mut input)?;

    println!("You typed: {}", input.trim());
    Ok(())
}

Note that you cannot use the ? operator in functions that do not return a Result<T, E>. Instead, you can call .unwrap() or match on the return value to catch any possible errors:

use std::io;

let mut input = String::new();

io::stdin().read_line(&mut input).unwrap();

And a very common source of output is standard output:

use std::io;
use std::io::prelude::*;

fn main() -> io::Result<()> {
    io::stdout().write(&[42])?;
    Ok(())
}

Of course, using io::stdout directly is less common than something like println!.

Iterator types

A large number of the structures provided by std::io are for various ways of iterating over I/O. For example, Lines is used to split over lines:

use std::io;
use std::io::prelude::*;
use std::io::BufReader;
use std::fs::File;

fn main() -> io::Result<()> {
    let f = File::open("foo.txt")?;
    let reader = BufReader::new(f);

    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}

Functions

There are a number of functions that offer access to various features. For example, we can use three of these functions to copy everything from standard input to standard output:

use std::io;

fn main() -> io::Result<()> {
    io::copy(&mut io::stdin(), &mut io::stdout())?;
    Ok(())
}

io::Result

Last, but certainly not least, is io::Result. This type is used as the return type of many std::io functions that can cause an error, and can be returned from your own functions as well. Many of the examples in this module use the ? operator:

use std::io;

fn read_input() -> io::Result<()> {
    let mut input = String::new();

    io::stdin().read_line(&mut input)?;

    println!("You typed: {}", input.trim());

    Ok(())
}

The return type of read_input(), io::Result<()>, is a very common type for functions which don’t have a ‘real’ return value, but do want to return errors if they happen. In this case, the only purpose of this function is to read the line and print it, so we use ().

Platform-specific behavior

Many I/O functions throughout the standard library are documented to indicate what various library or syscalls they are delegated to. This is done to help applications both understand what’s happening under the hood as well as investigate any possibly unclear semantics. Note, however, that this is informative, not a binding contract. The implementations of many of these functions are subject to change over time and may call fewer or more syscalls/library functions.

I/O Safety

Rust follows an I/O safety discipline that is comparable to its memory safety discipline. This means that file descriptors can be exclusively owned. (Here, “file descriptor” is meant to subsume similar concepts that exist across a wide range of operating systems even if they might use a different name, such as “handle”.) An exclusively owned file descriptor is one that no other code is allowed to access in any way, but the owner is allowed to access and even close it any time. A type that owns its file descriptor should usually close it in its drop function. Types like File own their file descriptor. Similarly, file descriptors can be borrowed, granting the temporary right to perform operations on this file descriptor. This indicates that the file descriptor will not be closed for the lifetime of the borrow, but it does not imply any right to close this file descriptor, since it will likely be owned by someone else.

The platform-specific parts of the Rust standard library expose types that reflect these concepts, see os::unix and os::windows.

To uphold I/O safety, it is crucial that no code acts on file descriptors it does not own or borrow, and no code closes file descriptors it does not own. In other words, a safe function that takes a regular integer, treats it as a file descriptor, and acts on it, is unsound.

Not upholding I/O safety and acting on a file descriptor without proof of ownership can lead to misbehavior and even Undefined Behavior in code that relies on ownership of its file descriptors: a closed file descriptor could be re-allocated, so the original owner of that file descriptor is now working on the wrong file. Some code might even rely on fully encapsulating its file descriptors with no operations being performed by any other part of the program.

Note that exclusive ownership of a file descriptor does not imply exclusive ownership of the underlying kernel object that the file descriptor references (also called “open file description” on some operating systems). File descriptors basically work like Arc: when you receive an owned file descriptor, you cannot know whether there are any other file descriptors that reference the same kernel object. However, when you create a new kernel object, you know that you are holding the only reference to it. Just be careful not to lend it to anyone, since they can obtain a clone and then you can no longer know what the reference count is! In that sense, OwnedFd is like Arc and BorrowedFd<'a> is like &'a Arc (and similar for the Windows types). In particular, given a BorrowedFd<'a>, you are not allowed to close the file descriptor – just like how, given a &'a Arc, you are not allowed to decrement the reference count and potentially free the underlying object. There is no equivalent to Box for file descriptors in the standard library (that would be a type that guarantees that the reference count is 1), however, it would be possible for a crate to define a type with those semantics.
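
As a rough, Unix-only sketch of the Arc analogy above: duplicating a borrowed descriptor yields a second owned descriptor that refers to the same kernel object, so both see the same file offset. The helper names `duplicate` and `shared_offset_demo` and the scratch-file name are made up for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Seek, Write};
use std::os::fd::{AsFd, BorrowedFd, OwnedFd};
use std::path::Path;

/// Takes a *borrowed* descriptor: it may use it, but must not close it.
fn duplicate(fd: BorrowedFd<'_>) -> io::Result<OwnedFd> {
    // Comparable to cloning an `Arc`: a new owned handle to the same kernel object.
    fd.try_clone_to_owned()
}

/// Hypothetical helper: writes through one descriptor and reports the
/// offset observed through each of the two descriptors.
fn shared_offset_demo(path: &Path) -> io::Result<(u64, u64)> {
    let mut first = File::create(path)?; // `File` owns its descriptor
    let mut second = File::from(duplicate(first.as_fd())?);

    first.write_all(b"hello")?;

    // Both descriptors share one open file description, so the offset
    // advanced for `second` as well. Each `File` closes only its own
    // descriptor when it is dropped at the end of this function.
    let offsets = (first.stream_position()?, second.stream_position()?);
    Ok(offsets)
}

fn main() -> io::Result<()> {
    // Assumed scratch location; any writable path would do.
    let path = std::env::temp_dir().join(format!("io-safety-demo-{}", std::process::id()));
    let offsets = shared_offset_demo(&path)?;
    fs::remove_file(&path)?;
    println!("{offsets:?}");
    Ok(())
}
```

On a typical Linux system both offsets should come out as 5, which is exactly why owning a descriptor says nothing about exclusive access to the underlying kernel object.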

std::io

pub trait Write

A trait for objects which are byte-oriented sinks.

Implementors of the Write trait are sometimes called ‘writers’.

Writers are defined by two required methods, write and flush:

The write method will attempt to write some data into the object, returning how many bytes were successfully written.
The flush method is useful for adapters and explicit buffers themselves for ensuring that all buffered data has been pushed out to the ‘true sink’.

Writers are intended to be composable with one another. Many implementors throughout std::io take and provide types which implement the Write trait.

Examples

use std::io::prelude::*;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let data = b"some bytes";

    let mut pos = 0;
    let mut buffer = File::create("foo.txt")?;

    while pos < data.len() {
        let bytes_written = buffer.write(&data[pos..])?;
        pos += bytes_written;
    }
    Ok(())
}

The trait also provides convenience methods like write_all, which calls write in a loop until its entire input has been written.
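
To make the short-write behavior concrete, here is a minimal sketch of a writer that accepts only a few bytes per call. `ChunkedSink` and `write_in_chunks` are hypothetical names invented for this example; only `Write`, `write`, `flush`, and `write_all` come from std.

```rust
use std::io::{self, Write};

/// Hypothetical sink that accepts at most `chunk` bytes per `write` call,
/// mimicking a writer that performs short writes.
struct ChunkedSink {
    data: Vec<u8>,
    chunk: usize,
    calls: usize,
}

impl Write for ChunkedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.chunk);
        self.data.extend_from_slice(&buf[..n]);
        self.calls += 1;
        Ok(n) // may be fewer bytes than the caller offered
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // nothing is buffered beyond `data`
    }
}

/// Writes `input` with `write_all` and returns the sink for inspection.
/// `chunk` should be non-zero: a writer that keeps returning `Ok(0)`
/// makes `write_all` fail with a `WriteZero` error.
fn write_in_chunks(input: &[u8], chunk: usize) -> io::Result<ChunkedSink> {
    let mut sink = ChunkedSink { data: Vec::new(), chunk, calls: 0 };
    sink.write_all(input)?;
    Ok(sink)
}

fn main() -> io::Result<()> {
    let sink = write_in_chunks(b"some bytes", 4)?;
    println!("{} bytes in {} calls", sink.data.len(), sink.calls);
    Ok(())
}
```

With a 4-byte chunk, the 10 input bytes arrive in three `write` calls (4 + 4 + 2), which is the loop that `write_all` runs on the caller's behalf.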

extern crate base64

Correct, fast, and configurable base64 decoding and encoding. Base64 transports binary data efficiently in contexts where only plain text is allowed.

Usage

Use an Engine to decode or encode base64, configured with the base64 alphabet and padding behavior best suited to your application.

Engine setup

There is more than one way to encode a stream of bytes as “base64”. Different applications use different encoding alphabets and padding behaviors.

Encoding alphabet

Almost all base64 alphabets use A-Z, a-z, and 0-9, which gives nearly 64 characters (26 + 26 + 10 = 62), but they differ in their choice of their final 2.

Most applications use the standard alphabet specified in RFC 4648. If that’s all you need, you can get started quickly by using the pre-configured STANDARD engine, which is also available in the prelude module as shown here, if you prefer a minimal use footprint.

 use base64::prelude::*;

 assert_eq!(BASE64_STANDARD.decode(b"+uwgVQA=")?, b"\xFA\xEC\x20\x55\0");
 assert_eq!(BASE64_STANDARD.encode(b"\xFF\xEC\x20\x55\0"), "/+wgVQA=");

Other common alphabets are available in the alphabet module.

URL-safe alphabet

The standard alphabet uses + and / as its two non-alphanumeric tokens, which cannot be safely used in URLs without encoding them as %2B and %2F.

To avoid that, some applications use a “URL-safe” alphabet, which uses - and _ instead. To use that alternative alphabet, use the URL_SAFE engine. This example doesn’t use prelude to show what a more explicit use would look like.

 use base64::{engine::general_purpose::URL_SAFE, Engine as _};

 assert_eq!(URL_SAFE.decode(b"-uwgVQA=")?, b"\xFA\xEC\x20\x55\0");
 assert_eq!(URL_SAFE.encode(b"\xFF\xEC\x20\x55\0"), "_-wgVQA=");

Padding characters

Each base64 character represents 6 bits (2⁶ = 64) of the original binary data, and every 3 bytes of input binary data will encode to 4 base64 characters (8 bits × 3 = 6 bits × 4 = 24 bits).

When the input is not an even multiple of 3 bytes in length, canonical base64 encoders insert padding characters at the end, so that the output length is always a multiple of 4:

 use base64::{engine::general_purpose::STANDARD, Engine as _};

 assert_eq!(STANDARD.encode(b""),    "");
 assert_eq!(STANDARD.encode(b"f"),   "Zg==");
 assert_eq!(STANDARD.encode(b"fo"),  "Zm8=");
 assert_eq!(STANDARD.encode(b"foo"), "Zm9v");

Canonical encoding ensures that base64 encodings will be exactly the same, byte-for-byte, regardless of input length. But the = padding characters aren’t necessary for decoding, and they may be omitted by using a NO_PAD configuration:

 use base64::{engine::general_purpose::STANDARD_NO_PAD, Engine as _};

 assert_eq!(STANDARD_NO_PAD.encode(b""),    "");
 assert_eq!(STANDARD_NO_PAD.encode(b"f"),   "Zg");
 assert_eq!(STANDARD_NO_PAD.encode(b"fo"),  "Zm8");
 assert_eq!(STANDARD_NO_PAD.encode(b"foo"), "Zm9v");
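
The length arithmetic behind the padded and unpadded outputs above can be sketched without the crate. `padded_len` and `unpadded_len` are hypothetical helpers, not part of the base64 API, and they ignore overflow for very large inputs.

```rust
/// Padded output length: every started 3-byte group becomes 4 characters.
fn padded_len(input_bytes: usize) -> usize {
    (input_bytes + 2) / 3 * 4
}

/// Unpadded output length: 8 bits per input byte, 6 bits per output
/// character, rounded up to whole characters.
fn unpadded_len(input_bytes: usize) -> usize {
    (input_bytes * 8 + 5) / 6
}

fn main() {
    for input in ["", "f", "fo", "foo"] {
        println!(
            "{:?}: padded {} chars, unpadded {} chars",
            input,
            padded_len(input.len()),
            unpadded_len(input.len())
        );
    }
}
```

For the inputs "f", "fo", and "foo" this gives 4 padded characters each and 2, 3, and 4 unpadded characters, matching the assertions above.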

The pre-configured NO_PAD engines will reject inputs containing padding = characters. To encode without padding and still accept padding while decoding, create an engine with that padding mode.

 assert_eq!(STANDARD_NO_PAD.decode(b"Zm8="), Err(base64::DecodeError::InvalidPadding));

Further customization

Decoding and encoding behavior can be customized by creating an engine with an alphabet and padding configuration:

 use base64::{engine, alphabet, Engine as _};

 // bizarro-world base64: +/ as the first symbols instead of the last
 let alphabet =
     alphabet::Alphabet::new("+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
     .unwrap();

 // a very weird config that encodes with padding but requires no padding when decoding...?
 let crazy_config = engine::GeneralPurposeConfig::new()
     .with_decode_allow_trailing_bits(true)
     .with_encode_padding(true)
     .with_decode_padding_mode(engine::DecodePaddingMode::RequireNone);

 let crazy_engine = engine::GeneralPurpose::new(&alphabet, crazy_config);

 let encoded = crazy_engine.encode(b"abc 123");

Memory allocation

The decode and encode engine methods allocate memory for their results – decode returns a Vec<u8> and encode returns a String. To instead decode or encode into a buffer that you allocated, use one of the alternative methods:

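Decoding

Engine::decode returns a new Vec<u8> and always allocates.
Engine::decode_vec appends to a provided Vec<u8> and allocates only if the Vec lacks capacity.
Engine::decode_slice writes to a provided byte slice and never allocates.

Encoding

Engine::encode returns a new String and always allocates.
Engine::encode_string appends to a provided String and allocates only if the String lacks capacity.
Engine::encode_slice writes to a provided byte slice and never allocates.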

Input and output

The base64 crate can decode and encode values in memory. For any readable or writable byte stream, DecoderReader and EncoderWriter provide streaming decoding and encoding.

Decoding

 use std::io;
 use base64::{engine::general_purpose::STANDARD, read::DecoderReader};

 // Wrap stdin so that reads yield decoded bytes, then stream them to stdout.
 let mut input = io::stdin();
 let mut decoder = DecoderReader::new(&mut input, &STANDARD);
 io::copy(&mut decoder, &mut io::stdout())?;

Encoding

 use std::io;
 use base64::{engine::general_purpose::STANDARD, write::EncoderWriter};

 // Wrap stdout so that writes are encoded, then stream stdin into it.
 let mut output = io::stdout();
 let mut encoder = EncoderWriter::new(&mut output, &STANDARD);
 io::copy(&mut io::stdin(), &mut encoder)?;

Display

If you only need a base64 representation for implementing the Display trait, use Base64Display:

use base64::{display::Base64Display, engine::general_purpose::STANDARD};

let value = Base64Display::new(b"\0\x01\x02\x03", &STANDARD);
assert_eq!("base64: AAECAw==", format!("base64: {}", value));

Panics

If length calculations result in overflowing usize, a panic will result.

base64::engine

pub trait Engine
where
    Self: Send + Sync,

An Engine provides low-level encoding and decoding operations that all other higher-level parts of the API use. Users of the library will generally not need to implement this.

Different implementations offer different characteristics. The library currently ships with GeneralPurpose that offers good speed and works on any CPU, with more choices coming later, like a constant-time one when side channel resistance is called for, and vendor-specific vectorized ones for more speed.

See general_purpose::STANDARD_NO_PAD if you just want standard base64. Otherwise, when possible, it’s recommended to store the engine in a const so that references to it won’t pose any lifetime issues, and to avoid repeating the cost of engine setup.

Since almost nobody will need to implement Engine, docs for internal methods are hidden.

codeintel::block_e60e6d0ee5b71fe6

fn generate_godbolt_link(links: &mut Vec<(&str, String)>, tool: &str, code: &str, lang: &str) -> bool

links: &mut Vec<(&str, String)>

alloc::vec

pub struct Vec<T, A = Global>
where
    A: Allocator,
{
    buf: RawVec<T, A>,
    len: usize,
}

A contiguous growable array type, written as Vec<T>, short for ‘vector’.

Examples

let mut vec = Vec::new();
vec.push(1);
vec.push(2);

assert_eq!(vec.len(), 2);
assert_eq!(vec[0], 1);

assert_eq!(vec.pop(), Some(2));
assert_eq!(vec.len(), 1);

vec[0] = 7;
assert_eq!(vec[0], 7);

vec.extend([1, 2, 3]);

for x in &vec {
    println!("{x}");
}
assert_eq!(vec, [7, 1, 2, 3]);

The vec macro is provided for convenient initialization:

let mut vec1 = vec![1, 2, 3];
vec1.push(4);
let vec2 = Vec::from([1, 2, 3, 4]);
assert_eq!(vec1, vec2);

It can also initialize each element of a Vec<T> with a given value. This may be more efficient than performing allocation and initialization in separate steps, especially when initializing a vector of zeros:

let vec = vec![0; 5];
assert_eq!(vec, [0, 0, 0, 0, 0]);

// The following is equivalent, but potentially slower:
let mut vec = Vec::with_capacity(5);
vec.resize(5, 0);
assert_eq!(vec, [0, 0, 0, 0, 0]);

For more information, see Capacity and Reallocation.

Use a Vec<T> as an efficient stack:

let mut stack = Vec::new();

stack.push(1);
stack.push(2);
stack.push(3);

while let Some(top) = stack.pop() {
    // Prints 3, 2, 1
    println!("{top}");
}

Indexing

The Vec type allows access to values by index, because it implements the Index trait. An example will be more explicit:

let v = vec![0, 2, 4, 6];
println!("{}", v[1]); // it will display '2'

However be careful: if you try to access an index which isn’t in the Vec, your software will panic! You cannot do this:

let v = vec![0, 2, 4, 6];
println!("{}", v[6]); // it will panic!

Use get and get_mut if you want to check whether the index is in the Vec.
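
A small sketch of the checked alternatives; `lookup` and `double_at` are hypothetical wrappers written only for this example.

```rust
/// Returns the element at `index`, or `None` when it is out of bounds.
fn lookup(v: &[i32], index: usize) -> Option<i32> {
    v.get(index).copied()
}

/// Doubles the element at `index` if it exists and reports whether it did.
fn double_at(v: &mut [i32], index: usize) -> bool {
    match v.get_mut(index) {
        Some(x) => {
            *x *= 2;
            true
        }
        None => false,
    }
}

fn main() {
    let mut v = vec![0, 2, 4, 6];

    println!("{:?}", lookup(&v, 1)); // in bounds
    println!("{:?}", lookup(&v, 6)); // out of bounds: no panic

    double_at(&mut v, 3);
    println!("{v:?}");
}
```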

Slicing

A Vec can be mutable. On the other hand, slices are read-only objects. To get a slice, use &. Example:

fn read_slice(slice: &[usize]) {
    // ...
}

let v = vec![0, 1];
read_slice(&v);

// ... and that's all!
// you can also do it like this:
let u: &[usize] = &v;
// or like this:
let u: &[_] = &v;

In Rust, it’s more common to pass slices as arguments rather than vectors when you just want to provide read access. The same goes for String and &str.

Capacity and reallocation

The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.

For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur. However, if the vector’s length is increased to 11, it will have to reallocate, which can be slow. For this reason, it is recommended to use Vec::with_capacity whenever possible to specify how big the vector is expected to get.
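
The difference between filling the reserved space and exceeding it can be observed directly. This is a sketch; `capacity_changed` is a hypothetical helper, and it measures against the capacity the vector actually reports rather than the requested one.

```rust
/// Creates a vector with `Vec::with_capacity(requested)`, pushes
/// `capacity + extra` elements, and reports whether the capacity changed.
fn capacity_changed(requested: usize, extra: usize) -> bool {
    let mut v: Vec<u32> = Vec::with_capacity(requested);
    let initial = v.capacity(); // at least `requested`

    for i in 0..initial + extra {
        v.push(i as u32);
    }

    v.capacity() != initial
}

fn main() {
    // Filling exactly the reserved space never reallocates.
    println!("fill to capacity: changed = {}", capacity_changed(10, 0));
    // One element more forces the vector to grow.
    println!("one past capacity: changed = {}", capacity_changed(10, 1));
}
```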

Guarantees

Due to its incredibly fundamental nature, Vec makes a lot of guarantees about its design. This ensures that it’s as low-overhead as possible in the general case, and can be correctly manipulated in primitive ways by unsafe code. Note that these guarantees refer to an unqualified Vec<T>. If additional type parameters are added (e.g., to support custom allocators), overriding their defaults may change the behavior.

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

However, the pointer might not actually point to allocated memory. In particular, if you construct a Vec with capacity 0 via Vec::new, vec![], Vec::with_capacity(0), or by calling shrink_to_fit on an empty Vec, it will not allocate memory. Similarly, if you store zero-sized types inside a Vec, it will not allocate space for them. Note that in this case the Vec might not report a capacity of 0. Vec will allocate if and only if size_of::<T>() * capacity > 0. In general, Vec’s allocation details are very subtle — if you intend to allocate memory using a Vec and use it for something else (either to pass to unsafe code, or to build your own memory-backed collection), be sure to deallocate this memory by using from_raw_parts to recover the Vec and then dropping it.

If a Vec has allocated memory, then the memory it points to is on the heap (as defined by the allocator Rust is configured to use by default), and its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice), followed by capacity - len logically uninitialized, contiguous elements.

A vector containing the elements 'a' and 'b' with capacity 4 can be visualized as below. The top part is the Vec struct, it contains a pointer to the head of the allocation in the heap, length and capacity. The bottom part is the allocation on the heap, a contiguous memory block.

            ptr      len  capacity
       +--------+--------+--------+
       | 0x0123 |      2 |      4 |
       +--------+--------+--------+
            |
            v
Heap   +--------+--------+--------+--------+
       |    'a' |    'b' | uninit | uninit |
       +--------+--------+--------+--------+

uninit represents memory that is not initialized, see MaybeUninit.
Note: the ABI is not stable and Vec makes no guarantees about its memory layout (including the order of fields).

Vec will never perform a “small optimization” where elements are actually stored on the stack for two reasons:

It would make it more difficult for unsafe code to correctly manipulate a Vec. The contents of a Vec wouldn’t have a stable address if it were only moved, and it would be more difficult to determine if a Vec had actually allocated memory.
It would penalize the general case, incurring an additional branch on every access.

Vec will never automatically shrink itself, even if completely empty. This ensures no unnecessary allocations or deallocations occur. Emptying a Vec and then filling it back up to the same len should incur no calls to the allocator. If you wish to free up unused memory, use shrink_to_fit or shrink_to.
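
A quick way to see this guarantee, as a sketch with the hypothetical helper `clear_and_refill`: the capacity is the same before clearing, after clearing, and after pushing the same number of elements back in.

```rust
/// Returns the capacity before clearing, after clearing, and after
/// refilling to the original length with `push`.
fn clear_and_refill(n: usize) -> (usize, usize, usize) {
    let mut v: Vec<u64> = (0..n as u64).collect();
    let before = v.capacity();

    v.clear(); // drops the elements, keeps the buffer
    let after_clear = v.capacity();

    for i in 0..n as u64 {
        v.push(i); // fits in the existing buffer
    }

    (before, after_clear, v.capacity())
}

fn main() {
    println!("{:?}", clear_and_refill(100));

    // Giving memory back is always an explicit request.
    let mut v: Vec<u64> = Vec::with_capacity(100);
    v.shrink_to_fit();
    println!("capacity after shrink_to_fit: {}", v.capacity());
}
```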

push and insert will never (re)allocate if the reported capacity is sufficient. push and insert will (re)allocate if len == capacity. That is, the reported capacity is completely accurate, and can be relied on. It can even be used to manually free the memory allocated by a Vec if desired. Bulk insertion methods may reallocate, even when not necessary.

Vec does not guarantee any particular growth strategy when reallocating when full, nor when reserve is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized push.

It is guaranteed, in order to respect the intentions of the programmer, that all of vec![e_1, e_2, ..., e_n], vec![x; n], and Vec::with_capacity(n) produce a Vec that requests an allocation of the exact size needed for precisely n elements from the allocator, and no other size (such as, for example: a size rounded up to the nearest power of 2). The allocator will return an allocation that is at least as large as requested, but it may be larger.

It is guaranteed that the Vec::capacity method returns a value that is at least the requested capacity and not more than the allocated capacity.

The method Vec::shrink_to_fit will attempt to discard excess capacity an allocator has given to a Vec. If len == capacity, then a Vec<T> can be converted to and from a Box<[T]> without reallocating or moving the elements. Vec exploits this fact as much as reasonable when implementing common conversions such as into_boxed_slice.

Vec will not specifically overwrite any data that is removed from it, but also won’t specifically preserve it. Its uninitialized memory is scratch space that it may use however it wants. It will generally just do whatever is most efficient or otherwise easy to implement. Do not rely on removed data to be erased for security purposes. Even if you drop a Vec, its buffer may simply be reused by another allocation. Even if you zero a Vec’s memory first, that might not actually happen because the optimizer does not consider this a side-effect that must be preserved. There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.
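
The one supported pattern mentioned at the end, writing into excess capacity and then raising the length, might look like the following sketch. `first_squares` is a hypothetical function; `spare_capacity_mut` and `set_len` are the std methods doing the work.

```rust
/// Builds `[0, 1, 4, 9, ...]` by initializing the spare capacity in place
/// and only then extending the length.
fn first_squares(n: usize) -> Vec<u64> {
    let mut v: Vec<u64> = Vec::with_capacity(n);

    // The spare capacity is a slice of `MaybeUninit<u64>` with at least
    // `n` slots, so `take(n)` always yields exactly `n` of them.
    for (i, slot) in v.spare_capacity_mut().iter_mut().take(n).enumerate() {
        slot.write((i * i) as u64);
    }

    // SAFETY: the first `n` slots were initialized above, and `n` does not
    // exceed the capacity requested from `with_capacity`.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    println!("{:?}", first_squares(5));
}
```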

Currently, Vec does not guarantee the order in which elements are dropped. The order has changed in the past and may change again.

alloc::string

pub struct String {
    vec: Vec<u8>,
}

A UTF-8–encoded, growable string.

String is the most common string type. It has ownership over the contents of the string, stored in a heap-allocated buffer (see Representation). It is closely related to its borrowed counterpart, the primitive str.

Examples

You can create a String from a literal string with String::from:

let hello = String::from("Hello, world!");

You can append a char to a String with the push method, and append a &str with the push_str method:

let mut hello = String::from("Hello, ");

hello.push('w');
hello.push_str("orld!");

If you have a vector of UTF-8 bytes, you can create a String from it with the from_utf8 method:

// some bytes, in a vector
let sparkle_heart = vec![240, 159, 146, 150];

// We know these bytes are valid, so we'll use `unwrap()`.
let sparkle_heart = String::from_utf8(sparkle_heart).unwrap();

assert_eq!("💖", sparkle_heart);

UTF-8

Strings are always valid UTF-8. If you need a non-UTF-8 string, consider OsString. It is similar, but without the UTF-8 constraint. Because UTF-8 is a variable width encoding, Strings are typically smaller than an array of the same chars:

// `s` is ASCII which represents each `char` as one byte
let s = "hello";
assert_eq!(s.len(), 5);

// A `char` array with the same contents would be longer because
// every `char` is four bytes
let s = ['h', 'e', 'l', 'l', 'o'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);

// However, for non-ASCII strings, the difference will be smaller
// and sometimes they are the same
let s = "💖💖💖💖💖";
assert_eq!(s.len(), 20);

let s = ['💖', '💖', '💖', '💖', '💖'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);

This raises interesting questions as to how s[i] should work. What should i be here? Several options include byte indices and char indices but, because of UTF-8 encoding, only byte indices would provide constant time indexing. Getting the ith char, for example, is available using chars:

let s = "hello";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('l'));

let s = "💖💖💖💖💖";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('💖'));

Next, what should s[i] return? Because indexing returns a reference to underlying data it could be &u8, &[u8], or something similar. Since we’re only providing one index, &u8 makes the most sense but that might not be what the user expects and can be explicitly achieved with as_bytes():

// The first byte is 104 - the byte value of `'h'`
let s = "hello";
assert_eq!(s.as_bytes()[0], 104);
// or
assert_eq!(s.as_bytes()[0], b'h');

// The first byte is 240 which isn't obviously useful
let s = "💖💖💖💖💖";
assert_eq!(s.as_bytes()[0], 240);

Due to these ambiguities/restrictions, indexing with a usize is simply forbidden:

let s = "hello";

// The following will not compile!
println!("The first letter of s is {}", s[0]);

It is more clear, however, how &s[i..j] should work (that is, indexing with a range). It should accept byte indices (to be constant-time) and return a &str which is UTF-8 encoded. This is also called “string slicing”. Note this will panic if the byte indices provided are not character boundaries - see is_char_boundary for more details. See the implementations for SliceIndex<str> for more details on string slicing. For a non-panicking version of string slicing, see get.

The bytes and chars methods return iterators over the bytes and codepoints of the string, respectively. To iterate over codepoints along with byte indices, use char_indices.
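
A short sketch of non-panicking slicing and of iterating with byte offsets. `safe_slice` and `char_offsets` are hypothetical wrappers around `str::get` and `str::char_indices`.

```rust
/// Non-panicking slice: `None` if an index is out of range or does not
/// fall on a char boundary.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

/// The byte offset at which each `char` starts.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    // 'a' takes 1 byte, '💖' takes 4 bytes, 'b' takes 1 byte.
    let s = "a💖b";

    println!("{:?}", char_offsets(s));
    println!("{:?}", safe_slice(s, 1, 5)); // exactly the heart
    println!("{:?}", safe_slice(s, 1, 3)); // ends inside the heart
    println!("{}", s.is_char_boundary(3));
}
```

Writing `&s[1..3]` instead of `s.get(1..3)` would panic here, because byte 3 lies in the middle of the four-byte heart.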

Deref

String implements Deref<Target = str>, and so inherits all of str’s methods. In addition, this means that you can pass a String to a function which takes a &str by using an ampersand (&):

fn takes_str(s: &str) { }

let s = String::from("Hello");

takes_str(&s);

This will create a &str from the String and pass it in. This conversion is very inexpensive, and so generally, functions will accept &strs as arguments unless they need a String for some specific reason.

In certain cases Rust doesn’t have enough information to make this conversion, known as Deref coercion. In the following example a string slice &'a str implements the trait TraitExample, and the function example_func takes anything that implements the trait. In this case Rust would need to make two implicit conversions, which Rust doesn’t have the means to do. For that reason, the following example will not compile.

trait TraitExample {}

impl<'a> TraitExample for &'a str {}

fn example_func<A: TraitExample>(example_arg: A) {}

let example_string = String::from("example_string");
example_func(&example_string);

There are two options that would work instead. The first would be to change the line example_func(&example_string); to example_func(example_string.as_str());, using the method as_str() to explicitly extract the string slice containing the string. The second way changes example_func(&example_string); to example_func(&*example_string);. In this case we are dereferencing a String to a str, then referencing the str back to &str. The second way is more idiomatic, however both work to do the conversion explicitly rather than relying on the implicit conversion.
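
Both fixes, put together in a version that compiles. As an assumption made purely so the result is observable, the otherwise empty trait gets a hypothetical `describe` method here.

```rust
trait TraitExample {
    /// Hypothetical method, added only so the call has a visible result.
    fn describe(&self) -> usize;
}

impl<'a> TraitExample for &'a str {
    fn describe(&self) -> usize {
        self.len()
    }
}

fn example_func<A: TraitExample>(example_arg: A) -> usize {
    example_arg.describe()
}

fn main() {
    let example_string = String::from("example_string");

    // Option 1: ask for the string slice explicitly.
    let a = example_func(example_string.as_str());

    // Option 2: dereference to `str`, then borrow again as `&str`.
    let b = example_func(&*example_string);

    println!("{a} {b}");
}
```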

Representation

A String is made up of three components: a pointer to some bytes, a length, and a capacity. The pointer points to the internal buffer which String uses to store its data. The length is the number of bytes currently stored in the buffer, and the capacity is the size of the buffer in bytes. As such, the length will always be less than or equal to the capacity.

This buffer is always stored on the heap.

You can look at these with the as_ptr, len, and capacity methods:

let story = String::from("Once upon a time...");

// Deconstruct the String into parts.
let (ptr, len, capacity) = story.into_raw_parts();

// story has nineteen bytes
assert_eq!(19, len);

// We can re-build a String out of ptr, len, and capacity. This is all
// unsafe because we are responsible for making sure the components are
// valid:
let s = unsafe { String::from_raw_parts(ptr, len, capacity) } ;

assert_eq!(String::from("Once upon a time..."), s);

If a String has enough capacity, adding elements to it will not re-allocate. For example, consider this program:

let mut s = String::new();

println!("{}", s.capacity());

for _ in 0..5 {
    s.push_str("hello");
    println!("{}", s.capacity());
}

This will output the following:

0
8
16
16
32
32

At first, we have no memory allocated at all, but as we append to the string, it increases its capacity appropriately. If we instead use the with_capacity method to allocate the correct capacity initially:

let mut s = String::with_capacity(25);

println!("{}", s.capacity());

for _ in 0..5 {
    s.push_str("hello");
    println!("{}", s.capacity());
}

We end up with a different output:

25
25
25
25
25
25

Here, there’s no need to allocate more memory inside the loop.

tool: &str

code: &str

lang: &str

core::cmp::impls

impl<A, B> PartialEq<&B> for &A
fn ne(&self, other: &&B) -> bool
where
    // Bounds from impl:
    A: PointeeSized,
    B: PointeeSized,
    A: PartialEq<B>,

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

let args: Value

extern crate serde_json

Serde JSON

JSON is a ubiquitous open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs.

{
    "name": "John Doe",
    "age": 43,
    "address": {
        "street": "10 Downing Street",
        "city": "London"
    },
    "phones": [
        "+44 1234567",
        "+44 2345678"
    ]
}

There are three common ways that you might find yourself needing to work with JSON data in Rust.

As text data. An unprocessed string of JSON data that you receive on an HTTP endpoint, read from a file, or prepare to send to a remote server.
As an untyped or loosely typed representation. Maybe you want to check that some JSON data is valid before passing it on, but without knowing the structure of what it contains. Or you want to do very basic manipulations like insert a key in a particular spot.
As a strongly typed Rust data structure. When you expect all or most of your data to conform to a particular structure and want to get real work done without JSON’s loosey-goosey nature tripping you up.

Serde JSON provides efficient, flexible, safe ways of converting data between each of these representations.

Operating on untyped JSON values

Any valid JSON data can be manipulated in the following recursive enum representation. This data structure is serde_json::Value.

enum Value {
    Null,
    Bool(bool),
    Number(Number),
    String(String),
    Array(Vec<Value>),
    Object(Map<String, Value>),
}

A string of JSON data can be parsed into a serde_json::Value by the serde_json::from_str function. There is also from_slice for parsing from a byte slice &[u8] and from_reader for parsing from any io::Read like a File or a TCP stream.

use serde_json::{Result, Value};

fn untyped_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into serde_json::Value.
    let v: Value = serde_json::from_str(data)?;

    // Access parts of the data by indexing with square brackets.
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    Ok(())
}

The result of square bracket indexing like v["name"] is a borrow of the data at that index, so the type is &Value. A JSON map can be indexed with string keys, while a JSON array can be indexed with integer keys. If the type of the data is not right for the type with which it is being indexed, or if a map does not contain the key being indexed, or if the index into a vector is out of bounds, the returned element is Value::Null.

When a Value is printed, it is printed as a JSON string. So in the code above, the output looks like Please call "John Doe" at the number "+44 1234567". The quotation marks appear because v["name"] is a &Value containing a JSON string and its JSON representation is "John Doe". Printing as a plain string without quotation marks involves converting from a JSON string to a Rust string with as_str() or avoiding the use of Value as described in the following section.

The Value representation is sufficient for very basic tasks but can be tedious to work with for anything more significant. Error handling is verbose to implement correctly, for example imagine trying to detect the presence of unrecognized fields in the input data. The compiler is powerless to help you when you make a mistake, for example imagine typoing v["name"] as v["nmae"] in one of the dozens of places it is used in your code.

Parsing JSON as strongly typed data structures

Serde provides a powerful way of mapping JSON data into Rust data structures largely automatically.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Person {
    name: String,
    age: u8,
    phones: Vec<String>,
}

fn typed_example() -> Result<()> {
    // Some JSON input data as a &str. Maybe this comes from the user.
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;

    // Parse the string of data into a Person object. This is exactly the
    // same function as the one that produced serde_json::Value above, but
    // now we are asking it for a Person as output.
    let p: Person = serde_json::from_str(data)?;

    // Do things just like with any other Rust data structure.
    println!("Please call {} at the number {}", p.name, p.phones[0]);

    Ok(())
}

This is the same serde_json::from_str function as before, but this time we assign the return value to a variable of type Person so Serde will automatically interpret the input data as a Person and produce informative error messages if the layout does not conform to what a Person is expected to look like.

Any type that implements Serde’s Deserialize trait can be deserialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Deserialize)].

Once we have p of type Person, our IDE and the Rust compiler can help us use it correctly like they do for any other Rust code. The IDE can autocomplete field names to prevent typos, which was impossible in the serde_json::Value representation. And the Rust compiler can check that when we write p.phones[0], then p.phones is guaranteed to be a Vec<String> so indexing into it makes sense and produces a String.

Constructing JSON values

Serde JSON provides a json! macro to build serde_json::Value objects with very natural JSON syntax.

use serde_json::json;

fn main() {
    // The type of `john` is `serde_json::Value`
    let john = json!({
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    });

    println!("first phone number: {}", john["phones"][0]);

    // Convert to a string of JSON and print it out
    println!("{}", john.to_string());
}

The Value::to_string() function converts a serde_json::Value into a String of JSON text.

One neat thing about the json! macro is that variables and expressions can be interpolated directly into the JSON value as you are building it. Serde will check at compile time that the value you are interpolating is able to be represented as JSON.

let full_name = "John Doe";
let age_last_year = 42;

// The type of `john` is `serde_json::Value`
let john = json!({
    "name": full_name,
    "age": age_last_year + 1,
    "phones": [
        format!("+44 {}", random_phone())
    ]
});

This is amazingly convenient, but we have the problem we had before with Value: the IDE and Rust compiler cannot help us if we get it wrong. Serde JSON provides a better way of serializing strongly-typed data structures into JSON text.

Creating JSON by serializing data structures

A data structure can be converted to a JSON string by serde_json::to_string. There is also serde_json::to_vec which serializes to a Vec<u8> and serde_json::to_writer which serializes to any io::Write such as a File or a TCP stream.

use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Serialize, Deserialize)]
struct Address {
    street: String,
    city: String,
}

fn print_an_address() -> Result<()> {
    // Some data structure.
    let address = Address {
        street: "10 Downing Street".to_owned(),
        city: "London".to_owned(),
    };

    // Serialize it to a JSON string.
    let j = serde_json::to_string(&address)?;

    // Print, write to a file, or send to an HTTP server.
    println!("{}", j);

    Ok(())
}

Any type that implements Serde’s Serialize trait can be serialized this way. This includes built-in Rust standard library types like Vec<T> and HashMap<K, V>, as well as any structs or enums annotated with #[derive(Serialize)].

No-std support

As long as there is a memory allocator, it is possible to use serde_json without the rest of the Rust standard library. Disable the default “std” feature and enable the “alloc” feature:

[dependencies]
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }

For JSON support in Serde without a memory allocator, please see the serde-json-core crate.

serde_json::macros

macro_rules! json

Construct a serde_json::Value from a JSON literal.

let value = json!({
    "code": 200,
    "success": true,
    "payload": {
        "features": [
            "serde",
            "json"
        ],
        "homepage": null
    }
});

Variables or expressions can be interpolated into the JSON literal. Any type interpolated into an array element or object value must implement Serde’s Serialize trait, while any type interpolated into an object key must implement Into<String>. If the Serialize implementation of the interpolated type decides to fail, or if the interpolated type contains a map with non-string keys, the json! macro will panic.

let code = 200;
let features = vec!["serde", "json"];

let value = json!({
    "code": code,
    "success": code == 200,
    "payload": {
        features[0]: features[1]
    }
});

Trailing commas are allowed inside both arrays and objects.

let value = json!([
    "notice",
    "the",
    "trailing",
    "comma -->",
]);

let mut url: String

alloc::string::String

fn from(s: &str) -> String

Converts a &str into a String.

The result is allocated on the heap.

let mut encoder: ZlibEncoder<Vec<u8>>

extern crate flate2

A DEFLATE-based stream compression/decompression library

This library provides support for compression and decompression of DEFLATE-based streams:

the DEFLATE format itself
the zlib format
gzip

These three formats are all closely related and largely only differ in their headers/footers. This crate has three types in each submodule for dealing with these three formats.

Implementation

In addition to supporting three formats, this crate supports several different backends, controlled through this crate’s feature flags:

default, or rust_backend - this implementation currently uses the miniz_oxide crate which is a port of miniz.c to Rust. This feature does not require a C compiler, and only uses safe Rust code.

Note that the rust_backend feature may at some point be switched to use zlib-rs, and that miniz_oxide should be used explicitly if this is not desired.
zlib-rs - this implementation utilizes the zlib-rs crate, a Rust rewrite of zlib. This backend is the fastest, at the cost of some unsafe Rust code.

Several backends implemented in C are also available. These are useful in case you are already using a specific C implementation and need the result of compression to be bit-identical. See the crate’s README for details on the available C backends.

The zlib-rs backend typically outperforms all the C implementations.

Feature Flags

Activate the document-features cargo feature to see feature docs here

Ambiguous feature selection

As Cargo features are additive, while backends are not, there is an order in which backends become active if multiple are selected.

zlib-ng
zlib-rs
cloudflare_zlib
miniz_oxide

Organization

This crate consists of three main modules: bufread, read, and write. Each module implements DEFLATE, zlib, and gzip for std::io::BufRead input types, std::io::Read input types, and std::io::Write output types respectively.

Use the bufread implementations if you can provide a BufRead type for the input. The &[u8] slice type implements the BufRead trait.

The read implementations conveniently wrap a Read type in a BufRead implementation. However, the read implementations may read past the end of the input data, making the Read type useless for subsequent reads of the input. If you need to re-use the Read type, wrap it in a std::io::BufReader, use the bufread implementations, and perform subsequent reads on the BufReader.

The write implementations are most useful when there is no way to create a BufRead type, notably when reading async iterators (streams).

use futures::{Stream, StreamExt};
use std::io::{Result, Write as _};

async fn decompress_gzip_stream<S, I>(stream: S) -> Result<Vec<u8>>
where
    S: Stream<Item = I>,
    I: AsRef<[u8]>
{
    let mut stream = std::pin::pin!(stream);
    let w = Vec::<u8>::new();
    let mut decoder = flate2::write::GzDecoder::new(w);
    while let Some(input) = stream.next().await {
        decoder.write_all(input.as_ref())?;
    }
    decoder.finish()
}

Note that types which operate over a specific trait often implement the mirroring trait as well. For example a bufread::DeflateDecoder<T> also implements the Write trait if T: Write. That is, the “dual trait” is forwarded directly to the underlying object if available.

About multi-member Gzip files

While most gzip files one encounters will have a single member that can be read with the GzDecoder, there may be some files which have multiple members.

A GzDecoder will only read the first member of gzip data, which may unexpectedly provide partial results when a multi-member gzip file is encountered. GzDecoder is appropriate for data that is designed to be read as single members from a multi-member file. bufread::GzDecoder and write::GzDecoder also allow non-gzip data following gzip data to be handled.

The MultiGzDecoder on the other hand will decode all members of a gzip file into one consecutive stream of bytes, which hides the underlying members entirely. If a file contains non-gzip data after the gzip data, MultiGzDecoder will emit an error after decoding the gzip data. This behavior matches the gzip, gunzip, and zcat command line tools.

flate2

pub mod write

Types which operate over Write streams, both encoders and decoders for various formats.

flate2::zlib::write

pub struct ZlibEncoder<W>
where
    W: Write,
{
    inner: Writer<W, Compress>,
}

A ZLIB encoder, or compressor.

This structure implements a Write interface and takes a stream of uncompressed data, writing the compressed data to the wrapped writer.

Examples

use std::io::prelude::*;
use flate2::Compression;
use flate2::write::ZlibEncoder;

// Vec<u8> implements Write, so the encoder collects the compressed bytes of the sample string in it.

let mut e = ZlibEncoder::new(Vec::new(), Compression::default());
e.write_all(b"Hello World")?;
let compressed = e.finish()?;

flate2::zlib::write::ZlibEncoder

impl<W> ZlibEncoder<W>
pub fn new(w: W, level: crate::Compression) -> ZlibEncoder<W>
where
    // Bounds from impl:
    W: Write,

Creates a new encoder which will write compressed data to the stream given at the given compression level.

When this encoder is dropped or unwrapped the final pieces of data will be flushed.

alloc::vec::Vec

impl<T> Vec<T, Global>
pub const fn new() -> Self

Constructs a new, empty Vec<T>.

The vector will not allocate until elements are pushed onto it.

Examples

let mut vec: Vec<i32> = Vec::new();
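
The "will not allocate until elements are pushed" claim can be observed through `capacity()`. The following is a small sketch, not part of the original hover text; the helper name `capacities` is made up for illustration, and the exact capacity after the first push is deliberately not asserted because the growth strategy is unspecified.

```rust
/// Returns the capacity of a fresh `Vec<i32>` before and after one push.
/// (`capacities` is a hypothetical helper, named here only for the example.)
fn capacities() -> (usize, usize) {
    let mut v: Vec<i32> = Vec::new();
    let before = v.capacity(); // no heap allocation has happened yet
    v.push(1); // the first push forces an allocation
    (before, v.capacity())
}

fn main() {
    let (before, after) = capacities();
    println!("capacity before push: {before}, after push: {after}");
}
```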

flate2

pub struct Compression(u32)

When compressing data, the compression level can be specified by a value in this struct.

flate2::Compression

fn default() -> Compression

Returns the “default value” for a type.

Default values are often some kind of initial value, identity value, or anything else that may make sense as a default.

Examples

Using built-in default values:

let i: i8 = Default::default();
let (x, y): (Option<String>, f64) = Default::default();
let (a, b, (c, d)): (i32, u32, (bool, bool)) = Default::default();

Making your own:

enum Kind {
    A,
    B,
    C,
}

impl Default for Kind {
    fn default() -> Self { Kind::A }
}

std::io::Write

pub trait Write
pub fn write_all(&mut self, mut buf: &[u8]) -> Result<()>

Attempts to write an entire buffer into this writer.

This method will continuously call write until there is no more data to be written or an error of non-ErrorKind::Interrupted kind is returned. This method will not return until the entire buffer has been successfully written or such an error occurs. The first error that is not of ErrorKind::Interrupted kind generated from this method will be returned.

If the buffer contains no data, this will never call write.

Errors

This function will return the first error of non-ErrorKind::Interrupted kind that write returns.

Examples

use std::io::prelude::*;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let mut buffer = File::create("foo.txt")?;

    buffer.write_all(b"some bytes")?;
    Ok(())
}
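
The retry loop described above can be restated as a short sketch. This is a simplified illustration, not the standard library source: `Trickle`, `write_all_sketch`, and `demo` are hypothetical names, and `Trickle` is a toy writer that accepts only a few bytes per call so that the loop has to run more than once.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical writer that accepts at most `chunk` bytes per `write` call.
struct Trickle {
    chunk: usize,
    written: Vec<u8>,
    calls: usize,
}

impl Write for Trickle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        let n = buf.len().min(self.chunk);
        self.written.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Keep calling `write` until the buffer is empty, retrying on
/// `Interrupted` and returning the first error of any other kind.
fn write_all_sketch<W: Write>(w: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match w.write(buf) {
            // A writer that accepts nothing can never finish the buffer.
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ))
            }
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == ErrorKind::Interrupted => {} // retry
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Writes `input` through a `Trickle` and reports what arrived and
/// how many `write` calls it took.
fn demo(input: &[u8], chunk: usize) -> io::Result<(Vec<u8>, usize)> {
    let mut t = Trickle { chunk, written: Vec::new(), calls: 0 };
    write_all_sketch(&mut t, input)?;
    Ok((t.written, t.calls))
}

fn main() -> io::Result<()> {
    // 10 bytes at 4 bytes per call: 4 + 4 + 2.
    let (bytes, calls) = demo(b"some bytes", 4)?;
    println!("{} bytes in {} calls", bytes.len(), calls);
    Ok(())
}
```

With an empty input the loop body never runs, which matches the note that `write` is never called when the buffer contains no data.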

alloc::string

impl<T> ToString for T
fn to_string(&self) -> String
where
    // Bounds from impl:
    T: fmt::Display + ?Sized,

Converts the given value to a String.

Examples

let i = 5;
let five = String::from("5");

assert_eq!(five, i.to_string());
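
Because the blanket impl applies to every `T: fmt::Display`, a user-defined type gets `to_string` by implementing `Display` alone. A minimal sketch, where `Point` is a made-up type used only for illustration:

```rust
use std::fmt;

/// Hypothetical type for the example.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 3, y: -4 };
    // `to_string` comes from the blanket `impl<T: Display> ToString for T`.
    assert_eq!(p.to_string(), "(3, -4)");
    // It produces the same text as formatting with `{}`.
    assert_eq!(format!("{p}"), p.to_string());
}
```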

alloc::string::String

pub const fn as_bytes(&self) -> &[u8]

Returns a byte slice of this String’s contents.

The inverse of this method is from_utf8.

Examples

let s = String::from("hello");

assert_eq!(&[104, 101, 108, 108, 111], s.as_bytes());
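
The remark that `from_utf8` is the inverse can be checked with a round trip. The sketch below assumes a hypothetical helper `round_trip`; it also shows that `from_utf8` is fallible, since not every byte sequence is valid UTF-8.

```rust
use std::string::FromUtf8Error;

/// Converts a string to bytes and back again.
/// (`round_trip` is a hypothetical helper for this example.)
fn round_trip(s: &str) -> Result<String, FromUtf8Error> {
    let owned = String::from(s);
    let bytes: &[u8] = owned.as_bytes();
    String::from_utf8(bytes.to_vec())
}

fn main() {
    // U+00E9 is encoded as two bytes in UTF-8, so this string is
    // five characters long but six bytes long.
    let s = "h\u{e9}llo";
    assert_eq!(s.as_bytes().len(), 6);
    assert_eq!(round_trip(s).unwrap(), s);

    // 0xFF can never appear in valid UTF-8, so the conversion fails.
    assert!(String::from_utf8(vec![0xff, 0xfe]).is_err());
}
```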

core::result::Result

impl<T, E> Result<T, E>
pub fn unwrap(self) -> T
where
    E: fmt::Debug,

Returns the contained Ok value, consuming the self value.

Because this function may panic, its use is generally discouraged. Panics are meant for unrecoverable errors, and may abort the entire program.

Instead, prefer to use the ? (try) operator, or pattern matching to handle the Err case explicitly, or call unwrap_or, unwrap_or_else, or unwrap_or_default.

Panics

Panics if the value is an Err, with a panic message provided by the Err’s value.

Examples

Basic usage:

let x: Result<u32, &str> = Ok(2);
assert_eq!(x.unwrap(), 2);

let x: Result<u32, &str> = Err("emergency failure");
x.unwrap(); // panics with `emergency failure`
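
The alternatives recommended above are not shown in the examples, so here is a hedged sketch of three of them: `?`, a fallback value, and explicit pattern matching. The function names and the port-parsing scenario are invented for illustration.

```rust
use std::num::ParseIntError;

/// Propagates the error to the caller with `?` instead of panicking.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?;
    Ok(port)
}

/// Falls back to a default value when parsing fails.
fn port_or_default(s: &str) -> u16 {
    parse_port(s).unwrap_or(8080)
}

/// Handles the `Err` case explicitly with pattern matching.
fn describe(s: &str) -> String {
    match parse_port(s) {
        Ok(p) => format!("port {p}"),
        Err(e) => format!("invalid port: {e}"),
    }
}

fn main() {
    assert_eq!(parse_port(" 443 "), Ok(443));
    assert!(parse_port("https").is_err());
    assert_eq!(port_or_default("https"), 8080);
    // `unwrap_or_default` uses `u16::default()`, which is 0.
    assert_eq!(parse_port("https").unwrap_or_default(), 0);
    println!("{}", describe("https"));
}
```

None of these calls can panic on bad input, which is the property `unwrap` lacks.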

let args: Vec<u8>

flate2::zlib::write::ZlibEncoder

impl<W> ZlibEncoder<W>
pub fn finish(self) -> io::Result<W>
where
    // Bounds from impl:
    W: Write,

Consumes this encoder, flushing the output stream.

This will flush the underlying data stream, close off the compressed stream and, if successful, return the contained writer.

Note that this function may not be suitable to call in a situation where the underlying stream is an asynchronous I/O stream. To finish a stream the try_finish (or shutdown) method should be used instead. To re-acquire ownership of a stream it is safe to call this method after try_finish or shutdown has returned Ok.

Errors

This function will perform I/O to complete this stream, and any I/O errors which occur will be returned from this function.

base64

pub mod prelude

Preconfigured engines for common use cases.

These are re-exports of const engines in crate::engine::general_purpose, renamed with a BASE64_ prefix for those who prefer to use the entire path to a name.

Examples

 use base64::prelude::{Engine as _, BASE64_STANDARD_NO_PAD};

 assert_eq!("c29tZSBieXRlcw", &BASE64_STANDARD_NO_PAD.encode(b"some bytes"));

base64::engine::general_purpose

pub const URL_SAFE_NO_PAD: GeneralPurpose = GeneralPurpose::new(&alphabet::URL_SAFE, NO_PAD)

A GeneralPurpose engine using the alphabet::URL_SAFE base64 alphabet and NO_PAD config.

base64::engine::Engine

pub trait Engine
pub fn encode_string<T>(&self, input: T, output_buf: &mut String)
where
    T: AsRef<[u8]>,
    // Bounds from trait:
    Self: Send + Sync,

Encode arbitrary octets as base64 into a supplied String. Writes into the supplied String, which may allocate if its internal buffer isn’t big enough.

Example

use base64::{Engine as _, engine::{self, general_purpose}, alphabet};
const CUSTOM_ENGINE: engine::GeneralPurpose =
    engine::GeneralPurpose::new(&alphabet::URL_SAFE, general_purpose::NO_PAD);

fn main() {
    let mut buf = String::new();
    general_purpose::STANDARD.encode_string(b"hello world~", &mut buf);
    println!("{}", buf);

    buf.clear();
    CUSTOM_ENGINE.encode_string(b"hello internet~", &mut buf);
    println!("{}", buf);
}

alloc::vec::Vec

impl<T, A> Vec<T, A>
pub fn push(&mut self, value: T)
where
    // Bounds from impl:
    A: Allocator + Destruct,

Appends an element to the back of a collection.

Panics

Panics if the new capacity exceeds isize::MAX bytes.

Examples

let mut vec = vec![1, 2];
vec.push(3);
assert_eq!(vec, [1, 2, 3]);

Time complexity

Takes amortized O(1) time. If the vector’s length would exceed its capacity after the push, O(capacity) time is taken to copy the vector’s elements to a larger allocation. This expensive operation is offset by the capacity O(1) insertions it allows.
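
One rough way to see the amortized cost is to count how often the capacity changes while pushing. This is a sketch under the assumption that a capacity change corresponds to a reallocation; `count_reallocations` is a hypothetical helper, and the exact counts depend on the growth strategy, which is not specified, so none are stated here.

```rust
/// Pushes `n` elements one at a time and counts how often the
/// capacity changed, treating each change as one reallocation.
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut reallocs = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            reallocs += 1;
            cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    for n in [10, 1_000, 100_000] {
        println!("{n} pushes -> {} reallocations", count_reallocations(n));
    }
}
```

The number of reallocations grows far more slowly than the number of pushes, which is why the occasional O(capacity) copy averages out to O(1) per push.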

The inset CSS property defines the logical block and inline start and end offsets of an element, which map to physical offsets depending on the element’s writing mode, directionality, and text orientation. It corresponds to the top and bottom, or right and left properties depending on the values defined for writing-mode, direction, and text-orientation.

(Edge 87, Firefox 66, Safari 14.1, Chrome 87, Opera 73)

Syntax: <‘top’>{1,4}

Specifies how far an absolutely positioned box’s bottom margin edge is offset above the bottom edge of the box’s ‘containing block’.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 5, Opera 6)

Syntax: <length> | <percentage> | auto

Specifies how far an absolutely positioned box’s right margin edge is offset to the left of the right edge of the box’s ‘containing block’.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 5.5, Opera 5)

Syntax: <length> | <percentage> | auto

Allows authors to constrain content width to a certain range.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 7, Opera 4)

Allows authors to constrain content height to a certain range.

(Edge 12, Firefox 1, Safari 1.3, Chrome 1, IE 7, Opera 7)

Shorthand property for setting border width, style, and color.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 3.5)

Syntax: <line-width> || <line-style> || <color>

Indicates the desired height of glyphs from the font. For scalable fonts, the font-size is a scale factor applied to the EM unit of the font. (Note that certain glyphs may bleed outside their EM box.) For non-scalable fonts, the font-size is converted into absolute units and matched against the declared font-size of the font, using the same absolute coordinate space for both of the matched values.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 5.5, Opera 7)

Syntax: <absolute-size> | <relative-size> | <length-percentage>

Allows authors to constrain content height to a certain range.

(Edge 12, Firefox 3, Safari 1.3, Chrome 1, IE 7, Opera 4)

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 3, Opera 3.5)

Syntax: <length-percentage> | auto

Controls the appearance of selection.

(Edge 79, Firefox 69, Chrome 54, IE 10, Opera 41)

Syntax: auto | text | none | all

Allows control over cursor appearance in an element.

(Edge 12, Firefox 1, Safari 1.2, Chrome 1, IE 4, Opera 7)

For a positioned box, the ‘z-index’ property specifies the stack level of the box in the current stacking context and whether the box establishes a local stacking context.

(Edge 12, Firefox 1, Safari 1, Chrome 1, IE 4, Opera 4)

Syntax: auto | <integer>

Shorthand property for ‘outline-style’, ‘outline-width’, and ‘outline-color’.

(Edge 94, Firefox 88, Safari 16.4, Chrome 94, IE 8, Opera 80)

Syntax: <‘outline-width’> || <‘outline-style’> || <‘outline-color’>

The color of the outline.

(Edge 12, Firefox 1.5, Safari 1.2, Chrome 1, IE 8, Opera 7)

Syntax: auto | <color>
