Inclusive Range Performance

I got baited by this r/rust post on Zig being faster than Rust. The post itself is not the focus of this tidbit; rather, this comment in the same thread is:

Try to change this:

let sqrt = (n as f64).sqrt() as u64;
!(3..=sqrt)

…to:

let sqrt = (n as f64).sqrt() as u64 + 1;
!(3..sqrt) ...

…just for fun and see what happens. The ..= range can be significantly slower than the .. range because in some cases it compiles to two separate comparisons instead of one. I don’t know if it will make a difference in your case but I’m curious.

u/ObligatoryOption

After a short excursion, I learned something new!

Inclusive ranges are slower than their equivalent exclusive ranges.

The inclusive range version requires an additional check, since the upper bound may be the largest value of the integer type. Without that check, the iteration variable would overflow and cause an infinite loop; see also this comment by CAD1997. The check also leaves the compiler with fewer optimization opportunities.
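To see where the extra comparison comes from, here is a simplified sketch of inclusive-range iteration. It is an illustration, not the actual implementation in `core` (though the standard library's `RangeInclusive` tracks exhaustion with a similar flag): the iterator cannot simply bump `start` past `end`, because when `end` is the type's maximum there is no value past it.

```rust
// A simplified sketch of inclusive-range iteration (an illustration, not
// the actual implementation in `core`). `SimpleRangeInclusive` is a
// made-up type for this post.
struct SimpleRangeInclusive {
    start: u64,
    end: u64,
    exhausted: bool,
}

impl Iterator for SimpleRangeInclusive {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // First comparison: has the range run out?
        if self.exhausted || self.start > self.end {
            return None;
        }
        let value = self.start;
        if self.start < self.end {
            // Second comparison: only advance when it cannot overflow.
            self.start += 1;
        } else {
            // We just yielded the last value; `start` must not be bumped
            // past `end`, since `end` may be `u64::MAX`.
            self.exhausted = true;
        }
        Some(value)
    }
}
```

An exclusive range needs none of this bookkeeping: it advances unconditionally and stops as soon as `start == end`, which is a single comparison per iteration.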

Rust

pub fn exclusive(upper_limit: u64) -> u64 {
    let mut sum = 0;
    for _ in 1..(upper_limit + 1) {
        sum += core::hint::black_box(1);
    }
    sum
}

pub fn inclusive(upper_limit: u64) -> u64 {
    let mut sum = 0;
    for _ in 1..=upper_limit {
        sum += core::hint::black_box(1);
    }
    sum
}

Assembly

example::exclusive:
        lea     rax, [rdi - 1]
        cmp     rax, -3
        ja      .LBB0_1
        xor     eax, eax
        lea     rcx, [rsp - 8]
.LBB0_4:
        mov     qword ptr [rsp - 8], 1
        add     rax, qword ptr [rsp - 8]
        dec     rdi
        jne     .LBB0_4
        ret
.LBB0_1:
        xor     eax, eax
        ret

example::inclusive:
        test    rdi, rdi
        je      .LBB1_1
        mov     ecx, 1
        xor     eax, eax
        lea     rdx, [rsp - 8]
.LBB1_4:
        mov     rsi, rcx
        cmp     rcx, rdi
        adc     rcx, 0
        mov     qword ptr [rsp - 8], 1
        add     rax, qword ptr [rsp - 8]
        cmp     rsi, rdi
        jae     .LBB1_2
        cmp     rcx, rdi
        jbe     .LBB1_4
.LBB1_2:
        ret
.LBB1_1:
        xor     eax, eax
        ret

Benchmark

The benchmark was done using Criterion:

fn benchmark(c: &mut criterion::Criterion) {
    let mut group = c.benchmark_group("Iteration");
    for input in &[256, 512, 1024, 2048, 4096, 8192] {
        group.bench_with_input(
            criterion::BenchmarkId::new("Exclusive", input),
            input,
            |b, i| b.iter(|| exclusive(*i)),
        );
        group.bench_with_input(
            criterion::BenchmarkId::new("Inclusive", input),
            input,
            |b, i| b.iter(|| inclusive(*i)),
        );
    }
    group.finish();
}

criterion::criterion_group!(benches, benchmark);
criterion::criterion_main!(benches);

And the output:

[Chart "Iteration: Comparison": average time (µs) against input size (0 to 9000) for the Exclusive and Inclusive versions.]

As you can see, the exclusive range version performs about twice as fast as the inclusive range version.
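As a closing aside (my own addition, not from the thread): when you genuinely need an inclusive range, internal iteration (`for_each`, `fold`, `sum`) sometimes fares better than an external `for` loop, because the iterator manages its exhaustion check itself rather than answering `next()` calls one at a time. Whether this actually recovers the exclusive-range codegen depends on the compiler version, so check the assembly:

```rust
use std::hint::black_box;

// Same work as `inclusive` above, but driven by `for_each` instead of a
// `for` loop. `inclusive_internal` is a hypothetical name for this sketch,
// not part of the original benchmark.
pub fn inclusive_internal(upper_limit: u64) -> u64 {
    let mut sum = 0;
    (1..=upper_limit).for_each(|_| sum += black_box(1));
    sum
}
```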
