How do double ampersands passed to size_of_val work?


I am reading Beginning Rust – Get Started with Rust 2021 Edition, a book published by Apress.

In one of the code examples, the author does not explain in detail or clearly how the code works. Here is the code snippet:

/* In a 64-bit system, it prints:
16 16 16; 8 8 8
In a 32-bit system, it prints:
8 8 8; 4 4 4
*/
fn main() {
    use std::mem::*;
    let a: &str = "";
    let b: &str = "0123456789";
    let c: &str = "abcdè";
    print!("{} {} {}; ",
        size_of_val(&a),
        size_of_val(&b),
        size_of_val(&c));
    print!("{} {} {}",
        size_of_val(&&a),
        size_of_val(&&b),
        size_of_val(&&c));
}

My question is: how does this work? size_of_val takes a reference, and that reference was already created in the declaration of the &str variables. So why does the author put another ampersand before the variable in the first print! statement?

In addition, when we pass the variable without the extra ampersand, as in size_of_val(a), the sizes we get are 0 for a, 10 for b and 6 for c; but when we pass it with the ampersand, as in size_of_val(&a), then, as the comment above main describes, the sizes are 16 16 16 (or 8 8 8). Finally, in the second print! statement the author puts double ampersands to get the size of the reference. How does that work? I just don't get it, because I thought this would generate an error: size_of_val only accepts one reference, yet the first print! adds another ampersand and the second one adds two.

>Solution:

The size_of_val() function is declared as follows:

pub fn size_of_val<T>(val: &T) -> usize
where
    T: ?Sized, 

That means: given any type T (the ?Sized bound means "really any type, even unsized ones"), the function takes a reference to a T and gives back a usize.
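For illustration, here is a minimal sketch (not from the book) showing that the same function accepts references to both sized and unsized types:

use std::mem::size_of_val;

fn main() {
    let x: u32 = 5;
    let s: &str = "hi";
    println!("{}", size_of_val(&x)); // T == u32, a sized type: prints 4
    println!("{}", size_of_val(s));  // T == str, an unsized type: prints 2
}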

Let’s take a as an example (b and c are the same).

When we evaluate size_of_val(a), the compiler knows that a has type &str, and thus it infers the generic parameter T to be str (without a reference), so the full call is size_of_val::<str>(a /* &str */), which matches the signature: we pass a &str where a &T with T == str is expected.
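A small sketch (assuming the variables from the snippet above) that spells out the inferred type with the turbofish syntax:

use std::mem::size_of_val;

fn main() {
    let a: &str = "";
    // What the compiler infers, written explicitly:
    assert_eq!(size_of_val::<str>(a), 0); // T == str, argument has type &str
}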

What is the size of a str? A str is a contiguous sequence of bytes encoding the string as UTF-8. a contains "", the empty string, which is of course zero bytes long, so size_of_val() returns 0. b contains 10 ASCII characters, each one byte long when UTF-8 encoded, so together they are 10 bytes long. c contains four ASCII characters (abcd), so four bytes, plus one non-ASCII character (è) that is two bytes wide, encoded as \xC3\xA8 (195 and 168 in decimal), for a total length of six bytes.
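A quick sketch (not from the book) verifying those byte lengths; note that str::len() returns the length in bytes, not in characters:

fn main() {
    assert_eq!("".len(), 0);
    assert_eq!("0123456789".len(), 10);
    let c: &str = "abcdè";
    assert_eq!(c.len(), 6);                    // 4 ASCII bytes + 2 bytes for 'è'
    assert_eq!("è".as_bytes(), &[0xC3, 0xA8]); // UTF-8 encoding of è
}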

What happens when we calculate size_of_val(&a)? &a is &&str because a is &str, so the compiler infers T to be &str. The size of &str is constant and is twice the size of a pointer: &str, i.e. a pointer to str, must store both the data address and the length. On 64-bit platforms that is 16 (8 * 2); on 32-bit ones it is 8 (4 * 2). This is called a fat pointer: a pointer that carries additional metadata besides the address itself (note that it is not guaranteed to be exactly twice the pointer size, so don't rely on it, but in practice it is).
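A minimal sketch (not from the book) checking the fat-pointer size directly; the assertion holds on current compilers, though, as said above, the exact size is not a language guarantee:

use std::mem::size_of;

fn main() {
    // A &str stores a data pointer plus a length, so in practice
    // it is two machine words (usize is one machine word).
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    println!("{}", size_of::<&str>()); // 16 on 64-bit, 8 on 32-bit
}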

When we evaluate size_of_val(&&a), the type of &&a is &&&str, so T is inferred to be &&str. While &str (a pointer to str) is a fat pointer, twice the pointer size, a pointer to a fat pointer is an ordinary thin pointer (the opposite of a fat pointer: it carries only the address, without any additional metadata), so it is one machine word in size: 8 bytes on 64-bit platforms, 4 bytes on 32-bit ones.
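Again, a small sketch (not from the book) to confirm it: a reference to a sized type is a plain thin pointer, and &str itself is a sized type.

use std::mem::size_of;

fn main() {
    // &&str points to a &str, which is a sized type,
    // so it is a thin pointer: one machine word.
    assert_eq!(size_of::<&&str>(), size_of::<usize>());
    println!("{}", size_of::<&&str>()); // 8 on 64-bit, 4 on 32-bit
}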
