
Why does String::from() give different lengths for "o" and "ó"?

I am learning Rust and I wanted to play around with slices, but this made me "discover" that string literals "o" and "ó" differ in length.

This code:

fn main() {
    let o = String::from("o");
    let oo = String::from("ó");

    println!("o length: {}, ó length: {}", o.len(), oo.len());
}

prints:

o length: 1, ó length: 2

The output of these two lines:

println!("{}", String::from("ó"));
println!("{}", "ó");

is exactly the same.

I am asking this because I am no master of Strings, string literals, bytes and encoding, especially in Rust. Why is the length of o = 1 and of ó = 2?

Solution:

String::len() returns the length of the string in bytes, not characters. Strings in Rust are UTF-8 encoded, and the character ó requires two bytes to encode in UTF-8.
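To see the difference between bytes and characters directly, here is a minimal sketch that compares len() with chars().count(). The helper name byte_and_char_len is my own and only there for illustration, and ó is written as the escape \u{00F3} so that the source file cannot silently contain a different encoding of the same glyph.

```rust
/// Hypothetical helper: returns (length in bytes, number of `char`s).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len()` counts UTF-8 bytes; `chars()` iterates Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // "\u{00F3}" is the precomposed letter ó.
    for s in ["o", "\u{00F3}"] {
        let (bytes, chars) = byte_and_char_len(s);
        // `as_bytes()` exposes the raw UTF-8 encoding of the string.
        println!(
            "{:?}: {} byte(s), {} char(s), UTF-8 bytes: {:x?}",
            s,
            bytes,
            chars,
            s.as_bytes()
        );
    }
}
```

For "o" both counts are 1. For ó the byte count is 2 (the bytes 0xC3 0xB3) while the char count is still 1. If you want "number of characters" rather than "number of bytes", chars().count() is usually closer to what you mean, although it counts code points, not what a reader would perceive as letters.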

Note also that there are two ways to write ó in Unicode. It has its own precomposed code point (U+00F3), but there is also a "combining" acute accent (U+0301) that attaches to the previous character. Written that way, the string contains two code points: the o character (1 byte in UTF-8) and the combining accent mark (2 bytes in UTF-8), so you could also see a length of 3 bytes.
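The two spellings of ó can be compared side by side. This is only a sketch: the helper utf8_len_and_code_points is a name I made up, and both strings are written with explicit \u{...} escapes so it is unambiguous which form is which.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of code points).
fn utf8_len_and_code_points(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let precomposed = "\u{00F3}"; // ó as a single code point
    let combining = "o\u{0301}"; // plain o followed by a combining acute accent

    for (label, s) in [("precomposed", precomposed), ("combining", combining)] {
        let (bytes, code_points) = utf8_len_and_code_points(s);
        println!(
            "{}: {} -> {} byte(s), {} code point(s)",
            label, s, bytes, code_points
        );
    }

    // String comparison in Rust is byte-wise, so the two forms are not equal
    // even though they typically render as the same glyph.
    println!("equal: {}", precomposed == combining);
}
```

The precomposed form is 2 bytes and 1 code point, while the combining form is 3 bytes and 2 code points, and the comparison prints false. The standard library does not normalize Unicode for you, so if input may arrive in either form you would need to normalize it yourself before comparing or measuring it.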
