I am learning Rust and I wanted to play around with slices, but this made me "discover" that string literals "o" and "ó" differ in length.
This code:
fn main() {
    let o = String::from("o");
    let oo = String::from("ó");
    println!("o length: {}, ó length: {}", o.len(), oo.len());
}
prints:
o length: 1, ó length: 2
The result of:
println!("{}", String::from("ó"));
println!("{}", "ó");
is exactly the same.
I am asking this because I am no master of Strings, string literals, bytes and encoding, especially in Rust. Why is the length of o = 1 and of ó = 2?
>Solution :
String::len() returns the length of the string in bytes, not characters. Strings in Rust are UTF-8 encoded, and the character ó requires two bytes to encode in UTF-8.
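To make the byte/character distinction concrete, here is a minimal sketch that compares `len()` with `chars().count()`. The helper names `byte_len` and `char_count` are made up for illustration, and the ó is written as the escape `\u{00F3}` so the example does not depend on how an editor encodes the character.

```rust
// Hypothetical helper: length in UTF-8 bytes, which is what len() reports.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Hypothetical helper: number of Unicode scalar values (Rust `char`s).
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let plain = "o";
    let accented = "\u{00F3}"; // ó as a single precomposed code point

    for s in [plain, accented].iter() {
        println!(
            "{}: {} byte(s), {} char(s), raw bytes {:?}",
            s,
            byte_len(s),
            char_count(s),
            s.as_bytes()
        );
    }
}
```

Both strings contain one `char`, but the second one takes two bytes, which is why `len()` reports 2.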
Note also that there are a few ways to write ó in Unicode. It has its own code point (U+00F3), but there is also a "combining" accent mark (U+0301) that combines with the previous character. Written that way, ó consists of two code points: the o character (1 byte in UTF-8) and the combining accent mark (2 bytes in UTF-8), so you could also see a length of 3 bytes.
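The two spellings can be compared side by side. This is a small sketch, with `describe` being a hypothetical helper that returns the byte length and the code point count of a string; both forms are written with escapes so it is unambiguous which one is which.

```rust
// Hypothetical helper: (length in UTF-8 bytes, number of code points).
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let precomposed = "\u{00F3}"; // ó as one code point
    let decomposed = "o\u{0301}"; // o followed by the combining acute accent

    let (bytes, points) = describe(precomposed);
    println!("precomposed: {} bytes, {} code point(s)", bytes, points);

    let (bytes, points) = describe(decomposed);
    println!("decomposed:  {} bytes, {} code point(s)", bytes, points);

    // String comparison works on the encoded bytes, so the two forms are
    // not equal even though they usually render identically.
    println!("equal: {}", precomposed == decomposed);
}
```

So the same visible character can report a length of 2 or 3 depending on which form the text happens to use.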