I was fuzzing my code and the fuzzer found a bug. I have reduced it to the following snippet, and I cannot see what is wrong.
Given the string
s := string("\xc0")
len(s) returns 1. However, if you loop over the string with range, the first rune has a length of 3:
for _, r := range s {
	fmt.Println("len of rune:", utf8.RuneLen(r)) // Will print 3
}
My assumptions are:
- len(string) is returning the number of bytes in the string
- utf8.RuneLen(r) is returning the number of bytes in the rune
I assume I am misunderstanding something, but how can the length of a string be less than the length of one of its runes?
Playground here: https://go.dev/play/p/SH3ZI2IZyrL
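One way to test the two assumptions against each other is to sum utf8.RuneLen over every rune a range loop produces and compare the total with len(s). The sketch below does that with a hypothetical helper, encodedLen; the sample strings are arbitrary choices.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen sums utf8.RuneLen over the runes that ranging over s yields.
// For valid UTF-8 this equals len(s). An invalid byte is yielded as
// utf8.RuneError (U+FFFD), which needs 3 bytes when encoded, so for
// invalid input the sum can exceed len(s).
func encodedLen(s string) int {
	total := 0
	for _, r := range s {
		total += utf8.RuneLen(r)
	}
	return total
}

func main() {
	for _, s := range []string{"h\u00e9llo", "\xc0"} {
		fmt.Printf("%q: len=%d, sum of RuneLen=%d, valid=%t\n",
			s, len(s), encodedLen(s), utf8.ValidString(s))
	}
}
```

For "h\u00e9llo" both numbers are 6, so the assumptions hold for valid input; for "\xc0" the sum is 3 against a len of 1.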
Solution:
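The explanation is simple: your input is not a valid UTF-8 encoded string. A range loop over a string decodes it as UTF-8, and if the iteration encounters an invalid UTF-8 sequence, the rune value is 0xFFFD (utf8.RuneError, the Unicode replacement character) and the next iteration advances a single byte. So your loop sees one rune, U+FFFD, which takes 3 bytes to encode, even though it stands in for only one byte of the string. You can confirm that the input is invalid with: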
fmt.Println(utf8.ValidString(s))
This outputs: false.
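If you instead use a valid string holding the rune '\xc0' (in a rune literal this is the code point U+00C0, À, which UTF-8 encodes as the two bytes 0xC3 0x80):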
s = string([]rune{'\xc0'})
Then the output is:
len of s: 2
runs in s: 1
len of rune: 2
Try it on the Go Playground.