Go string appears shorter than its first rune

I was running some fuzzing on my code and it found a bug. I have reduced it down to the following code snippet and I cannot see what is wrong.

Given the string

s := string("\xc0")

len(s) returns 1. However, if you loop over the string with range, the first rune has a length of 3.

    for _, r := range s {
        fmt.Println("len of rune:", utf8.RuneLen(r)) // Will print 3
    }

My assumptions are:

  • len(string) is returning the number of bytes in the string
  • utf8.RuneLen(r) is returning the number of bytes in the rune

I assume I am misunderstanding something, but how can the length of a string be less than the length of one of its runes?

Playground here: https://go.dev/play/p/SH3ZI2IZyrL

Solution:

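The explanation is simple: your input is not a valid UTF-8 encoded string. The byte 0xC0 can never appear in valid UTF-8, so when range reaches it, it yields the Unicode replacement character U+FFFD (utf8.RuneError) instead and advances by one byte. utf8.RuneLen then reports 3 because U+FFFD takes 3 bytes to encode, not because the string contains 3 bytes.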

fmt.Println(utf8.ValidString(s))

This outputs: false.

If you instead build a valid string from the rune '\xc0' (U+00C0, À):

s = string([]rune{'\xc0'})

Then output is:

len of s: 2
runes in s: 1
len of rune: 2

Try it on the Go Playground.
