Why does `IsSorted` in the standard library iterate over the slice in reverse?

Advertisements

Out of curiosity, I’ve been looking at the source of both the sort package in the standard library and the experimental slices package which is soon to be added to the standard library. Both packages define an IsSorted function, where they check whether a list is sorted by verifying adjacent elements are ordered. However, this is done in reverse.

I created a little reproduction, loosely based on the generic slices package. My experiments do show that the reverse version is faster.

func IsSortedForward[T any](slice []T, compare func(a, b T) int) bool {
    for i := 0; i < len(slice) - 1; i++ {
        if compare(slice[i], slice[i+1]) > 0 {
            return false
        }
    }
    return true
}

func IsSortedForward2[T any](slice []T, compare func(a, b T) int) bool {
    n := len(slice) - 1
    for i := 0; i < n; i++ {
        if compare(slice[i], slice[i+1]) > 0 {
            return false
        }
    }
    return true
}

func IsSortedReverse[T any](slice []T, compare func(a, b T) int) bool {
    for i := len(slice) - 1; i > 0; i-- {
        if compare(slice[i], slice[i-1]) < 0 {
            return false
        }
    }
    return true
}

My initial thought is that the forward version would be more "cache-friendly". However, I did notice one advantage of the reverse version as len(slice) - 1 is only computed once. Whereas, in the forward version it is computed on each iteration. My assumption is that this is why the reverse version is marginally faster.

I then extracted len(slice) - 1 outside of the loop in the forward version, so that it is only computed once, which did improve its performance. However, it was still marginally slower than the reverse version. My guess is that the comparison i > 0 in the reverse version (with zero) is faster than the comparison with i < n in the forward version?

Why do you think the reverse version is faster? Why is the adjusted forward version still not as fast as the reverse version?

EDIT: I didn’t know that the Compiler Explorer (Godbolt) works with Go… Here is a little reproduction of the functions addressed in this question. It appears that Go has its own assembly language?

>Solution :

In short: because more efficient code can be generated from a downloard loop, specifically for the condition part.

Personal note: Should you use it in your own code? Not if performance is not critical. Use a loop that is more readable and makes sense. There is obvious gain using the faster version in the standard library (as everyone is using it), but just for a few microseconds don’t sacrifice readability in your code.

IsSortedForward2() and IsSortedReverse() are more or less the same except that the former uses an increment operator (i++) and i+1 inside the body, the latter uses a decrement operator (i--) and i-1 inside the body, these are equivalent in terms of performance.

The noticeable difference comes from the condition: the forward variant uses i < n, and the reverse variant uses i > 0. The first requires comparing 2 variables, the latter requires comparing one variable to a very special value: 0.

In general the first condition requires loading 2 variables (memory locations) to registers and comparing their values. The latter requires loading 1 variable to a register and comparing it to the special 0 value. Note that in practice these variables may live in registers, so an actual memory load may not have to happen.

You can check the generated assembly code by running:

go tool compile -S play.go > play.s

The source lines I used:

20 func IsSortedForward2[T any](slice []T, compare func(a, b T) int) bool {
21     n := len(slice) - 1
22     for i := 0; i < n; i++ {
23          if compare(slice[i], slice[i+1]) > 0 {
24              return false
25          }
26     }
27     return true
28 }
29 
30 func IsSortedReverse[T any](slice []T, compare func(a, b T) int) bool {
31     for i := len(slice) - 1; i > 0; i-- {
32         if compare(slice[i], slice[i-1]) < 0 {
33             return false
34         }
35     }
36     return true
37 }

The interesting lines are 22 and 31, here are the generated assembly for those lines:

play.go:22) JMP     77
play.go:22) MOVQ    main.n+16(SP), DI
play.go:22) MOVQ    main.compare+80(SP), SI
play.go:22) MOVQ    main.slice+64(SP), CX
play.go:22) MOVQ    main.slice+56(SP), BX
play.go:22) MOVQ    main..autotmp_9+24(SP), AX
play.go:22) CMPQ    AX, DI
play.go:22) JGE     136

play.go:31) DECQ    CX
play.go:31) JMP     53
play.go:32) MOVQ    main.i+16(SP), CX
play.go:32) DECQ    CX
play.go:32) MOVQ    main.slice+48(SP), BX
play.go:32) MOVQ    main.compare+72(SP), SI
play.go:31) TESTQ   CX, CX
play.go:31) JLE     100
play.go:31) MOVQ    CX, main.i+16(SP)

IsSortedForward2() loads i and n into AX and DI registers, and compares them using CMPQ.

IsSortedReverse() only has to load i into CX, and tests if it’s zero using TESTQ.

Leave a ReplyCancel reply