While the Go standard library makes sorting slices of strings in a case-sensitive manner a trivial operation, what does a developer do when case-insensitivity is required? This blog post outlines why the common approach to sorting strings in a case-insensitive manner does not scale and provides a highly performant, case-insensitive, string sort solution.

Sorting String Slices

The standard library’s sort package provides the ability to sort slices of arbitrary types in increasing order, but when it comes to strings, Go can be a little sensitive. For example: (run):

package main

import (
    "fmt"
    "sort"
)

func main() {
    data := []string{"A", "b", "c", "D"}
    fmt.Println(sort.StringsAreSorted(data))
    sort.Strings(data)
    fmt.Println(data)
    fmt.Println(sort.StringsAreSorted(data))
}

false
[A D b c]
true

While A b c D is sorted if elements are compared in a case-insensitive manner, the sort package’s string sorting is case-sensitive. Luckily, the sort package also makes it trivial to provide custom comparison logic.

Sorting Sans Sensitivity

The sort package defines functions that accept a custom comparator function in order to implement custom sorting. The following example illustrates how to use these functions to provide a case-insensitive sort (run):

package main

import (
    "fmt"
    "sort"
    "strings"
)

func main() {
    data := []string{"A", "b", "c", "D"}
    less := func(i, j int) bool {
        return strings.ToLower(data[i]) < strings.ToLower(data[j])
    }
    fmt.Println(sort.SliceIsSorted(data, less))
}

true

The above example showcases the use of sort.SliceIsSorted and the comparator function less to determine the slice data is already sorted in a case-insensitive manner. While this may seem like the end of the story, the above implementation comes with a cost.

Performance Concerns

The Go sort.Sort function is an implementation of the quicksort algorithm. This means at best a call to sort.Sort is O(n*log(n)) complex and at worst is O(n^2). In other words, in an ideal scenario the sort operation will make n*log(n) calls to Less and in a worst case situation Less will be invoked n^2 times.

The example in the previous section uses strings.ToLower to standardize the case of both operands prior to comparison. Because string values in Go are immutable, a function like ToLower allocates new memory to hold the result of the operation.

Thus, in a best-case scenario, 2(n*log(n)) new strings are allocated when sorting slices of strings in a case-insensitive manner with strings.ToLower and up to 2(n^2) new strings are created at worst. Obviously this solution does not scale.

Comparing Case Folding

The reason strings.ToLower (or strings.ToUpper) is required in the first place is due to Go’s lack of a function that uses Unicode case-folding to indicate if string a is less than string b. The Go standard library includes the following functions for comparing strings:

Function Case-Folding Description
strings.Compare Returns 1 if a < b, 0 if a == b, 1 if a > b
strings.EqualFold Returns true if two strings are equal

This package introduces sortfold.CompareFold, a function that compares strings using Unicode case-folding like strings.EqualFold and behaves like strings.Compare with respect to the return value:

package main

import (
    "fmt"
    "sort"
    "strings"
    "github.com/akutz/sortfold"
)

func main() {
    data := []string{"A", "b", "c", "D"}
    less := func(i, j int) bool {
        return sortfold.CompareFold(data[i], data[j]) < 0
    }
    fmt.Println(sort.SliceIsSorted(data, less))
}

true

Performance Comparison

The introduction of sortfold.CompareFold provides the ability to sort slices of strings in case-insensitive manner that is highly performant. The benchmark results illuminate the difference between using two implementations of sort.Interface to sort string slices of varying length.

The benchmarks that have the prefix Benchmark_FoldedSort use sortfold.StringSlice, which itself uses sortfold.CompareFold. The benchmarks with the prefix Benchmark_LCasedSort convert strings to lowercase prior to comparison: