C# foreach loop comically slower than for loop on a RaspberryPi

November 26, 2021

I was testing a .NET application on a RaspberryPi and whereas each iteration of that program took 500 milliseconds on a Windows laptop, the same took 5 seconds on a RaspberryPi. After some debugging, I found that majority of that time was being spent on a foreach loop concatenating strings.

Edit 1: To clarify, that 500 ms and 5 s time I mentioned was the time of the entire loop. I placed a timer before the loop, and stopped the timer after the loop had finished. And, the number of iterations are the same in both, 1000.

Edit 2: To time the loop, I used the answer mentioned here.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    foreach (list_of_bytes register in registers)
    {
        ret += Convert.ToString(register.RegisterValue) + ",";
    }
    return ret;
}

Out of the blue I replaced the foreach with a for loop, and suddenly it starts taking almost the same time as it did on that laptop. 500 to 600 milliseconds.

private static string ComposeRegs(List<list_of_bytes> registers)
{
    string ret = string.Empty;
    for (UInt16 i = 0; i < 1000; i++)
    {
        ret += Convert.ToString(registers[i].RegisterValue) + ",";
    }
    return ret;
}

Should I always use for loops instead of foreach? Or was this just a scenario in which a for loop is way faster than a foreach loop?

>Solution :

The actual problem is concatenating strings, not a difference between that's your problem, notforvsforeach`. The reported timings are excruciatingly slow even on a Raspberry Pi. 1000 items is so little data it can fit in either machine’s CPU cache. An RPi has a 1+ GHZ CPU which means each concatenation takes at leas 1000 cycles.

The problem is the concatenation. Strings are immutable. Modifying or concatenating strings creates a new string. Your loops created 2000 temporary objects that need to be garbage collected. That process is expensive. Use a StringBuilder instead, preferably with a capacity roughly equal to the size of the expected string.

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count * 3);
        foreach (list_of_bytes register in registers)
        {
            sb.AppendFormat("{0}",register.RegisterValue);
        }
        return sb.ToString();
    }

Simply measuring a single execution, or even averaging 10 executions, won’t produce valid numbers. It’s quite possible the GC run to collect those 2000 objects during one of the tests. It’s also quite possible that one of the tests was delayed by JIT compilation or any other number of reasons. A test should run long enough to produce stable numbers.

The defacto standard for .NET benchmarking is BenchmarkDotNet. That library will run each benchmark long enough to eliminate startup and cooldown effect and account for memory allocations and GC collections. You’ll see not only how much each test takes but how much RAM is used and how many GCs are caused

To actually measure your code try using this benchmark using BenchmarkDotNet :

public class ConcatTest
{

    private readonly List<list_of_bytes> registers;


    public ConcatTest()
    {
        registers = SomehowFillTheRegisters();
    }

    [Benchmark]
    public string StringBuilder()
    {
        var sb = new StringBuilder(registers.Count * 3);
        foreach (list_of_bytes register in registers)
        {
            sb.AppendFormat("{0}",register.RegisterValue);
        }
        return sb.ToString();
    }

    [Benchmark]
    public string ForEach()
    {
        string ret = string.Empty;
        foreach (list_of_bytes register in registers)
        {
            ret += Convert.ToString(register.RegisterValue) + ",";
        }
        return ret;
    }

    [Benchmark]
    public string For()
    {
        string ret = string.Empty;
        for (UInt16 i = 0; i < registers.Count; i++)
        {
            ret += Convert.ToString(registers[i].RegisterValue) + ",";
        }
        return ret;
    }

}

The tests are run by calling BenchmarkRunner.Run<ConcatTest>()

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<ConcatTest>();
        Console.WriteLine(summary);
    }
}