Relation of endianness to assembly conversion of size in C

July 29, 2024

Please note that the below is adapted from Problem 3.4 of Bryant and O’Hallaron’s text (CSAPP3e). I have stripped away everything but my essential question.

Context: we are looking at an x86-64/Linux/gcc combination in which ints are 4 bytes and chars are signed (and, of course, 1 byte). We want to write the assembly corresponding to the conversion of an int to a char, which at a high level we know is a truncation.

They present the following solution:

movl (%rdi), %eax            // Read 4 bytes
movb %al, (%rsi)             // Store low-order byte
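For reference, the C-level operation behind those two instructions can be sketched roughly as follows. The function name int_to_char and the parameter names sp and dp are my own placeholders, not taken from the book's listing. Note that converting an out-of-range int to a signed char is implementation-defined in C; the sketch assumes gcc's behavior of keeping the low-order 8 bits.

```c
/* Hypothetical helper mirroring the two-instruction solution:
   load the whole int, then keep only its low-order byte.
   Assumes 4-byte int and gcc's modulo-2^8 narrowing conversion. */
void int_to_char(const int *sp, signed char *dp)
{
    int tmp = *sp;            /* movl (%rdi), %eax : read 4 bytes         */
    *dp = (signed char)tmp;   /* movb %al, (%rsi)  : store low-order byte */
}
```

With 0x12345678 as the source value, the destination would receive 0x78 under these assumptions.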

My question is whether we can change the movl to a movb since, after all, we only use one byte in the end. My worry is that the read might be endian-dependent, and that we might somehow get the high-order bits if our processor/OS is in little-endian mode. Is this worry justified, or would my change work no matter what?

I would try this out, but 1) I am on a Mac with Apple silicon and 2) even if my change worked, I couldn’t be sure whether this sort of thing is implementation-dependent.

Solution:

You’re right to be concerned about endianness for this kind of operation, but in this case, your alternative approach would fail on big-endian machines, not on little-endian ones.

x86 is little-endian, which means the low-order eight bits of a 32-bit integer are stored in the first (lowest-address) byte of that integer, so

movb (%rdi), %al             // Read low-order byte
movb %al, (%rsi)             // Store low-order byte

will do the truncation you want on x86. On a big-endian machine, however, the equivalent one-byte load would read the highest eight bits of the 32-bit integer, because there the most significant byte is the one stored at the lowest address. (I would give an example, but I no longer remember the assembly language of any big-endian architecture well enough to rattle one off.)
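In place of big-endian assembly, here is a small C sketch that simulates both memory layouts with explicit byte arrays, so it behaves the same on any host. The helper names store_le32, store_be32, and load_first_byte are made up for this illustration; load_first_byte stands in for a one-byte load from the integer's address.

```c
#include <stdint.h>

/* Lay out v the way a little-endian machine would:
   least significant byte at the lowest address. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Lay out v the way a big-endian machine would:
   most significant byte at the lowest address. */
void store_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Models a one-byte load from the integer's own address (offset 0). */
unsigned char load_first_byte(const unsigned char mem[4])
{
    return mem[0];
}
```

For the value 0x12345678, the byte at offset 0 is 0x78 in the little-endian layout (the truncation result) and 0x12 in the big-endian layout (the high-order byte).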

The virtue of doing it the way CS:APP does it is that the same construct will work correctly on both big- and little-endian architectures. Of course, if you’re programming in assembly language you have to rewrite the code for each architecture anyway, but it’s one fewer thing to worry about while you’re doing that.
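The same distinction can be checked on whatever machine you happen to have, including an Apple silicon Mac. The sketch below is only an illustration with hypothetical function names; it assumes unsigned int is wider than one byte. Truncating the loaded value always yields the low-order byte, while reading the first byte in memory yields it only on a little-endian host.

```c
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* copy the byte at the lowest address */
    return first == 1;
}

/* Load the full value, then narrow it: independent of byte order. */
unsigned char truncate_by_value(const unsigned int *p)
{
    unsigned int whole = *p;
    return (unsigned char)whole;
}

/* Read only the byte at the lowest address: depends on byte order. */
unsigned char truncate_by_address(const unsigned int *p)
{
    unsigned char b;
    memcpy(&b, p, 1);
    return b;
}
```

On a little-endian host both functions agree; on a big-endian host only truncate_by_value returns the low-order byte.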