I’m working with an embedded system (RISC-V based) where a processor is connected to a data bus, which in turn is connected to a UART module. After the data bus is read or written, it needs two processor cycles before it is ready for a new read/write operation; otherwise it yields an error.
The data bus is memory mapped like so:
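#include <stdint.h>

struct WishboneBus {
    volatile uint32_t read_addr;
    volatile uint32_t write_addr;
    volatile uint32_t write_data;
    volatile uint32_t read_data;
};
/* Base address 0x40000000, matching the lui a5,0x40000 in the disassembly below */
#define WISHBONE_BUS ((struct WishboneBus *) (0x40000000))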
I would like to write code like this:
void wb_write(uint32_t addr, uint32_t data) {
    WISHBONE_BUS->write_data = data;
    WISHBONE_BUS->write_addr = addr;
}
When the program is compiled with -O0, it yields these six instructions:
WISHBONE_BUS->write_data = data;
153c: 400007b7 lui a5,0x40000
1540: fd842703 lw a4,-40(s0)
1544: 00e7a423 sw a4,8(a5) # 40000008 <BUS_START+0x8>
WISHBONE_BUS->write_addr = addr;
1548: 400007b7 lui a5,0x40000
154c: fdc42703 lw a4,-36(s0)
1550: 00e7a223 sw a4,4(a5) # 40000004 <BUS_START+0x4>
But with -Os, it yields these instructions:
WISHBONE_BUS->write_data = data;
df8: 40000737 lui a4,0x40000
dfc: 00b72423 sw a1,8(a4) # 40000008 <BUS_START+0x8>
WISHBONE_BUS->write_addr = addr;
e00: 00a72223 sw a0,4(a4)
(The non-relevant parts of the function call are removed.)
In the -O0 case, there are two instructions (cycles) between the two sw instructions, so it works. But in the -Os case, the two sw instructions are back to back, and the bus yields an error because it needs two cycles between read/write operations.
This motivates the question:
Can a C variable be constrained to a certain read/write speed?
I know it is possible to add nop instructions to make this work, but the number of required nop instructions will depend on the optimization. And I realize I can use macros to fix this, but I would like an elegant solution. E.g., some kind of __attribute__((min_access_cycles(2))) or something.
>Solution :
C does not have that feature, and GCC probably doesn’t know how RISC-V instructions map to clock cycles on your particular system. So you will have to use some amount of assembly to get the timing right.
I prefer using as little assembly as possible (e.g. just a few NOPs written in an inline assembly block), so the compiler is freer to optimize everything else. Also, calling an external function can often be less efficient, because the compiler may be forced to save some registers to the stack, and it has to put the function arguments in particular registers to conform to the ABI.
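A minimal sketch of the inline-assembly approach could look like the following. It assumes GCC or Clang (extended asm syntax) and assumes that each NOP costs one cycle on your core, which you would need to verify. The helper name wb_delay and the explicit bus pointer parameter are hypothetical and not part of the question's code; the pointer is passed in only so the function can also be exercised against an ordinary struct on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct WishboneBus {
    volatile uint32_t read_addr;
    volatile uint32_t write_addr;
    volatile uint32_t write_data;
    volatile uint32_t read_data;
};

/* Offsets seen in the question's disassembly: sw ...,4(a5) and sw ...,8(a5). */
static_assert(offsetof(struct WishboneBus, write_addr) == 4, "write_addr offset");
static_assert(offsetof(struct WishboneBus, write_data) == 8, "write_data offset");

/* Hypothetical helper: two NOPs to separate bus accesses.
 * ASSUMPTION: one NOP takes one cycle on the target core.
 * The "memory" clobber asks the compiler not to move memory
 * accesses across this statement. */
static inline void wb_delay(void) {
    __asm__ volatile ("nop\n\tnop" ::: "memory");
}

/* Same stores as the question's wb_write, with a delay after each one
 * so that back-to-back calls are also spaced out. */
static inline void wb_write(struct WishboneBus *bus, uint32_t addr, uint32_t data) {
    bus->write_data = data;
    wb_delay();
    bus->write_addr = addr;
    wb_delay();
}
```

On the target this would be called as wb_write(WISHBONE_BUS, addr, data). Because the NOPs are emitted by the asm statement itself, their number should not change with the optimization level, but it is still worth checking the disassembly at each level you build with.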