What is the clock cycle time with and without this improvement?

June 13, 2022

In the following three problems, assume that we are starting with a
datapath from Figure 4.2, where I-Mem, Add, Mux, ALU, Regs, D-Mem,
and Control blocks have latencies of 400 ps, 100 ps, 30 ps, 120 ps,
200 ps, 350 ps, and 100 ps, respectively, and costs of 1000, 30, 10,
100, 200, 2000, and 500, respectively.

And I find solution like below

Cycle Time Without improvement = I-Mem + Regs + Mux + ALU + D-Mem +
Mux = 400 + 200 + 30 + 120 + 350 + 30 = 1130

Cycle Time With improvement = 1130 + 300 = 1430

Now I wonder why last register write latency(200) is not added?
To be more precise, Critical path is determined by load instruction’s latency then why critical path is not

I-Mem + Regs + Mux + ALU + D-Mem + MUX + Regs

but

I-Mem + Regs + Mux + ALU + D-Mem + MUX

>Solution :

It is a good question as to whether it requires two Regs latencies.

The register write is a capture of the output of one cycle. It happens at the end of one clock cycle, and the start of the next — it is the clock cycle edge/transition to the next cycle that causes the capture.

In one sense, the written output of one instruction effectively happens in parallel with the read of the next, with the only requirement for this overlap being that the next instruction must be able to read the output of the prior instruction instead of a stale register value. And this is possible because the written data was available at the very top/beginning of the current cycle.

The PC works the same: at the clock transition from one cycle’s end to another cycle’s start, the value for the new PC is captured and then released to the I Mem. So, the read and write effectively happen in parallel, with the only requirement then being that the read value sent to I Mem is the new one.

This is the fundamental way that cycles work: enregistered values start the next cycle, then combinational logic computes new values that are captured at the end of the cycle and form the program state available for the start of the new cycle. So one cycle does state -> processing -> (new) state and then the cycle repeats.