EECS251LB : Field Programmable Gate Array Laboratory

Published in University of California, Berkeley, 2025

I won the Apple Design Competition for this design!

Introduction

This project implements a 32-bit RISC-V core with a 3-stage pipeline and synchronous instruction and data memories. The core stalls minimally and achieves an integer cycles per instruction (CPI) of 1. In addition to integer operations, the processor supports floating point operations with a floating point CPI slightly greater than 1. A UART interfaces the core with the outside world. The core was fully implemented on a PYNQ-Z1 board, as shown in Figure 1.

Figure 1 : Project Overview

3 Stage Pipeline

The three-stage pipeline is divided into the following stages:

  • F : Instruction Fetch
  • DX : Decode and Execute
  • MW : Memory and Write-Back

Figure 2 shows the block diagram of our three-stage pipeline, with the Floating Point Unit (FPU) in parallel with the ALU. The FPU is pipelined with 2 stages for floating point addition and 3 stages for floating point multiplication and accumulation.

Figure 2 : Block Diagram of the three stage pipelined core

In the F stage, the IMEM, PC and pipeline register are clocked simultaneously. Between the DX and MW stages, the DMEM and pipeline registers are clocked together. Extensive forwarding ensures that ALU-ALU, MEM-MEM, MEM-ALU and ALU-MEM hazards do not cause any stalls. The PC select mux takes one of its inputs directly from the output of the ALU. For a branch instruction, the ALU output is forwarded straight to the IMEM address, which reduces the branch misprediction penalty and yields an integer CPI of 1.
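The forwarding idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the RTL of this core: the names InFlight and read_operand are hypothetical, and the model assumes a single forwarding path from the MW stage back to the operand inputs of the DX stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InFlight:
    """Hypothetical record of the instruction currently in the MW stage."""
    rd: int           # destination register index
    writes_reg: bool  # True if the instruction writes rd
    result: int       # value that will be written back (ALU result or load data)


def read_operand(rs: int, regfile: List[int], mw: Optional[InFlight]) -> int:
    """Return the value the DX stage should use for source register rs."""
    if rs == 0:
        return 0  # x0 is hard-wired to zero in RISC-V
    if mw is not None and mw.writes_reg and mw.rd == rs:
        return mw.result  # forward the in-flight result instead of stalling
    return regfile[rs]  # no hazard: the register file value is current


regfile = [0] * 32
regfile[5] = 10  # stale value of x5
mw = InFlight(rd=5, writes_reg=True, result=42)

print(read_operand(5, regfile, mw))  # 42 (forwarded from MW)
print(read_operand(6, regfile, mw))  # 0 (read from the register file)
```

Because the newer value is selected by a mux in the same cycle, a dependent instruction in DX does not have to wait for the write-back to complete.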

The floating point unit has a long critical path, so floating point addition is pipelined into 2 stages and floating point multiplication and accumulation into 3 stages, which keeps the operating frequency high.

Memory Mapped IO

Memory-mapped IO (MMIO) maps distinct address ranges to the IMEM, BIOS, DMEM and UART modules. Figure 3 shows the MMIO mapping of the RISC-V core.

Figure 3 : Memory Mapped IO
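A minimal sketch of address decoding is given here to show how a single address selects one module. The address ranges in REGIONS are placeholders chosen for illustration and are not taken from Figure 3; the decode function is likewise a hypothetical name.

```python
# Hypothetical address map (illustrative only, NOT the mapping of Figure 3).
REGIONS = [
    ("DMEM", 0x1000_0000, 0x1FFF_FFFF),
    ("IMEM", 0x2000_0000, 0x3FFF_FFFF),
    ("BIOS", 0x4000_0000, 0x4FFF_FFFF),
    ("UART", 0x8000_0000, 0x8FFF_FFFF),
]


def decode(addr: int) -> str:
    """Return the name of the module that an address maps to."""
    for name, low, high in REGIONS:
        if low <= addr <= high:
            return name
    return "UNMAPPED"


print(decode(0x1000_0004))  # DMEM
print(decode(0x8000_0000))  # UART
print(decode(0x0000_0000))  # UNMAPPED
```

In hardware the same decision is typically made by comparing a few upper address bits, so that loads and stores are steered to the right module without any extra cycles.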

Resource Utilization

FPGA Usage

FPGA Resource      No. of Units
LUTs               2589
SLICE registers    567
BRAMs              34
DSP Blocks         2

Critical Path

The critical path delay is 15.744 ns.
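The short calculation below relates this delay to the clock frequency. It is a sketch that assumes the reported 15.744 ns already includes setup time and clock uncertainty, and it uses the 62 MHz clock frequency reported in the results; the function names are illustrative.

```python
CRITICAL_PATH_NS = 15.744  # critical path delay reported above
CLOCK_MHZ = 62.0           # clock frequency reported in the results


def max_frequency_mhz(critical_path_ns: float) -> float:
    """Highest clock frequency whose period still covers the critical path."""
    return 1000.0 / critical_path_ns


def slack_ns(clock_mhz: float, critical_path_ns: float) -> float:
    """Clock period minus critical path delay; positive means timing is met."""
    return 1000.0 / clock_mhz - critical_path_ns


print(f"{max_frequency_mhz(CRITICAL_PATH_NS):.2f} MHz")    # 63.52 MHz
print(f"{slack_ns(CLOCK_MHZ, CRITICAL_PATH_NS):.3f} ns")   # 0.385 ns
```

Under these assumptions, a 62 MHz clock leaves a small positive slack relative to the critical path.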

Results

Parameter                       Value
Integer CPI (mmult)             1
Integer CPI (bdd)               1
Floating Point CPI (fpmmult)    1.33
Clock Frequency                 62 MHz
Cost                            4138844
Figure of Merit (FoM)           22.96
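To make the CPI and clock frequency figures concrete, the sketch below converts them into run time using the standard relation time = instructions × CPI / frequency. The instruction count of one million is a hypothetical value chosen for illustration, not a measurement of any benchmark listed here.

```python
def runtime_ms(instructions: int, cpi: float, clock_mhz: float) -> float:
    """Execution time in milliseconds for a given instruction count."""
    cycles = instructions * cpi
    return cycles / (clock_mhz * 1e6) * 1e3


N = 1_000_000  # hypothetical instruction count

print(f"{runtime_ms(N, 1.0, 62.0):.2f} ms")   # 16.13 ms at integer CPI 1
print(f"{runtime_ms(N, 1.33, 62.0):.2f} ms")  # 21.45 ms at floating point CPI 1.33
```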

The achieved Figure of Merit (FoM) is 22.96!