Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ [dual-core] add inter-core communication #1142

Merged
merged 21 commits into from
Jan 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 04.01.2025 | 1.10.8.8 | :sparkles: add inter-core communication (ICC) for the SMP dual-core setup | [#1142](https://github.com/stnolting/neorv32/pull/1142) |
| 03.01.2025 | 1.10.8.7 | :warning: :sparkles: replace `Zalrsc` ISA extensions (reservation-set operations) by `Zaamo` ISA extension (atomic read-modify-write operations) | [#1141](https://github.com/stnolting/neorv32/pull/1141) |
| 01.01.2025 | 1.10.8.6 | :sparkles: :test_tube: add smp dual-core option | [#1135](https://github.com/stnolting/neorv32/pull/1135) |
| 29.12.2024 | 1.10.8.5 | :test_tube: add multi-hart support to debug module | [#1132](https://github.com/stnolting/neorv32/pull/1132) |
Expand Down
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,8 @@ setup according to your needs. Note that all of the following SoC modules are en
**CPU Core**

* [![RISCV-ARCHID](https://img.shields.io/badge/RISC--V%20Architecture%20ID-19-000000.svg?longCache=true&style=flat-square&logo=riscv&colorA=273274&colorB=fbb517)](https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md)
* RISC-V 32-bit little-endian single- or SMP-dual-core pipelined/multi-cycle modified Harvard architecture
* RISC-V 32-bit little-endian pipelined/multi-cycle modified Harvard architecture
* Single-core or SMP dual-core configuration (including low-latency inter-core communication)
* configurable [instruction sets and extensions](https://stnolting.github.io/neorv32/#_instruction_sets_and_extensions):
\
`RV32`
Expand Down
37 changes: 23 additions & 14 deletions docs/datasheet/cpu.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -62,25 +62,33 @@ type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "D
direction as seen from the CPU.

.NEORV32 CPU Signal List
[cols="<3,^3,^1,<5"]
[cols="^2,^2,^1,<8"]
[options="header", grid="rows"]
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge.
| `rstn_i` | 1 | in | Global reset, low-active.
| `clk_i` | 1 | in | Global clock line, all registers triggering on rising edge.
| `rstn_i` | 1 | in | Global reset, low-active.
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
| `msi_i` | 1 | in | RISC-V machine software interrupt.
| `mei_i` | 1 | in | RISC-V machine external interrupt.
| `mti_i` | 1 | in | RISC-V machine timer interrupt.
| `firq_i` | 16 | in | Custom fast interrupt request signals.
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>).
| `msi_i` | 1 | in | RISC-V machine software interrupt.
| `mei_i` | 1 | in | RISC-V machine external interrupt.
| `mti_i` | 1 | in | RISC-V machine timer interrupt.
| `firq_i` | 16 | in | Custom fast interrupt request signals.
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>).
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request.
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response.
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request.
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response.
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request.
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response.
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request.
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response.
4+^| **<<_inter_core_communication_icc>> TX links**
| `icc_tx_rdy_o` | 2 | out | Data available for cores `0..1`.
| `icc_tx_ack_i` | 2 | in | Read-enable from cores `0..1`.
| `icc_tx_dat_o` | 2*32 | out | Data for cores `0..1`.
4+^| **<<_inter_core_communication_icc>> RX links**
| `icc_rx_rdy_i` | 2 | in | Data available from cores `0..1`.
| `icc_rx_ack_o` | 2 | out | Read-enable for cores `0..1`.
| `icc_rx_dat_i` | 2*32 | in | Data from cores `0..1`.
|=======================

.Bus Interface Protocol
Expand Down Expand Up @@ -109,8 +117,9 @@ The generic type "suv(x:y)" represents a `std_ulogic_vector(x downto y)`.
[options="header",grid="rows"]
|=======================
| Name | Type | Description
| `HART_ID` | natural | Value for the <<_mhartid>> CSR.
| `VENDOR_ID` | suv(31:0) | Value for the <<_mvendorid>> CSR.
| `HART_ID` | natural | ID of the core (for <<_mhartid>> CSR).
| `NUM_HARTS` | natural | Total number of cores in the system.
| `VENDOR_ID` | suv(31:0) | Vendor identification (for <<_mvendorid>> CSR).
| `BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `DEBUG_EXC_ADDR` | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
Expand Down
120 changes: 90 additions & 30 deletions docs/datasheet/cpu_csr.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -57,8 +57,6 @@ to check if the targeted bits can actually be modified.
| 0x7b0 | <<_dcsr>> | - | DRW | Debug control and status register
| 0x7b1 | <<_dpc>> | - | DRW | Debug program counter
| 0x7b2 | <<_dscratch0>> | - | DRW | Debug scratch register 0
5+^| **<<_custom_functions_unit_cfu_csrs>>**
| 0x800 .. 0x803 | <<_cfureg, `cfureg0`>> .. <<_cfureg, `cfureg3`>> | `CSR_CFUCREG0` .. `CSR_CFUCREG3` | URW | Custom CFU registers 0 to 3
5+^| **<<_machine_counter_and_timer_csrs>>**
| 0xb00 | <<_mcycleh, `mcycle`>> | `CSR_MCYCLE` | MRW | Machine cycle counter low word
| 0xb02 | <<_minstreth, `minstret`>> | `CSR_MINSTRET` | MRW | Machine instruction-retired counter low word
Expand All @@ -79,7 +77,11 @@ to check if the targeted bits can actually be modified.
| 0xf14 | <<_mhartid>> | `CSR_MHARTID` | MRO | Machine hardware thread ID
| 0xf15 | <<_mconfigptr>> | `CSR_MCONFIGPTR` | MRO | Machine configuration pointer register
5+^| **<<_neorv32_specific_csrs>>**
| 0xfc0 | <<_mxisa>> | `CSR_MXISA` | MRO | NEORV32-specific "eXtended" machine CPU ISA and extensions
| 0xbc0 | <<_mxiccrxd>> | `CSR_MXICCRXD` | MRW | ICC RX data
| 0xbc1 | <<_mxicctxd>> | `CSR_MXICCTXD` | MRW | ICC TX data
| 0xbc2 .. 0xbc3 | <<_mxiccsr, `mxiccsr0`>> .. <<_mxiccsr, `mxiccsr0`>> | `CSR_MXICCSR0` .. `CSR_MXICCSR3` | MRW | ICC control and status
| 0x800 .. 0x803 | <<_cfureg, `cfureg0`>> .. <<_cfureg, `cfureg3`>> | `CSR_CFUCREG0` .. `CSR_CFUCREG3` | URW | Custom CFU registers 0 to 3
| 0xfc0 | <<_mxisa>> | `CSR_MXISA` | MRO | Extended machine CPU ISA and extensions
|=======================


Expand Down Expand Up @@ -595,28 +597,6 @@ implementation of the according modes.
|=======================


<<<
// ####################################################################################################################
:sectnums:
==== Custom Functions Unit (CFU) CSRs

[discrete]
===== **`cfureg`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | Custom (user-defined) CFU CSRs
| Address | `0x800` (`cfureg0`)
| | `0x801` (`cfureg1`)
| | `0x802` (`cfureg2`)
| | `0x803` (`cfureg3`)
| Reset value | `0x00000000`
| ISA | `Zicsr` & `Zxcfu`
| Description | User-defined CSRs to be used within the <<_custom_functions_unit_cfu>>.
|=======================


<<<
// ####################################################################################################################
:sectnums:
Expand Down Expand Up @@ -929,12 +909,92 @@ core's hart ID is unique starting at 0 for the first core.
:sectnums:
==== NEORV32-Specific CSRs

.RISC-V-Compliant Mapping
[NOTE]
All NEORV32-specific CSRs are mapped to addresses that are explicitly reserved for custom **Machine-Mode, read-only** CSRs
(assured by the RISC-V privileged specifications). Hence, these CSRs can only be accessed when in machine-mode. Any access
outside of machine-mode will raise an illegal instruction exception.
All NEORV32-specific CSRs are mapped to addresses that are explicitly reserved for
custom/implementation-specific use (assured by the RISC-V privileged specifications).


[discrete]
===== **`cfureg`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | Custom (user-defined) CFU CSRs
| Address | `0x800` (`cfureg0`)
| | `0x801` (`cfureg1`)
| | `0x802` (`cfureg2`)
| | `0x803` (`cfureg3`)
| Reset value | `0x00000000`
| ISA | `Zicsr` & `Zxcfu`
| Description | User-defined CSRs to be used within the <<_custom_functions_unit_cfu>>.
|=======================


{empty} +
[discrete]
===== **`mxiccrxd`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | <<_inter_core_communication_icc>> RX data
| Address | `0xbc0`
| Reset value | `0x00000000`
| ISA | `Zicsr` & `X`
| Description | RX data from selected link. Buffered by a 4-entries-deep and 32-bit wide FIFO.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
|=======================


{empty} +
[discrete]
===== **`mxicctxd`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | <<_inter_core_communication_icc>> TX data
| Address | `0xbc1`
| Reset value | `0x00000000`
| ISA | `Zicsr` & `X`
| Description | TX data for selected link. Buffered by a 4-entries-deep and 32-bit wide FIFO.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
|=======================

{empty} +
[discrete]
===== **`mxiccsr`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | <<_inter_core_communication_icc>> control and status
| Address | `0xbc2` (`mxiccsr0`)
| | `0xbc3` (`mxiccsr1`)
| Reset value | `0x40000000`
| ISA | `Zicsr` & `X`
| Description | Link selection and status. Note that `mxiccsr1` is just a mirrored copy of `mxiccsr0`.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
|=======================

.`mxiccsr` CSR Bits
[cols="^1,^2,^1,<5"]
[options="header",grid="rows"]
|=======================
| Bit | Name [C] | R/W | Description
| 1:0 | `CSR_MXICCSR_LINK_MSB : CSR_MXICCSR_LINK_LSB` | r/w | Link select. The value in this memory corresponds
to the ID of the core to which a connection is to be established via a link. The ICC data registers <<_mxiccrxd>>
and <<_mxicctxd>> will only access the queue FIFOs of the selected link. Note that only bit 0 is writable. Bit 1
is hardwaired to zero.
| 29:2 | - | r/- | Reserved; hardwired to zero.
| 30 | `CSR_MXICCSR_TX_FREE` | r/- | Set if there is free space for TX data for the selected link.
| 31 | `CSR_MXICCSR_RX_AVAIL` | r/- | Set if RX data from the selected link is available.
|=======================


{empty} +
[discrete]
===== **`mxisa`**

Expand All @@ -946,7 +1006,7 @@ outside of machine-mode will raise an illegal instruction exception.
| Reset value | `DEFINED`
| ISA | `Zicsr` & `X`
| Description | The `mxisa` CSRs is a NEORV32-specific read-only CSR that helps machine-mode software to
discover ISA sub-extensions and CPU configuration options
discover additional ISA (sub-)extensions and CPU configuration options.
|=======================

.`mxisa` CSR Bits
Expand Down Expand Up @@ -985,5 +1045,5 @@ discover ISA sub-extensions and CPU configuration options
| 28 | `CSR_MXISA_RFHWRST` | r/- | full hardware reset of register file available when set (`CPU_RF_HW_RST_EN`), see <<_cpu_tuning_options>>
| 29 | `CSR_MXISA_FASTMUL` | r/- | fast multiplication available when set (`CPU_FAST_MUL_EN`), see <<_cpu_tuning_options>>
| 30 | `CSR_MXISA_FASTSHIFT` | r/- | fast shifts available when set (`CPU_FAST_SHIFT_EN`), see <<_cpu_tuning_options>>
| 31 | `CSR_MXISA_IS_SIM` | r/- | set if CPU is being **simulated** (⚠️ not guaranteed)
| 31 | `CSR_MXISA_IS_SIM` | r/- | set if CPU is being **simulated**
|=======================
69 changes: 50 additions & 19 deletions docs/datasheet/cpu_dual_core.adoc
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
:sectnums:
=== Dual-Core Configuration

.Hardware Requirements
[IMPORTANT]
The SMP dual-core configuration requires the <<_core_local_interruptor_clint>> to be implemented.
.Dual-Core Example
[TIP]
A simple dual-core example program can be found in `sw/example/demo_dual_core`.

Optionally, the CPU core can be implemented as **symmetric multiprocessing (SMP) dual-core** system.
This dual-core configuration is enabled by the `DUAL_CORE_EN` <<_processor_top_entity_generics, top generic>>.
Expand All @@ -14,10 +14,10 @@ of both core complexes are further switched into a single system bus using a rou

image::smp_system.png[align=center]

Both CPU cores are fully identical and use the same configuration provided by the according
<<_processor_top_entity_generics, top generics>>. However, each core can be identified by the according
"hart ID" that can be retrieved from the <<_mhartid>> CSR. CPU core 0 (the _primary_ core) has `mhartid = 0`
while core 1 (the _secondary_ core) has `mhartid = 1`.
Both CPU cores are fully identical and use the same ISA, tuning and cache configurations provided by the
according <<_processor_top_entity_generics, top generics>>. However, each core can be identified by the
according "hart ID" that can be retrieved from the <<_mhartid>> CSR. CPU core 0 (the _primary_ core) has
`mhartid = 0` while core 1 (the _secondary_ core) has `mhartid = 1`.

The following table summarizes the most important aspects when using the dual-core configuration.

Expand All @@ -41,32 +41,63 @@ while the top of stack of core 1 has to be explicitly defined by core 0 (see <<_
cores share the same heap, `.data` and `.bss` sections.
| **Constructors and destructors** | Constructors and destructors are executed on core 0 only.
(see )
| **Core communication** | See section <<_inter_core_communication_icc>>.
| **Bootloader** | Only core 0 will boot and execute the bootloader while core 1 is held in standby.
| **Booting** | See next section <<_dual_core_boot>>.
| **Booting** | See section <<_dual_core_boot>>.
|=======================

.Dual-Core Example

==== Inter-Core Communication (ICC)

Both cores can communicate with each other via a direct point-to-point connection based on FIFO-like message
queues. These direct communication links are faster (in terms of latency) compared to a memory-mapped or
shared-memory communication. Additionally, communication using these links is guaranteed to be atomic.

The inter-core communication (ICC) module is implemented as dedicated hardware module within each CPU core
(VHDL file `rtl/core/neorv32_cpu_icc.vhd`). This module is automatically included if the dual-core option
is enabled. Each core provides a 32-bit wide and 4 entries deep FIFO for sending data to the other core.
Hence, there are two FIFOs: one for sending data from core 0 to core 1 and another one for sending data the
opposite way.

The ICC communication links are accessed via NEORV32-specific CSRs. Hence, those FIFOs are accessible only
by the CPU core itself and cannot be accessed by the DMA or any other CPU core. In total, three CSRs are
provided to handle communications:

The <<_mxiccsr>> is used to select the core with which to communicate. In the dual-core configuration core 1
can only select core 0 and vice versa. The core selection in this register allows access to the according
message FIFOs via the two other CSRs. Additionally, the CSR provides status flags (TX FIFO data available;
RX FIFO free space) related to the selected communication link.

The <<_mxiccrxd>> and <<_mxicctxd>> CSRs are used for the actual data read and write operations. Writing data
to <<<<_mxicctxd>>> will send to the message queue of the core selected by <<_mxiccsr>>. Conversely, reading
data from <<_mxiccrxd>> will return data received from the core selected by <<_mxiccsr>>.

The ICC FIFOs do not provide any interrupt capabilities. Software is expected to use the machine-software
interrupt of the receiving core (provided by the <<_core_local_interruptor_clint>>) to inform it about
available messages.

.ICC Software API
[TIP]
A simple dual-core example setup / test program can be found in `sw/example/demo_dual_core`.
The NEORV32 software framework provides API wrappers to abstract inter-core communication:
`sw/lib/include/noevr32_smp.h`


==== Dual-Core Boot

After reset both cores start booting. However, core 1 will always (regardless of the boot configuration) enter
sleep mode inside the default <<_start_up_code_crt0>> that is linked with any compiled application. The primary
core (core 0) will continue booting executing either the <<_bootloader>> or the pre-installed image in the
internal instruction memory (depending on the <<_boot_configuration>>).
After reset, both cores start booting. However, core 1 will - regardless of the <<_boot_configuration>> - always
enter <<_sleep_mode>> right inside the default <<_start_up_code_crt0>> that is linked with any compiled
application. The primary core (core 0) will continue booting, executing either the <<_bootloader>> or the
pre-installed image from the internal instruction memory (depending on the boot configuration).

To boot-up core 1 the primary core has to use a special library function provided by the NEORV32 runtime
environment (RTE):
To boot-up core 1, the primary core has to use a special library function provided by the NEORV32 software framework:

.CPU Core 1 launch function prototype (note that this function can only be executed on core 0)
[source,c]
----
int neorv32_rte_smp_launch(void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
int neorv32_smp_launch(int hart_id, void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
----

When executed, core 0 will populate a configuration structure in main memory that contain the entry point
When executed, core 0 use the <<_inter_core_communication_icc>> to send launch data that includes the entry point
for core 1 (via `entry_point`) and the actual stack configuration (via `stack_memory` and `stack_size_bytes`).

.Core 1 Stack Memory
Expand All @@ -78,5 +109,5 @@ boundary.´

After that, the primary core triggers the _machine software interrupt_ of core 1 using the
<<_core_local_interruptor_clint>>. Core 1 wakes up from sleep mode, consumes the configuration structure and
finally starts executing at the provided entry point. When `neorv32_rte_smp_launch()` returns (with no error
finally starts executing at the provided entry point. When `neorv32_smp_launch()` returns (with no error
code) the secondary core is online and running.
1 change: 1 addition & 0 deletions docs/datasheet/overview.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -194,6 +194,7 @@ neorv32_top.vhd - NEORV32 PROCESSOR/SOC TOP ENTITY
││└neorv32_cpu_cp_shifter.vhd - Bit-shift co-processor (base ISA)
│├neorv32_cpu_control.vhd - CPU control, exception system and CSRs
││└neorv32_cpu_decompressor.vhd - Compressed instructions decoder (C ext.)
│├neorv32_cpu_icc.vhd - Inter-core communication unit
│├neorv32_cpu_lsu.vhd - Load/store unit
│├neorv32_cpu_pmp.vhd - Physical memory protection unit (Smpmp ext.)
│└neorv32_cpu_regfile.vhd - Data register file
Expand Down
1 change: 1 addition & 0 deletions docs/datasheet/software.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ The NEORV32 HAL consists of the following files.
| `neorv32_rte.c` | `neorv32_rte.h` | <<_neorv32_runtime_environment>>
| `neorv32_sdi.c` | `neorv32_sdi.h` | <<_serial_data_interface_controller_sdi>> HAL
| `neorv32_slink.c` | `neorv32_slink.h` | <<_stream_link_interface_slink>> HAL
| `neorv32_smp.c` | `neorv32_smp.h` | HAL for the SMP <<_dual_core_configuration>>
| `neorv32_spi.c` | `neorv32_spi.h` | <<_serial_peripheral_interface_controller_spi>> HAL
| `neorv32_sysinfo.c` | `neorv32_sysinfo.h` | <<_system_configuration_information_memory_sysinfo>> HAL
| `neorv32_trng.c` | `neorv32_trng.h` | <<_true_random_number_generator_trng>> HAL
Expand Down
Binary file modified docs/figures/smp_system.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading