Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMP dual-core cleanups #1146

Merged
merged 24 commits into from
Jan 10, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
91bf795
⚠️ [rtl] remove core_complex rtl file
stnolting Jan 10, 2025
e794045
[cpu] simplify ICC links
stnolting Jan 10, 2025
46f2239
[cpu] clean-up dual-core generics
stnolting Jan 10, 2025
0fa0bc2
[cpu] rework ICC CSRs
stnolting Jan 10, 2025
bb37419
[crt0] update dual-core startup code
stnolting Jan 10, 2025
ae729f0
[sw/lib] cleanup SMP library
stnolting Jan 10, 2025
86ca786
[top] rework core complex instances
stnolting Jan 10, 2025
ff82994
[demo_dual_core] code updates
stnolting Jan 10, 2025
4841fc7
[processor_check] minor code update
stnolting Jan 10, 2025
acc5e5b
[changelog] add v1.10.9.2
stnolting Jan 10, 2025
6eb9722
[rtl] ICC: comment edits
stnolting Jan 10, 2025
82c1ef9
[rtl] update default memory images
stnolting Jan 10, 2025
4852e08
[crt0] minor edits
stnolting Jan 10, 2025
5fd2df0
[sw/lib] SMP: add blocking ICC push/pop functions
stnolting Jan 10, 2025
e551798
[rtl] icc: minor comment edits
stnolting Jan 10, 2025
053d54d
[sw/lib] adjust type of core1's main function
stnolting Jan 10, 2025
1e53b2a
[sw/example] simplify dual-core demo
stnolting Jan 10, 2025
2805acc
[processor_check] minor fix
stnolting Jan 10, 2025
592a6a2
[sw/example] add dual-core ICC demo
stnolting Jan 10, 2025
337c05a
[dual_core_demo] re-add spinlock example
stnolting Jan 10, 2025
650e1f0
[sw/example] typo fix
stnolting Jan 10, 2025
db6a4b7
[docs] smp updates and cleanups
stnolting Jan 10, 2025
6688b19
[docs] remove core complex from rtl list
stnolting Jan 10, 2025
49c89d9
Merge branch 'main' into smp_cleanup
stnolting Jan 10, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Ticket |
|:----:|:-------:|:--------|:------:|
| 10.01.2025 | 1.10.9.2 | clean-up SMP dual-core configuration (HW and SW optimizations) | [#1146](https://github.com/stnolting/neorv32/pull/1146) |
| 09.01.2025 | 1.10.9.1 | fix side-effects of CSR read instructions | [#1145](https://github.com/stnolting/neorv32/pull/1145) |
| 08.01.2025 | [**:rocket:1.10.9**](https://github.com/stnolting/neorv32/releases/tag/v1.10.9) | **New release** | |
| 07.01.2025 | 1.10.8.9 | rtl edits and cleanups; add dedicated "core complex" wrapper (CPU + L1 caches + bus switch) | [#1144](https://github.com/stnolting/neorv32/pull/1144) |
Expand Down
1 change: 1 addition & 0 deletions docs/datasheet/cpu.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,7 @@ The generic type "suv(x:y)" represents a `std_ulogic_vector(x downto y)`.
| `BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `DEBUG_EXC_ADDR` | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `ICC_EN` | boolean | Implement <<_inter_core_communication_icc>> module. Automatically enabled for the SMP <<_dual_core_configuration>>.
| `RISCV_ISA_Sdext` | boolean | Implement RISC-V-compatible "debug" CPU operation mode required for the <<_on_chip_debugger_ocd>>.
| `RISCV_ISA_Sdtrig` | boolean | Implement RISC-V-compatible trigger module. See section <<_on_chip_debugger_ocd>>.
| `RISCV_ISA_Smpmp` | boolean | Implement RISC-V-compatible physical memory protection (PMP). See section <<_smpmp_isa_extension>>.
Expand Down
68 changes: 25 additions & 43 deletions docs/datasheet/cpu_csr.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -77,9 +77,8 @@ to check if the targeted bits can actually be modified.
| 0xf14 | <<_mhartid>> | `CSR_MHARTID` | MRO | Machine hardware thread ID
| 0xf15 | <<_mconfigptr>> | `CSR_MCONFIGPTR` | MRO | Machine configuration pointer register
5+^| **<<_neorv32_specific_csrs>>**
| 0xbc0 | <<_mxiccrxd>> | `CSR_MXICCRXD` | MRW | ICC RX data
| 0xbc1 | <<_mxicctxd>> | `CSR_MXICCTXD` | MRW | ICC TX data
| 0xbc2 .. 0xbc3 | <<_mxiccsr, `mxiccsr0`>> .. <<_mxiccsr, `mxiccsr0`>> | `CSR_MXICCSR0` .. `CSR_MXICCSR3` | MRW | ICC control and status
| 0xbc0 | <<_mxiccsreg>> | `CSR_MXICCSREG` | MRW | Inter-core communication status register
| 0xbc1 | <<_mxiccdata>> | `CSR_MXICCDATA` | MRW | Inter-core communication data register
| 0x800 .. 0x803 | <<_cfureg, `cfureg0`>> .. <<_cfureg, `cfureg3`>> | `CSR_CFUCREG0` .. `CSR_CFUCREG3` | URW | Custom CFU registers 0 to 3
| 0xfc0 | <<_mxisa>> | `CSR_MXISA` | MRO | Extended machine CPU ISA and extensions
|=======================
Expand Down Expand Up @@ -931,66 +930,49 @@ custom/implementation-specific use (assured by the RISC-V privileged specificati
| Description | User-defined CSRs to be used within the <<_custom_functions_unit_cfu>>.
|=======================


{empty} +
[discrete]
===== **`mxiccrxd`**
===== **`mxiccsreg`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | <<_inter_core_communication_icc>> RX data
| Name | <<_inter_core_communication_icc>> status register
| Address | `0xbc0`
| Reset value | `0x00000000`
| Reset value | `0x40000000`
| ISA | `Zicsr` & `X`
| Description | RX data from selected link. Buffered by a 4-entries-deep and 32-bit wide FIFO.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
| Description | Shows the status of the core's inter-core communication link (message queue / FIFO status flags).
The entire CSR is read-only. However, write accesses are ignored.
This CSR is hardwired to all-zero if the <<_dual_core_configuration>> is disabled.
|=======================


{empty} +
[discrete]
===== **`mxicctxd`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
.`mxiccsreg` CSR Bits
[cols="^1,^2,^1,<5"]
[options="header",grid="rows"]
|=======================
| Name | <<_inter_core_communication_icc>> TX data
| Address | `0xbc1`
| Reset value | `0x00000000`
| ISA | `Zicsr` & `X`
| Description | TX data for selected link. Buffered by a 4-entries-deep and 32-bit wide FIFO.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
| Bit | Name [C] | R/W | Description
| 0 | `CSR_MXICCSREG_RX_AVAIL` | r/- | Set if RX data from the other core is available.
| 1 | `CSR_MXICCSREG_TX_FREE` | r/- | Set if there is free space for TX data for the other core.
| 31:2 | - | r/- | Reserved; hardwired to zero.
|=======================


{empty} +
[discrete]
===== **`mxiccsr`**
===== **`mxiccdata`**

[cols="<1,<8"]
[frame="topbot",grid="none"]
|=======================
| Name | <<_inter_core_communication_icc>> control and status
| Address | `0xbc2` (`mxiccsr0`)
| | `0xbc3` (`mxiccsr1`)
| Reset value | `0x40000000`
| Name | <<_inter_core_communication_icc>> data register
| Address | `0xbc1`
| Reset value | `0x00000000`
| ISA | `Zicsr` & `X`
| Description | Link selection and status. Note that `mxiccsr1` is just a mirrored copy of `mxiccsr0`.
This CSR is hardwired to all-zero if there is just a single CPU core in the system.
|=======================

.`mxiccsr` CSR Bits
[cols="^1,^2,^1,<5"]
[options="header",grid="rows"]
|=======================
| Bit | Name [C] | R/W | Description
| 1:0 | `CSR_MXICCSR_LINK_MSB : CSR_MXICCSR_LINK_LSB` | r/w | Link select. The value in this memory corresponds
to the ID of the core to which a connection is to be established via a link. The ICC data registers <<_mxiccrxd>>
and <<_mxicctxd>> will only access the queue FIFOs of the selected link. Note that only bit 0 is writable. Bit 1
is hardwaired to zero.
| 29:2 | - | r/- | Reserved; hardwired to zero.
| 30 | `CSR_MXICCSR_TX_FREE` | r/- | Set if there is free space for TX data for the selected link.
| 31 | `CSR_MXICCSR_RX_AVAIL` | r/- | Set if RX data from the selected link is available.
| Description | This CSR provides access to the inter-core communication message queues that are implemented
as simple FIFOs. Writing to this register will put data into the message queue so it can be read by the other
core. Reading from this register will return data received from the other core (i.e. this CSR has side effects
when reading). A read access will return all-zero of no RX data is available from the other core.
This CSR is hardwired to all-zero if the <<_dual_core_configuration>> is disabled.
|=======================


Expand Down
65 changes: 38 additions & 27 deletions docs/datasheet/cpu_dual_core.adoc
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
:sectnums:
=== Dual-Core Configuration

.Dual-Core Example
.Dual-Core Example Programs
[TIP]
A simple dual-core example program can be found in `sw/example/demo_dual_core`.
A set of rather simple dual-core example programs can be found in `sw/example/demo_dual_core*`.

Optionally, the CPU core can be implemented as **symmetric multiprocessing (SMP) dual-core** system.
This dual-core configuration is enabled by the `DUAL_CORE_EN` <<_processor_top_entity_generics, top generic>>.
Expand All @@ -24,24 +24,30 @@ The following table summarizes the most important aspects when using the dual-co
[cols="<2,<10"]
[grid="rows"]
|=======================
| **CPU configuration** | Both cores use the same cache, CPU and ISA configuration provided by the according top generics.
| **Debugging** | A special SMP openOCD script (`sw/openocd/openocd_neorv32.dual_core.cfg`) is required to
debug both cores at one. SMP-debugging is fully supported by RISC-V gdb port.
debug both cores at one. SMP-debugging is fully supported by the RISC-V gdb port.
| **Clock and reset** | Both cores use the same global processor clock and reset. If <<_cpu_clock_gating>>
is enabled the clock of each core can be individually halted by putting it into <<_sleep_mode>>.
| **Address space** | Both cores have access to the same <<_address_space>>.
is enabled, the clock of each core can be individually halted by putting the core into <<_sleep_mode>>.
| **Address space** | Both cores have full access to the same physical <<_address_space>>.
| **Interrupts** | All <<_processor_interrupts>> are routed to both cores. Hence, each core has access to
all <<_neorv32_specific_fast_interrupt_requests>> (FIRQs). Additionally, the RISC-V machine-level _external
interrupt_ (via the top `mext_irq_i` port) is also send to both cores. In contrast, the RISC-V machine level
_software_ and _timer_ interrupts are exclusive for each core (provided by the <<_core_local_interruptor_clint>>).
| **RTE** | The <<_neorv32_runtime_environment>> also supports the dual-core configuration. However, it needs
to be explicitly initialized on each core individually. The RTE trap handling provides a individual handler
tables for each core.
_software_ and _timer_ interrupts are core-exclusive (provided by the <<_core_local_interruptor_clint>>).
| **RTE** | The <<_neorv32_runtime_environment>> fully supports the dual-core configuration and provides
core-individual trap handler tables. However, the RTE needs to be explicitly initialized on each core
(executing `neorv32_rte_setup()`).
| **Memory** | Each core has its own stack. The top of stack of core 0 is defined by the <<_linker_script>>
while the top of stack of core 1 has to be explicitly defined by core 0 (see <<_dual_core_boot>>). Both
cores share the same heap, `.data` and `.bss` sections.
| **Constructors and destructors** | Constructors and destructors are executed on core 0 only.
(see )
| **Core communication** | See section <<_inter_core_communication_icc>>.
cores share the same heap, `.data` and `.bss` sections. Hence, only core 0 setups the `.data` and `.bss`
sections at boot-up.
| **Constructors and destructors** | Constructors and destructors are executed by core 0 only
(see section <<_c_standard_library>>).
| **Cache coherency** | Be aware that there is no cache snooping available. If any level-1 cache is enabled
(<<_processor_internal_instruction_cache_icache>> and/or <<_processor_internal_data_cache_dcache>>) care
must be taken to prevent access to outdated data - either by using cache synchronization (`fence[.]`
instructions) or by using <<_atomic_memory_access>>.
| **Inter-core communication** | See section <<_inter_core_communication_icc>>.
| **Bootloader** | Only core 0 will boot and execute the bootloader while core 1 is held in standby.
| **Booting** | See section <<_dual_core_boot>>.
|=======================
Expand All @@ -55,22 +61,18 @@ shared-memory communication. Additionally, communication using these links is gu

The inter-core communication (ICC) module is implemented as dedicated hardware module within each CPU core
(VHDL file `rtl/core/neorv32_cpu_icc.vhd`). This module is automatically included if the dual-core option
is enabled. Each core provides a 32-bit wide and 4 entries deep FIFO for sending data to the other core.
is enabled. Each core provides a **32-bit wide** and **4 entries deep** FIFO for sending data to the other core.
Hence, there are two FIFOs: one for sending data from core 0 to core 1 and another one for sending data the
opposite way.

The ICC communication links are accessed via NEORV32-specific CSRs. Hence, those FIFOs are accessible only
by the CPU core itself and cannot be accessed by the DMA or any other CPU core. In total, three CSRs are
provided to handle communications:
The ICC communication links are accessed via two NEORV32-specific CSRs. Hence, those FIFOs are accessible only
by the CPU core itself and cannot be accessed by the DMA or any other CPU core.

The <<_mxiccsr>> is used to select the core with which to communicate. In the dual-core configuration core 1
can only select core 0 and vice versa. The core selection in this register allows access to the according
message FIFOs via the two other CSRs. Additionally, the CSR provides status flags (TX FIFO data available;
RX FIFO free space) related to the selected communication link.

The <<_mxiccrxd>> and <<_mxicctxd>> CSRs are used for the actual data read and write operations. Writing data
to <<<<_mxicctxd>>> will send to the message queue of the core selected by <<_mxiccsr>>. Conversely, reading
data from <<_mxiccrxd>> will return data received from the core selected by <<_mxiccsr>>.
The <<_mxiccsreg>> provides read-only status information about the core's ICC links: bit 0 becomes set if
there is RX data available for _this_ core (send from the the other core). Bit 1 is set as long there is
free space in _this_ core's TX data FIFO. The <<_mxiccdata>> CSR is used for actual data send/receive operations.
Writing this register will put the according data word into the TX link FIFO of _this_ core. Reading this CSR
will return a data word from the RX FIFO of _this_ core.

The ICC FIFOs do not provide any interrupt capabilities. Software is expected to use the machine-software
interrupt of the receiving core (provided by the <<_core_local_interruptor_clint>>) to inform it about
Expand All @@ -94,11 +96,20 @@ To boot-up core 1, the primary core has to use a special library function provid
.CPU Core 1 launch function prototype (note that this function can only be executed on core 0)
[source,c]
----
int neorv32_smp_launch(int hart_id, void (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
int neorv32_smp_launch(int (*entry_point)(void), uint8_t* stack_memory, size_t stack_size_bytes);
----

When executed, core 0 use the <<_inter_core_communication_icc>> to send launch data that includes the entry point
When executed, core 0 uses the <<_inter_core_communication_icc>> to send launch data that includes the entry point
for core 1 (via `entry_point`) and the actual stack configuration (via `stack_memory` and `stack_size_bytes`).
Note that the main function for core 1 has to use a specific type (return `int`, no arguments):

.CPU Core 1 Main Function
[source,c]
----
int core1_main(void) {
return 0; // return to crt0 and go to sleep mode
}
----

.Core 1 Stack Memory
[NOTE]
Expand Down
1 change: 0 additions & 1 deletion docs/datasheet/overview.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,6 @@ rtl/core
├-neorv32_clint.vhd - Core local interruptor
├-neorv32_clockgate.vhd - Generic clock gating switch
├-neorv32_cfs.vhd - Custom functions subsystem
├-neorv32_core_complex.vhd - NEORV32 CORE COMPLEX TOP ENTITY
├-neorv32_cpu.vhd - NEORV32 CPU TOP ENTITY
├-neorv32_cpu_alu.vhd - Arithmetic/logic unit
├-neorv32_cpu_control.vhd - CPU control, exception system and CSRs
Expand Down
3 changes: 2 additions & 1 deletion docs/datasheet/software.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -380,7 +380,8 @@ Note that `\n` (newline) is automatically converted to `\r\n` (carriage-return a
.Constructors and Destructors
[NOTE]
Constructors and destructors for plain C code or for C++ applications are supported by the software framework.
See `sw/example/hello_cpp` for a minimal example.
See `sw/example/hello_cpp` for a minimal example. Note that constructor and destructors are only executed
by core 0 (primary core) in the SMP <<_dual_core_configuration>>.

.Newlib Test/Demo Program
[TIP]
Expand Down
25 changes: 11 additions & 14 deletions rtl/core/neorv32_application_image.vhd
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
-- The NEORV32 RISC-V Processor - github.com/stnolting/neorv32
-- Auto-generated memory initialization image (for internal IMEM)
-- Source: demo_blink_led/build/main.bin
-- Built: 07.01.2025 21:36:11
-- Built: 10.01.2025 10:25:11

library ieee;
use ieee.std_logic_1164.all;
Expand All @@ -11,7 +11,7 @@ use neorv32.neorv32_package.all;

package neorv32_application_image is

constant application_init_size_c : natural := 1228; -- bytes
constant application_init_size_c : natural := 1216; -- bytes
constant application_init_image_c : mem32_t := (
x"f14020f3",
x"80002217",
Expand All @@ -23,11 +23,11 @@ x"000022b7",
x"80028293",
x"30029073",
x"00000317",
x"19430313",
x"18830313",
x"30531073",
x"30401073",
x"00000397",
x"49838393",
x"48c38393",
x"80000417",
x"fc440413",
x"80000497",
Expand All @@ -37,7 +37,7 @@ x"fb450513",
x"80000597",
x"fac58593",
x"00000617",
x"19c60613",
x"19060613",
x"00000693",
x"00000713",
x"00000793",
Expand All @@ -57,26 +57,23 @@ x"00000e13",
x"00000e93",
x"00000f13",
x"00000f93",
x"04008a63",
x"04008463",
x"00000797",
x"01878793",
x"30579073",
x"30446073",
x"30046073",
x"0e80006f",
x"0dc0006f",
x"fff40737",
x"00209793",
x"00f70733",
x"00072023",
x"bc201073",
x"bc0026f3",
x"00072223",
x"bc1026f3",
x"ffab4737",
x"32170713",
x"00d70463",
x"30200073",
x"bc102173",
x"bc102673",
x"bc171073",
x"bc002173",
x"bc002673",
x"0540006f",
x"00838e63",
x"00945c63",
Expand Down
Loading
Loading