Failed DRAM training/initialization on several DDR5 RDIMMs #190

Open
pjattke opened this issue Nov 15, 2024 · 0 comments

Hello, we encountered issues during the DRAM training on some of our DDR5 RDIMMs with the ddr5-tester platform.

The bitstreams were built with the following command:

make -j"$(nproc)" build TARGET_ARGS="--l2-size 256 \
   --build \
   --iodelay-clk-freq 400e6 \
   --bios-lto \
   --rw-bios \
   --from-spd $(realpath "$spd_file") \
   --no-sdram-hw-test"

The bitstream is based on commit 998d4afeb of the rowhammer-tester repository.
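When one bitstream is needed per SPD dump, the build command above can be assembled programmatically. The following is only a minimal sketch: the helper name `build_make_command`, its `jobs` parameter, and the example SPD path are hypothetical, and only the flags already shown in the command above are used.

```python
import os
import shlex
from pathlib import Path


def build_make_command(spd_file, jobs=None):
    """Assemble the make invocation from this report for one SPD dump.

    Hypothetical helper: only the flags quoted in the issue are used.
    """
    # Equivalent of `realpath "$spd_file"` in the shell command.
    spd_path = Path(spd_file).resolve()
    target_args = " ".join([
        "--l2-size 256",
        "--build",
        "--iodelay-clk-freq 400e6",
        "--bios-lto",
        "--rw-bios",
        # Quoted on the assumption that TARGET_ARGS is expanded by a shell.
        f"--from-spd {shlex.quote(str(spd_path))}",
        "--no-sdram-hw-test",
    ])
    # Equivalent of -j"$(nproc)" unless a job count is given explicitly.
    jobs = jobs or os.cpu_count() or 1
    return ["make", f"-j{jobs}", "build", f"TARGET_ARGS={target_args}"]


# Example with a made-up SPD dump path.
cmd = build_make_command("/tmp/spd/dimm300.bin", jobs=4)
print(cmd)
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root, which avoids an extra layer of shell quoting around `TARGET_ARGS`.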

Testing Procedure

For each of the DIMMs listed in "Tested Devices", we flash the bitstream and then run the DRAM training and memtest. We test each DIMM up to 10 times but stop early after five successful runs. We reboot and reflash the FPGA between test iterations.

We used the DDR5 tester boards with S/N 18 (board rev. 1.0.1) and S/N 15 (board rev. 1.0.0) for all tests. Due to time constraints, we limited the tests on the board with S/N 15 to 3 repetitions.
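The repetition scheme can be sketched as a small loop. This is an illustration rather than the script we actually ran: `run_test_campaign` and the `run_once` callable (standing in for one reflash, training, and memtest cycle) are hypothetical names, and the sketch assumes that the five successful runs are counted cumulatively rather than consecutively.

```python
from typing import Callable, Tuple


def run_test_campaign(
    run_once: Callable[[], bool],
    max_runs: int = 10,
    stop_after_passes: int = 5,
) -> Tuple[int, int]:
    """Repeat one test iteration until enough passes or the run budget is spent.

    run_once is a hypothetical callable that performs a single iteration
    (reflash, DRAM training, memtest) and returns True on success.
    Returns (passes, attempts).
    """
    passes = 0
    attempts = 0
    while attempts < max_runs and passes < stop_after_passes:
        attempts += 1
        if run_once():
            passes += 1
    return passes, attempts


# Stub iterations in place of real hardware runs.
print(run_test_campaign(lambda: True))               # stops early after 5 passes
print(run_test_campaign(lambda: False))              # uses all 10 attempts
print(run_test_campaign(lambda: False, max_runs=3))  # shortened run, as on S/N 15
```

Returning both counts keeps the result directly comparable to the passes/attempts notation used in the table below, assuming that notation is read as successful runs over total runs.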

Tested Devices

✔ – devices that passed all repetitions of the memory training and memtest.
✘ – devices that failed the memory training and/or memtest.

| Module Manuf. | Chip Mf. | Size | Module Model No. | Int. ID | Result S/N18 | Result S/N15 |
|---|---|---|---|---|---|---|
| Micron | Micron | 16 GB | MTC10F1084S1RC48BA1 | 300 | ✘ (0/10) | ✘ (2/3) |
| Kingston | SK Hynix | 16 GB | KSM48R40BS8KMM-16HMR | 304 | | |
| Micron | Micron | 32 GB | MTC20F1045S1RC48BA2 | 301 | | |
| Micron | Micron | 32 GB | MTC18F1045S1PC48BA2 | 302 | ✘ (0/10) | |
| Kingston | SK Hynix | 32 GB | KSM48R40BS4TMM-32HMR | 303 | ✘ (0/10) | ✘ (0/3) |
| Samsung | Samsung | 32 GB | M321R4GA0BB0-CQKMS | 308 | ✘ (0/10) | ✘ (0/3) |
| SK Hynix | SK Hynix | 32 GB | HMCG88AGBRA188N | 306 | ✘ (0/10) | ✘ (0/3) |
| Samsung | Samsung | 32 GB | M321R4GA3BB6-CQKET | 307 | unt.* | |
| Kingston | SK Hynix | 32 GB | KSM48R40BD8KMM-32HMR | 310 | | |
| Micron | Micron | 32 GB | MTC20F2085S1RC48BA1 | 311 | ✘ (1/10) | |
| Kingston | SK Hynix | 64 GB | KTL-TS548D4-64G | 313 | ✘ (0/10) | |
| Kingston | SK Hynix | 64 GB | KSM48R40BD4TMM-64HMR | 312 | ✘ (0/10) | ✘ (0/3) |

(*unt.: untested; the DIMM broke in the meantime, so we could not test it again)

Interestingly, some of these devices (e.g., DIMM 300) passed the memory training before (with a much older bitstream) but no longer seem to work with the latest bitstream. Has anything changed in the training procedure?

Logfiles

We attach the logfiles of the devices that failed for further analysis: logs-validation.tar.gz.
