Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File cannot be written to when recovering from crash during nova_extend_inode_log (checksums+parity enabled) #123

Open
pcworld opened this issue Nov 13, 2021 · 3 comments

Comments

@pcworld
Copy link
Contributor

pcworld commented Nov 13, 2021

This report assumes that NOVA's protection features are enabled (nova.metadata_csum=1 nova.data_csum=1 nova.data_parity=1).
When writing to a file and crashing during nova_extend_inode_log, this can result in the file becoming unwritable after recovering from that crash state.

I run echo HelloWorld > /mnt/myfile as in #105.
I consider a crash during nova_extend_inode_log with the following call stack: nova_extend_inode_log <- nova_get_append_head <- nova_append_log_entry <- nova_append_file_write_entry <- do_nova_inplace_file_write <- nova_inplace_file_write <- nova_dax_file_write <- __vfs_write.
I consider persisting only one of the pending stores, I think it's the checksum updated in log.c l. 1224 (I will confirm that later), but not other stores (in particular, excluding some updates performed by nova_update_alter_pages).
I then load the resulting crash image, and reading the file system is fine (with an empty /mnt/myfile), however when appending to the file with echo test >> /mnt/myfile, an I/O error occurs with the following trace output by NOVA:

/ # echo test >> /mnt/myfile
nova: nova_check_inode_integrity: inode replica 33 is stale, trying to repair using the primary
nova: nova_repair_inode: inode 33 error repaired
nova error: 
nova_verify_entry_csum: log page tail error detected
nova error: 
nova_verify_entry_csum: unable to repair entry errors
nova: nova_find_free_slot ERROR: 883 - 883 already in free list
nova: nova_free_blocks: find free slot fail: -22
nova error: 
Inode 33: free 1 data block from 883 to 883 failed!
nova: Pi 33: log head 0x374000, tail 0x374040
nova: file write entry @ 0x374000: epoch 0, trans 1, pgoff 0, pages 1, blocknr 883, reassigned 0, updating 0, invalid count 0, size 5, mtime 1636784578
nova: nova_get_entry_copy: unknown or unsupported entry type (0) for checksum, 0xffff888008000040
CPU: 0 PID: 1 Comm: sh Not tainted 5.1.0+ #76
Call Trace:
<snip>

I have attached the crash image where this problem occurs (I'm not sure though if NOVA images can be used with other machines): img.gz

Edit: I originally wrongly reported this as a kernel fault, however appending to the file is just segfaulting my shell (which is also init), apparently it cannot handle write calls resulting in I/O errors.
Edit 2: I'm really unsure if there maybe still is another kernel issue (and contrary to my previous edit maybe it's not the shell's fault). Even though a subshell with the append command crashes, if I issue the append command a second time in another subshell, the append operation appears not to finish and just hangs. I will need to investigate this further.

@pcworld pcworld changed the title Kernel fault when recovering from crash during nova_extend_inode_log (checksums+parity enabled) File cannot be written to when recovering from crash during nova_extend_inode_log (checksums+parity enabled) Nov 15, 2021
@Andiry
Copy link
Contributor

Andiry commented Dec 7, 2021

Thanks for the reporting. However, I am not sure how to use the image:

# echo img > /dev/pmem0 
# mount -t NOVA /dev/pmem0 /mnt/ramdisk/
# dmesg
[ 4337.052594] nova: Arch new instructions support: CLWB NO
[ 4337.052602] nova: Data structure size: inode 120, log_page 4096, file_write_entry 64, dir_entry(max) 304, setattr_entry 56, link_change_entry 40
[ 4351.582107] nova: 16 cpus online
[ 4351.582112] nova: nova_get_nvmm_info: dev pmem0, phys_addr 0x380000000, virt_addr 0xffffa005c0000000, size 2147483648
[ 4351.582113] nova: measure timing 0, metadata checksum 0, wprotect 0, data checksum 0, data parity 0, DRAM checksum 0
[ 4351.582203] nova: Start NOVA snapshot cleaner thread.
[ 4351.582206] nova: Running snapshot cleaner thread
[ 4351.582211] nova: Error in super block: try to repair it with the other copy
[ 4351.582220] nova: nova_init_inode_list_from_inode: 16 inode nodes
[ 4351.582222] nova: Recovered 0 snapshots, latest epoch ID 0
[ 4351.582223] nova: NOVA: Normal shutdown
[ 4351.582228] nova: Current epoch id: 0

Writing to /mnt/ramdisk/myfile and it succeeds.

@pcworld
Copy link
Contributor Author

pcworld commented Dec 7, 2021

echo img > /dev/pmem0

I suppose you meant cat there? (That command would write the literal string "img" onto the device, however mounting should fail after that, so I assume you did it properly).

It seems that NOVA images are not really portable, I have also tried using it on another virtual machine, but when #cpus != 1 (the crash image was produced with #cpus = 1), weird things happen and the file is not even listed. My guess would be that NOVA does not properly recognize when #cpus have changed and the per-CPU journals are not recovered correctly or so. Should I report this as a separate issue?
The mounted image should contain a file called "myfile".
Though even when I set #cpus = 1 in another VM setup (different from the VM from which I extracted the image), I can iirc load the image and the existing file is listed, but I don't observe the problem I originally reported.
So I will need to dig further into how to reproduce this without relying on sharing the image.

@Andiry
Copy link
Contributor

Andiry commented Dec 7, 2021

Yes, the change of #cpus is a known issue of NOVA. Fixing that, however, requires a lot of effort I assume.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants