NULL pointer dereference in kmod driver while hot plugging a CPU #3441

Open
loli10K opened this issue Dec 31, 2024 · 1 comment

loli10K commented Dec 31, 2024

Describe the bug

The kmod driver doesn't handle CPU hot plugging gracefully: it dereferences a NULL pointer when a CPU comes online. Hot plugging is probably not a common event during a normal workload, but it did happen to me.

How to reproduce it

This happened once at random while hot plugging a core. It can be reproduced easily by running the following commands in a loop (replace X with the CPU number):

echo 1 > /sys/devices/system/cpu/cpuX/online
echo 0 > /sys/devices/system/cpu/cpuX/online
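
The loop described above could be wrapped in a small helper along these lines. This is only a sketch: the function name `toggle_cpu` and its arguments are made up for illustration, and it writes 0 before 1 so that the CPU is left online when the loop finishes. Run as root against a real CPU, it is expected to trigger the oops shown below.

```shell
#!/bin/sh
# Hypothetical helper (not part of Falco): toggle a CPU's sysfs "online"
# file repeatedly to exercise the driver's hotplug callbacks.

# toggle_cpu FILE COUNT
#   FILE  - path to the "online" file, e.g. /sys/devices/system/cpu/cpu1/online
#   COUNT - number of offline/online cycles to perform
toggle_cpu() {
    file=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo 0 > "$file" || return 1   # take the CPU offline
        echo 1 > "$file" || return 1   # bring it back online
        i=$((i + 1))
    done
}

# Example (needs root; cpu0 often cannot be taken offline):
#   toggle_cpu /sys/devices/system/cpu/cpu1/online 100
```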

Expected behaviour

Falco's kmod driver should handle CPU hot plugging gracefully.

Screenshots

No screenshot, but I'll do you one better: a kernel oops (this one is from my debug kernel, but it also happens on 5.15.0-67-generic):

[   93.904133] BUG: kernel NULL pointer dereference, address: 0000000000000008
[   93.906458] #PF: supervisor read access in kernel mode
[   93.907814] #PF: error_code(0x0000) - not-present page
[   93.909319] PGD 0 P4D 0 
[   93.909996] Oops: 0000 [#1] SMP PTI
[   93.910993] CPU: 2 PID: 23 Comm: cpuhp/2 Tainted: G           OE     5.15.67 #2
[   93.913099] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   93.914593] RIP: 0010:record_event_consumer+0xe4/0xeb0 [falco]
[   93.916146] Code: 1c c5 80 f8 46 82 41 83 3c 24 02 4c 8b 43 08 0f 84 60 02 00 00 41 bf 01 00 00 00 f0 44 0f c1 7b 24 45 85 ff 0f 85 88 02 00 00 <49> 8b 40 08 48 83 c0 01 49 89 40 08 41 8b 10 41 8b 40 04 39 c2 0f
[   93.921189] RSP: 0000:ffffc900001ebc20 EFLAGS: 00010246
[   93.922569] RAX: 0000000000000002 RBX: ffffe8ffffd42ba8 RCX: 0000000000000000
[   93.924449] RDX: 0000000000000000 RSI: 00000000000000f4 RDI: ffffc900001ebd70
[   93.926355] RBP: ffffc900001ebe48 R08: 0000000000000000 R09: 0000000000000009
[   93.928205] R10: 0000000000000014 R11: ffffc900001ebcc0 R12: ffffc900001ebe20
[   93.930317] R13: ffffc9000036b000 R14: 00000000000000f4 R15: 0000000000000000
[   93.932140] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   93.934252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.935600] CR2: 0000000000000008 CR3: 000000000260a001 CR4: 00000000003706e0
[   93.937456] Call Trace:
[   93.938269]  <TASK>
[   93.938806]  ? ___cache_free+0x2df/0x490
[   93.939747]  ? netlink_broadcast_filtered+0x146/0x4a0
[   93.941307]  ? unwind_next_frame+0x482/0x5d0
[   93.942556]  ? ret_from_fork+0x22/0x30
[   93.943514]  ? unwind_next_frame+0x61/0x5d0
[   93.944543]  record_event_all_consumers+0x54/0x80 [falco]
[   93.945852]  ? do_cpu_callback+0x120/0x120 [falco]
[   93.947031]  do_cpu_callback+0xf6/0x120 [falco]
[   93.948246]  scap_cpu_online+0x3c/0x50 [falco]
[   93.949352]  cpuhp_invoke_callback+0x25f/0x3c0
[   93.950517]  ? virtnet_cpu_dead+0x30/0x30 [virtio_net]
[   93.951747]  cpuhp_thread_fun+0x8d/0x140
[   93.952718]  smpboot_thread_fn+0xaf/0x140
[   93.953751]  ? smpboot_register_percpu_thread+0xe0/0xe0
[   93.955682]  kthread+0x127/0x150
[   93.956955]  ? set_kthread_struct+0x50/0x50
[   93.958519]  ret_from_fork+0x22/0x30
[   93.959852]  </TASK>
[   93.960726] Modules linked in: falco(OE) evdev(E) virtio_balloon(E) virtio_console(E) sch_fq_codel(E) msr(E) fuse(E) efi_pstore(E) virtio_rng(E) rng_core(E) autofs4(E) efivars(E) virtio_net(E) net_failover(E) virtio_blk(E) failover(E) qxl(E) drm_ttm_helper(E) ttm(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ghash_clmulni_intel(E) drm(E) cryptd(E) ata_generic(E) virtio_pci(E) virtio(E) virtio_pci_modern_dev(E) virtio_ring(E) libata(E) [last unloaded: falco]
[   93.978637] CR2: 0000000000000008
[   93.979894] ---[ end trace 59aa4e75c88d37e4 ]---
[   93.981589] RIP: 0010:record_event_consumer+0xe4/0xeb0 [falco]
[   93.983756] Code: 1c c5 80 f8 46 82 41 83 3c 24 02 4c 8b 43 08 0f 84 60 02 00 00 41 bf 01 00 00 00 f0 44 0f c1 7b 24 45 85 ff 0f 85 88 02 00 00 <49> 8b 40 08 48 83 c0 01 49 89 40 08 41 8b 10 41 8b 40 04 39 c2 0f
[   93.990679] RSP: 0000:ffffc900001ebc20 EFLAGS: 00010246
[   93.992623] RAX: 0000000000000002 RBX: ffffe8ffffd42ba8 RCX: 0000000000000000
[   93.995245] RDX: 0000000000000000 RSI: 00000000000000f4 RDI: ffffc900001ebd70
[   93.997870] RBP: ffffc900001ebe48 R08: 0000000000000000 R09: 0000000000000009
[   94.000504] R10: 0000000000000014 R11: ffffc900001ebcc0 R12: ffffc900001ebe20
[   94.003128] R13: ffffc9000036b000 R14: 00000000000000f4 R15: 0000000000000000
[   94.005743] FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
[   94.008708] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   94.010837] CR2: 0000000000000008 CR3: 000000000260a001 CR4: 00000000003706e0

kgdb

Thread 27 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23]
record_event_consumer (consumer=consumer@entry=0xffffc900003a3000, event_type=event_type@entry=PPME_CPU_HOTPLUG_E, 
    drop_flags=drop_flags@entry=UF_NEVER_DROP, ns=ns@entry=1735650240013207314, 
    event_datap=event_datap@entry=0xffffc900001ebe20, tp_type=tp_type@entry=KMOD_PROG_ATTACHED_MAX)
    at /usr/src/falco-7.3.0+driver/main.c:1847
1847		ring_info->n_evts++;
(gdb) list
1842			ring_info->n_preemptions++;
1843			atomic_dec(&ring->preempt_count);
1844			put_cpu();
1845			return res;
1846		}
1847		ring_info->n_evts++;
1848	
1849		/*
1850		 * Calculate the space currently available in the buffer
1851		 */
(gdb) p ring_info
$1 = (struct ppm_ring_buffer_info *) 0x0 <fixed_percpu_data>
(gdb) 

Environment

  • Falco version:
Falco version: 0.39.2
Libs version:  0.18.2
Plugin API:    3.7.0
Engine:        0.43.0
Driver:
  API version:    8.0.0
  Schema version: 2.0.0
  Default driver: 7.3.0+driver
  • System info:
{
  "machine": "x86_64",
  "nodename": "falco",
  "release": "5.15.0-67-generic",
  "sysname": "Linux",
  "version": "#74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023"
}
  • Cloud provider or hardware configuration:
QEMU/KVM virtual hardware
  • OS:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
  • Kernel:
Linux falco 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Installation method:
deb

Additional context

loli10K commented Jan 9, 2025

I believe this is a regression: this used to work with an older version of the kernel driver (draios/sysdig#744). It is also shockingly easy to reproduce:

  1. stop falco service
  2. rmmod falco for good measure
  3. echo 0 > /sys/devices/system/cpu/cpu1/online
  4. start falco
  5. echo 1 > /sys/devices/system/cpu/cpu1/online
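
The five steps above can be sketched as a script. The assumptions here are mine, not confirmed by the report: the systemd unit is assumed to be named `falco`, the module is unloaded with `rmmod falco`, and the helper names (`run`, `set_online`, `reproduce`) and the `DRY_RUN` and `CPU_ONLINE` variables are hypothetical. With `DRY_RUN=1` the commands are only printed.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Assumed names: systemd unit "falco",
# kernel module "falco". Adjust both to match your installation.
# DRY_RUN=1 prints each command instead of executing it.

CPU_ONLINE=${CPU_ONLINE:-/sys/devices/system/cpu/cpu1/online}

# Run a command, or just print it in dry-run mode.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write 0 or 1 to the CPU's sysfs "online" file (or print the write).
set_online() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ echo $1 > $CPU_ONLINE"
    else
        echo "$1" > "$CPU_ONLINE"
    fi
}

reproduce() {
    run systemctl stop falco      # 1. stop the falco service
    run rmmod falco || true       # 2. unload the module; it may already be gone
    set_online 0                  # 3. take cpu1 offline
    run systemctl start falco     # 4. start falco while cpu1 is offline
    set_online 1                  # 5. bring cpu1 back online
}
```

Calling `reproduce` as root performs the steps for real; setting `DRY_RUN=1` first lets you review the commands before running them on a machine you care about.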

This behavior also contradicts what's written here:

https://github.com/falcosecurity/libs/blob/7.3.0%2Bdriver/driver/main.c#L475-L482

  /*
   * If a cpu is offline when the consumer is first created, we
   * will never get events for that cpu even if it later comes
   * online via hotplug. We could allocate these rings on-demand
   * later in this function if needed for hotplug, but that
   * requires the consumer to know to call open again, and that is
   * not supported.
   */
