Filtered by vendor Linux
Subscriptions
Total
14380 CVE
CVE | Vendors | Products | Updated | CVSS v3.1 |
---|---|---|---|---|
CVE-2024-53685 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: ceph: give up on paths longer than PATH_MAX If the full path to be built by ceph_mdsc_build_path() happens to be longer than PATH_MAX, then this function will enter an endless (retry) loop, effectively blocking the whole task. Most of the machine becomes unusable, making this a very simple and effective DoS vulnerability. I cannot imagine why this retry was ever implemented, but it seems rather useless and harmful to me. Let's remove it and fail with ENAMETOOLONG instead. | ||||
CVE-2024-53682 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: regulator: axp20x: AXP717: set ramp_delay AXP717 datasheet says that regulator ramp delay is 15.625 us/step, which is 10mV in our case. Add a AXP_DESC_RANGES_DELAY macro and update AXP_DESC_RANGES macro to expand to AXP_DESC_RANGES_DELAY with ramp_delay = 0 For DCDC4, steps is 100mv Add a AXP_DESC_DELAY macro and update AXP_DESC macro to expand to AXP_DESC_DELAY with ramp_delay = 0 This patch fix crashes when using CPU DVFS. | ||||
CVE-2024-49573 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: sched/fair: Fix NEXT_BUDDY Adam reports that enabling NEXT_BUDDY insta triggers a WARN in pick_next_entity(). Moving clear_buddies() up before the delayed dequeue bits ensures no ->next buddy becomes delayed. Further ensure no new ->next buddy ever starts as delayed. | ||||
CVE-2024-49571 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: net/smc: check iparea_offset and ipv6_prefixes_cnt when receiving proposal msg When receiving proposal msg in server, the field iparea_offset and the field ipv6_prefixes_cnt in proposal msg are from the remote client and can not be fully trusted. Especially the field iparea_offset, once exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks iparea_offset and ipv6_prefixes_cnt before using them. | ||||
CVE-2024-49568 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: net/smc: check v2_ext_offset/eid_cnt/ism_gid_cnt when receiving proposal msg When receiving proposal msg in server, the fields v2_ext_offset/ eid_cnt/ism_gid_cnt in proposal msg are from the remote client and can not be fully trusted. Especially the field v2_ext_offset, once exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks the fields v2_ext_offset/eid_cnt/ism_gid_cnt before using them. | ||||
CVE-2024-47408 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: net/smc: check smcd_v2_ext_offset when receiving proposal msg When receiving proposal msg in server, the field smcd_v2_ext_offset in proposal msg is from the remote client and can not be fully trusted. Once the value of smcd_v2_ext_offset exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks the value of smcd_v2_ext_offset before using it. | ||||
CVE-2024-41932 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: sched: fix warning in sched_setaffinity Commit 8f9ea86fdf99b added some logic to sched_setaffinity that included a WARN when a per-task affinity assignment races with a cpuset update. Specifically, we can have a race where a cpuset update results in the task affinity no longer being a subset of the cpuset. That's fine; we have a fallback to instead use the cpuset mask. However, we have a WARN set up that will trigger if the cpuset mask has no overlap at all with the requested task affinity. This shouldn't be a warning condition; its trivial to create this condition. Reproduced the warning by the following setup: - $PID inside a cpuset cgroup - another thread repeatedly switching the cpuset cpus from 1-2 to just 1 - another thread repeatedly setting the $PID affinity (via taskset) to 2 | ||||
CVE-2023-52925 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 6.2 Medium |
In the Linux kernel, the following vulnerability has been resolved: netfilter: nf_tables: don't fail inserts if duplicate has expired nftables selftests fail: run-tests.sh testcases/sets/0044interval_overlap_0 Expected: 0-2 . 0-3, got: W: [FAILED] ./testcases/sets/0044interval_overlap_0: got 1 Insertion must ignore duplicate but expired entries. Moreover, there is a strange asymmetry in nft_pipapo_activate: It refetches the current element, whereas the other ->activate callbacks (bitmap, hash, rhash, rbtree) use elem->priv. Same for .remove: other set implementations take elem->priv, nft_pipapo_remove fetches elem->priv, then does a relookup, remove this. I suspect this was the reason for the change that prompted the removal of the expired check in pipapo_get() in the first place, but skipping exired elements there makes no sense to me, this helper is used for normal get requests, insertions (duplicate check) and deactivate callback. In first two cases expired elements must be skipped. For ->deactivate(), this gets called for DELSETELEM, so it seems to me that expired elements should be skipped as well, i.e. delete request should fail with -ENOENT error. | ||||
CVE-2023-52924 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: netfilter: nf_tables: don't skip expired elements during walk There is an asymmetry between commit/abort and preparation phase if the following conditions are met: 1. set is a verdict map ("1.2.3.4 : jump foo") 2. timeouts are enabled In this case, following sequence is problematic: 1. element E in set S refers to chain C 2. userspace requests removal of set S 3. kernel does a set walk to decrement chain->use count for all elements from preparation phase 4. kernel does another set walk to remove elements from the commit phase (or another walk to do a chain->use increment for all elements from abort phase) If E has already expired in 1), it will be ignored during list walk, so its use count won't have been changed. Then, when set is culled, ->destroy callback will zap the element via nf_tables_set_elem_destroy(), but this function is only safe for elements that have been deactivated earlier from the preparation phase: lack of earlier deactivate removes the element but leaks the chain use count, which results in a WARN splat when the chain gets removed later, plus a leak of the nft_chain structure. Update pipapo_get() not to skip expired elements, otherwise flush command reports bogus ENOENT errors. | ||||
CVE-2023-52923 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: netfilter: nf_tables: adapt set backend to use GC transaction API Use the GC transaction API to replace the old and buggy gc API and the busy mark approach. No set elements are removed from async garbage collection anymore, instead the _DEAD bit is set on so the set element is not visible from lookup path anymore. Async GC enqueues transaction work that might be aborted and retried later. rbtree and pipapo set backends does not set on the _DEAD bit from the sync GC path since this runs in control plane path where mutex is held. In this case, set elements are deactivated, removed and then released via RCU callback, sync GC never fails. | ||||
CVE-2024-54031 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext Access to genmask field in struct nft_set_ext results in unaligned atomic read: [ 72.130109] Unable to handle kernel paging request at virtual address ffff0000c2bb708c [ 72.131036] Mem abort info: [ 72.131213] ESR = 0x0000000096000021 [ 72.131446] EC = 0x25: DABT (current EL), IL = 32 bits [ 72.132209] SET = 0, FnV = 0 [ 72.133216] EA = 0, S1PTW = 0 [ 72.134080] FSC = 0x21: alignment fault [ 72.135593] Data abort info: [ 72.137194] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 [ 72.142351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 72.145989] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 72.150115] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000237d27000 [ 72.154893] [ffff0000c2bb708c] pgd=0000000000000000, p4d=180000023ffff403, pud=180000023f84b403, pmd=180000023f835403, +pte=0068000102bb7707 [ 72.163021] Internal error: Oops: 0000000096000021 [#1] SMP [...] [ 72.170041] CPU: 7 UID: 0 PID: 54 Comm: kworker/7:0 Tainted: G E 6.13.0-rc3+ #2 [ 72.170509] Tainted: [E]=UNSIGNED_MODULE [ 72.170720] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-stable202302-for-qemu 03/01/2023 [ 72.171192] Workqueue: events_power_efficient nft_rhash_gc [nf_tables] [ 72.171552] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 72.171915] pc : nft_rhash_gc+0x200/0x2d8 [nf_tables] [ 72.172166] lr : nft_rhash_gc+0x128/0x2d8 [nf_tables] [ 72.172546] sp : ffff800081f2bce0 [ 72.172724] x29: ffff800081f2bd40 x28: ffff0000c2bb708c x27: 0000000000000038 [ 72.173078] x26: ffff0000c6780ef0 x25: ffff0000c643df00 x24: ffff0000c6778f78 [ 72.173431] x23: 000000000000001a x22: ffff0000c4b1f000 x21: ffff0000c6780f78 [ 72.173782] x20: ffff0000c2bb70dc x19: ffff0000c2bb7080 x18: 0000000000000000 [ 72.174135] x17: ffff0000c0a4e1c0 x16: 0000000000003000 x15: 0000ac26d173b978 [ 72.174485] x14: ffffffffffffffff x13: 0000000000000030 x12: ffff0000c6780ef0 [ 72.174841] x11: 0000000000000000 x10: ffff800081f2bcf8 x9 : ffff0000c3000000 [ 72.175193] x8 : 00000000000004be x7 : 0000000000000000 x6 : 0000000000000000 [ 72.175544] x5 : 0000000000000040 x4 : ffff0000c3000010 x3 : 0000000000000000 [ 72.175871] x2 : 0000000000003a98 x1 : ffff0000c2bb708c x0 : 0000000000000004 [ 72.176207] Call trace: [ 72.176316] nft_rhash_gc+0x200/0x2d8 [nf_tables] (P) [ 72.176653] process_one_work+0x178/0x3d0 [ 72.176831] worker_thread+0x200/0x3f0 [ 72.176995] kthread+0xe8/0xf8 [ 72.177130] ret_from_fork+0x10/0x20 [ 72.177289] Code: 54fff984 d503201f d2800080 91003261 (f820303f) [ 72.177557] ---[ end trace 0000000000000000 ]--- Align struct nft_set_ext to word size to address this and documentation it. pahole reports that this increases the size of elements for rhash and pipapo in 8 bytes on x86_64. | ||||
CVE-2024-53681 | 2 Linux, Redhat | 2 Linux Kernel, Enterprise Linux | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: nvmet: Don't overflow subsysnqn nvmet_root_discovery_nqn_store treats the subsysnqn string like a fixed size buffer, even though it is dynamically allocated to the size of the string. Create a new string with kstrndup instead of using the old buffer. | ||||
CVE-2024-39282 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: net: wwan: t7xx: Fix FSM command timeout issue When driver processes the internal state change command, it use an asynchronous thread to process the command operation. If the main thread detects that the task has timed out, the asynchronous thread will panic when executing the completion notification because the main thread completion object has been released. BUG: unable to handle page fault for address: fffffffffffffff8 PGD 1f283a067 P4D 1f283a067 PUD 1f283c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:complete_all+0x3e/0xa0 [...] Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x379/0x3e0 ? exc_page_fault+0x69/0xa0 ? asm_exc_page_fault+0x22/0x30 ? complete_all+0x3e/0xa0 fsm_main_thread+0xa3/0x9c0 [mtk_t7xx (HASH:1400 5)] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0xd8/0x110 ? __pfx_fsm_main_thread+0x10/0x10 [mtk_t7xx (HASH:1400 5)] ? __pfx_kthread+0x10/0x10 ret_from_fork+0x38/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> [...] CR2: fffffffffffffff8 ---[ end trace 0000000000000000 ]--- Use the reference counter to ensure safe release as Sergey suggests: https://lore.kernel.org/all/[email protected]/ | ||||
CVE-2024-54193 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal() Move pm_runtime_set_active() to ivpu_pm_init() so when ivpu_ipc_send_receive_internal() is executed before ivpu_pm_enable() it already has correct runtime state, even if last resume was not successful. | ||||
CVE-2022-49151 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: can: mcba_usb: properly check endpoint type Syzbot reported warning in usb_submit_urb() which is caused by wrong endpoint type. We should check that in endpoint is actually present to prevent this warning. Found pipes are now saved to struct mcba_priv and code uses them directly instead of making pipes in place. Fail log: | usb 5-1: BOGUS urb xfer, pipe 3 != type 1 | WARNING: CPU: 1 PID: 49 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502 | Modules linked in: | CPU: 1 PID: 49 Comm: kworker/1:2 Not tainted 5.17.0-rc6-syzkaller-00184-g38f80f42147f #0 | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 | Workqueue: usb_hub_wq hub_event | RIP: 0010:usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502 | ... | Call Trace: | <TASK> | mcba_usb_start drivers/net/can/usb/mcba_usb.c:662 [inline] | mcba_usb_probe+0x8a3/0xc50 drivers/net/can/usb/mcba_usb.c:858 | usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396 | call_driver_probe drivers/base/dd.c:517 [inline] | ||||
CVE-2022-49147 | 2 Linux, Redhat | 2 Linux Kernel, Enterprise Linux | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: block: Fix the maximum minor value is blk_alloc_ext_minor() ida_alloc_range(..., min, max, ...) returns values from min to max, inclusive. So, NR_EXT_DEVT is a valid idx returned by blk_alloc_ext_minor(). This is an issue because in device_add_disk(), this value is used in: ddev->devt = MKDEV(disk->major, disk->first_minor); and NR_EXT_DEVT is '(1 << MINORBITS)'. So, should 'disk->first_minor' be NR_EXT_DEVT, it would overflow. | ||||
CVE-2022-49146 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: virtio: use virtio_device_ready() in virtio_device_restore() After waking up a suspended VM, the kernel prints the following trace for virtio drivers which do not directly call virtio_device_ready() in the .restore: PM: suspend exit irq 22: nobody cared (try booting with the "irqpoll" option) Call Trace: <IRQ> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 __report_bad_irq+0x3a/0xaf note_interrupt.cold+0xb/0x60 handle_irq_event+0x71/0x80 handle_fasteoi_irq+0x95/0x1e0 __common_interrupt+0x6b/0x110 common_interrupt+0x63/0xe0 asm_common_interrupt+0x1e/0x40 ? __do_softirq+0x75/0x2f3 irq_exit_rcu+0x93/0xe0 sysvec_apic_timer_interrupt+0xac/0xd0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch_cpu_idle+0x12/0x20 default_idle_call+0x39/0xf0 do_idle+0x1b5/0x210 cpu_startup_entry+0x20/0x30 start_secondary+0xf3/0x100 secondary_startup_64_no_verify+0xc3/0xcb </TASK> handlers: [<000000008f9bac49>] vp_interrupt [<000000008f9bac49>] vp_interrupt Disabling IRQ #22 This happens because we don't invoke .enable_cbs callback in virtio_device_restore(). That callback is used by some transports (e.g. virtio-pci) to enable interrupts. Let's fix it, by calling virtio_device_ready() as we do in virtio_dev_probe(). This function calls .enable_cts callback and sets DRIVER_OK status bit. This fix also avoids setting DRIVER_OK twice for those drivers that call virtio_device_ready() in the .restore. | ||||
CVE-2022-49142 | 2 Linux, Redhat | 2 Linux Kernel, Enterprise Linux | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: net: preserve skb_end_offset() in skb_unclone_keeptruesize() syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len) in skb_try_coalesce() [1] I was able to root cause the issue to kfence. When kfence is in action, the following assertion is no longer true: int size = xxxx; void *ptr1 = kmalloc(size, gfp); void *ptr2 = kmalloc(size, gfp); if (ptr1 && ptr2) ASSERT(ksize(ptr1) == ksize(ptr2)); We attempted to fix these issues in the blamed commits, but forgot that TCP was possibly shifting data after skb_unclone_keeptruesize() has been used, notably from tcp_retrans_try_collapse(). So we not only need to keep same skb->truesize value, we also need to make sure TCP wont fill new tailroom that pskb_expand_head() was able to get from a addr = kmalloc(...) followed by ksize(addr) Split skb_unclone_keeptruesize() into two parts: 1) Inline skb_unclone_keeptruesize() for the common case, when skb is not cloned. 2) Out of line __skb_unclone_keeptruesize() for the 'slow path'. WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Modules linked in: CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa RSP: 0018:ffffc900063af268 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000 RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003 RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000 R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8 R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155 FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline] tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630 tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914 tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025 tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947 tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719 sk_backlog_rcv include/net/sock.h:1037 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2779 release_sock+0x54/0x1b0 net/core/sock.c:3311 sk_wait_data+0x177/0x450 net/core/sock.c:2821 tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457 tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572 inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae | ||||
CVE-2022-49133 | 1 Linux | 1 Linux Kernel | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: svm range restore work deadlock when process exit kfd_process_notifier_release flush svm_range_restore_work which calls svm_range_list_lock_and_flush_work to flush deferred_list work, but if deferred_list work mmput release the last user, it will call exit_mmap -> notifier_release, it is deadlock with below backtrace. Move flush svm_range_restore_work to kfd_process_wq_release to avoid deadlock. Then svm_range_restore_work take task->mm ref to avoid mm is gone while validating and mapping ranges to GPU. Workqueue: events svm_range_deferred_list_work [amdgpu] Call Trace: wait_for_completion+0x94/0x100 __flush_work+0x12a/0x1e0 __cancel_work_timer+0x10e/0x190 cancel_delayed_work_sync+0x13/0x20 kfd_process_notifier_release+0x98/0x2a0 [amdgpu] __mmu_notifier_release+0x74/0x1f0 exit_mmap+0x170/0x200 mmput+0x5d/0x130 svm_range_deferred_list_work+0x104/0x230 [amdgpu] process_one_work+0x220/0x3c0 | ||||
CVE-2022-49124 | 2 Linux, Redhat | 2 Linux Kernel, Enterprise Linux | 2025-10-15 | 5.5 Medium |
In the Linux kernel, the following vulnerability has been resolved: x86/mce: Work around an erratum on fast string copy instructions A rare kernel panic scenario can happen when the following conditions are met due to an erratum on fast string copy instructions: 1) An uncorrected error. 2) That error must be in first cache line of a page. 3) Kernel must execute page_copy from the page immediately before that page. The fast string copy instructions ("REP; MOVS*") could consume an uncorrectable memory error in the cache line _right after_ the desired region to copy and raise an MCE. Bit 0 of MSR_IA32_MISC_ENABLE can be cleared to disable fast string copy and will avoid such spurious machine checks. However, that is less preferable due to the permanent performance impact. Considering memory poison is rare, it's desirable to keep fast string copy enabled until an MCE is seen. Intel has confirmed the following: 1. The CPU erratum of fast string copy only applies to Skylake, Cascade Lake and Cooper Lake generations. Directly return from the MCE handler: 2. Will result in complete execution of the "REP; MOVS*" with no data loss or corruption. 3. Will not result in another MCE firing on the next poisoned cache line due to "REP; MOVS*". 4. Will resume execution from a correct point in code. 5. Will result in the same instruction that triggered the MCE firing a second MCE immediately for any other software recoverable data fetch errors. 6. Is not safe without disabling the fast string copy, as the next fast string copy of the same buffer on the same CPU would result in a PANIC MCE. This should mitigate the erratum completely with the only caveat that the fast string copy is disabled on the affected hyper thread thus performance degradation. This is still better than the OS crashing on MCEs raised on an irrelevant process due to "REP; MOVS*' accesses in a kernel context, e.g., copy_page. Injected errors on 1st cache line of 8 anonymous pages of process 'proc1' and observed MCE consumption from 'proc2' with no panic (directly returned). Without the fix, the host panicked within a few minutes on a random 'proc2' process due to kernel access from copy_page. [ bp: Fix comment style + touch ups, zap an unlikely(), improve the quirk function's readability. ] |