zhenwei pi
67f22ba775
mm/memory-failure: disable unpoison once hw error happens
Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on x86 platforms once a hardware memory error occurs.
Unpoisoning a hardware-corrupted page only puts the page back into the
buddy allocator, so the kernel may later access the page through a
*NOT PRESENT* KPTE. This leads to a BUG when the page is accessed.
As suggested by David and Naoya, disable the unpoison mechanism once a
real HW error happens, to avoid a BUG like this:
Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
RIP: 0010:clear_page_erms+0x7/0x10
Code: ...
RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
prep_new_page+0x151/0x170
get_page_from_freelist+0xca0/0xe20
? sysvec_apic_timer_interrupt+0xab/0xc0
? asm_sysvec_apic_timer_interrupt+0x1b/0x20
__alloc_pages+0x17e/0x340
__folio_alloc+0x17/0x40
vma_alloc_folio+0x84/0x280
__handle_mm_fault+0x8d4/0xeb0
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x27/0x30
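
The failure mode and the guard described above can be modelled in a small userspace sketch. This is not the kernel patch itself: `struct model_page`, every `model_*` function, and the `hw_error_seen` flag are hypothetical names introduced for illustration, and returning `-EOPNOTSUPP` from the guard is an assumption of the sketch. It also simplifies soft poison to "mark the page and take it off the free list, leaving the kernel mapping intact".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of one page frame; not kernel code. */
struct model_page {
    bool poisoned;      /* marked as hwpoisoned */
    bool kpte_present;  /* kernel mapping still present */
    bool in_buddy;      /* available to the allocator */
};

/* Latched on the first real hardware error; never cleared by the guard. */
static bool hw_error_seen;

/* Soft poison (injection): the page is marked, the kernel mapping stays. */
static void model_soft_poison(struct model_page *p)
{
    p->poisoned = true;
    p->in_buddy = false;
}

/* Real hardware error: the kernel mapping is cleared as well. */
static void model_hw_poison(struct model_page *p)
{
    p->poisoned = true;
    p->in_buddy = false;
    p->kpte_present = false;
    hw_error_seen = true;
}

/* Unpoison returns the page to the allocator, unless it has been disabled. */
static int model_unpoison(struct model_page *p)
{
    if (hw_error_seen)
        return -EOPNOTSUPP;  /* guard: refuse after a real HW error (assumed code) */
    p->poisoned = false;
    p->in_buddy = true;      /* note: kpte_present is NOT restored here */
    return 0;
}

/* The dangerous state: allocatable, but the kernel mapping is gone. */
static bool model_alloc_would_fault(const struct model_page *p)
{
    return p->in_buddy && !p->kpte_present;
}

/* Walks both scenarios; returns 0 if the model behaves as described. */
int model_demo(void)
{
    struct model_page soft = { .kpte_present = true, .in_buddy = true };
    struct model_page hard = { .kpte_present = true, .in_buddy = true };

    hw_error_seen = false;

    /* Soft poison followed by unpoison is safe: the mapping was never removed. */
    model_soft_poison(&soft);
    assert(model_unpoison(&soft) == 0);
    assert(!model_alloc_would_fault(&soft));

    /* After a real HW error, unpoison is refused, so the page stays out of the allocator. */
    model_hw_poison(&hard);
    assert(model_unpoison(&hard) == -EOPNOTSUPP);
    assert(!model_alloc_would_fault(&hard));

    printf("model ok\n");
    return 0;
}
```

Removing the `hw_error_seen` check from `model_unpoison()` makes `model_alloc_would_fault(&hard)` return true, which corresponds to the not-present page fault in `clear_page_erms` shown in the trace. Because the flag is global in this sketch, unpoison is disabled for every page once any real hardware error has been seen, matching the "disable unpoison once hw error happens" wording of the subject line.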
Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:11:32 -07:00