Go to file
Hugh Dickins 9ea42ba3e6 mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb upstream.

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
 "page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).

Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28 10:18:42 +02:00
arch x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys 2023-06-28 10:18:42 +02:00
block block/blk-iocost (gcc13): keep large values in a new enum 2023-06-14 10:59:54 +02:00
certs certs/blacklist_hashes.c: fix const confusion in certs blacklist 2022-06-22 14:11:22 +02:00
crypto KEYS: asymmetric: Copy sig and digest in public_key_verify_signature() 2023-06-21 15:44:08 +02:00
Documentation Remove DECnet support from kernel 2023-06-21 15:44:10 +02:00
drivers i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle 2023-06-28 10:18:42 +02:00
fs cifs: Fix potential deadlock when updating vol in cifs_reconnect() 2023-06-28 10:18:37 +02:00
include rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() 2023-06-28 10:18:38 +02:00
init init/Kconfig: fix CC_HAS_ASM_GOTO_TIED_OUTPUT test with dash 2022-12-08 11:22:59 +01:00
ipc ipc/sem: Fix dangling sem_array access in semtimedop race 2022-12-08 11:23:06 +01:00
kernel cgroup: Do not corrupt task iteration when rebinding subsystem 2023-06-28 10:18:36 +02:00
lib test_firmware: fix a memory leak with reqs buffer 2023-06-21 15:44:08 +02:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) 2023-06-28 10:18:42 +02:00
net sch_netem: acquire qdisc lock in netem_change() 2023-06-28 10:18:40 +02:00
samples samples/bpf: Fix fout leak in hbm's run_bpf_prog 2023-05-30 12:44:03 +01:00
scripts recordmcount: Fix memory leaks in the uwrite function 2023-05-30 12:44:04 +01:00
security selinux: don't use make's grouped targets feature yet 2023-06-09 10:29:02 +02:00
sound ASoC: nau8824: Add quirk to active-high jack-detect 2023-06-28 10:18:41 +02:00
tools selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET 2023-06-21 15:44:12 +02:00
usr initramfs: restore default compression behavior 2020-04-08 09:08:38 +02:00
virt KVM: Destroy target device if coalesced MMIO unregistration fails 2023-03-11 16:44:01 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap ARM: SoC fixes 2019-11-10 13:41:59 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS Remove DECnet support from kernel 2023-06-21 15:44:10 +02:00
Makefile Linux 5.4.248 2023-06-21 15:44:12 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.