There is a race window between setting page_pinner_inited and
initializing pp_buffer, which can cause pp_buffer->lock to be accessed
before it is initialized. Avoid this by moving the pp_buffer
initialization into page_ext_ops->init(), which sets page_pinner_inited
only after pp_buffer is initialized.
Race scenario:
1) init_page_pinner() is called --> page_pinner_inited is set.
2) __alloc_contig_migrate_range() --> __page_pinner_failure_detect()
   accesses pp_buffer->lock, which is not yet initialized.
3) Only then is pp_buffer allocated and initialized.
Call stack of the issue:
spin_bug+0x0
_raw_spin_lock_irqsave+0x3c
__page_pinner_failure_detect+0x110
__alloc_contig_migrate_range+0x1c4
alloc_contig_range+0x130
cma_alloc+0x170
dma_alloc_contiguous+0xa0
__dma_direct_alloc_pages+0x16c
dma_direct_alloc+0x88
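
A minimal sketch of the intended init ordering, modeled on how
page_owner hooks into page_ext; helper names such as pp_buffer_init()
are illustrative assumptions, not the exact patch:

  #include <linux/jump_label.h>
  #include <linux/page_ext.h>

  DEFINE_STATIC_KEY_FALSE(page_pinner_inited);

  static bool need_page_pinner(void)
  {
          return true;    /* could be gated on a boot parameter */
  }

  static void init_page_pinner(void)
  {
          /*
           * Fully initialize pp_buffer (its memory and pp_buffer.lock)
           * before publishing page_pinner_inited, so that a concurrent
           * __page_pinner_failure_detect() can never see the key enabled
           * while the lock is still uninitialized.
           */
          if (pp_buffer_init())   /* hypothetical allocation helper */
                  return;

          static_branch_enable(&page_pinner_inited);
  }

  struct page_ext_operations page_pinner_ops = {
          .need = need_page_pinner,
          .init = init_page_pinner,
  };

With this ordering, any path that checks
static_branch_unlikely(&page_pinner_inited) either skips the feature
entirely or finds a fully initialized buffer.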
Bug: 259024332
Change-Id: I6849ac4d944498b9a431b47cad7adc7903c9bbaa
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
For CMA allocation, it is critical to migrate a page, but sometimes
migration fails. One of the reasons is that a driver holds a page
refcount for a long time, so the VM cannot migrate the page at that
time.
The concern here is that there is no effective way to find out who
holds the refcount of the page. This patch introduces a feature to
keep track of a page's pinner. Every get_page site can pin a page for
a long time, but the cost of tracking them all would be significant
since get_page is among the most frequent kernel operations.
Furthermore, the page could be a kernel page rather than a user page,
in which case it is unrelated to the page migration failure.
Thus, this patch keeps track of only migration-failed pages to reduce
the runtime cost. Once page migration fails in the CMA allocation path,
those pages are marked as "migration failure", and for every put_page
operation against those pages, the callstack of the put is recorded
into the page_pinner buffer. Later, an admin can see which pages failed
and who released their refcounts since the failure. This makes it much
easier to find the long-time refcount holders that prevent page
migration.
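
As a rough illustration of the recording path (the struct layout, flag
name, and buffer size below are assumptions for the sketch, not the
exact patch; lookup_page_ext(), stack_trace_save(), and
stack_depot_save() are real kernel APIs):

  #include <linux/ktime.h>
  #include <linux/mm.h>
  #include <linux/page_ext.h>
  #include <linux/spinlock.h>
  #include <linux/stackdepot.h>
  #include <linux/stacktrace.h>

  #define PP_BUFFER_SIZE 4096     /* assumed ring-buffer capacity */

  struct pp_record {              /* hypothetical record layout */
          depot_stack_handle_t handle;
          unsigned long pfn;
          u64 ts_usec;
  };

  static struct {
          spinlock_t lock;
          unsigned long index;
          struct pp_record records[PP_BUFFER_SIZE];
  } pp_buffer;

  /* Called from the put_page path; cheap unless the page was marked. */
  void __page_pinner_put_page(struct page *page)
  {
          struct page_ext *page_ext = lookup_page_ext(page);
          unsigned long entries[16], flags;
          depot_stack_handle_t handle;
          unsigned int nr;

          if (!page_ext)
                  return;

          /* Only pages that previously failed migration are traced. */
          if (!test_bit(PAGE_EXT_PINNER_MIGRATION_FAILED, &page_ext->flags))
                  return;

          nr = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
          handle = stack_depot_save(entries, nr, GFP_NOWAIT);

          spin_lock_irqsave(&pp_buffer.lock, flags);
          pp_buffer.records[pp_buffer.index++ % PP_BUFFER_SIZE] =
                  (struct pp_record){
                          .handle = handle,
                          .pfn = page_to_pfn(page),
                          .ts_usec = ktime_to_us(ktime_get_boottime()),
                  };
          spin_unlock_irqrestore(&pp_buffer.lock, flags);
  }

Decoding the saved stackdepot handles later gives the exact put
callstacks recorded since the failure.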
Note: page_pinner does not guarantee that attributing/unattributing
is atomic when they happen at the same time. It is best effort, so
false positives can happen.
Bug: 183414571
Bug: 240196534
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Minchan Kim <minchan@google.com>
Change-Id: I603d0c0122734c377db6b1eb95848a6f734173a0
(cherry picked from commit 898cfbf094a2fc13c67fab5b5d3c916f0139833a)