Since the printk cpulock is CPU-reentrant and since it is used
in all contexts, its usage must be carefully considered and
will most likely require lockless programming. To avoid
mistaking the printk cpulock for a typical lock, rename it to
cpu_sync. The main functions then become:

    printk_cpu_sync_get_irqsave(flags);
    printk_cpu_sync_put_irqrestore(flags);
Add extra notes of caution in the function description to help
developers understand the requirements for correct usage.
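A minimal usage sketch (hedged; the comment paraphrases the cautions
added to the function description):

    unsigned long flags;

    printk_cpu_sync_get_irqsave(flags);
    /*
     * Critical section: owned by one CPU at a time but re-entrant on
     * the owning CPU, so code here cannot rely on exclusion against
     * itself and should avoid taking regular locks.
     */
    printk_cpu_sync_put_irqrestore(flags);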
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-2-john.ogness@linutronix.de
Merge tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Make %pK behave the same as %p for kptr_restrict == 0 also with
no_hash_pointers parameter
- Ignore the default console in the device tree also when console=null
or console="" is used on the command line
- Document console=null and console="" behavior
- Prevent a deadlock and a livelock caused by console_lock in panic()
- Make console_lock available for panicking CPU
- Fast query for the next to-be-used sequence number
- Use the expected return values in printk.devkmsg __setup handler
- Use the correct atomic operations in wake_up_klogd() irq_work handler
- Avoid possible unaligned access when handling %4cc printing format
* tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix return value of printk.devkmsg __setup handler
vsprintf: Fix %pK with kptr_restrict == 0
printk: make suppress_panic_printk static
printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true
Docs: printk: add 'console=null|""' to admin/kernel-parameters
printk: use atomic updates for klogd work
printk: Drop console_sem during panic
printk: Avoid livelock with heavy printk during panic
printk: disable optimistic spin during panic
printk: Add panic_in_progress helper
vsprintf: Move space out of string literals in fourcc_string()
vsprintf: Fix potential unaligned access
printk: ringbuffer: Improve prb_next_seq() performance
Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
Allow the use of a deferrable timer, which does not force CPU wake-ups
when the system is idle. A consequence is that the sample interval
becomes very unpredictable, to the point that it is not guaranteed that
the KFENCE KUnit test still passes.
Nevertheless, on power-constrained systems this may be preferable, so
let's give the user the option should they accept the above trade-off.
Link: https://lkml.kernel.org/r/20220308141415.3168078-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the function kunit_test_timeout, "300 * MSEC_PER_SEC" is declared to
represent 5 minutes. However, this is wrong on arm64, whose default
HZ = 250, and in other situations where HZ != 1000. Use
msecs_to_jiffies() to fix this, so that kunit_test_timeout works as
desired.
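A sketch of the fix as described (assuming the function simply returns
the timeout):

    /*
     * Convert milliseconds to jiffies instead of treating the raw
     * millisecond count as jiffies, so the timeout is 5 min at any HZ.
     */
    static unsigned long kunit_test_timeout(void)
    {
            return msecs_to_jiffies(300 * MSEC_PER_SEC);
    }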
Link: https://lkml.kernel.org/r/20220309083753.1561921-3-liupeng256@huawei.com
Fixes: 5f3e062089 ("kunit: test: add support for test abort")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kunit: fix a UAF bug and do some optimization", v2.
This series fixes a UAF (use-after-free) that occurs when running the
kfence test case test_gfpzero, which is time-costly. The UAF bug can be
easily triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535.
Furthermore, some optimization for kunit tests has been done.
This patch (of 3):
Kunit creates a new thread to run the actual test case, and the main
process waits for the completion of the actual test thread until it
times out. The variable "struct kunit test" is local to the function
kunit_try_catch_run but is used in the test-case thread.
kunit_try_catch_run frees "struct kunit test" when kunit times out, but
the actual test case is still running, so a UAF bug is triggered.
The above problem has been observed both on a physical machine and on a
QEMU platform when running the kfence kunit tests. It can be triggered
by setting CONFIG_KFENCE_NUM_OBJECTS = 65535; under this setting, the
test case test_gfpzero takes hours and kunit times out. The panic log
follows:
    BUG: unable to handle page fault for address: ffffffff82d882e9
    Call Trace:
     kunit_log_append+0x58/0xd0
     ...
     test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test]
     test_gfpzero.cold+0x61/0x8ab [kfence_test]
     kunit_try_run_case+0x4c/0x70
     kunit_generic_run_threadfn_adapter+0x11/0x20
     kthread+0x166/0x190
     ret_from_fork+0x22/0x30
    Kernel panic - not syncing: Fatal exception
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
    Ubuntu-1.8.2-1ubuntu1 04/01/2014
To solve this problem, the test-case thread should be stopped when the
kunit framework times out. The stop signal is sent in the function
kunit_try_catch_run, and test_gfpzero handles it, as sketched below.
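A hedged sketch of the cooperative-stop pattern (names simplified; the
real change touches kunit_try_catch_run() and the kfence test loop):

    /*
     * Test-thread side: the long-running loop polls for a stop request
     * and exits cleanly instead of touching state the harness has
     * already freed.
     */
    static int test_thread_fn(void *data)
    {
            while (!kthread_should_stop()) {
                    /* one iteration of the expensive test body */
                    cond_resched();
            }
            return 0;
    }

    /* Harness side, on timeout: wake the thread and wait for it to exit. */
    kthread_stop(task);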
Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The workingset code adds the xa_node to the shadow_nodes list, so the
xa_node should be allocated with kmem_cache_alloc_lru(). Use
xas_set_lru() to pass the list_lru into which the xa_node will be
inserted, so that the xa_node reclaim context is set up correctly.
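A sketch of the resulting pattern (context simplified; shadow_nodes
stands for the workingset's list_lru):

    XA_STATE(xas, &mapping->i_pages, index);

    /*
     * Tell the XArray state which list_lru the allocated xa_node will
     * join, so kmem_cache_alloc_lru() sets up its reclaim context.
     */
    xas_set_lru(&xas, &shadow_nodes);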
Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'overflow-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"These changes come in roughly two halves: support of Gustavo A. R.
Silva's struct_size() work via additional helpers for catching
overflow allocation size calculations, and conversions of selftests to
KUnit (which includes some tweaks for UML + Clang):
- Convert overflow selftest to KUnit
- Convert stackinit selftest to KUnit
- Implement size_t saturating arithmetic helpers
- Allow struct_size() to be used in initializers"
* tag 'overflow-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib: stackinit: Convert to KUnit
um: Allow builds with Clang
lib: overflow: Convert to Kunit
overflow: Provide constant expression struct_size
overflow: Implement size_t saturating arithmetic helpers
test_overflow: Regularize test reporting output
Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- NVMe updates via Christoph:
- add vectored-io support for user-passthrough (Kanchan Joshi)
- add verbose error logging (Alan Adamson)
- support buffered I/O on block devices in nvmet (Chaitanya
Kulkarni)
- central discovery controller support (Martin Belanger)
- fix and extend the globally unique identifier validation
(Christoph)
- move away from the deprecated IDA APIs (Sagi Grimberg)
- misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
Chaitanya Kulkarni)
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
- warn about shared namespaces without multipathing (Christoph)
- MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
Erik, Dirk)
- loop cleanups and queue depth configuration (Chaitanya)
- null_blk cleanups and fixes (Chaitanya)
- Use descriptive init/exit names in virtio_blk (Randy)
- Use bvec_kmap_local() in drivers (Christoph)
- bcache fixes (Mingzhe)
- xen blk-front persistent grant speedups (Juergen)
- rnbd fix and cleanup (Gioh)
- Misc fixes (Christophe, Colin)
* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
virtio_blk: eliminate anonymous module_init & module_exit
nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
nvme: remove nvme_alloc_request and nvme_alloc_request_qid
nvme: cleanup how disk->disk_name is assigned
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet: use snprintf() with PAGE_SIZE in configfs
nvmet: don't fold lines
nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
nvme-tcp: lockdep: annotate in-kernel sockets
nvme-tcp: don't fold the line
nvme-tcp: don't initialize ret variable
nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
nvme-multipath: use vmalloc for ANA log buffer
xen/blkfront: speed up purge_persistent_grants()
raid5: initialize the stripe_head embeeded bios as needed
raid5-cache: statically allocate the recovery ra bio
raid5-cache: fully initialize flush_bio when needed
raid5-ppl: fully initialize the bio in ppl_new_iounit
...
Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"There have been a few important changes to the RNG's crypto, but the
intent for 5.18 has been to shore up the existing design as much as
possible with modern cryptographic functions and proven constructions,
rather than actually changing up anything fundamental to the RNG's
design.
So it's still the same old RNG at its core as before: it still counts
entropy bits, and collects from the various sources with the same
heuristics as before, and so forth. However, the cryptographic
algorithms that transform that entropic data into safe random numbers
have been modernized.
Just as important, if not more, is that the code has been cleaned up
and re-documented. As one of the first drivers in Linux, going back to
1.3.30, its general style and organization was showing its age and
becoming both a maintenance burden and an auditability impediment.
Hopefully this provides a more solid foundation to build on for the
future. I encourage you to open up the file in full, and maybe you'll
remark, "oh, that's what it's doing," and enjoy reading it. That, at
least, is the eventual goal, which this pull begins working toward.
Here's a summary of the various patches in this pull:
- /dev/urandom and /dev/random now do the same thing, per the patch
we discussed on the list. I think this is worth trying out. If it
does appear problematic, I've made sure to keep it standalone and
revertible without any conflicts.
- Fixes and cleanups for numerous integer type problems, locking
issues, and general code quality concerns.
- The input pool's LFSR has been replaced with a cryptographically
secure hash function, which has security and performance benefits
alike, and consequently allows us to count entropy bits linearly.
- The pre-init injection now uses a real hash function too, instead
of an LFSR or vanilla xor.
- The interrupt handler's fast_mix() function now uses one round of
SipHash, rather than the fake crypto that was there before.
- All additions of RDRAND and RDSEED now go through the input pool's
hash function, in part to mitigate ridiculous hypothetical CPU
backdoors, but more so to have a consistent interface for ingesting
entropy that's easy to analyze, making everything happen one way,
instead of a potpourri of different ways.
- The crng now works on per-cpu data, while also being in accordance
with the actual "fast key erasure RNG" design. This allows us to
fix several boot-time race complications associated with the prior
dynamically allocated model, eliminates much locking, and makes our
backtrack protection more robust.
- Batched entropy now erases doled out values so that it's backtrack
resistant.
- Working closely with Sebastian, the interrupt handler no longer
needs to take any locks at all, as we punt the
synchronized/expensive operations to a workqueue. This is
especially nice for PREEMPT_RT, where taking spinlocks in irq
context is problematic. It also makes the handler faster for the
rest of us.
- Also working with Sebastian, we now do the right thing on CPU
hotplug, so that we don't use stale entropy or fail to accumulate
new entropy when CPUs come back online.
- We handle virtual machines that fork / clone / snapshot, using the
"vmgenid" ACPI specification for retrieving a unique new RNG seed,
which we can use to also make WireGuard (and in the future, other
things) safe across VM forks.
- Around boot time, we now try to reseed more often if enough entropy
is available, before settling on the usual 5 minute schedule.
- Last, but certainly not least, the documentation in the file has
been updated considerably"
* tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (60 commits)
random: check for signal and try earlier when generating entropy
random: reseed more often immediately after booting
random: make consistent usage of crng_ready()
random: use SipHash as interrupt entropy accumulator
wireguard: device: clear keys on VM fork
random: provide notifier for VM fork
random: replace custom notifier chain with standard one
random: do not export add_vmfork_randomness() unless needed
virt: vmgenid: notify RNG of VM fork and supply generation ID
ACPI: allow longer device IDs
random: add mechanism for VM forks to reinitialize crng
random: don't let 644 read-only sysctls be written to
random: give sysctl_random_min_urandom_seed a more sensible value
random: block in /dev/urandom
random: do crng pre-init loading in worker rather than irq
random: unify cycles_t and jiffies usage and types
random: cleanup UUID handling
random: only wake up writers after zap if threshold was passed
random: round-robin registers as ulong, not u32
random: clear fast pool, crng, and batches in cpuhp bring up
...
Convert stackinit unit tests to KUnit, for better integration
into the kernel self test framework. Includes a rename of
test_stackinit.c to stackinit_kunit.c, and CONFIG_TEST_STACKINIT to
CONFIG_STACKINIT_KUNIT_TEST.
Adjust expected test results based on which stack initialization method
was chosen:
$ CMD="./tools/testing/kunit/kunit.py run stackinit --raw_output \
--arch=x86_64 --kconfig_add"
$ $CMD | grep stackinit:
# stackinit: pass:36 fail:0 skip:29 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y | grep stackinit:
# stackinit: pass:37 fail:0 skip:28 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y | grep stackinit:
# stackinit: pass:55 fail:0 skip:10 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y | grep stackinit:
# stackinit: pass:62 fail:0 skip:3 total:65
$ $CMD CONFIG_INIT_STACK_ALL_PATTERN=y --make_option LLVM=1 | grep stackinit:
# stackinit: pass:60 fail:0 skip:5 total:65
$ $CMD CONFIG_INIT_STACK_ALL_ZERO=y --make_option LLVM=1 | grep stackinit:
# stackinit: pass:60 fail:0 skip:5 total:65
Temporarily remove the userspace-build mode, which will be restored in a
later patch.
Expand the size of the pre-case switch variable so it doesn't get
accidentally cleared.
Cc: David Gow <davidgow@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220224055145.1853657-1-keescook@chromium.org
v2:
- split "userspace KUnit stub" into separate header and patch (Daniel)
- Improve commit log and comments (David)
- Provide mapping of expected XFAIL tests to CONFIGs (David)
We previously rolled our own randomness readiness notifier, which only
has two users in the whole kernel. Replace this with a more standard
atomic notifier block that serves the same purpose with less code. Also
unexport the symbols, because no modules use them, only unconditional
builtins. The only drawback is that it's possible for a notification
handler returning the "stop" code to prevent further processing, but
given that there are only two users, and that we're unexporting this
anyway, that doesn't seem like a significant drawback for the
simplification we receive here.
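A hedged sketch of the standard replacement (registration wrapper
abbreviated):

    static ATOMIC_NOTIFIER_HEAD(random_ready_chain);

    int register_random_ready_notifier(struct notifier_block *nb)
    {
            return atomic_notifier_chain_register(&random_ready_chain, nb);
    }

    /* On readiness, fire the chain once: */
    atomic_notifier_call_chain(&random_ready_chain, 0, NULL);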
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
On Ubuntu 21.10 (ppc64le) building raid6test with gcc (Ubuntu
11.2.0-7ubuntu2) 11.2.0 fails with the error below.
gcc -I.. -I ../../../include -g -O2 \
-I../../../arch/powerpc/include -DCONFIG_ALTIVEC \
-c -o vpermxor1.o vpermxor1.c
vpermxor1.c: In function ‘raid6_vpermxor1_gen_syndrome_real’:
vpermxor1.c:64:29: error: expected string literal before ‘VPERMXOR’
64 | asm(VPERMXOR(%0,%1,%2,%3):"=v"(wq0):"v"(gf_high), "v"(gf_low), "v"(wq0));
| ^~~~~~~~
make: *** [Makefile:58: vpermxor1.o] Error 1
So, include the header asm/ppc-opcode.h, which defines this macro, also
when not building the Linux kernel proper but only this test.
Cc: Matt Brown <matthew.brown.dev@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
Building raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows
the errors below:
$ cd lib/raid6/test/
$ make
<stdin>:1:1: error: stray ‘\’ in program
<stdin>:1:2: error: stray ‘#’ in program
<stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \
before ‘<’ token
[...]
The errors come from the HAS_ALTIVEC test, which fails, so the POWER
optimized versions are not built. That's also the reason nobody noticed
on the other architectures.
GNU Make 4.3 does not remove the backslash anymore. From the 4.3
release announcement:
> * WARNING: Backward-incompatibility!
> Number signs (#) appearing inside a macro reference or function invocation
> no longer introduce comments and should not be escaped with backslashes:
> thus a call such as:
> foo := $(shell echo '#')
> is legal. Previously the number sign needed to be escaped, for example:
> foo := $(shell echo '\#')
> Now this latter will resolve to "\#". If you want to write makefiles
> portable to both versions, assign the number sign to a variable:
> H := \#
> foo := $(shell echo '$H')
> This was claimed to be fixed in 3.81, but wasn't, for some reason.
> To detect this change search for 'nocomment' in the .FEATURES variable.
So, do the same as commit 9564a8cf42 ("Kbuild: fix # escaping in .cmd
files for future Make") and commit 929bef4677 ("bpf: Use $(pound) instead
of \# in Makefiles") and define and use a $(pound) variable.
Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57
Cc: Matt Brown <matthew.brown.dev@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
GCC 10+ defaults to -fno-common, which enforces proper declaration of
external references using "extern". Without this change, linking fails
with:
  lib/raid6/test/algos.c:28: multiple definition of `raid6_call';
  lib/raid6/test/test.c:22: first defined here
The included pq.h header already provides an extern declaration, so we
can just remove the redundant one here.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dirk Müller <dmueller@suse.de>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE pages.
Note that this excludes the special idle page wakeup for fsdax pages,
which still happens at refcount 1. This is a separate issue and will
be sorted out later. Given that only fsdax pages require the
notification when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
symbol can go away and be replaced with a FS_DAX check for this hook
in the put_page fastpath.
Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>.
Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.
Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
hmm.h pulls in the world for no good reason at all. Remove the
includes and push a few ones into the users instead.
Link: https://lkml.kernel.org/r/20220210072828.2930359-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian Knig <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
A subsequent patch will make crypto/dh's dh_is_pubkey_valid() calculate
a safe-prime group's Q parameter from P: Q = (P - 1) / 2. Implementing
this will need mpi_rshift(). Export it so that it's accessible from
crypto/dh.
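A sketch of the intended use (error handling elided; since P is an odd
prime, a single right shift yields (P - 1) / 2):

    MPI q = mpi_alloc(mpi_get_nlimbs(p));

    /* q = p >> 1, which equals (p - 1) / 2 for odd p */
    mpi_rshift(q, p, 1);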
Signed-off-by: Nicolai Stange <nstange@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With HW_TAGS KASAN and kasan.stacktrace=off, the cache created in the
kmem_cache_double_destroy() test might get merged with an existing one.
Thus, the first kmem_cache_destroy() call won't actually destroy it but
will only decrease the refcount. This causes the test to fail.
Provide an empty constructor for the created cache to prevent the cache
from getting merged.
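A sketch of the fix (cache size illustrative); slab caches with a
constructor are never considered for merging:

    static void empty_ctor(void *object) { }

    /*
     * The constructor makes this cache unmergeable, so the first
     * kmem_cache_destroy() really destroys it.
     */
    cache = kmem_cache_create("test_cache", 200, 0, 0, empty_ctor);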
Link: https://lkml.kernel.org/r/b597bd434c49591d8af00ee3993a42c609dc9a59.1644346040.git.andreyknvl@google.com
Fixes: f98f966cd7 ("kasan: test: add test case for double-kmem_cache_destroy()")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although kptr_restrict is set to 0 and the kernel is booted with the
no_hash_pointers parameter, the content of /proc/vmallocinfo lacks the
real addresses.
/ # cat /proc/vmallocinfo
0x(ptrval)-0x(ptrval) 8192 load_module+0xc0c/0x2c0c pages=1 vmalloc
0x(ptrval)-0x(ptrval) 12288 start_kernel+0x4e0/0x690 pages=2 vmalloc
0x(ptrval)-0x(ptrval) 12288 start_kernel+0x4e0/0x690 pages=2 vmalloc
0x(ptrval)-0x(ptrval) 8192 _mpic_map_mmio.constprop.0+0x20/0x44 phys=0x80041000 ioremap
0x(ptrval)-0x(ptrval) 12288 _mpic_map_mmio.constprop.0+0x20/0x44 phys=0x80041000 ioremap
...
According to the documentation for /proc/sys/kernel/, %pK is
equivalent to %p when kptr_restrict is set to 0.
Fixes: 5ead723a20 ("lib/vsprintf: no_hash_pointers prints all addresses as unhashed")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/107476128e59bff11a309b5bf7579a1753a41aca.1645087605.git.christophe.leroy@csgroup.eu
Pull ITER_PIPE fix from Al Viro:
"Fix for old sloppiness in pipe_buffer reuse"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lib/iov_iter: initialize "flags" in new pipe_buffer
These explicit tracepoints aren't really used and show signs of aging.
It's work to keep them up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The functions copy_page_to_iter_pipe() and push_pipe() can both
allocate a new pipe_buffer, but the "flags" member initializer is
missing.
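A sketch consistent with the description (one of the two allocation
sites; the other is analogous):

    buf->ops = &page_cache_pipe_buf_ops;
    /* Don't inherit stale flags from the previous user of this slot. */
    buf->flags = 0;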
Fixes: 241699cd72 ("new iov_iter flavour: pipe-backed")
To: Alexander Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Resending this to properly add it to the patch tracker - thanks for letting
me know, Arnd :)
When ARM is enabled, and BITREVERSE is disabled,
Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for HAVE_ARCH_BITREVERSE
Depends on [n]: BITREVERSE [=n]
Selected by [y]:
- ARM [=y] && (CPU_32v7M [=n] || CPU_32v7 [=y]) && !CPU_32v6 [=n]
This is because ARM selects HAVE_ARCH_BITREVERSE
without selecting BITREVERSE, despite
HAVE_ARCH_BITREVERSE depending on BITREVERSE.
This unmet dependency bug was found by Kismet,
a static analysis tool for Kconfig. Please advise if this
is not the appropriate solution.
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
There have been cases where struct_size() (or flex_array_size()) needs
to be calculated for an initializer, which requires it be a constant
expression. This is possible when the "count" argument is a constant
expression, so provide this ability for the helpers.
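A hedged illustration (struct foo is hypothetical): with a constant
count, struct_size() now evaluates to a constant expression, so it can
appear in a file-scope initializer:

    struct foo {
            int count;
            int arr[];
    };

    static struct foo *p;
    /* Valid here because the count (8) is a constant expression. */
    static const size_t foo_size = struct_size(p, arr, 8);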
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/lkml/20220210010407.GA701603@embeddedor
In order to perform more open-coded replacements of common allocation
size arithmetic, the kernel needs saturating (SIZE_MAX) helpers for
multiplication, addition, and subtraction. For example, it is common in
allocators, especially on realloc, to add to an existing size:
    p = krealloc(map->patch,
                 sizeof(struct reg_sequence) * (map->patch_regs + num_regs),
                 GFP_KERNEL);
There is no existing saturating replacement for this calculation, and
just leaving the addition open coded inside array_size() could
potentially overflow as well. For example, an overflow in an expression
for a size_t argument might wrap to zero:
    array_size(anything, something_at_size_max + 1) == 0
Introduce size_mul(), size_add(), and size_sub() helpers that
implicitly promote arguments to size_t and saturated calculations for
use in allocations. With these helpers it is also possible to redefine
array_size(), array3_size(), flex_array_size(), and struct_size() in
terms of the new helpers.
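For instance, the krealloc() sizing above could be expressed with the
new helpers so that any overflow saturates to SIZE_MAX and the
allocation fails cleanly (a sketch, not the actual conversion):

    p = krealloc(map->patch,
                 size_mul(sizeof(struct reg_sequence),
                          size_add(map->patch_regs, num_regs)),
                 GFP_KERNEL);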
As with the check_*_overflow() helpers, the new helpers use __must_check,
though what is really desired is a way to make sure that assignment is
only to a size_t lvalue. Without this, it's still possible to introduce
overflow/underflow via type conversion (i.e. from size_t to int).
Enforcing this will currently need to be left to static analysis or
future use of -Wconversion.
Additionally update the overflow unit tests to force runtime evaluation
for the pathological cases.
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Len Baker <len.baker@gmx.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
The literals "big-endian" and "little-endian" may potentially occur in
other places. Dropping the space allows the linker to merge them into a
single copy.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220127181233.72910-2-andriy.shevchenko@linux.intel.com
The %p4cc specifier might in some cases get an unaligned pointer.
Because of this we need to copy the value to a local variable once, to
avoid potential crashes on some architectures due to improper access.
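A sketch of the described approach: read the value once through an
unaligned-safe accessor into an aligned local (variable names
illustrative):

    #include <asm/unaligned.h>

    /*
     * Copy the possibly-unaligned fourcc once; everything after this
     * works on the aligned local.
     */
    u32 val = get_unaligned(fourcc);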
Fixes: af612e43de ("lib/vsprintf: Add support for printing V4L2 and DRM fourccs")
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220127181233.72910-1-andriy.shevchenko@linux.intel.com
Since sbitmap_queue_get_shallow() was introduced in commit c05e667337
("sbitmap: add sbitmap_get_shallow() operation"), it has not been used.
Delete sbitmap_queue_get_shallow() and rename the public
__sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow(), as it is
odd to have a public __foo but no foo at all.
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1644322024-105340-1-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only the last sbitmap_word can have a different depth; all the others
must have the same depth of 1U << sb->shift. So it is not necessary to
store the depth in sbitmap_word: it can be retrieved easily and
efficiently by adding one internal helper, __map_depth(sb, index), as
sketched below.
Remove the 'depth' field from sbitmap_word; then the annotation of
____cacheline_aligned_in_smp for 'word' isn't needed any more.
No performance effect was seen when running a highly parallel IOPS test
on null_blk.
This saves us one cacheline (usually 64 bytes) per sbitmap_word.
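A sketch of the helper, mirroring the description above:

    /*
     * Every word except the last has depth 1U << sb->shift; the last
     * word gets the remainder.
     */
    static inline unsigned int __map_depth(const struct sbitmap *sb,
                                           int index)
    {
            if (index == sb->map_nr - 1)
                    return sb->depth - (index << sb->shift);
            return 1U << sb->shift;
    }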
Cc: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
[ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[ 0.000000][ T0] Hardware name: MT6873 (DT)
[ 0.000000][ T0] Call trace:
[ 0.000000][ T0] dump_backtrace+0xfc/0x1dc
[ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c
[ 0.000000][ T0] panic+0x194/0x464
[ 0.000000][ T0] __cfi_check_fail+0x54/0x58
[ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0
[ 0.000000][ T0] blake2s_update+0x14c/0x178
[ 0.000000][ T0] _extract_entropy+0xf4/0x29c
[ 0.000000][ T0] crng_initialize_primary+0x24/0x94
[ 0.000000][ T0] rand_initialize+0x2c/0x6c
[ 0.000000][ T0] start_kernel+0x2f8/0x65c
[ 0.000000][ T0] __primary_switched+0xc4/0x7be4
[ 0.000000][ T0] Rebooting in 5 seconds..
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
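A hedged sketch of the boolean dispatch (helper name illustrative; the
real patch threads the flag through the update/final helpers):

    static inline void blake2s_do_compress(struct blake2s_state *state,
                                           const u8 *block, size_t nblocks,
                                           u32 inc, bool force_generic)
    {
            /*
             * Direct calls on both paths: no indirect call through a
             * weak symbol, so Clang CFI has nothing to check.
             */
            if (force_generic)
                    blake2s_compress_generic(state, block, nblocks, inc);
            else
                    blake2s_compress(state, block, nblocks, inc);
    }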
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
Fixes: 6048fdcc5f ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
crc32c_le self test had a stray multiply by two inherited from
the crc32_le+crc32_be test loop.
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crc32_le and __crc32c_le can be overridden - extend this to crc32_be.
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Casts were added in commit 8f243af42a ("sections: fix const sections
for crc32 table") to cope with the tables not being const. They are no
longer required since commit f5e38b9284 ("lib: crc32: constify crc32
lookup table").
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks using __builtin_object_size(ptr); when these checks
fail, they panic the kernel.
Because the KASAN test deliberately performs out-of-bounds operations,
the kernel panics with FORTIFY_SOURCE, for example:
| kernel BUG at lib/string_helpers.c:910!
| invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
| CPU: 1 PID: 137 Comm: kunit_try_catch Tainted: G B 5.16.0-rc3+ #3
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:fortify_panic+0x19/0x1b
| ...
| Call Trace:
| kmalloc_oob_in_memset.cold+0x16/0x16
| ...
Fix it by also hiding `ptr` from the optimizer, which will ensure that
__builtin_object_size() does not return a valid size, preventing
fortified string functions from panicking.
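A sketch of the pattern (allocation and out-of-bounds size
illustrative):

    char *ptr = kmalloc(size, GFP_KERNEL);

    /*
     * Launder the pointer so __builtin_object_size(ptr) cannot resolve,
     * letting the deliberate out-of-bounds access reach KASAN instead
     * of fortify_panic().
     */
    OPTIMIZER_HIDE_VAR(ptr);
    memset(ptr, 0, size + 1);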
Link: https://lkml.kernel.org/r/20220124160744.1244685-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Nico Pache <npache@redhat.com>
Reviewed-by: Nico Pache <npache@redhat.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stand-alone implementation of the SM3 algorithm. It is designed to have
as few dependencies as possible. In other cases you should generally
use the hash APIs from include/crypto/hash.h, especially when hashing
large amounts of data, as those APIs may be hw-accelerated. In the new
SM3 stand-alone library, sm3_transform() has also been optimized,
instead of simply reusing the code from sm3_generic.
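A usage sketch of the stand-alone library (buffer setup elided):

    struct sm3_state sctx;
    u8 digest[SM3_DIGEST_SIZE];

    sm3_init(&sctx);
    sm3_update(&sctx, data, len);
    sm3_final(&sctx, digest);    /* 32-byte digest */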
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 180dccb0db ("blk-mq: fix tag_get wait task can't be awakened")
recalculates wake_batch when incrementing or decrementing active_queues
to avoid wake_batch > hctx_max_depth. At the same time, in order to
affect performance as little as possible, the minimum wakeup batch is
set to 4. But when the QD is small (such as QD=1), if incrementing or
decrementing active_queues increases the wakeup batch, that can lead to
a hang.
Fix this problem with the following strategies:
QD : >= 32 | < 32
---------------------------------
wakeup batch: 8~4 | 3~1
Fixes: 180dccb0db ("blk-mq: fix tag_get wait task can't be awakened")
Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#t
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
The non-interrupt portion of interrupt stack traces before interrupt
entry is usually arbitrary. Therefore, saving stack traces of
interrupts (that include entries before interrupt entry) to stack depot
leads to unbounded stackdepot growth.
As such, use of filter_irq_stacks() is a requirement to ensure
stackdepot can efficiently deduplicate interrupt stacks.
Looking through all current users of stack_depot_save(), none (except
KASAN) pass the stack trace through filter_irq_stacks() before passing
it on to stack_depot_save().
Rather than adding filter_irq_stacks() to all current users of
stack_depot_save(), it became clear that stack_depot_save() should
simply do filter_irq_stacks().
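The effect for callers, sketched (simplified): the explicit filtering
step becomes redundant:

    nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
    /*
     * No filter_irq_stacks() call needed here anymore:
     * stack_depot_save() now truncates entries below interrupt entry
     * itself before deduplicating.
     */
    handle = stack_depot_save(entries, nr, GFP_ATOMIC);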
Link: https://lkml.kernel.org/r/20211130095727.2378739-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be
allocated from memblock, even if stack depot ends up not actually used.
The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit.
This is fine for use-cases such as KASAN which is also a config option
and has overhead on its own. But it's an issue for functionality that
has to be actually enabled on boot (page_owner) or depends on hardware
(GPU drivers) and thus the memory might be wasted. This was raised as
an issue [1] when attempting to add stackdepot support for SLUB's debug
object tracking functionality. It's common to build kernels with
CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or
create only specific kmem caches with debugging for testing purposes.
It would thus be more efficient if stackdepot's table was allocated only
when actually going to be used. This patch thus makes the allocation
(and whole stack_depot_init() call) optional:
- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
well-defined point of allocation as part of mem_init(). Make
CONFIG_KASAN select this flag.
- Other users have to call stack_depot_init() as part of their own init
when it's determined that stack depot will actually be used. This may
depend on both config and runtime conditions. Convert current users
which are page_owner and several in the DRM subsystem. Same will be
done for SLUB later.
- Because the init might now be called after the boot-time memblock
allocation has given all memory to the buddy allocator, change
stack_depot_init() to allocate stack_table with kvmalloc() when
memblock is no longer available. Also handle allocation failure by
disabling stackdepot (could have theoretically happened even with
memblock allocation previously), and don't unnecessarily align the
memblock allocation to its own size anymore.
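A hedged sketch of the opt-in pattern from the second bullet above
(feature name illustrative):

    static int __init my_debug_feature_init(void)
    {
            if (!my_debug_feature_enabled)  /* runtime decision */
                    return 0;
            /*
             * Allocates stack_table now; falls back to kvmalloc() if
             * memblock has already handed memory to the buddy allocator.
             */
            stack_depot_init();
            return 0;
    }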
[1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/
Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Marco Elver <elver@google.com> # stackdepot
Cc: Marco Elver <elver@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
From: Colin Ian King <colin.king@canonical.com>
Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message
There is a spelling mistake of the word "allocation", so fix this and
re-phrase the message to make it easier to read.
Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup
On FLATMEM, we call page_ext_init_flatmem_late() just before
kmem_cache_init() which means stack_depot_init() (called by page owner
init) will not properly recognize that it should use kvmalloc() and not
memblock_alloc(). memblock_alloc() will also not issue a warning and
will return a block of memory that can be invalid and cause a kernel
page fault when saving stacks, as reported by the kernel test robot [1].
Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so
that slab_is_available() is true during stack_depot_init(). SPARSEMEM
doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(),
but a different page_ext_init() even later in the boot process.
Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue.
While at it, also actually resolve a checkpatch warning in stack_depot_init()
from DRM CI, which was supposed to be in the original patch already.
[1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3
Due to cd06ab2fd4 ("drm/locking: add backtrace for locking contended
locks without backoff") landing recently to -next adding a new stack depot
user in drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate
call to stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Marco Elver <elver@google.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Oliver Glitta <glittao@gmail.com>
Cc: Imran Khan <imran.f.khan@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
From: Vlastimil Babka <vbabka@suse.cz>
Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4
Due to 4e66934eaa ("lib: add reference counting tracking
infrastructure") landing recently to net-next adding a new stack depot
user in lib/ref_tracker.c we need to add an appropriate call to
stack_depot_init() there as well.
Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Slab <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>