android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Todd Kjos	f27fc6ba23	Merge "Merge tag 'android14-6.1.68_r00' into branch 'android14-6.1'" into android14-6.1	2024-01-24 17:34:59 +00:00
Dylan Yudaken	ced65a053b	UPSTREAM: io_uring: use call_rcu_hurry if signaling an eventfd io_uring uses call_rcu in the case it needs to signal an eventfd as a result of an eventfd signal, since recursing eventfd signals are not allowed. This should be calling the new call_rcu_hurry API to not delay the signal. Signed-off-by: Dylan Yudaken <dylany@meta.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Paul E. McKenney <paulmck@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20221215184138.795576-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit 44a84da45272b3f4beb90025a64cfbde18f1aef0) Bug: 258241771 Signed-off-by: Joel Fernandes <joelaf@google.com> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/4909038 Reviewed-by: Vineeth Pillai <vineethrp@google.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Sean Paul <sean@poorly.run> Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Iec189c9ce0a95ccacda81f58bf7d49a575a6ab3f	2024-01-19 00:10:44 +00:00
Pavel Begunkov	bfe5a5e2f9	io_uring: fix mutex_unlock with unreferenced ctx commit f7b32e785042d2357c5abc23ca6db1b92c91a070 upstream. Callers of mutex_unlock() have to make sure that the mutex stays alive for the whole duration of the function call. For io_uring that means that the following pattern is not valid unless we ensure that the context outlives the mutex_unlock() call. mutex_lock(&ctx->uring_lock); req_put(req); // typically via io_req_task_submit() mutex_unlock(&ctx->uring_lock); Most contexts are fine: io-wq pins requests, syscalls hold the file, task works are taking ctx references and so on. However, the task work fallback path doesn't follow the rule. Cc: <stable@vger.kernel.org> Fixes: `04fc6c802d` ("io_uring: save ctx put/get for task_work submit") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/io-uring/CAG48ez3xSoYb+45f1RLtktROJrpiDQ1otNvdR+YLQf7m+Krj5Q@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-12-13 18:39:18 +01:00
Pavel Begunkov	9704cfcf1f	io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used From: Jens Axboe <axboe@kernel.dk> [ upstream commit ebdfefc09c6de7897962769bd3e63a2ff443ebf5 ] If we setup the ring with SQPOLL, then that polling thread has its own io-wq setup. This means that if the application uses IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be setting it for the invoking task, but rather the sqpoll task. Add an sqpoll helper that parks the thread and updates the affinity, and use that one if we're using SQPOLL. Fixes: `fe76421d1d` ("io_uring: allow user configurable IO thread CPU affinity") Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/discussions/884 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-19 12:27:54 +02:00
Pavel Begunkov	605d055452	io_uring: break out of iowq iopoll on teardown [ upstream commit 45500dc4e01c167ee063f3dcc22f51ced5b2b1e9 ] io-wq will retry iopoll even when it failed with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all its workers, such workers might potentially infinitely spin retrying iopoll again and again and each time failing on some allocation / waiting / etc. Don't keep spinning if io-wq is dying. Fixes: `561fb04a6a` ("io_uring: replace workqueue usage with io-wq") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-19 12:27:54 +02:00
Pavel Begunkov	8fa9cb5844	io_uring: break iopolling on signal commit dc314886cb3d0e4ab2858003e8de2917f8a3ccbd upstream. Don't keep spinning iopoll with a signal set. It'll eventually return back, e.g. by virtue of need_resched(), but it's not a nice user experience. Cc: stable@vger.kernel.org Fixes: `def596e955` ("io_uring: support for IO polling") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/eeba551e82cad12af30c3220125eb6cb244cc94c.1691594339.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-13 09:43:00 +02:00
Pavel Begunkov	0c323430e4	io_uring: fix drain stalls by invalid SQE [ Upstream commit cfdbaa3a291d6fd2cb4a1a70d74e63b4abc2f5ec ] cq_extra is protected by ->completion_lock, which io_get_sqe() misses. The bug is harmless as it doesn't happen in real life, requires invalid SQ index array and racing with submission, and only messes up the userspace, i.e. stall requests execution but will be cleaned up on ring destruction. Fixes: `15641e4270` ("io_uring: don't cache number of dropped SQEs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66096d54651b1a60534bb2023f2947f09f50ef73.1691538547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-09-13 09:42:43 +02:00
Helge Deller	5d54040e9d	io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc Vidra Jonas reported issues on parisc with libuv which then triggers build errors with cmake. Debugging shows that those issues stem from io_uring(). I was not able to easily pull in upstream commits directly, so here is IMHO the least invasive manual backport of the following upstream commits to fix the cache aliasing issues on parisc on kernel 6.1 with io_uring: 56675f8b9f9b ("io_uring/parisc: Adjust pgoff in io_uring mmap() for parisc") 32832a407a71 ("io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()") d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements") With this patch kernel 6.1 has all relevant mmap changes and is identical to kernel 6.5 with regard to mmap() in io_uring. Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: Vidra.Jonas@seznam.cz Link: https://lore.kernel.org/linux-parisc/520.NvTX.6mXZpmfh4Ju.1awpAS@seznam.cz/ Cc: Sam James <sam@gentoo.org> Cc: John David Anglin <dave.anglin@bell.net> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-09-02 09:16:19 +02:00
Jens Axboe	c7920f9928	io_uring: gate iowait schedule on having pending requests Commit 7b72d661f1f2f950ab8c12de7e2bc48bdac8ed69 upstream. A previous commit made all cqring waits marked as iowait, as a way to improve performance for short schedules with pending IO. However, for use cases that have a special reaper thread that does nothing but wait on events on the ring, this causes a cosmetic issue where we now have one core marked as being "busy" with 100% iowait. While this isn't a grave issue, it is confusing to users. Rather than always mark us as being in iowait, gate setting of current->in_iowait to 1 by whether or not the waiting task has pending requests. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/io-uring/CAMEGJJ2RxopfNQ7GNLhr7X9=bHXKo+G5OOe0LUq=+UgLXsv1Xg@mail.gmail.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=217699 Link: https://bugzilla.kernel.org/show_bug.cgi?id=217700 Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reported-by: Phil Elwell <phil@raspberrypi.com> Tested-by: Andres Freund <andres@anarazel.de> Fixes: 8a796565cec3 ("io_uring: Use io_schedule* in cqring wait") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-08-11 12:08:08 +02:00
Ondrej Mosnacek	04f7d49174	io_uring: don't audit the capability check in io_uring_create() [ Upstream commit 6adc2272aaaf84f34b652cf77f770c6fcc4b8336 ] The check being unconditional may lead to unwanted denials reported by LSMs when a process has the capability granted by DAC, but denied by an LSM. In the case of SELinux such denials are a problem, since they can't be effectively filtered out via the policy and when not silenced, they produce noise that may hide a true problem or an attack. Since not having the capability merely means that the created io_uring context will be accounted against the current user's RLIMIT_MEMLOCK limit, we can disable auditing of denials for this check by using ns_capable_noaudit() instead of capable(). Fixes: `2b188cc1bb` ("Add io_uring IO interface") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2193317 Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/20230718115607.65652-1-omosnace@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-03 10:23:48 +02:00
Jens Axboe	1b87f546a0	io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq commit a9be202269580ca611c6cebac90eaf1795497800 upstream. io-wq assumes that an issue is blocking, but it may not be if the request type has asked for a non-blocking attempt. If we get -EAGAIN for that case, then we need to treat it as a final result and not retry or arm poll for it. Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/issues/897 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-27 08:50:23 +02:00
Andres Freund	f32dfc802e	io_uring: Use io_schedule* in cqring wait Commit 8a796565cec3601071cbbd27d6304e202019d014 upstream. I observed poor performance of io_uring compared to synchronous IO. That turns out to be caused by deeper CPU idle states entered with io_uring, due to io_uring using plain schedule(), whereas synchronous IO uses io_schedule(). The losses due to this are substantial. On my cascade lake workstation, t/io_uring from the fio repository e.g. yields regressions between 20% and 40% with the following command: ./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0 This is repeatable with different filesystems, using raw block devices and using different block devices. Use io_schedule_prepare() / io_schedule_finish() in io_cqring_wait_schedule() to address the difference. After that using io_uring is on par or surpassing synchronous IO (using registered files etc makes it reliably win, but arguably is a less fair comparison). There are other calls to schedule() in io_uring/, but none immediately jump out to be similarly situated, so I did not touch them. Similarly, it's possible that mutex_lock_io() should be used, but it's not clear if there are cases where that matters. Cc: stable@vger.kernel.org # 5.10+ Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Andres Freund <andres@anarazel.de> Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de [axboe: minor style fixup] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:18 +02:00
Jens Axboe	b50d6e06cc	io_uring: wait interruptibly for request completions on exit commit 4826c59453b3b4677d6bf72814e7ababdea86949 upstream. When the ring exits, cleanup is done and the final cancelation and waiting on completions is done by io_ring_exit_work. That function is invoked by kworker, which doesn't take any signals. Because of that, it doesn't really matter if we wait for completions in TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state. However, it does matter to the hung task detection checker! Normally we expect cancelations and completions to happen rather quickly. Some test cases, however, will exit the ring and park the owning task stopped (eg via SIGSTOP). If the owning task needs to run task_work to complete requests, then io_ring_exit_work won't make any progress until the task is runnable again. Hence io_ring_exit_work can trigger the hung task detection, which is particularly problematic if panic-on-hung-task is enabled. As the ring exit doesn't take signals to begin with, have it wait interruptibly rather than uninterruptibly. io_uring has a separate stuck-exit warning that triggers independently anyway, so we're not really missing anything by making this switch. Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/r/b0e4aaef-7088-56ce-244c-976edeac0e66@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-07-19 16:22:09 +02:00
Wojciech Lukowicz	ac48787f58	io_uring: fix memory leak when removing provided buffers [ Upstream commit b4a72c0589fdea6259720375426179888969d6a2 ] When removing provided buffers, io_buffer structs are not being disposed of, leading to a memory leak. They can't be freed individually, because they are allocated in page-sized groups. They need to be added to some free list instead, such as io_buffers_cache. All callers already hold the lock protecting it, apart from when destroying buffers, so had to extend the lock there. Fixes: `cc3cec8367` ("io_uring: speedup provided buffer handling") Signed-off-by: Wojciech Lukowicz <wlukowicz01@gmail.com> Link: https://lore.kernel.org/r/20230401195039.404909-2-wlukowicz01@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-04-13 16:55:31 +02:00
Joseph Qi	75a499fc9d	io_uring: fix fget leak when fs don't support nowait buffered read commit 54aa7f2330b82884f4a1afce0220add6e8312f8b upstream. Heming reported a BUG when using io_uring doing link-cp on ocfs2. [1] Do the following steps can reproduce this BUG: mount -t ocfs2 /dev/vdc /mnt/ocfs2 cp testfile /mnt/ocfs2/ ./link-cp /mnt/ocfs2/testfile /mnt/ocfs2/testfile.1 umount /mnt/ocfs2 Then umount will fail, and it outputs: umount: /mnt/ocfs2: target is busy. While tracing umount, it blames mnt_get_count() not return as expected. Do a deep investigation for fget()/fput() on related code flow, I've finally found that fget() leaks since ocfs2 doesn't support nowait buffered read. io_issue_sqe |-io_assign_file // do fget() first |-io_read |-io_iter_do_read |-ocfs2_file_read_iter // return -EOPNOTSUPP |-kiocb_done |-io_rw_done |-__io_complete_rw_common // set REQ_F_REISSUE |-io_resubmit_prep |-io_req_prep_async // override req->file, leak happens This was introduced by commit `a196c78b54` in v5.18. Fix it by don't re-assign req->file if it has already been assigned. [1] https://lore.kernel.org/ocfs2-devel/ab580a75-91c8-d68a-3455-40361be1bfa8@linux.alibaba.com/T/#t Fixes: `a196c78b54` ("io_uring: assign non-fixed early for async work") Cc: <stable@vger.kernel.org> Reported-by: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/20230228045459.13524-1-joseph.qi@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:34:04 +01:00
Jens Axboe	d9399e5430	io_uring: add reschedule point to handle_tw_list() commit f58680085478dd292435727210122960d38e8014 upstream. If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we could be running into issues blocking others for too long. Add a reschedule check in handle_tw_list(), and flush the ctx if we need to reschedule. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:34:03 +01:00
Jens Axboe	42fe280e26	io_uring: add a conditional reschedule to the IOPOLL cancelation loop commit fcc926bb857949dbfa51a7d95f3f5ebc657f198c upstream. If the kernel is configured with CONFIG_PREEMPT_NONE, we could be sitting in a tight loop reaping events but not giving them a chance to finish. This results in a trace ala: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-...!: (5249 ticks this GP) idle=935c/1/0x4000000000000000 softirq=4265/4274 fqs=1 (t=5251 jiffies g=465 q=4135 ncpus=4) rcu: rcu_sched kthread starved for 5249 jiffies! g465 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_sched state:R running task stack:0 pid:12 ppid:2 flags:0x00000008 Call trace: __switch_to+0xb0/0xc8 __schedule+0x43c/0x520 schedule+0x4c/0x98 schedule_timeout+0xbc/0xdc rcu_gp_fqs_loop+0x308/0x344 rcu_gp_kthread+0xd8/0xf0 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 rcu: Stack dump where RCU GP kthread last ran: Task dump for CPU 0: task:kworker/u8:10 state:R running task stack:0 pid:89 ppid:2 flags:0x0000000a Workqueue: events_unbound io_ring_exit_work Call trace: __switch_to+0xb0/0xc8 0xffff0000c8fefd28 CPU: 2 PID: 95 Comm: kworker/u8:13 Not tainted 6.2.0-rc5-00042-g40316e337c80-dirty #2759 Hardware name: linux,dummy-virt (DT) Workqueue: events_unbound io_ring_exit_work pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : io_do_iopoll+0x344/0x360 lr : io_do_iopoll+0xb8/0x360 sp : ffff800009bebc60 x29: ffff800009bebc60 x28: 0000000000000000 x27: 0000000000000000 x26: ffff0000c0f67d48 x25: ffff0000c0f67840 x24: ffff800008950024 x23: 0000000000000001 x22: 0000000000000000 x21: ffff0000c27d3200 x20: ffff0000c0f67840 x19: ffff0000c0f67800 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 x11: 0000000000000179 x10: 0000000000000870 x9 : ffff800009bebd60 x8 : ffff0000c27d3ad0 x7 : fefefefefefefeff x6 : 0000646e756f626e x5 : ffff0000c0f67840 x4 : 0000000000000000 x3 : ffff0000c2398000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: io_do_iopoll+0x344/0x360 io_uring_try_cancel_requests+0x21c/0x334 io_ring_exit_work+0x90/0x40c process_one_work+0x1a4/0x254 worker_thread+0x1ec/0x258 kthread+0xb8/0xc8 ret_from_fork+0x10/0x20 Add a cond_resched() in the cancelation IOPOLL loop to fix this. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:34:03 +01:00
Pavel Begunkov	c981a035df	io_uring: use user visible tail in io_uring_poll() commit c10bb64684813a326174c3eebcafb3ee5af52ca3 upstream. We return POLLIN from io_uring_poll() depending on whether there are CQEs for the userspace, and so we should use the user visible tail pointer instead of a transient cached value. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/228ffcbf30ba98856f66ffdb9a6a60ead1dd96c0.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-10 09:34:03 +01:00
Dylan Yudaken	5e6618e28b	io_uring: always prep_async for drain requests [ Upstream commit ef5c600adb1d985513d2b612cc90403a148ff287 ] Drain requests all go through io_drain_req, which has a quick exit in case there is nothing pending (ie the drain is not useful). In that case it can issue the request immediately. However for safety it queues it through task work. The problem is that in this case the request is run asynchronously, but the async work has not been prepared through io_req_prep_async. This has not been a problem up to now, as the task work always would run before returning to userspace, and so the user would not have a chance to race with it. However - with IORING_SETUP_DEFER_TASKRUN - this is no longer the case and the work might be deferred, giving userspace a chance to change data being referred to in the request. Instead _always_ prep_async for drain requests, which is simpler anyway and removes this issue. Cc: stable@vger.kernel.org Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127105911.2420061-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:42 +01:00
Pavel Begunkov	1ce29f7444	io_uring: inline __io_req_complete_put() [ Upstream commit fa18fa2272c7469e470dcb7bf838ea50a25494ca ] Inline __io_req_complete_put() into io_req_complete_post(), there are no other users. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1923a4dfe80fa877f859a22ed3df2d5fc8ecf02b.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: ef5c600adb1d ("io_uring: always prep_async for drain requests") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:41 +01:00
Pavel Begunkov	85224a3f89	io_uring: remove io_req_tw_post_queue [ Upstream commit 833b5dfffc26c81835ce38e2a5df9ac5fa142735 ] Remove io_req_tw_post() and io_req_tw_post_queue(), we can use io_req_task_complete() instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b9b73c08022c7f1457023ac841f35c0100e70345.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: ef5c600adb1d ("io_uring: always prep_async for drain requests") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:41 +01:00
Pavel Begunkov	0c4d95ad21	io_uring: hold locks for io_req_complete_failed [ Upstream commit e276ae344a770f91912a81c6a338d92efd319be2 ] A preparation patch, make sure we always hold uring_lock around io_req_complete_failed(). The only place deviating from the rule is io_cancel_defer_files(), queue a tw instead. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/70760344eadaecf2939287084b9d4ba5c05a6984.1669203009.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: ef5c600adb1d ("io_uring: always prep_async for drain requests") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:41 +01:00
Pavel Begunkov	6dbb84c7c2	io_uring: inline __io_req_complete_post() [ Upstream commit f9d567c75ec216447f36da6e855500023504fa04 ] There is only one user of __io_req_complete_post(), inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef4c9059950a3da5cf68df00f977f1fd13bd9306.1668597569.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: ef5c600adb1d ("io_uring: always prep_async for drain requests") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:41 +01:00
Pavel Begunkov	bfd4d18840	io_uring: inline io_req_task_work_add() [ Upstream commit e52d2e583e4ad1d5d0b804d79c2b8752eb0e5ceb ] __io_req_task_work_add() is huge but marked inline, that makes compilers to generate lots of garbage. Inline the wrapper caller io_req_task_work_add() instead. before and after: text data bss dec hex filename 47347 16248 8 63603 f873 io_uring/io_uring.o text data bss dec hex filename 45303 16248 8 61559 f077 io_uring/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26dc8c28ca0160e3269ef3e55c5a8b917c4d4450.1668162751.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: ef5c600adb1d ("io_uring: always prep_async for drain requests") Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-02-01 08:34:41 +01:00
Pavel Begunkov	e595dcd987	io_uring: fix CQ waiting timeout handling commit 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 upstream. Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in particular we rearm it anew every time we get into io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2 CQEs and getting a task_work in the middle may double the timeout value, or even worse in some cases task may wait indefinitely. Cc: stable@vger.kernel.org Fixes: `228339662b` ("io_uring: don't convert to jiffies for waiting on timeouts") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.1672915663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-12 12:02:48 +01:00
Pavel Begunkov	91d1295199	io_uring: pin context while queueing deferred tw commit 9ffa13ff78a0a55df968a72d6f0ebffccee5c9f4 upstream. Unlike normal tw, nothing prevents deferred tw from being executed right after a tw item is added to ->work_llist in io_req_local_work_add(). For instance, the waiting task may get woken up by CQ posting or a normal tw. Thus we need to pin the ring for the rest of io_req_local_work_add(). Cc: stable@vger.kernel.org Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1a79362b9c10b8523ef70b061d96523650a23344.1672795998.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-01-12 12:02:48 +01:00
Jens Axboe	8b2de52126	io_uring: check for valid register opcode earlier [ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ] We only check the register opcode value inside the restricted ring section, move it into the main io_uring_register() function instead and check it up front. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-12 12:02:41 +01:00
Jens Axboe	ff46a46b7a	io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups [ Upstream commit 4464853277d0ccdb9914608dd1332f0fa2f9846f ] Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related wakeups, so that we can check for a circular event dependency between eventfd and epoll. If this flag is set when our wakeup handlers are called, then we know we have a dependency that needs to terminate multishot requests. eventfd and epoll are the only such possible dependencies. Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-01-04 11:28:48 +01:00
Pavel Begunkov	f6a5cada4f	io_uring: dont remove file from msg_ring reqs commit ef0ec1ad03119b8b46b035dad42bca7d6da7c2e5 upstream. We should not be messing with req->file outside of core paths. Clearing it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the request on failed io_double_lock_ctx() but clearly was originally intended to do retries instead. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e5ac9edadb574fe33f6d727cb8f14ce68262a684.1670384893.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-12-31 13:33:12 +01:00
Harshit Mogalapalli	998b30c394	io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Syzkaller reports a NULL deref bug as follows: BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3 Read of size 4 at addr 0000000000000138 by task file1/1955 CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ? io_tctx_exit_cb+0x53/0xd3 kasan_report+0xbb/0x1f0 ? io_tctx_exit_cb+0x53/0xd3 kasan_check_range+0x140/0x190 io_tctx_exit_cb+0x53/0xd3 task_work_run+0x164/0x250 ? task_work_cancel+0x30/0x30 get_signal+0x1c3/0x2440 ? lock_downgrade+0x6e0/0x6e0 ? lock_downgrade+0x6e0/0x6e0 ? exit_signals+0x8b0/0x8b0 ? do_raw_read_unlock+0x3b/0x70 ? do_raw_spin_unlock+0x50/0x230 arch_do_signal_or_restart+0x82/0x2470 ? kmem_cache_free+0x260/0x4b0 ? putname+0xfe/0x140 ? get_sigframe_size+0x10/0x10 ? do_execveat_common.isra.0+0x226/0x710 ? lockdep_hardirqs_on+0x79/0x100 ? putname+0xfe/0x140 ? do_execveat_common.isra.0+0x238/0x710 exit_to_user_mode_prepare+0x15f/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0023:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: panic_on_warn set ... This happens because the adding of task_work from io_ring_exit_work() isn't synchronized with canceling all work items from eg exec. The execution of the two are ordered in that they are both run by the task itself, but if io_tctx_exit_cb() is queued while we're canceling all work items off exec AND gets executed when the task exits to userspace rather than in the main loop in io_uring_cancel_generic(), then we can find current->io_uring == NULL and hit the above crash. It's safe to add this NULL check here, because the execution of the two paths are done by the task itself. Cc: stable@vger.kernel.org Fixes: `d56d938b4b` ("io_uring: do ctx initiated file note removal") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com [axboe: add code comment and also put an explanation in the commit msg] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-12-07 06:45:20 -07:00
Pavel Begunkov	9148286476	io_uring: fix multishot accept request leaks Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: `390ed29b5e` ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-17 12:33:33 -07:00
Dylan Yudaken	0fc8c2acbf	io_uring: calculate CQEs from the user visible value io_cqring_wait (and its wake function io_has_work) used cached_cq_tail in order to calculate the number of CQEs. cached_cq_tail is set strictly before the user visible rings->cq.tail. However as far as userspace is concerned, if io_uring_enter(2) is called with a minimum number of events, they will verify by checking rings->cq.tail. It is therefore possible for io_uring_enter(2) to return early with fewer events visible to the user. Instead make the wait functions read from the user visible value, so there will be no discrepancy. This is triggered eventually by the following reproducer: struct io_uring_sqe sqe; struct io_uring_cqe cqe; unsigned int cqe_ready; struct io_uring ring; int ret, i; ret = io_uring_queue_init(N, &ring, 0); assert(!ret); while(true) { for (i = 0; i < N; i++) { sqe = io_uring_get_sqe(&ring); io_uring_prep_nop(sqe); sqe->flags |= IOSQE_ASYNC; } ret = io_uring_submit(&ring); assert(ret == N); do { ret = io_uring_wait_cqes(&ring, &cqe, N, NULL, NULL); } while(ret == -EINTR); cqe_ready = io_uring_cq_ready(&ring); assert(!ret); assert(cqe_ready == N); io_uring_cq_advance(&ring, N); } Fixes: `ad3eb2c89f` ("io_uring: split overflow state into SQ and CQ side") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221108153016.1854297-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-11-08 10:36:15 -07:00
Dylan Yudaken	b3026767e1	io_uring: unlock if __io_run_local_work locked inside It is possible for tw to lock the ring, and this was not propogated out to io_run_local_work. This can cause an unlock to be missed. Instead pass a pointer to locked into __io_run_local_work. Fixes: `8ac5d85a89` ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-27 09:52:12 -06:00
Dylan Yudaken	8de11cdc96	io_uring: use io_run_local_work_locked helper prefer to use io_run_local_work_locked helper for consistency Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-27 09:51:47 -06:00
Pavel Begunkov	02bac94bd8	io_uring: don't iopoll from io_ring_ctx_wait_and_kill() We should not be completing requests from a task context that has already undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are some assumptions, e.g. around cached task refs draining. Remove iopolling from io_ring_ctx_wait_and_kill() as it can be called later after PF_EXITING is set with the last task_work run. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-16 17:08:42 -06:00
Pavel Begunkov	34f0bc427e	io_uring: reuse io_alloc_req() Don't duplicate io_alloc_req() in io_req_caches_free() but reuse the helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6005fc88274864a49fc3096c22d8bdd605cf8576.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-16 17:08:42 -06:00
Pavel Begunkov	4d5059512d	io_uring: kill hot path fixed file bitmap debug checks We test file_table.bitmap in io_file_get_fixed() to check invariants; don't do it, as it's expensive and was showing up in profiles. No reports of this triggering have come in. Move the check to the file clear instead, which will still catch any wrong usage. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cf77f2ded68d2e5b2bc7355784d969837d48e023.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-16 17:07:53 -06:00
Pavel Begunkov	38eddb2c75	io_uring: remove FFS_SCM The lifetime of SCM'ed files is bound to ring_sock, which is destroyed strictly after we're done with registered file tables. This means there is no need for the FFS_SCM hack, which was not available on 32-bit builds anyway. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/984226a1045adf42dc35d8bd7fb5a8bbfa472ce1.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-16 17:07:12 -06:00
Pavel Begunkov	44f87745d5	io_uring: optimise locking for local tw with submit_wait Running local task_work requires taking uring_lock, for submit + wait we can try to run them right after submit while we still hold the lock and save one lock/unlock pair. The optimisation was implemented in the first local tw patches but got dropped for simplicity. Suggested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/281fc79d98b5d91fe4778c5137a17a2ab4693e5c.1665088876.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-12 16:30:56 -06:00
Pavel Begunkov	fc86f9d3bb	io_uring: remove redundant memory barrier in io_req_local_work_add io_cqring_wake() needs a barrier for the waitqueue_active() check. However, in the case of io_req_local_work_add(), we call llist_add() first, which implies an atomic. Hence we can replace smp_mb() with smp_mb__after_atomic(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43983bc8bc507172adda7a0f00cab1aff09fd238.1665018309.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-12 16:30:56 -06:00
Pavel Begunkov	b7a817752e	io_uring: remove notif leftovers Notifications were killed but there are a couple of fields and struct declarations left, remove them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-12 16:30:56 -06:00
Pavel Begunkov	42b6419d0a	io_uring: correct pinned_vm accounting ->mm_account should be released only after we free all registered buffers, otherwise __io_sqe_buffers_unregister() will see a NULL ->mm_account and skip locked_vm accounting. Cc: <Stable@vger.kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d798f65ed4ab8db3664c4d3397d4af16ca98846.1664849932.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-12 16:30:56 -06:00
Pavel Begunkov	d7cce96c44	io_uring: limit registration w/ SINGLE_ISSUER IORING_SETUP_SINGLE_ISSUER restricts what tasks can submit requests. Extend it to registration as well, so non-owning task can't do registrations. It's not necessary at the moment but might be useful in the future. Cc: <stable@vger.kernel.org> # 6.0 Fixes: `97bbdc06a4` ("io_uring: add IORING_SETUP_SINGLE_ISSUER") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f52a6a9c8a8990d4a831f73c0571e7406aac2bba.1664237592.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-07 12:25:30 -06:00
Dylan Yudaken	97c96e9fa3	io_uring: simplify __io_uring_add_tctx_node Remove submitter parameter from __io_uring_add_tctx_node. It was only called from one place, and we can do that logic in that one place. Signed-off-by: Dylan Yudaken <dylany@fb.com> Fixes: `97bbdc06a4` ("io_uring: add IORING_SETUP_SINGLE_ISSUER") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-10-07 12:25:30 -06:00
Linus Torvalds	513389809e	for-6.1/block-2022-10-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmM67XkQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiHoD/9eN+6YnNRPu5+2zeGnnm1Nlwic6YMZeORr KFIeC0COMWoFhNBIPFkgAKT+0qIH+uGt5UsHSM3Y5La7wMR8yLxD4PAnvTZ/Ijtt yxVIOmonJoQ0OrQ2kTbvDXL/9OCUrzwXXyUIEPJnH0Ca1mxeNOgDHbE7VGF6DMul 0D3pI8qs2WLnHlDi1V/8kH5qZ6WoAJSDcb8sTzOUVnyveZPNaZhGQJuHA2XAYMtg fqKMDJqgmNk6jdTMUgdF5B+rV64PQoCy28I7fXqGkEe+RE5TBy57vAa0XY84V8XR /a8CEuwMts2ypk1hIcJG8Vv8K6u5war9yPM5MTngKsoMpzNIlhrhaJQVyjKdcs+E Ixwzexu6xTYcrcq+mUARgeTh79FzTBM/uXEdbCG2G3S6HPd6UZWUJZGfxw/l0Aem V4xB7lj6SQaJDU1iJCYUaHcekNXhQAPvyVG+R2ED1SO3McTpTPIM1aeigxw6vj7u bH3Kfdr94Z8HNuoLuiS6YYfjNt2Shf4LEB6GxKJ9TYHtyhdOyO0H64jGHpygrWqN cSnkWPUqUUNpF7srKM0ZgbliCshvmyJc4aMOFd0gBY/kXf5J/j7IXvh8TFCi9rHH 0KyZH3/3Zsu9geUn3ynznlr4FXU+BcqE6boaa/iWb9sN1m+Rvaahv8cSch/dh44a vQNj/iOBQA== =R05e -----END PGP SIGNATURE----- Merge tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - handle number of queue changes in the TCP and RDMA drivers (Daniel Wagner) - allow changing the number of queues in nvmet (Daniel Wagner) - also consider host_iface when checking ip options (Daniel Wagner) - don't map pages which can't come from HIGHMEM (Fabio M. De Francesco) - avoid unnecessary flush bios in nvmet (Guixin Liu) - shrink and better pack the nvme_iod structure (Keith Busch) - add comment for unaligned "fake" nqn (Linjun Bao) - print actual source IP address through sysfs "address" attr (Martin Belanger) - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang) - handle effects after freeing the request (Keith Busch) - copy firmware_rev on each init (Keith Busch) - restrict management ioctls to admin (Keith Busch) - ensure subsystem reset is single threaded (Keith Busch) - report the actual number of tagset maps in nvme-pci (Keith Busch) - small fabrics authentication fixups (Christoph Hellwig) - add common code for tagset allocation and freeing (Christoph Hellwig) - stop using the request_queue in nvmet (Christoph Hellwig) - set min_align_mask before calculating max_hw_sectors (Rishabh Bhatnagar) - send a rediscover uevent when a persistent discovery controller reconnects (Sagi Grimberg) - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi) - MD pull request via Song: - Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan. - Raid10 performance optimization, by Yu Kuai. - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu) - IO scheduler switching quiesce fix (Keith) - s390/dasd block driver updates (Stefan) - support for recovery for the ublk driver (ZiyangZhang) - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph) - blk-mq and null_blk map fixes (Bart) - various bcache fixes (Coly, Jilin, Jules) - nbd signal hang fix (Shigeru) - block writeback throttling fix (Yu) - optimize the passthrough mapping handling (me) - prepare block cgroups to being gendisk based (Christoph) - get rid of an old PSI hack in the block layer, moving it to the callers instead where it belongs (Christoph) - blk-throttle fixes and cleanups (Yu) - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj, Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming, Miaohe, Bart, Coly, Gaosheng) * tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits) sbitmap: fix lockup while swapping block: add rationale for not using blk_mq_plug() when applicable block: adapt blk_mq_plug() to not plug for writes that require a zone lock s390/dasd: use blk_mq_alloc_disk blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep nvmet: don't look at the request_queue in nvmet_bdev_set_limits nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all blk-mq: use quiesced elevator switch when reinitializing queues block: replace blk_queue_nowait with bdev_nowait nvme: remove nvme_ctrl_init_connect_q nvme-loop: use the tagset alloc/free helpers nvme-loop: store the generic nvme_ctrl in set->driver_data nvme-loop: initialize sqsize later nvme-fc: use the tagset alloc/free helpers nvme-fc: store the generic nvme_ctrl in set->driver_data nvme-fc: keep ctrl->sqsize in sync with opts->queue_size nvme-rdma: use the tagset alloc/free helpers nvme-rdma: store the generic nvme_ctrl in set->driver_data nvme-tcp: use the tagset alloc/free helpers nvme-tcp: store the generic nvme_ctrl in set->driver_data ...	2022-10-07 09:19:14 -07:00
Linus Torvalds	0a78a376ef	for-6.1/io_uring-2022-10-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmM67S0QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgppnPEACkBzilBLKwT9MWdUAITwyrMXsAa1R9gsR9 Tb3Xs+mNO2meuycLAUh4LIbb28NNr7/S5rwWet5NRZ71hgv4Q/WA/0EemAGGXYqd +3MEBAWU3FBFkC/cJXCnT8F5yCXYRkT5n/hzCSNEpNKjQ5JnAhHDlWAjgzZRuD/A A+YJjoBVJJuI1wY4I5XCpeQXEmg/Wc1MgXfyHgWVtGKnYrrxibiCnBZnqbAMZNvD hGn1Vl02ooamGTFm/nW/OAt71DtqsjWUCVOHKmlZ+zBUjbUj6FMXmPVV7vCV9o2w PT4Dx3CTc2iXwa8KfEFNPvXBzy0Qfu8edweP/MvZHWHVZREpEAh4cG6GhwW8whD+ 5mPisqmRjZKe0BBS4k/wKN1RXEypSQoTU4EdljfbQPU/usn35lmjMmEXXgs3IhqM fcTdO5ZUOp+CGyzI0Bc7UtS8vilJbX9ynN8G80MUUAZzuQg39MH7lNQYSJSSvJfU OlvzmL3lhRLYM1s/KKiZzdDBoMvC7R4oHmzCveOjQTMIHf6WNyqKFlrWScq2wzpN oRxqt0xiVQ3PFMmFj6N08f145qtbASuF3sKv7dbU3QXTsCAos3wdTdX+PejYApEZ W3dr0TDjNBicLNVPiSj132p0ZRtdZvLGuGVkBD4GPQeH2NwswxMHQAfz8e2lqmA4 9bWG6BM7Yw== =m9kX -----END PGP SIGNATURE----- Merge tag 'for-6.1/io_uring-2022-10-03' of git://git.kernel.dk/linux Pull io_uring updates from Jens Axboe: - Add support for more directly managed task_work running. This is beneficial for real world applications that end up issuing lots of system calls as part of handling work. Normal task_work will always execute as we transition in and out of the kernel, even for "unrelated" system calls. It's more efficient to defer the handling of io_uring's deferred work until the application wants it to be run, generally in batches. As part of ongoing work to write an io_uring network backend for Thrift, this has been shown to greatly improve performance. (Dylan) - Add IOPOLL support for passthrough (Kanchan) - Improvements and fixes to the send zero-copy support (Pavel) - Partial IO handling fixes (Pavel) - CQE ordering fixes around CQ ring overflow (Pavel) - Support sendto() for non-zc as well (Pavel) - Support sendmsg for zerocopy (Pavel) - Networking iov_iter fix (Stefan) - Misc fixes and cleanups (Pavel, me) * tag 'for-6.1/io_uring-2022-10-03' of git://git.kernel.dk/linux: (56 commits) io_uring/net: fix notif cqe reordering io_uring/net: don't update msg_name if not provided io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL io_uring/rw: defer fsnotify calls to task context io_uring/net: fix fast_iov assignment in io_setup_async_msg() io_uring/net: fix non-zc send with address io_uring/net: don't skip notifs for failed requests io_uring/rw: don't lose short results on io_setup_async_rw() io_uring/rw: fix unexpected link breakage io_uring/net: fix cleanup double free free_iov init io_uring: fix CQE reordering io_uring/net: fix UAF in io_sendrecv_fail() selftest/net: adjust io_uring sendzc notif handling io_uring: ensure local task_work marks task as running io_uring/net: zerocopy sendmsg io_uring/net: combine fail handlers io_uring/net: rename io_sendzc() io_uring/net: support non-zerocopy sendto io_uring/net: refactor io_setup_async_addr io_uring/net: don't lose partial send_zc on fail ...	2022-10-07 08:52:43 -07:00
Christoph Hellwig	568ec936bf	block: replace blk_queue_nowait with bdev_nowait Replace blk_queue_nowait with a bdev_nowait helpers that takes the block_device given that the I/O submission path should not have to look into the request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20220927075815.269694-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-27 09:57:58 -06:00
Dylan Yudaken	7cae596bc3	io_uring: register single issuer task at creation Instead of picking the task from the first submitter task, rather use the creator task or in the case of disabled (IORING_SETUP_R_DISABLED) the enabling task. This approach allows a lot of simplification of the logic here. This removes init logic from the submission path, which can always be a bit confusing, but also removes the need for locking to write (or read) the submitter_task. Users that want to move a ring before submitting can create the ring disabled and then enable it on the submitting task. Signed-off-by: Dylan Yudaken <dylany@fb.com> Fixes: `97bbdc06a4` ("io_uring: add IORING_SETUP_SINGLE_ISSUER") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-26 11:26:18 -06:00
Jens Axboe	e775f93f2a	io_uring: ensure that cached task references are always put on exit io_uring caches task references to avoid doing atomics for each of them per request. If a request is put from the same task that allocated it, then we can maintain a per-ctx cache of them. This obviously relies on io_uring always pruning caches in a reliable way, and there's currently a case of io_uring fd release where we can miss that. One example is a ring setup with IOPOLL, which relies on the task polling for completions, which will free them. However, if such a task submits a request and then exits or closes the ring without reaping the completion, then ring release will reap and put. If release happens from that very same task, the completed request task refs will get put back into the cache pool. This is problematic, as we're now beyond the point of pruning caches. Manually drop these caches after doing an IOPOLL reap. This releases references from the current task, which is enough. If another task happens to be doing the release, then the caching will not be triggered and there's no issue. Cc: stable@vger.kernel.org Fixes: `e98e49b2bb` ("io_uring: extend task put optimisations") Reported-by: Homin Rhee <hominlab@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-23 18:51:08 -06:00
Pavel Begunkov	aa1df3a360	io_uring: fix CQE reordering Overflowing CQEs may result in reordering, which is buggy in case of links, F_MORE and so on. If we guarantee that we don't reorder for the unlikely event of a CQ ring overflow, then we can further extend this to not have to terminate multishot requests if it happens. For other operations, like zerocopy sends, we have no choice but to honor CQE ordering. Reported-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ec3bc55687b0768bbe20fb62d7d06cfced7d7e70.1663892031.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-23 15:04:20 -06:00
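
Note on 0fc8c2acbf ("io_uring: calculate CQEs from the user visible value"): the discrepancy that commit describes can be illustrated with a small userspace model. The sketch below is not kernel code. The struct `cq_model` and the helpers `cq_count`, `wait_done_cached` and `wait_done_visible` are hypothetical names invented for illustration. It assumes only what the commit message states: the kernel-private cached tail can be ahead of the tail published to userspace, so a wait check based on the cached value may be satisfied before userspace can see that many CQEs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a completion queue's indices.
 * This is NOT the kernel's struct io_rings / io_ring_ctx layout. */
struct cq_model {
	uint32_t head;         /* consumer index, advanced by userspace */
	uint32_t visible_tail; /* producer index as published to userspace */
	uint32_t cached_tail;  /* producer-private index, may run ahead */
};

/* Number of entries between head and tail. Free-running unsigned
 * indices make the subtraction correct across wraparound. */
static inline uint32_t cq_count(uint32_t tail, uint32_t head)
{
	return tail - head;
}

/* Wait condition evaluated on the private value: may report "done"
 * while userspace still sees fewer than min_events entries. */
static inline int wait_done_cached(const struct cq_model *cq, uint32_t min_events)
{
	return cq_count(cq->cached_tail, cq->head) >= min_events;
}

/* Wait condition evaluated on the value userspace will actually read. */
static inline int wait_done_visible(const struct cq_model *cq, uint32_t min_events)
{
	return cq_count(cq->visible_tail, cq->head) >= min_events;
}
```

With `head = 0`, `visible_tail = 2` and `cached_tail = 4`, a wait for 4 events is considered complete by `wait_done_cached` but not by `wait_done_visible`. That mirrors the failing `assert(cqe_ready == N)` in the commit's reproducer.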

182 Commits