From 03a793945396f4c6af71a4b3e1e44189236afc90 Mon Sep 17 00:00:00 2001 From: Daniel J Blueman Date: Fri, 19 Apr 2024 16:51:46 +0800 Subject: [PATCH 0001/1565] x86/tsc: Trust initial offset in architectural TSC-adjust MSRs commit 455f9075f14484f358b3c1d6845b4a438de198a7 upstream. When the BIOS configures the architectural TSC-adjust MSRs on secondary sockets to correct a constant inter-chassis offset, after Linux brings the cores online, the TSC sync check later resets the core-local MSR to 0, triggering HPET fallback and leading to performance loss. Fix this by unconditionally using the initial adjust values read from the MSRs. Trusting the initial offsets in this architectural mechanism is a better approach than special-casing workarounds for specific platforms. Signed-off-by: Daniel J Blueman Signed-off-by: Thomas Gleixner Reviewed-by: Steffen Persvold Reviewed-by: James Cleverdon Reviewed-by: Dimitri Sivanich Reviewed-by: Prarit Bhargava Link: https://lore.kernel.org/r/20240419085146.175665-1-daniel@quora.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/tsc_sync.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c index 923660057f36..a6f357774697 100644 --- a/arch/x86/kernel/tsc_sync.c +++ b/arch/x86/kernel/tsc_sync.c @@ -192,11 +192,9 @@ bool tsc_store_and_check_tsc_adjust(bool bootcpu) cur->warned = false; /* - * If a non-zero TSC value for socket 0 may be valid then the default - * adjusted value cannot assumed to be zero either. + * The default adjust value cannot be assumed to be zero on any socket. */ - if (tsc_async_resets) - cur->adjusted = bootval; + cur->adjusted = bootval; /* * Check whether this CPU is the first in a package to come up. In From 0fb736c9931e02dbc7d9a75044c8e1c039e50f04 Mon Sep 17 00:00:00 2001 From: Daniel Starke Date: Wed, 24 Apr 2024 07:48:41 +0200 Subject: [PATCH 0002/1565] tty: n_gsm: fix possible out-of-bounds in gsm0_receive() commit 47388e807f85948eefc403a8a5fdc5b406a65d5a upstream. Assuming the following: - side A configures the n_gsm in basic option mode - side B sends the header of a basic option mode frame with data length 1 - side A switches to advanced option mode - side B sends 2 data bytes which exceeds gsm->len Reason: gsm->len is not used in advanced option mode. - side A switches to basic option mode - side B keeps sending until gsm0_receive() writes past gsm->buf Reason: Neither gsm->state nor gsm->len have been reset after reconfiguration. Fix this by changing gsm->count to gsm->len comparison from equal to less than. Also add upper limit checks against the constant MAX_MRU in gsm0_receive() and gsm1_receive() to harden against memory corruption of gsm->len and gsm->mru. All other checks remain as we still need to limit the data according to the user configuration and actual payload size. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218708 Tested-by: j51569436@gmail.com Fixes: e1eaea46bb40 ("tty: n_gsm line discipline") Cc: stable@vger.kernel.org Signed-off-by: Daniel Starke Link: https://lore.kernel.org/r/20240424054842.7741-1-daniel.starke@siemens.com Signed-off-by: Greg Kroah-Hartman --- drivers/tty/n_gsm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c index c20f69a4c5e9..0a367fa23c27 100644 --- a/drivers/tty/n_gsm.c +++ b/drivers/tty/n_gsm.c @@ -2082,8 +2082,12 @@ static void gsm0_receive(struct gsm_mux *gsm, unsigned char c) break; case GSM_DATA: /* Data */ gsm->buf[gsm->count++] = c; - if (gsm->count == gsm->len) + if (gsm->count >= MAX_MRU) { + gsm->bad_size++; + gsm->state = GSM_SEARCH; + } else if (gsm->count >= gsm->len) { gsm->state = GSM_FCS; + } break; case GSM_FCS: /* FCS follows the packet */ gsm->received_fcs = c; @@ -2176,7 +2180,7 @@ static void gsm1_receive(struct gsm_mux *gsm, unsigned char c) gsm->state = GSM_DATA; break; case GSM_DATA: /* Data */ - if (gsm->count > gsm->mru) { /* Allow one for the FCS */ + if (gsm->count > gsm->mru || gsm->count > MAX_MRU) { /* Allow one for the FCS */ gsm->state = GSM_OVERRUN; gsm->bad_size++; } else From 07ef95cc7a579731198c93beed281e3a79a0e586 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 15 Apr 2024 14:02:23 +0300 Subject: [PATCH 0003/1565] speakup: Fix sizeof() vs ARRAY_SIZE() bug commit 008ab3c53bc4f0b2f20013c8f6c204a3203d0b8b upstream. The "buf" pointer is an array of u16 values. This code should be using ARRAY_SIZE() (which is 256) instead of sizeof() (which is 512), otherwise it can the still got out of bounds. Fixes: c8d2f34ea96e ("speakup: Avoid crash on very long word") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter Reviewed-by: Samuel Thibault Link: https://lore.kernel.org/r/d16f67d2-fd0a-4d45-adac-75ddd11001aa@moroto.mountain Signed-off-by: Greg Kroah-Hartman --- drivers/accessibility/speakup/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/accessibility/speakup/main.c b/drivers/accessibility/speakup/main.c index 7598778cf37f..60c059c74fb6 100644 --- a/drivers/accessibility/speakup/main.c +++ b/drivers/accessibility/speakup/main.c @@ -576,7 +576,7 @@ static u_long get_word(struct vc_data *vc) } attr_ch = get_char(vc, (u_short *)tmp_pos, &spk_attr); buf[cnt++] = attr_ch; - while (tmpx < vc->vc_cols - 1 && cnt < sizeof(buf) - 1) { + while (tmpx < vc->vc_cols - 1 && cnt < ARRAY_SIZE(buf) - 1) { tmp_pos += 2; tmpx++; ch = get_char(vc, (u_short *)tmp_pos, &temp); From 1e160196042cac946798ac192a0bc3398f1aa66b Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Fri, 17 May 2024 15:40:08 +0200 Subject: [PATCH 0004/1565] ring-buffer: Fix a race between readers and resize checks commit c2274b908db05529980ec056359fae916939fdaa upstream. The reader code in rb_get_reader_page() swaps a new reader page into the ring buffer by doing cmpxchg on old->list.prev->next to point it to the new page. Following that, if the operation is successful, old->list.next->prev gets updated too. This means the underlying doubly-linked list is temporarily inconsistent, page->prev->next or page->next->prev might not be equal back to page for some page in the ring buffer. The resize operation in ring_buffer_resize() can be invoked in parallel. It calls rb_check_pages() which can detect the described inconsistency and stop further tracing: [ 190.271762] ------------[ cut here ]------------ [ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0 [ 190.271789] Modules linked in: [...] [ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1 [ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f [ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 [ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0 [ 190.272023] Code: [...] [ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206 [ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80 [ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700 [ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000 [ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720 [ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000 [ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000 [ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0 [ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 190.272077] Call Trace: [ 190.272098] [ 190.272189] ring_buffer_resize+0x2ab/0x460 [ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0 [ 190.272206] tracing_resize_ring_buffer+0x65/0x90 [ 190.272216] tracing_entries_write+0x74/0xc0 [ 190.272225] vfs_write+0xf5/0x420 [ 190.272248] ksys_write+0x67/0xe0 [ 190.272256] do_syscall_64+0x82/0x170 [ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 190.272373] RIP: 0033:0x7f1bd657d263 [ 190.272381] Code: [...] [ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263 [ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001 [ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000 [ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500 [ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002 [ 190.272412] [ 190.272414] ---[ end trace 0000000000000000 ]--- Note that ring_buffer_resize() calls rb_check_pages() only if the parent trace_buffer has recording disabled. Recent commit d78ab792705c ("tracing: Stop current tracer when resizing buffer") causes that it is now always the case which makes it more likely to experience this issue. The window to hit this race is nonetheless very small. To help reproducing it, one can add a delay loop in rb_get_reader_page(): ret = rb_head_page_replace(reader, cpu_buffer->reader_page); if (!ret) goto spin; for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */ __asm__ __volatile__ ("" : : : "memory"); rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list; .. and then run the following commands on the target system: echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable while true; do echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 done & while true; do for i in /sys/kernel/tracing/per_cpu/*; do timeout 0.1 cat $i/trace_pipe; sleep 0.2 done done To fix the problem, make sure ring_buffer_resize() doesn't invoke rb_check_pages() concurrently with a reader operating on the same ring_buffer_per_cpu by taking its cpu_buffer->reader_lock. Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read") Signed-off-by: Petr Pavlu [ Fixed whitespace ] Signed-off-by: Steven Rostedt (Google) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ring_buffer.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 2df8e13a29e5..9a2c8727b033 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1498,6 +1498,11 @@ static int rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer, * * As a safety measure we check to make sure the data pages have not * been corrupted. + * + * Callers of this function need to guarantee that the list of pages doesn't get + * modified during the check. In particular, if it's possible that the function + * is invoked with concurrent readers which can swap in a new reader page then + * the caller should take cpu_buffer->reader_lock. */ static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) { @@ -2222,8 +2227,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size, */ synchronize_rcu(); for_each_buffer_cpu(buffer, cpu) { + unsigned long flags; + cpu_buffer = buffer->buffers[cpu]; + raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); rb_check_pages(cpu_buffer); + raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); } atomic_dec(&buffer->record_disabled); } From 0d0ecd841f3f7218a211271855917aab3eb4beb0 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Fri, 10 May 2024 13:30:55 +0200 Subject: [PATCH 0005/1565] net: smc91x: Fix m68k kernel compilation for ColdFire CPU MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5eefb477d21a26183bc3499aeefa991198315a2d upstream. Compiling the m68k kernel with support for the ColdFire CPU family fails with the following error: In file included from drivers/net/ethernet/smsc/smc91x.c:80: drivers/net/ethernet/smsc/smc91x.c: In function ‘smc_reset’: drivers/net/ethernet/smsc/smc91x.h:160:40: error: implicit declaration of function ‘_swapw’; did you mean ‘swap’? [-Werror=implicit-function-declaration] 160 | #define SMC_outw(lp, v, a, r) writew(_swapw(v), (a) + (r)) | ^~~~~~ drivers/net/ethernet/smsc/smc91x.h:904:25: note: in expansion of macro ‘SMC_outw’ 904 | SMC_outw(lp, x, ioaddr, BANK_SELECT); \ | ^~~~~~~~ drivers/net/ethernet/smsc/smc91x.c:250:9: note: in expansion of macro ‘SMC_SELECT_BANK’ 250 | SMC_SELECT_BANK(lp, 2); | ^~~~~~~~~~~~~~~ cc1: some warnings being treated as errors The function _swapw() was removed in commit d97cf70af097 ("m68k: use asm-generic/io.h for non-MMU io access functions"), but is still used in drivers/net/ethernet/smsc/smc91x.h. Use ioread16be() and iowrite16be() to resolve the error. Cc: stable@vger.kernel.org Fixes: d97cf70af097 ("m68k: use asm-generic/io.h for non-MMU io access functions") Signed-off-by: Thorsten Blum Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20240510113054.186648-2-thorsten.blum@toblux.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/smsc/smc91x.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h index 387539a8094b..95e9204ce827 100644 --- a/drivers/net/ethernet/smsc/smc91x.h +++ b/drivers/net/ethernet/smsc/smc91x.h @@ -175,8 +175,8 @@ static inline void mcf_outsw(void *a, unsigned char *p, int l) writew(*wp++, a); } -#define SMC_inw(a, r) _swapw(readw((a) + (r))) -#define SMC_outw(lp, v, a, r) writew(_swapw(v), (a) + (r)) +#define SMC_inw(a, r) ioread16be((a) + (r)) +#define SMC_outw(lp, v, a, r) iowrite16be(v, (a) + (r)) #define SMC_insw(a, r, p, l) mcf_insw(a + r, p, l) #define SMC_outsw(a, r, p, l) mcf_outsw(a + r, p, l) From 89e07418a68604a4c16fc40e7bf85af600eafb9b Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Mon, 20 May 2024 22:26:20 +0900 Subject: [PATCH 0006/1565] nilfs2: fix unexpected freezing of nilfs_segctor_sync() commit 936184eadd82906992ff1f5ab3aada70cce44cee upstream. A potential and reproducible race issue has been identified where nilfs_segctor_sync() would block even after the log writer thread writes a checkpoint, unless there is an interrupt or other trigger to resume log writing. This turned out to be because, depending on the execution timing of the log writer thread running in parallel, the log writer thread may skip responding to nilfs_segctor_sync(), which causes a call to schedule() waiting for completion within nilfs_segctor_sync() to lose the opportunity to wake up. The reason why waking up the task waiting in nilfs_segctor_sync() may be skipped is that updating the request generation issued using a shared sequence counter and adding an wait queue entry to the request wait queue to the log writer, are not done atomically. There is a possibility that log writing and request completion notification by nilfs_segctor_wakeup() may occur between the two operations, and in that case, the wait queue entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of nilfs_segctor_sync() will be carried over until the next request occurs. Fix this issue by performing these two operations simultaneously within the lock section of sc_state_lock. Also, following the memory barrier guidelines for event waiting loops, move the call to set_current_state() in the same location into the event waiting loop to ensure that a memory barrier is inserted just before the event condition determination. Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Cc: Cc: "Bai, Shuangpeng" Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/segment.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index be0ca35b8aa4..c409f95b0052 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2212,19 +2212,28 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci) struct nilfs_segctor_wait_request wait_req; int err = 0; - spin_lock(&sci->sc_state_lock); init_wait(&wait_req.wq); wait_req.err = 0; atomic_set(&wait_req.done, 0); + init_waitqueue_entry(&wait_req.wq, current); + + /* + * To prevent a race issue where completion notifications from the + * log writer thread are missed, increment the request sequence count + * "sc_seq_request" and insert a wait queue entry using the current + * sequence number into the "sc_wait_request" queue at the same time + * within the lock section of "sc_state_lock". + */ + spin_lock(&sci->sc_state_lock); wait_req.seq = ++sci->sc_seq_request; + add_wait_queue(&sci->sc_wait_request, &wait_req.wq); spin_unlock(&sci->sc_state_lock); - init_waitqueue_entry(&wait_req.wq, current); - add_wait_queue(&sci->sc_wait_request, &wait_req.wq); - set_current_state(TASK_INTERRUPTIBLE); wake_up(&sci->sc_wait_daemon); for (;;) { + set_current_state(TASK_INTERRUPTIBLE); + if (atomic_read(&wait_req.done)) { err = wait_req.err; break; From eff7cdf890b02596b8d73e910bdbdd489175dbdb Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Mon, 20 May 2024 22:26:21 +0900 Subject: [PATCH 0007/1565] nilfs2: fix potential hang in nilfs_detach_log_writer() commit eb85dace897c5986bc2f36b3c783c6abb8a4292e upstream. Syzbot has reported a potential hang in nilfs_detach_log_writer() called during nilfs2 unmount. Analysis revealed that this is because nilfs_segctor_sync(), which synchronizes with the log writer thread, can be called after nilfs_segctor_destroy() terminates that thread, as shown in the call trace below: nilfs_detach_log_writer nilfs_segctor_destroy nilfs_segctor_kill_thread --> Shut down log writer thread flush_work nilfs_iput_work_func nilfs_dispose_list iput nilfs_evict_inode nilfs_transaction_commit nilfs_construct_segment (if inode needs sync) nilfs_segctor_sync --> Attempt to synchronize with log writer thread *** DEADLOCK *** Fix this issue by changing nilfs_segctor_sync() so that the log writer thread returns normally without synchronizing after it terminates, and by forcing tasks that are already waiting to complete once after the thread terminates. The skipped inode metadata flushout will then be processed together in the subsequent cleanup work in nilfs_segctor_destroy(). Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0 Tested-by: Ryusuke Konishi Cc: Cc: "Bai, Shuangpeng" Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/segment.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index c409f95b0052..3997f377f0e3 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2234,6 +2234,14 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci) for (;;) { set_current_state(TASK_INTERRUPTIBLE); + /* + * Synchronize only while the log writer thread is alive. + * Leave flushing out after the log writer thread exits to + * the cleanup work in nilfs_segctor_destroy(). + */ + if (!sci->sc_task) + break; + if (atomic_read(&wait_req.done)) { err = wait_req.err; break; @@ -2249,7 +2257,7 @@ static int nilfs_segctor_sync(struct nilfs_sc_info *sci) return err; } -static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err) +static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err, bool force) { struct nilfs_segctor_wait_request *wrq, *n; unsigned long flags; @@ -2257,7 +2265,7 @@ static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err) spin_lock_irqsave(&sci->sc_wait_request.lock, flags); list_for_each_entry_safe(wrq, n, &sci->sc_wait_request.head, wq.entry) { if (!atomic_read(&wrq->done) && - nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq)) { + (force || nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq))) { wrq->err = err; atomic_set(&wrq->done, 1); } @@ -2397,7 +2405,7 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci, int mode, int err) if (mode == SC_LSEG_SR) { sci->sc_state &= ~NILFS_SEGCTOR_COMMIT; sci->sc_seq_done = sci->sc_seq_accepted; - nilfs_segctor_wakeup(sci, err); + nilfs_segctor_wakeup(sci, err, false); sci->sc_flush_request = 0; } else { if (mode == SC_FLUSH_FILE) @@ -2779,6 +2787,13 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci) || sci->sc_seq_request != sci->sc_seq_done); spin_unlock(&sci->sc_state_lock); + /* + * Forcibly wake up tasks waiting in nilfs_segctor_sync(), which can + * be called from delayed iput() via nilfs_evict_inode() and can race + * with the above log writer thread termination. + */ + nilfs_segctor_wakeup(sci, 0, true); + if (flush_work(&sci->sc_iput_work)) flag = true; From d7ff29a429b56f04783152ad7bbd7233b740e434 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 22 May 2024 09:04:39 +0200 Subject: [PATCH 0008/1565] ALSA: core: Fix NULL module pointer assignment at card init commit 39381fe7394e5eafac76e7e9367e7351138a29c1 upstream. The commit 81033c6b584b ("ALSA: core: Warn on empty module") introduced a WARN_ON() for a NULL module pointer passed at snd_card object creation, and it also wraps the code around it with '#ifdef MODULE'. This works in most cases, but the devils are always in details. "MODULE" is defined when the target code (i.e. the sound core) is built as a module; but this doesn't mean that the caller is also built-in or not. Namely, when only the sound core is built-in (CONFIG_SND=y) while the driver is a module (CONFIG_SND_USB_AUDIO=m), the passed module pointer is ignored even if it's non-NULL, and card->module remains as NULL. This would result in the missing module reference up/down at the device open/close, leading to a race with the code execution after the module removal. For addressing the bug, move the assignment of card->module again out of ifdef. The WARN_ON() is still wrapped with ifdef because the module can be really NULL when all sound drivers are built-in. Note that we keep 'ifdef MODULE' for WARN_ON(), otherwise it would lead to a false-positive NULL module check. Admittedly it won't catch perfectly, i.e. no check is performed when CONFIG_SND=y. But, it's no real problem as it's only for debugging, and the condition is pretty rare. Fixes: 81033c6b584b ("ALSA: core: Warn on empty module") Reported-by: Xu Yang Closes: https://lore.kernel.org/r/20240520170349.2417900-1-xu.yang_2@nxp.com Cc: Signed-off-by: Takashi Iwai Tested-by: Xu Yang Link: https://lore.kernel.org/r/20240522070442.17786-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman --- sound/core/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/core/init.c b/sound/core/init.c index 9f5270c90a10..b6dd43005c27 100644 --- a/sound/core/init.c +++ b/sound/core/init.c @@ -206,8 +206,8 @@ int snd_card_new(struct device *parent, int idx, const char *xid, card->number = idx; #ifdef MODULE WARN_ON(!module); - card->module = module; #endif + card->module = module; INIT_LIST_HEAD(&card->devices); init_rwsem(&card->controls_rwsem); rwlock_init(&card->ctl_files_rwlock); From c7d22022ece9a9d6624e97ddd96a0f2a6a2ea045 Mon Sep 17 00:00:00 2001 From: Igor Artemiev Date: Fri, 5 Apr 2024 18:24:30 +0300 Subject: [PATCH 0009/1565] wifi: cfg80211: fix the order of arguments for trace events of the tx_rx_evt class [ Upstream commit 9ef369973cd2c97cce3388d2c0c7e3c056656e8a ] The declarations of the tx_rx_evt class and the rdev_set_antenna event use the wrong order of arguments in the TP_ARGS macro. Fix the order of arguments in the TP_ARGS macro. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Igor Artemiev Link: https://msgid.link/20240405152431.270267-1-Igor.A.Artemiev@mcst.ru Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/wireless/trace.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/wireless/trace.h b/net/wireless/trace.h index edc824c103e8..06e81d1efc92 100644 --- a/net/wireless/trace.h +++ b/net/wireless/trace.h @@ -1660,7 +1660,7 @@ TRACE_EVENT(rdev_return_void_tx_rx, DECLARE_EVENT_CLASS(tx_rx_evt, TP_PROTO(struct wiphy *wiphy, u32 tx, u32 rx), - TP_ARGS(wiphy, rx, tx), + TP_ARGS(wiphy, tx, rx), TP_STRUCT__entry( WIPHY_ENTRY __field(u32, tx) @@ -1677,7 +1677,7 @@ DECLARE_EVENT_CLASS(tx_rx_evt, DEFINE_EVENT(tx_rx_evt, rdev_set_antenna, TP_PROTO(struct wiphy *wiphy, u32 tx, u32 rx), - TP_ARGS(wiphy, rx, tx) + TP_ARGS(wiphy, tx, rx) ); DECLARE_EVENT_CLASS(wiphy_netdev_id_evt, From c393ce8157a694e1e73f27823354aa1ec740020f Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Thu, 18 Apr 2024 13:12:07 +0200 Subject: [PATCH 0010/1565] net: usb: qmi_wwan: add Telit FN920C04 compositions [ Upstream commit 0b8fe5bd73249dc20be2e88a12041f8920797b59 ] Add the following Telit FN920C04 compositions: 0x10a0: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag) T: Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 5 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10a0 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=92c4c4d8 C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x10a4: rmnet + tty (AT) + tty (AT) + tty (diag) T: Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 8 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10a4 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=92c4c4d8 C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x10a9: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb T: Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 9 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=10a9 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN920 S: SerialNumber=92c4c4d8 C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none) E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Daniele Palmas Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/usb/qmi_wwan.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index be2761d0bcd9..4dd1a9fb4c8a 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1301,6 +1301,9 @@ static const struct usb_device_id products[] = { {QMI_QUIRK_SET_DTR(0x1bc7, 0x1060, 2)}, /* Telit LN920 */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1070, 2)}, /* Telit FN990 */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1080, 2)}, /* Telit FE990 */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x10a0, 0)}, /* Telit FN920C04 */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x10a4, 0)}, /* Telit FN920C04 */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x10a9, 0)}, /* Telit FN920C04 */ {QMI_FIXED_INTF(0x1bc7, 0x1100, 3)}, /* Telit ME910 */ {QMI_FIXED_INTF(0x1bc7, 0x1101, 3)}, /* Telit ME910 dual modem */ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */ From c9c742eaa5fb44d605f367abc33b486da62a9dde Mon Sep 17 00:00:00 2001 From: Joshua Ashton Date: Thu, 2 Nov 2023 04:21:55 +0000 Subject: [PATCH 0011/1565] drm/amd/display: Set color_mgmt_changed to true on unsuspend [ Upstream commit 2eb9dd497a698dc384c0dd3e0311d541eb2e13dd ] Otherwise we can end up with a frame on unsuspend where color management is not applied when userspace has not committed themselves. Fixes re-applying color management on Steam Deck/Gamescope on S3 resume. Signed-off-by: Joshua Ashton Reviewed-by: Harry Wentland Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 3578e3b3536e..29ef0ed44d5f 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2099,6 +2099,7 @@ static int dm_resume(void *handle) dc_stream_release(dm_new_crtc_state->stream); dm_new_crtc_state->stream = NULL; } + dm_new_crtc_state->base.color_mgmt_changed = true; } for_each_new_plane_in_state(dm->cached_state, plane, new_plane_state, i) { From 6cd625926e264539760302aeaafdb520f9fe983a Mon Sep 17 00:00:00 2001 From: Derek Fang Date: Mon, 8 Apr 2024 17:10:56 +0800 Subject: [PATCH 0012/1565] ASoC: rt5645: Fix the electric noise due to the CBJ contacts floating [ Upstream commit 103abab975087e1f01b76fcb54c91dbb65dbc249 ] The codec leaves tie combo jack's sleeve/ring2 to floating status default. It would cause electric noise while connecting the active speaker jack during boot or shutdown. This patch requests a gpio to control the additional jack circuit to tie the contacts to the ground or floating. Signed-off-by: Derek Fang Link: https://msgid.link/r/20240408091057.14165-1-derek.fang@realtek.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/rt5645.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/sound/soc/codecs/rt5645.c b/sound/soc/codecs/rt5645.c index 5db63ef33f1a..ac403827a229 100644 --- a/sound/soc/codecs/rt5645.c +++ b/sound/soc/codecs/rt5645.c @@ -414,6 +414,7 @@ struct rt5645_priv { struct regmap *regmap; struct i2c_client *i2c; struct gpio_desc *gpiod_hp_det; + struct gpio_desc *gpiod_cbj_sleeve; struct snd_soc_jack *hp_jack; struct snd_soc_jack *mic_jack; struct snd_soc_jack *btn_jack; @@ -3169,6 +3170,9 @@ static int rt5645_jack_detect(struct snd_soc_component *component, int jack_inse regmap_update_bits(rt5645->regmap, RT5645_IN1_CTRL2, RT5645_CBJ_MN_JD, 0); + if (rt5645->gpiod_cbj_sleeve) + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 1); + msleep(600); regmap_read(rt5645->regmap, RT5645_IN1_CTRL3, &val); val &= 0x7; @@ -3185,6 +3189,8 @@ static int rt5645_jack_detect(struct snd_soc_component *component, int jack_inse snd_soc_dapm_disable_pin(dapm, "Mic Det Power"); snd_soc_dapm_sync(dapm); rt5645->jack_type = SND_JACK_HEADPHONE; + if (rt5645->gpiod_cbj_sleeve) + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); } if (rt5645->pdata.level_trigger_irq) regmap_update_bits(rt5645->regmap, RT5645_IRQ_CTRL2, @@ -3210,6 +3216,9 @@ static int rt5645_jack_detect(struct snd_soc_component *component, int jack_inse if (rt5645->pdata.level_trigger_irq) regmap_update_bits(rt5645->regmap, RT5645_IRQ_CTRL2, RT5645_JD_1_1_MASK, RT5645_JD_1_1_INV); + + if (rt5645->gpiod_cbj_sleeve) + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); } return rt5645->jack_type; @@ -3882,6 +3891,16 @@ static int rt5645_i2c_probe(struct i2c_client *i2c, return ret; } + rt5645->gpiod_cbj_sleeve = devm_gpiod_get_optional(&i2c->dev, "cbj-sleeve", + GPIOD_OUT_LOW); + + if (IS_ERR(rt5645->gpiod_cbj_sleeve)) { + ret = PTR_ERR(rt5645->gpiod_cbj_sleeve); + dev_info(&i2c->dev, "failed to initialize gpiod, ret=%d\n", ret); + if (ret != -ENOENT) + return ret; + } + for (i = 0; i < ARRAY_SIZE(rt5645->supplies); i++) rt5645->supplies[i].supply = rt5645_supply_names[i]; @@ -4125,6 +4144,9 @@ static int rt5645_i2c_remove(struct i2c_client *i2c) cancel_delayed_work_sync(&rt5645->jack_detect_work); cancel_delayed_work_sync(&rt5645->rcclock_work); + if (rt5645->gpiod_cbj_sleeve) + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); + regulator_bulk_disable(ARRAY_SIZE(rt5645->supplies), rt5645->supplies); return 0; @@ -4142,6 +4164,9 @@ static void rt5645_i2c_shutdown(struct i2c_client *i2c) 0); msleep(20); regmap_write(rt5645->regmap, RT5645_RESET, 0); + + if (rt5645->gpiod_cbj_sleeve) + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); } static struct i2c_driver rt5645_i2c_driver = { From fcc54151a9ff0b1e3cd5877783f84bcad53b9335 Mon Sep 17 00:00:00 2001 From: Derek Fang Date: Mon, 8 Apr 2024 17:10:57 +0800 Subject: [PATCH 0013/1565] ASoC: dt-bindings: rt5645: add cbj sleeve gpio property [ Upstream commit 306b38e3fa727d22454a148a364123709e356600 ] Add an optional gpio property to control external CBJ circuits to avoid some electric noise caused by sleeve/ring2 contacts floating. Signed-off-by: Derek Fang Link: https://msgid.link/r/20240408091057.14165-2-derek.fang@realtek.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- Documentation/devicetree/bindings/sound/rt5645.txt | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Documentation/devicetree/bindings/sound/rt5645.txt b/Documentation/devicetree/bindings/sound/rt5645.txt index 41a62fd2ae1f..c1fa379f5f3e 100644 --- a/Documentation/devicetree/bindings/sound/rt5645.txt +++ b/Documentation/devicetree/bindings/sound/rt5645.txt @@ -20,6 +20,11 @@ Optional properties: a GPIO spec for the external headphone detect pin. If jd-mode = 0, we will get the JD status by getting the value of hp-detect-gpios. +- cbj-sleeve-gpios: + a GPIO spec to control the external combo jack circuit to tie the sleeve/ring2 + contacts to the ground or floating. It could avoid some electric noise from the + active speaker jacks. + - realtek,in2-differential Boolean. Indicate MIC2 input are differential, rather than single-ended. @@ -68,6 +73,7 @@ codec: rt5650@1a { compatible = "realtek,rt5650"; reg = <0x1a>; hp-detect-gpios = <&gpio 19 0>; + cbj-sleeve-gpios = <&gpio 20 0>; interrupt-parent = <&gpio>; interrupts = <7 IRQ_TYPE_EDGE_FALLING>; realtek,dmic-en = "true"; From 0c71bfad148361b3054eb750b7a72d053c8c3a33 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 10 Apr 2024 19:26:15 +0200 Subject: [PATCH 0014/1565] regulator: vqmmc-ipq4019: fix module autoloading [ Upstream commit 68adb581a39ae63a0ed082c47f01fbbe515efa0e ] Add MODULE_DEVICE_TABLE(), so the module could be properly autoloaded based on the alias from of_device_id table. Signed-off-by: Krzysztof Kozlowski Reviewed-by: Konrad Dybcio Link: https://msgid.link/r/20240410172615.255424-2-krzk@kernel.org Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/regulator/vqmmc-ipq4019-regulator.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/regulator/vqmmc-ipq4019-regulator.c b/drivers/regulator/vqmmc-ipq4019-regulator.c index 6d5ae25d08d1..e2a28788d8a2 100644 --- a/drivers/regulator/vqmmc-ipq4019-regulator.c +++ b/drivers/regulator/vqmmc-ipq4019-regulator.c @@ -86,6 +86,7 @@ static const struct of_device_id regulator_ipq4019_of_match[] = { { .compatible = "qcom,vqmmc-ipq4019-regulator", }, {}, }; +MODULE_DEVICE_TABLE(of, regulator_ipq4019_of_match); static struct platform_driver ipq4019_regulator_driver = { .probe = ipq4019_regulator_probe, From 43ff957b96f85d4e3faf912452647da784a6e905 Mon Sep 17 00:00:00 2001 From: Jack Yu Date: Mon, 15 Apr 2024 06:27:23 +0000 Subject: [PATCH 0015/1565] ASoC: rt715: add vendor clear control register [ Upstream commit cebfbc89ae2552dbb58cd9b8206a5c8e0e6301e9 ] Add vendor clear control register in readable register's callback function. This prevents an access failure reported in Intel CI tests. Signed-off-by: Jack Yu Closes: https://github.com/thesofproject/linux/issues/4860 Tested-by: Pierre-Louis Bossart Link: https://lore.kernel.org/r/6a103ce9134d49d8b3941172c87a7bd4@realtek.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/rt715-sdw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/soc/codecs/rt715-sdw.c b/sound/soc/codecs/rt715-sdw.c index 361a90ae594c..c0f09c3bcb6e 100644 --- a/sound/soc/codecs/rt715-sdw.c +++ b/sound/soc/codecs/rt715-sdw.c @@ -110,6 +110,7 @@ static bool rt715_readable_register(struct device *dev, unsigned int reg) case 0x839d: case 0x83a7: case 0x83a9: + case 0x752001: case 0x752039: return true; default: From eb464a8d826e5f56eb49d3af836c2633a38c71bd Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Fri, 26 Apr 2024 10:30:33 -0500 Subject: [PATCH 0016/1565] ASoC: da7219-aad: fix usage of device_get_named_child_node() [ Upstream commit e8a6a5ad73acbafd98e8fd3f0cbf6e379771bb76 ] The documentation for device_get_named_child_node() mentions this important point: " The caller is responsible for calling fwnode_handle_put() on the returned fwnode pointer. " Add fwnode_handle_put() to avoid a leaked reference. Signed-off-by: Pierre-Louis Bossart Link: https://lore.kernel.org/r/20240426153033.38500-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/da7219-aad.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index b6030709b6b6..2fb26dd84bc7 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -629,8 +629,10 @@ static struct da7219_aad_pdata *da7219_aad_fw_to_pdata(struct device *dev) return NULL; aad_pdata = devm_kzalloc(dev, sizeof(*aad_pdata), GFP_KERNEL); - if (!aad_pdata) + if (!aad_pdata) { + fwnode_handle_put(aad_np); return NULL; + } aad_pdata->irq = i2c->irq; @@ -705,6 +707,8 @@ static struct da7219_aad_pdata *da7219_aad_fw_to_pdata(struct device *dev) else aad_pdata->adc_1bit_rpt = DA7219_AAD_ADC_1BIT_RPT_1; + fwnode_handle_put(aad_np); + return aad_pdata; } From 5b4d14a0bce6c1dfcee67c58e66d2f88eff9b724 Mon Sep 17 00:00:00 2001 From: Lancelot SIX Date: Wed, 10 Apr 2024 14:14:13 +0100 Subject: [PATCH 0017/1565] drm/amdkfd: Flush the process wq before creating a kfd_process [ Upstream commit f5b9053398e70a0c10aa9cb4dd5910ab6bc457c5 ] There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index d243e60c6eef..534f2dec6356 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -766,6 +766,14 @@ struct kfd_process *kfd_create_process(struct file *filep) if (process) { pr_debug("Process already found\n"); } else { + /* If the process just called exec(3), it is possible that the + * cleanup of the kfd_process (following the release of the mm + * of the old process image) is still in the cleanup work queue. + * Make sure to drain any job before trying to recreate any + * resource for this process. + */ + flush_workqueue(kfd_process_wq); + process = create_process(thread); if (IS_ERR(process)) goto out; From 904a590dab64d3a44fd6b0b44655572de2b8a436 Mon Sep 17 00:00:00 2001 From: Nilay Shroff Date: Tue, 16 Apr 2024 13:49:23 +0530 Subject: [PATCH 0018/1565] nvme: find numa distance only if controller has valid numa id [ Upstream commit 863fe60ed27f2c85172654a63c5b827e72c8b2e6 ] On system where native nvme multipath is configured and iopolicy is set to numa but the nvme controller numa node id is undefined or -1 (NUMA_NO_NODE) then avoid calculating node distance for finding optimal io path. In such case we may access numa distance table with invalid index and that may potentially refer to incorrect memory. So this patch ensures that if the nvme controller numa node id is -1 then instead of calculating node distance for finding optimal io path, we set the numa node distance of such controller to default 10 (LOCAL_DISTANCE). Link: https://lore.kernel.org/all/20240413090614.678353-1-nilay@linux.ibm.com/ Signed-off-by: Nilay Shroff Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Reviewed-by: Chaitanya Kulkarni Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- drivers/nvme/host/multipath.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 379d6818a063..9f59f93b70e2 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -168,7 +168,8 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) if (nvme_path_is_disabled(ns)) continue; - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA) + if (ns->ctrl->numa_node != NUMA_NO_NODE && + READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA) distance = node_distance(node, ns->ctrl->numa_node); else distance = LOCAL_DISTANCE; From a81f072e50ee0e0d4a2355cdcdf2001e6bb83ddd Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Fri, 1 Mar 2024 16:33:11 -0600 Subject: [PATCH 0019/1565] openpromfs: finish conversion to the new mount API [ Upstream commit 8f27829974b025d4df2e78894105d75e3bf349f0 ] The original mount API conversion inexplicably left out the change from ->remount_fs to ->reconfigure; do that now. Fixes: 7ab2fa7693c3 ("vfs: Convert openpromfs to use the new mount API") Signed-off-by: Eric Sandeen Link: https://lore.kernel.org/r/90b968aa-c979-420f-ba37-5acc3391b28f@redhat.com Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/openpromfs/inode.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c index 40c8c2e32fa3..1e22344be5e5 100644 --- a/fs/openpromfs/inode.c +++ b/fs/openpromfs/inode.c @@ -362,10 +362,10 @@ static struct inode *openprom_iget(struct super_block *sb, ino_t ino) return inode; } -static int openprom_remount(struct super_block *sb, int *flags, char *data) +static int openpromfs_reconfigure(struct fs_context *fc) { - sync_filesystem(sb); - *flags |= SB_NOATIME; + sync_filesystem(fc->root->d_sb); + fc->sb_flags |= SB_NOATIME; return 0; } @@ -373,7 +373,6 @@ static const struct super_operations openprom_sops = { .alloc_inode = openprom_alloc_inode, .free_inode = openprom_free_inode, .statfs = simple_statfs, - .remount_fs = openprom_remount, }; static int openprom_fill_super(struct super_block *s, struct fs_context *fc) @@ -417,6 +416,7 @@ static int openpromfs_get_tree(struct fs_context *fc) static const struct fs_context_operations openpromfs_context_ops = { .get_tree = openpromfs_get_tree, + .reconfigure = openpromfs_reconfigure, }; static int openpromfs_init_fs_context(struct fs_context *fc) From ebed0d666fa709bae9e8cafa8ec6e7ebd1d318c6 Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Fri, 22 Mar 2024 23:59:15 +0300 Subject: [PATCH 0020/1565] crypto: bcm - Fix pointer arithmetic [ Upstream commit 2b3460cbf454c6b03d7429e9ffc4fe09322eb1a9 ] In spu2_dump_omd() value of ptr is increased by ciph_key_len instead of hash_iv_len which could lead to going beyond the buffer boundaries. Fix this bug by changing ciph_key_len to hash_iv_len. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 9d12ba86f818 ("crypto: brcm - Add Broadcom SPU driver") Signed-off-by: Aleksandr Mishin Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- drivers/crypto/bcm/spu2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/bcm/spu2.c b/drivers/crypto/bcm/spu2.c index c860ffb0b4c3..670d439f204c 100644 --- a/drivers/crypto/bcm/spu2.c +++ b/drivers/crypto/bcm/spu2.c @@ -495,7 +495,7 @@ static void spu2_dump_omd(u8 *omd, u16 hash_key_len, u16 ciph_key_len, if (hash_iv_len) { packet_log(" Hash IV Length %u bytes\n", hash_iv_len); packet_dump(" hash IV: ", ptr, hash_iv_len); - ptr += ciph_key_len; + ptr += hash_iv_len; } if (ciph_iv_len) { From e05ee61361e4b6340fb9c2d775b60c88573f03f5 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Tue, 26 Mar 2024 21:58:06 +0200 Subject: [PATCH 0021/1565] firmware: raspberrypi: Use correct device for DMA mappings [ Upstream commit df518a0ae1b982a4dcf2235464016c0c4576a34d ] The buffer used to transfer data over the mailbox interface is mapped using the client's device. This is incorrect, as the device performing the DMA transfer is the mailbox itself. Fix it by using the mailbox controller device instead. This requires including the mailbox_controller.h header to dereference the mbox_chan and mbox_controller structures. The header is not meant to be included by clients. This could be fixed by extending the client API with a function to access the controller's device. Fixes: 4e3d60656a72 ("ARM: bcm2835: Add the Raspberry Pi firmware driver") Signed-off-by: Laurent Pinchart Reviewed-by: Stefan Wahren Tested-by: Ivan T. Ivanov Link: https://lore.kernel.org/r/20240326195807.15163-3-laurent.pinchart@ideasonboard.com Signed-off-by: Florian Fainelli Signed-off-by: Sasha Levin --- drivers/firmware/raspberrypi.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c index 45ff03da234a..c8a292df6fea 100644 --- a/drivers/firmware/raspberrypi.c +++ b/drivers/firmware/raspberrypi.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -96,8 +97,8 @@ int rpi_firmware_property_list(struct rpi_firmware *fw, if (size & 3) return -EINVAL; - buf = dma_alloc_coherent(fw->cl.dev, PAGE_ALIGN(size), &bus_addr, - GFP_ATOMIC); + buf = dma_alloc_coherent(fw->chan->mbox->dev, PAGE_ALIGN(size), + &bus_addr, GFP_ATOMIC); if (!buf) return -ENOMEM; @@ -125,7 +126,7 @@ int rpi_firmware_property_list(struct rpi_firmware *fw, ret = -EINVAL; } - dma_free_coherent(fw->cl.dev, PAGE_ALIGN(size), buf, bus_addr); + dma_free_coherent(fw->chan->mbox->dev, PAGE_ALIGN(size), buf, bus_addr); return ret; } From edbfc42ab080e78c6907d40a42c9d10b69e445c1 Mon Sep 17 00:00:00 2001 From: Brian Kubisiak Date: Sun, 17 Mar 2024 07:46:00 -0700 Subject: [PATCH 0022/1565] ecryptfs: Fix buffer size for tag 66 packet [ Upstream commit 85a6a1aff08ec9f5b929d345d066e2830e8818e5 ] The 'TAG 66 Packet Format' description is missing the cipher code and checksum fields that are packed into the message packet. As a result, the buffer allocated for the packet is 3 bytes too small and write_tag_66_packet() will write up to 3 bytes past the end of the buffer. Fix this by increasing the size of the allocation so the whole packet will always fit in the buffer. This fixes the below kasan slab-out-of-bounds bug: BUG: KASAN: slab-out-of-bounds in ecryptfs_generate_key_packet_set+0x7d6/0xde0 Write of size 1 at addr ffff88800afbb2a5 by task touch/181 CPU: 0 PID: 181 Comm: touch Not tainted 6.6.13-gnu #1 4c9534092be820851bb687b82d1f92a426598dc6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2/GNU Guix 04/01/2014 Call Trace: dump_stack_lvl+0x4c/0x70 print_report+0xc5/0x610 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? kasan_complete_mode_report_info+0x44/0x210 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 kasan_report+0xc2/0x110 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 __asan_store1+0x62/0x80 ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? __pfx_ecryptfs_generate_key_packet_set+0x10/0x10 ? __alloc_pages+0x2e2/0x540 ? __pfx_ovl_open+0x10/0x10 [overlay 30837f11141636a8e1793533a02e6e2e885dad1d] ? dentry_open+0x8f/0xd0 ecryptfs_write_metadata+0x30a/0x550 ? __pfx_ecryptfs_write_metadata+0x10/0x10 ? ecryptfs_get_lower_file+0x6b/0x190 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 ? __pfx_path_openat+0x10/0x10 do_filp_open+0x15e/0x290 ? __pfx_do_filp_open+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? _raw_spin_lock+0x86/0xf0 ? __pfx__raw_spin_lock+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? alloc_fd+0xf4/0x330 do_sys_openat2+0x122/0x160 ? __pfx_do_sys_openat2+0x10/0x10 __x64_sys_openat+0xef/0x170 ? __pfx___x64_sys_openat+0x10/0x10 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f00a703fd67 Code: 25 00 00 41 00 3d 00 00 41 00 74 37 64 8b 04 25 18 00 00 00 85 c0 75 5b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 68 5d 41 5c c3 0f 1f RSP: 002b:00007ffc088e30b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 00007ffc088e3368 RCX: 00007f00a703fd67 RDX: 0000000000000941 RSI: 00007ffc088e48d7 RDI: 00000000ffffff9c RBP: 00007ffc088e48d7 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941 R13: 0000000000000000 R14: 00007ffc088e48d7 R15: 00007f00a7180040 Allocated by task 181: kasan_save_stack+0x2f/0x60 kasan_set_track+0x29/0x40 kasan_save_alloc_info+0x25/0x40 __kasan_kmalloc+0xc5/0xd0 __kmalloc+0x66/0x160 ecryptfs_generate_key_packet_set+0x6d2/0xde0 ecryptfs_write_metadata+0x30a/0x550 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 do_filp_open+0x15e/0x290 do_sys_openat2+0x122/0x160 __x64_sys_openat+0xef/0x170 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: dddfa461fc89 ("[PATCH] eCryptfs: Public key; packet management") Signed-off-by: Brian Kubisiak Link: https://lore.kernel.org/r/5j2q56p6qkhezva6b2yuqfrsurmvrrqtxxzrnp3wqu7xrz22i7@hoecdztoplbl Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/ecryptfs/keystore.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c index f6a17d259db7..afe51b4be1e7 100644 --- a/fs/ecryptfs/keystore.c +++ b/fs/ecryptfs/keystore.c @@ -300,9 +300,11 @@ write_tag_66_packet(char *signature, u8 cipher_code, * | Key Identifier Size | 1 or 2 bytes | * | Key Identifier | arbitrary | * | File Encryption Key Size | 1 or 2 bytes | + * | Cipher Code | 1 byte | * | File Encryption Key | arbitrary | + * | Checksum | 2 bytes | */ - data_len = (5 + ECRYPTFS_SIG_SIZE_HEX + crypt_stat->key_size); + data_len = (8 + ECRYPTFS_SIG_SIZE_HEX + crypt_stat->key_size); *packet = kmalloc(data_len, GFP_KERNEL); message = *packet; if (!message) { From dae53e39cdd6de8379a15502a5e7ad929c760183 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 28 Mar 2024 15:30:44 +0100 Subject: [PATCH 0023/1565] nilfs2: fix out-of-range warning [ Upstream commit c473bcdd80d4ab2ae79a7a509a6712818366e32a ] clang-14 points out that v_size is always smaller than a 64KB page size if that is configured by the CPU architecture: fs/nilfs2/ioctl.c:63:19: error: result of comparison of constant 65536 with expression of type '__u16' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (argv->v_size > PAGE_SIZE) ~~~~~~~~~~~~ ^ ~~~~~~~~~ This is ok, so just shut up that warning with a cast. Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240328143051.1069575-7-arnd@kernel.org Fixes: 3358b4aaa84f ("nilfs2: fix problems of memory allocation in ioctl") Acked-by: Ryusuke Konishi Reviewed-by: Justin Stitt Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/nilfs2/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 01235fac5971..668c1b28a940 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -59,7 +59,7 @@ static int nilfs_ioctl_wrap_copy(struct the_nilfs *nilfs, if (argv->v_nmembs == 0) return 0; - if (argv->v_size > PAGE_SIZE) + if ((size_t)argv->v_size > PAGE_SIZE) return -EINVAL; /* From f24cac6459370a63af17087a3d54e1c1b4c6da80 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 1 Apr 2024 22:35:54 -0400 Subject: [PATCH 0024/1565] parisc: add missing export of __cmpxchg_u8() [ Upstream commit c57e5dccb06decf3cb6c272ab138c033727149b5 ] __cmpxchg_u8() had been added (initially) for the sake of drivers/phy/ti/phy-tusb1210.c; the thing is, that drivers is modular, so we need an export Fixes: b344d6a83d01 "parisc: add support for cmpxchg on u8 pointers" Signed-off-by: Al Viro Signed-off-by: Paul E. McKenney Signed-off-by: Sasha Levin --- arch/parisc/kernel/parisc_ksyms.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c index e8a6a751dfd8..07418f11a964 100644 --- a/arch/parisc/kernel/parisc_ksyms.c +++ b/arch/parisc/kernel/parisc_ksyms.c @@ -21,6 +21,7 @@ EXPORT_SYMBOL(memset); #include EXPORT_SYMBOL(__xchg8); EXPORT_SYMBOL(__xchg32); +EXPORT_SYMBOL(__cmpxchg_u8); EXPORT_SYMBOL(__cmpxchg_u32); EXPORT_SYMBOL(__cmpxchg_u64); #ifdef CONFIG_SMP From 6cd2cbd553ea8693cfd1f8fe490860d9dbf41776 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 3 Apr 2024 10:06:42 +0200 Subject: [PATCH 0025/1565] crypto: ccp - drop platform ifdef checks [ Upstream commit 42c2d7d02977ef09d434b1f5b354f5bc6c1027ab ] When both ACPI and OF are disabled, the dev_vdata variable is unused: drivers/crypto/ccp/sp-platform.c:33:34: error: unused variable 'dev_vdata' [-Werror,-Wunused-const-variable] This is not a useful configuration, and there is not much point in saving a few bytes when only one of the two is enabled, so just remove all these ifdef checks and rely on of_match_node() and acpi_match_device() returning NULL when these subsystems are disabled. Fixes: 6c5063434098 ("crypto: ccp - Add ACPI support") Signed-off-by: Arnd Bergmann Acked-by: Tom Lendacky Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- drivers/crypto/ccp/sp-platform.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/drivers/crypto/ccp/sp-platform.c b/drivers/crypto/ccp/sp-platform.c index 9dba52fbee99..121f9d0cb608 100644 --- a/drivers/crypto/ccp/sp-platform.c +++ b/drivers/crypto/ccp/sp-platform.c @@ -39,44 +39,38 @@ static const struct sp_dev_vdata dev_vdata[] = { }, }; -#ifdef CONFIG_ACPI static const struct acpi_device_id sp_acpi_match[] = { { "AMDI0C00", (kernel_ulong_t)&dev_vdata[0] }, { }, }; MODULE_DEVICE_TABLE(acpi, sp_acpi_match); -#endif -#ifdef CONFIG_OF static const struct of_device_id sp_of_match[] = { { .compatible = "amd,ccp-seattle-v1a", .data = (const void *)&dev_vdata[0] }, { }, }; MODULE_DEVICE_TABLE(of, sp_of_match); -#endif static struct sp_dev_vdata *sp_get_of_version(struct platform_device *pdev) { -#ifdef CONFIG_OF const struct of_device_id *match; match = of_match_node(sp_of_match, pdev->dev.of_node); if (match && match->data) return (struct sp_dev_vdata *)match->data; -#endif + return NULL; } static struct sp_dev_vdata *sp_get_acpi_version(struct platform_device *pdev) { -#ifdef CONFIG_ACPI const struct acpi_device_id *match; match = acpi_match_device(sp_acpi_match, &pdev->dev); if (match && match->driver_data) return (struct sp_dev_vdata *)match->driver_data; -#endif + return NULL; } @@ -222,12 +216,8 @@ static int sp_platform_resume(struct platform_device *pdev) static struct platform_driver sp_platform_driver = { .driver = { .name = "ccp", -#ifdef CONFIG_ACPI .acpi_match_table = sp_acpi_match, -#endif -#ifdef CONFIG_OF .of_match_table = sp_of_match, -#endif }, .probe = sp_platform_probe, .remove = sp_platform_remove, From 0ceb0a40c5ecc563764128f67bfceea3fb7d3ada Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 5 Apr 2024 20:26:08 -0400 Subject: [PATCH 0026/1565] crypto: x86/nh-avx2 - add missing vzeroupper [ Upstream commit 4ad096cca942959871d8ff73826d30f81f856f6e ] Since nh_avx2() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305") Signed-off-by: Eric Biggers Acked-by: Tim Chen Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- arch/x86/crypto/nh-avx2-x86_64.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/crypto/nh-avx2-x86_64.S b/arch/x86/crypto/nh-avx2-x86_64.S index 6a0b15e7196a..54c0ee41209d 100644 --- a/arch/x86/crypto/nh-avx2-x86_64.S +++ b/arch/x86/crypto/nh-avx2-x86_64.S @@ -153,5 +153,6 @@ SYM_FUNC_START(nh_avx2) vpaddq T1, T0, T0 vpaddq T4, T0, T0 vmovdqu T0, (HASH) + vzeroupper RET SYM_FUNC_END(nh_avx2) From 9f6dbd0aa10744602134306b81d2c8b645419422 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 5 Apr 2024 20:26:09 -0400 Subject: [PATCH 0027/1565] crypto: x86/sha256-avx2 - add missing vzeroupper [ Upstream commit 57ce8a4e162599cf9adafef1f29763160a8e5564 ] Since sha256_transform_rorx() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions") Signed-off-by: Eric Biggers Acked-by: Tim Chen Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- arch/x86/crypto/sha256-avx2-asm.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/crypto/sha256-avx2-asm.S b/arch/x86/crypto/sha256-avx2-asm.S index 3439aaf4295d..81c8053152bb 100644 --- a/arch/x86/crypto/sha256-avx2-asm.S +++ b/arch/x86/crypto/sha256-avx2-asm.S @@ -711,6 +711,7 @@ done_hash: popq %r13 popq %r12 popq %rbx + vzeroupper RET SYM_FUNC_END(sha256_transform_rorx) From de1207e5fd26bd3fb15c18df8264a3c50110e251 Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Tue, 26 Mar 2024 17:04:56 +0100 Subject: [PATCH 0028/1565] s390/cio: fix tracepoint subchannel type field [ Upstream commit 8692a24d0fae19f674d51726d179ad04ba95d958 ] The subchannel-type field "st" of s390_cio_stsch and s390_cio_msch tracepoints is incorrectly filled with the subchannel-enabled SCHIB value "ena". Fix this by assigning the correct value. Fixes: d1de8633d96a ("s390 cio: Rewrite trace point class s390_class_schib") Reviewed-by: Heiko Carstens Signed-off-by: Peter Oberparleiter Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- drivers/s390/cio/trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/cio/trace.h b/drivers/s390/cio/trace.h index 4803139bce14..4716798b1368 100644 --- a/drivers/s390/cio/trace.h +++ b/drivers/s390/cio/trace.h @@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(s390_class_schib, __entry->devno = schib->pmcw.dev; __entry->schib = *schib; __entry->pmcw_ena = schib->pmcw.ena; - __entry->pmcw_st = schib->pmcw.ena; + __entry->pmcw_st = schib->pmcw.st; __entry->pmcw_dnv = schib->pmcw.dnv; __entry->pmcw_dev = schib->pmcw.dev; __entry->pmcw_lpm = schib->pmcw.lpm; From f0eea095ce8c959b86e1e57fe36ca4fea5ae54f8 Mon Sep 17 00:00:00 2001 From: Ilya Denisyev Date: Fri, 12 Apr 2024 18:53:54 +0300 Subject: [PATCH 0029/1565] jffs2: prevent xattr node from overflowing the eraseblock [ Upstream commit c6854e5a267c28300ff045480b5a7ee7f6f1d913 ] Add a check to make sure that the requested xattr node size is no larger than the eraseblock minus the cleanmarker. Unlike the usual inode nodes, the xattr nodes aren't split into parts and spread across multiple eraseblocks, which means that a xattr node must not occupy more than one eraseblock. If the requested xattr value is too large, the xattr node can spill onto the next eraseblock, overwriting the nodes and causing errors such as: jffs2: argh. node added in wrong place at 0x0000b050(2) jffs2: nextblock 0x0000a000, expected at 0000b00c jffs2: error: (823) do_verify_xattr_datum: node CRC failed at 0x01e050, read=0xfc892c93, calc=0x000000 jffs2: notice: (823) jffs2_get_inode_nodes: Node header CRC failed at 0x01e00c. {848f,2fc4,0fef511f,59a3d171} jffs2: Node at 0x0000000c with length 0x00001044 would run over the end of the erase block jffs2: Perhaps the file system was created with the wrong erase size? jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000010: 0x1044 instead This breaks the filesystem and can lead to KASAN crashes such as: BUG: KASAN: slab-out-of-bounds in jffs2_sum_add_kvec+0x125e/0x15d0 Read of size 4 at addr ffff88802c31e914 by task repro/830 CPU: 0 PID: 830 Comm: repro Not tainted 6.9.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Call Trace: dump_stack_lvl+0xc6/0x120 print_report+0xc4/0x620 ? __virt_addr_valid+0x308/0x5b0 kasan_report+0xc1/0xf0 ? jffs2_sum_add_kvec+0x125e/0x15d0 ? jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_flash_direct_writev+0xa8/0xd0 jffs2_flash_writev+0x9c9/0xef0 ? __x64_sys_setxattr+0xc4/0x160 ? do_syscall_64+0x69/0x140 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: aa98d7cf59b5 ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)") Signed-off-by: Ilya Denisyev Link: https://lore.kernel.org/r/20240412155357.237803-1-dev@elkcl.ru Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/jffs2/xattr.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/jffs2/xattr.c b/fs/jffs2/xattr.c index acb4492f5970..5a31220f96f5 100644 --- a/fs/jffs2/xattr.c +++ b/fs/jffs2/xattr.c @@ -1111,6 +1111,9 @@ int do_jffs2_setxattr(struct inode *inode, int xprefix, const char *xname, return rc; request = PAD(sizeof(struct jffs2_raw_xattr) + strlen(xname) + 1 + size); + if (request > c->sector_size - c->cleanmarker_size) + return -ERANGE; + rc = jffs2_reserve_space(c, request, &length, ALLOC_NORMAL, JFFS2_SUMMARY_XATTR_SIZE); if (rc) { From b2f8354f732a54ae4f8ab9a6f5e32abcbb710e2b Mon Sep 17 00:00:00 2001 From: Chun-Kuang Hu Date: Thu, 22 Feb 2024 15:41:09 +0000 Subject: [PATCH 0030/1565] soc: mediatek: cmdq: Fix typo of CMDQ_JUMP_RELATIVE [ Upstream commit ed4d5ab179b9f0a60da87c650a31f1816db9b4b4 ] For cmdq jump command, offset 0 means relative jump and offset 1 means absolute jump. cmdq_pkt_jump() is absolute jump, so fix the typo of CMDQ_JUMP_RELATIVE in cmdq_pkt_jump(). Fixes: 946f1792d3d7 ("soc: mediatek: cmdq: add jump function") Signed-off-by: Chun-Kuang Hu Reviewed-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20240222154120.16959-2-chunkuang.hu@kernel.org Signed-off-by: AngeloGioacchino Del Regno Signed-off-by: Sasha Levin --- drivers/soc/mediatek/mtk-cmdq-helper.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c b/drivers/soc/mediatek/mtk-cmdq-helper.c index 505651b0d715..c0b2f9397ea8 100644 --- a/drivers/soc/mediatek/mtk-cmdq-helper.c +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c @@ -13,7 +13,8 @@ #define CMDQ_POLL_ENABLE_MASK BIT(0) #define CMDQ_EOC_IRQ_EN BIT(0) #define CMDQ_REG_TYPE 1 -#define CMDQ_JUMP_RELATIVE 1 +#define CMDQ_JUMP_RELATIVE 0 +#define CMDQ_JUMP_ABSOLUTE 1 struct cmdq_instruction { union { @@ -414,7 +415,7 @@ int cmdq_pkt_jump(struct cmdq_pkt *pkt, dma_addr_t addr) struct cmdq_instruction inst = {}; inst.op = CMDQ_CODE_JUMP; - inst.offset = CMDQ_JUMP_RELATIVE; + inst.offset = CMDQ_JUMP_ABSOLUTE; inst.value = addr >> cmdq_get_shift_pa(((struct cmdq_client *)pkt->cl)->chan); return cmdq_pkt_append_command(pkt, inst); From c62f315238df3eebee7a38c0b3dba8a6d8e87ad9 Mon Sep 17 00:00:00 2001 From: Zhu Yanjun Date: Thu, 25 Apr 2024 19:16:35 +0200 Subject: [PATCH 0031/1565] null_blk: Fix missing mutex_destroy() at module removal [ Upstream commit 07d1b99825f40f9c0d93e6b99d79a08d0717bac1 ] When a mutex lock is not used any more, the function mutex_destroy should be called to mark the mutex lock uninitialized. Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver") Signed-off-by: Zhu Yanjun Link: https://lore.kernel.org/r/20240425171635.4227-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/block/null_blk/main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 35b390a785dd..ee1c3f476a3a 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -2032,6 +2032,8 @@ static void __exit null_exit(void) if (g_queue_mode == NULL_Q_MQ && shared_tags) blk_mq_free_tag_set(&tag_set); + + mutex_destroy(&lock); } module_init(null_init); From 3f5b73ef8fd6268cbc968b308d8eafe56fda97f3 Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Mon, 22 Apr 2024 14:58:24 +0800 Subject: [PATCH 0032/1565] md: fix resync softlockup when bitmap size is less than array size [ Upstream commit f0e729af2eb6bee9eb58c4df1087f14ebaefe26b ] Is is reported that for dm-raid10, lvextend + lvchange --syncaction will trigger following softlockup: kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1 RIP: 0010:_raw_spin_unlock_irq+0x13/0x30 Call Trace: md_bitmap_start_sync+0x6b/0xf0 raid10_sync_request+0x25c/0x1b40 [raid10] md_do_sync+0x64b/0x1020 md_thread+0xa7/0x170 kthread+0xcf/0x100 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1a/0x30 And the detailed process is as follows: md_do_sync j = mddev->resync_min while (j < max_sectors) sectors = raid10_sync_request(mddev, j, &skipped) if (!md_bitmap_start_sync(..., &sync_blocks)) // md_bitmap_start_sync set sync_blocks to 0 return sync_blocks + sectors_skippe; // sectors = 0; j += sectors; // j never change Root cause is that commit 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") return early from md_bitmap_get_counter(), without setting returned blocks. Fix this problem by always set returned blocks from md_bitmap_get_counter"(), as it used to be. Noted that this patch just fix the softlockup problem in kernel, the case that bitmap size doesn't match array size still need to be fixed. Fixes: 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") Reported-and-tested-by: Nigel Croxon Closes: https://lore.kernel.org/all/71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com/ Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20240422065824.2516-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu Signed-off-by: Sasha Levin --- drivers/md/md-bitmap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c index b28302836b2e..91bc764a854c 100644 --- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -1355,7 +1355,7 @@ __acquires(bitmap->lock) sector_t chunk = offset >> bitmap->chunkshift; unsigned long page = chunk >> PAGE_COUNTER_SHIFT; unsigned long pageoff = (chunk & PAGE_COUNTER_MASK) << COUNTER_BYTE_SHIFT; - sector_t csize; + sector_t csize = ((sector_t)1) << bitmap->chunkshift; int err; if (page >= bitmap->pages) { @@ -1364,6 +1364,7 @@ __acquires(bitmap->lock) * End-of-device while looking for a whole page or * user set a huge number to sysfs bitmap_set_bits. */ + *blocks = csize - (offset & (csize - 1)); return NULL; } err = md_bitmap_checkpage(bitmap, page, create, 0); @@ -1372,8 +1373,7 @@ __acquires(bitmap->lock) bitmap->bp[page].map == NULL) csize = ((sector_t)1) << (bitmap->chunkshift + PAGE_COUNTER_SHIFT); - else - csize = ((sector_t)1) << bitmap->chunkshift; + *blocks = csize - (offset & (csize - 1)); if (err < 0) From 5fb37c456d3800c0429edbd7b2ca7fbff450e547 Mon Sep 17 00:00:00 2001 From: Baochen Qiang Date: Wed, 6 Mar 2024 07:15:14 +0200 Subject: [PATCH 0033/1565] wifi: ath10k: poll service ready message before failing [ Upstream commit e57b7d62a1b2f496caf0beba81cec3c90fad80d5 ] Currently host relies on CE interrupts to get notified that the service ready message is ready. This results in timeout issue if the interrupt is not fired, due to some unknown reasons. See below logs: [76321.937866] ath10k_pci 0000:02:00.0: wmi service ready event not received ... [76322.016738] ath10k_pci 0000:02:00.0: Could not init core: -110 And finally it causes WLAN interface bring up failure. Change to give it one more chance here by polling CE rings, before failing directly. Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00157-QCARMSWPZ-1 Fixes: 5e3dd157d7e7 ("ath10k: mac80211 driver for Qualcomm Atheros 802.11ac CQA98xx devices") Reported-by: James Prestwood Tested-By: James Prestwood # on QCA6174 hw3.2 Link: https://lore.kernel.org/linux-wireless/304ce305-fbe6-420e-ac2a-d61ae5e6ca1a@gmail.com/ Signed-off-by: Baochen Qiang Acked-by: Jeff Johnson Signed-off-by: Kalle Valo Link: https://msgid.link/20240227030409.89702-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/ath10k/wmi.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c index 85fe855ece09..9cfd35dc87ba 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.c +++ b/drivers/net/wireless/ath/ath10k/wmi.c @@ -1762,12 +1762,32 @@ void ath10k_wmi_put_wmi_channel(struct ath10k *ar, struct wmi_channel *ch, int ath10k_wmi_wait_for_service_ready(struct ath10k *ar) { - unsigned long time_left; + unsigned long time_left, i; time_left = wait_for_completion_timeout(&ar->wmi.service_ready, WMI_SERVICE_READY_TIMEOUT_HZ); - if (!time_left) - return -ETIMEDOUT; + if (!time_left) { + /* Sometimes the PCI HIF doesn't receive interrupt + * for the service ready message even if the buffer + * was completed. PCIe sniffer shows that it's + * because the corresponding CE ring doesn't fires + * it. Workaround here by polling CE rings once. + */ + ath10k_warn(ar, "failed to receive service ready completion, polling..\n"); + + for (i = 0; i < CE_COUNT; i++) + ath10k_hif_send_complete_check(ar, i, 1); + + time_left = wait_for_completion_timeout(&ar->wmi.service_ready, + WMI_SERVICE_READY_TIMEOUT_HZ); + if (!time_left) { + ath10k_warn(ar, "polling timed out\n"); + return -ETIMEDOUT; + } + + ath10k_warn(ar, "service ready completion received, continuing normally\n"); + } + return 0; } From 01ea6818fac10574d64e7714341c166ffe3d4098 Mon Sep 17 00:00:00 2001 From: Guixiong Wei Date: Sun, 17 Mar 2024 23:05:47 +0800 Subject: [PATCH 0034/1565] x86/boot: Ignore relocations in .notes sections in walk_relocs() too [ Upstream commit 76e9762d66373354b45c33b60e9a53ef2a3c5ff2 ] Commit: aaa8736370db ("x86, relocs: Ignore relocations in .notes section") ... only started ignoring the .notes sections in print_absolute_relocs(), but the same logic should also by applied in walk_relocs() to avoid such relocations. [ mingo: Fixed various typos in the changelog, removed extra curly braces from the code. ] Fixes: aaa8736370db ("x86, relocs: Ignore relocations in .notes section") Fixes: 5ead97c84fa7 ("xen: Core Xen implementation") Fixes: da1a679cde9b ("Add /sys/kernel/notes") Signed-off-by: Guixiong Wei Signed-off-by: Ingo Molnar Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/20240317150547.24910-1-weiguixiong@bytedance.com Signed-off-by: Sasha Levin --- arch/x86/tools/relocs.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c index 0043fd374a62..f9ec998b7e94 100644 --- a/arch/x86/tools/relocs.c +++ b/arch/x86/tools/relocs.c @@ -689,6 +689,15 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel, if (!(sec_applies->shdr.sh_flags & SHF_ALLOC)) { continue; } + + /* + * Do not perform relocations in .notes sections; any + * values there are meant for pre-boot consumption (e.g. + * startup_xen). + */ + if (sec_applies->shdr.sh_type == SHT_NOTE) + continue; + sh_symtab = sec_symtab->symtab; sym_strtab = sec_symtab->link->strtab; for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Rel); j++) { From 742f58067071dfd2f451909ab84d2377cf84c501 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 26 Mar 2024 23:38:02 +0100 Subject: [PATCH 0035/1565] qed: avoid truncating work queue length [ Upstream commit 954fd908f177604d4cce77e2a88cc50b29bad5ff ] clang complains that the temporary string for the name passed into alloc_workqueue() is too short for its contents: drivers/net/ethernet/qlogic/qed/qed_main.c:1218:3: error: 'snprintf' will always be truncated; specified size is 16, but format string expands to at least 18 [-Werror,-Wformat-truncation] There is no need for a temporary buffer, and the actual name of a workqueue is 32 bytes (WQ_NAME_LEN), so just use the interface as intended to avoid the truncation. Fixes: 59ccf86fe69a ("qed: Add driver infrastucture for handling mfw requests.") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240326223825.4084412-4-arnd@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/qlogic/qed/qed_main.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 41bc31e3f935..03ede9425889 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -1234,7 +1234,6 @@ static void qed_slowpath_task(struct work_struct *work) static int qed_slowpath_wq_start(struct qed_dev *cdev) { struct qed_hwfn *hwfn; - char name[NAME_SIZE]; int i; if (IS_VF(cdev)) @@ -1243,11 +1242,11 @@ static int qed_slowpath_wq_start(struct qed_dev *cdev) for_each_hwfn(cdev, i) { hwfn = &cdev->hwfns[i]; - snprintf(name, NAME_SIZE, "slowpath-%02x:%02x.%02x", - cdev->pdev->bus->number, - PCI_SLOT(cdev->pdev->devfn), hwfn->abs_pf_id); + hwfn->slowpath_wq = alloc_workqueue("slowpath-%02x:%02x.%02x", + 0, 0, cdev->pdev->bus->number, + PCI_SLOT(cdev->pdev->devfn), + hwfn->abs_pf_id); - hwfn->slowpath_wq = alloc_workqueue(name, 0, 0); if (!hwfn->slowpath_wq) { DP_NOTICE(hwfn, "Cannot create slowpath workqueue\n"); return -ENOMEM; From d193f4a153ac74639de0dd644aee5771c2ca600f Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:43 -0500 Subject: [PATCH 0036/1565] scsi: ufs: qcom: Perform read back after writing reset bit [ Upstream commit c4d28e06b0c94636f6e35d003fa9ebac0a94e1ae ] Currently, the reset bit for the UFS provided reset controller (used by its phy) is written to, and then a mb() happens to try and ensure that hit the device. Immediately afterwards a usleep_range() occurs. mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure the bit hits the device. By doing so and guaranteeing the ordering against the immediately following usleep_range(), the mb() can safely be removed. Fixes: 81c0fc51b7a7 ("ufs-qcom: add support for Qualcomm Technologies Inc platforms") Reviewed-by: Manivannan Sadhasivam Reviewed-by: Can Guo Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-1-181252004586@redhat.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.h b/drivers/scsi/ufs/ufs-qcom.h index 3f4922743b3e..478134ba8086 100644 --- a/drivers/scsi/ufs/ufs-qcom.h +++ b/drivers/scsi/ufs/ufs-qcom.h @@ -156,10 +156,10 @@ static inline void ufs_qcom_assert_reset(struct ufs_hba *hba) 1 << OFFSET_UFS_PHY_SOFT_RESET, REG_UFS_CFG1); /* - * Make sure assertion of ufs phy reset is written to - * register before returning + * Dummy read to ensure the write takes effect before doing any sort + * of delay */ - mb(); + ufshcd_readl(hba, REG_UFS_CFG1); } static inline void ufs_qcom_deassert_reset(struct ufs_hba *hba) @@ -168,10 +168,10 @@ static inline void ufs_qcom_deassert_reset(struct ufs_hba *hba) 0 << OFFSET_UFS_PHY_SOFT_RESET, REG_UFS_CFG1); /* - * Make sure de-assertion of ufs phy reset is written to - * register before returning + * Dummy read to ensure the write takes effect before doing any sort + * of delay */ - mb(); + ufshcd_readl(hba, REG_UFS_CFG1); } /* Host controller hardware version: major.minor.step */ From 82783759e88beee69b710806e45acbb2fc589801 Mon Sep 17 00:00:00 2001 From: Ziqi Chen Date: Fri, 8 Jan 2021 18:56:25 +0800 Subject: [PATCH 0037/1565] scsi: ufs-qcom: Fix ufs RST_n spec violation [ Upstream commit b61d0414136853fc38898829cde837ce5d691a9a ] According to the spec (JESD220E chapter 7.2), while powering off/on the ufs device, RST_n signal should be between VSS(Ground) and VCCQ/VCCQ2. Link: https://lore.kernel.org/r/1610103385-45755-3-git-send-email-ziqichen@codeaurora.org Acked-by: Avri Altman Signed-off-by: Ziqi Chen Signed-off-by: Martin K. Petersen Stable-dep-of: a862fafa263a ("scsi: ufs: qcom: Perform read back after writing REG_UFS_SYS1CLK_1US") Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 08331ecbe91f..a432c561c3df 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -578,6 +578,17 @@ static int ufs_qcom_link_startup_notify(struct ufs_hba *hba, return err; } +static void ufs_qcom_device_reset_ctrl(struct ufs_hba *hba, bool asserted) +{ + struct ufs_qcom_host *host = ufshcd_get_variant(hba); + + /* reset gpio is optional */ + if (!host->device_reset) + return; + + gpiod_set_value_cansleep(host->device_reset, asserted); +} + static int ufs_qcom_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) { struct ufs_qcom_host *host = ufshcd_get_variant(hba); @@ -592,6 +603,9 @@ static int ufs_qcom_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op) ufs_qcom_disable_lane_clks(host); phy_power_off(phy); + /* reset the connected UFS device during power down */ + ufs_qcom_device_reset_ctrl(hba, true); + } else if (!ufs_qcom_is_link_active(hba)) { ufs_qcom_disable_lane_clks(host); } @@ -1441,10 +1455,10 @@ static int ufs_qcom_device_reset(struct ufs_hba *hba) * The UFS device shall detect reset pulses of 1us, sleep for 10us to * be on the safe side. */ - gpiod_set_value_cansleep(host->device_reset, 1); + ufs_qcom_device_reset_ctrl(hba, true); usleep_range(10, 15); - gpiod_set_value_cansleep(host->device_reset, 0); + ufs_qcom_device_reset_ctrl(hba, false); usleep_range(10, 15); return 0; From d113c66bb4ae1fff42e4933ef1d53107c8983f8a Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:44 -0500 Subject: [PATCH 0038/1565] scsi: ufs: qcom: Perform read back after writing REG_UFS_SYS1CLK_1US [ Upstream commit a862fafa263aea0f427d51aca6ff7fd9eeaaa8bd ] Currently after writing to REG_UFS_SYS1CLK_1US a mb() is used to ensure that write has gone through to the device. mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure the bit hits the device. Because the mb()'s purpose wasn't to add extra ordering (on top of the ordering guaranteed by writel()/readl()), it can safely be removed. Fixes: f06fcc7155dc ("scsi: ufs-qcom: add QUniPro hardware support and power optimizations") Reviewed-by: Can Guo Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-2-181252004586@redhat.com Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index a432c561c3df..45d21cb80289 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -449,7 +449,7 @@ static int ufs_qcom_cfg_timers(struct ufs_hba *hba, u32 gear, * make sure above write gets applied before we return from * this function. */ - mb(); + ufshcd_readl(hba, REG_UFS_SYS1CLK_1US); } if (ufs_qcom_cap_qunipro(host)) From ee4bf03d261f4788accb441c08f344c1e990feef Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Thu, 22 Dec 2022 19:39:55 +0530 Subject: [PATCH 0039/1565] scsi: ufs: ufs-qcom: Fix the Qcom register name for offset 0xD0 [ Upstream commit 7959587f3284bf163e4f1baff3c6fa71fc6a55b1 ] On newer UFS revisions, the register at offset 0xD0 is called, REG_UFS_PARAM0. Since the existing register, RETRY_TIMER_REG is not used anywhere, it is safe to use the new name. Reviewed-by: Andrew Halaney Reviewed-by: Asutosh Das Tested-by: Andrew Halaney # Qdrive3/sa8540p-ride Signed-off-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen Stable-dep-of: 823150ecf04f ("scsi: ufs: qcom: Perform read back after writing unipro mode") Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufs-qcom.h b/drivers/scsi/ufs/ufs-qcom.h index 478134ba8086..70bee1d1f113 100644 --- a/drivers/scsi/ufs/ufs-qcom.h +++ b/drivers/scsi/ufs/ufs-qcom.h @@ -46,7 +46,8 @@ enum { REG_UFS_TX_SYMBOL_CLK_NS_US = 0xC4, REG_UFS_LOCAL_PORT_ID_REG = 0xC8, REG_UFS_PA_ERR_CODE = 0xCC, - REG_UFS_RETRY_TIMER_REG = 0xD0, + /* On older UFS revisions, this register is called "RETRY_TIMER_REG" */ + REG_UFS_PARAM0 = 0xD0, REG_UFS_PA_LINK_STARTUP_TIMER = 0xD8, REG_UFS_CFG1 = 0xDC, REG_UFS_CFG2 = 0xE0, From 506f63e97d3e62d0bca6cc2e6f6ecdb34a601f44 Mon Sep 17 00:00:00 2001 From: Abel Vesa Date: Thu, 19 Jan 2023 17:14:05 +0200 Subject: [PATCH 0040/1565] scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW version major 5 [ Upstream commit 9c02aa24bf404a39ec509d9f50539056b9b128f7 ] On SM8550, depending on the Qunipro, we can run with G5 or G4. For now, when the major version is 5 or above, we go with G5. Therefore, we need to specifically tell UFS HC that. Signed-off-by: Abel Vesa Signed-off-by: Martin K. Petersen Stable-dep-of: 823150ecf04f ("scsi: ufs: qcom: Perform read back after writing unipro mode") Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.c | 8 ++++++-- drivers/scsi/ufs/ufs-qcom.h | 6 +++++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 45d21cb80289..ef8d721022da 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -242,6 +242,10 @@ static void ufs_qcom_select_unipro_mode(struct ufs_qcom_host *host) ufshcd_rmwl(host->hba, QUNIPRO_SEL, ufs_qcom_cap_qunipro(host) ? QUNIPRO_SEL : 0, REG_UFS_CFG1); + + if (host->hw_ver.major == 0x05) + ufshcd_rmwl(host->hba, QUNIPRO_G4_SEL, 0, REG_UFS_CFG0); + /* make sure above configuration is applied before we return */ mb(); } @@ -515,9 +519,9 @@ static int ufs_qcom_cfg_timers(struct ufs_hba *hba, u32 gear, mb(); } - if (update_link_startup_timer) { + if (update_link_startup_timer && host->hw_ver.major != 0x5) { ufshcd_writel(hba, ((core_clk_rate / MSEC_PER_SEC) * 100), - REG_UFS_PA_LINK_STARTUP_TIMER); + REG_UFS_CFG0); /* * make sure that this configuration is applied before * we return diff --git a/drivers/scsi/ufs/ufs-qcom.h b/drivers/scsi/ufs/ufs-qcom.h index 70bee1d1f113..742f752d01d6 100644 --- a/drivers/scsi/ufs/ufs-qcom.h +++ b/drivers/scsi/ufs/ufs-qcom.h @@ -48,7 +48,8 @@ enum { REG_UFS_PA_ERR_CODE = 0xCC, /* On older UFS revisions, this register is called "RETRY_TIMER_REG" */ REG_UFS_PARAM0 = 0xD0, - REG_UFS_PA_LINK_STARTUP_TIMER = 0xD8, + /* On older UFS revisions, this register is called "REG_UFS_PA_LINK_STARTUP_TIMER" */ + REG_UFS_CFG0 = 0xD8, REG_UFS_CFG1 = 0xDC, REG_UFS_CFG2 = 0xE0, REG_UFS_HW_VERSION = 0xE4, @@ -86,6 +87,9 @@ enum { #define UFS_CNTLR_2_x_x_VEN_REGS_OFFSET(x) (0x000 + x) #define UFS_CNTLR_3_x_x_VEN_REGS_OFFSET(x) (0x400 + x) +/* bit definitions for REG_UFS_CFG0 register */ +#define QUNIPRO_G4_SEL BIT(5) + /* bit definitions for REG_UFS_CFG1 register */ #define QUNIPRO_SEL 0x1 #define UTP_DBG_RAMS_EN 0x20000 From c8f2eefc496e13314b2f090000f885ad50652049 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:46 -0500 Subject: [PATCH 0041/1565] scsi: ufs: qcom: Perform read back after writing unipro mode [ Upstream commit 823150ecf04f958213cf3bf162187cd1a91c885c ] Currently, the QUNIPRO_SEL bit is written to and then an mb() is used to ensure that completes before continuing. mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 But, there's really no reason to even ensure completion before continuing. The only requirement here is that this write is ordered to this endpoint (which readl()/writel() guarantees already). For that reason the mb() can be dropped altogether without anything forcing completion. Fixes: f06fcc7155dc ("scsi: ufs-qcom: add QUniPro hardware support and power optimizations") Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-4-181252004586@redhat.com Reviewed-by: Manivannan Sadhasivam Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index ef8d721022da..7b8e59126567 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -245,9 +245,6 @@ static void ufs_qcom_select_unipro_mode(struct ufs_qcom_host *host) if (host->hw_ver.major == 0x05) ufshcd_rmwl(host->hba, QUNIPRO_G4_SEL, 0, REG_UFS_CFG0); - - /* make sure above configuration is applied before we return */ - mb(); } /* From dca975427630f167e59781b761df3e703163d3ca Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:47 -0500 Subject: [PATCH 0042/1565] scsi: ufs: qcom: Perform read back after writing CGC enable [ Upstream commit d9488511b3ac7eb48a91bc5eded7027525525e03 ] Currently, the CGC enable bit is written and then an mb() is used to ensure that completes before continuing. mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure the bit hits the device. Because the mb()'s purpose wasn't to add extra ordering (on top of the ordering guaranteed by writel()/readl()), it can safely be removed. Reviewed-by: Manivannan Sadhasivam Reviewed-by: Can Guo Fixes: 81c0fc51b7a7 ("ufs-qcom: add support for Qualcomm Technologies Inc platforms") Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-5-181252004586@redhat.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufs-qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index 7b8e59126567..ed0d6ea967e1 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -353,7 +353,7 @@ static void ufs_qcom_enable_hw_clk_gating(struct ufs_hba *hba) REG_UFS_CFG2); /* Ensure that HW clock gating is enabled before next operations */ - mb(); + ufshcd_readl(hba, REG_UFS_CFG2); } static int ufs_qcom_hce_enable_notify(struct ufs_hba *hba, From bbc00d1b7a7106354c7a5d5678746decbcc741f3 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:48 -0500 Subject: [PATCH 0043/1565] scsi: ufs: cdns-pltfrm: Perform read back after writing HCLKDIV [ Upstream commit b715c55daf598aac8fa339048e4ca8a0916b332e ] Currently, HCLKDIV is written to and then completed with an mb(). mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure the bit hits the device. Because the mb()'s purpose wasn't to add extra ordering (on top of the ordering guaranteed by writel()/readl()), it can safely be removed. Fixes: d90996dae8e4 ("scsi: ufs: Add UFS platform driver for Cadence UFS") Reviewed-by: Manivannan Sadhasivam Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-6-181252004586@redhat.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/cdns-pltfrm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/cdns-pltfrm.c b/drivers/scsi/ufs/cdns-pltfrm.c index da065a259f6e..408ba52671de 100644 --- a/drivers/scsi/ufs/cdns-pltfrm.c +++ b/drivers/scsi/ufs/cdns-pltfrm.c @@ -135,7 +135,7 @@ static int cdns_ufs_set_hclkdiv(struct ufs_hba *hba) * Make sure the register was updated, * UniPro layer will not work with an incorrect value. */ - mb(); + ufshcd_readl(hba, CDNS_UFS_REG_HCLKDIV); return 0; } From 3bbfbd5a36d82b3a09a70c6ddd74c851b315ed64 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:50 -0500 Subject: [PATCH 0044/1565] scsi: ufs: core: Perform read back after disabling interrupts [ Upstream commit e4a628877119bd40164a651d20321247b6f94a8b ] Currently, interrupts are cleared and disabled prior to registering the interrupt. An mb() is used to complete the clear/disable writes before the interrupt is registered. mb() ensures that the write completes, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring these bits have taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure these bits hit the device. Because the mb()'s purpose wasn't to add extra ordering (on top of the ordering guaranteed by writel()/readl()), it can safely be removed. Fixes: 199ef13cac7d ("scsi: ufs: avoid spurious UFS host controller interrupts") Reviewed-by: Manivannan Sadhasivam Reviewed-by: Bart Van Assche Reviewed-by: Can Guo Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-8-181252004586@redhat.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufshcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index a432aebd14be..99b59392626b 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -9193,7 +9193,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) * Make sure that UFS interrupts are disabled and any pending interrupt * status is cleared before registering UFS interrupt handler. */ - mb(); + ufshcd_readl(hba, REG_INTERRUPT_ENABLE); /* IRQ registration */ err = devm_request_irq(dev, irq, ufshcd_intr, IRQF_SHARED, UFSHCD, hba); From bb291d4d086875492d49fcf7b79df851fe9f7f30 Mon Sep 17 00:00:00 2001 From: Andrew Halaney Date: Fri, 29 Mar 2024 15:46:51 -0500 Subject: [PATCH 0045/1565] scsi: ufs: core: Perform read back after disabling UIC_COMMAND_COMPL [ Upstream commit 4bf3855497b60765ca03b983d064b25e99b97657 ] Currently, the UIC_COMMAND_COMPL interrupt is disabled and a wmb() is used to complete the register write before any following writes. wmb() ensures the writes complete in that order, but completion doesn't mean that it isn't stored in a buffer somewhere. The recommendation for ensuring this bit has taken effect on the device is to perform a read back to force it to make it all the way to the device. This is documented in device-io.rst and a talk by Will Deacon on this can be seen over here: https://youtu.be/i6DayghhA8Q?si=MiyxB5cKJXSaoc01&t=1678 Let's do that to ensure the bit hits the device. Because the wmb()'s purpose wasn't to add extra ordering (on top of the ordering guaranteed by writel()/readl()), it can safely be removed. Fixes: d75f7fe495cf ("scsi: ufs: reduce the interrupts for power mode change requests") Reviewed-by: Bart Van Assche Reviewed-by: Can Guo Reviewed-by: Manivannan Sadhasivam Signed-off-by: Andrew Halaney Link: https://lore.kernel.org/r/20240329-ufs-reset-ensure-effect-before-delay-v5-9-181252004586@redhat.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/ufs/ufshcd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 99b59392626b..0b0230b46f28 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -3819,7 +3819,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd) * Make sure UIC command completion interrupt is disabled before * issuing UIC command. */ - wmb(); + ufshcd_readl(hba, REG_INTERRUPT_ENABLE); reenable_intr = true; } ret = __ufshcd_send_uic_cmd(hba, cmd, false); From f31b49ba366260ef7a1af537f936b3837359fe04 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Wed, 27 Mar 2024 22:23:05 +0800 Subject: [PATCH 0046/1565] irqchip/alpine-msi: Fix off-by-one in allocation error path [ Upstream commit ff3669a71afa06208de58d6bea1cc49d5e3fcbd1 ] When alpine_msix_gic_domain_alloc() fails, there is an off-by-one in the number of interrupts to be freed. Fix it by passing the number of successfully allocated interrupts, instead of the relative index of the last allocated one. Fixes: 3841245e8498 ("irqchip/alpine-msi: Fix freeing of interrupts on allocation error path") Signed-off-by: Zenghui Yu Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20240327142305.1048-1-yuzenghui@huawei.com Signed-off-by: Sasha Levin --- drivers/irqchip/irq-alpine-msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-alpine-msi.c b/drivers/irqchip/irq-alpine-msi.c index 1819bb1d2723..aedbc4befcdf 100644 --- a/drivers/irqchip/irq-alpine-msi.c +++ b/drivers/irqchip/irq-alpine-msi.c @@ -165,7 +165,7 @@ static int alpine_msix_middle_domain_alloc(struct irq_domain *domain, return 0; err_sgi: - irq_domain_free_irqs_parent(domain, virq, i - 1); + irq_domain_free_irqs_parent(domain, virq, i); alpine_msix_free_sgi(priv, sgi, nr_irqs); return err; } From a8f0a14c3b88e5f8f369828f2f7d4cb91be97f16 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Wed, 27 Mar 2024 22:23:34 +0800 Subject: [PATCH 0047/1565] irqchip/loongson-pch-msi: Fix off-by-one on allocation error path [ Upstream commit b327708798809328f21da8dc14cc8883d1e8a4b3 ] When pch_msi_parent_domain_alloc() returns an error, there is an off-by-one in the number of interrupts to be freed. Fix it by passing the number of successfully allocated interrupts, instead of the relative index of the last allocated one. Fixes: 632dcc2c75ef ("irqchip: Add Loongson PCH MSI controller") Signed-off-by: Zenghui Yu Signed-off-by: Thomas Gleixner Reviewed-by: Jiaxun Yang Link: https://lore.kernel.org/r/20240327142334.1098-1-yuzenghui@huawei.com Signed-off-by: Sasha Levin --- drivers/irqchip/irq-loongson-pch-msi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-loongson-pch-msi.c b/drivers/irqchip/irq-loongson-pch-msi.c index 32562b7e681b..254a58fbb844 100644 --- a/drivers/irqchip/irq-loongson-pch-msi.c +++ b/drivers/irqchip/irq-loongson-pch-msi.c @@ -132,7 +132,7 @@ static int pch_msi_middle_domain_alloc(struct irq_domain *domain, err_hwirq: pch_msi_free_hwirq(priv, hwirq, nr_irqs); - irq_domain_free_irqs_parent(domain, virq, i - 1); + irq_domain_free_irqs_parent(domain, virq, i); return err; } From 11f9bd11020e2649cdeccd51e7ea89d62dfa305a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 9 Apr 2024 16:00:55 +0200 Subject: [PATCH 0048/1565] ACPI: disable -Wstringop-truncation [ Upstream commit a3403d304708f60565582d60af4316289d0316a0 ] gcc -Wstringop-truncation warns about copying a string that results in a missing nul termination: drivers/acpi/acpica/tbfind.c: In function 'acpi_tb_find_table': drivers/acpi/acpica/tbfind.c:60:9: error: 'strncpy' specified bound 6 equals destination size [-Werror=stringop-truncation] 60 | strncpy(header.oem_id, oem_id, ACPI_OEM_ID_SIZE); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/acpi/acpica/tbfind.c:61:9: error: 'strncpy' specified bound 8 equals destination size [-Werror=stringop-truncation] 61 | strncpy(header.oem_table_id, oem_table_id, ACPI_OEM_TABLE_ID_SIZE); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The code works as intended, and the warning could be addressed by using a memcpy(), but turning the warning off for this file works equally well and may be easier to merge. Fixes: 47c08729bf1c ("ACPICA: Fix for LoadTable operator, input strings") Link: https://lore.kernel.org/lkml/CAJZ5v0hoUfv54KW7y4223Mn9E7D4xvR7whRFNLTBqCZMUxT50Q@mail.gmail.com/#t Signed-off-by: Arnd Bergmann Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/acpica/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/acpi/acpica/Makefile b/drivers/acpi/acpica/Makefile index f919811156b1..b6cf9c9bd639 100644 --- a/drivers/acpi/acpica/Makefile +++ b/drivers/acpi/acpica/Makefile @@ -5,6 +5,7 @@ ccflags-y := -D_LINUX -DBUILDING_ACPICA ccflags-$(CONFIG_ACPI_DEBUG) += -DACPI_DEBUG_OUTPUT +CFLAGS_tbfind.o += $(call cc-disable-warning, stringop-truncation) # use acpi.o to put all files here into acpi.o modparam namespace obj-y += acpi.o From dd52e3bc4fe859a6fc0f778ddc58e2ea35e1b7da Mon Sep 17 00:00:00 2001 From: Andreas Gruenbacher Date: Fri, 5 Apr 2024 13:47:51 +0200 Subject: [PATCH 0049/1565] gfs2: Fix "ignore unlock failures after withdraw" [ Upstream commit 5d9231111966b6c5a65016d58dcbeab91055bc91 ] Commit 3e11e53041502 tries to suppress dlm_lock() lock conversion errors that occur when the lockspace has already been released. It does that by setting and checking the SDF_SKIP_DLM_UNLOCK flag. This conflicts with the intended meaning of the SDF_SKIP_DLM_UNLOCK flag, so check whether the lockspace is still allocated instead. (Given the current DLM API, checking for this kind of error after the fact seems easier that than to make sure that the lockspace is still allocated before calling dlm_lock(). Changing the DLM API so that users maintain the lockspace references themselves would be an option.) Fixes: 3e11e53041502 ("GFS2: ignore unlock failures after withdraw") Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin --- fs/gfs2/glock.c | 4 +++- fs/gfs2/util.c | 1 - 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index dd052101e226..b0f01a8e3776 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -691,11 +691,13 @@ __acquires(&gl->gl_lockref.lock) } if (sdp->sd_lockstruct.ls_ops->lm_lock) { + struct lm_lockstruct *ls = &sdp->sd_lockstruct; + /* lock_dlm */ ret = sdp->sd_lockstruct.ls_ops->lm_lock(gl, target, lck_flags); if (ret == -EINVAL && gl->gl_target == LM_ST_UNLOCKED && target == LM_ST_UNLOCKED && - test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) { + test_bit(DFL_UNMOUNT, &ls->ls_recover_flags)) { finish_xmote(gl, target); gfs2_glock_queue_work(gl, 0); } else if (ret) { diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 3ece99e6490c..d11152dedb80 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -348,7 +348,6 @@ int gfs2_withdraw(struct gfs2_sbd *sdp) fs_err(sdp, "telling LM to unmount\n"); lm->lm_unmount(sdp); } - set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags); fs_err(sdp, "File system withdrawn\n"); dump_stack(); clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags); From 7a4d18a27d858ecaefd1116b8a785b255ba39607 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Tue, 9 Apr 2024 13:18:40 +0800 Subject: [PATCH 0050/1565] selftests/bpf: Fix umount cgroup2 error in test_sockmap [ Upstream commit d75142dbeb2bd1587b9cc19f841578f541275a64 ] This patch fixes the following "umount cgroup2" error in test_sockmap.c: (cgroup_helpers.c:353: errno: Device or resource busy) umount cgroup2 Cgroup fd cg_fd should be closed before cleanup_cgroup_environment(). Fixes: 13a5f3ffd202 ("bpf: Selftests, sockmap test prog run without setting cgroup") Signed-off-by: Geliang Tang Acked-by: Yonghong Song Link: https://lore.kernel.org/r/0399983bde729708773416b8488bac2cd5e022b8.1712639568.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau Signed-off-by: Sasha Levin --- tools/testing/selftests/bpf/test_sockmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c index 427ca00a3217..85d57633c8b6 100644 --- a/tools/testing/selftests/bpf/test_sockmap.c +++ b/tools/testing/selftests/bpf/test_sockmap.c @@ -2014,9 +2014,9 @@ int main(int argc, char **argv) free(options.whitelist); if (options.blacklist) free(options.blacklist); + close(cg_fd); if (cg_created) cleanup_cgroup_environment(); - close(cg_fd); return err; } From 3a28fbf533d88827d8f2df347ac9d9c4ea2c0664 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 11 May 2022 17:48:41 +0200 Subject: [PATCH 0051/1565] cpufreq: Reorganize checks in cpufreq_offline() [ Upstream commit e1e962c5b9edbc628a335bcdbd010331a12d3e5b ] Notice that cpufreq_offline() only needs to check policy_is_inactive() once and rearrange the code in there to make that happen. No expected functional impact. Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Stable-dep-of: b8f85833c057 ("cpufreq: exit() callback is optional") Signed-off-by: Sasha Levin --- drivers/cpufreq/cpufreq.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 5b4bca71f201..97999bda188d 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1573,24 +1573,18 @@ static int cpufreq_offline(unsigned int cpu) } down_write(&policy->rwsem); + if (has_target()) cpufreq_stop_governor(policy); cpumask_clear_cpu(cpu, policy->cpus); - if (policy_is_inactive(policy)) { - if (has_target()) - strncpy(policy->last_governor, policy->governor->name, - CPUFREQ_NAME_LEN); - else - policy->last_policy = policy->policy; - } else if (cpu == policy->cpu) { - /* Nominate new CPU */ - policy->cpu = cpumask_any(policy->cpus); - } - - /* Start governor again for active policy */ if (!policy_is_inactive(policy)) { + /* Nominate a new CPU if necessary. */ + if (cpu == policy->cpu) + policy->cpu = cpumask_any(policy->cpus); + + /* Start the governor again for the active policy. */ if (has_target()) { ret = cpufreq_start_governor(policy); if (ret) @@ -1600,6 +1594,12 @@ static int cpufreq_offline(unsigned int cpu) goto unlock; } + if (has_target()) + strncpy(policy->last_governor, policy->governor->name, + CPUFREQ_NAME_LEN); + else + policy->last_policy = policy->policy; + if (cpufreq_thermal_control_enabled(cpufreq_driver)) { cpufreq_cooling_unregister(policy->cdev); policy->cdev = NULL; From 19b06dec363b65adb0e873f519ea50038ceecfcc Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 11 May 2022 17:50:09 +0200 Subject: [PATCH 0052/1565] cpufreq: Split cpufreq_offline() [ Upstream commit fddd8f86dff4a24742a7f0322ccbb34c6c1c9850 ] Split the "core" part running under the policy rwsem out of cpufreq_offline() to allow the locking in cpufreq_remove_dev() to be rearranged more easily. As a side-effect this eliminates the unlock label that's not needed any more. No expected functional impact. Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Stable-dep-of: b8f85833c057 ("cpufreq: exit() callback is optional") Signed-off-by: Sasha Levin --- drivers/cpufreq/cpufreq.c | 33 +++++++++++++++++++-------------- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 97999bda188d..dd1acb55fe71 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1559,21 +1559,10 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) return 0; } -static int cpufreq_offline(unsigned int cpu) +static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy) { - struct cpufreq_policy *policy; int ret; - pr_debug("%s: unregistering CPU %u\n", __func__, cpu); - - policy = cpufreq_cpu_get_raw(cpu); - if (!policy) { - pr_debug("%s: No cpu_data found\n", __func__); - return 0; - } - - down_write(&policy->rwsem); - if (has_target()) cpufreq_stop_governor(policy); @@ -1591,7 +1580,7 @@ static int cpufreq_offline(unsigned int cpu) pr_err("%s: Failed to start governor\n", __func__); } - goto unlock; + return; } if (has_target()) @@ -1621,8 +1610,24 @@ static int cpufreq_offline(unsigned int cpu) cpufreq_driver->exit(policy); policy->freq_table = NULL; } +} + +static int cpufreq_offline(unsigned int cpu) +{ + struct cpufreq_policy *policy; + + pr_debug("%s: unregistering CPU %u\n", __func__, cpu); + + policy = cpufreq_cpu_get_raw(cpu); + if (!policy) { + pr_debug("%s: No cpu_data found\n", __func__); + return 0; + } + + down_write(&policy->rwsem); + + __cpufreq_offline(cpu, policy); -unlock: up_write(&policy->rwsem); return 0; } From 92aca16797e6f799737846887e31aaf1cf3cce96 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 11 May 2022 17:51:39 +0200 Subject: [PATCH 0053/1565] cpufreq: Rearrange locking in cpufreq_remove_dev() [ Upstream commit f339f3541701d824a0256ad4bf14c26ceb6d79c3 ] Currently, cpufreq_remove_dev() invokes the ->exit() driver callback without holding the policy rwsem which is inconsistent with what happens if ->exit() is invoked directly from cpufreq_offline(). It also manipulates the real_cpus mask and removes the CPU device symlink without holding the policy rwsem, but cpufreq_offline() holds the rwsem around the modifications thereof. For consistency, modify cpufreq_remove_dev() to hold the policy rwsem until the ->exit() callback has been called (or it has been determined that it is not necessary to call it). Signed-off-by: Rafael J. Wysocki Acked-by: Viresh Kumar Stable-dep-of: b8f85833c057 ("cpufreq: exit() callback is optional") Signed-off-by: Sasha Levin --- drivers/cpufreq/cpufreq.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index dd1acb55fe71..cbdea4fa600a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1645,19 +1645,26 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) if (!policy) return; + down_write(&policy->rwsem); + if (cpu_online(cpu)) - cpufreq_offline(cpu); + __cpufreq_offline(cpu, policy); cpumask_clear_cpu(cpu, policy->real_cpus); remove_cpu_dev_symlink(policy, dev); - if (cpumask_empty(policy->real_cpus)) { - /* We did light-weight exit earlier, do full tear down now */ - if (cpufreq_driver->offline) - cpufreq_driver->exit(policy); - - cpufreq_policy_free(policy); + if (!cpumask_empty(policy->real_cpus)) { + up_write(&policy->rwsem); + return; } + + /* We did light-weight exit earlier, do full tear down now */ + if (cpufreq_driver->offline) + cpufreq_driver->exit(policy); + + up_write(&policy->rwsem); + + cpufreq_policy_free(policy); } /** From dfc56ff5ec9904c008e9376d90a6d7e2d2bec4d3 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 12 Apr 2024 11:19:20 +0530 Subject: [PATCH 0054/1565] cpufreq: exit() callback is optional [ Upstream commit b8f85833c05730d631576008daaa34096bc7f3ce ] The exit() callback is optional and shouldn't be called without checking a valid pointer first. Also, we must clear freq_table pointer even if the exit() callback isn't present. Signed-off-by: Viresh Kumar Fixes: 91a12e91dc39 ("cpufreq: Allow light-weight tear down and bring up of CPUs") Fixes: f339f3541701 ("cpufreq: Rearrange locking in cpufreq_remove_dev()") Reported-by: Lizhe Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/cpufreq/cpufreq.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index cbdea4fa600a..162de402b113 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1606,10 +1606,13 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy) */ if (cpufreq_driver->offline) { cpufreq_driver->offline(policy); - } else if (cpufreq_driver->exit) { - cpufreq_driver->exit(policy); - policy->freq_table = NULL; + return; } + + if (cpufreq_driver->exit) + cpufreq_driver->exit(policy); + + policy->freq_table = NULL; } static int cpufreq_offline(unsigned int cpu) @@ -1659,7 +1662,7 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif) } /* We did light-weight exit earlier, do full tear down now */ - if (cpufreq_driver->offline) + if (cpufreq_driver->offline && cpufreq_driver->exit) cpufreq_driver->exit(policy); up_write(&policy->rwsem); From 0ce990e6efe8c5350bb2ad173d0ba97541fe2c22 Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:07 +0200 Subject: [PATCH 0055/1565] net: export inet_lookup_reuseport and inet6_lookup_reuseport [ Upstream commit ce796e60b3b196b61fcc565df195443cbb846ef0 ] Rename the existing reuseport helpers for IPv4 and IPv6 so that they can be invoked in the follow up commit. Export them so that building DCCP and IPv6 as a module works. No change in functionality. Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-3-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau Stable-dep-of: 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites") Signed-off-by: Sasha Levin --- include/net/inet6_hashtables.h | 7 +++++++ include/net/inet_hashtables.h | 5 +++++ net/ipv4/inet_hashtables.c | 15 ++++++++------- net/ipv6/inet6_hashtables.c | 19 ++++++++++--------- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 56f1286583d3..032ddab48f8f 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,6 +48,13 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + __be16 sport, + const struct in6_addr *daddr, + unsigned short hnum); + struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index c9e387d174c6..94c418d2e345 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -313,6 +313,11 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, unsigned short hnum); + static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index ad050f8476b8..bea88219d531 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -252,10 +252,10 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, int doff, - __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, unsigned short hnum) { struct sock *reuse_sk = NULL; u32 phash; @@ -266,6 +266,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, } return reuse_sk; } +EXPORT_SYMBOL_GPL(inet_lookup_reuseport); /* * Here are some nice properties to exploit here. The BSD API @@ -290,8 +291,8 @@ static struct sock *inet_lhash2_lookup(struct net *net, sk = (struct sock *)icsk; score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { - result = lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + result = inet_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum); if (result) return result; @@ -320,7 +321,7 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index b4a5e01e1201..2267f973e2df 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -113,12 +113,12 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, int doff, - const struct in6_addr *saddr, - __be16 sport, - const struct in6_addr *daddr, - unsigned short hnum) +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + __be16 sport, + const struct in6_addr *daddr, + unsigned short hnum) { struct sock *reuse_sk = NULL; u32 phash; @@ -129,6 +129,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, } return reuse_sk; } +EXPORT_SYMBOL_GPL(inet6_lookup_reuseport); /* called with rcu_read_lock() */ static struct sock *inet6_lhash2_lookup(struct net *net, @@ -146,8 +147,8 @@ static struct sock *inet6_lhash2_lookup(struct net *net, sk = (struct sock *)icsk; score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { - result = lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + result = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum); if (result) return result; @@ -178,7 +179,7 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); if (reuse_sk) sk = reuse_sk; return sk; From 452f8dc251f81b05a553bbef6a326952445b9794 Mon Sep 17 00:00:00 2001 From: Lorenz Bauer Date: Thu, 20 Jul 2023 17:30:08 +0200 Subject: [PATCH 0056/1565] net: remove duplicate reuseport_lookup functions [ Upstream commit 0f495f7617229772403e683033abc473f0f0553c ] There are currently four copies of reuseport_lookup: one each for (TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of those functions as well. This is already the case for sk_lookup helpers (inet,inet6,udp4,udp6)_lookup_run_bpf. There are two differences between the reuseport_lookup helpers: 1. They call different hash functions depending on protocol 2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED Move the check for sk_state into the caller and use the INDIRECT_CALL infrastructure to cut down the helpers to one per IP version. Reviewed-by: Kuniyuki Iwashima Signed-off-by: Lorenz Bauer Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-4-7021b683cdae@isovalent.com Signed-off-by: Martin KaFai Lau Stable-dep-of: 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites") Signed-off-by: Sasha Levin --- include/net/inet6_hashtables.h | 11 ++++++++- include/net/inet_hashtables.h | 15 ++++++++----- net/ipv4/inet_hashtables.c | 20 +++++++++++------ net/ipv4/udp.c | 34 +++++++++++----------------- net/ipv6/inet6_hashtables.c | 14 ++++++++---- net/ipv6/udp.c | 41 +++++++++++++--------------------- 6 files changed, 72 insertions(+), 63 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 032ddab48f8f..f89320b6fee3 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +typedef u32 (inet6_ehashfn_t)(const struct net *net, + const struct in6_addr *laddr, const u16 lport, + const struct in6_addr *faddr, const __be16 fport); + +inet6_ehashfn_t inet6_ehashfn; + +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn); + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, - unsigned short hnum); + unsigned short hnum, + inet6_ehashfn_t *ehashfn); struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 94c418d2e345..7778422cfac5 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -313,10 +313,19 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +typedef u32 (inet_ehashfn_t)(const struct net *net, + const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport); + +inet_ehashfn_t inet_ehashfn; + +INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn); + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum); + __be32 daddr, unsigned short hnum, + inet_ehashfn_t *ehashfn); static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, @@ -387,10 +396,6 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, refcounted); } -u32 inet6_ehashfn(const struct net *net, - const struct in6_addr *laddr, const u16 lport, - const struct in6_addr *faddr, const __be16 fport); - static inline void sk_daddr_set(struct sock *sk, __be32 addr) { sk->sk_daddr = addr; /* alias of inet_daddr */ diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index bea88219d531..56deddeac1b0 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -28,9 +28,9 @@ #include #include -static u32 inet_ehashfn(const struct net *net, const __be32 laddr, - const __u16 lport, const __be32 faddr, - const __be16 fport) +u32 inet_ehashfn(const struct net *net, const __be32 laddr, + const __u16 lport, const __be32 faddr, + const __be16 fport) { static u32 inet_ehash_secret __read_mostly; @@ -39,6 +39,7 @@ static u32 inet_ehashfn(const struct net *net, const __be32 laddr, return __inet_ehashfn(laddr, lport, faddr, fport, inet_ehash_secret + net_hash_mix(net)); } +EXPORT_SYMBOL_GPL(inet_ehashfn); /* This function handles inet_sock, but also timewait and request sockets * for IPv4/IPv6. @@ -252,16 +253,20 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +INDIRECT_CALLABLE_DECLARE(inet_ehashfn_t udp_ehashfn); + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) + __be32 daddr, unsigned short hnum, + inet_ehashfn_t *ehashfn) { struct sock *reuse_sk = NULL; u32 phash; if (sk->sk_reuseport) { - phash = inet_ehashfn(net, daddr, hnum, saddr, sport); + phash = INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn, + net, daddr, hnum, saddr, sport); reuse_sk = reuseport_select_sock(sk, phash, skb, doff); } return reuse_sk; @@ -292,7 +297,7 @@ static struct sock *inet_lhash2_lookup(struct net *net, score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { result = inet_lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + saddr, sport, daddr, hnum, inet_ehashfn); if (result) return result; @@ -321,7 +326,8 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum, + inet_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 16ff3962b24d..127aa7b5ce97 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -398,9 +398,9 @@ static int compute_score(struct sock *sk, struct net *net, return score; } -static u32 udp_ehashfn(const struct net *net, const __be32 laddr, - const __u16 lport, const __be32 faddr, - const __be16 fport) +INDIRECT_CALLABLE_SCOPE +u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport) { static u32 udp_ehash_secret __read_mostly; @@ -410,22 +410,6 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, udp_ehash_secret + net_hash_mix(net)); } -static struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, - __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) -{ - struct sock *reuse_sk = NULL; - u32 hash; - - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { - hash = udp_ehashfn(net, daddr, hnum, saddr, sport); - reuse_sk = reuseport_select_sock(sk, hash, skb, - sizeof(struct udphdr)); - } - return reuse_sk; -} - /* called with rcu_read_lock() */ static struct sock *udp4_lib_lookup2(struct net *net, __be32 saddr, __be16 sport, @@ -444,7 +428,14 @@ static struct sock *udp4_lib_lookup2(struct net *net, daddr, hnum, dif, sdif); if (score > badness) { badness = score; - result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + + if (sk->sk_state == TCP_ESTABLISHED) { + result = sk; + continue; + } + + result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp_ehashfn); if (!result) { result = sk; continue; @@ -483,7 +474,8 @@ static struct sock *udp4_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 2267f973e2df..21b4fc835487 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -41,6 +41,7 @@ u32 inet6_ehashfn(const struct net *net, return __inet6_ehashfn(lhash, lport, fhash, fport, inet6_ehash_secret + net_hash_mix(net)); } +EXPORT_SYMBOL_GPL(inet6_ehashfn); /* * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so @@ -113,18 +114,22 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +INDIRECT_CALLABLE_DECLARE(inet6_ehashfn_t udp6_ehashfn); + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, - unsigned short hnum) + unsigned short hnum, + inet6_ehashfn_t *ehashfn) { struct sock *reuse_sk = NULL; u32 phash; if (sk->sk_reuseport) { - phash = inet6_ehashfn(net, daddr, hnum, saddr, sport); + phash = INDIRECT_CALL_INET(ehashfn, udp6_ehashfn, inet6_ehashfn, + net, daddr, hnum, saddr, sport); reuse_sk = reuseport_select_sock(sk, phash, skb, doff); } return reuse_sk; @@ -148,7 +153,7 @@ static struct sock *inet6_lhash2_lookup(struct net *net, score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { result = inet6_lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + saddr, sport, daddr, hnum, inet6_ehashfn); if (result) return result; @@ -179,7 +184,8 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum, inet6_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 8c9672e7a7dd..1632b940347c 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -67,11 +67,12 @@ int udpv6_init_sock(struct sock *sk) return 0; } -static u32 udp6_ehashfn(const struct net *net, - const struct in6_addr *laddr, - const u16 lport, - const struct in6_addr *faddr, - const __be16 fport) +INDIRECT_CALLABLE_SCOPE +u32 udp6_ehashfn(const struct net *net, + const struct in6_addr *laddr, + const u16 lport, + const struct in6_addr *faddr, + const __be16 fport) { static u32 udp6_ehash_secret __read_mostly; static u32 udp_ipv6_hash_secret __read_mostly; @@ -155,24 +156,6 @@ static int compute_score(struct sock *sk, struct net *net, return score; } -static struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, - const struct in6_addr *saddr, - __be16 sport, - const struct in6_addr *daddr, - unsigned int hnum) -{ - struct sock *reuse_sk = NULL; - u32 hash; - - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { - hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); - reuse_sk = reuseport_select_sock(sk, hash, skb, - sizeof(struct udphdr)); - } - return reuse_sk; -} - /* called with rcu_read_lock() */ static struct sock *udp6_lib_lookup2(struct net *net, const struct in6_addr *saddr, __be16 sport, @@ -190,7 +173,14 @@ static struct sock *udp6_lib_lookup2(struct net *net, daddr, hnum, dif, sdif); if (score > badness) { badness = score; - result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + + if (sk->sk_state == TCP_ESTABLISHED) { + result = sk; + continue; + } + + result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp6_ehashfn); if (!result) { result = sk; continue; @@ -230,7 +220,8 @@ static inline struct sock *udp6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp6_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; From 69fab9d2e24a580c04d306014f756cb3c6bcc34d Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Fri, 12 Apr 2024 17:20:04 -0400 Subject: [PATCH 0057/1565] udp: Avoid call to compute_score on multiple sites [ Upstream commit 50aee97d15113b95a68848db1f0cb2a6c09f753a ] We've observed a 7-12% performance regression in iperf3 UDP ipv4 and ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present"). The failing tests were those that would spawn UDP sockets per-cpu on systems that have a high number of cpus. Unsurprisingly, it is not caused by the extra re-scoring of the reused socket, but due to the compiler no longer inlining compute_score, once it has the extra call site in udp4_lib_lookup2. This is augmented by the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus. We could just explicitly inline it, but compute_score() is quite a large function, around 300b. Inlining in two sites would almost double udp4_lib_lookup2, which is a silly thing to do just to workaround a mitigation. Instead, this patch shuffles the code a bit to avoid the multiple calls to compute_score. Since it is a static function used in one spot, the compiler can safely fold it in, as it did before, without increasing the text size. With this patch applied I ran my original iperf3 testcases. The failing cases all looked like this (ipv4): iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2 where $R is either 1G/10G/0 (max, unlimited). I ran 3 times each. baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of variation. ipv4: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386) patched 1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256) ipv6: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083) patched 1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519) This restores the performance we had before the change above with this benchmark. We obviously don't expect any real impact when mitigations are disabled, but just to be sure it also doesn't regresses: mitigations=off ipv4: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697) patched 3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882) Cc: Lorenz Bauer Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present") Signed-off-by: Gabriel Krisman Bertazi Reviewed-by: Kuniyuki Iwashima Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/udp.c | 21 ++++++++++++++++----- net/ipv6/udp.c | 20 ++++++++++++++++---- 2 files changed, 32 insertions(+), 9 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 127aa7b5ce97..da9015efb45e 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -420,15 +420,21 @@ static struct sock *udp4_lib_lookup2(struct net *net, { struct sock *sk, *result; int score, badness; + bool need_rescore; result = NULL; badness = 0; udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) { - score = compute_score(sk, net, saddr, sport, - daddr, hnum, dif, sdif); + need_rescore = false; +rescore: + score = compute_score(need_rescore ? result : sk, net, saddr, + sport, daddr, hnum, dif, sdif); if (score > badness) { badness = score; + if (need_rescore) + continue; + if (sk->sk_state == TCP_ESTABLISHED) { result = sk; continue; @@ -449,9 +455,14 @@ static struct sock *udp4_lib_lookup2(struct net *net, if (IS_ERR(result)) continue; - badness = compute_score(result, net, saddr, sport, - daddr, hnum, dif, sdif); - + /* compute_score is too long of a function to be + * inlined, and calling it again here yields + * measureable overhead for some + * workloads. Work around it by jumping + * backwards to rescore 'result'. + */ + need_rescore = true; + goto rescore; } } return result; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 1632b940347c..203a6d64d7e9 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -165,15 +165,21 @@ static struct sock *udp6_lib_lookup2(struct net *net, { struct sock *sk, *result; int score, badness; + bool need_rescore; result = NULL; badness = -1; udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) { - score = compute_score(sk, net, saddr, sport, - daddr, hnum, dif, sdif); + need_rescore = false; +rescore: + score = compute_score(need_rescore ? result : sk, net, saddr, + sport, daddr, hnum, dif, sdif); if (score > badness) { badness = score; + if (need_rescore) + continue; + if (sk->sk_state == TCP_ESTABLISHED) { result = sk; continue; @@ -194,8 +200,14 @@ static struct sock *udp6_lib_lookup2(struct net *net, if (IS_ERR(result)) continue; - badness = compute_score(sk, net, saddr, sport, - daddr, hnum, dif, sdif); + /* compute_score is too long of a function to be + * inlined, and calling it again here yields + * measureable overhead for some + * workloads. Work around it by jumping + * backwards to rescore 'result'. + */ + need_rescore = true; + goto rescore; } } return result; From 0dc60ee1ed22d9a16550a905d07dcc265ffad314 Mon Sep 17 00:00:00 2001 From: Xingui Yang Date: Tue, 12 Mar 2024 14:11:03 +0000 Subject: [PATCH 0058/1565] scsi: libsas: Fix the failure of adding phy with zero-address to port [ Upstream commit 06036a0a5db34642c5dbe22021a767141f010b7a ] As of commit 7d1d86518118 ("[SCSI] libsas: fix false positive 'device attached' conditions"), reset the phy->entacted_sas_addr address to a zero-address when the link rate is less than 1.5G. Currently we find that when a new device is attached, and the link rate is less than 1.5G, but the device type is not NO_DEVICE, for example: the link rate is SAS_PHY_RESET_IN_PROGRESS and the device type is stp. After setting the phy->entacted_sas_addr address to the zero address, the port will continue to be created for the phy with the zero-address, and other phys with the zero-address will be tried to be added to the new port: [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: 0000000000000000 (no device) // phy19 is deleted but still on the parent port's phy_list [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: 0000000000000000 (stp) [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's already part of another port Therefore, it should be the same as sas_get_phy_attached_dev(). Only when device_type is SAS_PHY_UNUSED, sas_address is set to the 0 address. Fixes: 7d1d86518118 ("[SCSI] libsas: fix false positive 'device attached' conditions") Signed-off-by: Xingui Yang Link: https://lore.kernel.org/r/20240312141103.31358-5-yangxingui@huawei.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/libsas/sas_expander.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 8444a4287ac1..8681e0cdfa5e 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -256,8 +256,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id, void *rsp) /* help some expanders that fail to zero sas_address in the 'no * device' case */ - if (phy->attached_dev_type == SAS_PHY_UNUSED || - phy->linkrate < SAS_LINK_RATE_1_5_GBPS) + if (phy->attached_dev_type == SAS_PHY_UNUSED) memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE); else memcpy(phy->attached_sas_addr, dr->attached_sas_addr, SAS_ADDR_SIZE); From 176fb7770d36a9f3ce81768b607a5b2eb961f2c1 Mon Sep 17 00:00:00 2001 From: Yuri Karpov Date: Tue, 12 Mar 2024 20:04:47 +0300 Subject: [PATCH 0059/1565] scsi: hpsa: Fix allocation size for Scsi_Host private data [ Upstream commit 504e2bed5d50610c1836046c0c195b0a6dba9c72 ] struct Scsi_Host private data contains pointer to struct ctlr_info. Restore allocation of only 8 bytes to store pointer in struct Scsi_Host private data area. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: bbbd25499100 ("scsi: hpsa: Fix allocation size for scsi_host_alloc()") Signed-off-by: Yuri Karpov Link: https://lore.kernel.org/r/20240312170447.743709-1-YKarpov@ispras.ru Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/hpsa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index a44a098dbb9c..4e7ec83c07cf 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -5834,7 +5834,7 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h) { struct Scsi_Host *sh; - sh = scsi_host_alloc(&hpsa_driver_template, sizeof(struct ctlr_info)); + sh = scsi_host_alloc(&hpsa_driver_template, sizeof(struct ctlr_info *)); if (sh == NULL) { dev_err(&h->pdev->dev, "scsi_host_alloc failed\n"); return -ENOMEM; From 1b27468dbe58ae0f7b9a591dfa8fa4d6b5597cee Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 18 Apr 2024 22:17:06 +0200 Subject: [PATCH 0060/1565] x86/purgatory: Switch to the position-independent small code model [ Upstream commit cba786af84a0f9716204e09f518ce3b7ada8555e ] On x86, the ordinary, position dependent small and kernel code models only support placement of the executable in 32-bit addressable memory, due to the use of 32-bit signed immediates to generate references to global variables. For the kernel, this implies that all global variables must reside in the top 2 GiB of the kernel virtual address space, where the implicit address bits 63:32 are equal to sign bit 31. This means the kernel code model is not suitable for other bare metal executables such as the kexec purgatory, which can be placed arbitrarily in the physical address space, where its address may no longer be representable as a sign extended 32-bit quantity. For this reason, commit e16c2983fba0 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors") switched to the large code model, which uses 64-bit immediates for all symbol references, including function calls, in order to avoid relying on any assumptions regarding proximity of symbols in the final executable. The large code model is rarely used, clunky and the least likely to operate in a similar fashion when comparing GCC and Clang, so it is best avoided. This is especially true now that Clang 18 has started to emit executable code in two separate sections (.text and .ltext), which triggers an issue in the kexec loading code at runtime. The SUSE bugzilla fixes tag points to gcc 13 having issues with the large model too and that perhaps the large model should simply not be used at all. Instead, use the position independent small code model, which makes no assumptions about placement but only about proximity, where all referenced symbols must be within -/+ 2 GiB, i.e., in range for a RIP-relative reference. Use hidden visibility to suppress the use of a GOT, which carries absolute addresses that are not covered by static ELF relocations, and is therefore incompatible with the kexec loader's relocation logic. [ bp: Massage commit message. ] Fixes: e16c2983fba0 ("x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors") Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1211853 Closes: https://github.com/ClangBuiltLinux/linux/issues/2016 Signed-off-by: Ard Biesheuvel Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Nathan Chancellor Reviewed-by: Fangrui Song Acked-by: Nick Desaulniers Tested-by: Nathan Chancellor Link: https://lore.kernel.org/all/20240417-x86-fix-kexec-with-llvm-18-v1-0-5383121e8fb7@kernel.org/ Signed-off-by: Sasha Levin --- arch/x86/purgatory/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile index dc0b91c1db04..7c7bfdf0d0c0 100644 --- a/arch/x86/purgatory/Makefile +++ b/arch/x86/purgatory/Makefile @@ -37,7 +37,8 @@ KCOV_INSTRUMENT := n # make up the standalone purgatory.ro PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel -PURGATORY_CFLAGS := -mcmodel=large -ffreestanding -fno-zero-initialized-in-bss -g0 +PURGATORY_CFLAGS := -mcmodel=small -ffreestanding -fno-zero-initialized-in-bss -g0 +PURGATORY_CFLAGS += -fpic -fvisibility=hidden PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING PURGATORY_CFLAGS += -fno-stack-protector From 5c4756e0fb0c385c390a32fa2e5f4b3710a6c1d2 Mon Sep 17 00:00:00 2001 From: Su Hui Date: Mon, 22 Apr 2024 11:42:44 +0800 Subject: [PATCH 0061/1565] wifi: ath10k: Fix an error code problem in ath10k_dbg_sta_write_peer_debug_trigger() [ Upstream commit c511a9c12674d246916bb16c479d496b76983193 ] Clang Static Checker (scan-build) warns: drivers/net/wireless/ath/ath10k/debugfs_sta.c:line 429, column 3 Value stored to 'ret' is never read. Return 'ret' rather than 'count' when 'ret' stores an error code. Fixes: ee8b08a1be82 ("ath10k: add debugfs support to get per peer tids log via tracing") Signed-off-by: Su Hui Acked-by: Jeff Johnson Signed-off-by: Kalle Valo Link: https://msgid.link/20240422034243.938962-1-suhui@nfschina.com Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/ath10k/debugfs_sta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c index 367539f2c370..f7912c72cba3 100644 --- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c +++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c @@ -438,7 +438,7 @@ ath10k_dbg_sta_write_peer_debug_trigger(struct file *file, } out: mutex_unlock(&ar->conf_mutex); - return count; + return ret ?: count; } static const struct file_operations fops_peer_debug_trigger = { From 88aa40df8ee4718cab47e11d1206c4d1119d652d Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Tue, 30 Jan 2024 08:47:06 +0200 Subject: [PATCH 0062/1565] wifi: ath10k: populate board data for WCN3990 [ Upstream commit f1f1b5b055c9f27a2f90fd0f0521f5920e9b3c18 ] Specify board data size (and board.bin filename) for the WCN3990 platform. Reported-by: Yongqin Liu Fixes: 03a72288c546 ("ath10k: wmi: add hw params entry for wcn3990") Signed-off-by: Dmitry Baryshkov Signed-off-by: Kalle Valo Link: https://msgid.link/20240130-wcn3990-board-fw-v1-1-738f7c19a8c8@linaro.org Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/ath10k/core.c | 3 +++ drivers/net/wireless/ath/ath10k/hw.h | 1 + drivers/net/wireless/ath/ath10k/targaddrs.h | 3 +++ 3 files changed, 7 insertions(+) diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c index 57ac80997319..d03a36c45f9f 100644 --- a/drivers/net/wireless/ath/ath10k/core.c +++ b/drivers/net/wireless/ath/ath10k/core.c @@ -625,6 +625,9 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = { .max_spatial_stream = 4, .fw = { .dir = WCN3990_HW_1_0_FW_DIR, + .board = WCN3990_HW_1_0_BOARD_DATA_FILE, + .board_size = WCN3990_BOARD_DATA_SZ, + .board_ext_size = WCN3990_BOARD_EXT_DATA_SZ, }, .sw_decrypt_mcast_mgmt = true, .hw_ops = &wcn3990_ops, diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h index d3ef83ad577d..c4df0e4e161b 100644 --- a/drivers/net/wireless/ath/ath10k/hw.h +++ b/drivers/net/wireless/ath/ath10k/hw.h @@ -132,6 +132,7 @@ enum qca9377_chip_id_rev { /* WCN3990 1.0 definitions */ #define WCN3990_HW_1_0_DEV_VERSION ATH10K_HW_WCN3990 #define WCN3990_HW_1_0_FW_DIR ATH10K_FW_DIR "/WCN3990/hw1.0" +#define WCN3990_HW_1_0_BOARD_DATA_FILE "board.bin" #define ATH10K_FW_FILE_BASE "firmware" #define ATH10K_FW_API_MAX 6 diff --git a/drivers/net/wireless/ath/ath10k/targaddrs.h b/drivers/net/wireless/ath/ath10k/targaddrs.h index ec556bb88d65..ba37e6c7ced0 100644 --- a/drivers/net/wireless/ath/ath10k/targaddrs.h +++ b/drivers/net/wireless/ath/ath10k/targaddrs.h @@ -491,4 +491,7 @@ struct host_interest { #define QCA4019_BOARD_DATA_SZ 12064 #define QCA4019_BOARD_EXT_DATA_SZ 0 +#define WCN3990_BOARD_DATA_SZ 26328 +#define WCN3990_BOARD_EXT_DATA_SZ 0 + #endif /* __TARGADDRS_H__ */ From bc7cae63fa39d8c3057ff2f23522f80f14fba526 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 23 Apr 2024 12:56:20 +0000 Subject: [PATCH 0063/1565] tcp: avoid premature drops in tcp_add_backlog() [ Upstream commit ec00ed472bdb7d0af840da68c8c11bff9f4d9caa ] While testing TCP performance with latest trees, I saw suspect SOCKET_BACKLOG drops. tcp_add_backlog() computes its limit with : limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1); limit += 64 * 1024; This does not take into account that sk->sk_backlog.len is reset only at the very end of __release_sock(). Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach sk_rcvbuf in normal conditions. We should double sk->sk_rcvbuf contribution in the formula to absorb bubbles in the backlog, which happen more often for very fast flows. This change maintains decent protection against abuses. Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv4/tcp_ipv4.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 85d8688933f3..0e7179a19e22 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1787,7 +1787,7 @@ int tcp_v4_early_demux(struct sk_buff *skb) bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) { - u32 limit, tail_gso_size, tail_gso_segs; + u32 tail_gso_size, tail_gso_segs; struct skb_shared_info *shinfo; const struct tcphdr *th; struct tcphdr *thtail; @@ -1796,6 +1796,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) bool fragstolen; u32 gso_segs; u32 gso_size; + u64 limit; int delta; /* In case all data was pulled from skb frags (in __pskb_pull_tail()), @@ -1891,7 +1892,13 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) __skb_push(skb, hdrlen); no_coalesce: - limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1); + /* sk->sk_backlog.len is reset only at the end of __release_sock(). + * Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach + * sk_rcvbuf in normal conditions. + */ + limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1; + + limit += ((u32)READ_ONCE(sk->sk_sndbuf)) >> 1; /* Only socket owner can try to collapse/prune rx queues * to reduce memory overhead, so add a little headroom here. @@ -1899,6 +1906,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb) */ limit += 64 * 1024; + limit = min_t(u64, limit, UINT_MAX); + if (unlikely(sk_add_backlog(sk, skb, limit))) { bh_unlock_sock(sk); __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP); From 9d132029224f0c35fe0e1a4193108a579cb168a4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 26 Apr 2024 06:42:22 +0000 Subject: [PATCH 0064/1565] net: give more chances to rcu in netdev_wait_allrefs_any() [ Upstream commit cd42ba1c8ac9deb9032add6adf491110e7442040 ] This came while reviewing commit c4e86b4363ac ("net: add two more call_rcu_hurry()"). Paolo asked if adding one synchronize_rcu() would help. While synchronize_rcu() does not help, making sure to call rcu_barrier() before msleep(wait) is definitely helping to make sure lazy call_rcu() are completed. Instead of waiting ~100 seconds in my tests, the ref_tracker splats occurs one time only, and netdev_wait_allrefs_any() latency is reduced to the strict minimum. Ideally we should audit our call_rcu() users to make sure no refcount (or cascading call_rcu()) is held too long, because rcu_barrier() is quite expensive. Fixes: 0e4be9e57e8c ("net: use exponential backoff in netdev_wait_allrefs") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/all/28bbf698-befb-42f6-b561-851c67f464aa@kernel.org/T/#m76d73ed6b03cd930778ac4d20a777f22a08d6824 Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/core/dev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 0e2c433bebcd..5e91496fd3a3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10217,8 +10217,9 @@ static void netdev_wait_allrefs(struct net_device *dev) rebroadcast_time = jiffies; } + rcu_barrier(); + if (!wait) { - rcu_barrier(); wait = WAIT_REFS_MIN_MSECS; } else { msleep(wait); From 280619bbdeac186fb320fab3d61122d2a085def8 Mon Sep 17 00:00:00 2001 From: Finn Thain Date: Wed, 13 Mar 2024 13:53:41 +1100 Subject: [PATCH 0065/1565] macintosh/via-macii: Fix "BUG: sleeping function called from invalid context" [ Upstream commit d301a71c76ee4c384b4e03cdc320a55f5cf1df05 ] The via-macii ADB driver calls request_irq() after disabling hard interrupts. But disabling interrupts isn't necessary here because the VIA shift register interrupt was masked during VIA1 initialization. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/419fcc09d0e563b425c419053d02236b044d86b0.1710298421.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven Signed-off-by: Sasha Levin --- drivers/macintosh/via-macii.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c index 060e03f2264b..6adb34d1e381 100644 --- a/drivers/macintosh/via-macii.c +++ b/drivers/macintosh/via-macii.c @@ -142,24 +142,19 @@ static int macii_probe(void) /* Initialize the driver */ static int macii_init(void) { - unsigned long flags; int err; - local_irq_save(flags); - err = macii_init_via(); if (err) - goto out; + return err; err = request_irq(IRQ_MAC_ADB, macii_interrupt, 0, "ADB", macii_interrupt); if (err) - goto out; + return err; macii_state = idle; -out: - local_irq_restore(flags); - return err; + return 0; } /* initialize the hardware */ From 8650725bb0a48b206d5a8ddad3a7488f9a5985b7 Mon Sep 17 00:00:00 2001 From: Nikita Zhandarovich Date: Mon, 22 Apr 2024 11:33:55 -0700 Subject: [PATCH 0066/1565] wifi: carl9170: add a proper sanity check for endpoints [ Upstream commit b6dd09b3dac89b45d1ea3e3bd035a3859c0369a0 ] Syzkaller reports [1] hitting a warning which is caused by presence of a wrong endpoint type at the URB sumbitting stage. While there was a check for a specific 4th endpoint, since it can switch types between bulk and interrupt, other endpoints are trusted implicitly. Similar warning is triggered in a couple of other syzbot issues [2]. Fix the issue by doing a comprehensive check of all endpoints taking into account difference between high- and full-speed configuration. [1] Syzkaller report: ... WARNING: CPU: 0 PID: 4721 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 ... Call Trace: carl9170_usb_send_rx_irq_urb+0x273/0x340 drivers/net/wireless/ath/carl9170/usb.c:504 carl9170_usb_init_device drivers/net/wireless/ath/carl9170/usb.c:939 [inline] carl9170_usb_firmware_finish drivers/net/wireless/ath/carl9170/usb.c:999 [inline] carl9170_usb_firmware_step2+0x175/0x240 drivers/net/wireless/ath/carl9170/usb.c:1028 request_firmware_work_func+0x130/0x240 drivers/base/firmware_loader/main.c:1107 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 [2] Related syzkaller crashes: Link: https://syzkaller.appspot.com/bug?extid=e394db78ae0b0032cb4d Link: https://syzkaller.appspot.com/bug?extid=9468df99cb63a4a4c4e1 Reported-and-tested-by: syzbot+0ae4804973be759fa420@syzkaller.appspotmail.com Fixes: a84fab3cbfdc ("carl9170: 802.11 rx/tx processing and usb backend") Signed-off-by: Nikita Zhandarovich Acked-By: Christian Lamparter Signed-off-by: Kalle Valo Link: https://msgid.link/20240422183355.3785-1-n.zhandarovich@fintech.ru Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/carl9170/usb.c | 32 +++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c index e4eb666c6eea..a5265997b576 100644 --- a/drivers/net/wireless/ath/carl9170/usb.c +++ b/drivers/net/wireless/ath/carl9170/usb.c @@ -1069,6 +1069,38 @@ static int carl9170_usb_probe(struct usb_interface *intf, ar->usb_ep_cmd_is_bulk = true; } + /* Verify that all expected endpoints are present */ + if (ar->usb_ep_cmd_is_bulk) { + u8 bulk_ep_addr[] = { + AR9170_USB_EP_RX | USB_DIR_IN, + AR9170_USB_EP_TX | USB_DIR_OUT, + AR9170_USB_EP_CMD | USB_DIR_OUT, + 0}; + u8 int_ep_addr[] = { + AR9170_USB_EP_IRQ | USB_DIR_IN, + 0}; + if (!usb_check_bulk_endpoints(intf, bulk_ep_addr) || + !usb_check_int_endpoints(intf, int_ep_addr)) + err = -ENODEV; + } else { + u8 bulk_ep_addr[] = { + AR9170_USB_EP_RX | USB_DIR_IN, + AR9170_USB_EP_TX | USB_DIR_OUT, + 0}; + u8 int_ep_addr[] = { + AR9170_USB_EP_IRQ | USB_DIR_IN, + AR9170_USB_EP_CMD | USB_DIR_OUT, + 0}; + if (!usb_check_bulk_endpoints(intf, bulk_ep_addr) || + !usb_check_int_endpoints(intf, int_ep_addr)) + err = -ENODEV; + } + + if (err) { + carl9170_free(ar); + return err; + } + usb_set_intfdata(intf, ar); SET_IEEE80211_DEV(ar->hw, &intf->dev); From ee25389df80138907bc9dcdf4a2be2067cde9a81 Mon Sep 17 00:00:00 2001 From: Nikita Zhandarovich Date: Mon, 8 Apr 2024 05:14:25 -0700 Subject: [PATCH 0067/1565] wifi: ar5523: enable proper endpoint verification [ Upstream commit e120b6388d7d88635d67dcae6483f39c37111850 ] Syzkaller reports [1] hitting a warning about an endpoint in use not having an expected type to it. Fix the issue by checking for the existence of all proper endpoints with their according types intact. Sadly, this patch has not been tested on real hardware. [1] Syzkaller report: ------------[ cut here ]------------ usb 1-1: BOGUS urb xfer, pipe 3 != type 1 WARNING: CPU: 0 PID: 3643 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 ... Call Trace: ar5523_cmd+0x41b/0x780 drivers/net/wireless/ath/ar5523/ar5523.c:275 ar5523_cmd_read drivers/net/wireless/ath/ar5523/ar5523.c:302 [inline] ar5523_host_available drivers/net/wireless/ath/ar5523/ar5523.c:1376 [inline] ar5523_probe+0x14b0/0x1d10 drivers/net/wireless/ath/ar5523/ar5523.c:1655 usb_probe_interface+0x30f/0x7f0 drivers/usb/core/driver.c:396 call_driver_probe drivers/base/dd.c:560 [inline] really_probe+0x249/0xb90 drivers/base/dd.c:639 __driver_probe_device+0x1df/0x4d0 drivers/base/dd.c:778 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:808 __device_attach_driver+0x1d4/0x2e0 drivers/base/dd.c:936 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:427 __device_attach+0x1e4/0x530 drivers/base/dd.c:1008 bus_probe_device+0x1e8/0x2a0 drivers/base/bus.c:487 device_add+0xbd9/0x1e90 drivers/base/core.c:3517 usb_set_configuration+0x101d/0x1900 drivers/usb/core/message.c:2170 usb_generic_driver_probe+0xbe/0x100 drivers/usb/core/generic.c:238 usb_probe_device+0xd8/0x2c0 drivers/usb/core/driver.c:293 call_driver_probe drivers/base/dd.c:560 [inline] really_probe+0x249/0xb90 drivers/base/dd.c:639 __driver_probe_device+0x1df/0x4d0 drivers/base/dd.c:778 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:808 __device_attach_driver+0x1d4/0x2e0 drivers/base/dd.c:936 bus_for_each_drv+0x163/0x1e0 drivers/base/bus.c:427 __device_attach+0x1e4/0x530 drivers/base/dd.c:1008 bus_probe_device+0x1e8/0x2a0 drivers/base/bus.c:487 device_add+0xbd9/0x1e90 drivers/base/core.c:3517 usb_new_device.cold+0x685/0x10ad drivers/usb/core/hub.c:2573 hub_port_connect drivers/usb/core/hub.c:5353 [inline] hub_port_connect_change drivers/usb/core/hub.c:5497 [inline] port_event drivers/usb/core/hub.c:5653 [inline] hub_event+0x26cb/0x45d0 drivers/usb/core/hub.c:5735 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x669/0x1090 kernel/workqueue.c:2436 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 Reported-and-tested-by: syzbot+1bc2c2afd44f820a669f@syzkaller.appspotmail.com Fixes: b7d572e1871d ("ar5523: Add new driver") Signed-off-by: Nikita Zhandarovich Signed-off-by: Kalle Valo Link: https://msgid.link/20240408121425.29392-1-n.zhandarovich@fintech.ru Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/ar5523/ar5523.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c index efe38b2c1df7..71c2bf8817dc 100644 --- a/drivers/net/wireless/ath/ar5523/ar5523.c +++ b/drivers/net/wireless/ath/ar5523/ar5523.c @@ -1590,6 +1590,20 @@ static int ar5523_probe(struct usb_interface *intf, struct ar5523 *ar; int error = -ENOMEM; + static const u8 bulk_ep_addr[] = { + AR5523_CMD_TX_PIPE | USB_DIR_OUT, + AR5523_DATA_TX_PIPE | USB_DIR_OUT, + AR5523_CMD_RX_PIPE | USB_DIR_IN, + AR5523_DATA_RX_PIPE | USB_DIR_IN, + 0}; + + if (!usb_check_bulk_endpoints(intf, bulk_ep_addr)) { + dev_err(&dev->dev, + "Could not find all expected endpoints\n"); + error = -ENODEV; + goto out; + } + /* * Load firmware if the device requires it. This will return * -ENXIO on success and we'll get called back afer the usb From 3557a7fc5cbd4d99870a2460b748bcf5ddae08bd Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Mar 2024 22:02:30 +0100 Subject: [PATCH 0068/1565] sh: kprobes: Merge arch_copy_kprobe() into arch_prepare_kprobe() [ Upstream commit 1422ae080b66134fe192082d9b721ab7bd93fcc5 ] arch/sh/kernel/kprobes.c:52:16: warning: no previous prototype for 'arch_copy_kprobe' [-Wmissing-prototypes] Although SH kprobes support was only merged in v2.6.28, it missed the earlier removal of the arch_copy_kprobe() callback in v2.6.15. Based on the powerpc part of commit 49a2a1b83ba6fa40 ("[PATCH] kprobes: changed from using spinlock to mutex"). Fixes: d39f5450146ff39f ("sh: Add kprobes support.") Signed-off-by: Geert Uytterhoeven Reviewed-by: John Paul Adrian Glaubitz Link: https://lore.kernel.org/r/717d47a19689cc944fae6e981a1ad7cae1642c89.1709326528.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz Signed-off-by: Sasha Levin --- arch/sh/kernel/kprobes.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c index 756100b01e84..849801373219 100644 --- a/arch/sh/kernel/kprobes.c +++ b/arch/sh/kernel/kprobes.c @@ -44,17 +44,12 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) if (OPCODE_RTE(opcode)) return -EFAULT; /* Bad breakpoint */ + memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); p->opcode = opcode; return 0; } -void __kprobes arch_copy_kprobe(struct kprobe *p) -{ - memcpy(p->ainsn.insn, p->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); - p->opcode = *p->addr; -} - void __kprobes arch_arm_kprobe(struct kprobe *p) { *p->addr = BREAKPOINT_INSTRUCTION; From 019ae041a568e49f676f4c0448a0abcd0217da48 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Sun, 24 Mar 2024 16:18:04 -0700 Subject: [PATCH 0069/1565] Revert "sh: Handle calling csum_partial with misaligned data" [ Upstream commit b5319c96292ff877f6b58d349acf0a9dc8d3b454 ] This reverts commit cadc4e1a2b4d20d0cc0e81f2c6ba0588775e54e5. Commit cadc4e1a2b4d ("sh: Handle calling csum_partial with misaligned data") causes bad checksum calculations on unaligned data. Reverting it fixes the problem. # Subtest: checksum # module: checksum_kunit 1..5 # test_csum_fixed_random_inputs: ASSERTION FAILED at lib/checksum_kunit.c:500 Expected ( u64)result == ( u64)expec, but ( u64)result == 53378 (0xd082) ( u64)expec == 33488 (0x82d0) # test_csum_fixed_random_inputs: pass:0 fail:1 skip:0 total:1 not ok 1 test_csum_fixed_random_inputs # test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:525 Expected ( u64)result == ( u64)expec, but ( u64)result == 65281 (0xff01) ( u64)expec == 65280 (0xff00) # test_csum_all_carry_inputs: pass:0 fail:1 skip:0 total:1 not ok 2 test_csum_all_carry_inputs # test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:573 Expected ( u64)result == ( u64)expec, but ( u64)result == 65535 (0xffff) ( u64)expec == 65534 (0xfffe) # test_csum_no_carry_inputs: pass:0 fail:1 skip:0 total:1 not ok 3 test_csum_no_carry_inputs # test_ip_fast_csum: pass:1 fail:0 skip:0 total:1 ok 4 test_ip_fast_csum # test_csum_ipv6_magic: pass:1 fail:0 skip:0 total:1 ok 5 test_csum_ipv6_magic # checksum: pass:2 fail:3 skip:0 total:5 # Totals: pass:2 fail:3 skip:0 total:5 not ok 22 checksum Fixes: cadc4e1a2b4d ("sh: Handle calling csum_partial with misaligned data") Signed-off-by: Guenter Roeck Tested-by: Geert Uytterhoeven Reviewed-by: John Paul Adrian Glaubitz Link: https://lore.kernel.org/r/20240324231804.841099-1-linux@roeck-us.net Signed-off-by: John Paul Adrian Glaubitz Signed-off-by: Sasha Levin --- arch/sh/lib/checksum.S | 67 ++++++++++++------------------------------ 1 file changed, 18 insertions(+), 49 deletions(-) diff --git a/arch/sh/lib/checksum.S b/arch/sh/lib/checksum.S index 3e07074e0098..06fed5a21e8b 100644 --- a/arch/sh/lib/checksum.S +++ b/arch/sh/lib/checksum.S @@ -33,7 +33,8 @@ */ /* - * asmlinkage __wsum csum_partial(const void *buf, int len, __wsum sum); + * unsigned int csum_partial(const unsigned char *buf, int len, + * unsigned int sum); */ .text @@ -45,31 +46,11 @@ ENTRY(csum_partial) * Fortunately, it is easy to convert 2-byte alignment to 4-byte * alignment for the unrolled loop. */ + mov r5, r1 mov r4, r0 - tst #3, r0 ! Check alignment. - bt/s 2f ! Jump if alignment is ok. - mov r4, r7 ! Keep a copy to check for alignment + tst #2, r0 ! Check alignment. + bt 2f ! Jump if alignment is ok. ! - tst #1, r0 ! Check alignment. - bt 21f ! Jump if alignment is boundary of 2bytes. - - ! buf is odd - tst r5, r5 - add #-1, r5 - bt 9f - mov.b @r4+, r0 - extu.b r0, r0 - addc r0, r6 ! t=0 from previous tst - mov r6, r0 - shll8 r6 - shlr16 r0 - shlr8 r0 - or r0, r6 - mov r4, r0 - tst #2, r0 - bt 2f -21: - ! buf is 2 byte aligned (len could be 0) add #-2, r5 ! Alignment uses up two bytes. cmp/pz r5 ! bt/s 1f ! Jump if we had at least two bytes. @@ -77,17 +58,16 @@ ENTRY(csum_partial) bra 6f add #2, r5 ! r5 was < 2. Deal with it. 1: + mov r5, r1 ! Save new len for later use. mov.w @r4+, r0 extu.w r0, r0 addc r0, r6 bf 2f add #1, r6 2: - ! buf is 4 byte aligned (len could be 0) - mov r5, r1 mov #-5, r0 - shld r0, r1 - tst r1, r1 + shld r0, r5 + tst r5, r5 bt/s 4f ! if it's =0, go to 4f clrt .align 2 @@ -109,31 +89,30 @@ ENTRY(csum_partial) addc r0, r6 addc r2, r6 movt r0 - dt r1 + dt r5 bf/s 3b cmp/eq #1, r0 - ! here, we know r1==0 - addc r1, r6 ! add carry to r6 + ! here, we know r5==0 + addc r5, r6 ! add carry to r6 4: - mov r5, r0 + mov r1, r0 and #0x1c, r0 tst r0, r0 - bt 6f - ! 4 bytes or more remaining - mov r0, r1 - shlr2 r1 + bt/s 6f + mov r0, r5 + shlr2 r5 mov #0, r2 5: addc r2, r6 mov.l @r4+, r2 movt r0 - dt r1 + dt r5 bf/s 5b cmp/eq #1, r0 addc r2, r6 - addc r1, r6 ! r1==0 here, so it means add carry-bit + addc r5, r6 ! r5==0 here, so it means add carry-bit 6: - ! 3 bytes or less remaining + mov r1, r5 mov #3, r0 and r0, r5 tst r5, r5 @@ -159,16 +138,6 @@ ENTRY(csum_partial) mov #0, r0 addc r0, r6 9: - ! Check if the buffer was misaligned, if so realign sum - mov r7, r0 - tst #1, r0 - bt 10f - mov r6, r0 - shll8 r6 - shlr16 r0 - shlr8 r0 - or r0, r6 -10: rts mov r6, r0 From 2bd97a0868b0f6ad81cc21299df4017b083d639c Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Thu, 2 May 2024 18:58:20 -0700 Subject: [PATCH 0070/1565] selftests/binderfs: use the Makefile's rules, not Make's implicit rules [ Upstream commit 019baf635eb6ffe8d6c1343f81788f02a7e0ed98 ] First of all, in order to build with clang at all, one must first apply Valentin Obst's build fix for LLVM [1]. Once that is done, then when building with clang, via: make LLVM=1 -C tools/testing/selftests ...the following error occurs: clang: error: cannot specify -o when generating multiple output files This is because clang, unlike gcc, won't accept invocations of this form: clang file1.c header2.h While trying to fix this, I noticed that: a) selftests/lib.mk already avoids the problem, and b) The binderfs Makefile indavertently bypasses the selftests/lib.mk build system, and quitely uses Make's implicit build rules for .c files instead. The Makefile attempts to set up both a dependency and a source file, neither of which was needed, because lib.mk is able to automatically handle both. This line: binderfs_test: binderfs_test.c ...causes Make's implicit rules to run, which builds binderfs_test without ever looking at lib.mk. Fix this by simply deleting the "binderfs_test:" Makefile target and letting lib.mk handle it instead. [1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c49f@valentinobst.de/ Fixes: 6e29225af902 ("binderfs: port tests to test harness infrastructure") Cc: Christian Brauner Signed-off-by: John Hubbard Reviewed-by: Christian Brauner Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- tools/testing/selftests/filesystems/binderfs/Makefile | 2 -- 1 file changed, 2 deletions(-) diff --git a/tools/testing/selftests/filesystems/binderfs/Makefile b/tools/testing/selftests/filesystems/binderfs/Makefile index 8af25ae96049..24d8910c7ab5 100644 --- a/tools/testing/selftests/filesystems/binderfs/Makefile +++ b/tools/testing/selftests/filesystems/binderfs/Makefile @@ -3,6 +3,4 @@ CFLAGS += -I../../../../../usr/include/ -pthread TEST_GEN_PROGS := binderfs_test -binderfs_test: binderfs_test.c ../../kselftest.h ../../kselftest_harness.h - include ../../lib.mk From 2321281f19b3cce9a2224384de4e3137a0756eed Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Mon, 29 Apr 2024 16:54:22 +0800 Subject: [PATCH 0071/1565] HID: intel-ish-hid: ipc: Add check for pci_alloc_irq_vectors [ Upstream commit 6baa4524027fd64d7ca524e1717c88c91a354b93 ] Add a check for the return value of pci_alloc_irq_vectors() and return error if it fails. [jkosina@suse.com: reworded changelog based on Srinivas' suggestion] Fixes: 74fbc7d371d9 ("HID: intel-ish-hid: add MSI interrupt support") Signed-off-by: Chen Ni Acked-by: Srinivas Pandruvada Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- drivers/hid/intel-ish-hid/ipc/pci-ish.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hid/intel-ish-hid/ipc/pci-ish.c b/drivers/hid/intel-ish-hid/ipc/pci-ish.c index c6d48a8648b7..4a16ede402ff 100644 --- a/drivers/hid/intel-ish-hid/ipc/pci-ish.c +++ b/drivers/hid/intel-ish-hid/ipc/pci-ish.c @@ -164,6 +164,11 @@ static int ish_probe(struct pci_dev *pdev, const struct pci_device_id *ent) /* request and enable interrupt */ ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES); + if (ret < 0) { + dev_err(dev, "ISH: Failed to allocate IRQ vectors\n"); + return ret; + } + if (!pdev->msi_enabled && !pdev->msix_enabled) irq_flag = IRQF_SHARED; From 00b425ff0891283207d7bad607a2412225274d7a Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Wed, 24 Apr 2024 21:44:20 +0700 Subject: [PATCH 0072/1565] scsi: bfa: Ensure the copied buf is NUL terminated [ Upstream commit 13d0cecb4626fae67c00c84d3c7851f6b62f7df3 ] Currently, we allocate a nbytes-sized kernel buffer and copy nbytes from userspace to that buffer. Later, we use sscanf on this buffer but we don't ensure that the string is terminated inside the buffer, this can lead to OOB read when using sscanf. Fix this issue by using memdup_user_nul instead of memdup_user. Fixes: 9f30b674759b ("bfa: replace 2 kzalloc/copy_from_user by memdup_user") Signed-off-by: Bui Quang Minh Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-3-f1f1b53a10f4@gmail.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/bfa/bfad_debugfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/bfa/bfad_debugfs.c b/drivers/scsi/bfa/bfad_debugfs.c index fd1b378a263a..d3c7d4423c51 100644 --- a/drivers/scsi/bfa/bfad_debugfs.c +++ b/drivers/scsi/bfa/bfad_debugfs.c @@ -250,7 +250,7 @@ bfad_debugfs_write_regrd(struct file *file, const char __user *buf, unsigned long flags; void *kern_buf; - kern_buf = memdup_user(buf, nbytes); + kern_buf = memdup_user_nul(buf, nbytes); if (IS_ERR(kern_buf)) return PTR_ERR(kern_buf); @@ -317,7 +317,7 @@ bfad_debugfs_write_regwr(struct file *file, const char __user *buf, unsigned long flags; void *kern_buf; - kern_buf = memdup_user(buf, nbytes); + kern_buf = memdup_user_nul(buf, nbytes); if (IS_ERR(kern_buf)) return PTR_ERR(kern_buf); From 769b9fd2af02c069451fe9108dba73355d9a021c Mon Sep 17 00:00:00 2001 From: Bui Quang Minh Date: Wed, 24 Apr 2024 21:44:21 +0700 Subject: [PATCH 0073/1565] scsi: qedf: Ensure the copied buf is NUL terminated [ Upstream commit d0184a375ee797eb657d74861ba0935b6e405c62 ] Currently, we allocate a count-sized kernel buffer and copy count from userspace to that buffer. Later, we use kstrtouint on this buffer but we don't ensure that the string is terminated inside the buffer, this can lead to OOB read when using kstrtouint. Fix this issue by using memdup_user_nul instead of memdup_user. Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Signed-off-by: Bui Quang Minh Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-4-f1f1b53a10f4@gmail.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/qedf/qedf_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/qedf/qedf_debugfs.c b/drivers/scsi/qedf/qedf_debugfs.c index 451fd236bfd0..96174353e389 100644 --- a/drivers/scsi/qedf/qedf_debugfs.c +++ b/drivers/scsi/qedf/qedf_debugfs.c @@ -170,7 +170,7 @@ qedf_dbg_debug_cmd_write(struct file *filp, const char __user *buffer, if (!count || *ppos) return 0; - kern_buf = memdup_user(buffer, count); + kern_buf = memdup_user_nul(buffer, count); if (IS_ERR(kern_buf)) return PTR_ERR(kern_buf); From 4cff6817ee446fc4250d7d5e2bb967ce5a2d8624 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sat, 4 May 2024 14:38:15 +0300 Subject: [PATCH 0074/1565] wifi: mwl8k: initialize cmd->addr[] properly [ Upstream commit 1d60eabb82694e58543e2b6366dae3e7465892a5 ] This loop is supposed to copy the mac address to cmd->addr but the i++ increment is missing so it copies everything to cmd->addr[0] and only the last address is recorded. Fixes: 22bedad3ce11 ("net: convert multicast list to list_head") Signed-off-by: Dan Carpenter Signed-off-by: Kalle Valo Link: https://msgid.link/b788be9a-15f5-4cca-a3fe-79df4c8ce7b2@moroto.mountain Signed-off-by: Sasha Levin --- drivers/net/wireless/marvell/mwl8k.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/marvell/mwl8k.c b/drivers/net/wireless/marvell/mwl8k.c index dc91ac8cbd48..dd72e9f8b407 100644 --- a/drivers/net/wireless/marvell/mwl8k.c +++ b/drivers/net/wireless/marvell/mwl8k.c @@ -2714,7 +2714,7 @@ __mwl8k_cmd_mac_multicast_adr(struct ieee80211_hw *hw, int allmulti, cmd->action |= cpu_to_le16(MWL8K_ENABLE_RX_MULTICAST); cmd->numaddr = cpu_to_le16(mc_count); netdev_hw_addr_list_for_each(ha, mc_list) { - memcpy(cmd->addr[i], ha->addr, ETH_ALEN); + memcpy(cmd->addr[i++], ha->addr, ETH_ALEN); } } From 2093cc6e87581ea72e4da32ff3622ba7a9a691a1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 May 2024 13:55:46 +0000 Subject: [PATCH 0075/1565] usb: aqc111: stop lying about skb->truesize [ Upstream commit 9aad6e45c4e7d16b2bb7c3794154b828fb4384b4 ] Some usb drivers try to set small skb->truesize and break core networking stacks. I replace one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: 361459cd9642 ("net: usb: aqc111: Implement RX data path") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20240506135546.3641185-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/aqc111.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index 4ea02116be18..895d4f5166f9 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -1141,17 +1141,15 @@ static int aqc111_rx_fixup(struct usbnet *dev, struct sk_buff *skb) continue; } - /* Clone SKB */ - new_skb = skb_clone(skb, GFP_ATOMIC); + new_skb = netdev_alloc_skb_ip_align(dev->net, pkt_len); if (!new_skb) goto err; - new_skb->len = pkt_len; + skb_put(new_skb, pkt_len); + memcpy(new_skb->data, skb->data, pkt_len); skb_pull(new_skb, AQ_RX_HW_PAD); - skb_set_tail_pointer(new_skb, new_skb->len); - new_skb->truesize = SKB_TRUESIZE(new_skb->len); if (aqc111_data->rx_checksum) aqc111_rx_checksum(new_skb, pkt_desc); From d50b11c21ff0f20264e14b262751151ff58b986e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 6 May 2024 14:39:39 +0000 Subject: [PATCH 0076/1565] net: usb: sr9700: stop lying about skb->truesize [ Upstream commit 05417aa9c0c038da2464a0c504b9d4f99814a23b ] Some usb drivers set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize override. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20240506143939.3673865-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/sr9700.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/net/usb/sr9700.c b/drivers/net/usb/sr9700.c index 811c8751308c..3fac642bec77 100644 --- a/drivers/net/usb/sr9700.c +++ b/drivers/net/usb/sr9700.c @@ -418,19 +418,15 @@ static int sr9700_rx_fixup(struct usbnet *dev, struct sk_buff *skb) skb_pull(skb, 3); skb->len = len; skb_set_tail_pointer(skb, len); - skb->truesize = len + sizeof(struct sk_buff); return 2; } - /* skb_clone is used for address align */ - sr_skb = skb_clone(skb, GFP_ATOMIC); + sr_skb = netdev_alloc_skb_ip_align(dev->net, len); if (!sr_skb) return 0; - sr_skb->len = len; - sr_skb->data = skb->data + 3; - skb_set_tail_pointer(sr_skb, len); - sr_skb->truesize = len + sizeof(struct sk_buff); + skb_put(sr_skb, len); + memcpy(sr_skb->data, skb->data + 3, len); usbnet_skb_return(dev, sr_skb); skb_pull(skb, len + SR_RX_OVERHEAD); From 4eeffecc8e3cce25bb559502c2fd94a948bcde82 Mon Sep 17 00:00:00 2001 From: Michael Schmitz Date: Thu, 11 Apr 2024 15:36:31 +1200 Subject: [PATCH 0077/1565] m68k: Fix spinlock race in kernel thread creation [ Upstream commit da89ce46f02470ef08f0f580755d14d547da59ed ] Context switching does take care to retain the correct lock owner across the switch from 'prev' to 'next' tasks. This does rely on interrupts remaining disabled for the entire duration of the switch. This condition is guaranteed for normal process creation and context switching between already running processes, because both 'prev' and 'next' already have interrupts disabled in their saved copies of the status register. The situation is different for newly created kernel threads. The status register is set to PS_S in copy_thread(), which does leave the IPL at 0. Upon restoring the 'next' thread's status register in switch_to() aka resume(), interrupts then become enabled prematurely. resume() then returns via ret_from_kernel_thread() and schedule_tail() where run queue lock is released (see finish_task_switch() and finish_lock_switch()). A timer interrupt calling scheduler_tick() before the lock is released in finish_task_switch() will find the lock already taken, with the current task as lock owner. This causes a spinlock recursion warning as reported by Guenter Roeck. As far as I can ascertain, this race has been opened in commit 533e6903bea0 ("m68k: split ret_from_fork(), simplify kernel_thread()") but I haven't done a detailed study of kernel history so it may well predate that commit. Interrupts cannot be disabled in the saved status register copy for kernel threads (init will complain about interrupts disabled when finally starting user space). Disable interrupts temporarily when switching the tasks' register sets in resume(). Note that a simple oriw 0x700,%sr after restoring sr is not enough here - this leaves enough of a race for the 'spinlock recursion' warning to still be observed. Tested on ARAnyM and qemu (Quadra 800 emulation). Fixes: 533e6903bea0 ("m68k: split ret_from_fork(), simplify kernel_thread()") Reported-by: Guenter Roeck Closes: https://lore.kernel.org/all/07811b26-677c-4d05-aeb4-996cd880b789@roeck-us.net Signed-off-by: Michael Schmitz Tested-by: Guenter Roeck Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20240411033631.16335-1-schmitzmic@gmail.com Signed-off-by: Geert Uytterhoeven Signed-off-by: Sasha Levin --- arch/m68k/kernel/entry.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S index 546bab6bfc27..d0ca4df43528 100644 --- a/arch/m68k/kernel/entry.S +++ b/arch/m68k/kernel/entry.S @@ -432,7 +432,9 @@ resume: movec %a0,%dfc /* restore status register */ - movew %a1@(TASK_THREAD+THREAD_SR),%sr + movew %a1@(TASK_THREAD+THREAD_SR),%d0 + oriw #0x0700,%d0 + movew %d0,%sr rts From 3eccf76b572f502648abd5109ccde03ad0c5322b Mon Sep 17 00:00:00 2001 From: Finn Thain Date: Sat, 4 May 2024 14:31:12 +1000 Subject: [PATCH 0078/1565] m68k: mac: Fix reboot hang on Mac IIci [ Upstream commit 265a3b322df9a973ff1fc63da70af456ab6ae1d6 ] Calling mac_reset() on a Mac IIci does reset the system, but what follows is a POST failure that requires a manual reset to resolve. Avoid that by using the 68030 asm implementation instead of the C implementation. Apparently the SE/30 has a similar problem as it has used the asm implementation since before git. This patch extends that solution to other systems with a similar ROM. After this patch, the only systems still using the C implementation are 68040 systems where adb_type is either MAC_ADB_IOP or MAC_ADB_II. This implies a 1 MiB Quadra ROM. This now includes the Quadra 900/950, which previously fell through to the "should never get here" catch-all. Reported-and-tested-by: Stan Johnson Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain Reviewed-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/480ebd1249d229c6dc1f3f1c6d599b8505483fd8.1714797072.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven Signed-off-by: Sasha Levin --- arch/m68k/mac/misc.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c index 90f4e9ca1276..d3b34dd7590d 100644 --- a/arch/m68k/mac/misc.c +++ b/arch/m68k/mac/misc.c @@ -452,30 +452,18 @@ void mac_poweroff(void) void mac_reset(void) { - if (macintosh_config->adb_type == MAC_ADB_II && - macintosh_config->ident != MAC_MODEL_SE30) { - /* need ROMBASE in booter */ - /* indeed, plus need to MAP THE ROM !! */ - - if (mac_bi_data.rombase == 0) - mac_bi_data.rombase = 0x40800000; - - /* works on some */ - rom_reset = (void *) (mac_bi_data.rombase + 0xa); - - local_irq_disable(); - rom_reset(); #ifdef CONFIG_ADB_CUDA - } else if (macintosh_config->adb_type == MAC_ADB_EGRET || - macintosh_config->adb_type == MAC_ADB_CUDA) { + if (macintosh_config->adb_type == MAC_ADB_EGRET || + macintosh_config->adb_type == MAC_ADB_CUDA) { cuda_restart(); + } else #endif #ifdef CONFIG_ADB_PMU - } else if (macintosh_config->adb_type == MAC_ADB_PB2) { + if (macintosh_config->adb_type == MAC_ADB_PB2) { pmu_restart(); + } else #endif - } else if (CPU_IS_030) { - + if (CPU_IS_030) { /* 030-specific reset routine. The idea is general, but the * specific registers to reset are '030-specific. Until I * have a non-030 machine, I can't test anything else. @@ -523,6 +511,18 @@ void mac_reset(void) "jmp %/a0@\n\t" /* jump to the reset vector */ ".chip 68k" : : "r" (offset), "a" (rombase) : "a0"); + } else { + /* need ROMBASE in booter */ + /* indeed, plus need to MAP THE ROM !! */ + + if (mac_bi_data.rombase == 0) + mac_bi_data.rombase = 0x40800000; + + /* works on some */ + rom_reset = (void *)(mac_bi_data.rombase + 0xa); + + local_irq_disable(); + rom_reset(); } /* should never get here */ From ad31e0e765e9e23cb924b75f2fa8bac5fa217fde Mon Sep 17 00:00:00 2001 From: gaoxingwang Date: Mon, 22 Apr 2024 17:19:17 +0800 Subject: [PATCH 0079/1565] net: ipv6: fix wrong start position when receive hop-by-hop fragment [ Upstream commit 1cd354fe1e4864eeaff62f66ee513080ec946f20 ] In IPv6, ipv6_rcv_core will parse the hop-by-hop type extension header and increase skb->transport_header by one extension header length. But if there are more other extension headers like fragment header at this time, the skb->transport_header points to the second extension header, not the transport layer header or the first extension header. This will result in the start and nexthdrp variable not pointing to the same position in ipv6frag_thdr_trunced, and ipv6_skip_exthdr returning incorrect offset and frag_off.Sometimes,the length of the last sharded packet is smaller than the calculated incorrect offset, resulting in packet loss. We can use network header to offset and calculate the correct position to solve this problem. Fixes: 9d9e937b1c8b (ipv6/netfilter: Discard first fragment not including all headers) Signed-off-by: Gao Xingwang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv6/reassembly.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 28e44782c94d..699367517155 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -363,7 +363,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb) * the source of the fragment, with the Pointer field set to zero. */ nexthdr = hdr->nexthdr; - if (ipv6frag_thdr_truncated(skb, skb_transport_offset(skb), &nexthdr)) { + if (ipv6frag_thdr_truncated(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr), &nexthdr)) { __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), IPSTATS_MIB_INHDRERRORS); icmpv6_param_prob(skb, ICMPV6_HDR_INCOMP, 0); From e22b23f5888a065d084e87db1eec639c445e677f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 8 May 2024 06:45:04 -0700 Subject: [PATCH 0080/1565] eth: sungem: remove .ndo_poll_controller to avoid deadlocks [ Upstream commit ac0a230f719b02432d8c7eba7615ebd691da86f4 ] Erhard reports netpoll warnings from sungem: netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (gem_start_xmit+0x0/0x398) WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c gem_poll_controller() disables interrupts, which may sleep. We can't sleep in netpoll, it has interrupts disabled completely. Strangely, gem_poll_controller() doesn't even poll the completions, and instead acts as if an interrupt has fired so it just schedules NAPI and exits. None of this has been necessary for years, since netpoll invokes NAPI directly. Fixes: fe09bb619096 ("sungem: Spring cleaning and GRO support") Reported-and-tested-by: Erhard Furtner Link: https://lore.kernel.org/all/20240428125306.2c3080ef@legion Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20240508134504.3560956-1-kuba@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/sun/sungem.c | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 58f142ee78a3..4d6a9f02c738 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -949,17 +949,6 @@ static irqreturn_t gem_interrupt(int irq, void *dev_id) return IRQ_HANDLED; } -#ifdef CONFIG_NET_POLL_CONTROLLER -static void gem_poll_controller(struct net_device *dev) -{ - struct gem *gp = netdev_priv(dev); - - disable_irq(gp->pdev->irq); - gem_interrupt(gp->pdev->irq, dev); - enable_irq(gp->pdev->irq); -} -#endif - static void gem_tx_timeout(struct net_device *dev, unsigned int txqueue) { struct gem *gp = netdev_priv(dev); @@ -2836,9 +2825,6 @@ static const struct net_device_ops gem_netdev_ops = { .ndo_change_mtu = gem_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address = gem_set_mac_address, -#ifdef CONFIG_NET_POLL_CONTROLLER - .ndo_poll_controller = gem_poll_controller, -#endif }; static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) From 161e43067b86e49e2ccc9780368aec8d600c9ed7 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 9 May 2024 09:44:54 +0200 Subject: [PATCH 0081/1565] net: ethernet: cortina: Locking fixes [ Upstream commit 812552808f7ff71133fc59768cdc253c5b8ca1bf ] This fixes a probably long standing problem in the Cortina Gemini ethernet driver: there are some paths in the code where the IRQ registers are written without taking the proper locks. Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Signed-off-by: Linus Walleij Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240509-gemini-ethernet-locking-v1-1-afd00a528b95@linaro.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/cortina/gemini.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c index c78587ddb32f..fa46854fd697 100644 --- a/drivers/net/ethernet/cortina/gemini.c +++ b/drivers/net/ethernet/cortina/gemini.c @@ -1109,10 +1109,13 @@ static void gmac_tx_irq_enable(struct net_device *netdev, { struct gemini_ethernet_port *port = netdev_priv(netdev); struct gemini_ethernet *geth = port->geth; + unsigned long flags; u32 val, mask; netdev_dbg(netdev, "%s device %d\n", __func__, netdev->dev_id); + spin_lock_irqsave(&geth->irq_lock, flags); + mask = GMAC0_IRQ0_TXQ0_INTS << (6 * netdev->dev_id + txq); if (en) @@ -1121,6 +1124,8 @@ static void gmac_tx_irq_enable(struct net_device *netdev, val = readl(geth->base + GLOBAL_INTERRUPT_ENABLE_0_REG); val = en ? val | mask : val & ~mask; writel(val, geth->base + GLOBAL_INTERRUPT_ENABLE_0_REG); + + spin_unlock_irqrestore(&geth->irq_lock, flags); } static void gmac_tx_irq(struct net_device *netdev, unsigned int txq_num) @@ -1427,15 +1432,19 @@ static unsigned int gmac_rx(struct net_device *netdev, unsigned int budget) union gmac_rxdesc_3 word3; struct page *page = NULL; unsigned int page_offs; + unsigned long flags; unsigned short r, w; union dma_rwptr rw; dma_addr_t mapping; int frag_nr = 0; + spin_lock_irqsave(&geth->irq_lock, flags); rw.bits32 = readl(ptr_reg); /* Reset interrupt as all packages until here are taken into account */ writel(DEFAULT_Q0_INT_BIT << netdev->dev_id, geth->base + GLOBAL_INTERRUPT_STATUS_1_REG); + spin_unlock_irqrestore(&geth->irq_lock, flags); + r = rw.bits.rptr; w = rw.bits.wptr; @@ -1738,10 +1747,9 @@ static irqreturn_t gmac_irq(int irq, void *data) gmac_update_hw_stats(netdev); if (val & (GMAC0_RX_OVERRUN_INT_BIT << (netdev->dev_id * 8))) { + spin_lock(&geth->irq_lock); writel(GMAC0_RXDERR_INT_BIT << (netdev->dev_id * 8), geth->base + GLOBAL_INTERRUPT_STATUS_4_REG); - - spin_lock(&geth->irq_lock); u64_stats_update_begin(&port->ir_stats_syncp); ++port->stats.rx_fifo_errors; u64_stats_update_end(&port->ir_stats_syncp); From 4d51845d734a4c5d079e56e0916f936a55e15055 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Thu, 9 May 2024 01:14:46 -0700 Subject: [PATCH 0082/1565] af_unix: Fix data races in unix_release_sock/unix_stream_sendmsg [ Upstream commit 540bf24fba16b88c1b3b9353927204b4f1074e25 ] A data-race condition has been identified in af_unix. In one data path, the write function unix_release_sock() atomically writes to sk->sk_shutdown using WRITE_ONCE. However, on the reader side, unix_stream_sendmsg() does not read it atomically. Consequently, this issue is causing the following KCSAN splat to occur: BUG: KCSAN: data-race in unix_release_sock / unix_stream_sendmsg write (marked) to 0xffff88867256ddbb of 1 bytes by task 7270 on cpu 28: unix_release_sock (net/unix/af_unix.c:640) unix_release (net/unix/af_unix.c:1050) sock_close (net/socket.c:659 net/socket.c:1421) __fput (fs/file_table.c:422) __fput_sync (fs/file_table.c:508) __se_sys_close (fs/open.c:1559 fs/open.c:1541) __x64_sys_close (fs/open.c:1541) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) read to 0xffff88867256ddbb of 1 bytes by task 989 on cpu 14: unix_stream_sendmsg (net/unix/af_unix.c:2273) __sock_sendmsg (net/socket.c:730 net/socket.c:745) ____sys_sendmsg (net/socket.c:2584) __sys_sendmmsg (net/socket.c:2638 net/socket.c:2724) __x64_sys_sendmmsg (net/socket.c:2753 net/socket.c:2750 net/socket.c:2750) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) value changed: 0x01 -> 0x03 The line numbers are related to commit dd5a440a31fa ("Linux 6.9-rc7"). Commit e1d09c2c2f57 ("af_unix: Fix data races around sk->sk_shutdown.") addressed a comparable issue in the past regarding sk->sk_shutdown. However, it overlooked resolving this particular data path. This patch only offending unix_stream_sendmsg() function, since the other reads seem to be protected by unix_state_lock() as discussed in Link: https://lore.kernel.org/all/20240508173324.53565-1-kuniyu@amazon.com/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Breno Leitao Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20240509081459.2807828-1-leitao@debian.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 224b1fdc8227..3ab726a668e8 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1918,7 +1918,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, goto out_err; } - if (sk->sk_shutdown & SEND_SHUTDOWN) + if (READ_ONCE(sk->sk_shutdown) & SEND_SHUTDOWN) goto pipe_err; while (sent < len) { From c3dc80f63326261fc991ac87a79d82a2e138bbb9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 9 May 2024 08:33:13 +0000 Subject: [PATCH 0083/1565] net: usb: smsc95xx: stop lying about skb->truesize [ Upstream commit d50729f1d60bca822ef6d9c1a5fb28d486bd7593 ] Some usb drivers try to set small skb->truesize and break core networking stacks. In this patch, I removed one of the skb->truesize override. I also replaced one skb_clone() by an allocation of a fresh and small skb, to get minimally sized skbs, like we did in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a: stop lying about skb->truesize") v3: also fix a sparse error ( https://lore.kernel.org/oe-kbuild-all/202405091310.KvncIecx-lkp@intel.com/ ) v2: leave the skb_trim() game because smsc95xx_rx_csum_offload() needs the csum part. (Jakub) While we are it, use get_unaligned() in smsc95xx_rx_csum_offload(). Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver") Signed-off-by: Eric Dumazet Cc: Steve Glendinning Cc: UNGLinuxDriver@microchip.com Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240509083313.2113832-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/smsc95xx.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 569be01700aa..2778ad6726f0 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -1801,9 +1801,11 @@ static int smsc95xx_reset_resume(struct usb_interface *intf) static void smsc95xx_rx_csum_offload(struct sk_buff *skb) { - skb->csum = *(u16 *)(skb_tail_pointer(skb) - 2); + u16 *csum_ptr = (u16 *)(skb_tail_pointer(skb) - 2); + + skb->csum = (__force __wsum)get_unaligned(csum_ptr); skb->ip_summed = CHECKSUM_COMPLETE; - skb_trim(skb, skb->len - 2); + skb_trim(skb, skb->len - 2); /* remove csum */ } static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb) @@ -1861,25 +1863,22 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb) if (dev->net->features & NETIF_F_RXCSUM) smsc95xx_rx_csum_offload(skb); skb_trim(skb, skb->len - 4); /* remove fcs */ - skb->truesize = size + sizeof(struct sk_buff); return 1; } - ax_skb = skb_clone(skb, GFP_ATOMIC); + ax_skb = netdev_alloc_skb_ip_align(dev->net, size); if (unlikely(!ax_skb)) { netdev_warn(dev->net, "Error allocating skb\n"); return 0; } - ax_skb->len = size; - ax_skb->data = packet; - skb_set_tail_pointer(ax_skb, size); + skb_put(ax_skb, size); + memcpy(ax_skb->data, packet, size); if (dev->net->features & NETIF_F_RXCSUM) smsc95xx_rx_csum_offload(ax_skb); skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */ - ax_skb->truesize = size + sizeof(struct sk_buff); usbnet_skb_return(dev, ax_skb); } From 5ab6aecbede080b44b8e34720ab72050bf1e6982 Mon Sep 17 00:00:00 2001 From: Ilya Maximets Date: Thu, 9 May 2024 11:38:05 +0200 Subject: [PATCH 0084/1565] net: openvswitch: fix overwriting ct original tuple for ICMPv6 [ Upstream commit 7c988176b6c16c516474f6fceebe0f055af5eb56 ] OVS_PACKET_CMD_EXECUTE has 3 main attributes: - OVS_PACKET_ATTR_KEY - Packet metadata in a netlink format. - OVS_PACKET_ATTR_PACKET - Binary packet content. - OVS_PACKET_ATTR_ACTIONS - Actions to execute on the packet. OVS_PACKET_ATTR_KEY is parsed first to populate sw_flow_key structure with the metadata like conntrack state, input port, recirculation id, etc. Then the packet itself gets parsed to populate the rest of the keys from the packet headers. Whenever the packet parsing code starts parsing the ICMPv6 header, it first zeroes out fields in the key corresponding to Neighbor Discovery information even if it is not an ND packet. It is an 'ipv6.nd' field. However, the 'ipv6' is a union that shares the space between 'nd' and 'ct_orig' that holds the original tuple conntrack metadata parsed from the OVS_PACKET_ATTR_KEY. ND packets should not normally have conntrack state, so it's fine to share the space, but normal ICMPv6 Echo packets or maybe other types of ICMPv6 can have the state attached and it should not be overwritten. The issue results in all but the last 4 bytes of the destination address being wiped from the original conntrack tuple leading to incorrect packet matching and potentially executing wrong actions in case this packet recirculates within the datapath or goes back to userspace. ND fields should not be accessed in non-ND packets, so not clearing them should be fine. Executing memset() only for actual ND packets to avoid the issue. Initializing the whole thing before parsing is needed because ND packet may not contain all the options. The issue only affects the OVS_PACKET_CMD_EXECUTE path and doesn't affect packets entering OVS datapath from network interfaces, because in this case CT metadata is populated from skb after the packet is already parsed. Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.") Reported-by: Antonin Bas Closes: https://github.com/openvswitch/ovs-issues/issues/327 Signed-off-by: Ilya Maximets Acked-by: Aaron Conole Acked-by: Eelco Chaudron Link: https://lore.kernel.org/r/20240509094228.1035477-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/openvswitch/flow.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index c9ba61413c98..9bad601c7fe8 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -412,7 +412,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, */ key->tp.src = htons(icmp->icmp6_type); key->tp.dst = htons(icmp->icmp6_code); - memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd)); if (icmp->icmp6_code == 0 && (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION || @@ -421,6 +420,8 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, struct nd_msg *nd; int offset; + memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd)); + /* In order to process neighbor discovery options, we need the * entire packet. */ From 1ba1b4cc3afbf7b89d6c81b8cf0f608cdf73b118 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 9 May 2024 21:18:10 +0800 Subject: [PATCH 0085/1565] ipv6: sr: add missing seg6_local_exit [ Upstream commit 3321687e321307629c71b664225b861ebf3e5753 ] Currently, we only call seg6_local_exit() in seg6_init() if seg6_local_init() failed. But forgot to call it in seg6_exit(). Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel") Signed-off-by: Hangbin Liu Reviewed-by: Sabrina Dubroca Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240509131812.1662197-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/seg6.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index a8439fded12d..1a7bd85746b3 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -503,6 +503,7 @@ void seg6_exit(void) seg6_hmac_exit(); #endif #ifdef CONFIG_IPV6_SEG6_LWTUNNEL + seg6_local_exit(); seg6_iptunnel_exit(); #endif unregister_pernet_subsys(&ip6_segments_ops); From d33327a7c0b06206bdf11326ab3d535b1b99bdb4 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 9 May 2024 21:18:11 +0800 Subject: [PATCH 0086/1565] ipv6: sr: fix incorrect unregister order [ Upstream commit 6e370a771d2985107e82d0f6174381c1acb49c20 ] Commit 5559cea2d5aa ("ipv6: sr: fix possible use-after-free and null-ptr-deref") changed the register order in seg6_init(). But the unregister order in seg6_exit() is not updated. Fixes: 5559cea2d5aa ("ipv6: sr: fix possible use-after-free and null-ptr-deref") Signed-off-by: Hangbin Liu Reviewed-by: Sabrina Dubroca Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240509131812.1662197-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/seg6.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 1a7bd85746b3..722bbf4055b0 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -506,6 +506,6 @@ void seg6_exit(void) seg6_local_exit(); seg6_iptunnel_exit(); #endif - unregister_pernet_subsys(&ip6_segments_ops); genl_unregister_family(&seg6_genl_family); + unregister_pernet_subsys(&ip6_segments_ops); } From 00e6335329f23ac6cf3105931691674e28bc598c Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 9 May 2024 21:18:12 +0800 Subject: [PATCH 0087/1565] ipv6: sr: fix invalid unregister error path [ Upstream commit 160e9d2752181fcf18c662e74022d77d3164cd45 ] The error path of seg6_init() is wrong in case CONFIG_IPV6_SEG6_LWTUNNEL is not defined. In that case if seg6_hmac_init() fails, the genl_unregister_family() isn't called. This issue exist since commit 46738b1317e1 ("ipv6: sr: add option to control lwtunnel support"), and commit 5559cea2d5aa ("ipv6: sr: fix possible use-after-free and null-ptr-deref") replaced unregister_pernet_subsys() with genl_unregister_family() in this error path. Fixes: 46738b1317e1 ("ipv6: sr: add option to control lwtunnel support") Reported-by: Guillaume Nault Signed-off-by: Hangbin Liu Reviewed-by: Sabrina Dubroca Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240509131812.1662197-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/seg6.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 722bbf4055b0..77221f27262a 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -490,6 +490,8 @@ int __init seg6_init(void) #endif #ifdef CONFIG_IPV6_SEG6_LWTUNNEL out_unregister_genl: +#endif +#if IS_ENABLED(CONFIG_IPV6_SEG6_LWTUNNEL) || IS_ENABLED(CONFIG_IPV6_SEG6_HMAC) genl_unregister_family(&seg6_genl_family); #endif out_unregister_pernet: From f6fbb8535e990f844371086ab2c1221f71f993d3 Mon Sep 17 00:00:00 2001 From: Akiva Goldberger Date: Thu, 9 May 2024 14:29:51 +0300 Subject: [PATCH 0088/1565] net/mlx5: Discard command completions in internal error [ Upstream commit db9b31aa9bc56ff0d15b78f7e827d61c4a096e40 ] Fix use after free when FW completion arrives while device is in internal error state. Avoid calling completion handler in this case, since the device will flush the command interface and trigger all completions manually. Kernel log: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. ... RIP: 0010:refcount_warn_saturate+0xd8/0xe0 ... Call Trace: ? __warn+0x79/0x120 ? refcount_warn_saturate+0xd8/0xe0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xd8/0xe0 cmd_ent_put+0x13b/0x160 [mlx5_core] mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core] cmd_comp_notifier+0x1f/0x30 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 mlx5_eq_async_int+0xf6/0x290 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x4b/0x160 handle_irq_event+0x2e/0x80 handle_edge_irq+0x98/0x230 __common_interrupt+0x3b/0xa0 common_interrupt+0x7b/0xa0 asm_common_interrupt+0x22/0x40 Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") Signed-off-by: Akiva Goldberger Reviewed-by: Moshe Shemesh Signed-off-by: Tariq Toukan Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 0ba43c93abb2..42dc76f85c62 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1523,6 +1523,9 @@ static int cmd_comp_notifier(struct notifier_block *nb, dev = container_of(cmd, struct mlx5_core_dev, cmd); eqe = data; + if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR) + return NOTIFY_DONE; + mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false); return NOTIFY_OK; From 04bc4d1090c343025d69149ca669a27c5b9c34a7 Mon Sep 17 00:00:00 2001 From: Srinivasan Shanmugam Date: Mon, 26 Feb 2024 18:38:08 +0530 Subject: [PATCH 0089/1565] drm/amd/display: Fix potential index out of bounds in color transformation function [ Upstream commit 63ae548f1054a0b71678d0349c7dc9628ddd42ca ] Fixes index out of bounds issue in the color transformation function. The issue could occur when the index 'i' exceeds the number of transfer function points (TRANSFER_FUNC_POINTS). The fix adds a check to ensure 'i' is within bounds before accessing the transfer function points. If 'i' is out of bounds, an error message is logged and the function returns false to indicate an error. Reported by smatch: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:405 cm_helper_translate_curve_to_hw_format() error: buffer overflow 'output_tf->tf_pts.red' 1025 <= s32max drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:406 cm_helper_translate_curve_to_hw_format() error: buffer overflow 'output_tf->tf_pts.green' 1025 <= s32max drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_cm_common.c:407 cm_helper_translate_curve_to_hw_format() error: buffer overflow 'output_tf->tf_pts.blue' 1025 <= s32max Fixes: b629596072e5 ("drm/amd/display: Build unity lut for shaper") Cc: Vitaly Prosyak Cc: Charlene Liu Cc: Harry Wentland Cc: Rodrigo Siqueira Cc: Roman Li Cc: Aurabindo Pillai Cc: Tom Chung Signed-off-by: Srinivasan Shanmugam Reviewed-by: Tom Chung Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c index 7a00fe525dfb..bd9bc51983fe 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c @@ -379,6 +379,11 @@ bool cm_helper_translate_curve_to_hw_format( i += increment) { if (j == hw_points - 1) break; + if (i >= TRANSFER_FUNC_POINTS) { + DC_LOG_ERROR("Index out of bounds: i=%d, TRANSFER_FUNC_POINTS=%d\n", + i, TRANSFER_FUNC_POINTS); + return false; + } rgb_resulted[j].red = output_tf->tf_pts.red[i]; rgb_resulted[j].green = output_tf->tf_pts.green[i]; rgb_resulted[j].blue = output_tf->tf_pts.blue[i]; From ece8098579e10bf2cdd1b9cd75cb5e4a2814cf21 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Thu, 12 Nov 2020 16:38:15 -0600 Subject: [PATCH 0090/1565] ASoC: soc-acpi: add helper to identify parent driver. [ Upstream commit 644eebdbbf1154c995d6319c133d7d5b898c5ed2 ] Intel machine drivers are used by parent platform drivers based on closed-source firmware (Atom/SST and catpt) and SOF-based ones. In some cases for ACPI-based platforms, the behavior of machine drivers needs to be modified depending on the parent type, typically for card names and power management. An initial solution based on passing a boolean flag as a platform device parameter was tested earlier. Since it looked overkill, this patch suggests instead a simple string comparison to identify an SOF parent device/driver. Suggested-by: Kai Vehmanen Signed-off-by: Pierre-Louis Bossart Reviewed-by: Ranjani Sridharan Reviewed-by: Rander Wang Reviewed-by: Guennadi Liakhovetski Link: https://lore.kernel.org/r/20201112223825.39765-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown Stable-dep-of: 0cb3b7fd530b ("ASoC: Intel: Disable route checks for Skylake boards") Signed-off-by: Sasha Levin --- include/sound/soc-acpi.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/include/sound/soc-acpi.h b/include/sound/soc-acpi.h index b16a844d16ef..9a43c44dcbbb 100644 --- a/include/sound/soc-acpi.h +++ b/include/sound/soc-acpi.h @@ -171,4 +171,10 @@ struct snd_soc_acpi_codecs { u8 codecs[SND_SOC_ACPI_MAX_CODECS][ACPI_ID_LEN]; }; +static inline bool snd_soc_acpi_sof_parent(struct device *dev) +{ + return dev->parent && dev->parent->driver && dev->parent->driver->name && + !strcmp(dev->parent->driver->name, "sof-audio-acpi"); +} + #endif From 613a349cbf8bf14bc2f13933f86703d052a51719 Mon Sep 17 00:00:00 2001 From: Cezary Rojewski Date: Fri, 8 Mar 2024 10:04:58 +0100 Subject: [PATCH 0091/1565] ASoC: Intel: Disable route checks for Skylake boards [ Upstream commit 0cb3b7fd530b8c107443218ce6db5cb6e7b5dbe1 ] Topology files that are propagated to the world and utilized by the skylake-driver carry shortcomings in their SectionGraphs. Since commit daa480bde6b3 ("ASoC: soc-core: tidyup for snd_soc_dapm_add_routes()") route checks are no longer permissive. Probe failures for Intel boards have been partially addressed by commit a22ae72b86a4 ("ASoC: soc-core: disable route checks for legacy devices") and its follow up but only skl_nau88l25_ssm4567.c is patched. Fix the problem for the rest of the boards. Link: https://lore.kernel.org/all/20200309192744.18380-1-pierre-louis.bossart@linux.intel.com/ Fixes: daa480bde6b3 ("ASoC: soc-core: tidyup for snd_soc_dapm_add_routes()") Signed-off-by: Cezary Rojewski Link: https://msgid.link/r/20240308090502.2136760-2-cezary.rojewski@intel.com Reviewed-by: Pierre-Louis Bossart Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/intel/boards/bxt_da7219_max98357a.c | 1 + sound/soc/intel/boards/bxt_rt298.c | 1 + sound/soc/intel/boards/glk_rt5682_max98357a.c | 2 ++ sound/soc/intel/boards/kbl_da7219_max98357a.c | 1 + sound/soc/intel/boards/kbl_da7219_max98927.c | 4 ++++ sound/soc/intel/boards/kbl_rt5660.c | 1 + sound/soc/intel/boards/kbl_rt5663_max98927.c | 2 ++ sound/soc/intel/boards/kbl_rt5663_rt5514_max98927.c | 1 + sound/soc/intel/boards/skl_hda_dsp_generic.c | 2 ++ sound/soc/intel/boards/skl_nau88l25_max98357a.c | 1 + sound/soc/intel/boards/skl_rt286.c | 1 + 11 files changed, 17 insertions(+) diff --git a/sound/soc/intel/boards/bxt_da7219_max98357a.c b/sound/soc/intel/boards/bxt_da7219_max98357a.c index 0c0a717823c4..1a24c44db6dd 100644 --- a/sound/soc/intel/boards/bxt_da7219_max98357a.c +++ b/sound/soc/intel/boards/bxt_da7219_max98357a.c @@ -750,6 +750,7 @@ static struct snd_soc_card broxton_audio_card = { .dapm_routes = audio_map, .num_dapm_routes = ARRAY_SIZE(audio_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = bxt_card_late_probe, }; diff --git a/sound/soc/intel/boards/bxt_rt298.c b/sound/soc/intel/boards/bxt_rt298.c index 0f3157dfa838..13c41338003f 100644 --- a/sound/soc/intel/boards/bxt_rt298.c +++ b/sound/soc/intel/boards/bxt_rt298.c @@ -575,6 +575,7 @@ static struct snd_soc_card broxton_rt298 = { .dapm_routes = broxton_rt298_map, .num_dapm_routes = ARRAY_SIZE(broxton_rt298_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = bxt_card_late_probe, }; diff --git a/sound/soc/intel/boards/glk_rt5682_max98357a.c b/sound/soc/intel/boards/glk_rt5682_max98357a.c index 62cca511522e..c1b789ac6d50 100644 --- a/sound/soc/intel/boards/glk_rt5682_max98357a.c +++ b/sound/soc/intel/boards/glk_rt5682_max98357a.c @@ -603,6 +603,8 @@ static int geminilake_audio_probe(struct platform_device *pdev) card = &glk_audio_card_rt5682_m98357a; card->dev = &pdev->dev; snd_soc_card_set_drvdata(card, ctx); + if (!snd_soc_acpi_sof_parent(&pdev->dev)) + card->disable_route_checks = true; /* override plaform name, if required */ mach = pdev->dev.platform_data; diff --git a/sound/soc/intel/boards/kbl_da7219_max98357a.c b/sound/soc/intel/boards/kbl_da7219_max98357a.c index 36f1f49e0b76..4ecef661f883 100644 --- a/sound/soc/intel/boards/kbl_da7219_max98357a.c +++ b/sound/soc/intel/boards/kbl_da7219_max98357a.c @@ -571,6 +571,7 @@ static struct snd_soc_card kabylake_audio_card_da7219_m98357a = { .dapm_routes = kabylake_map, .num_dapm_routes = ARRAY_SIZE(kabylake_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/kbl_da7219_max98927.c b/sound/soc/intel/boards/kbl_da7219_max98927.c index 884741aa4833..80c343fe7f65 100644 --- a/sound/soc/intel/boards/kbl_da7219_max98927.c +++ b/sound/soc/intel/boards/kbl_da7219_max98927.c @@ -1016,6 +1016,7 @@ static struct snd_soc_card kbl_audio_card_da7219_m98927 = { .codec_conf = max98927_codec_conf, .num_configs = ARRAY_SIZE(max98927_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; @@ -1034,6 +1035,7 @@ static struct snd_soc_card kbl_audio_card_max98927 = { .codec_conf = max98927_codec_conf, .num_configs = ARRAY_SIZE(max98927_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; @@ -1051,6 +1053,7 @@ static struct snd_soc_card kbl_audio_card_da7219_m98373 = { .codec_conf = max98373_codec_conf, .num_configs = ARRAY_SIZE(max98373_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; @@ -1068,6 +1071,7 @@ static struct snd_soc_card kbl_audio_card_max98373 = { .codec_conf = max98373_codec_conf, .num_configs = ARRAY_SIZE(max98373_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/kbl_rt5660.c b/sound/soc/intel/boards/kbl_rt5660.c index 3a9f91b58e11..2bffd2b6b361 100644 --- a/sound/soc/intel/boards/kbl_rt5660.c +++ b/sound/soc/intel/boards/kbl_rt5660.c @@ -519,6 +519,7 @@ static struct snd_soc_card kabylake_audio_card_rt5660 = { .dapm_routes = kabylake_rt5660_map, .num_dapm_routes = ARRAY_SIZE(kabylake_rt5660_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/kbl_rt5663_max98927.c b/sound/soc/intel/boards/kbl_rt5663_max98927.c index 9a4b3d0973f6..be76c71eca67 100644 --- a/sound/soc/intel/boards/kbl_rt5663_max98927.c +++ b/sound/soc/intel/boards/kbl_rt5663_max98927.c @@ -948,6 +948,7 @@ static struct snd_soc_card kabylake_audio_card_rt5663_m98927 = { .codec_conf = max98927_codec_conf, .num_configs = ARRAY_SIZE(max98927_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; @@ -964,6 +965,7 @@ static struct snd_soc_card kabylake_audio_card_rt5663 = { .dapm_routes = kabylake_5663_map, .num_dapm_routes = ARRAY_SIZE(kabylake_5663_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/kbl_rt5663_rt5514_max98927.c b/sound/soc/intel/boards/kbl_rt5663_rt5514_max98927.c index f95546c184aa..291f56f79cd3 100644 --- a/sound/soc/intel/boards/kbl_rt5663_rt5514_max98927.c +++ b/sound/soc/intel/boards/kbl_rt5663_rt5514_max98927.c @@ -779,6 +779,7 @@ static struct snd_soc_card kabylake_audio_card = { .codec_conf = max98927_codec_conf, .num_configs = ARRAY_SIZE(max98927_codec_conf), .fully_routed = true, + .disable_route_checks = true, .late_probe = kabylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/skl_hda_dsp_generic.c b/sound/soc/intel/boards/skl_hda_dsp_generic.c index bc50eda297ab..c2ff59e9a8cf 100644 --- a/sound/soc/intel/boards/skl_hda_dsp_generic.c +++ b/sound/soc/intel/boards/skl_hda_dsp_generic.c @@ -229,6 +229,8 @@ static int skl_hda_audio_probe(struct platform_device *pdev) ctx->common_hdmi_codec_drv = mach->mach_params.common_hdmi_codec_drv; hda_soc_card.dev = &pdev->dev; + if (!snd_soc_acpi_sof_parent(&pdev->dev)) + hda_soc_card.disable_route_checks = true; if (mach->mach_params.dmic_num > 0) { snprintf(hda_soc_components, sizeof(hda_soc_components), diff --git a/sound/soc/intel/boards/skl_nau88l25_max98357a.c b/sound/soc/intel/boards/skl_nau88l25_max98357a.c index 55802900069a..d976636f5a49 100644 --- a/sound/soc/intel/boards/skl_nau88l25_max98357a.c +++ b/sound/soc/intel/boards/skl_nau88l25_max98357a.c @@ -643,6 +643,7 @@ static struct snd_soc_card skylake_audio_card = { .dapm_routes = skylake_map, .num_dapm_routes = ARRAY_SIZE(skylake_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = skylake_card_late_probe, }; diff --git a/sound/soc/intel/boards/skl_rt286.c b/sound/soc/intel/boards/skl_rt286.c index 5a0c64a83146..4ff3e5fb9b7d 100644 --- a/sound/soc/intel/boards/skl_rt286.c +++ b/sound/soc/intel/boards/skl_rt286.c @@ -524,6 +524,7 @@ static struct snd_soc_card skylake_rt286 = { .dapm_routes = skylake_rt286_map, .num_dapm_routes = ARRAY_SIZE(skylake_rt286_map), .fully_routed = true, + .disable_route_checks = true, .late_probe = skylake_card_late_probe, }; From d3727d6e2b98295152771ddbb27eb51dbc349217 Mon Sep 17 00:00:00 2001 From: Maxim Korotkov Date: Wed, 13 Mar 2024 13:27:20 +0300 Subject: [PATCH 0092/1565] mtd: rawnand: hynix: fixed typo [ Upstream commit 6819db94e1cd3ce24a432f3616cd563ed0c4eaba ] The function hynix_nand_rr_init() should probably return an error code. Judging by the usage, it seems that the return code is passed up the call stack. Right now, it always returns 0 and the function hynix_nand_cleanup() in hynix_nand_init() has never been called. Found by RASU JSC and Linux Verification Center (linuxtesting.org) Fixes: 626994e07480 ("mtd: nand: hynix: Add read-retry support for 1x nm MLC NANDs") Signed-off-by: Maxim Korotkov Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20240313102721.1991299-1-korotkov.maxim.s@gmail.com Signed-off-by: Sasha Levin --- drivers/mtd/nand/raw/nand_hynix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/nand/raw/nand_hynix.c b/drivers/mtd/nand/raw/nand_hynix.c index a9f50c9af109..856b3d6eceb7 100644 --- a/drivers/mtd/nand/raw/nand_hynix.c +++ b/drivers/mtd/nand/raw/nand_hynix.c @@ -402,7 +402,7 @@ static int hynix_nand_rr_init(struct nand_chip *chip) if (ret) pr_warn("failed to initialize read-retry infrastructure"); - return 0; + return ret; } static void hynix_nand_extract_oobsize(struct nand_chip *chip, From 078e02dcb4c6c5db6d5e88bd0b7611d554dd441a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 26 Mar 2024 23:38:00 +0100 Subject: [PATCH 0093/1565] fbdev: shmobile: fix snprintf truncation [ Upstream commit 26c8cfb9d1e4b252336d23dd5127a8cbed414a32 ] The name of the overlay does not fit into the fixed-length field: drivers/video/fbdev/sh_mobile_lcdcfb.c:1577:2: error: 'snprintf' will always be truncated; specified size is 16, but format string expands to at least 25 Make it short enough by changing the string. Fixes: c5deac3c9b22 ("fbdev: sh_mobile_lcdc: Implement overlays support") Signed-off-by: Arnd Bergmann Reviewed-by: Laurent Pinchart Signed-off-by: Helge Deller Signed-off-by: Sasha Levin --- drivers/video/fbdev/sh_mobile_lcdcfb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/video/fbdev/sh_mobile_lcdcfb.c b/drivers/video/fbdev/sh_mobile_lcdcfb.c index c1043420dbd3..21d13d16f973 100644 --- a/drivers/video/fbdev/sh_mobile_lcdcfb.c +++ b/drivers/video/fbdev/sh_mobile_lcdcfb.c @@ -1580,7 +1580,7 @@ sh_mobile_lcdc_overlay_fb_init(struct sh_mobile_lcdc_overlay *ovl) */ info->fix = sh_mobile_lcdc_overlay_fix; snprintf(info->fix.id, sizeof(info->fix.id), - "SH Mobile LCDC Overlay %u", ovl->index); + "SHMobile ovl %u", ovl->index); info->fix.smem_start = ovl->dma_handle; info->fix.smem_len = ovl->fb_size; info->fix.line_length = ovl->pitch; From 7441f9e0560abdb629150650e26c4cfea8c9aff3 Mon Sep 17 00:00:00 2001 From: Christian Hewitt Date: Tue, 9 Jan 2024 23:07:04 +0000 Subject: [PATCH 0094/1565] drm/meson: vclk: fix calculation of 59.94 fractional rates [ Upstream commit bfbc68e4d8695497f858a45a142665e22a512ea3 ] Playing 4K media with 59.94 fractional rate (typically VP9) causes the screen to lose sync with the following error reported in the system log: [ 89.610280] Fatal Error, invalid HDMI vclk freq 593406 Modetest shows the following: 3840x2160 59.94 3840 4016 4104 4400 2160 2168 2178 2250 593407 flags: xxxx, xxxx, drm calculated value -------------------------------------^ Change the fractional rate calculation to stop DIV_ROUND_CLOSEST rounding down which results in vclk freq failing to match correctly. Fixes: e5fab2ec9ca4 ("drm/meson: vclk: add support for YUV420 setup") Signed-off-by: Christian Hewitt Reviewed-by: Neil Armstrong Link: https://lore.kernel.org/r/20240109230704.4120561-1-christianshewitt@gmail.com Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20240109230704.4120561-1-christianshewitt@gmail.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/meson/meson_vclk.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/meson/meson_vclk.c b/drivers/gpu/drm/meson/meson_vclk.c index 0eb86943a358..37127ba40aa9 100644 --- a/drivers/gpu/drm/meson/meson_vclk.c +++ b/drivers/gpu/drm/meson/meson_vclk.c @@ -790,13 +790,13 @@ meson_vclk_vic_supported_freq(struct meson_drm *priv, unsigned int phy_freq, FREQ_1000_1001(params[i].pixel_freq)); DRM_DEBUG_DRIVER("i = %d phy_freq = %d alt = %d\n", i, params[i].phy_freq, - FREQ_1000_1001(params[i].phy_freq/10)*10); + FREQ_1000_1001(params[i].phy_freq/1000)*1000); /* Match strict frequency */ if (phy_freq == params[i].phy_freq && vclk_freq == params[i].vclk_freq) return MODE_OK; /* Match 1000/1001 variant */ - if (phy_freq == (FREQ_1000_1001(params[i].phy_freq/10)*10) && + if (phy_freq == (FREQ_1000_1001(params[i].phy_freq/1000)*1000) && vclk_freq == FREQ_1000_1001(params[i].vclk_freq)) return MODE_OK; } @@ -1070,7 +1070,7 @@ void meson_vclk_setup(struct meson_drm *priv, unsigned int target, for (freq = 0 ; params[freq].pixel_freq ; ++freq) { if ((phy_freq == params[freq].phy_freq || - phy_freq == FREQ_1000_1001(params[freq].phy_freq/10)*10) && + phy_freq == FREQ_1000_1001(params[freq].phy_freq/1000)*1000) && (vclk_freq == params[freq].vclk_freq || vclk_freq == FREQ_1000_1001(params[freq].vclk_freq))) { if (vclk_freq != params[freq].vclk_freq) From d17b75ee9c2e44d3a3682c4ea5ab713ea6073350 Mon Sep 17 00:00:00 2001 From: Justin Green Date: Thu, 7 Mar 2024 13:00:51 -0500 Subject: [PATCH 0095/1565] drm/mediatek: Add 0 size check to mtk_drm_gem_obj [ Upstream commit 1e4350095e8ab2577ee05f8c3b044e661b5af9a0 ] Add a check to mtk_drm_gem_init if we attempt to allocate a GEM object of 0 bytes. Currently, no such check exists and the kernel will panic if a userspace application attempts to allocate a 0x0 GBM buffer. Tested by attempting to allocate a 0x0 GBM buffer on an MT8188 and verifying that we now return EINVAL. Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Justin Green Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: CK Hu Link: https://patchwork.kernel.org/project/dri-devel/patch/20240307180051.4104425-1-greenjustin@chromium.org/ Signed-off-by: Chun-Kuang Hu Signed-off-by: Sasha Levin --- drivers/gpu/drm/mediatek/mtk_drm_gem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c index b20ea58907c2..1dac9cd20d46 100644 --- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c +++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c @@ -21,6 +21,9 @@ static struct mtk_drm_gem_obj *mtk_drm_gem_init(struct drm_device *dev, size = round_up(size, PAGE_SIZE); + if (size == 0) + return ERR_PTR(-EINVAL); + mtk_gem_obj = kzalloc(sizeof(*mtk_gem_obj), GFP_KERNEL); if (!mtk_gem_obj) return ERR_PTR(-ENOMEM); From 94349e015c111a98fe545bfb3573abc4a4eb355c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 3 Apr 2024 10:06:19 +0200 Subject: [PATCH 0096/1565] powerpc/fsl-soc: hide unused const variable [ Upstream commit 01acaf3aa75e1641442cc23d8fe0a7bb4226efb1 ] vmpic_msi_feature is only used conditionally, which triggers a rare -Werror=unused-const-variable= warning with gcc: arch/powerpc/sysdev/fsl_msi.c:567:37: error: 'vmpic_msi_feature' defined but not used [-Werror=unused-const-variable=] 567 | static const struct fsl_msi_feature vmpic_msi_feature = Hide this one in the same #ifdef as the reference so we can turn on the warning by default. Fixes: 305bcf26128e ("powerpc/fsl-soc: use CONFIG_EPAPR_PARAVIRT for hcalls") Signed-off-by: Arnd Bergmann Reviewed-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://msgid.link/20240403080702.3509288-2-arnd@kernel.org Signed-off-by: Sasha Levin --- arch/powerpc/sysdev/fsl_msi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index d276c5e96445..77aa7a13ab1d 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -573,10 +573,12 @@ static const struct fsl_msi_feature ipic_msi_feature = { .msiir_offset = 0x38, }; +#ifdef CONFIG_EPAPR_PARAVIRT static const struct fsl_msi_feature vmpic_msi_feature = { .fsl_pic_ip = FSL_PIC_IP_VMPIC, .msiir_offset = 0, }; +#endif static const struct of_device_id fsl_of_msi_ids[] = { { From bb3c425921f6872331d056781ef20028de486c4d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 3 Apr 2024 10:06:31 +0200 Subject: [PATCH 0097/1565] fbdev: sisfb: hide unused variables [ Upstream commit 688cf598665851b9e8cb5083ff1d208ce43d10ff ] Building with W=1 shows that a couple of variables in this driver are only used in certain configurations: drivers/video/fbdev/sis/init301.c:239:28: error: 'SiS_Part2CLVX_6' defined but not used [-Werror=unused-const-variable=] 239 | static const unsigned char SiS_Part2CLVX_6[] = { /* 1080i */ | ^~~~~~~~~~~~~~~ drivers/video/fbdev/sis/init301.c:230:28: error: 'SiS_Part2CLVX_5' defined but not used [-Werror=unused-const-variable=] 230 | static const unsigned char SiS_Part2CLVX_5[] = { /* 750p */ | ^~~~~~~~~~~~~~~ drivers/video/fbdev/sis/init301.c:211:28: error: 'SiS_Part2CLVX_4' defined but not used [-Werror=unused-const-variable=] 211 | static const unsigned char SiS_Part2CLVX_4[] = { /* PAL */ | ^~~~~~~~~~~~~~~ drivers/video/fbdev/sis/init301.c:192:28: error: 'SiS_Part2CLVX_3' defined but not used [-Werror=unused-const-variable=] 192 | static const unsigned char SiS_Part2CLVX_3[] = { /* NTSC, 525i, 525p */ | ^~~~~~~~~~~~~~~ drivers/video/fbdev/sis/init301.c:184:28: error: 'SiS_Part2CLVX_2' defined but not used [-Werror=unused-const-variable=] 184 | static const unsigned char SiS_Part2CLVX_2[] = { | ^~~~~~~~~~~~~~~ drivers/video/fbdev/sis/init301.c:176:28: error: 'SiS_Part2CLVX_1' defined but not used [-Werror=unused-const-variable=] 176 | static const unsigned char SiS_Part2CLVX_1[] = { | ^~~~~~~~~~~~~~~ This started showing up after the definitions were moved into the source file from the header, which was not flagged by the compiler. Move the definition into the appropriate #ifdef block that already exists next to them. Fixes: 5908986ef348 ("video: fbdev: sis: avoid mismatched prototypes") Signed-off-by: Arnd Bergmann Signed-off-by: Helge Deller Signed-off-by: Sasha Levin --- drivers/video/fbdev/sis/init301.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/video/fbdev/sis/init301.c b/drivers/video/fbdev/sis/init301.c index a8fb41f1a258..09329072004f 100644 --- a/drivers/video/fbdev/sis/init301.c +++ b/drivers/video/fbdev/sis/init301.c @@ -172,7 +172,7 @@ static const unsigned char SiS_HiTVGroup3_2[] = { }; /* 301C / 302ELV extended Part2 TV registers (4 tap scaler) */ - +#ifdef CONFIG_FB_SIS_315 static const unsigned char SiS_Part2CLVX_1[] = { 0x00,0x00, 0x00,0x20,0x00,0x00,0x7F,0x20,0x02,0x7F,0x7D,0x20,0x04,0x7F,0x7D,0x1F,0x06,0x7E, @@ -245,7 +245,6 @@ static const unsigned char SiS_Part2CLVX_6[] = { /* 1080i */ 0xFF,0xFF, }; -#ifdef CONFIG_FB_SIS_315 /* 661 et al LCD data structure (2.03.00) */ static const unsigned char SiS_LCDStruct661[] = { /* 1024x768 */ From 409f20085d34c964a2f57794349d078fb0337cf4 Mon Sep 17 00:00:00 2001 From: Aleksandr Burakov Date: Fri, 1 Mar 2024 14:15:53 +0300 Subject: [PATCH 0098/1565] media: ngene: Add dvb_ca_en50221_init return value check [ Upstream commit 9bb1fd7eddcab2d28cfc11eb20f1029154dac718 ] The return value of dvb_ca_en50221_init() is not checked here that may cause undefined behavior in case of nonzero value return. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 25aee3debe04 ("[media] Rename media/dvb as media/pci") Signed-off-by: Aleksandr Burakov Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/pci/ngene/ngene-core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/media/pci/ngene/ngene-core.c b/drivers/media/pci/ngene/ngene-core.c index e1a8c611d01b..97cac9770802 100644 --- a/drivers/media/pci/ngene/ngene-core.c +++ b/drivers/media/pci/ngene/ngene-core.c @@ -1488,7 +1488,9 @@ static int init_channel(struct ngene_channel *chan) } if (dev->ci.en && (io & NGENE_IO_TSOUT)) { - dvb_ca_en50221_init(adapter, dev->ci.en, 0, 1); + ret = dvb_ca_en50221_init(adapter, dev->ci.en, 0, 1); + if (ret != 0) + goto err; set_transfer(chan, 1); chan->dev->channel[2].DataFormatFlags = DF_SWAP32; set_transfer(&chan->dev->channel[2], 1); From 552280a9921fa2a4be861f07ec9e3ad939ce736e Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Mon, 25 Mar 2024 14:50:24 +0000 Subject: [PATCH 0099/1565] media: radio-shark2: Avoid led_names truncations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 1820e16a3019b6258e6009d34432946a6ddd0a90 ] Increase the size of led_names so it can fit any valid v4l2 device name. Fixes: drivers/media/radio/radio-shark2.c:197:17: warning: ‘%s’ directive output may be truncated writing up to 35 bytes into a region of size 32 [-Wformat-truncation=] Signed-off-by: Ricardo Ribalda Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/radio/radio-shark2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c index f1c5c0a6a335..e3e6aa87fe08 100644 --- a/drivers/media/radio/radio-shark2.c +++ b/drivers/media/radio/radio-shark2.c @@ -62,7 +62,7 @@ struct shark_device { #ifdef SHARK_USE_LEDS struct work_struct led_work; struct led_classdev leds[NO_LEDS]; - char led_names[NO_LEDS][32]; + char led_names[NO_LEDS][64]; atomic_t brightness[NO_LEDS]; unsigned long brightness_new; #endif From 85d1a27402f81f2e04b0e67d20f749c2a14edbb3 Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Mon, 8 Apr 2024 15:58:10 +0300 Subject: [PATCH 0100/1565] drm: bridge: cdns-mhdp8546: Fix possible null pointer dereference [ Upstream commit 935a92a1c400285545198ca2800a4c6c519c650a ] In cdns_mhdp_atomic_enable(), the return value of drm_mode_duplicate() is assigned to mhdp_state->current_mode, and there is a dereference of it in drm_mode_set_name(), which will lead to a NULL pointer dereference on failure of drm_mode_duplicate(). Fix this bug add a check of mhdp_state->current_mode. Fixes: fb43aa0acdfd ("drm: bridge: Add support for Cadence MHDP8546 DPI/DP bridge") Signed-off-by: Aleksandr Mishin Reviewed-by: Robert Foss Signed-off-by: Robert Foss Link: https://patchwork.freedesktop.org/patch/msgid/20240408125810.21899-1-amishin@t-argos.ru Signed-off-by: Sasha Levin --- drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c index f56ff97c9899..ae99d04f0045 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c @@ -1978,6 +1978,9 @@ static void cdns_mhdp_atomic_enable(struct drm_bridge *bridge, mhdp_state = to_cdns_mhdp_bridge_state(new_state); mhdp_state->current_mode = drm_mode_duplicate(bridge->dev, mode); + if (!mhdp_state->current_mode) + return; + drm_mode_set_name(mhdp_state->current_mode); dev_dbg(mhdp->dev, "%s: Enabling mode %s\n", __func__, mode->name); From 08ce354f3da44f58e448ca42ce38e9f22418bc5b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 9 Feb 2024 21:39:38 -0800 Subject: [PATCH 0101/1565] fbdev: sh7760fb: allow modular build [ Upstream commit 51084f89d687e14d96278241e5200cde4b0985c7 ] There is no reason to prohibit sh7760fb from being built as a loadable module as suggested by Geert, so change the config symbol from bool to tristate to allow that and change the FB dependency as needed. Fixes: f75f71b2c418 ("fbdev/sh7760fb: Depend on FB=y") Suggested-by: Geert Uytterhoeven Signed-off-by: Randy Dunlap Cc: Thomas Zimmermann Cc: Javier Martinez Canillas Cc: John Paul Adrian Glaubitz Cc: Sam Ravnborg Cc: Helge Deller Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Acked-by: John Paul Adrian Glaubitz Acked-by: Javier Martinez Canillas Signed-off-by: Helge Deller Signed-off-by: Sasha Levin --- drivers/video/fbdev/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig index dd5958463097..9ad3e5157869 100644 --- a/drivers/video/fbdev/Kconfig +++ b/drivers/video/fbdev/Kconfig @@ -2013,8 +2013,8 @@ config FB_COBALT depends on FB && MIPS_COBALT config FB_SH7760 - bool "SH7760/SH7763/SH7720/SH7721 LCDC support" - depends on FB=y && (CPU_SUBTYPE_SH7760 || CPU_SUBTYPE_SH7763 \ + tristate "SH7760/SH7763/SH7720/SH7721 LCDC support" + depends on FB && (CPU_SUBTYPE_SH7760 || CPU_SUBTYPE_SH7763 \ || CPU_SUBTYPE_SH7720 || CPU_SUBTYPE_SH7721) select FB_CFB_FILLRECT select FB_CFB_COPYAREA From 4b68b861b514a5c09220d622ac3784c0ebac6c80 Mon Sep 17 00:00:00 2001 From: Zhipeng Lu Date: Thu, 18 Jan 2024 16:13:00 +0100 Subject: [PATCH 0102/1565] media: atomisp: ssh_css: Fix a null-pointer dereference in load_video_binaries [ Upstream commit 3b621e9e9e148c0928ab109ac3d4b81487469acb ] The allocation failure of mycs->yuv_scaler_binary in load_video_binaries() is followed with a dereference of mycs->yuv_scaler_binary after the following call chain: sh_css_pipe_load_binaries() |-> load_video_binaries(mycs->yuv_scaler_binary == NULL) | |-> sh_css_pipe_unload_binaries() |-> unload_video_binaries() In unload_video_binaries(), it calls to ia_css_binary_unload with argument &pipe->pipe_settings.video.yuv_scaler_binary[i], which refers to the same memory slot as mycs->yuv_scaler_binary. Thus, a null-pointer dereference is triggered. Link: https://lore.kernel.org/r/20240118151303.3828292-1-alexious@zju.edu.cn Fixes: a49d25364dfb ("staging/atomisp: Add support for the Intel IPU v2") Signed-off-by: Zhipeng Lu Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/staging/media/atomisp/pci/sh_css.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/media/atomisp/pci/sh_css.c b/drivers/staging/media/atomisp/pci/sh_css.c index 54a18921fbd1..cb0354520360 100644 --- a/drivers/staging/media/atomisp/pci/sh_css.c +++ b/drivers/staging/media/atomisp/pci/sh_css.c @@ -5477,6 +5477,7 @@ static int load_video_binaries(struct ia_css_pipe *pipe) mycs->yuv_scaler_binary = kzalloc(cas_scaler_descr.num_stage * sizeof(struct ia_css_binary), GFP_KERNEL); if (!mycs->yuv_scaler_binary) { + mycs->num_yuv_scaler = 0; err = -ENOMEM; return err; } From a5fa5b40a278a3ca978fed64707bd27614adb1eb Mon Sep 17 00:00:00 2001 From: Huai-Yuan Liu Date: Sun, 7 Apr 2024 14:30:53 +0800 Subject: [PATCH 0103/1565] drm/arm/malidp: fix a possible null pointer dereference [ Upstream commit a1f95aede6285dba6dd036d907196f35ae3a11ea ] In malidp_mw_connector_reset, new memory is allocated with kzalloc, but no check is performed. In order to prevent null pointer dereferencing, ensure that mw_state is checked before calling __drm_atomic_helper_connector_reset. Fixes: 8cbc5caf36ef ("drm: mali-dp: Add writeback connector") Signed-off-by: Huai-Yuan Liu Signed-off-by: Liviu Dudau Link: https://patchwork.freedesktop.org/patch/msgid/20240407063053.5481-1-qq810974084@gmail.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/arm/malidp_mw.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/arm/malidp_mw.c b/drivers/gpu/drm/arm/malidp_mw.c index 7d0e7b031e44..fa5e77ee3af8 100644 --- a/drivers/gpu/drm/arm/malidp_mw.c +++ b/drivers/gpu/drm/arm/malidp_mw.c @@ -70,7 +70,10 @@ static void malidp_mw_connector_reset(struct drm_connector *connector) __drm_atomic_helper_connector_destroy_state(connector->state); kfree(connector->state); - __drm_atomic_helper_connector_reset(connector, &mw_state->base); + connector->state = NULL; + + if (mw_state) + __drm_atomic_helper_connector_reset(connector, &mw_state->base); } static enum drm_connector_status From 2d9adecc88ab678785b581ab021f039372c324cb Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Tue, 9 Apr 2024 10:56:22 +0300 Subject: [PATCH 0104/1565] drm: vc4: Fix possible null pointer dereference [ Upstream commit c534b63bede6cb987c2946ed4d0b0013a52c5ba7 ] In vc4_hdmi_audio_init() of_get_address() may return NULL which is later dereferenced. Fix this bug by adding NULL check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: bb7d78568814 ("drm/vc4: Add HDMI audio support") Signed-off-by: Aleksandr Mishin Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/20240409075622.11783-1-amishin@t-argos.ru Signed-off-by: Sasha Levin --- drivers/gpu/drm/vc4/vc4_hdmi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 6d01258349fa..0bd49538cb6d 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -1253,6 +1253,8 @@ static int vc4_hdmi_audio_init(struct vc4_hdmi *vc4_hdmi) index = 1; addr = of_get_address(dev->of_node, index, NULL, NULL); + if (!addr) + return -EINVAL; vc4_hdmi->audio.dma_data.addr = be32_to_cpup(addr) + mai_data->offset; vc4_hdmi->audio.dma_data.addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES; From 3ce31a0e3705e18862310d251e9e43b95d4958b0 Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Tue, 16 Apr 2024 00:03:03 -0400 Subject: [PATCH 0105/1565] ASoC: tracing: Export SND_SOC_DAPM_DIR_OUT to its value [ Upstream commit 58300f8d6a48e58d1843199be743f819e2791ea3 ] The string SND_SOC_DAPM_DIR_OUT is printed in the snd_soc_dapm_path trace event instead of its value: (((REC->path_dir) == SND_SOC_DAPM_DIR_OUT) ? "->" : "<-") User space cannot parse this, as it has no idea what SND_SOC_DAPM_DIR_OUT is. Use TRACE_DEFINE_ENUM() to convert it to its value: (((REC->path_dir) == 1) ? "->" : "<-") So that user space tools, such as perf and trace-cmd, can parse it correctly. Reported-by: Luca Ceresoli Fixes: 6e588a0d839b5 ("ASoC: dapm: Consolidate path trace events") Signed-off-by: Steven Rostedt (Google) Link: https://lore.kernel.org/r/20240416000303.04670cdf@rorschach.local.home Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- include/trace/events/asoc.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/trace/events/asoc.h b/include/trace/events/asoc.h index 40c300fe704d..f62d5b702426 100644 --- a/include/trace/events/asoc.h +++ b/include/trace/events/asoc.h @@ -11,6 +11,8 @@ #define DAPM_DIRECT "(direct)" #define DAPM_ARROW(dir) (((dir) == SND_SOC_DAPM_DIR_OUT) ? "->" : "<-") +TRACE_DEFINE_ENUM(SND_SOC_DAPM_DIR_OUT); + struct snd_soc_jack; struct snd_soc_card; struct snd_soc_dapm_widget; From ee751403fb824253b6782330362c11cb6bf3cc6c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?N=C3=ADcolas=20F=2E=20R=2E=20A=2E=20Prado?= Date: Mon, 15 Apr 2024 17:49:32 -0400 Subject: [PATCH 0106/1565] drm/bridge: lt9611: Don't log an error when DSI host can't be found MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit cd0a2c6a081ff67007323725b9ff07d9934b1ed8 ] Given that failing to find a DSI host causes the driver to defer probe, make use of dev_err_probe() to log the reason. This makes the defer probe reason available and avoids alerting userspace about something that is not necessarily an error. Fixes: 23278bf54afe ("drm/bridge: Introduce LT9611 DSI to HDMI bridge") Suggested-by: AngeloGioacchino Del Regno Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Laurent Pinchart Signed-off-by: Nícolas F. R. A. Prado Reviewed-by: Dmitry Baryshkov Signed-off-by: Robert Foss Link: https://patchwork.freedesktop.org/patch/msgid/20240415-anx7625-defer-log-no-dsi-host-v3-4-619a28148e5c@collabora.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/bridge/lontium-lt9611.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c b/drivers/gpu/drm/bridge/lontium-lt9611.c index 660e05fa4a70..7f58ceda5b08 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c @@ -766,10 +766,8 @@ static struct mipi_dsi_device *lt9611_attach_dsi(struct lt9611 *lt9611, int ret; host = of_find_mipi_dsi_host_by_node(dsi_node); - if (!host) { - dev_err(lt9611->dev, "failed to find dsi host\n"); - return ERR_PTR(-EPROBE_DEFER); - } + if (!host) + return ERR_PTR(dev_err_probe(lt9611->dev, -EPROBE_DEFER, "failed to find dsi host\n")); dsi = mipi_dsi_device_register_full(host, &info); if (IS_ERR(dsi)) { From f10aa595ee4687844139164b94ef3139c55384ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?N=C3=ADcolas=20F=2E=20R=2E=20A=2E=20Prado?= Date: Mon, 15 Apr 2024 17:49:34 -0400 Subject: [PATCH 0107/1565] drm/bridge: tc358775: Don't log an error when DSI host can't be found MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 272377aa0e3dddeec3f568c8bb9d12c7a79d8ef5 ] Given that failing to find a DSI host causes the driver to defer probe, make use of dev_err_probe() to log the reason. This makes the defer probe reason available and avoids alerting userspace about something that is not necessarily an error. Fixes: b26975593b17 ("display/drm/bridge: TC358775 DSI/LVDS driver") Suggested-by: AngeloGioacchino Del Regno Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Laurent Pinchart Signed-off-by: Nícolas F. R. A. Prado Signed-off-by: Robert Foss Link: https://patchwork.freedesktop.org/patch/msgid/20240415-anx7625-defer-log-no-dsi-host-v3-6-619a28148e5c@collabora.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/bridge/tc358775.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/bridge/tc358775.c b/drivers/gpu/drm/bridge/tc358775.c index 2272adcc5b4a..55697fa4d7c8 100644 --- a/drivers/gpu/drm/bridge/tc358775.c +++ b/drivers/gpu/drm/bridge/tc358775.c @@ -605,10 +605,8 @@ static int tc_bridge_attach(struct drm_bridge *bridge, }; host = of_find_mipi_dsi_host_by_node(tc->host_node); - if (!host) { - dev_err(dev, "failed to find dsi host\n"); - return -EPROBE_DEFER; - } + if (!host) + return dev_err_probe(dev, -EPROBE_DEFER, "failed to find dsi host\n"); dsi = mipi_dsi_device_register_full(host, &info); if (IS_ERR(dsi)) { From 5d344b30893f70a0726721fdc44f3656f63fe5a2 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Thu, 28 Mar 2024 11:27:36 +0100 Subject: [PATCH 0108/1565] drm/panel: simple: Add missing Innolux G121X1-L03 format, flags, connector [ Upstream commit 11ac72d033b9f577e8ba0c7a41d1c312bb232593 ] The .bpc = 6 implies .bus_format = MEDIA_BUS_FMT_RGB666_1X7X3_SPWG , add the missing bus_format. Add missing connector type and bus_flags as well. Documentation [1] 1.4 GENERAL SPECIFICATI0NS indicates this panel is capable of both RGB 18bit/24bit panel, the current configuration uses 18bit mode, .bus_format = MEDIA_BUS_FMT_RGB666_1X7X3_SPWG , .bpc = 6. Support for the 24bit mode would require another entry in panel-simple with .bus_format = MEDIA_BUS_FMT_RGB666_1X7X4_SPWG and .bpc = 8, which is out of scope of this fix. [1] https://www.distec.de/fileadmin/pdf/produkte/TFT-Displays/Innolux/G121X1-L03_Datasheet.pdf Fixes: f8fa17ba812b ("drm/panel: simple: Add support for Innolux G121X1-L03") Signed-off-by: Marek Vasut Acked-by: Jessica Zhang Link: https://patchwork.freedesktop.org/patch/msgid/20240328102746.17868-2-marex@denx.de Signed-off-by: Sasha Levin --- drivers/gpu/drm/panel/panel-simple.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c index 51470020ba61..7797aad592a1 100644 --- a/drivers/gpu/drm/panel/panel-simple.c +++ b/drivers/gpu/drm/panel/panel-simple.c @@ -2235,6 +2235,9 @@ static const struct panel_desc innolux_g121x1_l03 = { .unprepare = 200, .disable = 400, }, + .bus_format = MEDIA_BUS_FMT_RGB666_1X7X3_SPWG, + .bus_flags = DRM_BUS_FLAG_DE_HIGH, + .connector_type = DRM_MODE_CONNECTOR_LVDS, }; /* From eb9635b4a94fb0685a7f01b73ad085d23e126c9e Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Mon, 8 Apr 2024 02:53:51 +0300 Subject: [PATCH 0109/1565] drm/mipi-dsi: use correct return type for the DSC functions [ Upstream commit de1c705c50326acaceaf1f02bc5bf6f267c572bd ] The functions mipi_dsi_compression_mode() and mipi_dsi_picture_parameter_set() return 0-or-error rather than a buffer size. Follow example of other similar MIPI DSI functions and use int return type instead of size_t. Fixes: f4dea1aaa9a1 ("drm/dsi: add helpers for DSI compression mode and PPS packets") Reviewed-by: Marijn Suijten Reviewed-by: Jessica Zhang Signed-off-by: Dmitry Baryshkov Link: https://patchwork.freedesktop.org/patch/msgid/20240408-lg-sw43408-panel-v5-2-4e092da22991@linaro.org Signed-off-by: Sasha Levin --- drivers/gpu/drm/drm_mipi_dsi.c | 6 +++--- include/drm/drm_mipi_dsi.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index 83918ac1f608..1e9842aac4dc 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -572,7 +572,7 @@ EXPORT_SYMBOL(mipi_dsi_set_maximum_return_packet_size); * * Return: 0 on success or a negative error code on failure. */ -ssize_t mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable) +int mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable) { /* Note: Needs updating for non-default PPS or algorithm */ u8 tx[2] = { enable << 0, 0 }; @@ -597,8 +597,8 @@ EXPORT_SYMBOL(mipi_dsi_compression_mode); * * Return: 0 on success or a negative error code on failure. */ -ssize_t mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, - const struct drm_dsc_picture_parameter_set *pps) +int mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, + const struct drm_dsc_picture_parameter_set *pps) { struct mipi_dsi_msg msg = { .channel = dsi->channel, diff --git a/include/drm/drm_mipi_dsi.h b/include/drm/drm_mipi_dsi.h index 3c0d1495c062..0dbed65c0ca5 100644 --- a/include/drm/drm_mipi_dsi.h +++ b/include/drm/drm_mipi_dsi.h @@ -231,9 +231,9 @@ int mipi_dsi_shutdown_peripheral(struct mipi_dsi_device *dsi); int mipi_dsi_turn_on_peripheral(struct mipi_dsi_device *dsi); int mipi_dsi_set_maximum_return_packet_size(struct mipi_dsi_device *dsi, u16 value); -ssize_t mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable); -ssize_t mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, - const struct drm_dsc_picture_parameter_set *pps); +int mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable); +int mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, + const struct drm_dsc_picture_parameter_set *pps); ssize_t mipi_dsi_generic_write(struct mipi_dsi_device *dsi, const void *payload, size_t size); From 3c3744c309e933ad4763fabd14695e8da28d5caf Mon Sep 17 00:00:00 2001 From: Xi Wang Date: Sat, 14 Nov 2020 17:58:36 +0800 Subject: [PATCH 0110/1565] RDMA/hns: Refactor the hns_roce_buf allocation flow [ Upstream commit 6f6e2dcbb82b9b2ea304fe32635789fedd4e9868 ] Add a group of flags to control the 'struct hns_roce_buf' allocation flow, this is used to support the caller running in atomic context. Link: https://lore.kernel.org/r/1605347916-15964-1-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang Signed-off-by: Weihang Li Signed-off-by: Jason Gunthorpe Stable-dep-of: 203b70fda634 ("RDMA/hns: Fix return value in hns_roce_map_mr_sg") Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_alloc.c | 128 +++++++++++--------- drivers/infiniband/hw/hns/hns_roce_device.h | 51 ++++---- drivers/infiniband/hw/hns/hns_roce_mr.c | 39 ++---- 3 files changed, 113 insertions(+), 105 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c index 5b2baf89d110..4bcaaa0524b1 100644 --- a/drivers/infiniband/hw/hns/hns_roce_alloc.c +++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c @@ -159,76 +159,96 @@ void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap) void hns_roce_buf_free(struct hns_roce_dev *hr_dev, struct hns_roce_buf *buf) { - struct device *dev = hr_dev->dev; - u32 size = buf->size; - int i; + struct hns_roce_buf_list *trunks; + u32 i; - if (size == 0) + if (!buf) return; - buf->size = 0; + trunks = buf->trunk_list; + if (trunks) { + buf->trunk_list = NULL; + for (i = 0; i < buf->ntrunks; i++) + dma_free_coherent(hr_dev->dev, 1 << buf->trunk_shift, + trunks[i].buf, trunks[i].map); - if (hns_roce_buf_is_direct(buf)) { - dma_free_coherent(dev, size, buf->direct.buf, buf->direct.map); - } else { - for (i = 0; i < buf->npages; ++i) - if (buf->page_list[i].buf) - dma_free_coherent(dev, 1 << buf->page_shift, - buf->page_list[i].buf, - buf->page_list[i].map); - kfree(buf->page_list); - buf->page_list = NULL; + kfree(trunks); } + + kfree(buf); } -int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, u32 max_direct, - struct hns_roce_buf *buf, u32 page_shift) +/* + * Allocate the dma buffer for storing ROCEE table entries + * + * @size: required size + * @page_shift: the unit size in a continuous dma address range + * @flags: HNS_ROCE_BUF_ flags to control the allocation flow. + */ +struct hns_roce_buf *hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, + u32 page_shift, u32 flags) { - struct hns_roce_buf_list *buf_list; - struct device *dev = hr_dev->dev; - u32 page_size; - int i; + u32 trunk_size, page_size, alloced_size; + struct hns_roce_buf_list *trunks; + struct hns_roce_buf *buf; + gfp_t gfp_flags; + u32 ntrunk, i; /* The minimum shift of the page accessed by hw is HNS_HW_PAGE_SHIFT */ - buf->page_shift = max_t(int, HNS_HW_PAGE_SHIFT, page_shift); + if (WARN_ON(page_shift < HNS_HW_PAGE_SHIFT)) + return ERR_PTR(-EINVAL); + gfp_flags = (flags & HNS_ROCE_BUF_NOSLEEP) ? GFP_ATOMIC : GFP_KERNEL; + buf = kzalloc(sizeof(*buf), gfp_flags); + if (!buf) + return ERR_PTR(-ENOMEM); + + buf->page_shift = page_shift; page_size = 1 << buf->page_shift; - buf->npages = DIV_ROUND_UP(size, page_size); - /* required size is not bigger than one trunk size */ - if (size <= max_direct) { - buf->page_list = NULL; - buf->direct.buf = dma_alloc_coherent(dev, size, - &buf->direct.map, - GFP_KERNEL); - if (!buf->direct.buf) - return -ENOMEM; + /* Calc the trunk size and num by required size and page_shift */ + if (flags & HNS_ROCE_BUF_DIRECT) { + buf->trunk_shift = ilog2(ALIGN(size, PAGE_SIZE)); + ntrunk = 1; } else { - buf_list = kcalloc(buf->npages, sizeof(*buf_list), GFP_KERNEL); - if (!buf_list) - return -ENOMEM; - - for (i = 0; i < buf->npages; i++) { - buf_list[i].buf = dma_alloc_coherent(dev, page_size, - &buf_list[i].map, - GFP_KERNEL); - if (!buf_list[i].buf) - break; - } - - if (i != buf->npages && i > 0) { - while (i-- > 0) - dma_free_coherent(dev, page_size, - buf_list[i].buf, - buf_list[i].map); - kfree(buf_list); - return -ENOMEM; - } - buf->page_list = buf_list; + buf->trunk_shift = ilog2(ALIGN(page_size, PAGE_SIZE)); + ntrunk = DIV_ROUND_UP(size, 1 << buf->trunk_shift); } - buf->size = size; - return 0; + trunks = kcalloc(ntrunk, sizeof(*trunks), gfp_flags); + if (!trunks) { + kfree(buf); + return ERR_PTR(-ENOMEM); + } + + trunk_size = 1 << buf->trunk_shift; + alloced_size = 0; + for (i = 0; i < ntrunk; i++) { + trunks[i].buf = dma_alloc_coherent(hr_dev->dev, trunk_size, + &trunks[i].map, gfp_flags); + if (!trunks[i].buf) + break; + + alloced_size += trunk_size; + } + + buf->ntrunks = i; + + /* In nofail mode, it's only failed when the alloced size is 0 */ + if ((flags & HNS_ROCE_BUF_NOFAIL) ? i == 0 : i != ntrunk) { + for (i = 0; i < buf->ntrunks; i++) + dma_free_coherent(hr_dev->dev, trunk_size, + trunks[i].buf, trunks[i].map); + + kfree(trunks); + kfree(buf); + return ERR_PTR(-ENOMEM); + } + + buf->npages = DIV_ROUND_UP(alloced_size, page_size); + buf->trunk_list = trunks; + + return buf; } int hns_roce_get_kmem_bufs(struct hns_roce_dev *hr_dev, dma_addr_t *bufs, diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index 09b5e4935c2c..e1f5b9259682 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -263,9 +263,6 @@ enum { #define HNS_HW_PAGE_SHIFT 12 #define HNS_HW_PAGE_SIZE (1 << HNS_HW_PAGE_SHIFT) -/* The minimum page count for hardware access page directly. */ -#define HNS_HW_DIRECT_PAGE_COUNT 2 - struct hns_roce_uar { u64 pfn; unsigned long index; @@ -417,11 +414,26 @@ struct hns_roce_buf_list { dma_addr_t map; }; +/* + * %HNS_ROCE_BUF_DIRECT indicates that the all memory must be in a continuous + * dma address range. + * + * %HNS_ROCE_BUF_NOSLEEP indicates that the caller cannot sleep. + * + * %HNS_ROCE_BUF_NOFAIL allocation only failed when allocated size is zero, even + * the allocated size is smaller than the required size. + */ +enum { + HNS_ROCE_BUF_DIRECT = BIT(0), + HNS_ROCE_BUF_NOSLEEP = BIT(1), + HNS_ROCE_BUF_NOFAIL = BIT(2), +}; + struct hns_roce_buf { - struct hns_roce_buf_list direct; - struct hns_roce_buf_list *page_list; + struct hns_roce_buf_list *trunk_list; + u32 ntrunks; u32 npages; - u32 size; + unsigned int trunk_shift; unsigned int page_shift; }; @@ -1067,29 +1079,18 @@ static inline struct hns_roce_qp return xa_load(&hr_dev->qp_table_xa, qpn & (hr_dev->caps.num_qps - 1)); } -static inline bool hns_roce_buf_is_direct(struct hns_roce_buf *buf) -{ - if (buf->page_list) - return false; - - return true; -} - static inline void *hns_roce_buf_offset(struct hns_roce_buf *buf, int offset) { - if (hns_roce_buf_is_direct(buf)) - return (char *)(buf->direct.buf) + (offset & (buf->size - 1)); - - return (char *)(buf->page_list[offset >> buf->page_shift].buf) + - (offset & ((1 << buf->page_shift) - 1)); + return (char *)(buf->trunk_list[offset >> buf->trunk_shift].buf) + + (offset & ((1 << buf->trunk_shift) - 1)); } static inline dma_addr_t hns_roce_buf_page(struct hns_roce_buf *buf, int idx) { - if (hns_roce_buf_is_direct(buf)) - return buf->direct.map + ((dma_addr_t)idx << buf->page_shift); - else - return buf->page_list[idx].map; + int offset = idx << buf->page_shift; + + return buf->trunk_list[offset >> buf->trunk_shift].map + + (offset & ((1 << buf->trunk_shift) - 1)); } #define hr_hw_page_align(x) ALIGN(x, 1 << HNS_HW_PAGE_SHIFT) @@ -1221,8 +1222,8 @@ int hns_roce_alloc_mw(struct ib_mw *mw, struct ib_udata *udata); int hns_roce_dealloc_mw(struct ib_mw *ibmw); void hns_roce_buf_free(struct hns_roce_dev *hr_dev, struct hns_roce_buf *buf); -int hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, u32 max_direct, - struct hns_roce_buf *buf, u32 page_shift); +struct hns_roce_buf *hns_roce_buf_alloc(struct hns_roce_dev *hr_dev, u32 size, + u32 page_shift, u32 flags); int hns_roce_get_kmem_bufs(struct hns_roce_dev *hr_dev, dma_addr_t *bufs, int buf_cnt, int start, struct hns_roce_buf *buf); diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index d5b3b10e0a80..13c7ac6ea312 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -694,15 +694,6 @@ static inline size_t mtr_bufs_size(struct hns_roce_buf_attr *attr) return size; } -static inline size_t mtr_kmem_direct_size(bool is_direct, size_t alloc_size, - unsigned int page_shift) -{ - if (is_direct) - return ALIGN(alloc_size, 1 << page_shift); - else - return HNS_HW_DIRECT_PAGE_COUNT << page_shift; -} - /* * check the given pages in continuous address space * Returns 0 on success, or the error page num. @@ -731,7 +722,6 @@ static void mtr_free_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr) /* release kernel buffers */ if (mtr->kmem) { hns_roce_buf_free(hr_dev, mtr->kmem); - kfree(mtr->kmem); mtr->kmem = NULL; } } @@ -743,13 +733,12 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, struct ib_device *ibdev = &hr_dev->ib_dev; unsigned int best_pg_shift; int all_pg_count = 0; - size_t direct_size; size_t total_size; int ret; total_size = mtr_bufs_size(buf_attr); if (total_size < 1) { - ibdev_err(ibdev, "Failed to check mtr size\n"); + ibdev_err(ibdev, "failed to check mtr size\n."); return -EINVAL; } @@ -761,7 +750,7 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, mtr->umem = ib_umem_get(ibdev, user_addr, total_size, buf_attr->user_access); if (IS_ERR_OR_NULL(mtr->umem)) { - ibdev_err(ibdev, "Failed to get umem, ret %ld\n", + ibdev_err(ibdev, "failed to get umem, ret = %ld.\n", PTR_ERR(mtr->umem)); return -ENOMEM; } @@ -779,19 +768,16 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, ret = 0; } else { mtr->umem = NULL; - mtr->kmem = kzalloc(sizeof(*mtr->kmem), GFP_KERNEL); - if (!mtr->kmem) { - ibdev_err(ibdev, "Failed to alloc kmem\n"); - return -ENOMEM; - } - direct_size = mtr_kmem_direct_size(is_direct, total_size, - buf_attr->page_shift); - ret = hns_roce_buf_alloc(hr_dev, total_size, direct_size, - mtr->kmem, buf_attr->page_shift); - if (ret) { - ibdev_err(ibdev, "Failed to alloc kmem, ret %d\n", ret); - goto err_alloc_mem; + mtr->kmem = + hns_roce_buf_alloc(hr_dev, total_size, + buf_attr->page_shift, + is_direct ? HNS_ROCE_BUF_DIRECT : 0); + if (IS_ERR(mtr->kmem)) { + ibdev_err(ibdev, "failed to alloc kmem, ret = %ld.\n", + PTR_ERR(mtr->kmem)); + return PTR_ERR(mtr->kmem); } + best_pg_shift = buf_attr->page_shift; all_pg_count = mtr->kmem->npages; } @@ -799,7 +785,8 @@ static int mtr_alloc_bufs(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, /* must bigger than minimum hardware page shift */ if (best_pg_shift < HNS_HW_PAGE_SHIFT || all_pg_count < 1) { ret = -EINVAL; - ibdev_err(ibdev, "Failed to check mtr page shift %d count %d\n", + ibdev_err(ibdev, + "failed to check mtr, page shift = %u count = %d.\n", best_pg_shift, all_pg_count); goto err_alloc_mem; } From e612e695d3a5415fe191d8ee6fbf6915bb11d20d Mon Sep 17 00:00:00 2001 From: Yangyang Li Date: Tue, 24 Nov 2020 20:24:09 +0800 Subject: [PATCH 0111/1565] RDMA/hns: Create QP with selected QPN for bank load balance [ Upstream commit 71586dd2001087e89e344e2c7dcee6b4a53bb6de ] In order to improve performance by balancing the load between different banks of cache, the QPC cache is desigend to choose one of 8 banks according to lower 3 bits of QPN. The hns driver needs to count the number of QP on each bank and then assigns the QP being created to the bank with the minimum load first. Link: https://lore.kernel.org/r/1606220649-1465-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li Signed-off-by: Weihang Li Signed-off-by: Jason Gunthorpe Stable-dep-of: 203b70fda634 ("RDMA/hns: Fix return value in hns_roce_map_mr_sg") Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_device.h | 15 ++- drivers/infiniband/hw/hns/hns_roce_qp.c | 100 ++++++++++++++++---- 2 files changed, 96 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index e1f5b9259682..5f1617356661 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -115,6 +115,8 @@ #define HNS_ROCE_IDX_QUE_ENTRY_SZ 4 #define SRQ_DB_REG 0x230 +#define HNS_ROCE_QP_BANK_NUM 8 + /* The chip implementation of the consumer index is calculated * according to twice the actual EQ depth */ @@ -520,13 +522,22 @@ struct hns_roce_uar_table { struct hns_roce_bitmap bitmap; }; +struct hns_roce_bank { + struct ida ida; + u32 inuse; /* Number of IDs allocated */ + u32 min; /* Lowest ID to allocate. */ + u32 max; /* Highest ID to allocate. */ + u32 next; /* Next ID to allocate. */ +}; + struct hns_roce_qp_table { - struct hns_roce_bitmap bitmap; struct hns_roce_hem_table qp_table; struct hns_roce_hem_table irrl_table; struct hns_roce_hem_table trrl_table; struct hns_roce_hem_table sccc_table; struct mutex scc_mutex; + struct hns_roce_bank bank[HNS_ROCE_QP_BANK_NUM]; + spinlock_t bank_lock; }; struct hns_roce_cq_table { @@ -776,7 +787,7 @@ struct hns_roce_caps { u32 max_rq_sg; u32 max_extend_sg; int num_qps; - int reserved_qps; + u32 reserved_qps; int num_qpc_timer; int num_cqc_timer; int num_srqs; diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index d1c07f1f8fe9..8323de2c4bf9 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -154,9 +154,50 @@ static void hns_roce_ib_qp_event(struct hns_roce_qp *hr_qp, } } +static u8 get_least_load_bankid_for_qp(struct hns_roce_bank *bank) +{ + u32 least_load = bank[0].inuse; + u8 bankid = 0; + u32 bankcnt; + u8 i; + + for (i = 1; i < HNS_ROCE_QP_BANK_NUM; i++) { + bankcnt = bank[i].inuse; + if (bankcnt < least_load) { + least_load = bankcnt; + bankid = i; + } + } + + return bankid; +} + +static int alloc_qpn_with_bankid(struct hns_roce_bank *bank, u8 bankid, + unsigned long *qpn) +{ + int id; + + id = ida_alloc_range(&bank->ida, bank->next, bank->max, GFP_KERNEL); + if (id < 0) { + id = ida_alloc_range(&bank->ida, bank->min, bank->max, + GFP_KERNEL); + if (id < 0) + return id; + } + + /* the QPN should keep increasing until the max value is reached. */ + bank->next = (id + 1) > bank->max ? bank->min : id + 1; + + /* the lower 3 bits is bankid */ + *qpn = (id << 3) | bankid; + + return 0; +} static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) { + struct hns_roce_qp_table *qp_table = &hr_dev->qp_table; unsigned long num = 0; + u8 bankid; int ret; if (hr_qp->ibqp.qp_type == IB_QPT_GSI) { @@ -169,13 +210,21 @@ static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) hr_qp->doorbell_qpn = 1; } else { - ret = hns_roce_bitmap_alloc_range(&hr_dev->qp_table.bitmap, - 1, 1, &num); + spin_lock(&qp_table->bank_lock); + bankid = get_least_load_bankid_for_qp(qp_table->bank); + + ret = alloc_qpn_with_bankid(&qp_table->bank[bankid], bankid, + &num); if (ret) { - ibdev_err(&hr_dev->ib_dev, "Failed to alloc bitmap\n"); - return -ENOMEM; + ibdev_err(&hr_dev->ib_dev, + "failed to alloc QPN, ret = %d\n", ret); + spin_unlock(&qp_table->bank_lock); + return ret; } + qp_table->bank[bankid].inuse++; + spin_unlock(&qp_table->bank_lock); + hr_qp->doorbell_qpn = (u32)num; } @@ -340,9 +389,15 @@ static void free_qpc(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) hns_roce_table_put(hr_dev, &qp_table->irrl_table, hr_qp->qpn); } +static inline u8 get_qp_bankid(unsigned long qpn) +{ + /* The lower 3 bits of QPN are used to hash to different banks */ + return (u8)(qpn & GENMASK(2, 0)); +} + static void free_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) { - struct hns_roce_qp_table *qp_table = &hr_dev->qp_table; + u8 bankid; if (hr_qp->ibqp.qp_type == IB_QPT_GSI) return; @@ -350,7 +405,13 @@ static void free_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) if (hr_qp->qpn < hr_dev->caps.reserved_qps) return; - hns_roce_bitmap_free_range(&qp_table->bitmap, hr_qp->qpn, 1, BITMAP_RR); + bankid = get_qp_bankid(hr_qp->qpn); + + ida_free(&hr_dev->qp_table.bank[bankid].ida, hr_qp->qpn >> 3); + + spin_lock(&hr_dev->qp_table.bank_lock); + hr_dev->qp_table.bank[bankid].inuse--; + spin_unlock(&hr_dev->qp_table.bank_lock); } static int set_rq_size(struct hns_roce_dev *hr_dev, struct ib_qp_cap *cap, @@ -1293,22 +1354,24 @@ bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, int nreq, int hns_roce_init_qp_table(struct hns_roce_dev *hr_dev) { struct hns_roce_qp_table *qp_table = &hr_dev->qp_table; - int reserved_from_top = 0; - int reserved_from_bot; - int ret; + unsigned int reserved_from_bot; + unsigned int i; mutex_init(&qp_table->scc_mutex); xa_init(&hr_dev->qp_table_xa); reserved_from_bot = hr_dev->caps.reserved_qps; - ret = hns_roce_bitmap_init(&qp_table->bitmap, hr_dev->caps.num_qps, - hr_dev->caps.num_qps - 1, reserved_from_bot, - reserved_from_top); - if (ret) { - dev_err(hr_dev->dev, "qp bitmap init failed!error=%d\n", - ret); - return ret; + for (i = 0; i < reserved_from_bot; i++) { + hr_dev->qp_table.bank[get_qp_bankid(i)].inuse++; + hr_dev->qp_table.bank[get_qp_bankid(i)].min++; + } + + for (i = 0; i < HNS_ROCE_QP_BANK_NUM; i++) { + ida_init(&hr_dev->qp_table.bank[i].ida); + hr_dev->qp_table.bank[i].max = hr_dev->caps.num_qps / + HNS_ROCE_QP_BANK_NUM - 1; + hr_dev->qp_table.bank[i].next = hr_dev->qp_table.bank[i].min; } return 0; @@ -1316,5 +1379,8 @@ int hns_roce_init_qp_table(struct hns_roce_dev *hr_dev) void hns_roce_cleanup_qp_table(struct hns_roce_dev *hr_dev) { - hns_roce_bitmap_cleanup(&hr_dev->qp_table.bitmap); + int i; + + for (i = 0; i < HNS_ROCE_QP_BANK_NUM; i++) + ida_destroy(&hr_dev->qp_table.bank[i].ida); } From 4c91ad5ed5636cbcc2e3830900cf4c6d8a36070b Mon Sep 17 00:00:00 2001 From: Wenpeng Liang Date: Fri, 11 Dec 2020 09:37:35 +0800 Subject: [PATCH 0112/1565] RDMA/hns: Fix incorrect symbol types [ Upstream commit dcdc366acf8ffc29f091a09e08b4e46caa0a0f21 ] Types of some fields, variables and parameters of some functions should be unsigned. Link: https://lore.kernel.org/r/1607650657-35992-10-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang Signed-off-by: Weihang Li Signed-off-by: Jason Gunthorpe Stable-dep-of: 203b70fda634 ("RDMA/hns: Fix return value in hns_roce_map_mr_sg") Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_cmd.c | 10 ++-- drivers/infiniband/hw/hns/hns_roce_cmd.h | 2 +- drivers/infiniband/hw/hns/hns_roce_common.h | 14 +++--- drivers/infiniband/hw/hns/hns_roce_db.c | 8 ++-- drivers/infiniband/hw/hns/hns_roce_device.h | 53 +++++++++++---------- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 8 ++-- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 30 ++++++------ drivers/infiniband/hw/hns/hns_roce_main.c | 2 +- drivers/infiniband/hw/hns/hns_roce_mr.c | 11 ++--- drivers/infiniband/hw/hns/hns_roce_qp.c | 8 ++-- 10 files changed, 73 insertions(+), 73 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c index c493d7644b57..339e3fd98b0b 100644 --- a/drivers/infiniband/hw/hns/hns_roce_cmd.c +++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c @@ -60,7 +60,7 @@ static int hns_roce_cmd_mbox_post_hw(struct hns_roce_dev *hr_dev, u64 in_param, static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, u8 op_modifier, u16 op, - unsigned long timeout) + unsigned int timeout) { struct device *dev = hr_dev->dev; int ret; @@ -78,7 +78,7 @@ static int __hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param, static int hns_roce_cmd_mbox_poll(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, - u8 op_modifier, u16 op, unsigned long timeout) + u8 op_modifier, u16 op, unsigned int timeout) { int ret; @@ -108,7 +108,7 @@ void hns_roce_cmd_event(struct hns_roce_dev *hr_dev, u16 token, u8 status, static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, u8 op_modifier, u16 op, - unsigned long timeout) + unsigned int timeout) { struct hns_roce_cmdq *cmd = &hr_dev->cmd; struct hns_roce_cmd_context *context; @@ -159,7 +159,7 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param, static int hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, - u8 op_modifier, u16 op, unsigned long timeout) + u8 op_modifier, u16 op, unsigned int timeout) { int ret; @@ -173,7 +173,7 @@ static int hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param, int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, u8 op_modifier, u16 op, - unsigned long timeout) + unsigned int timeout) { int ret; diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h index 8e63b827f28c..8025e7f657fa 100644 --- a/drivers/infiniband/hw/hns/hns_roce_cmd.h +++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h @@ -141,7 +141,7 @@ enum { int hns_roce_cmd_mbox(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, unsigned long in_modifier, u8 op_modifier, u16 op, - unsigned long timeout); + unsigned int timeout); struct hns_roce_cmd_mailbox * hns_roce_alloc_cmd_mailbox(struct hns_roce_dev *hr_dev); diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h b/drivers/infiniband/hw/hns/hns_roce_common.h index f5669ff8cfeb..e57e812b1546 100644 --- a/drivers/infiniband/hw/hns/hns_roce_common.h +++ b/drivers/infiniband/hw/hns/hns_roce_common.h @@ -38,19 +38,19 @@ #define roce_raw_write(value, addr) \ __raw_writel((__force u32)cpu_to_le32(value), (addr)) -#define roce_get_field(origin, mask, shift) \ - (((le32_to_cpu(origin)) & (mask)) >> (shift)) +#define roce_get_field(origin, mask, shift) \ + ((le32_to_cpu(origin) & (mask)) >> (u32)(shift)) #define roce_get_bit(origin, shift) \ roce_get_field((origin), (1ul << (shift)), (shift)) -#define roce_set_field(origin, mask, shift, val) \ - do { \ - (origin) &= ~cpu_to_le32(mask); \ - (origin) |= cpu_to_le32(((u32)(val) << (shift)) & (mask)); \ +#define roce_set_field(origin, mask, shift, val) \ + do { \ + (origin) &= ~cpu_to_le32(mask); \ + (origin) |= cpu_to_le32(((u32)(val) << (u32)(shift)) & (mask)); \ } while (0) -#define roce_set_bit(origin, shift, val) \ +#define roce_set_bit(origin, shift, val) \ roce_set_field((origin), (1ul << (shift)), (shift), (val)) #define ROCEE_GLB_CFG_ROCEE_DB_SQ_MODE_S 3 diff --git a/drivers/infiniband/hw/hns/hns_roce_db.c b/drivers/infiniband/hw/hns/hns_roce_db.c index bff6abdccfb0..5cb7376ce978 100644 --- a/drivers/infiniband/hw/hns/hns_roce_db.c +++ b/drivers/infiniband/hw/hns/hns_roce_db.c @@ -95,8 +95,8 @@ static struct hns_roce_db_pgdir *hns_roce_alloc_db_pgdir( static int hns_roce_alloc_db_from_pgdir(struct hns_roce_db_pgdir *pgdir, struct hns_roce_db *db, int order) { - int o; - int i; + unsigned long o; + unsigned long i; for (o = order; o <= 1; ++o) { i = find_first_bit(pgdir->bits[o], HNS_ROCE_DB_PER_PAGE >> o); @@ -154,8 +154,8 @@ int hns_roce_alloc_db(struct hns_roce_dev *hr_dev, struct hns_roce_db *db, void hns_roce_free_db(struct hns_roce_dev *hr_dev, struct hns_roce_db *db) { - int o; - int i; + unsigned long o; + unsigned long i; mutex_lock(&hr_dev->pgdir_mutex); diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index 5f1617356661..db78bc2d44c2 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -315,7 +315,7 @@ struct hns_roce_hem_table { }; struct hns_roce_buf_region { - int offset; /* page offset */ + u32 offset; /* page offset */ u32 count; /* page count */ int hopnum; /* addressing hop num */ }; @@ -335,10 +335,10 @@ struct hns_roce_buf_attr { size_t size; /* region size */ int hopnum; /* multi-hop addressing hop num */ } region[HNS_ROCE_MAX_BT_REGION]; - int region_count; /* valid region count */ + unsigned int region_count; /* valid region count */ unsigned int page_shift; /* buffer page shift */ bool fixed_page; /* decide page shift is fixed-size or maximum size */ - int user_access; /* umem access flag */ + unsigned int user_access; /* umem access flag */ bool mtt_only; /* only alloc buffer-required MTT memory */ }; @@ -349,7 +349,7 @@ struct hns_roce_hem_cfg { unsigned int buf_pg_shift; /* buffer page shift */ unsigned int buf_pg_count; /* buffer page count */ struct hns_roce_buf_region region[HNS_ROCE_MAX_BT_REGION]; - int region_count; + unsigned int region_count; }; /* memory translate region */ @@ -397,7 +397,7 @@ struct hns_roce_wq { u64 *wrid; /* Work request ID */ spinlock_t lock; u32 wqe_cnt; /* WQE num */ - int max_gs; + u32 max_gs; int offset; int wqe_shift; /* WQE size */ u32 head; @@ -463,8 +463,8 @@ struct hns_roce_db { } u; dma_addr_t dma; void *virt_addr; - int index; - int order; + unsigned long index; + unsigned long order; }; struct hns_roce_cq { @@ -512,8 +512,8 @@ struct hns_roce_srq { u64 *wrid; struct hns_roce_idx_que idx_que; spinlock_t lock; - int head; - int tail; + u16 head; + u16 tail; struct mutex mutex; void (*event)(struct hns_roce_srq *srq, enum hns_roce_event event); }; @@ -751,11 +751,11 @@ struct hns_roce_eq { int type_flag; /* Aeq:1 ceq:0 */ int eqn; u32 entries; - int log_entries; + u32 log_entries; int eqe_size; int irq; int log_page_size; - int cons_index; + u32 cons_index; struct hns_roce_buf_list *buf_list; int over_ignore; int coalesce; @@ -763,7 +763,7 @@ struct hns_roce_eq { int hop_num; struct hns_roce_mtr mtr; u16 eq_max_cnt; - int eq_period; + u32 eq_period; int shift; int event_type; int sub_type; @@ -786,8 +786,8 @@ struct hns_roce_caps { u32 max_sq_inline; u32 max_rq_sg; u32 max_extend_sg; - int num_qps; - u32 reserved_qps; + u32 num_qps; + u32 reserved_qps; int num_qpc_timer; int num_cqc_timer; int num_srqs; @@ -799,7 +799,7 @@ struct hns_roce_caps { u32 max_srq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int num_cqs; + u32 num_cqs; u32 max_cqes; u32 min_cqes; u32 min_wqes; @@ -808,7 +808,7 @@ struct hns_roce_caps { int num_aeq_vectors; int num_comp_vectors; int num_other_vectors; - int num_mtpts; + u32 num_mtpts; u32 num_mtt_segs; u32 num_cqe_segs; u32 num_srqwqe_segs; @@ -919,7 +919,7 @@ struct hns_roce_hw { int (*post_mbox)(struct hns_roce_dev *hr_dev, u64 in_param, u64 out_param, u32 in_modifier, u8 op_modifier, u16 op, u16 token, int event); - int (*chk_mbox)(struct hns_roce_dev *hr_dev, unsigned long timeout); + int (*chk_mbox)(struct hns_roce_dev *hr_dev, unsigned int timeout); int (*rst_prc_mbox)(struct hns_roce_dev *hr_dev); int (*set_gid)(struct hns_roce_dev *hr_dev, u8 port, int gid_index, const union ib_gid *gid, const struct ib_gid_attr *attr); @@ -1090,15 +1090,16 @@ static inline struct hns_roce_qp return xa_load(&hr_dev->qp_table_xa, qpn & (hr_dev->caps.num_qps - 1)); } -static inline void *hns_roce_buf_offset(struct hns_roce_buf *buf, int offset) +static inline void *hns_roce_buf_offset(struct hns_roce_buf *buf, + unsigned int offset) { return (char *)(buf->trunk_list[offset >> buf->trunk_shift].buf) + (offset & ((1 << buf->trunk_shift) - 1)); } -static inline dma_addr_t hns_roce_buf_page(struct hns_roce_buf *buf, int idx) +static inline dma_addr_t hns_roce_buf_page(struct hns_roce_buf *buf, u32 idx) { - int offset = idx << buf->page_shift; + unsigned int offset = idx << buf->page_shift; return buf->trunk_list[offset >> buf->trunk_shift].map + (offset & ((1 << buf->trunk_shift) - 1)); @@ -1173,7 +1174,7 @@ int hns_roce_mtr_create(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, void hns_roce_mtr_destroy(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr); int hns_roce_mtr_map(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, - dma_addr_t *pages, int page_cnt); + dma_addr_t *pages, unsigned int page_cnt); int hns_roce_init_pd_table(struct hns_roce_dev *hr_dev); int hns_roce_init_mr_table(struct hns_roce_dev *hr_dev); @@ -1256,10 +1257,10 @@ struct ib_qp *hns_roce_create_qp(struct ib_pd *ib_pd, int hns_roce_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata); void init_flush_work(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp); -void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, int n); -void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, int n); -void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, int n); -bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, int nreq, +void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, unsigned int n); +void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, unsigned int n); +void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, unsigned int n); +bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, u32 nreq, struct ib_cq *ib_cq); enum hns_roce_qp_state to_hns_roce_state(enum ib_qp_state state); void hns_roce_lock_cqs(struct hns_roce_cq *send_cq, @@ -1289,7 +1290,7 @@ void hns_roce_cq_completion(struct hns_roce_dev *hr_dev, u32 cqn); void hns_roce_cq_event(struct hns_roce_dev *hr_dev, u32 cqn, int event_type); void hns_roce_qp_event(struct hns_roce_dev *hr_dev, u32 qpn, int event_type); void hns_roce_srq_event(struct hns_roce_dev *hr_dev, u32 srqn, int event_type); -int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index); +u8 hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index); void hns_roce_handle_device_err(struct hns_roce_dev *hr_dev); int hns_roce_init(struct hns_roce_dev *hr_dev); void hns_roce_exit(struct hns_roce_dev *hr_dev); diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c index 6f9b024d4ff7..ea8d7154d0e1 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -288,7 +288,7 @@ static int hns_roce_v1_post_send(struct ib_qp *ibqp, ret = -EINVAL; *bad_wr = wr; dev_err(dev, "inline len(1-%d)=%d, illegal", - ctrl->msg_length, + le32_to_cpu(ctrl->msg_length), hr_dev->caps.max_sq_inline); goto out; } @@ -1715,7 +1715,7 @@ static int hns_roce_v1_post_mbox(struct hns_roce_dev *hr_dev, u64 in_param, } static int hns_roce_v1_chk_mbox(struct hns_roce_dev *hr_dev, - unsigned long timeout) + unsigned int timeout) { u8 __iomem *hcr = hr_dev->reg_base + ROCEE_MB1_REG; unsigned long end; @@ -3674,10 +3674,10 @@ static int hns_roce_v1_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata) return 0; } -static void set_eq_cons_index_v1(struct hns_roce_eq *eq, int req_not) +static void set_eq_cons_index_v1(struct hns_roce_eq *eq, u32 req_not) { roce_raw_write((eq->cons_index & HNS_ROCE_V1_CONS_IDX_M) | - (req_not << eq->log_entries), eq->doorbell); + (req_not << eq->log_entries), eq->doorbell); } static void hns_roce_v1_wq_catas_err_handle(struct hns_roce_dev *hr_dev, diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 518b38e9158d..10ffad061f94 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -650,7 +650,7 @@ static int hns_roce_v2_post_send(struct ib_qp *ibqp, unsigned int sge_idx; unsigned int wqe_idx; void *wqe = NULL; - int nreq; + u32 nreq; int ret; spin_lock_irqsave(&qp->sq.lock, flags); @@ -828,7 +828,7 @@ static void *get_srq_wqe(struct hns_roce_srq *srq, int n) return hns_roce_buf_offset(srq->buf_mtr.kmem, n << srq->wqe_shift); } -static void *get_idx_buf(struct hns_roce_idx_que *idx_que, int n) +static void *get_idx_buf(struct hns_roce_idx_que *idx_que, unsigned int n) { return hns_roce_buf_offset(idx_que->mtr.kmem, n << idx_que->entry_shift); @@ -869,12 +869,12 @@ static int hns_roce_v2_post_srq_recv(struct ib_srq *ibsrq, struct hns_roce_v2_wqe_data_seg *dseg; struct hns_roce_v2_db srq_db; unsigned long flags; + unsigned int ind; __le32 *srq_idx; int ret = 0; int wqe_idx; void *wqe; int nreq; - int ind; int i; spin_lock_irqsave(&srq->lock, flags); @@ -1128,7 +1128,7 @@ static void hns_roce_cmq_init_regs(struct hns_roce_dev *hr_dev, bool ring_type) roce_write(hr_dev, ROCEE_TX_CMQ_BASEADDR_H_REG, upper_32_bits(dma)); roce_write(hr_dev, ROCEE_TX_CMQ_DEPTH_REG, - ring->desc_num >> HNS_ROCE_CMQ_DESC_NUM_S); + (u32)ring->desc_num >> HNS_ROCE_CMQ_DESC_NUM_S); roce_write(hr_dev, ROCEE_TX_CMQ_HEAD_REG, 0); roce_write(hr_dev, ROCEE_TX_CMQ_TAIL_REG, 0); } else { @@ -1136,7 +1136,7 @@ static void hns_roce_cmq_init_regs(struct hns_roce_dev *hr_dev, bool ring_type) roce_write(hr_dev, ROCEE_RX_CMQ_BASEADDR_H_REG, upper_32_bits(dma)); roce_write(hr_dev, ROCEE_RX_CMQ_DEPTH_REG, - ring->desc_num >> HNS_ROCE_CMQ_DESC_NUM_S); + (u32)ring->desc_num >> HNS_ROCE_CMQ_DESC_NUM_S); roce_write(hr_dev, ROCEE_RX_CMQ_HEAD_REG, 0); roce_write(hr_dev, ROCEE_RX_CMQ_TAIL_REG, 0); } @@ -1907,8 +1907,8 @@ static void set_default_caps(struct hns_roce_dev *hr_dev) } } -static void calc_pg_sz(int obj_num, int obj_size, int hop_num, int ctx_bt_num, - int *buf_page_size, int *bt_page_size, u32 hem_type) +static void calc_pg_sz(u32 obj_num, u32 obj_size, u32 hop_num, u32 ctx_bt_num, + u32 *buf_page_size, u32 *bt_page_size, u32 hem_type) { u64 obj_per_chunk; u64 bt_chunk_size = PAGE_SIZE; @@ -2382,10 +2382,10 @@ static int hns_roce_init_link_table(struct hns_roce_dev *hr_dev, u32 buf_chk_sz; dma_addr_t t; int func_num = 1; - int pg_num_a; - int pg_num_b; - int pg_num; - int size; + u32 pg_num_a; + u32 pg_num_b; + u32 pg_num; + u32 size; int i; switch (type) { @@ -2549,7 +2549,7 @@ static int hns_roce_query_mbox_status(struct hns_roce_dev *hr_dev) struct hns_roce_cmq_desc desc; struct hns_roce_mbox_status *mb_st = (struct hns_roce_mbox_status *)desc.data; - enum hns_roce_cmd_return_status status; + int status; hns_roce_cmq_setup_basic_desc(&desc, HNS_ROCE_OPC_QUERY_MB_ST, true); @@ -2620,7 +2620,7 @@ static int hns_roce_v2_post_mbox(struct hns_roce_dev *hr_dev, u64 in_param, } static int hns_roce_v2_chk_mbox(struct hns_roce_dev *hr_dev, - unsigned long timeout) + unsigned int timeout) { struct device *dev = hr_dev->dev; unsigned long end; @@ -2970,7 +2970,7 @@ static void *get_cqe_v2(struct hns_roce_cq *hr_cq, int n) return hns_roce_buf_offset(hr_cq->mtr.kmem, n * hr_cq->cqe_size); } -static void *get_sw_cqe_v2(struct hns_roce_cq *hr_cq, int n) +static void *get_sw_cqe_v2(struct hns_roce_cq *hr_cq, unsigned int n) { struct hns_roce_v2_cqe *cqe = get_cqe_v2(hr_cq, n & hr_cq->ib_cq.cqe); @@ -3314,7 +3314,7 @@ static int hns_roce_v2_poll_one(struct hns_roce_cq *hr_cq, int is_send; u16 wqe_ctr; u32 opcode; - int qpn; + u32 qpn; int ret; /* Find cqe according to consumer index */ diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c index 90cbd15f6441..f62162771db5 100644 --- a/drivers/infiniband/hw/hns/hns_roce_main.c +++ b/drivers/infiniband/hw/hns/hns_roce_main.c @@ -53,7 +53,7 @@ * GID[0][0], GID[1][0],.....GID[N - 1][0], * And so on */ -int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index) +u8 hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index) { return gid_index * hr_dev->caps.num_ports + port; } diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index 13c7ac6ea312..3d432a6918a7 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -508,7 +508,7 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, ibdev_err(ibdev, "failed to map sg mtr, ret = %d.\n", ret); ret = 0; } else { - mr->pbl_mtr.hem_cfg.buf_pg_shift = ilog2(ibmr->page_size); + mr->pbl_mtr.hem_cfg.buf_pg_shift = (u32)ilog2(ibmr->page_size); ret = mr->npages; } @@ -827,12 +827,12 @@ static int mtr_get_pages(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, } int hns_roce_mtr_map(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, - dma_addr_t *pages, int page_cnt) + dma_addr_t *pages, unsigned int page_cnt) { struct ib_device *ibdev = &hr_dev->ib_dev; struct hns_roce_buf_region *r; + unsigned int i; int err; - int i; /* * Only use the first page address as root ba when hopnum is 0, this @@ -869,13 +869,12 @@ int hns_roce_mtr_find(struct hns_roce_dev *hr_dev, struct hns_roce_mtr *mtr, int offset, u64 *mtt_buf, int mtt_max, u64 *base_addr) { struct hns_roce_hem_cfg *cfg = &mtr->hem_cfg; + int mtt_count, left; int start_index; - int mtt_count; int total = 0; __le64 *mtts; - int npage; + u32 npage; u64 addr; - int left; if (!mtt_buf || mtt_max < 1) goto done; diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index 8323de2c4bf9..3ed43a5c7596 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -1318,22 +1318,22 @@ static inline void *get_wqe(struct hns_roce_qp *hr_qp, int offset) return hns_roce_buf_offset(hr_qp->mtr.kmem, offset); } -void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, int n) +void *hns_roce_get_recv_wqe(struct hns_roce_qp *hr_qp, unsigned int n) { return get_wqe(hr_qp, hr_qp->rq.offset + (n << hr_qp->rq.wqe_shift)); } -void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, int n) +void *hns_roce_get_send_wqe(struct hns_roce_qp *hr_qp, unsigned int n) { return get_wqe(hr_qp, hr_qp->sq.offset + (n << hr_qp->sq.wqe_shift)); } -void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, int n) +void *hns_roce_get_extend_sge(struct hns_roce_qp *hr_qp, unsigned int n) { return get_wqe(hr_qp, hr_qp->sge.offset + (n << hr_qp->sge.sge_shift)); } -bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, int nreq, +bool hns_roce_wq_overflow(struct hns_roce_wq *hr_wq, u32 nreq, struct ib_cq *ib_cq) { struct hns_roce_cq *hr_cq; From 9efed7448b1794e116d7a3ec75469f0f7c168e63 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Thu, 11 Apr 2024 11:38:51 +0800 Subject: [PATCH 0113/1565] RDMA/hns: Fix return value in hns_roce_map_mr_sg [ Upstream commit 203b70fda63425a4eb29f03f9074859afe821a39 ] As described in the ib_map_mr_sg function comment, it returns the number of sg elements that were mapped to the memory region. However, hns_roce_map_mr_sg returns the number of pages required for mapping the DMA area. Fix it. Fixes: 9b2cf76c9f05 ("RDMA/hns: Optimize PBL buffer allocation process") Signed-off-by: Zhengchao Shao Link: https://lore.kernel.org/r/20240411033851.2884771-1-shaozhengchao@huawei.com Reviewed-by: Junxian Huang Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_mr.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index 3d432a6918a7..c038ed7d9496 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -484,18 +484,18 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, struct ib_device *ibdev = &hr_dev->ib_dev; struct hns_roce_mr *mr = to_hr_mr(ibmr); struct hns_roce_mtr *mtr = &mr->pbl_mtr; - int ret = 0; + int ret, sg_num = 0; mr->npages = 0; mr->page_list = kvcalloc(mr->pbl_mtr.hem_cfg.buf_pg_count, sizeof(dma_addr_t), GFP_KERNEL); if (!mr->page_list) - return ret; + return sg_num; - ret = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, hns_roce_set_page); - if (ret < 1) { + sg_num = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, hns_roce_set_page); + if (sg_num < 1) { ibdev_err(ibdev, "failed to store sg pages %u %u, cnt = %d.\n", - mr->npages, mr->pbl_mtr.hem_cfg.buf_pg_count, ret); + mr->npages, mr->pbl_mtr.hem_cfg.buf_pg_count, sg_num); goto err_page_list; } @@ -506,17 +506,16 @@ int hns_roce_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents, ret = hns_roce_mtr_map(hr_dev, mtr, mr->page_list, mr->npages); if (ret) { ibdev_err(ibdev, "failed to map sg mtr, ret = %d.\n", ret); - ret = 0; + sg_num = 0; } else { mr->pbl_mtr.hem_cfg.buf_pg_shift = (u32)ilog2(ibmr->page_size); - ret = mr->npages; } err_page_list: kvfree(mr->page_list); mr->page_list = NULL; - return ret; + return sg_num; } static void hns_roce_mw_free(struct hns_roce_dev *hr_dev, From 9cce44567f1da36bd9082c6fb17c5523a3d79d9b Mon Sep 17 00:00:00 2001 From: Chengchang Tang Date: Fri, 12 Apr 2024 17:16:15 +0800 Subject: [PATCH 0114/1565] RDMA/hns: Use complete parentheses in macros [ Upstream commit 4125269bb9b22e1d8cdf4412c81be8074dbc61ca ] Use complete parentheses to ensure that macro expansion does not produce unexpected results. Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Chengchang Tang Signed-off-by: Junxian Huang Link: https://lore.kernel.org/r/20240412091616.370789-10-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_hem.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.h b/drivers/infiniband/hw/hns/hns_roce_hem.h index b7617786b100..5b2162a2b8ce 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hem.h +++ b/drivers/infiniband/hw/hns/hns_roce_hem.h @@ -60,16 +60,16 @@ enum { (sizeof(struct scatterlist) + sizeof(void *))) #define check_whether_bt_num_3(type, hop_num) \ - (type < HEM_TYPE_MTT && hop_num == 2) + ((type) < HEM_TYPE_MTT && (hop_num) == 2) #define check_whether_bt_num_2(type, hop_num) \ - ((type < HEM_TYPE_MTT && hop_num == 1) || \ - (type >= HEM_TYPE_MTT && hop_num == 2)) + (((type) < HEM_TYPE_MTT && (hop_num) == 1) || \ + ((type) >= HEM_TYPE_MTT && (hop_num) == 2)) #define check_whether_bt_num_1(type, hop_num) \ - ((type < HEM_TYPE_MTT && hop_num == HNS_ROCE_HOP_NUM_0) || \ - (type >= HEM_TYPE_MTT && hop_num == 1) || \ - (type >= HEM_TYPE_MTT && hop_num == HNS_ROCE_HOP_NUM_0)) + (((type) < HEM_TYPE_MTT && (hop_num) == HNS_ROCE_HOP_NUM_0) || \ + ((type) >= HEM_TYPE_MTT && (hop_num) == 1) || \ + ((type) >= HEM_TYPE_MTT && (hop_num) == HNS_ROCE_HOP_NUM_0)) struct hns_roce_hem_chunk { struct list_head list; From 45b31be4dd22827903df15c548b97b416790139b Mon Sep 17 00:00:00 2001 From: Chengchang Tang Date: Fri, 12 Apr 2024 17:16:16 +0800 Subject: [PATCH 0115/1565] RDMA/hns: Modify the print level of CQE error [ Upstream commit 349e859952285ab9689779fb46de163f13f18f43 ] Too much print may lead to a panic in kernel. Change ibdev_err() to ibdev_err_ratelimited(), and change the printing level of cqe dump to debug level. Fixes: 7c044adca272 ("RDMA/hns: Simplify the cqe code of poll cq") Signed-off-by: Chengchang Tang Signed-off-by: Junxian Huang Link: https://lore.kernel.org/r/20240412091616.370789-11-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 10ffad061f94..13aa8dd42f7d 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -3278,8 +3278,9 @@ static void get_cqe_status(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp, wc->status == IB_WC_WR_FLUSH_ERR)) return; - ibdev_err(&hr_dev->ib_dev, "error cqe status 0x%x:\n", cqe_status); - print_hex_dump(KERN_ERR, "", DUMP_PREFIX_NONE, 16, 4, cqe, + ibdev_err_ratelimited(&hr_dev->ib_dev, "error cqe status 0x%x:\n", + cqe_status); + print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_NONE, 16, 4, cqe, cq->cqe_size, false); /* From 9ff328de02842c48edf931747174211d2a2c520b Mon Sep 17 00:00:00 2001 From: Marc Gonzalez Date: Thu, 25 Apr 2024 17:07:07 +0200 Subject: [PATCH 0116/1565] clk: qcom: mmcc-msm8998: fix venus clock issue [ Upstream commit e20ae5ae9f0c843aded4f06f3d1cab7384789e92 ] Right now, msm8998 video decoder (venus) is non-functional: $ time mpv --hwdec=v4l2m2m-copy --vd-lavc-software-fallback=no --vo=null --no-audio --untimed --length=30 --quiet demo-480.webm (+) Video --vid=1 (*) (vp9 854x480 29.970fps) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz) [ffmpeg/video] vp9_v4l2m2m: output VIDIOC_REQBUFS failed: Connection timed out [ffmpeg/video] vp9_v4l2m2m: no v4l2 output context's buffers [ffmpeg/video] vp9_v4l2m2m: can't configure decoder Could not open codec. Software decoding fallback is disabled. Exiting... (Quit) Bryan O'Donoghue suggested the proper fix: - Set required register offsets in venus GDSC structs. - Set HW_CTRL flag. $ time mpv --hwdec=v4l2m2m-copy --vd-lavc-software-fallback=no --vo=null --no-audio --untimed --length=30 --quiet demo-480.webm (+) Video --vid=1 (*) (vp9 854x480 29.970fps) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz) [ffmpeg/video] vp9_v4l2m2m: VIDIOC_G_FMT ioctl [ffmpeg/video] vp9_v4l2m2m: VIDIOC_G_FMT ioctl ... Using hardware decoding (v4l2m2m-copy). VO: [null] 854x480 nv12 Exiting... (End of file) real 0m3.315s user 0m1.277s sys 0m0.453s NOTES: GDSC = Globally Distributed Switch Controller Use same code as mmcc-msm8996 with: s/venus_gdsc/video_top_gdsc/ s/venus_core0_gdsc/video_subcore0_gdsc/ s/venus_core1_gdsc/video_subcore1_gdsc/ https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/include/dt-bindings/clock/msm-clocks-hwio-8996.h https://git.codelinaro.org/clo/la/kernel/msm-4.4/-/blob/caf_migration/kernel.lnx.4.4.r38-rel/include/dt-bindings/clock/msm-clocks-hwio-8998.h 0x1024 = MMSS_VIDEO GDSCR (undocumented) 0x1028 = MMSS_VIDEO_CORE_CBCR 0x1030 = MMSS_VIDEO_AHB_CBCR 0x1034 = MMSS_VIDEO_AXI_CBCR 0x1038 = MMSS_VIDEO_MAXI_CBCR 0x1040 = MMSS_VIDEO_SUBCORE0 GDSCR (undocumented) 0x1044 = MMSS_VIDEO_SUBCORE1 GDSCR (undocumented) 0x1048 = MMSS_VIDEO_SUBCORE0_CBCR 0x104c = MMSS_VIDEO_SUBCORE1_CBCR Fixes: d14b15b5931c2b ("clk: qcom: Add MSM8998 Multimedia Clock Controller (MMCC) driver") Reviewed-by: Bryan O'Donoghue Signed-off-by: Marc Gonzalez Reviewed-by: Jeffrey Hugo Link: https://lore.kernel.org/r/ff4e2e34-a677-4c39-8c29-83655c5512ae@freebox.fr Signed-off-by: Bjorn Andersson Signed-off-by: Sasha Levin --- drivers/clk/qcom/mmcc-msm8998.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/clk/qcom/mmcc-msm8998.c b/drivers/clk/qcom/mmcc-msm8998.c index a68764cfb793..5e2e60c1c228 100644 --- a/drivers/clk/qcom/mmcc-msm8998.c +++ b/drivers/clk/qcom/mmcc-msm8998.c @@ -2587,6 +2587,8 @@ static struct clk_hw *mmcc_msm8998_hws[] = { static struct gdsc video_top_gdsc = { .gdscr = 0x1024, + .cxcs = (unsigned int []){ 0x1028, 0x1034, 0x1038 }, + .cxc_count = 3, .pd = { .name = "video_top", }, @@ -2595,20 +2597,26 @@ static struct gdsc video_top_gdsc = { static struct gdsc video_subcore0_gdsc = { .gdscr = 0x1040, + .cxcs = (unsigned int []){ 0x1048 }, + .cxc_count = 1, .pd = { .name = "video_subcore0", }, .parent = &video_top_gdsc.pd, .pwrsts = PWRSTS_OFF_ON, + .flags = HW_CTRL, }; static struct gdsc video_subcore1_gdsc = { .gdscr = 0x1044, + .cxcs = (unsigned int []){ 0x104c }, + .cxc_count = 1, .pd = { .name = "video_subcore1", }, .parent = &video_top_gdsc.pd, .pwrsts = PWRSTS_OFF_ON, + .flags = HW_CTRL, }; static struct gdsc mdss_gdsc = { From 4d693ca24a367fc28d0e3557396844aaa63df2e0 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Thu, 2 May 2024 13:58:45 +0300 Subject: [PATCH 0117/1565] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map [ Upstream commit 59162e0c11d7257cde15f907d19fefe26da66692 ] The x86 instruction decoder is used not only for decoding kernel instructions. It is also used by perf uprobes (user space probes) and by perf tools Intel Processor Trace decoding. Consequently, it needs to support instructions executed by user space also. Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size only i.e. (d64). That was based on Intel SDM Opcode Map. However that is contradicted by the Instruction Set Reference section for PUSH in the same manual. Remove 64-bit operand size only annotation from opcode 0x68 PUSH instruction. Example: $ cat pushw.s .global _start .text _start: pushw $0x1234 mov $0x1,%eax # system call number (sys_exit) int $0x80 $ as -o pushw.o pushw.s $ ld -s -o pushw pushw.o $ objdump -d pushw | tail -4 0000000000401000 <.text>: 401000: 66 68 34 12 pushw $0x1234 401004: b8 01 00 00 00 mov $0x1,%eax 401009: cd 80 int $0x80 $ perf record -e intel_pt//u ./pushw [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data ] Before: $ perf script --insn-trace=disasm Warning: 1 instruction trace errors pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax) pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax) instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction After: $ perf script --insn-trace=disasm pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234 pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax Fixes: eb13296cfaf6 ("x86: Instruction decoder API") Signed-off-by: Adrian Hunter Signed-off-by: Ingo Molnar Link: https://lore.kernel.org/r/20240502105853.5338-3-adrian.hunter@intel.com Signed-off-by: Sasha Levin --- arch/x86/lib/x86-opcode-map.txt | 2 +- tools/arch/x86/lib/x86-opcode-map.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index ec31f5b60323..1c25c1072a84 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -148,7 +148,7 @@ AVXcode: 65: SEG=GS (Prefix) 66: Operand-Size (Prefix) 67: Address-Size (Prefix) -68: PUSH Iz (d64) +68: PUSH Iz 69: IMUL Gv,Ev,Iz 6a: PUSH Ib (d64) 6b: IMUL Gv,Ev,Ib diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt index ec31f5b60323..1c25c1072a84 100644 --- a/tools/arch/x86/lib/x86-opcode-map.txt +++ b/tools/arch/x86/lib/x86-opcode-map.txt @@ -148,7 +148,7 @@ AVXcode: 65: SEG=GS (Prefix) 66: Operand-Size (Prefix) 67: Address-Size (Prefix) -68: PUSH Iz (d64) +68: PUSH Iz 69: IMUL Gv,Ev,Iz 6a: PUSH Ib (d64) 6b: IMUL Gv,Ev,Ib From 7c9ab0a44952c1ead18b79746888ddf0eaafe0d2 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 7 Mar 2024 12:53:20 +0100 Subject: [PATCH 0118/1565] ext4: avoid excessive credit estimate in ext4_tmpfile() [ Upstream commit 35a1f12f0ca857fee1d7a04ef52cbd5f1f84de13 ] A user with minimum journal size (1024 blocks these days) complained about the following error triggered by generic/697 test in ext4_tmpfile(): run fstests generic/697 at 2024-02-28 05:34:46 JBD2: vfstest wants too many credits credits:260 rsv_credits:0 max:256 EXT4-fs error (device loop0) in __ext4_new_inode:1083: error 28 Indeed the credit estimate in ext4_tmpfile() is huge. EXT4_MAXQUOTAS_INIT_BLOCKS() is 219, then 10 credits from ext4_tmpfile() itself and then ext4_xattr_credits_for_new_inode() adds more credits needed for security attributes and ACLs. Now the EXT4_MAXQUOTAS_INIT_BLOCKS() is in fact unnecessary because we've already initialized quotas with dquot_init() shortly before and so EXT4_MAXQUOTAS_TRANS_BLOCKS() is enough (which boils down to 3 credits). Fixes: af51a2ac36d1 ("ext4: ->tmpfile() support") Signed-off-by: Jan Kara Tested-by: Luis Henriques Tested-by: Disha Goel Link: https://lore.kernel.org/r/20240307115320.28949-1-jack@suse.cz Signed-off-by: Theodore Ts'o Signed-off-by: Sasha Levin --- fs/ext4/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index a50eb4c61ecc..af0801593abb 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2748,7 +2748,7 @@ static int ext4_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode) inode = ext4_new_inode_start_handle(dir, mode, NULL, 0, NULL, EXT4_HT_DIR, - EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) + + EXT4_MAXQUOTAS_TRANS_BLOCKS(dir->i_sb) + 4 + EXT4_XATTR_TRANS_BLOCKS); handle = ext4_journal_current_handle(); err = PTR_ERR(inode); From 2a2bba3cbd6a22c964ebdaf97c4fc805974fe3d6 Mon Sep 17 00:00:00 2001 From: Aleksandr Aprelkov Date: Wed, 27 Mar 2024 14:10:44 +0700 Subject: [PATCH 0119/1565] sunrpc: removed redundant procp check [ Upstream commit a576f36971ab4097b6aa76433532aa1fb5ee2d3b ] since vs_proc pointer is dereferenced before getting it's address there's no need to check for NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 8e5b67731d08 ("SUNRPC: Add a callback to initialise server requests") Signed-off-by: Aleksandr Aprelkov Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- net/sunrpc/svc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 495ebe7fad6d..cfe8b911ca01 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1248,8 +1248,6 @@ svc_generic_init_request(struct svc_rqst *rqstp, if (rqstp->rq_proc >= versp->vs_nproc) goto err_bad_proc; rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc]; - if (!procp) - goto err_bad_proc; /* Initialize storage for argp and resp */ memset(rqstp->rq_argp, 0, procp->pc_argsize); From f49c865d5b93f615a3bb0d4ab7ee8a9f9ef2f539 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Sat, 4 Mar 2023 01:21:20 +0800 Subject: [PATCH 0120/1565] ext4: simplify calculation of blkoff in ext4_mb_new_blocks_simple [ Upstream commit 253cacb0de89235673ad5889d61f275a73dbee79 ] We try to allocate a block from goal in ext4_mb_new_blocks_simple. We only need get blkoff in first group with goal and set blkoff to 0 for the rest groups. Signed-off-by: Kemeng Shi Link: https://lore.kernel.org/r/20230303172120.3800725-21-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o Stable-dep-of: 3f4830abd236 ("ext4: fix potential unnitialized variable") Signed-off-by: Sasha Levin --- fs/ext4/mballoc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index c1d632725c2c..26beaf102ce3 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5371,9 +5371,6 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, return 0; } - ext4_get_group_no_and_offset(sb, - max(ext4_group_first_block_no(sb, group), goal), - NULL, &blkoff); while (1) { i = mb_find_next_zero_bit(bitmap_bh->b_data, max, blkoff); @@ -5388,6 +5385,8 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, brelse(bitmap_bh); if (i < max) break; + + blkoff = 0; } if (group >= ext4_get_groups_count(sb) || i >= max) { From e498c2f441d9000be165bcb5fa6bf3b386cca9dd Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Sat, 3 Jun 2023 23:03:10 +0800 Subject: [PATCH 0121/1565] ext4: fix unit mismatch in ext4_mb_new_blocks_simple [ Upstream commit 497885f72d930305d8e61b6b616b22b4da1adf90 ] The "i" returned from mb_find_next_zero_bit is in cluster unit and we need offset "block" corresponding to "i" in block unit. Convert "i" to block unit to fix the unit mismatch. Signed-off-by: Kemeng Shi Reviewed-by: Ojaswin Mujoo Link: https://lore.kernel.org/r/20230603150327.3596033-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o Stable-dep-of: 3f4830abd236 ("ext4: fix potential unnitialized variable") Signed-off-by: Sasha Levin --- fs/ext4/mballoc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 26beaf102ce3..f54f23afd93d 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5349,6 +5349,7 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, { struct buffer_head *bitmap_bh; struct super_block *sb = ar->inode->i_sb; + struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_group_t group; ext4_grpblk_t blkoff; ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb); @@ -5377,7 +5378,8 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, if (i >= max) break; if (ext4_fc_replay_check_excluded(sb, - ext4_group_first_block_no(sb, group) + i)) { + ext4_group_first_block_no(sb, group) + + EXT4_C2B(sbi, i))) { blkoff = i + 1; } else break; @@ -5394,7 +5396,7 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, return 0; } - block = ext4_group_first_block_no(sb, group) + i; + block = ext4_group_first_block_no(sb, group) + EXT4_C2B(sbi, i); ext4_mb_mark_bb(sb, block, 1, 1); ar->len = 1; From fdbce45449050ffa2d8d355558cc473a245109a1 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Sat, 3 Jun 2023 23:03:15 +0800 Subject: [PATCH 0122/1565] ext4: try all groups in ext4_mb_new_blocks_simple [ Upstream commit 19a043bb1fd1b5cb2652ca33536c55e6c0a70df0 ] ext4_mb_new_blocks_simple ignores the group before goal, so it will fail if free blocks reside in group before goal. Try all groups to avoid unexpected failure. Search finishes either if any free block is found or if no available blocks are found. Simpliy check "i >= max" to distinguish the above cases. Signed-off-by: Kemeng Shi Suggested-by: Theodore Ts'o Reviewed-by: Ojaswin Mujoo Link: https://lore.kernel.org/r/20230603150327.3596033-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o Stable-dep-of: 3f4830abd236 ("ext4: fix potential unnitialized variable") Signed-off-by: Sasha Levin --- fs/ext4/mballoc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index f54f23afd93d..de69929495a5 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5350,7 +5350,7 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, struct buffer_head *bitmap_bh; struct super_block *sb = ar->inode->i_sb; struct ext4_sb_info *sbi = EXT4_SB(sb); - ext4_group_t group; + ext4_group_t group, nr; ext4_grpblk_t blkoff; ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb); ext4_grpblk_t i = 0; @@ -5364,7 +5364,7 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, ar->len = 0; ext4_get_group_no_and_offset(sb, goal, &group, &blkoff); - for (; group < ext4_get_groups_count(sb); group++) { + for (nr = ext4_get_groups_count(sb); nr > 0; nr--) { bitmap_bh = ext4_read_block_bitmap(sb, group); if (IS_ERR(bitmap_bh)) { *errp = PTR_ERR(bitmap_bh); @@ -5388,10 +5388,13 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, if (i < max) break; + if (++group >= ext4_get_groups_count(sb)) + group = 0; + blkoff = 0; } - if (group >= ext4_get_groups_count(sb) || i >= max) { + if (i >= max) { *errp = -ENOSPC; return 0; } From 4deaa4d5f800c51c9bd41dee46969b26e08a02d9 Mon Sep 17 00:00:00 2001 From: Kemeng Shi Date: Sat, 3 Jun 2023 23:03:17 +0800 Subject: [PATCH 0123/1565] ext4: remove unused parameter from ext4_mb_new_blocks_simple() [ Upstream commit ad78b5efe4246e5deba8d44a6ed172b8a00d3113 ] Two cleanups for ext4_mb_new_blocks_simple: Remove unused parameter handle of ext4_mb_new_blocks_simple. Move ext4_mb_new_blocks_simple definition before ext4_mb_new_blocks to remove unnecessary forward declaration of ext4_mb_new_blocks_simple. Signed-off-by: Kemeng Shi Reviewed-by: Ojaswin Mujoo Link: https://lore.kernel.org/r/20230603150327.3596033-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o Stable-dep-of: 3f4830abd236 ("ext4: fix potential unnitialized variable") Signed-off-by: Sasha Levin --- fs/ext4/mballoc.c | 137 +++++++++++++++++++++++----------------------- 1 file changed, 67 insertions(+), 70 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index de69929495a5..099007ec774c 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5083,8 +5083,72 @@ static bool ext4_mb_discard_preallocations_should_retry(struct super_block *sb, return ret; } -static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, - struct ext4_allocation_request *ar, int *errp); +/* + * Simple allocator for Ext4 fast commit replay path. It searches for blocks + * linearly starting at the goal block and also excludes the blocks which + * are going to be in use after fast commit replay. + */ +static ext4_fsblk_t +ext4_mb_new_blocks_simple(struct ext4_allocation_request *ar, int *errp) +{ + struct buffer_head *bitmap_bh; + struct super_block *sb = ar->inode->i_sb; + struct ext4_sb_info *sbi = EXT4_SB(sb); + ext4_group_t group, nr; + ext4_grpblk_t blkoff; + ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb); + ext4_grpblk_t i = 0; + ext4_fsblk_t goal, block; + struct ext4_super_block *es = EXT4_SB(sb)->s_es; + + goal = ar->goal; + if (goal < le32_to_cpu(es->s_first_data_block) || + goal >= ext4_blocks_count(es)) + goal = le32_to_cpu(es->s_first_data_block); + + ar->len = 0; + ext4_get_group_no_and_offset(sb, goal, &group, &blkoff); + for (nr = ext4_get_groups_count(sb); nr > 0; nr--) { + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) { + *errp = PTR_ERR(bitmap_bh); + pr_warn("Failed to read block bitmap\n"); + return 0; + } + + while (1) { + i = mb_find_next_zero_bit(bitmap_bh->b_data, max, + blkoff); + if (i >= max) + break; + if (ext4_fc_replay_check_excluded(sb, + ext4_group_first_block_no(sb, group) + + EXT4_C2B(sbi, i))) { + blkoff = i + 1; + } else + break; + } + brelse(bitmap_bh); + if (i < max) + break; + + if (++group >= ext4_get_groups_count(sb)) + group = 0; + + blkoff = 0; + } + + if (i >= max) { + *errp = -ENOSPC; + return 0; + } + + block = ext4_group_first_block_no(sb, group) + EXT4_C2B(sbi, i); + ext4_mb_mark_bb(sb, block, 1, 1); + ar->len = 1; + + return block; +} /* * Main entry point into mballoc to allocate blocks @@ -5109,7 +5173,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle, trace_ext4_request_blocks(ar); if (sbi->s_mount_state & EXT4_FC_REPLAY) - return ext4_mb_new_blocks_simple(handle, ar, errp); + return ext4_mb_new_blocks_simple(ar, errp); /* Allow to use superuser reservation for quota file */ if (ext4_is_quota_file(ar->inode)) @@ -5339,73 +5403,6 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b, return 0; } -/* - * Simple allocator for Ext4 fast commit replay path. It searches for blocks - * linearly starting at the goal block and also excludes the blocks which - * are going to be in use after fast commit replay. - */ -static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle, - struct ext4_allocation_request *ar, int *errp) -{ - struct buffer_head *bitmap_bh; - struct super_block *sb = ar->inode->i_sb; - struct ext4_sb_info *sbi = EXT4_SB(sb); - ext4_group_t group, nr; - ext4_grpblk_t blkoff; - ext4_grpblk_t max = EXT4_CLUSTERS_PER_GROUP(sb); - ext4_grpblk_t i = 0; - ext4_fsblk_t goal, block; - struct ext4_super_block *es = EXT4_SB(sb)->s_es; - - goal = ar->goal; - if (goal < le32_to_cpu(es->s_first_data_block) || - goal >= ext4_blocks_count(es)) - goal = le32_to_cpu(es->s_first_data_block); - - ar->len = 0; - ext4_get_group_no_and_offset(sb, goal, &group, &blkoff); - for (nr = ext4_get_groups_count(sb); nr > 0; nr--) { - bitmap_bh = ext4_read_block_bitmap(sb, group); - if (IS_ERR(bitmap_bh)) { - *errp = PTR_ERR(bitmap_bh); - pr_warn("Failed to read block bitmap\n"); - return 0; - } - - while (1) { - i = mb_find_next_zero_bit(bitmap_bh->b_data, max, - blkoff); - if (i >= max) - break; - if (ext4_fc_replay_check_excluded(sb, - ext4_group_first_block_no(sb, group) + - EXT4_C2B(sbi, i))) { - blkoff = i + 1; - } else - break; - } - brelse(bitmap_bh); - if (i < max) - break; - - if (++group >= ext4_get_groups_count(sb)) - group = 0; - - blkoff = 0; - } - - if (i >= max) { - *errp = -ENOSPC; - return 0; - } - - block = ext4_group_first_block_no(sb, group) + EXT4_C2B(sbi, i); - ext4_mb_mark_bb(sb, block, 1, 1); - ar->len = 1; - - return block; -} - static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block, unsigned long count) { From e28a16af4c3e250090e2ce27663ef065e7bf3ac6 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 17 Apr 2024 21:10:40 +0300 Subject: [PATCH 0124/1565] ext4: fix potential unnitialized variable [ Upstream commit 3f4830abd236d0428e50451e1ecb62e14c365e9b ] Smatch complains "err" can be uninitialized in the caller. fs/ext4/indirect.c:349 ext4_alloc_branch() error: uninitialized symbol 'err'. Set the error to zero on the success path. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/363a4673-0fb8-4adf-b4fb-90a499077276@moroto.mountain Signed-off-by: Theodore Ts'o Signed-off-by: Sasha Levin --- fs/ext4/mballoc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 099007ec774c..87378d08a414 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5147,6 +5147,7 @@ ext4_mb_new_blocks_simple(struct ext4_allocation_request *ar, int *errp) ext4_mb_mark_bb(sb, block, 1, 1); ar->len = 1; + *errp = 0; return block; } From f148a95f68c66c1b097391b68e153d5a46f0e780 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 7 May 2024 09:10:41 -0400 Subject: [PATCH 0125/1565] SUNRPC: Fix gss_free_in_token_pages() [ Upstream commit bafa6b4d95d97877baa61883ff90f7e374427fae ] Dan Carpenter says: > Commit 5866efa8cbfb ("SUNRPC: Fix svcauth_gss_proxy_init()") from Oct > 24, 2019 (linux-next), leads to the following Smatch static checker > warning: > > net/sunrpc/auth_gss/svcauth_gss.c:1039 gss_free_in_token_pages() > warn: iterator 'i' not incremented > > net/sunrpc/auth_gss/svcauth_gss.c > 1034 static void gss_free_in_token_pages(struct gssp_in_token *in_token) > 1035 { > 1036 u32 inlen; > 1037 int i; > 1038 > --> 1039 i = 0; > 1040 inlen = in_token->page_len; > 1041 while (inlen) { > 1042 if (in_token->pages[i]) > 1043 put_page(in_token->pages[i]); > ^ > This puts page zero over and over. > > 1044 inlen -= inlen > PAGE_SIZE ? PAGE_SIZE : inlen; > 1045 } > 1046 > 1047 kfree(in_token->pages); > 1048 in_token->pages = NULL; > 1049 } Based on the way that the ->pages[] array is constructed in gss_read_proxy_verf(), we know that once the loop encounters a NULL page pointer, the remaining array elements must also be NULL. Reported-by: Dan Carpenter Suggested-by: Trond Myklebust Fixes: 5866efa8cbfb ("SUNRPC: Fix svcauth_gss_proxy_init()") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- net/sunrpc/auth_gss/svcauth_gss.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 406ff7f8b156..23e6ac9cce11 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1126,17 +1126,11 @@ gss_read_verf(struct rpc_gss_wire_cred *gc, static void gss_free_in_token_pages(struct gssp_in_token *in_token) { - u32 inlen; int i; i = 0; - inlen = in_token->page_len; - while (inlen) { - if (in_token->pages[i]) - put_page(in_token->pages[i]); - inlen -= inlen > PAGE_SIZE ? PAGE_SIZE : inlen; - } - + while (in_token->pages[i]) + put_page(in_token->pages[i++]); kfree(in_token->pages); in_token->pages = NULL; } From 08df7b006c8f59457f56d16143c5f277d46f7d89 Mon Sep 17 00:00:00 2001 From: Gautam Menghani Date: Thu, 30 Jun 2022 00:58:22 +0530 Subject: [PATCH 0126/1565] selftests/kcmp: Make the test output consistent and clear [ Upstream commit ff682226a353d88ffa5db9c2a9b945066776311e ] Make the output format of this test consistent. Currently the output is as follows: +TAP version 13 +1..1 +# selftests: kcmp: kcmp_test +# pid1: 45814 pid2: 45815 FD: 1 FILES: 1 VM: 2 FS: 1 SIGHAND: 2 + IO: 0 SYSVSEM: 0 INV: -1 +# PASS: 0 returned as expected +# PASS: 0 returned as expected +# PASS: 0 returned as expected +# # Planned tests != run tests (0 != 3) +# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 +# # Planned tests != run tests (0 != 3) +# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 +# # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 +ok 1 selftests: kcmp: kcmp_test With this patch applied the output is as follows: +TAP version 13 +1..1 +# selftests: kcmp: kcmp_test +# TAP version 13 +# 1..3 +# pid1: 46330 pid2: 46331 FD: 1 FILES: 2 VM: 2 FS: 2 SIGHAND: 1 + IO: 0 SYSVSEM: 0 INV: -1 +# PASS: 0 returned as expected +# PASS: 0 returned as expected +# PASS: 0 returned as expected +# # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0 +ok 1 selftests: kcmp: kcmp_test Signed-off-by: Gautam Menghani Signed-off-by: Shuah Khan Stable-dep-of: eb59a5811371 ("selftests/kcmp: remove unused open mode") Signed-off-by: Sasha Levin --- tools/testing/selftests/kcmp/kcmp_test.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/kcmp/kcmp_test.c b/tools/testing/selftests/kcmp/kcmp_test.c index 6ea7b9f37a41..25110c7c0b3e 100644 --- a/tools/testing/selftests/kcmp/kcmp_test.c +++ b/tools/testing/selftests/kcmp/kcmp_test.c @@ -88,6 +88,9 @@ int main(int argc, char **argv) int pid2 = getpid(); int ret; + ksft_print_header(); + ksft_set_plan(3); + fd2 = open(kpath, O_RDWR, 0644); if (fd2 < 0) { perror("Can't open file"); @@ -152,7 +155,6 @@ int main(int argc, char **argv) ksft_inc_pass_cnt(); } - ksft_print_cnts(); if (ret) ksft_exit_fail(); @@ -162,5 +164,5 @@ int main(int argc, char **argv) waitpid(pid2, &status, P_ALL); - return ksft_exit_pass(); + return 0; } From 68e8c44c0d7a47957dd7d26f276bd13e5f442a85 Mon Sep 17 00:00:00 2001 From: Edward Liaw Date: Mon, 29 Apr 2024 23:46:09 +0000 Subject: [PATCH 0127/1565] selftests/kcmp: remove unused open mode [ Upstream commit eb59a58113717df04b8a8229befd8ab1e5dbf86e ] Android bionic warns that open modes are ignored if O_CREAT or O_TMPFILE aren't specified. The permissions for the file are set above: fd1 = open(kpath, O_RDWR | O_CREAT | O_TRUNC, 0644); Link: https://lkml.kernel.org/r/20240429234610.191144-1-edliaw@google.com Fixes: d97b46a64674 ("syscalls, x86: add __NR_kcmp syscall") Signed-off-by: Edward Liaw Reviewed-by: Cyrill Gorcunov Cc: Eric Biederman Cc: Shuah Khan Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin --- tools/testing/selftests/kcmp/kcmp_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/kcmp/kcmp_test.c b/tools/testing/selftests/kcmp/kcmp_test.c index 25110c7c0b3e..d7a8e321bb16 100644 --- a/tools/testing/selftests/kcmp/kcmp_test.c +++ b/tools/testing/selftests/kcmp/kcmp_test.c @@ -91,7 +91,7 @@ int main(int argc, char **argv) ksft_print_header(); ksft_set_plan(3); - fd2 = open(kpath, O_RDWR, 0644); + fd2 = open(kpath, O_RDWR); if (fd2 < 0) { perror("Can't open file"); ksft_exit_fail(); From 3e83903cd474c1b2e75b317ddfbb2b562fc8cd6e Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Thu, 9 May 2024 10:39:33 +0300 Subject: [PATCH 0128/1565] RDMA/IPoIB: Fix format truncation compilation errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 49ca2b2ef3d003402584c68ae7b3055ba72e750a ] Truncate the device name to store IPoIB VLAN name. [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 allmodconfig [leonro@5b4e8fba4ddd kernel]$ make -s -j 20 W=1 drivers/infiniband/ulp/ipoib/ drivers/infiniband/ulp/ipoib/ipoib_vlan.c: In function ‘ipoib_vlan_add’: drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:52: error: ‘%04x’ directive output may be truncated writing 4 bytes into a region of size between 0 and 15 [-Werror=format-truncation=] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:48: note: directive argument in the range [0, 65535] 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~ drivers/infiniband/ulp/ipoib/ipoib_vlan.c:187:9: note: ‘snprintf’ output between 6 and 21 bytes into a destination of size 16 187 | snprintf(intf_name, sizeof(intf_name), "%s.%04x", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 | ppriv->dev->name, pkey); | ~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/ulp/ipoib/ipoib_vlan.o] Error 1 make[6]: *** Waiting for unfinished jobs.... Fixes: 9baa0b036410 ("IB/ipoib: Add rtnl_link_ops support") Link: https://lore.kernel.org/r/e9d3e1fef69df4c9beaf402cc3ac342bad680791.1715240029.git.leon@kernel.org Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 4c50a87ed7cc..15a7cda44676 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -185,8 +185,12 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) ppriv = ipoib_priv(pdev); - snprintf(intf_name, sizeof(intf_name), "%s.%04x", - ppriv->dev->name, pkey); + /* If you increase IFNAMSIZ, update snprintf below + * to allow longer names. + */ + BUILD_BUG_ON(IFNAMSIZ != 16); + snprintf(intf_name, sizeof(intf_name), "%.10s.%04x", ppriv->dev->name, + pkey); ndev = ipoib_intf_alloc(ppriv->ca, ppriv->port, intf_name); if (IS_ERR(ndev)) { From 66fd37d0a86fea66ac8df547d5595c9d42ee7734 Mon Sep 17 00:00:00 2001 From: Qinglang Miao Date: Tue, 5 Jan 2021 13:57:54 +0800 Subject: [PATCH 0129/1565] net: qrtr: fix null-ptr-deref in qrtr_ns_remove [ Upstream commit 4beb17e553b49c3dd74505c9f361e756aaae653e ] A null-ptr-deref bug is reported by Hulk Robot like this: -------------- KASAN: null-ptr-deref in range [0x0000000000000128-0x000000000000012f] Call Trace: qrtr_ns_remove+0x22/0x40 [ns] qrtr_proto_fini+0xa/0x31 [qrtr] __x64_sys_delete_module+0x337/0x4e0 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x468ded -------------- When qrtr_ns_init fails in qrtr_proto_init, qrtr_ns_remove which would be called later on would raise a null-ptr-deref because qrtr_ns.workqueue has been destroyed. Fix it by making qrtr_ns_init have a return value and adding a check in qrtr_proto_init. Reported-by: Hulk Robot Signed-off-by: Qinglang Miao Signed-off-by: David S. Miller Stable-dep-of: fd76e5ccc48f ("net: qrtr: ns: Fix module refcnt") Signed-off-by: Sasha Levin --- net/qrtr/af_qrtr.c | 16 +++++++++++----- net/qrtr/ns.c | 7 ++++--- net/qrtr/qrtr.h | 2 +- 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c index 71c2295d4a57..29c0886eb9ef 100644 --- a/net/qrtr/af_qrtr.c +++ b/net/qrtr/af_qrtr.c @@ -1279,13 +1279,19 @@ static int __init qrtr_proto_init(void) return rc; rc = sock_register(&qrtr_family); - if (rc) { - proto_unregister(&qrtr_proto); - return rc; - } + if (rc) + goto err_proto; - qrtr_ns_init(); + rc = qrtr_ns_init(); + if (rc) + goto err_sock; + return 0; + +err_sock: + sock_unregister(qrtr_family.family); +err_proto: + proto_unregister(&qrtr_proto); return rc; } postcore_initcall(qrtr_proto_init); diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c index c92dd960bfef..3376a656da39 100644 --- a/net/qrtr/ns.c +++ b/net/qrtr/ns.c @@ -771,7 +771,7 @@ static void qrtr_ns_data_ready(struct sock *sk) queue_work(qrtr_ns.workqueue, &qrtr_ns.work); } -void qrtr_ns_init(void) +int qrtr_ns_init(void) { struct sockaddr_qrtr sq; int ret; @@ -782,7 +782,7 @@ void qrtr_ns_init(void) ret = sock_create_kern(&init_net, AF_QIPCRTR, SOCK_DGRAM, PF_QIPCRTR, &qrtr_ns.sock); if (ret < 0) - return; + return ret; ret = kernel_getsockname(qrtr_ns.sock, (struct sockaddr *)&sq); if (ret < 0) { @@ -815,12 +815,13 @@ void qrtr_ns_init(void) if (ret < 0) goto err_wq; - return; + return 0; err_wq: destroy_workqueue(qrtr_ns.workqueue); err_sock: sock_release(qrtr_ns.sock); + return ret; } EXPORT_SYMBOL_GPL(qrtr_ns_init); diff --git a/net/qrtr/qrtr.h b/net/qrtr/qrtr.h index dc2b67f17927..3f2d28696062 100644 --- a/net/qrtr/qrtr.h +++ b/net/qrtr/qrtr.h @@ -29,7 +29,7 @@ void qrtr_endpoint_unregister(struct qrtr_endpoint *ep); int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len); -void qrtr_ns_init(void); +int qrtr_ns_init(void); void qrtr_ns_remove(void); From cafccde4298faa2d54dbe3622f7f11d4d15872d8 Mon Sep 17 00:00:00 2001 From: Chris Lew Date: Mon, 13 May 2024 10:31:46 -0700 Subject: [PATCH 0130/1565] net: qrtr: ns: Fix module refcnt [ Upstream commit fd76e5ccc48f9f54eb44909dd7c0b924005f1582 ] The qrtr protocol core logic and the qrtr nameservice are combined into a single module. Neither the core logic or nameservice provide much functionality by themselves; combining the two into a single module also prevents any possible issues that may stem from client modules loading inbetween qrtr and the ns. Creating a socket takes two references to the module that owns the socket protocol. Since the ns needs to create the control socket, this creates a scenario where there are always two references to the qrtr module. This prevents the execution of 'rmmod' for qrtr. To resolve this, forcefully put the module refcount for the socket opened by the nameservice. Fixes: a365023a76f2 ("net: qrtr: combine nameservice into main module") Reported-by: Jeffrey Hugo Tested-by: Jeffrey Hugo Signed-off-by: Chris Lew Reviewed-by: Manivannan Sadhasivam Reviewed-by: Jeffrey Hugo Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/qrtr/ns.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c index 3376a656da39..1da34d54092b 100644 --- a/net/qrtr/ns.c +++ b/net/qrtr/ns.c @@ -815,6 +815,24 @@ int qrtr_ns_init(void) if (ret < 0) goto err_wq; + /* As the qrtr ns socket owner and creator is the same module, we have + * to decrease the qrtr module reference count to guarantee that it + * remains zero after the ns socket is created, otherwise, executing + * "rmmod" command is unable to make the qrtr module deleted after the + * qrtr module is inserted successfully. + * + * However, the reference count is increased twice in + * sock_create_kern(): one is to increase the reference count of owner + * of qrtr socket's proto_ops struct; another is to increment the + * reference count of owner of qrtr proto struct. Therefore, we must + * decrement the module reference count twice to ensure that it keeps + * zero after server's listening socket is created. Of course, we + * must bump the module reference count twice as well before the socket + * is closed. + */ + module_put(qrtr_ns.sock->ops->owner); + module_put(qrtr_ns.sock->sk->sk_prot_creator->owner); + return 0; err_wq: @@ -829,6 +847,15 @@ void qrtr_ns_remove(void) { cancel_work_sync(&qrtr_ns.work); destroy_workqueue(qrtr_ns.workqueue); + + /* sock_release() expects the two references that were put during + * qrtr_ns_init(). This function is only called during module remove, + * so try_stop_module() has already set the refcnt to 0. Use + * __module_get() instead of try_module_get() to successfully take two + * references. + */ + __module_get(qrtr_ns.sock->ops->owner); + __module_get(qrtr_ns.sock->sk->sk_prot_creator->owner); sock_release(qrtr_ns.sock); } EXPORT_SYMBOL_GPL(qrtr_ns_remove); From b117e5b4f27c2c9076561b6be450a9619f0b79de Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 15 May 2024 14:29:34 +0000 Subject: [PATCH 0131/1565] netrom: fix possible dead-lock in nr_rt_ioctl() [ Upstream commit e03e7f20ebf7e1611d40d1fdc1bde900fd3335f6 ] syzbot loves netrom, and found a possible deadlock in nr_rt_ioctl [1] Make sure we always acquire nr_node_list_lock before nr_node_lock(nr_node) [1] WARNING: possible circular locking dependency detected 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Not tainted ------------------------------------------------------ syz-executor350/5129 is trying to acquire lock: ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_node_lock include/net/netrom.h:152 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:464 [inline] ffff8880186e2070 (&nr_node->node_lock){+...}-{2:2}, at: nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 but task is already holding lock: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (nr_node_list_lock){+...}-{2:2}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_remove_node net/netrom/nr_route.c:299 [inline] nr_del_node+0x4b4/0x820 net/netrom/nr_route.c:355 nr_rt_ioctl+0xa95/0x1090 net/netrom/nr_route.c:683 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&nr_node->node_lock){+...}-{2:2}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(nr_node_list_lock); lock(&nr_node->node_lock); lock(nr_node_list_lock); lock(&nr_node->node_lock); *** DEADLOCK *** 1 lock held by syz-executor350/5129: #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_dec_obs net/netrom/nr_route.c:462 [inline] #0: ffffffff8f7053b8 (nr_node_list_lock){+...}-{2:2}, at: nr_rt_ioctl+0x10a/0x1090 net/netrom/nr_route.c:697 stack backtrace: CPU: 0 PID: 5129 Comm: syz-executor350 Not tainted 6.9.0-rc7-syzkaller-02147-g654de42f3fc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline] _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178 spin_lock_bh include/linux/spinlock.h:356 [inline] nr_node_lock include/net/netrom.h:152 [inline] nr_dec_obs net/netrom/nr_route.c:464 [inline] nr_rt_ioctl+0x1bb/0x1090 net/netrom/nr_route.c:697 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240515142934.3708038-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/netrom/nr_route.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c index 895702337c92..9269b5e69b9a 100644 --- a/net/netrom/nr_route.c +++ b/net/netrom/nr_route.c @@ -284,22 +284,14 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic, return 0; } -static inline void __nr_remove_node(struct nr_node *nr_node) +static void nr_remove_node_locked(struct nr_node *nr_node) { + lockdep_assert_held(&nr_node_list_lock); + hlist_del_init(&nr_node->node_node); nr_node_put(nr_node); } -#define nr_remove_node_locked(__node) \ - __nr_remove_node(__node) - -static void nr_remove_node(struct nr_node *nr_node) -{ - spin_lock_bh(&nr_node_list_lock); - __nr_remove_node(nr_node); - spin_unlock_bh(&nr_node_list_lock); -} - static inline void __nr_remove_neigh(struct nr_neigh *nr_neigh) { hlist_del_init(&nr_neigh->neigh_node); @@ -338,6 +330,7 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n return -EINVAL; } + spin_lock_bh(&nr_node_list_lock); nr_node_lock(nr_node); for (i = 0; i < nr_node->count; i++) { if (nr_node->routes[i].neighbour == nr_neigh) { @@ -351,7 +344,7 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n nr_node->count--; if (nr_node->count == 0) { - nr_remove_node(nr_node); + nr_remove_node_locked(nr_node); } else { switch (i) { case 0: @@ -365,12 +358,14 @@ static int nr_del_node(ax25_address *callsign, ax25_address *neighbour, struct n nr_node_put(nr_node); } nr_node_unlock(nr_node); + spin_unlock_bh(&nr_node_list_lock); return 0; } } nr_neigh_put(nr_neigh); nr_node_unlock(nr_node); + spin_unlock_bh(&nr_node_list_lock); nr_node_put(nr_node); return -EINVAL; From e892f9932dd6c0c689d4c1dc0888cfa86377c636 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 15 May 2024 16:33:58 +0000 Subject: [PATCH 0132/1565] af_packet: do not call packet_read_pending() from tpacket_destruct_skb() [ Upstream commit 581073f626e387d3e7eed55c48c8495584ead7ba ] trafgen performance considerably sank on hosts with many cores after the blamed commit. packet_read_pending() is very expensive, and calling it in af_packet fast path defeats Daniel intent in commit b013840810c2 ("packet: use percpu mmap tx frame pending refcount") tpacket_destruct_skb() makes room for one packet, we can immediately wakeup a producer, no need to completely drain the tx ring. Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET") Signed-off-by: Eric Dumazet Cc: Neil Horman Cc: Daniel Borkmann Reviewed-by: Willem de Bruijn Link: https://lore.kernel.org/r/20240515163358.4105915-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/packet/af_packet.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index db5d16c5d5b1..8e52f0949305 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2487,8 +2487,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb) ts = __packet_set_timestamp(po, ph, skb); __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts); - if (!packet_read_pending(&po->tx_ring)) - complete(&po->skb_completion); + complete(&po->skb_completion); } sock_wfree(skb); From ae20865fe637b955ea11cf5ae3082033256cfaa6 Mon Sep 17 00:00:00 2001 From: Vitalii Bursov Date: Tue, 30 Apr 2024 18:05:23 +0300 Subject: [PATCH 0133/1565] sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level [ Upstream commit a1fd0b9d751f840df23ef0e75b691fc00cfd4743 ] Change relax_domain_level checks so that it would be possible to include or exclude all domains from newidle balancing. This matches the behavior described in the documentation: -1 no request. use system default or follow request of others. 0 no search. 1 search siblings (hyperthreads in a core). "2" enables levels 0 and 1, level_max excludes the last (level_max) level, and level_max+1 includes all levels. Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core") Signed-off-by: Vitalii Bursov Signed-off-by: Ingo Molnar Tested-by: Dietmar Eggemann Reviewed-by: Vincent Guittot Reviewed-by: Valentin Schneider Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com Signed-off-by: Sasha Levin --- kernel/cgroup/cpuset.c | 2 +- kernel/sched/topology.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 195f9cccab20..9f2a93c829a9 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1901,7 +1901,7 @@ bool current_cpuset_is_being_rebound(void) static int update_relax_domain_level(struct cpuset *cs, s64 val) { #ifdef CONFIG_SMP - if (val < -1 || val >= sched_domain_level_max) + if (val < -1 || val > sched_domain_level_max + 1) return -EINVAL; #endif diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index ff2c6d3ba6c7..6eb0996ef859 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1217,7 +1217,7 @@ static void set_domain_attribute(struct sched_domain *sd, } else request = attr->relax_domain_level; - if (sd->level > request) { + if (sd->level >= request) { /* Turn off idle balance on this domain: */ sd->flags &= ~(SD_BALANCE_WAKE|SD_BALANCE_NEWIDLE); } From eac10cf3a97ffd4b4deb0a29f57c118225a42850 Mon Sep 17 00:00:00 2001 From: Rui Miguel Silva Date: Mon, 25 Mar 2024 22:09:55 +0000 Subject: [PATCH 0134/1565] greybus: lights: check return of get_channel_from_mode [ Upstream commit a1ba19a1ae7cd1e324685ded4ab563e78fe68648 ] If channel for the given node is not found we return null from get_channel_from_mode. Make sure we validate the return pointer before using it in two of the missing places. This was originally reported in [0]: Found by Linux Verification Center (linuxtesting.org) with SVACE. [0] https://lore.kernel.org/all/20240301190425.120605-1-m.lobanov@rosalinux.ru Fixes: 2870b52bae4c ("greybus: lights: add lights implementation") Reported-by: Mikhail Lobanov Suggested-by: Mikhail Lobanov Suggested-by: Alex Elder Signed-off-by: Rui Miguel Silva Link: https://lore.kernel.org/r/20240325221549.2185265-1-rmfrfs@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/staging/greybus/light.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/staging/greybus/light.c b/drivers/staging/greybus/light.c index e59bb27236b9..7352d7deb8ba 100644 --- a/drivers/staging/greybus/light.c +++ b/drivers/staging/greybus/light.c @@ -147,6 +147,9 @@ static int __gb_lights_flash_brightness_set(struct gb_channel *channel) channel = get_channel_from_mode(channel->light, GB_CHANNEL_MODE_TORCH); + if (!channel) + return -EINVAL; + /* For not flash we need to convert brightness to intensity */ intensity = channel->intensity_uA.min + (channel->intensity_uA.step * channel->led->brightness); @@ -550,7 +553,10 @@ static int gb_lights_light_v4l2_register(struct gb_light *light) } channel_flash = get_channel_from_mode(light, GB_CHANNEL_MODE_FLASH); - WARN_ON(!channel_flash); + if (!channel_flash) { + dev_err(dev, "failed to get flash channel from mode\n"); + return -EINVAL; + } fled = &channel_flash->fled; From 7b98f1493a5b012edc2b853270b82c4cf9c56c9a Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 26 Mar 2024 19:28:45 +0800 Subject: [PATCH 0135/1565] f2fs: fix to wait on page writeback in __clone_blkaddrs() [ Upstream commit d3876e34e7e789e2cbdd782360fef2a777391082 ] In below race condition, dst page may become writeback status in __clone_blkaddrs(), it needs to wait writeback before update, fix it. Thread A GC Thread - f2fs_move_file_range - filemap_write_and_wait_range(dst) - gc_data_segment - f2fs_down_write(dst) - move_data_page - set_page_writeback(dst_page) - f2fs_submit_page_write - f2fs_up_write(dst) - f2fs_down_write(dst) - __exchange_data_block - __clone_blkaddrs - f2fs_get_new_data_page - memcpy_page Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 40e805014f71..678d72a87025 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1253,6 +1253,8 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, f2fs_put_page(psrc, 1); return PTR_ERR(pdst); } + f2fs_wait_on_page_writeback(pdst, DATA, true, true); + f2fs_copy_page(psrc, pdst); set_page_dirty(pdst); f2fs_put_page(pdst, 1); From fd4bcb991ebaf0d1813d81d9983cfa99f9ef5328 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 26 Mar 2024 09:01:16 +0000 Subject: [PATCH 0136/1565] soundwire: cadence: fix invalid PDI offset [ Upstream commit 8ee1b439b1540ae543149b15a2a61b9dff937d91 ] For some reason, we add an offset to the PDI, presumably to skip the PDI0 and PDI1 which are reserved for BPT. This code is however completely wrong and leads to an out-of-bounds access. We were just lucky so far since we used only a couple of PDIs and remained within the PDI array bounds. A Fixes: tag is not provided since there are no known platforms where the out-of-bounds would be accessed, and the initial code had problems as well. A follow-up patch completely removes this useless offset. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20240326090122.1051806-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/soundwire/cadence_master.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c index 18e7d158fcca..d3e9cd3faadf 100644 --- a/drivers/soundwire/cadence_master.c +++ b/drivers/soundwire/cadence_master.c @@ -1698,7 +1698,7 @@ struct sdw_cdns_pdi *sdw_cdns_alloc_pdi(struct sdw_cdns *cdns, /* check if we found a PDI, else find in bi-directional */ if (!pdi) - pdi = cdns_find_pdi(cdns, 2, stream->num_bd, stream->bd, + pdi = cdns_find_pdi(cdns, 0, stream->num_bd, stream->bd, dai_id); if (pdi) { From 83e078085f1418a2b296bf46d281d1d3f0d3d0fe Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Wed, 3 Apr 2024 02:49:32 +0000 Subject: [PATCH 0137/1565] dmaengine: idma64: Add check for dma_set_max_seg_size [ Upstream commit 2b1c1cf08a0addb6df42f16b37133dc7a351de29 ] As the possible failure of the dma_set_max_seg_size(), it should be better to check the return value of the dma_set_max_seg_size(). Fixes: e3fdb1894cfa ("dmaengine: idma64: set maximum allowed segment size for DMA") Signed-off-by: Chen Ni Acked-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240403024932.3342606-1-nichen@iscas.ac.cn Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/dma/idma64.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idma64.c b/drivers/dma/idma64.c index db506e1f7ef4..0f065ba844c0 100644 --- a/drivers/dma/idma64.c +++ b/drivers/dma/idma64.c @@ -594,7 +594,9 @@ static int idma64_probe(struct idma64_chip *chip) idma64->dma.dev = chip->sysdev; - dma_set_max_seg_size(idma64->dma.dev, IDMA64C_CTLH_BLOCK_TS_MASK); + ret = dma_set_max_seg_size(idma64->dma.dev, IDMA64C_CTLH_BLOCK_TS_MASK); + if (ret) + return ret; ret = dma_async_device_register(&idma64->dma); if (ret) From bc40d0e356bb8e54b07a5a13da2477f4f418e228 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 8 Apr 2024 09:34:24 +0200 Subject: [PATCH 0138/1565] firmware: dmi-id: add a release callback function [ Upstream commit cf770af5645a41a753c55a053fa1237105b0964a ] dmi_class uses kfree() as the .release function, but that now causes a warning with clang-16 as it violates control flow integrity (KCFI) rules: drivers/firmware/dmi-id.c:174:17: error: cast from 'void (*)(const void *)' to 'void (*)(struct device *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict] 174 | .dev_release = (void(*)(struct device *)) kfree, Add an explicit function to call kfree() instead. Fixes: 4f5c791a850e ("DMI-based module autoloading") Link: https://lore.kernel.org/lkml/20240213100238.456912-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann Signed-off-by: Jean Delvare Signed-off-by: Sasha Levin --- drivers/firmware/dmi-id.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/firmware/dmi-id.c b/drivers/firmware/dmi-id.c index 86d71b0212b1..0cfc60cc267d 100644 --- a/drivers/firmware/dmi-id.c +++ b/drivers/firmware/dmi-id.c @@ -164,9 +164,14 @@ static int dmi_dev_uevent(struct device *dev, struct kobj_uevent_env *env) return 0; } +static void dmi_dev_release(struct device *dev) +{ + kfree(dev); +} + static struct class dmi_class = { .name = "dmi", - .dev_release = (void(*)(struct device *)) kfree, + .dev_release = dmi_dev_release, .dev_uevent = dmi_dev_uevent, }; From cc121e3722a0a2c8f716ef991e5425b180a5fb94 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 2 Apr 2024 22:50:28 +0300 Subject: [PATCH 0139/1565] serial: max3100: Lock port->lock when calling uart_handle_cts_change() [ Upstream commit 77ab53371a2066fdf9b895246505f5ef5a4b5d47 ] uart_handle_cts_change() has to be called with port lock taken, Since we run it in a separate work, the lock may not be taken at the time of running. Make sure that it's taken by explicitly doing that. Without it we got a splat: WARNING: CPU: 0 PID: 10 at drivers/tty/serial/serial_core.c:3491 uart_handle_cts_change+0xa6/0xb0 ... Workqueue: max3100-0 max3100_work [max3100] RIP: 0010:uart_handle_cts_change+0xa6/0xb0 ... max3100_handlerx+0xc5/0x110 [max3100] max3100_work+0x12a/0x340 [max3100] Fixes: 7831d56b0a35 ("tty: MAX3100") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240402195306.269276-2-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/max3100.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/max3100.c b/drivers/tty/serial/max3100.c index 371569a0fd00..915d7753eec2 100644 --- a/drivers/tty/serial/max3100.c +++ b/drivers/tty/serial/max3100.c @@ -213,7 +213,7 @@ static int max3100_sr(struct max3100_port *s, u16 tx, u16 *rx) return 0; } -static int max3100_handlerx(struct max3100_port *s, u16 rx) +static int max3100_handlerx_unlocked(struct max3100_port *s, u16 rx) { unsigned int ch, flg, status = 0; int ret = 0, cts; @@ -253,6 +253,17 @@ static int max3100_handlerx(struct max3100_port *s, u16 rx) return ret; } +static int max3100_handlerx(struct max3100_port *s, u16 rx) +{ + unsigned long flags; + int ret; + + uart_port_lock_irqsave(&s->port, &flags); + ret = max3100_handlerx_unlocked(s, rx); + uart_port_unlock_irqrestore(&s->port, flags); + return ret; +} + static void max3100_work(struct work_struct *w) { struct max3100_port *s = container_of(w, struct max3100_port, work); From e8e2a4339decad7e59425b594a98613402652d72 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 2 Apr 2024 22:50:29 +0300 Subject: [PATCH 0140/1565] serial: max3100: Update uart_driver_registered on driver removal MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 712a1fcb38dc7cac6da63ee79a88708fbf9c45ec ] The removal of the last MAX3100 device triggers the removal of the driver. However, code doesn't update the respective global variable and after insmod — rmmod — insmod cycle the kernel oopses: max3100 spi-PRP0001:01: max3100_probe: adding port 0 BUG: kernel NULL pointer dereference, address: 0000000000000408 ... RIP: 0010:serial_core_register_port+0xa0/0x840 ... max3100_probe+0x1b6/0x280 [max3100] spi_probe+0x8d/0xb0 Update the actual state so next time UART driver will be registered again. Hugo also noticed, that the error path in the probe also affected by having the variable set, and not cleared. Instead of clearing it move the assignment after the successfull uart_register_driver() call. Fixes: 7831d56b0a35 ("tty: MAX3100") Signed-off-by: Andy Shevchenko Reviewed-by: Hugo Villeneuve Link: https://lore.kernel.org/r/20240402195306.269276-3-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/max3100.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/max3100.c b/drivers/tty/serial/max3100.c index 915d7753eec2..c1ee88f53033 100644 --- a/drivers/tty/serial/max3100.c +++ b/drivers/tty/serial/max3100.c @@ -754,13 +754,14 @@ static int max3100_probe(struct spi_device *spi) mutex_lock(&max3100s_lock); if (!uart_driver_registered) { - uart_driver_registered = 1; retval = uart_register_driver(&max3100_uart_driver); if (retval) { printk(KERN_ERR "Couldn't register max3100 uart driver\n"); mutex_unlock(&max3100s_lock); return retval; } + + uart_driver_registered = 1; } for (i = 0; i < MAX_MAX3100; i++) @@ -846,6 +847,7 @@ static int max3100_remove(struct spi_device *spi) } pr_debug("removing max3100 driver\n"); uart_unregister_driver(&max3100_uart_driver); + uart_driver_registered = 0; mutex_unlock(&max3100s_lock); return 0; From d1f67d1d8c08832f0145736a02d757ef3fcc7b14 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 2 Apr 2024 22:50:30 +0300 Subject: [PATCH 0141/1565] serial: max3100: Fix bitwise types [ Upstream commit e60955dbecb97f080848a57524827e2db29c70fd ] Sparse is not happy about misuse of bitwise types: .../max3100.c:194:13: warning: incorrect type in assignment (different base types) .../max3100.c:194:13: expected unsigned short [addressable] [usertype] etx .../max3100.c:194:13: got restricted __be16 [usertype] .../max3100.c:202:15: warning: cast to restricted __be16 Fix this by choosing proper types for the respective variables. Fixes: 7831d56b0a35 ("tty: MAX3100") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20240402195306.269276-4-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/max3100.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/max3100.c b/drivers/tty/serial/max3100.c index c1ee88f53033..17b6f4a872d6 100644 --- a/drivers/tty/serial/max3100.c +++ b/drivers/tty/serial/max3100.c @@ -45,6 +45,9 @@ #include #include #include +#include + +#include #include @@ -191,7 +194,7 @@ static void max3100_timeout(struct timer_list *t) static int max3100_sr(struct max3100_port *s, u16 tx, u16 *rx) { struct spi_message message; - u16 etx, erx; + __be16 etx, erx; int status; struct spi_transfer tran = { .tx_buf = &etx, From 276bc8a7dcc10909e6bdea3b69d3bdb97f0adf7a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 3 Apr 2024 10:06:35 +0200 Subject: [PATCH 0142/1565] greybus: arche-ctrl: move device table to its right location [ Upstream commit 6a0b8c0da8d8d418cde6894a104cf74e6098ddfa ] The arche-ctrl has two platform drivers and three of_device_id tables, but one table is only used for the the module loader, while the other two seem to be associated with their drivers. This leads to a W=1 warning when the driver is built-in: drivers/staging/greybus/arche-platform.c:623:34: error: 'arche_combined_id' defined but not used [-Werror=unused-const-variable=] 623 | static const struct of_device_id arche_combined_id[] = { Drop the extra table and register both tables that are actually used as the ones for the module loader instead. Fixes: 7b62b61c752a ("greybus: arche-ctrl: Don't expose driver internals to arche-platform driver") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240403080702.3509288-18-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/staging/greybus/arche-apb-ctrl.c | 1 + drivers/staging/greybus/arche-platform.c | 9 +-------- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/staging/greybus/arche-apb-ctrl.c b/drivers/staging/greybus/arche-apb-ctrl.c index bbf3ba744fc4..c7383c6c6094 100644 --- a/drivers/staging/greybus/arche-apb-ctrl.c +++ b/drivers/staging/greybus/arche-apb-ctrl.c @@ -468,6 +468,7 @@ static const struct of_device_id arche_apb_ctrl_of_match[] = { { .compatible = "usbffff,2", }, { }, }; +MODULE_DEVICE_TABLE(of, arche_apb_ctrl_of_match); static struct platform_driver arche_apb_ctrl_device_driver = { .probe = arche_apb_ctrl_probe, diff --git a/drivers/staging/greybus/arche-platform.c b/drivers/staging/greybus/arche-platform.c index eebf0deb39f5..02a700f720e6 100644 --- a/drivers/staging/greybus/arche-platform.c +++ b/drivers/staging/greybus/arche-platform.c @@ -622,14 +622,7 @@ static const struct of_device_id arche_platform_of_match[] = { { .compatible = "google,arche-platform", }, { }, }; - -static const struct of_device_id arche_combined_id[] = { - /* Use PID/VID of SVC device */ - { .compatible = "google,arche-platform", }, - { .compatible = "usbffff,2", }, - { }, -}; -MODULE_DEVICE_TABLE(of, arche_combined_id); +MODULE_DEVICE_TABLE(of, arche_platform_of_match); static struct platform_driver arche_platform_device_driver = { .probe = arche_platform_probe, From 1ada965692908b99578ec95e6ec62d5df4358342 Mon Sep 17 00:00:00 2001 From: Hugo Villeneuve Date: Tue, 9 Apr 2024 11:42:49 -0400 Subject: [PATCH 0143/1565] serial: sc16is7xx: add proper sched.h include for sched_set_fifo() [ Upstream commit 2a8e4ab0c93fad30769479f86849e22d63cd0e12 ] Replace incorrect include with the proper one for sched_set_fifo() declaration. Fixes: 28d2f209cd16 ("sched,serial: Convert to sched_set_fifo()") Signed-off-by: Hugo Villeneuve Link: https://lore.kernel.org/r/20240409154253.3043822-2-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/sc16is7xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 29f05db0d49b..d751f8ce5cf6 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -25,7 +26,6 @@ #include #include #include -#include #define SC16IS7XX_NAME "sc16is7xx" #define SC16IS7XX_MAX_DEVS 8 From 70fb69e05a258e4fec458ce3f2497775b41ffcc3 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Thu, 26 Nov 2020 18:32:09 +0800 Subject: [PATCH 0144/1565] f2fs: compress: support chksum [ Upstream commit b28f047b28c51d0b9864c34b097bb0b221ea7247 ] This patch supports to store chksum value with compressed data, and verify the integrality of compressed data while reading the data. The feature can be enabled through specifying mount option 'compress_chksum'. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 7c5dffb3d90c ("f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()") Signed-off-by: Sasha Levin --- Documentation/filesystems/f2fs.rst | 1 + fs/f2fs/compress.c | 23 +++++++++++++++++++++++ fs/f2fs/f2fs.h | 16 ++++++++++++++-- fs/f2fs/inode.c | 3 +++ fs/f2fs/super.c | 9 +++++++++ include/linux/f2fs_fs.h | 2 +- 6 files changed, 51 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 8c0fbdd8ce6f..3d21a9e86995 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -260,6 +260,7 @@ compress_extension=%s Support adding specified extension, so that f2fs can enab For other files, we can still enable compression via ioctl. Note that, there is one reserved special extension '*', it can be set to enable compression for all files. +compress_chksum Support verifying chksum of raw data in compressed cluster. inlinecrypt When possible, encrypt/decrypt the contents of encrypted files using the blk-crypto framework rather than filesystem-layer encryption. This allows the use of diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index a94e102d1586..c87020afda51 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -589,6 +589,7 @@ static int f2fs_compress_pages(struct compress_ctx *cc) f2fs_cops[fi->i_compress_algorithm]; unsigned int max_len, new_nr_cpages; struct page **new_cpages; + u32 chksum = 0; int i, ret; trace_f2fs_compress_pages_start(cc->inode, cc->cluster_idx, @@ -642,6 +643,11 @@ static int f2fs_compress_pages(struct compress_ctx *cc) cc->cbuf->clen = cpu_to_le32(cc->clen); + if (fi->i_compress_flag & 1 << COMPRESS_CHKSUM) + chksum = f2fs_crc32(F2FS_I_SB(cc->inode), + cc->cbuf->cdata, cc->clen); + cc->cbuf->chksum = cpu_to_le32(chksum); + for (i = 0; i < COMPRESS_DATA_RESERVED_SIZE; i++) cc->cbuf->reserved[i] = cpu_to_le32(0); @@ -777,6 +783,23 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity) ret = cops->decompress_pages(dic); + if (!ret && fi->i_compress_flag & 1 << COMPRESS_CHKSUM) { + u32 provided = le32_to_cpu(dic->cbuf->chksum); + u32 calculated = f2fs_crc32(sbi, dic->cbuf->cdata, dic->clen); + + if (provided != calculated) { + if (!is_inode_flag_set(dic->inode, FI_COMPRESS_CORRUPT)) { + set_inode_flag(dic->inode, FI_COMPRESS_CORRUPT); + printk_ratelimited( + "%sF2FS-fs (%s): checksum invalid, nid = %lu, %x vs %x", + KERN_INFO, sbi->sb->s_id, dic->inode->i_ino, + provided, calculated); + } + set_sbi_flag(sbi, SBI_NEED_FSCK); + WARN_ON_ONCE(1); + } + } + out_vunmap_cbuf: vm_unmap_ram(dic->cbuf, dic->nr_cpages); out_vunmap_rbuf: diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 83ebc860508b..6dfefbf54917 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -147,7 +147,8 @@ struct f2fs_mount_info { /* For compression */ unsigned char compress_algorithm; /* algorithm type */ - unsigned compress_log_size; /* cluster log size */ + unsigned char compress_log_size; /* cluster log size */ + bool compress_chksum; /* compressed data chksum */ unsigned char compress_ext_cnt; /* extension count */ unsigned char extensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN]; /* extensions */ }; @@ -678,6 +679,7 @@ enum { FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */ FI_VERITY_IN_PROGRESS, /* building fs-verity Merkle tree */ FI_COMPRESSED_FILE, /* indicate file's data can be compressed */ + FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */ FI_MMAP_FILE, /* indicate file was mmapped */ FI_MAX, /* max flag, never be used */ }; @@ -736,6 +738,7 @@ struct f2fs_inode_info { atomic_t i_compr_blocks; /* # of compressed blocks */ unsigned char i_compress_algorithm; /* algorithm type */ unsigned char i_log_cluster_size; /* log of cluster size */ + unsigned short i_compress_flag; /* compress flag */ unsigned int i_cluster_size; /* cluster size */ }; @@ -1281,9 +1284,15 @@ enum compress_algorithm_type { COMPRESS_MAX, }; -#define COMPRESS_DATA_RESERVED_SIZE 5 +enum compress_flag { + COMPRESS_CHKSUM, + COMPRESS_MAX_FLAG, +}; + +#define COMPRESS_DATA_RESERVED_SIZE 4 struct compress_data { __le32 clen; /* compressed data size */ + __le32 chksum; /* compressed data chksum */ __le32 reserved[COMPRESS_DATA_RESERVED_SIZE]; /* reserved */ u8 cdata[]; /* compressed data */ }; @@ -3925,6 +3934,9 @@ static inline void set_compress_context(struct inode *inode) F2FS_OPTION(sbi).compress_algorithm; F2FS_I(inode)->i_log_cluster_size = F2FS_OPTION(sbi).compress_log_size; + F2FS_I(inode)->i_compress_flag = + F2FS_OPTION(sbi).compress_chksum ? + 1 << COMPRESS_CHKSUM : 0; F2FS_I(inode)->i_cluster_size = 1 << F2FS_I(inode)->i_log_cluster_size; F2FS_I(inode)->i_flags |= F2FS_COMPR_FL; diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 87752550f78c..3e98551f4186 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -455,6 +455,7 @@ static int do_read_inode(struct inode *inode) le64_to_cpu(ri->i_compr_blocks)); fi->i_compress_algorithm = ri->i_compress_algorithm; fi->i_log_cluster_size = ri->i_log_cluster_size; + fi->i_compress_flag = le16_to_cpu(ri->i_compress_flag); fi->i_cluster_size = 1 << fi->i_log_cluster_size; set_inode_flag(inode, FI_COMPRESSED_FILE); } @@ -633,6 +634,8 @@ void f2fs_update_inode(struct inode *inode, struct page *node_page) &F2FS_I(inode)->i_compr_blocks)); ri->i_compress_algorithm = F2FS_I(inode)->i_compress_algorithm; + ri->i_compress_flag = + cpu_to_le16(F2FS_I(inode)->i_compress_flag); ri->i_log_cluster_size = F2FS_I(inode)->i_log_cluster_size; } diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 9a74d60f61db..065aa01958e9 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -146,6 +146,7 @@ enum { Opt_compress_algorithm, Opt_compress_log_size, Opt_compress_extension, + Opt_compress_chksum, Opt_atgc, Opt_err, }; @@ -214,6 +215,7 @@ static match_table_t f2fs_tokens = { {Opt_compress_algorithm, "compress_algorithm=%s"}, {Opt_compress_log_size, "compress_log_size=%u"}, {Opt_compress_extension, "compress_extension=%s"}, + {Opt_compress_chksum, "compress_chksum"}, {Opt_atgc, "atgc"}, {Opt_err, NULL}, }; @@ -974,10 +976,14 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) F2FS_OPTION(sbi).compress_ext_cnt++; kfree(name); break; + case Opt_compress_chksum: + F2FS_OPTION(sbi).compress_chksum = true; + break; #else case Opt_compress_algorithm: case Opt_compress_log_size: case Opt_compress_extension: + case Opt_compress_chksum: f2fs_info(sbi, "compression options not supported"); break; #endif @@ -1562,6 +1568,9 @@ static inline void f2fs_show_compress_options(struct seq_file *seq, seq_printf(seq, ",compress_extension=%s", F2FS_OPTION(sbi).extensions[i]); } + + if (F2FS_OPTION(sbi).compress_chksum) + seq_puts(seq, ",compress_chksum"); } static int f2fs_show_options(struct seq_file *seq, struct dentry *root) diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index a5dbb57a687f..7dc2a06cf19a 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -273,7 +273,7 @@ struct f2fs_inode { __le64 i_compr_blocks; /* # of compressed blocks */ __u8 i_compress_algorithm; /* compress algorithm */ __u8 i_log_cluster_size; /* log of cluster size */ - __le16 i_padding; /* padding */ + __le16 i_compress_flag; /* compress flag */ __le32 i_extra_end[0]; /* for attribute size calculation */ } __packed; __le32 i_addr[DEF_ADDRS_PER_INODE]; /* Pointers to data blocks */ From 5b09d2e7906639b29959bba3d516a71b3280a764 Mon Sep 17 00:00:00 2001 From: Daeho Jeong Date: Tue, 1 Dec 2020 13:08:02 +0900 Subject: [PATCH 0145/1565] f2fs: add compress_mode mount option [ Upstream commit 602a16d58e9aab3c423bcf051033ea6c9e8a6d37 ] We will add a new "compress_mode" mount option to control file compression mode. This supports "fs" and "user". In "fs" mode (default), f2fs does automatic compression on the compression enabled files. In "user" mode, f2fs disables the automaic compression and gives the user discretion of choosing the target file and the timing. It means the user can do manual compression/decompression on the compression enabled files using ioctls. Signed-off-by: Daeho Jeong Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 7c5dffb3d90c ("f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()") Signed-off-by: Sasha Levin --- Documentation/filesystems/f2fs.rst | 35 ++++++++++++++++++++++++++++++ fs/f2fs/compress.c | 2 +- fs/f2fs/data.c | 2 +- fs/f2fs/f2fs.h | 30 +++++++++++++++++++++++++ fs/f2fs/segment.c | 2 +- fs/f2fs/super.c | 23 ++++++++++++++++++++ 6 files changed, 91 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 3d21a9e86995..de2bacc418fe 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -261,6 +261,13 @@ compress_extension=%s Support adding specified extension, so that f2fs can enab Note that, there is one reserved special extension '*', it can be set to enable compression for all files. compress_chksum Support verifying chksum of raw data in compressed cluster. +compress_mode=%s Control file compression mode. This supports "fs" and "user" + modes. In "fs" mode (default), f2fs does automatic compression + on the compression enabled files. In "user" mode, f2fs disables + the automaic compression and gives the user discretion of + choosing the target file and the timing. The user can do manual + compression/decompression on the compression enabled files using + ioctls. inlinecrypt When possible, encrypt/decrypt the contents of encrypted files using the blk-crypto framework rather than filesystem-layer encryption. This allows the use of @@ -811,6 +818,34 @@ Compress metadata layout:: | data length | data chksum | reserved | compressed data | +-------------+-------------+----------+----------------------------+ +Compression mode +-------------------------- + +f2fs supports "fs" and "user" compression modes with "compression_mode" mount option. +With this option, f2fs provides a choice to select the way how to compress the +compression enabled files (refer to "Compression implementation" section for how to +enable compression on a regular inode). + +1) compress_mode=fs +This is the default option. f2fs does automatic compression in the writeback of the +compression enabled files. + +2) compress_mode=user +This disables the automaic compression and gives the user discretion of choosing the +target file and the timing. The user can do manual compression/decompression on the +compression enabled files using F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE +ioctls like the below. + +To decompress a file, + +fd = open(filename, O_WRONLY, 0); +ret = ioctl(fd, F2FS_IOC_DECOMPRESS_FILE); + +To compress a file, + +fd = open(filename, O_WRONLY, 0); +ret = ioctl(fd, F2FS_IOC_COMPRESS_FILE); + NVMe Zoned Namespace devices ---------------------------- diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index c87020afda51..6c870b741cfe 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -929,7 +929,7 @@ int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index) static bool cluster_may_compress(struct compress_ctx *cc) { - if (!f2fs_compressed_file(cc->inode)) + if (!f2fs_need_compress_data(cc->inode)) return false; if (f2fs_is_atomic_file(cc->inode)) return false; diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index e0533cffbb07..fc6c88e80cf4 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -3222,7 +3222,7 @@ static inline bool __should_serialize_io(struct inode *inode, if (IS_NOQUOTA(inode)) return false; - if (f2fs_compressed_file(inode)) + if (f2fs_need_compress_data(inode)) return true; if (wbc->sync_mode != WB_SYNC_ALL) return true; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 6dfefbf54917..6a9f4dcea06d 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -150,6 +150,7 @@ struct f2fs_mount_info { unsigned char compress_log_size; /* cluster log size */ bool compress_chksum; /* compressed data chksum */ unsigned char compress_ext_cnt; /* extension count */ + int compress_mode; /* compression mode */ unsigned char extensions[COMPRESS_EXT_NUM][F2FS_EXTENSION_LEN]; /* extensions */ }; @@ -681,6 +682,7 @@ enum { FI_COMPRESSED_FILE, /* indicate file's data can be compressed */ FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */ FI_MMAP_FILE, /* indicate file was mmapped */ + FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */ FI_MAX, /* max flag, never be used */ }; @@ -1255,6 +1257,18 @@ enum fsync_mode { FSYNC_MODE_NOBARRIER, /* fsync behaves nobarrier based on posix */ }; +enum { + COMPR_MODE_FS, /* + * automatically compress compression + * enabled files + */ + COMPR_MODE_USER, /* + * automatical compression is disabled. + * user can control the file compression + * using ioctls + */ +}; + /* * this value is set in page as a private data which indicate that * the page is atomically written, and it is in inmem_pages list. @@ -2795,6 +2809,22 @@ static inline int f2fs_compressed_file(struct inode *inode) is_inode_flag_set(inode, FI_COMPRESSED_FILE); } +static inline bool f2fs_need_compress_data(struct inode *inode) +{ + int compress_mode = F2FS_OPTION(F2FS_I_SB(inode)).compress_mode; + + if (!f2fs_compressed_file(inode)) + return false; + + if (compress_mode == COMPR_MODE_FS) + return true; + else if (compress_mode == COMPR_MODE_USER && + is_inode_flag_set(inode, FI_ENABLE_COMPRESS)) + return true; + + return false; +} + static inline unsigned int addrs_per_inode(struct inode *inode) { unsigned int addrs = CUR_ADDRS_PER_INODE(inode) - diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index a27a93429271..ad30908ac99f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -3296,7 +3296,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio) else return CURSEG_COLD_DATA; } - if (file_is_cold(inode) || f2fs_compressed_file(inode)) + if (file_is_cold(inode) || f2fs_need_compress_data(inode)) return CURSEG_COLD_DATA; if (file_is_hot(inode) || is_inode_flag_set(inode, FI_HOT_DATA) || diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 065aa01958e9..1281b59da6a2 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -147,6 +147,7 @@ enum { Opt_compress_log_size, Opt_compress_extension, Opt_compress_chksum, + Opt_compress_mode, Opt_atgc, Opt_err, }; @@ -216,6 +217,7 @@ static match_table_t f2fs_tokens = { {Opt_compress_log_size, "compress_log_size=%u"}, {Opt_compress_extension, "compress_extension=%s"}, {Opt_compress_chksum, "compress_chksum"}, + {Opt_compress_mode, "compress_mode=%s"}, {Opt_atgc, "atgc"}, {Opt_err, NULL}, }; @@ -979,11 +981,26 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) case Opt_compress_chksum: F2FS_OPTION(sbi).compress_chksum = true; break; + case Opt_compress_mode: + name = match_strdup(&args[0]); + if (!name) + return -ENOMEM; + if (!strcmp(name, "fs")) { + F2FS_OPTION(sbi).compress_mode = COMPR_MODE_FS; + } else if (!strcmp(name, "user")) { + F2FS_OPTION(sbi).compress_mode = COMPR_MODE_USER; + } else { + kfree(name); + return -EINVAL; + } + kfree(name); + break; #else case Opt_compress_algorithm: case Opt_compress_log_size: case Opt_compress_extension: case Opt_compress_chksum: + case Opt_compress_mode: f2fs_info(sbi, "compression options not supported"); break; #endif @@ -1571,6 +1588,11 @@ static inline void f2fs_show_compress_options(struct seq_file *seq, if (F2FS_OPTION(sbi).compress_chksum) seq_puts(seq, ",compress_chksum"); + + if (F2FS_OPTION(sbi).compress_mode == COMPR_MODE_FS) + seq_printf(seq, ",compress_mode=%s", "fs"); + else if (F2FS_OPTION(sbi).compress_mode == COMPR_MODE_USER) + seq_printf(seq, ",compress_mode=%s", "user"); } static int f2fs_show_options(struct seq_file *seq, struct dentry *root) @@ -1720,6 +1742,7 @@ static void default_options(struct f2fs_sb_info *sbi) F2FS_OPTION(sbi).compress_algorithm = COMPRESS_LZ4; F2FS_OPTION(sbi).compress_log_size = MIN_COMPRESS_LOG_SIZE; F2FS_OPTION(sbi).compress_ext_cnt = 0; + F2FS_OPTION(sbi).compress_mode = COMPR_MODE_FS; F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ON; sbi->sb->s_flags &= ~SB_INLINECRYPT; From df4978d96890429719a781cc70f6888c67530403 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 12 May 2021 17:52:57 +0800 Subject: [PATCH 0146/1565] f2fs: compress: clean up parameter of __f2fs_cluster_blocks() [ Upstream commit 91f0fb6903ed30370135381f10c02a10c7872cdc ] Previously, in order to reuse __f2fs_cluster_blocks(), f2fs_is_compressed_cluster() assigned a compress_ctx type variable, which is used to pass few parameters (cc.inode, cc.cluster_size, cc.cluster_idx), it's wasteful to allocate such large space in stack. Let's clean up parameters of __f2fs_cluster_blocks() to avoid that. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 7c5dffb3d90c ("f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()") Signed-off-by: Sasha Levin --- fs/f2fs/compress.c | 33 +++++++++++++-------------------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 6c870b741cfe..04b6de1a5874 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -866,14 +866,17 @@ static bool __cluster_may_compress(struct compress_ctx *cc) return true; } -static int __f2fs_cluster_blocks(struct compress_ctx *cc, bool compr) +static int __f2fs_cluster_blocks(struct inode *inode, + unsigned int cluster_idx, bool compr) { struct dnode_of_data dn; + unsigned int cluster_size = F2FS_I(inode)->i_cluster_size; + unsigned int start_idx = cluster_idx << + F2FS_I(inode)->i_log_cluster_size; int ret; - set_new_dnode(&dn, cc->inode, NULL, NULL, 0); - ret = f2fs_get_dnode_of_data(&dn, start_idx_of_cluster(cc), - LOOKUP_NODE); + set_new_dnode(&dn, inode, NULL, NULL, 0); + ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE); if (ret) { if (ret == -ENOENT) ret = 0; @@ -884,7 +887,7 @@ static int __f2fs_cluster_blocks(struct compress_ctx *cc, bool compr) int i; ret = 1; - for (i = 1; i < cc->cluster_size; i++) { + for (i = 1; i < cluster_size; i++) { block_t blkaddr; blkaddr = data_blkaddr(dn.inode, @@ -906,25 +909,15 @@ static int __f2fs_cluster_blocks(struct compress_ctx *cc, bool compr) /* return # of compressed blocks in compressed cluster */ static int f2fs_compressed_blocks(struct compress_ctx *cc) { - return __f2fs_cluster_blocks(cc, true); + return __f2fs_cluster_blocks(cc->inode, cc->cluster_idx, true); } /* return # of valid blocks in compressed cluster */ -static int f2fs_cluster_blocks(struct compress_ctx *cc) -{ - return __f2fs_cluster_blocks(cc, false); -} - int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index) { - struct compress_ctx cc = { - .inode = inode, - .log_cluster_size = F2FS_I(inode)->i_log_cluster_size, - .cluster_size = F2FS_I(inode)->i_cluster_size, - .cluster_idx = index >> F2FS_I(inode)->i_log_cluster_size, - }; - - return f2fs_cluster_blocks(&cc); + return __f2fs_cluster_blocks(inode, + index >> F2FS_I(inode)->i_log_cluster_size, + false); } static bool cluster_may_compress(struct compress_ctx *cc) @@ -975,7 +968,7 @@ static int prepare_compress_overwrite(struct compress_ctx *cc, bool prealloc; retry: - ret = f2fs_cluster_blocks(cc); + ret = f2fs_is_compressed_cluster(cc->inode, start_idx); if (ret <= 0) return ret; From 486009bc2fca02b31b972d1b10391a9d22ea4e29 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 12 May 2021 17:52:58 +0800 Subject: [PATCH 0147/1565] f2fs: compress: remove unneeded preallocation [ Upstream commit 8f1d49832636d514e949b29ce64370ebebf6d6d2 ] We will reserve iblocks for compression saved, so during compressed cluster overwrite, we don't need to preallocate blocks for later write. In addition, it adds a bug_on to detect wrong reserved iblock number in __f2fs_cluster_blocks(). Bug fix in the original patch by Jaegeuk: If we released compressed blocks having an immutable bit, we can see less number of compressed block addresses. Let's fix wrong BUG_ON. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 7c5dffb3d90c ("f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()") Signed-off-by: Sasha Levin --- fs/f2fs/compress.c | 27 +++------------------------ fs/f2fs/file.c | 4 ---- 2 files changed, 3 insertions(+), 28 deletions(-) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 04b6de1a5874..388ed7052d9b 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -900,6 +900,9 @@ static int __f2fs_cluster_blocks(struct inode *inode, ret++; } } + + f2fs_bug_on(F2FS_I_SB(inode), + !compr && ret != cluster_size && !IS_IMMUTABLE(inode)); } fail: f2fs_put_dnode(&dn); @@ -960,21 +963,16 @@ static int prepare_compress_overwrite(struct compress_ctx *cc, struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode); struct address_space *mapping = cc->inode->i_mapping; struct page *page; - struct dnode_of_data dn; sector_t last_block_in_bio; unsigned fgp_flag = FGP_LOCK | FGP_WRITE | FGP_CREAT; pgoff_t start_idx = start_idx_of_cluster(cc); int i, ret; - bool prealloc; retry: ret = f2fs_is_compressed_cluster(cc->inode, start_idx); if (ret <= 0) return ret; - /* compressed case */ - prealloc = (ret < cc->cluster_size); - ret = f2fs_init_compress_ctx(cc); if (ret) return ret; @@ -1032,25 +1030,6 @@ static int prepare_compress_overwrite(struct compress_ctx *cc, } } - if (prealloc) { - f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, true); - - set_new_dnode(&dn, cc->inode, NULL, NULL, 0); - - for (i = cc->cluster_size - 1; i > 0; i--) { - ret = f2fs_get_block(&dn, start_idx + i); - if (ret) { - i = cc->cluster_size; - break; - } - - if (dn.data_blkaddr != NEW_ADDR) - break; - } - - f2fs_do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, false); - } - if (likely(!ret)) { *fsdata = cc->rpages; *pagep = cc->rpages[offset_in_cluster(cc, index)]; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 678d72a87025..8b136f2dc2d2 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -81,10 +81,6 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) err = ret; goto err; } else if (ret) { - if (ret < F2FS_I(inode)->i_cluster_size) { - err = -EAGAIN; - goto err; - } need_alloc = false; } } From 8e1651cd667cd6779db28314844d88b6de8312a3 Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim Date: Tue, 25 May 2021 11:39:35 -0700 Subject: [PATCH 0148/1565] f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit [ Upstream commit c61404153eb683da9c35aad133131554861ed561 ] Once we release compressed blocks, we used to set IMMUTABLE bit. But it turned out it disallows every fs operations which we don't need for compression. Let's just prevent writing data only. Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 7c5dffb3d90c ("f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()") Signed-off-by: Sasha Levin --- fs/f2fs/compress.c | 3 ++- fs/f2fs/f2fs.h | 6 ++++++ fs/f2fs/file.c | 18 ++++++++++++------ include/linux/f2fs_fs.h | 1 + 4 files changed, 21 insertions(+), 7 deletions(-) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 388ed7052d9b..be6f2988ac7f 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -902,7 +902,8 @@ static int __f2fs_cluster_blocks(struct inode *inode, } f2fs_bug_on(F2FS_I_SB(inode), - !compr && ret != cluster_size && !IS_IMMUTABLE(inode)); + !compr && ret != cluster_size && + !is_inode_flag_set(inode, FI_COMPRESS_RELEASED)); } fail: f2fs_put_dnode(&dn); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 6a9f4dcea06d..4380df9b2d70 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -683,6 +683,7 @@ enum { FI_COMPRESS_CORRUPT, /* indicate compressed cluster is corrupted */ FI_MMAP_FILE, /* indicate file was mmapped */ FI_ENABLE_COMPRESS, /* enable compression in "user" compression mode */ + FI_COMPRESS_RELEASED, /* compressed blocks were released */ FI_MAX, /* max flag, never be used */ }; @@ -2650,6 +2651,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode, case FI_DATA_EXIST: case FI_INLINE_DOTS: case FI_PIN_FILE: + case FI_COMPRESS_RELEASED: f2fs_mark_inode_dirty_sync(inode, true); } } @@ -2771,6 +2773,8 @@ static inline void get_inline_info(struct inode *inode, struct f2fs_inode *ri) set_bit(FI_EXTRA_ATTR, fi->flags); if (ri->i_inline & F2FS_PIN_FILE) set_bit(FI_PIN_FILE, fi->flags); + if (ri->i_inline & F2FS_COMPRESS_RELEASED) + set_bit(FI_COMPRESS_RELEASED, fi->flags); } static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri) @@ -2791,6 +2795,8 @@ static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri) ri->i_inline |= F2FS_EXTRA_ATTR; if (is_inode_flag_set(inode, FI_PIN_FILE)) ri->i_inline |= F2FS_PIN_FILE; + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) + ri->i_inline |= F2FS_COMPRESS_RELEASED; } static inline int f2fs_has_extra_attr(struct inode *inode) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 8b136f2dc2d2..cdd2630f4860 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -63,6 +63,9 @@ static vm_fault_t f2fs_vm_page_mkwrite(struct vm_fault *vmf) if (unlikely(IS_IMMUTABLE(inode))) return VM_FAULT_SIGBUS; + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) + return VM_FAULT_SIGBUS; + if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; goto err; @@ -3551,7 +3554,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) goto out; } - if (IS_IMMUTABLE(inode)) { + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { ret = -EINVAL; goto out; } @@ -3560,8 +3563,7 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) if (ret) goto out; - F2FS_I(inode)->i_flags |= F2FS_IMMUTABLE_FL; - f2fs_set_inode_flags(inode); + set_inode_flag(inode, FI_COMPRESS_RELEASED); inode->i_ctime = current_time(inode); f2fs_mark_inode_dirty_sync(inode, true); @@ -3727,7 +3729,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) inode_lock(inode); - if (!IS_IMMUTABLE(inode)) { + if (!is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { ret = -EINVAL; goto unlock_inode; } @@ -3772,8 +3774,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) up_write(&F2FS_I(inode)->i_mmap_sem); if (ret >= 0) { - F2FS_I(inode)->i_flags &= ~F2FS_IMMUTABLE_FL; - f2fs_set_inode_flags(inode); + clear_inode_flag(inode, FI_COMPRESS_RELEASED); inode->i_ctime = current_time(inode); f2fs_mark_inode_dirty_sync(inode, true); } @@ -4130,6 +4131,11 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) goto unlock; } + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { + ret = -EPERM; + goto unlock; + } + ret = generic_write_checks(iocb, from); if (ret > 0) { bool preallocated = false; diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index 7dc2a06cf19a..9a6082c57219 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -229,6 +229,7 @@ struct f2fs_extent { #define F2FS_INLINE_DOTS 0x10 /* file having implicit dot dentries */ #define F2FS_EXTRA_ATTR 0x20 /* file having extra attribute */ #define F2FS_PIN_FILE 0x40 /* file should not be gced */ +#define F2FS_COMPRESS_RELEASED 0x80 /* file released compressed blocks */ struct f2fs_inode { __le16 i_mode; /* file mode */ From 431ecafbffabd0dc397909078a670e5b94026e09 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 7 Apr 2024 15:26:03 +0800 Subject: [PATCH 0149/1565] f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks() [ Upstream commit 7c5dffb3d90c5921b91981cc663e02757d90526e ] Compress flag should be checked after inode lock held to avoid racing w/ f2fs_setflags_common(), fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reported-by: Zhiguo Niu Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index cdd2630f4860..9023083a98a6 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -3533,9 +3533,6 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) if (!f2fs_sb_has_compression(F2FS_I_SB(inode))) return -EOPNOTSUPP; - if (!f2fs_compressed_file(inode)) - return -EINVAL; - if (f2fs_readonly(sbi->sb)) return -EROFS; @@ -3554,7 +3551,8 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) goto out; } - if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { + if (!f2fs_compressed_file(inode) || + is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { ret = -EINVAL; goto out; } @@ -3712,9 +3710,6 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) if (!f2fs_sb_has_compression(F2FS_I_SB(inode))) return -EOPNOTSUPP; - if (!f2fs_compressed_file(inode)) - return -EINVAL; - if (f2fs_readonly(sbi->sb)) return -EROFS; @@ -3729,7 +3724,8 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) inode_lock(inode); - if (!is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { + if (!f2fs_compressed_file(inode) || + !is_inode_flag_set(inode, FI_COMPRESS_RELEASED)) { ret = -EINVAL; goto unlock_inode; } From eebbc4eb7e6628c8ff88241f5eea2321a09a7452 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 27 Apr 2021 11:07:30 +0800 Subject: [PATCH 0150/1565] f2fs: add cp_error check in f2fs_write_compressed_pages [ Upstream commit ee68d27181f060fab29e60d1d31aab6a42703dd4 ] This patch adds cp_error check in f2fs_write_compressed_pages() like we did in f2fs_write_single_data_page() Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 278a6253a673 ("f2fs: fix to relocate check condition in f2fs_fallocate()") Signed-off-by: Sasha Levin --- fs/f2fs/compress.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index be6f2988ac7f..9dc2e09f0a60 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -1161,6 +1161,12 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc, loff_t psize; int i, err; + /* we should bypass data pages to proceed the kworkder jobs */ + if (unlikely(f2fs_cp_error(sbi))) { + mapping_set_error(cc->rpages[0]->mapping, -EIO); + goto out_free; + } + if (IS_NOQUOTA(inode)) { /* * We need to wait for node_write to avoid block allocation during From d992b7802612fe04026c74ec0e376c2c001fa19b Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 20 Jul 2021 09:03:29 +0800 Subject: [PATCH 0151/1565] f2fs: fix to force keeping write barrier for strict fsync mode [ Upstream commit 2787991516468bfafafb9bf2b45a848e6b202e7c ] [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html As [1] reported, if lower device doesn't support write barrier, in below case: - write page #0; persist - overwrite page #0 - fsync - write data page #0 OPU into device's cache - write inode page into device's cache - issue flush If SPO is triggered during flush command, inode page can be persisted before data page #0, so that after recovery, inode page can be recovered with new physical block address of data page #0, however there may contains dummy data in new physical block address. Then what user will see is: after overwrite & fsync + SPO, old data in file was corrupted, if any user do care about such case, we can suggest user to use STRICT fsync mode, in this mode, we will force to use atomic write sematics to keep write order in between data/node and last node, so that it avoids potential data corruption during fsync(). Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 278a6253a673 ("f2fs: fix to relocate check condition in f2fs_fallocate()") Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 9023083a98a6..02971d6b347e 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -297,6 +297,18 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, f2fs_exist_written_data(sbi, ino, UPDATE_INO)) goto flush_out; goto out; + } else { + /* + * for OPU case, during fsync(), node can be persisted before + * data when lower device doesn't support write barrier, result + * in data corruption after SPO. + * So for strict fsync mode, force to use atomic write sematics + * to keep write order in between data/node and last node to + * avoid potential data corruption. + */ + if (F2FS_OPTION(sbi).fsync_mode == + FSYNC_MODE_STRICT && !atomic) + atomic = true; } go_write: /* From bdca4b67862128edb77633cf38a9e29bec59380a Mon Sep 17 00:00:00 2001 From: Jaegeuk Kim Date: Fri, 7 Jan 2022 20:08:45 -0800 Subject: [PATCH 0152/1565] f2fs: do not allow partial truncation on pinned file [ Upstream commit 5fed0be8583f08c1548b4dcd9e5ee0d1133d0730 ] If the pinned file has a hole by partial truncation, application that has the block map will be broken. Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 278a6253a673 ("f2fs: fix to relocate check condition in f2fs_fallocate()") Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 02971d6b347e..e88d4c0f71e3 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1745,7 +1745,11 @@ static long f2fs_fallocate(struct file *file, int mode, (mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE))) return -EOPNOTSUPP; - if (f2fs_compressed_file(inode) && + /* + * Pinned file should not support partial trucation since the block + * can be used by applications. + */ + if ((f2fs_compressed_file(inode) || f2fs_is_pinned_file(inode)) && (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE))) return -EOPNOTSUPP; From f1169d2b2aa2ece587db5398494a83e881f664b2 Mon Sep 17 00:00:00 2001 From: Jinyoung CHOI Date: Mon, 6 Feb 2023 20:56:00 +0900 Subject: [PATCH 0153/1565] f2fs: fix typos in comments [ Upstream commit 146949defda868378992171b9e42318b06fcd482 ] This patch is to fix typos in f2fs files. Signed-off-by: Jinyoung Choi Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Stable-dep-of: 278a6253a673 ("f2fs: fix to relocate check condition in f2fs_fallocate()") Signed-off-by: Sasha Levin --- fs/f2fs/checkpoint.c | 4 ++-- fs/f2fs/compress.c | 2 +- fs/f2fs/data.c | 8 ++++---- fs/f2fs/extent_cache.c | 4 ++-- fs/f2fs/file.c | 6 +++--- fs/f2fs/namei.c | 2 +- fs/f2fs/segment.c | 2 +- 7 files changed, 14 insertions(+), 14 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 8ca549cc975e..f4ab9b313e4f 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -776,7 +776,7 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk) */ head = &im->ino_list; - /* loop for each orphan inode entry and write them in Jornal block */ + /* loop for each orphan inode entry and write them in journal block */ list_for_each_entry(orphan, head, list) { if (!page) { page = f2fs_grab_meta_page(sbi, start_blk++); @@ -1109,7 +1109,7 @@ int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type, } else { /* * We should submit bio, since it exists several - * wribacking dentry pages in the freeing inode. + * writebacking dentry pages in the freeing inode. */ f2fs_submit_merged_write(sbi, DATA); cond_resched(); diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 9dc2e09f0a60..6fb875c8cf28 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -1161,7 +1161,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc, loff_t psize; int i, err; - /* we should bypass data pages to proceed the kworkder jobs */ + /* we should bypass data pages to proceed the kworker jobs */ if (unlikely(f2fs_cp_error(sbi))) { mapping_set_error(cc->rpages[0]->mapping, -EIO); goto out_free; diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index fc6c88e80cf4..1b764f70b70e 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2431,7 +2431,7 @@ static int f2fs_mpage_readpages(struct inode *inode, #ifdef CONFIG_F2FS_FS_COMPRESSION if (f2fs_compressed_file(inode)) { - /* there are remained comressed pages, submit them */ + /* there are remained compressed pages, submit them */ if (!f2fs_cluster_can_merge_page(&cc, page->index)) { ret = f2fs_read_multi_pages(&cc, &bio, max_nr_pages, @@ -2807,7 +2807,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted, trace_f2fs_writepage(page, DATA); - /* we should bypass data pages to proceed the kworkder jobs */ + /* we should bypass data pages to proceed the kworker jobs */ if (unlikely(f2fs_cp_error(sbi))) { mapping_set_error(page->mapping, -EIO); /* @@ -2933,7 +2933,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted, redirty_out: redirty_page_for_writepage(wbc, page); /* - * pageout() in MM traslates EAGAIN, so calls handle_write_error() + * pageout() in MM translates EAGAIN, so calls handle_write_error() * -> mapping_set_error() -> set_bit(AS_EIO, ...). * file_write_and_wait_range() will see EIO error, which is critical * to return value of fsync() followed by atomic_write failure to user. @@ -2967,7 +2967,7 @@ static int f2fs_write_data_page(struct page *page, } /* - * This function was copied from write_cche_pages from mm/page-writeback.c. + * This function was copied from write_cache_pages from mm/page-writeback.c. * The major change is making write step of cold data page separately from * warm/hot data page. */ diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c index ad0b83a41226..01f93fae5629 100644 --- a/fs/f2fs/extent_cache.c +++ b/fs/f2fs/extent_cache.c @@ -112,7 +112,7 @@ struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, * @prev_ex: extent before ofs * @next_ex: extent after ofs * @insert_p: insert point for new extent at ofs - * in order to simpfy the insertion after. + * in order to simplify the insertion after. * tree must stay unchanged between lookup and insertion. */ struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root_cached *root, @@ -572,7 +572,7 @@ static void f2fs_update_extent_tree_range(struct inode *inode, if (!en) en = next_en; - /* 2. invlidate all extent nodes in range [fofs, fofs + len - 1] */ + /* 2. invalidate all extent nodes in range [fofs, fofs + len - 1] */ while (en && en->ei.fofs < end) { unsigned int org_end; int parts = 0; /* # of parts current extent split into */ diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index e88d4c0f71e3..367b19f059f3 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -302,7 +302,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, * for OPU case, during fsync(), node can be persisted before * data when lower device doesn't support write barrier, result * in data corruption after SPO. - * So for strict fsync mode, force to use atomic write sematics + * So for strict fsync mode, force to use atomic write semantics * to keep write order in between data/node and last node to * avoid potential data corruption. */ @@ -1746,7 +1746,7 @@ static long f2fs_fallocate(struct file *file, int mode, return -EOPNOTSUPP; /* - * Pinned file should not support partial trucation since the block + * Pinned file should not support partial truncation since the block * can be used by applications. */ if ((f2fs_compressed_file(inode) || f2fs_is_pinned_file(inode)) && @@ -1796,7 +1796,7 @@ static long f2fs_fallocate(struct file *file, int mode, static int f2fs_release_file(struct inode *inode, struct file *filp) { /* - * f2fs_relase_file is called at every close calls. So we should + * f2fs_release_file is called at every close calls. So we should * not drop any inmemory pages by close called by other process. */ if (!(filp->f_mode & FMODE_WRITE) || diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index 99e4ec48d2a4..56d23bc25435 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -937,7 +937,7 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, /* * If new_inode is null, the below renaming flow will - * add a link in old_dir which can conver inline_dir. + * add a link in old_dir which can convert inline_dir. * After then, if we failed to get the entry due to other * reasons like ENOMEM, we had to remove the new entry. * Instead of adding such the error handling routine, let's diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index ad30908ac99f..134a78179e6f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -3688,7 +3688,7 @@ void f2fs_wait_on_page_writeback(struct page *page, /* submit cached LFS IO */ f2fs_submit_merged_write_cond(sbi, NULL, page, 0, type); - /* sbumit cached IPU IO */ + /* submit cached IPU IO */ f2fs_submit_merged_ipu_write(sbi, NULL, page); if (ordered) { wait_on_page_writeback(page); From 2b16554fb26db0185ef9cfa4110c8c47b09c8626 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 3 Apr 2024 22:24:19 +0800 Subject: [PATCH 0154/1565] f2fs: fix to relocate check condition in f2fs_fallocate() [ Upstream commit 278a6253a673611dbc8ab72a3b34b151a8e75822 ] compress and pinfile flag should be checked after inode lock held to avoid race condition, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 367b19f059f3..9cf04b9d3ad8 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1745,15 +1745,6 @@ static long f2fs_fallocate(struct file *file, int mode, (mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE))) return -EOPNOTSUPP; - /* - * Pinned file should not support partial truncation since the block - * can be used by applications. - */ - if ((f2fs_compressed_file(inode) || f2fs_is_pinned_file(inode)) && - (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | - FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE))) - return -EOPNOTSUPP; - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE)) @@ -1761,6 +1752,17 @@ static long f2fs_fallocate(struct file *file, int mode, inode_lock(inode); + /* + * Pinned file should not support partial truncation since the block + * can be used by applications. + */ + if ((f2fs_compressed_file(inode) || f2fs_is_pinned_file(inode)) && + (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_COLLAPSE_RANGE | + FALLOC_FL_ZERO_RANGE | FALLOC_FL_INSERT_RANGE))) { + ret = -EOPNOTSUPP; + goto out; + } + ret = file_modified(file); if (ret) goto out; From cdbe0477a0b5db6dbeadb518504bb3226590a740 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 3 Apr 2024 22:24:20 +0800 Subject: [PATCH 0155/1565] f2fs: fix to check pinfile flag in f2fs_move_file_range() [ Upstream commit e07230da0500e0919a765037c5e81583b519be2c ] ioctl(F2FS_IOC_MOVE_RANGE) can truncate or punch hole on pinned file, fix to disallow it. Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 9cf04b9d3ad8..f66959ae0163 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2835,7 +2835,8 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, goto out; } - if (f2fs_compressed_file(src) || f2fs_compressed_file(dst)) { + if (f2fs_compressed_file(src) || f2fs_compressed_file(dst) || + f2fs_is_pinned_file(src) || f2fs_is_pinned_file(dst)) { ret = -EOPNOTSUPP; goto out_unlock; } From be76107dc4c10f2af7371cd6d9428f582d2787f7 Mon Sep 17 00:00:00 2001 From: Thomas Haemmerle Date: Mon, 15 Apr 2024 12:50:27 +0200 Subject: [PATCH 0156/1565] iio: pressure: dps310: support negative temperature values [ Upstream commit 9dd6b32e76ff714308964cd9ec91466a343dcb8b ] The current implementation interprets negative values returned from `dps310_calculate_temp` as error codes. This has a side effect that when negative temperature values are calculated, they are interpreted as error. Fix this by using the return value only for error handling and passing a pointer for the value. Fixes: ba6ec48e76bc ("iio: Add driver for Infineon DPS310") Signed-off-by: Thomas Haemmerle Link: https://lore.kernel.org/r/20240415105030.1161770-2-thomas.haemmerle@leica-geosystems.com Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin --- drivers/iio/pressure/dps310.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/iio/pressure/dps310.c b/drivers/iio/pressure/dps310.c index 1b6b9530f166..7fdc7a0147f0 100644 --- a/drivers/iio/pressure/dps310.c +++ b/drivers/iio/pressure/dps310.c @@ -730,7 +730,7 @@ static int dps310_read_pressure(struct dps310_data *data, int *val, int *val2, } } -static int dps310_calculate_temp(struct dps310_data *data) +static int dps310_calculate_temp(struct dps310_data *data, int *val) { s64 c0; s64 t; @@ -746,7 +746,9 @@ static int dps310_calculate_temp(struct dps310_data *data) t = c0 + ((s64)data->temp_raw * (s64)data->c1); /* Convert to milliCelsius and scale the temperature */ - return (int)div_s64(t * 1000LL, kt); + *val = (int)div_s64(t * 1000LL, kt); + + return 0; } static int dps310_read_temp(struct dps310_data *data, int *val, int *val2, @@ -768,11 +770,10 @@ static int dps310_read_temp(struct dps310_data *data, int *val, int *val2, if (rc) return rc; - rc = dps310_calculate_temp(data); - if (rc < 0) + rc = dps310_calculate_temp(data, val); + if (rc) return rc; - *val = rc; return IIO_VAL_INT; case IIO_CHAN_INFO_OVERSAMPLING_RATIO: From 840c9c7d6aeca58a95f5231c25ff2b164ecd39a8 Mon Sep 17 00:00:00 2001 From: Tom Rix Date: Tue, 8 Jun 2021 14:23:47 -0700 Subject: [PATCH 0157/1565] fpga: region: change FPGA indirect article to an [ Upstream commit 011c49e3703854e52c0fb88f22cf38aca1d4d514 ] Change use of 'a fpga' to 'an fpga' Signed-off-by: Tom Rix Link: https://lore.kernel.org/r/20210608212350.3029742-10-trix@redhat.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: b7c0e1ecee40 ("fpga: region: add owner module and take its refcount") Signed-off-by: Sasha Levin --- drivers/fpga/fpga-region.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c index c3134b89c3fe..c5c55d2f20b9 100644 --- a/drivers/fpga/fpga-region.c +++ b/drivers/fpga/fpga-region.c @@ -33,14 +33,14 @@ struct fpga_region *fpga_region_class_find( EXPORT_SYMBOL_GPL(fpga_region_class_find); /** - * fpga_region_get - get an exclusive reference to a fpga region + * fpga_region_get - get an exclusive reference to an fpga region * @region: FPGA Region struct * * Caller should call fpga_region_put() when done with region. * * Return fpga_region struct if successful. * Return -EBUSY if someone already has a reference to the region. - * Return -ENODEV if @np is not a FPGA Region. + * Return -ENODEV if @np is not an FPGA Region. */ static struct fpga_region *fpga_region_get(struct fpga_region *region) { @@ -234,7 +234,7 @@ struct fpga_region EXPORT_SYMBOL_GPL(fpga_region_create); /** - * fpga_region_free - free a FPGA region created by fpga_region_create() + * fpga_region_free - free an FPGA region created by fpga_region_create() * @region: FPGA region */ void fpga_region_free(struct fpga_region *region) @@ -257,7 +257,7 @@ static void devm_fpga_region_release(struct device *dev, void *res) * @mgr: manager that programs this region * @get_bridges: optional function to get bridges to a list * - * This function is intended for use in a FPGA region driver's probe function. + * This function is intended for use in an FPGA region driver's probe function. * After the region driver creates the region struct with * devm_fpga_region_create(), it should register it with fpga_region_register(). * The region driver's remove function should call fpga_region_unregister(). @@ -291,7 +291,7 @@ struct fpga_region EXPORT_SYMBOL_GPL(devm_fpga_region_create); /** - * fpga_region_register - register a FPGA region + * fpga_region_register - register an FPGA region * @region: FPGA region * * Return: 0 or -errno @@ -303,10 +303,10 @@ int fpga_region_register(struct fpga_region *region) EXPORT_SYMBOL_GPL(fpga_region_register); /** - * fpga_region_unregister - unregister a FPGA region + * fpga_region_unregister - unregister an FPGA region * @region: FPGA region * - * This function is intended for use in a FPGA region driver's remove function. + * This function is intended for use in an FPGA region driver's remove function. */ void fpga_region_unregister(struct fpga_region *region) { From 9fdd3d1cd01a817456979d8bf7b5f45c436cefd4 Mon Sep 17 00:00:00 2001 From: Russ Weight Date: Mon, 14 Jun 2021 10:09:06 -0700 Subject: [PATCH 0158/1565] fpga: region: Rename dev to parent for parent device [ Upstream commit 5e77886d0aa9a882424f6a4ccb3eca4dca43b4a0 ] Rename variable "dev" to "parent" in cases where it represents the parent device. Signed-off-by: Russ Weight Reviewed-by: Xu Yilun Signed-off-by: Moritz Fischer Link: https://lore.kernel.org/r/20210614170909.232415-6-mdf@kernel.org Signed-off-by: Greg Kroah-Hartman Stable-dep-of: b7c0e1ecee40 ("fpga: region: add owner module and take its refcount") Signed-off-by: Sasha Levin --- drivers/fpga/fpga-region.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c index c5c55d2f20b9..a4838715221f 100644 --- a/drivers/fpga/fpga-region.c +++ b/drivers/fpga/fpga-region.c @@ -181,7 +181,7 @@ ATTRIBUTE_GROUPS(fpga_region); /** * fpga_region_create - alloc and init a struct fpga_region - * @dev: device parent + * @parent: device parent * @mgr: manager that programs this region * @get_bridges: optional function to get bridges to a list * @@ -192,7 +192,7 @@ ATTRIBUTE_GROUPS(fpga_region); * Return: struct fpga_region or NULL */ struct fpga_region -*fpga_region_create(struct device *dev, +*fpga_region_create(struct device *parent, struct fpga_manager *mgr, int (*get_bridges)(struct fpga_region *)) { @@ -214,8 +214,8 @@ struct fpga_region device_initialize(®ion->dev); region->dev.class = fpga_region_class; - region->dev.parent = dev; - region->dev.of_node = dev->of_node; + region->dev.parent = parent; + region->dev.of_node = parent->of_node; region->dev.id = id; ret = dev_set_name(®ion->dev, "region%d", id); @@ -253,7 +253,7 @@ static void devm_fpga_region_release(struct device *dev, void *res) /** * devm_fpga_region_create - create and initialize a managed FPGA region struct - * @dev: device parent + * @parent: device parent * @mgr: manager that programs this region * @get_bridges: optional function to get bridges to a list * @@ -268,7 +268,7 @@ static void devm_fpga_region_release(struct device *dev, void *res) * Return: struct fpga_region or NULL */ struct fpga_region -*devm_fpga_region_create(struct device *dev, +*devm_fpga_region_create(struct device *parent, struct fpga_manager *mgr, int (*get_bridges)(struct fpga_region *)) { @@ -278,12 +278,12 @@ struct fpga_region if (!ptr) return NULL; - region = fpga_region_create(dev, mgr, get_bridges); + region = fpga_region_create(parent, mgr, get_bridges); if (!region) { devres_free(ptr); } else { *ptr = region; - devres_add(dev, ptr); + devres_add(parent, ptr); } return region; From b089bb733c47ffd0f4154680f729edb372a863b6 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 10 May 2021 12:26:25 +0200 Subject: [PATCH 0159/1565] docs: driver-api: fpga: avoid using UTF-8 chars MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 758f74674bcb82e1ed1a0b5a56980f295183b546 ] While UTF-8 characters can be used at the Linux documentation, the best is to use them only when ASCII doesn't offer a good replacement. So, replace the occurences of the following UTF-8 characters: - U+2014 ('—'): EM DASH Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Moritz Fischer Stable-dep-of: b7c0e1ecee40 ("fpga: region: add owner module and take its refcount") Signed-off-by: Sasha Levin --- Documentation/driver-api/fpga/fpga-bridge.rst | 10 +++++----- Documentation/driver-api/fpga/fpga-mgr.rst | 12 +++++------ .../driver-api/fpga/fpga-programming.rst | 8 ++++---- Documentation/driver-api/fpga/fpga-region.rst | 20 +++++++++---------- 4 files changed, 25 insertions(+), 25 deletions(-) diff --git a/Documentation/driver-api/fpga/fpga-bridge.rst b/Documentation/driver-api/fpga/fpga-bridge.rst index 198aadafd3e7..8d650b4e2ce6 100644 --- a/Documentation/driver-api/fpga/fpga-bridge.rst +++ b/Documentation/driver-api/fpga/fpga-bridge.rst @@ -4,11 +4,11 @@ FPGA Bridge API to implement a new FPGA bridge ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -* struct fpga_bridge — The FPGA Bridge structure -* struct fpga_bridge_ops — Low level Bridge driver ops -* devm_fpga_bridge_create() — Allocate and init a bridge struct -* fpga_bridge_register() — Register a bridge -* fpga_bridge_unregister() — Unregister a bridge +* struct fpga_bridge - The FPGA Bridge structure +* struct fpga_bridge_ops - Low level Bridge driver ops +* devm_fpga_bridge_create() - Allocate and init a bridge struct +* fpga_bridge_register() - Register a bridge +* fpga_bridge_unregister() - Unregister a bridge .. kernel-doc:: include/linux/fpga/fpga-bridge.h :functions: fpga_bridge diff --git a/Documentation/driver-api/fpga/fpga-mgr.rst b/Documentation/driver-api/fpga/fpga-mgr.rst index 917ee22db429..4d926b452cb3 100644 --- a/Documentation/driver-api/fpga/fpga-mgr.rst +++ b/Documentation/driver-api/fpga/fpga-mgr.rst @@ -101,12 +101,12 @@ in state. API for implementing a new FPGA Manager driver ---------------------------------------------- -* ``fpga_mgr_states`` — Values for :c:expr:`fpga_manager->state`. -* struct fpga_manager — the FPGA manager struct -* struct fpga_manager_ops — Low level FPGA manager driver ops -* devm_fpga_mgr_create() — Allocate and init a manager struct -* fpga_mgr_register() — Register an FPGA manager -* fpga_mgr_unregister() — Unregister an FPGA manager +* ``fpga_mgr_states`` - Values for :c:expr:`fpga_manager->state`. +* struct fpga_manager - the FPGA manager struct +* struct fpga_manager_ops - Low level FPGA manager driver ops +* devm_fpga_mgr_create() - Allocate and init a manager struct +* fpga_mgr_register() - Register an FPGA manager +* fpga_mgr_unregister() - Unregister an FPGA manager .. kernel-doc:: include/linux/fpga/fpga-mgr.h :functions: fpga_mgr_states diff --git a/Documentation/driver-api/fpga/fpga-programming.rst b/Documentation/driver-api/fpga/fpga-programming.rst index 002392dab04f..fb4da4240e96 100644 --- a/Documentation/driver-api/fpga/fpga-programming.rst +++ b/Documentation/driver-api/fpga/fpga-programming.rst @@ -84,10 +84,10 @@ will generate that list. Here's some sample code of what to do next:: API for programming an FPGA --------------------------- -* fpga_region_program_fpga() — Program an FPGA -* fpga_image_info() — Specifies what FPGA image to program -* fpga_image_info_alloc() — Allocate an FPGA image info struct -* fpga_image_info_free() — Free an FPGA image info struct +* fpga_region_program_fpga() - Program an FPGA +* fpga_image_info() - Specifies what FPGA image to program +* fpga_image_info_alloc() - Allocate an FPGA image info struct +* fpga_image_info_free() - Free an FPGA image info struct .. kernel-doc:: drivers/fpga/fpga-region.c :functions: fpga_region_program_fpga diff --git a/Documentation/driver-api/fpga/fpga-region.rst b/Documentation/driver-api/fpga/fpga-region.rst index 363a8171ab0a..2636a27c11b2 100644 --- a/Documentation/driver-api/fpga/fpga-region.rst +++ b/Documentation/driver-api/fpga/fpga-region.rst @@ -45,19 +45,19 @@ An example of usage can be seen in the probe function of [#f2]_. API to add a new FPGA region ---------------------------- -* struct fpga_region — The FPGA region struct -* devm_fpga_region_create() — Allocate and init a region struct -* fpga_region_register() — Register an FPGA region -* fpga_region_unregister() — Unregister an FPGA region +* struct fpga_region - The FPGA region struct +* devm_fpga_region_create() - Allocate and init a region struct +* fpga_region_register() - Register an FPGA region +* fpga_region_unregister() - Unregister an FPGA region The FPGA region's probe function will need to get a reference to the FPGA Manager it will be using to do the programming. This usually would happen during the region's probe function. -* fpga_mgr_get() — Get a reference to an FPGA manager, raise ref count -* of_fpga_mgr_get() — Get a reference to an FPGA manager, raise ref count, +* fpga_mgr_get() - Get a reference to an FPGA manager, raise ref count +* of_fpga_mgr_get() - Get a reference to an FPGA manager, raise ref count, given a device node. -* fpga_mgr_put() — Put an FPGA manager +* fpga_mgr_put() - Put an FPGA manager The FPGA region will need to specify which bridges to control while programming the FPGA. The region driver can build a list of bridges during probe time @@ -66,11 +66,11 @@ the list of bridges to program just before programming (:c:expr:`fpga_region->get_bridges`). The FPGA bridge framework supplies the following APIs to handle building or tearing down that list. -* fpga_bridge_get_to_list() — Get a ref of an FPGA bridge, add it to a +* fpga_bridge_get_to_list() - Get a ref of an FPGA bridge, add it to a list -* of_fpga_bridge_get_to_list() — Get a ref of an FPGA bridge, add it to a +* of_fpga_bridge_get_to_list() - Get a ref of an FPGA bridge, add it to a list, given a device node -* fpga_bridges_put() — Given a list of bridges, put them +* fpga_bridges_put() - Given a list of bridges, put them .. kernel-doc:: include/linux/fpga/fpga-region.h :functions: fpga_region From af02dec83a48cd8b32266bb4bea7a557be974d35 Mon Sep 17 00:00:00 2001 From: Russ Weight Date: Thu, 18 Nov 2021 17:55:53 -0800 Subject: [PATCH 0160/1565] fpga: region: Use standard dev_release for class driver [ Upstream commit 8886a579744fbfa53e69aa453ed10ae3b1f9abac ] The FPGA region class driver data structure is being treated as a managed resource instead of using the standard dev_release call-back function to release the class data structure. This change removes the managed resource code and combines the create() and register() functions into a single register() or register_full() function. The register_full() function accepts an info data structure to provide flexibility in passing optional parameters. The register() function supports the current parameter list for users that don't require the use of optional parameters. Signed-off-by: Russ Weight Reviewed-by: Xu Yilun Acked-by: Xu Yilun Signed-off-by: Moritz Fischer Stable-dep-of: b7c0e1ecee40 ("fpga: region: add owner module and take its refcount") Signed-off-by: Sasha Levin --- Documentation/driver-api/fpga/fpga-region.rst | 12 +- drivers/fpga/dfl-fme-region.c | 17 ++- drivers/fpga/dfl.c | 12 +- drivers/fpga/fpga-region.c | 119 +++++++----------- drivers/fpga/of-fpga-region.c | 10 +- include/linux/fpga/fpga-region.h | 36 ++++-- 6 files changed, 95 insertions(+), 111 deletions(-) diff --git a/Documentation/driver-api/fpga/fpga-region.rst b/Documentation/driver-api/fpga/fpga-region.rst index 2636a27c11b2..dc55d60a0b4a 100644 --- a/Documentation/driver-api/fpga/fpga-region.rst +++ b/Documentation/driver-api/fpga/fpga-region.rst @@ -46,8 +46,11 @@ API to add a new FPGA region ---------------------------- * struct fpga_region - The FPGA region struct -* devm_fpga_region_create() - Allocate and init a region struct -* fpga_region_register() - Register an FPGA region +* struct fpga_region_info - Parameter structure for fpga_region_register_full() +* fpga_region_register_full() - Create and register an FPGA region using the + fpga_region_info structure to provide the full flexibility of options +* fpga_region_register() - Create and register an FPGA region using standard + arguments * fpga_region_unregister() - Unregister an FPGA region The FPGA region's probe function will need to get a reference to the FPGA @@ -75,8 +78,11 @@ following APIs to handle building or tearing down that list. .. kernel-doc:: include/linux/fpga/fpga-region.h :functions: fpga_region +.. kernel-doc:: include/linux/fpga/fpga-region.h + :functions: fpga_region_info + .. kernel-doc:: drivers/fpga/fpga-region.c - :functions: devm_fpga_region_create + :functions: fpga_region_register_full .. kernel-doc:: drivers/fpga/fpga-region.c :functions: fpga_region_register diff --git a/drivers/fpga/dfl-fme-region.c b/drivers/fpga/dfl-fme-region.c index 1eeb42af1012..4aebde0a7f1c 100644 --- a/drivers/fpga/dfl-fme-region.c +++ b/drivers/fpga/dfl-fme-region.c @@ -30,6 +30,7 @@ static int fme_region_get_bridges(struct fpga_region *region) static int fme_region_probe(struct platform_device *pdev) { struct dfl_fme_region_pdata *pdata = dev_get_platdata(&pdev->dev); + struct fpga_region_info info = { 0 }; struct device *dev = &pdev->dev; struct fpga_region *region; struct fpga_manager *mgr; @@ -39,20 +40,18 @@ static int fme_region_probe(struct platform_device *pdev) if (IS_ERR(mgr)) return -EPROBE_DEFER; - region = devm_fpga_region_create(dev, mgr, fme_region_get_bridges); - if (!region) { - ret = -ENOMEM; + info.mgr = mgr; + info.compat_id = mgr->compat_id; + info.get_bridges = fme_region_get_bridges; + info.priv = pdata; + region = fpga_region_register_full(dev, &info); + if (IS_ERR(region)) { + ret = PTR_ERR(region); goto eprobe_mgr_put; } - region->priv = pdata; - region->compat_id = mgr->compat_id; platform_set_drvdata(pdev, region); - ret = fpga_region_register(region); - if (ret) - goto eprobe_mgr_put; - dev_dbg(dev, "DFL FME FPGA Region probed\n"); return 0; diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c index eb8a6e329af9..ae220365fc04 100644 --- a/drivers/fpga/dfl.c +++ b/drivers/fpga/dfl.c @@ -1400,19 +1400,15 @@ dfl_fpga_feature_devs_enumerate(struct dfl_fpga_enum_info *info) if (!cdev) return ERR_PTR(-ENOMEM); - cdev->region = devm_fpga_region_create(info->dev, NULL, NULL); - if (!cdev->region) { - ret = -ENOMEM; - goto free_cdev_exit; - } - cdev->parent = info->dev; mutex_init(&cdev->lock); INIT_LIST_HEAD(&cdev->port_dev_list); - ret = fpga_region_register(cdev->region); - if (ret) + cdev->region = fpga_region_register(info->dev, NULL, NULL); + if (IS_ERR(cdev->region)) { + ret = PTR_ERR(cdev->region); goto free_cdev_exit; + } /* create and init build info for enumeration */ binfo = devm_kzalloc(info->dev, sizeof(*binfo), GFP_KERNEL); diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c index a4838715221f..b0ac18de4885 100644 --- a/drivers/fpga/fpga-region.c +++ b/drivers/fpga/fpga-region.c @@ -180,39 +180,42 @@ static struct attribute *fpga_region_attrs[] = { ATTRIBUTE_GROUPS(fpga_region); /** - * fpga_region_create - alloc and init a struct fpga_region + * fpga_region_register_full - create and register an FPGA Region device * @parent: device parent - * @mgr: manager that programs this region - * @get_bridges: optional function to get bridges to a list + * @info: parameters for FPGA Region * - * The caller of this function is responsible for freeing the resulting region - * struct with fpga_region_free(). Using devm_fpga_region_create() instead is - * recommended. - * - * Return: struct fpga_region or NULL + * Return: struct fpga_region or ERR_PTR() */ -struct fpga_region -*fpga_region_create(struct device *parent, - struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)) +struct fpga_region * +fpga_region_register_full(struct device *parent, const struct fpga_region_info *info) { struct fpga_region *region; int id, ret = 0; + if (!info) { + dev_err(parent, + "Attempt to register without required info structure\n"); + return ERR_PTR(-EINVAL); + } + region = kzalloc(sizeof(*region), GFP_KERNEL); if (!region) - return NULL; + return ERR_PTR(-ENOMEM); id = ida_simple_get(&fpga_region_ida, 0, 0, GFP_KERNEL); - if (id < 0) + if (id < 0) { + ret = id; goto err_free; + } + + region->mgr = info->mgr; + region->compat_id = info->compat_id; + region->priv = info->priv; + region->get_bridges = info->get_bridges; - region->mgr = mgr; - region->get_bridges = get_bridges; mutex_init(®ion->mutex); INIT_LIST_HEAD(®ion->bridge_list); - device_initialize(®ion->dev); region->dev.class = fpga_region_class; region->dev.parent = parent; region->dev.of_node = parent->of_node; @@ -222,6 +225,12 @@ struct fpga_region if (ret) goto err_remove; + ret = device_register(®ion->dev); + if (ret) { + put_device(®ion->dev); + return ERR_PTR(ret); + } + return region; err_remove: @@ -229,76 +238,32 @@ struct fpga_region err_free: kfree(region); - return NULL; + return ERR_PTR(ret); } -EXPORT_SYMBOL_GPL(fpga_region_create); +EXPORT_SYMBOL_GPL(fpga_region_register_full); /** - * fpga_region_free - free an FPGA region created by fpga_region_create() - * @region: FPGA region - */ -void fpga_region_free(struct fpga_region *region) -{ - ida_simple_remove(&fpga_region_ida, region->dev.id); - kfree(region); -} -EXPORT_SYMBOL_GPL(fpga_region_free); - -static void devm_fpga_region_release(struct device *dev, void *res) -{ - struct fpga_region *region = *(struct fpga_region **)res; - - fpga_region_free(region); -} - -/** - * devm_fpga_region_create - create and initialize a managed FPGA region struct + * fpga_region_register - create and register an FPGA Region device * @parent: device parent * @mgr: manager that programs this region * @get_bridges: optional function to get bridges to a list * - * This function is intended for use in an FPGA region driver's probe function. - * After the region driver creates the region struct with - * devm_fpga_region_create(), it should register it with fpga_region_register(). - * The region driver's remove function should call fpga_region_unregister(). - * The region struct allocated with this function will be freed automatically on - * driver detach. This includes the case of a probe function returning error - * before calling fpga_region_register(), the struct will still get cleaned up. + * This simple version of the register function should be sufficient for most users. + * The fpga_region_register_full() function is available for users that need to + * pass additional, optional parameters. * - * Return: struct fpga_region or NULL + * Return: struct fpga_region or ERR_PTR() */ -struct fpga_region -*devm_fpga_region_create(struct device *parent, - struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)) +struct fpga_region * +fpga_region_register(struct device *parent, struct fpga_manager *mgr, + int (*get_bridges)(struct fpga_region *)) { - struct fpga_region **ptr, *region; + struct fpga_region_info info = { 0 }; - ptr = devres_alloc(devm_fpga_region_release, sizeof(*ptr), GFP_KERNEL); - if (!ptr) - return NULL; + info.mgr = mgr; + info.get_bridges = get_bridges; - region = fpga_region_create(parent, mgr, get_bridges); - if (!region) { - devres_free(ptr); - } else { - *ptr = region; - devres_add(parent, ptr); - } - - return region; -} -EXPORT_SYMBOL_GPL(devm_fpga_region_create); - -/** - * fpga_region_register - register an FPGA region - * @region: FPGA region - * - * Return: 0 or -errno - */ -int fpga_region_register(struct fpga_region *region) -{ - return device_add(®ion->dev); + return fpga_region_register_full(parent, &info); } EXPORT_SYMBOL_GPL(fpga_region_register); @@ -316,6 +281,10 @@ EXPORT_SYMBOL_GPL(fpga_region_unregister); static void fpga_region_dev_release(struct device *dev) { + struct fpga_region *region = to_fpga_region(dev); + + ida_simple_remove(&fpga_region_ida, region->dev.id); + kfree(region); } /** diff --git a/drivers/fpga/of-fpga-region.c b/drivers/fpga/of-fpga-region.c index e405309baadc..466e083654ae 100644 --- a/drivers/fpga/of-fpga-region.c +++ b/drivers/fpga/of-fpga-region.c @@ -405,16 +405,12 @@ static int of_fpga_region_probe(struct platform_device *pdev) if (IS_ERR(mgr)) return -EPROBE_DEFER; - region = devm_fpga_region_create(dev, mgr, of_fpga_region_get_bridges); - if (!region) { - ret = -ENOMEM; + region = fpga_region_register(dev, mgr, of_fpga_region_get_bridges); + if (IS_ERR(region)) { + ret = PTR_ERR(region); goto eprobe_mgr_put; } - ret = fpga_region_register(region); - if (ret) - goto eprobe_mgr_put; - of_platform_populate(np, fpga_region_of_match, NULL, ®ion->dev); platform_set_drvdata(pdev, region); diff --git a/include/linux/fpga/fpga-region.h b/include/linux/fpga/fpga-region.h index 27cb706275db..3b87f232425c 100644 --- a/include/linux/fpga/fpga-region.h +++ b/include/linux/fpga/fpga-region.h @@ -7,6 +7,27 @@ #include #include +struct fpga_region; + +/** + * struct fpga_region_info - collection of parameters an FPGA Region + * @mgr: fpga region manager + * @compat_id: FPGA region id for compatibility check. + * @priv: fpga region private data + * @get_bridges: optional function to get bridges to a list + * + * fpga_region_info contains parameters for the register_full function. + * These are separated into an info structure because they some are optional + * others could be added to in the future. The info structure facilitates + * maintaining a stable API. + */ +struct fpga_region_info { + struct fpga_manager *mgr; + struct fpga_compat_id *compat_id; + void *priv; + int (*get_bridges)(struct fpga_region *region); +}; + /** * struct fpga_region - FPGA Region structure * @dev: FPGA Region device @@ -37,15 +58,12 @@ struct fpga_region *fpga_region_class_find( int fpga_region_program_fpga(struct fpga_region *region); -struct fpga_region -*fpga_region_create(struct device *dev, struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)); -void fpga_region_free(struct fpga_region *region); -int fpga_region_register(struct fpga_region *region); +struct fpga_region * +fpga_region_register_full(struct device *parent, const struct fpga_region_info *info); + +struct fpga_region * +fpga_region_register(struct device *parent, struct fpga_manager *mgr, + int (*get_bridges)(struct fpga_region *)); void fpga_region_unregister(struct fpga_region *region); -struct fpga_region -*devm_fpga_region_create(struct device *dev, struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)); - #endif /* _FPGA_REGION_H */ From 26e6e25d742e29885cf44274fcf6b744366c4702 Mon Sep 17 00:00:00 2001 From: Marco Pagani Date: Fri, 19 Apr 2024 10:35:59 +0200 Subject: [PATCH 0161/1565] fpga: region: add owner module and take its refcount [ Upstream commit b7c0e1ecee403a43abc89eb3e75672b01ff2ece9 ] The current implementation of the fpga region assumes that the low-level module registers a driver for the parent device and uses its owner pointer to take the module's refcount. This approach is problematic since it can lead to a null pointer dereference while attempting to get the region during programming if the parent device does not have a driver. To address this problem, add a module owner pointer to the fpga_region struct and use it to take the module's refcount. Modify the functions for registering a region to take an additional owner module parameter and rename them to avoid conflicts. Use the old function names for helper macros that automatically set the module that registers the region as the owner. This ensures compatibility with existing low-level control modules and reduces the chances of registering a region without setting the owner. Also, update the documentation to keep it consistent with the new interface for registering an fpga region. Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA") Suggested-by: Greg Kroah-Hartman Suggested-by: Xu Yilun Reviewed-by: Russ Weight Signed-off-by: Marco Pagani Acked-by: Xu Yilun Link: https://lore.kernel.org/r/20240419083601.77403-1-marpagan@redhat.com Signed-off-by: Xu Yilun Signed-off-by: Sasha Levin --- Documentation/driver-api/fpga/fpga-region.rst | 13 ++++++---- drivers/fpga/fpga-region.c | 24 +++++++++++-------- include/linux/fpga/fpga-region.h | 13 +++++++--- 3 files changed, 32 insertions(+), 18 deletions(-) diff --git a/Documentation/driver-api/fpga/fpga-region.rst b/Documentation/driver-api/fpga/fpga-region.rst index dc55d60a0b4a..2d03b5fb7657 100644 --- a/Documentation/driver-api/fpga/fpga-region.rst +++ b/Documentation/driver-api/fpga/fpga-region.rst @@ -46,13 +46,16 @@ API to add a new FPGA region ---------------------------- * struct fpga_region - The FPGA region struct -* struct fpga_region_info - Parameter structure for fpga_region_register_full() -* fpga_region_register_full() - Create and register an FPGA region using the +* struct fpga_region_info - Parameter structure for __fpga_region_register_full() +* __fpga_region_register_full() - Create and register an FPGA region using the fpga_region_info structure to provide the full flexibility of options -* fpga_region_register() - Create and register an FPGA region using standard +* __fpga_region_register() - Create and register an FPGA region using standard arguments * fpga_region_unregister() - Unregister an FPGA region +Helper macros ``fpga_region_register()`` and ``fpga_region_register_full()`` +automatically set the module that registers the FPGA region as the owner. + The FPGA region's probe function will need to get a reference to the FPGA Manager it will be using to do the programming. This usually would happen during the region's probe function. @@ -82,10 +85,10 @@ following APIs to handle building or tearing down that list. :functions: fpga_region_info .. kernel-doc:: drivers/fpga/fpga-region.c - :functions: fpga_region_register_full + :functions: __fpga_region_register_full .. kernel-doc:: drivers/fpga/fpga-region.c - :functions: fpga_region_register + :functions: __fpga_region_register .. kernel-doc:: drivers/fpga/fpga-region.c :functions: fpga_region_unregister diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c index b0ac18de4885..d73daea579ac 100644 --- a/drivers/fpga/fpga-region.c +++ b/drivers/fpga/fpga-region.c @@ -52,7 +52,7 @@ static struct fpga_region *fpga_region_get(struct fpga_region *region) } get_device(dev); - if (!try_module_get(dev->parent->driver->owner)) { + if (!try_module_get(region->ops_owner)) { put_device(dev); mutex_unlock(®ion->mutex); return ERR_PTR(-ENODEV); @@ -74,7 +74,7 @@ static void fpga_region_put(struct fpga_region *region) dev_dbg(dev, "put\n"); - module_put(dev->parent->driver->owner); + module_put(region->ops_owner); put_device(dev); mutex_unlock(®ion->mutex); } @@ -180,14 +180,16 @@ static struct attribute *fpga_region_attrs[] = { ATTRIBUTE_GROUPS(fpga_region); /** - * fpga_region_register_full - create and register an FPGA Region device + * __fpga_region_register_full - create and register an FPGA Region device * @parent: device parent * @info: parameters for FPGA Region + * @owner: module containing the get_bridges function * * Return: struct fpga_region or ERR_PTR() */ struct fpga_region * -fpga_region_register_full(struct device *parent, const struct fpga_region_info *info) +__fpga_region_register_full(struct device *parent, const struct fpga_region_info *info, + struct module *owner) { struct fpga_region *region; int id, ret = 0; @@ -212,6 +214,7 @@ fpga_region_register_full(struct device *parent, const struct fpga_region_info * region->compat_id = info->compat_id; region->priv = info->priv; region->get_bridges = info->get_bridges; + region->ops_owner = owner; mutex_init(®ion->mutex); INIT_LIST_HEAD(®ion->bridge_list); @@ -240,13 +243,14 @@ fpga_region_register_full(struct device *parent, const struct fpga_region_info * return ERR_PTR(ret); } -EXPORT_SYMBOL_GPL(fpga_region_register_full); +EXPORT_SYMBOL_GPL(__fpga_region_register_full); /** - * fpga_region_register - create and register an FPGA Region device + * __fpga_region_register - create and register an FPGA Region device * @parent: device parent * @mgr: manager that programs this region * @get_bridges: optional function to get bridges to a list + * @owner: module containing the get_bridges function * * This simple version of the register function should be sufficient for most users. * The fpga_region_register_full() function is available for users that need to @@ -255,17 +259,17 @@ EXPORT_SYMBOL_GPL(fpga_region_register_full); * Return: struct fpga_region or ERR_PTR() */ struct fpga_region * -fpga_region_register(struct device *parent, struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)) +__fpga_region_register(struct device *parent, struct fpga_manager *mgr, + int (*get_bridges)(struct fpga_region *), struct module *owner) { struct fpga_region_info info = { 0 }; info.mgr = mgr; info.get_bridges = get_bridges; - return fpga_region_register_full(parent, &info); + return __fpga_region_register_full(parent, &info, owner); } -EXPORT_SYMBOL_GPL(fpga_region_register); +EXPORT_SYMBOL_GPL(__fpga_region_register); /** * fpga_region_unregister - unregister an FPGA region diff --git a/include/linux/fpga/fpga-region.h b/include/linux/fpga/fpga-region.h index 3b87f232425c..1c446c2ce2f9 100644 --- a/include/linux/fpga/fpga-region.h +++ b/include/linux/fpga/fpga-region.h @@ -36,6 +36,7 @@ struct fpga_region_info { * @mgr: FPGA manager * @info: FPGA image info * @compat_id: FPGA region id for compatibility check. + * @ops_owner: module containing the get_bridges function * @priv: private data * @get_bridges: optional function to get bridges to a list */ @@ -46,6 +47,7 @@ struct fpga_region { struct fpga_manager *mgr; struct fpga_image_info *info; struct fpga_compat_id *compat_id; + struct module *ops_owner; void *priv; int (*get_bridges)(struct fpga_region *region); }; @@ -58,12 +60,17 @@ struct fpga_region *fpga_region_class_find( int fpga_region_program_fpga(struct fpga_region *region); +#define fpga_region_register_full(parent, info) \ + __fpga_region_register_full(parent, info, THIS_MODULE) struct fpga_region * -fpga_region_register_full(struct device *parent, const struct fpga_region_info *info); +__fpga_region_register_full(struct device *parent, const struct fpga_region_info *info, + struct module *owner); +#define fpga_region_register(parent, mgr, get_bridges) \ + __fpga_region_register(parent, mgr, get_bridges, THIS_MODULE) struct fpga_region * -fpga_region_register(struct device *parent, struct fpga_manager *mgr, - int (*get_bridges)(struct fpga_region *)); +__fpga_region_register(struct device *parent, struct fpga_manager *mgr, + int (*get_bridges)(struct fpga_region *), struct module *owner); void fpga_region_unregister(struct fpga_region *region); #endif /* _FPGA_REGION_H */ From 34ff72bb5d65370dc949328c9f878dd9b30ddecf Mon Sep 17 00:00:00 2001 From: Michal Simek Date: Thu, 11 Apr 2024 10:21:44 +0200 Subject: [PATCH 0162/1565] microblaze: Remove gcc flag for non existing early_printk.c file [ Upstream commit edc66cf0c4164aa3daf6cc55e970bb94383a6a57 ] early_printk support for removed long time ago but compilation flag for ftrace still points to already removed file that's why remove that line too. Fixes: 96f0e6fcc9ad ("microblaze: remove redundant early_printk support") Signed-off-by: Michal Simek Link: https://lore.kernel.org/r/5493467419cd2510a32854e2807bcd263de981a0.1712823702.git.michal.simek@amd.com Signed-off-by: Sasha Levin --- arch/microblaze/kernel/Makefile | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/microblaze/kernel/Makefile b/arch/microblaze/kernel/Makefile index dd71637437f4..8b9d52b194cb 100644 --- a/arch/microblaze/kernel/Makefile +++ b/arch/microblaze/kernel/Makefile @@ -7,7 +7,6 @@ ifdef CONFIG_FUNCTION_TRACER # Do not trace early boot code and low level code CFLAGS_REMOVE_timer.o = -pg CFLAGS_REMOVE_intc.o = -pg -CFLAGS_REMOVE_early_printk.o = -pg CFLAGS_REMOVE_ftrace.o = -pg CFLAGS_REMOVE_process.o = -pg endif From 23209f947d4194dd7bc6e1000ead092d5e85eb7f Mon Sep 17 00:00:00 2001 From: Michal Simek Date: Thu, 11 Apr 2024 10:27:21 +0200 Subject: [PATCH 0163/1565] microblaze: Remove early printk call from cpuinfo-static.c [ Upstream commit 58d647506c92ccd3cfa0c453c68ddd14f40bf06f ] Early printk has been removed already that's why also remove calling it. Similar change has been done in cpuinfo-pvr-full.c by commit cfbd8d1979af ("microblaze: Remove early printk setup"). Fixes: 96f0e6fcc9ad ("microblaze: remove redundant early_printk support") Signed-off-by: Michal Simek Link: https://lore.kernel.org/r/2f10db506be8188fa07b6ec331caca01af1b10f8.1712824039.git.michal.simek@amd.com Signed-off-by: Sasha Levin --- arch/microblaze/kernel/cpu/cpuinfo-static.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/microblaze/kernel/cpu/cpuinfo-static.c b/arch/microblaze/kernel/cpu/cpuinfo-static.c index 85dbda4a08a8..03da36dc6d9c 100644 --- a/arch/microblaze/kernel/cpu/cpuinfo-static.c +++ b/arch/microblaze/kernel/cpu/cpuinfo-static.c @@ -18,7 +18,7 @@ static const char family_string[] = CONFIG_XILINX_MICROBLAZE0_FAMILY; static const char cpu_ver_string[] = CONFIG_XILINX_MICROBLAZE0_HW_VER; #define err_printk(x) \ - early_printk("ERROR: Microblaze " x "-different for kernel and DTS\n"); + pr_err("ERROR: Microblaze " x "-different for kernel and DTS\n"); void __init set_cpuinfo_static(struct cpuinfo *ci, struct device_node *cpu) { From 7716b201d2e246b2dc60ae17bcf2b0b074d4c452 Mon Sep 17 00:00:00 2001 From: Chris Wulff Date: Thu, 25 Apr 2024 15:20:20 +0000 Subject: [PATCH 0164/1565] usb: gadget: u_audio: Clear uac pointer when freed. [ Upstream commit a2cf936ebef291ef7395172b9e2f624779fb6dc0 ] This prevents use of a stale pointer if functions are called after g_cleanup that shouldn't be. This doesn't fix any races, but converts a possibly silent kernel memory corruption into an obvious NULL pointer dereference report. Fixes: eb9fecb9e69b ("usb: gadget: f_uac2: split out audio core") Signed-off-by: Chris Wulff Link: https://lore.kernel.org/stable/CO1PR17MB54194226DA08BFC9EBD8C163E1172%40CO1PR17MB5419.namprd17.prod.outlook.com Link: https://lore.kernel.org/r/CO1PR17MB54194226DA08BFC9EBD8C163E1172@CO1PR17MB5419.namprd17.prod.outlook.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/usb/gadget/function/u_audio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c index 6c8b8f5b7e0f..a387ff2c8b73 100644 --- a/drivers/usb/gadget/function/u_audio.c +++ b/drivers/usb/gadget/function/u_audio.c @@ -611,6 +611,8 @@ void g_audio_cleanup(struct g_audio *g_audio) return; uac = g_audio->uac; + g_audio->uac = NULL; + card = uac->card; if (card) snd_card_free_when_closed(card); From 713fc00c571dde4af3db2dbd5d1b0eadc327817b Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 29 Apr 2024 16:01:05 +0300 Subject: [PATCH 0165/1565] stm class: Fix a double free in stm_register_device() [ Upstream commit 3df463865ba42b8f88a590326f4c9ea17a1ce459 ] The put_device(&stm->dev) call will trigger stm_device_release() which frees "stm" so the vfree(stm) on the next line is a double free. Fixes: 389b6699a2aa ("stm class: Fix stm device initialization order") Signed-off-by: Dan Carpenter Reviewed-by: Amelie Delaunay Reviewed-by: Andy Shevchenko Signed-off-by: Alexander Shishkin Link: https://lore.kernel.org/r/20240429130119.1518073-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/hwtracing/stm/core.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c index 2712e699ba08..ae9ea3a1fa2a 100644 --- a/drivers/hwtracing/stm/core.c +++ b/drivers/hwtracing/stm/core.c @@ -868,8 +868,11 @@ int stm_register_device(struct device *parent, struct stm_data *stm_data, return -ENOMEM; stm->major = register_chrdev(0, stm_data->name, &stm_fops); - if (stm->major < 0) - goto err_free; + if (stm->major < 0) { + err = stm->major; + vfree(stm); + return err; + } device_initialize(&stm->dev); stm->dev.devt = MKDEV(stm->major, 0); @@ -913,10 +916,8 @@ int stm_register_device(struct device *parent, struct stm_data *stm_data, err_device: unregister_chrdev(stm->major, stm_data->name); - /* matches device_initialize() above */ + /* calls stm_device_release() */ put_device(&stm->dev); -err_free: - vfree(stm); return err; } From 9349e1f2c95f19aeb66f43305c04fed1dde34187 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Tue, 19 Dec 2023 06:01:47 +0100 Subject: [PATCH 0166/1565] ppdev: Remove usage of the deprecated ida_simple_xx() API [ Upstream commit d8407f71ebeaeb6f50bd89791837873e44609708 ] ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/ba9da12fdd5cdb2c28180b7160af5042447d803f.1702962092.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman Stable-dep-of: fbf740aeb86a ("ppdev: Add an error check in register_device") Signed-off-by: Sasha Levin --- drivers/char/ppdev.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c index 38b46c7d1737..f6024d97fe70 100644 --- a/drivers/char/ppdev.c +++ b/drivers/char/ppdev.c @@ -299,7 +299,7 @@ static int register_device(int minor, struct pp_struct *pp) goto err; } - index = ida_simple_get(&ida_index, 0, 0, GFP_KERNEL); + index = ida_alloc(&ida_index, GFP_KERNEL); memset(&ppdev_cb, 0, sizeof(ppdev_cb)); ppdev_cb.irq_func = pp_irq; ppdev_cb.flags = (pp->flags & PP_EXCL) ? PARPORT_FLAG_EXCL : 0; @@ -310,7 +310,7 @@ static int register_device(int minor, struct pp_struct *pp) if (!pdev) { pr_warn("%s: failed to register device!\n", name); rc = -ENXIO; - ida_simple_remove(&ida_index, index); + ida_free(&ida_index, index); goto err; } @@ -750,7 +750,7 @@ static int pp_release(struct inode *inode, struct file *file) if (pp->pdev) { parport_unregister_device(pp->pdev); - ida_simple_remove(&ida_index, pp->index); + ida_free(&ida_index, pp->index); pp->pdev = NULL; pr_debug(CHRDEV "%x: unregistered pardevice\n", minor); } From d32caf51379a4d71db03d3d4d7c22d27cdf7f68b Mon Sep 17 00:00:00 2001 From: Huai-Yuan Liu Date: Fri, 12 Apr 2024 16:38:40 +0800 Subject: [PATCH 0167/1565] ppdev: Add an error check in register_device [ Upstream commit fbf740aeb86a4fe82ad158d26d711f2f3be79b3e ] In register_device, the return value of ida_simple_get is unchecked, in witch ida_simple_get will use an invalid index value. To address this issue, index should be checked after ida_simple_get. When the index value is abnormal, a warning message should be printed, the port should be dropped, and the value should be recorded. Fixes: 9a69645dde11 ("ppdev: fix registering same device name") Signed-off-by: Huai-Yuan Liu Link: https://lore.kernel.org/r/20240412083840.234085-1-qq810974084@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/char/ppdev.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c index f6024d97fe70..a97edbf7455a 100644 --- a/drivers/char/ppdev.c +++ b/drivers/char/ppdev.c @@ -296,28 +296,35 @@ static int register_device(int minor, struct pp_struct *pp) if (!port) { pr_warn("%s: no associated port!\n", name); rc = -ENXIO; - goto err; + goto err_free_name; } index = ida_alloc(&ida_index, GFP_KERNEL); + if (index < 0) { + pr_warn("%s: failed to get index!\n", name); + rc = index; + goto err_put_port; + } + memset(&ppdev_cb, 0, sizeof(ppdev_cb)); ppdev_cb.irq_func = pp_irq; ppdev_cb.flags = (pp->flags & PP_EXCL) ? PARPORT_FLAG_EXCL : 0; ppdev_cb.private = pp; pdev = parport_register_dev_model(port, name, &ppdev_cb, index); - parport_put_port(port); if (!pdev) { pr_warn("%s: failed to register device!\n", name); rc = -ENXIO; ida_free(&ida_index, index); - goto err; + goto err_put_port; } pp->pdev = pdev; pp->index = index; dev_dbg(&pdev->dev, "registered pardevice\n"); -err: +err_put_port: + parport_put_port(port); +err_free_name: kfree(name); return rc; } From 9ca02da316bea536bf7e5b73898eb6b6ebcc6035 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 12 Feb 2024 22:00:28 -0800 Subject: [PATCH 0168/1565] extcon: max8997: select IRQ_DOMAIN instead of depending on it [ Upstream commit b1781d0a1458070d40134e4f3412ec9d70099bec ] IRQ_DOMAIN is a hidden (not user visible) symbol. Users cannot set it directly thru "make *config", so drivers should select it instead of depending on it if they need it. Relying on it being set for a dependency is risky. Consistently using "select" or "depends on" can also help reduce Kconfig circular dependency issues. Therefore, change EXTCON_MAX8997's use of "depends on" for IRQ_DOMAIN to "select". Link: https://lore.kernel.org/lkml/20240213060028.9744-1-rdunlap@infradead.org/ Fixes: dca1a71e4108 ("extcon: Add support irq domain for MAX8997 muic") Signed-off-by: Randy Dunlap Acked-by: Arnd Bergmann Signed-off-by: Chanwoo Choi Signed-off-by: Sasha Levin --- drivers/extcon/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/extcon/Kconfig b/drivers/extcon/Kconfig index aac507bff135..09485803280e 100644 --- a/drivers/extcon/Kconfig +++ b/drivers/extcon/Kconfig @@ -121,7 +121,8 @@ config EXTCON_MAX77843 config EXTCON_MAX8997 tristate "Maxim MAX8997 EXTCON Support" - depends on MFD_MAX8997 && IRQ_DOMAIN + depends on MFD_MAX8997 + select IRQ_DOMAIN help If you say yes here you get support for the MUIC device of Maxim MAX8997 PMIC. The MAX8997 MUIC is a USB port accessory From d2e2e90c7637e376054583fb076379ade13318a4 Mon Sep 17 00:00:00 2001 From: Kuppuswamy Sathyanarayanan Date: Wed, 1 May 2024 02:25:43 +0000 Subject: [PATCH 0169/1565] PCI/EDR: Align EDR_PORT_DPC_ENABLE_DSM with PCI Firmware r3.3 [ Upstream commit f24ba846133d0edec785ac6430d4daf6e9c93a09 ] The "Downstream Port Containment related Enhancements" ECN of Jan 28, 2019 (document 12888 below), defined the EDR_PORT_DPC_ENABLE_DSM function with Revision ID 5 with Arg3 being an integer. But when the ECN was integrated into PCI Firmware r3.3, sec 4.6.12, it was defined as Revision ID 6 with Arg3 being a package containing an integer. The implementation in acpi_enable_dpc() supplies a package as Arg3 (arg4 in the code), but it previously specified Revision ID 5. Align this with PCI Firmware r3.3 by using Revision ID 6. If firmware implemented per the ECN, its Revision 5 function would receive a package as Arg3 when it expects an integer, so acpi_enable_dpc() would likely fail. If such firmware exists and lacks a Revision 6 function that expects a package, we may have to add support for Revision 5. Link: https://lore.kernel.org/r/20240501022543.1626025-1-sathyanarayanan.kuppuswamy@linux.intel.com Link: https://members.pcisig.com/wg/PCI-SIG/document/12888 Fixes: ac1c8e35a326 ("PCI/DPC: Add Error Disconnect Recover (EDR) support") Signed-off-by: Kuppuswamy Sathyanarayanan [bhelgaas: split into two patches, update commit log] Signed-off-by: Bjorn Helgaas Tested-by: Satish Thatchanamurthy # one platform Signed-off-by: Sasha Levin --- drivers/pci/pcie/edr.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c index 87734e4c3c20..5b5a502363c0 100644 --- a/drivers/pci/pcie/edr.c +++ b/drivers/pci/pcie/edr.c @@ -32,10 +32,10 @@ static int acpi_enable_dpc(struct pci_dev *pdev) int status = 0; /* - * Behavior when calling unsupported _DSM functions is undefined, - * so check whether EDR_PORT_DPC_ENABLE_DSM is supported. + * Per PCI Firmware r3.3, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is + * optional. Return success if it's not implemented. */ - if (!acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5, + if (!acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 6, 1ULL << EDR_PORT_DPC_ENABLE_DSM)) return 0; @@ -46,12 +46,7 @@ static int acpi_enable_dpc(struct pci_dev *pdev) argv4.package.count = 1; argv4.package.elements = &req; - /* - * Per Downstream Port Containment Related Enhancements ECN to PCI - * Firmware Specification r3.2, sec 4.6.12, EDR_PORT_DPC_ENABLE_DSM is - * optional. Return success if it's not implemented. - */ - obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 5, + obj = acpi_evaluate_dsm(adev->handle, &pci_acpi_dsm_guid, 6, EDR_PORT_DPC_ENABLE_DSM, &argv4); if (!obj) return 0; From 4b6e5edefd469d2a49176244e006dd79599f89fc Mon Sep 17 00:00:00 2001 From: Kuppuswamy Sathyanarayanan Date: Wed, 8 May 2024 14:31:38 -0500 Subject: [PATCH 0170/1565] PCI/EDR: Align EDR_PORT_LOCATE_DSM with PCI Firmware r3.3 [ Upstream commit e2e78a294a8a863898b781dbcf90e087eda3155d ] The "Downstream Port Containment related Enhancements" ECN of Jan 28, 2019 (document 12888 below), defined the EDR_PORT_LOCATE_DSM function with Revision ID 5 with a return value encoding (Bits 2:0 = Function, Bits 7:3 = Device, Bits 15:8 = Bus). When the ECN was integrated into PCI Firmware r3.3, sec 4.6.13, Bit 31 was added to indicate success or failure. Check Bit 31 for failure in acpi_dpc_port_get(). Link: https://lore.kernel.org/r/20240501022543.1626025-1-sathyanarayanan.kuppuswamy@linux.intel.com Link: https://members.pcisig.com/wg/PCI-SIG/document/12888 Fixes: ac1c8e35a326 ("PCI/DPC: Add Error Disconnect Recover (EDR) support") Signed-off-by: Kuppuswamy Sathyanarayanan [bhelgaas: split into two patches, update commit log] Signed-off-by: Bjorn Helgaas Tested-by: Satish Thatchanamurthy # one platform Signed-off-by: Sasha Levin --- drivers/pci/pcie/edr.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c index 5b5a502363c0..35210007602c 100644 --- a/drivers/pci/pcie/edr.c +++ b/drivers/pci/pcie/edr.c @@ -80,8 +80,9 @@ static struct pci_dev *acpi_dpc_port_get(struct pci_dev *pdev) u16 port; /* - * Behavior when calling unsupported _DSM functions is undefined, - * so check whether EDR_PORT_DPC_ENABLE_DSM is supported. + * If EDR_PORT_LOCATE_DSM is not implemented under the target of + * EDR, the target is the port that experienced the containment + * event (PCI Firmware r3.3, sec 4.6.13). */ if (!acpi_check_dsm(adev->handle, &pci_acpi_dsm_guid, 5, 1ULL << EDR_PORT_LOCATE_DSM)) @@ -98,6 +99,16 @@ static struct pci_dev *acpi_dpc_port_get(struct pci_dev *pdev) return NULL; } + /* + * Bit 31 represents the success/failure of the operation. If bit + * 31 is set, the operation failed. + */ + if (obj->integer.value & BIT(31)) { + ACPI_FREE(obj); + pci_err(pdev, "Locate Port _DSM failed\n"); + return NULL; + } + /* * Firmware returns DPC port BDF details in following format: * 15:8 = bus From a6e1f7744e9b84f86a629a76024bba8468aa153b Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Mon, 6 May 2024 18:41:39 +0800 Subject: [PATCH 0171/1565] f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock [ Upstream commit 0a4ed2d97cb6d044196cc3e726b6699222b41019 ] It needs to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock to avoid racing with checkpoint, otherwise, filesystem metadata including blkaddr in dnode, inode fields and .total_valid_block_count may be corrupted after SPO case. Fixes: ef8d563f184e ("f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS") Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index f66959ae0163..c858118a3927 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -3596,9 +3596,12 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) struct dnode_of_data dn; pgoff_t end_offset, count; + f2fs_lock_op(sbi); + set_new_dnode(&dn, inode, NULL, NULL, 0); ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE); if (ret) { + f2fs_unlock_op(sbi); if (ret == -ENOENT) { page_idx = f2fs_get_next_page_offset(&dn, page_idx); @@ -3616,6 +3619,8 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg) f2fs_put_dnode(&dn); + f2fs_unlock_op(sbi); + if (ret < 0) break; @@ -3758,9 +3763,12 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) struct dnode_of_data dn; pgoff_t end_offset, count; + f2fs_lock_op(sbi); + set_new_dnode(&dn, inode, NULL, NULL, 0); ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE); if (ret) { + f2fs_unlock_op(sbi); if (ret == -ENOENT) { page_idx = f2fs_get_next_page_offset(&dn, page_idx); @@ -3778,6 +3786,8 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg) f2fs_put_dnode(&dn); + f2fs_unlock_op(sbi); + if (ret < 0) break; From c1958b978d50609b461e399a3f52ce7330aa6b01 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Tue, 7 May 2024 11:31:00 +0800 Subject: [PATCH 0172/1565] f2fs: fix to release node block count in error path of f2fs_new_node_page() [ Upstream commit 0fa4e57c1db263effd72d2149d4e21da0055c316 ] It missed to call dec_valid_node_count() to release node block count in error path, fix it. Fixes: 141170b759e0 ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/node.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 02cb1c806c3e..348ad1d6199f 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1242,6 +1242,7 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs) } if (unlikely(new_ni.blk_addr != NULL_ADDR)) { err = -EFSCORRUPTED; + dec_valid_node_count(sbi, dn->inode, !ofs); set_sbi_flag(sbi, SBI_NEED_FSCK); goto fail; } @@ -1267,7 +1268,6 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs) if (ofs == 0) inc_valid_inode_count(sbi); return page; - fail: clear_node_page_dirty(page); f2fs_put_page(page, 1); From b8962cf98595d1ec62f40f23667de830567ec8bc Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Fri, 10 May 2024 11:33:39 +0800 Subject: [PATCH 0173/1565] f2fs: compress: don't allow unaligned truncation on released compress inode [ Upstream commit 29ed2b5dd521ce7c5d8466cd70bf0cc9d07afeee ] f2fs image may be corrupted after below testcase: - mkfs.f2fs -O extra_attr,compression -f /dev/vdb - mount /dev/vdb /mnt/f2fs - touch /mnt/f2fs/file - f2fs_io setflags compression /mnt/f2fs/file - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4 - f2fs_io release_cblocks /mnt/f2fs/file - truncate -s 8192 /mnt/f2fs/file - umount /mnt/f2fs - fsck.f2fs /dev/vdb [ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x00000002, but has 0x3 blocks [FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5] [FSCK] other corrupted bugs [Fail] The reason is: partial truncation assume compressed inode has reserved blocks, after partial truncation, valid block count may change w/o .i_blocks and .total_valid_block_count update, result in corruption. This patch only allow cluster size aligned truncation on released compress inode for fixing. Fixes: c61404153eb6 ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/file.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index c858118a3927..50514962771a 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -891,9 +891,14 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) ATTR_GID | ATTR_TIMES_SET)))) return -EPERM; - if ((attr->ia_valid & ATTR_SIZE) && - !f2fs_is_compress_backend_ready(inode)) - return -EOPNOTSUPP; + if ((attr->ia_valid & ATTR_SIZE)) { + if (!f2fs_is_compress_backend_ready(inode)) + return -EOPNOTSUPP; + if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) && + !IS_ALIGNED(attr->ia_size, + F2FS_BLK_TO_BYTES(F2FS_I(inode)->i_cluster_size))) + return -EINVAL; + } err = setattr_prepare(dentry, attr); if (err) From 158adcb7fd7e6a23cbad50f27685ca2afdb6d15e Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Mon, 6 May 2024 13:40:17 +0200 Subject: [PATCH 0174/1565] serial: sh-sci: protect invalidating RXDMA on shutdown [ Upstream commit aae20f6e34cd0cbd67a1d0e5877561c40109a81b ] The to-be-fixed commit removed locking when invalidating the DMA RX descriptors on shutdown. It overlooked that there is still a rx_timer running which may still access the protected data. So, re-add the locking. Reported-by: Dirk Behme Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com Fixes: 2c4ee23530ff ("serial: sh-sci: Postpone DMA release when falling back to PIO") Signed-off-by: Wolfram Sang Link: https://lore.kernel.org/r/20240506114016.30498-7-wsa+renesas@sang-engineering.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/sh-sci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index a7c28543c6f7..71cf9a7329f9 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -1257,9 +1257,14 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s) static void sci_dma_rx_release(struct sci_port *s) { struct dma_chan *chan = s->chan_rx_saved; + struct uart_port *port = &s->port; + unsigned long flags; + uart_port_lock_irqsave(port, &flags); s->chan_rx_saved = NULL; sci_dma_rx_chan_invalidate(s); + uart_port_unlock_irqrestore(port, flags); + dmaengine_terminate_sync(chan); dma_free_coherent(chan->device->dev, s->buf_len_rx * 2, s->rx_buf[0], sg_dma_address(&s->sg_rx[0])); From ec43f32f66d9375c3f1318db97f05229f63ac5ed Mon Sep 17 00:00:00 2001 From: Ian Rogers Date: Wed, 8 May 2024 22:20:15 -0700 Subject: [PATCH 0175/1565] libsubcmd: Fix parse-options memory leak [ Upstream commit 230a7a71f92212e723fa435d4ca5922de33ec88a ] If a usage string is built in parse_options_subcommand, also free it. Fixes: 901421a5bdf605d2 ("perf tools: Remove subcmd dependencies on strbuf") Signed-off-by: Ian Rogers Cc: Adrian Hunter Cc: Alexander Shishkin Cc: Ingo Molnar Cc: Jiri Olsa Cc: Josh Poimboeuf Cc: Kan Liang Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20240509052015.1914670-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin --- tools/lib/subcmd/parse-options.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/lib/subcmd/parse-options.c b/tools/lib/subcmd/parse-options.c index 39ebf6192016..e799d35cba43 100644 --- a/tools/lib/subcmd/parse-options.c +++ b/tools/lib/subcmd/parse-options.c @@ -633,11 +633,10 @@ int parse_options_subcommand(int argc, const char **argv, const struct option *o const char *const subcommands[], const char *usagestr[], int flags) { struct parse_opt_ctx_t ctx; + char *buf = NULL; /* build usage string if it's not provided */ if (subcommands && !usagestr[0]) { - char *buf = NULL; - astrcatf(&buf, "%s %s [] {", subcmd_config.exec_name, argv[0]); for (int i = 0; subcommands[i]; i++) { @@ -679,7 +678,10 @@ int parse_options_subcommand(int argc, const char **argv, const struct option *o astrcatf(&error_buf, "unknown switch `%c'", *ctx.opt); usage_with_options(usagestr, options); } - + if (buf) { + usagestr[0] = NULL; + free(buf); + } return parse_options_end(&ctx); } From 66a02effb898e31145ef6520d1767eebd3e6ac08 Mon Sep 17 00:00:00 2001 From: Alexander Egorenkov Date: Fri, 10 May 2024 12:41:25 +0200 Subject: [PATCH 0176/1565] s390/ipl: Fix incorrect initialization of len fields in nvme reipl block [ Upstream commit 9c922b73acaf39f867668d9cbe5dc69c23511f84 ] Use correct symbolic constants IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN to initialize nvme reipl block when 'scp_data' sysfs attribute is being updated. This bug had not been detected before because the corresponding fcp and nvme symbolic constants are equal. Fixes: 23a457b8d57d ("s390: nvme reipl") Reviewed-by: Heiko Carstens Signed-off-by: Alexander Egorenkov Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- arch/s390/kernel/ipl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c index c469e8848d65..ab23742088d0 100644 --- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -832,8 +832,8 @@ static ssize_t reipl_nvme_scpdata_write(struct file *filp, struct kobject *kobj, scpdata_len += padding; } - reipl_block_nvme->hdr.len = IPL_BP_FCP_LEN + scpdata_len; - reipl_block_nvme->nvme.len = IPL_BP0_FCP_LEN + scpdata_len; + reipl_block_nvme->hdr.len = IPL_BP_NVME_LEN + scpdata_len; + reipl_block_nvme->nvme.len = IPL_BP0_NVME_LEN + scpdata_len; reipl_block_nvme->nvme.scp_data_len = scpdata_len; return count; From adc7dc29b7967eb54a0830d6cd6d13c9d7ad2166 Mon Sep 17 00:00:00 2001 From: Alexander Egorenkov Date: Fri, 10 May 2024 12:41:26 +0200 Subject: [PATCH 0177/1565] s390/ipl: Fix incorrect initialization of nvme dump block [ Upstream commit 7faacaeaf6ce12fae78751de5ad869d8f1e1cd7a ] Initialize the correct fields of the nvme dump block. This bug had not been detected before because first, the fcp and nvme fields of struct ipl_parameter_block are part of the same union and, therefore, overlap in memory and second, they are identical in structure and size. Fixes: d70e38cb1dee ("s390: nvme dump support") Reviewed-by: Heiko Carstens Signed-off-by: Alexander Egorenkov Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- arch/s390/kernel/ipl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c index ab23742088d0..939ceec83048 100644 --- a/arch/s390/kernel/ipl.c +++ b/arch/s390/kernel/ipl.c @@ -1602,9 +1602,9 @@ static int __init dump_nvme_init(void) } dump_block_nvme->hdr.len = IPL_BP_NVME_LEN; dump_block_nvme->hdr.version = IPL_PARM_BLOCK_VERSION; - dump_block_nvme->fcp.len = IPL_BP0_NVME_LEN; - dump_block_nvme->fcp.pbt = IPL_PBT_NVME; - dump_block_nvme->fcp.opt = IPL_PB0_NVME_OPT_DUMP; + dump_block_nvme->nvme.len = IPL_BP0_NVME_LEN; + dump_block_nvme->nvme.pbt = IPL_PBT_NVME; + dump_block_nvme->nvme.opt = IPL_PB0_NVME_OPT_DUMP; dump_capabilities |= DUMP_TYPE_NVME; return 0; } From abeaeaee7fa95b109fbae942a9d2ac79f4323e02 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 28 Mar 2024 13:28:56 -0700 Subject: [PATCH 0178/1565] Input: ims-pcu - fix printf string overflow [ Upstream commit bf32bceedd0453c70d9d022e2e29f98e446d7161 ] clang warns about a string overflow in this driver drivers/input/misc/ims-pcu.c:1802:2: error: 'snprintf' will always be truncated; specified size is 10, but format string expands to at least 12 [-Werror,-Wformat-truncation] drivers/input/misc/ims-pcu.c:1814:2: error: 'snprintf' will always be truncated; specified size is 10, but format string expands to at least 12 [-Werror,-Wformat-truncation] Make the buffer a little longer to ensure it always fits. Fixes: 628329d52474 ("Input: add IMS Passenger Control Unit driver") Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240326223825.4084412-7-arnd@kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/misc/ims-pcu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/input/misc/ims-pcu.c b/drivers/input/misc/ims-pcu.c index 08b9b5cdb943..e5cb20e7f57b 100644 --- a/drivers/input/misc/ims-pcu.c +++ b/drivers/input/misc/ims-pcu.c @@ -42,8 +42,8 @@ struct ims_pcu_backlight { #define IMS_PCU_PART_NUMBER_LEN 15 #define IMS_PCU_SERIAL_NUMBER_LEN 8 #define IMS_PCU_DOM_LEN 8 -#define IMS_PCU_FW_VERSION_LEN (9 + 1) -#define IMS_PCU_BL_VERSION_LEN (9 + 1) +#define IMS_PCU_FW_VERSION_LEN 16 +#define IMS_PCU_BL_VERSION_LEN 16 #define IMS_PCU_BL_RESET_REASON_LEN (2 + 1) #define IMS_PCU_PCU_B_DEVICE_ID 5 From 0096d223f78cb48db1ae8ae9fd56d702896ba8ae Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Wed, 20 Sep 2023 14:58:13 +0200 Subject: [PATCH 0179/1565] Input: ioc3kbd - convert to platform remove callback returning void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 150e792dee9ca8416f3d375e48f2f4d7f701fc6b ] The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20230920125829.1478827-37-u.kleine-koenig@pengutronix.de Signed-off-by: Dmitry Torokhov Stable-dep-of: d40e9edcf3eb ("Input: ioc3kbd - add device table") Signed-off-by: Sasha Levin --- drivers/input/serio/ioc3kbd.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/input/serio/ioc3kbd.c b/drivers/input/serio/ioc3kbd.c index d51bfe912db5..50552dc7b4f5 100644 --- a/drivers/input/serio/ioc3kbd.c +++ b/drivers/input/serio/ioc3kbd.c @@ -190,7 +190,7 @@ static int ioc3kbd_probe(struct platform_device *pdev) return 0; } -static int ioc3kbd_remove(struct platform_device *pdev) +static void ioc3kbd_remove(struct platform_device *pdev) { struct ioc3kbd_data *d = platform_get_drvdata(pdev); @@ -198,13 +198,11 @@ static int ioc3kbd_remove(struct platform_device *pdev) serio_unregister_port(d->kbd); serio_unregister_port(d->aux); - - return 0; } static struct platform_driver ioc3kbd_driver = { .probe = ioc3kbd_probe, - .remove = ioc3kbd_remove, + .remove_new = ioc3kbd_remove, .driver = { .name = "ioc3-kbd", }, From 52f8d76769e7c7ae1bc0e0e12b8241aa4caf9e0d Mon Sep 17 00:00:00 2001 From: Karel Balej Date: Fri, 15 Mar 2024 12:46:14 -0700 Subject: [PATCH 0180/1565] Input: ioc3kbd - add device table [ Upstream commit d40e9edcf3eb925c259df9f9dd7319a4fcbc675b ] Without the device table the driver will not auto-load when compiled as a module. Fixes: 273db8f03509 ("Input: add IOC3 serio driver") Signed-off-by: Karel Balej Link: https://lore.kernel.org/r/20240313115832.8052-1-balejk@matfyz.cz Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/serio/ioc3kbd.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/input/serio/ioc3kbd.c b/drivers/input/serio/ioc3kbd.c index 50552dc7b4f5..676b0bda3d72 100644 --- a/drivers/input/serio/ioc3kbd.c +++ b/drivers/input/serio/ioc3kbd.c @@ -200,9 +200,16 @@ static void ioc3kbd_remove(struct platform_device *pdev) serio_unregister_port(d->aux); } +static const struct platform_device_id ioc3kbd_id_table[] = { + { "ioc3-kbd", }, + { } +}; +MODULE_DEVICE_TABLE(platform, ioc3kbd_id_table); + static struct platform_driver ioc3kbd_driver = { .probe = ioc3kbd_probe, .remove_new = ioc3kbd_remove, + .id_table = ioc3kbd_id_table, .driver = { .name = "ioc3-kbd", }, From e7a444a35eba221465ed4f07f9064bef9f74401d Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 20 Mar 2024 17:38:31 -0500 Subject: [PATCH 0181/1565] mmc: sdhci_am654: Add tuning algorithm for delay chain [ Upstream commit 6231d99dd4119312ad41abf9383e18fec66cbe4b ] Currently the sdhci_am654 driver only supports one tuning algorithm which should be used only when DLL is enabled. The ITAPDLY is selected from the largest passing window and the buffer is viewed as a circular buffer. The new algorithm should be used when the delay chain is enabled. The ITAPDLY is selected from the largest passing window and the buffer is not viewed as a circular buffer. This implementation is based off of the following paper: [1]. Also add support for multiple failing windows. [1] https://www.ti.com/lit/an/spract9/spract9.pdf Fixes: 13ebeae68ac9 ("mmc: sdhci_am654: Add support for software tuning") Signed-off-by: Judith Mendez Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240320223837.959900-2-jm@ti.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 112 +++++++++++++++++++++++++++------ 1 file changed, 92 insertions(+), 20 deletions(-) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index eb52e0c5a020..2e302c2b0ac1 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -149,10 +149,17 @@ struct sdhci_am654_data { int strb_sel; u32 flags; u32 quirks; + bool dll_enable; #define SDHCI_AM654_QUIRK_FORCE_CDTEST BIT(0) }; +struct window { + u8 start; + u8 end; + u8 length; +}; + struct sdhci_am654_driver_data { const struct sdhci_pltfm_data *pdata; u32 flags; @@ -294,10 +301,13 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, val); - if (timing > MMC_TIMING_UHS_SDR25 && clock >= CLOCK_TOO_SLOW_HZ) + if (timing > MMC_TIMING_UHS_SDR25 && clock >= CLOCK_TOO_SLOW_HZ) { sdhci_am654_setup_dll(host, clock); - else + sdhci_am654->dll_enable = true; + } else { sdhci_am654_setup_delay_chain(sdhci_am654, timing); + sdhci_am654->dll_enable = false; + } regmap_update_bits(sdhci_am654->base, PHY_CTRL5, CLKBUFSEL_MASK, sdhci_am654->clkbuf_sel); @@ -415,39 +425,101 @@ static u32 sdhci_am654_cqhci_irq(struct sdhci_host *host, u32 intmask) return 0; } -#define ITAP_MAX 32 +#define ITAPDLY_LENGTH 32 +#define ITAPDLY_LAST_INDEX (ITAPDLY_LENGTH - 1) + +static u32 sdhci_am654_calculate_itap(struct sdhci_host *host, struct window + *fail_window, u8 num_fails, bool circular_buffer) +{ + u8 itap = 0, start_fail = 0, end_fail = 0, pass_length = 0; + u8 first_fail_start = 0, last_fail_end = 0; + struct device *dev = mmc_dev(host->mmc); + struct window pass_window = {0, 0, 0}; + int prev_fail_end = -1; + u8 i; + + if (!num_fails) + return ITAPDLY_LAST_INDEX >> 1; + + if (fail_window->length == ITAPDLY_LENGTH) { + dev_err(dev, "No passing ITAPDLY, return 0\n"); + return 0; + } + + first_fail_start = fail_window->start; + last_fail_end = fail_window[num_fails - 1].end; + + for (i = 0; i < num_fails; i++) { + start_fail = fail_window[i].start; + end_fail = fail_window[i].end; + pass_length = start_fail - (prev_fail_end + 1); + + if (pass_length > pass_window.length) { + pass_window.start = prev_fail_end + 1; + pass_window.length = pass_length; + } + prev_fail_end = end_fail; + } + + if (!circular_buffer) + pass_length = ITAPDLY_LAST_INDEX - last_fail_end; + else + pass_length = ITAPDLY_LAST_INDEX - last_fail_end + first_fail_start; + + if (pass_length > pass_window.length) { + pass_window.start = last_fail_end + 1; + pass_window.length = pass_length; + } + + if (!circular_buffer) + itap = pass_window.start + (pass_window.length >> 1); + else + itap = (pass_window.start + (pass_window.length >> 1)) % ITAPDLY_LENGTH; + + return (itap > ITAPDLY_LAST_INDEX) ? ITAPDLY_LAST_INDEX >> 1 : itap; +} + static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host, u32 opcode) { struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); struct sdhci_am654_data *sdhci_am654 = sdhci_pltfm_priv(pltfm_host); - int cur_val, prev_val = 1, fail_len = 0, pass_window = 0, pass_len; - u32 itap; + struct window fail_window[ITAPDLY_LENGTH]; + u8 curr_pass, itap; + u8 fail_index = 0; + u8 prev_pass = 1; + + memset(fail_window, 0, sizeof(fail_window)); /* Enable ITAPDLY */ regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPDLYENA_MASK, 1 << ITAPDLYENA_SHIFT); - for (itap = 0; itap < ITAP_MAX; itap++) { + for (itap = 0; itap < ITAPDLY_LENGTH; itap++) { sdhci_am654_write_itapdly(sdhci_am654, itap); - cur_val = !mmc_send_tuning(host->mmc, opcode, NULL); - if (cur_val && !prev_val) - pass_window = itap; + curr_pass = !mmc_send_tuning(host->mmc, opcode, NULL); - if (!cur_val) - fail_len++; + if (!curr_pass && prev_pass) + fail_window[fail_index].start = itap; - prev_val = cur_val; + if (!curr_pass) { + fail_window[fail_index].end = itap; + fail_window[fail_index].length++; + } + + if (curr_pass && !prev_pass) + fail_index++; + + prev_pass = curr_pass; } - /* - * Having determined the length of the failing window and start of - * the passing window calculate the length of the passing window and - * set the final value halfway through it considering the range as a - * circular buffer - */ - pass_len = ITAP_MAX - fail_len; - itap = (pass_window + (pass_len >> 1)) % ITAP_MAX; + + if (fail_window[fail_index].length != 0) + fail_index++; + + itap = sdhci_am654_calculate_itap(host, fail_window, fail_index, + sdhci_am654->dll_enable); + sdhci_am654_write_itapdly(sdhci_am654, itap); return 0; From 8dcfbb27e4257e893d8ef9f066b0673c59b0e347 Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 20 Mar 2024 17:38:32 -0500 Subject: [PATCH 0182/1565] mmc: sdhci_am654: Write ITAPDLY for DDR52 timing [ Upstream commit d465234493bb6ad1b9c10a0c9ef9881b8d85081a ] For DDR52 timing, DLL is enabled but tuning is not carried out, therefore the ITAPDLY value in PHY CTRL 4 register is not correct. Fix this by writing ITAPDLY after enabling DLL. Fixes: a161c45f2979 ("mmc: sdhci_am654: Enable DLL only for some speed modes") Signed-off-by: Judith Mendez Reviewed-by: Andrew Davis Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240320223837.959900-3-jm@ti.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index 2e302c2b0ac1..13bff543f7c0 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -304,6 +304,7 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) if (timing > MMC_TIMING_UHS_SDR25 && clock >= CLOCK_TOO_SLOW_HZ) { sdhci_am654_setup_dll(host, clock); sdhci_am654->dll_enable = true; + sdhci_am654_write_itapdly(sdhci_am654, sdhci_am654->itap_del_sel[timing]); } else { sdhci_am654_setup_delay_chain(sdhci_am654, timing); sdhci_am654->dll_enable = false; From 76f2b3ccbd63b749743f1b201a156a857031aa4d Mon Sep 17 00:00:00 2001 From: Vignesh Raghavendra Date: Wed, 22 Nov 2023 11:32:14 +0530 Subject: [PATCH 0183/1565] mmc: sdhci_am654: Drop lookup for deprecated ti,otap-del-sel [ Upstream commit 5cb2f9286a31f33dc732c57540838ad9339393ab ] ti,otap-del-sel has been deprecated since v5.7 and there are no users of this property and no documentation in the DT bindings either. Drop the fallback code looking for this property, this makes sdhci_am654_get_otap_delay() much easier to read as all the TAP values can be handled via a single iterator loop. Signed-off-by: Vignesh Raghavendra Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20231122060215.2074799-1-vigneshr@ti.com Signed-off-by: Ulf Hansson Stable-dep-of: 387c1bf7dce0 ("mmc: sdhci_am654: Add OTAP/ITAP delay enable") Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 37 ++++++---------------------------- 1 file changed, 6 insertions(+), 31 deletions(-) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index 13bff543f7c0..847cd4e43472 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -140,7 +140,6 @@ static const struct timing_data td[] = { struct sdhci_am654_data { struct regmap *base; - bool legacy_otapdly; int otap_del_sel[ARRAY_SIZE(td)]; int itap_del_sel[ARRAY_SIZE(td)]; int clkbuf_sel; @@ -278,11 +277,7 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) sdhci_set_clock(host, clock); /* Setup DLL Output TAP delay */ - if (sdhci_am654->legacy_otapdly) - otap_del_sel = sdhci_am654->otap_del_sel[0]; - else - otap_del_sel = sdhci_am654->otap_del_sel[timing]; - + otap_del_sel = sdhci_am654->otap_del_sel[timing]; otap_del_ena = (timing > MMC_TIMING_UHS_SDR25) ? 1 : 0; mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; @@ -324,10 +319,7 @@ static void sdhci_j721e_4bit_set_clock(struct sdhci_host *host, u32 mask, val; /* Setup DLL Output TAP delay */ - if (sdhci_am654->legacy_otapdly) - otap_del_sel = sdhci_am654->otap_del_sel[0]; - else - otap_del_sel = sdhci_am654->otap_del_sel[timing]; + otap_del_sel = sdhci_am654->otap_del_sel[timing]; mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; val = (0x1 << OTAPDLYENA_SHIFT) | @@ -652,32 +644,15 @@ static int sdhci_am654_get_otap_delay(struct sdhci_host *host, int i; int ret; - ret = device_property_read_u32(dev, td[MMC_TIMING_LEGACY].otap_binding, - &sdhci_am654->otap_del_sel[MMC_TIMING_LEGACY]); - if (ret) { - /* - * ti,otap-del-sel-legacy is mandatory, look for old binding - * if not found. - */ - ret = device_property_read_u32(dev, "ti,otap-del-sel", - &sdhci_am654->otap_del_sel[0]); - if (ret) { - dev_err(dev, "Couldn't find otap-del-sel\n"); - - return ret; - } - - dev_info(dev, "Using legacy binding ti,otap-del-sel\n"); - sdhci_am654->legacy_otapdly = true; - - return 0; - } - for (i = MMC_TIMING_LEGACY; i <= MMC_TIMING_MMC_HS400; i++) { ret = device_property_read_u32(dev, td[i].otap_binding, &sdhci_am654->otap_del_sel[i]); if (ret) { + if (i == MMC_TIMING_LEGACY) { + dev_err(dev, "Couldn't find mandatory ti,otap-del-sel-legacy\n"); + return ret; + } dev_dbg(dev, "Couldn't find %s\n", td[i].otap_binding); /* From b55d988df1d60cd089f71a53a3f486eead98b900 Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 20 Mar 2024 17:38:33 -0500 Subject: [PATCH 0184/1565] mmc: sdhci_am654: Add OTAP/ITAP delay enable [ Upstream commit 387c1bf7dce0dfea02080c8bdb066b5209e92155 ] Currently the OTAP/ITAP delay enable functionality is incorrect in the am654_set_clock function. The OTAP delay is not enabled when timing < SDR25 bus speed mode. The ITAP delay is not enabled for timings that do not carry out tuning. Add this OTAP/ITAP delay functionality according to the datasheet [1] OTAPDLYENA and ITAPDLYENA for MMC0. [1] https://www.ti.com/lit/ds/symlink/am62p.pdf Fixes: 8ee5fc0e0b3b ("mmc: sdhci_am654: Update OTAPDLY writes") Signed-off-by: Judith Mendez Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240320223837.959900-4-jm@ti.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 40 ++++++++++++++++++++++------------ 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index 847cd4e43472..1968381b9310 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -142,6 +142,7 @@ struct sdhci_am654_data { struct regmap *base; int otap_del_sel[ARRAY_SIZE(td)]; int itap_del_sel[ARRAY_SIZE(td)]; + u32 itap_del_ena[ARRAY_SIZE(td)]; int clkbuf_sel; int trm_icp; int drv_strength; @@ -238,11 +239,13 @@ static void sdhci_am654_setup_dll(struct sdhci_host *host, unsigned int clock) } static void sdhci_am654_write_itapdly(struct sdhci_am654_data *sdhci_am654, - u32 itapdly) + u32 itapdly, u32 enable) { /* Set ITAPCHGWIN before writing to ITAPDLY */ regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPCHGWIN_MASK, 1 << ITAPCHGWIN_SHIFT); + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPDLYENA_MASK, + enable << ITAPDLYENA_SHIFT); regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPDLYSEL_MASK, itapdly << ITAPDLYSEL_SHIFT); regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPCHGWIN_MASK, 0); @@ -259,8 +262,8 @@ static void sdhci_am654_setup_delay_chain(struct sdhci_am654_data *sdhci_am654, mask = SELDLYTXCLK_MASK | SELDLYRXCLK_MASK; regmap_update_bits(sdhci_am654->base, PHY_CTRL5, mask, val); - sdhci_am654_write_itapdly(sdhci_am654, - sdhci_am654->itap_del_sel[timing]); + sdhci_am654_write_itapdly(sdhci_am654, sdhci_am654->itap_del_sel[timing], + sdhci_am654->itap_del_ena[timing]); } static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) @@ -269,7 +272,6 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) struct sdhci_am654_data *sdhci_am654 = sdhci_pltfm_priv(pltfm_host); unsigned char timing = host->mmc->ios.timing; u32 otap_del_sel; - u32 otap_del_ena; u32 mask, val; regmap_update_bits(sdhci_am654->base, PHY_CTRL1, ENDLL_MASK, 0); @@ -278,10 +280,9 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) /* Setup DLL Output TAP delay */ otap_del_sel = sdhci_am654->otap_del_sel[timing]; - otap_del_ena = (timing > MMC_TIMING_UHS_SDR25) ? 1 : 0; mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; - val = (otap_del_ena << OTAPDLYENA_SHIFT) | + val = (0x1 << OTAPDLYENA_SHIFT) | (otap_del_sel << OTAPDLYSEL_SHIFT); /* Write to STRBSEL for HS400 speed mode */ @@ -299,7 +300,8 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) if (timing > MMC_TIMING_UHS_SDR25 && clock >= CLOCK_TOO_SLOW_HZ) { sdhci_am654_setup_dll(host, clock); sdhci_am654->dll_enable = true; - sdhci_am654_write_itapdly(sdhci_am654, sdhci_am654->itap_del_sel[timing]); + sdhci_am654_write_itapdly(sdhci_am654, sdhci_am654->itap_del_sel[timing], + sdhci_am654->itap_del_ena[timing]); } else { sdhci_am654_setup_delay_chain(sdhci_am654, timing); sdhci_am654->dll_enable = false; @@ -316,6 +318,7 @@ static void sdhci_j721e_4bit_set_clock(struct sdhci_host *host, struct sdhci_am654_data *sdhci_am654 = sdhci_pltfm_priv(pltfm_host); unsigned char timing = host->mmc->ios.timing; u32 otap_del_sel; + u32 itap_del_ena; u32 mask, val; /* Setup DLL Output TAP delay */ @@ -324,6 +327,12 @@ static void sdhci_j721e_4bit_set_clock(struct sdhci_host *host, mask = OTAPDLYENA_MASK | OTAPDLYSEL_MASK; val = (0x1 << OTAPDLYENA_SHIFT) | (otap_del_sel << OTAPDLYSEL_SHIFT); + + itap_del_ena = sdhci_am654->itap_del_ena[timing]; + + mask |= ITAPDLYENA_MASK; + val |= (itap_del_ena << ITAPDLYENA_SHIFT); + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, val); regmap_update_bits(sdhci_am654->base, PHY_CTRL5, CLKBUFSEL_MASK, @@ -477,6 +486,7 @@ static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host, { struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host); struct sdhci_am654_data *sdhci_am654 = sdhci_pltfm_priv(pltfm_host); + unsigned char timing = host->mmc->ios.timing; struct window fail_window[ITAPDLY_LENGTH]; u8 curr_pass, itap; u8 fail_index = 0; @@ -485,11 +495,10 @@ static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host, memset(fail_window, 0, sizeof(fail_window)); /* Enable ITAPDLY */ - regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPDLYENA_MASK, - 1 << ITAPDLYENA_SHIFT); + sdhci_am654->itap_del_ena[timing] = 0x1; for (itap = 0; itap < ITAPDLY_LENGTH; itap++) { - sdhci_am654_write_itapdly(sdhci_am654, itap); + sdhci_am654_write_itapdly(sdhci_am654, itap, sdhci_am654->itap_del_ena[timing]); curr_pass = !mmc_send_tuning(host->mmc, opcode, NULL); @@ -513,7 +522,7 @@ static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host, itap = sdhci_am654_calculate_itap(host, fail_window, fail_index, sdhci_am654->dll_enable); - sdhci_am654_write_itapdly(sdhci_am654, itap); + sdhci_am654_write_itapdly(sdhci_am654, itap, sdhci_am654->itap_del_ena[timing]); return 0; } @@ -665,9 +674,12 @@ static int sdhci_am654_get_otap_delay(struct sdhci_host *host, host->mmc->caps2 &= ~td[i].capability; } - if (td[i].itap_binding) - device_property_read_u32(dev, td[i].itap_binding, - &sdhci_am654->itap_del_sel[i]); + if (td[i].itap_binding) { + ret = device_property_read_u32(dev, td[i].itap_binding, + &sdhci_am654->itap_del_sel[i]); + if (!ret) + sdhci_am654->itap_del_ena[i] = 0x1; + } } return 0; From 2621bf50f5803cce956a6a4aa43801240976616e Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 20 Mar 2024 17:38:36 -0500 Subject: [PATCH 0185/1565] mmc: sdhci_am654: Add ITAPDLYSEL in sdhci_j721e_4bit_set_clock [ Upstream commit 9dff65bb5e09903c27d9cff947dff4d22b6ea6a1 ] Add ITAPDLYSEL to sdhci_j721e_4bit_set_clock function. This allows to set the correct ITAPDLY for timings that do not carry out tuning. Fixes: 1accbced1c32 ("mmc: sdhci_am654: Add Support for 4 bit IP on J721E") Signed-off-by: Judith Mendez Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240320223837.959900-7-jm@ti.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index 1968381b9310..879ead07f802 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -319,6 +319,7 @@ static void sdhci_j721e_4bit_set_clock(struct sdhci_host *host, unsigned char timing = host->mmc->ios.timing; u32 otap_del_sel; u32 itap_del_ena; + u32 itap_del_sel; u32 mask, val; /* Setup DLL Output TAP delay */ @@ -328,13 +329,18 @@ static void sdhci_j721e_4bit_set_clock(struct sdhci_host *host, val = (0x1 << OTAPDLYENA_SHIFT) | (otap_del_sel << OTAPDLYSEL_SHIFT); + /* Setup Input TAP delay */ itap_del_ena = sdhci_am654->itap_del_ena[timing]; + itap_del_sel = sdhci_am654->itap_del_sel[timing]; - mask |= ITAPDLYENA_MASK; - val |= (itap_del_ena << ITAPDLYENA_SHIFT); + mask |= ITAPDLYENA_MASK | ITAPDLYSEL_MASK; + val |= (itap_del_ena << ITAPDLYENA_SHIFT) | + (itap_del_sel << ITAPDLYSEL_SHIFT); + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPCHGWIN_MASK, + 1 << ITAPCHGWIN_SHIFT); regmap_update_bits(sdhci_am654->base, PHY_CTRL4, mask, val); - + regmap_update_bits(sdhci_am654->base, PHY_CTRL4, ITAPCHGWIN_MASK, 0); regmap_update_bits(sdhci_am654->base, PHY_CTRL5, CLKBUFSEL_MASK, sdhci_am654->clkbuf_sel); From 580e47c2824254cb278c09ec84e2aed7947220fd Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 20 Mar 2024 17:38:37 -0500 Subject: [PATCH 0186/1565] mmc: sdhci_am654: Fix ITAPDLY for HS400 timing [ Upstream commit d3182932bb070e7518411fd165e023f82afd7d25 ] While STRB is currently used for DATA and CRC responses, the CMD responses from the device to the host still require ITAPDLY for HS400 timing. Currently what is stored for HS400 is the ITAPDLY from High Speed mode which is incorrect. The ITAPDLY for HS400 speed mode should be the same as ITAPDLY as HS200 timing after tuning is executed. Add the functionality to save ITAPDLY from HS200 tuning and save as HS400 ITAPDLY. Fixes: a161c45f2979 ("mmc: sdhci_am654: Enable DLL only for some speed modes") Signed-off-by: Judith Mendez Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240320223837.959900-8-jm@ti.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/sdhci_am654.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/mmc/host/sdhci_am654.c b/drivers/mmc/host/sdhci_am654.c index 879ead07f802..9d74ee989cb7 100644 --- a/drivers/mmc/host/sdhci_am654.c +++ b/drivers/mmc/host/sdhci_am654.c @@ -300,6 +300,12 @@ static void sdhci_am654_set_clock(struct sdhci_host *host, unsigned int clock) if (timing > MMC_TIMING_UHS_SDR25 && clock >= CLOCK_TOO_SLOW_HZ) { sdhci_am654_setup_dll(host, clock); sdhci_am654->dll_enable = true; + + if (timing == MMC_TIMING_MMC_HS400) { + sdhci_am654->itap_del_ena[timing] = 0x1; + sdhci_am654->itap_del_sel[timing] = sdhci_am654->itap_del_sel[timing - 1]; + } + sdhci_am654_write_itapdly(sdhci_am654, sdhci_am654->itap_del_sel[timing], sdhci_am654->itap_del_ena[timing]); } else { @@ -530,6 +536,9 @@ static int sdhci_am654_platform_execute_tuning(struct sdhci_host *host, sdhci_am654_write_itapdly(sdhci_am654, itap, sdhci_am654->itap_del_ena[timing]); + /* Save ITAPDLY */ + sdhci_am654->itap_del_sel[timing] = itap; + return 0; } From a3bb8070b71b39af2a810747636c2a52e57b584c Mon Sep 17 00:00:00 2001 From: Fenglin Wu Date: Mon, 15 Apr 2024 16:03:40 -0700 Subject: [PATCH 0187/1565] Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation [ Upstream commit 48c0687a322d54ac7e7a685c0b6db78d78f593af ] The output voltage is inclusive hence the max level calculation is off-by-one-step. Correct it. iWhile we are at it also add a define for the step size instead of using the magic value. Fixes: 11205bb63e5c ("Input: add support for pm8xxx based vibrator driver") Signed-off-by: Fenglin Wu Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20240412-pm8xxx-vibrator-new-design-v10-1-0ec0ad133866@quicinc.com Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/misc/pm8xxx-vibrator.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/input/misc/pm8xxx-vibrator.c b/drivers/input/misc/pm8xxx-vibrator.c index 53ad25eaf1a2..8bfe5c7b1244 100644 --- a/drivers/input/misc/pm8xxx-vibrator.c +++ b/drivers/input/misc/pm8xxx-vibrator.c @@ -14,7 +14,8 @@ #define VIB_MAX_LEVEL_mV (3100) #define VIB_MIN_LEVEL_mV (1200) -#define VIB_MAX_LEVELS (VIB_MAX_LEVEL_mV - VIB_MIN_LEVEL_mV) +#define VIB_PER_STEP_mV (100) +#define VIB_MAX_LEVELS (VIB_MAX_LEVEL_mV - VIB_MIN_LEVEL_mV + VIB_PER_STEP_mV) #define MAX_FF_SPEED 0xff @@ -118,10 +119,10 @@ static void pm8xxx_work_handler(struct work_struct *work) vib->active = true; vib->level = ((VIB_MAX_LEVELS * vib->speed) / MAX_FF_SPEED) + VIB_MIN_LEVEL_mV; - vib->level /= 100; + vib->level /= VIB_PER_STEP_mV; } else { vib->active = false; - vib->level = VIB_MIN_LEVEL_mV / 100; + vib->level = VIB_MIN_LEVEL_mV / VIB_PER_STEP_mV; } pm8xxx_vib_set(vib, vib->active); From 96b9ed94dcb3595aa544e3874782f05babc0df4f Mon Sep 17 00:00:00 2001 From: Marijn Suijten Date: Wed, 17 Apr 2024 01:57:43 +0200 Subject: [PATCH 0188/1565] drm/msm/dpu: Always flush the slave INTF on the CTL [ Upstream commit 2b938c3ab0a69ec6ea587bbf6fc2aec3db4a8736 ] As we can clearly see in a downstream kernel [1], flushing the slave INTF is skipped /only if/ the PPSPLIT topology is active. However, when DPU was originally submitted to mainline PPSPLIT was no longer part of it (seems to have been ripped out before submission), but this clause was incorrectly ported from the original SDE driver. Given that there is no support for PPSPLIT (currently), flushing the slave INTF should /never/ be skipped (as the `if (ppsplit && !master) goto skip;` clause downstream never becomes true). [1]: https://git.codelinaro.org/clo/la/platform/vendor/opensource/display-drivers/-/blob/display-kernel.lnx.5.4.r1-rel/msm/sde/sde_encoder_phys_cmd.c?ref_type=heads#L1131-1139 Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Marijn Suijten Reviewed-by: Dmitry Baryshkov Patchwork: https://patchwork.freedesktop.org/patch/589901/ Link: https://lore.kernel.org/r/20240417-drm-msm-initial-dualpipe-dsc-fixes-v1-3-78ae3ee9a697@somainline.org Signed-off-by: Dmitry Baryshkov Signed-off-by: Sasha Levin --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c index 8493d68ad841..e1a97c10c0e4 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c @@ -448,9 +448,6 @@ static void dpu_encoder_phys_cmd_enable_helper( _dpu_encoder_phys_cmd_pingpong_config(phys_enc); - if (!dpu_encoder_phys_cmd_is_master(phys_enc)) - return; - ctl = phys_enc->hw_ctl; ctl->ops.get_bitmask_intf(ctl, &flush_mask, phys_enc->intf_idx); ctl->ops.update_pending_flush(ctl, flush_mask); From 1ef5d235be29c06b7f4df95aa59df6c62190ad20 Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Wed, 6 Mar 2024 17:12:59 +0800 Subject: [PATCH 0189/1565] um: Fix return value in ubd_init() [ Upstream commit 31a5990ed253a66712d7ddc29c92d297a991fdf2 ] When kmalloc_array() fails to allocate memory, the ubd_init() should return -ENOMEM instead of -1. So, fix it. Fixes: f88f0bdfc32f ("um: UBD Improvements") Signed-off-by: Duoming Zhou Reviewed-by: Johannes Berg Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- arch/um/drivers/ubd_kern.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index b12c1b0d3e1d..de28ce711687 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -1158,7 +1158,7 @@ static int __init ubd_init(void) if (irq_req_buffer == NULL) { printk(KERN_ERR "Failed to initialize ubd buffering\n"); - return -1; + return -ENOMEM; } io_req_buffer = kmalloc_array(UBD_REQ_BUFFER_SIZE, sizeof(struct io_thread_req *), @@ -1169,7 +1169,7 @@ static int __init ubd_init(void) if (io_req_buffer == NULL) { printk(KERN_ERR "Failed to initialize ubd buffering\n"); - return -1; + return -ENOMEM; } platform_driver_register(&ubd_driver); mutex_lock(&ubd_lock); From 351d1a64544944b44732f6a64ed65573b00b9e14 Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Thu, 7 Mar 2024 11:49:26 +0100 Subject: [PATCH 0190/1565] um: Add winch to winch_handlers before registering winch IRQ [ Upstream commit a0fbbd36c156b9f7b2276871d499c9943dfe5101 ] Registering a winch IRQ is racy, an interrupt may occur before the winch is added to the winch_handlers list. If that happens, register_winch_irq() adds to that list a winch that is scheduled to be (or has already been) freed, causing a panic later in winch_cleanup(). Avoid the race by adding the winch to the winch_handlers list before registering the IRQ, and rolling back if um_request_irq() fails. Fixes: 42a359e31a0e ("uml: SIGIO support cleanup") Signed-off-by: Roberto Sassu Reviewed-by: Johannes Berg Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- arch/um/drivers/line.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/um/drivers/line.c b/arch/um/drivers/line.c index 14ad9f495fe6..37e96ba0f5fb 100644 --- a/arch/um/drivers/line.c +++ b/arch/um/drivers/line.c @@ -668,24 +668,26 @@ void register_winch_irq(int fd, int tty_fd, int pid, struct tty_port *port, goto cleanup; } - *winch = ((struct winch) { .list = LIST_HEAD_INIT(winch->list), - .fd = fd, + *winch = ((struct winch) { .fd = fd, .tty_fd = tty_fd, .pid = pid, .port = port, .stack = stack }); + spin_lock(&winch_handler_lock); + list_add(&winch->list, &winch_handlers); + spin_unlock(&winch_handler_lock); + if (um_request_irq(WINCH_IRQ, fd, IRQ_READ, winch_interrupt, IRQF_SHARED, "winch", winch) < 0) { printk(KERN_ERR "register_winch_irq - failed to register " "IRQ\n"); + spin_lock(&winch_handler_lock); + list_del(&winch->list); + spin_unlock(&winch_handler_lock); goto out_free; } - spin_lock(&winch_handler_lock); - list_add(&winch->list, &winch_handlers); - spin_unlock(&winch_handler_lock); - return; out_free: From 4d5ef7facea11462a2b74e6857bd568e2fdb19e3 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 28 Mar 2024 10:06:36 +0100 Subject: [PATCH 0191/1565] um: vector: fix bpfflash parameter evaluation [ Upstream commit 584ed2f76ff5fe360d87a04d17b6520c7999e06b ] With W=1 the build complains about a pointer compared to zero, clearly the result should've been compared. Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers") Signed-off-by: Johannes Berg Reviewed-by: Tiwei Bie Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- arch/um/drivers/vector_kern.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c index fc662f7cc2af..c10432ef2d41 100644 --- a/arch/um/drivers/vector_kern.c +++ b/arch/um/drivers/vector_kern.c @@ -142,7 +142,7 @@ static bool get_bpf_flash(struct arglist *def) if (allow != NULL) { if (kstrtoul(allow, 10, &result) == 0) - return (allow > 0); + return result > 0; } return false; } From f4b3d2585b33d521980b48bad354983d3f17aacc Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Sun, 25 Feb 2024 08:19:33 +0200 Subject: [PATCH 0192/1565] drm/bridge: tc358775: fix support for jeida-18 and jeida-24 [ Upstream commit 30ea09a182cb37c4921b9d477ed18107befe6d78 ] The bridge always uses 24bpp internally. Therefore, for jeida-18 mapping we need to discard the lowest two bits for each channel and thus starting with LV_[RGB]2. jeida-24 has the same mapping but uses four lanes instead of three, with the forth pair transmitting the lowest two bits of each channel. Thus, the mapping between jeida-18 and jeida-24 is actually the same, except that one channel is turned off (by selecting the RGB666 format in VPCTRL). While at it, remove the bogus comment about the hardware default because the default is overwritten in any case. Tested with a jeida-18 display (Evervision VGG644804). Fixes: b26975593b17 ("display/drm/bridge: TC358775 DSI/LVDS driver") Signed-off-by: Michael Walle Signed-off-by: Tony Lindgren Reviewed-by: Robert Foss Signed-off-by: Robert Foss Link: https://patchwork.freedesktop.org/patch/msgid/20240225062008.33191-5-tony@atomide.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/bridge/tc358775.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/bridge/tc358775.c b/drivers/gpu/drm/bridge/tc358775.c index 55697fa4d7c8..2e299cfe4e48 100644 --- a/drivers/gpu/drm/bridge/tc358775.c +++ b/drivers/gpu/drm/bridge/tc358775.c @@ -453,10 +453,6 @@ static void tc_bridge_enable(struct drm_bridge *bridge) dev_dbg(tc->dev, "bus_formats %04x bpc %d\n", connector->display_info.bus_formats[0], tc->bpc); - /* - * Default hardware register settings of tc358775 configured - * with MEDIA_BUS_FMT_RGB888_1X7X4_JEIDA jeida-24 format - */ if (connector->display_info.bus_formats[0] == MEDIA_BUS_FMT_RGB888_1X7X4_SPWG) { /* VESA-24 */ @@ -467,14 +463,15 @@ static void tc_bridge_enable(struct drm_bridge *bridge) d2l_write(tc->i2c, LV_MX1619, LV_MX(LVI_B6, LVI_B7, LVI_B1, LVI_B2)); d2l_write(tc->i2c, LV_MX2023, LV_MX(LVI_B3, LVI_B4, LVI_B5, LVI_L0)); d2l_write(tc->i2c, LV_MX2427, LV_MX(LVI_HS, LVI_VS, LVI_DE, LVI_R6)); - } else { /* MEDIA_BUS_FMT_RGB666_1X7X3_SPWG - JEIDA-18 */ - d2l_write(tc->i2c, LV_MX0003, LV_MX(LVI_R0, LVI_R1, LVI_R2, LVI_R3)); - d2l_write(tc->i2c, LV_MX0407, LV_MX(LVI_R4, LVI_L0, LVI_R5, LVI_G0)); - d2l_write(tc->i2c, LV_MX0811, LV_MX(LVI_G1, LVI_G2, LVI_L0, LVI_L0)); - d2l_write(tc->i2c, LV_MX1215, LV_MX(LVI_G3, LVI_G4, LVI_G5, LVI_B0)); - d2l_write(tc->i2c, LV_MX1619, LV_MX(LVI_L0, LVI_L0, LVI_B1, LVI_B2)); - d2l_write(tc->i2c, LV_MX2023, LV_MX(LVI_B3, LVI_B4, LVI_B5, LVI_L0)); - d2l_write(tc->i2c, LV_MX2427, LV_MX(LVI_HS, LVI_VS, LVI_DE, LVI_L0)); + } else { + /* JEIDA-18 and JEIDA-24 */ + d2l_write(tc->i2c, LV_MX0003, LV_MX(LVI_R2, LVI_R3, LVI_R4, LVI_R5)); + d2l_write(tc->i2c, LV_MX0407, LV_MX(LVI_R6, LVI_R1, LVI_R7, LVI_G2)); + d2l_write(tc->i2c, LV_MX0811, LV_MX(LVI_G3, LVI_G4, LVI_G0, LVI_G1)); + d2l_write(tc->i2c, LV_MX1215, LV_MX(LVI_G5, LVI_G6, LVI_G7, LVI_B2)); + d2l_write(tc->i2c, LV_MX1619, LV_MX(LVI_B0, LVI_B1, LVI_B3, LVI_B4)); + d2l_write(tc->i2c, LV_MX2023, LV_MX(LVI_B5, LVI_B6, LVI_B7, LVI_L0)); + d2l_write(tc->i2c, LV_MX2427, LV_MX(LVI_HS, LVI_VS, LVI_DE, LVI_R0)); } d2l_write(tc->i2c, VFUEN, VFUEN_EN); From a16775828aaed1c54ff4e6fe83e8e4d5c6a50cb7 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 22 Apr 2024 12:32:44 +0300 Subject: [PATCH 0193/1565] media: stk1160: fix bounds checking in stk1160_copy_video() [ Upstream commit faa4364bef2ec0060de381ff028d1d836600a381 ] The subtract in this condition is reversed. The ->length is the length of the buffer. The ->bytesused is how many bytes we have copied thus far. When the condition is reversed that means the result of the subtraction is always negative but since it's unsigned then the result is a very high positive value. That means the overflow check is never true. Additionally, the ->bytesused doesn't actually work for this purpose because we're not writing to "buf->mem + buf->bytesused". Instead, the math to calculate the destination where we are writing is a bit involved. You calculate the number of full lines already written, multiply by two, skip a line if necessary so that we start on an odd numbered line, and add the offset into the line. To fix this buffer overflow, just take the actual destination where we are writing, if the offset is already out of bounds print an error and return. Otherwise, write up to buf->length bytes. Fixes: 9cb2173e6ea8 ("[media] media: Add stk1160 new driver (easycap replacement)") Signed-off-by: Dan Carpenter Reviewed-by: Ricardo Ribalda Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/usb/stk1160/stk1160-video.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c index 4cf540d1b250..2a5a90311e0c 100644 --- a/drivers/media/usb/stk1160/stk1160-video.c +++ b/drivers/media/usb/stk1160/stk1160-video.c @@ -99,7 +99,7 @@ void stk1160_buffer_done(struct stk1160 *dev) static inline void stk1160_copy_video(struct stk1160 *dev, u8 *src, int len) { - int linesdone, lineoff, lencopy; + int linesdone, lineoff, lencopy, offset; int bytesperline = dev->width * 2; struct stk1160_buffer *buf = dev->isoc_ctl.buf; u8 *dst = buf->mem; @@ -139,8 +139,13 @@ void stk1160_copy_video(struct stk1160 *dev, u8 *src, int len) * Check if we have enough space left in the buffer. * In that case, we force loop exit after copy. */ - if (lencopy > buf->bytesused - buf->length) { - lencopy = buf->bytesused - buf->length; + offset = dst - (u8 *)buf->mem; + if (offset > buf->length) { + dev_warn_ratelimited(dev->dev, "out of bounds offset\n"); + return; + } + if (lencopy > buf->length - offset) { + lencopy = buf->length - offset; remain = lencopy; } @@ -182,8 +187,13 @@ void stk1160_copy_video(struct stk1160 *dev, u8 *src, int len) * Check if we have enough space left in the buffer. * In that case, we force loop exit after copy. */ - if (lencopy > buf->bytesused - buf->length) { - lencopy = buf->bytesused - buf->length; + offset = dst - (u8 *)buf->mem; + if (offset > buf->length) { + dev_warn_ratelimited(dev->dev, "offset out of bounds\n"); + return; + } + if (lencopy > buf->length - offset) { + lencopy = buf->length - offset; remain = lencopy; } From ca17da90001af98957ca862484d6cda1240fe2b5 Mon Sep 17 00:00:00 2001 From: Azeem Shaikh Date: Tue, 16 May 2023 02:54:04 +0000 Subject: [PATCH 0194/1565] scsi: qla2xxx: Replace all non-returning strlcpy() with strscpy() [ Upstream commit 37f1663c91934f664fb850306708094a324c227c ] strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh Link: https://lore.kernel.org/r/20230516025404.2843867-1-azeemshaikh38@gmail.com Reviewed-by: Kees Cook Signed-off-by: Martin K. Petersen Stable-dep-of: c3408c4ae041 ("scsi: qla2xxx: Avoid possible run-time warning with long model_num") Signed-off-by: Sasha Levin --- drivers/scsi/qla2xxx/qla_init.c | 8 ++++---- drivers/scsi/qla2xxx/qla_mr.c | 20 ++++++++++---------- 2 files changed, 14 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 3d56f971cdc4..8d54f6099802 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -4644,7 +4644,7 @@ qla2x00_set_model_info(scsi_qla_host_t *vha, uint8_t *model, size_t len, if (use_tbl && ha->pdev->subsystem_vendor == PCI_VENDOR_ID_QLOGIC && index < QLA_MODEL_NAMES) - strlcpy(ha->model_desc, + strscpy(ha->model_desc, qla2x00_model_name[index * 2 + 1], sizeof(ha->model_desc)); } else { @@ -4652,14 +4652,14 @@ qla2x00_set_model_info(scsi_qla_host_t *vha, uint8_t *model, size_t len, if (use_tbl && ha->pdev->subsystem_vendor == PCI_VENDOR_ID_QLOGIC && index < QLA_MODEL_NAMES) { - strlcpy(ha->model_number, + strscpy(ha->model_number, qla2x00_model_name[index * 2], sizeof(ha->model_number)); - strlcpy(ha->model_desc, + strscpy(ha->model_desc, qla2x00_model_name[index * 2 + 1], sizeof(ha->model_desc)); } else { - strlcpy(ha->model_number, def, + strscpy(ha->model_number, def, sizeof(ha->model_number)); } } diff --git a/drivers/scsi/qla2xxx/qla_mr.c b/drivers/scsi/qla2xxx/qla_mr.c index 7178646ee0f0..cc8994a7c994 100644 --- a/drivers/scsi/qla2xxx/qla_mr.c +++ b/drivers/scsi/qla2xxx/qla_mr.c @@ -691,7 +691,7 @@ qlafx00_pci_info_str(struct scsi_qla_host *vha, char *str, size_t str_len) struct qla_hw_data *ha = vha->hw; if (pci_is_pcie(ha->pdev)) - strlcpy(str, "PCIe iSA", str_len); + strscpy(str, "PCIe iSA", str_len); return str; } @@ -1849,21 +1849,21 @@ qlafx00_fx_disc(scsi_qla_host_t *vha, fc_port_t *fcport, uint16_t fx_type) phost_info = &preg_hsi->hsi; memset(preg_hsi, 0, sizeof(struct register_host_info)); phost_info->os_type = OS_TYPE_LINUX; - strlcpy(phost_info->sysname, p_sysid->sysname, + strscpy(phost_info->sysname, p_sysid->sysname, sizeof(phost_info->sysname)); - strlcpy(phost_info->nodename, p_sysid->nodename, + strscpy(phost_info->nodename, p_sysid->nodename, sizeof(phost_info->nodename)); if (!strcmp(phost_info->nodename, "(none)")) ha->mr.host_info_resend = true; - strlcpy(phost_info->release, p_sysid->release, + strscpy(phost_info->release, p_sysid->release, sizeof(phost_info->release)); - strlcpy(phost_info->version, p_sysid->version, + strscpy(phost_info->version, p_sysid->version, sizeof(phost_info->version)); - strlcpy(phost_info->machine, p_sysid->machine, + strscpy(phost_info->machine, p_sysid->machine, sizeof(phost_info->machine)); - strlcpy(phost_info->domainname, p_sysid->domainname, + strscpy(phost_info->domainname, p_sysid->domainname, sizeof(phost_info->domainname)); - strlcpy(phost_info->hostdriver, QLA2XXX_VERSION, + strscpy(phost_info->hostdriver, QLA2XXX_VERSION, sizeof(phost_info->hostdriver)); preg_hsi->utc = (uint64_t)ktime_get_real_seconds(); ql_dbg(ql_dbg_init, vha, 0x0149, @@ -1909,9 +1909,9 @@ qlafx00_fx_disc(scsi_qla_host_t *vha, fc_port_t *fcport, uint16_t fx_type) if (fx_type == FXDISC_GET_CONFIG_INFO) { struct config_info_data *pinfo = (struct config_info_data *) fdisc->u.fxiocb.rsp_addr; - strlcpy(vha->hw->model_number, pinfo->model_num, + strscpy(vha->hw->model_number, pinfo->model_num, ARRAY_SIZE(vha->hw->model_number)); - strlcpy(vha->hw->model_desc, pinfo->model_description, + strscpy(vha->hw->model_desc, pinfo->model_description, ARRAY_SIZE(vha->hw->model_desc)); memcpy(&vha->hw->mr.symbolic_name, pinfo->symbolic_name, sizeof(vha->hw->mr.symbolic_name)); From c11caf1339b81ed83d071b74b5d4b39fe4eda409 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 22 Aug 2022 17:14:54 +0200 Subject: [PATCH 0195/1565] media: flexcop-usb: clean up endpoint sanity checks [ Upstream commit 3de50478b5cc2e0c2479a5f2b967f331f7597d23 ] Add a temporary variable to make the endpoint sanity checks a bit more readable. While at it, fix a typo in the usb_set_interface() comment. Signed-off-by: Johan Hovold Link: https://lore.kernel.org/r/20220822151456.27178-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman Stable-dep-of: f62dc8f6bf82 ("media: flexcop-usb: fix sanity check of bNumEndpoints") Signed-off-by: Sasha Levin --- drivers/media/usb/b2c2/flexcop-usb.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/media/usb/b2c2/flexcop-usb.c b/drivers/media/usb/b2c2/flexcop-usb.c index 2299d5cca8ff..6d199b32e317 100644 --- a/drivers/media/usb/b2c2/flexcop-usb.c +++ b/drivers/media/usb/b2c2/flexcop-usb.c @@ -502,17 +502,21 @@ static int flexcop_usb_transfer_init(struct flexcop_usb *fc_usb) static int flexcop_usb_init(struct flexcop_usb *fc_usb) { - /* use the alternate setting with the larges buffer */ - int ret = usb_set_interface(fc_usb->udev, 0, 1); + struct usb_host_interface *alt; + int ret; + /* use the alternate setting with the largest buffer */ + ret = usb_set_interface(fc_usb->udev, 0, 1); if (ret) { err("set interface failed."); return ret; } - if (fc_usb->uintf->cur_altsetting->desc.bNumEndpoints < 1) + alt = fc_usb->uintf->cur_altsetting; + + if (alt->desc.bNumEndpoints < 1) return -ENODEV; - if (!usb_endpoint_is_isoc_in(&fc_usb->uintf->cur_altsetting->endpoint[0].desc)) + if (!usb_endpoint_is_isoc_in(&alt->endpoint[0].desc)) return -ENODEV; switch (fc_usb->udev->speed) { From a2d61b328e39c2ed8e275cedc2fe44ea598852c9 Mon Sep 17 00:00:00 2001 From: Dongliang Mu Date: Thu, 2 Jun 2022 06:50:24 +0100 Subject: [PATCH 0196/1565] media: flexcop-usb: fix sanity check of bNumEndpoints [ Upstream commit f62dc8f6bf82d1b307fc37d8d22cc79f67856c2f ] Commit d725d20e81c2 ("media: flexcop-usb: sanity checking of endpoint type ") adds a sanity check for endpoint[1], but fails to modify the sanity check of bNumEndpoints. Fix this by modifying the sanity check of bNumEndpoints to 2. Link: https://lore.kernel.org/linux-media/20220602055027.849014-1-dzm91@hust.edu.cn Fixes: d725d20e81c2 ("media: flexcop-usb: sanity checking of endpoint type") Signed-off-by: Dongliang Mu Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/usb/b2c2/flexcop-usb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/usb/b2c2/flexcop-usb.c b/drivers/media/usb/b2c2/flexcop-usb.c index 6d199b32e317..6ded5a6181aa 100644 --- a/drivers/media/usb/b2c2/flexcop-usb.c +++ b/drivers/media/usb/b2c2/flexcop-usb.c @@ -514,7 +514,7 @@ static int flexcop_usb_init(struct flexcop_usb *fc_usb) alt = fc_usb->uintf->cur_altsetting; - if (alt->desc.bNumEndpoints < 1) + if (alt->desc.bNumEndpoints < 2) return -ENODEV; if (!usb_endpoint_is_isoc_in(&alt->endpoint[0].desc)) return -ENODEV; From 12ea1ec1372591ddbfe19b77cc2b4f2aeac85a41 Mon Sep 17 00:00:00 2001 From: Shrikanth Hegde Date: Fri, 12 Apr 2024 14:50:47 +0530 Subject: [PATCH 0197/1565] powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp [ Upstream commit 6d4341638516bf97b9a34947e0bd95035a8230a5 ] Couple of Minor fixes: - hcall return values are long. Fix that for h_get_mpp, h_get_ppp and parse_ppp_data - If hcall fails, values set should be at-least zero. It shouldn't be uninitialized values. Fix that for h_get_mpp and h_get_ppp Signed-off-by: Shrikanth Hegde Signed-off-by: Michael Ellerman Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/include/asm/hvcall.h | 2 +- arch/powerpc/platforms/pseries/lpar.c | 6 +++--- arch/powerpc/platforms/pseries/lparcfg.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 00c8cda1c9c3..1a60188f74ad 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -494,7 +494,7 @@ struct hvcall_mpp_data { unsigned long backing_mem; }; -int h_get_mpp(struct hvcall_mpp_data *); +long h_get_mpp(struct hvcall_mpp_data *mpp_data); struct hvcall_mpp_x_data { unsigned long coalesced_bytes; diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 4a3425fb1939..aed67f1a1bc5 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -1883,10 +1883,10 @@ void __trace_hcall_exit(long opcode, long retval, unsigned long *retbuf) * h_get_mpp * H_GET_MPP hcall returns info in 7 parms */ -int h_get_mpp(struct hvcall_mpp_data *mpp_data) +long h_get_mpp(struct hvcall_mpp_data *mpp_data) { - int rc; - unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; + long rc; rc = plpar_hcall9(H_GET_MPP, retbuf); diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c index a7d4e25ae82a..e0c5fc9ab242 100644 --- a/arch/powerpc/platforms/pseries/lparcfg.c +++ b/arch/powerpc/platforms/pseries/lparcfg.c @@ -112,8 +112,8 @@ struct hvcall_ppp_data { */ static unsigned int h_get_ppp(struct hvcall_ppp_data *ppp_data) { - unsigned long rc; - unsigned long retbuf[PLPAR_HCALL9_BUFSIZE]; + unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; + long rc; rc = plpar_hcall9(H_GET_PPP, retbuf); @@ -192,7 +192,7 @@ static void parse_ppp_data(struct seq_file *m) struct hvcall_ppp_data ppp_data; struct device_node *root; const __be32 *perf_level; - int rc; + long rc; rc = h_get_ppp(&ppp_data); if (rc) From 6be4923ade2b5e5a58e3dde5522dd75d77276fa4 Mon Sep 17 00:00:00 2001 From: Tiwei Bie Date: Tue, 23 Apr 2024 20:58:53 +0800 Subject: [PATCH 0198/1565] um: Fix the -Wmissing-prototypes warning for __switch_mm [ Upstream commit 2cbade17b18c0f0fd9963f26c9fc9b057eb1cb3a ] The __switch_mm function is defined in the user code, and is called by the kernel code. It should be declared in a shared header. Fixes: 4dc706c2f292 ("um: take um_mmu.h to asm/mmu.h, clean asm/mmu_context.h a bit") Signed-off-by: Tiwei Bie Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- arch/um/include/asm/mmu.h | 2 -- arch/um/include/shared/skas/mm_id.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/um/include/asm/mmu.h b/arch/um/include/asm/mmu.h index 5b072aba5b65..a7cb380c0b5c 100644 --- a/arch/um/include/asm/mmu.h +++ b/arch/um/include/asm/mmu.h @@ -15,8 +15,6 @@ typedef struct mm_context { struct page *stub_pages[2]; } mm_context_t; -extern void __switch_mm(struct mm_id * mm_idp); - /* Avoid tangled inclusion with asm/ldt.h */ extern long init_new_ldt(struct mm_context *to_mm, struct mm_context *from_mm); extern void free_ldt(struct mm_context *mm); diff --git a/arch/um/include/shared/skas/mm_id.h b/arch/um/include/shared/skas/mm_id.h index e82e203f5f41..92dbf727e384 100644 --- a/arch/um/include/shared/skas/mm_id.h +++ b/arch/um/include/shared/skas/mm_id.h @@ -15,4 +15,6 @@ struct mm_id { int kill; }; +void __switch_mm(struct mm_id *mm_idp); + #endif From a0ca5ff242933f7c5ab6e250bfb1b381d090a250 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Fri, 23 Feb 2024 12:24:38 +0000 Subject: [PATCH 0199/1565] media: cec: cec-adap: always cancel work in cec_transmit_msg_fh [ Upstream commit 9fe2816816a3c765dff3b88af5b5c3d9bbb911ce ] Do not check for !data->completed, just always call cancel_delayed_work_sync(). This fixes a small race condition. Signed-off-by: Hans Verkuil Reported-by: Yang, Chenyuan Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: 490d84f6d73c ("media: cec: forgot to cancel delayed work") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 2f7ab5df1c58..78027f0310ac 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -913,8 +913,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, */ mutex_unlock(&adap->lock); wait_for_completion_killable(&data->c); - if (!data->completed) - cancel_delayed_work_sync(&data->work); + cancel_delayed_work_sync(&data->work); mutex_lock(&adap->lock); /* Cancel the transmit if it was interrupted */ From 9f6da5da3d7c73a0daca5c2bc90d3527520ee727 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Fri, 23 Feb 2024 12:25:55 +0000 Subject: [PATCH 0200/1565] media: cec: cec-api: add locking in cec_release() [ Upstream commit 42bcaacae924bf18ae387c3f78c202df0b739292 ] When cec_release() uses fh->msgs it has to take fh->lock, otherwise the list can get corrupted. Signed-off-by: Hans Verkuil Reported-by: Yang, Chenyuan Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: ca684386e6e2 ("[media] cec: add HDMI CEC framework (api)") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-api.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index f922a2196b2b..893641ebc644 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -672,6 +672,8 @@ static int cec_release(struct inode *inode, struct file *filp) list_del(&data->xfer_list); } mutex_unlock(&adap->lock); + + mutex_lock(&fh->lock); while (!list_empty(&fh->msgs)) { struct cec_msg_entry *entry = list_first_entry(&fh->msgs, struct cec_msg_entry, list); @@ -689,6 +691,7 @@ static int cec_release(struct inode *inode, struct file *filp) kfree(entry); } } + mutex_unlock(&fh->lock); kfree(fh); cec_put_device(devnode); From ca55f013be1368e47d5ac755e7ca4217671426d5 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Tue, 9 Mar 2021 12:48:20 +0100 Subject: [PATCH 0201/1565] media: core headers: fix kernel-doc warnings [ Upstream commit f12b81e47f48940a6ec82ff308a7d97cd2307442 ] This patch fixes the following kernel-doc warnings: include/uapi/linux/videodev2.h:996: warning: Function parameter or member 'm' not described in 'v4l2_plane' include/uapi/linux/videodev2.h:996: warning: Function parameter or member 'reserved' not described in 'v4l2_plane' include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'm' not described in 'v4l2_buffer' include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'reserved2' not described in 'v4l2_buffer' include/uapi/linux/videodev2.h:1057: warning: Function parameter or member 'reserved' not described in 'v4l2_buffer' include/uapi/linux/videodev2.h:1068: warning: Function parameter or member 'tv' not described in 'v4l2_timeval_to_ns' include/uapi/linux/videodev2.h:1068: warning: Excess function parameter 'ts' description in 'v4l2_timeval_to_ns' include/uapi/linux/videodev2.h:1138: warning: Function parameter or member 'reserved' not described in 'v4l2_exportbuffer' include/uapi/linux/videodev2.h:2237: warning: Function parameter or member 'reserved' not described in 'v4l2_plane_pix_format' include/uapi/linux/videodev2.h:2270: warning: Function parameter or member 'hsv_enc' not described in 'v4l2_pix_format_mplane' include/uapi/linux/videodev2.h:2270: warning: Function parameter or member 'reserved' not described in 'v4l2_pix_format_mplane' include/uapi/linux/videodev2.h:2281: warning: Function parameter or member 'reserved' not described in 'v4l2_sdr_format' include/uapi/linux/videodev2.h:2315: warning: Function parameter or member 'fmt' not described in 'v4l2_format' include/uapi/linux/v4l2-subdev.h:53: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_format' include/uapi/linux/v4l2-subdev.h:66: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_crop' include/uapi/linux/v4l2-subdev.h:89: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_mbus_code_enum' include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'min_width' not described in 'v4l2_subdev_frame_size_enum' include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'max_width' not described in 'v4l2_subdev_frame_size_enum' include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'min_height' not described in 'v4l2_subdev_frame_size_enum' include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'max_height' not described in 'v4l2_subdev_frame_size_enum' include/uapi/linux/v4l2-subdev.h:108: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_size_enum' include/uapi/linux/v4l2-subdev.h:119: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_interval' include/uapi/linux/v4l2-subdev.h:140: warning: Function parameter or member 'reserved' not described in 'v4l2_subdev_frame_interval_enum' include/uapi/linux/cec.h:406: warning: Function parameter or member 'raw' not described in 'cec_connector_info' include/uapi/linux/cec.h:470: warning: Function parameter or member 'flags' not described in 'cec_event' include/media/v4l2-h264.h:82: warning: Function parameter or member 'reflist' not described in 'v4l2_h264_build_p_ref_list' include/media/v4l2-h264.h:82: warning: expecting prototype for v4l2_h264_build_b_ref_lists(). Prototype was for v4l2_h264_build_p_ref_list() instead include/media/cec.h:50: warning: Function parameter or member 'lock' not described in 'cec_devnode' include/media/v4l2-jpeg.h:122: warning: Function parameter or member 'num_dht' not described in 'v4l2_jpeg_header' include/media/v4l2-jpeg.h:122: warning: Function parameter or member 'num_dqt' not described in 'v4l2_jpeg_header' Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- include/media/cec.h | 2 +- include/media/v4l2-h264.h | 6 +++--- include/media/v4l2-jpeg.h | 2 ++ include/uapi/linux/cec.h | 3 ++- include/uapi/linux/v4l2-subdev.h | 12 +++++++++++- include/uapi/linux/videodev2.h | 15 ++++++++++++++- 6 files changed, 33 insertions(+), 7 deletions(-) diff --git a/include/media/cec.h b/include/media/cec.h index cd35ae6b7560..208c9613c07e 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -28,8 +28,8 @@ * @minor: device node minor number * @registered: the device was correctly registered * @unregistered: the device was unregistered - * @fhs_lock: lock to control access to the filehandle list * @fhs: the list of open filehandles (cec_fh) + * @lock: lock to control access to this structure * * This structure represents a cec-related device node. * diff --git a/include/media/v4l2-h264.h b/include/media/v4l2-h264.h index f08ba181263d..1cc89d2e693a 100644 --- a/include/media/v4l2-h264.h +++ b/include/media/v4l2-h264.h @@ -66,11 +66,11 @@ v4l2_h264_build_b_ref_lists(const struct v4l2_h264_reflist_builder *builder, u8 *b0_reflist, u8 *b1_reflist); /** - * v4l2_h264_build_b_ref_lists() - Build the P reference list + * v4l2_h264_build_p_ref_list() - Build the P reference list * * @builder: reference list builder context - * @p_reflist: 16-bytes array used to store the P reference list. Each entry - * is an index in the DPB + * @reflist: 16-bytes array used to store the P reference list. Each entry + * is an index in the DPB * * This functions builds the P reference lists. This procedure is describe in * section '8.2.4 Decoding process for reference picture lists construction' diff --git a/include/media/v4l2-jpeg.h b/include/media/v4l2-jpeg.h index ddba2a56c321..3a3344a97678 100644 --- a/include/media/v4l2-jpeg.h +++ b/include/media/v4l2-jpeg.h @@ -91,7 +91,9 @@ struct v4l2_jpeg_scan_header { * struct v4l2_jpeg_header - parsed JPEG header * @sof: pointer to frame header and size * @sos: pointer to scan header and size + * @num_dht: number of entries in @dht * @dht: pointers to huffman tables and sizes + * @num_dqt: number of entries in @dqt * @dqt: pointers to quantization tables and sizes * @frame: parsed frame header * @scan: pointer to parsed scan header, optional diff --git a/include/uapi/linux/cec.h b/include/uapi/linux/cec.h index 7d1a06c52469..dc8879d179fd 100644 --- a/include/uapi/linux/cec.h +++ b/include/uapi/linux/cec.h @@ -396,6 +396,7 @@ struct cec_drm_connector_info { * associated with the CEC adapter. * @type: connector type (if any) * @drm: drm connector info + * @raw: array to pad the union */ struct cec_connector_info { __u32 type; @@ -453,7 +454,7 @@ struct cec_event_lost_msgs { * struct cec_event - CEC event structure * @ts: the timestamp of when the event was sent. * @event: the event. - * array. + * @flags: event flags. * @state_change: the event payload for CEC_EVENT_STATE_CHANGE. * @lost_msgs: the event payload for CEC_EVENT_LOST_MSGS. * @raw: array to pad the union. diff --git a/include/uapi/linux/v4l2-subdev.h b/include/uapi/linux/v4l2-subdev.h index a38454d9e0f5..658106f5b5dc 100644 --- a/include/uapi/linux/v4l2-subdev.h +++ b/include/uapi/linux/v4l2-subdev.h @@ -44,6 +44,7 @@ enum v4l2_subdev_format_whence { * @which: format type (from enum v4l2_subdev_format_whence) * @pad: pad number, as reported by the media API * @format: media bus format (format code and frame size) + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_format { __u32 which; @@ -57,6 +58,7 @@ struct v4l2_subdev_format { * @which: format type (from enum v4l2_subdev_format_whence) * @pad: pad number, as reported by the media API * @rect: pad crop rectangle boundaries + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_crop { __u32 which; @@ -78,6 +80,7 @@ struct v4l2_subdev_crop { * @code: format code (MEDIA_BUS_FMT_ definitions) * @which: format type (from enum v4l2_subdev_format_whence) * @flags: flags set by the driver, (V4L2_SUBDEV_MBUS_CODE_*) + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_mbus_code_enum { __u32 pad; @@ -90,10 +93,15 @@ struct v4l2_subdev_mbus_code_enum { /** * struct v4l2_subdev_frame_size_enum - Media bus format enumeration - * @pad: pad number, as reported by the media API * @index: format index during enumeration + * @pad: pad number, as reported by the media API * @code: format code (MEDIA_BUS_FMT_ definitions) + * @min_width: minimum frame width, in pixels + * @max_width: maximum frame width, in pixels + * @min_height: minimum frame height, in pixels + * @max_height: maximum frame height, in pixels * @which: format type (from enum v4l2_subdev_format_whence) + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_frame_size_enum { __u32 index; @@ -111,6 +119,7 @@ struct v4l2_subdev_frame_size_enum { * struct v4l2_subdev_frame_interval - Pad-level frame rate * @pad: pad number, as reported by the media API * @interval: frame interval in seconds + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_frame_interval { __u32 pad; @@ -127,6 +136,7 @@ struct v4l2_subdev_frame_interval { * @height: frame height in pixels * @interval: frame interval in seconds * @which: format type (from enum v4l2_subdev_format_whence) + * @reserved: drivers and applications must zero this array */ struct v4l2_subdev_frame_interval_enum { __u32 index; diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h index 55b8c4b82479..1bbd81f031fe 100644 --- a/include/uapi/linux/videodev2.h +++ b/include/uapi/linux/videodev2.h @@ -976,8 +976,10 @@ struct v4l2_requestbuffers { * pointing to this plane * @fd: when memory is V4L2_MEMORY_DMABUF, a userspace file * descriptor associated with this plane + * @m: union of @mem_offset, @userptr and @fd * @data_offset: offset in the plane to the start of data; usually 0, * unless there is a header in front of the data + * @reserved: drivers and applications must zero this array * * Multi-planar buffers consist of one or more planes, e.g. an YCbCr buffer * with two planes can have one plane for Y, and another for interleaved CbCr @@ -1019,10 +1021,14 @@ struct v4l2_plane { * a userspace file descriptor associated with this buffer * @planes: for multiplanar buffers; userspace pointer to the array of plane * info structs for this buffer + * @m: union of @offset, @userptr, @planes and @fd * @length: size in bytes of the buffer (NOT its payload) for single-plane * buffers (when type != *_MPLANE); number of elements in the * planes array for multi-plane buffers + * @reserved2: drivers and applications must zero this field * @request_fd: fd of the request that this buffer should use + * @reserved: for backwards compatibility with applications that do not know + * about @request_fd * * Contains data exchanged by application and driver using one of the Streaming * I/O methods. @@ -1060,7 +1066,7 @@ struct v4l2_buffer { #ifndef __KERNEL__ /** * v4l2_timeval_to_ns - Convert timeval to nanoseconds - * @ts: pointer to the timeval variable to be converted + * @tv: pointer to the timeval variable to be converted * * Returns the scalar nanosecond representation of the timeval * parameter. @@ -1121,6 +1127,7 @@ static inline __u64 v4l2_timeval_to_ns(const struct timeval *tv) * @flags: flags for newly created file, currently only O_CLOEXEC is * supported, refer to manual of open syscall for more details * @fd: file descriptor associated with DMABUF (set by driver) + * @reserved: drivers and applications must zero this array * * Contains data used for exporting a video buffer as DMABUF file descriptor. * The buffer is identified by a 'cookie' returned by VIDIOC_QUERYBUF @@ -2215,6 +2222,7 @@ struct v4l2_mpeg_vbi_fmt_ivtv { * this plane will be used * @bytesperline: distance in bytes between the leftmost pixels in two * adjacent lines + * @reserved: drivers and applications must zero this array */ struct v4l2_plane_pix_format { __u32 sizeimage; @@ -2233,8 +2241,10 @@ struct v4l2_plane_pix_format { * @num_planes: number of planes for this format * @flags: format flags (V4L2_PIX_FMT_FLAG_*) * @ycbcr_enc: enum v4l2_ycbcr_encoding, Y'CbCr encoding + * @hsv_enc: enum v4l2_hsv_encoding, HSV encoding * @quantization: enum v4l2_quantization, colorspace quantization * @xfer_func: enum v4l2_xfer_func, colorspace transfer function + * @reserved: drivers and applications must zero this array */ struct v4l2_pix_format_mplane { __u32 width; @@ -2259,6 +2269,7 @@ struct v4l2_pix_format_mplane { * struct v4l2_sdr_format - SDR format definition * @pixelformat: little endian four character code (fourcc) * @buffersize: maximum size in bytes required for data + * @reserved: drivers and applications must zero this array */ struct v4l2_sdr_format { __u32 pixelformat; @@ -2285,6 +2296,8 @@ struct v4l2_meta_format { * @vbi: raw VBI capture or output parameters * @sliced: sliced VBI capture or output parameters * @raw_data: placeholder for future extensions and custom formats + * @fmt: union of @pix, @pix_mp, @win, @vbi, @sliced, @sdr, @meta + * and @raw_data */ struct v4l2_format { __u32 type; From 0ab74ae99f86d0bc06b10d8c41980a91ba853f45 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Wed, 1 Dec 2021 13:41:26 +0100 Subject: [PATCH 0202/1565] media: cec: fix a deadlock situation [ Upstream commit a9e6107616bb8108aa4fc22584a05e69761a91f7 ] The cec_devnode struct has a lock meant to serialize access to the fields of this struct. This lock is taken during device node (un)registration and when opening or releasing a filehandle to the device node. When the last open filehandle is closed the cec adapter might be disabled by calling the adap_enable driver callback with the devnode.lock held. However, if during that callback a message or event arrives then the driver will call one of the cec_queue_event() variants in cec-adap.c, and those will take the same devnode.lock to walk the open filehandle list. This obviously causes a deadlock. This is quite easy to reproduce with the cec-gpio driver since that uses the cec-pin framework which generated lots of events and uses a kernel thread for the processing, so when adap_enable is called the thread is still running and can generate events. But I suspect that it might also happen with other drivers if an interrupt arrives signaling e.g. a received message before adap_enable had a chance to disable the interrupts. This patch adds a new mutex to serialize access to the fhs list. When adap_enable() is called the devnode.lock mutex is held, but not devnode.lock_fhs. The event functions in cec-adap.c will now use devnode.lock_fhs instead of devnode.lock, ensuring that it is safe to call those functions from the adap_enable callback. This specific issue only happens if the last open filehandle is closed and the physical address is invalid. This is not something that happens during normal operation, but it does happen when monitoring CEC traffic (e.g. cec-ctl --monitor) with an unconfigured CEC adapter. Signed-off-by: Hans Verkuil Cc: # for v5.13 and up Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 38 +++++++++++++++++-------------- drivers/media/cec/core/cec-api.c | 6 +++++ drivers/media/cec/core/cec-core.c | 3 +++ include/media/cec.h | 11 +++++++-- 4 files changed, 39 insertions(+), 19 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 78027f0310ac..d2a6fd8b6501 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -161,10 +161,10 @@ static void cec_queue_event(struct cec_adapter *adap, u64 ts = ktime_get_ns(); struct cec_fh *fh; - mutex_lock(&adap->devnode.lock); + mutex_lock(&adap->devnode.lock_fhs); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, ev, ts); - mutex_unlock(&adap->devnode.lock); + mutex_unlock(&adap->devnode.lock_fhs); } /* Notify userspace that the CEC pin changed state at the given time. */ @@ -178,11 +178,12 @@ void cec_queue_pin_cec_event(struct cec_adapter *adap, bool is_high, }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock); - list_for_each_entry(fh, &adap->devnode.fhs, list) + mutex_lock(&adap->devnode.lock_fhs); + list_for_each_entry(fh, &adap->devnode.fhs, list) { if (fh->mode_follower == CEC_MODE_MONITOR_PIN) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - mutex_unlock(&adap->devnode.lock); + } + mutex_unlock(&adap->devnode.lock_fhs); } EXPORT_SYMBOL_GPL(cec_queue_pin_cec_event); @@ -195,10 +196,10 @@ void cec_queue_pin_hpd_event(struct cec_adapter *adap, bool is_high, ktime_t ts) }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock); + mutex_lock(&adap->devnode.lock_fhs); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - mutex_unlock(&adap->devnode.lock); + mutex_unlock(&adap->devnode.lock_fhs); } EXPORT_SYMBOL_GPL(cec_queue_pin_hpd_event); @@ -211,10 +212,10 @@ void cec_queue_pin_5v_event(struct cec_adapter *adap, bool is_high, ktime_t ts) }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock); + mutex_lock(&adap->devnode.lock_fhs); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - mutex_unlock(&adap->devnode.lock); + mutex_unlock(&adap->devnode.lock_fhs); } EXPORT_SYMBOL_GPL(cec_queue_pin_5v_event); @@ -286,12 +287,12 @@ static void cec_queue_msg_monitor(struct cec_adapter *adap, u32 monitor_mode = valid_la ? CEC_MODE_MONITOR : CEC_MODE_MONITOR_ALL; - mutex_lock(&adap->devnode.lock); + mutex_lock(&adap->devnode.lock_fhs); list_for_each_entry(fh, &adap->devnode.fhs, list) { if (fh->mode_follower >= monitor_mode) cec_queue_msg_fh(fh, msg); } - mutex_unlock(&adap->devnode.lock); + mutex_unlock(&adap->devnode.lock_fhs); } /* @@ -302,12 +303,12 @@ static void cec_queue_msg_followers(struct cec_adapter *adap, { struct cec_fh *fh; - mutex_lock(&adap->devnode.lock); + mutex_lock(&adap->devnode.lock_fhs); list_for_each_entry(fh, &adap->devnode.fhs, list) { if (fh->mode_follower == CEC_MODE_FOLLOWER) cec_queue_msg_fh(fh, msg); } - mutex_unlock(&adap->devnode.lock); + mutex_unlock(&adap->devnode.lock_fhs); } /* Notify userspace of an adapter state change. */ @@ -1559,6 +1560,7 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) /* Disabling monitor all mode should always succeed */ if (adap->monitor_all_cnt) WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { WARN_ON(adap->ops->adap_enable(adap, false)); @@ -1570,14 +1572,16 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) return; } + /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); adap->last_initiator = 0xff; adap->transmit_in_progress = false; - if ((adap->needs_hpd || list_empty(&adap->devnode.fhs)) && - adap->ops->adap_enable(adap, true)) { - mutex_unlock(&adap->devnode.lock); - return; + if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { + if (adap->ops->adap_enable(adap, true)) { + mutex_unlock(&adap->devnode.lock); + return; + } } if (adap->monitor_all_cnt && diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 893641ebc644..899017a0e514 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -586,6 +586,7 @@ static int cec_open(struct inode *inode, struct file *filp) return err; } + /* serialize adap_enable */ mutex_lock(&devnode->lock); if (list_empty(&devnode->fhs) && !adap->needs_hpd && @@ -624,7 +625,9 @@ static int cec_open(struct inode *inode, struct file *filp) } #endif + mutex_lock(&devnode->lock_fhs); list_add(&fh->list, &devnode->fhs); + mutex_unlock(&devnode->lock_fhs); mutex_unlock(&devnode->lock); return 0; @@ -653,8 +656,11 @@ static int cec_release(struct inode *inode, struct file *filp) cec_monitor_all_cnt_dec(adap); mutex_unlock(&adap->lock); + /* serialize adap_enable */ mutex_lock(&devnode->lock); + mutex_lock(&devnode->lock_fhs); list_del(&fh->list); + mutex_unlock(&devnode->lock_fhs); if (cec_is_registered(adap) && list_empty(&devnode->fhs) && !adap->needs_hpd && adap->phys_addr == CEC_PHYS_ADDR_INVALID) { WARN_ON(adap->ops->adap_enable(adap, false)); diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index ece236291f35..61139b0a0869 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -167,8 +167,10 @@ static void cec_devnode_unregister(struct cec_adapter *adap) return; } + mutex_lock(&devnode->lock_fhs); list_for_each_entry(fh, &devnode->fhs, list) wake_up_interruptible(&fh->wait); + mutex_unlock(&devnode->lock_fhs); devnode->registered = false; devnode->unregistered = true; @@ -272,6 +274,7 @@ struct cec_adapter *cec_allocate_adapter(const struct cec_adap_ops *ops, /* adap->devnode initialization */ INIT_LIST_HEAD(&adap->devnode.fhs); + mutex_init(&adap->devnode.lock_fhs); mutex_init(&adap->devnode.lock); adap->kthread = kthread_run(cec_thread_func, adap, "cec-%s", name); diff --git a/include/media/cec.h b/include/media/cec.h index 208c9613c07e..77346f757036 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -26,13 +26,17 @@ * @dev: cec device * @cdev: cec character device * @minor: device node minor number + * @lock: lock to serialize open/release and registration * @registered: the device was correctly registered * @unregistered: the device was unregistered + * @lock_fhs: lock to control access to @fhs * @fhs: the list of open filehandles (cec_fh) - * @lock: lock to control access to this structure * * This structure represents a cec-related device node. * + * To add or remove filehandles from @fhs the @lock must be taken first, + * followed by @lock_fhs. It is safe to access @fhs if either lock is held. + * * The @parent is a physical device. It must be set by core or device drivers * before registering the node. */ @@ -43,10 +47,13 @@ struct cec_devnode { /* device info */ int minor; + /* serialize open/release and registration */ + struct mutex lock; bool registered; bool unregistered; + /* protect access to fhs */ + struct mutex lock_fhs; struct list_head fhs; - struct mutex lock; }; struct cec_adapter; From 2c67f3634f827b06b73bc1c1d3750f3bfed072b8 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Thu, 3 Mar 2022 13:55:08 +0000 Subject: [PATCH 0203/1565] media: cec: call enable_adap on s_log_addrs [ Upstream commit 3813c932ed970dd4f413498ccecb03c73c4f1784 ] Don't enable/disable the adapter if the first fh is opened or the last fh is closed, instead do this when the adapter is configured or unconfigured, and also when we enter Monitor All or Monitor Pin mode for the first time or we exit the Monitor All/Pin mode for the last time. However, if needs_hpd is true, then do this when the physical address is set or cleared: in that case the adapter typically is powered by the HPD, so it really is disabled when the HPD is low. This case (needs_hpd is true) was already handled in this way, so this wasn't changed. The problem with the old behavior was that if the HPD goes low when no fh is open, and a transmit was in progress, then the adapter would be disabled, typically stopping the transmit immediately which leaves a partial message on the bus, which isn't nice and can confuse some adapters. It makes much more sense to disable it only when the adapter is unconfigured and we're not monitoring the bus, since then you really won't be using it anymore. To keep track of this store a CEC activation count and call adap_enable only when it goes from 0 to 1 or back to 0. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 174 ++++++++++++++++++++++-------- drivers/media/cec/core/cec-api.c | 18 +--- include/media/cec.h | 2 + 3 files changed, 130 insertions(+), 64 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index d2a6fd8b6501..6415a80c9040 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1532,6 +1532,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) "ceccfg-%s", adap->name); if (IS_ERR(adap->kthread_config)) { adap->kthread_config = NULL; + adap->is_configuring = false; } else if (block) { mutex_unlock(&adap->lock); wait_for_completion(&adap->config_completion); @@ -1539,60 +1540,91 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) } } +/* + * Helper functions to enable/disable the CEC adapter. + * + * These functions are called with adap->lock held. + */ +static int cec_activate_cnt_inc(struct cec_adapter *adap) +{ + int ret; + + if (adap->activate_cnt++) + return 0; + + /* serialize adap_enable */ + mutex_lock(&adap->devnode.lock); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + ret = adap->ops->adap_enable(adap, true); + if (ret) + adap->activate_cnt--; + mutex_unlock(&adap->devnode.lock); + return ret; +} + +static void cec_activate_cnt_dec(struct cec_adapter *adap) +{ + if (WARN_ON(!adap->activate_cnt)) + return; + + if (--adap->activate_cnt) + return; + + /* serialize adap_enable */ + mutex_lock(&adap->devnode.lock); + WARN_ON(adap->ops->adap_enable(adap, false)); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + mutex_unlock(&adap->devnode.lock); +} + /* Set a new physical address and send an event notifying userspace of this. * * This function is called with adap->lock held. */ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) { + bool becomes_invalid = phys_addr == CEC_PHYS_ADDR_INVALID; + bool is_invalid = adap->phys_addr == CEC_PHYS_ADDR_INVALID; + if (phys_addr == adap->phys_addr) return; - if (phys_addr != CEC_PHYS_ADDR_INVALID && adap->devnode.unregistered) + if (!becomes_invalid && adap->devnode.unregistered) return; dprintk(1, "new physical address %x.%x.%x.%x\n", cec_phys_addr_exp(phys_addr)); - if (phys_addr == CEC_PHYS_ADDR_INVALID || - adap->phys_addr != CEC_PHYS_ADDR_INVALID) { + if (becomes_invalid || !is_invalid) { adap->phys_addr = CEC_PHYS_ADDR_INVALID; cec_post_state_event(adap); cec_adap_unconfigure(adap); - /* Disabling monitor all mode should always succeed */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - /* serialize adap_enable */ - mutex_lock(&adap->devnode.lock); - if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { - WARN_ON(adap->ops->adap_enable(adap, false)); - adap->transmit_in_progress = false; + if (becomes_invalid && adap->needs_hpd) { + /* Disable monitor-all/pin modes if needed */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); + cec_activate_cnt_dec(adap); wake_up_interruptible(&adap->kthread_waitq); } - mutex_unlock(&adap->devnode.lock); - if (phys_addr == CEC_PHYS_ADDR_INVALID) + if (becomes_invalid) return; } - /* serialize adap_enable */ - mutex_lock(&adap->devnode.lock); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - - if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { - if (adap->ops->adap_enable(adap, true)) { - mutex_unlock(&adap->devnode.lock); + if (is_invalid && adap->needs_hpd) { + if (cec_activate_cnt_inc(adap)) return; - } + /* + * Re-enable monitor-all/pin modes if needed. We warn, but + * continue if this fails as this is not a critical error. + */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, true)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); } - if (adap->monitor_all_cnt && - call_op(adap, adap_monitor_all_enable, true)) { - if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) - WARN_ON(adap->ops->adap_enable(adap, false)); - mutex_unlock(&adap->devnode.lock); - return; - } - mutex_unlock(&adap->devnode.lock); - adap->phys_addr = phys_addr; cec_post_state_event(adap); if (adap->log_addrs.num_log_addrs) @@ -1656,6 +1688,8 @@ int __cec_s_log_addrs(struct cec_adapter *adap, return -ENODEV; if (!log_addrs || log_addrs->num_log_addrs == 0) { + if (!adap->is_configuring && !adap->is_configured) + return 0; cec_adap_unconfigure(adap); adap->log_addrs.num_log_addrs = 0; for (i = 0; i < CEC_MAX_LOG_ADDRS; i++) @@ -1663,6 +1697,8 @@ int __cec_s_log_addrs(struct cec_adapter *adap, adap->log_addrs.osd_name[0] = '\0'; adap->log_addrs.vendor_id = CEC_VENDOR_ID_NONE; adap->log_addrs.cec_version = CEC_OP_CEC_VERSION_2_0; + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); return 0; } @@ -1796,6 +1832,12 @@ int __cec_s_log_addrs(struct cec_adapter *adap, sizeof(log_addrs->features[i])); } + if (!adap->needs_hpd && !adap->is_configuring && !adap->is_configured) { + int ret = cec_activate_cnt_inc(adap); + + if (ret) + return ret; + } log_addrs->log_addr_mask = adap->log_addrs.log_addr_mask; adap->log_addrs = *log_addrs; if (adap->phys_addr != CEC_PHYS_ADDR_INVALID) @@ -2099,20 +2141,37 @@ static int cec_receive_notify(struct cec_adapter *adap, struct cec_msg *msg, */ int cec_monitor_all_cnt_inc(struct cec_adapter *adap) { - int ret = 0; + int ret; - if (adap->monitor_all_cnt == 0) - ret = call_op(adap, adap_monitor_all_enable, 1); - if (ret == 0) - adap->monitor_all_cnt++; + if (adap->monitor_all_cnt++) + return 0; + + if (!adap->needs_hpd) { + ret = cec_activate_cnt_inc(adap); + if (ret) { + adap->monitor_all_cnt--; + return ret; + } + } + + ret = call_op(adap, adap_monitor_all_enable, true); + if (ret) { + adap->monitor_all_cnt--; + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); + } return ret; } void cec_monitor_all_cnt_dec(struct cec_adapter *adap) { - adap->monitor_all_cnt--; - if (adap->monitor_all_cnt == 0) - WARN_ON(call_op(adap, adap_monitor_all_enable, 0)); + if (WARN_ON(!adap->monitor_all_cnt)) + return; + if (--adap->monitor_all_cnt) + return; + WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); } /* @@ -2122,20 +2181,37 @@ void cec_monitor_all_cnt_dec(struct cec_adapter *adap) */ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap) { - int ret = 0; + int ret; - if (adap->monitor_pin_cnt == 0) - ret = call_op(adap, adap_monitor_pin_enable, 1); - if (ret == 0) - adap->monitor_pin_cnt++; + if (adap->monitor_pin_cnt++) + return 0; + + if (!adap->needs_hpd) { + ret = cec_activate_cnt_inc(adap); + if (ret) { + adap->monitor_pin_cnt--; + return ret; + } + } + + ret = call_op(adap, adap_monitor_pin_enable, true); + if (ret) { + adap->monitor_pin_cnt--; + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); + } return ret; } void cec_monitor_pin_cnt_dec(struct cec_adapter *adap) { - adap->monitor_pin_cnt--; - if (adap->monitor_pin_cnt == 0) - WARN_ON(call_op(adap, adap_monitor_pin_enable, 0)); + if (WARN_ON(!adap->monitor_pin_cnt)) + return; + if (--adap->monitor_pin_cnt) + return; + WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); } #ifdef CONFIG_DEBUG_FS @@ -2149,6 +2225,7 @@ int cec_adap_status(struct seq_file *file, void *priv) struct cec_data *data; mutex_lock(&adap->lock); + seq_printf(file, "activation count: %u\n", adap->activate_cnt); seq_printf(file, "configured: %d\n", adap->is_configured); seq_printf(file, "configuring: %d\n", adap->is_configuring); seq_printf(file, "phys_addr: %x.%x.%x.%x\n", @@ -2163,6 +2240,9 @@ int cec_adap_status(struct seq_file *file, void *priv) if (adap->monitor_all_cnt) seq_printf(file, "file handles in Monitor All mode: %u\n", adap->monitor_all_cnt); + if (adap->monitor_pin_cnt) + seq_printf(file, "file handles in Monitor Pin mode: %u\n", + adap->monitor_pin_cnt); if (adap->tx_timeouts) { seq_printf(file, "transmit timeouts: %u\n", adap->tx_timeouts); diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 899017a0e514..0be4e822211e 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -586,18 +586,6 @@ static int cec_open(struct inode *inode, struct file *filp) return err; } - /* serialize adap_enable */ - mutex_lock(&devnode->lock); - if (list_empty(&devnode->fhs) && - !adap->needs_hpd && - adap->phys_addr == CEC_PHYS_ADDR_INVALID) { - err = adap->ops->adap_enable(adap, true); - if (err) { - mutex_unlock(&devnode->lock); - kfree(fh); - return err; - } - } filp->private_data = fh; /* Queue up initial state events */ @@ -625,6 +613,7 @@ static int cec_open(struct inode *inode, struct file *filp) } #endif + mutex_lock(&devnode->lock); mutex_lock(&devnode->lock_fhs); list_add(&fh->list, &devnode->fhs); mutex_unlock(&devnode->lock_fhs); @@ -656,15 +645,10 @@ static int cec_release(struct inode *inode, struct file *filp) cec_monitor_all_cnt_dec(adap); mutex_unlock(&adap->lock); - /* serialize adap_enable */ mutex_lock(&devnode->lock); mutex_lock(&devnode->lock_fhs); list_del(&fh->list); mutex_unlock(&devnode->lock_fhs); - if (cec_is_registered(adap) && list_empty(&devnode->fhs) && - !adap->needs_hpd && adap->phys_addr == CEC_PHYS_ADDR_INVALID) { - WARN_ON(adap->ops->adap_enable(adap, false)); - } mutex_unlock(&devnode->lock); /* Unhook pending transmits from this filehandle. */ diff --git a/include/media/cec.h b/include/media/cec.h index 77346f757036..97c5f5bfcbd0 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -185,6 +185,7 @@ struct cec_adap_ops { * Drivers that need this can set this field to true after the * cec_allocate_adapter() call. * @last_initiator: the initiator of the last transmitted message. + * @activate_cnt: number of times that CEC is activated * @monitor_all_cnt: number of filehandles monitoring all msgs * @monitor_pin_cnt: number of filehandles monitoring pin changes * @follower_cnt: number of filehandles in follower mode @@ -236,6 +237,7 @@ struct cec_adapter { bool cec_pin_is_high; bool adap_controls_phys_addr; u8 last_initiator; + u32 activate_cnt; u32 monitor_all_cnt; u32 monitor_pin_cnt; u32 follower_cnt; From b64cb24a9e97f05473eb2b9b0573d25cfde50446 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Thu, 3 Mar 2022 14:01:44 +0000 Subject: [PATCH 0204/1565] media: cec: abort if the current transmit was canceled [ Upstream commit 590a8e564c6eff7e77a84e728612f1269e3c0685 ] If a transmit-in-progress was canceled, then, once the transmit is done, mark it as aborted and refrain from retrying the transmit. To signal this situation the new transmit_in_progress_aborted field is set to true. The old implementation would just set adap->transmitting to NULL and set adap->transmit_in_progress to false, but on the hardware level the transmit was still ongoing. However, the framework would think the transmit was aborted, and if a new transmit was issued, then it could overwrite the HW buffer containing the old transmit with the new transmit, leading to garbled data on the CEC bus. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 14 +++++++++++--- include/media/cec.h | 6 ++++++ 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 6415a80c9040..fd4af157f4ce 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -421,7 +421,7 @@ static void cec_flush(struct cec_adapter *adap) cec_data_cancel(data, CEC_TX_STATUS_ABORTED); } if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); + adap->transmit_in_progress_aborted = true; /* Cancel the pending timeout work. */ list_for_each_entry_safe(data, n, &adap->wait_queue, list) { @@ -572,6 +572,7 @@ int cec_thread_func(void *_adap) if (data->attempts == 0) data->attempts = attempts; + adap->transmit_in_progress_aborted = false; /* Tell the adapter to transmit, cancel on error */ if (adap->ops->adap_transmit(adap, data->attempts, signal_free_time, &data->msg)) @@ -599,6 +600,8 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, struct cec_msg *msg; unsigned int attempts_made = arb_lost_cnt + nack_cnt + low_drive_cnt + error_cnt; + bool done = status & (CEC_TX_STATUS_MAX_RETRIES | CEC_TX_STATUS_OK); + bool aborted = adap->transmit_in_progress_aborted; dprintk(2, "%s: status 0x%02x\n", __func__, status); if (attempts_made < 1) @@ -619,6 +622,7 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, goto wake_thread; } adap->transmit_in_progress = false; + adap->transmit_in_progress_aborted = false; msg = &data->msg; @@ -639,8 +643,7 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, * the hardware didn't signal that it retried itself (by setting * CEC_TX_STATUS_MAX_RETRIES), then we will retry ourselves. */ - if (data->attempts > attempts_made && - !(status & (CEC_TX_STATUS_MAX_RETRIES | CEC_TX_STATUS_OK))) { + if (!aborted && data->attempts > attempts_made && !done) { /* Retry this message */ data->attempts -= attempts_made; if (msg->timeout) @@ -655,6 +658,8 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, goto wake_thread; } + if (aborted && !done) + status |= CEC_TX_STATUS_ABORTED; data->attempts = 0; /* Always set CEC_TX_STATUS_MAX_RETRIES on error */ @@ -1576,6 +1581,9 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) WARN_ON(adap->ops->adap_enable(adap, false)); adap->last_initiator = 0xff; adap->transmit_in_progress = false; + adap->transmit_in_progress_aborted = false; + if (adap->transmitting) + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); mutex_unlock(&adap->devnode.lock); } diff --git a/include/media/cec.h b/include/media/cec.h index 97c5f5bfcbd0..31d704f36707 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -163,6 +163,11 @@ struct cec_adap_ops { * @wait_queue: queue of transmits waiting for a reply * @transmitting: CEC messages currently being transmitted * @transmit_in_progress: true if a transmit is in progress + * @transmit_in_progress_aborted: true if a transmit is in progress is to be + * aborted. This happens if the logical address is + * invalidated while the transmit is ongoing. In that + * case the transmit will finish, but will not retransmit + * and be marked as ABORTED. * @kthread_config: kthread used to configure a CEC adapter * @config_completion: used to signal completion of the config kthread * @kthread: main CEC processing thread @@ -218,6 +223,7 @@ struct cec_adapter { struct list_head wait_queue; struct cec_data *transmitting; bool transmit_in_progress; + bool transmit_in_progress_aborted; struct task_struct *kthread_config; struct completion config_completion; From 8fa7e4896fddb62dfe7262ebc0dca411c24de09f Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Sat, 13 Nov 2021 11:02:36 +0000 Subject: [PATCH 0205/1565] media: cec: correctly pass on reply results [ Upstream commit f9d0ecbf56f4b90745a6adc5b59281ad8f70ab54 ] The results of non-blocking transmits were not correctly communicated to userspace. Specifically: 1) if a non-blocking transmit was canceled, then rx_status wasn't set to 0 as it should. 2) if the non-blocking transmit succeeded, but the corresponding reply never arrived (aborted or timed out), then tx_status wasn't set to 0 as it should, and rx_status was hardcoded to ABORTED instead of the actual reason, such as TIMEOUT. In addition, adap->ops->received() was never called, so drivers that want to do message processing themselves would not be informed of the failed reply. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 48 +++++++++++++++++++------------ 1 file changed, 30 insertions(+), 18 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index fd4af157f4ce..41f4def8a31c 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -366,38 +366,48 @@ static void cec_data_completed(struct cec_data *data) /* * A pending CEC transmit needs to be cancelled, either because the CEC * adapter is disabled or the transmit takes an impossibly long time to - * finish. + * finish, or the reply timed out. * * This function is called with adap->lock held. */ -static void cec_data_cancel(struct cec_data *data, u8 tx_status) +static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) { + struct cec_adapter *adap = data->adap; + /* * It's either the current transmit, or it is a pending * transmit. Take the appropriate action to clear it. */ - if (data->adap->transmitting == data) { - data->adap->transmitting = NULL; + if (adap->transmitting == data) { + adap->transmitting = NULL; } else { list_del_init(&data->list); if (!(data->msg.tx_status & CEC_TX_STATUS_OK)) - if (!WARN_ON(!data->adap->transmit_queue_sz)) - data->adap->transmit_queue_sz--; + if (!WARN_ON(!adap->transmit_queue_sz)) + adap->transmit_queue_sz--; } if (data->msg.tx_status & CEC_TX_STATUS_OK) { data->msg.rx_ts = ktime_get_ns(); - data->msg.rx_status = CEC_RX_STATUS_ABORTED; + data->msg.rx_status = rx_status; + if (!data->blocking) + data->msg.tx_status = 0; } else { data->msg.tx_ts = ktime_get_ns(); data->msg.tx_status |= tx_status | CEC_TX_STATUS_MAX_RETRIES; data->msg.tx_error_cnt++; data->attempts = 0; + if (!data->blocking) + data->msg.rx_status = 0; } /* Queue transmitted message for monitoring purposes */ - cec_queue_msg_monitor(data->adap, &data->msg, 1); + cec_queue_msg_monitor(adap, &data->msg, 1); + + if (!data->blocking && data->msg.sequence && adap->ops->received) + /* Allow drivers to process the message first */ + adap->ops->received(adap, &data->msg); cec_data_completed(data); } @@ -418,7 +428,7 @@ static void cec_flush(struct cec_adapter *adap) while (!list_empty(&adap->transmit_queue)) { data = list_first_entry(&adap->transmit_queue, struct cec_data, list); - cec_data_cancel(data, CEC_TX_STATUS_ABORTED); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); } if (adap->transmitting) adap->transmit_in_progress_aborted = true; @@ -426,7 +436,7 @@ static void cec_flush(struct cec_adapter *adap) /* Cancel the pending timeout work. */ list_for_each_entry_safe(data, n, &adap->wait_queue, list) { if (cancel_delayed_work(&data->work)) - cec_data_cancel(data, CEC_TX_STATUS_OK); + cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_ABORTED); /* * If cancel_delayed_work returned false, then * the cec_wait_timeout function is running, @@ -516,7 +526,7 @@ int cec_thread_func(void *_adap) adap->transmitting->msg.msg); /* Just give up on this. */ cec_data_cancel(adap->transmitting, - CEC_TX_STATUS_TIMEOUT); + CEC_TX_STATUS_TIMEOUT, 0); } else { pr_warn("cec-%s: transmit timed out\n", adap->name); } @@ -576,7 +586,7 @@ int cec_thread_func(void *_adap) /* Tell the adapter to transmit, cancel on error */ if (adap->ops->adap_transmit(adap, data->attempts, signal_free_time, &data->msg)) - cec_data_cancel(data, CEC_TX_STATUS_ABORTED); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); else adap->transmit_in_progress = true; @@ -738,9 +748,7 @@ static void cec_wait_timeout(struct work_struct *work) /* Mark the message as timed out */ list_del_init(&data->list); - data->msg.rx_ts = ktime_get_ns(); - data->msg.rx_status = CEC_RX_STATUS_TIMEOUT; - cec_data_completed(data); + cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_TIMEOUT); unlock: mutex_unlock(&adap->lock); } @@ -923,8 +931,12 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, mutex_lock(&adap->lock); /* Cancel the transmit if it was interrupted */ - if (!data->completed) - cec_data_cancel(data, CEC_TX_STATUS_ABORTED); + if (!data->completed) { + if (data->msg.tx_status & CEC_TX_STATUS_OK) + cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_ABORTED); + else + cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); + } /* The transmit completed (possibly with an error) */ *msg = data->msg; @@ -1583,7 +1595,7 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) adap->transmit_in_progress = false; adap->transmit_in_progress_aborted = false; if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); mutex_unlock(&adap->devnode.lock); } From 73ef9ae980edef7ceeb2aa6e9ef38bee1fd8afd8 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Thu, 17 Mar 2022 08:51:20 +0000 Subject: [PATCH 0206/1565] media: cec: use call_op and check for !unregistered [ Upstream commit e2ed5024ac2bc27d4bfc99fd58f5ab54de8fa965 ] Use call_(void_)op consistently in the CEC core framework. Ditto for the cec pin ops. And check if !adap->devnode.unregistered before calling each op. This avoids calls to ops when the device has been unregistered and the underlying hardware may be gone. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 37 ++++++++++----------------- drivers/media/cec/core/cec-api.c | 6 +++-- drivers/media/cec/core/cec-core.c | 4 +-- drivers/media/cec/core/cec-pin-priv.h | 11 ++++++++ drivers/media/cec/core/cec-pin.c | 23 ++++++++--------- drivers/media/cec/core/cec-priv.h | 10 ++++++++ 6 files changed, 51 insertions(+), 40 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 41f4def8a31c..07ff4b0c8461 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -39,15 +39,6 @@ static void cec_fill_msg_report_features(struct cec_adapter *adap, */ #define CEC_XFER_TIMEOUT_MS (5 * 400 + 100) -#define call_op(adap, op, arg...) \ - (adap->ops->op ? adap->ops->op(adap, ## arg) : 0) - -#define call_void_op(adap, op, arg...) \ - do { \ - if (adap->ops->op) \ - adap->ops->op(adap, ## arg); \ - } while (0) - static int cec_log_addr2idx(const struct cec_adapter *adap, u8 log_addr) { int i; @@ -405,9 +396,9 @@ static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) /* Queue transmitted message for monitoring purposes */ cec_queue_msg_monitor(adap, &data->msg, 1); - if (!data->blocking && data->msg.sequence && adap->ops->received) + if (!data->blocking && data->msg.sequence) /* Allow drivers to process the message first */ - adap->ops->received(adap, &data->msg); + call_op(adap, received, &data->msg); cec_data_completed(data); } @@ -584,8 +575,8 @@ int cec_thread_func(void *_adap) adap->transmit_in_progress_aborted = false; /* Tell the adapter to transmit, cancel on error */ - if (adap->ops->adap_transmit(adap, data->attempts, - signal_free_time, &data->msg)) + if (call_op(adap, adap_transmit, data->attempts, + signal_free_time, &data->msg)) cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); else adap->transmit_in_progress = true; @@ -1311,7 +1302,7 @@ static int cec_config_log_addr(struct cec_adapter *adap, * Message not acknowledged, so this logical * address is free to use. */ - err = adap->ops->adap_log_addr(adap, log_addr); + err = call_op(adap, adap_log_addr, log_addr); if (err) return err; @@ -1328,9 +1319,8 @@ static int cec_config_log_addr(struct cec_adapter *adap, */ static void cec_adap_unconfigure(struct cec_adapter *adap) { - if (!adap->needs_hpd || - adap->phys_addr != CEC_PHYS_ADDR_INVALID) - WARN_ON(adap->ops->adap_log_addr(adap, CEC_LOG_ADDR_INVALID)); + if (!adap->needs_hpd || adap->phys_addr != CEC_PHYS_ADDR_INVALID) + WARN_ON(call_op(adap, adap_log_addr, CEC_LOG_ADDR_INVALID)); adap->log_addrs.log_addr_mask = 0; adap->is_configured = false; cec_flush(adap); @@ -1573,7 +1563,7 @@ static int cec_activate_cnt_inc(struct cec_adapter *adap) mutex_lock(&adap->devnode.lock); adap->last_initiator = 0xff; adap->transmit_in_progress = false; - ret = adap->ops->adap_enable(adap, true); + ret = call_op(adap, adap_enable, true); if (ret) adap->activate_cnt--; mutex_unlock(&adap->devnode.lock); @@ -1590,7 +1580,7 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); - WARN_ON(adap->ops->adap_enable(adap, false)); + WARN_ON(call_op(adap, adap_enable, false)); adap->last_initiator = 0xff; adap->transmit_in_progress = false; adap->transmit_in_progress_aborted = false; @@ -1964,11 +1954,10 @@ static int cec_receive_notify(struct cec_adapter *adap, struct cec_msg *msg, msg->msg[1] != CEC_MSG_CDC_MESSAGE) return 0; - if (adap->ops->received) { - /* Allow drivers to process the message first */ - if (adap->ops->received(adap, msg) != -ENOMSG) - return 0; - } + /* Allow drivers to process the message first */ + if (adap->ops->received && !adap->devnode.unregistered && + adap->ops->received(adap, msg) != -ENOMSG) + return 0; /* * REPORT_PHYSICAL_ADDR, CEC_MSG_USER_CONTROL_PRESSED and diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 0be4e822211e..feaf100a44c2 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -595,7 +595,8 @@ static int cec_open(struct inode *inode, struct file *filp) adap->conn_info.type != CEC_CONNECTOR_TYPE_NO_CONNECTOR; cec_queue_event_fh(fh, &ev, 0); #ifdef CONFIG_CEC_PIN - if (adap->pin && adap->pin->ops->read_hpd) { + if (adap->pin && adap->pin->ops->read_hpd && + !adap->devnode.unregistered) { err = adap->pin->ops->read_hpd(adap); if (err >= 0) { ev.event = err ? CEC_EVENT_PIN_HPD_HIGH : @@ -603,7 +604,8 @@ static int cec_open(struct inode *inode, struct file *filp) cec_queue_event_fh(fh, &ev, 0); } } - if (adap->pin && adap->pin->ops->read_5v) { + if (adap->pin && adap->pin->ops->read_5v && + !adap->devnode.unregistered) { err = adap->pin->ops->read_5v(adap); if (err >= 0) { ev.event = err ? CEC_EVENT_PIN_5V_HIGH : diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index 61139b0a0869..0a3345e1a0f3 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -204,7 +204,7 @@ static ssize_t cec_error_inj_write(struct file *file, line = strsep(&p, "\n"); if (!*line || *line == '#') continue; - if (!adap->ops->error_inj_parse_line(adap, line)) { + if (!call_op(adap, error_inj_parse_line, line)) { kfree(buf); return -EINVAL; } @@ -217,7 +217,7 @@ static int cec_error_inj_show(struct seq_file *sf, void *unused) { struct cec_adapter *adap = sf->private; - return adap->ops->error_inj_show(adap, sf); + return call_op(adap, error_inj_show, sf); } static int cec_error_inj_open(struct inode *inode, struct file *file) diff --git a/drivers/media/cec/core/cec-pin-priv.h b/drivers/media/cec/core/cec-pin-priv.h index f423db8855d9..5577c62e4131 100644 --- a/drivers/media/cec/core/cec-pin-priv.h +++ b/drivers/media/cec/core/cec-pin-priv.h @@ -12,6 +12,17 @@ #include #include +#define call_pin_op(pin, op, arg...) \ + ((pin && pin->ops->op && !pin->adap->devnode.unregistered) ? \ + pin->ops->op(pin->adap, ## arg) : 0) + +#define call_void_pin_op(pin, op, arg...) \ + do { \ + if (pin && pin->ops->op && \ + !pin->adap->devnode.unregistered) \ + pin->ops->op(pin->adap, ## arg); \ + } while (0) + enum cec_pin_state { /* CEC is off */ CEC_ST_OFF, diff --git a/drivers/media/cec/core/cec-pin.c b/drivers/media/cec/core/cec-pin.c index f8452a1f9fc6..447fdada20e9 100644 --- a/drivers/media/cec/core/cec-pin.c +++ b/drivers/media/cec/core/cec-pin.c @@ -135,7 +135,7 @@ static void cec_pin_update(struct cec_pin *pin, bool v, bool force) static bool cec_pin_read(struct cec_pin *pin) { - bool v = pin->ops->read(pin->adap); + bool v = call_pin_op(pin, read); cec_pin_update(pin, v, false); return v; @@ -143,13 +143,13 @@ static bool cec_pin_read(struct cec_pin *pin) static void cec_pin_low(struct cec_pin *pin) { - pin->ops->low(pin->adap); + call_void_pin_op(pin, low); cec_pin_update(pin, false, false); } static bool cec_pin_high(struct cec_pin *pin) { - pin->ops->high(pin->adap); + call_void_pin_op(pin, high); return cec_pin_read(pin); } @@ -1086,7 +1086,7 @@ static int cec_pin_thread_func(void *_adap) CEC_PIN_IRQ_UNCHANGED)) { case CEC_PIN_IRQ_DISABLE: if (irq_enabled) { - pin->ops->disable_irq(adap); + call_void_pin_op(pin, disable_irq); irq_enabled = false; } cec_pin_high(pin); @@ -1097,7 +1097,7 @@ static int cec_pin_thread_func(void *_adap) case CEC_PIN_IRQ_ENABLE: if (irq_enabled) break; - pin->enable_irq_failed = !pin->ops->enable_irq(adap); + pin->enable_irq_failed = !call_pin_op(pin, enable_irq); if (pin->enable_irq_failed) { cec_pin_to_idle(pin); hrtimer_start(&pin->timer, ns_to_ktime(0), @@ -1112,8 +1112,8 @@ static int cec_pin_thread_func(void *_adap) if (kthread_should_stop()) break; } - if (pin->ops->disable_irq && irq_enabled) - pin->ops->disable_irq(adap); + if (irq_enabled) + call_void_pin_op(pin, disable_irq); hrtimer_cancel(&pin->timer); cec_pin_read(pin); cec_pin_to_idle(pin); @@ -1208,7 +1208,7 @@ static void cec_pin_adap_status(struct cec_adapter *adap, seq_printf(file, "state: %s\n", states[pin->state].name); seq_printf(file, "tx_bit: %d\n", pin->tx_bit); seq_printf(file, "rx_bit: %d\n", pin->rx_bit); - seq_printf(file, "cec pin: %d\n", pin->ops->read(adap)); + seq_printf(file, "cec pin: %d\n", call_pin_op(pin, read)); seq_printf(file, "cec pin events dropped: %u\n", pin->work_pin_events_dropped_cnt); seq_printf(file, "irq failed: %d\n", pin->enable_irq_failed); @@ -1261,8 +1261,7 @@ static void cec_pin_adap_status(struct cec_adapter *adap, pin->rx_data_bit_too_long_cnt = 0; pin->rx_low_drive_cnt = 0; pin->tx_low_drive_cnt = 0; - if (pin->ops->status) - pin->ops->status(adap, file); + call_void_pin_op(pin, status, file); } static int cec_pin_adap_monitor_all_enable(struct cec_adapter *adap, @@ -1278,7 +1277,7 @@ static void cec_pin_adap_free(struct cec_adapter *adap) { struct cec_pin *pin = adap->pin; - if (pin->ops->free) + if (pin && pin->ops->free) pin->ops->free(adap); adap->pin = NULL; kfree(pin); @@ -1288,7 +1287,7 @@ static int cec_pin_received(struct cec_adapter *adap, struct cec_msg *msg) { struct cec_pin *pin = adap->pin; - if (pin->ops->received) + if (pin->ops->received && !adap->devnode.unregistered) return pin->ops->received(adap, msg); return -ENOMSG; } diff --git a/drivers/media/cec/core/cec-priv.h b/drivers/media/cec/core/cec-priv.h index 9bbd05053d42..b78df931aa74 100644 --- a/drivers/media/cec/core/cec-priv.h +++ b/drivers/media/cec/core/cec-priv.h @@ -17,6 +17,16 @@ pr_info("cec-%s: " fmt, adap->name, ## arg); \ } while (0) +#define call_op(adap, op, arg...) \ + ((adap->ops->op && !adap->devnode.unregistered) ? \ + adap->ops->op(adap, ## arg) : 0) + +#define call_void_op(adap, op, arg...) \ + do { \ + if (adap->ops->op && !adap->devnode.unregistered) \ + adap->ops->op(adap, ## arg); \ + } while (0) + /* devnode to cec_adapter */ #define to_cec_adapter(node) container_of(node, struct cec_adapter, devnode) From 3e938b7d40fb3920619c3a9909cdd7902bcebae8 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Tue, 10 May 2022 10:53:05 +0200 Subject: [PATCH 0207/1565] media: cec-adap.c: drop activate_cnt, use state info instead [ Upstream commit f9222f8ca18bcb1d55dd749b493b29fd8092fb82 ] Using an activation counter to decide when the enable or disable the cec adapter is not the best approach and can lead to race conditions. Change this to determining the current status of the adapter, and enable or disable the adapter accordingly. It now only needs to be called whenever there is a chance that the state changes, and it can handle enabling/disabling monitoring as well if needed. This simplifies the code and it should be a more robust approach as well. Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Stable-dep-of: 47c82aac10a6 ("media: cec: core: avoid recursive cec_claim_log_addrs") Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 152 ++++++++++++------------------ include/media/cec.h | 4 +- 2 files changed, 61 insertions(+), 95 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 07ff4b0c8461..920c108b84aa 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1548,47 +1548,59 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) } /* - * Helper functions to enable/disable the CEC adapter. + * Helper function to enable/disable the CEC adapter. * - * These functions are called with adap->lock held. + * This function is called with adap->lock held. */ -static int cec_activate_cnt_inc(struct cec_adapter *adap) +static int cec_adap_enable(struct cec_adapter *adap) { - int ret; + bool enable; + int ret = 0; - if (adap->activate_cnt++) + enable = adap->monitor_all_cnt || adap->monitor_pin_cnt || + adap->log_addrs.num_log_addrs; + if (adap->needs_hpd) + enable = enable && adap->phys_addr != CEC_PHYS_ADDR_INVALID; + + if (enable == adap->is_enabled) return 0; /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - ret = call_op(adap, adap_enable, true); - if (ret) - adap->activate_cnt--; + if (enable) { + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + ret = adap->ops->adap_enable(adap, true); + if (!ret) { + /* + * Enable monitor-all/pin modes if needed. We warn, but + * continue if this fails as this is not a critical error. + */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, true)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); + } + } else { + /* Disable monitor-all/pin modes if needed (needs_hpd == 1) */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); + WARN_ON(adap->ops->adap_enable(adap, false)); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + adap->transmit_in_progress_aborted = false; + if (adap->transmitting) + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); + } + if (!ret) + adap->is_enabled = enable; + wake_up_interruptible(&adap->kthread_waitq); mutex_unlock(&adap->devnode.lock); return ret; } -static void cec_activate_cnt_dec(struct cec_adapter *adap) -{ - if (WARN_ON(!adap->activate_cnt)) - return; - - if (--adap->activate_cnt) - return; - - /* serialize adap_enable */ - mutex_lock(&adap->devnode.lock); - WARN_ON(call_op(adap, adap_enable, false)); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - adap->transmit_in_progress_aborted = false; - if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); - mutex_unlock(&adap->devnode.lock); -} - /* Set a new physical address and send an event notifying userspace of this. * * This function is called with adap->lock held. @@ -1609,33 +1621,16 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) adap->phys_addr = CEC_PHYS_ADDR_INVALID; cec_post_state_event(adap); cec_adap_unconfigure(adap); - if (becomes_invalid && adap->needs_hpd) { - /* Disable monitor-all/pin modes if needed */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - cec_activate_cnt_dec(adap); - wake_up_interruptible(&adap->kthread_waitq); + if (becomes_invalid) { + cec_adap_enable(adap); + return; } - if (becomes_invalid) - return; - } - - if (is_invalid && adap->needs_hpd) { - if (cec_activate_cnt_inc(adap)) - return; - /* - * Re-enable monitor-all/pin modes if needed. We warn, but - * continue if this fails as this is not a critical error. - */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, true)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); } adap->phys_addr = phys_addr; + if (is_invalid) + cec_adap_enable(adap); + cec_post_state_event(adap); if (adap->log_addrs.num_log_addrs) cec_claim_log_addrs(adap, block); @@ -1692,6 +1687,7 @@ int __cec_s_log_addrs(struct cec_adapter *adap, struct cec_log_addrs *log_addrs, bool block) { u16 type_mask = 0; + int err; int i; if (adap->devnode.unregistered) @@ -1707,8 +1703,7 @@ int __cec_s_log_addrs(struct cec_adapter *adap, adap->log_addrs.osd_name[0] = '\0'; adap->log_addrs.vendor_id = CEC_VENDOR_ID_NONE; adap->log_addrs.cec_version = CEC_OP_CEC_VERSION_2_0; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); + cec_adap_enable(adap); return 0; } @@ -1842,17 +1837,12 @@ int __cec_s_log_addrs(struct cec_adapter *adap, sizeof(log_addrs->features[i])); } - if (!adap->needs_hpd && !adap->is_configuring && !adap->is_configured) { - int ret = cec_activate_cnt_inc(adap); - - if (ret) - return ret; - } log_addrs->log_addr_mask = adap->log_addrs.log_addr_mask; adap->log_addrs = *log_addrs; - if (adap->phys_addr != CEC_PHYS_ADDR_INVALID) + err = cec_adap_enable(adap); + if (!err && adap->phys_addr != CEC_PHYS_ADDR_INVALID) cec_claim_log_addrs(adap, block); - return 0; + return err; } int cec_s_log_addrs(struct cec_adapter *adap, @@ -2155,20 +2145,9 @@ int cec_monitor_all_cnt_inc(struct cec_adapter *adap) if (adap->monitor_all_cnt++) return 0; - if (!adap->needs_hpd) { - ret = cec_activate_cnt_inc(adap); - if (ret) { - adap->monitor_all_cnt--; - return ret; - } - } - - ret = call_op(adap, adap_monitor_all_enable, true); - if (ret) { + ret = cec_adap_enable(adap); + if (ret) adap->monitor_all_cnt--; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); - } return ret; } @@ -2179,8 +2158,7 @@ void cec_monitor_all_cnt_dec(struct cec_adapter *adap) if (--adap->monitor_all_cnt) return; WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); + cec_adap_enable(adap); } /* @@ -2195,20 +2173,9 @@ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap) if (adap->monitor_pin_cnt++) return 0; - if (!adap->needs_hpd) { - ret = cec_activate_cnt_inc(adap); - if (ret) { - adap->monitor_pin_cnt--; - return ret; - } - } - - ret = call_op(adap, adap_monitor_pin_enable, true); - if (ret) { + ret = cec_adap_enable(adap); + if (ret) adap->monitor_pin_cnt--; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); - } return ret; } @@ -2219,8 +2186,7 @@ void cec_monitor_pin_cnt_dec(struct cec_adapter *adap) if (--adap->monitor_pin_cnt) return; WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); + cec_adap_enable(adap); } #ifdef CONFIG_DEBUG_FS @@ -2234,7 +2200,7 @@ int cec_adap_status(struct seq_file *file, void *priv) struct cec_data *data; mutex_lock(&adap->lock); - seq_printf(file, "activation count: %u\n", adap->activate_cnt); + seq_printf(file, "enabled: %d\n", adap->is_enabled); seq_printf(file, "configured: %d\n", adap->is_configured); seq_printf(file, "configuring: %d\n", adap->is_configuring); seq_printf(file, "phys_addr: %x.%x.%x.%x\n", diff --git a/include/media/cec.h b/include/media/cec.h index 31d704f36707..df3e8738d512 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -180,6 +180,7 @@ struct cec_adap_ops { * @needs_hpd: if true, then the HDMI HotPlug Detect pin must be high * in order to transmit or receive CEC messages. This is usually a HW * limitation. + * @is_enabled: the CEC adapter is enabled * @is_configuring: the CEC adapter is configuring (i.e. claiming LAs) * @is_configured: the CEC adapter is configured (i.e. has claimed LAs) * @cec_pin_is_high: if true then the CEC pin is high. Only used with the @@ -190,7 +191,6 @@ struct cec_adap_ops { * Drivers that need this can set this field to true after the * cec_allocate_adapter() call. * @last_initiator: the initiator of the last transmitted message. - * @activate_cnt: number of times that CEC is activated * @monitor_all_cnt: number of filehandles monitoring all msgs * @monitor_pin_cnt: number of filehandles monitoring pin changes * @follower_cnt: number of filehandles in follower mode @@ -238,12 +238,12 @@ struct cec_adapter { u16 phys_addr; bool needs_hpd; + bool is_enabled; bool is_configuring; bool is_configured; bool cec_pin_is_high; bool adap_controls_phys_addr; u8 last_initiator; - u32 activate_cnt; u32 monitor_all_cnt; u32 monitor_pin_cnt; u32 follower_cnt; From 5103090f4e55252cb1b1a6084da53d63434dc24b Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Thu, 22 Feb 2024 16:17:33 +0000 Subject: [PATCH 0208/1565] media: cec: core: avoid recursive cec_claim_log_addrs [ Upstream commit 47c82aac10a6954d68f29f10d9758d016e8e5af1 ] Keep track if cec_claim_log_addrs() is running, and return -EBUSY if it is when calling CEC_ADAP_S_LOG_ADDRS. This prevents a case where cec_claim_log_addrs() could be called while it was still in progress. Signed-off-by: Hans Verkuil Reported-by: Yang, Chenyuan Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: ca684386e6e2 ("[media] cec: add HDMI CEC framework (api)") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 6 +++++- drivers/media/cec/core/cec-api.c | 2 +- include/media/cec.h | 1 + 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 920c108b84aa..42d66ae27b19 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1528,9 +1528,12 @@ static int cec_config_thread_func(void *arg) */ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) { - if (WARN_ON(adap->is_configuring || adap->is_configured)) + if (WARN_ON(adap->is_claiming_log_addrs || + adap->is_configuring || adap->is_configured)) return; + adap->is_claiming_log_addrs = true; + init_completion(&adap->config_completion); /* Ready to kick off the thread */ @@ -1545,6 +1548,7 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) wait_for_completion(&adap->config_completion); mutex_lock(&adap->lock); } + adap->is_claiming_log_addrs = false; } /* diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index feaf100a44c2..8bdf58abdf96 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -178,7 +178,7 @@ static long cec_adap_s_log_addrs(struct cec_adapter *adap, struct cec_fh *fh, CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU | CEC_LOG_ADDRS_FL_CDC_ONLY; mutex_lock(&adap->lock); - if (!adap->is_configuring && + if (!adap->is_claiming_log_addrs && !adap->is_configuring && (!log_addrs.num_log_addrs || !adap->is_configured) && !cec_is_busy(adap, fh)) { err = __cec_s_log_addrs(adap, &log_addrs, block); diff --git a/include/media/cec.h b/include/media/cec.h index df3e8738d512..23202bf439b4 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -239,6 +239,7 @@ struct cec_adapter { u16 phys_addr; bool needs_hpd; bool is_enabled; + bool is_claiming_log_addrs; bool is_configuring; bool is_configured; bool cec_pin_is_high; From 3166b2dffaee89b8c68f55d2c825bb3163a3ea55 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Tue, 30 Apr 2024 11:13:47 +0100 Subject: [PATCH 0209/1565] media: cec: core: avoid confusing "transmit timed out" message [ Upstream commit cbe499977bc36fedae89f0a0d7deb4ccde9798fe ] If, when waiting for a transmit to finish, the wait is interrupted, then you might get a "transmit timed out" message, even though the transmit was interrupted and did not actually time out. Set transmit_in_progress_aborted to true if the wait_for_completion_killable() call was interrupted and ensure that the transmit is properly marked as ABORTED. Signed-off-by: Hans Verkuil Reported-by: Yang, Chenyuan Closes: https://lore.kernel.org/linux-media/PH7PR11MB57688E64ADE4FE82E658D86DA09EA@PH7PR11MB5768.namprd11.prod.outlook.com/ Fixes: 590a8e564c6e ("media: cec: abort if the current transmit was canceled") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/cec/core/cec-adap.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 42d66ae27b19..a5853c82d5e4 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -502,6 +502,15 @@ int cec_thread_func(void *_adap) goto unlock; } + if (adap->transmit_in_progress && + adap->transmit_in_progress_aborted) { + if (adap->transmitting) + cec_data_cancel(adap->transmitting, + CEC_TX_STATUS_ABORTED, 0); + adap->transmit_in_progress = false; + adap->transmit_in_progress_aborted = false; + goto unlock; + } if (adap->transmit_in_progress && timeout) { /* * If we timeout, then log that. Normally this does @@ -755,6 +764,7 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, { struct cec_data *data; bool is_raw = msg_is_raw(msg); + int err; if (adap->devnode.unregistered) return -ENODEV; @@ -917,10 +927,13 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, * Release the lock and wait, retake the lock afterwards. */ mutex_unlock(&adap->lock); - wait_for_completion_killable(&data->c); + err = wait_for_completion_killable(&data->c); cancel_delayed_work_sync(&data->work); mutex_lock(&adap->lock); + if (err) + adap->transmit_in_progress_aborted = true; + /* Cancel the transmit if it was interrupted */ if (!data->completed) { if (data->msg.tx_status & CEC_TX_STATUS_OK) From a2b0c3a6d46006872a30097c9209f7216703a94b Mon Sep 17 00:00:00 2001 From: Zhu Yanjun Date: Mon, 6 May 2024 09:55:38 +0200 Subject: [PATCH 0210/1565] null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION() [ Upstream commit 9e6727f824edcdb8fdd3e6e8a0862eb49546e1cd ] No functional changes intended. Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver") Signed-off-by: Zhu Yanjun Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20240506075538.6064-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/block/null_blk/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index ee1c3f476a3a..862a9420df52 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -2040,4 +2040,5 @@ module_init(null_init); module_exit(null_exit); MODULE_AUTHOR("Jens Axboe "); +MODULE_DESCRIPTION("multi queue aware block test driver"); MODULE_LICENSE("GPL"); From 8fb8be0e3b6d6bd563dec253b9a20e49aa072fdb Mon Sep 17 00:00:00 2001 From: Matti Vaittinen Date: Thu, 16 May 2024 11:54:41 +0300 Subject: [PATCH 0211/1565] regulator: bd71828: Don't overwrite runtime voltages [ Upstream commit 0f9f7c63c415e287cd57b5c98be61eb320dedcfc ] Some of the regulators on the BD71828 have common voltage setting for RUN/SUSPEND/IDLE/LPSR states. The enable control can be set for each state though. The driver allows setting the voltage values for these states via device-tree. As a side effect, setting the voltages for SUSPEND/IDLE/LPSR will also change the RUN level voltage which is not desired and can break the system. The comment in code reflects this behaviour, but it is likely to not make people any happier. The right thing to do is to allow setting the enable/disable state at SUSPEND/IDLE/LPSR via device-tree, but to disallow setting state specific voltages for those regulators. BUCK1 is a bit different. It only shares the SUSPEND and LPSR state voltages. The former behaviour of allowing to silently overwrite the SUSPEND state voltage by LPSR state voltage is also changed here so that the SUSPEND voltage is prioritized over LPSR voltage. Prevent setting PMIC state specific voltages for regulators which do not support it. Signed-off-by: Matti Vaittinen Fixes: 522498f8cb8c ("regulator: bd71828: Basic support for ROHM bd71828 PMIC regulators") Link: https://msgid.link/r/e1883ae1e3ae5668f1030455d4750923561f3d68.1715848512.git.mazziesaccount@gmail.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/regulator/bd71828-regulator.c | 58 +-------------------------- 1 file changed, 2 insertions(+), 56 deletions(-) diff --git a/drivers/regulator/bd71828-regulator.c b/drivers/regulator/bd71828-regulator.c index 85c0b9000963..fc08f6f61655 100644 --- a/drivers/regulator/bd71828-regulator.c +++ b/drivers/regulator/bd71828-regulator.c @@ -234,14 +234,11 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { .suspend_reg = BD71828_REG_BUCK1_SUSP_VOLT, .suspend_mask = BD71828_MASK_BUCK1267_VOLT, .suspend_on_mask = BD71828_MASK_SUSP_EN, - .lpsr_on_mask = BD71828_MASK_LPSR_EN, /* * LPSR voltage is same as SUSPEND voltage. Allow - * setting it so that regulator can be set enabled at - * LPSR state + * only enabling/disabling regulator for LPSR state */ - .lpsr_reg = BD71828_REG_BUCK1_SUSP_VOLT, - .lpsr_mask = BD71828_MASK_BUCK1267_VOLT, + .lpsr_on_mask = BD71828_MASK_LPSR_EN, }, .reg_inits = buck1_inits, .reg_init_amnt = ARRAY_SIZE(buck1_inits), @@ -312,13 +309,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_BUCK3_VOLT, - .idle_reg = BD71828_REG_BUCK3_VOLT, - .suspend_reg = BD71828_REG_BUCK3_VOLT, - .lpsr_reg = BD71828_REG_BUCK3_VOLT, .run_mask = BD71828_MASK_BUCK3_VOLT, - .idle_mask = BD71828_MASK_BUCK3_VOLT, - .suspend_mask = BD71828_MASK_BUCK3_VOLT, - .lpsr_mask = BD71828_MASK_BUCK3_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -353,13 +344,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_BUCK4_VOLT, - .idle_reg = BD71828_REG_BUCK4_VOLT, - .suspend_reg = BD71828_REG_BUCK4_VOLT, - .lpsr_reg = BD71828_REG_BUCK4_VOLT, .run_mask = BD71828_MASK_BUCK4_VOLT, - .idle_mask = BD71828_MASK_BUCK4_VOLT, - .suspend_mask = BD71828_MASK_BUCK4_VOLT, - .lpsr_mask = BD71828_MASK_BUCK4_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -394,13 +379,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_BUCK5_VOLT, - .idle_reg = BD71828_REG_BUCK5_VOLT, - .suspend_reg = BD71828_REG_BUCK5_VOLT, - .lpsr_reg = BD71828_REG_BUCK5_VOLT, .run_mask = BD71828_MASK_BUCK5_VOLT, - .idle_mask = BD71828_MASK_BUCK5_VOLT, - .suspend_mask = BD71828_MASK_BUCK5_VOLT, - .lpsr_mask = BD71828_MASK_BUCK5_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -509,13 +488,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_LDO1_VOLT, - .idle_reg = BD71828_REG_LDO1_VOLT, - .suspend_reg = BD71828_REG_LDO1_VOLT, - .lpsr_reg = BD71828_REG_LDO1_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -549,13 +522,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_LDO2_VOLT, - .idle_reg = BD71828_REG_LDO2_VOLT, - .suspend_reg = BD71828_REG_LDO2_VOLT, - .lpsr_reg = BD71828_REG_LDO2_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -589,13 +556,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_LDO3_VOLT, - .idle_reg = BD71828_REG_LDO3_VOLT, - .suspend_reg = BD71828_REG_LDO3_VOLT, - .lpsr_reg = BD71828_REG_LDO3_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -630,13 +591,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_LDO4_VOLT, - .idle_reg = BD71828_REG_LDO4_VOLT, - .suspend_reg = BD71828_REG_LDO4_VOLT, - .lpsr_reg = BD71828_REG_LDO4_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -671,13 +626,7 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { ROHM_DVS_LEVEL_SUSPEND | ROHM_DVS_LEVEL_LPSR, .run_reg = BD71828_REG_LDO5_VOLT, - .idle_reg = BD71828_REG_LDO5_VOLT, - .suspend_reg = BD71828_REG_LDO5_VOLT, - .lpsr_reg = BD71828_REG_LDO5_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, @@ -736,9 +685,6 @@ static const struct bd71828_regulator_data bd71828_rdata[] = { .suspend_reg = BD71828_REG_LDO7_VOLT, .lpsr_reg = BD71828_REG_LDO7_VOLT, .run_mask = BD71828_MASK_LDO_VOLT, - .idle_mask = BD71828_MASK_LDO_VOLT, - .suspend_mask = BD71828_MASK_LDO_VOLT, - .lpsr_mask = BD71828_MASK_LDO_VOLT, .idle_on_mask = BD71828_MASK_IDLE_EN, .suspend_on_mask = BD71828_MASK_SUSP_EN, .lpsr_on_mask = BD71828_MASK_LPSR_EN, From ee6a497844783f55d5505bf7307832510353e24f Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 4 Feb 2024 21:20:03 +0900 Subject: [PATCH 0212/1565] x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y [ Upstream commit 66ee3636eddcc82ab82b539d08b85fb5ac1dff9b ] It took me some time to understand the purpose of the tricky code at the end of arch/x86/Kconfig.debug. Without it, the following would be shown: WARNING: unmet direct dependencies detected for FRAME_POINTER because 81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection") removed 'select ARCH_WANT_FRAME_POINTERS'. The correct and more straightforward approach should have been to move it where 'select FRAME_POINTER' is located. Several architectures properly handle the conditional selection of ARCH_WANT_FRAME_POINTERS. For example, 'config UNWINDER_FRAME_POINTER' in arch/arm/Kconfig.debug. Fixes: 81d387190039 ("x86/kconfig: Consolidate unwinders into multiple choice selection") Signed-off-by: Masahiro Yamada Signed-off-by: Borislav Petkov (AMD) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20240204122003.53795-1-masahiroy@kernel.org Signed-off-by: Sasha Levin --- arch/x86/Kconfig.debug | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 27b5e2bc6a01..e6546aa448de 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -258,6 +258,7 @@ config UNWINDER_ORC config UNWINDER_FRAME_POINTER bool "Frame pointer unwinder" + select ARCH_WANT_FRAME_POINTERS select FRAME_POINTER help This option enables the frame pointer unwinder for unwinding kernel @@ -281,7 +282,3 @@ config UNWINDER_GUESS overhead. endchoice - -config FRAME_POINTER - depends on !UNWINDER_ORC && !UNWINDER_GUESS - bool From f80b786ab0550d0020191a59077b2c7e069db2d1 Mon Sep 17 00:00:00 2001 From: Ryosuke Yasuoka Date: Sun, 19 May 2024 18:43:03 +0900 Subject: [PATCH 0213/1565] nfc: nci: Fix uninit-value in nci_rx_work [ Upstream commit e4a87abf588536d1cdfb128595e6e680af5cf3ed ] syzbot reported the following uninit-value access issue [1] nci_rx_work() parses received packet from ndev->rx_q. It should be validated header size, payload size and total packet size before processing the packet. If an invalid packet is detected, it should be silently discarded. Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Reported-and-tested-by: syzbot+d7b4dc6cd50410152534@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d7b4dc6cd50410152534 [1] Signed-off-by: Ryosuke Yasuoka Reviewed-by: Krzysztof Kozlowski Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/nfc/nci/core.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index d8002065baae..3d67d558ed48 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -1452,6 +1452,19 @@ int nci_core_ntf_packet(struct nci_dev *ndev, __u16 opcode, ndev->ops->n_core_ops); } +static bool nci_valid_size(struct sk_buff *skb) +{ + unsigned int hdr_size = NCI_CTRL_HDR_SIZE; + BUILD_BUG_ON(NCI_CTRL_HDR_SIZE != NCI_DATA_HDR_SIZE); + + if (skb->len < hdr_size || + !nci_plen(skb->data) || + skb->len < hdr_size + nci_plen(skb->data)) { + return false; + } + return true; +} + /* ---- NCI TX Data worker thread ---- */ static void nci_tx_work(struct work_struct *work) @@ -1502,7 +1515,7 @@ static void nci_rx_work(struct work_struct *work) nfc_send_to_raw_sock(ndev->nfc_dev, skb, RAW_PAYLOAD_NCI, NFC_DIRECTION_RX); - if (!nci_plen(skb->data)) { + if (!nci_valid_size(skb)) { kfree_skb(skb); break; } From 6f39d5aae6767ce7756c2ef9fde53333b127e6a4 Mon Sep 17 00:00:00 2001 From: Shenghao Ding Date: Sat, 18 May 2024 11:35:15 +0800 Subject: [PATCH 0214/1565] ASoC: tas2552: Add TX path for capturing AUDIO-OUT data [ Upstream commit 7078ac4fd179a68d0bab448004fcd357e7a45f8d ] TAS2552 is a Smartamp with I/V sense data, add TX path to support capturing I/V data. Fixes: 38803ce7b53b ("ASoC: codecs: tas*: merge .digital_mute() into .mute_stream()") Signed-off-by: Shenghao Ding Link: https://msgid.link/r/20240518033515.866-1-shenghao-ding@ti.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/codecs/tas2552.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/sound/soc/codecs/tas2552.c b/sound/soc/codecs/tas2552.c index bd00c35116cd..f045876c12f1 100644 --- a/sound/soc/codecs/tas2552.c +++ b/sound/soc/codecs/tas2552.c @@ -2,7 +2,8 @@ /* * tas2552.c - ALSA SoC Texas Instruments TAS2552 Mono Audio Amplifier * - * Copyright (C) 2014 Texas Instruments Incorporated - https://www.ti.com + * Copyright (C) 2014 - 2024 Texas Instruments Incorporated - + * https://www.ti.com * * Author: Dan Murphy */ @@ -119,12 +120,14 @@ static const struct snd_soc_dapm_widget tas2552_dapm_widgets[] = &tas2552_input_mux_control), SND_SOC_DAPM_AIF_IN("DAC IN", "DAC Playback", 0, SND_SOC_NOPM, 0, 0), + SND_SOC_DAPM_AIF_OUT("ASI OUT", "DAC Capture", 0, SND_SOC_NOPM, 0, 0), SND_SOC_DAPM_DAC("DAC", NULL, SND_SOC_NOPM, 0, 0), SND_SOC_DAPM_OUT_DRV("ClassD", TAS2552_CFG_2, 7, 0, NULL, 0), SND_SOC_DAPM_SUPPLY("PLL", TAS2552_CFG_2, 3, 0, NULL, 0), SND_SOC_DAPM_POST("Post Event", tas2552_post_event), - SND_SOC_DAPM_OUTPUT("OUT") + SND_SOC_DAPM_OUTPUT("OUT"), + SND_SOC_DAPM_INPUT("DMIC") }; static const struct snd_soc_dapm_route tas2552_audio_map[] = { @@ -134,6 +137,7 @@ static const struct snd_soc_dapm_route tas2552_audio_map[] = { {"ClassD", NULL, "Input selection"}, {"OUT", NULL, "ClassD"}, {"ClassD", NULL, "PLL"}, + {"ASI OUT", NULL, "DMIC"} }; #ifdef CONFIG_PM @@ -538,6 +542,13 @@ static struct snd_soc_dai_driver tas2552_dai[] = { .rates = SNDRV_PCM_RATE_8000_192000, .formats = TAS2552_FORMATS, }, + .capture = { + .stream_name = "Capture", + .channels_min = 2, + .channels_max = 2, + .rates = SNDRV_PCM_RATE_8000_192000, + .formats = TAS2552_FORMATS, + }, .ops = &tas2552_speaker_dai_ops, }, }; From f2b326b774501df6d7aa0aa2df4ba5c05b2f33f0 Mon Sep 17 00:00:00 2001 From: Dan Aloni Date: Thu, 25 Apr 2024 13:49:38 +0300 Subject: [PATCH 0215/1565] sunrpc: fix NFSACL RPC retry on soft mount [ Upstream commit 0dc9f430027b8bd9073fdafdfcdeb1a073ab5594 ] It used to be quite awhile ago since 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()'), in 2012, that `cl_timeout` was copied in so that all mount parameters propagate to NFSACL clients. However since that change, if mount options as follows are given: soft,timeo=50,retrans=16,vers=3 The resultant NFSACL client receives: cl_softrtry: 1 cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0 These values lead to NFSACL operations not being retried under the condition of transient network outages with soft mount. Instead, getacl call fails after 60 seconds with EIO. The simple fix is to pass the existing client's `cl_timeout` as the new client timeout. Cc: Chuck Lever Cc: Benjamin Coddington Link: https://lore.kernel.org/all/20231105154857.ryakhmgaptq3hb6b@gmail.com/T/ Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()') Signed-off-by: Dan Aloni Reviewed-by: Benjamin Coddington Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- net/sunrpc/clnt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index a5ce9b937c42..196a3b11d150 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -970,6 +970,7 @@ struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *old, .authflavor = old->cl_auth->au_flavor, .cred = old->cl_cred, .stats = old->cl_stats, + .timeout = old->cl_timeout, }; struct rpc_clnt *clnt; int err; From 1c65ebce7d37ee2c779b110bbe9139871183f293 Mon Sep 17 00:00:00 2001 From: Dan Aloni Date: Mon, 6 May 2024 12:37:59 +0300 Subject: [PATCH 0216/1565] rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL [ Upstream commit 4836da219781ec510c4c0303df901aa643507a7a ] Under the scenario of IB device bonding, when bringing down one of the ports, or all ports, we saw xprtrdma entering a non-recoverable state where it is not even possible to complete the disconnect and shut it down the mount, requiring a reboot. Following debug, we saw that transport connect never ended after receiving the RDMA_CM_EVENT_DEVICE_REMOVAL callback. The DEVICE_REMOVAL callback is irrespective of whether the CM_ID is connected, and ESTABLISHED may not have happened. So need to work with each of these states accordingly. Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed') Cc: Sagi Grimberg Signed-off-by: Dan Aloni Reviewed-by: Sagi Grimberg Reviewed-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- net/sunrpc/xprtrdma/verbs.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index d015576f3081..9e9df38b29f7 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -268,7 +268,11 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) case RDMA_CM_EVENT_DEVICE_REMOVAL: pr_info("rpcrdma: removing device %s for %pISpc\n", ep->re_id->device->name, sap); - fallthrough; + switch (xchg(&ep->re_connect_status, -ENODEV)) { + case 0: goto wake_connect_worker; + case 1: goto disconnected; + } + return 0; case RDMA_CM_EVENT_ADDR_CHANGE: ep->re_connect_status = -ENODEV; goto disconnected; From daf341e0a2318b813427d5a78788c86f4a7f02be Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Fri, 17 May 2024 08:54:35 +0800 Subject: [PATCH 0217/1565] ipv6: sr: fix memleak in seg6_hmac_init_algo [ Upstream commit efb9f4f19f8e37fde43dfecebc80292d179f56c6 ] seg6_hmac_init_algo returns without cleaning up the previous allocations if one fails, so it's going to leak all that memory and the crypto tfms. Update seg6_hmac_exit to only free the memory when allocated, so we can reuse the code directly. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Sabrina Dubroca Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/ Signed-off-by: Hangbin Liu Reviewed-by: Simon Horman Reviewed-by: Sabrina Dubroca Link: https://lore.kernel.org/r/20240517005435.2600277-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ipv6/seg6_hmac.c | 42 ++++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c index 552bce1fdfb9..2e2b94ae6355 100644 --- a/net/ipv6/seg6_hmac.c +++ b/net/ipv6/seg6_hmac.c @@ -355,6 +355,7 @@ static int seg6_hmac_init_algo(void) struct crypto_shash *tfm; struct shash_desc *shash; int i, alg_count, cpu; + int ret = -ENOMEM; alg_count = ARRAY_SIZE(hmac_algos); @@ -365,12 +366,14 @@ static int seg6_hmac_init_algo(void) algo = &hmac_algos[i]; algo->tfms = alloc_percpu(struct crypto_shash *); if (!algo->tfms) - return -ENOMEM; + goto error_out; for_each_possible_cpu(cpu) { tfm = crypto_alloc_shash(algo->name, 0, 0); - if (IS_ERR(tfm)) - return PTR_ERR(tfm); + if (IS_ERR(tfm)) { + ret = PTR_ERR(tfm); + goto error_out; + } p_tfm = per_cpu_ptr(algo->tfms, cpu); *p_tfm = tfm; } @@ -382,18 +385,22 @@ static int seg6_hmac_init_algo(void) algo->shashs = alloc_percpu(struct shash_desc *); if (!algo->shashs) - return -ENOMEM; + goto error_out; for_each_possible_cpu(cpu) { shash = kzalloc_node(shsize, GFP_KERNEL, cpu_to_node(cpu)); if (!shash) - return -ENOMEM; + goto error_out; *per_cpu_ptr(algo->shashs, cpu) = shash; } } return 0; + +error_out: + seg6_hmac_exit(); + return ret; } int __init seg6_hmac_init(void) @@ -413,22 +420,29 @@ int __net_init seg6_hmac_net_init(struct net *net) void seg6_hmac_exit(void) { struct seg6_hmac_algo *algo = NULL; + struct crypto_shash *tfm; + struct shash_desc *shash; int i, alg_count, cpu; alg_count = ARRAY_SIZE(hmac_algos); for (i = 0; i < alg_count; i++) { algo = &hmac_algos[i]; - for_each_possible_cpu(cpu) { - struct crypto_shash *tfm; - struct shash_desc *shash; - shash = *per_cpu_ptr(algo->shashs, cpu); - kfree(shash); - tfm = *per_cpu_ptr(algo->tfms, cpu); - crypto_free_shash(tfm); + if (algo->shashs) { + for_each_possible_cpu(cpu) { + shash = *per_cpu_ptr(algo->shashs, cpu); + kfree(shash); + } + free_percpu(algo->shashs); + } + + if (algo->tfms) { + for_each_possible_cpu(cpu) { + tfm = *per_cpu_ptr(algo->tfms, cpu); + crypto_free_shash(tfm); + } + free_percpu(algo->tfms); } - free_percpu(algo->tfms); - free_percpu(algo->shashs); } } EXPORT_SYMBOL(seg6_hmac_exit); From 42bd4e491cf150fbf881ad081df14e5d0b12a320 Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Wed, 16 Jun 2021 14:19:33 -0700 Subject: [PATCH 0218/1565] params: lift param_set_uint_minmax to common code [ Upstream commit 2a14c9ae15a38148484a128b84bff7e9ffd90d68 ] It is a useful helper hence move it to common code so others can enjoy it. Suggested-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Hannes Reinecke Signed-off-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Stable-dep-of: 3ebc46ca8675 ("tcp: Fix shift-out-of-bounds in dctcp_update_alpha().") Signed-off-by: Sasha Levin --- include/linux/moduleparam.h | 2 ++ kernel/params.c | 18 ++++++++++++++++++ net/sunrpc/xprtsock.c | 18 ------------------ 3 files changed, 20 insertions(+), 18 deletions(-) diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h index 6388eb9734a5..f25a1c484390 100644 --- a/include/linux/moduleparam.h +++ b/include/linux/moduleparam.h @@ -431,6 +431,8 @@ extern int param_get_int(char *buffer, const struct kernel_param *kp); extern const struct kernel_param_ops param_ops_uint; extern int param_set_uint(const char *val, const struct kernel_param *kp); extern int param_get_uint(char *buffer, const struct kernel_param *kp); +int param_set_uint_minmax(const char *val, const struct kernel_param *kp, + unsigned int min, unsigned int max); #define param_check_uint(name, p) __param_check(name, p, unsigned int) extern const struct kernel_param_ops param_ops_long; diff --git a/kernel/params.c b/kernel/params.c index 164d79330849..eb00abef7076 100644 --- a/kernel/params.c +++ b/kernel/params.c @@ -243,6 +243,24 @@ STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", kstrtoul); STANDARD_PARAM_DEF(ullong, unsigned long long, "%llu", kstrtoull); STANDARD_PARAM_DEF(hexint, unsigned int, "%#08x", kstrtouint); +int param_set_uint_minmax(const char *val, const struct kernel_param *kp, + unsigned int min, unsigned int max) +{ + unsigned int num; + int ret; + + if (!val) + return -EINVAL; + ret = kstrtouint(val, 0, &num); + if (ret) + return ret; + if (num < min || num > max) + return -EINVAL; + *((unsigned int *)kp->arg) = num; + return 0; +} +EXPORT_SYMBOL_GPL(param_set_uint_minmax); + int param_set_charp(const char *val, const struct kernel_param *kp) { if (strlen(val) > 1024) { diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index ae5b5380f0f0..0666f981618a 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -3166,24 +3166,6 @@ void cleanup_socket_xprt(void) xprt_unregister_transport(&xs_bc_tcp_transport); } -static int param_set_uint_minmax(const char *val, - const struct kernel_param *kp, - unsigned int min, unsigned int max) -{ - unsigned int num; - int ret; - - if (!val) - return -EINVAL; - ret = kstrtouint(val, 0, &num); - if (ret) - return ret; - if (num < min || num > max) - return -EINVAL; - *((unsigned int *)kp->arg) = num; - return 0; -} - static int param_set_portnr(const char *val, const struct kernel_param *kp) { return param_set_uint_minmax(val, kp, From e9b2f60636d18dfd0dd4965b3316f88dfd6a2b31 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 17 May 2024 18:16:26 +0900 Subject: [PATCH 0219/1565] tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). [ Upstream commit 3ebc46ca8675de6378e3f8f40768e180bb8afa66 ] In dctcp_update_alpha(), we use a module parameter dctcp_shift_g as follows: alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g); ... delivered_ce <<= (10 - dctcp_shift_g); It seems syzkaller started fuzzing module parameters and triggered shift-out-of-bounds [0] by setting 100 to dctcp_shift_g: memcpy((void*)0x20000080, "/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47); res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul, /*flags=*/2ul, /*mode=*/0ul); memcpy((void*)0x20000000, "100\000", 4); syscall(__NR_write, /*fd=*/r[0], /*val=*/0x20000000ul, /*len=*/4ul); Let's limit the max value of dctcp_shift_g by param_set_uint_minmax(). With this patch: # echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g # cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g 10 # echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g -bash: echo: write error: Invalid argument [0]: UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12 shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int') CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f3561 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468 dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143 tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline] tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948 tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711 tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937 sk_backlog_rcv include/net/sock.h:1106 [inline] __release_sock+0x20f/0x350 net/core/sock.c:2983 release_sock+0x61/0x1f0 net/core/sock.c:3549 mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907 mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976 __mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072 mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127 inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437 __sock_release net/socket.c:659 [inline] sock_close+0xc0/0x240 net/socket.c:1421 __fput+0x41b/0x890 fs/file_table.c:422 task_work_run+0x23b/0x300 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x9c8/0x2540 kernel/exit.c:878 do_group_exit+0x201/0x2b0 kernel/exit.c:1027 __do_sys_exit_group kernel/exit.c:1038 [inline] __se_sys_exit_group kernel/exit.c:1036 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x67/0x6f RIP: 0033:0x7f6c2b5005b6 Code: Unable to access opcode bytes at 0x7f6c2b50058c. RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6 RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001 RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0 R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 Reported-by: syzkaller Reported-by: Yue Sun Reported-by: xingwei lee Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/ Fixes: e3118e8359bb ("net: tcp: add DCTCP congestion control algorithm") Signed-off-by: Kuniyuki Iwashima Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240517091626.32772-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ipv4/tcp_dctcp.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 79f705450c16..be2c97e907ae 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -55,7 +55,18 @@ struct dctcp { }; static unsigned int dctcp_shift_g __read_mostly = 4; /* g = 1/2^4 */ -module_param(dctcp_shift_g, uint, 0644); + +static int dctcp_shift_g_set(const char *val, const struct kernel_param *kp) +{ + return param_set_uint_minmax(val, kp, 0, 10); +} + +static const struct kernel_param_ops dctcp_shift_g_ops = { + .set = dctcp_shift_g_set, + .get = param_get_uint, +}; + +module_param_cb(dctcp_shift_g, &dctcp_shift_g_ops, &dctcp_shift_g, 0644); MODULE_PARM_DESC(dctcp_shift_g, "parameter g for updating dctcp_alpha"); static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA; From 8cae65ace421e098d39455512b3366a20ab6e5fe Mon Sep 17 00:00:00 2001 From: Aaron Conole Date: Thu, 16 May 2024 16:09:41 -0400 Subject: [PATCH 0220/1565] openvswitch: Set the skbuff pkt_type for proper pmtud support. [ Upstream commit 30a92c9e3d6b073932762bef2ac66f4ee784c657 ] Open vSwitch is originally intended to switch at layer 2, only dealing with Ethernet frames. With the introduction of l3 tunnels support, it crossed into the realm of needing to care a bit about some routing details when making forwarding decisions. If an oversized packet would need to be fragmented during this forwarding decision, there is a chance for pmtu to get involved and generate a routing exception. This is gated by the skbuff->pkt_type field. When a flow is already loaded into the openvswitch module this field is set up and transitioned properly as a packet moves from one port to another. In the case that a packet execute is invoked after a flow is newly installed this field is not properly initialized. This causes the pmtud mechanism to omit sending the required exception messages across the tunnel boundary and a second attempt needs to be made to make sure that the routing exception is properly setup. To fix this, we set the outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get to the openvswitch module via a port device or packet command. Even for bridge ports as users, the pkt_type needs to be reset when doing the transmit as the packet is truly outgoing and routing needs to get involved post packet transformations, in the case of VXLAN/GENEVE/udp-tunnel packets. In general, the pkt_type on output gets ignored, since we go straight to the driver, but in the case of tunnel ports they go through IP routing layer. This issue is periodically encountered in complex setups, such as large openshift deployments, where multiple sets of tunnel traversal occurs. A way to recreate this is with the ovn-heater project that can setup a networking environment which mimics such large deployments. We need larger environments for this because we need to ensure that flow misses occur. In these environment, without this patch, we can see: ./ovn_cluster.sh start podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200 podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache podman exec ovn-chassis-1 ip netns exec sw01p1 \ ping 21.0.0.3 -M do -s 1300 -c2 PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142) --- 21.0.0.3 ping statistics --- ... Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not sent into the server. With this patch, setting the pkt_type, we see the following: podman exec ovn-chassis-1 ip netns exec sw01p1 \ ping 21.0.0.3 -M do -s 1300 -c2 PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data. From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222) ping: local error: message too long, mtu=1222 --- 21.0.0.3 ping statistics --- ... In this case, the first ping request receives the FRAG_NEEDED message and a local routing exception is created. Tested-by: Jaime Caamano Reported-at: https://issues.redhat.com/browse/FDP-164 Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.") Signed-off-by: Aaron Conole Acked-by: Eelco Chaudron Link: https://lore.kernel.org/r/20240516200941.16152-1-aconole@redhat.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/openvswitch/actions.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 80fee9d118ee..4095456f413d 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -923,6 +923,12 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port, pskb_trim(skb, ovs_mac_header_len(key)); } + /* Need to set the pkt_type to involve the routing layer. The + * packet movement through the OVS datapath doesn't generally + * use routing, but this is needed for tunnel cases. + */ + skb->pkt_type = PACKET_OUTGOING; + if (likely(!mru || (skb->len <= mru + vport->dev->hard_header_len))) { ovs_vport_send(vport, skb, ovs_key_mac_proto(key)); From 461a760d578b2b2c2faac3040b6b7c77baf128f8 Mon Sep 17 00:00:00 2001 From: Jiangfeng Xiao Date: Mon, 20 May 2024 21:34:37 +0800 Subject: [PATCH 0221/1565] arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY [ Upstream commit ffbf4fb9b5c12ff878a10ea17997147ea4ebea6f ] When CONFIG_DEBUG_BUGVERBOSE=n, we fail to add necessary padding bytes to bug_table entries, and as a result the last entry in a bug table will be ignored, potentially leading to an unexpected panic(). All prior entries in the table will be handled correctly. The arm64 ABI requires that struct fields of up to 8 bytes are naturally-aligned, with padding added within a struct such that struct are suitably aligned within arrays. When CONFIG_DEBUG_BUGVERPOSE=y, the layout of a bug_entry is: struct bug_entry { signed int bug_addr_disp; // 4 bytes signed int file_disp; // 4 bytes unsigned short line; // 2 bytes unsigned short flags; // 2 bytes } ... with 12 bytes total, requiring 4-byte alignment. When CONFIG_DEBUG_BUGVERBOSE=n, the layout of a bug_entry is: struct bug_entry { signed int bug_addr_disp; // 4 bytes unsigned short flags; // 2 bytes < implicit padding > // 2 bytes } ... with 8 bytes total, with 6 bytes of data and 2 bytes of trailing padding, requiring 4-byte alginment. When we create a bug_entry in assembly, we align the start of the entry to 4 bytes, which implicitly handles padding for any prior entries. However, we do not align the end of the entry, and so when CONFIG_DEBUG_BUGVERBOSE=n, the final entry lacks the trailing padding bytes. For the main kernel image this is not a problem as find_bug() doesn't depend on the trailing padding bytes when searching for entries: for (bug = __start___bug_table; bug < __stop___bug_table; ++bug) if (bugaddr == bug_addr(bug)) return bug; However for modules, module_bug_finalize() depends on the trailing bytes when calculating the number of entries: mod->num_bugs = sechdrs[i].sh_size / sizeof(struct bug_entry); ... and as the last bug_entry lacks the necessary padding bytes, this entry will not be counted, e.g. in the case of a single entry: sechdrs[i].sh_size == 6 sizeof(struct bug_entry) == 8; sechdrs[i].sh_size / sizeof(struct bug_entry) == 0; Consequently module_find_bug() will miss the last bug_entry when it does: for (i = 0; i < mod->num_bugs; ++i, ++bug) if (bugaddr == bug_addr(bug)) goto out; ... which can lead to a kenrel panic due to an unhandled bug. This can be demonstrated with the following module: static int __init buginit(void) { WARN(1, "hello\n"); return 0; } static void __exit bugexit(void) { } module_init(buginit); module_exit(bugexit); MODULE_LICENSE("GPL"); ... which will trigger a kernel panic when loaded: ------------[ cut here ]------------ hello Unexpected kernel BRK exception at EL1 Internal error: BRK handler: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: hello(O+) CPU: 0 PID: 50 Comm: insmod Tainted: G O 6.9.1 #8 Hardware name: linux,dummy-virt (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : buginit+0x18/0x1000 [hello] lr : buginit+0x18/0x1000 [hello] sp : ffff800080533ae0 x29: ffff800080533ae0 x28: 0000000000000000 x27: 0000000000000000 x26: ffffaba8c4e70510 x25: ffff800080533c30 x24: ffffaba8c4a28a58 x23: 0000000000000000 x22: 0000000000000000 x21: ffff3947c0eab3c0 x20: ffffaba8c4e3f000 x19: ffffaba846464000 x18: 0000000000000006 x17: 0000000000000000 x16: ffffaba8c2492834 x15: 0720072007200720 x14: 0720072007200720 x13: ffffaba8c49b27c8 x12: 0000000000000312 x11: 0000000000000106 x10: ffffaba8c4a0a7c8 x9 : ffffaba8c49b27c8 x8 : 00000000ffffefff x7 : ffffaba8c4a0a7c8 x6 : 80000000fffff000 x5 : 0000000000000107 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff3947c0eab3c0 Call trace: buginit+0x18/0x1000 [hello] do_one_initcall+0x80/0x1c8 do_init_module+0x60/0x218 load_module+0x1ba4/0x1d70 __do_sys_init_module+0x198/0x1d0 __arm64_sys_init_module+0x1c/0x28 invoke_syscall+0x48/0x114 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xd8 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: d0ffffe0 910003fd 91000000 9400000b (d4210000) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: BRK handler: Fatal exception Fix this by always aligning the end of a bug_entry to 4 bytes, which is correct regardless of CONFIG_DEBUG_BUGVERBOSE. Fixes: 9fb7410f955f ("arm64/BUG: Use BRK instruction for generic BUG traps") Signed-off-by: Yuanbin Xie Signed-off-by: Jiangfeng Xiao Reviewed-by: Mark Rutland Link: https://lore.kernel.org/r/1716212077-43826-1-git-send-email-xiaojiangfeng@huawei.com Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- arch/arm64/include/asm/asm-bug.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/include/asm/asm-bug.h b/arch/arm64/include/asm/asm-bug.h index 03f52f84a4f3..bc2dcc8a0000 100644 --- a/arch/arm64/include/asm/asm-bug.h +++ b/arch/arm64/include/asm/asm-bug.h @@ -28,6 +28,7 @@ 14470: .long 14471f - 14470b; \ _BUGVERBOSE_LOCATION(__FILE__, __LINE__) \ .short flags; \ + .align 2; \ .popsection; \ 14471: #else From 7fbe54f02a5c77ff5dd65e8ed0b58e3bd8c43e9c Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 26 Apr 2024 17:08:45 +0200 Subject: [PATCH 0222/1565] virtio: delete vq in vp_find_vqs_msix() when request_irq() fails [ Upstream commit 89875151fccdd024d571aa884ea97a0128b968b6 ] When request_irq() fails, error path calls vp_del_vqs(). There, as vq is present in the list, free_irq() is called for the same vector. That causes following splat: [ 0.414355] Trying to free already-free IRQ 27 [ 0.414403] WARNING: CPU: 1 PID: 1 at kernel/irq/manage.c:1899 free_irq+0x1a1/0x2d0 [ 0.414510] Modules linked in: [ 0.414540] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc4+ #27 [ 0.414540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 [ 0.414540] RIP: 0010:free_irq+0x1a1/0x2d0 [ 0.414540] Code: 1e 00 48 83 c4 08 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 90 8b 74 24 04 48 c7 c7 98 80 6c b1 e8 00 c9 f7 ff 90 <0f> 0b 90 90 48 89 ee 4c 89 ef e8 e0 20 b8 00 49 8b 47 40 48 8b 40 [ 0.414540] RSP: 0000:ffffb71480013ae0 EFLAGS: 00010086 [ 0.414540] RAX: 0000000000000000 RBX: ffffa099c2722000 RCX: 0000000000000000 [ 0.414540] RDX: 0000000000000000 RSI: ffffb71480013998 RDI: 0000000000000001 [ 0.414540] RBP: 0000000000000246 R08: 00000000ffffdfff R09: 0000000000000001 [ 0.414540] R10: 00000000ffffdfff R11: ffffffffb18729c0 R12: ffffa099c1c91760 [ 0.414540] R13: ffffa099c1c916a4 R14: ffffa099c1d2f200 R15: ffffa099c1c91600 [ 0.414540] FS: 0000000000000000(0000) GS:ffffa099fec40000(0000) knlGS:0000000000000000 [ 0.414540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.414540] CR2: 0000000000000000 CR3: 0000000008e3e001 CR4: 0000000000370ef0 [ 0.414540] Call Trace: [ 0.414540] [ 0.414540] ? __warn+0x80/0x120 [ 0.414540] ? free_irq+0x1a1/0x2d0 [ 0.414540] ? report_bug+0x164/0x190 [ 0.414540] ? handle_bug+0x3b/0x70 [ 0.414540] ? exc_invalid_op+0x17/0x70 [ 0.414540] ? asm_exc_invalid_op+0x1a/0x20 [ 0.414540] ? free_irq+0x1a1/0x2d0 [ 0.414540] vp_del_vqs+0xc1/0x220 [ 0.414540] vp_find_vqs_msix+0x305/0x470 [ 0.414540] vp_find_vqs+0x3e/0x1a0 [ 0.414540] vp_modern_find_vqs+0x1b/0x70 [ 0.414540] init_vqs+0x387/0x600 [ 0.414540] virtnet_probe+0x50a/0xc80 [ 0.414540] virtio_dev_probe+0x1e0/0x2b0 [ 0.414540] really_probe+0xc0/0x2c0 [ 0.414540] ? __pfx___driver_attach+0x10/0x10 [ 0.414540] __driver_probe_device+0x73/0x120 [ 0.414540] driver_probe_device+0x1f/0xe0 [ 0.414540] __driver_attach+0x88/0x180 [ 0.414540] bus_for_each_dev+0x85/0xd0 [ 0.414540] bus_add_driver+0xec/0x1f0 [ 0.414540] driver_register+0x59/0x100 [ 0.414540] ? __pfx_virtio_net_driver_init+0x10/0x10 [ 0.414540] virtio_net_driver_init+0x90/0xb0 [ 0.414540] do_one_initcall+0x58/0x230 [ 0.414540] kernel_init_freeable+0x1a3/0x2d0 [ 0.414540] ? __pfx_kernel_init+0x10/0x10 [ 0.414540] kernel_init+0x1a/0x1c0 [ 0.414540] ret_from_fork+0x31/0x50 [ 0.414540] ? __pfx_kernel_init+0x10/0x10 [ 0.414540] ret_from_fork_asm+0x1a/0x30 [ 0.414540] Fix this by calling deleting the current vq when request_irq() fails. Fixes: 0b0f9dc52ed0 ("Revert "virtio_pci: use shared interrupts for virtqueues"") Signed-off-by: Jiri Pirko Message-Id: <20240426150845.3999481-1-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin --- drivers/virtio/virtio_pci_common.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 1e890ef17687..a6f375417fd5 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -339,8 +339,10 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned nvqs, vring_interrupt, 0, vp_dev->msix_names[msix_vec], vqs[i]); - if (err) + if (err) { + vp_del_vq(vqs[i]); goto error_find; + } } return 0; From cb95173e6c0b03cd1d3439aacd3866723ddaee04 Mon Sep 17 00:00:00 2001 From: Wei Fang Date: Tue, 21 May 2024 10:38:00 +0800 Subject: [PATCH 0223/1565] net: fec: avoid lock evasion when reading pps_enable [ Upstream commit 3b1c92f8e5371700fada307cc8fd2c51fa7bc8c1 ] The assignment of pps_enable is protected by tmreg_lock, but the read operation of pps_enable is not. So the Coverity tool reports a lock evasion warning which may cause data race to occur when running in a multithread environment. Although this issue is almost impossible to occur, we'd better fix it, at least it seems more logically reasonable, and it also prevents Coverity from continuing to issue warnings. Fixes: 278d24047891 ("net: fec: ptp: Enable PPS output based on ptp clock") Signed-off-by: Wei Fang Link: https://lore.kernel.org/r/20240521023800.17102-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ethernet/freescale/fec_ptp.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c index c5ae67300590..780fbb3e1ed0 100644 --- a/drivers/net/ethernet/freescale/fec_ptp.c +++ b/drivers/net/ethernet/freescale/fec_ptp.c @@ -103,14 +103,13 @@ static int fec_ptp_enable_pps(struct fec_enet_private *fep, uint enable) u64 ns; val = 0; - if (fep->pps_enable == enable) - return 0; - - fep->pps_channel = DEFAULT_PPS_CHANNEL; - fep->reload_period = PPS_OUPUT_RELOAD_PERIOD; - spin_lock_irqsave(&fep->tmreg_lock, flags); + if (fep->pps_enable == enable) { + spin_unlock_irqrestore(&fep->tmreg_lock, flags); + return 0; + } + if (enable) { /* clear capture or output compare interrupt status if have. */ @@ -441,6 +440,9 @@ static int fec_ptp_enable(struct ptp_clock_info *ptp, int ret = 0; if (rq->type == PTP_CLK_REQ_PPS) { + fep->pps_channel = DEFAULT_PPS_CHANNEL; + fep->reload_period = PPS_OUPUT_RELOAD_PERIOD; + ret = fec_ptp_enable_pps(fep, on); return ret; From d72e126e9a36d3d33889829df8fc90100bb0e071 Mon Sep 17 00:00:00 2001 From: "Dae R. Jeong" Date: Tue, 21 May 2024 19:34:38 +0900 Subject: [PATCH 0224/1565] tls: fix missing memory barrier in tls_init [ Upstream commit 91e61dd7a0af660408e87372d8330ceb218be302 ] In tls_init(), a write memory barrier is missing, and store-store reordering may cause NULL dereference in tls_{setsockopt,getsockopt}. CPU0 CPU1 ----- ----- // In tls_init() // In tls_ctx_create() ctx = kzalloc() ctx->sk_proto = READ_ONCE(sk->sk_prot) -(1) // In update_sk_prot() WRITE_ONCE(sk->sk_prot, tls_prots) -(2) // In sock_common_setsockopt() READ_ONCE(sk->sk_prot)->setsockopt() // In tls_{setsockopt,getsockopt}() ctx->sk_proto->setsockopt() -(3) In the above scenario, when (1) and (2) are reordered, (3) can observe the NULL value of ctx->sk_proto, causing NULL dereference. To fix it, we rely on rcu_assign_pointer() which implies the release barrier semantic. By moving rcu_assign_pointer() after ctx->sk_proto is initialized, we can ensure that ctx->sk_proto are visible when changing sk->sk_prot. Fixes: d5bee7374b68 ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE") Signed-off-by: Yewon Choi Signed-off-by: Dae R. Jeong Link: https://lore.kernel.org/netdev/ZU4OJG56g2V9z_H7@dragonet/T/ Link: https://lore.kernel.org/r/Zkx4vjSFp0mfpjQ2@libra05 Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/tls/tls_main.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 2bbacd9b97e5..ebf856cf821d 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -633,9 +633,17 @@ struct tls_context *tls_ctx_create(struct sock *sk) return NULL; mutex_init(&ctx->tx_lock); - rcu_assign_pointer(icsk->icsk_ulp_data, ctx); ctx->sk_proto = READ_ONCE(sk->sk_prot); ctx->sk = sk; + /* Release semantic of rcu_assign_pointer() ensures that + * ctx->sk_proto is visible before changing sk->sk_prot in + * update_sk_prot(), and prevents reading uninitialized value in + * tls_{getsockopt, setsockopt}. Note that we do not need a + * read barrier in tls_{getsockopt,setsockopt} as there is an + * address dependency between sk->sk_proto->{getsockopt,setsockopt} + * and ctx->sk_proto. + */ + rcu_assign_pointer(icsk->icsk_ulp_data, ctx); return ctx; } From 728fb8b3b55fe4046904723227f9e8f6f53417a4 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sun, 5 May 2024 19:36:49 +0900 Subject: [PATCH 0225/1565] nfc: nci: Fix kcov check in nci_rx_work() [ Upstream commit 19e35f24750ddf860c51e51c68cf07ea181b4881 ] Commit 7e8cdc97148c ("nfc: Add KCOV annotations") added kcov_remote_start_common()/kcov_remote_stop() pair into nci_rx_work(), with an assumption that kcov_remote_stop() is called upon continue of the for loop. But commit d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") forgot to call kcov_remote_stop() before break of the for loop. Reported-by: syzbot Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2 Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Suggested-by: Andrey Konovalov Signed-off-by: Tetsuo Handa Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/6d10f829-5a0c-405a-b39a-d7266f3a1a0b@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski Stable-dep-of: 6671e352497c ("nfc: nci: Fix handling of zero-length payload packets in nci_rx_work()") Signed-off-by: Sasha Levin --- net/nfc/nci/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index 3d67d558ed48..e2d632e0d3eb 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -1517,6 +1517,7 @@ static void nci_rx_work(struct work_struct *work) if (!nci_valid_size(skb)) { kfree_skb(skb); + kcov_remote_stop(); break; } From 0f26983c24235e1617f57d73d0760172019b1f18 Mon Sep 17 00:00:00 2001 From: Ryosuke Yasuoka Date: Wed, 22 May 2024 00:34:42 +0900 Subject: [PATCH 0226/1565] nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() [ Upstream commit 6671e352497ca4bb07a96c48e03907065ff77d8a ] When nci_rx_work() receives a zero-length payload packet, it should not discard the packet and exit the loop. Instead, it should continue processing subsequent packets. Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Signed-off-by: Ryosuke Yasuoka Reviewed-by: Simon Horman Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20240521153444.535399-1-ryasuoka@redhat.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/nfc/nci/core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index e2d632e0d3eb..3182b4228cfa 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -1517,8 +1517,7 @@ static void nci_rx_work(struct work_struct *work) if (!nci_valid_size(skb)) { kfree_skb(skb); - kcov_remote_stop(); - break; + continue; } /* Process frame */ From e01065b339e323b3dfa1be217fd89e9b3208b0ab Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 15 May 2024 13:23:39 +0000 Subject: [PATCH 0227/1565] netfilter: nfnetlink_queue: acquire rcu_read_lock() in instance_destroy_rcu() [ Upstream commit dc21c6cc3d6986d938efbf95de62473982c98dec ] syzbot reported that nf_reinject() could be called without rcu_read_lock() : WARNING: suspicious RCU usage 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Not tainted net/netfilter/nfnetlink_queue.c:263 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syz-executor.4/13427: #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_do_batch kernel/rcu/tree.c:2190 [inline] #0: ffffffff8e334f60 (rcu_callback){....}-{0:0}, at: rcu_core+0xa86/0x1830 kernel/rcu/tree.c:2471 #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline] #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: nfqnl_flush net/netfilter/nfnetlink_queue.c:405 [inline] #1: ffff88801ca92958 (&inst->lock){+.-.}-{2:2}, at: instance_destroy_rcu+0x30/0x220 net/netfilter/nfnetlink_queue.c:172 stack backtrace: CPU: 0 PID: 13427 Comm: syz-executor.4 Not tainted 6.9.0-rc7-syzkaller-02060-g5c1672705a1a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 nf_reinject net/netfilter/nfnetlink_queue.c:323 [inline] nfqnl_reinject+0x6ec/0x1120 net/netfilter/nfnetlink_queue.c:397 nfqnl_flush net/netfilter/nfnetlink_queue.c:410 [inline] instance_destroy_rcu+0x1ae/0x220 net/netfilter/nfnetlink_queue.c:172 rcu_do_batch kernel/rcu/tree.c:2196 [inline] rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2471 handle_softirqs+0x2d6/0x990 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 Fixes: 9872bec773c2 ("[NETFILTER]: nfnetlink: use RCU for queue instances hash") Reported-by: syzbot Signed-off-by: Eric Dumazet Acked-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nfnetlink_queue.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 9d87606c76ff..dc6af1919dea 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -167,7 +167,9 @@ instance_destroy_rcu(struct rcu_head *head) struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance, rcu); + rcu_read_lock(); nfqnl_flush(inst, NULL, 0); + rcu_read_unlock(); kfree(inst); module_put(THIS_MODULE); } From 7ca9cf24b04af9e945e431fec8dca473d7621b4c Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Thu, 9 May 2024 23:02:24 +0200 Subject: [PATCH 0228/1565] netfilter: nft_payload: restore vlan q-in-q match support [ Upstream commit aff5c01fa1284d606f8e7cbdaafeef2511bb46c1 ] Revert f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support"). f41f72d09ee1 ("netfilter: nft_payload: simplify vlan header handling") already allows to match on inner vlan tags by subtract the vlan header size to the payload offset which has been popped and stored in skbuff metadata fields. Fixes: f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nft_payload.c | 23 +++++++---------------- 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index 56f6c05362ae..fa64b1b8ae91 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -44,36 +44,27 @@ nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len) int mac_off = skb_mac_header(skb) - skb->data; u8 *vlanh, *dst_u8 = (u8 *) d; struct vlan_ethhdr veth; - u8 vlan_hlen = 0; - - if ((skb->protocol == htons(ETH_P_8021AD) || - skb->protocol == htons(ETH_P_8021Q)) && - offset >= VLAN_ETH_HLEN && offset < VLAN_ETH_HLEN + VLAN_HLEN) - vlan_hlen += VLAN_HLEN; vlanh = (u8 *) &veth; - if (offset < VLAN_ETH_HLEN + vlan_hlen) { + if (offset < VLAN_ETH_HLEN) { u8 ethlen = len; - if (vlan_hlen && - skb_copy_bits(skb, mac_off, &veth, VLAN_ETH_HLEN) < 0) - return false; - else if (!nft_payload_rebuild_vlan_hdr(skb, mac_off, &veth)) + if (!nft_payload_rebuild_vlan_hdr(skb, mac_off, &veth)) return false; - if (offset + len > VLAN_ETH_HLEN + vlan_hlen) - ethlen -= offset + len - VLAN_ETH_HLEN - vlan_hlen; + if (offset + len > VLAN_ETH_HLEN) + ethlen -= offset + len - VLAN_ETH_HLEN; - memcpy(dst_u8, vlanh + offset - vlan_hlen, ethlen); + memcpy(dst_u8, vlanh + offset, ethlen); len -= ethlen; if (len == 0) return true; dst_u8 += ethlen; - offset = ETH_HLEN + vlan_hlen; + offset = ETH_HLEN; } else { - offset -= VLAN_HLEN + vlan_hlen; + offset -= VLAN_HLEN; } return skb_copy_bits(skb, offset + mac_off, dst_u8, len) == 0; From 5f72ba46f1d8221e41057e50d9fce8f5c33ed742 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 22 May 2024 20:09:49 +0300 Subject: [PATCH 0229/1565] spi: Don't mark message DMA mapped when no transfer in it is [ Upstream commit 9f788ba457b45b0ce422943fcec9fa35c4587764 ] There is no need to set the DMA mapped flag of the message if it has no mapped transfers. Moreover, it may give the code a chance to take the wrong paths, i.e. to exercise DMA related APIs on unmapped data. Make __spi_map_msg() to bail earlier on the above mentioned cases. Fixes: 99adef310f68 ("spi: Provide core support for DMA mapping transfers") Signed-off-by: Andy Shevchenko Link: https://msgid.link/r/20240522171018.3362521-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/spi/spi.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 857a1399850c..e84494eed1c1 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -970,6 +970,7 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) else rx_dev = ctlr->dev.parent; + ret = -ENOMSG; list_for_each_entry(xfer, &msg->transfers, transfer_list) { if (!ctlr->can_dma(ctlr, msg->spi, xfer)) continue; @@ -993,6 +994,9 @@ static int __spi_map_msg(struct spi_controller *ctlr, struct spi_message *msg) } } } + /* No transfer has been mapped, bail out with success */ + if (ret) + return 0; ctlr->cur_msg_mapped = true; From 82fdfbf2424356821c75e56b76546005d5d1f67e Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Tue, 21 May 2024 23:20:28 +0300 Subject: [PATCH 0230/1565] nvmet: fix ns enable/disable possible hang [ Upstream commit f97914e35fd98b2b18fb8a092e0a0799f73afdfe ] When disabling an nvmet namespace, there is a period where the subsys->lock is released, as the ns disable waits for backend IO to complete, and the ns percpu ref to be properly killed. The original intent was to avoid taking the subsystem lock for a prolong period as other processes may need to acquire it (for example new incoming connections). However, it opens up a window where another process may come in and enable the ns, (re)intiailizing the ns percpu_ref, causing the disable sequence to hang. Solve this by taking the global nvmet_config_sem over the entire configfs enable/disable sequence. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Signed-off-by: Sagi Grimberg Reviewed-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- drivers/nvme/target/configfs.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index 9aed5cc71096..f2d11fc04752 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -532,10 +532,18 @@ static ssize_t nvmet_ns_enable_store(struct config_item *item, if (strtobool(page, &enable)) return -EINVAL; + /* + * take a global nvmet_config_sem because the disable routine has a + * window where it releases the subsys-lock, giving a chance to + * a parallel enable to concurrently execute causing the disable to + * have a misaccounting of the ns percpu_ref. + */ + down_write(&nvmet_config_sem); if (enable) ret = nvmet_ns_enable(ns); else nvmet_ns_disable(ns); + up_write(&nvmet_config_sem); return ret ? ret : count; } From 72c6038d23cbf60d29a2752fb1696bb504928b5a Mon Sep 17 00:00:00 2001 From: Carolina Jubran Date: Wed, 22 May 2024 22:26:58 +0300 Subject: [PATCH 0231/1565] net/mlx5e: Use rx_missed_errors instead of rx_dropped for reporting buffer exhaustion [ Upstream commit 5c74195d5dd977e97556e6fa76909b831c241230 ] Previously, the driver incorrectly used rx_dropped to report device buffer exhaustion. According to the documentation, rx_dropped should not be used to count packets dropped due to buffer exhaustion, which is the purpose of rx_missed_errors. Use rx_missed_errors as intended for counting packets dropped due to buffer exhaustion. Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo") Signed-off-by: Carolina Jubran Signed-off-by: Tariq Toukan Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 5673a4113253..f1834853872d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3695,7 +3695,7 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) mlx5e_fold_sw_stats64(priv, stats); } - stats->rx_dropped = priv->stats.qcnt.rx_out_of_buffer; + stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer; stats->rx_length_errors = PPORT_802_3_GET(pstats, a_in_range_length_errors) + From ae6fc4e6a3322f6d1c8ff59150d8469487a73dd8 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sun, 5 May 2024 23:08:31 +0900 Subject: [PATCH 0232/1565] dma-buf/sw-sync: don't enable IRQ from sync_print_obj() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b794918961516f667b0c745aebdfebbb8a98df39 ] Since commit a6aa8fca4d79 ("dma-buf/sw-sync: Reduce irqsave/irqrestore from known context") by error replaced spin_unlock_irqrestore() with spin_unlock_irq() for both sync_debugfs_show() and sync_print_obj() despite sync_print_obj() is called from sync_debugfs_show(), lockdep complains inconsistent lock state warning. Use plain spin_{lock,unlock}() for sync_print_obj(), for sync_debugfs_show() is already using spin_{lock,unlock}_irq(). Reported-by: syzbot Closes: https://syzkaller.appspot.com/bug?extid=a225ee3df7e7f9372dbe Fixes: a6aa8fca4d79 ("dma-buf/sw-sync: Reduce irqsave/irqrestore from known context") Signed-off-by: Tetsuo Handa Reviewed-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/c2e46020-aaa6-4e06-bf73-f05823f913f0@I-love.SAKURA.ne.jp Signed-off-by: Christian König Signed-off-by: Sasha Levin --- drivers/dma-buf/sync_debug.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/dma-buf/sync_debug.c b/drivers/dma-buf/sync_debug.c index 101394f16930..237bce21d1e7 100644 --- a/drivers/dma-buf/sync_debug.c +++ b/drivers/dma-buf/sync_debug.c @@ -110,12 +110,12 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj) seq_printf(s, "%s: %d\n", obj->name, obj->value); - spin_lock_irq(&obj->lock); + spin_lock(&obj->lock); /* Caller already disabled IRQ. */ list_for_each(pos, &obj->pt_list) { struct sync_pt *pt = container_of(pos, struct sync_pt, link); sync_print_fence(s, &pt->base, false); } - spin_unlock_irq(&obj->lock); + spin_unlock(&obj->lock); } static void sync_print_sync_file(struct seq_file *s, From 540d73a5c0521ef83af292ec0fe454c91f18fff7 Mon Sep 17 00:00:00 2001 From: Friedrich Vock Date: Tue, 14 May 2024 09:09:31 +0200 Subject: [PATCH 0233/1565] bpf: Fix potential integer overflow in resolve_btfids [ Upstream commit 44382b3ed6b2787710c8ade06c0e97f5970a47c8 ] err is a 32-bit integer, but elf_update returns an off_t, which is 64-bit at least on 64-bit platforms. If symbols_patch is called on a binary between 2-4GB in size, the result will be negative when cast to a 32-bit integer, which the code assumes means an error occurred. This can wrongly trigger build failures when building very large kernel images. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Friedrich Vock Signed-off-by: Daniel Borkmann Acked-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20240514070931.199694-1-friedrich.vock@gmx.de Signed-off-by: Sasha Levin --- tools/bpf/resolve_btfids/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c index f32c059fbfb4..8b2a2576fed6 100644 --- a/tools/bpf/resolve_btfids/main.c +++ b/tools/bpf/resolve_btfids/main.c @@ -637,7 +637,7 @@ static int sets_patch(struct object *obj) static int symbols_patch(struct object *obj) { - int err; + off_t err; if (__symbols_patch(obj, &obj->structs) || __symbols_patch(obj, &obj->unions) || From 3c0d36972edbe56fcf98899622d9b90ac9965227 Mon Sep 17 00:00:00 2001 From: Roded Zats Date: Wed, 22 May 2024 10:30:44 +0300 Subject: [PATCH 0234/1565] enic: Validate length of nl attributes in enic_set_vf_port [ Upstream commit e8021b94b0412c37bcc79027c2e382086b6ce449 ] enic_set_vf_port assumes that the nl attribute IFLA_PORT_PROFILE is of length PORT_PROFILE_MAX and that the nl attributes IFLA_PORT_INSTANCE_UUID, IFLA_PORT_HOST_UUID are of length PORT_UUID_MAX. These attributes are validated (in the function do_setlink in rtnetlink.c) using the nla_policy ifla_port_policy. The policy defines IFLA_PORT_PROFILE as NLA_STRING, IFLA_PORT_INSTANCE_UUID as NLA_BINARY and IFLA_PORT_HOST_UUID as NLA_STRING. That means that the length validation using the policy is for the max size of the attributes and not on exact size so the length of these attributes might be less than the sizes that enic_set_vf_port expects. This might cause an out of bands read access in the memcpys of the data of these attributes in enic_set_vf_port. Fixes: f8bd909183ac ("net: Add ndo_{set|get}_vf_port support for enic dynamic vnics") Signed-off-by: Roded Zats Link: https://lore.kernel.org/r/20240522073044.33519-1-rzats@paloaltonetworks.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ethernet/cisco/enic/enic_main.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c index 548d8095c0a7..b695f3f23328 100644 --- a/drivers/net/ethernet/cisco/enic/enic_main.c +++ b/drivers/net/ethernet/cisco/enic/enic_main.c @@ -1117,18 +1117,30 @@ static int enic_set_vf_port(struct net_device *netdev, int vf, pp->request = nla_get_u8(port[IFLA_PORT_REQUEST]); if (port[IFLA_PORT_PROFILE]) { + if (nla_len(port[IFLA_PORT_PROFILE]) != PORT_PROFILE_MAX) { + memcpy(pp, &prev_pp, sizeof(*pp)); + return -EINVAL; + } pp->set |= ENIC_SET_NAME; memcpy(pp->name, nla_data(port[IFLA_PORT_PROFILE]), PORT_PROFILE_MAX); } if (port[IFLA_PORT_INSTANCE_UUID]) { + if (nla_len(port[IFLA_PORT_INSTANCE_UUID]) != PORT_UUID_MAX) { + memcpy(pp, &prev_pp, sizeof(*pp)); + return -EINVAL; + } pp->set |= ENIC_SET_INSTANCE; memcpy(pp->instance_uuid, nla_data(port[IFLA_PORT_INSTANCE_UUID]), PORT_UUID_MAX); } if (port[IFLA_PORT_HOST_UUID]) { + if (nla_len(port[IFLA_PORT_HOST_UUID]) != PORT_UUID_MAX) { + memcpy(pp, &prev_pp, sizeof(*pp)); + return -EINVAL; + } pp->set |= ENIC_SET_HOST; memcpy(pp->host_uuid, nla_data(port[IFLA_PORT_HOST_UUID]), PORT_UUID_MAX); From 117cacd72ca8823e57063052de7b444101bb69cf Mon Sep 17 00:00:00 2001 From: Parthiban Veerasooran Date: Thu, 23 May 2024 14:23:14 +0530 Subject: [PATCH 0235/1565] net: usb: smsc95xx: fix changing LED_SEL bit value updated from EEPROM [ Upstream commit 52a2f0608366a629d43dacd3191039c95fef74ba ] LED Select (LED_SEL) bit in the LED General Purpose IO Configuration register is used to determine the functionality of external LED pins (Speed Indicator, Link and Activity Indicator, Full Duplex Link Indicator). The default value for this bit is 0 when no EEPROM is present. If a EEPROM is present, the default value is the value of the LED Select bit in the Configuration Flags of the EEPROM. A USB Reset or Lite Reset (LRST) will cause this bit to be restored to the image value last loaded from EEPROM, or to be set to 0 if no EEPROM is present. While configuring the dual purpose GPIO/LED pins to LED outputs in the LED General Purpose IO Configuration register, the LED_SEL bit is changed as 0 and resulting the configured value from the EEPROM is cleared. The issue is fixed by using read-modify-write approach. Fixes: f293501c61c5 ("smsc95xx: configure LED outputs") Signed-off-by: Parthiban Veerasooran Reviewed-by: Simon Horman Reviewed-by: Woojung Huh Link: https://lore.kernel.org/r/20240523085314.167650-1-Parthiban.Veerasooran@microchip.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/usb/smsc95xx.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 2778ad6726f0..30e5f6910e6f 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -845,7 +845,7 @@ static int smsc95xx_start_rx_path(struct usbnet *dev, int in_pm) static int smsc95xx_reset(struct usbnet *dev) { struct smsc95xx_priv *pdata = dev->driver_priv; - u32 read_buf, write_buf, burst_cap; + u32 read_buf, burst_cap; int ret = 0, timeout; netif_dbg(dev, ifup, dev->net, "entering smsc95xx_reset\n"); @@ -987,10 +987,13 @@ static int smsc95xx_reset(struct usbnet *dev) return ret; netif_dbg(dev, ifup, dev->net, "ID_REV = 0x%08x\n", read_buf); + ret = smsc95xx_read_reg(dev, LED_GPIO_CFG, &read_buf); + if (ret < 0) + return ret; /* Configure GPIO pins as LED outputs */ - write_buf = LED_GPIO_CFG_SPD_LED | LED_GPIO_CFG_LNK_LED | - LED_GPIO_CFG_FDX_LED; - ret = smsc95xx_write_reg(dev, LED_GPIO_CFG, write_buf); + read_buf |= LED_GPIO_CFG_SPD_LED | LED_GPIO_CFG_LNK_LED | + LED_GPIO_CFG_FDX_LED; + ret = smsc95xx_write_reg(dev, LED_GPIO_CFG, read_buf); if (ret < 0) return ret; From 29467edc23818dc5a33042ffb4920b49b090e63d Mon Sep 17 00:00:00 2001 From: Jakub Sitnicki Date: Mon, 27 May 2024 13:20:07 +0200 Subject: [PATCH 0236/1565] bpf: Allow delete from sockmap/sockhash only if update is allowed [ Upstream commit 98e948fb60d41447fd8d2d0c3b8637fc6b6dc26d ] We have seen an influx of syzkaller reports where a BPF program attached to a tracepoint triggers a locking rule violation by performing a map_delete on a sockmap/sockhash. We don't intend to support this artificial use scenario. Extend the existing verifier allowed-program-type check for updating sockmap/sockhash to also cover deleting from a map. From now on only BPF programs which were previously allowed to update sockmap/sockhash can delete from these map types. Fixes: ff9105993240 ("bpf, sockmap: Prevent lock inversion deadlock in map delete elem") Reported-by: Tetsuo Handa Reported-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki Signed-off-by: Daniel Borkmann Tested-by: syzbot+ec941d6e24f633a59172@syzkaller.appspotmail.com Acked-by: John Fastabend Closes: https://syzkaller.appspot.com/bug?extid=ec941d6e24f633a59172 Link: https://lore.kernel.org/bpf/20240527-sockmap-verify-deletes-v1-1-944b372f2101@cloudflare.com Signed-off-by: Sasha Levin --- kernel/bpf/verifier.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 25f8a8716e88..ad115ccc2fe0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4890,7 +4890,8 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id) enum bpf_attach_type eatype = env->prog->expected_attach_type; enum bpf_prog_type type = resolve_prog_type(env->prog); - if (func_id != BPF_FUNC_map_update_elem) + if (func_id != BPF_FUNC_map_update_elem && + func_id != BPF_FUNC_map_delete_elem) return false; /* It's not possible to get access to a locked struct sock in these @@ -4901,6 +4902,11 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id) if (eatype == BPF_TRACE_ITER) return true; break; + case BPF_PROG_TYPE_SOCK_OPS: + /* map_update allowed only via dedicated helpers with event type checks */ + if (func_id == BPF_FUNC_map_delete_elem) + return true; + break; case BPF_PROG_TYPE_SOCKET_FILTER: case BPF_PROG_TYPE_SCHED_CLS: case BPF_PROG_TYPE_SCHED_ACT: @@ -4988,7 +4994,6 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, case BPF_MAP_TYPE_SOCKMAP: if (func_id != BPF_FUNC_sk_redirect_map && func_id != BPF_FUNC_sock_map_update && - func_id != BPF_FUNC_map_delete_elem && func_id != BPF_FUNC_msg_redirect_map && func_id != BPF_FUNC_sk_select_reuseport && func_id != BPF_FUNC_map_lookup_elem && @@ -4998,7 +5003,6 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, case BPF_MAP_TYPE_SOCKHASH: if (func_id != BPF_FUNC_sk_redirect_hash && func_id != BPF_FUNC_sock_hash_update && - func_id != BPF_FUNC_map_delete_elem && func_id != BPF_FUNC_msg_redirect_hash && func_id != BPF_FUNC_sk_select_reuseport && func_id != BPF_FUNC_map_lookup_elem && From ddd2912a94eb7c50bf5659245518e8844e582680 Mon Sep 17 00:00:00 2001 From: Xiaolei Wang Date: Fri, 24 May 2024 13:05:28 +0800 Subject: [PATCH 0237/1565] net:fec: Add fec_enet_deinit() [ Upstream commit bf0497f53c8535f99b72041529d3f7708a6e2c0d ] When fec_probe() fails or fec_drv_remove() needs to release the fec queue and remove a NAPI context, therefore add a function corresponding to fec_enet_init() and call fec_enet_deinit() which does the opposite to release memory and remove a NAPI context. Fixes: 59d0f7465644 ("net: fec: init multi queue date structure") Signed-off-by: Xiaolei Wang Reviewed-by: Wei Fang Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20240524050528.4115581-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/freescale/fec_main.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index fe29769cb158..adb76db66031 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -3443,6 +3443,14 @@ static int fec_enet_init(struct net_device *ndev) return ret; } +static void fec_enet_deinit(struct net_device *ndev) +{ + struct fec_enet_private *fep = netdev_priv(ndev); + + netif_napi_del(&fep->napi); + fec_enet_free_queue(ndev); +} + #ifdef CONFIG_OF static int fec_reset_phy(struct platform_device *pdev) { @@ -3813,6 +3821,7 @@ fec_probe(struct platform_device *pdev) fec_enet_mii_remove(fep); failed_mii_init: failed_irq: + fec_enet_deinit(ndev); failed_init: fec_ptp_stop(pdev); failed_reset: @@ -3874,6 +3883,7 @@ fec_drv_remove(struct platform_device *pdev) pm_runtime_put_noidle(&pdev->dev); pm_runtime_disable(&pdev->dev); + fec_enet_deinit(ndev); free_netdev(ndev); return 0; } From 07eeedafc59c45fe5de43958128542be3784764c Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 13 May 2024 12:27:15 +0200 Subject: [PATCH 0238/1565] netfilter: tproxy: bail out if IP has been disabled on the device [ Upstream commit 21a673bddc8fd4873c370caf9ae70ffc6d47e8d3 ] syzbot reports: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] [..] RIP: 0010:nf_tproxy_laddr4+0xb7/0x340 net/ipv4/netfilter/nf_tproxy_ipv4.c:62 Call Trace: nft_tproxy_eval_v4 net/netfilter/nft_tproxy.c:56 [inline] nft_tproxy_eval+0xa9a/0x1a00 net/netfilter/nft_tproxy.c:168 __in_dev_get_rcu() can return NULL, so check for this. Reported-and-tested-by: syzbot+b94a6818504ea90d7661@syzkaller.appspotmail.com Fixes: cc6eb4338569 ("tproxy: use the interface primary IP address as a default value for --on-ip") Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/ipv4/netfilter/nf_tproxy_ipv4.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/netfilter/nf_tproxy_ipv4.c b/net/ipv4/netfilter/nf_tproxy_ipv4.c index 61cb2341f50f..7c1a0cd9f435 100644 --- a/net/ipv4/netfilter/nf_tproxy_ipv4.c +++ b/net/ipv4/netfilter/nf_tproxy_ipv4.c @@ -58,6 +58,8 @@ __be32 nf_tproxy_laddr4(struct sk_buff *skb, __be32 user_laddr, __be32 daddr) laddr = 0; indev = __in_dev_get_rcu(skb->dev); + if (!indev) + return daddr; in_dev_for_each_ifa_rcu(ifa, indev) { if (ifa->ifa_flags & IFA_F_SECONDARY) From 19e5a3d771fac8ef26d4c7eca77f8b063b983b2c Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 19 May 2024 18:22:27 +0900 Subject: [PATCH 0239/1565] kconfig: fix comparison to constant symbols, 'm', 'n' [ Upstream commit aabdc960a283ba78086b0bf66ee74326f49e218e ] Currently, comparisons to 'm' or 'n' result in incorrect output. [Test Code] config MODULES def_bool y modules config A def_tristate m config B def_bool A > n CONFIG_B is unset, while CONFIG_B=y is expected. The reason for the issue is because Kconfig compares the tristate values as strings. Currently, the .type fields in the constant symbol definitions, symbol_{yes,mod,no} are unspecified, i.e., S_UNKNOWN. When expr_calc_value() evaluates 'A > n', it checks the types of 'A' and 'n' to determine how to compare them. The left-hand side, 'A', is a tristate symbol with a value of 'm', which corresponds to a numeric value of 1. (Internally, 'y', 'm', and 'n' are represented as 2, 1, and 0, respectively.) The right-hand side, 'n', has an unknown type, so it is treated as the string "n" during the comparison. expr_calc_value() compares two values numerically only when both can have numeric values. Otherwise, they are compared as strings. symbol numeric value ASCII code ------------------------------------- y 2 0x79 m 1 0x6d n 0 0x6e 'm' is greater than 'n' if compared numerically (since 1 is greater than 0), but smaller than 'n' if compared as strings (since the ASCII code 0x6d is smaller than 0x6e). Specifying .type=S_TRISTATE for symbol_{yes,mod,no} fixes the above test code. Doing so, however, would cause a regression to the following test code. [Test Code 2] config MODULES def_bool n modules config A def_tristate n config B def_bool A = m You would get CONFIG_B=y, while CONFIG_B should not be set. The reason is because sym_get_string_value() turns 'm' into 'n' when the module feature is disabled. Consequently, expr_calc_value() evaluates 'A = n' instead of 'A = m'. This oddity has been hidden because the type of 'm' was previously S_UNKNOWN instead of S_TRISTATE. sym_get_string_value() should not tweak the string because the tristate value has already been correctly calculated. There is no reason to return the string "n" where its tristate value is mod. Fixes: 31847b67bec0 ("kconfig: allow use of relations other than (in)equality") Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin --- scripts/kconfig/symbol.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/scripts/kconfig/symbol.c b/scripts/kconfig/symbol.c index a2056fa80de2..ff4c5d314b4d 100644 --- a/scripts/kconfig/symbol.c +++ b/scripts/kconfig/symbol.c @@ -13,18 +13,21 @@ struct symbol symbol_yes = { .name = "y", + .type = S_TRISTATE, .curr = { "y", yes }, .flags = SYMBOL_CONST|SYMBOL_VALID, }; struct symbol symbol_mod = { .name = "m", + .type = S_TRISTATE, .curr = { "m", mod }, .flags = SYMBOL_CONST|SYMBOL_VALID, }; struct symbol symbol_no = { .name = "n", + .type = S_TRISTATE, .curr = { "n", no }, .flags = SYMBOL_CONST|SYMBOL_VALID, }; @@ -776,8 +779,7 @@ const char *sym_get_string_value(struct symbol *sym) case no: return "n"; case mod: - sym_calc_value(modules_sym); - return (modules_sym->curr.tri == no) ? "n" : "m"; + return "m"; case yes: return "y"; } From 3c5caaef46d63cd9fcfc6186822f5c4c52b992f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Tue, 21 May 2024 12:52:42 +0200 Subject: [PATCH 0240/1565] spi: stm32: Don't warn about spurious interrupts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 95d7c452a26564ef0c427f2806761b857106d8c4 ] The dev_warn to notify about a spurious interrupt was introduced with the reasoning that these are unexpected. However spurious interrupts tend to trigger continously and the error message on the serial console prevents that the core's detection of spurious interrupts kicks in (which disables the irq) and just floods the console. Fixes: c64e7efe46b7 ("spi: stm32: make spurious and overrun interrupts visible") Signed-off-by: Uwe Kleine-König Link: https://msgid.link/r/20240521105241.62400-2-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/spi/spi-stm32.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c index 9ec37cf10c01..f97b822cca19 100644 --- a/drivers/spi/spi-stm32.c +++ b/drivers/spi/spi-stm32.c @@ -931,7 +931,7 @@ static irqreturn_t stm32h7_spi_irq_thread(int irq, void *dev_id) mask |= STM32H7_SPI_SR_TXP | STM32H7_SPI_SR_RXP; if (!(sr & mask)) { - dev_warn(spi->dev, "spurious IT (sr=0x%08x, ier=0x%08x)\n", + dev_vdbg(spi->dev, "spurious IT (sr=0x%08x, ier=0x%08x)\n", sr, ier); spin_unlock_irqrestore(&spi->lock, flags); return IRQ_NONE; From 1abbf079da59ef559d0ab4219d2a0302f7970761 Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 29 May 2024 17:56:33 +0800 Subject: [PATCH 0241/1565] ipvlan: Dont Use skb->sk in ipvlan_process_v{4,6}_outbound [ Upstream commit b3dc6e8003b500861fa307e9a3400c52e78e4d3a ] Raw packet from PF_PACKET socket ontop of an IPv6-backed ipvlan device will hit WARN_ON_ONCE() in sk_mc_loop() through sch_direct_xmit() path. WARNING: CPU: 2 PID: 0 at net/core/sock.c:775 sk_mc_loop+0x2d/0x70 Modules linked in: sch_netem ipvlan rfkill cirrus drm_shmem_helper sg drm_kms_helper CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 6.9.0+ #279 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:sk_mc_loop+0x2d/0x70 Code: fa 0f 1f 44 00 00 65 0f b7 15 f7 96 a3 4f 31 c0 66 85 d2 75 26 48 85 ff 74 1c RSP: 0018:ffffa9584015cd78 EFLAGS: 00010212 RAX: 0000000000000011 RBX: ffff91e585793e00 RCX: 0000000002c6a001 RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff91e589c0f000 RBP: ffff91e5855bd100 R08: 0000000000000000 R09: 3d00545216f43d00 R10: ffff91e584fdcc50 R11: 00000060dd8616f4 R12: ffff91e58132d000 R13: ffff91e584fdcc68 R14: ffff91e5869ce800 R15: ffff91e589c0f000 FS: 0000000000000000(0000) GS:ffff91e898100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f788f7c44c0 CR3: 0000000008e1a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? __warn (kernel/panic.c:693) ? sk_mc_loop (net/core/sock.c:760) ? report_bug (lib/bug.c:201 lib/bug.c:219) ? handle_bug (arch/x86/kernel/traps.c:239) ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) ? sk_mc_loop (net/core/sock.c:760) ip6_finish_output2 (net/ipv6/ip6_output.c:83 (discriminator 1)) ? nf_hook_slow (net/netfilter/core.c:626) ip6_finish_output (net/ipv6/ip6_output.c:222) ? __pfx_ip6_finish_output (net/ipv6/ip6_output.c:215) ipvlan_xmit_mode_l3 (drivers/net/ipvlan/ipvlan_core.c:602) ipvlan ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:226) ipvlan dev_hard_start_xmit (net/core/dev.c:3594) sch_direct_xmit (net/sched/sch_generic.c:343) __qdisc_run (net/sched/sch_generic.c:416) net_tx_action (net/core/dev.c:5286) handle_softirqs (kernel/softirq.c:555) __irq_exit_rcu (kernel/softirq.c:589) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1043) The warning triggers as this: packet_sendmsg packet_snd //skb->sk is packet sk __dev_queue_xmit __dev_xmit_skb //q->enqueue is not NULL __qdisc_run sch_direct_xmit dev_hard_start_xmit ipvlan_start_xmit ipvlan_xmit_mode_l3 //l3 mode ipvlan_process_outbound //vepa flag ipvlan_process_v6_outbound ip6_local_out __ip6_finish_output ip6_finish_output2 //multicast packet sk_mc_loop //sk->sk_family is AF_PACKET Call ip{6}_local_out() with NULL sk in ipvlan as other tunnels to fix this. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Suggested-by: Eric Dumazet Signed-off-by: Yue Haibing Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20240529095633.613103-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ipvlan/ipvlan_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index bfea28bd4502..d04b1450875b 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -440,7 +440,7 @@ static noinline_for_stack int ipvlan_process_v4_outbound(struct sk_buff *skb) memset(IPCB(skb), 0, sizeof(*IPCB(skb))); - err = ip_local_out(net, skb->sk, skb); + err = ip_local_out(net, NULL, skb); if (unlikely(net_xmit_eval(err))) DEV_STATS_INC(dev, tx_errors); else @@ -495,7 +495,7 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb) memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); - err = ip6_local_out(dev_net(dev), skb->sk, skb); + err = ip6_local_out(dev_net(dev), NULL, skb); if (unlikely(net_xmit_eval(err))) DEV_STATS_INC(dev, tx_errors); else From 1f4b848935515a09be039226ed034355be4efccc Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Thu, 30 May 2024 08:20:14 -0700 Subject: [PATCH 0242/1565] hwmon: (shtc1) Fix property misspelling [ Upstream commit 52a2c70c3ec555e670a34dd1ab958986451d2dd2 ] The property name is "sensirion,low-precision", not "sensicon,low-precision". Cc: Chris Ruehl Fixes: be7373b60df5 ("hwmon: shtc1: add support for device tree bindings") Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin --- drivers/hwmon/shtc1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwmon/shtc1.c b/drivers/hwmon/shtc1.c index 18546ebc8e9f..0365643029ae 100644 --- a/drivers/hwmon/shtc1.c +++ b/drivers/hwmon/shtc1.c @@ -238,7 +238,7 @@ static int shtc1_probe(struct i2c_client *client) if (np) { data->setup.blocking_io = of_property_read_bool(np, "sensirion,blocking-io"); - data->setup.high_precision = !of_property_read_bool(np, "sensicon,low-precision"); + data->setup.high_precision = !of_property_read_bool(np, "sensirion,low-precision"); } else { if (client->dev.platform_data) data->setup = *(struct shtc1_platform_data *)dev->platform_data; From bdd0aa055b8ec7e24bbc19513f3231958741d0ab Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Tue, 14 May 2024 20:27:36 +0200 Subject: [PATCH 0243/1565] ALSA: timer: Set lower bound of start tick time commit 4a63bd179fa8d3fcc44a0d9d71d941ddd62f0c4e upstream. Currently ALSA timer doesn't have the lower limit of the start tick time, and it allows a very small size, e.g. 1 tick with 1ns resolution for hrtimer. Such a situation may lead to an unexpected RCU stall, where the callback repeatedly queuing the expire update, as reported by fuzzer. This patch introduces a sanity check of the timer start tick time, so that the system returns an error when a too small start size is set. As of this patch, the lower limit is hard-coded to 100us, which is small enough but can still work somehow. Reported-by: syzbot+43120c2af6ca2938cc38@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/000000000000fa00a1061740ab6d@google.com Cc: Link: https://lore.kernel.org/r/20240514182745.4015-1-tiwai@suse.de Signed-off-by: Takashi Iwai [ backport note: the error handling is changed, as the original commit is based on the recent cleanup with guard() in commit beb45974dd49 -- tiwai ] Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/timer.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/sound/core/timer.c b/sound/core/timer.c index 764d2b19344e..708c9a46eefe 100644 --- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -553,6 +553,16 @@ static int snd_timer_start1(struct snd_timer_instance *timeri, goto unlock; } + /* check the actual time for the start tick; + * bail out as error if it's way too low (< 100us) + */ + if (start) { + if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) { + result = -EINVAL; + goto unlock; + } + } + if (start) timeri->ticks = timeri->cticks = ticks; else if (!timeri->cticks) From 6752dfcfff3ac3e16625ebd3f0ad9630900e7e76 Mon Sep 17 00:00:00 2001 From: Dongli Zhang Date: Wed, 22 May 2024 15:02:18 -0700 Subject: [PATCH 0244/1565] genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline commit a6c11c0a5235fb144a65e0cb2ffd360ddc1f6c32 upstream. The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of interrupt affinity reconfiguration via procfs. Instead, the change is deferred until the next instance of the interrupt being triggered on the original CPU. When the interrupt next triggers on the original CPU, the new affinity is enforced within __irq_move_irq(). A vector is allocated from the new CPU, but the old vector on the original CPU remains and is not immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming process is delayed until the next trigger of the interrupt on the new CPU. Upon the subsequent triggering of the interrupt on the new CPU, irq_complete_move() adds a task to the old CPU's vector_cleanup list if it remains online. Subsequently, the timer on the old CPU iterates over its vector_cleanup list, reclaiming old vectors. However, a rare scenario arises if the old CPU is outgoing before the interrupt triggers again on the new CPU. In that case irq_force_complete_move() is not invoked on the outgoing CPU to reclaim the old apicd->prev_vector because the interrupt isn't currently affine to the outgoing CPU, and irq_needs_fixup() returns false. Even though __vector_schedule_cleanup() is later called on the new CPU, it doesn't reclaim apicd->prev_vector; instead, it simply resets both apicd->move_in_progress and apicd->prev_vector to 0. As a result, the vector remains unreclaimed in vector_matrix, leading to a CPU vector leak. To address this issue, move the invocation of irq_force_complete_move() before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the interrupt is currently or used to be affine to the outgoing CPU. Additionally, reclaim the vector in __vector_schedule_cleanup() as well, following a warning message, although theoretically it should never see apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU. Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress") Signed-off-by: Dongli Zhang Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/apic/vector.c | 9 ++++++--- kernel/irq/cpuhotplug.c | 16 ++++++++-------- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index bd557e9f5dd8..167c9df27ae7 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -920,7 +920,8 @@ static void __send_cleanup_vector(struct apic_chip_data *apicd) hlist_add_head(&apicd->clist, per_cpu_ptr(&cleanup_list, cpu)); apic->send_IPI(cpu, IRQ_MOVE_CLEANUP_VECTOR); } else { - apicd->prev_vector = 0; + pr_warn("IRQ %u schedule cleanup for offline CPU %u\n", apicd->irq, cpu); + free_moved_vector(apicd); } raw_spin_unlock(&vector_lock); } @@ -957,6 +958,7 @@ void irq_complete_move(struct irq_cfg *cfg) */ void irq_force_complete_move(struct irq_desc *desc) { + unsigned int cpu = smp_processor_id(); struct apic_chip_data *apicd; struct irq_data *irqd; unsigned int vector; @@ -981,10 +983,11 @@ void irq_force_complete_move(struct irq_desc *desc) goto unlock; /* - * If prev_vector is empty, no action required. + * If prev_vector is empty or the descriptor is neither currently + * nor previously on the outgoing CPU no action required. */ vector = apicd->prev_vector; - if (!vector) + if (!vector || (apicd->cpu != cpu && apicd->prev_cpu != cpu)) goto unlock; /* diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c index 02236b13b359..40c6512bc163 100644 --- a/kernel/irq/cpuhotplug.c +++ b/kernel/irq/cpuhotplug.c @@ -69,6 +69,14 @@ static bool migrate_one_irq(struct irq_desc *desc) return false; } + /* + * Complete an eventually pending irq move cleanup. If this + * interrupt was moved in hard irq context, then the vectors need + * to be cleaned up. It can't wait until this interrupt actually + * happens and this CPU was involved. + */ + irq_force_complete_move(desc); + /* * No move required, if: * - Interrupt is per cpu @@ -87,14 +95,6 @@ static bool migrate_one_irq(struct irq_desc *desc) return false; } - /* - * Complete an eventually pending irq move cleanup. If this - * interrupt was moved in hard irq context, then the vectors need - * to be cleaned up. It can't wait until this interrupt actually - * happens and this CPU was involved. - */ - irq_force_complete_move(desc); - /* * If there is a setaffinity pending, then try to reuse the pending * mask, so the last change of the affinity does not get lost. If From 0cf6693d3f8ed5b733c6df1989fc43cfc07bce05 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Mon, 12 Jun 2023 15:58:37 +0200 Subject: [PATCH 0245/1565] media: cec: core: add adap_nb_transmit_canceled() callback commit da53c36ddd3f118a525a04faa8c47ca471e6c467 upstream. A potential deadlock was found by Zheng Zhang with a local syzkaller instance. The problem is that when a non-blocking CEC transmit is canceled by calling cec_data_cancel, that in turn can call the high-level received() driver callback, which can call cec_transmit_msg() to transmit a new message. The cec_data_cancel() function is called with the adap->lock mutex held, and cec_transmit_msg() tries to take that same lock. The root cause is that the received() callback can either be used to pass on a received message (and then adap->lock is not held), or to report a canceled transmit (and then adap->lock is held). This is confusing, so create a new low-level adap_nb_transmit_canceled callback that reports back that a non-blocking transmit was canceled. And the received() callback is only called when a message is received, as was the case before commit f9d0ecbf56f4 ("media: cec: correctly pass on reply results") complicated matters. Reported-by: Zheng Zhang Signed-off-by: Hans Verkuil Fixes: f9d0ecbf56f4 ("media: cec: correctly pass on reply results") Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 4 ++-- include/media/cec.h | 6 ++++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index a5853c82d5e4..73cc2b0fe185 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -397,8 +397,8 @@ static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) cec_queue_msg_monitor(adap, &data->msg, 1); if (!data->blocking && data->msg.sequence) - /* Allow drivers to process the message first */ - call_op(adap, received, &data->msg); + /* Allow drivers to react to a canceled transmit */ + call_void_op(adap, adap_nb_transmit_canceled, &data->msg); cec_data_completed(data); } diff --git a/include/media/cec.h b/include/media/cec.h index 23202bf439b4..38eb9334d854 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -120,14 +120,16 @@ struct cec_adap_ops { int (*adap_log_addr)(struct cec_adapter *adap, u8 logical_addr); int (*adap_transmit)(struct cec_adapter *adap, u8 attempts, u32 signal_free_time, struct cec_msg *msg); + void (*adap_nb_transmit_canceled)(struct cec_adapter *adap, + const struct cec_msg *msg); void (*adap_status)(struct cec_adapter *adap, struct seq_file *file); void (*adap_free)(struct cec_adapter *adap); - /* Error injection callbacks */ + /* Error injection callbacks, called without adap->lock held */ int (*error_inj_show)(struct cec_adapter *adap, struct seq_file *sf); bool (*error_inj_parse_line)(struct cec_adapter *adap, char *line); - /* High-level CEC message callback */ + /* High-level CEC message callback, called without adap->lock held */ int (*received)(struct cec_adapter *adap, struct cec_msg *msg); }; From 4cefcd0af7458bdeff56a9d8dfc6868ce23d128a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 2 Jun 2024 18:15:25 -0400 Subject: [PATCH 0246/1565] SUNRPC: Fix loop termination condition in gss_free_in_token_pages() commit 4a77c3dead97339478c7422eb07bf4bf63577008 upstream. The in_token->pages[] array is not NULL terminated. This results in the following KASAN splat: KASAN: maybe wild-memory-access in range [0x04a2013400000008-0x04a201340000000f] Fixes: bafa6b4d95d9 ("SUNRPC: Fix gss_free_in_token_pages()") Reviewed-by: Benjamin Coddington Signed-off-by: Chuck Lever Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/auth_gss/svcauth_gss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 23e6ac9cce11..784c8b24f164 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1156,7 +1156,7 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp, } pages = DIV_ROUND_UP(inlen, PAGE_SIZE); - in_token->pages = kcalloc(pages, sizeof(struct page *), GFP_KERNEL); + in_token->pages = kcalloc(pages + 1, sizeof(struct page *), GFP_KERNEL); if (!in_token->pages) { kfree(in_handle->data); return SVC_DENIED; From 75805481c35d05d6d56db175d8f020c2948a3e3a Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Sun, 21 Apr 2024 17:37:49 +0000 Subject: [PATCH 0247/1565] binder: fix max_thread type inconsistency MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 42316941335644a98335f209daafa4c122f28983 upstream. The type defined for the BINDER_SET_MAX_THREADS ioctl was changed from size_t to __u32 in order to avoid incompatibility issues between 32 and 64-bit kernels. However, the internal types used to copy from user and store the value were never updated. Use u32 to fix the inconsistency. Fixes: a9350fc859ae ("staging: android: binder: fix BINDER_SET_MAX_THREADS declaration") Reported-by: Arve Hjønnevåg Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas Reviewed-by: Alice Ryhl Link: https://lore.kernel.org/r/20240421173750.3117808-1-cmllamas@google.com [cmllamas: resolve minor conflicts due to missing commit 421518a2740f] Signed-off-by: Carlos Llamas Signed-off-by: Greg Kroah-Hartman --- drivers/android/binder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index bcbaa4d6a0ff..6631a65f632b 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -476,7 +476,7 @@ struct binder_proc { struct list_head todo; struct binder_stats stats; struct list_head delivered_death; - int max_threads; + u32 max_threads; int requested_threads; int requested_threads_started; int tmp_ref; @@ -5408,7 +5408,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) goto err; break; case BINDER_SET_MAX_THREADS: { - int max_threads; + u32 max_threads; if (copy_from_user(&max_threads, ubuf, sizeof(max_threads))) { From 54e8f88d2baa104dee179ebd3c3325b93383e7d2 Mon Sep 17 00:00:00 2001 From: Jorge Ramirez-Ortiz Date: Wed, 3 Jan 2024 12:29:11 +0100 Subject: [PATCH 0248/1565] mmc: core: Do not force a retune before RPMB switch commit 67380251e8bbd3302c64fea07f95c31971b91c22 upstream. Requesting a retune before switching to the RPMB partition has been observed to cause CRC errors on the RPMB reads (-EILSEQ). Since RPMB reads can not be retried, the clients would be directly affected by the errors. This commit disables the retune request prior to switching to the RPMB partition: mmc_retune_pause() no longer triggers a retune before the pause period begins. This was verified with the sdhci-of-arasan driver (ZynqMP) configured for HS200 using two separate eMMC cards (DG4064 and 064GB2). In both cases, the error was easy to reproduce triggering every few tenths of reads. With this commit, systems that were utilizing OP-TEE to access RPMB variables will experience an enhanced performance. Specifically, when OP-TEE is configured to employ RPMB as a secure storage solution, it not only writes the data but also the secure filesystem within the partition. As a result, retrieving any variable involves multiple RPMB reads, typically around five. For context, on ZynqMP, each retune request consumed approximately 8ms. Consequently, reading any RPMB variable used to take at the very minimum 40ms. After droping the need to retune before switching to the RPMB partition, this is no longer the case. Signed-off-by: Jorge Ramirez-Ortiz Acked-by: Avri Altman Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240103112911.2954632-1-jorge@foundries.io Signed-off-by: Ulf Hansson Signed-off-by: Florian Fainelli Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/host.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index b949a4468bf5..7ba1343ca5c1 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -114,13 +114,12 @@ void mmc_retune_enable(struct mmc_host *host) /* * Pause re-tuning for a small set of operations. The pause begins after the - * next command and after first doing re-tuning. + * next command. */ void mmc_retune_pause(struct mmc_host *host) { if (!host->retune_paused) { host->retune_paused = 1; - mmc_retune_needed(host); mmc_retune_hold(host); } } From b6920325aca03138ef9ba759ce0d4e6fbb5c9763 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 10 May 2024 11:50:27 +0800 Subject: [PATCH 0249/1565] io_uring: fail NOP if non-zero op flags is passed in commit 3d8f874bd620ce03f75a5512847586828ab86544 upstream. The NOP op flags should have been checked from beginning like any other opcode, otherwise NOP may not be extended with the op flags. Given both liburing and Rust io-uring crate always zeros SQE op flags, just ignore users which play raw NOP uring interface without zeroing SQE, because NOP is just for test purpose. Then we can save one NOP2 opcode. Suggested-by: Jens Axboe Fixes: 2b188cc1bb85 ("Add io_uring IO interface") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20240510035031.78874-2-ming.lei@redhat.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- io_uring/io_uring.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 93f9ecedc59f..47bc8fe2b945 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -6474,6 +6474,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { switch (req->opcode) { case IORING_OP_NOP: + if (READ_ONCE(sqe->rw_flags)) + return -EINVAL; return 0; case IORING_OP_READV: case IORING_OP_READ_FIXED: From e31fe702ed0891ee2194fffa08c6333b50ff8eb5 Mon Sep 17 00:00:00 2001 From: Marc Dionne Date: Fri, 24 May 2024 17:17:55 +0100 Subject: [PATCH 0250/1565] afs: Don't cross .backup mountpoint from backup volume commit 29be9100aca2915fab54b5693309bc42956542e5 upstream. Don't cross a mountpoint that explicitly specifies a backup volume (target is .backup) when starting from a backup volume. It it not uncommon to mount a volume's backup directly in the volume itself. This can cause tools that are not paying attention to get into a loop mounting the volume onto itself as they attempt to traverse the tree, leading to a variety of problems. This doesn't prevent the general case of loops in a sequence of mountpoints, but addresses a common special case in the same way as other afs clients. Reported-by: Jan Henrik Sylvester Link: http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html Reported-by: Markus Suvanto Link: http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.html Signed-off-by: Marc Dionne Signed-off-by: David Howells Link: https://lore.kernel.org/r/768760.1716567475@warthog.procyon.org.uk Reviewed-by: Jeffrey Altman cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner Signed-off-by: Greg Kroah-Hartman --- fs/afs/mntpt.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c index bbb2c210d139..fa8a6543142d 100644 --- a/fs/afs/mntpt.c +++ b/fs/afs/mntpt.c @@ -146,6 +146,11 @@ static int afs_mntpt_set_params(struct fs_context *fc, struct dentry *mntpt) put_page(page); if (ret < 0) return ret; + + /* Don't cross a backup volume mountpoint from a backup volume */ + if (src_as->volume && src_as->volume->type == AFSVL_BACKVOL && + ctx->type == AFSVL_BACKVOL) + return -ENODEV; } return 0; From 67fa90d4a2ccd9ebb0e1e168c7d0b5d0cf3c7148 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Mon, 20 May 2024 22:26:19 +0900 Subject: [PATCH 0251/1565] nilfs2: fix use-after-free of timer for log writer thread commit f5d4e04634c9cf68bdf23de08ada0bb92e8befe7 upstream. Patch series "nilfs2: fix log writer related issues". This bug fix series covers three nilfs2 log writer-related issues, including a timer use-after-free issue and potential deadlock issue on unmount, and a potential freeze issue in event synchronization found during their analysis. Details are described in each commit log. This patch (of 3): A use-after-free issue has been reported regarding the timer sc_timer on the nilfs_sc_info structure. The problem is that even though it is used to wake up a sleeping log writer thread, sc_timer is not shut down until the nilfs_sc_info structure is about to be freed, and is used regardless of the thread's lifetime. Fix this issue by limiting the use of sc_timer only while the log writer thread is alive. Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info") Signed-off-by: Ryusuke Konishi Reported-by: "Bai, Shuangpeng" Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/segment.c | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 3997f377f0e3..2eeae67ad298 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2164,8 +2164,10 @@ static void nilfs_segctor_start_timer(struct nilfs_sc_info *sci) { spin_lock(&sci->sc_state_lock); if (!(sci->sc_state & NILFS_SEGCTOR_COMMIT)) { - sci->sc_timer.expires = jiffies + sci->sc_interval; - add_timer(&sci->sc_timer); + if (sci->sc_task) { + sci->sc_timer.expires = jiffies + sci->sc_interval; + add_timer(&sci->sc_timer); + } sci->sc_state |= NILFS_SEGCTOR_COMMIT; } spin_unlock(&sci->sc_state_lock); @@ -2385,10 +2387,21 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, */ static void nilfs_segctor_accept(struct nilfs_sc_info *sci) { + bool thread_is_alive; + spin_lock(&sci->sc_state_lock); sci->sc_seq_accepted = sci->sc_seq_request; + thread_is_alive = (bool)sci->sc_task; spin_unlock(&sci->sc_state_lock); - del_timer_sync(&sci->sc_timer); + + /* + * This function does not race with the log writer thread's + * termination. Therefore, deleting sc_timer, which should not be + * done after the log writer thread exits, can be done safely outside + * the area protected by sc_state_lock. + */ + if (thread_is_alive) + del_timer_sync(&sci->sc_timer); } /** @@ -2414,7 +2427,7 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci, int mode, int err) sci->sc_flush_request &= ~FLUSH_DAT_BIT; /* re-enable timer if checkpoint creation was not done */ - if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) && + if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) && sci->sc_task && time_before(jiffies, sci->sc_timer.expires)) add_timer(&sci->sc_timer); } @@ -2604,6 +2617,7 @@ static int nilfs_segctor_thread(void *arg) int timeout = 0; sci->sc_timer_task = current; + timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0); /* start sync. */ sci->sc_task = current; @@ -2670,6 +2684,7 @@ static int nilfs_segctor_thread(void *arg) end_thread: /* end sync. */ sci->sc_task = NULL; + del_timer_sync(&sci->sc_timer); wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */ spin_unlock(&sci->sc_state_lock); return 0; @@ -2733,7 +2748,6 @@ static struct nilfs_sc_info *nilfs_segctor_new(struct super_block *sb, INIT_LIST_HEAD(&sci->sc_gc_inodes); INIT_LIST_HEAD(&sci->sc_iput_queue); INIT_WORK(&sci->sc_iput_work, nilfs_iput_work_func); - timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0); sci->sc_interval = HZ * NILFS_SC_DEFAULT_TIMEOUT; sci->sc_mjcp_freq = HZ * NILFS_SC_DEFAULT_SR_FREQ; @@ -2819,7 +2833,6 @@ static void nilfs_segctor_destroy(struct nilfs_sc_info *sci) down_write(&nilfs->ns_segctor_sem); - del_timer_sync(&sci->sc_timer); kfree(sci); } From 3ee36f0048a3e5b0750bcbff87ae21093f29b21e Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 3 Jun 2024 10:59:26 +0200 Subject: [PATCH 0252/1565] vxlan: Fix regression when dropping packets due to invalid src addresses commit 1cd4bc987abb2823836cbb8f887026011ccddc8a upstream. Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") has recently been added to vxlan mainly in the context of source address snooping/learning so that when it is enabled, an entry in the FDB is not being created for an invalid address for the corresponding tunnel endpoint. Before commit f58f45c1e5b9 vxlan was similarly behaving as geneve in that it passed through whichever macs were set in the L2 header. It turns out that this change in behavior breaks setups, for example, Cilium with netkit in L3 mode for Pods as well as tunnel mode has been passing before the change in f58f45c1e5b9 for both vxlan and geneve. After mentioned change it is only passing for geneve as in case of vxlan packets are dropped due to vxlan_set_mac() returning false as source and destination macs are zero which for E/W traffic via tunnel is totally fine. Fix it by only opting into the is_valid_ether_addr() check in vxlan_set_mac() when in fact source address snooping/learning is actually enabled in vxlan. This is done by moving the check into vxlan_snoop(). With this change, the Cilium connectivity test suite passes again for both tunnel flavors. Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") Signed-off-by: Daniel Borkmann Cc: David Bauer Cc: Ido Schimmel Cc: Nikolay Aleksandrov Cc: Martin KaFai Lau Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: David Bauer Signed-off-by: David S. Miller [ Backport note: vxlan snooping/learning not supported in 6.8 or older, so commit is simply a revert. ] Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- drivers/net/vxlan/vxlan_core.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index b173497a3e0c..3096769e718e 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -1778,10 +1778,6 @@ static bool vxlan_set_mac(struct vxlan_dev *vxlan, if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr)) return false; - /* Ignore packets from invalid src-address */ - if (!is_valid_ether_addr(eth_hdr(skb)->h_source)) - return false; - /* Get address from the outer IP header */ if (vxlan_get_sk_family(vs) == AF_INET) { saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr; From a447f26830383f13ffffcb8a75028d0b21ba62cd Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 29 Apr 2024 10:00:51 +0200 Subject: [PATCH 0253/1565] x86/mm: Remove broken vsyscall emulation code from the page fault code commit 02b670c1f88e78f42a6c5aee155c7b26960ca054 upstream. The syzbot-reported stack trace from hell in this discussion thread actually has three nested page faults: https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com ... and I think that's actually the important thing here: - the first page fault is from user space, and triggers the vsyscall emulation. - the second page fault is from __do_sys_gettimeofday(), and that should just have caused the exception that then sets the return value to -EFAULT - the third nested page fault is due to _raw_spin_unlock_irqrestore() -> preempt_schedule() -> trace_sched_switch(), which then causes a BPF trace program to run, which does that bpf_probe_read_compat(), which causes that page fault under pagefault_disable(). It's quite the nasty backtrace, and there's a lot going on. The problem is literally the vsyscall emulation, which sets current->thread.sig_on_uaccess_err = 1; and that causes the fixup_exception() code to send the signal *despite* the exception being caught. And I think that is in fact completely bogus. It's completely bogus exactly because it sends that signal even when it *shouldn't* be sent - like for the BPF user mode trace gathering. In other words, I think the whole "sig_on_uaccess_err" thing is entirely broken, because it makes any nested page-faults do all the wrong things. Now, arguably, I don't think anybody should enable vsyscall emulation any more, but this test case clearly does. I think we should just make the "send SIGSEGV" be something that the vsyscall emulation does on its own, not this broken per-thread state for something that isn't actually per thread. The x86 page fault code actually tried to deal with the "incorrect nesting" by having that: if (in_interrupt()) return; which ignores the sig_on_uaccess_err case when it happens in interrupts, but as shown by this example, these nested page faults do not need to be about interrupts at all. IOW, I think the only right thing is to remove that horrendously broken code. The attached patch looks like the ObviouslyCorrect(tm) thing to do. NOTE! This broken code goes back to this commit in 2011: 4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults") ... and back then the reason was to get all the siginfo details right. Honestly, I do not for a moment believe that it's worth getting the siginfo details right here, but part of the commit says: This fixes issues with UML when vsyscall=emulate. ... and so my patch to remove this garbage will probably break UML in this situation. I do not believe that anybody should be running with vsyscall=emulate in 2024 in the first place, much less if you are doing things like UML. But let's see if somebody screams. Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds Signed-off-by: Ingo Molnar Tested-by: Jiri Olsa Acked-by: Andy Lutomirski Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com Signed-off-by: Sasha Levin [gpiccoli: Backport the patch due to differences in the trees. The main change between 5.10.y and 5.15.y is due to renaming the fixup function, by commit 6456a2a69ee1 ("x86/fault: Rename no_context() to kernelmode_fixup_or_oops()"). Following 2 commits cause divergence in the diffs too (in the removed lines): cd072dab453a ("x86/fault: Add a helper function to sanitize error code") d4ffd5df9d18 ("x86/fault: Fix wrong signal when vsyscall fails with pkey") Finally, there is context adjustment in the processor.h file.] Signed-off-by: Guilherme G. Piccoli Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/vsyscall/vsyscall_64.c | 28 ++------------------------- arch/x86/include/asm/processor.h | 1 - arch/x86/mm/fault.c | 27 +------------------------- 3 files changed, 3 insertions(+), 53 deletions(-) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 44c33103a955..f0b817eb6e8b 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -98,11 +98,6 @@ static int addr_to_vsyscall_nr(unsigned long addr) static bool write_ok_or_segv(unsigned long ptr, size_t size) { - /* - * XXX: if access_ok, get_user, and put_user handled - * sig_on_uaccess_err, this could go away. - */ - if (!access_ok((void __user *)ptr, size)) { struct thread_struct *thread = ¤t->thread; @@ -120,10 +115,8 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size) bool emulate_vsyscall(unsigned long error_code, struct pt_regs *regs, unsigned long address) { - struct task_struct *tsk; unsigned long caller; int vsyscall_nr, syscall_nr, tmp; - int prev_sig_on_uaccess_err; long ret; unsigned long orig_dx; @@ -172,8 +165,6 @@ bool emulate_vsyscall(unsigned long error_code, goto sigsegv; } - tsk = current; - /* * Check for access_ok violations and find the syscall nr. * @@ -233,12 +224,8 @@ bool emulate_vsyscall(unsigned long error_code, goto do_ret; /* skip requested */ /* - * With a real vsyscall, page faults cause SIGSEGV. We want to - * preserve that behavior to make writing exploits harder. + * With a real vsyscall, page faults cause SIGSEGV. */ - prev_sig_on_uaccess_err = current->thread.sig_on_uaccess_err; - current->thread.sig_on_uaccess_err = 1; - ret = -EFAULT; switch (vsyscall_nr) { case 0: @@ -261,23 +248,12 @@ bool emulate_vsyscall(unsigned long error_code, break; } - current->thread.sig_on_uaccess_err = prev_sig_on_uaccess_err; - check_fault: if (ret == -EFAULT) { /* Bad news -- userspace fed a bad pointer to a vsyscall. */ warn_bad_vsyscall(KERN_INFO, regs, "vsyscall fault (exploit attempt?)"); - - /* - * If we failed to generate a signal for any reason, - * generate one here. (This should be impossible.) - */ - if (WARN_ON_ONCE(!sigismember(&tsk->pending.signal, SIGBUS) && - !sigismember(&tsk->pending.signal, SIGSEGV))) - goto sigsegv; - - return true; /* Don't emulate the ret. */ + goto sigsegv; } regs->ax = ret; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 6dc3c5f0be07..c682a14299e0 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -528,7 +528,6 @@ struct thread_struct { unsigned long iopl_emul; unsigned int iopl_warn:1; - unsigned int sig_on_uaccess_err:1; /* Floating point and extended processor state */ struct fpu fpu; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index cdb337cf92ba..98a5924d98b7 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -649,33 +649,8 @@ no_context(struct pt_regs *regs, unsigned long error_code, } /* Are we prepared to handle this kernel fault? */ - if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) { - /* - * Any interrupt that takes a fault gets the fixup. This makes - * the below recursive fault logic only apply to a faults from - * task context. - */ - if (in_interrupt()) - return; - - /* - * Per the above we're !in_interrupt(), aka. task context. - * - * In this case we need to make sure we're not recursively - * faulting through the emulate_vsyscall() logic. - */ - if (current->thread.sig_on_uaccess_err && signal) { - set_signal_archinfo(address, error_code); - - /* XXX: hwpoison faults will set the wrong code. */ - force_sig_fault(signal, si_code, (void __user *)address); - } - - /* - * Barring that, we can do the fixup and be happy. - */ + if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) return; - } #ifdef CONFIG_VMAP_STACK /* From 9c1c2ea0996d6d4068a73c7d9ca7da2008d1c52c Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 23 Jan 2024 23:45:32 +0100 Subject: [PATCH 0254/1565] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV commit 776d451648443f9884be4a1b4e38e8faf1c621f9 upstream. Bail out on using the tunnel dst template from other than netdev family. Add the infrastructure to check for the family in objects. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso [KN: Backport patch according to v5.10.x source] Signed-off-by: Kuntal Nayak Signed-off-by: Greg Kroah-Hartman --- include/net/netfilter/nf_tables.h | 2 ++ net/netfilter/nf_tables_api.c | 14 +++++++++----- net/netfilter/nft_tunnel.c | 1 + 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 2da11d8c0f45..ab8d84775ca8 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -1174,6 +1174,7 @@ void nft_obj_notify(struct net *net, const struct nft_table *table, * @type: stateful object numeric type * @owner: module owner * @maxattr: maximum netlink attribute + * @family: address family for AF-specific object types * @policy: netlink attribute policy */ struct nft_object_type { @@ -1183,6 +1184,7 @@ struct nft_object_type { struct list_head list; u32 type; unsigned int maxattr; + u8 family; struct module *owner; const struct nla_policy *policy; }; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 858d09b54eaa..de56f25dcda1 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -6234,11 +6234,15 @@ static int nft_object_dump(struct sk_buff *skb, unsigned int attr, return -1; } -static const struct nft_object_type *__nft_obj_type_get(u32 objtype) +static const struct nft_object_type *__nft_obj_type_get(u32 objtype, u8 family) { const struct nft_object_type *type; list_for_each_entry(type, &nf_tables_objects, list) { + if (type->family != NFPROTO_UNSPEC && + type->family != family) + continue; + if (objtype == type->type) return type; } @@ -6246,11 +6250,11 @@ static const struct nft_object_type *__nft_obj_type_get(u32 objtype) } static const struct nft_object_type * -nft_obj_type_get(struct net *net, u32 objtype) +nft_obj_type_get(struct net *net, u32 objtype, u8 family) { const struct nft_object_type *type; - type = __nft_obj_type_get(objtype); + type = __nft_obj_type_get(objtype, family); if (type != NULL && try_module_get(type->owner)) return type; @@ -6343,7 +6347,7 @@ static int nf_tables_newobj(struct net *net, struct sock *nlsk, if (nlh->nlmsg_flags & NLM_F_REPLACE) return -EOPNOTSUPP; - type = __nft_obj_type_get(objtype); + type = __nft_obj_type_get(objtype, family); nft_ctx_init(&ctx, net, skb, nlh, family, table, NULL, nla); return nf_tables_updobj(&ctx, type, nla[NFTA_OBJ_DATA], obj); @@ -6354,7 +6358,7 @@ static int nf_tables_newobj(struct net *net, struct sock *nlsk, if (!nft_use_inc(&table->use)) return -EMFILE; - type = nft_obj_type_get(net, objtype); + type = nft_obj_type_get(net, objtype, family); if (IS_ERR(type)) { err = PTR_ERR(type); goto err_type; diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c index 2ee50996da8c..c8822fa8196d 100644 --- a/net/netfilter/nft_tunnel.c +++ b/net/netfilter/nft_tunnel.c @@ -684,6 +684,7 @@ static const struct nft_object_ops nft_tunnel_obj_ops = { static struct nft_object_type nft_tunnel_obj_type __read_mostly = { .type = NFT_OBJECT_TUNNEL, + .family = NFPROTO_NETDEV, .ops = &nft_tunnel_obj_ops, .maxattr = NFTA_TUNNEL_KEY_MAX, .policy = nft_tunnel_key_policy, From cade34279c2249eafe528564bd2e203e4ff15f88 Mon Sep 17 00:00:00 2001 From: Ziyang Xuan Date: Sun, 7 Apr 2024 14:56:05 +0800 Subject: [PATCH 0255/1565] netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get() commit d78d867dcea69c328db30df665be5be7d0148484 upstream. nft_unregister_obj() can concurrent with __nft_obj_type_get(), and there is not any protection when iterate over nf_tables_objects list in __nft_obj_type_get(). Therefore, there is potential data-race of nf_tables_objects list entry. Use list_for_each_entry_rcu() to iterate over nf_tables_objects list in __nft_obj_type_get(), and use rcu_read_lock() in the caller nft_obj_type_get() to protect the entire type query process. Fixes: e50092404c1b ("netfilter: nf_tables: add stateful objects") Signed-off-by: Ziyang Xuan Signed-off-by: Pablo Neira Ayuso Signed-off-by: Kuntal Nayak Signed-off-by: Greg Kroah-Hartman --- net/netfilter/nf_tables_api.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index de56f25dcda1..f3cb5c920276 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -6238,7 +6238,7 @@ static const struct nft_object_type *__nft_obj_type_get(u32 objtype, u8 family) { const struct nft_object_type *type; - list_for_each_entry(type, &nf_tables_objects, list) { + list_for_each_entry_rcu(type, &nf_tables_objects, list) { if (type->family != NFPROTO_UNSPEC && type->family != family) continue; @@ -6254,9 +6254,13 @@ nft_obj_type_get(struct net *net, u32 objtype, u8 family) { const struct nft_object_type *type; + rcu_read_lock(); type = __nft_obj_type_get(objtype, family); - if (type != NULL && try_module_get(type->owner)) + if (type != NULL && try_module_get(type->owner)) { + rcu_read_unlock(); return type; + } + rcu_read_unlock(); lockdep_nfnl_nft_mutex_not_held(); #ifdef CONFIG_MODULES From 75c87e2ac6149abf44bdde0dd6d541763ddb0dff Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Thu, 25 Apr 2024 16:58:38 +0800 Subject: [PATCH 0256/1565] f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode() commit 20faaf30e55522bba2b56d9c46689233205d7717 upstream. syzbot reports a kernel bug as below: F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ================================================================== BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076 CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838 __do_sys_ioctl fs/ioctl.c:902 [inline] __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is we missed to do sanity check on i_xattr_nid during f2fs_iget(), so that in fiemap() path, current_nat_addr() will access nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering kasan bug report, fix it. Reported-and-tested-by: syzbot+3694e283cf5c40df6d14@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/00000000000094036c0616e72a1d@google.com Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/inode.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 3e98551f4186..724760353bcd 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -326,6 +326,12 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page) } } + if (fi->i_xattr_nid && f2fs_check_nid_range(sbi, fi->i_xattr_nid)) { + f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_xattr_nid: %u, run fsck to fix.", + __func__, inode->i_ino, fi->i_xattr_nid); + return false; + } + return true; } From b479fd59a1f4a342b69fce34f222d93bf791dca4 Mon Sep 17 00:00:00 2001 From: Zheyu Ma Date: Tue, 5 Apr 2022 10:50:18 +0100 Subject: [PATCH 0257/1565] media: lgdt3306a: Add a check against null-pointer-def commit c1115ddbda9c930fba0fdd062e7a8873ebaf898d upstream. The driver should check whether the client provides the platform_data. The following log reveals it: [ 29.610324] BUG: KASAN: null-ptr-deref in kmemdup+0x30/0x40 [ 29.610730] Read of size 40 at addr 0000000000000000 by task bash/414 [ 29.612820] Call Trace: [ 29.613030] [ 29.613201] dump_stack_lvl+0x56/0x6f [ 29.613496] ? kmemdup+0x30/0x40 [ 29.613754] print_report.cold+0x494/0x6b7 [ 29.614082] ? kmemdup+0x30/0x40 [ 29.614340] kasan_report+0x8a/0x190 [ 29.614628] ? kmemdup+0x30/0x40 [ 29.614888] kasan_check_range+0x14d/0x1d0 [ 29.615213] memcpy+0x20/0x60 [ 29.615454] kmemdup+0x30/0x40 [ 29.615700] lgdt3306a_probe+0x52/0x310 [ 29.616339] i2c_device_probe+0x951/0xa90 Link: https://lore.kernel.org/linux-media/20220405095018.3993578-1-zheyuma97@gmail.com Signed-off-by: Zheyu Ma Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/dvb-frontends/lgdt3306a.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/media/dvb-frontends/lgdt3306a.c b/drivers/media/dvb-frontends/lgdt3306a.c index 47fb22180d5b..d638cc88aa77 100644 --- a/drivers/media/dvb-frontends/lgdt3306a.c +++ b/drivers/media/dvb-frontends/lgdt3306a.c @@ -2213,6 +2213,11 @@ static int lgdt3306a_probe(struct i2c_client *client, struct dvb_frontend *fe; int ret; + if (!client->dev.platform_data) { + dev_err(&client->dev, "platform data is mandatory\n"); + return -EINVAL; + } + config = kmemdup(client->dev.platform_data, sizeof(struct lgdt3306a_config), GFP_KERNEL); if (config == NULL) { From 8112fa72b7f139052843ff484130d6f97e9f052f Mon Sep 17 00:00:00 2001 From: Bob Zhou Date: Tue, 23 Apr 2024 16:58:11 +0800 Subject: [PATCH 0258/1565] drm/amdgpu: add error handle to avoid out-of-bounds MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 8b2faf1a4f3b6c748c0da36cda865a226534d520 upstream. if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should be stop to avoid out-of-bounds read, so directly return -EINVAL. Signed-off-by: Bob Zhou Acked-by: Christian König Reviewed-by: Le Ma Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index dbcaef3f35da..c54834ed8751 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -2073,6 +2073,9 @@ static int sdma_v4_0_process_trap_irq(struct amdgpu_device *adev, DRM_DEBUG("IH: SDMA trap\n"); instance = sdma_v4_0_irq_id_to_seq(entry->client_id); + if (instance < 0) + return instance; + switch (entry->ring_id) { case 0: amdgpu_fence_process(&adev->sdma.instance[instance].ring); From 73485d6bd9d8fb16b09ea70367b7a86645da7282 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Sat, 4 May 2024 23:27:25 +0300 Subject: [PATCH 0259/1565] ata: pata_legacy: make legacy_exit() work again commit d4a89339f17c87c4990070e9116462d16e75894f upstream. Commit defc9cd826e4 ("pata_legacy: resychronize with upstream changes and resubmit") missed to update legacy_exit(), so that it now fails to do any cleanup -- the loop body there can never be entered. Fix that and finally remove now useless nr_legacy_host variable... Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Fixes: defc9cd826e4 ("pata_legacy: resychronize with upstream changes and resubmit") Cc: stable@vger.kernel.org Signed-off-by: Sergey Shtylyov Reviewed-by: Niklas Cassel Signed-off-by: Damien Le Moal Signed-off-by: Greg Kroah-Hartman --- drivers/ata/pata_legacy.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/ata/pata_legacy.c b/drivers/ata/pata_legacy.c index 4405d255e3aa..220739b309ea 100644 --- a/drivers/ata/pata_legacy.c +++ b/drivers/ata/pata_legacy.c @@ -114,8 +114,6 @@ static int legacy_port[NR_HOST] = { 0x1f0, 0x170, 0x1e8, 0x168, 0x1e0, 0x160 }; static struct legacy_probe probe_list[NR_HOST]; static struct legacy_data legacy_data[NR_HOST]; static struct ata_host *legacy_host[NR_HOST]; -static int nr_legacy_host; - static int probe_all; /* Set to check all ISA port ranges */ static int ht6560a; /* HT 6560A on primary 1, second 2, both 3 */ @@ -1239,9 +1237,11 @@ static __exit void legacy_exit(void) { int i; - for (i = 0; i < nr_legacy_host; i++) { + for (i = 0; i < NR_HOST; i++) { struct legacy_data *ld = &legacy_data[i]; - ata_host_detach(legacy_host[i]); + + if (legacy_host[i]) + ata_host_detach(legacy_host[i]); platform_device_unregister(ld->platform_dev); } } From e2c6a9b342c6d10c43fdab02f67953cc2369f8b1 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg Date: Mon, 22 Apr 2024 10:04:36 +0200 Subject: [PATCH 0260/1565] ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx commit c81bf14f9db68311c2e75428eea070d97d603975 upstream. Listed devices need the override for the keyboard to work. Signed-off-by: Christoffer Sandberg Signed-off-by: Werner Sembach Cc: All applicable Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/resource.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 42b1b06efda6..aa92ec4fe721 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -475,6 +475,18 @@ static const struct dmi_system_id asus_laptop[] = { DMI_MATCH(DMI_BOARD_NAME, "B2502CBA"), }, }, + { + /* TongFang GXxHRXx/TUXEDO InfinityBook Pro Gen9 AMD */ + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GXxHRXx"), + }, + }, + { + /* TongFang GMxHGxx/TUXEDO Stellaris Slim Gen1 AMD */ + .matches = { + DMI_MATCH(DMI_BOARD_NAME, "GMxHGxx"), + }, + }, { } }; From 5cd042835674b4c85b77075962c1b4c3956cd6e9 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Mon, 1 Apr 2024 16:08:54 +0200 Subject: [PATCH 0261/1565] arm64: tegra: Correct Tegra132 I2C alias commit 2633c58e1354d7de2c8e7be8bdb6f68a0a01bad7 upstream. There is no such device as "as3722@40", because its name is "pmic". Use phandles for aliases to fix relying on full node path. This corrects aliases for RTC devices and also fixes dtc W=1 warning: tegra132-norrin.dts:12.3-36: Warning (alias_paths): /aliases:rtc0: aliases property is not a valid node (/i2c@7000d000/as3722@40) Fixes: 0f279ebdf3ce ("arm64: tegra: Add NVIDIA Tegra132 Norrin support") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski Reviewed-by: Jon Hunter Signed-off-by: Thierry Reding Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/nvidia/tegra132-norrin.dts | 4 ++-- arch/arm64/boot/dts/nvidia/tegra132.dtsi | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra132-norrin.dts b/arch/arm64/boot/dts/nvidia/tegra132-norrin.dts index 6e5f8465669e..a5ff8cfedf34 100644 --- a/arch/arm64/boot/dts/nvidia/tegra132-norrin.dts +++ b/arch/arm64/boot/dts/nvidia/tegra132-norrin.dts @@ -9,8 +9,8 @@ / { compatible = "nvidia,norrin", "nvidia,tegra132", "nvidia,tegra124"; aliases { - rtc0 = "/i2c@7000d000/as3722@40"; - rtc1 = "/rtc@7000e000"; + rtc0 = &as3722; + rtc1 = &tegra_rtc; serial0 = &uarta; }; diff --git a/arch/arm64/boot/dts/nvidia/tegra132.dtsi b/arch/arm64/boot/dts/nvidia/tegra132.dtsi index b14e9f3bfdbd..2533d72fb2e5 100644 --- a/arch/arm64/boot/dts/nvidia/tegra132.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra132.dtsi @@ -573,7 +573,7 @@ spi@7000de00 { status = "disabled"; }; - rtc@7000e000 { + tegra_rtc: rtc@7000e000 { compatible = "nvidia,tegra124-rtc", "nvidia,tegra20-rtc"; reg = <0x0 0x7000e000 0x0 0x100>; interrupts = ; From 1f26711c084c29aedd1b28f427abc05e9158cae6 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 1 May 2024 09:52:01 +0200 Subject: [PATCH 0262/1565] arm64: dts: qcom: qcs404: fix bluetooth device address commit f5f390a77f18eaeb2c93211a1b7c5e66b5acd423 upstream. The 'local-bd-address' property is used to pass a unique Bluetooth device address from the boot firmware to the kernel and should otherwise be left unset so that the OS can prevent the controller from being used until a valid address has been provided through some other means (e.g. using btmgmt). Fixes: 60f77ae7d1c1 ("arm64: dts: qcom: qcs404-evb: Enable uart3 and add Bluetooth") Cc: stable@vger.kernel.org # 5.10 Signed-off-by: Johan Hovold Reviewed-by: Bryan O'Donoghue Link: https://lore.kernel.org/r/20240501075201.4732-1-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/qcom/qcs404-evb.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi b/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi index a80c578484ba..b6d70d0073e7 100644 --- a/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi +++ b/arch/arm64/boot/dts/qcom/qcs404-evb.dtsi @@ -60,7 +60,7 @@ bluetooth { vddrf-supply = <&vreg_l1_1p3>; vddch0-supply = <&vdd_ch0_3p3>; - local-bd-address = [ 02 00 00 00 5a ad ]; + local-bd-address = [ 00 00 00 00 00 00 ]; max-speed = <3200000>; }; From aa64464c8f4d2ab92f6d0b959a1e0767b829d787 Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Fri, 22 Mar 2024 16:10:05 +0800 Subject: [PATCH 0263/1565] md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING commit 151f66bb618d1fd0eeb84acb61b4a9fa5d8bb0fa upstream. Xiao reported that lvm2 test lvconvert-raid-takeover.sh can hang with small possibility, the root cause is exactly the same as commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"") However, Dan reported another hang after that, and junxiao investigated the problem and found out that this is caused by plugged bio can't issue from raid5d(). Current implementation in raid5d() has a weird dependence: 1) md_check_recovery() from raid5d() must hold 'reconfig_mutex' to clear MD_SB_CHANGE_PENDING; 2) raid5d() handles IO in a deadloop, until all IO are issued; 3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared; This behaviour is introduce before v2.6, and for consequence, if other context hold 'reconfig_mutex', and md_check_recovery() can't update super_block, then raid5d() will waste one cpu 100% by the deadloop, until 'reconfig_mutex' is released. Refer to the implementation from raid1 and raid10, fix this problem by skipping issue IO if MD_SB_CHANGE_PENDING is still set after md_check_recovery(), daemon thread will be woken up when 'reconfig_mutex' is released. Meanwhile, the hang problem will be fixed as well. Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d") Cc: stable@vger.kernel.org # v5.19+ Reported-and-tested-by: Dan Moulding Closes: https://lore.kernel.org/all/20240123005700.9302-1-dan@danm.net/ Investigated-by: Junxiao Bi Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20240322081005.1112401-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid5.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 9f114b9d8dc6..66167c4c7bc9 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -36,7 +36,6 @@ */ #include -#include #include #include #include @@ -6484,6 +6483,9 @@ static void raid5d(struct md_thread *thread) int batch_size, released; unsigned int offset; + if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) + break; + released = release_stripe_list(conf, conf->temp_inactive_list); if (released) clear_bit(R5_DID_ALLOC, &conf->cache_state); @@ -6520,18 +6522,7 @@ static void raid5d(struct md_thread *thread) spin_unlock_irq(&conf->device_lock); md_check_recovery(mddev); spin_lock_irq(&conf->device_lock); - - /* - * Waiting on MD_SB_CHANGE_PENDING below may deadlock - * seeing md_check_recovery() is needed to clear - * the flag when using mdmon. - */ - continue; } - - wait_event_lock_irq(mddev->sb_wait, - !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags), - conf->device_lock); } pr_debug("%d stripes handled\n", handled); From 8dffc574c765be44c94ea1e69991013f537645c9 Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Mon, 15 Apr 2024 23:59:05 +0300 Subject: [PATCH 0264/1565] wifi: rtl8xxxu: Fix the TX power of RTL8192CU, RTL8723AU commit 08b5d052d17a89bb8706b2888277d0b682dc1610 upstream. Don't subtract 1 from the power index. This was added in commit 2fc0b8e5a17d ("rtl8xxxu: Add TX power base values for gen1 parts") for unknown reasons. The vendor drivers don't do this. Also correct the calculations of values written to REG_OFDM0_X{C,D}_TX_IQ_IMBALANCE. According to the vendor driver, these are used for TX power training. With these changes rtl8xxxu sets the TX power of RTL8192CU the same as the vendor driver. None of this appears to have any effect on my RTL8192CU device. Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Reviewed-by: Ping-Ke Shih Signed-off-by: Ping-Ke Shih Link: https://msgid.link/6ae5945b-644e-45e4-a78f-4c7d9c987910@gmail.com Signed-off-by: Greg Kroah-Hartman --- .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 26 ++++++++----------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c index 9efc15e69ae8..5b27c22e7e58 100644 --- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c +++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include "rtl8xxxu.h" #include "rtl8xxxu_regs.h" @@ -1389,13 +1390,13 @@ rtl8xxxu_gen1_set_tx_power(struct rtl8xxxu_priv *priv, int channel, bool ht40) u8 cck[RTL8723A_MAX_RF_PATHS], ofdm[RTL8723A_MAX_RF_PATHS]; u8 ofdmbase[RTL8723A_MAX_RF_PATHS], mcsbase[RTL8723A_MAX_RF_PATHS]; u32 val32, ofdm_a, ofdm_b, mcs_a, mcs_b; - u8 val8; + u8 val8, base; int group, i; group = rtl8xxxu_gen1_channel_to_group(channel); - cck[0] = priv->cck_tx_power_index_A[group] - 1; - cck[1] = priv->cck_tx_power_index_B[group] - 1; + cck[0] = priv->cck_tx_power_index_A[group]; + cck[1] = priv->cck_tx_power_index_B[group]; if (priv->hi_pa) { if (cck[0] > 0x20) @@ -1406,10 +1407,6 @@ rtl8xxxu_gen1_set_tx_power(struct rtl8xxxu_priv *priv, int channel, bool ht40) ofdm[0] = priv->ht40_1s_tx_power_index_A[group]; ofdm[1] = priv->ht40_1s_tx_power_index_B[group]; - if (ofdm[0]) - ofdm[0] -= 1; - if (ofdm[1]) - ofdm[1] -= 1; ofdmbase[0] = ofdm[0] + priv->ofdm_tx_power_index_diff[group].a; ofdmbase[1] = ofdm[1] + priv->ofdm_tx_power_index_diff[group].b; @@ -1498,20 +1495,19 @@ rtl8xxxu_gen1_set_tx_power(struct rtl8xxxu_priv *priv, int channel, bool ht40) rtl8xxxu_write32(priv, REG_TX_AGC_A_MCS15_MCS12, mcs_a + power_base->reg_0e1c); + val8 = u32_get_bits(mcs_a + power_base->reg_0e1c, 0xff000000); for (i = 0; i < 3; i++) { - if (i != 2) - val8 = (mcsbase[0] > 8) ? (mcsbase[0] - 8) : 0; - else - val8 = (mcsbase[0] > 6) ? (mcsbase[0] - 6) : 0; + base = i != 2 ? 8 : 6; + val8 = max_t(int, val8 - base, 0); rtl8xxxu_write8(priv, REG_OFDM0_XC_TX_IQ_IMBALANCE + i, val8); } + rtl8xxxu_write32(priv, REG_TX_AGC_B_MCS15_MCS12, mcs_b + power_base->reg_0868); + val8 = u32_get_bits(mcs_b + power_base->reg_0868, 0xff000000); for (i = 0; i < 3; i++) { - if (i != 2) - val8 = (mcsbase[1] > 8) ? (mcsbase[1] - 8) : 0; - else - val8 = (mcsbase[1] > 6) ? (mcsbase[1] - 6) : 0; + base = i != 2 ? 8 : 6; + val8 = max_t(int, val8 - base, 0); rtl8xxxu_write8(priv, REG_OFDM0_XD_TX_IQ_IMBALANCE + i, val8); } } From 6d5bfcd2ccb535ae1d5b94a293954a9e424fefb8 Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Thu, 25 Apr 2024 21:12:38 +0300 Subject: [PATCH 0265/1565] wifi: rtlwifi: rtl8192de: Fix low speed with WPA3-SAE commit a7c0f48410f546772ac94a0f7b7291a15c4fc173 upstream. Some (all?) management frames are incorrectly reported to mac80211 as decrypted when actually the hardware did not decrypt them. This results in speeds 3-5 times lower than expected, 20-30 Mbps instead of 100 Mbps. Fix this by checking the encryption type field of the RX descriptor. rtw88 does the same thing. This fix was tested only with rtl8192du, which will use the same code. Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Signed-off-by: Ping-Ke Shih Link: https://msgid.link/4d600435-f0ea-46b0-bdb4-e60f173da8dd@gmail.com Signed-off-by: Greg Kroah-Hartman --- .../net/wireless/realtek/rtlwifi/rtl8192de/trx.c | 5 ++--- .../net/wireless/realtek/rtlwifi/rtl8192de/trx.h | 14 ++++++++++++++ 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c index 8944712274b5..55122aa7bbda 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c @@ -414,7 +414,8 @@ bool rtl92de_rx_query_desc(struct ieee80211_hw *hw, struct rtl_stats *stats, stats->icv = (u16)get_rx_desc_icv(pdesc); stats->crc = (u16)get_rx_desc_crc32(pdesc); stats->hwerror = (stats->crc | stats->icv); - stats->decrypted = !get_rx_desc_swdec(pdesc); + stats->decrypted = !get_rx_desc_swdec(pdesc) && + get_rx_desc_enc_type(pdesc) != RX_DESC_ENC_NONE; stats->rate = (u8)get_rx_desc_rxmcs(pdesc); stats->shortpreamble = (u16)get_rx_desc_splcp(pdesc); stats->isampdu = (bool)(get_rx_desc_paggr(pdesc) == 1); @@ -427,8 +428,6 @@ bool rtl92de_rx_query_desc(struct ieee80211_hw *hw, struct rtl_stats *stats, rx_status->band = hw->conf.chandef.chan->band; if (get_rx_desc_crc32(pdesc)) rx_status->flag |= RX_FLAG_FAILED_FCS_CRC; - if (!get_rx_desc_swdec(pdesc)) - rx_status->flag |= RX_FLAG_DECRYPTED; if (get_rx_desc_bw(pdesc)) rx_status->bw = RATE_INFO_BW_40; if (get_rx_desc_rxht(pdesc)) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h index d01578875cd5..6b1553239b0c 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h @@ -14,6 +14,15 @@ #define USB_HWDESC_HEADER_LEN 32 #define CRCLENGTH 4 +enum rtl92d_rx_desc_enc { + RX_DESC_ENC_NONE = 0, + RX_DESC_ENC_WEP40 = 1, + RX_DESC_ENC_TKIP_WO_MIC = 2, + RX_DESC_ENC_TKIP_MIC = 3, + RX_DESC_ENC_AES = 4, + RX_DESC_ENC_WEP104 = 5, +}; + /* macros to read/write various fields in RX or TX descriptors */ static inline void set_tx_desc_pkt_size(__le32 *__pdesc, u32 __val) @@ -246,6 +255,11 @@ static inline u32 get_rx_desc_drv_info_size(__le32 *__pdesc) return le32_get_bits(*__pdesc, GENMASK(19, 16)); } +static inline u32 get_rx_desc_enc_type(__le32 *__pdesc) +{ + return le32_get_bits(*__pdesc, GENMASK(22, 20)); +} + static inline u32 get_rx_desc_shift(__le32 *__pdesc) { return le32_get_bits(*__pdesc, GENMASK(25, 24)); From c539721e903f9e4a7ed4b2c6d27206a2e0eedd47 Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Thu, 25 Apr 2024 21:13:12 +0300 Subject: [PATCH 0266/1565] wifi: rtlwifi: rtl8192de: Fix endianness issue in RX path commit 2f228d364da95ab58f63a3fedc00d5b2b7db16ab upstream. Structs rx_desc_92d and rx_fwinfo_92d will not work for big endian systems. Delete rx_desc_92d because it's big and barely used, and instead use the get_rx_desc_rxmcs and get_rx_desc_rxht functions, which work on big endian systems too. Fix rx_fwinfo_92d by duplicating four of its members in the correct order. Tested only with RTL8192DU, which will use the same code. Tested only on a little endian system. Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Signed-off-by: Ping-Ke Shih Link: https://msgid.link/698463da-5ef1-40c7-b744-fa51ad847caf@gmail.com Signed-off-by: Greg Kroah-Hartman --- .../wireless/realtek/rtlwifi/rtl8192de/trx.c | 16 ++--- .../wireless/realtek/rtlwifi/rtl8192de/trx.h | 65 ++----------------- 2 files changed, 15 insertions(+), 66 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c index 55122aa7bbda..93cba14f4401 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.c @@ -35,7 +35,7 @@ static long _rtl92de_translate_todbm(struct ieee80211_hw *hw, static void _rtl92de_query_rxphystatus(struct ieee80211_hw *hw, struct rtl_stats *pstats, - struct rx_desc_92d *pdesc, + __le32 *pdesc, struct rx_fwinfo_92d *p_drvinfo, bool packet_match_bssid, bool packet_toself, @@ -49,8 +49,10 @@ static void _rtl92de_query_rxphystatus(struct ieee80211_hw *hw, u8 i, max_spatial_stream; u32 rssi, total_rssi = 0; bool is_cck_rate; + u8 rxmcs; - is_cck_rate = RX_HAL_IS_CCK_RATE(pdesc->rxmcs); + rxmcs = get_rx_desc_rxmcs(pdesc); + is_cck_rate = rxmcs <= DESC_RATE11M; pstats->packet_matchbssid = packet_match_bssid; pstats->packet_toself = packet_toself; pstats->packet_beacon = packet_beacon; @@ -158,8 +160,8 @@ static void _rtl92de_query_rxphystatus(struct ieee80211_hw *hw, pstats->rx_pwdb_all = pwdb_all; pstats->rxpower = rx_pwr_all; pstats->recvsignalpower = rx_pwr_all; - if (pdesc->rxht && pdesc->rxmcs >= DESC_RATEMCS8 && - pdesc->rxmcs <= DESC_RATEMCS15) + if (get_rx_desc_rxht(pdesc) && rxmcs >= DESC_RATEMCS8 && + rxmcs <= DESC_RATEMCS15) max_spatial_stream = 2; else max_spatial_stream = 1; @@ -365,7 +367,7 @@ static void _rtl92de_process_phyinfo(struct ieee80211_hw *hw, static void _rtl92de_translate_rx_signal_stuff(struct ieee80211_hw *hw, struct sk_buff *skb, struct rtl_stats *pstats, - struct rx_desc_92d *pdesc, + __le32 *pdesc, struct rx_fwinfo_92d *p_drvinfo) { struct rtl_mac *mac = rtl_mac(rtl_priv(hw)); @@ -441,9 +443,7 @@ bool rtl92de_rx_query_desc(struct ieee80211_hw *hw, struct rtl_stats *stats, if (phystatus) { p_drvinfo = (struct rx_fwinfo_92d *)(skb->data + stats->rx_bufshift); - _rtl92de_translate_rx_signal_stuff(hw, - skb, stats, - (struct rx_desc_92d *)pdesc, + _rtl92de_translate_rx_signal_stuff(hw, skb, stats, pdesc, p_drvinfo); } /*rx_status->qual = stats->signal; */ diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h index 6b1553239b0c..eb3f768140b5 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/trx.h @@ -394,10 +394,17 @@ struct rx_fwinfo_92d { u8 csi_target[2]; u8 sigevm; u8 max_ex_pwr; +#ifdef __LITTLE_ENDIAN u8 ex_intf_flag:1; u8 sgi_en:1; u8 rxsc:2; u8 reserve:4; +#else + u8 reserve:4; + u8 rxsc:2; + u8 sgi_en:1; + u8 ex_intf_flag:1; +#endif } __packed; struct tx_desc_92d { @@ -502,64 +509,6 @@ struct tx_desc_92d { u32 reserve_pass_pcie_mm_limit[4]; } __packed; -struct rx_desc_92d { - u32 length:14; - u32 crc32:1; - u32 icverror:1; - u32 drv_infosize:4; - u32 security:3; - u32 qos:1; - u32 shift:2; - u32 phystatus:1; - u32 swdec:1; - u32 lastseg:1; - u32 firstseg:1; - u32 eor:1; - u32 own:1; - - u32 macid:5; - u32 tid:4; - u32 hwrsvd:5; - u32 paggr:1; - u32 faggr:1; - u32 a1_fit:4; - u32 a2_fit:4; - u32 pam:1; - u32 pwr:1; - u32 moredata:1; - u32 morefrag:1; - u32 type:2; - u32 mc:1; - u32 bc:1; - - u32 seq:12; - u32 frag:4; - u32 nextpktlen:14; - u32 nextind:1; - u32 rsvd:1; - - u32 rxmcs:6; - u32 rxht:1; - u32 amsdu:1; - u32 splcp:1; - u32 bandwidth:1; - u32 htc:1; - u32 tcpchk_rpt:1; - u32 ipcchk_rpt:1; - u32 tcpchk_valid:1; - u32 hwpcerr:1; - u32 hwpcind:1; - u32 iv0:16; - - u32 iv1; - - u32 tsfl; - - u32 bufferaddress; - u32 bufferaddress64; - -} __packed; - void rtl92de_tx_fill_desc(struct ieee80211_hw *hw, struct ieee80211_hdr *hdr, u8 *pdesc, u8 *pbd_desc_tx, struct ieee80211_tx_info *info, From 82e6eba1a548f5923bc8544ac57590d3ca60839e Mon Sep 17 00:00:00 2001 From: Yang Xiwen Date: Mon, 19 Feb 2024 23:05:26 +0800 Subject: [PATCH 0267/1565] arm64: dts: hi3798cv200: fix the size of GICR commit 428a575dc9038846ad259466d5ba109858c0a023 upstream. During boot, Linux kernel complains: [ 0.000000] GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set This SoC is using a regular GIC-400 and the GICR space size should be 8KB rather than 256B. With this patch: [ 0.000000] GIC: Using split EOI/Deactivate mode So this should be the correct fix. Fixes: 2f20182ed670 ("arm64: dts: hisilicon: add dts files for hi3798cv200-poplar board") Signed-off-by: Yang Xiwen Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240219-cache-v3-1-a33c57534ae9@outlook.com Signed-off-by: Krzysztof Kozlowski Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi b/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi index 12bc1d3ed424..adc0a096ab4c 100644 --- a/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi +++ b/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi @@ -58,7 +58,7 @@ cpu@3 { gic: interrupt-controller@f1001000 { compatible = "arm,gic-400"; reg = <0x0 0xf1001000 0x0 0x1000>, /* GICD */ - <0x0 0xf1002000 0x0 0x100>; /* GICC */ + <0x0 0xf1002000 0x0 0x2000>; /* GICC */ #address-cells = <0>; #interrupt-cells = <3>; interrupt-controller; From 1514e1fb2a52d6fcbb2321894d41f1f4ea54c024 Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Fri, 23 Feb 2024 09:46:19 +0100 Subject: [PATCH 0268/1565] media: mc: mark the media devnode as registered from the, start commit 4bc60736154bc9e0e39d3b88918f5d3762ebe5e0 upstream. First the media device node was created, and if successful it was marked as 'registered'. This leaves a small race condition where an application can open the device node and get an error back because the 'registered' flag was not yet set. Change the order: first set the 'registered' flag, then actually register the media device node. If that fails, then clear the flag. Signed-off-by: Hans Verkuil Acked-by: Sakari Ailus Reviewed-by: Laurent Pinchart Fixes: cf4b9211b568 ("[media] media: Media device node support") Cc: stable@vger.kernel.org Signed-off-by: Sakari Ailus Signed-off-by: Greg Kroah-Hartman --- drivers/media/mc/mc-devnode.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/media/mc/mc-devnode.c b/drivers/media/mc/mc-devnode.c index f11382afe23b..f249199dc616 100644 --- a/drivers/media/mc/mc-devnode.c +++ b/drivers/media/mc/mc-devnode.c @@ -246,15 +246,14 @@ int __must_check media_devnode_register(struct media_device *mdev, kobject_set_name(&devnode->cdev.kobj, "media%d", devnode->minor); /* Part 3: Add the media and char device */ + set_bit(MEDIA_FLAG_REGISTERED, &devnode->flags); ret = cdev_device_add(&devnode->cdev, &devnode->dev); if (ret < 0) { + clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags); pr_err("%s: cdev_device_add failed\n", __func__); goto cdev_add_error; } - /* Part 4: Activate this minor. The char device can now be used. */ - set_bit(MEDIA_FLAG_REGISTERED, &devnode->flags); - return 0; cdev_add_error: From 567d3a4959dde9b1bd9bf494da185ed071d89875 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Fri, 12 Jan 2024 00:40:36 +0000 Subject: [PATCH 0269/1565] media: mxl5xx: Move xpt structures off stack commit 526f4527545b2d4ce0733733929fac7b6da09ac6 upstream. When building for LoongArch with clang 18.0.0, the stack usage of probe() is larger than the allowed 2048 bytes: drivers/media/dvb-frontends/mxl5xx.c:1698:12: warning: stack frame size (2368) exceeds limit (2048) in 'probe' [-Wframe-larger-than] 1698 | static int probe(struct mxl *state, struct mxl5xx_cfg *cfg) | ^ 1 warning generated. This is the result of the linked LLVM commit, which changes how the arrays of structures in config_ts() get handled with CONFIG_INIT_STACK_ZERO and CONFIG_INIT_STACK_PATTERN, which causes the above warning in combination with inlining, as config_ts() gets inlined into probe(). This warning can be easily fixed by moving the array of structures off of the stackvia 'static const', which is a better location for these variables anyways because they are static data that is only ever read from, never modified, so allocating the stack space is wasteful. This drops the stack usage from 2368 bytes to 256 bytes with the same compiler and configuration. Link: https://lore.kernel.org/linux-media/20240111-dvb-mxl5xx-move-structs-off-stack-v1-1-ca4230e67c11@kernel.org Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/1977 Link: https://github.com/llvm/llvm-project/commit/afe8b93ffdfef5d8879e1894b9d7dda40dee2b8d Signed-off-by: Nathan Chancellor Reviewed-by: Miguel Ojeda Tested-by: Miguel Ojeda Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/dvb-frontends/mxl5xx.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/media/dvb-frontends/mxl5xx.c b/drivers/media/dvb-frontends/mxl5xx.c index 0b00a23436ed..aaf9a173596a 100644 --- a/drivers/media/dvb-frontends/mxl5xx.c +++ b/drivers/media/dvb-frontends/mxl5xx.c @@ -1390,57 +1390,57 @@ static int config_ts(struct mxl *state, enum MXL_HYDRA_DEMOD_ID_E demod_id, u32 nco_count_min = 0; u32 clk_type = 0; - struct MXL_REG_FIELD_T xpt_sync_polarity[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_sync_polarity[MXL_HYDRA_DEMOD_MAX] = { {0x90700010, 8, 1}, {0x90700010, 9, 1}, {0x90700010, 10, 1}, {0x90700010, 11, 1}, {0x90700010, 12, 1}, {0x90700010, 13, 1}, {0x90700010, 14, 1}, {0x90700010, 15, 1} }; - struct MXL_REG_FIELD_T xpt_clock_polarity[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_clock_polarity[MXL_HYDRA_DEMOD_MAX] = { {0x90700010, 16, 1}, {0x90700010, 17, 1}, {0x90700010, 18, 1}, {0x90700010, 19, 1}, {0x90700010, 20, 1}, {0x90700010, 21, 1}, {0x90700010, 22, 1}, {0x90700010, 23, 1} }; - struct MXL_REG_FIELD_T xpt_valid_polarity[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_valid_polarity[MXL_HYDRA_DEMOD_MAX] = { {0x90700014, 0, 1}, {0x90700014, 1, 1}, {0x90700014, 2, 1}, {0x90700014, 3, 1}, {0x90700014, 4, 1}, {0x90700014, 5, 1}, {0x90700014, 6, 1}, {0x90700014, 7, 1} }; - struct MXL_REG_FIELD_T xpt_ts_clock_phase[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_ts_clock_phase[MXL_HYDRA_DEMOD_MAX] = { {0x90700018, 0, 3}, {0x90700018, 4, 3}, {0x90700018, 8, 3}, {0x90700018, 12, 3}, {0x90700018, 16, 3}, {0x90700018, 20, 3}, {0x90700018, 24, 3}, {0x90700018, 28, 3} }; - struct MXL_REG_FIELD_T xpt_lsb_first[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_lsb_first[MXL_HYDRA_DEMOD_MAX] = { {0x9070000C, 16, 1}, {0x9070000C, 17, 1}, {0x9070000C, 18, 1}, {0x9070000C, 19, 1}, {0x9070000C, 20, 1}, {0x9070000C, 21, 1}, {0x9070000C, 22, 1}, {0x9070000C, 23, 1} }; - struct MXL_REG_FIELD_T xpt_sync_byte[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_sync_byte[MXL_HYDRA_DEMOD_MAX] = { {0x90700010, 0, 1}, {0x90700010, 1, 1}, {0x90700010, 2, 1}, {0x90700010, 3, 1}, {0x90700010, 4, 1}, {0x90700010, 5, 1}, {0x90700010, 6, 1}, {0x90700010, 7, 1} }; - struct MXL_REG_FIELD_T xpt_enable_output[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_enable_output[MXL_HYDRA_DEMOD_MAX] = { {0x9070000C, 0, 1}, {0x9070000C, 1, 1}, {0x9070000C, 2, 1}, {0x9070000C, 3, 1}, {0x9070000C, 4, 1}, {0x9070000C, 5, 1}, {0x9070000C, 6, 1}, {0x9070000C, 7, 1} }; - struct MXL_REG_FIELD_T xpt_err_replace_sync[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_err_replace_sync[MXL_HYDRA_DEMOD_MAX] = { {0x9070000C, 24, 1}, {0x9070000C, 25, 1}, {0x9070000C, 26, 1}, {0x9070000C, 27, 1}, {0x9070000C, 28, 1}, {0x9070000C, 29, 1}, {0x9070000C, 30, 1}, {0x9070000C, 31, 1} }; - struct MXL_REG_FIELD_T xpt_err_replace_valid[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_err_replace_valid[MXL_HYDRA_DEMOD_MAX] = { {0x90700014, 8, 1}, {0x90700014, 9, 1}, {0x90700014, 10, 1}, {0x90700014, 11, 1}, {0x90700014, 12, 1}, {0x90700014, 13, 1}, {0x90700014, 14, 1}, {0x90700014, 15, 1} }; - struct MXL_REG_FIELD_T xpt_continuous_clock[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_continuous_clock[MXL_HYDRA_DEMOD_MAX] = { {0x907001D4, 0, 1}, {0x907001D4, 1, 1}, {0x907001D4, 2, 1}, {0x907001D4, 3, 1}, {0x907001D4, 4, 1}, {0x907001D4, 5, 1}, {0x907001D4, 6, 1}, {0x907001D4, 7, 1} }; - struct MXL_REG_FIELD_T xpt_nco_clock_rate[MXL_HYDRA_DEMOD_MAX] = { + static const struct MXL_REG_FIELD_T xpt_nco_clock_rate[MXL_HYDRA_DEMOD_MAX] = { {0x90700044, 16, 80}, {0x90700044, 16, 81}, {0x90700044, 16, 82}, {0x90700044, 16, 83}, {0x90700044, 16, 84}, {0x90700044, 16, 85}, From e6823bb7f4ebf49deae51cc96886acb7b0cf6cdb Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Fri, 23 Feb 2024 09:45:36 +0100 Subject: [PATCH 0270/1565] media: v4l2-core: hold videodev_lock until dev reg, finishes commit 1ed4477f2ea4743e7c5e1f9f3722152d14e6eeb1 upstream. After the new V4L2 device node was registered, some additional initialization was done before the device node was marked as 'registered'. During the time between creating the device node and marking it as 'registered' it was possible to open the device node, which would return -ENODEV since the 'registered' flag was not yet set. Hold the videodev_lock mutex from just before the device node is registered until the 'registered' flag is set. Since v4l2_open will take the same lock, it will wait until this registration process is finished. This resolves this race condition. Signed-off-by: Hans Verkuil Reviewed-by: Sakari Ailus Cc: # for vi4.18 and up Signed-off-by: Greg Kroah-Hartman --- drivers/media/v4l2-core/v4l2-dev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c index a593ea0598b5..4c31aa0a941e 100644 --- a/drivers/media/v4l2-core/v4l2-dev.c +++ b/drivers/media/v4l2-core/v4l2-dev.c @@ -1030,8 +1030,10 @@ int __video_register_device(struct video_device *vdev, vdev->dev.devt = MKDEV(VIDEO_MAJOR, vdev->minor); vdev->dev.parent = vdev->dev_parent; dev_set_name(&vdev->dev, "%s%d", name_base, vdev->num); + mutex_lock(&videodev_lock); ret = device_register(&vdev->dev); if (ret < 0) { + mutex_unlock(&videodev_lock); pr_err("%s: device_register failed\n", __func__); goto cleanup; } @@ -1051,6 +1053,7 @@ int __video_register_device(struct video_device *vdev, /* Part 6: Activate this minor. The char device can now be used. */ set_bit(V4L2_FL_REGISTERED, &vdev->flags); + mutex_unlock(&videodev_lock); return 0; From 24b7af86a80c759d0cb44561b0f98f6810b607a2 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 10 Apr 2024 21:16:34 +0200 Subject: [PATCH 0271/1565] mmc: core: Add mmc_gpiod_set_cd_config() function commit 63a7cd660246aa36af263b85c33ecc6601bf04be upstream. Some mmc host drivers may need to fixup a card-detection GPIO's config to e.g. enable the GPIO controllers builtin pull-up resistor on devices where the firmware description of the GPIO is broken (e.g. GpioInt with PullNone instead of PullUp in ACPI DSDT). Since this is the exception rather then the rule adding a config parameter to mmc_gpiod_request_cd() seems undesirable, so instead add a new mmc_gpiod_set_cd_config() function. This is simply a wrapper to call gpiod_set_config() on the card-detect GPIO acquired through mmc_gpiod_request_cd(). Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410191639.526324-2-hdegoede@redhat.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/slot-gpio.c | 20 ++++++++++++++++++++ include/linux/mmc/slot-gpio.h | 1 + 2 files changed, 21 insertions(+) diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c index 681653d097ef..04c510b751b2 100644 --- a/drivers/mmc/core/slot-gpio.c +++ b/drivers/mmc/core/slot-gpio.c @@ -202,6 +202,26 @@ int mmc_gpiod_request_cd(struct mmc_host *host, const char *con_id, } EXPORT_SYMBOL(mmc_gpiod_request_cd); +/** + * mmc_gpiod_set_cd_config - set config for card-detection GPIO + * @host: mmc host + * @config: Generic pinconf config (from pinconf_to_config_packed()) + * + * This can be used by mmc host drivers to fixup a card-detection GPIO's config + * (e.g. set PIN_CONFIG_BIAS_PULL_UP) after acquiring the GPIO descriptor + * through mmc_gpiod_request_cd(). + * + * Returns: + * 0 on success, or a negative errno value on error. + */ +int mmc_gpiod_set_cd_config(struct mmc_host *host, unsigned long config) +{ + struct mmc_gpio *ctx = host->slot.handler_priv; + + return gpiod_set_config(ctx->cd_gpio, config); +} +EXPORT_SYMBOL(mmc_gpiod_set_cd_config); + bool mmc_can_gpio_cd(struct mmc_host *host) { struct mmc_gpio *ctx = host->slot.handler_priv; diff --git a/include/linux/mmc/slot-gpio.h b/include/linux/mmc/slot-gpio.h index 4ae2f2908f99..d4a1567c94d0 100644 --- a/include/linux/mmc/slot-gpio.h +++ b/include/linux/mmc/slot-gpio.h @@ -20,6 +20,7 @@ int mmc_gpiod_request_cd(struct mmc_host *host, const char *con_id, unsigned int debounce); int mmc_gpiod_request_ro(struct mmc_host *host, const char *con_id, unsigned int idx, unsigned int debounce); +int mmc_gpiod_set_cd_config(struct mmc_host *host, unsigned long config); void mmc_gpio_set_cd_isr(struct mmc_host *host, irqreturn_t (*isr)(int irq, void *dev_id)); int mmc_gpio_set_cd_wake(struct mmc_host *host, bool on); From b3418751cca0dd2c43ad7e66651ea436f30d4c7b Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 10 Apr 2024 21:16:36 +0200 Subject: [PATCH 0272/1565] mmc: sdhci-acpi: Sort DMI quirks alphabetically commit a92a73b1d9249d155412d8ac237142fa716803ea upstream. Sort the DMI quirks alphabetically. Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410191639.526324-4-hdegoede@redhat.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-acpi.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c index 2a28101777c6..9070b80dd3a7 100644 --- a/drivers/mmc/host/sdhci-acpi.c +++ b/drivers/mmc/host/sdhci-acpi.c @@ -761,7 +761,20 @@ static const struct acpi_device_id sdhci_acpi_ids[] = { }; MODULE_DEVICE_TABLE(acpi, sdhci_acpi_ids); +/* Please keep this list sorted alphabetically */ static const struct dmi_system_id sdhci_acpi_quirks[] = { + { + /* + * The Acer Aspire Switch 10 (SW5-012) microSD slot always + * reports the card being write-protected even though microSD + * cards do not have a write-protect switch at all. + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Acer"), + DMI_MATCH(DMI_PRODUCT_NAME, "Aspire SW5-012"), + }, + .driver_data = (void *)DMI_QUIRK_SD_NO_WRITE_PROTECT, + }, { /* * The Lenovo Miix 320-10ICR has a bug in the _PS0 method of @@ -776,18 +789,6 @@ static const struct dmi_system_id sdhci_acpi_quirks[] = { }, .driver_data = (void *)DMI_QUIRK_RESET_SD_SIGNAL_VOLT_ON_SUSP, }, - { - /* - * The Acer Aspire Switch 10 (SW5-012) microSD slot always - * reports the card being write-protected even though microSD - * cards do not have a write-protect switch at all. - */ - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Acer"), - DMI_MATCH(DMI_PRODUCT_NAME, "Aspire SW5-012"), - }, - .driver_data = (void *)DMI_QUIRK_SD_NO_WRITE_PROTECT, - }, { /* * The Toshiba WT8-B's microSD slot always reports the card being From 63eda0f3eb4d53e7a8eee4cadf8b0f30a7fe3ac3 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 10 Apr 2024 21:16:37 +0200 Subject: [PATCH 0273/1565] mmc: sdhci-acpi: Fix Lenovo Yoga Tablet 2 Pro 1380 sdcard slot not working commit f3521d7cbaefff19cc656325787ed797e5f6a955 upstream. The Lenovo Yoga Tablet 2 Pro 1380 sdcard slot has an active high cd pin and a broken wp pin which always reports the card being write-protected. Add a DMI quirk to address both issues. Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410191639.526324-5-hdegoede@redhat.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-acpi.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c index 9070b80dd3a7..2bf374cc8a8c 100644 --- a/drivers/mmc/host/sdhci-acpi.c +++ b/drivers/mmc/host/sdhci-acpi.c @@ -81,6 +81,7 @@ struct sdhci_acpi_host { enum { DMI_QUIRK_RESET_SD_SIGNAL_VOLT_ON_SUSP = BIT(0), DMI_QUIRK_SD_NO_WRITE_PROTECT = BIT(1), + DMI_QUIRK_SD_CD_ACTIVE_HIGH = BIT(2), }; static inline void *sdhci_acpi_priv(struct sdhci_acpi_host *c) @@ -789,6 +790,26 @@ static const struct dmi_system_id sdhci_acpi_quirks[] = { }, .driver_data = (void *)DMI_QUIRK_RESET_SD_SIGNAL_VOLT_ON_SUSP, }, + { + /* + * Lenovo Yoga Tablet 2 Pro 1380F/L (13" Android version) this + * has broken WP reporting and an inverted CD signal. + * Note this has more or less the same BIOS as the Lenovo Yoga + * Tablet 2 830F/L or 1050F/L (8" and 10" Android), but unlike + * the 830 / 1050 models which share the same mainboard this + * model has a different mainboard and the inverted CD and + * broken WP are unique to this board. + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Intel Corp."), + DMI_MATCH(DMI_PRODUCT_NAME, "VALLEYVIEW C0 PLATFORM"), + DMI_MATCH(DMI_BOARD_NAME, "BYT-T FFD8"), + /* Full match so as to NOT match the 830/1050 BIOS */ + DMI_MATCH(DMI_BIOS_VERSION, "BLADE_21.X64.0005.R00.1504101516"), + }, + .driver_data = (void *)(DMI_QUIRK_SD_NO_WRITE_PROTECT | + DMI_QUIRK_SD_CD_ACTIVE_HIGH), + }, { /* * The Toshiba WT8-B's microSD slot always reports the card being @@ -914,6 +935,9 @@ static int sdhci_acpi_probe(struct platform_device *pdev) if (sdhci_acpi_flag(c, SDHCI_ACPI_SD_CD)) { bool v = sdhci_acpi_flag(c, SDHCI_ACPI_SD_CD_OVERRIDE_LEVEL); + if (quirks & DMI_QUIRK_SD_CD_ACTIVE_HIGH) + host->mmc->caps2 |= MMC_CAP2_CD_ACTIVE_HIGH; + err = mmc_gpiod_request_cd(host->mmc, NULL, 0, v, 0); if (err) { if (err == -EPROBE_DEFER) From 68447c350fc1b1dcc6d57d12ffa61fb4d628b704 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 10 Apr 2024 21:16:38 +0200 Subject: [PATCH 0274/1565] mmc: sdhci-acpi: Disable write protect detection on Toshiba WT10-A commit ef3eab75e17191e5665f52e64e85bc29d5705a7b upstream. On the Toshiba WT10-A the microSD slot always reports the card being write-protected, just like on the Toshiba WT8-B. Add a DMI quirk to work around this. Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240410191639.526324-6-hdegoede@redhat.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-acpi.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/mmc/host/sdhci-acpi.c b/drivers/mmc/host/sdhci-acpi.c index 2bf374cc8a8c..ba1272db851f 100644 --- a/drivers/mmc/host/sdhci-acpi.c +++ b/drivers/mmc/host/sdhci-acpi.c @@ -821,6 +821,17 @@ static const struct dmi_system_id sdhci_acpi_quirks[] = { }, .driver_data = (void *)DMI_QUIRK_SD_NO_WRITE_PROTECT, }, + { + /* + * The Toshiba WT10-A's microSD slot always reports the card being + * write-protected. + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "TOSHIBA WT10-A"), + }, + .driver_data = (void *)DMI_QUIRK_SD_NO_WRITE_PROTECT, + }, {} /* Terminating entry */ }; From 32f92b0078ebf79dbe4827288e0acb50d89d3d5b Mon Sep 17 00:00:00 2001 From: Cai Xinchen Date: Tue, 16 Apr 2024 06:51:37 +0000 Subject: [PATCH 0275/1565] fbdev: savage: Handle err return when savagefb_check_var failed commit 6ad959b6703e2c4c5d7af03b4cfd5ff608036339 upstream. The commit 04e5eac8f3ab("fbdev: savage: Error out if pixclock equals zero") checks the value of pixclock to avoid divide-by-zero error. However the function savagefb_probe doesn't handle the error return of savagefb_check_var. When pixclock is 0, it will cause divide-by-zero error. Fixes: 04e5eac8f3ab ("fbdev: savage: Error out if pixclock equals zero") Signed-off-by: Cai Xinchen Cc: stable@vger.kernel.org Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/savage/savagefb_driver.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c index 94ebd8af50cf..224d7c8146a9 100644 --- a/drivers/video/fbdev/savage/savagefb_driver.c +++ b/drivers/video/fbdev/savage/savagefb_driver.c @@ -2271,7 +2271,10 @@ static int savagefb_probe(struct pci_dev *dev, const struct pci_device_id *id) if (info->var.xres_virtual > 0x1000) info->var.xres_virtual = 0x1000; #endif - savagefb_check_var(&info->var, info); + err = savagefb_check_var(&info->var, info); + if (err) + goto failed; + savagefb_set_fix(info); /* From 4d8226bc7e5936158d26e6dffcf6075a17539182 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Fri, 24 May 2024 15:19:55 +0100 Subject: [PATCH 0276/1565] KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode commit dfe6d190f38fc5df5ff2614b463a5195a399c885 upstream. It appears that we don't allow a vcpu to be restored in AArch32 System mode, as we *never* included it in the list of valid modes. Just add it to the list of allowed modes. Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu") Cc: stable@vger.kernel.org Acked-by: Oliver Upton Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org Signed-off-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/guest.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index dfb5218137ca..3aad58626429 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -234,6 +234,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) case PSR_AA32_MODE_SVC: case PSR_AA32_MODE_ABT: case PSR_AA32_MODE_UND: + case PSR_AA32_MODE_SYS: if (!vcpu_el1_is_32bit(vcpu)) return -EINVAL; break; From 6815376b7f5ec0d78d91c6b520570d8d5cd4be8b Mon Sep 17 00:00:00 2001 From: Vitaly Chikunov Date: Mon, 18 Mar 2024 03:42:40 +0300 Subject: [PATCH 0277/1565] crypto: ecrdsa - Fix module auto-load on add_key commit eb5739a1efbc9ff216271aeea0ebe1c92e5383e5 upstream. Add module alias with the algorithm cra_name similar to what we have for RSA-related and other algorithms. The kernel attempts to modprobe asymmetric algorithms using the names "crypto-$cra_name" and "crypto-$cra_name-all." However, since these aliases are currently missing, the modules are not loaded. For instance, when using the `add_key` function, the hash algorithm is typically loaded automatically, but the asymmetric algorithm is not. Steps to test: 1. Cert is generated usings ima-evm-utils test suite with `gen-keys.sh`, example cert is provided below: $ base64 -d >test-gost2012_512-A.cer < Cc: stable@vger.kernel.org Signed-off-by: Vitaly Chikunov Tested-by: Stefan Berger Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/ecrdsa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/crypto/ecrdsa.c b/crypto/ecrdsa.c index f7ed43020672..0a970261b107 100644 --- a/crypto/ecrdsa.c +++ b/crypto/ecrdsa.c @@ -294,4 +294,5 @@ module_exit(ecrdsa_mod_fini); MODULE_LICENSE("GPL"); MODULE_AUTHOR("Vitaly Chikunov "); MODULE_DESCRIPTION("EC-RDSA generic algorithm"); +MODULE_ALIAS_CRYPTO("ecrdsa"); MODULE_ALIAS_CRYPTO("ecrdsa-generic"); From a718b6d2a329e069b27d9049a71be5931e71d960 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Wed, 8 May 2024 16:39:51 +0800 Subject: [PATCH 0278/1565] crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak commit d3b17c6d9dddc2db3670bc9be628b122416a3d26 upstream. Using completion_done to determine whether the caller has gone away only works after a complete call. Furthermore it's still possible that the caller has not yet called wait_for_completion, resulting in another potential UAF. Fix this by making the caller use cancel_work_sync and then freeing the memory safely. Fixes: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery") Cc: #6.8+ Signed-off-by: Herbert Xu Reviewed-by: Giovanni Cabiddu Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/qat/qat_common/adf_aer.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/drivers/crypto/qat/qat_common/adf_aer.c b/drivers/crypto/qat/qat_common/adf_aer.c index dc0feedf5270..bf6e2116148b 100644 --- a/drivers/crypto/qat/qat_common/adf_aer.c +++ b/drivers/crypto/qat/qat_common/adf_aer.c @@ -95,8 +95,7 @@ static void adf_device_reset_worker(struct work_struct *work) if (adf_dev_init(accel_dev) || adf_dev_start(accel_dev)) { /* The device hanged and we can't restart it so stop here */ dev_err(&GET_DEV(accel_dev), "Restart device failed\n"); - if (reset_data->mode == ADF_DEV_RESET_ASYNC || - completion_done(&reset_data->compl)) + if (reset_data->mode == ADF_DEV_RESET_ASYNC) kfree(reset_data); WARN(1, "QAT: device restart failed. Device is unusable\n"); return; @@ -104,16 +103,8 @@ static void adf_device_reset_worker(struct work_struct *work) adf_dev_restarted_notify(accel_dev); clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); - /* - * The dev is back alive. Notify the caller if in sync mode - * - * If device restart will take a more time than expected, - * the schedule_reset() function can timeout and exit. This can be - * detected by calling the completion_done() function. In this case - * the reset_data structure needs to be freed here. - */ - if (reset_data->mode == ADF_DEV_RESET_ASYNC || - completion_done(&reset_data->compl)) + /* The dev is back alive. Notify the caller if in sync mode */ + if (reset_data->mode == ADF_DEV_RESET_ASYNC) kfree(reset_data); else complete(&reset_data->compl); @@ -148,10 +139,10 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, if (!timeout) { dev_err(&GET_DEV(accel_dev), "Reset device timeout expired\n"); + cancel_work_sync(&reset_data->reset_work); ret = -EFAULT; - } else { - kfree(reset_data); } + kfree(reset_data); return ret; } return 0; From 3cc7687f7ff32fc4233fdb5d694f09f9eb00ac84 Mon Sep 17 00:00:00 2001 From: xu xin Date: Tue, 14 May 2024 20:11:02 +0800 Subject: [PATCH 0279/1565] net/ipv6: Fix route deleting failure when metric equals 0 commit bb487272380d120295e955ad8acfcbb281b57642 upstream. Problem ========= After commit 67f695134703 ("ipv6: Move setting default metric for routes"), we noticed that the logic of assigning the default value of fc_metirc changed in the ioctl process. That is, when users use ioctl(fd, SIOCADDRT, rt) with a non-zero metric to add a route, then they may fail to delete a route with passing in a metric value of 0 to the kernel by ioctl(fd, SIOCDELRT, rt). But iproute can succeed in deleting it. As a reference, when using iproute tools by netlink to delete routes with a metric parameter equals 0, like the command as follows: ip -6 route del fe80::/64 via fe81::5054:ff:fe11:3451 dev eth0 metric 0 the user can still succeed in deleting the route entry with the smallest metric. Root Reason =========== After commit 67f695134703 ("ipv6: Move setting default metric for routes"), When ioctl() pass in SIOCDELRT with a zero metric, rtmsg_to_fib6_config() will set a defalut value (1024) to cfg->fc_metric in kernel, and in ip6_route_del() and the line 4074 at net/ipv3/route.c, it will check by if (cfg->fc_metric && cfg->fc_metric != rt->fib6_metric) continue; and the condition is true and skip the later procedure (deleting route) because cfg->fc_metric != rt->fib6_metric. But before that commit, cfg->fc_metric is still zero there, so the condition is false and it will do the following procedure (deleting). Solution ======== In order to keep a consistent behaviour across netlink() and ioctl(), we should allow to delete a route with a metric value of 0. So we only do the default setting of fc_metric in route adding. CC: stable@vger.kernel.org # 5.4+ Fixes: 67f695134703 ("ipv6: Move setting default metric for routes") Co-developed-by: Fan Yu Signed-off-by: Fan Yu Signed-off-by: xu xin Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240514201102055dD2Ba45qKbLlUMxu_DTHP@zte.com.cn Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 2d53c362f309..e3d0686d2a33 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -4345,7 +4345,7 @@ static void rtmsg_to_fib6_config(struct net *net, .fc_table = l3mdev_fib_table_by_index(net, rtmsg->rtmsg_ifindex) ? : RT6_TABLE_MAIN, .fc_ifindex = rtmsg->rtmsg_ifindex, - .fc_metric = rtmsg->rtmsg_metric ? : IP6_RT_PRIO_USER, + .fc_metric = rtmsg->rtmsg_metric, .fc_expires = rtmsg->rtmsg_info, .fc_dst_len = rtmsg->rtmsg_dst_len, .fc_src_len = rtmsg->rtmsg_src_len, @@ -4375,6 +4375,9 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, struct in6_rtmsg *rtmsg) rtnl_lock(); switch (cmd) { case SIOCADDRT: + /* Only do the default setting of fc_metric in route adding */ + if (cfg.fc_metric == 0) + cfg.fc_metric = IP6_RT_PRIO_USER; err = ip6_route_add(&cfg, GFP_KERNEL, NULL); break; case SIOCDELRT: From 124947855564572713d705a13be7d0c9dae16a17 Mon Sep 17 00:00:00 2001 From: Nikita Zhandarovich Date: Mon, 8 Apr 2024 07:10:39 -0700 Subject: [PATCH 0280/1565] net/9p: fix uninit-value in p9_client_rpc() commit 25460d6f39024cc3b8241b14c7ccf0d6f11a736a upstream. Syzbot with the help of KMSAN reported the following error: BUG: KMSAN: uninit-value in trace_9p_client_res include/trace/events/9p.h:146 [inline] BUG: KMSAN: uninit-value in p9_client_rpc+0x1314/0x1340 net/9p/client.c:754 trace_9p_client_res include/trace/events/9p.h:146 [inline] p9_client_rpc+0x1314/0x1340 net/9p/client.c:754 p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031 v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410 v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122 legacy_get_tree+0x114/0x290 fs/fs_context.c:662 vfs_get_tree+0xa7/0x570 fs/super.c:1797 do_new_mount+0x71f/0x15e0 fs/namespace.c:3352 path_mount+0x742/0x1f20 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount+0x725/0x810 fs/namespace.c:3875 __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: __alloc_pages+0x9d6/0xe70 mm/page_alloc.c:4598 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2175 [inline] allocate_slab mm/slub.c:2338 [inline] new_slab+0x2de/0x1400 mm/slub.c:2391 ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525 __slab_alloc mm/slub.c:3610 [inline] __slab_alloc_node mm/slub.c:3663 [inline] slab_alloc_node mm/slub.c:3835 [inline] kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852 p9_tag_alloc net/9p/client.c:278 [inline] p9_client_prepare_req+0x20a/0x1770 net/9p/client.c:641 p9_client_rpc+0x27e/0x1340 net/9p/client.c:688 p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031 v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410 v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122 legacy_get_tree+0x114/0x290 fs/fs_context.c:662 vfs_get_tree+0xa7/0x570 fs/super.c:1797 do_new_mount+0x71f/0x15e0 fs/namespace.c:3352 path_mount+0x742/0x1f20 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount+0x725/0x810 fs/namespace.c:3875 __x64_sys_mount+0xe4/0x150 fs/namespace.c:3875 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 If p9_check_errors() fails early in p9_client_rpc(), req->rc.tag will not be properly initialized. However, trace_9p_client_res() ends up trying to print it out anyway before p9_client_rpc() finishes. Fix this issue by assigning default values to p9_fcall fields such as 'tag' and (just in case KMSAN unearths something new) 'id' during the tag allocation stage. Reported-and-tested-by: syzbot+ff14db38f56329ef68df@syzkaller.appspotmail.com Fixes: 348b59012e5c ("net/9p: Convert net/9p protocol dumps to tracepoints") Signed-off-by: Nikita Zhandarovich Reviewed-by: Christian Schoenebeck Cc: stable@vger.kernel.org Message-ID: <20240408141039.30428-1-n.zhandarovich@fintech.ru> Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- net/9p/client.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/9p/client.c b/net/9p/client.c index cd85a4b6448b..0fa324e8b245 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -235,6 +235,8 @@ static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc, if (!fc->sdata) return -ENOMEM; fc->capacity = alloc_msize; + fc->id = 0; + fc->tag = P9_NOTAG; return 0; } From 6e7dd338c05388adcadde88ce50a29ad1d97e121 Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:18 +0300 Subject: [PATCH 0281/1565] intel_th: pci: Add Meteor Lake-S CPU support commit a4f813c3ec9d1c32bc402becd1f011b3904dd699 upstream. Add support for the Trace Hub in Meteor Lake-S CPU. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-15-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index e25438025b9f..7f13687267fd 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -289,6 +289,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7e24), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Meteor Lake-S CPU */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xae24), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Raptor Lake-S */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7a26), From 161f5a1189b78a61936dda31c7f864af6e0213e3 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sat, 30 Mar 2024 10:57:45 +0100 Subject: [PATCH 0282/1565] sparc64: Fix number of online CPUs commit 98937707fea8375e8acea0aaa0b68a956dd52719 upstream. Nick Bowler reported: When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II CPUs, I noticed that only CPU 0 comes up, while older kernels (including 4.7) are working fine with both CPUs. I bisected the failure to this commit: 9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit commit 9b2f753ec23710aa32c0d837d2499db92fe9115b Author: Atish Patra Date: Thu Sep 15 14:54:40 2016 -0600 sparc64: Fix cpu_possible_mask if nr_cpus is set This is a small change that reverts very easily on top of 5.18: there is just one trivial conflict. Once reverted, both CPUs work again. Maybe this is related to the fact that the CPUs on this system are numbered CPU0 and CPU2 (there is no CPU1)? The current code that adjust cpu_possible based on nr_cpu_ids do not take into account that CPU's may not come one after each other. Move the chech to the function that setup the cpu_possible mask so there is no need to adjust it later. Signed-off-by: Sam Ravnborg Fixes: 9b2f753ec237 ("sparc64: Fix cpu_possible_mask if nr_cpus is set") Reported-by: Nick Bowler Tested-by: Nick Bowler Link: https://lore.kernel.org/sparclinux/20201009161924.c8f031c079dd852941307870@gmx.de/ Link: https://lore.kernel.org/all/CADyTPEwt=ZNams+1bpMB1F9w_vUdPsGCt92DBQxxq_VtaLoTdw@mail.gmail.com/ Cc: stable@vger.kernel.org # v4.8+ Cc: Andreas Larsson Cc: David S. Miller Cc: Atish Patra Cc: Bob Picco Cc: Vijay Kumar Cc: David S. Miller Reviewed-by: Andreas Larsson Acked-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240330-sparc64-warnings-v1-9-37201023ee2f@ravnborg.org Signed-off-by: Andreas Larsson Signed-off-by: Greg Kroah-Hartman --- arch/sparc/include/asm/smp_64.h | 2 -- arch/sparc/kernel/prom_64.c | 4 +++- arch/sparc/kernel/setup_64.c | 1 - arch/sparc/kernel/smp_64.c | 14 -------------- 4 files changed, 3 insertions(+), 18 deletions(-) diff --git a/arch/sparc/include/asm/smp_64.h b/arch/sparc/include/asm/smp_64.h index e75783b6abc4..16ab904616a0 100644 --- a/arch/sparc/include/asm/smp_64.h +++ b/arch/sparc/include/asm/smp_64.h @@ -47,7 +47,6 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask); int hard_smp_processor_id(void); #define raw_smp_processor_id() (current_thread_info()->cpu) -void smp_fill_in_cpu_possible_map(void); void smp_fill_in_sib_core_maps(void); void cpu_play_dead(void); @@ -77,7 +76,6 @@ void __cpu_die(unsigned int cpu); #define smp_fill_in_sib_core_maps() do { } while (0) #define smp_fetch_global_regs() do { } while (0) #define smp_fetch_global_pmu() do { } while (0) -#define smp_fill_in_cpu_possible_map() do { } while (0) #define smp_init_cpu_poke() do { } while (0) #define scheduler_poke() do { } while (0) diff --git a/arch/sparc/kernel/prom_64.c b/arch/sparc/kernel/prom_64.c index f883a50fa333..4eae633f7198 100644 --- a/arch/sparc/kernel/prom_64.c +++ b/arch/sparc/kernel/prom_64.c @@ -483,7 +483,9 @@ static void *record_one_cpu(struct device_node *dp, int cpuid, int arg) ncpus_probed++; #ifdef CONFIG_SMP set_cpu_present(cpuid, true); - set_cpu_possible(cpuid, true); + + if (num_possible_cpus() < nr_cpu_ids) + set_cpu_possible(cpuid, true); #endif return NULL; } diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c index d87244197d5c..d921b4790af9 100644 --- a/arch/sparc/kernel/setup_64.c +++ b/arch/sparc/kernel/setup_64.c @@ -688,7 +688,6 @@ void __init setup_arch(char **cmdline_p) paging_init(); init_sparc64_elf_hwcap(); - smp_fill_in_cpu_possible_map(); /* * Once the OF device tree and MDESC have been setup and nr_cpus has * been parsed, we know the list of possible cpus. Therefore we can diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c index ae5faa1d989d..748909095d9b 100644 --- a/arch/sparc/kernel/smp_64.c +++ b/arch/sparc/kernel/smp_64.c @@ -1210,20 +1210,6 @@ void __init smp_setup_processor_id(void) xcall_deliver_impl = hypervisor_xcall_deliver; } -void __init smp_fill_in_cpu_possible_map(void) -{ - int possible_cpus = num_possible_cpus(); - int i; - - if (possible_cpus > nr_cpu_ids) - possible_cpus = nr_cpu_ids; - - for (i = 0; i < possible_cpus; i++) - set_cpu_possible(i, true); - for (; i < NR_CPUS; i++) - set_cpu_possible(i, false); -} - void smp_fill_in_sib_core_maps(void) { unsigned int i; From b487b48efd0c0a3709b4c73e990eede73320667f Mon Sep 17 00:00:00 2001 From: Judith Mendez Date: Wed, 17 Apr 2024 15:57:00 -0500 Subject: [PATCH 0283/1565] watchdog: rti_wdt: Set min_hw_heartbeat_ms to accommodate a safety margin commit cae58516534e110f4a8558d48aa4435e15519121 upstream. On AM62x, the watchdog is pet before the valid window is open. Fix min_hw_heartbeat and accommodate a 2% + static offset safety margin. The static offset accounts for max hardware error. Remove the hack in the driver which shifts the open window boundary, since it is no longer necessary due to the fix mentioned above. cc: stable@vger.kernel.org Fixes: 5527483f8f7c ("watchdog: rti-wdt: attach to running watchdog during probe") Signed-off-by: Judith Mendez Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20240417205700.3947408-1-jm@ti.com Signed-off-by: Guenter Roeck Signed-off-by: Wim Van Sebroeck Signed-off-by: Greg Kroah-Hartman --- drivers/watchdog/rti_wdt.c | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-) diff --git a/drivers/watchdog/rti_wdt.c b/drivers/watchdog/rti_wdt.c index daa00f3c5a6a..7f2ca611a3f8 100644 --- a/drivers/watchdog/rti_wdt.c +++ b/drivers/watchdog/rti_wdt.c @@ -52,6 +52,8 @@ #define DWDST BIT(1) +#define MAX_HW_ERROR 250 + static int heartbeat = DEFAULT_HEARTBEAT; /* @@ -90,7 +92,7 @@ static int rti_wdt_start(struct watchdog_device *wdd) * to be 50% or less than that; we obviouly want to configure the open * window as large as possible so we select the 50% option. */ - wdd->min_hw_heartbeat_ms = 500 * wdd->timeout; + wdd->min_hw_heartbeat_ms = 520 * wdd->timeout + MAX_HW_ERROR; /* Generate NMI when wdt expires */ writel_relaxed(RTIWWDRX_NMI, wdt->base + RTIWWDRXCTRL); @@ -124,31 +126,33 @@ static int rti_wdt_setup_hw_hb(struct watchdog_device *wdd, u32 wsize) * be petted during the open window; not too early or not too late. * The HW configuration options only allow for the open window size * to be 50% or less than that. + * To avoid any glitches, we accommodate 2% + max hardware error + * safety margin. */ switch (wsize) { case RTIWWDSIZE_50P: - /* 50% open window => 50% min heartbeat */ - wdd->min_hw_heartbeat_ms = 500 * heartbeat; + /* 50% open window => 52% min heartbeat */ + wdd->min_hw_heartbeat_ms = 520 * heartbeat + MAX_HW_ERROR; break; case RTIWWDSIZE_25P: - /* 25% open window => 75% min heartbeat */ - wdd->min_hw_heartbeat_ms = 750 * heartbeat; + /* 25% open window => 77% min heartbeat */ + wdd->min_hw_heartbeat_ms = 770 * heartbeat + MAX_HW_ERROR; break; case RTIWWDSIZE_12P5: - /* 12.5% open window => 87.5% min heartbeat */ - wdd->min_hw_heartbeat_ms = 875 * heartbeat; + /* 12.5% open window => 89.5% min heartbeat */ + wdd->min_hw_heartbeat_ms = 895 * heartbeat + MAX_HW_ERROR; break; case RTIWWDSIZE_6P25: - /* 6.5% open window => 93.5% min heartbeat */ - wdd->min_hw_heartbeat_ms = 935 * heartbeat; + /* 6.5% open window => 95.5% min heartbeat */ + wdd->min_hw_heartbeat_ms = 955 * heartbeat + MAX_HW_ERROR; break; case RTIWWDSIZE_3P125: - /* 3.125% open window => 96.9% min heartbeat */ - wdd->min_hw_heartbeat_ms = 969 * heartbeat; + /* 3.125% open window => 98.9% min heartbeat */ + wdd->min_hw_heartbeat_ms = 989 * heartbeat + MAX_HW_ERROR; break; default: @@ -222,14 +226,6 @@ static int rti_wdt_probe(struct platform_device *pdev) return -EINVAL; } - /* - * If watchdog is running at 32k clock, it is not accurate. - * Adjust frequency down in this case so that we don't pet - * the watchdog too often. - */ - if (wdt->freq < 32768) - wdt->freq = wdt->freq * 9 / 10; - pm_runtime_enable(dev); ret = pm_runtime_get_sync(dev); if (ret < 0) { From cfdc2fa4db57503bc6d3817240547c8ddc55fa96 Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Wed, 24 Apr 2024 15:03:34 +0100 Subject: [PATCH 0284/1565] kdb: Fix buffer overflow during tab-complete commit e9730744bf3af04cda23799029342aa3cddbc454 upstream. Currently, when the user attempts symbol completion with the Tab key, kdb will use strncpy() to insert the completed symbol into the command buffer. Unfortunately it passes the size of the source buffer rather than the destination to strncpy() with predictably horrible results. Most obviously if the command buffer is already full but cp, the cursor position, is in the middle of the buffer, then we will write past the end of the supplied buffer. Fix this by replacing the dubious strncpy() calls with memmove()/memcpy() calls plus explicit boundary checks to make sure we have enough space before we start moving characters around. Reported-by: Justin Stitt Closes: https://lore.kernel.org/all/CAFhGd8qESuuifuHsNjFPR-Va3P80bxrw+LqvC8deA8GziUJLpw@mail.gmail.com/ Cc: stable@vger.kernel.org Reviewed-by: Douglas Anderson Reviewed-by: Justin Stitt Tested-by: Justin Stitt Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-1-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index 6735ac36b718..dcc7e13e98b3 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -354,14 +354,19 @@ static char *kdb_read(char *buffer, size_t bufsize) kdb_printf(kdb_prompt_str); kdb_printf("%s", buffer); } else if (tab != 2 && count > 0) { - len_tmp = strlen(p_tmp); - strncpy(p_tmp+len_tmp, cp, lastchar-cp+1); - len_tmp = strlen(p_tmp); - strncpy(cp, p_tmp+len, len_tmp-len + 1); - len = len_tmp - len; - kdb_printf("%s", cp); - cp += len; - lastchar += len; + /* How many new characters do we want from tmpbuffer? */ + len_tmp = strlen(p_tmp) - len; + if (lastchar + len_tmp >= bufend) + len_tmp = bufend - lastchar; + + if (len_tmp) { + /* + 1 ensures the '\0' is memmove'd */ + memmove(cp+len_tmp, cp, (lastchar-cp) + 1); + memcpy(cp, p_tmp+len, len_tmp); + kdb_printf("%s", cp); + cp += len_tmp; + lastchar += len_tmp; + } } kdb_nextline = 1; /* reset output line number */ break; From 7c19e28f3a8148a44f306a38e2768be494c8de57 Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Wed, 24 Apr 2024 15:03:35 +0100 Subject: [PATCH 0285/1565] kdb: Use format-strings rather than '\0' injection in kdb_read() commit 09b35989421dfd5573f0b4683c7700a7483c71f9 upstream. Currently when kdb_read() needs to reposition the cursor it uses copy and paste code that works by injecting an '\0' at the cursor position before delivering a carriage-return and reprinting the line (which stops at the '\0'). Tidy up the code by hoisting the copy and paste code into an appropriately named function. Additionally let's replace the '\0' injection with a proper field width parameter so that the string will be abridged during formatting instead. Cc: stable@vger.kernel.org # Not a bug fix but it is needed for later bug fixes Tested-by: Justin Stitt Reviewed-by: Douglas Anderson Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-2-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 55 ++++++++++++++++++++++++--------------- 1 file changed, 34 insertions(+), 21 deletions(-) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index dcc7e13e98b3..0f4b288185be 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -172,6 +172,33 @@ char kdb_getchar(void) unreachable(); } +/** + * kdb_position_cursor() - Place cursor in the correct horizontal position + * @prompt: Nil-terminated string containing the prompt string + * @buffer: Nil-terminated string containing the entire command line + * @cp: Cursor position, pointer the character in buffer where the cursor + * should be positioned. + * + * The cursor is positioned by sending a carriage-return and then printing + * the content of the line until we reach the correct cursor position. + * + * There is some additional fine detail here. + * + * Firstly, even though kdb_printf() will correctly format zero-width fields + * we want the second call to kdb_printf() to be conditional. That keeps things + * a little cleaner when LOGGING=1. + * + * Secondly, we can't combine everything into one call to kdb_printf() since + * that renders into a fixed length buffer and the combined print could result + * in unwanted truncation. + */ +static void kdb_position_cursor(char *prompt, char *buffer, char *cp) +{ + kdb_printf("\r%s", kdb_prompt_str); + if (cp > buffer) + kdb_printf("%.*s", (int)(cp - buffer), buffer); +} + /* * kdb_read * @@ -200,7 +227,6 @@ static char *kdb_read(char *buffer, size_t bufsize) * and null byte */ char *lastchar; char *p_tmp; - char tmp; static char tmpbuffer[CMD_BUFLEN]; int len = strlen(buffer); int len_tmp; @@ -237,12 +263,8 @@ static char *kdb_read(char *buffer, size_t bufsize) } *(--lastchar) = '\0'; --cp; - kdb_printf("\b%s \r", cp); - tmp = *cp; - *cp = '\0'; - kdb_printf(kdb_prompt_str); - kdb_printf("%s", buffer); - *cp = tmp; + kdb_printf("\b%s ", cp); + kdb_position_cursor(kdb_prompt_str, buffer, cp); } break; case 13: /* enter */ @@ -259,19 +281,14 @@ static char *kdb_read(char *buffer, size_t bufsize) memcpy(tmpbuffer, cp+1, lastchar - cp - 1); memcpy(cp, tmpbuffer, lastchar - cp - 1); *(--lastchar) = '\0'; - kdb_printf("%s \r", cp); - tmp = *cp; - *cp = '\0'; - kdb_printf(kdb_prompt_str); - kdb_printf("%s", buffer); - *cp = tmp; + kdb_printf("%s ", cp); + kdb_position_cursor(kdb_prompt_str, buffer, cp); } break; case 1: /* Home */ if (cp > buffer) { - kdb_printf("\r"); - kdb_printf(kdb_prompt_str); cp = buffer; + kdb_position_cursor(kdb_prompt_str, buffer, cp); } break; case 5: /* End */ @@ -377,13 +394,9 @@ static char *kdb_read(char *buffer, size_t bufsize) memcpy(cp+1, tmpbuffer, lastchar - cp); *++lastchar = '\0'; *cp = key; - kdb_printf("%s\r", cp); + kdb_printf("%s", cp); ++cp; - tmp = *cp; - *cp = '\0'; - kdb_printf(kdb_prompt_str); - kdb_printf("%s", buffer); - *cp = tmp; + kdb_position_cursor(kdb_prompt_str, buffer, cp); } else { *++lastchar = '\0'; *cp++ = key; From 2b5e1534dfc79ef68fa5fddb0a405dfa3fe8af89 Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Wed, 24 Apr 2024 15:03:36 +0100 Subject: [PATCH 0286/1565] kdb: Fix console handling when editing and tab-completing commands commit db2f9c7dc29114f531df4a425d0867d01e1f1e28 upstream. Currently, if the cursor position is not at the end of the command buffer and the user uses the Tab-complete functions, then the console does not leave the cursor in the correct position. For example consider the following buffer with the cursor positioned at the ^: md kdb_pro 10 ^ Pressing tab should result in: md kdb_prompt_str 10 ^ However this does not happen. Instead the cursor is placed at the end (after then 10) and further cursor movement redraws incorrectly. The same problem exists when we double-Tab but in a different part of the code. Fix this by sending a carriage return and then redisplaying the text to the left of the cursor. Cc: stable@vger.kernel.org Reviewed-by: Douglas Anderson Tested-by: Justin Stitt Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-3-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index 0f4b288185be..217124076c6c 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -370,6 +370,8 @@ static char *kdb_read(char *buffer, size_t bufsize) kdb_printf("\n"); kdb_printf(kdb_prompt_str); kdb_printf("%s", buffer); + if (cp != lastchar) + kdb_position_cursor(kdb_prompt_str, buffer, cp); } else if (tab != 2 && count > 0) { /* How many new characters do we want from tmpbuffer? */ len_tmp = strlen(p_tmp) - len; @@ -383,6 +385,9 @@ static char *kdb_read(char *buffer, size_t bufsize) kdb_printf("%s", cp); cp += len_tmp; lastchar += len_tmp; + if (cp != lastchar) + kdb_position_cursor(kdb_prompt_str, + buffer, cp); } } kdb_nextline = 1; /* reset output line number */ From e3d11ff45fde8068777d1122dc424becfeb992fa Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Wed, 24 Apr 2024 15:03:37 +0100 Subject: [PATCH 0287/1565] kdb: Merge identical case statements in kdb_read() commit 6244917f377bf64719551b58592a02a0336a7439 upstream. The code that handles case 14 (down) and case 16 (up) has been copy and pasted despite being byte-for-byte identical. Combine them. Cc: stable@vger.kernel.org # Not a bug fix but it is needed for later bug fixes Reviewed-by: Douglas Anderson Tested-by: Justin Stitt Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-4-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index 217124076c6c..91872c42cd7e 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -304,6 +304,7 @@ static char *kdb_read(char *buffer, size_t bufsize) } break; case 14: /* Down */ + case 16: /* Up */ memset(tmpbuffer, ' ', strlen(kdb_prompt_str) + (lastchar-buffer)); *(tmpbuffer+strlen(kdb_prompt_str) + @@ -318,15 +319,6 @@ static char *kdb_read(char *buffer, size_t bufsize) ++cp; } break; - case 16: /* Up */ - memset(tmpbuffer, ' ', - strlen(kdb_prompt_str) + (lastchar-buffer)); - *(tmpbuffer+strlen(kdb_prompt_str) + - (lastchar-buffer)) = '\0'; - kdb_printf("\r%s\r", tmpbuffer); - *lastchar = (char)key; - *(lastchar+1) = '\0'; - return lastchar; case 9: /* Tab */ if (tab < 2) ++tab; From 51664ef6ac84219d2606bdbfb065aa03e5a8bffa Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Wed, 24 Apr 2024 15:03:38 +0100 Subject: [PATCH 0288/1565] kdb: Use format-specifiers rather than memset() for padding in kdb_read() commit c9b51ddb66b1d96e4d364c088da0f1dfb004c574 upstream. Currently when the current line should be removed from the display kdb_read() uses memset() to fill a temporary buffer with spaces. The problem is not that this could be trivially implemented using a format string rather than open coding it. The real problem is that it is possible, on systems with a long kdb_prompt_str, to write past the end of the tmpbuffer. Happily, as mentioned above, this can be trivially implemented using a format string. Make it so! Cc: stable@vger.kernel.org Reviewed-by: Douglas Anderson Tested-by: Justin Stitt Link: https://lore.kernel.org/r/20240424-kgdb_read_refactor-v3-5-f236dbe9828d@linaro.org Signed-off-by: Daniel Thompson Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index 91872c42cd7e..a3b4b55d2e2e 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -305,11 +305,9 @@ static char *kdb_read(char *buffer, size_t bufsize) break; case 14: /* Down */ case 16: /* Up */ - memset(tmpbuffer, ' ', - strlen(kdb_prompt_str) + (lastchar-buffer)); - *(tmpbuffer+strlen(kdb_prompt_str) + - (lastchar-buffer)) = '\0'; - kdb_printf("\r%s\r", tmpbuffer); + kdb_printf("\r%*c\r", + (int)(strlen(kdb_prompt_str) + (lastchar - buffer)), + ' '); *lastchar = (char)key; *(lastchar+1) = '\0'; return lastchar; From 2295a7ef5c8c49241bff769e7826ef2582e532a6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 28 May 2024 11:43:53 +0000 Subject: [PATCH 0289/1565] net: fix __dst_negative_advice() race commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e upstream. __dst_negative_advice() does not enforce proper RCU rules when sk->dst_cache must be cleared, leading to possible UAF. RCU rules are that we must first clear sk->sk_dst_cache, then call dst_release(old_dst). Note that sk_dst_reset(sk) is implementing this protocol correctly, while __dst_negative_advice() uses the wrong order. Given that ip6_negative_advice() has special logic against RTF_CACHE, this means each of the three ->negative_advice() existing methods must perform the sk_dst_reset() themselves. Note the check against NULL dst is centralized in __dst_negative_advice(), there is no need to duplicate it in various callbacks. Many thanks to Clement Lecigne for tracking this issue. This old bug became visible after the blamed commit, using UDP sockets. Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets") Reported-by: Clement Lecigne Diagnosed-by: Clement Lecigne Signed-off-by: Eric Dumazet Cc: Tom Herbert Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com Signed-off-by: Jakub Kicinski [Lee: Stable backport] Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- include/net/dst_ops.h | 2 +- include/net/sock.h | 13 +++---------- net/ipv4/route.c | 22 ++++++++-------------- net/ipv6/route.c | 29 +++++++++++++++-------------- net/xfrm/xfrm_policy.c | 11 +++-------- 5 files changed, 30 insertions(+), 47 deletions(-) diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h index 632086b2f644..3ae2fda29507 100644 --- a/include/net/dst_ops.h +++ b/include/net/dst_ops.h @@ -24,7 +24,7 @@ struct dst_ops { void (*destroy)(struct dst_entry *); void (*ifdown)(struct dst_entry *, struct net_device *dev, int how); - struct dst_entry * (*negative_advice)(struct dst_entry *); + void (*negative_advice)(struct sock *sk, struct dst_entry *); void (*link_failure)(struct sk_buff *); void (*update_pmtu)(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, diff --git a/include/net/sock.h b/include/net/sock.h index 8bcc96bf291c..0be681984987 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2012,17 +2012,10 @@ sk_dst_get(struct sock *sk) static inline void __dst_negative_advice(struct sock *sk) { - struct dst_entry *ndst, *dst = __sk_dst_get(sk); + struct dst_entry *dst = __sk_dst_get(sk); - if (dst && dst->ops->negative_advice) { - ndst = dst->ops->negative_advice(dst); - - if (ndst != dst) { - rcu_assign_pointer(sk->sk_dst_cache, ndst); - sk_tx_queue_clear(sk); - WRITE_ONCE(sk->sk_dst_pending_confirm, 0); - } - } + if (dst && dst->ops->negative_advice) + dst->ops->negative_advice(sk, dst); } static inline void dst_negative_advice(struct sock *sk) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index cc409cc0789c..b3b49d8b386d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -137,7 +137,8 @@ static int ip_rt_gc_timeout __read_mostly = RT_GC_TIMEOUT; static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie); static unsigned int ipv4_default_advmss(const struct dst_entry *dst); static unsigned int ipv4_mtu(const struct dst_entry *dst); -static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst); +static void ipv4_negative_advice(struct sock *sk, + struct dst_entry *dst); static void ipv4_link_failure(struct sk_buff *skb); static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, @@ -866,22 +867,15 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf __ip_do_redirect(rt, skb, &fl4, true); } -static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst) +static void ipv4_negative_advice(struct sock *sk, + struct dst_entry *dst) { struct rtable *rt = (struct rtable *)dst; - struct dst_entry *ret = dst; - if (rt) { - if (dst->obsolete > 0) { - ip_rt_put(rt); - ret = NULL; - } else if ((rt->rt_flags & RTCF_REDIRECTED) || - rt->dst.expires) { - ip_rt_put(rt); - ret = NULL; - } - } - return ret; + if ((dst->obsolete > 0) || + (rt->rt_flags & RTCF_REDIRECTED) || + rt->dst.expires) + sk_dst_reset(sk); } /* diff --git a/net/ipv6/route.c b/net/ipv6/route.c index e3d0686d2a33..88f96241ca97 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -85,7 +85,8 @@ enum rt6_nud_state { static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie); static unsigned int ip6_default_advmss(const struct dst_entry *dst); static unsigned int ip6_mtu(const struct dst_entry *dst); -static struct dst_entry *ip6_negative_advice(struct dst_entry *); +static void ip6_negative_advice(struct sock *sk, + struct dst_entry *dst); static void ip6_dst_destroy(struct dst_entry *); static void ip6_dst_ifdown(struct dst_entry *, struct net_device *dev, int how); @@ -2635,24 +2636,24 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie) return dst_ret; } -static struct dst_entry *ip6_negative_advice(struct dst_entry *dst) +static void ip6_negative_advice(struct sock *sk, + struct dst_entry *dst) { struct rt6_info *rt = (struct rt6_info *) dst; - if (rt) { - if (rt->rt6i_flags & RTF_CACHE) { - rcu_read_lock(); - if (rt6_check_expired(rt)) { - rt6_remove_exception_rt(rt); - dst = NULL; - } - rcu_read_unlock(); - } else { - dst_release(dst); - dst = NULL; + if (rt->rt6i_flags & RTF_CACHE) { + rcu_read_lock(); + if (rt6_check_expired(rt)) { + /* counteract the dst_release() in sk_dst_reset() */ + dst_hold(dst); + sk_dst_reset(sk); + + rt6_remove_exception_rt(rt); } + rcu_read_unlock(); + return; } - return dst; + sk_dst_reset(sk); } static void ip6_link_failure(struct sk_buff *skb) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 664d55957feb..fadb309b25b4 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3807,15 +3807,10 @@ static void xfrm_link_failure(struct sk_buff *skb) /* Impossible. Such dst must be popped before reaches point of failure. */ } -static struct dst_entry *xfrm_negative_advice(struct dst_entry *dst) +static void xfrm_negative_advice(struct sock *sk, struct dst_entry *dst) { - if (dst) { - if (dst->obsolete) { - dst_release(dst); - dst = NULL; - } - } - return dst; + if (dst->obsolete) + sk_dst_reset(sk); } static void xfrm_init_pmtu(struct xfrm_dst **bundle, int nr) From 08018302f23902a830e564ebcdf343be433e6fcf Mon Sep 17 00:00:00 2001 From: Mike Gilbert Date: Wed, 6 Mar 2024 12:11:47 -0500 Subject: [PATCH 0290/1565] sparc: move struct termio to asm/termios.h commit c32d18e7942d7589b62e301eb426b32623366565 upstream. Every other arch declares struct termio in asm/termios.h, so make sparc match them. Resolves a build failure in the PPP software package, which includes both bits/ioctl-types.h via sys/ioctl.h (glibc) and asm/termbits.h. Closes: https://bugs.gentoo.org/918992 Signed-off-by: Mike Gilbert Cc: stable@vger.kernel.org Reviewed-by: Andreas Larsson Tested-by: Andreas Larsson Link: https://lore.kernel.org/r/20240306171149.3843481-1-floppym@gentoo.org Signed-off-by: Andreas Larsson Signed-off-by: Greg Kroah-Hartman --- arch/sparc/include/uapi/asm/termbits.h | 10 ---------- arch/sparc/include/uapi/asm/termios.h | 9 +++++++++ 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/arch/sparc/include/uapi/asm/termbits.h b/arch/sparc/include/uapi/asm/termbits.h index ce5ad5d0f105..0614e179bccc 100644 --- a/arch/sparc/include/uapi/asm/termbits.h +++ b/arch/sparc/include/uapi/asm/termbits.h @@ -13,16 +13,6 @@ typedef unsigned int tcflag_t; typedef unsigned long tcflag_t; #endif -#define NCC 8 -struct termio { - unsigned short c_iflag; /* input mode flags */ - unsigned short c_oflag; /* output mode flags */ - unsigned short c_cflag; /* control mode flags */ - unsigned short c_lflag; /* local mode flags */ - unsigned char c_line; /* line discipline */ - unsigned char c_cc[NCC]; /* control characters */ -}; - #define NCCS 17 struct termios { tcflag_t c_iflag; /* input mode flags */ diff --git a/arch/sparc/include/uapi/asm/termios.h b/arch/sparc/include/uapi/asm/termios.h index ee86f4093d83..cceb32260881 100644 --- a/arch/sparc/include/uapi/asm/termios.h +++ b/arch/sparc/include/uapi/asm/termios.h @@ -40,5 +40,14 @@ struct winsize { unsigned short ws_ypixel; }; +#define NCC 8 +struct termio { + unsigned short c_iflag; /* input mode flags */ + unsigned short c_oflag; /* output mode flags */ + unsigned short c_cflag; /* control mode flags */ + unsigned short c_lflag; /* local mode flags */ + unsigned char c_line; /* line discipline */ + unsigned char c_cc[NCC]; /* control characters */ +}; #endif /* _UAPI_SPARC_TERMIOS_H */ From 76dc776153a47372719d664e0fc50d6355791abb Mon Sep 17 00:00:00 2001 From: Baokun Li Date: Sat, 4 May 2024 15:55:25 +0800 Subject: [PATCH 0291/1565] ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find() commit 0c0b4a49d3e7f49690a6827a41faeffad5df7e21 upstream. Syzbot reports a warning as follows: ============================================ WARNING: CPU: 0 PID: 5075 at fs/mbcache.c:419 mb_cache_destroy+0x224/0x290 Modules linked in: CPU: 0 PID: 5075 Comm: syz-executor199 Not tainted 6.9.0-rc6-gb947cc5bf6d7 RIP: 0010:mb_cache_destroy+0x224/0x290 fs/mbcache.c:419 Call Trace: ext4_put_super+0x6d4/0xcd0 fs/ext4/super.c:1375 generic_shutdown_super+0x136/0x2d0 fs/super.c:641 kill_block_super+0x44/0x90 fs/super.c:1675 ext4_kill_sb+0x68/0xa0 fs/ext4/super.c:7327 [...] ============================================ This is because when finding an entry in ext4_xattr_block_cache_find(), if ext4_sb_bread() returns -ENOMEM, the ce's e_refcnt, which has already grown in the __entry_find(), won't be put away, and eventually trigger the above issue in mb_cache_destroy() due to reference count leakage. So call mb_cache_entry_put() on the -ENOMEM error branch as a quick fix. Reported-by: syzbot+dd43bd0f7474512edc47@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=dd43bd0f7474512edc47 Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Cc: stable@kernel.org Signed-off-by: Baokun Li Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20240504075526.2254349-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/xattr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 5bf858fc271f..2dcb7d2bb85e 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -3065,8 +3065,10 @@ ext4_xattr_block_cache_find(struct inode *inode, bh = ext4_sb_bread(inode->i_sb, ce->e_value, REQ_PRIO); if (IS_ERR(bh)) { - if (PTR_ERR(bh) == -ENOMEM) + if (PTR_ERR(bh) == -ENOMEM) { + mb_cache_entry_put(ea_block_cache, ce); return NULL; + } bh = NULL; EXT4_ERROR_INODE(inode, "block %lu read error", (unsigned long)ce->e_value); From 7360cef95aa1ea2b5efb7b5e2ed32e941664e1f0 Mon Sep 17 00:00:00 2001 From: Harald Freudenberger Date: Mon, 13 May 2024 14:49:13 +0200 Subject: [PATCH 0292/1565] s390/ap: Fix crash in AP internal function modify_bitmap() commit d4f9d5a99a3fd1b1c691b7a1a6f8f3f25f4116c9 upstream. A system crash like this Failing address: 200000cb7df6f000 TEID: 200000cb7df6f403 Fault in home space mode while using kernel ASCE. AS:00000002d71bc007 R3:00000003fe5b8007 S:000000011a446000 P:000000015660c13d Oops: 0038 ilc:3 [#1] PREEMPT SMP Modules linked in: mlx5_ib ... CPU: 8 PID: 7556 Comm: bash Not tainted 6.9.0-rc7 #8 Hardware name: IBM 3931 A01 704 (LPAR) Krnl PSW : 0704e00180000000 0000014b75e7b606 (ap_parse_bitmap_str+0x10e/0x1f8) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 ffffffffffffffc0 0000000000000001 00000048f96b75d3 000000cb00000100 ffffffffffffffff ffffffffffffffff 000000cb7df6fce0 000000cb7df6fce0 00000000ffffffff 000000000000002b 00000048ffffffff 000003ff9b2dbc80 200000cb7df6fcd8 0000014bffffffc0 000000cb7df6fbc8 Krnl Code: 0000014b75e7b5fc: a7840047 brc 8,0000014b75e7b68a 0000014b75e7b600: 18b2 lr %r11,%r2 #0000014b75e7b602: a7f4000a brc 15,0000014b75e7b616 >0000014b75e7b606: eb22d00000e6 laog %r2,%r2,0(%r13) 0000014b75e7b60c: a7680001 lhi %r6,1 0000014b75e7b610: 187b lr %r7,%r11 0000014b75e7b612: 84960021 brxh %r9,%r6,0000014b75e7b654 0000014b75e7b616: 18e9 lr %r14,%r9 Call Trace: [<0000014b75e7b606>] ap_parse_bitmap_str+0x10e/0x1f8 ([<0000014b75e7b5dc>] ap_parse_bitmap_str+0xe4/0x1f8) [<0000014b75e7b758>] apmask_store+0x68/0x140 [<0000014b75679196>] kernfs_fop_write_iter+0x14e/0x1e8 [<0000014b75598524>] vfs_write+0x1b4/0x448 [<0000014b7559894c>] ksys_write+0x74/0x100 [<0000014b7618a440>] __do_syscall+0x268/0x328 [<0000014b761a3558>] system_call+0x70/0x98 INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000014b75e7b636>] ap_parse_bitmap_str+0x13e/0x1f8 Kernel panic - not syncing: Fatal exception: panic_on_oops occured when /sys/bus/ap/a[pq]mask was updated with a relative mask value (like +0x10-0x12,+60,-90) with one of the numeric values exceeding INT_MAX. The fix is simple: use unsigned long values for the internal variables. The correct checks are already in place in the function but a simple int for the internal variables was used with the possibility to overflow. Reported-by: Marc Hartmayer Signed-off-by: Harald Freudenberger Tested-by: Marc Hartmayer Reviewed-by: Holger Dengler Cc: Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- drivers/s390/crypto/ap_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index c00a288a4eca..13e56a23e41e 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -859,7 +859,7 @@ static int hex2bitmap(const char *str, unsigned long *bitmap, int bits) */ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits) { - int a, i, z; + unsigned long a, i, z; char *np, sign; /* bits needs to be a multiple of 8 */ From 3e41609e629a112091f16bd4c00a76bb4ff2b567 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Fri, 10 May 2024 23:24:04 +0300 Subject: [PATCH 0293/1565] nfs: fix undefined behavior in nfs_block_bits() commit 3c0a2e0b0ae661457c8505fecc7be5501aa7a715 upstream. Shifting *signed int* typed constant 1 left by 31 bits causes undefined behavior. Specify the correct *unsigned long* type by using 1UL instead. Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Cc: stable@vger.kernel.org Signed-off-by: Sergey Shtylyov Reviewed-by: Benjamin Coddington Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/internal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 9a72abfb46ab..566f1b11f62f 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -660,9 +660,9 @@ unsigned long nfs_block_bits(unsigned long bsize, unsigned char *nrbitsp) if ((bsize & (bsize - 1)) || nrbitsp) { unsigned char nrbits; - for (nrbits = 31; nrbits && !(bsize & (1 << nrbits)); nrbits--) + for (nrbits = 31; nrbits && !(bsize & (1UL << nrbits)); nrbits--) ; - bsize = 1 << nrbits; + bsize = 1UL << nrbits; if (nrbitsp) *nrbitsp = nrbits; } From 6285d50a233502f4f101dfe843a47098ef5e1078 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Thu, 25 Apr 2024 16:24:29 -0400 Subject: [PATCH 0294/1565] NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS commit f06d1b10cb016d5aaecdb1804fefca025387bd10 upstream. Olga showed me a case where the client was sending multiple READ_PLUS calls to the server in parallel, and the server replied NFS4ERR_OPNOTSUPP to each. The client would fall back to READ for the first reply, but fail to retry the other calls. I fix this by removing the test for NFS_CAP_READ_PLUS in nfs4_read_plus_not_supported(). This allows us to reschedule any READ_PLUS call that has a NFS4ERR_OPNOTSUPP return value, even after the capability has been cleared. Reported-by: Olga Kornievskaia Fixes: c567552612ec ("NFS: Add READ_PLUS data segment support") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Anna Schumaker Reviewed-by: Benjamin Coddington Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 8e546e6a5619..1ff3f9efbe51 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5320,7 +5320,7 @@ static bool nfs4_read_plus_not_supported(struct rpc_task *task, struct rpc_message *msg = &task->tk_msg; if (msg->rpc_proc == &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS] && - server->caps & NFS_CAP_READ_PLUS && task->tk_status == -ENOTSUPP) { + task->tk_status == -ENOTSUPP) { server->caps &= ~NFS_CAP_READ_PLUS; msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ]; rpc_restart_call_prepare(task); From d7ae4792b5d03b45590158f623b7c68c621287a4 Mon Sep 17 00:00:00 2001 From: Neil Armstrong Date: Mon, 21 Aug 2023 14:11:21 +0200 Subject: [PATCH 0295/1565] scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5 commit c422fbd5cb58c9a078172ae1e9750971b738a197 upstream. The qunipro_g4_sel clear is also needed for new platforms with major version > 5. Fix the version check to take this into account. Fixes: 9c02aa24bf40 ("scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW version major 5") Acked-by: Manivannan Sadhasivam Reviewed-by: Nitin Rawat Signed-off-by: Neil Armstrong Link: https://lore.kernel.org/r/20230821-topic-sm8x50-upstream-ufs-major-5-plus-v2-1-f42a4b712e58@linaro.org Reviewed-by: "Bao D. Nguyen" Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/ufs/ufs-qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/ufs/ufs-qcom.c b/drivers/scsi/ufs/ufs-qcom.c index ed0d6ea967e1..2c3972d9ba52 100644 --- a/drivers/scsi/ufs/ufs-qcom.c +++ b/drivers/scsi/ufs/ufs-qcom.c @@ -243,7 +243,7 @@ static void ufs_qcom_select_unipro_mode(struct ufs_qcom_host *host) ufs_qcom_cap_qunipro(host) ? QUNIPRO_SEL : 0, REG_UFS_CFG1); - if (host->hw_ver.major == 0x05) + if (host->hw_ver.major >= 0x05) ufshcd_rmwl(host->hba, QUNIPRO_G4_SEL, 0, REG_UFS_CFG0); } From 5fe764c781f0bce73fe6be96af752485e6378faf Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 9 Dec 2020 16:42:14 +0800 Subject: [PATCH 0296/1565] f2fs: compress: fix compression chksum commit 75e91c888989cf2df5c78b251b07de1f5052e30e upstream. This patch addresses minor issues in compression chksum. Fixes: b28f047b28c5 ("f2fs: compress: support chksum") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/compress.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 6fb875c8cf28..8dccb59d072b 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -783,7 +783,7 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity) ret = cops->decompress_pages(dic); - if (!ret && fi->i_compress_flag & 1 << COMPRESS_CHKSUM) { + if (!ret && (fi->i_compress_flag & 1 << COMPRESS_CHKSUM)) { u32 provided = le32_to_cpu(dic->cbuf->chksum); u32 calculated = f2fs_crc32(sbi, dic->cbuf->cdata, dic->clen); @@ -796,7 +796,6 @@ void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity) provided, calculated); } set_sbi_flag(sbi, SBI_NEED_FSCK); - WARN_ON_ONCE(1); } } From 68a9559376224dbe448dd60d624ca268ddf8de2b Mon Sep 17 00:00:00 2001 From: Yangyang Li Date: Tue, 19 Jan 2021 17:28:33 +0800 Subject: [PATCH 0297/1565] RDMA/hns: Use mutex instead of spinlock for ida allocation commit 9293d3fcb70583f2c786f04ca788af026b7c4c5c upstream. GFP_KERNEL may cause ida_alloc_range() to sleep, but the spinlock covering this function is not allowed to sleep, so the spinlock needs to be changed to mutex. As there is a certain chance of memory allocation failure, GFP_ATOMIC is not suitable for QP allocation scenarios. Fixes: 71586dd20010 ("RDMA/hns: Create QP with selected QPN for bank load balance") Link: https://lore.kernel.org/r/1611048513-28663-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li Signed-off-by: Weihang Li Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hns/hns_roce_device.h | 2 +- drivers/infiniband/hw/hns/hns_roce_qp.c | 11 ++++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index db78bc2d44c2..385f26ef57e1 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -537,7 +537,7 @@ struct hns_roce_qp_table { struct hns_roce_hem_table sccc_table; struct mutex scc_mutex; struct hns_roce_bank bank[HNS_ROCE_QP_BANK_NUM]; - spinlock_t bank_lock; + struct mutex bank_mutex; }; struct hns_roce_cq_table { diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index 3ed43a5c7596..5af36bb326b9 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -210,7 +210,7 @@ static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) hr_qp->doorbell_qpn = 1; } else { - spin_lock(&qp_table->bank_lock); + mutex_lock(&qp_table->bank_mutex); bankid = get_least_load_bankid_for_qp(qp_table->bank); ret = alloc_qpn_with_bankid(&qp_table->bank[bankid], bankid, @@ -218,12 +218,12 @@ static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) if (ret) { ibdev_err(&hr_dev->ib_dev, "failed to alloc QPN, ret = %d\n", ret); - spin_unlock(&qp_table->bank_lock); + mutex_unlock(&qp_table->bank_mutex); return ret; } qp_table->bank[bankid].inuse++; - spin_unlock(&qp_table->bank_lock); + mutex_unlock(&qp_table->bank_mutex); hr_qp->doorbell_qpn = (u32)num; } @@ -409,9 +409,9 @@ static void free_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) ida_free(&hr_dev->qp_table.bank[bankid].ida, hr_qp->qpn >> 3); - spin_lock(&hr_dev->qp_table.bank_lock); + mutex_lock(&hr_dev->qp_table.bank_mutex); hr_dev->qp_table.bank[bankid].inuse--; - spin_unlock(&hr_dev->qp_table.bank_lock); + mutex_unlock(&hr_dev->qp_table.bank_mutex); } static int set_rq_size(struct hns_roce_dev *hr_dev, struct ib_qp_cap *cap, @@ -1358,6 +1358,7 @@ int hns_roce_init_qp_table(struct hns_roce_dev *hr_dev) unsigned int i; mutex_init(&qp_table->scc_mutex); + mutex_init(&qp_table->bank_mutex); xa_init(&hr_dev->qp_table_xa); reserved_from_bot = hr_dev->caps.reserved_qps; From 487489c4c8222f90c6bf5c1c2385a32504fb0366 Mon Sep 17 00:00:00 2001 From: Chengchang Tang Date: Fri, 4 Aug 2023 09:27:11 +0800 Subject: [PATCH 0298/1565] RDMA/hns: Fix CQ and QP cache affinity commit 9e03dbea2b0634b21a45946b4f8097e0dc86ebe1 upstream. Currently, the affinity between QP cache and CQ cache is not considered when assigning QPN, it will affect the message rate of HW. Allocate QPN from QP cache with better CQ affinity to get better performance. Fixes: 71586dd20010 ("RDMA/hns: Create QP with selected QPN for bank load balance") Signed-off-by: Chengchang Tang Signed-off-by: Junxian Huang Link: https://lore.kernel.org/r/20230804012711.808069-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hns/hns_roce_device.h | 2 ++ drivers/infiniband/hw/hns/hns_roce_qp.c | 28 ++++++++++++++++----- 2 files changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h index 385f26ef57e1..fe54e09eeccd 100644 --- a/drivers/infiniband/hw/hns/hns_roce_device.h +++ b/drivers/infiniband/hw/hns/hns_roce_device.h @@ -122,6 +122,8 @@ */ #define EQ_DEPTH_COEFF 2 +#define CQ_BANKID_MASK GENMASK(1, 0) + enum { SERV_TYPE_RC, SERV_TYPE_UC, diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index 5af36bb326b9..1a6de9a9e57c 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -154,14 +154,29 @@ static void hns_roce_ib_qp_event(struct hns_roce_qp *hr_qp, } } -static u8 get_least_load_bankid_for_qp(struct hns_roce_bank *bank) +static u8 get_affinity_cq_bank(u8 qp_bank) { - u32 least_load = bank[0].inuse; + return (qp_bank >> 1) & CQ_BANKID_MASK; +} + +static u8 get_least_load_bankid_for_qp(struct ib_qp_init_attr *init_attr, + struct hns_roce_bank *bank) +{ +#define INVALID_LOAD_QPNUM 0xFFFFFFFF + struct ib_cq *scq = init_attr->send_cq; + u32 least_load = INVALID_LOAD_QPNUM; + unsigned long cqn = 0; u8 bankid = 0; u32 bankcnt; u8 i; - for (i = 1; i < HNS_ROCE_QP_BANK_NUM; i++) { + if (scq) + cqn = to_hr_cq(scq)->cqn; + + for (i = 0; i < HNS_ROCE_QP_BANK_NUM; i++) { + if (scq && (get_affinity_cq_bank(i) != (cqn & CQ_BANKID_MASK))) + continue; + bankcnt = bank[i].inuse; if (bankcnt < least_load) { least_load = bankcnt; @@ -193,7 +208,8 @@ static int alloc_qpn_with_bankid(struct hns_roce_bank *bank, u8 bankid, return 0; } -static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) +static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp, + struct ib_qp_init_attr *init_attr) { struct hns_roce_qp_table *qp_table = &hr_dev->qp_table; unsigned long num = 0; @@ -211,7 +227,7 @@ static int alloc_qpn(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp) hr_qp->doorbell_qpn = 1; } else { mutex_lock(&qp_table->bank_mutex); - bankid = get_least_load_bankid_for_qp(qp_table->bank); + bankid = get_least_load_bankid_for_qp(init_attr, qp_table->bank); ret = alloc_qpn_with_bankid(&qp_table->bank[bankid], bankid, &num); @@ -1005,7 +1021,7 @@ static int hns_roce_create_qp_common(struct hns_roce_dev *hr_dev, goto err_db; } - ret = alloc_qpn(hr_dev, hr_qp); + ret = alloc_qpn(hr_dev, hr_qp, init_attr); if (ret) { ibdev_err(ibdev, "failed to alloc QPN, ret = %d.\n", ret); goto err_buf; From a2ed1606213906ac22fd66bebb34b88f8e24224b Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 16 Jun 2024 13:32:37 +0200 Subject: [PATCH 0299/1565] Linux 5.10.219 Link: https://lore.kernel.org/r/20240613113247.525431100@linuxfoundation.org Tested-by: Dominique Martinet Tested-by: Pavel Machek (CIP) Tested-by: Mark Brown Tested-by: Jon Hunter Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 3bc56bf43c82..3b36b77589f2 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 218 +SUBLEVEL = 219 EXTRAVERSION = NAME = Dare mighty things From 9aa0a43a55ff181abd771035f23941402db96829 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 10 Jun 2020 10:36:42 -0400 Subject: [PATCH 0300/1565] SUNRPC: Rename svc_encode_read_payload() [ Upstream commit 03493bca084fdca48abc59b00e06ce733aa9eb7d ] Clean up: "result payload" is a less confusing name for these payloads. "READ payload" reflects only the NFS usage. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 2 +- include/linux/sunrpc/svc.h | 6 +++--- include/linux/sunrpc/svc_rdma.h | 4 ++-- include/linux/sunrpc/svc_xprt.h | 4 ++-- net/sunrpc/svc.c | 11 ++++++----- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 8 files changed, 23 insertions(+), 22 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index dbfa24cf3390..9971d3c29573 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3843,7 +3843,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, read->rd_length = maxcount; if (nfserr) return nfserr; - if (svc_encode_read_payload(resp->rqstp, starting_len + 8, maxcount)) + if (svc_encode_result_payload(resp->rqstp, starting_len + 8, maxcount)) return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 386628b36bc7..c220b734fa69 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -519,9 +519,9 @@ void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); char * svc_print_addr(struct svc_rqst *, char *, size_t); -int svc_encode_read_payload(struct svc_rqst *rqstp, - unsigned int offset, - unsigned int length); +int svc_encode_result_payload(struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages, struct kvec *first, size_t total); diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index 9dc3a3b88391..2b870a3f391b 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -207,8 +207,8 @@ extern void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma, struct svc_rdma_recv_ctxt *rctxt, int status); extern int svc_rdma_sendto(struct svc_rqst *); -extern int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length); +extern int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length); /* svc_rdma_transport.c */ extern struct svc_xprt_class svc_rdma_class; diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index aca35ab5cff2..92455e0d5244 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -21,8 +21,8 @@ struct svc_xprt_ops { int (*xpo_has_wspace)(struct svc_xprt *); int (*xpo_recvfrom)(struct svc_rqst *); int (*xpo_sendto)(struct svc_rqst *); - int (*xpo_read_payload)(struct svc_rqst *, unsigned int, - unsigned int); + int (*xpo_result_payload)(struct svc_rqst *, unsigned int, + unsigned int); void (*xpo_release_rqst)(struct svc_rqst *); void (*xpo_detach)(struct svc_xprt *); void (*xpo_free)(struct svc_xprt *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index cfe8b911ca01..e4e4e203ecda 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1626,7 +1626,7 @@ u32 svc_max_payload(const struct svc_rqst *rqstp) EXPORT_SYMBOL_GPL(svc_max_payload); /** - * svc_encode_read_payload - mark a range of bytes as a READ payload + * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in rqstp->rq_res * @length: size of payload, in bytes @@ -1634,12 +1634,13 @@ EXPORT_SYMBOL_GPL(svc_max_payload); * Returns zero on success, or a negative errno if a permanent * error occurred. */ -int svc_encode_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { - return rqstp->rq_xprt->xpt_ops->xpo_read_payload(rqstp, offset, length); + return rqstp->rq_xprt->xpt_ops->xpo_result_payload(rqstp, offset, + length); } -EXPORT_SYMBOL_GPL(svc_encode_read_payload); +EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 3d5ee042c501..90f6231d6ed6 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -181,8 +181,8 @@ static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh) } } -static int svc_sock_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +static int svc_sock_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { return 0; } @@ -635,7 +635,7 @@ static const struct svc_xprt_ops svc_udp_ops = { .xpo_create = svc_udp_create, .xpo_recvfrom = svc_udp_recvfrom, .xpo_sendto = svc_udp_sendto, - .xpo_read_payload = svc_sock_read_payload, + .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_udp_release_rqst, .xpo_detach = svc_sock_detach, .xpo_free = svc_sock_free, @@ -1209,7 +1209,7 @@ static const struct svc_xprt_ops svc_tcp_ops = { .xpo_create = svc_tcp_create, .xpo_recvfrom = svc_tcp_recvfrom, .xpo_sendto = svc_tcp_sendto, - .xpo_read_payload = svc_sock_read_payload, + .xpo_result_payload = svc_sock_result_payload, .xpo_release_rqst = svc_tcp_release_rqst, .xpo_detach = svc_tcp_sock_detach, .xpo_free = svc_sock_free, diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c3d588b149aa..c8411b4f3492 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -979,19 +979,19 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) } /** - * svc_rdma_read_payload - special processing for a READ payload + * svc_rdma_result_payload - special processing for a result payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in @xdr * @length: size of payload, in bytes * * Returns zero on success. * - * For the moment, just record the xdr_buf location of the READ + * For the moment, just record the xdr_buf location of the result * payload. svc_rdma_sendto will use that location later when * we actually send the payload. */ -int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 5f7e3d12523f..c895f80df659 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -80,7 +80,7 @@ static const struct svc_xprt_ops svc_rdma_ops = { .xpo_create = svc_rdma_create, .xpo_recvfrom = svc_rdma_recvfrom, .xpo_sendto = svc_rdma_sendto, - .xpo_read_payload = svc_rdma_read_payload, + .xpo_result_payload = svc_rdma_result_payload, .xpo_release_rqst = svc_rdma_release_rqst, .xpo_detach = svc_rdma_detach, .xpo_free = svc_rdma_free, From a2f25c3208d18c1bf6084d2bf290cda2ce4befda Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 5 Nov 2020 10:24:19 -0500 Subject: [PATCH 0301/1565] NFSD: Invoke svc_encode_result_payload() in "read" NFSD encoders [ Upstream commit 76e5492b161f555c0fb69cad9eb39a7d8467f5fe ] Have the NFSD encoders annotate the boundaries of every direct-data-placement eligible result data payload. Then change svcrdma to use that annotation instead of the xdr->page_len when handling Write chunks. For NFSv4 on RDMA, that enables the ability to recognize multiple result payloads per compound. This is a pre-requisite for supporting multiple Write chunks per RPC transaction. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 7 +++++ fs/nfsd/nfs4xdr.c | 41 +++++++++++++++++++-------- fs/nfsd/nfsxdr.c | 6 ++++ net/sunrpc/xprtrdma/svc_rdma_sendto.c | 24 +++++----------- 4 files changed, 49 insertions(+), 29 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 716566da400e..27b24823f7c4 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -707,6 +707,7 @@ int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readlinkres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head; *p++ = resp->status; p = encode_post_op_attr(rqstp, p, &resp->fh); @@ -720,6 +721,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + return 0; return 1; } else return xdr_ressize_check(rqstp, p); @@ -730,6 +733,7 @@ int nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head; *p++ = resp->status; p = encode_post_op_attr(rqstp, p, &resp->fh); @@ -746,6 +750,9 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3); } + if (svc_encode_result_payload(rqstp, head->iov_len, + resp->count)) + return 0; return 1; } else return xdr_ressize_check(rqstp, p); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9971d3c29573..4b3344296ed0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3770,8 +3770,8 @@ static __be32 nfsd4_encode_splice_read( { struct xdr_stream *xdr = &resp->xdr; struct xdr_buf *buf = xdr->buf; + int status, space_left; u32 eof; - int space_left; __be32 nfserr; __be32 *p = xdr->p - 2; @@ -3782,14 +3782,13 @@ static __be32 nfsd4_encode_splice_read( nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp, file, read->rd_offset, &maxcount, &eof); read->rd_length = maxcount; - if (nfserr) { - /* - * nfsd_splice_actor may have already messed with the - * page length; reset it so as not to confuse - * xdr_truncate_encode: - */ - buf->page_len = 0; - return nfserr; + if (nfserr) + goto out_err; + status = svc_encode_result_payload(read->rd_rqstp, + buf->head[0].iov_len, maxcount); + if (status) { + nfserr = nfserrno(status); + goto out_err; } *(p++) = htonl(eof); @@ -3820,6 +3819,15 @@ static __be32 nfsd4_encode_splice_read( xdr->end = (__be32 *)((void *)xdr->end + space_left); return 0; + +out_err: + /* + * nfsd_splice_actor may have already messed with the + * page length; reset it so as not to confuse + * xdr_truncate_encode in our caller. + */ + buf->page_len = 0; + return nfserr; } static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, @@ -3911,6 +3919,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd int zero = 0; struct xdr_stream *xdr = &resp->xdr; int length_offset = xdr->buf->len; + int status; __be32 *p; p = xdr_reserve_space(xdr, 4); @@ -3931,9 +3940,13 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd (char *)p, &maxcount); if (nfserr == nfserr_isdir) nfserr = nfserr_inval; - if (nfserr) { - xdr_truncate_encode(xdr, length_offset); - return nfserr; + if (nfserr) + goto out_err; + status = svc_encode_result_payload(readlink->rl_rqstp, length_offset, + maxcount); + if (status) { + nfserr = nfserrno(status); + goto out_err; } wire_count = htonl(maxcount); @@ -3943,6 +3956,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, &zero, 4 - (maxcount&3)); return 0; + +out_err: + xdr_truncate_encode(xdr, length_offset); + return nfserr; } static __be32 diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 8a288c8fcd57..9e00a902113e 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -469,6 +469,7 @@ int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readlinkres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head; *p++ = resp->status; if (resp->status != nfs_ok) @@ -483,6 +484,8 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + return 0; return 1; } @@ -490,6 +493,7 @@ int nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readres *resp = rqstp->rq_resp; + struct kvec *head = rqstp->rq_res.head; *p++ = resp->status; if (resp->status != nfs_ok) @@ -507,6 +511,8 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) *p = 0; rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3); } + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count)) + return 0; return 1; } diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index c8411b4f3492..d6436c13d5c4 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -448,7 +448,6 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, * svc_rdma_encode_write_list - Encode RPC Reply's Write chunk list * @rctxt: Reply context with information about the RPC Call * @sctxt: Send context for the RPC Reply - * @length: size in bytes of the payload in the first Write chunk * * The client provides a Write chunk list in the Call message. Fill * in the segments in the first Write chunk in the Reply's transport @@ -465,12 +464,12 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, */ static ssize_t svc_rdma_encode_write_list(const struct svc_rdma_recv_ctxt *rctxt, - struct svc_rdma_send_ctxt *sctxt, - unsigned int length) + struct svc_rdma_send_ctxt *sctxt) { ssize_t len, ret; - ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, length); + ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, + rctxt->rc_read_payload_length); if (ret < 0) return ret; len = ret; @@ -923,21 +922,12 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) goto err0; if (wr_lst) { /* XXX: Presume the client sent only one Write chunk */ - unsigned long offset; - unsigned int length; - - if (rctxt->rc_read_payload_length) { - offset = rctxt->rc_read_payload_offset; - length = rctxt->rc_read_payload_length; - } else { - offset = xdr->head[0].iov_len; - length = xdr->page_len; - } - ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, offset, - length); + ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, + rctxt->rc_read_payload_offset, + rctxt->rc_read_payload_length); if (ret < 0) goto err2; - if (svc_rdma_encode_write_list(rctxt, sctxt, length) < 0) + if (svc_rdma_encode_write_list(rctxt, sctxt) < 0) goto err0; } else { if (xdr_stream_encode_item_absent(&sctxt->sc_stream) < 0) From 92f59545b914b761879e82a3dc514a377b2b2b7d Mon Sep 17 00:00:00 2001 From: Tom Rix Date: Sun, 1 Nov 2020 07:32:34 -0800 Subject: [PATCH 0302/1565] NFSD: A semicolon is not needed after a switch statement. [ Upstream commit 25fef48bdbe7cac5ba5577eab6a750e1caea43bc ] Signed-off-by: Tom Rix Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4b3344296ed0..fd9107332a20 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2558,7 +2558,7 @@ static u32 nfs4_file_type(umode_t mode) case S_IFREG: return NF4REG; case S_IFSOCK: return NF4SOCK; default: return NF4BAD; - }; + } } static inline __be32 From e51368510170946af965a1face59ca2c3b9550c1 Mon Sep 17 00:00:00 2001 From: Alex Shi Date: Fri, 6 Nov 2020 13:40:57 +0800 Subject: [PATCH 0303/1565] nfsd/nfs3: remove unused macro nfsd3_fhandleres [ Upstream commit 71fd721839a74d945c242299f6be29a246fc2131 ] The macro is unused, remove it to tame gcc warning: fs/nfsd/nfs3proc.c:702:0: warning: macro "nfsd3_fhandleres" is not used [-Wunused-macros] Signed-off-by: Alex Shi Cc: "J. Bruce Fields" Cc: Chuck Lever Cc: linux-nfs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 981a4e4c9a3c..eba84417406c 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -694,7 +694,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) #define nfsd3_mkdirargs nfsd3_createargs #define nfsd3_readdirplusargs nfsd3_readdirargs #define nfsd3_fhandleargs nfsd_fhandle -#define nfsd3_fhandleres nfsd3_attrstat #define nfsd3_attrstatres nfsd3_attrstat #define nfsd3_wccstatres nfsd3_attrstat #define nfsd3_createres nfsd3_diropres From 9f8405182bdd90be2d67f9b7684a9bf0b29c3678 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 19 Aug 2020 12:56:40 -0400 Subject: [PATCH 0304/1565] NFSD: Clean up the show_nf_may macro [ Upstream commit b76278ae68848cea13b325d247aa5cf31c87edac ] Display all currently possible NFSD_MAY permission flags. Move and rename show_nf_may with a more generic name because the NFSD_MAY permission flags are used in other places besides the file cache. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index a952f4a9b2a6..7bb1c398daa5 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -12,6 +12,22 @@ #include "export.h" #include "nfsfh.h" +#define show_nfsd_may_flags(x) \ + __print_flags(x, "|", \ + { NFSD_MAY_EXEC, "EXEC" }, \ + { NFSD_MAY_WRITE, "WRITE" }, \ + { NFSD_MAY_READ, "READ" }, \ + { NFSD_MAY_SATTR, "SATTR" }, \ + { NFSD_MAY_TRUNC, "TRUNC" }, \ + { NFSD_MAY_LOCK, "LOCK" }, \ + { NFSD_MAY_OWNER_OVERRIDE, "OWNER_OVERRIDE" }, \ + { NFSD_MAY_LOCAL_ACCESS, "LOCAL_ACCESS" }, \ + { NFSD_MAY_BYPASS_GSS_ON_ROOT, "BYPASS_GSS_ON_ROOT" }, \ + { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ + { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ + { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) + TRACE_EVENT(nfsd_compound, TP_PROTO(const struct svc_rqst *rqst, u32 args_opcnt), @@ -392,6 +408,9 @@ TRACE_EVENT(nfsd_clid_inuse_err, __entry->cl_boot, __entry->cl_id) ) +/* + * from fs/nfsd/filecache.h + */ TRACE_DEFINE_ENUM(NFSD_FILE_HASHED); TRACE_DEFINE_ENUM(NFSD_FILE_PENDING); TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ); @@ -406,13 +425,6 @@ TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED); { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) -/* FIXME: This should probably be fleshed out in the future. */ -#define show_nf_may(val) \ - __print_flags(val, "|", \ - { NFSD_MAY_READ, "READ" }, \ - { NFSD_MAY_WRITE, "WRITE" }, \ - { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }) - DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), TP_ARGS(nf), @@ -437,7 +449,7 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nf_may(__entry->nf_may), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file) ) @@ -463,10 +475,10 @@ TRACE_EVENT(nfsd_file_acquire, __field(u32, xid) __field(unsigned int, hash) __field(void *, inode) - __field(unsigned int, may_flags) + __field(unsigned long, may_flags) __field(int, nf_ref) __field(unsigned long, nf_flags) - __field(unsigned char, nf_may) + __field(unsigned long, nf_may) __field(struct file *, nf_file) __field(u32, status) ), @@ -485,10 +497,10 @@ TRACE_EVENT(nfsd_file_acquire, TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u", __entry->xid, __entry->hash, __entry->inode, - show_nf_may(__entry->may_flags), __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nf_may(__entry->nf_may), __entry->nf_file, - __entry->status) + show_nfsd_may_flags(__entry->may_flags), + __entry->nf_ref, show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file, __entry->status) ); DECLARE_EVENT_CLASS(nfsd_file_search_class, From a7b8e883cef762e79baa3c407a3005148e28ab5b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 4 Sep 2020 15:06:26 -0400 Subject: [PATCH 0305/1565] NFSD: Remove extra "0x" in tracepoint format specifier [ Upstream commit 3a90e1dff16afdae6e1c918bfaff24f4d0f84869 ] Clean up: %p adds its own 0x already. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 7bb1c398daa5..9239d97b682c 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -444,7 +444,7 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p", + TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", __entry->nf_hashval, __entry->nf_inode, __entry->nf_ref, @@ -495,7 +495,7 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ), - TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u", + TP_printk("xid=0x%x hash=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", __entry->xid, __entry->hash, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -516,7 +516,7 @@ DECLARE_EVENT_CLASS(nfsd_file_search_class, __entry->hash = hash; __entry->found = found; ), - TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash, + TP_printk("hash=0x%x inode=%p found=%d", __entry->hash, __entry->inode, __entry->found) ); @@ -544,7 +544,7 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->mode = inode->i_mode; __entry->mask = mask; ), - TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, + TP_printk("inode=%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, __entry->nlink, __entry->mode, __entry->mask) ); From 9393f1628f9abdf6c78f85cf8c1a8fcc9361e6ef Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 27 Aug 2020 16:09:53 -0400 Subject: [PATCH 0306/1565] NFSD: Add SPDX header for fs/nfsd/trace.c [ Upstream commit f45a444cfe582b85af937a30d35d68d9a84399dd ] Clean up. The file was contributed in 2014 by Christoph Hellwig in commit 31ef83dc0538 ("nfsd: add trace events"). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c index 90967466a1e5..f008b95ceec2 100644 --- a/fs/nfsd/trace.c +++ b/fs/nfsd/trace.c @@ -1,3 +1,4 @@ +// SPDX-License-Identifier: GPL-2.0 #define CREATE_TRACE_POINTS #include "trace.h" From 164937edca64cb63600b6bc452bf9673359fee4e Mon Sep 17 00:00:00 2001 From: Huang Guobin Date: Wed, 25 Nov 2020 03:39:33 -0500 Subject: [PATCH 0307/1565] nfsd: Fix error return code in nfsd_file_cache_init() [ Upstream commit 231307df246eb29f30092836524ebb1fcb8f5b25 ] Fix to return PTR_ERR() error code from the error handling case instead of 0 in function nfsd_file_cache_init(), as done elsewhere in this function. Fixes: 65294c1f2c5e7("nfsd: add a new struct file caching facility to nfsd") Signed-off-by: Huang Guobin Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e30e1ddc1ace..d0748ff04b92 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -684,6 +684,7 @@ nfsd_file_cache_init(void) if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); + ret = PTR_ERR(nfsd_file_fsnotify_group); nfsd_file_fsnotify_group = NULL; goto out_notifier; } From 2f46cc814106962a5b2a674cd9ff576b7dd00460 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 11 Nov 2020 15:52:47 -0500 Subject: [PATCH 0308/1565] SUNRPC: Add xdr_set_scratch_page() and xdr_reset_scratch_buffer() [ Upstream commit 0ae4c3e8a64ace1b8d7de033b0751afe43024416 ] Clean up: De-duplicate some frequently-used code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/blocklayout/blocklayout.c | 2 +- fs/nfs/blocklayout/dev.c | 2 +- fs/nfs/dir.c | 2 +- fs/nfs/filelayout/filelayout.c | 2 +- fs/nfs/filelayout/filelayoutdev.c | 2 +- fs/nfs/flexfilelayout/flexfilelayout.c | 2 +- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +- fs/nfs/nfs42xdr.c | 2 +- fs/nfs/nfs4xdr.c | 6 ++-- fs/nfsd/nfs4proc.c | 2 +- include/linux/sunrpc/xdr.h | 44 ++++++++++++++++++++++- net/sunrpc/auth_gss/gss_rpc_xdr.c | 2 +- net/sunrpc/xdr.c | 28 +++------------ 13 files changed, 59 insertions(+), 39 deletions(-) diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 73000aa2d220..a9e563145e0c 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -699,7 +699,7 @@ bl_alloc_lseg(struct pnfs_layout_hdr *lo, struct nfs4_layoutget_res *lgr, xdr_init_decode_pages(&xdr, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&xdr, scratch); status = -EIO; p = xdr_inline_decode(&xdr, 4); diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index 6e3a14fdff9c..16412d6636e8 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -510,7 +510,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out; xdr_init_decode_pages(&xdr, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&xdr, scratch); p = xdr_inline_decode(&xdr, sizeof(__be32)); if (!p) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 9f88ca7b2001..935029632d5f 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -576,7 +576,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en goto out_nopages; xdr_init_decode_pages(&stream, &buf, xdr_pages, buflen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch); do { if (entry->label) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index deecfb50dd7e..45eec08ec904 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -666,7 +666,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo, return -ENOMEM; xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch); /* 20 = ufl_util (4), first_stripe_index (4), pattern_offset (8), * num_fh (4) */ diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c index d913e818858f..86c3f7e69ec4 100644 --- a/fs/nfs/filelayout/filelayoutdev.c +++ b/fs/nfs/filelayout/filelayoutdev.c @@ -82,7 +82,7 @@ nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out_err; xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch); /* Get the stripe count (number of stripe index) */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index e4f2820ba5a5..f2ae271fe7ec 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -378,7 +378,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh, xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch); /* stripe unit and mirror_array_cnt */ rc = -EIO; diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c index 1f12297109b4..bfa7202ca7be 100644 --- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c @@ -69,7 +69,7 @@ nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, INIT_LIST_HEAD(&dsaddrs); xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(&stream, scratch); /* multipath count */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c index f2248d9d4db5..df5bee2f505c 100644 --- a/fs/nfs/nfs42xdr.c +++ b/fs/nfs/nfs42xdr.c @@ -1536,7 +1536,7 @@ static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp, struct compound_hdr hdr; int status; - xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE); + xdr_set_scratch_page(xdr, res->scratch); status = decode_compound_hdr(xdr, &hdr); if (status) diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index f1e599553f2b..4e5c6cb770ad 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -6404,10 +6404,8 @@ nfs4_xdr_dec_getacl(struct rpc_rqst *rqstp, struct xdr_stream *xdr, struct compound_hdr hdr; int status; - if (res->acl_scratch != NULL) { - void *p = page_address(res->acl_scratch); - xdr_set_scratch_buffer(xdr, p, PAGE_SIZE); - } + if (res->acl_scratch != NULL) + xdr_set_scratch_page(xdr, res->acl_scratch); status = decode_compound_hdr(xdr, &hdr); if (status) goto out; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e84996c3867c..c01b6f7462fc 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2266,7 +2266,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; /* Tail and page_len should be zero at this point: */ buf->len = buf->head[0].iov_len; - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); xdr->page_ptr = buf->pages - 1; buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages) - rqstp->rq_auth_slack; diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 6d9d1520612b..0c8cab6210b3 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -246,7 +246,6 @@ extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); extern void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, struct page **pages, unsigned int len); -extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen); extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes); extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); @@ -254,6 +253,49 @@ extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned in extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint32_t); extern uint64_t xdr_expand_hole(struct xdr_stream *, uint64_t, uint64_t); +/** + * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. + * @xdr: pointer to xdr_stream struct + * @buf: pointer to an empty buffer + * @buflen: size of 'buf' + * + * The scratch buffer is used when decoding from an array of pages. + * If an xdr_inline_decode() call spans across page boundaries, then + * we copy the data into the scratch buffer in order to allow linear + * access. + */ +static inline void +xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) +{ + xdr->scratch.iov_base = buf; + xdr->scratch.iov_len = buflen; +} + +/** + * xdr_set_scratch_page - Attach a scratch buffer for decoding data + * @xdr: pointer to xdr_stream struct + * @page: an anonymous page + * + * See xdr_set_scratch_buffer(). + */ +static inline void +xdr_set_scratch_page(struct xdr_stream *xdr, struct page *page) +{ + xdr_set_scratch_buffer(xdr, page_address(page), PAGE_SIZE); +} + +/** + * xdr_reset_scratch_buffer - Clear scratch buffer information + * @xdr: pointer to xdr_stream struct + * + * See xdr_set_scratch_buffer(). + */ +static inline void +xdr_reset_scratch_buffer(struct xdr_stream *xdr) +{ + xdr_set_scratch_buffer(xdr, NULL, 0); +} + /** * xdr_stream_remaining - Return the number of bytes remaining in the stream * @xdr: pointer to struct xdr_stream diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c index e265b8d38aa1..a857fc99431c 100644 --- a/net/sunrpc/auth_gss/gss_rpc_xdr.c +++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c @@ -800,7 +800,7 @@ int gssx_dec_accept_sec_context(struct rpc_rqst *rqstp, scratch = alloc_page(GFP_KERNEL); if (!scratch) return -ENOMEM; - xdr_set_scratch_buffer(xdr, page_address(scratch), PAGE_SIZE); + xdr_set_scratch_page(xdr, scratch); /* res->status */ err = gssx_dec_status(xdr, &res->status); diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index d84bb5037bb5..02adc5c7f034 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -669,7 +669,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct kvec *iov = buf->head; int scratch_len = buf->buflen - buf->page_len - buf->tail[0].iov_len; - xdr_set_scratch_buffer(xdr, NULL, 0); + xdr_reset_scratch_buffer(xdr); BUG_ON(scratch_len < 0); xdr->buf = buf; xdr->iov = iov; @@ -713,7 +713,7 @@ inline void xdr_commit_encode(struct xdr_stream *xdr) page = page_address(*xdr->page_ptr); memcpy(xdr->scratch.iov_base, page, shift); memmove(page, page + shift, (void *)xdr->p - page); - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); } EXPORT_SYMBOL_GPL(xdr_commit_encode); @@ -743,8 +743,7 @@ static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, * the "scratch" iov to track any temporarily unused fragment of * space at the end of the previous buffer: */ - xdr->scratch.iov_base = xdr->p; - xdr->scratch.iov_len = frag1bytes; + xdr_set_scratch_buffer(xdr, xdr->p, frag1bytes); p = page_address(*xdr->page_ptr); /* * Note this is where the next encode will start after we've @@ -1056,8 +1055,7 @@ void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst) { xdr->buf = buf; - xdr->scratch.iov_base = NULL; - xdr->scratch.iov_len = 0; + xdr_reset_scratch_buffer(xdr); xdr->nwords = XDR_QUADLEN(buf->len); if (buf->head[0].iov_len != 0) xdr_set_iov(xdr, buf->head, buf->len); @@ -1105,24 +1103,6 @@ static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes) return p; } -/** - * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. - * @xdr: pointer to xdr_stream struct - * @buf: pointer to an empty buffer - * @buflen: size of 'buf' - * - * The scratch buffer is used when decoding from an array of pages. - * If an xdr_inline_decode() call spans across page boundaries, then - * we copy the data into the scratch buffer in order to allow linear - * access. - */ -void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) -{ - xdr->scratch.iov_base = buf; - xdr->scratch.iov_len = buflen; -} -EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer); - static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes) { __be32 *p; From 79e4e0d489c8e72b9efa388e504a036eec1550c6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 5 Nov 2020 11:19:42 -0500 Subject: [PATCH 0309/1565] SUNRPC: Prepare for xdr_stream-style decoding on the server-side [ Upstream commit 5191955d6fc65e6d4efe8f4f10a6028298f57281 ] A "permanent" struct xdr_stream is allocated in struct svc_rqst so that it is usable by all server-side decoders. A per-rqst scratch buffer is also allocated to handle decoding XDR data items that cross page boundaries. To demonstrate how it will be used, add the first call site for the new svcxdr_init_decode() API. As an additional part of the overall conversion, add symbolic constants for successful and failed XDR operations. Returning "0" is overloaded. Sometimes it means something failed, but sometimes it means success. To make it more clear when XDR decoding functions succeed or fail, introduce symbolic constants. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 2 ++ include/linux/sunrpc/svc.h | 16 ++++++++++++++++ net/sunrpc/svc.c | 5 +++++ 3 files changed, 23 insertions(+) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2e61a565cdbd..311b287c5024 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1052,6 +1052,8 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * (necessary in the NFSv4.0 compound case) */ rqstp->rq_cachetype = proc->pc_cachetype; + + svcxdr_init_decode(rqstp); if (!proc->pc_decode(rqstp, argv->iov_base)) goto out_decode_err; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c220b734fa69..34c2a69820e9 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -247,6 +247,8 @@ struct svc_rqst { size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; + struct xdr_stream rq_arg_stream; + struct page *rq_scratch_page; struct xdr_buf rq_res; struct page *rq_pages[RPCSVC_MAXPAGES + 1]; struct page * *rq_respages; /* points into rq_pages */ @@ -557,4 +559,18 @@ static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space) svc_reserve(rqstp, space + rqstp->rq_auth_slack); } +/** + * svcxdr_init_decode - Prepare an xdr_stream for svc Call decoding + * @rqstp: controlling server RPC transaction context + * + */ +static inline void svcxdr_init_decode(struct svc_rqst *rqstp) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct kvec *argv = rqstp->rq_arg.head; + + xdr_init_decode(xdr, &rqstp->rq_arg, argv->iov_base, NULL); + xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); +} + #endif /* SUNRPC_SVC_H */ diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index e4e4e203ecda..73e02ba4ed53 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -614,6 +614,10 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) rqstp->rq_server = serv; rqstp->rq_pool = pool; + rqstp->rq_scratch_page = alloc_pages_node(node, GFP_KERNEL, 0); + if (!rqstp->rq_scratch_page) + goto out_enomem; + rqstp->rq_argp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_argp) goto out_enomem; @@ -846,6 +850,7 @@ void svc_rqst_free(struct svc_rqst *rqstp) { svc_release_buffer(rqstp); + put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data); From fbffaddb766b9fe4ab461c751e79b06334ddac22 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 5 Nov 2020 14:48:29 -0500 Subject: [PATCH 0310/1565] NFSD: Add common helpers to decode void args and encode void results [ Upstream commit 788f7183fba86b46074c16e7d57ea09302badff4 ] Start off the conversion to xdr_stream by de-duplicating the functions that decode void arguments and encode void results. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 21 ++++----------------- fs/nfsd/nfs3acl.c | 8 ++++---- fs/nfsd/nfs3proc.c | 10 ++++------ fs/nfsd/nfs3xdr.c | 11 ----------- fs/nfsd/nfs4proc.c | 11 ++++------- fs/nfsd/nfs4xdr.c | 12 ------------ fs/nfsd/nfsd.h | 8 ++++++++ fs/nfsd/nfsproc.c | 25 ++++++++++++------------- fs/nfsd/nfssvc.c | 28 ++++++++++++++++++++++++++++ fs/nfsd/nfsxdr.c | 10 ---------- fs/nfsd/xdr.h | 2 -- fs/nfsd/xdr3.h | 2 -- fs/nfsd/xdr4.h | 2 -- 13 files changed, 64 insertions(+), 86 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 6a900f770dd2..b0f66604532a 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -185,10 +185,6 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) /* * XDR decode functions */ -static int nfsaclsvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -} static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { @@ -255,15 +251,6 @@ static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) * XDR encode functions */ -/* - * There must be an encoding function for void results so svc_process - * will work properly. - */ -static int nfsaclsvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - /* GETACL */ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { @@ -378,10 +365,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_NULL] = { .pc_func = nfsacld_proc_null, - .pc_decode = nfsaclsvc_decode_voidarg, - .pc_encode = nfsaclsvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidargs), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 34a394e50e1d..7c30876a31a1 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -245,10 +245,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures3[3] = { [ACLPROC3_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfs3svc_decode_voidarg, - .pc_encode = nfs3svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidargs), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index eba84417406c..3257233d1a65 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -697,8 +697,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) #define nfsd3_attrstatres nfsd3_attrstat #define nfsd3_wccstatres nfsd3_attrstat #define nfsd3_createres nfsd3_diropres -#define nfsd3_voidres nfsd3_voidargs -struct nfsd3_voidargs { int dummy; }; #define ST 1 /* status*/ #define FH 17 /* filehandle with length */ @@ -709,10 +707,10 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfs3svc_decode_voidarg, - .pc_encode = nfs3svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd3_voidargs), - .pc_ressize = sizeof(struct nfsd3_voidres), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, }, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 27b24823f7c4..0d75d201db1b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -304,11 +304,6 @@ void fill_post_wcc(struct svc_fh *fhp) /* * XDR decode functions */ -int -nfs3svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -} int nfs3svc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) @@ -642,12 +637,6 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) * XDR encode functions */ -int -nfs3svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - /* GETATTR */ int nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c01b6f7462fc..55a46e7c152b 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3285,16 +3285,13 @@ static const char *nfsd4_op_name(unsigned opnum) return "unknown_operation"; } -#define nfsd4_voidres nfsd4_voidargs -struct nfsd4_voidargs { int dummy; }; - static const struct svc_procedure nfsd_procedures4[2] = { [NFSPROC4_NULL] = { .pc_func = nfsd4_proc_null, - .pc_decode = nfs4svc_decode_voidarg, - .pc_encode = nfs4svc_encode_voidres, - .pc_argsize = sizeof(struct nfsd4_voidargs), - .pc_ressize = sizeof(struct nfsd4_voidres), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, }, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fd9107332a20..c176d0090f7c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5288,12 +5288,6 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op) p = xdr_encode_opaque_fixed(p, rp->rp_buf, rp->rp_buflen); } -int -nfs4svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} - void nfsd4_release_compoundargs(struct svc_rqst *rqstp) { struct nfsd4_compoundargs *args = rqstp->rq_argp; @@ -5311,12 +5305,6 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } } -int -nfs4svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) -{ - return 1; -} - int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 4362d295ed34..3eaa81e001f9 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -73,6 +73,14 @@ extern unsigned long nfsd_drc_mem_used; extern const struct seq_operations nfs_exports_op; +/* + * Common void argument and result helpers + */ +struct nfsd_voidargs { }; +struct nfsd_voidres { }; +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); + /* * Function prototypes. */ diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index bbd01e8397f6..dbd8d3604653 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -609,7 +609,6 @@ nfsd_proc_statfs(struct svc_rqst *rqstp) * NFSv2 Server procedures. * Only the results of non-idempotent operations are cached. */ -struct nfsd_void { int dummy; }; #define ST 1 /* status */ #define FH 8 /* filehandle */ @@ -618,10 +617,10 @@ struct nfsd_void { int dummy; }; static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_NULL] = { .pc_func = nfsd_proc_null, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, @@ -647,10 +646,10 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_ROOT] = { .pc_func = nfsd_proc_root, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, @@ -685,10 +684,10 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_WRITECACHE] = { .pc_func = nfsd_proc_writecache, - .pc_decode = nfssvc_decode_void, - .pc_encode = nfssvc_encode_void, - .pc_argsize = sizeof(struct nfsd_void), - .pc_ressize = sizeof(struct nfsd_void), + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, }, diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 311b287c5024..c81d49b07971 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1107,6 +1107,34 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) return 1; } +/** + * nfssvc_decode_voidarg - Decode void arguments + * @rqstp: Server RPC transaction context + * @p: buffer containing arguments to decode + * + * Return values: + * %0: Arguments were not valid + * %1: Decoding was successful + */ +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + +/** + * nfssvc_encode_voidres - Encode void results + * @rqstp: Server RPC transaction context + * @p: buffer in which to encode results + * + * Return values: + * %0: Local error while encoding + * %1: Encoding was successful + */ +int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} + int nfsd_pool_stats_open(struct inode *inode, struct file *file) { int ret; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 9e00a902113e..7aa6e8aca2c1 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -192,11 +192,6 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f /* * XDR decode functions */ -int -nfssvc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} int nfssvc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) @@ -423,11 +418,6 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) /* * XDR encode functions */ -int -nfssvc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} int nfssvc_encode_stat(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b8cc6a4b2e0e..edd87688ff86 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -144,7 +144,6 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore) -int nfssvc_decode_void(struct svc_rqst *, __be32 *); int nfssvc_decode_fhandle(struct svc_rqst *, __be32 *); int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); @@ -156,7 +155,6 @@ int nfssvc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfssvc_encode_void(struct svc_rqst *, __be32 *); int nfssvc_encode_stat(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index ae6fa6c9cb46..456fcd7a1038 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -273,7 +273,6 @@ union nfsd3_xdrstore { #define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore) -int nfs3svc_decode_voidarg(struct svc_rqst *, __be32 *); int nfs3svc_decode_fhandle(struct svc_rqst *, __be32 *); int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); @@ -290,7 +289,6 @@ int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); -int nfs3svc_encode_voidres(struct svc_rqst *, __be32 *); int nfs3svc_encode_attrstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 679d40af1bbb..37f89ad5e992 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -781,8 +781,6 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_voidarg(struct svc_rqst *, __be32 *); -int nfs4svc_encode_voidres(struct svc_rqst *, __be32 *); int nfs4svc_decode_compoundargs(struct svc_rqst *, __be32 *); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); From 568f9ca73d6ee19c2602dab10f5ff623c3ae7fda Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 13:00:29 -0400 Subject: [PATCH 0311/1565] NFSD: Add tracepoints in nfsd_dispatch() [ Upstream commit 0dfdad1c1d1b77b9b085f4da390464dd0ac5647a ] For troubleshooting purposes, record GARBAGE_ARGS and CANT_ENCODE failures. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 17 ++++---------- fs/nfsd/trace.h | 60 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index c81d49b07971..3fb9607d67a3 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -29,6 +29,8 @@ #include "netns.h" #include "filecache.h" +#include "trace.h" + #define NFSDDBG_FACILITY NFSDDBG_SVC bool inter_copy_offload_enable; @@ -1041,11 +1043,8 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p; - dprintk("nfsd_dispatch: vers %d proc %d\n", - rqstp->rq_vers, rqstp->rq_proc); - if (nfs_request_too_big(rqstp, proc)) - goto out_too_large; + goto out_decode_err; /* * Give the xdr decoder a chance to change this if it wants @@ -1084,24 +1083,18 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) out_cached_reply: return 1; -out_too_large: - dprintk("nfsd: NFSv%d argument too large\n", rqstp->rq_vers); - *statp = rpc_garbage_args; - return 1; - out_decode_err: - dprintk("nfsd: failed to decode arguments!\n"); + trace_nfsd_garbage_args_err(rqstp); *statp = rpc_garbage_args; return 1; out_update_drop: - dprintk("nfsd: Dropping request; may be revisited later\n"); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); out_dropit: return 0; out_encode_err: - dprintk("nfsd: failed to encode result!\n"); + trace_nfsd_cant_encode_err(rqstp); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); *statp = rpc_system_err; return 1; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 9239d97b682c..dbbda3e09eed 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -12,6 +12,66 @@ #include "export.h" #include "nfsfh.h" +#define NFSD_TRACE_PROC_ARG_FIELDS \ + __field(unsigned int, netns_ino) \ + __field(u32, xid) \ + __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ + __array(unsigned char, client, sizeof(struct sockaddr_in6)) + +#define NFSD_TRACE_PROC_ARG_ASSIGNMENTS \ + do { \ + __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ + __entry->xid = be32_to_cpu(rqstp->rq_xid); \ + memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ + rqstp->rq_xprt->xpt_locallen); \ + memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ + rqstp->rq_xprt->xpt_remotelen); \ + } while (0); + +TRACE_EVENT(nfsd_garbage_args_err, + TP_PROTO( + const struct svc_rqst *rqstp + ), + TP_ARGS(rqstp), + TP_STRUCT__entry( + NFSD_TRACE_PROC_ARG_FIELDS + + __field(u32, vers) + __field(u32, proc) + ), + TP_fast_assign( + NFSD_TRACE_PROC_ARG_ASSIGNMENTS + + __entry->vers = rqstp->rq_vers; + __entry->proc = rqstp->rq_proc; + ), + TP_printk("xid=0x%08x vers=%u proc=%u", + __entry->xid, __entry->vers, __entry->proc + ) +); + +TRACE_EVENT(nfsd_cant_encode_err, + TP_PROTO( + const struct svc_rqst *rqstp + ), + TP_ARGS(rqstp), + TP_STRUCT__entry( + NFSD_TRACE_PROC_ARG_FIELDS + + __field(u32, vers) + __field(u32, proc) + ), + TP_fast_assign( + NFSD_TRACE_PROC_ARG_ASSIGNMENTS + + __entry->vers = rqstp->rq_vers; + __entry->proc = rqstp->rq_proc; + ), + TP_printk("xid=0x%08x vers=%u proc=%u", + __entry->xid, __entry->vers, __entry->proc + ) +); + #define show_nfsd_may_flags(x) \ __print_flags(x, "|", \ { NFSD_MAY_EXEC, "EXEC" }, \ From 2994c8888472ac1fe1ce17895786eb76f2483d9b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 21 Nov 2020 11:36:42 -0500 Subject: [PATCH 0312/1565] NFSD: Add tracepoints in nfsd4_decode/encode_compound() [ Upstream commit 08281341be8ebc97ee47999812bcf411942baa1e ] For troubleshooting purposes, record failures to decode NFSv4 operation arguments and encode operation results. trace_nfsd_compound_decode_err() replaces the dprintk() call sites that are embedded in READ_* macros that are about to be removed. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 13 +++++++-- fs/nfsd/trace.h | 68 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 79 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c176d0090f7c..101ada3636b7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -54,6 +54,8 @@ #include "pnfs.h" #include "filecache.h" +#include "trace.h" + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include #endif @@ -2248,9 +2250,14 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) READ_BUF(4); op->opnum = be32_to_cpup(p++); - if (nfsd4_opnum_in_range(argp, op)) + if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); - else { + if (op->status != nfs_ok) + trace_nfsd_compound_decode_err(argp->rqstp, + argp->opcnt, i, + op->opnum, + op->status); + } else { op->opnum = OP_ILLEGAL; op->status = nfserr_op_illegal; } @@ -5220,6 +5227,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) !nfsd4_enc_ops[op->opnum]); encoder = nfsd4_enc_ops[op->opnum]; op->status = encoder(resp, op->status, &op->u); + if (op->status) + trace_nfsd_compound_encode_err(rqstp, op->opnum, op->status); if (opdesc && opdesc->op_release) opdesc->op_release(&op->u); xdr_commit_encode(xdr); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index dbbda3e09eed..2bc2a888f7fa 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -28,6 +28,24 @@ rqstp->rq_xprt->xpt_remotelen); \ } while (0); +#define NFSD_TRACE_PROC_RES_FIELDS \ + __field(unsigned int, netns_ino) \ + __field(u32, xid) \ + __field(unsigned long, status) \ + __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ + __array(unsigned char, client, sizeof(struct sockaddr_in6)) + +#define NFSD_TRACE_PROC_RES_ASSIGNMENTS(error) \ + do { \ + __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ + __entry->xid = be32_to_cpu(rqstp->rq_xid); \ + __entry->status = be32_to_cpu(error); \ + memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ + rqstp->rq_xprt->xpt_locallen); \ + memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ + rqstp->rq_xprt->xpt_remotelen); \ + } while (0); + TRACE_EVENT(nfsd_garbage_args_err, TP_PROTO( const struct svc_rqst *rqstp @@ -127,6 +145,56 @@ TRACE_EVENT(nfsd_compound_status, __get_str(name), __entry->status) ) +TRACE_EVENT(nfsd_compound_decode_err, + TP_PROTO( + const struct svc_rqst *rqstp, + u32 args_opcnt, + u32 resp_opcnt, + u32 opnum, + __be32 status + ), + TP_ARGS(rqstp, args_opcnt, resp_opcnt, opnum, status), + TP_STRUCT__entry( + NFSD_TRACE_PROC_RES_FIELDS + + __field(u32, args_opcnt) + __field(u32, resp_opcnt) + __field(u32, opnum) + ), + TP_fast_assign( + NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) + + __entry->args_opcnt = args_opcnt; + __entry->resp_opcnt = resp_opcnt; + __entry->opnum = opnum; + ), + TP_printk("op=%u/%u opnum=%u status=%lu", + __entry->resp_opcnt, __entry->args_opcnt, + __entry->opnum, __entry->status) +); + +TRACE_EVENT(nfsd_compound_encode_err, + TP_PROTO( + const struct svc_rqst *rqstp, + u32 opnum, + __be32 status + ), + TP_ARGS(rqstp, opnum, status), + TP_STRUCT__entry( + NFSD_TRACE_PROC_RES_FIELDS + + __field(u32, opnum) + ), + TP_fast_assign( + NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) + + __entry->opnum = opnum; + ), + TP_printk("opnum=%u status=%lu", + __entry->opnum, __entry->status) +); + + DECLARE_EVENT_CLASS(nfsd_fh_err_class, TP_PROTO(struct svc_rqst *rqstp, struct svc_fh *fhp, From bbb0a710a2c7a0b171e7a0d6fbdd6d5a2acd003b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 11:54:23 -0500 Subject: [PATCH 0313/1565] NFSD: Replace the internals of the READ_BUF() macro [ Upstream commit c1346a1216ab5cb04a265380ac9035d91b16b6d5 ] Convert the READ_BUF macro in nfs4xdr.c from open code to instead use the new xdr_stream-style decoders already in use by the encode side (and by the in-kernel NFS client implementation). Once this conversion is done, each individual NFSv4 argument decoder can be independently cleaned up to replace these macros with C code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 +- fs/nfsd/nfs4xdr.c | 181 ++++++------------------------------- fs/nfsd/xdr4.h | 10 +- include/linux/sunrpc/xdr.h | 2 + net/sunrpc/xdr.c | 45 +++++++++ 5 files changed, 77 insertions(+), 165 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 55a46e7c152b..95545a61bfc7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1024,8 +1024,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_how_written = write->wr_stable_how; - nvecs = svc_fill_write_vector(rqstp, write->wr_pagelist, - &write->wr_head, write->wr_buflen); + nvecs = svc_fill_write_vector(rqstp, write->wr_payload.pages, + write->wr_payload.head, write->wr_buflen); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec)); status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 101ada3636b7..8c694844f0ef 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -131,90 +131,13 @@ xdr_error: \ memcpy((x), p, nbytes); \ p += XDR_QUADLEN(nbytes); \ } while (0) - -/* READ_BUF, read_buf(): nbytes must be <= PAGE_SIZE */ -#define READ_BUF(nbytes) do { \ - if (nbytes <= (u32)((char *)argp->end - (char *)argp->p)) { \ - p = argp->p; \ - argp->p += XDR_QUADLEN(nbytes); \ - } else if (!(p = read_buf(argp, nbytes))) { \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - goto xdr_error; \ - } \ -} while (0) - -static void next_decode_page(struct nfsd4_compoundargs *argp) -{ - argp->p = page_address(argp->pagelist[0]); - argp->pagelist++; - if (argp->pagelen < PAGE_SIZE) { - argp->end = argp->p + XDR_QUADLEN(argp->pagelen); - argp->pagelen = 0; - } else { - argp->end = argp->p + (PAGE_SIZE>>2); - argp->pagelen -= PAGE_SIZE; - } -} - -static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes) -{ - /* We want more bytes than seem to be available. - * Maybe we need a new page, maybe we have just run out - */ - unsigned int avail = (char *)argp->end - (char *)argp->p; - __be32 *p; - - if (argp->pagelen == 0) { - struct kvec *vec = &argp->rqstp->rq_arg.tail[0]; - - if (!argp->tail) { - argp->tail = true; - avail = vec->iov_len; - argp->p = vec->iov_base; - argp->end = vec->iov_base + avail; - } - - if (avail < nbytes) - return NULL; - - p = argp->p; - argp->p += XDR_QUADLEN(nbytes); - return p; - } - - if (avail + argp->pagelen < nbytes) - return NULL; - if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */ - return NULL; - /* ok, we can do it with the current plus the next page */ - if (nbytes <= sizeof(argp->tmp)) - p = argp->tmp; - else { - kfree(argp->tmpp); - p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL); - if (!p) - return NULL; - - } - /* - * The following memcpy is safe because read_buf is always - * called with nbytes > avail, and the two cases above both - * guarantee p points to at least nbytes bytes. - */ - memcpy(p, argp->p, avail); - next_decode_page(argp); - memcpy(((char*)p)+avail, argp->p, (nbytes - avail)); - argp->p += XDR_QUADLEN(nbytes - avail); - return p; -} - -static unsigned int compoundargs_bytes_left(struct nfsd4_compoundargs *argp) -{ - unsigned int this = (char *)argp->end - (char *)argp->p; - - return this + argp->pagelen; -} +#define READ_BUF(nbytes) \ + do { \ + p = xdr_inline_decode(argp->xdr,\ + nbytes); \ + if (!p) \ + goto xdr_error; \ + } while (0) static int zero_clientid(clientid_t *clid) { @@ -261,44 +184,6 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; } -static __be32 -svcxdr_construct_vector(struct nfsd4_compoundargs *argp, struct kvec *head, - struct page ***pagelist, u32 buflen) -{ - int avail; - int len; - int pages; - - /* Sorry .. no magic macros for this.. * - * READ_BUF(write->wr_buflen); - * SAVEMEM(write->wr_buf, write->wr_buflen); - */ - avail = (char *)argp->end - (char *)argp->p; - if (avail + argp->pagelen < buflen) { - dprintk("NFSD: xdr error (%s:%d)\n", - __FILE__, __LINE__); - return nfserr_bad_xdr; - } - head->iov_base = argp->p; - head->iov_len = avail; - *pagelist = argp->pagelist; - - len = XDR_QUADLEN(buflen) << 2; - if (len >= avail) { - len -= avail; - - pages = len >> PAGE_SHIFT; - argp->pagelist += pages; - argp->pagelen -= pages * PAGE_SIZE; - len -= pages * PAGE_SIZE; - - next_decode_page(argp); - } - argp->p += XDR_QUADLEN(len); - - return 0; -} - /** * savemem - duplicate a chunk of memory for later processing * @argp: NFSv4 compound argument structure to be freed with @@ -398,7 +283,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, READ_BUF(4); len += 4; nace = be32_to_cpup(p++); - if (nace > compoundargs_bytes_left(argp)/20) + if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) /* * Even with 4-byte names there wouldn't be * space for that many aces; something fishy is @@ -929,7 +814,7 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) { - __be32 *p; + DECODE_HEAD; READ_BUF(4); o->len = be32_to_cpup(p++); @@ -939,9 +824,8 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne READ_BUF(o->len); SAVEMEM(o->data, o->len); - return nfs_ok; -xdr_error: - return nfserr_bad_xdr; + + DECODE_TAIL; } static __be32 @@ -1319,10 +1203,8 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) goto xdr_error; write->wr_buflen = be32_to_cpup(p++); - status = svcxdr_construct_vector(argp, &write->wr_head, - &write->wr_pagelist, write->wr_buflen); - if (status) - return status; + if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) + goto xdr_error; DECODE_TAIL; } @@ -1891,13 +1773,14 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) */ /* - * Decode data into buffer. Uses head and pages constructed by - * svcxdr_construct_vector. + * Decode data into buffer. */ static __be32 -nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct kvec *head, - struct page **pages, char **bufp, u32 buflen) +nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, + char **bufp, u32 buflen) { + struct page **pages = xdr->pages; + struct kvec *head = xdr->head; char *tmp, *dp; u32 len; @@ -2012,8 +1895,6 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, { DECODE_HEAD; u32 flags, maxcount, size; - struct kvec head; - struct page **pagelist; READ_BUF(4); flags = be32_to_cpup(p++); @@ -2036,12 +1917,12 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, setxattr->setxa_len = size; if (size > 0) { - status = svcxdr_construct_vector(argp, &head, &pagelist, size); - if (status) - return status; + struct xdr_buf payload; - status = nfsd4_vbuf_from_vector(argp, &head, pagelist, - &setxattr->setxa_buf, size); + if (!xdr_stream_subsegment(argp->xdr, &payload, size)) + goto xdr_error; + status = nfsd4_vbuf_from_vector(argp, &payload, + &setxattr->setxa_buf, size); } DECODE_TAIL; @@ -5305,8 +5186,6 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) kfree(args->ops); args->ops = args->iops; } - kfree(args->tmpp); - args->tmpp = NULL; while (args->to_free) { struct svcxdr_tmpbuf *tb = args->to_free; args->to_free = tb->next; @@ -5319,19 +5198,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundargs *args = rqstp->rq_argp; - if (rqstp->rq_arg.head[0].iov_len % 4) { - /* client is nuts */ - dprintk("%s: compound not properly padded! (peeraddr=%pISc xid=0x%x)", - __func__, svc_addr(rqstp), be32_to_cpu(rqstp->rq_xid)); - return 0; - } - args->p = p; - args->end = rqstp->rq_arg.head[0].iov_base + rqstp->rq_arg.head[0].iov_len; - args->pagelist = rqstp->rq_arg.pages; - args->pagelen = rqstp->rq_arg.page_len; - args->tail = false; + /* svcxdr_tmp_alloc */ args->tmpp = NULL; args->to_free = NULL; + + args->xdr = &rqstp->rq_arg_stream; args->ops = args->iops; args->rqstp = rqstp; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 37f89ad5e992..0eb13bd603ea 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -419,8 +419,7 @@ struct nfsd4_write { u64 wr_offset; /* request */ u32 wr_stable_how; /* request */ u32 wr_buflen; /* request */ - struct kvec wr_head; - struct page ** wr_pagelist; /* request */ + struct xdr_buf wr_payload; /* request */ u32 wr_bytes_written; /* response */ u32 wr_how_written; /* response */ @@ -696,15 +695,10 @@ struct svcxdr_tmpbuf { struct nfsd4_compoundargs { /* scratch variables for XDR decode */ - __be32 * p; - __be32 * end; - struct page ** pagelist; - int pagelen; - bool tail; __be32 tmp[8]; __be32 * tmpp; + struct xdr_stream *xdr; struct svcxdr_tmpbuf *to_free; - struct svc_rqst *rqstp; u32 taglen; diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 0c8cab6210b3..c03f7bf585c9 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -252,6 +252,8 @@ extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data); extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint32_t); extern uint64_t xdr_expand_hole(struct xdr_stream *, uint64_t, uint64_t); +extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, + unsigned int len); /** * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 02adc5c7f034..722586696fad 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -1412,6 +1412,51 @@ xdr_buf_subsegment(struct xdr_buf *buf, struct xdr_buf *subbuf, } EXPORT_SYMBOL_GPL(xdr_buf_subsegment); +/** + * xdr_stream_subsegment - set @subbuf to a portion of @xdr + * @xdr: an xdr_stream set up for decoding + * @subbuf: the result buffer + * @nbytes: length of @xdr to extract, in bytes + * + * Sets up @subbuf to represent a portion of @xdr. The portion + * starts at the current offset in @xdr, and extends for a length + * of @nbytes. If this is successful, @xdr is advanced to the next + * position following that portion. + * + * Return values: + * %true: @subbuf has been initialized, and @xdr has been advanced. + * %false: a bounds error has occurred + */ +bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, + unsigned int nbytes) +{ + unsigned int remaining, offset, len; + + if (xdr_buf_subsegment(xdr->buf, subbuf, xdr_stream_pos(xdr), nbytes)) + return false; + + if (subbuf->head[0].iov_len) + if (!__xdr_inline_decode(xdr, subbuf->head[0].iov_len)) + return false; + + remaining = subbuf->page_len; + offset = subbuf->page_base; + while (remaining) { + len = min_t(unsigned int, remaining, PAGE_SIZE) - offset; + + if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr)) + return false; + if (!__xdr_inline_decode(xdr, len)) + return false; + + remaining -= len; + offset = 0; + } + + return true; +} +EXPORT_SYMBOL_GPL(xdr_stream_subsegment); + /** * xdr_buf_trim - lop at most "len" bytes off the end of "buf" * @buf: buf to be trimmed From 9921353a52a7d621eb64ccaabaf28cc29be1c765 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:12:27 -0500 Subject: [PATCH 0314/1565] NFSD: Replace READ* macros in nfsd4_decode_access() [ Upstream commit d169a6a9e5fd7f9e4b74e5e5d2e5a4fd0f84ef05 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8c694844f0ef..f3d54af9de92 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -439,17 +439,6 @@ nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) DECODE_TAIL; } -static __be32 -nfsd4_decode_access(struct nfsd4_compoundargs *argp, struct nfsd4_access *access) -{ - DECODE_HEAD; - - READ_BUF(4); - access->ac_req_access = be32_to_cpup(p++); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -531,6 +520,19 @@ static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_ DECODE_TAIL; } +/* + * NFSv4 operation argument decoders + */ + +static __be32 +nfsd4_decode_access(struct nfsd4_compoundargs *argp, + struct nfsd4_access *access) +{ + if (xdr_stream_decode_u32(argp->xdr, &access->ac_req_access) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) { DECODE_HEAD; From c701c0e5a95675b03313487d7d75a5205292db7d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:18:23 -0500 Subject: [PATCH 0315/1565] NFSD: Replace READ* macros in nfsd4_decode_close() [ Upstream commit d3d2f38154571e70d5806b5c5264bf61c101ea15 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f3d54af9de92..ca0247853493 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -439,6 +439,19 @@ nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) DECODE_TAIL; } +static __be32 +nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_STATEID_SIZE); + if (!p) + return nfserr_bad_xdr; + sid->si_generation = be32_to_cpup(p++); + memcpy(&sid->si_opaque, p, sizeof(sid->si_opaque)); + return nfs_ok; +} + static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -559,13 +572,9 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) { - DECODE_HEAD; - - READ_BUF(4); - close->cl_seqid = be32_to_cpup(p++); - return nfsd4_decode_stateid(argp, &close->cl_stateid); - - DECODE_TAIL; + if (xdr_stream_decode_u32(argp->xdr, &close->cl_seqid) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_stateid4(argp, &close->cl_stateid); } From f82c6ad7e2fb55743ef28bfa6466387146b5adb9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:19:51 -0500 Subject: [PATCH 0316/1565] NFSD: Replace READ* macros in nfsd4_decode_commit() [ Upstream commit cbd9abb3706e96563b36af67595707a7054ab693 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 12 +++++------- include/linux/sunrpc/xdr.h | 21 +++++++++++++++++++++ 2 files changed, 26 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ca0247853493..8251b905d547 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -581,13 +581,11 @@ nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) static __be32 nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit) { - DECODE_HEAD; - - READ_BUF(12); - p = xdr_decode_hyper(p, &commit->co_offset); - commit->co_count = be32_to_cpup(p++); - - DECODE_TAIL; + if (xdr_stream_decode_u64(argp->xdr, &commit->co_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) + return nfserr_bad_xdr; + return nfs_ok; } static __be32 diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index c03f7bf585c9..6b1757543747 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -569,6 +569,27 @@ xdr_stream_decode_u32(struct xdr_stream *xdr, __u32 *ptr) return 0; } +/** + * xdr_stream_decode_u64 - Decode a 64-bit integer + * @xdr: pointer to xdr_stream + * @ptr: location to store 64-bit integer + * + * Return values: + * %0 on success + * %-EBADMSG on XDR buffer overflow + */ +static inline ssize_t +xdr_stream_decode_u64(struct xdr_stream *xdr, __u64 *ptr) +{ + const size_t count = sizeof(*ptr); + __be32 *p = xdr_inline_decode(xdr, count); + + if (unlikely(!p)) + return -EBADMSG; + xdr_decode_hyper(p, ptr); + return 0; +} + /** * xdr_stream_decode_opaque_fixed - Decode fixed length opaque xdr data * @xdr: pointer to xdr_stream From 2a8ae039571cec9c3e554615daaa02f06a68429f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:09:13 -0500 Subject: [PATCH 0317/1565] NFSD: Change the way the expected length of a fattr4 is checked [ Upstream commit 081d53fe0b43c47c36d1832b759bf14edde9cdbb ] Because the fattr4 is now managed in an xdr_stream, all that is needed is to store the initial position of the stream before decoding the attribute list. Then the actual length of the list is computed using the final stream position, after decoding is complete. No behavior change is expected. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8251b905d547..de5ac334cb8a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -250,7 +250,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, struct xdr_netobj *label, int *umask) { - int expected_len, len = 0; + unsigned int starting_pos; + u32 attrlist4_count; u32 dummy32; char *buf; @@ -267,12 +268,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return nfserr_attrnotsupp; } - READ_BUF(4); - expected_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &attrlist4_count) < 0) + return nfserr_bad_xdr; + starting_pos = xdr_stream_pos(argp->xdr); if (bmval[0] & FATTR4_WORD0_SIZE) { READ_BUF(8); - len += 8; p = xdr_decode_hyper(p, &iattr->ia_size); iattr->ia_valid |= ATTR_SIZE; } @@ -280,7 +281,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, u32 nace; struct nfs4_ace *ace; - READ_BUF(4); len += 4; + READ_BUF(4); nace = be32_to_cpup(p++); if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) @@ -297,13 +298,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, (*acl)->naces = nace; for (ace = (*acl)->aces; ace < (*acl)->aces + nace; ace++) { - READ_BUF(16); len += 16; + READ_BUF(16); ace->type = be32_to_cpup(p++); ace->flag = be32_to_cpup(p++); ace->access_mask = be32_to_cpup(p++); dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += XDR_QUADLEN(dummy32) << 2; READMEM(buf, dummy32); ace->whotype = nfs4_acl_get_whotype(buf, dummy32); status = nfs_ok; @@ -322,17 +322,14 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { READ_BUF(4); - len += 4; iattr->ia_mode = be32_to_cpup(p++); iattr->ia_mode &= (S_IFMT | S_IALLUGO); iattr->ia_valid |= ATTR_MODE; } if (bmval[1] & FATTR4_WORD1_OWNER) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid))) return status; @@ -340,10 +337,8 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid))) return status; @@ -351,11 +346,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - len += 12; status = nfsd4_decode_time(argp, &iattr->ia_atime); if (status) return status; @@ -370,11 +363,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - len += 12; status = nfsd4_decode_time(argp, &iattr->ia_mtime); if (status) return status; @@ -392,18 +383,14 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (IS_ENABLED(CONFIG_NFSD_V4_SECURITY_LABEL) && bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); /* lfs: we don't use it */ READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); /* pi: we don't use it either */ READ_BUF(4); - len += 4; dummy32 = be32_to_cpup(p++); READ_BUF(dummy32); if (dummy32 > NFS4_MAXLABELLEN) return nfserr_badlabel; - len += (XDR_QUADLEN(dummy32) << 2); READMEM(buf, dummy32); label->len = dummy32; label->data = svcxdr_dupstr(argp, buf, dummy32); @@ -414,15 +401,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (!umask) goto xdr_error; READ_BUF(8); - len += 8; dummy32 = be32_to_cpup(p++); iattr->ia_mode = dummy32 & (S_IFMT | S_IALLUGO); dummy32 = be32_to_cpup(p++); *umask = dummy32 & S_IRWXUGO; iattr->ia_valid |= ATTR_MODE; } - if (len != expected_len) - goto xdr_error; + + /* request sanity: did attrlist4 contain the expected number of words? */ + if (attrlist4_count != xdr_stream_pos(argp->xdr) - starting_pos) + return nfserr_bad_xdr; DECODE_TAIL; } From ee02662724e31d372c4cd755039a8511b57db4a1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:47:16 -0500 Subject: [PATCH 0318/1565] NFSD: Replace READ* macros that decode the fattr4 size attribute [ Upstream commit 2ac1b9b2afbbacf597dbec722b23b6be62e4e41e ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index de5ac334cb8a..5ec0c2dac334 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -273,8 +273,11 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, starting_pos = xdr_stream_pos(argp->xdr); if (bmval[0] & FATTR4_WORD0_SIZE) { - READ_BUF(8); - p = xdr_decode_hyper(p, &iattr->ia_size); + u64 size; + + if (xdr_stream_decode_u64(argp->xdr, &size) < 0) + return nfserr_bad_xdr; + iattr->ia_size = size; iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) { From 3d3690b6620e6de4ff4ea48b99b7df7a234ac4af Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:02:54 -0500 Subject: [PATCH 0319/1565] NFSD: Replace READ* macros that decode the fattr4 acl attribute [ Upstream commit c941a96823cf52e742606b486b81ab346bf111c9 ] Refactor for clarity and to move infrequently-used code out of line. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 107 +++++++++++++++++++++++++++++----------------- 1 file changed, 67 insertions(+), 40 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5ec0c2dac334..0fe57ca0f31a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -245,6 +245,70 @@ nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) DECODE_TAIL; } +static __be32 +nfsd4_decode_nfsace4(struct nfsd4_compoundargs *argp, struct nfs4_ace *ace) +{ + __be32 *p, status; + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &ace->type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &ace->flag) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &ace->access_mask) < 0) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + ace->whotype = nfs4_acl_get_whotype((char *)p, length); + if (ace->whotype != NFS4_ACL_WHO_NAMED) + status = nfs_ok; + else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) + status = nfsd_map_name_to_gid(argp->rqstp, + (char *)p, length, &ace->who_gid); + else + status = nfsd_map_name_to_uid(argp->rqstp, + (char *)p, length, &ace->who_uid); + + return status; +} + +/* A counted array of nfsace4's */ +static noinline __be32 +nfsd4_decode_acl(struct nfsd4_compoundargs *argp, struct nfs4_acl **acl) +{ + struct nfs4_ace *ace; + __be32 status; + u32 count; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + + if (count > xdr_stream_remaining(argp->xdr) / 20) + /* + * Even with 4-byte names there wouldn't be + * space for that many aces; something fishy is + * going on: + */ + return nfserr_fbig; + + *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(count)); + if (*acl == NULL) + return nfserr_jukebox; + + (*acl)->naces = count; + for (ace = (*acl)->aces; ace < (*acl)->aces + count; ace++) { + status = nfsd4_decode_nfsace4(argp, ace); + if (status) + return status; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, @@ -281,46 +345,9 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) { - u32 nace; - struct nfs4_ace *ace; - - READ_BUF(4); - nace = be32_to_cpup(p++); - - if (nace > xdr_stream_remaining(argp->xdr) / sizeof(struct nfs4_ace)) - /* - * Even with 4-byte names there wouldn't be - * space for that many aces; something fishy is - * going on: - */ - return nfserr_fbig; - - *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(nace)); - if (*acl == NULL) - return nfserr_jukebox; - - (*acl)->naces = nace; - for (ace = (*acl)->aces; ace < (*acl)->aces + nace; ace++) { - READ_BUF(16); - ace->type = be32_to_cpup(p++); - ace->flag = be32_to_cpup(p++); - ace->access_mask = be32_to_cpup(p++); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - ace->whotype = nfs4_acl_get_whotype(buf, dummy32); - status = nfs_ok; - if (ace->whotype != NFS4_ACL_WHO_NAMED) - ; - else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) - status = nfsd_map_name_to_gid(argp->rqstp, - buf, dummy32, &ace->who_gid); - else - status = nfsd_map_name_to_uid(argp->rqstp, - buf, dummy32, &ace->who_uid); - if (status) - return status; - } + status = nfsd4_decode_acl(argp, acl); + if (status) + return status; } else *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { From 8801b0c2842173291701ef6290df973f4969275d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:54:26 -0500 Subject: [PATCH 0320/1565] NFSD: Replace READ* macros that decode the fattr4 mode attribute [ Upstream commit 1c8f0ad7dd35fd12307904036c7c839f77b6e3f9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0fe57ca0f31a..9dc73ab95eac 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -351,8 +351,11 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, } else *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { - READ_BUF(4); - iattr->ia_mode = be32_to_cpup(p++); + u32 mode; + + if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) + return nfserr_bad_xdr; + iattr->ia_mode = mode; iattr->ia_mode &= (S_IFMT | S_IALLUGO); iattr->ia_valid |= ATTR_MODE; } From dec78fb66dd6c57a39998022c3c6c8c34b5c11cf Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:56:42 -0500 Subject: [PATCH 0321/1565] NFSD: Replace READ* macros that decode the fattr4 owner attribute [ Upstream commit 9853a5ac9be381917e9be0b4133cd4ac5a7ad875 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9dc73ab95eac..7dc6b79e51fd 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -360,11 +360,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_MODE; } if (bmval[1] & FATTR4_WORD1_OWNER) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid))) + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + status = nfsd_map_name_to_uid(argp->rqstp, (char *)p, length, + &iattr->ia_uid); + if (status) return status; iattr->ia_valid |= ATTR_UID; } From df42ebb61bbe9436028bfc0e938287ac1f20a7ee Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 13:58:18 -0500 Subject: [PATCH 0322/1565] NFSD: Replace READ* macros that decode the fattr4 owner_group attribute [ Upstream commit 393c31dd27f83adb06b07a1b5f0a5b8966a0f01e ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7dc6b79e51fd..979f1d384cd0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -374,11 +374,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_UID; } if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - READMEM(buf, dummy32); - if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid))) + u32 length; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + status = nfsd_map_name_to_gid(argp->rqstp, (char *)p, length, + &iattr->ia_gid); + if (status) return status; iattr->ia_valid |= ATTR_GID; } From 064e439befc906970645f04d01ec7e2ce832f27b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 14:01:08 -0500 Subject: [PATCH 0323/1565] NFSD: Replace READ* macros that decode the fattr4 time_set attributes [ Upstream commit 1c3eff7ea4a98c642134ee493001ae13b79ff38c ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 39 +++++++++++++++++++++++++++++---------- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 979f1d384cd0..09975786e71b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -219,6 +219,21 @@ nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) DECODE_TAIL; } +static __be32 +nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 3); + if (!p) + return nfserr_bad_xdr; + p = xdr_decode_hyper(p, &tv->tv_sec); + tv->tv_nsec = be32_to_cpup(p++); + if (tv->tv_nsec >= (u32)1000000000) + return nfserr_inval; + return nfs_ok; +} + static __be32 nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) { @@ -388,11 +403,13 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_GID; } if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - switch (dummy32) { + u32 set_it; + + if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) + return nfserr_bad_xdr; + switch (set_it) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_time(argp, &iattr->ia_atime); + status = nfsd4_decode_nfstime4(argp, &iattr->ia_atime); if (status) return status; iattr->ia_valid |= (ATTR_ATIME | ATTR_ATIME_SET); @@ -401,15 +418,17 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_ATIME; break; default: - goto xdr_error; + return nfserr_bad_xdr; } } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - switch (dummy32) { + u32 set_it; + + if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) + return nfserr_bad_xdr; + switch (set_it) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_time(argp, &iattr->ia_mtime); + status = nfsd4_decode_nfstime4(argp, &iattr->ia_mtime); if (status) return status; iattr->ia_valid |= (ATTR_MTIME | ATTR_MTIME_SET); @@ -418,7 +437,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, iattr->ia_valid |= ATTR_MTIME; break; default: - goto xdr_error; + return nfserr_bad_xdr; } } From 91a6752daddd88d6e0da2e809b407d8a33770cf8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 14:05:51 -0500 Subject: [PATCH 0324/1565] NFSD: Replace READ* macros that decode the fattr4 security label attribute [ Upstream commit dabe91828f92cd493e9e75efbc10f9878d2a73fe ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 09975786e71b..453a902c7490 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -324,6 +324,33 @@ nfsd4_decode_acl(struct nfsd4_compoundargs *argp, struct nfs4_acl **acl) return nfs_ok; } +static noinline __be32 +nfsd4_decode_security_label(struct nfsd4_compoundargs *argp, + struct xdr_netobj *label) +{ + u32 lfs, pi, length; + __be32 *p; + + if (xdr_stream_decode_u32(argp->xdr, &lfs) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &pi) < 0) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &length) < 0) + return nfserr_bad_xdr; + if (length > NFS4_MAXLABELLEN) + return nfserr_badlabel; + p = xdr_inline_decode(argp->xdr, length); + if (!p) + return nfserr_bad_xdr; + label->len = length; + label->data = svcxdr_dupstr(argp, p, length); + if (!label->data) + return nfserr_jukebox; + + return nfs_ok; +} + static __be32 nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, struct iattr *iattr, struct nfs4_acl **acl, @@ -332,7 +359,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, unsigned int starting_pos; u32 attrlist4_count; u32 dummy32; - char *buf; DECODE_HEAD; iattr->ia_valid = 0; @@ -440,24 +466,12 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return nfserr_bad_xdr; } } - label->len = 0; if (IS_ENABLED(CONFIG_NFSD_V4_SECURITY_LABEL) && bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { - READ_BUF(4); - dummy32 = be32_to_cpup(p++); /* lfs: we don't use it */ - READ_BUF(4); - dummy32 = be32_to_cpup(p++); /* pi: we don't use it either */ - READ_BUF(4); - dummy32 = be32_to_cpup(p++); - READ_BUF(dummy32); - if (dummy32 > NFS4_MAXLABELLEN) - return nfserr_badlabel; - READMEM(buf, dummy32); - label->len = dummy32; - label->data = svcxdr_dupstr(argp, buf, dummy32); - if (!label->data) - return nfserr_jukebox; + status = nfsd4_decode_security_label(argp, label); + if (status) + return status; } if (bmval[2] & FATTR4_WORD2_MODE_UMASK) { if (!umask) From 9737a9a8f923ec7621ea6f9d180e176666b1f435 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 14:07:43 -0500 Subject: [PATCH 0325/1565] NFSD: Replace READ* macros that decode the fattr4 umask attribute [ Upstream commit 66f0476c704c86d44aa9da19d4753df66f2dbc96 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 453a902c7490..2d97bbff13b6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -358,7 +358,6 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, { unsigned int starting_pos; u32 attrlist4_count; - u32 dummy32; DECODE_HEAD; iattr->ia_valid = 0; @@ -474,13 +473,16 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, return status; } if (bmval[2] & FATTR4_WORD2_MODE_UMASK) { + u32 mode, mask; + if (!umask) - goto xdr_error; - READ_BUF(8); - dummy32 = be32_to_cpup(p++); - iattr->ia_mode = dummy32 & (S_IFMT | S_IALLUGO); - dummy32 = be32_to_cpup(p++); - *umask = dummy32 & S_IRWXUGO; + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) + return nfserr_bad_xdr; + iattr->ia_mode = mode & (S_IFMT | S_IALLUGO); + if (xdr_stream_decode_u32(argp->xdr, &mask) < 0) + return nfserr_bad_xdr; + *umask = mask & S_IRWXUGO; iattr->ia_valid |= ATTR_MODE; } From 57516a96cae8c0b5ee53e25d141a4f96cbb1000e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 12:56:05 -0500 Subject: [PATCH 0326/1565] NFSD: Replace READ* macros in nfsd4_decode_fattr() [ Upstream commit d1c263a031e876ac3ca5223c728e4d98ed50b3c0 ] Let's be more careful to avoid overrunning the memory that backs the bitmap array. This requires updating the synopsis of nfsd4_decode_fattr(). Bruce points out that a server needs to be careful to return nfs_ok when a client presents bitmap bits the server doesn't support. This includes bits in bitmap words the server might not yet support. The current READ* based implementation is good about that, but that requirement hasn't been documented. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 82 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 64 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2d97bbff13b6..c916e5d9d307 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -260,6 +260,46 @@ nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) DECODE_TAIL; } +/** + * nfsd4_decode_bitmap4 - Decode an NFSv4 bitmap4 + * @argp: NFSv4 compound argument structure + * @bmval: pointer to an array of u32's to decode into + * @bmlen: size of the @bmval array + * + * The server needs to return nfs_ok rather than nfserr_bad_xdr when + * encountering bitmaps containing bits it does not recognize. This + * includes bits in bitmap words past WORDn, where WORDn is the last + * bitmap WORD the implementation currently supports. Thus we are + * careful here to simply ignore bits in bitmap words that this + * implementation has yet to support explicitly. + * + * Return values: + * %nfs_ok: @bmval populated successfully + * %nfserr_bad_xdr: the encoded bitmap was invalid + */ +static __be32 +nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) +{ + u32 i, count; + __be32 *p; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + /* request sanity */ + if (count > 1000) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, count << 2); + if (!p) + return nfserr_bad_xdr; + i = 0; + while (i < count) + bmval[i++] = be32_to_cpup(p++); + while (i < bmlen) + bmval[i++] = 0; + + return nfs_ok; +} + static __be32 nfsd4_decode_nfsace4(struct nfsd4_compoundargs *argp, struct nfs4_ace *ace) { @@ -352,17 +392,18 @@ nfsd4_decode_security_label(struct nfsd4_compoundargs *argp, } static __be32 -nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, - struct iattr *iattr, struct nfs4_acl **acl, - struct xdr_netobj *label, int *umask) +nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, + struct iattr *iattr, struct nfs4_acl **acl, + struct xdr_netobj *label, int *umask) { unsigned int starting_pos; u32 attrlist4_count; + __be32 *p, status; - DECODE_HEAD; iattr->ia_valid = 0; - if ((status = nfsd4_decode_bitmap(argp, bmval))) - return status; + status = nfsd4_decode_bitmap4(argp, bmval, bmlen); + if (status) + return nfserr_bad_xdr; if (bmval[0] & ~NFSD_WRITEABLE_ATTRS_WORD0 || bmval[1] & ~NFSD_WRITEABLE_ATTRS_WORD1 @@ -490,7 +531,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, if (attrlist4_count != xdr_stream_pos(argp->xdr) - starting_pos) return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 @@ -690,9 +731,10 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create if ((status = check_filename(create->cr_name, create->cr_namelen))) return status; - status = nfsd4_decode_fattr(argp, create->cr_bmval, &create->cr_iattr, - &create->cr_acl, &create->cr_label, - &create->cr_umask); + status = nfsd4_decode_fattr4(argp, create->cr_bmval, + ARRAY_SIZE(create->cr_bmval), + &create->cr_iattr, &create->cr_acl, + &create->cr_label, &create->cr_umask); if (status) goto out; @@ -941,9 +983,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) switch (open->op_createmode) { case NFS4_CREATE_UNCHECKED: case NFS4_CREATE_GUARDED: - status = nfsd4_decode_fattr(argp, open->op_bmval, - &open->op_iattr, &open->op_acl, &open->op_label, - &open->op_umask); + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); if (status) goto out; break; @@ -956,9 +999,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) goto xdr_error; READ_BUF(NFS4_VERIFIER_SIZE); COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); - status = nfsd4_decode_fattr(argp, open->op_bmval, - &open->op_iattr, &open->op_acl, &open->op_label, - &open->op_umask); + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); if (status) goto out; break; @@ -1194,8 +1238,10 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta status = nfsd4_decode_stateid(argp, &setattr->sa_stateid); if (status) return status; - return nfsd4_decode_fattr(argp, setattr->sa_bmval, &setattr->sa_iattr, - &setattr->sa_acl, &setattr->sa_label, NULL); + return nfsd4_decode_fattr4(argp, setattr->sa_bmval, + ARRAY_SIZE(setattr->sa_bmval), + &setattr->sa_iattr, &setattr->sa_acl, + &setattr->sa_label, NULL); } static __be32 From 3012fe5fea5512f680ed1f68ef5810506092b74f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:24:10 -0500 Subject: [PATCH 0327/1565] NFSD: Replace READ* macros in nfsd4_decode_create() [ Upstream commit 000dfa18b3df9c62df5f768f9187cf1a94ded71d ] A dedicated decoder for component4 is introduced here, which will be used by other operation decoders in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 58 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 40 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c916e5d9d307..8f296f5568b1 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -92,6 +92,8 @@ check_filename(char *str, int len) if (len == 0) return nfserr_inval; + if (len > NFS4_MAXNAMLEN) + return nfserr_nametoolong; if (isdotent(str, len)) return nfserr_badname; for (i = 0; i < len; i++) @@ -205,6 +207,27 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) return ret; } +static __be32 +nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) +{ + __be32 *p, status; + + if (xdr_stream_decode_u32(argp->xdr, lenp) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, *lenp); + if (!p) + return nfserr_bad_xdr; + status = check_filename((char *)p, *lenp); + if (status) + return status; + *namp = svcxdr_tmpalloc(argp, *lenp); + if (!*namp) + return nfserr_jukebox; + memcpy(*namp, p, *lenp); + + return nfs_ok; +} + static __be32 nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) { @@ -698,24 +721,27 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit static __be32 nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) { - DECODE_HEAD; + __be32 *p, status; - READ_BUF(4); - create->cr_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_type) < 0) + return nfserr_bad_xdr; switch (create->cr_type) { case NF4LNK: - READ_BUF(4); - create->cr_datalen = be32_to_cpup(p++); - READ_BUF(create->cr_datalen); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_datalen) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, create->cr_datalen); + if (!p) + return nfserr_bad_xdr; create->cr_data = svcxdr_dupstr(argp, p, create->cr_datalen); if (!create->cr_data) return nfserr_jukebox; break; case NF4BLK: case NF4CHR: - READ_BUF(8); - create->cr_specdata1 = be32_to_cpup(p++); - create->cr_specdata2 = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata1) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata2) < 0) + return nfserr_bad_xdr; break; case NF4SOCK: case NF4FIFO: @@ -723,22 +749,18 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create default: break; } - - READ_BUF(4); - create->cr_namelen = be32_to_cpup(p++); - READ_BUF(create->cr_namelen); - SAVEMEM(create->cr_name, create->cr_namelen); - if ((status = check_filename(create->cr_name, create->cr_namelen))) + status = nfsd4_decode_component4(argp, &create->cr_name, + &create->cr_namelen); + if (status) return status; - status = nfsd4_decode_fattr4(argp, create->cr_bmval, ARRAY_SIZE(create->cr_bmval), &create->cr_iattr, &create->cr_acl, &create->cr_label, &create->cr_umask); if (status) - goto out; + return status; - DECODE_TAIL; + return nfs_ok; } static inline __be32 From 42e319695efc041e6f085a2d89ea10ab5c2e7d7b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 21 Nov 2020 14:11:58 -0500 Subject: [PATCH 0328/1565] NFSD: Replace READ* macros in nfsd4_decode_delegreturn() [ Upstream commit 95e6482cedfc0785b85db49b72a05323bbf41750 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8f296f5568b1..234d50096123 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -766,7 +766,7 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create static inline __be32 nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegreturn *dr) { - return nfsd4_decode_stateid(argp, &dr->dr_stateid); + return nfsd4_decode_stateid4(argp, &dr->dr_stateid); } static inline __be32 From 42c4437d78e62ddd6a7e4394cd75274c40a074a3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 19 Nov 2020 14:40:20 -0500 Subject: [PATCH 0329/1565] NFSD: Replace READ* macros in nfsd4_decode_getattr() [ Upstream commit f759eff260f1f0b0f56531517762f27ee3233506 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 234d50096123..70ce48340c1b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -772,7 +772,8 @@ nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegretu static inline __be32 nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) { - return nfsd4_decode_bitmap(argp, getattr->ga_bmval); + return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, + ARRAY_SIZE(getattr->ga_bmval)); } static __be32 From 84bc365eee7f55c32b07c0f7b904aea2da5cc4e3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:01:24 -0500 Subject: [PATCH 0330/1565] NFSD: Replace READ* macros in nfsd4_decode_link() [ Upstream commit 5c505d128691c70991b766dd6a3faf49fa59ecfb ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 70ce48340c1b..4596b8cef222 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -779,16 +779,7 @@ nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *geta static __be32 nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) { - DECODE_HEAD; - - READ_BUF(4); - link->li_namelen = be32_to_cpup(p++); - READ_BUF(link->li_namelen); - SAVEMEM(link->li_name, link->li_namelen); - if ((status = check_filename(link->li_name, link->li_namelen))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); } static __be32 From 753bb6b0e78854756a55f88b9fe6e0a6a811e5a8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 11:41:55 -0500 Subject: [PATCH 0331/1565] NFSD: Relocate nfsd4_decode_opaque() [ Upstream commit 5dcbfabb676b2b6d97767209cf707eb463ca232a ] Enable nfsd4_decode_opaque() to be used in more decoders, and replace the READ* macros in nfsd4_decode_opaque(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 43 +++++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4596b8cef222..b3459059cec1 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -207,6 +207,33 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) return ret; } + +/* + * NFSv4 basic data type decoders + */ + +static __be32 +nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(argp->xdr, &len) < 0) + return nfserr_bad_xdr; + if (len == 0 || len > NFS4_OPAQUE_LIMIT) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, len); + if (!p) + return nfserr_bad_xdr; + o->data = svcxdr_tmpalloc(argp, len); + if (!o->data) + return nfserr_jukebox; + o->len = len; + memcpy(o->data, p, len); + + return nfs_ok; +} + static __be32 nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) { @@ -943,22 +970,6 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) return nfserr_bad_xdr; } -static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) -{ - DECODE_HEAD; - - READ_BUF(4); - o->len = be32_to_cpup(p++); - - if (o->len == 0 || o->len > NFS4_OPAQUE_LIMIT) - return nfserr_bad_xdr; - - READ_BUF(o->len); - SAVEMEM(o->data, o->len); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { From 0c281b7083f23ad0c5612f9dc6110388dadadf0e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:25:02 -0500 Subject: [PATCH 0332/1565] NFSD: Add helpers to decode a clientid4 and an NFSv4 state owner [ Upstream commit 144e82694092ff80b5e64749d6822cd8947587f2 ] These helpers will also be used to simplify decoders in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b3459059cec1..63140cd4c50e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -609,6 +609,30 @@ nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) return nfs_ok; } +static __be32 +nfsd4_decode_clientid4(struct nfsd4_compoundargs *argp, clientid_t *clientid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, sizeof(__be64)); + if (!p) + return nfserr_bad_xdr; + memcpy(clientid, p, sizeof(*clientid)); + return nfs_ok; +} + +static __be32 +nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, + clientid_t *clientid, struct xdr_netobj *owner) +{ + __be32 status; + + status = nfsd4_decode_clientid4(argp, clientid); + if (status) + return status; + return nfsd4_decode_opaque(argp, owner); +} + static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { DECODE_HEAD; @@ -832,12 +856,12 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) status = nfsd4_decode_stateid(argp, &lock->lk_new_open_stateid); if (status) return status; - READ_BUF(8 + sizeof(clientid_t)); + READ_BUF(4); lock->lk_new_lock_seqid = be32_to_cpup(p++); - COPYMEM(&lock->lk_new_clientid, sizeof(clientid_t)); - lock->lk_new_owner.len = be32_to_cpup(p++); - READ_BUF(lock->lk_new_owner.len); - READMEM(lock->lk_new_owner.data, lock->lk_new_owner.len); + status = nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, + &lock->lk_new_owner); + if (status) + return status; } else { status = nfsd4_decode_stateid(argp, &lock->lk_old_lock_stateid); if (status) From d27f2dcedae2ab3afe6a82f0f449abbe081c4d01 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:16:52 -0500 Subject: [PATCH 0333/1565] NFSD: Add helper for decoding locker4 [ Upstream commit 8918cc0d2b72db9997390626010b182c4500d749 ] Refactor for clarity. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 64 +++++++++++++++++++++++++------------- include/linux/sunrpc/xdr.h | 21 +++++++++++++ 2 files changed, 64 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 63140cd4c50e..15ed5249e2c7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -833,6 +833,48 @@ nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); } +static __be32 +nfsd4_decode_open_to_lock_owner4(struct nfsd4_compoundargs *argp, + struct nfsd4_lock *lock) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_open_seqid) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lock->lk_new_open_stateid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_lock_seqid) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, + &lock->lk_new_owner); +} + +static __be32 +nfsd4_decode_exist_lock_owner4(struct nfsd4_compoundargs *argp, + struct nfsd4_lock *lock) +{ + __be32 status; + + status = nfsd4_decode_stateid4(argp, &lock->lk_old_lock_stateid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_old_lock_seqid) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + +static __be32 +nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) +{ + if (xdr_stream_decode_bool(argp->xdr, &lock->lk_is_new) < 0) + return nfserr_bad_xdr; + if (lock->lk_is_new) + return nfsd4_decode_open_to_lock_owner4(argp, lock); + return nfsd4_decode_exist_lock_owner4(argp, lock); +} + static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { @@ -848,27 +890,7 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) lock->lk_reclaim = be32_to_cpup(p++); p = xdr_decode_hyper(p, &lock->lk_offset); p = xdr_decode_hyper(p, &lock->lk_length); - lock->lk_is_new = be32_to_cpup(p++); - - if (lock->lk_is_new) { - READ_BUF(4); - lock->lk_new_open_seqid = be32_to_cpup(p++); - status = nfsd4_decode_stateid(argp, &lock->lk_new_open_stateid); - if (status) - return status; - READ_BUF(4); - lock->lk_new_lock_seqid = be32_to_cpup(p++); - status = nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, - &lock->lk_new_owner); - if (status) - return status; - } else { - status = nfsd4_decode_stateid(argp, &lock->lk_old_lock_stateid); - if (status) - return status; - READ_BUF(4); - lock->lk_old_lock_seqid = be32_to_cpup(p++); - } + status = nfsd4_decode_locker4(argp, lock); DECODE_TAIL; } diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 6b1757543747..f6569b620bea 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -548,6 +548,27 @@ static inline bool xdr_item_is_present(const __be32 *p) return *p != xdr_zero; } +/** + * xdr_stream_decode_bool - Decode a boolean + * @xdr: pointer to xdr_stream + * @ptr: pointer to a u32 in which to store the result + * + * Return values: + * %0 on success + * %-EBADMSG on XDR buffer overflow + */ +static inline ssize_t +xdr_stream_decode_bool(struct xdr_stream *xdr, __u32 *ptr) +{ + const size_t count = sizeof(*ptr); + __be32 *p = xdr_inline_decode(xdr, count); + + if (unlikely(!p)) + return -EBADMSG; + *ptr = (*p != xdr_zero); + return 0; +} + /** * xdr_stream_decode_u32 - Decode a 32-bit integer * @xdr: pointer to xdr_stream From 8357985d2185e00fad7227f3834e0cd1985636cf Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:29:27 -0500 Subject: [PATCH 0334/1565] NFSD: Replace READ* macros in nfsd4_decode_lock() [ Upstream commit 7c59deed5cd2e1cfc6cbecf06f4584ac53755f53 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 15ed5249e2c7..b50d2987bb6e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -878,21 +878,17 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { - DECODE_HEAD; - - /* - * type, reclaim(boolean), offset, length, new_lock_owner(boolean) - */ - READ_BUF(28); - lock->lk_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) + return nfserr_bad_xdr; if ((lock->lk_type < NFS4_READ_LT) || (lock->lk_type > NFS4_WRITEW_LT)) - goto xdr_error; - lock->lk_reclaim = be32_to_cpup(p++); - p = xdr_decode_hyper(p, &lock->lk_offset); - p = xdr_decode_hyper(p, &lock->lk_length); - status = nfsd4_decode_locker4(argp, lock); - - DECODE_TAIL; + return nfserr_bad_xdr; + if (xdr_stream_decode_bool(argp->xdr, &lock->lk_reclaim) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lock->lk_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lock->lk_length) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_locker4(argp, lock); } static __be32 From 3625de1522fa14d85546139f0e77f1622785c7e6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:31:44 -0500 Subject: [PATCH 0335/1565] NFSD: Replace READ* macros in nfsd4_decode_lockt() [ Upstream commit 0a146f04aa0fa7a57aaed3913d1c2732b3853f31 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b50d2987bb6e..4f2680650a56 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -894,20 +894,16 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) { - DECODE_HEAD; - - READ_BUF(32); - lockt->lt_type = be32_to_cpup(p++); - if((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) - goto xdr_error; - p = xdr_decode_hyper(p, &lockt->lt_offset); - p = xdr_decode_hyper(p, &lockt->lt_length); - COPYMEM(&lockt->lt_clientid, 8); - lockt->lt_owner.len = be32_to_cpup(p++); - READ_BUF(lockt->lt_owner.len); - READMEM(lockt->lt_owner.data, lockt->lt_owner.len); - - DECODE_TAIL; + if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) + return nfserr_bad_xdr; + if ((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_length) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_state_owner4(argp, &lockt->lt_clientid, + &lockt->lt_owner); } static __be32 From 340878b2e0a5f82238e75aad678309e90381b9d8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:33:28 -0500 Subject: [PATCH 0336/1565] NFSD: Replace READ* macros in nfsd4_decode_locku() [ Upstream commit ca9cf9fc27f8f722e9eb2763173ba01f6ac3dad1 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4f2680650a56..3c20a1f8eaa9 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -909,21 +909,23 @@ nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) static __be32 nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) { - DECODE_HEAD; + __be32 status; - READ_BUF(8); - locku->lu_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &locku->lu_type) < 0) + return nfserr_bad_xdr; if ((locku->lu_type < NFS4_READ_LT) || (locku->lu_type > NFS4_WRITEW_LT)) - goto xdr_error; - locku->lu_seqid = be32_to_cpup(p++); - status = nfsd4_decode_stateid(argp, &locku->lu_stateid); + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &locku->lu_seqid) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &locku->lu_stateid); if (status) return status; - READ_BUF(16); - p = xdr_decode_hyper(p, &locku->lu_offset); - p = xdr_decode_hyper(p, &locku->lu_length); + if (xdr_stream_decode_u64(argp->xdr, &locku->lu_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &locku->lu_length) < 0) + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From def95074db3c1e955f24b7d3b3001b1e2c4cea18 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:02:40 -0500 Subject: [PATCH 0337/1565] NFSD: Replace READ* macros in nfsd4_decode_lookup() [ Upstream commit 3d5877e8e03f60d7cc804d7b230ff9c00c9c07bd ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3c20a1f8eaa9..431ab9d604be 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -931,16 +931,7 @@ nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) static __be32 nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup) { - DECODE_HEAD; - - READ_BUF(4); - lookup->lo_len = be32_to_cpup(p++); - READ_BUF(lookup->lo_len); - SAVEMEM(lookup->lo_name, lookup->lo_len); - if ((status = check_filename(lookup->lo_name, lookup->lo_len))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); } static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) From 11ea3e65f070e389234ed98cdd2b670cb206faa6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:34:01 -0500 Subject: [PATCH 0338/1565] NFSD: Add helper to decode NFSv4 verifiers [ Upstream commit 796dd1c6b680959ac968b52aa507911b288b1749 ] This helper will be used to simplify decoders in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 431ab9d604be..1a2dc52c4340 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -284,6 +284,18 @@ nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) return nfs_ok; } +static __be32 +nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_VERIFIER_SIZE); + if (!p) + return nfserr_bad_xdr; + memcpy(verf->data, p, sizeof(verf->data)); + return nfs_ok; +} + static __be32 nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) { @@ -1047,14 +1059,16 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) goto out; break; case NFS4_CREATE_EXCLUSIVE: - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; break; case NFS4_CREATE_EXCLUSIVE4_1: if (argp->minorversion < 1) goto xdr_error; - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; status = nfsd4_decode_fattr4(argp, open->op_bmval, ARRAY_SIZE(open->op_bmval), &open->op_iattr, &open->op_acl, From c1ea8812d4210fd77ff8490b3eb75e7be688ec08 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:37:42 -0500 Subject: [PATCH 0339/1565] NFSD: Add helper to decode OPEN's createhow4 argument [ Upstream commit bf33bab3c4182cdd795983f14de5606e82fab377 ] Refactor for clarity. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 78 +++++++++++++++++++++++++++-------------------- 1 file changed, 45 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1a2dc52c4340..62096b2a57b3 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -946,6 +946,48 @@ nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); } +static __be32 +nfsd4_decode_createhow4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_createmode) < 0) + return nfserr_bad_xdr; + switch (open->op_createmode) { + case NFS4_CREATE_UNCHECKED: + case NFS4_CREATE_GUARDED: + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); + if (status) + return status; + break; + case NFS4_CREATE_EXCLUSIVE: + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; + break; + case NFS4_CREATE_EXCLUSIVE4_1: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + status = nfsd4_decode_verifier4(argp, &open->op_verf); + if (status) + return status; + status = nfsd4_decode_fattr4(argp, open->op_bmval, + ARRAY_SIZE(open->op_bmval), + &open->op_iattr, &open->op_acl, + &open->op_label, &open->op_umask); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { __be32 *p; @@ -1046,39 +1088,9 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) case NFS4_OPEN_NOCREATE: break; case NFS4_OPEN_CREATE: - READ_BUF(4); - open->op_createmode = be32_to_cpup(p++); - switch (open->op_createmode) { - case NFS4_CREATE_UNCHECKED: - case NFS4_CREATE_GUARDED: - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - goto out; - break; - case NFS4_CREATE_EXCLUSIVE: - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - break; - case NFS4_CREATE_EXCLUSIVE4_1: - if (argp->minorversion < 1) - goto xdr_error; - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - goto out; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_createhow4(argp, open); + if (status) + return status; break; default: goto xdr_error; From 070df4a4e9868485245e33823d7a1a366382008f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:41:21 -0500 Subject: [PATCH 0340/1565] NFSD: Add helper to decode OPEN's openflag4 argument [ Upstream commit e6ec04b27bfb4869c0e35fbcf24333d379f101d5 ] Refactor for clarity. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 62096b2a57b3..76715d1935ad 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -988,6 +988,28 @@ nfsd4_decode_createhow4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open return nfs_ok; } +static __be32 +nfsd4_decode_openflag4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_create) < 0) + return nfserr_bad_xdr; + switch (open->op_create) { + case NFS4_OPEN_NOCREATE: + break; + case NFS4_OPEN_CREATE: + status = nfsd4_decode_createhow4(argp, open); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { __be32 *p; @@ -1082,19 +1104,9 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) status = nfsd4_decode_opaque(argp, &open->op_owner); if (status) goto xdr_error; - READ_BUF(4); - open->op_create = be32_to_cpup(p++); - switch (open->op_create) { - case NFS4_OPEN_NOCREATE: - break; - case NFS4_OPEN_CREATE: - status = nfsd4_decode_createhow4(argp, open); - if (status) - return status; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_openflag4(argp, open); + if (status) + return status; /* open_claim */ READ_BUF(4); From fa60cc6971fbd58bb00f38072ef21f1931ad7d85 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:54:48 -0500 Subject: [PATCH 0341/1565] NFSD: Replace READ* macros in nfsd4_decode_share_access() [ Upstream commit 9aa62f5199749b274454b6d7d914c9b2a5e77031 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 76715d1935ad..a43b39940ab2 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1012,11 +1012,10 @@ nfsd4_decode_openflag4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { - __be32 *p; u32 w; - READ_BUF(4); - w = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &w) < 0) + return nfserr_bad_xdr; *share_access = w & NFS4_SHARE_ACCESS_MASK; *deleg_want = w & NFS4_SHARE_WANT_MASK; if (deleg_when) @@ -1059,7 +1058,6 @@ static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *sh NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED): return nfs_ok; } -xdr_error: return nfserr_bad_xdr; } From 48385b58bcf637a5e20d1c062d08471fa481a387 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:56:17 -0500 Subject: [PATCH 0342/1565] NFSD: Replace READ* macros in nfsd4_decode_share_deny() [ Upstream commit b07bebd9eb9842e2d0dea87efeb92884556e55b0 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a43b39940ab2..a9257ec9d151 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1063,16 +1063,13 @@ static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *sh static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) { - __be32 *p; - - READ_BUF(4); - *x = be32_to_cpup(p++); - /* Note: unlinke access bits, deny bits may be zero. */ + if (xdr_stream_decode_u32(argp->xdr, x) < 0) + return nfserr_bad_xdr; + /* Note: unlike access bits, deny bits may be zero. */ if (*x & ~NFS4_SHARE_DENY_BOTH) return nfserr_bad_xdr; + return nfs_ok; -xdr_error: - return nfserr_bad_xdr; } static __be32 From 4507c23e4204aadc7d968273ac4c47171b3fd55b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 17:45:04 -0500 Subject: [PATCH 0343/1565] NFSD: Add helper to decode OPEN's open_claim4 argument [ Upstream commit 1708e50b0145f393acbec9e319bdf0e33f765d25 ] Refactor for clarity. Note that op_fname is the only instance of an NFSv4 filename stored in a struct xdr_netobj. Convert it to a u32/char * pair so that the new nfsd4_decode_filename() helper can be used. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 8 ++-- fs/nfsd/nfs4xdr.c | 95 ++++++++++++++++++++++++---------------------- fs/nfsd/xdr4.h | 3 +- 3 files changed, 56 insertions(+), 50 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 95545a61bfc7..a038d1e182ff 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -257,8 +257,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * in NFSv4 as in v3 except EXCLUSIVE4_1. */ current->fs->umask = open->op_umask; - status = do_nfsd_create(rqstp, current_fh, open->op_fname.data, - open->op_fname.len, &open->op_iattr, + status = do_nfsd_create(rqstp, current_fh, open->op_fname, + open->op_fnamelen, &open->op_iattr, *resfh, open->op_createmode, (u32 *)open->op_verf.data, &open->op_truncate, &open->op_created); @@ -283,7 +283,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * a chance to an acquire a delegation if appropriate. */ status = nfsd_lookup(rqstp, current_fh, - open->op_fname.data, open->op_fname.len, *resfh); + open->op_fname, open->op_fnamelen, *resfh); if (status) goto out; status = nfsd_check_obj_isreg(*resfh); @@ -360,7 +360,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, bool reclaim = false; dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n", - (int)open->op_fname.len, open->op_fname.data, + (int)open->op_fnamelen, open->op_fname, open->op_openowner); /* This check required by spec. */ diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a9257ec9d151..3e0fca521c39 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1072,6 +1072,55 @@ static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) return nfs_ok; } +static __be32 +nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, + struct nfsd4_open *open) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &open->op_claim_type) < 0) + return nfserr_bad_xdr; + switch (open->op_claim_type) { + case NFS4_OPEN_CLAIM_NULL: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + status = nfsd4_decode_component4(argp, &open->op_fname, + &open->op_fnamelen); + if (status) + return status; + break; + case NFS4_OPEN_CLAIM_PREVIOUS: + if (xdr_stream_decode_u32(argp->xdr, &open->op_delegate_type) < 0) + return nfserr_bad_xdr; + break; + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + if (status) + return status; + status = nfsd4_decode_component4(argp, &open->op_fname, + &open->op_fnamelen); + if (status) + return status; + break; + case NFS4_OPEN_CLAIM_FH: + case NFS4_OPEN_CLAIM_DELEG_PREV_FH: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + /* void */ + break; + case NFS4_OPEN_CLAIM_DELEG_CUR_FH: + if (argp->minorversion < 1) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { @@ -1102,51 +1151,7 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) status = nfsd4_decode_openflag4(argp, open); if (status) return status; - - /* open_claim */ - READ_BUF(4); - open->op_claim_type = be32_to_cpup(p++); - switch (open->op_claim_type) { - case NFS4_OPEN_CLAIM_NULL: - case NFS4_OPEN_CLAIM_DELEGATE_PREV: - READ_BUF(4); - open->op_fname.len = be32_to_cpup(p++); - READ_BUF(open->op_fname.len); - SAVEMEM(open->op_fname.data, open->op_fname.len); - if ((status = check_filename(open->op_fname.data, open->op_fname.len))) - return status; - break; - case NFS4_OPEN_CLAIM_PREVIOUS: - READ_BUF(4); - open->op_delegate_type = be32_to_cpup(p++); - break; - case NFS4_OPEN_CLAIM_DELEGATE_CUR: - status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); - if (status) - return status; - READ_BUF(4); - open->op_fname.len = be32_to_cpup(p++); - READ_BUF(open->op_fname.len); - SAVEMEM(open->op_fname.data, open->op_fname.len); - if ((status = check_filename(open->op_fname.data, open->op_fname.len))) - return status; - break; - case NFS4_OPEN_CLAIM_FH: - case NFS4_OPEN_CLAIM_DELEG_PREV_FH: - if (argp->minorversion < 1) - goto xdr_error; - /* void */ - break; - case NFS4_OPEN_CLAIM_DELEG_CUR_FH: - if (argp->minorversion < 1) - goto xdr_error; - status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); - if (status) - return status; - break; - default: - goto xdr_error; - } + status = nfsd4_decode_open_claim4(argp, open); DECODE_TAIL; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 0eb13bd603ea..6245004a9993 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -252,7 +252,8 @@ struct nfsd4_listxattrs { struct nfsd4_open { u32 op_claim_type; /* request */ - struct xdr_netobj op_fname; /* request - everything but CLAIM_PREV */ + u32 op_fnamelen; + char * op_fname; /* request - everything but CLAIM_PREV */ u32 op_delegate_type; /* request - CLAIM_PREV only */ stateid_t op_delegate_stateid; /* request - response */ u32 op_why_no_deleg; /* response - DELEG_NONE_EXT only */ From f6eb911d790bff3ea2f5324c3128e3fa080b0d60 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 1 Nov 2020 12:04:06 -0500 Subject: [PATCH 0344/1565] NFSD: Replace READ* macros in nfsd4_decode_open() [ Upstream commit 61e5e0b3ec713d1365008c8af3fe5fdd262e2a60 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3e0fca521c39..2de54e84a3ab 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1124,7 +1124,7 @@ nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { - DECODE_HEAD; + __be32 status; u32 dummy; memset(open->op_bmval, 0, sizeof(open->op_bmval)); @@ -1132,28 +1132,24 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) open->op_openowner = NULL; open->op_xdr_error = 0; - /* seqid, share_access, share_deny, clientid, ownerlen */ - READ_BUF(4); - open->op_seqid = be32_to_cpup(p++); - /* decode, yet ignore deleg_when until supported */ + if (xdr_stream_decode_u32(argp->xdr, &open->op_seqid) < 0) + return nfserr_bad_xdr; + /* deleg_want is ignored */ status = nfsd4_decode_share_access(argp, &open->op_share_access, &open->op_deleg_want, &dummy); if (status) - goto xdr_error; + return status; status = nfsd4_decode_share_deny(argp, &open->op_share_deny); if (status) - goto xdr_error; - READ_BUF(sizeof(clientid_t)); - COPYMEM(&open->op_clientid, sizeof(clientid_t)); - status = nfsd4_decode_opaque(argp, &open->op_owner); + return status; + status = nfsd4_decode_state_owner4(argp, &open->op_clientid, + &open->op_owner); if (status) - goto xdr_error; + return status; status = nfsd4_decode_openflag4(argp, open); if (status) return status; - status = nfsd4_decode_open_claim4(argp, open); - - DECODE_TAIL; + return nfsd4_decode_open_claim4(argp, open); } static __be32 From e557a2eabb350c03a8f3e546530d9760161d2648 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:18:57 -0500 Subject: [PATCH 0345/1565] NFSD: Replace READ* macros in nfsd4_decode_open_confirm() [ Upstream commit 06bee693a1f1cb774b91000f05a6e183c257d8e9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2de54e84a3ab..7b6fb11cdc80 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1155,18 +1155,18 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) static __be32 nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_confirm *open_conf) { - DECODE_HEAD; + __be32 status; if (argp->minorversion >= 1) return nfserr_notsupp; - status = nfsd4_decode_stateid(argp, &open_conf->oc_req_stateid); + status = nfsd4_decode_stateid4(argp, &open_conf->oc_req_stateid); if (status) return status; - READ_BUF(4); - open_conf->oc_seqid = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &open_conf->oc_seqid) < 0) + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From ad1ea32c9732e40d0eabbd0c7a25e16a74765d83 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:21:01 -0500 Subject: [PATCH 0346/1565] NFSD: Replace READ* macros in nfsd4_decode_open_downgrade() [ Upstream commit dca71651f097ea608945d7a66bf62761a630de9a ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7b6fb11cdc80..95c755473899 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1172,21 +1172,19 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con static __be32 nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_downgrade *open_down) { - DECODE_HEAD; - - status = nfsd4_decode_stateid(argp, &open_down->od_stateid); + __be32 status; + + status = nfsd4_decode_stateid4(argp, &open_down->od_stateid); if (status) return status; - READ_BUF(4); - open_down->od_seqid = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &open_down->od_seqid) < 0) + return nfserr_bad_xdr; + /* deleg_want is ignored */ status = nfsd4_decode_share_access(argp, &open_down->od_share_access, &open_down->od_deleg_want, NULL); if (status) return status; - status = nfsd4_decode_share_deny(argp, &open_down->od_share_deny); - if (status) - return status; - DECODE_TAIL; + return nfsd4_decode_share_deny(argp, &open_down->od_share_deny); } static __be32 From 4b28cd7e832213a82667eb604100b4eeda6fea6a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:23:02 -0500 Subject: [PATCH 0347/1565] NFSD: Replace READ* macros in nfsd4_decode_putfh() [ Upstream commit a73bed98413b1d9eb4466f776a56d2fde8b3b2c9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 95c755473899..149948393ccb 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1190,16 +1190,21 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d static __be32 nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) { - DECODE_HEAD; + __be32 *p; - READ_BUF(4); - putfh->pf_fhlen = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &putfh->pf_fhlen) < 0) + return nfserr_bad_xdr; if (putfh->pf_fhlen > NFS4_FHSIZE) - goto xdr_error; - READ_BUF(putfh->pf_fhlen); - SAVEMEM(putfh->pf_fhval, putfh->pf_fhlen); + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, putfh->pf_fhlen); + if (!p) + return nfserr_bad_xdr; + putfh->pf_fhval = svcxdr_tmpalloc(argp, putfh->pf_fhlen); + if (!putfh->pf_fhval) + return nfserr_jukebox; + memcpy(putfh->pf_fhval, p, putfh->pf_fhlen); - DECODE_TAIL; + return nfs_ok; } static __be32 From d0a0219a35fced96522aac4a2bb1f142e55c90e2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:28:24 -0500 Subject: [PATCH 0348/1565] NFSD: Replace READ* macros in nfsd4_decode_read() [ Upstream commit 3909c3bc604688503e31ddceb429dc156c4720c1 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 149948393ccb..c9652040d748 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1218,16 +1218,17 @@ nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) static __be32 nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { - DECODE_HEAD; + __be32 status; - status = nfsd4_decode_stateid(argp, &read->rd_stateid); + status = nfsd4_decode_stateid4(argp, &read->rd_stateid); if (status) return status; - READ_BUF(12); - p = xdr_decode_hyper(p, &read->rd_offset); - read->rd_length = be32_to_cpup(p++); + if (xdr_stream_decode_u64(argp->xdr, &read->rd_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &read->rd_length) < 0) + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From c24e2a4943abdddae26c5e4760ef871ad30223a4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:30:59 -0500 Subject: [PATCH 0349/1565] NFSD: Replace READ* macros in nfsd4_decode_readdir() [ Upstream commit 0dfaf2a371436860ace6af889e6cd8410ee63164 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c9652040d748..6036f8d595ef 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1234,17 +1234,22 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) static __be32 nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *readdir) { - DECODE_HEAD; + __be32 status; - READ_BUF(24); - p = xdr_decode_hyper(p, &readdir->rd_cookie); - COPYMEM(readdir->rd_verf.data, sizeof(readdir->rd_verf.data)); - readdir->rd_dircount = be32_to_cpup(p++); - readdir->rd_maxcount = be32_to_cpup(p++); - if ((status = nfsd4_decode_bitmap(argp, readdir->rd_bmval))) - goto out; + if (xdr_stream_decode_u64(argp->xdr, &readdir->rd_cookie) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_verifier4(argp, &readdir->rd_verf); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_dircount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_maxcount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_uint32_array(argp->xdr, readdir->rd_bmval, + ARRAY_SIZE(readdir->rd_bmval)) < 0) + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From 274b8f0597cf56043bac0c7e1188c38fbeddbeb4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:04:36 -0500 Subject: [PATCH 0350/1565] NFSD: Replace READ* macros in nfsd4_decode_remove() [ Upstream commit b7f5fbf219aecda98e32de305551e445f9438899 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6036f8d595ef..d4e1e3138739 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1255,16 +1255,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read static __be32 nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) { - DECODE_HEAD; - - READ_BUF(4); - remove->rm_namelen = be32_to_cpup(p++); - READ_BUF(remove->rm_namelen); - SAVEMEM(remove->rm_name, remove->rm_namelen); - if ((status = check_filename(remove->rm_name, remove->rm_namelen))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); } static __be32 From e2b287a53ccaf284d1b010f87e64682bd82771bc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:05:58 -0500 Subject: [PATCH 0351/1565] NFSD: Replace READ* macros in nfsd4_decode_rename() [ Upstream commit ba881a0a5342b3aaf83958901ebe3fe752eaab46 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 18 ++++-------------- 1 file changed, 4 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d4e1e3138739..adf4a6fb94d4 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1261,22 +1261,12 @@ nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove static __be32 nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename) { - DECODE_HEAD; + __be32 status; - READ_BUF(4); - rename->rn_snamelen = be32_to_cpup(p++); - READ_BUF(rename->rn_snamelen); - SAVEMEM(rename->rn_sname, rename->rn_snamelen); - READ_BUF(4); - rename->rn_tnamelen = be32_to_cpup(p++); - READ_BUF(rename->rn_tnamelen); - SAVEMEM(rename->rn_tname, rename->rn_tnamelen); - if ((status = check_filename(rename->rn_sname, rename->rn_snamelen))) + status = nfsd4_decode_component4(argp, &rename->rn_sname, &rename->rn_snamelen); + if (status) return status; - if ((status = check_filename(rename->rn_tname, rename->rn_tnamelen))) - return status; - - DECODE_TAIL; + return nfsd4_decode_component4(argp, &rename->rn_tname, &rename->rn_tnamelen); } static __be32 From 2cef1009f8e730e9dd5824d2ba3b5f984554c922 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:08:50 -0500 Subject: [PATCH 0352/1565] NFSD: Replace READ* macros in nfsd4_decode_renew() [ Upstream commit d12f90458dc8c11734ba44ec88f109bf8de86ff0 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index adf4a6fb94d4..51b59b369d72 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1272,15 +1272,7 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename static __be32 nfsd4_decode_renew(struct nfsd4_compoundargs *argp, clientid_t *clientid) { - DECODE_HEAD; - - if (argp->minorversion >= 1) - return nfserr_notsupp; - - READ_BUF(sizeof(clientid_t)); - COPYMEM(clientid, sizeof(clientid_t)); - - DECODE_TAIL; + return nfsd4_decode_clientid4(argp, clientid); } static __be32 From 5277d6034642fa9621ff67242fef30c86a28fc52 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:09:42 -0500 Subject: [PATCH 0353/1565] NFSD: Replace READ* macros in nfsd4_decode_secinfo() [ Upstream commit d0abdae5191a916d767164f6fc6c0e2e814a20a7 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 51b59b369d72..42d69c0207ce 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1279,16 +1279,7 @@ static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, struct nfsd4_secinfo *secinfo) { - DECODE_HEAD; - - READ_BUF(4); - secinfo->si_namelen = be32_to_cpup(p++); - READ_BUF(secinfo->si_namelen); - SAVEMEM(secinfo->si_name, secinfo->si_namelen); - status = check_filename(secinfo->si_name, secinfo->si_namelen); - if (status) - return status; - DECODE_TAIL; + return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); } static __be32 From 7c06ba5c8bf4e0edbb113b0fa97af31ac7cdc662 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 21 Nov 2020 14:14:59 -0500 Subject: [PATCH 0354/1565] NFSD: Replace READ* macros in nfsd4_decode_setattr() [ Upstream commit 44592fe9479d8d4b88594365ab825f7b07afdf7c ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 42d69c0207ce..cda56ca9ca3f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1298,7 +1298,7 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta { __be32 status; - status = nfsd4_decode_stateid(argp, &setattr->sa_stateid); + status = nfsd4_decode_stateid4(argp, &setattr->sa_stateid); if (status) return status; return nfsd4_decode_fattr4(argp, setattr->sa_bmval, From 2503dcf0f68a40078eb6b231037988929eefd8b9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:35:02 -0500 Subject: [PATCH 0355/1565] NFSD: Replace READ* macros in nfsd4_decode_setclientid() [ Upstream commit 92fa6c08c251d52d0d7b46066ecf87b96a0c4b8f ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 47 +++++++++++++++++++++++++++++++---------------- 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index cda56ca9ca3f..0af51cc1adba 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1310,31 +1310,46 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta static __be32 nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid *setclientid) { - DECODE_HEAD; + __be32 *p, status; if (argp->minorversion >= 1) return nfserr_notsupp; - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(setclientid->se_verf.data, NFS4_VERIFIER_SIZE); - + status = nfsd4_decode_verifier4(argp, &setclientid->se_verf); + if (status) + return status; status = nfsd4_decode_opaque(argp, &setclientid->se_name); if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_prog) < 0) return nfserr_bad_xdr; - READ_BUF(8); - setclientid->se_callback_prog = be32_to_cpup(p++); - setclientid->se_callback_netid_len = be32_to_cpup(p++); - READ_BUF(setclientid->se_callback_netid_len); - SAVEMEM(setclientid->se_callback_netid_val, setclientid->se_callback_netid_len); - READ_BUF(4); - setclientid->se_callback_addr_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_netid_len) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, setclientid->se_callback_netid_len); + if (!p) + return nfserr_bad_xdr; + setclientid->se_callback_netid_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_netid_len); + if (!setclientid->se_callback_netid_val) + return nfserr_jukebox; + memcpy(setclientid->se_callback_netid_val, p, + setclientid->se_callback_netid_len); - READ_BUF(setclientid->se_callback_addr_len); - SAVEMEM(setclientid->se_callback_addr_val, setclientid->se_callback_addr_len); - READ_BUF(4); - setclientid->se_callback_ident = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_addr_len) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, setclientid->se_callback_addr_len); + if (!p) + return nfserr_bad_xdr; + setclientid->se_callback_addr_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_addr_len); + if (!setclientid->se_callback_addr_val) + return nfserr_jukebox; + memcpy(setclientid->se_callback_addr_val, p, + setclientid->se_callback_addr_len); + if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_ident) < 0) + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From 19a4c05e816776c8176cf4ac10870c4c303eed43 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:12:33 -0500 Subject: [PATCH 0356/1565] NFSD: Replace READ* macros in nfsd4_decode_setclientid_confirm() [ Upstream commit d1ca55149d67e5896f89a30053f5d83c002ac10e ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0af51cc1adba..057cc1579f9b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1355,16 +1355,15 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient static __be32 nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid_confirm *scd_c) { - DECODE_HEAD; + __be32 status; if (argp->minorversion >= 1) return nfserr_notsupp; - READ_BUF(8 + NFS4_VERIFIER_SIZE); - COPYMEM(&scd_c->sc_clientid, 8); - COPYMEM(&scd_c->sc_confirm, NFS4_VERIFIER_SIZE); - - DECODE_TAIL; + status = nfsd4_decode_clientid4(argp, &scd_c->sc_clientid); + if (status) + return status; + return nfsd4_decode_verifier4(argp, &scd_c->sc_confirm); } /* Also used for NVERIFY */ From b1af8f131eb8d84f50f8778c538ae1937bc51fe8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:40:32 -0500 Subject: [PATCH 0357/1565] NFSD: Replace READ* macros in nfsd4_decode_verify() [ Upstream commit 67cd453eeda86be90f83a0f4798f33832cf2d98c ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 057cc1579f9b..231a2628e3e6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1370,20 +1370,27 @@ nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_s static __be32 nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify) { - DECODE_HEAD; + __be32 *p, status; - if ((status = nfsd4_decode_bitmap(argp, verify->ve_bmval))) - goto out; + status = nfsd4_decode_bitmap4(argp, verify->ve_bmval, + ARRAY_SIZE(verify->ve_bmval)); + if (status) + return status; /* For convenience's sake, we compare raw xdr'd attributes in * nfsd4_proc_verify */ - READ_BUF(4); - verify->ve_attrlen = be32_to_cpup(p++); - READ_BUF(verify->ve_attrlen); - SAVEMEM(verify->ve_attrval, verify->ve_attrlen); + if (xdr_stream_decode_u32(argp->xdr, &verify->ve_attrlen) < 0) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, verify->ve_attrlen); + if (!p) + return nfserr_bad_xdr; + verify->ve_attrval = svcxdr_tmpalloc(argp, verify->ve_attrlen); + if (!verify->ve_attrval) + return nfserr_jukebox; + memcpy(verify->ve_attrval, p, verify->ve_attrlen); - DECODE_TAIL; + return nfs_ok; } static __be32 From eeab2f3bf284ac8e9da8c8807ffdf2a8a4276a02 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:44:28 -0500 Subject: [PATCH 0358/1565] NFSD: Replace READ* macros in nfsd4_decode_write() [ Upstream commit 244e2befcba80f42c65293b6c56282bb78f9f417 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 231a2628e3e6..26744b7f0e35 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1396,22 +1396,23 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify static __be32 nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) { - DECODE_HEAD; + __be32 status; - status = nfsd4_decode_stateid(argp, &write->wr_stateid); + status = nfsd4_decode_stateid4(argp, &write->wr_stateid); if (status) return status; - READ_BUF(16); - p = xdr_decode_hyper(p, &write->wr_offset); - write->wr_stable_how = be32_to_cpup(p++); + if (xdr_stream_decode_u64(argp->xdr, &write->wr_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &write->wr_stable_how) < 0) + return nfserr_bad_xdr; if (write->wr_stable_how > NFS_FILE_SYNC) - goto xdr_error; - write->wr_buflen = be32_to_cpup(p++); - + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &write->wr_buflen) < 0) + return nfserr_bad_xdr; if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) - goto xdr_error; + return nfserr_bad_xdr; - DECODE_TAIL; + return nfs_ok; } static __be32 From 79f1a8323a3432ca600bdc973cde4242ffde9e9b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 13:42:25 -0500 Subject: [PATCH 0359/1565] NFSD: Replace READ* macros in nfsd4_decode_release_lockowner() [ Upstream commit a4a80c15ca4dd998ab5cbe87bd856c626a318a80 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 26744b7f0e35..cc406b7a530b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1418,20 +1418,20 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) static __be32 nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_release_lockowner *rlockowner) { - DECODE_HEAD; + __be32 status; if (argp->minorversion >= 1) return nfserr_notsupp; - READ_BUF(12); - COPYMEM(&rlockowner->rl_clientid, sizeof(clientid_t)); - rlockowner->rl_owner.len = be32_to_cpup(p++); - READ_BUF(rlockowner->rl_owner.len); - READMEM(rlockowner->rl_owner.data, rlockowner->rl_owner.len); + status = nfsd4_decode_state_owner4(argp, &rlockowner->rl_clientid, + &rlockowner->rl_owner); + if (status) + return status; if (argp->minorversion && !zero_clientid(&rlockowner->rl_clientid)) return nfserr_inval; - DECODE_TAIL; + + return nfs_ok; } static __be32 From 5658ca0651e6fbcf5b91dd649e248502fb8b6b35 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:09:34 -0500 Subject: [PATCH 0360/1565] NFSD: Replace READ* macros in nfsd4_decode_cb_sec() [ Upstream commit 1a99440807bfc66597aaa2e0f0213c319b023e34 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 165 ++++++++++++++++++++++++++++++---------------- 1 file changed, 107 insertions(+), 58 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index cc406b7a530b..6f3c86bee621 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -212,6 +212,25 @@ static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) * NFSv4 basic data type decoders */ +/* + * This helper handles variable-length opaques which belong to protocol + * elements that this implementation does not support. + */ +static __be32 +nfsd4_decode_ignored_string(struct nfsd4_compoundargs *argp, u32 maxlen) +{ + u32 len; + + if (xdr_stream_decode_u32(argp->xdr, &len) < 0) + return nfserr_bad_xdr; + if (maxlen && len > maxlen) + return nfserr_bad_xdr; + if (!xdr_inline_decode(argp->xdr, len)) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) { @@ -645,87 +664,117 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); } -static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) +/* Defined in Appendix A of RFC 5531 */ +static __be32 +nfsd4_decode_authsys_parms(struct nfsd4_compoundargs *argp, + struct nfsd4_cb_sec *cbs) { - DECODE_HEAD; - struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); - u32 dummy, uid, gid; - char *machine_name; - int i; - int nr_secflavs; + u32 stamp, gidcount, uid, gid; + __be32 *p, status; + + if (xdr_stream_decode_u32(argp->xdr, &stamp) < 0) + return nfserr_bad_xdr; + /* machine name */ + status = nfsd4_decode_ignored_string(argp, 255); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &uid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gidcount) < 0) + return nfserr_bad_xdr; + if (gidcount > 16) + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, gidcount << 2); + if (!p) + return nfserr_bad_xdr; + if (cbs->flavor == (u32)(-1)) { + struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); + + kuid_t kuid = make_kuid(userns, uid); + kgid_t kgid = make_kgid(userns, gid); + if (uid_valid(kuid) && gid_valid(kgid)) { + cbs->uid = kuid; + cbs->gid = kgid; + cbs->flavor = RPC_AUTH_UNIX; + } else { + dprintk("RPC_AUTH_UNIX with invalid uid or gid, ignoring!\n"); + } + } + + return nfs_ok; +} + +static __be32 +nfsd4_decode_gss_cb_handles4(struct nfsd4_compoundargs *argp, + struct nfsd4_cb_sec *cbs) +{ + __be32 status; + u32 service; + + dprintk("RPC_AUTH_GSS callback secflavor not supported!\n"); + + if (xdr_stream_decode_u32(argp->xdr, &service) < 0) + return nfserr_bad_xdr; + if (service < RPC_GSS_SVC_NONE || service > RPC_GSS_SVC_PRIVACY) + return nfserr_bad_xdr; + /* gcbp_handle_from_server */ + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + /* gcbp_handle_from_client */ + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + + return nfs_ok; +} + +/* a counted array of callback_sec_parms4 items */ +static __be32 +nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) +{ + u32 i, secflavor, nr_secflavs; + __be32 status; /* callback_sec_params4 */ - READ_BUF(4); - nr_secflavs = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &nr_secflavs) < 0) + return nfserr_bad_xdr; if (nr_secflavs) cbs->flavor = (u32)(-1); else /* Is this legal? Be generous, take it to mean AUTH_NONE: */ cbs->flavor = 0; + for (i = 0; i < nr_secflavs; ++i) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - switch (dummy) { + if (xdr_stream_decode_u32(argp->xdr, &secflavor) < 0) + return nfserr_bad_xdr; + switch (secflavor) { case RPC_AUTH_NULL: - /* Nothing to read */ + /* void */ if (cbs->flavor == (u32)(-1)) cbs->flavor = RPC_AUTH_NULL; break; case RPC_AUTH_UNIX: - READ_BUF(8); - /* stamp */ - dummy = be32_to_cpup(p++); - - /* machine name */ - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - SAVEMEM(machine_name, dummy); - - /* uid, gid */ - READ_BUF(8); - uid = be32_to_cpup(p++); - gid = be32_to_cpup(p++); - - /* more gids */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - if (cbs->flavor == (u32)(-1)) { - kuid_t kuid = make_kuid(userns, uid); - kgid_t kgid = make_kgid(userns, gid); - if (uid_valid(kuid) && gid_valid(kgid)) { - cbs->uid = kuid; - cbs->gid = kgid; - cbs->flavor = RPC_AUTH_UNIX; - } else { - dprintk("RPC_AUTH_UNIX with invalid" - "uid or gid ignoring!\n"); - } - } + status = nfsd4_decode_authsys_parms(argp, cbs); + if (status) + return status; break; case RPC_AUTH_GSS: - dprintk("RPC_AUTH_GSS callback secflavor " - "not supported!\n"); - READ_BUF(8); - /* gcbp_service */ - dummy = be32_to_cpup(p++); - /* gcbp_handle_from_server */ - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - /* gcbp_handle_from_client */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); + status = nfsd4_decode_gss_cb_handles4(argp, cbs); + if (status) + return status; break; default: - dprintk("Illegal callback secflavor\n"); return nfserr_inval; } } - DECODE_TAIL; + + return nfs_ok; } + /* * NFSv4 operation argument decoders */ From 7d2108407466c644ad43836d44179b4f81a6b840 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:14:35 -0500 Subject: [PATCH 0361/1565] NFSD: Replace READ* macros in nfsd4_decode_backchannel_ctl() [ Upstream commit 0f81d96098f8eb707afe2f8d5c3fe0f9316ef5ce ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6f3c86bee621..efd1504cd02b 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -788,17 +788,6 @@ nfsd4_decode_access(struct nfsd4_compoundargs *argp, return nfs_ok; } -static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) -{ - DECODE_HEAD; - - READ_BUF(4); - bc->bc_cb_program = be32_to_cpup(p++); - nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) { DECODE_HEAD; @@ -1483,6 +1472,13 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel return nfs_ok; } +static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) +{ + if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) From b73633804246be23ef596abab24598e86995ccfd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 13:16:23 -0500 Subject: [PATCH 0362/1565] NFSD: Replace READ* macros in nfsd4_decode_bind_conn_to_session() [ Upstream commit 571e0451c4de0a545960ffaea16d969931afc563 ] A dedicated sessionid4 decoder is introduced that will be used by other operation decoders in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 41 +++++++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index efd1504cd02b..5dad32ab02ec 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -664,6 +664,19 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); } +static __be32 +nfsd4_decode_sessionid4(struct nfsd4_compoundargs *argp, + struct nfs4_sessionid *sessionid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_MAX_SESSIONID_LEN); + if (!p) + return nfserr_bad_xdr; + memcpy(sessionid->data, p, sizeof(sessionid->data)); + return nfs_ok; +} + /* Defined in Appendix A of RFC 5531 */ static __be32 nfsd4_decode_authsys_parms(struct nfsd4_compoundargs *argp, @@ -788,18 +801,6 @@ nfsd4_decode_access(struct nfsd4_compoundargs *argp, return nfs_ok; } -static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) -{ - DECODE_HEAD; - - READ_BUF(NFS4_MAX_SESSIONID_LEN + 8); - COPYMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN); - bcts->dir = be32_to_cpup(p++); - /* XXX: skipping ctsa_use_conn_in_rdma_mode. Perhaps Tom Tucker - * could help us figure out we should be using it. */ - DECODE_TAIL; -} - static __be32 nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) { @@ -1479,6 +1480,22 @@ static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, stru return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); } +static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) +{ + u32 use_conn_in_rdma_mode; + __be32 status; + + status = nfsd4_decode_sessionid4(argp, &bcts->sessionid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &bcts->dir) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &use_conn_in_rdma_mode) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) From cb568dbdef68e036ff271f9aaecb29824d4133f8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 11:13:00 -0500 Subject: [PATCH 0363/1565] NFSD: Add a separate decoder to handle state_protect_ops [ Upstream commit 2548aa784d760567c2a77cbd8b7c55b211167c37 ] Refactor for clarity and de-duplication of code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 66 +++++++++++++++++------------------------------ 1 file changed, 23 insertions(+), 43 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5dad32ab02ec..15535b14328e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -315,32 +315,6 @@ nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) return nfs_ok; } -static __be32 -nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) -{ - u32 bmlen; - DECODE_HEAD; - - bmval[0] = 0; - bmval[1] = 0; - bmval[2] = 0; - - READ_BUF(4); - bmlen = be32_to_cpup(p++); - if (bmlen > 1000) - goto xdr_error; - - READ_BUF(bmlen << 2); - if (bmlen > 0) - bmval[0] = be32_to_cpup(p++); - if (bmlen > 1) - bmval[1] = be32_to_cpup(p++); - if (bmlen > 2) - bmval[2] = be32_to_cpup(p++); - - DECODE_TAIL; -} - /** * nfsd4_decode_bitmap4 - Decode an NFSv4 bitmap4 * @argp: NFSv4 compound argument structure @@ -1496,6 +1470,24 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_state_protect_ops(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + __be32 status; + + status = nfsd4_decode_bitmap4(argp, exid->spo_must_enforce, + ARRAY_SIZE(exid->spo_must_enforce)); + if (status) + return nfserr_bad_xdr; + status = nfsd4_decode_bitmap4(argp, exid->spo_must_allow, + ARRAY_SIZE(exid->spo_must_allow)); + if (status) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) @@ -1520,27 +1512,15 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, case SP4_NONE: break; case SP4_MACH_CRED: - /* spo_must_enforce */ - status = nfsd4_decode_bitmap(argp, - exid->spo_must_enforce); + status = nfsd4_decode_state_protect_ops(argp, exid); if (status) - goto out; - /* spo_must_allow */ - status = nfsd4_decode_bitmap(argp, exid->spo_must_allow); - if (status) - goto out; + return status; break; case SP4_SSV: /* ssp_ops */ - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - p += dummy; - - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy * 4); - p += dummy; + status = nfsd4_decode_state_protect_ops(argp, exid); + if (status) + return status; /* ssp_hash_algs<> */ READ_BUF(4); From 0c935af3cfb73065308c62a03118726c92567d05 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 11:17:50 -0500 Subject: [PATCH 0364/1565] NFSD: Add a separate decoder for ssv_sp_parms [ Upstream commit 547bfeb4cd8d491aabbd656d5a6f410cb4249b4e ] Refactor for clarity. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 70 +++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 26 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 15535b14328e..8c5701367e4a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1488,12 +1488,54 @@ nfsd4_decode_state_protect_ops(struct nfsd4_compoundargs *argp, return nfs_ok; } +/* + * This implementation currently does not support SP4_SSV. + * This decoder simply skips over these arguments. + */ +static noinline __be32 +nfsd4_decode_ssv_sp_parms(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + u32 count, window, num_gss_handles; + __be32 status; + + /* ssp_ops */ + status = nfsd4_decode_state_protect_ops(argp, exid); + if (status) + return status; + + /* ssp_hash_algs<> */ + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + while (count--) { + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + } + + /* ssp_encr_algs<> */ + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + while (count--) { + status = nfsd4_decode_ignored_string(argp, 0); + if (status) + return status; + } + + if (xdr_stream_decode_u32(argp->xdr, &window) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &num_gss_handles) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) { - int dummy, tmp; DECODE_HEAD; + int dummy; READ_BUF(NFS4_VERIFIER_SIZE); COPYMEM(exid->verifier.data, NFS4_VERIFIER_SIZE); @@ -1517,33 +1559,9 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return status; break; case SP4_SSV: - /* ssp_ops */ - status = nfsd4_decode_state_protect_ops(argp, exid); + status = nfsd4_decode_ssv_sp_parms(argp, exid); if (status) return status; - - /* ssp_hash_algs<> */ - READ_BUF(4); - tmp = be32_to_cpup(p++); - while (tmp--) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - } - - /* ssp_encr_algs<> */ - READ_BUF(4); - tmp = be32_to_cpup(p++); - while (tmp--) { - READ_BUF(4); - dummy = be32_to_cpup(p++); - READ_BUF(dummy); - p += XDR_QUADLEN(dummy); - } - - /* ignore ssp_window and ssp_num_gss_handles: */ - READ_BUF(8); break; default: goto xdr_error; From d50a76f1f3fcd3407d135eeaefdf7e572abbeb0c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 2 Nov 2020 15:19:12 -0500 Subject: [PATCH 0365/1565] NFSD: Add a helper to decode state_protect4_a [ Upstream commit 523ec6ed6fb80fd1537d748a06bffd060a8b3235 ] Refactor for clarity. Also, remove a stale comment. Commit ed94164398c9 ("nfsd: implement machine credential support for some operations") added support for SP4_MACH_CRED, so state_protect_a is no longer completely ignored. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 +- fs/nfsd/nfs4xdr.c | 44 +++++++++++++++++++++++++++----------------- fs/nfsd/xdr4.h | 2 +- 3 files changed, 29 insertions(+), 19 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d402ca0b535f..e7ec7593eaaa 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3089,7 +3089,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, rpc_ntop(sa, addr_str, sizeof(addr_str)); dprintk("%s rqstp=%p exid=%p clname.len=%u clname.data=%p " - "ip_addr=%s flags %x, spa_how %d\n", + "ip_addr=%s flags %x, spa_how %u\n", __func__, rqstp, exid, exid->clname.len, exid->clname.data, addr_str, exid->flags, exid->spa_how); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8c5701367e4a..6a4ab81e01ff 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1531,25 +1531,13 @@ nfsd4_decode_ssv_sp_parms(struct nfsd4_compoundargs *argp, } static __be32 -nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) +nfsd4_decode_state_protect4_a(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) { - DECODE_HEAD; - int dummy; + __be32 status; - READ_BUF(NFS4_VERIFIER_SIZE); - COPYMEM(exid->verifier.data, NFS4_VERIFIER_SIZE); - - status = nfsd4_decode_opaque(argp, &exid->clname); - if (status) + if (xdr_stream_decode_u32(argp->xdr, &exid->spa_how) < 0) return nfserr_bad_xdr; - - READ_BUF(4); - exid->flags = be32_to_cpup(p++); - - /* Ignore state_protect4_a */ - READ_BUF(4); - exid->spa_how = be32_to_cpup(p++); switch (exid->spa_how) { case SP4_NONE: break; @@ -1564,9 +1552,31 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return status; break; default: - goto xdr_error; + return nfserr_bad_xdr; } + return nfs_ok; +} + +static __be32 +nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + DECODE_HEAD; + int dummy; + + status = nfsd4_decode_verifier4(argp, &exid->verifier); + if (status) + return status; + status = nfsd4_decode_opaque(argp, &exid->clname); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &exid->flags) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_state_protect4_a(argp, exid); + if (status) + return status; + READ_BUF(4); /* nfs_impl_id4 array length */ dummy = be32_to_cpup(p++); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 6245004a9993..232529bc1b79 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -433,7 +433,7 @@ struct nfsd4_exchange_id { u32 flags; clientid_t clientid; u32 seqid; - int spa_how; + u32 spa_how; u32 spo_must_enforce[3]; u32 spo_must_allow[3]; struct xdr_netobj nii_domain; From d01f41320d2ae7bdbb24fa935dfe5f6887abfa17 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 15:21:55 -0500 Subject: [PATCH 0366/1565] NFSD: Add a helper to decode nfs_impl_id4 [ Upstream commit 10ff84228197f47401833495ba19a50131323b4a ] Refactor for clarity. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 63 ++++++++++++++++++++++++++++------------------- 1 file changed, 38 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 6a4ab81e01ff..e06e657c3d91 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1558,12 +1558,47 @@ nfsd4_decode_state_protect4_a(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_nfs_impl_id4(struct nfsd4_compoundargs *argp, + struct nfsd4_exchange_id *exid) +{ + __be32 status; + u32 count; + + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; + switch (count) { + case 0: + break; + case 1: + /* Note that RFC 8881 places no length limit on + * nii_domain, but this implementation permits no + * more than NFS4_OPAQUE_LIMIT bytes */ + status = nfsd4_decode_opaque(argp, &exid->nii_domain); + if (status) + return status; + /* Note that RFC 8881 places no length limit on + * nii_name, but this implementation permits no + * more than NFS4_OPAQUE_LIMIT bytes */ + status = nfsd4_decode_opaque(argp, &exid->nii_name); + if (status) + return status; + status = nfsd4_decode_nfstime4(argp, &exid->nii_time); + if (status) + return status; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, struct nfsd4_exchange_id *exid) { - DECODE_HEAD; - int dummy; + __be32 status; status = nfsd4_decode_verifier4(argp, &exid->verifier); if (status) @@ -1576,29 +1611,7 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, status = nfsd4_decode_state_protect4_a(argp, exid); if (status) return status; - - READ_BUF(4); /* nfs_impl_id4 array length */ - dummy = be32_to_cpup(p++); - - if (dummy > 1) - goto xdr_error; - - if (dummy == 1) { - status = nfsd4_decode_opaque(argp, &exid->nii_domain); - if (status) - goto xdr_error; - - /* nii_name */ - status = nfsd4_decode_opaque(argp, &exid->nii_name); - if (status) - goto xdr_error; - - /* nii_date */ - status = nfsd4_decode_time(argp, &exid->nii_time); - if (status) - goto xdr_error; - } - DECODE_TAIL; + return nfsd4_decode_nfs_impl_id4(argp, exid); } static __be32 From 2bd9ef494a2c5fa0a5ec6fae657c844fcc7885c1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 15:35:05 -0500 Subject: [PATCH 0367/1565] NFSD: Add a helper to decode channel_attrs4 [ Upstream commit 3a3f1fbacb0960b628e5a9f07c78287312f7a99d ] De-duplicate some code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 71 +++++++++++++++++++++++++---------------------- 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index e06e657c3d91..716a16961ff4 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1614,6 +1614,38 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, return nfsd4_decode_nfs_impl_id4(argp, exid); } +static __be32 +nfsd4_decode_channel_attrs4(struct nfsd4_compoundargs *argp, + struct nfsd4_channel_attrs *ca) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 7); + if (!p) + return nfserr_bad_xdr; + + /* headerpadsz is ignored */ + p++; + ca->maxreq_sz = be32_to_cpup(p++); + ca->maxresp_sz = be32_to_cpup(p++); + ca->maxresp_cached = be32_to_cpup(p++); + ca->maxops = be32_to_cpup(p++); + ca->maxreqs = be32_to_cpup(p++); + ca->nr_rdma_attrs = be32_to_cpup(p); + switch (ca->nr_rdma_attrs) { + case 0: + break; + case 1: + if (xdr_stream_decode_u32(argp->xdr, &ca->rdma_attrs) < 0) + return nfserr_bad_xdr; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, struct nfsd4_create_session *sess) @@ -1625,39 +1657,12 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, sess->seqid = be32_to_cpup(p++); sess->flags = be32_to_cpup(p++); - /* Fore channel attrs */ - READ_BUF(28); - p++; /* headerpadsz is always 0 */ - sess->fore_channel.maxreq_sz = be32_to_cpup(p++); - sess->fore_channel.maxresp_sz = be32_to_cpup(p++); - sess->fore_channel.maxresp_cached = be32_to_cpup(p++); - sess->fore_channel.maxops = be32_to_cpup(p++); - sess->fore_channel.maxreqs = be32_to_cpup(p++); - sess->fore_channel.nr_rdma_attrs = be32_to_cpup(p++); - if (sess->fore_channel.nr_rdma_attrs == 1) { - READ_BUF(4); - sess->fore_channel.rdma_attrs = be32_to_cpup(p++); - } else if (sess->fore_channel.nr_rdma_attrs > 1) { - dprintk("Too many fore channel attr bitmaps!\n"); - goto xdr_error; - } - - /* Back channel attrs */ - READ_BUF(28); - p++; /* headerpadsz is always 0 */ - sess->back_channel.maxreq_sz = be32_to_cpup(p++); - sess->back_channel.maxresp_sz = be32_to_cpup(p++); - sess->back_channel.maxresp_cached = be32_to_cpup(p++); - sess->back_channel.maxops = be32_to_cpup(p++); - sess->back_channel.maxreqs = be32_to_cpup(p++); - sess->back_channel.nr_rdma_attrs = be32_to_cpup(p++); - if (sess->back_channel.nr_rdma_attrs == 1) { - READ_BUF(4); - sess->back_channel.rdma_attrs = be32_to_cpup(p++); - } else if (sess->back_channel.nr_rdma_attrs > 1) { - dprintk("Too many back channel attr bitmaps!\n"); - goto xdr_error; - } + status = nfsd4_decode_channel_attrs4(argp, &sess->fore_channel); + if (status) + return status; + status = nfsd4_decode_channel_attrs4(argp, &sess->back_channel); + if (status) + return status; READ_BUF(4); sess->callback_prog = be32_to_cpup(p++); From 73684a8118f35eead4ceae01feace36a41a34a11 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:52:44 -0500 Subject: [PATCH 0368/1565] NFSD: Replace READ* macros in nfsd4_decode_create_session() [ Upstream commit 81243e3fe37ed547fc4ed8aab1cec2865540bb18 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 716a16961ff4..3e2e0de00c3a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1650,24 +1650,28 @@ static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, struct nfsd4_create_session *sess) { - DECODE_HEAD; - - READ_BUF(16); - COPYMEM(&sess->clientid, 8); - sess->seqid = be32_to_cpup(p++); - sess->flags = be32_to_cpup(p++); + __be32 status; + status = nfsd4_decode_clientid4(argp, &sess->clientid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &sess->seqid) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &sess->flags) < 0) + return nfserr_bad_xdr; status = nfsd4_decode_channel_attrs4(argp, &sess->fore_channel); if (status) return status; status = nfsd4_decode_channel_attrs4(argp, &sess->back_channel); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &sess->callback_prog) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_cb_sec(argp, &sess->cb_sec); if (status) return status; - READ_BUF(4); - sess->callback_prog = be32_to_cpup(p++); - nfsd4_decode_cb_sec(argp, &sess->cb_sec); - DECODE_TAIL; + return nfs_ok; } static __be32 From 92ae309a9908b63976be92ec62fe3051abbb39d4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 13:50:55 -0500 Subject: [PATCH 0369/1565] NFSD: Replace READ* macros in nfsd4_decode_destroy_session() [ Upstream commit 94e254af1f873b4b551db4c4549294f2c4d385ef ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3e2e0de00c3a..7a0730688b2f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1678,11 +1678,7 @@ static __be32 nfsd4_decode_destroy_session(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_session *destroy_session) { - DECODE_HEAD; - READ_BUF(NFS4_MAX_SESSIONID_LEN); - COPYMEM(destroy_session->sessionid.data, NFS4_MAX_SESSIONID_LEN); - - DECODE_TAIL; + return nfsd4_decode_sessionid4(argp, &destroy_session->sessionid); } static __be32 From 5f892c11787e12d1fcf13ceb45d66059cd056724 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 1 Nov 2020 13:38:27 -0500 Subject: [PATCH 0370/1565] NFSD: Replace READ* macros in nfsd4_decode_free_stateid() [ Upstream commit aec387d5909304810d899f7d90ae57df33f3a75c ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 7a0730688b2f..92988926d954 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1685,13 +1685,7 @@ static __be32 nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_free_stateid *free_stateid) { - DECODE_HEAD; - - READ_BUF(sizeof(stateid_t)); - free_stateid->fr_stateid.si_generation = be32_to_cpup(p++); - COPYMEM(&free_stateid->fr_stateid.si_opaque, sizeof(stateid_opaque_t)); - - DECODE_TAIL; + return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } static __be32 From c0a4c4e46b8ab28ec7b33fa82e9d775d6e1e8f80 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 15:03:50 -0500 Subject: [PATCH 0371/1565] NFSD: Replace READ* macros in nfsd4_decode_getdeviceinfo() [ Upstream commit 044959715f370b24870c95df3940add8710c5a29 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 48 +++++++++++++++++++++++++++-------------------- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 92988926d954..11e32c244b23 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -638,6 +638,21 @@ nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, return nfsd4_decode_opaque(argp, owner); } +#ifdef CONFIG_NFSD_PNFS +static __be32 +nfsd4_decode_deviceid4(struct nfsd4_compoundargs *argp, + struct nfsd4_deviceid *devid) +{ + __be32 *p; + + p = xdr_inline_decode(argp->xdr, NFS4_DEVICEID4_SIZE); + if (!p) + return nfserr_bad_xdr; + memcpy(devid, p, sizeof(*devid)); + return nfs_ok; +} +#endif /* CONFIG_NFSD_PNFS */ + static __be32 nfsd4_decode_sessionid4(struct nfsd4_compoundargs *argp, struct nfs4_sessionid *sessionid) @@ -1765,27 +1780,20 @@ static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, struct nfsd4_getdeviceinfo *gdev) { - DECODE_HEAD; - u32 num, i; + __be32 status; - READ_BUF(sizeof(struct nfsd4_deviceid) + 3 * 4); - COPYMEM(&gdev->gd_devid, sizeof(struct nfsd4_deviceid)); - gdev->gd_layout_type = be32_to_cpup(p++); - gdev->gd_maxcount = be32_to_cpup(p++); - num = be32_to_cpup(p++); - if (num) { - if (num > 1000) - goto xdr_error; - READ_BUF(4 * num); - gdev->gd_notify_types = be32_to_cpup(p++); - for (i = 1; i < num; i++) { - if (be32_to_cpup(p++)) { - status = nfserr_inval; - goto out; - } - } - } - DECODE_TAIL; + status = nfsd4_decode_deviceid4(argp, &gdev->gd_devid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_maxcount) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_uint32_array(argp->xdr, + &gdev->gd_notify_types, 1) < 0) + return nfserr_bad_xdr; + + return nfs_ok; } static __be32 From 40770a0f8ef6fcb5c3baf91e9e8fbd11a9b1390b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:40:07 -0500 Subject: [PATCH 0372/1565] NFSD: Replace READ* macros in nfsd4_decode_layoutcommit() [ Upstream commit 5185980d8a23001c2317c290129ab7ab20067e20 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 120 ++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 62 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 11e32c244b23..9cd7270e67ad 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -274,20 +274,6 @@ nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) return nfs_ok; } -static __be32 -nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) -{ - DECODE_HEAD; - - READ_BUF(12); - p = xdr_decode_hyper(p, &tv->tv_sec); - tv->tv_nsec = be32_to_cpup(p++); - if (tv->tv_nsec >= (u32)1000000000) - return nfserr_inval; - - DECODE_TAIL; -} - static __be32 nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) { @@ -651,6 +637,29 @@ nfsd4_decode_deviceid4(struct nfsd4_compoundargs *argp, memcpy(devid, p, sizeof(*devid)); return nfs_ok; } + +static __be32 +nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutcommit *lcp) +{ + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_layout_type) < 0) + return nfserr_bad_xdr; + if (lcp->lc_layout_type < LAYOUT_NFSV4_1_FILES) + return nfserr_bad_xdr; + if (lcp->lc_layout_type >= LAYOUT_TYPE_MAX) + return nfserr_bad_xdr; + + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_up_len) < 0) + return nfserr_bad_xdr; + if (lcp->lc_up_len > 0) { + lcp->lc_up_layout = xdr_inline_decode(argp->xdr, lcp->lc_up_len); + if (!lcp->lc_up_layout) + return nfserr_bad_xdr; + } + + return nfs_ok; +} + #endif /* CONFIG_NFSD_PNFS */ static __be32 @@ -1796,6 +1805,41 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutcommit *lcp) +{ + __be32 *p, status; + + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.length) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_bool(argp->xdr, &lcp->lc_reclaim) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lcp->lc_sid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_newoffset) < 0) + return nfserr_bad_xdr; + if (lcp->lc_newoffset) { + if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_last_wr) < 0) + return nfserr_bad_xdr; + } else + lcp->lc_last_wr = 0; + p = xdr_inline_decode(argp->xdr, XDR_UNIT); + if (!p) + return nfserr_bad_xdr; + if (xdr_item_is_present(p)) { + status = nfsd4_decode_nfstime4(argp, &lcp->lc_mtime); + if (status) + return status; + } else { + lcp->lc_mtime.tv_nsec = UTIME_NOW; + } + return nfsd4_decode_layoutupdate4(argp, lcp); +} + static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, struct nfsd4_layoutget *lgp) @@ -1820,54 +1864,6 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, DECODE_TAIL; } -static __be32 -nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutcommit *lcp) -{ - DECODE_HEAD; - u32 timechange; - - READ_BUF(20); - p = xdr_decode_hyper(p, &lcp->lc_seg.offset); - p = xdr_decode_hyper(p, &lcp->lc_seg.length); - lcp->lc_reclaim = be32_to_cpup(p++); - - status = nfsd4_decode_stateid(argp, &lcp->lc_sid); - if (status) - return status; - - READ_BUF(4); - lcp->lc_newoffset = be32_to_cpup(p++); - if (lcp->lc_newoffset) { - READ_BUF(8); - p = xdr_decode_hyper(p, &lcp->lc_last_wr); - } else - lcp->lc_last_wr = 0; - READ_BUF(4); - timechange = be32_to_cpup(p++); - if (timechange) { - status = nfsd4_decode_time(argp, &lcp->lc_mtime); - if (status) - return status; - } else { - lcp->lc_mtime.tv_nsec = UTIME_NOW; - } - READ_BUF(8); - lcp->lc_layout_type = be32_to_cpup(p++); - - /* - * Save the layout update in XDR format and let the layout driver deal - * with it later. - */ - lcp->lc_up_len = be32_to_cpup(p++); - if (lcp->lc_up_len > 0) { - READ_BUF(lcp->lc_up_len); - READMEM(lcp->lc_up_layout, lcp->lc_up_len); - } - - DECODE_TAIL; -} - static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp) From 40e627c502da12cd5b670a92606bfbb2693602e0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 15:06:04 -0500 Subject: [PATCH 0373/1565] NFSD: Replace READ* macros in nfsd4_decode_layoutget() [ Upstream commit c8e88e3aa73889421461f878cd569ef84f231ceb ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9cd7270e67ad..837d2d8fb3ff 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1844,24 +1844,27 @@ static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, struct nfsd4_layoutget *lgp) { - DECODE_HEAD; + __be32 status; - READ_BUF(36); - lgp->lg_signal = be32_to_cpup(p++); - lgp->lg_layout_type = be32_to_cpup(p++); - lgp->lg_seg.iomode = be32_to_cpup(p++); - p = xdr_decode_hyper(p, &lgp->lg_seg.offset); - p = xdr_decode_hyper(p, &lgp->lg_seg.length); - p = xdr_decode_hyper(p, &lgp->lg_minlength); - - status = nfsd4_decode_stateid(argp, &lgp->lg_sid); + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_signal) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_seg.iomode) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.length) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_minlength) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lgp->lg_sid); if (status) return status; + if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_maxcount) < 0) + return nfserr_bad_xdr; - READ_BUF(4); - lgp->lg_maxcount = be32_to_cpup(p++); - - DECODE_TAIL; + return nfs_ok; } static __be32 From b63e313dce04349876c577bc40fd705bbaf0b523 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:42:25 -0500 Subject: [PATCH 0374/1565] NFSD: Replace READ* macros in nfsd4_decode_layoutreturn() [ Upstream commit 645fcad371420913c30e9aca80fc0a38f3acf432 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 72 +++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 28 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 837d2d8fb3ff..ae70070a5821 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -660,6 +660,43 @@ nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_layoutreturn4(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutreturn *lrp) +{ + __be32 status; + + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_return_type) < 0) + return nfserr_bad_xdr; + switch (lrp->lr_return_type) { + case RETURN_FILE: + if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.length) < 0) + return nfserr_bad_xdr; + status = nfsd4_decode_stateid4(argp, &lrp->lr_sid); + if (status) + return status; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lrf_body_len) < 0) + return nfserr_bad_xdr; + if (lrp->lrf_body_len > 0) { + lrp->lrf_body = xdr_inline_decode(argp->xdr, lrp->lrf_body_len); + if (!lrp->lrf_body) + return nfserr_bad_xdr; + } + break; + case RETURN_FSID: + case RETURN_ALL: + lrp->lr_seg.offset = 0; + lrp->lr_seg.length = NFS4_MAX_UINT64; + break; + default: + return nfserr_bad_xdr; + } + + return nfs_ok; +} + #endif /* CONFIG_NFSD_PNFS */ static __be32 @@ -1871,34 +1908,13 @@ static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp) { - DECODE_HEAD; - - READ_BUF(16); - lrp->lr_reclaim = be32_to_cpup(p++); - lrp->lr_layout_type = be32_to_cpup(p++); - lrp->lr_seg.iomode = be32_to_cpup(p++); - lrp->lr_return_type = be32_to_cpup(p++); - if (lrp->lr_return_type == RETURN_FILE) { - READ_BUF(16); - p = xdr_decode_hyper(p, &lrp->lr_seg.offset); - p = xdr_decode_hyper(p, &lrp->lr_seg.length); - - status = nfsd4_decode_stateid(argp, &lrp->lr_sid); - if (status) - return status; - - READ_BUF(4); - lrp->lrf_body_len = be32_to_cpup(p++); - if (lrp->lrf_body_len > 0) { - READ_BUF(lrp->lrf_body_len); - READMEM(lrp->lrf_body, lrp->lrf_body_len); - } - } else { - lrp->lr_seg.offset = 0; - lrp->lr_seg.length = NFS4_MAX_UINT64; - } - - DECODE_TAIL; + if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_layout_type) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_seg.iomode) < 0) + return nfserr_bad_xdr; + return nfsd4_decode_layoutreturn4(argp, lrp); } #endif /* CONFIG_NFSD_PNFS */ From 1ea743dc481f14c0b08bafff83596f4ad6127a8c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:33:12 -0500 Subject: [PATCH 0375/1565] NFSD: Replace READ* macros in nfsd4_decode_secinfo_no_name() [ Upstream commit 53d70873e37c09a582167ed73d1858e3a2af0157 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ae70070a5821..0561b4385583 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1356,17 +1356,6 @@ nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); } -static __be32 -nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo_no_name *sin) -{ - DECODE_HEAD; - - READ_BUF(4); - sin->sin_style = be32_to_cpup(p++); - DECODE_TAIL; -} - static __be32 nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *setattr) { @@ -1918,6 +1907,14 @@ nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, } #endif /* CONFIG_NFSD_PNFS */ +static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, + struct nfsd4_secinfo_no_name *sin) +{ + if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) From f0de0b6895490b15121b12dcadc92310139c75f8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:55:19 -0500 Subject: [PATCH 0376/1565] NFSD: Replace READ* macros in nfsd4_decode_sequence() [ Upstream commit cf907b11326d9360877d6c6ea8f75e1b29f39f2f ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 0561b4385583..3fe0d0228c4a 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,22 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } -static __be32 -nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, - struct nfsd4_sequence *seq) -{ - DECODE_HEAD; - - READ_BUF(NFS4_MAX_SESSIONID_LEN + 16); - COPYMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN); - seq->seqid = be32_to_cpup(p++); - seq->slotid = be32_to_cpup(p++); - seq->maxslots = be32_to_cpup(p++); - seq->cachethis = be32_to_cpup(p++); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) { @@ -1915,6 +1899,26 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, + struct nfsd4_sequence *seq) +{ + __be32 *p, status; + + status = nfsd4_decode_sessionid4(argp, &seq->sessionid); + if (status) + return status; + p = xdr_inline_decode(argp->xdr, XDR_UNIT * 4); + if (!p) + return nfserr_bad_xdr; + seq->seqid = be32_to_cpup(p++); + seq->slotid = be32_to_cpup(p++); + seq->maxslots = be32_to_cpup(p++); + seq->cachethis = be32_to_cpup(p); + + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) From 7675420fdebea540b5b7672733c7630876fef382 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 14:57:44 -0500 Subject: [PATCH 0377/1565] NFSD: Replace READ* macros in nfsd4_decode_test_stateid() [ Upstream commit b7a0c8f6e741bf9dee0d24e69d3ce51fa4ccce78 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 61 +++++++++++++++++++---------------------------- 1 file changed, 25 insertions(+), 36 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3fe0d0228c4a..9642de155043 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,42 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } -static __be32 -nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) -{ - int i; - __be32 *p, status; - struct nfsd4_test_stateid_id *stateid; - - READ_BUF(4); - test_stateid->ts_num_ids = ntohl(*p++); - - INIT_LIST_HEAD(&test_stateid->ts_stateid_list); - - for (i = 0; i < test_stateid->ts_num_ids; i++) { - stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); - if (!stateid) { - status = nfserrno(-ENOMEM); - goto out; - } - - INIT_LIST_HEAD(&stateid->ts_id_list); - list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); - - status = nfsd4_decode_stateid(argp, &stateid->ts_id_stateid); - if (status) - goto out; - } - - status = 0; -out: - return status; -xdr_error: - dprintk("NFSD: xdr error (%s:%d)\n", __FILE__, __LINE__); - status = nfserr_bad_xdr; - goto out; -} - static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_clientid *dc) { DECODE_HEAD; @@ -1919,6 +1883,31 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, return nfs_ok; } +static __be32 +nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) +{ + struct nfsd4_test_stateid_id *stateid; + __be32 status; + u32 i; + + if (xdr_stream_decode_u32(argp->xdr, &test_stateid->ts_num_ids) < 0) + return nfserr_bad_xdr; + + INIT_LIST_HEAD(&test_stateid->ts_stateid_list); + for (i = 0; i < test_stateid->ts_num_ids; i++) { + stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); + if (!stateid) + return nfserrno(-ENOMEM); /* XXX: not jukebox? */ + INIT_LIST_HEAD(&stateid->ts_id_list); + list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); + status = nfsd4_decode_stateid4(argp, &stateid->ts_id_stateid); + if (status) + return status; + } + + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) From 093f9d2c8f4cd4b42d3725a4e69af72762f54b05 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 15:15:09 -0500 Subject: [PATCH 0378/1565] NFSD: Replace READ* macros in nfsd4_decode_destroy_clientid() [ Upstream commit c95f2ec3490586cbb33badc8f4c82d6aa4955078 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 9642de155043..d0f0b7cd4e74 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,16 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } -static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_clientid *dc) -{ - DECODE_HEAD; - - READ_BUF(8); - COPYMEM(&dc->clientid, 8); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, struct nfsd4_reclaim_complete *rc) { DECODE_HEAD; @@ -1908,6 +1898,12 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta return nfs_ok; } +static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, + struct nfsd4_destroy_clientid *dc) +{ + return nfsd4_decode_clientid4(argp, &dc->clientid); +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) From de0dc37a791e92f945cf62c6dafd19aa830192bb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 3 Nov 2020 15:02:11 -0500 Subject: [PATCH 0379/1565] NFSD: Replace READ* macros in nfsd4_decode_reclaim_complete() [ Upstream commit 0d6467844d437e07db1e76d96176b1a55401018c ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d0f0b7cd4e74..2c684f7e7465 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1738,16 +1738,6 @@ nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } -static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, struct nfsd4_reclaim_complete *rc) -{ - DECODE_HEAD; - - READ_BUF(4); - rc->rca_one_fs = be32_to_cpup(p++); - - DECODE_TAIL; -} - #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, @@ -1904,6 +1894,14 @@ static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, return nfsd4_decode_clientid4(argp, &dc->clientid); } +static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, + struct nfsd4_reclaim_complete *rc) +{ + if (xdr_stream_decode_bool(argp->xdr, &rc->rca_one_fs) < 0) + return nfserr_bad_xdr; + return nfs_ok; +} + static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) From a0b8dabc5906e92ab24221ceffb2653569532c0d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:44:05 -0500 Subject: [PATCH 0380/1565] NFSD: Replace READ* macros in nfsd4_decode_fallocate() [ Upstream commit 6aef27aaeae7611f98af08205acc79f5a8f3aa59 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2c684f7e7465..c8506bb6d872 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1906,17 +1906,17 @@ static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, struct nfsd4_fallocate *fallocate) { - DECODE_HEAD; + __be32 status; - status = nfsd4_decode_stateid(argp, &fallocate->falloc_stateid); + status = nfsd4_decode_stateid4(argp, &fallocate->falloc_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_length) < 0) + return nfserr_bad_xdr; - READ_BUF(16); - p = xdr_decode_hyper(p, &fallocate->falloc_offset); - xdr_decode_hyper(p, &fallocate->falloc_length); - - DECODE_TAIL; + return nfs_ok; } static __be32 From dc1a31ca8e96f4c0362c8d7da10c98f757061289 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 16 Nov 2020 18:05:06 -0500 Subject: [PATCH 0381/1565] NFSD: Replace READ* macros in nfsd4_decode_nl4_server() [ Upstream commit f49e4b4d58cc835d8bd0cc9663f7b9c5497e0e7e ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index c8506bb6d872..05aa36f92a92 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1941,36 +1941,42 @@ nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, struct nl4_server *ns) { - DECODE_HEAD; struct nfs42_netaddr *naddr; + __be32 *p; - READ_BUF(4); - ns->nl4_type = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &ns->nl4_type) < 0) + return nfserr_bad_xdr; /* currently support for 1 inter-server source server */ switch (ns->nl4_type) { case NL4_NETADDR: naddr = &ns->u.nl4_addr; - READ_BUF(4); - naddr->netid_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &naddr->netid_len) < 0) + return nfserr_bad_xdr; if (naddr->netid_len > RPCBIND_MAXNETIDLEN) - goto xdr_error; + return nfserr_bad_xdr; - READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */ - COPYMEM(naddr->netid, naddr->netid_len); + p = xdr_inline_decode(argp->xdr, naddr->netid_len); + if (!p) + return nfserr_bad_xdr; + memcpy(naddr->netid, p, naddr->netid_len); - naddr->addr_len = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &naddr->addr_len) < 0) + return nfserr_bad_xdr; if (naddr->addr_len > RPCBIND_MAXUADDRLEN) - goto xdr_error; + return nfserr_bad_xdr; - READ_BUF(naddr->addr_len); - COPYMEM(naddr->addr, naddr->addr_len); + p = xdr_inline_decode(argp->xdr, naddr->addr_len); + if (!p) + return nfserr_bad_xdr; + memcpy(naddr->addr, p, naddr->addr_len); break; default: - goto xdr_error; + return nfserr_bad_xdr; } - DECODE_TAIL; + + return nfs_ok; } static __be32 From 8604d294c1286267e5f1355c002a223723ffafea Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:49:37 -0500 Subject: [PATCH 0382/1565] NFSD: Replace READ* macros in nfsd4_decode_copy() [ Upstream commit e8febea7190bcbd1e608093acb67f2a5009556aa ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 41 ++++++++++++++++++++++------------------- fs/nfsd/xdr4.h | 2 +- 2 files changed, 23 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 05aa36f92a92..2529368cbbc0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1982,40 +1982,44 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) { - DECODE_HEAD; struct nl4_server *ns_dummy; - int i, count; + u32 consecutive, i, count; + __be32 status; - status = nfsd4_decode_stateid(argp, ©->cp_src_stateid); + status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); if (status) return status; - status = nfsd4_decode_stateid(argp, ©->cp_dst_stateid); + status = nfsd4_decode_stateid4(argp, ©->cp_dst_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_src_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_dst_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, ©->cp_count) < 0) + return nfserr_bad_xdr; + /* ca_consecutive: we always do consecutive copies */ + if (xdr_stream_decode_u32(argp->xdr, &consecutive) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, ©->cp_synchronous) < 0) + return nfserr_bad_xdr; - READ_BUF(8 + 8 + 8 + 4 + 4 + 4); - p = xdr_decode_hyper(p, ©->cp_src_pos); - p = xdr_decode_hyper(p, ©->cp_dst_pos); - p = xdr_decode_hyper(p, ©->cp_count); - p++; /* ca_consecutive: we always do consecutive copies */ - copy->cp_synchronous = be32_to_cpup(p++); - - count = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &count) < 0) + return nfserr_bad_xdr; copy->cp_intra = false; if (count == 0) { /* intra-server copy */ copy->cp_intra = true; - goto intra; + return nfs_ok; } - /* decode all the supplied server addresses but use first */ + /* decode all the supplied server addresses but use only the first */ status = nfsd4_decode_nl4_server(argp, ©->cp_src); if (status) return status; ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL); if (ns_dummy == NULL) - return nfserrno(-ENOMEM); + return nfserrno(-ENOMEM); /* XXX: jukebox? */ for (i = 0; i < count - 1; i++) { status = nfsd4_decode_nl4_server(argp, ns_dummy); if (status) { @@ -2024,9 +2028,8 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) } } kfree(ns_dummy); -intra: - DECODE_TAIL; + return nfs_ok; } static __be32 @@ -4792,7 +4795,7 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, __be32 *p; nfserr = nfsd42_encode_write_res(resp, ©->cp_res, - copy->cp_synchronous); + !!copy->cp_synchronous); if (nfserr) return nfserr; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 232529bc1b79..facc5762bf83 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -554,7 +554,7 @@ struct nfsd4_copy { bool cp_intra; /* both */ - bool cp_synchronous; + u32 cp_synchronous; /* response */ struct nfsd42_write_res cp_res; From c2d2a919b2f249c48eaad76f7c7a8bf3f9f46479 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 21 Nov 2020 14:19:24 -0500 Subject: [PATCH 0383/1565] NFSD: Replace READ* macros in nfsd4_decode_copy_notify() [ Upstream commit f9a953fb369bbd2135ccead3393ec1ef66544471 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2529368cbbc0..09aea361c175 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2032,25 +2032,25 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) return nfs_ok; } -static __be32 -nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, - struct nfsd4_offload_status *os) -{ - return nfsd4_decode_stateid(argp, &os->stateid); -} - static __be32 nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, struct nfsd4_copy_notify *cn) { __be32 status; - status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid); + status = nfsd4_decode_stateid4(argp, &cn->cpn_src_stateid); if (status) return status; return nfsd4_decode_nl4_server(argp, &cn->cpn_dst); } +static __be32 +nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, + struct nfsd4_offload_status *os) +{ + return nfsd4_decode_stateid(argp, &os->stateid); +} + static __be32 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) { From f8eb5424e3184908f9cd61fdb7fe48de1b0cef52 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 21 Nov 2020 14:21:25 -0500 Subject: [PATCH 0384/1565] NFSD: Replace READ* macros in nfsd4_decode_offload_status() [ Upstream commit 2846bb0525a73e00b3566fda535ea6a5879e2971 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 09aea361c175..101beb315d96 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2048,7 +2048,7 @@ static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, struct nfsd4_offload_status *os) { - return nfsd4_decode_stateid(argp, &os->stateid); + return nfsd4_decode_stateid4(argp, &os->stateid); } static __be32 From a2f6c16ad1383130300162ee548aeeac1f8da461 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:54:47 -0500 Subject: [PATCH 0385/1565] NFSD: Replace READ* macros in nfsd4_decode_seek() [ Upstream commit 9d32b412fe0a6186cc57789d218e8f8299454ae2 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 101beb315d96..2e2fae7e38db 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2054,17 +2054,17 @@ nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) { - DECODE_HEAD; + __be32 status; - status = nfsd4_decode_stateid(argp, &seek->seek_stateid); + status = nfsd4_decode_stateid4(argp, &seek->seek_stateid); if (status) return status; + if (xdr_stream_decode_u64(argp->xdr, &seek->seek_offset) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u32(argp->xdr, &seek->seek_whence) < 0) + return nfserr_bad_xdr; - READ_BUF(8 + 4); - p = xdr_decode_hyper(p, &seek->seek_offset); - seek->seek_whence = be32_to_cpup(p); - - DECODE_TAIL; + return nfs_ok; } /* From b719fc9375ccc54613818e220262625e08fab813 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:46:46 -0500 Subject: [PATCH 0386/1565] NFSD: Replace READ* macros in nfsd4_decode_clone() [ Upstream commit 3dfd0b0e15671e2b4047ccb9222432f0b2d930be ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 52 +++++++++++++++++++---------------------------- 1 file changed, 21 insertions(+), 31 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 2e2fae7e38db..bf2a2ef6a8b9 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -575,18 +575,6 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, return nfs_ok; } -static __be32 -nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) -{ - DECODE_HEAD; - - READ_BUF(sizeof(stateid_t)); - sid->si_generation = be32_to_cpup(p++); - COPYMEM(&sid->si_opaque, sizeof(stateid_opaque_t)); - - DECODE_TAIL; -} - static __be32 nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) { @@ -1919,25 +1907,6 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, return nfs_ok; } -static __be32 -nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) -{ - DECODE_HEAD; - - status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid); - if (status) - return status; - status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid); - if (status) - return status; - - READ_BUF(8 + 8 + 8); - p = xdr_decode_hyper(p, &clone->cl_src_pos); - p = xdr_decode_hyper(p, &clone->cl_dst_pos); - p = xdr_decode_hyper(p, &clone->cl_count); - DECODE_TAIL; -} - static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, struct nl4_server *ns) { @@ -2067,6 +2036,27 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) return nfs_ok; } +static __be32 +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +{ + __be32 status; + + status = nfsd4_decode_stateid4(argp, &clone->cl_src_stateid); + if (status) + return status; + status = nfsd4_decode_stateid4(argp, &clone->cl_dst_stateid); + if (status) + return status; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_src_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_dst_pos) < 0) + return nfserr_bad_xdr; + if (xdr_stream_decode_u64(argp->xdr, &clone->cl_count) < 0) + return nfserr_bad_xdr; + + return nfs_ok; +} + /* * XDR data that is more than PAGE_SIZE in size is normally part of a * read or write. However, the size of extended attributes is limited From 9bc67df0f9a2819c77e1d7e0c0aed76a0d46974d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:56:52 -0500 Subject: [PATCH 0387/1565] NFSD: Replace READ* macros in nfsd4_decode_xattr_name() [ Upstream commit 830c71502ae0ae1677ac6c08ffbcf85a6e7b2937 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bf2a2ef6a8b9..1fcb668e4110 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2117,25 +2117,22 @@ nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, static __be32 nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) { - DECODE_HEAD; char *name, *sp, *dp; u32 namelen, cnt; + __be32 *p; - READ_BUF(4); - namelen = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &namelen) < 0) + return nfserr_bad_xdr; if (namelen > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) return nfserr_nametoolong; - if (namelen == 0) - goto xdr_error; - - READ_BUF(namelen); - + return nfserr_bad_xdr; + p = xdr_inline_decode(argp->xdr, namelen); + if (!p) + return nfserr_bad_xdr; name = svcxdr_tmpalloc(argp, namelen + XATTR_USER_PREFIX_LEN + 1); if (!name) return nfserr_jukebox; - memcpy(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN); /* @@ -2148,14 +2145,14 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) while (cnt-- > 0) { if (*sp == '\0') - goto xdr_error; + return nfserr_bad_xdr; *dp++ = *sp++; } *dp = '\0'; *namep = name; - DECODE_TAIL; + return nfs_ok; } /* From 8e1b8a78a9291533a3658872c7988e7647a62a1b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 10:59:57 -0500 Subject: [PATCH 0388/1565] NFSD: Replace READ* macros in nfsd4_decode_setxattr() [ Upstream commit 403366a7e8e2930002157525cd44add7fa01bca9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1fcb668e4110..38610764d716 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2184,11 +2184,11 @@ static __be32 nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, struct nfsd4_setxattr *setxattr) { - DECODE_HEAD; u32 flags, maxcount, size; + __be32 status; - READ_BUF(4); - flags = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &flags) < 0) + return nfserr_bad_xdr; if (flags > SETXATTR4_REPLACE) return nfserr_inval; @@ -2201,8 +2201,8 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, maxcount = svc_max_payload(argp->rqstp); maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount); - READ_BUF(4); - size = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &size) < 0) + return nfserr_bad_xdr; if (size > maxcount) return nfserr_xattr2big; @@ -2211,12 +2211,12 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, struct xdr_buf payload; if (!xdr_stream_subsegment(argp->xdr, &payload, size)) - goto xdr_error; + return nfserr_bad_xdr; status = nfsd4_vbuf_from_vector(argp, &payload, &setxattr->setxa_buf, size); } - DECODE_TAIL; + return nfs_ok; } static __be32 From c2d0c16990b97a598a0f9740baefe357f2230d73 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 11:04:02 -0500 Subject: [PATCH 0389/1565] NFSD: Replace READ* macros in nfsd4_decode_listxattrs() [ Upstream commit 2212036cadf4da3c4b0e4bd2a9a8c3d78617ab4f ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 38610764d716..bf8eacab6495 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2223,11 +2223,10 @@ static __be32 nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, struct nfsd4_listxattrs *listxattrs) { - DECODE_HEAD; u32 maxcount; - READ_BUF(12); - p = xdr_decode_hyper(p, &listxattrs->lsxa_cookie); + if (xdr_stream_decode_u64(argp->xdr, &listxattrs->lsxa_cookie) < 0) + return nfserr_bad_xdr; /* * If the cookie is too large to have even one user.x attribute @@ -2237,7 +2236,8 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, (XATTR_LIST_MAX / (XATTR_USER_PREFIX_LEN + 2))) return nfserr_badcookie; - maxcount = be32_to_cpup(p++); + if (xdr_stream_decode_u32(argp->xdr, &maxcount) < 0) + return nfserr_bad_xdr; if (maxcount < 8) /* Always need at least 2 words (length and one character) */ return nfserr_inval; @@ -2245,7 +2245,7 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, maxcount = min(maxcount, svc_max_payload(argp->rqstp)); listxattrs->lsxa_maxcount = maxcount; - DECODE_TAIL; + return nfs_ok; } static __be32 From 6b48808835a2b92caeb90d8840b8dc068488b61c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 22 Nov 2020 12:49:52 -0500 Subject: [PATCH 0390/1565] NFSD: Make nfsd4_ops::opnum a u32 [ Upstream commit 3a237b4af5b7b0e77588e120554077cab3341943 ] Avoid passing a "pointer to int" argument to xdr_stream_decode_u32. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 7 +++---- fs/nfsd/xdr4.h | 2 +- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a038d1e182ff..6b06f0ad0561 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3272,7 +3272,7 @@ int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op) void warn_on_nonidempotent_op(struct nfsd4_op *op) { if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) { - pr_err("unable to encode reply to nonidempotent op %d (%s)\n", + pr_err("unable to encode reply to nonidempotent op %u (%s)\n", op->opnum, nfsd4_op_name(op->opnum)); WARN_ON_ONCE(1); } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bf8eacab6495..085191b4b3aa 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2419,9 +2419,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op = &argp->ops[i]; op->replay = NULL; - READ_BUF(4); - op->opnum = be32_to_cpup(p++); - + if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) + return nfserr_bad_xdr; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -5395,7 +5394,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) if (op->status && opdesc && !(opdesc->op_flags & OP_NONTRIVIAL_ERROR_ENCODE)) goto status; - BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || + BUG_ON(op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || !nfsd4_enc_ops[op->opnum]); encoder = nfsd4_enc_ops[op->opnum]; op->status = encoder(resp, op->status, &op->u); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index facc5762bf83..2c31f3a7d7c7 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -615,7 +615,7 @@ struct nfsd4_copy_notify { }; struct nfsd4_op { - int opnum; + u32 opnum; const struct nfsd4_operation * opdesc; __be32 status; union nfsd4_op_u { From b24e6a40eebae079882e2d4e776a3b5950359cf7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 11:07:06 -0500 Subject: [PATCH 0391/1565] NFSD: Replace READ* macros in nfsd4_decode_compound() [ Upstream commit d9b74bdac6f24afc3101b6a5b6f59842610c9c94 ] And clean-up: Now that we have removed the DECODE_TAIL macro from nfsd4_decode_compound(), we observe that there's no benefit for nfsd4_decode_compound() to return nfs_ok or nfserr_bad_xdr only to have its sole caller convert those values to one or zero, respectively. Have nfsd4_decode_compound() return 1/0 instead. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 67 ++++++++++++++++++++--------------------------- 1 file changed, 28 insertions(+), 39 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 085191b4b3aa..30604a3e70c0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -186,28 +186,6 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; } -/** - * savemem - duplicate a chunk of memory for later processing - * @argp: NFSv4 compound argument structure to be freed with - * @p: pointer to be duplicated - * @nbytes: length to be duplicated - * - * Returns a pointer to a copy of @nbytes bytes of memory at @p - * that are preserved until processing of the NFSv4 compound - * operation described by @argp finishes. - */ -static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) -{ - void *ret; - - ret = svcxdr_tmpalloc(argp, nbytes); - if (!ret) - return NULL; - memcpy(ret, p, nbytes); - return ret; -} - - /* * NFSv4 basic data type decoders */ @@ -2372,43 +2350,54 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op) return true; } -static __be32 +static int nfsd4_decode_compound(struct nfsd4_compoundargs *argp) { - DECODE_HEAD; struct nfsd4_op *op; bool cachethis = false; int auth_slack= argp->rqstp->rq_auth_slack; int max_reply = auth_slack + 8; /* opcnt, status */ int readcount = 0; int readbytes = 0; + __be32 *p; int i; - READ_BUF(4); - argp->taglen = be32_to_cpup(p++); - READ_BUF(argp->taglen); - SAVEMEM(argp->tag, argp->taglen); - READ_BUF(8); - argp->minorversion = be32_to_cpup(p++); - argp->opcnt = be32_to_cpup(p++); - max_reply += 4 + (XDR_QUADLEN(argp->taglen) << 2); + if (xdr_stream_decode_u32(argp->xdr, &argp->taglen) < 0) + return 0; + max_reply += XDR_UNIT; + argp->tag = NULL; + if (unlikely(argp->taglen)) { + if (argp->taglen > NFSD4_MAX_TAGLEN) + return 0; + p = xdr_inline_decode(argp->xdr, argp->taglen); + if (!p) + return 0; + argp->tag = svcxdr_tmpalloc(argp, argp->taglen); + if (!argp->tag) + return 0; + memcpy(argp->tag, p, argp->taglen); + max_reply += xdr_align_size(argp->taglen); + } + + if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) + return 0; + if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) + return 0; - if (argp->taglen > NFSD4_MAX_TAGLEN) - goto xdr_error; /* * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS * here, so we return success at the xdr level so that * nfsd4_proc can handle this is an NFS-level error. */ if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return 0; + return 1; if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; dprintk("nfsd: couldn't allocate room for COMPOUND\n"); - goto xdr_error; + return 0; } } @@ -2420,7 +2409,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->replay = NULL; if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) - return nfserr_bad_xdr; + return 0; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -2467,7 +2456,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); - DECODE_TAIL; + return 1; } static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, @@ -5496,7 +5485,7 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) args->ops = args->iops; args->rqstp = rqstp; - return !nfsd4_decode_compound(args); + return nfsd4_decode_compound(args); } int From 3aea16e6b70b04db380a11e9aac53ade07a23788 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 4 Nov 2020 11:12:18 -0500 Subject: [PATCH 0392/1565] NFSD: Remove macros that are no longer used [ Upstream commit 5cfc822f3e77b0477e6602d399116130317f537a ] Now that all the NFSv4 decoder functions have been converted to make direct calls to the xdr helpers, remove the unused C macros. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 40 ---------------------------------------- fs/nfsd/xdr4.h | 9 --------- 2 files changed, 49 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 30604a3e70c0..315be1c1ab85 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -102,45 +102,6 @@ check_filename(char *str, int len) return 0; } -#define DECODE_HEAD \ - __be32 *p; \ - __be32 status -#define DECODE_TAIL \ - status = 0; \ -out: \ - return status; \ -xdr_error: \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - status = nfserr_bad_xdr; \ - goto out - -#define READMEM(x,nbytes) do { \ - x = (char *)p; \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define SAVEMEM(x,nbytes) do { \ - if (!(x = (p==argp->tmp || p == argp->tmpp) ? \ - savemem(argp, p, nbytes) : \ - (char *)p)) { \ - dprintk("NFSD: xdr error (%s:%d)\n", \ - __FILE__, __LINE__); \ - goto xdr_error; \ - } \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define COPYMEM(x,nbytes) do { \ - memcpy((x), p, nbytes); \ - p += XDR_QUADLEN(nbytes); \ -} while (0) -#define READ_BUF(nbytes) \ - do { \ - p = xdr_inline_decode(argp->xdr,\ - nbytes); \ - if (!p) \ - goto xdr_error; \ - } while (0) - static int zero_clientid(clientid_t *clid) { return (clid->cl_boot == 0) && (clid->cl_id == 0); @@ -5478,7 +5439,6 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) struct nfsd4_compoundargs *args = rqstp->rq_argp; /* svcxdr_tmp_alloc */ - args->tmpp = NULL; args->to_free = NULL; args->xdr = &rqstp->rq_arg_stream; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 2c31f3a7d7c7..e12fbe382e3f 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -386,13 +386,6 @@ struct nfsd4_setclientid_confirm { nfs4_verifier sc_confirm; }; -struct nfsd4_saved_compoundargs { - __be32 *p; - __be32 *end; - int pagelen; - struct page **pagelist; -}; - struct nfsd4_test_stateid_id { __be32 ts_id_status; stateid_t ts_id_stateid; @@ -696,8 +689,6 @@ struct svcxdr_tmpbuf { struct nfsd4_compoundargs { /* scratch variables for XDR decode */ - __be32 tmp[8]; - __be32 * tmpp; struct xdr_stream *xdr; struct svcxdr_tmpbuf *to_free; struct svc_rqst *rqstp; From f8032b859df63e8fee31d74b3f738ed8a862e107 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 30 Nov 2020 17:46:14 -0500 Subject: [PATCH 0393/1565] nfsd: only call inode_query_iversion in the I_VERSION case [ Upstream commit 70b87f77294d16d3e567056ba4c9ee2b091a5b50 ] inode_query_iversion() can modify i_version. Depending on the exported filesystem, that may not be safe. For example, if you're re-exporting NFS, NFS stores the server's change attribute in i_version and does not expect it to be modified locally. This has been observed causing unnecessary cache invalidations. The way a filesystem indicates that it's OK to call inode_query_iverson() is by setting SB_I_VERSION. So, move the I_VERSION check out of encode_change(), where it's used only in GETATTR responses, to nfsd4_change_attribute(), which is also called for pre- and post- operation attributes. (Note we could also pull the NFSEXP_V4ROOT case into nfsd4_change_attribute() as well. That would actually be a no-op, since pre/post attrs are only used for metadata-modifying operations, and V4ROOT exports are read-only. But we might make the change in the future just for simplicity.) Reported-by: Daire Byrne Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 5 ++--- fs/nfsd/nfs4xdr.c | 6 +----- fs/nfsd/nfsfh.h | 14 ++++++++++---- 3 files changed, 13 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0d75d201db1b..5956b0317c55 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -291,14 +291,13 @@ void fill_post_wcc(struct svc_fh *fhp) printk("nfsd: inode locked twice during operation.\n"); err = fh_getattr(fhp, &fhp->fh_post_attr); - fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, - d_inode(fhp->fh_dentry)); if (err) { fhp->fh_post_saved = false; - /* Grab the ctime anyway - set_change_info might use it */ fhp->fh_post_attr.ctime = d_inode(fhp->fh_dentry)->i_ctime; } else fhp->fh_post_saved = true; + fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, + d_inode(fhp->fh_dentry)); } /* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 315be1c1ab85..bdcfb5f7021d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2426,12 +2426,8 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, if (exp->ex_flags & NFSEXP_V4ROOT) { *p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time)); *p++ = 0; - } else if (IS_I_VERSION(inode)) { + } else p = xdr_encode_hyper(p, nfsd4_change_attribute(stat, inode)); - } else { - *p++ = cpu_to_be32(stat->ctime.tv_sec); - *p++ = cpu_to_be32(stat->ctime.tv_nsec); - } return p; } diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 56cfbc361561..39d764b129fa 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -261,10 +261,16 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, { u64 chattr; - chattr = stat->ctime.tv_sec; - chattr <<= 30; - chattr += stat->ctime.tv_nsec; - chattr += inode_query_iversion(inode); + if (IS_I_VERSION(inode)) { + chattr = stat->ctime.tv_sec; + chattr <<= 30; + chattr += stat->ctime.tv_nsec; + chattr += inode_query_iversion(inode); + } else { + chattr = stat->ctime.tv_sec; + chattr <<= 32; + chattr += stat->ctime.tv_nsec; + } return chattr; } From 88dea0f92b20593a5bd48c1256c862794fc99f97 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 30 Nov 2020 17:46:15 -0500 Subject: [PATCH 0394/1565] nfsd: simplify nfsd4_change_info [ Upstream commit b2140338d8dca827ad9e83f3e026e9d51748b265 ] It doesn't make sense to carry all these extra fields around. Just make everything into change attribute from the start. This is just cleanup, there should be no change in behavior. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 ++--------- fs/nfsd/xdr4.h | 11 ----------- 2 files changed, 2 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bdcfb5f7021d..4df6c75d0eb7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2459,15 +2459,8 @@ static __be32 *encode_time_delta(__be32 *p, struct inode *inode) static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c) { *p++ = cpu_to_be32(c->atomic); - if (c->change_supported) { - p = xdr_encode_hyper(p, c->before_change); - p = xdr_encode_hyper(p, c->after_change); - } else { - *p++ = cpu_to_be32(c->before_ctime_sec); - *p++ = cpu_to_be32(c->before_ctime_nsec); - *p++ = cpu_to_be32(c->after_ctime_sec); - *p++ = cpu_to_be32(c->after_ctime_nsec); - } + p = xdr_encode_hyper(p, c->before_change); + p = xdr_encode_hyper(p, c->after_change); return p; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index e12fbe382e3f..b4556e86e97c 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -76,12 +76,7 @@ static inline bool nfsd4_has_session(struct nfsd4_compound_state *cs) struct nfsd4_change_info { u32 atomic; - bool change_supported; - u32 before_ctime_sec; - u32 before_ctime_nsec; u64 before_change; - u32 after_ctime_sec; - u32 after_ctime_nsec; u64 after_change; }; @@ -754,15 +749,9 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) { BUG_ON(!fhp->fh_pre_saved); cinfo->atomic = (u32)fhp->fh_post_saved; - cinfo->change_supported = IS_I_VERSION(d_inode(fhp->fh_dentry)); cinfo->before_change = fhp->fh_pre_change; cinfo->after_change = fhp->fh_post_change; - cinfo->before_ctime_sec = fhp->fh_pre_ctime.tv_sec; - cinfo->before_ctime_nsec = fhp->fh_pre_ctime.tv_nsec; - cinfo->after_ctime_sec = fhp->fh_post_attr.ctime.tv_sec; - cinfo->after_ctime_nsec = fhp->fh_post_attr.ctime.tv_nsec; - } From 796785a79b4aefecab31bada4e7bc46e25fd91aa Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 30 Nov 2020 17:46:16 -0500 Subject: [PATCH 0395/1565] nfsd: minor nfsd4_change_attribute cleanup [ Upstream commit 4b03d99794eeed27650597a886247c6427ce1055 ] Minor cleanup, no change in behavior. Also pull out a common helper that'll be useful elsewhere. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.h | 13 +++++-------- include/linux/iversion.h | 13 +++++++++++++ 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 39d764b129fa..45bd776290d5 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -259,19 +259,16 @@ fh_clear_wcc(struct svc_fh *fhp) static inline u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) { - u64 chattr; - if (IS_I_VERSION(inode)) { + u64 chattr; + chattr = stat->ctime.tv_sec; chattr <<= 30; chattr += stat->ctime.tv_nsec; chattr += inode_query_iversion(inode); - } else { - chattr = stat->ctime.tv_sec; - chattr <<= 32; - chattr += stat->ctime.tv_nsec; - } - return chattr; + return chattr; + } else + return time_to_chattr(&stat->ctime); } extern void fill_pre_wcc(struct svc_fh *fhp); diff --git a/include/linux/iversion.h b/include/linux/iversion.h index 2917ef990d43..3bfebde5a1a6 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -328,6 +328,19 @@ inode_query_iversion(struct inode *inode) return cur >> I_VERSION_QUERIED_SHIFT; } +/* + * For filesystems without any sort of change attribute, the best we can + * do is fake one up from the ctime: + */ +static inline u64 time_to_chattr(struct timespec64 *t) +{ + u64 chattr = t->tv_sec; + + chattr <<= 32; + chattr += t->tv_nsec; + return chattr; +} + /** * inode_eq_iversion_raw - check whether the raw i_version counter has changed * @inode: inode to check From b720ceec88a75fc6ea171519c3d30854c57d0e41 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 30 Nov 2020 17:46:17 -0500 Subject: [PATCH 0396/1565] nfsd4: don't query change attribute in v2/v3 case [ Upstream commit 942b20dc245590327ee0187c15c78174cd96dd52 ] inode_query_iversion() has side effects, and there's no point calling it when we're not even going to use it. We check whether we're currently processing a v4 request by checking fh_maxsize, which is arguably a little hacky; we could add a flag to svc_fh instead. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 5956b0317c55..7d44e10a5f5d 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -259,11 +259,11 @@ void fill_pre_wcc(struct svc_fh *fhp) { struct inode *inode; struct kstat stat; + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); __be32 err; if (fhp->fh_pre_saved) return; - inode = d_inode(fhp->fh_dentry); err = fh_getattr(fhp, &stat); if (err) { @@ -272,11 +272,12 @@ void fill_pre_wcc(struct svc_fh *fhp) stat.ctime = inode->i_ctime; stat.size = inode->i_size; } + if (v4) + fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); fhp->fh_pre_mtime = stat.mtime; fhp->fh_pre_ctime = stat.ctime; fhp->fh_pre_size = stat.size; - fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); fhp->fh_pre_saved = true; } @@ -285,6 +286,8 @@ void fill_pre_wcc(struct svc_fh *fhp) */ void fill_post_wcc(struct svc_fh *fhp) { + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode = d_inode(fhp->fh_dentry); __be32 err; if (fhp->fh_post_saved) @@ -293,11 +296,12 @@ void fill_post_wcc(struct svc_fh *fhp) err = fh_getattr(fhp, &fhp->fh_post_attr); if (err) { fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = d_inode(fhp->fh_dentry)->i_ctime; + fhp->fh_post_attr.ctime = inode->i_ctime; } else fhp->fh_post_saved = true; - fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, - d_inode(fhp->fh_dentry)); + if (v4) + fhp->fh_post_change = + nfsd4_change_attribute(&fhp->fh_post_attr, inode); } /* From 34de27ed8447d9cffe0d6bedceb1238c50e8c78b Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 30 Nov 2020 17:46:18 -0500 Subject: [PATCH 0397/1565] Revert "nfsd4: support change_attr_type attribute" This reverts commit a85857633b04d57f4524cca0a2bfaf87b2543f9f. We're still factoring ctime into our change attribute even in the IS_I_VERSION case. If someone sets the system time backwards, a client could see the change attribute go backwards. Maybe we can just say "well, don't do that", but there's some question whether that's good enough, or whether we need a better guarantee. Also, the client still isn't actually using the attribute. While we're still figuring this out, let's just stop returning this attribute. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 ---------- fs/nfsd/nfsd.h | 1 - include/linux/nfs4.h | 8 -------- 3 files changed, 19 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4df6c75d0eb7..bb037e8eb830 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3311,16 +3311,6 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out; } - if (bmval2 & FATTR4_WORD2_CHANGE_ATTR_TYPE) { - p = xdr_reserve_space(xdr, 4); - if (!p) - goto out_resource; - if (IS_I_VERSION(d_inode(dentry))) - *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR); - else - *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_TIME_METADATA); - } - #ifdef CONFIG_NFSD_V4_SECURITY_LABEL if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) { status = nfsd4_encode_security_label(xdr, rqstp, context, diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3eaa81e001f9..2326428e2c5b 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -394,7 +394,6 @@ void nfsd_lockd_shutdown(void); #define NFSD4_2_SUPPORTED_ATTRS_WORD2 \ (NFSD4_1_SUPPORTED_ATTRS_WORD2 | \ - FATTR4_WORD2_CHANGE_ATTR_TYPE | \ FATTR4_WORD2_MODE_UMASK | \ NFSD4_2_SECURITY_ATTRS | \ FATTR4_WORD2_XATTR_SUPPORT) diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index 9dc7eeac924f..5b4c67c91f56 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -385,13 +385,6 @@ enum lock_type4 { NFS4_WRITEW_LT = 4 }; -enum change_attr_type4 { - NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0, - NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1, - NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2, - NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3, - NFS4_CHANGE_TYPE_IS_UNDEFINED = 4 -}; /* Mandatory Attributes */ #define FATTR4_WORD0_SUPPORTED_ATTRS (1UL << 0) @@ -459,7 +452,6 @@ enum change_attr_type4 { #define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1) #define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4) #define FATTR4_WORD2_CLONE_BLKSIZE (1UL << 13) -#define FATTR4_WORD2_CHANGE_ATTR_TYPE (1UL << 15) #define FATTR4_WORD2_SECURITY_LABEL (1UL << 16) #define FATTR4_WORD2_MODE_UMASK (1UL << 17) #define FATTR4_WORD2_XATTR_SUPPORT (1UL << 18) From d5314c9bb7f521714ff0e0bcd24ca2eaf00f622b Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 30 Nov 2020 17:03:14 -0500 Subject: [PATCH 0398/1565] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations [ Upstream commit daab110e47f8d7aa6da66923e3ac1a8dbd2b2a72 ] With NFSv3 nfsd will always attempt to send along WCC data to the client. This generally involves saving off the in-core inode information prior to doing the operation on the given filehandle, and then issuing a vfs_getattr to it after the op. Some filesystems (particularly clustered or networked ones) have an expensive ->getattr inode operation. Atomicity is also often difficult or impossible to guarantee on such filesystems. For those, we're best off not trying to provide WCC information to the client at all, and to simply allow it to poll for that information as needed with a GETATTR RPC. This patch adds a new flags field to struct export_operations, and defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate that nfsd should not attempt to provide WCC info in NFSv3 replies. It also adds a blurb about the new flags field and flag to the exporting documentation. The server will also now skip collecting this information for NFSv2 as well, since that info is never used there anyway. Note that this patch does not add this flag to any filesystem export_operations structures. This was originally developed to allow reexporting nfs via nfsd. Other filesystems may want to consider enabling this flag too. It's hard to tell however which ones have export operations to enable export via knfsd and which ones mostly rely on them for open-by-filehandle support, so I'm leaving that up to the individual maintainers to decide. I am cc'ing the relevant lists for those filesystems that I think may want to consider adding this though. Cc: HPDD-discuss@lists.01.org Cc: ceph-devel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: fuse-devel@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++ fs/nfs/export.c | 1 + fs/nfsd/nfs3xdr.c | 7 ++++-- fs/nfsd/nfsfh.c | 14 +++++++++++ fs/nfsd/nfsfh.h | 2 +- include/linux/exportfs.h | 2 ++ 6 files changed, 50 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 33d588a01ace..cbe542ad5233 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -154,6 +154,11 @@ struct which has the following members: to find potential names, and matches inode numbers to find the correct match. + flags + Some filesystems may need to be handled differently than others. The + export_operations struct also includes a flags field that allows the + filesystem to communicate such information to nfsd. See the Export + Operations Flags section below for more explanation. A filehandle fragment consists of an array of 1 or more 4byte words, together with a one byte "type". @@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with nuls. Rather, the encode_fh routine should choose a "type" which indicates the decode_fh how much of the filehandle is valid, and how it should be interpreted. + +Export Operations Flags +----------------------- +In addition to the operation vector pointers, struct export_operations also +contains a "flags" field that allows the filesystem to communicate to nfsd +that it may want to do things differently when dealing with it. The +following flags are defined: + + EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem + RFC 1813 recommends that servers always send weak cache consistency + (WCC) data to the client after each operation. The server should + atomically collect attributes about the inode, do an operation on it, + and then collect the attributes afterward. This allows the client to + skip issuing GETATTRs in some situations but means that the server + is calling vfs_getattr for almost all RPCs. On some filesystems + (particularly those that are clustered or networked) this is expensive + and atomicity is difficult to guarantee. This flag indicates to nfsd + that it should skip providing WCC attributes to the client in NFSv3 + replies when doing operations on this filesystem. Consider enabling + this on filesystems that have an expensive ->getattr inode operation, + or when atomicity between pre and post operation attribute collection + is impossible to guarantee. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 3430d6891e89..8f4c528865c5 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, + .flags = EXPORT_OP_NOWCC, }; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 7d44e10a5f5d..34b880211e5e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -206,7 +206,7 @@ static __be32 * encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { struct dentry *dentry = fhp->fh_dentry; - if (dentry && d_really_is_positive(dentry)) { + if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { __be32 err; struct kstat stat; @@ -262,7 +262,7 @@ void fill_pre_wcc(struct svc_fh *fhp) bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); __be32 err; - if (fhp->fh_pre_saved) + if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); err = fh_getattr(fhp, &stat); @@ -290,6 +290,9 @@ void fill_post_wcc(struct svc_fh *fhp) struct inode *inode = d_inode(fhp->fh_dentry); __be32 err; + if (fhp->fh_no_wcc) + return; + if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n"); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index c81dbbad8792..9c29a523f484 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_dentry = dentry; fhp->fh_export = exp; + + switch (rqstp->rq_vers) { + case 3: + if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) + fhp->fh_no_wcc = true; + break; + case 2: + fhp->fh_no_wcc = true; + } + return 0; out: exp_put(exp); @@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, */ set_version_and_fsid_type(fhp, exp, ref_fh); + /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */ + fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false; + if (ref_fh == fhp) fh_put(ref_fh); @@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp) exp_put(exp); fhp->fh_export = NULL; } + fhp->fh_no_wcc = false; return; } diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 45bd776290d5..347d10aa6265 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -35,6 +35,7 @@ typedef struct svc_fh { bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ + bool fh_no_wcc; /* no wcc data needed */ int fh_flags; /* FH flags */ #ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ @@ -54,7 +55,6 @@ typedef struct svc_fh { struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ #endif /* CONFIG_NFSD_V3 */ - } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 3ceb72b67a7a..e7de0103a32e 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,6 +213,8 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); +#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ + unsigned long flags; }; extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, From 203ca3253b34a1786c885491f0643d4d95a1d690 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 30 Nov 2020 17:03:15 -0500 Subject: [PATCH 0399/1565] nfsd: allow filesystems to opt out of subtree checking [ Upstream commit ba5e8187c55555519ae0b63c0fb681391bc42af9 ] When we start allowing NFS to be reexported, then we have some problems when it comes to subtree checking. In principle, we could allow it, but it would mean encoding parent info in the filehandles and there may not be enough space for that in a NFSv3 filehandle. To enforce this at export upcall time, we add a new export_ops flag that declares the filesystem ineligible for subtree checking. Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/nfs/exporting.rst | 12 ++++++++++++ fs/nfs/export.c | 2 +- fs/nfsd/export.c | 6 ++++++ include/linux/exportfs.h | 1 + 4 files changed, 20 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index cbe542ad5233..960be64446cb 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -190,3 +190,15 @@ following flags are defined: this on filesystems that have an expensive ->getattr inode operation, or when atomicity between pre and post operation attribute collection is impossible to guarantee. + + EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs + Many NFS operations deal with filehandles, which the server must then + vet to ensure that they live inside of an exported tree. When the + export consists of an entire filesystem, this is trivial. nfsd can just + ensure that the filehandle live on the filesystem. When only part of a + filesystem is exported however, then nfsd must walk the ancestors of the + inode to ensure that it's within an exported subtree. This is an + expensive operation and not all filesystems can support it properly. + This flag exempts the filesystem from subtree checking and causes + exportfs to get back an error if it tries to enable subtree checking + on it. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 8f4c528865c5..b9ba306bf912 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK, }; diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 21e404e7cb68..81e7bb12aca6 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -408,6 +408,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid) return -EINVAL; } + if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK && + !(*flags & NFSEXP_NOSUBTREECHECK)) { + dprintk("%s: %s does not support subtree checking!\n", + __func__, inode->i_sb->s_type->name); + return -EINVAL; + } return 0; } diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index e7de0103a32e..2fcbab0f6b61 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -214,6 +214,7 @@ struct export_operations { int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); #define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ +#define EXPORT_OP_NOSUBTREECHK (0x2) /* Subtree checking is not supported! */ unsigned long flags; }; From 93f7d515d873e37be0b9e4712e9171a3443f2d6a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 30 Nov 2020 17:03:16 -0500 Subject: [PATCH 0400/1565] nfsd: close cached files prior to a REMOVE or RENAME that would replace target [ Upstream commit 7f84b488f9add1d5cca3e6197c95914c7bd3c1cf ] It's not uncommon for some workloads to do a bunch of I/O to a file and delete it just afterward. If knfsd has a cached open file however, then the file may still be open when the dentry is unlinked. If the underlying filesystem is nfs, then that could trigger it to do a sillyrename. On a REMOVE or RENAME scan the nfsd_file cache for open files that correspond to the inode, and proactively unhash and put their references. This should prevent any delete-on-last-close activity from occurring, solely due to knfsd's open file cache. This must be done synchronously though so we use the variants that call flush_delayed_fput. There are deadlock possibilities if you call flush_delayed_fput while holding locks, however. In the case of nfsd_rename, we don't even do the lookups of the dentries to be renamed until we've locked for rename. Once we've figured out what the target dentry is for a rename, check to see whether there are cached open files associated with it. If there are, then unwind all of the locking, close them all, and then reattempt the rename. None of this is really necessary for "typical" filesystems though. It's mostly of use for NFS, so declare a new export op flag and use that to determine whether to close the files beforehand. Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust [ cel: adjusted to apply to 5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/nfs/exporting.rst | 13 +++++++++++++ fs/nfs/export.c | 2 +- fs/nfsd/vfs.c | 16 +++++++++------- include/linux/exportfs.h | 5 +++-- 4 files changed, 26 insertions(+), 10 deletions(-) diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 960be64446cb..0e98edd353b5 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -202,3 +202,16 @@ following flags are defined: This flag exempts the filesystem from subtree checking and causes exportfs to get back an error if it tries to enable subtree checking on it. + + EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking + On some exportable filesystems (such as NFS) unlinking a file that + is still open can cause a fair bit of extra work. For instance, + the NFS client will do a "sillyrename" to ensure that the file + sticks around while it's still open. When reexporting, that open + file is held by nfsd so we usually end up doing a sillyrename, and + then immediately deleting the sillyrenamed file just afterward when + the link count actually goes to zero. Sometimes this delete can race + with other operations (for instance an rmdir of the parent directory). + This flag causes nfsd to close any open files for this inode _before_ + calling into the vfs to do an unlink or a rename that would replace + an existing file. diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b9ba306bf912..5428713af5fe 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK, }; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 31edb883afd0..fb4e6c57ce0b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1739,7 +1739,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, struct inode *fdir, *tdir; __be32 err; int host_err; - bool has_cached = false; + bool close_cached = false; err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE); if (err) @@ -1798,8 +1798,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, if (ndentry == trap) goto out_dput_new; - if (nfsd_has_cached_files(ndentry)) { - has_cached = true; + if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) && + nfsd_has_cached_files(ndentry)) { + close_cached = true; goto out_dput_old; } else { host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0); @@ -1820,7 +1821,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * as that would do the wrong thing if the two directories * were the same, so again we do it by hand. */ - if (!has_cached) { + if (!close_cached) { fill_post_wcc(ffhp); fill_post_wcc(tfhp); } @@ -1834,8 +1835,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * shouldn't be done with locks held however, so we delay it until this * point and then reattempt the whole shebang. */ - if (has_cached) { - has_cached = false; + if (close_cached) { + close_cached = false; nfsd_close_cached_files(ndentry); dput(ndentry); goto retry; @@ -1887,7 +1888,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, type = d_inode(rdentry)->i_mode & S_IFMT; if (type != S_IFDIR) { - nfsd_close_cached_files(rdentry); + if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) + nfsd_close_cached_files(rdentry); host_err = vfs_unlink(dirp, rdentry, NULL); } else { host_err = vfs_rmdir(dirp, rdentry); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 2fcbab0f6b61..d829403ffd3b 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,8 +213,9 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); -#define EXPORT_OP_NOWCC (0x1) /* Don't collect wcc data for NFSv3 replies */ -#define EXPORT_OP_NOSUBTREECHK (0x2) /* Subtree checking is not supported! */ +#define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ +#define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ +#define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ unsigned long flags; }; From 38e213c1e41eabc1f5aba5d7e5bc237b62f6658c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 30 Nov 2020 17:03:17 -0500 Subject: [PATCH 0401/1565] exportfs: Add a function to return the raw output from fh_to_dentry() [ Upstream commit d045465fc6cbfa4acfb5a7d817a7c1a57a078109 ] In order to allow nfsd to accept return values that are not acceptable to overlayfs and others, add a new function. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/exportfs/expfs.c | 32 ++++++++++++++++++++++++-------- include/linux/exportfs.h | 5 +++++ 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 2dd55b172d57..0106eba46d5a 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -417,9 +417,11 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, } EXPORT_SYMBOL_GPL(exportfs_encode_fh); -struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, - int fh_len, int fileid_type, - int (*acceptable)(void *, struct dentry *), void *context) +struct dentry * +exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) { const struct export_operations *nop = mnt->mnt_sb->s_export_op; struct dentry *result, *alias; @@ -432,10 +434,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, if (!nop || !nop->fh_to_dentry) return ERR_PTR(-ESTALE); result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); - if (PTR_ERR(result) == -ENOMEM) - return ERR_CAST(result); if (IS_ERR_OR_NULL(result)) - return ERR_PTR(-ESTALE); + return result; /* * If no acceptance criteria was specified by caller, a disconnected @@ -561,10 +561,26 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, err_result: dput(result); - if (err != -ENOMEM) - err = -ESTALE; return ERR_PTR(err); } +EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw); + +struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, + int fh_len, int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) +{ + struct dentry *ret; + + ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type, + acceptable, context); + if (IS_ERR_OR_NULL(ret)) { + if (ret == ERR_PTR(-ENOMEM)) + return ret; + return ERR_PTR(-ESTALE); + } + return ret; +} EXPORT_SYMBOL_GPL(exportfs_decode_fh); MODULE_LICENSE("GPL"); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index d829403ffd3b..846df3c96730 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -223,6 +223,11 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len, struct inode *parent); extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, int connectable); +extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt, + struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context); extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *), void *context); From f3056a0ac2c5ced2a9d9f9f7d639094fc3df922a Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 30 Nov 2020 17:03:18 -0500 Subject: [PATCH 0402/1565] nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE [ Upstream commit 2e19d10c1438241de32467637a2a411971547991 ] If the underlying filesystem times out, then we want knfsd to return NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 9c29a523f484..e80a7525561d 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -268,12 +268,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (fileid_type == FILEID_ROOT) dentry = dget(exp->ex_path.dentry); else { - dentry = exportfs_decode_fh(exp->ex_path.mnt, fid, - data_left, fileid_type, - nfsd_acceptable, exp); - if (IS_ERR_OR_NULL(dentry)) + dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid, + data_left, fileid_type, + nfsd_acceptable, exp); + if (IS_ERR_OR_NULL(dentry)) { trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp, dentry ? PTR_ERR(dentry) : -ESTALE); + switch (PTR_ERR(dentry)) { + case -ENOMEM: + case -ETIMEDOUT: + break; + default: + dentry = ERR_PTR(-ESTALE); + } + } } if (dentry == NULL) goto out; From 0750e494c75e48eaf49460c330585e0e1e01b47e Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 30 Nov 2020 17:03:19 -0500 Subject: [PATCH 0403/1565] nfsd: Set PF_LOCAL_THROTTLE on local filesystems only [ Upstream commit 01cbf3853959feec40ec9b9a399e12a021cd4d81 ] Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they aren't expected to ever be subject to double buffering. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 3 ++- fs/nfsd/vfs.c | 13 +++++++++++-- include/linux/exportfs.h | 1 + 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 5428713af5fe..48b879cfe6e3 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,5 +171,6 @@ const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK, + .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| + EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS, }; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index fb4e6c57ce0b..a515cbd0a7d8 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -986,6 +986,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, __be32 *verf) { struct file *file = nf->nf_file; + struct super_block *sb = file_inode(file)->i_sb; struct svc_export *exp; struct iov_iter iter; errseq_t since; @@ -993,12 +994,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, int host_err; int use_wgather; loff_t pos = offset; + unsigned long exp_op_flags = 0; unsigned int pflags = current->flags; rwf_t flags = 0; + bool restore_flags = false; trace_nfsd_write_opened(rqstp, fhp, offset, *cnt); - if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) + if (sb->s_export_op) + exp_op_flags = sb->s_export_op->flags; + + if (test_bit(RQ_LOCAL, &rqstp->rq_flags) && + !(exp_op_flags & EXPORT_OP_REMOTE_FS)) { /* * We want throttling in balance_dirty_pages() * and shrink_inactive_list() to only consider @@ -1007,6 +1014,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, * the client's dirty pages or its congested queue. */ current->flags |= PF_LOCAL_THROTTLE; + restore_flags = true; + } exp = fhp->fh_export; use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp); @@ -1062,7 +1071,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, trace_nfsd_write_err(rqstp, fhp, offset, host_err); nfserr = nfserrno(host_err); } - if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) + if (restore_flags) current_restore_flags(pflags, PF_LOCAL_THROTTLE); return nfserr; } diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 846df3c96730..d93e8a6737bb 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -216,6 +216,7 @@ struct export_operations { #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ +#define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */ unsigned long flags; }; From 8f1df3d0c1469d4b4020e078efb724e6e535cd4c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 30 Nov 2020 23:14:27 -0500 Subject: [PATCH 0404/1565] nfsd: Record NFSv4 pre/post-op attributes as non-atomic [ Upstream commit 716a8bc7f706eeef80ab42c99d9f210eda845c81 ] For the case of NFSv4, specify to the client that the pre/post-op attributes were not recorded atomically with the main operation. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 3 ++- fs/nfsd/nfsfh.c | 4 ++++ fs/nfsd/nfsfh.h | 5 +++++ fs/nfsd/xdr4.h | 2 +- include/linux/exportfs.h | 3 +++ 5 files changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 48b879cfe6e3..7412bb164fa7 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -172,5 +172,6 @@ const struct export_operations nfs_export_ops = { .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| - EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS, + EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| + EXPORT_OP_NOATOMIC_ATTR, }; diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index e80a7525561d..66f2ef67792a 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -301,6 +301,10 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_export = exp; switch (rqstp->rq_vers) { + case 4: + if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) + fhp->fh_no_atomic_attr = true; + break; case 3: if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) fhp->fh_no_wcc = true; diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 347d10aa6265..cb20c2cd3469 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -36,6 +36,11 @@ typedef struct svc_fh { bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ bool fh_no_wcc; /* no wcc data needed */ + bool fh_no_atomic_attr; + /* + * wcc data is not atomic with + * operation + */ int fh_flags; /* FH flags */ #ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index b4556e86e97c..a60ff5ce1a37 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -748,7 +748,7 @@ static inline void set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) { BUG_ON(!fhp->fh_pre_saved); - cinfo->atomic = (u32)fhp->fh_post_saved; + cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr); cinfo->before_change = fhp->fh_pre_change; cinfo->after_change = fhp->fh_post_change; diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index d93e8a6737bb..9f4d4bcbf251 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -217,6 +217,9 @@ struct export_operations { #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ #define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */ +#define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply + atomic attribute updates + */ unsigned long flags; }; From 527c9b6eb18dc49da73468cb2808c201f6ef64a1 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Dec 2020 15:42:57 -0600 Subject: [PATCH 0405/1565] exec: Don't open code get_close_on_exec [ Upstream commit 878f12dbb8f514799d126544d59be4d2675caac3 ] Al Viro pointed out that using the phrase "close_on_exec(fd, rcu_dereference_raw(current->files->fdt))" instead of wrapping it in rcu_read_lock(), rcu_read_unlock() is a very questionable optimization[1]. Once wrapped with rcu_read_lock()/rcu_read_unlock() that phrase becomes equivalent the helper function get_close_on_exec so simplify the code and make it more robust by simply using get_close_on_exec. [1] https://lkml.kernel.org/r/20201207222214.GA4115853@ZenIV.linux.org.uk Suggested-by: Al Viro Link: https://lkml.kernel.org/r/87k0tqr6zi.fsf_-_@x220.int.ebiederm.org Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/exec.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index ebe9011955b9..fb8813cc532d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1821,8 +1821,7 @@ static int bprm_execve(struct linux_binprm *bprm, * inaccessible after exec. Relies on having exclusive access to * current->files (due to unshare_files above). */ - if (bprm->fdpath && - close_on_exec(fd, rcu_dereference_raw(current->files->fdt))) + if (bprm->fdpath && get_close_on_exec(fd)) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; /* Set the unchanging part of bprm->cred */ From 5091d051c51db83c15f3e0b493f1c0da30316a36 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:18 -0600 Subject: [PATCH 0406/1565] exec: Move unshare_files to fix posix file locking during exec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b6043501289ebf169ae19b810a882d517377302f ] Many moons ago the binfmts were doing some very questionable things with file descriptors and an unsharing of the file descriptor table was added to make things better[1][2]. The helper steal_lockss was added to avoid breaking the userspace programs[3][4][6]. Unfortunately it turned out that steal_locks did not work for network file systems[5], so it was removed to see if anyone would complain[7][8]. It was thought at the time that NPTL would not be affected as the unshare_files happened after the other threads were killed[8]. Unfortunately because there was an unshare_files in binfmt_elf.c before the threads were killed this analysis was incorrect. This unshare_files in binfmt_elf.c resulted in the unshares_files happening whenever threads were present. Which led to unshare_files being moved to the start of do_execve[9]. Later the problems were rediscovered and the suggested approach was to readd steal_locks under a different name[10]. I happened to be reviewing patches and I noticed that this approach was a step backwards[11]. I proposed simply moving unshare_files[12] and it was pointed out that moving unshare_files without auditing the code was also unsafe[13]. There were then several attempts to solve this[14][15][16] and I even posted this set of changes[17]. Unfortunately because auditing all of execve is time consuming this change did not make it in at the time. Well now that I am cleaning up exec I have made the time to read through all of the binfmts and the only playing with file descriptors is either the security modules closing them in security_bprm_committing_creds or is in the generic code in fs/exec.c. None of it happens before begin_new_exec is called. So move unshare_files into begin_new_exec, after the point of no return. If memory is very very very low and the application calling exec is sharing file descriptor tables between processes we might fail past the point of no return. Which is unfortunate but no different than any of the other places where we allocate memory after the point of no return. This movement allows another process that shares the file table, or another thread of the same process and that closes files or changes their close on exec behavior and races with execve to cause some unexpected things to happen. There is only one time of check to time of use race and it is just there so that execve fails instead of an interpreter failing when it tries to open the file it is supposed to be interpreting. Failing later if userspace is being silly is not a problem. With this change it the following discription from the removal of steal_locks[8] finally becomes true. Apps using NPTL are not affected, since all other threads are killed before execve. Apps using LinuxThreads are only affected if they - have multiple threads during exec (LinuxThreads doesn't kill other threads, the app may do it with pthread_kill_other_threads_np()) - rely on POSIX locks being inherited across exec Both conditions are documented, but not their interaction. Apps using clone() natively are affected if they - use clone(CLONE_FILES) - rely on POSIX locks being inherited across exec I have investigated some paths to make it possible to solve this without moving unshare_files but they all look more complicated[18]. Reported-by: Daniel P. Berrangé Reported-by: Jeff Layton History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1] 02cda956de0b ("[PATCH] unshare_files" [2] 04e9bcb4d106 ("[PATCH] use new unshare_files helper") [3] 088f5d7244de ("[PATCH] add steal_locks helper") [4] 02c541ec8ffa ("[PATCH] use new steal_locks helper") [5] https://lkml.kernel.org/r/E1FLIlF-0007zR-00@dorka.pomaz.szeredi.hu [6] https://lkml.kernel.org/r/0060321191605.GB15997@sorel.sous-sol.org [7] https://lkml.kernel.org/r/E1FLwjC-0000kJ-00@dorka.pomaz.szeredi.hu [8] c89681ed7d0e ("[PATCH] remove steal_locks()") [9] fd8328be874f ("[PATCH] sanitize handling of shared descriptor tables in failing execve()") [10] https://lkml.kernel.org/r/20180317142520.30520-1-jlayton@kernel.org [11] https://lkml.kernel.org/r/87r2nwqk73.fsf@xmission.com [12] https://lkml.kernel.org/r/87bmfgvg8w.fsf@xmission.com [13] https://lkml.kernel.org/r/20180322111424.GE30522@ZenIV.linux.org.uk [14] https://lkml.kernel.org/r/20180827174722.3723-1-jlayton@kernel.org [15] https://lkml.kernel.org/r/20180830172423.21964-1-jlayton@kernel.org [16] https://lkml.kernel.org/r/20180914105310.6454-1-jlayton@kernel.org [17] https://lkml.kernel.org/r/87a7ohs5ow.fsf@xmission.com [18] https://lkml.kernel.org/r/87pn8c1uj6.fsf_-_@x220.int.ebiederm.org Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-1-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-1-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/exec.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index fb8813cc532d..42952cf90f4a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1245,6 +1245,7 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) int begin_new_exec(struct linux_binprm * bprm) { struct task_struct *me = current; + struct files_struct *displaced; int retval; /* Once we are committed compute the creds */ @@ -1264,6 +1265,13 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) goto out; + /* Ensure the files table is not shared. */ + retval = unshare_files(&displaced); + if (retval) + goto out; + if (displaced) + put_files_struct(displaced); + /* * Must be called _before_ exec_mmap() as bprm->mm is * not visibile until then. This also enables the update @@ -1789,7 +1797,6 @@ static int bprm_execve(struct linux_binprm *bprm, int fd, struct filename *filename, int flags) { struct file *file; - struct files_struct *displaced; int retval; /* @@ -1797,13 +1804,9 @@ static int bprm_execve(struct linux_binprm *bprm, */ io_uring_task_cancel(); - retval = unshare_files(&displaced); - if (retval) - return retval; - retval = prepare_bprm_creds(bprm); if (retval) - goto out_files; + return retval; check_unsafe_exec(bprm); current->in_execve = 1; @@ -1818,8 +1821,12 @@ static int bprm_execve(struct linux_binprm *bprm, bprm->file = file; /* * Record that a name derived from an O_CLOEXEC fd will be - * inaccessible after exec. Relies on having exclusive access to - * current->files (due to unshare_files above). + * inaccessible after exec. This allows the code in exec to + * choose to fail when the executable is not mmaped into the + * interpreter and an open file descriptor is not passed to + * the interpreter. This makes for a better user experience + * than having the interpreter start and then immediately fail + * when it finds the executable is inaccessible. */ if (bprm->fdpath && get_close_on_exec(fd)) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; @@ -1839,8 +1846,6 @@ static int bprm_execve(struct linux_binprm *bprm, rseq_execve(current); acct_update_integrals(current); task_numa_free(current, false); - if (displaced) - put_files_struct(displaced); return retval; out: @@ -1857,10 +1862,6 @@ static int bprm_execve(struct linux_binprm *bprm, current->fs->in_exec = 0; current->in_execve = 0; -out_files: - if (displaced) - reset_files_struct(displaced); - return retval; } From 44f79df28b47749be944e0bea86ccd7ad9772636 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:19 -0600 Subject: [PATCH 0407/1565] exec: Simplify unshare_files [ Upstream commit 1f702603e7125a390b5cdf5ce00539781cfcc86a ] Now that exec no longer needs to return the unshared files to their previous value there is no reason to return displaced. Instead when unshare_fd creates a copy of the file table, call put_files_struct before returning from unshare_files. Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-2-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-2-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/coredump.c | 5 +---- fs/exec.c | 5 +---- include/linux/fdtable.h | 2 +- kernel/fork.c | 12 ++++++------ 4 files changed, 9 insertions(+), 15 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 9d91e831ed0b..7b085975ea16 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -590,7 +590,6 @@ void do_coredump(const kernel_siginfo_t *siginfo) int ispipe; size_t *argv = NULL; int argc = 0; - struct files_struct *displaced; /* require nonrelative corefile path and be extra careful */ bool need_suid_safe = false; bool core_dumped = false; @@ -797,11 +796,9 @@ void do_coredump(const kernel_siginfo_t *siginfo) } /* get us an unshared descriptor table; almost always a no-op */ - retval = unshare_files(&displaced); + retval = unshare_files(); if (retval) goto close_fail; - if (displaced) - put_files_struct(displaced); if (!dump_interrupted()) { /* * umh disabled with CONFIG_STATIC_USERMODEHELPER_PATH="" would diff --git a/fs/exec.c b/fs/exec.c index 42952cf90f4a..d5c8f085235b 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1245,7 +1245,6 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec) int begin_new_exec(struct linux_binprm * bprm) { struct task_struct *me = current; - struct files_struct *displaced; int retval; /* Once we are committed compute the creds */ @@ -1266,11 +1265,9 @@ int begin_new_exec(struct linux_binprm * bprm) goto out; /* Ensure the files table is not shared. */ - retval = unshare_files(&displaced); + retval = unshare_files(); if (retval) goto out; - if (displaced) - put_files_struct(displaced); /* * Must be called _before_ exec_mmap() as bprm->mm is diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index f1a99d3e5570..b32ab2163dc2 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -109,7 +109,7 @@ struct task_struct; struct files_struct *get_files_struct(struct task_struct *); void put_files_struct(struct files_struct *fs); void reset_files_struct(struct files_struct *); -int unshare_files(struct files_struct **); +int unshare_files(void); struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; void do_close_on_exec(struct files_struct *); int iterate_fd(struct files_struct *, unsigned, diff --git a/kernel/fork.c b/kernel/fork.c index 633b0af1d1a7..8b8a5a172b15 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3077,21 +3077,21 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) * the exec layer of the kernel. */ -int unshare_files(struct files_struct **displaced) +int unshare_files(void) { struct task_struct *task = current; - struct files_struct *copy = NULL; + struct files_struct *old, *copy = NULL; int error; error = unshare_fd(CLONE_FILES, NR_OPEN_MAX, ©); - if (error || !copy) { - *displaced = NULL; + if (error || !copy) return error; - } - *displaced = task->files; + + old = task->files; task_lock(task); task->files = copy; task_unlock(task); + put_files_struct(old); return 0; } From ba7aac19b4be916b68067bfe09b1b648dd0bdd8f Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:20 -0600 Subject: [PATCH 0408/1565] exec: Remove reset_files_struct [ Upstream commit 950db38ff2c01b7aabbd7ab4a50b7992750fa63d ] Now that exec no longer needs to restore the previous value of current->files on error there are no more callers of reset_files_struct so remove it. Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-3-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-3-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 12 ------------ include/linux/fdtable.h | 1 - 2 files changed, 13 deletions(-) diff --git a/fs/file.c b/fs/file.c index d6bc73960e4a..5065252bb474 100644 --- a/fs/file.c +++ b/fs/file.c @@ -466,18 +466,6 @@ void put_files_struct(struct files_struct *files) } } -void reset_files_struct(struct files_struct *files) -{ - struct task_struct *tsk = current; - struct files_struct *old; - - old = tsk->files; - task_lock(tsk); - tsk->files = files; - task_unlock(tsk); - put_files_struct(old); -} - void exit_files(struct task_struct *tsk) { struct files_struct * files = tsk->files; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index b32ab2163dc2..c0ca6fb3f0f9 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -108,7 +108,6 @@ struct task_struct; struct files_struct *get_files_struct(struct task_struct *); void put_files_struct(struct files_struct *fs); -void reset_files_struct(struct files_struct *); int unshare_files(void); struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; void do_close_on_exec(struct files_struct *); From fe1722255ebd9b7fa3602ea1e86cf79f8844d43e Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:21 -0600 Subject: [PATCH 0409/1565] kcmp: In kcmp_epoll_target use fget_task [ Upstream commit f43c283a89a7dc531a47d4b1e001503cf3dc3234 ] Use the helper fget_task and simplify the code. As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance. Suggested-by: Oleg Nesterov Reviewed-by: Cyrill Gorcunov v1: https://lkml.kernel.org/r/20200817220425.9389-4-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-4-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/kcmp.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/kernel/kcmp.c b/kernel/kcmp.c index c0d2ad9b4705..bd6f9edf98fd 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -107,7 +107,6 @@ static int kcmp_epoll_target(struct task_struct *task1, { struct file *filp, *filp_epoll, *filp_tgt; struct kcmp_epoll_slot slot; - struct files_struct *files; if (copy_from_user(&slot, uslot, sizeof(slot))) return -EFAULT; @@ -116,23 +115,12 @@ static int kcmp_epoll_target(struct task_struct *task1, if (!filp) return -EBADF; - files = get_files_struct(task2); - if (!files) + filp_epoll = fget_task(task2, slot.efd); + if (!filp_epoll) return -EBADF; - spin_lock(&files->file_lock); - filp_epoll = fcheck_files(files, slot.efd); - if (filp_epoll) - get_file(filp_epoll); - else - filp_tgt = ERR_PTR(-EBADF); - spin_unlock(&files->file_lock); - put_files_struct(files); - - if (filp_epoll) { - filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); - fput(filp_epoll); - } + filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); + fput(filp_epoll); if (IS_ERR(filp_tgt)) return PTR_ERR(filp_tgt); From c874ec02cb8ac5edeb1b01b1df41ed23bbbea562 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:22 -0600 Subject: [PATCH 0410/1565] bpf: In bpf_task_fd_query use fget_task [ Upstream commit b48845af0152d790a54b8ab78cc2b7c07485fc98 ] Use the helper fget_task to simplify bpf_task_fd_query. As well as simplifying the code this removes one unnecessary increment of struct files_struct. This unnecessary increment of files_struct.count can result in exec unnecessarily unsharing files_struct and breaking posix locks, and it can result in fget_light having to fallback to fget reducing performance. This simplification comes from the observation that none of the callers of get_files_struct actually need to call get_files_struct that was made when discussing[1] exec and posix file locks. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov v1: https://lkml.kernel.org/r/20200817220425.9389-5-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-5-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/bpf/syscall.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index e1bee8cd3404..fbe7f8e2b022 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3929,7 +3929,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr, pid_t pid = attr->task_fd_query.pid; u32 fd = attr->task_fd_query.fd; const struct perf_event *event; - struct files_struct *files; struct task_struct *task; struct file *file; int err; @@ -3949,23 +3948,11 @@ static int bpf_task_fd_query(const union bpf_attr *attr, if (!task) return -ENOENT; - files = get_files_struct(task); - put_task_struct(task); - if (!files) - return -ENOENT; - err = 0; - spin_lock(&files->file_lock); - file = fcheck_files(files, fd); + file = fget_task(task, fd); + put_task_struct(task); if (!file) - err = -EBADF; - else - get_file(file); - spin_unlock(&files->file_lock); - put_files_struct(files); - - if (err) - goto out; + return -EBADF; if (file->f_op == &bpf_link_fops) { struct bpf_link *link = file->private_data; @@ -4005,7 +3992,6 @@ static int bpf_task_fd_query(const union bpf_attr *attr, err = -ENOTSUPP; put_file: fput(file); -out: return err; } From 4d037e1173b5d26e3f9382c6c917f8447bc37543 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:23 -0600 Subject: [PATCH 0411/1565] proc/fd: In proc_fd_link use fget_task [ Upstream commit 439be32656035d3239fd56f9b83353ec06cb3b45 ] When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Simplifying proc_fd_link is a little bit tricky. It is necessary to know that there is a reference to fd_f ile while path_get is running. This reference can either be guaranteed to exist either by locking the fdtable as the code currently does or by taking a reference on the file in question. Use fget_task to remove the need for get_files_struct and to take a reference to file in question. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov v1: https://lkml.kernel.org/r/20200817220425.9389-8-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-6-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/proc/fd.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 81882a13212d..d58960f6ee52 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -146,29 +146,22 @@ static const struct dentry_operations tid_fd_dentry_operations = { static int proc_fd_link(struct dentry *dentry, struct path *path) { - struct files_struct *files = NULL; struct task_struct *task; int ret = -ENOENT; task = get_proc_task(d_inode(dentry)); if (task) { - files = get_files_struct(task); - put_task_struct(task); - } - - if (files) { unsigned int fd = proc_fd(d_inode(dentry)); struct file *fd_file; - spin_lock(&files->file_lock); - fd_file = fcheck_files(files, fd); + fd_file = fget_task(task, fd); if (fd_file) { *path = fd_file->f_path; path_get(&fd_file->f_path); ret = 0; + fput(fd_file); } - spin_unlock(&files->file_lock); - put_files_struct(files); + put_task_struct(task); } return ret; From e6f42bc11a60571768c96f74a8bdf4fac5624801 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 29 Feb 2024 18:19:36 -0500 Subject: [PATCH 0412/1565] Revert "fget: clarify and improve __fget_files() implementation" Temporarily revert commit 0849f83e4782 ("fget: clarify and improve __fget_files() implementation") to enable subsequent upstream commits to apply and build cleanly. Stable-dep-of: bebf684bf330 ("file: Rename __fcheck_files to files_lookup_fd_raw") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 72 +++++++++++++------------------------------------------ 1 file changed, 16 insertions(+), 56 deletions(-) diff --git a/fs/file.c b/fs/file.c index 5065252bb474..fea693acc065 100644 --- a/fs/file.c +++ b/fs/file.c @@ -849,68 +849,28 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static inline struct file *__fget_files_rcu(struct files_struct *files, - unsigned int fd, fmode_t mask, unsigned int refs) -{ - for (;;) { - struct file *file; - struct fdtable *fdt = rcu_dereference_raw(files->fdt); - struct file __rcu **fdentry; - - if (unlikely(fd >= fdt->max_fds)) - return NULL; - - fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); - file = rcu_dereference_raw(*fdentry); - if (unlikely(!file)) - return NULL; - - if (unlikely(file->f_mode & mask)) - return NULL; - - /* - * Ok, we have a file pointer. However, because we do - * this all locklessly under RCU, we may be racing with - * that file being closed. - * - * Such a race can take two forms: - * - * (a) the file ref already went down to zero, - * and get_file_rcu_many() fails. Just try - * again: - */ - if (unlikely(!get_file_rcu_many(file, refs))) - continue; - - /* - * (b) the file table entry has changed under us. - * Note that we don't need to re-check the 'fdt->fd' - * pointer having changed, because it always goes - * hand-in-hand with 'fdt'. - * - * If so, we need to put our refs and try again. - */ - if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || - unlikely(rcu_dereference_raw(*fdentry) != file)) { - fput_many(file, refs); - continue; - } - - /* - * Ok, we have a ref to the file, and checked that it - * still exists. - */ - return file; - } -} - static struct file *__fget_files(struct files_struct *files, unsigned int fd, fmode_t mask, unsigned int refs) { struct file *file; rcu_read_lock(); - file = __fget_files_rcu(files, fd, mask, refs); +loop: + file = fcheck_files(files, fd); + if (file) { + /* File object ref couldn't be taken. + * dup2() atomicity guarantee is the reason + * we loop to catch the new file (or NULL pointer) + */ + if (file->f_mode & mask) + file = NULL; + else if (!get_file_rcu_many(file, refs)) + goto loop; + else if (__fcheck_files(files, fd) != file) { + fput_many(file, refs); + goto loop; + } + } rcu_read_unlock(); return file; From ddb21f9984209b2c502ed28698918975528721f5 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 10 Dec 2020 12:39:54 -0600 Subject: [PATCH 0413/1565] file: Rename __fcheck_files to files_lookup_fd_raw [ Upstream commit bebf684bf330915e6c96313ad7db89a5480fc9c2 ] The function fcheck despite it's comment is poorly named as it has no callers that only check it's return value. All of fcheck's callers use the returned file descriptor. The same is true for fcheck_files and __fcheck_files. A new less confusing name is needed. In addition the names of these functions are confusing as they do not report the kind of locks that are needed to be held when these functions are called making error prone to use them. To remedy this I am making the base functio name lookup_fd and will and prefixes and sufficies to indicate the rest of the context. Name the function (previously called __fcheck_files) that proceeds from a struct files_struct, looks up the struct file of a file descriptor, and requires it's callers to verify all of the appropriate locks are held files_lookup_fd_raw. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-7-ebiederm@xmission.com Signed-off-by: Eric W. Biederman [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 4 ++-- include/linux/fdtable.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/file.c b/fs/file.c index fea693acc065..eb1e2b7220ac 100644 --- a/fs/file.c +++ b/fs/file.c @@ -866,7 +866,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd, file = NULL; else if (!get_file_rcu_many(file, refs)) goto loop; - else if (__fcheck_files(files, fd) != file) { + else if (files_lookup_fd_raw(files, fd) != file) { fput_many(file, refs); goto loop; } @@ -933,7 +933,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) struct file *file; if (atomic_read(&files->count) == 1) { - file = __fcheck_files(files, fd); + file = files_lookup_fd_raw(files, fd); if (!file || unlikely(file->f_mode & mask)) return 0; return (unsigned long)file; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index c0ca6fb3f0f9..10e75b4c30a4 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -80,7 +80,7 @@ struct dentry; /* * The caller must ensure that fd table isn't shared or hold rcu or file lock */ -static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd) +static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd) { struct fdtable *fdt = rcu_dereference_raw(files->fdt); @@ -96,7 +96,7 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int RCU_LOCKDEP_WARN(!rcu_read_lock_held() && !lockdep_is_held(&files->file_lock), "suspicious rcu_dereference_check() usage"); - return __fcheck_files(files, fd); + return files_lookup_fd_raw(files, fd); } /* From 9080557c56cd673941675f38805356f7f72949fa Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:25 -0600 Subject: [PATCH 0414/1565] file: Factor files_lookup_fd_locked out of fcheck_files [ Upstream commit 120ce2b0cd52abe73e8b16c23461eb14df5a87d8 ] To make it easy to tell where files->file_lock protection is being used when looking up a file create files_lookup_fd_locked. Only allow this function to be called with the file_lock held. Update the callers of fcheck and fcheck_files that are called with the files->file_lock held to call files_lookup_fd_locked instead. Hopefully this makes it easier to quickly understand what is going on. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-8-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 2 +- fs/locks.c | 14 ++++++++------ fs/proc/fd.c | 2 +- include/linux/fdtable.h | 7 +++++++ 4 files changed, 17 insertions(+), 8 deletions(-) diff --git a/fs/file.c b/fs/file.c index eb1e2b7220ac..6a6b03ce4ad6 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1158,7 +1158,7 @@ static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) spin_lock(&files->file_lock); err = expand_files(files, newfd); - file = fcheck(oldfd); + file = files_lookup_fd_locked(files, oldfd); if (unlikely(!file)) goto Ebadf; if (unlikely(err < 0)) { diff --git a/fs/locks.c b/fs/locks.c index cbb5701ce9f3..873f97504bdd 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2536,14 +2536,15 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { + struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(¤t->files->file_lock); - f = fcheck(fd); - spin_unlock(¤t->files->file_lock); + spin_lock(&files->file_lock); + f = files_lookup_fd_locked(files, fd); + spin_unlock(&files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); @@ -2667,14 +2668,15 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { + struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(¤t->files->file_lock); - f = fcheck(fd); - spin_unlock(¤t->files->file_lock); + spin_lock(&files->file_lock); + f = files_lookup_fd_locked(files, fd); + spin_unlock(&files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); diff --git a/fs/proc/fd.c b/fs/proc/fd.c index d58960f6ee52..2cca9bca3b3a 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -35,7 +35,7 @@ static int seq_show(struct seq_file *m, void *v) unsigned int fd = proc_fd(m->private); spin_lock(&files->file_lock); - file = fcheck_files(files, fd); + file = files_lookup_fd_locked(files, fd); if (file) { struct fdtable *fdt = files_fdtable(files); diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 10e75b4c30a4..87be704268d2 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -91,6 +91,13 @@ static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsig return NULL; } +static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd) +{ + RCU_LOCKDEP_WARN(!lockdep_is_held(&files->file_lock), + "suspicious rcu_dereference_check() usage"); + return files_lookup_fd_raw(files, fd); +} + static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) { RCU_LOCKDEP_WARN(!rcu_read_lock_held() && From 23f55649921b78ddc3b24a443e755e546a574288 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:26 -0600 Subject: [PATCH 0415/1565] file: Replace fcheck_files with files_lookup_fd_rcu [ Upstream commit f36c2943274199cb8aef32ac96531ffb7c4b43d0 ] This change renames fcheck_files to files_lookup_fd_rcu. All of the remaining callers take the rcu_read_lock before calling this function so the _rcu suffix is appropriate. This change also tightens up the debug check to verify that all callers hold the rcu_read_lock. All callers that used to call files_check with the files->file_lock held have now been changed to call files_lookup_fd_locked. This change of name has helped remind me of which locks and which guarantees are in place helping me to catch bugs later in the patchset. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-9-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/files.rst | 6 +++--- fs/file.c | 4 ++-- fs/proc/fd.c | 4 ++-- include/linux/fdtable.h | 7 +++---- kernel/bpf/task_iter.c | 2 +- kernel/kcmp.c | 2 +- 6 files changed, 12 insertions(+), 13 deletions(-) diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index cbf8e57376bf..ea75acdb632c 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -62,7 +62,7 @@ the fdtable structure - be held. 4. To look up the file structure given an fd, a reader - must use either fcheck() or fcheck_files() APIs. These + must use either fcheck() or files_lookup_fd_rcu() APIs. These take care of barrier requirements due to lock-free lookup. An example:: @@ -84,7 +84,7 @@ the fdtable structure - on ->f_count:: rcu_read_lock(); - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) { if (atomic_long_inc_not_zero(&file->f_count)) *fput_needed = 1; @@ -104,7 +104,7 @@ the fdtable structure - lock-free, they must be installed using rcu_assign_pointer() API. If they are looked up lock-free, rcu_dereference() must be used. However it is advisable to use files_fdtable() - and fcheck()/fcheck_files() which take care of these issues. + and fcheck()/files_lookup_fd_rcu() which take care of these issues. 7. While updating, the fdtable pointer must be looked up while holding files->file_lock. If ->file_lock is dropped, then diff --git a/fs/file.c b/fs/file.c index 6a6b03ce4ad6..6149f75a18a6 100644 --- a/fs/file.c +++ b/fs/file.c @@ -856,7 +856,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd, rcu_read_lock(); loop: - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) { /* File object ref couldn't be taken. * dup2() atomicity guarantee is the reason @@ -1187,7 +1187,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd) int retval = oldfd; rcu_read_lock(); - if (!fcheck_files(files, oldfd)) + if (!files_lookup_fd_rcu(files, oldfd)) retval = -EBADF; rcu_read_unlock(); return retval; diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 2cca9bca3b3a..3dec44d7c5c5 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -90,7 +90,7 @@ static bool tid_fd_mode(struct task_struct *task, unsigned fd, fmode_t *mode) return false; rcu_read_lock(); - file = fcheck_files(files, fd); + file = files_lookup_fd_rcu(files, fd); if (file) *mode = file->f_mode; rcu_read_unlock(); @@ -243,7 +243,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, char name[10 + 1]; unsigned int len; - f = fcheck_files(files, fd); + f = files_lookup_fd_rcu(files, fd); if (!f) continue; data.mode = f->f_mode; diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 87be704268d2..a45fa2ef723f 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -98,10 +98,9 @@ static inline struct file *files_lookup_fd_locked(struct files_struct *files, un return files_lookup_fd_raw(files, fd); } -static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) +static inline struct file *files_lookup_fd_rcu(struct files_struct *files, unsigned int fd) { - RCU_LOCKDEP_WARN(!rcu_read_lock_held() && - !lockdep_is_held(&files->file_lock), + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious rcu_dereference_check() usage"); return files_lookup_fd_raw(files, fd); } @@ -109,7 +108,7 @@ static inline struct file *fcheck_files(struct files_struct *files, unsigned int /* * Check whether the specified fd has an open file. */ -#define fcheck(fd) fcheck_files(current->files, fd) +#define fcheck(fd) files_lookup_fd_rcu(current->files, fd) struct task_struct; diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index f3d3a562a802..762b4d7c3779 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -185,7 +185,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) for (; curr_fd < max_fds; curr_fd++) { struct file *f; - f = fcheck_files(curr_files, curr_fd); + f = files_lookup_fd_rcu(curr_files, curr_fd); if (!f) continue; if (!get_file_rcu(f)) diff --git a/kernel/kcmp.c b/kernel/kcmp.c index bd6f9edf98fd..5b2435e03047 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -67,7 +67,7 @@ get_file_raw_ptr(struct task_struct *task, unsigned int idx) rcu_read_lock(); if (task->files) - file = fcheck_files(task->files, idx); + file = files_lookup_fd_rcu(task->files, idx); rcu_read_unlock(); task_unlock(task); From c4716bb296504cbc64aeefb370df44e821214c44 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:27 -0600 Subject: [PATCH 0416/1565] file: Rename fcheck lookup_fd_rcu [ Upstream commit 460b4f812a9d473d4b39d87d37844f9fc30a9eb3 ] Also remove the confusing comment about checking if a fd exists. I could not find one instance in the entire kernel that still matches the description or the reason for the name fcheck. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-10-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/files.rst | 6 +++--- arch/powerpc/platforms/cell/spufs/coredump.c | 2 +- fs/notify/dnotify/dnotify.c | 2 +- include/linux/fdtable.h | 8 ++++---- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index ea75acdb632c..bcf84459917f 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -62,7 +62,7 @@ the fdtable structure - be held. 4. To look up the file structure given an fd, a reader - must use either fcheck() or files_lookup_fd_rcu() APIs. These + must use either lookup_fd_rcu() or files_lookup_fd_rcu() APIs. These take care of barrier requirements due to lock-free lookup. An example:: @@ -70,7 +70,7 @@ the fdtable structure - struct file *file; rcu_read_lock(); - file = fcheck(fd); + file = lookup_fd_rcu(fd); if (file) { ... } @@ -104,7 +104,7 @@ the fdtable structure - lock-free, they must be installed using rcu_assign_pointer() API. If they are looked up lock-free, rcu_dereference() must be used. However it is advisable to use files_fdtable() - and fcheck()/files_lookup_fd_rcu() which take care of these issues. + and lookup_fd_rcu()/files_lookup_fd_rcu() which take care of these issues. 7. While updating, the fdtable pointer must be looked up while holding files->file_lock. If ->file_lock is dropped, then diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 026c181a98c5..60b5583e9eaf 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -74,7 +74,7 @@ static struct spu_context *coredump_next_context(int *fd) *fd = n - 1; rcu_read_lock(); - file = fcheck(*fd); + file = lookup_fd_rcu(*fd); ctx = SPUFS_I(file_inode(file))->i_ctx; get_spu_context(ctx); rcu_read_unlock(); diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e45ca6ecba95..e85e13c50d6d 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -327,7 +327,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) } rcu_read_lock(); - f = fcheck(fd); + f = lookup_fd_rcu(fd); rcu_read_unlock(); /* if (f != filp) means that we lost a race and another task/thread diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index a45fa2ef723f..695306cc5337 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -105,10 +105,10 @@ static inline struct file *files_lookup_fd_rcu(struct files_struct *files, unsig return files_lookup_fd_raw(files, fd); } -/* - * Check whether the specified fd has an open file. - */ -#define fcheck(fd) files_lookup_fd_rcu(current->files, fd) +static inline struct file *lookup_fd_rcu(unsigned int fd) +{ + return files_lookup_fd_rcu(current->files, fd); +} struct task_struct; From 32ac87287d0b50eb7d6933e58580fc78198f1f35 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:28 -0600 Subject: [PATCH 0417/1565] file: Implement task_lookup_fd_rcu [ Upstream commit 3a879fb38082125cc0d8aa89b70c7f3a7cdf584b ] As a companion to lookup_fd_rcu implement task_lookup_fd_rcu for querying an arbitrary process about a specific file. Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200818103713.aw46m7vprsy4vlve@wittgenstein Link: https://lkml.kernel.org/r/20201120231441.29911-11-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 15 +++++++++++++++ include/linux/fdtable.h | 2 ++ 2 files changed, 17 insertions(+) diff --git a/fs/file.c b/fs/file.c index 6149f75a18a6..60a3ccba728c 100644 --- a/fs/file.c +++ b/fs/file.c @@ -911,6 +911,21 @@ struct file *fget_task(struct task_struct *task, unsigned int fd) return file; } +struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) +{ + /* Must be called with rcu_read_lock held */ + struct files_struct *files; + struct file *file = NULL; + + task_lock(task); + files = task->files; + if (files) + file = files_lookup_fd_rcu(files, fd); + task_unlock(task); + + return file; +} + /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 695306cc5337..a88f68f74067 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -110,6 +110,8 @@ static inline struct file *lookup_fd_rcu(unsigned int fd) return files_lookup_fd_rcu(current->files, fd); } +struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd); + struct task_struct; struct files_struct *get_files_struct(struct task_struct *); From c2291f7bdf25d655e80da628a2512f3ffb44db9b Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:29 -0600 Subject: [PATCH 0418/1565] proc/fd: In tid_fd_mode use task_lookup_fd_rcu [ Upstream commit 64eb661fda0269276b4c46965832938e3f268268 ] When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Instead of manually coding finding the files struct for a task and then calling files_lookup_fd_rcu, use the helper task_lookup_fd_rcu that combines those to steps. Making the code simpler and removing the need to get a reference on a files_struct. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-7-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/proc/fd.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 3dec44d7c5c5..c1a984f3c4df 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -83,18 +83,13 @@ static const struct file_operations proc_fdinfo_file_operations = { static bool tid_fd_mode(struct task_struct *task, unsigned fd, fmode_t *mode) { - struct files_struct *files = get_files_struct(task); struct file *file; - if (!files) - return false; - rcu_read_lock(); - file = files_lookup_fd_rcu(files, fd); + file = task_lookup_fd_rcu(task, fd); if (file) *mode = file->f_mode; rcu_read_unlock(); - put_files_struct(files); return !!file; } From 6007aeeaefb31b22b66e1f7c4dc5dc49c5ab6e98 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:30 -0600 Subject: [PATCH 0419/1565] kcmp: In get_file_raw_ptr use task_lookup_fd_rcu [ Upstream commit ed77e80e14a3cd55c73848b9e8043020e717ce12 ] Modify get_file_raw_ptr to use task_lookup_fd_rcu. The helper task_lookup_fd_rcu does the work of taking the task lock and verifying that task->files != NULL and then calls files_lookup_fd_rcu. So let use the helper to make a simpler implementation of get_file_raw_ptr. Acked-by: Cyrill Gorcunov Link: https://lkml.kernel.org/r/20201120231441.29911-13-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/kcmp.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/kernel/kcmp.c b/kernel/kcmp.c index 5b2435e03047..5353edfad8e1 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -61,16 +61,11 @@ static int kcmp_ptr(void *v1, void *v2, enum kcmp_type type) static struct file * get_file_raw_ptr(struct task_struct *task, unsigned int idx) { - struct file *file = NULL; + struct file *file; - task_lock(task); rcu_read_lock(); - - if (task->files) - file = files_lookup_fd_rcu(task->files, idx); - + file = task_lookup_fd_rcu(task, idx); rcu_read_unlock(); - task_unlock(task); return file; } From a6da7536e488ae64726350fcd50d14f7309ffeaa Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:31 -0600 Subject: [PATCH 0420/1565] file: Implement task_lookup_next_fd_rcu [ Upstream commit e9a53aeb5e0a838f10fcea74235664e7ad5e6e1a ] As a companion to fget_task and task_lookup_fd_rcu implement task_lookup_next_fd_rcu that will return the struct file for the first file descriptor number that is equal or greater than the fd argument value, or NULL if there is no such struct file. This allows file descriptors of foreign processes to be iterated through safely, without needed to increment the count on files_struct. Some concern[1] has been expressed that this function takes the task_lock for each iteration and thus for each file descriptor. This place where this function will be called in a commonly used code path is for listing /proc//fd. I did some small benchmarks and did not see any measurable performance differences. For ordinary users ls is likely to stat each of the directory entries and tid_fd_mode called from tid_fd_revalidae has always taken the task lock for each file descriptor. So this does not look like it will be a big change in practice. At some point is will probably be worth changing put_files_struct to free files_struct after an rcu grace period so that task_lock won't be needed at all. [1] https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com v1: https://lkml.kernel.org/r/20200817220425.9389-9-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-14-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 21 +++++++++++++++++++++ include/linux/fdtable.h | 1 + 2 files changed, 22 insertions(+) diff --git a/fs/file.c b/fs/file.c index 60a3ccba728c..9fa49e6298fb 100644 --- a/fs/file.c +++ b/fs/file.c @@ -926,6 +926,27 @@ struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) return file; } +struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret_fd) +{ + /* Must be called with rcu_read_lock held */ + struct files_struct *files; + unsigned int fd = *ret_fd; + struct file *file = NULL; + + task_lock(task); + files = task->files; + if (files) { + for (; fd < files_fdtable(files)->max_fds; fd++) { + file = files_lookup_fd_rcu(files, fd); + if (file) + break; + } + } + task_unlock(task); + *ret_fd = fd; + return file; +} + /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index a88f68f74067..b0c6a959c6a0 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -111,6 +111,7 @@ static inline struct file *lookup_fd_rcu(unsigned int fd) } struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd); +struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *fd); struct task_struct; From c0e3f6df04ce00260f1a108d69f6928816d43681 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:32 -0600 Subject: [PATCH 0421/1565] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu [ Upstream commit 5b17b61870e2f4b0a4fdc5c6039fbdb4ffb796df ] When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Using task_lookup_next_fd_rcu simplifies proc_readfd_common, by moving the checking for the maximum file descritor into the generic code, and by remvoing the need for capturing and releasing a reference on files_struct. As task_lookup_fd_rcu may update the fd ctx->pos has been changed to be the fd +2 after task_lookup_fd_rcu returns. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov Tested-by: Andy Lavr v1: https://lkml.kernel.org/r/20200817220425.9389-10-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-15-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/proc/fd.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index c1a984f3c4df..72c1525b4b3e 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -217,7 +217,6 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, instantiate_t instantiate) { struct task_struct *p = get_proc_task(file_inode(file)); - struct files_struct *files; unsigned int fd; if (!p) @@ -225,22 +224,18 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!dir_emit_dots(file, ctx)) goto out; - files = get_files_struct(p); - if (!files) - goto out; rcu_read_lock(); - for (fd = ctx->pos - 2; - fd < files_fdtable(files)->max_fds; - fd++, ctx->pos++) { + for (fd = ctx->pos - 2;; fd++) { struct file *f; struct fd_data data; char name[10 + 1]; unsigned int len; - f = files_lookup_fd_rcu(files, fd); + f = task_lookup_next_fd_rcu(p, &fd); + ctx->pos = fd + 2LL; if (!f) - continue; + break; data.mode = f->f_mode; rcu_read_unlock(); data.fd = fd; @@ -249,13 +244,11 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!proc_fill_cache(file, ctx, name, len, instantiate, p, &data)) - goto out_fd_loop; + goto out; cond_resched(); rcu_read_lock(); } rcu_read_unlock(); -out_fd_loop: - put_files_struct(files); out: put_task_struct(p); return 0; From b4b827da9096162b3cd4e3d7cee6da087c47e6f0 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:34 -0600 Subject: [PATCH 0422/1565] proc/fd: In fdinfo seq_show don't use get_files_struct [ Upstream commit 775e0656b27210ae668e33af00bece858f44576f ] When discussing[1] exec and posix file locks it was realized that none of the callers of get_files_struct fundamentally needed to call get_files_struct, and that by switching them to helper functions instead it will both simplify their code and remove unnecessary increments of files_struct.count. Those unnecessary increments can result in exec unnecessarily unsharing files_struct which breaking posix locks, and it can result in fget_light having to fallback to fget reducing system performance. Instead hold task_lock for the duration that task->files needs to be stable in seq_show. The task_lock was already taken in get_files_struct, and so skipping get_files_struct performs less work overall, and avoids the problems with the files_struct reference count. [1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com Suggested-by: Oleg Nesterov Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-12-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-17-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/proc/fd.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 72c1525b4b3e..cb51763ed554 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -28,9 +28,8 @@ static int seq_show(struct seq_file *m, void *v) if (!task) return -ENOENT; - files = get_files_struct(task); - put_task_struct(task); - + task_lock(task); + files = task->files; if (files) { unsigned int fd = proc_fd(m->private); @@ -47,8 +46,9 @@ static int seq_show(struct seq_file *m, void *v) ret = 0; } spin_unlock(&files->file_lock); - put_files_struct(files); } + task_unlock(task); + put_task_struct(task); if (ret) return ret; @@ -57,6 +57,7 @@ static int seq_show(struct seq_file *m, void *v) (long long)file->f_pos, f_flags, real_mount(file->f_path.mnt)->mnt_id); + /* show_fd_locks() never deferences files so a stale value is safe */ show_fd_locks(m, file, files); if (seq_has_overflowed(m)) goto out; From 89f9e529643ab5b1a98b98e30ac4177d0f7ba1cc Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:35 -0600 Subject: [PATCH 0423/1565] file: Merge __fd_install into fd_install [ Upstream commit d74ba04d919ebe30bf47406819c18c6b50003d92 ] The function __fd_install was added to support binder[1]. With binder fixed[2] there are no more users. As fd_install just calls __fd_install with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files. [1] f869e8a7f753 ("expose a low-level variant of fd_install() for binder") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-14-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-18-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 25 ++++++------------------- include/linux/fdtable.h | 2 -- 2 files changed, 6 insertions(+), 21 deletions(-) diff --git a/fs/file.c b/fs/file.c index 9fa49e6298fb..a80deabe7f7d 100644 --- a/fs/file.c +++ b/fs/file.c @@ -175,7 +175,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr); - /* make sure all __fd_install() have seen resize_in_progress + /* make sure all fd_install() have seen resize_in_progress * or have finished their rcu_read_lock_sched() section. */ if (atomic_read(&files->count) > 1) @@ -198,7 +198,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) rcu_assign_pointer(files->fdt, new_fdt); if (cur_fdt != &files->fdtab) call_rcu(&cur_fdt->rcu, free_fdtable_rcu); - /* coupled with smp_rmb() in __fd_install() */ + /* coupled with smp_rmb() in fd_install() */ smp_wmb(); return 1; } @@ -613,17 +613,13 @@ EXPORT_SYMBOL(put_unused_fd); * It should never happen - if we allow dup2() do it, _really_ bad things * will follow. * - * NOTE: __fd_install() variant is really, really low-level; don't - * use it unless you are forced to by truly lousy API shoved down - * your throat. 'files' *MUST* be either current->files or obtained - * by get_files_struct(current) done by whoever had given it to you, - * or really bad things will happen. Normally you want to use - * fd_install() instead. + * This consumes the "file" refcount, so callers should treat it + * as if they had called fput(file). */ -void __fd_install(struct files_struct *files, unsigned int fd, - struct file *file) +void fd_install(unsigned int fd, struct file *file) { + struct files_struct *files = current->files; struct fdtable *fdt; rcu_read_lock_sched(); @@ -645,15 +641,6 @@ void __fd_install(struct files_struct *files, unsigned int fd, rcu_read_unlock_sched(); } -/* - * This consumes the "file" refcount, so callers should treat it - * as if they had called fput(file). - */ -void fd_install(unsigned int fd, struct file *file) -{ - __fd_install(current->files, fd, file); -} - EXPORT_SYMBOL(fd_install); static struct file *pick_file(struct files_struct *files, unsigned fd) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index b0c6a959c6a0..6e8743a4c9d3 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -126,8 +126,6 @@ int iterate_fd(struct files_struct *, unsigned, extern int __alloc_fd(struct files_struct *files, unsigned start, unsigned end, unsigned flags); -extern void __fd_install(struct files_struct *files, - unsigned int fd, struct file *file); extern int __close_fd(struct files_struct *files, unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); From 9e8ef54ca890eda6f1993405684a68cc8dff06e7 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:36 -0600 Subject: [PATCH 0424/1565] file: In f_dupfd read RLIMIT_NOFILE once. Simplify the code, and remove the chance of races by reading RLIMIT_NOFILE only once in f_dupfd. Pass the read value of RLIMIT_NOFILE into alloc_fd which is the other location the rlimit was read in f_dupfd. As f_dupfd is the only caller of alloc_fd this changing alloc_fd is trivially safe. Further this causes alloc_fd to take all of the same arguments as __alloc_fd except for the files_struct argument. Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-15-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-19-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/file.c b/fs/file.c index a80deabe7f7d..9e2b171b9252 100644 --- a/fs/file.c +++ b/fs/file.c @@ -567,9 +567,9 @@ int __alloc_fd(struct files_struct *files, return error; } -static int alloc_fd(unsigned start, unsigned flags) +static int alloc_fd(unsigned start, unsigned end, unsigned flags) { - return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags); + return __alloc_fd(current->files, start, end, flags); } int __get_unused_fd_flags(unsigned flags, unsigned long nofile) @@ -1235,10 +1235,11 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes) int f_dupfd(unsigned int from, struct file *file, unsigned flags) { + unsigned long nofile = rlimit(RLIMIT_NOFILE); int err; - if (from >= rlimit(RLIMIT_NOFILE)) + if (from >= nofile) return -EINVAL; - err = alloc_fd(from, flags); + err = alloc_fd(from, nofile, flags); if (err >= 0) { get_file(file); fd_install(err, file); From 7458c5ae465ef8f052e5bb5a7c4755e9c755d752 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:37 -0600 Subject: [PATCH 0425/1565] file: Merge __alloc_fd into alloc_fd [ Upstream commit aa384d10f3d06d4b85597ff5df41551262220e16 ] The function __alloc_fd was added to support binder[1]. With binder fixed[2] there are no more users. As alloc_fd just calls __alloc_fd with "files=current->files", merge them together by transforming the files parameter into a local variable initialized to current->files. [1] dcfadfa4ec5a ("new helper: __alloc_fd()") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-16-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-20-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 11 +++-------- include/linux/fdtable.h | 2 -- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/fs/file.c b/fs/file.c index 9e2b171b9252..48d0306e42cc 100644 --- a/fs/file.c +++ b/fs/file.c @@ -509,9 +509,9 @@ static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) /* * allocate a file descriptor, mark it busy. */ -int __alloc_fd(struct files_struct *files, - unsigned start, unsigned end, unsigned flags) +static int alloc_fd(unsigned start, unsigned end, unsigned flags) { + struct files_struct *files = current->files; unsigned int fd; int error; struct fdtable *fdt; @@ -567,14 +567,9 @@ int __alloc_fd(struct files_struct *files, return error; } -static int alloc_fd(unsigned start, unsigned end, unsigned flags) -{ - return __alloc_fd(current->files, start, end, flags); -} - int __get_unused_fd_flags(unsigned flags, unsigned long nofile) { - return __alloc_fd(current->files, 0, nofile, flags); + return alloc_fd(0, nofile, flags); } int get_unused_fd_flags(unsigned flags) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 6e8743a4c9d3..d26b884fcc5c 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -124,8 +124,6 @@ int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), const void *); -extern int __alloc_fd(struct files_struct *files, - unsigned start, unsigned end, unsigned flags); extern int __close_fd(struct files_struct *files, unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); From 6d256a904cd700fb374f689697901686bf3e9bdd Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:38 -0600 Subject: [PATCH 0426/1565] file: Rename __close_fd to close_fd and remove the files parameter [ Upstream commit 8760c909f54a82aaa6e76da19afe798a0c77c3c3 ] The function __close_fd was added to support binder[1]. Now that binder has been fixed to no longer need __close_fd[2] all calls to __close_fd pass current->files. Therefore transform the files parameter into a local variable initialized to current->files, and rename __close_fd to close_fd to reflect this change, and keep it in sync with the similar changes to __alloc_fd, and __fd_install. This removes the need for callers to care about the extra care that needs to be take if anything except current->files is passed, by limiting the callers to only operation on current->files. [1] 483ce1d4b8c3 ("take descriptor-related part of close() to file.c") [2] 44d8047f1d87 ("binder: use standard functions to allocate fds") Acked-by: Christian Brauner v1: https://lkml.kernel.org/r/20200817220425.9389-17-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20201120231441.29911-21-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/file.c | 10 ++++------ fs/open.c | 2 +- include/linux/fdtable.h | 3 +-- include/linux/syscalls.h | 6 +++--- 4 files changed, 9 insertions(+), 12 deletions(-) diff --git a/fs/file.c b/fs/file.c index 48d0306e42cc..fdb84a64724b 100644 --- a/fs/file.c +++ b/fs/file.c @@ -659,11 +659,9 @@ static struct file *pick_file(struct files_struct *files, unsigned fd) return file; } -/* - * The same warnings as for __alloc_fd()/__fd_install() apply here... - */ -int __close_fd(struct files_struct *files, unsigned fd) +int close_fd(unsigned fd) { + struct files_struct *files = current->files; struct file *file; file = pick_file(files, fd); @@ -672,7 +670,7 @@ int __close_fd(struct files_struct *files, unsigned fd) return filp_close(file, files); } -EXPORT_SYMBOL(__close_fd); /* for ksys_close() */ +EXPORT_SYMBOL(close_fd); /* for ksys_close() */ /** * __close_range() - Close all file descriptors in a given range. @@ -1087,7 +1085,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) struct files_struct *files = current->files; if (!file) - return __close_fd(files, fd); + return close_fd(fd); if (fd >= rlimit(RLIMIT_NOFILE)) return -EBADF; diff --git a/fs/open.c b/fs/open.c index 83f62cf1432c..48933cbb7539 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1310,7 +1310,7 @@ EXPORT_SYMBOL(filp_close); */ SYSCALL_DEFINE1(close, unsigned int, fd) { - int retval = __close_fd(current->files, fd); + int retval = close_fd(fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index d26b884fcc5c..4ed3589f9294 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -124,8 +124,7 @@ int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), const void *); -extern int __close_fd(struct files_struct *files, - unsigned int fd); +extern int close_fd(unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); extern int close_fd_get_file(unsigned int fd, struct file **res); extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 17a24e1180da..0bc3dd86f9e5 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1320,16 +1320,16 @@ static inline long ksys_ftruncate(unsigned int fd, loff_t length) return do_sys_ftruncate(fd, length, 1); } -extern int __close_fd(struct files_struct *files, unsigned int fd); +extern int close_fd(unsigned int fd); /* * In contrast to sys_close(), this stub does not check whether the syscall * should or should not be restarted, but returns the raw error codes from - * __close_fd(). + * close_fd(). */ static inline int ksys_close(unsigned int fd) { - return __close_fd(current->files, fd); + return close_fd(fd); } extern long do_sys_truncate(const char __user *pathname, loff_t length); From 1aecdaa7e2c6619a7d2c0a81c8f5c06e52f870f3 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 20 Nov 2020 17:14:39 -0600 Subject: [PATCH 0427/1565] file: Replace ksys_close with close_fd [ Upstream commit 1572bfdf21d4d50e51941498ffe0b56c2289f783 ] Now that ksys_close is exactly identical to close_fd replace the one caller of ksys_close with close_fd. [1] https://lkml.kernel.org/r/20200818112020.GA17080@infradead.org Suggested-by: Christoph Hellwig Link: https://lkml.kernel.org/r/20201120231441.29911-22-ebiederm@xmission.com Signed-off-by: Eric W. Biederman Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/autofs/dev-ioctl.c | 5 +++-- include/linux/syscalls.h | 12 ------------ 2 files changed, 3 insertions(+), 14 deletions(-) diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c index 322b7dfb4ea0..5bf781ea6d67 100644 --- a/fs/autofs/dev-ioctl.c +++ b/fs/autofs/dev-ioctl.c @@ -4,9 +4,10 @@ * Copyright 2008 Ian Kent */ +#include #include #include -#include +#include #include #include @@ -289,7 +290,7 @@ static int autofs_dev_ioctl_closemount(struct file *fp, struct autofs_sb_info *sbi, struct autofs_dev_ioctl *param) { - return ksys_close(param->ioctlfd); + return close_fd(param->ioctlfd); } /* diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0bc3dd86f9e5..1ea422b1a9f1 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1320,18 +1320,6 @@ static inline long ksys_ftruncate(unsigned int fd, loff_t length) return do_sys_ftruncate(fd, length, 1); } -extern int close_fd(unsigned int fd); - -/* - * In contrast to sys_close(), this stub does not check whether the syscall - * should or should not be restarted, but returns the raw error codes from - * close_fd(). - */ -static inline int ksys_close(unsigned int fd) -{ - return close_fd(fd); -} - extern long do_sys_truncate(const char __user *pathname, loff_t length); static inline long ksys_truncate(const char __user *pathname, loff_t length) From 185e81a977d1ca0adc83aa9a1b8c6564772a791c Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Sun, 8 Nov 2020 22:59:31 -0500 Subject: [PATCH 0428/1565] inotify: Increase default inotify.max_user_watches limit to 1048576 [ Upstream commit 92890123749bafc317bbfacbe0a62ce08d78efb7 ] The default value of inotify.max_user_watches sysctl parameter was set to 8192 since the introduction of the inotify feature in 2005 by commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too small for many modern usage. As a result, users have to explicitly set it to a larger value to make it work. After some searching around the web, these are the inotify.max_user_watches values used by some projects: - vscode: 524288 - dropbox support: 100000 - users on stackexchange: 12228 - lsyncd user: 2000000 - code42 support: 1048576 - monodevelop: 16384 - tectonic: 524288 - openshift origin: 65536 Each watch point adds an inotify_inode_mark structure to an inode to be watched. It also pins the watched inode. Modeled after the epoll.max_user_watches behavior to adjust the default value according to the amount of addressable memory available, make inotify.max_user_watches behave in a similar way to make it use no more than 1% of addressable memory within the range [8192, 1048576]. We estimate the amount of memory used by inotify mark to size of inotify_inode_mark plus two times the size of struct inode (we double the inode size to cover the additional filesystem private inode part). That means that a 64-bit system with 128GB or more memory will likely have the maximum value of 1048576 for inotify.max_user_watches. This default should be big enough for most use cases. Link: https://lore.kernel.org/r/20201109035931.4740-1-longman@redhat.com Reviewed-by: Amir Goldstein Signed-off-by: Waiman Long Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/inotify/inotify_user.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 32b6b97021be..b564a32403aa 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -37,6 +37,15 @@ #include +/* + * An inotify watch requires allocating an inotify_inode_mark structure as + * well as pinning the watched inode. Doubling the size of a VFS inode + * should be more than enough to cover the additional filesystem inode + * size increase. + */ +#define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \ + 2 * sizeof(struct inode)) + /* configurable via /proc/sys/fs/inotify/ */ static int inotify_max_queued_events __read_mostly; @@ -797,6 +806,18 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd) */ static int __init inotify_user_setup(void) { + unsigned long watches_max; + struct sysinfo si; + + si_meminfo(&si); + /* + * Allow up to 1% of addressable memory to be allocated for inotify + * watches (per user) limited to the range [8192, 1048576]. + */ + watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / + INOTIFY_WATCH_COST; + watches_max = clamp(watches_max, 8192UL, 1048576UL); + BUILD_BUG_ON(IN_ACCESS != FS_ACCESS); BUILD_BUG_ON(IN_MODIFY != FS_MODIFY); BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB); @@ -823,7 +844,7 @@ static int __init inotify_user_setup(void) inotify_max_queued_events = 16384; init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; - init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192; + init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max; return 0; } From 131676b8240fdded867167b3501b4e53897cc9f4 Mon Sep 17 00:00:00 2001 From: Zheng Yongjun Date: Fri, 11 Dec 2020 16:41:58 +0800 Subject: [PATCH 0429/1565] fs/lockd: convert comma to semicolon [ Upstream commit 3316fb80a0b4c1fef03a3eb1a7f0651e2133c429 ] Replace a comma between expression statements by a semicolon. Signed-off-by: Zheng Yongjun Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lockd/host.c b/fs/lockd/host.c index 771c289f6df7..f802223e71ab 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -163,7 +163,7 @@ static struct nlm_host *nlm_alloc_host(struct nlm_lookup_host_info *ni, host->h_nsmhandle = nsm; host->h_addrbuf = nsm->sm_addrbuf; host->net = ni->net; - host->h_cred = get_cred(ni->cred), + host->h_cred = get_cred(ni->cred); strlcpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); out: From bca0057f686bff5999896e5780dfcbce4b9fbc45 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 18 Dec 2020 12:28:23 -0500 Subject: [PATCH 0430/1565] NFSD: Fix sparse warning in nfssvc.c [ Upstream commit d6c9e4368cc6a61bf25c9c72437ced509c854563 ] fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static? The parameter was added by commit ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy"). Relocate it into the source file that uses it, and make it static. This approach is similar to the nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable module parameters. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 5 +++++ fs/nfsd/nfssvc.c | 6 ------ fs/nfsd/xdr4.h | 1 - 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 6b06f0ad0561..1ef98398362a 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -50,6 +50,11 @@ #include "pnfs.h" #include "trace.h" +static bool inter_copy_offload_enable; +module_param(inter_copy_offload_enable, bool, 0644); +MODULE_PARM_DESC(inter_copy_offload_enable, + "Enable inter server to server copy offload. Default: false"); + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 3fb9607d67a3..423410cc0214 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -33,12 +33,6 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC -bool inter_copy_offload_enable; -EXPORT_SYMBOL_GPL(inter_copy_offload_enable); -module_param(inter_copy_offload_enable, bool, 0644); -MODULE_PARM_DESC(inter_copy_offload_enable, - "Enable inter server to server copy offload. Default: false"); - extern struct svc_program nfsd_program; static int nfsd(void *vrqstp); #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index a60ff5ce1a37..c300885ae75d 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -568,7 +568,6 @@ struct nfsd4_copy { struct nfs_fh c_fh; nfs4_stateid stateid; }; -extern bool inter_copy_offload_enable; struct nfsd4_seek { /* request */ From 5b82798f78f9861b01c6c725c062de91f80ad489 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 18 Dec 2020 12:28:58 -0500 Subject: [PATCH 0431/1565] NFSD: Restore NFSv4 decoding's SAVEMEM functionality [ Upstream commit 7b723008f9c95624c848fad661c01b06e47b20da ] While converting the NFSv4 decoder to use xdr_stream-based XDR processing, I removed the old SAVEMEM() macro. This macro wrapped a bit of logic that avoided a memory allocation by recognizing when the decoded item resides in a linear section of the Receive buffer. In that case, it returned a pointer into that buffer instead of allocating a bounce buffer. The bounce buffer is necessary only when xdr_inline_decode() has placed the decoded item in the xdr_stream's scratch buffer, which disappears the next time xdr_inline_decode() is called with that xdr_stream. That happens only if the data item crosses a page boundary in the receive buffer, an exceedingly rare occurrence. Allocating a bounce buffer every time results in a minor performance regression that was introduced by the recent NFSv4 decoder overhaul. Let's restore the previous behavior. On average, it saves about 1.5 kmalloc() calls per COMPOUND. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index bb037e8eb830..8d5fdae568ae 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -147,6 +147,25 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; } +static void * +svcxdr_savemem(struct nfsd4_compoundargs *argp, __be32 *p, u32 len) +{ + __be32 *tmp; + + /* + * The location of the decoded data item is stable, + * so @p is OK to use. This is the common case. + */ + if (p != argp->xdr->scratch.iov_base) + return p; + + tmp = svcxdr_tmpalloc(argp, len); + if (!tmp) + return NULL; + memcpy(tmp, p, len); + return tmp; +} + /* * NFSv4 basic data type decoders */ @@ -183,11 +202,10 @@ nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) p = xdr_inline_decode(argp->xdr, len); if (!p) return nfserr_bad_xdr; - o->data = svcxdr_tmpalloc(argp, len); + o->data = svcxdr_savemem(argp, p, len); if (!o->data) return nfserr_jukebox; o->len = len; - memcpy(o->data, p, len); return nfs_ok; } @@ -205,10 +223,9 @@ nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) status = check_filename((char *)p, *lenp); if (status) return status; - *namp = svcxdr_tmpalloc(argp, *lenp); + *namp = svcxdr_savemem(argp, p, *lenp); if (!*namp) return nfserr_jukebox; - memcpy(*namp, p, *lenp); return nfs_ok; } @@ -1200,10 +1217,9 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) p = xdr_inline_decode(argp->xdr, putfh->pf_fhlen); if (!p) return nfserr_bad_xdr; - putfh->pf_fhval = svcxdr_tmpalloc(argp, putfh->pf_fhlen); + putfh->pf_fhval = svcxdr_savemem(argp, p, putfh->pf_fhlen); if (!putfh->pf_fhval) return nfserr_jukebox; - memcpy(putfh->pf_fhval, p, putfh->pf_fhlen); return nfs_ok; } @@ -1318,24 +1334,20 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient p = xdr_inline_decode(argp->xdr, setclientid->se_callback_netid_len); if (!p) return nfserr_bad_xdr; - setclientid->se_callback_netid_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_netid_val = svcxdr_savemem(argp, p, setclientid->se_callback_netid_len); if (!setclientid->se_callback_netid_val) return nfserr_jukebox; - memcpy(setclientid->se_callback_netid_val, p, - setclientid->se_callback_netid_len); if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_addr_len) < 0) return nfserr_bad_xdr; p = xdr_inline_decode(argp->xdr, setclientid->se_callback_addr_len); if (!p) return nfserr_bad_xdr; - setclientid->se_callback_addr_val = svcxdr_tmpalloc(argp, + setclientid->se_callback_addr_val = svcxdr_savemem(argp, p, setclientid->se_callback_addr_len); if (!setclientid->se_callback_addr_val) return nfserr_jukebox; - memcpy(setclientid->se_callback_addr_val, p, - setclientid->se_callback_addr_len); if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_ident) < 0) return nfserr_bad_xdr; @@ -1375,10 +1387,9 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify p = xdr_inline_decode(argp->xdr, verify->ve_attrlen); if (!p) return nfserr_bad_xdr; - verify->ve_attrval = svcxdr_tmpalloc(argp, verify->ve_attrlen); + verify->ve_attrval = svcxdr_savemem(argp, p, verify->ve_attrlen); if (!verify->ve_attrval) return nfserr_jukebox; - memcpy(verify->ve_attrval, p, verify->ve_attrlen); return nfs_ok; } @@ -2333,10 +2344,9 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) p = xdr_inline_decode(argp->xdr, argp->taglen); if (!p) return 0; - argp->tag = svcxdr_tmpalloc(argp, argp->taglen); + argp->tag = svcxdr_savemem(argp, p, argp->taglen); if (!argp->tag) return 0; - memcpy(argp->tag, p, argp->taglen); max_reply += xdr_align_size(argp->taglen); } From c336597d03ec4f3ce270312a064309bc9dc316ac Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 17 Sep 2020 17:22:49 -0400 Subject: [PATCH 0432/1565] SUNRPC: Make trace_svc_process() display the RPC procedure symbolically [ Upstream commit 2289e87b5951f97783f07fc895e6c5e804b53668 ] The next few patches will employ these strings to help make server- side trace logs more human-readable. A similar technique is already in use in kernel RPC client code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 24 ++++++++++++++++++++++++ fs/lockd/svcproc.c | 24 ++++++++++++++++++++++++ fs/nfs/callback_xdr.c | 2 ++ fs/nfsd/nfs2acl.c | 5 +++++ fs/nfsd/nfs3acl.c | 3 +++ fs/nfsd/nfs3proc.c | 22 ++++++++++++++++++++++ fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/nfsproc.c | 18 ++++++++++++++++++ include/linux/sunrpc/svc.h | 1 + 9 files changed, 101 insertions(+) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index fa41dda39925..4c10fb5138f1 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -512,6 +512,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlm4svc_proc_test, @@ -520,6 +521,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, + .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlm4svc_proc_lock, @@ -528,6 +530,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlm4svc_proc_cancel, @@ -536,6 +539,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlm4svc_proc_unlock, @@ -544,6 +548,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlm4svc_proc_granted, @@ -552,6 +557,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlm4svc_proc_test_msg, @@ -560,6 +566,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlm4svc_proc_lock_msg, @@ -568,6 +575,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlm4svc_proc_cancel_msg, @@ -576,6 +584,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlm4svc_proc_unlock_msg, @@ -584,6 +593,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlm4svc_proc_granted_msg, @@ -592,6 +602,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlm4svc_proc_null, @@ -600,6 +611,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlm4svc_proc_null, @@ -608,6 +620,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlm4svc_proc_null, @@ -616,6 +629,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlm4svc_proc_null, @@ -624,6 +638,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlm4svc_proc_granted_res, @@ -632,6 +647,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlm4svc_proc_sm_notify, @@ -640,6 +656,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlm4svc_proc_unused, @@ -648,6 +665,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [18] = { .pc_func = nlm4svc_proc_unused, @@ -656,6 +674,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [19] = { .pc_func = nlm4svc_proc_unused, @@ -664,6 +683,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlm4svc_proc_share, @@ -672,6 +692,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlm4svc_proc_unshare, @@ -680,6 +701,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlm4svc_proc_nm_lock, @@ -688,6 +710,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlm4svc_proc_free_all, @@ -696,5 +719,6 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "FREE_ALL", }, }; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 50855f2c1f4b..4ae4b63b5392 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -554,6 +554,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlmsvc_proc_test, @@ -562,6 +563,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, + .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlmsvc_proc_lock, @@ -570,6 +572,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlmsvc_proc_cancel, @@ -578,6 +581,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlmsvc_proc_unlock, @@ -586,6 +590,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlmsvc_proc_granted, @@ -594,6 +599,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlmsvc_proc_test_msg, @@ -602,6 +608,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlmsvc_proc_lock_msg, @@ -610,6 +617,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlmsvc_proc_cancel_msg, @@ -618,6 +626,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlmsvc_proc_unlock_msg, @@ -626,6 +635,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlmsvc_proc_granted_msg, @@ -634,6 +644,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlmsvc_proc_null, @@ -642,6 +653,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlmsvc_proc_null, @@ -650,6 +662,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlmsvc_proc_null, @@ -658,6 +671,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlmsvc_proc_null, @@ -666,6 +680,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlmsvc_proc_granted_res, @@ -674,6 +689,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlmsvc_proc_sm_notify, @@ -682,6 +698,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlmsvc_proc_unused, @@ -690,6 +707,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [18] = { .pc_func = nlmsvc_proc_unused, @@ -698,6 +716,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [19] = { .pc_func = nlmsvc_proc_unused, @@ -706,6 +725,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, + .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlmsvc_proc_share, @@ -714,6 +734,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlmsvc_proc_unshare, @@ -722,6 +743,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, + .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlmsvc_proc_nm_lock, @@ -730,6 +752,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, + .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlmsvc_proc_free_all, @@ -738,5 +761,6 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_argsize = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, + .pc_name = "FREE_ALL", }, }; diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index ca8a4aa351dc..f9dfd4e712a3 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1056,6 +1056,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_decode = nfs4_decode_void, .pc_encode = nfs4_encode_void, .pc_xdrressize = 1, + .pc_name = "NULL", }, [CB_COMPOUND] = { .pc_func = nfs4_callback_compound, @@ -1063,6 +1064,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_argsize = 256, .pc_ressize = 256, .pc_xdrressize = NFS4_CALLBACK_BUFSIZE, + .pc_name = "COMPOUND", } }; diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index b0f66604532a..899762da23c9 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -371,6 +371,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [ACLPROC2_GETACL] = { .pc_func = nfsacld_proc_getacl, @@ -381,6 +382,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), + .pc_name = "GETACL", }, [ACLPROC2_SETACL] = { .pc_func = nfsacld_proc_setacl, @@ -391,6 +393,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "SETACL", }, [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, @@ -401,6 +404,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [ACLPROC2_ACCESS] = { .pc_func = nfsacld_proc_access, @@ -411,6 +415,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1, + .pc_name = "SETATTR", }, }; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 7c30876a31a1..9e1a92fb9771 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -251,6 +251,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [ACLPROC3_GETACL] = { .pc_func = nfsd3_proc_getacl, @@ -261,6 +262,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), + .pc_name = "GETACL", }, [ACLPROC3_SETACL] = { .pc_func = nfsd3_proc_setacl, @@ -271,6 +273,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_ressize = sizeof(struct nfsd3_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT, + .pc_name = "SETACL", }, }; diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 3257233d1a65..0f79e007c620 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -713,6 +713,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, + .pc_name = "NULL", }, [NFS3PROC_GETATTR] = { .pc_func = nfsd3_proc_getattr, @@ -723,6 +724,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [NFS3PROC_SETATTR] = { .pc_func = nfsd3_proc_setattr, @@ -733,6 +735,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "SETATTR", }, [NFS3PROC_LOOKUP] = { .pc_func = nfsd3_proc_lookup, @@ -743,6 +746,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+pAT+pAT, + .pc_name = "LOOKUP", }, [NFS3PROC_ACCESS] = { .pc_func = nfsd3_proc_access, @@ -753,6 +757,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1, + .pc_name = "ACCESS", }, [NFS3PROC_READLINK] = { .pc_func = nfsd3_proc_readlink, @@ -763,6 +768,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, + .pc_name = "READLINK", }, [NFS3PROC_READ] = { .pc_func = nfsd3_proc_read, @@ -773,6 +779,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4, + .pc_name = "READ", }, [NFS3PROC_WRITE] = { .pc_func = nfsd3_proc_write, @@ -783,6 +790,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_writeres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+4, + .pc_name = "WRITE", }, [NFS3PROC_CREATE] = { .pc_func = nfsd3_proc_create, @@ -793,6 +801,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "CREATE", }, [NFS3PROC_MKDIR] = { .pc_func = nfsd3_proc_mkdir, @@ -803,6 +812,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "MKDIR", }, [NFS3PROC_SYMLINK] = { .pc_func = nfsd3_proc_symlink, @@ -813,6 +823,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "SYMLINK", }, [NFS3PROC_MKNOD] = { .pc_func = nfsd3_proc_mknod, @@ -823,6 +834,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, + .pc_name = "MKNOD", }, [NFS3PROC_REMOVE] = { .pc_func = nfsd3_proc_remove, @@ -833,6 +845,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "REMOVE", }, [NFS3PROC_RMDIR] = { .pc_func = nfsd3_proc_rmdir, @@ -843,6 +856,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, + .pc_name = "RMDIR", }, [NFS3PROC_RENAME] = { .pc_func = nfsd3_proc_rename, @@ -853,6 +867,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_renameres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+WC, + .pc_name = "RENAME", }, [NFS3PROC_LINK] = { .pc_func = nfsd3_proc_link, @@ -863,6 +878,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_linkres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+pAT+WC, + .pc_name = "LINK", }, [NFS3PROC_READDIR] = { .pc_func = nfsd3_proc_readdir, @@ -872,6 +888,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_argsize = sizeof(struct nfsd3_readdirargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIR", }, [NFS3PROC_READDIRPLUS] = { .pc_func = nfsd3_proc_readdirplus, @@ -881,6 +898,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_argsize = sizeof(struct nfsd3_readdirplusargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIRPLUS", }, [NFS3PROC_FSSTAT] = { .pc_func = nfsd3_proc_fsstat, @@ -890,6 +908,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_fsstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+2*6+1, + .pc_name = "FSSTAT", }, [NFS3PROC_FSINFO] = { .pc_func = nfsd3_proc_fsinfo, @@ -899,6 +918,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_fsinfores), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+12, + .pc_name = "FSINFO", }, [NFS3PROC_PATHCONF] = { .pc_func = nfsd3_proc_pathconf, @@ -908,6 +928,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_pathconfres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+6, + .pc_name = "PATHCONF", }, [NFS3PROC_COMMIT] = { .pc_func = nfsd3_proc_commit, @@ -918,6 +939,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_ressize = sizeof(struct nfsd3_commitres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+WC+2, + .pc_name = "COMMIT", }, }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1ef98398362a..a5e1f5c1a4d6 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3299,6 +3299,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, + .pc_name = "NULL", }, [NFSPROC4_COMPOUND] = { .pc_func = nfsd4_proc_compound, @@ -3309,6 +3310,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, .pc_xdrressize = NFSD_BUFSIZE/4, + .pc_name = "COMPOUND", }, }; diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index dbd8d3604653..f22f70f63b53 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -623,6 +623,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "NULL", }, [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, @@ -633,6 +634,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, + .pc_name = "GETATTR", }, [NFSPROC_SETATTR] = { .pc_func = nfsd_proc_setattr, @@ -643,6 +645,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, + .pc_name = "SETATTR", }, [NFSPROC_ROOT] = { .pc_func = nfsd_proc_root, @@ -652,6 +655,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "ROOT", }, [NFSPROC_LOOKUP] = { .pc_func = nfsd_proc_lookup, @@ -662,6 +666,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+AT, + .pc_name = "LOOKUP", }, [NFSPROC_READLINK] = { .pc_func = nfsd_proc_readlink, @@ -671,6 +676,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, + .pc_name = "READLINK", }, [NFSPROC_READ] = { .pc_func = nfsd_proc_read, @@ -681,6 +687,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4, + .pc_name = "READ", }, [NFSPROC_WRITECACHE] = { .pc_func = nfsd_proc_writecache, @@ -690,6 +697,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, + .pc_name = "WRITECACHE", }, [NFSPROC_WRITE] = { .pc_func = nfsd_proc_write, @@ -700,6 +708,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, + .pc_name = "WRITE", }, [NFSPROC_CREATE] = { .pc_func = nfsd_proc_create, @@ -710,6 +719,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, + .pc_name = "CREATE", }, [NFSPROC_REMOVE] = { .pc_func = nfsd_proc_remove, @@ -719,6 +729,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "REMOVE", }, [NFSPROC_RENAME] = { .pc_func = nfsd_proc_rename, @@ -728,6 +739,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "RENAME", }, [NFSPROC_LINK] = { .pc_func = nfsd_proc_link, @@ -737,6 +749,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "LINK", }, [NFSPROC_SYMLINK] = { .pc_func = nfsd_proc_symlink, @@ -746,6 +759,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "SYMLINK", }, [NFSPROC_MKDIR] = { .pc_func = nfsd_proc_mkdir, @@ -756,6 +770,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, + .pc_name = "MKDIR", }, [NFSPROC_RMDIR] = { .pc_func = nfsd_proc_rmdir, @@ -765,6 +780,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, + .pc_name = "RMDIR", }, [NFSPROC_READDIR] = { .pc_func = nfsd_proc_readdir, @@ -773,6 +789,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_argsize = sizeof(struct nfsd_readdirargs), .pc_ressize = sizeof(struct nfsd_readdirres), .pc_cachetype = RC_NOCACHE, + .pc_name = "READDIR", }, [NFSPROC_STATFS] = { .pc_func = nfsd_proc_statfs, @@ -782,6 +799,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_ressize = sizeof(struct nfsd_statfsres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+5, + .pc_name = "STATFS", }, }; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 34c2a69820e9..31ee3b6047c3 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -463,6 +463,7 @@ struct svc_procedure { unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ + const char * pc_name; /* for display */ }; /* From 97d254cba30d04c968080ed5018e0b2bba7ec430 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Dec 2020 10:22:09 -0500 Subject: [PATCH 0433/1565] SUNRPC: Display RPC procedure names instead of proc numbers [ Upstream commit 89ff87494c6e4b32ea7960d0c644efdbb2fe6ef5 ] Make the sunrpc trace subsystem trace events easier to use. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/trace/events/sunrpc.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 8220369ee610..200978b94a0b 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1578,6 +1578,7 @@ TRACE_EVENT(svc_process, __field(u32, vers) __field(u32, proc) __string(service, name) + __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)") ), @@ -1587,13 +1588,16 @@ TRACE_EVENT(svc_process, __entry->vers = rqst->rq_vers; __entry->proc = rqst->rq_proc; __assign_str(service, name); + __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)"); ), - TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%u", + TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%s", __get_str(addr), __entry->xid, - __get_str(service), __entry->vers, __entry->proc) + __get_str(service), __entry->vers, + __get_str(procedure) + ) ); DECLARE_EVENT_CLASS(svc_rqst_event, @@ -1849,6 +1853,7 @@ TRACE_EVENT(svc_stats_latency, TP_STRUCT__entry( __field(u32, xid) __field(unsigned long, execute) + __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt->xpt_remotebuf) ), @@ -1856,11 +1861,13 @@ TRACE_EVENT(svc_stats_latency, __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->execute = ktime_to_us(ktime_sub(ktime_get(), rqst->rq_stime)); + __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt->xpt_remotebuf); ), - TP_printk("addr=%s xid=0x%08x execute-us=%lu", - __get_str(addr), __entry->xid, __entry->execute) + TP_printk("addr=%s xid=0x%08x proc=%s execute-us=%lu", + __get_str(addr), __entry->xid, __get_str(procedure), + __entry->execute) ); DECLARE_EVENT_CLASS(svc_deferred_event, From 22b19656eaacd32c7b679d853c68b098337c52c6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 27 Nov 2020 17:37:02 -0500 Subject: [PATCH 0434/1565] SUNRPC: Move definition of XDR_UNIT [ Upstream commit 81d217474326b25d7f14274b02fe3da1e85ad934 ] Clean up: The unit of XDR alignment is defined by RFC 4506, not as part of the RPC message header. Thus it belongs in include/linux/sunrpc/xdr.h. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/msg_prot.h | 3 --- include/linux/sunrpc/xdr.h | 13 ++++++++++--- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h index 43f854487539..938c2bf29db8 100644 --- a/include/linux/sunrpc/msg_prot.h +++ b/include/linux/sunrpc/msg_prot.h @@ -10,9 +10,6 @@ #define RPC_VERSION 2 -/* size of an XDR encoding unit in bytes, i.e. 32bit */ -#define XDR_UNIT (4) - /* spec defines authentication flavor as an unsigned 32 bit integer */ typedef u32 rpc_authflavor_t; diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f6569b620bea..eba6204330b3 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -19,6 +19,13 @@ struct bio_vec; struct rpc_rqst; +/* + * Size of an XDR encoding unit in bytes, i.e. 32 bits, + * as defined in Section 3 of RFC 4506. All encoded + * XDR data items are aligned on a boundary of 32 bits. + */ +#define XDR_UNIT sizeof(__be32) + /* * Buffer adjustment */ @@ -329,7 +336,7 @@ ssize_t xdr_stream_decode_string_dup(struct xdr_stream *xdr, char **str, static inline size_t xdr_align_size(size_t n) { - const size_t mask = sizeof(__u32) - 1; + const size_t mask = XDR_UNIT - 1; return (n + mask) & ~mask; } @@ -359,7 +366,7 @@ static inline size_t xdr_pad_size(size_t n) */ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) { - const size_t len = sizeof(__be32); + const size_t len = XDR_UNIT; __be32 *p = xdr_reserve_space(xdr, len); if (unlikely(!p)) @@ -378,7 +385,7 @@ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) */ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) { - const size_t len = sizeof(__be32); + const size_t len = XDR_UNIT; __be32 *p = xdr_reserve_space(xdr, len); if (unlikely(!p)) From d668aa92a6240269e25eb53343215f55245245ec Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 14:30:02 -0400 Subject: [PATCH 0435/1565] NFSD: Update GETATTR3args decoder to use struct xdr_stream [ Upstream commit 9575363a9e4c8d7e2f9ba5e79884d623fff0be6f ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 3 +-- fs/nfsd/nfs3xdr.c | 31 +++++++++++++++++++++++++------ fs/nfsd/xdr3.h | 2 +- 3 files changed, 27 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 0f79e007c620..a1d743bbb837 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -688,7 +688,6 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) * NFSv3 Server procedures. * Only the results of non-idempotent operations are cached. */ -#define nfs3svc_decode_fhandleargs nfs3svc_decode_fhandle #define nfs3svc_encode_attrstatres nfs3svc_encode_attrstat #define nfs3svc_encode_wccstatres nfs3svc_encode_wccstat #define nfsd3_mkdirargs nfsd3_createargs @@ -720,7 +719,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_attrstatres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 34b880211e5e..3a2b4abea1a4 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -29,8 +29,9 @@ static u32 nfs3_ftypes[] = { /* - * XDR functions for basic NFS types + * Basic NFSv3 data types (RFC 1813 Sections 2.5 and 2.6) */ + static __be32 * encode_time3(__be32 *p, struct timespec64 *time) { @@ -46,6 +47,26 @@ decode_time3(__be32 *p, struct timespec64 *time) return p; } +static bool +svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) +{ + __be32 *p; + u32 size; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS3_FHSIZE) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + fh_init(fhp, NFS3_FHSIZE); + fhp->fh_handle.fh_size = size; + memcpy(&fhp->fh_handle.fh_base, p, size); + + return true; +} + static __be32 * decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -312,14 +333,12 @@ void fill_post_wcc(struct svc_fh *fhp) */ int -nfs3svc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->fh); } int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 456fcd7a1038..62ea669768cf 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -273,7 +273,7 @@ union nfsd3_xdrstore { #define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore) -int nfs3svc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfs3svc_decode_fhandleargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_accessargs(struct svc_rqst *, __be32 *); From 7bb23be4501bb0c760ae182b0943836fe5e12e97 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 14:32:04 -0400 Subject: [PATCH 0436/1565] NFSD: Update ACCESS3arg decoder to use struct xdr_stream [ Upstream commit 3b921a2b14251e9e203f1e8af76e8ade79f50e50 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 9 +++++---- fs/nfsd/xdr3.h | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3a2b4abea1a4..e07cebd80ef7 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -375,14 +375,15 @@ nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->access) < 0) return 0; - args->access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + return 1; } int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 62ea669768cf..a4dce4baec7c 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -25,7 +25,7 @@ struct nfsd3_diropargs { struct nfsd3_accessargs { struct svc_fh fh; - unsigned int access; + __u32 access; }; struct nfsd3_readargs { From b77a4a968d1d2ab9f36556efc8e73772e28f8c90 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 14:34:40 -0400 Subject: [PATCH 0437/1565] NFSD: Update READ3arg decoder to use struct xdr_stream [ Upstream commit be63bd2ac6bbf8c065a0ef6dfbea76934326c352 ] The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 23 ++++++++++++++++++----- fs/nfsd/nfs3xdr.c | 28 +++++++--------------------- fs/nfsd/xdr3.h | 1 - 3 files changed, 25 insertions(+), 27 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a1d743bbb837..2e477cd87091 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -144,25 +144,38 @@ nfsd3_proc_read(struct svc_rqst *rqstp) { struct nfsd3_readargs *argp = rqstp->rq_argp; struct nfsd3_readres *resp = rqstp->rq_resp; - u32 max_blocksize = svc_max_payload(rqstp); - unsigned long cnt = min(argp->count, max_blocksize); + u32 max_blocksize = svc_max_payload(rqstp); + unsigned int len; + int v; + + argp->count = min_t(u32, argp->count, max_blocksize); dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n", SVCFH_fmt(&argp->fh), (unsigned long) argp->count, (unsigned long long) argp->offset); + v = 0; + len = argp->count; + while (len > 0) { + struct page *page = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(page); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + /* Obtain buffer pointer for payload. * 1 (status) + 22 (post_op_attr) + 1 (count) + 1 (eof) * + 1 (xdr opaque byte count) = 26 */ - resp->count = cnt; + resp->count = argp->count; svc_reserve_auth(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3)<<2) + resp->count +4); fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, - rqstp->rq_vec, argp->vlen, &resp->count, - &resp->eof); + rqstp->rq_vec, v, &resp->count, &resp->eof); return rpc_success; } diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e07cebd80ef7..2f32df15a7e8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -389,31 +389,17 @@ nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readargs *args = rqstp->rq_argp; - unsigned int len; - int v; - u32 max_blocksize = svc_max_payload(rqstp); - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->offset); - args->count = ntohl(*p++); - len = min(args->count, max_blocksize); - - /* set up the kvec */ - v=0; - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - - rqstp->rq_vec[v].iov_base = page_address(p); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - args->vlen = v; - return xdr_argsize_check(rqstp, p); + return 1; } int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index a4dce4baec7c..7dfeeaa4e1df 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -32,7 +32,6 @@ struct nfsd3_readargs { struct svc_fh fh; __u64 offset; __u32 count; - int vlen; }; struct nfsd3_writeargs { From 5f36ae59d6cc55437474fc02a450b62b885bd9be Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 11:14:55 -0400 Subject: [PATCH 0438/1565] NFSD: Update WRITE3arg decoder to use struct xdr_stream [ Upstream commit c43b2f229a01969a7ccf94b033c5085e0ec2040c ] As part of the update, open code that sanity-checks the size of the data payload against the length of the RPC Call message has to be re-implemented to use xdr_stream infrastructure. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 51 +++++++++++++++++++---------------------------- 1 file changed, 20 insertions(+), 31 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2f32df15a7e8..c06467e8ac82 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -405,52 +405,41 @@ nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; - unsigned int len, hdr, dlen; u32 max_blocksize = svc_max_payload(rqstp); struct kvec *head = rqstp->rq_arg.head; struct kvec *tail = rqstp->rq_arg.tail; + size_t remaining; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->stable) < 0) return 0; - p = xdr_decode_hyper(p, &args->offset); - args->count = ntohl(*p++); - args->stable = ntohl(*p++); - len = args->len = ntohl(*p++); - if ((void *)p > head->iov_base + head->iov_len) + /* opaque data */ + if (xdr_stream_decode_u32(xdr, &args->len) < 0) return 0; - /* - * The count must equal the amount of data passed. - */ + + /* request sanity */ if (args->count != args->len) return 0; - - /* - * Check to make sure that we got the right number of - * bytes. - */ - hdr = (void*)p - head->iov_base; - dlen = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len - hdr; - /* - * Round the length of the data which was specified up to - * the next multiple of XDR units and then compare that - * against the length which was actually received. - * Note that when RPCSEC/GSS (for example) is used, the - * data buffer can be padded so dlen might be larger - * than required. It must never be smaller. - */ - if (dlen < XDR_QUADLEN(len)*4) + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->len)) return 0; - if (args->count > max_blocksize) { args->count = max_blocksize; - len = args->len = max_blocksize; + args->len = max_blocksize; } - args->first.iov_base = (void *)p; - args->first.iov_len = head->iov_len - hdr; + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + return 1; } From 19285d319f7c143f8cc0d461c7b626c08ff8f046 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 24 Oct 2020 12:51:18 -0400 Subject: [PATCH 0439/1565] NFSD: Update READLINK3arg decoder to use struct xdr_stream [ Upstream commit 224c1c894e48cd72e4dd9fb6311be80cbe1369b0 ] The NFSv3 READLINK request takes a single filehandle, so it can re-use GETATTR's decoder. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 9 +++++---- fs/nfsd/nfs3xdr.c | 13 ------------- fs/nfsd/xdr3.h | 6 ------ 3 files changed, 5 insertions(+), 23 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 2e477cd87091..71db0ed3c49e 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -124,15 +124,16 @@ nfsd3_proc_access(struct svc_rqst *rqstp) static __be32 nfsd3_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd3_readlinkargs *argp = rqstp->rq_argp; + struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd3_readlinkres *resp = rqstp->rq_resp; + char *buffer = page_address(*(rqstp->rq_next_page++)); dprintk("nfsd: READLINK(3) %s\n", SVCFH_fmt(&argp->fh)); /* Read the symlink. */ fh_copy(&resp->fh, &argp->fh); resp->len = NFS3_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &resp->fh, argp->buffer, &resp->len); + resp->status = nfsd_readlink(rqstp, &resp->fh, buffer, &resp->len); return rpc_success; } @@ -773,10 +774,10 @@ static const struct svc_procedure nfsd_procedures3[22] = { }, [NFS3PROC_READLINK] = { .pc_func = nfsd3_proc_readlink, - .pc_decode = nfs3svc_decode_readlinkargs, + .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_readlinkres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd3_readlinkargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index c06467e8ac82..6b6a839c1fc8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -543,19 +543,6 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nfs3svc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd3_readlinkargs *args = rqstp->rq_argp; - - p = decode_fh(p, &args->fh); - if (!p) - return 0; - args->buffer = page_address(*(rqstp->rq_next_page++)); - - return xdr_argsize_check(rqstp, p); -} - int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7dfeeaa4e1df..08f909142ddf 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -70,11 +70,6 @@ struct nfsd3_renameargs { unsigned int tlen; }; -struct nfsd3_readlinkargs { - struct svc_fh fh; - char * buffer; -}; - struct nfsd3_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -282,7 +277,6 @@ int nfs3svc_decode_createargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_mkdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_mknodargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_linkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); From 29270d477fff9be7d068a15bfdf005658cbf119e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 10 Nov 2020 10:24:39 -0500 Subject: [PATCH 0440/1565] NFSD: Fix returned READDIR offset cookie [ Upstream commit 0a8f37fb34a96267c656f7254e69bb9a2fc89fe4 ] Code inspection shows that the server's NFSv3 READDIR implementation handles offset cookies slightly differently than the NFSv2 READDIR, NFSv3 READDIRPLUS, and NFSv4 READDIR implementations, and there doesn't seem to be any need for this difference. As a clean up, I copied the logic from nfsd3_proc_readdirplus(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 71db0ed3c49e..8cffd9852ef0 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -449,6 +449,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; int count = 0; + loff_t offset; struct page **p; caddr_t page_addr = NULL; @@ -467,7 +468,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) resp->common.err = nfs_ok; resp->buffer = argp->buffer; resp->rqstp = rqstp; - resp->status = nfsd_readdir(rqstp, &resp->fh, (loff_t *)&argp->cookie, + offset = argp->cookie; + + resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry); memcpy(resp->verf, argp->verf, 8); count = 0; @@ -483,8 +486,6 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) } resp->count = count >> 2; if (resp->offset) { - loff_t offset = argp->cookie; - if (unlikely(resp->offset1)) { /* we ended up with offset on a page boundary */ *resp->offset = htonl(offset >> 32); From e0ddafcc25e5a1f1686769fa14e2c7ffa97266a8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 09:50:23 -0500 Subject: [PATCH 0441/1565] NFSD: Add helper to set up the pages where the dirlist is encoded [ Upstream commit 40116ebd0934cca7e46423bdb3397d3d27eb9fb9 ] De-duplicate some code that is used by both READDIR and READDIRPLUS to build the dirlist in the Reply. Because this code is not related to decoding READ arguments, it is moved to a more appropriate spot. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 29 +++++++++++++++++++---------- fs/nfsd/nfs3xdr.c | 20 -------------------- fs/nfsd/xdr3.h | 1 - 3 files changed, 19 insertions(+), 31 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 8cffd9852ef0..25f31a03c4f1 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -440,6 +440,23 @@ nfsd3_proc_link(struct svc_rqst *rqstp) return rpc_success; } +static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, + struct nfsd3_readdirres *resp, + int count) +{ + count = min_t(u32, count, svc_max_payload(rqstp)); + + /* Convert byte count to number of words (i.e. >> 2), + * and reserve room for the NULL ptr & eof flag (-2 words) */ + resp->buflen = (count >> 2) - 2; + + resp->buffer = page_address(*rqstp->rq_next_page); + while (count > 0) { + rqstp->rq_next_page++; + count -= PAGE_SIZE; + } +} + /* * Read a portion of a directory. */ @@ -457,16 +474,12 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie); - /* Make sure we've room for the NULL ptr & eof flag, and shrink to - * client read size */ - count = (argp->count >> 2) - 2; + nfsd3_init_dirlist_pages(rqstp, resp, argp->count); /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - resp->buflen = count; resp->common.err = nfs_ok; - resp->buffer = argp->buffer; resp->rqstp = rqstp; offset = argp->cookie; @@ -518,16 +531,12 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie); - /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->count = (argp->count >> 2) - 2; + nfsd3_init_dirlist_pages(rqstp, resp, argp->count); /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); resp->common.err = nfs_ok; - resp->buffer = argp->buffer; - resp->buflen = resp->count; resp->rqstp = rqstp; offset = argp->cookie; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 6b6a839c1fc8..8394aeb8381e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -560,8 +560,6 @@ int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; - int len; - u32 max_blocksize = svc_max_payload(rqstp); p = decode_fh(p, &args->fh); if (!p) @@ -570,14 +568,6 @@ nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) args->verf = p; p += 2; args->dircount = ~0; args->count = ntohl(*p++); - len = args->count = min_t(u32, args->count, max_blocksize); - - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - if (!args->buffer) - args->buffer = page_address(p); - len -= PAGE_SIZE; - } return xdr_argsize_check(rqstp, p); } @@ -586,8 +576,6 @@ int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; - int len; - u32 max_blocksize = svc_max_payload(rqstp); p = decode_fh(p, &args->fh); if (!p) @@ -597,14 +585,6 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) args->dircount = ntohl(*p++); args->count = ntohl(*p++); - len = args->count = min(args->count, max_blocksize); - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - if (!args->buffer) - args->buffer = page_address(p); - len -= PAGE_SIZE; - } - return xdr_argsize_check(rqstp, p); } diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 08f909142ddf..789a364d5e69 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -93,7 +93,6 @@ struct nfsd3_readdirargs { __u32 dircount; __u32 count; __be32 * verf; - __be32 * buffer; }; struct nfsd3_commitargs { From fbcd668016107be7e0007b7b5fcc4bf7e63c5bcf Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 13:23:52 -0400 Subject: [PATCH 0442/1565] NFSD: Update READDIR3args decoders to use struct xdr_stream [ Upstream commit 9cedc2e64c296efb3bebe93a0ceeb5e71e8d722d ] As an additional clean up, neither nfsd3_proc_readdir() nor nfsd3_proc_readdirplus() make use of the dircount argument, so remove it from struct nfsd3_readdirargs. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 38 ++++++++++++++++++++++++-------------- fs/nfsd/xdr3.h | 1 - 2 files changed, 24 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 8394aeb8381e..eb55be106a04 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -559,33 +559,43 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) + return 0; + args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); + if (!args->verf) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->cookie); - args->verf = p; p += 2; - args->dircount = ~0; - args->count = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + return 1; } int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; + u32 dircount; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) + return 0; + args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); + if (!args->verf) + return 0; + /* dircount is ignored */ + if (xdr_stream_decode_u32(xdr, &dircount) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - p = xdr_decode_hyper(p, &args->cookie); - args->verf = p; p += 2; - args->dircount = ntohl(*p++); - args->count = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + return 1; } int diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 789a364d5e69..64af5b01c5d7 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -90,7 +90,6 @@ struct nfsd3_symlinkargs { struct nfsd3_readdirargs { struct svc_fh fh; __u64 cookie; - __u32 dircount; __u32 count; __be32 * verf; }; From 47614a374e652b250355c7fa55352ac94cc5b56a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 14:41:56 -0400 Subject: [PATCH 0443/1565] NFSD: Update COMMIT3arg decoder to use struct xdr_stream [ Upstream commit c8d26a0acfe77f0880e0acfe77e4209cf8f3a38b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eb55be106a04..bafb84c97861 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -601,14 +601,17 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_commitargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) - return 0; - p = xdr_decode_hyper(p, &args->offset); - args->count = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u64(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) + return 0; + + return 1; } /* From 69e54a4470a43190ac5e61103c00903847099e21 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 15:42:33 -0400 Subject: [PATCH 0444/1565] NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_stream [ Upstream commit 54d1d43dc709f58be38d278bfc38e9bfb38d35fc ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 40 +++++++++++++++++++++++++++++++++++----- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index bafb84c97861..299ea8bbd685 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -117,6 +117,39 @@ decode_filename(__be32 *p, char **namp, unsigned int *lenp) return p; } +static bool +svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) +{ + u32 size, i; + __be32 *p; + char *c; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS3_MAXNAMLEN) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + + *len = size; + *name = (char *)p; + for (i = 0, c = *name; i < size; i++, c++) { + if (*c == '\0' || *c == '/') + return false; + } + + return true; +} + +static bool +svcxdr_decode_diropargs3(struct xdr_stream *xdr, struct svc_fh *fhp, + char **name, unsigned int *len) +{ + return svcxdr_decode_nfs_fh3(xdr, fhp) && + svcxdr_decode_filename3(xdr, name, len); +} + static __be32 * decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -363,13 +396,10 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_diropargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); } int From e84af2339181503d34edd5fce1e1af8c02cceaa6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 15:44:12 -0400 Subject: [PATCH 0445/1565] NFSD: Update the RENAME3args decoder to use struct xdr_stream [ Upstream commit d181e0a4bef36ee74d1338e5b5c2561d7463a5d0 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 299ea8bbd685..f870a068aad8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -562,15 +562,13 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_renameargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->ffh, + &args->fname, &args->flen) && + svcxdr_decode_diropargs3(xdr, &args->tfh, + &args->tname, &args->tlen); } int From 71d7e7c6a6f4a4853612afe058e3fc923a4045fb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 13:26:32 -0400 Subject: [PATCH 0446/1565] NFSD: Update the LINK3args decoder to use struct xdr_stream [ Upstream commit efaa1e7c2c7475f0a9bbeb904d9aba09b73dd52a ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index f870a068aad8..9437dda2646f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -574,14 +574,12 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_linkargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->ffh) && + svcxdr_decode_diropargs3(xdr, &args->tfh, + &args->tname, &args->tlen); } int From 48ea0cb79b45e1f139a42257cb46ed8fc6aaca1c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 15:48:22 -0400 Subject: [PATCH 0447/1565] NFSD: Update the SETATTR3args decoder to use struct xdr_stream [ Upstream commit 9cde9360d18d8b352b737d10f90f2aecccf93dbe ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 138 +++++++++++++++++++++++++++++++++----- include/uapi/linux/nfs3.h | 6 ++ 2 files changed, 127 insertions(+), 17 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9437dda2646f..6a6bf8e34d82 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -39,12 +39,18 @@ encode_time3(__be32 *p, struct timespec64 *time) return p; } -static __be32 * -decode_time3(__be32 *p, struct timespec64 *time) +static bool +svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) { - time->tv_sec = ntohl(*p++); - time->tv_nsec = ntohl(*p++); - return p; + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + timep->tv_sec = be32_to_cpup(p++); + timep->tv_nsec = be32_to_cpup(p); + + return true; } static bool @@ -150,6 +156,112 @@ svcxdr_decode_diropargs3(struct xdr_stream *xdr, struct svc_fh *fhp, svcxdr_decode_filename3(xdr, name, len); } +static bool +svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct iattr *iap) +{ + u32 set_it; + + iap->ia_valid = 0; + + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 mode; + + if (xdr_stream_decode_u32(xdr, &mode) < 0) + return false; + iap->ia_valid |= ATTR_MODE; + iap->ia_mode = mode; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 uid; + + if (xdr_stream_decode_u32(xdr, &uid) < 0) + return false; + iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), uid); + if (uid_valid(iap->ia_uid)) + iap->ia_valid |= ATTR_UID; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u32 gid; + + if (xdr_stream_decode_u32(xdr, &gid) < 0) + return false; + iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), gid); + if (gid_valid(iap->ia_gid)) + iap->ia_valid |= ATTR_GID; + } + if (xdr_stream_decode_bool(xdr, &set_it) < 0) + return false; + if (set_it) { + u64 newsize; + + if (xdr_stream_decode_u64(xdr, &newsize) < 0) + return false; + iap->ia_valid |= ATTR_SIZE; + iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); + } + if (xdr_stream_decode_u32(xdr, &set_it) < 0) + return false; + switch (set_it) { + case DONT_CHANGE: + break; + case SET_TO_SERVER_TIME: + iap->ia_valid |= ATTR_ATIME; + break; + case SET_TO_CLIENT_TIME: + if (!svcxdr_decode_nfstime3(xdr, &iap->ia_atime)) + return false; + iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; + break; + default: + return false; + } + if (xdr_stream_decode_u32(xdr, &set_it) < 0) + return false; + switch (set_it) { + case DONT_CHANGE: + break; + case SET_TO_SERVER_TIME: + iap->ia_valid |= ATTR_MTIME; + break; + case SET_TO_CLIENT_TIME: + if (!svcxdr_decode_nfstime3(xdr, &iap->ia_mtime)) + return false; + iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; + break; + default: + return false; + } + + return true; +} + +static bool +svcxdr_decode_sattrguard3(struct xdr_stream *xdr, struct nfsd3_sattrargs *args) +{ + __be32 *p; + u32 check; + + if (xdr_stream_decode_bool(xdr, &check) < 0) + return false; + if (check) { + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + args->check_guard = 1; + args->guardtime = be32_to_cpup(p); + } else + args->check_guard = 0; + + return true; +} + static __be32 * decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -377,20 +489,12 @@ nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_sattrargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) - return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - if ((args->check_guard = ntohl(*p++)) != 0) { - struct timespec64 time; - p = decode_time3(p, &time); - args->guardtime = time.tv_sec; - } - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_nfs_fh3(xdr, &args->fh) && + svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && + svcxdr_decode_sattrguard3(xdr, args); } int diff --git a/include/uapi/linux/nfs3.h b/include/uapi/linux/nfs3.h index 37e4b34e6b43..c22ab77713bd 100644 --- a/include/uapi/linux/nfs3.h +++ b/include/uapi/linux/nfs3.h @@ -63,6 +63,12 @@ enum nfs3_ftype { NF3BAD = 8 }; +enum nfs3_time_how { + DONT_CHANGE = 0, + SET_TO_SERVER_TIME = 1, + SET_TO_CLIENT_TIME = 2, +}; + struct nfs3_fh { unsigned short size; unsigned char data[NFS3_FHSIZE]; From bd54084b587f833bff139784ec6259d0cc55bcef Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 15:56:11 -0400 Subject: [PATCH 0448/1565] NFSD: Update the CREATE3args decoder to use struct xdr_stream [ Upstream commit 6b3a11960d898b25a30103cc6a2ff0b24b90a83b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 6a6bf8e34d82..24db3725a070 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -580,26 +580,26 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) + if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) return 0; - - switch (args->createmode = ntohl(*p++)) { + if (xdr_stream_decode_u32(xdr, &args->createmode) < 0) + return 0; + switch (args->createmode) { case NFS3_CREATE_UNCHECKED: case NFS3_CREATE_GUARDED: - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - break; + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); case NFS3_CREATE_EXCLUSIVE: - args->verf = p; - p += 2; + args->verf = xdr_inline_decode(xdr, NFS3_CREATEVERFSIZE); + if (!args->verf) + return 0; break; default: return 0; } - - return xdr_argsize_check(rqstp, p); + return 1; } int From b5d1ae6cc4c2d6677ed115f376188b3050499dca Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 17:02:16 -0400 Subject: [PATCH 0449/1565] NFSD: Update the MKDIR3args decoder to use struct xdr_stream [ Upstream commit 83374c278db193f3e8b2608b45da1132b867a760 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 24db3725a070..b4071cda1d65 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -605,14 +605,12 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->fh)) || - !(p = decode_filename(p, &args->name, &args->len))) - return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs3(xdr, &args->fh, + &args->name, &args->len) && + svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); } int From 8841760f685bb50c67f8529bf4b821389d0ec002 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 16:01:16 -0400 Subject: [PATCH 0450/1565] NFSD: Update the SYMLINK3args decoder to use struct xdr_stream [ Upstream commit da39201637297460c13134c29286a00f3a1c92fe ] Similar to the WRITE decoder, code that checks the sanity of the payload size is re-wired to work with xdr_stream infrastructure. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 33 ++++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index b4071cda1d65..eb17231ab166 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -616,25 +616,28 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_symlinkargs *args = rqstp->rq_argp; - char *base = (char *)p; - size_t dlen; + struct kvec *head = rqstp->rq_arg.head; + struct kvec *tail = rqstp->rq_arg.tail; + size_t remaining; - if (!(p = decode_fh(p, &args->ffh)) || - !(p = decode_filename(p, &args->fname, &args->flen))) + if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) return 0; - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - args->tlen = ntohl(*p++); - - args->first.iov_base = p; - args->first.iov_len = rqstp->rq_arg.head[0].iov_len; - args->first.iov_len -= (char *)p - base; - - dlen = args->first.iov_len + rqstp->rq_arg.page_len + - rqstp->rq_arg.tail[0].iov_len; - if (dlen < XDR_QUADLEN(args->tlen) << 2) + if (!svcxdr_decode_sattr3(rqstp, xdr, &args->attrs)) return 0; + if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) + return 0; + + /* request sanity */ + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->tlen)) + return 0; + + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + return 1; } From 9ceeee0ec8877bb93e1514353cfe86b8602f67d8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 17:04:03 -0400 Subject: [PATCH 0451/1565] NFSD: Update the MKNOD3args decoder to use struct xdr_stream [ Upstream commit f8a38e2d6c885f9d7cd03febc515d36293de4a5b ] This commit removes the last usage of the original decode_sattr3(), so it is removed as a clean-up. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 107 +++++++++++++++------------------------------- 1 file changed, 35 insertions(+), 72 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eb17231ab166..a30b418a5116 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -103,26 +103,6 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); } -/* - * Decode a file name and make sure that the path contains - * no slashes or null bytes. - */ -static __be32 * -decode_filename(__be32 *p, char **namp, unsigned int *lenp) -{ - char *name; - unsigned int i; - - if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS3_MAXNAMLEN)) != NULL) { - for (i = 0, name = *namp; i < *lenp; i++, name++) { - if (*name == '\0' || *name == '/') - return NULL; - } - } - - return p; -} - static bool svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -262,49 +242,26 @@ svcxdr_decode_sattrguard3(struct xdr_stream *xdr, struct nfsd3_sattrargs *args) return true; } -static __be32 * -decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) +static bool +svcxdr_decode_specdata3(struct xdr_stream *xdr, struct nfsd3_mknodargs *args) { - u32 tmp; + __be32 *p; - iap->ia_valid = 0; + p = xdr_inline_decode(xdr, XDR_UNIT * 2); + if (!p) + return false; + args->major = be32_to_cpup(p++); + args->minor = be32_to_cpup(p); - if (*p++) { - iap->ia_valid |= ATTR_MODE; - iap->ia_mode = ntohl(*p++); - } - if (*p++) { - iap->ia_uid = make_kuid(userns, ntohl(*p++)); - if (uid_valid(iap->ia_uid)) - iap->ia_valid |= ATTR_UID; - } - if (*p++) { - iap->ia_gid = make_kgid(userns, ntohl(*p++)); - if (gid_valid(iap->ia_gid)) - iap->ia_valid |= ATTR_GID; - } - if (*p++) { - u64 newsize; + return true; +} - iap->ia_valid |= ATTR_SIZE; - p = xdr_decode_hyper(p, &newsize); - iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); - } - if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ - iap->ia_valid |= ATTR_ATIME; - } else if (tmp == 2) { /* set to client time */ - iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - iap->ia_atime.tv_sec = ntohl(*p++); - iap->ia_atime.tv_nsec = ntohl(*p++); - } - if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ - iap->ia_valid |= ATTR_MTIME; - } else if (tmp == 2) { /* set to client time */ - iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - iap->ia_mtime.tv_sec = ntohl(*p++); - iap->ia_mtime.tv_nsec = ntohl(*p++); - } - return p; +static bool +svcxdr_decode_devicedata3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct nfsd3_mknodargs *args) +{ + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && + svcxdr_decode_specdata3(xdr, args); } static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) @@ -644,24 +601,30 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_mknodargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) + if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->ftype) < 0) + return 0; + switch (args->ftype) { + case NF3CHR: + case NF3BLK: + return svcxdr_decode_devicedata3(rqstp, xdr, args); + case NF3SOCK: + case NF3FIFO: + return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); + case NF3REG: + case NF3DIR: + case NF3LNK: + /* Valid XDR but illegal file types */ + break; + default: return 0; - - args->ftype = ntohl(*p++); - - if (args->ftype == NF3BLK || args->ftype == NF3CHR - || args->ftype == NF3SOCK || args->ftype == NF3FIFO) - p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - - if (args->ftype == NF3BLK || args->ftype == NF3CHR) { - args->major = ntohl(*p++); - args->minor = ntohl(*p++); } - return xdr_argsize_check(rqstp, p); + return 1; } int From ba7e0412fb5a7786de292357cfa94dc35ae5778a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:14:23 -0400 Subject: [PATCH 0452/1565] NFSD: Update the NFSv2 GETATTR argument decoder to use struct xdr_stream [ Upstream commit ebcd8e8b28535b643a4c06685bd363b3b73a96af ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 4 ++-- fs/nfsd/nfsxdr.c | 26 ++++++++++++++++++++------ fs/nfsd/xdr.h | 2 +- 3 files changed, 23 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index f22f70f63b53..3cac9972aa83 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -627,7 +627,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, - .pc_decode = nfssvc_decode_fhandle, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_attrstat, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), @@ -793,7 +793,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_STATFS] = { .pc_func = nfsd_proc_statfs, - .pc_decode = nfssvc_decode_fhandle, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_statfsres, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_statfsres), diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 7aa6e8aca2c1..f3189e1be20f 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -23,8 +23,9 @@ static u32 nfs_ftypes[] = { /* - * XDR functions for basic NFS types + * Basic NFSv2 data types (RFC 1094 Section 2.3) */ + static __be32 * decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -37,6 +38,21 @@ decode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE >> 2); } +static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) +{ + __be32 *p; + + p = xdr_inline_decode(xdr, NFS_FHSIZE); + if (!p) + return false; + fh_init(fhp, NFS_FHSIZE); + memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); + fhp->fh_handle.fh_size = NFS_FHSIZE; + + return true; +} + /* Helper function for NFSv2 ACL code */ __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) { @@ -194,14 +210,12 @@ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *f */ int -nfssvc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->fh); } int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index edd87688ff86..50466ac6200c 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -144,7 +144,7 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore) -int nfssvc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfssvc_decode_fhandleargs(struct svc_rqst *, __be32 *); int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); From 6e88b7ec6cd52d0ccf2a1961f1550093f0f2c581 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:15:51 -0400 Subject: [PATCH 0453/1565] NFSD: Update the NFSv2 READ argument decoder to use struct xdr_stream [ Upstream commit 8c293ef993c8df0b1bea9ecb0de6eb96dec3ac9d ] The code that sets up rq_vec is refactored so that it is now adjacent to the nfsd_read() call site where it is used. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 32 ++++++++++++++++++-------------- fs/nfsd/nfsxdr.c | 36 ++++++++++++------------------------ fs/nfsd/xdr.h | 1 - 3 files changed, 30 insertions(+), 39 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3cac9972aa83..c70ae20e54c4 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -171,32 +171,36 @@ nfsd_proc_read(struct svc_rqst *rqstp) { struct nfsd_readargs *argp = rqstp->rq_argp; struct nfsd_readres *resp = rqstp->rq_resp; + unsigned int len; u32 eof; + int v; dprintk("nfsd: READ %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->offset); + argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2); + + v = 0; + len = argp->count; + while (len > 0) { + struct page *page = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(page); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + /* Obtain buffer pointer for payload. 19 is 1 word for * status, 17 words for fattr, and 1 word for the byte count. */ - - if (NFSSVC_MAXBLKSIZE_V2 < argp->count) { - char buf[RPC_MAX_ADDRBUFLEN]; - printk(KERN_NOTICE - "oversized read request from %s (%d bytes)\n", - svc_print_addr(rqstp, buf, sizeof(buf)), - argp->count); - argp->count = NFSSVC_MAXBLKSIZE_V2; - } svc_reserve_auth(rqstp, (19<<2) + argp->count + 4); resp->count = argp->count; - resp->status = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh), - argp->offset, - rqstp->rq_vec, argp->vlen, - &resp->count, - &eof); + fh_copy(&resp->fh, &argp->fh); + resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, + rqstp->rq_vec, v, &resp->count, &eof); if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index f3189e1be20f..1eacaa2c13a9 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -246,33 +246,21 @@ nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readargs *args = rqstp->rq_argp; - unsigned int len; - int v; - p = decode_fh(p, &args->fh); - if (!p) + u32 totalcount; + + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->offset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) + return 0; + /* totalcount is ignored */ + if (xdr_stream_decode_u32(xdr, &totalcount) < 0) return 0; - args->offset = ntohl(*p++); - len = args->count = ntohl(*p++); - p++; /* totalcount - unused */ - - len = min_t(unsigned int, len, NFSSVC_MAXBLKSIZE_V2); - - /* set up somewhere to store response. - * We take pages, put them on reslist and include in iovec - */ - v=0; - while (len > 0) { - struct page *p = *(rqstp->rq_next_page++); - - rqstp->rq_vec[v].iov_base = page_address(p); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - args->vlen = v; - return xdr_argsize_check(rqstp, p); + return 1; } int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 50466ac6200c..7c704fa3215e 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -27,7 +27,6 @@ struct nfsd_readargs { struct svc_fh fh; __u32 offset; __u32 count; - int vlen; }; struct nfsd_writeargs { From ecee6ba5920cfbd63646750b87db445583fd9c7c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:18:36 -0400 Subject: [PATCH 0454/1565] NFSD: Update the NFSv2 WRITE argument decoder to use struct xdr_stream [ Upstream commit a51b5b737a0be93fae6ea2a18df03ab2359a3f4b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 52 +++++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 31 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1eacaa2c13a9..11d27b219cff 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -266,46 +266,36 @@ nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; - unsigned int len, hdr, dlen; struct kvec *head = rqstp->rq_arg.head; + struct kvec *tail = rqstp->rq_arg.tail; + u32 beginoffset, totalcount; + size_t remaining; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + /* beginoffset is ignored */ + if (xdr_stream_decode_u32(xdr, &beginoffset) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->offset) < 0) + return 0; + /* totalcount is ignored */ + if (xdr_stream_decode_u32(xdr, &totalcount) < 0) return 0; - p++; /* beginoffset */ - args->offset = ntohl(*p++); /* offset */ - p++; /* totalcount */ - len = args->len = ntohl(*p++); - /* - * The protocol specifies a maximum of 8192 bytes. - */ - if (len > NFSSVC_MAXBLKSIZE_V2) + /* opaque data */ + if (xdr_stream_decode_u32(xdr, &args->len) < 0) return 0; - - /* - * Check to make sure that we got the right number of - * bytes. - */ - hdr = (void*)p - head->iov_base; - if (hdr > head->iov_len) + if (args->len > NFSSVC_MAXBLKSIZE_V2) return 0; - dlen = head->iov_len + rqstp->rq_arg.page_len - hdr; - - /* - * Round the length of the data which was specified up to - * the next multiple of XDR units and then compare that - * against the length which was actually received. - * Note that when RPCSEC/GSS (for example) is used, the - * data buffer can be padded so dlen might be larger - * than required. It must never be smaller. - */ - if (dlen < XDR_QUADLEN(len)*4) + remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; + remaining -= xdr_stream_pos(xdr); + if (remaining < xdr_align_size(args->len)) return 0; + args->first.iov_base = xdr->p; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - args->first.iov_base = (void *)p; - args->first.iov_len = head->iov_len - hdr; return 1; } From 365835d2ff67e1a1e78dfd23b315a9ed0f8185bc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:21:25 -0400 Subject: [PATCH 0455/1565] NFSD: Update the NFSv2 READLINK argument decoder to use struct xdr_stream [ Upstream commit 1fcbd1c9456ba129d38420e345e91c4b6363db47 ] If the code that sets up the sink buffer for nfsd_readlink() is moved adjacent to the nfsd_readlink() call site that uses it, then the only argument is a file handle, and the fhandle decoder can be used instead. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 9 +++++---- fs/nfsd/nfsxdr.c | 13 ------------- fs/nfsd/xdr.h | 6 ------ 3 files changed, 5 insertions(+), 23 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index c70ae20e54c4..6352da0168e0 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -149,14 +149,15 @@ nfsd_proc_lookup(struct svc_rqst *rqstp) static __be32 nfsd_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd_readlinkargs *argp = rqstp->rq_argp; + struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd_readlinkres *resp = rqstp->rq_resp; + char *buffer = page_address(*(rqstp->rq_next_page++)); dprintk("nfsd: READLINK %s\n", SVCFH_fmt(&argp->fh)); /* Read the symlink. */ resp->len = NFS_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &argp->fh, argp->buffer, &resp->len); + resp->status = nfsd_readlink(rqstp, &argp->fh, buffer, &resp->len); fh_put(&argp->fh); return rpc_success; @@ -674,9 +675,9 @@ static const struct svc_procedure nfsd_procedures2[18] = { }, [NFSPROC_READLINK] = { .pc_func = nfsd_proc_readlink, - .pc_decode = nfssvc_decode_readlinkargs, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_readlinkres, - .pc_argsize = sizeof(struct nfsd_readlinkargs), + .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 11d27b219cff..02dd9888d93b 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -326,19 +326,6 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nfssvc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_readlinkargs *args = rqstp->rq_argp; - - p = decode_fh(p, &args->fh); - if (!p) - return 0; - args->buffer = page_address(*(rqstp->rq_next_page++)); - - return xdr_argsize_check(rqstp, p); -} - int nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 7c704fa3215e..1338551de828 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -52,11 +52,6 @@ struct nfsd_renameargs { unsigned int tlen; }; -struct nfsd_readlinkargs { - struct svc_fh fh; - char * buffer; -}; - struct nfsd_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -150,7 +145,6 @@ int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); int nfssvc_decode_writeargs(struct svc_rqst *, __be32 *); int nfssvc_decode_createargs(struct svc_rqst *, __be32 *); int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); From 672111a408726d0a0a24782ded902b0323160440 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 13 Nov 2020 17:03:49 -0500 Subject: [PATCH 0456/1565] NFSD: Add helper to set up the pages where the dirlist is encoded [ Upstream commit 788cd46ecf83ee2d561cb4e754e276dc8089b787 ] Add a helper similar to nfsd3_init_dirlist_pages(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 29 ++++++++++++++++++----------- fs/nfsd/nfsxdr.c | 2 -- fs/nfsd/xdr.h | 1 - 3 files changed, 18 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 6352da0168e0..a628ea4d66ea 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -553,6 +553,20 @@ nfsd_proc_rmdir(struct svc_rqst *rqstp) return rpc_success; } +static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, + struct nfsd_readdirres *resp, + int count) +{ + count = min_t(u32, count, PAGE_SIZE); + + /* Convert byte count to number of words (i.e. >> 2), + * and reserve room for the NULL ptr & eof flag (-2 words) */ + resp->buflen = (count >> 2) - 2; + + resp->buffer = page_address(*rqstp->rq_next_page); + rqstp->rq_next_page++; +} + /* * Read a portion of a directory. */ @@ -561,31 +575,24 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) { struct nfsd_readdirargs *argp = rqstp->rq_argp; struct nfsd_readdirres *resp = rqstp->rq_resp; - int count; loff_t offset; + __be32 *buffer; dprintk("nfsd: READDIR %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->cookie); - /* Shrink to the client read size */ - count = (argp->count >> 2) - 2; + nfsd_init_dirlist_pages(rqstp, resp, argp->count); + buffer = resp->buffer; - /* Make sure we've room for the NULL ptr & eof flag */ - count -= 2; - if (count < 0) - count = 0; - - resp->buffer = argp->buffer; resp->offset = NULL; - resp->buflen = count; resp->common.err = nfs_ok; /* Read directory and encode entries on the fly */ offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, &resp->common, nfssvc_encode_entry); - resp->count = resp->buffer - argp->buffer; + resp->count = resp->buffer - buffer; if (resp->offset) *resp->offset = htonl(offset); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 02dd9888d93b..3d72334e1673 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -388,8 +388,6 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) return 0; args->cookie = ntohl(*p++); args->count = ntohl(*p++); - args->count = min_t(u32, args->count, PAGE_SIZE); - args->buffer = page_address(*(rqstp->rq_next_page++)); return xdr_argsize_check(rqstp, p); } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 1338551de828..ff68643504c3 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -73,7 +73,6 @@ struct nfsd_readdirargs { struct svc_fh fh; __u32 cookie; __u32 count; - __be32 * buffer; }; struct nfsd_stat { From 7d9ab8ee576f034bdfaf792704b095ec7c750684 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 14:15:51 -0400 Subject: [PATCH 0457/1565] NFSD: Update the NFSv2 READDIR argument decoder to use struct xdr_stream [ Upstream commit 8688361ae2edb8f7e61d926dc5000c9a44f29370 ] As an additional clean up, move code not related to XDR decoding into readdir's .pc_func call out. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 3d72334e1673..7b33093f8d8b 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -381,15 +381,17 @@ nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readdirargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->cookie) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &args->count) < 0) return 0; - args->cookie = ntohl(*p++); - args->count = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + return 1; } /* From 0047abd4c411d8976a98ddf9ba1b8a5737dff9bb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 14:33:24 -0400 Subject: [PATCH 0458/1565] NFSD: Update NFSv2 diropargs decoding to use struct xdr_stream [ Upstream commit 6d742c1864c18f143ea2031f1ed66bcd8f4812de ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 7b33093f8d8b..00a7db8548eb 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -86,6 +86,38 @@ decode_filename(__be32 *p, char **namp, unsigned int *lenp) return p; } +static bool +svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) +{ + u32 size, i; + __be32 *p; + char *c; + + if (xdr_stream_decode_u32(xdr, &size) < 0) + return false; + if (size == 0 || size > NFS_MAXNAMLEN) + return false; + p = xdr_inline_decode(xdr, size); + if (!p) + return false; + + *len = size; + *name = (char *)p; + for (i = 0, c = *name; i < size; i++, c++) + if (*c == '\0' || *c == '/') + return false; + + return true; +} + +static bool +svcxdr_decode_diropargs(struct xdr_stream *xdr, struct svc_fh *fhp, + char **name, unsigned int *len) +{ + return svcxdr_decode_fhandle(xdr, fhp) && + svcxdr_decode_filename(xdr, name, len); +} + static __be32 * decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) { @@ -234,13 +266,10 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_diropargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); } int From a362dd478be0769426baceba48e196024e648d57 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:35:41 -0400 Subject: [PATCH 0459/1565] NFSD: Update the NFSv2 RENAME argument decoder to use struct xdr_stream [ Upstream commit 62aa557efb81ea3339fabe7f5b1a343e742bbbdf ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 00a7db8548eb..d4f4729c7b1c 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -344,15 +344,13 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_renameargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->ffh, + &args->fname, &args->flen) && + svcxdr_decode_diropargs(xdr, &args->tfh, + &args->tname, &args->tlen); } int From ebfb21605f1af172fe47710c41e10fd778f75ff5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:34:24 -0400 Subject: [PATCH 0460/1565] NFSD: Update the NFSv2 LINK argument decoder to use struct xdr_stream [ Upstream commit 77edcdf91f6245a9881b84e4e101738148bd039a ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index d4f4729c7b1c..3d0fe03a3fb9 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -356,14 +356,12 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_linkargs *args = rqstp->rq_argp; - if (!(p = decode_fh(p, &args->ffh)) - || !(p = decode_fh(p, &args->tfh)) - || !(p = decode_filename(p, &args->tname, &args->tlen))) - return 0; - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->ffh) && + svcxdr_decode_diropargs(xdr, &args->tfh, + &args->tname, &args->tlen); } int From 40de4113f801f3e63e0150acbe1cf77298e6a27e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:39:06 -0400 Subject: [PATCH 0461/1565] NFSD: Update the NFSv2 SETATTR argument decoder to use struct xdr_stream [ Upstream commit 2fdd6bd293b9e7dda61220538b2759fbf06f5af0 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 82 ++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 76 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 3d0fe03a3fb9..6c87ea8f3876 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -173,6 +173,79 @@ decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) return p; } +static bool +svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + struct iattr *iap) +{ + u32 tmp1, tmp2; + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT * 8); + if (!p) + return false; + + iap->ia_valid = 0; + + /* + * Some Sun clients put 0xffff in the mode field when they + * mean 0xffffffff. + */ + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp1 != 0xffff) { + iap->ia_valid |= ATTR_MODE; + iap->ia_mode = tmp1; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), tmp1); + if (uid_valid(iap->ia_uid)) + iap->ia_valid |= ATTR_UID; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), tmp1); + if (gid_valid(iap->ia_gid)) + iap->ia_valid |= ATTR_GID; + } + + tmp1 = be32_to_cpup(p++); + if (tmp1 != (u32)-1) { + iap->ia_valid |= ATTR_SIZE; + iap->ia_size = tmp1; + } + + tmp1 = be32_to_cpup(p++); + tmp2 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; + iap->ia_atime.tv_sec = tmp1; + iap->ia_atime.tv_nsec = tmp2 * NSEC_PER_USEC; + } + + tmp1 = be32_to_cpup(p++); + tmp2 = be32_to_cpup(p++); + if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; + iap->ia_mtime.tv_sec = tmp1; + iap->ia_mtime.tv_nsec = tmp2 * NSEC_PER_USEC; + /* + * Passing the invalid value useconds=1000000 for mtime + * is a Sun convention for "set both mtime and atime to + * current server time". It's needed to make permissions + * checks for the "touch" program across v2 mounts to + * Solaris and Irix boxes work correctly. See description of + * sattr in section 6.1 of "NFS Illustrated" by + * Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5 + */ + if (tmp2 == 1000000) + iap->ia_valid &= ~(ATTR_ATIME_SET|ATTR_MTIME_SET); + } + + return true; +} + static __be32 * encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) @@ -253,14 +326,11 @@ nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_sattrargs *args = rqstp->rq_argp; - p = decode_fh(p, &args->fh); - if (!p) - return 0; - p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_fhandle(xdr, &args->fh) && + svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } int From 1db54ce543bc184d36a623e7e7c31a393287a364 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:43:58 -0400 Subject: [PATCH 0462/1565] NFSD: Update the NFSv2 CREATE argument decoder to use struct xdr_stream [ Upstream commit 7dcf65b91ecaf60ce593e7859ae2b29b7c46ccbd ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 6c87ea8f3876..2e2806cbe7b8 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -401,14 +401,12 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_createargs *args = rqstp->rq_argp; - if ( !(p = decode_fh(p, &args->fh)) - || !(p = decode_filename(p, &args->name, &args->len))) - return 0; - p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return xdr_argsize_check(rqstp, p); + return svcxdr_decode_diropargs(xdr, &args->fh, + &args->name, &args->len) && + svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } int From 7e6746027b058a0f01a73d3ed2ed352c7cdaf28f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 12:46:03 -0400 Subject: [PATCH 0463/1565] NFSD: Update the NFSv2 SYMLINK argument decoder to use struct xdr_stream [ Upstream commit 09f75a5375ac61f4adb94da0accc1cfc60eb4f2b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 113 +++++------------------------------------------ 1 file changed, 10 insertions(+), 103 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 2e2806cbe7b8..f2cb4794aeaf 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -66,26 +66,6 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE>> 2); } -/* - * Decode a file name and make sure that the path contains - * no slashes or null bytes. - */ -static __be32 * -decode_filename(__be32 *p, char **namp, unsigned int *lenp) -{ - char *name; - unsigned int i; - - if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS_MAXNAMLEN)) != NULL) { - for (i = 0, name = *namp; i < *lenp; i++, name++) { - if (*name == '\0' || *name == '/') - return NULL; - } - } - - return p; -} - static bool svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -118,61 +98,6 @@ svcxdr_decode_diropargs(struct xdr_stream *xdr, struct svc_fh *fhp, svcxdr_decode_filename(xdr, name, len); } -static __be32 * -decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) -{ - u32 tmp, tmp1; - - iap->ia_valid = 0; - - /* Sun client bug compatibility check: some sun clients seem to - * put 0xffff in the mode field when they mean 0xffffffff. - * Quoting the 4.4BSD nfs server code: Nah nah nah nah na nah. - */ - if ((tmp = ntohl(*p++)) != (u32)-1 && tmp != 0xffff) { - iap->ia_valid |= ATTR_MODE; - iap->ia_mode = tmp; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_uid = make_kuid(userns, tmp); - if (uid_valid(iap->ia_uid)) - iap->ia_valid |= ATTR_UID; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_gid = make_kgid(userns, tmp); - if (gid_valid(iap->ia_gid)) - iap->ia_valid |= ATTR_GID; - } - if ((tmp = ntohl(*p++)) != (u32)-1) { - iap->ia_valid |= ATTR_SIZE; - iap->ia_size = tmp; - } - tmp = ntohl(*p++); tmp1 = ntohl(*p++); - if (tmp != (u32)-1 && tmp1 != (u32)-1) { - iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - iap->ia_atime.tv_sec = tmp; - iap->ia_atime.tv_nsec = tmp1 * 1000; - } - tmp = ntohl(*p++); tmp1 = ntohl(*p++); - if (tmp != (u32)-1 && tmp1 != (u32)-1) { - iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - iap->ia_mtime.tv_sec = tmp; - iap->ia_mtime.tv_nsec = tmp1 * 1000; - /* - * Passing the invalid value useconds=1000000 for mtime - * is a Sun convention for "set both mtime and atime to - * current server time". It's needed to make permissions - * checks for the "touch" program across v2 mounts to - * Solaris and Irix boxes work correctly. See description of - * sattr in section 6.1 of "NFS Illustrated" by - * Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5 - */ - if (tmp1 == 1000000) - iap->ia_valid &= ~(ATTR_ATIME_SET|ATTR_MTIME_SET); - } - return p; -} - static bool svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, struct iattr *iap) @@ -435,40 +360,22 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_symlinkargs *args = rqstp->rq_argp; - char *base = (char *)p; - size_t xdrlen; + struct kvec *head = rqstp->rq_arg.head; - if ( !(p = decode_fh(p, &args->ffh)) - || !(p = decode_filename(p, &args->fname, &args->flen))) + if (!svcxdr_decode_diropargs(xdr, &args->ffh, &args->fname, &args->flen)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) return 0; - - args->tlen = ntohl(*p++); if (args->tlen == 0) return 0; - args->first.iov_base = p; - args->first.iov_len = rqstp->rq_arg.head[0].iov_len; - args->first.iov_len -= (char *)p - base; - - /* This request is never larger than a page. Therefore, - * transport will deliver either: - * 1. pathname in the pagelist -> sattr is in the tail. - * 2. everything in the head buffer -> sattr is in the head. - */ - if (rqstp->rq_arg.page_len) { - if (args->tlen != rqstp->rq_arg.page_len) - return 0; - p = rqstp->rq_arg.tail[0].iov_base; - } else { - xdrlen = XDR_QUADLEN(args->tlen); - if (xdrlen > args->first.iov_len - (8 * sizeof(__be32))) - return 0; - p += xdrlen; - } - decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); - - return 1; + args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + args->first.iov_base = xdr_inline_decode(xdr, args->tlen); + if (!args->first.iov_base) + return 0; + return svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } int From e44061388635614756ede3f21369c3a43ef17ee5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 10:08:19 -0400 Subject: [PATCH 0464/1565] NFSD: Remove argument length checking in nfsd_dispatch() [ Upstream commit 5650682e16f41722f735b7beeb2dbc3411dfbeb6 ] Now that the argument decoders for NFSv2 and NFSv3 use the xdr_stream mechanism, the version-specific length checking logic in nfsd_dispatch() is no longer necessary. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 34 ---------------------------------- 1 file changed, 34 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 423410cc0214..6c1d70935ea8 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -988,37 +988,6 @@ nfsd(void *vrqstp) return 0; } -/* - * A write procedure can have a large argument, and a read procedure can - * have a large reply, but no NFSv2 or NFSv3 procedure has argument and - * reply that can both be larger than a page. The xdr code has taken - * advantage of this assumption to be a sloppy about bounds checking in - * some cases. Pending a rewrite of the NFSv2/v3 xdr code to fix that - * problem, we enforce these assumptions here: - */ -static bool nfs_request_too_big(struct svc_rqst *rqstp, - const struct svc_procedure *proc) -{ - /* - * The ACL code has more careful bounds-checking and is not - * susceptible to this problem: - */ - if (rqstp->rq_prog != NFS_PROGRAM) - return false; - /* - * Ditto NFSv4 (which can in theory have argument and reply both - * more than a page): - */ - if (rqstp->rq_vers >= 4) - return false; - /* The reply will be small, we're OK: */ - if (proc->pc_xdrressize > 0 && - proc->pc_xdrressize < XDR_QUADLEN(PAGE_SIZE)) - return false; - - return rqstp->rq_arg.len > PAGE_SIZE; -} - /** * nfsd_dispatch - Process an NFS or NFSACL Request * @rqstp: incoming request @@ -1037,9 +1006,6 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p; - if (nfs_request_too_big(rqstp, proc)) - goto out_decode_err; - /* * Give the xdr decoder a chance to change this if it wants * (necessary in the NFSv4.0 compound case) From ea6b0e02dcacac18adb300636ce30e88f64dc912 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:32:04 -0500 Subject: [PATCH 0465/1565] NFSD: Update the NFSv2 GETACL argument decoder to use struct xdr_stream [ Upstream commit 635a45d34706400c59c3b18ca9fccba195147bda ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 10 +++++----- fs/nfsd/nfsxdr.c | 11 ++++++++++- fs/nfsd/xdr.h | 1 + fs/nfsd/xdr3.h | 2 +- 4 files changed, 17 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 899762da23c9..df2e145cfab0 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -188,17 +188,17 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *argp = rqstp->rq_argp; - p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &argp->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) return 0; - argp->mask = ntohl(*p); p++; - return xdr_argsize_check(rqstp, p); + return 1; } - static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_setaclargs *argp = rqstp->rq_argp; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index f2cb4794aeaf..5ab9fc14816c 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -38,7 +38,16 @@ decode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE >> 2); } -static bool +/** + * svcxdr_decode_fhandle - Decode an NFSv2 file handle + * @xdr: XDR stream positioned at an encoded NFSv2 FH + * @fhp: OUT: filled-in server file handle + * + * Return values: + * %false: The encoded file handle was not valid + * %true: @fhp has been initialized + */ +bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) { __be32 *p; diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index ff68643504c3..77afad72c2aa 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -165,5 +165,6 @@ void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); __be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp); +bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); #endif /* LINUX_NFSD_H */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 64af5b01c5d7..43db4206cd25 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -102,7 +102,7 @@ struct nfsd3_commitargs { struct nfsd3_getaclargs { struct svc_fh fh; - int mask; + __u32 mask; }; struct posix_acl; From e077857ef0f84ba0ea99b2be796e7264a136c90a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 10:38:46 -0500 Subject: [PATCH 0466/1565] NFSD: Add an xdr_stream-based decoder for NFSv2/3 ACLs [ Upstream commit 6bb844b4eb6e3b109a2fdaffb60e6da722dc4356 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs_common/nfsacl.c | 52 ++++++++++++++++++++++++++++++++++++++++++ include/linux/nfsacl.h | 3 +++ 2 files changed, 55 insertions(+) diff --git a/fs/nfs_common/nfsacl.c b/fs/nfs_common/nfsacl.c index d056ad2fdefd..79c563c1a5e8 100644 --- a/fs/nfs_common/nfsacl.c +++ b/fs/nfs_common/nfsacl.c @@ -295,3 +295,55 @@ int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, nfsacl_desc.desc.array_len; } EXPORT_SYMBOL_GPL(nfsacl_decode); + +/** + * nfs_stream_decode_acl - Decode an NFSv3 ACL + * + * @xdr: an xdr_stream positioned at an encoded ACL + * @aclcnt: OUT: count of ACEs in decoded posix_acl + * @pacl: OUT: a dynamically-allocated buffer containing the decoded posix_acl + * + * Return values: + * %false: The encoded ACL is not valid + * %true: @pacl contains a decoded ACL, and @xdr is advanced + * + * On a successful return, caller must release *pacl using posix_acl_release(). + */ +bool nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, + struct posix_acl **pacl) +{ + const size_t elem_size = XDR_UNIT * 3; + struct nfsacl_decode_desc nfsacl_desc = { + .desc = { + .elem_size = elem_size, + .xcode = pacl ? xdr_nfsace_decode : NULL, + }, + }; + unsigned int base; + u32 entries; + + if (xdr_stream_decode_u32(xdr, &entries) < 0) + return false; + if (entries > NFS_ACL_MAX_ENTRIES) + return false; + + base = xdr_stream_pos(xdr); + if (!xdr_inline_decode(xdr, XDR_UNIT + elem_size * entries)) + return false; + nfsacl_desc.desc.array_maxlen = entries; + if (xdr_decode_array2(xdr->buf, base, &nfsacl_desc.desc)) + return false; + + if (pacl) { + if (entries != nfsacl_desc.desc.array_len || + posix_acl_from_nfsacl(nfsacl_desc.acl) != 0) { + posix_acl_release(nfsacl_desc.acl); + return false; + } + *pacl = nfsacl_desc.acl; + } + if (aclcnt) + *aclcnt = entries; + return true; +} +EXPORT_SYMBOL_GPL(nfs_stream_decode_acl); diff --git a/include/linux/nfsacl.h b/include/linux/nfsacl.h index 103d44695323..0ba99c513649 100644 --- a/include/linux/nfsacl.h +++ b/include/linux/nfsacl.h @@ -38,5 +38,8 @@ nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, extern int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, struct posix_acl **pacl); +extern bool +nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, + struct posix_acl **pacl); #endif /* __LINUX_NFSACL_H */ From 508a791fbe87bdbeb88c1d29eaa487ae823cb413 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:37:35 -0500 Subject: [PATCH 0467/1565] NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream [ Upstream commit 427eab3ba22891845265f9a3846de6ac152ec836 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 29 ++++++++++++----------------- fs/nfsd/xdr3.h | 2 +- 2 files changed, 13 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index df2e145cfab0..123820ec79d3 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -201,28 +201,23 @@ static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - unsigned int base; - int n; - p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &argp->fh)) return 0; - argp->mask = ntohl(*p++); - if (argp->mask & ~NFS_ACL_MASK || - !xdr_argsize_check(rqstp, p)) + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) + return 0; + if (argp->mask & ~NFS_ACL_MASK) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? + &argp->acl_access : NULL)) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? + &argp->acl_default : NULL)) return 0; - base = (char *)p - (char *)head->iov_base; - n = nfsacl_decode(&rqstp->rq_arg, base, NULL, - (argp->mask & NFS_ACL) ? - &argp->acl_access : NULL); - if (n > 0) - n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, - (argp->mask & NFS_DFACL) ? - &argp->acl_default : NULL); - return (n > 0); + return 1; } static int nfsaclsvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 43db4206cd25..5afb3ce4f062 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -108,7 +108,7 @@ struct nfsd3_getaclargs { struct posix_acl; struct nfsd3_setaclargs { struct svc_fh fh; - int mask; + __u32 mask; struct posix_acl *acl_access; struct posix_acl *acl_default; }; From 9eea3915dd811b941cdb8b5114c4e384cfaa60b2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:46:50 -0500 Subject: [PATCH 0468/1565] NFSD: Update the NFSv2 ACL GETATTR argument decoder to use struct xdr_stream [ Upstream commit 571d31f37a57729c9d3463b5a692a84e619b408a ] Since the ACL GETATTR procedure is the same as the normal GETATTR procedure, simply re-use nfssvc_decode_fhandleargs. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 123820ec79d3..0274348f6679 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -220,16 +220,6 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; } -static int nfsaclsvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_fhandle *argp = rqstp->rq_argp; - - p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) - return 0; - return xdr_argsize_check(rqstp, p); -} - static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_accessargs *argp = rqstp->rq_argp; @@ -392,7 +382,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { }, [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, - .pc_decode = nfsaclsvc_decode_fhandleargs, + .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfsaclsvc_encode_attrstatres, .pc_release = nfsaclsvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), From 81f79eb2237b76ed5faf762239beaede7d2b821f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:49:29 -0500 Subject: [PATCH 0469/1565] NFSD: Update the NFSv2 ACL ACCESS argument decoder to use struct xdr_stream [ Upstream commit 64063892efc1daa3a48882673811ff327ba75ed5 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 0274348f6679..7eeac5b81c20 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -222,14 +222,15 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_accessargs *argp = rqstp->rq_argp; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nfsd3_accessargs *args = rqstp->rq_argp; - p = nfs2svc_decode_fh(p, &argp->fh); - if (!p) + if (!svcxdr_decode_fhandle(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->access) < 0) return 0; - argp->access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); + return 1; } /* From 5ea5e56cfb57797b907d810dcd222735c886a873 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 19 Oct 2020 17:49:16 -0400 Subject: [PATCH 0470/1565] NFSD: Clean up after updating NFSv2 ACL decoders [ Upstream commit baadce65d6ee3032b921d9c043ba808bc69d6b13 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 18 ------------------ fs/nfsd/xdr.h | 1 - 2 files changed, 19 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5ab9fc14816c..5d79ef6a0c7f 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,18 +26,6 @@ static u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */ -static __be32 * -decode_fh(__be32 *p, struct svc_fh *fhp) -{ - fh_init(fhp, NFS_FHSIZE); - memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); - fhp->fh_handle.fh_size = NFS_FHSIZE; - - /* FIXME: Look up export pointer here and verify - * Sun Secure RPC if requested */ - return p + (NFS_FHSIZE >> 2); -} - /** * svcxdr_decode_fhandle - Decode an NFSv2 file handle * @xdr: XDR stream positioned at an encoded NFSv2 FH @@ -62,12 +50,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) return true; } -/* Helper function for NFSv2 ACL code */ -__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) -{ - return decode_fh(p, fhp); -} - static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 77afad72c2aa..b92f1acec9e7 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -164,7 +164,6 @@ void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); -__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); #endif /* LINUX_NFSD_H */ From f89e3fa89e465bee9a569d5b792b943c16e06b80 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:52:04 -0500 Subject: [PATCH 0471/1565] NFSD: Update the NFSv3 GETACL argument decoder to use struct xdr_stream [ Upstream commit 05027eafc266487c6e056d10ab352861df95b5d4 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 11 ++++++----- fs/nfsd/nfs3xdr.c | 11 ++++++++++- fs/nfsd/xdr3.h | 1 + 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 9e1a92fb9771..addb0d7d5500 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -124,19 +124,20 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) /* * XDR decode functions */ + static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *args = rqstp->rq_argp; - p = nfs3svc_decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) + return 0; + if (xdr_stream_decode_u32(xdr, &args->mask) < 0) return 0; - args->mask = ntohl(*p); p++; - return xdr_argsize_check(rqstp, p); + return 1; } - static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_setaclargs *args = rqstp->rq_argp; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index a30b418a5116..aa55d0ba2a54 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -53,7 +53,16 @@ svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) return true; } -static bool +/** + * svcxdr_decode_nfs_fh3 - Decode an NFSv3 file handle + * @xdr: XDR stream positioned at an undecoded NFSv3 FH + * @fhp: OUT: filled-in server file handle + * + * Return values: + * %false: The encoded file handle was not valid + * %true: @fhp has been initialized + */ +bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) { __be32 *p; diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 5afb3ce4f062..7456aee74f3d 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -308,6 +308,7 @@ int nfs3svc_encode_entry_plus(void *, const char *name, __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); __be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp); +bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); #endif /* _LINUX_NFSD_XDR3_H */ From 22af3dfbe657a1548779e1dab5bad0a61418d29d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 17 Nov 2020 11:56:26 -0500 Subject: [PATCH 0472/1565] NFSD: Update the NFSv2 SETACL argument decoder to use struct xdr_stream [ Upstream commit 68519ff2a1c72c67fcdc4b81671acda59f420af9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 31 +++++++++++++------------------ 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index addb0d7d5500..a568b842e9eb 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -140,28 +140,23 @@ static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_setaclargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - unsigned int base; - int n; + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nfsd3_setaclargs *argp = rqstp->rq_argp; - p = nfs3svc_decode_fh(p, &args->fh); - if (!p) + if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) return 0; - args->mask = ntohl(*p++); - if (args->mask & ~NFS_ACL_MASK || - !xdr_argsize_check(rqstp, p)) + if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) + return 0; + if (argp->mask & ~NFS_ACL_MASK) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? + &argp->acl_access : NULL)) + return 0; + if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? + &argp->acl_default : NULL)) return 0; - base = (char *)p - (char *)head->iov_base; - n = nfsacl_decode(&rqstp->rq_arg, base, NULL, - (args->mask & NFS_ACL) ? - &args->acl_access : NULL); - if (n > 0) - n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, - (args->mask & NFS_DFACL) ? - &args->acl_default : NULL); - return (n > 0); + return 1; } /* From 0a13baa6ab5a9e035e8e78618c8253e25299b5b2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 20 Oct 2020 09:56:52 -0400 Subject: [PATCH 0473/1565] NFSD: Clean up after updating NFSv3 ACL decoders [ Upstream commit 9cee763ee654ce8622d673b8e32687d738e24ace ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 20 -------------------- fs/nfsd/xdr3.h | 2 -- 2 files changed, 22 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index aa55d0ba2a54..00a96054280a 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -82,26 +82,6 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; } -static __be32 * -decode_fh(__be32 *p, struct svc_fh *fhp) -{ - unsigned int size; - fh_init(fhp, NFS3_FHSIZE); - size = ntohl(*p++); - if (size > NFS3_FHSIZE) - return NULL; - - memcpy(&fhp->fh_handle.fh_base, p, size); - fhp->fh_handle.fh_size = size; - return p + XDR_QUADLEN(size); -} - -/* Helper function for NFSv3 ACL code */ -__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp) -{ - return decode_fh(p, fhp); -} - static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7456aee74f3d..3e1578953f54 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -307,8 +307,6 @@ int nfs3svc_encode_entry_plus(void *, const char *name, /* Helper functions for NFSv3 ACL code */ __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); -__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); - #endif /* _LINUX_NFSD_XDR3_H */ From d1344de0d66da89f8df00969a17cdb3f76a95800 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 6 Jan 2021 09:52:34 +0200 Subject: [PATCH 0474/1565] nfsd: remove unused stats counters [ Upstream commit 1b76d1df1a3683b6b23cd1c813d13c5e6a9d35e5 ] Commit 501cb1849f86 ("nfsd: rip out the raparms cache") removed the code that updates read-ahead cache stats counters, commit 8bbfa9f3889b ("knfsd: remove the nfsd thread busy histogram") removed code that updates the thread busy stats counters back in 2009 and code that updated filehandle cache stats was removed back in 2002. Remove the unused stats counters from nfsd_stats struct and print hardcoded zeros in /proc/net/rpc/nfsd. Signed-off-by: Amir Goldstein Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/stats.c | 41 ++++++++++++++++------------------------- fs/nfsd/stats.h | 10 ---------- 2 files changed, 16 insertions(+), 35 deletions(-) diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index b1bc582b0493..e928e224205a 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -7,16 +7,14 @@ * Format: * rc * Statistsics for the reply cache - * fh + * fh * statistics for filehandle lookup * io * statistics for IO throughput - * th <10%-20%> <20%-30%> ... <90%-100%> <100%> - * time (seconds) when nfsd thread usage above thresholds - * and number of times that all threads were in use - * ra cache-size <10% <20% <30% ... <100% not-found - * number of times that read-ahead entry was found that deep in - * the cache. + * th + * number of threads + * ra + * * plus generic RPC stats (see net/sunrpc/stats.c) * * Copyright (C) 1995, 1996, 1997 Olaf Kirch @@ -38,31 +36,24 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) { int i; - seq_printf(seq, "rc %u %u %u\nfh %u %u %u %u %u\nio %u %u\n", + seq_printf(seq, "rc %u %u %u\nfh %u 0 0 0 0\nio %u %u\n", nfsdstats.rchits, nfsdstats.rcmisses, nfsdstats.rcnocache, nfsdstats.fh_stale, - nfsdstats.fh_lookup, - nfsdstats.fh_anon, - nfsdstats.fh_nocache_dir, - nfsdstats.fh_nocache_nondir, nfsdstats.io_read, nfsdstats.io_write); - /* thread usage: */ - seq_printf(seq, "th %u %u", nfsdstats.th_cnt, nfsdstats.th_fullcnt); - for (i=0; i<10; i++) { - unsigned int jifs = nfsdstats.th_usage[i]; - unsigned int sec = jifs / HZ, msec = (jifs % HZ)*1000/HZ; - seq_printf(seq, " %u.%03u", sec, msec); - } - /* newline and ra-cache */ - seq_printf(seq, "\nra %u", nfsdstats.ra_size); - for (i=0; i<11; i++) - seq_printf(seq, " %u", nfsdstats.ra_depth[i]); - seq_putc(seq, '\n'); - + /* thread usage: */ + seq_printf(seq, "th %u 0", nfsdstats.th_cnt); + + /* deprecated thread usage histogram stats */ + for (i = 0; i < 10; i++) + seq_puts(seq, " 0.000"); + + /* deprecated ra-cache stats */ + seq_puts(seq, "\nra 0 0 0 0 0 0 0 0 0 0 0 0\n"); + /* show my rpc info */ svc_seq_show(seq, &nfsd_svcstats); diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index b23fdac69820..5e3cdf21556a 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -15,19 +15,9 @@ struct nfsd_stats { unsigned int rcmisses; /* repcache hits */ unsigned int rcnocache; /* uncached reqs */ unsigned int fh_stale; /* FH stale error */ - unsigned int fh_lookup; /* dentry cached */ - unsigned int fh_anon; /* anon file dentry returned */ - unsigned int fh_nocache_dir; /* filehandle not found in dcache */ - unsigned int fh_nocache_nondir; /* filehandle not found in dcache */ unsigned int io_read; /* bytes returned to read requests */ unsigned int io_write; /* bytes passed in write requests */ unsigned int th_cnt; /* number of available threads */ - unsigned int th_usage[10]; /* number of ticks during which n perdeciles - * of available threads were in use */ - unsigned int th_fullcnt; /* number of times last free thread was used */ - unsigned int ra_size; /* size of ra cache */ - unsigned int ra_depth[11]; /* number of times ra entry was found that deep - * in the cache (10percentiles). [10] = not found */ #ifdef CONFIG_NFSD_V4 unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ #endif From 65b1df1358842854ee4b9830b41ccfdf7ae57246 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 6 Jan 2021 09:52:35 +0200 Subject: [PATCH 0475/1565] nfsd: protect concurrent access to nfsd stats counters [ Upstream commit e567b98ce9a4b35b63c364d24828a9e5cd7a8179 ] nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 23 ++++++++----- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfscache.c | 52 ++++++++++++++++++++--------- fs/nfsd/nfsctl.c | 5 ++- fs/nfsd/nfsfh.c | 2 +- fs/nfsd/stats.c | 79 ++++++++++++++++++++++++++++++++++++-------- fs/nfsd/stats.h | 82 +++++++++++++++++++++++++++++++++++++++------- fs/nfsd/vfs.c | 4 +-- 8 files changed, 194 insertions(+), 55 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 02d3d2f0e616..a75abeb1e698 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -10,6 +10,7 @@ #include #include +#include /* Hash tables for nfs4_clientid state */ #define CLIENT_HASH_BITS 4 @@ -21,6 +22,14 @@ struct cld_net; struct nfsd4_client_tracking_ops; +enum { + /* cache misses due only to checksum comparison failures */ + NFSD_NET_PAYLOAD_MISSES, + /* amount of memory (in bytes) currently consumed by the DRC */ + NFSD_NET_DRC_MEM_USAGE, + NFSD_NET_COUNTERS_NUM +}; + /* * Represents a nfsd "container". With respect to nfsv4 state tracking, the * fields of interest are the *_id_hashtbls and the *_name_tree. These track @@ -149,20 +158,16 @@ struct nfsd_net { /* * Stats and other tracking of on the duplicate reply cache. - * These fields and the "rc" fields in nfsdstats are modified - * with only the per-bucket cache lock, which isn't really safe - * and should be fixed if we want the statistics to be - * completely accurate. + * The longest_chain* fields are modified with only the per-bucket + * cache lock, which isn't really safe and should be fixed if we want + * these statistics to be completely accurate. */ /* total number of entries */ atomic_t num_drc_entries; - /* cache misses due only to checksum comparison failures */ - unsigned int payload_misses; - - /* amount of memory (in bytes) currently consumed by the DRC */ - unsigned int drc_mem_usage; + /* Per-netns stats counters */ + struct percpu_counter counter[NFSD_NET_COUNTERS_NUM]; /* longest hash chain seen */ unsigned int longest_chain; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a5e1f5c1a4d6..4f64d94909ec 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2168,7 +2168,7 @@ nfsd4_proc_null(struct svc_rqst *rqstp) static inline void nfsd4_increment_op_stats(u32 opnum) { if (opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP) - nfsdstats.nfs4_opcount[opnum]++; + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_NFS4_OP(opnum)]); } static const struct nfsd4_operation nfsd4_ops[]; diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 80c90fc231a5..96cdf77925f3 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -121,14 +121,14 @@ nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp, struct nfsd_net *nn) { if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) { - nn->drc_mem_usage -= rp->c_replvec.iov_len; + nfsd_stats_drc_mem_usage_sub(nn, rp->c_replvec.iov_len); kfree(rp->c_replvec.iov_base); } if (rp->c_state != RC_UNUSED) { rb_erase(&rp->c_node, &b->rb_head); list_del(&rp->c_lru); atomic_dec(&nn->num_drc_entries); - nn->drc_mem_usage -= sizeof(*rp); + nfsd_stats_drc_mem_usage_sub(nn, sizeof(*rp)); } kmem_cache_free(drc_slab, rp); } @@ -154,6 +154,16 @@ void nfsd_drc_slab_free(void) kmem_cache_destroy(drc_slab); } +static int nfsd_reply_cache_stats_init(struct nfsd_net *nn) +{ + return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM); +} + +static void nfsd_reply_cache_stats_destroy(struct nfsd_net *nn) +{ + nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM); +} + int nfsd_reply_cache_init(struct nfsd_net *nn) { unsigned int hashsize; @@ -165,12 +175,16 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) hashsize = nfsd_hashsize(nn->max_drc_entries); nn->maskbits = ilog2(hashsize); + status = nfsd_reply_cache_stats_init(nn); + if (status) + goto out_nomem; + nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan; nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count; nn->nfsd_reply_cache_shrinker.seeks = 1; status = register_shrinker(&nn->nfsd_reply_cache_shrinker); if (status) - goto out_nomem; + goto out_stats_destroy; nn->drc_hashtbl = kvzalloc(array_size(hashsize, sizeof(*nn->drc_hashtbl)), GFP_KERNEL); @@ -186,6 +200,8 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) return 0; out_shrinker: unregister_shrinker(&nn->nfsd_reply_cache_shrinker); +out_stats_destroy: + nfsd_reply_cache_stats_destroy(nn); out_nomem: printk(KERN_ERR "nfsd: failed to allocate reply cache\n"); return -ENOMEM; @@ -196,6 +212,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) struct svc_cacherep *rp; unsigned int i; + nfsd_reply_cache_stats_destroy(nn); unregister_shrinker(&nn->nfsd_reply_cache_shrinker); for (i = 0; i < nn->drc_hashsize; i++) { @@ -324,7 +341,7 @@ nfsd_cache_key_cmp(const struct svc_cacherep *key, { if (key->c_key.k_xid == rp->c_key.k_xid && key->c_key.k_csum != rp->c_key.k_csum) { - ++nn->payload_misses; + nfsd_stats_payload_misses_inc(nn); trace_nfsd_drc_mismatch(nn, key, rp); } @@ -407,7 +424,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) rqstp->rq_cacherep = NULL; if (type == RC_NOCACHE) { - nfsdstats.rcnocache++; + nfsd_stats_rc_nocache_inc(); goto out; } @@ -429,12 +446,12 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) goto found_entry; } - nfsdstats.rcmisses++; + nfsd_stats_rc_misses_inc(); rqstp->rq_cacherep = rp; rp->c_state = RC_INPROG; atomic_inc(&nn->num_drc_entries); - nn->drc_mem_usage += sizeof(*rp); + nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp)); /* go ahead and prune the cache */ prune_bucket(b, nn); @@ -446,7 +463,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) found_entry: /* We found a matching entry which is either in progress or done. */ - nfsdstats.rchits++; + nfsd_stats_rc_hits_inc(); rtn = RC_DROPIT; /* Request being processed */ @@ -548,7 +565,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) return; } spin_lock(&b->cache_lock); - nn->drc_mem_usage += bufsize; + nfsd_stats_drc_mem_usage_add(nn, bufsize); lru_put_end(b, rp); rp->c_secure = test_bit(RQ_SECURE, &rqstp->rq_flags); rp->c_type = cachetype; @@ -588,13 +605,18 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "max entries: %u\n", nn->max_drc_entries); seq_printf(m, "num entries: %u\n", - atomic_read(&nn->num_drc_entries)); + atomic_read(&nn->num_drc_entries)); seq_printf(m, "hash buckets: %u\n", 1 << nn->maskbits); - seq_printf(m, "mem usage: %u\n", nn->drc_mem_usage); - seq_printf(m, "cache hits: %u\n", nfsdstats.rchits); - seq_printf(m, "cache misses: %u\n", nfsdstats.rcmisses); - seq_printf(m, "not cached: %u\n", nfsdstats.rcnocache); - seq_printf(m, "payload misses: %u\n", nn->payload_misses); + seq_printf(m, "mem usage: %lld\n", + percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE])); + seq_printf(m, "cache hits: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS])); + seq_printf(m, "cache misses: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES])); + seq_printf(m, "not cached: %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE])); + seq_printf(m, "payload misses: %lld\n", + percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES])); seq_printf(m, "longest chain len: %u\n", nn->longest_chain); seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize); return 0; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index c4b11560ac1b..7f85c171f83a 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1522,7 +1522,9 @@ static int __init init_nfsd(void) retval = nfsd4_init_pnfs(); if (retval) goto out_free_slabs; - nfsd_stat_init(); /* Statistics */ + retval = nfsd_stat_init(); /* Statistics */ + if (retval) + goto out_free_pnfs; retval = nfsd_drc_slab_create(); if (retval) goto out_free_stat; @@ -1552,6 +1554,7 @@ static int __init init_nfsd(void) nfsd_drc_slab_free(); out_free_stat: nfsd_stat_shutdown(); +out_free_pnfs: nfsd4_exit_pnfs(); out_free_slabs: nfsd4_free_slabs(); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 66f2ef67792a..9e31b2b5c6d2 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -422,7 +422,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) } out: if (error == nfserr_stale) - nfsdstats.fh_stale++; + nfsd_stats_fh_stale_inc(); return error; } diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index e928e224205a..1d3b881e7382 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -36,13 +36,13 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) { int i; - seq_printf(seq, "rc %u %u %u\nfh %u 0 0 0 0\nio %u %u\n", - nfsdstats.rchits, - nfsdstats.rcmisses, - nfsdstats.rcnocache, - nfsdstats.fh_stale, - nfsdstats.io_read, - nfsdstats.io_write); + seq_printf(seq, "rc %lld %lld %lld\nfh %lld 0 0 0 0\nio %lld %lld\n", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_FH_STALE]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_READ]), + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE])); /* thread usage: */ seq_printf(seq, "th %u 0", nfsdstats.th_cnt); @@ -61,8 +61,10 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) /* Show count for individual nfsv4 operations */ /* Writing operation numbers 0 1 2 also for maintaining uniformity */ seq_printf(seq,"proc4ops %u", LAST_NFS4_OP + 1); - for (i = 0; i <= LAST_NFS4_OP; i++) - seq_printf(seq, " %u", nfsdstats.nfs4_opcount[i]); + for (i = 0; i <= LAST_NFS4_OP; i++) { + seq_printf(seq, " %lld", + percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_NFS4_OP(i)])); + } seq_putc(seq, '\n'); #endif @@ -82,14 +84,63 @@ static const struct proc_ops nfsd_proc_ops = { .proc_release = single_release, }; -void -nfsd_stat_init(void) +int nfsd_percpu_counters_init(struct percpu_counter counters[], int num) { - svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops); + int i, err = 0; + + for (i = 0; !err && i < num; i++) + err = percpu_counter_init(&counters[i], 0, GFP_KERNEL); + + if (!err) + return 0; + + for (; i > 0; i--) + percpu_counter_destroy(&counters[i-1]); + + return err; } -void -nfsd_stat_shutdown(void) +void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num) { + int i; + + for (i = 0; i < num; i++) + percpu_counter_set(&counters[i], 0); +} + +void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num) +{ + int i; + + for (i = 0; i < num; i++) + percpu_counter_destroy(&counters[i]); +} + +static int nfsd_stat_counters_init(void) +{ + return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); +} + +static void nfsd_stat_counters_destroy(void) +{ + nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); +} + +int nfsd_stat_init(void) +{ + int err; + + err = nfsd_stat_counters_init(); + if (err) + return err; + + svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops); + + return 0; +} + +void nfsd_stat_shutdown(void) +{ + nfsd_stat_counters_destroy(); svc_proc_unregister(&init_net, "nfsd"); } diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 5e3cdf21556a..87c3150c200f 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -8,27 +8,85 @@ #define _NFSD_STATS_H #include +#include +enum { + NFSD_STATS_RC_HITS, /* repcache hits */ + NFSD_STATS_RC_MISSES, /* repcache misses */ + NFSD_STATS_RC_NOCACHE, /* uncached reqs */ + NFSD_STATS_FH_STALE, /* FH stale error */ + NFSD_STATS_IO_READ, /* bytes returned to read requests */ + NFSD_STATS_IO_WRITE, /* bytes passed in write requests */ +#ifdef CONFIG_NFSD_V4 + NFSD_STATS_FIRST_NFS4_OP, /* count of individual nfsv4 operations */ + NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP, +#define NFSD_STATS_NFS4_OP(op) (NFSD_STATS_FIRST_NFS4_OP + (op)) +#endif + NFSD_STATS_COUNTERS_NUM +}; + struct nfsd_stats { - unsigned int rchits; /* repcache hits */ - unsigned int rcmisses; /* repcache hits */ - unsigned int rcnocache; /* uncached reqs */ - unsigned int fh_stale; /* FH stale error */ - unsigned int io_read; /* bytes returned to read requests */ - unsigned int io_write; /* bytes passed in write requests */ - unsigned int th_cnt; /* number of available threads */ -#ifdef CONFIG_NFSD_V4 - unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ -#endif + struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM]; + /* Protected by nfsd_mutex */ + unsigned int th_cnt; /* number of available threads */ }; extern struct nfsd_stats nfsdstats; + extern struct svc_stat nfsd_svcstats; -void nfsd_stat_init(void); -void nfsd_stat_shutdown(void); +int nfsd_percpu_counters_init(struct percpu_counter counters[], int num); +void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num); +void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num); +int nfsd_stat_init(void); +void nfsd_stat_shutdown(void); + +static inline void nfsd_stats_rc_hits_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_HITS]); +} + +static inline void nfsd_stats_rc_misses_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_MISSES]); +} + +static inline void nfsd_stats_rc_nocache_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]); +} + +static inline void nfsd_stats_fh_stale_inc(void) +{ + percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); +} + +static inline void nfsd_stats_io_read_add(s64 amount) +{ + percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount); +} + +static inline void nfsd_stats_io_write_add(s64 amount) +{ + percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount); +} + +static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn) +{ + percpu_counter_inc(&nn->counter[NFSD_NET_PAYLOAD_MISSES]); +} + +static inline void nfsd_stats_drc_mem_usage_add(struct nfsd_net *nn, s64 amount) +{ + percpu_counter_add(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); +} + +static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount) +{ + percpu_counter_sub(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); +} #endif /* _NFSD_STATS_H */ diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index a515cbd0a7d8..1b44d8f985be 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -897,7 +897,7 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned long *count, u32 *eof, ssize_t host_err) { if (host_err >= 0) { - nfsdstats.io_read += host_err; + nfsd_stats_io_read_add(host_err); *eof = nfsd_eof_on_read(file, offset, host_err, *count); *count = host_err; fsnotify_access(file); @@ -1050,7 +1050,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, goto out_nfserr; } *cnt = host_err; - nfsdstats.io_write += *cnt; + nfsd_stats_io_write_add(*cnt); fsnotify_modify(file); host_err = filemap_check_wb_err(file->f_mapping, since); if (host_err < 0) From 42cf742d8626ccb051042de7c9d991b271c9d32c Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 6 Jan 2021 09:52:36 +0200 Subject: [PATCH 0476/1565] nfsd: report per-export stats [ Upstream commit 20ad856e47323e208ae8d6a9ecfe5bf0be6f505e ] Collect some nfsd stats per export in addition to the global stats. A new nfsdfs export_stats file is created. It uses the same ops as the exports file to iterate the export entries and we use the file's name to determine the reported info per export. For example: $ cat /proc/fs/nfsd/export_stats # Version 1.1 # Path Client Start-time # Stats /test localhost 92 fh_stale: 0 io_read: 9 io_write: 1 Every export entry reports the start time when stats collection started, so stats collecting scripts can know if stats where reset between samples. Signed-off-by: Amir Goldstein Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/export.c | 68 ++++++++++++++++++++++++++++++++++++++++++------ fs/nfsd/export.h | 15 +++++++++++ fs/nfsd/nfsctl.c | 3 +++ fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfsfh.c | 4 +-- fs/nfsd/stats.h | 12 ++++++--- fs/nfsd/vfs.c | 4 +-- 7 files changed, 92 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 81e7bb12aca6..7c863f2c21e0 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -331,12 +331,29 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc) fsloc->locations = NULL; } +static int export_stats_init(struct export_stats *stats) +{ + stats->start_time = ktime_get_seconds(); + return nfsd_percpu_counters_init(stats->counter, EXP_STATS_COUNTERS_NUM); +} + +static void export_stats_reset(struct export_stats *stats) +{ + nfsd_percpu_counters_reset(stats->counter, EXP_STATS_COUNTERS_NUM); +} + +static void export_stats_destroy(struct export_stats *stats) +{ + nfsd_percpu_counters_destroy(stats->counter, EXP_STATS_COUNTERS_NUM); +} + static void svc_export_put(struct kref *ref) { struct svc_export *exp = container_of(ref, struct svc_export, h.ref); path_put(&exp->ex_path); auth_domain_put(exp->ex_client); nfsd4_fslocs_free(&exp->ex_fslocs); + export_stats_destroy(&exp->ex_stats); kfree(exp->ex_uuid); kfree_rcu(exp, ex_rcu); } @@ -692,22 +709,47 @@ static void exp_flags(struct seq_file *m, int flag, int fsid, kuid_t anonu, kgid_t anong, struct nfsd4_fs_locations *fslocs); static void show_secinfo(struct seq_file *m, struct svc_export *exp); +static int is_export_stats_file(struct seq_file *m) +{ + /* + * The export_stats file uses the same ops as the exports file. + * We use the file's name to determine the reported info per export. + * There is no rename in nsfdfs, so d_name.name is stable. + */ + return !strcmp(m->file->f_path.dentry->d_name.name, "export_stats"); +} + static int svc_export_show(struct seq_file *m, struct cache_detail *cd, struct cache_head *h) { - struct svc_export *exp ; + struct svc_export *exp; + bool export_stats = is_export_stats_file(m); - if (h ==NULL) { - seq_puts(m, "#path domain(flags)\n"); + if (h == NULL) { + if (export_stats) + seq_puts(m, "#path domain start-time\n#\tstats\n"); + else + seq_puts(m, "#path domain(flags)\n"); return 0; } exp = container_of(h, struct svc_export, h); seq_path(m, &exp->ex_path, " \t\n\\"); seq_putc(m, '\t'); seq_escape(m, exp->ex_client->name, " \t\n\\"); + if (export_stats) { + seq_printf(m, "\t%lld\n", exp->ex_stats.start_time); + seq_printf(m, "\tfh_stale: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_FH_STALE])); + seq_printf(m, "\tio_read: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_READ])); + seq_printf(m, "\tio_write: %lld\n", + percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_WRITE])); + seq_putc(m, '\n'); + return 0; + } seq_putc(m, '('); - if (test_bit(CACHE_VALID, &h->flags) && + if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) { exp_flags(m, exp->ex_flags, exp->ex_fsid, exp->ex_anon_uid, exp->ex_anon_gid, &exp->ex_fslocs); @@ -748,6 +790,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem) new->ex_layout_types = 0; new->ex_uuid = NULL; new->cd = item->cd; + export_stats_reset(&new->ex_stats); } static void export_update(struct cache_head *cnew, struct cache_head *citem) @@ -780,10 +823,15 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem) static struct cache_head *svc_export_alloc(void) { struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL); - if (i) - return &i->h; - else + if (!i) return NULL; + + if (export_stats_init(&i->ex_stats)) { + kfree(i); + return NULL; + } + + return &i->h; } static const struct cache_detail svc_export_cache_template = { @@ -1245,10 +1293,14 @@ static int e_show(struct seq_file *m, void *p) struct cache_head *cp = p; struct svc_export *exp = container_of(cp, struct svc_export, h); struct cache_detail *cd = m->private; + bool export_stats = is_export_stats_file(m); if (p == SEQ_START_TOKEN) { seq_puts(m, "# Version 1.1\n"); - seq_puts(m, "# Path Client(Flags) # IPs\n"); + if (export_stats) + seq_puts(m, "# Path Client Start-time\n#\tStats\n"); + else + seq_puts(m, "# Path Client(Flags) # IPs\n"); return 0; } diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index e7daa1f246f0..ee0e3aba4a6e 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -6,6 +6,7 @@ #define NFSD_EXPORT_H #include +#include #include #include @@ -46,6 +47,19 @@ struct exp_flavor_info { u32 flags; }; +/* Per-export stats */ +enum { + EXP_STATS_FH_STALE, + EXP_STATS_IO_READ, + EXP_STATS_IO_WRITE, + EXP_STATS_COUNTERS_NUM +}; + +struct export_stats { + time64_t start_time; + struct percpu_counter counter[EXP_STATS_COUNTERS_NUM]; +}; + struct svc_export { struct cache_head h; struct auth_domain * ex_client; @@ -62,6 +76,7 @@ struct svc_export { struct nfsd4_deviceid_map *ex_devid_map; struct cache_detail *cd; struct rcu_head ex_rcu; + struct export_stats ex_stats; }; /* an "export key" (expkey) maps a filehandlefragement to an diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 7f85c171f83a..fa060f91df1b 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -32,6 +32,7 @@ enum { NFSD_Root = 1, NFSD_List, + NFSD_Export_Stats, NFSD_Export_features, NFSD_Fh, NFSD_FO_UnlockIP, @@ -1352,6 +1353,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) static const struct tree_descr nfsd_files[] = { [NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO}, + /* Per-export io stats use same ops as exports file */ + [NFSD_Export_Stats] = {"export_stats", &exports_nfsd_operations, S_IRUGO}, [NFSD_Export_features] = {"export_features", &export_features_operations, S_IRUGO}, [NFSD_FO_UnlockIP] = {"unlock_ip", diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 2326428e2c5b..27c1308ffc2b 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -24,8 +24,8 @@ #include #include "netns.h" -#include "stats.h" #include "export.h" +#include "stats.h" #undef ifdebug #ifdef CONFIG_SUNRPC_DEBUG diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 9e31b2b5c6d2..4744a276058d 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -349,7 +349,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) __be32 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) { - struct svc_export *exp; + struct svc_export *exp = NULL; struct dentry *dentry; __be32 error; @@ -422,7 +422,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) } out: if (error == nfserr_stale) - nfsd_stats_fh_stale_inc(); + nfsd_stats_fh_stale_inc(exp); return error; } diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 87c3150c200f..51ecda852e23 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -59,19 +59,25 @@ static inline void nfsd_stats_rc_nocache_inc(void) percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]); } -static inline void nfsd_stats_fh_stale_inc(void) +static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp) { percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); + if (exp) + percpu_counter_inc(&exp->ex_stats.counter[EXP_STATS_FH_STALE]); } -static inline void nfsd_stats_io_read_add(s64 amount) +static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount) { percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount); + if (exp) + percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_READ], amount); } -static inline void nfsd_stats_io_write_add(s64 amount) +static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount) { percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount); + if (exp) + percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_WRITE], amount); } static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 1b44d8f985be..3e30788e0046 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -897,7 +897,7 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned long *count, u32 *eof, ssize_t host_err) { if (host_err >= 0) { - nfsd_stats_io_read_add(host_err); + nfsd_stats_io_read_add(fhp->fh_export, host_err); *eof = nfsd_eof_on_read(file, offset, host_err, *count); *count = host_err; fsnotify_access(file); @@ -1050,7 +1050,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, goto out_nfserr; } *cnt = host_err; - nfsd_stats_io_write_add(*cnt); + nfsd_stats_io_write_add(exp, *cnt); fsnotify_modify(file); host_err = filemap_check_wb_err(file->f_mapping, since); if (host_err < 0) From 4f26b1747a2ec827045dc1d8f04b641c76e1d5c9 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:37 -0500 Subject: [PATCH 0477/1565] nfsd4: simplify process_lookup1 [ Upstream commit 33311873adb0d55c287b164117b5b4bb7b1bdc40 ] This STALE_CLIENTID check is redundant with the one in lookup_clientid(). There's a difference in behavior is in case of memory allocation failure, which I think isn't a big deal. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e7ec7593eaaa..3f2604737636 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4722,8 +4722,6 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, struct nfs4_openowner *oo = NULL; __be32 status; - if (STALE_CLIENTID(&open->op_clientid, nn)) - return nfserr_stale_clientid; /* * In case we need it later, after we've already created the * file and don't want to risk a further failure: From 52923f25be3c3479a2af17f94c80b85b5bb4c383 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:38 -0500 Subject: [PATCH 0478/1565] nfsd: simplify process_lock [ Upstream commit a9d53a75cf574d6aa41f3cb4968fffe4f64e0fad ] Similarly, this STALE_CLIENTID check is already handled by: nfs4_preprocess_confirmed_seqid_op()-> nfs4_preprocess_seqid_op()-> nfsd4_lookup_stateid()-> set_client()-> STALE_CLIENTID() (This may cause it to return a different error in some cases where there are multiple things wrong; pynfs test SEQ10 regressed on this commit because of that, but I think that's the test's fault, and I've fixed it separately.) Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3f2604737636..15ed72b0ef55 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6720,10 +6720,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &cstate->session->se_client->cl_clientid, sizeof(clientid_t)); - status = nfserr_stale_clientid; - if (STALE_CLIENTID(&lock->lk_new_clientid, nn)) - goto out; - /* validate and update open stateid and open seqid */ status = nfs4_preprocess_confirmed_seqid_op(cstate, lock->lk_new_open_seqid, From ea92c0768f98c733280b6ab6326ea2f11deac502 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:39 -0500 Subject: [PATCH 0479/1565] nfsd: simplify nfsd_renew [ Upstream commit b4587eb2cf4b6271f67fb93b75f7de2a2026e853 ] You can take the single-exit thing too far, I think. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 15ed72b0ef55..574c88a9da26 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5322,15 +5322,12 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, trace_nfsd_clid_renew(clid); status = lookup_clientid(clid, cstate, nn, false); if (status) - goto out; + return status; clp = cstate->clp; - status = nfserr_cb_path_down; if (!list_empty(&clp->cl_delegations) && clp->cl_cb_state != NFSD4_CB_UP) - goto out; - status = nfs_ok; -out: - return status; + return nfserr_cb_path_down; + return nfs_ok; } void From 17006574683f3b4a62dbaa275260731cef369485 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:40 -0500 Subject: [PATCH 0480/1565] nfsd: rename lookup_clientid->set_client [ Upstream commit 460d27091ae2c23e7ac959a61cd481c58832db58 ] I think this is a better name, and I'm going to reuse elsewhere the code that does the lookup itself. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 574c88a9da26..60c66becbc87 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,7 +4675,7 @@ static __be32 nfsd4_check_seqid(struct nfsd4_compound_state *cstate, struct nfs4 return nfserr_bad_seqid; } -static __be32 lookup_clientid(clientid_t *clid, +static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, struct nfsd_net *nn, bool sessions) @@ -4730,7 +4730,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (open->op_file == NULL) return nfserr_jukebox; - status = lookup_clientid(clientid, cstate, nn, false); + status = set_client(clientid, cstate, nn, false); if (status) return status; clp = cstate->clp; @@ -5320,7 +5320,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); trace_nfsd_clid_renew(clid); - status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return status; clp = cstate->clp; @@ -5701,8 +5701,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) || CLOSE_STATEID(stateid)) return nfserr_bad_stateid; - status = lookup_clientid(&stateid->si_opaque.so_clid, cstate, nn, - false); + status = set_client(&stateid->si_opaque.so_clid, cstate, nn, false); if (status == nfserr_stale_clientid) { if (cstate->session) return nfserr_bad_stateid; @@ -5841,7 +5840,7 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st, cps->cpntf_time = ktime_get_boottime_seconds(); memset(&cstate, 0, sizeof(cstate)); - status = lookup_clientid(&cps->cp_p_clid, &cstate, nn, true); + status = set_client(&cps->cp_p_clid, &cstate, nn, true); if (status) goto out; status = nfsd4_lookup_stateid(&cstate, &cps->cp_p_stateid, @@ -6933,8 +6932,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_inval; if (!nfsd4_has_session(cstate)) { - status = lookup_clientid(&lockt->lt_clientid, cstate, nn, - false); + status = set_client(&lockt->lt_clientid, cstate, nn, false); if (status) goto out; } @@ -7118,7 +7116,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id); - status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return status; @@ -7258,7 +7256,7 @@ nfs4_check_open_reclaim(clientid_t *clid, __be32 status; /* find clientid in conf_id_hashtbl */ - status = lookup_clientid(clid, cstate, nn, false); + status = set_client(clid, cstate, nn, false); if (status) return nfserr_reclaim_bad; From 1d4ccfdc7d0eaf941fddce788f2abd781193e1e2 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:41 -0500 Subject: [PATCH 0481/1565] nfsd: refactor set_client [ Upstream commit 7950b5316e40d99dcb85ab81a2d1dbb913d7c1c8 ] This'll be useful elsewhere. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 60c66becbc87..c4e9f807b4a1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,40 +4675,40 @@ static __be32 nfsd4_check_seqid(struct nfsd4_compound_state *cstate, struct nfs4 return nfserr_bad_seqid; } +static struct nfs4_client *lookup_clientid(clientid_t *clid, bool sessions, + struct nfsd_net *nn) +{ + struct nfs4_client *found; + + spin_lock(&nn->client_lock); + found = find_confirmed_client(clid, sessions, nn); + if (found) + atomic_inc(&found->cl_rpc_users); + spin_unlock(&nn->client_lock); + return found; +} + static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, struct nfsd_net *nn, bool sessions) { - struct nfs4_client *found; - if (cstate->clp) { - found = cstate->clp; - if (!same_clid(&found->cl_clientid, clid)) + if (!same_clid(&cstate->clp->cl_clientid, clid)) return nfserr_stale_clientid; return nfs_ok; } - if (STALE_CLIENTID(clid, nn)) return nfserr_stale_clientid; - /* * For v4.1+ we get the client in the SEQUENCE op. If we don't have one * cached already then we know this is for is for v4.0 and "sessions" * will be false. */ WARN_ON_ONCE(cstate->session); - spin_lock(&nn->client_lock); - found = find_confirmed_client(clid, sessions, nn); - if (!found) { - spin_unlock(&nn->client_lock); + cstate->clp = lookup_clientid(clid, sessions, nn); + if (!cstate->clp) return nfserr_expired; - } - atomic_inc(&found->cl_rpc_users); - spin_unlock(&nn->client_lock); - - /* Cache the nfs4_client in cstate! */ - cstate->clp = found; return nfs_ok; } From 87ab73c1cc75ec55b012552811c66443b2b71651 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:42 -0500 Subject: [PATCH 0482/1565] nfsd: find_cpntf_state cleanup [ Upstream commit 47fdb22dacae78f37701d82a94c16a014186d34e ] I think this unusual use of struct compound_state could cause confusion. It's not that much more complicated just to open-code this stateid lookup. The only change in behavior should be a different error return in the case the copy is using a source stateid that is a revoked delegation, but I doubt that matters. Signed-off-by: J. Bruce Fields [ cel: squashed in fix reported by Coverity ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c4e9f807b4a1..b05598e5bc16 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5832,21 +5832,27 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st, { __be32 status; struct nfs4_cpntf_state *cps = NULL; - struct nfsd4_compound_state cstate; + struct nfs4_client *found; status = manage_cpntf_state(nn, st, NULL, &cps); if (status) return status; cps->cpntf_time = ktime_get_boottime_seconds(); - memset(&cstate, 0, sizeof(cstate)); - status = set_client(&cps->cp_p_clid, &cstate, nn, true); - if (status) + + status = nfserr_expired; + found = lookup_clientid(&cps->cp_p_clid, true, nn); + if (!found) goto out; - status = nfsd4_lookup_stateid(&cstate, &cps->cp_p_stateid, - NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID, - stid, nn); - put_client_renew(cstate.clp); + + *stid = find_stateid_by_type(found, &cps->cp_p_stateid, + NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID); + if (*stid) + status = nfs_ok; + else + status = nfserr_bad_stateid; + + put_client_renew(found); out: nfs4_put_cpntf_state(nn, cps); return status; From 25ac4fdbdce718edc7cbb5e71760eb746ac026c2 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:43 -0500 Subject: [PATCH 0483/1565] nfsd: remove unused set_client argument [ Upstream commit f71475ba8c2a77fff8051903cf4b7d826c3d1693 ] Every caller is setting this argument to false, so we don't need it. Also cut this comment a bit and remove an unnecessary warning. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b05598e5bc16..cbec87ee6bc0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4690,8 +4690,7 @@ static struct nfs4_client *lookup_clientid(clientid_t *clid, bool sessions, static __be32 set_client(clientid_t *clid, struct nfsd4_compound_state *cstate, - struct nfsd_net *nn, - bool sessions) + struct nfsd_net *nn) { if (cstate->clp) { if (!same_clid(&cstate->clp->cl_clientid, clid)) @@ -4701,12 +4700,10 @@ static __be32 set_client(clientid_t *clid, if (STALE_CLIENTID(clid, nn)) return nfserr_stale_clientid; /* - * For v4.1+ we get the client in the SEQUENCE op. If we don't have one - * cached already then we know this is for is for v4.0 and "sessions" - * will be false. + * We're in the 4.0 case (otherwise the SEQUENCE op would have + * set cstate->clp), so session = false: */ - WARN_ON_ONCE(cstate->session); - cstate->clp = lookup_clientid(clid, sessions, nn); + cstate->clp = lookup_clientid(clid, false, nn); if (!cstate->clp) return nfserr_expired; return nfs_ok; @@ -4730,7 +4727,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (open->op_file == NULL) return nfserr_jukebox; - status = set_client(clientid, cstate, nn, false); + status = set_client(clientid, cstate, nn); if (status) return status; clp = cstate->clp; @@ -5320,7 +5317,7 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); trace_nfsd_clid_renew(clid); - status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return status; clp = cstate->clp; @@ -5701,7 +5698,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) || CLOSE_STATEID(stateid)) return nfserr_bad_stateid; - status = set_client(&stateid->si_opaque.so_clid, cstate, nn, false); + status = set_client(&stateid->si_opaque.so_clid, cstate, nn); if (status == nfserr_stale_clientid) { if (cstate->session) return nfserr_bad_stateid; @@ -6938,7 +6935,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_inval; if (!nfsd4_has_session(cstate)) { - status = set_client(&lockt->lt_clientid, cstate, nn, false); + status = set_client(&lockt->lt_clientid, cstate, nn); if (status) goto out; } @@ -7122,7 +7119,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id); - status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return status; @@ -7262,7 +7259,7 @@ nfs4_check_open_reclaim(clientid_t *clid, __be32 status; /* find clientid in conf_id_hashtbl */ - status = set_client(clid, cstate, nn, false); + status = set_client(clid, cstate, nn); if (status) return nfserr_reclaim_bad; From bc6015541cda7063077201415b1b1f3310b67923 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:44 -0500 Subject: [PATCH 0484/1565] nfsd: simplify nfsd4_check_open_reclaim [ Upstream commit 1722b04624806ced51693f546edb83e8b2297a77 ] The set_client() was already taken care of by process_open1(). The comments here are mostly redundant with the code. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 3 +-- fs/nfsd/nfs4state.c | 18 +++--------------- fs/nfsd/state.h | 3 +-- 3 files changed, 5 insertions(+), 19 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 4f64d94909ec..5d304b51914a 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -428,8 +428,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; break; case NFS4_OPEN_CLAIM_PREVIOUS: - status = nfs4_check_open_reclaim(&open->op_clientid, - cstate, nn); + status = nfs4_check_open_reclaim(cstate->clp); if (status) goto out; open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index cbec87ee6bc0..4da8467a3570 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7248,25 +7248,13 @@ nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn) return NULL; } -/* -* Called from OPEN. Look for clientid in reclaim list. -*/ __be32 -nfs4_check_open_reclaim(clientid_t *clid, - struct nfsd4_compound_state *cstate, - struct nfsd_net *nn) +nfs4_check_open_reclaim(struct nfs4_client *clp) { - __be32 status; - - /* find clientid in conf_id_hashtbl */ - status = set_client(clid, cstate, nn); - if (status) - return nfserr_reclaim_bad; - - if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags)) + if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) return nfserr_no_grace; - if (nfsd4_client_record_check(cstate->clp)) + if (nfsd4_client_record_check(clp)) return nfserr_reclaim_bad; return nfs_ok; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 9eae11a9d21c..73deea353169 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -649,8 +649,7 @@ void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *) extern void nfs4_release_reclaim(struct nfsd_net *); extern struct nfs4_client_reclaim *nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn); -extern __be32 nfs4_check_open_reclaim(clientid_t *clid, - struct nfsd4_compound_state *cstate, struct nfsd_net *nn); +extern __be32 nfs4_check_open_reclaim(struct nfs4_client *); extern void nfsd4_probe_callback(struct nfs4_client *clp); extern void nfsd4_probe_callback_sync(struct nfs4_client *clp); extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *); From b267f61182c1054e4b7e6ce91563432e464094e7 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 21 Jan 2021 17:57:45 -0500 Subject: [PATCH 0485/1565] nfsd: cstate->session->se_client -> cstate->clp [ Upstream commit ec59659b4972ec25851aa03b4b5baba6764a62e4 ] I'm not sure why we're writing this out the hard way in so many places. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 5 ++--- fs/nfsd/nfs4state.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5d304b51914a..0acf7af9aab8 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -378,8 +378,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * Before RECLAIM_COMPLETE done, server should deny new lock */ if (nfsd4_has_session(cstate) && - !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, - &cstate->session->se_client->cl_flags) && + !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags) && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) return nfserr_grace; @@ -1881,7 +1880,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp, nfserr = nfs_ok; if (gdp->gd_maxcount != 0) { nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb, - rqstp, cstate->session->se_client, gdp); + rqstp, cstate->clp, gdp); } gdp->gd_notify_types &= ops->notify_types; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4da8467a3570..c2eba93ab513 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3923,6 +3923,7 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; + struct nfs4_client *clp = cstate->clp; __be32 status = 0; if (rc->rca_one_fs) { @@ -3936,12 +3937,11 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, } status = nfserr_complete_already; - if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, - &cstate->session->se_client->cl_flags)) + if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) goto out; status = nfserr_stale_clientid; - if (is_client_expired(cstate->session->se_client)) + if (is_client_expired(clp)) /* * The following error isn't really legal. * But we only get here if the client just explicitly @@ -3952,8 +3952,8 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, goto out; status = nfs_ok; - nfsd4_client_record_create(cstate->session->se_client); - inc_reclaim_complete(cstate->session->se_client); + nfsd4_client_record_create(clp); + inc_reclaim_complete(clp); out: return status; } @@ -5938,7 +5938,7 @@ nfsd4_test_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct nfsd4_test_stateid_id *stateid; - struct nfs4_client *cl = cstate->session->se_client; + struct nfs4_client *cl = cstate->clp; list_for_each_entry(stateid, &test_stateid->ts_stateid_list, ts_id_list) stateid->ts_id_status = @@ -5984,7 +5984,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stateid_t *stateid = &free_stateid->fr_stateid; struct nfs4_stid *s; struct nfs4_delegation *dp; - struct nfs4_client *cl = cstate->session->se_client; + struct nfs4_client *cl = cstate->clp; __be32 ret = nfserr_bad_stateid; spin_lock(&cl->cl_lock); @@ -6716,7 +6716,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (nfsd4_has_session(cstate)) /* See rfc 5661 18.10.3: given clientid is ignored: */ memcpy(&lock->lk_new_clientid, - &cstate->session->se_client->cl_clientid, + &cstate->clp->cl_clientid, sizeof(clientid_t)); /* validate and update open stateid and open seqid */ From 5a0b45626fc199bc27858e9e4333a829c80fb875 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Thu, 28 Jan 2021 01:42:26 -0500 Subject: [PATCH 0486/1565] NFSv4_2: SSC helper should use its own config. [ Upstream commit 02591f9febd5f69bb4c266a4abf899c4cf21964f ] Currently NFSv4_2 SSC helper, nfs_ssc, incorrectly uses GRACE_PERIOD as its config. Fix by adding new config NFS_V4_2_SSC_HELPER which depends on NFS_V4_2 and is automatically selected when NFSD_V4 is enabled. Also removed the file name from a comment in nfs_ssc.c. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/Kconfig | 4 ++++ fs/nfs/nfs4file.c | 4 ++++ fs/nfs/super.c | 12 ++++++++++++ fs/nfs_common/Makefile | 2 +- fs/nfs_common/nfs_ssc.c | 2 -- fs/nfsd/Kconfig | 1 + 6 files changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/Kconfig b/fs/Kconfig index da524c4d7b7e..462253ae483a 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -333,6 +333,10 @@ config NFS_COMMON depends on NFSD || NFS_FS || LOCKD default y +config NFS_V4_2_SSC_HELPER + tristate + default y if NFS_V4=y || NFS_FS=y + source "net/sunrpc/Kconfig" source "fs/ceph/Kconfig" source "fs/cifs/Kconfig" diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 70cd0d764c44..5ad57ad89fb1 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -430,7 +430,9 @@ static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = { */ void nfs42_ssc_register_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs42_ssc_register(&nfs4_ssc_clnt_ops_tbl); +#endif } /** @@ -441,7 +443,9 @@ void nfs42_ssc_register_ops(void) */ void nfs42_ssc_unregister_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs42_ssc_unregister(&nfs4_ssc_clnt_ops_tbl); +#endif } #endif /* CONFIG_NFS_V4_2 */ diff --git a/fs/nfs/super.c b/fs/nfs/super.c index b3fcc27b9564..7179d59d73ca 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -86,9 +86,11 @@ const struct super_operations nfs_sops = { }; EXPORT_SYMBOL_GPL(nfs_sops); +#ifdef CONFIG_NFS_V4_2 static const struct nfs_ssc_client_ops nfs_ssc_clnt_ops_tbl = { .sco_sb_deactive = nfs_sb_deactive, }; +#endif #if IS_ENABLED(CONFIG_NFS_V4) static int __init register_nfs4_fs(void) @@ -111,15 +113,21 @@ static void unregister_nfs4_fs(void) } #endif +#ifdef CONFIG_NFS_V4_2 static void nfs_ssc_register_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs_ssc_register(&nfs_ssc_clnt_ops_tbl); +#endif } static void nfs_ssc_unregister_ops(void) { +#ifdef CONFIG_NFSD_V4 nfs_ssc_unregister(&nfs_ssc_clnt_ops_tbl); +#endif } +#endif /* CONFIG_NFS_V4_2 */ static struct shrinker acl_shrinker = { .count_objects = nfs_access_cache_count, @@ -148,7 +156,9 @@ int __init register_nfs_fs(void) ret = register_shrinker(&acl_shrinker); if (ret < 0) goto error_3; +#ifdef CONFIG_NFS_V4_2 nfs_ssc_register_ops(); +#endif return 0; error_3: nfs_unregister_sysctl(); @@ -168,7 +178,9 @@ void __exit unregister_nfs_fs(void) unregister_shrinker(&acl_shrinker); nfs_unregister_sysctl(); unregister_nfs4_fs(); +#ifdef CONFIG_NFS_V4_2 nfs_ssc_unregister_ops(); +#endif unregister_filesystem(&nfs_fs_type); } diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile index fa82f5aaa6d9..119c75ab9fd0 100644 --- a/fs/nfs_common/Makefile +++ b/fs/nfs_common/Makefile @@ -7,4 +7,4 @@ obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o nfs_acl-objs := nfsacl.o obj-$(CONFIG_GRACE_PERIOD) += grace.o -obj-$(CONFIG_GRACE_PERIOD) += nfs_ssc.o +obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o diff --git a/fs/nfs_common/nfs_ssc.c b/fs/nfs_common/nfs_ssc.c index f43bbb373913..7c1509e968c8 100644 --- a/fs/nfs_common/nfs_ssc.c +++ b/fs/nfs_common/nfs_ssc.c @@ -1,7 +1,5 @@ // SPDX-License-Identifier: GPL-2.0-only /* - * fs/nfs_common/nfs_ssc_comm.c - * * Helper for knfsd's SSC to access ops in NFS client modules * * Author: Dai Ngo diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 248f1459c039..d6cff5fbe705 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -77,6 +77,7 @@ config NFSD_V4 select CRYPTO_MD5 select CRYPTO_SHA256 select GRACE_PERIOD + select NFS_V4_2_SSC_HELPER if NFS_V4_2 help This option enables support in your system's NFS server for version 4 of the NFS protocol (RFC 3530). From 0220d51186482f54199dbb3349b951235c6bca69 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 29 Jan 2021 14:26:29 -0500 Subject: [PATCH 0487/1565] nfs: use change attribute for NFS re-exports [ Upstream commit 3cc55f4434b421d37300aa9a167ace7d60b45ccf ] When exporting NFS, we may as well use the real change attribute returned by the original server instead of faking up a change attribute from the ctime. Note we can't do that by setting I_VERSION--that would also turn on the logic in iversion.h which treats the lower bit specially, and that doesn't make sense for NFS. So instead we define a new export operation for filesystems like NFS that want to manage the change attribute themselves. Signed-off-by: J. Bruce Fields Reviewed-by: Christoph Hellwig Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 18 ++++++++++++++++++ fs/nfsd/nfsfh.h | 5 ++++- include/linux/exportfs.h | 1 + 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 7412bb164fa7..f2b34cfe286c 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -167,10 +167,28 @@ nfs_get_parent(struct dentry *dentry) return parent; } +static u64 nfs_fetch_iversion(struct inode *inode) +{ + struct nfs_server *server = NFS_SERVER(inode); + + /* Is this the right call?: */ + nfs_revalidate_inode(server, inode); + /* + * Also, note we're ignoring any returned error. That seems to be + * the practice for cache consistency information elsewhere in + * the server, but I'm not sure why. + */ + if (server->nfs_client->rpc_ops->version >= 4) + return inode_peek_iversion_raw(inode); + else + return time_to_chattr(&inode->i_ctime); +} + const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, + .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| EXPORT_OP_NOATOMIC_ATTR, diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index cb20c2cd3469..f58933519f38 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -12,6 +12,7 @@ #include #include #include +#include static inline __u32 ino_t_to_u32(ino_t ino) { @@ -264,7 +265,9 @@ fh_clear_wcc(struct svc_fh *fhp) static inline u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) { - if (IS_I_VERSION(inode)) { + if (inode->i_sb->s_export_op->fetch_iversion) + return inode->i_sb->s_export_op->fetch_iversion(inode); + else if (IS_I_VERSION(inode)) { u64 chattr; chattr = stat->ctime.tv_sec; diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 9f4d4bcbf251..fe848901fcc3 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,6 +213,7 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); + u64 (*fetch_iversion)(struct inode *); #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ From c1fe2bb305a23b4b9060b44b44ac9d34b8750e3b Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 29 Jan 2021 14:27:01 -0500 Subject: [PATCH 0488/1565] nfsd: skip some unnecessary stats in the v4 case [ Upstream commit 428a23d2bf0ca8fd4d364a464c3e468f0e81671e ] In the typical case of v4 and an i_version-supporting filesystem, we can skip a stat which is only required to fake up a change attribute from ctime. Signed-off-by: J. Bruce Fields Reviewed-by: Christoph Hellwig Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 44 +++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 00a96054280a..9d9a01ce0b27 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -364,6 +364,11 @@ encode_wcc_data(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_post_op_attr(rqstp, p, fhp); } +static bool fs_supports_change_attribute(struct super_block *sb) +{ + return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; +} + /* * Fill in the pre_op attr for the wcc data */ @@ -372,24 +377,26 @@ void fill_pre_wcc(struct svc_fh *fhp) struct inode *inode; struct kstat stat; bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - __be32 err; if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); - err = fh_getattr(fhp, &stat); - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; + if (fs_supports_change_attribute(inode->i_sb) || !v4) { + __be32 err = fh_getattr(fhp, &stat); + + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; + } + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; } if (v4) fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; fhp->fh_pre_saved = true; } @@ -400,7 +407,6 @@ void fill_post_wcc(struct svc_fh *fhp) { bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); struct inode *inode = d_inode(fhp->fh_dentry); - __be32 err; if (fhp->fh_no_wcc) return; @@ -408,12 +414,16 @@ void fill_post_wcc(struct svc_fh *fhp) if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n"); - err = fh_getattr(fhp, &fhp->fh_post_attr); - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } else - fhp->fh_post_saved = true; + fhp->fh_post_saved = true; + + if (fs_supports_change_attribute(inode->i_sb) || !v4) { + __be32 err = fh_getattr(fhp, &fhp->fh_post_attr); + + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } + } if (v4) fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, inode); From 51f620fcc419913315623e81b69a4f2bc59100a8 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Sat, 19 Dec 2020 20:46:08 -0800 Subject: [PATCH 0489/1565] inotify, memcg: account inotify instances to kmemcg [ Upstream commit ac7b79fd190b02e7151bc7d2b9da692f537657f3 ] Currently the fs sysctl inotify/max_user_instances is used to limit the number of inotify instances on the system. For systems running multiple workloads, the per-user namespace sysctl max_inotify_instances can be used to further partition inotify instances. However there is no easy way to set a sensible system level max limit on inotify instances and further partition it between the workloads. It is much easier to charge the underlying resource (i.e. memory) behind the inotify instances to the memcg of the workload and let their memory limits limit the number of inotify instances they can create. With inotify instances charged to memcg, the admin can simply set max_user_instances to INT_MAX and let the memcg limits of the jobs limit their inotify instances. Link: https://lore.kernel.org/r/20201220044608.1258123-1-shakeelb@google.com Reviewed-by: Amir Goldstein Signed-off-by: Shakeel Butt Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 2 +- fs/notify/group.c | 25 ++++++++++++++++++++----- fs/notify/inotify/inotify_user.c | 4 ++-- include/linux/fsnotify_backend.h | 1 + 4 files changed, 24 insertions(+), 8 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 84de9f97bbc0..3e905b2e1b9c 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -976,7 +976,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) f_flags |= O_NONBLOCK; /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ - group = fsnotify_alloc_group(&fanotify_fsnotify_ops); + group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); if (IS_ERR(group)) { free_uid(user); return PTR_ERR(group); diff --git a/fs/notify/group.c b/fs/notify/group.c index a4a4b1c64d32..ffd723ffe46d 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -111,14 +111,12 @@ void fsnotify_put_group(struct fsnotify_group *group) } EXPORT_SYMBOL_GPL(fsnotify_put_group); -/* - * Create a new fsnotify_group and hold a reference for the group returned. - */ -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +static struct fsnotify_group *__fsnotify_alloc_group( + const struct fsnotify_ops *ops, gfp_t gfp) { struct fsnotify_group *group; - group = kzalloc(sizeof(struct fsnotify_group), GFP_KERNEL); + group = kzalloc(sizeof(struct fsnotify_group), gfp); if (!group) return ERR_PTR(-ENOMEM); @@ -139,8 +137,25 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) return group; } + +/* + * Create a new fsnotify_group and hold a reference for the group returned. + */ +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +{ + return __fsnotify_alloc_group(ops, GFP_KERNEL); +} EXPORT_SYMBOL_GPL(fsnotify_alloc_group); +/* + * Create a new fsnotify_group and hold a reference for the group returned. + */ +struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops) +{ + return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT); +} +EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group); + int fsnotify_fasync(int fd, struct file *file, int on) { struct fsnotify_group *group = file->private_data; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index b564a32403aa..ad8fb4bca6dc 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -632,11 +632,11 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) struct fsnotify_group *group; struct inotify_event_info *oevent; - group = fsnotify_alloc_group(&inotify_fsnotify_ops); + group = fsnotify_alloc_user_group(&inotify_fsnotify_ops); if (IS_ERR(group)) return group; - oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL); + oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL_ACCOUNT); if (unlikely(!oevent)) { fsnotify_destroy_group(group); return ERR_PTR(-ENOMEM); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index a2e42d3cd87c..e5409b83e731 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -470,6 +470,7 @@ static inline void fsnotify_update_flags(struct dentry *dentry) /* create a new group */ extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); +extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ From 32edffff869a224f0c4705e4c46a0598e49cf7a6 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 2 Feb 2021 13:13:24 +0100 Subject: [PATCH 0490/1565] module: unexport find_module and module_mutex [ Upstream commit 089049f6c9956c5cf1fc89fe10229c76e99f4bef ] find_module is not used by modular code any more, and random driver code has no business calling it to start with. Reviewed-by: Miroslav Benes Signed-off-by: Christoph Hellwig Signed-off-by: Jessica Yu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/module.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/module.c b/kernel/module.c index 72a5dcdccf7b..c0e51ffe26f0 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -88,7 +88,6 @@ * 3) module_addr_min/module_addr_max. * (delete and add uses RCU list operations). */ DEFINE_MUTEX(module_mutex); -EXPORT_SYMBOL_GPL(module_mutex); static LIST_HEAD(modules); /* Work queue for freeing init sections in success case */ @@ -645,7 +644,6 @@ struct module *find_module(const char *name) module_assert_mutex(); return find_module_all(name, strlen(name), false); } -EXPORT_SYMBOL_GPL(find_module); #ifdef CONFIG_SMP From bef9d8b4f84b6e8482d72096a049139898a2213c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 2 Feb 2021 13:13:25 +0100 Subject: [PATCH 0491/1565] module: use RCU to synchronize find_module [ Upstream commit a006050575745ca2be25118b90f1c37f454ac542 ] Allow for a RCU-sched critical section around find_module, following the lower level find_module_all helper, and switch the two callers outside of module.c to use such a RCU-sched critical section instead of module_mutex. Reviewed-by: Petr Mladek Acked-by: Miroslav Benes Signed-off-by: Christoph Hellwig Signed-off-by: Jessica Yu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/module.h | 2 +- kernel/livepatch/core.c | 5 +++-- kernel/module.c | 1 - kernel/trace/trace_kprobe.c | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index 6264617bab4d..86fae5d1c0e3 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -582,7 +582,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod) return within_module_init(addr, mod) || within_module_core(addr, mod); } -/* Search for module by name: must hold module_mutex. */ +/* Search for module by name: must be in a RCU-sched critical section. */ struct module *find_module(const char *name); struct symsearch { diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index f5faf935c2d8..e660ea4f90a2 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include "core.h" #include "patch.h" @@ -57,7 +58,7 @@ static void klp_find_object_module(struct klp_object *obj) if (!klp_is_module(obj)) return; - mutex_lock(&module_mutex); + rcu_read_lock_sched(); /* * We do not want to block removal of patched modules and therefore * we do not take a reference here. The patches are removed by @@ -74,7 +75,7 @@ static void klp_find_object_module(struct klp_object *obj) if (mod && mod->klp_alive) obj->mod = mod; - mutex_unlock(&module_mutex); + rcu_read_unlock_sched(); } static bool klp_initialized(void) diff --git a/kernel/module.c b/kernel/module.c index c0e51ffe26f0..1f9f6133c30e 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -641,7 +641,6 @@ static struct module *find_module_all(const char *name, size_t len, struct module *find_module(const char *name) { - module_assert_mutex(); return find_module_all(name, strlen(name), false); } diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 718357289899..5453af26ff76 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -124,9 +124,9 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk) if (!p) return true; *p = '\0'; - mutex_lock(&module_mutex); + rcu_read_lock_sched(); ret = !!find_module(tk->symbol); - mutex_unlock(&module_mutex); + rcu_read_unlock_sched(); *p = ':'; return ret; From f8d8568627241ac8501288206d1e620b250bac7c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 2 Feb 2021 13:13:26 +0100 Subject: [PATCH 0492/1565] kallsyms: refactor {,module_}kallsyms_on_each_symbol [ Upstream commit 013c1667cf78c1d847152f7116436d82dcab3db4 ] Require an explicit call to module_kallsyms_on_each_symbol to look for symbols in modules instead of the call from kallsyms_on_each_symbol, and acquire module_mutex inside of module_kallsyms_on_each_symbol instead of leaving that up to the caller. Note that this slightly changes the behavior for the livepatch code in that the symbols from vmlinux are not iterated anymore if objname is set, but that actually is the desired behavior in this case. Reviewed-by: Petr Mladek Acked-by: Miroslav Benes Signed-off-by: Christoph Hellwig Signed-off-by: Jessica Yu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/kallsyms.c | 6 +++++- kernel/livepatch/core.c | 2 -- kernel/module.c | 13 ++++--------- 3 files changed, 9 insertions(+), 12 deletions(-) diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index fe9de067771c..a0d3f0865916 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -177,6 +177,10 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); } +/* + * Iterate over all symbols in vmlinux. For symbols from modules use + * module_kallsyms_on_each_symbol instead. + */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -192,7 +196,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, if (ret != 0) return ret; } - return module_kallsyms_on_each_symbol(fn, data); + return 0; } static unsigned long get_symbol_pos(unsigned long addr, diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index e660ea4f90a2..147ed154ebc7 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -164,12 +164,10 @@ static int klp_find_object_symbol(const char *objname, const char *name, .pos = sympos, }; - mutex_lock(&module_mutex); if (objname) module_kallsyms_on_each_symbol(klp_find_callback, &args); else kallsyms_on_each_symbol(klp_find_callback, &args); - mutex_unlock(&module_mutex); /* * Ensure an address was found. If sympos is 0, ensure symbol is unique; diff --git a/kernel/module.c b/kernel/module.c index 1f9f6133c30e..330387d63c63 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -255,11 +255,6 @@ static void mod_update_bounds(struct module *mod) struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */ #endif /* CONFIG_KGDB_KDB */ -static void module_assert_mutex(void) -{ - lockdep_assert_held(&module_mutex); -} - static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP @@ -4457,8 +4452,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, unsigned int i; int ret; - module_assert_mutex(); - + mutex_lock(&module_mutex); list_for_each_entry(mod, &modules, list) { /* We hold module_mutex: no need for rcu_dereference_sched */ struct mod_kallsyms *kallsyms = mod->kallsyms; @@ -4474,10 +4468,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, ret = fn(data, kallsyms_symbol_name(kallsyms, i), mod, kallsyms_symbol_value(sym)); if (ret != 0) - return ret; + break; } } - return 0; + mutex_unlock(&module_mutex); + return ret; } #endif /* CONFIG_KALLSYMS */ From 666a413295929ca30703ac162886f205ff2ae6ad Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 2 Feb 2021 13:13:27 +0100 Subject: [PATCH 0493/1565] kallsyms: only build {,module_}kallsyms_on_each_symbol when required [ Upstream commit 3e3552056ab42f883d7723eeb42fed712b66bacf ] kallsyms_on_each_symbol and module_kallsyms_on_each_symbol are only used by the livepatching code, so don't build them if livepatching is not enabled. Reviewed-by: Miroslav Benes Signed-off-by: Christoph Hellwig Signed-off-by: Jessica Yu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/kallsyms.h | 17 ++++------------- include/linux/module.h | 16 ++++------------ kernel/kallsyms.c | 2 ++ kernel/module.c | 2 ++ 4 files changed, 12 insertions(+), 25 deletions(-) diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 481273f0c72d..465060acc981 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -71,15 +71,14 @@ static inline void *dereference_symbol_descriptor(void *ptr) return ptr; } -#ifdef CONFIG_KALLSYMS -/* Lookup the address for a symbol. Returns 0 if not found. */ -unsigned long kallsyms_lookup_name(const char *name); - -/* Call a function on each kallsyms symbol in the core kernel */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data); +#ifdef CONFIG_KALLSYMS +/* Lookup the address for a symbol. Returns 0 if not found. */ +unsigned long kallsyms_lookup_name(const char *name); + extern int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset); @@ -108,14 +107,6 @@ static inline unsigned long kallsyms_lookup_name(const char *name) return 0; } -static inline int kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, - unsigned long), - void *data) -{ - return 0; -} - static inline int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset) diff --git a/include/linux/module.h b/include/linux/module.h index 86fae5d1c0e3..59cbd8e1be2d 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -604,10 +604,6 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, /* Look for this name: can be of form module:name. */ unsigned long module_kallsyms_lookup_name(const char *name); -int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, unsigned long), - void *data); - extern void __noreturn __module_put_and_exit(struct module *mod, long code); #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code) @@ -791,14 +787,6 @@ static inline unsigned long module_kallsyms_lookup_name(const char *name) return 0; } -static inline int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, - unsigned long), - void *data) -{ - return 0; -} - static inline int register_module_notifier(struct notifier_block *nb) { /* no events will happen anyway, so this can always succeed */ @@ -887,4 +875,8 @@ static inline bool module_sig_ok(struct module *module) } #endif /* CONFIG_MODULE_SIG */ +int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, + struct module *, unsigned long), + void *data); + #endif /* _LINUX_MODULE_H */ diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index a0d3f0865916..8043a90aa50e 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -177,6 +177,7 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); } +#ifdef CONFIG_LIVEPATCH /* * Iterate over all symbols in vmlinux. For symbols from modules use * module_kallsyms_on_each_symbol instead. @@ -198,6 +199,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, } return 0; } +#endif /* CONFIG_LIVEPATCH */ static unsigned long get_symbol_pos(unsigned long addr, unsigned long *symbolsize, diff --git a/kernel/module.c b/kernel/module.c index 330387d63c63..949d09d2d829 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -4444,6 +4444,7 @@ unsigned long module_kallsyms_lookup_name(const char *name) return ret; } +#ifdef CONFIG_LIVEPATCH int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -4474,6 +4475,7 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, mutex_unlock(&module_mutex); return ret; } +#endif /* CONFIG_LIVEPATCH */ #endif /* CONFIG_KALLSYMS */ /* Maximum number of characters written by module_flags() */ From b0fa673c8c248ec2a0b6e563fb586df355b4f427 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 21 Jan 2021 14:19:22 +0100 Subject: [PATCH 0494/1565] fs: add file and path permissions helpers [ Upstream commit 02f92b3868a1b34ab98464e76b0e4e060474ba10 ] Add two simple helpers to check permissions on a file and path respectively and convert over some callers. It simplifies quite a few codepaths and also reduces the churn in later patches quite a bit. Christoph also correctly points out that this makes codepaths (e.g. ioctls) way easier to follow that would otherwise have to do more complex argument passing than necessary. Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com Cc: David Howells Cc: Al Viro Cc: linux-fsdevel@vger.kernel.org Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Reviewed-by: James Morris Signed-off-by: Christian Brauner Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/init.c | 6 +++--- fs/notify/fanotify/fanotify_user.c | 2 +- fs/notify/inotify/inotify_user.c | 2 +- fs/open.c | 6 +++--- fs/udf/file.c | 2 +- fs/verity/enable.c | 2 +- include/linux/fs.h | 8 ++++++++ kernel/bpf/inode.c | 2 +- kernel/sys.c | 2 +- mm/madvise.c | 2 +- mm/memcontrol.c | 2 +- mm/mincore.c | 2 +- net/unix/af_unix.c | 2 +- 13 files changed, 24 insertions(+), 16 deletions(-) diff --git a/fs/init.c b/fs/init.c index e9c320a48cf1..02723bea8499 100644 --- a/fs/init.c +++ b/fs/init.c @@ -49,7 +49,7 @@ int __init init_chdir(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &path); path_put(&path); @@ -64,7 +64,7 @@ int __init init_chroot(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; error = -EPERM; @@ -118,7 +118,7 @@ int __init init_eaccess(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW, &path); if (error) return error; - error = inode_permission(d_inode(path.dentry), MAY_ACCESS); + error = path_permission(&path, MAY_ACCESS); path_put(&path); return error; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3e905b2e1b9c..829ead2792df 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -702,7 +702,7 @@ static int fanotify_find_path(int dfd, const char __user *filename, } /* you can only watch an inode if you have read permissions on it */ - ret = inode_permission(path->dentry->d_inode, MAY_READ); + ret = path_permission(path, MAY_READ); if (ret) { path_put(path); goto out; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index ad8fb4bca6dc..82fc0cf86a7c 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -352,7 +352,7 @@ static int inotify_find_inode(const char __user *dirname, struct path *path, if (error) return error; /* you can only watch an inode if you have read permissions on it */ - error = inode_permission(path->dentry->d_inode, MAY_READ); + error = path_permission(path, MAY_READ); if (error) { path_put(path); return error; diff --git a/fs/open.c b/fs/open.c index 48933cbb7539..9f56ebacfbef 100644 --- a/fs/open.c +++ b/fs/open.c @@ -492,7 +492,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename) if (error) goto out; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; @@ -521,7 +521,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) if (!d_can_lookup(f.file->f_path.dentry)) goto out_putf; - error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR); + error = file_permission(f.file, MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &f.file->f_path); out_putf: @@ -540,7 +540,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename) if (error) goto out; - error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); + error = path_permission(&path, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; diff --git a/fs/udf/file.c b/fs/udf/file.c index e283a62701b8..25f7c915f22b 100644 --- a/fs/udf/file.c +++ b/fs/udf/file.c @@ -181,7 +181,7 @@ long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) long old_block, new_block; int result; - if (inode_permission(inode, MAY_READ) != 0) { + if (file_permission(filp, MAY_READ) != 0) { udf_debug("no permission to access inode %lu\n", inode->i_ino); return -EPERM; } diff --git a/fs/verity/enable.c b/fs/verity/enable.c index 5ceae66e1ae0..29becb66d0d8 100644 --- a/fs/verity/enable.c +++ b/fs/verity/enable.c @@ -369,7 +369,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg) * has verity enabled, and to stabilize the data being hashed. */ - err = inode_permission(inode, MAY_WRITE); + err = file_permission(filp, MAY_WRITE); if (err) return err; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6de70634e547..0974e8160f50 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2824,6 +2824,14 @@ static inline int bmap(struct inode *inode, sector_t *block) extern int notify_change(struct dentry *, struct iattr *, struct inode **); extern int inode_permission(struct inode *, int); extern int generic_permission(struct inode *, int); +static inline int file_permission(struct file *file, int mask) +{ + return inode_permission(file_inode(file), mask); +} +static inline int path_permission(const struct path *path, int mask) +{ + return inode_permission(d_inode(path->dentry), mask); +} extern int __check_sticky(struct inode *dir, struct inode *inode); static inline bool execute_ok(struct inode *inode) diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 6b14b4c4068c..5966013bc788 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -507,7 +507,7 @@ static void *bpf_obj_do_get(const char __user *pathname, return ERR_PTR(ret); inode = d_backing_inode(path.dentry); - ret = inode_permission(inode, ACC_MODE(flags)); + ret = path_permission(&path, ACC_MODE(flags)); if (ret) goto out; diff --git a/kernel/sys.c b/kernel/sys.c index efc213ae4c5a..7a2cfb57fa9e 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1873,7 +1873,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path)) goto exit; - err = inode_permission(inode, MAY_EXEC); + err = file_permission(exe.file, MAY_EXEC); if (err) goto exit; diff --git a/mm/madvise.c b/mm/madvise.c index f71fc88f0b33..a63aa04ec7fa 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -543,7 +543,7 @@ static inline bool can_do_pageout(struct vm_area_struct *vma) * opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; + file_permission(vma->vm_file, MAY_WRITE) == 0; } static long madvise_pageout(struct vm_area_struct *vma, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ddc8ed096dec..186ae9dba0fd 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4918,7 +4918,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of, /* the process need read permission on control file */ /* AV: shouldn't we check that it's been opened for read instead? */ - ret = inode_permission(file_inode(cfile.file), MAY_READ); + ret = file_permission(cfile.file, MAY_READ); if (ret < 0) goto out_put_cfile; diff --git a/mm/mincore.c b/mm/mincore.c index 02db1a834021..7bdb4673f776 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -167,7 +167,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma) * mappings, which opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; + file_permission(vma->vm_file, MAY_WRITE) == 0; } static const struct mm_walk_ops mincore_walk_ops = { diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3ab726a668e8..405bf3e6eb79 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -959,7 +959,7 @@ static struct sock *unix_find_other(struct net *net, if (err) goto fail; inode = d_backing_inode(path.dentry); - err = inode_permission(inode, MAY_WRITE); + err = path_permission(&path, MAY_WRITE); if (err) goto put_fail; From e864d4d834f820c8be01c9d048cdab6051fe3932 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 21 Jan 2021 14:19:32 +0100 Subject: [PATCH 0495/1565] namei: introduce struct renamedata [ Upstream commit 9fe61450972d3900bffb1dc26a17ebb9cdd92db2 ] In order to handle idmapped mounts we will extend the vfs rename helper to take two new arguments in follow up patches. Since this operations already takes a bunch of arguments add a simple struct renamedata and make the current helper use it before we extend it. Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com Cc: Christoph Hellwig Cc: David Howells Cc: Al Viro Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/cachefiles/namei.c | 9 +++++++-- fs/ecryptfs/inode.c | 10 +++++++--- fs/namei.c | 21 +++++++++++++++------ fs/nfsd/vfs.c | 8 +++++++- fs/overlayfs/overlayfs.h | 9 ++++++++- include/linux/fs.h | 12 +++++++++++- 6 files changed, 55 insertions(+), 14 deletions(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index ecc8ecbbfa5a..7b987de0babe 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -412,9 +412,14 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache, if (ret < 0) { cachefiles_io_error(cache, "Rename security error %d", ret); } else { + struct renamedata rd = { + .old_dir = d_inode(dir), + .old_dentry = rep, + .new_dir = d_inode(cache->graveyard), + .new_dentry = grave, + }; trace_cachefiles_rename(object, rep, grave, why); - ret = vfs_rename(d_inode(dir), rep, - d_inode(cache->graveyard), grave, NULL, 0); + ret = vfs_rename(&rd); if (ret != 0 && ret != -ENOMEM) cachefiles_io_error(cache, "Rename failed with error %d", ret); diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index c867a0d62f36..1dbe0c3ff38e 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -598,6 +598,7 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *lower_new_dir_dentry; struct dentry *trap; struct inode *target_inode; + struct renamedata rd = {}; if (flags) return -EINVAL; @@ -627,9 +628,12 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, rc = -ENOTEMPTY; goto out_lock; } - rc = vfs_rename(d_inode(lower_old_dir_dentry), lower_old_dentry, - d_inode(lower_new_dir_dentry), lower_new_dentry, - NULL, 0); + + rd.old_dir = d_inode(lower_old_dir_dentry); + rd.old_dentry = lower_old_dentry; + rd.new_dir = d_inode(lower_new_dir_dentry); + rd.new_dentry = lower_new_dentry; + rc = vfs_rename(&rd); if (rc) goto out_lock; if (target_inode) diff --git a/fs/namei.c b/fs/namei.c index cb37d7c477e0..72521a614514 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4277,11 +4277,14 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname * ->i_mutex on parents, which works but leads to some truly excessive * locking]. */ -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry, - struct inode **delegated_inode, unsigned int flags) +int vfs_rename(struct renamedata *rd) { int error; + struct inode *old_dir = rd->old_dir, *new_dir = rd->new_dir; + struct dentry *old_dentry = rd->old_dentry; + struct dentry *new_dentry = rd->new_dentry; + struct inode **delegated_inode = rd->delegated_inode; + unsigned int flags = rd->flags; bool is_dir = d_is_dir(old_dentry); struct inode *source = old_dentry->d_inode; struct inode *target = new_dentry->d_inode; @@ -4429,6 +4432,7 @@ EXPORT_SYMBOL(vfs_rename); int do_renameat2(int olddfd, struct filename *from, int newdfd, struct filename *to, unsigned int flags) { + struct renamedata rd; struct dentry *old_dentry, *new_dentry; struct dentry *trap; struct path old_path, new_path; @@ -4532,9 +4536,14 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, &new_path, new_dentry, flags); if (error) goto exit5; - error = vfs_rename(old_path.dentry->d_inode, old_dentry, - new_path.dentry->d_inode, new_dentry, - &delegated_inode, flags); + + rd.old_dir = old_path.dentry->d_inode; + rd.old_dentry = old_dentry; + rd.new_dir = new_path.dentry->d_inode; + rd.new_dentry = new_dentry; + rd.delegated_inode = &delegated_inode; + rd.flags = flags; + error = vfs_rename(&rd); exit5: dput(new_dentry); exit4: diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3e30788e0046..d12c3e71ca10 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1812,7 +1812,13 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, close_cached = true; goto out_dput_old; } else { - host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0); + struct renamedata rd = { + .old_dir = fdir, + .old_dentry = odentry, + .new_dir = tdir, + .new_dentry = ndentry, + }; + host_err = vfs_rename(&rd); if (!host_err) { host_err = commit_metadata(tfhp); if (!host_err) diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 26f91868fbda..87b7a4a74f4e 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -212,9 +212,16 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry, unsigned int flags) { int err; + struct renamedata rd = { + .old_dir = olddir, + .old_dentry = olddentry, + .new_dir = newdir, + .new_dentry = newdentry, + .flags = flags, + }; pr_debug("rename(%pd2, %pd2, 0x%x)\n", olddentry, newdentry, flags); - err = vfs_rename(olddir, olddentry, newdir, newdentry, NULL, flags); + err = vfs_rename(&rd); if (err) { pr_debug("...rename(%pd2, %pd2, ...) = %i\n", olddentry, newdentry, err); diff --git a/include/linux/fs.h b/include/linux/fs.h index 0974e8160f50..cc3b6ddf5822 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1780,7 +1780,17 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *); extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **); extern int vfs_rmdir(struct inode *, struct dentry *); extern int vfs_unlink(struct inode *, struct dentry *, struct inode **); -extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int); + +struct renamedata { + struct inode *old_dir; + struct dentry *old_dentry; + struct inode *new_dir; + struct dentry *new_dentry; + struct inode **delegated_inode; + unsigned int flags; +} __randomize_layout; + +int vfs_rename(struct renamedata *); static inline int vfs_whiteout(struct inode *dir, struct dentry *dentry) { From 48aadfa75b619611206351754c318aaeb71aa45c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 27 Oct 2020 15:53:42 -0400 Subject: [PATCH 0496/1565] NFSD: Extract the svcxdr_init_encode() helper [ Upstream commit bddfdbcddbe267519cd36aeb115fdf8620980111 ] NFSD initializes an encode xdr_stream only after the RPC layer has already inserted the RPC Reply header. Thus it behaves differently than xdr_init_encode does, which assumes the passed-in xdr_buf is entirely devoid of content. nfs4proc.c has this server-side stream initialization helper, but it is visible only to the NFSv4 code. Move this helper to a place that can be accessed by NFSv2 and NFSv3 server XDR functions. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 31 +++-------- fs/nfsd/nfs4state.c | 6 +- fs/nfsd/nfs4xdr.c | 110 ++++++++++++++++++------------------- fs/nfsd/nfssvc.c | 4 +- fs/nfsd/xdr4.h | 2 +- include/linux/sunrpc/svc.h | 25 +++++++++ 6 files changed, 94 insertions(+), 84 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0acf7af9aab8..2fba0808d975 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2256,25 +2256,6 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp) return !(nextd->op_flags & OP_HANDLES_WRONGSEC); } -static void svcxdr_init_encode(struct svc_rqst *rqstp, - struct nfsd4_compoundres *resp) -{ - struct xdr_stream *xdr = &resp->xdr; - struct xdr_buf *buf = &rqstp->rq_res; - struct kvec *head = buf->head; - - xdr->buf = buf; - xdr->iov = head; - xdr->p = head->iov_base + head->iov_len; - xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; - /* Tail and page_len should be zero at this point: */ - buf->len = buf->head[0].iov_len; - xdr_reset_scratch_buffer(xdr); - xdr->page_ptr = buf->pages - 1; - buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages) - - rqstp->rq_auth_slack; -} - #ifdef CONFIG_NFSD_V4_2_INTER_SSC static void check_if_stalefh_allowed(struct nfsd4_compoundargs *args) @@ -2329,10 +2310,14 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); __be32 status; - svcxdr_init_encode(rqstp, resp); - resp->tagp = resp->xdr.p; + resp->xdr = &rqstp->rq_res_stream; + + /* reserve space for: NFS status code */ + xdr_reserve_space(resp->xdr, XDR_UNIT); + + resp->tagp = resp->xdr->p; /* reserve space for: taglen, tag, and opcnt */ - xdr_reserve_space(&resp->xdr, 8 + args->taglen); + xdr_reserve_space(resp->xdr, XDR_UNIT * 2 + args->taglen); resp->taglen = args->taglen; resp->tag = args->tag; resp->rqstp = rqstp; @@ -2438,7 +2423,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) encode_op: if (op->status == nfserr_replay_me) { op->replay = &cstate->replay_owner->so_replay; - nfsd4_encode_replay(&resp->xdr, op); + nfsd4_encode_replay(resp->xdr, op); status = op->status = op->replay->rp_status; } else { nfsd4_encode_operation(resp, op); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c2eba93ab513..914f60cee322 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2925,7 +2925,7 @@ gen_callback(struct nfs4_client *clp, struct nfsd4_setclientid *se, struct svc_r static void nfsd4_store_cache_entry(struct nfsd4_compoundres *resp) { - struct xdr_buf *buf = resp->xdr.buf; + struct xdr_buf *buf = resp->xdr->buf; struct nfsd4_slot *slot = resp->cstate.slot; unsigned int base; @@ -2995,7 +2995,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp, struct nfsd4_sequence *seq) { struct nfsd4_slot *slot = resp->cstate.slot; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; __be32 status; @@ -3740,7 +3740,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_sequence *seq = &u->sequence; struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfsd4_session *session; struct nfs4_client *clp; struct nfsd4_slot *slot; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8d5fdae568ae..262c6fec56aa 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3595,7 +3595,7 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid) static __be32 nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 8); @@ -3608,7 +3608,7 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8); @@ -3625,7 +3625,7 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, static __be32 nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &close->cl_stateid); } @@ -3634,7 +3634,7 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c static __be32 nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE); @@ -3648,7 +3648,7 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ static __be32 nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -3663,7 +3663,7 @@ static __be32 nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr) { struct svc_fh *fhp = getattr->ga_fhp; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry, getattr->ga_bmval, resp->rqstp, 0); @@ -3672,7 +3672,7 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct svc_fh *fhp = *fhpp; unsigned int len; __be32 *p; @@ -3727,7 +3727,7 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld) static __be32 nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; if (!nfserr) nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid); @@ -3740,7 +3740,7 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo static __be32 nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; if (nfserr == nfserr_denied) nfsd4_encode_lock_denied(xdr, &lockt->lt_denied); @@ -3750,7 +3750,7 @@ nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l static __be32 nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &locku->lu_stateid); } @@ -3759,7 +3759,7 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l static __be32 nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -3773,7 +3773,7 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li static __be32 nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid); @@ -3867,7 +3867,7 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op static __be32 nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid); } @@ -3875,7 +3875,7 @@ nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct static __be32 nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &od->od_stateid); } @@ -3885,7 +3885,7 @@ static __be32 nfsd4_encode_splice_read( struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct xdr_buf *buf = xdr->buf; int status, space_left; u32 eof; @@ -3951,7 +3951,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; u32 eof; int starting_len = xdr->buf->len - 8; __be32 nfserr; @@ -3990,7 +3990,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { unsigned long maxcount; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file; int starting_len = xdr->buf->len; __be32 *p; @@ -4004,7 +4004,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)); return nfserr_resource; } - if (resp->xdr.buf->page_len && + if (resp->xdr->buf->page_len && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { WARN_ON_ONCE(1); return nfserr_serverfault; @@ -4034,7 +4034,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd int maxcount; __be32 wire_count; int zero = 0; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; int status; __be32 *p; @@ -4086,7 +4086,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 int bytes_left; loff_t offset; __be64 wire_offset; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; int starting_len = xdr->buf->len; __be32 *p; @@ -4097,8 +4097,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 /* XXX: Following NFSv3, we ignore the READDIR verifier for now. */ *p++ = cpu_to_be32(0); *p++ = cpu_to_be32(0); - resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p) - - (char *)resp->xdr.buf->head[0].iov_base; + xdr->buf->head[0].iov_len = (char *)xdr->p - + (char *)xdr->buf->head[0].iov_base; /* * Number of bytes left for directory entries allowing for the @@ -4173,7 +4173,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -4186,7 +4186,7 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ static __be32 nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 40); @@ -4269,7 +4269,7 @@ static __be32 nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_secinfo *secinfo) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->si_exp); } @@ -4278,7 +4278,7 @@ static __be32 nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_secinfo_no_name *secinfo) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->sin_exp); } @@ -4290,7 +4290,7 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 16); @@ -4314,7 +4314,7 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 static __be32 nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; if (!nfserr) { @@ -4338,7 +4338,7 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n static __be32 nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 16); @@ -4355,7 +4355,7 @@ static __be32 nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_exchange_id *exid) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; char *major_id; char *server_scope; @@ -4433,7 +4433,7 @@ static __be32 nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create_session *sess) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 24); @@ -4486,7 +4486,7 @@ static __be32 nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_sequence *seq) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20); @@ -4509,7 +4509,7 @@ static __be32 nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_test_stateid *test_stateid) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfsd4_test_stateid_id *stateid, *next; __be32 *p; @@ -4530,7 +4530,7 @@ static __be32 nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getdeviceinfo *gdev) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; u32 starting_len = xdr->buf->len, needed_len; __be32 *p; @@ -4583,7 +4583,7 @@ static __be32 nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutget *lgp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; __be32 *p; @@ -4610,7 +4610,7 @@ static __be32 nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutcommit *lcp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 4); @@ -4631,7 +4631,7 @@ static __be32 nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_layoutreturn *lrp) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 4); @@ -4649,7 +4649,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write, bool sync) { __be32 *p; - p = xdr_reserve_space(&resp->xdr, 4); + p = xdr_reserve_space(resp->xdr, 4); if (!p) return nfserr_resource; @@ -4658,11 +4658,11 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, else { __be32 nfserr; *p++ = cpu_to_be32(1); - nfserr = nfsd4_encode_stateid(&resp->xdr, &write->cb_stateid); + nfserr = nfsd4_encode_stateid(resp->xdr, &write->cb_stateid); if (nfserr) return nfserr; } - p = xdr_reserve_space(&resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); + p = xdr_reserve_space(resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); if (!p) return nfserr_resource; @@ -4676,7 +4676,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, static __be32 nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfs42_netaddr *addr; __be32 *p; @@ -4724,7 +4724,7 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, if (nfserr) return nfserr; - p = xdr_reserve_space(&resp->xdr, 4 + 4); + p = xdr_reserve_space(resp->xdr, 4 + 4); *p++ = xdr_one; /* cr_consecutive */ *p++ = cpu_to_be32(copy->cp_synchronous); return 0; @@ -4734,7 +4734,7 @@ static __be32 nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_offload_status *os) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 8 + 4); @@ -4751,7 +4751,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, unsigned long *maxcount, u32 *eof, loff_t *pos) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file = read->rd_nf->nf_file; int starting_len = xdr->buf->len; loff_t hole_pos; @@ -4810,7 +4810,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, count = data_pos - read->rd_offset; /* Content type, offset, byte count */ - p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8); + p = xdr_reserve_space(resp->xdr, 4 + 8 + 8); if (!p) return nfserr_resource; @@ -4828,7 +4828,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { unsigned long maxcount, count; - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct file *file; int starting_len = xdr->buf->len; int last_segment = xdr->buf->len; @@ -4899,7 +4899,7 @@ static __be32 nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_copy_notify *cn) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; if (nfserr) @@ -4935,7 +4935,7 @@ nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, { __be32 *p; - p = xdr_reserve_space(&resp->xdr, 4 + 8); + p = xdr_reserve_space(resp->xdr, 4 + 8); *p++ = cpu_to_be32(seek->seek_eof); p = xdr_encode_hyper(p, seek->seek_pos); @@ -4996,7 +4996,7 @@ static __be32 nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getxattr *getxattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p, err; p = xdr_reserve_space(xdr, 4); @@ -5020,7 +5020,7 @@ static __be32 nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setxattr *setxattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -5061,7 +5061,7 @@ static __be32 nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_listxattrs *listxattrs) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; u32 cookie_offset, count_offset, eof; u32 left, xdrleft, slen, count; u32 xdrlen, offset; @@ -5172,7 +5172,7 @@ static __be32 nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_removexattr *removexattr) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -5312,7 +5312,7 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize) void nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) { - struct xdr_stream *xdr = &resp->xdr; + struct xdr_stream *xdr = resp->xdr; struct nfs4_stateowner *so = resp->cstate.replay_owner; struct svc_rqst *rqstp = resp->rqstp; const struct nfsd4_operation *opdesc = op->opdesc; @@ -5441,14 +5441,14 @@ int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = resp->xdr.buf; + struct xdr_buf *buf = resp->xdr->buf; WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len); *p = resp->cstate.status; - rqstp->rq_next_page = resp->xdr.page_ptr + 1; + rqstp->rq_next_page = resp->xdr->page_ptr + 1; p = resp->tagp; *p++ = htonl(resp->taglen); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 6c1d70935ea8..0666ef4b87b7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1030,7 +1030,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * NFSv4 does some encoding while processing */ p = resv->iov_base + resv->iov_len; - resv->iov_len += sizeof(__be32); + svcxdr_init_encode(rqstp); *statp = proc->pc_func(rqstp); if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) @@ -1085,7 +1085,7 @@ int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) */ int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) { - return xdr_ressize_check(rqstp, p); + return 1; } int nfsd_pool_stats_open(struct inode *inode, struct file *file) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c300885ae75d..fe540a3415c6 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -698,7 +698,7 @@ struct nfsd4_compoundargs { struct nfsd4_compoundres { /* scratch variables for XDR encode */ - struct xdr_stream xdr; + struct xdr_stream *xdr; struct svc_rqst * rqstp; u32 taglen; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 31ee3b6047c3..e91d51ea028b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -248,6 +248,7 @@ struct svc_rqst { size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; struct xdr_stream rq_arg_stream; + struct xdr_stream rq_res_stream; struct page *rq_scratch_page; struct xdr_buf rq_res; struct page *rq_pages[RPCSVC_MAXPAGES + 1]; @@ -574,4 +575,28 @@ static inline void svcxdr_init_decode(struct svc_rqst *rqstp) xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); } +/** + * svcxdr_init_encode - Prepare an xdr_stream for svc Reply encoding + * @rqstp: controlling server RPC transaction context + * + */ +static inline void svcxdr_init_encode(struct svc_rqst *rqstp) +{ + struct xdr_stream *xdr = &rqstp->rq_res_stream; + struct xdr_buf *buf = &rqstp->rq_res; + struct kvec *resv = buf->head; + + xdr_reset_scratch_buffer(xdr); + + xdr->buf = buf; + xdr->iov = resv; + xdr->p = resv->iov_base + resv->iov_len; + xdr->end = resv->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; + buf->len = resv->iov_len; + xdr->page_ptr = buf->pages - 1; + buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages); + buf->buflen -= rqstp->rq_auth_slack; + xdr->rqst = NULL; +} + #endif /* SUNRPC_SVC_H */ From 0ef12d755c4b0d43c72a3457aa5eb965a66ff722 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 21 Oct 2020 11:58:41 -0400 Subject: [PATCH 0497/1565] NFSD: Update the GETATTR3res encoder to use struct xdr_stream [ Upstream commit 2c42f804d30f6a8d86665eca84071b316821ea08 ] As an additional clean up, some renaming is done to more closely reflect the data type and variable names used in the NFSv3 XDR definition provided in RFC 1813. "attrstat" is an NFSv2 thingie. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 2 +- fs/nfsd/nfs3xdr.c | 95 ++++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/nfsfh.c | 2 +- fs/nfsd/nfsfh.h | 2 +- fs/nfsd/xdr3.h | 2 +- 5 files changed, 91 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 25f31a03c4f1..1c3cf97ed95d 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -741,7 +741,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_GETATTR] = { .pc_func = nfsd3_proc_getattr, .pc_decode = nfs3svc_decode_fhandleargs, - .pc_encode = nfs3svc_encode_attrstatres, + .pc_encode = nfs3svc_encode_getattrres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9d9a01ce0b27..75739861d235 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -20,7 +20,7 @@ /* * Mapping of S_IF* types to NFS file types */ -static u32 nfs3_ftypes[] = { +static const u32 nfs3_ftypes[] = { NF3NON, NF3FIFO, NF3CHR, NF3BAD, NF3DIR, NF3BAD, NF3BLK, NF3BAD, NF3REG, NF3BAD, NF3LNK, NF3BAD, @@ -39,6 +39,15 @@ encode_time3(__be32 *p, struct timespec64 *time) return p; } +static __be32 * +encode_nfstime3(__be32 *p, const struct timespec64 *time) +{ + *p++ = cpu_to_be32((u32)time->tv_sec); + *p++ = cpu_to_be32(time->tv_nsec); + + return p; +} + static bool svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) { @@ -82,6 +91,19 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; } +static bool +svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, sizeof(status)); + if (!p) + return false; + *p = status; + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -253,6 +275,58 @@ svcxdr_decode_devicedata3(struct svc_rqst *rqstp, struct xdr_stream *xdr, svcxdr_decode_specdata3(xdr, args); } +static bool +svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat) +{ + struct user_namespace *userns = nfsd_user_namespace(rqstp); + __be32 *p; + u64 fsid; + + p = xdr_reserve_space(xdr, XDR_UNIT * 21); + if (!p) + return false; + + *p++ = cpu_to_be32(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); + *p++ = cpu_to_be32((u32)(stat->mode & S_IALLUGO)); + *p++ = cpu_to_be32((u32)stat->nlink); + *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); + *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); + if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) + p = xdr_encode_hyper(p, (u64)NFS3_MAXPATHLEN); + else + p = xdr_encode_hyper(p, (u64)stat->size); + + /* used */ + p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); + + /* rdev */ + *p++ = cpu_to_be32((u32)MAJOR(stat->rdev)); + *p++ = cpu_to_be32((u32)MINOR(stat->rdev)); + + switch(fsid_source(fhp)) { + case FSIDSOURCE_FSID: + fsid = (u64)fhp->fh_export->ex_fsid; + break; + case FSIDSOURCE_UUID: + fsid = ((u64 *)fhp->fh_export->ex_uuid)[0]; + fsid ^= ((u64 *)fhp->fh_export->ex_uuid)[1]; + break; + default: + fsid = (u64)huge_encode_dev(fhp->fh_dentry->d_sb->s_dev); + } + p = xdr_encode_hyper(p, fsid); + + /* fileid */ + p = xdr_encode_hyper(p, stat->ino); + + p = encode_nfstime3(p, &stat->atime); + p = encode_nfstime3(p, &stat->mtime); + encode_nfstime3(p, &stat->ctime); + + return true; +} + static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) { u64 f; @@ -713,17 +787,22 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) /* GETATTR */ int -nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status == 0) { - lease_get_mtime(d_inode(resp->fh.fh_dentry), - &resp->stat.mtime); - p = encode_fattr3(rqstp, p, &resp->fh, &resp->stat); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + lease_get_mtime(d_inode(resp->fh.fh_dentry), &resp->stat.mtime); + if (!svcxdr_encode_fattr3(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; } - return xdr_ressize_check(rqstp, p); + + return 1; } /* SETATTR, REMOVE, RMDIR */ diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 4744a276058d..04930056222b 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -710,7 +710,7 @@ char * SVCFH_fmt(struct svc_fh *fhp) return buf; } -enum fsid_source fsid_source(struct svc_fh *fhp) +enum fsid_source fsid_source(const struct svc_fh *fhp) { if (fhp->fh_handle.fh_version != 1) return FSIDSOURCE_DEV; diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index f58933519f38..aff2cda5c6c3 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -82,7 +82,7 @@ enum fsid_source { FSIDSOURCE_FSID, FSIDSOURCE_UUID, }; -extern enum fsid_source fsid_source(struct svc_fh *fhp); +extern enum fsid_source fsid_source(const struct svc_fh *fhp); /* diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 3e1578953f54..0822981c61b9 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -280,7 +280,7 @@ int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); -int nfs3svc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); From fd9e183df6255038f78a40993ba694a6d3b50bdd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 13:56:58 -0400 Subject: [PATCH 0498/1565] NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream [ Upstream commit 907c38227fb57f5c537491ca76dd0b9636029393 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/vfs.h | 2 +- 2 files changed, 46 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 75739861d235..9d6c989df6d8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -383,6 +383,35 @@ encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); } +static bool +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp) +{ + struct dentry *dentry = fhp->fh_dentry; + struct kstat stat; + + /* + * The inode may be NULL if the call failed because of a + * stale file handle. In this case, no attributes are + * returned. + */ + if (fhp->fh_no_wcc || !dentry || !d_really_is_positive(dentry)) + goto no_post_op_attrs; + if (fh_getattr(fhp, &stat) != nfs_ok) + goto no_post_op_attrs; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + lease_get_mtime(d_inode(dentry), &stat.mtime); + if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &stat)) + return false; + + return true; + +no_post_op_attrs: + return xdr_stream_encode_item_absent(xdr) > 0; +} + /* * Encode post-operation attributes. * The inode may be NULL if the call failed because of a stale file @@ -835,13 +864,24 @@ nfs3svc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) - *p++ = htonl(resp->access); - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->access) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; } /* READLINK */ diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index a2442ebe5acf..b21b76e6b9a8 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -152,7 +152,7 @@ static inline void fh_drop_write(struct svc_fh *fh) } } -static inline __be32 fh_getattr(struct svc_fh *fh, struct kstat *stat) +static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) { struct path p = {.mnt = fh->fh_export->ex_path.mnt, .dentry = fh->fh_dentry}; From f74e0652a60b0d98de5c9617ea5d80c940a22c7a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 14:46:58 -0400 Subject: [PATCH 0499/1565] NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream [ Upstream commit 5cf353354af1a385f29dec4609a1532d32c83a25 ] Also, clean up: Rename the encoder function to match the name of the result structure in RFC 1813, consistent with other encoder function names in nfs3xdr.c. "diropres" is an NFSv2 thingie. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 2 +- fs/nfsd/nfs3xdr.c | 43 +++++++++++++++++++++++++++++++++++-------- fs/nfsd/xdr3.h | 2 +- 3 files changed, 37 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1c3cf97ed95d..60e8c25be757 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -763,7 +763,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_LOOKUP] = { .pc_func = nfsd3_proc_lookup, .pc_decode = nfs3svc_decode_diropargs, - .pc_encode = nfs3svc_encode_diropres, + .pc_encode = nfs3svc_encode_lookupres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_diropres), diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 9d6c989df6d8..2bb998b3834b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -104,6 +104,23 @@ svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) return true; } +static bool +svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + u32 size = fhp->fh_handle.fh_size; + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT + size); + if (!p) + return false; + *p++ = cpu_to_be32(size); + if (size) + p[XDR_QUADLEN(size) - 1] = 0; + memcpy(p, &fhp->fh_handle.fh_base, size); + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -846,18 +863,28 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) } /* LOOKUP */ -int -nfs3svc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) +int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status == 0) { - p = encode_fh(p, &resp->fh); - p = encode_post_op_attr(rqstp, p, &resp->fh); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) + return 0; } - p = encode_post_op_attr(rqstp, p, &resp->dirfh); - return xdr_ressize_check(rqstp, p); + + return 1; } /* ACCESS */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 0822981c61b9..7db4ee17aa20 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -282,7 +282,7 @@ int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); -int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); int nfs3svc_encode_readlinkres(struct svc_rqst *, __be32 *); int nfs3svc_encode_readres(struct svc_rqst *, __be32 *); From c3995f8be13a4265223125131a9fd472634ea00f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:12:38 -0400 Subject: [PATCH 0500/1565] NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream [ Upstream commit 70f8e839859a994e324e1d18889f8319bbd5bff9 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 68 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 65 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2bb998b3834b..29b923a497f4 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -400,6 +400,35 @@ encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); } +static bool +svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 6); + if (!p) + return false; + p = xdr_encode_hyper(p, (u64)fhp->fh_pre_size); + p = encode_nfstime3(p, &fhp->fh_pre_mtime); + encode_nfstime3(p, &fhp->fh_pre_ctime); + + return true; +} + +static bool +svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + if (!fhp->fh_pre_saved) { + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + return true; + } + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + return svcxdr_encode_wcc_attr(xdr, fhp); +} + static bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp) @@ -460,6 +489,39 @@ nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fh return encode_post_op_attr(rqstp, p, fhp); } +/* + * Encode weak cache consistency data + */ +static bool +svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp) +{ + struct dentry *dentry = fhp->fh_dentry; + + if (!dentry || !d_really_is_positive(dentry) || !fhp->fh_post_saved) + goto neither; + + /* before */ + if (!svcxdr_encode_pre_op_attr(xdr, fhp)) + return false; + + /* after */ + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &fhp->fh_post_attr)) + return false; + + return true; + +neither: + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp)) + return false; + + return true; +} + /* * Enocde weak cache consistency data */ @@ -855,11 +917,11 @@ nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh); } /* LOOKUP */ From 170e6bd25e691e2a9e2ed69437491354b46c9808 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:18:40 -0400 Subject: [PATCH 0501/1565] NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream [ Upstream commit 9a9c8923b3efd593d0e6a405efef9d58c6e6804b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 5 +++-- fs/nfsd/nfs3xdr.c | 34 ++++++++++++++++++---------------- fs/nfsd/xdr3.h | 1 + 3 files changed, 22 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 60e8c25be757..e8d772f2c776 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -126,14 +126,15 @@ nfsd3_proc_readlink(struct svc_rqst *rqstp) { struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd3_readlinkres *resp = rqstp->rq_resp; - char *buffer = page_address(*(rqstp->rq_next_page++)); dprintk("nfsd: READLINK(3) %s\n", SVCFH_fmt(&argp->fh)); /* Read the symlink. */ fh_copy(&resp->fh, &argp->fh); resp->len = NFS3_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &resp->fh, buffer, &resp->len); + resp->pages = rqstp->rq_next_page++; + resp->status = nfsd_readlink(rqstp, &resp->fh, + page_address(*resp->pages), &resp->len); return rpc_success; } diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 29b923a497f4..352691c3e246 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -977,26 +977,28 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; - *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->len); - xdr_ressize_check(rqstp, p); - rqstp->rq_res.page_len = resp->len; - if (resp->len & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - return 1; - } else - return xdr_ressize_check(rqstp, p); + if (xdr_stream_encode_u32(xdr, resp->len) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, 0, resp->len); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; } /* READ */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 7db4ee17aa20..1d633c5d5fa2 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -137,6 +137,7 @@ struct nfsd3_readlinkres { __be32 status; struct svc_fh fh; __u32 len; + struct page **pages; }; struct nfsd3_readres { From b241ab982373b0b16fd2145ef6a745222ad9bc54 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:23:50 -0400 Subject: [PATCH 0502/1565] NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream [ Upstream commit cc9bcdad7773c295375e66c892c7ac00524706f2 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 1 + fs/nfsd/nfs3xdr.c | 43 ++++++++++++++++++++------------------ fs/nfsd/xdr3.h | 1 + include/linux/sunrpc/xdr.h | 20 ++++++++++++++++++ 4 files changed, 45 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index e8d772f2c776..201f2009b540 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -159,6 +159,7 @@ nfsd3_proc_read(struct svc_rqst *rqstp) v = 0; len = argp->count; + resp->pages = rqstp->rq_next_page; while (len > 0) { struct page *page = *(rqstp->rq_next_page++); diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 352691c3e246..859cc6c51c1a 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1005,30 +1005,33 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; - *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->count); - *p++ = htonl(resp->eof); - *p++ = htonl(resp->count); /* xdr opaque count */ - xdr_ressize_check(rqstp, p); - /* now update rqstp->rq_res to reflect data as well */ - rqstp->rq_res.page_len = resp->count; - if (resp->count & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, - resp->count)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - return 1; - } else - return xdr_ressize_check(rqstp, p); + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->eof) < 0) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, + resp->count); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } + + return 1; } /* WRITE */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 1d633c5d5fa2..8073350418ae 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -145,6 +145,7 @@ struct nfsd3_readres { struct svc_fh fh; unsigned long count; __u32 eof; + struct page **pages; }; struct nfsd3_writeres { diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index eba6204330b3..237b78146c7d 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -394,6 +394,26 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) return len; } +/** + * xdr_stream_encode_bool - Encode a "not present" list item + * @xdr: pointer to xdr_stream + * @n: boolean value to encode + * + * Return values: + * On success, returns length in bytes of XDR buffer consumed + * %-EMSGSIZE on XDR buffer overflow + */ +static inline int xdr_stream_encode_bool(struct xdr_stream *xdr, __u32 n) +{ + const size_t len = XDR_UNIT; + __be32 *p = xdr_reserve_space(xdr, len); + + if (unlikely(!p)) + return -EMSGSIZE; + *p = n ? xdr_one : xdr_zero; + return len; +} + /** * xdr_stream_encode_u32 - Encode a 32-bit integer * @xdr: pointer to xdr_stream From 8737a75f265dd4e1b448338ec989fa1c551063ee Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:26:31 -0400 Subject: [PATCH 0503/1565] NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream [ Upstream commit ecb7a085ac15a8844ebf12fca6ae51ce71ac9b3b ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 40 ++++++++++++++++++++++++++++++++-------- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 859cc6c51c1a..0191cbfc486b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -131,6 +131,19 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); } +static bool +svcxdr_encode_writeverf3(struct xdr_stream *xdr, const __be32 *verf) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, NFS3_WRITEVERFSIZE); + if (!p) + return false; + memcpy(p, verf, NFS3_WRITEVERFSIZE); + + return true; +} + static bool svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -1038,17 +1051,28 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_writeres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - if (resp->status == 0) { - *p++ = htonl(resp->count); - *p++ = htonl(resp->committed); - *p++ = resp->verf[0]; - *p++ = resp->verf[1]; + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + if (xdr_stream_encode_u32(xdr, resp->committed) < 0) + return 0; + if (!svcxdr_encode_writeverf3(xdr, resp->verf)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; } /* CREATE, MKDIR, SYMLINK, MKNOD */ From 1863ca4c9e29227eab16587938ebf3dc0bcea920 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:27:23 -0400 Subject: [PATCH 0504/1565] NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream [ Upstream commit 78315b36781d259dcbdc102ff22c3f2f25712223 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0191cbfc486b..052376a65f72 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -121,6 +121,17 @@ svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) return true; } +static bool +svcxdr_encode_post_op_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) +{ + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + if (!svcxdr_encode_nfs_fh3(xdr, fhp)) + return false; + + return true; +} + static __be32 * encode_fh(__be32 *p, struct svc_fh *fhp) { @@ -1079,16 +1090,26 @@ nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status == 0) { - *p++ = xdr_one; - p = encode_fh(p, &resp->fh); - p = encode_post_op_attr(rqstp, p, &resp->fh); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_fh3(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) + return 0; } - p = encode_wcc_data(rqstp, p, &resp->dirfh); - return xdr_ressize_check(rqstp, p); + + return 1; } /* RENAME */ From 0404cffec4136da3db5ccda30c47a06f33f63ec0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:33:05 -0400 Subject: [PATCH 0505/1565] NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream [ Upstream commit 89d79e9672dfa6d0cc416699c16f2d312da58ff2 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 052376a65f72..1d52a69562b8 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1116,12 +1116,12 @@ nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_renameres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->ffh); - p = encode_wcc_data(rqstp, p, &resp->tfh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->ffh) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); } /* LINK */ From 066dc317fa65e975dc17a0d4e4e8998122d627d2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:08:29 -0400 Subject: [PATCH 0506/1565] NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream [ Upstream commit 4d74380a446f75eebb2171687d9b8baf0025bdf1 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1d52a69562b8..e159e4557428 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1128,12 +1128,12 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_linkres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); - p = encode_wcc_data(rqstp, p, &resp->tfh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh) && + svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); } /* READDIR */ From f6908e2bcd84b0ac74bf04f41363b3085be63c8b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 6 Nov 2020 13:08:45 -0500 Subject: [PATCH 0507/1565] NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream [ Upstream commit 8b7044984fd6eeadf72285e3617116bd15e9e676 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 58 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 44 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e159e4557428..e4a569e7216d 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -17,6 +17,13 @@ #define NFSDDBG_FACILITY NFSDDBG_XDR +/* + * Force construction of an empty post-op attr + */ +static const struct svc_fh nfs3svc_null_fh = { + .fh_no_wcc = true, +}; + /* * Mapping of S_IF* types to NFS file types */ @@ -1392,27 +1399,50 @@ nfs3svc_encode_entry_plus(void *cd, const char *name, return encode_entry(cd, name, namlen, offset, ino, d_type, 1); } +static bool +svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, + const struct nfsd3_fsstatres *resp) +{ + const struct kstatfs *s = &resp->stats; + u64 bs = s->f_bsize; + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 13); + if (!p) + return false; + p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ + p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ + p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ + p = xdr_encode_hyper(p, s->f_files); /* total inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ + *p = cpu_to_be32(resp->invarsec); /* mean unchanged time */ + + return true; +} + /* FSSTAT */ int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsstatres *resp = rqstp->rq_resp; - struct kstatfs *s = &resp->stats; - u64 bs = s->f_bsize; - *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ - p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ - p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ - p = xdr_encode_hyper(p, s->f_files); /* total inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ - *p++ = htonl(resp->invarsec); /* mean unchanged time */ + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_fsstat3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; } /* FSINFO */ From 4c06f831d28b8ffb436818440202c5795ed242e1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 13:42:13 -0400 Subject: [PATCH 0508/1565] NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream [ Upstream commit 0a139d1b7f327010acc36e8162936d3108c7addb ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 62 +++++++++++++++++++++++++++++++++++------------ 1 file changed, 46 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e4a569e7216d..514f53ad7302 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -24,6 +24,15 @@ static const struct svc_fh nfs3svc_null_fh = { .fh_no_wcc = true, }; +/* + * time_delta. {1, 0} means the server is accurate only + * to the nearest second. + */ +static const struct timespec64 nfs3svc_time_delta = { + .tv_sec = 1, + .tv_nsec = 0, +}; + /* * Mapping of S_IF* types to NFS file types */ @@ -1445,30 +1454,51 @@ nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) return 1; } +static bool +svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, + const struct nfsd3_fsinfores *resp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 12); + if (!p) + return false; + *p++ = cpu_to_be32(resp->f_rtmax); + *p++ = cpu_to_be32(resp->f_rtpref); + *p++ = cpu_to_be32(resp->f_rtmult); + *p++ = cpu_to_be32(resp->f_wtmax); + *p++ = cpu_to_be32(resp->f_wtpref); + *p++ = cpu_to_be32(resp->f_wtmult); + *p++ = cpu_to_be32(resp->f_dtpref); + p = xdr_encode_hyper(p, resp->f_maxfilesize); + p = encode_nfstime3(p, &nfs3svc_time_delta); + *p = cpu_to_be32(resp->f_properties); + + return true; +} + /* FSINFO */ int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsinfores *resp = rqstp->rq_resp; - *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - *p++ = htonl(resp->f_rtmax); - *p++ = htonl(resp->f_rtpref); - *p++ = htonl(resp->f_rtmult); - *p++ = htonl(resp->f_wtmax); - *p++ = htonl(resp->f_wtpref); - *p++ = htonl(resp->f_wtmult); - *p++ = htonl(resp->f_dtpref); - p = xdr_encode_hyper(p, resp->f_maxfilesize); - *p++ = xdr_one; - *p++ = xdr_zero; - *p++ = htonl(resp->f_properties); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_fsinfo3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + return 1; } /* PATHCONF */ From 349d96b070de15376b3db8fb0c8da4e1f46a0eab Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 6 Nov 2020 13:15:09 -0500 Subject: [PATCH 0509/1565] NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream [ Upstream commit ded04a587f6ceaaba3caefad4021f2212b46c9ff ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 44 ++++++++++++++++++++++++++++---------- include/linux/sunrpc/xdr.h | 18 ++++++++++++++-- 2 files changed, 49 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 514f53ad7302..1467bba02e18 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1501,25 +1501,47 @@ nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) return 1; } +static bool +svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, + const struct nfsd3_pathconfres *resp) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 6); + if (!p) + return false; + *p++ = cpu_to_be32(resp->p_link_max); + *p++ = cpu_to_be32(resp->p_name_max); + p = xdr_encode_bool(p, resp->p_no_trunc); + p = xdr_encode_bool(p, resp->p_chown_restricted); + p = xdr_encode_bool(p, resp->p_case_insensitive); + xdr_encode_bool(p, resp->p_case_preserving); + + return true; +} + /* PATHCONF */ int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_pathconfres *resp = rqstp->rq_resp; - *p++ = resp->status; - *p++ = xdr_zero; /* no post_op_attr */ - - if (resp->status == 0) { - *p++ = htonl(resp->p_link_max); - *p++ = htonl(resp->p_name_max); - *p++ = htonl(resp->p_no_trunc); - *p++ = htonl(resp->p_chown_restricted); - *p++ = htonl(resp->p_case_insensitive); - *p++ = htonl(resp->p_case_preserving); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; + if (!svcxdr_encode_pathconf3resok(xdr, resp)) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + return 1; } /* COMMIT */ diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 237b78146c7d..927f1458bcab 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -395,7 +395,21 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) } /** - * xdr_stream_encode_bool - Encode a "not present" list item + * xdr_encode_bool - Encode a boolean item + * @p: address in a buffer into which to encode + * @n: boolean value to encode + * + * Return value: + * Address of item following the encoded boolean + */ +static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) +{ + *p = n ? xdr_one : xdr_zero; + return p++; +} + +/** + * xdr_stream_encode_bool - Encode a boolean item * @xdr: pointer to xdr_stream * @n: boolean value to encode * @@ -410,7 +424,7 @@ static inline int xdr_stream_encode_bool(struct xdr_stream *xdr, __u32 n) if (unlikely(!p)) return -EMSGSIZE; - *p = n ? xdr_one : xdr_zero; + xdr_encode_bool(p, n); return len; } From 30dabf1d4fd4b7a9af6237169d118a92ebfcb47b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 15:35:46 -0400 Subject: [PATCH 0510/1565] NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream [ Upstream commit 5ef2826c761079e27904c85034df34e601b82d94 ] As an additional clean up, encode_wcc_data() is removed because it is now no longer used. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 54 +++++++++++++---------------------------------- 1 file changed, 15 insertions(+), 39 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1467bba02e18..eab14b52db20 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -432,14 +432,6 @@ encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; } -static __be32 * -encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - /* Attributes to follow */ - *p++ = xdr_one; - return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); -} - static bool svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) { @@ -562,30 +554,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -/* - * Enocde weak cache consistency data - */ -static __be32 * -encode_wcc_data(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - struct dentry *dentry = fhp->fh_dentry; - - if (dentry && d_really_is_positive(dentry) && fhp->fh_post_saved) { - if (fhp->fh_pre_saved) { - *p++ = xdr_one; - p = xdr_encode_hyper(p, (u64) fhp->fh_pre_size); - p = encode_time3(p, &fhp->fh_pre_mtime); - p = encode_time3(p, &fhp->fh_pre_ctime); - } else { - *p++ = xdr_zero; - } - return encode_saved_post_attr(rqstp, p, fhp); - } - /* no pre- or post-attrs */ - *p++ = xdr_zero; - return encode_post_op_attr(rqstp, p, fhp); -} - static bool fs_supports_change_attribute(struct super_block *sb) { return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; @@ -1548,16 +1516,24 @@ nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_commitres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_commitres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_wcc_data(rqstp, p, &resp->fh); - /* Write verifier */ - if (resp->status == 0) { - *p++ = resp->verf[0]; - *p++ = resp->verf[1]; + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_writeverf3(xdr, resp->verf)) + return 0; + break; + default: + if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) + return 0; } - return xdr_ressize_check(rqstp, p); + + return 1; } /* From 3b2fef48b77c611c19081805b9378e1fe80ad8a6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 10 Nov 2020 09:57:14 -0500 Subject: [PATCH 0511/1565] NFSD: Add a helper that encodes NFSv3 directory offset cookies [ Upstream commit a161e6c76aeba835e475a2f27dbbe5c37e565e94 ] Refactor: De-duplicate identical code that handles encoding of directory offset cookies across page boundaries. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 24 ++---------------------- fs/nfsd/nfs3xdr.c | 36 +++++++++++++++++++++++------------- fs/nfsd/xdr3.h | 2 ++ 3 files changed, 27 insertions(+), 35 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 201f2009b540..acb0a2d37dcb 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -500,17 +500,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) count += PAGE_SIZE; } resp->count = count >> 2; - if (resp->offset) { - if (unlikely(resp->offset1)) { - /* we ended up with offset on a page boundary */ - *resp->offset = htonl(offset >> 32); - *resp->offset1 = htonl(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; - } + nfs3svc_encode_cookie3(resp, offset); return rpc_success; } @@ -565,17 +555,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) count += PAGE_SIZE; } resp->count = count >> 2; - if (resp->offset) { - if (unlikely(resp->offset1)) { - /* we ended up with offset on a page boundary */ - *resp->offset = htonl(offset >> 32); - *resp->offset1 = htonl(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; - } + nfs3svc_encode_cookie3(resp, offset); out: return rpc_success; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index eab14b52db20..e334a1454edb 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1219,6 +1219,28 @@ static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, return p; } +/** + * nfs3svc_encode_cookie3 - Encode a directory offset cookie + * @resp: readdir result context + * @offset: offset cookie to encode + * + */ +void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) +{ + if (!resp->offset) + return; + + if (resp->offset1) { + /* we ended up with offset on a page boundary */ + *resp->offset = cpu_to_be32(offset >> 32); + *resp->offset1 = cpu_to_be32(offset & 0xffffffff); + resp->offset1 = NULL; + } else { + xdr_encode_hyper(resp->offset, offset); + } + resp->offset = NULL; +} + /* * Encode a directory entry. This one works for both normal readdir * and readdirplus. @@ -1244,19 +1266,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen, int elen; /* estimated entry length in words */ int num_entry_words = 0; /* actual number of words */ - if (cd->offset) { - u64 offset64 = offset; - - if (unlikely(cd->offset1)) { - /* we ended up with offset on a page boundary */ - *cd->offset = htonl(offset64 >> 32); - *cd->offset1 = htonl(offset64 & 0xffffffff); - cd->offset1 = NULL; - } else { - xdr_encode_hyper(cd->offset, offset64); - } - cd->offset = NULL; - } + nfs3svc_encode_cookie3(cd, offset); /* dprintk("encode_entry(%.*s @%ld%s)\n", diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 8073350418ae..e76e9230827e 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -300,6 +300,8 @@ int nfs3svc_encode_commitres(struct svc_rqst *, __be32 *); void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); + +void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset); int nfs3svc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int); From cacfe8f6d809fef3f966c017f2e8c9efa287e53c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 9 Nov 2020 13:13:21 -0500 Subject: [PATCH 0512/1565] NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder [ Upstream commit a1409e2de4f11034c8eb30775cc3e37039a4ef13 ] Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 31 ++----------------------------- fs/nfsd/nfs3xdr.c | 1 + 2 files changed, 3 insertions(+), 29 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index acb0a2d37dcb..7dcc7abb1f34 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -467,10 +467,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; - int count = 0; loff_t offset; - struct page **p; - caddr_t page_addr = NULL; dprintk("nfsd: READDIR(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), @@ -481,6 +478,7 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); + resp->count = 0; resp->common.err = nfs_ok; resp->rqstp = rqstp; offset = argp->cookie; @@ -488,18 +486,6 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry); memcpy(resp->verf, argp->verf, 8); - count = 0; - for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { - page_addr = page_address(*p); - - if (((caddr_t)resp->buffer >= page_addr) && - ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { - count += (caddr_t)resp->buffer - page_addr; - break; - } - count += PAGE_SIZE; - } - resp->count = count >> 2; nfs3svc_encode_cookie3(resp, offset); return rpc_success; @@ -514,10 +500,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; - int count = 0; loff_t offset; - struct page **p; - caddr_t page_addr = NULL; dprintk("nfsd: READDIR+(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), @@ -528,6 +511,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); + resp->count = 0; resp->common.err = nfs_ok; resp->rqstp = rqstp; offset = argp->cookie; @@ -544,17 +528,6 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, &resp->common, nfs3svc_encode_entry_plus); memcpy(resp->verf, argp->verf, 8); - for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { - page_addr = page_address(*p); - - if (((caddr_t)resp->buffer >= page_addr) && - ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { - count += (caddr_t)resp->buffer - page_addr; - break; - } - count += PAGE_SIZE; - } - resp->count = count >> 2; nfs3svc_encode_cookie3(resp, offset); out: diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index e334a1454edb..523b2dca0494 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1364,6 +1364,7 @@ encode_entry(struct readdir_cd *ccd, const char *name, int namlen, return -EINVAL; } + cd->count += num_entry_words; cd->buflen -= num_entry_words; cd->buffer = p; cd->common.err = nfs_ok; From 7cbec0dc097a803307bca96bad20e73e645f1539 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 19:31:48 -0400 Subject: [PATCH 0513/1565] NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream [ Upstream commit e4ccfe3014de435984939a3d84b7f241d3b57b0d ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 3 ++- fs/nfsd/nfs3xdr.c | 52 ++++++++++++++++++++++++++++++---------------- fs/nfsd/xdr3.h | 1 + 3 files changed, 37 insertions(+), 19 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 7dcc7abb1f34..9e8481242dea 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -452,7 +452,8 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, * and reserve room for the NULL ptr & eof flag (-2 words) */ resp->buflen = (count >> 2) - 2; - resp->buffer = page_address(*rqstp->rq_next_page); + resp->pages = rqstp->rq_next_page; + resp->buffer = page_address(*resp->pages); while (count > 0) { rqstp->rq_next_page++; count -= PAGE_SIZE; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 523b2dca0494..3d076d3c5c7b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -158,6 +158,19 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + XDR_QUADLEN(size); } +static bool +svcxdr_encode_cookieverf3(struct xdr_stream *xdr, const __be32 *verf) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, NFS3_COOKIEVERFSIZE); + if (!p) + return false; + memcpy(p, verf, NFS3_COOKIEVERFSIZE); + + return true; +} + static bool svcxdr_encode_writeverf3(struct xdr_stream *xdr, const __be32 *verf) { @@ -1124,27 +1137,30 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp; - *p++ = resp->status; - p = encode_post_op_attr(rqstp, p, &resp->fh); + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) + return 0; + xdr_write_pages(xdr, resp->pages, 0, resp->count << 2); + /* no more entries */ + if (xdr_stream_encode_item_absent(xdr) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) + return 0; + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) + return 0; + } - if (resp->status == 0) { - /* stupid readdir cookie */ - memcpy(p, resp->verf, 8); p += 2; - xdr_ressize_check(rqstp, p); - if (rqstp->rq_res.head[0].iov_len + (2<<2) > PAGE_SIZE) - return 1; /*No room for trailer */ - rqstp->rq_res.page_len = (resp->count) << 2; - - /* add the 'tail' to the end of the 'head' page - page 0. */ - rqstp->rq_res.tail[0].iov_base = p; - *p++ = 0; /* no more entries */ - *p++ = htonl(resp->common.err == nfserr_eof); - rqstp->rq_res.tail[0].iov_len = 2<<2; - return 1; - } else - return xdr_ressize_check(rqstp, p); + return 1; } static __be32 * diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index e76e9230827e..a4cdd8ccb175 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -176,6 +176,7 @@ struct nfsd3_readdirres { struct svc_fh scratch; int count; __be32 verf[2]; + struct page **pages; struct readdir_cd common; __be32 * buffer; From 37aa5e64022243e721b8334122997881177a4cfc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Oct 2020 19:46:58 -0400 Subject: [PATCH 0514/1565] NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream [ Upstream commit 7f87fc2d34d475225e78b7f5c4eabb121f4282b2 ] The benefit of the xdr_stream helpers is that they transparently handle encoding an XDR data item that crosses page boundaries. Most of the open-coded logic to do that here can be eliminated. A sub-buffer and sub-stream are set up as a sink buffer for the directory entry encoder. As an entry is encoded, it is added to the end of the content in this buffer/stream. The total length of the directory list is tracked in the buffer's @len field. When it comes time to encode the Reply, the sub-buffer is merged into rq_res's page array at the correct place using xdr_write_pages(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 35 ++++++---- fs/nfsd/nfs3xdr.c | 166 +++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/xdr3.h | 14 ++-- 3 files changed, 185 insertions(+), 30 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 9e8481242dea..781bb2b115e7 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -446,18 +446,30 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd3_readdirres *resp, int count) { + struct xdr_buf *buf = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + count = min_t(u32, count, svc_max_payload(rqstp)); - /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->buflen = (count >> 2) - 2; + memset(buf, 0, sizeof(*buf)); - resp->pages = rqstp->rq_next_page; - resp->buffer = page_address(*resp->pages); + /* Reserve room for the NULL ptr & eof flag (-2 words) */ + buf->buflen = count - XDR_UNIT * 2; + buf->pages = rqstp->rq_next_page; while (count > 0) { rqstp->rq_next_page++; count -= PAGE_SIZE; } + + /* This is xdr_init_encode(), but it assumes that + * the head kvec has already been consumed. */ + xdr_set_scratch_buffer(xdr, NULL, 0); + xdr->buf = buf; + xdr->page_ptr = buf->pages; + xdr->iov = NULL; + xdr->p = page_address(*buf->pages); + xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->rqst = NULL; } /* @@ -476,16 +488,13 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) nfsd3_init_dirlist_pages(rqstp, resp, argp->count); - /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - - resp->count = 0; resp->common.err = nfs_ok; + resp->cookie_offset = 0; resp->rqstp = rqstp; offset = argp->cookie; - resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entry); + &resp->common, nfs3svc_encode_entry3); memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset); @@ -509,11 +518,9 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) nfsd3_init_dirlist_pages(rqstp, resp, argp->count); - /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - - resp->count = 0; resp->common.err = nfs_ok; + resp->cookie_offset = 0; resp->rqstp = rqstp; offset = argp->cookie; @@ -527,7 +534,7 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) } resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entry_plus); + &resp->common, nfs3svc_encode_entryplus3); memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset); diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3d076d3c5c7b..f38d845ac8a0 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1139,6 +1139,7 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp; + struct xdr_buf *dirlist = &resp->dirlist; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) return 0; @@ -1148,7 +1149,7 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) return 0; if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) return 0; - xdr_write_pages(xdr, resp->pages, 0, resp->count << 2); + xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) return 0; @@ -1240,21 +1241,18 @@ static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, * @resp: readdir result context * @offset: offset cookie to encode * + * The buffer space for the offset cookie has already been reserved + * by svcxdr_encode_entry3_common(). */ void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) { - if (!resp->offset) - return; + __be64 cookie = cpu_to_be64(offset); - if (resp->offset1) { - /* we ended up with offset on a page boundary */ - *resp->offset = cpu_to_be32(offset >> 32); - *resp->offset1 = cpu_to_be32(offset & 0xffffffff); - resp->offset1 = NULL; - } else { - xdr_encode_hyper(resp->offset, offset); - } - resp->offset = NULL; + if (!resp->cookie_offset) + return; + write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, + sizeof(cookie)); + resp->cookie_offset = 0; } /* @@ -1403,6 +1401,150 @@ nfs3svc_encode_entry_plus(void *cd, const char *name, return encode_entry(cd, name, namlen, offset, ino, d_type, 1); } +static bool +svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, + int namlen, loff_t offset, u64 ino) +{ + struct xdr_buf *dirlist = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + /* fileid */ + if (xdr_stream_encode_u64(xdr, ino) < 0) + return false; + /* name */ + if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS3_MAXNAMLEN)) < 0) + return false; + /* cookie */ + resp->cookie_offset = dirlist->len; + if (xdr_stream_encode_u64(xdr, NFS_OFFSET_MAX) < 0) + return false; + + return true; +} + +/** + * nfs3svc_encode_entry3 - encode one NFSv3 READDIR entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs3svc_encode_entry3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd3_readdirres *resp = container_of(ccd, + struct nfsd3_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfs3svc_encode_cookie3(resp, offset); + + if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; +} + +static bool +svcxdr_encode_entry3_plus(struct nfsd3_readdirres *resp, const char *name, + int namlen, u64 ino) +{ + struct xdr_stream *xdr = &resp->xdr; + struct svc_fh *fhp = &resp->scratch; + bool result; + + result = false; + fh_init(fhp, NFS3_FHSIZE); + if (compose_entry_fh(resp, fhp, name, namlen, ino) != nfs_ok) + goto out_noattrs; + + if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp)) + goto out; + if (!svcxdr_encode_post_op_fh3(xdr, fhp)) + goto out; + result = true; + +out: + fh_put(fhp); + return result; + +out_noattrs: + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + if (xdr_stream_encode_item_absent(xdr) < 0) + return false; + return true; +} + +/** + * nfs3svc_encode_entryplus3 - encode one NFSv3 READDIRPLUS entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd3_readdirres *resp = container_of(ccd, + struct nfsd3_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfs3svc_encode_cookie3(resp, offset); + + if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + if (!svcxdr_encode_entry3_plus(resp, name, namlen, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; +} + static bool svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, const struct nfsd3_fsstatres *resp) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index a4cdd8ccb175..81dea78b0f17 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -169,20 +169,22 @@ struct nfsd3_linkres { }; struct nfsd3_readdirres { + /* Components of the reply */ __be32 status; struct svc_fh fh; - /* Just to save kmalloc on every readdirplus entry (svc_fh is a - * little large for the stack): */ - struct svc_fh scratch; int count; __be32 verf[2]; - struct page **pages; + /* Used to encode the reply's entry list */ + struct xdr_stream xdr; + struct xdr_buf dirlist; + struct svc_fh scratch; struct readdir_cd common; __be32 * buffer; int buflen; __be32 * offset; __be32 * offset1; + unsigned int cookie_offset; struct svc_rqst * rqstp; }; @@ -309,6 +311,10 @@ int nfs3svc_encode_entry(void *, const char *name, int nfs3svc_encode_entry_plus(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int); +int nfs3svc_encode_entry3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); +int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); /* Helper functions for NFSv3 ACL code */ __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); From d8554802010d13c60cbf147deec7b91db89adbbc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 13 Nov 2020 11:27:13 -0500 Subject: [PATCH 0515/1565] NFSD: Remove unused NFSv3 directory entry encoders [ Upstream commit 1411934627f9fe31a36ac8c43179ce9b63edce5c ] Clean up. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 190 ---------------------------------------------- fs/nfsd/xdr3.h | 11 --- 2 files changed, 201 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index f38d845ac8a0..646bbfc5b779 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -148,16 +148,6 @@ svcxdr_encode_post_op_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) return true; } -static __be32 * -encode_fh(__be32 *p, struct svc_fh *fhp) -{ - unsigned int size = fhp->fh_handle.fh_size; - *p++ = htonl(size); - if (size) p[XDR_QUADLEN(size)-1]=0; - memcpy(p, &fhp->fh_handle.fh_base, size); - return p + XDR_QUADLEN(size); -} - static bool svcxdr_encode_cookieverf3(struct xdr_stream *xdr, const __be32 *verf) { @@ -1164,20 +1154,6 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) return 1; } -static __be32 * -encode_entry_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, - int namlen, u64 ino) -{ - *p++ = xdr_one; /* mark entry present */ - p = xdr_encode_hyper(p, ino); /* file id */ - p = xdr_encode_array(p, name, namlen);/* name length & name */ - - cd->offset = p; /* remember pointer */ - p = xdr_encode_hyper(p, NFS_OFFSET_MAX);/* offset of next entry */ - - return p; -} - static __be32 compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp, const char *name, int namlen, u64 ino) @@ -1216,26 +1192,6 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp, return rv; } -static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, int namlen, u64 ino) -{ - struct svc_fh *fh = &cd->scratch; - __be32 err; - - fh_init(fh, NFS3_FHSIZE); - err = compose_entry_fh(cd, fh, name, namlen, ino); - if (err) { - *p++ = 0; - *p++ = 0; - goto out; - } - p = encode_post_op_attr(cd->rqstp, p, fh); - *p++ = xdr_one; /* yes, a file handle follows */ - p = encode_fh(p, fh); -out: - fh_put(fh); - return p; -} - /** * nfs3svc_encode_cookie3 - Encode a directory offset cookie * @resp: readdir result context @@ -1255,152 +1211,6 @@ void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) resp->cookie_offset = 0; } -/* - * Encode a directory entry. This one works for both normal readdir - * and readdirplus. - * The normal readdir reply requires 2 (fileid) + 1 (stringlen) - * + string + 2 (cookie) + 1 (next) words, i.e. 6 + strlen. - * - * The readdirplus baggage is 1+21 words for post_op_attr, plus the - * file handle. - */ - -#define NFS3_ENTRY_BAGGAGE (2 + 1 + 2 + 1) -#define NFS3_ENTRYPLUS_BAGGAGE (1 + 21 + 1 + (NFS3_FHSIZE >> 2)) -static int -encode_entry(struct readdir_cd *ccd, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type, int plus) -{ - struct nfsd3_readdirres *cd = container_of(ccd, struct nfsd3_readdirres, - common); - __be32 *p = cd->buffer; - caddr_t curr_page_addr = NULL; - struct page ** page; - int slen; /* string (name) length */ - int elen; /* estimated entry length in words */ - int num_entry_words = 0; /* actual number of words */ - - nfs3svc_encode_cookie3(cd, offset); - - /* - dprintk("encode_entry(%.*s @%ld%s)\n", - namlen, name, (long) offset, plus? " plus" : ""); - */ - - /* truncate filename if too long */ - namlen = min(namlen, NFS3_MAXNAMLEN); - - slen = XDR_QUADLEN(namlen); - elen = slen + NFS3_ENTRY_BAGGAGE - + (plus? NFS3_ENTRYPLUS_BAGGAGE : 0); - - if (cd->buflen < elen) { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - - /* determine which page in rq_respages[] we are currently filling */ - for (page = cd->rqstp->rq_respages + 1; - page < cd->rqstp->rq_next_page; page++) { - curr_page_addr = page_address(*page); - - if (((caddr_t)cd->buffer >= curr_page_addr) && - ((caddr_t)cd->buffer < curr_page_addr + PAGE_SIZE)) - break; - } - - if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) { - /* encode entry in current page */ - - p = encode_entry_baggage(cd, p, name, namlen, ino); - - if (plus) - p = encode_entryplus_baggage(cd, p, name, namlen, ino); - num_entry_words = p - cd->buffer; - } else if (*(page+1) != NULL) { - /* temporarily encode entry into next page, then move back to - * current and next page in rq_respages[] */ - __be32 *p1, *tmp; - int len1, len2; - - /* grab next page for temporary storage of entry */ - p1 = tmp = page_address(*(page+1)); - - p1 = encode_entry_baggage(cd, p1, name, namlen, ino); - - if (plus) - p1 = encode_entryplus_baggage(cd, p1, name, namlen, ino); - - /* determine entry word length and lengths to go in pages */ - num_entry_words = p1 - tmp; - len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer; - if ((num_entry_words << 2) < len1) { - /* the actual number of words in the entry is less - * than elen and can still fit in the current page - */ - memmove(p, tmp, num_entry_words << 2); - p += num_entry_words; - - /* update offset */ - cd->offset = cd->buffer + (cd->offset - tmp); - } else { - unsigned int offset_r = (cd->offset - tmp) << 2; - - /* update pointer to offset location. - * This is a 64bit quantity, so we need to - * deal with 3 cases: - * - entirely in first page - * - entirely in second page - * - 4 bytes in each page - */ - if (offset_r + 8 <= len1) { - cd->offset = p + (cd->offset - tmp); - } else if (offset_r >= len1) { - cd->offset -= len1 >> 2; - } else { - /* sitting on the fence */ - BUG_ON(offset_r != len1 - 4); - cd->offset = p + (cd->offset - tmp); - cd->offset1 = tmp; - } - - len2 = (num_entry_words << 2) - len1; - - /* move from temp page to current and next pages */ - memmove(p, tmp, len1); - memmove(tmp, (caddr_t)tmp+len1, len2); - - p = tmp + (len2 >> 2); - } - } - else { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - - cd->count += num_entry_words; - cd->buflen -= num_entry_words; - cd->buffer = p; - cd->common.err = nfs_ok; - return 0; - -} - -int -nfs3svc_encode_entry(void *cd, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int d_type) -{ - return encode_entry(cd, name, namlen, offset, ino, d_type, 0); -} - -int -nfs3svc_encode_entry_plus(void *cd, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int d_type) -{ - return encode_entry(cd, name, namlen, offset, ino, d_type, 1); -} - static bool svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, int namlen, loff_t offset, u64 ino) diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 81dea78b0f17..b851458373db 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -172,7 +172,6 @@ struct nfsd3_readdirres { /* Components of the reply */ __be32 status; struct svc_fh fh; - int count; __be32 verf[2]; /* Used to encode the reply's entry list */ @@ -180,10 +179,6 @@ struct nfsd3_readdirres { struct xdr_buf dirlist; struct svc_fh scratch; struct readdir_cd common; - __be32 * buffer; - int buflen; - __be32 * offset; - __be32 * offset1; unsigned int cookie_offset; struct svc_rqst * rqstp; @@ -305,12 +300,6 @@ void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset); -int nfs3svc_encode_entry(void *, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int); -int nfs3svc_encode_entry_plus(void *, const char *name, - int namlen, loff_t offset, u64 ino, - unsigned int); int nfs3svc_encode_entry3(void *data, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type); int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, From e4e6019ce5a24c9acc8bbd21e0df38e85b49c132 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 15 Jan 2021 09:28:44 -0500 Subject: [PATCH 0516/1565] NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations [ Upstream commit 76ed0dd96eeb2771b21bf5dcbd88326ef89ee0ed ] During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances rq_next_page to the full size of the client-requested buffer, then releases all those pages at the end of the request. The next request to use that nfsd thread has to refill the pages. NFSD does this even when the dirlist in the reply is small. With NFSv3 clients that send READDIR operations with large buffer sizes, that can be 256 put_page/alloc_page pairs per READDIR request, even though those pages often remain unused. We can save some work by not releasing dirlist buffer pages that were not used to form the READDIR Reply. I've left the NFSv2 code alone since there are never more than three pages involved in an NFSv2 READDIR Reply. Eventually we should nail down why these pages need to be released at all in order to avoid allocating and releasing pages unnecessarily. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 781bb2b115e7..be1ed33e424e 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -498,6 +498,9 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset); + /* Recycle only pages that were part of the reply */ + rqstp->rq_next_page = resp->xdr.page_ptr + 1; + return rpc_success; } @@ -538,6 +541,9 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) memcpy(resp->verf, argp->verf, 8); nfs3svc_encode_cookie3(resp, offset); + /* Recycle only pages that were part of the reply */ + rqstp->rq_next_page = resp->xdr.page_ptr + 1; + out: return rpc_success; } From 816c23c911f61ba7df9e60a0411aae7a683b342c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 11:08:02 -0400 Subject: [PATCH 0517/1565] NFSD: Update the NFSv2 stat encoder to use struct xdr_stream [ Upstream commit a887eaed2a964754334cd3f8c5fe87e413e68fef ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 10 +++++----- fs/nfsd/nfsxdr.c | 19 ++++++++++++++++--- fs/nfsd/xdr.h | 2 +- 3 files changed, 22 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index a628ea4d66ea..00eb722129ab 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -736,7 +736,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_REMOVE] = { .pc_func = nfsd_proc_remove, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -746,7 +746,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_RENAME] = { .pc_func = nfsd_proc_rename, .pc_decode = nfssvc_decode_renameargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_renameargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -756,7 +756,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_LINK] = { .pc_func = nfsd_proc_link, .pc_decode = nfssvc_decode_linkargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_linkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -766,7 +766,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_SYMLINK] = { .pc_func = nfsd_proc_symlink, .pc_decode = nfssvc_decode_symlinkargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_symlinkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, @@ -787,7 +787,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_RMDIR] = { .pc_func = nfsd_proc_rmdir, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_stat, + .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5d79ef6a0c7f..10cd120044b3 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,6 +26,19 @@ static u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */ +static bool +svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, sizeof(status)); + if (!p) + return false; + *p = status; + + return true; +} + /** * svcxdr_decode_fhandle - Decode an NFSv2 file handle * @xdr: XDR stream positioned at an encoded NFSv2 FH @@ -390,12 +403,12 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) */ int -nfssvc_encode_stat(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_stat *resp = rqstp->rq_resp; - *p++ = resp->status; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_stat(xdr, resp->status); } int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b92f1acec9e7..f040123373bf 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -147,7 +147,7 @@ int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfssvc_encode_stat(struct svc_rqst *, __be32 *); +int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); From c64d5d0ca9f94594f37d3f2f575433674a44feb8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 15:28:59 -0400 Subject: [PATCH 0518/1565] NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream [ Upstream commit 92b54a4fa4224e6116eb0d87a39dd05af23fcdfa ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 6 ++-- fs/nfsd/nfsxdr.c | 90 ++++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/xdr.h | 2 +- 3 files changed, 86 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 00eb722129ab..2a30b27c9d9b 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -640,7 +640,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), @@ -651,7 +651,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_SETATTR] = { .pc_func = nfsd_proc_setattr, .pc_decode = nfssvc_decode_sattrargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_sattrargs), .pc_ressize = sizeof(struct nfsd_attrstat), @@ -714,7 +714,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_WRITE] = { .pc_func = nfsd_proc_write, .pc_decode = nfssvc_decode_writeargs, - .pc_encode = nfssvc_encode_attrstat, + .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_writeargs), .pc_ressize = sizeof(struct nfsd_attrstat), diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 10cd120044b3..65c8f8f31444 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -14,7 +14,7 @@ /* * Mapping of S_IF* types to NFS file types */ -static u32 nfs_ftypes[] = { +static const u32 nfs_ftypes[] = { NFNON, NFCHR, NFCHR, NFBAD, NFDIR, NFBAD, NFBLK, NFBAD, NFREG, NFBAD, NFLNK, NFBAD, @@ -70,6 +70,17 @@ encode_fh(__be32 *p, struct svc_fh *fhp) return p + (NFS_FHSIZE>> 2); } +static __be32 * +encode_timeval(__be32 *p, const struct timespec64 *time) +{ + *p++ = cpu_to_be32((u32)time->tv_sec); + if (time->tv_nsec) + *p++ = cpu_to_be32(time->tv_nsec / NSEC_PER_USEC); + else + *p++ = xdr_zero; + return p; +} + static bool svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) { @@ -233,6 +244,64 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; } +static int +svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat) +{ + struct user_namespace *userns = nfsd_user_namespace(rqstp); + struct dentry *dentry = fhp->fh_dentry; + int type = stat->mode & S_IFMT; + struct timespec64 time; + __be32 *p; + u32 fsid; + + p = xdr_reserve_space(xdr, XDR_UNIT * 17); + if (!p) + return 0; + + *p++ = cpu_to_be32(nfs_ftypes[type >> 12]); + *p++ = cpu_to_be32((u32)stat->mode); + *p++ = cpu_to_be32((u32)stat->nlink); + *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); + *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); + + if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) + *p++ = cpu_to_be32(NFS_MAXPATHLEN); + else + *p++ = cpu_to_be32((u32) stat->size); + *p++ = cpu_to_be32((u32) stat->blksize); + if (S_ISCHR(type) || S_ISBLK(type)) + *p++ = cpu_to_be32(new_encode_dev(stat->rdev)); + else + *p++ = cpu_to_be32(0xffffffff); + *p++ = cpu_to_be32((u32)stat->blocks); + + switch (fsid_source(fhp)) { + case FSIDSOURCE_FSID: + fsid = (u32)fhp->fh_export->ex_fsid; + break; + case FSIDSOURCE_UUID: + fsid = ((u32 *)fhp->fh_export->ex_uuid)[0]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[1]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[2]; + fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[3]; + break; + default: + fsid = new_encode_dev(stat->dev); + break; + } + *p++ = cpu_to_be32(fsid); + + *p++ = cpu_to_be32((u32)stat->ino); + p = encode_timeval(p, &stat->atime); + time = stat->mtime; + lease_get_mtime(d_inode(dentry), &time); + p = encode_timeval(p, &time); + encode_timeval(p, &stat->ctime); + + return 1; +} + /* Helper function for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) { @@ -412,16 +481,21 @@ nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_attrstat *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; + } + + return 1; } int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index f040123373bf..45aa6b75a5f8 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -148,7 +148,7 @@ int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); int nfssvc_encode_statres(struct svc_rqst *, __be32 *); -int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); int nfssvc_encode_readres(struct svc_rqst *, __be32 *); From 9aab4f03e8f222452f3d234b6d38998b25a0e291 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 16:44:16 -0400 Subject: [PATCH 0519/1565] NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream [ Upstream commit e3b4ef221ac57c08341c97a10c8a81c041f76716 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 65c8f8f31444..989144b0d5be 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -63,11 +63,17 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) return true; } -static __be32 * -encode_fh(__be32 *p, struct svc_fh *fhp) +static bool +svcxdr_encode_fhandle(struct xdr_stream *xdr, const struct svc_fh *fhp) { + __be32 *p; + + p = xdr_reserve_space(xdr, NFS_FHSIZE); + if (!p) + return false; memcpy(p, &fhp->fh_handle.fh_base, NFS_FHSIZE); - return p + (NFS_FHSIZE>> 2); + + return true; } static __be32 * @@ -244,7 +250,7 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; } -static int +static bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp, const struct kstat *stat) { @@ -257,7 +263,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, p = xdr_reserve_space(xdr, XDR_UNIT * 17); if (!p) - return 0; + return false; *p++ = cpu_to_be32(nfs_ftypes[type >> 12]); *p++ = cpu_to_be32((u32)stat->mode); @@ -299,7 +305,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, p = encode_timeval(p, &time); encode_timeval(p, &stat->ctime); - return 1; + return true; } /* Helper function for NFSv2 ACL code */ @@ -501,15 +507,21 @@ nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_diropres *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - p = encode_fh(p, &resp->fh); - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fhandle(xdr, &resp->fh)) + return 0; + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + break; + } + + return 1; } int From 27909a583cc399a0587ae676733d4b210b2f92c4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 15:41:09 -0400 Subject: [PATCH 0520/1565] NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream [ Upstream commit d9014b0f8fae11f22a3d356553844e06ddcdce4a ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 5 +++-- fs/nfsd/nfsxdr.c | 26 ++++++++++++-------------- fs/nfsd/xdr.h | 1 + 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 2a30b27c9d9b..fa9a18897bc2 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -151,13 +151,14 @@ nfsd_proc_readlink(struct svc_rqst *rqstp) { struct nfsd_fhandle *argp = rqstp->rq_argp; struct nfsd_readlinkres *resp = rqstp->rq_resp; - char *buffer = page_address(*(rqstp->rq_next_page++)); dprintk("nfsd: READLINK %s\n", SVCFH_fmt(&argp->fh)); /* Read the symlink. */ resp->len = NFS_MAXPATHLEN; - resp->status = nfsd_readlink(rqstp, &argp->fh, buffer, &resp->len); + resp->page = *(rqstp->rq_next_page++); + resp->status = nfsd_readlink(rqstp, &argp->fh, + page_address(resp->page), &resp->len); fh_put(&argp->fh); return rpc_success; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 989144b0d5be..74d9d11949c6 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -527,24 +527,22 @@ nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; - *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - *p++ = htonl(resp->len); - xdr_ressize_check(rqstp, p); - rqstp->rq_res.page_len = resp->len; - if (resp->len & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len)) + if (!svcxdr_encode_stat(xdr, resp->status)) return 0; + switch (resp->status) { + case nfs_ok: + if (xdr_stream_encode_u32(xdr, resp->len) < 0) + return 0; + xdr_write_pages(xdr, &resp->page, 0, resp->len); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) + return 0; + break; + } + return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 45aa6b75a5f8..b17fd72c69a6 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -94,6 +94,7 @@ struct nfsd_diropres { struct nfsd_readlinkres { __be32 status; int len; + struct page *page; }; struct nfsd_readres { From ad0614d3a857363b6e3978ce8adb7d8fedd4e18b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 16:40:11 -0400 Subject: [PATCH 0521/1565] NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream [ Upstream commit a6f8d9dc9e44b51303d9abde4643460137d19b28 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 1 + fs/nfsd/nfsxdr.c | 32 +++++++++++++++----------------- fs/nfsd/xdr.h | 1 + 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index fa9a18897bc2..1fd91e00b97b 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -185,6 +185,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) v = 0; len = argp->count; + resp->pages = rqstp->rq_next_page; while (len > 0) { struct page *page = *(rqstp->rq_next_page++); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 74d9d11949c6..d6d7d07dbb1b 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -549,27 +549,25 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; - *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->count); - xdr_ressize_check(rqstp, p); - - /* now update rqstp->rq_res to reflect data as well */ - rqstp->rq_res.page_len = resp->count; - if (resp->count & 3) { - /* need to pad the tail */ - rqstp->rq_res.tail[0].iov_base = p; - *p = 0; - rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3); - } - if (svc_encode_result_payload(rqstp, head->iov_len, resp->count)) + if (!svcxdr_encode_stat(xdr, resp->status)) return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->count) < 0) + return 0; + xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, + resp->count); + if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) + return 0; + break; + } + return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index b17fd72c69a6..337c581e15b4 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -102,6 +102,7 @@ struct nfsd_readres { struct svc_fh fh; unsigned long count; struct kstat stat; + struct page **pages; }; struct nfsd_readdirres { From 60bc5af5b8dc11cc2198b132c52b5105dbc13904 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 19:01:38 -0400 Subject: [PATCH 0522/1565] NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream [ Upstream commit bf15229f2ced4f14946eef958336f764e30f8efb ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index d6d7d07dbb1b..39d296aecd3e 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -592,19 +592,26 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_statfsres *resp = rqstp->rq_resp; struct kstatfs *stat = &resp->stats; - *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + p = xdr_reserve_space(xdr, XDR_UNIT * 5); + if (!p) + return 0; + *p++ = cpu_to_be32(NFSSVC_MAXBLKSIZE_V2); + *p++ = cpu_to_be32(stat->f_bsize); + *p++ = cpu_to_be32(stat->f_blocks); + *p++ = cpu_to_be32(stat->f_bfree); + *p = cpu_to_be32(stat->f_bavail); + break; + } - *p++ = htonl(NFSSVC_MAXBLKSIZE_V2); /* max transfer size */ - *p++ = htonl(stat->f_bsize); - *p++ = htonl(stat->f_blocks); - *p++ = htonl(stat->f_bfree); - *p++ = htonl(stat->f_bavail); - return xdr_ressize_check(rqstp, p); + return 1; } int From c4e272758974cba11fd3611a8f39aa227cd130e0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 13 Nov 2020 16:53:17 -0500 Subject: [PATCH 0523/1565] NFSD: Add a helper that encodes NFSv3 directory offset cookies [ Upstream commit d52532002ffa217ad3fa4c3ba86c95203d21dd21 ] Refactor: Add helper function similar to nfs3svc_encode_cookie3(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 3 +-- fs/nfsd/nfsxdr.c | 18 ++++++++++++++++-- fs/nfsd/xdr.h | 1 + 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 1fd91e00b97b..2d3d7cdffd52 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -595,8 +595,7 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) &resp->common, nfssvc_encode_entry); resp->count = resp->buffer - buffer; - if (resp->offset) - *resp->offset = htonl(offset); + nfssvc_encode_nfscookie(resp, offset); fh_put(&argp->fh); return rpc_success; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 39d296aecd3e..a87b21cfe0d0 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -614,6 +614,21 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) return 1; } +/** + * nfssvc_encode_nfscookie - Encode a directory offset cookie + * @resp: readdir result context + * @offset: offset cookie to encode + * + */ +void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset) +{ + if (!resp->offset) + return; + + *resp->offset = cpu_to_be32(offset); + resp->offset = NULL; +} + int nfssvc_encode_entry(void *ccdv, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type) @@ -632,8 +647,7 @@ nfssvc_encode_entry(void *ccdv, const char *name, cd->common.err = nfserr_fbig; return -EINVAL; } - if (cd->offset) - *cd->offset = htonl(offset); + nfssvc_encode_nfscookie(cd, offset); /* truncate filename */ namlen = min(namlen, NFS2_MAXNAMLEN); diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 337c581e15b4..75b3b3144534 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -157,6 +157,7 @@ int nfssvc_encode_readres(struct svc_rqst *, __be32 *); int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); +void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int); From bc17759a4e9928a869542a714d1e997073b24413 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 13 Nov 2020 16:57:44 -0500 Subject: [PATCH 0524/1565] NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder [ Upstream commit 8141d6a2bb6c655ff0c0b81ced80d9025f03e926 ] Clean up: Counting the bytes used by each returned directory entry seems less brittle to me than trying to measure consumed pages after the fact. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 4 ---- fs/nfsd/nfsxdr.c | 3 ++- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 2d3d7cdffd52..23b2a900cb79 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -578,14 +578,12 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) struct nfsd_readdirargs *argp = rqstp->rq_argp; struct nfsd_readdirres *resp = rqstp->rq_resp; loff_t offset; - __be32 *buffer; dprintk("nfsd: READDIR %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->cookie); nfsd_init_dirlist_pages(rqstp, resp, argp->count); - buffer = resp->buffer; resp->offset = NULL; resp->common.err = nfs_ok; @@ -593,8 +591,6 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, &resp->common, nfssvc_encode_entry); - - resp->count = resp->buffer - buffer; nfssvc_encode_nfscookie(resp, offset); fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index a87b21cfe0d0..8ae23ed6dc5d 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -584,7 +584,7 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) p = resp->buffer; *p++ = 0; /* no more entries */ *p++ = htonl((resp->common.err == nfserr_eof)); - rqstp->rq_res.page_len = (((unsigned long)p-1) & ~PAGE_MASK)+1; + rqstp->rq_res.page_len = resp->count << 2; return 1; } @@ -667,6 +667,7 @@ nfssvc_encode_entry(void *ccdv, const char *name, cd->offset = p; /* remember pointer */ *p++ = htonl(~0U); /* offset of next entry */ + cd->count += p - cd->buffer; cd->buflen = buflen; cd->buffer = p; cd->common.err = nfs_ok; From 801e4d79b779ff63b8f42bd4eff7f7eda9d538af Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 23 Oct 2020 16:49:01 -0400 Subject: [PATCH 0525/1565] NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream [ Upstream commit 94c8f8c682a6497af7ea71351b18f637c6337d42 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 22 +++++++++++++--------- fs/nfsd/xdr.h | 1 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 8ae23ed6dc5d..9522e5c5f49d 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -574,17 +574,21 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) int nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - xdr_ressize_check(rqstp, p); - p = resp->buffer; - *p++ = 0; /* no more entries */ - *p++ = htonl((resp->common.err == nfserr_eof)); - rqstp->rq_res.page_len = resp->count << 2; + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + xdr_write_pages(xdr, &resp->page, 0, resp->count << 2); + /* no more entries */ + if (xdr_stream_encode_item_absent(xdr) < 0) + return 0; + if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) + return 0; + break; + } return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 75b3b3144534..d7beef2c2ed5 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -114,6 +114,7 @@ struct nfsd_readdirres { __be32 * buffer; int buflen; __be32 * offset; + struct page *page; }; struct nfsd_statfsres { From b8658c947d540cf5d135176354da5c8f2811efa0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 14 Nov 2020 13:45:35 -0500 Subject: [PATCH 0526/1565] NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream [ Upstream commit f5dcccd647da513a89f3b6ca392b0c1eb050b9fc ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 26 +++++++++++---- fs/nfsd/nfsxdr.c | 81 ++++++++++++++++++++++++++++++++++++++++++++--- fs/nfsd/xdr.h | 7 ++++ 3 files changed, 103 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 23b2a900cb79..135c0bc468bc 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -559,14 +559,27 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd_readdirres *resp, int count) { + struct xdr_buf *buf = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + count = min_t(u32, count, PAGE_SIZE); - /* Convert byte count to number of words (i.e. >> 2), - * and reserve room for the NULL ptr & eof flag (-2 words) */ - resp->buflen = (count >> 2) - 2; + memset(buf, 0, sizeof(*buf)); - resp->buffer = page_address(*rqstp->rq_next_page); + /* Reserve room for the NULL ptr & eof flag (-2 words) */ + buf->buflen = count - sizeof(__be32) * 2; + buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++; + + /* This is xdr_init_encode(), but it assumes that + * the head kvec has already been consumed. */ + xdr_set_scratch_buffer(xdr, NULL, 0); + xdr->buf = buf; + xdr->page_ptr = buf->pages; + xdr->iov = NULL; + xdr->p = page_address(*buf->pages); + xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->rqst = NULL; } /* @@ -585,12 +598,11 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) nfsd_init_dirlist_pages(rqstp, resp, argp->count); - resp->offset = NULL; resp->common.err = nfs_ok; - /* Read directory and encode entries on the fly */ + resp->cookie_offset = 0; offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, - &resp->common, nfssvc_encode_entry); + &resp->common, nfs2svc_encode_entry); nfssvc_encode_nfscookie(resp, offset); fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 9522e5c5f49d..1102d40ded03 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -576,12 +576,13 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp; + struct xdr_buf *dirlist = &resp->dirlist; if (!svcxdr_encode_stat(xdr, resp->status)) return 0; switch (resp->status) { case nfs_ok: - xdr_write_pages(xdr, &resp->page, 0, resp->count << 2); + xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) return 0; @@ -623,14 +624,86 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) * @resp: readdir result context * @offset: offset cookie to encode * + * The buffer space for the offset cookie has already been reserved + * by svcxdr_encode_entry_common(). */ void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset) { - if (!resp->offset) + __be32 cookie = cpu_to_be32(offset); + + if (!resp->cookie_offset) return; - *resp->offset = cpu_to_be32(offset); - resp->offset = NULL; + write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, + sizeof(cookie)); + resp->cookie_offset = 0; +} + +static bool +svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, + int namlen, loff_t offset, u64 ino) +{ + struct xdr_buf *dirlist = &resp->dirlist; + struct xdr_stream *xdr = &resp->xdr; + + if (xdr_stream_encode_item_present(xdr) < 0) + return false; + /* fileid */ + if (xdr_stream_encode_u32(xdr, (u32)ino) < 0) + return false; + /* name */ + if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS2_MAXNAMLEN)) < 0) + return false; + /* cookie */ + resp->cookie_offset = dirlist->len; + if (xdr_stream_encode_u32(xdr, ~0U) < 0) + return false; + + return true; +} + +/** + * nfs2svc_encode_entry - encode one NFSv2 READDIR entry + * @data: directory context + * @name: name of the object to be encoded + * @namlen: length of that name, in bytes + * @offset: the offset of the previous entry + * @ino: the fileid of this entry + * @d_type: unused + * + * Return values: + * %0: Entry was successfully encoded. + * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err + * + * On exit, the following fields are updated: + * - resp->xdr + * - resp->common.err + * - resp->cookie_offset + */ +int nfs2svc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = data; + struct nfsd_readdirres *resp = container_of(ccd, + struct nfsd_readdirres, + common); + unsigned int starting_length = resp->dirlist.len; + + /* The offset cookie for the previous entry */ + nfssvc_encode_nfscookie(resp, offset); + + if (!svcxdr_encode_entry_common(resp, name, namlen, offset, ino)) + goto out_toosmall; + + xdr_commit_encode(&resp->xdr); + resp->common.err = nfs_ok; + return 0; + +out_toosmall: + resp->cookie_offset = 0; + resp->common.err = nfserr_toosmall; + resp->dirlist.len = starting_length; + return -EINVAL; } int diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index d7beef2c2ed5..a065852c9ea8 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -106,15 +106,20 @@ struct nfsd_readres { }; struct nfsd_readdirres { + /* Components of the reply */ __be32 status; int count; + /* Used to encode the reply's entry list */ + struct xdr_stream xdr; + struct xdr_buf dirlist; struct readdir_cd common; __be32 * buffer; int buflen; __be32 * offset; struct page *page; + unsigned int cookie_offset; }; struct nfsd_statfsres { @@ -159,6 +164,8 @@ int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); +int nfs2svc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); int nfssvc_encode_entry(void *, const char *name, int namlen, loff_t offset, u64 ino, unsigned int); From 93584780eb4d8f58303c4d828b18bf7b136b917c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 15 Nov 2020 14:30:13 -0500 Subject: [PATCH 0527/1565] NFSD: Remove unused NFSv2 directory entry encoders [ Upstream commit 8a2cf9f5709cc20a1114a7d22655928314fc86f8 ] Clean up. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 2 +- fs/nfsd/nfsxdr.c | 51 +++-------------------------------------------- fs/nfsd/xdr.h | 10 ++-------- 3 files changed, 6 insertions(+), 57 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 135c0bc468bc..72f8bc4a7ea4 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -602,7 +602,7 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) resp->cookie_offset = 0; offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, - &resp->common, nfs2svc_encode_entry); + &resp->common, nfssvc_encode_entry); nfssvc_encode_nfscookie(resp, offset); fh_put(&argp->fh); diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1102d40ded03..5df6f00d76fd 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -663,7 +663,7 @@ svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, } /** - * nfs2svc_encode_entry - encode one NFSv2 READDIR entry + * nfssvc_encode_entry - encode one NFSv2 READDIR entry * @data: directory context * @name: name of the object to be encoded * @namlen: length of that name, in bytes @@ -680,8 +680,8 @@ svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, * - resp->common.err * - resp->cookie_offset */ -int nfs2svc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type) +int nfssvc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type) { struct readdir_cd *ccd = data; struct nfsd_readdirres *resp = container_of(ccd, @@ -706,51 +706,6 @@ int nfs2svc_encode_entry(void *data, const char *name, int namlen, return -EINVAL; } -int -nfssvc_encode_entry(void *ccdv, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int d_type) -{ - struct readdir_cd *ccd = ccdv; - struct nfsd_readdirres *cd = container_of(ccd, struct nfsd_readdirres, common); - __be32 *p = cd->buffer; - int buflen, slen; - - /* - dprintk("nfsd: entry(%.*s off %ld ino %ld)\n", - namlen, name, offset, ino); - */ - - if (offset > ~((u32) 0)) { - cd->common.err = nfserr_fbig; - return -EINVAL; - } - nfssvc_encode_nfscookie(cd, offset); - - /* truncate filename */ - namlen = min(namlen, NFS2_MAXNAMLEN); - slen = XDR_QUADLEN(namlen); - - if ((buflen = cd->buflen - slen - 4) < 0) { - cd->common.err = nfserr_toosmall; - return -EINVAL; - } - if (ino > ~((u32) 0)) { - cd->common.err = nfserr_fbig; - return -EINVAL; - } - *p++ = xdr_one; /* mark entry present */ - *p++ = htonl((u32) ino); /* file id */ - p = xdr_encode_array(p, name, namlen);/* name length & name */ - cd->offset = p; /* remember pointer */ - *p++ = htonl(~0U); /* offset of next entry */ - - cd->count += p - cd->buffer; - cd->buflen = buflen; - cd->buffer = p; - cd->common.err = nfs_ok; - return 0; -} - /* * XDR release functions */ diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index a065852c9ea8..10f3bd25e8cc 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -115,10 +115,6 @@ struct nfsd_readdirres { struct xdr_stream xdr; struct xdr_buf dirlist; struct readdir_cd common; - __be32 * buffer; - int buflen; - __be32 * offset; - struct page *page; unsigned int cookie_offset; }; @@ -164,10 +160,8 @@ int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); -int nfs2svc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type); -int nfssvc_encode_entry(void *, const char *name, - int namlen, loff_t offset, u64 ino, unsigned int); +int nfssvc_encode_entry(void *data, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type); void nfssvc_release_attrstat(struct svc_rqst *rqstp); void nfssvc_release_diropres(struct svc_rqst *rqstp); From 89ac9a8101ad4aff456116d2dcc9a56df3d9052d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 14:55:05 -0500 Subject: [PATCH 0528/1565] NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs [ Upstream commit 8edc0648880a151026fe625fa1b76772b5766f68 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs_common/nfsacl.c | 71 ++++++++++++++++++++++++++++++++++++++++++ include/linux/nfsacl.h | 3 ++ 2 files changed, 74 insertions(+) diff --git a/fs/nfs_common/nfsacl.c b/fs/nfs_common/nfsacl.c index 79c563c1a5e8..5a5bd85d08f8 100644 --- a/fs/nfs_common/nfsacl.c +++ b/fs/nfs_common/nfsacl.c @@ -136,6 +136,77 @@ int nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, } EXPORT_SYMBOL_GPL(nfsacl_encode); +/** + * nfs_stream_encode_acl - Encode an NFSv3 ACL + * + * @xdr: an xdr_stream positioned to receive an encoded ACL + * @inode: inode of file whose ACL this is + * @acl: posix_acl to encode + * @encode_entries: whether to encode ACEs as well + * @typeflag: ACL type: NFS_ACL_DEFAULT or zero + * + * Return values: + * %false: The ACL could not be encoded + * %true: @xdr is advanced to the next available position + */ +bool nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, + struct posix_acl *acl, int encode_entries, + int typeflag) +{ + const size_t elem_size = XDR_UNIT * 3; + u32 entries = (acl && acl->a_count) ? max_t(int, acl->a_count, 4) : 0; + struct nfsacl_encode_desc nfsacl_desc = { + .desc = { + .elem_size = elem_size, + .array_len = encode_entries ? entries : 0, + .xcode = xdr_nfsace_encode, + }, + .acl = acl, + .typeflag = typeflag, + .uid = inode->i_uid, + .gid = inode->i_gid, + }; + struct nfsacl_simple_acl aclbuf; + unsigned int base; + int err; + + if (entries > NFS_ACL_MAX_ENTRIES) + return false; + if (xdr_stream_encode_u32(xdr, entries) < 0) + return false; + + if (encode_entries && acl && acl->a_count == 3) { + struct posix_acl *acl2 = &aclbuf.acl; + + /* Avoid the use of posix_acl_alloc(). nfsacl_encode() is + * invoked in contexts where a memory allocation failure is + * fatal. Fortunately this fake ACL is small enough to + * construct on the stack. */ + posix_acl_init(acl2, 4); + + /* Insert entries in canonical order: other orders seem + to confuse Solaris VxFS. */ + acl2->a_entries[0] = acl->a_entries[0]; /* ACL_USER_OBJ */ + acl2->a_entries[1] = acl->a_entries[1]; /* ACL_GROUP_OBJ */ + acl2->a_entries[2] = acl->a_entries[1]; /* ACL_MASK */ + acl2->a_entries[2].e_tag = ACL_MASK; + acl2->a_entries[3] = acl->a_entries[2]; /* ACL_OTHER */ + nfsacl_desc.acl = acl2; + } + + base = xdr_stream_pos(xdr); + if (!xdr_reserve_space(xdr, XDR_UNIT + + elem_size * nfsacl_desc.desc.array_len)) + return false; + err = xdr_encode_array2(xdr->buf, base, &nfsacl_desc.desc); + if (err) + return false; + + return true; +} +EXPORT_SYMBOL_GPL(nfs_stream_encode_acl); + + struct nfsacl_decode_desc { struct xdr_array2_desc desc; unsigned int count; diff --git a/include/linux/nfsacl.h b/include/linux/nfsacl.h index 0ba99c513649..8e76a79cdc6a 100644 --- a/include/linux/nfsacl.h +++ b/include/linux/nfsacl.h @@ -41,5 +41,8 @@ nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, extern bool nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, struct posix_acl **pacl); +extern bool +nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, + struct posix_acl *acl, int encode_entries, int typeflag); #endif /* __LINUX_NFSACL_H */ From 6677b0d16abe77702040768c96e2ea17cd5b3f6e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 14:38:47 -0500 Subject: [PATCH 0529/1565] NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream [ Upstream commit f8cba47344f794b54373189bec23195b51020faf ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 44 +++++++++++++++++--------------------------- fs/nfsd/nfsxdr.c | 24 ++++++++++++++++++++++-- fs/nfsd/xdr.h | 3 +++ 3 files changed, 42 insertions(+), 29 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 7eeac5b81c20..102f8a9a235c 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -240,51 +240,41 @@ static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) /* GETACL */ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; - struct kvec *head = rqstp->rq_res.head; - unsigned int base; - int n; int w; - *p++ = resp->status; - if (resp->status != nfs_ok) - return xdr_ressize_check(rqstp, p); - - /* - * Since this is version 2, the check for nfserr in - * nfsd_dispatch actually ensures the following cannot happen. - * However, it seems fragile to depend on that. - */ - if (dentry == NULL || d_really_is_negative(dentry)) + if (!svcxdr_encode_stat(xdr, resp->status)) return 0; + + if (dentry == NULL || d_really_is_negative(dentry)) + return 1; inode = d_inode(dentry); - p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->mask); - if (!xdr_ressize_check(rqstp, p)) + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return 0; - base = (char *)p - (char *)head->iov_base; rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 0; + return 1; w -= PAGE_SIZE; } - n = nfsacl_encode(&rqstp->rq_res, base, inode, - resp->acl_access, - resp->mask & NFS_ACL, 0); - if (n > 0) - n = nfsacl_encode(&rqstp->rq_res, base + n, inode, - resp->acl_default, - resp->mask & NFS_DFACL, - NFS_ACL_DEFAULT); - return (n > 0); + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, + resp->mask & NFS_ACL, 0)) + return 0; + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, + resp->mask & NFS_DFACL, NFS_ACL_DEFAULT)) + return 0; + + return 1; } static int nfsaclsvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 5df6f00d76fd..1fed3a8deb18 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -26,7 +26,16 @@ static const u32 nfs_ftypes[] = { * Basic NFSv2 data types (RFC 1094 Section 2.3) */ -static bool +/** + * svcxdr_encode_stat - Encode an NFSv2 status code + * @xdr: XDR stream + * @status: status value to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status) { __be32 *p; @@ -250,7 +259,18 @@ encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, return p; } -static bool +/** + * svcxdr_encode_fattr - Encode NFSv2 file attributes + * @rqstp: Context of a completed RPC transaction + * @xdr: XDR stream + * @fhp: File handle to encode + * @stat: Attributes to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp, const struct kstat *stat) { diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 10f3bd25e8cc..8bcdc37398ab 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -170,5 +170,8 @@ void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ __be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); +bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status); +bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp, const struct kstat *stat); #endif /* LINUX_NFSD_H */ From 82ac35b16710034ac2eaddf598a3190ad9675cb8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 14:47:56 -0500 Subject: [PATCH 0530/1565] NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream [ Upstream commit 778f068fa0c0846b650ebdb8795fd51b5badc332 ] The SETACL result encoder is exactly the same as the NFSv2 attrstatres decoder. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 102f8a9a235c..ef06a2a384be 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -363,8 +363,8 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_SETACL] = { .pc_func = nfsacld_proc_setacl, .pc_decode = nfsaclsvc_decode_setaclargs, - .pc_encode = nfsaclsvc_encode_attrstatres, - .pc_release = nfsaclsvc_release_attrstat, + .pc_encode = nfssvc_encode_attrstatres, + .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, From 6ef7a56fd7fa845fa13278dbb385a5f191659884 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 14:49:57 -0500 Subject: [PATCH 0531/1565] NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream [ Upstream commit 8d2009a10b3abaa12a39deb4876b215714993fe8 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 24 ++---------------------- 1 file changed, 2 insertions(+), 22 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index ef06a2a384be..c805ac8dd7e7 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -277,19 +277,6 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) return 1; } -static int nfsaclsvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) -{ - struct nfsd_attrstat *resp = rqstp->rq_resp; - - *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; - - p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); -out: - return xdr_ressize_check(rqstp, p); -} - /* ACCESS */ static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { @@ -317,13 +304,6 @@ static void nfsaclsvc_release_getacl(struct svc_rqst *rqstp) posix_acl_release(resp->acl_default); } -static void nfsaclsvc_release_attrstat(struct svc_rqst *rqstp) -{ - struct nfsd_attrstat *resp = rqstp->rq_resp; - - fh_put(&resp->fh); -} - static void nfsaclsvc_release_access(struct svc_rqst *rqstp) { struct nfsd3_accessres *resp = rqstp->rq_resp; @@ -374,8 +354,8 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfsaclsvc_encode_attrstatres, - .pc_release = nfsaclsvc_release_attrstat, + .pc_encode = nfssvc_encode_attrstatres, + .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, From 3d2033a58c6c9de62d98dd7bf5b93acd40903b5f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 14:52:09 -0500 Subject: [PATCH 0532/1565] NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream [ Upstream commit 07f5c2963c04b11603e9667f89bb430c132e9cc1 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index c805ac8dd7e7..8703326fc165 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -280,16 +280,21 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) /* ACCESS */ static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp; - *p++ = resp->status; - if (resp->status != nfs_ok) - goto out; + if (!svcxdr_encode_stat(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) + return 0; + if (xdr_stream_encode_u32(xdr, resp->access) < 0) + return 0; + break; + } - p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); - *p++ = htonl(resp->access); -out: - return xdr_ressize_check(rqstp, p); + return 1; } /* From 67d4f36707ade27001eb50412bab24422bcc0cc4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 15 Nov 2020 14:31:42 -0500 Subject: [PATCH 0533/1565] NFSD: Clean up after updating NFSv2 ACL encoders [ Upstream commit 83d0b84572775a29f800de67a1b9b642a5376bc3 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsxdr.c | 64 ------------------------------------------------ fs/nfsd/xdr.h | 1 - 2 files changed, 65 deletions(-) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 1fed3a8deb18..b800cfefcab7 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -201,64 +201,6 @@ svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -static __be32 * -encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, - struct kstat *stat) -{ - struct user_namespace *userns = nfsd_user_namespace(rqstp); - struct dentry *dentry = fhp->fh_dentry; - int type; - struct timespec64 time; - u32 f; - - type = (stat->mode & S_IFMT); - - *p++ = htonl(nfs_ftypes[type >> 12]); - *p++ = htonl((u32) stat->mode); - *p++ = htonl((u32) stat->nlink); - *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); - *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); - - if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) { - *p++ = htonl(NFS_MAXPATHLEN); - } else { - *p++ = htonl((u32) stat->size); - } - *p++ = htonl((u32) stat->blksize); - if (S_ISCHR(type) || S_ISBLK(type)) - *p++ = htonl(new_encode_dev(stat->rdev)); - else - *p++ = htonl(0xffffffff); - *p++ = htonl((u32) stat->blocks); - switch (fsid_source(fhp)) { - default: - case FSIDSOURCE_DEV: - *p++ = htonl(new_encode_dev(stat->dev)); - break; - case FSIDSOURCE_FSID: - *p++ = htonl((u32) fhp->fh_export->ex_fsid); - break; - case FSIDSOURCE_UUID: - f = ((u32*)fhp->fh_export->ex_uuid)[0]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[1]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[2]; - f ^= ((u32*)fhp->fh_export->ex_uuid)[3]; - *p++ = htonl(f); - break; - } - *p++ = htonl((u32) stat->ino); - *p++ = htonl((u32) stat->atime.tv_sec); - *p++ = htonl(stat->atime.tv_nsec ? stat->atime.tv_nsec / 1000 : 0); - time = stat->mtime; - lease_get_mtime(d_inode(dentry), &time); - *p++ = htonl((u32) time.tv_sec); - *p++ = htonl(time.tv_nsec ? time.tv_nsec / 1000 : 0); - *p++ = htonl((u32) stat->ctime.tv_sec); - *p++ = htonl(stat->ctime.tv_nsec ? stat->ctime.tv_nsec / 1000 : 0); - - return p; -} - /** * svcxdr_encode_fattr - Encode NFSv2 file attributes * @rqstp: Context of a completed RPC transaction @@ -328,12 +270,6 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -/* Helper function for NFSv2 ACL code */ -__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) -{ - return encode_fattr(rqstp, p, fhp, stat); -} - /* * XDR decode functions */ diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 8bcdc37398ab..c67ad02b9a02 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -168,7 +168,6 @@ void nfssvc_release_diropres(struct svc_rqst *rqstp); void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ -__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status); bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, From d505e66191072748620fc0af038cea4e4da0e3cd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 16:11:42 -0500 Subject: [PATCH 0534/1565] NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream [ Upstream commit 20798dfe249a01ad1b12eec7dbc572db5003244a ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 33 +++++++++++++++++++-------------- fs/nfsd/nfs3xdr.c | 23 +++++++++++++++++++++-- fs/nfsd/xdr3.h | 3 +++ 3 files changed, 43 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index a568b842e9eb..04e157b0b201 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -166,22 +166,25 @@ static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) /* GETACL */ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; + struct kvec *head = rqstp->rq_res.head; + struct inode *inode = d_inode(dentry); + unsigned int base; + int n; + int w; - *p++ = resp->status; - p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); - if (resp->status == 0 && dentry && d_really_is_positive(dentry)) { - struct inode *inode = d_inode(dentry); - struct kvec *head = rqstp->rq_res.head; - unsigned int base; - int n; - int w; - - *p++ = htonl(resp->mask); - if (!xdr_ressize_check(rqstp, p)) + if (!svcxdr_encode_nfsstat3(xdr, resp->status)) + return 0; + switch (resp->status) { + case nfs_ok: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; - base = (char *)p - (char *)head->iov_base; + if (xdr_stream_encode_u32(xdr, resp->mask) < 0) + return 0; + + base = (char *)xdr->p - (char *)head->iov_base; rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, @@ -202,9 +205,11 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) NFS_ACL_DEFAULT); if (n <= 0) return 0; - } else - if (!xdr_ressize_check(rqstp, p)) + break; + default: + if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; + } return 1; } diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 646bbfc5b779..941740a97f8f 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -107,7 +107,16 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return true; } -static bool +/** + * svcxdr_encode_nfsstat3 - Encode an NFSv3 status code + * @xdr: XDR stream + * @status: status value to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) { __be32 *p; @@ -464,7 +473,17 @@ svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) return svcxdr_encode_wcc_attr(xdr, fhp); } -static bool +/** + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes + * @rqstp: Context of a completed RPC transaction + * @xdr: XDR stream + * @fhp: File handle to encode + * + * Return values: + * %false: Send buffer space was exhausted + * %true: Success + */ +bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, const struct svc_fh *fhp) { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index b851458373db..746c5f79964f 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -308,5 +308,8 @@ int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, __be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); +bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status); +bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, + const struct svc_fh *fhp); #endif /* _LINUX_NFSD_XDR3_H */ From 857a37235cf0d2c5ee999b2f5f31dffdbdb209ae Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 18 Nov 2020 16:21:24 -0500 Subject: [PATCH 0535/1565] NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream [ Upstream commit 15e432bf0cfd1e6aebfa9ffd4e0cc2ff4f3ae2db ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 04e157b0b201..cfb686f23e57 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -217,11 +217,11 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) /* SETACL */ static int nfs3svc_encode_setaclres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; - *p++ = resp->status; - p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_nfsstat3(xdr, resp->status) && + svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh); } /* From a6d9f6f371cb016ccccbf4f04f47074ab507eada Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 15 Nov 2020 15:09:16 -0500 Subject: [PATCH 0536/1565] NFSD: Clean up after updating NFSv3 ACL encoders [ Upstream commit 1416f435303d81070c6bcf5a4a9b4ed0f7a9f013 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 86 ----------------------------------------------- fs/nfsd/xdr3.h | 2 -- 2 files changed, 88 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 941740a97f8f..fcfa0d611b93 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -48,13 +48,6 @@ static const u32 nfs3_ftypes[] = { * Basic NFSv3 data types (RFC 1813 Sections 2.5 and 2.6) */ -static __be32 * -encode_time3(__be32 *p, struct timespec64 *time) -{ - *p++ = htonl((u32) time->tv_sec); *p++ = htonl(time->tv_nsec); - return p; -} - static __be32 * encode_nfstime3(__be32 *p, const struct timespec64 *time) { @@ -396,54 +389,6 @@ svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) -{ - u64 f; - switch(fsid_source(fhp)) { - default: - case FSIDSOURCE_DEV: - p = xdr_encode_hyper(p, (u64)huge_encode_dev - (fhp->fh_dentry->d_sb->s_dev)); - break; - case FSIDSOURCE_FSID: - p = xdr_encode_hyper(p, (u64) fhp->fh_export->ex_fsid); - break; - case FSIDSOURCE_UUID: - f = ((u64*)fhp->fh_export->ex_uuid)[0]; - f ^= ((u64*)fhp->fh_export->ex_uuid)[1]; - p = xdr_encode_hyper(p, f); - break; - } - return p; -} - -static __be32 * -encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, - struct kstat *stat) -{ - struct user_namespace *userns = nfsd_user_namespace(rqstp); - *p++ = htonl(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); - *p++ = htonl((u32) (stat->mode & S_IALLUGO)); - *p++ = htonl((u32) stat->nlink); - *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); - *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); - if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) { - p = xdr_encode_hyper(p, (u64) NFS3_MAXPATHLEN); - } else { - p = xdr_encode_hyper(p, (u64) stat->size); - } - p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); - *p++ = htonl((u32) MAJOR(stat->rdev)); - *p++ = htonl((u32) MINOR(stat->rdev)); - p = encode_fsid(p, fhp); - p = xdr_encode_hyper(p, stat->ino); - p = encode_time3(p, &stat->atime); - p = encode_time3(p, &stat->mtime); - p = encode_time3(p, &stat->ctime); - - return p; -} - static bool svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) { @@ -512,37 +457,6 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, return xdr_stream_encode_item_absent(xdr) > 0; } -/* - * Encode post-operation attributes. - * The inode may be NULL if the call failed because of a stale file - * handle. In this case, no attributes are returned. - */ -static __be32 * -encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - struct dentry *dentry = fhp->fh_dentry; - if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) { - __be32 err; - struct kstat stat; - - err = fh_getattr(fhp, &stat); - if (!err) { - *p++ = xdr_one; /* attributes follow */ - lease_get_mtime(d_inode(dentry), &stat.mtime); - return encode_fattr3(rqstp, p, fhp, &stat); - } - } - *p++ = xdr_zero; - return p; -} - -/* Helper for NFSv3 ACLs */ -__be32 * -nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) -{ - return encode_post_op_attr(rqstp, p, fhp); -} - /* * Encode weak cache consistency data */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 746c5f79964f..933008382bbe 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -305,8 +305,6 @@ int nfs3svc_encode_entry3(void *data, const char *name, int namlen, int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, loff_t offset, u64 ino, unsigned int d_type); /* Helper functions for NFSv3 ACL code */ -__be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, - struct svc_fh *fhp); bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status); bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, From e7dac943b4d4ad8fd9a171bfbf73ca161fd75157 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 5 Mar 2021 13:57:40 -0500 Subject: [PATCH 0537/1565] NFSD: Add a tracepoint to record directory entry encoding [ Upstream commit 6019ce0742ca55d3e45279a19b07d1542747a098 ] Enable watching the progress of directory encoding to capture the timing of any issues with reading or encoding a directory. The new tracepoint captures dirent encoding for all NFS versions. For example, here's what a few NFSv4 directory entries might look like: nfsd-989 [002] 468.596265: nfsd_dirent: fh_hash=0x5d162594 ino=2 name=. nfsd-989 [002] 468.596267: nfsd_dirent: fh_hash=0x5d162594 ino=1 name=.. nfsd-989 [002] 468.596299: nfsd_dirent: fh_hash=0x5d162594 ino=3827 name=zlib.c nfsd-989 [002] 468.596325: nfsd_dirent: fh_hash=0x5d162594 ino=3811 name=xdiff nfsd-989 [002] 468.596351: nfsd_dirent: fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h nfsd-989 [002] 468.596377: nfsd_dirent: fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 24 ++++++++++++++++++++++++ fs/nfsd/vfs.c | 9 ++++++--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2bc2a888f7fa..d20ab767ba26 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -391,6 +391,30 @@ DEFINE_EVENT(nfsd_err_class, nfsd_##name, \ DEFINE_NFSD_ERR_EVENT(read_err); DEFINE_NFSD_ERR_EVENT(write_err); +TRACE_EVENT(nfsd_dirent, + TP_PROTO(struct svc_fh *fhp, + u64 ino, + const char *name, + int namlen), + TP_ARGS(fhp, ino, name, namlen), + TP_STRUCT__entry( + __field(u32, fh_hash) + __field(u64, ino) + __field(int, len) + __dynamic_array(unsigned char, name, namlen) + ), + TP_fast_assign( + __entry->fh_hash = fhp ? knfsd_fh_hash(&fhp->fh_handle) : 0; + __entry->ino = ino; + __entry->len = namlen; + memcpy(__get_str(name), name, namlen); + __assign_str(name, name); + ), + TP_printk("fh_hash=0x%08x ino=%llu name=%.*s", + __entry->fh_hash, __entry->ino, + __entry->len, __get_str(name)) +) + #include "state.h" #include "filecache.h" #include "vfs.h" diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index d12c3e71ca10..520e55c35e74 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1979,8 +1979,9 @@ static int nfsd_buffered_filldir(struct dir_context *ctx, const char *name, return 0; } -static __be32 nfsd_buffered_readdir(struct file *file, nfsd_filldir_t func, - struct readdir_cd *cdp, loff_t *offsetp) +static __be32 nfsd_buffered_readdir(struct file *file, struct svc_fh *fhp, + nfsd_filldir_t func, struct readdir_cd *cdp, + loff_t *offsetp) { struct buffered_dirent *de; int host_err; @@ -2026,6 +2027,8 @@ static __be32 nfsd_buffered_readdir(struct file *file, nfsd_filldir_t func, if (cdp->err != nfs_ok) break; + trace_nfsd_dirent(fhp, de->ino, de->name, de->namlen); + reclen = ALIGN(sizeof(*de) + de->namlen, sizeof(u64)); size -= reclen; @@ -2073,7 +2076,7 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp, goto out_close; } - err = nfsd_buffered_readdir(file, func, cdp, offsetp); + err = nfsd_buffered_readdir(file, fhp, func, cdp, offsetp); if (err == nfserr_eof || err == nfserr_toosmall) err = nfs_ok; /* can still be found in ->err */ From aab7be2475d1e49c9e83e6b7cef3a0edfe7ce199 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 5 Mar 2021 14:22:32 -0500 Subject: [PATCH 0538/1565] NFSD: Clean up NFSDDBG_FACILITY macro [ Upstream commit 219a170502b3d597c52eeec088aee8fbf7b90da5 ] These are no longer needed because there are no dprintk() call sites in these files. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 3 --- fs/nfsd/nfsxdr.c | 2 -- 2 files changed, 5 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index fcfa0d611b93..0a5ebc52e6a9 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -14,9 +14,6 @@ #include "netns.h" #include "vfs.h" -#define NFSDDBG_FACILITY NFSDDBG_XDR - - /* * Force construction of an empty post-op attr */ diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index b800cfefcab7..a06c05fe3b42 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -9,8 +9,6 @@ #include "xdr.h" #include "auth.h" -#define NFSDDBG_FACILITY NFSDDBG_XDR - /* * Mapping of S_IF* types to NFS file types */ From 0fa20162bfc7f3b72faed475d0c4a083d859d4e3 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 2 Mar 2021 10:46:23 -0500 Subject: [PATCH 0539/1565] nfsd: helper for laundromat expiry calculations [ Upstream commit 7f7e7a4006f74b031718055a0751c70c2e3d5e7e ] We do this same logic repeatedly, and it's easy to get the sense of the comparison wrong. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 49 +++++++++++++++++++++++++-------------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 914f60cee322..0afc14b3e459 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5385,6 +5385,22 @@ static bool clients_still_reclaiming(struct nfsd_net *nn) return true; } +struct laundry_time { + time64_t cutoff; + time64_t new_timeo; +}; + +static bool state_expired(struct laundry_time *lt, time64_t last_refresh) +{ + time64_t time_remaining; + + if (last_refresh < lt->cutoff) + return true; + time_remaining = last_refresh - lt->cutoff; + lt->new_timeo = min(lt->new_timeo, time_remaining); + return false; +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5394,14 +5410,16 @@ nfs4_laundromat(struct nfsd_net *nn) struct nfs4_ol_stateid *stp; struct nfsd4_blocked_lock *nbl; struct list_head *pos, *next, reaplist; - time64_t cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease; - time64_t t, new_timeo = nn->nfsd4_lease; + struct laundry_time lt = { + .cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease, + .new_timeo = nn->nfsd4_lease + }; struct nfs4_cpntf_state *cps; copy_stateid_t *cps_t; int i; if (clients_still_reclaiming(nn)) { - new_timeo = 0; + lt.new_timeo = 0; goto out; } nfsd4_end_grace(nn); @@ -5411,7 +5429,7 @@ nfs4_laundromat(struct nfsd_net *nn) idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { cps = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); if (cps->cp_stateid.sc_type == NFS4_COPYNOTIFY_STID && - cps->cpntf_time < cutoff) + state_expired(<, cps->cpntf_time)) _free_cpntf_state_locked(nn, cps); } spin_unlock(&nn->s2s_cp_lock); @@ -5419,11 +5437,8 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { clp = list_entry(pos, struct nfs4_client, cl_lru); - if (clp->cl_time > cutoff) { - t = clp->cl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, clp->cl_time)) break; - } if (mark_client_expired_locked(clp)) { trace_nfsd_clid_expired(&clp->cl_clientid); continue; @@ -5440,11 +5455,8 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); - if (dp->dl_time > cutoff) { - t = dp->dl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, dp->dl_time)) break; - } WARN_ON(!unhash_delegation_locked(dp)); list_add(&dp->dl_recall_lru, &reaplist); } @@ -5460,11 +5472,8 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->close_lru)) { oo = list_first_entry(&nn->close_lru, struct nfs4_openowner, oo_close_lru); - if (oo->oo_time > cutoff) { - t = oo->oo_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, oo->oo_time)) break; - } list_del_init(&oo->oo_close_lru); stp = oo->oo_last_closed_stid; oo->oo_last_closed_stid = NULL; @@ -5490,11 +5499,8 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->blocked_locks_lru)) { nbl = list_first_entry(&nn->blocked_locks_lru, struct nfsd4_blocked_lock, nbl_lru); - if (nbl->nbl_time > cutoff) { - t = nbl->nbl_time - cutoff; - new_timeo = min(new_timeo, t); + if (!state_expired(<, nbl->nbl_time)) break; - } list_move(&nbl->nbl_lru, &reaplist); list_del_init(&nbl->nbl_list); } @@ -5507,8 +5513,7 @@ nfs4_laundromat(struct nfsd_net *nn) free_blocked_lock(nbl); } out: - new_timeo = max_t(time64_t, new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); - return new_timeo; + return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); } static struct workqueue_struct *laundry_wq; From 2a5df97ba41c5b519b9d46b439b1a9927d59e9ff Mon Sep 17 00:00:00 2001 From: Paul Menzel Date: Fri, 12 Mar 2021 22:03:00 +0100 Subject: [PATCH 0540/1565] nfsd: Log client tracking type log message as info instead of warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit f988a7b71d1e66e63f79cd59c763875347943a7a ] `printk()`, by default, uses the log level warning, which leaves the user reading NFSD: Using UMH upcall client tracking operations. wondering what to do about it (`dmesg --level=warn`). Several client tracking methods are tried, and expected to fail. That’s why a message is printed only on success. It might be interesting for users to know the chosen method, so use info-level instead of debug-level. Cc: linux-nfs@vger.kernel.org Signed-off-by: Paul Menzel Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4recover.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 83c4e6883953..d08c1a8c9254 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -626,7 +626,7 @@ nfsd4_legacy_tracking_init(struct net *net) status = nfsd4_load_reboot_recovery_data(net); if (status) goto err; - printk("NFSD: Using legacy client tracking operations.\n"); + pr_info("NFSD: Using legacy client tracking operations.\n"); return 0; err: @@ -1030,7 +1030,7 @@ nfsd4_init_cld_pipe(struct net *net) status = __nfsd4_init_cld_pipe(net); if (!status) - printk("NFSD: Using old nfsdcld client tracking operations.\n"); + pr_info("NFSD: Using old nfsdcld client tracking operations.\n"); return status; } @@ -1607,7 +1607,7 @@ nfsd4_cld_tracking_init(struct net *net) nfs4_release_reclaim(nn); goto err_remove; } else - printk("NFSD: Using nfsdcld client tracking operations.\n"); + pr_info("NFSD: Using nfsdcld client tracking operations.\n"); return 0; err_remove: @@ -1866,7 +1866,7 @@ nfsd4_umh_cltrack_init(struct net *net) ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL); kfree(grace_start); if (!ret) - printk("NFSD: Using UMH upcall client tracking operations.\n"); + pr_info("NFSD: Using UMH upcall client tracking operations.\n"); return ret; } From ed01819390647a4fed22b56611bf0e2da6d27c66 Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Thu, 18 Mar 2021 21:22:21 +0100 Subject: [PATCH 0541/1565] nfsd: Fix typo "accesible" [ Upstream commit 34a624931b8c12b435b5009edc5897e4630107bc ] Trivial fix. Cc: linux-nfs@vger.kernel.org Signed-off-by: Ricardo Ribalda Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/Kconfig | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index d6cff5fbe705..5fa38ad9e7e3 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -99,7 +99,7 @@ config NFSD_BLOCKLAYOUT help This option enables support for the exporting pNFS block layouts in the kernel's NFS server. The pNFS block layout enables NFS - clients to directly perform I/O to block devices accesible to both + clients to directly perform I/O to block devices accessible to both the server and the clients. See RFC 5663 for more details. If unsure, say N. @@ -113,7 +113,7 @@ config NFSD_SCSILAYOUT help This option enables support for the exporting pNFS SCSI layouts in the kernel's NFS server. The pNFS SCSI layout enables NFS - clients to directly perform I/O to SCSI devices accesible to both + clients to directly perform I/O to SCSI devices accessible to both the server and the clients. See draft-ietf-nfsv4-scsi-layout for more details. @@ -127,7 +127,7 @@ config NFSD_FLEXFILELAYOUT This option enables support for the exporting pNFS Flex File layouts in the kernel's NFS server. The pNFS Flex File layout enables NFS clients to directly perform I/O to NFSv3 devices - accesible to both the server and the clients. See + accessible to both the server and the clients. See draft-ietf-nfsv4-flex-files for more details. Warning, this server implements the bare minimum functionality From 1f76b1e65926a460cf782e187defed6f482b5f15 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 18 Mar 2021 20:03:23 -0400 Subject: [PATCH 0542/1565] nfsd: COPY with length 0 should copy to end of file [ Upstream commit 792a5112aa90e59c048b601c6382fe3498d75db7 ] >From https://tools.ietf.org/html/rfc7862#page-65 A count of 0 (zero) requests that all bytes from ca_src_offset through EOF be copied to the destination. Reported-by: Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 2fba0808d975..949d9cedef5d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1380,6 +1380,9 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; + /* See RFC 7862 p.67: */ + if (bytes_total == 0) + bytes_total = ULLONG_MAX; do { if (kthread_should_stop()) break; From 14b13e0603f83b724da37e7eac2462fcd19d3ada Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 18 Mar 2021 20:03:22 -0400 Subject: [PATCH 0543/1565] nfsd: don't ignore high bits of copy count [ Upstream commit e7a833e9cc6c3b58fe94f049d2b40943cba07086 ] Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined by the protocol to be 64 bit, so we could be turning a large copy into a 0-length copy here. Reported-by: Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 949d9cedef5d..f85958f81a26 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1376,7 +1376,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) struct file *dst = copy->nf_dst->nf_file; struct file *src = copy->nf_src->nf_file; ssize_t bytes_copied = 0; - size_t bytes_total = copy->cp_count; + u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; From 289adc864d0a0e754d9773887ce3c1493b76132b Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 20 Mar 2021 09:38:04 +1100 Subject: [PATCH 0544/1565] nfsd: report client confirmation status in "info" file [ Upstream commit 472d155a0631bd1a09b5c0c275a254e65605d683 ] mountd can now monitor clients appearing and disappearing in /proc/fs/nfsd/clients, and will log these events, in liu of the logging of mount/unmount events for NFSv3. Currently it cannot distinguish between unconfirmed clients (which might be transient and totally uninteresting) and confirmed clients. So add a "status: " line which reports either "confirmed" or "unconfirmed", and use fsnotify to report that the info file has been modified. This requires a bit of infrastructure to keep the dentry for the "info" file. There is no need to take a counted reference as the dentry must remain around until the client is removed. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 19 +++++++++++++++---- fs/nfsd/nfsctl.c | 14 ++++++++------ fs/nfsd/nfsd.h | 4 +++- fs/nfsd/state.h | 4 ++++ 4 files changed, 30 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0afc14b3e459..104d56363654 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -2370,6 +2371,10 @@ static int client_info_show(struct seq_file *m, void *v) memcpy(&clid, &clp->cl_clientid, sizeof(clid)); seq_printf(m, "clientid: 0x%llx\n", clid); seq_printf(m, "address: \"%pISpc\"\n", (struct sockaddr *)&clp->cl_addr); + if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) + seq_puts(m, "status: confirmed\n"); + else + seq_puts(m, "status: unconfirmed\n"); seq_printf(m, "name: "); seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len); seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion); @@ -2724,6 +2729,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, int ret; struct net *net = SVC_NET(rqstp); struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct dentry *dentries[ARRAY_SIZE(client_files)]; clp = alloc_client(name); if (clp == NULL) @@ -2743,9 +2749,11 @@ static struct nfs4_client *create_client(struct xdr_netobj name, memcpy(&clp->cl_addr, sa, sizeof(struct sockaddr_storage)); clp->cl_cb_session = NULL; clp->net = net; - clp->cl_nfsd_dentry = nfsd_client_mkdir(nn, &clp->cl_nfsdfs, - clp->cl_clientid.cl_id - nn->clientid_base, - client_files); + clp->cl_nfsd_dentry = nfsd_client_mkdir( + nn, &clp->cl_nfsdfs, + clp->cl_clientid.cl_id - nn->clientid_base, + client_files, dentries); + clp->cl_nfsd_info_dentry = dentries[0]; if (!clp->cl_nfsd_dentry) { free_client(clp); return NULL; @@ -2820,7 +2828,10 @@ move_to_confirmed(struct nfs4_client *clp) list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags); + if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags) && + clp->cl_nfsd_dentry && + clp->cl_nfsd_info_dentry) + fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); renew_client_locked(clp); } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index fa060f91df1b..147ed8995a79 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1270,7 +1270,8 @@ static void nfsdfs_remove_files(struct dentry *root) /* XXX: cut'n'paste from simple_fill_super; figure out if we could share * code instead. */ static int nfsdfs_create_files(struct dentry *root, - const struct tree_descr *files) + const struct tree_descr *files, + struct dentry **fdentries) { struct inode *dir = d_inode(root); struct inode *inode; @@ -1279,8 +1280,6 @@ static int nfsdfs_create_files(struct dentry *root, inode_lock(dir); for (i = 0; files->name && files->name[0]; i++, files++) { - if (!files->name) - continue; dentry = d_alloc_name(root, files->name); if (!dentry) goto out; @@ -1294,6 +1293,8 @@ static int nfsdfs_create_files(struct dentry *root, inode->i_private = __get_nfsdfs_client(dir); d_add(dentry, inode); fsnotify_create(dir, dentry); + if (fdentries) + fdentries[i] = dentry; } inode_unlock(dir); return 0; @@ -1305,8 +1306,9 @@ static int nfsdfs_create_files(struct dentry *root, /* on success, returns positive number unique to that client. */ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, - const struct tree_descr *files) + struct nfsdfs_client *ncl, u32 id, + const struct tree_descr *files, + struct dentry **fdentries) { struct dentry *dentry; char name[11]; @@ -1317,7 +1319,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, dentry = nfsd_mkdir(nn->nfsd_client_dir, ncl, name); if (IS_ERR(dentry)) /* XXX: tossing errors? */ return NULL; - ret = nfsdfs_create_files(dentry, files); + ret = nfsdfs_create_files(dentry, files, fdentries); if (ret) { nfsd_client_rmdir(dentry); return NULL; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 27c1308ffc2b..14dbfa75059d 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -106,7 +106,9 @@ struct nfsdfs_client { struct nfsdfs_client *get_nfsdfs_client(struct inode *); struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, const struct tree_descr *); + struct nfsdfs_client *ncl, u32 id, + const struct tree_descr *, + struct dentry **fdentries); void nfsd_client_rmdir(struct dentry *dentry); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 73deea353169..54cab651ac1d 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -371,6 +371,10 @@ struct nfs4_client { /* debugging info directory under nfsd/clients/ : */ struct dentry *cl_nfsd_dentry; + /* 'info' file within that directory. Ref is not counted, + * but will remain valid iff cl_nfsd_dentry != NULL + */ + struct dentry *cl_nfsd_info_dentry; /* for nfs41 callbacks */ /* We currently support a single back channel with a single slot */ From 117dac268d805a7f3990caab43ab62529686865a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 29 Jan 2021 13:04:04 -0500 Subject: [PATCH 0545/1565] SUNRPC: Export svc_xprt_received() [ Upstream commit 7dcfbd86adc45f6d6b37278efd22530cf80ab474 ] Prepare svc_xprt_received() to be called from transport code instead of from generic RPC server code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc_xprt.h | 1 + include/trace/events/sunrpc.h | 1 + net/sunrpc/svc_xprt.c | 13 +++++++++---- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 92455e0d5244..c5278871f9e4 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -130,6 +130,7 @@ void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, int svc_create_xprt(struct svc_serv *, const char *, struct net *, const int, const unsigned short, int, const struct cred *); +void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 200978b94a0b..657138ab1f91 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1756,6 +1756,7 @@ DECLARE_EVENT_CLASS(svc_xprt_event, ), \ TP_ARGS(xprt)) +DEFINE_SVC_XPRT_EVENT(received); DEFINE_SVC_XPRT_EVENT(no_write_space); DEFINE_SVC_XPRT_EVENT(close); DEFINE_SVC_XPRT_EVENT(detach); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 06e503466c32..d150e25b4d45 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -233,21 +233,25 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl, return xprt; } -/* - * svc_xprt_received conditionally queues the transport for processing - * by another thread. The caller must hold the XPT_BUSY bit and must +/** + * svc_xprt_received - start next receiver thread + * @xprt: controlling transport + * + * The caller must hold the XPT_BUSY bit and must * not thereafter touch transport data. * * Note: XPT_DATA only gets cleared when a read-attempt finds no (or * insufficient) data. */ -static void svc_xprt_received(struct svc_xprt *xprt) +void svc_xprt_received(struct svc_xprt *xprt) { if (!test_bit(XPT_BUSY, &xprt->xpt_flags)) { WARN_ONCE(1, "xprt=0x%p already busy!", xprt); return; } + trace_svc_xprt_received(xprt); + /* As soon as we clear busy, the xprt could be closed and * 'put', so we need a reference to call svc_enqueue_xprt with: */ @@ -257,6 +261,7 @@ static void svc_xprt_received(struct svc_xprt *xprt) xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); svc_xprt_put(xprt); } +EXPORT_SYMBOL_GPL(svc_xprt_received); void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) { From b20d88bf1eab5f33103457f535dad686c849b325 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Tue, 23 Mar 2021 17:48:58 -0500 Subject: [PATCH 0546/1565] UAPI: nfsfh.h: Replace one-element array with flexible-array member MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c0a744dcaa29e9537e8607ae9c965ad936124a4d ] There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged: $ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o struct nfs_fhbase_new { union { struct { __u8 fb_version_aux; /* 0 1 */ __u8 fb_auth_type_aux; /* 1 1 */ __u8 fb_fsid_type_aux; /* 2 1 */ __u8 fb_fileid_type_aux; /* 3 1 */ __u32 fb_auth[1]; /* 4 4 */ }; /* 0 8 */ struct { __u8 fb_version; /* 0 1 */ __u8 fb_auth_type; /* 1 1 */ __u8 fb_fsid_type; /* 2 1 */ __u8 fb_fileid_type; /* 3 1 */ __u32 fb_auth_flex[0]; /* 4 0 */ }; /* 0 4 */ }; /* 0 8 */ /* size: 8, cachelines: 1, members: 1 */ /* last cacheline: 8 bytes */ }; Also, this helps with the ongoing efforts to enable -Warray-bounds by fixing the following warnings: fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’: fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ~~~~~~~~~~~^~~ ./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’ 12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi)) | ^~ ./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’ 40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x)) | ^~~~~~~~ ./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’ 136 | #define ___ntohl(x) __be32_to_cpu(x) | ^~~~~~~~~~~~~ ./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’ 140 | #define ntohl(x) ___ntohl(x) | ^~~~~~~~ fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’ 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ^~~~~ fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~ fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~ [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/uapi/linux/nfsd/nfsfh.h | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h index ff0ca88b1c8f..427294dd56a1 100644 --- a/include/uapi/linux/nfsd/nfsfh.h +++ b/include/uapi/linux/nfsd/nfsfh.h @@ -64,13 +64,24 @@ struct nfs_fhbase_old { * in include/linux/exportfs.h for currently registered values. */ struct nfs_fhbase_new { - __u8 fb_version; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type; - __u8 fb_fsid_type; - __u8 fb_fileid_type; - __u32 fb_auth[1]; -/* __u32 fb_fsid[0]; floating */ -/* __u32 fb_fileid[0]; floating */ + union { + struct { + __u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + __u8 fb_auth_type_aux; + __u8 fb_fsid_type_aux; + __u8 fb_fileid_type_aux; + __u32 fb_auth[1]; + /* __u32 fb_fsid[0]; floating */ + /* __u32 fb_fileid[0]; floating */ + }; + struct { + __u8 fb_version; /* == 1, even => nfs_fhbase_old */ + __u8 fb_auth_type; + __u8 fb_fsid_type; + __u8 fb_fileid_type; + __u32 fb_auth_flex[]; /* flexible-array member */ + }; + }; }; struct knfsd_fh { @@ -97,7 +108,7 @@ struct knfsd_fh { #define fh_fsid_type fh_base.fh_new.fb_fsid_type #define fh_auth_type fh_base.fh_new.fb_auth_type #define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth +#define fh_fsid fh_base.fh_new.fb_auth_flex /* Do not use, provided for userspace compatiblity. */ #define fh_auth fh_base.fh_new.fb_auth From d9168ab8d714e7931438088dba3eb79c5a04152d Mon Sep 17 00:00:00 2001 From: Guobin Huang Date: Tue, 6 Apr 2021 20:08:18 +0800 Subject: [PATCH 0547/1565] NFSD: Use DEFINE_SPINLOCK() for spinlock [ Upstream commit b73ac6808b0f7994a05ebc38571e2e9eaf98a0f4 ] spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Reported-by: Hulk Robot Signed-off-by: Guobin Huang Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 0666ef4b87b7..79bc75d41522 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -84,7 +84,7 @@ DEFINE_MUTEX(nfsd_mutex); * version 4.1 DRC caches. * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage. */ -spinlock_t nfsd_drc_lock; +DEFINE_SPINLOCK(nfsd_drc_lock); unsigned long nfsd_drc_max_mem; unsigned long nfsd_drc_mem_used; @@ -563,7 +563,6 @@ static void set_max_drc(void) nfsd_drc_max_mem = (nr_free_buffer_pages() >> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE; nfsd_drc_mem_used = 0; - spin_lock_init(&nfsd_drc_lock); dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem); } From 62b7f3847373a48568162419f3a2250bce048756 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 12:48:22 +0200 Subject: [PATCH 0548/1565] fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue [ Upstream commit 6f73171e192366ff7c98af9fb50615ef9615f8a7 ] Current code has an assumtion that fsnotify_notify_queue_is_empty() is called to verify that queue is not empty before trying to peek or remove an event from queue. Remove this assumption by moving the fsnotify_notify_queue_is_empty() into the functions, allow them to return NULL value and check return value by all callers. This is a prep patch for multi event queues. Link: https://lore.kernel.org/r/20210304104826.3993892-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 26 ++++++++++------ fs/notify/inotify/inotify_user.c | 5 ++- fs/notify/notification.c | 50 ++++++++++++++---------------- include/linux/fsnotify_backend.h | 8 ++++- 4 files changed, 48 insertions(+), 41 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 829ead2792df..30651e8d1a6d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -100,24 +100,30 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, { size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; + struct fsnotify_event *fsn_event; unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); pr_debug("%s: group=%p count=%zd\n", __func__, group, count); spin_lock(&group->notification_lock); - if (fsnotify_notify_queue_is_empty(group)) + fsn_event = fsnotify_peek_first_event(group); + if (!fsn_event) goto out; - if (fid_mode) { - event_size += fanotify_event_info_len(fid_mode, - FANOTIFY_E(fsnotify_peek_first_event(group))); - } + event = FANOTIFY_E(fsn_event); + if (fid_mode) + event_size += fanotify_event_info_len(fid_mode, event); if (event_size > count) { event = ERR_PTR(-EINVAL); goto out; } - event = FANOTIFY_E(fsnotify_remove_first_event(group)); + + /* + * Held the notification_lock the whole time, so this is the + * same event we peeked above. + */ + fsnotify_remove_first_event(group); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; out: @@ -573,6 +579,7 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t static int fanotify_release(struct inode *ignored, struct file *file) { struct fsnotify_group *group = file->private_data; + struct fsnotify_event *fsn_event; /* * Stop new events from arriving in the notification queue. since @@ -601,13 +608,12 @@ static int fanotify_release(struct inode *ignored, struct file *file) * dequeue them and set the response. They will be freed once the * response is consumed and fanotify_get_response() returns. */ - while (!fsnotify_notify_queue_is_empty(group)) { - struct fanotify_event *event; + while ((fsn_event = fsnotify_remove_first_event(group))) { + struct fanotify_event *event = FANOTIFY_E(fsn_event); - event = FANOTIFY_E(fsnotify_remove_first_event(group)); if (!(event->mask & FANOTIFY_PERM_EVENTS)) { spin_unlock(&group->notification_lock); - fsnotify_destroy_event(group, &event->fse); + fsnotify_destroy_event(group, fsn_event); } else { finish_permission_event(group, FANOTIFY_PERM(event), FAN_ALLOW); diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 82fc0cf86a7c..c2018983832e 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -146,10 +146,9 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group, size_t event_size = sizeof(struct inotify_event); struct fsnotify_event *event; - if (fsnotify_notify_queue_is_empty(group)) - return NULL; - event = fsnotify_peek_first_event(group); + if (!event) + return NULL; pr_debug("%s: group=%p event=%p\n", __func__, group, event); diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 75d79d6d3ef0..001cfe7d2e4e 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -47,13 +47,6 @@ u32 fsnotify_get_cookie(void) } EXPORT_SYMBOL_GPL(fsnotify_get_cookie); -/* return true if the notify queue is empty, false otherwise */ -bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) -{ - assert_spin_locked(&group->notification_lock); - return list_empty(&group->notification_list) ? true : false; -} - void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event) { @@ -141,35 +134,38 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group, } /* - * Remove and return the first event from the notification list. It is the - * responsibility of the caller to destroy the obtained event - */ -struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group) -{ - struct fsnotify_event *event; - - assert_spin_locked(&group->notification_lock); - - pr_debug("%s: group=%p\n", __func__, group); - - event = list_first_entry(&group->notification_list, - struct fsnotify_event, list); - fsnotify_remove_queued_event(group, event); - return event; -} - -/* - * This will not remove the event, that must be done with - * fsnotify_remove_first_event() + * Return the first event on the notification list without removing it. + * Returns NULL if the list is empty. */ struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group) { assert_spin_locked(&group->notification_lock); + if (fsnotify_notify_queue_is_empty(group)) + return NULL; + return list_first_entry(&group->notification_list, struct fsnotify_event, list); } +/* + * Remove and return the first event from the notification list. It is the + * responsibility of the caller to destroy the obtained event + */ +struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group) +{ + struct fsnotify_event *event = fsnotify_peek_first_event(group); + + if (!event) + return NULL; + + pr_debug("%s: group=%p event=%p\n", __func__, group, event); + + fsnotify_remove_queued_event(group, event); + + return event; +} + /* * Called when a group is being torn down to clean up any outstanding * event notifications. diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index e5409b83e731..7eb979bfc141 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -495,7 +495,13 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) fsnotify_add_event(group, group->overflow_event, NULL); } -/* true if the group notification queue is empty */ +static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) +{ + assert_spin_locked(&group->notification_lock); + + return list_empty(&group->notification_list); +} + extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group); /* return, but do not dequeue the first event on the notification queue */ extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group); From 4f14948942937eeb2f4c363f5912d143049cdf26 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 7 Mar 2024 09:22:43 -0500 Subject: [PATCH 0549/1565] Revert "fanotify: limit number of event merge attempts" Temporarily revert commit ad3ea16746cc ("fanotify: limit number of event merge attempts") to enable subsequent upstream commits to apply and build cleanly. Stable-dep-of: 8988f11abb82 ("fanotify: reduce event objectid to 29-bit hash") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c3af99e94f1d..1192c9953620 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -129,15 +129,11 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, return false; } -/* Limit event merges to limit CPU overhead per event */ -#define FANOTIFY_MAX_MERGE_EVENTS 128 - /* and the list better be locked by something too! */ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { struct fsnotify_event *test_event; struct fanotify_event *new; - int i = 0; pr_debug("%s: list=%p event=%p\n", __func__, list, event); new = FANOTIFY_E(event); @@ -151,8 +147,6 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) return 0; list_for_each_entry_reverse(test_event, list, list) { - if (++i > FANOTIFY_MAX_MERGE_EVENTS) - break; if (fanotify_should_merge(test_event, event)) { FANOTIFY_E(test_event)->mask |= new->mask; return 1; From 5b57a2b74d01c6f2c3698697f7beedd0ccf25f12 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 12:48:23 +0200 Subject: [PATCH 0550/1565] fanotify: reduce event objectid to 29-bit hash [ Upstream commit 8988f11abb820bacfcc53d498370bfb30f792ec4 ] objectid is only used by fanotify backend and it is just an optimization for event merge before comparing all fields in event. Move the objectid member from common struct fsnotify_event into struct fanotify_event and reduce it to 29-bit hash to cram it together with the 3-bit event type. Events of different types are never merged, so the combination of event type and hash form a 32-bit key for fast compare of events. This reduces the size of events by one pointer and paves the way for adding hashed queue support for fanotify. Link: https://lore.kernel.org/r/20210304104826.3993892-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 25 ++++++++++++------------- fs/notify/fanotify/fanotify.h | 16 +++++++++++++--- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/inotify/inotify_user.c | 2 +- include/linux/fsnotify_backend.h | 5 +---- 5 files changed, 28 insertions(+), 22 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1192c9953620..8a2bb6954e02 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -88,16 +88,12 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, return fanotify_info_equal(info1, info2); } -static bool fanotify_should_merge(struct fsnotify_event *old_fsn, - struct fsnotify_event *new_fsn) +static bool fanotify_should_merge(struct fanotify_event *old, + struct fanotify_event *new) { - struct fanotify_event *old, *new; + pr_debug("%s: old=%p new=%p\n", __func__, old, new); - pr_debug("%s: old=%p new=%p\n", __func__, old_fsn, new_fsn); - old = FANOTIFY_E(old_fsn); - new = FANOTIFY_E(new_fsn); - - if (old_fsn->objectid != new_fsn->objectid || + if (old->hash != new->hash || old->type != new->type || old->pid != new->pid) return false; @@ -133,10 +129,9 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn, static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { struct fsnotify_event *test_event; - struct fanotify_event *new; + struct fanotify_event *old, *new = FANOTIFY_E(event); pr_debug("%s: list=%p event=%p\n", __func__, list, event); - new = FANOTIFY_E(event); /* * Don't merge a permission event with any other event so that we know @@ -147,8 +142,9 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) return 0; list_for_each_entry_reverse(test_event, list, list) { - if (fanotify_should_merge(test_event, event)) { - FANOTIFY_E(test_event)->mask |= new->mask; + old = FANOTIFY_E(test_event); + if (fanotify_should_merge(old, new)) { + old->mask |= new->mask; return 1; } } @@ -533,6 +529,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct mem_cgroup *old_memcg; struct inode *child = NULL; bool name_event = false; + unsigned int hash = 0; if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* @@ -600,8 +597,10 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, * Use the victim inode instead of the watching inode as the id for * event queue, so event reported on parent is merged with event * reported on child when both directory and child watches exist. + * Hash object id for queue merge. */ - fanotify_init_event(event, (unsigned long)id, mask); + hash = hash_ptr(id, FANOTIFY_EVENT_HASH_BITS); + fanotify_init_event(event, hash, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) event->pid = get_pid(task_pid(current)); else diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 896c819a1786..d531f0cfa46f 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -135,19 +135,29 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ + __FANOTIFY_EVENT_TYPE_NUM }; +#define FANOTIFY_EVENT_TYPE_BITS \ + (ilog2(__FANOTIFY_EVENT_TYPE_NUM - 1) + 1) +#define FANOTIFY_EVENT_HASH_BITS \ + (32 - FANOTIFY_EVENT_TYPE_BITS) + struct fanotify_event { struct fsnotify_event fse; u32 mask; - enum fanotify_event_type type; + struct { + unsigned int type : FANOTIFY_EVENT_TYPE_BITS; + unsigned int hash : FANOTIFY_EVENT_HASH_BITS; + }; struct pid *pid; }; static inline void fanotify_init_event(struct fanotify_event *event, - unsigned long id, u32 mask) + unsigned int hash, u32 mask) { - fsnotify_init_event(&event->fse, id); + fsnotify_init_event(&event->fse); + event->hash = hash; event->mask = mask; event->pid = NULL; } diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 66991c7fef9e..e2b124c0081d 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -114,7 +114,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, mask &= ~IN_ISDIR; fsn_event = &event->fse; - fsnotify_init_event(fsn_event, 0); + fsnotify_init_event(fsn_event); event->mask = mask; event->wd = wd; event->sync_cookie = cookie; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index c2018983832e..62cd91bc00b8 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -641,7 +641,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) return ERR_PTR(-ENOMEM); } group->overflow_event = &oevent->fse; - fsnotify_init_event(group->overflow_event, 0); + fsnotify_init_event(group->overflow_event); oevent->mask = FS_Q_OVERFLOW; oevent->wd = -1; oevent->sync_cookie = 0; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 7eb979bfc141..fc98f9f88d12 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -167,7 +167,6 @@ struct fsnotify_ops { */ struct fsnotify_event { struct list_head list; - unsigned long objectid; /* identifier for queue merges */ }; /* @@ -582,11 +581,9 @@ extern void fsnotify_put_mark(struct fsnotify_mark *mark); extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info); extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info); -static inline void fsnotify_init_event(struct fsnotify_event *event, - unsigned long objectid) +static inline void fsnotify_init_event(struct fsnotify_event *event) { INIT_LIST_HEAD(&event->list); - event->objectid = objectid; } #else From ae7fd89daeb695636f9453207c7a61f1b62ce38d Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 12:48:24 +0200 Subject: [PATCH 0551/1565] fanotify: mix event info and pid into merge key hash [ Upstream commit 7e3e5c6943994943eb76cab2d3a1806bc10b9045 ] Improve the merge key hash by mixing more values relevant for merge. For example, all FAN_CREATE name events in the same dir used to have the same merge key based on the dir inode. With this change the created file name is mixed into the merge key. The object id that was used as merge key is redundant to the event info so it is no longer mixed into the hash. Permission events are not hashed, so no need to hash their info. Link: https://lore.kernel.org/r/20210304104826.3993892-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 87 ++++++++++++++++++++++++----------- fs/notify/fanotify/fanotify.h | 5 ++ 2 files changed, 66 insertions(+), 26 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 8a2bb6954e02..43a606f15370 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "fanotify.h" @@ -22,12 +23,24 @@ static bool fanotify_path_equal(struct path *p1, struct path *p2) return p1->mnt == p2->mnt && p1->dentry == p2->dentry; } +static unsigned int fanotify_hash_path(const struct path *path) +{ + return hash_ptr(path->dentry, FANOTIFY_EVENT_HASH_BITS) ^ + hash_ptr(path->mnt, FANOTIFY_EVENT_HASH_BITS); +} + static inline bool fanotify_fsid_equal(__kernel_fsid_t *fsid1, __kernel_fsid_t *fsid2) { return fsid1->val[0] == fsid2->val[0] && fsid1->val[1] == fsid2->val[1]; } +static unsigned int fanotify_hash_fsid(__kernel_fsid_t *fsid) +{ + return hash_32(fsid->val[0], FANOTIFY_EVENT_HASH_BITS) ^ + hash_32(fsid->val[1], FANOTIFY_EVENT_HASH_BITS); +} + static bool fanotify_fh_equal(struct fanotify_fh *fh1, struct fanotify_fh *fh2) { @@ -38,6 +51,16 @@ static bool fanotify_fh_equal(struct fanotify_fh *fh1, !memcmp(fanotify_fh_buf(fh1), fanotify_fh_buf(fh2), fh1->len); } +static unsigned int fanotify_hash_fh(struct fanotify_fh *fh) +{ + long salt = (long)fh->type | (long)fh->len << 8; + + /* + * full_name_hash() works long by long, so it handles fh buf optimally. + */ + return full_name_hash((void *)salt, fanotify_fh_buf(fh), fh->len); +} + static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1, struct fanotify_fid_event *ffe2) { @@ -325,7 +348,8 @@ static int fanotify_encode_fh_len(struct inode *inode) * Return 0 on failure to encode. */ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, - unsigned int fh_len, gfp_t gfp) + unsigned int fh_len, unsigned int *hash, + gfp_t gfp) { int dwords, type = 0; char *ext_buf = NULL; @@ -368,6 +392,9 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len; + /* Mix fh into event merge key */ + *hash ^= fanotify_hash_fh(fh); + return FANOTIFY_FH_HDR_LEN + fh_len; out_err: @@ -421,6 +448,7 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, } static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, + unsigned int *hash, gfp_t gfp) { struct fanotify_path_event *pevent; @@ -431,6 +459,7 @@ static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH; pevent->path = *path; + *hash ^= fanotify_hash_path(path); path_get(path); return &pevent->fae; @@ -456,6 +485,7 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path, static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, __kernel_fsid_t *fsid, + unsigned int *hash, gfp_t gfp) { struct fanotify_fid_event *ffe; @@ -466,16 +496,18 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; ffe->fsid = *fsid; + *hash ^= fanotify_hash_fsid(fsid); fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), - gfp); + hash, gfp); return &ffe->fae; } static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, __kernel_fsid_t *fsid, - const struct qstr *file_name, + const struct qstr *name, struct inode *child, + unsigned int *hash, gfp_t gfp) { struct fanotify_name_event *fne; @@ -488,24 +520,30 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; - if (file_name) - size += file_name->len + 1; + if (name) + size += name->len + 1; fne = kmalloc(size, gfp); if (!fne) return NULL; fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME; fne->fsid = *fsid; + *hash ^= fanotify_hash_fsid(fsid); info = &fne->info; fanotify_info_init(info); dfh = fanotify_info_dir_fh(info); - info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, 0); + info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, hash, 0); if (child_fh_len) { ffh = fanotify_info_file_fh(info); - info->file_fh_totlen = fanotify_encode_fh(ffh, child, child_fh_len, 0); + info->file_fh_totlen = fanotify_encode_fh(ffh, child, + child_fh_len, hash, 0); + } + if (name) { + long salt = name->len; + + fanotify_info_copy_name(info, name); + *hash ^= full_name_hash((void *)salt, name->name, name->len); } - if (file_name) - fanotify_info_copy_name(info, file_name); pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", __func__, id->i_ino, size, dir_fh_len, child_fh_len, @@ -530,6 +568,8 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *child = NULL; bool name_event = false; unsigned int hash = 0; + bool ondir = mask & FAN_ONDIR; + struct pid *pid; if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* @@ -537,8 +577,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, * report the child fid for events reported on a non-dir child * in addition to reporting the parent fid and maybe child name. */ - if ((fid_mode & FAN_REPORT_FID) && - id != dirid && !(mask & FAN_ONDIR)) + if ((fid_mode & FAN_REPORT_FID) && id != dirid && !ondir) child = id; id = dirid; @@ -559,8 +598,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (!(fid_mode & FAN_REPORT_NAME)) { name_event = !!child; file_name = NULL; - } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || - !(mask & FAN_ONDIR)) { + } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || !ondir) { name_event = true; } } @@ -583,28 +621,25 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event = fanotify_alloc_perm_event(path, gfp); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, - gfp); + &hash, gfp); } else if (fid_mode) { - event = fanotify_alloc_fid_event(id, fsid, gfp); + event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); } else { - event = fanotify_alloc_path_event(path, gfp); + event = fanotify_alloc_path_event(path, &hash, gfp); } if (!event) goto out; - /* - * Use the victim inode instead of the watching inode as the id for - * event queue, so event reported on parent is merged with event - * reported on child when both directory and child watches exist. - * Hash object id for queue merge. - */ - hash = hash_ptr(id, FANOTIFY_EVENT_HASH_BITS); - fanotify_init_event(event, hash, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) - event->pid = get_pid(task_pid(current)); + pid = get_pid(task_pid(current)); else - event->pid = get_pid(task_tgid(current)); + pid = get_pid(task_tgid(current)); + + /* Mix event info, FAN_ONDIR flag and pid into event merge key */ + hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS); + fanotify_init_event(event, hash, mask); + event->pid = pid; out: set_active_memcg(old_memcg); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index d531f0cfa46f..9871f76cd9c2 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -115,6 +115,11 @@ static inline void fanotify_info_init(struct fanotify_info *info) info->name_len = 0; } +static inline unsigned int fanotify_info_len(struct fanotify_info *info) +{ + return info->dir_fh_totlen + info->file_fh_totlen + info->name_len; +} + static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { From 40e1e98c1bb24318e2287bdb949fac45e746608b Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 12:48:25 +0200 Subject: [PATCH 0552/1565] fsnotify: use hash table for faster events merge [ Upstream commit 94e00d28a680dff18805ca472b191364347d2234 ] In order to improve event merge performance, hash events in a 128 size hash table by the event merge key. The fanotify_event size grows by two pointers, but we just reduced its size by removing the objectid member, so overall its size is increased by one pointer. Permission events and overflow event are not merged so they are also not hashed. Link: https://lore.kernel.org/r/20210304104826.3993892-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 40 +++++++++++++++++++++++----- fs/notify/fanotify/fanotify.h | 25 +++++++++++++++++ fs/notify/fanotify/fanotify_user.c | 39 +++++++++++++++++++++++++++ fs/notify/inotify/inotify_fsnotify.c | 7 ++--- fs/notify/notification.c | 22 ++++++++++----- include/linux/fsnotify_backend.h | 10 ++++--- 6 files changed, 123 insertions(+), 20 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 43a606f15370..50b3abc06215 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -149,12 +149,15 @@ static bool fanotify_should_merge(struct fanotify_event *old, } /* and the list better be locked by something too! */ -static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) +static int fanotify_merge(struct fsnotify_group *group, + struct fsnotify_event *event) { - struct fsnotify_event *test_event; struct fanotify_event *old, *new = FANOTIFY_E(event); + unsigned int bucket = fanotify_event_hash_bucket(group, new); + struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; - pr_debug("%s: list=%p event=%p\n", __func__, list, event); + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, bucket); /* * Don't merge a permission event with any other event so that we know @@ -164,8 +167,7 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) if (fanotify_is_perm_event(new->mask)) return 0; - list_for_each_entry_reverse(test_event, list, list) { - old = FANOTIFY_E(test_event); + hlist_for_each_entry(old, hlist, merge_list) { if (fanotify_should_merge(old, new)) { old->mask |= new->mask; return 1; @@ -203,8 +205,11 @@ static int fanotify_get_response(struct fsnotify_group *group, return ret; } /* Event not yet reported? Just remove it. */ - if (event->state == FAN_EVENT_INIT) + if (event->state == FAN_EVENT_INIT) { fsnotify_remove_queued_event(group, &event->fae.fse); + /* Permission events are not supposed to be hashed */ + WARN_ON_ONCE(!hlist_unhashed(&event->fae.merge_list)); + } /* * Event may be also answered in case signal delivery raced * with wakeup. In that case we have nothing to do besides @@ -679,6 +684,24 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) return fsid; } +/* + * Add an event to hash table for faster merge. + */ +static void fanotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) +{ + struct fanotify_event *event = FANOTIFY_E(fsn_event); + unsigned int bucket = fanotify_event_hash_bucket(group, event); + struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; + + assert_spin_locked(&group->notification_lock); + + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, bucket); + + hlist_add_head(&event->merge_list, hlist); +} + static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, @@ -749,7 +772,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, } fsn_event = &event->fse; - ret = fsnotify_add_event(group, fsn_event, fanotify_merge); + ret = fsnotify_add_event(group, fsn_event, fanotify_merge, + fanotify_is_hashed_event(mask) ? + fanotify_insert_event : NULL); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); @@ -772,6 +797,7 @@ static void fanotify_free_group_priv(struct fsnotify_group *group) { struct user_struct *user; + kfree(group->fanotify_data.merge_hash); user = group->fanotify_data.user; atomic_dec(&user->fanotify_listeners); free_uid(user); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 9871f76cd9c2..4a5e555dc3d2 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -3,6 +3,7 @@ #include #include #include +#include extern struct kmem_cache *fanotify_mark_cache; extern struct kmem_cache *fanotify_fid_event_cachep; @@ -150,6 +151,7 @@ enum fanotify_event_type { struct fanotify_event { struct fsnotify_event fse; + struct hlist_node merge_list; /* List for hashed merge */ u32 mask; struct { unsigned int type : FANOTIFY_EVENT_TYPE_BITS; @@ -162,6 +164,7 @@ static inline void fanotify_init_event(struct fanotify_event *event, unsigned int hash, u32 mask) { fsnotify_init_event(&event->fse); + INIT_HLIST_NODE(&event->merge_list); event->hash = hash; event->mask = mask; event->pid = NULL; @@ -299,3 +302,25 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) else return NULL; } + +/* + * Use 128 size hash table to speed up events merge. + */ +#define FANOTIFY_HTABLE_BITS (7) +#define FANOTIFY_HTABLE_SIZE (1 << FANOTIFY_HTABLE_BITS) +#define FANOTIFY_HTABLE_MASK (FANOTIFY_HTABLE_SIZE - 1) + +/* + * Permission events and overflow event do not get merged - don't hash them. + */ +static inline bool fanotify_is_hashed_event(u32 mask) +{ + return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW); +} + +static inline unsigned int fanotify_event_hash_bucket( + struct fsnotify_group *group, + struct fanotify_event *event) +{ + return event->hash & FANOTIFY_HTABLE_MASK; +} diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 30651e8d1a6d..1456470d0c4c 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -89,6 +89,23 @@ static int fanotify_event_info_len(unsigned int fid_mode, return info_len; } +/* + * Remove an hashed event from merge hash table. + */ +static void fanotify_unhash_event(struct fsnotify_group *group, + struct fanotify_event *event) +{ + assert_spin_locked(&group->notification_lock); + + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, + group, event, fanotify_event_hash_bucket(group, event)); + + if (WARN_ON_ONCE(hlist_unhashed(&event->merge_list))) + return; + + hlist_del_init(&event->merge_list); +} + /* * Get an fanotify notification event if one exists and is small * enough to fit in "count". Return an error pointer if the count @@ -126,6 +143,8 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, fsnotify_remove_first_event(group); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; + if (fanotify_is_hashed_event(event->mask)) + fanotify_unhash_event(group, event); out: spin_unlock(&group->notification_lock); return event; @@ -925,6 +944,20 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void) return &oevent->fse; } +static struct hlist_head *fanotify_alloc_merge_hash(void) +{ + struct hlist_head *hash; + + hash = kmalloc(sizeof(struct hlist_head) << FANOTIFY_HTABLE_BITS, + GFP_KERNEL_ACCOUNT); + if (!hash) + return NULL; + + __hash_init(hash, FANOTIFY_HTABLE_SIZE); + + return hash; +} + /* fanotify syscalls */ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { @@ -993,6 +1026,12 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm); + group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); + if (!group->fanotify_data.merge_hash) { + fd = -ENOMEM; + goto out_destroy_group; + } + group->overflow_event = fanotify_alloc_overflow_event(); if (unlikely(!group->overflow_event)) { fd = -ENOMEM; diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index e2b124c0081d..b0530f75b274 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -46,9 +46,10 @@ static bool event_compare(struct fsnotify_event *old_fsn, return false; } -static int inotify_merge(struct list_head *list, - struct fsnotify_event *event) +static int inotify_merge(struct fsnotify_group *group, + struct fsnotify_event *event) { + struct list_head *list = &group->notification_list; struct fsnotify_event *last_event; last_event = list_entry(list->prev, struct fsnotify_event, list); @@ -122,7 +123,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, if (len) strcpy(event->name, name->name); - ret = fsnotify_add_event(group, fsn_event, inotify_merge); + ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL); if (ret) { /* Our event wasn't used in the end. Free it. */ fsnotify_destroy_event(group, fsn_event); diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 001cfe7d2e4e..32f45543b9c6 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -68,16 +68,22 @@ void fsnotify_destroy_event(struct fsnotify_group *group, } /* - * Add an event to the group notification queue. The group can later pull this - * event off the queue to deal with. The function returns 0 if the event was - * added to the queue, 1 if the event was merged with some other queued event, + * Try to add an event to the notification queue. + * The group can later pull this event off the queue to deal with. + * The group can use the @merge hook to merge the event with a queued event. + * The group can use the @insert hook to insert the event into hash table. + * The function returns: + * 0 if the event was added to a queue + * 1 if the event was merged with some other queued event * 2 if the event was not queued - either the queue of events has overflown - * or the group is shutting down. + * or the group is shutting down. */ int fsnotify_add_event(struct fsnotify_group *group, struct fsnotify_event *event, - int (*merge)(struct list_head *, - struct fsnotify_event *)) + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)) { int ret = 0; struct list_head *list = &group->notification_list; @@ -104,7 +110,7 @@ int fsnotify_add_event(struct fsnotify_group *group, } if (!list_empty(list) && merge) { - ret = merge(list, event); + ret = merge(group, event); if (ret) { spin_unlock(&group->notification_lock); return ret; @@ -114,6 +120,8 @@ int fsnotify_add_event(struct fsnotify_group *group, queue: group->q_len++; list_add_tail(&event->list, list); + if (insert) + insert(group, event); spin_unlock(&group->notification_lock); wake_up(&group->notification_waitq); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index fc98f9f88d12..63fb766f0f3e 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -233,6 +233,8 @@ struct fsnotify_group { #endif #ifdef CONFIG_FANOTIFY struct fanotify_group_private_data { + /* Hash table of events for merge */ + struct hlist_head *merge_hash; /* allows a group to block waiting for a userspace response */ struct list_head access_list; wait_queue_head_t access_waitq; @@ -486,12 +488,14 @@ extern void fsnotify_destroy_event(struct fsnotify_group *group, /* attach the event to the group notification queue */ extern int fsnotify_add_event(struct fsnotify_group *group, struct fsnotify_event *event, - int (*merge)(struct list_head *, - struct fsnotify_event *)); + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)); /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { - fsnotify_add_event(group, group->overflow_event, NULL); + fsnotify_add_event(group, group->overflow_event, NULL, NULL); } static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) From 7df80a90e1a115cfd0e684015ee3bc780999b8c6 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 12:48:26 +0200 Subject: [PATCH 0553/1565] fanotify: limit number of event merge attempts [ Upstream commit b8cd0ee8cda68a888a317991c1e918a8cba1a568 ] Event merges are expensive when event queue size is large, so limit the linear search to 128 merge tests. In combination with 128 size hash table, there is a potential to merge with up to 16K events in the hashed queue. Link: https://lore.kernel.org/r/20210304104826.3993892-6-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 50b3abc06215..754e27ead874 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -148,6 +148,9 @@ static bool fanotify_should_merge(struct fanotify_event *old, return false; } +/* Limit event merges to limit CPU overhead per event */ +#define FANOTIFY_MAX_MERGE_EVENTS 128 + /* and the list better be locked by something too! */ static int fanotify_merge(struct fsnotify_group *group, struct fsnotify_event *event) @@ -155,6 +158,7 @@ static int fanotify_merge(struct fsnotify_group *group, struct fanotify_event *old, *new = FANOTIFY_E(event); unsigned int bucket = fanotify_event_hash_bucket(group, new); struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; + int i = 0; pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, group, event, bucket); @@ -168,6 +172,8 @@ static int fanotify_merge(struct fsnotify_group *group, return 0; hlist_for_each_entry(old, hlist, merge_list) { + if (++i > FANOTIFY_MAX_MERGE_EVENTS) + break; if (fanotify_should_merge(old, new)) { old->mask |= new->mask; return 1; From 3e441a872a57003038b249f6260782ab629d3631 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 13:29:20 +0200 Subject: [PATCH 0554/1565] fanotify: configurable limits via sysfs [ Upstream commit 5b8fea65d197f408bb00b251c70d842826d6b70b ] fanotify has some hardcoded limits. The only APIs to escape those limits are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS. Allow finer grained tuning of the system limits via sysfs tunables under /proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify, with some minor differences. - max_queued_events - global system tunable for group queue size limit. Like the inotify tunable with the same name, it defaults to 16384 and applies on initialization of a new group. - max_user_marks - user ns tunable for marks limit per user. Like the inotify tunable named max_user_watches, on a machine with sufficient RAM and it defaults to 1048576 in init userns and can be further limited per containing user ns. - max_user_groups - user ns tunable for number of groups per user. Like the inotify tunable named max_user_instances, it defaults to 128 in init userns and can be further limited per containing user ns. The slightly different tunable names used for fanotify are derived from the "group" and "mark" terminology used in the fanotify man pages and throughout the code. Considering the fact that the default value for max_user_instances was increased in kernel v5.10 from 8192 to 1048576, leaving the legacy fanotify limit of 8192 marks per group in addition to the max_user_marks limit makes little sense, so the per group marks limit has been removed. Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own marks are not accounted in the per user marks account, so in effect the limit of max_user_marks is only for the collection of groups that are not initialized with FAN_UNLIMITED_MARKS. Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 16 ++-- fs/notify/fanotify/fanotify_user.c | 123 ++++++++++++++++++++++++----- fs/notify/group.c | 1 - fs/notify/mark.c | 4 - include/linux/fanotify.h | 3 + include/linux/fsnotify_backend.h | 6 +- include/linux/sched/user.h | 3 - include/linux/user_namespace.h | 4 + kernel/sysctl.c | 12 ++- kernel/ucount.c | 4 + 10 files changed, 137 insertions(+), 39 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 754e27ead874..057abd2cf887 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -801,12 +801,10 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, static void fanotify_free_group_priv(struct fsnotify_group *group) { - struct user_struct *user; - kfree(group->fanotify_data.merge_hash); - user = group->fanotify_data.user; - atomic_dec(&user->fanotify_listeners); - free_uid(user); + if (group->fanotify_data.ucounts) + dec_ucount(group->fanotify_data.ucounts, + UCOUNT_FANOTIFY_GROUPS); } static void fanotify_free_path_event(struct fanotify_event *event) @@ -862,6 +860,13 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event) } } +static void fanotify_freeing_mark(struct fsnotify_mark *mark, + struct fsnotify_group *group) +{ + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) + dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS); +} + static void fanotify_free_mark(struct fsnotify_mark *fsn_mark) { kmem_cache_free(fanotify_mark_cache, fsn_mark); @@ -871,5 +876,6 @@ const struct fsnotify_ops fanotify_fsnotify_ops = { .handle_event = fanotify_handle_event, .free_group_priv = fanotify_free_group_priv, .free_event = fanotify_free_event, + .freeing_mark = fanotify_freeing_mark, .free_mark = fanotify_free_mark, }; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 1456470d0c4c..74b4da6354e1 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -27,8 +27,61 @@ #include "fanotify.h" #define FANOTIFY_DEFAULT_MAX_EVENTS 16384 -#define FANOTIFY_DEFAULT_MAX_MARKS 8192 -#define FANOTIFY_DEFAULT_MAX_LISTENERS 128 +#define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192 +#define FANOTIFY_DEFAULT_MAX_GROUPS 128 + +/* + * Legacy fanotify marks limits (8192) is per group and we introduced a tunable + * limit of marks per user, similar to inotify. Effectively, the legacy limit + * of fanotify marks per user is * . + * This default limit (1M) also happens to match the increased limit of inotify + * max_user_watches since v5.10. + */ +#define FANOTIFY_DEFAULT_MAX_USER_MARKS \ + (FANOTIFY_OLD_DEFAULT_MAX_MARKS * FANOTIFY_DEFAULT_MAX_GROUPS) + +/* + * Most of the memory cost of adding an inode mark is pinning the marked inode. + * The size of the filesystem inode struct is not uniform across filesystems, + * so double the size of a VFS inode is used as a conservative approximation. + */ +#define INODE_MARK_COST (2 * sizeof(struct inode)) + +/* configurable via /proc/sys/fs/fanotify/ */ +static int fanotify_max_queued_events __read_mostly; + +#ifdef CONFIG_SYSCTL + +#include + +struct ctl_table fanotify_table[] = { + { + .procname = "max_user_groups", + .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS], + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + }, + { + .procname = "max_user_marks", + .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS], + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + }, + { + .procname = "max_queued_events", + .data = &fanotify_max_queued_events, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO + }, + { } +}; +#endif /* CONFIG_SYSCTL */ /* * All flags that may be specified in parameter event_f_flags of fanotify_init. @@ -847,24 +900,38 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, unsigned int type, __kernel_fsid_t *fsid) { + struct ucounts *ucounts = group->fanotify_data.ucounts; struct fsnotify_mark *mark; int ret; - if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks) + /* + * Enforce per user marks limits per user in all containing user ns. + * A group with FAN_UNLIMITED_MARKS does not contribute to mark count + * in the limited groups account. + */ + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS) && + !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS)) return ERR_PTR(-ENOSPC); mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL); - if (!mark) - return ERR_PTR(-ENOMEM); + if (!mark) { + ret = -ENOMEM; + goto out_dec_ucounts; + } fsnotify_init_mark(mark, group); ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid); if (ret) { fsnotify_put_mark(mark); - return ERR_PTR(ret); + goto out_dec_ucounts; } return mark; + +out_dec_ucounts: + if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) + dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS); + return ERR_PTR(ret); } @@ -963,7 +1030,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { struct fsnotify_group *group; int f_flags, fd; - struct user_struct *user; unsigned int fid_mode = flags & FANOTIFY_FID_BITS; unsigned int class = flags & FANOTIFY_CLASS_BITS; @@ -1002,12 +1068,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL; - user = get_current_user(); - if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { - free_uid(user); - return -EMFILE; - } - f_flags = O_RDWR | FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; @@ -1017,13 +1077,19 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); if (IS_ERR(group)) { - free_uid(user); return PTR_ERR(group); } - group->fanotify_data.user = user; + /* Enforce groups limits per user in all containing user ns */ + group->fanotify_data.ucounts = inc_ucount(current_user_ns(), + current_euid(), + UCOUNT_FANOTIFY_GROUPS); + if (!group->fanotify_data.ucounts) { + fd = -EMFILE; + goto out_destroy_group; + } + group->fanotify_data.flags = flags; - atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm); group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); @@ -1064,16 +1130,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) goto out_destroy_group; group->max_events = UINT_MAX; } else { - group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS; + group->max_events = fanotify_max_queued_events; } if (flags & FAN_UNLIMITED_MARKS) { fd = -EPERM; if (!capable(CAP_SYS_ADMIN)) goto out_destroy_group; - group->fanotify_data.max_marks = UINT_MAX; - } else { - group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS; } if (flags & FAN_ENABLE_AUDIT) { @@ -1375,6 +1438,21 @@ SYSCALL32_DEFINE6(fanotify_mark, */ static int __init fanotify_user_setup(void) { + struct sysinfo si; + int max_marks; + + si_meminfo(&si); + /* + * Allow up to 1% of addressable memory to be accounted for per user + * marks limited to the range [8192, 1048576]. mount and sb marks are + * a lot cheaper than inode marks, but there is no reason for a user + * to have many of those, so calculate by the cost of inode marks. + */ + max_marks = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / + INODE_MARK_COST; + max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS, + FANOTIFY_DEFAULT_MAX_USER_MARKS); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); @@ -1389,6 +1467,11 @@ static int __init fanotify_user_setup(void) KMEM_CACHE(fanotify_perm_event, SLAB_PANIC); } + fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS; + init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] = + FANOTIFY_DEFAULT_MAX_GROUPS; + init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks; + return 0; } device_initcall(fanotify_user_setup); diff --git a/fs/notify/group.c b/fs/notify/group.c index ffd723ffe46d..fb89c351295d 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -122,7 +122,6 @@ static struct fsnotify_group *__fsnotify_alloc_group( /* set to 0 when there a no external references to this group */ refcount_set(&group->refcnt, 1); - atomic_set(&group->num_marks, 0); atomic_set(&group->user_waits, 0); spin_lock_init(&group->notification_lock); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 5b44be5f93dd..7af98a7c33c2 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -391,8 +391,6 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark) list_del_init(&mark->g_list); spin_unlock(&mark->lock); - atomic_dec(&group->num_marks); - /* Drop mark reference acquired in fsnotify_add_mark_locked() */ fsnotify_put_mark(mark); } @@ -656,7 +654,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED; list_add(&mark->g_list, &group->marks_list); - atomic_inc(&group->num_marks); fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock); @@ -674,7 +671,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, FSNOTIFY_MARK_FLAG_ATTACHED); list_del_init(&mark->g_list); spin_unlock(&mark->lock); - atomic_dec(&group->num_marks); fsnotify_put_mark(mark); return ret; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 3e9c56ee651f..031a97d8369a 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -2,8 +2,11 @@ #ifndef _LINUX_FANOTIFY_H #define _LINUX_FANOTIFY_H +#include #include +extern struct ctl_table fanotify_table[]; /* for sysctl */ + #define FAN_GROUP_FLAG(group, flag) \ ((group)->fanotify_data.flags & (flag)) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 63fb766f0f3e..1ce66748a2d2 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -206,9 +206,6 @@ struct fsnotify_group { /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ - atomic_t num_marks; /* 1 for each mark and 1 for not being - * past the point of no return when freeing - * a group */ atomic_t user_waits; /* Number of tasks waiting for user * response */ struct list_head marks_list; /* all inode marks for this group */ @@ -240,8 +237,7 @@ struct fsnotify_group { wait_queue_head_t access_waitq; int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ - unsigned int max_marks; - struct user_struct *user; + struct ucounts *ucounts; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index a8ec3b6093fc..3632c5d6ec55 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -14,9 +14,6 @@ struct user_struct { refcount_t __count; /* reference count */ atomic_t processes; /* How many processes does this user have? */ atomic_t sigpending; /* How many pending signals does this user have? */ -#ifdef CONFIG_FANOTIFY - atomic_t fanotify_listeners; -#endif #ifdef CONFIG_EPOLL atomic_long_t epoll_watches; /* The number of file descriptors currently watched */ #endif diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 7616c7bf4b24..0793951867ae 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -49,6 +49,10 @@ enum ucount_type { #ifdef CONFIG_INOTIFY_USER UCOUNT_INOTIFY_INSTANCES, UCOUNT_INOTIFY_WATCHES, +#endif +#ifdef CONFIG_FANOTIFY + UCOUNT_FANOTIFY_GROUPS, + UCOUNT_FANOTIFY_MARKS, #endif UCOUNT_COUNTS, }; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 99a19190196e..b29a9568ebbe 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -142,6 +142,9 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ); #ifdef CONFIG_INOTIFY_USER #include #endif +#ifdef CONFIG_FANOTIFY +#include +#endif #ifdef CONFIG_PROC_SYSCTL @@ -3330,7 +3333,14 @@ static struct ctl_table fs_table[] = { .mode = 0555, .child = inotify_table, }, -#endif +#endif +#ifdef CONFIG_FANOTIFY + { + .procname = "fanotify", + .mode = 0555, + .child = fanotify_table, + }, +#endif #ifdef CONFIG_EPOLL { .procname = "epoll", diff --git a/kernel/ucount.c b/kernel/ucount.c index 11b1596e2542..8d8874f1c35e 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -73,6 +73,10 @@ static struct ctl_table user_table[] = { #ifdef CONFIG_INOTIFY_USER UCOUNT_ENTRY("max_inotify_instances"), UCOUNT_ENTRY("max_inotify_watches"), +#endif +#ifdef CONFIG_FANOTIFY + UCOUNT_ENTRY("max_fanotify_groups"), + UCOUNT_ENTRY("max_fanotify_marks"), #endif { } }; From 4ac0ad23728ab4907f82636da867a7becb0a2b15 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 4 Mar 2021 13:29:21 +0200 Subject: [PATCH 0555/1565] fanotify: support limited functionality for unprivileged users [ Upstream commit 7cea2a3c505e87a9d6afc78be4a7f7be636a73a7 ] Add limited support for unprivileged fanotify groups. An unprivileged users is not allowed to get an open file descriptor in the event nor the process pid of another process. An unprivileged user cannot request permission events, cannot set mount/filesystem marks and cannot request unlimited queue/marks. This enables the limited functionality similar to inotify when watching a set of files and directories for OPEN/ACCESS/MODIFY/CLOSE events, without requiring SYS_CAP_ADMIN privileges. The FAN_REPORT_DFID_NAME init flag, provide a method for an unprivileged listener watching a set of directories (with FAN_EVENT_ON_CHILD) to monitor all changes inside those directories. This typically requires that the listener keeps a map of watched directory fid to dirfd (O_PATH), where fid is obtained with name_to_handle_at() before starting to watch for changes. When getting an event, the reported fid of the parent should be resolved to dirfd and fstatsat(2) with dirfd and name should be used to query the state of the filesystem entry. Link: https://lore.kernel.org/r/20210304112921.3996419-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 29 ++++++++++++++++++++++++-- fs/notify/fdinfo.c | 3 ++- include/linux/fanotify.h | 33 +++++++++++++++++++++++++----- 3 files changed, 57 insertions(+), 8 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 74b4da6354e1..842cccb4f749 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -419,6 +419,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, metadata.reserved = 0; metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; metadata.pid = pid_vnr(event->pid); + /* + * For an unprivileged listener, event->pid can be used to identify the + * events generated by the listener process itself, without disclosing + * the pids of other processes. + */ + if (!capable(CAP_SYS_ADMIN) && + task_tgid(current) != event->pid) + metadata.pid = 0; if (path && path->mnt && path->dentry) { fd = create_fd(group, path, &f); @@ -1036,8 +1044,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); - if (!capable(CAP_SYS_ADMIN)) - return -EPERM; + if (!capable(CAP_SYS_ADMIN)) { + /* + * An unprivileged user can setup an fanotify group with + * limited functionality - an unprivileged group is limited to + * notification events with file handles and it cannot use + * unlimited queue/marks. + */ + if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode) + return -EPERM; + } #ifdef CONFIG_AUDITSYSCALL if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT)) @@ -1306,6 +1322,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out; group = f.file->private_data; + /* + * An unprivileged user is not allowed to watch a mount point nor + * a filesystem. + */ + ret = -EPERM; + if (!capable(CAP_SYS_ADMIN) && + mark_type != FAN_MARK_INODE) + goto fput_and_out; + /* * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF. These are not * allowed to set permissions events. diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 765b50aeadd2..85b112bd8851 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -137,7 +137,8 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f) struct fsnotify_group *group = f->private_data; seq_printf(m, "fanotify flags:%x event-flags:%x\n", - group->fanotify_data.flags, group->fanotify_data.f_flags); + group->fanotify_data.flags, + group->fanotify_data.f_flags); show_fdinfo(m, f, fanotify_fdinfo); } diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 031a97d8369a..bad41bcb25df 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -18,15 +18,38 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * these constant, the programs may break if re-compiled with new uapi headers * and then run on an old kernel. */ -#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ + +/* Group classes where permission events are allowed */ +#define FANOTIFY_PERM_CLASSES (FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT) +#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES) + #define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) -#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ - FAN_REPORT_TID | \ - FAN_CLOEXEC | FAN_NONBLOCK | \ - FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS) +/* + * fanotify_init() flags that require CAP_SYS_ADMIN. + * We do not allow unprivileged groups to request permission events. + * We do not allow unprivileged groups to get other process pid in events. + * We do not allow unprivileged groups to use unlimited resources. + */ +#define FANOTIFY_ADMIN_INIT_FLAGS (FANOTIFY_PERM_CLASSES | \ + FAN_REPORT_TID | \ + FAN_UNLIMITED_QUEUE | \ + FAN_UNLIMITED_MARKS) + +/* + * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN. + * FAN_CLASS_NOTIF is the only class we allow for unprivileged group. + * We do not allow unprivileged groups to get file descriptors in events, + * so one of the flags for reporting file handles is required. + */ +#define FANOTIFY_USER_INIT_FLAGS (FAN_CLASS_NOTIF | \ + FANOTIFY_FID_BITS | \ + FAN_CLOEXEC | FAN_NONBLOCK) + +#define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \ + FANOTIFY_USER_INIT_FLAGS) #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM) From bd373a90d048093030c5ce376a87340ac42c28e2 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 25 Mar 2021 09:37:43 +0100 Subject: [PATCH 0556/1565] fanotify_user: use upper_32_bits() to verify mask [ Upstream commit 22d483b99863202e3631ff66fa0f3c2302c0f96f ] I don't see an obvious reason why the upper 32 bit check needs to be open-coded this way. Switch to upper_32_bits() which is more idiomatic and should conceptually be the same check. Cc: Amir Goldstein Cc: Jan Kara Link: https://lore.kernel.org/r/20210325083742.2334933-1-brauner@kernel.org Signed-off-by: Christian Brauner Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 842cccb4f749..98289ace66fa 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1268,7 +1268,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __func__, fanotify_fd, flags, dfd, pathname, mask); /* we only use the lower 32 bits as of right now. */ - if (mask & ((__u64)0xffffffff << 32)) + if (upper_32_bits(mask)) return -EINVAL; if (flags & ~FANOTIFY_MARK_FLAGS) From 856b0c4979c7d1f6a4c2608a2115fdf7edb81fff Mon Sep 17 00:00:00 2001 From: Jiapeng Chong Date: Thu, 15 Apr 2021 16:38:24 +0800 Subject: [PATCH 0557/1565] nfsd: remove unused function [ Upstream commit 363f8dd5eecd6c67fe9840ef6065440f0ee7df3a ] Fix the following clang warning: fs/nfsd/nfs4state.c:6276:1: warning: unused function 'end_offset' [-Wunused-function]. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 104d56363654..a42a505b3e41 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6336,15 +6336,6 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; } -static inline u64 -end_offset(u64 start, u64 len) -{ - u64 end; - - end = start + len; - return end >= start ? end: NFS4_MAX_UINT64; -} - /* last octet in a range */ static inline u64 last_byte_offset(u64 start, u64 len) From e63b956b2da90c59a476dd743179377d28581552 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Thu, 15 Apr 2021 15:00:58 +0300 Subject: [PATCH 0558/1565] nfsd: removed unused argument in nfsd_startup_generic() [ Upstream commit 70c5307564035c160078401f541c397d77b95415 ] Since commit 501cb1849f86 ("nfsd: rip out the raparms cache") nrservs is not used in nfsd_startup_generic() Signed-off-by: Vasily Averin Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 79bc75d41522..731d89898903 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -308,7 +308,7 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred) static int nfsd_users = 0; -static int nfsd_startup_generic(int nrservs) +static int nfsd_startup_generic(void) { int ret; @@ -374,7 +374,7 @@ void nfsd_reset_boot_verifier(struct nfsd_net *nn) write_sequnlock(&nn->boot_lock); } -static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cred) +static int nfsd_startup_net(struct net *net, const struct cred *cred) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); int ret; @@ -382,7 +382,7 @@ static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cre if (nn->nfsd_net_up) return 0; - ret = nfsd_startup_generic(nrservs); + ret = nfsd_startup_generic(); if (ret) return ret; ret = nfsd_init_socks(net, cred); @@ -790,7 +790,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) nfsd_up_before = nn->nfsd_net_up; - error = nfsd_startup_net(nrservs, net, cred); + error = nfsd_startup_net(net, cred); if (error) goto out_destroy; error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, From cc6d658669f8dc737affc88d4a06348f43627aa2 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 16 Apr 2021 14:00:15 -0400 Subject: [PATCH 0559/1565] nfsd: hash nfs4_files by inode number [ Upstream commit f9b60e2209213fdfcc504ba25a404977c5d08b77 ] The nfs4_file structure is per-filehandle, not per-inode, because the spec requires open and other state to be per filehandle. But it will turn out to be convenient for nfs4_files associated with the same inode to be hashed to the same bucket, so let's hash on the inode instead of the filehandle. Filehandle aliasing is rare, so that shouldn't have much performance impact. (If you have a ton of exported filesystems, though, and all of them have a root with inode number 2, could that get you an overlong hash chain? Perhaps this (and the v4 open file cache) should be hashed on the inode pointer instead.) Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 27 ++++++++++++--------------- fs/nfsd/state.h | 1 - 2 files changed, 12 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a42a505b3e41..8d2d6e90bfc5 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -554,14 +554,12 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) #define FILE_HASH_BITS 8 #define FILE_HASH_SIZE (1 << FILE_HASH_BITS) -static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh) +static unsigned int file_hashval(struct svc_fh *fh) { - return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0); -} + struct inode *inode = d_inode(fh->fh_dentry); -static unsigned int file_hashval(struct knfsd_fh *fh) -{ - return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1); + /* XXX: why not (here & in file cache) use inode? */ + return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS); } static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; @@ -4106,7 +4104,7 @@ static struct nfs4_file *nfsd4_alloc_file(void) } /* OPEN Share state helper functions */ -static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval, +static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, struct nfs4_file *fp) { lockdep_assert_held(&state_lock); @@ -4116,7 +4114,7 @@ static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval, INIT_LIST_HEAD(&fp->fi_stateids); INIT_LIST_HEAD(&fp->fi_delegations); INIT_LIST_HEAD(&fp->fi_clnt_odstate); - fh_copy_shallow(&fp->fi_fhandle, fh); + fh_copy_shallow(&fp->fi_fhandle, &fh->fh_handle); fp->fi_deleg_file = NULL; fp->fi_had_conflict = false; fp->fi_share_deny = 0; @@ -4460,13 +4458,13 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) /* search file_hashtbl[] for file */ static struct nfs4_file * -find_file_locked(struct knfsd_fh *fh, unsigned int hashval) +find_file_locked(struct svc_fh *fh, unsigned int hashval) { struct nfs4_file *fp; hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, fh)) { + if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { if (refcount_inc_not_zero(&fp->fi_ref)) return fp; } @@ -4474,8 +4472,7 @@ find_file_locked(struct knfsd_fh *fh, unsigned int hashval) return NULL; } -struct nfs4_file * -find_file(struct knfsd_fh *fh) +static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; unsigned int hashval = file_hashval(fh); @@ -4487,7 +4484,7 @@ find_file(struct knfsd_fh *fh) } static struct nfs4_file * -find_or_add_file(struct nfs4_file *new, struct knfsd_fh *fh) +find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) { struct nfs4_file *fp; unsigned int hashval = file_hashval(fh); @@ -4519,7 +4516,7 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) struct nfs4_file *fp; __be32 ret = nfs_ok; - fp = find_file(¤t_fh->fh_handle); + fp = find_file(current_fh); if (!fp) return ret; /* Check for conflicting share reservations */ @@ -5208,7 +5205,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * and check for delegations in the process of being recalled. * If not found, create the nfs4_file struct */ - fp = find_or_add_file(open->op_file, ¤t_fh->fh_handle); + fp = find_or_add_file(open->op_file, current_fh); if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 54cab651ac1d..61a2d95d7923 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -669,7 +669,6 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name struct xdr_netobj princhash, struct nfsd_net *nn); extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn); -struct nfs4_file *find_file(struct knfsd_fh *fh); void put_nfs4_file(struct nfs4_file *fi); extern void nfs4_put_copy(struct nfsd4_copy *copy); extern struct nfsd4_copy * From ee548b1629909626e56d4b90b1affad3310bc380 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 16 Apr 2021 14:00:16 -0400 Subject: [PATCH 0560/1565] nfsd: track filehandle aliasing in nfs4_files [ Upstream commit a0ce48375a367222989c2618fe68bf34db8c7bb7 ] It's unusual but possible for multiple filehandles to point to the same file. In that case, we may end up with multiple nfs4_files referencing the same inode. For delegation purposes it will turn out to be useful to flag those cases. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 37 ++++++++++++++++++++++++++++--------- fs/nfsd/state.h | 2 ++ 2 files changed, 30 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8d2d6e90bfc5..89d5669ce146 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4120,6 +4120,8 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, fp->fi_share_deny = 0; memset(fp->fi_fds, 0, sizeof(fp->fi_fds)); memset(fp->fi_access, 0, sizeof(fp->fi_access)); + fp->fi_aliased = false; + fp->fi_inode = d_inode(fh->fh_dentry); #ifdef CONFIG_NFSD_PNFS INIT_LIST_HEAD(&fp->fi_lo_states); atomic_set(&fp->fi_lo_recalls, 0); @@ -4472,6 +4474,31 @@ find_file_locked(struct svc_fh *fh, unsigned int hashval) return NULL; } +static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, + unsigned int hashval) +{ + struct nfs4_file *fp; + struct nfs4_file *ret = NULL; + bool alias_found = false; + + spin_lock(&state_lock); + hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { + if (refcount_inc_not_zero(&fp->fi_ref)) + ret = fp; + } else if (d_inode(fh->fh_dentry) == fp->fi_inode) + fp->fi_aliased = alias_found = true; + } + if (likely(ret == NULL)) { + nfsd4_init_file(fh, hashval, new); + new->fi_aliased = alias_found; + ret = new; + } + spin_unlock(&state_lock); + return ret; +} + static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; @@ -4495,15 +4522,7 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) if (fp) return fp; - spin_lock(&state_lock); - fp = find_file_locked(fh, hashval); - if (likely(fp == NULL)) { - nfsd4_init_file(fh, hashval, new); - fp = new; - } - spin_unlock(&state_lock); - - return fp; + return insert_file(new, fh, hashval); } /* diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 61a2d95d7923..e73bdbb1634a 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -516,6 +516,8 @@ struct nfs4_clnt_odstate { */ struct nfs4_file { refcount_t fi_ref; + struct inode * fi_inode; + bool fi_aliased; spinlock_t fi_lock; struct hlist_node fi_hash; /* hash on fi_fhandle */ struct list_head fi_stateids; From d2431cc9670a0b7dee8829d82ba1e960c0e9c6b7 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 16 Apr 2021 14:00:17 -0400 Subject: [PATCH 0561/1565] nfsd: reshuffle some code [ Upstream commit ebd9d2c2f5a7ebaaed2d7bb4dee148755f46033d ] No change in behavior, I'm just moving some code around to avoid forward references in a following patch. (To do someday: figure out how to split up nfs4state.c. It's big and disorganized.) Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 235 ++++++++++++++++++++++---------------------- 1 file changed, 118 insertions(+), 117 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 89d5669ce146..9c8dacf0f2b8 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -354,6 +354,124 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { .release = nfsd4_cb_notify_lock_release, }; +/* + * We store the NONE, READ, WRITE, and BOTH bits separately in the + * st_{access,deny}_bmap field of the stateid, in order to track not + * only what share bits are currently in force, but also what + * combinations of share bits previous opens have used. This allows us + * to enforce the recommendation of rfc 3530 14.2.19 that the server + * return an error if the client attempt to downgrade to a combination + * of share bits not explicable by closing some of its previous opens. + * + * XXX: This enforcement is actually incomplete, since we don't keep + * track of access/deny bit combinations; so, e.g., we allow: + * + * OPEN allow read, deny write + * OPEN allow both, deny none + * DOWNGRADE allow read, deny none + * + * which we should reject. + */ +static unsigned int +bmap_to_share_mode(unsigned long bmap) +{ + int i; + unsigned int access = 0; + + for (i = 1; i < 4; i++) { + if (test_bit(i, &bmap)) + access |= i; + } + return access; +} + +/* set share access for a given stateid */ +static inline void +set_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap |= mask; +} + +/* clear share access for a given stateid */ +static inline void +clear_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap &= ~mask; +} + +/* test whether a given stateid has access */ +static inline bool +test_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + return (bool)(stp->st_access_bmap & mask); +} + +/* set share deny for a given stateid */ +static inline void +set_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap |= mask; +} + +/* clear share deny for a given stateid */ +static inline void +clear_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap &= ~mask; +} + +/* test whether a given stateid is denying specific access */ +static inline bool +test_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + return (bool)(stp->st_deny_bmap & mask); +} + +static int nfs4_access_to_omode(u32 access) +{ + switch (access & NFS4_SHARE_ACCESS_BOTH) { + case NFS4_SHARE_ACCESS_READ: + return O_RDONLY; + case NFS4_SHARE_ACCESS_WRITE: + return O_WRONLY; + case NFS4_SHARE_ACCESS_BOTH: + return O_RDWR; + } + WARN_ON_ONCE(1); + return O_RDONLY; +} + +static inline int +access_permit_read(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_READ, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp) || + test_access(NFS4_SHARE_ACCESS_WRITE, stp); +} + +static inline int +access_permit_write(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp); +} + static inline struct nfs4_stateowner * nfs4_get_stateowner(struct nfs4_stateowner *sop) { @@ -1167,108 +1285,6 @@ static unsigned int clientstr_hashval(struct xdr_netobj name) return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK; } -/* - * We store the NONE, READ, WRITE, and BOTH bits separately in the - * st_{access,deny}_bmap field of the stateid, in order to track not - * only what share bits are currently in force, but also what - * combinations of share bits previous opens have used. This allows us - * to enforce the recommendation of rfc 3530 14.2.19 that the server - * return an error if the client attempt to downgrade to a combination - * of share bits not explicable by closing some of its previous opens. - * - * XXX: This enforcement is actually incomplete, since we don't keep - * track of access/deny bit combinations; so, e.g., we allow: - * - * OPEN allow read, deny write - * OPEN allow both, deny none - * DOWNGRADE allow read, deny none - * - * which we should reject. - */ -static unsigned int -bmap_to_share_mode(unsigned long bmap) { - int i; - unsigned int access = 0; - - for (i = 1; i < 4; i++) { - if (test_bit(i, &bmap)) - access |= i; - } - return access; -} - -/* set share access for a given stateid */ -static inline void -set_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap |= mask; -} - -/* clear share access for a given stateid */ -static inline void -clear_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap &= ~mask; -} - -/* test whether a given stateid has access */ -static inline bool -test_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - return (bool)(stp->st_access_bmap & mask); -} - -/* set share deny for a given stateid */ -static inline void -set_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap |= mask; -} - -/* clear share deny for a given stateid */ -static inline void -clear_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap &= ~mask; -} - -/* test whether a given stateid is denying specific access */ -static inline bool -test_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - return (bool)(stp->st_deny_bmap & mask); -} - -static int nfs4_access_to_omode(u32 access) -{ - switch (access & NFS4_SHARE_ACCESS_BOTH) { - case NFS4_SHARE_ACCESS_READ: - return O_RDONLY; - case NFS4_SHARE_ACCESS_WRITE: - return O_WRONLY; - case NFS4_SHARE_ACCESS_BOTH: - return O_RDWR; - } - WARN_ON_ONCE(1); - return O_RDONLY; -} - /* * A stateid that had a deny mode associated with it is being released * or downgraded. Recalculate the deny mode on the file. @@ -5565,21 +5581,6 @@ static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) return nfs_ok; } -static inline int -access_permit_read(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_READ, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp) || - test_access(NFS4_SHARE_ACCESS_WRITE, stp); -} - -static inline int -access_permit_write(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp); -} - static __be32 nfs4_check_openmode(struct nfs4_ol_stateid *stp, int flags) { From 39ab09108e2863392cac94be9f4116c2f0fed3ac Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 16 Apr 2021 14:00:18 -0400 Subject: [PATCH 0562/1565] nfsd: grant read delegations to clients holding writes [ Upstream commit aba2072f452346d56a462718bcde93d697383148 ] It's OK to grant a read delegation to a client that holds a write, as long as it's the only client holding the write. We originally tried to do this in commit 94415b06eb8a ("nfsd4: a client's own opens needn't prevent delegations"), which had to be reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own opens needn't prevent delegations""). Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/locks.c | 3 ++ fs/nfsd/nfs4state.c | 82 +++++++++++++++++++++++++++++++++++++-------- 2 files changed, 71 insertions(+), 14 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 873f97504bdd..101867933e4d 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1808,6 +1808,9 @@ check_conflicting_open(struct file *filp, const long arg, int flags) if (flags & FL_LAYOUT) return 0; + if (flags & FL_DELEG) + /* We leave these checks to the caller */ + return 0; if (arg == F_RDLCK) return inode_is_open_for_write(inode) ? -EAGAIN : 0; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9c8dacf0f2b8..fd0d6e81a227 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5030,6 +5030,65 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, return fl; } +static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, + struct nfs4_file *fp) +{ + struct nfs4_ol_stateid *st; + struct file *f = fp->fi_deleg_file->nf_file; + struct inode *ino = locks_inode(f); + int writes; + + writes = atomic_read(&ino->i_writecount); + if (!writes) + return 0; + /* + * There could be multiple filehandles (hence multiple + * nfs4_files) referencing this file, but that's not too + * common; let's just give up in that case rather than + * trying to go look up all the clients using that other + * nfs4_file as well: + */ + if (fp->fi_aliased) + return -EAGAIN; + /* + * If there's a close in progress, make sure that we see it + * clear any fi_fds[] entries before we see it decrement + * i_writecount: + */ + smp_mb__after_atomic(); + + if (fp->fi_fds[O_WRONLY]) + writes--; + if (fp->fi_fds[O_RDWR]) + writes--; + if (writes > 0) + return -EAGAIN; /* There may be non-NFSv4 writers */ + /* + * It's possible there are non-NFSv4 write opens in progress, + * but if they haven't incremented i_writecount yet then they + * also haven't called break lease yet; so, they'll break this + * lease soon enough. So, all that's left to check for is NFSv4 + * opens: + */ + spin_lock(&fp->fi_lock); + list_for_each_entry(st, &fp->fi_stateids, st_perfile) { + if (st->st_openstp == NULL /* it's an open */ && + access_permit_write(st) && + st->st_stid.sc_client != clp) { + spin_unlock(&fp->fi_lock); + return -EAGAIN; + } + } + spin_unlock(&fp->fi_lock); + /* + * There's a small chance that we could be racing with another + * NFSv4 open. However, any open that hasn't added itself to + * the fi_stateids list also hasn't called break_lease yet; so, + * they'll break this lease soon enough. + */ + return 0; +} + static struct nfs4_delegation * nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) @@ -5049,9 +5108,12 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, nf = find_readable_file(fp); if (!nf) { - /* We should always have a readable file here */ - WARN_ON_ONCE(1); - return ERR_PTR(-EBADF); + /* + * We probably could attempt another open and get a read + * delegation, but for now, don't bother until the + * client actually sends us one. + */ + return ERR_PTR(-EAGAIN); } spin_lock(&state_lock); spin_lock(&fp->fi_lock); @@ -5086,6 +5148,9 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, locks_free_lock(fl); if (status) goto out_clnt_odstate; + status = nfsd4_check_conflicting_opens(clp, fp); + if (status) + goto out_unlock; spin_lock(&state_lock); spin_lock(&fp->fi_lock); @@ -5167,17 +5232,6 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, goto out_no_deleg; if (!cb_up || !(oo->oo_flags & NFS4_OO_CONFIRMED)) goto out_no_deleg; - /* - * Also, if the file was opened for write or - * create, there's a good chance the client's - * about to write to it, resulting in an - * immediate recall (since we don't support - * write delegations): - */ - if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) - goto out_no_deleg; - if (open->op_create == NFS4_OPEN_CREATE) - goto out_no_deleg; break; default: goto out_no_deleg; From 3793f28102f185969e2f7d4641160179b6240249 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Fri, 20 Nov 2020 12:26:40 -0600 Subject: [PATCH 0563/1565] nfsd: Fix fall-through warnings for Clang [ Upstream commit 76c50eb70d8e1133eaada0013845619c36345fbc ] In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding a couple of break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/nfsctl.c | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fd0d6e81a227..ea68dc157ada 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3161,6 +3161,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_nolock; } new->cl_mach_cred = true; + break; case SP4_NONE: break; default: /* checked by xdr code */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 147ed8995a79..cb73c1292562 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1169,6 +1169,7 @@ static struct inode *nfsd_get_inode(struct super_block *sb, umode_t mode) inode->i_fop = &simple_dir_operations; inode->i_op = &simple_dir_inode_operations; inc_nlink(inode); + break; default: break; } From b2c0c7cb7fe34d1f18f1292b39c7221dd9765ca7 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Thu, 22 Apr 2021 03:37:49 -0400 Subject: [PATCH 0564/1565] NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code. [ Upstream commit d9092b4bb2109502eb8972021a3f74febc931a63 ] The client SSC code should not depend on any of the CONFIG_NFSD config. This patch removes all CONFIG_NFSD from NFSv4.2 client SSC code and simplifies the config of CONFIG_NFS_V4_2_SSC_HELPER, NFSD_V4_2_INTER_SSC. Signed-off-by: Dai Ngo Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- fs/Kconfig | 4 ++-- fs/nfs/nfs4file.c | 4 ---- fs/nfs/super.c | 4 ---- fs/nfsd/Kconfig | 2 +- 4 files changed, 3 insertions(+), 11 deletions(-) diff --git a/fs/Kconfig b/fs/Kconfig index 462253ae483a..eaff422877c3 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -334,8 +334,8 @@ config NFS_COMMON default y config NFS_V4_2_SSC_HELPER - tristate - default y if NFS_V4=y || NFS_FS=y + bool + default y if NFS_V4_2 source "net/sunrpc/Kconfig" source "fs/ceph/Kconfig" diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 5ad57ad89fb1..70cd0d764c44 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -430,9 +430,7 @@ static const struct nfs4_ssc_client_ops nfs4_ssc_clnt_ops_tbl = { */ void nfs42_ssc_register_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs42_ssc_register(&nfs4_ssc_clnt_ops_tbl); -#endif } /** @@ -443,9 +441,7 @@ void nfs42_ssc_register_ops(void) */ void nfs42_ssc_unregister_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs42_ssc_unregister(&nfs4_ssc_clnt_ops_tbl); -#endif } #endif /* CONFIG_NFS_V4_2 */ diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 7179d59d73ca..1ffce9076060 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -116,16 +116,12 @@ static void unregister_nfs4_fs(void) #ifdef CONFIG_NFS_V4_2 static void nfs_ssc_register_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs_ssc_register(&nfs_ssc_clnt_ops_tbl); -#endif } static void nfs_ssc_unregister_ops(void) { -#ifdef CONFIG_NFSD_V4 nfs_ssc_unregister(&nfs_ssc_clnt_ops_tbl); -#endif } #endif /* CONFIG_NFS_V4_2 */ diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 5fa38ad9e7e3..f229172652be 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -138,7 +138,7 @@ config NFSD_FLEXFILELAYOUT config NFSD_V4_2_INTER_SSC bool "NFSv4.2 inter server to server COPY" - depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2 + depends on NFSD_V4 && NFS_V4_2 help This option enables support for NFSv4.2 inter server to server copy where the destination server calls the NFSv4.2 From 0245993ace73088472190161965fff25ba685f68 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 24 Mar 2021 15:32:21 -0400 Subject: [PATCH 0565/1565] NFS: fix nfs_fetch_iversion() [ Upstream commit b876d708316bf9b6b9678eb2beb289b93cfe6369 ] The change attribute is always set by all NFS client versions so get rid of the open-coded version. Fixes: 3cc55f4434b4 ("nfs: use change attribute for NFS re-exports") Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index f2b34cfe286c..b347e3ce0cc8 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -171,17 +171,10 @@ static u64 nfs_fetch_iversion(struct inode *inode) { struct nfs_server *server = NFS_SERVER(inode); - /* Is this the right call?: */ - nfs_revalidate_inode(server, inode); - /* - * Also, note we're ignoring any returned error. That seems to be - * the practice for cache consistency information elsewhere in - * the server, but I'm not sure why. - */ - if (server->nfs_client->rpc_ops->version >= 4) - return inode_peek_iversion_raw(inode); - else - return time_to_chattr(&inode->i_ctime); + if (nfs_check_cache_invalid(inode, NFS_INO_INVALID_CHANGE | + NFS_INO_REVAL_PAGECACHE)) + __nfs_revalidate_inode(server, inode); + return inode_peek_iversion_raw(inode); } const struct export_operations nfs_export_ops = { From a4d250f5107c6d270bcb1071c00977273068c0ea Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 24 May 2021 16:53:21 +0300 Subject: [PATCH 0566/1565] fanotify: fix permission model of unprivileged group [ Upstream commit a8b98c808eab3ec8f1b5a64be967b0f4af4cae43 ] Reporting event->pid should depend on the privileges of the user that initialized the group, not the privileges of the user reading the events. Use an internal group flag FANOTIFY_UNPRIV to record the fact that the group was initialized by an unprivileged user. To be on the safe side, the premissions to setup filesystem and mount marks now require that both the user that initialized the group and the user setting up the mark have CAP_SYS_ADMIN. Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiA77_P5vtv7e83g0+9d7B5W9ZTE4GfQEYbWmfT1rA=VA@mail.gmail.com/ Fixes: 7cea2a3c505e ("fanotify: support limited functionality for unprivileged users") Cc: # v5.12+ Link: https://lore.kernel.org/r/20210524135321.2190062-1-amir73il@gmail.com Reviewed-by: Matthew Bobrowski Acked-by: Christian Brauner Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 30 ++++++++++++++++++++++++------ fs/notify/fdinfo.c | 2 +- include/linux/fanotify.h | 4 ++++ 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 98289ace66fa..1c439e2fdcd8 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -424,11 +424,18 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, * events generated by the listener process itself, without disclosing * the pids of other processes. */ - if (!capable(CAP_SYS_ADMIN) && + if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && task_tgid(current) != event->pid) metadata.pid = 0; - if (path && path->mnt && path->dentry) { + /* + * For now, fid mode is required for an unprivileged listener and + * fid mode does not report fd in events. Keep this check anyway + * for safety in case fid mode requirement is relaxed in the future + * to allow unprivileged listener to get events with no fd and no fid. + */ + if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && + path && path->mnt && path->dentry) { fd = create_fd(group, path, &f); if (fd < 0) return fd; @@ -1040,6 +1047,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) int f_flags, fd; unsigned int fid_mode = flags & FANOTIFY_FID_BITS; unsigned int class = flags & FANOTIFY_CLASS_BITS; + unsigned int internal_flags = 0; pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); @@ -1053,6 +1061,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) */ if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode) return -EPERM; + + /* + * Setting the internal flag FANOTIFY_UNPRIV on the group + * prevents setting mount/filesystem marks on this group and + * prevents reporting pid and open fd in events. + */ + internal_flags |= FANOTIFY_UNPRIV; } #ifdef CONFIG_AUDITSYSCALL @@ -1105,7 +1120,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) goto out_destroy_group; } - group->fanotify_data.flags = flags; + group->fanotify_data.flags = flags | internal_flags; group->memcg = get_mem_cgroup_from_mm(current->mm); group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); @@ -1323,11 +1338,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group = f.file->private_data; /* - * An unprivileged user is not allowed to watch a mount point nor - * a filesystem. + * An unprivileged user is not allowed to setup mount nor filesystem + * marks. This also includes setting up such marks by a group that + * was initialized by an unprivileged user. */ ret = -EPERM; - if (!capable(CAP_SYS_ADMIN) && + if ((!capable(CAP_SYS_ADMIN) || + FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && mark_type != FAN_MARK_INODE) goto fput_and_out; @@ -1478,6 +1495,7 @@ static int __init fanotify_user_setup(void) max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS, FANOTIFY_DEFAULT_MAX_USER_MARKS); + BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 85b112bd8851..3451708fd035 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -137,7 +137,7 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f) struct fsnotify_group *group = f->private_data; seq_printf(m, "fanotify flags:%x event-flags:%x\n", - group->fanotify_data.flags, + group->fanotify_data.flags & FANOTIFY_INIT_FLAGS, group->fanotify_data.f_flags); show_fdinfo(m, f, fanotify_fdinfo); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index bad41bcb25df..a16dbeced152 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -51,6 +51,10 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \ FANOTIFY_USER_INIT_FLAGS) +/* Internal group flags */ +#define FANOTIFY_UNPRIV 0x80000000 +#define FANOTIFY_INTERNAL_GROUP_FLAGS (FANOTIFY_UNPRIV) + #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM) From b00bb7dfe2594932bfcdf65955a4b87b5faa96d8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:23 -0400 Subject: [PATCH 0567/1565] NFSD: Add an RPC authflavor tracepoint display helper [ Upstream commit 87b2394d60c32c158ebb96ace4abee883baf1239 ] To be used in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index d20ab767ba26..3ec6d38fa531 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -841,6 +841,22 @@ DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); DEFINE_NFSD_CB_EVENT(shutdown); +TRACE_DEFINE_ENUM(RPC_AUTH_NULL); +TRACE_DEFINE_ENUM(RPC_AUTH_UNIX); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5I); +TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5P); + +#define show_nfsd_authflavor(val) \ + __print_symbolic(val, \ + { RPC_AUTH_NULL, "none" }, \ + { RPC_AUTH_UNIX, "sys" }, \ + { RPC_AUTH_GSS, "gss" }, \ + { RPC_AUTH_GSS_KRB5, "krb5" }, \ + { RPC_AUTH_GSS_KRB5I, "krb5i" }, \ + { RPC_AUTH_GSS_KRB5P, "krb5p" }) + TRACE_EVENT(nfsd_cb_setup_err, TP_PROTO( const struct nfs4_client *clp, From c8493d73083c37872bfa4959fec21ab431173356 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:29 -0400 Subject: [PATCH 0568/1565] NFSD: Add nfsd_clid_cred_mismatch tracepoint [ Upstream commit 27787733ef44332fce749aa853f2749d141982b0 ] Record when a client tries to establish a lease record but uses an unexpected credential. This is often a sign of a configuration problem. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 14 ++++++++++---- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ea68dc157ada..2e18b1ad889d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3203,6 +3203,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!creds_match) { /* case 3 */ if (client_has_state(conf)) { status = nfserr_clid_inuse; + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } goto out_new; @@ -3447,9 +3448,10 @@ nfsd4_create_session(struct svc_rqst *rqstp, goto out_free_conn; } } else if (unconf) { + status = nfserr_clid_inuse; if (!same_creds(&unconf->cl_cred, &rqstp->rq_cred) || !rpc_cmp_addr(sa, (struct sockaddr *) &unconf->cl_addr)) { - status = nfserr_clid_inuse; + trace_nfsd_clid_cred_mismatch(unconf, rqstp); goto out_free_conn; } status = nfserr_wrong_cred; @@ -4008,7 +4010,7 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (clp_used_exchangeid(conf)) goto out; if (!same_creds(&conf->cl_cred, &rqstp->rq_cred)) { - trace_nfsd_clid_inuse_err(conf); + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } } @@ -4066,10 +4068,14 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, * Nevertheless, RFC 7530 recommends INUSE for this case: */ status = nfserr_clid_inuse; - if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) + if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) { + trace_nfsd_clid_cred_mismatch(unconf, rqstp); goto out; - if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) + } + if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) { + trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; + } /* cases below refer to rfc 3530 section 14.2.34: */ if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) { if (conf && same_verf(&confirm, &conf->cl_confirm)) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3ec6d38fa531..bec85fc8be01 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -536,6 +536,34 @@ DEFINE_EVENT(nfsd_net_class, nfsd_##name, \ DEFINE_NET_EVENT(grace_start); DEFINE_NET_EVENT(grace_complete); +TRACE_EVENT(nfsd_clid_cred_mismatch, + TP_PROTO( + const struct nfs4_client *clp, + const struct svc_rqst *rqstp + ), + TP_ARGS(clp, rqstp), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(unsigned long, cl_flavor) + __field(unsigned long, new_flavor) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __entry->cl_flavor = clp->cl_cred.cr_flavor; + __entry->new_flavor = rqstp->rq_cred.cr_flavor; + memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, + sizeof(struct sockaddr_in6)); + ), + TP_printk("client %08x:%08x flavor=%s, conflict=%s from addr=%pISpc", + __entry->cl_boot, __entry->cl_id, + show_nfsd_authflavor(__entry->cl_flavor), + show_nfsd_authflavor(__entry->new_flavor), __entry->addr + ) +) + TRACE_EVENT(nfsd_clid_inuse_err, TP_PROTO(const struct nfs4_client *clp), TP_ARGS(clp), From 8da1871206651d7b2a32d6d7fca10f24821f9fcb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:36 -0400 Subject: [PATCH 0569/1565] NFSD: Add nfsd_clid_verf_mismatch tracepoint [ Upstream commit 744ea54c869cebe41fbad5f53f8a8ca5d93a5c97 ] Record when a client presents a different boot verifier than the one we know about. Typically this is a sign the client has rebooted, but sometimes it signals a conflicting client ID, which the client's administrator will need to address. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 11 ++++++++--- fs/nfsd/trace.h | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2e18b1ad889d..8169625fdd23 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3213,6 +3213,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_copy; } /* case 5, client reboot */ + trace_nfsd_clid_verf_mismatch(conf, rqstp, &verf); conf = NULL; goto out_new; } @@ -4018,9 +4019,13 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (unconf) unhash_client_locked(unconf); /* We need to handle only case 1: probable callback update */ - if (conf && same_verf(&conf->cl_verifier, &clverifier)) { - copy_clid(new, conf); - gen_confirm(new, nn); + if (conf) { + if (same_verf(&conf->cl_verifier, &clverifier)) { + copy_clid(new, conf); + gen_confirm(new, nn); + } else + trace_nfsd_clid_verf_mismatch(conf, rqstp, + &clverifier); } new->cl_minorversion = 0; gen_callback(new, setclid, rqstp); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index bec85fc8be01..0ab46a0c911d 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -564,6 +564,38 @@ TRACE_EVENT(nfsd_clid_cred_mismatch, ) ) +TRACE_EVENT(nfsd_clid_verf_mismatch, + TP_PROTO( + const struct nfs4_client *clp, + const struct svc_rqst *rqstp, + const nfs4_verifier *verf + ), + TP_ARGS(clp, rqstp, verf), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __array(unsigned char, cl_verifier, NFS4_VERIFIER_SIZE) + __array(unsigned char, new_verifier, NFS4_VERIFIER_SIZE) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + memcpy(__entry->cl_verifier, (void *)&clp->cl_verifier, + NFS4_VERIFIER_SIZE); + memcpy(__entry->new_verifier, (void *)verf, + NFS4_VERIFIER_SIZE); + memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, + sizeof(struct sockaddr_in6)); + ), + TP_printk("client %08x:%08x verf=0x%s, updated=0x%s from addr=%pISpc", + __entry->cl_boot, __entry->cl_id, + __print_hex_str(__entry->cl_verifier, NFS4_VERIFIER_SIZE), + __print_hex_str(__entry->new_verifier, NFS4_VERIFIER_SIZE), + __entry->addr + ) +); + TRACE_EVENT(nfsd_clid_inuse_err, TP_PROTO(const struct nfs4_client *clp), TP_ARGS(clp), From c6889b75a6171888b142a38f7b50949ed5f4a906 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:42 -0400 Subject: [PATCH 0570/1565] NFSD: Remove trace_nfsd_clid_inuse_err [ Upstream commit 0bfaacac57e64aa342f865b8ddcab06ca59a6f83 ] This tracepoint has been replaced by nfsd_clid_cred_mismatch and nfsd_clid_verf_mismatch, and can simply be removed. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 24 ------------------------ 1 file changed, 24 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0ab46a0c911d..2fac89e29f51 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -596,30 +596,6 @@ TRACE_EVENT(nfsd_clid_verf_mismatch, ) ); -TRACE_EVENT(nfsd_clid_inuse_err, - TP_PROTO(const struct nfs4_client *clp), - TP_ARGS(clp), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - __field(unsigned int, namelen) - __dynamic_array(unsigned char, name, clp->cl_name.len) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - memcpy(__entry->addr, &clp->cl_addr, - sizeof(struct sockaddr_in6)); - __entry->namelen = clp->cl_name.len; - memcpy(__get_dynamic_array(name), clp->cl_name.data, - clp->cl_name.len); - ), - TP_printk("nfs4_clientid %.*s already in use by %pISpc, client %08x:%08x", - __entry->namelen, __get_str(name), __entry->addr, - __entry->cl_boot, __entry->cl_id) -) - /* * from fs/nfsd/filecache.h */ From 3b6808c793f3dbcd5e46c74bfa7301a2bf6dbe5b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:48 -0400 Subject: [PATCH 0571/1565] NFSD: Add nfsd_clid_confirmed tracepoint [ Upstream commit 7e3b32ace6094aadfa2e1e54ca4c6bbfd07646af ] This replaces a dprintk call site in order to get greater visibility on when client IDs are confirmed or re-used. Simple example: nfsd-995 [000] 126.622975: nfsd_compound: xid=0x3a34e2b1 opcnt=1 nfsd-995 [000] 126.623005: nfsd_cb_args: addr=192.168.2.51:45901 client 60958e3b:9213ef0e prog=1073741824 ident=1 nfsd-995 [000] 126.623007: nfsd_compound_status: op=1/1 OP_SETCLIENTID status=0 nfsd-996 [001] 126.623142: nfsd_compound: xid=0x3b34e2b1 opcnt=1 >>>> nfsd-996 [001] 126.623146: nfsd_clid_confirmed: client 60958e3b:9213ef0e nfsd-996 [001] 126.623148: nfsd_cb_probe: addr=192.168.2.51:45901 client 60958e3b:9213ef0e state=UNKNOWN nfsd-996 [001] 126.623154: nfsd_compound_status: op=1/1 OP_SETCLIENTID_CONFIRM status=0 Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 10 +++++----- fs/nfsd/trace.h | 1 + 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8169625fdd23..b10593079e38 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2838,14 +2838,14 @@ move_to_confirmed(struct nfs4_client *clp) lockdep_assert_held(&nn->client_lock); - dprintk("NFSD: move_to_confirm nfs4_client %p\n", clp); list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags) && - clp->cl_nfsd_dentry && - clp->cl_nfsd_info_dentry) - fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); + if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) { + trace_nfsd_clid_confirmed(&clp->cl_clientid); + if (clp->cl_nfsd_dentry && clp->cl_nfsd_info_dentry) + fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); + } renew_client_locked(clp); } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2fac89e29f51..2c0f0057f60c 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid)) +DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew); From 580ec8b6536a1cb35fbdab4b0a4a27c29e5fc0a5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:55:54 -0400 Subject: [PATCH 0572/1565] NFSD: Add nfsd_clid_reclaim_complete tracepoint [ Upstream commit cee8aa074281e5269d8404be2b6388bb29ea8efc ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b10593079e38..da5b9b88b0cd 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3981,6 +3981,7 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, goto out; status = nfs_ok; + trace_nfsd_clid_reclaim_complete(&clp->cl_clientid); nfsd4_client_record_create(clp); inc_reclaim_complete(clp); out: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 2c0f0057f60c..6c787f4ef563 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid)) +DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); From 056332823cdc7fd3b3f78f87dbb80f90420c5011 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:00 -0400 Subject: [PATCH 0573/1565] NFSD: Add nfsd_clid_destroyed tracepoint [ Upstream commit c41a9b7a906fb872f8b2b1a34d2a1d5ef7f94adb ] Record client-requested termination of client IDs. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index da5b9b88b0cd..6f04a84f76c0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3939,6 +3939,7 @@ nfsd4_destroy_clientid(struct svc_rqst *rqstp, status = nfserr_wrong_cred; goto out; } + trace_nfsd_clid_destroyed(&clp->cl_clientid); unhash_client_locked(clp); out: spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 6c787f4ef563..3aca6dcba90a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -513,6 +513,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); +DEFINE_CLIENTID_EVENT(destroyed); DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew); From 650530d52260f1853faa6c05649101182c935f7f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:06 -0400 Subject: [PATCH 0574/1565] NFSD: Add a couple more nfsd_clid_expired call sites [ Upstream commit 2958d2ee71021b6c44212ec6c2a39cc71d9cd4a9 ] Improve observation of NFSv4 lease expiry. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 9 ++++++--- fs/nfsd/trace.h | 3 ++- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6f04a84f76c0..7e8752f4affd 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2687,6 +2687,8 @@ static void force_expire_client(struct nfs4_client *clp) struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); bool already_expired; + trace_nfsd_clid_admin_expired(&clp->cl_clientid); + spin_lock(&nn->client_lock); clp->cl_time = 0; spin_unlock(&nn->client_lock); @@ -3233,6 +3235,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = mark_client_expired_locked(conf); if (status) goto out; + trace_nfsd_clid_replaced(&conf->cl_clientid); } new->cl_minorversion = cstate->minorversion; new->cl_spo_must_allow.u.words[0] = exid->spo_must_allow[0]; @@ -3472,6 +3475,7 @@ nfsd4_create_session(struct svc_rqst *rqstp, old = NULL; goto out_free_conn; } + trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; @@ -4112,6 +4116,7 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, old = NULL; goto out; } + trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; @@ -5550,10 +5555,8 @@ nfs4_laundromat(struct nfsd_net *nn) clp = list_entry(pos, struct nfs4_client, cl_lru); if (!state_expired(<, clp->cl_time)) break; - if (mark_client_expired_locked(clp)) { - trace_nfsd_clid_expired(&clp->cl_clientid); + if (mark_client_expired_locked(clp)) continue; - } list_add(&clp->cl_lru, &reaplist); } spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3aca6dcba90a..3271d925abf2 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -514,7 +514,8 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(destroyed); -DEFINE_CLIENTID_EVENT(expired); +DEFINE_CLIENTID_EVENT(admin_expired); +DEFINE_CLIENTID_EVENT(replaced); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew); DEFINE_CLIENTID_EVENT(stale); From 5070351cdcebecac057b6648672eb3308cabde59 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:13 -0400 Subject: [PATCH 0575/1565] NFSD: Add tracepoints for SETCLIENTID edge cases [ Upstream commit 237f91c85acef206a33bc02f3c4e856128fd7994 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 19 ++++++++----------- fs/nfsd/trace.h | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 7e8752f4affd..31310dbe9c1e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4008,11 +4008,9 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, new = create_client(clname, rqstp, &clverifier); if (new == NULL) return nfserr_jukebox; - /* Cases below refer to rfc 3530 section 14.2.33: */ spin_lock(&nn->client_lock); conf = find_confirmed_client_by_name(&clname, nn); if (conf && client_has_state(conf)) { - /* case 0: */ status = nfserr_clid_inuse; if (clp_used_exchangeid(conf)) goto out; @@ -4024,7 +4022,6 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, unconf = find_unconfirmed_client_by_name(&clname, nn); if (unconf) unhash_client_locked(unconf); - /* We need to handle only case 1: probable callback update */ if (conf) { if (same_verf(&conf->cl_verifier, &clverifier)) { copy_clid(new, conf); @@ -4032,7 +4029,8 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } else trace_nfsd_clid_verf_mismatch(conf, rqstp, &clverifier); - } + } else + trace_nfsd_clid_fresh(new); new->cl_minorversion = 0; gen_callback(new, setclid, rqstp); add_to_unconfirmed(new); @@ -4045,12 +4043,13 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_unlock(&nn->client_lock); if (new) free_client(new); - if (unconf) + if (unconf) { + trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); expire_client(unconf); + } return status; } - __be32 nfsd4_setclientid_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, @@ -4087,21 +4086,19 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } - /* cases below refer to rfc 3530 section 14.2.34: */ if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) { if (conf && same_verf(&confirm, &conf->cl_confirm)) { - /* case 2: probable retransmit */ status = nfs_ok; - } else /* case 4: client hasn't noticed we rebooted yet? */ + } else status = nfserr_stale_clientid; goto out; } status = nfs_ok; - if (conf) { /* case 1: callback update */ + if (conf) { old = unconf; unhash_client_locked(old); nfsd4_change_callback(conf, &unconf->cl_cb_conn); - } else { /* case 3: normal case; new or rebooted client */ + } else { old = find_confirmed_client_by_name(&unconf->cl_name, nn); if (old) { status = nfserr_clid_inuse; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3271d925abf2..aa42d31cdfac 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -511,6 +511,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid)) +DEFINE_CLIENTID_EVENT(expire_unconf); DEFINE_CLIENTID_EVENT(reclaim_complete); DEFINE_CLIENTID_EVENT(confirmed); DEFINE_CLIENTID_EVENT(destroyed); @@ -600,6 +601,42 @@ TRACE_EVENT(nfsd_clid_verf_mismatch, ) ); +DECLARE_EVENT_CLASS(nfsd_clid_class, + TP_PROTO(const struct nfs4_client *clp), + TP_ARGS(clp), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + __field(unsigned long, flavor) + __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) + __dynamic_array(char, name, clp->cl_name.len + 1) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + memcpy(__entry->addr, &clp->cl_addr, + sizeof(struct sockaddr_in6)); + __entry->flavor = clp->cl_cred.cr_flavor; + memcpy(__entry->verifier, (void *)&clp->cl_verifier, + NFS4_VERIFIER_SIZE); + memcpy(__get_str(name), clp->cl_name.data, clp->cl_name.len); + __get_str(name)[clp->cl_name.len] = '\0'; + ), + TP_printk("addr=%pISpc name='%s' verifier=0x%s flavor=%s client=%08x:%08x", + __entry->addr, __get_str(name), + __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE), + show_nfsd_authflavor(__entry->flavor), + __entry->cl_boot, __entry->cl_id) +); + +#define DEFINE_CLID_EVENT(name) \ +DEFINE_EVENT(nfsd_clid_class, nfsd_clid_##name, \ + TP_PROTO(const struct nfs4_client *clp), \ + TP_ARGS(clp)) + +DEFINE_CLID_EVENT(fresh); + /* * from fs/nfsd/filecache.h */ From 88b3cdfd487357b16c285d7a2f9517669a526fa7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:19 -0400 Subject: [PATCH 0576/1565] NFSD: Add tracepoints for EXCHANGEID edge cases [ Upstream commit e8f80c5545ec5794644b48537449e48b009d608d ] Some of the most common cases are traced. Enough infrastructure is now in place that more can be added later, as needed. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 12 +++++++++--- fs/nfsd/trace.h | 1 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 31310dbe9c1e..adf476cbf36c 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3200,6 +3200,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } /* case 6 */ exid->flags |= EXCHGID4_FLAG_CONFIRMED_R; + trace_nfsd_clid_confirmed_r(conf); goto out_copy; } if (!creds_match) { /* case 3 */ @@ -3212,6 +3213,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } if (verfs_match) { /* case 2 */ conf->cl_exchange_flags |= EXCHGID4_FLAG_CONFIRMED_R; + trace_nfsd_clid_confirmed_r(conf); goto out_copy; } /* case 5, client reboot */ @@ -3225,11 +3227,13 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; } - unconf = find_unconfirmed_client_by_name(&exid->clname, nn); + unconf = find_unconfirmed_client_by_name(&exid->clname, nn); if (unconf) /* case 4, possible retry or client restart */ unhash_client_locked(unconf); - /* case 1 (normal case) */ + /* case 1, new owner ID */ + trace_nfsd_clid_fresh(new); + out_new: if (conf) { status = mark_client_expired_locked(conf); @@ -3259,8 +3263,10 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, out_nolock: if (new) expire_client(new); - if (unconf) + if (unconf) { + trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); expire_client(unconf); + } return status; } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index aa42d31cdfac..de461c82dbf4 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -636,6 +636,7 @@ DEFINE_EVENT(nfsd_clid_class, nfsd_clid_##name, \ TP_ARGS(clp)) DEFINE_CLID_EVENT(fresh); +DEFINE_CLID_EVENT(confirmed_r); /* * from fs/nfsd/filecache.h From b6ba775ccc947731ddd8f2473e9e734c46374494 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:25 -0400 Subject: [PATCH 0577/1565] NFSD: Constify @fh argument of knfsd_fh_hash() [ Upstream commit 1736aec82a15cb5d4b3bbe0b2fbae0ede66b1a1a ] Enable knfsd_fh_hash() to be invoked in functions where the filehandle pointer is a const. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.h | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index aff2cda5c6c3..6106697adc04 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -225,15 +225,12 @@ static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) * returns a crc32 hash for the filehandle that is compatible with * the one displayed by "wireshark". */ - -static inline u32 -knfsd_fh_hash(struct knfsd_fh *fh) +static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size); } #else -static inline u32 -knfsd_fh_hash(struct knfsd_fh *fh) +static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { return 0; } From 2be1f2275193a3adf5cee89bdf4578487061adc6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:31 -0400 Subject: [PATCH 0578/1565] NFSD: Capture every CB state transition [ Upstream commit 8476c69a7fa0f1f9705ec0caa4e97c08b5045779 ] We were missing one. As a clean-up, add a helper that sets the new CB state and fires a tracepoint. The tracepoint fires only when the state changes, to help reduce trace log noise. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index f5b7ad0847f2..2e6a4a9e59ca 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -945,20 +945,26 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c return 0; } +static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate) +{ + if (clp->cl_cb_state != newstate) { + clp->cl_cb_state = newstate; + trace_nfsd_cb_state(clp); + } +} + static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - clp->cl_cb_state = NFSD4_CB_DOWN; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_DOWN); } static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - clp->cl_cb_state = NFSD4_CB_FAULT; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_FAULT); } static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) @@ -968,10 +974,8 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) trace_nfsd_cb_done(clp, task->tk_status); if (task->tk_status) nfsd4_mark_cb_down(clp, task->tk_status); - else { - clp->cl_cb_state = NFSD4_CB_UP; - trace_nfsd_cb_state(clp); - } + else + nfsd4_mark_cb_state(clp, NFSD4_CB_UP); } static void nfsd4_cb_probe_release(void *calldata) @@ -995,8 +999,7 @@ static const struct rpc_call_ops nfsd4_cb_probe_ops = { */ void nfsd4_probe_callback(struct nfs4_client *clp) { - clp->cl_cb_state = NFSD4_CB_UNKNOWN; - trace_nfsd_cb_state(clp); + nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); nfsd4_run_cb(&clp->cl_cb_null); } @@ -1009,11 +1012,10 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp) void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn) { - clp->cl_cb_state = NFSD4_CB_UNKNOWN; + nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); spin_lock(&clp->cl_lock); memcpy(&clp->cl_cb_conn, conn, sizeof(struct nfs4_cb_conn)); spin_unlock(&clp->cl_lock); - trace_nfsd_cb_state(clp); } /* @@ -1345,7 +1347,7 @@ nfsd4_run_cb_work(struct work_struct *work) * Don't send probe messages for 4.1 or later. */ if (!cb->cb_ops && clp->cl_minorversion) { - clp->cl_cb_state = NFSD4_CB_UP; + nfsd4_mark_cb_state(clp, NFSD4_CB_UP); nfsd41_destroy_cb(cb); return; } From 3076ede3fc10c5b1f118173df499bbd46427f93a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:37 -0400 Subject: [PATCH 0579/1565] NFSD: Drop TRACE_DEFINE_ENUM for NFSD4_CB_ macros [ Upstream commit 167145cc64ce4b4b177e636829909a6b14004f9e ] TRACE_DEFINE_ENUM() is necessary for enum {} but not for C macros. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index de461c82dbf4..3683076e0fcd 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -877,11 +877,6 @@ TRACE_EVENT(nfsd_cb_nodelegs, TP_printk("client %08x:%08x", __entry->cl_boot, __entry->cl_id) ) -TRACE_DEFINE_ENUM(NFSD4_CB_UP); -TRACE_DEFINE_ENUM(NFSD4_CB_UNKNOWN); -TRACE_DEFINE_ENUM(NFSD4_CB_DOWN); -TRACE_DEFINE_ENUM(NFSD4_CB_FAULT); - #define show_cb_state(val) \ __print_symbolic(val, \ { NFSD4_CB_UP, "UP" }, \ From 634816f9d3de92ee2635158e84c4731074131799 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:43 -0400 Subject: [PATCH 0580/1565] NFSD: Add cb_lost tracepoint [ Upstream commit 806d65b617d89be887fe68bfa051f78143669cd7 ] Provide more clarity about when the callback channel is in trouble. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 ++ fs/nfsd/trace.h | 1 + 2 files changed, 3 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index adf476cbf36c..a8aa3680605b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1763,6 +1763,8 @@ static void nfsd4_conn_lost(struct svc_xpt_user *u) struct nfsd4_conn *c = container_of(u, struct nfsd4_conn, cn_xpt_user); struct nfs4_client *clp = c->cn_session->se_client; + trace_nfsd_cb_lost(clp); + spin_lock(&clp->cl_lock); if (!list_empty(&c->cn_persession)) { list_del(&c->cn_persession); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 3683076e0fcd..afffb4912acb 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -912,6 +912,7 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); +DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown); TRACE_DEFINE_ENUM(RPC_AUTH_NULL); From fc3b4f0188e99422fe8840b4f079ee3f4e397afe Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:56:49 -0400 Subject: [PATCH 0581/1565] NFSD: Adjust cb_shutdown tracepoint [ Upstream commit b200f0e35338b052976b6c5759e4f77a3013e6f6 ] Show when the upper layer requested a shutdown. RPC tracepoints can already show when rpc_shutdown_client() is called. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 2e6a4a9e59ca..2a2eb6184bda 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1233,6 +1233,9 @@ void nfsd4_destroy_callback_queue(void) /* must be called under the state lock */ void nfsd4_shutdown_callback(struct nfs4_client *clp) { + if (clp->cl_cb_state != NFSD4_CB_UNKNOWN) + trace_nfsd_cb_shutdown(clp); + set_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags); /* * Note this won't actually result in a null callback; @@ -1278,7 +1281,6 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb) * kill the old client: */ if (clp->cl_cb_client) { - trace_nfsd_cb_shutdown(clp); rpc_shutdown_client(clp->cl_cb_client); clp->cl_cb_client = NULL; put_cred(clp->cl_cb_cred); From 59ddc5a82bc300597210c7433b28e6152d860f21 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:02 -0400 Subject: [PATCH 0582/1565] NFSD: Enhance the nfsd_cb_setup tracepoint [ Upstream commit 9f57c6062bf3ce2c6ab9ba60040b34e8134ef259 ] Display the transport protocol and authentication flavor so admins can see what they might be getting wrong. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 3 ++- fs/nfsd/trace.h | 27 ++++++++++++++++++++++++++- 2 files changed, 28 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 2a2eb6184bda..fe1f36b70fa0 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -941,7 +941,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c clp->cl_cb_conn.cb_xprt = conn->cb_xprt; clp->cl_cb_client = client; clp->cl_cb_cred = cred; - trace_nfsd_cb_setup(clp); + trace_nfsd_cb_setup(clp, rpc_peeraddr2str(client, RPC_DISPLAY_NETID), + args.authflavor); return 0; } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index afffb4912acb..86e0656bdb77 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -910,7 +910,6 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ TP_PROTO(const struct nfs4_client *clp), \ TP_ARGS(clp)) -DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown); @@ -931,6 +930,32 @@ TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5P); { RPC_AUTH_GSS_KRB5I, "krb5i" }, \ { RPC_AUTH_GSS_KRB5P, "krb5p" }) +TRACE_EVENT(nfsd_cb_setup, + TP_PROTO(const struct nfs4_client *clp, + const char *netid, + rpc_authflavor_t authflavor + ), + TP_ARGS(clp, netid, authflavor), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(unsigned long, authflavor) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + __array(unsigned char, netid, 8) + ), + TP_fast_assign( + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + strlcpy(__entry->netid, netid, sizeof(__entry->netid)); + __entry->authflavor = authflavor; + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x proto=%s flavor=%s", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->netid, show_nfsd_authflavor(__entry->authflavor)) +); + TRACE_EVENT(nfsd_cb_setup_err, TP_PROTO( const struct nfs4_client *clp, From 60aac215347ce083bfeaf71cecd69896ee9104d0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:08 -0400 Subject: [PATCH 0583/1565] NFSD: Add an nfsd_cb_lm_notify tracepoint [ Upstream commit 2cde7f8118f0fea29ad73ddcf28817f95adeffd5 ] When the server kicks off a CB_LM_NOTIFY callback, record its arguments so we can better observe asynchronous locking behavior. For example: nfsd-998 [002] 1471.705873: nfsd_cb_notify_lock: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8950b23a Signed-off-by: Chuck Lever Cc: Jeff Layton Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 4 +++- fs/nfsd/trace.h | 26 ++++++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a8aa3680605b..89054fe68aca 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6494,8 +6494,10 @@ nfsd4_lm_notify(struct file_lock *fl) } spin_unlock(&nn->blocked_locks_lock); - if (queue) + if (queue) { + trace_nfsd_cb_notify_lock(lo, nbl); nfsd4_run_cb(&nbl->nbl_cb); + } } static const struct lock_manager_operations nfsd_posix_mng_ops = { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 86e0656bdb77..bed7d5d49fee 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1027,6 +1027,32 @@ TRACE_EVENT(nfsd_cb_done, __entry->status) ); +TRACE_EVENT(nfsd_cb_notify_lock, + TP_PROTO( + const struct nfs4_lockowner *lo, + const struct nfsd4_blocked_lock *nbl + ), + TP_ARGS(lo, nbl), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, fh_hash) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + const struct nfs4_client *clp = lo->lo_owner.so_client; + + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __entry->fh_hash = knfsd_fh_hash(&nbl->nbl_fh); + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x fh_hash=0x%08x", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->fh_hash) +); + #endif /* _NFSD_TRACE_H */ #undef TRACE_INCLUDE_PATH From 9f76187f0a46f23029f2c4ea5b96180cc284dc6d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:14 -0400 Subject: [PATCH 0584/1565] NFSD: Add an nfsd_cb_offload tracepoint [ Upstream commit 87512386e951ee28ba2e7ef32b843ac97621d371 ] Record the arguments of CB_OFFLOAD callbacks so we can better observe asynchronous copy-offload behavior. For example: nfsd-995 [008] 7721.934222: nfsd_cb_offload: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8739113a count=116528 status=0 Signed-off-by: Chuck Lever Cc: Olga Kornievskaia Cc: Dai Ngo Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/trace.h | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f85958f81a26..dfce9c432a5e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1489,6 +1489,8 @@ static int nfsd4_do_async_copy(void *data) memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, + ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: if (!copy->cp_intra) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index bed7d5d49fee..fe32dfe1e55a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1053,6 +1053,42 @@ TRACE_EVENT(nfsd_cb_notify_lock, __entry->fh_hash) ); +TRACE_EVENT(nfsd_cb_offload, + TP_PROTO( + const struct nfs4_client *clp, + const stateid_t *stp, + const struct knfsd_fh *fh, + u64 count, + __be32 status + ), + TP_ARGS(clp, stp, fh, count, status), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __field(u32, fh_hash) + __field(int, status) + __field(u64, count) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + __entry->fh_hash = knfsd_fh_hash(fh); + __entry->status = be32_to_cpu(status); + __entry->count = count; + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x fh_hash=0x%08x count=%llu status=%d", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation, + __entry->fh_hash, __entry->count, __entry->status) +); + #endif /* _NFSD_TRACE_H */ #undef TRACE_INCLUDE_PATH From a577eb06dee49c22f92ea04289199b99c07ba324 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:20 -0400 Subject: [PATCH 0585/1565] NFSD: Replace the nfsd_deleg_break tracepoint [ Upstream commit 17d76ddf76e4972411402743eea7243d9a46f4f9 ] Renamed so it can be enabled as a set with the other nfsd_cb_ tracepoints. And, consistent with those tracepoints, report the address of the client, the client ID the server has given it, and the state ID being recalled. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 +- fs/nfsd/trace.h | 32 +++++++++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 89054fe68aca..09967037eb1a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4675,7 +4675,7 @@ nfsd_break_deleg_cb(struct file_lock *fl) struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; - trace_nfsd_deleg_break(&dp->dl_stid.sc_stateid); + trace_nfsd_cb_recall(&dp->dl_stid); /* * We don't want the locks code to timeout the lease for us; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index fe32dfe1e55a..9061fd28c996 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -459,7 +459,6 @@ DEFINE_STATEID_EVENT(layout_recall_release); DEFINE_STATEID_EVENT(open); DEFINE_STATEID_EVENT(deleg_read); -DEFINE_STATEID_EVENT(deleg_break); DEFINE_STATEID_EVENT(deleg_recall); DECLARE_EVENT_CLASS(nfsd_stateseqid_class, @@ -1027,6 +1026,37 @@ TRACE_EVENT(nfsd_cb_done, __entry->status) ); +TRACE_EVENT(nfsd_cb_recall, + TP_PROTO( + const struct nfs4_stid *stid + ), + TP_ARGS(stid), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) + ), + TP_fast_assign( + const stateid_t *stp = &stid->sc_stateid; + const struct nfs4_client *clp = stid->sc_client; + + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + if (clp) + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); + else + memset(__entry->addr, 0, sizeof(struct sockaddr_in6)); + ), + TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation) +); + TRACE_EVENT(nfsd_cb_notify_lock, TP_PROTO( const struct nfs4_lockowner *lo, From 3a63aa2459dc8d4d1f79d52f9882f898c6028db7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:26 -0400 Subject: [PATCH 0586/1565] NFSD: Add an nfsd_cb_probe tracepoint [ Upstream commit 4ade892ae1c35527584decb7fa026553d53cd03f ] Record a tracepoint event when the server performs a callback probe. This event can be enabled as a group with other nfsd_cb tracepoints. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index fe1f36b70fa0..453f60b127eb 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1000,6 +1000,7 @@ static const struct rpc_call_ops nfsd4_cb_probe_ops = { */ void nfsd4_probe_callback(struct nfs4_client *clp) { + trace_nfsd_cb_probe(clp); nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); nfsd4_run_cb(&clp->cl_cb_null); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 9061fd28c996..4361a0807f07 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -910,6 +910,7 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ TP_ARGS(clp)) DEFINE_NFSD_CB_EVENT(state); +DEFINE_NFSD_CB_EVENT(probe); DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown); From 3e8aeb13a730af10c8d36da7c18c889daf7a3ae7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:32 -0400 Subject: [PATCH 0587/1565] NFSD: Remove the nfsd_cb_work and nfsd_cb_done tracepoints [ Upstream commit 1d2bf65983a137121c165a7e69b2885572954915 ] Clean up: These are noise in properly working systems. If you really need to observe the operation of the callback mechanism, use the sunrpc:rpc\* tracepoints along with the workqueue tracepoints. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 5 ----- fs/nfsd/trace.h | 48 ------------------------------------------ 2 files changed, 53 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 453f60b127eb..59dc80ecd376 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -972,7 +972,6 @@ static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) { struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null); - trace_nfsd_cb_done(clp, task->tk_status); if (task->tk_status) nfsd4_mark_cb_down(clp, task->tk_status); else @@ -1174,8 +1173,6 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata) struct nfsd4_callback *cb = calldata; struct nfs4_client *clp = cb->cb_clp; - trace_nfsd_cb_done(clp, task->tk_status); - if (!nfsd4_cb_sequence_done(task, cb)) return; @@ -1328,8 +1325,6 @@ nfsd4_run_cb_work(struct work_struct *work) struct rpc_clnt *clnt; int flags; - trace_nfsd_cb_work(clp, cb->cb_msg.rpc_proc->p_name); - if (cb->cb_need_restart) { cb->cb_need_restart = false; } else { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 4361a0807f07..87ac1f19bfd0 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -979,54 +979,6 @@ TRACE_EVENT(nfsd_cb_setup_err, __entry->addr, __entry->cl_boot, __entry->cl_id, __entry->error) ); -TRACE_EVENT(nfsd_cb_work, - TP_PROTO( - const struct nfs4_client *clp, - const char *procedure - ), - TP_ARGS(clp, procedure), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __string(procedure, procedure) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __assign_str(procedure, procedure) - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x procedure=%s", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __get_str(procedure)) -); - -TRACE_EVENT(nfsd_cb_done, - TP_PROTO( - const struct nfs4_client *clp, - int status - ), - TP_ARGS(clp, status), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(int, status) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __entry->status = status; - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x status=%d", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->status) -); - TRACE_EVENT(nfsd_cb_recall, TP_PROTO( const struct nfs4_stid *stid From e7bbdd7deeb2f1cc3985b2bfa1b0a36fe97fb899 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 14 May 2021 15:57:39 -0400 Subject: [PATCH 0588/1565] NFSD: Update nfsd_cb_args tracepoint [ Upstream commit d6cbe98ff32aef795462a309ef048cfb89d1a11d ] Clean-up: Re-order the display of IP address and client ID to be consistent with other _cb_ tracepoints. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 87ac1f19bfd0..68a0fecdd5f4 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -857,9 +857,9 @@ TRACE_EVENT(nfsd_cb_args, memcpy(__entry->addr, &conn->cb_addr, sizeof(struct sockaddr_in6)); ), - TP_printk("client %08x:%08x callback addr=%pISpc prog=%u ident=%u", - __entry->cl_boot, __entry->cl_id, - __entry->addr, __entry->prog, __entry->ident) + TP_printk("addr=%pISpc client %08x:%08x prog=%u ident=%u", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->prog, __entry->ident) ); TRACE_EVENT(nfsd_cb_nodelegs, From f666a75ccd9cd9cdb02d8a3f588fc40a6b32ec03 Mon Sep 17 00:00:00 2001 From: Yu Hsiang Huang Date: Fri, 14 May 2021 11:58:29 +0800 Subject: [PATCH 0589/1565] nfsd: Prevent truncation of an unlinked inode from blocking access to its directory [ Upstream commit e5d74a2d0ee67ae00edad43c3d7811016e4d2e21 ] Truncation of an unlinked inode may take a long time for I/O waiting, and it doesn't have to prevent access to the directory. Thus, let truncation occur outside the directory's mutex, just like do_unlinkat() does. Signed-off-by: Yu Hsiang Huang Signed-off-by: Bing Jing Chang Signed-off-by: Robbie Ko Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 520e55c35e74..2eb3bfbc8a35 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1870,6 +1870,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, { struct dentry *dentry, *rdentry; struct inode *dirp; + struct inode *rinode; __be32 err; int host_err; @@ -1898,6 +1899,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, host_err = -ENOENT; goto out_drop_write; } + rinode = d_inode(rdentry); + ihold(rinode); if (!type) type = d_inode(rdentry)->i_mode & S_IFMT; @@ -1913,6 +1916,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); + fh_unlock(fhp); + iput(rinode); /* truncate the inode here */ out_drop_write: fh_drop_write(fhp); From 94a89247017396dab16d2adb1a6859e2b46c5039 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 14 May 2021 18:21:37 -0400 Subject: [PATCH 0590/1565] nfsd: move some commit_metadata()s outside the inode lock [ Upstream commit eeeadbb9bd5652c47bb9b31aa9ad8b4f1b4aa8b3 ] The commit may be time-consuming and there's no need to hold the lock for it. More of these are possible, these were just some easy ones. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2eb3bfbc8a35..74b2c6c5ad0b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1626,9 +1626,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); + fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_unlock(fhp); fh_drop_write(fhp); @@ -1693,6 +1693,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, if (d_really_is_negative(dold)) goto out_dput; host_err = vfs_link(dold, dirp, dnew, NULL); + fh_unlock(ffhp); if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -1913,10 +1914,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, host_err = vfs_rmdir(dirp, rdentry); } + fh_unlock(fhp); if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); - fh_unlock(fhp); iput(rinode); /* truncate the inode here */ out_drop_write: From 817c6eb975798647c4bbbfc9eb288aa413ec7f65 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Wed, 19 May 2021 14:48:27 -0400 Subject: [PATCH 0591/1565] NFSD add vfs_fsync after async copy is done [ Upstream commit eac0b17a77fbd763d305a5eaa4fd1119e5a0fe0d ] Currently, the server does all copies as NFS_UNSTABLE. For synchronous copies linux client will append a COMMIT to the COPY compound but for async copies it does not (because COMMIT needs to be done after all bytes are copied and not as a reply to the COPY operation). However, in order to save the client doing a COMMIT as a separate rpc, the server can reply back with NFS_FILE_SYNC copy. This patch proposed to add vfs_fsync() call at the end of the async copy. Signed-off-by: Olga Kornievskaia Signed-off-by: J. Bruce Fields [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 15 ++++++++++++++- fs/nfsd/xdr4.h | 1 + 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index dfce9c432a5e..aa0da0737a3f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1366,7 +1366,8 @@ static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = { static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) { - copy->cp_res.wr_stable_how = NFS_UNSTABLE; + copy->cp_res.wr_stable_how = + copy->committed ? NFS_FILE_SYNC : NFS_UNSTABLE; copy->cp_synchronous = sync; gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); } @@ -1375,10 +1376,12 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) { struct file *dst = copy->nf_dst->nf_file; struct file *src = copy->nf_src->nf_file; + errseq_t since; ssize_t bytes_copied = 0; u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; + __be32 status; /* See RFC 7862 p.67: */ if (bytes_total == 0) @@ -1395,6 +1398,16 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) src_pos += bytes_copied; dst_pos += bytes_copied; } while (bytes_total > 0 && !copy->cp_synchronous); + /* for a non-zero asynchronous copy do a commit of data */ + if (!copy->cp_synchronous && copy->cp_res.wr_bytes_written > 0) { + since = READ_ONCE(dst->f_wb_err); + status = vfs_fsync_range(dst, copy->cp_dst_pos, + copy->cp_res.wr_bytes_written, 0); + if (!status) + status = filemap_check_wb_err(dst->f_mapping, since); + if (!status) + copy->committed = true; + } return bytes_copied; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index fe540a3415c6..37af86e370cb 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -567,6 +567,7 @@ struct nfsd4_copy { struct vfsmount *ss_mnt; struct nfs_fh c_fh; nfs4_stateid stateid; + bool committed; }; struct nfsd4_seek { From a4bc287943f5695209ff36bdc89f17b48d68fae7 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Fri, 21 May 2021 15:09:37 -0400 Subject: [PATCH 0592/1565] NFSD: delay unmount source's export after inter-server copy completed. [ Upstream commit f4e44b393389c77958f7c58bf4415032b4cda15b ] Currently the source's export is mounted and unmounted on every inter-server copy operation. This patch is an enhancement to delay the unmount of the source export for a certain period of time to eliminate the mount and unmount overhead on subsequent copy operations. After a copy operation completes, a work entry is added to the delayed unmount list with an expiration time. This list is serviced by the laundromat thread to unmount the export of the expired entries. Each time the export is being used again, its expiration time is extended and the entry is re-inserted to the tail of the list. The unmount task and the mount operation of the copy request are synced to make sure the export is not unmounted while it's being used. Signed-off-by: Dai Ngo Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 6 ++ fs/nfsd/nfs4proc.c | 135 ++++++++++++++++++++++++++++++++++++++-- fs/nfsd/nfs4state.c | 71 +++++++++++++++++++++ fs/nfsd/nfsd.h | 4 ++ fs/nfsd/nfssvc.c | 3 + include/linux/nfs_ssc.h | 14 +++++ 6 files changed, 229 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index a75abeb1e698..935c1028c217 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -176,6 +176,12 @@ struct nfsd_net { unsigned int longest_chain_cachesize; struct shrinker nfsd_reply_cache_shrinker; + + /* tracking server-to-server copy mounts */ + spinlock_t nfsd_ssc_lock; + struct list_head nfsd_ssc_mount_list; + wait_queue_head_t nfsd_ssc_waitq; + /* utsname taken from the process that starts the server */ char nfsd_name[UNX_MAXNODENAME+1]; }; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index aa0da0737a3f..573c550e7ace 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -55,6 +55,13 @@ module_param(inter_copy_offload_enable, bool, 0644); MODULE_PARM_DESC(inter_copy_offload_enable, "Enable inter server to server copy offload. Default: false"); +#ifdef CONFIG_NFSD_V4_2_INTER_SSC +static int nfsd4_ssc_umount_timeout = 900000; /* default to 15 mins */ +module_param(nfsd4_ssc_umount_timeout, int, 0644); +MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, + "idle msecs before unmount export from source server"); +#endif + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include @@ -1168,6 +1175,81 @@ extern void nfs_sb_deactive(struct super_block *sb); #define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys" +/* + * setup a work entry in the ssc delayed unmount list. + */ +static int nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, + struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) +{ + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *work = NULL; + struct nfsd4_ssc_umount_item *tmp; + DEFINE_WAIT(wait); + + *ss_mnt = NULL; + *retwork = NULL; + work = kzalloc(sizeof(*work), GFP_KERNEL); +try_again: + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (strncmp(ni->nsui_ipaddr, ipaddr, sizeof(ni->nsui_ipaddr))) + continue; + /* found a match */ + if (ni->nsui_busy) { + /* wait - and try again */ + prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, + TASK_INTERRUPTIBLE); + spin_unlock(&nn->nfsd_ssc_lock); + + /* allow 20secs for mount/unmount for now - revisit */ + if (signal_pending(current) || + (schedule_timeout(20*HZ) == 0)) { + kfree(work); + return nfserr_eagain; + } + finish_wait(&nn->nfsd_ssc_waitq, &wait); + goto try_again; + } + *ss_mnt = ni->nsui_vfsmount; + refcount_inc(&ni->nsui_refcnt); + spin_unlock(&nn->nfsd_ssc_lock); + kfree(work); + + /* return vfsmount in ss_mnt */ + return 0; + } + if (work) { + strncpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr)); + refcount_set(&work->nsui_refcnt, 2); + work->nsui_busy = true; + list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); + *retwork = work; + } + spin_unlock(&nn->nfsd_ssc_lock); + return 0; +} + +static void nfsd4_ssc_update_dul_work(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *work, struct vfsmount *ss_mnt) +{ + /* set nsui_vfsmount, clear busy flag and wakeup waiters */ + spin_lock(&nn->nfsd_ssc_lock); + work->nsui_vfsmount = ss_mnt; + work->nsui_busy = false; + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); +} + +static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *work) +{ + spin_lock(&nn->nfsd_ssc_lock); + list_del(&work->nsui_list); + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); + kfree(work); +} + /* * Support one copy source server for now. */ @@ -1184,6 +1266,8 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, char *ipaddr, *dev_name, *raw_data; int len, raw_len; __be32 status = nfserr_inval; + struct nfsd4_ssc_umount_item *work = NULL; + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); naddr = &nss->u.nl4_addr; tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr, @@ -1232,12 +1316,23 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, goto out_free_rawdata; snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep); + status = nfsd4_ssc_setup_dul(nn, ipaddr, &work, &ss_mnt); + if (status) + goto out_free_devname; + if (ss_mnt) + goto out_done; + /* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */ ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data); module_put(type->owner); - if (IS_ERR(ss_mnt)) + if (IS_ERR(ss_mnt)) { + if (work) + nfsd4_ssc_cancel_dul_work(nn, work); goto out_free_devname; - + } + if (work) + nfsd4_ssc_update_dul_work(nn, work, ss_mnt); +out_done: status = 0; *mount = ss_mnt; @@ -1297,10 +1392,42 @@ static void nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd_file *dst) { + bool found = false; + long timeout; + struct nfsd4_ssc_umount_item *tmp; + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); + nfs42_ssc_close(src->nf_file); - fput(src->nf_file); nfsd_file_put(dst); - mntput(ss_mnt); + fput(src->nf_file); + + if (!nn) { + mntput(ss_mnt); + return; + } + spin_lock(&nn->nfsd_ssc_lock); + timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (ni->nsui_vfsmount->mnt_sb == ss_mnt->mnt_sb) { + list_del(&ni->nsui_list); + /* + * vfsmount can be shared by multiple exports, + * decrement refcnt. If the count drops to 1 it + * will be unmounted when nsui_expire expires. + */ + refcount_dec(&ni->nsui_refcnt); + ni->nsui_expire = jiffies + timeout; + list_add_tail(&ni->nsui_list, &nn->nfsd_ssc_mount_list); + found = true; + break; + } + } + spin_unlock(&nn->nfsd_ssc_lock); + if (!found) { + mntput(ss_mnt); + return; + } } #else /* CONFIG_NFSD_V4_2_INTER_SSC */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 09967037eb1a..3dd6e25d5d90 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -44,6 +44,7 @@ #include #include #include +#include #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -5522,6 +5523,69 @@ static bool state_expired(struct laundry_time *lt, time64_t last_refresh) return false; } +#ifdef CONFIG_NFSD_V4_2_INTER_SSC +void nfsd4_ssc_init_umount_work(struct nfsd_net *nn) +{ + spin_lock_init(&nn->nfsd_ssc_lock); + INIT_LIST_HEAD(&nn->nfsd_ssc_mount_list); + init_waitqueue_head(&nn->nfsd_ssc_waitq); +} +EXPORT_SYMBOL_GPL(nfsd4_ssc_init_umount_work); + +/* + * This is called when nfsd is being shutdown, after all inter_ssc + * cleanup were done, to destroy the ssc delayed unmount list. + */ +static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) +{ + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *tmp; + + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + list_del(&ni->nsui_list); + spin_unlock(&nn->nfsd_ssc_lock); + mntput(ni->nsui_vfsmount); + kfree(ni); + spin_lock(&nn->nfsd_ssc_lock); + } + spin_unlock(&nn->nfsd_ssc_lock); +} + +static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) +{ + bool do_wakeup = false; + struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *tmp; + + spin_lock(&nn->nfsd_ssc_lock); + list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { + if (time_after(jiffies, ni->nsui_expire)) { + if (refcount_read(&ni->nsui_refcnt) > 1) + continue; + + /* mark being unmount */ + ni->nsui_busy = true; + spin_unlock(&nn->nfsd_ssc_lock); + mntput(ni->nsui_vfsmount); + spin_lock(&nn->nfsd_ssc_lock); + + /* waiters need to start from begin of list */ + list_del(&ni->nsui_list); + kfree(ni); + + /* wakeup ssc_connect waiters */ + do_wakeup = true; + continue; + } + break; + } + if (do_wakeup) + wake_up_all(&nn->nfsd_ssc_waitq); + spin_unlock(&nn->nfsd_ssc_lock); +} +#endif + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5631,6 +5695,10 @@ nfs4_laundromat(struct nfsd_net *nn) list_del_init(&nbl->nbl_lru); free_blocked_lock(nbl); } +#ifdef CONFIG_NFSD_V4_2_INTER_SSC + /* service the server-to-server copy delayed unmount list */ + nfsd4_ssc_expire_umount(nn); +#endif out: return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); } @@ -7546,6 +7614,9 @@ nfs4_state_shutdown_net(struct net *net) nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); +#ifdef CONFIG_NFSD_V4_2_INTER_SSC + nfsd4_ssc_shutdown_umount(nn); +#endif } void diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 14dbfa75059d..9664303afdaf 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -484,6 +484,10 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) extern int nfsd4_is_junction(struct dentry *dentry); extern int register_cld_notifier(void); extern void unregister_cld_notifier(void); +#ifdef CONFIG_NFSD_V4_2_INTER_SSC +extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); +#endif + #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) { diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 731d89898903..373695cc62a7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -403,6 +403,9 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred) if (ret) goto out_filecache; +#ifdef CONFIG_NFSD_V4_2_INTER_SSC + nfsd4_ssc_init_umount_work(nn); +#endif nn->nfsd_net_up = true; return 0; diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index f5ba0fbff72f..222ae8883e85 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -8,6 +8,7 @@ */ #include +#include extern struct nfs_ssc_client_ops_tbl nfs_ssc_client_tbl; @@ -52,6 +53,19 @@ static inline void nfs42_ssc_close(struct file *filep) if (nfs_ssc_client_tbl.ssc_nfs4_ops) (*nfs_ssc_client_tbl.ssc_nfs4_ops->sco_close)(filep); } + +struct nfsd4_ssc_umount_item { + struct list_head nsui_list; + bool nsui_busy; + /* + * nsui_refcnt inited to 2, 1 on list and 1 for consumer. Entry + * is removed when refcnt drops to 1 and nsui_expire expires. + */ + refcount_t nsui_refcnt; + unsigned long nsui_expire; + struct vfsmount *nsui_vfsmount; + char nsui_ipaddr[RPC_MAX_ADDRBUFLEN]; +}; #endif /* From dbc0aa47959567b428a5a9fa8c9d421dd3c2678b Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 25 May 2021 14:53:44 -0400 Subject: [PATCH 0593/1565] nfsd: move fsnotify on client creation outside spinlock [ Upstream commit 934bd07fae7e55232845f909f78873ab8678ca74 ] This was causing a "sleeping function called from invalid context" warning. I don't think we need the set_and_test_bit() here; clients move from unconfirmed to confirmed only once, under the client_lock. The (conf == unconf) is a way to check whether we're in that confirming case, hopefully that's not too obscure. Fixes: 472d155a0631 "nfsd: report client confirmation status in "info" file" Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3dd6e25d5d90..4e14a9f6dfd3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2846,11 +2846,8 @@ move_to_confirmed(struct nfs4_client *clp) list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); - if (!test_and_set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) { - trace_nfsd_clid_confirmed(&clp->cl_clientid); - if (clp->cl_nfsd_dentry && clp->cl_nfsd_info_dentry) - fsnotify_dentry(clp->cl_nfsd_info_dentry, FS_MODIFY); - } + set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags); + trace_nfsd_clid_confirmed(&clp->cl_clientid); renew_client_locked(clp); } @@ -3509,6 +3506,8 @@ nfsd4_create_session(struct svc_rqst *rqstp, /* cache solo and embedded create sessions under the client_lock */ nfsd4_cache_create_session(cr_ses, cs_slot, status); spin_unlock(&nn->client_lock); + if (conf == unconf) + fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); /* init connection and backchannel */ nfsd4_init_conn(rqstp, conn, new); nfsd4_put_session(new); @@ -4129,6 +4128,8 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, } get_client_locked(conf); spin_unlock(&nn->client_lock); + if (conf == unconf) + fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); nfsd4_probe_callback(conf); spin_lock(&nn->client_lock); put_client_renew_locked(conf); From 22b7c93d967462cfcdfbe130327f2e4aa6ee8875 Mon Sep 17 00:00:00 2001 From: Dave Wysochanski Date: Wed, 2 Jun 2021 13:51:39 -0400 Subject: [PATCH 0594/1565] nfsd4: Expose the callback address and state of each NFS4 client [ Upstream commit 3518c8666f15cdd5d38878005dab1d589add1c19 ] In addition to the client's address, display the callback channel state and address in the 'info' file. Signed-off-by: Dave Wysochanski Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4e14a9f6dfd3..a20cdb191004 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2376,6 +2376,21 @@ static void seq_quote_mem(struct seq_file *m, char *data, int len) seq_printf(m, "\""); } +static const char *cb_state2str(int state) +{ + switch (state) { + case NFSD4_CB_UP: + return "UP"; + case NFSD4_CB_UNKNOWN: + return "UNKNOWN"; + case NFSD4_CB_DOWN: + return "DOWN"; + case NFSD4_CB_FAULT: + return "FAULT"; + } + return "UNDEFINED"; +} + static int client_info_show(struct seq_file *m, void *v) { struct inode *inode = m->private; @@ -2404,6 +2419,8 @@ static int client_info_show(struct seq_file *m, void *v) seq_printf(m, "\nImplementation time: [%lld, %ld]\n", clp->cl_nii_time.tv_sec, clp->cl_nii_time.tv_nsec); } + seq_printf(m, "callback state: %s\n", cb_state2str(clp->cl_cb_state)); + seq_printf(m, "callback address: %pISpc\n", &clp->cl_cb_conn.cb_addr); drop_client(clp); return 0; From c5a305d93e6b8a4e0609d1af3e58c5da35c9c175 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Thu, 3 Jun 2021 20:02:26 -0400 Subject: [PATCH 0595/1565] nfsd: fix kernel test robot warning in SSC code [ Upstream commit f47dc2d3013c65631bf8903becc7d88dc9d9966e ] Fix by initializing pointer nfsd4_ssc_umount_item with NULL instead of 0. Replace return value of nfsd4_ssc_setup_dul with __be32 instead of int. Reported-by: kernel test robot Signed-off-by: Dai Ngo Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4state.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 573c550e7ace..598b54893f83 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1178,7 +1178,7 @@ extern void nfs_sb_deactive(struct super_block *sb); /* * setup a work entry in the ssc delayed unmount list. */ -static int nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, +static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) { struct nfsd4_ssc_umount_item *ni = 0; @@ -1395,7 +1395,7 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, bool found = false; long timeout; struct nfsd4_ssc_umount_item *tmp; - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); nfs42_ssc_close(src->nf_file); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a20cdb191004..401f0f274371 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5556,7 +5556,7 @@ EXPORT_SYMBOL_GPL(nfsd4_ssc_init_umount_work); */ static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) { - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *tmp; spin_lock(&nn->nfsd_ssc_lock); From 2983611a663e1ab84eddda8a1752e6f43b6f6246 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Fri, 4 Jun 2021 10:12:37 +0000 Subject: [PATCH 0596/1565] NFSD: Fix error return code in nfsd4_interssc_connect() [ Upstream commit 54185267e1fe476875e649bb18e1c4254c123305 ] 'status' has been overwritten to 0 after nfsd4_ssc_setup_dul(), this cause 0 will be return in vfs_kern_mount() error case. Fix to return nfserr_nodev in this error. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Reported-by: Hulk Robot Signed-off-by: Wei Yongjun Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 598b54893f83..f7ddfa204abc 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1326,6 +1326,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data); module_put(type->owner); if (IS_ERR(ss_mnt)) { + status = nfserr_nodev; if (work) nfsd4_ssc_cancel_dul_work(nn, work); goto out_free_devname; From c58120ab476503cb6fcb4ecab052aaa56309c9b5 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 14 Jun 2021 11:20:49 -0400 Subject: [PATCH 0597/1565] nfsd: rpc_peeraddr2str needs rcu lock [ Upstream commit 05570a2b01117209b500e1989ce8f1b0524c489f ] I'm not even sure cl_xprt can change here, but we're getting "suspicious RCU usage" warnings, and other rpc_peeraddr2str callers are taking the rcu lock. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 59dc80ecd376..97f517e9b418 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -941,8 +941,10 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c clp->cl_cb_conn.cb_xprt = conn->cb_xprt; clp->cl_cb_client = client; clp->cl_cb_cred = cred; + rcu_read_lock(); trace_nfsd_cb_setup(clp, rpc_peeraddr2str(client, RPC_DISPLAY_NETID), args.authflavor); + rcu_read_unlock(); return 0; } From eeea3b96d15040a023008cb5c83738aa49260b9f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:50:40 -0400 Subject: [PATCH 0598/1565] lockd: Remove stale comments [ Upstream commit 99cdf57b33e68df7afc876739c93a11f0b1ba807 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/lockd/xdr.h | 6 ------ include/linux/lockd/xdr4.h | 7 +------ 2 files changed, 1 insertion(+), 12 deletions(-) diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 7ab9f264313f..a98309c0121c 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -109,11 +109,5 @@ int nlmsvc_decode_shareargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); int nlmsvc_decode_notify(struct svc_rqst *, __be32 *); int nlmsvc_decode_reboot(struct svc_rqst *, __be32 *); -/* -int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); - */ #endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index e709fe5924f2..5ae766f26e04 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -37,12 +37,7 @@ int nlm4svc_decode_shareargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); int nlm4svc_decode_notify(struct svc_rqst *, __be32 *); int nlm4svc_decode_reboot(struct svc_rqst *, __be32 *); -/* -int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); -int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); - */ + extern const struct rpc_version nlm_version4; #endif /* LOCKD_XDR4_H */ From 3595ff1c2caa0b81bc2b3885e1b7589dd43b9a4c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:50:46 -0400 Subject: [PATCH 0599/1565] lockd: Create a simplified .vs_dispatch method for NLM requests [ Upstream commit a9ad1a8090f58b2ed1774dd0f4c7cdb8210a3793 ] To enable xdr_stream-based encoding and decoding, create a bespoke RPC dispatch function for the lockd service. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 1a639e34847d..2de048f80eb8 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -766,6 +766,46 @@ static void __exit exit_nlm(void) module_init(init_nlm); module_exit(exit_nlm); +/** + * nlmsvc_dispatch - Process an NLM Request + * @rqstp: incoming request + * @statp: pointer to location of accept_stat field in RPC Reply buffer + * + * Return values: + * %0: Processing complete; do not send a Reply + * %1: Processing complete; send Reply in rqstp->rq_res + */ +static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) +{ + const struct svc_procedure *procp = rqstp->rq_procinfo; + struct kvec *argv = rqstp->rq_arg.head; + struct kvec *resv = rqstp->rq_res.head; + + svcxdr_init_decode(rqstp); + if (!procp->pc_decode(rqstp, argv->iov_base)) + goto out_decode_err; + + *statp = procp->pc_func(rqstp); + if (*statp == rpc_drop_reply) + return 0; + if (*statp != rpc_success) + return 1; + + svcxdr_init_encode(rqstp); + if (!procp->pc_encode(rqstp, resv->iov_base + resv->iov_len)) + goto out_encode_err; + + return 1; + +out_decode_err: + *statp = rpc_garbage_args; + return 1; + +out_encode_err: + *statp = rpc_system_err; + return 1; +} + /* * Define NLM program and procedures */ @@ -775,6 +815,7 @@ static const struct svc_version nlmsvc_version1 = { .vs_nproc = 17, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version1_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; static unsigned int nlmsvc_version3_count[24]; @@ -783,6 +824,7 @@ static const struct svc_version nlmsvc_version3 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version3_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #ifdef CONFIG_LOCKD_V4 @@ -792,6 +834,7 @@ static const struct svc_version nlmsvc_version4 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures4, .vs_count = nlmsvc_version4_count, + .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #endif From fa4b890c0da06873b0cb23ac96746fa180e8f3e9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:50:52 -0400 Subject: [PATCH 0600/1565] lockd: Common NLM XDR helpers [ Upstream commit a6a63ca5652ea05637ecfe349f9e895031529556 ] Add a .h file containing xdr_stream-based XDR helpers common to both NLMv3 and NLMv4. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcxdr.h | 151 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 fs/lockd/svcxdr.h diff --git a/fs/lockd/svcxdr.h b/fs/lockd/svcxdr.h new file mode 100644 index 000000000000..c69a0bb76c94 --- /dev/null +++ b/fs/lockd/svcxdr.h @@ -0,0 +1,151 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Encode/decode NLM basic data types + * + * Basic NLMv3 XDR data types are not defined in an IETF standards + * document. X/Open has a description of these data types that + * is useful. See Chapter 10 of "Protocols for Interworking: + * XNFS, Version 3W". + * + * Basic NLMv4 XDR data types are defined in Appendix II.1.4 of + * RFC 1813: "NFS Version 3 Protocol Specification". + * + * Author: Chuck Lever + * + * Copyright (c) 2020, Oracle and/or its affiliates. + */ + +#ifndef _LOCKD_SVCXDR_H_ +#define _LOCKD_SVCXDR_H_ + +static inline bool +svcxdr_decode_stats(struct xdr_stream *xdr, __be32 *status) +{ + __be32 *p; + + p = xdr_inline_decode(xdr, XDR_UNIT); + if (!p) + return false; + *status = *p; + + return true; +} + +static inline bool +svcxdr_encode_stats(struct xdr_stream *xdr, __be32 status) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT); + if (!p) + return false; + *p = status; + + return true; +} + +static inline bool +svcxdr_decode_string(struct xdr_stream *xdr, char **data, unsigned int *data_len) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NLM_MAXSTRLEN) + return false; + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + *data_len = len; + *data = (char *)p; + + return true; +} + +/* + * NLM cookies are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * limits their length to 32 bytes, and treats zero-length cookies + * specially. + */ +static inline bool +svcxdr_decode_cookie(struct xdr_stream *xdr, struct nlm_cookie *cookie) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NLM_MAXCOOKIELEN) + return false; + if (!len) + goto out_hpux; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + cookie->len = len; + memcpy(cookie->data, p, len); + + return true; + + /* apparently HPUX can return empty cookies */ +out_hpux: + cookie->len = 4; + memset(cookie->data, 0, 4); + return true; +} + +static inline bool +svcxdr_encode_cookie(struct xdr_stream *xdr, const struct nlm_cookie *cookie) +{ + __be32 *p; + + if (xdr_stream_encode_u32(xdr, cookie->len) < 0) + return false; + p = xdr_reserve_space(xdr, cookie->len); + if (!p) + return false; + memcpy(p, cookie->data, cookie->len); + + return true; +} + +static inline bool +svcxdr_decode_owner(struct xdr_stream *xdr, struct xdr_netobj *obj) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > XDR_MAX_NETOBJ) + return false; + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + obj->len = len; + obj->data = (u8 *)p; + + return true; +} + +static inline bool +svcxdr_encode_owner(struct xdr_stream *xdr, const struct xdr_netobj *obj) +{ + unsigned int quadlen = XDR_QUADLEN(obj->len); + __be32 *p; + + if (xdr_stream_encode_u32(xdr, obj->len) < 0) + return false; + p = xdr_reserve_space(xdr, obj->len); + if (!p) + return false; + p[quadlen - 1] = 0; /* XDR pad */ + memcpy(p, obj->data, obj->len); + + return true; +} + +#endif /* _LOCKD_SVCXDR_H_ */ From 4c3f448aaa0bb251663c363b10d17a051e3b0e63 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:50:58 -0400 Subject: [PATCH 0601/1565] lockd: Update the NLMv1 void argument decoder to use struct xdr_stream [ Upstream commit cc1029b51273da5b342683e9ae14ab4eeaa15997 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 982629f7b120..8be42a23679e 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -19,6 +19,8 @@ #include +#include "svcxdr.h" + #define NLMDBG_FACILITY NLMDBG_XDR @@ -178,8 +180,15 @@ nlm_encode_testres(__be32 *p, struct nlm_res *resp) /* - * First, the server side XDR functions + * Decode Call arguments */ + +int +nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { @@ -339,12 +348,6 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 8f9f41ebfa176ba6f61de7a84eefdf0be265d40d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:04 -0400 Subject: [PATCH 0602/1565] lockd: Update the NLMv1 TEST arguments decoder to use struct xdr_stream [ Upstream commit 2fd0c67aabcf0f8821450b00ee511faa0b7761bf ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 72 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 6 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 8be42a23679e..56982edd4766 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -98,6 +98,33 @@ nlm_decode_fh(__be32 *p, struct nfs_fh *f) return p + XDR_QUADLEN(NFS2_FHSIZE); } +/* + * NLM file handles are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * constrains their length to exactly the length of an NFSv2 file + * handle. + */ +static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len != NFS2_FHSIZE) + return false; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + fh->size = NFS2_FHSIZE; + memcpy(fh->data, p, len); + memset(fh->data + NFS2_FHSIZE, 0, sizeof(fh->data) - NFS2_FHSIZE); + + return true; +} + /* * Encode and decode owner handle */ @@ -143,6 +170,38 @@ nlm_decode_lock(__be32 *p, struct nlm_lock *lock) return p; } +static bool +svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +{ + struct file_lock *fl = &lock->fl; + s32 start, len, end; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return false; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return false; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return false; + if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) + return false; + if (xdr_stream_decode_u32(xdr, &start) < 0) + return false; + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + + locks_init_lock(fl); + fl->fl_flags = FL_POSIX; + fl->fl_type = F_RDLCK; + end = start + len - 1; + fl->fl_start = s32_to_loff_t(start); + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = s32_to_loff_t(end); + + return true; +} + /* * Encode result of a TEST/TEST_MSG call */ @@ -192,19 +251,20 @@ nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!(p = nlm_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); + return 1; } int From 3e8675ff1ebc87d144ea3b875e5627c71484782f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:10 -0400 Subject: [PATCH 0603/1565] lockd: Update the NLMv1 LOCK arguments decoder to use struct xdr_stream [ Upstream commit c1adb8c672ca2b085c400695ef064547d77eda29 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 47 ++++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 56982edd4766..8a9f02e45df2 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -267,6 +267,32 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + argp->monitor = 1; /* monitor client by default */ + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -277,27 +303,6 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; - - if (!(p = nlm_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) - return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - argp->reclaim = ntohl(*p++); - argp->state = ntohl(*p++); - argp->monitor = 1; /* monitor client by default */ - - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { From 2025b3acf6555114c13850bf32f6e7ea5365c1bb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:16 -0400 Subject: [PATCH 0604/1565] lockd: Update the NLMv1 CANCEL arguments decoder to use struct xdr_stream [ Upstream commit f4e08f3ac8c4945ea54a740e3afcf44b34e7cf44 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 38 +++++++++++++++++++++----------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 8a9f02e45df2..ef38f07d1224 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -293,6 +293,27 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -303,23 +324,6 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; - - if (!(p = nlm_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm_decode_lock(p, &argp->lock))) - return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { From b4ea38d69d891dec5025a27e3e81b2de3f871d6d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:22 -0400 Subject: [PATCH 0605/1565] lockd: Update the NLMv1 UNLOCK arguments decoder to use struct xdr_stream [ Upstream commit c27045d302b022ed11d24a2653bceb6af56c6327 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 57 +++++++++++++------------------------------------- 1 file changed, 15 insertions(+), 42 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index ef38f07d1224..b59e02b4417c 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -140,36 +140,6 @@ nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) return xdr_encode_netobj(p, oh); } -static __be32 * -nlm_decode_lock(__be32 *p, struct nlm_lock *lock) -{ - struct file_lock *fl = &lock->fl; - s32 start, len, end; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, - NLM_MAXSTRLEN)) - || !(p = nlm_decode_fh(p, &lock->fh)) - || !(p = nlm_decode_oh(p, &lock->oh))) - return NULL; - lock->svid = ntohl(*p++); - - locks_init_lock(fl); - fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; /* as good as anything else */ - start = ntohl(*p++); - len = ntohl(*p++); - end = start + len - 1; - - fl->fl_start = s32_to_loff_t(start); - - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s32_to_loff_t(end); - return p; -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -314,6 +284,21 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + argp->lock.fl.fl_type = F_UNLCK; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -324,18 +309,6 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - - if (!(p = nlm_decode_cookie(p, &argp->cookie)) - || !(p = nlm_decode_lock(p, &argp->lock))) - return 0; - argp->lock.fl.fl_type = F_UNLCK; - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { From 90f483a775446fe4f731824fab7a4bfb71469ab5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:28 -0400 Subject: [PATCH 0606/1565] lockd: Update the NLMv1 nlm_res arguments decoder to use struct xdr_stream [ Upstream commit 16ddcabe6240c4fb01c97f6fce6c35ddf8626ad5 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index b59e02b4417c..911b6377a6da 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -299,6 +299,20 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_res *resp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &resp->cookie)) + return 0; + if (!svcxdr_decode_stats(xdr, &resp->status)) + return 0; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -379,17 +393,6 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!(p = nlm_decode_cookie(p, &resp->cookie))) - return 0; - resp->status = *p++; - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 56c936af53e3de2ad5ee3de70ebd811d0dca2541 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:34 -0400 Subject: [PATCH 0607/1565] lockd: Update the NLMv1 SM_NOTIFY arguments decoder to use struct xdr_stream [ Upstream commit 137e05e2f735f696e117553f7fa5ef8fb09953e1 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 911b6377a6da..421613170e5f 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -313,6 +313,32 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_reboot *argp = rqstp->rq_argp; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return 0; + if (len > SM_MAXSTRLEN) + return 0; + p = xdr_inline_decode(xdr, len); + if (!p) + return 0; + argp->len = len; + argp->mon = (char *)p; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + p = xdr_inline_decode(xdr, SM_PRIV_SIZE); + if (!p) + return 0; + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -380,19 +406,6 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - - if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - p += XDR_QUADLEN(SM_PRIV_SIZE); - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 6c522daf60925469b5b302a5cb4cf238e392fc23 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:40 -0400 Subject: [PATCH 0608/1565] lockd: Update the NLMv1 SHARE arguments decoder to use struct xdr_stream [ Upstream commit 890939e1266b9adf3b0acd5e0385b39813cb8f11 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 100 ++++++++++++++----------------------------------- 1 file changed, 28 insertions(+), 72 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 421613170e5f..c496f18eff06 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -21,8 +21,6 @@ #include "svcxdr.h" -#define NLMDBG_FACILITY NLMDBG_XDR - static inline loff_t s32_to_loff_t(__s32 offset) @@ -46,33 +44,6 @@ loff_t_to_s32(loff_t offset) /* * XDR functions for basic NLM types */ -static __be32 *nlm_decode_cookie(__be32 *p, struct nlm_cookie *c) -{ - unsigned int len; - - len = ntohl(*p++); - - if(len==0) - { - c->len=4; - memset(c->data, 0, 4); /* hockeypux brain damage */ - } - else if(len<=NLM_MAXCOOKIELEN) - { - c->len=len; - memcpy(c->data, p, len); - p+=XDR_QUADLEN(len); - } - else - { - dprintk("lockd: bad cookie size %d (only cookies under " - "%d bytes are supported.)\n", - len, NLM_MAXCOOKIELEN); - return NULL; - } - return p; -} - static inline __be32 * nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) { @@ -82,22 +53,6 @@ nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) return p; } -static __be32 * -nlm_decode_fh(__be32 *p, struct nfs_fh *f) -{ - unsigned int len; - - if ((len = ntohl(*p++)) != NFS2_FHSIZE) { - dprintk("lockd: bad fhandle size %d (should be %d)\n", - len, NFS2_FHSIZE); - return NULL; - } - f->size = NFS2_FHSIZE; - memset(f->data, 0, sizeof(f->data)); - memcpy(f->data, p, NFS2_FHSIZE); - return p + XDR_QUADLEN(NFS2_FHSIZE); -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -128,12 +83,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) /* * Encode and decode owner handle */ -static inline __be32 * -nlm_decode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_decode_netobj(p, oh); -} - static inline __be32 * nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) { @@ -339,6 +288,34 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + memset(lock, 0, sizeof(*lock)); + locks_init_lock(&lock->fl); + lock->svid = ~(u32)0; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return 0; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return 0; + /* XXX: Range checks are missing in the original code */ + if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) + return 0; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -349,27 +326,6 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - memset(lock, 0, sizeof(*lock)); - locks_init_lock(&lock->fl); - lock->svid = ~(u32) 0; - - if (!(p = nlm_decode_cookie(p, &argp->cookie)) - || !(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm_decode_fh(p, &lock->fh)) - || !(p = nlm_decode_oh(p, &lock->oh))) - return 0; - argp->fsm_mode = ntohl(*p++); - argp->fsm_access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { From 02a3c81665ac71960794b27c34a1ef0280d5cbbc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:46 -0400 Subject: [PATCH 0609/1565] lockd: Update the NLMv1 FREE_ALL arguments decoder to use struct xdr_stream [ Upstream commit 14e105256b9dcdf50a003e2e9a0da77e06770a4b ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index c496f18eff06..091c8c463ab4 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -316,6 +316,21 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -349,19 +364,6 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From e6c92714e9a6007d6a804d2598c09c7599009611 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:52 -0400 Subject: [PATCH 0610/1565] lockd: Update the NLMv1 void results encoder to use struct xdr_stream [ Upstream commit e26ec898b68b2ab64f379ba0fc0a615b2ad41f40 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 091c8c463ab4..840fa8ff8426 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -331,6 +331,17 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return 1; } + +/* + * Encode Reply results + */ + +int +nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -363,9 +374,3 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) *p++ = resp->status; return xdr_ressize_check(rqstp, p); } - -int -nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} From d0ddd21bd52c23c9bc524660823cd92893f21a6a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:51:58 -0400 Subject: [PATCH 0611/1565] lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream [ Upstream commit adf98a4850b9ede9fc174c78a885845fb08499a5 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 74 ++++++++++++++++++++++++-------------------------- 1 file changed, 35 insertions(+), 39 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 840fa8ff8426..daf3524040d6 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -80,15 +80,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) return true; } -/* - * Encode and decode owner handle - */ -static inline __be32 * -nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_encode_netobj(p, oh); -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -121,39 +112,44 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return true; } -/* - * Encode result of a TEST/TEST_MSG call - */ -static __be32 * -nlm_encode_testres(__be32 *p, struct nlm_res *resp) +static bool +svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) { - s32 start, len; + const struct file_lock *fl = &lock->fl; + s32 start, len; - if (!(p = nlm_encode_cookie(p, &resp->cookie))) - return NULL; - *p++ = resp->status; + /* exclusive */ + if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) + return false; + if (xdr_stream_encode_u32(xdr, lock->svid) < 0) + return false; + if (!svcxdr_encode_owner(xdr, &lock->oh)) + return false; + start = loff_t_to_s32(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); + if (xdr_stream_encode_u32(xdr, start) < 0) + return false; + if (xdr_stream_encode_u32(xdr, len) < 0) + return false; - if (resp->status == nlm_lck_denied) { - struct file_lock *fl = &resp->lock.fl; + return true; +} - *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; - *p++ = htonl(resp->lock.svid); - - /* Encode owner handle. */ - if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) - return NULL; - - start = loff_t_to_s32(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; - else - len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); - - *p++ = htonl(start); - *p++ = htonl(len); +static bool +svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) +{ + if (!svcxdr_encode_stats(xdr, resp->status)) + return false; + switch (resp->status) { + case nlm_lck_denied: + if (!svcxdr_encode_holder(xdr, &resp->lock)) + return false; } - return p; + return true; } @@ -345,11 +341,11 @@ nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; - if (!(p = nlm_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_testrply(xdr, resp); } int From b049476790166a844aa37cb550ef1e8c3fe73ed6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:04 -0400 Subject: [PATCH 0612/1565] lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream [ Upstream commit e96735a6980574ecbdb24c760b8d294095e47074 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index daf3524040d6..4fb6090bc915 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -348,6 +348,16 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) svcxdr_encode_testrply(xdr, resp); } +int +nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_res_stream; + struct nlm_res *resp = rqstp->rq_resp; + + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_stats(xdr, resp->status); +} + int nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { @@ -359,14 +369,3 @@ nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) *p++ = xdr_zero; /* sequence argument */ return xdr_ressize_check(rqstp, p); } - -int -nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm_encode_cookie(p, &resp->cookie))) - return 0; - *p++ = resp->status; - return xdr_ressize_check(rqstp, p); -} From 45c1384bd767026dff0bef2a716ed92984980509 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:10 -0400 Subject: [PATCH 0613/1565] lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream [ Upstream commit 529ca3a116e8978575fec061a71fa6865a344891 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 4fb6090bc915..9235e60b1769 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -41,18 +41,6 @@ loff_t_to_s32(loff_t offset) return res; } -/* - * XDR functions for basic NLM types - */ -static inline __be32 * -nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) -{ - *p++ = htonl(c->len); - memcpy(p, c->data, c->len); - p+=XDR_QUADLEN(c->len); - return p; -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -361,11 +349,16 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) int nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; - if (!(p = nlm_encode_cookie(p, &resp->cookie))) + if (!svcxdr_encode_cookie(xdr, &resp->cookie)) return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stats(xdr, resp->status)) + return 0; + /* sequence */ + if (xdr_stream_encode_u32(xdr, 0) < 0) + return 0; + + return 1; } From c8f404825085edcc5cffc744ad35562a643d95d0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:16 -0400 Subject: [PATCH 0614/1565] lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream [ Upstream commit 7956521aac58e434a05cf3c68c1b66c1312e5649 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 5fa9f48a9dba..d0960a8551f8 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -18,6 +18,8 @@ #include #include +#include "svcxdr.h" + #define NLMDBG_FACILITY NLMDBG_XDR static inline loff_t @@ -175,8 +177,15 @@ nlm4_encode_testres(__be32 *p, struct nlm_res *resp) /* - * First, the server side XDR functions + * Decode Call arguments */ + +int +nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { @@ -336,12 +345,6 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 360159aafa8b0717db40c3c188f4eab33d4fc9b2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:22 -0400 Subject: [PATCH 0615/1565] lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream [ Upstream commit 345b4159a075b15dc4ae70f1db90fa8abf85d2e7 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 72 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 6 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index d0960a8551f8..cf64794fdc1f 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -96,6 +96,32 @@ nlm4_decode_fh(__be32 *p, struct nfs_fh *f) return p + XDR_QUADLEN(f->size); } +/* + * NLM file handles are defined by specification to be a variable-length + * XDR opaque no longer than 1024 bytes. However, this implementation + * limits their length to the size of an NFSv3 file handle. + */ +static bool +svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +{ + __be32 *p; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return false; + if (len > NFS_MAXFHSIZE) + return false; + + p = xdr_inline_decode(xdr, len); + if (!p) + return false; + fh->size = len; + memcpy(fh->data, p, len); + memset(fh->data + len, 0, sizeof(fh->data) - len); + + return true; +} + /* * Encode and decode owner handle */ @@ -135,6 +161,39 @@ nlm4_decode_lock(__be32 *p, struct nlm_lock *lock) return p; } +static bool +svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +{ + struct file_lock *fl = &lock->fl; + u64 len, start; + s64 end; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return false; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return false; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return false; + if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) + return false; + if (xdr_stream_decode_u64(xdr, &start) < 0) + return false; + if (xdr_stream_decode_u64(xdr, &len) < 0) + return false; + + locks_init_lock(fl); + fl->fl_flags = FL_POSIX; + fl->fl_type = F_RDLCK; + end = start + len - 1; + fl->fl_start = s64_to_loff_t(start); + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = s64_to_loff_t(end); + + return true; +} + /* * Encode result of a TEST/TEST_MSG call */ @@ -189,19 +248,20 @@ nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) return 0; - - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); + return 1; } int From 101d45274abae2ad4d9021b2305be22ab63c819a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:28 -0400 Subject: [PATCH 0616/1565] lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream [ Upstream commit 0e5977af4fdc277984fca7d8c2e0c880935775a0 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 47 ++++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index cf64794fdc1f..1d3e780c25fd 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -264,6 +264,32 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + argp->monitor = 1; /* monitor client by default */ + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -274,27 +300,6 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; - - if (!(p = nlm4_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) - return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - argp->reclaim = ntohl(*p++); - argp->state = ntohl(*p++); - argp->monitor = 1; /* monitor client by default */ - - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { From 0652983fbe1810066c531f1abe1cf0a7be82dd12 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:34 -0400 Subject: [PATCH 0617/1565] lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream [ Upstream commit 1e1f38dcf3c031715191e1fd26f70a0affca4dbd ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 37 ++++++++++++++++++++----------------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 1d3e780c25fd..37d45f1d7199 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -290,6 +290,26 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + u32 exclusive; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (xdr_stream_decode_bool(xdr, &argp->block) < 0) + return 0; + if (xdr_stream_decode_bool(xdr, &exclusive) < 0) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + if (exclusive) + argp->lock.fl.fl_type = F_WRLCK; + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -300,23 +320,6 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; - - if (!(p = nlm4_decode_cookie(p, &argp->cookie))) - return 0; - argp->block = ntohl(*p++); - exclusive = ntohl(*p++); - if (!(p = nlm4_decode_lock(p, &argp->lock))) - return 0; - if (exclusive) - argp->lock.fl.fl_type = F_WRLCK; - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { From 9e1daae6303a550e6e9c1e138ac44aa59bc27c2f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:40 -0400 Subject: [PATCH 0618/1565] lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream [ Upstream commit d76d8c25cea794f65615f3a2324052afa4b5f900 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 57 +++++++++++++------------------------------------ 1 file changed, 15 insertions(+), 42 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 37d45f1d7199..47a87ea4a99b 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -131,36 +131,6 @@ nlm4_decode_oh(__be32 *p, struct xdr_netobj *oh) return xdr_decode_netobj(p, oh); } -static __be32 * -nlm4_decode_lock(__be32 *p, struct nlm_lock *lock) -{ - struct file_lock *fl = &lock->fl; - __u64 len, start; - __s64 end; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm4_decode_fh(p, &lock->fh)) - || !(p = nlm4_decode_oh(p, &lock->oh))) - return NULL; - lock->svid = ntohl(*p++); - - locks_init_lock(fl); - fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; /* as good as anything else */ - p = xdr_decode_hyper(p, &start); - p = xdr_decode_hyper(p, &len); - end = start + len - 1; - - fl->fl_start = s64_to_loff_t(start); - - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s64_to_loff_t(end); - return p; -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -310,6 +280,21 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (!svcxdr_decode_lock(xdr, &argp->lock)) + return 0; + argp->lock.fl.fl_type = F_UNLCK; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -320,18 +305,6 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - - if (!(p = nlm4_decode_cookie(p, &argp->cookie)) - || !(p = nlm4_decode_lock(p, &argp->lock))) - return 0; - argp->lock.fl.fl_type = F_UNLCK; - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { From 33f31f6e85d1ec9cb1cf9f8aadb800f340920bf4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:46 -0400 Subject: [PATCH 0619/1565] lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream [ Upstream commit b4c24b5a41da63e5f3a9b6ea56cbe2a1efe49579 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 47a87ea4a99b..6bd3bfb69ed7 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -295,6 +295,20 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_res *resp = rqstp->rq_argp; + + if (!svcxdr_decode_cookie(xdr, &resp->cookie)) + return 0; + if (!svcxdr_decode_stats(xdr, &resp->status)) + return 0; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -375,17 +389,6 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!(p = nlm4_decode_cookie(p, &resp->cookie))) - return 0; - resp->status = *p++; - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 300a4b1632c34bd7d9a0d48a0bd3c93929719d4e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:52 -0400 Subject: [PATCH 0620/1565] lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream [ Upstream commit bc3665fd718b325cfff3abd383b00d1a87e028dc ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 39 ++++++++++++++++++++++++++------------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 6bd3bfb69ed7..2dbf82c2726b 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -309,6 +309,32 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_reboot *argp = rqstp->rq_argp; + u32 len; + + if (xdr_stream_decode_u32(xdr, &len) < 0) + return 0; + if (len > SM_MAXSTRLEN) + return 0; + p = xdr_inline_decode(xdr, len); + if (!p) + return 0; + argp->len = len; + argp->mon = (char *)p; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + p = xdr_inline_decode(xdr, SM_PRIV_SIZE); + if (!p) + return 0; + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -376,19 +402,6 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return xdr_argsize_check(rqstp, p); } -int -nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - - if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - p += XDR_QUADLEN(SM_PRIV_SIZE); - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 604c8a432c6c76368d7e21a2defc5cefbcc0d240 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:52:58 -0400 Subject: [PATCH 0621/1565] lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream [ Upstream commit 7cf96b6d0104b12aa30961901879e428884b1695 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 103 +++++++++++++----------------------------------- 1 file changed, 28 insertions(+), 75 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 2dbf82c2726b..e6bab1d1e41f 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -42,37 +42,6 @@ loff_t_to_s64(loff_t offset) return res; } -/* - * XDR functions for basic NLM types - */ -static __be32 * -nlm4_decode_cookie(__be32 *p, struct nlm_cookie *c) -{ - unsigned int len; - - len = ntohl(*p++); - - if(len==0) - { - c->len=4; - memset(c->data, 0, 4); /* hockeypux brain damage */ - } - else if(len<=NLM_MAXCOOKIELEN) - { - c->len=len; - memcpy(c->data, p, len); - p+=XDR_QUADLEN(len); - } - else - { - dprintk("lockd: bad cookie size %d (only cookies under " - "%d bytes are supported.)\n", - len, NLM_MAXCOOKIELEN); - return NULL; - } - return p; -} - static __be32 * nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) { @@ -82,20 +51,6 @@ nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) return p; } -static __be32 * -nlm4_decode_fh(__be32 *p, struct nfs_fh *f) -{ - memset(f->data, 0, sizeof(f->data)); - f->size = ntohl(*p++); - if (f->size > NFS_MAXFHSIZE) { - dprintk("lockd: bad fhandle size %d (should be <=%d)\n", - f->size, NFS_MAXFHSIZE); - return NULL; - } - memcpy(f->data, p, f->size); - return p + XDR_QUADLEN(f->size); -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -122,15 +77,6 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) return true; } -/* - * Encode and decode owner handle - */ -static __be32 * -nlm4_decode_oh(__be32 *p, struct xdr_netobj *oh) -{ - return xdr_decode_netobj(p, oh); -} - static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { @@ -335,6 +281,34 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + memset(lock, 0, sizeof(*lock)); + locks_init_lock(&lock->fl); + lock->svid = ~(u32)0; + + if (!svcxdr_decode_cookie(xdr, &argp->cookie)) + return 0; + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (!svcxdr_decode_fhandle(xdr, &lock->fh)) + return 0; + if (!svcxdr_decode_owner(xdr, &lock->oh)) + return 0; + /* XXX: Range checks are missing in the original code */ + if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) + return 0; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -345,27 +319,6 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - memset(lock, 0, sizeof(*lock)); - locks_init_lock(&lock->fl); - lock->svid = ~(u32) 0; - - if (!(p = nlm4_decode_cookie(p, &argp->cookie)) - || !(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN)) - || !(p = nlm4_decode_fh(p, &lock->fh)) - || !(p = nlm4_decode_oh(p, &lock->oh))) - return 0; - argp->fsm_mode = ntohl(*p++); - argp->fsm_access = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { From 0add7c13bf78f8ecc8ac74f753d368c0e3e0089e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:53:04 -0400 Subject: [PATCH 0622/1565] lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream [ Upstream commit 3049e974a7c7cfa0c15fb807f4a3e75b2ab8517a ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 28 +++++++++++++++------------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index e6bab1d1e41f..6c5383bef2bf 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -309,6 +309,21 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) return 1; } +int +nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_arg_stream; + struct nlm_args *argp = rqstp->rq_argp; + struct nlm_lock *lock = &argp->lock; + + if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) + return 0; + if (xdr_stream_decode_u32(xdr, &argp->state) < 0) + return 0; + + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -342,19 +357,6 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) return xdr_ressize_check(rqstp, p); } -int -nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_args *argp = rqstp->rq_argp; - struct nlm_lock *lock = &argp->lock; - - if (!(p = xdr_decode_string_inplace(p, &lock->caller, - &lock->len, NLM_MAXSTRLEN))) - return 0; - argp->state = ntohl(*p++); - return xdr_argsize_check(rqstp, p); -} - int nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) { From 4f5ba2e6b434d4ca5535ba43ca5a1bdb471348cd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:53:11 -0400 Subject: [PATCH 0623/1565] lockd: Update the NLMv4 void results encoder to use struct xdr_stream [ Upstream commit ec757e423b4fcd6e5ea4405d1e8243c040458d78 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 6c5383bef2bf..0db142e203d2 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -324,6 +324,17 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) return 1; } + +/* + * Encode Reply results + */ + +int +nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { @@ -356,9 +367,3 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) *p++ = resp->status; return xdr_ressize_check(rqstp, p); } - -int -nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_ressize_check(rqstp, p); -} From e1e61d647f264d9196b4ddb1ec518405c2318ee4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:53:17 -0400 Subject: [PATCH 0624/1565] lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream [ Upstream commit 1beef1473ccaa70a2d54f9e76fba5f534931ea23 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 72 ++++++++++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 37 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 0db142e203d2..9b8a7afb935c 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -20,8 +20,6 @@ #include "svcxdr.h" -#define NLMDBG_FACILITY NLMDBG_XDR - static inline loff_t s64_to_loff_t(__s64 offset) { @@ -110,44 +108,44 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return true; } -/* - * Encode result of a TEST/TEST_MSG call - */ -static __be32 * -nlm4_encode_testres(__be32 *p, struct nlm_res *resp) +static bool +svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) { - s64 start, len; + const struct file_lock *fl = &lock->fl; + s64 start, len; - dprintk("xdr: before encode_testres (p %p resp %p)\n", p, resp); - if (!(p = nlm4_encode_cookie(p, &resp->cookie))) - return NULL; - *p++ = resp->status; + /* exclusive */ + if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) + return false; + if (xdr_stream_encode_u32(xdr, lock->svid) < 0) + return false; + if (!svcxdr_encode_owner(xdr, &lock->oh)) + return false; + start = loff_t_to_s64(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); + if (xdr_stream_encode_u64(xdr, start) < 0) + return false; + if (xdr_stream_encode_u64(xdr, len) < 0) + return false; - if (resp->status == nlm_lck_denied) { - struct file_lock *fl = &resp->lock.fl; + return true; +} - *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; - *p++ = htonl(resp->lock.svid); - - /* Encode owner handle. */ - if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) - return NULL; - - start = loff_t_to_s64(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; - else - len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); - - p = xdr_encode_hyper(p, start); - p = xdr_encode_hyper(p, len); - dprintk("xdr: encode_testres (status %u pid %d type %d start %Ld end %Ld)\n", - resp->status, (int)resp->lock.svid, fl->fl_type, - (long long)fl->fl_start, (long long)fl->fl_end); +static bool +svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) +{ + if (!svcxdr_encode_stats(xdr, resp->status)) + return false; + switch (resp->status) { + case nlm_lck_denied: + if (!svcxdr_encode_holder(xdr, &resp->lock)) + return false; } - dprintk("xdr: after encode_testres (p %p resp %p)\n", p, resp); - return p; + return true; } @@ -338,11 +336,11 @@ nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; - if (!(p = nlm4_encode_testres(p, resp))) - return 0; - return xdr_ressize_check(rqstp, p); + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_testrply(xdr, resp); } int From fec07309928151ea78d37ba4e8afc303b761a57f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:53:23 -0400 Subject: [PATCH 0625/1565] lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream [ Upstream commit 447c14d48968d0d4c2733c3f8052cb63aa1deb38 ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 9b8a7afb935c..efdede71b951 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -343,6 +343,16 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) svcxdr_encode_testrply(xdr, resp); } +int +nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct xdr_stream *xdr = &rqstp->rq_res_stream; + struct nlm_res *resp = rqstp->rq_resp; + + return svcxdr_encode_cookie(xdr, &resp->cookie) && + svcxdr_encode_stats(xdr, resp->status); +} + int nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { @@ -354,14 +364,3 @@ nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) *p++ = xdr_zero; /* sequence argument */ return xdr_ressize_check(rqstp, p); } - -int -nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) -{ - struct nlm_res *resp = rqstp->rq_resp; - - if (!(p = nlm4_encode_cookie(p, &resp->cookie))) - return 0; - *p++ = resp->status; - return xdr_ressize_check(rqstp, p); -} From ce1819876203d27af97d527da7eb21f1e5f496bb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Jun 2021 16:53:29 -0400 Subject: [PATCH 0626/1565] lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream [ Upstream commit 0ff5b50ab1f7f39862d0cdf6803978d31b27f25e ] Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Sasha Levin --- fs/lockd/xdr4.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index efdede71b951..98e957e4566c 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -40,15 +40,6 @@ loff_t_to_s64(loff_t offset) return res; } -static __be32 * -nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) -{ - *p++ = htonl(c->len); - memcpy(p, c->data, c->len); - p+=XDR_QUADLEN(c->len); - return p; -} - /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -356,11 +347,16 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) int nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) { + struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; - if (!(p = nlm4_encode_cookie(p, &resp->cookie))) + if (!svcxdr_encode_cookie(xdr, &resp->cookie)) return 0; - *p++ = resp->status; - *p++ = xdr_zero; /* sequence argument */ - return xdr_ressize_check(rqstp, p); + if (!svcxdr_encode_stats(xdr, resp->status)) + return 0; + /* sequence */ + if (xdr_stream_encode_u32(xdr, 0) < 0) + return 0; + + return 1; } From 17600880e1534b8851c82014b160bc4fa7a18fe5 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 13 May 2021 16:16:39 +0100 Subject: [PATCH 0627/1565] nfsd: remove redundant assignment to pointer 'this' [ Upstream commit e34c0ce9136a0fe96f0f547898d14c44f3c9f147 ] The pointer 'this' is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f7ddfa204abc..1f840c72e978 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3369,7 +3369,7 @@ bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { struct nfsd4_compoundres *resp = rqstp->rq_resp; struct nfsd4_compoundargs *argp = rqstp->rq_argp; - struct nfsd4_op *this = &argp->ops[resp->opcnt - 1]; + struct nfsd4_op *this; struct nfsd4_compound_state *cstate = &resp->cstate; struct nfs4_op_map *allow = &cstate->clp->cl_spo_must_allow; u32 opiter; From 5610ed80e86022e13733715a381db75f724392e2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 25 Jun 2021 11:12:49 -0400 Subject: [PATCH 0628/1565] NFSD: Prevent a possible oops in the nfs_dirent() tracepoint [ Upstream commit 7b08cf62b1239a4322427d677ea9363f0ab677c6 ] The double copy of the string is a mistake, plus __assign_str() uses strlen(), which is wrong to do on a string that isn't guaranteed to be NUL-terminated. Fixes: 6019ce0742ca ("NFSD: Add a tracepoint to record directory entry encoding") Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 68a0fecdd5f4..de245f433392 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -408,7 +408,6 @@ TRACE_EVENT(nfsd_dirent, __entry->ino = ino; __entry->len = namlen; memcpy(__get_str(name), name, namlen); - __assign_str(name, name); ), TP_printk("fh_hash=0x%08x ino=%llu name=%.*s", __entry->fh_hash, __entry->ino, From e79057d15d96ef19de4de6d7e479bae3d58a2a8d Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 1 Jul 2021 20:06:56 -0400 Subject: [PATCH 0629/1565] nfsd: fix NULL dereference in nfs3svc_encode_getaclres [ Upstream commit ab1016d39cc052064e32f25ad18ef8767a0ee3b8 ] In error cases the dentry may be NULL. Before 20798dfe249a, the encoder also checked dentry and d_really_is_positive(dentry), but that looks like overkill to me--zero status should be enough to guarantee a positive dentry. This isn't the first time we've seen an error-case NULL dereference hidden in the initialization of a local variable in an xdr encoder. But I went back through the other recent rewrites and didn't spot any similar bugs. Reported-by: JianHong Yin Reviewed-by: Chuck Lever III Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder...") Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index cfb686f23e57..5e13e5f7f92b 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -170,7 +170,7 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct kvec *head = rqstp->rq_res.head; - struct inode *inode = d_inode(dentry); + struct inode *inode; unsigned int base; int n; int w; @@ -179,6 +179,7 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) return 0; switch (resp->status) { case nfs_ok: + inode = d_inode(dentry); if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) return 0; if (xdr_stream_encode_u32(xdr, resp->mask) < 0) From 774c2dbca76e9c63b63a43b059b7b5b6d0ed9440 Mon Sep 17 00:00:00 2001 From: Matthew Bobrowski Date: Sun, 8 Aug 2021 15:24:33 +1000 Subject: [PATCH 0630/1565] kernel/pid.c: remove static qualifier from pidfd_create() [ Upstream commit c576e0fcd6188d0edb50b0fb83f853433ef4819b ] With the idea of returning pidfds from the fanotify API, we need to expose a mechanism for creating pidfds. We drop the static qualifier from pidfd_create() and add its declaration to linux/pid.h so that the pidfd_create() helper can be called from other kernel subsystems i.e. fanotify. Link: https://lore.kernel.org/r/0c68653ec32f1b7143301f0231f7ed14062fd82b.1628398044.git.repnop@google.com Signed-off-by: Matthew Bobrowski Acked-by: Christian Brauner Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/pid.h | 1 + kernel/pid.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/pid.h b/include/linux/pid.h index fa10acb8d6a4..af308e15f174 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -78,6 +78,7 @@ struct file; extern struct pid *pidfd_pid(const struct file *file); struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); +int pidfd_create(struct pid *pid, unsigned int flags); static inline struct pid *get_pid(struct pid *pid) { diff --git a/kernel/pid.c b/kernel/pid.c index 4856818c9de1..74f0466757cb 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -550,10 +550,12 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) * Note, that this function can only be called after the fd table has * been unshared to avoid leaking the pidfd to the new process. * + * This symbol should not be explicitly exported to loadable modules. + * * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -static int pidfd_create(struct pid *pid, unsigned int flags) +int pidfd_create(struct pid *pid, unsigned int flags) { int fd; From 03b5d3ee505bae04c8612eda4d92b9d29899ca0d Mon Sep 17 00:00:00 2001 From: Matthew Bobrowski Date: Sun, 8 Aug 2021 15:25:05 +1000 Subject: [PATCH 0631/1565] kernel/pid.c: implement additional checks upon pidfd_create() parameters [ Upstream commit 490b9ba881e2c6337bb09b68010803ae98e59f4a ] By adding the pidfd_create() declaration to linux/pid.h, we effectively expose this function to the rest of the kernel. In order to avoid any unintended behavior, or set false expectations upon this function, ensure that constraints are forced upon each of the passed parameters. This includes the checking of whether the passed struct pid is a thread-group leader as pidfd creation is currently limited to such pid types. Link: https://lore.kernel.org/r/2e9b91c2d529d52a003b8b86c45f866153be9eb5.1628398044.git.repnop@google.com Signed-off-by: Matthew Bobrowski Acked-by: Christian Brauner Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- kernel/pid.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/pid.c b/kernel/pid.c index 74f0466757cb..0820f2c50bb0 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -559,6 +559,12 @@ int pidfd_create(struct pid *pid, unsigned int flags) { int fd; + if (!pid || !pid_has_task(pid, PIDTYPE_TGID)) + return -EINVAL; + + if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC)) + return -EINVAL; + fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid), flags | O_RDWR | O_CLOEXEC); if (fd < 0) @@ -598,10 +604,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) if (!p) return -ESRCH; - if (pid_has_task(p, PIDTYPE_TGID)) - fd = pidfd_create(p, flags); - else - fd = -EINVAL; + fd = pidfd_create(p, flags); put_pid(p); return fd; From 3ddcb1939608af343d2242efcbc099e34f340ab9 Mon Sep 17 00:00:00 2001 From: Matthew Bobrowski Date: Sun, 8 Aug 2021 15:25:32 +1000 Subject: [PATCH 0632/1565] fanotify: minor cosmetic adjustments to fid labels [ Upstream commit d3424c9bac893bd06f38a20474cd622881d384ca ] With the idea to support additional info record types in the future i.e. fanotify_event_info_pidfd, it's a good idea to rename some of the labels assigned to some of the existing fid related functions, parameters, etc which more accurately represent the intent behind their usage. For example, copy_info_to_user() was defined with a generic function label, which arguably reads as being supportive of different info record types, however the parameter list for this function is explicitly tailored towards the creation and copying of the fanotify_event_info_fid records. This same point applies to the macro defined as FANOTIFY_INFO_HDR_LEN. With fanotify_event_info_len(), we change the parameter label so that the function implies that it can be extended to calculate the length for additional info record types. Link: https://lore.kernel.org/r/7c3ec33f3c718dac40764305d4d494d858f59c51.1628398044.git.repnop@google.com Signed-off-by: Matthew Bobrowski Reviewed-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++------------- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 1c439e2fdcd8..74b51dab2cdc 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -104,7 +104,7 @@ struct kmem_cache *fanotify_path_event_cachep __read_mostly; struct kmem_cache *fanotify_perm_event_cachep __read_mostly; #define FANOTIFY_EVENT_ALIGN 4 -#define FANOTIFY_INFO_HDR_LEN \ +#define FANOTIFY_FID_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) static int fanotify_fid_info_len(int fh_len, int name_len) @@ -114,10 +114,11 @@ static int fanotify_fid_info_len(int fh_len, int name_len) if (name_len) info_len += name_len + 1; - return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN); + return roundup(FANOTIFY_FID_INFO_HDR_LEN + info_len, + FANOTIFY_EVENT_ALIGN); } -static int fanotify_event_info_len(unsigned int fid_mode, +static int fanotify_event_info_len(unsigned int info_mode, struct fanotify_event *event) { struct fanotify_info *info = fanotify_event_info(event); @@ -128,7 +129,8 @@ static int fanotify_event_info_len(unsigned int fid_mode, if (dir_fh_len) { info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); - } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { + } else if ((info_mode & FAN_REPORT_NAME) && + (event->mask & FAN_ONDIR)) { /* * With group flag FAN_REPORT_NAME, if name was not recorded in * event on a directory, we will report the name ".". @@ -303,9 +305,10 @@ static int process_access_response(struct fsnotify_group *group, return -ENOENT; } -static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, - int info_type, const char *name, size_t name_len, - char __user *buf, size_t count) +static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, + int info_type, const char *name, + size_t name_len, + char __user *buf, size_t count) { struct fanotify_event_info_fid info = { }; struct file_handle handle = { }; @@ -463,10 +466,11 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, if (fanotify_event_dir_fh_len(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; - ret = copy_info_to_user(fanotify_event_fsid(event), - fanotify_info_dir_fh(info), - info_type, fanotify_info_name(info), - info->name_len, buf, count); + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir_fh(info), + info_type, + fanotify_info_name(info), + info->name_len, buf, count); if (ret < 0) goto out_close_fd; @@ -512,9 +516,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, info_type = FAN_EVENT_INFO_TYPE_FID; } - ret = copy_info_to_user(fanotify_event_fsid(event), - fanotify_event_object_fh(event), - info_type, dot, dot_len, buf, count); + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_event_object_fh(event), + info_type, dot, dot_len, + buf, count); if (ret < 0) goto out_close_fd; From 77bc7f529abd6f3031bffbb53cfc3dc9c8e20db3 Mon Sep 17 00:00:00 2001 From: Matthew Bobrowski Date: Sun, 8 Aug 2021 15:25:58 +1000 Subject: [PATCH 0633/1565] fanotify: introduce a generic info record copying helper [ Upstream commit 0aca67bb7f0d8c997dfef8ff0bfeb0afb361f0e6 ] The copy_info_records_to_user() helper allows for the separation of info record copying routines/conditionals from copy_event_to_user(), which reduces the overall clutter within this function. This becomes especially true as we start introducing additional info records in the future i.e. struct fanotify_event_info_pidfd. On success, this helper returns the total amount of bytes that have been copied into the user supplied buffer and on error, a negative value is returned to the caller. The newly defined macro FANOTIFY_INFO_MODES can be used to obtain info record types that have been enabled for a specific notification group. This macro becomes useful in the subsequent patch when the FAN_REPORT_PIDFD initialization flag is introduced. Link: https://lore.kernel.org/r/8872947dfe12ce8ae6e9a7f2d49ea29bc8006af0.1628398044.git.repnop@google.com Signed-off-by: Matthew Bobrowski Reviewed-by: Amir Goldstein Signed-off-by: Jan Kara [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 155 ++++++++++++++++------------- include/linux/fanotify.h | 2 + 2 files changed, 90 insertions(+), 67 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 74b51dab2cdc..83405949a71b 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -173,7 +173,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; struct fsnotify_event *fsn_event; - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); pr_debug("%s: group=%p count=%zd\n", __func__, group, count); @@ -183,8 +183,8 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, goto out; event = FANOTIFY_E(fsn_event); - if (fid_mode) - event_size += fanotify_event_info_len(fid_mode, event); + if (info_mode) + event_size += fanotify_event_info_len(info_mode, event); if (event_size > count) { event = ERR_PTR(-EINVAL); @@ -401,6 +401,86 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return info_len; } +static int copy_info_records_to_user(struct fanotify_event *event, + struct fanotify_info *info, + unsigned int info_mode, + char __user *buf, size_t count) +{ + int ret, total_bytes = 0, info_type = 0; + unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; + + /* + * Event info records order is as follows: dir fid + name, child fid. + */ + if (fanotify_event_dir_fh_len(event)) { + info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : + FAN_EVENT_INFO_TYPE_DFID; + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir_fh(info), + info_type, + fanotify_info_name(info), + info->name_len, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + + if (fanotify_event_object_fh_len(event)) { + const char *dot = NULL; + int dot_len = 0; + + if (fid_mode == FAN_REPORT_FID || info_type) { + /* + * With only group flag FAN_REPORT_FID only type FID is + * reported. Second info record type is always FID. + */ + info_type = FAN_EVENT_INFO_TYPE_FID; + } else if ((fid_mode & FAN_REPORT_NAME) && + (event->mask & FAN_ONDIR)) { + /* + * With group flag FAN_REPORT_NAME, if name was not + * recorded in an event on a directory, report the name + * "." with info type DFID_NAME. + */ + info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; + dot = "."; + dot_len = 1; + } else if ((event->mask & ALL_FSNOTIFY_DIRENT_EVENTS) || + (event->mask & FAN_ONDIR)) { + /* + * With group flag FAN_REPORT_DIR_FID, a single info + * record has type DFID for directory entry modification + * event and for event on a directory. + */ + info_type = FAN_EVENT_INFO_TYPE_DFID; + } else { + /* + * With group flags FAN_REPORT_DIR_FID|FAN_REPORT_FID, + * a single info record has type FID for event on a + * non-directory, when there is no directory to report. + * For example, on FAN_DELETE_SELF event. + */ + info_type = FAN_EVENT_INFO_TYPE_FID; + } + + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_event_object_fh(event), + info_type, dot, dot_len, + buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + + return total_bytes; +} + static ssize_t copy_event_to_user(struct fsnotify_group *group, struct fanotify_event *event, char __user *buf, size_t count) @@ -408,15 +488,14 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, struct fanotify_event_metadata metadata; struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); struct file *f = NULL; int ret, fd = FAN_NOFD; - int info_type = 0; pr_debug("%s: group=%p event=%p\n", __func__, group, event); metadata.event_len = FAN_EVENT_METADATA_LEN + - fanotify_event_info_len(fid_mode, event); + fanotify_event_info_len(info_mode, event); metadata.metadata_len = FAN_EVENT_METADATA_LEN; metadata.vers = FANOTIFY_METADATA_VERSION; metadata.reserved = 0; @@ -462,69 +541,11 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->fd = fd; - /* Event info records order is: dir fid + name, child fid */ - if (fanotify_event_dir_fh_len(event)) { - info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : - FAN_EVENT_INFO_TYPE_DFID; - ret = copy_fid_info_to_user(fanotify_event_fsid(event), - fanotify_info_dir_fh(info), - info_type, - fanotify_info_name(info), - info->name_len, buf, count); + if (info_mode) { + ret = copy_info_records_to_user(event, info, info_mode, + buf, count); if (ret < 0) goto out_close_fd; - - buf += ret; - count -= ret; - } - - if (fanotify_event_object_fh_len(event)) { - const char *dot = NULL; - int dot_len = 0; - - if (fid_mode == FAN_REPORT_FID || info_type) { - /* - * With only group flag FAN_REPORT_FID only type FID is - * reported. Second info record type is always FID. - */ - info_type = FAN_EVENT_INFO_TYPE_FID; - } else if ((fid_mode & FAN_REPORT_NAME) && - (event->mask & FAN_ONDIR)) { - /* - * With group flag FAN_REPORT_NAME, if name was not - * recorded in an event on a directory, report the - * name "." with info type DFID_NAME. - */ - info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; - dot = "."; - dot_len = 1; - } else if ((event->mask & ALL_FSNOTIFY_DIRENT_EVENTS) || - (event->mask & FAN_ONDIR)) { - /* - * With group flag FAN_REPORT_DIR_FID, a single info - * record has type DFID for directory entry modification - * event and for event on a directory. - */ - info_type = FAN_EVENT_INFO_TYPE_DFID; - } else { - /* - * With group flags FAN_REPORT_DIR_FID|FAN_REPORT_FID, - * a single info record has type FID for event on a - * non-directory, when there is no directory to report. - * For example, on FAN_DELETE_SELF event. - */ - info_type = FAN_EVENT_INFO_TYPE_FID; - } - - ret = copy_fid_info_to_user(fanotify_event_fsid(event), - fanotify_event_object_fh(event), - info_type, dot, dot_len, - buf, count); - if (ret < 0) - goto out_close_fd; - - buf += ret; - count -= ret; } if (f) diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index a16dbeced152..10a7e26ddba6 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -27,6 +27,8 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) +#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS) + /* * fanotify_init() flags that require CAP_SYS_ADMIN. * We do not allow unprivileged groups to request permission events. From cde8883b0b292f94a2f2e0e6096892e0e13cac0f Mon Sep 17 00:00:00 2001 From: Matthew Bobrowski Date: Sun, 8 Aug 2021 15:26:25 +1000 Subject: [PATCH 0634/1565] fanotify: add pidfd support to the fanotify API [ Upstream commit af579beb666aefb17e9a335c12c788c92932baf1 ] Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which allows userspace applications to control whether a pidfd information record containing a pidfd is to be returned alongside the generic event metadata for each event. If FAN_REPORT_PIDFD is enabled for a notification group, an additional struct fanotify_event_info_pidfd object type will be supplied alongside the generic struct fanotify_event_metadata for a single event. This functionality is analogous to that of FAN_REPORT_FID in terms of how the event structure is supplied to a userspace application. Usage of FAN_REPORT_PIDFD with FAN_REPORT_FID/FAN_REPORT_DFID_NAME is permitted, and in this case a struct fanotify_event_info_pidfd object will likely follow any struct fanotify_event_info_fid object. Currently, the usage of the FAN_REPORT_TID flag is not permitted along with FAN_REPORT_PIDFD as the pidfd API currently only supports the creation of pidfds for thread-group leaders. Additionally, usage of the FAN_REPORT_PIDFD flag is limited to privileged processes only i.e. event listeners that are running with the CAP_SYS_ADMIN capability. Attempting to supply the FAN_REPORT_TID initialization flags with FAN_REPORT_PIDFD or creating a notification group without CAP_SYS_ADMIN will result with -EINVAL being returned to the caller. In the event of a pidfd creation error, there are two types of error values that can be reported back to the listener. There is FAN_NOPIDFD, which will be reported in cases where the process responsible for generating the event has terminated prior to the event listener being able to read the event. Then there is FAN_EPIDFD, which will be reported when a more generic pidfd creation error has occurred when fanotify calls pidfd_create(). Link: https://lore.kernel.org/r/5f9e09cff7ed62bfaa51c1369e0f7ea5f16a91aa.1628398044.git.repnop@google.com Signed-off-by: Matthew Bobrowski Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 85 ++++++++++++++++++++++++++++-- include/linux/fanotify.h | 3 +- include/uapi/linux/fanotify.h | 13 +++++ 3 files changed, 96 insertions(+), 5 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 83405949a71b..10fb062065b6 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -106,6 +107,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; #define FANOTIFY_EVENT_ALIGN 4 #define FANOTIFY_FID_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) +#define FANOTIFY_PIDFD_INFO_HDR_LEN \ + sizeof(struct fanotify_event_info_pidfd) static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -138,6 +141,9 @@ static int fanotify_event_info_len(unsigned int info_mode, dot_len = 1; } + if (info_mode & FAN_REPORT_PIDFD) + info_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + if (fh_len) info_len += fanotify_fid_info_len(fh_len, dot_len); @@ -401,13 +407,34 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return info_len; } +static int copy_pidfd_info_to_user(int pidfd, + char __user *buf, + size_t count) +{ + struct fanotify_event_info_pidfd info = { }; + size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN; + + if (WARN_ON_ONCE(info_len > count)) + return -EFAULT; + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_PIDFD; + info.hdr.len = info_len; + info.pidfd = pidfd; + + if (copy_to_user(buf, &info, info_len)) + return -EFAULT; + + return info_len; +} + static int copy_info_records_to_user(struct fanotify_event *event, struct fanotify_info *info, - unsigned int info_mode, + unsigned int info_mode, int pidfd, char __user *buf, size_t count) { int ret, total_bytes = 0, info_type = 0; unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; + unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; /* * Event info records order is as follows: dir fid + name, child fid. @@ -478,6 +505,16 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } + if (pidfd_mode) { + ret = copy_pidfd_info_to_user(pidfd, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + return total_bytes; } @@ -489,8 +526,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); + unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; struct file *f = NULL; - int ret, fd = FAN_NOFD; + int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD; pr_debug("%s: group=%p event=%p\n", __func__, group, event); @@ -524,6 +562,33 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, } metadata.fd = fd; + if (pidfd_mode) { + /* + * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual + * exclusion is ever lifted. At the time of incoporating pidfd + * support within fanotify, the pidfd API only supported the + * creation of pidfds for thread-group leaders. + */ + WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID)); + + /* + * The PIDTYPE_TGID check for an event->pid is performed + * preemptively in an attempt to catch out cases where the event + * listener reads events after the event generating process has + * already terminated. Report FAN_NOPIDFD to the event listener + * in those cases, with all other pidfd creation errors being + * reported as FAN_EPIDFD. + */ + if (metadata.pid == 0 || + !pid_has_task(event->pid, PIDTYPE_TGID)) { + pidfd = FAN_NOPIDFD; + } else { + pidfd = pidfd_create(event->pid, 0); + if (pidfd < 0) + pidfd = FAN_EPIDFD; + } + } + ret = -EFAULT; /* * Sanity check copy size in case get_one_event() and @@ -542,7 +607,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, FANOTIFY_PERM(event)->fd = fd; if (info_mode) { - ret = copy_info_records_to_user(event, info, info_mode, + ret = copy_info_records_to_user(event, info, info_mode, pidfd, buf, count); if (ret < 0) goto out_close_fd; @@ -558,6 +623,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, put_unused_fd(fd); fput(f); } + + if (pidfd >= 0) + close_fd(pidfd); + return ret; } @@ -1103,6 +1172,14 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) #endif return -EINVAL; + /* + * A pidfd can only be returned for a thread-group leader; thus + * FAN_REPORT_PIDFD and FAN_REPORT_TID need to remain mutually + * exclusive. + */ + if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID)) + return -EINVAL; + if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) return -EINVAL; @@ -1522,7 +1599,7 @@ static int __init fanotify_user_setup(void) FANOTIFY_DEFAULT_MAX_USER_MARKS); BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 10a7e26ddba6..eec3b7c40811 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -27,7 +27,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) -#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS) +#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD) /* * fanotify_init() flags that require CAP_SYS_ADMIN. @@ -37,6 +37,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ */ #define FANOTIFY_ADMIN_INIT_FLAGS (FANOTIFY_PERM_CLASSES | \ FAN_REPORT_TID | \ + FAN_REPORT_PIDFD | \ FAN_UNLIMITED_QUEUE | \ FAN_UNLIMITED_MARKS) diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index fbf9c5c7dd59..64553df9d735 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -51,6 +51,7 @@ #define FAN_ENABLE_AUDIT 0x00000040 /* Flags to determine fanotify event format */ +#define FAN_REPORT_PIDFD 0x00000080 /* Report pidfd for event->pid */ #define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ @@ -123,6 +124,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_FID 1 #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 #define FAN_EVENT_INFO_TYPE_DFID 3 +#define FAN_EVENT_INFO_TYPE_PIDFD 4 /* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -148,6 +150,15 @@ struct fanotify_event_info_fid { unsigned char handle[0]; }; +/* + * This structure is used for info records of type FAN_EVENT_INFO_TYPE_PIDFD. + * It holds a pidfd for the pid that was responsible for generating an event. + */ +struct fanotify_event_info_pidfd { + struct fanotify_event_info_header hdr; + __s32 pidfd; +}; + struct fanotify_response { __s32 fd; __u32 response; @@ -160,6 +171,8 @@ struct fanotify_response { /* No fd set in event */ #define FAN_NOFD -1 +#define FAN_NOPIDFD FAN_NOFD +#define FAN_EPIDFD -2 /* Helper functions to deal with fanotify_event_metadata buffers */ #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata)) From cdbf9c5f81d0951521f0dc1670851f4cabf0fba9 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Tue, 10 Aug 2021 18:12:17 +0300 Subject: [PATCH 0635/1565] fsnotify: replace igrab() with ihold() on attach connector [ Upstream commit 09ddbe69c9925b42cb9529f60678c25b241d8b18 ] We must have a reference on inode, so ihold is cheaper. Link: https://lore.kernel.org/r/20210810151220.285179-2-amir73il@gmail.com Reviewed-by: Matthew Bobrowski Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/mark.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 7af98a7c33c2..3c8fc77d3f07 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -493,8 +493,11 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, conn->fsid.val[0] = conn->fsid.val[1] = 0; conn->flags = 0; } - if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) - inode = igrab(fsnotify_conn_inode(conn)); + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { + inode = fsnotify_conn_inode(conn); + ihold(inode); + } + /* * cmpxchg() provides the barrier so that readers of *connp can see * only initialized structure From 44858a348881c66db3a5e346cd53585b5ac6ce7e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Tue, 10 Aug 2021 18:12:18 +0300 Subject: [PATCH 0636/1565] fsnotify: count s_fsnotify_inode_refs for attached connectors [ Upstream commit 11fa333b58ba1518e7c69fafb6513a0117f8fe33 ] Instead of incrementing s_fsnotify_inode_refs when detaching connector from inode, increment it earlier when attaching connector to inode. Next patch is going to use s_fsnotify_inode_refs to count all objects with attached connectors. Link: https://lore.kernel.org/r/20210810151220.285179-3-amir73il@gmail.com Reviewed-by: Matthew Bobrowski Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/mark.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 3c8fc77d3f07..f6d1ad3ecca2 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -169,6 +169,21 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) } } +static void fsnotify_get_inode_ref(struct inode *inode) +{ + ihold(inode); + atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); +} + +static void fsnotify_put_inode_ref(struct inode *inode) +{ + struct super_block *sb = inode->i_sb; + + iput(inode); + if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) + wake_up_var(&sb->s_fsnotify_inode_refs); +} + static void *fsnotify_detach_connector_from_object( struct fsnotify_mark_connector *conn, unsigned int *type) @@ -182,7 +197,6 @@ static void *fsnotify_detach_connector_from_object( if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); inode->i_fsnotify_mask = 0; - atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) { @@ -209,19 +223,12 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark) /* Drop object reference originally held by a connector */ static void fsnotify_drop_object(unsigned int type, void *objp) { - struct inode *inode; - struct super_block *sb; - if (!objp) return; /* Currently only inode references are passed to be dropped */ if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE)) return; - inode = objp; - sb = inode->i_sb; - iput(inode); - if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) - wake_up_var(&sb->s_fsnotify_inode_refs); + fsnotify_put_inode_ref(objp); } void fsnotify_put_mark(struct fsnotify_mark *mark) @@ -495,7 +502,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, } if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); - ihold(inode); + fsnotify_get_inode_ref(inode); } /* @@ -505,7 +512,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, if (cmpxchg(connp, NULL, conn)) { /* Someone else created list structure for us */ if (inode) - iput(inode); + fsnotify_put_inode_ref(inode); kmem_cache_free(fsnotify_mark_connector_cachep, conn); } From 9917e1bda3d7a4c5ac9a4ec7795a2fceb0d6f8fa Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Tue, 10 Aug 2021 18:12:19 +0300 Subject: [PATCH 0637/1565] fsnotify: count all objects with attached connectors [ Upstream commit ec44610fe2b86daef70f3f53f47d2a2542d7094f ] Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all objects with attached connectors, not only inodes with attached connectors. This will be used to optimize fsnotify() calls on sb without any type of marks. Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Reviewed-by: Matthew Bobrowski Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.c | 6 +++--- fs/notify/fsnotify.h | 15 +++++++++++++++ fs/notify/mark.c | 24 +++++++++++++++++++++--- include/linux/fs.h | 7 +++++-- 4 files changed, 44 insertions(+), 8 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 30d422b8c0fc..963e6ce75b96 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -87,15 +87,15 @@ static void fsnotify_unmount_inodes(struct super_block *sb) if (iput_inode) iput(iput_inode); - /* Wait for outstanding inode references from connectors */ - wait_var_event(&sb->s_fsnotify_inode_refs, - !atomic_long_read(&sb->s_fsnotify_inode_refs)); } void fsnotify_sb_delete(struct super_block *sb) { fsnotify_unmount_inodes(sb); fsnotify_clear_marks_by_sb(sb); + /* Wait for outstanding object references from connectors */ + wait_var_event(&sb->s_fsnotify_connectors, + !atomic_long_read(&sb->s_fsnotify_connectors)); } /* diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index ff2063ec6b0f..87d8a50ee803 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -27,6 +27,21 @@ static inline struct super_block *fsnotify_conn_sb( return container_of(conn->obj, struct super_block, s_fsnotify_marks); } +static inline struct super_block *fsnotify_connector_sb( + struct fsnotify_mark_connector *conn) +{ + switch (conn->type) { + case FSNOTIFY_OBJ_TYPE_INODE: + return fsnotify_conn_inode(conn)->i_sb; + case FSNOTIFY_OBJ_TYPE_VFSMOUNT: + return fsnotify_conn_mount(conn)->mnt.mnt_sb; + case FSNOTIFY_OBJ_TYPE_SB: + return fsnotify_conn_sb(conn); + default: + return NULL; + } +} + /* destroy all events sitting in this groups notification queue */ extern void fsnotify_flush_notify(struct fsnotify_group *group); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index f6d1ad3ecca2..796946eb0c2e 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -172,7 +172,7 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) static void fsnotify_get_inode_ref(struct inode *inode) { ihold(inode); - atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); + atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); } static void fsnotify_put_inode_ref(struct inode *inode) @@ -180,8 +180,24 @@ static void fsnotify_put_inode_ref(struct inode *inode) struct super_block *sb = inode->i_sb; iput(inode); - if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) - wake_up_var(&sb->s_fsnotify_inode_refs); + if (atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) + wake_up_var(&sb->s_fsnotify_connectors); +} + +static void fsnotify_get_sb_connectors(struct fsnotify_mark_connector *conn) +{ + struct super_block *sb = fsnotify_connector_sb(conn); + + if (sb) + atomic_long_inc(&sb->s_fsnotify_connectors); +} + +static void fsnotify_put_sb_connectors(struct fsnotify_mark_connector *conn) +{ + struct super_block *sb = fsnotify_connector_sb(conn); + + if (sb && atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) + wake_up_var(&sb->s_fsnotify_connectors); } static void *fsnotify_detach_connector_from_object( @@ -203,6 +219,7 @@ static void *fsnotify_detach_connector_from_object( fsnotify_conn_sb(conn)->s_fsnotify_mask = 0; } + fsnotify_put_sb_connectors(conn); rcu_assign_pointer(*(conn->obj), NULL); conn->obj = NULL; conn->type = FSNOTIFY_OBJ_TYPE_DETACHED; @@ -504,6 +521,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, inode = fsnotify_conn_inode(conn); fsnotify_get_inode_ref(inode); } + fsnotify_get_sb_connectors(conn); /* * cmpxchg() provides the barrier so that readers of *connp can see diff --git a/include/linux/fs.h b/include/linux/fs.h index cc3b6ddf5822..a9ac60d3be1d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1512,8 +1512,11 @@ struct super_block { /* Number of inodes with nlink == 0 but still referenced */ atomic_long_t s_remove_count; - /* Pending fsnotify inode refs */ - atomic_long_t s_fsnotify_inode_refs; + /* + * Number of inode/mount/sb objects that are being watched, note that + * inodes objects are currently double-accounted. + */ + atomic_long_t s_fsnotify_connectors; /* Being remounted read-only */ int s_readonly_remount; From 860893f9e35178d6894f190b8476ce7e1fb9481f Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Tue, 10 Aug 2021 18:12:20 +0300 Subject: [PATCH 0638/1565] fsnotify: optimize the case of no marks of any type [ Upstream commit e43de7f0862b8598cd1ef440e3b4701cd107ea40 ] Add a simple check in the inline helpers to avoid calling fsnotify() and __fsnotify_parent() in case there are no marks of any type (inode/sb/mount) for an inode's sb, so there can be no objects of any type interested in the event. Link: https://lore.kernel.org/r/20210810151220.285179-5-amir73il@gmail.com Reviewed-by: Matthew Bobrowski Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/fsnotify.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 79add91eaa04..a9477c14fad5 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -30,6 +30,9 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask, struct inode *child, const struct qstr *name, u32 cookie) { + if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) + return; + fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); } @@ -41,6 +44,9 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, static inline void fsnotify_inode(struct inode *inode, __u32 mask) { + if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + return; + if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; @@ -53,6 +59,9 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, { struct inode *inode = d_inode(dentry); + if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) + return 0; + if (S_ISDIR(inode->i_mode)) { mask |= FS_ISDIR; From 9e5f2e0ae0196ff8be932f26f5d4e5cd4e572e69 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Jun 2021 16:34:20 -0400 Subject: [PATCH 0639/1565] NFSD: Clean up splice actor [ Upstream commit c7e0b781b73c2e26e442ed71397cc2bc5945a732 ] A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 74b2c6c5ad0b..8520a2fc92de 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -854,26 +854,21 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct svc_rqst *rqstp = sd->u.data; struct page **pp = rqstp->rq_next_page; struct page *page = buf->page; - size_t size; - - size = sd->len; if (rqstp->rq_res.page_len == 0) { get_page(page); put_page(*rqstp->rq_next_page); *(rqstp->rq_next_page++) = page; rqstp->rq_res.page_base = buf->offset; - rqstp->rq_res.page_len = size; } else if (page != pp[-1]) { get_page(page); if (*rqstp->rq_next_page) put_page(*rqstp->rq_next_page); *(rqstp->rq_next_page++) = page; - rqstp->rq_res.page_len += size; - } else - rqstp->rq_res.page_len += size; + } + rqstp->rq_res.page_len += sd->len; - return size; + return sd->len; } static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, From a4f616afb4ee5ad56ac37a640bd9b1613d463113 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Jul 2021 10:03:10 -0400 Subject: [PATCH 0640/1565] SUNRPC: Add svc_rqst_replace_page() API [ Upstream commit 2f0f88f42f2eab0421ed37d7494de9124fdf0d34 ] Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. [ cel: adjusted to apply without f6e70aab9dfe ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 4 ++++ net/sunrpc/svc.c | 21 +++++++++++++++++++++ net/sunrpc/svc_xprt.c | 3 +++ 3 files changed, 28 insertions(+) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index e91d51ea028b..ab9afbf0a0d8 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -19,6 +19,7 @@ #include #include #include +#include /* statistics for svc_pool structures */ struct svc_pool_stats { @@ -256,6 +257,7 @@ struct svc_rqst { struct page * *rq_next_page; /* next reply page to use */ struct page * *rq_page_end; /* one past the last page */ + struct pagevec rq_pvec; struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */ struct bio_vec rq_bvec[RPCSVC_MAXPAGES]; @@ -502,6 +504,8 @@ struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node); +void svc_rqst_replace_page(struct svc_rqst *rqstp, + struct page *page); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); unsigned int svc_pool_map_get(void); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 73e02ba4ed53..27f98756d175 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -842,6 +842,27 @@ svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrser } EXPORT_SYMBOL_GPL(svc_set_num_threads_sync); +/** + * svc_rqst_replace_page - Replace one page in rq_pages[] + * @rqstp: svc_rqst with pages to replace + * @page: replacement page + * + * When replacing a page in rq_pages, batch the release of the + * replaced pages to avoid hammering the page allocator. + */ +void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) +{ + if (*rqstp->rq_next_page) { + if (!pagevec_space(&rqstp->rq_pvec)) + __pagevec_release(&rqstp->rq_pvec); + pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page); + } + + get_page(page); + *(rqstp->rq_next_page++) = page; +} +EXPORT_SYMBOL_GPL(svc_rqst_replace_page); + /* * Called from a server thread as it's exiting. Caller must hold the "service * mutex" for the service. diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index d150e25b4d45..570b092165d7 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -525,6 +525,7 @@ static void svc_xprt_release(struct svc_rqst *rqstp) kfree(rqstp->rq_deferred); rqstp->rq_deferred = NULL; + pagevec_release(&rqstp->rq_pvec); svc_free_res_pages(rqstp); rqstp->rq_res.page_len = 0; rqstp->rq_res.page_base = 0; @@ -651,6 +652,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) int pages; int i; + pagevec_init(&rqstp->rq_pvec); + /* now allocate needed pages. If we get a failure, sleep briefly */ pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT; if (pages > RPCSVC_MAXPAGES) { From 86df138e8d4d378b4b7c61632ef5fb243672e02f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Jun 2021 17:24:27 -0400 Subject: [PATCH 0641/1565] NFSD: Batch release pages during splice read [ Upstream commit 496d83cf0f2fa70cfe256c2499e2d3523d3868f3 ] Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 8520a2fc92de..05b5f7e241e7 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -856,15 +856,10 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct page *page = buf->page; if (rqstp->rq_res.page_len == 0) { - get_page(page); - put_page(*rqstp->rq_next_page); - *(rqstp->rq_next_page++) = page; + svc_rqst_replace_page(rqstp, page); rqstp->rq_res.page_base = buf->offset; } else if (page != pp[-1]) { - get_page(page); - if (*rqstp->rq_next_page) - put_page(*rqstp->rq_next_page); - *(rqstp->rq_next_page++) = page; + svc_rqst_replace_page(rqstp, page); } rqstp->rq_res.page_len += sd->len; From 130dcbf77a7ec948fd3f6c4c90d7a788e685f7c6 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 28 Jul 2021 08:56:09 +1000 Subject: [PATCH 0642/1565] NFSD: remove vanity comments [ Upstream commit ea49dc79002c416a9003f3204bc14f846a0dbcae ] Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 1 - include/uapi/linux/nfsd/nfsfh.h | 1 - 2 files changed, 2 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 05b5f7e241e7..2c493937dd5e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -244,7 +244,6 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 - * NeilBrown */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h index 427294dd56a1..e29e8accc4f4 100644 --- a/include/uapi/linux/nfsd/nfsfh.h +++ b/include/uapi/linux/nfsd/nfsfh.h @@ -33,7 +33,6 @@ struct nfs_fhbase_old { /* * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * by Neil Brown - March 2000 * * The file handle starts with a sequence of four-byte words. * The first word contains a version number (1) and three descriptor bytes From f469e60f9a0fd2b211b723dbd19c8f87294da3e3 Mon Sep 17 00:00:00 2001 From: Jia He Date: Tue, 3 Aug 2021 12:59:36 +0200 Subject: [PATCH 0643/1565] sysctl: introduce new proc handler proc_dobool [ Upstream commit a2071573d6346819cc4e5787b4206f2184985160 ] This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: Pan Xinhui Signed-off-by: Jia He [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sysctl.h | 2 ++ kernel/sysctl.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 44 insertions(+) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index c202a72e1690..47cf70c8eb93 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -55,6 +55,8 @@ typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); int proc_dostring(struct ctl_table *, int, void *, size_t *, loff_t *); +int proc_dobool(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos); int proc_dointvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_douintvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_minmax(struct ctl_table *, int, void *, size_t *, loff_t *); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index b29a9568ebbe..abe0f16d5364 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -546,6 +546,21 @@ static void proc_put_char(void **buf, size_t *size, char c) } } +static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp, + int *valp, + int write, void *data) +{ + if (write) { + *(bool *)valp = *lvalp; + } else { + int val = *(bool *)valp; + + *lvalp = (unsigned long)val; + *negp = false; + } + return 0; +} + static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) @@ -808,6 +823,26 @@ static int do_proc_douintvec(struct ctl_table *table, int write, buffer, lenp, ppos, conv, data); } +/** + * proc_dobool - read/write a bool + * @table: the sysctl table + * @write: %TRUE if this is a write to the sysctl file + * @buffer: the user buffer + * @lenp: the size of the user buffer + * @ppos: file position + * + * Reads/writes up to table->maxlen/sizeof(unsigned int) integer + * values from/to the user buffer, treated as an ASCII string. + * + * Returns 0 on success. + */ +int proc_dobool(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) +{ + return do_proc_dointvec(table, write, buffer, lenp, ppos, + do_proc_dobool_conv, NULL); +} + /** * proc_dointvec - read a vector of integers * @table: the sysctl table @@ -1644,6 +1679,12 @@ int proc_dostring(struct ctl_table *table, int write, return -ENOSYS; } +int proc_dobool(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + return -ENOSYS; +} + int proc_dointvec(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -3503,6 +3544,7 @@ int __init sysctl_init(void) * No sense putting this after each symbol definition, twice, * exception granted :-) */ +EXPORT_SYMBOL(proc_dobool); EXPORT_SYMBOL(proc_dointvec); EXPORT_SYMBOL(proc_douintvec); EXPORT_SYMBOL(proc_dointvec_jiffies); From 860f01260e5379ea15b90adb86edc22e9c7da3b6 Mon Sep 17 00:00:00 2001 From: Jia He Date: Tue, 3 Aug 2021 12:59:37 +0200 Subject: [PATCH 0644/1565] lockd: change the proc_handler for nsm_use_hostnames [ Upstream commit d02a3a2cb25d384005a6e3446a445013342024b7 ] nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items. This patch use a new proc_handler proc_dobool to fix it. Signed-off-by: Jia He Reviewed-by: Pan Xinhui [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 2de048f80eb8..0ab9756ed235 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -584,7 +584,7 @@ static struct ctl_table nlm_sysctls[] = { .data = &nsm_use_hostnames, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dobool, }, { .procname = "nsm_local_state", From 3fbc744783dd3be199ded1d8f84ca1d283e0467f Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 23 Aug 2021 12:01:18 -0400 Subject: [PATCH 0645/1565] nlm: minor nlm_lookup_file argument change [ Upstream commit 2dc6f19e4f438d4c14987cb17aee38aaf7304e7f ] It'll come in handy to get the whole nlm_lock. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 3 ++- fs/lockd/svcproc.c | 2 +- fs/lockd/svcsubs.c | 15 ++++++++------- include/linux/lockd/lockd.h | 2 +- 4 files changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 4c10fb5138f1..bc496bbd696b 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -40,7 +40,8 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0) + error = nlm_lookup_file(rqstp, &file, lock); + if (error) goto no_locks; *filp = file; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 4ae4b63b5392..f4e5e0eb30fd 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -69,7 +69,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh)); + error = cast_status(nlm_lookup_file(rqstp, &file, lock)); if (error != 0) goto no_locks; *filp = file; diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 028fc152da22..2d62633b39e5 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -82,31 +82,31 @@ static inline unsigned int file_hash(struct nfs_fh *f) */ __be32 nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, - struct nfs_fh *f) + struct nlm_lock *lock) { struct nlm_file *file; unsigned int hash; __be32 nfserr; - nlm_debug_print_fh("nlm_lookup_file", f); + nlm_debug_print_fh("nlm_lookup_file", &lock->fh); - hash = file_hash(f); + hash = file_hash(&lock->fh); /* Lock file table */ mutex_lock(&nlm_file_mutex); hlist_for_each_entry(file, &nlm_files[hash], f_list) - if (!nfs_compare_fh(&file->f_handle, f)) + if (!nfs_compare_fh(&file->f_handle, &lock->fh)) goto found; - nlm_debug_print_fh("creating file for", f); + nlm_debug_print_fh("creating file for", &lock->fh); nfserr = nlm_lck_denied_nolocks; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) goto out_unlock; - memcpy(&file->f_handle, f, sizeof(struct nfs_fh)); + memcpy(&file->f_handle, &lock->fh, sizeof(struct nfs_fh)); mutex_init(&file->f_mutex); INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks); @@ -117,7 +117,8 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, * We have to make sure we have the right credential to open * the file. */ - if ((nfserr = nlmsvc_ops->fopen(rqstp, f, &file->f_file)) != 0) { + nfserr = nlmsvc_ops->fopen(rqstp, &lock->fh, &file->f_file); + if (nfserr) { dprintk("lockd: open failed (error %d)\n", nfserr); goto out_free; } diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 666f5f310a04..81b71ad2040a 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -286,7 +286,7 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); * File handling for the server personality */ __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, - struct nfs_fh *); + struct nlm_lock *); void nlm_release_file(struct nlm_file *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *); From 14c2a0fad5410aac061916e49042d9729db52852 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 23 Aug 2021 11:26:39 -0400 Subject: [PATCH 0646/1565] nlm: minor refactoring [ Upstream commit a81041b7d8f08c4e1014173c5483a0f18724a576 ] Make this lookup slightly more concise, and prepare for changing how we look this up in a following patch. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svclock.c | 16 ++++++++-------- fs/lockd/svcsubs.c | 4 ++-- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 273a81971ed5..bcd180ba9957 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -474,8 +474,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, __be32 ret; dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_type, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end, @@ -581,8 +581,8 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_lockowner *test_owner; dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_type, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -644,8 +644,8 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) int error; dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -673,8 +673,8 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l int status = 0; dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n", - locks_inode(file->f_file)->i_sb->s_id, - locks_inode(file->f_file)->i_ino, + nlmsvc_file_inode(file)->i_sb->s_id, + nlmsvc_file_inode(file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 2d62633b39e5..e02951f8a28f 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -45,7 +45,7 @@ static inline void nlm_debug_print_fh(char *msg, struct nfs_fh *f) static inline void nlm_debug_print_file(char *msg, struct nlm_file *file) { - struct inode *inode = locks_inode(file->f_file); + struct inode *inode = nlmsvc_file_inode(file); dprintk("lockd: %s %s/%ld\n", msg, inode->i_sb->s_id, inode->i_ino); @@ -416,7 +416,7 @@ nlmsvc_match_sb(void *datap, struct nlm_file *file) { struct super_block *sb = datap; - return sb == locks_inode(file->f_file)->i_sb; + return sb == nlmsvc_file_inode(file)->i_sb; } /** From b4bf52174b4f39b0e8db01568914920a3d2e80d0 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 20 Aug 2021 17:02:02 -0400 Subject: [PATCH 0647/1565] lockd: update nlm_lookup_file reexport comment [ Upstream commit b661601a9fdf1af8516e1100de8bba84bd41cca4 ] Update comment to reflect that we *do* allow reexport, whether it's a good idea or not.... Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index e02951f8a28f..13e6ffc219ec 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -111,8 +111,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks); - /* Open the file. Note that this must not sleep for too long, else - * we would lock up lockd:-) So no NFS re-exports, folks. + /* + * Open the file. Note that if we're reexporting, for example, + * this could block the lockd thread for a while. * * We have to make sure we have the right credential to open * the file. From e580323ac0b51ad10ec2e181d1f777479b7983e7 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Mon, 23 Aug 2021 16:44:00 -0400 Subject: [PATCH 0648/1565] Keep read and write fds with each nlm_file [ Upstream commit 7f024fcd5c97dc70bb9121c80407cf3cf9be7159 ] We shouldn't really be using a read-only file descriptor to take a write lock. Most filesystems will put up with it. But NFS, for example, won't. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 4 +- fs/lockd/svclock.c | 25 ++++++--- fs/lockd/svcproc.c | 4 +- fs/lockd/svcsubs.c | 102 +++++++++++++++++++++++++----------- fs/nfsd/lockd.c | 8 ++- include/linux/lockd/bind.h | 3 +- include/linux/lockd/lockd.h | 9 +++- 7 files changed, 111 insertions(+), 44 deletions(-) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index bc496bbd696b..e10ae2c41279 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -40,13 +40,15 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { + int mode = lock_to_openmode(&lock->fl); + error = nlm_lookup_file(rqstp, &file, lock); if (error) goto no_locks; *filp = file; /* Set up the missing parts of the file_lock structure */ - lock->fl.fl_file = file->f_file; + lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index bcd180ba9957..a7b4c51667ad 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -471,6 +471,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, { struct nlm_block *block = NULL; int error; + int mode; __be32 ret; dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", @@ -524,7 +525,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, if (!wait) lock->fl.fl_flags &= ~FL_SLEEP; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + mode = lock_to_openmode(&lock->fl); + error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP; dprintk("lockd: vfs_lock_file returned %d\n", error); @@ -577,6 +579,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_lock *conflock, struct nlm_cookie *cookie) { int error; + int mode; __be32 ret; struct nlm_lockowner *test_owner; @@ -595,7 +598,8 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, /* If there's a conflicting lock, remember to clean up the test lock */ test_owner = (struct nlm_lockowner *)lock->fl.fl_owner; - error = vfs_test_lock(file->f_file, &lock->fl); + mode = lock_to_openmode(&lock->fl); + error = vfs_test_lock(file->f_file[mode], &lock->fl); if (error) { /* We can't currently deal with deferred test requests */ if (error == FILE_LOCK_DEFERRED) @@ -641,7 +645,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, __be32 nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) { - int error; + int error = 0; dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -654,7 +658,12 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) nlmsvc_cancel_blocked(net, file, lock); lock->fl.fl_type = F_UNLCK; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + if (file->f_file[O_RDONLY]) + error = vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, + &lock->fl, NULL); + if (file->f_file[O_WRONLY]) + error = vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, + &lock->fl, NULL); return (error < 0)? nlm_lck_denied_nolocks : nlm_granted; } @@ -671,6 +680,7 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l { struct nlm_block *block; int status = 0; + int mode; dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -686,7 +696,8 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l block = nlmsvc_lookup_block(file, lock); mutex_unlock(&file->f_mutex); if (block != NULL) { - vfs_cancel_lock(block->b_file->f_file, + mode = lock_to_openmode(&lock->fl); + vfs_cancel_lock(block->b_file->f_file[mode], &block->b_call->a_args.lock.fl); status = nlmsvc_unlink_block(block); nlmsvc_release_block(block); @@ -803,6 +814,7 @@ nlmsvc_grant_blocked(struct nlm_block *block) { struct nlm_file *file = block->b_file; struct nlm_lock *lock = &block->b_call->a_args.lock; + int mode; int error; loff_t fl_start, fl_end; @@ -828,7 +840,8 @@ nlmsvc_grant_blocked(struct nlm_block *block) lock->fl.fl_flags |= FL_SLEEP; fl_start = lock->fl.fl_start; fl_end = lock->fl.fl_end; - error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); + mode = lock_to_openmode(&lock->fl); + error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP; lock->fl.fl_start = fl_start; lock->fl.fl_end = fl_end; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index f4e5e0eb30fd..99696d3f6dd6 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -55,6 +55,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, struct nlm_host *host = NULL; struct nlm_file *file = NULL; struct nlm_lock *lock = &argp->lock; + int mode; __be32 error = 0; /* nfsd callbacks must have been installed for this procedure */ @@ -75,7 +76,8 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, *filp = file; /* Set up the missing parts of the file_lock structure */ - lock->fl.fl_file = file->f_file; + mode = lock_to_openmode(&lock->fl); + lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 13e6ffc219ec..cb3a7512c33e 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -71,14 +71,35 @@ static inline unsigned int file_hash(struct nfs_fh *f) return tmp & (FILE_NRHASH - 1); } +int lock_to_openmode(struct file_lock *lock) +{ + return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY; +} + +/* + * Open the file. Note that if we're reexporting, for example, + * this could block the lockd thread for a while. + * + * We have to make sure we have the right credential to open + * the file. + */ +static __be32 nlm_do_fopen(struct svc_rqst *rqstp, + struct nlm_file *file, int mode) +{ + struct file **fp = &file->f_file[mode]; + __be32 nfserr; + + if (*fp) + return 0; + nfserr = nlmsvc_ops->fopen(rqstp, &file->f_handle, fp, mode); + if (nfserr) + dprintk("lockd: open failed (error %d)\n", nfserr); + return nfserr; +} + /* * Lookup file info. If it doesn't exist, create a file info struct * and open a (VFS) file for the given inode. - * - * FIXME: - * Note that we open the file O_RDONLY even when creating write locks. - * This is not quite right, but for now, we assume the client performs - * the proper R/W checking. */ __be32 nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, @@ -87,42 +108,38 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, struct nlm_file *file; unsigned int hash; __be32 nfserr; + int mode; nlm_debug_print_fh("nlm_lookup_file", &lock->fh); hash = file_hash(&lock->fh); + mode = lock_to_openmode(&lock->fl); /* Lock file table */ mutex_lock(&nlm_file_mutex); hlist_for_each_entry(file, &nlm_files[hash], f_list) - if (!nfs_compare_fh(&file->f_handle, &lock->fh)) + if (!nfs_compare_fh(&file->f_handle, &lock->fh)) { + mutex_lock(&file->f_mutex); + nfserr = nlm_do_fopen(rqstp, file, mode); + mutex_unlock(&file->f_mutex); goto found; - + } nlm_debug_print_fh("creating file for", &lock->fh); nfserr = nlm_lck_denied_nolocks; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) - goto out_unlock; + goto out_free; memcpy(&file->f_handle, &lock->fh, sizeof(struct nfs_fh)); mutex_init(&file->f_mutex); INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks); - /* - * Open the file. Note that if we're reexporting, for example, - * this could block the lockd thread for a while. - * - * We have to make sure we have the right credential to open - * the file. - */ - nfserr = nlmsvc_ops->fopen(rqstp, &lock->fh, &file->f_file); - if (nfserr) { - dprintk("lockd: open failed (error %d)\n", nfserr); - goto out_free; - } + nfserr = nlm_do_fopen(rqstp, file, mode); + if (nfserr) + goto out_unlock; hlist_add_head(&file->f_list, &nlm_files[hash]); @@ -130,7 +147,6 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, dprintk("lockd: found file %p (count %d)\n", file, file->f_count); *result = file; file->f_count++; - nfserr = 0; out_unlock: mutex_unlock(&nlm_file_mutex); @@ -150,13 +166,34 @@ nlm_delete_file(struct nlm_file *file) nlm_debug_print_file("closing file", file); if (!hlist_unhashed(&file->f_list)) { hlist_del(&file->f_list); - nlmsvc_ops->fclose(file->f_file); + if (file->f_file[O_RDONLY]) + nlmsvc_ops->fclose(file->f_file[O_RDONLY]); + if (file->f_file[O_WRONLY]) + nlmsvc_ops->fclose(file->f_file[O_WRONLY]); kfree(file); } else { printk(KERN_WARNING "lockd: attempt to release unknown file!\n"); } } +static int nlm_unlock_files(struct nlm_file *file) +{ + struct file_lock lock; + struct file *f; + + lock.fl_type = F_UNLCK; + lock.fl_start = 0; + lock.fl_end = OFFSET_MAX; + for (f = file->f_file[0]; f <= file->f_file[1]; f++) { + if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { + pr_warn("lockd: unlock failure in %s:%d\n", + __FILE__, __LINE__); + return 1; + } + } + return 0; +} + /* * Loop over all locks on the given file and perform the specified * action. @@ -184,17 +221,10 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, lockhost = ((struct nlm_lockowner *)fl->fl_owner)->host; if (match(lockhost, host)) { - struct file_lock lock = *fl; spin_unlock(&flctx->flc_lock); - lock.fl_type = F_UNLCK; - lock.fl_start = 0; - lock.fl_end = OFFSET_MAX; - if (vfs_lock_file(file->f_file, F_SETLK, &lock, NULL) < 0) { - printk("lockd: unlock failure in %s:%d\n", - __FILE__, __LINE__); + if (nlm_unlock_files(file)) return 1; - } goto again; } } @@ -248,6 +278,15 @@ nlm_file_inuse(struct nlm_file *file) return 0; } +static void nlm_close_files(struct nlm_file *file) +{ + struct file *f; + + for (f = file->f_file[0]; f <= file->f_file[1]; f++) + if (f) + nlmsvc_ops->fclose(f); +} + /* * Loop over all files in the file table. */ @@ -278,7 +317,7 @@ nlm_traverse_files(void *data, nlm_host_match_fn_t match, if (list_empty(&file->f_blocks) && !file->f_locks && !file->f_shares && !file->f_count) { hlist_del(&file->f_list); - nlmsvc_ops->fclose(file->f_file); + nlm_close_files(file); kfree(file); } } @@ -412,6 +451,7 @@ nlmsvc_invalidate_all(void) nlm_traverse_files(NULL, nlmsvc_is_client, NULL); } + static int nlmsvc_match_sb(void *datap, struct nlm_file *file) { diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c index 3f5b3d7b62b7..606fa155c28a 100644 --- a/fs/nfsd/lockd.c +++ b/fs/nfsd/lockd.c @@ -25,9 +25,11 @@ * Note: we hold the dentry use count while the file is open. */ static __be32 -nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp) +nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp, + int mode) { __be32 nfserr; + int access; struct svc_fh fh; /* must initialize before using! but maxsize doesn't matter */ @@ -36,7 +38,9 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp) memcpy((char*)&fh.fh_handle.fh_base, f->data, f->size); fh.fh_export = NULL; - nfserr = nfsd_open(rqstp, &fh, S_IFREG, NFSD_MAY_LOCK, filp); + access = (mode == O_WRONLY) ? NFSD_MAY_WRITE : NFSD_MAY_READ; + access |= NFSD_MAY_LOCK; + nfserr = nfsd_open(rqstp, &fh, S_IFREG, access, filp); fh_put(&fh); /* We return nlm error codes as nlm doesn't know * about nfsd, but nfsd does know about nlm.. diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h index 0520c0cd73f4..3bc9f7410e21 100644 --- a/include/linux/lockd/bind.h +++ b/include/linux/lockd/bind.h @@ -27,7 +27,8 @@ struct rpc_task; struct nlmsvc_binding { __be32 (*fopen)(struct svc_rqst *, struct nfs_fh *, - struct file **); + struct file **, + int mode); void (*fclose)(struct file *); }; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 81b71ad2040a..c4ae6506b8b3 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -10,6 +10,8 @@ #ifndef LINUX_LOCKD_LOCKD_H #define LINUX_LOCKD_LOCKD_H +/* XXX: a lot of this should really be under fs/lockd. */ + #include #include #include @@ -154,7 +156,8 @@ struct nlm_rqst { struct nlm_file { struct hlist_node f_list; /* linked list */ struct nfs_fh f_handle; /* NFS file handle */ - struct file * f_file; /* VFS file pointer */ + struct file * f_file[2]; /* VFS file pointers, + indexed by O_ flags */ struct nlm_share * f_shares; /* DOS shares */ struct list_head f_blocks; /* blocked locks */ unsigned int f_locks; /* guesstimate # of locks */ @@ -267,6 +270,7 @@ typedef int (*nlm_host_match_fn_t)(void *cur, struct nlm_host *ref); /* * Server-side lock handling */ +int lock_to_openmode(struct file_lock *); __be32 nlmsvc_lock(struct svc_rqst *, struct nlm_file *, struct nlm_host *, struct nlm_lock *, int, struct nlm_cookie *, int); @@ -301,7 +305,8 @@ int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr); static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { - return locks_inode(file->f_file); + return locks_inode(file->f_file[O_RDONLY] ? + file->f_file[O_RDONLY] : file->f_file[O_WRONLY]); } static inline int __nlm_privileged_request4(const struct sockaddr *sap) From 5ea5be84ddd7bb81277598512b11f2116b566bcd Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 20 Aug 2021 17:02:04 -0400 Subject: [PATCH 0649/1565] nfs: don't atempt blocking locks on nfs reexports [ Upstream commit f657f8eef3ff870552c9fd2839e0061046f44618 ] NFS implements blocking locks by blocking inside its lock method. In the reexport case, this blocks the nfs server thread, which could lead to deadlocks since an nfs server thread might be required to unlock the conflicting lock. It also causes a crash, since the nfs server thread assumes it can free the lock when its lm_notify lock callback is called. Ideal would be to make the nfs lock method return without blocking in this case, but for now it works just not to attempt blocking locks. The difference is just that the original client will have to poll (as it does in the v4.0 case) instead of getting a callback when the lock's available. Signed-off-by: J. Bruce Fields Acked-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 2 +- fs/nfsd/nfs4state.c | 8 ++++++-- include/linux/exportfs.h | 2 ++ 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b347e3ce0cc8..40beac65d135 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -184,5 +184,5 @@ const struct export_operations nfs_export_ops = { .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR, + EXPORT_OP_NOATOMIC_ATTR|EXPORT_OP_SYNC_LOCKS, }; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 401f0f274371..fd3bdf0bf005 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6878,6 +6878,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_blocked_lock *nbl = NULL; struct file_lock *file_lock = NULL; struct file_lock *conflock = NULL; + struct super_block *sb; __be32 status = 0; int lkflg; int err; @@ -6899,6 +6900,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, dprintk("NFSD: nfsd4_lock: permission denied!\n"); return status; } + sb = cstate->current_fh.fh_dentry->d_sb; if (lock->lk_is_new) { if (nfsd4_has_session(cstate)) @@ -6947,7 +6949,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: - if (nfsd4_has_session(cstate)) + if (nfsd4_has_session(cstate) && + !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_READ_LT: @@ -6959,7 +6962,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fl_type = F_RDLCK; break; case NFS4_WRITEW_LT: - if (nfsd4_has_session(cstate)) + if (nfsd4_has_session(cstate) && + !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_WRITE_LT: diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index fe848901fcc3..3260fe714846 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,6 +221,8 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ +#define EXPORT_OP_SYNC_LOCKS (0x20) /* Filesystem can't do + asychronous blocking locks */ unsigned long flags; }; From bd5b3deed01afe27154465f06c973b4dbea6b8c3 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 20 Aug 2021 17:02:05 -0400 Subject: [PATCH 0650/1565] lockd: don't attempt blocking locks on nfs reexports [ Upstream commit b840be2f00c0bc00d993f8f76e251052b83e4382 ] As in the v4 case, it doesn't work well to block waiting for a lock on an nfs filesystem. As in the v4 case, that means we're depending on the client to poll. It's probably incorrect to depend on that, but I *think* clients do poll in practice. In any case, it's an improvement over hanging the lockd thread indefinitely as we currently are. Signed-off-by: J. Bruce Fields Acked-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svclock.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index a7b4c51667ad..e9b85d8fd5fe 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -31,6 +31,7 @@ #include #include #include +#include #define NLMDBG_FACILITY NLMDBG_SVCLOCK @@ -470,18 +471,24 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_cookie *cookie, int reclaim) { struct nlm_block *block = NULL; + struct inode *inode = nlmsvc_file_inode(file); int error; int mode; + int async_block = 0; __be32 ret; dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", - nlmsvc_file_inode(file)->i_sb->s_id, - nlmsvc_file_inode(file)->i_ino, + inode->i_sb->s_id, inode->i_ino, lock->fl.fl_type, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end, wait); + if (inode->i_sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS) { + async_block = wait; + wait = 0; + } + /* Lock file against concurrent access */ mutex_lock(&file->f_mutex); /* Get existing block (in case client is busy-waiting) @@ -542,7 +549,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, */ if (wait) break; - ret = nlm_lck_denied; + ret = async_block ? nlm_lck_blocked : nlm_lck_denied; goto out; case FILE_LOCK_DEFERRED: if (wait) From efea5d558ef3a24b9938fed1668931c464461a0d Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 20 Aug 2021 17:02:06 -0400 Subject: [PATCH 0651/1565] nfs: don't allow reexport reclaims [ Upstream commit bb0a55bb7148a49e549ee992200860e7a040d3a5 ] In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict--even if it's possible a conflicting lock could have been briefly held in the interim. We don't currently have any way to safely grant reclaim, so for now let's just deny them all. I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here. Signed-off-by: J. Bruce Fields Acked-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/file.c | 3 +++ fs/nfsd/nfs4state.c | 3 +++ fs/nfsd/nfsproc.c | 1 + include/linux/errno.h | 1 + include/linux/fs.h | 1 + 5 files changed, 9 insertions(+) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 7be1a7f7fcb2..d35aae47b062 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -798,6 +798,9 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl) nfs_inc_stats(inode, NFSIOS_VFSLOCK); + if (fl->fl_flags & FL_RECLAIM) + return -ENOGRACE; + /* No mandatory locks over NFS */ if (__mandatory_lock(inode) && fl->fl_type != F_UNLCK) goto out_err; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fd3bdf0bf005..e9ac77c28741 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6946,6 +6946,9 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!locks_in_grace(net) && lock->lk_reclaim) goto out; + if (lock->lk_reclaim) + fl_flags |= FL_RECLAIM; + fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 72f8bc4a7ea4..78bdfdc253fd 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -881,6 +881,7 @@ nfserrno (int errno) { nfserr_serverfault, -ENFILE }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, + { nfserr_no_grace, -ENOGRACE}, }; int i; diff --git a/include/linux/errno.h b/include/linux/errno.h index d73f597a2484..8b0c754bab02 100644 --- a/include/linux/errno.h +++ b/include/linux/errno.h @@ -31,5 +31,6 @@ #define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */ #define EIOCBQUEUED 529 /* iocb queued, will get completion event */ #define ERECALLCONFLICT 530 /* conflict with recalled state */ +#define ENOGRACE 531 /* NFS file lock reclaim refused */ #endif diff --git a/include/linux/fs.h b/include/linux/fs.h index a9ac60d3be1d..c0459446e144 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -996,6 +996,7 @@ static inline struct file *get_file(struct file *f) #define FL_UNLOCK_PENDING 512 /* Lease is being broken */ #define FL_OFDLCK 1024 /* lock is "owned" by struct file */ #define FL_LAYOUT 2048 /* outstanding pNFS layout */ +#define FL_RECLAIM 4096 /* reclaiming from a reboot server */ #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE) From a96da583ff54c0935ed44a80d79e54f3c84e8843 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 15 Jul 2021 15:52:06 -0400 Subject: [PATCH 0652/1565] SUNRPC: Add svc_rqst::rq_auth_stat [ Upstream commit 438623a06bacd69c40c4af633bb09a3bbb9dfc78 ] I'd like to take commit 4532608d71c8 ("SUNRPC: Clean up generic dispatcher code") even further by using only private local SVC dispatchers for all kernel RPC services. This change would enable the removal of the logic that switches between svc_generic_dispatch() and a service's private dispatcher, and simplify the invocation of the service's pc_release method so that humans can visually verify that it is always invoked properly. All that will come later. First, let's provide a better way to return authentication errors from SVC dispatcher functions. Instead of overloading the dispatch method's *statp argument, add a field to struct svc_rqst that can hold an error value. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 1 + include/linux/sunrpc/svcauth.h | 4 +-- include/trace/events/sunrpc.h | 6 ++--- net/sunrpc/auth_gss/svcauth_gss.c | 43 +++++++++++++++---------------- net/sunrpc/svc.c | 17 ++++++------ net/sunrpc/svcauth.c | 8 +++--- net/sunrpc/svcauth_unix.c | 12 ++++----- 7 files changed, 46 insertions(+), 45 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index ab9afbf0a0d8..7b7bd8a0a6fb 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -284,6 +284,7 @@ struct svc_rqst { void * rq_argp; /* decoded arguments */ void * rq_resp; /* xdr'd results */ void * rq_auth_data; /* flavor-specific data */ + __be32 rq_auth_stat; /* authentication status */ int rq_auth_slack; /* extra space xdr code * should leave in head * for krb5i, krb5p. diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index b0003866a249..6d9cc9080aca 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -127,7 +127,7 @@ struct auth_ops { char * name; struct module *owner; int flavour; - int (*accept)(struct svc_rqst *rq, __be32 *authp); + int (*accept)(struct svc_rqst *rq); int (*release)(struct svc_rqst *rq); void (*domain_release)(struct auth_domain *); int (*set_client)(struct svc_rqst *rq); @@ -149,7 +149,7 @@ struct auth_ops { struct svc_xprt; -extern int svc_authenticate(struct svc_rqst *rqstp, __be32 *authp); +extern int svc_authenticate(struct svc_rqst *rqstp); extern int svc_authorise(struct svc_rqst *rqstp); extern int svc_set_client(struct svc_rqst *rqstp); extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 657138ab1f91..0cb9e182a1b1 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1547,9 +1547,9 @@ TRACE_DEFINE_ENUM(SVC_COMPLETE); { SVC_COMPLETE, "SVC_COMPLETE" }) TRACE_EVENT(svc_authenticate, - TP_PROTO(const struct svc_rqst *rqst, int auth_res, __be32 auth_stat), + TP_PROTO(const struct svc_rqst *rqst, int auth_res), - TP_ARGS(rqst, auth_res, auth_stat), + TP_ARGS(rqst, auth_res), TP_STRUCT__entry( __field(u32, xid) @@ -1560,7 +1560,7 @@ TRACE_EVENT(svc_authenticate, TP_fast_assign( __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->svc_status = auth_res; - __entry->auth_stat = be32_to_cpu(auth_stat); + __entry->auth_stat = be32_to_cpu(rqst->rq_auth_stat); ), TP_printk("xid=0x%08x auth_res=%s auth_stat=%s", diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 784c8b24f164..54303b7efde7 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -707,11 +707,11 @@ svc_safe_putnetobj(struct kvec *resv, struct xdr_netobj *o) /* * Verify the checksum on the header and return SVC_OK on success. * Otherwise, return SVC_DROP (in the case of a bad sequence number) - * or return SVC_DENIED and indicate error in authp. + * or return SVC_DENIED and indicate error in rqstp->rq_auth_stat. */ static int gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, - __be32 *rpcstart, struct rpc_gss_wire_cred *gc, __be32 *authp) + __be32 *rpcstart, struct rpc_gss_wire_cred *gc) { struct gss_ctx *ctx_id = rsci->mechctx; struct xdr_buf rpchdr; @@ -725,7 +725,7 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, iov.iov_len = (u8 *)argv->iov_base - (u8 *)rpcstart; xdr_buf_from_iov(&iov, &rpchdr); - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; if (argv->iov_len < 4) return SVC_DENIED; flavor = svc_getnl(argv); @@ -737,13 +737,13 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, if (rqstp->rq_deferred) /* skip verification of revisited request */ return SVC_OK; if (gss_verify_mic(ctx_id, &rpchdr, &checksum) != GSS_S_COMPLETE) { - *authp = rpcsec_gsserr_credproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; return SVC_DENIED; } if (gc->gc_seq > MAXSEQ) { trace_rpcgss_svc_seqno_large(rqstp, gc->gc_seq); - *authp = rpcsec_gsserr_ctxproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; return SVC_DENIED; } if (!gss_check_seq_num(rqstp, rsci, gc->gc_seq)) @@ -1136,7 +1136,7 @@ static void gss_free_in_token_pages(struct gssp_in_token *in_token) } static int gss_read_proxy_verf(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp, + struct rpc_gss_wire_cred *gc, struct xdr_netobj *in_handle, struct gssp_in_token *in_token) { @@ -1145,7 +1145,7 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp, int pages, i, res, pgto, pgfrom; size_t inlen, to_offs, from_offs; - res = gss_read_common_verf(gc, argv, authp, in_handle); + res = gss_read_common_verf(gc, argv, &rqstp->rq_auth_stat, in_handle); if (res) return res; @@ -1226,7 +1226,7 @@ gss_write_resv(struct kvec *resv, size_t size_limit, * Otherwise, drop the request pending an answer to the upcall. */ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp) + struct rpc_gss_wire_cred *gc) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1235,7 +1235,7 @@ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id); memset(&rsikey, 0, sizeof(rsikey)); - ret = gss_read_verf(gc, argv, authp, + ret = gss_read_verf(gc, argv, &rqstp->rq_auth_stat, &rsikey.in_handle, &rsikey.in_token); if (ret) return ret; @@ -1338,7 +1338,7 @@ static int gss_proxy_save_rsc(struct cache_detail *cd, } static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, __be32 *authp) + struct rpc_gss_wire_cred *gc) { struct kvec *resv = &rqstp->rq_res.head[0]; struct xdr_netobj cli_handle; @@ -1350,8 +1350,7 @@ static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); memset(&ud, 0, sizeof(ud)); - ret = gss_read_proxy_verf(rqstp, gc, authp, - &ud.in_handle, &ud.in_token); + ret = gss_read_proxy_verf(rqstp, gc, &ud.in_handle, &ud.in_token); if (ret) return ret; @@ -1524,7 +1523,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {} * response here and return SVC_COMPLETE. */ static int -svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_gss_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1537,7 +1536,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) int ret; struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id); - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; if (!svcdata) svcdata = kmalloc(sizeof(*svcdata), GFP_KERNEL); if (!svcdata) @@ -1574,22 +1573,22 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) if ((gc->gc_proc != RPC_GSS_PROC_DATA) && (rqstp->rq_proc != 0)) goto auth_err; - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; switch (gc->gc_proc) { case RPC_GSS_PROC_INIT: case RPC_GSS_PROC_CONTINUE_INIT: if (use_gss_proxy(SVC_NET(rqstp))) - return svcauth_gss_proxy_init(rqstp, gc, authp); + return svcauth_gss_proxy_init(rqstp, gc); else - return svcauth_gss_legacy_init(rqstp, gc, authp); + return svcauth_gss_legacy_init(rqstp, gc); case RPC_GSS_PROC_DATA: case RPC_GSS_PROC_DESTROY: /* Look up the context, and check the verifier: */ - *authp = rpcsec_gsserr_credproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; rsci = gss_svc_searchbyctx(sn->rsc_cache, &gc->gc_ctx); if (!rsci) goto auth_err; - switch (gss_verify_header(rqstp, rsci, rpcstart, gc, authp)) { + switch (gss_verify_header(rqstp, rsci, rpcstart, gc)) { case SVC_OK: break; case SVC_DENIED: @@ -1599,7 +1598,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) } break; default: - *authp = rpc_autherr_rejectedcred; + rqstp->rq_auth_stat = rpc_autherr_rejectedcred; goto auth_err; } @@ -1615,13 +1614,13 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) svc_putnl(resv, RPC_SUCCESS); goto complete; case RPC_GSS_PROC_DATA: - *authp = rpcsec_gsserr_ctxproblem; + rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; svcdata->verf_start = resv->iov_base + resv->iov_len; if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq)) goto auth_err; rqstp->rq_cred = rsci->cred; get_group_info(rsci->cred.cr_group_info); - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; switch (gc->gc_svc) { case RPC_GSS_SVC_NONE: break; diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 27f98756d175..cbcc951639ad 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1305,7 +1305,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) struct svc_process_info process; __be32 *statp; u32 prog, vers; - __be32 auth_stat, rpc_stat; + __be32 rpc_stat; int auth_res; __be32 *reply_statp; @@ -1348,14 +1348,14 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) * We do this before anything else in order to get a decent * auth verifier. */ - auth_res = svc_authenticate(rqstp, &auth_stat); + auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ if (auth_res == SVC_OK && progp) { - auth_stat = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; auth_res = progp->pg_authenticate(rqstp); } if (auth_res != SVC_OK) - trace_svc_authenticate(rqstp, auth_res, auth_stat); + trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { case SVC_OK: break; @@ -1414,8 +1414,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto release_dropit; if (*statp == rpc_garbage_args) goto err_garbage; - auth_stat = svc_get_autherr(rqstp, statp); - if (auth_stat != rpc_auth_ok) + rqstp->rq_auth_stat = svc_get_autherr(rqstp, statp); + if (rqstp->rq_auth_stat != rpc_auth_ok) goto err_release_bad_auth; } else { dprintk("svc: calling dispatcher\n"); @@ -1472,13 +1472,14 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) if (procp->pc_release) procp->pc_release(rqstp); err_bad_auth: - dprintk("svc: authentication failed (%d)\n", ntohl(auth_stat)); + dprintk("svc: authentication failed (%d)\n", + be32_to_cpu(rqstp->rq_auth_stat)); serv->sv_stats->rpcbadauth++; /* Restore write pointer to location of accept status: */ xdr_ressize_check(rqstp, reply_statp); svc_putnl(resv, 1); /* REJECT */ svc_putnl(resv, 1); /* AUTH_ERROR */ - svc_putnl(resv, ntohl(auth_stat)); /* status */ + svc_putu32(resv, rqstp->rq_auth_stat); /* status */ goto sendit; err_bad_prog: diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c index 998b196b6176..5a8b8e03fdd4 100644 --- a/net/sunrpc/svcauth.c +++ b/net/sunrpc/svcauth.c @@ -59,12 +59,12 @@ svc_put_auth_ops(struct auth_ops *aops) } int -svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) +svc_authenticate(struct svc_rqst *rqstp) { rpc_authflavor_t flavor; struct auth_ops *aops; - *authp = rpc_auth_ok; + rqstp->rq_auth_stat = rpc_auth_ok; flavor = svc_getnl(&rqstp->rq_arg.head[0]); @@ -72,7 +72,7 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) aops = svc_get_auth_ops(flavor); if (aops == NULL) { - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } @@ -80,7 +80,7 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) init_svc_cred(&rqstp->rq_cred); rqstp->rq_authop = aops; - return aops->accept(rqstp, authp); + return aops->accept(rqstp); } EXPORT_SYMBOL_GPL(svc_authenticate); diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 60754a292589..c20c63d651a9 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -743,7 +743,7 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) EXPORT_SYMBOL_GPL(svcauth_unix_set_client); static int -svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_null_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -754,12 +754,12 @@ svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp) if (svc_getu32(argv) != 0) { dprintk("svc: bad null cred\n"); - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { dprintk("svc: bad null verf\n"); - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; return SVC_DENIED; } @@ -803,7 +803,7 @@ struct auth_ops svcauth_null = { static int -svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) +svcauth_unix_accept(struct svc_rqst *rqstp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -845,7 +845,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) } groups_sort(cred->cr_group_info); if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { - *authp = rpc_autherr_badverf; + rqstp->rq_auth_stat = rpc_autherr_badverf; return SVC_DENIED; } @@ -857,7 +857,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) return SVC_OK; badcred: - *authp = rpc_autherr_badcred; + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } From febf43bcdc2bbb5aef8d27df5104288562c765fe Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 15 Jul 2021 15:52:12 -0400 Subject: [PATCH 0653/1565] SUNRPC: Set rq_auth_stat in the pg_authenticate() callout [ Upstream commit 5c2465dfd457f3015eebcc3ace50570e1d896aeb ] In a few moments, rq_auth_stat will need to be explicitly set to rpc_auth_ok before execution gets to the dispatcher. svc_authenticate() already sets it, but it often gets reset to rpc_autherr_badcred right after that call, even when authentication is successful. Let's ensure that the pg_authenticate callout and svc_set_client() set it properly in every case. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 2 ++ fs/nfs/callback.c | 4 ++++ net/sunrpc/auth_gss/svcauth_gss.c | 4 ++++ net/sunrpc/svc.c | 4 +--- net/sunrpc/svcauth_unix.c | 6 +++++- 5 files changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 0ab9756ed235..b632be3ad57b 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -649,6 +649,7 @@ static int lockd_authenticate(struct svc_rqst *rqstp) switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: case RPC_AUTH_UNIX: + rqstp->rq_auth_stat = rpc_auth_ok; if (rqstp->rq_proc == 0) return SVC_OK; if (is_callback(rqstp->rq_proc)) { @@ -659,6 +660,7 @@ static int lockd_authenticate(struct svc_rqst *rqstp) } return svc_set_client(rqstp); } + rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 7817ad94a6ba..86d856de1389 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -429,6 +429,8 @@ check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp) */ static int nfs_callback_authenticate(struct svc_rqst *rqstp) { + rqstp->rq_auth_stat = rpc_autherr_badcred; + switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: if (rqstp->rq_proc != CB_NULL) @@ -439,6 +441,8 @@ static int nfs_callback_authenticate(struct svc_rqst *rqstp) if (svc_is_backchannel(rqstp)) return SVC_DENIED; } + + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 54303b7efde7..329eac782cc5 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1038,6 +1038,8 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) struct rpc_gss_wire_cred *gc = &svcdata->clcred; int stat; + rqstp->rq_auth_stat = rpc_autherr_badcred; + /* * A gss export can be specified either by: * export *(sec=krb5,rw) @@ -1053,6 +1055,8 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) stat = svcauth_unix_set_client(rqstp); if (stat == SVC_DROP || stat == SVC_CLOSE) return stat; + + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index cbcc951639ad..f03650727533 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1350,10 +1350,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) */ auth_res = svc_authenticate(rqstp); /* Also give the program a chance to reject this call: */ - if (auth_res == SVC_OK && progp) { - rqstp->rq_auth_stat = rpc_autherr_badcred; + if (auth_res == SVC_OK && progp) auth_res = progp->pg_authenticate(rqstp); - } if (auth_res != SVC_OK) trace_svc_authenticate(rqstp, auth_res); switch (auth_res) { diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index c20c63d651a9..1868596259af 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -699,8 +699,9 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) rqstp->rq_client = NULL; if (rqstp->rq_proc == 0) - return SVC_OK; + goto out; + rqstp->rq_auth_stat = rpc_autherr_badcred; ipm = ip_map_cached_get(xprt); if (ipm == NULL) ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class, @@ -737,6 +738,9 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) put_group_info(cred->cr_group_info); cred->cr_group_info = gi; } + +out: + rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } From 91bbbffece637eccb3e744670cb3b5877060aabe Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 15 Jul 2021 15:52:19 -0400 Subject: [PATCH 0654/1565] SUNRPC: Eliminate the RQ_AUTHERR flag [ Upstream commit 9082e1d914f8b27114352b1940bbcc7522f682e7 ] Now that there is an alternate method for returning an auth_stat value, replace the RQ_AUTHERR flag with use of that new method. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback_xdr.c | 3 ++- include/linux/sunrpc/svc.h | 2 -- include/trace/events/sunrpc.h | 3 +-- net/sunrpc/svc.c | 24 ++++-------------------- 4 files changed, 7 insertions(+), 25 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index f9dfd4e712a3..0559e8b6a8ec 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -984,7 +984,8 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp) out_invalidcred: pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n"); - return svc_return_autherr(rqstp, rpc_autherr_badcred); + rqstp->rq_auth_stat = rpc_autherr_badcred; + return rpc_success; } /* diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 7b7bd8a0a6fb..dd3daadbc0e5 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -277,7 +277,6 @@ struct svc_rqst { #define RQ_VICTIM (5) /* about to be shut down */ #define RQ_BUSY (6) /* request is busy */ #define RQ_DATA (7) /* request has data */ -#define RQ_AUTHERR (8) /* Request status is auth error */ unsigned long rq_flags; /* flags field */ ktime_t rq_qtime; /* enqueue time */ @@ -537,7 +536,6 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); -__be32 svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err); __be32 svc_generic_init_request(struct svc_rqst *rqstp, const struct svc_program *progp, struct svc_process_info *procinfo); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 0cb9e182a1b1..fce071f39f51 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1480,8 +1480,7 @@ DEFINE_SVCXDRBUF_EVENT(sendto); svc_rqst_flag(SPLICE_OK) \ svc_rqst_flag(VICTIM) \ svc_rqst_flag(BUSY) \ - svc_rqst_flag(DATA) \ - svc_rqst_flag_end(AUTHERR) + svc_rqst_flag_end(DATA) #undef svc_rqst_flag #undef svc_rqst_flag_end diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index f03650727533..0d3c3ca2830a 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1187,22 +1187,6 @@ void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif -__be32 -svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err) -{ - set_bit(RQ_AUTHERR, &rqstp->rq_flags); - return auth_err; -} -EXPORT_SYMBOL_GPL(svc_return_autherr); - -static __be32 -svc_get_autherr(struct svc_rqst *rqstp, __be32 *statp) -{ - if (test_and_clear_bit(RQ_AUTHERR, &rqstp->rq_flags)) - return *statp; - return rpc_auth_ok; -} - static int svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) { @@ -1226,7 +1210,7 @@ svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) test_bit(RQ_DROPME, &rqstp->rq_flags)) return 0; - if (test_bit(RQ_AUTHERR, &rqstp->rq_flags)) + if (rqstp->rq_auth_stat != rpc_auth_ok) return 1; if (*statp != rpc_success) @@ -1412,15 +1396,15 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto release_dropit; if (*statp == rpc_garbage_args) goto err_garbage; - rqstp->rq_auth_stat = svc_get_autherr(rqstp, statp); - if (rqstp->rq_auth_stat != rpc_auth_ok) - goto err_release_bad_auth; } else { dprintk("svc: calling dispatcher\n"); if (!process.dispatch(rqstp, statp)) goto release_dropit; /* Release reply info */ } + if (rqstp->rq_auth_stat != rpc_auth_ok) + goto err_release_bad_auth; + /* Check RPC status result */ if (*statp != rpc_success) resv->iov_len = ((void*)statp) - resv->iov_base + 4; From edf220fe151691f35ee19f86d32291f932d780f7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 15 Jul 2021 15:52:25 -0400 Subject: [PATCH 0655/1565] NFS: Add a private local dispatcher for NFSv4 callback operations [ Upstream commit 7d34c96217cf3c2d37ca0a56ca0bc3c3bef1e189 ] The client's NFSv4 callback service is the only remaining user of svc_generic_dispatch(). Note that the NFSv4 callback service doesn't use the .pc_encode and .pc_decode callouts in any substantial way, so they are removed. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback_xdr.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 0559e8b6a8ec..e7d1efd45fa4 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -988,6 +988,15 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp) return rpc_success; } +static int +nfs_callback_dispatch(struct svc_rqst *rqstp, __be32 *statp) +{ + const struct svc_procedure *procp = rqstp->rq_procinfo; + + *statp = procp->pc_func(rqstp); + return 1; +} + /* * Define NFS4 callback COMPOUND ops. */ @@ -1076,7 +1085,7 @@ const struct svc_version nfs4_callback_version1 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = NULL, + .vs_dispatch = nfs_callback_dispatch, .vs_hidden = true, .vs_need_cong_ctrl = true, }; @@ -1088,7 +1097,7 @@ const struct svc_version nfs4_callback_version4 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count4, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = NULL, + .vs_dispatch = nfs_callback_dispatch, .vs_hidden = true, .vs_need_cong_ctrl = true, }; From 1773901afb33b28fce99c978cffae598210f689b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 15 Jul 2021 15:52:31 -0400 Subject: [PATCH 0656/1565] NFS: Remove unused callback void decoder [ Upstream commit c35a810ce59524971c4a3b45faed4d0121e5a305 ] Clean up: The callback RPC dispatcher no longer invokes these call outs, although svc_process_common() relies on seeing a .pc_encode function. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback_xdr.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index e7d1efd45fa4..600e64068240 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -63,11 +63,10 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) return htonl(NFS4_OK); } -static int nfs4_decode_void(struct svc_rqst *rqstp, __be32 *p) -{ - return xdr_argsize_check(rqstp, p); -} - +/* + * svc_process_common() looks for an XDR encoder to know when + * not to drop a Reply. + */ static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p) { return xdr_ressize_check(rqstp, p); @@ -1063,7 +1062,6 @@ static struct callback_op callback_ops[] = { static const struct svc_procedure nfs4_callback_procedures1[] = { [CB_NULL] = { .pc_func = nfs4_callback_null, - .pc_decode = nfs4_decode_void, .pc_encode = nfs4_encode_void, .pc_xdrressize = 1, .pc_name = "NULL", From b801327ba3c3b3672db67f3efdbc299bae6ece8b Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 9 Sep 2021 14:56:34 +0300 Subject: [PATCH 0657/1565] fsnotify: fix sb_connectors leak [ Upstream commit 4396a73115fc8739083536162e2228c0c0c3ed1a ] Fix a leak in s_fsnotify_connectors counter in case of a race between concurrent add of new fsnotify mark to an object. The task that lost the race fails to drop the counter before freeing the unused connector. Following umount() hangs in fsnotify_sb_delete()/wait_var_event(), because s_fsnotify_connectors never drops to zero. Fixes: ec44610fe2b8 ("fsnotify: count all objects with attached connectors") Reported-by: Murphy Zhou Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux.usersys.redhat.com/ Signed-off-by: Amir Goldstein Signed-off-by: Linus Torvalds Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/mark.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 796946eb0c2e..bea106fac090 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -531,6 +531,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, /* Someone else created list structure for us */ if (inode) fsnotify_put_inode_ref(inode); + fsnotify_put_sb_connectors(conn); kmem_cache_free(fsnotify_mark_connector_cachep, conn); } From 7918a95bc2267f6a6404ed4d580c19e768be9e43 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 16 Sep 2021 17:24:54 -0400 Subject: [PATCH 0658/1565] NLM: Fix svcxdr_encode_owner() [ Upstream commit 89c485c7a3ecbc2ebd568f9c9c2edf3a8cf7485b ] Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: Dai Ngo Fixes: a6a63ca5652e ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcxdr.h | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/fs/lockd/svcxdr.h b/fs/lockd/svcxdr.h index c69a0bb76c94..4f1a451da5ba 100644 --- a/fs/lockd/svcxdr.h +++ b/fs/lockd/svcxdr.h @@ -134,18 +134,9 @@ svcxdr_decode_owner(struct xdr_stream *xdr, struct xdr_netobj *obj) static inline bool svcxdr_encode_owner(struct xdr_stream *xdr, const struct xdr_netobj *obj) { - unsigned int quadlen = XDR_QUADLEN(obj->len); - __be32 *p; - - if (xdr_stream_encode_u32(xdr, obj->len) < 0) + if (obj->len > XDR_MAX_NETOBJ) return false; - p = xdr_reserve_space(xdr, obj->len); - if (!p) - return false; - p[quadlen - 1] = 0; /* XDR pad */ - memcpy(p, obj->data, obj->len); - - return true; + return xdr_stream_encode_opaque(xdr, obj->data, obj->len) > 0; } #endif /* _LOCKD_SVCXDR_H_ */ From 6719531e67134984b951948563f47740271b4b74 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 30 Sep 2021 15:44:42 -0400 Subject: [PATCH 0659/1565] nfsd: Fix a warning for nfsd_file_close_inode [ Upstream commit 19598141f40dff728dd50799e510805261f48850 ] Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d0748ff04b92..87d984e0cdc0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -541,7 +541,7 @@ nfsd_file_close_inode_sync(struct inode *inode) } /** - * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file + * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * * Walk the whole hash bucket, looking for any files that correspond to "inode". From f114860f727950e8227b0bafe26216e71a681ce4 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 25 Oct 2021 16:27:16 -0300 Subject: [PATCH 0660/1565] fsnotify: pass data_type to fsnotify_name() [ Upstream commit 9baf93d68bcc3d0a6042283b82603c076e25e4f5 ] Align the arguments of fsnotify_name() to those of fsnotify(). Link: https://lore.kernel.org/r/20211025192746.66445-2-krisman@collabora.com Reviewed-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara [ cel: adjust fsnotify_delete as well, a37d9a17f099 is already applied ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/fsnotify.h | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index a9477c14fad5..211463715e29 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -26,20 +26,21 @@ * FS_EVENT_ON_CHILD mask on the parent inode and will not be reported if only * the child is interested and not the parent. */ -static inline void fsnotify_name(struct inode *dir, __u32 mask, - struct inode *child, - const struct qstr *name, u32 cookie) +static inline int fsnotify_name(__u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *name, + u32 cookie) { if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) - return; + return 0; - fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); + return fsnotify(mask, data, data_type, dir, name, NULL, cookie); } static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, __u32 mask) { - fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); + fsnotify_name(mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, + dir, &dentry->d_name, 0); } static inline void fsnotify_inode(struct inode *inode, __u32 mask) @@ -154,8 +155,10 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, new_dir_mask |= FS_ISDIR; } - fsnotify_name(old_dir, old_dir_mask, source, old_name, fs_cookie); - fsnotify_name(new_dir, new_dir_mask, source, new_name, fs_cookie); + fsnotify_name(old_dir_mask, source, FSNOTIFY_EVENT_INODE, + old_dir, old_name, fs_cookie); + fsnotify_name(new_dir_mask, source, FSNOTIFY_EVENT_INODE, + new_dir, new_name, fs_cookie); if (target) fsnotify_link_count(target); @@ -209,7 +212,8 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, fsnotify_link_count(inode); audit_inode_child(dir, new_dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_name(dir, FS_CREATE, inode, &new_dentry->d_name, 0); + fsnotify_name(FS_CREATE, inode, FSNOTIFY_EVENT_INODE, + dir, &new_dentry->d_name, 0); } /* @@ -228,7 +232,8 @@ static inline void fsnotify_delete(struct inode *dir, struct inode *inode, if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; - fsnotify_name(dir, mask, inode, &dentry->d_name, 0); + fsnotify_name(mask, inode, FSNOTIFY_EVENT_INODE, dir, &dentry->d_name, + 0); } /** From 9601d20734061ea0891e9effaf36eeeab0dad897 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 25 Oct 2021 16:27:17 -0300 Subject: [PATCH 0661/1565] fsnotify: pass dentry instead of inode data [ Upstream commit fd5a3ff49a19aa69e2bc1e26e98037c2d778e61a ] Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY. Use it to pass the dentry instead of it's ->d_inode where available. This is needed in preparation to the refactor to retrieve the super block from the data field. In some cases (i.e. mkdir in kernfs), the data inode comes from a negative dentry, such that no super block information would be available. By receiving the dentry itself, instead of the inode, fsnotify can derive the super block even on these cases. Link: https://lore.kernel.org/r/20211025192746.66445-3-krisman@collabora.com Reviewed-by: Jan Kara Signed-off-by: Amir Goldstein [Expand explanation in commit message] Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/fsnotify.h | 5 ++--- include/linux/fsnotify_backend.h | 16 ++++++++++++++++ 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index 211463715e29..e969a23f7063 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -39,8 +39,7 @@ static inline int fsnotify_name(__u32 mask, const void *data, int data_type, static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, __u32 mask) { - fsnotify_name(mask, d_inode(dentry), FSNOTIFY_EVENT_INODE, - dir, &dentry->d_name, 0); + fsnotify_name(mask, dentry, FSNOTIFY_EVENT_DENTRY, dir, &dentry->d_name, 0); } static inline void fsnotify_inode(struct inode *inode, __u32 mask) @@ -87,7 +86,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, */ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) { - fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE); + fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY); } static inline int fsnotify_file(struct file *file, __u32 mask) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 1ce66748a2d2..a2db821e8a8f 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -248,6 +248,7 @@ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, + FSNOTIFY_EVENT_DENTRY, }; static inline struct inode *fsnotify_data_inode(const void *data, int data_type) @@ -255,6 +256,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) switch (data_type) { case FSNOTIFY_EVENT_INODE: return (struct inode *)data; + case FSNOTIFY_EVENT_DENTRY: + return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); default: @@ -262,6 +265,19 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) } } +static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_DENTRY: + /* Non const is needed for dget() */ + return (struct dentry *)data; + case FSNOTIFY_EVENT_PATH: + return ((const struct path *)data)->dentry; + default: + return NULL; + } +} + static inline const struct path *fsnotify_data_path(const void *data, int data_type) { From 60b6dab8c81eb058e506456c4650d067bfb57fe2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 25 Oct 2021 16:27:18 -0300 Subject: [PATCH 0662/1565] fsnotify: clarify contract for create event hooks [ Upstream commit dabe729dddca550446e9cc118c96d1f91703345b ] Clarify argument names and contract for fsnotify_create() and fsnotify_mkdir() to reflect the anomaly of kernfs, which leaves dentries negavite after mkdir/create. Remove the WARN_ON(!inode) in audit code that were added by the Fixes commit under the wrong assumption that dentries cannot be negative after mkdir/create. Fixes: aa93bdc5500c ("fsnotify: use helpers to access data by data_type") Link: https://lore.kernel.org/linux-fsdevel/87mtp5yz0q.fsf@collabora.com/ Link: https://lore.kernel.org/r/20211025192746.66445-4-krisman@collabora.com Reviewed-by: Jan Kara Reported-by: Gabriel Krisman Bertazi Signed-off-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/fsnotify.h | 22 ++++++++++++++++------ kernel/audit_fsnotify.c | 3 +-- kernel/audit_watch.c | 3 +-- 3 files changed, 18 insertions(+), 10 deletions(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index e969a23f7063..addca4ea56ad 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -192,16 +192,22 @@ static inline void fsnotify_inoderemove(struct inode *inode) /* * fsnotify_create - 'name' was linked in + * + * Caller must make sure that dentry->d_name is stable. + * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate + * ->d_inode later */ -static inline void fsnotify_create(struct inode *inode, struct dentry *dentry) +static inline void fsnotify_create(struct inode *dir, struct dentry *dentry) { - audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_dirent(inode, dentry, FS_CREATE); + fsnotify_dirent(dir, dentry, FS_CREATE); } /* * fsnotify_link - new hardlink in 'inode' directory + * + * Caller must make sure that new_dentry->d_name is stable. * Note: We have to pass also the linked inode ptr as some filesystems leave * new_dentry->d_inode NULL and instantiate inode pointer later */ @@ -267,12 +273,16 @@ static inline void fsnotify_unlink(struct inode *dir, struct dentry *dentry) /* * fsnotify_mkdir - directory 'name' was created + * + * Caller must make sure that dentry->d_name is stable. + * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate + * ->d_inode later */ -static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry) +static inline void fsnotify_mkdir(struct inode *dir, struct dentry *dentry) { - audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_dirent(inode, dentry, FS_CREATE | FS_ISDIR); + fsnotify_dirent(dir, dentry, FS_CREATE | FS_ISDIR); } /* diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index b2ebacd2f309..76a5925b4e18 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -161,8 +161,7 @@ static int audit_mark_handle_event(struct fsnotify_mark *inode_mark, u32 mask, audit_mark = container_of(inode_mark, struct audit_fsnotify_mark, mark); - if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group) || - WARN_ON_ONCE(!inode)) + if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group)) return 0; if (mask & (FS_CREATE|FS_MOVED_TO|FS_DELETE|FS_MOVED_FROM)) { diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index edbeffee64b8..fd7b30a2d9a4 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -472,8 +472,7 @@ static int audit_watch_handle_event(struct fsnotify_mark *inode_mark, u32 mask, parent = container_of(inode_mark, struct audit_parent, mark); - if (WARN_ON_ONCE(inode_mark->group != audit_watch_group) || - WARN_ON_ONCE(!inode)) + if (WARN_ON_ONCE(inode_mark->group != audit_watch_group)) return 0; if (mask & (FS_CREATE|FS_MOVED_TO) && inode) From 7fee789540e920f294955603e44197f397b997c5 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:19 -0300 Subject: [PATCH 0663/1565] fsnotify: Don't insert unmergeable events in hashtable [ Upstream commit cc53b55f697fe5aa98bdbfdfe67c6401da242155 ] Some events, like the overflow event, are not mergeable, so they are not hashed. But, when failing inside fsnotify_add_event for lack of space, fsnotify_add_event() still calls the insert hook, which adds the overflow event to the merge list. Add a check to prevent any kind of unmergeable event to be inserted in the hashtable. Fixes: 94e00d28a680 ("fsnotify: use hash table for faster events merge") Link: https://lore.kernel.org/r/20211025192746.66445-5-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 057abd2cf887..310246f8d3f1 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -702,6 +702,9 @@ static void fanotify_insert_event(struct fsnotify_group *group, assert_spin_locked(&group->notification_lock); + if (!fanotify_is_hashed_event(event->mask)) + return; + pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, group, event, bucket); @@ -779,8 +782,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, fsn_event = &event->fse; ret = fsnotify_add_event(group, fsn_event, fanotify_merge, - fanotify_is_hashed_event(mask) ? - fanotify_insert_event : NULL); + fanotify_insert_event); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); From 326be73a59858d4aa51fd4eeea98f25ef886a531 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:20 -0300 Subject: [PATCH 0664/1565] fanotify: Fold event size calculation to its own function [ Upstream commit b9928e80dda84b349ba8de01780b9bef2fc36ffa ] Every time this function is invoked, it is immediately added to FAN_EVENT_METADATA_LEN, since there is no need to just calculate the length of info records. This minor clean up folds the rest of the calculation into the function, which now operates in terms of events, returning the size of the entire event, including metadata. Link: https://lore.kernel.org/r/20211025192746.66445-6-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 35 +++++++++++++++++------------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 10fb062065b6..846e2a661526 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -121,17 +121,24 @@ static int fanotify_fid_info_len(int fh_len, int name_len) FANOTIFY_EVENT_ALIGN); } -static int fanotify_event_info_len(unsigned int info_mode, - struct fanotify_event *event) +static size_t fanotify_event_len(unsigned int info_mode, + struct fanotify_event *event) { - struct fanotify_info *info = fanotify_event_info(event); - int dir_fh_len = fanotify_event_dir_fh_len(event); - int fh_len = fanotify_event_object_fh_len(event); - int info_len = 0; + size_t event_len = FAN_EVENT_METADATA_LEN; + struct fanotify_info *info; + int dir_fh_len; + int fh_len; int dot_len = 0; + if (!info_mode) + return event_len; + + info = fanotify_event_info(event); + dir_fh_len = fanotify_event_dir_fh_len(event); + fh_len = fanotify_event_object_fh_len(event); + if (dir_fh_len) { - info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { /* @@ -142,12 +149,12 @@ static int fanotify_event_info_len(unsigned int info_mode, } if (info_mode & FAN_REPORT_PIDFD) - info_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + event_len += FANOTIFY_PIDFD_INFO_HDR_LEN; if (fh_len) - info_len += fanotify_fid_info_len(fh_len, dot_len); + event_len += fanotify_fid_info_len(fh_len, dot_len); - return info_len; + return event_len; } /* @@ -176,7 +183,7 @@ static void fanotify_unhash_event(struct fsnotify_group *group, static struct fanotify_event *get_one_event(struct fsnotify_group *group, size_t count) { - size_t event_size = FAN_EVENT_METADATA_LEN; + size_t event_size; struct fanotify_event *event = NULL; struct fsnotify_event *fsn_event; unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); @@ -189,8 +196,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, goto out; event = FANOTIFY_E(fsn_event); - if (info_mode) - event_size += fanotify_event_info_len(info_mode, event); + event_size = fanotify_event_len(info_mode, event); if (event_size > count) { event = ERR_PTR(-EINVAL); @@ -532,8 +538,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, pr_debug("%s: group=%p event=%p\n", __func__, group, event); - metadata.event_len = FAN_EVENT_METADATA_LEN + - fanotify_event_info_len(info_mode, event); + metadata.event_len = fanotify_event_len(info_mode, event); metadata.metadata_len = FAN_EVENT_METADATA_LEN; metadata.vers = FANOTIFY_METADATA_VERSION; metadata.reserved = 0; From 9539a89f28ed814fe3fd2d89cf011fb78d405f8e Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:21 -0300 Subject: [PATCH 0665/1565] fanotify: Split fsid check from other fid mode checks [ Upstream commit 8299212cbdb01a5867e230e961f82e5c02a6de34 ] FAN_FS_ERROR will require fsid, but not necessarily require the filesystem to expose a file handle. Split those checks into different functions, so they can be used separately when setting up an event. While there, update a comment about tmpfs having 0 fsid, which is no longer true. Link: https://lore.kernel.org/r/20211025192746.66445-7-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 846e2a661526..6dd6a2e05f55 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1295,16 +1295,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return fd; } -/* Check if filesystem can encode a unique fid */ -static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) +static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid) { __kernel_fsid_t root_fsid; int err; /* - * Make sure path is not in filesystem with zero fsid (e.g. tmpfs). + * Make sure dentry is not of a filesystem with zero fsid (e.g. fuse). */ - err = vfs_get_fsid(path->dentry, fsid); + err = vfs_get_fsid(dentry, fsid); if (err) return err; @@ -1312,10 +1311,10 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) return -ENODEV; /* - * Make sure path is not inside a filesystem subvolume (e.g. btrfs) + * Make sure dentry is not of a filesystem subvolume (e.g. btrfs) * which uses a different fsid than sb root. */ - err = vfs_get_fsid(path->dentry->d_sb->s_root, &root_fsid); + err = vfs_get_fsid(dentry->d_sb->s_root, &root_fsid); if (err) return err; @@ -1323,6 +1322,12 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) root_fsid.val[1] != fsid->val[1]) return -EXDEV; + return 0; +} + +/* Check if filesystem can encode a unique fid */ +static int fanotify_test_fid(struct dentry *dentry) +{ /* * We need to make sure that the file system supports at least * encoding a file handle so user can use name_to_handle_at() to @@ -1330,8 +1335,8 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) * objects. However, name_to_handle_at() requires that the * filesystem also supports decoding file handles. */ - if (!path->dentry->d_sb->s_export_op || - !path->dentry->d_sb->s_export_op->fh_to_dentry) + if (!dentry->d_sb->s_export_op || + !dentry->d_sb->s_export_op->fh_to_dentry) return -EOPNOTSUPP; return 0; @@ -1500,7 +1505,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, } if (fid_mode) { - ret = fanotify_test_fid(&path, &__fsid); + ret = fanotify_test_fsid(path.dentry, &__fsid); + if (ret) + goto path_put_and_out; + + ret = fanotify_test_fid(path.dentry); if (ret) goto path_put_and_out; From 7c9ba74cb30b1073783a6edbd5aacf14f4714fd8 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:22 -0300 Subject: [PATCH 0666/1565] inotify: Don't force FS_IN_IGNORED [ Upstream commit e0462f91d24756916fded4313d508e0fc52f39c9 ] According to Amir: "FS_IN_IGNORED is completely internal to inotify and there is no need to set it in i_fsnotify_mask at all, so if we remove the bit from the output of inotify_arg_to_mask() no functionality will change and we will be able to overload the event bit for FS_ERROR." This is done in preparation to overload FS_ERROR with the notification mechanism in fanotify. Link: https://lore.kernel.org/r/20211025192746.66445-8-krisman@collabora.com Suggested-by: Amir Goldstein Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/inotify/inotify_user.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 62cd91bc00b8..9676c3f2df3d 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -89,10 +89,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) __u32 mask; /* - * Everything should accept their own ignored and should receive events - * when the inode is unmounted. All directories care about children. + * Everything should receive events when the inode is unmounted. + * All directories care about children. */ - mask = (FS_IN_IGNORED | FS_UNMOUNT); + mask = (FS_UNMOUNT); if (S_ISDIR(inode->i_mode)) mask |= FS_EVENT_ON_CHILD; From 24eda1b5e6f6e17fc4d8a1962dac8b8daaff977e Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:23 -0300 Subject: [PATCH 0667/1565] fsnotify: Add helper to detect overflow_event [ Upstream commit 808967a0a4d2f4ce6a2005c5692fffbecaf018c1 ] Similarly to fanotify_is_perm_event and friends, provide a helper predicate to say whether a mask is of an overflow event. Link: https://lore.kernel.org/r/20211025192746.66445-9-krisman@collabora.com Suggested-by: Amir Goldstein Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 3 ++- include/linux/fsnotify_backend.h | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 4a5e555dc3d2..c42cf8fd7d79 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -315,7 +315,8 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) */ static inline bool fanotify_is_hashed_event(u32 mask) { - return !fanotify_is_perm_event(mask) && !(mask & FS_Q_OVERFLOW); + return !(fanotify_is_perm_event(mask) || + fsnotify_is_overflow_event(mask)); } static inline unsigned int fanotify_event_hash_bucket( diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index a2db821e8a8f..749bc85e1d1c 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -510,6 +510,11 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) fsnotify_add_event(group, group->overflow_event, NULL, NULL); } +static inline bool fsnotify_is_overflow_event(u32 mask) +{ + return mask & FS_Q_OVERFLOW; +} + static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) { assert_spin_locked(&group->notification_lock); From 44844158eea621f8b9d3485196310ef48caa9a7b Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:24 -0300 Subject: [PATCH 0668/1565] fsnotify: Add wrapper around fsnotify_add_event [ Upstream commit 1ad03c3a326a86e259389592117252c851873395 ] fsnotify_add_event is growing in number of parameters, which in most case are just passed a NULL pointer. So, split out a new fsnotify_insert_event function to clean things up for users who don't need an insert hook. Link: https://lore.kernel.org/r/20211025192746.66445-10-krisman@collabora.com Suggested-by: Amir Goldstein Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 4 ++-- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/notification.c | 12 ++++++------ include/linux/fsnotify_backend.h | 23 ++++++++++++++++------- 4 files changed, 25 insertions(+), 16 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 310246f8d3f1..f82e20228999 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -781,8 +781,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, } fsn_event = &event->fse; - ret = fsnotify_add_event(group, fsn_event, fanotify_merge, - fanotify_insert_event); + ret = fsnotify_insert_event(group, fsn_event, fanotify_merge, + fanotify_insert_event); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index b0530f75b274..be3eb1cebdcc 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -123,7 +123,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, if (len) strcpy(event->name, name->name); - ret = fsnotify_add_event(group, fsn_event, inotify_merge, NULL); + ret = fsnotify_add_event(group, fsn_event, inotify_merge); if (ret) { /* Our event wasn't used in the end. Free it. */ fsnotify_destroy_event(group, fsn_event); diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 32f45543b9c6..44bb10f50715 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -78,12 +78,12 @@ void fsnotify_destroy_event(struct fsnotify_group *group, * 2 if the event was not queued - either the queue of events has overflown * or the group is shutting down. */ -int fsnotify_add_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)) +int fsnotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)) { int ret = 0; struct list_head *list = &group->notification_list; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 749bc85e1d1c..b323d0c4b967 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -498,16 +498,25 @@ extern int fsnotify_fasync(int fd, struct file *file, int on); extern void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event); /* attach the event to the group notification queue */ -extern int fsnotify_add_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)); +extern int fsnotify_insert_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *), + void (*insert)(struct fsnotify_group *, + struct fsnotify_event *)); + +static inline int fsnotify_add_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct fsnotify_group *, + struct fsnotify_event *)) +{ + return fsnotify_insert_event(group, event, merge, NULL); +} + /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { - fsnotify_add_event(group, group->overflow_event, NULL, NULL); + fsnotify_add_event(group, group->overflow_event, NULL); } static inline bool fsnotify_is_overflow_event(u32 mask) From 5c4ce075c92b6714b74c9a68abc78bde6d72b96b Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:25 -0300 Subject: [PATCH 0669/1565] fsnotify: Retrieve super block from the data field [ Upstream commit 29335033c574a15334015d8c4e36862cff3d3384 ] Some file system events (i.e. FS_ERROR) might not be associated with an inode or directory. For these, we can retrieve the super block from the data field. But, since the super_block is available in the data field on every event type, simplify the code to always retrieve it from there, through a new helper. Link: https://lore.kernel.org/r/20211025192746.66445-11-krisman@collabora.com Suggested-by: Jan Kara Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.c | 7 +++---- include/linux/fsnotify_backend.h | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 963e6ce75b96..fde3a1115a17 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -455,16 +455,16 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) * @file_name is relative to * @file_name: optional file name associated with event * @inode: optional inode associated with event - - * either @dir or @inode must be non-NULL. - * if both are non-NULL event may be reported to both. + * If @dir and @inode are both non-NULL, event may be + * reported to both. * @cookie: inotify rename cookie */ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, struct inode *inode, u32 cookie) { const struct path *path = fsnotify_data_path(data, data_type); + struct super_block *sb = fsnotify_data_sb(data, data_type); struct fsnotify_iter_info iter_info = {}; - struct super_block *sb; struct mount *mnt = NULL; struct inode *parent = NULL; int ret = 0; @@ -483,7 +483,6 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, */ parent = dir; } - sb = inode->i_sb; /* * Optimization: srcu_read_lock() has a memory barrier which can diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b323d0c4b967..035438fe4a43 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -289,6 +289,21 @@ static inline const struct path *fsnotify_data_path(const void *data, } } +static inline struct super_block *fsnotify_data_sb(const void *data, + int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_INODE: + return ((struct inode *)data)->i_sb; + case FSNOTIFY_EVENT_DENTRY: + return ((struct dentry *)data)->d_sb; + case FSNOTIFY_EVENT_PATH: + return ((const struct path *)data)->dentry->d_sb; + default: + return NULL; + } +} + enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_INODE, FSNOTIFY_OBJ_TYPE_PARENT, From c9f9d99ea4c3f3e00979f042f5039790c32748f1 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:26 -0300 Subject: [PATCH 0670/1565] fsnotify: Protect fsnotify_handle_inode_event from no-inode events [ Upstream commit 24dca90590509a7a6cbe0650100c90c5b8a3468a ] FAN_FS_ERROR allows events without inodes - i.e. for file system-wide errors. Even though fsnotify_handle_inode_event is not currently used by fanotify, this patch protects other backends from cases where neither inode or dir are provided. Also document the constraints of the interface (inode and dir cannot be both NULL). Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com Suggested-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 3 +++ fs/notify/fsnotify.c | 3 +++ include/linux/fsnotify_backend.h | 1 + 3 files changed, 7 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 87d984e0cdc0..8cd7d5d6955a 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -601,6 +601,9 @@ nfsd_file_fsnotify_handle_event(struct fsnotify_mark *mark, u32 mask, struct inode *inode, struct inode *dir, const struct qstr *name, u32 cookie) { + if (WARN_ON_ONCE(!inode)) + return 0; + trace_nfsd_file_fsnotify_handle_event(inode, mask); /* Should be no marks on non-regular files */ diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index fde3a1115a17..4034ca566f95 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -252,6 +252,9 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group, if (WARN_ON_ONCE(!ops->handle_inode_event)) return 0; + if (WARN_ON_ONCE(!inode && !dir)) + return 0; + if ((inode_mark->mask & FS_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 035438fe4a43..b71dc788018e 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -136,6 +136,7 @@ struct mem_cgroup; * @dir: optional directory associated with event - * if @file_name is not NULL, this is the directory that * @file_name is relative to. + * Either @inode or @dir must be non-NULL. * @file_name: optional file name associated with event * @cookie: inotify rename cookie * From 313234a93ea1517df516608925cc431996ac5709 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:27 -0300 Subject: [PATCH 0671/1565] fsnotify: Pass group argument to free_event [ Upstream commit 330ae77d2a5b0af32c0f29e139bf28ec8591de59 ] For group-wide mempool backed events, like FS_ERROR, the free_event callback will need to reference the group's mempool to free the memory. Wire that argument into the current callers. Link: https://lore.kernel.org/r/20211025192746.66445-13-krisman@collabora.com Reviewed-by: Jan Kara Reviewed-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 3 ++- fs/notify/group.c | 2 +- fs/notify/inotify/inotify_fsnotify.c | 3 ++- fs/notify/notification.c | 2 +- include/linux/fsnotify_backend.h | 2 +- 5 files changed, 7 insertions(+), 5 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index f82e20228999..c620b4f6fe12 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -835,7 +835,8 @@ static void fanotify_free_name_event(struct fanotify_event *event) kfree(FANOTIFY_NE(event)); } -static void fanotify_free_event(struct fsnotify_event *fsn_event) +static void fanotify_free_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) { struct fanotify_event *event; diff --git a/fs/notify/group.c b/fs/notify/group.c index fb89c351295d..6a297efc4788 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -88,7 +88,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) * that deliberately ignores overflow events. */ if (group->overflow_event) - group->ops->free_event(group->overflow_event); + group->ops->free_event(group, group->overflow_event); fsnotify_put_group(group); } diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index be3eb1cebdcc..827982783639 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -184,7 +184,8 @@ static void inotify_free_group_priv(struct fsnotify_group *group) dec_inotify_instances(group->inotify_data.ucounts); } -static void inotify_free_event(struct fsnotify_event *fsn_event) +static void inotify_free_event(struct fsnotify_group *group, + struct fsnotify_event *fsn_event) { kfree(INOTIFY_E(fsn_event)); } diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 44bb10f50715..9022ae650cf8 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -64,7 +64,7 @@ void fsnotify_destroy_event(struct fsnotify_group *group, WARN_ON(!list_empty(&event->list)); spin_unlock(&group->notification_lock); } - group->ops->free_event(event); + group->ops->free_event(group, event); } /* diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b71dc788018e..3a7c31436182 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -156,7 +156,7 @@ struct fsnotify_ops { const struct qstr *file_name, u32 cookie); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); - void (*free_event)(struct fsnotify_event *event); + void (*free_event)(struct fsnotify_group *group, struct fsnotify_event *event); /* called on final put+free to free memory */ void (*free_mark)(struct fsnotify_mark *mark); }; From 44ce59c254109b9692db2e925b2e129e12989811 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:28 -0300 Subject: [PATCH 0672/1565] fanotify: Support null inode event in fanotify_dfid_inode [ Upstream commit 12f47bf0f0990933d95d021d13d31bda010648fd ] FAN_FS_ERROR doesn't support DFID, but this function is still called for every event. The problem is that it is not capable of handling null inodes, which now can happen in case of superblock error events. For this case, just returning dir will be enough. Link: https://lore.kernel.org/r/20211025192746.66445-14-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c620b4f6fe12..397ee623ff1e 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -452,7 +452,7 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) return dir; - if (S_ISDIR(inode->i_mode)) + if (inode && S_ISDIR(inode->i_mode)) return inode; return dir; From 86bda2d7525206651acd055276981459d20724df Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:29 -0300 Subject: [PATCH 0673/1565] fanotify: Allow file handle encoding for unhashed events [ Upstream commit 74fe4734897a2da2ae2a665a5e622cd490d36eaf ] Allow passing a NULL hash to fanotify_encode_fh and avoid calculating the hash if not needed. Link: https://lore.kernel.org/r/20211025192746.66445-15-krisman@collabora.com Reviewed-by: Jan Kara Reviewed-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 397ee623ff1e..ec84fee7ad01 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -403,8 +403,12 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len; - /* Mix fh into event merge key */ - *hash ^= fanotify_hash_fh(fh); + /* + * Mix fh into event merge key. Hash might be NULL in case of + * unhashed FID events (i.e. FAN_FS_ERROR). + */ + if (hash) + *hash ^= fanotify_hash_fh(fh); return FANOTIFY_FH_HDR_LEN + fh_len; From 2f65be620948af5e7308e23faaa752177e1be3f4 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:30 -0300 Subject: [PATCH 0674/1565] fanotify: Encode empty file handle when no inode is provided [ Upstream commit 272531ac619b374ab474e989eb387162fded553f ] Instead of failing, encode an invalid file handle in fanotify_encode_fh if no inode is provided. This bogus file handle will be reported by FAN_FS_ERROR for non-inode errors. Link: https://lore.kernel.org/r/20211025192746.66445-16-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index ec84fee7ad01..c64d61b673ca 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -370,8 +370,14 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = FILEID_ROOT; fh->len = 0; fh->flags = 0; + + /* + * Invalid FHs are used by FAN_FS_ERROR for errors not + * linked to any inode. The f_handle won't be reported + * back to userspace. + */ if (!inode) - return 0; + goto out; /* * !gpf means preallocated variable size fh, but fh_len could @@ -403,6 +409,7 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len; +out: /* * Mix fh into event merge key. Hash might be NULL in case of * unhashed FID events (i.e. FAN_FS_ERROR). From 8ccc724f50706836092aa361c5a31677b4873bcc Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:31 -0300 Subject: [PATCH 0675/1565] fanotify: Require fid_mode for any non-fd event [ Upstream commit 4fe595cf1c80e7a5af4d00c4da29def64aff57a2 ] Like inode events, FAN_FS_ERROR will require fid mode. Therefore, convert the verification during fanotify_mark(2) to require fid for any non-fd event. This means fid_mode will not only be required for inode events, but for any event that doesn't provide a descriptor. Link: https://lore.kernel.org/r/20211025192746.66445-17-krisman@collabora.com Suggested-by: Amir Goldstein Reviewed-by: Jan Kara Reviewed-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 12 ++++++------ include/linux/fanotify.h | 3 +++ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6dd6a2e05f55..34bf71108f7a 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1471,14 +1471,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out; /* - * Events with data type inode do not carry enough information to report - * event->fd, so we do not allow setting a mask for inode events unless - * group supports reporting fid. - * inode events are not supported on a mount mark, because they do not - * carry enough information (i.e. path) to be filtered by mount point. + * Events that do not carry enough information to report + * event->fd require a group that supports reporting fid. Those + * events are not supported on a mount mark, because they do not + * carry enough information (i.e. path) to be filtered by mount + * point. */ fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - if (mask & FANOTIFY_INODE_EVENTS && + if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) && (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index eec3b7c40811..52d464802d99 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -84,6 +84,9 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ */ #define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE) +/* Events that can be reported with event->fd */ +#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS) + /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF) From badbf879decac83029ad22b9f2a38ee8fa489fd6 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:32 -0300 Subject: [PATCH 0676/1565] fsnotify: Support FS_ERROR event type [ Upstream commit 9daa811073fa19c08e8aad3b90f9235fed161acf ] Expose a new type of fsnotify event for filesystems to report errors for userspace monitoring tools. fanotify will send this type of notification for FAN_FS_ERROR events. This also introduce a helper for generating the new event. Link: https://lore.kernel.org/r/20211025192746.66445-18-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/fsnotify.h | 13 +++++++++++++ include/linux/fsnotify_backend.h | 32 +++++++++++++++++++++++++++++++- 2 files changed, 44 insertions(+), 1 deletion(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index addca4ea56ad..bec1e23ecf78 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -376,4 +376,17 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid) fsnotify_dentry(dentry, mask); } +static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode, + int error) +{ + struct fs_error_report report = { + .error = error, + .inode = inode, + .sb = sb, + }; + + return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, + NULL, NULL, NULL, 0); +} + #endif /* _LINUX_FS_NOTIFY_H */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 3a7c31436182..00dbaafbcf95 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -42,6 +42,12 @@ #define FS_UNMOUNT 0x00002000 /* inode on umount fs */ #define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ +#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */ + +/* + * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify + * which does not support FS_ERROR. + */ #define FS_IN_IGNORED 0x00008000 /* last inotify event here */ #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */ @@ -95,7 +101,8 @@ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \ - FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED) + FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ + FS_ERROR) /* Extra flags that may be reported with event or control handling of events */ #define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ @@ -250,6 +257,13 @@ enum fsnotify_data_type { FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, FSNOTIFY_EVENT_DENTRY, + FSNOTIFY_EVENT_ERROR, +}; + +struct fs_error_report { + int error; + struct inode *inode; + struct super_block *sb; }; static inline struct inode *fsnotify_data_inode(const void *data, int data_type) @@ -261,6 +275,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); + case FSNOTIFY_EVENT_ERROR: + return ((struct fs_error_report *)data)->inode; default: return NULL; } @@ -300,6 +316,20 @@ static inline struct super_block *fsnotify_data_sb(const void *data, return ((struct dentry *)data)->d_sb; case FSNOTIFY_EVENT_PATH: return ((const struct path *)data)->dentry->d_sb; + case FSNOTIFY_EVENT_ERROR: + return ((struct fs_error_report *) data)->sb; + default: + return NULL; + } +} + +static inline struct fs_error_report *fsnotify_data_error_report( + const void *data, + int data_type) +{ + switch (data_type) { + case FSNOTIFY_EVENT_ERROR: + return (struct fs_error_report *) data; default: return NULL; } From eec22d03a98e1e10e0754259e2d56234dc6891de Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:33 -0300 Subject: [PATCH 0677/1565] fanotify: Reserve UAPI bits for FAN_FS_ERROR [ Upstream commit 8d11a4f43ef4679be0908026907a7613b33d7127 ] FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which is a mechanism to report file system wide problems via fanotify. This commit preallocate userspace visible bits to match the FS_ERROR event. Link: https://lore.kernel.org/r/20211025192746.66445-19-krisman@collabora.com Reviewed-by: Jan Kara Reviewed-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 1 + include/uapi/linux/fanotify.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index c64d61b673ca..8f152445d75c 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -752,6 +752,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); + BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19); diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 64553df9d735..2990731ddc8b 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -20,6 +20,7 @@ #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */ #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ +#define FAN_FS_ERROR 0x00008000 /* Filesystem error */ #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */ #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */ From 68aacb60a799ed556a1910d6df31cd89ff894dd0 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:34 -0300 Subject: [PATCH 0678/1565] fanotify: Pre-allocate pool of error events [ Upstream commit 734a1a5eccc5f7473002b0669f788e135f1f64aa ] Pre-allocate slots for file system errors to have greater chances of succeeding, since error events can happen in GFP_NOFS context. This patch introduces a group-wide mempool of error events, shared by all FAN_FS_ERROR marks in this group. Link: https://lore.kernel.org/r/20211025192746.66445-20-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 3 +++ fs/notify/fanotify/fanotify.h | 11 +++++++++++ fs/notify/fanotify/fanotify_user.c | 26 +++++++++++++++++++++++++- include/linux/fsnotify_backend.h | 2 ++ 4 files changed, 41 insertions(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 8f152445d75c..01d68dfc74aa 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -819,6 +819,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group) if (group->fanotify_data.ucounts) dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_GROUPS); + + if (mempool_initialized(&group->fanotify_data.error_events_pool)) + mempool_exit(&group->fanotify_data.error_events_pool); } static void fanotify_free_path_event(struct fanotify_event *event) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index c42cf8fd7d79..a577e87fac2b 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -141,6 +141,7 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ + FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */ __FANOTIFY_EVENT_TYPE_NUM }; @@ -196,6 +197,16 @@ FANOTIFY_NE(struct fanotify_event *event) return container_of(event, struct fanotify_name_event, fae); } +struct fanotify_error_event { + struct fanotify_event fae; +}; + +static inline struct fanotify_error_event * +FANOTIFY_EE(struct fanotify_event *event) +{ + return container_of(event, struct fanotify_error_event, fae); +} + static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_FID) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 34bf71108f7a..8a2b7941fc98 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -30,6 +30,7 @@ #define FANOTIFY_DEFAULT_MAX_EVENTS 16384 #define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192 #define FANOTIFY_DEFAULT_MAX_GROUPS 128 +#define FANOTIFY_DEFAULT_FEE_POOL_SIZE 32 /* * Legacy fanotify marks limits (8192) is per group and we introduced a tunable @@ -1049,6 +1050,15 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, return ERR_PTR(ret); } +static int fanotify_group_init_error_pool(struct fsnotify_group *group) +{ + if (mempool_initialized(&group->fanotify_data.error_events_pool)) + return 0; + + return mempool_init_kmalloc_pool(&group->fanotify_data.error_events_pool, + FANOTIFY_DEFAULT_FEE_POOL_SIZE, + sizeof(struct fanotify_error_event)); +} static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int type, @@ -1057,6 +1067,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, { struct fsnotify_mark *fsn_mark; __u32 added; + int ret = 0; mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); @@ -1067,13 +1078,26 @@ static int fanotify_add_mark(struct fsnotify_group *group, return PTR_ERR(fsn_mark); } } + + /* + * Error events are pre-allocated per group, only if strictly + * needed (i.e. FAN_FS_ERROR was requested). + */ + if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + ret = fanotify_group_init_error_pool(group); + if (ret) + goto out; + } + added = fanotify_mark_add_to_mask(fsn_mark, mask, flags); if (added & ~fsnotify_conn_mask(fsn_mark->connector)) fsnotify_recalc_mask(fsn_mark->connector); + +out: mutex_unlock(&group->mark_mutex); fsnotify_put_mark(fsn_mark); - return 0; + return ret; } static int fanotify_add_vfsmount_mark(struct fsnotify_group *group, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 00dbaafbcf95..51ef2b079bfa 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -19,6 +19,7 @@ #include #include #include +#include /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -246,6 +247,7 @@ struct fsnotify_group { int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ struct ucounts *ucounts; + mempool_t error_events_pool; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; From 9b98f4ff5186a06eac16bc81a337b15d49261653 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:35 -0300 Subject: [PATCH 0679/1565] fanotify: Support enqueueing of error events [ Upstream commit 83e9acbe13dc1b767f91b5c1350f7a65689b26f6 ] Once an error event is triggered, enqueue it in the notification group, similarly to what is done for other events. FAN_FS_ERROR is not handled specially, since the memory is now handled by a preallocated mempool. For now, make the event unhashed. A future patch implements merging of this kind of event. Link: https://lore.kernel.org/r/20211025192746.66445-21-krisman@collabora.com Reviewed-by: Jan Kara Reviewed-by: Amir Goldstein Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 35 +++++++++++++++++++++++++++++++++++ fs/notify/fanotify/fanotify.h | 6 ++++++ 2 files changed, 41 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 01d68dfc74aa..1f195c95dfcd 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -574,6 +574,27 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, return &fne->fae; } +static struct fanotify_event *fanotify_alloc_error_event( + struct fsnotify_group *group, + __kernel_fsid_t *fsid, + const void *data, int data_type) +{ + struct fs_error_report *report = + fsnotify_data_error_report(data, data_type); + struct fanotify_error_event *fee; + + if (WARN_ON_ONCE(!report)) + return NULL; + + fee = mempool_alloc(&group->fanotify_data.error_events_pool, GFP_NOFS); + if (!fee) + return NULL; + + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + + return &fee->fae; +} + static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, @@ -641,6 +662,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); + } else if (fanotify_is_error_event(mask)) { + event = fanotify_alloc_error_event(group, fsid, data, + data_type); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, &hash, gfp); @@ -850,6 +874,14 @@ static void fanotify_free_name_event(struct fanotify_event *event) kfree(FANOTIFY_NE(event)); } +static void fanotify_free_error_event(struct fsnotify_group *group, + struct fanotify_event *event) +{ + struct fanotify_error_event *fee = FANOTIFY_EE(event); + + mempool_free(fee, &group->fanotify_data.error_events_pool); +} + static void fanotify_free_event(struct fsnotify_group *group, struct fsnotify_event *fsn_event) { @@ -873,6 +905,9 @@ static void fanotify_free_event(struct fsnotify_group *group, case FANOTIFY_EVENT_TYPE_OVERFLOW: kfree(event); break; + case FANOTIFY_EVENT_TYPE_FS_ERROR: + fanotify_free_error_event(group, event); + break; default: WARN_ON_ONCE(1); } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index a577e87fac2b..ebef952481fa 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -298,6 +298,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse) return container_of(fse, struct fanotify_event, fse); } +static inline bool fanotify_is_error_event(u32 mask) +{ + return mask & FAN_FS_ERROR; +} + static inline bool fanotify_event_has_path(struct fanotify_event *event) { return event->type == FANOTIFY_EVENT_TYPE_PATH || @@ -327,6 +332,7 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) static inline bool fanotify_is_hashed_event(u32 mask) { return !(fanotify_is_perm_event(mask) || + fanotify_is_error_event(mask) || fsnotify_is_overflow_event(mask)); } From b974c8aa0081321d1e908a79c2b025219edbe2de Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:36 -0300 Subject: [PATCH 0680/1565] fanotify: Support merging of error events [ Upstream commit 8a6ae64132fd27a944faed7bc38484827609eb76 ] Error events (FAN_FS_ERROR) against the same file system can be merged by simply iterating the error count. The hash is taken from the fsid, without considering the FH. This means that only the first error object is reported. Link: https://lore.kernel.org/r/20211025192746.66445-22-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 26 ++++++++++++++++++++++++-- fs/notify/fanotify/fanotify.h | 4 +++- 2 files changed, 27 insertions(+), 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 1f195c95dfcd..cedcb1546804 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -111,6 +111,16 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, return fanotify_info_equal(info1, info2); } +static bool fanotify_error_event_equal(struct fanotify_error_event *fee1, + struct fanotify_error_event *fee2) +{ + /* Error events against the same file system are always merged. */ + if (!fanotify_fsid_equal(&fee1->fsid, &fee2->fsid)) + return false; + + return true; +} + static bool fanotify_should_merge(struct fanotify_event *old, struct fanotify_event *new) { @@ -141,6 +151,9 @@ static bool fanotify_should_merge(struct fanotify_event *old, case FANOTIFY_EVENT_TYPE_FID_NAME: return fanotify_name_event_equal(FANOTIFY_NE(old), FANOTIFY_NE(new)); + case FANOTIFY_EVENT_TYPE_FS_ERROR: + return fanotify_error_event_equal(FANOTIFY_EE(old), + FANOTIFY_EE(new)); default: WARN_ON_ONCE(1); } @@ -176,6 +189,10 @@ static int fanotify_merge(struct fsnotify_group *group, break; if (fanotify_should_merge(old, new)) { old->mask |= new->mask; + + if (fanotify_is_error_event(old->mask)) + FANOTIFY_EE(old)->err_count++; + return 1; } } @@ -577,7 +594,8 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, static struct fanotify_event *fanotify_alloc_error_event( struct fsnotify_group *group, __kernel_fsid_t *fsid, - const void *data, int data_type) + const void *data, int data_type, + unsigned int *hash) { struct fs_error_report *report = fsnotify_data_error_report(data, data_type); @@ -591,6 +609,10 @@ static struct fanotify_event *fanotify_alloc_error_event( return NULL; fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + fee->err_count = 1; + fee->fsid = *fsid; + + *hash ^= fanotify_hash_fsid(fsid); return &fee->fae; } @@ -664,7 +686,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, event = fanotify_alloc_perm_event(path, gfp); } else if (fanotify_is_error_event(mask)) { event = fanotify_alloc_error_event(group, fsid, data, - data_type); + data_type, &hash); } else if (name_event && (file_name || child)) { event = fanotify_alloc_name_event(id, fsid, file_name, child, &hash, gfp); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index ebef952481fa..2b032b79d5b0 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -199,6 +199,9 @@ FANOTIFY_NE(struct fanotify_event *event) struct fanotify_error_event { struct fanotify_event fae; + u32 err_count; /* Suppressed errors count */ + + __kernel_fsid_t fsid; /* FSID this error refers to. */ }; static inline struct fanotify_error_event * @@ -332,7 +335,6 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event) static inline bool fanotify_is_hashed_event(u32 mask) { return !(fanotify_is_perm_event(mask) || - fanotify_is_error_event(mask) || fsnotify_is_overflow_event(mask)); } From 7935cf4070c42c9a49577221317d5e0198cf600c Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:37 -0300 Subject: [PATCH 0681/1565] fanotify: Wrap object_fh inline space in a creator macro [ Upstream commit 2c5069433a3adc01ff9c5673567961bb7f138074 ] fanotify_error_event would duplicate this sequence of declarations that already exist elsewhere with a slight different size. Create a helper macro to avoid code duplication. Link: https://lore.kernel.org/r/20211025192746.66445-23-krisman@collabora.com Suggested-by: Jan Kara Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 2b032b79d5b0..3510d06654ed 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -171,12 +171,18 @@ static inline void fanotify_init_event(struct fanotify_event *event, event->pid = NULL; } +#define FANOTIFY_INLINE_FH(name, size) \ +struct { \ + struct fanotify_fh (name); \ + /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \ + unsigned char _inline_fh_buf[(size)]; \ +} + struct fanotify_fid_event { struct fanotify_event fae; __kernel_fsid_t fsid; - struct fanotify_fh object_fh; - /* Reserve space in object_fh.buf[] - access with fanotify_fh_buf() */ - unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN]; + + FANOTIFY_INLINE_FH(object_fh, FANOTIFY_INLINE_FH_LEN); }; static inline struct fanotify_fid_event * From 7fa20568b6e5150ce6ccf437769a92a8144fd407 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:38 -0300 Subject: [PATCH 0682/1565] fanotify: Add helpers to decide whether to report FID/DFID [ Upstream commit 4bd5a5c8e6e5cd964e9738e6ef87f6c2cb453edf ] Now that there is an event that reports FID records even for a zeroed file handle, wrap the logic that deides whether to issue the records into helper functions. This shouldn't have any impact on the code, but simplifies further patches. Link: https://lore.kernel.org/r/20211025192746.66445-24-krisman@collabora.com Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Reviewed-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 10 ++++++++++ fs/notify/fanotify/fanotify_user.c | 13 +++++++------ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 3510d06654ed..80af269eebb8 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -264,6 +264,16 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir_fh_len(info) : 0; } +static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) +{ + return fanotify_event_object_fh_len(event) > 0; +} + +static inline bool fanotify_event_has_dir_fh(struct fanotify_event *event) +{ + return fanotify_event_dir_fh_len(event) > 0; +} + struct fanotify_path_event { struct fanotify_event fae; struct path path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 8a2b7941fc98..34ed30be0e4d 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -135,10 +135,9 @@ static size_t fanotify_event_len(unsigned int info_mode, return event_len; info = fanotify_event_info(event); - dir_fh_len = fanotify_event_dir_fh_len(event); - fh_len = fanotify_event_object_fh_len(event); - if (dir_fh_len) { + if (fanotify_event_has_dir_fh(event)) { + dir_fh_len = fanotify_event_dir_fh_len(event); event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { @@ -152,8 +151,10 @@ static size_t fanotify_event_len(unsigned int info_mode, if (info_mode & FAN_REPORT_PIDFD) event_len += FANOTIFY_PIDFD_INFO_HDR_LEN; - if (fh_len) + if (fanotify_event_has_object_fh(event)) { + fh_len = fanotify_event_object_fh_len(event); event_len += fanotify_fid_info_len(fh_len, dot_len); + } return event_len; } @@ -446,7 +447,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, /* * Event info records order is as follows: dir fid + name, child fid. */ - if (fanotify_event_dir_fh_len(event)) { + if (fanotify_event_has_dir_fh(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; ret = copy_fid_info_to_user(fanotify_event_fsid(event), @@ -462,7 +463,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } - if (fanotify_event_object_fh_len(event)) { + if (fanotify_event_has_object_fh(event)) { const char *dot = NULL; int dot_len = 0; From bb247feb22d76161d28d67d80da06c6ca459fd34 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:40 -0300 Subject: [PATCH 0683/1565] fanotify: WARN_ON against too large file handles [ Upstream commit 572c28f27a269f88e2d8d7b6b1507f114d637337 ] struct fanotify_error_event, at least, is preallocated and isn't able to to handle arbitrarily large file handles. Future-proof the code by complaining loudly if a handle larger than MAX_HANDLE_SZ is ever found. Link: https://lore.kernel.org/r/20211025192746.66445-26-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index cedcb1546804..45df610debbe 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -360,13 +360,23 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, static int fanotify_encode_fh_len(struct inode *inode) { int dwords = 0; + int fh_len; if (!inode) return 0; exportfs_encode_inode_fh(inode, NULL, &dwords, NULL); + fh_len = dwords << 2; - return dwords << 2; + /* + * struct fanotify_error_event might be preallocated and is + * limited to MAX_HANDLE_SZ. This should never happen, but + * safeguard by forcing an invalid file handle. + */ + if (WARN_ON_ONCE(fh_len > MAX_HANDLE_SZ)) + return 0; + + return fh_len; } /* From aefd9029fa501e1bc661c1e436838959d996b075 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:41 -0300 Subject: [PATCH 0684/1565] fanotify: Report fid info for file related file system errors [ Upstream commit 936d6a38be39177495af38497bf8da1c6128fa1b ] Plumb the pieces to add a FID report to error records. Since all error event memory must be pre-allocated, we pre-allocate the maximum file handle size possible, such that it should always fit. For errors that don't expose a file handle, report it with an invalid FID. Internally we use zero-length FILEID_ROOT file handle for passing the information (which we report as zero-length FILEID_INVALID file handle to userspace) so we update the handle reporting code to deal with this case correctly. Link: https://lore.kernel.org/r/20211025192746.66445-27-krisman@collabora.com Link: https://lore.kernel.org/r/20211025192746.66445-25-krisman@collabora.com Signed-off-by: Gabriel Krisman Bertazi Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara [Folded two patches into 2 to make series bisectable] Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 11 +++++++++++ fs/notify/fanotify/fanotify.h | 9 +++++++++ fs/notify/fanotify/fanotify_user.c | 8 +++++--- 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 45df610debbe..465f07e70e6d 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -609,7 +609,9 @@ static struct fanotify_event *fanotify_alloc_error_event( { struct fs_error_report *report = fsnotify_data_error_report(data, data_type); + struct inode *inode; struct fanotify_error_event *fee; + int fh_len; if (WARN_ON_ONCE(!report)) return NULL; @@ -622,6 +624,15 @@ static struct fanotify_event *fanotify_alloc_error_event( fee->err_count = 1; fee->fsid = *fsid; + inode = report->inode; + fh_len = fanotify_encode_fh_len(inode); + + /* Bad fh_len. Fallback to using an invalid fh. Should never happen. */ + if (!fh_len && inode) + inode = NULL; + + fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0); + *hash ^= fanotify_hash_fsid(fsid); return &fee->fae; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 80af269eebb8..edd7587adcc5 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -208,6 +208,8 @@ struct fanotify_error_event { u32 err_count; /* Suppressed errors count */ __kernel_fsid_t fsid; /* FSID this error refers to. */ + + FANOTIFY_INLINE_FH(object_fh, MAX_HANDLE_SZ); }; static inline struct fanotify_error_event * @@ -222,6 +224,8 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) return &FANOTIFY_FE(event)->fsid; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return &FANOTIFY_NE(event)->fsid; + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return &FANOTIFY_EE(event)->fsid; else return NULL; } @@ -233,6 +237,8 @@ static inline struct fanotify_fh *fanotify_event_object_fh( return &FANOTIFY_FE(event)->object_fh; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return fanotify_info_file_fh(&FANOTIFY_NE(event)->info); + else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return &FANOTIFY_EE(event)->object_fh; else return NULL; } @@ -266,6 +272,9 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) { + /* For error events, even zeroed fh are reported. */ + if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + return true; return fanotify_event_object_fh_len(event) > 0; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 34ed30be0e4d..0d36ac3ed7e9 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -334,9 +334,6 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n", __func__, fh_len, name_len, info_len, count); - if (!fh_len) - return 0; - if (WARN_ON_ONCE(len < sizeof(info) || len > count)) return -EFAULT; @@ -371,6 +368,11 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, handle.handle_type = fh->type; handle.handle_bytes = fh_len; + + /* Mangle handle_type for bad file_handle */ + if (!fh_len) + handle.handle_type = FILEID_INVALID; + if (copy_to_user(buf, &handle, sizeof(handle))) return -EFAULT; From b0f01b7c080896cb32689de598622125fbd62bc3 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:42 -0300 Subject: [PATCH 0685/1565] fanotify: Emit generic error info for error event [ Upstream commit 130a3c742107acff985541c28360c8b40203559c ] The error info is a record sent to users on FAN_FS_ERROR events documenting the type of error. It also carries an error count, documenting how many errors were observed since the last reporting. Link: https://lore.kernel.org/r/20211025192746.66445-28-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 1 + fs/notify/fanotify/fanotify.h | 1 + fs/notify/fanotify/fanotify_user.c | 36 ++++++++++++++++++++++++++++++ include/uapi/linux/fanotify.h | 7 ++++++ 4 files changed, 45 insertions(+) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 465f07e70e6d..af61425e6e3b 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -621,6 +621,7 @@ static struct fanotify_event *fanotify_alloc_error_event( return NULL; fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; + fee->error = report->error; fee->err_count = 1; fee->fsid = *fsid; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index edd7587adcc5..d25f500bf7e7 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -205,6 +205,7 @@ FANOTIFY_NE(struct fanotify_event *event) struct fanotify_error_event { struct fanotify_event fae; + s32 error; /* Error reported by the Filesystem. */ u32 err_count; /* Suppressed errors count */ __kernel_fsid_t fsid; /* FSID this error refers to. */ diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 0d36ac3ed7e9..3040c8669dfd 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -110,6 +110,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) #define FANOTIFY_PIDFD_INFO_HDR_LEN \ sizeof(struct fanotify_event_info_pidfd) +#define FANOTIFY_ERROR_INFO_LEN \ + (sizeof(struct fanotify_event_info_error)) static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -134,6 +136,9 @@ static size_t fanotify_event_len(unsigned int info_mode, if (!info_mode) return event_len; + if (fanotify_is_error_event(event->mask)) + event_len += FANOTIFY_ERROR_INFO_LEN; + info = fanotify_event_info(event); if (fanotify_event_has_dir_fh(event)) { @@ -319,6 +324,28 @@ static int process_access_response(struct fsnotify_group *group, return -ENOENT; } +static size_t copy_error_info_to_user(struct fanotify_event *event, + char __user *buf, int count) +{ + struct fanotify_event_info_error info; + struct fanotify_error_event *fee = FANOTIFY_EE(event); + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; + info.hdr.pad = 0; + info.hdr.len = FANOTIFY_ERROR_INFO_LEN; + + if (WARN_ON(count < info.hdr.len)) + return -EFAULT; + + info.error = fee->error; + info.error_count = fee->err_count; + + if (copy_to_user(buf, &info, sizeof(info))) + return -EFAULT; + + return info.hdr.len; +} + static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, int info_type, const char *name, size_t name_len, @@ -525,6 +552,15 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } + if (fanotify_is_error_event(event->mask)) { + ret = copy_error_info_to_user(event, buf, count); + if (ret < 0) + return ret; + buf += ret; + count -= ret; + total_bytes += ret; + } + return total_bytes; } diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 2990731ddc8b..bd1932c2074d 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -126,6 +126,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 #define FAN_EVENT_INFO_TYPE_DFID 3 #define FAN_EVENT_INFO_TYPE_PIDFD 4 +#define FAN_EVENT_INFO_TYPE_ERROR 5 /* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -160,6 +161,12 @@ struct fanotify_event_info_pidfd { __s32 pidfd; }; +struct fanotify_event_info_error { + struct fanotify_event_info_header hdr; + __s32 error; + __u32 error_count; +}; + struct fanotify_response { __s32 fd; __u32 response; From 2eda01447798f7ee30a26583a684bbd00ac0d54c Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:43 -0300 Subject: [PATCH 0686/1565] fanotify: Allow users to request FAN_FS_ERROR events [ Upstream commit 9709bd548f11a092d124698118013f66e1740f9b ] Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing user space to request the monitoring of FAN_FS_ERROR events. These events are limited to filesystem marks, so check it is the case in the syscall handler. Link: https://lore.kernel.org/r/20211025192746.66445-29-krisman@collabora.com Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Gabriel Krisman Bertazi Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 4 ++++ include/linux/fanotify.h | 6 +++++- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index af61425e6e3b..b6091775aa6e 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -822,7 +822,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20); mask = fanotify_group_event_mask(group, iter_info, mask, data, data_type, dir); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3040c8669dfd..e6860c55b3dc 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1533,6 +1533,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group->priority == FS_PRIO_0) goto fput_and_out; + if (mask & FAN_FS_ERROR && + mark_type != FAN_MARK_FILESYSTEM) + goto fput_and_out; + /* * Events that do not carry enough information to report * event->fd require a group that supports reporting fid. Those diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 52d464802d99..616af2ea20f3 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -91,9 +91,13 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF) +/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */ +#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR) + /* Events that user can request to be notified on */ #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \ - FANOTIFY_INODE_EVENTS) + FANOTIFY_INODE_EVENTS | \ + FANOTIFY_ERROR_EVENTS) /* Events that require a permission response from user */ #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \ From 2b2963c72c8ab618237d9bccf1fbfea6092ed8a0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 16 Oct 2021 18:02:57 -0400 Subject: [PATCH 0687/1565] SUNRPC: Trace calls to .rpc_call_done [ Upstream commit b40887e10dcacc5e8ae3c1a99dcba20877c4831b ] Introduce a single tracepoint that can replace simple dprintk call sites in upper layer "rpc_call_done" callbacks. Example: kworker/u24:2-1254 [001] 771.026677: rpc_stats_latency: task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555 kworker/u24:2-1254 [001] 771.026677: rpc_task_call_done: task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done kworker/u24:2-1254 [001] 771.026678: rpcb_setport: task:00000001@00000002 status=0 port=20048 Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/clntproc.c | 3 --- fs/lockd/svc4proc.c | 2 -- fs/lockd/svcproc.c | 2 -- fs/nfs/filelayout/filelayout.c | 2 -- fs/nfs/flexfilelayout/flexfilelayout.c | 2 -- fs/nfs/pagelist.c | 3 --- fs/nfs/write.c | 3 --- include/trace/events/sunrpc.h | 1 + net/sunrpc/sched.c | 1 + 9 files changed, 2 insertions(+), 17 deletions(-) diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c index b11f2afa84f1..99fffc9cb958 100644 --- a/fs/lockd/clntproc.c +++ b/fs/lockd/clntproc.c @@ -794,9 +794,6 @@ static void nlmclnt_cancel_callback(struct rpc_task *task, void *data) goto retry_cancel; } - dprintk("lockd: cancel status %u (task %u)\n", - status, task->tk_pid); - switch (status) { case NLM_LCK_GRANTED: case NLM_LCK_DENIED_GRACE_PERIOD: diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index e10ae2c41279..176b468a61c7 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -269,8 +269,6 @@ nlm4svc_proc_granted(struct svc_rqst *rqstp) */ static void nlm4svc_callback_exit(struct rpc_task *task, void *data) { - dprintk("lockd: %5u callback returned %d\n", task->tk_pid, - -task->tk_status); } static void nlm4svc_callback_release(void *data) diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 99696d3f6dd6..4dc1b40a489a 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -301,8 +301,6 @@ nlmsvc_proc_granted(struct svc_rqst *rqstp) */ static void nlmsvc_callback_exit(struct rpc_task *task, void *data) { - dprintk("lockd: %5u callback returned %d\n", task->tk_pid, - -task->tk_status); } void nlmsvc_release_call(struct nlm_rqst *call) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 45eec08ec904..2ed8b6885b09 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -293,8 +293,6 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data; - dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); - if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs41_sequence_done(task, &hdr->res.seq_res); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index f2ae271fe7ec..a263bfec4244 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -1419,8 +1419,6 @@ static void ff_layout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data; - dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); - if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs4_sequence_done(task, &hdr->res.seq_res); diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 17fef6eb490c..d79a3b6cb070 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -870,9 +870,6 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata) struct nfs_pgio_header *hdr = calldata; struct inode *inode = hdr->inode; - dprintk("NFS: %s: %5u, (status %d)\n", __func__, - task->tk_pid, task->tk_status); - if (hdr->rw_ops->rw_done(task, hdr, inode) != 0) return; if (task->tk_status < 0) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 4cf060691979..2bde35921f2b 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1809,9 +1809,6 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata) { struct nfs_commit_data *data = calldata; - dprintk("NFS: %5u nfs_commit_done (status %d)\n", - task->tk_pid, task->tk_status); - /* Call the NFS version-specific code */ NFS_PROTO(data->inode)->commit_done(task, data); trace_nfs_commit_done(task, data); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index fce071f39f51..56e4a57d2538 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -394,6 +394,7 @@ DEFINE_RPC_RUNNING_EVENT(complete); DEFINE_RPC_RUNNING_EVENT(timeout); DEFINE_RPC_RUNNING_EVENT(signalled); DEFINE_RPC_RUNNING_EVENT(end); +DEFINE_RPC_RUNNING_EVENT(call_done); DECLARE_EVENT_CLASS(rpc_task_queued, diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index a00890962e11..a4c9d410eb8d 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -821,6 +821,7 @@ void rpc_exit_task(struct rpc_task *task) else if (task->tk_client) rpc_count_iostats(task, task->tk_client->cl_metrics); if (task->tk_ops->rpc_call_done != NULL) { + trace_rpc_task_call_done(task, task->tk_ops->rpc_call_done); task->tk_ops->rpc_call_done(task, task->tk_calldata); if (task->tk_action != NULL) { /* Always release the RPC slot and buffer memory */ From d2815110a7418a3155bd74e6a15db5ea6261d333 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 20 Sep 2021 15:25:21 -0400 Subject: [PATCH 0688/1565] NFSD: Optimize DRC bucket pruning [ Upstream commit 8847ecc9274a14114385d1cb4030326baa0766eb ] DRC bucket pruning is done by nfsd_cache_lookup(), which is part of every NFSv2 and NFSv3 dispatch (ie, it's done while the client is waiting). I added a trace_printk() in prune_bucket() to see just how long it takes to prune. Here are two ends of the spectrum: prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining ... prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining Pruning latency is noticeable on fast transports with fast storage. By noticeable, I mean that the latency measured here in the worst case is the same order of magnitude as the round trip time for cached server operations. We could do something like moving expired entries to an expired list and then free them later instead of freeing them right in prune_bucket(). But simply limiting the number of entries that can be pruned by a lookup is simple and retains more entries in the cache, making the DRC somewhat more effective. Comparison with a 70/30 fio 8KB 12 thread direct I/O test: Before: write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets WRITE: 1848726 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.635158 RTT: 0.128525 total execute time: 0.827242 (milliseconds) After: write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets WRITE: 1891144 ops (30%) avg bytes sent per op: 8340 avg bytes received per op: 136 backlog wait: 0.616114 RTT: 0.126842 total execute time: 0.805348 (milliseconds) Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 96cdf77925f3..6e0b6f3148dc 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -241,8 +241,8 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp) list_move_tail(&rp->c_lru, &b->lru_head); } -static long -prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) +static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, + unsigned int max) { struct svc_cacherep *rp, *tmp; long freed = 0; @@ -258,11 +258,17 @@ prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) time_before(jiffies, rp->c_timestamp + RC_EXPIRE)) break; nfsd_reply_cache_free_locked(b, rp, nn); - freed++; + if (max && freed++ > max) + break; } return freed; } +static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) +{ + return prune_bucket(b, nn, 3); +} + /* * Walk the LRU list and prune off entries that are older than RC_EXPIRE. * Also prune the oldest ones when the total exceeds the max number of entries. @@ -279,7 +285,7 @@ prune_cache_entries(struct nfsd_net *nn) if (list_empty(&b->lru_head)) continue; spin_lock(&b->cache_lock); - freed += prune_bucket(b, nn); + freed += prune_bucket(b, nn, 0); spin_unlock(&b->cache_lock); } return freed; @@ -453,8 +459,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) atomic_inc(&nn->num_drc_entries); nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp)); - /* go ahead and prune the cache */ - prune_bucket(b, nn); + nfsd_prune_bucket(b, nn); out_unlock: spin_unlock(&b->cache_lock); From 918bc45a57bcd0bfa5673cba8844765a466f8e7b Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 2 Sep 2021 11:14:47 +1000 Subject: [PATCH 0689/1565] NFSD: move filehandle format declarations out of "uapi". [ Upstream commit ef5825e3cf0d0af657f5fb4dd86d750ed42fee0a ] A small part of the declaration concerning filehandle format are currently in the "uapi" include directory: include/uapi/linux/nfsd/nfsfh.h There is a lot more to the filehandle format, including "enum fid_type" and "enum nfsd_fsid" which are not exported via "uapi". This small part of the filehandle definition is of minimal use outside of the kernel, and I can find no evidence that an other code is using it. Certainly nfs-utils and wireshark (The most likely candidates) do not use these declarations. So move it out of "uapi" by copying the content from include/uapi/linux/nfsd/nfsfh.h into fs/nfsd/nfsfh.h A few unnecessary "#include" directives are not copied, and neither is the #define of fh_auth, which is annotated as being for userspace only. The copyright claims in the uapi file are identical to those in the nfsd file, so there is no need to copy those. The "__u32" style integer types are only needed in "uapi". In kernel-only code we can use the more familiar "u32" style. Signed-off-by: NeilBrown Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.h | 97 ++++++++++++++++++++++++++- fs/nfsd/vfs.c | 1 + include/uapi/linux/nfsd/nfsfh.h | 115 -------------------------------- 3 files changed, 97 insertions(+), 116 deletions(-) delete mode 100644 include/uapi/linux/nfsd/nfsfh.h diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 6106697adc04..ad47f16676a8 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -10,9 +10,104 @@ #include #include -#include #include #include +#include + + +/* + * This is the old "dentry style" Linux NFSv2 file handle. + * + * The xino and xdev fields are currently used to transport the + * ino/dev of the exported inode. + */ +struct nfs_fhbase_old { + u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ + u32 fb_ino; /* our inode number */ + u32 fb_dirino; /* dir inode number, 0 for directories */ + u32 fb_dev; /* our device */ + u32 fb_xdev; + u32 fb_xino; + u32 fb_generation; +}; + +/* + * This is the new flexible, extensible style NFSv2/v3/v4 file handle. + * + * The file handle starts with a sequence of four-byte words. + * The first word contains a version number (1) and three descriptor bytes + * that tell how the remaining 3 variable length fields should be handled. + * These three bytes are auth_type, fsid_type and fileid_type. + * + * All four-byte values are in host-byte-order. + * + * The auth_type field is deprecated and must be set to 0. + * + * The fsid_type identifies how the filesystem (or export point) is + * encoded. + * Current values: + * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number + * NOTE: we cannot use the kdev_t device id value, because kdev_t.h + * says we mustn't. We must break it up and reassemble. + * 1 - 4 byte user specified identifier + * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED + * 3 - 4 byte device id, encoded for user-space, 4 byte inode number + * 4 - 4 byte inode number and 4 byte uuid + * 5 - 8 byte uuid + * 6 - 16 byte uuid + * 7 - 8 byte inode number and 16 byte uuid + * + * The fileid_type identified how the file within the filesystem is encoded. + * The values for this field are filesystem specific, exccept that + * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' + * in include/linux/exportfs.h for currently registered values. + */ +struct nfs_fhbase_new { + union { + struct { + u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + u8 fb_auth_type_aux; + u8 fb_fsid_type_aux; + u8 fb_fileid_type_aux; + u32 fb_auth[1]; + /* u32 fb_fsid[0]; floating */ + /* u32 fb_fileid[0]; floating */ + }; + struct { + u8 fb_version; /* == 1, even => nfs_fhbase_old */ + u8 fb_auth_type; + u8 fb_fsid_type; + u8 fb_fileid_type; + u32 fb_auth_flex[]; /* flexible-array member */ + }; + }; +}; + +struct knfsd_fh { + unsigned int fh_size; /* significant for NFSv3. + * Points to the current size while building + * a new file handle + */ + union { + struct nfs_fhbase_old fh_old; + u32 fh_pad[NFS4_FHSIZE/4]; + struct nfs_fhbase_new fh_new; + } fh_base; +}; + +#define ofh_dcookie fh_base.fh_old.fb_dcookie +#define ofh_ino fh_base.fh_old.fb_ino +#define ofh_dirino fh_base.fh_old.fb_dirino +#define ofh_dev fh_base.fh_old.fb_dev +#define ofh_xdev fh_base.fh_old.fb_xdev +#define ofh_xino fh_base.fh_old.fb_xino +#define ofh_generation fh_base.fh_old.fb_generation + +#define fh_version fh_base.fh_new.fb_version +#define fh_fsid_type fh_base.fh_new.fb_fsid_type +#define fh_auth_type fh_base.fh_new.fb_auth_type +#define fh_fileid_type fh_base.fh_new.fb_fileid_type +#define fh_fsid fh_base.fh_new.fb_auth_flex static inline __u32 ino_t_to_u32(ino_t ino) { diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2c493937dd5e..05b5f7e241e7 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -244,6 +244,7 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 + * NeilBrown */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h deleted file mode 100644 index e29e8accc4f4..000000000000 --- a/include/uapi/linux/nfsd/nfsfh.h +++ /dev/null @@ -1,115 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -/* - * This file describes the layout of the file handles as passed - * over the wire. - * - * Copyright (C) 1995, 1996, 1997 Olaf Kirch - */ - -#ifndef _UAPI_LINUX_NFSD_FH_H -#define _UAPI_LINUX_NFSD_FH_H - -#include -#include -#include -#include -#include - -/* - * This is the old "dentry style" Linux NFSv2 file handle. - * - * The xino and xdev fields are currently used to transport the - * ino/dev of the exported inode. - */ -struct nfs_fhbase_old { - __u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ - __u32 fb_ino; /* our inode number */ - __u32 fb_dirino; /* dir inode number, 0 for directories */ - __u32 fb_dev; /* our device */ - __u32 fb_xdev; - __u32 fb_xino; - __u32 fb_generation; -}; - -/* - * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * - * The file handle starts with a sequence of four-byte words. - * The first word contains a version number (1) and three descriptor bytes - * that tell how the remaining 3 variable length fields should be handled. - * These three bytes are auth_type, fsid_type and fileid_type. - * - * All four-byte values are in host-byte-order. - * - * The auth_type field is deprecated and must be set to 0. - * - * The fsid_type identifies how the filesystem (or export point) is - * encoded. - * Current values: - * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number - * NOTE: we cannot use the kdev_t device id value, because kdev_t.h - * says we mustn't. We must break it up and reassemble. - * 1 - 4 byte user specified identifier - * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED - * 3 - 4 byte device id, encoded for user-space, 4 byte inode number - * 4 - 4 byte inode number and 4 byte uuid - * 5 - 8 byte uuid - * 6 - 16 byte uuid - * 7 - 8 byte inode number and 16 byte uuid - * - * The fileid_type identified how the file within the filesystem is encoded. - * The values for this field are filesystem specific, exccept that - * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' - * in include/linux/exportfs.h for currently registered values. - */ -struct nfs_fhbase_new { - union { - struct { - __u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type_aux; - __u8 fb_fsid_type_aux; - __u8 fb_fileid_type_aux; - __u32 fb_auth[1]; - /* __u32 fb_fsid[0]; floating */ - /* __u32 fb_fileid[0]; floating */ - }; - struct { - __u8 fb_version; /* == 1, even => nfs_fhbase_old */ - __u8 fb_auth_type; - __u8 fb_fsid_type; - __u8 fb_fileid_type; - __u32 fb_auth_flex[]; /* flexible-array member */ - }; - }; -}; - -struct knfsd_fh { - unsigned int fh_size; /* significant for NFSv3. - * Points to the current size while building - * a new file handle - */ - union { - struct nfs_fhbase_old fh_old; - __u32 fh_pad[NFS4_FHSIZE/4]; - struct nfs_fhbase_new fh_new; - } fh_base; -}; - -#define ofh_dcookie fh_base.fh_old.fb_dcookie -#define ofh_ino fh_base.fh_old.fb_ino -#define ofh_dirino fh_base.fh_old.fb_dirino -#define ofh_dev fh_base.fh_old.fb_dev -#define ofh_xdev fh_base.fh_old.fb_xdev -#define ofh_xino fh_base.fh_old.fb_xino -#define ofh_generation fh_base.fh_old.fb_generation - -#define fh_version fh_base.fh_new.fb_version -#define fh_fsid_type fh_base.fh_new.fb_fsid_type -#define fh_auth_type fh_base.fh_new.fb_auth_type -#define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth_flex - -/* Do not use, provided for userspace compatiblity. */ -#define fh_auth fh_base.fh_new.fb_auth - -#endif /* _UAPI_LINUX_NFSD_FH_H */ From 25054b04ec92cceccf2c1da75c5abd52c8fdc5b7 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 2 Sep 2021 11:15:29 +1000 Subject: [PATCH 0690/1565] NFSD: drop support for ancient filehandles [ Upstream commit c645a883df34ee10b884ec921e850def54b7f461 ] Filehandles not in the "new" or "version 1" format have not been handed out for new mounts since Linux 2.4 which was released 20 years ago. I think it is safe to say that no such file handles are still in use, and that we can drop support for them. Signed-off-by: NeilBrown Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.c | 154 +++++++++++++++--------------------------------- fs/nfsd/nfsfh.h | 34 +---------- 2 files changed, 51 insertions(+), 137 deletions(-) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 04930056222b..7e5a508173a0 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -153,11 +153,12 @@ static inline __be32 check_pseudo_root(struct svc_rqst *rqstp, static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; - struct fid *fid = NULL, sfid; + struct fid *fid = NULL; struct svc_export *exp; struct dentry *dentry; int fileid_type; int data_left = fh->fh_size/4; + int len; __be32 error; error = nfserr_stale; @@ -166,48 +167,35 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers == 4 && fh->fh_size == 0) return nfserr_nofilehandle; - if (fh->fh_version == 1) { - int len; + if (fh->fh_version != 1) + return error; - if (--data_left < 0) - return error; - if (fh->fh_auth_type != 0) - return error; - len = key_len(fh->fh_fsid_type) / 4; - if (len == 0) - return error; - if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { - /* deprecated, convert to type 3 */ - len = key_len(FSID_ENCODE_DEV)/4; - fh->fh_fsid_type = FSID_ENCODE_DEV; - /* - * struct knfsd_fh uses host-endian fields, which are - * sometimes used to hold net-endian values. This - * confuses sparse, so we must use __force here to - * keep it from complaining. - */ - fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), - ntohl((__force __be32)fh->fh_fsid[1]))); - fh->fh_fsid[1] = fh->fh_fsid[2]; - } - data_left -= len; - if (data_left < 0) - return error; - exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); - fid = (struct fid *)(fh->fh_fsid + len); - } else { - __u32 tfh[2]; - dev_t xdev; - ino_t xino; - - if (fh->fh_size != NFS_FHSIZE) - return error; - /* assume old filehandle format */ - xdev = old_decode_dev(fh->ofh_xdev); - xino = u32_to_ino_t(fh->ofh_xino); - mk_fsid(FSID_DEV, tfh, xdev, xino, 0, NULL); - exp = rqst_exp_find(rqstp, FSID_DEV, tfh); + if (--data_left < 0) + return error; + if (fh->fh_auth_type != 0) + return error; + len = key_len(fh->fh_fsid_type) / 4; + if (len == 0) + return error; + if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { + /* deprecated, convert to type 3 */ + len = key_len(FSID_ENCODE_DEV)/4; + fh->fh_fsid_type = FSID_ENCODE_DEV; + /* + * struct knfsd_fh uses host-endian fields, which are + * sometimes used to hold net-endian values. This + * confuses sparse, so we must use __force here to + * keep it from complaining. + */ + fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), + ntohl((__force __be32)fh->fh_fsid[1]))); + fh->fh_fsid[1] = fh->fh_fsid[2]; } + data_left -= len; + if (data_left < 0) + return error; + exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); + fid = (struct fid *)(fh->fh_fsid + len); error = nfserr_stale; if (IS_ERR(exp)) { @@ -252,18 +240,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers > 2) error = nfserr_badhandle; - if (fh->fh_version != 1) { - sfid.i32.ino = fh->ofh_ino; - sfid.i32.gen = fh->ofh_generation; - sfid.i32.parent_ino = fh->ofh_dirino; - fid = &sfid; - data_left = 3; - if (fh->ofh_dirino == 0) - fileid_type = FILEID_INO32_GEN; - else - fileid_type = FILEID_INO32_GEN_PARENT; - } else - fileid_type = fh->fh_fileid_type; + fileid_type = fh->fh_fileid_type; if (fileid_type == FILEID_ROOT) dentry = dget(exp->ex_path.dentry); @@ -451,20 +428,6 @@ static void _fh_update(struct svc_fh *fhp, struct svc_export *exp, } } -/* - * for composing old style file handles - */ -static inline void _fh_update_old(struct dentry *dentry, - struct svc_export *exp, - struct knfsd_fh *fh) -{ - fh->ofh_ino = ino_t_to_u32(d_inode(dentry)->i_ino); - fh->ofh_generation = d_inode(dentry)->i_generation; - if (d_is_dir(dentry) || - (exp->ex_flags & NFSEXP_NOSUBTREECHECK)) - fh->ofh_dirino = 0; -} - static bool is_root_export(struct svc_export *exp) { return exp->ex_path.dentry == exp->ex_path.dentry->d_sb->s_root; @@ -561,9 +524,6 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, /* ref_fh is a reference file handle. * if it is non-null and for the same filesystem, then we should compose * a filehandle which is of the same version, where possible. - * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca - * Then create a 32byte filehandle using nfs_fhbase_old - * */ struct inode * inode = d_inode(dentry); @@ -599,35 +559,21 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, fhp->fh_dentry = dget(dentry); /* our internal copy */ fhp->fh_export = exp_get(exp); - if (fhp->fh_handle.fh_version == 0xca) { - /* old style filehandle please */ - memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE); - fhp->fh_handle.fh_size = NFS_FHSIZE; - fhp->fh_handle.ofh_dcookie = 0xfeebbaca; - fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev); - fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev; - fhp->fh_handle.ofh_xino = - ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino); - fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry)); - if (inode) - _fh_update_old(dentry, exp, &fhp->fh_handle); - } else { - fhp->fh_handle.fh_size = - key_len(fhp->fh_handle.fh_fsid_type) + 4; - fhp->fh_handle.fh_auth_type = 0; + fhp->fh_handle.fh_size = + key_len(fhp->fh_handle.fh_fsid_type) + 4; + fhp->fh_handle.fh_auth_type = 0; - mk_fsid(fhp->fh_handle.fh_fsid_type, - fhp->fh_handle.fh_fsid, - ex_dev, - d_inode(exp->ex_path.dentry)->i_ino, - exp->ex_fsid, exp->ex_uuid); + mk_fsid(fhp->fh_handle.fh_fsid_type, + fhp->fh_handle.fh_fsid, + ex_dev, + d_inode(exp->ex_path.dentry)->i_ino, + exp->ex_fsid, exp->ex_uuid); - if (inode) - _fh_update(fhp, exp, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { - fh_put(fhp); - return nfserr_opnotsupp; - } + if (inode) + _fh_update(fhp, exp, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { + fh_put(fhp); + return nfserr_opnotsupp; } return 0; @@ -648,16 +594,12 @@ fh_update(struct svc_fh *fhp) dentry = fhp->fh_dentry; if (d_really_is_negative(dentry)) goto out_negative; - if (fhp->fh_handle.fh_version != 1) { - _fh_update_old(dentry, fhp->fh_export, &fhp->fh_handle); - } else { - if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) - return 0; + if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) + return 0; - _fh_update(fhp, fhp->fh_export, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) - return nfserr_opnotsupp; - } + _fh_update(fhp, fhp->fh_export, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) + return nfserr_opnotsupp; return 0; out_bad: printk(KERN_ERR "fh_update: fh not verified!\n"); diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index ad47f16676a8..8b5587f274a7 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -14,26 +14,7 @@ #include #include - /* - * This is the old "dentry style" Linux NFSv2 file handle. - * - * The xino and xdev fields are currently used to transport the - * ino/dev of the exported inode. - */ -struct nfs_fhbase_old { - u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ - u32 fb_ino; /* our inode number */ - u32 fb_dirino; /* dir inode number, 0 for directories */ - u32 fb_dev; /* our device */ - u32 fb_xdev; - u32 fb_xino; - u32 fb_generation; -}; - -/* - * This is the new flexible, extensible style NFSv2/v3/v4 file handle. - * * The file handle starts with a sequence of four-byte words. * The first word contains a version number (1) and three descriptor bytes * that tell how the remaining 3 variable length fields should be handled. @@ -57,7 +38,7 @@ struct nfs_fhbase_old { * 6 - 16 byte uuid * 7 - 8 byte inode number and 16 byte uuid * - * The fileid_type identified how the file within the filesystem is encoded. + * The fileid_type identifies how the file within the filesystem is encoded. * The values for this field are filesystem specific, exccept that * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' * in include/linux/exportfs.h for currently registered values. @@ -65,7 +46,7 @@ struct nfs_fhbase_old { struct nfs_fhbase_new { union { struct { - u8 fb_version_aux; /* == 1, even => nfs_fhbase_old */ + u8 fb_version_aux; /* == 1 */ u8 fb_auth_type_aux; u8 fb_fsid_type_aux; u8 fb_fileid_type_aux; @@ -74,7 +55,7 @@ struct nfs_fhbase_new { /* u32 fb_fileid[0]; floating */ }; struct { - u8 fb_version; /* == 1, even => nfs_fhbase_old */ + u8 fb_version; /* == 1 */ u8 fb_auth_type; u8 fb_fsid_type; u8 fb_fileid_type; @@ -89,20 +70,11 @@ struct knfsd_fh { * a new file handle */ union { - struct nfs_fhbase_old fh_old; u32 fh_pad[NFS4_FHSIZE/4]; struct nfs_fhbase_new fh_new; } fh_base; }; -#define ofh_dcookie fh_base.fh_old.fb_dcookie -#define ofh_ino fh_base.fh_old.fb_ino -#define ofh_dirino fh_base.fh_old.fb_dirino -#define ofh_dev fh_base.fh_old.fb_dev -#define ofh_xdev fh_base.fh_old.fb_xdev -#define ofh_xino fh_base.fh_old.fb_xino -#define ofh_generation fh_base.fh_old.fb_generation - #define fh_version fh_base.fh_new.fb_version #define fh_fsid_type fh_base.fh_new.fb_fsid_type #define fh_auth_type fh_base.fh_new.fb_auth_type From 67841880909219887688ee0ee1e53012c1be1569 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 2 Sep 2021 11:16:32 +1000 Subject: [PATCH 0691/1565] NFSD: simplify struct nfsfh [ Upstream commit d8b26071e65e80a348602b939e333242f989221b ] Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a struct) and are accessed using macros like: #define fh_FOO fh_base.fh_new.fb_FOO This patch makes the union and struct anonymous, so that "fh_FOO" can be a name directly within 'struct knfsd_fh' and the #defines aren't needed. The file handle as a whole is sometimes accessed as "fh_base" or "fh_base.fh_pad", neither of which are particularly helpful names. As the struct holding the filehandle is now anonymous, we cannot use the name of that, so we union it with 'fh_raw' and use that where the raw filehandle is needed. fh_raw also ensure the structure is large enough for the largest possible filehandle. fh_raw is a 'char' array, removing any need to cast it for memcpy etc. SVCFH_fmt() is simplified using the "%ph" printk format. This changes the appearance of filehandles in dprintk() debugging, making them a little more precise. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/flexfilelayout.c | 2 +- fs/nfsd/lockd.c | 2 +- fs/nfsd/nfs3xdr.c | 4 ++-- fs/nfsd/nfs4callback.c | 2 +- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4state.c | 4 ++-- fs/nfsd/nfs4xdr.c | 4 ++-- fs/nfsd/nfsctl.c | 6 ++--- fs/nfsd/nfsfh.c | 13 ++++------- fs/nfsd/nfsfh.h | 50 ++++++++++++---------------------------- fs/nfsd/nfsxdr.c | 4 ++-- 11 files changed, 35 insertions(+), 60 deletions(-) diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c index db7ef07ae50c..2e2f1d5e9f62 100644 --- a/fs/nfsd/flexfilelayout.c +++ b/fs/nfsd/flexfilelayout.c @@ -61,7 +61,7 @@ nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp, goto out_error; fl->fh.size = fhp->fh_handle.fh_size; - memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size); + memcpy(fl->fh.data, &fhp->fh_handle.fh_raw, fl->fh.size); /* Give whole file layout segments */ seg->offset = 0; diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c index 606fa155c28a..46a7f9b813e5 100644 --- a/fs/nfsd/lockd.c +++ b/fs/nfsd/lockd.c @@ -35,7 +35,7 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp, /* must initialize before using! but maxsize doesn't matter */ fh_init(&fh,0); fh.fh_handle.fh_size = f->size; - memcpy((char*)&fh.fh_handle.fh_base, f->data, f->size); + memcpy(&fh.fh_handle.fh_raw, f->data, f->size); fh.fh_export = NULL; access = (mode == O_WRONLY) ? NFSD_MAY_WRITE : NFSD_MAY_READ; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0a5ebc52e6a9..3d37923afb06 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -92,7 +92,7 @@ svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) return false; fh_init(fhp, NFS3_FHSIZE); fhp->fh_handle.fh_size = size; - memcpy(&fhp->fh_handle.fh_base, p, size); + memcpy(&fhp->fh_handle.fh_raw, p, size); return true; } @@ -131,7 +131,7 @@ svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) *p++ = cpu_to_be32(size); if (size) p[XDR_QUADLEN(size) - 1] = 0; - memcpy(p, &fhp->fh_handle.fh_base, size); + memcpy(p, &fhp->fh_handle.fh_raw, size); return true; } diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 97f517e9b418..e1272a7f4522 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -121,7 +121,7 @@ static void encode_nfs_fh4(struct xdr_stream *xdr, const struct knfsd_fh *fh) BUG_ON(length > NFS4_FHSIZE); p = xdr_reserve_space(xdr, 4 + length); - xdr_encode_opaque(p, &fh->fh_base, length); + xdr_encode_opaque(p, &fh->fh_raw, length); } /* diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1f840c72e978..d55d9b9dbafb 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -519,7 +519,7 @@ nfsd4_putfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fh_put(&cstate->current_fh); cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen; - memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval, + memcpy(&cstate->current_fh.fh_handle.fh_raw, putfh->pf_fhval, putfh->pf_fhlen); ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS); #ifdef CONFIG_NFSD_V4_2_INTER_SSC @@ -1379,7 +1379,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, s_fh = &cstate->save_fh; copy->c_fh.size = s_fh->fh_handle.fh_size; - memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size); + memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_raw, copy->c_fh.size); copy->stateid.seqid = cpu_to_be32(s_stid->si_generation); memcpy(copy->stateid.other, (void *)&s_stid->si_opaque, sizeof(stateid_opaque_t)); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e9ac77c28741..68562564be6b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1022,7 +1022,7 @@ static int delegation_blocked(struct knfsd_fh *fh) } spin_unlock(&blocked_delegations_lock); } - hash = jhash(&fh->fh_base, fh->fh_size, 0); + hash = jhash(&fh->fh_raw, fh->fh_size, 0); if (test_bit(hash&255, bd->set[0]) && test_bit((hash>>8)&255, bd->set[0]) && test_bit((hash>>16)&255, bd->set[0])) @@ -1041,7 +1041,7 @@ static void block_delegations(struct knfsd_fh *fh) u32 hash; struct bloom_pair *bd = &blocked_delegations; - hash = jhash(&fh->fh_base, fh->fh_size, 0); + hash = jhash(&fh->fh_raw, fh->fh_size, 0); spin_lock(&blocked_delegations_lock); __set_bit(hash&255, bd->set[bd->new]); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 262c6fec56aa..899d961372c0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3110,7 +3110,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4); if (!p) goto out_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, fhp->fh_handle.fh_size); } if (bmval0 & FATTR4_WORD0_FILEID) { @@ -3681,7 +3681,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh p = xdr_reserve_space(xdr, len + 4); if (!p) return nfserr_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, len); + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, len); return 0; } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index cb73c1292562..d0761ca8cb54 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -395,12 +395,12 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size) auth_domain_put(dom); if (len) return len; - + mesg = buf; len = SIMPLE_TRANSACTION_LIMIT; - qword_addhex(&mesg, &len, (char*)&fh.fh_base, fh.fh_size); + qword_addhex(&mesg, &len, fh.fh_raw, fh.fh_size); mesg[-1] = '\n'; - return mesg - buf; + return mesg - buf; } /* diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 7e5a508173a0..34e201b6eb62 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -639,16 +639,11 @@ fh_put(struct svc_fh *fhp) char * SVCFH_fmt(struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; + static char buf[2+1+1+64*3+1]; - static char buf[80]; - sprintf(buf, "%d: %08x %08x %08x %08x %08x %08x", - fh->fh_size, - fh->fh_base.fh_pad[0], - fh->fh_base.fh_pad[1], - fh->fh_base.fh_pad[2], - fh->fh_base.fh_pad[3], - fh->fh_base.fh_pad[4], - fh->fh_base.fh_pad[5]); + if (fh->fh_size < 0 || fh->fh_size> 64) + return "bad-fh"; + sprintf(buf, "%d: %*ph", fh->fh_size, fh->fh_size, fh->fh_raw); return buf; } diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 8b5587f274a7..d11e4b6870d6 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -43,44 +43,24 @@ * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' * in include/linux/exportfs.h for currently registered values. */ -struct nfs_fhbase_new { + +struct knfsd_fh { + unsigned int fh_size; /* + * Points to the current size while + * building a new file handle. + */ union { + char fh_raw[NFS4_FHSIZE]; struct { - u8 fb_version_aux; /* == 1 */ - u8 fb_auth_type_aux; - u8 fb_fsid_type_aux; - u8 fb_fileid_type_aux; - u32 fb_auth[1]; - /* u32 fb_fsid[0]; floating */ - /* u32 fb_fileid[0]; floating */ - }; - struct { - u8 fb_version; /* == 1 */ - u8 fb_auth_type; - u8 fb_fsid_type; - u8 fb_fileid_type; - u32 fb_auth_flex[]; /* flexible-array member */ + u8 fh_version; /* == 1 */ + u8 fh_auth_type; /* deprecated */ + u8 fh_fsid_type; + u8 fh_fileid_type; + u32 fh_fsid[]; /* flexible-array member */ }; }; }; -struct knfsd_fh { - unsigned int fh_size; /* significant for NFSv3. - * Points to the current size while building - * a new file handle - */ - union { - u32 fh_pad[NFS4_FHSIZE/4]; - struct nfs_fhbase_new fh_new; - } fh_base; -}; - -#define fh_version fh_base.fh_new.fb_version -#define fh_fsid_type fh_base.fh_new.fb_fsid_type -#define fh_auth_type fh_base.fh_new.fb_auth_type -#define fh_fileid_type fh_base.fh_new.fb_fileid_type -#define fh_fsid fh_base.fh_new.fb_auth_flex - static inline __u32 ino_t_to_u32(ino_t ino) { return (__u32) ino; @@ -255,7 +235,7 @@ static inline void fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src) { dst->fh_size = src->fh_size; - memcpy(&dst->fh_base, &src->fh_base, src->fh_size); + memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size); } static __inline__ struct svc_fh * @@ -270,7 +250,7 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) { if (fh1->fh_size != fh2->fh_size) return false; - if (memcmp(fh1->fh_base.fh_pad, fh2->fh_base.fh_pad, fh1->fh_size) != 0) + if (memcmp(fh1->fh_raw, fh2->fh_raw, fh1->fh_size) != 0) return false; return true; } @@ -294,7 +274,7 @@ static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) */ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) { - return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size); + return ~crc32_le(0xFFFFFFFF, fh->fh_raw, fh->fh_size); } #else static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index a06c05fe3b42..082449c7d0db 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -64,7 +64,7 @@ svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) if (!p) return false; fh_init(fhp, NFS_FHSIZE); - memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); + memcpy(&fhp->fh_handle.fh_raw, p, NFS_FHSIZE); fhp->fh_handle.fh_size = NFS_FHSIZE; return true; @@ -78,7 +78,7 @@ svcxdr_encode_fhandle(struct xdr_stream *xdr, const struct svc_fh *fhp) p = xdr_reserve_space(xdr, NFS_FHSIZE); if (!p) return false; - memcpy(p, &fhp->fh_handle.fh_base, NFS_FHSIZE); + memcpy(p, &fhp->fh_handle.fh_raw, NFS_FHSIZE); return true; } From c23b25dd19288db06d8167314cc89365ba649feb Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Sat, 25 Sep 2021 23:58:41 +0100 Subject: [PATCH 0692/1565] NFSD: Initialize pointer ni with NULL and not plain integer 0 [ Upstream commit 8e70bf27fd20cc17e87150327a640e546bfbee64 ] Pointer ni is being initialized with plain integer zero. Fix this by initializing with NULL. Signed-off-by: Colin Ian King Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4state.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d55d9b9dbafb..ebb6d8471e8d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1181,7 +1181,7 @@ extern void nfs_sb_deactive(struct super_block *sb); static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) { - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *work = NULL; struct nfsd4_ssc_umount_item *tmp; DEFINE_WAIT(wait); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 68562564be6b..0b39ed1568d4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5573,7 +5573,7 @@ static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) { bool do_wakeup = false; - struct nfsd4_ssc_umount_item *ni = 0; + struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *tmp; spin_lock(&nn->nfsd_ssc_lock); From 396b359832e7db6da1ce0307a9628f86ccae65a0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 30 Sep 2021 17:06:21 -0400 Subject: [PATCH 0693/1565] NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() [ Upstream commit dae9a6cab8009e526570e7477ce858dcdfeb256e ] Refactor. Now that the NFSv2 and NFSv3 XDR decoders have been converted to use xdr_streams, the WRITE decoder functions can use xdr_stream_subsegment() to extract the WRITE payload into its own xdr_buf, just as the NFSv4 WRITE XDR decoder currently does. That makes it possible to pass the first kvec, pages array + length, page_base, and total payload length via a single function parameter. The payload's page_base is not yet assigned or used, but will be in subsequent patches. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 3 +-- fs/nfsd/nfs3xdr.c | 12 ++---------- fs/nfsd/nfs4proc.c | 3 +-- fs/nfsd/nfsproc.c | 3 +-- fs/nfsd/nfsxdr.c | 9 +-------- fs/nfsd/xdr.h | 2 +- fs/nfsd/xdr3.h | 2 +- include/linux/sunrpc/svc.h | 3 +-- net/sunrpc/svc.c | 11 ++++++----- 9 files changed, 15 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index be1ed33e424e..5abb5c8e2cd2 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -206,8 +206,7 @@ nfsd3_proc_write(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; - nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3d37923afb06..267e56f218af 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -621,9 +621,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp); - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; - size_t remaining; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) return 0; @@ -641,17 +638,12 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) /* request sanity */ if (args->count != args->len) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) - return 0; if (args->count > max_blocksize) { args->count = max_blocksize; args->len = max_blocksize; } - - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); + if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) + return 0; return 1; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index ebb6d8471e8d..d2ee1ba7ddc6 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1034,8 +1034,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_how_written = write->wr_stable_how; - nvecs = svc_fill_write_vector(rqstp, write->wr_payload.pages, - write->wr_payload.head, write->wr_buflen); + nvecs = svc_fill_write_vector(rqstp, &write->wr_payload); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec)); status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 78bdfdc253fd..5a84aed17e70 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -234,8 +234,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->len, argp->offset); - nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, - &argp->first, cnt); + nvecs = svc_fill_write_vector(rqstp, &argp->payload); if (!nvecs) { resp->status = nfserr_io; goto out; diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 082449c7d0db..ddcc18adfeb1 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -325,10 +325,7 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; u32 beginoffset, totalcount; - size_t remaining; if (!svcxdr_decode_fhandle(xdr, &args->fh)) return 0; @@ -346,12 +343,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) return 0; if (args->len > NFSSVC_MAXBLKSIZE_V2) return 0; - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->len)) + if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) return 0; - args->first.iov_base = xdr->p; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); return 1; } diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index c67ad02b9a02..863a35f24910 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -33,7 +33,7 @@ struct nfsd_writeargs { svc_fh fh; __u32 offset; __u32 len; - struct kvec first; + struct xdr_buf payload; }; struct nfsd_createargs { diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 933008382bbe..712c117300cb 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -40,7 +40,7 @@ struct nfsd3_writeargs { __u32 count; int stable; __u32 len; - struct kvec first; + struct xdr_buf payload; }; struct nfsd3_createargs { diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index dd3daadbc0e5..b0986e969c2f 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -531,8 +531,7 @@ int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, - struct page **pages, - struct kvec *first, size_t total); + struct xdr_buf *payload); char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 0d3c3ca2830a..54f66f66beb5 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1654,16 +1654,17 @@ EXPORT_SYMBOL_GPL(svc_encode_result_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on - * @pages: list of pages containing data payload - * @first: buffer containing first section of write payload - * @total: total number of bytes of write payload + * @payload: xdr_buf containing only the write data payload * * Fills in rqstp::rq_vec, and returns the number of elements. */ -unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages, - struct kvec *first, size_t total) +unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, + struct xdr_buf *payload) { + struct page **pages = payload->pages; + struct kvec *first = payload->head; struct kvec *vec = rqstp->rq_vec; + size_t total = payload->len; unsigned int i; /* Some types of transport can present the write payload From 0696b6b513a78d8c87aed664266dbc03846a08b4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 12 Oct 2021 11:57:22 -0400 Subject: [PATCH 0694/1565] SUNRPC: Replace the "__be32 *p" parameter to .pc_decode [ Upstream commit 16c663642c7ec03cd4cee5fec520bb69e97babe4 ] The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR decoder, and can be removed. Note also that there is a line in each decoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per decoder function. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 3 +-- fs/lockd/xdr.c | 27 +++++++++-------------- fs/lockd/xdr4.c | 27 +++++++++-------------- fs/nfsd/nfs2acl.c | 12 +++++----- fs/nfsd/nfs3acl.c | 8 +++---- fs/nfsd/nfs3xdr.c | 45 +++++++++++++------------------------- fs/nfsd/nfs4xdr.c | 4 ++-- fs/nfsd/nfsd.h | 3 ++- fs/nfsd/nfssvc.c | 7 +++--- fs/nfsd/nfsxdr.c | 30 +++++++++---------------- fs/nfsd/xdr.h | 21 +++++++++--------- fs/nfsd/xdr3.h | 31 +++++++++++++------------- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 19 ++++++++-------- include/linux/lockd/xdr4.h | 19 ++++++++-------- include/linux/sunrpc/svc.h | 3 ++- 16 files changed, 112 insertions(+), 149 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index b632be3ad57b..9a82471bda07 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -780,11 +780,10 @@ module_exit(exit_nlm); static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *procp = rqstp->rq_procinfo; - struct kvec *argv = rqstp->rq_arg.head; struct kvec *resv = rqstp->rq_res.head; svcxdr_init_decode(rqstp); - if (!procp->pc_decode(rqstp, argv->iov_base)) + if (!procp->pc_decode(rqstp, &rqstp->rq_arg_stream)) goto out_decode_err; *statp = procp->pc_func(rqstp); diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 9235e60b1769..895f15222104 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -146,15 +146,14 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) */ int -nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } int -nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -171,9 +170,8 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -197,9 +195,8 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -218,9 +215,8 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) @@ -233,9 +229,8 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_res *resp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &resp->cookie)) @@ -247,10 +242,10 @@ nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_reboot *argp = rqstp->rq_argp; + __be32 *p; u32 len; if (xdr_stream_decode_u32(xdr, &len) < 0) @@ -273,9 +268,8 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; @@ -301,9 +295,8 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 98e957e4566c..573c7d580a5e 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -145,15 +145,14 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) */ int -nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } int -nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -170,9 +169,8 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -196,9 +194,8 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; @@ -216,9 +213,8 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) @@ -231,9 +227,8 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_res *resp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &resp->cookie)) @@ -245,10 +240,10 @@ nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_reboot *argp = rqstp->rq_argp; + __be32 *p; u32 len; if (xdr_stream_decode_u32(xdr, &len) < 0) @@ -271,9 +266,8 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; @@ -299,9 +293,8 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 8703326fc165..53f793c3606d 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -186,9 +186,9 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) * XDR decode functions */ -static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &argp->fh)) @@ -199,9 +199,9 @@ static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; } -static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &argp->fh)) @@ -220,9 +220,9 @@ static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; } -static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &args->fh)) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 5e13e5f7f92b..37c8fb184ca4 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -125,9 +125,9 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) * XDR decode functions */ -static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_getaclargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -138,9 +138,9 @@ static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) return 1; } -static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_setaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 267e56f218af..5f744f03cda7 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -557,18 +557,16 @@ void fill_post_wcc(struct svc_fh *fhp) */ int -nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp; return svcxdr_decode_nfs_fh3(xdr, &args->fh); } int -nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_sattrargs *args = rqstp->rq_argp; return svcxdr_decode_nfs_fh3(xdr, &args->fh) && @@ -577,18 +575,16 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_diropargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); } int -nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_accessargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -600,9 +596,8 @@ nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -616,9 +611,8 @@ nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp); @@ -649,9 +643,8 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp; if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) @@ -674,9 +667,8 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_createargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs3(xdr, &args->fh, @@ -685,9 +677,8 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; struct kvec *tail = rqstp->rq_arg.tail; @@ -713,9 +704,8 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_mknodargs *args = rqstp->rq_argp; if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) @@ -742,9 +732,8 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_renameargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs3(xdr, &args->ffh, @@ -754,9 +743,8 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_linkargs *args = rqstp->rq_argp; return svcxdr_decode_nfs_fh3(xdr, &args->ffh) && @@ -765,9 +753,8 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) @@ -784,9 +771,8 @@ nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_readdirargs *args = rqstp->rq_argp; u32 dircount; @@ -807,9 +793,8 @@ nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) } int -nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd3_commitargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 899d961372c0..5ee5081c5663 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5423,14 +5423,14 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } int -nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) +nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundargs *args = rqstp->rq_argp; /* svcxdr_tmp_alloc */ args->to_free = NULL; - args->xdr = &rqstp->rq_arg_stream; + args->xdr = xdr; args->ops = args->iops; args->rqstp = rqstp; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 9664303afdaf..6e8ad5f9757c 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -78,7 +78,8 @@ extern const struct seq_operations nfs_exports_op; */ struct nfsd_voidargs { }; struct nfsd_voidres { }; -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, + struct xdr_stream *xdr); int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); /* diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 373695cc62a7..be1d656548cf 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1004,7 +1004,6 @@ nfsd(void *vrqstp) int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *proc = rqstp->rq_procinfo; - struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; __be32 *p; @@ -1015,7 +1014,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) rqstp->rq_cachetype = proc->pc_cachetype; svcxdr_init_decode(rqstp); - if (!proc->pc_decode(rqstp, argv->iov_base)) + if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream)) goto out_decode_err; switch (nfsd_cache_lookup(rqstp)) { @@ -1065,13 +1064,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) /** * nfssvc_decode_voidarg - Decode void arguments * @rqstp: Server RPC transaction context - * @p: buffer containing arguments to decode + * @xdr: XDR stream positioned at arguments to decode * * Return values: * %0: Arguments were not valid * %1: Decoding was successful */ -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +int nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index ddcc18adfeb1..08e899180ee4 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -273,18 +273,16 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, */ int -nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_fhandle *args = rqstp->rq_argp; return svcxdr_decode_fhandle(xdr, &args->fh); } int -nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_sattrargs *args = rqstp->rq_argp; return svcxdr_decode_fhandle(xdr, &args->fh) && @@ -292,18 +290,16 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_diropargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); } int -nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readargs *args = rqstp->rq_argp; u32 totalcount; @@ -321,9 +317,8 @@ nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_writeargs *args = rqstp->rq_argp; u32 beginoffset, totalcount; @@ -350,9 +345,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_createargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs(xdr, &args->fh, @@ -361,9 +355,8 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_renameargs *args = rqstp->rq_argp; return svcxdr_decode_diropargs(xdr, &args->ffh, @@ -373,9 +366,8 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_linkargs *args = rqstp->rq_argp; return svcxdr_decode_fhandle(xdr, &args->ffh) && @@ -384,9 +376,8 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; @@ -405,9 +396,8 @@ nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) +nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_arg_stream; struct nfsd_readdirargs *args = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &args->fh)) diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 863a35f24910..19e281382bb9 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -141,16 +141,17 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore) -int nfssvc_decode_fhandleargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_writeargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_createargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); -int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 712c117300cb..60a8909205e5 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -265,21 +265,22 @@ union nfsd3_xdrstore { #define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore) -int nfs3svc_decode_fhandleargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_accessargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_writeargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_createargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_mkdirargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_mknodargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_renameargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_linkargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); -int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 37af86e370cb..c3e18efcd23b 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -756,7 +756,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_compoundargs(struct svc_rqst *, __be32 *); +int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index a98309c0121c..170ad6f5596a 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -96,18 +96,19 @@ struct nlm_reboot { */ #define NLMSVC_XDRSIZE sizeof(struct nlm_args) -int nlmsvc_decode_testargs(struct svc_rqst *, __be32 *); +int nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); + int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); -int nlmsvc_decode_lockargs(struct svc_rqst *, __be32 *); -int nlmsvc_decode_cancargs(struct svc_rqst *, __be32 *); -int nlmsvc_decode_unlockargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_res(struct svc_rqst *, __be32 *); -int nlmsvc_decode_res(struct svc_rqst *, __be32 *); int nlmsvc_encode_void(struct svc_rqst *, __be32 *); -int nlmsvc_decode_void(struct svc_rqst *, __be32 *); -int nlmsvc_decode_shareargs(struct svc_rqst *, __be32 *); int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); -int nlmsvc_decode_notify(struct svc_rqst *, __be32 *); -int nlmsvc_decode_reboot(struct svc_rqst *, __be32 *); #endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 5ae766f26e04..68e14e0f2b1f 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,21 +22,20 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED) +int nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); - -int nlm4svc_decode_testargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); -int nlm4svc_decode_lockargs(struct svc_rqst *, __be32 *); -int nlm4svc_decode_cancargs(struct svc_rqst *, __be32 *); -int nlm4svc_decode_unlockargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_res(struct svc_rqst *, __be32 *); -int nlm4svc_decode_res(struct svc_rqst *, __be32 *); int nlm4svc_encode_void(struct svc_rqst *, __be32 *); -int nlm4svc_decode_void(struct svc_rqst *, __be32 *); -int nlm4svc_decode_shareargs(struct svc_rqst *, __be32 *); int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); -int nlm4svc_decode_notify(struct svc_rqst *, __be32 *); -int nlm4svc_decode_reboot(struct svc_rqst *, __be32 *); extern const struct rpc_version nlm_version4; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index b0986e969c2f..77e3a9b39827 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -457,7 +457,8 @@ struct svc_procedure { /* process the request: */ __be32 (*pc_func)(struct svc_rqst *); /* XDR decode args: */ - int (*pc_decode)(struct svc_rqst *, __be32 *data); + int (*pc_decode)(struct svc_rqst *rqstp, + struct xdr_stream *xdr); /* XDR encode result: */ int (*pc_encode)(struct svc_rqst *, __be32 *data); /* XDR free result: */ From f747ce574c4a4baa65e9a15b4524245cecc12045 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 12 Oct 2021 11:57:28 -0400 Subject: [PATCH 0695/1565] SUNRPC: Change return value type of .pc_decode [ Upstream commit c44b31c263798ec34614dd394c31ef1a2e7e716e ] Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_decode return only true or false. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 96 +++++++++++++++--------------- fs/lockd/xdr4.c | 97 +++++++++++++++--------------- fs/nfsd/nfs2acl.c | 30 +++++----- fs/nfsd/nfs3acl.c | 22 +++---- fs/nfsd/nfs3xdr.c | 118 ++++++++++++++++++------------------- fs/nfsd/nfs4xdr.c | 24 ++++---- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 6 +- fs/nfsd/nfsxdr.c | 62 +++++++++---------- fs/nfsd/xdr.h | 20 +++---- fs/nfsd/xdr3.h | 30 +++++----- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 18 +++--- include/linux/lockd/xdr4.h | 18 +++--- include/linux/sunrpc/svc.h | 2 +- 15 files changed, 274 insertions(+), 273 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 895f15222104..622c2ca37dbf 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -145,103 +145,103 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) * Decode Call arguments */ -int +bool nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } -int +bool nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return 1; + return true; } -int +bool nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; argp->monitor = 1; /* monitor client by default */ - return 1; + return true; } -int +bool nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return 1; + return true; } -int +bool nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; argp->lock.fl.fl_type = F_UNLCK; - return 1; + return true; } -int +bool nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_decode_stats(xdr, &resp->status)) - return 0; + return false; - return 1; + return true; } -int +bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_reboot *argp = rqstp->rq_argp; @@ -249,25 +249,25 @@ nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) u32 len; if (xdr_stream_decode_u32(xdr, &len) < 0) - return 0; + return false; if (len > SM_MAXSTRLEN) - return 0; + return false; p = xdr_inline_decode(xdr, len); if (!p) - return 0; + return false; argp->len = len; argp->mon = (char *)p; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; p = xdr_inline_decode(xdr, SM_PRIV_SIZE); if (!p) - return 0; + return false; memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - return 1; + return true; } -int +bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; @@ -278,34 +278,34 @@ nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) lock->svid = ~(u32)0; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return 0; + return false; if (!svcxdr_decode_owner(xdr, &lock->oh)) - return 0; + return false; /* XXX: Range checks are missing in the original code */ if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; - return 1; + return true; } diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 573c7d580a5e..45551dee26b4 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -144,102 +144,103 @@ svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) * Decode Call arguments */ -int +bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } -int +bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return 1; + return true; } -int +bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; argp->monitor = 1; /* monitor client by default */ - return 1; + return true; } -int +bool nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; u32 exclusive; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return 0; + return false; if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return 1; + + return true; } -int +bool nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_lock(xdr, &argp->lock)) - return 0; + return false; argp->lock.fl.fl_type = F_UNLCK; - return 1; + return true; } -int +bool nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_argp; if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_decode_stats(xdr, &resp->status)) - return 0; + return false; - return 1; + return true; } -int +bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_reboot *argp = rqstp->rq_argp; @@ -247,25 +248,25 @@ nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) u32 len; if (xdr_stream_decode_u32(xdr, &len) < 0) - return 0; + return false; if (len > SM_MAXSTRLEN) - return 0; + return false; p = xdr_inline_decode(xdr, len); if (!p) - return 0; + return false; argp->len = len; argp->mon = (char *)p; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; p = xdr_inline_decode(xdr, SM_PRIV_SIZE); if (!p) - return 0; + return false; memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - return 1; + return true; } -int +bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; @@ -276,34 +277,34 @@ nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) lock->svid = ~(u32)0; if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return 0; + return false; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return 0; + return false; if (!svcxdr_decode_owner(xdr, &lock->oh)) - return 0; + return false; /* XXX: Range checks are missing in the original code */ if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return 0; + return false; - return 1; + return true; } diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 53f793c3606d..eb6a89baf675 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -186,51 +186,51 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) * XDR decode functions */ -static int +static bool nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false; - return 1; + return true; } -static int +static bool nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_setaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false; if (argp->mask & ~NFS_ACL_MASK) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? &argp->acl_access : NULL)) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? &argp->acl_default : NULL)) - return 0; + return false; - return 1; + return true; } -static int +static bool nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessargs *args = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return 0; + return false; - return 1; + return true; } /* diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 37c8fb184ca4..f60b072ce9ec 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -125,38 +125,38 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) * XDR decode functions */ -static int +static bool nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->mask) < 0) - return 0; + return false; - return 1; + return true; } -static int +static bool nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_setaclargs *argp = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return 0; + return false; if (argp->mask & ~NFS_ACL_MASK) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? &argp->acl_access : NULL)) - return 0; + return false; if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? &argp->acl_default : NULL)) - return 0; + return false; - return 1; + return true; } /* diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 5f744f03cda7..1f3de46d24d4 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -556,7 +556,7 @@ void fill_post_wcc(struct svc_fh *fhp) * XDR decode functions */ -int +bool nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_fhandle *args = rqstp->rq_argp; @@ -564,7 +564,7 @@ nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_nfs_fh3(xdr, &args->fh); } -int +bool nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_sattrargs *args = rqstp->rq_argp; @@ -574,7 +574,7 @@ nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattrguard3(xdr, args); } -int +bool nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropargs *args = rqstp->rq_argp; @@ -582,75 +582,75 @@ nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); } -int +bool nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_writeargs *args = rqstp->rq_argp; u32 max_blocksize = svc_max_payload(rqstp); if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->stable) < 0) - return 0; + return false; /* opaque data */ if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return 0; + return false; /* request sanity */ if (args->count != args->len) - return 0; + return false; if (args->count > max_blocksize) { args->count = max_blocksize; args->len = max_blocksize; } if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) - return 0; + return false; - return 1; + return true; } -int +bool nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_createargs *args = rqstp->rq_argp; if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->createmode) < 0) - return 0; + return false; switch (args->createmode) { case NFS3_CREATE_UNCHECKED: case NFS3_CREATE_GUARDED: @@ -658,15 +658,15 @@ nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) case NFS3_CREATE_EXCLUSIVE: args->verf = xdr_inline_decode(xdr, NFS3_CREATEVERFSIZE); if (!args->verf) - return 0; + return false; break; default: - return 0; + return false; } - return 1; + return true; } -int +bool nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_createargs *args = rqstp->rq_argp; @@ -676,7 +676,7 @@ nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); } -int +bool nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_symlinkargs *args = rqstp->rq_argp; @@ -685,33 +685,33 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) size_t remaining; if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) - return 0; + return false; if (!svcxdr_decode_sattr3(rqstp, xdr, &args->attrs)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return 0; + return false; /* request sanity */ remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; remaining -= xdr_stream_pos(xdr); if (remaining < xdr_align_size(args->tlen)) - return 0; + return false; args->first.iov_base = xdr->p; args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - return 1; + return true; } -int +bool nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_mknodargs *args = rqstp->rq_argp; if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->ftype) < 0) - return 0; + return false; switch (args->ftype) { case NF3CHR: case NF3BLK: @@ -725,13 +725,13 @@ nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) /* Valid XDR but illegal file types */ break; default: - return 0; + return false; } - return 1; + return true; } -int +bool nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_renameargs *args = rqstp->rq_argp; @@ -742,7 +742,7 @@ nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); } -int +bool nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_linkargs *args = rqstp->rq_argp; @@ -752,59 +752,59 @@ nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); } -int +bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return 0; + return false; args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); if (!args->verf) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirargs *args = rqstp->rq_argp; u32 dircount; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return 0; + return false; args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); if (!args->verf) - return 0; + return false; /* dircount is ignored */ if (xdr_stream_decode_u32(xdr, &dircount) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_commitargs *args = rqstp->rq_argp; if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; - return 1; + return true; } /* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5ee5081c5663..1b33c1c93e88 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2322,7 +2322,7 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op) return true; } -static int +static bool nfsd4_decode_compound(struct nfsd4_compoundargs *argp) { struct nfsd4_op *op; @@ -2335,25 +2335,25 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) int i; if (xdr_stream_decode_u32(argp->xdr, &argp->taglen) < 0) - return 0; + return false; max_reply += XDR_UNIT; argp->tag = NULL; if (unlikely(argp->taglen)) { if (argp->taglen > NFSD4_MAX_TAGLEN) - return 0; + return false; p = xdr_inline_decode(argp->xdr, argp->taglen); if (!p) - return 0; + return false; argp->tag = svcxdr_savemem(argp, p, argp->taglen); if (!argp->tag) - return 0; + return false; max_reply += xdr_align_size(argp->taglen); } if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) - return 0; + return false; if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) - return 0; + return false; /* * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS @@ -2361,14 +2361,14 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) * nfsd4_proc can handle this is an NFS-level error. */ if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return 1; + return true; if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; dprintk("nfsd: couldn't allocate room for COMPOUND\n"); - return 0; + return false; } } @@ -2380,7 +2380,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->replay = NULL; if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) - return 0; + return false; if (nfsd4_opnum_in_range(argp, op)) { op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) @@ -2427,7 +2427,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); - return 1; + return true; } static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, @@ -5422,7 +5422,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } } -int +bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundargs *args = rqstp->rq_argp; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 6e8ad5f9757c..bfcddd4c7534 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -78,7 +78,7 @@ extern const struct seq_operations nfs_exports_op; */ struct nfsd_voidargs { }; struct nfsd_voidres { }; -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, +bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index be1d656548cf..00aadc263503 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1067,10 +1067,10 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * @xdr: XDR stream positioned at arguments to decode * * Return values: - * %0: Arguments were not valid - * %1: Decoding was successful + * %false: Arguments were not valid + * %true: Decoding was successful */ -int nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) +bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 08e899180ee4..b5817a41b3de 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -272,7 +272,7 @@ svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, * XDR decode functions */ -int +bool nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_fhandle *args = rqstp->rq_argp; @@ -280,7 +280,7 @@ nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_fhandle(xdr, &args->fh); } -int +bool nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_sattrargs *args = rqstp->rq_argp; @@ -289,7 +289,7 @@ nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } -int +bool nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_diropargs *args = rqstp->rq_argp; @@ -297,54 +297,54 @@ nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); } -int +bool nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readargs *args = rqstp->rq_argp; u32 totalcount; if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; /* totalcount is ignored */ if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return 0; + return false; - return 1; + return true; } -int +bool nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_writeargs *args = rqstp->rq_argp; u32 beginoffset, totalcount; if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; /* beginoffset is ignored */ if (xdr_stream_decode_u32(xdr, &beginoffset) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return 0; + return false; /* totalcount is ignored */ if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return 0; + return false; /* opaque data */ if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return 0; + return false; if (args->len > NFSSVC_MAXBLKSIZE_V2) - return 0; + return false; if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) - return 0; + return false; - return 1; + return true; } -int +bool nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_createargs *args = rqstp->rq_argp; @@ -354,7 +354,7 @@ nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } -int +bool nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_renameargs *args = rqstp->rq_argp; @@ -365,7 +365,7 @@ nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); } -int +bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_linkargs *args = rqstp->rq_argp; @@ -375,39 +375,39 @@ nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) &args->tname, &args->tlen); } -int +bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; if (!svcxdr_decode_diropargs(xdr, &args->ffh, &args->fname, &args->flen)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return 0; + return false; if (args->tlen == 0) - return 0; + return false; args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); args->first.iov_base = xdr_inline_decode(xdr, args->tlen); if (!args->first.iov_base) - return 0; + return false; return svcxdr_decode_sattr(rqstp, xdr, &args->attrs); } -int +bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readdirargs *args = rqstp->rq_argp; if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->cookie) < 0) - return 0; + return false; if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return 0; + return false; - return 1; + return true; } /* diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 19e281382bb9..d897c198c912 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -141,16 +141,16 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore) -int nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfssvc_encode_statres(struct svc_rqst *, __be32 *); int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 60a8909205e5..ef72bc4868da 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -265,21 +265,21 @@ union nfsd3_xdrstore { #define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore) -int nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c3e18efcd23b..8f349640d2e9 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -756,7 +756,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -int nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 170ad6f5596a..e1362244f909 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -96,15 +96,15 @@ struct nlm_reboot { */ #define NLMSVC_XDRSIZE sizeof(struct nlm_args) -int nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); int nlmsvc_encode_res(struct svc_rqst *, __be32 *); diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 68e14e0f2b1f..376b8f6a3763 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,15 +22,15 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED) -int nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); int nlm4svc_encode_res(struct svc_rqst *, __be32 *); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 77e3a9b39827..85a9884b1074 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -457,7 +457,7 @@ struct svc_procedure { /* process the request: */ __be32 (*pc_func)(struct svc_rqst *); /* XDR decode args: */ - int (*pc_decode)(struct svc_rqst *rqstp, + bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ int (*pc_encode)(struct svc_rqst *, __be32 *data); From 47047d40af7b5b42d5055a19f66725d8517bb63a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 13 Oct 2021 10:40:59 -0400 Subject: [PATCH 0696/1565] NFSD: Save location of NFSv4 COMPOUND status [ Upstream commit 3b0ebb255fdc49a3d340846deebf045ef58ec744 ] Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS dispatcher to pass in the buffer location of the COMPOUND status. Instead, save that buffer location in struct nfsd4_compoundres. The compound tag follows immediately after. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 9 +++++++-- fs/nfsd/xdr4.h | 3 ++- 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d2ee1ba7ddc6..52f3f3553379 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2456,11 +2456,11 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) __be32 status; resp->xdr = &rqstp->rq_res_stream; + resp->statusp = resp->xdr->p; /* reserve space for: NFS status code */ xdr_reserve_space(resp->xdr, XDR_UNIT); - resp->tagp = resp->xdr->p; /* reserve space for: taglen, tag, and opcnt */ xdr_reserve_space(resp->xdr, XDR_UNIT * 2 + args->taglen); resp->taglen = args->taglen; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1b33c1c93e88..3f8d23586ea7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5446,11 +5446,16 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len); - *p = resp->cstate.status; + /* + * Send buffer space for the following items is reserved + * at the top of nfsd4_proc_compound(). + */ + p = resp->statusp; + + *p++ = resp->cstate.status; rqstp->rq_next_page = resp->xdr->page_ptr + 1; - p = resp->tagp; *p++ = htonl(resp->taglen); memcpy(p, resp->tag, resp->taglen); p += XDR_QUADLEN(resp->taglen); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8f349640d2e9..8e11dfdc2563 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -702,10 +702,11 @@ struct nfsd4_compoundres { struct xdr_stream *xdr; struct svc_rqst * rqstp; + __be32 *statusp; u32 taglen; char * tag; u32 opcnt; - __be32 * tagp; /* tag, opcount encode location */ + struct nfsd4_compound_state cstate; }; From 61cf6815070a8d27d40b31e4eb4e34fea10a76c5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 13 Oct 2021 10:41:06 -0400 Subject: [PATCH 0697/1565] SUNRPC: Replace the "__be32 *p" parameter to .pc_encode [ Upstream commit fda494411485aff91768842c532f90fb8eb54943 ] The passed-in value of the "__be32 *p" parameter is now unused in every server-side XDR encoder, and can be removed. Note also that there is a line in each encoder that sets up a local pointer to a struct xdr_stream. Passing that pointer from the dispatcher instead saves one line per encoder function. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 3 +-- fs/lockd/xdr.c | 11 ++++----- fs/lockd/xdr4.c | 11 ++++----- fs/nfs/callback_xdr.c | 4 ++-- fs/nfsd/nfs2acl.c | 8 +++---- fs/nfsd/nfs3acl.c | 8 +++---- fs/nfsd/nfs3xdr.c | 46 +++++++++++++------------------------- fs/nfsd/nfs4xdr.c | 7 +++--- fs/nfsd/nfsd.h | 3 ++- fs/nfsd/nfssvc.c | 9 +++----- fs/nfsd/nfsxdr.c | 22 +++++++----------- fs/nfsd/xdr.h | 14 ++++++------ fs/nfsd/xdr3.h | 30 ++++++++++++------------- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 8 +++---- include/linux/lockd/xdr4.h | 8 +++---- include/linux/sunrpc/svc.h | 3 ++- 17 files changed, 85 insertions(+), 112 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 9a82471bda07..b220e1b91726 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -780,7 +780,6 @@ module_exit(exit_nlm); static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *procp = rqstp->rq_procinfo; - struct kvec *resv = rqstp->rq_res.head; svcxdr_init_decode(rqstp); if (!procp->pc_decode(rqstp, &rqstp->rq_arg_stream)) @@ -793,7 +792,7 @@ static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) return 1; svcxdr_init_encode(rqstp); - if (!procp->pc_encode(rqstp, resv->iov_base + resv->iov_len)) + if (!procp->pc_encode(rqstp, &rqstp->rq_res_stream)) goto out_encode_err; return 1; diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 622c2ca37dbf..2595b4d14cd4 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -314,15 +314,14 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ int -nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } int -nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -330,9 +329,8 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -340,9 +338,8 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) } int -nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; if (!svcxdr_encode_cookie(xdr, &resp->cookie)) diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 45551dee26b4..32231c21c22d 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -313,15 +313,14 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ int -nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } int -nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -329,9 +328,8 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; return svcxdr_encode_cookie(xdr, &resp->cookie) && @@ -339,9 +337,8 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) } int -nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nlm_res *resp = rqstp->rq_resp; if (!svcxdr_encode_cookie(xdr, &resp->cookie)) diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 600e64068240..c58b3b60bc2c 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -67,9 +67,9 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) * svc_process_common() looks for an XDR encoder to know when * not to drop a Reply. */ -static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p) +static int nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return xdr_ressize_check(rqstp, p); + return 1; } static __be32 decode_string(struct xdr_stream *xdr, unsigned int *len, diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index eb6a89baf675..23e4c3eb2c38 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -238,9 +238,9 @@ nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETACL */ -static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; @@ -278,9 +278,9 @@ static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) } /* ACCESS */ -static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) +static int +nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index f60b072ce9ec..21aacba88ea3 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -164,9 +164,9 @@ nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETACL */ -static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct kvec *head = rqstp->rq_res.head; @@ -216,9 +216,9 @@ static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) } /* SETACL */ -static int nfs3svc_encode_setaclres(struct svc_rqst *rqstp, __be32 *p) +static int +nfs3svc_encode_setaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; return svcxdr_encode_nfsstat3(xdr, resp->status) && diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 1f3de46d24d4..63f0be4e44f7 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -813,9 +813,8 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) /* GETATTR */ int -nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -833,9 +832,8 @@ nfs3svc_encode_getattrres(struct svc_rqst *rqstp, __be32 *p) /* SETATTR, REMOVE, RMDIR */ int -nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_attrstat *resp = rqstp->rq_resp; return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -843,9 +841,9 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) } /* LOOKUP */ -int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p) +int +nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -869,9 +867,8 @@ int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, __be32 *p) /* ACCESS */ int -nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_accessres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -893,9 +890,8 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) /* READLINK */ int -nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; @@ -921,9 +917,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) /* READ */ int -nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; @@ -954,9 +949,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) /* WRITE */ int -nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_writeres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -982,9 +976,8 @@ nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) /* CREATE, MKDIR, SYMLINK, MKNOD */ int -nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1008,9 +1001,8 @@ nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) /* RENAME */ int -nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_renameres *resp = rqstp->rq_resp; return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -1020,9 +1012,8 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) /* LINK */ int -nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_linkres *resp = rqstp->rq_resp; return svcxdr_encode_nfsstat3(xdr, resp->status) && @@ -1032,9 +1023,8 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) /* READDIR */ int -nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist; @@ -1286,9 +1276,8 @@ svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, /* FSSTAT */ int -nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsstatres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1333,9 +1322,8 @@ svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, /* FSINFO */ int -nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_fsinfores *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1376,9 +1364,8 @@ svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, /* PATHCONF */ int -nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_pathconfres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) @@ -1400,9 +1387,8 @@ nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) /* COMMIT */ int -nfs3svc_encode_commitres(struct svc_rqst *rqstp, __be32 *p) +nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd3_commitres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 3f8d23586ea7..697d61819a59 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5438,10 +5438,11 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) } int -nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) +nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = resp->xdr->buf; + struct xdr_buf *buf = xdr->buf; + __be32 *p; WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + buf->tail[0].iov_len); @@ -5454,7 +5455,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) *p++ = resp->cstate.status; - rqstp->rq_next_page = resp->xdr->page_ptr + 1; + rqstp->rq_next_page = xdr->page_ptr + 1; *p++ = htonl(resp->taglen); memcpy(p, resp->tag, resp->taglen); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index bfcddd4c7534..345f8247d5da 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -80,7 +80,8 @@ struct nfsd_voidargs { }; struct nfsd_voidres { }; bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p); +int nfssvc_encode_voidres(struct svc_rqst *rqstp, + struct xdr_stream *xdr); /* * Function prototypes. diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 00aadc263503..195f2bcc6538 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1004,8 +1004,6 @@ nfsd(void *vrqstp) int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *proc = rqstp->rq_procinfo; - struct kvec *resv = &rqstp->rq_res.head[0]; - __be32 *p; /* * Give the xdr decoder a chance to change this if it wants @@ -1030,14 +1028,13 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * Need to grab the location to store the status, as * NFSv4 does some encoding while processing */ - p = resv->iov_base + resv->iov_len; svcxdr_init_encode(rqstp); *statp = proc->pc_func(rqstp); if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) goto out_update_drop; - if (!proc->pc_encode(rqstp, p)) + if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream)) goto out_encode_err; nfsd_cache_update(rqstp, rqstp->rq_cachetype, statp + 1); @@ -1078,13 +1075,13 @@ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) /** * nfssvc_encode_voidres - Encode void results * @rqstp: Server RPC transaction context - * @p: buffer in which to encode results + * @xdr: XDR stream into which to encode results * * Return values: * %0: Local error while encoding * %1: Encoding was successful */ -int nfssvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +int nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { return 1; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index b5817a41b3de..6aa8138ae2f7 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -415,18 +415,16 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ int -nfssvc_encode_statres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_stat *resp = rqstp->rq_resp; return svcxdr_encode_stat(xdr, resp->status); } int -nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_attrstat *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) @@ -442,9 +440,8 @@ nfssvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) @@ -462,9 +459,8 @@ nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; @@ -484,9 +480,8 @@ nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; @@ -509,9 +504,8 @@ nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist; @@ -532,11 +526,11 @@ nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) } int -nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) +nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - struct xdr_stream *xdr = &rqstp->rq_res_stream; struct nfsd_statfsres *resp = rqstp->rq_resp; struct kstatfs *stat = &resp->stats; + __be32 *p; if (!svcxdr_encode_stat(xdr, resp->status)) return 0; diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index d897c198c912..bff7258041fc 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -152,13 +152,13 @@ bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_statres(struct svc_rqst *, __be32 *); -int nfssvc_encode_attrstatres(struct svc_rqst *, __be32 *); -int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readres(struct svc_rqst *, __be32 *); -int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); -int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); +int nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *data, const char *name, int namlen, diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index ef72bc4868da..bb017fc7cba1 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -281,21 +281,21 @@ bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_getattrres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); -int nfs3svc_encode_lookupres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readlinkres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_writeres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_createres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_renameres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_linkres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_readdirres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_fsstatres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_fsinfores(struct svc_rqst *, __be32 *); -int nfs3svc_encode_pathconfres(struct svc_rqst *, __be32 *); -int nfs3svc_encode_commitres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr); void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8e11dfdc2563..5b343d0c6963 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -758,7 +758,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); +int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index e1362244f909..d8bd26a5525e 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -106,9 +106,9 @@ bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); -int nlmsvc_encode_res(struct svc_rqst *, __be32 *); -int nlmsvc_encode_void(struct svc_rqst *, __be32 *); -int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); +int nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); #endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 376b8f6a3763..50677be3557d 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -32,10 +32,10 @@ bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); -int nlm4svc_encode_res(struct svc_rqst *, __be32 *); -int nlm4svc_encode_void(struct svc_rqst *, __be32 *); -int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); +int nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); extern const struct rpc_version nlm_version4; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 85a9884b1074..c6a0b4364f4a 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -460,7 +460,8 @@ struct svc_procedure { bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ - int (*pc_encode)(struct svc_rqst *, __be32 *data); + int (*pc_encode)(struct svc_rqst *rqstp, + struct xdr_stream *xdr); /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */ From c7b0a9c75d3c832c919d7ff0d24b410ba4591074 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 13 Oct 2021 10:41:13 -0400 Subject: [PATCH 0698/1565] SUNRPC: Change return value type of .pc_encode [ Upstream commit 130e2054d4a652a2bd79fb1557ddcd19c053cb37 ] Returning an undecorated integer is an age-old trope, but it's not clear (even to previous experts in this code) that the only valid return values are 1 and 0. These functions do not return a negative errno, rpc_stat value, or a positive length. Document there are only two valid return values by having .pc_encode return only true or false. Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/xdr.c | 18 ++-- fs/lockd/xdr4.c | 18 ++-- fs/nfs/callback_xdr.c | 4 +- fs/nfsd/nfs2acl.c | 4 +- fs/nfsd/nfs3acl.c | 18 ++-- fs/nfsd/nfs3xdr.c | 166 ++++++++++++++++++------------------- fs/nfsd/nfs4xdr.c | 4 +- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 8 +- fs/nfsd/nfsxdr.c | 60 +++++++------- fs/nfsd/xdr.h | 14 ++-- fs/nfsd/xdr3.h | 30 +++---- fs/nfsd/xdr4.h | 2 +- include/linux/lockd/xdr.h | 8 +- include/linux/lockd/xdr4.h | 8 +- include/linux/sunrpc/svc.h | 2 +- 16 files changed, 183 insertions(+), 183 deletions(-) diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 2595b4d14cd4..2fb5748dae0c 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -313,13 +313,13 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) * Encode Reply results */ -int +bool nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } -int +bool nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -328,7 +328,7 @@ nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_testrply(xdr, resp); } -int +bool nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -337,18 +337,18 @@ nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_stats(xdr, resp->status); } -int +bool nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_encode_stats(xdr, resp->status)) - return 0; + return false; /* sequence */ if (xdr_stream_encode_u32(xdr, 0) < 0) - return 0; + return false; - return 1; + return true; } diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 32231c21c22d..856267c0864b 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -312,13 +312,13 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) * Encode Reply results */ -int +bool nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } -int +bool nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -327,7 +327,7 @@ nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_testrply(xdr, resp); } -int +bool nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; @@ -336,18 +336,18 @@ nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) svcxdr_encode_stats(xdr, resp->status); } -int +bool nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nlm_res *resp = rqstp->rq_resp; if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return 0; + return false; if (!svcxdr_encode_stats(xdr, resp->status)) - return 0; + return false; /* sequence */ if (xdr_stream_encode_u32(xdr, 0) < 0) - return 0; + return false; - return 1; + return true; } diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index c58b3b60bc2c..1b21f88a9808 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -67,9 +67,9 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) * svc_process_common() looks for an XDR encoder to know when * not to drop a Reply. */ -static int nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static bool nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } static __be32 decode_string(struct xdr_stream *xdr, unsigned int *len, diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 23e4c3eb2c38..96733dff354d 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -238,7 +238,7 @@ nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETACL */ -static int +static bool nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; @@ -278,7 +278,7 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) } /* ACCESS */ -static int +static bool nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessres *resp = rqstp->rq_resp; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 21aacba88ea3..350fae92ae04 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -164,7 +164,7 @@ nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETACL */ -static int +static bool nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; @@ -176,14 +176,14 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) int w; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: inode = d_inode(dentry); if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return 0; + return false; base = (char *)xdr->p - (char *)head->iov_base; @@ -192,7 +192,7 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 0; + return false; w -= PAGE_SIZE; } @@ -205,18 +205,18 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) resp->mask & NFS_DFACL, NFS_ACL_DEFAULT); if (n <= 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* SETACL */ -static int +static bool nfs3svc_encode_setaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 63f0be4e44f7..c3ac1b6aa3aa 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -812,26 +812,26 @@ nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETATTR */ -int +bool nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: lease_get_mtime(d_inode(resp->fh.fh_dentry), &resp->stat.mtime); if (!svcxdr_encode_fattr3(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; } - return 1; + return true; } /* SETATTR, REMOVE, RMDIR */ -int +bool nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_attrstat *resp = rqstp->rq_resp; @@ -841,166 +841,166 @@ nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) } /* LOOKUP */ -int +bool nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return 0; + return false; } - return 1; + return true; } /* ACCESS */ -int +bool nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_accessres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* READLINK */ -int +bool nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, 0, resp->len); if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* READ */ -int +bool nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->eof) < 0) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, resp->count); if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* WRITE */ -int +bool nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_writeres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->committed) < 0) - return 0; + return false; if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* CREATE, MKDIR, SYMLINK, MKNOD */ -int +bool nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_fh3(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return 0; + return false; } - return 1; + return true; } /* RENAME */ -int +bool nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_renameres *resp = rqstp->rq_resp; @@ -1011,7 +1011,7 @@ nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) } /* LINK */ -int +bool nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_linkres *resp = rqstp->rq_resp; @@ -1022,33 +1022,33 @@ nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) } /* READDIR */ -int +bool nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) - return 0; + return false; xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } static __be32 @@ -1275,26 +1275,26 @@ svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, } /* FSSTAT */ -int +bool nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_fsstatres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_fsstat3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; } - return 1; + return true; } static bool @@ -1321,26 +1321,26 @@ svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, } /* FSINFO */ -int +bool nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_fsinfores *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_fsinfo3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; } - return 1; + return true; } static bool @@ -1363,49 +1363,49 @@ svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, } /* PATHCONF */ -int +bool nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_pathconfres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; if (!svcxdr_encode_pathconf3resok(xdr, resp)) - return 0; + return false; break; default: if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return 0; + return false; } - return 1; + return true; } /* COMMIT */ -int +bool nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_commitres *resp = rqstp->rq_resp; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return 0; + return false; break; default: if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return 0; + return false; } - return 1; + return true; } /* diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 697d61819a59..ba2ed12df206 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5437,7 +5437,7 @@ nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return nfsd4_decode_compound(args); } -int +bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; @@ -5463,7 +5463,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) *p++ = htonl(resp->opcnt); nfsd4_sequence_done(resp); - return 1; + return true; } /* diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 345f8247d5da..498e5a489826 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -80,7 +80,7 @@ struct nfsd_voidargs { }; struct nfsd_voidres { }; bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_voidres(struct svc_rqst *rqstp, +bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 195f2bcc6538..7df1505425ed 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1078,12 +1078,12 @@ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) * @xdr: XDR stream into which to encode results * * Return values: - * %0: Local error while encoding - * %1: Encoding was successful + * %false: Local error while encoding + * %true: Encoding was successful */ -int nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } int nfsd_pool_stats_open(struct inode *inode, struct file *file) diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index 6aa8138ae2f7..aba8520b4b8b 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -414,7 +414,7 @@ nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) * XDR encode functions */ -int +bool nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_stat *resp = rqstp->rq_resp; @@ -422,110 +422,110 @@ nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) return svcxdr_encode_stat(xdr, resp->status); } -int +bool nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_attrstat *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; } - return 1; + return true; } -int +bool nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_diropres *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fhandle(xdr, &resp->fh)) - return 0; + return false; if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; break; } - return 1; + return true; } -int +bool nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readlinkres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return 0; + return false; xdr_write_pages(xdr, &resp->page, 0, resp->len); if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return 0; + return false; break; } - return 1; + return true; } -int +bool nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readres *resp = rqstp->rq_resp; struct kvec *head = rqstp->rq_res.head; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return 0; + return false; xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, resp->count); if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return 0; + return false; break; } - return 1; + return true; } -int +bool nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_readdirres *resp = rqstp->rq_resp; struct xdr_buf *dirlist = &resp->dirlist; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); /* no more entries */ if (xdr_stream_encode_item_absent(xdr) < 0) - return 0; + return false; if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return 0; + return false; break; } - return 1; + return true; } -int +bool nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd_statfsres *resp = rqstp->rq_resp; @@ -533,12 +533,12 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) __be32 *p; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: p = xdr_reserve_space(xdr, XDR_UNIT * 5); if (!p) - return 0; + return false; *p++ = cpu_to_be32(NFSSVC_MAXBLKSIZE_V2); *p++ = cpu_to_be32(stat->f_bsize); *p++ = cpu_to_be32(stat->f_blocks); @@ -547,7 +547,7 @@ nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) break; } - return 1; + return true; } /** diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index bff7258041fc..852f71580bd0 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -152,13 +152,13 @@ bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); int nfssvc_encode_entry(void *data, const char *name, int namlen, diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index bb017fc7cba1..03fe4e21306c 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -281,21 +281,21 @@ bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr); void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 5b343d0c6963..4a298ac5515d 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -758,7 +758,7 @@ set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op); diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index d8bd26a5525e..398f70093cd3 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -106,9 +106,9 @@ bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); #endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 50677be3557d..9a6b55da8fd6 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -32,10 +32,10 @@ bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -int nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); +bool nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); extern const struct rpc_version nlm_version4; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c6a0b4364f4a..15ecd5b1abac 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -460,7 +460,7 @@ struct svc_procedure { bool (*pc_decode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR encode result: */ - int (*pc_encode)(struct svc_rqst *rqstp, + bool (*pc_encode)(struct svc_rqst *rqstp, struct xdr_stream *xdr); /* XDR free result: */ void (*pc_release)(struct svc_rqst *); From 022723fe15074fd5132a4c55b1d98753d70bc53f Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Fri, 15 Oct 2021 14:42:11 -0400 Subject: [PATCH 0699/1565] nfsd: update create verifier comment [ Upstream commit 2336d696862186fd4a6ddd1ea0cb243b3e32847c ] I don't know if that Solaris behavior matters any more or if it's still possible to look up that bug ID any more. The XFS behavior's definitely still relevant, though; any but the most recent XFS filesystems will lose the top bits. Reported-by: Frank S. Filz Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 05b5f7e241e7..5b0abdf8de27 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1431,7 +1431,8 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nfsd_create_is_exclusive(createmode)) { /* solaris7 gets confused (bugid 4218508) if these have - * the high bit set, so just clear the high bits. If this is + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. If this is * ever changed to use different attrs for storing the * verifier, then do_open_lookup() will also need to be fixed * accordingly. From f4e9e9565e4240ba7bb7d903adbbc96ce758814f Mon Sep 17 00:00:00 2001 From: Changcheng Deng Date: Tue, 19 Oct 2021 04:14:22 +0000 Subject: [PATCH 0700/1565] NFSD:fix boolreturn.cocci warning [ Upstream commit 291cd656da04163f4bba67953c1f2f823e0d1231 ] ./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function 'nfssvc_decode_voidarg' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Reported-by: Zeal Robot Signed-off-by: Changcheng Deng Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 7df1505425ed..408cff8fe32d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1069,7 +1069,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) */ bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) { - return 1; + return true; } /** From 88ccda1a814323f6d7934a76bea43a2331ab1994 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 26 Oct 2021 12:56:55 -0400 Subject: [PATCH 0701/1565] nfsd4: remove obselete comment [ Upstream commit 80479eb862102f9513e93fcf726c78cc0be2e3b2 ] Mandatory locking has been removed. And the rest of this comment is redundant with the code. Reported-by: Jeff layton Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5b0abdf8de27..deaf4a50550d 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -751,9 +751,6 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, path.dentry = fhp->fh_dentry; inode = d_inode(path.dentry); - /* Disallow write access to files with the append-only bit set - * or any access when mandatory locking enabled - */ err = nfserr_perm; if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE)) goto out; From 1abf3ec5587741cbc248c756a1b1bf933ad79506 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 14 Nov 2021 15:16:04 -0500 Subject: [PATCH 0702/1565] NFSD: Fix exposure in nfsd4_decode_bitmap() [ Upstream commit c0019b7db1d7ac62c711cda6b357a659d46428fe ] rtm@csail.mit.edu reports: > nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC > directs it to do so. This can cause nfsd4_decode_state_protect4_a() > to write client-supplied data beyond the end of > nfsd4_exchange_id.spo_must_allow[] when called by > nfsd4_decode_exchange_id(). Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond @bmlen. Reported by: rtm@csail.mit.edu Fixes: d1c263a031e8 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()") Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index ba2ed12df206..d9758232a398 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -288,11 +288,8 @@ nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) p = xdr_inline_decode(argp->xdr, count << 2); if (!p) return nfserr_bad_xdr; - i = 0; - while (i < count) - bmval[i++] = be32_to_cpup(p++); - while (i < bmlen) - bmval[i++] = 0; + for (i = 0; i < bmlen; i++) + bmval[i] = (i < count) ? be32_to_cpup(p++) : 0; return nfs_ok; } From 9e291a6a28d32545ed2fd959a8165144d1724df1 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 16 Dec 2021 11:12:11 -0500 Subject: [PATCH 0703/1565] NFSD: Fix READDIR buffer overflow [ Upstream commit 53b1119a6e5028b125f431a0116ba73510d82a72 ] If a client sends a READDIR count argument that is too small (say, zero), then the buffer size calculation in the new init_dirlist helper functions results in an underflow, allowing the XDR stream functions to write beyond the actual buffer. This calculation has always been suspect. NFSD has never sanity- checked the READDIR count argument, but the old entry encoders managed the problem correctly. With the commits below, entry encoding changed, exposing the underflow to the pointer arithmetic in xdr_reserve_space(). Modern NFS clients attempt to retrieve as much data as possible for each READDIR request. Also, we have no unit tests that exercise the behavior of READDIR at the lower bound of @count values. Thus this case was missed during testing. Reported-by: Anatoly Trosinenko Fixes: f5dcccd647da ("NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream") Fixes: 7f87fc2d34d4 ("NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 11 ++++------- fs/nfsd/nfsproc.c | 8 ++++---- 2 files changed, 8 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 5abb5c8e2cd2..d26271c60ab4 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -443,22 +443,19 @@ nfsd3_proc_link(struct svc_rqst *rqstp) static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd3_readdirres *resp, - int count) + u32 count) { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr; - count = min_t(u32, count, svc_max_payload(rqstp)); + count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); memset(buf, 0, sizeof(*buf)); /* Reserve room for the NULL ptr & eof flag (-2 words) */ buf->buflen = count - XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; - while (count > 0) { - rqstp->rq_next_page++; - count -= PAGE_SIZE; - } + rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT; /* This is xdr_init_encode(), but it assumes that * the head kvec has already been consumed. */ @@ -467,7 +464,7 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, xdr->page_ptr = buf->pages; xdr->iov = NULL; xdr->p = page_address(*buf->pages); - xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); xdr->rqst = NULL; } diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 5a84aed17e70..8ee329bc3815 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -556,17 +556,17 @@ nfsd_proc_rmdir(struct svc_rqst *rqstp) static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct nfsd_readdirres *resp, - int count) + u32 count) { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr; - count = min_t(u32, count, PAGE_SIZE); + count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); memset(buf, 0, sizeof(*buf)); /* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - sizeof(__be32) * 2; + buf->buflen = count - XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++; @@ -577,7 +577,7 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, xdr->page_ptr = buf->pages; xdr->iov = NULL; xdr->p = page_address(*buf->pages); - xdr->end = xdr->p + (PAGE_SIZE >> 2); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); xdr->rqst = NULL; } From c0a5f0b561c8ab2af72ab23da5a6e13d6b5ee154 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:27 +0200 Subject: [PATCH 0704/1565] fsnotify: clarify object type argument [ Upstream commit ad69cd9972e79aba103ba5365de0acd35770c265 ] In preparation for separating object type from iterator type, rename some 'type' arguments in functions to 'obj_type' and remove the unused interface to clear marks by object type mask. Link: https://lore.kernel.org/r/20211129201537.1932819-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 8 ++++---- fs/notify/group.c | 2 +- fs/notify/mark.c | 27 +++++++++++++++------------ include/linux/fsnotify_backend.h | 28 ++++++++++++---------------- 4 files changed, 32 insertions(+), 33 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index e6860c55b3dc..2f78999a7aa3 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1052,7 +1052,7 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, - unsigned int type, + unsigned int obj_type, __kernel_fsid_t *fsid) { struct ucounts *ucounts = group->fanotify_data.ucounts; @@ -1075,7 +1075,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, } fsnotify_init_mark(mark, group); - ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid); if (ret) { fsnotify_put_mark(mark); goto out_dec_ucounts; @@ -1100,7 +1100,7 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group) } static int fanotify_add_mark(struct fsnotify_group *group, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, __u32 mask, unsigned int flags, __kernel_fsid_t *fsid) { @@ -1111,7 +1111,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsn_mark = fanotify_add_new_mark(group, connp, type, fsid); + fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid); if (IS_ERR(fsn_mark)) { mutex_unlock(&group->mark_mutex); return PTR_ERR(fsn_mark); diff --git a/fs/notify/group.c b/fs/notify/group.c index 6a297efc4788..b7d4d64f87c2 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -58,7 +58,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) fsnotify_group_stop_queueing(group); /* Clear all marks for this group and queue them for destruction */ - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_ALL_TYPES_MASK); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_ANY); /* * Some marks can still be pinned when waiting for response from diff --git a/fs/notify/mark.c b/fs/notify/mark.c index bea106fac090..7c0946e16918 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -496,7 +496,7 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b) } static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, - unsigned int type, + unsigned int obj_type, __kernel_fsid_t *fsid) { struct inode *inode = NULL; @@ -507,7 +507,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, return -ENOMEM; spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); - conn->type = type; + conn->type = obj_type; conn->obj = connp; /* Cache fsid of filesystem containing the object */ if (fsid) { @@ -572,7 +572,8 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( * priority, highest number first, and then by the group's location in memory. */ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, + unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_mark *lmark, *last = NULL; @@ -580,7 +581,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, int cmp; int err = 0; - if (WARN_ON(!fsnotify_valid_obj_type(type))) + if (WARN_ON(!fsnotify_valid_obj_type(obj_type))) return -EINVAL; /* Backend is expected to check for zero fsid (e.g. tmpfs) */ @@ -592,7 +593,8 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, conn = fsnotify_grab_connector(connp); if (!conn) { spin_unlock(&mark->lock); - err = fsnotify_attach_connector_to_object(connp, type, fsid); + err = fsnotify_attach_connector_to_object(connp, obj_type, + fsid); if (err) return err; goto restart; @@ -665,7 +667,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, * event types should be delivered to which group. */ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_group *group = mark->group; @@ -686,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock); - ret = fsnotify_add_mark_list(mark, connp, type, allow_dups, fsid); + ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid); if (ret) goto err; @@ -706,13 +708,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, } int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int type, int allow_dups, __kernel_fsid_t *fsid) + unsigned int obj_type, int allow_dups, + __kernel_fsid_t *fsid) { int ret; struct fsnotify_group *group = mark->group; mutex_lock(&group->mark_mutex); - ret = fsnotify_add_mark_locked(mark, connp, type, allow_dups, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid); mutex_unlock(&group->mark_mutex); return ret; } @@ -747,14 +750,14 @@ EXPORT_SYMBOL_GPL(fsnotify_find_mark); /* Clear any marks in a group with given type mask */ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, - unsigned int type_mask) + unsigned int obj_type) { struct fsnotify_mark *lmark, *mark; LIST_HEAD(to_free); struct list_head *head = &to_free; /* Skip selection step if we want to clear all marks. */ - if (type_mask == FSNOTIFY_OBJ_ALL_TYPES_MASK) { + if (obj_type == FSNOTIFY_OBJ_TYPE_ANY) { head = &group->marks_list; goto clear; } @@ -769,7 +772,7 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, */ mutex_lock(&group->mark_mutex); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { - if ((1U << mark->connector->type) & type_mask) + if (mark->connector->type == obj_type) list_move(&mark->g_list, &to_free); } mutex_unlock(&group->mark_mutex); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 51ef2b079bfa..b9c84b1dbcc8 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -338,6 +338,7 @@ static inline struct fs_error_report *fsnotify_data_error_report( } enum fsnotify_obj_type { + FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, FSNOTIFY_OBJ_TYPE_PARENT, FSNOTIFY_OBJ_TYPE_VFSMOUNT, @@ -346,15 +347,9 @@ enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT }; -#define FSNOTIFY_OBJ_TYPE_INODE_FL (1U << FSNOTIFY_OBJ_TYPE_INODE) -#define FSNOTIFY_OBJ_TYPE_PARENT_FL (1U << FSNOTIFY_OBJ_TYPE_PARENT) -#define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL (1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT) -#define FSNOTIFY_OBJ_TYPE_SB_FL (1U << FSNOTIFY_OBJ_TYPE_SB) -#define FSNOTIFY_OBJ_ALL_TYPES_MASK ((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1) - -static inline bool fsnotify_valid_obj_type(unsigned int type) +static inline bool fsnotify_valid_obj_type(unsigned int obj_type) { - return (type < FSNOTIFY_OBJ_TYPE_COUNT); + return (obj_type < FSNOTIFY_OBJ_TYPE_COUNT); } struct fsnotify_iter_info { @@ -387,7 +382,7 @@ static inline void fsnotify_iter_set_report_type_mark( static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & FSNOTIFY_OBJ_TYPE_##NAME##_FL) ? \ + return (iter_info->report_mask & (1U << FSNOTIFY_OBJ_TYPE_##NAME)) ? \ iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ } @@ -604,11 +599,11 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, __kernel_fsid_t *fsid); /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int type, + fsnotify_connp_t *connp, unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int type, int allow_dups, + unsigned int obj_type, int allow_dups, __kernel_fsid_t *fsid); /* attach the mark to the inode */ @@ -637,22 +632,23 @@ extern void fsnotify_detach_mark(struct fsnotify_mark *mark); extern void fsnotify_free_mark(struct fsnotify_mark *mark); /* Wait until all marks queued for destruction are destroyed */ extern void fsnotify_wait_marks_destroyed(void); -/* run all the marks in a group, and clear all of the marks attached to given object type */ -extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned int type); +/* Clear all of the marks of a group attached to a given object type */ +extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, + unsigned int obj_type); /* run all the marks in a group, and clear all of the vfsmount marks */ static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT); } /* run all the marks in a group, and clear all of the inode marks */ static inline void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE); } /* run all the marks in a group, and clear all of the sn marks */ static inline void fsnotify_clear_sb_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB_FL); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB); } extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark); From be14cab43ddf78ba1cd34663ef6edf4b97e6a6b8 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:28 +0200 Subject: [PATCH 0705/1565] fsnotify: separate mark iterator type from object type enum [ Upstream commit 1c9007d62bea6fd164285314f7553f73e5308863 ] They are two different types that use the same enum, so this confusing. Use the object type to indicate the type of object mark is attached to and the iter type to indicate the type of watch. A group can have two different watches of the same object type (parent and child watches) that match the same event. Link: https://lore.kernel.org/r/20211129201537.1932819-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 6 ++--- fs/notify/fsnotify.c | 18 +++++++------- fs/notify/mark.c | 4 ++-- include/linux/fsnotify_backend.h | 41 ++++++++++++++++++++++---------- 4 files changed, 42 insertions(+), 27 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index b6091775aa6e..652fe84cb8ac 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -299,7 +299,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; } - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -318,7 +318,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, * If the event is on a child and this mark is on a parent not * watching children, don't send it! */ - if (type == FSNOTIFY_OBJ_TYPE_PARENT && + if (type == FSNOTIFY_ITER_TYPE_PARENT && !(mark->mask & FS_EVENT_ON_CHILD)) continue; @@ -746,7 +746,7 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) int type; __kernel_fsid_t fsid = {}; - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { struct fsnotify_mark_connector *conn; if (!fsnotify_iter_should_report_type(iter_info, type)) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 4034ca566f95..0c94457c625e 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -330,7 +330,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type, /* clear ignored on inode modification */ if (mask & FS_MODIFY) { - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -340,7 +340,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type, } } - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (!fsnotify_iter_should_report_type(iter_info, type)) continue; mark = iter_info->marks[type]; @@ -405,7 +405,7 @@ static unsigned int fsnotify_iter_select_report_types( int type; /* Choose max prio group among groups of all queue heads */ - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; if (mark && fsnotify_compare_groups(max_prio_group, mark->group) > 0) @@ -417,7 +417,7 @@ static unsigned int fsnotify_iter_select_report_types( /* Set the report mask for marks from same group as max prio group */ iter_info->report_mask = 0; - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; if (mark && fsnotify_compare_groups(max_prio_group, mark->group) == 0) @@ -435,7 +435,7 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) { int type; - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { if (fsnotify_iter_should_report_type(iter_info, type)) iter_info->marks[type] = fsnotify_next_mark(iter_info->marks[type]); @@ -519,18 +519,18 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - iter_info.marks[FSNOTIFY_OBJ_TYPE_SB] = + iter_info.marks[FSNOTIFY_ITER_TYPE_SB] = fsnotify_first_mark(&sb->s_fsnotify_marks); if (mnt) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = + iter_info.marks[FSNOTIFY_ITER_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } if (inode) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = + iter_info.marks[FSNOTIFY_ITER_TYPE_INODE] = fsnotify_first_mark(&inode->i_fsnotify_marks); } if (parent) { - iter_info.marks[FSNOTIFY_OBJ_TYPE_PARENT] = + iter_info.marks[FSNOTIFY_ITER_TYPE_PARENT] = fsnotify_first_mark(&parent->i_fsnotify_marks); } diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 7c0946e16918..b42629d2fc1c 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -353,7 +353,7 @@ bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) { int type; - fsnotify_foreach_obj_type(type) { + fsnotify_foreach_iter_type(type) { /* This can fail if mark is being removed */ if (!fsnotify_get_mark_safe(iter_info->marks[type])) { __release(&fsnotify_mark_srcu); @@ -382,7 +382,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) int type; iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - fsnotify_foreach_obj_type(type) + fsnotify_foreach_iter_type(type) fsnotify_put_mark_wake(iter_info->marks[type]); } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b9c84b1dbcc8..73739fee1710 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -337,10 +337,25 @@ static inline struct fs_error_report *fsnotify_data_error_report( } } +/* + * Index to merged marks iterator array that correlates to a type of watch. + * The type of watched object can be deduced from the iterator type, but not + * the other way around, because an event can match different watched objects + * of the same object type. + * For example, both parent and child are watching an object of type inode. + */ +enum fsnotify_iter_type { + FSNOTIFY_ITER_TYPE_INODE, + FSNOTIFY_ITER_TYPE_VFSMOUNT, + FSNOTIFY_ITER_TYPE_SB, + FSNOTIFY_ITER_TYPE_PARENT, + FSNOTIFY_ITER_TYPE_COUNT +}; + +/* The type of object that a mark is attached to */ enum fsnotify_obj_type { FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, - FSNOTIFY_OBJ_TYPE_PARENT, FSNOTIFY_OBJ_TYPE_VFSMOUNT, FSNOTIFY_OBJ_TYPE_SB, FSNOTIFY_OBJ_TYPE_COUNT, @@ -353,37 +368,37 @@ static inline bool fsnotify_valid_obj_type(unsigned int obj_type) } struct fsnotify_iter_info { - struct fsnotify_mark *marks[FSNOTIFY_OBJ_TYPE_COUNT]; + struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; unsigned int report_mask; int srcu_idx; }; static inline bool fsnotify_iter_should_report_type( - struct fsnotify_iter_info *iter_info, int type) + struct fsnotify_iter_info *iter_info, int iter_type) { - return (iter_info->report_mask & (1U << type)); + return (iter_info->report_mask & (1U << iter_type)); } static inline void fsnotify_iter_set_report_type( - struct fsnotify_iter_info *iter_info, int type) + struct fsnotify_iter_info *iter_info, int iter_type) { - iter_info->report_mask |= (1U << type); + iter_info->report_mask |= (1U << iter_type); } static inline void fsnotify_iter_set_report_type_mark( - struct fsnotify_iter_info *iter_info, int type, + struct fsnotify_iter_info *iter_info, int iter_type, struct fsnotify_mark *mark) { - iter_info->marks[type] = mark; - iter_info->report_mask |= (1U << type); + iter_info->marks[iter_type] = mark; + iter_info->report_mask |= (1U << iter_type); } #define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & (1U << FSNOTIFY_OBJ_TYPE_##NAME)) ? \ - iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ + return (iter_info->report_mask & (1U << FSNOTIFY_ITER_TYPE_##NAME)) ? \ + iter_info->marks[FSNOTIFY_ITER_TYPE_##NAME] : NULL; \ } FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -391,8 +406,8 @@ FSNOTIFY_ITER_FUNCS(parent, PARENT) FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) FSNOTIFY_ITER_FUNCS(sb, SB) -#define fsnotify_foreach_obj_type(type) \ - for (type = 0; type < FSNOTIFY_OBJ_TYPE_COUNT; type++) +#define fsnotify_foreach_iter_type(type) \ + for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) /* * fsnotify_connp_t is what we embed in objects which connector can be attached From 4e59c7b3e3b666822144fcb857c5009d921cda30 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:29 +0200 Subject: [PATCH 0706/1565] fanotify: introduce group flag FAN_REPORT_TARGET_FID [ Upstream commit d61fd650e9d206a71fda789f02a1ced4b19944c4 ] FAN_REPORT_FID is ambiguous in that it reports the fid of the child for some events and the fid of the parent for create/delete/move events. The new FAN_REPORT_TARGET_FID flag is an implicit request to report the fid of the target object of the operation (a.k.a the child inode) also in create/delete/move events in addition to the fid of the parent and the name of the child. To reduce the test matrix for uninteresting use cases, the new FAN_REPORT_TARGET_FID flag requires both FAN_REPORT_NAME and FAN_REPORT_FID. The convenience macro FAN_REPORT_DFID_NAME_TARGET combines FAN_REPORT_TARGET_FID with all the required flags. Link: https://lore.kernel.org/r/20211129201537.1932819-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 48 ++++++++++++++++++++++-------- fs/notify/fanotify/fanotify_user.c | 11 ++++++- include/linux/fanotify.h | 2 +- include/uapi/linux/fanotify.h | 4 +++ 4 files changed, 51 insertions(+), 14 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 652fe84cb8ac..85e542b164c8 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -458,17 +458,41 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, } /* - * The inode to use as identifier when reporting fid depends on the event. - * Report the modified directory inode on dirent modification events. - * Report the "victim" inode otherwise. + * FAN_REPORT_FID is ambiguous in that it reports the fid of the child for + * some events and the fid of the parent for create/delete/move events. + * + * With the FAN_REPORT_TARGET_FID flag, the fid of the child is reported + * also in create/delete/move events in addition to the fid of the parent + * and the name of the child. + */ +static inline bool fanotify_report_child_fid(unsigned int fid_mode, u32 mask) +{ + if (mask & ALL_FSNOTIFY_DIRENT_EVENTS) + return (fid_mode & FAN_REPORT_TARGET_FID); + + return (fid_mode & FAN_REPORT_FID) && !(mask & FAN_ONDIR); +} + +/* + * The inode to use as identifier when reporting fid depends on the event + * and the group flags. + * + * With the group flag FAN_REPORT_TARGET_FID, always report the child fid. + * + * Without the group flag FAN_REPORT_TARGET_FID, report the modified directory + * fid on dirent events and the child fid otherwise. + * * For example: - * FS_ATTRIB reports the child inode even if reported on a watched parent. - * FS_CREATE reports the modified dir inode and not the created inode. + * FS_ATTRIB reports the child fid even if reported on a watched parent. + * FS_CREATE reports the modified dir fid without FAN_REPORT_TARGET_FID. + * and reports the created child fid with FAN_REPORT_TARGET_FID. */ static struct inode *fanotify_fid_inode(u32 event_mask, const void *data, - int data_type, struct inode *dir) + int data_type, struct inode *dir, + unsigned int fid_mode) { - if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) + if ((event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) && + !(fid_mode & FAN_REPORT_TARGET_FID)) return dir; return fsnotify_data_inode(data, data_type); @@ -647,10 +671,11 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; - struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + struct inode *id = fanotify_fid_inode(mask, data, data_type, dir, + fid_mode); struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct mem_cgroup *old_memcg; struct inode *child = NULL; bool name_event = false; @@ -660,11 +685,10 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* - * With both flags FAN_REPORT_DIR_FID and FAN_REPORT_FID, we - * report the child fid for events reported on a non-dir child + * For certain events and group flags, report the child fid * in addition to reporting the parent fid and maybe child name. */ - if ((fid_mode & FAN_REPORT_FID) && id != dirid && !ondir) + if (fanotify_report_child_fid(fid_mode, mask) && id != dirid) child = id; id = dirid; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 2f78999a7aa3..6b058d652f47 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1270,6 +1270,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL; + /* + * FAN_REPORT_TARGET_FID requires FAN_REPORT_NAME and FAN_REPORT_FID + * and is used as an indication to report both dir and child fid on all + * dirent events. + */ + if ((fid_mode & FAN_REPORT_TARGET_FID) && + (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID))) + return -EINVAL; + f_flags = O_RDWR | FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; @@ -1680,7 +1689,7 @@ static int __init fanotify_user_setup(void) FANOTIFY_DEFAULT_MAX_USER_MARKS); BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 616af2ea20f3..376e050e6f38 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -25,7 +25,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES) -#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) +#define FANOTIFY_FID_BITS (FAN_REPORT_DFID_NAME_TARGET) #define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD) diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index bd1932c2074d..60f73639a896 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -57,9 +57,13 @@ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ #define FAN_REPORT_NAME 0x00000800 /* Report events with name */ +#define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */ /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */ #define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME) +/* Convenience macro - FAN_REPORT_TARGET_FID requires all other FID flags */ +#define FAN_REPORT_DFID_NAME_TARGET (FAN_REPORT_DFID_NAME | \ + FAN_REPORT_FID | FAN_REPORT_TARGET_FID) /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ From 580eb8de84704fe37a34b8a974f22a11ed385300 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:30 +0200 Subject: [PATCH 0707/1565] fsnotify: generate FS_RENAME event with rich information [ Upstream commit e54183fa7047c15819bc155f4c58501d9a9a3489 ] The dnotify FS_DN_RENAME event is used to request notification about a move within the same parent directory and was always coupled with the FS_MOVED_FROM event. Rename the FS_DN_RENAME event flag to FS_RENAME, decouple it from FS_MOVED_FROM and report it with the moved dentry instead of the moved inode, so it has the information about both old and new parent and name. Generate the FS_RENAME event regardless of same parent dir and apply the "same parent" rule in the generic fsnotify_handle_event() helper that is used to call backends with ->handle_inode_event() method (i.e. dnotify). The ->handle_inode_event() method is not rich enough to report both old and new parent and name anyway. The enriched event is reported to fanotify over the ->handle_event() method with the old and new dir inode marks in marks array slots for ITER_TYPE_INODE and a new iter type slot ITER_TYPE_INODE2. The enriched event will be used for reporting old and new parent+name to fanotify groups with FAN_RENAME events. Link: https://lore.kernel.org/r/20211129201537.1932819-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/dnotify/dnotify.c | 2 +- fs/notify/fsnotify.c | 37 +++++++++++++++++++++++++------- include/linux/dnotify.h | 2 +- include/linux/fsnotify.h | 9 +++++--- include/linux/fsnotify_backend.h | 7 +++--- 5 files changed, 41 insertions(+), 16 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e85e13c50d6d..d5ebebb034ff 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -196,7 +196,7 @@ static __u32 convert_arg(unsigned long arg) if (arg & DN_ATTRIB) new_mask |= FS_ATTRIB; if (arg & DN_RENAME) - new_mask |= FS_DN_RENAME; + new_mask |= FS_RENAME; if (arg & DN_CREATE) new_mask |= (FS_CREATE | FS_MOVED_TO); diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 0c94457c625e..ab81a0776ece 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -279,6 +279,18 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, WARN_ON_ONCE(fsnotify_iter_vfsmount_mark(iter_info))) return 0; + /* + * For FS_RENAME, 'dir' is old dir and 'data' is new dentry. + * The only ->handle_inode_event() backend that supports FS_RENAME is + * dnotify, where it means file was renamed within same parent. + */ + if (mask & FS_RENAME) { + struct dentry *moved = fsnotify_data_dentry(data, data_type); + + if (dir != moved->d_parent->d_inode) + return 0; + } + if (parent_mark) { /* * parent_mark indicates that the parent inode is watching @@ -469,7 +481,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, struct super_block *sb = fsnotify_data_sb(data, data_type); struct fsnotify_iter_info iter_info = {}; struct mount *mnt = NULL; - struct inode *parent = NULL; + struct inode *inode2 = NULL; + struct dentry *moved; + int inode2_type; int ret = 0; __u32 test_mask, marks_mask; @@ -479,12 +493,19 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!inode) { /* Dirent event - report on TYPE_INODE to dir */ inode = dir; + /* For FS_RENAME, inode is old_dir and inode2 is new_dir */ + if (mask & FS_RENAME) { + moved = fsnotify_data_dentry(data, data_type); + inode2 = moved->d_parent->d_inode; + inode2_type = FSNOTIFY_ITER_TYPE_INODE2; + } } else if (mask & FS_EVENT_ON_CHILD) { /* * Event on child - report on TYPE_PARENT to dir if it is * watching children and on TYPE_INODE to child. */ - parent = dir; + inode2 = dir; + inode2_type = FSNOTIFY_ITER_TYPE_PARENT; } /* @@ -497,7 +518,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && (!inode || !inode->i_fsnotify_marks) && - (!parent || !parent->i_fsnotify_marks)) + (!inode2 || !inode2->i_fsnotify_marks)) return 0; marks_mask = sb->s_fsnotify_mask; @@ -505,8 +526,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, marks_mask |= mnt->mnt_fsnotify_mask; if (inode) marks_mask |= inode->i_fsnotify_mask; - if (parent) - marks_mask |= parent->i_fsnotify_mask; + if (inode2) + marks_mask |= inode2->i_fsnotify_mask; /* @@ -529,9 +550,9 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, iter_info.marks[FSNOTIFY_ITER_TYPE_INODE] = fsnotify_first_mark(&inode->i_fsnotify_marks); } - if (parent) { - iter_info.marks[FSNOTIFY_ITER_TYPE_PARENT] = - fsnotify_first_mark(&parent->i_fsnotify_marks); + if (inode2) { + iter_info.marks[inode2_type] = + fsnotify_first_mark(&inode2->i_fsnotify_marks); } /* diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h index 0aad774beaec..b87c3b85a166 100644 --- a/include/linux/dnotify.h +++ b/include/linux/dnotify.h @@ -26,7 +26,7 @@ struct dnotify_struct { FS_MODIFY | FS_MODIFY_CHILD |\ FS_ACCESS | FS_ACCESS_CHILD |\ FS_ATTRIB | FS_ATTRIB_CHILD |\ - FS_CREATE | FS_DN_RENAME |\ + FS_CREATE | FS_RENAME |\ FS_MOVED_FROM | FS_MOVED_TO) extern int dir_notify_enable; diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index bec1e23ecf78..bb8467cd11ae 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -144,16 +144,19 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, u32 fs_cookie = fsnotify_get_cookie(); __u32 old_dir_mask = FS_MOVED_FROM; __u32 new_dir_mask = FS_MOVED_TO; + __u32 rename_mask = FS_RENAME; const struct qstr *new_name = &moved->d_name; - if (old_dir == new_dir) - old_dir_mask |= FS_DN_RENAME; - if (isdir) { old_dir_mask |= FS_ISDIR; new_dir_mask |= FS_ISDIR; + rename_mask |= FS_ISDIR; } + /* Event with information about both old and new parent+name */ + fsnotify_name(rename_mask, moved, FSNOTIFY_EVENT_DENTRY, + old_dir, old_name, 0); + fsnotify_name(old_dir_mask, source, FSNOTIFY_EVENT_INODE, old_dir, old_name, fs_cookie); fsnotify_name(new_dir_mask, source, FSNOTIFY_EVENT_INODE, diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 73739fee1710..790c31844db5 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -63,7 +63,7 @@ */ #define FS_EVENT_ON_CHILD 0x08000000 -#define FS_DN_RENAME 0x10000000 /* file renamed */ +#define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ #define FS_IN_ONESHOT 0x80000000 /* only send event once */ @@ -76,7 +76,7 @@ * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event * when a directory entry inside a child subdir changes. */ -#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE) +#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME) #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \ FS_OPEN_EXEC_PERM) @@ -101,7 +101,7 @@ /* Events that can be reported to backends */ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ - FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \ + FS_DELETE_SELF | FS_MOVE_SELF | \ FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ FS_ERROR) @@ -349,6 +349,7 @@ enum fsnotify_iter_type { FSNOTIFY_ITER_TYPE_VFSMOUNT, FSNOTIFY_ITER_TYPE_SB, FSNOTIFY_ITER_TYPE_PARENT, + FSNOTIFY_ITER_TYPE_INODE2, FSNOTIFY_ITER_TYPE_COUNT }; From 4e63ce91997a6ac8b8337e87aafb539590ba9ae9 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:31 +0200 Subject: [PATCH 0708/1565] fanotify: use macros to get the offset to fanotify_info buffer [ Upstream commit 2d9374f095136206a02eb0b6cd9ef94632c1e9f7 ] The fanotify_info buffer contains up to two file handles and a name. Use macros to simplify the code that access the different items within the buffer. Add assertions to verify that stored fh len and name len do not overflow the u8 stored value in fanotify_info header. Remove the unused fanotify_info_len() helper. Link: https://lore.kernel.org/r/20211129201537.1932819-6-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify.h | 41 +++++++++++++++++++++++++---------- 2 files changed, 31 insertions(+), 12 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 85e542b164c8..ffad224be014 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -411,7 +411,7 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, * be zero in that case if encoding fh len failed. */ err = -ENOENT; - if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4)) + if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4) || fh_len > MAX_HANDLE_SZ) goto out_err; /* No external buffer in a variable size allocated fh */ diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index d25f500bf7e7..dd23ba659e76 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -49,6 +49,22 @@ struct fanotify_info { * (optional) file_fh starts at buf[dir_fh_totlen] * name starts at buf[dir_fh_totlen + file_fh_totlen] */ +#define FANOTIFY_DIR_FH_SIZE(info) ((info)->dir_fh_totlen) +#define FANOTIFY_FILE_FH_SIZE(info) ((info)->file_fh_totlen) +#define FANOTIFY_NAME_SIZE(info) ((info)->name_len + 1) + +#define FANOTIFY_DIR_FH_OFFSET(info) 0 +#define FANOTIFY_FILE_FH_OFFSET(info) \ + (FANOTIFY_DIR_FH_OFFSET(info) + FANOTIFY_DIR_FH_SIZE(info)) +#define FANOTIFY_NAME_OFFSET(info) \ + (FANOTIFY_FILE_FH_OFFSET(info) + FANOTIFY_FILE_FH_SIZE(info)) + +#define FANOTIFY_DIR_FH_BUF(info) \ + ((info)->buf + FANOTIFY_DIR_FH_OFFSET(info)) +#define FANOTIFY_FILE_FH_BUF(info) \ + ((info)->buf + FANOTIFY_FILE_FH_OFFSET(info)) +#define FANOTIFY_NAME_BUF(info) \ + ((info)->buf + FANOTIFY_NAME_OFFSET(info)) } __aligned(4); static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -87,7 +103,7 @@ static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *inf { BUILD_BUG_ON(offsetof(struct fanotify_info, buf) % 4); - return (struct fanotify_fh *)info->buf; + return (struct fanotify_fh *)FANOTIFY_DIR_FH_BUF(info); } static inline int fanotify_info_file_fh_len(struct fanotify_info *info) @@ -101,32 +117,35 @@ static inline int fanotify_info_file_fh_len(struct fanotify_info *info) static inline struct fanotify_fh *fanotify_info_file_fh(struct fanotify_info *info) { - return (struct fanotify_fh *)(info->buf + info->dir_fh_totlen); + return (struct fanotify_fh *)FANOTIFY_FILE_FH_BUF(info); } -static inline const char *fanotify_info_name(struct fanotify_info *info) +static inline char *fanotify_info_name(struct fanotify_info *info) { - return info->buf + info->dir_fh_totlen + info->file_fh_totlen; + if (!info->name_len) + return NULL; + + return FANOTIFY_NAME_BUF(info); } static inline void fanotify_info_init(struct fanotify_info *info) { + BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN + MAX_HANDLE_SZ > U8_MAX); + BUILD_BUG_ON(NAME_MAX > U8_MAX); + info->dir_fh_totlen = 0; info->file_fh_totlen = 0; info->name_len = 0; } -static inline unsigned int fanotify_info_len(struct fanotify_info *info) -{ - return info->dir_fh_totlen + info->file_fh_totlen + info->name_len; -} - static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { + if (WARN_ON_ONCE(name->len > NAME_MAX)) + return; + info->name_len = name->len; - strcpy(info->buf + info->dir_fh_totlen + info->file_fh_totlen, - name->name); + strcpy(fanotify_info_name(info), name->name); } /* From 967ae137209ce2d983fe16cb5519ff6b21eecde2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:32 +0200 Subject: [PATCH 0709/1565] fanotify: use helpers to parcel fanotify_info buffer [ Upstream commit 1a9515ac9e55e68d733bab81bd408463ab1e25b1 ] fanotify_info buffer is parceled into variable sized records, so the records must be written in order: dir_fh, file_fh, name. Use helpers to assert that order and make fanotify_alloc_name_event() a bit more generic to allow empty dir_fh record and to allow expanding to more records (i.e. name2) soon. Link: https://lore.kernel.org/r/20211129201537.1932819-7-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 35 +++++++++++++++++++---------------- fs/notify/fanotify/fanotify.h | 20 ++++++++++++++++++++ 2 files changed, 39 insertions(+), 16 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index ffad224be014..2b13c79cebc6 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -576,7 +576,7 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, return &ffe->fae; } -static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, +static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, __kernel_fsid_t *fsid, const struct qstr *name, struct inode *child, @@ -586,15 +586,17 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh, *ffh; - unsigned int dir_fh_len = fanotify_encode_fh_len(id); + unsigned int dir_fh_len = fanotify_encode_fh_len(dir); unsigned int child_fh_len = fanotify_encode_fh_len(child); - unsigned int size; + unsigned long name_len = name ? name->len : 0; + unsigned int len, size; - size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; + /* Reserve terminating null byte even for empty name */ + size = sizeof(*fne) + name_len + 1; + if (dir_fh_len) + size += FANOTIFY_FH_HDR_LEN + dir_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; - if (name) - size += name->len + 1; fne = kmalloc(size, gfp); if (!fne) return NULL; @@ -604,22 +606,23 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, *hash ^= fanotify_hash_fsid(fsid); info = &fne->info; fanotify_info_init(info); - dfh = fanotify_info_dir_fh(info); - info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, hash, 0); + if (dir_fh_len) { + dfh = fanotify_info_dir_fh(info); + len = fanotify_encode_fh(dfh, dir, dir_fh_len, hash, 0); + fanotify_info_set_dir_fh(info, len); + } if (child_fh_len) { ffh = fanotify_info_file_fh(info); - info->file_fh_totlen = fanotify_encode_fh(ffh, child, - child_fh_len, hash, 0); + len = fanotify_encode_fh(ffh, child, child_fh_len, hash, 0); + fanotify_info_set_file_fh(info, len); } - if (name) { - long salt = name->len; - + if (name_len) { fanotify_info_copy_name(info, name); - *hash ^= full_name_hash((void *)salt, name->name, name->len); + *hash ^= full_name_hash((void *)name_len, name->name, name_len); } - pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", - __func__, id->i_ino, size, dir_fh_len, child_fh_len, + pr_debug("%s: size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", + __func__, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info)); return &fne->fae; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index dd23ba659e76..7ac6f9f1e414 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -138,6 +138,26 @@ static inline void fanotify_info_init(struct fanotify_info *info) info->name_len = 0; } +/* These set/copy helpers MUST be called by order */ +static inline void fanotify_info_set_dir_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0)) + return; + + info->dir_fh_totlen = totlen; +} + +static inline void fanotify_info_set_file_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->name_len > 0)) + return; + + info->file_fh_totlen = totlen; +} + static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { From f59e978cfa9f20663e618294026676b35c2da033 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:33 +0200 Subject: [PATCH 0710/1565] fanotify: support secondary dir fh and name in fanotify_info [ Upstream commit 3cf984e950c1c3f41d407ed31db33beb996be132 ] Allow storing a secondary dir fh and name tupple in fanotify_info. This will be used to store the new parent and name information in FAN_RENAME event. Link: https://lore.kernel.org/r/20211129201537.1932819-8-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 20 ++++++-- fs/notify/fanotify/fanotify.h | 79 +++++++++++++++++++++++++++--- fs/notify/fanotify/fanotify_user.c | 3 +- 3 files changed, 88 insertions(+), 14 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 2b13c79cebc6..5f184b2d6ea7 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -76,8 +76,10 @@ static bool fanotify_info_equal(struct fanotify_info *info1, struct fanotify_info *info2) { if (info1->dir_fh_totlen != info2->dir_fh_totlen || + info1->dir2_fh_totlen != info2->dir2_fh_totlen || info1->file_fh_totlen != info2->file_fh_totlen || - info1->name_len != info2->name_len) + info1->name_len != info2->name_len || + info1->name2_len != info2->name2_len) return false; if (info1->dir_fh_totlen && @@ -85,14 +87,24 @@ static bool fanotify_info_equal(struct fanotify_info *info1, fanotify_info_dir_fh(info2))) return false; + if (info1->dir2_fh_totlen && + !fanotify_fh_equal(fanotify_info_dir2_fh(info1), + fanotify_info_dir2_fh(info2))) + return false; + if (info1->file_fh_totlen && !fanotify_fh_equal(fanotify_info_file_fh(info1), fanotify_info_file_fh(info2))) return false; - return !info1->name_len || - !memcmp(fanotify_info_name(info1), fanotify_info_name(info2), - info1->name_len); + if (info1->name_len && + memcmp(fanotify_info_name(info1), fanotify_info_name(info2), + info1->name_len)) + return false; + + return !info1->name2_len || + !memcmp(fanotify_info_name2(info1), fanotify_info_name2(info2), + info1->name2_len); } static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 7ac6f9f1e414..8fa3bc0effd4 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -40,31 +40,45 @@ struct fanotify_fh { struct fanotify_info { /* size of dir_fh/file_fh including fanotify_fh hdr size */ u8 dir_fh_totlen; + u8 dir2_fh_totlen; u8 file_fh_totlen; u8 name_len; - u8 pad; + u8 name2_len; + u8 pad[3]; unsigned char buf[]; /* * (struct fanotify_fh) dir_fh starts at buf[0] - * (optional) file_fh starts at buf[dir_fh_totlen] - * name starts at buf[dir_fh_totlen + file_fh_totlen] + * (optional) dir2_fh starts at buf[dir_fh_totlen] + * (optional) file_fh starts at buf[dir_fh_totlen + dir2_fh_totlen] + * name starts at buf[dir_fh_totlen + dir2_fh_totlen + file_fh_totlen] + * ... */ #define FANOTIFY_DIR_FH_SIZE(info) ((info)->dir_fh_totlen) +#define FANOTIFY_DIR2_FH_SIZE(info) ((info)->dir2_fh_totlen) #define FANOTIFY_FILE_FH_SIZE(info) ((info)->file_fh_totlen) #define FANOTIFY_NAME_SIZE(info) ((info)->name_len + 1) +#define FANOTIFY_NAME2_SIZE(info) ((info)->name2_len + 1) #define FANOTIFY_DIR_FH_OFFSET(info) 0 -#define FANOTIFY_FILE_FH_OFFSET(info) \ +#define FANOTIFY_DIR2_FH_OFFSET(info) \ (FANOTIFY_DIR_FH_OFFSET(info) + FANOTIFY_DIR_FH_SIZE(info)) +#define FANOTIFY_FILE_FH_OFFSET(info) \ + (FANOTIFY_DIR2_FH_OFFSET(info) + FANOTIFY_DIR2_FH_SIZE(info)) #define FANOTIFY_NAME_OFFSET(info) \ (FANOTIFY_FILE_FH_OFFSET(info) + FANOTIFY_FILE_FH_SIZE(info)) +#define FANOTIFY_NAME2_OFFSET(info) \ + (FANOTIFY_NAME_OFFSET(info) + FANOTIFY_NAME_SIZE(info)) #define FANOTIFY_DIR_FH_BUF(info) \ ((info)->buf + FANOTIFY_DIR_FH_OFFSET(info)) +#define FANOTIFY_DIR2_FH_BUF(info) \ + ((info)->buf + FANOTIFY_DIR2_FH_OFFSET(info)) #define FANOTIFY_FILE_FH_BUF(info) \ ((info)->buf + FANOTIFY_FILE_FH_OFFSET(info)) #define FANOTIFY_NAME_BUF(info) \ ((info)->buf + FANOTIFY_NAME_OFFSET(info)) +#define FANOTIFY_NAME2_BUF(info) \ + ((info)->buf + FANOTIFY_NAME2_OFFSET(info)) } __aligned(4); static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -106,6 +120,20 @@ static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *inf return (struct fanotify_fh *)FANOTIFY_DIR_FH_BUF(info); } +static inline int fanotify_info_dir2_fh_len(struct fanotify_info *info) +{ + if (!info->dir2_fh_totlen || + WARN_ON_ONCE(info->dir2_fh_totlen < FANOTIFY_FH_HDR_LEN)) + return 0; + + return info->dir2_fh_totlen - FANOTIFY_FH_HDR_LEN; +} + +static inline struct fanotify_fh *fanotify_info_dir2_fh(struct fanotify_info *info) +{ + return (struct fanotify_fh *)FANOTIFY_DIR2_FH_BUF(info); +} + static inline int fanotify_info_file_fh_len(struct fanotify_info *info) { if (!info->file_fh_totlen || @@ -128,31 +156,55 @@ static inline char *fanotify_info_name(struct fanotify_info *info) return FANOTIFY_NAME_BUF(info); } +static inline char *fanotify_info_name2(struct fanotify_info *info) +{ + if (!info->name2_len) + return NULL; + + return FANOTIFY_NAME2_BUF(info); +} + static inline void fanotify_info_init(struct fanotify_info *info) { BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN + MAX_HANDLE_SZ > U8_MAX); BUILD_BUG_ON(NAME_MAX > U8_MAX); info->dir_fh_totlen = 0; + info->dir2_fh_totlen = 0; info->file_fh_totlen = 0; info->name_len = 0; + info->name2_len = 0; } /* These set/copy helpers MUST be called by order */ static inline void fanotify_info_set_dir_fh(struct fanotify_info *info, unsigned int totlen) { - if (WARN_ON_ONCE(info->file_fh_totlen > 0) || - WARN_ON_ONCE(info->name_len > 0)) + if (WARN_ON_ONCE(info->dir2_fh_totlen > 0) || + WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) return; info->dir_fh_totlen = totlen; } +static inline void fanotify_info_set_dir2_fh(struct fanotify_info *info, + unsigned int totlen) +{ + if (WARN_ON_ONCE(info->file_fh_totlen > 0) || + WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) + return; + + info->dir2_fh_totlen = totlen; +} + static inline void fanotify_info_set_file_fh(struct fanotify_info *info, unsigned int totlen) { - if (WARN_ON_ONCE(info->name_len > 0)) + if (WARN_ON_ONCE(info->name_len > 0) || + WARN_ON_ONCE(info->name2_len > 0)) return; info->file_fh_totlen = totlen; @@ -161,13 +213,24 @@ static inline void fanotify_info_set_file_fh(struct fanotify_info *info, static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { - if (WARN_ON_ONCE(name->len > NAME_MAX)) + if (WARN_ON_ONCE(name->len > NAME_MAX) || + WARN_ON_ONCE(info->name2_len > 0)) return; info->name_len = name->len; strcpy(fanotify_info_name(info), name->name); } +static inline void fanotify_info_copy_name2(struct fanotify_info *info, + const struct qstr *name) +{ + if (WARN_ON_ONCE(name->len > NAME_MAX)) + return; + + info->name2_len = name->len; + strcpy(fanotify_info_name2(info), name->name); +} + /* * Common structure for fanotify events. Concrete structs are allocated in * fanotify_handle_event() and freed when the information is retrieved by diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 6b058d652f47..d69570db5efd 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -327,11 +327,10 @@ static int process_access_response(struct fsnotify_group *group, static size_t copy_error_info_to_user(struct fanotify_event *event, char __user *buf, int count) { - struct fanotify_event_info_error info; + struct fanotify_event_info_error info = { }; struct fanotify_error_event *fee = FANOTIFY_EE(event); info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; - info.hdr.pad = 0; info.hdr.len = FANOTIFY_ERROR_INFO_LEN; if (WARN_ON(count < info.hdr.len)) From da527da33bcd5c8a36fadef60ae9569852f2137f Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:34 +0200 Subject: [PATCH 0711/1565] fanotify: record old and new parent and name in FAN_RENAME event [ Upstream commit 3982534ba5ce45e890b2f5ef5e7372c1accd14c7 ] In the special case of FAN_RENAME event, we record both the old and new parent and name. Link: https://lore.kernel.org/r/20211129201537.1932819-9-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 42 +++++++++++++++++++++++++++++++---- include/uapi/linux/fanotify.h | 2 ++ 2 files changed, 40 insertions(+), 4 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 5f184b2d6ea7..db81eab90544 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -592,21 +592,28 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, __kernel_fsid_t *fsid, const struct qstr *name, struct inode *child, + struct dentry *moved, unsigned int *hash, gfp_t gfp) { struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh, *ffh; + struct inode *dir2 = moved ? d_inode(moved->d_parent) : NULL; + const struct qstr *name2 = moved ? &moved->d_name : NULL; unsigned int dir_fh_len = fanotify_encode_fh_len(dir); + unsigned int dir2_fh_len = fanotify_encode_fh_len(dir2); unsigned int child_fh_len = fanotify_encode_fh_len(child); unsigned long name_len = name ? name->len : 0; + unsigned long name2_len = name2 ? name2->len : 0; unsigned int len, size; /* Reserve terminating null byte even for empty name */ - size = sizeof(*fne) + name_len + 1; + size = sizeof(*fne) + name_len + name2_len + 2; if (dir_fh_len) size += FANOTIFY_FH_HDR_LEN + dir_fh_len; + if (dir2_fh_len) + size += FANOTIFY_FH_HDR_LEN + dir2_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; fne = kmalloc(size, gfp); @@ -623,6 +630,11 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, len = fanotify_encode_fh(dfh, dir, dir_fh_len, hash, 0); fanotify_info_set_dir_fh(info, len); } + if (dir2_fh_len) { + dfh = fanotify_info_dir2_fh(info); + len = fanotify_encode_fh(dfh, dir2, dir2_fh_len, hash, 0); + fanotify_info_set_dir2_fh(info, len); + } if (child_fh_len) { ffh = fanotify_info_file_fh(info); len = fanotify_encode_fh(ffh, child, child_fh_len, hash, 0); @@ -632,11 +644,22 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, fanotify_info_copy_name(info, name); *hash ^= full_name_hash((void *)name_len, name->name, name_len); } + if (name2_len) { + fanotify_info_copy_name2(info, name2); + *hash ^= full_name_hash((void *)name2_len, name2->name, + name2_len); + } pr_debug("%s: size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", __func__, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info)); + if (dir2_fh_len) { + pr_debug("%s: dir2_fh_len=%u name2_len=%u name2='%.*s'\n", + __func__, dir2_fh_len, info->name2_len, + info->name2_len, fanotify_info_name2(info)); + } + return &fne->fae; } @@ -692,6 +715,7 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); struct mem_cgroup *old_memcg; + struct dentry *moved = NULL; struct inode *child = NULL; bool name_event = false; unsigned int hash = 0; @@ -727,6 +751,15 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || !ondir) { name_event = true; } + + /* + * In the special case of FAN_RENAME event, we record both + * old and new parent+name. + * 'dirid' and 'file_name' are the old parent+name and + * 'moved' has the new parent+name. + */ + if (mask & FAN_RENAME) + moved = fsnotify_data_dentry(data, data_type); } /* @@ -748,9 +781,9 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, } else if (fanotify_is_error_event(mask)) { event = fanotify_alloc_error_event(group, fsid, data, data_type, &hash); - } else if (name_event && (file_name || child)) { - event = fanotify_alloc_name_event(id, fsid, file_name, child, - &hash, gfp); + } else if (name_event && (file_name || moved || child)) { + event = fanotify_alloc_name_event(dirid, fsid, file_name, child, + moved, &hash, gfp); } else if (fid_mode) { event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); } else { @@ -860,6 +893,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); + BUILD_BUG_ON(FAN_RENAME != FS_RENAME); BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20); diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 60f73639a896..9d0e2dc5767b 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -28,6 +28,8 @@ #define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */ +#define FAN_RENAME 0x10000000 /* File was renamed */ + #define FAN_ONDIR 0x40000000 /* Event occurred against dir */ /* helper events */ From c76fa8515949d5188a85956b57679e29795a6481 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:35 +0200 Subject: [PATCH 0712/1565] fanotify: record either old name new name or both for FAN_RENAME [ Upstream commit 2bfbcccde6e7a787feabad4645f628f963fe0663 ] We do not want to report the dirfid+name of a directory whose inode/sb are not watched, because watcher may not have permissions to see the directory content. Use an internal iter_info to indicate to fanotify_alloc_event() which marks of this group are watching FAN_RENAME, so it can decide if we need to record only the old parent+name, new parent+name or both. Link: https://lore.kernel.org/r/20211129201537.1932819-10-amir73il@gmail.com Signed-off-by: Amir Goldstein [JK: Modified code to pass around only mask of mark types matching generated event] Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 59 ++++++++++++++++++++++++++--------- 1 file changed, 44 insertions(+), 15 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index db81eab90544..14bc0f12cc9f 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -284,8 +284,9 @@ static int fanotify_get_response(struct fsnotify_group *group, */ static u32 fanotify_group_event_mask(struct fsnotify_group *group, struct fsnotify_iter_info *iter_info, - u32 event_mask, const void *data, - int data_type, struct inode *dir) + u32 *match_mask, u32 event_mask, + const void *data, int data_type, + struct inode *dir) { __u32 marks_mask = 0, marks_ignored_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | @@ -335,6 +336,9 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, continue; marks_mask |= mark->mask; + + /* Record the mark types of this group that matched the event */ + *match_mask |= 1U << type; } test_mask = event_mask & marks_mask & ~marks_ignored_mask; @@ -701,11 +705,11 @@ static struct fanotify_event *fanotify_alloc_error_event( return &fee->fae; } -static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, - u32 mask, const void *data, - int data_type, struct inode *dir, - const struct qstr *file_name, - __kernel_fsid_t *fsid) +static struct fanotify_event *fanotify_alloc_event( + struct fsnotify_group *group, + u32 mask, const void *data, int data_type, + struct inode *dir, const struct qstr *file_name, + __kernel_fsid_t *fsid, u32 match_mask) { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; @@ -753,13 +757,36 @@ static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, } /* - * In the special case of FAN_RENAME event, we record both - * old and new parent+name. + * In the special case of FAN_RENAME event, use the match_mask + * to determine if we need to report only the old parent+name, + * only the new parent+name or both. * 'dirid' and 'file_name' are the old parent+name and * 'moved' has the new parent+name. */ - if (mask & FAN_RENAME) - moved = fsnotify_data_dentry(data, data_type); + if (mask & FAN_RENAME) { + bool report_old, report_new; + + if (WARN_ON_ONCE(!match_mask)) + return NULL; + + /* Report both old and new parent+name if sb watching */ + report_old = report_new = + match_mask & (1U << FSNOTIFY_ITER_TYPE_SB); + report_old |= + match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE); + report_new |= + match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE2); + + if (!report_old) { + /* Do not report old parent+name */ + dirid = NULL; + file_name = NULL; + } + if (report_new) { + /* Report new parent+name */ + moved = fsnotify_data_dentry(data, data_type); + } + } } /* @@ -872,6 +899,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, struct fanotify_event *event; struct fsnotify_event *fsn_event; __kernel_fsid_t fsid = {}; + u32 match_mask = 0; BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS); BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY); @@ -897,12 +925,13 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20); - mask = fanotify_group_event_mask(group, iter_info, mask, data, - data_type, dir); + mask = fanotify_group_event_mask(group, iter_info, &match_mask, + mask, data, data_type, dir); if (!mask) return 0; - pr_debug("%s: group=%p mask=%x\n", __func__, group, mask); + pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__, + group, mask, match_mask); if (fanotify_is_perm_event(mask)) { /* @@ -921,7 +950,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, } event = fanotify_alloc_event(group, mask, data, data_type, dir, - file_name, &fsid); + file_name, &fsid, match_mask); ret = -ENOMEM; if (unlikely(!event)) { /* From a860dd8bf5712a3e4a569e36735cc7c4a9b5fa53 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:36 +0200 Subject: [PATCH 0713/1565] fanotify: report old and/or new parent+name in FAN_RENAME event [ Upstream commit 7326e382c21e9c23c89c88369afdc90b82a14da8 ] In the special case of FAN_RENAME event, we report old or new or both old and new parent+name. A single info record will be reported if either the old or new dir is watched and two records will be reported if both old and new dir (or their filesystem) are watched. The old and new parent+name are reported using new info record types FAN_EVENT_INFO_TYPE_{OLD,NEW}_DFID_NAME, so if a single info record is reported, it is clear to the application, to which dir entry the fid+name info is referring to. Link: https://lore.kernel.org/r/20211129201537.1932819-11-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 7 ++++ fs/notify/fanotify/fanotify.h | 18 +++++++++++ fs/notify/fanotify/fanotify_user.c | 52 +++++++++++++++++++++++++++--- include/uapi/linux/fanotify.h | 6 ++++ 4 files changed, 78 insertions(+), 5 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 14bc0f12cc9f..0da305b6f3e2 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -153,6 +153,13 @@ static bool fanotify_should_merge(struct fanotify_event *old, if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) return false; + /* + * FAN_RENAME event is reported with special info record types, + * so we cannot merge it with other events. + */ + if ((old->mask & FAN_RENAME) != (new->mask & FAN_RENAME)) + return false; + switch (old->type) { case FANOTIFY_EVENT_TYPE_PATH: return fanotify_path_equal(fanotify_event_path(old), diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 8fa3bc0effd4..a3d5b751cac5 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -373,6 +373,13 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir_fh_len(info) : 0; } +static inline int fanotify_event_dir2_fh_len(struct fanotify_event *event) +{ + struct fanotify_info *info = fanotify_event_info(event); + + return info ? fanotify_info_dir2_fh_len(info) : 0; +} + static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) { /* For error events, even zeroed fh are reported. */ @@ -386,6 +393,17 @@ static inline bool fanotify_event_has_dir_fh(struct fanotify_event *event) return fanotify_event_dir_fh_len(event) > 0; } +static inline bool fanotify_event_has_dir2_fh(struct fanotify_event *event) +{ + return fanotify_event_dir2_fh_len(event) > 0; +} + +static inline bool fanotify_event_has_any_dir_fh(struct fanotify_event *event) +{ + return fanotify_event_has_dir_fh(event) || + fanotify_event_has_dir2_fh(event); +} + struct fanotify_path_event { struct fanotify_event fae; struct path path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index d69570db5efd..e16b18fdf1a6 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -124,12 +124,29 @@ static int fanotify_fid_info_len(int fh_len, int name_len) FANOTIFY_EVENT_ALIGN); } +/* FAN_RENAME may have one or two dir+name info records */ +static int fanotify_dir_name_info_len(struct fanotify_event *event) +{ + struct fanotify_info *info = fanotify_event_info(event); + int dir_fh_len = fanotify_event_dir_fh_len(event); + int dir2_fh_len = fanotify_event_dir2_fh_len(event); + int info_len = 0; + + if (dir_fh_len) + info_len += fanotify_fid_info_len(dir_fh_len, + info->name_len); + if (dir2_fh_len) + info_len += fanotify_fid_info_len(dir2_fh_len, + info->name2_len); + + return info_len; +} + static size_t fanotify_event_len(unsigned int info_mode, struct fanotify_event *event) { size_t event_len = FAN_EVENT_METADATA_LEN; struct fanotify_info *info; - int dir_fh_len; int fh_len; int dot_len = 0; @@ -141,9 +158,8 @@ static size_t fanotify_event_len(unsigned int info_mode, info = fanotify_event_info(event); - if (fanotify_event_has_dir_fh(event)) { - dir_fh_len = fanotify_event_dir_fh_len(event); - event_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + if (fanotify_event_has_any_dir_fh(event)) { + event_len += fanotify_dir_name_info_len(event); } else if ((info_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { /* @@ -374,6 +390,8 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return -EFAULT; break; case FAN_EVENT_INFO_TYPE_DFID_NAME: + case FAN_EVENT_INFO_TYPE_OLD_DFID_NAME: + case FAN_EVENT_INFO_TYPE_NEW_DFID_NAME: if (WARN_ON_ONCE(!name || !name_len)) return -EFAULT; break; @@ -473,11 +491,19 @@ static int copy_info_records_to_user(struct fanotify_event *event, unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; /* - * Event info records order is as follows: dir fid + name, child fid. + * Event info records order is as follows: + * 1. dir fid + name + * 2. (optional) new dir fid + new name + * 3. (optional) child fid */ if (fanotify_event_has_dir_fh(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; + + /* FAN_RENAME uses special info types */ + if (event->mask & FAN_RENAME) + info_type = FAN_EVENT_INFO_TYPE_OLD_DFID_NAME; + ret = copy_fid_info_to_user(fanotify_event_fsid(event), fanotify_info_dir_fh(info), info_type, @@ -491,6 +517,22 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } + /* New dir fid+name may be reported in addition to old dir fid+name */ + if (fanotify_event_has_dir2_fh(event)) { + info_type = FAN_EVENT_INFO_TYPE_NEW_DFID_NAME; + ret = copy_fid_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir2_fh(info), + info_type, + fanotify_info_name2(info), + info->name2_len, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + if (fanotify_event_has_object_fh(event)) { const char *dot = NULL; int dot_len = 0; diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 9d0e2dc5767b..e8ac38cc2fd6 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -134,6 +134,12 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_PIDFD 4 #define FAN_EVENT_INFO_TYPE_ERROR 5 +/* Special info types for FAN_RENAME */ +#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 +/* Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID 11 */ +#define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME 12 +/* Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID 13 */ + /* Variable length info record following event metadata */ struct fanotify_event_info_header { __u8 info_type; From 63b8c1923117f483aa6d05bf8240cc7b22ba06a7 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 29 Nov 2021 22:15:37 +0200 Subject: [PATCH 0714/1565] fanotify: wire up FAN_RENAME event [ Upstream commit 8cc3b1ccd930fe6971e1527f0c4f1bdc8cb56026 ] FAN_RENAME is the successor of FAN_MOVED_FROM and FAN_MOVED_TO and can be used to get the old and new parent+name information in a single event. FAN_MOVED_FROM and FAN_MOVED_TO are still supported for backward compatibility, but it makes little sense to use them together with FAN_RENAME in the same group. FAN_RENAME uses special info type records to report the old and new parent+name, so reporting only old and new parent id is less useful and was not implemented. Therefore, FAN_REANAME requires a group with flag FAN_REPORT_NAME. Link: https://lore.kernel.org/r/20211129201537.1932819-12-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 8 ++++++++ include/linux/fanotify.h | 3 ++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 0da305b6f3e2..985e995d2a39 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -930,7 +930,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); BUILD_BUG_ON(FAN_RENAME != FS_RENAME); - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21); mask = fanotify_group_event_mask(group, iter_info, &match_mask, mask, data, data_type, dir); diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index e16b18fdf1a6..3bac2329dc35 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1599,6 +1599,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out; + /* + * FAN_RENAME uses special info type records to report the old and + * new parent+name. Reporting only old and new parent id is less + * useful and was not implemented. + */ + if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME)) + goto fput_and_out; + if (flags & FAN_MARK_FLUSH) { ret = 0; if (mark_type == FAN_MARK_MOUNT) diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 376e050e6f38..3afdf339d53c 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -82,7 +82,8 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * Directory entry modification events - reported only to directory * where entry is modified and not to a watching parent. */ -#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE) +#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE | \ + FAN_RENAME) /* Events that can be reported with event->fd */ #define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS) From 15606c8d5200d83cb50c26495bdb2ef00c0893fc Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 22 Nov 2021 10:27:36 -0600 Subject: [PATCH 0715/1565] exit: Implement kthread_exit [ Upstream commit bbda86e988d4c124e4cfa816291cbd583ae8bfb1 ] The way the per task_struct exit_code is used by kernel threads is not quite compatible how it is used by userspace applications. The low byte of the userspace exit_code value encodes the exit signal. While kthreads just use the value as an int holding ordinary kernel function exit status like -EPERM. Add kthread_exit to clearly separate the two kinds of uses. Signed-off-by: "Eric W. Biederman" Stable-dep-of: ca3574bd653a ("exit: Rename module_put_and_exit to module_put_and_kthread_exit") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/kthread.h | 1 + kernel/kthread.c | 23 +++++++++++++++++++---- tools/objtool/check.c | 1 + 3 files changed, 21 insertions(+), 4 deletions(-) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 2484ed97e72f..9dae77a97a03 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -68,6 +68,7 @@ void *kthread_probe_data(struct task_struct *k); int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); +void kthread_exit(long result) __noreturn; int kthreadd(void *unused); extern struct task_struct *kthreadd_task; diff --git a/kernel/kthread.c b/kernel/kthread.c index 508fe5278285..9d6cc9c15a55 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -262,6 +262,21 @@ void kthread_parkme(void) } EXPORT_SYMBOL_GPL(kthread_parkme); +/** + * kthread_exit - Cause the current kthread return @result to kthread_stop(). + * @result: The integer value to return to kthread_stop(). + * + * While kthread_exit can be called directly, it exists so that + * functions which do some additional work in non-modular code such as + * module_put_and_kthread_exit can be implemented. + * + * Does not return. + */ +void __noreturn kthread_exit(long result) +{ + do_exit(result); +} + static int kthread(void *_create) { /* Copy data: it's on kthread's stack */ @@ -279,13 +294,13 @@ static int kthread(void *_create) done = xchg(&create->done, NULL); if (!done) { kfree(create); - do_exit(-EINTR); + kthread_exit(-EINTR); } if (!self) { create->result = ERR_PTR(-ENOMEM); complete(done); - do_exit(-ENOMEM); + kthread_exit(-ENOMEM); } self->threadfn = threadfn; @@ -312,7 +327,7 @@ static int kthread(void *_create) __kthread_parkme(self); ret = threadfn(data); } - do_exit(ret); + kthread_exit(ret); } /* called from do_fork() to get node information for about to be created task */ @@ -621,7 +636,7 @@ EXPORT_SYMBOL_GPL(kthread_park); * instead of calling wake_up_process(): the thread will exit without * calling threadfn(). * - * If threadfn() may call do_exit() itself, the caller must ensure + * If threadfn() may call kthread_exit() itself, the caller must ensure * task_struct can't go away. * * Returns the result of threadfn(), or %-EINTR if wake_up_process() diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 059b78d08f7a..6afa1f8ca161 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -168,6 +168,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, "panic", "do_exit", "do_task_dead", + "kthread_exit", "make_task_dead", "__module_put_and_exit", "complete_and_exit", From c59dc174b2e4bb1731aaf93f8fad23b9feab00f5 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 3 Dec 2021 11:00:19 -0600 Subject: [PATCH 0716/1565] exit: Rename module_put_and_exit to module_put_and_kthread_exit [ Upstream commit ca3574bd653aba234a4b31955f2778947403be16 ] Update module_put_and_exit to call kthread_exit instead of do_exit. Change the name to reflect this change in functionality. All of the users of module_put_and_exit are causing the current kthread to exit so this change makes it clear what is happening. There is no functional change. Signed-off-by: "Eric W. Biederman" Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- crypto/algboss.c | 4 ++-- fs/cifs/connect.c | 2 +- fs/nfs/callback.c | 4 ++-- fs/nfs/nfs4state.c | 2 +- fs/nfsd/nfssvc.c | 2 +- include/linux/module.h | 6 +++--- kernel/module.c | 6 +++--- net/bluetooth/bnep/core.c | 2 +- net/bluetooth/cmtp/core.c | 2 +- net/bluetooth/hidp/core.c | 2 +- tools/objtool/check.c | 2 +- 11 files changed, 17 insertions(+), 17 deletions(-) diff --git a/crypto/algboss.c b/crypto/algboss.c index 5ebccbd6b74e..b87f907bb142 100644 --- a/crypto/algboss.c +++ b/crypto/algboss.c @@ -74,7 +74,7 @@ static int cryptomgr_probe(void *data) complete_all(¶m->larval->completion); crypto_alg_put(¶m->larval->alg); kfree(param); - module_put_and_exit(0); + module_put_and_kthread_exit(0); } static int cryptomgr_schedule_probe(struct crypto_larval *larval) @@ -209,7 +209,7 @@ static int cryptomgr_test(void *data) crypto_alg_tested(param->driver, err); kfree(param); - module_put_and_exit(0); + module_put_and_kthread_exit(0); } static int cryptomgr_schedule_test(struct crypto_alg *alg) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 164b98540716..a3c0e6a4e484 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1242,7 +1242,7 @@ cifs_demultiplex_thread(void *p) } memalloc_noreclaim_restore(noreclaim_flag); - module_put_and_exit(0); + module_put_and_kthread_exit(0); } /* extract the host portion of the UNC string */ diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 86d856de1389..3c86a559a321 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -93,7 +93,7 @@ nfs4_callback_svc(void *vrqstp) svc_process(rqstp); } svc_exit_thread(rqstp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } @@ -137,7 +137,7 @@ nfs41_callback_svc(void *vrqstp) } } svc_exit_thread(rqstp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index afb617a4a7e4..d8fc5d72a161 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,7 +2757,7 @@ static int nfs4_run_state_manager(void *ptr) goto again; nfs_put_client(clp); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 408cff8fe32d..0f8415101108 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -986,7 +986,7 @@ nfsd(void *vrqstp) /* Release module */ mutex_unlock(&nfsd_mutex); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/include/linux/module.h b/include/linux/module.h index 59cbd8e1be2d..a55a40c28568 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -604,9 +604,9 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, /* Look for this name: can be of form module:name. */ unsigned long module_kallsyms_lookup_name(const char *name); -extern void __noreturn __module_put_and_exit(struct module *mod, +extern void __noreturn __module_put_and_kthread_exit(struct module *mod, long code); -#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code) +#define module_put_and_kthread_exit(code) __module_put_and_kthread_exit(THIS_MODULE, code) #ifdef CONFIG_MODULE_UNLOAD int module_refcount(struct module *mod); @@ -798,7 +798,7 @@ static inline int unregister_module_notifier(struct notifier_block *nb) return 0; } -#define module_put_and_exit(code) do_exit(code) +#define module_put_and_kthread_exit(code) kthread_exit(code) static inline void print_modules(void) { diff --git a/kernel/module.c b/kernel/module.c index 949d09d2d829..9030ff8c0855 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -336,12 +336,12 @@ static inline void add_taint_module(struct module *mod, unsigned flag, * A thread that wants to hold a reference to a module only while it * is running can call this to safely exit. nfsd and lockd use this. */ -void __noreturn __module_put_and_exit(struct module *mod, long code) +void __noreturn __module_put_and_kthread_exit(struct module *mod, long code) { module_put(mod); - do_exit(code); + kthread_exit(code); } -EXPORT_SYMBOL(__module_put_and_exit); +EXPORT_SYMBOL(__module_put_and_kthread_exit); /* Find a module section: 0 means not found. */ static unsigned int find_sec(const struct load_info *info, const char *name) diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c index 43c284158f63..09b6d825124e 100644 --- a/net/bluetooth/bnep/core.c +++ b/net/bluetooth/bnep/core.c @@ -535,7 +535,7 @@ static int bnep_session(void *arg) up_write(&bnep_session_sem); free_netdev(dev); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c index 83eb84e8e688..90d130588a3e 100644 --- a/net/bluetooth/cmtp/core.c +++ b/net/bluetooth/cmtp/core.c @@ -323,7 +323,7 @@ static int cmtp_session(void *arg) up_write(&cmtp_session_sem); kfree(session); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index b946a6379433..3ff870599eb7 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -1305,7 +1305,7 @@ static int hidp_session_thread(void *arg) l2cap_unregister_user(session->conn, &session->user); hidp_session_put(session); - module_put_and_exit(0); + module_put_and_kthread_exit(0); return 0; } diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 6afa1f8ca161..0506a48f124c 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -170,7 +170,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, "do_task_dead", "kthread_exit", "make_task_dead", - "__module_put_and_exit", + "__module_put_and_kthread_exit", "complete_and_exit", "__reiserfs_panic", "lbug_with_loc", From 307b391221ce5cac86554d5df8cff903921f06ee Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 13 Oct 2021 16:44:20 -0400 Subject: [PATCH 0717/1565] NFSD: Fix sparse warning [ Upstream commit c2f1c4bd20621175c581f298b4943df0cffbd841 ] /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types) /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: expected restricted __be32 [usertype] status /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: got int Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 52f3f3553379..eaf2f95d059c 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1508,7 +1508,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) u64 bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; - __be32 status; + int status; /* See RFC 7862 p.67: */ if (bytes_total == 0) From e0bf8993522038f5f9881c966ca17bf8885a535f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0718/1565] NFSD: handle errors better in write_ports_addfd() [ Upstream commit 89b24336f03a8ba560e96b0c47a8434a7fa48e3c ] If write_ports_add() fails, we shouldn't destroy the serv, unless we had only just created it. So if there are any permanent sockets already attached, leave the serv in place. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index d0761ca8cb54..162866cfe83a 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -742,7 +742,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred return err; err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err < 0) { + if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { nfsd_destroy(net); return err; } From e9a4156137cf3051648b17512a77104009ec3a35 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0719/1565] SUNRPC: change svc_get() to return the svc. [ Upstream commit df5e49c880ea0776806b8a9f8ab95e035272cf6f ] It is common for 'get' functions to return the object that was 'got', and there are a couple of places where users of svc_get() would be a little simpler if svc_get() did that. Make it so. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 6 ++---- fs/nfs/callback.c | 6 ++---- include/linux/sunrpc/svc.h | 3 ++- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index b220e1b91726..2f50d5b2a8a4 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -430,14 +430,12 @@ static struct svc_serv *lockd_create_svc(void) /* * Check whether we're already up and running. */ - if (nlmsvc_rqst) { + if (nlmsvc_rqst) /* * Note: increase service usage, because later in case of error * svc_destroy() will be called. */ - svc_get(nlmsvc_rqst->rq_server); - return nlmsvc_rqst->rq_server; - } + return svc_get(nlmsvc_rqst->rq_server); /* * Sanity check: if there's no pid, diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 3c86a559a321..674198e0eb5e 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -266,14 +266,12 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) /* * Check whether we're already up and running. */ - if (cb_info->serv) { + if (cb_info->serv) /* * Note: increase service usage, because later in case of error * svc_destroy() will be called. */ - svc_get(cb_info->serv); - return cb_info->serv; - } + return svc_get(cb_info->serv); switch (minorversion) { case 0: diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 15ecd5b1abac..e2580f6a05c2 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -120,9 +120,10 @@ struct svc_serv { * change the number of threads. Horrible, but there it is. * Should be called with the "service mutex" held. */ -static inline void svc_get(struct svc_serv *serv) +static inline struct svc_serv *svc_get(struct svc_serv *serv) { serv->sv_nrthreads++; + return serv; } /* From 64312a7c9fa1676afedbf1ad5e08a10eb272ad59 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0720/1565] SUNRPC/NFSD: clean up get/put functions. [ Upstream commit 8c62d12740a1450d2e8456d5747f440e10db281a ] svc_destroy() is poorly named - it doesn't necessarily destroy the svc, it might just reduce the ref count. nfsd_destroy() is poorly named for the same reason. This patch: - removes the refcount functionality from svc_destroy(), moving it to a new svc_put(). Almost all previous callers of svc_destroy() now call svc_put(). - renames nfsd_destroy() to nfsd_put() and improves the code, using the new svc_destroy() rather than svc_put() - removes a few comments that explain the important for balanced get/put calls. This should be obvious. The only non-trivial part of this is that svc_destroy() would call svc_sock_update() on a non-final decrement. It can no longer do that, and svc_put() isn't really a good place of it. This call is now made from svc_exit_thread() which seems like a good place. This makes the call *before* sv_nrthreads is decremented rather than after. This is not particularly important as the call just sets a flag which causes sv_nrthreads set be checked later. A subsequent patch will improve the ordering. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 6 +----- fs/nfs/callback.c | 14 ++------------ fs/nfsd/nfsctl.c | 4 ++-- fs/nfsd/nfsd.h | 2 +- fs/nfsd/nfssvc.c | 30 ++++++++++++++++-------------- include/linux/sunrpc/svc.h | 26 +++++++++++++++++++++++--- net/sunrpc/svc.c | 19 +++++-------------- 7 files changed, 50 insertions(+), 51 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 2f50d5b2a8a4..135bd86ed3ad 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -431,10 +431,6 @@ static struct svc_serv *lockd_create_svc(void) * Check whether we're already up and running. */ if (nlmsvc_rqst) - /* - * Note: increase service usage, because later in case of error - * svc_destroy() will be called. - */ return svc_get(nlmsvc_rqst->rq_server); /* @@ -495,7 +491,7 @@ int lockd_up(struct net *net, const struct cred *cred) * so we exit through here on both success and failure. */ err_put: - svc_destroy(serv); + svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); return error; diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 674198e0eb5e..dddd66749a88 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -267,10 +267,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) * Check whether we're already up and running. */ if (cb_info->serv) - /* - * Note: increase service usage, because later in case of error - * svc_destroy() will be called. - */ return svc_get(cb_info->serv); switch (minorversion) { @@ -333,16 +329,10 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt) goto err_start; cb_info->users++; - /* - * svc_create creates the svc_serv with sv_nrthreads == 1, and then - * svc_prepare_thread increments that. So we need to call svc_destroy - * on both success and failure so that the refcount is 1 when the - * thread exits. - */ err_net: if (!cb_info->users) cb_info->serv = NULL; - svc_destroy(serv); + svc_put(serv); err_create: mutex_unlock(&nfs_callback_mutex); return ret; @@ -368,7 +358,7 @@ void nfs_callback_down(int minorversion, struct net *net) if (cb_info->users == 0) { svc_get(serv); serv->sv_ops->svo_setup(serv, NULL, 0); - svc_destroy(serv); + svc_put(serv); dprintk("nfs_callback_down: service destroyed\n"); cb_info->serv = NULL; } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 162866cfe83a..5c8d985acf5f 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -743,7 +743,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { - nfsd_destroy(net); + nfsd_put(net); return err; } @@ -796,7 +796,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (!list_empty(&nn->nfsd_serv->sv_permsocks)) nn->nfsd_serv->sv_nrthreads--; else - nfsd_destroy(net); + nfsd_put(net); return err; } diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 498e5a489826..3e5008b475ff 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -97,7 +97,7 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net); -void nfsd_destroy(struct net *net); +void nfsd_put(struct net *net); bool i_am_nfsd(void); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 0f8415101108..4aee1cfe0d1b 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -623,7 +623,7 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ serv->sv_ops->svo_setup(serv, NULL, 0); - nfsd_destroy(net); + nfsd_put(net); mutex_unlock(&nfsd_mutex); /* Wait for shutdown of nfsd_serv to complete */ wait_for_completion(&nn->nfsd_shutdown_complete); @@ -656,7 +656,10 @@ int nfsd_create_serv(struct net *net) nn->nfsd_serv->sv_maxconn = nn->max_connections; error = svc_bind(nn->nfsd_serv, net); if (error < 0) { - svc_destroy(nn->nfsd_serv); + /* NOT nfsd_put() as notifiers (see below) haven't + * been set up yet. + */ + svc_put(nn->nfsd_serv); nfsd_complete_shutdown(net); return error; } @@ -697,16 +700,16 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; } -void nfsd_destroy(struct net *net) +void nfsd_put(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - int destroy = (nn->nfsd_serv->sv_nrthreads == 1); - if (destroy) + nn->nfsd_serv->sv_nrthreads -= 1; + if (nn->nfsd_serv->sv_nrthreads == 0) { svc_shutdown_net(nn->nfsd_serv, net); - svc_destroy(nn->nfsd_serv); - if (destroy) + svc_destroy(nn->nfsd_serv); nfsd_complete_shutdown(net); + } } int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) @@ -758,7 +761,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (err) break; } - nfsd_destroy(net); + nfsd_put(net); return err; } @@ -795,7 +798,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) error = nfsd_startup_net(net, cred); if (error) - goto out_destroy; + goto out_put; error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, NULL, nrservs); if (error) @@ -808,8 +811,8 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); -out_destroy: - nfsd_destroy(net); /* Release server */ +out_put: + nfsd_put(net); out: mutex_unlock(&nfsd_mutex); return error; @@ -982,7 +985,7 @@ nfsd(void *vrqstp) /* Release the thread */ svc_exit_thread(rqstp); - nfsd_destroy(net); + nfsd_put(net); /* Release module */ mutex_unlock(&nfsd_mutex); @@ -1109,8 +1112,7 @@ int nfsd_pool_stats_release(struct inode *inode, struct file *file) struct net *net = inode->i_sb->s_fs_info; mutex_lock(&nfsd_mutex); - /* this function really, really should have been called svc_put() */ - nfsd_destroy(net); + nfsd_put(net); mutex_unlock(&nfsd_mutex); return ret; } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index e2580f6a05c2..f7b582d3a65a 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -114,8 +114,13 @@ struct svc_serv { #endif /* CONFIG_SUNRPC_BACKCHANNEL */ }; -/* - * We use sv_nrthreads as a reference count. svc_destroy() drops +/** + * svc_get() - increment reference count on a SUNRPC serv + * @serv: the svc_serv to have count incremented + * + * Returns: the svc_serv that was passed in. + * + * We use sv_nrthreads as a reference count. svc_put() drops * this refcount, so we need to bump it up around operations that * change the number of threads. Horrible, but there it is. * Should be called with the "service mutex" held. @@ -126,6 +131,22 @@ static inline struct svc_serv *svc_get(struct svc_serv *serv) return serv; } +void svc_destroy(struct svc_serv *serv); + +/** + * svc_put - decrement reference count on a SUNRPC serv + * @serv: the svc_serv to have count decremented + * + * When the reference count reaches zero, svc_destroy() + * is called to clean up and free the serv. + */ +static inline void svc_put(struct svc_serv *serv) +{ + serv->sv_nrthreads -= 1; + if (serv->sv_nrthreads == 0) + svc_destroy(serv); +} + /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is @@ -518,7 +539,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); -void svc_destroy(struct svc_serv *); void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 54f66f66beb5..00d464f6dbe6 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -526,17 +526,7 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net); void svc_destroy(struct svc_serv *serv) { - dprintk("svc: svc_destroy(%s, %d)\n", - serv->sv_program->pg_name, - serv->sv_nrthreads); - - if (serv->sv_nrthreads) { - if (--(serv->sv_nrthreads) != 0) { - svc_sock_update_bufs(serv); - return; - } - } else - printk("svc_destroy: no threads for serv=%p!\n", serv); + dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); del_timer_sync(&serv->sv_temptimer); @@ -893,9 +883,10 @@ svc_exit_thread(struct svc_rqst *rqstp) svc_rqst_free(rqstp); - /* Release the server */ - if (serv) - svc_destroy(serv); + if (!serv) + return; + svc_sock_update_bufs(serv); + svc_destroy(serv); } EXPORT_SYMBOL_GPL(svc_exit_thread); From 17041f014060a759a890d601e75356cef3835043 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0721/1565] SUNRPC: stop using ->sv_nrthreads as a refcount [ Upstream commit ec52361df99b490f6af412b046df9799b92c1050 ] The use of sv_nrthreads as a general refcount results in clumsy code, as is seen by various comments needed to explain the situation. This patch introduces a 'struct kref' and uses that for reference counting, leaving sv_nrthreads to be a pure count of threads. The kref is managed particularly in svc_get() and svc_put(), and also nfsd_put(); svc_destroy() now takes a pointer to the embedded kref, rather than to the serv. nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This happens when a transport is created before the first thread is started. To support this, a 'keep_active' flag is introduced which holds a ref on the svc_serv. This is set when any listening socket is successfully added (unless there are running threads), and cleared when the number of threads is set. So when the last thread exits, the nfs_serv will be destroyed. The use of 'keep_active' replaces previous code which checked if there were any permanent sockets. We no longer clear ->rq_server when nfsd() exits. This was done to prevent svc_exit_thread() from calling svc_destroy(). Instead we take an extra reference to the svc_serv to prevent svc_destroy() from being called. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 4 ---- fs/nfs/callback.c | 2 +- fs/nfsd/netns.h | 7 +++++++ fs/nfsd/nfsctl.c | 22 +++++++++----------- fs/nfsd/nfssvc.c | 42 +++++++++++++++++++++++--------------- include/linux/sunrpc/svc.h | 14 ++++--------- net/sunrpc/svc.c | 22 ++++++++++---------- 7 files changed, 59 insertions(+), 54 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 135bd86ed3ad..a9669b106dbd 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -486,10 +486,6 @@ int lockd_up(struct net *net, const struct cred *cred) goto err_put; } nlmsvc_users++; - /* - * Note: svc_serv structures have an initial use count of 1, - * so we exit through here on both success and failure. - */ err_put: svc_put(serv); err_create: diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index dddd66749a88..09ec60b99f65 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -169,7 +169,7 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt, if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS) nrservs = NFS4_MIN_NR_CALLBACK_THREADS; - if (serv->sv_nrthreads-1 == nrservs) + if (serv->sv_nrthreads == nrservs) return 0; ret = serv->sv_ops->svo_setup(serv, NULL, nrservs); diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 935c1028c217..08bcd8f23b01 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -123,6 +123,13 @@ struct nfsd_net { u32 clverifier_counter; struct svc_serv *nfsd_serv; + /* When a listening socket is added to nfsd, keep_active is set + * and this justifies a reference on nfsd_serv. This stops + * nfsd_serv from being freed. When the number of threads is + * set, keep_active is cleared and the reference is dropped. So + * when the last thread exits, the service will be destroyed. + */ + int keep_active; wait_queue_head_t ntf_wq; atomic_t ntf_refcnt; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 5c8d985acf5f..53076c5afe62 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -742,13 +742,12 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred return err; err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err < 0 && list_empty(&nn->nfsd_serv->sv_permsocks)) { - nfsd_put(net); - return err; - } - /* Decrease the count, but don't shut down the service */ - nn->nfsd_serv->sv_nrthreads--; + if (err >= 0 && + !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(nn->nfsd_serv); + + nfsd_put(net); return err; } @@ -783,8 +782,10 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err < 0 && err != -EAFNOSUPPORT) goto out_close; - /* Decrease the count, but don't shut down the service */ - nn->nfsd_serv->sv_nrthreads--; + if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(nn->nfsd_serv); + + nfsd_put(net); return 0; out_close: xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); @@ -793,10 +794,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr svc_xprt_put(xprt); } out_err: - if (!list_empty(&nn->nfsd_serv->sv_permsocks)) - nn->nfsd_serv->sv_nrthreads--; - else - nfsd_put(net); + nfsd_put(net); return err; } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 4aee1cfe0d1b..141d884fee4f 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -60,13 +60,13 @@ static __be32 nfsd_init_request(struct svc_rqst *, * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a - * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number - * of nfsd threads must exist and each must listed in ->sp_all_threads in each - * entry of ->sv_pools[]. + * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless + * nn->keep_active is set). That number of nfsd threads must + * exist and each must be listed in ->sp_all_threads in some entry of + * ->sv_pools[]. * - * Transitions of the thread count between zero and non-zero are of particular - * interest since the svc_serv needs to be created and initialized at that - * point, or freed. + * Each active thread holds a counted reference on nn->nfsd_serv, as does + * the nn->keep_active flag and various transient calls to svc_get(). * * Finally, the nfsd_mutex also protects some of the global variables that are * accessed when nfsd starts and that are settable via the write_* routines in @@ -700,14 +700,22 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; } +/* This is the callback for kref_put() below. + * There is no code here as the first thing to be done is + * call svc_shutdown_net(), but we cannot get the 'net' from + * the kref. So do all the work when kref_put returns true. + */ +static void nfsd_noop(struct kref *ref) +{ +} + void nfsd_put(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - nn->nfsd_serv->sv_nrthreads -= 1; - if (nn->nfsd_serv->sv_nrthreads == 0) { + if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); - svc_destroy(nn->nfsd_serv); + svc_destroy(&nn->nfsd_serv->sv_refcnt); nfsd_complete_shutdown(net); } } @@ -803,15 +811,14 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) NULL, nrservs); if (error) goto out_shutdown; - /* We are holding a reference to nn->nfsd_serv which - * we don't want to count in the return value, - * so subtract 1 - */ - error = nn->nfsd_serv->sv_nrthreads - 1; + error = nn->nfsd_serv->sv_nrthreads; out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); out_put: + /* Threads now hold service active */ + if (xchg(&nn->keep_active, 0)) + nfsd_put(net); nfsd_put(net); out: mutex_unlock(&nfsd_mutex); @@ -980,11 +987,15 @@ nfsd(void *vrqstp) nfsdstats.th_cnt --; out: - rqstp->rq_server = NULL; + /* Take an extra ref so that the svc_put in svc_exit_thread() + * doesn't call svc_destroy() + */ + svc_get(nn->nfsd_serv); /* Release the thread */ svc_exit_thread(rqstp); + /* Now if needed we call svc_destroy in appropriate context */ nfsd_put(net); /* Release module */ @@ -1099,7 +1110,6 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file) mutex_unlock(&nfsd_mutex); return -ENODEV; } - /* bump up the psudo refcount while traversing */ svc_get(nn->nfsd_serv); ret = svc_pool_stats_open(nn->nfsd_serv, file); mutex_unlock(&nfsd_mutex); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f7b582d3a65a..6829c4ee3654 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -85,6 +85,7 @@ struct svc_serv { struct svc_program * sv_program; /* RPC program */ struct svc_stat * sv_stats; /* RPC statistics */ spinlock_t sv_lock; + struct kref sv_refcnt; unsigned int sv_nrthreads; /* # of server threads */ unsigned int sv_maxconn; /* max connections allowed or * '0' causing max to be based @@ -119,19 +120,14 @@ struct svc_serv { * @serv: the svc_serv to have count incremented * * Returns: the svc_serv that was passed in. - * - * We use sv_nrthreads as a reference count. svc_put() drops - * this refcount, so we need to bump it up around operations that - * change the number of threads. Horrible, but there it is. - * Should be called with the "service mutex" held. */ static inline struct svc_serv *svc_get(struct svc_serv *serv) { - serv->sv_nrthreads++; + kref_get(&serv->sv_refcnt); return serv; } -void svc_destroy(struct svc_serv *serv); +void svc_destroy(struct kref *); /** * svc_put - decrement reference count on a SUNRPC serv @@ -142,9 +138,7 @@ void svc_destroy(struct svc_serv *serv); */ static inline void svc_put(struct svc_serv *serv) { - serv->sv_nrthreads -= 1; - if (serv->sv_nrthreads == 0) - svc_destroy(serv); + kref_put(&serv->sv_refcnt, svc_destroy); } /* diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 00d464f6dbe6..19c9d8bf77be 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -433,7 +433,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return NULL; serv->sv_name = prog->pg_name; serv->sv_program = prog; - serv->sv_nrthreads = 1; + kref_init(&serv->sv_refcnt); serv->sv_stats = prog->pg_stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; @@ -524,10 +524,11 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net); * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. */ void -svc_destroy(struct svc_serv *serv) +svc_destroy(struct kref *ref) { - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); + struct svc_serv *serv = container_of(ref, struct svc_serv, sv_refcnt); + dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); del_timer_sync(&serv->sv_temptimer); /* @@ -635,6 +636,7 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) if (!rqstp) return ERR_PTR(-ENOMEM); + svc_get(serv); serv->sv_nrthreads++; spin_lock_bh(&pool->sp_lock); pool->sp_nrthreads++; @@ -774,8 +776,7 @@ int svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { - /* The -1 assumes caller has done a svc_get() */ - nrservs -= (serv->sv_nrthreads-1); + nrservs -= serv->sv_nrthreads; } else { spin_lock_bh(&pool->sp_lock); nrservs -= pool->sp_nrthreads; @@ -816,8 +817,7 @@ int svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { - /* The -1 assumes caller has done a svc_get() */ - nrservs -= (serv->sv_nrthreads-1); + nrservs -= serv->sv_nrthreads; } else { spin_lock_bh(&pool->sp_lock); nrservs -= pool->sp_nrthreads; @@ -881,12 +881,12 @@ svc_exit_thread(struct svc_rqst *rqstp) list_del_rcu(&rqstp->rq_all); spin_unlock_bh(&pool->sp_lock); + serv->sv_nrthreads -= 1; + svc_sock_update_bufs(serv); + svc_rqst_free(rqstp); - if (!serv) - return; - svc_sock_update_bufs(serv); - svc_destroy(serv); + svc_put(serv); } EXPORT_SYMBOL_GPL(svc_exit_thread); From 4efe0b9d11fc5d650d627dba4f43973da10e11ac Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0722/1565] nfsd: make nfsd_stats.th_cnt atomic_t [ Upstream commit 9b6c8c9bebccd5fb785c306b948c08874a88874d ] This allows us to move the updates for th_cnt out of the mutex. This is a step towards reducing mutex coverage in nfsd(). Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 6 +++--- fs/nfsd/stats.c | 2 +- fs/nfsd/stats.h | 4 +--- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 141d884fee4f..32f2c46a3832 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -57,7 +57,7 @@ static __be32 nfsd_init_request(struct svc_rqst *, /* * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members * of the svc_serv struct. In particular, ->sv_nrthreads but also to some - * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt + * extent ->sv_temp_socks and ->sv_permsocks. * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless @@ -955,8 +955,8 @@ nfsd(void *vrqstp) allow_signal(SIGINT); allow_signal(SIGQUIT); - nfsdstats.th_cnt++; mutex_unlock(&nfsd_mutex); + atomic_inc(&nfsdstats.th_cnt); set_freezable(); @@ -983,8 +983,8 @@ nfsd(void *vrqstp) /* Clear signals before calling svc_exit_thread() */ flush_signals(current); + atomic_dec(&nfsdstats.th_cnt); mutex_lock(&nfsd_mutex); - nfsdstats.th_cnt --; out: /* Take an extra ref so that the svc_put in svc_exit_thread() diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index 1d3b881e7382..a8c5a02a84f0 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -45,7 +45,7 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE])); /* thread usage: */ - seq_printf(seq, "th %u 0", nfsdstats.th_cnt); + seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt)); /* deprecated thread usage histogram stats */ for (i = 0; i < 10; i++) diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 51ecda852e23..9b43dc3d9991 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -29,11 +29,9 @@ enum { struct nfsd_stats { struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM]; - /* Protected by nfsd_mutex */ - unsigned int th_cnt; /* number of available threads */ + atomic_t th_cnt; /* number of available threads */ }; - extern struct nfsd_stats nfsdstats; extern struct svc_stat nfsd_svcstats; From 61d12fc30a5e2fa2296f0afc759f023b1e7f3937 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0723/1565] SUNRPC: use sv_lock to protect updates to sv_nrthreads. [ Upstream commit 2a36395fac3b72771f87c3ee4387e3a96d85a7cc ] Using sv_lock means we don't need to hold the service mutex over these updates. In particular, svc_exit_thread() no longer requires synchronisation, so threads can exit asynchronously. Note that we could use an atomic_t, but as there are many more read sites than writes, that would add unnecessary noise to the code. Some reads are already racy, and there is no need for them to not be. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 5 ++--- net/sunrpc/svc.c | 9 +++++++-- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 32f2c46a3832..16884a90e1ab 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -55,9 +55,8 @@ static __be32 nfsd_init_request(struct svc_rqst *, struct svc_process_info *); /* - * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members - * of the svc_serv struct. In particular, ->sv_nrthreads but also to some - * extent ->sv_temp_socks and ->sv_permsocks. + * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and some members + * of the svc_serv struct such as ->sv_temp_socks and ->sv_permsocks. * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 19c9d8bf77be..283088f3215e 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -521,7 +521,7 @@ EXPORT_SYMBOL_GPL(svc_shutdown_net); /* * Destroy an RPC service. Should be called with appropriate locking to - * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. + * protect sv_permsocks and sv_tempsocks. */ void svc_destroy(struct kref *ref) @@ -637,7 +637,10 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) return ERR_PTR(-ENOMEM); svc_get(serv); - serv->sv_nrthreads++; + spin_lock_bh(&serv->sv_lock); + serv->sv_nrthreads += 1; + spin_unlock_bh(&serv->sv_lock); + spin_lock_bh(&pool->sp_lock); pool->sp_nrthreads++; list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads); @@ -881,7 +884,9 @@ svc_exit_thread(struct svc_rqst *rqstp) list_del_rcu(&rqstp->rq_all); spin_unlock_bh(&pool->sp_lock); + spin_lock_bh(&serv->sv_lock); serv->sv_nrthreads -= 1; + spin_unlock_bh(&serv->sv_lock); svc_sock_update_bufs(serv); svc_rqst_free(rqstp); From 6343271d5315f82c5fcb483e71c271a64fb96f83 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0724/1565] NFSD: narrow nfsd_mutex protection in nfsd thread [ Upstream commit 9d3792aefdcda71d20c2b1ecc589c17ae71eb523 ] There is nothing happening in the start of nfsd() that requires protection by the mutex, so don't take it until shutting down the thread - which does still require protection - but only for nfsd_put(). Signed-off-by: NeilBrown [ cel: address merge conflict with fd2468fa1301 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 16884a90e1ab..eb8cc4d914fe 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -932,9 +932,6 @@ nfsd(void *vrqstp) struct nfsd_net *nn = net_generic(net, nfsd_net_id); int err; - /* Lock module and set up kernel thread */ - mutex_lock(&nfsd_mutex); - /* At this point, the thread shares current->fs * with the init process. We need to create files with the * umask as defined by the client instead of init's umask. */ @@ -954,7 +951,6 @@ nfsd(void *vrqstp) allow_signal(SIGINT); allow_signal(SIGQUIT); - mutex_unlock(&nfsd_mutex); atomic_inc(&nfsdstats.th_cnt); set_freezable(); @@ -983,7 +979,6 @@ nfsd(void *vrqstp) flush_signals(current); atomic_dec(&nfsdstats.th_cnt); - mutex_lock(&nfsd_mutex); out: /* Take an extra ref so that the svc_put in svc_exit_thread() @@ -995,10 +990,11 @@ nfsd(void *vrqstp) svc_exit_thread(rqstp); /* Now if needed we call svc_destroy in appropriate context */ + mutex_lock(&nfsd_mutex); nfsd_put(net); + mutex_unlock(&nfsd_mutex); /* Release module */ - mutex_unlock(&nfsd_mutex); module_put_and_kthread_exit(0); return 0; } From 361452374168b572feae8f377f386564bdcb4308 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0725/1565] NFSD: Make it possible to use svc_set_num_threads_sync [ Upstream commit 3409e4f1e8f239f0ed81be0b068ecf4e73e2e826 ] nfsd cannot currently use svc_set_num_threads_sync. It instead uses svc_set_num_threads which does *not* wait for threads to all exit, and has a separate mechanism (nfsd_shutdown_complete) to wait for completion. The reason that nfsd is unlike other services is that nfsd threads can exit separately from svc_set_num_threads being called - they die on receipt of SIGKILL. Also, when the last thread exits, the service must be shut down (sockets closed). For this, the nfsd_mutex needs to be taken, and as that mutex needs to be held while svc_set_num_threads is called, the one cannot wait for the other. This patch changes the nfsd thread so that it can drop the ref on the service without blocking on nfsd_mutex, so that svc_set_num_threads_sync can be used: - if it can drop a non-last reference, it does that. This does not trigger shutdown and does not require a mutex. This will likely happen for all but the last thread signalled, and for all threads being shut down by nfsd_shutdown_threads() - if it can get the mutex without blocking (trylock), it does that and then drops the reference. This will likely happen for the last thread killed by SIGKILL - Otherwise there might be an unrelated task holding the mutex, possibly in another network namespace, or nfsd_shutdown_threads() might be just about to get a reference on the service, after which we can drop ours safely. We cannot conveniently get wakeup notifications on these events, and we are unlikely to need to, so we sleep briefly and check again. With this we can discard nfsd_shutdown_complete and nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 3 --- fs/nfsd/nfssvc.c | 41 +++++++++++++++++++------------------- include/linux/sunrpc/svc.h | 13 ++++++++++++ 3 files changed, 33 insertions(+), 24 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 08bcd8f23b01..1fd59eb0730b 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -134,9 +134,6 @@ struct nfsd_net { wait_queue_head_t ntf_wq; atomic_t ntf_refcnt; - /* Allow umount to wait for nfsd state cleanup */ - struct completion nfsd_shutdown_complete; - /* * clientid and stateid data for construction of net unique COPY * stateids. diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index eb8cc4d914fe..6b10415e4006 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -593,20 +593,10 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads, + .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, }; -static void nfsd_complete_shutdown(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - WARN_ON(!mutex_is_locked(&nfsd_mutex)); - - nn->nfsd_serv = NULL; - complete(&nn->nfsd_shutdown_complete); -} - void nfsd_shutdown_threads(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); @@ -624,8 +614,6 @@ void nfsd_shutdown_threads(struct net *net) serv->sv_ops->svo_setup(serv, NULL, 0); nfsd_put(net); mutex_unlock(&nfsd_mutex); - /* Wait for shutdown of nfsd_serv to complete */ - wait_for_completion(&nn->nfsd_shutdown_complete); } bool i_am_nfsd(void) @@ -650,7 +638,6 @@ int nfsd_create_serv(struct net *net) &nfsd_thread_sv_ops); if (nn->nfsd_serv == NULL) return -ENOMEM; - init_completion(&nn->nfsd_shutdown_complete); nn->nfsd_serv->sv_maxconn = nn->max_connections; error = svc_bind(nn->nfsd_serv, net); @@ -659,7 +646,7 @@ int nfsd_create_serv(struct net *net) * been set up yet. */ svc_put(nn->nfsd_serv); - nfsd_complete_shutdown(net); + nn->nfsd_serv = NULL; return error; } @@ -715,7 +702,7 @@ void nfsd_put(struct net *net) if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); - nfsd_complete_shutdown(net); + nn->nfsd_serv = NULL; } } @@ -743,7 +730,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (tot > NFSD_MAXSERVS) { /* total too large: scale down requested numbers */ for (i = 0; i < n && tot > 0; i++) { - int new = nthreads[i] * NFSD_MAXSERVS / tot; + int new = nthreads[i] * NFSD_MAXSERVS / tot; tot -= (nthreads[i] - new); nthreads[i] = new; } @@ -989,10 +976,22 @@ nfsd(void *vrqstp) /* Release the thread */ svc_exit_thread(rqstp); - /* Now if needed we call svc_destroy in appropriate context */ - mutex_lock(&nfsd_mutex); - nfsd_put(net); - mutex_unlock(&nfsd_mutex); + /* We need to drop a ref, but may not drop the last reference + * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that + * could deadlock with nfsd_shutdown_threads() waiting for us. + * So three options are: + * - drop a non-final reference, + * - get the mutex without waiting + * - sleep briefly andd try the above again + */ + while (!svc_put_not_last(nn->nfsd_serv)) { + if (mutex_trylock(&nfsd_mutex)) { + nfsd_put(net); + mutex_unlock(&nfsd_mutex); + break; + } + msleep(20); + } /* Release module */ module_put_and_kthread_exit(0); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 6829c4ee3654..2768d61b8aed 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -141,6 +141,19 @@ static inline void svc_put(struct svc_serv *serv) kref_put(&serv->sv_refcnt, svc_destroy); } +/** + * svc_put_not_last - decrement non-final reference count on SUNRPC serv + * @serv: the svc_serv to have count decremented + * + * Returns: %true is refcount was decremented. + * + * If the refcount is 1, it is not decremented and instead failure is reported. + */ +static inline bool svc_put_not_last(struct svc_serv *serv) +{ + return refcount_dec_not_one(&serv->sv_refcnt.refcount); +} + /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is From 7149250beeea91f5de506e9a2f3195a385340210 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0726/1565] SUNRPC: discard svo_setup and rename svc_set_num_threads_sync() [ Upstream commit 3ebdbe5203a874614819700d3f470724cb803709 ] The ->svo_setup callback serves no purpose. It is always called from within the same module that chooses which callback is needed. So discard it and call the relevant function directly. Now that svc_set_num_threads() is no longer used remove it and rename svc_set_num_threads_sync() to remove the "_sync" suffix. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback.c | 8 +++---- fs/nfsd/nfssvc.c | 11 ++++----- include/linux/sunrpc/svc.h | 4 ---- net/sunrpc/svc.c | 49 ++------------------------------------ 4 files changed, 10 insertions(+), 62 deletions(-) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 09ec60b99f65..422055a1092f 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -172,9 +172,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt, if (serv->sv_nrthreads == nrservs) return 0; - ret = serv->sv_ops->svo_setup(serv, NULL, nrservs); + ret = svc_set_num_threads(serv, NULL, nrservs); if (ret) { - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); return ret; } dprintk("nfs_callback_up: service started\n"); @@ -235,14 +235,12 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, }; @@ -357,7 +355,7 @@ void nfs_callback_down(int minorversion, struct net *net) cb_info->users--; if (cb_info->users == 0) { svc_get(serv); - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); svc_put(serv); dprintk("nfs_callback_down: service destroyed\n"); cb_info->serv = NULL; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 6b10415e4006..8d49dfbe03f8 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -593,7 +593,6 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_enqueue_xprt = svc_xprt_do_enqueue, - .svo_setup = svc_set_num_threads_sync, .svo_module = THIS_MODULE, }; @@ -611,7 +610,7 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ - serv->sv_ops->svo_setup(serv, NULL, 0); + svc_set_num_threads(serv, NULL, 0); nfsd_put(net); mutex_unlock(&nfsd_mutex); } @@ -750,8 +749,9 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) /* apply the new numbers */ svc_get(nn->nfsd_serv); for (i = 0; i < n; i++) { - err = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, - &nn->nfsd_serv->sv_pools[i], nthreads[i]); + err = svc_set_num_threads(nn->nfsd_serv, + &nn->nfsd_serv->sv_pools[i], + nthreads[i]); if (err) break; } @@ -793,8 +793,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) error = nfsd_startup_net(net, cred); if (error) goto out_put; - error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, - NULL, nrservs); + error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs); if (error) goto out_shutdown; error = nn->nfsd_serv->sv_nrthreads; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 2768d61b8aed..165719a6229a 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -64,9 +64,6 @@ struct svc_serv_ops { /* queue up a transport for servicing */ void (*svo_enqueue_xprt)(struct svc_xprt *); - /* set up thread (or whatever) execution context */ - int (*svo_setup)(struct svc_serv *, struct svc_pool *, int); - /* optional module to count when adding threads (pooled svcs only) */ struct module *svo_module; }; @@ -544,7 +541,6 @@ void svc_pool_map_put(void); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); -int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 283088f3215e..da5f008b8d27 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -741,58 +741,13 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) return 0; } - -/* destroy old threads */ -static int -svc_signal_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) -{ - struct task_struct *task; - unsigned int state = serv->sv_nrthreads-1; - - /* destroy old threads */ - do { - task = choose_victim(serv, pool, &state); - if (task == NULL) - break; - send_sig(SIGINT, task, 1); - nrservs++; - } while (nrservs < 0); - - return 0; -} - /* * Create or destroy enough new threads to make the number * of threads the given number. If `pool' is non-NULL, applies * only to threads in that pool, otherwise round-robins between * all pools. Caller must ensure that mutual exclusion between this and * server startup or shutdown. - * - * Destroying threads relies on the service threads filling in - * rqstp->rq_task, which only the nfs ones do. Assumes the serv - * has been created using svc_create_pooled(). - * - * Based on code that used to be in nfsd_svc() but tweaked - * to be pool-aware. */ -int -svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) -{ - if (pool == NULL) { - nrservs -= serv->sv_nrthreads; - } else { - spin_lock_bh(&pool->sp_lock); - nrservs -= pool->sp_nrthreads; - spin_unlock_bh(&pool->sp_lock); - } - - if (nrservs > 0) - return svc_start_kthreads(serv, pool, nrservs); - if (nrservs < 0) - return svc_signal_kthreads(serv, pool, nrservs); - return 0; -} -EXPORT_SYMBOL_GPL(svc_set_num_threads); /* destroy old threads */ static int @@ -817,7 +772,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) } int -svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs) +svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { nrservs -= serv->sv_nrthreads; @@ -833,7 +788,7 @@ svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrser return svc_stop_kthreads(serv, pool, nrservs); return 0; } -EXPORT_SYMBOL_GPL(svc_set_num_threads_sync); +EXPORT_SYMBOL_GPL(svc_set_num_threads); /** * svc_rqst_replace_page - Replace one page in rq_pages[] From d95899dadb4d7865b1457d1de09ac6923f0c9f1f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0727/1565] NFSD: simplify locking for network notifier. [ Upstream commit d057cfec4940ce6eeffa22b4a71dec203b06cd55 ] nfsd currently maintains an open-coded read/write semaphore (refcount and wait queue) for each network namespace to ensure the nfs service isn't shut down while the notifier is running. This is excessive. As there is unlikely to be contention between notifiers and they run without sleeping, a single spinlock is sufficient to avoid problems. Signed-off-by: NeilBrown [ cel: ensure nfsd_notifier_lock is static ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 3 --- fs/nfsd/nfsctl.c | 2 -- fs/nfsd/nfssvc.c | 38 ++++++++++++++++++++------------------ 3 files changed, 20 insertions(+), 23 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 1fd59eb0730b..021acdc0d03b 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -131,9 +131,6 @@ struct nfsd_net { */ int keep_active; - wait_queue_head_t ntf_wq; - atomic_t ntf_refcnt; - /* * clientid and stateid data for construction of net unique COPY * stateids. diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 53076c5afe62..504b169d2788 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,8 +1484,6 @@ static __net_init int nfsd_init_net(struct net *net) nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++; - atomic_set(&nn->ntf_refcnt, 0); - init_waitqueue_head(&nn->ntf_wq); seqlock_init(&nn->boot_lock); return 0; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8d49dfbe03f8..8554bc7ff432 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -434,6 +434,7 @@ static void nfsd_shutdown_net(struct net *net) nfsd_shutdown_generic(); } +static DEFINE_SPINLOCK(nfsd_notifier_lock); static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, void *ptr) { @@ -443,18 +444,17 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in sin; - if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nn->ntf_refcnt)) + if (event != NETDEV_DOWN || !nn->nfsd_serv) goto out; + spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin); } - atomic_dec(&nn->ntf_refcnt); - wake_up(&nn->ntf_wq); + spin_unlock(&nfsd_notifier_lock); out: return NOTIFY_DONE; @@ -474,10 +474,10 @@ static int nfsd_inet6addr_event(struct notifier_block *this, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in6 sin6; - if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nn->ntf_refcnt)) + if (event != NETDEV_DOWN || !nn->nfsd_serv) goto out; + spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; @@ -486,8 +486,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this, sin6.sin6_scope_id = ifa->idev->dev->ifindex; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6); } - atomic_dec(&nn->ntf_refcnt); - wake_up(&nn->ntf_wq); + spin_unlock(&nfsd_notifier_lock); + out: return NOTIFY_DONE; } @@ -504,7 +504,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - atomic_dec(&nn->ntf_refcnt); /* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { unregister_inetaddr_notifier(&nfsd_inetaddr_notifier); @@ -512,7 +511,6 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) unregister_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - wait_event(nn->ntf_wq, atomic_read(&nn->ntf_refcnt) == 0); /* * write_ports can create the server without actually starting @@ -624,6 +622,7 @@ int nfsd_create_serv(struct net *net) { int error; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv; WARN_ON(!mutex_is_locked(&nfsd_mutex)); if (nn->nfsd_serv) { @@ -633,21 +632,23 @@ int nfsd_create_serv(struct net *net) if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - nn->nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, - &nfsd_thread_sv_ops); - if (nn->nfsd_serv == NULL) + serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, + &nfsd_thread_sv_ops); + if (serv == NULL) return -ENOMEM; - nn->nfsd_serv->sv_maxconn = nn->max_connections; - error = svc_bind(nn->nfsd_serv, net); + serv->sv_maxconn = nn->max_connections; + error = svc_bind(serv, net); if (error < 0) { /* NOT nfsd_put() as notifiers (see below) haven't * been set up yet. */ - svc_put(nn->nfsd_serv); - nn->nfsd_serv = NULL; + svc_put(serv); return error; } + spin_lock(&nfsd_notifier_lock); + nn->nfsd_serv = serv; + spin_unlock(&nfsd_notifier_lock); set_max_drc(); /* check if the notifier is already set */ @@ -657,7 +658,6 @@ int nfsd_create_serv(struct net *net) register_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - atomic_inc(&nn->ntf_refcnt); nfsd_reset_boot_verifier(nn); return 0; } @@ -701,7 +701,9 @@ void nfsd_put(struct net *net) if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); + spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; + spin_unlock(&nfsd_notifier_lock); } } From 32f3e5a70f283105cb1325dd82f39ed7cbf8059e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0728/1565] lockd: introduce nlmsvc_serv [ Upstream commit 2840fe864c91a0fe822169b1fbfddbcac9aeac43 ] lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it wants the 'struct svc_serv', and when it doesn't want it exactly it can get to what it wants from the serv. This patch is a first step to removing nlmsvc_task and nlmsvc_rqst. It introduces nlmsvc_serv to store the 'struct svc_serv*'. This is set as soon as the serv is created, and cleared only when it is destroyed. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index a9669b106dbd..83874878f41d 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -54,6 +54,7 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; +static struct svc_serv *nlmsvc_serv; static struct task_struct *nlmsvc_task; static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout; @@ -306,13 +307,12 @@ static int lockd_inetaddr_event(struct notifier_block *this, !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out; - if (nlmsvc_rqst) { + if (nlmsvc_serv) { dprintk("lockd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; - svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, - (struct sockaddr *)&sin); + svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin); } atomic_dec(&nlm_ntf_refcnt); wake_up(&nlm_ntf_wq); @@ -336,14 +336,13 @@ static int lockd_inet6addr_event(struct notifier_block *this, !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out; - if (nlmsvc_rqst) { + if (nlmsvc_serv) { dprintk("lockd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; sin6.sin6_addr = ifa->addr; if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) sin6.sin6_scope_id = ifa->idev->dev->ifindex; - svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, - (struct sockaddr *)&sin6); + svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6); } atomic_dec(&nlm_ntf_refcnt); wake_up(&nlm_ntf_wq); @@ -423,15 +422,17 @@ static const struct svc_serv_ops lockd_sv_ops = { .svo_enqueue_xprt = svc_xprt_do_enqueue, }; -static struct svc_serv *lockd_create_svc(void) +static int lockd_create_svc(void) { struct svc_serv *serv; /* * Check whether we're already up and running. */ - if (nlmsvc_rqst) - return svc_get(nlmsvc_rqst->rq_server); + if (nlmsvc_serv) { + svc_get(nlmsvc_serv); + return 0; + } /* * Sanity check: if there's no pid, @@ -448,14 +449,15 @@ static struct svc_serv *lockd_create_svc(void) serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops); if (!serv) { printk(KERN_WARNING "lockd_up: create service failed\n"); - return ERR_PTR(-ENOMEM); + return -ENOMEM; } + nlmsvc_serv = serv; register_inetaddr_notifier(&lockd_inetaddr_notifier); #if IS_ENABLED(CONFIG_IPV6) register_inet6addr_notifier(&lockd_inet6addr_notifier); #endif dprintk("lockd_up: service created\n"); - return serv; + return 0; } /* @@ -468,11 +470,10 @@ int lockd_up(struct net *net, const struct cred *cred) mutex_lock(&nlmsvc_mutex); - serv = lockd_create_svc(); - if (IS_ERR(serv)) { - error = PTR_ERR(serv); + error = lockd_create_svc(); + if (error) goto err_create; - } + serv = nlmsvc_serv; error = lockd_up_net(serv, net, cred); if (error < 0) { @@ -487,6 +488,8 @@ int lockd_up(struct net *net, const struct cred *cred) } nlmsvc_users++; err_put: + if (nlmsvc_users == 0) + nlmsvc_serv = NULL; svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); @@ -501,7 +504,7 @@ void lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); - lockd_down_net(nlmsvc_rqst->rq_server, net); + lockd_down_net(nlmsvc_serv, net); if (nlmsvc_users) { if (--nlmsvc_users) goto out; @@ -519,6 +522,7 @@ lockd_down(struct net *net) dprintk("lockd_down: service stopped\n"); lockd_svc_exit_thread(); dprintk("lockd_down: service destroyed\n"); + nlmsvc_serv = NULL; nlmsvc_task = NULL; nlmsvc_rqst = NULL; out: From a389baad9137f26e50ba524ed1d3bcebd0daa93f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0729/1565] lockd: simplify management of network status notifiers [ Upstream commit 5a8a7ff57421b7de3ae72019938ffb5daaee36e7 ] Now that the network status notifiers use nlmsvc_serv rather then nlmsvc_rqst the management can be simplified. Notifier unregistration synchronises with any pending notifications so providing we unregister before nlm_serv is freed no further interlock is required. So we move the unregister call to just before the thread is killed (which destroys the service) and just before the service is destroyed in the failure-path of lockd_up(). Then nlm_ntf_refcnt and nlm_ntf_wq can be removed. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 35 +++++++++-------------------------- 1 file changed, 9 insertions(+), 26 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 83874878f41d..20cebb191350 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -59,9 +59,6 @@ static struct task_struct *nlmsvc_task; static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout; -static atomic_t nlm_ntf_refcnt = ATOMIC_INIT(0); -static DECLARE_WAIT_QUEUE_HEAD(nlm_ntf_wq); - unsigned int lockd_net_id; /* @@ -303,8 +300,7 @@ static int lockd_inetaddr_event(struct notifier_block *this, struct in_ifaddr *ifa = (struct in_ifaddr *)ptr; struct sockaddr_in sin; - if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nlm_ntf_refcnt)) + if (event != NETDEV_DOWN) goto out; if (nlmsvc_serv) { @@ -314,8 +310,6 @@ static int lockd_inetaddr_event(struct notifier_block *this, sin.sin_addr.s_addr = ifa->ifa_local; svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin); } - atomic_dec(&nlm_ntf_refcnt); - wake_up(&nlm_ntf_wq); out: return NOTIFY_DONE; @@ -332,8 +326,7 @@ static int lockd_inet6addr_event(struct notifier_block *this, struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr; struct sockaddr_in6 sin6; - if ((event != NETDEV_DOWN) || - !atomic_inc_not_zero(&nlm_ntf_refcnt)) + if (event != NETDEV_DOWN) goto out; if (nlmsvc_serv) { @@ -344,8 +337,6 @@ static int lockd_inet6addr_event(struct notifier_block *this, sin6.sin6_scope_id = ifa->idev->dev->ifindex; svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6); } - atomic_dec(&nlm_ntf_refcnt); - wake_up(&nlm_ntf_wq); out: return NOTIFY_DONE; @@ -362,14 +353,6 @@ static void lockd_unregister_notifiers(void) #if IS_ENABLED(CONFIG_IPV6) unregister_inet6addr_notifier(&lockd_inet6addr_notifier); #endif - wait_event(nlm_ntf_wq, atomic_read(&nlm_ntf_refcnt) == 0); -} - -static void lockd_svc_exit_thread(void) -{ - atomic_dec(&nlm_ntf_refcnt); - lockd_unregister_notifiers(); - svc_exit_thread(nlmsvc_rqst); } static int lockd_start_svc(struct svc_serv *serv) @@ -388,11 +371,9 @@ static int lockd_start_svc(struct svc_serv *serv) printk(KERN_WARNING "lockd_up: svc_rqst allocation failed, error=%d\n", error); - lockd_unregister_notifiers(); goto out_rqst; } - atomic_inc(&nlm_ntf_refcnt); svc_sock_update_bufs(serv); serv->sv_maxconn = nlm_max_connections; @@ -410,7 +391,7 @@ static int lockd_start_svc(struct svc_serv *serv) return 0; out_task: - lockd_svc_exit_thread(); + svc_exit_thread(nlmsvc_rqst); nlmsvc_task = NULL; out_rqst: nlmsvc_rqst = NULL; @@ -477,7 +458,6 @@ int lockd_up(struct net *net, const struct cred *cred) error = lockd_up_net(serv, net, cred); if (error < 0) { - lockd_unregister_notifiers(); goto err_put; } @@ -488,8 +468,10 @@ int lockd_up(struct net *net, const struct cred *cred) } nlmsvc_users++; err_put: - if (nlmsvc_users == 0) + if (nlmsvc_users == 0) { + lockd_unregister_notifiers(); nlmsvc_serv = NULL; + } svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); @@ -518,13 +500,14 @@ lockd_down(struct net *net) printk(KERN_ERR "lockd_down: no lockd running.\n"); BUG(); } + lockd_unregister_notifiers(); kthread_stop(nlmsvc_task); dprintk("lockd_down: service stopped\n"); - lockd_svc_exit_thread(); + svc_exit_thread(nlmsvc_rqst); + nlmsvc_rqst = NULL; dprintk("lockd_down: service destroyed\n"); nlmsvc_serv = NULL; nlmsvc_task = NULL; - nlmsvc_rqst = NULL; out: mutex_unlock(&nlmsvc_mutex); } From 7077a007037517faf7006eec94dbf0c3d2b76137 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0730/1565] lockd: move lockd_start_svc() call into lockd_create_svc() [ Upstream commit b73a2972041bee70eb0cbbb25fa77828c63c916b ] lockd_start_svc() only needs to be called once, just after the svc is created. If the start fails, the svc is discarded too. It thus makes sense to call lockd_start_svc() from lockd_create_svc(). This allows us to remove the test against nlmsvc_rqst at the start of lockd_start_svc() - it must always be NULL. lockd_up() only held an extra reference on the svc until a thread was created - then it dropped it. The thread - and thus the extra reference - will remain until kthread_stop() is called. Now that the thread is created in lockd_create_svc(), the extra reference can be dropped there. So the 'serv' variable is no longer needed in lockd_up(). Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 20cebb191350..91e7c839841e 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -359,9 +359,6 @@ static int lockd_start_svc(struct svc_serv *serv) { int error; - if (nlmsvc_rqst) - return 0; - /* * Create the kernel thread and wait for it to start. */ @@ -406,6 +403,7 @@ static const struct svc_serv_ops lockd_sv_ops = { static int lockd_create_svc(void) { struct svc_serv *serv; + int error; /* * Check whether we're already up and running. @@ -432,6 +430,13 @@ static int lockd_create_svc(void) printk(KERN_WARNING "lockd_up: create service failed\n"); return -ENOMEM; } + + error = lockd_start_svc(serv); + /* The thread now holds the only reference */ + svc_put(serv); + if (error < 0) + return error; + nlmsvc_serv = serv; register_inetaddr_notifier(&lockd_inetaddr_notifier); #if IS_ENABLED(CONFIG_IPV6) @@ -446,7 +451,6 @@ static int lockd_create_svc(void) */ int lockd_up(struct net *net, const struct cred *cred) { - struct svc_serv *serv; int error; mutex_lock(&nlmsvc_mutex); @@ -454,25 +458,19 @@ int lockd_up(struct net *net, const struct cred *cred) error = lockd_create_svc(); if (error) goto err_create; - serv = nlmsvc_serv; - error = lockd_up_net(serv, net, cred); + error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) { goto err_put; } - error = lockd_start_svc(serv); - if (error < 0) { - lockd_down_net(serv, net); - goto err_put; - } nlmsvc_users++; err_put: if (nlmsvc_users == 0) { lockd_unregister_notifiers(); + kthread_stop(nlmsvc_task); nlmsvc_serv = NULL; } - svc_put(serv); err_create: mutex_unlock(&nlmsvc_mutex); return error; From 8304dd04fb7b6c734c7ce2e0e09a164f890db6ee Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0731/1565] lockd: move svc_exit_thread() into the thread [ Upstream commit 6a4e2527a63620a820c4ebf3596b57176da26fb3 ] The normal place to call svc_exit_thread() is from the thread itself just before it exists. Do this for lockd. This means that nlmsvc_rqst is not used out side of lockd_start_svc(), so it can be made local to that function, and renamed to 'rqst'. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 91e7c839841e..9aa499a76159 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -56,7 +56,6 @@ static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; static struct svc_serv *nlmsvc_serv; static struct task_struct *nlmsvc_task; -static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout; unsigned int lockd_net_id; @@ -182,6 +181,11 @@ lockd(void *vrqstp) nlm_shutdown_hosts(); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); + + dprintk("lockd_down: service stopped\n"); + + svc_exit_thread(rqstp); + return 0; } @@ -358,13 +362,14 @@ static void lockd_unregister_notifiers(void) static int lockd_start_svc(struct svc_serv *serv) { int error; + struct svc_rqst *rqst; /* * Create the kernel thread and wait for it to start. */ - nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); - if (IS_ERR(nlmsvc_rqst)) { - error = PTR_ERR(nlmsvc_rqst); + rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); + if (IS_ERR(rqst)) { + error = PTR_ERR(rqst); printk(KERN_WARNING "lockd_up: svc_rqst allocation failed, error=%d\n", error); @@ -374,24 +379,23 @@ static int lockd_start_svc(struct svc_serv *serv) svc_sock_update_bufs(serv); serv->sv_maxconn = nlm_max_connections; - nlmsvc_task = kthread_create(lockd, nlmsvc_rqst, "%s", serv->sv_name); + nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name); if (IS_ERR(nlmsvc_task)) { error = PTR_ERR(nlmsvc_task); printk(KERN_WARNING "lockd_up: kthread_run failed, error=%d\n", error); goto out_task; } - nlmsvc_rqst->rq_task = nlmsvc_task; + rqst->rq_task = nlmsvc_task; wake_up_process(nlmsvc_task); dprintk("lockd_up: service started\n"); return 0; out_task: - svc_exit_thread(nlmsvc_rqst); + svc_exit_thread(rqst); nlmsvc_task = NULL; out_rqst: - nlmsvc_rqst = NULL; return error; } @@ -500,9 +504,6 @@ lockd_down(struct net *net) } lockd_unregister_notifiers(); kthread_stop(nlmsvc_task); - dprintk("lockd_down: service stopped\n"); - svc_exit_thread(nlmsvc_rqst); - nlmsvc_rqst = NULL; dprintk("lockd_down: service destroyed\n"); nlmsvc_serv = NULL; nlmsvc_task = NULL; From e5087b3d584f3e18bf43954a5bf068352f498bc7 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0732/1565] lockd: introduce lockd_put() [ Upstream commit 865b674069e05e5779fcf8cf7a166d2acb7e930b ] There is some cleanup that is duplicated in lockd_down() and the failure path of lockd_up(). Factor these out into a new lockd_put() and call it from both places. lockd_put() does *not* take the mutex - that must be held by the caller. It decrements nlmsvc_users and if that reaches zero, it cleans up. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 64 +++++++++++++++++++++----------------------------- 1 file changed, 27 insertions(+), 37 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 9aa499a76159..7f12c280fd30 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -351,14 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif -static void lockd_unregister_notifiers(void) -{ - unregister_inetaddr_notifier(&lockd_inetaddr_notifier); -#if IS_ENABLED(CONFIG_IPV6) - unregister_inet6addr_notifier(&lockd_inet6addr_notifier); -#endif -} - static int lockd_start_svc(struct svc_serv *serv) { int error; @@ -450,6 +442,27 @@ static int lockd_create_svc(void) return 0; } +static void lockd_put(void) +{ + if (WARN(nlmsvc_users <= 0, "lockd_down: no users!\n")) + return; + if (--nlmsvc_users) + return; + + unregister_inetaddr_notifier(&lockd_inetaddr_notifier); +#if IS_ENABLED(CONFIG_IPV6) + unregister_inet6addr_notifier(&lockd_inet6addr_notifier); +#endif + + if (nlmsvc_task) { + kthread_stop(nlmsvc_task); + dprintk("lockd_down: service stopped\n"); + nlmsvc_task = NULL; + } + nlmsvc_serv = NULL; + dprintk("lockd_down: service destroyed\n"); +} + /* * Bring up the lockd process if it's not already up. */ @@ -461,21 +474,16 @@ int lockd_up(struct net *net, const struct cred *cred) error = lockd_create_svc(); if (error) - goto err_create; + goto err; + nlmsvc_users++; error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) { - goto err_put; + lockd_put(); + goto err; } - nlmsvc_users++; -err_put: - if (nlmsvc_users == 0) { - lockd_unregister_notifiers(); - kthread_stop(nlmsvc_task); - nlmsvc_serv = NULL; - } -err_create: +err: mutex_unlock(&nlmsvc_mutex); return error; } @@ -489,25 +497,7 @@ lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); lockd_down_net(nlmsvc_serv, net); - if (nlmsvc_users) { - if (--nlmsvc_users) - goto out; - } else { - printk(KERN_ERR "lockd_down: no users! task=%p\n", - nlmsvc_task); - BUG(); - } - - if (!nlmsvc_task) { - printk(KERN_ERR "lockd_down: no lockd running.\n"); - BUG(); - } - lockd_unregister_notifiers(); - kthread_stop(nlmsvc_task); - dprintk("lockd_down: service destroyed\n"); - nlmsvc_serv = NULL; - nlmsvc_task = NULL; -out: + lockd_put(); mutex_unlock(&nlmsvc_mutex); } EXPORT_SYMBOL_GPL(lockd_down); From 9fe19a48a3bf62d57b0f2110e5b286c31e7996cd Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0733/1565] lockd: rename lockd_create_svc() to lockd_get() [ Upstream commit ecd3ad68d2c6d3ae178a63a2d9a02c392904fd36 ] lockd_create_svc() already does an svc_get() if the service already exists, so it is more like a "get" than a "create". So: - Move the increment of nlmsvc_users into the function as well - rename to lockd_get(). It is now the inverse of lockd_put(). Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 7f12c280fd30..1a7c11118b32 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -396,16 +396,14 @@ static const struct svc_serv_ops lockd_sv_ops = { .svo_enqueue_xprt = svc_xprt_do_enqueue, }; -static int lockd_create_svc(void) +static int lockd_get(void) { struct svc_serv *serv; int error; - /* - * Check whether we're already up and running. - */ if (nlmsvc_serv) { svc_get(nlmsvc_serv); + nlmsvc_users++; return 0; } @@ -439,6 +437,7 @@ static int lockd_create_svc(void) register_inet6addr_notifier(&lockd_inet6addr_notifier); #endif dprintk("lockd_up: service created\n"); + nlmsvc_users++; return 0; } @@ -472,10 +471,9 @@ int lockd_up(struct net *net, const struct cred *cred) mutex_lock(&nlmsvc_mutex); - error = lockd_create_svc(); + error = lockd_get(); if (error) goto err; - nlmsvc_users++; error = lockd_up_net(nlmsvc_serv, net, cred); if (error < 0) { From 74a0e37a20995f48af61be4fc24fa19175540b92 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0734/1565] SUNRPC: move the pool_map definitions (back) into svc.c [ Upstream commit cf0e124e0a489944d08fcc3c694d2b234d2cc658 ] These definitions are not used outside of svc.c, and there is no evidence that they ever have been. So move them into svc.c and make the declarations 'static'. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 25 ------------------------- net/sunrpc/svc.c | 31 +++++++++++++++++++++++++------ 2 files changed, 25 insertions(+), 31 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 165719a6229a..89e9d00af601 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -497,29 +497,6 @@ struct svc_procedure { const char * pc_name; /* for display */ }; -/* - * Mode for mapping cpus to pools. - */ -enum { - SVC_POOL_AUTO = -1, /* choose one of the others */ - SVC_POOL_GLOBAL, /* no mapping, just a single global pool - * (legacy & UP mode) */ - SVC_POOL_PERCPU, /* one pool per cpu */ - SVC_POOL_PERNODE /* one pool per numa node */ -}; - -struct svc_pool_map { - int count; /* How many svc_servs use us */ - int mode; /* Note: int not enum to avoid - * warnings about "enumeration value - * not handled in switch" */ - unsigned int npools; - unsigned int *pool_to; /* maps pool id to cpu or node */ - unsigned int *to_pool; /* maps cpu or node to pool id */ -}; - -extern struct svc_pool_map svc_pool_map; - /* * Function prototypes. */ @@ -536,8 +513,6 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); -unsigned int svc_pool_map_get(void); -void svc_pool_map_put(void); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index da5f008b8d27..c681ac1c9d56 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -39,14 +39,35 @@ static void svc_unregister(const struct svc_serv *serv, struct net *net); #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL +/* + * Mode for mapping cpus to pools. + */ +enum { + SVC_POOL_AUTO = -1, /* choose one of the others */ + SVC_POOL_GLOBAL, /* no mapping, just a single global pool + * (legacy & UP mode) */ + SVC_POOL_PERCPU, /* one pool per cpu */ + SVC_POOL_PERNODE /* one pool per numa node */ +}; + /* * Structure for mapping cpus to pools and vice versa. * Setup once during sunrpc initialisation. */ -struct svc_pool_map svc_pool_map = { + +struct svc_pool_map { + int count; /* How many svc_servs use us */ + int mode; /* Note: int not enum to avoid + * warnings about "enumeration value + * not handled in switch" */ + unsigned int npools; + unsigned int *pool_to; /* maps pool id to cpu or node */ + unsigned int *to_pool; /* maps cpu or node to pool id */ +}; + +static struct svc_pool_map svc_pool_map = { .mode = SVC_POOL_DEFAULT }; -EXPORT_SYMBOL_GPL(svc_pool_map); static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */ @@ -220,7 +241,7 @@ svc_pool_map_init_pernode(struct svc_pool_map *m) * vice versa). Initialise the map if we're the first user. * Returns the number of pools. */ -unsigned int +static unsigned int svc_pool_map_get(void) { struct svc_pool_map *m = &svc_pool_map; @@ -255,7 +276,6 @@ svc_pool_map_get(void) mutex_unlock(&svc_pool_map_mutex); return m->npools; } -EXPORT_SYMBOL_GPL(svc_pool_map_get); /* * Drop a reference to the global map of cpus to pools. @@ -264,7 +284,7 @@ EXPORT_SYMBOL_GPL(svc_pool_map_get); * mode using the pool_mode module option without * rebooting or re-loading sunrpc.ko. */ -void +static void svc_pool_map_put(void) { struct svc_pool_map *m = &svc_pool_map; @@ -281,7 +301,6 @@ svc_pool_map_put(void) mutex_unlock(&svc_pool_map_mutex); } -EXPORT_SYMBOL_GPL(svc_pool_map_put); static int svc_pool_map_get_node(unsigned int pidx) { From deeda24a6762d10c3364f34680afa7f6cb48bcc1 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0735/1565] SUNRPC: always treat sv_nrpools==1 as "not pooled" [ Upstream commit 93aa619eb0b42eec2f3a9b4d9db41f5095390aec ] Currently 'pooled' services hold a reference on the pool_map, and 'unpooled' services do not. svc_destroy() uses the presence of ->svo_function (via svc_serv_is_pooled()) to determine if the reference should be dropped. There is no direct correlation between being pooled and the use of svo_function, though in practice, lockd is the only non-pooled service, and the only one not to use svo_function. This is untidy and would cause problems if we changed lockd to use svc_set_num_threads(), which requires the use of ->svo_function. So change the test for "is the service pooled" to "is sv_nrpools > 1". This means that when svc_pool_map_get() returns 1, it must NOT take a reference to the pool. We discard svc_serv_is_pooled(), and test sv_nrpools directly. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- net/sunrpc/svc.c | 54 ++++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index c681ac1c9d56..ceccd4ae5f79 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -35,8 +35,6 @@ static void svc_unregister(const struct svc_serv *serv, struct net *net); -#define svc_serv_is_pooled(serv) ((serv)->sv_ops->svo_function) - #define SVC_POOL_DEFAULT SVC_POOL_GLOBAL /* @@ -238,8 +236,10 @@ svc_pool_map_init_pernode(struct svc_pool_map *m) /* * Add a reference to the global map of cpus to pools (and - * vice versa). Initialise the map if we're the first user. - * Returns the number of pools. + * vice versa) if pools are in use. + * Initialise the map if we're the first user. + * Returns the number of pools. If this is '1', no reference + * was taken. */ static unsigned int svc_pool_map_get(void) @@ -251,6 +251,7 @@ svc_pool_map_get(void) if (m->count++) { mutex_unlock(&svc_pool_map_mutex); + WARN_ON_ONCE(m->npools <= 1); return m->npools; } @@ -266,29 +267,36 @@ svc_pool_map_get(void) break; } - if (npools < 0) { + if (npools <= 0) { /* default, or memory allocation failure */ npools = 1; m->mode = SVC_POOL_GLOBAL; } m->npools = npools; + if (npools == 1) + /* service is unpooled, so doesn't hold a reference */ + m->count--; + mutex_unlock(&svc_pool_map_mutex); - return m->npools; + return npools; } /* - * Drop a reference to the global map of cpus to pools. + * Drop a reference to the global map of cpus to pools, if + * pools were in use, i.e. if npools > 1. * When the last reference is dropped, the map data is * freed; this allows the sysadmin to change the pool * mode using the pool_mode module option without * rebooting or re-loading sunrpc.ko. */ static void -svc_pool_map_put(void) +svc_pool_map_put(int npools) { struct svc_pool_map *m = &svc_pool_map; + if (npools <= 1) + return; mutex_lock(&svc_pool_map_mutex); if (!--m->count) { @@ -357,21 +365,18 @@ svc_pool_for_cpu(struct svc_serv *serv, int cpu) struct svc_pool_map *m = &svc_pool_map; unsigned int pidx = 0; - /* - * An uninitialised map happens in a pure client when - * lockd is brought up, so silently treat it the - * same as SVC_POOL_GLOBAL. - */ - if (svc_serv_is_pooled(serv)) { - switch (m->mode) { - case SVC_POOL_PERCPU: - pidx = m->to_pool[cpu]; - break; - case SVC_POOL_PERNODE: - pidx = m->to_pool[cpu_to_node(cpu)]; - break; - } + if (serv->sv_nrpools <= 1) + return serv->sv_pools; + + switch (m->mode) { + case SVC_POOL_PERCPU: + pidx = m->to_pool[cpu]; + break; + case SVC_POOL_PERNODE: + pidx = m->to_pool[cpu_to_node(cpu)]; + break; } + return &serv->sv_pools[pidx % serv->sv_nrpools]; } @@ -524,7 +529,7 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, goto out_err; return serv; out_err: - svc_pool_map_put(); + svc_pool_map_put(npools); return NULL; } EXPORT_SYMBOL_GPL(svc_create_pooled); @@ -559,8 +564,7 @@ svc_destroy(struct kref *ref) cache_clean_deferred(serv); - if (svc_serv_is_pooled(serv)) - svc_pool_map_put(); + svc_pool_map_put(serv->sv_nrpools); kfree(serv->sv_pools); kfree(serv); From 948e4664cc3705ea73d5bf9bdaa2b5ea17a22223 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0736/1565] lockd: use svc_set_num_threads() for thread start and stop [ Upstream commit 6b044fbaab02292fedb17565dbb3f2528083b169 ] svc_set_num_threads() does everything that lockd_start_svc() does, except set sv_maxconn. It also (when passed 0) finds the threads and stops them with kthread_stop(). So move the setting for sv_maxconn, and use svc_set_num_thread() We now don't need nlmsvc_task. Now that we use svc_set_num_threads() it makes sense to set svo_module. This request that the thread exists with module_put_and_exit(). Also fix the documentation for svo_module to make this explicit. svc_prepare_thread is now only used where it is defined, so it can be made static. Signed-off-by: NeilBrown [ cel: address merge conflict with fd2468fa1301 ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 58 ++++++-------------------------------- include/linux/sunrpc/svc.h | 6 ++-- net/sunrpc/svc.c | 3 +- 3 files changed, 12 insertions(+), 55 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 1a7c11118b32..0475c5a5d061 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -55,7 +55,6 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; static struct svc_serv *nlmsvc_serv; -static struct task_struct *nlmsvc_task; unsigned long nlmsvc_timeout; unsigned int lockd_net_id; @@ -186,7 +185,7 @@ lockd(void *vrqstp) svc_exit_thread(rqstp); - return 0; + module_put_and_kthread_exit(0); } static int create_lockd_listener(struct svc_serv *serv, const char *name, @@ -292,8 +291,8 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) __func__, net->ns.inum); } } else { - pr_err("%s: no users! task=%p, net=%x\n", - __func__, nlmsvc_task, net->ns.inum); + pr_err("%s: no users! net=%x\n", + __func__, net->ns.inum); BUG(); } } @@ -351,49 +350,11 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif -static int lockd_start_svc(struct svc_serv *serv) -{ - int error; - struct svc_rqst *rqst; - - /* - * Create the kernel thread and wait for it to start. - */ - rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); - if (IS_ERR(rqst)) { - error = PTR_ERR(rqst); - printk(KERN_WARNING - "lockd_up: svc_rqst allocation failed, error=%d\n", - error); - goto out_rqst; - } - - svc_sock_update_bufs(serv); - serv->sv_maxconn = nlm_max_connections; - - nlmsvc_task = kthread_create(lockd, rqst, "%s", serv->sv_name); - if (IS_ERR(nlmsvc_task)) { - error = PTR_ERR(nlmsvc_task); - printk(KERN_WARNING - "lockd_up: kthread_run failed, error=%d\n", error); - goto out_task; - } - rqst->rq_task = nlmsvc_task; - wake_up_process(nlmsvc_task); - - dprintk("lockd_up: service started\n"); - return 0; - -out_task: - svc_exit_thread(rqst); - nlmsvc_task = NULL; -out_rqst: - return error; -} - static const struct svc_serv_ops lockd_sv_ops = { .svo_shutdown = svc_rpcb_cleanup, + .svo_function = lockd, .svo_enqueue_xprt = svc_xprt_do_enqueue, + .svo_module = THIS_MODULE, }; static int lockd_get(void) @@ -425,7 +386,8 @@ static int lockd_get(void) return -ENOMEM; } - error = lockd_start_svc(serv); + serv->sv_maxconn = nlm_max_connections; + error = svc_set_num_threads(serv, NULL, 1); /* The thread now holds the only reference */ svc_put(serv); if (error < 0) @@ -453,11 +415,7 @@ static void lockd_put(void) unregister_inet6addr_notifier(&lockd_inet6addr_notifier); #endif - if (nlmsvc_task) { - kthread_stop(nlmsvc_task); - dprintk("lockd_down: service stopped\n"); - nlmsvc_task = NULL; - } + svc_set_num_threads(nlmsvc_serv, NULL, 0); nlmsvc_serv = NULL; dprintk("lockd_down: service destroyed\n"); } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 89e9d00af601..f116141ea64d 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -64,7 +64,9 @@ struct svc_serv_ops { /* queue up a transport for servicing */ void (*svo_enqueue_xprt)(struct svc_xprt *); - /* optional module to count when adding threads (pooled svcs only) */ + /* optional module to count when adding threads. + * Thread function must call module_put_and_kthread_exit() to exit. + */ struct module *svo_module; }; @@ -507,8 +509,6 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int, const struct svc_serv_ops *); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); -struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, - struct svc_pool *pool, int node); void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page); void svc_rqst_free(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index ceccd4ae5f79..d34d03b0bf76 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -650,7 +650,7 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) } EXPORT_SYMBOL_GPL(svc_rqst_alloc); -struct svc_rqst * +static struct svc_rqst * svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) { struct svc_rqst *rqstp; @@ -670,7 +670,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) spin_unlock_bh(&pool->sp_lock); return rqstp; } -EXPORT_SYMBOL_GPL(svc_prepare_thread); /* * Choose a pool in which to create a new thread, for svc_set_num_threads From e072a635c1ef9d17cf0102401be855eae7305b9f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 29 Nov 2021 15:51:25 +1100 Subject: [PATCH 0737/1565] NFS: switch the callback service back to non-pooled. [ Upstream commit 23a1a573c61ccb5e7829c1f5472d3e025293a031 ] Now that thread management is consistent there is no need for nfs-callback to use svc_create_pooled() as introduced in Commit df807fffaabd ("NFSv4.x/callback: Create the callback service through svc_create_pooled"). So switch back to svc_create(). If service pools were configured, but the number of threads were left at '1', nfs callback may not work reliably when svc_create_pooled() is used. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 422055a1092f..054cc1255fac 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -286,7 +286,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n", cb_info->users); - serv = svc_create_pooled(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); + serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); if (!serv) { printk(KERN_ERR "nfs_callback_create_svc: create service failed\n"); return ERR_PTR(-ENOMEM); From 0bc12c128940544d1a19de44c88d3b4e210704ca Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 30 Sep 2021 19:10:03 -0400 Subject: [PATCH 0738/1565] NFSD: Remove be32_to_cpu() from DRC hash function [ Upstream commit 7578b2f628db27281d3165af0aa862311883a858 ] Commit 7142b98d9fd7 ("nfsd: Clean up drc cache in preparation for global spinlock elimination"), billed as a clean-up, added be32_to_cpu() to the DRC hash function without explanation. That commit removed two comments that state that byte-swapping in the hash function is unnecessary without explaining whether there was a need for that change. On some Intel CPUs, the swab32 instruction is known to cause a CPU pipeline stall. be32_to_cpu() does not add extra randomness, since the hash multiplication is done /before/ shifting to the high-order bits of the result. As a micro-optimization, remove the unnecessary transform from the DRC hash function. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 6e0b6f3148dc..a4a69ab6ab28 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -87,7 +87,7 @@ nfsd_hashsize(unsigned int limit) static u32 nfsd_cache_hash(__be32 xid, struct nfsd_net *nn) { - return hash_32(be32_to_cpu(xid), nn->maskbits); + return hash_32((__force u32)xid, nn->maskbits); } static struct svc_cacherep * From 677fd67d8b804bbe2ef24f4404b03e8a42359723 Mon Sep 17 00:00:00 2001 From: Jiapeng Chong Date: Thu, 2 Dec 2021 16:35:42 +0800 Subject: [PATCH 0739/1565] NFSD: Fix inconsistent indenting [ Upstream commit 1e37d0e5bda45881eea1bec4b812def72c7d4aea ] Eliminate the follow smatch warning: fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent indenting. Reported-by: Abaci Robot Signed-off-by: Jiapeng Chong Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d9758232a398..638a626af18d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4812,8 +4812,8 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, return nfserr_resource; *p++ = htonl(NFS4_CONTENT_HOLE); - p = xdr_encode_hyper(p, read->rd_offset); - p = xdr_encode_hyper(p, count); + p = xdr_encode_hyper(p, read->rd_offset); + p = xdr_encode_hyper(p, count); *eof = (read->rd_offset + count) >= f_size; *maxcount = min_t(unsigned long, count, *maxcount); From e8f923e1e9fcc0832a12c8a2461792e5a9544c03 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 1 Dec 2021 10:58:14 +1100 Subject: [PATCH 0740/1565] NFSD: simplify per-net file cache management [ Upstream commit 1463b38e7cf34d4cc60f41daff459ad807b2e408 ] We currently have a 'laundrette' for closing cached files - a different work-item for each network-namespace. These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a list, and are freed using rcu. The list is not necessary as we have a per-namespace structure (struct nfsd_net) which can hold a link to the nfsd_fcache_disposal. The use of kfree_rcu is also unnecessary as the cache is cleaned of all files associated with a given namespace, and no new files can be added, before the nfsd_fcache_disposal is freed. So add a '->fcache_disposal' link to nfsd_net, and discard the list management and rcu usage. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 76 +++++++++------------------------------------ fs/nfsd/netns.h | 2 ++ 2 files changed, 17 insertions(+), 61 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 8cd7d5d6955a..b6ef8256c9c6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -44,12 +44,9 @@ struct nfsd_fcache_bucket { static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); struct nfsd_fcache_disposal { - struct list_head list; struct work_struct work; - struct net *net; spinlock_t lock; struct list_head freeme; - struct rcu_head rcu; }; static struct workqueue_struct *nfsd_filecache_wq __read_mostly; @@ -62,8 +59,6 @@ static long nfsd_file_lru_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; -static DEFINE_SPINLOCK(laundrette_lock); -static LIST_HEAD(laundrettes); static void nfsd_file_gc(void); @@ -366,19 +361,13 @@ nfsd_file_list_remove_disposal(struct list_head *dst, static void nfsd_file_list_add_disposal(struct list_head *files, struct net *net) { - struct nfsd_fcache_disposal *l; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal; - rcu_read_lock(); - list_for_each_entry_rcu(l, &laundrettes, list) { - if (l->net == net) { - spin_lock(&l->lock); - list_splice_tail_init(files, &l->freeme); - spin_unlock(&l->lock); - queue_work(nfsd_filecache_wq, &l->work); - break; - } - } - rcu_read_unlock(); + spin_lock(&l->lock); + list_splice_tail_init(files, &l->freeme); + spin_unlock(&l->lock); + queue_work(nfsd_filecache_wq, &l->work); } static void @@ -754,7 +743,7 @@ nfsd_file_cache_purge(struct net *net) } static struct nfsd_fcache_disposal * -nfsd_alloc_fcache_disposal(struct net *net) +nfsd_alloc_fcache_disposal(void) { struct nfsd_fcache_disposal *l; @@ -762,7 +751,6 @@ nfsd_alloc_fcache_disposal(struct net *net) if (!l) return NULL; INIT_WORK(&l->work, nfsd_file_delayed_close); - l->net = net; spin_lock_init(&l->lock); INIT_LIST_HEAD(&l->freeme); return l; @@ -771,61 +759,27 @@ nfsd_alloc_fcache_disposal(struct net *net) static void nfsd_free_fcache_disposal(struct nfsd_fcache_disposal *l) { - rcu_assign_pointer(l->net, NULL); cancel_work_sync(&l->work); nfsd_file_dispose_list(&l->freeme); - kfree_rcu(l, rcu); -} - -static void -nfsd_add_fcache_disposal(struct nfsd_fcache_disposal *l) -{ - spin_lock(&laundrette_lock); - list_add_tail_rcu(&l->list, &laundrettes); - spin_unlock(&laundrette_lock); -} - -static void -nfsd_del_fcache_disposal(struct nfsd_fcache_disposal *l) -{ - spin_lock(&laundrette_lock); - list_del_rcu(&l->list); - spin_unlock(&laundrette_lock); -} - -static int -nfsd_alloc_fcache_disposal_net(struct net *net) -{ - struct nfsd_fcache_disposal *l; - - l = nfsd_alloc_fcache_disposal(net); - if (!l) - return -ENOMEM; - nfsd_add_fcache_disposal(l); - return 0; + kfree(l); } static void nfsd_free_fcache_disposal_net(struct net *net) { - struct nfsd_fcache_disposal *l; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal; - rcu_read_lock(); - list_for_each_entry_rcu(l, &laundrettes, list) { - if (l->net != net) - continue; - nfsd_del_fcache_disposal(l); - rcu_read_unlock(); - nfsd_free_fcache_disposal(l); - return; - } - rcu_read_unlock(); + nfsd_free_fcache_disposal(l); } int nfsd_file_cache_start_net(struct net *net) { - return nfsd_alloc_fcache_disposal_net(net); + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + nn->fcache_disposal = nfsd_alloc_fcache_disposal(); + return nn->fcache_disposal ? 0 : -ENOMEM; } void diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 021acdc0d03b..9e8b77d2a3a4 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -185,6 +185,8 @@ struct nfsd_net { /* utsname taken from the process that starts the server */ char nfsd_name[UNX_MAXNODENAME+1]; + + struct nfsd_fcache_disposal *fcache_disposal; }; /* Simple check to find out if a given net was properly initialized */ From f0dbe05f6df2cc9d8d16c25e36f82a9d92492886 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 21 Oct 2021 12:11:45 -0400 Subject: [PATCH 0741/1565] NFSD: Combine XDR error tracepoints [ Upstream commit 70e94d757b3e1f46486d573729d84c8955c81dce ] Clean up: The garbage_args and cant_encode tracepoints report the same information as each other, so combine them into a single tracepoint class to reduce code duplication and slightly reduce the size of trace.o. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 28 +++++++--------------------- 1 file changed, 7 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index de245f433392..cba38e0b204b 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -46,7 +46,7 @@ rqstp->rq_xprt->xpt_remotelen); \ } while (0); -TRACE_EVENT(nfsd_garbage_args_err, +DECLARE_EVENT_CLASS(nfsd_xdr_err_class, TP_PROTO( const struct svc_rqst *rqstp ), @@ -68,27 +68,13 @@ TRACE_EVENT(nfsd_garbage_args_err, ) ); -TRACE_EVENT(nfsd_cant_encode_err, - TP_PROTO( - const struct svc_rqst *rqstp - ), - TP_ARGS(rqstp), - TP_STRUCT__entry( - NFSD_TRACE_PROC_ARG_FIELDS +#define DEFINE_NFSD_XDR_ERR_EVENT(name) \ +DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \ + TP_PROTO(const struct svc_rqst *rqstp), \ + TP_ARGS(rqstp)) - __field(u32, vers) - __field(u32, proc) - ), - TP_fast_assign( - NFSD_TRACE_PROC_ARG_ASSIGNMENTS - - __entry->vers = rqstp->rq_vers; - __entry->proc = rqstp->rq_proc; - ), - TP_printk("xid=0x%08x vers=%u proc=%u", - __entry->xid, __entry->vers, __entry->proc - ) -); +DEFINE_NFSD_XDR_ERR_EVENT(garbage_args); +DEFINE_NFSD_XDR_ERR_EVENT(cant_encode); #define show_nfsd_may_flags(x) \ __print_flags(x, "|", \ From 5a0710a6b40a44b96ec64d2cec70d030cf018a7d Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 7 Dec 2021 17:32:21 -0500 Subject: [PATCH 0742/1565] nfsd: improve stateid access bitmask documentation [ Upstream commit 3dcd1d8aab00c5d3a0a3725253c86440b1a0f5a7 ] The use of the bitmaps is confusing. Add a cross-reference to make it easier to find the existing comment. Add an updated reference with URL to make it quicker to look up. And a bit more editorializing about the value of this. Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 14 ++++++++++---- fs/nfsd/state.h | 4 ++++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0b39ed1568d4..3b8c5f228397 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -360,11 +360,13 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { * st_{access,deny}_bmap field of the stateid, in order to track not * only what share bits are currently in force, but also what * combinations of share bits previous opens have used. This allows us - * to enforce the recommendation of rfc 3530 14.2.19 that the server - * return an error if the client attempt to downgrade to a combination - * of share bits not explicable by closing some of its previous opens. + * to enforce the recommendation in + * https://datatracker.ietf.org/doc/html/rfc7530#section-16.19.4 that + * the server return an error if the client attempt to downgrade to a + * combination of share bits not explicable by closing some of its + * previous opens. * - * XXX: This enforcement is actually incomplete, since we don't keep + * This enforcement is arguably incomplete, since we don't keep * track of access/deny bit combinations; so, e.g., we allow: * * OPEN allow read, deny write @@ -372,6 +374,10 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { * DOWNGRADE allow read, deny none * * which we should reject. + * + * But you could also argue that our current code is already overkill, + * since it only exists to return NFS4ERR_INVAL on incorrect client + * behavior. */ static unsigned int bmap_to_share_mode(unsigned long bmap) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e73bdbb1634a..6eb3c7157214 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -568,6 +568,10 @@ struct nfs4_ol_stateid { struct list_head st_locks; struct nfs4_stateowner *st_stateowner; struct nfs4_clnt_odstate *st_clnt_odstate; +/* + * These bitmasks use 3 separate bits for READ, ALLOW, and BOTH; see the + * comment above bmap_to_share_mode() for explanation: + */ unsigned char st_access_bmap; unsigned char st_deny_bmap; struct nfs4_ol_stateid *st_openstp; From bf5e7e1fa1dbd50c986530c4408881ce991ef4b0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 13 Dec 2021 10:20:45 -0500 Subject: [PATCH 0743/1565] NFSD: De-duplicate nfsd4_decode_bitmap4() [ Upstream commit cd2e999c7c394ae916d8be741418b3c6c1dddea8 ] Clean up. Trond points out that xdr_stream_decode_uint32_array() does the same thing as nfsd4_decode_bitmap4(). Suggested-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 638a626af18d..adf97d72bda8 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -277,21 +277,10 @@ nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) static __be32 nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) { - u32 i, count; - __be32 *p; + ssize_t status; - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - /* request sanity */ - if (count > 1000) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, count << 2); - if (!p) - return nfserr_bad_xdr; - for (i = 0; i < bmlen; i++) - bmval[i] = (i < count) ? be32_to_cpup(p++) : 0; - - return nfs_ok; + status = xdr_stream_decode_uint32_array(argp->xdr, bmval, bmlen); + return status == -EBADMSG ? nfserr_bad_xdr : nfs_ok; } static __be32 From 535204ecaed0b75bcba00280101e2e90ab5858b9 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Thu, 16 Dec 2021 12:20:13 -0500 Subject: [PATCH 0744/1565] nfs: block notification on fs with its own ->lock [ Upstream commit 40595cdc93edf4110c0f0c0b06f8d82008f23929 ] NFSv4.1 supports an optional lock notification feature which notifies the client when a lock comes available. (Normally NFSv4 clients just poll for locks if necessary.) To make that work, we need to request a blocking lock from the filesystem. We turned that off for NFS in commit f657f8eef3ff ("nfs: don't atempt blocking locks on nfs reexports") [sic] because it actually blocks the nfsd thread while waiting for the lock. Thanks to Vasily Averin for pointing out that NFS isn't the only filesystem with that problem. Any filesystem that leaves ->lock NULL will use posix_lock_file(), which does the right thing. Simplest is just to assume that any filesystem that defines its own ->lock is not safe to request a blocking lock from. So, this patch mostly reverts commit f657f8eef3ff ("nfs: don't atempt blocking locks on nfs reexports") [sic] and commit b840be2f00c0 ("lockd: don't attempt blocking locks on nfs reexports"), and instead uses a check of ->lock (Vasily's suggestion) to decide whether to support blocking lock notifications on a given filesystem. Also add a little documentation. Perhaps someday we could add back an export flag later to allow filesystems with "good" ->lock methods to support blocking lock notifications. Reported-by: Vasily Averin Signed-off-by: J. Bruce Fields [ cel: Description rewritten to address checkpatch nits ] [ cel: Fixed warning when SUNRPC debugging is disabled ] [ cel: Fixed NULL check ] Signed-off-by: Chuck Lever Reviewed-by: Vasily Averin Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svclock.c | 6 ++++-- fs/nfs/export.c | 2 +- fs/nfsd/nfs4state.c | 18 ++++++++++++------ include/linux/exportfs.h | 2 -- include/linux/lockd/lockd.h | 9 +++++++-- 5 files changed, 24 insertions(+), 13 deletions(-) diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index e9b85d8fd5fe..cb3658ab9b7a 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -470,8 +470,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_host *host, struct nlm_lock *lock, int wait, struct nlm_cookie *cookie, int reclaim) { - struct nlm_block *block = NULL; +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) struct inode *inode = nlmsvc_file_inode(file); +#endif + struct nlm_block *block = NULL; int error; int mode; int async_block = 0; @@ -484,7 +486,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, (long long)lock->fl.fl_end, wait); - if (inode->i_sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS) { + if (nlmsvc_file_file(file)->f_op->lock) { async_block = wait; wait = 0; } diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 40beac65d135..b347e3ce0cc8 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -184,5 +184,5 @@ const struct export_operations nfs_export_ops = { .fetch_iversion = nfs_fetch_iversion, .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR|EXPORT_OP_SYNC_LOCKS, + EXPORT_OP_NOATOMIC_ATTR, }; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3b8c5f228397..36ae55fbfbc6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6884,7 +6884,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_blocked_lock *nbl = NULL; struct file_lock *file_lock = NULL; struct file_lock *conflock = NULL; - struct super_block *sb; __be32 status = 0; int lkflg; int err; @@ -6906,7 +6905,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, dprintk("NFSD: nfsd4_lock: permission denied!\n"); return status; } - sb = cstate->current_fh.fh_dentry->d_sb; if (lock->lk_is_new) { if (nfsd4_has_session(cstate)) @@ -6958,8 +6956,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: - if (nfsd4_has_session(cstate) && - !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) + if (nfsd4_has_session(cstate)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_READ_LT: @@ -6971,8 +6968,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fl_type = F_RDLCK; break; case NFS4_WRITEW_LT: - if (nfsd4_has_session(cstate) && - !(sb->s_export_op->flags & EXPORT_OP_SYNC_LOCKS)) + if (nfsd4_has_session(cstate)) fl_flags |= FL_SLEEP; fallthrough; case NFS4_WRITE_LT: @@ -6993,6 +6989,16 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; } + /* + * Most filesystems with their own ->lock operations will block + * the nfsd thread waiting to acquire the lock. That leads to + * deadlocks (we don't want every nfsd thread tied up waiting + * for file locks), so don't attempt blocking lock notifications + * on those filesystems: + */ + if (nf->nf_file->f_op->lock) + fl_flags &= ~FL_SLEEP; + nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn); if (!nbl) { dprintk("NFSD: %s: unable to allocate block!\n", __func__); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 3260fe714846..fe848901fcc3 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,8 +221,6 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ -#define EXPORT_OP_SYNC_LOCKS (0x20) /* Filesystem can't do - asychronous blocking locks */ unsigned long flags; }; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index c4ae6506b8b3..fcef192e5e45 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -303,10 +303,15 @@ void nlmsvc_invalidate_all(void); int nlmsvc_unlock_all_by_sb(struct super_block *sb); int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr); +static inline struct file *nlmsvc_file_file(struct nlm_file *file) +{ + return file->f_file[O_RDONLY] ? + file->f_file[O_RDONLY] : file->f_file[O_WRONLY]; +} + static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { - return locks_inode(file->f_file[O_RDONLY] ? - file->f_file[O_RDONLY] : file->f_file[O_WRONLY]); + return locks_inode(nlmsvc_file_file(file)); } static inline int __nlm_privileged_request4(const struct sockaddr *sap) From fab02e979949d3cf5122475db0d28161256773d5 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Fri, 17 Dec 2021 09:49:39 +0300 Subject: [PATCH 0745/1565] nfsd4: add refcount for nfsd4_blocked_lock [ Upstream commit 47446d74f1707049067fee038507cdffda805631 ] nbl allocated in nfsd4_lock can be released by a several ways: directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs command RELEASE_LOCKOWNER or via nfsd4_callback. This structure should be refcounted to be used and released correctly in all these cases. Refcount is initialized to 1 during allocation and is incremented when nbl is added into nbl_list/nbl_lru lists. Usually nbl is linked into both lists together, so only one refcount is used for both lists. However nfsd4_lock() should keep in mind that nbl can be present in one of lists only. This can happen if nbl was handled already by nfs4_laundromat/nfsd4_callback/etc. Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED, because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc. Refcount is not changed in find_blocked_lock() because of it reuses counter released after removing nbl from lists. Signed-off-by: Vasily Averin Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 25 ++++++++++++++++++++++--- fs/nfsd/state.h | 1 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 36ae55fbfbc6..4161a4854c43 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -246,6 +246,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh, list_for_each_entry(cur, &lo->lo_blocked, nbl_list) { if (fh_match(fh, &cur->nbl_fh)) { list_del_init(&cur->nbl_list); + WARN_ON(list_empty(&cur->nbl_lru)); list_del_init(&cur->nbl_lru); found = cur; break; @@ -271,6 +272,7 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, INIT_LIST_HEAD(&nbl->nbl_lru); fh_copy_shallow(&nbl->nbl_fh, fh); locks_init_lock(&nbl->nbl_lock); + kref_init(&nbl->nbl_kref); nfsd4_init_cb(&nbl->nbl_cb, lo->lo_owner.so_client, &nfsd4_cb_notify_lock_ops, NFSPROC4_CLNT_CB_NOTIFY_LOCK); @@ -279,12 +281,21 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, return nbl; } +static void +free_nbl(struct kref *kref) +{ + struct nfsd4_blocked_lock *nbl; + + nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); + kfree(nbl); +} + static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); locks_release_private(&nbl->nbl_lock); - kfree(nbl); + kref_put(&nbl->nbl_kref, free_nbl); } static void @@ -302,6 +313,7 @@ remove_blocked_locks(struct nfs4_lockowner *lo) struct nfsd4_blocked_lock, nbl_list); list_del_init(&nbl->nbl_list); + WARN_ON(list_empty(&nbl->nbl_lru)); list_move(&nbl->nbl_lru, &reaplist); } spin_unlock(&nn->blocked_locks_lock); @@ -7029,6 +7041,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_lock(&nn->blocked_locks_lock); list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked); list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru); + kref_get(&nbl->nbl_kref); spin_unlock(&nn->blocked_locks_lock); } @@ -7041,6 +7054,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, nn->somebody_reclaimed = true; break; case FILE_LOCK_DEFERRED: + kref_put(&nbl->nbl_kref, free_nbl); nbl = NULL; fallthrough; case -EAGAIN: /* conflock holds conflicting lock */ @@ -7061,8 +7075,13 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, /* dequeue it if we queued it before */ if (fl_flags & FL_SLEEP) { spin_lock(&nn->blocked_locks_lock); - list_del_init(&nbl->nbl_list); - list_del_init(&nbl->nbl_lru); + if (!list_empty(&nbl->nbl_list) && + !list_empty(&nbl->nbl_lru)) { + list_del_init(&nbl->nbl_list); + list_del_init(&nbl->nbl_lru); + kref_put(&nbl->nbl_kref, free_nbl); + } + /* nbl can use one of lists to be linked to reaplist */ spin_unlock(&nn->blocked_locks_lock); } free_blocked_lock(nbl); diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 6eb3c7157214..95457cfd37fc 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -633,6 +633,7 @@ struct nfsd4_blocked_lock { struct file_lock nbl_lock; struct knfsd_fh nbl_fh; struct nfsd4_callback nbl_cb; + struct kref nbl_kref; }; struct nfsd4_compound_state; From 9175fcf39c20ffd2fe3084dd671827b96c6422f3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 21 Dec 2021 11:52:06 -0500 Subject: [PATCH 0746/1565] NFSD: Fix zero-length NFSv3 WRITEs [ Upstream commit 6a2f774424bfdcc2df3e17de0cefe74a4269cad5 ] The Linux NFS server currently responds to a zero-length NFSv3 WRITE request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE with NFS4_OK and count of zero. RFC 1813 says of the WRITE procedure's @count argument: count The number of bytes of data to be written. If count is 0, the WRITE will succeed and return a count of 0, barring errors due to permissions checking. RFC 8881 has similar language for NFSv4, though NFSv4 removed the explicit @count argument because that value is already contained in the opaque payload array. The synthetic client pynfs's WRT4 and WRT15 tests do emit zero- length WRITEs to exercise this spec requirement. Commit fdec6114ee1f ("nfsd4: zero-length WRITE should succeed") addressed the same problem there with the same fix. But interestingly the Linux NFS client does not appear to emit zero- length WRITEs, instead squelching them. I'm not aware of a test that can generate such WRITEs for NFSv3, so I wrote a naive C program to generate a zero-length WRITE and test this fix. Fixes: 8154ef2776aa ("NFSD: Clean up legacy NFS WRITE argument XDR decoders") Reported-by: Trond Myklebust Signed-off-by: Chuck Lever Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 6 +----- fs/nfsd/nfsproc.c | 5 ----- 2 files changed, 1 insertion(+), 10 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d26271c60ab4..1515c32e08db 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -207,15 +207,11 @@ nfsd3_proc_write(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; nvecs = svc_fill_write_vector(rqstp, &argp->payload); - if (!nvecs) { - resp->status = nfserr_io; - goto out; - } + resp->status = nfsd_write(rqstp, &resp->fh, argp->offset, rqstp->rq_vec, nvecs, &cnt, resp->committed, resp->verf); resp->count = cnt; -out: return rpc_success; } diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 8ee329bc3815..0a2bab7ef33c 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -235,10 +235,6 @@ nfsd_proc_write(struct svc_rqst *rqstp) argp->len, argp->offset); nvecs = svc_fill_write_vector(rqstp, &argp->payload); - if (!nvecs) { - resp->status = nfserr_io; - goto out; - } resp->status = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), argp->offset, rqstp->rq_vec, nvecs, @@ -247,7 +243,6 @@ nfsd_proc_write(struct svc_rqst *rqstp) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) return rpc_drop_reply; -out: return rpc_success; } From f12557372b7685c22d77458ca405f5b5264b668f Mon Sep 17 00:00:00 2001 From: Peng Tao Date: Sat, 18 Dec 2021 20:37:54 -0500 Subject: [PATCH 0747/1565] nfsd: map EBADF [ Upstream commit b3d0db706c77d02055910fcfe2f6eb5155ff9d5e ] Now that we have open file cache, it is possible that another client deletes the file and DP will not know about it. Then IO to MDS would fail with BADSTATEID and knfsd would start state recovery, which should fail as well and then nfs read/write will fail with EBADF. And it triggers a WARN() in nfserrno(). -----------[ cut here ]------------ WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]() nfsd: non-standard errno: -9 modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt pata_acpi floppy CPU: 0 PID: 13529 Comm: nfsd Tainted: G W 4.1.5-00307-g6e6579b #7 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936 0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48 all Trace: [] dump_stack+0x45/0x57 [] warn_slowpath_common+0x8a/0xc0 [] warn_slowpath_fmt+0x55/0x70 [] ? splice_direct_to_actor+0x148/0x230 [] ? fsid_source+0x60/0x60 [nfsd] [] nfserrno+0x58/0x70 [nfsd] [] nfsd_finish_read+0x97/0xb0 [nfsd] [] nfsd_splice_read+0x76/0xa0 [nfsd] [] nfsd_read+0xc1/0xd0 [nfsd] [] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc] [] nfsd3_proc_read+0xba/0x150 [nfsd] [] nfsd_dispatch+0xc3/0x210 [nfsd] [] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc] [] svc_process_common+0x453/0x6f0 [sunrpc] [] svc_process+0x113/0x1b0 [sunrpc] [] nfsd+0xff/0x170 [nfsd] [] ? nfsd_destroy+0x80/0x80 [nfsd] [] kthread+0xd8/0xf0 [] ? kthread_create_on_node+0x1b0/0x1b0 [] ret_from_fork+0x42/0x70 [] ? kthread_create_on_node+0x1b0/0x1b0 Signed-off-by: Peng Tao Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 0a2bab7ef33c..de4b97cdbd2b 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -845,6 +845,7 @@ nfserrno (int errno) { nfserr_io, -EIO }, { nfserr_nxio, -ENXIO }, { nfserr_fbig, -E2BIG }, + { nfserr_stale, -EBADF }, { nfserr_acces, -EACCES }, { nfserr_exist, -EEXIST }, { nfserr_xdev, -EXDEV }, From 3128aa9c984d3eab63b07b39690bcfa7321b62c0 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Sat, 18 Dec 2021 20:37:55 -0500 Subject: [PATCH 0748/1565] nfsd: Add errno mapping for EREMOTEIO [ Upstream commit a2694e51f60c5a18c7e43d1a9feaa46d7f153e65 ] The NFS client can occasionally return EREMOTEIO when signalling issues with the server. ...map to NFSERR_IO. Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index de4b97cdbd2b..6ed13754290a 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -874,6 +874,7 @@ nfserrno (int errno) { nfserr_toosmall, -ETOOSMALL }, { nfserr_serverfault, -ESERVERFAULT }, { nfserr_serverfault, -ENFILE }, + { nfserr_io, -EREMOTEIO }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, { nfserr_no_grace, -ENOGRACE}, From 30282a70aac12b05b329d9a78f45ba847615b5e1 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Sat, 18 Dec 2021 20:37:56 -0500 Subject: [PATCH 0749/1565] nfsd: Retry once in nfsd_open on an -EOPENSTALE return [ Upstream commit 12bcbd40fd931472c7fc9cf3bfe66799ece93ed8 ] If we get back -EOPENSTALE from an NFSv4 open, then we either got some unhandled error or the inode we got back was not the same as the one associated with the dentry. We really have no recourse in that situation other than to retry the open, and if it fails to just return nfserr_stale back to the client. Signed-off-by: Jeff Layton Signed-off-by: Lance Shelton Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 1 + fs/nfsd/vfs.c | 10 +++++++++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 6ed13754290a..b8ddf21e70ea 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -875,6 +875,7 @@ nfserrno (int errno) { nfserr_serverfault, -ESERVERFAULT }, { nfserr_serverfault, -ENFILE }, { nfserr_io, -EREMOTEIO }, + { nfserr_stale, -EOPENSTALE }, { nfserr_io, -EUCLEAN }, { nfserr_perm, -ENOKEY }, { nfserr_no_grace, -ENOGRACE}, diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index deaf4a50550d..b343677b01ef 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -805,6 +805,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int may_flags, struct file **filp) { __be32 err; + bool retried = false; validate_process_creds(); /* @@ -820,9 +821,16 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, */ if (type == S_IFREG) may_flags |= NFSD_MAY_OWNER_OVERRIDE; +retry: err = fh_verify(rqstp, fhp, type, may_flags); - if (!err) + if (!err) { err = __nfsd_open(rqstp, fhp, type, may_flags, filp); + if (err == nfserr_stale && !retried) { + retried = true; + fh_put(fhp); + goto retry; + } + } validate_process_creds(); return err; } From aa9ea9ec295f70b646e00f842fd9bfad0c3052d6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Dec 2021 14:19:41 -0500 Subject: [PATCH 0750/1565] NFSD: Clean up nfsd_vfs_write() [ Upstream commit 33388b3aefefd4d83764dab8038cb54068161a44 ] The RWF_SYNC and !RWF_SYNC arms are now exactly alike except that the RWF_SYNC arm resets the boot verifier twice in a row. Fix that redundancy and de-duplicate the code. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index b343677b01ef..cef8435d76a6 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1023,22 +1023,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); - if (flags & RWF_SYNC) { - if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); - host_err = vfs_iter_write(file, &iter, &pos, flags); - if (host_err < 0) - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); - } else { - if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); - host_err = vfs_iter_write(file, &iter, &pos, flags); - } + if (verf) + nfsd_copy_boot_verifier(verf, + net_generic(SVC_NET(rqstp), + nfsd_net_id)); + host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), nfsd_net_id)); From 5a1575c02baa0fa13db84134d15b670540c83c38 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Dec 2021 12:41:32 -0500 Subject: [PATCH 0751/1565] NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id) [ Upstream commit fb7622c2dbd1aa41133a8c73e1137b833c074519 ] Since this pointer is used repeatedly, move it to a stack variable. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index cef8435d76a6..aba9d479d084 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -980,6 +980,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, unsigned long *cnt, int stable, __be32 *verf) { + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct file *file = nf->nf_file; struct super_block *sb = file_inode(file)->i_sb; struct svc_export *exp; @@ -1024,13 +1025,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); if (verf) - nfsd_copy_boot_verifier(verf, - net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn); host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); goto out_nfserr; } *cnt = host_err; @@ -1043,8 +1041,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); if (host_err < 0) - nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); } out_nfserr: From 45ef8b7aea36e6c179661e5da639a6193aa5cb2f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Dec 2021 14:26:03 -0500 Subject: [PATCH 0752/1565] NFSD: De-duplicate net_generic(nf->nf_net, nfsd_net_id) [ Upstream commit 2c445a0e72cb1fbfbdb7f9473c53556ee27c1d90 ] Since this pointer is used repeatedly, move it to a stack variable. [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index aba9d479d084..2e3b0bd560fc 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1129,6 +1129,7 @@ __be32 nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, unsigned long count, __be32 *verf) { + struct nfsd_net *nn; struct nfsd_file *nf; loff_t end = LLONG_MAX; __be32 err = nfserr_inval; @@ -1145,6 +1146,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); if (err) goto out; + nn = net_generic(nf->nf_net, nfsd_net_id); if (EX_ISSYNC(fhp->fh_export)) { errseq_t since = READ_ONCE(nf->nf_file->f_wb_err); int err2; @@ -1152,8 +1154,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); switch (err2) { case 0: - nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn); err2 = filemap_check_wb_err(nf->nf_file->f_mapping, since); err = nfserrno(err2); @@ -1162,13 +1163,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserr_notsupp; break; default: - nfsd_reset_boot_verifier(net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_reset_boot_verifier(nn); err = nfserrno(err2); } } else - nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, - nfsd_net_id)); + nfsd_copy_boot_verifier(verf, nn); nfsd_file_put(nf); out: From 306d2c1c080360fd466d244da5c77b6433a15c9e Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 18 Dec 2021 20:38:00 -0500 Subject: [PATCH 0753/1565] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() [ Upstream commit a2f4c3fa4db94ba44d32a72201927cfd132a8e82 ] Since a clone error commit can cause the boot verifier to change, we should trace those errors. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever [ cel: Addressed a checkpatch.pl splat in fs/nfsd/vfs.h ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/trace.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/vfs.c | 18 +++++++++++++++-- fs/nfsd/vfs.h | 3 ++- 4 files changed, 69 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index eaf2f95d059c..0f2025b7a641 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1104,7 +1104,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - status = nfsd4_clone_file_range(src, clone->cl_src_pos, + status = nfsd4_clone_file_range(rqstp, src, clone->cl_src_pos, dst, clone->cl_dst_pos, clone->cl_count, EX_ISSYNC(cstate->current_fh.fh_export)); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index cba38e0b204b..199b485c7717 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -400,6 +400,56 @@ TRACE_EVENT(nfsd_dirent, __entry->len, __get_str(name)) ) +DECLARE_EVENT_CLASS(nfsd_copy_err_class, + TP_PROTO(struct svc_rqst *rqstp, + struct svc_fh *src_fhp, + loff_t src_offset, + struct svc_fh *dst_fhp, + loff_t dst_offset, + u64 count, + int status), + TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, count, status), + TP_STRUCT__entry( + __field(u32, xid) + __field(u32, src_fh_hash) + __field(loff_t, src_offset) + __field(u32, dst_fh_hash) + __field(loff_t, dst_offset) + __field(u64, count) + __field(int, status) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->src_fh_hash = knfsd_fh_hash(&src_fhp->fh_handle); + __entry->src_offset = src_offset; + __entry->dst_fh_hash = knfsd_fh_hash(&dst_fhp->fh_handle); + __entry->dst_offset = dst_offset; + __entry->count = count; + __entry->status = status; + ), + TP_printk("xid=0x%08x src_fh_hash=0x%08x src_offset=%lld " + "dst_fh_hash=0x%08x dst_offset=%lld " + "count=%llu status=%d", + __entry->xid, __entry->src_fh_hash, __entry->src_offset, + __entry->dst_fh_hash, __entry->dst_offset, + (unsigned long long)__entry->count, + __entry->status) +) + +#define DEFINE_NFSD_COPY_ERR_EVENT(name) \ +DEFINE_EVENT(nfsd_copy_err_class, nfsd_##name, \ + TP_PROTO(struct svc_rqst *rqstp, \ + struct svc_fh *src_fhp, \ + loff_t src_offset, \ + struct svc_fh *dst_fhp, \ + loff_t dst_offset, \ + u64 count, \ + int status), \ + TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, \ + count, status)) + +DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err); + #include "state.h" #include "filecache.h" #include "vfs.h" diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 2e3b0bd560fc..b1ce38c642cd 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -40,6 +40,7 @@ #include "../internal.h" #include "acl.h" #include "idmap.h" +#include "xdr4.h" #endif /* CONFIG_NFSD_V4 */ #include "nfsd.h" @@ -530,8 +531,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif -__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, - struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync) +static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp) +{ + return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate; +} + +__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, + struct nfsd_file *nf_src, u64 src_pos, + struct nfsd_file *nf_dst, u64 dst_pos, + u64 count, bool sync) { struct file *src = nf_src->nf_file; struct file *dst = nf_dst->nf_file; @@ -558,6 +566,12 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, if (!status) status = commit_inode_metadata(file_inode(src)); if (status < 0) { + trace_nfsd_clone_file_range_err(rqstp, + &nfsd4_get_cstate(rqstp)->save_fh, + src_pos, + &nfsd4_get_cstate(rqstp)->current_fh, + dst_pos, + count, status); nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net, nfsd_net_id)); ret = nfserrno(status); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index b21b76e6b9a8..9f56dcb22ff7 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -57,7 +57,8 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); -__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, +__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, + struct nfsd_file *nf_src, u64 src_pos, struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ From 083d44094ff12f17c6538420a1da12d3f08756e5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 30 Dec 2021 10:26:18 -0500 Subject: [PATCH 0754/1565] NFSD: Write verifier might go backwards [ Upstream commit cdc556600c0133575487cc69fb3128440b3c3e92 ] When vfs_iter_write() starts to fail because a file system is full, a bunch of writes can fail at once with ENOSPC. These writes repeatedly invoke nfsd_reset_boot_verifier() in quick succession. Ensure that the time it grabs doesn't go backwards due to an ntp adjustment going on at the same time. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8554bc7ff432..4d1d8aa6d7f9 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -363,7 +363,7 @@ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) { - ktime_get_real_ts64(&nn->nfssvc_boot); + ktime_get_raw_ts64(&nn->nfssvc_boot); } void nfsd_reset_boot_verifier(struct nfsd_net *nn) From e49677ff33f3e3bfb52427d7f9f4a7783d264719 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 29 Dec 2021 14:43:16 -0500 Subject: [PATCH 0755/1565] NFSD: Clean up the nfsd_net::nfssvc_boot field [ Upstream commit 91d2e9b56cf5c80f9efc530d494968369a8a0e0d ] There are two boot-time fields in struct nfsd_net: one called boot_time and one called nfssvc_boot. The latter is used only to form write verifiers, but its documenting comment declares: /* Time of server startup */ Since commit 27c438f53e79 ("nfsd: Support the server resetting the boot verifier"), this field can be reset at any time; it's no longer tied to server restart. So that comment is stale. Also, according to pahole, struct timespec64 is 16 bytes long on x86_64. The nfssvc_boot field is used only to form a write verifier, which is 8 bytes long. Let's clarify this situation by manufacturing an 8-byte verifier in nfs_reset_boot_verifier() and storing only that in struct nfsd_net. We're grabbing 128 bits of time, so compress all of those into a 64-bit verifier instead of throwing out the high-order bits. In the future, the siphash_key can be re-used for other hashed objects per-nfsd_net. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 8 +++++--- fs/nfsd/nfsctl.c | 3 ++- fs/nfsd/nfssvc.c | 51 ++++++++++++++++++++++++++++++++++++------------ 3 files changed, 45 insertions(+), 17 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 9e8b77d2a3a4..a6ed30025984 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -11,6 +11,7 @@ #include #include #include +#include /* Hash tables for nfs4_clientid state */ #define CLIENT_HASH_BITS 4 @@ -108,9 +109,8 @@ struct nfsd_net { bool nfsd_net_up; bool lockd_up; - /* Time of server startup */ - struct timespec64 nfssvc_boot; - seqlock_t boot_lock; + seqlock_t writeverf_lock; + unsigned char writeverf[8]; /* * Max number of connections this nfsd container will allow. Defaults @@ -187,6 +187,8 @@ struct nfsd_net { char nfsd_name[UNX_MAXNODENAME+1]; struct nfsd_fcache_disposal *fcache_disposal; + + siphash_key_t siphash_key; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 504b169d2788..68b020f2002b 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,7 +1484,8 @@ static __net_init int nfsd_init_net(struct net *net) nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++; - seqlock_init(&nn->boot_lock); + get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); + seqlock_init(&nn->writeverf_lock); return 0; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 4d1d8aa6d7f9..5a6066469535 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -344,33 +345,57 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn) return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST); } +/** + * nfsd_copy_boot_verifier - Atomically copy a write verifier + * @verf: buffer in which to receive the verifier cookie + * @nn: NFS net namespace + * + * This function provides a wait-free mechanism for copying the + * namespace's boot verifier without tearing it. + */ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) { int seq = 0; do { - read_seqbegin_or_lock(&nn->boot_lock, &seq); - /* - * This is opaque to client, so no need to byte-swap. Use - * __force to keep sparse happy. y2038 time_t overflow is - * irrelevant in this usage - */ - verf[0] = (__force __be32)nn->nfssvc_boot.tv_sec; - verf[1] = (__force __be32)nn->nfssvc_boot.tv_nsec; - } while (need_seqretry(&nn->boot_lock, seq)); - done_seqretry(&nn->boot_lock, seq); + read_seqbegin_or_lock(&nn->writeverf_lock, &seq); + memcpy(verf, nn->writeverf, sizeof(*verf)); + } while (need_seqretry(&nn->writeverf_lock, seq)); + done_seqretry(&nn->writeverf_lock, seq); } static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) { - ktime_get_raw_ts64(&nn->nfssvc_boot); + struct timespec64 now; + u64 verf; + + /* + * Because the time value is hashed, y2038 time_t overflow + * is irrelevant in this usage. + */ + ktime_get_raw_ts64(&now); + verf = siphash_2u64(now.tv_sec, now.tv_nsec, &nn->siphash_key); + memcpy(nn->writeverf, &verf, sizeof(nn->writeverf)); } +/** + * nfsd_reset_boot_verifier - Generate a new boot verifier + * @nn: NFS net namespace + * + * This function updates the ->writeverf field of @nn. This field + * contains an opaque cookie that, according to Section 18.32.3 of + * RFC 8881, "the client can use to determine whether a server has + * changed instance state (e.g., server restart) between a call to + * WRITE and a subsequent call to either WRITE or COMMIT. This + * cookie MUST be unchanged during a single instance of the NFSv4.1 + * server and MUST be unique between instances of the NFSv4.1 + * server." + */ void nfsd_reset_boot_verifier(struct nfsd_net *nn) { - write_seqlock(&nn->boot_lock); + write_seqlock(&nn->writeverf_lock); nfsd_reset_boot_verifier_locked(nn); - write_sequnlock(&nn->boot_lock); + write_sequnlock(&nn->writeverf_lock); } static int nfsd_startup_net(struct net *net, const struct cred *cred) From a1c9bcfd16f31e6ac1e2bc97ac7914f8d0fa2a75 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 30 Dec 2021 10:22:05 -0500 Subject: [PATCH 0756/1565] NFSD: Rename boot verifier functions [ Upstream commit 3988a57885eeac05ef89f0ab4d7e47b52fbcf630 ] Clean up: These functions handle what the specs call a write verifier, which in the Linux NFS server implementation is now divorced from the server's boot instance [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 +- fs/nfsd/netns.h | 4 ++-- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfssvc.c | 16 ++++++++-------- fs/nfsd/vfs.c | 16 ++++++++-------- 5 files changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b6ef8256c9c6..cc2831cec669 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -243,7 +243,7 @@ nfsd_file_do_unhash(struct nfsd_file *nf) trace_nfsd_file_unhash(nf); if (nfsd_file_check_write_error(nf)) - nfsd_reset_boot_verifier(net_generic(nf->nf_net, nfsd_net_id)); + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); --nfsd_file_hashtbl[nf->nf_hashval].nfb_count; hlist_del_rcu(&nf->nf_node); atomic_long_dec(&nfsd_filecache_count); diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index a6ed30025984..1b1a962a1804 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -198,6 +198,6 @@ extern void nfsd_netns_free_versions(struct nfsd_net *nn); extern unsigned int nfsd_net_id; -void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn); -void nfsd_reset_boot_verifier(struct nfsd_net *nn); +void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); +void nfsd_reset_write_verifier(struct nfsd_net *nn); #endif /* __NFSD_NETNS_H__ */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0f2025b7a641..e8ffaa7faced 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -598,7 +598,7 @@ static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net) BUILD_BUG_ON(2*sizeof(*verf) != sizeof(verifier->data)); - nfsd_copy_boot_verifier(verf, net_generic(net, nfsd_net_id)); + nfsd_copy_write_verifier(verf, net_generic(net, nfsd_net_id)); } static __be32 diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 5a6066469535..2efe9d33a282 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -346,14 +346,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn) } /** - * nfsd_copy_boot_verifier - Atomically copy a write verifier + * nfsd_copy_write_verifier - Atomically copy a write verifier * @verf: buffer in which to receive the verifier cookie * @nn: NFS net namespace * * This function provides a wait-free mechanism for copying the - * namespace's boot verifier without tearing it. + * namespace's write verifier without tearing it. */ -void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) +void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn) { int seq = 0; @@ -364,7 +364,7 @@ void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) done_seqretry(&nn->writeverf_lock, seq); } -static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) +static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn) { struct timespec64 now; u64 verf; @@ -379,7 +379,7 @@ static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) } /** - * nfsd_reset_boot_verifier - Generate a new boot verifier + * nfsd_reset_write_verifier - Generate a new write verifier * @nn: NFS net namespace * * This function updates the ->writeverf field of @nn. This field @@ -391,10 +391,10 @@ static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) * server and MUST be unique between instances of the NFSv4.1 * server." */ -void nfsd_reset_boot_verifier(struct nfsd_net *nn) +void nfsd_reset_write_verifier(struct nfsd_net *nn) { write_seqlock(&nn->writeverf_lock); - nfsd_reset_boot_verifier_locked(nn); + nfsd_reset_write_verifier_locked(nn); write_sequnlock(&nn->writeverf_lock); } @@ -683,7 +683,7 @@ int nfsd_create_serv(struct net *net) register_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); return 0; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index b1ce38c642cd..8cf053b69831 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -572,8 +572,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, &nfsd4_get_cstate(rqstp)->current_fh, dst_pos, count, status); - nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net, - nfsd_net_id)); + nfsd_reset_write_verifier(net_generic(nf_dst->nf_net, + nfsd_net_id)); ret = nfserrno(status); } } @@ -1039,10 +1039,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); if (verf) - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn); host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); goto out_nfserr; } *cnt = host_err; @@ -1055,7 +1055,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); if (host_err < 0) - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); } out_nfserr: @@ -1168,7 +1168,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); switch (err2) { case 0: - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn); err2 = filemap_check_wb_err(nf->nf_file->f_mapping, since); err = nfserrno(err2); @@ -1177,11 +1177,11 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserr_notsupp; break; default: - nfsd_reset_boot_verifier(nn); + nfsd_reset_write_verifier(nn); err = nfserrno(err2); } } else - nfsd_copy_boot_verifier(verf, nn); + nfsd_copy_write_verifier(verf, nn); nfsd_file_put(nf); out: From 5e07d49f4abd0e2b2e764d54d327a22e54266ef7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Dec 2021 14:27:56 -0500 Subject: [PATCH 0757/1565] NFSD: Trace boot verifier resets [ Upstream commit 75acacb6583df0b9328dc701d8eeea05af49b8b5 ] According to commit bbf2f098838a ("nfsd: Reset the boot verifier on all write I/O errors"), the Linux NFS server forces all clients to resend pending unstable writes if any server-side write or commit operation encounters an error (say, ENOSPC). This is a rare and quite exceptional event that could require administrative recovery action, so it should be made trace-able. Example trace event: nfsd-938 [002] 7174.945558: nfsd_writeverf_reset: boot_time= 61cc920d xid=0xdcd62036 error=-28 new verifier=0x08aecc6142515904 [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ fs/nfsd/vfs.c | 13 ++++++++++--- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 199b485c7717..8327d1601d71 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -575,6 +575,34 @@ DEFINE_EVENT(nfsd_net_class, nfsd_##name, \ DEFINE_NET_EVENT(grace_start); DEFINE_NET_EVENT(grace_complete); +TRACE_EVENT(nfsd_writeverf_reset, + TP_PROTO( + const struct nfsd_net *nn, + const struct svc_rqst *rqstp, + int error + ), + TP_ARGS(nn, rqstp, error), + TP_STRUCT__entry( + __field(unsigned long long, boot_time) + __field(u32, xid) + __field(int, error) + __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) + ), + TP_fast_assign( + __entry->boot_time = nn->boot_time; + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->error = error; + + /* avoid seqlock inside TP_fast_assign */ + memcpy(__entry->verifier, nn->writeverf, + NFS4_VERIFIER_SIZE); + ), + TP_printk("boot_time=%16llx xid=0x%08x error=%d new verifier=0x%s", + __entry->boot_time, __entry->xid, __entry->error, + __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE) + ) +); + TRACE_EVENT(nfsd_clid_cred_mismatch, TP_PROTO( const struct nfs4_client *clp, diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 8cf053b69831..77f48779210d 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -566,14 +566,17 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, if (!status) status = commit_inode_metadata(file_inode(src)); if (status < 0) { + struct nfsd_net *nn = net_generic(nf_dst->nf_net, + nfsd_net_id); + trace_nfsd_clone_file_range_err(rqstp, &nfsd4_get_cstate(rqstp)->save_fh, src_pos, &nfsd4_get_cstate(rqstp)->current_fh, dst_pos, count, status); - nfsd_reset_write_verifier(net_generic(nf_dst->nf_net, - nfsd_net_id)); + nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, status); ret = nfserrno(status); } } @@ -1043,6 +1046,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, host_err = vfs_iter_write(file, &iter, &pos, flags); if (host_err < 0) { nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, host_err); goto out_nfserr; } *cnt = host_err; @@ -1054,8 +1058,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); - if (host_err < 0) + if (host_err < 0) { nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, host_err); + } } out_nfserr: @@ -1178,6 +1184,7 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, break; default: nfsd_reset_write_verifier(nn); + trace_nfsd_writeverf_reset(nn, rqstp, err2); err = nfserrno(err2); } } else From 695719e5e6b972bea0107fd795205b510b6db9be Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 24 Dec 2021 14:22:28 -0500 Subject: [PATCH 0758/1565] Revert "nfsd: skip some unnecessary stats in the v4 case" [ Upstream commit 58f258f65267542959487dbe8b5641754411843d ] On the wire, I observed NFSv4 OPEN(CREATE) operations sometimes returning a reasonable-looking value in the cinfo.before field and zero in the cinfo.after field. RFC 8881 Section 10.8.1 says: > When a client is making changes to a given directory, it needs to > determine whether there have been changes made to the directory by > other clients. It does this by using the change attribute as > reported before and after the directory operation in the associated > change_info4 value returned for the operation. and > ... The post-operation change > value needs to be saved as the basis for future change_info4 > comparisons. A good quality client implementation therefore saves the zero cinfo.after value. During a subsequent OPEN operation, it will receive a different non-zero value in the cinfo.before field for that directory, and it will incorrectly believe the directory has changed, triggering an undesirable directory cache invalidation. There are filesystem types where fs_supports_change_attribute() returns false, tmpfs being one. On NFSv4 mounts, this means the fh_getattr() call site in fill_pre_wcc() and fill_post_wcc() is never invoked. Subsequently, nfsd4_change_attribute() is invoked with an uninitialized @stat argument. In fill_pre_wcc(), @stat contains stale stack garbage, which is then placed on the wire. In fill_post_wcc(), ->fh_post_wc is all zeroes, so zero is placed on the wire. Both of these values are meaningless. This fix can be applied immediately to stable kernels. Once there are more regression tests in this area, this optimization can be attempted again. Fixes: 428a23d2bf0c ("nfsd: skip some unnecessary stats in the v4 case") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 44 +++++++++++++++++--------------------------- 1 file changed, 17 insertions(+), 27 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index c3ac1b6aa3aa..84088581bbe0 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -487,11 +487,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -static bool fs_supports_change_attribute(struct super_block *sb) -{ - return sb->s_flags & SB_I_VERSION || sb->s_export_op->fetch_iversion; -} - /* * Fill in the pre_op attr for the wcc data */ @@ -500,26 +495,24 @@ void fill_pre_wcc(struct svc_fh *fhp) struct inode *inode; struct kstat stat; bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + __be32 err; if (fhp->fh_no_wcc || fhp->fh_pre_saved) return; inode = d_inode(fhp->fh_dentry); - if (fs_supports_change_attribute(inode->i_sb) || !v4) { - __be32 err = fh_getattr(fhp, &stat); - - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; - } - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; + err = fh_getattr(fhp, &stat); + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; } if (v4) fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; fhp->fh_pre_saved = true; } @@ -530,6 +523,7 @@ void fill_post_wcc(struct svc_fh *fhp) { bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); struct inode *inode = d_inode(fhp->fh_dentry); + __be32 err; if (fhp->fh_no_wcc) return; @@ -537,16 +531,12 @@ void fill_post_wcc(struct svc_fh *fhp) if (fhp->fh_post_saved) printk("nfsd: inode locked twice during operation.\n"); - fhp->fh_post_saved = true; - - if (fs_supports_change_attribute(inode->i_sb) || !v4) { - __be32 err = fh_getattr(fhp, &fhp->fh_post_attr); - - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } - } + err = fh_getattr(fhp, &fhp->fh_post_attr); + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } else + fhp->fh_post_saved = true; if (v4) fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, inode); From 4eefd1125b96168a265a9ef2fd6daecebb3f0ad6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 24 Dec 2021 14:36:49 -0500 Subject: [PATCH 0759/1565] NFSD: Move fill_pre_wcc() and fill_post_wcc() [ Upstream commit fcb5e3fa012351f3b96024c07bc44834c2478213 ] These functions are related to file handle processing and have nothing to do with XDR encoding or decoding. Also they are no longer NFSv3-specific. As a clean-up, move their definitions to a more appropriate location. WCC is also an NFSv3-specific term, so rename them as general-purpose helpers. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 55 -------------------------------------- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfsfh.c | 66 +++++++++++++++++++++++++++++++++++++++++++++- fs/nfsd/nfsfh.h | 40 ++++++++++++++++++---------- fs/nfsd/vfs.c | 8 +++--- 5 files changed, 96 insertions(+), 75 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 84088581bbe0..7c45ba4db61b 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -487,61 +487,6 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, return true; } -/* - * Fill in the pre_op attr for the wcc data - */ -void fill_pre_wcc(struct svc_fh *fhp) -{ - struct inode *inode; - struct kstat stat; - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - __be32 err; - - if (fhp->fh_no_wcc || fhp->fh_pre_saved) - return; - inode = d_inode(fhp->fh_dentry); - err = fh_getattr(fhp, &stat); - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; - } - if (v4) - fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); - - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; - fhp->fh_pre_saved = true; -} - -/* - * Fill in the post_op attr for the wcc data - */ -void fill_post_wcc(struct svc_fh *fhp) -{ - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - struct inode *inode = d_inode(fhp->fh_dentry); - __be32 err; - - if (fhp->fh_no_wcc) - return; - - if (fhp->fh_post_saved) - printk("nfsd: inode locked twice during operation.\n"); - - err = fh_getattr(fhp, &fhp->fh_post_attr); - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } else - fhp->fh_post_saved = true; - if (v4) - fhp->fh_post_change = - nfsd4_change_attribute(&fhp->fh_post_attr, inode); -} - /* * XDR decode functions */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e8ffaa7faced..451190813302 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2523,7 +2523,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) goto encode_op; } - fh_clear_wcc(current_fh); + fh_clear_pre_post_attrs(current_fh); /* If op is non-idempotent */ if (op->opdesc->op_flags & OP_MODIFIES_SOMETHING) { diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 34e201b6eb62..3b9751555f8f 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -610,6 +610,70 @@ fh_update(struct svc_fh *fhp) return nfserr_serverfault; } +#ifdef CONFIG_NFSD_V3 + +/** + * fh_fill_pre_attrs - Fill in pre-op attributes + * @fhp: file handle to be updated + * + */ +void fh_fill_pre_attrs(struct svc_fh *fhp) +{ + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode; + struct kstat stat; + __be32 err; + + if (fhp->fh_no_wcc || fhp->fh_pre_saved) + return; + + inode = d_inode(fhp->fh_dentry); + err = fh_getattr(fhp, &stat); + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; + } + if (v4) + fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); + + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; + fhp->fh_pre_saved = true; +} + +/** + * fh_fill_post_attrs - Fill in post-op attributes + * @fhp: file handle to be updated + * + */ +void fh_fill_post_attrs(struct svc_fh *fhp) +{ + bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); + struct inode *inode = d_inode(fhp->fh_dentry); + __be32 err; + + if (fhp->fh_no_wcc) + return; + + if (fhp->fh_post_saved) + printk("nfsd: inode locked twice during operation.\n"); + + err = fh_getattr(fhp, &fhp->fh_post_attr); + if (err) { + fhp->fh_post_saved = false; + fhp->fh_post_attr.ctime = inode->i_ctime; + } else + fhp->fh_post_saved = true; + if (v4) + fhp->fh_post_change = + nfsd4_change_attribute(&fhp->fh_post_attr, inode); +} + +#endif /* CONFIG_NFSD_V3 */ + /* * Release a file handle. */ @@ -622,7 +686,7 @@ fh_put(struct svc_fh *fhp) fh_unlock(fhp); fhp->fh_dentry = NULL; dput(dentry); - fh_clear_wcc(fhp); + fh_clear_pre_post_attrs(fhp); } fh_drop_write(fhp); if (exp) { diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index d11e4b6870d6..434930d8a946 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -284,12 +284,13 @@ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) #endif #ifdef CONFIG_NFSD_V3 -/* - * The wcc data stored in current_fh should be cleared - * between compound ops. + +/** + * fh_clear_pre_post_attrs - Reset pre/post attributes + * @fhp: file handle to be updated + * */ -static inline void -fh_clear_wcc(struct svc_fh *fhp) +static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) { fhp->fh_post_saved = false; fhp->fh_pre_saved = false; @@ -323,13 +324,24 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, return time_to_chattr(&stat->ctime); } -extern void fill_pre_wcc(struct svc_fh *fhp); -extern void fill_post_wcc(struct svc_fh *fhp); -#else -#define fh_clear_wcc(ignored) -#define fill_pre_wcc(ignored) -#define fill_post_wcc(notused) -#endif /* CONFIG_NFSD_V3 */ +extern void fh_fill_pre_attrs(struct svc_fh *fhp); +extern void fh_fill_post_attrs(struct svc_fh *fhp); + +#else /* !CONFIG_NFSD_V3 */ + +static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) +{ +} + +static inline void fh_fill_pre_attrs(struct svc_fh *fhp) +{ +} + +static inline void fh_fill_post_attrs(struct svc_fh *fhp) +{ +} + +#endif /* !CONFIG_NFSD_V3 */ /* @@ -355,7 +367,7 @@ fh_lock_nested(struct svc_fh *fhp, unsigned int subclass) inode = d_inode(dentry); inode_lock_nested(inode, subclass); - fill_pre_wcc(fhp); + fh_fill_pre_attrs(fhp); fhp->fh_locked = true; } @@ -372,7 +384,7 @@ static inline void fh_unlock(struct svc_fh *fhp) { if (fhp->fh_locked) { - fill_post_wcc(fhp); + fh_fill_post_attrs(fhp); inode_unlock(d_inode(fhp->fh_dentry)); fhp->fh_locked = false; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 77f48779210d..d4b6bc3b4d73 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1787,8 +1787,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * so do it by hand */ trap = lock_rename(tdentry, fdentry); ffhp->fh_locked = tfhp->fh_locked = true; - fill_pre_wcc(ffhp); - fill_pre_wcc(tfhp); + fh_fill_pre_attrs(ffhp); + fh_fill_pre_attrs(tfhp); odentry = lookup_one_len(fname, fdentry, flen); host_err = PTR_ERR(odentry); @@ -1840,8 +1840,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * were the same, so again we do it by hand. */ if (!close_cached) { - fill_post_wcc(ffhp); - fill_post_wcc(tfhp); + fh_fill_post_attrs(ffhp); + fh_fill_post_attrs(tfhp); } unlock_rename(tdentry, fdentry); ffhp->fh_locked = tfhp->fh_locked = false; From 11bcfabf24815ae90c60978a1f21a874cd483561 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Wed, 5 Jan 2022 14:15:03 -0500 Subject: [PATCH 0760/1565] nfsd: fix crash on COPY_NOTIFY with special stateid [ Upstream commit 074b07d94e0bb6ddce5690a9b7e2373088e8b33a ] RTM says "If the special ONE stateid is passed to nfs4_preprocess_stateid_op(), it returns status=0 but does not set *cstid. nfsd4_copy_notify() depends on stid being set if status=0, and thus can crash if the client sends the right COPY_NOTIFY RPC." RFC 7862 says "The cna_src_stateid MUST refer to either open or locking states provided earlier by the server. If it is invalid, then the operation MUST fail." The RFC doesn't specify an error, and the choice doesn't matter much as this is clearly illegal client behavior, but bad_stateid seems reasonable. Simplest is just to guarantee that nfs4_preprocess_stateid_op, called with non-NULL cstid, errors out if it can't return a stateid. Reported-by: rtm@csail.mit.edu Fixes: 624322f1adc5 ("NFSD add COPY_NOTIFY operation") Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Reviewed-by: Olga Kornievskaia Tested-by: Olga Kornievskaia Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4161a4854c43..60d5d1cb2cc6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6097,7 +6097,11 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp, return nfserr_grace; if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) { - status = check_special_stateids(net, fhp, stateid, flags); + if (cstid) + status = nfserr_bad_stateid; + else + status = check_special_stateids(net, fhp, stateid, + flags); goto done; } From a667e1df409e5708d53bcc5819c5930bef4c9d44 Mon Sep 17 00:00:00 2001 From: Yang Li Date: Thu, 20 Jan 2022 13:57:22 +0100 Subject: [PATCH 0761/1565] fanotify: remove variable set but not used MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 217663f101a56ef77f82273818253fff082bf503 ] The code that uses the pointer info has been removed in 7326e382c21e ("fanotify: report old and/or new parent+name in FAN_RENAME event"). and fanotify_event_info() doesn't change 'event', so the declaration and assignment of info can be removed. Eliminate the following clang warning: fs/notify/fanotify/fanotify_user.c:161:24: warning: variable ‘info’ set but not used Reported-by: Abaci Robot Signed-off-by: Yang Li Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3bac2329dc35..667970057411 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -146,7 +146,6 @@ static size_t fanotify_event_len(unsigned int info_mode, struct fanotify_event *event) { size_t event_len = FAN_EVENT_METADATA_LEN; - struct fanotify_info *info; int fh_len; int dot_len = 0; @@ -156,8 +155,6 @@ static size_t fanotify_event_len(unsigned int info_mode, if (fanotify_is_error_event(event->mask)) event_len += FANOTIFY_ERROR_INFO_LEN; - info = fanotify_event_info(event); - if (fanotify_event_has_any_dir_fh(event)) { event_len += fanotify_dir_name_info_len(event); } else if ((info_mode & FAN_REPORT_NAME) && From 20a74a69119e032e431f143697ac1c0bdf938a3e Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 18 Jan 2022 17:00:16 -0500 Subject: [PATCH 0762/1565] lockd: fix server crash on reboot of client holding lock [ Upstream commit 6e7f90d163afa8fc2efd6ae318e7c20156a5621f ] I thought I was iterating over the array when actually the iteration is over the values contained in the array? Ugh, keep it simple. Symptoms were a null deference in vfs_lock_file() when an NFSv3 client that previously held a lock came back up and sent a notify. Reported-by: Jonathan Woithe Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index cb3a7512c33e..54c2e42130ca 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -179,19 +179,20 @@ nlm_delete_file(struct nlm_file *file) static int nlm_unlock_files(struct nlm_file *file) { struct file_lock lock; - struct file *f; lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; - for (f = file->f_file[0]; f <= file->f_file[1]; f++) { - if (f && vfs_lock_file(f, F_SETLK, &lock, NULL) < 0) { - pr_warn("lockd: unlock failure in %s:%d\n", - __FILE__, __LINE__); - return 1; - } - } + if (file->f_file[O_RDONLY] && + vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) + goto out_err; + if (file->f_file[O_WRONLY] && + vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) + goto out_err; return 0; +out_err: + pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__); + return 1; } /* From fc2d8c153d52ff8ed0ddb88b764a8ee6a2cc0fe5 Mon Sep 17 00:00:00 2001 From: "J. Bruce Fields" Date: Tue, 18 Jan 2022 17:00:51 -0500 Subject: [PATCH 0763/1565] lockd: fix failure to cleanup client locks [ Upstream commit d19a7af73b5ecaac8168712d18be72b9db166768 ] In my testing, we're sometimes hitting the request->fl_flags & FL_EXISTS case in posix_lock_inode, presumably just by random luck since we're not actually initializing fl_flags here. This probably didn't matter before commit 7f024fcd5c97 ("Keep read and write fds with each nlm_file") since we wouldn't previously unlock unless we knew there were locks. But now it causes lockd to give up on removing more locks. We could just initialize fl_flags, but really it seems dubious to be calling vfs_lock_file with random values in some of the fields. Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: J. Bruce Fields [ cel: fixed checkpatch.pl nit ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 54c2e42130ca..0a22a2faf552 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -180,6 +180,7 @@ static int nlm_unlock_files(struct nlm_file *file) { struct file_lock lock; + locks_init_lock(&lock); lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; From 1726a39b0879acfb490b22dca643f26f4f907da9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 4 Feb 2022 15:19:34 -0500 Subject: [PATCH 0764/1565] NFSD: Fix the behavior of READ near OFFSET_MAX [ Upstream commit 0cb4d23ae08c48f6bf3c29a8e5c4a74b8388b960 ] Dan Aloni reports: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the RPC read layers") on the client, a read of 0xfff is aligned up > to server rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server > and it returns an NFS code of EINVAL to the client. The client as > a result indefinitely retries the request. The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ. Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers. Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly. Reported-by: Dan Aloni Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 8 ++++++-- fs/nfsd/nfs4proc.c | 8 ++++++-- fs/nfsd/nfs4xdr.c | 8 ++------ 3 files changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1515c32e08db..b540489ea240 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -150,13 +150,17 @@ nfsd3_proc_read(struct svc_rqst *rqstp) unsigned int len; int v; - argp->count = min_t(u32, argp->count, max_blocksize); - dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n", SVCFH_fmt(&argp->fh), (unsigned long) argp->count, (unsigned long long) argp->offset); + argp->count = min_t(u32, argp->count, max_blocksize); + if (argp->offset > (u64)OFFSET_MAX) + argp->offset = (u64)OFFSET_MAX; + if (argp->offset + argp->count > (u64)OFFSET_MAX) + argp->count = (u64)OFFSET_MAX - argp->offset; + v = 0; len = argp->count; resp->pages = rqstp->rq_next_page; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 451190813302..f3d6bd2bfa4f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -782,12 +782,16 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status; read->rd_nf = NULL; - if (read->rd_offset >= OFFSET_MAX) - return nfserr_inval; trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length); + read->rd_length = min_t(u32, read->rd_length, svc_max_payload(rqstp)); + if (read->rd_offset > (u64)OFFSET_MAX) + read->rd_offset = (u64)OFFSET_MAX; + if (read->rd_offset + read->rd_length > (u64)OFFSET_MAX) + read->rd_length = (u64)OFFSET_MAX - read->rd_offset; + /* * If we do a zero copy read, then a client will see read data * that reflects the state of the file *after* performing the diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index adf97d72bda8..be0995bb9459 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3997,10 +3997,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, } xdr_commit_encode(xdr); - maxcount = svc_max_payload(resp->rqstp); - maxcount = min_t(unsigned long, maxcount, + maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len)); - maxcount = min_t(unsigned long, maxcount, read->rd_length); if (file->f_op->splice_read && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) @@ -4834,10 +4832,8 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, return nfserr_resource; xdr_commit_encode(xdr); - maxcount = svc_max_payload(resp->rqstp); - maxcount = min_t(unsigned long, maxcount, + maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len)); - maxcount = min_t(unsigned long, maxcount, read->rd_length); count = maxcount; eof = read->rd_offset >= i_size_read(file_inode(file)); From 38d02ba22e43b6fc7d291cf724bc6e3b7be6626b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 31 Jan 2022 13:01:53 -0500 Subject: [PATCH 0765/1565] NFSD: Fix ia_size underflow [ Upstream commit e6faac3f58c7c4176b66f63def17a34232a17b0e ] iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle. Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr(). Cc: stable@vger.kernel.org [ cel: context adjusted, 2f221d6f7b88 has not been applied ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index d4b6bc3b4d73..09e4a0af6fb4 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -449,6 +449,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, .ia_size = iap->ia_size, }; + host_err = -EFBIG; + if (iap->ia_size < 0) + goto out_unlock; + host_err = notify_change(dentry, &size_attr, NULL); if (host_err) goto out_unlock; From a231ae6bb50e7c0a9e9efd7b0d10687f1d71b3a3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 15:59:57 -0500 Subject: [PATCH 0766/1565] NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes [ Upstream commit a648fdeb7c0e17177a2280344d015dba3fbe3314 ] iattr::ia_size is a loff_t, so these NFSv3 procedures must be careful to deal with incoming client size values that are larger than s64_max without corrupting the value. Silently capping the value results in storing a different value than the client passed in which is unexpected behavior, so remove the min_t() check in decode_sattr3(). Note that RFC 1813 permits only the WRITE procedure to return NFS3ERR_FBIG. We believe that NFSv3 reference implementations also return NFS3ERR_FBIG when ia_size is too large. Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 7c45ba4db61b..2e47a07029f1 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -254,7 +254,7 @@ svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, if (xdr_stream_decode_u64(xdr, &newsize) < 0) return false; iap->ia_valid |= ATTR_SIZE; - iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); + iap->ia_size = newsize; } if (xdr_stream_decode_u32(xdr, &set_it) < 0) return false; From 70a80c7e8d5b88d86e2167dc903d7e14c0aea189 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 24 Jan 2022 15:50:31 -0500 Subject: [PATCH 0767/1565] NFSD: COMMIT operations must not return NFS?ERR_INVAL [ Upstream commit 3f965021c8bc38965ecb1924f570c4842b33d408 ] Since, well, forever, the Linux NFS server's nfsd_commit() function has returned nfserr_inval when the passed-in byte range arguments were non-sensical. However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests are permitted to return only the following non-zero status codes: NFS3ERR_IO NFS3ERR_STALE NFS3ERR_BADHANDLE NFS3ERR_SERVERFAULT NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL is not listed in the COMMIT row of Table 6 in RFC 8881. RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not specify when it can or should be used. Instead of dropping or failing a COMMIT request in a byte range that is not supported, turn it into a valid request by treating one or both arguments as zero. Offset zero means start-of-file, count zero means until-end-of-file, so we only ever extend the commit range. NFS servers are always allowed to commit more and sooner than requested. The range check is no longer bounded by NFS_OFFSET_MAX, but rather by the value that is returned in the maxfilesize field of the NFSv3 FSINFO procedure or the NFSv4 maxfilesize file attribute. Note that this change results in a new pynfs failure: CMT4 st_commit.testCommitOverflow : RUNNING CMT4 st_commit.testCommitOverflow : FAILURE COMMIT with offset + count overflow should return NFS4ERR_INVAL, instead got NFS4_OK IMO the test is not correct as written: RFC 8881 does not allow the COMMIT operation to return NFS4ERR_INVAL. Reported-by: Dan Aloni Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever Reviewed-by: Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 6 ------ fs/nfsd/vfs.c | 53 +++++++++++++++++++++++++++++++--------------- fs/nfsd/vfs.h | 4 ++-- 3 files changed, 38 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index b540489ea240..936eebd4c56d 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -660,15 +660,9 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) argp->count, (unsigned long long) argp->offset); - if (argp->offset > NFS_OFFSET_MAX) { - resp->status = nfserr_inval; - goto out; - } - fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset, argp->count, resp->verf); -out: return rpc_success; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 09e4a0af6fb4..89c50ccedf4d 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1140,42 +1140,61 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, } #ifdef CONFIG_NFSD_V3 -/* - * Commit all pending writes to stable storage. +/** + * nfsd_commit - Commit pending writes to stable storage + * @rqstp: RPC request being processed + * @fhp: NFS filehandle + * @offset: raw offset from beginning of file + * @count: raw count of bytes to sync + * @verf: filled in with the server's current write verifier * - * Note: we only guarantee that data that lies within the range specified - * by the 'offset' and 'count' parameters will be synced. + * Note: we guarantee that data that lies within the range specified + * by the 'offset' and 'count' parameters will be synced. The server + * is permitted to sync data that lies outside this range at the + * same time. * * Unfortunately we cannot lock the file to make sure we return full WCC * data to the client, as locking happens lower down in the filesystem. + * + * Return values: + * An nfsstat value in network byte order. */ __be32 -nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, - loff_t offset, unsigned long count, __be32 *verf) +nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, + u32 count, __be32 *verf) { + u64 maxbytes; + loff_t start, end; struct nfsd_net *nn; struct nfsd_file *nf; - loff_t end = LLONG_MAX; - __be32 err = nfserr_inval; - - if (offset < 0) - goto out; - if (count != 0) { - end = offset + (loff_t)count - 1; - if (end < offset) - goto out; - } + __be32 err; err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); if (err) goto out; + + /* + * Convert the client-provided (offset, count) range to a + * (start, end) range. If the client-provided range falls + * outside the maximum file size of the underlying FS, + * clamp the sync range appropriately. + */ + start = 0; + end = LLONG_MAX; + maxbytes = (u64)fhp->fh_dentry->d_sb->s_maxbytes; + if (offset < maxbytes) { + start = offset; + if (count && (offset + count - 1 < maxbytes)) + end = offset + count - 1; + } + nn = net_generic(nf->nf_net, nfsd_net_id); if (EX_ISSYNC(fhp->fh_export)) { errseq_t since = READ_ONCE(nf->nf_file->f_wb_err); int err2; - err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); + err2 = vfs_fsync_range(nf->nf_file, start, end, 0); switch (err2) { case 0: nfsd_copy_write_verifier(verf, nn); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 9f56dcb22ff7..2c43d10e3cab 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -74,8 +74,8 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, struct svc_fh *res, int createmode, u32 *verifier, bool *truncp, bool *created); -__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *, - loff_t, unsigned long, __be32 *verf); +__be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, + u64 offset, u32 count, __be32 *verf); #endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, From 0d1bbb0efe5a2304a3b322bd480b16d07d703a85 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 15:57:45 -0500 Subject: [PATCH 0768/1565] NFSD: Deprecate NFS_OFFSET_MAX [ Upstream commit c306d737691ef84305d4ed0d302c63db2932f0bb ] NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 2 +- fs/nfsd/nfs4xdr.c | 2 +- include/linux/nfs.h | 8 -------- 3 files changed, 2 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 2e47a07029f1..0293b8d65f10 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -1060,7 +1060,7 @@ svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, return false; /* cookie */ resp->cookie_offset = dirlist->len; - if (xdr_stream_encode_u64(xdr, NFS_OFFSET_MAX) < 0) + if (xdr_stream_encode_u64(xdr, OFFSET_MAX) < 0) return false; return true; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index be0995bb9459..a7b9c3faf7a7 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3495,7 +3495,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen, p = xdr_reserve_space(xdr, 3*4 + namlen); if (!p) goto fail; - p = xdr_encode_hyper(p, NFS_OFFSET_MAX); /* offset of next entry */ + p = xdr_encode_hyper(p, OFFSET_MAX); /* offset of next entry */ p = xdr_encode_array(p, name, namlen); /* name length & name */ nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen); diff --git a/include/linux/nfs.h b/include/linux/nfs.h index 0dc7ad38a0da..b06375e88e58 100644 --- a/include/linux/nfs.h +++ b/include/linux/nfs.h @@ -36,14 +36,6 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc memcpy(target->data, source->data, source->size); } - -/* - * This is really a general kernel constant, but since nothing like - * this is defined in the kernel headers, I have to do it here. - */ -#define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) - - enum nfs3_stable_how { NFS_UNSTABLE = 0, NFS_DATA_SYNC = 1, From ca6761d39ad27ac73bd40254ebfbe4b6331c20a8 Mon Sep 17 00:00:00 2001 From: Ondrej Valousek Date: Tue, 11 Jan 2022 13:08:42 +0100 Subject: [PATCH 0769/1565] nfsd: Add support for the birth time attribute [ Upstream commit e377a3e698fb56cb63f6bddbebe7da76dc37e316 ] For filesystems that supports "btime" timestamp (i.e. most modern filesystems do) we share it via kernel nfsd. Btime support for NFS client has already been added by Trond recently. Suggested-by: Bruce Fields Signed-off-by: Ondrej Valousek [ cel: addressed some whitespace/checkpatch nits ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 ++++++++++ fs/nfsd/nfsd.h | 2 +- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a7b9c3faf7a7..a9f6c7eeb756 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2854,6 +2854,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); if (err) goto out_nfserr; + if (!(stat.result_mask & STATX_BTIME)) + /* underlying FS does not offer btime so we can't share it */ + bmval1 &= ~FATTR4_WORD1_TIME_CREATE; if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE | FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_MAXNAME)) || (bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | @@ -3254,6 +3257,13 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec); *p++ = cpu_to_be32(stat.mtime.tv_nsec); } + if (bmval1 & FATTR4_WORD1_TIME_CREATE) { + p = xdr_reserve_space(xdr, 12); + if (!p) + goto out_resource; + p = xdr_encode_hyper(p, (s64)stat.btime.tv_sec); + *p++ = cpu_to_be32(stat.btime.tv_nsec); + } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { struct kstat parent_stat; u64 ino = stat.ino; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3e5008b475ff..4fc1fd639527 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -364,7 +364,7 @@ void nfsd_lockd_shutdown(void); | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP | FATTR4_WORD1_RAWDEV \ | FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | FATTR4_WORD1_SPACE_TOTAL \ | FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_ACCESS_SET \ - | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA \ + | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_CREATE \ | FATTR4_WORD1_TIME_MODIFY | FATTR4_WORD1_TIME_MODIFY_SET | FATTR4_WORD1_MOUNTED_ON_FILEID) #define NFSD4_SUPPORTED_ATTRS_WORD2 0 From 4aa8dac58c17a95bb3e301782d612c1236af4648 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 30 Sep 2021 19:19:57 -0400 Subject: [PATCH 0770/1565] NFSD: De-duplicate hash bucket indexing [ Upstream commit 378a6109dd142a678f629b740f558365150f60f9 ] Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index a4a69ab6ab28..f79790d36728 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -84,12 +84,6 @@ nfsd_hashsize(unsigned int limit) return roundup_pow_of_two(limit / TARGET_BUCKET_SIZE); } -static u32 -nfsd_cache_hash(__be32 xid, struct nfsd_net *nn) -{ - return hash_32((__force u32)xid, nn->maskbits); -} - static struct svc_cacherep * nfsd_reply_cache_alloc(struct svc_rqst *rqstp, __wsum csum, struct nfsd_net *nn) @@ -241,6 +235,14 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp) list_move_tail(&rp->c_lru, &b->lru_head); } +static noinline struct nfsd_drc_bucket * +nfsd_cache_bucket_find(__be32 xid, struct nfsd_net *nn) +{ + unsigned int hash = hash_32((__force u32)xid, nn->maskbits); + + return &nn->drc_hashtbl[hash]; +} + static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, unsigned int max) { @@ -421,10 +423,8 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) { struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp, *found; - __be32 xid = rqstp->rq_xid; __wsum csum; - u32 hash = nfsd_cache_hash(xid, nn); - struct nfsd_drc_bucket *b = &nn->drc_hashtbl[hash]; + struct nfsd_drc_bucket *b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); int type = rqstp->rq_cachetype; int rtn = RC_DOIT; @@ -528,7 +528,6 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp = rqstp->rq_cacherep; struct kvec *resv = &rqstp->rq_res.head[0], *cachv; - u32 hash; struct nfsd_drc_bucket *b; int len; size_t bufsize = 0; @@ -536,8 +535,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) if (!rp) return; - hash = nfsd_cache_hash(rp->c_key.k_xid, nn); - b = &nn->drc_hashtbl[hash]; + b = nfsd_cache_bucket_find(rp->c_key.k_xid, nn); len = resv->iov_len - ((char*)statp - (char*)resv->iov_base); len >>= 2; From 3304d16c24f5388763b2a89212a3143b7da41d12 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Sep 2021 11:39:02 -0400 Subject: [PATCH 0771/1565] NFSD: Skip extra computation for RC_NOCACHE case [ Upstream commit 0f29ce32fbc56cfdb304eec8a4deb920ccfd89c3 ] Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index f79790d36728..34087a7e4f93 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -421,10 +421,10 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct svc_cacherep *key, */ int nfsd_cache_lookup(struct svc_rqst *rqstp) { - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfsd_net *nn; struct svc_cacherep *rp, *found; __wsum csum; - struct nfsd_drc_bucket *b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); + struct nfsd_drc_bucket *b; int type = rqstp->rq_cachetype; int rtn = RC_DOIT; @@ -440,10 +440,12 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) * Since the common case is a cache miss followed by an insert, * preallocate an entry. */ + nn = net_generic(SVC_NET(rqstp), nfsd_net_id); rp = nfsd_reply_cache_alloc(rqstp, csum, nn); if (!rp) goto out; + b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); spin_lock(&b->cache_lock); found = nfsd_cache_insert(b, rp, nn); if (found != rp) { From 194071d46c5cc8b3fab17768a9214376eee25a30 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 28 Sep 2021 11:40:59 -0400 Subject: [PATCH 0772/1565] NFSD: Streamline the rare "found" case [ Upstream commit add1511c38166cf1036765f8c4aa939f0275a799 ] Move a rarely called function call site out of the hot path. This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 34087a7e4f93..0b3f12aa37ff 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -448,11 +448,8 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); spin_lock(&b->cache_lock); found = nfsd_cache_insert(b, rp, nn); - if (found != rp) { - nfsd_reply_cache_free_locked(NULL, rp, nn); - rp = found; + if (found != rp) goto found_entry; - } nfsd_stats_rc_misses_inc(); rqstp->rq_cacherep = rp; @@ -470,8 +467,10 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) found_entry: /* We found a matching entry which is either in progress or done. */ + nfsd_reply_cache_free_locked(NULL, rp, nn); nfsd_stats_rc_hits_inc(); rtn = RC_DROPIT; + rp = found; /* Request being processed */ if (rp->c_state == RC_INPROG) From 99ab6abc88edd907caf97f52ba13825d18aedfad Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 10:17:59 -0500 Subject: [PATCH 0773/1565] SUNRPC: Remove the .svo_enqueue_xprt method [ Upstream commit a9ff2e99e9fa501ec965da03c18a5422b37a2f44 ] We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 1 - fs/nfs/callback.c | 2 -- fs/nfsd/nfssvc.c | 1 - include/linux/sunrpc/svc.h | 3 --- include/linux/sunrpc/svc_xprt.h | 1 - net/sunrpc/svc_xprt.c | 10 +++++----- 6 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 0475c5a5d061..3a05af873625 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -353,7 +353,6 @@ static struct notifier_block lockd_inet6addr_notifier = { static const struct svc_serv_ops lockd_sv_ops = { .svo_shutdown = svc_rpcb_cleanup, .svo_function = lockd, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, }; diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 054cc1255fac..7a810f885063 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -234,13 +234,11 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, }; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2efe9d33a282..3b79b97f2715 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -615,7 +615,6 @@ static int nfsd_get_default_max_blksize(void) static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, - .svo_enqueue_xprt = svc_xprt_do_enqueue, .svo_module = THIS_MODULE, }; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f116141ea64d..3c8ed018c686 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -61,9 +61,6 @@ struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *); - /* queue up a transport for servicing */ - void (*svo_enqueue_xprt)(struct svc_xprt *); - /* optional module to count when adding threads. * Thread function must call module_put_and_kthread_exit() to exit. */ diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index c5278871f9e4..1311425ddab7 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,7 +131,6 @@ int svc_create_xprt(struct svc_serv *, const char *, struct net *, const int, const unsigned short, int, const struct cred *); void svc_xprt_received(struct svc_xprt *xprt); -void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 570b092165d7..833952db2319 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -31,6 +31,7 @@ static int svc_deferred_recv(struct svc_rqst *rqstp); static struct cache_deferred_req *svc_defer(struct cache_req *req); static void svc_age_temp_xprts(struct timer_list *t); static void svc_delete_xprt(struct svc_xprt *xprt); +static void svc_xprt_do_enqueue(struct svc_xprt *xprt); /* apparently the "standard" is that clients close * idle connections after 5 minutes, servers after @@ -253,12 +254,12 @@ void svc_xprt_received(struct svc_xprt *xprt) trace_svc_xprt_received(xprt); /* As soon as we clear busy, the xprt could be closed and - * 'put', so we need a reference to call svc_enqueue_xprt with: + * 'put', so we need a reference to call svc_xprt_do_enqueue with: */ svc_xprt_get(xprt); smp_mb__before_atomic(); clear_bit(XPT_BUSY, &xprt->xpt_flags); - xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); + svc_xprt_do_enqueue(xprt); svc_xprt_put(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_received); @@ -410,7 +411,7 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) return false; } -void svc_xprt_do_enqueue(struct svc_xprt *xprt) +static void svc_xprt_do_enqueue(struct svc_xprt *xprt) { struct svc_pool *pool; struct svc_rqst *rqstp = NULL; @@ -454,7 +455,6 @@ void svc_xprt_do_enqueue(struct svc_xprt *xprt) put_cpu(); trace_svc_xprt_do_enqueue(xprt, rqstp); } -EXPORT_SYMBOL_GPL(svc_xprt_do_enqueue); /* * Queue up a transport with data pending. If there are idle nfsd @@ -465,7 +465,7 @@ void svc_xprt_enqueue(struct svc_xprt *xprt) { if (test_bit(XPT_BUSY, &xprt->xpt_flags)) return; - xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); + svc_xprt_do_enqueue(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_enqueue); From 8c60a476704da0e2cf3f865e20295d79d988115c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 17:57:23 -0500 Subject: [PATCH 0774/1565] SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt() [ Upstream commit c0219c499799c1e92bd570c15a47e6257a27bb15 ] Neil says: "These functions were separated in commit 0971374e2818 ("SUNRPC: Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check happened before taking any spinlocks. We have since moved or removed the spinlocks so the extra test is fairly pointless." I've made this a separate patch in case the XPT_BUSY change has unexpected consequences and needs to be reverted. Suggested-by: Neil Brown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- net/sunrpc/svc_xprt.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 833952db2319..b5e80817b02f 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -31,7 +31,6 @@ static int svc_deferred_recv(struct svc_rqst *rqstp); static struct cache_deferred_req *svc_defer(struct cache_req *req); static void svc_age_temp_xprts(struct timer_list *t); static void svc_delete_xprt(struct svc_xprt *xprt); -static void svc_xprt_do_enqueue(struct svc_xprt *xprt); /* apparently the "standard" is that clients close * idle connections after 5 minutes, servers after @@ -254,12 +253,12 @@ void svc_xprt_received(struct svc_xprt *xprt) trace_svc_xprt_received(xprt); /* As soon as we clear busy, the xprt could be closed and - * 'put', so we need a reference to call svc_xprt_do_enqueue with: + * 'put', so we need a reference to call svc_xprt_enqueue with: */ svc_xprt_get(xprt); smp_mb__before_atomic(); clear_bit(XPT_BUSY, &xprt->xpt_flags); - svc_xprt_do_enqueue(xprt); + svc_xprt_enqueue(xprt); svc_xprt_put(xprt); } EXPORT_SYMBOL_GPL(svc_xprt_received); @@ -399,6 +398,8 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) smp_rmb(); xpt_flags = READ_ONCE(xprt->xpt_flags); + if (xpt_flags & BIT(XPT_BUSY)) + return false; if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE))) return true; if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) { @@ -411,7 +412,12 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) return false; } -static void svc_xprt_do_enqueue(struct svc_xprt *xprt) +/** + * svc_xprt_enqueue - Queue a transport on an idle nfsd thread + * @xprt: transport with data pending + * + */ +void svc_xprt_enqueue(struct svc_xprt *xprt) { struct svc_pool *pool; struct svc_rqst *rqstp = NULL; @@ -455,18 +461,6 @@ static void svc_xprt_do_enqueue(struct svc_xprt *xprt) put_cpu(); trace_svc_xprt_do_enqueue(xprt, rqstp); } - -/* - * Queue up a transport with data pending. If there are idle nfsd - * processes, wake 'em up. - * - */ -void svc_xprt_enqueue(struct svc_xprt *xprt) -{ - if (test_bit(XPT_BUSY, &xprt->xpt_flags)) - return; - svc_xprt_do_enqueue(xprt); -} EXPORT_SYMBOL_GPL(svc_xprt_enqueue); /* From 9a43ddd6b626c3da45abd5eab74492b65a238991 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 13:49:29 -0500 Subject: [PATCH 0775/1565] SUNRPC: Remove svo_shutdown method [ Upstream commit 87cdd8641c8a1ec6afd2468265e20840a57fd888 ] Clean up. Neil observed that "any code that calls svc_shutdown_net() knows what the shutdown function should be, and so can call it directly." Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 5 ++--- fs/nfsd/nfssvc.c | 2 +- include/linux/sunrpc/svc.h | 3 --- net/sunrpc/svc.c | 3 --- 4 files changed, 3 insertions(+), 10 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 3a05af873625..f5b688a844aa 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -249,6 +249,7 @@ static int make_socks(struct svc_serv *serv, struct net *net, printk(KERN_WARNING "lockd_up: makesock failed, error=%d\n", err); svc_shutdown_net(serv, net); + svc_rpcb_cleanup(serv, net); return err; } @@ -287,8 +288,7 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); svc_shutdown_net(serv, net); - dprintk("%s: per-net data destroyed; net=%x\n", - __func__, net->ns.inum); + svc_rpcb_cleanup(serv, net); } } else { pr_err("%s: no users! net=%x\n", @@ -351,7 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = { #endif static const struct svc_serv_ops lockd_sv_ops = { - .svo_shutdown = svc_rpcb_cleanup, .svo_function = lockd, .svo_module = THIS_MODULE, }; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 3b79b97f2715..a1765e751b73 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -613,7 +613,6 @@ static int nfsd_get_default_max_blksize(void) } static const struct svc_serv_ops nfsd_thread_sv_ops = { - .svo_shutdown = nfsd_last_thread, .svo_function = nfsd, .svo_module = THIS_MODULE, }; @@ -724,6 +723,7 @@ void nfsd_put(struct net *net) if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { svc_shutdown_net(nn->nfsd_serv, net); + nfsd_last_thread(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 3c8ed018c686..6f3ba5c51464 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -55,9 +55,6 @@ struct svc_pool { struct svc_serv; struct svc_serv_ops { - /* Callback to use when last thread exits. */ - void (*svo_shutdown)(struct svc_serv *, struct net *); - /* function for service threads to run */ int (*svo_function)(void *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index d34d03b0bf76..709118bac4c3 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -537,9 +537,6 @@ EXPORT_SYMBOL_GPL(svc_create_pooled); void svc_shutdown_net(struct svc_serv *serv, struct net *net) { svc_close_net(serv, net); - - if (serv->sv_ops->svo_shutdown) - serv->sv_ops->svo_shutdown(serv, net); } EXPORT_SYMBOL_GPL(svc_shutdown_net); From c58a9cfd2091c99a84f0a9d811ce2bc3db2adf28 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jan 2022 11:42:08 -0500 Subject: [PATCH 0776/1565] SUNRPC: Rename svc_create_xprt() [ Upstream commit 352ad31448fecc78a2e9b78da64eea5d63b8d0ce ] Clean up: Use the "svc_xprt_" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 4 ++-- fs/nfs/callback.c | 12 ++++++------ fs/nfsd/nfsctl.c | 8 ++++---- fs/nfsd/nfssvc.c | 8 ++++---- include/linux/sunrpc/svc_xprt.h | 7 ++++--- net/sunrpc/svc_xprt.c | 24 +++++++++++++++++++----- 6 files changed, 39 insertions(+), 24 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index f5b688a844aa..bba6f2b45b64 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -197,8 +197,8 @@ static int create_lockd_listener(struct svc_serv *serv, const char *name, xprt = svc_find_xprt(serv, name, net, family, 0); if (xprt == NULL) - return svc_create_xprt(serv, name, net, family, port, - SVC_SOCK_DEFAULTS, cred); + return svc_xprt_create(serv, name, net, family, port, + SVC_SOCK_DEFAULTS, cred); svc_xprt_put(xprt); return 0; } diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 7a810f885063..c1a8767100ae 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -45,18 +45,18 @@ static int nfs4_callback_up_net(struct svc_serv *serv, struct net *net) int ret; struct nfs_net *nn = net_generic(net, nfs_net_id); - ret = svc_create_xprt(serv, "tcp", net, PF_INET, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_xprt_create(serv, "tcp", net, PF_INET, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret <= 0) goto out_err; nn->nfs_callback_tcpport = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", nn->nfs_callback_tcpport, PF_INET, net->ns.inum); - ret = svc_create_xprt(serv, "tcp", net, PF_INET6, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_xprt_create(serv, "tcp", net, PF_INET6, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret > 0) { nn->nfs_callback_tcpport6 = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 68b020f2002b..8fec779994f7 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -772,13 +772,13 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err != 0) return err; - err = svc_create_xprt(nn->nfsd_serv, transport, net, - PF_INET, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_xprt_create(nn->nfsd_serv, transport, net, + PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err; - err = svc_create_xprt(nn->nfsd_serv, transport, net, - PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_xprt_create(nn->nfsd_serv, transport, net, + PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index a1765e751b73..5790b1eaff72 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -293,13 +293,13 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred) if (!list_empty(&nn->nfsd_serv->sv_permsocks)) return 0; - error = svc_create_xprt(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error; - error = svc_create_xprt(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error; diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 1311425ddab7..cba9559bba6f 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -127,9 +127,10 @@ int svc_reg_xprt_class(struct svc_xprt_class *); void svc_unreg_xprt_class(struct svc_xprt_class *); void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, struct svc_serv *); -int svc_create_xprt(struct svc_serv *, const char *, struct net *, - const int, const unsigned short, int, - const struct cred *); +int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, + struct net *net, const int family, + const unsigned short port, int flags, + const struct cred *cred); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index b5e80817b02f..20baa0de7017 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -272,7 +272,7 @@ void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) svc_xprt_received(new); } -static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name, +static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) @@ -308,21 +308,35 @@ static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name, return -EPROTONOSUPPORT; } -int svc_create_xprt(struct svc_serv *serv, const char *xprt_name, +/** + * svc_xprt_create - Add a new listener to @serv + * @serv: target RPC service + * @xprt_name: transport class name + * @net: network namespace + * @family: network address family + * @port: listener port + * @flags: SVC_SOCK flags + * @cred: credential to bind to this transport + * + * Return values: + * %0: New listener added successfully + * %-EPROTONOSUPPORT: Requested transport type not supported + */ +int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) { int err; - err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); + err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); if (err == -EPROTONOSUPPORT) { request_module("svc%s", xprt_name); - err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); + err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); } return err; } -EXPORT_SYMBOL_GPL(svc_create_xprt); +EXPORT_SYMBOL_GPL(svc_xprt_create); /* * Copy the local and remote xprt addresses to the rqstp structure From a4bbb1ab69abbd5735a38916f3eae1cf7781eb02 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 31 Jan 2022 13:34:29 -0500 Subject: [PATCH 0777/1565] SUNRPC: Rename svc_close_xprt() [ Upstream commit 4355d767a21b9445958fc11bce9a9701f76529d3 ] Clean up: Use the "svc_xprt_" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 2 +- include/linux/sunrpc/svc_xprt.h | 2 +- net/sunrpc/svc.c | 2 +- net/sunrpc/svc_xprt.c | 9 +++++++-- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- 5 files changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 8fec779994f7..16920e4512bd 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -790,7 +790,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr out_close: xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); if (xprt != NULL) { - svc_close_xprt(xprt); + svc_xprt_close(xprt); svc_xprt_put(xprt); } out_err: diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index cba9559bba6f..f614f8e248e1 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -135,7 +135,7 @@ void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); -void svc_close_xprt(struct svc_xprt *xprt); +void svc_xprt_close(struct svc_xprt *xprt); int svc_port_is_privileged(struct sockaddr *sin); int svc_print_xprts(char *buf, int maxlen); struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name, diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 709118bac4c3..9126e1b7d076 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1403,7 +1403,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) svc_authorise(rqstp); close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) - svc_close_xprt(rqstp->rq_xprt); + svc_xprt_close(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); return 0; diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 20baa0de7017..ee4a6d57056f 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1056,7 +1056,12 @@ static void svc_delete_xprt(struct svc_xprt *xprt) svc_xprt_put(xprt); } -void svc_close_xprt(struct svc_xprt *xprt) +/** + * svc_xprt_close - Close a client connection + * @xprt: transport to disconnect + * + */ +void svc_xprt_close(struct svc_xprt *xprt) { trace_svc_xprt_close(xprt); set_bit(XPT_CLOSE, &xprt->xpt_flags); @@ -1071,7 +1076,7 @@ void svc_close_xprt(struct svc_xprt *xprt) */ svc_delete_xprt(xprt); } -EXPORT_SYMBOL_GPL(svc_close_xprt); +EXPORT_SYMBOL_GPL(svc_xprt_close); static int svc_close_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net) { diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index c5154bc38e12..feac8c26fb87 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -186,7 +186,7 @@ static int xprt_rdma_bc_send_request(struct rpc_rqst *rqst) ret = rpcrdma_bc_send_request(rdma, rqst); if (ret == -ENOTCONN) - svc_close_xprt(sxprt); + svc_xprt_close(sxprt); return ret; } From 36c57b27a7d83aeb232e78eaee645dea26a037b3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jan 2022 11:30:55 -0500 Subject: [PATCH 0778/1565] SUNRPC: Remove svc_shutdown_net() [ Upstream commit c7d7ec8f043e53ad16e30f5ebb8b9df415ec0f2b ] Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites. svc_close_net() is renamed to be the inverse of svc_xprt_create(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 4 ++-- fs/nfs/callback.c | 2 +- fs/nfsd/nfssvc.c | 2 +- include/linux/sunrpc/svc.h | 1 - include/linux/sunrpc/svc_xprt.h | 1 + net/sunrpc/svc.c | 6 ------ net/sunrpc/svc_xprt.c | 9 +++++++-- 7 files changed, 12 insertions(+), 13 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index bba6f2b45b64..c83ec4a375bc 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -248,7 +248,7 @@ static int make_socks(struct svc_serv *serv, struct net *net, if (warned++ == 0) printk(KERN_WARNING "lockd_up: makesock failed, error=%d\n", err); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); svc_rpcb_cleanup(serv, net); return err; } @@ -287,7 +287,7 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) nlm_shutdown_hosts_net(net); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); svc_rpcb_cleanup(serv, net); } } else { diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index c1a8767100ae..c98c68513590 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -189,7 +189,7 @@ static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv, struc return; dprintk("NFS: destroy per-net callback data; net=%x\n", net->ns.inum); - svc_shutdown_net(serv, net); + svc_xprt_destroy_all(serv, net); } static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 5790b1eaff72..38895372ec39 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -722,7 +722,7 @@ void nfsd_put(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id); if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { - svc_shutdown_net(nn->nfsd_serv, net); + svc_xprt_destroy_all(nn->nfsd_serv, net); nfsd_last_thread(nn->nfsd_serv, net); svc_destroy(&nn->nfsd_serv->sv_refcnt); spin_lock(&nfsd_notifier_lock); diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 6f3ba5c51464..482a024156b1 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -511,7 +511,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); -void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, struct svc_rqst *); diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index f614f8e248e1..dbffb92511ef 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,6 +131,7 @@ int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred); +void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 9126e1b7d076..c9195e40959c 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -534,12 +534,6 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, } EXPORT_SYMBOL_GPL(svc_create_pooled); -void svc_shutdown_net(struct svc_serv *serv, struct net *net) -{ - svc_close_net(serv, net); -} -EXPORT_SYMBOL_GPL(svc_shutdown_net); - /* * Destroy an RPC service. Should be called with appropriate locking to * protect sv_permsocks and sv_tempsocks. diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index ee4a6d57056f..000b737784bd 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1128,7 +1128,11 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) } } -/* +/** + * svc_xprt_destroy_all - Destroy transports associated with @serv + * @serv: RPC service to be shut down + * @net: target network namespace + * * Server threads may still be running (especially in the case where the * service is still running in other network namespaces). * @@ -1140,7 +1144,7 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) * threads, we may need to wait a little while and then check again to * see if they're done. */ -void svc_close_net(struct svc_serv *serv, struct net *net) +void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net) { int delay = 0; @@ -1151,6 +1155,7 @@ void svc_close_net(struct svc_serv *serv, struct net *net) msleep(delay++); } } +EXPORT_SYMBOL_GPL(svc_xprt_destroy_all); /* * Handle defer and revisit of requests From 9802c5746038a1b1e5cedb151352780230e983da Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 16 Feb 2022 12:31:09 -0500 Subject: [PATCH 0779/1565] NFSD: Remove svc_serv_ops::svo_module [ Upstream commit f49169c97fceb21ad6a0aaf671c50b0f520f15a5 ] struct svc_serv_ops is about to be removed. Neil Brown says: > I suspect svo_module can go as well - I don't think the thread is > ever the thing that primarily keeps a module active. A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way. Suggested-by: Neil Brown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 4 +--- fs/nfs/callback.c | 7 ++----- fs/nfs/nfs4state.c | 1 - fs/nfsd/nfssvc.c | 3 --- include/linux/sunrpc/svc.h | 5 ----- kernel/module.c | 2 +- net/sunrpc/svc.c | 2 -- 7 files changed, 4 insertions(+), 20 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index c83ec4a375bc..bfde31124f3a 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -184,8 +184,7 @@ lockd(void *vrqstp) dprintk("lockd_down: service stopped\n"); svc_exit_thread(rqstp); - - module_put_and_kthread_exit(0); + return 0; } static int create_lockd_listener(struct svc_serv *serv, const char *name, @@ -352,7 +351,6 @@ static struct notifier_block lockd_inet6addr_notifier = { static const struct svc_serv_ops lockd_sv_ops = { .svo_function = lockd, - .svo_module = THIS_MODULE, }; static int lockd_get(void) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index c98c68513590..a494f9e7bd0a 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -17,7 +17,6 @@ #include #include #include -#include #include #include @@ -92,8 +91,8 @@ nfs4_callback_svc(void *vrqstp) continue; svc_process(rqstp); } + svc_exit_thread(rqstp); - module_put_and_kthread_exit(0); return 0; } @@ -136,8 +135,8 @@ nfs41_callback_svc(void *vrqstp) finish_wait(&serv->sv_cb_waitq, &wq); } } + svc_exit_thread(rqstp); - module_put_and_kthread_exit(0); return 0; } @@ -234,12 +233,10 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, static const struct svc_serv_ops nfs40_cb_sv_ops = { .svo_function = nfs4_callback_svc, - .svo_module = THIS_MODULE, }; #if defined(CONFIG_NFS_V4_1) static const struct svc_serv_ops nfs41_cb_sv_ops = { .svo_function = nfs41_callback_svc, - .svo_module = THIS_MODULE, }; static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index d8fc5d72a161..ae2da65ffbaf 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,7 +2757,6 @@ static int nfs4_run_state_manager(void *ptr) goto again; nfs_put_client(clp); - module_put_and_kthread_exit(0); return 0; } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 38895372ec39..d25d4c12a499 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -614,7 +614,6 @@ static int nfsd_get_default_max_blksize(void) static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_function = nfsd, - .svo_module = THIS_MODULE, }; void nfsd_shutdown_threads(struct net *net) @@ -1018,8 +1017,6 @@ nfsd(void *vrqstp) msleep(20); } - /* Release module */ - module_put_and_kthread_exit(0); return 0; } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 482a024156b1..c64db9b14a64 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -57,11 +57,6 @@ struct svc_serv; struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *); - - /* optional module to count when adding threads. - * Thread function must call module_put_and_kthread_exit() to exit. - */ - struct module *svo_module; }; /* diff --git a/kernel/module.c b/kernel/module.c index 9030ff8c0855..edc7b99cb16f 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -334,7 +334,7 @@ static inline void add_taint_module(struct module *mod, unsigned flag, /* * A thread that wants to hold a reference to a module only while it - * is running can call this to safely exit. nfsd and lockd use this. + * is running can call this to safely exit. */ void __noreturn __module_put_and_kthread_exit(struct module *mod, long code) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index c9195e40959c..bdecc902cf99 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -734,11 +734,9 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) if (IS_ERR(rqstp)) return PTR_ERR(rqstp); - __module_get(serv->sv_ops->svo_module); task = kthread_create_on_node(serv->sv_ops->svo_function, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { - module_put(serv->sv_ops->svo_module); svc_exit_thread(rqstp); return PTR_ERR(task); } From 7e4328b3b98fa6c8ee47298d110abe3c17221ba4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 16 Feb 2022 12:16:27 -0500 Subject: [PATCH 0780/1565] NFSD: Move svc_serv_ops::svo_function into struct svc_serv [ Upstream commit 37902c6313090235c847af89c5515591261ee338 ] Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 6 +----- fs/nfs/callback.c | 43 ++++++++++---------------------------- fs/nfsd/nfssvc.c | 7 +------ include/linux/sunrpc/svc.h | 14 ++++--------- net/sunrpc/svc.c | 37 ++++++++++++++++++++++---------- 5 files changed, 43 insertions(+), 64 deletions(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index bfde31124f3a..59ef8a1f843f 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -349,10 +349,6 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif -static const struct svc_serv_ops lockd_sv_ops = { - .svo_function = lockd, -}; - static int lockd_get(void) { struct svc_serv *serv; @@ -376,7 +372,7 @@ static int lockd_get(void) nlm_timeout = LOCKD_DFLT_TIMEO; nlmsvc_timeout = nlm_timeout * HZ; - serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops); + serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd); if (!serv) { printk(KERN_WARNING "lockd_up: create service failed\n"); return -ENOMEM; diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index a494f9e7bd0a..456af7d230cf 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -231,29 +231,10 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, return ret; } -static const struct svc_serv_ops nfs40_cb_sv_ops = { - .svo_function = nfs4_callback_svc, -}; -#if defined(CONFIG_NFS_V4_1) -static const struct svc_serv_ops nfs41_cb_sv_ops = { - .svo_function = nfs41_callback_svc, -}; - -static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { - [0] = &nfs40_cb_sv_ops, - [1] = &nfs41_cb_sv_ops, -}; -#else -static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { - [0] = &nfs40_cb_sv_ops, - [1] = NULL, -}; -#endif - static struct svc_serv *nfs_callback_create_svc(int minorversion) { struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion]; - const struct svc_serv_ops *sv_ops; + int (*threadfn)(void *data); struct svc_serv *serv; /* @@ -262,17 +243,6 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) if (cb_info->serv) return svc_get(cb_info->serv); - switch (minorversion) { - case 0: - sv_ops = nfs4_cb_sv_ops[0]; - break; - default: - sv_ops = nfs4_cb_sv_ops[1]; - } - - if (sv_ops == NULL) - return ERR_PTR(-ENOTSUPP); - /* * Sanity check: if there's no task, * we should be the first user ... @@ -281,7 +251,16 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n", cb_info->users); - serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); + threadfn = nfs4_callback_svc; +#if defined(CONFIG_NFS_V4_1) + if (minorversion) + threadfn = nfs41_callback_svc; +#else + if (minorversion) + return ERR_PTR(-ENOTSUPP); +#endif + serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, + threadfn); if (!serv) { printk(KERN_ERR "nfs_callback_create_svc: create service failed\n"); return ERR_PTR(-ENOMEM); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index d25d4c12a499..2f74be98ff2d 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -612,10 +612,6 @@ static int nfsd_get_default_max_blksize(void) return ret; } -static const struct svc_serv_ops nfsd_thread_sv_ops = { - .svo_function = nfsd, -}; - void nfsd_shutdown_threads(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); @@ -654,8 +650,7 @@ int nfsd_create_serv(struct net *net) if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, - &nfsd_thread_sv_ops); + serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd); if (serv == NULL) return -ENOMEM; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c64db9b14a64..3c908ffbbf45 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -52,13 +52,6 @@ struct svc_pool { unsigned long sp_flags; } ____cacheline_aligned_in_smp; -struct svc_serv; - -struct svc_serv_ops { - /* function for service threads to run */ - int (*svo_function)(void *); -}; - /* * RPC service. * @@ -91,7 +84,8 @@ struct svc_serv { unsigned int sv_nrpools; /* number of thread pools */ struct svc_pool * sv_pools; /* array of thread pools */ - const struct svc_serv_ops *sv_ops; /* server operations */ + int (*sv_threadfn)(void *data); + #if defined(CONFIG_SUNRPC_BACKCHANNEL) struct list_head sv_cb_list; /* queue for callback requests * that arrive over the same @@ -495,7 +489,7 @@ int svc_rpcb_setup(struct svc_serv *serv, struct net *net); void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net); int svc_bind(struct svc_serv *serv, struct net *net); struct svc_serv *svc_create(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); void svc_rqst_replace_page(struct svc_rqst *rqstp, @@ -503,7 +497,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); int svc_process(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index bdecc902cf99..7f231947347e 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -446,7 +446,7 @@ __svc_init_bc(struct svc_serv *serv) */ static struct svc_serv * __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, - const struct svc_serv_ops *ops) + int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int vers; @@ -463,7 +463,7 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, bufsize = RPCSVC_MAXPAYLOAD; serv->sv_max_payload = bufsize? bufsize : 4096; serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); - serv->sv_ops = ops; + serv->sv_threadfn = threadfn; xdrsize = 0; while (prog) { prog->pg_lovers = prog->pg_nvers-1; @@ -509,22 +509,37 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return serv; } -struct svc_serv * -svc_create(struct svc_program *prog, unsigned int bufsize, - const struct svc_serv_ops *ops) +/** + * svc_create - Create an RPC service + * @prog: the RPC program the new service will handle + * @bufsize: maximum message size for @prog + * @threadfn: a function to service RPC requests for @prog + * + * Returns an instantiated struct svc_serv object or NULL. + */ +struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, + int (*threadfn)(void *data)) { - return __svc_create(prog, bufsize, /*npools*/1, ops); + return __svc_create(prog, bufsize, 1, threadfn); } EXPORT_SYMBOL_GPL(svc_create); -struct svc_serv * -svc_create_pooled(struct svc_program *prog, unsigned int bufsize, - const struct svc_serv_ops *ops) +/** + * svc_create_pooled - Create an RPC service with pooled threads + * @prog: the RPC program the new service will handle + * @bufsize: maximum message size for @prog + * @threadfn: a function to service RPC requests for @prog + * + * Returns an instantiated struct svc_serv object or NULL. + */ +struct svc_serv *svc_create_pooled(struct svc_program *prog, + unsigned int bufsize, + int (*threadfn)(void *data)) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); - serv = __svc_create(prog, bufsize, npools, ops); + serv = __svc_create(prog, bufsize, npools, threadfn); if (!serv) goto out_err; return serv; @@ -734,7 +749,7 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) if (IS_ERR(rqstp)) return PTR_ERR(rqstp); - task = kthread_create_on_node(serv->sv_ops->svo_function, rqstp, + task = kthread_create_on_node(serv->sv_threadfn, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { svc_exit_thread(rqstp); From ec3b252a55f0a188ef3c7db87056c4d5e0696365 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 6 Feb 2022 12:25:47 -0500 Subject: [PATCH 0781/1565] NFSD: Remove CONFIG_NFSD_V3 [ Upstream commit 5f9a62ff7d2808c7b56c0ec90f3b7eae5872afe6 ] Eventually support for NFSv2 in the Linux NFS server is to be deprecated and then removed. However, NFSv2 is the "always supported" version that is available as soon as CONFIG_NFSD is set. Before NFSv2 support can be removed, we need to choose a different "always supported" version. This patch removes CONFIG_NFSD_V3 so that NFSv3 is always supported, as NFSv2 is today. When NFSv2 support is removed, NFSv3 will become the only "always supported" NFS version. The defconfigs still need to be updated to remove CONFIG_NFSD_V3=y. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/Kconfig | 2 +- fs/nfsd/Kconfig | 12 +----------- fs/nfsd/Makefile | 3 +-- fs/nfsd/nfsfh.c | 4 ---- fs/nfsd/nfsfh.h | 20 -------------------- fs/nfsd/nfssvc.c | 2 -- fs/nfsd/vfs.c | 9 --------- fs/nfsd/vfs.h | 2 -- 8 files changed, 3 insertions(+), 51 deletions(-) diff --git a/fs/Kconfig b/fs/Kconfig index eaff422877c3..11b60d160f88 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -320,7 +320,7 @@ config LOCKD config LOCKD_V4 bool - depends on NFSD_V3 || NFS_V3 + depends on NFSD || NFS_V3 depends on FILE_LOCKING default y diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index f229172652be..887af7966b03 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -35,18 +35,9 @@ config NFSD_V2_ACL bool depends on NFSD -config NFSD_V3 - bool "NFS server support for NFS version 3" - depends on NFSD - help - This option enables support in your system's NFS server for - version 3 of the NFS protocol (RFC 1813). - - If unsure, say Y. - config NFSD_V3_ACL bool "NFS server support for the NFSv3 ACL protocol extension" - depends on NFSD_V3 + depends on NFSD select NFSD_V2_ACL help Solaris NFS servers support an auxiliary NFSv3 ACL protocol that @@ -70,7 +61,6 @@ config NFSD_V3_ACL config NFSD_V4 bool "NFS server support for NFS version 4" depends on NFSD && PROC_FS - select NFSD_V3 select FS_POSIX_ACL select SUNRPC_GSS select CRYPTO diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index 3f0983e93a99..805c06d5f1b4 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -12,9 +12,8 @@ nfsd-y += trace.o nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \ export.o auth.o lockd.o nfscache.o nfsxdr.o \ - stats.o filecache.o + stats.o filecache.o nfs3proc.o nfs3xdr.o nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o -nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \ nfs4acl.o nfs4callback.o nfs4recover.o diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 3b9751555f8f..d4ae838948ba 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -610,8 +610,6 @@ fh_update(struct svc_fh *fhp) return nfserr_serverfault; } -#ifdef CONFIG_NFSD_V3 - /** * fh_fill_pre_attrs - Fill in pre-op attributes * @fhp: file handle to be updated @@ -672,8 +670,6 @@ void fh_fill_post_attrs(struct svc_fh *fhp) nfsd4_change_attribute(&fhp->fh_post_attr, inode); } -#endif /* CONFIG_NFSD_V3 */ - /* * Release a file handle. */ diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 434930d8a946..fb9d358a267e 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -90,7 +90,6 @@ typedef struct svc_fh { * operation */ int fh_flags; /* FH flags */ -#ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ bool fh_pre_saved; /* pre-op attrs saved */ @@ -107,7 +106,6 @@ typedef struct svc_fh { /* Post-op attributes saved in fh_unlock */ struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ -#endif /* CONFIG_NFSD_V3 */ } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) @@ -283,8 +281,6 @@ static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) } #endif -#ifdef CONFIG_NFSD_V3 - /** * fh_clear_pre_post_attrs - Reset pre/post attributes * @fhp: file handle to be updated @@ -327,22 +323,6 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); -#else /* !CONFIG_NFSD_V3 */ - -static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) -{ -} - -static inline void fh_fill_pre_attrs(struct svc_fh *fhp) -{ -} - -static inline void fh_fill_post_attrs(struct svc_fh *fhp) -{ -} - -#endif /* !CONFIG_NFSD_V3 */ - /* * Lock a file handle/inode diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2f74be98ff2d..011c556caa1e 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -117,9 +117,7 @@ static struct svc_stat nfsd_acl_svcstats = { static const struct svc_version *nfsd_version[] = { [2] = &nfsd_version2, -#if defined(CONFIG_NFSD_V3) [3] = &nfsd_version3, -#endif #if defined(CONFIG_NFSD_V4) [4] = &nfsd_version4, #endif diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 89c50ccedf4d..86584e727ce0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -32,9 +32,7 @@ #include #include -#ifdef CONFIG_NFSD_V3 #include "xdr3.h" -#endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 #include "../internal.h" @@ -627,7 +625,6 @@ __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif /* defined(CONFIG_NFSD_V4) */ -#ifdef CONFIG_NFSD_V3 /* * Check server access rights to a file system object */ @@ -739,7 +736,6 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor out: return error; } -#endif /* CONFIG_NFSD_V3 */ int nfsd_open_break_lease(struct inode *inode, int access) { @@ -1139,7 +1135,6 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, return err; } -#ifdef CONFIG_NFSD_V3 /** * nfsd_commit - Commit pending writes to stable storage * @rqstp: RPC request being processed @@ -1217,7 +1212,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, out: return err; } -#endif /* CONFIG_NFSD_V3 */ static __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, @@ -1406,8 +1400,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, rdev, resfhp); } -#ifdef CONFIG_NFSD_V3 - /* * NFSv3 and NFSv4 version of nfsd_create */ @@ -1573,7 +1565,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, err = nfserrno(host_err); goto out; } -#endif /* CONFIG_NFSD_V3 */ /* * Read a symlink. On entry, *lenp must contain the maximum path length that diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 2c43d10e3cab..ccb87b2864f6 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -68,7 +68,6 @@ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, int type, dev_t rdev, struct svc_fh *res); -#ifdef CONFIG_NFSD_V3 __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, @@ -76,7 +75,6 @@ __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, u32 *verifier, bool *truncp, bool *created); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); -#endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, void **bufp, int *lenp); From e96076f5790f93ff7166519cd8e3548ddfc80743 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 16 Feb 2022 11:26:06 -0500 Subject: [PATCH 0782/1565] NFSD: Clean up _lm_ operation names [ Upstream commit 35aff0678f99b0623bb72d50112de9e163a19559 ] The common practice is to name function instances the same as the method names, but with a uniquifying prefix. Commit aef9583b234a ("NFSD: Get reference of lockowner when coping file_lock") missed this -- the new function names should both have been of the form "nfsd4_lm_*". Before more lock manager operations are added in NFSD, rename these two functions for consistency. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 60d5d1cb2cc6..0170aaf318ea 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6566,7 +6566,7 @@ nfs4_transform_lock_offset(struct file_lock *lock) } static fl_owner_t -nfsd4_fl_get_owner(fl_owner_t owner) +nfsd4_lm_get_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner; @@ -6575,7 +6575,7 @@ nfsd4_fl_get_owner(fl_owner_t owner) } static void -nfsd4_fl_put_owner(fl_owner_t owner) +nfsd4_lm_put_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner; @@ -6610,8 +6610,8 @@ nfsd4_lm_notify(struct file_lock *fl) static const struct lock_manager_operations nfsd_posix_mng_ops = { .lm_notify = nfsd4_lm_notify, - .lm_get_owner = nfsd4_fl_get_owner, - .lm_put_owner = nfsd4_fl_put_owner, + .lm_get_owner = nfsd4_lm_get_owner, + .lm_put_owner = nfsd4_lm_put_owner, }; static inline void From 62fa144b858700ce644720b558491fc097b0096b Mon Sep 17 00:00:00 2001 From: Jakob Koschel Date: Sat, 19 Mar 2022 21:27:04 +0100 Subject: [PATCH 0783/1565] nfsd: fix using the correct variable for sizeof() [ Upstream commit 4fc5f5346592cdc91689455d83885b0af65d71b8 ] While the original code is valid, it is not the obvious choice for the sizeof() call and in preparation to limit the scope of the list iterator variable the sizeof should be changed to the size of the destination. Signed-off-by: Jakob Koschel Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4layouts.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 2673019d30ec..7018d209b784 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -421,7 +421,7 @@ nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls) new = kmem_cache_alloc(nfs4_layout_cache, GFP_KERNEL); if (!new) return nfserr_jukebox; - memcpy(&new->lo_seg, seg, sizeof(lp->lo_seg)); + memcpy(&new->lo_seg, seg, sizeof(new->lo_seg)); new->lo_state = ls; spin_lock(&fp->fi_lock); From 5e84e33832d50f707843be79859cecddde6a7f34 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 23 Feb 2022 17:14:37 +0200 Subject: [PATCH 0784/1565] fsnotify: fix merge with parent's ignored mask [ Upstream commit 4f0b903ded728c505850daf2914bfc08841f0ae6 ] fsnotify_parent() does not consider the parent's mark at all unless the parent inode shows interest in events on children and in the specific event. So unless parent added an event to both its mark mask and ignored mask, the event will not be ignored. Fix this by declaring the interest of an object in an event when the event is in either a mark mask or ignored mask. Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 17 +++++++++-------- fs/notify/mark.c | 4 ++-- include/linux/fsnotify_backend.h | 15 +++++++++++++++ 3 files changed, 26 insertions(+), 10 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 667970057411..12fb209e6041 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -991,17 +991,18 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, __u32 mask, unsigned int flags, __u32 umask, int *destroy) { - __u32 oldmask = 0; + __u32 oldmask, newmask; /* umask bits cannot be removed by user */ mask &= ~umask; spin_lock(&fsn_mark->lock); + oldmask = fsnotify_calc_mask(fsn_mark); if (!(flags & FAN_MARK_IGNORED_MASK)) { - oldmask = fsn_mark->mask; fsn_mark->mask &= ~mask; } else { fsn_mark->ignored_mask &= ~mask; } + newmask = fsnotify_calc_mask(fsn_mark); /* * We need to keep the mark around even if remaining mask cannot * result in any events (e.g. mask == FAN_ONDIR) to support incremenal @@ -1011,7 +1012,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); spin_unlock(&fsn_mark->lock); - return mask & oldmask; + return oldmask & ~newmask; } static int fanotify_remove_mark(struct fsnotify_group *group, @@ -1069,23 +1070,23 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, } static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, - unsigned int flags) + __u32 mask, unsigned int flags) { - __u32 oldmask = -1; + __u32 oldmask, newmask; spin_lock(&fsn_mark->lock); + oldmask = fsnotify_calc_mask(fsn_mark); if (!(flags & FAN_MARK_IGNORED_MASK)) { - oldmask = fsn_mark->mask; fsn_mark->mask |= mask; } else { fsn_mark->ignored_mask |= mask; if (flags & FAN_MARK_IGNORED_SURV_MODIFY) fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; } + newmask = fsnotify_calc_mask(fsn_mark); spin_unlock(&fsn_mark->lock); - return mask & ~oldmask; + return newmask & ~oldmask; } static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, diff --git a/fs/notify/mark.c b/fs/notify/mark.c index b42629d2fc1c..c86982be2d50 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -127,7 +127,7 @@ static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) return; hlist_for_each_entry(mark, &conn->list, obj_list) { if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) - new_mask |= mark->mask; + new_mask |= fsnotify_calc_mask(mark); } *fsnotify_conn_mask_p(conn) = new_mask; } @@ -692,7 +692,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, if (ret) goto err; - if (mark->mask) + if (mark->mask || mark->ignored_mask) fsnotify_recalc_mask(mark->connector); return ret; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 790c31844db5..5f9c960049b0 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -601,6 +601,21 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, /* functions used to manipulate the marks attached to inodes */ +/* Get mask for calculating object interest taking ignored mask into account */ +static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) +{ + __u32 mask = mark->mask; + + if (!mark->ignored_mask) + return mask; + + /* + * If mark is interested in ignoring events on children, the object must + * show interest in those events for fsnotify_parent() to notice it. + */ + return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); +} + /* Get mask of events for a list of marks */ extern __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn); /* Calculate mask of events for a list of marks */ From 552c24a32ce8b149436c7a6d7fdb9285fcd581ce Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 23 Feb 2022 17:14:38 +0200 Subject: [PATCH 0785/1565] fsnotify: optimize FS_MODIFY events with no ignored masks [ Upstream commit 04e317ba72d07901b03399b3d1525e83424df5b3 ] fsnotify() treats FS_MODIFY events specially - it does not skip them even if the FS_MODIFY event does not apear in the object's fsnotify mask. This is because send_to_group() checks if FS_MODIFY needs to clear ignored mask of marks. The common case is that an object does not have any mark with ignored mask and in particular, that it does not have a mark with ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag. Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask() if object has a mark with an ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could be optimized in the common case. Call fsnotify_recalc_mask() from fanotify after adding or removing an ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark with ignored mask (the flag cannot be removed by fanotify uapi). Performance results for doing 10000000 write(2)s to tmpfs: vanilla patched without notification mark 25.486+-1.054 24.965+-0.244 with notification mark 30.111+-0.139 26.891+-1.355 So we can see the overhead of notification subsystem has been drastically reduced. Link: https://lore.kernel.org/r/20220223151438.790268-3-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 32 +++++++++++++++++++++++------- fs/notify/fsnotify.c | 8 +++++--- include/linux/fsnotify_backend.h | 4 ++++ 3 files changed, 34 insertions(+), 10 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 12fb209e6041..64abec874d8e 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1069,8 +1069,28 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, flags, umask); } +static void fanotify_mark_add_ignored_mask(struct fsnotify_mark *fsn_mark, + __u32 mask, unsigned int flags, + __u32 *removed) +{ + fsn_mark->ignored_mask |= mask; + + /* + * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to + * the removal of the FS_MODIFY bit in calculated mask if it was set + * because of an ignored mask that is now going to survive FS_MODIFY. + */ + if ((flags & FAN_MARK_IGNORED_SURV_MODIFY) && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { + fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; + if (!(fsn_mark->mask & FS_MODIFY)) + *removed = FS_MODIFY; + } +} + static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags) + __u32 mask, unsigned int flags, + __u32 *removed) { __u32 oldmask, newmask; @@ -1079,9 +1099,7 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, if (!(flags & FAN_MARK_IGNORED_MASK)) { fsn_mark->mask |= mask; } else { - fsn_mark->ignored_mask |= mask; - if (flags & FAN_MARK_IGNORED_SURV_MODIFY) - fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; + fanotify_mark_add_ignored_mask(fsn_mark, mask, flags, removed); } newmask = fsnotify_calc_mask(fsn_mark); spin_unlock(&fsn_mark->lock); @@ -1144,7 +1162,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, __kernel_fsid_t *fsid) { struct fsnotify_mark *fsn_mark; - __u32 added; + __u32 added, removed = 0; int ret = 0; mutex_lock(&group->mark_mutex); @@ -1167,8 +1185,8 @@ static int fanotify_add_mark(struct fsnotify_group *group, goto out; } - added = fanotify_mark_add_to_mask(fsn_mark, mask, flags); - if (added & ~fsnotify_conn_mask(fsn_mark->connector)) + added = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &removed); + if (removed || (added & ~fsnotify_conn_mask(fsn_mark->connector))) fsnotify_recalc_mask(fsn_mark->connector); out: diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index ab81a0776ece..494f653efbc6 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -531,11 +531,13 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, /* - * if this is a modify event we may need to clear the ignored masks - * otherwise return if none of the marks care about this type of event. + * If this is a modify event we may need to clear some ignored masks. + * In that case, the object with ignored masks will have the FS_MODIFY + * event in its mask. + * Otherwise, return if none of the marks care about this type of event. */ test_mask = (mask & ALL_FSNOTIFY_EVENTS); - if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) + if (!(test_mask & marks_mask)) return 0; iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 5f9c960049b0..0805b74cae44 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -609,6 +609,10 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) if (!mark->ignored_mask) return mask; + /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + mask |= FS_MODIFY; + /* * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. From a5fa9c824db839c1b7f702b2731c37ee2d700d81 Mon Sep 17 00:00:00 2001 From: Bang Li Date: Fri, 11 Mar 2022 23:12:40 +0800 Subject: [PATCH 0786/1565] fsnotify: remove redundant parameter judgment [ Upstream commit f92ca72b0263d601807bbd23ed25cbe6f4da89f4 ] iput() has already judged the incoming parameter, so there is no need to repeat the judgment here. Link: https://lore.kernel.org/r/20220311151240.62045-1-libang.linuxer@gmail.com Signed-off-by: Bang Li Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 494f653efbc6..70a8516b78bc 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -70,8 +70,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb) spin_unlock(&inode->i_lock); spin_unlock(&sb->s_inode_list_lock); - if (iput_inode) - iput(iput_inode); + iput(iput_inode); /* for each watch, send FS_UNMOUNT and then remove it */ fsnotify_inode(inode, FS_UNMOUNT); @@ -85,8 +84,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb) } spin_unlock(&sb->s_inode_list_lock); - if (iput_inode) - iput(iput_inode); + iput(iput_inode); } void fsnotify_sb_delete(struct super_block *sb) From 997575f1a1b593a1fd6ffb6a390d335dfe6e4e71 Mon Sep 17 00:00:00 2001 From: Haowen Bai Date: Mon, 28 Mar 2022 10:48:59 +0800 Subject: [PATCH 0787/1565] SUNRPC: Return true/false (not 1/0) from bool functions [ Upstream commit 5f7b839d47dbc74cf4a07beeab5191f93678673e ] Return boolean values ("true" or "false") instead of 1 or 0 from bool functions. This fixes the following warnings from coccicheck: ./fs/nfsd/nfs2acl.c:289:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_accessres' with return type bool ./fs/nfsd/nfs2acl.c:252:9-10: WARNING: return of 0/1 in function 'nfsaclsvc_encode_getaclres' with return type bool Signed-off-by: Haowen Bai Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 96733dff354d..03703b22c81e 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -247,34 +247,34 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) int w; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; if (dentry == NULL || d_really_is_negative(dentry)) - return 1; + return true; inode = d_inode(dentry); if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return 0; + return false; rqstp->rq_res.page_len = w = nfsacl_size( (resp->mask & NFS_ACL) ? resp->acl_access : NULL, (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); while (w > 0) { if (!*(rqstp->rq_next_page++)) - return 1; + return true; w -= PAGE_SIZE; } if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, resp->mask & NFS_ACL, 0)) - return 0; + return false; if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, resp->mask & NFS_DFACL, NFS_ACL_DEFAULT)) - return 0; + return false; - return 1; + return true; } /* ACCESS */ @@ -284,17 +284,17 @@ nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) struct nfsd3_accessres *resp = rqstp->rq_resp; if (!svcxdr_encode_stat(xdr, resp->status)) - return 0; + return false; switch (resp->status) { case nfs_ok: if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return 0; + return false; if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return 0; + return false; break; } - return 1; + return true; } /* From cf04df21a46f5cb0382d04ea4384c60893d26046 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 31 Mar 2022 09:54:01 -0400 Subject: [PATCH 0788/1565] nfsd: Fix a write performance regression [ Upstream commit 6b8a94332ee4f7d9a8ae0cbac7609f79c212f06c ] The call to filemap_flush() in nfsd_file_put() is there to ensure that we clear out any writes belonging to a NFSv3 client relatively quickly and avoid situations where the file can't be evicted by the garbage collector. It also ensures that we detect write errors quickly. The problem is this causes a regression in performance for some workloads. So try to improve matters by deferring writeback until we're ready to close the file, and need to detect errors so that we can force the client to resend. Tested-by: Jan Kara Fixes: b6669305d35a ("nfsd: Reduce the number of calls to nfsd_file_gc()") Signed-off-by: Trond Myklebust Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index cc2831cec669..496f7b3f7523 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -235,6 +235,13 @@ nfsd_file_check_write_error(struct nfsd_file *nf) return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); } +static void +nfsd_file_flush(struct nfsd_file *nf) +{ + if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); +} + static void nfsd_file_do_unhash(struct nfsd_file *nf) { @@ -302,11 +309,14 @@ nfsd_file_put(struct nfsd_file *nf) return; } - filemap_flush(nf->nf_file->f_mapping); is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; - nfsd_file_put_noref(nf); - if (is_hashed) + if (!is_hashed) { + nfsd_file_flush(nf); + nfsd_file_put_noref(nf); + } else { + nfsd_file_put_noref(nf); nfsd_file_schedule_laundrette(); + } if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc(); } @@ -327,6 +337,7 @@ nfsd_file_dispose_list(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del(&nf->nf_lru); + nfsd_file_flush(nf); nfsd_file_put_noref(nf); } } @@ -340,6 +351,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del(&nf->nf_lru); + nfsd_file_flush(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue; if (nfsd_file_free(nf)) From 9862064ca81f9b9da8ebe26c86cf2b10c293ba1a Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 31 Mar 2022 09:54:02 -0400 Subject: [PATCH 0789/1565] nfsd: Clean up nfsd_file_put() [ Upstream commit 999397926ab3f78c7d1235cc4ca6e3c89d2769bf ] Make it a little less racy, by removing the refcount_read() test. Then remove the redundant 'is_hashed' variable. Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 496f7b3f7523..8f7ed5dbb003 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -301,21 +301,14 @@ nfsd_file_put_noref(struct nfsd_file *nf) void nfsd_file_put(struct nfsd_file *nf) { - bool is_hashed; - set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); - if (refcount_read(&nf->nf_ref) > 2 || !nf->nf_file) { - nfsd_file_put_noref(nf); - return; - } - - is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; - if (!is_hashed) { + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); } else { nfsd_file_put_noref(nf); - nfsd_file_schedule_laundrette(); + if (nf->nf_file) + nfsd_file_schedule_laundrette(); } if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc(); From d71ea54835dfa69019b91a9465b299d5e6984956 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Sat, 7 May 2022 11:00:28 +0300 Subject: [PATCH 0790/1565] fanotify: do not allow setting dirent events in mask of non-dir [ Upstream commit ceaf69f8eadcafb323392be88e7a5248c415d423 ] Dirent events (create/delete/move) are only reported on watched directory inodes, but in fanotify as well as in legacy inotify, it was always allowed to set them on non-dir inode, which does not result in any meaningful outcome. Until kernel v5.17, dirent events in fanotify also differed from events "on child" (e.g. FAN_OPEN) in the information provided in the event. For example, FAN_OPEN could be set in the mask of a non-dir or the mask of its parent and event would report the fid of the child regardless of the marked object. By contrast, FAN_DELETE is not reported if the child is marked and the child fid was not reported in the events. Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the fid of the child is reported with dirent events, like events "on child", which may create confusion for users expecting the same behavior as events "on child" when setting events in the mask on a child. The desired semantics of setting dirent events in the mask of a child are not clear, so for now, deny this action for a group initialized with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME. We may relax this restriction in the future if we decide on the semantics and implement them. Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID") Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event") Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 64abec874d8e..921ee7b08580 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1663,6 +1663,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt; + /* + * FAN_RENAME is not allowed on non-dir (for now). + * We shouldn't have allowed setting any dirent events in mask of + * non-dir, but because we always allowed it, error only if group + * was initialized with the new flag FAN_REPORT_TARGET_FID. + */ + ret = -ENOTDIR; + if (inode && !S_ISDIR(inode->i_mode) && + ((mask & FAN_RENAME) || + ((mask & FANOTIFY_DIRENT_EVENTS) && + FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID)))) + goto path_put_and_out; + /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; From aecfd231bf53be3495fa6eb8c22cf39fc4cd3a86 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Sat, 12 Feb 2022 10:12:52 -0800 Subject: [PATCH 0791/1565] fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock. [ Upstream commit 9d6647762b9c6b555bc83d97d7c93be6057a990f ] Update lock usage of lock_manager_operations' functions to reflect the changes in commit 6109c85037e5 ("locks: add a dedicated spinlock to protect i_flctx lists"). Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/locking.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index fbd695d66905..07e57f762920 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -437,13 +437,13 @@ prototypes:: locking rules: ====================== ============= ================= ========= -ops inode->i_lock blocked_lock_lock may block +ops flc_lock blocked_lock_lock may block ====================== ============= ================= ========= -lm_notify: yes yes no +lm_notify: no yes no lm_grant: no no no lm_break: yes no no lm_change yes no no -lm_breaker_owns_lease: no no no +lm_breaker_owns_lease: yes no no ====================== ============= ================= ========= buffer_head From 1c47d87317e29eee56b74e82492ce22c171a7cf8 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:13 +0300 Subject: [PATCH 0792/1565] inotify: move control flags from mask to mark flags [ Upstream commit 38035c04f5865c4ef9597d6beed6a7178f90f64a ] The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags. This frees up some bits in the object interest mask. Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.c | 4 +-- fs/notify/inotify/inotify.h | 11 ++++++-- fs/notify/inotify/inotify_fsnotify.c | 2 +- fs/notify/inotify/inotify_user.c | 38 ++++++++++++++++++---------- include/linux/fsnotify_backend.h | 16 +++++++----- 5 files changed, 45 insertions(+), 26 deletions(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 70a8516b78bc..6eee19d15e8c 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -253,7 +253,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group, if (WARN_ON_ONCE(!inode && !dir)) return 0; - if ((inode_mark->mask & FS_EXCL_UNLINK) && + if ((inode_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0; @@ -581,7 +581,7 @@ static __init int fsnotify_init(void) { int ret; - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25); + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23); ret = init_srcu_struct(&fsnotify_mark_srcu); if (ret) diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index 8f00151eb731..7d5df7a21539 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -27,11 +27,18 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse) * userspace. There is at least one bit (FS_EVENT_ON_CHILD) which is * used only internally to the kernel. */ -#define INOTIFY_USER_MASK (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK) +#define INOTIFY_USER_MASK (IN_ALL_EVENTS) static inline __u32 inotify_mark_user_mask(struct fsnotify_mark *fsn_mark) { - return fsn_mark->mask & INOTIFY_USER_MASK; + __u32 mask = fsn_mark->mask & INOTIFY_USER_MASK; + + if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) + mask |= IN_EXCL_UNLINK; + if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) + mask |= IN_ONESHOT; + + return mask; } extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 827982783639..993375f0db67 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -129,7 +129,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, fsnotify_destroy_event(group, fsn_event); } - if (inode_mark->mask & IN_ONESHOT) + if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) fsnotify_destroy_mark(inode_mark, group); return 0; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 9676c3f2df3d..861147e57458 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -102,6 +102,21 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) return mask; } +#define INOTIFY_MARK_FLAGS \ + (FSNOTIFY_MARK_FLAG_EXCL_UNLINK | FSNOTIFY_MARK_FLAG_IN_ONESHOT) + +static inline unsigned int inotify_arg_to_flags(u32 arg) +{ + unsigned int flags = 0; + + if (arg & IN_EXCL_UNLINK) + flags |= FSNOTIFY_MARK_FLAG_EXCL_UNLINK; + if (arg & IN_ONESHOT) + flags |= FSNOTIFY_MARK_FLAG_IN_ONESHOT; + + return flags; +} + static inline u32 inotify_mask_to_arg(__u32 mask) { return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED | @@ -513,13 +528,10 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, struct fsnotify_mark *fsn_mark; struct inotify_inode_mark *i_mark; __u32 old_mask, new_mask; - __u32 mask; - int add = (arg & IN_MASK_ADD); + int replace = !(arg & IN_MASK_ADD); int create = (arg & IN_MASK_CREATE); int ret; - mask = inotify_arg_to_mask(inode, arg); - fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group); if (!fsn_mark) return -ENOENT; @@ -532,10 +544,12 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, spin_lock(&fsn_mark->lock); old_mask = fsn_mark->mask; - if (add) - fsn_mark->mask |= mask; - else - fsn_mark->mask = mask; + if (replace) { + fsn_mark->mask = 0; + fsn_mark->flags &= ~INOTIFY_MARK_FLAGS; + } + fsn_mark->mask |= inotify_arg_to_mask(inode, arg); + fsn_mark->flags |= inotify_arg_to_flags(arg); new_mask = fsn_mark->mask; spin_unlock(&fsn_mark->lock); @@ -566,19 +580,17 @@ static int inotify_new_watch(struct fsnotify_group *group, u32 arg) { struct inotify_inode_mark *tmp_i_mark; - __u32 mask; int ret; struct idr *idr = &group->inotify_data.idr; spinlock_t *idr_lock = &group->inotify_data.idr_lock; - mask = inotify_arg_to_mask(inode, arg); - tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL); if (unlikely(!tmp_i_mark)) return -ENOMEM; fsnotify_init_mark(&tmp_i_mark->fsn_mark, group); - tmp_i_mark->fsn_mark.mask = mask; + tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg); + tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg); tmp_i_mark->wd = -1; ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark); @@ -832,9 +844,7 @@ static int __init inotify_user_setup(void) BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT); BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW); BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED); - BUILD_BUG_ON(IN_EXCL_UNLINK != FS_EXCL_UNLINK); BUILD_BUG_ON(IN_ISDIR != FS_ISDIR); - BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT); BUILD_BUG_ON(HWEIGHT32(ALL_INOTIFY_BITS) != 22); diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 0805b74cae44..b1c72edd9784 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -55,7 +55,6 @@ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ -#define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ /* * Set on inode mark that cares about things that happen to its children. * Always set for dnotify and inotify. @@ -66,7 +65,6 @@ #define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ -#define FS_IN_ONESHOT 0x80000000 /* only send event once */ #define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO) @@ -106,8 +104,7 @@ FS_ERROR) /* Extra flags that may be reported with event or control handling of events */ -#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ - FS_DN_MULTISHOT | FS_EVENT_ON_CHILD) +#define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT) #define ALL_FSNOTIFY_BITS (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS) @@ -473,9 +470,14 @@ struct fsnotify_mark { struct fsnotify_mark_connector *connector; /* Events types to ignore [mark->lock, group->mark_mutex] */ __u32 ignored_mask; -#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x01 -#define FSNOTIFY_MARK_FLAG_ALIVE 0x02 -#define FSNOTIFY_MARK_FLAG_ATTACHED 0x04 + /* General fsnotify mark flags */ +#define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 +#define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 + /* inotify mark flags */ +#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK 0x0010 +#define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 + /* fanotify mark flags */ +#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 unsigned int flags; /* flags [mark->lock] */ }; From 4dc30393bd7b909c3d6083f8d734bf6b65cf82cc Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:15 +0300 Subject: [PATCH 0793/1565] fsnotify: pass flags argument to fsnotify_alloc_group() [ Upstream commit 867a448d587e7fa845bceaf4ee1c632448f2a9fa ] Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 3 ++- fs/notify/dnotify/dnotify.c | 2 +- fs/notify/fanotify/fanotify_user.c | 3 ++- fs/notify/group.c | 21 +++++++++------------ fs/notify/inotify/inotify_user.c | 3 ++- include/linux/fsnotify_backend.h | 8 ++++++-- kernel/audit_fsnotify.c | 3 ++- kernel/audit_tree.c | 2 +- kernel/audit_watch.c | 2 +- 9 files changed, 26 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 8f7ed5dbb003..7ae2b6611fb2 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -677,7 +677,8 @@ nfsd_file_cache_init(void) goto out_shrinker; } - nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops); + nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops, + 0); if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index d5ebebb034ff..6c586802c50e 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -383,7 +383,7 @@ static int __init dnotify_init(void) SLAB_PANIC|SLAB_ACCOUNT); dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); - dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops); + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); return 0; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 921ee7b08580..731bd7f64e01 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1343,7 +1343,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) f_flags |= O_NONBLOCK; /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ - group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops); + group = fsnotify_alloc_group(&fanotify_fsnotify_ops, + FSNOTIFY_GROUP_USER); if (IS_ERR(group)) { return PTR_ERR(group); } diff --git a/fs/notify/group.c b/fs/notify/group.c index b7d4d64f87c2..18446b7b0d49 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -112,7 +112,8 @@ void fsnotify_put_group(struct fsnotify_group *group) EXPORT_SYMBOL_GPL(fsnotify_put_group); static struct fsnotify_group *__fsnotify_alloc_group( - const struct fsnotify_ops *ops, gfp_t gfp) + const struct fsnotify_ops *ops, + int flags, gfp_t gfp) { struct fsnotify_group *group; @@ -133,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group( INIT_LIST_HEAD(&group->marks_list); group->ops = ops; + group->flags = flags; return group; } @@ -140,21 +142,16 @@ static struct fsnotify_group *__fsnotify_alloc_group( /* * Create a new fsnotify_group and hold a reference for the group returned. */ -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops, + int flags) { - return __fsnotify_alloc_group(ops, GFP_KERNEL); + gfp_t gfp = (flags & FSNOTIFY_GROUP_USER) ? GFP_KERNEL_ACCOUNT : + GFP_KERNEL; + + return __fsnotify_alloc_group(ops, flags, gfp); } EXPORT_SYMBOL_GPL(fsnotify_alloc_group); -/* - * Create a new fsnotify_group and hold a reference for the group returned. - */ -struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops) -{ - return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT); -} -EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group); - int fsnotify_fasync(int fd, struct file *file, int on) { struct fsnotify_group *group = file->private_data; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 861147e57458..9e1cf8392385 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -643,7 +643,8 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) struct fsnotify_group *group; struct inotify_event_info *oevent; - group = fsnotify_alloc_user_group(&inotify_fsnotify_ops); + group = fsnotify_alloc_group(&inotify_fsnotify_ops, + FSNOTIFY_GROUP_USER); if (IS_ERR(group)) return group; diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b1c72edd9784..f0bf557af009 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -210,6 +210,9 @@ struct fsnotify_group { unsigned int priority; bool shutdown; /* group is being shut down, don't queue more events */ +#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ + int flags; + /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ atomic_t user_waits; /* Number of tasks waiting for user @@ -543,8 +546,9 @@ static inline void fsnotify_update_flags(struct dentry *dentry) /* called from fsnotify listeners, such as fanotify or dnotify */ /* create a new group */ -extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); -extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops); +extern struct fsnotify_group *fsnotify_alloc_group( + const struct fsnotify_ops *ops, + int flags); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 76a5925b4e18..31f43b147540 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -182,7 +182,8 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = { static int __init audit_fsnotify_init(void) { - audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops); + audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, + 0); if (IS_ERR(audit_fsnotify_group)) { audit_fsnotify_group = NULL; audit_panic("cannot create audit fsnotify group"); diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 39241207ec04..0c35879bbf7c 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -1077,7 +1077,7 @@ static int __init audit_tree_init(void) audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC); - audit_tree_group = fsnotify_alloc_group(&audit_tree_ops); + audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0); if (IS_ERR(audit_tree_group)) audit_panic("cannot initialize fsnotify group for rectree watches"); diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index fd7b30a2d9a4..5cf22fe30149 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -492,7 +492,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = { static int __init audit_watch_init(void) { - audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops); + audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0); if (IS_ERR(audit_watch_group)) { audit_watch_group = NULL; audit_panic("cannot create audit fsnotify group"); From 74f9be7f64ed0aff7b859ca23a21ee8b3fae6616 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:16 +0300 Subject: [PATCH 0794/1565] fsnotify: make allow_dups a property of the group [ Upstream commit f3010343d9e119da35ee864b3a28993bb5c78ed7 ] Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark(). Rename the allow_dups argument to generic add_flags argument for future use. Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/mark.c | 12 ++++++------ include/linux/fsnotify_backend.h | 13 +++++++------ kernel/audit_fsnotify.c | 4 ++-- 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/notify/mark.c b/fs/notify/mark.c index c86982be2d50..1fb246ea6175 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -574,7 +574,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( static int fsnotify_add_mark_list(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid) + int add_flags, __kernel_fsid_t *fsid) { struct fsnotify_mark *lmark, *last = NULL; struct fsnotify_mark_connector *conn; @@ -633,7 +633,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, if ((lmark->group == mark->group) && (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) && - !allow_dups) { + !(mark->group->flags & FSNOTIFY_GROUP_DUPS)) { err = -EEXIST; goto out_err; } @@ -668,7 +668,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, */ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid) + int add_flags, __kernel_fsid_t *fsid) { struct fsnotify_group *group = mark->group; int ret = 0; @@ -688,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock); - ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid); + ret = fsnotify_add_mark_list(mark, connp, obj_type, add_flags, fsid); if (ret) goto err; @@ -708,14 +708,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, } int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int allow_dups, + unsigned int obj_type, int add_flags, __kernel_fsid_t *fsid) { int ret; struct fsnotify_group *group = mark->group; mutex_lock(&group->mark_mutex); - ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid); + ret = fsnotify_add_mark_locked(mark, connp, obj_type, add_flags, fsid); mutex_unlock(&group->mark_mutex); return ret; } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index f0bf557af009..dd440e6ff528 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -211,6 +211,7 @@ struct fsnotify_group { bool shutdown; /* group is being shut down, don't queue more events */ #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ +#define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ int flags; /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ @@ -641,26 +642,26 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid); + int add_flags, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int allow_dups, + unsigned int obj_type, int add_flags, __kernel_fsid_t *fsid); /* attach the mark to the inode */ static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL); + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); } static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); } diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 31f43b147540..691f90dd09d2 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -100,7 +100,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa audit_update_mark(audit_mark, dentry->d_inode); audit_mark->rule = krule; - ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); + ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0); if (ret < 0) { audit_mark->path = NULL; fsnotify_put_mark(&audit_mark->mark); @@ -183,7 +183,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = { static int __init audit_fsnotify_init(void) { audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, - 0); + FSNOTIFY_GROUP_DUPS); if (IS_ERR(audit_fsnotify_group)) { audit_fsnotify_group = NULL; audit_panic("cannot create audit fsnotify group"); From f91ba4a49b6ee69f64b130f4324e7f1826ac14a3 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:17 +0300 Subject: [PATCH 0795/1565] fsnotify: create helpers for group mark_mutex lock [ Upstream commit 43b245a788e2d8f1bb742668a9bdace02fcb3e96 ] Create helpers to take and release the group mark_mutex lock. Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim. In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes. For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support. Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fdinfo.c | 4 ++-- fs/notify/group.c | 11 +++++++++++ fs/notify/mark.c | 24 +++++++++++------------- include/linux/fsnotify_backend.h | 28 ++++++++++++++++++++++++++++ 4 files changed, 52 insertions(+), 15 deletions(-) diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 3451708fd035..1f34c5c29fdb 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -28,13 +28,13 @@ static void show_fdinfo(struct seq_file *m, struct file *f, struct fsnotify_group *group = f->private_data; struct fsnotify_mark *mark; - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); list_for_each_entry(mark, &group->marks_list, g_list) { show(m, mark); if (seq_has_overflowed(m)) break; } - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); } #if defined(CONFIG_EXPORTFS) diff --git a/fs/notify/group.c b/fs/notify/group.c index 18446b7b0d49..1de6631a3925 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -115,6 +115,7 @@ static struct fsnotify_group *__fsnotify_alloc_group( const struct fsnotify_ops *ops, int flags, gfp_t gfp) { + static struct lock_class_key nofs_marks_lock; struct fsnotify_group *group; group = kzalloc(sizeof(struct fsnotify_group), gfp); @@ -135,6 +136,16 @@ static struct fsnotify_group *__fsnotify_alloc_group( group->ops = ops; group->flags = flags; + /* + * For most backends, eviction of inode with a mark is not expected, + * because marks hold a refcount on the inode against eviction. + * + * Use a different lockdep class for groups that support evictable + * inode marks, because with evictable marks, mark_mutex is NOT + * fs-reclaim safe - the mutex is taken when evicting inodes. + */ + if (flags & FSNOTIFY_GROUP_NOFS) + lockdep_set_class(&group->mark_mutex, &nofs_marks_lock); return group; } diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 1fb246ea6175..982ca2f20ff5 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -398,9 +398,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) */ void fsnotify_detach_mark(struct fsnotify_mark *mark) { - struct fsnotify_group *group = mark->group; - - WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); + fsnotify_group_assert_locked(mark->group); WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) && refcount_read(&mark->refcnt) < 1 + !!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)); @@ -452,9 +450,9 @@ void fsnotify_free_mark(struct fsnotify_mark *mark) void fsnotify_destroy_mark(struct fsnotify_mark *mark, struct fsnotify_group *group) { - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsnotify_detach_mark(mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); fsnotify_free_mark(mark); } EXPORT_SYMBOL_GPL(fsnotify_destroy_mark); @@ -673,7 +671,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, struct fsnotify_group *group = mark->group; int ret = 0; - BUG_ON(!mutex_is_locked(&group->mark_mutex)); + fsnotify_group_assert_locked(group); /* * LOCKING ORDER!!!! @@ -714,9 +712,9 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, int ret; struct fsnotify_group *group = mark->group; - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); ret = fsnotify_add_mark_locked(mark, connp, obj_type, add_flags, fsid); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return ret; } EXPORT_SYMBOL_GPL(fsnotify_add_mark); @@ -770,24 +768,24 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, * move marks to free to to_free list in one go and then free marks in * to_free list one by one. */ - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { if (mark->connector->type == obj_type) list_move(&mark->g_list, &to_free); } - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); clear: while (1) { - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); if (list_empty(head)) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); break; } mark = list_first_entry(head, struct fsnotify_mark, g_list); fsnotify_get_mark(mark); fsnotify_detach_mark(mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); fsnotify_free_mark(mark); fsnotify_put_mark(mark); } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index dd440e6ff528..d62111e83244 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -20,6 +20,7 @@ #include #include #include +#include /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -212,7 +213,9 @@ struct fsnotify_group { #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ +#define FSNOTIFY_GROUP_NOFS 0x04 /* group lock is not direct reclaim safe */ int flags; + unsigned int owner_flags; /* stored flags of mark_mutex owner */ /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ @@ -254,6 +257,31 @@ struct fsnotify_group { }; }; +/* + * These helpers are used to prevent deadlock when reclaiming inodes with + * evictable marks of the same group that is allocating a new mark. + */ +static inline void fsnotify_group_lock(struct fsnotify_group *group) +{ + mutex_lock(&group->mark_mutex); + if (group->flags & FSNOTIFY_GROUP_NOFS) + group->owner_flags = memalloc_nofs_save(); +} + +static inline void fsnotify_group_unlock(struct fsnotify_group *group) +{ + if (group->flags & FSNOTIFY_GROUP_NOFS) + memalloc_nofs_restore(group->owner_flags); + mutex_unlock(&group->mark_mutex); +} + +static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) +{ + WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); + if (group->flags & FSNOTIFY_GROUP_NOFS) + WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); +} + /* When calling fsnotify tell it if the data is a path or inode */ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, From c2c6ced500ad26fb4a30277dbe26932d78f65ae7 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:18 +0300 Subject: [PATCH 0796/1565] inotify: use fsnotify group lock helpers [ Upstream commit 642054b87058019be36033f73c3e48ffff1915aa ] inotify inode marks pin the inode so there is no need to set the FSNOTIFY_GROUP_NOFS flag. Link: https://lore.kernel.org/r/20220422120327.3459282-8-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/inotify/inotify_user.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 9e1cf8392385..3d5d536f8fd6 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -627,13 +627,13 @@ static int inotify_update_watch(struct fsnotify_group *group, struct inode *inod { int ret = 0; - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); /* try to update and existing watch with the new arg */ ret = inotify_update_existing_watch(group, inode, arg); /* no mark present, try to add a new one */ if (ret == -ENOENT) ret = inotify_new_watch(group, inode, arg); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return ret; } From cc1c875b6960fc372fdb9c6f889a7a1f2072bde1 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:20 +0300 Subject: [PATCH 0797/1565] nfsd: use fsnotify group lock helpers [ Upstream commit b8962a9d8cc2d8c93362e2f684091c79f702f6f3 ] Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context and that could cause a deadlock when fsnotify mark allocation went into direct reclaim and nfsd shrinker tried to free existing fsnotify marks. To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS flag on nfsd fsnotify group to prevent going into direct reclaim from fsnotify_add_inode_mark(). Link: https://lore.kernel.org/r/20220422120327.3459282-10-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7ae2b6611fb2..7e99e75b75d7 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -118,14 +118,14 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) struct inode *inode = nf->nf_inode; do { - mutex_lock(&nfsd_file_fsnotify_group->mark_mutex); + fsnotify_group_lock(nfsd_file_fsnotify_group); mark = fsnotify_find_mark(&inode->i_fsnotify_marks, - nfsd_file_fsnotify_group); + nfsd_file_fsnotify_group); if (mark) { nfm = nfsd_file_mark_get(container_of(mark, struct nfsd_file_mark, nfm_mark)); - mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); + fsnotify_group_unlock(nfsd_file_fsnotify_group); if (nfm) { fsnotify_put_mark(mark); break; @@ -133,8 +133,9 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) /* Avoid soft lockup race with nfsd_file_mark_put() */ fsnotify_destroy_mark(mark, nfsd_file_fsnotify_group); fsnotify_put_mark(mark); - } else - mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); + } else { + fsnotify_group_unlock(nfsd_file_fsnotify_group); + } /* allocate a new nfm */ new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL); @@ -678,7 +679,7 @@ nfsd_file_cache_init(void) } nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops, - 0); + FSNOTIFY_GROUP_NOFS); if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); From 3bd557cfdf991a5f050aeeca528b7899846bacea Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:21 +0300 Subject: [PATCH 0798/1565] dnotify: use fsnotify group lock helpers [ Upstream commit aabb45fdcb31f00f1e7cae2bce83e83474a87c03 ] Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette") nfsd would close open files in direct reclaim context. There is no guarantee that others memory shrinkers don't do the same and no guarantee that future shrinkers won't do that. For example, if overlayfs implements inode cache of fscache would keep open files to cached objects, inode shrinkers could end up closing open files to underlying fs. Direct reclaim from dnotify mark allocation context may try to close open files that have dnotify marks of the same group and hit a deadlock on mark_mutex. Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under dnotify group lock and use the safe group lock helpers. Link: https://lore.kernel.org/r/20220422120327.3459282-11-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/dnotify/dnotify.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 6c586802c50e..fa81c59a2ad4 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -150,7 +150,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) return; dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark); - mutex_lock(&dnotify_group->mark_mutex); + fsnotify_group_lock(dnotify_group); spin_lock(&fsn_mark->lock); prev = &dn_mark->dn; @@ -173,7 +173,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) free = true; } - mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group); if (free) fsnotify_free_mark(fsn_mark); @@ -306,7 +306,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) new_dn_mark->dn = NULL; /* this is needed to prevent the fcntl/close race described below */ - mutex_lock(&dnotify_group->mark_mutex); + fsnotify_group_lock(dnotify_group); /* add the new_fsn_mark or find an old one. */ fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group); @@ -316,7 +316,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) } else { error = fsnotify_add_inode_mark_locked(new_fsn_mark, inode, 0); if (error) { - mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group); goto out_err; } spin_lock(&new_fsn_mark->lock); @@ -365,7 +365,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) if (destroy) fsnotify_detach_mark(fsn_mark); - mutex_unlock(&dnotify_group->mark_mutex); + fsnotify_group_unlock(dnotify_group); if (destroy) fsnotify_free_mark(fsn_mark); fsnotify_put_mark(fsn_mark); @@ -383,7 +383,8 @@ static int __init dnotify_init(void) SLAB_PANIC|SLAB_ACCOUNT); dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); - dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0); + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, + FSNOTIFY_GROUP_NOFS); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); return 0; From ff34ebaa6f6dc1eebce6a8d6f12a1566f33d00fe Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:22 +0300 Subject: [PATCH 0799/1565] fsnotify: allow adding an inode mark without pinning inode [ Upstream commit c3638b5b13740fa31762d414bbce8b7a694e582a ] fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode. Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF. Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference. Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce. This leaves the choice to the backend whether or not to pin the inode when adding an inode mark. This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted. Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/mark.c | 76 +++++++++++++++++++++++--------- include/linux/fsnotify_backend.h | 2 + 2 files changed, 58 insertions(+), 20 deletions(-) diff --git a/fs/notify/mark.c b/fs/notify/mark.c index 982ca2f20ff5..c74ef947447d 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -116,20 +116,64 @@ __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn) return *fsnotify_conn_mask_p(conn); } -static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) +static void fsnotify_get_inode_ref(struct inode *inode) +{ + ihold(inode); + atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); +} + +/* + * Grab or drop inode reference for the connector if needed. + * + * When it's time to drop the reference, we only clear the HAS_IREF flag and + * return the inode object. fsnotify_drop_object() will be resonsible for doing + * iput() outside of spinlocks. This happens when last mark that wanted iref is + * detached. + */ +static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn, + bool want_iref) +{ + bool has_iref = conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF; + struct inode *inode = NULL; + + if (conn->type != FSNOTIFY_OBJ_TYPE_INODE || + want_iref == has_iref) + return NULL; + + if (want_iref) { + /* Pin inode if any mark wants inode refcount held */ + fsnotify_get_inode_ref(fsnotify_conn_inode(conn)); + conn->flags |= FSNOTIFY_CONN_FLAG_HAS_IREF; + } else { + /* Unpin inode after detach of last mark that wanted iref */ + inode = fsnotify_conn_inode(conn); + conn->flags &= ~FSNOTIFY_CONN_FLAG_HAS_IREF; + } + + return inode; +} + +static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) { u32 new_mask = 0; + bool want_iref = false; struct fsnotify_mark *mark; assert_spin_locked(&conn->lock); /* We can get detached connector here when inode is getting unlinked. */ if (!fsnotify_valid_obj_type(conn->type)) - return; + return NULL; hlist_for_each_entry(mark, &conn->list, obj_list) { - if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) - new_mask |= fsnotify_calc_mask(mark); + if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) + continue; + new_mask |= fsnotify_calc_mask(mark); + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE && + !(mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + want_iref = true; } *fsnotify_conn_mask_p(conn) = new_mask; + + return fsnotify_update_iref(conn, want_iref); } /* @@ -169,12 +213,6 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) } } -static void fsnotify_get_inode_ref(struct inode *inode) -{ - ihold(inode); - atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); -} - static void fsnotify_put_inode_ref(struct inode *inode) { struct super_block *sb = inode->i_sb; @@ -213,6 +251,10 @@ static void *fsnotify_detach_connector_from_object( if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); inode->i_fsnotify_mask = 0; + + /* Unpin inode when detaching from connector */ + if (!(conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF)) + inode = NULL; } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) { @@ -274,7 +316,8 @@ void fsnotify_put_mark(struct fsnotify_mark *mark) objp = fsnotify_detach_connector_from_object(conn, &type); free_conn = true; } else { - __fsnotify_recalc_mask(conn); + objp = __fsnotify_recalc_mask(conn); + type = conn->type; } WRITE_ONCE(mark->connector, NULL); spin_unlock(&conn->lock); @@ -497,7 +540,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, unsigned int obj_type, __kernel_fsid_t *fsid) { - struct inode *inode = NULL; struct fsnotify_mark_connector *conn; conn = kmem_cache_alloc(fsnotify_mark_connector_cachep, GFP_KERNEL); @@ -505,6 +547,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, return -ENOMEM; spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); + conn->flags = 0; conn->type = obj_type; conn->obj = connp; /* Cache fsid of filesystem containing the object */ @@ -515,10 +558,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, conn->fsid.val[0] = conn->fsid.val[1] = 0; conn->flags = 0; } - if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { - inode = fsnotify_conn_inode(conn); - fsnotify_get_inode_ref(inode); - } fsnotify_get_sb_connectors(conn); /* @@ -527,8 +566,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, */ if (cmpxchg(connp, NULL, conn)) { /* Someone else created list structure for us */ - if (inode) - fsnotify_put_inode_ref(inode); fsnotify_put_sb_connectors(conn); kmem_cache_free(fsnotify_mark_connector_cachep, conn); } @@ -690,8 +727,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, if (ret) goto err; - if (mark->mask || mark->ignored_mask) - fsnotify_recalc_mask(mark->connector); + fsnotify_recalc_mask(mark->connector); return ret; err: diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d62111e83244..9a1a9e78f69f 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -456,6 +456,7 @@ struct fsnotify_mark_connector { spinlock_t lock; unsigned short type; /* Type of object [lock] */ #define FSNOTIFY_CONN_FLAG_HAS_FSID 0x01 +#define FSNOTIFY_CONN_FLAG_HAS_IREF 0x02 unsigned short flags; /* flags [lock] */ __kernel_fsid_t fsid; /* fsid of filesystem containing object */ union { @@ -510,6 +511,7 @@ struct fsnotify_mark { #define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 +#define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 unsigned int flags; /* flags [mark->lock] */ }; From b9576077eee32f9e7ce6112b21aad63393f5cbdd Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:23 +0300 Subject: [PATCH 0800/1565] fanotify: create helper fanotify_mark_user_flags() [ Upstream commit 4adce25ccfff215939ee465b8c0aa70526d5c352 ] To translate from fsnotify mark flags to user visible flags. Link: https://lore.kernel.org/r/20220422120327.3459282-13-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 10 ++++++++++ fs/notify/fdinfo.c | 6 ++---- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index a3d5b751cac5..87142bc0131a 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -490,3 +490,13 @@ static inline unsigned int fanotify_event_hash_bucket( { return event->hash & FANOTIFY_HTABLE_MASK; } + +static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) +{ + unsigned int mflags = 0; + + if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + mflags |= FAN_MARK_IGNORED_SURV_MODIFY; + + return mflags; +} diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 1f34c5c29fdb..59fb40abe33d 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -14,6 +14,7 @@ #include #include "inotify/inotify.h" +#include "fanotify/fanotify.h" #include "fdinfo.h" #include "fsnotify.h" @@ -103,12 +104,9 @@ void inotify_show_fdinfo(struct seq_file *m, struct file *f) static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) { - unsigned int mflags = 0; + unsigned int mflags = fanotify_mark_user_flags(mark); struct inode *inode; - if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) - mflags |= FAN_MARK_IGNORED_SURV_MODIFY; - if (mark->connector->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = igrab(fsnotify_conn_inode(mark->connector)); if (!inode) From 80fb0ae4b14560ddadd67cef61250bd43c9320a7 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:24 +0300 Subject: [PATCH 0801/1565] fanotify: factor out helper fanotify_mark_update_flags() [ Upstream commit 8998d110835e3781ccd3f1ae061a590b4aaba911 ] Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that is called after updating the mark mask. Replace the added and removed return values and help variables with bool recalc return values and help variable, which makes the code a bit easier to follow. Rename flags argument to fan_flags to emphasize the difference from mark->flags. Link: https://lore.kernel.org/r/20220422120327.3459282-14-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 47 ++++++++++++++++-------------- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 731bd7f64e01..f0206e3d11a7 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1069,42 +1069,45 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, flags, umask); } -static void fanotify_mark_add_ignored_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags, - __u32 *removed) +static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, + unsigned int fan_flags) { - fsn_mark->ignored_mask |= mask; + bool recalc = false; /* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set * because of an ignored mask that is now going to survive FS_MODIFY. */ - if ((flags & FAN_MARK_IGNORED_SURV_MODIFY) && + if ((fan_flags & FAN_MARK_IGNORED_MASK) && + (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; if (!(fsn_mark->mask & FS_MODIFY)) - *removed = FS_MODIFY; + recalc = true; } + + return recalc; } -static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int flags, - __u32 *removed) +static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, + __u32 mask, unsigned int fan_flags) { - __u32 oldmask, newmask; + bool recalc; spin_lock(&fsn_mark->lock); - oldmask = fsnotify_calc_mask(fsn_mark); - if (!(flags & FAN_MARK_IGNORED_MASK)) { + if (!(fan_flags & FAN_MARK_IGNORED_MASK)) fsn_mark->mask |= mask; - } else { - fanotify_mark_add_ignored_mask(fsn_mark, mask, flags, removed); - } - newmask = fsnotify_calc_mask(fsn_mark); + else + fsn_mark->ignored_mask |= mask; + + recalc = fsnotify_calc_mask(fsn_mark) & + ~fsnotify_conn_mask(fsn_mark->connector); + + recalc |= fanotify_mark_update_flags(fsn_mark, fan_flags); spin_unlock(&fsn_mark->lock); - return newmask & ~oldmask; + return recalc; } static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, @@ -1158,11 +1161,11 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group) static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, - __u32 mask, unsigned int flags, + __u32 mask, unsigned int fan_flags, __kernel_fsid_t *fsid) { struct fsnotify_mark *fsn_mark; - __u32 added, removed = 0; + bool recalc; int ret = 0; mutex_lock(&group->mark_mutex); @@ -1179,14 +1182,14 @@ static int fanotify_add_mark(struct fsnotify_group *group, * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). */ - if (!(flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + if (!(fan_flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { ret = fanotify_group_init_error_pool(group); if (ret) goto out; } - added = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &removed); - if (removed || (added & ~fsnotify_conn_mask(fsn_mark->connector))) + recalc = fanotify_mark_add_to_mask(fsn_mark, mask, fan_flags); + if (recalc) fsnotify_recalc_mask(fsn_mark->connector); out: From f85d590059532ac2efe344ba0e3d30442527b5b3 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:25 +0300 Subject: [PATCH 0802/1565] fanotify: implement "evictable" inode marks [ Upstream commit 7d5e005d982527e4029b0139823d179986e34cdc ] When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not pin the marked inode to inode cache, so when inode is evicted from cache due to memory pressure, the mark will be lost. When an inode mark with flag FAN_MARK_EVICATBLE is updated without using this flag, the marked inode is pinned to inode cache. When an inode mark is updated with flag FAN_MARK_EVICTABLE but an existing mark already has the inode pinned, the mark update fails with error EEXIST. Evictable inode marks can be used to setup inode marks with ignored mask to suppress events from uninteresting files or directories in a lazy manner, upon receiving the first event, without having to iterate all the uninteresting files or directories before hand. The evictbale inode mark feature allows performing this lazy marks setup without exhausting the system memory with pinned inodes. This change does not enable the feature yet. Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/ Link: https://lore.kernel.org/r/20220422120327.3459282-15-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 2 ++ fs/notify/fanotify/fanotify_user.c | 38 ++++++++++++++++++++++++++++-- include/uapi/linux/fanotify.h | 1 + 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 87142bc0131a..80e0ec95b113 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -497,6 +497,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) mflags |= FAN_MARK_IGNORED_SURV_MODIFY; + if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF) + mflags |= FAN_MARK_EVICTABLE; return mflags; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index f0206e3d11a7..ab7a13686b49 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1072,6 +1072,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, unsigned int fan_flags) { + bool want_iref = !(fan_flags & FAN_MARK_EVICTABLE); bool recalc = false; /* @@ -1087,7 +1088,18 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, recalc = true; } - return recalc; + if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE || + want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + return recalc; + + /* + * NO_IREF may be removed from a mark, but not added. + * When removed, fsnotify_recalc_mask() will take the inode ref. + */ + WARN_ON_ONCE(!want_iref); + fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF; + + return true; } static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, @@ -1113,6 +1125,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, + unsigned int fan_flags, __kernel_fsid_t *fsid) { struct ucounts *ucounts = group->fanotify_data.ucounts; @@ -1135,6 +1148,9 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, } fsnotify_init_mark(mark, group); + if (fan_flags & FAN_MARK_EVICTABLE) + mark->flags |= FSNOTIFY_MARK_FLAG_NO_IREF; + ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid); if (ret) { fsnotify_put_mark(mark); @@ -1171,13 +1187,23 @@ static int fanotify_add_mark(struct fsnotify_group *group, mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid); + fsn_mark = fanotify_add_new_mark(group, connp, obj_type, + fan_flags, fsid); if (IS_ERR(fsn_mark)) { mutex_unlock(&group->mark_mutex); return PTR_ERR(fsn_mark); } } + /* + * Non evictable mark cannot be downgraded to evictable mark. + */ + if (fan_flags & FAN_MARK_EVICTABLE && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) { + ret = -EEXIST; + goto out; + } + /* * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). @@ -1607,6 +1633,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, mark_type != FAN_MARK_FILESYSTEM) goto fput_and_out; + /* + * Evictable is only relevant for inode marks, because only inode object + * can be evicted on memory pressure. + */ + if (flags & FAN_MARK_EVICTABLE && + mark_type != FAN_MARK_INODE) + goto fput_and_out; + /* * Events that do not carry enough information to report * event->fd require a group that supports reporting fid. Those diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index e8ac38cc2fd6..f1f89132d60e 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -82,6 +82,7 @@ #define FAN_MARK_IGNORED_SURV_MODIFY 0x00000040 #define FAN_MARK_FLUSH 0x00000080 /* FAN_MARK_FILESYSTEM is 0x00000100 */ +#define FAN_MARK_EVICTABLE 0x00000200 /* These are NOT bitwise flags. Both bits can be used togther. */ #define FAN_MARK_INODE 0x00000000 From 3083d602ba91d374a0baf87e56e8733c7c507e70 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:26 +0300 Subject: [PATCH 0803/1565] fanotify: use fsnotify group lock helpers [ Upstream commit e79719a2ca5c61912c0493bc1367db52759cf6fd ] Direct reclaim from fanotify mark allocation context may try to evict inodes with evictable marks of the same group and hit this deadlock: [<0>] fsnotify_destroy_mark+0x1f/0x3a [<0>] fsnotify_destroy_marks+0x71/0xd9 [<0>] __destroy_inode+0x24/0x7e [<0>] destroy_inode+0x2c/0x67 [<0>] dispose_list+0x49/0x68 [<0>] prune_icache_sb+0x5b/0x79 [<0>] super_cache_scan+0x11c/0x16f [<0>] shrink_slab.constprop.0+0x23e/0x40f [<0>] shrink_node+0x218/0x3e7 [<0>] do_try_to_free_pages+0x12a/0x2d2 [<0>] try_to_free_pages+0x166/0x242 [<0>] __alloc_pages_slowpath.constprop.0+0x30c/0x903 [<0>] __alloc_pages+0xeb/0x1c7 [<0>] cache_grow_begin+0x6f/0x31e [<0>] fallback_alloc+0xe0/0x12d [<0>] ____cache_alloc_node+0x15a/0x17e [<0>] kmem_cache_alloc_trace+0xa1/0x143 [<0>] fanotify_add_mark+0xd5/0x2b2 [<0>] do_fanotify_mark+0x566/0x5eb [<0>] __x64_sys_fanotify_mark+0x21/0x24 [<0>] do_syscall_64+0x6d/0x80 [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim from allocations under fanotify group lock and use the safe group lock helpers. Link: https://lore.kernel.org/r/20220422120327.3459282-16-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index ab7a13686b49..ad520a279618 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1023,10 +1023,10 @@ static int fanotify_remove_mark(struct fsnotify_group *group, __u32 removed; int destroy_mark; - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return -ENOENT; } @@ -1036,7 +1036,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group, fsnotify_recalc_mask(fsn_mark->connector); if (destroy_mark) fsnotify_detach_mark(fsn_mark); - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); if (destroy_mark) fsnotify_free_mark(fsn_mark); @@ -1184,13 +1184,13 @@ static int fanotify_add_mark(struct fsnotify_group *group, bool recalc; int ret = 0; - mutex_lock(&group->mark_mutex); + fsnotify_group_lock(group); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fan_flags, fsid); if (IS_ERR(fsn_mark)) { - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); return PTR_ERR(fsn_mark); } } @@ -1219,7 +1219,7 @@ static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_recalc_mask(fsn_mark->connector); out: - mutex_unlock(&group->mark_mutex); + fsnotify_group_unlock(group); fsnotify_put_mark(fsn_mark); return ret; @@ -1373,7 +1373,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ group = fsnotify_alloc_group(&fanotify_fsnotify_ops, - FSNOTIFY_GROUP_USER); + FSNOTIFY_GROUP_USER | FSNOTIFY_GROUP_NOFS); if (IS_ERR(group)) { return PTR_ERR(group); } From f6017a718b6398c6ddba8be33ebcb3ab96c83fb2 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:27 +0300 Subject: [PATCH 0804/1565] fanotify: enable "evictable" inode marks [ Upstream commit 5f9d3bd520261fd7a850818c71809fd580e0f30c ] Now that the direct reclaim path is handled we can enable evictable inode marks. Link: https://lore.kernel.org/r/20220422120327.3459282-17-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 2 +- include/linux/fanotify.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index ad520a279618..3a1325c90ff8 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1806,7 +1806,7 @@ static int __init fanotify_user_setup(void) BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 3afdf339d53c..81f45061c1b1 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -68,6 +68,7 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ + FAN_MARK_EVICTABLE | \ FAN_MARK_FLUSH) /* From e3bce57ffc7b020f983a5fcd22cd06ddd3520916 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 11 May 2022 22:02:12 +0300 Subject: [PATCH 0805/1565] fsnotify: introduce mark type iterator [ Upstream commit 14362a2541797cf9df0e86fb12dcd7950baf566e ] fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask. Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask. At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for event on child and inode marks on parent did not request to watch events on children. Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 14 +++------ fs/notify/fsnotify.c | 53 ++++++++++++++++---------------- include/linux/fsnotify_backend.h | 31 ++++++++++++++----- 3 files changed, 54 insertions(+), 44 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 985e995d2a39..263d303d8f8f 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -319,11 +319,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; } - fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { /* Apply ignore mask regardless of ISDIR and ON_CHILD flags */ marks_ignored_mask |= mark->ignored_mask; @@ -849,16 +845,14 @@ static struct fanotify_event *fanotify_alloc_event( */ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *mark; int type; __kernel_fsid_t fsid = {}; - fsnotify_foreach_iter_type(type) { + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { struct fsnotify_mark_connector *conn; - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - - conn = READ_ONCE(iter_info->marks[type]->connector); + conn = READ_ONCE(mark->connector); /* Mark is just getting destroyed or created? */ if (!conn) continue; diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 6eee19d15e8c..35740a64ee45 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -335,31 +335,23 @@ static int send_to_group(__u32 mask, const void *data, int data_type, struct fsnotify_mark *mark; int type; - if (WARN_ON(!iter_info->report_mask)) + if (!iter_info->report_mask) return 0; /* clear ignored on inode modification */ if (mask & FS_MODIFY) { - fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - if (mark && - !(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { + if (!(mark->flags & + FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mark->ignored_mask = 0; } } - fsnotify_foreach_iter_type(type) { - if (!fsnotify_iter_should_report_type(iter_info, type)) - continue; - mark = iter_info->marks[type]; - /* does the object mark tell us to do something? */ - if (mark) { - group = mark->group; - marks_mask |= mark->mask; - marks_ignored_mask |= mark->ignored_mask; - } + /* Are any of the group marks interested in this event? */ + fsnotify_foreach_iter_mark_type(iter_info, mark, type) { + group = mark->group; + marks_mask |= mark->mask; + marks_ignored_mask |= mark->ignored_mask; } pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", @@ -403,11 +395,11 @@ static struct fsnotify_mark *fsnotify_next_mark(struct fsnotify_mark *mark) /* * iter_info is a multi head priority queue of marks. - * Pick a subset of marks from queue heads, all with the - * same group and set the report_mask for selected subset. - * Returns the report_mask of the selected subset. + * Pick a subset of marks from queue heads, all with the same group + * and set the report_mask to a subset of the selected marks. + * Returns false if there are no more groups to iterate. */ -static unsigned int fsnotify_iter_select_report_types( +static bool fsnotify_iter_select_report_types( struct fsnotify_iter_info *iter_info) { struct fsnotify_group *max_prio_group = NULL; @@ -423,30 +415,37 @@ static unsigned int fsnotify_iter_select_report_types( } if (!max_prio_group) - return 0; + return false; /* Set the report mask for marks from same group as max prio group */ + iter_info->current_group = max_prio_group; iter_info->report_mask = 0; fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; - if (mark && - fsnotify_compare_groups(max_prio_group, mark->group) == 0) + if (mark && mark->group == iter_info->current_group) fsnotify_iter_set_report_type(iter_info, type); } - return iter_info->report_mask; + return true; } /* - * Pop from iter_info multi head queue, the marks that were iterated in the + * Pop from iter_info multi head queue, the marks that belong to the group of * current iteration step. */ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) { + struct fsnotify_mark *mark; int type; + /* + * We cannot use fsnotify_foreach_iter_mark_type() here because we + * may need to advance a mark of type X that belongs to current_group + * but was not selected for reporting. + */ fsnotify_foreach_iter_type(type) { - if (fsnotify_iter_should_report_type(iter_info, type)) + mark = iter_info->marks[type]; + if (mark && mark->group == iter_info->current_group) iter_info->marks[type] = fsnotify_next_mark(iter_info->marks[type]); } diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9a1a9e78f69f..9560734759fa 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -399,6 +399,7 @@ static inline bool fsnotify_valid_obj_type(unsigned int obj_type) struct fsnotify_iter_info { struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; + struct fsnotify_group *current_group; unsigned int report_mask; int srcu_idx; }; @@ -415,20 +416,31 @@ static inline void fsnotify_iter_set_report_type( iter_info->report_mask |= (1U << iter_type); } -static inline void fsnotify_iter_set_report_type_mark( - struct fsnotify_iter_info *iter_info, int iter_type, - struct fsnotify_mark *mark) +static inline struct fsnotify_mark *fsnotify_iter_mark( + struct fsnotify_iter_info *iter_info, int iter_type) { - iter_info->marks[iter_type] = mark; - iter_info->report_mask |= (1U << iter_type); + if (fsnotify_iter_should_report_type(iter_info, iter_type)) + return iter_info->marks[iter_type]; + return NULL; +} + +static inline int fsnotify_iter_step(struct fsnotify_iter_info *iter, int type, + struct fsnotify_mark **markp) +{ + while (type < FSNOTIFY_ITER_TYPE_COUNT) { + *markp = fsnotify_iter_mark(iter, type); + if (*markp) + break; + type++; + } + return type; } #define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & (1U << FSNOTIFY_ITER_TYPE_##NAME)) ? \ - iter_info->marks[FSNOTIFY_ITER_TYPE_##NAME] : NULL; \ + return fsnotify_iter_mark(iter_info, FSNOTIFY_ITER_TYPE_##NAME); \ } FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -438,6 +450,11 @@ FSNOTIFY_ITER_FUNCS(sb, SB) #define fsnotify_foreach_iter_type(type) \ for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) +#define fsnotify_foreach_iter_mark_type(iter, mark, type) \ + for (type = 0; \ + type = fsnotify_iter_step(iter, type, &mark), \ + type < FSNOTIFY_ITER_TYPE_COUNT; \ + type++) /* * fsnotify_connp_t is what we embed in objects which connector can be attached From 4cd725129e65e700d7abc5b5dff34a841f69a65d Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 11 May 2022 22:02:13 +0300 Subject: [PATCH 0806/1565] fsnotify: consistent behavior for parent not watching children [ Upstream commit e730558adffb88a52e562db089e969ee9510184a ] The logic for handling events on child in groups that have a mark on the parent inode, but without FS_EVENT_ON_CHILD flag in the mask is duplicated in several places and inconsistent. Move the logic into the preparation of mark type iterator, so that the parent mark type will be excluded from all mark type iterations in that case. This results in several subtle changes of behavior, hopefully all desired changes of behavior, for example: - Group A has a mount mark with FS_MODIFY in mask - Group A has a mark with ignore mask that does not survive FS_MODIFY and does not watch children on directory D. - Group B has a mark with FS_MODIFY in mask that does watch children on directory D. - FS_MODIFY event on file D/foo should not clear the ignore mask of group A, but before this change it does And if group A ignore mask was set to survive FS_MODIFY: - FS_MODIFY event on file D/foo should be reported to group A on account of the mount mark, but before this change it is wrongly ignored Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara Link: https://lore.kernel.org/linux-fsdevel/20220314113337.j7slrb5srxukztje@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220511190213.831646-3-amir73il@gmail.com Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 10 +--------- fs/notify/fsnotify.c | 34 +++++++++++++++++++--------------- 2 files changed, 20 insertions(+), 24 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 263d303d8f8f..4f897e109547 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -320,7 +320,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, } fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - /* Apply ignore mask regardless of ISDIR and ON_CHILD flags */ + /* Apply ignore mask regardless of mark's ISDIR flag */ marks_ignored_mask |= mark->ignored_mask; /* @@ -330,14 +330,6 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR)) continue; - /* - * If the event is on a child and this mark is on a parent not - * watching children, don't send it! - */ - if (type == FSNOTIFY_ITER_TYPE_PARENT && - !(mark->mask & FS_EVENT_ON_CHILD)) - continue; - marks_mask |= mark->mask; /* Record the mark types of this group that matched the event */ diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 35740a64ee45..0b3e74935cb4 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -290,22 +290,15 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, } if (parent_mark) { - /* - * parent_mark indicates that the parent inode is watching - * children and interested in this event, which is an event - * possible on child. But is *this mark* watching children and - * interested in this event? - */ - if (parent_mark->mask & FS_EVENT_ON_CHILD) { - ret = fsnotify_handle_inode_event(group, parent_mark, mask, - data, data_type, dir, name, 0); - if (ret) - return ret; - } - if (!inode_mark) - return 0; + ret = fsnotify_handle_inode_event(group, parent_mark, mask, + data, data_type, dir, name, 0); + if (ret) + return ret; } + if (!inode_mark) + return 0; + if (mask & FS_EVENT_ON_CHILD) { /* * Some events can be sent on both parent dir and child marks @@ -422,8 +415,19 @@ static bool fsnotify_iter_select_report_types( iter_info->report_mask = 0; fsnotify_foreach_iter_type(type) { mark = iter_info->marks[type]; - if (mark && mark->group == iter_info->current_group) + if (mark && mark->group == iter_info->current_group) { + /* + * FSNOTIFY_ITER_TYPE_PARENT indicates that this inode + * is watching children and interested in this event, + * which is an event possible on child. + * But is *this mark* watching children? + */ + if (type == FSNOTIFY_ITER_TYPE_PARENT && + !(mark->mask & FS_EVENT_ON_CHILD)) + continue; + fsnotify_iter_set_report_type(iter_info, type); + } } return true; From 2723d479f51f2afab7fa0db2aef1948635b95ef7 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Sun, 22 May 2022 15:08:02 +0300 Subject: [PATCH 0807/1565] fanotify: fix incorrect fmode_t casts [ Upstream commit dccd855771b37820b6d976a99729c88259549f85 ] Fixes sparce warnings: fs/notify/fanotify/fanotify_user.c:267:63: sparse: warning: restricted fmode_t degrades to integer fs/notify/fanotify/fanotify_user.c:1351:28: sparse: warning: restricted fmode_t degrades to integer FMODE_NONTIFY have bitwise fmode_t type and requires __force attribute for any casts. Signed-off-by: Vasily Averin Reviewed-by: Christian Brauner (Microsoft) Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/9adfd6ac-1b89-791e-796b-49ada3293985@openvz.org Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 3a1325c90ff8..f4a5e9074dd4 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -252,7 +252,7 @@ static int create_fd(struct fsnotify_group *group, struct path *path, * originally opened O_WRONLY. */ new_file = dentry_open(path, - group->fanotify_data.f_flags | FMODE_NONOTIFY, + group->fanotify_data.f_flags | __FMODE_NONOTIFY, current_cred()); if (IS_ERR(new_file)) { /* @@ -1365,7 +1365,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID))) return -EINVAL; - f_flags = O_RDWR | FMODE_NONOTIFY; + f_flags = O_RDWR | __FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; if (flags & FAN_NONBLOCK) From bf1cbe2f3650b4f4a8add6af933c6d7f6af1f361 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 7 Apr 2022 16:48:24 -0400 Subject: [PATCH 0808/1565] NFSD: Clean up nfsd_splice_actor() [ Upstream commit 91e23b1c39820bfed642119ff6b6ef9f43cf09ce ] nfsd_splice_actor() checks that the page being spliced does not match the previous element in the svc_rqst::rq_pages array. We believe this is to prevent a double put_page() in cases where the READ payload is partially contained in the xdr_buf's head buffer. However, the NFSD READ proc functions no longer place any part of the READ payload in the head buffer, in order to properly support NFS/RDMA READ with Write chunks. Therefore, simplify the logic in nfsd_splice_actor() to remove this unnecessary check. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 86584e727ce0..0968eaf735c8 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -874,17 +874,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { struct svc_rqst *rqstp = sd->u.data; - struct page **pp = rqstp->rq_next_page; - struct page *page = buf->page; - if (rqstp->rq_res.page_len == 0) { - svc_rqst_replace_page(rqstp, page); + svc_rqst_replace_page(rqstp, buf->page); + if (rqstp->rq_res.page_len == 0) rqstp->rq_res.page_base = buf->offset; - } else if (page != pp[-1]) { - svc_rqst_replace_page(rqstp, page); - } rqstp->rq_res.page_len += sd->len; - return sd->len; } From 67ef9e5fd737eab2495f2586df7e9ea30caa1b77 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:21 -0700 Subject: [PATCH 0809/1565] NFSD: add courteous server support for thread with only delegation [ Upstream commit 66af25799940b26efd41ea6e648f75c41a48a2c2 ] This patch provides courteous server support for delegation only. Only expired client with delegation but no conflict and no open or lock state is allowed to be in COURTESY state. Delegation conflict with COURTESY/EXPIRABLE client is resolved by setting it to EXPIRABLE, queue work for the laundromat and return delay to the caller. Conflict is resolved when the laudromat runs and expires the EXIRABLE client while the NFS client retries the OPEN request. Local thread request that gets conflict is doing the retry in _break_lease. Client in COURTESY or EXPIRABLE state is allowed to reconnect and continues to have access to its state. Access to the nfs4_client by the reconnecting thread and the laundromat is serialized via the client_lock. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 82 ++++++++++++++++++++++++++++++++++++--------- fs/nfsd/nfsd.h | 1 + fs/nfsd/state.h | 31 +++++++++++++++++ 3 files changed, 99 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 0170aaf318ea..74c0a8890404 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -125,6 +125,8 @@ static void free_session(struct nfsd4_session *); static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; +static struct workqueue_struct *laundry_wq; + static bool is_session_dead(struct nfsd4_session *ses) { return ses->se_flags & NFS4_SESSION_DEAD; @@ -152,6 +154,7 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); + clp->cl_state = NFSD4_ACTIVE; return nfs_ok; } @@ -172,6 +175,7 @@ renew_client_locked(struct nfs4_client *clp) list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); + clp->cl_state = NFSD4_ACTIVE; } static void put_client_renew_locked(struct nfs4_client *clp) @@ -1102,6 +1106,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, get_clnt_odstate(odstate); dp->dl_type = NFS4_OPEN_DELEGATE_READ; dp->dl_retries = 1; + dp->dl_recalled = false; nfsd4_init_cb(&dp->dl_recall, dp->dl_stid.sc_client, &nfsd4_cb_recall_ops, NFSPROC4_CLNT_CB_RECALL); get_nfs4_file(fp); @@ -2017,6 +2022,8 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name) idr_init(&clp->cl_stateids); atomic_set(&clp->cl_rpc_users, 0); clp->cl_cb_state = NFSD4_CB_UNKNOWN; + clp->cl_state = NFSD4_ACTIVE; + atomic_set(&clp->cl_delegs_in_recall, 0); INIT_LIST_HEAD(&clp->cl_idhash); INIT_LIST_HEAD(&clp->cl_openowners); INIT_LIST_HEAD(&clp->cl_delegations); @@ -4711,9 +4718,18 @@ nfsd_break_deleg_cb(struct file_lock *fl) bool ret = false; struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; + struct nfs4_client *clp = dp->dl_stid.sc_client; + struct nfsd_net *nn; trace_nfsd_cb_recall(&dp->dl_stid); + dp->dl_recalled = true; + atomic_inc(&clp->cl_delegs_in_recall); + if (try_to_expire_client(clp)) { + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + } + /* * We don't want the locks code to timeout the lease for us; * we'll remove it ourself if a delegation isn't returned @@ -4756,9 +4772,14 @@ static int nfsd_change_deleg_cb(struct file_lock *onlist, int arg, struct list_head *dispose) { - if (arg & F_UNLCK) + struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner; + struct nfs4_client *clp = dp->dl_stid.sc_client; + + if (arg & F_UNLCK) { + if (dp->dl_recalled) + atomic_dec(&clp->cl_delegs_in_recall); return lease_modify(onlist, arg, dispose); - else + } else return -EAGAIN; } @@ -5622,6 +5643,49 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif +/* + * place holder for now, no check for lock blockers yet + */ +static bool +nfs4_anylock_blockers(struct nfs4_client *clp) +{ + if (atomic_read(&clp->cl_delegs_in_recall) || + client_has_openowners(clp) || + !list_empty(&clp->async_copies)) + return true; + return false; +} + +static void +nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, + struct laundry_time *lt) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + + INIT_LIST_HEAD(reaplist); + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state == NFSD4_EXPIRABLE) + goto exp_client; + if (!state_expired(lt, clp->cl_time)) + break; + if (!atomic_read(&clp->cl_rpc_users)) + clp->cl_state = NFSD4_COURTESY; + if (!client_has_state(clp) || + ktime_get_boottime_seconds() >= + (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) + goto exp_client; + if (nfs4_anylock_blockers(clp)) { +exp_client: + if (!mark_client_expired_locked(clp)) + list_add(&clp->cl_lru, reaplist); + } + } + spin_unlock(&nn->client_lock); +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { @@ -5644,7 +5708,6 @@ nfs4_laundromat(struct nfsd_net *nn) goto out; } nfsd4_end_grace(nn); - INIT_LIST_HEAD(&reaplist); spin_lock(&nn->s2s_cp_lock); idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { @@ -5654,17 +5717,7 @@ nfs4_laundromat(struct nfsd_net *nn) _free_cpntf_state_locked(nn, cps); } spin_unlock(&nn->s2s_cp_lock); - - spin_lock(&nn->client_lock); - list_for_each_safe(pos, next, &nn->client_lru) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - if (!state_expired(<, clp->cl_time)) - break; - if (mark_client_expired_locked(clp)) - continue; - list_add(&clp->cl_lru, &reaplist); - } - spin_unlock(&nn->client_lock); + nfs4_get_client_reaplist(nn, &reaplist, <); list_for_each_safe(pos, next, &reaplist) { clp = list_entry(pos, struct nfs4_client, cl_lru); trace_nfsd_clid_purged(&clp->cl_clientid); @@ -5739,7 +5792,6 @@ nfs4_laundromat(struct nfsd_net *nn) return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); } -static struct workqueue_struct *laundry_wq; static void laundromat_main(struct work_struct *); static void diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 4fc1fd639527..23996c6ca75e 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -336,6 +336,7 @@ void nfsd_lockd_shutdown(void); #define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */ #define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ +#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ /* * The following attributes are currently not supported by the NFSv4 server: diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 95457cfd37fc..f3d6313914ed 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -149,6 +149,7 @@ struct nfs4_delegation { /* For recall: */ int dl_retries; struct nfsd4_callback dl_recall; + bool dl_recalled; }; #define cb_to_delegation(cb) \ @@ -282,6 +283,28 @@ struct nfsd4_sessionid { #define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */ +/* + * State Meaning Where set + * -------------------------------------------------------------------------- + * | NFSD4_ACTIVE | Confirmed, active | Default | + * |------------------- ----------------------------------------------------| + * | NFSD4_COURTESY | Courtesy state. | nfs4_get_client_reaplist | + * | | Lease/lock/share | | + * | | reservation conflict | | + * | | can cause Courtesy | | + * | | client to be expired | | + * |------------------------------------------------------------------------| + * | NFSD4_EXPIRABLE | Courtesy client to be| nfs4_laundromat | + * | | expired by Laundromat| try_to_expire_client | + * | | due to conflict | | + * |------------------------------------------------------------------------| + */ +enum { + NFSD4_ACTIVE = 0, + NFSD4_COURTESY, + NFSD4_EXPIRABLE, +}; + /* * struct nfs4_client - one per client. Clientids live here. * @@ -385,6 +408,9 @@ struct nfs4_client { struct list_head async_copies; /* list of async copies */ spinlock_t async_lock; /* lock for async copies */ atomic_t cl_cb_inflight; /* Outstanding callbacks */ + + unsigned int cl_state; + atomic_t cl_delegs_in_recall; }; /* struct nfs4_client_reset @@ -702,4 +728,9 @@ extern void nfsd4_client_record_remove(struct nfs4_client *clp); extern int nfsd4_client_record_check(struct nfs4_client *clp); extern void nfsd4_record_grace_done(struct nfsd_net *nn); +static inline bool try_to_expire_client(struct nfs4_client *clp) +{ + cmpxchg(&clp->cl_state, NFSD4_COURTESY, NFSD4_EXPIRABLE); + return clp->cl_state == NFSD4_EXPIRABLE; +} #endif /* NFSD4_STATE_H */ From a26848e2bcc9c40b2f9791f6d827504c513a8307 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:22 -0700 Subject: [PATCH 0810/1565] NFSD: add support for share reservation conflict to courteous server [ Upstream commit 3d69427151806656abf129342028f3f4e5e1fee0 ] This patch allows expired client with open state to be in COURTESY state. Share/access conflict with COURTESY client is resolved by setting COURTESY client to EXPIRABLE state, schedule laundromat to run and returning nfserr_jukebox to the request client. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 109 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 101 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 74c0a8890404..4ba0a70d8990 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -705,6 +705,57 @@ static unsigned int file_hashval(struct svc_fh *fh) static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; +/* + * Check if courtesy clients have conflicting access and resolve it if possible + * + * access: is op_share_access if share_access is true. + * Check if access mode, op_share_access, would conflict with + * the current deny mode of the file 'fp'. + * access: is op_share_deny if share_access is false. + * Check if the deny mode, op_share_deny, would conflict with + * current access of the file 'fp'. + * stp: skip checking this entry. + * new_stp: normal open, not open upgrade. + * + * Function returns: + * false - access/deny mode conflict with normal client. + * true - no conflict or conflict with courtesy client(s) is resolved. + */ +static bool +nfs4_resolve_deny_conflicts_locked(struct nfs4_file *fp, bool new_stp, + struct nfs4_ol_stateid *stp, u32 access, bool share_access) +{ + struct nfs4_ol_stateid *st; + bool resolvable = true; + unsigned char bmap; + struct nfsd_net *nn; + struct nfs4_client *clp; + + lockdep_assert_held(&fp->fi_lock); + list_for_each_entry(st, &fp->fi_stateids, st_perfile) { + /* ignore lock stateid */ + if (st->st_openstp) + continue; + if (st == stp && new_stp) + continue; + /* check file access against deny mode or vice versa */ + bmap = share_access ? st->st_deny_bmap : st->st_access_bmap; + if (!(access & bmap_to_share_mode(bmap))) + continue; + clp = st->st_stid.sc_client; + if (try_to_expire_client(clp)) + continue; + resolvable = false; + break; + } + if (resolvable) { + clp = stp->st_stid.sc_client; + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + } + return resolvable; +} + static void __nfs4_file_get_access(struct nfs4_file *fp, u32 access) { @@ -4985,7 +5036,7 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh, static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, - struct nfsd4_open *open) + struct nfsd4_open *open, bool new_stp) { struct nfsd_file *nf = NULL; __be32 status; @@ -5001,6 +5052,13 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, */ status = nfs4_file_check_deny(fp, open->op_share_deny); if (status != nfs_ok) { + if (status != nfserr_share_denied) { + spin_unlock(&fp->fi_lock); + goto out; + } + if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5008,6 +5066,13 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, /* set access to the file */ status = nfs4_file_get_access(fp, open->op_share_access); if (status != nfs_ok) { + if (status != nfserr_share_denied) { + spin_unlock(&fp->fi_lock); + goto out; + } + if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, + stp, open->op_share_access, true)) + status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5054,21 +5119,29 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, } static __be32 -nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open) +nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, + struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, + struct nfsd4_open *open) { __be32 status; unsigned char old_deny_bmap = stp->st_deny_bmap; if (!test_access(open->op_share_access, stp)) - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); + return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open, false); /* test and set deny mode */ spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); if (status == nfs_ok) { - set_deny(open->op_share_deny, stp); - fp->fi_share_deny |= + if (status != nfserr_share_denied) { + set_deny(open->op_share_deny, stp); + fp->fi_share_deny |= (open->op_share_deny & NFS4_SHARE_DENY_BOTH); + } else { + if (nfs4_resolve_deny_conflicts_locked(fp, false, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; + } } spin_unlock(&fp->fi_lock); @@ -5409,7 +5482,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); + status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open, true); if (status) { stp->st_stid.sc_type = NFS4_CLOSED_STID; release_open_stateid(stp); @@ -5643,6 +5716,26 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif +static bool +nfs4_has_any_locks(struct nfs4_client *clp) +{ + int i; + struct nfs4_stateowner *so; + + spin_lock(&clp->cl_lock); + for (i = 0; i < OWNER_HASH_SIZE; i++) { + list_for_each_entry(so, &clp->cl_ownerstr_hashtbl[i], + so_strhash) { + if (so->so_is_open_owner) + continue; + spin_unlock(&clp->cl_lock); + return true; + } + } + spin_unlock(&clp->cl_lock); + return false; +} + /* * place holder for now, no check for lock blockers yet */ @@ -5650,8 +5743,8 @@ static bool nfs4_anylock_blockers(struct nfs4_client *clp) { if (atomic_read(&clp->cl_delegs_in_recall) || - client_has_openowners(clp) || - !list_empty(&clp->async_copies)) + !list_empty(&clp->async_copies) || + nfs4_has_any_locks(clp)) return true; return false; } From 461d0b57c9f31335d22c9e91e33f9a6c68b85440 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:23 -0700 Subject: [PATCH 0811/1565] NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd [ Upstream commit d76cc46b37e123e8d245cc3490978dbda56f979d ] This patch moves create/destroy of laundry_wq from nfs4_state_start and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent the laundromat from being freed while a thread is processing a conflicting lock. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 28 ++++++++++++++++------------ fs/nfsd/nfsctl.c | 4 ++++ fs/nfsd/nfsd.h | 4 ++++ 3 files changed, 24 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4ba0a70d8990..30ea1c7b6b9f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -127,6 +127,21 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; static struct workqueue_struct *laundry_wq; +int nfsd4_create_laundry_wq(void) +{ + int rc = 0; + + laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); + if (laundry_wq == NULL) + rc = -ENOMEM; + return rc; +} + +void nfsd4_destroy_laundry_wq(void) +{ + destroy_workqueue(laundry_wq); +} + static bool is_session_dead(struct nfsd4_session *ses) { return ses->se_flags & NFS4_SESSION_DEAD; @@ -7775,22 +7790,12 @@ nfs4_state_start(void) { int ret; - laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); - if (laundry_wq == NULL) { - ret = -ENOMEM; - goto out; - } ret = nfsd4_create_callback_queue(); if (ret) - goto out_free_laundry; + return ret; set_max_delegations(); return 0; - -out_free_laundry: - destroy_workqueue(laundry_wq); -out: - return ret; } void @@ -7827,7 +7832,6 @@ nfs4_state_shutdown_net(struct net *net) void nfs4_state_shutdown(void) { - destroy_workqueue(laundry_wq); nfsd4_destroy_callback_queue(); } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 16920e4512bd..322a208878f2 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1542,6 +1542,9 @@ static int __init init_nfsd(void) if (retval < 0) goto out_free_filesystem; retval = register_cld_notifier(); + if (retval) + goto out_free_all; + retval = nfsd4_create_laundry_wq(); if (retval) goto out_free_all; return 0; @@ -1566,6 +1569,7 @@ static int __init init_nfsd(void) static void __exit exit_nfsd(void) { + nfsd4_destroy_laundry_wq(); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); nfsd_drc_slab_free(); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 23996c6ca75e..847b482155ae 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -162,6 +162,8 @@ void nfs4_state_shutdown_net(struct net *net); int nfs4_reset_recoverydir(char *recdir); char * nfs4_recoverydir(void); bool nfsd4_spo_must_allow(struct svc_rqst *rqstp); +int nfsd4_create_laundry_wq(void); +void nfsd4_destroy_laundry_wq(void); #else static inline int nfsd4_init_slabs(void) { return 0; } static inline void nfsd4_free_slabs(void) { } @@ -175,6 +177,8 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { return false; } +static inline int nfsd4_create_laundry_wq(void) { return 0; }; +static inline void nfsd4_destroy_laundry_wq(void) {}; #endif /* From eb2eb6b6afdf602b779824ce4e3629ae234ca67a Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:24 -0700 Subject: [PATCH 0812/1565] fs/lock: add helper locks_owner_has_blockers to check for blockers [ Upstream commit 591502c5cb325b1c6ec59ab161927d606b918aa0 ] Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/locks.c | 28 ++++++++++++++++++++++++++++ include/linux/fs.h | 7 +++++++ 2 files changed, 35 insertions(+) diff --git a/fs/locks.c b/fs/locks.c index 101867933e4d..118df2812f8a 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -376,6 +376,34 @@ void locks_release_private(struct file_lock *fl) } EXPORT_SYMBOL_GPL(locks_release_private); +/** + * locks_owner_has_blockers - Check for blocking lock requests + * @flctx: file lock context + * @owner: lock owner + * + * Return values: + * %true: @owner has at least one blocker + * %false: @owner has no blockers + */ +bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner) +{ + struct file_lock *fl; + + spin_lock(&flctx->flc_lock); + list_for_each_entry(fl, &flctx->flc_posix, fl_list) { + if (fl->fl_owner != owner) + continue; + if (!list_empty(&fl->fl_blocked_requests)) { + spin_unlock(&flctx->flc_lock); + return true; + } + } + spin_unlock(&flctx->flc_lock); + return false; +} +EXPORT_SYMBOL_GPL(locks_owner_has_blockers); + /* Free a lock which is not in use. */ void locks_free_lock(struct file_lock *fl) { diff --git a/include/linux/fs.h b/include/linux/fs.h index c0459446e144..17dc1ee8c6cb 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1163,6 +1163,8 @@ extern void lease_unregister_notifier(struct notifier_block *); struct files_struct; extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); +extern bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner); #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1303,6 +1305,11 @@ static inline int lease_modify(struct file_lock *fl, int arg, struct files_struct; static inline void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files) {} +static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner) +{ + return false; +} #endif /* !CONFIG_FILE_LOCKING */ static inline struct inode *file_inode(const struct file *f) From 97f77d7d501bee5bf8dd96b6c0b3f666b5b5bc99 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:25 -0700 Subject: [PATCH 0813/1565] fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict [ Upstream commit 2443da2259e97688f93d64d17ab69b15f466078a ] Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to lock_manager_operations to allow the lock manager to take appropriate action to resolve the lock conflict if possible. A new field, lm_mod_owner, is also added to lock_manager_operations. The lm_mod_owner is used by the fs/lock code to make sure the lock manager module such as nfsd, is not freed while lock conflict is being resolved. lm_lock_expirable checks and returns true to indicate that the lock conflict can be resolved else return false. This callback must be called with the flc_lock held so it can not block. lm_expire_lock is called to resolve the lock conflict if the returned value from lm_lock_expirable is true. This callback is called without the flc_lock held since it's allowed to block. Upon returning from this callback, the lock conflict should be resolved and the caller is expected to restart the conflict check from the beginnning of the list. Lock manager, such as NFSv4 courteous server, uses this callback to resolve conflict by destroying lock owner, or the NFSv4 courtesy client (client that has expired but allowed to maintains its states) that owns the lock. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/locking.rst | 4 ++++ fs/locks.c | 33 ++++++++++++++++++++++++--- include/linux/fs.h | 3 +++ 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 07e57f762920..23a0d24168bc 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -433,6 +433,8 @@ prototypes:: void (*lm_break)(struct file_lock *); /* break_lease callback */ int (*lm_change)(struct file_lock **, int); bool (*lm_breaker_owns_lease)(struct file_lock *); + bool (*lm_lock_expirable)(struct file_lock *); + void (*lm_expire_lock)(void); locking rules: @@ -444,6 +446,8 @@ lm_grant: no no no lm_break: yes no no lm_change yes no no lm_breaker_owns_lease: yes no no +lm_lock_expirable yes no no +lm_expire_lock no no yes ====================== ============= ================= ========= buffer_head diff --git a/fs/locks.c b/fs/locks.c index 118df2812f8a..13a3ba97b73d 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -982,6 +982,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl) struct file_lock *cfl; struct file_lock_context *ctx; struct inode *inode = locks_inode(filp); + void *owner; + void (*func)(void); ctx = smp_load_acquire(&inode->i_flctx); if (!ctx || list_empty_careful(&ctx->flc_posix)) { @@ -989,12 +991,23 @@ posix_test_lock(struct file *filp, struct file_lock *fl) return; } +retry: spin_lock(&ctx->flc_lock); list_for_each_entry(cfl, &ctx->flc_posix, fl_list) { - if (posix_locks_conflict(fl, cfl)) { - locks_copy_conflock(fl, cfl); - goto out; + if (!posix_locks_conflict(fl, cfl)) + continue; + if (cfl->fl_lmops && cfl->fl_lmops->lm_lock_expirable + && (*cfl->fl_lmops->lm_lock_expirable)(cfl)) { + owner = cfl->fl_lmops->lm_mod_owner; + func = cfl->fl_lmops->lm_expire_lock; + __module_get(owner); + spin_unlock(&ctx->flc_lock); + (*func)(); + module_put(owner); + goto retry; } + locks_copy_conflock(fl, cfl); + goto out; } fl->fl_type = F_UNLCK; out: @@ -1168,6 +1181,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, int error; bool added = false; LIST_HEAD(dispose); + void *owner; + void (*func)(void); ctx = locks_get_lock_context(inode, request->fl_type); if (!ctx) @@ -1186,6 +1201,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, new_fl2 = locks_alloc_lock(); } +retry: percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); /* @@ -1197,6 +1213,17 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, list_for_each_entry(fl, &ctx->flc_posix, fl_list) { if (!posix_locks_conflict(request, fl)) continue; + if (fl->fl_lmops && fl->fl_lmops->lm_lock_expirable + && (*fl->fl_lmops->lm_lock_expirable)(fl)) { + owner = fl->fl_lmops->lm_mod_owner; + func = fl->fl_lmops->lm_expire_lock; + __module_get(owner); + spin_unlock(&ctx->flc_lock); + percpu_up_read(&file_rwsem); + (*func)(); + module_put(owner); + goto retry; + } if (conflock) locks_copy_conflock(conflock, fl); error = -EAGAIN; diff --git a/include/linux/fs.h b/include/linux/fs.h index 17dc1ee8c6cb..3e9105b3cc76 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1017,6 +1017,7 @@ struct file_lock_operations { }; struct lock_manager_operations { + void *lm_mod_owner; fl_owner_t (*lm_get_owner)(fl_owner_t); void (*lm_put_owner)(fl_owner_t); void (*lm_notify)(struct file_lock *); /* unblock callback */ @@ -1025,6 +1026,8 @@ struct lock_manager_operations { int (*lm_change)(struct file_lock *, int, struct list_head *); void (*lm_setup)(struct file_lock *, void **); bool (*lm_breaker_owns_lease)(struct file_lock *); + bool (*lm_lock_expirable)(struct file_lock *cfl); + void (*lm_expire_lock)(void); }; struct lock_manager { From 4a39f029e7e3436d9b6be1900a2a89bda01fcd69 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:26 -0700 Subject: [PATCH 0814/1565] NFSD: add support for lock conflict to courteous server [ Upstream commit 27431affb0dbc259ac6ffe6071243a576c8f38f1 ] This patch allows expired client with lock state to be in COURTESY state. Lock conflict with COURTESY client is resolved by the fs/lock code using the lm_lock_expirable and lm_expire_lock callback in the struct lock_manager_operations. If conflict client is in COURTESY state, set it to EXPIRABLE and schedule the laundromat to run immediately to expire the client. The callback lm_expire_lock waits for the laundromat to flush its work queue before returning to caller. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 70 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 54 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 30ea1c7b6b9f..6851fece3a76 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5731,39 +5731,51 @@ static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) } #endif +/* Check if any lock belonging to this lockowner has any blockers */ static bool -nfs4_has_any_locks(struct nfs4_client *clp) +nfs4_lockowner_has_blockers(struct nfs4_lockowner *lo) +{ + struct file_lock_context *ctx; + struct nfs4_ol_stateid *stp; + struct nfs4_file *nf; + + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { + nf = stp->st_stid.sc_file; + ctx = nf->fi_inode->i_flctx; + if (!ctx) + continue; + if (locks_owner_has_blockers(ctx, lo)) + return true; + } + return false; +} + +static bool +nfs4_anylock_blockers(struct nfs4_client *clp) { int i; struct nfs4_stateowner *so; + struct nfs4_lockowner *lo; + if (atomic_read(&clp->cl_delegs_in_recall)) + return true; spin_lock(&clp->cl_lock); for (i = 0; i < OWNER_HASH_SIZE; i++) { list_for_each_entry(so, &clp->cl_ownerstr_hashtbl[i], so_strhash) { if (so->so_is_open_owner) continue; - spin_unlock(&clp->cl_lock); - return true; + lo = lockowner(so); + if (nfs4_lockowner_has_blockers(lo)) { + spin_unlock(&clp->cl_lock); + return true; + } } } spin_unlock(&clp->cl_lock); return false; } -/* - * place holder for now, no check for lock blockers yet - */ -static bool -nfs4_anylock_blockers(struct nfs4_client *clp) -{ - if (atomic_read(&clp->cl_delegs_in_recall) || - !list_empty(&clp->async_copies) || - nfs4_has_any_locks(clp)) - return true; - return false; -} - static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) @@ -6743,6 +6755,29 @@ nfsd4_lm_put_owner(fl_owner_t owner) nfs4_put_stateowner(&lo->lo_owner); } +/* return pointer to struct nfs4_client if client is expirable */ +static bool +nfsd4_lm_lock_expirable(struct file_lock *cfl) +{ + struct nfs4_lockowner *lo = (struct nfs4_lockowner *)cfl->fl_owner; + struct nfs4_client *clp = lo->lo_owner.so_client; + struct nfsd_net *nn; + + if (try_to_expire_client(clp)) { + nn = net_generic(clp->net, nfsd_net_id); + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + return true; + } + return false; +} + +/* schedule laundromat to run immediately and wait for it to complete */ +static void +nfsd4_lm_expire_lock(void) +{ + flush_workqueue(laundry_wq); +} + static void nfsd4_lm_notify(struct file_lock *fl) { @@ -6769,9 +6804,12 @@ nfsd4_lm_notify(struct file_lock *fl) } static const struct lock_manager_operations nfsd_posix_mng_ops = { + .lm_mod_owner = THIS_MODULE, .lm_notify = nfsd4_lm_notify, .lm_get_owner = nfsd4_lm_get_owner, .lm_put_owner = nfsd4_lm_put_owner, + .lm_lock_expirable = nfsd4_lm_lock_expirable, + .lm_expire_lock = nfsd4_lm_expire_lock, }; static inline void From c6207942b2554bbc9a8b4414457ff312f214c6cc Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:27 -0700 Subject: [PATCH 0815/1565] NFSD: Show state of courtesy client in client info [ Upstream commit e9488d5ae13c0a72223c507e2508dc2ac66cad4f ] Update client_info_show to show state of courtesy client and seconds since last renew. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6851fece3a76..9116496b476a 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2494,10 +2494,17 @@ static int client_info_show(struct seq_file *m, void *v) memcpy(&clid, &clp->cl_clientid, sizeof(clid)); seq_printf(m, "clientid: 0x%llx\n", clid); seq_printf(m, "address: \"%pISpc\"\n", (struct sockaddr *)&clp->cl_addr); - if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) + + if (clp->cl_state == NFSD4_COURTESY) + seq_puts(m, "status: courtesy\n"); + else if (clp->cl_state == NFSD4_EXPIRABLE) + seq_puts(m, "status: expirable\n"); + else if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) seq_puts(m, "status: confirmed\n"); else seq_puts(m, "status: unconfirmed\n"); + seq_printf(m, "seconds from last renew: %lld\n", + ktime_get_boottime_seconds() - clp->cl_time); seq_printf(m, "name: "); seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len); seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion); From 304505e2e89c45bfe9d78b1ff0296f6abbb26f00 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 25 Mar 2022 14:47:54 -0400 Subject: [PATCH 0816/1565] NFSD: Clean up nfsd3_proc_create() [ Upstream commit e61568599c9ad638fdaba150fee07d7065e31851 ] As near as I can tell, mode bit masking and setting S_IFREG is already done by do_nfsd_create() and vfs_create(). The NFSv4 path (do_open_lookup), for example, does not bother with this special processing. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 16 ++-------------- 1 file changed, 2 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 936eebd4c56d..981a2a71c5af 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -229,8 +229,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - svc_fh *dirfhp, *newfhp = NULL; - struct iattr *attr; + svc_fh *dirfhp, *newfhp; dprintk("nfsd: CREATE(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -239,20 +238,9 @@ nfsd3_proc_create(struct svc_rqst *rqstp) dirfhp = fh_copy(&resp->dirfh, &argp->fh); newfhp = fh_init(&resp->fh, NFS3_FHSIZE); - attr = &argp->attrs; - /* Unfudge the mode bits */ - attr->ia_mode &= ~S_IFMT; - if (!(attr->ia_valid & ATTR_MODE)) { - attr->ia_valid |= ATTR_MODE; - attr->ia_mode = S_IFREG; - } else { - attr->ia_mode = (attr->ia_mode & ~S_IFMT) | S_IFREG; - } - - /* Now create the file and set attributes */ resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len, - attr, newfhp, argp->createmode, + &argp->attrs, newfhp, argp->createmode, (u32 *)argp->verf, NULL, NULL); return rpc_success; } From ee0742a93ccba786bc06e00bb0d10a5123acd3ba Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Mar 2022 10:16:42 -0400 Subject: [PATCH 0817/1565] NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() [ Upstream commit 14ee45b70dd0d9ae76fb066cd8c0652d657353f6 ] Clean up: The "out" label already invokes fh_drop_write(). Note that fh_drop_write() is already careful not to invoke mnt_drop_write() if either it has already been done or there is nothing to drop. Therefore no change in behavior is expected. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 0968eaf735c8..cdeba19db16d 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1504,7 +1504,6 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, case NFS3_CREATE_GUARDED: err = nfserr_exist; } - fh_drop_write(fhp); goto out; } @@ -1512,10 +1511,8 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, iap->ia_mode &= ~current_umask(); host_err = vfs_create(dirp, dchild, iap->ia_mode, true); - if (host_err < 0) { - fh_drop_write(fhp); + if (host_err < 0) goto out_nfserr; - } if (created) *created = true; From a132795b61fe113a3588015c5a0c0ff621e4f9c6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Mar 2022 16:10:17 -0400 Subject: [PATCH 0818/1565] NFSD: Refactor nfsd_create_setattr() [ Upstream commit 5f46e950c395b9c14c282b53ba78c5fd46d6c256 ] I'd like to move do_nfsd_create() out of vfs.c. Therefore nfsd_create_setattr() needs to be made publicly visible. Note that both call sites in vfs.c commit both the new object and its parent directory, so just combine those common metadata commits into nfsd_create_setattr(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 79 +++++++++++++++++++++++++++------------------------ fs/nfsd/vfs.h | 2 ++ 2 files changed, 44 insertions(+), 37 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index cdeba19db16d..50451789a944 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1207,14 +1207,26 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, return err; } -static __be32 -nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, - struct iattr *iap) +/** + * nfsd_create_setattr - Set a created file's attributes + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of parent directory + * @resfhp: NFS filehandle of new object + * @iap: requested attributes of new object + * + * Returns nfs_ok on success, or an nfsstat in network byte order. + */ +__be32 +nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct iattr *iap) { + __be32 status; + /* - * Mode has already been set earlier in create: + * Mode has already been set by file creation. */ iap->ia_valid &= ~ATTR_MODE; + /* * Setting uid/gid works only for root. Irix appears to * send along the gid on create when it tries to implement @@ -1222,10 +1234,31 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, */ if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID)) iap->ia_valid &= ~(ATTR_UID|ATTR_GID); + + /* + * Callers expect new file metadata to be committed even + * if the attributes have not changed. + */ if (iap->ia_valid) - return nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); - /* Callers expect file metadata to be committed here */ - return nfserrno(commit_metadata(resfhp)); + status = nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); + else + status = nfserrno(commit_metadata(resfhp)); + + /* + * Transactional filesystems had a chance to commit changes + * for both parent and child simultaneously making the + * following commit_metadata a noop in many cases. + */ + if (!status) + status = nfserrno(commit_metadata(fhp)); + + /* + * Update the new filehandle to pick up the new attributes. + */ + if (!status) + status = fh_update(resfhp); + + return status; } /* HPUX client sometimes creates a file in mode 000, and sets size to 0. @@ -1252,7 +1285,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, struct dentry *dentry, *dchild; struct inode *dirp; __be32 err; - __be32 err2; int host_err; dentry = fhp->fh_dentry; @@ -1324,22 +1356,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err < 0) goto out_nfserr; - err = nfsd_create_setattr(rqstp, resfhp, iap); + err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); - /* - * nfsd_create_setattr already committed the child. Transactional - * filesystems had a chance to commit changes for both parent and - * child simultaneously making the following commit_metadata a - * noop. - */ - err2 = nfserrno(commit_metadata(fhp)); - if (err2) - err = err2; - /* - * Update the file handle to get the new inode info. - */ - if (!err) - err = fh_update(resfhp); out: dput(dchild); return err; @@ -1530,20 +1548,7 @@ do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, } set_attr: - err = nfsd_create_setattr(rqstp, resfhp, iap); - - /* - * nfsd_create_setattr already committed the child - * (and possibly also the parent). - */ - if (!err) - err = nfserrno(commit_metadata(fhp)); - - /* - * Update the filehandle to get the new inode info. - */ - if (!err) - err = fh_update(resfhp); + err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); out: fh_unlock(fhp); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index ccb87b2864f6..1f32a83456b0 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -69,6 +69,8 @@ __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); +__be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct iattr *iap); __be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct iattr *attrs, struct svc_fh *res, int createmode, From a019add1b4561f42e867535f2bc186168d5298e0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Mar 2022 13:29:23 -0400 Subject: [PATCH 0819/1565] NFSD: Refactor NFSv3 CREATE [ Upstream commit df9606abddfb01090d5ece7dcc2441d848f690f0 ] The NFSv3 CREATE and NFSv4 OPEN(CREATE) use cases are about to diverge such that it makes sense to split do_nfsd_create() into one version for NFSv3 and one for NFSv4. As a first step, copy do_nfsd_create() to nfs3proc.c and remove NFSv4-specific logic. One immediate legibility benefit is that the logic for handling NFSv3 createhow is now quite straightforward. NFSv4 createhow has some subtleties that IMO do not belong in generic code. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 127 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 121 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 981a2a71c5af..e314545bbdb2 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -8,6 +8,7 @@ #include #include #include +#include #include "cache.h" #include "xdr3.h" @@ -220,10 +221,126 @@ nfsd3_proc_write(struct svc_rqst *rqstp) } /* - * With NFSv3, CREATE processing is a lot easier than with NFSv2. - * At least in theory; we'll see how it fares in practice when the - * first reports about SunOS compatibility problems start to pour in... + * Implement NFSv3's unchecked, guarded, and exclusive CREATE + * semantics for regular files. Except for the created file, + * this operation is stateless on the server. + * + * Upon return, caller must release @fhp and @resfhp. */ +static __be32 +nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct nfsd3_createargs *argp) +{ + struct iattr *iap = &argp->attrs; + struct dentry *parent, *child; + __u32 v_mtime, v_atime; + struct inode *inode; + __be32 status; + int host_err; + + if (isdotent(argp->name, argp->len)) + return nfserr_exist; + if (!(iap->ia_valid & ATTR_MODE)) + iap->ia_mode = 0; + + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); + if (status != nfs_ok) + return status; + + parent = fhp->fh_dentry; + inode = d_inode(parent); + + host_err = fh_want_write(fhp); + if (host_err) + return nfserrno(host_err); + + fh_lock_nested(fhp, I_MUTEX_PARENT); + + child = lookup_one_len(argp->name, parent, argp->len); + if (IS_ERR(child)) { + status = nfserrno(PTR_ERR(child)); + goto out; + } + + if (d_really_is_negative(child)) { + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); + if (status != nfs_ok) + goto out; + } + + status = fh_compose(resfhp, fhp->fh_export, child, fhp); + if (status != nfs_ok) + goto out; + + v_mtime = 0; + v_atime = 0; + if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { + u32 *verifier = (u32 *)argp->verf; + + /* + * Solaris 7 gets confused (bugid 4218508) if these have + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. + */ + v_mtime = verifier[0] & 0x7fffffff; + v_atime = verifier[1] & 0x7fffffff; + } + + if (d_really_is_positive(child)) { + status = nfs_ok; + + switch (argp->createmode) { + case NFS3_CREATE_UNCHECKED: + if (!d_is_reg(child)) + break; + iap->ia_valid &= ATTR_SIZE; + goto set_attr; + case NFS3_CREATE_GUARDED: + status = nfserr_exist; + break; + case NFS3_CREATE_EXCLUSIVE: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + break; + } + status = nfserr_exist; + } + goto out; + } + + if (!IS_POSIXACL(inode)) + iap->ia_mode &= ~current_umask(); + + host_err = vfs_create(inode, child, iap->ia_mode, true); + if (host_err < 0) { + status = nfserrno(host_err); + goto out; + } + + /* A newly created file already has a file size of zero. */ + if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) + iap->ia_valid &= ~ATTR_SIZE; + if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { + iap->ia_valid = ATTR_MTIME | ATTR_ATIME | + ATTR_MTIME_SET | ATTR_ATIME_SET; + iap->ia_mtime.tv_sec = v_mtime; + iap->ia_atime.tv_sec = v_atime; + iap->ia_mtime.tv_nsec = 0; + iap->ia_atime.tv_nsec = 0; + } + +set_attr: + status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + +out: + fh_unlock(fhp); + if (child && !IS_ERR(child)) + dput(child); + fh_drop_write(fhp); + return status; +} + static __be32 nfsd3_proc_create(struct svc_rqst *rqstp) { @@ -239,9 +356,7 @@ nfsd3_proc_create(struct svc_rqst *rqstp) dirfhp = fh_copy(&resp->dirfh, &argp->fh); newfhp = fh_init(&resp->fh, NFS3_FHSIZE); - resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len, - &argp->attrs, newfhp, argp->createmode, - (u32 *)argp->verf, NULL, NULL); + resp->status = nfsd3_create_file(rqstp, dirfhp, newfhp, argp); return rpc_success; } From 66af1db0cc37870629d71bb1a1577173e473a2d6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Mar 2022 14:47:34 -0400 Subject: [PATCH 0820/1565] NFSD: Refactor NFSv4 OPEN(CREATE) [ Upstream commit 254454a5aa4a9f696d6bae080c08d5863e650f49 ] Copy do_nfsd_create() to nfs4proc.c and remove NFSv3-specific logic. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 162 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 152 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f3d6bd2bfa4f..adb64c9259bd 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -37,6 +37,8 @@ #include #include #include +#include + #include #include @@ -235,6 +237,154 @@ static void nfsd4_set_open_owner_reply_cache(struct nfsd4_compound_state *cstate &resfh->fh_handle); } +static inline bool nfsd4_create_is_exclusive(int createmode) +{ + return createmode == NFS4_CREATE_EXCLUSIVE || + createmode == NFS4_CREATE_EXCLUSIVE4_1; +} + +/* + * Implement NFSv4's unchecked, guarded, and exclusive create + * semantics for regular files. Open state for this new file is + * subsequently fabricated in nfsd4_process_open2(). + * + * Upon return, caller must release @fhp and @resfhp. + */ +static __be32 +nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct svc_fh *resfhp, struct nfsd4_open *open) +{ + struct iattr *iap = &open->op_iattr; + struct dentry *parent, *child; + __u32 v_mtime, v_atime; + struct inode *inode; + __be32 status; + int host_err; + + if (isdotent(open->op_fname, open->op_fnamelen)) + return nfserr_exist; + if (!(iap->ia_valid & ATTR_MODE)) + iap->ia_mode = 0; + + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); + if (status != nfs_ok) + return status; + parent = fhp->fh_dentry; + inode = d_inode(parent); + + host_err = fh_want_write(fhp); + if (host_err) + return nfserrno(host_err); + + fh_lock_nested(fhp, I_MUTEX_PARENT); + + child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); + if (IS_ERR(child)) { + status = nfserrno(PTR_ERR(child)); + goto out; + } + + if (d_really_is_negative(child)) { + status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); + if (status != nfs_ok) + goto out; + } + + status = fh_compose(resfhp, fhp->fh_export, child, fhp); + if (status != nfs_ok) + goto out; + + v_mtime = 0; + v_atime = 0; + if (nfsd4_create_is_exclusive(open->op_createmode)) { + u32 *verifier = (u32 *)open->op_verf.data; + + /* + * Solaris 7 gets confused (bugid 4218508) if these have + * the high bit set, as do xfs filesystems without the + * "bigtime" feature. So just clear the high bits. If this + * is ever changed to use different attrs for storing the + * verifier, then do_open_lookup() will also need to be + * fixed accordingly. + */ + v_mtime = verifier[0] & 0x7fffffff; + v_atime = verifier[1] & 0x7fffffff; + } + + if (d_really_is_positive(child)) { + status = nfs_ok; + + switch (open->op_createmode) { + case NFS4_CREATE_UNCHECKED: + if (!d_is_reg(child)) + break; + + /* + * In NFSv4, we don't want to truncate the file + * now. This would be wrong if the OPEN fails for + * some other reason. Furthermore, if the size is + * nonzero, we should ignore it according to spec! + */ + open->op_truncate = (iap->ia_valid & ATTR_SIZE) && + !iap->ia_size; + break; + case NFS4_CREATE_GUARDED: + status = nfserr_exist; + break; + case NFS4_CREATE_EXCLUSIVE: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + open->op_created = true; + break; /* subtle */ + } + status = nfserr_exist; + break; + case NFS4_CREATE_EXCLUSIVE4_1: + if (d_inode(child)->i_mtime.tv_sec == v_mtime && + d_inode(child)->i_atime.tv_sec == v_atime && + d_inode(child)->i_size == 0) { + open->op_created = true; + goto set_attr; /* subtle */ + } + status = nfserr_exist; + } + goto out; + } + + if (!IS_POSIXACL(inode)) + iap->ia_mode &= ~current_umask(); + + host_err = vfs_create(inode, child, iap->ia_mode, true); + if (host_err < 0) { + status = nfserrno(host_err); + goto out; + } + open->op_created = true; + + /* A newly created file already has a file size of zero. */ + if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) + iap->ia_valid &= ~ATTR_SIZE; + if (nfsd4_create_is_exclusive(open->op_createmode)) { + iap->ia_valid = ATTR_MTIME | ATTR_ATIME | + ATTR_MTIME_SET|ATTR_ATIME_SET; + iap->ia_mtime.tv_sec = v_mtime; + iap->ia_atime.tv_sec = v_atime; + iap->ia_mtime.tv_nsec = 0; + iap->ia_atime.tv_nsec = 0; + } + +set_attr: + status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + +out: + fh_unlock(fhp); + if (child && !IS_ERR(child)) + dput(child); + fh_drop_write(fhp); + return status; +} + static __be32 do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh) { @@ -264,16 +414,8 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * yes | yes | GUARDED4 | GUARDED4 */ - /* - * Note: create modes (UNCHECKED,GUARDED...) are the same - * in NFSv4 as in v3 except EXCLUSIVE4_1. - */ current->fs->umask = open->op_umask; - status = do_nfsd_create(rqstp, current_fh, open->op_fname, - open->op_fnamelen, &open->op_iattr, - *resfh, open->op_createmode, - (u32 *)open->op_verf.data, - &open->op_truncate, &open->op_created); + status = nfsd4_create_file(rqstp, current_fh, *resfh, open); current->fs->umask = 0; if (!status && open->op_label.len) @@ -284,7 +426,7 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * use the returned bitmask to indicate which attributes * we used to store the verifier: */ - if (nfsd_create_is_exclusive(open->op_createmode) && status == 0) + if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0) open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_MODIFY); } else From 274fd0f9c26105daac1991ee55243e08e0696d90 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 28 Mar 2022 15:36:58 -0400 Subject: [PATCH 0821/1565] NFSD: Remove do_nfsd_create() [ Upstream commit 1c388f27759c5d9271d4fca081f7ee138986eb7d ] Now that its two callers have their own version-specific instance of this function, do_nfsd_create() is no longer used. [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 150 -------------------------------------------------- fs/nfsd/vfs.h | 10 ---- 2 files changed, 160 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 50451789a944..26ae28b6f9b0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1412,156 +1412,6 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, rdev, resfhp); } -/* - * NFSv3 and NFSv4 version of nfsd_create - */ -__be32 -do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - struct svc_fh *resfhp, int createmode, u32 *verifier, - bool *truncp, bool *created) -{ - struct dentry *dentry, *dchild = NULL; - struct inode *dirp; - __be32 err; - int host_err; - __u32 v_mtime=0, v_atime=0; - - err = nfserr_perm; - if (!flen) - goto out; - err = nfserr_exist; - if (isdotent(fname, flen)) - goto out; - if (!(iap->ia_valid & ATTR_MODE)) - iap->ia_mode = 0; - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); - if (err) - goto out; - - dentry = fhp->fh_dentry; - dirp = d_inode(dentry); - - host_err = fh_want_write(fhp); - if (host_err) - goto out_nfserr; - - fh_lock_nested(fhp, I_MUTEX_PARENT); - - /* - * Compose the response file handle. - */ - dchild = lookup_one_len(fname, dentry, flen); - host_err = PTR_ERR(dchild); - if (IS_ERR(dchild)) - goto out_nfserr; - - /* If file doesn't exist, check for permissions to create one */ - if (d_really_is_negative(dchild)) { - err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); - if (err) - goto out; - } - - err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); - if (err) - goto out; - - if (nfsd_create_is_exclusive(createmode)) { - /* solaris7 gets confused (bugid 4218508) if these have - * the high bit set, as do xfs filesystems without the - * "bigtime" feature. So just clear the high bits. If this is - * ever changed to use different attrs for storing the - * verifier, then do_open_lookup() will also need to be fixed - * accordingly. - */ - v_mtime = verifier[0]&0x7fffffff; - v_atime = verifier[1]&0x7fffffff; - } - - if (d_really_is_positive(dchild)) { - err = 0; - - switch (createmode) { - case NFS3_CREATE_UNCHECKED: - if (! d_is_reg(dchild)) - goto out; - else if (truncp) { - /* in nfsv4, we need to treat this case a little - * differently. we don't want to truncate the - * file now; this would be wrong if the OPEN - * fails for some other reason. furthermore, - * if the size is nonzero, we should ignore it - * according to spec! - */ - *truncp = (iap->ia_valid & ATTR_SIZE) && !iap->ia_size; - } - else { - iap->ia_valid &= ATTR_SIZE; - goto set_attr; - } - break; - case NFS3_CREATE_EXCLUSIVE: - if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime - && d_inode(dchild)->i_atime.tv_sec == v_atime - && d_inode(dchild)->i_size == 0 ) { - if (created) - *created = true; - break; - } - fallthrough; - case NFS4_CREATE_EXCLUSIVE4_1: - if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime - && d_inode(dchild)->i_atime.tv_sec == v_atime - && d_inode(dchild)->i_size == 0 ) { - if (created) - *created = true; - goto set_attr; - } - fallthrough; - case NFS3_CREATE_GUARDED: - err = nfserr_exist; - } - goto out; - } - - if (!IS_POSIXACL(dirp)) - iap->ia_mode &= ~current_umask(); - - host_err = vfs_create(dirp, dchild, iap->ia_mode, true); - if (host_err < 0) - goto out_nfserr; - if (created) - *created = true; - - nfsd_check_ignore_resizing(iap); - - if (nfsd_create_is_exclusive(createmode)) { - /* Cram the verifier into atime/mtime */ - iap->ia_valid = ATTR_MTIME|ATTR_ATIME - | ATTR_MTIME_SET|ATTR_ATIME_SET; - /* XXX someone who knows this better please fix it for nsec */ - iap->ia_mtime.tv_sec = v_mtime; - iap->ia_atime.tv_sec = v_atime; - iap->ia_mtime.tv_nsec = 0; - iap->ia_atime.tv_nsec = 0; - } - - set_attr: - err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); - - out: - fh_unlock(fhp); - if (dchild && !IS_ERR(dchild)) - dput(dchild); - fh_drop_write(fhp); - return err; - - out_nfserr: - err = nfserrno(host_err); - goto out; -} - /* * Read a symlink. On entry, *lenp must contain the maximum path length that * fits into the buffer. On return, it contains the true length. diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 1f32a83456b0..f99794b033a5 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -71,10 +71,6 @@ __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct iattr *iap); -__be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, - struct svc_fh *res, int createmode, - u32 *verifier, bool *truncp, bool *created); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); #ifdef CONFIG_NFSD_V4 @@ -161,10 +157,4 @@ static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) AT_STATX_SYNC_AS_STAT)); } -static inline int nfsd_create_is_exclusive(int createmode) -{ - return createmode == NFS3_CREATE_EXCLUSIVE - || createmode == NFS4_CREATE_EXCLUSIVE4_1; -} - #endif /* LINUX_NFSD_VFS_H */ From d8714bda3f692de172b0226f35b607df63a8c5a4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 27 Mar 2022 16:46:47 -0400 Subject: [PATCH 0822/1565] NFSD: Clean up nfsd_open_verified() [ Upstream commit f4d84c52643ae1d63a8e73e2585464470e7944d1 ] Its only caller always passes S_IFREG as the @type parameter. As an additional clean-up, add a kerneldoc comment. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 4 ++-- fs/nfsd/vfs.c | 15 ++++++++++++--- fs/nfsd/vfs.h | 2 +- 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7e99e75b75d7..3c297ccfcc59 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -996,8 +996,8 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) - status = nfsd_open_verified(rqstp, fhp, S_IFREG, - may_flags, &nf->nf_file); + status = nfsd_open_verified(rqstp, fhp, may_flags, + &nf->nf_file); else status = nfserr_jukebox; /* diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 26ae28b6f9b0..5c8dc1a05e57 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -852,14 +852,23 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, return err; } +/** + * nfsd_open_verified - Open a regular file for the filecache + * @rqstp: RPC request + * @fhp: NFS filehandle of the file to open + * @may_flags: internal permission flags + * @filp: OUT: open "struct file *" + * + * Returns an nfsstat value in network byte order. + */ __be32 -nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, - int may_flags, struct file **filp) +nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, int may_flags, + struct file **filp) { __be32 err; validate_process_creds(); - err = __nfsd_open(rqstp, fhp, type, may_flags, filp); + err = __nfsd_open(rqstp, fhp, S_IFREG, may_flags, filp); validate_process_creds(); return err; } diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index f99794b033a5..26347d76f44a 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -86,7 +86,7 @@ __be32 nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int nfsd_open_break_lease(struct inode *, int); __be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t, int, struct file **); -__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t, +__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, int, struct file **); __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file, loff_t offset, From c20097329d2c196b818c4666c7820c1378d69d61 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 30 Mar 2022 10:30:54 -0400 Subject: [PATCH 0823/1565] NFSD: Instantiate a struct file when creating a regular NFSv4 file [ Upstream commit fb70bf124b051d4ded4ce57511dfec6d3ebf2b43 ] There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever Tested-by: JianHong Yin [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 51 ++++++++++++++++++++++++++++++++++++++------- fs/nfsd/filecache.h | 2 ++ fs/nfsd/nfs4proc.c | 43 ++++++++++++++++++++++++++++++++++---- fs/nfsd/nfs4state.c | 16 +++++++++++--- fs/nfsd/xdr4.h | 1 + fs/open.c | 41 ++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 ++ 7 files changed, 142 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 3c297ccfcc59..db9c68a3c1f3 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -898,9 +898,9 @@ nfsd_file_is_cached(struct inode *inode) return ret; } -__be32 -nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) +static __be32 +nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf, bool open) { __be32 status; struct net *net = SVC_NET(rqstp); @@ -995,10 +995,13 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_gc(); nf->nf_mark = nfsd_file_mark_find_or_create(nf); - if (nf->nf_mark) - status = nfsd_open_verified(rqstp, fhp, may_flags, - &nf->nf_file); - else + if (nf->nf_mark) { + if (open) + status = nfsd_open_verified(rqstp, fhp, may_flags, + &nf->nf_file); + else + status = nfs_ok; + } else status = nfserr_jukebox; /* * If construction failed, or we raced with a call to unlink() @@ -1018,6 +1021,40 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; } +/** + * nfsd_file_acquire - Get a struct nfsd_file with an open file + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file to be opened + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, true); +} + +/** + * nfsd_file_create - Get a struct nfsd_file, do not open + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file just created + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, false); +} + /* * Note that fields may be added, removed or reordered in the future. Programs * scraping this file for info should test the labels to ensure they're diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 435ceab27897..1da0c79a5580 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -59,5 +59,7 @@ void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); +__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **nfp); int nfsd_file_cache_stats_open(struct inode *, struct file *); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index adb64c9259bd..166ebb126d37 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -243,6 +243,37 @@ static inline bool nfsd4_create_is_exclusive(int createmode) createmode == NFS4_CREATE_EXCLUSIVE4_1; } +static __be32 +nfsd4_vfs_create(struct svc_fh *fhp, struct dentry *child, + struct nfsd4_open *open) +{ + struct file *filp; + struct path path; + int oflags; + + oflags = O_CREAT | O_LARGEFILE; + switch (open->op_share_access & NFS4_SHARE_ACCESS_BOTH) { + case NFS4_SHARE_ACCESS_WRITE: + oflags |= O_WRONLY; + break; + case NFS4_SHARE_ACCESS_BOTH: + oflags |= O_RDWR; + break; + default: + oflags |= O_RDONLY; + } + + path.mnt = fhp->fh_export->ex_path.mnt; + path.dentry = child; + filp = dentry_create(&path, oflags, open->op_iattr.ia_mode, + current_cred()); + if (IS_ERR(filp)) + return nfserrno(PTR_ERR(filp)); + + open->op_filp = filp; + return nfs_ok; +} + /* * Implement NFSv4's unchecked, guarded, and exclusive create * semantics for regular files. Open state for this new file is @@ -355,11 +386,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask(); - host_err = vfs_create(inode, child, iap->ia_mode, true); - if (host_err < 0) { - status = nfserrno(host_err); + status = nfsd4_vfs_create(fhp, child, open); + if (status != nfs_ok) goto out; - } open->op_created = true; /* A newly created file already has a file size of zero. */ @@ -517,6 +546,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, (int)open->op_fnamelen, open->op_fname, open->op_openowner); + open->op_filp = NULL; + /* This check required by spec. */ if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL) return nfserr_inval; @@ -613,6 +644,10 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (reclaim && !status) nn->somebody_reclaimed = true; out: + if (open->op_filp) { + fput(open->op_filp); + open->op_filp = NULL; + } if (resfh && resfh != &cstate->current_fh) { fh_dup2(&cstate->current_fh, resfh); fh_put(resfh); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9116496b476a..9f972ca09eec 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5110,9 +5110,19 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, if (!fp->fi_fds[oflag]) { spin_unlock(&fp->fi_lock); - status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); - if (status) - goto out_put_access; + + if (!open->op_filp) { + status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); + if (status != nfs_ok) + goto out_put_access; + } else { + status = nfsd_file_create(rqstp, cur_fh, access, &nf); + if (status != nfs_ok) + goto out_put_access; + nf->nf_file = open->op_filp; + open->op_filp = NULL; + } + spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { fp->fi_fds[oflag] = nf; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 4a298ac5515d..448b687943cd 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -273,6 +273,7 @@ struct nfsd4_open { bool op_truncate; /* used during processing */ bool op_created; /* used during processing */ struct nfs4_openowner *op_openowner; /* used during processing */ + struct file *op_filp; /* used during processing */ struct nfs4_file *op_file; /* used during processing */ struct nfs4_ol_stateid *op_stp; /* used during processing */ struct nfs4_clnt_odstate *op_odstate; /* used during processing */ diff --git a/fs/open.c b/fs/open.c index 9f56ebacfbef..d69312a2d434 100644 --- a/fs/open.c +++ b/fs/open.c @@ -954,6 +954,47 @@ struct file *dentry_open(const struct path *path, int flags, } EXPORT_SYMBOL(dentry_open); +/** + * dentry_create - Create and open a file + * @path: path to create + * @flags: O_ flags + * @mode: mode bits for new file + * @cred: credentials to use + * + * Caller must hold the parent directory's lock, and have prepared + * a negative dentry, placed in @path->dentry, for the new file. + * + * Caller sets @path->mnt to the vfsmount of the filesystem where + * the new file is to be created. The parent directory and the + * negative dentry must reside on the same filesystem instance. + * + * On success, returns a "struct file *". Otherwise a ERR_PTR + * is returned. + */ +struct file *dentry_create(const struct path *path, int flags, umode_t mode, + const struct cred *cred) +{ + struct file *f; + int error; + + validate_creds(cred); + f = alloc_empty_file(flags, cred); + if (IS_ERR(f)) + return f; + + error = vfs_create(d_inode(path->dentry->d_parent), + path->dentry, mode, true); + if (!error) + error = vfs_open(path, f); + + if (unlikely(error)) { + fput(f); + return ERR_PTR(error); + } + return f; +} +EXPORT_SYMBOL(dentry_create); + struct file *open_with_fake_path(const struct path *path, int flags, struct inode *inode, const struct cred *cred) { diff --git a/include/linux/fs.h b/include/linux/fs.h index 3e9105b3cc76..7e7098bf2c57 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2618,6 +2618,8 @@ extern struct file *filp_open(const char *, int, umode_t); extern struct file *file_open_root(struct dentry *, struct vfsmount *, const char *, int, umode_t); extern struct file * dentry_open(const struct path *, int, const struct cred *); +extern struct file *dentry_create(const struct path *path, int flags, + umode_t mode, const struct cred *cred); extern struct file * open_with_fake_path(const struct path *, int, struct inode*, const struct cred *); static inline struct file *file_clone_open(struct file *file) From 2805c5439c9587eb263aa85f16f259f215a3fd9b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 30 Mar 2022 14:28:51 -0400 Subject: [PATCH 0824/1565] NFSD: Remove dprintk call sites from tail of nfsd4_open() [ Upstream commit f67a16b147045815b6aaafeef8663e5faeb6d569 ] Clean up: These relics are not likely to benefit server administrators. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 166ebb126d37..c21cfaabf873 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -622,13 +622,9 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, break; case NFS4_OPEN_CLAIM_DELEG_PREV_FH: case NFS4_OPEN_CLAIM_DELEGATE_PREV: - dprintk("NFSD: unsupported OPEN claim type %d\n", - open->op_claim_type); status = nfserr_notsupp; goto out; default: - dprintk("NFSD: Invalid OPEN claim type %d\n", - open->op_claim_type); status = nfserr_inval; goto out; } From bfe9aab120b236c18d8e3a575d4d5b9cdf5dec55 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 21 Mar 2022 16:41:32 -0400 Subject: [PATCH 0825/1565] NFSD: Fix whitespace [ Upstream commit 26320d7e317c37404c811603d50d811132aef78c ] Clean up: Pull case arms back one tab stop to conform every other switch statement in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 50 +++++++++++++++++++++++----------------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c21cfaabf873..90a12ccf9671 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -600,33 +600,33 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; switch (open->op_claim_type) { - case NFS4_OPEN_CLAIM_DELEGATE_CUR: - case NFS4_OPEN_CLAIM_NULL: - status = do_open_lookup(rqstp, cstate, open, &resfh); - if (status) - goto out; - break; - case NFS4_OPEN_CLAIM_PREVIOUS: - status = nfs4_check_open_reclaim(cstate->clp); - if (status) - goto out; - open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; - reclaim = true; - fallthrough; - case NFS4_OPEN_CLAIM_FH: - case NFS4_OPEN_CLAIM_DELEG_CUR_FH: - status = do_open_fhandle(rqstp, cstate, open); - if (status) - goto out; - resfh = &cstate->current_fh; - break; - case NFS4_OPEN_CLAIM_DELEG_PREV_FH: - case NFS4_OPEN_CLAIM_DELEGATE_PREV: - status = nfserr_notsupp; + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + case NFS4_OPEN_CLAIM_NULL: + status = do_open_lookup(rqstp, cstate, open, &resfh); + if (status) goto out; - default: - status = nfserr_inval; + break; + case NFS4_OPEN_CLAIM_PREVIOUS: + status = nfs4_check_open_reclaim(cstate->clp); + if (status) goto out; + open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; + reclaim = true; + fallthrough; + case NFS4_OPEN_CLAIM_FH: + case NFS4_OPEN_CLAIM_DELEG_CUR_FH: + status = do_open_fhandle(rqstp, cstate, open); + if (status) + goto out; + resfh = &cstate->current_fh; + break; + case NFS4_OPEN_CLAIM_DELEG_PREV_FH: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + status = nfserr_notsupp; + goto out; + default: + status = nfserr_inval; + goto out; } /* * nfsd4_process_open2() does the actual opening of the file. If From a38be004749649badfbcfad967b557a65258e2a2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 23 Mar 2022 13:55:37 -0400 Subject: [PATCH 0826/1565] NFSD: Move documenting comment for nfsd4_process_open2() [ Upstream commit 7e2ce0cc15a509b859199235a2bad9cece00f67a ] Clean up nfsd4_open() by converting a large comment at the only call site for nfsd4_process_open2() to a kerneldoc comment in front of that function. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 6 +----- fs/nfsd/nfs4state.c | 12 ++++++++++++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 90a12ccf9671..d62d962ed2f1 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -628,11 +628,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfserr_inval; goto out; } - /* - * nfsd4_process_open2() does the actual opening of the file. If - * successful, it (1) truncates the file if open->op_truncate was - * set, (2) sets open->op_stateid, (3) sets open->op_delegation. - */ + status = nfsd4_process_open2(rqstp, resfh, open); WARN(status && open->op_created, "nfsd4_process_open2 failed to open newly-created file! status=%u\n", diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9f972ca09eec..e0ce0412d7de 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5465,6 +5465,18 @@ static void nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open, */ } +/** + * nfsd4_process_open2 - finish open processing + * @rqstp: the RPC transaction being executed + * @current_fh: NFSv4 COMPOUND's current filehandle + * @open: OPEN arguments + * + * If successful, (1) truncate the file if open->op_truncate was + * set, (2) set open->op_stateid, (3) set open->op_delegation. + * + * Returns %nfs_ok on success; otherwise an nfs4stat value in + * network byte order is returned. + */ __be32 nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open) { From 7b8462f22a63a9ed341df3aa6dd9af437bb0c5ef Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 27 Mar 2022 16:42:20 -0400 Subject: [PATCH 0827/1565] NFSD: Trace filecache opens [ Upstream commit 0122e882119ddbd9efa6edfeeac3f5c704a7aeea ] Instrument calls to nfsd_open_verified() to get a sense of the filecache hit rate. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 +++-- fs/nfsd/trace.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index db9c68a3c1f3..bdb5de8c0803 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -996,10 +996,11 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { - if (open) + if (open) { status = nfsd_open_verified(rqstp, fhp, may_flags, &nf->nf_file); - else + trace_nfsd_file_open(nf, status); + } else status = nfs_ok; } else status = nfserr_jukebox; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8327d1601d71..c4c073e85fdd 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -795,6 +795,34 @@ TRACE_EVENT(nfsd_file_acquire, __entry->nf_file, __entry->status) ); +TRACE_EVENT(nfsd_file_open, + TP_PROTO(struct nfsd_file *nf, __be32 status), + TP_ARGS(nf, status), + TP_STRUCT__entry( + __field(unsigned int, nf_hashval) + __field(void *, nf_inode) /* cannot be dereferenced */ + __field(int, nf_ref) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(void *, nf_file) /* cannot be dereferenced */ + ), + TP_fast_assign( + __entry->nf_hashval = nf->nf_hashval; + __entry->nf_inode = nf->nf_inode; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", + __entry->nf_hashval, + __entry->nf_inode, + __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file) +) + DECLARE_EVENT_CLASS(nfsd_file_search_class, TP_PROTO(struct inode *inode, unsigned int hash, int found), TP_ARGS(inode, hash, found), From f7a1ecf2aa4b1d37854d0d3c8d64351e384ba314 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 27 Mar 2022 16:43:03 -0400 Subject: [PATCH 0828/1565] NFSD: Clean up the show_nf_flags() macro [ Upstream commit bb283ca18d1e67c82d22a329c96c9d6036a74790 ] The flags are defined using C macros, so TRACE_DEFINE_ENUM is unnecessary. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/trace.h | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index c4c073e85fdd..8ccce4ac66b4 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -703,12 +703,6 @@ DEFINE_CLID_EVENT(confirmed_r); /* * from fs/nfsd/filecache.h */ -TRACE_DEFINE_ENUM(NFSD_FILE_HASHED); -TRACE_DEFINE_ENUM(NFSD_FILE_PENDING); -TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ); -TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_WRITE); -TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED); - #define show_nf_flags(val) \ __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ From e1e87709c4539a0cd11cf0fbca737c861f20efcf Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 29 Apr 2022 10:06:21 -0400 Subject: [PATCH 0829/1565] SUNRPC: Use RMW bitops in single-threaded hot paths [ Upstream commit 28df0988815f63e2af5e6718193c9f68681ad7ff ] I noticed CPU pipeline stalls while using perf. Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags. Thus bus-locked atomics are not needed outside the svc thread scheduler. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 7 ++++--- fs/nfsd/nfs4xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 4 ++-- net/sunrpc/svc.c | 6 +++--- net/sunrpc/svc_xprt.c | 2 +- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 7 files changed, 16 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d62d962ed2f1..adbac1e77e9e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -970,7 +970,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * the client wants us to do more in this compound: */ if (!nfsd4_last_compound_op(rqstp)) - clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* check stateid */ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, @@ -2642,11 +2642,12 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) cstate->minorversion = args->minorversion; fh_init(current_fh, NFS4_FHSIZE); fh_init(save_fh, NFS4_FHSIZE); + /* * Don't use the deferral mechanism for NFSv4; compounds make it * too hard to avoid non-idempotency problems. */ - clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); /* * According to RFC3010, this takes precedence over all other errors. @@ -2761,7 +2762,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) out: cstate->status = status; /* Reset deferral mechanism for RPC deferrals */ - set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); return rpc_success; } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a9f6c7eeb756..804c137fabec 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2411,7 +2411,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE; if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) - clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); return true; } diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 329eac782cc5..abaa952ae751 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -898,7 +898,7 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct g * rejecting the server-computed MIC in this somewhat rare case, * do not use splice with the GSS integrity service. */ - clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Did we already verify the signature on the original pass through? */ if (rqstp->rq_deferred) @@ -970,7 +970,7 @@ unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gs int pad, remaining_len, offset; u32 rseqno; - clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); priv_len = svc_getnl(&buf->head[0]); if (rqstp->rq_deferred) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 7f231947347e..9caaed1c143e 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1276,10 +1276,10 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto err_short_len; /* Will be turned off by GSS integrity and privacy services */ - set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + __set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Will be turned off only when NFSv4 Sessions are used */ - set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); - clear_bit(RQ_DROPME, &rqstp->rq_flags); + __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + __clear_bit(RQ_DROPME, &rqstp->rq_flags); svc_putu32(resv, rqstp->rq_xid); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 000b737784bd..ea65036dc506 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1228,7 +1228,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) trace_svc_defer(rqstp); svc_xprt_get(rqstp->rq_xprt); dr->xprt = rqstp->rq_xprt; - set_bit(RQ_DROPME, &rqstp->rq_flags); + __set_bit(RQ_DROPME, &rqstp->rq_flags); dr->handle.revisit = svc_revisit; return &dr->handle; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 90f6231d6ed6..ff8dcebbdfc4 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -309,9 +309,9 @@ static void svc_sock_setbufsize(struct svc_sock *svsk, unsigned int nreqs) static void svc_sock_secure_port(struct svc_rqst *rqstp) { if (svc_port_is_privileged(svc_addr(rqstp))) - set_bit(RQ_SECURE, &rqstp->rq_flags); + __set_bit(RQ_SECURE, &rqstp->rq_flags); else - clear_bit(RQ_SECURE, &rqstp->rq_flags); + __clear_bit(RQ_SECURE, &rqstp->rq_flags); } /* @@ -1014,9 +1014,9 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) rqstp->rq_xprt_ctxt = NULL; rqstp->rq_prot = IPPROTO_TCP; if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags)) - set_bit(RQ_LOCAL, &rqstp->rq_flags); + __set_bit(RQ_LOCAL, &rqstp->rq_flags); else - clear_bit(RQ_LOCAL, &rqstp->rq_flags); + __clear_bit(RQ_LOCAL, &rqstp->rq_flags); p = (__be32 *)rqstp->rq_arg.head[0].iov_base; calldir = p[1]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index c895f80df659..e72e36989985 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -606,7 +606,7 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt) static void svc_rdma_secure_port(struct svc_rqst *rqstp) { - set_bit(RQ_SECURE, &rqstp->rq_flags); + __set_bit(RQ_SECURE, &rqstp->rq_flags); } static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt) From ff4e7a4b497a437c9091efcf5d4e44362a9fb961 Mon Sep 17 00:00:00 2001 From: Zhang Xiaoxu Date: Sat, 21 May 2022 12:08:44 +0800 Subject: [PATCH 0830/1565] nfsd: Unregister the cld notifier when laundry_wq create failed [ Upstream commit 62fdb65edb6c43306c774939001f3a00974832aa ] If laundry_wq create failed, the cld notifier should be unregistered. Signed-off-by: Zhang Xiaoxu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 322a208878f2..55949e60897d 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1543,12 +1543,14 @@ static int __init init_nfsd(void) goto out_free_filesystem; retval = register_cld_notifier(); if (retval) - goto out_free_all; + goto out_free_subsys; retval = nfsd4_create_laundry_wq(); if (retval) goto out_free_all; return 0; out_free_all: + unregister_cld_notifier(); +out_free_subsys: unregister_pernet_subsys(&nfsd_net_ops); out_free_filesystem: unregister_filesystem(&nfsd_fs_type); From 15081df04a6ec7ff7adc03d897751c048395fa18 Mon Sep 17 00:00:00 2001 From: Zhang Xiaoxu Date: Sat, 21 May 2022 12:08:45 +0800 Subject: [PATCH 0831/1565] nfsd: Fix null-ptr-deref in nfsd_fill_super() [ Upstream commit 6f6f84aa215f7b6665ccbb937db50860f9ec2989 ] KASAN report null-ptr-deref as follows: BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd] Write of size 8 at addr 000000000000005d by task a.out/852 CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: dump_stack_lvl+0x34/0x44 kasan_report+0xab/0x120 ? nfsd_mkdir+0x71/0x1c0 [nfsd] ? nfsd_fill_super+0xc6/0xe0 [nfsd] nfsd_fill_super+0xc6/0xe0 [nfsd] ? nfsd_mkdir+0x1c0/0x1c0 [nfsd] get_tree_keyed+0x8e/0x100 vfs_get_tree+0x41/0xf0 __do_sys_fsconfig+0x590/0x670 ? fscontext_read+0x180/0x180 ? anon_inode_getfd+0x4f/0x70 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduce by concurrent operations: 1. fsopen(nfsd)/fsconfig 2. insmod/rmmod nfsd Since the nfsd file system is registered before than nfsd_net allocated, the caller may get the file_system_type and use the nfsd_net before it allocated, then null-ptr-deref occurred. So init_nfsd() should call register_filesystem() last. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Signed-off-by: Zhang Xiaoxu Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 55949e60897d..0621c2faf242 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1535,25 +1535,25 @@ static int __init init_nfsd(void) retval = create_proc_exports_entry(); if (retval) goto out_free_lockd; - retval = register_filesystem(&nfsd_fs_type); - if (retval) - goto out_free_exports; retval = register_pernet_subsys(&nfsd_net_ops); if (retval < 0) - goto out_free_filesystem; + goto out_free_exports; retval = register_cld_notifier(); if (retval) goto out_free_subsys; retval = nfsd4_create_laundry_wq(); + if (retval) + goto out_free_cld; + retval = register_filesystem(&nfsd_fs_type); if (retval) goto out_free_all; return 0; out_free_all: + nfsd4_destroy_laundry_wq(); +out_free_cld: unregister_cld_notifier(); out_free_subsys: unregister_pernet_subsys(&nfsd_net_ops); -out_free_filesystem: - unregister_filesystem(&nfsd_fs_type); out_free_exports: remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); @@ -1571,6 +1571,7 @@ static int __init init_nfsd(void) static void __exit exit_nfsd(void) { + unregister_filesystem(&nfsd_fs_type); nfsd4_destroy_laundry_wq(); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); @@ -1581,7 +1582,6 @@ static void __exit exit_nfsd(void) nfsd_lockd_shutdown(); nfsd4_free_slabs(); nfsd4_exit_pnfs(); - unregister_filesystem(&nfsd_fs_type); } MODULE_AUTHOR("Olaf Kirch "); From 13459d22256aa189ea6de1a8fbf9415b2fafb988 Mon Sep 17 00:00:00 2001 From: Julian Schroeder Date: Mon, 23 May 2022 18:52:26 +0000 Subject: [PATCH 0832/1565] nfsd: destroy percpu stats counters after reply cache shutdown MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit fd5e363eac77ef81542db77ddad0559fa0f9204e ] Upon nfsd shutdown any pending DRC cache is freed. DRC cache use is tracked via a percpu counter. In the current code the percpu counter is destroyed before. If any pending cache is still present, percpu_counter_add is called with a percpu counter==NULL. This causes a kernel crash. The solution is to destroy the percpu counter after the cache is freed. Fixes: e567b98ce9a4b (“nfsd: protect concurrent access to nfsd stats counters”) Signed-off-by: Julian Schroeder Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfscache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 0b3f12aa37ff..7da88bdc0d6c 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -206,7 +206,6 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) struct svc_cacherep *rp; unsigned int i; - nfsd_reply_cache_stats_destroy(nn); unregister_shrinker(&nn->nfsd_reply_cache_shrinker); for (i = 0; i < nn->drc_hashsize; i++) { @@ -217,6 +216,7 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) rp, nn); } } + nfsd_reply_cache_stats_destroy(nn); kvfree(nn->drc_hashtbl); nn->drc_hashtbl = NULL; From 345e2e48d8df5941ceddf283cf4977c13620c669 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 22 May 2022 12:07:18 -0400 Subject: [PATCH 0833/1565] NFSD: Modernize nfsd4_release_lockowner() [ Upstream commit bd8fdb6e545f950f4654a9a10d7e819ad48146e5 ] Refactor: Use existing helpers that other lock operations use. This change removes several automatic variables, so re-organize the variable declarations for readability. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 38 ++++++++++++-------------------------- 1 file changed, 12 insertions(+), 26 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e0ce0412d7de..1c32765e86b1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7562,16 +7562,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, union nfsd4_op_u *u) { struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; - clientid_t *clid = &rlockowner->rl_clientid; - struct nfs4_stateowner *sop; - struct nfs4_lockowner *lo = NULL; - struct nfs4_ol_stateid *stp; - struct xdr_netobj *owner = &rlockowner->rl_owner; - unsigned int hashval = ownerstr_hashval(owner); - __be32 status; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + clientid_t *clid = &rlockowner->rl_clientid; + struct nfs4_ol_stateid *stp; + struct nfs4_lockowner *lo; struct nfs4_client *clp; - LIST_HEAD (reaplist); + LIST_HEAD(reaplist); + __be32 status; dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id); @@ -7579,30 +7576,19 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, status = set_client(clid, cstate, nn); if (status) return status; - clp = cstate->clp; - /* Find the matching lock stateowner */ + spin_lock(&clp->cl_lock); - list_for_each_entry(sop, &clp->cl_ownerstr_hashtbl[hashval], - so_strhash) { - - if (sop->so_is_open_owner || !same_owner_str(sop, owner)) - continue; - - if (atomic_read(&sop->so_count) != 1) { - spin_unlock(&clp->cl_lock); - return nfserr_locks_held; - } - - lo = lockowner(sop); - nfs4_get_stateowner(sop); - break; - } + lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner); if (!lo) { spin_unlock(&clp->cl_lock); return status; } - + if (atomic_read(&lo->lo_owner.so_count) != 2) { + spin_unlock(&clp->cl_lock); + nfs4_put_stateowner(&lo->lo_owner); + return nfserr_locks_held; + } unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) { stp = list_first_entry(&lo->lo_owner.so_stateids, From 06252d1bd57ab7f1cc9c9e8b94d158a059e63d83 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 22 May 2022 12:34:38 -0400 Subject: [PATCH 0834/1565] NFSD: Add documenting comment for nfsd4_release_lockowner() [ Upstream commit 043862b09cc00273e35e6c3a6389957953a34207 ] And return explicit nfserr values that match what is documented in the new comment / API contract. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1c32765e86b1..76ec207f5e44 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7556,6 +7556,23 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) return status; } +/** + * nfsd4_release_lockowner - process NFSv4.0 RELEASE_LOCKOWNER operations + * @rqstp: RPC transaction + * @cstate: NFSv4 COMPOUND state + * @u: RELEASE_LOCKOWNER arguments + * + * The lockowner's so_count is bumped when a lock record is added + * or when copying a conflicting lock. The latter case is brief, + * but can lead to fleeting false positives when looking for + * locks-in-use. + * + * Return values: + * %nfs_ok: lockowner released or not found + * %nfserr_locks_held: lockowner still in use + * %nfserr_stale_clientid: clientid no longer active + * %nfserr_expired: clientid not recognized + */ __be32 nfsd4_release_lockowner(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, @@ -7582,7 +7599,7 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner); if (!lo) { spin_unlock(&clp->cl_lock); - return status; + return nfs_ok; } if (atomic_read(&lo->lo_owner.so_count) != 2) { spin_unlock(&clp->cl_lock); @@ -7598,11 +7615,11 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, put_ol_stateid_locked(stp, &reaplist); } spin_unlock(&clp->cl_lock); + free_ol_stateid_reaplist(&reaplist); remove_blocked_locks(lo); nfs4_put_stateowner(&lo->lo_owner); - - return status; + return nfs_ok; } static inline struct nfs4_client_reclaim * From 4862b618860332d23218f2caef372d1a6d251e97 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 11 May 2022 13:02:21 -0400 Subject: [PATCH 0835/1565] NFSD: nfsd_file_put() can sleep [ Upstream commit 08af54b3e5729bc1d56ad3190af811301bdc37a1 ] Now that there are no more callers of nfsd_file_put() that might hold a spin lock, ensure the lockdep infrastructure can catch newly introduced calls to nfsd_file_put() made while a spinlock is held. Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracle.com/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index bdb5de8c0803..11c096b44740 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -302,6 +302,8 @@ nfsd_file_put_noref(struct nfsd_file *nf) void nfsd_file_put(struct nfsd_file *nf) { + might_sleep(); + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); From ada1757b259f353cade47037ee0a0249b4cddad3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 31 May 2022 19:49:01 -0400 Subject: [PATCH 0836/1565] NFSD: Fix potential use-after-free in nfsd_file_put() [ Upstream commit b6c71c66b0ad8f2b59d9bc08c7a5079b110bec01 ] nfsd_file_put_noref() can free @nf, so don't dereference @nf immediately upon return from nfsd_file_put_noref(). Suggested-by: Trond Myklebust Fixes: 999397926ab3 ("nfsd: Clean up nfsd_file_put()") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 11c096b44740..fc0fcb332153 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -308,11 +308,12 @@ nfsd_file_put(struct nfsd_file *nf) if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); - } else { + } else if (nf->nf_file) { nfsd_file_put_noref(nf); - if (nf->nf_file) - nfsd_file_schedule_laundrette(); - } + nfsd_file_schedule_laundrette(); + } else + nfsd_file_put_noref(nf); + if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) nfsd_file_gc(); } From 6943f1073abe910e7cd8c1383e7f330c4111b7a0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 7 Jun 2022 16:47:58 -0400 Subject: [PATCH 0837/1565] SUNRPC: Optimize xdr_reserve_space() [ Upstream commit 62ed448cc53b654036f7d7f3c99f299d79ad14c3 ] Transitioning between encode buffers is quite infrequent. It happens about 1 time in 400 calls to xdr_reserve_space(), measured on NFSD with a typical build/test workload. Force the compiler to remove that code from xdr_reserve_space(), which is a hot path on both the server and the client. This change reduces the size of xdr_reserve_space() from 10 cache lines to 2 when compiled with -Os. Signed-off-by: Chuck Lever Reviewed-by: J. Bruce Fields Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/xdr.h | 16 +++++++++++++++- net/sunrpc/xdr.c | 17 ++++++++++------- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 927f1458bcab..b8fb866e1fd3 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -242,7 +242,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); -extern void xdr_commit_encode(struct xdr_stream *xdr); +extern void __xdr_commit_encode(struct xdr_stream *xdr); extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len); extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen); extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, @@ -305,6 +305,20 @@ xdr_reset_scratch_buffer(struct xdr_stream *xdr) xdr_set_scratch_buffer(xdr, NULL, 0); } +/** + * xdr_commit_encode - Ensure all data is written to xdr->buf + * @xdr: pointer to xdr_stream + * + * Handle encoding across page boundaries by giving the caller a + * temporary location to write to, then later copying the data into + * place. __xdr_commit_encode() does that copying. + */ +static inline void xdr_commit_encode(struct xdr_stream *xdr) +{ + if (unlikely(xdr->scratch.iov_len)) + __xdr_commit_encode(xdr); +} + /** * xdr_stream_remaining - Return the number of bytes remaining in the stream * @xdr: pointer to struct xdr_stream diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index 722586696fad..f66e2de7cd27 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -691,7 +691,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, EXPORT_SYMBOL_GPL(xdr_init_encode); /** - * xdr_commit_encode - Ensure all data is written to buffer + * __xdr_commit_encode - Ensure all data is written to buffer * @xdr: pointer to xdr_stream * * We handle encoding across page boundaries by giving the caller a @@ -703,22 +703,25 @@ EXPORT_SYMBOL_GPL(xdr_init_encode); * required at the end of encoding, or any other time when the xdr_buf * data might be read. */ -inline void xdr_commit_encode(struct xdr_stream *xdr) +void __xdr_commit_encode(struct xdr_stream *xdr) { int shift = xdr->scratch.iov_len; void *page; - if (shift == 0) - return; page = page_address(*xdr->page_ptr); memcpy(xdr->scratch.iov_base, page, shift); memmove(page, page + shift, (void *)xdr->p - page); xdr_reset_scratch_buffer(xdr); } -EXPORT_SYMBOL_GPL(xdr_commit_encode); +EXPORT_SYMBOL_GPL(__xdr_commit_encode); -static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, - size_t nbytes) +/* + * The buffer space to be reserved crosses the boundary between + * xdr->buf->head and xdr->buf->pages, or between two pages + * in xdr->buf->pages. + */ +static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, + size_t nbytes) { __be32 *p; int space_left; From e9156a243175cd4f4b574709122dcef4cc14d829 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 27 Jun 2022 20:47:19 +0300 Subject: [PATCH 0838/1565] fanotify: refine the validation checks on non-dir inode mask [ Upstream commit 8698e3bab4dd7968666e84e111d0bfd17c040e77 ] Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 30 +++++++++++++++--------------- include/linux/fanotify.h | 4 ++++ 2 files changed, 19 insertions(+), 15 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index f4a5e9074dd4..990464e00aec 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1501,10 +1501,14 @@ static int fanotify_test_fid(struct dentry *dentry) return 0; } -static int fanotify_events_supported(struct path *path, __u64 mask, +static int fanotify_events_supported(struct fsnotify_group *group, + struct path *path, __u64 mask, unsigned int flags) { unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; + /* Strict validation of events in non-dir inode mask with v5.17+ APIs */ + bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) || + (mask & FAN_RENAME); /* * Some filesystems such as 'proc' acquire unusual locks when opening @@ -1532,6 +1536,15 @@ static int fanotify_events_supported(struct path *path, __u64 mask, path->mnt->mnt_sb->s_flags & SB_NOUSER) return -EINVAL; + /* + * We shouldn't have allowed setting dirent events and the directory + * flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode, + * but because we always allowed it, error only when using new APIs. + */ + if (strict_dir_events && mark_type == FAN_MARK_INODE && + !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS)) + return -ENOTDIR; + return 0; } @@ -1678,7 +1691,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out; if (flags & FAN_MARK_ADD) { - ret = fanotify_events_supported(&path, mask, flags); + ret = fanotify_events_supported(group, &path, mask, flags); if (ret) goto path_put_and_out; } @@ -1701,19 +1714,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt; - /* - * FAN_RENAME is not allowed on non-dir (for now). - * We shouldn't have allowed setting any dirent events in mask of - * non-dir, but because we always allowed it, error only if group - * was initialized with the new flag FAN_REPORT_TARGET_FID. - */ - ret = -ENOTDIR; - if (inode && !S_ISDIR(inode->i_mode) && - ((mask & FAN_RENAME) || - ((mask & FANOTIFY_DIRENT_EVENTS) && - FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID)))) - goto path_put_and_out; - /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 81f45061c1b1..4f6cbe6c6e23 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -113,6 +113,10 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ FANOTIFY_PERM_EVENTS | \ FAN_Q_OVERFLOW | FAN_ONDIR) +/* Events and flags relevant only for directories */ +#define FANOTIFY_DIRONLY_EVENT_BITS (FANOTIFY_DIRENT_EVENTS | \ + FAN_EVENT_ON_CHILD | FAN_ONDIR) + #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \ FANOTIFY_EVENT_FLAGS) From 58f985d688aad7e3ee970e43377dfe8a79cf594c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 23 Jun 2022 14:47:34 +1000 Subject: [PATCH 0839/1565] NFS: restore module put when manager exits. [ Upstream commit 080abad71e99d2becf38c978572982130b927a28 ] Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed calls to module_put_and_kthread_exit() from threads that acted as SUNRPC servers and had a related svc_serv_ops structure. This was correct. It ALSO removed the module_put_and_kthread_exit() call from nfs4_run_state_manager() which is NOT a SUNRPC service. Consequently every time the NFSv4 state manager runs the module count increments and won't be decremented. So the nfsv4 module cannot be unloaded. So restore the module_put_and_kthread_exit() call. Fixes: f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") Signed-off-by: NeilBrown Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/nfs4state.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index ae2da65ffbaf..d8fc5d72a161 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,6 +2757,7 @@ static int nfs4_run_state_manager(void *ptr) goto again; nfs_put_client(clp); + module_put_and_kthread_exit(0); return 0; } From 845b309cf58604925de6fd3e4cc11c9d788ca0f7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 10 Jul 2022 14:46:04 -0400 Subject: [PATCH 0840/1565] NFSD: Decode NFSv4 birth time attribute [ Upstream commit 5b2f3e0777da2a5dd62824bbe2fdab1d12caaf8f ] NFSD has advertised support for the NFSv4 time_create attribute since commit e377a3e698fb ("nfsd: Add support for the birth time attribute"). Igor Mammedov reports that Mac OS clients attempt to set the NFSv4 birth time attribute via OPEN(CREATE) and SETATTR if the server indicates that it supports it, but since the above commit was merged, those attempts now fail. Table 5 in RFC 8881 lists the time_create attribute as one that can be both set and retrieved, but the above commit did not add server support for clients to provide a time_create attribute. IMO that's a bug in our implementation of the NFSv4 protocol, which this commit addresses. Whether NFSD silently ignores the new birth time or actually sets it is another matter. I haven't found another filesystem service in the Linux kernel that enables users or clients to modify a file's birth time attribute. This commit reflects my (perhaps incorrect) understanding of whether Linux users can set a file's birth time. NFSD will now recognize a time_create attribute but it ignores its value. It clears the time_create bit in the returned attribute bitmask to indicate that the value was not used. Reported-by: Igor Mammedov Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Tested-by: Igor Mammedov Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 9 +++++++++ fs/nfsd/nfsd.h | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 804c137fabec..b98a24c2a753 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -470,6 +470,15 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, return nfserr_bad_xdr; } } + if (bmval[1] & FATTR4_WORD1_TIME_CREATE) { + struct timespec64 ts; + + /* No Linux filesystem supports setting this attribute. */ + bmval[1] &= ~FATTR4_WORD1_TIME_CREATE; + status = nfsd4_decode_nfstime4(argp, &ts); + if (status) + return status; + } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { u32 set_it; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 847b482155ae..9a8b09afc173 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -465,7 +465,8 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) (FATTR4_WORD0_SIZE | FATTR4_WORD0_ACL) #define NFSD_WRITEABLE_ATTRS_WORD1 \ (FATTR4_WORD1_MODE | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP \ - | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_MODIFY_SET) + | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_CREATE \ + | FATTR4_WORD1_TIME_MODIFY_SET) #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #define MAYBE_FATTR4_WORD2_SECURITY_LABEL \ FATTR4_WORD2_SECURITY_LABEL From 486c1acf14233b1ff7b37e6f026a737a2f7f53f1 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 11 Jul 2022 14:30:13 -0400 Subject: [PATCH 0841/1565] lockd: set fl_owner when unlocking files [ Upstream commit aec158242b87a43d83322e99bc71ab4428e5ab79 ] Unlocking a POSIX lock on an inode with vfs_lock_file only works if the owner matches. Ensure we set it in the request. Cc: J. Bruce Fields Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 0a22a2faf552..b2f277727469 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) } } -static int nlm_unlock_files(struct nlm_file *file) +static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) { struct file_lock lock; @@ -184,6 +184,7 @@ static int nlm_unlock_files(struct nlm_file *file) lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; + lock.fl_owner = owner; if (file->f_file[O_RDONLY] && vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) goto out_err; @@ -225,7 +226,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, if (match(lockhost, host)) { spin_unlock(&flctx->flc_lock); - if (nlm_unlock_files(file)) + if (nlm_unlock_files(file, fl->fl_owner)) return 1; goto again; } From 795f9fa1b50b6578eeb16c98cf90d50cd9a0961c Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 11 Jul 2022 14:30:14 -0400 Subject: [PATCH 0842/1565] lockd: fix nlm_close_files [ Upstream commit 1197eb5906a5464dbaea24cac296dfc38499cc00 ] This loop condition tries a bit too hard to be clever. Just test for the two indices we care about explicitly. Cc: J. Bruce Fields Fixes: 7f024fcd5c97 ("Keep read and write fds with each nlm_file") Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index b2f277727469..e1c4617de771 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -283,11 +283,10 @@ nlm_file_inuse(struct nlm_file *file) static void nlm_close_files(struct nlm_file *file) { - struct file *f; - - for (f = file->f_file[0]; f <= file->f_file[1]; f++) - if (f) - nlmsvc_ops->fclose(f); + if (file->f_file[O_RDONLY]) + nlmsvc_ops->fclose(file->f_file[O_RDONLY]); + if (file->f_file[O_WRONLY]) + nlmsvc_ops->fclose(file->f_file[O_WRONLY]); } /* From 71860cc4e4365496f748c999ef2a6915005d5b3a Mon Sep 17 00:00:00 2001 From: Oliver Ford Date: Wed, 18 May 2022 15:59:59 +0100 Subject: [PATCH 0843/1565] fs: inotify: Fix typo in inotify comment [ Upstream commit c05787b4c2f80a3bebcb9cdbf255d4fa5c1e24e1 ] Correct spelling in comment. Signed-off-by: Oliver Ford Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220518145959.41-1-ojford@gmail.com Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/inotify/inotify_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 3d5d536f8fd6..7360d16ce46d 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -123,7 +123,7 @@ static inline u32 inotify_mask_to_arg(__u32 mask) IN_Q_OVERFLOW); } -/* intofiy userspace file descriptor functions */ +/* inotify userspace file descriptor functions */ static __poll_t inotify_poll(struct file *file, poll_table *wait) { struct fsnotify_group *group = file->private_data; From b8d06d1187961381e0c40c70f27f3ff79c323e3a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:08 +0300 Subject: [PATCH 0844/1565] fanotify: prepare for setting event flags in ignore mask [ Upstream commit 31a371e419c885e0f137ce70395356ba8639dc52 ] Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 19 ++++--- fs/notify/fanotify/fanotify_user.c | 21 ++++--- fs/notify/fdinfo.c | 6 +- fs/notify/fsnotify.c | 21 ++++--- include/linux/fsnotify_backend.h | 89 ++++++++++++++++++++++++++++-- 5 files changed, 121 insertions(+), 35 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 4f897e109547..cd7d09a569ff 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -295,12 +295,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, const void *data, int data_type, struct inode *dir) { - __u32 marks_mask = 0, marks_ignored_mask = 0; + __u32 marks_mask = 0, marks_ignore_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | FANOTIFY_EVENT_FLAGS; const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct fsnotify_mark *mark; + bool ondir = event_mask & FAN_ONDIR; int type; pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n", @@ -315,19 +316,21 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; } else if (!(fid_mode & FAN_REPORT_FID)) { /* Do we have a directory inode to report? */ - if (!dir && !(event_mask & FS_ISDIR)) + if (!dir && !ondir) return 0; } fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - /* Apply ignore mask regardless of mark's ISDIR flag */ - marks_ignored_mask |= mark->ignored_mask; + /* + * Apply ignore mask depending on event flags in ignore mask. + */ + marks_ignore_mask |= + fsnotify_effective_ignore_mask(mark, ondir, type); /* - * If the event is on dir and this mark doesn't care about - * events on dir, don't send it! + * Send the event depending on event flags in mark mask. */ - if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR)) + if (!fsnotify_mask_applicable(mark->mask, ondir, type)) continue; marks_mask |= mark->mask; @@ -336,7 +339,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, *match_mask |= 1U << type; } - test_mask = event_mask & marks_mask & ~marks_ignored_mask; + test_mask = event_mask & marks_mask & ~marks_ignore_mask; /* * For dirent modification events (create/delete/move) that do not carry diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 990464e00aec..0d61cb0e4907 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1000,7 +1000,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, if (!(flags & FAN_MARK_IGNORED_MASK)) { fsn_mark->mask &= ~mask; } else { - fsn_mark->ignored_mask &= ~mask; + fsn_mark->ignore_mask &= ~mask; } newmask = fsnotify_calc_mask(fsn_mark); /* @@ -1009,7 +1009,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, * changes to the mask. * Destroy mark when only umask bits remain. */ - *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); + *destroy = !((fsn_mark->mask | fsn_mark->ignore_mask) & ~umask); spin_unlock(&fsn_mark->lock); return oldmask & ~newmask; @@ -1078,7 +1078,7 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, /* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set - * because of an ignored mask that is now going to survive FS_MODIFY. + * because of an ignore mask that is now going to survive FS_MODIFY. */ if ((fan_flags & FAN_MARK_IGNORED_MASK) && (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && @@ -1111,7 +1111,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, if (!(fan_flags & FAN_MARK_IGNORED_MASK)) fsn_mark->mask |= mask; else - fsn_mark->ignored_mask |= mask; + fsn_mark->ignore_mask |= mask; recalc = fsnotify_calc_mask(fsn_mark) & ~fsnotify_conn_mask(fsn_mark->connector); @@ -1249,7 +1249,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group, /* * If some other task has this inode open for write we should not add - * an ignored mark, unless that ignored mark is supposed to survive + * an ignore mask, unless that ignore mask is supposed to survive * modification changes anyway. */ if ((flags & FAN_MARK_IGNORED_MASK) && @@ -1559,7 +1559,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; - bool ignored = flags & FAN_MARK_IGNORED_MASK; + bool ignore = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; u32 umask = 0; int ret; @@ -1608,8 +1608,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL; - /* Event flags (ONDIR, ON_CHILD) are meaningless in ignored mask */ - if (ignored) + /* + * Event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) have no effect with + * FAN_MARK_IGNORED_MASK. + */ + if (ignore) mask &= ~FANOTIFY_EVENT_FLAGS; f = fdget(fanotify_fd); @@ -1723,7 +1726,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, * events with parent/name info for non-directory. */ if ((fid_mode & FAN_REPORT_DIR_FID) && - (flags & FAN_MARK_ADD) && !ignored) + (flags & FAN_MARK_ADD) && !ignore) mask |= FAN_EVENT_ON_CHILD; } diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 59fb40abe33d..55081ae3a6ec 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -113,7 +113,7 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) return; seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ", inode->i_ino, inode->i_sb->s_dev, - mflags, mark->mask, mark->ignored_mask); + mflags, mark->mask, mark->ignore_mask); show_mark_fhandle(m, inode); seq_putc(m, '\n'); iput(inode); @@ -121,12 +121,12 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) struct mount *mnt = fsnotify_conn_mount(mark->connector); seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n", - mnt->mnt_id, mflags, mark->mask, mark->ignored_mask); + mnt->mnt_id, mflags, mark->mask, mark->ignore_mask); } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_SB) { struct super_block *sb = fsnotify_conn_sb(mark->connector); seq_printf(m, "fanotify sdev:%x mflags:%x mask:%x ignored_mask:%x\n", - sb->s_dev, mflags, mark->mask, mark->ignored_mask); + sb->s_dev, mflags, mark->mask, mark->ignore_mask); } } diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 0b3e74935cb4..8687562df2e3 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -324,7 +324,8 @@ static int send_to_group(__u32 mask, const void *data, int data_type, struct fsnotify_group *group = NULL; __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); __u32 marks_mask = 0; - __u32 marks_ignored_mask = 0; + __u32 marks_ignore_mask = 0; + bool is_dir = mask & FS_ISDIR; struct fsnotify_mark *mark; int type; @@ -336,7 +337,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type, fsnotify_foreach_iter_mark_type(iter_info, mark, type) { if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - mark->ignored_mask = 0; + mark->ignore_mask = 0; } } @@ -344,14 +345,15 @@ static int send_to_group(__u32 mask, const void *data, int data_type, fsnotify_foreach_iter_mark_type(iter_info, mark, type) { group = mark->group; marks_mask |= mark->mask; - marks_ignored_mask |= mark->ignored_mask; + marks_ignore_mask |= + fsnotify_effective_ignore_mask(mark, is_dir, type); } - pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", - __func__, group, mask, marks_mask, marks_ignored_mask, + pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignore_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", + __func__, group, mask, marks_mask, marks_ignore_mask, data, data_type, dir, cookie); - if (!(test_mask & marks_mask & ~marks_ignored_mask)) + if (!(test_mask & marks_mask & ~marks_ignore_mask)) return 0; if (group->ops->handle_event) { @@ -423,7 +425,8 @@ static bool fsnotify_iter_select_report_types( * But is *this mark* watching children? */ if (type == FSNOTIFY_ITER_TYPE_PARENT && - !(mark->mask & FS_EVENT_ON_CHILD)) + !(mark->mask & FS_EVENT_ON_CHILD) && + !(fsnotify_ignore_mask(mark) & FS_EVENT_ON_CHILD)) continue; fsnotify_iter_set_report_type(iter_info, type); @@ -532,8 +535,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, /* - * If this is a modify event we may need to clear some ignored masks. - * In that case, the object with ignored masks will have the FS_MODIFY + * If this is a modify event we may need to clear some ignore masks. + * In that case, the object with ignore masks will have the FS_MODIFY * event in its mask. * Otherwise, return if none of the marks care about this type of event. */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9560734759fa..d7d96c806bff 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -518,8 +518,8 @@ struct fsnotify_mark { struct hlist_node obj_list; /* Head of list of marks for an object [mark ref] */ struct fsnotify_mark_connector *connector; - /* Events types to ignore [mark->lock, group->mark_mutex] */ - __u32 ignored_mask; + /* Events types and flags to ignore [mark->lock, group->mark_mutex] */ + __u32 ignore_mask; /* General fsnotify mark flags */ #define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 #define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 @@ -529,6 +529,7 @@ struct fsnotify_mark { /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 #define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 +#define FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS 0x0400 unsigned int flags; /* flags [mark->lock] */ }; @@ -655,15 +656,91 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, /* functions used to manipulate the marks attached to inodes */ -/* Get mask for calculating object interest taking ignored mask into account */ +/* + * Canonical "ignore mask" including event flags. + * + * Note the subtle semantic difference from the legacy ->ignored_mask. + * ->ignored_mask traditionally only meant which events should be ignored, + * while ->ignore_mask also includes flags regarding the type of objects on + * which events should be ignored. + */ +static inline __u32 fsnotify_ignore_mask(struct fsnotify_mark *mark) +{ + __u32 ignore_mask = mark->ignore_mask; + + /* The event flags in ignore mask take effect */ + if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + return ignore_mask; + + /* + * Legacy behavior: + * - Always ignore events on dir + * - Ignore events on child if parent is watching children + */ + ignore_mask |= FS_ISDIR; + ignore_mask &= ~FS_EVENT_ON_CHILD; + ignore_mask |= mark->mask & FS_EVENT_ON_CHILD; + + return ignore_mask; +} + +/* Legacy ignored_mask - only event types to ignore */ +static inline __u32 fsnotify_ignored_events(struct fsnotify_mark *mark) +{ + return mark->ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* + * Check if mask (or ignore mask) should be applied depending if victim is a + * directory and whether it is reported to a watching parent. + */ +static inline bool fsnotify_mask_applicable(__u32 mask, bool is_dir, + int iter_type) +{ + /* Should mask be applied to a directory? */ + if (is_dir && !(mask & FS_ISDIR)) + return false; + + /* Should mask be applied to a child? */ + if (iter_type == FSNOTIFY_ITER_TYPE_PARENT && + !(mask & FS_EVENT_ON_CHILD)) + return false; + + return true; +} + +/* + * Effective ignore mask taking into account if event victim is a + * directory and whether it is reported to a watching parent. + */ +static inline __u32 fsnotify_effective_ignore_mask(struct fsnotify_mark *mark, + bool is_dir, int iter_type) +{ + __u32 ignore_mask = fsnotify_ignored_events(mark); + + if (!ignore_mask) + return 0; + + /* For non-dir and non-child, no need to consult the event flags */ + if (!is_dir && iter_type != FSNOTIFY_ITER_TYPE_PARENT) + return ignore_mask; + + ignore_mask = fsnotify_ignore_mask(mark); + if (!fsnotify_mask_applicable(ignore_mask, is_dir, iter_type)) + return 0; + + return ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* Get mask for calculating object interest taking ignore mask into account */ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) { __u32 mask = mark->mask; - if (!mark->ignored_mask) + if (!fsnotify_ignored_events(mark)) return mask; - /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + /* Interest in FS_MODIFY may be needed for clearing ignore mask */ if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mask |= FS_MODIFY; @@ -671,7 +748,7 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. */ - return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); + return mask | mark->ignore_mask; } /* Get mask of events for a list of marks */ From 99a022c4bcbbcc2a9c53c706353081c542518bfe Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:09 +0300 Subject: [PATCH 0845/1565] fanotify: cleanups for fanotify_mark() input validations [ Upstream commit 8afd7215aa97f8868d033f6e1d01a276ab2d29c0 ] Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags. Use variable mark_cmd to make the checks for mark command bits cleaner. Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify_user.c | 30 +++++++++++++++++++++--------- include/linux/fanotify.h | 9 +++++---- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 0d61cb0e4907..715e41b34412 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1175,6 +1175,19 @@ static int fanotify_group_init_error_pool(struct fsnotify_group *group) sizeof(struct fanotify_error_event)); } +static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark, + unsigned int fan_flags) +{ + /* + * Non evictable mark cannot be downgraded to evictable mark. + */ + if (fan_flags & FAN_MARK_EVICTABLE && + !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) + return -EEXIST; + + return 0; +} + static int fanotify_add_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, unsigned int obj_type, __u32 mask, unsigned int fan_flags, @@ -1196,13 +1209,11 @@ static int fanotify_add_mark(struct fsnotify_group *group, } /* - * Non evictable mark cannot be downgraded to evictable mark. + * Check if requested mark flags conflict with an existing mark flags. */ - if (fan_flags & FAN_MARK_EVICTABLE && - !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) { - ret = -EEXIST; + ret = fanotify_may_update_existing_mark(fsn_mark, fan_flags); + if (ret) goto out; - } /* * Error events are pre-allocated per group, only if strictly @@ -1559,6 +1570,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; + unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS; bool ignore = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; u32 umask = 0; @@ -1588,7 +1600,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, return -EINVAL; } - switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) { + switch (mark_cmd) { case FAN_MARK_ADD: case FAN_MARK_REMOVE: if (!mask) @@ -1677,7 +1689,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME)) goto fput_and_out; - if (flags & FAN_MARK_FLUSH) { + if (mark_cmd == FAN_MARK_FLUSH) { ret = 0; if (mark_type == FAN_MARK_MOUNT) fsnotify_clear_vfsmount_marks_by_group(group); @@ -1693,7 +1705,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (ret) goto fput_and_out; - if (flags & FAN_MARK_ADD) { + if (mark_cmd == FAN_MARK_ADD) { ret = fanotify_events_supported(group, &path, mask, flags); if (ret) goto path_put_and_out; @@ -1731,7 +1743,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, } /* create/update an inode mark */ - switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) { + switch (mark_cmd) { case FAN_MARK_ADD: if (mark_type == FAN_MARK_MOUNT) ret = fanotify_add_vfsmount_mark(group, mnt, mask, diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 4f6cbe6c6e23..c9e185407ebc 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -61,15 +61,16 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM) +#define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ + FAN_MARK_FLUSH) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ - FAN_MARK_ADD | \ - FAN_MARK_REMOVE | \ + FANOTIFY_MARK_CMD_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ - FAN_MARK_EVICTABLE | \ - FAN_MARK_FLUSH) + FAN_MARK_EVICTABLE) /* * Events that can be reported with data type FSNOTIFY_EVENT_PATH. From 85c640adf9fc204eb770d9789ea689bcc56d288a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:10 +0300 Subject: [PATCH 0846/1565] fanotify: introduce FAN_MARK_IGNORE [ Upstream commit e252f2ed1c8c6c3884ab5dd34e003ed21f1fe6e0 ] This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask. The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask. FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask. The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error. Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error. The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences: 1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error. 2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error. The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names. Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 2 + fs/notify/fanotify/fanotify_user.c | 63 +++++++++++++++++++++++++----- include/linux/fanotify.h | 5 ++- include/uapi/linux/fanotify.h | 8 ++++ 4 files changed, 67 insertions(+), 11 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 80e0ec95b113..1d9f11255c64 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -499,6 +499,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) mflags |= FAN_MARK_IGNORED_SURV_MODIFY; if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF) mflags |= FAN_MARK_EVICTABLE; + if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + mflags |= FAN_MARK_IGNORE; return mflags; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 715e41b34412..72dd446606a7 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -997,7 +997,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, mask &= ~umask; spin_lock(&fsn_mark->lock); oldmask = fsnotify_calc_mask(fsn_mark); - if (!(flags & FAN_MARK_IGNORED_MASK)) { + if (!(flags & FANOTIFY_MARK_IGNORE_BITS)) { fsn_mark->mask &= ~mask; } else { fsn_mark->ignore_mask &= ~mask; @@ -1073,15 +1073,24 @@ static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, unsigned int fan_flags) { bool want_iref = !(fan_flags & FAN_MARK_EVICTABLE); + unsigned int ignore = fan_flags & FANOTIFY_MARK_IGNORE_BITS; bool recalc = false; + /* + * When using FAN_MARK_IGNORE for the first time, mark starts using + * independent event flags in ignore mask. After that, trying to + * update the ignore mask with the old FAN_MARK_IGNORED_MASK API + * will result in EEXIST error. + */ + if (ignore == FAN_MARK_IGNORE) + fsn_mark->flags |= FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS; + /* * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to * the removal of the FS_MODIFY bit in calculated mask if it was set * because of an ignore mask that is now going to survive FS_MODIFY. */ - if ((fan_flags & FAN_MARK_IGNORED_MASK) && - (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && + if (ignore && (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; if (!(fsn_mark->mask & FS_MODIFY)) @@ -1108,7 +1117,7 @@ static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, bool recalc; spin_lock(&fsn_mark->lock); - if (!(fan_flags & FAN_MARK_IGNORED_MASK)) + if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS)) fsn_mark->mask |= mask; else fsn_mark->ignore_mask |= mask; @@ -1185,6 +1194,24 @@ static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark, !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) return -EEXIST; + /* + * New ignore mask semantics cannot be downgraded to old semantics. + */ + if (fan_flags & FAN_MARK_IGNORED_MASK && + fsn_mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + return -EEXIST; + + /* + * An ignore mask that survives modify could never be downgraded to not + * survive modify. With new FAN_MARK_IGNORE semantics we make that rule + * explicit and return an error when trying to update the ignore mask + * without the original FAN_MARK_IGNORED_SURV_MODIFY value. + */ + if (fan_flags & FAN_MARK_IGNORE && + !(fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && + fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + return -EEXIST; + return 0; } @@ -1219,7 +1246,8 @@ static int fanotify_add_mark(struct fsnotify_group *group, * Error events are pre-allocated per group, only if strictly * needed (i.e. FAN_FS_ERROR was requested). */ - if (!(fan_flags & FAN_MARK_IGNORED_MASK) && (mask & FAN_FS_ERROR)) { + if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS) && + (mask & FAN_FS_ERROR)) { ret = fanotify_group_init_error_pool(group); if (ret) goto out; @@ -1263,7 +1291,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group, * an ignore mask, unless that ignore mask is supposed to survive * modification changes anyway. */ - if ((flags & FAN_MARK_IGNORED_MASK) && + if ((flags & FANOTIFY_MARK_IGNORE_BITS) && !(flags & FAN_MARK_IGNORED_SURV_MODIFY) && inode_is_open_for_write(inode)) return 0; @@ -1519,7 +1547,8 @@ static int fanotify_events_supported(struct fsnotify_group *group, unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; /* Strict validation of events in non-dir inode mask with v5.17+ APIs */ bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) || - (mask & FAN_RENAME); + (mask & FAN_RENAME) || + (flags & FAN_MARK_IGNORE); /* * Some filesystems such as 'proc' acquire unusual locks when opening @@ -1571,7 +1600,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS; - bool ignore = flags & FAN_MARK_IGNORED_MASK; + unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS; unsigned int obj_type, fid_mode; u32 umask = 0; int ret; @@ -1620,12 +1649,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL; + + /* We don't allow FAN_MARK_IGNORE & FAN_MARK_IGNORED_MASK together */ + if (ignore == (FAN_MARK_IGNORE | FAN_MARK_IGNORED_MASK)) + return -EINVAL; + /* * Event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) have no effect with * FAN_MARK_IGNORED_MASK. */ - if (ignore) + if (ignore == FAN_MARK_IGNORED_MASK) { mask &= ~FANOTIFY_EVENT_FLAGS; + umask = FANOTIFY_EVENT_FLAGS; + } f = fdget(fanotify_fd); if (unlikely(!f.file)) @@ -1729,6 +1765,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt; + ret = mnt ? -EINVAL : -EISDIR; + /* FAN_MARK_IGNORE requires SURV_MODIFY for sb/mount/dir marks */ + if (mark_cmd == FAN_MARK_ADD && ignore == FAN_MARK_IGNORE && + (mnt || S_ISDIR(inode->i_mode)) && + !(flags & FAN_MARK_IGNORED_SURV_MODIFY)) + goto path_put_and_out; + /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; @@ -1821,7 +1864,7 @@ static int __init fanotify_user_setup(void) BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index c9e185407ebc..558844c8d259 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -64,11 +64,14 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ #define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ FAN_MARK_FLUSH) +#define FANOTIFY_MARK_IGNORE_BITS (FAN_MARK_IGNORED_MASK | \ + FAN_MARK_IGNORE) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ FANOTIFY_MARK_CMD_BITS | \ + FANOTIFY_MARK_IGNORE_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ - FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ FAN_MARK_EVICTABLE) diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index f1f89132d60e..d8536d77fea1 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -83,12 +83,20 @@ #define FAN_MARK_FLUSH 0x00000080 /* FAN_MARK_FILESYSTEM is 0x00000100 */ #define FAN_MARK_EVICTABLE 0x00000200 +/* This bit is mutually exclusive with FAN_MARK_IGNORED_MASK bit */ +#define FAN_MARK_IGNORE 0x00000400 /* These are NOT bitwise flags. Both bits can be used togther. */ #define FAN_MARK_INODE 0x00000000 #define FAN_MARK_MOUNT 0x00000010 #define FAN_MARK_FILESYSTEM 0x00000100 +/* + * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY + * for non-inode mark types. + */ +#define FAN_MARK_IGNORE_SURV (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY) + /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_MARK_FLAGS (FAN_MARK_ADD |\ FAN_MARK_REMOVE |\ From 302ae1fb80a3689397747cf7da06d70ea9fb5aab Mon Sep 17 00:00:00 2001 From: Xin Gao Date: Sat, 23 Jul 2022 03:46:39 +0800 Subject: [PATCH 0847/1565] fsnotify: Fix comment typo [ Upstream commit feee1ce45a5666bbdb08c5bb2f5f394047b1915b ] The double `if' is duplicated in line 104, remove one. Signed-off-by: Xin Gao Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220722194639.18545-1-gaoxin@cdjrlc.com Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 8687562df2e3..7974e91ffe13 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -100,7 +100,7 @@ void fsnotify_sb_delete(struct super_block *sb) * Given an inode, first check if we care what happens to our children. Inotify * and dnotify both tell their parents about events. If we care about any event * on a child we run all of our children and set a dentry flag saying that the - * parent cares. Thus when an event happens on a child it can quickly tell if + * parent cares. Thus when an event happens on a child it can quickly tell * if there is a need to find a parent and send the event to the parent. */ void __fsnotify_update_child_dentry_flags(struct inode *inode) From bcaac325dd955ba808f507a0baf0aaf444e1db50 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 29 Jul 2022 17:01:07 -0400 Subject: [PATCH 0848/1565] nfsd: eliminate the NFSD_FILE_BREAK_* flags [ Upstream commit 23ba98de6dcec665e15c0ca19244379bb0d30932 ] We had a report from the spring Bake-a-thon of data corruption in some nfstest_interop tests. Looking at the traces showed the NFS server allowing a v3 WRITE to proceed while a read delegation was still outstanding. Currently, we only set NFSD_FILE_BREAK_* flags if NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc. NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for COMMIT ops, where we need a writeable filehandle but don't need to break read leases. It doesn't make any sense to consult that flag when allocating a file since the file may be used on subsequent calls where we do want to break the lease (and the usage of it here seems to be reverse from what it should be anyway). Also, after calling nfsd_open_break_lease, we don't want to clear the BREAK_* bits. A lease could end up being set on it later (more than once) and we need to be able to break those leases as well. This means that the NFSD_FILE_BREAK_* flags now just mirror NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just drop those flags and unconditionally call nfsd_open_break_lease every time. Reported-by: Olga Kornieskaia Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Cc: # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro Cc: # 5.4.x Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 22 +--------------------- fs/nfsd/filecache.h | 4 +--- fs/nfsd/trace.h | 2 -- 3 files changed, 2 insertions(+), 26 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fc0fcb332153..1d3d13b78be0 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -183,12 +183,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, nf->nf_hashval = hashval; refcount_set(&nf->nf_ref, 1); nf->nf_may = may & NFSD_FILE_MAY_MASK; - if (may & NFSD_MAY_NOT_BREAK_LEASE) { - if (may & NFSD_MAY_WRITE) - __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); - if (may & NFSD_MAY_READ) - __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - } nf->nf_mark = NULL; trace_nfsd_file_alloc(nf); } @@ -957,21 +951,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, this_cpu_inc(nfsd_file_cache_hits); - if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { - bool write = (may_flags & NFSD_MAY_WRITE); - - if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || - (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { - status = nfserrno(nfsd_open_break_lease( - file_inode(nf->nf_file), may_flags)); - if (status == nfs_ok) { - clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); - if (write) - clear_bit(NFSD_FILE_BREAK_WRITE, - &nf->nf_flags); - } - } - } + status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { *pnf = nf; diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 1da0c79a5580..c9e3c6eb4776 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -37,9 +37,7 @@ struct nfsd_file { struct net *nf_net; #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) -#define NFSD_FILE_BREAK_READ (2) -#define NFSD_FILE_BREAK_WRITE (3) -#define NFSD_FILE_REFERENCED (4) +#define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; struct inode *nf_inode; unsigned int nf_hashval; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8ccce4ac66b4..5c2292a1892c 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -707,8 +707,6 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ - { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) DECLARE_EVENT_CLASS(nfsd_file_class, From 16acc0677f801d98b2de7f83f0f3389ce36c2806 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 19 Jul 2022 09:18:35 -0400 Subject: [PATCH 0849/1565] SUNRPC: Fix xdr_encode_bool() [ Upstream commit c770f31d8f580ed4b965c64f924ec1cc50e41734 ] I discovered that xdr_encode_bool() was returning the same address that was passed in the @p parameter. The documenting comment states that the intent is to return the address of the next buffer location, just like the other "xdr_encode_*" helpers. The result was the encoded results of NFSv3 PATHCONF operations were not formed correctly. Fixes: ded04a587f6c ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream") Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/xdr.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index b8fb866e1fd3..59d99ff31c1a 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -418,8 +418,8 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) */ static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) { - *p = n ? xdr_one : xdr_zero; - return p++; + *p++ = n ? xdr_one : xdr_zero; + return p; } /** From b5b79fc3ff4f74060eac092fd2ebd7046aa9788e Mon Sep 17 00:00:00 2001 From: Benjamin Coddington Date: Mon, 13 Jun 2022 09:40:06 -0400 Subject: [PATCH 0850/1565] NLM: Defend against file_lock changes after vfs_test_lock() [ Upstream commit 184cefbe62627730c30282df12bcff9aae4816ea ] Instead of trusting that struct file_lock returns completely unchanged after vfs_test_lock() when there's no conflicting lock, stash away our nlm_lockowner reference so we can properly release it for all cases. This defends against another file_lock implementation overwriting fl_owner when the return type is F_UNLCK. Reported-by: Roberto Bergantinos Corpas Tested-by: Roberto Bergantinos Corpas Signed-off-by: Benjamin Coddington Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 4 +++- fs/lockd/svclock.c | 10 +--------- fs/lockd/svcproc.c | 5 ++++- include/linux/lockd/lockd.h | 1 + 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 176b468a61c7..4f247ab8be61 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -87,6 +87,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; + struct nlm_lockowner *test_owner; __be32 rc = rpc_success; dprintk("lockd: TEST4 called\n"); @@ -96,6 +97,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success; + test_owner = argp->lock.fl.fl_owner; /* Now check for conflicting locks */ resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie); if (resp->status == nlm_drop_reply) @@ -103,7 +105,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) else dprintk("lockd: TEST4 status %d\n", ntohl(resp->status)); - nlmsvc_release_lockowner(&argp->lock); + nlmsvc_put_lockowner(test_owner); nlmsvc_release_host(host); nlm_release_file(file); return rc; diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index cb3658ab9b7a..9c1aa75441e1 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -340,7 +340,7 @@ nlmsvc_get_lockowner(struct nlm_lockowner *lockowner) return lockowner; } -static void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) +void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) { if (!refcount_dec_and_lock(&lockowner->count, &lockowner->host->h_lock)) return; @@ -590,7 +590,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, int error; int mode; __be32 ret; - struct nlm_lockowner *test_owner; dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n", nlmsvc_file_inode(file)->i_sb->s_id, @@ -604,9 +603,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, goto out; } - /* If there's a conflicting lock, remember to clean up the test lock */ - test_owner = (struct nlm_lockowner *)lock->fl.fl_owner; - mode = lock_to_openmode(&lock->fl); error = vfs_test_lock(file->f_file[mode], &lock->fl); if (error) { @@ -635,10 +631,6 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, conflock->fl.fl_end = lock->fl.fl_end; locks_release_private(&lock->fl); - /* Clean up the test lock */ - lock->fl.fl_owner = NULL; - nlmsvc_put_lockowner(test_owner); - ret = nlm_lck_denied; out: return ret; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 4dc1b40a489a..b09ca35b527c 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -116,6 +116,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; + struct nlm_lockowner *test_owner; __be32 rc = rpc_success; dprintk("lockd: TEST called\n"); @@ -125,6 +126,8 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success; + test_owner = argp->lock.fl.fl_owner; + /* Now check for conflicting locks */ resp->status = cast_status(nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie)); if (resp->status == nlm_drop_reply) @@ -133,7 +136,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) dprintk("lockd: TEST status %d vers %d\n", ntohl(resp->status), rqstp->rq_vers); - nlmsvc_release_lockowner(&argp->lock); + nlmsvc_put_lockowner(test_owner); nlmsvc_release_host(host); nlm_release_file(file); return rc; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index fcef192e5e45..70ce419e2709 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -292,6 +292,7 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, struct nlm_lock *); void nlm_release_file(struct nlm_file *); +void nlmsvc_put_lockowner(struct nlm_lockowner *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *); void nlmsvc_free_host_resources(struct nlm_host *); From 0a18cd2b946b544371fc1f0d6433c279f8c7e35a Mon Sep 17 00:00:00 2001 From: Zhang Jiaming Date: Thu, 23 Jun 2022 16:20:05 +0800 Subject: [PATCH 0851/1565] NFSD: Fix space and spelling mistake [ Upstream commit f532c9ff103897be0e2a787c0876683c3dc39ed3 ] Add a blank space after ','. Change 'succesful' to 'successful'. Signed-off-by: Zhang Jiaming Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index adbac1e77e9e..cb4a03726670 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -828,7 +828,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr,S_IFCHR, rdev, &resfh); + &create->cr_iattr, S_IFCHR, rdev, &resfh); break; case NF4SOCK: @@ -2703,7 +2703,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) if (op->opdesc->op_flags & OP_MODIFIES_SOMETHING) { /* * Don't execute this op if we couldn't encode a - * succesful reply: + * successful reply: */ u32 plen = op->opdesc->op_rsize_bop(rqstp, op); /* From cc3b111e3b020edf1ef68b4864a26cd122b1f44d Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 28 Jun 2022 22:25:25 +0100 Subject: [PATCH 0852/1565] nfsd: remove redundant assignment to variable len [ Upstream commit 842e00ac3aa3b4a4f7f750c8ab54f8578fc875d3 ] Variable len is being assigned a value zero and this is never read, it is being re-assigned later. The assignment is redundant and can be removed. Cleans up clang scan-build warning: fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read Signed-off-by: Colin Ian King Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 0621c2faf242..66c352bf61b1 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -633,7 +633,6 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) } /* Now write current state into reply buffer */ - len = 0; sep = ""; remaining = SIMPLE_TRANSACTION_LIMIT; for (num=2 ; num <= 4 ; num++) { From 4ff0e22e547e5201ee5c321db8f31472abca1bec Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:23:45 -0400 Subject: [PATCH 0853/1565] NFSD: Demote a WARN to a pr_warn() [ Upstream commit ca3f9acb6d3faf78da2b63324f7c737dbddf7f69 ] The call trace doesn't add much value, but it sure is noisy. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cb4a03726670..08c2eaca4f24 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -630,9 +630,9 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } status = nfsd4_process_open2(rqstp, resfh, open); - WARN(status && open->op_created, - "nfsd4_process_open2 failed to open newly-created file! status=%u\n", - be32_to_cpu(status)); + if (status && open->op_created) + pr_warn("nfsd4_process_open2 failed to open newly-created file: status=%u\n", + be32_to_cpu(status)); if (reclaim && !status) nn->somebody_reclaimed = true; out: From 91c03a61241f7e42fad58f28d4b8f02b3d52de87 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:23:52 -0400 Subject: [PATCH 0854/1565] NFSD: Report filecache LRU size [ Upstream commit 0fd244c115f0321fc5e34ad2291f2a572508e3f7 ] Surface the NFSD filecache's LRU list length to help field troubleshooters monitor filecache issues. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d3d13b78be0..377d8211200f 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1047,7 +1047,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned int i, count = 0, longest = 0; - unsigned long hits = 0; + unsigned long lru = 0, hits = 0; /* * No need for spinlocks here since we're not terribly interested in @@ -1060,6 +1060,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) count += nfsd_file_hashtbl[i].nfb_count; longest = max(longest, nfsd_file_hashtbl[i].nfb_count); } + lru = list_lru_count(&nfsd_file_lru); } mutex_unlock(&nfsd_mutex); @@ -1068,6 +1069,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "total entries: %u\n", count); seq_printf(m, "longest chain: %u\n", longest); + seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); return 0; } From a38dff5964f30710b2fad3fb4cc17a2b9f19d69a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:23:59 -0400 Subject: [PATCH 0855/1565] NFSD: Report count of calls to nfsd_file_acquire() [ Upstream commit 29d4bdbbb910f33d6058d2c51278f00f656df325 ] Count the number of successful acquisitions that did not create a file (ie, acquisitions that do not result in a compulsory cache miss). This count can be compared directly with the reported hit count to compute a hit ratio. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 377d8211200f..5a09b76ae25a 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -42,6 +42,7 @@ struct nfsd_fcache_bucket { }; static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); +static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); struct nfsd_fcache_disposal { struct work_struct work; @@ -954,6 +955,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { + if (open) + this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { nfsd_file_put(nf); @@ -1046,8 +1049,9 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { + unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, longest = 0; - unsigned long lru = 0, hits = 0; + unsigned long lru = 0; /* * No need for spinlocks here since we're not terribly interested in @@ -1064,13 +1068,16 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) } mutex_unlock(&nfsd_mutex); - for_each_possible_cpu(i) + for_each_possible_cpu(i) { hits += per_cpu(nfsd_file_cache_hits, i); + acquisitions += per_cpu(nfsd_file_acquisitions, i); + } seq_printf(m, "total entries: %u\n", count); seq_printf(m, "longest chain: %u\n", longest); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); + seq_printf(m, "acquisitions: %lu\n", acquisitions); return 0; } From ca9cc17ec04f50c9ac46f625d74f382b0844d75e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:05 -0400 Subject: [PATCH 0856/1565] NFSD: Report count of freed filecache items [ Upstream commit d63293272abb51c02457f1017dfd61c3270d9ae3 ] Surface the count of freed nfsd_file items. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5a09b76ae25a..5d31622c2304 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -43,6 +43,7 @@ struct nfsd_fcache_bucket { static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); +static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); struct nfsd_fcache_disposal { struct work_struct work; @@ -195,6 +196,8 @@ nfsd_file_free(struct nfsd_file *nf) { bool flush = false; + this_cpu_inc(nfsd_file_releases); + trace_nfsd_file_put_final(nf); if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); @@ -1049,7 +1052,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0; + unsigned long hits = 0, acquisitions = 0, releases = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0; @@ -1071,6 +1074,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) for_each_possible_cpu(i) { hits += per_cpu(nfsd_file_cache_hits, i); acquisitions += per_cpu(nfsd_file_acquisitions, i); + releases += per_cpu(nfsd_file_releases, i); } seq_printf(m, "total entries: %u\n", count); @@ -1078,6 +1082,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); + seq_printf(m, "releases: %lu\n", releases); return 0; } From d176e98400712387d7dc36a91026f0827dbf84b2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:12 -0400 Subject: [PATCH 0857/1565] NFSD: Report average age of filecache items [ Upstream commit 904940e94a887701db24401e3ed6928a1d4e329f ] This is a measure of how long items stay in the filecache, to help assess how efficient the cache is. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 11 ++++++++++- fs/nfsd/filecache.h | 1 + 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5d31622c2304..0cd72c20fc12 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -44,6 +44,7 @@ struct nfsd_fcache_bucket { static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); +static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); struct nfsd_fcache_disposal { struct work_struct work; @@ -177,6 +178,7 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, if (nf) { INIT_HLIST_NODE(&nf->nf_node); INIT_LIST_HEAD(&nf->nf_lru); + nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; nf->nf_cred = get_current_cred(); nf->nf_net = net; @@ -194,9 +196,11 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, static bool nfsd_file_free(struct nfsd_file *nf) { + s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); bool flush = false; this_cpu_inc(nfsd_file_releases); + this_cpu_add(nfsd_file_total_age, age); trace_nfsd_file_put_final(nf); if (nf->nf_mark) @@ -1054,7 +1058,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long hits = 0, acquisitions = 0, releases = 0; unsigned int i, count = 0, longest = 0; - unsigned long lru = 0; + unsigned long lru = 0, total_age = 0; /* * No need for spinlocks here since we're not terribly interested in @@ -1075,6 +1079,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) hits += per_cpu(nfsd_file_cache_hits, i); acquisitions += per_cpu(nfsd_file_acquisitions, i); releases += per_cpu(nfsd_file_releases, i); + total_age += per_cpu(nfsd_file_total_age, i); } seq_printf(m, "total entries: %u\n", count); @@ -1083,6 +1088,10 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); seq_printf(m, "releases: %lu\n", releases); + if (releases) + seq_printf(m, "mean age (ms): %ld\n", total_age / releases); + else + seq_printf(m, "mean age (ms): -\n"); return 0; } diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index c9e3c6eb4776..c6ad5fe47f12 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -44,6 +44,7 @@ struct nfsd_file { refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark; + ktime_t nf_birthtime; }; int nfsd_file_cache_init(void); From 8d038e72e7ad559fe02a759557f2d535aca595ad Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:18 -0400 Subject: [PATCH 0858/1565] NFSD: Add nfsd_file_lru_dispose_list() helper [ Upstream commit 0bac5a264d9a923f5b01f3521e1519a8d0358342 ] Refactor the invariant part of nfsd_file_lru_walk_list() into a separate helper function. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 0cd72c20fc12..ffe46f3f3349 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -450,11 +450,31 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, return LRU_SKIP; } +/* + * Unhash items on @dispose immediately, then queue them on the + * disposal workqueue to finish releasing them in the background. + * + * cel: Note that between the time list_lru_shrink_walk runs and + * now, these items are in the hash table but marked unhashed. + * Why release these outside of lru_cb ? There's no lock ordering + * problem since lru_cb currently takes no lock. + */ +static void nfsd_file_gc_dispose_list(struct list_head *dispose) +{ + struct nfsd_file *nf; + + list_for_each_entry(nf, dispose, nf_lru) { + spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + nfsd_file_do_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + } + nfsd_file_dispose_list_delayed(dispose); +} + static unsigned long nfsd_file_lru_walk_list(struct shrink_control *sc) { LIST_HEAD(head); - struct nfsd_file *nf; unsigned long ret; if (sc) @@ -464,12 +484,7 @@ nfsd_file_lru_walk_list(struct shrink_control *sc) ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, &head, LONG_MAX); - list_for_each_entry(nf, &head, nf_lru) { - spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - } - nfsd_file_dispose_list_delayed(&head); + nfsd_file_gc_dispose_list(&head); return ret; } From e7d5efd20ea92d5648056cdf570bc6dcb5873744 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:25 -0400 Subject: [PATCH 0859/1565] NFSD: Refactor nfsd_file_gc() [ Upstream commit 3bc6d3470fe412f818f9bff6b71d1be3a76af8f3 ] Refactor nfsd_file_gc() to use the new list_lru helper. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index ffe46f3f3349..656c94c77941 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -491,7 +491,11 @@ nfsd_file_lru_walk_list(struct shrink_control *sc) static void nfsd_file_gc(void) { - nfsd_file_lru_walk_list(NULL); + LIST_HEAD(dispose); + + list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, + &dispose, LONG_MAX); + nfsd_file_gc_dispose_list(&dispose); } static void From af057e5884adc6100720a515b6815eb67ccb4d49 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:31 -0400 Subject: [PATCH 0860/1565] NFSD: Refactor nfsd_file_lru_scan() [ Upstream commit 39f1d1ff8148902c5692ffb0e1c4479416ab44a7 ] Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 25 +++++++------------------ 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 656c94c77941..1d94491e5dda 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -471,23 +471,6 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose) nfsd_file_dispose_list_delayed(dispose); } -static unsigned long -nfsd_file_lru_walk_list(struct shrink_control *sc) -{ - LIST_HEAD(head); - unsigned long ret; - - if (sc) - ret = list_lru_shrink_walk(&nfsd_file_lru, sc, - nfsd_file_lru_cb, &head); - else - ret = list_lru_walk(&nfsd_file_lru, - nfsd_file_lru_cb, - &head, LONG_MAX); - nfsd_file_gc_dispose_list(&head); - return ret; -} - static void nfsd_file_gc(void) { @@ -514,7 +497,13 @@ nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc) static unsigned long nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) { - return nfsd_file_lru_walk_list(sc); + LIST_HEAD(dispose); + unsigned long ret; + + ret = list_lru_shrink_walk(&nfsd_file_lru, sc, + nfsd_file_lru_cb, &dispose); + nfsd_file_gc_dispose_list(&dispose); + return ret; } static struct shrinker nfsd_file_shrinker = { From 2cba48b3d0a000ff119fb98b380ce27911fdcabc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:38 -0400 Subject: [PATCH 0861/1565] NFSD: Report the number of items evicted by the LRU walk [ Upstream commit 94660cc19c75083af046b0f8362e3d3bc2eba21d ] Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 13 ++++++++++--- fs/nfsd/trace.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d94491e5dda..e5bd9f06492c 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -45,6 +45,7 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); +static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions); struct nfsd_fcache_disposal { struct work_struct work; @@ -445,6 +446,7 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, goto out_skip; list_lru_isolate_move(lru, &nf->nf_lru, head); + this_cpu_inc(nfsd_file_evictions); return LRU_REMOVED; out_skip: return LRU_SKIP; @@ -475,9 +477,11 @@ static void nfsd_file_gc(void) { LIST_HEAD(dispose); + unsigned long ret; - list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, - &dispose, LONG_MAX); + ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, + &dispose, LONG_MAX); + trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); } @@ -502,6 +506,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) ret = list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, &dispose); + trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); return ret; } @@ -1064,7 +1069,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0, releases = 0; + unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0, total_age = 0; @@ -1088,6 +1093,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) acquisitions += per_cpu(nfsd_file_acquisitions, i); releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); + evictions += per_cpu(nfsd_file_evictions, i); } seq_printf(m, "total entries: %u\n", count); @@ -1096,6 +1102,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); seq_printf(m, "releases: %lu\n", releases); + seq_printf(m, "evictions: %lu\n", evictions); if (releases) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5c2292a1892c..b373a161e862 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -860,6 +860,35 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->nlink, __entry->mode, __entry->mask) ); +DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, + TP_PROTO( + unsigned long removed, + unsigned long remaining + ), + TP_ARGS(removed, remaining), + TP_STRUCT__entry( + __field(unsigned long, removed) + __field(unsigned long, remaining) + ), + TP_fast_assign( + __entry->removed = removed; + __entry->remaining = remaining; + ), + TP_printk("%lu entries removed, %lu remaining", + __entry->removed, __entry->remaining) +); + +#define DEFINE_NFSD_FILE_LRUWALK_EVENT(name) \ +DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ + TP_PROTO( \ + unsigned long removed, \ + unsigned long remaining \ + ), \ + TP_ARGS(removed, remaining)) + +DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); +DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed); + #include "cache.h" TRACE_DEFINE_ENUM(RC_DROPIT); From 04b9376a106f2004777d6cbfb7c07573ebf06207 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:45 -0400 Subject: [PATCH 0862/1565] NFSD: Record number of flush calls [ Upstream commit df2aff524faceaf743b7c5ab0f4fb86cb511f782 ] Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e5bd9f06492c..b9941d4ef20d 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -45,6 +45,7 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); +static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed); static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions); struct nfsd_fcache_disposal { @@ -242,7 +243,12 @@ nfsd_file_check_write_error(struct nfsd_file *nf) static void nfsd_file_flush(struct nfsd_file *nf) { - if (nf->nf_file && vfs_fsync(nf->nf_file, 1) != 0) + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return; + this_cpu_add(nfsd_file_pages_flushed, file->f_mapping->nrpages); + if (vfs_fsync(file, 1) != 0) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } @@ -1069,7 +1075,8 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long hits = 0, acquisitions = 0, releases = 0, evictions = 0; + unsigned long releases = 0, pages_flushed = 0, evictions = 0; + unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, longest = 0; unsigned long lru = 0, total_age = 0; @@ -1094,6 +1101,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); evictions += per_cpu(nfsd_file_evictions, i); + pages_flushed += per_cpu(nfsd_file_pages_flushed, i); } seq_printf(m, "total entries: %u\n", count); @@ -1107,6 +1115,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else seq_printf(m, "mean age (ms): -\n"); + seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; } From 6dc5cab80881746e3f97227718723e3c426148af Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:51 -0400 Subject: [PATCH 0863/1565] NFSD: Zero counters when the filecache is re-initialized [ Upstream commit 8b330f78040cbe16cf8029df70391b2a491f17e2 ] If nfsd_file_cache_init() is called after a shutdown, be sure the stat counters are reset. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b9941d4ef20d..60c51a4d8e0d 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -823,6 +823,8 @@ nfsd_file_cache_shutdown_net(struct net *net) void nfsd_file_cache_shutdown(void) { + int i; + set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); lease_unregister_notifier(&nfsd_file_lease_notifier); @@ -846,6 +848,15 @@ nfsd_file_cache_shutdown(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + + for_each_possible_cpu(i) { + per_cpu(nfsd_file_cache_hits, i) = 0; + per_cpu(nfsd_file_acquisitions, i) = 0; + per_cpu(nfsd_file_releases, i) = 0; + per_cpu(nfsd_file_total_age, i) = 0; + per_cpu(nfsd_file_pages_flushed, i) = 0; + per_cpu(nfsd_file_evictions, i) = 0; + } } static bool From 0c426d4621c862ba9975573ed42f349c4918f8d3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:24:58 -0400 Subject: [PATCH 0864/1565] NFSD: Hook up the filecache stat file [ Upstream commit 2e6c6e4c4375bfd3defa5b1ff3604d9f33d1c936 ] There has always been the capability of exporting filecache metrics via /proc, but it was never hooked up. Let's surface these metrics to enable better observability of the filecache. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 66c352bf61b1..7002edbf2687 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -25,6 +25,7 @@ #include "state.h" #include "netns.h" #include "pnfs.h" +#include "filecache.h" /* * We have a single directory with several nodes in it. @@ -45,6 +46,7 @@ enum { NFSD_Ports, NFSD_MaxBlkSize, NFSD_MaxConnections, + NFSD_Filecache, NFSD_SupportedEnctypes, /* * The below MUST come last. Otherwise we leave a hole in nfsd_files[] @@ -229,6 +231,13 @@ static const struct file_operations reply_cache_stats_operations = { .release = single_release, }; +static const struct file_operations filecache_ops = { + .open = nfsd_file_cache_stats_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + /*----------------------------------------------------------------------------*/ /* * payload - write methods @@ -1370,6 +1379,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, + [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO}, #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ From 7cca6908fa148d6dedc9d78d673b2f8d7889b8af Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:04 -0400 Subject: [PATCH 0865/1565] NFSD: WARN when freeing an item still linked via nf_lru [ Upstream commit 668ed92e651d3c25f9b6e8cb7ceca54d00daa96d ] Add a guardrail to prevent freeing memory that is still on a list. This includes either a dispose list or the LRU list. This is the sign of a bug, but this class of bugs can be detected so that they don't endanger system stability, especially while debugging. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 60c51a4d8e0d..d9b5f1e18397 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -213,6 +213,14 @@ nfsd_file_free(struct nfsd_file *nf) fput(nf->nf_file); flush = true; } + + /* + * If this item is still linked via nf_lru, that's a bug. + * WARN and leak it to preserve system stability. + */ + if (WARN_ON_ONCE(!list_empty(&nf->nf_lru))) + return flush; + call_rcu(&nf->nf_rcu, nfsd_file_slab_free); return flush; } @@ -342,7 +350,7 @@ nfsd_file_dispose_list(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del(&nf->nf_lru); + list_del_init(&nf->nf_lru); nfsd_file_flush(nf); nfsd_file_put_noref(nf); } @@ -356,7 +364,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del(&nf->nf_lru); + list_del_init(&nf->nf_lru); nfsd_file_flush(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue; From c15db0869e97cc8d13ff536d2ab59b33b5d2cd95 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:11 -0400 Subject: [PATCH 0866/1565] NFSD: Trace filecache LRU activity [ Upstream commit c46203acddd9b9200dbc53d0603c97355fd3a03b ] Observe the operation of garbage collection and the lifetime of filecache items. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 44 +++++++++++++++++++++++++++++++------------- fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d9b5f1e18397..a995a744a748 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -260,6 +260,18 @@ nfsd_file_flush(struct nfsd_file *nf) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } +static void nfsd_file_lru_add(struct nfsd_file *nf) +{ + if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) + trace_nfsd_file_lru_add(nf); +} + +static void nfsd_file_lru_remove(struct nfsd_file *nf) +{ + if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) + trace_nfsd_file_lru_del(nf); +} + static void nfsd_file_do_unhash(struct nfsd_file *nf) { @@ -279,8 +291,7 @@ nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_do_unhash(nf); - if (!list_empty(&nf->nf_lru)) - list_lru_del(&nfsd_file_lru, &nf->nf_lru); + nfsd_file_lru_remove(nf); return true; } return false; @@ -443,27 +454,34 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, * counter. Here we check the counter and then test and clear the flag. * That order is deliberate to ensure that we can do this locklessly. */ - if (refcount_read(&nf->nf_ref) > 1) - goto out_skip; + if (refcount_read(&nf->nf_ref) > 1) { + trace_nfsd_file_gc_in_use(nf); + return LRU_SKIP; + } /* * Don't throw out files that are still undergoing I/O or * that have uncleared errors pending. */ - if (nfsd_file_check_writeback(nf)) - goto out_skip; + if (nfsd_file_check_writeback(nf)) { + trace_nfsd_file_gc_writeback(nf); + return LRU_SKIP; + } - if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) - goto out_skip; + if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { + trace_nfsd_file_gc_referenced(nf); + return LRU_SKIP; + } - if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) - goto out_skip; + if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + trace_nfsd_file_gc_hashed(nf); + return LRU_SKIP; + } list_lru_isolate_move(lru, &nf->nf_lru, head); this_cpu_inc(nfsd_file_evictions); + trace_nfsd_file_gc_disposed(nf); return LRU_REMOVED; -out_skip: - return LRU_SKIP; } /* @@ -1016,7 +1034,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, refcount_inc(&nf->nf_ref); __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - list_lru_add(&nfsd_file_lru, &nf->nf_lru); + nfsd_file_lru_add(nf); hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); ++nfsd_file_hashtbl[hashval].nfb_count; nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index b373a161e862..b1aa28c062ac 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -860,6 +860,45 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->nlink, __entry->mode, __entry->mask) ); +DECLARE_EVENT_CLASS(nfsd_file_gc_class, + TP_PROTO( + const struct nfsd_file *nf + ), + TP_ARGS(nf), + TP_STRUCT__entry( + __field(void *, nf_inode) + __field(void *, nf_file) + __field(int, nf_ref) + __field(unsigned long, nf_flags) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_file = nf->nf_file; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + ), + TP_printk("inode=%p ref=%d nf_flags=%s nf_file=%p", + __entry->nf_inode, __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + __entry->nf_file + ) +); + +#define DEFINE_NFSD_FILE_GC_EVENT(name) \ +DEFINE_EVENT(nfsd_file_gc_class, name, \ + TP_PROTO( \ + const struct nfsd_file *nf \ + ), \ + TP_ARGS(nf)) + +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_hashed); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed); + DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, TP_PROTO( unsigned long removed, From 51fc2b2c797108ba058633a4c0fe674fcabbe8a8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:17 -0400 Subject: [PATCH 0867/1565] NFSD: Leave open files out of the filecache LRU [ Upstream commit 4a0e73e635e3f36b616ad5c943e3d23debe4632f ] There have been reports of problems when running fstests generic/531 against Linux NFS servers with NFSv4. The NFS server that hosts the test's SCRATCH_DEV suffers from CPU soft lock-ups during the test. Analysis shows that: fs/nfsd/filecache.c 482 ret = list_lru_walk(&nfsd_file_lru, 483 nfsd_file_lru_cb, 484 &head, LONG_MAX); causes nfsd_file_gc() to walk the entire length of the filecache LRU list every time it is called (which is quite frequently). The walk holds a spinlock the entire time that prevents other nfsd threads from accessing the filecache. What's more, for NFSv4 workloads, none of the items that are visited during this walk may be evicted, since they are all files that are held OPEN by NFS clients. Address this by ensuring that open files are not kept on the LRU list. Reported-by: Frank van der Linden Reported-by: Wang Yugui Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386 Suggested-by: Trond Myklebust Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 24 +++++++++++++++++++----- fs/nfsd/trace.h | 2 ++ 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index a995a744a748..5c9e3ff6397b 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -262,6 +262,7 @@ nfsd_file_flush(struct nfsd_file *nf) static void nfsd_file_lru_add(struct nfsd_file *nf) { + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) trace_nfsd_file_lru_add(nf); } @@ -291,7 +292,6 @@ nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_do_unhash(nf); - nfsd_file_lru_remove(nf); return true; } return false; @@ -312,6 +312,7 @@ nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *disp if (refcount_dec_not_one(&nf->nf_ref)) return true; + nfsd_file_lru_remove(nf); list_add(&nf->nf_lru, dispose); return true; } @@ -323,6 +324,7 @@ nfsd_file_put_noref(struct nfsd_file *nf) if (refcount_dec_and_test(&nf->nf_ref)) { WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); + nfsd_file_lru_remove(nf); nfsd_file_free(nf); } } @@ -332,7 +334,7 @@ nfsd_file_put(struct nfsd_file *nf) { might_sleep(); - set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); + nfsd_file_lru_add(nf); if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); @@ -432,8 +434,18 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) } } -/* +/** + * nfsd_file_lru_cb - Examine an entry on the LRU list + * @item: LRU entry to examine + * @lru: controlling LRU + * @lock: LRU list lock (unused) + * @arg: dispose list + * * Note this can deadlock with nfsd_file_cache_purge. + * + * Return values: + * %LRU_REMOVED: @item was removed from the LRU + * %LRU_SKIP: @item cannot be evicted */ static enum lru_status nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, @@ -455,8 +467,9 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, * That order is deliberate to ensure that we can do this locklessly. */ if (refcount_read(&nf->nf_ref) > 1) { + list_lru_isolate(lru, &nf->nf_lru); trace_nfsd_file_gc_in_use(nf); - return LRU_SKIP; + return LRU_REMOVED; } /* @@ -1013,6 +1026,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto retry; } + nfsd_file_lru_remove(nf); this_cpu_inc(nfsd_file_cache_hits); status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); @@ -1034,7 +1048,6 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, refcount_inc(&nf->nf_ref); __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - nfsd_file_lru_add(nf); hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); ++nfsd_file_hashtbl[hashval].nfb_count; nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, @@ -1059,6 +1072,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, */ if (status != nfs_ok || inode->i_nlink == 0) { bool do_free; + nfsd_file_lru_remove(nf); spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); do_free = nfsd_file_unhash(nf); spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index b1aa28c062ac..5919cdcf137b 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -892,7 +892,9 @@ DEFINE_EVENT(nfsd_file_gc_class, name, \ TP_ARGS(nf)) DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del); +DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); From 586e8d6c3dc3f88e0491458c05740aa0abca5f42 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:24 -0400 Subject: [PATCH 0868/1565] NFSD: Fix the filecache LRU shrinker [ Upstream commit edead3a55804739b2e4af0f35e9c7326264e7b22 ] Without LRU item rotation, the shrinker visits only a few items on the end of the LRU list, and those would always be long-term OPEN files for NFSv4 workloads. That makes the filecache shrinker completely ineffective. Adopt the same strategy as the inode LRU by using LRU_ROTATE. Suggested-by: Dave Chinner Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 5c9e3ff6397b..849c010c6ef6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -445,6 +445,7 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) * * Return values: * %LRU_REMOVED: @item was removed from the LRU + * %LRU_ROTATE: @item is to be moved to the LRU tail * %LRU_SKIP: @item cannot be evicted */ static enum lru_status @@ -483,7 +484,7 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { trace_nfsd_file_gc_referenced(nf); - return LRU_SKIP; + return LRU_ROTATE; } if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { @@ -525,7 +526,7 @@ nfsd_file_gc(void) unsigned long ret; ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, - &dispose, LONG_MAX); + &dispose, list_lru_count(&nfsd_file_lru)); trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); nfsd_file_gc_dispose_list(&dispose); } From 99735b8d82d1d9defb439d0125f1654d96fd7a91 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:30 -0400 Subject: [PATCH 0869/1565] NFSD: Never call nfsd_file_gc() in foreground paths [ Upstream commit 6df19411367a5fb4ef61854cbd1af269c077f917 ] The checks in nfsd_file_acquire() and nfsd_file_put() that directly invoke filecache garbage collection are intended to keep cache occupancy between a low- and high-watermark. The reason to limit the capacity of the filecache is to keep filecache lookups reasonably fast. However, invoking garbage collection at those points has some undesirable negative impacts. Files that are held open by NFSv4 clients often push the occupancy of the filecache over these watermarks. At that point: - Every call to nfsd_file_acquire() and nfsd_file_put() results in an LRU walk. This has the same effect on lookup latency as long chains in the hash table. - Garbage collection will then run on every nfsd thread, causing a lot of unnecessary lock contention. - Limiting cache capacity pushes out files used only by NFSv3 clients, which are the type of files the filecache is supposed to help. To address those negative impacts, remove the direct calls to the garbage collector. Subsequent patches will address maintaining lookup efficiency as cache capacity increases. Suggested-by: Wang Yugui Suggested-by: Dave Chinner Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 849c010c6ef6..7a02ff11b9ec 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -28,8 +28,6 @@ #define NFSD_LAUNDRETTE_DELAY (2 * HZ) #define NFSD_FILE_SHUTDOWN (1) -#define NFSD_FILE_LRU_THRESHOLD (4096UL) -#define NFSD_FILE_LRU_LIMIT (NFSD_FILE_LRU_THRESHOLD << 2) /* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) @@ -65,8 +63,6 @@ static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; -static void nfsd_file_gc(void); - static void nfsd_file_schedule_laundrette(void) { @@ -343,9 +339,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_schedule_laundrette(); } else nfsd_file_put_noref(nf); - - if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) - nfsd_file_gc(); } struct nfsd_file * @@ -1054,8 +1047,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, nfsd_file_hashtbl[hashval].nfb_count); spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - if (atomic_long_inc_return(&nfsd_filecache_count) >= NFSD_FILE_LRU_THRESHOLD) - nfsd_file_gc(); + atomic_long_inc(&nfsd_filecache_count); nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { From 525c2c81fdccbac305ca10ec76f648321941a4d9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:37 -0400 Subject: [PATCH 0870/1565] NFSD: No longer record nf_hashval in the trace log [ Upstream commit 54f7df7094b329ca35d9f9808692bb16c48b13e9 ] I'm about to replace nfsd_file_hashtbl with an rhashtable. The individual hash values will no longer be visible or relevant, so remove them from the tracepoints. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 15 ++++++++------- fs/nfsd/trace.h | 45 +++++++++++++++++++++------------------------ 2 files changed, 29 insertions(+), 31 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7a02ff11b9ec..2d013a88e356 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -588,7 +588,7 @@ nfsd_file_close_inode_sync(struct inode *inode) LIST_HEAD(dispose); __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose)); + trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose)); nfsd_file_dispose_list_sync(&dispose); } @@ -608,7 +608,7 @@ nfsd_file_close_inode(struct inode *inode) LIST_HEAD(dispose); __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose)); + trace_nfsd_file_close_inode(inode, !list_empty(&dispose)); nfsd_file_dispose_list_delayed(&dispose); } @@ -962,7 +962,7 @@ nfsd_file_is_cached(struct inode *inode) } } rcu_read_unlock(); - trace_nfsd_file_is_cached(inode, hashval, (int)ret); + trace_nfsd_file_is_cached(inode, (int)ret); return ret; } @@ -994,9 +994,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, new = nfsd_file_alloc(inode, may_flags, hashval, net); if (!new) { - trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, - NULL, nfserr_jukebox); - return nfserr_jukebox; + status = nfserr_jukebox; + goto out_status; } spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); @@ -1034,8 +1033,10 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf = NULL; } - trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status); +out_status: + trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); return status; + open_file: nf = new; /* Take reference for the hashtable */ diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5919cdcf137b..8b34f2a5ad29 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -713,7 +713,6 @@ DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), TP_ARGS(nf), TP_STRUCT__entry( - __field(unsigned int, nf_hashval) __field(void *, nf_inode) __field(int, nf_ref) __field(unsigned long, nf_flags) @@ -721,15 +720,13 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __field(struct file *, nf_file) ), TP_fast_assign( - __entry->nf_hashval = nf->nf_hashval; __entry->nf_inode = nf->nf_inode; __entry->nf_ref = refcount_read(&nf->nf_ref); __entry->nf_flags = nf->nf_flags; __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", - __entry->nf_hashval, + TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p", __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -749,15 +746,18 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_put); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked); TRACE_EVENT(nfsd_file_acquire, - TP_PROTO(struct svc_rqst *rqstp, unsigned int hash, - struct inode *inode, unsigned int may_flags, - struct nfsd_file *nf, __be32 status), + TP_PROTO( + struct svc_rqst *rqstp, + struct inode *inode, + unsigned int may_flags, + struct nfsd_file *nf, + __be32 status + ), - TP_ARGS(rqstp, hash, inode, may_flags, nf, status), + TP_ARGS(rqstp, inode, may_flags, nf, status), TP_STRUCT__entry( __field(u32, xid) - __field(unsigned int, hash) __field(void *, inode) __field(unsigned long, may_flags) __field(int, nf_ref) @@ -769,7 +769,6 @@ TRACE_EVENT(nfsd_file_acquire, TP_fast_assign( __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->hash = hash; __entry->inode = inode; __entry->may_flags = may_flags; __entry->nf_ref = nf ? refcount_read(&nf->nf_ref) : 0; @@ -779,8 +778,8 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ), - TP_printk("xid=0x%x hash=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", - __entry->xid, __entry->hash, __entry->inode, + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", + __entry->xid, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), show_nfsd_may_flags(__entry->nf_may), @@ -791,7 +790,6 @@ TRACE_EVENT(nfsd_file_open, TP_PROTO(struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status), TP_STRUCT__entry( - __field(unsigned int, nf_hashval) __field(void *, nf_inode) /* cannot be dereferenced */ __field(int, nf_ref) __field(unsigned long, nf_flags) @@ -799,15 +797,13 @@ TRACE_EVENT(nfsd_file_open, __field(void *, nf_file) /* cannot be dereferenced */ ), TP_fast_assign( - __entry->nf_hashval = nf->nf_hashval; __entry->nf_inode = nf->nf_inode; __entry->nf_ref = refcount_read(&nf->nf_ref); __entry->nf_flags = nf->nf_flags; __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("hash=0x%x inode=%p ref=%d flags=%s may=%s file=%p", - __entry->nf_hashval, + TP_printk("inode=%p ref=%d flags=%s may=%s file=%p", __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), @@ -816,26 +812,27 @@ TRACE_EVENT(nfsd_file_open, ) DECLARE_EVENT_CLASS(nfsd_file_search_class, - TP_PROTO(struct inode *inode, unsigned int hash, int found), - TP_ARGS(inode, hash, found), + TP_PROTO( + struct inode *inode, + int found + ), + TP_ARGS(inode, found), TP_STRUCT__entry( __field(struct inode *, inode) - __field(unsigned int, hash) __field(int, found) ), TP_fast_assign( __entry->inode = inode; - __entry->hash = hash; __entry->found = found; ), - TP_printk("hash=0x%x inode=%p found=%d", __entry->hash, - __entry->inode, __entry->found) + TP_printk("inode=%p found=%d", + __entry->inode, __entry->found) ); #define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO(struct inode *inode, unsigned int hash, int found), \ - TP_ARGS(inode, hash, found)) + TP_PROTO(struct inode *inode, int found), \ + TP_ARGS(inode, found)) DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); From ef7fe4908a1a6187246becc6d1e7ae74c38aa8b4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:44 -0400 Subject: [PATCH 0871/1565] NFSD: Remove lockdep assertion from unhash_and_release_locked() [ Upstream commit f53cef15dddec7203df702cdc62e554190385450 ] IIUC, holding the hash bucket lock is needed only in nfsd_file_unhash, and there is already a lockdep assertion there. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 2d013a88e356..6a01de867795 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -299,8 +299,6 @@ nfsd_file_unhash(struct nfsd_file *nf) static bool nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose) { - lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - trace_nfsd_file_unhash_and_release_locked(nf); if (!nfsd_file_unhash(nf)) return false; From a86953523ea9608e233d24a91cd5233406bdb638 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:50 -0400 Subject: [PATCH 0872/1565] NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode [ Upstream commit 8755326399f471ec3b31e2ab8c5074c0d28a0fb5 ] Remove an unnecessary usage of nf_hashval. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6a01de867795..d7c74b51eabf 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -272,13 +272,17 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) static void nfsd_file_do_unhash(struct nfsd_file *nf) { - lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + struct inode *inode = nf->nf_inode; + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + + lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock); trace_nfsd_file_unhash(nf); if (nfsd_file_check_write_error(nf)) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - --nfsd_file_hashtbl[nf->nf_hashval].nfb_count; + --nfsd_file_hashtbl[hashval].nfb_count; hlist_del_rcu(&nf->nf_node); atomic_long_dec(&nfsd_filecache_count); } From 10ba39f788868da1a20c9a32e3276a5364681c16 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:25:57 -0400 Subject: [PATCH 0873/1565] NFSD: Refactor __nfsd_file_close_inode() [ Upstream commit a845511007a63467fee575353c706806c21218b1 ] The code that computes the hashval is the same in both callers. To prevent them from going stale, reframe the documenting comments to remove descriptions of the underlying hash table structure, which is about to be replaced. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 40 +++++++++++++++++++------------------ fs/nfsd/trace.h | 48 +++++++++++++++++++++++++++++++++------------ 2 files changed, 56 insertions(+), 32 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d7c74b51eabf..3925df9124c3 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -558,39 +558,44 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, }; -static void -__nfsd_file_close_inode(struct inode *inode, unsigned int hashval, - struct list_head *dispose) +/* + * Find all cache items across all net namespaces that match @inode and + * move them to @dispose. The lookup is atomic wrt nfsd_file_acquire(). + */ +static unsigned int +__nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) { + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + unsigned int count = 0; struct nfsd_file *nf; struct hlist_node *tmp; spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) { - if (inode == nf->nf_inode) + if (inode == nf->nf_inode) { nfsd_file_unhash_and_release_locked(nf, dispose); + count++; + } } spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + return count; } /** * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file * @inode: inode of the file to attempt to remove * - * Walk the whole hash bucket, looking for any files that correspond to "inode". - * If any do, then unhash them and put the hashtable reference to them and - * destroy any that had their last reference put. Also ensure that any of the - * fputs also have their final __fput done as well. + * Unhash and put, then flush and fput all cache items associated with @inode. */ void nfsd_file_close_inode_sync(struct inode *inode) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); LIST_HEAD(dispose); + unsigned int count; - __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode_sync(inode, !list_empty(&dispose)); + count = __nfsd_file_close_inode(inode, &dispose); + trace_nfsd_file_close_inode_sync(inode, count); nfsd_file_dispose_list_sync(&dispose); } @@ -598,19 +603,16 @@ nfsd_file_close_inode_sync(struct inode *inode) * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * - * Walk the whole hash bucket, looking for any files that correspond to "inode". - * If any do, then unhash them and put the hashtable reference to them and - * destroy any that had their last reference put. + * Unhash and put all cache item associated with @inode. */ static void nfsd_file_close_inode(struct inode *inode) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); LIST_HEAD(dispose); + unsigned int count; - __nfsd_file_close_inode(inode, hashval, &dispose); - trace_nfsd_file_close_inode(inode, !list_empty(&dispose)); + count = __nfsd_file_close_inode(inode, &dispose); + trace_nfsd_file_close_inode(inode, count); nfsd_file_dispose_list_delayed(&dispose); } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8b34f2a5ad29..f170f07ec0fd 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -813,31 +813,53 @@ TRACE_EVENT(nfsd_file_open, DECLARE_EVENT_CLASS(nfsd_file_search_class, TP_PROTO( - struct inode *inode, + const struct inode *inode, + unsigned int count + ), + TP_ARGS(inode, count), + TP_STRUCT__entry( + __field(const struct inode *, inode) + __field(unsigned int, count) + ), + TP_fast_assign( + __entry->inode = inode; + __entry->count = count; + ), + TP_printk("inode=%p count=%u", + __entry->inode, __entry->count) +); + +#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ +DEFINE_EVENT(nfsd_file_search_class, name, \ + TP_PROTO( \ + const struct inode *inode, \ + unsigned int count \ + ), \ + TP_ARGS(inode, count)) + +DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); +DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); + +TRACE_EVENT(nfsd_file_is_cached, + TP_PROTO( + const struct inode *inode, int found ), TP_ARGS(inode, found), TP_STRUCT__entry( - __field(struct inode *, inode) + __field(const struct inode *, inode) __field(int, found) ), TP_fast_assign( __entry->inode = inode; __entry->found = found; ), - TP_printk("inode=%p found=%d", - __entry->inode, __entry->found) + TP_printk("inode=%p is %scached", + __entry->inode, + __entry->found ? "" : "not " + ) ); -#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ -DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO(struct inode *inode, int found), \ - TP_ARGS(inode, found)) - -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached); - TRACE_EVENT(nfsd_file_fsnotify_handle_event, TP_PROTO(struct inode *inode, u32 mask), TP_ARGS(inode, mask), From 12494d98fea94e4cedcd52e819c3fa1a8db176b5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:03 -0400 Subject: [PATCH 0874/1565] NFSD: nfsd_file_hash_remove can compute hashval [ Upstream commit cb7ec76e73ff6640241c8f1f2f35c81d4005a2d6 ] Remove an unnecessary use of nf_hashval. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 3925df9124c3..dd59deec8b01 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -287,6 +287,18 @@ nfsd_file_do_unhash(struct nfsd_file *nf) atomic_long_dec(&nfsd_filecache_count); } +static void +nfsd_file_hash_remove(struct nfsd_file *nf) +{ + struct inode *inode = nf->nf_inode; + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + + spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); + nfsd_file_do_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); +} + static bool nfsd_file_unhash(struct nfsd_file *nf) { @@ -506,11 +518,8 @@ static void nfsd_file_gc_dispose_list(struct list_head *dispose) { struct nfsd_file *nf; - list_for_each_entry(nf, dispose, nf_lru) { - spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); - } + list_for_each_entry(nf, dispose, nf_lru) + nfsd_file_hash_remove(nf); nfsd_file_dispose_list_delayed(dispose); } From bbb260f3ce9ff499f841df36b83d890c2ea11ec4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:10 -0400 Subject: [PATCH 0875/1565] NFSD: Remove nfsd_file::nf_hashval [ Upstream commit f0743c2b25c65debd4f599a7c861428cd9de5906 ] The value in this field can always be computed from nf_inode, thus it is no longer used. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 6 ++---- fs/nfsd/filecache.h | 1 - 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index dd59deec8b01..29b1f57692a6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -167,8 +167,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) } static struct nfsd_file * -nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, - struct net *net) +nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) { struct nfsd_file *nf; @@ -182,7 +181,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, nf->nf_net = net; nf->nf_flags = 0; nf->nf_inode = inode; - nf->nf_hashval = hashval; refcount_set(&nf->nf_ref, 1); nf->nf_may = may & NFSD_FILE_MAY_MASK; nf->nf_mark = NULL; @@ -1005,7 +1003,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nf) goto wait_for_construction; - new = nfsd_file_alloc(inode, may_flags, hashval, net); + new = nfsd_file_alloc(inode, may_flags, net); if (!new) { status = nfserr_jukebox; goto out_status; diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index c6ad5fe47f12..82051e1b8420 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -40,7 +40,6 @@ struct nfsd_file { #define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; struct inode *nf_inode; - unsigned int nf_hashval; refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark; From 1ea9b51f738c27ca81447a01565a5b9a4045616e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:16 -0400 Subject: [PATCH 0876/1565] NFSD: Replace the "init once" mechanism [ Upstream commit c7b824c3d06c85e054caf86e227255112c5e3c38 ] In a moment, the nfsd_file_hashtbl global will be replaced with an rhashtable. Replace the one or two spots that need to check if the hash table is available. We can easily reuse the SHUTDOWN flag for this purpose. Document that this mechanism relies on callers to hold the nfsd_mutex to prevent init, shutdown, and purging to run concurrently. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 42 ++++++++++++++++++++++++++---------------- 1 file changed, 26 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 29b1f57692a6..33bb4d31b497 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -27,7 +27,7 @@ #define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS) #define NFSD_LAUNDRETTE_DELAY (2 * HZ) -#define NFSD_FILE_SHUTDOWN (1) +#define NFSD_FILE_CACHE_UP (0) /* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) @@ -58,7 +58,7 @@ static struct kmem_cache *nfsd_file_slab; static struct kmem_cache *nfsd_file_mark_slab; static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; -static long nfsd_file_lru_flags; +static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; @@ -66,9 +66,8 @@ static struct delayed_work nfsd_filecache_laundrette; static void nfsd_file_schedule_laundrette(void) { - long count = atomic_long_read(&nfsd_filecache_count); - - if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags)) + if ((atomic_long_read(&nfsd_filecache_count) == 0) || + test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) return; queue_delayed_work(system_wq, &nfsd_filecache_laundrette, @@ -697,9 +696,8 @@ nfsd_file_cache_init(void) int ret = -ENOMEM; unsigned int i; - clear_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); - - if (nfsd_file_hashtbl) + lockdep_assert_held(&nfsd_mutex); + if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0; nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0); @@ -785,8 +783,8 @@ nfsd_file_cache_init(void) /* * Note this can deadlock with nfsd_file_lru_cb. */ -void -nfsd_file_cache_purge(struct net *net) +static void +__nfsd_file_cache_purge(struct net *net) { unsigned int i; struct nfsd_file *nf; @@ -794,9 +792,6 @@ nfsd_file_cache_purge(struct net *net) LIST_HEAD(dispose); bool del; - if (!nfsd_file_hashtbl) - return; - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i]; @@ -857,6 +852,19 @@ nfsd_file_cache_start_net(struct net *net) return nn->fcache_disposal ? 0 : -ENOMEM; } +/** + * nfsd_file_cache_purge - Remove all cache items associated with @net + * @net: target net namespace + * + */ +void +nfsd_file_cache_purge(struct net *net) +{ + lockdep_assert_held(&nfsd_mutex); + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) + __nfsd_file_cache_purge(net); +} + void nfsd_file_cache_shutdown_net(struct net *net) { @@ -869,7 +877,9 @@ nfsd_file_cache_shutdown(void) { int i; - set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); + lockdep_assert_held(&nfsd_mutex); + if (test_and_clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) + return; lease_unregister_notifier(&nfsd_file_lease_notifier); unregister_shrinker(&nfsd_file_shrinker); @@ -878,7 +888,7 @@ nfsd_file_cache_shutdown(void) * calling nfsd_file_cache_purge */ cancel_delayed_work_sync(&nfsd_filecache_laundrette); - nfsd_file_cache_purge(NULL); + __nfsd_file_cache_purge(NULL); list_lru_destroy(&nfsd_file_lru); rcu_barrier(); fsnotify_put_group(nfsd_file_fsnotify_group); @@ -1142,7 +1152,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) * don't end up racing with server shutdown */ mutex_lock(&nfsd_mutex); - if (nfsd_file_hashtbl) { + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) { for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { count += nfsd_file_hashtbl[i].nfb_count; longest = max(longest, nfsd_file_hashtbl[i].nfb_count); From ebe886ac37d2d566a1a310d75cc47fe65dae4f14 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:23 -0400 Subject: [PATCH 0877/1565] NFSD: Set up an rhashtable for the filecache [ Upstream commit fc22945ecc2a0a028f3683115f98a922d506c284 ] Add code to initialize and tear down an rhashtable. The rhashtable is not used yet. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 160 ++++++++++++++++++++++++++++++++++++++------ fs/nfsd/filecache.h | 1 + 2 files changed, 140 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 33bb4d31b497..95e7e15b567e 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -12,6 +12,7 @@ #include #include #include +#include #include "vfs.h" #include "nfsd.h" @@ -62,6 +63,136 @@ static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; +static struct rhashtable nfsd_file_rhash_tbl + ____cacheline_aligned_in_smp; + +enum nfsd_file_lookup_type { + NFSD_FILE_KEY_INODE, + NFSD_FILE_KEY_FULL, +}; + +struct nfsd_file_lookup_key { + struct inode *inode; + struct net *net; + const struct cred *cred; + unsigned char need; + enum nfsd_file_lookup_type type; +}; + +/* + * The returned hash value is based solely on the address of an in-code + * inode, a pointer to a slab-allocated object. The entropy in such a + * pointer is concentrated in its middle bits. + */ +static u32 nfsd_file_inode_hash(const struct inode *inode, u32 seed) +{ + unsigned long ptr = (unsigned long)inode; + u32 k; + + k = ptr >> L1_CACHE_SHIFT; + k &= 0x00ffffff; + return jhash2(&k, 1, seed); +} + +/** + * nfsd_file_key_hashfn - Compute the hash value of a lookup key + * @data: key on which to compute the hash value + * @len: rhash table's key_len parameter (unused) + * @seed: rhash table's random seed of the day + * + * Return value: + * Computed 32-bit hash value + */ +static u32 nfsd_file_key_hashfn(const void *data, u32 len, u32 seed) +{ + const struct nfsd_file_lookup_key *key = data; + + return nfsd_file_inode_hash(key->inode, seed); +} + +/** + * nfsd_file_obj_hashfn - Compute the hash value of an nfsd_file + * @data: object on which to compute the hash value + * @len: rhash table's key_len parameter (unused) + * @seed: rhash table's random seed of the day + * + * Return value: + * Computed 32-bit hash value + */ +static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed) +{ + const struct nfsd_file *nf = data; + + return nfsd_file_inode_hash(nf->nf_inode, seed); +} + +static bool +nfsd_match_cred(const struct cred *c1, const struct cred *c2) +{ + int i; + + if (!uid_eq(c1->fsuid, c2->fsuid)) + return false; + if (!gid_eq(c1->fsgid, c2->fsgid)) + return false; + if (c1->group_info == NULL || c2->group_info == NULL) + return c1->group_info == c2->group_info; + if (c1->group_info->ngroups != c2->group_info->ngroups) + return false; + for (i = 0; i < c1->group_info->ngroups; i++) { + if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) + return false; + } + return true; +} + +/** + * nfsd_file_obj_cmpfn - Match a cache item against search criteria + * @arg: search criteria + * @ptr: cache item to check + * + * Return values: + * %0 - Item matches search criteria + * %1 - Item does not match search criteria + */ +static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, + const void *ptr) +{ + const struct nfsd_file_lookup_key *key = arg->key; + const struct nfsd_file *nf = ptr; + + switch (key->type) { + case NFSD_FILE_KEY_INODE: + if (nf->nf_inode != key->inode) + return 1; + break; + case NFSD_FILE_KEY_FULL: + if (nf->nf_inode != key->inode) + return 1; + if (nf->nf_may != key->need) + return 1; + if (nf->nf_net != key->net) + return 1; + if (!nfsd_match_cred(nf->nf_cred, key->cred)) + return 1; + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) + return 1; + break; + } + return 0; +} + +static const struct rhashtable_params nfsd_file_rhash_params = { + .key_len = sizeof_field(struct nfsd_file, nf_inode), + .key_offset = offsetof(struct nfsd_file, nf_inode), + .head_offset = offsetof(struct nfsd_file, nf_rhash), + .hashfn = nfsd_file_key_hashfn, + .obj_hashfn = nfsd_file_obj_hashfn, + .obj_cmpfn = nfsd_file_obj_cmpfn, + /* Reduce resizing churn on light workloads */ + .min_size = 512, /* buckets */ + .automatic_shrinking = true, +}; static void nfsd_file_schedule_laundrette(void) @@ -693,13 +824,18 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = { int nfsd_file_cache_init(void) { - int ret = -ENOMEM; + int ret; unsigned int i; lockdep_assert_held(&nfsd_mutex); if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0; + ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params); + if (ret) + return ret; + + ret = -ENOMEM; nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0); if (!nfsd_filecache_wq) goto out; @@ -777,6 +913,7 @@ nfsd_file_cache_init(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + rhashtable_destroy(&nfsd_file_rhash_tbl); goto out; } @@ -902,6 +1039,7 @@ nfsd_file_cache_shutdown(void) nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; + rhashtable_destroy(&nfsd_file_rhash_tbl); for_each_possible_cpu(i) { per_cpu(nfsd_file_cache_hits, i) = 0; @@ -913,26 +1051,6 @@ nfsd_file_cache_shutdown(void) } } -static bool -nfsd_match_cred(const struct cred *c1, const struct cred *c2) -{ - int i; - - if (!uid_eq(c1->fsuid, c2->fsuid)) - return false; - if (!gid_eq(c1->fsgid, c2->fsgid)) - return false; - if (c1->group_info == NULL || c2->group_info == NULL) - return c1->group_info == c2->group_info; - if (c1->group_info->ngroups != c2->group_info->ngroups) - return false; - for (i = 0; i < c1->group_info->ngroups; i++) { - if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) - return false; - } - return true; -} - static struct nfsd_file * nfsd_file_find_locked(struct inode *inode, unsigned int may_flags, unsigned int hashval, struct net *net) diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 82051e1b8420..5cbfc61a7d7d 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -29,6 +29,7 @@ struct nfsd_file_mark { * never be dereferenced, only used for comparison. */ struct nfsd_file { + struct rhash_head nf_rhash; struct hlist_node nf_node; struct list_head nf_lru; struct rcu_head nf_rcu; From a174ce98b302958394d8fdbc1a5b8b30dadcdd72 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:30 -0400 Subject: [PATCH 0878/1565] NFSD: Convert the filecache to use rhashtable [ Upstream commit ce502f81ba884c1fe45dc0ebddbcaaa4ec0fc5fb ] Enable the filecache hash table to start small, then grow with the workload. Smaller server deployments benefit because there should be lower memory utilization. Larger server deployments should see improved scaling with the number of open files. Suggested-by: Jeff Layton Suggested-by: Dave Chinner Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 261 +++++++++++++++++++------------------------- fs/nfsd/trace.h | 63 ++++++++++- 2 files changed, 177 insertions(+), 147 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 95e7e15b567e..45dd4f3fa090 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -61,7 +61,6 @@ static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; -static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; static struct rhashtable nfsd_file_rhash_tbl ____cacheline_aligned_in_smp; @@ -197,7 +196,7 @@ static const struct rhashtable_params nfsd_file_rhash_params = { static void nfsd_file_schedule_laundrette(void) { - if ((atomic_long_read(&nfsd_filecache_count) == 0) || + if ((atomic_read(&nfsd_file_rhash_tbl.nelems) == 0) || test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) return; @@ -297,7 +296,7 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf) } static struct nfsd_file * -nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) +nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) { struct nfsd_file *nf; @@ -308,11 +307,14 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, struct net *net) nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; nf->nf_cred = get_current_cred(); - nf->nf_net = net; + nf->nf_net = key->net; nf->nf_flags = 0; - nf->nf_inode = inode; - refcount_set(&nf->nf_ref, 1); - nf->nf_may = may & NFSD_FILE_MAY_MASK; + __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); + __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); + nf->nf_inode = key->inode; + /* nf_ref is pre-incremented for hash table */ + refcount_set(&nf->nf_ref, 2); + nf->nf_may = key->need; nf->nf_mark = NULL; trace_nfsd_file_alloc(nf); } @@ -398,40 +400,21 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) } static void -nfsd_file_do_unhash(struct nfsd_file *nf) +nfsd_file_hash_remove(struct nfsd_file *nf) { - struct inode *inode = nf->nf_inode; - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - - lockdep_assert_held(&nfsd_file_hashtbl[hashval].nfb_lock); - trace_nfsd_file_unhash(nf); if (nfsd_file_check_write_error(nf)) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - --nfsd_file_hashtbl[hashval].nfb_count; - hlist_del_rcu(&nf->nf_node); - atomic_long_dec(&nfsd_filecache_count); -} - -static void -nfsd_file_hash_remove(struct nfsd_file *nf) -{ - struct inode *inode = nf->nf_inode; - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - nfsd_file_do_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, + nfsd_file_rhash_params); } static bool nfsd_file_unhash(struct nfsd_file *nf) { if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_do_unhash(nf); + nfsd_file_hash_remove(nf); return true; } return false; @@ -441,9 +424,9 @@ nfsd_file_unhash(struct nfsd_file *nf) * Return true if the file was unhashed. */ static bool -nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose) +nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) { - trace_nfsd_file_unhash_and_release_locked(nf); + trace_nfsd_file_unhash_and_dispose(nf); if (!nfsd_file_unhash(nf)) return false; /* keep final reference for nfsd_file_lru_dispose */ @@ -702,20 +685,23 @@ static struct shrinker nfsd_file_shrinker = { static unsigned int __nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) { - unsigned int hashval = (unsigned int)hash_long(inode->i_ino, - NFSD_FILE_HASH_BITS); - unsigned int count = 0; - struct nfsd_file *nf; - struct hlist_node *tmp; + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_INODE, + .inode = inode, + }; + unsigned int count = 0; + struct nfsd_file *nf; - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) { - if (inode == nf->nf_inode) { - nfsd_file_unhash_and_release_locked(nf, dispose); - count++; - } - } - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + rcu_read_lock(); + do { + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); + if (!nf) + break; + nfsd_file_unhash_and_dispose(nf, dispose); + count++; + } while (1); + rcu_read_unlock(); return count; } @@ -923,30 +909,35 @@ nfsd_file_cache_init(void) static void __nfsd_file_cache_purge(struct net *net) { - unsigned int i; - struct nfsd_file *nf; - struct hlist_node *next; + struct rhashtable_iter iter; + struct nfsd_file *nf; LIST_HEAD(dispose); bool del; - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i]; + rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); + do { + rhashtable_walk_start(&iter); - spin_lock(&nfb->nfb_lock); - hlist_for_each_entry_safe(nf, next, &nfb->nfb_head, nf_node) { + nf = rhashtable_walk_next(&iter); + while (!IS_ERR_OR_NULL(nf)) { if (net && nf->nf_net != net) continue; - del = nfsd_file_unhash_and_release_locked(nf, &dispose); + del = nfsd_file_unhash_and_dispose(nf, &dispose); /* * Deadlock detected! Something marked this entry as * unhased, but hasn't removed it from the hash list. */ WARN_ON_ONCE(!del); + + nf = rhashtable_walk_next(&iter); } - spin_unlock(&nfb->nfb_lock); - nfsd_file_dispose_list(&dispose); - } + + rhashtable_walk_stop(&iter); + } while (nf == ERR_PTR(-EAGAIN)); + rhashtable_walk_exit(&iter); + + nfsd_file_dispose_list(&dispose); } static struct nfsd_fcache_disposal * @@ -1051,56 +1042,29 @@ nfsd_file_cache_shutdown(void) } } -static struct nfsd_file * -nfsd_file_find_locked(struct inode *inode, unsigned int may_flags, - unsigned int hashval, struct net *net) -{ - struct nfsd_file *nf; - unsigned char need = may_flags & NFSD_FILE_MAY_MASK; - - hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, - nf_node, lockdep_is_held(&nfsd_file_hashtbl[hashval].nfb_lock)) { - if (nf->nf_may != need) - continue; - if (nf->nf_inode != inode) - continue; - if (nf->nf_net != net) - continue; - if (!nfsd_match_cred(nf->nf_cred, current_cred())) - continue; - if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) - continue; - if (nfsd_file_get(nf) != NULL) - return nf; - } - return NULL; -} - /** - * nfsd_file_is_cached - are there any cached open files for this fh? - * @inode: inode of the file to check + * nfsd_file_is_cached - are there any cached open files for this inode? + * @inode: inode to check * - * Scan the hashtable for open files that match this fh. Returns true if there - * are any, and false if not. + * The lookup matches inodes in all net namespaces and is atomic wrt + * nfsd_file_acquire(). + * + * Return values: + * %true: filecache contains at least one file matching this inode + * %false: filecache contains no files matching this inode */ bool nfsd_file_is_cached(struct inode *inode) { - bool ret = false; - struct nfsd_file *nf; - unsigned int hashval; + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_INODE, + .inode = inode, + }; + bool ret = false; - hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); - - rcu_read_lock(); - hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, - nf_node) { - if (inode == nf->nf_inode) { - ret = true; - break; - } - } - rcu_read_unlock(); + if (rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params) != NULL) + ret = true; trace_nfsd_file_is_cached(inode, (int)ret); return ret; } @@ -1109,39 +1073,51 @@ static __be32 nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf, bool open) { - __be32 status; - struct net *net = SVC_NET(rqstp); + struct nfsd_file_lookup_key key = { + .type = NFSD_FILE_KEY_FULL, + .need = may_flags & NFSD_FILE_MAY_MASK, + .net = SVC_NET(rqstp), + }; struct nfsd_file *nf, *new; - struct inode *inode; - unsigned int hashval; bool retry = true; + __be32 status; - /* FIXME: skip this if fh_dentry is already set? */ status = fh_verify(rqstp, fhp, S_IFREG, may_flags|NFSD_MAY_OWNER_OVERRIDE); if (status != nfs_ok) return status; + key.inode = d_inode(fhp->fh_dentry); + key.cred = get_current_cred(); - inode = d_inode(fhp->fh_dentry); - hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); retry: - rcu_read_lock(); - nf = nfsd_file_find_locked(inode, may_flags, hashval, net); - rcu_read_unlock(); + /* Avoid allocation if the item is already in cache */ + nf = rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); + if (nf) + nf = nfsd_file_get(nf); if (nf) goto wait_for_construction; - new = nfsd_file_alloc(inode, may_flags, net); + new = nfsd_file_alloc(&key, may_flags); if (!new) { status = nfserr_jukebox; goto out_status; } - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - nf = nfsd_file_find_locked(inode, may_flags, hashval, net); - if (nf == NULL) + nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl, + &key, &new->nf_rhash, + nfsd_file_rhash_params); + if (!nf) { + nf = new; goto open_file; - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + } + if (IS_ERR(nf)) + goto insert_err; + nf = nfsd_file_get(nf); + if (nf == NULL) { + nf = new; + goto open_file; + } nfsd_file_slab_free(&new->nf_rcu); wait_for_construction: @@ -1149,6 +1125,7 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, /* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); if (!retry) { status = nfserr_jukebox; goto out; @@ -1173,22 +1150,11 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, } out_status: - trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); + put_cred(key.cred); + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status; open_file: - nf = new; - /* Take reference for the hashtable */ - refcount_inc(&nf->nf_ref); - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); - __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); - ++nfsd_file_hashtbl[hashval].nfb_count; - nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, - nfsd_file_hashtbl[hashval].nfb_count); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - atomic_long_inc(&nfsd_filecache_count); - nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { if (open) { @@ -1203,19 +1169,20 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status != nfs_ok || inode->i_nlink == 0) { - bool do_free; - nfsd_file_lru_remove(nf); - spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); - do_free = nfsd_file_unhash(nf); - spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); - if (do_free) + if (status != nfs_ok || key.inode->i_nlink == 0) + if (nfsd_file_unhash(nf)) nfsd_file_put_noref(nf); - } clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; + +insert_err: + nfsd_file_slab_free(&new->nf_rcu); + trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, PTR_ERR(nf)); + nf = NULL; + status = nfserr_jukebox; + goto out_status; } /** @@ -1261,21 +1228,23 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long releases = 0, pages_flushed = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; - unsigned int i, count = 0, longest = 0; + unsigned int i, count = 0, buckets = 0; unsigned long lru = 0, total_age = 0; - /* - * No need for spinlocks here since we're not terribly interested in - * accuracy. We do take the nfsd_mutex simply to ensure that we - * don't end up racing with server shutdown - */ + /* Serialize with server shutdown */ mutex_lock(&nfsd_mutex); if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) { - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - count += nfsd_file_hashtbl[i].nfb_count; - longest = max(longest, nfsd_file_hashtbl[i].nfb_count); - } + struct bucket_table *tbl; + struct rhashtable *ht; + lru = list_lru_count(&nfsd_file_lru); + + rcu_read_lock(); + ht = &nfsd_file_rhash_tbl; + count = atomic_read(&ht->nelems); + tbl = rht_dereference_rcu(ht->tbl, ht); + buckets = tbl->size; + rcu_read_unlock(); } mutex_unlock(&nfsd_mutex); @@ -1289,7 +1258,7 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) } seq_printf(m, "total entries: %u\n", count); - seq_printf(m, "longest chain: %u\n", longest); + seq_printf(m, "hash buckets: %u\n", buckets); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); seq_printf(m, "acquisitions: %lu\n", acquisitions); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index f170f07ec0fd..33bd8618c20a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -743,7 +743,7 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc); DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); -DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked); +DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose); TRACE_EVENT(nfsd_file_acquire, TP_PROTO( @@ -786,6 +786,67 @@ TRACE_EVENT(nfsd_file_acquire, __entry->nf_file, __entry->status) ); +TRACE_EVENT(nfsd_file_insert_err, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + unsigned int may_flags, + long error + ), + TP_ARGS(rqstp, inode, may_flags, error), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(unsigned long, may_flags) + __field(long, error) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->may_flags = may_flags; + __entry->error = error; + ), + TP_printk("xid=0x%x inode=%p may_flags=%s error=%ld", + __entry->xid, __entry->inode, + show_nfsd_may_flags(__entry->may_flags), + __entry->error + ) +); + +TRACE_EVENT(nfsd_file_cons_err, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + unsigned int may_flags, + const struct nfsd_file *nf + ), + TP_ARGS(rqstp, inode, may_flags, nf), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(unsigned long, may_flags) + __field(unsigned int, nf_ref) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(const void *, nf_file) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->may_flags = may_flags; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", + __entry->xid, __entry->inode, + show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file + ) +); + TRACE_EVENT(nfsd_file_open, TP_PROTO(struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status), From de070f66d23ff90e4e25122d5809bd21d2441e52 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:36 -0400 Subject: [PATCH 0879/1565] NFSD: Clean up unused code after rhashtable conversion [ Upstream commit 0ec8e9d1539a7b8109a554028bbce441052f847e ] Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 33 +-------------------------------- fs/nfsd/filecache.h | 1 - 2 files changed, 1 insertion(+), 33 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 45dd4f3fa090..c6dc55c0f758 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -21,11 +21,6 @@ #include "filecache.h" #include "trace.h" -#define NFSDDBG_FACILITY NFSDDBG_FH - -/* FIXME: dynamically size this for the machine somehow? */ -#define NFSD_FILE_HASH_BITS 12 -#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS) #define NFSD_LAUNDRETTE_DELAY (2 * HZ) #define NFSD_FILE_CACHE_UP (0) @@ -33,13 +28,6 @@ /* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) -struct nfsd_fcache_bucket { - struct hlist_head nfb_head; - spinlock_t nfb_lock; - unsigned int nfb_count; - unsigned int nfb_maxcount; -}; - static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); @@ -57,7 +45,6 @@ static struct workqueue_struct *nfsd_filecache_wq __read_mostly; static struct kmem_cache *nfsd_file_slab; static struct kmem_cache *nfsd_file_mark_slab; -static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; @@ -302,7 +289,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL); if (nf) { - INIT_HLIST_NODE(&nf->nf_node); INIT_LIST_HEAD(&nf->nf_lru); nf->nf_birthtime = ktime_get(); nf->nf_file = NULL; @@ -810,8 +796,7 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = { int nfsd_file_cache_init(void) { - int ret; - unsigned int i; + int ret; lockdep_assert_held(&nfsd_mutex); if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) @@ -826,13 +811,6 @@ nfsd_file_cache_init(void) if (!nfsd_filecache_wq) goto out; - nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE, - sizeof(*nfsd_file_hashtbl), GFP_KERNEL); - if (!nfsd_file_hashtbl) { - pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n"); - goto out_err; - } - nfsd_file_slab = kmem_cache_create("nfsd_file", sizeof(struct nfsd_file), 0, 0, NULL); if (!nfsd_file_slab) { @@ -876,11 +854,6 @@ nfsd_file_cache_init(void) goto out_notifier; } - for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { - INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head); - spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock); - } - INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_gc_worker); out: return ret; @@ -895,8 +868,6 @@ nfsd_file_cache_init(void) nfsd_file_slab = NULL; kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; - kvfree(nfsd_file_hashtbl); - nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; rhashtable_destroy(&nfsd_file_rhash_tbl); @@ -1026,8 +997,6 @@ nfsd_file_cache_shutdown(void) fsnotify_wait_marks_destroyed(); kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; - kvfree(nfsd_file_hashtbl); - nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; rhashtable_destroy(&nfsd_file_rhash_tbl); diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 5cbfc61a7d7d..ee9ed99d8b8f 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -30,7 +30,6 @@ struct nfsd_file_mark { */ struct nfsd_file { struct rhash_head nf_rhash; - struct hlist_node nf_node; struct list_head nf_lru; struct rcu_head nf_rcu; struct file *nf_file; From 26664203ddebce2cbadcca756cc114ea07953301 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:43 -0400 Subject: [PATCH 0880/1565] NFSD: Separate tracepoints for acquire and create [ Upstream commit be0230069fcbf7d332d010b57c1d0cfd623a84d6 ] These tracepoints collect different information: the create case does not open a file, so there's no nf_file available. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 9 ++++---- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 54 ++++++++++++++++++++++++++++++++++++++------- 3 files changed, 52 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index c6dc55c0f758..85813affb8ab 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1039,7 +1039,7 @@ nfsd_file_is_cached(struct inode *inode) } static __be32 -nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf, bool open) { struct nfsd_file_lookup_key key = { @@ -1120,7 +1120,8 @@ nfsd_do_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, out_status: put_cred(key.cred); - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + if (open) + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status; open_file: @@ -1168,7 +1169,7 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true); } /** @@ -1185,7 +1186,7 @@ __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_do_file_acquire(rqstp, fhp, may_flags, pnf, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false); } /* diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 76ec207f5e44..587e79234657 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5121,6 +5121,7 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, goto out_put_access; nf->nf_file = open->op_filp; open->op_filp = NULL; + trace_nfsd_file_create(rqstp, access, nf); } spin_lock(&fp->fi_lock); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 33bd8618c20a..ce391ba2f1ca 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -747,10 +747,10 @@ DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose); TRACE_EVENT(nfsd_file_acquire, TP_PROTO( - struct svc_rqst *rqstp, - struct inode *inode, + const struct svc_rqst *rqstp, + const struct inode *inode, unsigned int may_flags, - struct nfsd_file *nf, + const struct nfsd_file *nf, __be32 status ), @@ -758,12 +758,12 @@ TRACE_EVENT(nfsd_file_acquire, TP_STRUCT__entry( __field(u32, xid) - __field(void *, inode) + __field(const void *, inode) __field(unsigned long, may_flags) - __field(int, nf_ref) + __field(unsigned int, nf_ref) __field(unsigned long, nf_flags) __field(unsigned long, nf_may) - __field(struct file *, nf_file) + __field(const void *, nf_file) __field(u32, status) ), @@ -778,12 +778,50 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ), - TP_printk("xid=0x%x inode=%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=%p status=%u", + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p status=%u", __entry->xid, __entry->inode, show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, show_nf_flags(__entry->nf_flags), show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->status) + __entry->nf_file, __entry->status + ) +); + +TRACE_EVENT(nfsd_file_create, + TP_PROTO( + const struct svc_rqst *rqstp, + unsigned int may_flags, + const struct nfsd_file *nf + ), + + TP_ARGS(rqstp, may_flags, nf), + + TP_STRUCT__entry( + __field(const void *, nf_inode) + __field(const void *, nf_file) + __field(unsigned long, may_flags) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(unsigned int, nf_ref) + __field(u32, xid) + ), + + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_file = nf->nf_file; + __entry->may_flags = may_flags; + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->xid = be32_to_cpu(rqstp->rq_xid); + ), + + TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", + __entry->xid, __entry->nf_inode, + show_nfsd_may_flags(__entry->may_flags), + __entry->nf_ref, show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), __entry->nf_file + ) ); TRACE_EVENT(nfsd_file_insert_err, From c553e79c08038cd92b61a2d38549a4f5b865512e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:26:49 -0400 Subject: [PATCH 0881/1565] NFSD: Move nfsd_file_trace_alloc() tracepoint [ Upstream commit b40a2839470cd62ed68c4a32d72a18ee8975b1ac ] Avoid recording the allocation of an nfsd_file item that is immediately released because a matching item was already inserted in the hash. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 +- fs/nfsd/trace.h | 25 ++++++++++++++++++++++++- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 85813affb8ab..26cfae138b90 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -302,7 +302,6 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) refcount_set(&nf->nf_ref, 2); nf->nf_may = key->need; nf->nf_mark = NULL; - trace_nfsd_file_alloc(nf); } return nf; } @@ -1125,6 +1124,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, return status; open_file: + trace_nfsd_file_alloc(nf); nf->nf_mark = nfsd_file_mark_find_or_create(nf); if (nf->nf_mark) { if (open) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index ce391ba2f1ca..22c1fb735f1a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -739,12 +739,35 @@ DEFINE_EVENT(nfsd_file_class, name, \ TP_PROTO(struct nfsd_file *nf), \ TP_ARGS(nf)) -DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc); DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose); +TRACE_EVENT(nfsd_file_alloc, + TP_PROTO( + const struct nfsd_file *nf + ), + TP_ARGS(nf), + TP_STRUCT__entry( + __field(const void *, nf_inode) + __field(unsigned long, nf_flags) + __field(unsigned long, nf_may) + __field(unsigned int, nf_ref) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_flags = nf->nf_flags; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->nf_may = nf->nf_may; + ), + TP_printk("inode=%p ref=%u flags=%s may=%s", + __entry->nf_inode, __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may) + ) +); + TRACE_EVENT(nfsd_file_acquire, TP_PROTO( const struct svc_rqst *rqstp, From 451b2c2125dfa98d1a050bd149a8e10a1288ac94 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:27:02 -0400 Subject: [PATCH 0882/1565] NFSD: NFSv4 CLOSE should release an nfsd_file immediately [ Upstream commit 5e138c4a750dc140d881dab4a8804b094bbc08d2 ] The last close of a file should enable other accessors to open and use that file immediately. Leaving the file open in the filecache prevents other users from accessing that file until the filecache garbage-collects the file -- sometimes that takes several seconds. Reported-by: Wang Yugui Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387 Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 18 ++++++++++++++++++ fs/nfsd/filecache.h | 1 + fs/nfsd/nfs4state.c | 4 ++-- 3 files changed, 21 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 26cfae138b90..7ad27655db69 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -451,6 +451,24 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); } +/** + * nfsd_file_close - Close an nfsd_file + * @nf: nfsd_file to close + * + * If this is the final reference for @nf, free it immediately. + * This reflects an on-the-wire CLOSE or DELEGRETURN into the + * VFS and exported filesystem. + */ +void nfsd_file_close(struct nfsd_file *nf) +{ + nfsd_file_put(nf); + if (refcount_dec_if_one(&nf->nf_ref)) { + nfsd_file_unhash(nf); + nfsd_file_lru_remove(nf); + nfsd_file_free(nf); + } +} + struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index ee9ed99d8b8f..28145f162892 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -52,6 +52,7 @@ void nfsd_file_cache_shutdown(void); int nfsd_file_cache_start_net(struct net *net); void nfsd_file_cache_shutdown_net(struct net *net); void nfsd_file_put(struct nfsd_file *nf); +void nfsd_file_close(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 587e79234657..c4c1e35d3eb7 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -831,9 +831,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag) swap(f2, fp->fi_fds[O_RDWR]); spin_unlock(&fp->fi_lock); if (f1) - nfsd_file_put(f1); + nfsd_file_close(f1); if (f2) - nfsd_file_put(f2); + nfsd_file_close(f2); } } From 705e2cb1fec053c0354943c3c10d6d95e71c21e7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 8 Jul 2022 14:27:09 -0400 Subject: [PATCH 0883/1565] NFSD: Ensure nf_inode is never dereferenced [ Upstream commit 427f5f83a3191cbf024c5aea6e5b601cdf88d895 ] The documenting comment for struct nf_file states: /* * A representation of a file that has been opened by knfsd. These are hashed * in the hashtable by inode pointer value. Note that this object doesn't * hold a reference to the inode by itself, so the nf_inode pointer should * never be dereferenced, only used for comparison. */ Replace the two existing dereferences to make the comment always true. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 ++--- fs/nfsd/filecache.h | 2 +- fs/nfsd/nfs4state.c | 2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 7ad27655db69..55478d411e5a 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -227,12 +227,11 @@ nfsd_file_mark_put(struct nfsd_file_mark *nfm) } static struct nfsd_file_mark * -nfsd_file_mark_find_or_create(struct nfsd_file *nf) +nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) { int err; struct fsnotify_mark *mark; struct nfsd_file_mark *nfm = NULL, *new; - struct inode *inode = nf->nf_inode; do { fsnotify_group_lock(nfsd_file_fsnotify_group); @@ -1143,7 +1142,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, open_file: trace_nfsd_file_alloc(nf); - nf->nf_mark = nfsd_file_mark_find_or_create(nf); + nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); if (nf->nf_mark) { if (open) { status = nfsd_open_verified(rqstp, fhp, may_flags, diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 28145f162892..8e8c0c47d67d 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -39,7 +39,7 @@ struct nfsd_file { #define NFSD_FILE_PENDING (1) #define NFSD_FILE_REFERENCED (2) unsigned long nf_flags; - struct inode *nf_inode; + struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; unsigned char nf_may; struct nfsd_file_mark *nf_mark; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c4c1e35d3eb7..16e5bd54d92c 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2577,7 +2577,7 @@ static void nfs4_show_fname(struct seq_file *s, struct nfsd_file *f) static void nfs4_show_superblock(struct seq_file *s, struct nfsd_file *f) { - struct inode *inode = f->nf_inode; + struct inode *inode = file_inode(f->nf_file); seq_printf(s, "superblock: \"%02x:%02x:%ld\"", MAJOR(inode->i_sb->s_dev), From 4e7a739f6372422a38a69dce96413641f9d623b2 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Fri, 15 Jul 2022 16:54:51 -0700 Subject: [PATCH 0884/1565] NFSD: refactoring v4 specific code to a helper in nfs4state.c [ Upstream commit 6867137ebcf4155fe25f2ecf7c29b9fb90a76d1d ] This patch moves the v4 specific code from nfsd_init_net() to nfsd4_init_leases_net() helper in nfs4state.c Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 12 ++++++++++++ fs/nfsd/nfsctl.c | 9 +-------- fs/nfsd/nfsd.h | 4 ++++ 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 16e5bd54d92c..76a77329cf36 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4347,6 +4347,18 @@ nfsd4_init_slabs(void) return -ENOMEM; } +void nfsd4_init_leases_net(struct nfsd_net *nn) +{ + nn->nfsd4_lease = 90; /* default lease time */ + nn->nfsd4_grace = 90; + nn->somebody_reclaimed = false; + nn->track_reclaim_completes = false; + nn->clverifier_counter = prandom_u32(); + nn->clientid_base = prandom_u32(); + nn->clientid_counter = nn->clientid_base + 1; + nn->s2s_cp_cl_id = nn->clientid_counter++; +} + static void init_nfs4_replay(struct nfs4_replay *rp) { rp->rp_status = nfserr_serverfault; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 7002edbf2687..164c822ae3ae 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1484,14 +1484,7 @@ static __net_init int nfsd_init_net(struct net *net) retval = nfsd_reply_cache_init(nn); if (retval) goto out_drc_error; - nn->nfsd4_lease = 90; /* default lease time */ - nn->nfsd4_grace = 90; - nn->somebody_reclaimed = false; - nn->track_reclaim_completes = false; - nn->clverifier_counter = prandom_u32(); - nn->clientid_base = prandom_u32(); - nn->clientid_counter = nn->clientid_base + 1; - nn->s2s_cp_cl_id = nn->clientid_counter++; + nfsd4_init_leases_net(nn); get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 9a8b09afc173..ef8087691138 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -496,12 +496,16 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif +extern void nfsd4_init_leases_net(struct nfsd_net *nn); + #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) { return 0; } +static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; + #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0) From ec16f5f7faaa6e0cf602d0c6ebe678a332e9dbea Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Fri, 15 Jul 2022 16:54:52 -0700 Subject: [PATCH 0885/1565] NFSD: keep track of the number of v4 clients in the system [ Upstream commit 0926c39515aa065a296e97dfc8790026f1e53f86 ] Add counter nfs4_client_count to keep track of the total number of v4 clients, including courtesy clients, in the system. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 1b1a962a1804..ce864f001a3e 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -189,6 +189,8 @@ struct nfsd_net { struct nfsd_fcache_disposal *fcache_disposal; siphash_key_t siphash_key; + + atomic_t nfs4_client_count; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 76a77329cf36..5003d73fa928 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2066,7 +2066,8 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) * This type of memory management is somewhat inefficient, but we use it * anyway since SETCLIENTID is not a common operation. */ -static struct nfs4_client *alloc_client(struct xdr_netobj name) +static struct nfs4_client *alloc_client(struct xdr_netobj name, + struct nfsd_net *nn) { struct nfs4_client *clp; int i; @@ -2089,6 +2090,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name) atomic_set(&clp->cl_rpc_users, 0); clp->cl_cb_state = NFSD4_CB_UNKNOWN; clp->cl_state = NFSD4_ACTIVE; + atomic_inc(&nn->nfs4_client_count); atomic_set(&clp->cl_delegs_in_recall, 0); INIT_LIST_HEAD(&clp->cl_idhash); INIT_LIST_HEAD(&clp->cl_openowners); @@ -2196,6 +2198,7 @@ static __be32 mark_client_expired_locked(struct nfs4_client *clp) static void __destroy_client(struct nfs4_client *clp) { + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); int i; struct nfs4_openowner *oo; struct nfs4_delegation *dp; @@ -2239,6 +2242,7 @@ __destroy_client(struct nfs4_client *clp) nfsd4_shutdown_callback(clp); if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); + atomic_add_unless(&nn->nfs4_client_count, -1, 0); free_client(clp); wake_up_all(&expiry_wq); } @@ -2865,7 +2869,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct dentry *dentries[ARRAY_SIZE(client_files)]; - clp = alloc_client(name); + clp = alloc_client(name, nn); if (clp == NULL) return NULL; @@ -4357,6 +4361,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->clientid_base = prandom_u32(); nn->clientid_counter = nn->clientid_base + 1; nn->s2s_cp_cl_id = nn->clientid_counter++; + + atomic_set(&nn->nfs4_client_count, 0); } static void init_nfs4_replay(struct nfs4_replay *rp) From f81ab23756abfc1829f20aaf6469f76d0ed4326c Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Fri, 15 Jul 2022 16:54:53 -0700 Subject: [PATCH 0886/1565] NFSD: limit the number of v4 clients to 1024 per 1GB of system memory [ Upstream commit 4271c2c0887562318a0afef97d32d8a71cbe0743 ] Currently there is no limit on how many v4 clients are supported by the system. This can be a problem in systems with small memory configuration to function properly when a very large number of clients exist that creates memory shortage conditions. This patch enforces a limit of 1024 NFSv4 clients, including courtesy clients, per 1GB of system memory. When the number of the clients reaches the limit, requests that create new clients are returned with NFS4ERR_DELAY and the laundromat is kicked start to trim old clients. Due to the overhead of the upcall to remove the client record, the maximun number of clients the laundromat removes on each run is limited to 128. This is done to ensure the laundromat can still process the other tasks in a timely manner. Since there is now a limit of the number of clients, the 24-hr idle time limit of courtesy client is no longer needed and was removed. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4state.c | 27 +++++++++++++++++++++------ fs/nfsd/nfsd.h | 2 ++ 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ce864f001a3e..ffe17743cc74 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -191,6 +191,7 @@ struct nfsd_net { siphash_key_t siphash_key; atomic_t nfs4_client_count; + int nfs4_max_clients; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 5003d73fa928..1babc08fcb88 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2072,6 +2072,10 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, struct nfs4_client *clp; int i; + if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { + mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); + return NULL; + } clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); if (clp == NULL) return NULL; @@ -4353,6 +4357,9 @@ nfsd4_init_slabs(void) void nfsd4_init_leases_net(struct nfsd_net *nn) { + struct sysinfo si; + u64 max_clients; + nn->nfsd4_lease = 90; /* default lease time */ nn->nfsd4_grace = 90; nn->somebody_reclaimed = false; @@ -4363,6 +4370,10 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->s2s_cp_cl_id = nn->clientid_counter++; atomic_set(&nn->nfs4_client_count, 0); + si_meminfo(&si); + max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); + max_clients *= NFS4_CLIENTS_PER_GB; + nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5828,9 +5839,12 @@ static void nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, struct laundry_time *lt) { + unsigned int maxreap, reapcnt = 0; struct list_head *pos, *next; struct nfs4_client *clp; + maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? + NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; INIT_LIST_HEAD(reaplist); spin_lock(&nn->client_lock); list_for_each_safe(pos, next, &nn->client_lru) { @@ -5841,14 +5855,15 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, break; if (!atomic_read(&clp->cl_rpc_users)) clp->cl_state = NFSD4_COURTESY; - if (!client_has_state(clp) || - ktime_get_boottime_seconds() >= - (clp->cl_time + NFSD_COURTESY_CLIENT_TIMEOUT)) + if (!client_has_state(clp)) goto exp_client; - if (nfs4_anylock_blockers(clp)) { + if (!nfs4_anylock_blockers(clp)) + if (reapcnt >= maxreap) + continue; exp_client: - if (!mark_client_expired_locked(clp)) - list_add(&clp->cl_lru, reaplist); + if (!mark_client_expired_locked(clp)) { + list_add(&clp->cl_lru, reaplist); + reapcnt++; } } spin_unlock(&nn->client_lock); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index ef8087691138..57a468ed85c3 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -341,6 +341,8 @@ void nfsd_lockd_shutdown(void); #define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ #define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ +#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 +#define NFS4_CLIENTS_PER_GB 1024 /* * The following attributes are currently not supported by the NFSv4 server: From 3b5dcf6b46d99801049240111f78db0d3435c265 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 20 Jul 2022 08:39:23 -0400 Subject: [PATCH 0887/1565] nfsd: silence extraneous printk on nfsd.ko insertion [ Upstream commit 3a5940bfa17fb9964bf9688b4356ca643a8f5e2d ] This printk pops every time nfsd.ko gets plugged in. Most kmods don't do that and this one is not very informative. Olaf's email address seems to be defunct at this point anyway. Just drop it. Cc: Olaf Kirch Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 164c822ae3ae..917fa1892fd2 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1519,7 +1519,6 @@ static struct pernet_operations nfsd_net_ops = { static int __init init_nfsd(void) { int retval; - printk(KERN_INFO "Installing knfsd (copyright (C) 1996 okir@monad.swb.de).\n"); retval = nfsd4_init_slabs(); if (retval) From d8c3d704085cf197f95289029bd927e4c5fa9163 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:08:38 -0400 Subject: [PATCH 0888/1565] NFSD: Optimize nfsd4_encode_operation() [ Upstream commit 095a764b7afb06c9499b798c04eaa3cbf70ebe2d ] write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer. However, it is costly, and here, it is unnecessary because the data item is fixed in size, the buffer destination address is always word-aligned, and the destination location is already in @p. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b98a24c2a753..14e8e3755060 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5381,8 +5381,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) so->so_replay.rp_buf, len); } status: - /* Note that op->status is already in network byte order: */ - write_bytes_to_xdr_buf(xdr->buf, post_err_offset - 4, &op->status, 4); + *p = op->status; } /* From 8fd87bf897bcb3af491f0d6ad9586d8f041db679 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:08:45 -0400 Subject: [PATCH 0889/1565] NFSD: Optimize nfsd4_encode_fattr() [ Upstream commit ab04de60ae1cc64ae16b77feae795311b97720c7 ] write_bytes_to_xdr_buf() is a generic way to place a variable-length data item in an already-reserved spot in the encoding buffer. However, it is costly. In nfsd4_encode_fattr(), it is unnecessary because the data item is fixed in size and the buffer destination address is always word-aligned. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 14e8e3755060..1e388cdf9b00 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2828,10 +2828,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct kstat stat; struct svc_fh *tempfh = NULL; struct kstatfs statfs; - __be32 *p; + __be32 *p, *attrlen_p; int starting_len = xdr->buf->len; int attrlen_offset; - __be32 attrlen; u32 dummy; u64 dummy64; u32 rdattr_err = 0; @@ -2919,10 +2918,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out; attrlen_offset = xdr->buf->len; - p = xdr_reserve_space(xdr, 4); - if (!p) + attrlen_p = xdr_reserve_space(xdr, XDR_UNIT); + if (!attrlen_p) goto out_resource; - p++; /* to be backfilled later */ if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) { u32 supp[3]; @@ -3344,8 +3342,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, *p++ = cpu_to_be32(err == 0); } - attrlen = htonl(xdr->buf->len - attrlen_offset - 4); - write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, 4); + *attrlen_p = cpu_to_be32(xdr->buf->len - attrlen_offset - XDR_UNIT); status = nfs_ok; out: From 427bd174a4d345bd0e8f9d9b34a45fdd073d795b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:08:51 -0400 Subject: [PATCH 0890/1565] NFSD: Clean up SPLICE_OK in nfsd4_encode_read() [ Upstream commit c738b218a2e5a753a336b4b7fee6720b902c7ace ] Do the test_bit() once -- this reduces the number of locked-bus operations and makes the function a little easier to read. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 1e388cdf9b00..059e920c2191 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3991,6 +3991,7 @@ static __be32 nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { + bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); unsigned long maxcount; struct xdr_stream *xdr = resp->xdr; struct file *file; @@ -4003,11 +4004,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */ if (!p) { - WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)); + WARN_ON_ONCE(splice_ok); return nfserr_resource; } - if (resp->xdr->buf->page_len && - test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { + if (resp->xdr->buf->page_len && splice_ok) { WARN_ON_ONCE(1); return nfserr_serverfault; } @@ -4016,8 +4016,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, maxcount = min_t(unsigned long, read->rd_length, (xdr->buf->buflen - xdr->buf->len)); - if (file->f_op->splice_read && - test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) + if (file->f_op->splice_read && splice_ok) nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); else nfserr = nfsd4_encode_readv(resp, read, file, maxcount); From d176e7348bd05822aee549e3e99326eef6940a86 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:08:57 -0400 Subject: [PATCH 0891/1565] NFSD: Add an nfsd4_read::rd_eof field [ Upstream commit 24c7fb85498eda1d4c6b42cc4886328429814990 ] Refactor: Make the EOF result available in the entire NFSv4 READ path. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +++++------ fs/nfsd/xdr4.h | 5 +++-- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 059e920c2191..8437a390480d 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3890,7 +3890,6 @@ static __be32 nfsd4_encode_splice_read( struct xdr_stream *xdr = resp->xdr; struct xdr_buf *buf = xdr->buf; int status, space_left; - u32 eof; __be32 nfserr; __be32 *p = xdr->p - 2; @@ -3899,7 +3898,8 @@ static __be32 nfsd4_encode_splice_read( return nfserr_resource; nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp, - file, read->rd_offset, &maxcount, &eof); + file, read->rd_offset, &maxcount, + &read->rd_eof); read->rd_length = maxcount; if (nfserr) goto out_err; @@ -3910,7 +3910,7 @@ static __be32 nfsd4_encode_splice_read( goto out_err; } - *(p++) = htonl(eof); + *(p++) = htonl(read->rd_eof); *(p++) = htonl(maxcount); buf->page_len = maxcount; @@ -3954,7 +3954,6 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct file *file, unsigned long maxcount) { struct xdr_stream *xdr = resp->xdr; - u32 eof; int starting_len = xdr->buf->len - 8; __be32 nfserr; __be32 tmp; @@ -3966,7 +3965,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen, &maxcount, - &eof); + &read->rd_eof); read->rd_length = maxcount; if (nfserr) return nfserr; @@ -3974,7 +3973,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); - tmp = htonl(eof); + tmp = htonl(read->rd_eof); write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4); tmp = htonl(maxcount); write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 448b687943cd..32617639a3ec 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -302,9 +302,10 @@ struct nfsd4_read { u32 rd_length; /* request */ int rd_vlen; struct nfsd_file *rd_nf; - + struct svc_rqst *rd_rqstp; /* response */ - struct svc_fh *rd_fhp; /* response */ + struct svc_fh *rd_fhp; /* response */ + u32 rd_eof; /* response */ }; struct nfsd4_readdir { From e77d3f5ee50fac3d3b1e31705af68fc2a8da24fb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:09:04 -0400 Subject: [PATCH 0892/1565] NFSD: Optimize nfsd4_encode_readv() [ Upstream commit 28d5bc468efe74b790e052f758ce083a5015c665 ] write_bytes_to_xdr_buf() is pretty expensive to use for inserting an XDR data item that is always 1 XDR_UNIT at an address that is always XDR word-aligned. Since both the readv and splice read paths encode EOF and maxcount values, move both to a common code path. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8437a390480d..36f0f06714de 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3891,7 +3891,6 @@ static __be32 nfsd4_encode_splice_read( struct xdr_buf *buf = xdr->buf; int status, space_left; __be32 nfserr; - __be32 *p = xdr->p - 2; /* Make sure there will be room for padding if needed */ if (xdr->end - xdr->p < 1) @@ -3910,9 +3909,6 @@ static __be32 nfsd4_encode_splice_read( goto out_err; } - *(p++) = htonl(read->rd_eof); - *(p++) = htonl(maxcount); - buf->page_len = maxcount; buf->len += maxcount; xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1) @@ -3973,11 +3969,6 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); - tmp = htonl(read->rd_eof); - write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4); - tmp = htonl(maxcount); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); - tmp = xdr_zero; pad = (maxcount&3) ? 4 - (maxcount&3) : 0; write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount, @@ -4019,11 +4010,14 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); else nfserr = nfsd4_encode_readv(resp, read, file, maxcount); - - if (nfserr) + if (nfserr) { xdr_truncate_encode(xdr, starting_len); + return nfserr; + } - return nfserr; + p = xdr_encode_bool(p, read->rd_eof); + *p = cpu_to_be32(read->rd_length); + return nfs_ok; } static __be32 From c587004a7634d02d1aec63cd94c50a1a8bcd8347 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:09:10 -0400 Subject: [PATCH 0893/1565] NFSD: Simplify starting_len [ Upstream commit 071ae99feadfc55979f89287d6ad2c6a315cb46d ] Clean-up: Now that nfsd4_encode_readv() does not have to encode the EOF or rd_length values, it no longer needs to subtract 8 from @starting_len. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 36f0f06714de..f67a54f7eb13 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3950,7 +3950,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct file *file, unsigned long maxcount) { struct xdr_stream *xdr = resp->xdr; - int starting_len = xdr->buf->len - 8; + unsigned int starting_len = xdr->buf->len; __be32 nfserr; __be32 tmp; int pad; @@ -3965,14 +3965,13 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, read->rd_length = maxcount; if (nfserr) return nfserr; - if (svc_encode_result_payload(resp->rqstp, starting_len + 8, maxcount)) + if (svc_encode_result_payload(resp->rqstp, starting_len, maxcount)) return nfserr_io; - xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); + xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount)); tmp = xdr_zero; pad = (maxcount&3) ? 4 - (maxcount&3) : 0; - write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount, - &tmp, pad); + write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &tmp, pad); return 0; } From c7863472e57e19570b5fbae08f88b270f45cafd9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:09:16 -0400 Subject: [PATCH 0894/1565] NFSD: Use xdr_pad_size() [ Upstream commit 5e64d85c7d0c59cfcd61d899720b8ccfe895d743 ] Clean up: Use a helper instead of open-coding the calculation of the XDR pad size. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index f67a54f7eb13..4d74eb1fee8f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3951,9 +3951,8 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, { struct xdr_stream *xdr = resp->xdr; unsigned int starting_len = xdr->buf->len; + __be32 zero = xdr_zero; __be32 nfserr; - __be32 tmp; - int pad; read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount); if (read->rd_vlen < 0) @@ -3969,11 +3968,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, return nfserr_io; xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount)); - tmp = xdr_zero; - pad = (maxcount&3) ? 4 - (maxcount&3) : 0; - write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &tmp, pad); - return 0; - + write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &zero, + xdr_pad_size(maxcount)); + return nfs_ok; } static __be32 From 0a1b9a216f7f3f19cd85ebf4f812cc4bfc9e583b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 22 Jul 2022 16:09:23 -0400 Subject: [PATCH 0895/1565] NFSD: Clean up nfsd4_encode_readlink() [ Upstream commit 99b002a1fa00d90e66357315757e7277447ce973 ] Similar changes to nfsd4_encode_readv(), all bundled into a single patch. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 4d74eb1fee8f..a98513cb35b1 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4019,16 +4019,13 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink) { - int maxcount; - __be32 wire_count; - int zero = 0; + __be32 *p, *maxcount_p, zero = xdr_zero; struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; - int status; - __be32 *p; + int maxcount, status; - p = xdr_reserve_space(xdr, 4); - if (!p) + maxcount_p = xdr_reserve_space(xdr, XDR_UNIT); + if (!maxcount_p) return nfserr_resource; maxcount = PAGE_SIZE; @@ -4053,14 +4050,11 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd nfserr = nfserrno(status); goto out_err; } - - wire_count = htonl(maxcount); - write_bytes_to_xdr_buf(xdr->buf, length_offset, &wire_count, 4); - xdr_truncate_encode(xdr, length_offset + 4 + ALIGN(maxcount, 4)); - if (maxcount & 3) - write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, - &zero, 4 - (maxcount&3)); - return 0; + *maxcount_p = cpu_to_be32(maxcount); + xdr_truncate_encode(xdr, length_offset + 4 + xdr_align_size(maxcount)); + write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, &zero, + xdr_pad_size(maxcount)); + return nfs_ok; out_err: xdr_truncate_encode(xdr, length_offset); From 8ce03085cc536770516af77a5ac84b8f171ead99 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:03 -0400 Subject: [PATCH 0896/1565] NFSD: Fix strncpy() fortify warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 5304877936c0a67e1a01464d113bae4c81eacdb6 ] In function ‘strncpy’, inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3, inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11: /home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation] 52 | #define __underlying_strncpy __builtin_strncpy | ^ /home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’ 89 | return __underlying_strncpy(p, q, size); | ^~~~~~~~~~~~~~~~~~~~ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- include/linux/nfs_ssc.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 08c2eaca4f24..1b49b4e2803c 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1391,7 +1391,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, return 0; } if (work) { - strncpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr)); + strlcpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 222ae8883e85..75843c00f326 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -64,7 +64,7 @@ struct nfsd4_ssc_umount_item { refcount_t nsui_refcnt; unsigned long nsui_expire; struct vfsmount *nsui_vfsmount; - char nsui_ipaddr[RPC_MAX_ADDRBUFLEN]; + char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; }; #endif From 02bc4d514c2553ca2461102bb960635e4b844593 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:09 -0400 Subject: [PATCH 0897/1565] NFSD: nfserrno(-ENOMEM) is nfserr_jukebox [ Upstream commit bb4d842722b84a2731257054b6405f2d866fc5f3 ] Suggested-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a98513cb35b1..fcf56e396e66 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1810,7 +1810,7 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta for (i = 0; i < test_stateid->ts_num_ids; i++) { stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); if (!stateid) - return nfserrno(-ENOMEM); /* XXX: not jukebox? */ + return nfserr_jukebox; INIT_LIST_HEAD(&stateid->ts_id_list); list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); status = nfsd4_decode_stateid4(argp, &stateid->ts_id_stateid); @@ -1933,7 +1933,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL); if (ns_dummy == NULL) - return nfserrno(-ENOMEM); /* XXX: jukebox? */ + return nfserr_jukebox; for (i = 0; i < count - 1; i++) { status = nfsd4_decode_nl4_server(argp, ns_dummy); if (status) { From 7c6fd14057a7dab2032aa617870a49f5b05b867e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:16 -0400 Subject: [PATCH 0898/1565] NFSD: Shrink size of struct nfsd4_copy_notify [ Upstream commit 09426ef2a64ee189ca1e3298f1e874842dbf35ea ] struct nfsd4_copy_notify is part of struct nfsd4_op, which resides in an 8-element array. sizeof(struct nfsd4_op): Before: /* size: 2208, cachelines: 35, members: 5 */ After: /* size: 1696, cachelines: 27, members: 5 */ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 ++-- fs/nfsd/nfs4xdr.c | 12 ++++++++++-- fs/nfsd/xdr4.h | 4 ++-- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1b49b4e2803c..3ed9c36bc407 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1945,9 +1945,9 @@ nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, /* For now, only return one server address in cpn_src, the * address used by the client to connect to this server. */ - cn->cpn_src.nl4_type = NL4_NETADDR; + cn->cpn_src->nl4_type = NL4_NETADDR; status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr, - &cn->cpn_src.u.nl4_addr); + &cn->cpn_src->u.nl4_addr); WARN_ON_ONCE(status); if (status) { nfs4_put_cpntf_state(nn, cps); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fcf56e396e66..537c358fec98 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1952,10 +1952,17 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, { __be32 status; + cn->cpn_src = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_src)); + if (cn->cpn_src == NULL) + return nfserr_jukebox; + cn->cpn_dst = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_dst)); + if (cn->cpn_dst == NULL) + return nfserr_jukebox; + status = nfsd4_decode_stateid4(argp, &cn->cpn_src_stateid); if (status) return status; - return nfsd4_decode_nl4_server(argp, &cn->cpn_dst); + return nfsd4_decode_nl4_server(argp, cn->cpn_dst); } static __be32 @@ -4906,7 +4913,8 @@ nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, *p++ = cpu_to_be32(1); - return nfsd42_encode_nl4_server(resp, &cn->cpn_src); + nfserr = nfsd42_encode_nl4_server(resp, cn->cpn_src); + return nfserr; } static __be32 diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 32617639a3ec..27f9300fadbe 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -595,13 +595,13 @@ struct nfsd4_offload_status { struct nfsd4_copy_notify { /* request */ stateid_t cpn_src_stateid; - struct nl4_server cpn_dst; + struct nl4_server *cpn_dst; /* response */ stateid_t cpn_cnr_stateid; u64 cpn_sec; u32 cpn_nsec; - struct nl4_server cpn_src; + struct nl4_server *cpn_src; }; struct nfsd4_op { From 94fd87568e91dad8641f4e648b1bfd782652a245 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:22 -0400 Subject: [PATCH 0899/1565] NFSD: Shrink size of struct nfsd4_copy [ Upstream commit 87689df694916c40e8e6c179ab1c8710f65cb6c6 ] struct nfsd4_copy is part of struct nfsd4_op, which resides in an 8-element array. sizeof(struct nfsd4_op): Before: /* size: 1696, cachelines: 27, members: 5 */ After: /* size: 672, cachelines: 11, members: 5 */ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 8 ++++++-- fs/nfsd/nfs4xdr.c | 5 ++++- fs/nfsd/xdr4.h | 2 +- 3 files changed, 11 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 3ed9c36bc407..7fb1ef7c4383 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1291,6 +1291,7 @@ void nfs4_put_copy(struct nfsd4_copy *copy) { if (!refcount_dec_and_test(©->refcount)) return; + kfree(copy->cp_src); kfree(copy); } @@ -1544,7 +1545,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, if (status) goto out; - status = nfsd4_interssc_connect(©->cp_src, rqstp, mount); + status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount); if (status) goto out; @@ -1751,7 +1752,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->nf_src = nfsd_file_get(src->nf_src); memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid)); - memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server)); + memcpy(dst->cp_src, src->cp_src, sizeof(struct nl4_server)); memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid)); memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh)); dst->ss_mnt = src->ss_mnt; @@ -1845,6 +1846,9 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; + async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); + if (!async_copy->cp_src) + goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; refcount_set(&async_copy->refcount, 1); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 537c358fec98..890f1009bd4c 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1920,6 +1920,9 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) if (xdr_stream_decode_u32(argp->xdr, &count) < 0) return nfserr_bad_xdr; + copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src)); + if (copy->cp_src == NULL) + return nfserr_jukebox; copy->cp_intra = false; if (count == 0) { /* intra-server copy */ copy->cp_intra = true; @@ -1927,7 +1930,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) } /* decode all the supplied server addresses but use only the first */ - status = nfsd4_decode_nl4_server(argp, ©->cp_src); + status = nfsd4_decode_nl4_server(argp, copy->cp_src); if (status) return status; diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 27f9300fadbe..c44c76cef40c 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -540,7 +540,7 @@ struct nfsd4_copy { u64 cp_src_pos; u64 cp_dst_pos; u64 cp_count; - struct nl4_server cp_src; + struct nl4_server *cp_src; bool cp_intra; /* both */ From 0d7e3df76b50276db781ce7a2a09df74b2147a61 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:28 -0400 Subject: [PATCH 0900/1565] NFSD: Reorder the fields in struct nfsd4_op [ Upstream commit d314309425ad5dc1b6facdb2d456580fb5fa5e3a ] Pack the fields to reduce the size of struct nfsd4_op, which is used an array in struct nfsd4_compoundargs. sizeof(struct nfsd4_op): Before: /* size: 672, cachelines: 11, members: 5 */ After: /* size: 640, cachelines: 10, members: 5 */ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/xdr4.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index c44c76cef40c..621937fae9ac 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -606,8 +606,9 @@ struct nfsd4_copy_notify { struct nfsd4_op { u32 opnum; - const struct nfsd4_operation * opdesc; __be32 status; + const struct nfsd4_operation *opdesc; + struct nfs4_replay *replay; union nfsd4_op_u { struct nfsd4_access access; struct nfsd4_close close; @@ -671,7 +672,6 @@ struct nfsd4_op { struct nfsd4_listxattrs listxattrs; struct nfsd4_removexattr removexattr; } u; - struct nfs4_replay * replay; }; bool nfsd4_cache_this_op(struct nfsd4_op *); From 627b896c5219ed1f973a594fd9c0bebcc4f0d89f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:35 -0400 Subject: [PATCH 0901/1565] NFSD: Make nfs4_put_copy() static [ Upstream commit 8ea6e2c90bb0eb74a595a12e23a1dff9abbc760a ] Clean up: All call sites are in fs/nfsd/nfs4proc.c. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/state.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 7fb1ef7c4383..64879350ccbd 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1287,7 +1287,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; } -void nfs4_put_copy(struct nfsd4_copy *copy) +static void nfs4_put_copy(struct nfsd4_copy *copy) { if (!refcount_dec_and_test(©->refcount)) return; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index f3d6313914ed..ae596dbf8667 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -703,7 +703,6 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn); void put_nfs4_file(struct nfs4_file *fi); -extern void nfs4_put_copy(struct nfsd4_copy *copy); extern struct nfsd4_copy * find_async_copy(struct nfs4_client *clp, stateid_t *staetid); extern void nfs4_put_cpntf_state(struct nfsd_net *nn, From ac774e1eebe8cc93a29530252cbc97a4c21fab23 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:41 -0400 Subject: [PATCH 0902/1565] NFSD: Replace boolean fields in struct nfsd4_copy [ Upstream commit 1913cdf56cb5bfbc8170873728d13598cbecda23 ] Clean up: saves 8 bytes, and we can replace check_and_set_stop_copy() with an atomic bitop. [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 49 +++++++++++++++++----------------------------- fs/nfsd/nfs4xdr.c | 12 ++++++------ fs/nfsd/xdr4.h | 33 ++++++++++++++++++++++++++----- 3 files changed, 52 insertions(+), 42 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 64879350ccbd..a4bc096e509d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1295,23 +1295,9 @@ static void nfs4_put_copy(struct nfsd4_copy *copy) kfree(copy); } -static bool -check_and_set_stop_copy(struct nfsd4_copy *copy) -{ - bool value; - - spin_lock(©->cp_clp->async_lock); - value = copy->stopped; - if (!copy->stopped) - copy->stopped = true; - spin_unlock(©->cp_clp->async_lock); - return value; -} - static void nfsd4_stop_copy(struct nfsd4_copy *copy) { - /* only 1 thread should stop the copy */ - if (!check_and_set_stop_copy(copy)) + if (!test_and_set_bit(NFSD4_COPY_F_STOPPED, ©->cp_flags)) kthread_stop(copy->copy_task); nfs4_put_copy(copy); } @@ -1668,8 +1654,9 @@ static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = { static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) { copy->cp_res.wr_stable_how = - copy->committed ? NFS_FILE_SYNC : NFS_UNSTABLE; - copy->cp_synchronous = sync; + test_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags) ? + NFS_FILE_SYNC : NFS_UNSTABLE; + nfsd4_copy_set_sync(copy, sync); gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); } @@ -1698,16 +1685,16 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) copy->cp_res.wr_bytes_written += bytes_copied; src_pos += bytes_copied; dst_pos += bytes_copied; - } while (bytes_total > 0 && !copy->cp_synchronous); + } while (bytes_total > 0 && nfsd4_copy_is_async(copy)); /* for a non-zero asynchronous copy do a commit of data */ - if (!copy->cp_synchronous && copy->cp_res.wr_bytes_written > 0) { + if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) { since = READ_ONCE(dst->f_wb_err); status = vfs_fsync_range(dst, copy->cp_dst_pos, copy->cp_res.wr_bytes_written, 0); if (!status) status = filemap_check_wb_err(dst->f_mapping, since); if (!status) - copy->committed = true; + set_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags); } return bytes_copied; } @@ -1728,7 +1715,7 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) status = nfs_ok; } - if (!copy->cp_intra) /* Inter server SSC */ + if (nfsd4_ssc_is_inter(copy)) nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src, copy->nf_dst); else @@ -1742,13 +1729,13 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->cp_src_pos = src->cp_src_pos; dst->cp_dst_pos = src->cp_dst_pos; dst->cp_count = src->cp_count; - dst->cp_synchronous = src->cp_synchronous; + dst->cp_flags = src->cp_flags; memcpy(&dst->cp_res, &src->cp_res, sizeof(src->cp_res)); memcpy(&dst->fh, &src->fh, sizeof(src->fh)); dst->cp_clp = src->cp_clp; dst->nf_dst = nfsd_file_get(src->nf_dst); - dst->cp_intra = src->cp_intra; - if (src->cp_intra) /* for inter, file_src doesn't exist yet */ + /* for inter, nf_src doesn't exist yet */ + if (!nfsd4_ssc_is_inter(src)) dst->nf_src = nfsd_file_get(src->nf_src); memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid)); @@ -1762,7 +1749,7 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); nfsd_file_put(copy->nf_dst); - if (copy->cp_intra) + if (!nfsd4_ssc_is_inter(copy)) nfsd_file_put(copy->nf_src); spin_lock(©->cp_clp->async_lock); list_del(©->copies); @@ -1775,7 +1762,7 @@ static int nfsd4_do_async_copy(void *data) struct nfsd4_copy *copy = (struct nfsd4_copy *)data; struct nfsd4_copy *cb_copy; - if (!copy->cp_intra) { /* Inter server SSC */ + if (nfsd4_ssc_is_inter(copy)) { copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL); if (!copy->nf_src) { copy->nfserr = nfserr_serverfault; @@ -1807,7 +1794,7 @@ static int nfsd4_do_async_copy(void *data) ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: - if (!copy->cp_intra) + if (nfsd4_ssc_is_inter(copy)) kfree(copy->nf_src); cleanup_async_copy(copy); return 0; @@ -1821,8 +1808,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status; struct nfsd4_copy *async_copy = NULL; - if (!copy->cp_intra) { /* Inter server SSC */ - if (!inter_copy_offload_enable || copy->cp_synchronous) { + if (nfsd4_ssc_is_inter(copy)) { + if (!inter_copy_offload_enable || nfsd4_copy_is_sync(copy)) { status = nfserr_notsupp; goto out; } @@ -1839,7 +1826,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, copy->cp_clp = cstate->clp; memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); - if (!copy->cp_synchronous) { + if (nfsd4_copy_is_async(copy)) { struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); status = nfserrno(-ENOMEM); @@ -2605,7 +2592,7 @@ check_if_stalefh_allowed(struct nfsd4_compoundargs *args) return; } putfh = (struct nfsd4_putfh *)&saved_op->u; - if (!copy->cp_intra) + if (nfsd4_ssc_is_inter(copy)) putfh->no_verify = true; } } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 890f1009bd4c..5476541530ea 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1896,8 +1896,8 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) { + u32 consecutive, i, count, sync; struct nl4_server *ns_dummy; - u32 consecutive, i, count; __be32 status; status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); @@ -1915,17 +1915,17 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) /* ca_consecutive: we always do consecutive copies */ if (xdr_stream_decode_u32(argp->xdr, &consecutive) < 0) return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, ©->cp_synchronous) < 0) + if (xdr_stream_decode_bool(argp->xdr, &sync) < 0) return nfserr_bad_xdr; + nfsd4_copy_set_sync(copy, sync); if (xdr_stream_decode_u32(argp->xdr, &count) < 0) return nfserr_bad_xdr; copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src)); if (copy->cp_src == NULL) return nfserr_jukebox; - copy->cp_intra = false; if (count == 0) { /* intra-server copy */ - copy->cp_intra = true; + __set_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); return nfs_ok; } @@ -4712,13 +4712,13 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, __be32 *p; nfserr = nfsd42_encode_write_res(resp, ©->cp_res, - !!copy->cp_synchronous); + nfsd4_copy_is_sync(copy)); if (nfserr) return nfserr; p = xdr_reserve_space(resp->xdr, 4 + 4); *p++ = xdr_one; /* cr_consecutive */ - *p++ = cpu_to_be32(copy->cp_synchronous); + *p = nfsd4_copy_is_sync(copy) ? xdr_one : xdr_zero; return 0; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 621937fae9ac..1f6ac92bcf85 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -541,10 +541,12 @@ struct nfsd4_copy { u64 cp_dst_pos; u64 cp_count; struct nl4_server *cp_src; - bool cp_intra; - /* both */ - u32 cp_synchronous; + unsigned long cp_flags; +#define NFSD4_COPY_F_STOPPED (0) +#define NFSD4_COPY_F_INTRA (1) +#define NFSD4_COPY_F_SYNCHRONOUS (2) +#define NFSD4_COPY_F_COMMITTED (3) /* response */ struct nfsd42_write_res cp_res; @@ -564,14 +566,35 @@ struct nfsd4_copy { struct list_head copies; struct task_struct *copy_task; refcount_t refcount; - bool stopped; struct vfsmount *ss_mnt; struct nfs_fh c_fh; nfs4_stateid stateid; - bool committed; }; +static inline void nfsd4_copy_set_sync(struct nfsd4_copy *copy, bool sync) +{ + if (sync) + set_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); + else + clear_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_copy_is_sync(const struct nfsd4_copy *copy) +{ + return test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_copy_is_async(const struct nfsd4_copy *copy) +{ + return !test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); +} + +static inline bool nfsd4_ssc_is_inter(const struct nfsd4_copy *copy) +{ + return !test_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); +} + struct nfsd4_seek { /* request */ stateid_t seek_stateid; From 0d592d96d6c60de28ff8877e75d55ea39b29f33f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:47 -0400 Subject: [PATCH 0903/1565] NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) [ Upstream commit 24d796ea383b8a4c8234e06d1b14bbcd371192ea ] The @src parameter is sometimes a pointer to a struct nfsd_file and sometimes a pointer to struct file hiding in a phony struct nfsd_file. Refactor nfsd4_cleanup_inter_ssc() so the @src parameter is always an explicit struct file. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a4bc096e509d..d39150425da8 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1549,7 +1549,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, } static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, struct nfsd_file *dst) { bool found = false; @@ -1558,9 +1558,9 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); - nfs42_ssc_close(src->nf_file); + nfs42_ssc_close(filp); nfsd_file_put(dst); - fput(src->nf_file); + fput(filp); if (!nn) { mntput(ss_mnt); @@ -1603,7 +1603,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, } static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, struct nfsd_file *dst) { } @@ -1716,7 +1716,7 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) } if (nfsd4_ssc_is_inter(copy)) - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src, + nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, copy->nf_dst); else nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); From 8153ed38cc9d43ccb40d5bc4dcd3046e9f3f5be7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:53 -0400 Subject: [PATCH 0904/1565] NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) [ Upstream commit 478ed7b10d875da2743d1a22822b9f8a82df8f12 ] Move the nfsd4_cleanup_*() call sites out of nfsd4_do_copy(). A subsequent patch will modify one of the new call sites to avoid the need to manufacture the phony struct nfsd_file. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index d39150425da8..5d05bb7a0c0f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1714,13 +1714,6 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) nfsd4_init_copy_res(copy, sync); status = nfs_ok; } - - if (nfsd4_ssc_is_inter(copy)) - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, - copy->nf_dst); - else - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); - return status; } @@ -1776,9 +1769,14 @@ static int nfsd4_do_async_copy(void *data) /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } + copy->nfserr = nfsd4_do_copy(copy, 0); + nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, + copy->nf_dst); + } else { + copy->nfserr = nfsd4_do_copy(copy, 0); + nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } - copy->nfserr = nfsd4_do_copy(copy, 0); do_callback: cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!cb_copy) @@ -1854,6 +1852,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs_ok; } else { status = nfsd4_do_copy(copy, 1); + nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out: return status; From a860bd179e7a0483f4d4ad3df57ffe5f7775afc9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:59 -0400 Subject: [PATCH 0905/1565] NFSD: Refactor nfsd4_do_copy() [ Upstream commit 3b7bf5933cada732783554edf0dc61283551c6cf ] Refactor: Now that nfsd4_do_copy() no longer calls the cleanup helpers, plumb the use of struct file pointers all the way down to _nfsd_copy_file_range(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5d05bb7a0c0f..16f968c165c9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1660,10 +1660,10 @@ static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); } -static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) +static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, + struct file *dst, + struct file *src) { - struct file *dst = copy->nf_dst->nf_file; - struct file *src = copy->nf_src->nf_file; errseq_t since; ssize_t bytes_copied = 0; u64 bytes_total = copy->cp_count; @@ -1699,12 +1699,15 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) return bytes_copied; } -static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) +static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, + struct file *src, struct file *dst, + bool sync) { __be32 status; ssize_t bytes; - bytes = _nfsd_copy_file_range(copy); + bytes = _nfsd_copy_file_range(copy, dst, src); + /* for async copy, we ignore the error, client can always retry * to get the error */ @@ -1769,11 +1772,13 @@ static int nfsd4_do_async_copy(void *data) /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, 0); + copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, copy->nf_dst); } else { - copy->nfserr = nfsd4_do_copy(copy, 0); + copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } @@ -1851,7 +1856,8 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, wake_up_process(async_copy->copy_task); status = nfs_ok; } else { - status = nfsd4_do_copy(copy, 1); + status = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, true); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out: From d87486acbd6eef1d5ebc997d70115007594b7f6d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:41:06 -0400 Subject: [PATCH 0906/1565] NFSD: Remove kmalloc from nfsd4_do_async_copy() [ Upstream commit ad1e46c9b07b13659635ee5405f83ad0df143116 ] Instead of manufacturing a phony struct nfsd_file, pass the struct file returned by nfs42_ssc_open() directly to nfsd4_do_copy(). [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 16f968c165c9..dbc507c9aa11 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1753,29 +1753,31 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); } +/** + * nfsd4_do_async_copy - kthread function for background server-side COPY + * @data: arguments for COPY operation + * + * Return values: + * %0: Copy operation is done. + */ static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; struct nfsd4_copy *cb_copy; if (nfsd4_ssc_is_inter(copy)) { - copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL); - if (!copy->nf_src) { - copy->nfserr = nfserr_serverfault; - /* ss_mnt will be unmounted by the laundromat */ - goto do_callback; - } - copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, - ©->stateid); - if (IS_ERR(copy->nf_src->nf_file)) { + struct file *filp; + + filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, + ©->stateid); + if (IS_ERR(filp)) { copy->nfserr = nfserr_offload_denied; /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, false); - nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src->nf_file, - copy->nf_dst); + nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); } else { copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); @@ -1797,8 +1799,6 @@ static int nfsd4_do_async_copy(void *data) ©->fh, copy->cp_count, copy->nfserr); nfsd4_run_cb(&cb_copy->cp_cb); out: - if (nfsd4_ssc_is_inter(copy)) - kfree(copy->nf_src); cleanup_async_copy(copy); return 0; } From e1d1b6574e7b7bb593b1c612379f6632fa90ea30 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:41:12 -0400 Subject: [PATCH 0907/1565] NFSD: Add nfsd4_send_cb_offload() [ Upstream commit e72f9bc006c08841c46d27747a4debc747a8fe13 ] Refactor for legibility. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 37 ++++++++++++++++++++++--------------- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index dbc507c9aa11..332fd1d0b188 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1753,6 +1753,27 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); } +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) +{ + struct nfsd4_copy *cb_copy; + + cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); + if (!cb_copy) + return; + + refcount_set(&cb_copy->refcount, 1); + memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); + cb_copy->cp_clp = copy->cp_clp; + cb_copy->nfserr = copy->nfserr; + memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); + + nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, + &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, + ©->fh, copy->cp_count, copy->nfserr); + nfsd4_run_cb(&cb_copy->cp_cb); +} + /** * nfsd4_do_async_copy - kthread function for background server-side COPY * @data: arguments for COPY operation @@ -1763,7 +1784,6 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; - struct nfsd4_copy *cb_copy; if (nfsd4_ssc_is_inter(copy)) { struct file *filp; @@ -1785,20 +1805,7 @@ static int nfsd4_do_async_copy(void *data) } do_callback: - cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); - if (!cb_copy) - goto out; - refcount_set(&cb_copy->refcount, 1); - memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); - cb_copy->cp_clp = copy->cp_clp; - cb_copy->nfserr = copy->nfserr; - memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); - nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, - &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); - trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, - ©->fh, copy->cp_count, copy->nfserr); - nfsd4_run_cb(&cb_copy->cp_cb); -out: + nfsd4_send_cb_offload(copy); cleanup_async_copy(copy); return 0; } From c62dcf86332e030dc7c427cf5582a769926e2a1e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:41:18 -0400 Subject: [PATCH 0908/1565] NFSD: Move copy offload callback arguments into a separate structure [ Upstream commit a11ada99ce93a79393dc6683d22f7915748c8f6b ] Refactor so that CB_OFFLOAD arguments can be passed without allocating a whole struct nfsd4_copy object. On my system (x86_64) this removes another 96 bytes from struct nfsd4_copy. [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 37 +++++++++++++++++------------------ fs/nfsd/nfs4proc.c | 44 +++++++++++++++++++++--------------------- fs/nfsd/xdr4.h | 11 +++++++---- 3 files changed, 47 insertions(+), 45 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index e1272a7f4522..face8908a40b 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -679,7 +679,7 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * case NFS4_OK: * write_response4 coa_resok4; * default: - * length4 coa_bytes_copied; + * length4 coa_bytes_copied; * }; * struct CB_OFFLOAD4args { * nfs_fh4 coa_fh; @@ -688,21 +688,22 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * }; */ static void encode_offload_info4(struct xdr_stream *xdr, - __be32 nfserr, - const struct nfsd4_copy *cp) + const struct nfsd4_cb_offload *cbo) { __be32 *p; p = xdr_reserve_space(xdr, 4); - *p++ = nfserr; - if (!nfserr) { + *p = cbo->co_nfserr; + switch (cbo->co_nfserr) { + case nfs_ok: p = xdr_reserve_space(xdr, 4 + 8 + 4 + NFS4_VERIFIER_SIZE); p = xdr_encode_empty_array(p); - p = xdr_encode_hyper(p, cp->cp_res.wr_bytes_written); - *p++ = cpu_to_be32(cp->cp_res.wr_stable_how); - p = xdr_encode_opaque_fixed(p, cp->cp_res.wr_verifier.data, + p = xdr_encode_hyper(p, cbo->co_res.wr_bytes_written); + *p++ = cpu_to_be32(cbo->co_res.wr_stable_how); + p = xdr_encode_opaque_fixed(p, cbo->co_res.wr_verifier.data, NFS4_VERIFIER_SIZE); - } else { + break; + default: p = xdr_reserve_space(xdr, 8); /* We always return success if bytes were written */ p = xdr_encode_hyper(p, 0); @@ -710,18 +711,16 @@ static void encode_offload_info4(struct xdr_stream *xdr, } static void encode_cb_offload4args(struct xdr_stream *xdr, - __be32 nfserr, - const struct knfsd_fh *fh, - const struct nfsd4_copy *cp, + const struct nfsd4_cb_offload *cbo, struct nfs4_cb_compound_hdr *hdr) { __be32 *p; p = xdr_reserve_space(xdr, 4); - *p++ = cpu_to_be32(OP_CB_OFFLOAD); - encode_nfs_fh4(xdr, fh); - encode_stateid4(xdr, &cp->cp_res.cb_stateid); - encode_offload_info4(xdr, nfserr, cp); + *p = cpu_to_be32(OP_CB_OFFLOAD); + encode_nfs_fh4(xdr, &cbo->co_fh); + encode_stateid4(xdr, &cbo->co_res.cb_stateid); + encode_offload_info4(xdr, cbo); hdr->nops++; } @@ -731,8 +730,8 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, const void *data) { const struct nfsd4_callback *cb = data; - const struct nfsd4_copy *cp = - container_of(cb, struct nfsd4_copy, cp_cb); + const struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb); struct nfs4_cb_compound_hdr hdr = { .ident = 0, .minorversion = cb->cb_clp->cl_minorversion, @@ -740,7 +739,7 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, encode_cb_compound4args(xdr, &hdr); encode_cb_sequence4args(xdr, cb, &hdr); - encode_cb_offload4args(xdr, cp->nfserr, &cp->fh, cp, &hdr); + encode_cb_offload4args(xdr, cbo, &hdr); encode_cb_nops(&hdr); } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 332fd1d0b188..fdde7eca8d43 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1635,9 +1635,10 @@ nfsd4_cleanup_intra_ssc(struct nfsd_file *src, struct nfsd_file *dst) static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) { - struct nfsd4_copy *copy = container_of(cb, struct nfsd4_copy, cp_cb); + struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb); - nfs4_put_copy(copy); + kfree(cbo); } static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, @@ -1753,25 +1754,23 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) nfs4_put_copy(copy); } -static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) +static void nfsd4_send_cb_offload(struct nfsd4_copy *copy, __be32 nfserr) { - struct nfsd4_copy *cb_copy; + struct nfsd4_cb_offload *cbo; - cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); - if (!cb_copy) + cbo = kzalloc(sizeof(*cbo), GFP_KERNEL); + if (!cbo) return; - refcount_set(&cb_copy->refcount, 1); - memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); - cb_copy->cp_clp = copy->cp_clp; - cb_copy->nfserr = copy->nfserr; - memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); + memcpy(&cbo->co_res, ©->cp_res, sizeof(copy->cp_res)); + memcpy(&cbo->co_fh, ©->fh, sizeof(copy->fh)); + cbo->co_nfserr = nfserr; - nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, - &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); - trace_nfsd_cb_offload(copy->cp_clp, ©->cp_res.cb_stateid, - ©->fh, copy->cp_count, copy->nfserr); - nfsd4_run_cb(&cb_copy->cp_cb); + nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops, + NFSPROC4_CLNT_CB_OFFLOAD); + trace_nfsd_cb_offload(copy->cp_clp, &cbo->co_res.cb_stateid, + &cbo->co_fh, copy->cp_count, nfserr); + nfsd4_run_cb(&cbo->co_cb); } /** @@ -1784,6 +1783,7 @@ static void nfsd4_send_cb_offload(struct nfsd4_copy *copy) static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; + __be32 nfserr; if (nfsd4_ssc_is_inter(copy)) { struct file *filp; @@ -1791,21 +1791,21 @@ static int nfsd4_do_async_copy(void *data) filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, ©->stateid); if (IS_ERR(filp)) { - copy->nfserr = nfserr_offload_denied; + nfserr = nfserr_offload_denied; /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - copy->nfserr = nfsd4_do_copy(copy, filp, - copy->nf_dst->nf_file, false); + nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, + false); nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); } else { - copy->nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, - copy->nf_dst->nf_file, false); + nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, + copy->nf_dst->nf_file, false); nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } do_callback: - nfsd4_send_cb_offload(copy); + nfsd4_send_cb_offload(copy, nfserr); cleanup_async_copy(copy); return 0; } diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 1f6ac92bcf85..3b9e60249aea 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -533,6 +533,13 @@ struct nfsd42_write_res { stateid_t cb_stateid; }; +struct nfsd4_cb_offload { + struct nfsd4_callback co_cb; + struct nfsd42_write_res co_res; + __be32 co_nfserr; + struct knfsd_fh co_fh; +}; + struct nfsd4_copy { /* request */ stateid_t cp_src_stateid; @@ -550,10 +557,6 @@ struct nfsd4_copy { /* response */ struct nfsd42_write_res cp_res; - - /* for cb_offload */ - struct nfsd4_callback cp_cb; - __be32 nfserr; struct knfsd_fh fh; struct nfs4_client *cp_clp; From 820bf1383d66a4e2c5fce7277d6f99ee68e5f98a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0909/1565] NFSD: drop fh argument from alloc_init_deleg [ Upstream commit bbf936edd543e7220f60f9cbd6933b916550396d ] Currently, we pass the fh of the opened file down through several functions so that alloc_init_deleg can pass it to delegation_blocked. The filehandle of the open file is available in the nfs4_file however, so there's no need to pass it in a separate argument. Drop the argument from alloc_init_deleg, nfs4_open_delegation and nfs4_set_delegation. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1babc08fcb88..194d8aeb1fd4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1143,7 +1143,6 @@ static void block_delegations(struct knfsd_fh *fh) static struct nfs4_delegation * alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, - struct svc_fh *current_fh, struct nfs4_clnt_odstate *odstate) { struct nfs4_delegation *dp; @@ -1153,7 +1152,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, n = atomic_long_inc_return(&num_delegations); if (n < 0 || n > max_delegations) goto out_dec; - if (delegation_blocked(¤t_fh->fh_handle)) + if (delegation_blocked(&fp->fi_fhandle)) goto out_dec; dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab, nfs4_free_deleg)); if (dp == NULL) @@ -5307,7 +5306,7 @@ static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, } static struct nfs4_delegation * -nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, +nfs4_set_delegation(struct nfs4_client *clp, struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) { int status = 0; @@ -5352,7 +5351,7 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, return ERR_PTR(status); status = -ENOMEM; - dp = alloc_init_deleg(clp, fp, fh, odstate); + dp = alloc_init_deleg(clp, fp, odstate); if (!dp) goto out_delegees; @@ -5420,8 +5419,7 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status) * proper support for them. */ static void -nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, - struct nfs4_ol_stateid *stp) +nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) { struct nfs4_delegation *dp; struct nfs4_openowner *oo = openowner(stp->st_stateowner); @@ -5453,7 +5451,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, default: goto out_no_deleg; } - dp = nfs4_set_delegation(clp, fh, stp->st_stid.sc_file, stp->st_clnt_odstate); + dp = nfs4_set_delegation(clp, stp->st_stid.sc_file, stp->st_clnt_odstate); if (IS_ERR(dp)) goto out_no_deleg; @@ -5585,7 +5583,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * Attempt to hand out a delegation. No error return, because the * OPEN succeeds even if we fail. */ - nfs4_open_delegation(current_fh, open, stp); + nfs4_open_delegation(open, stp); nodeleg: status = nfs_ok; trace_nfsd_open(&stp->st_stid.sc_stateid); From 3aac39eaa675259e313ab5418fc3b6476f5b8712 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0910/1565] NFSD: verify the opened dentry after setting a delegation [ Upstream commit 876c553cb41026cb6ad3cef970a35e5f69c42a25 ] Between opening a file and setting a delegation on it, someone could rename or unlink the dentry. If this happens, we do not want to grant a delegation on the open. On a CLAIM_NULL open, we're opening by filename, and we may (in the non-create case) or may not (in the create case) be holding i_rwsem when attempting to set a delegation. The latter case allows a race. After getting a lease, redo the lookup of the file being opened and validate that the resulting dentry matches the one in the open file description. To properly redo the lookup we need an rqst pointer to pass to nfsd_lookup_dentry(), so make sure that is available. Signed-off-by: Jeff Layton Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 1 + fs/nfsd/nfs4state.c | 54 ++++++++++++++++++++++++++++++++++++++++----- fs/nfsd/xdr4.h | 1 + 3 files changed, 51 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index fdde7eca8d43..7685bbe9d78d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -547,6 +547,7 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, open->op_openowner); open->op_filp = NULL; + open->op_rqstp = rqstp; /* This check required by spec. */ if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 194d8aeb1fd4..b3ab11269583 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5305,11 +5305,44 @@ static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, return 0; } +/* + * It's possible that between opening the dentry and setting the delegation, + * that it has been renamed or unlinked. Redo the lookup to verify that this + * hasn't happened. + */ +static int +nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, + struct svc_fh *parent) +{ + struct svc_export *exp; + struct dentry *child; + __be32 err; + + /* parent may already be locked, and it may get unlocked by + * this call, but that is safe. + */ + err = nfsd_lookup_dentry(open->op_rqstp, parent, + open->op_fname, open->op_fnamelen, + &exp, &child); + + if (err) + return -EAGAIN; + + dput(child); + if (child != file_dentry(fp->fi_deleg_file->nf_file)) + return -EAGAIN; + + return 0; +} + static struct nfs4_delegation * -nfs4_set_delegation(struct nfs4_client *clp, - struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) +nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, + struct svc_fh *parent) { int status = 0; + struct nfs4_client *clp = stp->st_stid.sc_client; + struct nfs4_file *fp = stp->st_stid.sc_file; + struct nfs4_clnt_odstate *odstate = stp->st_clnt_odstate; struct nfs4_delegation *dp; struct nfsd_file *nf; struct file_lock *fl; @@ -5364,6 +5397,13 @@ nfs4_set_delegation(struct nfs4_client *clp, locks_free_lock(fl); if (status) goto out_clnt_odstate; + + if (parent) { + status = nfsd4_verify_deleg_dentry(open, fp, parent); + if (status) + goto out_unlock; + } + status = nfsd4_check_conflicting_opens(clp, fp); if (status) goto out_unlock; @@ -5419,11 +5459,13 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status) * proper support for them. */ static void -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) +nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, + struct svc_fh *currentfh) { struct nfs4_delegation *dp; struct nfs4_openowner *oo = openowner(stp->st_stateowner); struct nfs4_client *clp = stp->st_stid.sc_client; + struct svc_fh *parent = NULL; int cb_up; int status = 0; @@ -5437,6 +5479,8 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) goto out_no_deleg; break; case NFS4_OPEN_CLAIM_NULL: + parent = currentfh; + fallthrough; case NFS4_OPEN_CLAIM_FH: /* * Let's not give out any delegations till everyone's @@ -5451,7 +5495,7 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp) default: goto out_no_deleg; } - dp = nfs4_set_delegation(clp, stp->st_stid.sc_file, stp->st_clnt_odstate); + dp = nfs4_set_delegation(open, stp, parent); if (IS_ERR(dp)) goto out_no_deleg; @@ -5583,7 +5627,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * Attempt to hand out a delegation. No error return, because the * OPEN succeeds even if we fail. */ - nfs4_open_delegation(open, stp); + nfs4_open_delegation(open, stp, &resp->cstate.current_fh); nodeleg: status = nfs_ok; trace_nfsd_open(&stp->st_stid.sc_stateid); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 3b9e60249aea..14b87141de34 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -279,6 +279,7 @@ struct nfsd4_open { struct nfs4_clnt_odstate *op_odstate; /* used during processing */ struct nfs4_acl *op_acl; struct xdr_netobj op_label; + struct svc_rqst *op_rqstp; }; struct nfsd4_open_confirm { From 45cf4b1bb10fc848657bdc69ffe014165d1780f4 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0911/1565] NFSD: introduce struct nfsd_attrs [ Upstream commit 7fe2a71dda349a1afa75781f0cc7975be9784d15 ] The attributes that nfsd might want to set on a file include 'struct iattr' as well as an ACL and security label. The latter two are passed around quite separately from the first, in part because they are only needed for NFSv4. This leads to some clumsiness in the code, such as the attributes NOT being set in nfsd_create_setattr(). We need to keep the directory locked until all attributes are set to ensure the file is never visibile without all its attributes. This need combined with the inconsistent handling of attributes leads to more clumsiness. As a first step towards tidying this up, introduce 'struct nfsd_attrs'. This is passed (by reference) to vfs.c functions that work with attributes, and is assembled by the various nfs*proc functions which call them. As yet only iattr is included, but future patches will expand this. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 20 ++++++++++++++++---- fs/nfsd/nfs4proc.c | 23 ++++++++++++++++------- fs/nfsd/nfs4state.c | 5 ++++- fs/nfsd/nfsproc.c | 17 +++++++++++++---- fs/nfsd/vfs.c | 24 ++++++++++++++---------- fs/nfsd/vfs.h | 12 ++++++++---- 6 files changed, 71 insertions(+), 30 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index e314545bbdb2..7b81d871f0d3 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -67,12 +67,15 @@ nfsd3_proc_setattr(struct svc_rqst *rqstp) { struct nfsd3_sattrargs *argp = rqstp->rq_argp; struct nfsd3_attrstat *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; dprintk("nfsd: SETATTR(3) %s\n", SVCFH_fmt(&argp->fh)); fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_setattr(rqstp, &resp->fh, &argp->attrs, + resp->status = nfsd_setattr(rqstp, &resp->fh, &attrs, argp->check_guard, argp->guardtime); return rpc_success; } @@ -233,6 +236,9 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, { struct iattr *iap = &argp->attrs; struct dentry *parent, *child; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; __u32 v_mtime, v_atime; struct inode *inode; __be32 status; @@ -331,7 +337,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, } set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); out: fh_unlock(fhp); @@ -368,6 +374,9 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; dprintk("nfsd: MKDIR(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -378,7 +387,7 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->fh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &argp->attrs, S_IFDIR, 0, &resp->fh); + &attrs, S_IFDIR, 0, &resp->fh); fh_unlock(&resp->dirfh); return rpc_success; } @@ -428,6 +437,9 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) { struct nfsd3_mknodargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; int type; dev_t rdev = 0; @@ -453,7 +465,7 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) type = nfs3_ftypes[argp->ftype]; resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &argp->attrs, type, rdev, &resp->fh); + &attrs, type, rdev, &resp->fh); fh_unlock(&resp->dirfh); out: return rpc_success; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 7685bbe9d78d..f8a157e4bc70 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -286,6 +286,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct nfsd4_open *open) { struct iattr *iap = &open->op_iattr; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; struct dentry *parent, *child; __u32 v_mtime, v_atime; struct inode *inode; @@ -404,7 +407,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, } set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); out: fh_unlock(fhp); @@ -787,6 +790,9 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_create *create = &u->create; + struct nfsd_attrs attrs = { + .na_iattr = &create->cr_iattr, + }; struct svc_fh resfh; __be32 status; dev_t rdev; @@ -818,7 +824,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFBLK, rdev, &resfh); + &attrs, S_IFBLK, rdev, &resfh); break; case NF4CHR: @@ -829,26 +835,26 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFCHR, rdev, &resfh); + &attrs, S_IFCHR, rdev, &resfh); break; case NF4SOCK: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFSOCK, 0, &resfh); + &attrs, S_IFSOCK, 0, &resfh); break; case NF4FIFO: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFIFO, 0, &resfh); + &attrs, S_IFIFO, 0, &resfh); break; case NF4DIR: create->cr_iattr.ia_valid &= ~ATTR_SIZE; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &create->cr_iattr, S_IFDIR, 0, &resfh); + &attrs, S_IFDIR, 0, &resfh); break; default: @@ -1142,6 +1148,9 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_setattr *setattr = &u->setattr; + struct nfsd_attrs attrs = { + .na_iattr = &setattr->sa_iattr, + }; __be32 status = nfs_ok; int err; @@ -1174,7 +1183,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &setattr->sa_label); if (status) goto out; - status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr, + status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); out: fh_drop_write(&cstate->current_fh); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b3ab11269583..2b6c9f1b9de8 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5077,11 +5077,14 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh, .ia_valid = ATTR_SIZE, .ia_size = 0, }; + struct nfsd_attrs attrs = { + .na_iattr = &iattr, + }; if (!open->op_truncate) return 0; if (!(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) return nfserr_inval; - return nfsd_setattr(rqstp, fh, &iattr, 0, (time64_t)0); + return nfsd_setattr(rqstp, fh, &attrs, 0, (time64_t)0); } static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index b8ddf21e70ea..f061f229d5ff 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -51,6 +51,9 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) struct nfsd_sattrargs *argp = rqstp->rq_argp; struct nfsd_attrstat *resp = rqstp->rq_resp; struct iattr *iap = &argp->attrs; + struct nfsd_attrs attrs = { + .na_iattr = iap, + }; struct svc_fh *fhp; dprintk("nfsd: SETATTR %s, valid=%x, size=%ld\n", @@ -100,7 +103,7 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) } } - resp->status = nfsd_setattr(rqstp, fhp, iap, 0, (time64_t)0); + resp->status = nfsd_setattr(rqstp, fhp, &attrs, 0, (time64_t)0); if (resp->status != nfs_ok) goto out; @@ -260,6 +263,9 @@ nfsd_proc_create(struct svc_rqst *rqstp) svc_fh *dirfhp = &argp->fh; svc_fh *newfhp = &resp->fh; struct iattr *attr = &argp->attrs; + struct nfsd_attrs attrs = { + .na_iattr = attr, + }; struct inode *inode; struct dentry *dchild; int type, mode; @@ -385,7 +391,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) if (!inode) { /* File doesn't exist. Create it and set attrs */ resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name, - argp->len, attr, type, rdev, + argp->len, &attrs, type, rdev, newfhp); } else if (type == S_IFREG) { dprintk("nfsd: existing %s, valid=%x, size=%ld\n", @@ -396,7 +402,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) */ attr->ia_valid &= ATTR_SIZE; if (attr->ia_valid) - resp->status = nfsd_setattr(rqstp, newfhp, attr, 0, + resp->status = nfsd_setattr(rqstp, newfhp, &attrs, 0, (time64_t)0); } @@ -511,6 +517,9 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd_createargs *argp = rqstp->rq_argp; struct nfsd_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; dprintk("nfsd: MKDIR %s %.*s\n", SVCFH_fmt(&argp->fh), argp->len, argp->name); @@ -522,7 +531,7 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) argp->attrs.ia_valid &= ~ATTR_SIZE; fh_init(&resp->fh, NFS_FHSIZE); resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len, - &argp->attrs, S_IFDIR, 0, &resp->fh); + &attrs, S_IFDIR, 0, &resp->fh); fh_put(&argp->fh); if (resp->status != nfs_ok) goto out; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5c8dc1a05e57..c86c3a8e4232 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -362,11 +362,13 @@ nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp, * Set various file attributes. After this call fhp needs an fh_put. */ __be32 -nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, +nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct nfsd_attrs *attr, int check_guard, time64_t guardtime) { struct dentry *dentry; struct inode *inode; + struct iattr *iap = attr->na_iattr; int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; @@ -1221,14 +1223,15 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, * @rqstp: RPC transaction being executed * @fhp: NFS filehandle of parent directory * @resfhp: NFS filehandle of new object - * @iap: requested attributes of new object + * @attrs: requested attributes of new object * * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct iattr *iap) + struct svc_fh *resfhp, struct nfsd_attrs *attrs) { + struct iattr *iap = attrs->na_iattr; __be32 status; /* @@ -1249,7 +1252,7 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, * if the attributes have not changed. */ if (iap->ia_valid) - status = nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); + status = nfsd_setattr(rqstp, resfhp, attrs, 0, (time64_t)0); else status = nfserrno(commit_metadata(resfhp)); @@ -1288,11 +1291,12 @@ nfsd_check_ignore_resizing(struct iattr *iap) /* The parent directory should already be locked: */ __be32 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct nfsd_attrs *attrs, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild; struct inode *dirp; + struct iattr *iap = attrs->na_iattr; __be32 err; int host_err; @@ -1365,7 +1369,7 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err < 0) goto out_nfserr; - err = nfsd_create_setattr(rqstp, fhp, resfhp, iap); + err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs); out: dput(dchild); @@ -1384,8 +1388,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, */ __be32 nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct iattr *iap, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct nfsd_attrs *attrs, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild = NULL; __be32 err; @@ -1417,7 +1421,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, dput(dchild); if (err) return err; - return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type, + return nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, rdev, resfhp); } diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 26347d76f44a..d8b1a36fca95 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -42,6 +42,10 @@ struct nfsd_file; typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); /* nfsd/vfs.c */ +struct nfsd_attrs { + struct iattr *na_iattr; /* input */ +}; + int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *, @@ -50,7 +54,7 @@ __be32 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *, const char *, unsigned int, struct svc_export **, struct dentry **); __be32 nfsd_setattr(struct svc_rqst *, struct svc_fh *, - struct iattr *, int, time64_t); + struct nfsd_attrs *, int, time64_t); int nfsd_mountpoint(struct dentry *, struct svc_export *); #ifdef CONFIG_NFSD_V4 __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, @@ -63,14 +67,14 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, + char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct iattr *attrs, + char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res); __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct iattr *iap); + struct svc_fh *resfhp, struct nfsd_attrs *iap); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, u64 offset, u32 count, __be32 *verf); #ifdef CONFIG_NFSD_V4 From 2a5642abeb7249beb529ea31a3c000db9c51b729 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0912/1565] NFSD: set attributes when creating symlinks [ Upstream commit 93adc1e391a761441d783828b93979b38093d011 ] The NFS protocol includes attributes when creating symlinks. Linux does store attributes for symlinks and allows them to be set, though they are not used for permission checking. NFSD currently doesn't set standard (struct iattr) attributes when creating symlinks, but for NFSv4 it does set ACLs and security labels. This is inconsistent. To improve consistency, pass the provided attributes into nfsd_symlink() and call nfsd_create_setattr() to set them. NOTE: this results in a behaviour change for all NFS versions when the client sends non-default attributes with a SYMLINK request. With the Linux client, the only attributes are: attr.ia_mode = S_IFLNK | S_IRWXUGO; attr.ia_valid = ATTR_MODE; so the final outcome will be unchanged. Other clients might sent different attributes, and if they did they probably expect them to be honoured. We ignore any error from nfsd_create_setattr(). It isn't really clear what should be done if a file is successfully created, but the attributes cannot be set. NFS doesn't allow partial success to be reported. Reporting failure is probably more misleading than reporting success, so the status is ignored. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 5 ++++- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfsproc.c | 5 ++++- fs/nfsd/vfs.c | 25 ++++++++++++++++++------- fs/nfsd/vfs.h | 5 +++-- 5 files changed, 30 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 7b81d871f0d3..394f6fb20197 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -397,6 +397,9 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) { struct nfsd3_symlinkargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; if (argp->tlen == 0) { resp->status = nfserr_inval; @@ -423,7 +426,7 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->ffh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname, - argp->flen, argp->tname, &resp->fh); + argp->flen, argp->tname, &attrs, &resp->fh); kfree(argp->tname); out: return rpc_success; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f8a157e4bc70..43387a8f10d0 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -813,7 +813,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, case NF4LNK: status = nfsd_symlink(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - create->cr_data, &resfh); + create->cr_data, &attrs, &resfh); break; case NF4BLK: diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index f061f229d5ff..180b84b6597b 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -478,6 +478,9 @@ nfsd_proc_symlink(struct svc_rqst *rqstp) { struct nfsd_symlinkargs *argp = rqstp->rq_argp; struct nfsd_stat *resp = rqstp->rq_resp; + struct nfsd_attrs attrs = { + .na_iattr = &argp->attrs, + }; struct svc_fh newfh; if (argp->tlen > NFS_MAXPATHLEN) { @@ -499,7 +502,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp) fh_init(&newfh, NFS_FHSIZE); resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen, - argp->tname, &newfh); + argp->tname, &attrs, &newfh); kfree(argp->tname); fh_put(&argp->ffh); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index c86c3a8e4232..f5a1f41cddff 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1463,15 +1463,25 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp) return 0; } -/* - * Create a symlink and look up its inode +/** + * nfsd_symlink - Create a symlink and look up its inode + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of parent directory + * @fname: filename of the new symlink + * @flen: length of @fname + * @path: content of the new symlink (NUL-terminated) + * @attrs: requested attributes of new object + * @resfhp: NFS filehandle of new object + * * N.B. After this call _both_ fhp and resfhp need an fh_put + * + * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, - char *path, - struct svc_fh *resfhp) + char *fname, int flen, + char *path, struct nfsd_attrs *attrs, + struct svc_fh *resfhp) { struct dentry *dentry, *dnew; __be32 err, cerr; @@ -1501,13 +1511,14 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); + cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); + if (!err) + nfsd_create_setattr(rqstp, fhp, resfhp, attrs); fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_drop_write(fhp); - cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); dput(dnew); if (err==0) err = cerr; out: diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index d8b1a36fca95..5047cec4c423 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -114,8 +114,9 @@ __be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 nfsd_readlink(struct svc_rqst *, struct svc_fh *, char *, int *); __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *, - char *name, int len, char *path, - struct svc_fh *res); + char *name, int len, char *path, + struct nfsd_attrs *attrs, + struct svc_fh *res); __be32 nfsd_link(struct svc_rqst *, struct svc_fh *, char *, int, struct svc_fh *); ssize_t nfsd_copy_file_range(struct file *, u64, From 18ee0869d6f3357043ba6a191edc3b53655e4942 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0913/1565] NFSD: add security label to struct nfsd_attrs [ Upstream commit d6a97d3f589a3a46a16183e03f3774daee251317 ] nfsd_setattr() now sets a security label if provided, and nfsv4 provides it in the 'open' and 'create' paths and the 'setattr' path. If setting the label failed (including because the kernel doesn't support labels), an error field in 'struct nfsd_attrs' is set, and the caller can respond. The open/create callers clear FATTR4_WORD2_SECURITY_LABEL in the returned attr set in this case. The setattr caller returns the error. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 49 +++++++++------------------------------------- fs/nfsd/vfs.c | 29 +++------------------------ fs/nfsd/vfs.h | 5 +++-- 3 files changed, 15 insertions(+), 68 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 43387a8f10d0..cfcc463968b7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -64,36 +64,6 @@ MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, "idle msecs before unmount export from source server"); #endif -#ifdef CONFIG_NFSD_V4_SECURITY_LABEL -#include - -static inline void -nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) -{ - struct inode *inode = d_inode(resfh->fh_dentry); - int status; - - inode_lock(inode); - status = security_inode_setsecctx(resfh->fh_dentry, - label->data, label->len); - inode_unlock(inode); - - if (status) - /* - * XXX: We should really fail the whole open, but we may - * already have created a new file, so it may be too - * late. For now this seems the least of evils: - */ - bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - - return; -} -#else -static inline void -nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) -{ } -#endif - #define NFSDDBG_FACILITY NFSDDBG_PROC static u32 nfsd_attrmask[] = { @@ -288,6 +258,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap = &open->op_iattr; struct nfsd_attrs attrs = { .na_iattr = iap, + .na_seclabel = &open->op_label, }; struct dentry *parent, *child; __u32 v_mtime, v_atime; @@ -409,6 +380,8 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, set_attr: status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); + if (attrs.na_labelerr) + open->op_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; out: fh_unlock(fhp); if (child && !IS_ERR(child)) @@ -450,9 +423,6 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru status = nfsd4_create_file(rqstp, current_fh, *resfh, open); current->fs->umask = 0; - if (!status && open->op_label.len) - nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval); - /* * Following rfc 3530 14.2.16, and rfc 5661 18.16.4 * use the returned bitmask to indicate which attributes @@ -792,6 +762,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_create *create = &u->create; struct nfsd_attrs attrs = { .na_iattr = &create->cr_iattr, + .na_seclabel = &create->cr_label, }; struct svc_fh resfh; __be32 status; @@ -864,8 +835,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - if (create->cr_label.len) - nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval); + if (attrs.na_labelerr) + create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; if (create->cr_acl != NULL) do_set_nfs4_acl(rqstp, &resfh, create->cr_acl, @@ -1150,6 +1121,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_setattr *setattr = &u->setattr; struct nfsd_attrs attrs = { .na_iattr = &setattr->sa_iattr, + .na_seclabel = &setattr->sa_label, }; __be32 status = nfs_ok; int err; @@ -1178,13 +1150,10 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, setattr->sa_acl); if (status) goto out; - if (setattr->sa_label.len) - status = nfsd4_set_nfs4_label(rqstp, &cstate->current_fh, - &setattr->sa_label); - if (status) - goto out; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); + if (!status) + status = nfserrno(attrs.na_labelerr); out: fh_drop_write(&cstate->current_fh); return status; diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index f5a1f41cddff..01c431ed90ec 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -471,6 +471,9 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, host_err = notify_change(dentry, iap, NULL); out_unlock: + if (attr->na_seclabel && attr->na_seclabel->len) + attr->na_labelerr = security_inode_setsecctx(dentry, + attr->na_seclabel->data, attr->na_seclabel->len); fh_unlock(fhp); if (size_change) put_write_access(inode); @@ -508,32 +511,6 @@ int nfsd4_is_junction(struct dentry *dentry) return 0; return 1; } -#ifdef CONFIG_NFSD_V4_SECURITY_LABEL -__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct xdr_netobj *label) -{ - __be32 error; - int host_error; - struct dentry *dentry; - - error = fh_verify(rqstp, fhp, 0 /* S_IFREG */, NFSD_MAY_SATTR); - if (error) - return error; - - dentry = fhp->fh_dentry; - - inode_lock(d_inode(dentry)); - host_error = security_inode_setsecctx(dentry, label->data, label->len); - inode_unlock(d_inode(dentry)); - return nfserrno(host_error); -} -#else -__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct xdr_netobj *label) -{ - return nfserr_notsupp; -} -#endif static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp) { diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 5047cec4c423..d5d4cfe37c93 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -44,6 +44,9 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); /* nfsd/vfs.c */ struct nfsd_attrs { struct iattr *na_iattr; /* input */ + struct xdr_netobj *na_seclabel; /* input */ + + int na_labelerr; /* output */ }; int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, @@ -57,8 +60,6 @@ __be32 nfsd_setattr(struct svc_rqst *, struct svc_fh *, struct nfsd_attrs *, int, time64_t); int nfsd_mountpoint(struct dentry *, struct svc_export *); #ifdef CONFIG_NFSD_V4 -__be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, - struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, From c5409ce523af40d5c3019717bc5b4f72038d48be Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0914/1565] NFSD: add posix ACLs to struct nfsd_attrs [ Upstream commit c0cbe70742f4a70893cd6e5f6b10b6e89b6db95b ] pacl and dpacl pointers are added to struct nfsd_attrs, which requires that we have an nfsd_attrs_free() function to free them. Those nfsv4 functions that can set ACLs now set up these pointers based on the passed in NFSv4 ACL. nfsd_setattr() sets the acls as appropriate. Errors are handled as with security labels. Signed-off-by: NeilBrown [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/acl.h | 6 ++++-- fs/nfsd/nfs4acl.c | 45 +++++++-------------------------------------- fs/nfsd/nfs4proc.c | 46 ++++++++++++++++------------------------------ fs/nfsd/vfs.c | 7 +++++++ fs/nfsd/vfs.h | 11 +++++++++++ 5 files changed, 45 insertions(+), 70 deletions(-) diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h index ba14d2f4b64f..4b7324458a94 100644 --- a/fs/nfsd/acl.h +++ b/fs/nfsd/acl.h @@ -38,6 +38,8 @@ struct nfs4_acl; struct svc_fh; struct svc_rqst; +struct nfsd_attrs; +enum nfs_ftype4; int nfs4_acl_bytes(int entries); int nfs4_acl_get_whotype(char *, u32); @@ -45,7 +47,7 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who); int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct nfs4_acl **acl); -__be32 nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl); +__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, + struct nfsd_attrs *attr); #endif /* LINUX_NFS4_ACL_H */ diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c index 71292a0d6f09..bb8e2f6d7d03 100644 --- a/fs/nfsd/nfs4acl.c +++ b/fs/nfsd/nfs4acl.c @@ -751,57 +751,26 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl, return ret; } -__be32 -nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl) +__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, + struct nfsd_attrs *attr) { - __be32 error; int host_error; - struct dentry *dentry; - struct inode *inode; - struct posix_acl *pacl = NULL, *dpacl = NULL; unsigned int flags = 0; - /* Get inode */ - error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR); - if (error) - return error; + if (!acl) + return nfs_ok; - dentry = fhp->fh_dentry; - inode = d_inode(dentry); - - if (S_ISDIR(inode->i_mode)) + if (type == NF4DIR) flags = NFS4_ACL_DIR; - host_error = nfs4_acl_nfsv4_to_posix(acl, &pacl, &dpacl, flags); + host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl, + &attr->na_dpacl, flags); if (host_error == -EINVAL) return nfserr_attrnotsupp; - if (host_error < 0) - goto out_nfserr; - - fh_lock(fhp); - - host_error = set_posix_acl(inode, ACL_TYPE_ACCESS, pacl); - if (host_error < 0) - goto out_drop_lock; - - if (S_ISDIR(inode->i_mode)) { - host_error = set_posix_acl(inode, ACL_TYPE_DEFAULT, dpacl); - } - -out_drop_lock: - fh_unlock(fhp); - - posix_acl_release(pacl); - posix_acl_release(dpacl); -out_nfserr: - if (host_error == -EOPNOTSUPP) - return nfserr_attrnotsupp; else return nfserrno(host_error); } - static short ace2type(struct nfs4_ace *ace) { diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cfcc463968b7..507a2aa96713 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -128,26 +128,6 @@ is_create_with_attrs(struct nfsd4_open *open) || open->op_createmode == NFS4_CREATE_EXCLUSIVE4_1); } -/* - * if error occurs when setting the acl, just clear the acl bit - * in the returned attr bitmap. - */ -static void -do_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfs4_acl *acl, u32 *bmval) -{ - __be32 status; - - status = nfsd4_set_nfs4_acl(rqstp, fhp, acl); - if (status) - /* - * We should probably fail the whole open at this point, - * but we've already created the file, so it's too late; - * So this seems the least of evils: - */ - bmval[0] &= ~FATTR4_WORD0_ACL; -} - static inline void fh_dup2(struct svc_fh *dst, struct svc_fh *src) { @@ -281,6 +261,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err); + if (is_create_with_attrs(open)) + nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs); + fh_lock_nested(fhp, I_MUTEX_PARENT); child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); @@ -382,8 +365,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (attrs.na_labelerr) open->op_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; + if (attrs.na_aclerr) + open->op_bmval[0] &= ~FATTR4_WORD0_ACL; out: fh_unlock(fhp); + nfsd_attrs_free(&attrs); if (child && !IS_ERR(child)) dput(child); fh_drop_write(fhp); @@ -446,9 +432,6 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru if (status) goto out; - if (is_create_with_attrs(open) && open->op_acl != NULL) - do_set_nfs4_acl(rqstp, *resfh, open->op_acl, open->op_bmval); - nfsd4_set_open_owner_reply_cache(cstate, open, *resfh); accmode = NFSD_MAY_NOP; if (open->op_created || @@ -779,6 +762,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) return status; + status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs); current->fs->umask = create->cr_umask; switch (create->cr_type) { case NF4LNK: @@ -837,10 +821,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (attrs.na_labelerr) create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - - if (create->cr_acl != NULL) - do_set_nfs4_acl(rqstp, &resfh, create->cr_acl, - create->cr_bmval); + if (attrs.na_aclerr) + create->cr_bmval[0] &= ~FATTR4_WORD0_ACL; fh_unlock(&cstate->current_fh); set_change_info(&create->cr_cinfo, &cstate->current_fh); @@ -849,6 +831,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fh_put(&resfh); out_umask: current->fs->umask = 0; + nfsd_attrs_free(&attrs); return status; } @@ -1123,6 +1106,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, .na_iattr = &setattr->sa_iattr, .na_seclabel = &setattr->sa_label, }; + struct inode *inode; __be32 status = nfs_ok; int err; @@ -1145,9 +1129,10 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - if (setattr->sa_acl != NULL) - status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh, - setattr->sa_acl); + inode = cstate->current_fh.fh_dentry->d_inode; + status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG, + setattr->sa_acl, &attrs); + if (status) goto out; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, @@ -1155,6 +1140,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!status) status = nfserrno(attrs.na_labelerr); out: + nfsd_attrs_free(&attrs); fh_drop_write(&cstate->current_fh); return status; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 01c431ed90ec..bed542ba8ad8 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -474,6 +474,13 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len); + if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && attr->na_pacl) + attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_ACCESS, + attr->na_pacl); + if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && + !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) + attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, + attr->na_dpacl); fh_unlock(fhp); if (size_change) put_write_access(inode); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index d5d4cfe37c93..c95cd414b4bb 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -6,6 +6,8 @@ #ifndef LINUX_NFSD_VFS_H #define LINUX_NFSD_VFS_H +#include +#include #include "nfsfh.h" #include "nfsd.h" @@ -45,10 +47,19 @@ typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); struct nfsd_attrs { struct iattr *na_iattr; /* input */ struct xdr_netobj *na_seclabel; /* input */ + struct posix_acl *na_pacl; /* input */ + struct posix_acl *na_dpacl; /* input */ int na_labelerr; /* output */ + int na_aclerr; /* output */ }; +static inline void nfsd_attrs_free(struct nfsd_attrs *attrs) +{ + posix_acl_release(attrs->na_pacl); + posix_acl_release(attrs->na_dpacl); +} + int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *, From 617f72a1aa6d9eb3e370c12deefaaac139f74003 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0915/1565] NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. [ Upstream commit 927bfc5600cd6333c9ef9f090f19e66b7d4c8ee1 ] nfsd_create() usually returns with the directory still locked. nfsd_symlink() usually returns with it unlocked. This is clumsy. Until recently nfsd_create() needed to keep the directory locked until ACLs and security label had been set. These are now set inside nfsd_create() (in nfsd_setattr()) so this need is gone. So change nfsd_create() and nfsd_symlink() to always unlock, and remove any fh_unlock() calls that follow calls to these functions. Signed-off-by: NeilBrown [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 2 -- fs/nfsd/nfs4proc.c | 2 -- fs/nfsd/vfs.c | 38 +++++++++++++++++++++----------------- 3 files changed, 21 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 394f6fb20197..c71f0c359f7a 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -388,7 +388,6 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, &attrs, S_IFDIR, 0, &resp->fh); - fh_unlock(&resp->dirfh); return rpc_success; } @@ -469,7 +468,6 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) type = nfs3_ftypes[argp->ftype]; resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, &attrs, type, rdev, &resp->fh); - fh_unlock(&resp->dirfh); out: return rpc_success; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 507a2aa96713..0396fa2c1cfe 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -823,8 +823,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; if (attrs.na_aclerr) create->cr_bmval[0] &= ~FATTR4_WORD0_ACL; - - fh_unlock(&cstate->current_fh); set_change_info(&create->cr_cinfo, &cstate->current_fh); fh_dup2(&cstate->current_fh, &resfh); out: diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index bed542ba8ad8..3d794533bbd4 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1395,8 +1395,10 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, fh_lock_nested(fhp, I_MUTEX_PARENT); dchild = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(dchild); - if (IS_ERR(dchild)) - return nfserrno(host_err); + if (IS_ERR(dchild)) { + err = nfserrno(host_err); + goto out_unlock; + } err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); /* * We unconditionally drop our ref to dchild as fh_compose will have @@ -1404,9 +1406,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ dput(dchild); if (err) - return err; - return nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, - rdev, resfhp); + goto out_unlock; + err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, + rdev, resfhp); +out_unlock: + fh_unlock(fhp); + return err; } /* @@ -1483,16 +1488,19 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; host_err = fh_want_write(fhp); - if (host_err) - goto out_nfserr; + if (host_err) { + err = nfserrno(host_err); + goto out; + } fh_lock(fhp); dentry = fhp->fh_dentry; dnew = lookup_one_len(fname, dentry, flen); - host_err = PTR_ERR(dnew); - if (IS_ERR(dnew)) - goto out_nfserr; - + if (IS_ERR(dnew)) { + err = nfserrno(PTR_ERR(dnew)); + fh_unlock(fhp); + goto out_drop_write; + } host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); @@ -1501,16 +1509,12 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, fh_unlock(fhp); if (!err) err = nfserrno(commit_metadata(fhp)); - fh_drop_write(fhp); - dput(dnew); if (err==0) err = cerr; +out_drop_write: + fh_drop_write(fhp); out: return err; - -out_nfserr: - err = nfserrno(host_err); - goto out; } /* From 77f83bc2ed03b3038e12b80dee5c46cf094590e5 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0916/1565] NFSD: always drop directory lock in nfsd_unlink() [ Upstream commit b677c0c63a135a916493c064906582e9f3ed4802 ] Some error paths in nfsd_unlink() allow it to exit without unlocking the directory. This is not a problem in practice as the directory will be locked with an fh_put(), but it is untidy and potentially confusing. This allows us to remove all the fh_unlock() calls that are immediately after nfsd_unlink() calls. Reviewed-by: Jeff Layton Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 2 -- fs/nfsd/nfs4proc.c | 4 +--- fs/nfsd/vfs.c | 7 +++++-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index c71f0c359f7a..d78dbb214c5c 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -490,7 +490,6 @@ nfsd3_proc_remove(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, -S_IFDIR, argp->name, argp->len); - fh_unlock(&resp->fh); return rpc_success; } @@ -511,7 +510,6 @@ nfsd3_proc_rmdir(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, S_IFDIR, argp->name, argp->len); - fh_unlock(&resp->fh); return rpc_success; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0396fa2c1cfe..f588e592a070 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1002,10 +1002,8 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_grace; status = nfsd_unlink(rqstp, &cstate->current_fh, 0, remove->rm_name, remove->rm_namelen); - if (!status) { - fh_unlock(&cstate->current_fh); + if (!status) set_change_info(&remove->rm_cinfo, &cstate->current_fh); - } return status; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3d794533bbd4..6b8e1826956b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1767,12 +1767,12 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); if (IS_ERR(rdentry)) - goto out_drop_write; + goto out_unlock; if (d_really_is_negative(rdentry)) { dput(rdentry); host_err = -ENOENT; - goto out_drop_write; + goto out_unlock; } rinode = d_inode(rdentry); ihold(rinode); @@ -1810,6 +1810,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, } out: return err; +out_unlock: + fh_unlock(fhp); + goto out_drop_write; } /* From bedd266b1fe3a787cc0c3e8b3d5ecaac7a85232d Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0917/1565] NFSD: only call fh_unlock() once in nfsd_link() [ Upstream commit e18bcb33bc5b69bccc2b532075aa00bb49cc01c5 ] On non-error paths, nfsd_link() calls fh_unlock() twice. This is safe because fh_unlock() records that the unlock has been done and doesn't repeat it. However it makes the code a little confusing and interferes with changes that are planned for directory locking. So rearrange the code to ensure fh_unlock() is called exactly once if fh_lock() was called. Reviewed-by: Jeff Layton Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 6b8e1826956b..dd945099ae6b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1557,9 +1557,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, dirp = d_inode(ddir); dnew = lookup_one_len(name, ddir, len); - host_err = PTR_ERR(dnew); - if (IS_ERR(dnew)) - goto out_nfserr; + if (IS_ERR(dnew)) { + err = nfserrno(PTR_ERR(dnew)); + goto out_unlock; + } dold = tfhp->fh_dentry; @@ -1578,17 +1579,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, else err = nfserrno(host_err); } -out_dput: dput(dnew); -out_unlock: - fh_unlock(ffhp); +out_drop_write: fh_drop_write(tfhp); out: return err; -out_nfserr: - err = nfserrno(host_err); - goto out_unlock; +out_dput: + dput(dnew); +out_unlock: + fh_unlock(ffhp); + goto out_drop_write; } static void From 203f09fae4e21ee8fe14077f6081516722af12f4 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0918/1565] NFSD: reduce locking in nfsd_lookup() [ Upstream commit 19d008b46941b8c668402170522e0f7a9258409c ] nfsd_lookup() takes an exclusive lock on the parent inode, but no callers want the lock and it may not be needed at all if the result is in the dcache. Change nfsd_lookup_dentry() to not take the lock, and call lookup_one_len_locked() which takes lock only if needed. nfsd4_open() currently expects the lock to still be held, but that isn't necessary as nfsd_validate_delegated_dentry() provides required guarantees without the lock. NOTE: NFSv4 requires directory changeinfo for OPEN even when a create wasn't requested and no change happened. Now that nfsd_lookup() doesn't use fh_lock(), we need to explicitly fill the attributes when no create happens. A new fh_fill_both_attrs() is provided for that task. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 20 ++++++++++++-------- fs/nfsd/nfs4state.c | 3 --- fs/nfsd/nfsfh.c | 19 +++++++++++++++++++ fs/nfsd/nfsfh.h | 2 +- fs/nfsd/vfs.c | 34 ++++++++++++++-------------------- 5 files changed, 46 insertions(+), 32 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f588e592a070..9cf4298817c4 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -302,6 +302,11 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (d_really_is_positive(child)) { status = nfs_ok; + /* NFSv4 protocol requires change attributes even though + * no change happened. + */ + fh_fill_both_attrs(fhp); + switch (open->op_createmode) { case NFS4_CREATE_UNCHECKED: if (!d_is_reg(child)) @@ -417,15 +422,15 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0) open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_MODIFY); - } else - /* - * Note this may exit with the parent still locked. - * We will hold the lock until nfsd4_open's final - * lookup, to prevent renames or unlinks until we've had - * a chance to an acquire a delegation if appropriate. - */ + } else { status = nfsd_lookup(rqstp, current_fh, open->op_fname, open->op_fnamelen, *resfh); + if (!status) + /* NFSv4 protocol requires change attributes even though + * no change happened. + */ + fh_fill_both_attrs(current_fh); + } if (status) goto out; status = nfsd_check_obj_isreg(*resfh); @@ -1043,7 +1048,6 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &exp, &dentry); if (err) return err; - fh_unlock(&cstate->current_fh); if (d_really_is_negative(dentry)) { exp_put(exp); err = nfserr_noent; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2b6c9f1b9de8..e44d9c8d5065 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5321,9 +5321,6 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, struct dentry *child; __be32 err; - /* parent may already be locked, and it may get unlocked by - * this call, but that is safe. - */ err = nfsd_lookup_dentry(open->op_rqstp, parent, open->op_fname, open->op_fnamelen, &exp, &child); diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index d4ae838948ba..cc680deecafa 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -670,6 +670,25 @@ void fh_fill_post_attrs(struct svc_fh *fhp) nfsd4_change_attribute(&fhp->fh_post_attr, inode); } +/** + * fh_fill_both_attrs - Fill pre-op and post-op attributes + * @fhp: file handle to be updated + * + * This is used when the directory wasn't changed, but wcc attributes + * are needed anyway. + */ +void fh_fill_both_attrs(struct svc_fh *fhp) +{ + fh_fill_post_attrs(fhp); + if (!fhp->fh_post_saved) + return; + fhp->fh_pre_change = fhp->fh_post_change; + fhp->fh_pre_mtime = fhp->fh_post_attr.mtime; + fhp->fh_pre_ctime = fhp->fh_post_attr.ctime; + fhp->fh_pre_size = fhp->fh_post_attr.size; + fhp->fh_pre_saved = true; +} + /* * Release a file handle. */ diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index fb9d358a267e..28a4f9a94e2c 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -322,7 +322,7 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); - +extern void fh_fill_both_attrs(struct svc_fh *fhp); /* * Lock a file handle/inode diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index dd945099ae6b..03f6dd2ec653 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -198,27 +198,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out_nfserr; } } else { - /* - * In the nfsd4_open() case, this may be held across - * subsequent open and delegation acquisition which may - * need to take the child's i_mutex: - */ - fh_lock_nested(fhp, I_MUTEX_PARENT); - dentry = lookup_one_len(name, dparent, len); + dentry = lookup_one_len_unlocked(name, dparent, len); host_err = PTR_ERR(dentry); if (IS_ERR(dentry)) goto out_nfserr; if (nfsd_mountpoint(dentry, exp)) { - /* - * We don't need the i_mutex after all. It's - * still possible we could open this (regular - * files can be mountpoints too), but the - * i_mutex is just there to prevent renames of - * something that we might be about to delegate, - * and a mountpoint won't be renamed: - */ - fh_unlock(fhp); - if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) { + host_err = nfsd_cross_mnt(rqstp, &dentry, &exp); + if (host_err) { dput(dentry); goto out_nfserr; } @@ -233,7 +219,15 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); } -/* +/** + * nfsd_lookup - look up a single path component for nfsd + * + * @rqstp: the request context + * @fhp: the file handle of the directory + * @name: the component name, or %NULL to look up parent + * @len: length of name to examine + * @resfh: pointer to pre-initialised filehandle to hold result. + * * Look up one component of a pathname. * N.B. After this call _both_ fhp and resfh need an fh_put * @@ -243,11 +237,11 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 - * NeilBrown + * */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, - unsigned int len, struct svc_fh *resfh) + unsigned int len, struct svc_fh *resfh) { struct svc_export *exp; struct dentry *dentry; From 9ef325edeade4d5c67121553ea158fa17b215bb4 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0919/1565] NFSD: use explicit lock/unlock for directory ops [ Upstream commit debf16f0c671cb8db154a9ebcd6014cfff683b80 ] When creating or unlinking a name in a directory use explicit inode_lock_nested() instead of fh_lock(), and explicit calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). This is already done for renames, with lock_rename() as the explicit locking. Also move the 'fill' calls closer to the operation that might change the attributes. This way they are avoided on some error paths. For the v2-only code in nfsproc.c, the fill calls are not replaced as they aren't needed. Making the locking explicit will simplify proposed future changes to locking for directories. It also makes it easily visible exactly where pre/post attributes are used - not all callers of fh_lock() actually need the pre/post attributes. Reviewed-by: Jeff Layton Signed-off-by: NeilBrown [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 6 ++++-- fs/nfsd/nfs4proc.c | 6 ++++-- fs/nfsd/nfsproc.c | 5 ++--- fs/nfsd/vfs.c | 36 ++++++++++++++++++++++++++---------- 4 files changed, 36 insertions(+), 17 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d78dbb214c5c..1a52f0b06ec3 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -260,7 +260,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err); - fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(inode, I_MUTEX_PARENT); child = lookup_one_len(argp->name, parent, argp->len); if (IS_ERR(child)) { @@ -318,11 +318,13 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask(); + fh_fill_pre_attrs(fhp); host_err = vfs_create(inode, child, iap->ia_mode, true); if (host_err < 0) { status = nfserrno(host_err); goto out; } + fh_fill_post_attrs(fhp); /* A newly created file already has a file size of zero. */ if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) @@ -340,7 +342,7 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); out: - fh_unlock(fhp); + inode_unlock(inode); if (child && !IS_ERR(child)) dput(child); fh_drop_write(fhp); diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9cf4298817c4..193b84a0f3a5 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -264,7 +264,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (is_create_with_attrs(open)) nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs); - fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(inode, I_MUTEX_PARENT); child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); if (IS_ERR(child)) { @@ -348,10 +348,12 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (!IS_POSIXACL(inode)) iap->ia_mode &= ~current_umask(); + fh_fill_pre_attrs(fhp); status = nfsd4_vfs_create(fhp, child, open); if (status != nfs_ok) goto out; open->op_created = true; + fh_fill_post_attrs(fhp); /* A newly created file already has a file size of zero. */ if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) @@ -373,7 +375,7 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, if (attrs.na_aclerr) open->op_bmval[0] &= ~FATTR4_WORD0_ACL; out: - fh_unlock(fhp); + inode_unlock(inode); nfsd_attrs_free(&attrs); if (child && !IS_ERR(child)) dput(child); diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 180b84b6597b..e533550a26db 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -291,7 +291,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) goto done; } - fh_lock_nested(dirfhp, I_MUTEX_PARENT); + inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT); dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len); if (IS_ERR(dchild)) { resp->status = nfserrno(PTR_ERR(dchild)); @@ -407,8 +407,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) } out_unlock: - /* We don't really need to unlock, as fh_put does it. */ - fh_unlock(dirfhp); + inode_unlock(dirfhp->fh_dentry->d_inode); fh_drop_write(dirfhp); done: fh_put(dirfhp); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 03f6dd2ec653..3364e562b00e 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1386,7 +1386,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err); - fh_lock_nested(fhp, I_MUTEX_PARENT); + inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); dchild = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(dchild); if (IS_ERR(dchild)) { @@ -1401,10 +1401,12 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, dput(dchild); if (err) goto out_unlock; + fh_fill_pre_attrs(fhp); err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, rdev, resfhp); + fh_fill_post_attrs(fhp); out_unlock: - fh_unlock(fhp); + inode_unlock(dentry->d_inode); return err; } @@ -1487,20 +1489,22 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; } - fh_lock(fhp); dentry = fhp->fh_dentry; + inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); dnew = lookup_one_len(fname, dentry, flen); if (IS_ERR(dnew)) { err = nfserrno(PTR_ERR(dnew)); - fh_unlock(fhp); + inode_unlock(dentry->d_inode); goto out_drop_write; } + fh_fill_pre_attrs(fhp); host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); if (!err) nfsd_create_setattr(rqstp, fhp, resfhp, attrs); - fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(dentry->d_inode); if (!err) err = nfserrno(commit_metadata(fhp)); dput(dnew); @@ -1546,9 +1550,9 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, goto out; } - fh_lock_nested(ffhp, I_MUTEX_PARENT); ddir = ffhp->fh_dentry; dirp = d_inode(ddir); + inode_lock_nested(dirp, I_MUTEX_PARENT); dnew = lookup_one_len(name, ddir, len); if (IS_ERR(dnew)) { @@ -1561,8 +1565,18 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, err = nfserr_noent; if (d_really_is_negative(dold)) goto out_dput; +<<<<<<< current host_err = vfs_link(dold, dirp, dnew, NULL); fh_unlock(ffhp); +||||||| constructed merge base + host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + fh_unlock(ffhp); +======= + fh_fill_pre_attrs(ffhp); + host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + fh_fill_post_attrs(ffhp); + inode_unlock(dirp); +>>>>>>> patched if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -1582,7 +1596,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, out_dput: dput(dnew); out_unlock: - fh_unlock(ffhp); + inode_unlock(dirp); goto out_drop_write; } @@ -1755,9 +1769,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (host_err) goto out_nfserr; - fh_lock_nested(fhp, I_MUTEX_PARENT); dentry = fhp->fh_dentry; dirp = d_inode(dentry); + inode_lock_nested(dirp, I_MUTEX_PARENT); rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); @@ -1775,6 +1789,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (!type) type = d_inode(rdentry)->i_mode & S_IFMT; + fh_fill_pre_attrs(fhp); if (type != S_IFDIR) { if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) nfsd_close_cached_files(rdentry); @@ -1782,8 +1797,9 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, } else { host_err = vfs_rmdir(dirp, rdentry); } + fh_fill_post_attrs(fhp); - fh_unlock(fhp); + inode_unlock(dirp); if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); @@ -1806,7 +1822,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, out: return err; out_unlock: - fh_unlock(fhp); + inode_unlock(dirp); goto out_drop_write; } From 5a8d428f5e373f146ef76a004b38630e2c8bc30a Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0920/1565] NFSD: use (un)lock_inode instead of fh_(un)lock for file operations [ Upstream commit bb4d53d66e4b8c8b8e5634802262e53851a2d2db ] When locking a file to access ACLs and xattrs etc, use explicit locking with inode_lock() instead of fh_lock(). This means that the calls to fh_fill_pre/post_attr() are also explicit which improves readability and allows us to place them only where they are needed. Only the xattr calls need pre/post information. When locking a file we don't need I_MUTEX_PARENT as the file is not a parent of anything, so we can use inode_lock() directly rather than the inode_lock_nested() call that fh_lock() uses. Reviewed-by: Jeff Layton Signed-off-by: NeilBrown [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 6 +++--- fs/nfsd/nfs3acl.c | 4 ++-- fs/nfsd/nfs4state.c | 9 +++++---- fs/nfsd/vfs.c | 44 +++++++++++++++++++++----------------------- 4 files changed, 31 insertions(+), 32 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 03703b22c81e..f7d166f056af 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno; - fh_lock(fh); + inode_lock(inode); error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -120,7 +120,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_drop_lock; - fh_unlock(fh); + inode_unlock(inode); fh_drop_write(fh); @@ -134,7 +134,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) return rpc_success; out_drop_lock: - fh_unlock(fh); + inode_unlock(inode); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 350fae92ae04..15bee0339c76 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno; - fh_lock(fh); + inode_lock(inode); error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -109,7 +109,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) error = set_posix_acl(inode, ACL_TYPE_DEFAULT, argp->acl_default); out_drop_lock: - fh_unlock(fh); + inode_unlock(inode); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e44d9c8d5065..7a4baa41b536 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7429,21 +7429,22 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock) { struct nfsd_file *nf; + struct inode *inode; __be32 err; err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; - fh_lock(fhp); /* to block new leases till after test_lock: */ - err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode, - NFSD_MAY_READ)); + inode = fhp->fh_dentry->d_inode; + inode_lock(inode); /* to block new leases till after test_lock: */ + err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ)); if (err) goto out; lock->fl_file = nf->nf_file; err = nfserrno(vfs_test_lock(nf->nf_file, lock)); lock->fl_file = NULL; out: - fh_unlock(fhp); + inode_unlock(inode); nfsd_file_put(nf); return err; } diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3364e562b00e..504a3ddfaf75 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -429,7 +429,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, return err; } - fh_lock(fhp); + inode_lock(inode); if (size_change) { /* * RFC5661, Section 18.30.4: @@ -475,7 +475,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, attr->na_dpacl); - fh_unlock(fhp); + inode_unlock(inode); if (size_change) put_write_access(inode); out: @@ -1565,18 +1565,10 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, err = nfserr_noent; if (d_really_is_negative(dold)) goto out_dput; -<<<<<<< current - host_err = vfs_link(dold, dirp, dnew, NULL); - fh_unlock(ffhp); -||||||| constructed merge base - host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); - fh_unlock(ffhp); -======= fh_fill_pre_attrs(ffhp); - host_err = vfs_link(dold, &init_user_ns, dirp, dnew, NULL); + host_err = vfs_link(dold, dirp, dnew, NULL); fh_fill_post_attrs(ffhp); inode_unlock(dirp); ->>>>>>> patched if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -2177,13 +2169,16 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp, return err; } -/* - * Removexattr and setxattr need to call fh_lock to both lock the inode - * and set the change attribute. Since the top-level vfs_removexattr - * and vfs_setxattr calls already do their own inode_lock calls, call - * the _locked variant. Pass in a NULL pointer for delegated_inode, - * and let the client deal with NFS4ERR_DELAY (same as with e.g. - * setattr and remove). +/** + * nfsd_removexattr - Remove an extended attribute + * @rqstp: RPC transaction being executed + * @fhp: NFS filehandle of object with xattr to remove + * @name: name of xattr to remove (NUL-terminate) + * + * Pass in a NULL pointer for delegated_inode, and let the client deal + * with NFS4ERR_DELAY (same as with e.g. setattr and remove). + * + * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) @@ -2199,11 +2194,13 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) if (ret) return nfserrno(ret); - fh_lock(fhp); + inode_lock(fhp->fh_dentry->d_inode); + fh_fill_pre_attrs(fhp); ret = __vfs_removexattr_locked(fhp->fh_dentry, name, NULL); - fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(fhp->fh_dentry->d_inode); fh_drop_write(fhp); return nfsd_xattr_errno(ret); @@ -2223,12 +2220,13 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, ret = fh_want_write(fhp); if (ret) return nfserrno(ret); - fh_lock(fhp); + inode_lock(fhp->fh_dentry->d_inode); + fh_fill_pre_attrs(fhp); ret = __vfs_setxattr_locked(fhp->fh_dentry, name, buf, len, flags, NULL); - - fh_unlock(fhp); + fh_fill_post_attrs(fhp); + inode_unlock(fhp->fh_dentry->d_inode); fh_drop_write(fhp); return nfsd_xattr_errno(ret); From b15656dfa283dad1b2a328753bdc1f5b19d1cf5c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 26 Jul 2022 16:45:30 +1000 Subject: [PATCH 0921/1565] NFSD: discard fh_locked flag and fh_lock/fh_unlock [ Upstream commit dd8dd403d7b223cc77ee89d8d09caf045e90e648 ] As all inode locking is now fully balanced, fh_put() does not need to call fh_unlock(). fh_lock() and fh_unlock() are no longer used, so discard them. These are the only real users of ->fh_locked, so discard that too. Reviewed-by: Jeff Layton Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.c | 3 +-- fs/nfsd/nfsfh.h | 56 ++++--------------------------------------------- fs/nfsd/vfs.c | 17 +-------------- 3 files changed, 6 insertions(+), 70 deletions(-) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index cc680deecafa..db8d62632a5b 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -547,7 +547,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, if (ref_fh == fhp) fh_put(ref_fh); - if (fhp->fh_locked || fhp->fh_dentry) { + if (fhp->fh_dentry) { printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n", dentry); } @@ -698,7 +698,6 @@ fh_put(struct svc_fh *fhp) struct dentry * dentry = fhp->fh_dentry; struct svc_export * exp = fhp->fh_export; if (dentry) { - fh_unlock(fhp); fhp->fh_dentry = NULL; dput(dentry); fh_clear_pre_post_attrs(fhp); diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 28a4f9a94e2c..c3ae6414fc5c 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -81,7 +81,6 @@ typedef struct svc_fh { struct dentry * fh_dentry; /* validated dentry */ struct svc_export * fh_export; /* export pointer */ - bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ bool fh_no_wcc; /* no wcc data needed */ bool fh_no_atomic_attr; @@ -93,7 +92,7 @@ typedef struct svc_fh { bool fh_post_saved; /* post-op attrs saved */ bool fh_pre_saved; /* pre-op attrs saved */ - /* Pre-op attributes saved during fh_lock */ + /* Pre-op attributes saved when inode is locked */ __u64 fh_pre_size; /* size before operation */ struct timespec64 fh_pre_mtime; /* mtime before oper */ struct timespec64 fh_pre_ctime; /* ctime before oper */ @@ -103,7 +102,7 @@ typedef struct svc_fh { */ u64 fh_pre_change; - /* Post-op attributes saved in fh_unlock */ + /* Post-op attributes saved in fh_fill_post_attrs() */ struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ } svc_fh; @@ -223,8 +222,8 @@ void fh_put(struct svc_fh *); static __inline__ struct svc_fh * fh_copy(struct svc_fh *dst, struct svc_fh *src) { - WARN_ON(src->fh_dentry || src->fh_locked); - + WARN_ON(src->fh_dentry); + *dst = *src; return dst; } @@ -323,51 +322,4 @@ static inline u64 nfsd4_change_attribute(struct kstat *stat, extern void fh_fill_pre_attrs(struct svc_fh *fhp); extern void fh_fill_post_attrs(struct svc_fh *fhp); extern void fh_fill_both_attrs(struct svc_fh *fhp); - -/* - * Lock a file handle/inode - * NOTE: both fh_lock and fh_unlock are done "by hand" in - * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once - * so, any changes here should be reflected there. - */ - -static inline void -fh_lock_nested(struct svc_fh *fhp, unsigned int subclass) -{ - struct dentry *dentry = fhp->fh_dentry; - struct inode *inode; - - BUG_ON(!dentry); - - if (fhp->fh_locked) { - printk(KERN_WARNING "fh_lock: %pd2 already locked!\n", - dentry); - return; - } - - inode = d_inode(dentry); - inode_lock_nested(inode, subclass); - fh_fill_pre_attrs(fhp); - fhp->fh_locked = true; -} - -static inline void -fh_lock(struct svc_fh *fhp) -{ - fh_lock_nested(fhp, I_MUTEX_NORMAL); -} - -/* - * Unlock a file handle/inode - */ -static inline void -fh_unlock(struct svc_fh *fhp) -{ - if (fhp->fh_locked) { - fh_fill_post_attrs(fhp); - inode_unlock(d_inode(fhp->fh_dentry)); - fhp->fh_locked = false; - } -} - #endif /* _LINUX_NFSD_NFSFH_H */ diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 504a3ddfaf75..5a7fee4ee207 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1282,13 +1282,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, dirp = d_inode(dentry); dchild = dget(resfhp->fh_dentry); - if (!fhp->fh_locked) { - WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n", - dentry); - err = nfserr_io; - goto out; - } - err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE); if (err) goto out; @@ -1656,10 +1649,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, goto out; } - /* cannot use fh_lock as we need deadlock protective ordering - * so do it by hand */ trap = lock_rename(tdentry, fdentry); - ffhp->fh_locked = tfhp->fh_locked = true; fh_fill_pre_attrs(ffhp); fh_fill_pre_attrs(tfhp); @@ -1707,17 +1697,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, dput(odentry); out_nfserr: err = nfserrno(host_err); - /* - * We cannot rely on fh_unlock on the two filehandles, - * as that would do the wrong thing if the two directories - * were the same, so again we do it by hand. - */ + if (!close_cached) { fh_fill_post_attrs(ffhp); fh_fill_post_attrs(tfhp); } unlock_rename(tdentry, fdentry); - ffhp->fh_locked = tfhp->fh_locked = false; fh_drop_write(ffhp); /* From 1f87122d348ea4e2c9f6286c86a9dc7c433d711f Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 1 Aug 2022 15:57:26 -0400 Subject: [PATCH 0922/1565] lockd: detect and reject lock arguments that overflow [ Upstream commit 6930bcbfb6ceda63e298c6af6d733ecdf6bd4cde ] lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem. The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow. Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Cc: # 5.14+ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 8 ++++++++ fs/lockd/xdr4.c | 19 ++----------------- include/linux/lockd/xdr.h | 2 ++ 3 files changed, 12 insertions(+), 17 deletions(-) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 4f247ab8be61..bf274f23969b 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -32,6 +32,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, if (!nlmsvc_ops) return nlm_lck_denied_nolocks; + if (lock->lock_start > OFFSET_MAX || + (lock->lock_len && ((lock->lock_len - 1) > (OFFSET_MAX - lock->lock_start)))) + return nlm4_fbig; + /* Obtain host handle */ if (!(host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len)) || (argp->monitor && nsm_monitor(host) < 0)) @@ -50,6 +54,10 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Set up the missing parts of the file_lock structure */ lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; + lock->fl.fl_start = (loff_t)lock->lock_start; + lock->fl.fl_end = lock->lock_len ? + (loff_t)(lock->lock_start + lock->lock_len - 1) : + OFFSET_MAX; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); if (!lock->fl.fl_owner) { diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 856267c0864b..712fdfeb8ef0 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -20,13 +20,6 @@ #include "svcxdr.h" -static inline loff_t -s64_to_loff_t(__s64 offset) -{ - return (loff_t)offset; -} - - static inline s64 loff_t_to_s64(loff_t offset) { @@ -70,8 +63,6 @@ static bool svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) { struct file_lock *fl = &lock->fl; - u64 len, start; - s64 end; if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) return false; @@ -81,20 +72,14 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) return false; if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) return false; - if (xdr_stream_decode_u64(xdr, &start) < 0) + if (xdr_stream_decode_u64(xdr, &lock->lock_start) < 0) return false; - if (xdr_stream_decode_u64(xdr, &len) < 0) + if (xdr_stream_decode_u64(xdr, &lock->lock_len) < 0) return false; locks_init_lock(fl); fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; - end = start + len - 1; - fl->fl_start = s64_to_loff_t(start); - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = s64_to_loff_t(end); return true; } diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 398f70093cd3..67e4a2c5500b 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -41,6 +41,8 @@ struct nlm_lock { struct nfs_fh fh; struct xdr_netobj oh; u32 svid; + u64 lock_start; + u64 lock_len; struct file_lock fl; }; From c7d320e62066039c5863dc637e616042fc387d05 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 8 Sep 2022 12:08:40 +1000 Subject: [PATCH 0923/1565] NFSD: fix regression with setting ACLs. [ Upstream commit 00801cd92d91e94aa04d687f9bb9a9104e7c3d46 ] A recent patch moved ACL setting into nfsd_setattr(). Unfortunately it didn't work as nfsd_setattr() aborts early if iap->ia_valid is 0. Remove this test, and instead avoid calling notify_change() when ia_valid is 0. This means that nfsd_setattr() will now *always* lock the inode. Previously it didn't if only a ATTR_MODE change was requested on a symlink (see Commit 15b7a1b86d66 ("[PATCH] knfsd: fix setattr-on-symlink error return")). I don't think this change really matters. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5a7fee4ee207..95f2e4549c03 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -299,6 +299,10 @@ commit_metadata(struct svc_fh *fhp) static void nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap) { + /* Ignore mode updates on symlinks */ + if (S_ISLNK(inode->i_mode)) + iap->ia_valid &= ~ATTR_MODE; + /* sanitize the mode change */ if (iap->ia_valid & ATTR_MODE) { iap->ia_mode &= S_IALLUGO; @@ -366,7 +370,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err; + int host_err = 0; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); @@ -404,13 +408,6 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, dentry = fhp->fh_dentry; inode = d_inode(dentry); - /* Ignore any mode updates on symlinks */ - if (S_ISLNK(inode->i_mode)) - iap->ia_valid &= ~ATTR_MODE; - - if (!iap->ia_valid) - return 0; - nfsd_sanitize_attrs(inode, iap); if (check_guard && guardtime != inode->i_ctime.tv_sec) @@ -461,8 +458,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out_unlock; } - iap->ia_valid |= ATTR_CTIME; - host_err = notify_change(dentry, iap, NULL); + if (iap->ia_valid) { + iap->ia_valid |= ATTR_CTIME; + host_err = notify_change(dentry, iap, NULL); + } out_unlock: if (attr->na_seclabel && attr->na_seclabel->len) From 57afda7bf248389290b81c33370a8e320df59ab8 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 10 Sep 2022 22:14:02 +0100 Subject: [PATCH 0924/1565] nfsd_splice_actor(): handle compound pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit bfbfb6182ad1d7d184b16f25165faad879147f79 ] pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE worth of data). Theoretically it had been possible since way back, but nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change. Fortunately, the only thing that changes for compound pages is that we need to stuff each relevant subpage in and convert the offset into offset in the first subpage. Acked-by: Chuck Lever Tested-by: Benjamin Coddington Fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE" Signed-off-by: Al Viro [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 95f2e4549c03..bc377ee17717 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -862,10 +862,15 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { struct svc_rqst *rqstp = sd->u.data; + struct page *page = buf->page; // may be a compound one + unsigned offset = buf->offset; + int i; - svc_rqst_replace_page(rqstp, buf->page); - if (rqstp->rq_res.page_len == 0) - rqstp->rq_res.page_base = buf->offset; + page += offset / PAGE_SIZE; + for (i = sd->len; i > 0; i -= PAGE_SIZE) + svc_rqst_replace_page(rqstp, page++); + if (rqstp->rq_res.page_len == 0) // first call + rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len; return sd->len; } From 91eebaa181b5cfec8fca647bb33c6dc167051397 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 18 Aug 2022 23:01:14 +0200 Subject: [PATCH 0925/1565] NFSD: move from strlcpy with unused retval to strscpy [ Upstream commit 72f78ae00a8e5d7abe13abac8305a300f6afd74b ] Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4idmap.c | 8 ++++---- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfssvc.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index f92161ce1f97..e70a1a2999b7 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -82,8 +82,8 @@ ent_init(struct cache_head *cnew, struct cache_head *citm) new->id = itm->id; new->type = itm->type; - strlcpy(new->name, itm->name, sizeof(new->name)); - strlcpy(new->authname, itm->authname, sizeof(new->authname)); + strscpy(new->name, itm->name, sizeof(new->name)); + strscpy(new->authname, itm->authname, sizeof(new->authname)); } static void @@ -548,7 +548,7 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen return nfserr_badowner; memcpy(key.name, name, namelen); key.name[namelen] = '\0'; - strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, nametoid_lookup, &key, nn->nametoid_cache, &item); if (ret == -ENOENT) return nfserr_badowner; @@ -584,7 +584,7 @@ static __be32 idmap_id_to_name(struct xdr_stream *xdr, int ret; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); - strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item); if (ret == -ENOENT) return encode_ascii_id(xdr, id); diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 193b84a0f3a5..0431b979748b 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1345,7 +1345,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, return 0; } if (work) { - strlcpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); + strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 011c556caa1e..8b1afde19211 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -799,7 +799,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) if (nrservs == 0 && nn->nfsd_serv == NULL) goto out; - strlcpy(nn->nfsd_name, utsname()->nodename, + strscpy(nn->nfsd_name, utsname()->nodename, sizeof(nn->nfsd_name)); error = nfsd_create_serv(net); From 9dc20a662fb8389ad1e8c766bc0cb6a2028c2e82 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 18 Aug 2022 23:01:16 +0200 Subject: [PATCH 0926/1565] lockd: move from strlcpy with unused retval to strscpy [ Upstream commit 97f8e62572555f8ad578d7b1739ba64d5d2cac0f ] Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/host.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/lockd/host.c b/fs/lockd/host.c index f802223e71ab..cdc8e12cdac4 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -164,7 +164,7 @@ static struct nlm_host *nlm_alloc_host(struct nlm_lookup_host_info *ni, host->h_addrbuf = nsm->sm_addrbuf; host->net = ni->net; host->h_cred = get_cred(ni->cred); - strlcpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); + strscpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); out: return host; From 30b0e49a957435fcd184d1e39255a4f1f5c42960 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Fri, 19 Aug 2022 15:16:36 -0400 Subject: [PATCH 0927/1565] NFSD enforce filehandle check for source file in COPY [ Upstream commit 754035ff79a14886e68c0c9f6fa80adb21f12b53 ] If the passed in filehandle for the source file in the COPY operation is not a regular file, the server MUST return NFS4ERR_WRONG_TYPE. Signed-off-by: Olga Kornievskaia Reviewed-by: Jeff Layton [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0431b979748b..ebfe39d31311 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1758,7 +1758,13 @@ static int nfsd4_do_async_copy(void *data) filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, ©->stateid); if (IS_ERR(filp)) { - nfserr = nfserr_offload_denied; + switch (PTR_ERR(filp)) { + case -EBADF: + nfserr = nfserr_wrong_type; + break; + default: + nfserr = nfserr_offload_denied; + } /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } From cef1ab71ae3705fd8203d506866f00e6aea5a104 Mon Sep 17 00:00:00 2001 From: Jinpeng Cui Date: Wed, 31 Aug 2022 14:20:02 +0000 Subject: [PATCH 0928/1565] NFSD: remove redundant variable status [ Upstream commit 4ab3442ca384a02abf8b1f2b3449a6c547851873 ] Return value directly from fh_verify() do_open_permission() exp_pseudoroot() instead of getting value from redundant variable status. Reported-by: Zeal Robot Signed-off-by: Jinpeng Cui Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index ebfe39d31311..62ffcecf78f7 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -141,7 +141,6 @@ fh_dup2(struct svc_fh *dst, struct svc_fh *src) static __be32 do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open, int accmode) { - __be32 status; if (open->op_truncate && !(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) @@ -156,9 +155,7 @@ do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfs if (open->op_share_deny & NFS4_SHARE_DENY_READ) accmode |= NFSD_MAY_WRITE; - status = fh_verify(rqstp, current_fh, S_IFREG, accmode); - - return status; + return fh_verify(rqstp, current_fh, S_IFREG, accmode); } static __be32 nfsd_check_obj_isreg(struct svc_fh *fh) @@ -454,7 +451,6 @@ static __be32 do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open) { struct svc_fh *current_fh = &cstate->current_fh; - __be32 status; int accmode = 0; /* We don't know the target directory, and therefore can not @@ -479,9 +475,7 @@ do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, str if (open->op_claim_type == NFS4_OPEN_CLAIM_DELEG_CUR_FH) accmode = NFSD_MAY_OWNER_OVERRIDE; - status = do_open_permission(rqstp, current_fh, open, accmode); - - return status; + return do_open_permission(rqstp, current_fh, open, accmode); } static void @@ -668,11 +662,9 @@ static __be32 nfsd4_putrootfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { - __be32 status; - fh_put(&cstate->current_fh); - status = exp_pseudoroot(rqstp, &cstate->current_fh); - return status; + + return exp_pseudoroot(rqstp, &cstate->current_fh); } static __be32 From 490af5b07d854b4eb3cd46d58910230ab86a2103 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 1 Sep 2022 07:27:11 +0200 Subject: [PATCH 0929/1565] nfsd: Avoid some useless tests [ Upstream commit d44899b8bb0b919f923186c616a84f0e70e04772 ] memdup_user() can't return NULL, so there is no point for checking for it. Simplify some tests accordingly. Suggested-by: Dan Carpenter Signed-off-by: Christophe JAILLET Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4recover.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index d08c1a8c9254..b9394a639a41 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -807,7 +807,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &ci->cc_name.cn_len)) return -EFAULT; name.data = memdup_user(&ci->cc_name.cn_id, namelen); - if (IS_ERR_OR_NULL(name.data)) + if (IS_ERR(name.data)) return -EFAULT; name.len = namelen; get_user(princhashlen, &ci->cc_princhash.cp_len); @@ -815,7 +815,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, princhash.data = memdup_user( &ci->cc_princhash.cp_data, princhashlen); - if (IS_ERR_OR_NULL(princhash.data)) { + if (IS_ERR(princhash.data)) { kfree(name.data); return -EFAULT; } @@ -829,7 +829,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &cnm->cn_len)) return -EFAULT; name.data = memdup_user(&cnm->cn_id, namelen); - if (IS_ERR_OR_NULL(name.data)) + if (IS_ERR(name.data)) return -EFAULT; name.len = namelen; } From d40bef3801cdf7bbb13035067de16dd99b115593 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 1 Sep 2022 07:27:19 +0200 Subject: [PATCH 0930/1565] nfsd: Propagate some error code returned by memdup_user() [ Upstream commit 30a30fcc3fc1ad4c5d017c9fcb75dc8f59e7bdad ] Propagate the error code returned by memdup_user() instead of a hard coded -EFAULT. Suggested-by: Dan Carpenter Signed-off-by: Christophe JAILLET Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4recover.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index b9394a639a41..189c622dde61 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -808,7 +808,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, return -EFAULT; name.data = memdup_user(&ci->cc_name.cn_id, namelen); if (IS_ERR(name.data)) - return -EFAULT; + return PTR_ERR(name.data); name.len = namelen; get_user(princhashlen, &ci->cc_princhash.cp_len); if (princhashlen > 0) { @@ -817,7 +817,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, princhashlen); if (IS_ERR(princhash.data)) { kfree(name.data); - return -EFAULT; + return PTR_ERR(princhash.data); } princhash.len = princhashlen; } else @@ -830,7 +830,7 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, return -EFAULT; name.data = memdup_user(&cnm->cn_id, namelen); if (IS_ERR(name.data)) - return -EFAULT; + return PTR_ERR(name.data); name.len = namelen; } if (name.len > 5 && memcmp(name.data, "hash:", 5) == 0) { From 2bd6f95ff9911d06d922fdbd49db9b52f6609182 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 2 Sep 2022 18:18:16 -0400 Subject: [PATCH 0931/1565] NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND [ Upstream commit 80e591ce636f3ae6855a0ca26963da1fdd6d4508 ] When attempting an NFSv4 mount, a Solaris NFSv4 client builds a single large COMPOUND that chains a series of LOOKUPs to get to the pseudo filesystem root directory that is to be mounted. The Linux NFS server's current maximum of 16 operations per NFSv4 COMPOUND is not large enough to ensure that this works for paths that are more than a few components deep. Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not trigger any re-allocation of the operation array on the server), increasing this maximum should result in little to no impact. The ops array can get large now, so allocate it via vmalloc() to help ensure memory fragmentation won't cause an allocation failure. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383 Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 7 ++++--- fs/nfsd/state.h | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5476541530ea..92e0535ddb92 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -42,6 +42,8 @@ #include #include #include +#include + #include #include "idmap.h" @@ -2369,10 +2371,9 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) return true; if (argp->opcnt > ARRAY_SIZE(argp->iops)) { - argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); + argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops)); if (!argp->ops) { argp->ops = argp->iops; - dprintk("nfsd: couldn't allocate room for COMPOUND\n"); return false; } } @@ -5402,7 +5403,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) struct nfsd4_compoundargs *args = rqstp->rq_argp; if (args->ops != args->iops) { - kfree(args->ops); + vfree(args->ops); args->ops = args->iops; } while (args->to_free) { diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index ae596dbf8667..5d28beb290fe 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -175,7 +175,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s) /* Maximum number of slots per session. 160 is useful for long haul TCP */ #define NFSD_MAX_SLOTS_PER_SESSION 160 /* Maximum number of operations per session compound */ -#define NFSD_MAX_OPS_PER_COMPOUND 16 +#define NFSD_MAX_OPS_PER_COMPOUND 50 /* Maximum session per slot cache size */ #define NFSD_SLOT_CACHE_SIZE 2048 /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */ From 0e57d696f60dee6117a8ace0cac7c5761d375277 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:10:05 -0400 Subject: [PATCH 0932/1565] NFSD: Protect against send buffer overflow in NFSv2 READDIR [ Upstream commit 00b4492686e0497fdb924a9d4c8f6f99377e176c ] Restore the previous limit on the @count argument to prevent a buffer overflow attack. Fixes: 53b1119a6e50 ("NFSD: Fix READDIR buffer overflow") Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index e533550a26db..559603a0a535 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -567,12 +567,11 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr; - count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); - memset(buf, 0, sizeof(*buf)); /* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - XDR_UNIT * 2; + buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), (u32)PAGE_SIZE); + buf->buflen -= XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++; From 57774b1526163766403167b7bf00b136fb103761 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:10:12 -0400 Subject: [PATCH 0933/1565] NFSD: Protect against send buffer overflow in NFSv3 READDIR [ Upstream commit 640f87c190e0d1b2a0fcb2ecf6d2cd53b1c41991 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply message at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this issue. Reported-by: Ben Ronallo Cc: Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 1a52f0b06ec3..d808779ab453 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -563,13 +563,14 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, { struct xdr_buf *buf = &resp->dirlist; struct xdr_stream *xdr = &resp->xdr; - - count = clamp(count, (u32)(XDR_UNIT * 2), svc_max_payload(rqstp)); + unsigned int sendbuf = min_t(unsigned int, rqstp->rq_res.buflen, + svc_max_payload(rqstp)); memset(buf, 0, sizeof(*buf)); /* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = count - XDR_UNIT * 2; + buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), sendbuf); + buf->buflen -= XDR_UNIT * 2; buf->pages = rqstp->rq_next_page; rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT; From 2007867c5874134f2271eb276398208070049dd3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:10:18 -0400 Subject: [PATCH 0934/1565] NFSD: Protect against send buffer overflow in NFSv2 READ [ Upstream commit 401bc1f90874280a80b93f23be33a0e7e2d1f912 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 559603a0a535..749c3354304c 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -185,6 +185,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) argp->count, argp->offset); argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2); + argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen); v = 0; len = argp->count; From c23687911f82a63fa2977ce9c992b395e90f8ba0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:10:24 -0400 Subject: [PATCH 0935/1565] NFSD: Protect against send buffer overflow in NFSv3 READ [ Upstream commit fa6be9cc6e80ec79892ddf08a8c10cabab9baf38 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. A client can force this shrinkage on TCP by sending a correctly- formed RPC Call header contained in an RPC record that is excessively large. The full maximum payload size cannot be constructed in that case. Cc: Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index d808779ab453..8679c4174602 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -150,7 +150,6 @@ nfsd3_proc_read(struct svc_rqst *rqstp) { struct nfsd3_readargs *argp = rqstp->rq_argp; struct nfsd3_readres *resp = rqstp->rq_resp; - u32 max_blocksize = svc_max_payload(rqstp); unsigned int len; int v; @@ -159,7 +158,8 @@ nfsd3_proc_read(struct svc_rqst *rqstp) (unsigned long) argp->count, (unsigned long long) argp->offset); - argp->count = min_t(u32, argp->count, max_blocksize); + argp->count = min_t(u32, argp->count, svc_max_payload(rqstp)); + argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen); if (argp->offset > (u64)OFFSET_MAX) argp->offset = (u64)OFFSET_MAX; if (argp->offset + argp->count > (u64)OFFSET_MAX) From b0062184a18435036223174aff2ff3bd5a260029 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 6 Sep 2022 10:42:19 +1000 Subject: [PATCH 0936/1565] NFSD: drop fname and flen args from nfsd_create_locked() [ Upstream commit 9558f9304ca1903090fa5d995a3269a8e82804b4 ] nfsd_create_locked() does not use the "fname" and "flen" arguments, so drop them from declaration and all callers. Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 5 ++--- fs/nfsd/vfs.c | 5 ++--- fs/nfsd/vfs.h | 4 ++-- 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 749c3354304c..7ed03ac6bdab 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -391,9 +391,8 @@ nfsd_proc_create(struct svc_rqst *rqstp) resp->status = nfs_ok; if (!inode) { /* File doesn't exist. Create it and set attrs */ - resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name, - argp->len, &attrs, type, rdev, - newfhp); + resp->status = nfsd_create_locked(rqstp, dirfhp, &attrs, type, + rdev, newfhp); } else if (type == S_IFREG) { dprintk("nfsd: existing %s, valid=%x, size=%ld\n", argp->name, attr->ia_valid, (long) attr->ia_size); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index bc377ee17717..5ec1119a8785 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1273,7 +1273,7 @@ nfsd_check_ignore_resizing(struct iattr *iap) /* The parent directory should already be locked: */ __be32 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct nfsd_attrs *attrs, + struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild; @@ -1399,8 +1399,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (err) goto out_unlock; fh_fill_pre_attrs(fhp); - err = nfsd_create_locked(rqstp, fhp, fname, flen, attrs, type, - rdev, resfhp); + err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp); fh_fill_post_attrs(fhp); out_unlock: inode_unlock(dentry->d_inode); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index c95cd414b4bb..120521bc7b24 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -79,8 +79,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct nfsd_attrs *attrs, - int type, dev_t rdev, struct svc_fh *res); + struct nfsd_attrs *attrs, int type, dev_t rdev, + struct svc_fh *res); __be32 nfsd_create(struct svc_rqst *, struct svc_fh *, char *name, int len, struct nfsd_attrs *attrs, int type, dev_t rdev, struct svc_fh *res); From f574d41b1bda4fa3687b9c34ff3a9e5251833e65 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 5 Sep 2022 15:33:32 -0400 Subject: [PATCH 0937/1565] NFSD: Fix handling of oversized NFSv4 COMPOUND requests [ Upstream commit 7518a3dc5ea249d4112156ce71b8b184eb786151 ] If an NFS server returns NFS4ERR_RESOURCE on the first operation in an NFSv4 COMPOUND, there's no way for a client to know where the problem is and then simplify the compound to make forward progress. So instead, make NFSD process as many operations in an oversized COMPOUND as it can and then return NFS4ERR_RESOURCE on the first operation it did not process. pynfs NFSv4.0 COMP6 exercises this case, but checks only for the COMPOUND status code, not whether the server has processed any of the operations. pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects too many operations per COMPOUND by checking against the limits negotiated when the session was created. Suggested-by: Bruce Fields Fixes: 0078117c6d91 ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 19 +++++++++++++------ fs/nfsd/nfs4xdr.c | 12 +++--------- fs/nfsd/xdr4.h | 3 ++- 3 files changed, 18 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 62ffcecf78f7..c40795d1d98d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2623,9 +2623,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) status = nfserr_minor_vers_mismatch; if (nfsd_minorversion(nn, args->minorversion, NFSD_TEST) <= 0) goto out; - status = nfserr_resource; - if (args->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - goto out; status = nfs41_check_op_ordering(args); if (status) { @@ -2638,10 +2635,20 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) rqstp->rq_lease_breaker = (void **)&cstate->clp; - trace_nfsd_compound(rqstp, args->opcnt); + trace_nfsd_compound(rqstp, args->client_opcnt); while (!status && resp->opcnt < args->opcnt) { op = &args->ops[resp->opcnt++]; + if (unlikely(resp->opcnt == NFSD_MAX_OPS_PER_COMPOUND)) { + /* If there are still more operations to process, + * stop here and report NFS4ERR_RESOURCE. */ + if (cstate->minorversion == 0 && + args->client_opcnt > resp->opcnt) { + op->status = nfserr_resource; + goto encode_op; + } + } + /* * The XDR decode routines may have pre-set op->status; * for example, if there is a miscellaneous XDR error @@ -2717,8 +2724,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) status = op->status; } - trace_nfsd_compound_status(args->opcnt, resp->opcnt, status, - nfsd4_op_name(op->opnum)); + trace_nfsd_compound_status(args->client_opcnt, resp->opcnt, + status, nfsd4_op_name(op->opnum)); nfsd4_cstate_clear_replay(cstate); nfsd4_increment_op_stats(op->opnum); diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 92e0535ddb92..b9398b7b3539 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2359,16 +2359,10 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) return false; - if (xdr_stream_decode_u32(argp->xdr, &argp->opcnt) < 0) + if (xdr_stream_decode_u32(argp->xdr, &argp->client_opcnt) < 0) return false; - - /* - * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS - * here, so we return success at the xdr level so that - * nfsd4_proc can handle this is an NFS-level error. - */ - if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) - return true; + argp->opcnt = min_t(u32, argp->client_opcnt, + NFSD_MAX_OPS_PER_COMPOUND); if (argp->opcnt > ARRAY_SIZE(argp->iops)) { argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops)); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 14b87141de34..2a699e8ca1ba 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -717,9 +717,10 @@ struct nfsd4_compoundargs { struct svcxdr_tmpbuf *to_free; struct svc_rqst *rqstp; - u32 taglen; char * tag; + u32 taglen; u32 minorversion; + u32 client_opcnt; u32 opcnt; struct nfsd4_op *ops; struct nfsd4_op iops[8]; From 330914c342451a8e229fbe67bb37a63e40564767 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 8 Sep 2022 12:31:07 -0400 Subject: [PATCH 0938/1565] nfsd: clean up mounted_on_fileid handling [ Upstream commit 6106d9119b6599fa23dc556b429d887b4c2d9f62 ] We only need the inode number for this, not a full rack of attributes. Rename this function make it take a pointer to a u64 instead of struct kstat, and change it to just request STATX_INO. Signed-off-by: Jeff Layton [ cel: renamed get_mounted_on_ino() ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b9398b7b3539..b5ca83045d6e 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2769,9 +2769,10 @@ static __be32 fattr_handle_absent_fs(u32 *bmval0, u32 *bmval1, u32 *bmval2, u32 } -static int get_parent_attributes(struct svc_export *exp, struct kstat *stat) +static int nfsd4_get_mounted_on_ino(struct svc_export *exp, u64 *pino) { struct path path = exp->ex_path; + struct kstat stat; int err; path_get(&path); @@ -2779,8 +2780,10 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat) if (path.dentry != path.mnt->mnt_root) break; } - err = vfs_getattr(&path, stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); + err = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT); path_put(&path); + if (!err) + *pino = stat.ino; return err; } @@ -3277,22 +3280,21 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, *p++ = cpu_to_be32(stat.btime.tv_nsec); } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { - struct kstat parent_stat; u64 ino = stat.ino; p = xdr_reserve_space(xdr, 8); if (!p) goto out_resource; /* - * Get parent's attributes if not ignoring crossmount - * and this is the root of a cross-mounted filesystem. + * Get ino of mountpoint in parent filesystem, if not ignoring + * crossmount and this is the root of a cross-mounted + * filesystem. */ if (ignore_crossmnt == 0 && dentry == exp->ex_path.mnt->mnt_root) { - err = get_parent_attributes(exp, &parent_stat); + err = nfsd4_get_mounted_on_ino(exp, &ino); if (err) goto out_nfserr; - ino = parent_stat.ino; } p = xdr_encode_hyper(p, ino); } From bc6bead0af16f1343f7e8d0e71a3b5a64d658df9 Mon Sep 17 00:00:00 2001 From: Gaosheng Cui Date: Fri, 9 Sep 2022 14:59:10 +0800 Subject: [PATCH 0939/1565] nfsd: remove nfsd4_prepare_cb_recall() declaration [ Upstream commit 18224dc58d960c65446971930d0487fc72d00598 ] nfsd4_prepare_cb_recall() has been removed since commit 0162ac2b978e ("nfsd: introduce nfsd4_callback_ops"), so remove it. Signed-off-by: Gaosheng Cui Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/state.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 5d28beb290fe..4155be65d806 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -697,7 +697,6 @@ extern int nfsd4_create_callback_queue(void); extern void nfsd4_destroy_callback_queue(void); extern void nfsd4_shutdown_callback(struct nfs4_client *); extern void nfsd4_shutdown_copy(struct nfs4_client *clp); -extern void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp); extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash, struct nfsd_net *nn); extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn); From 3c0e831b87c6a424f85e6ad4cf254198421b6794 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:13:54 -0400 Subject: [PATCH 0940/1565] NFSD: Add tracepoints to report NFSv4 callback completions [ Upstream commit 1035d65446a018ca2dd179e29a2fcd6d29057781 ] Wireshark has always been lousy about dissecting NFSv4 callbacks, especially NFSv4.0 backchannel requests. Add tracepoints so we can surgically capture these events in the trace log. Tracepoints are time-stamped and ordered so that we can now observe the timing relationship between a CB_RECALL Reply and the client's DELEGRETURN Call. Example: nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001 nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2 nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0 nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00 nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0 kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0 Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4layouts.c | 2 +- fs/nfsd/nfs4proc.c | 4 ++++ fs/nfsd/nfs4state.c | 4 ++++ fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 48 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index 7018d209b784..e4e23b2a3e65 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -657,7 +657,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task) ktime_t now, cutoff; const struct nfsd4_layout_ops *ops; - + trace_nfsd_cb_layout_done(&ls->ls_stid.sc_stateid, task); switch (task->tk_status) { case 0: case -NFS4ERR_DELAY: diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index c40795d1d98d..38d60964d8b2 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1603,6 +1603,10 @@ static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, struct rpc_task *task) { + struct nfsd4_cb_offload *cbo = + container_of(cb, struct nfsd4_cb_offload, co_cb); + + trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task); return 1; } diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 7a4baa41b536..8bbff388e4f0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -357,6 +357,8 @@ nfsd4_cb_notify_lock_prepare(struct nfsd4_callback *cb) static int nfsd4_cb_notify_lock_done(struct nfsd4_callback *cb, struct rpc_task *task) { + trace_nfsd_cb_notify_lock_done(&zero_stateid, task); + /* * Since this is just an optimization, we don't try very hard if it * turns out not to succeed. We'll requeue it on NFS4ERR_DELAY, and @@ -4760,6 +4762,8 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, { struct nfs4_delegation *dp = cb_to_delegation(cb); + trace_nfsd_cb_recall_done(&dp->dl_stid.sc_stateid, task); + if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID || dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) return 1; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 22c1fb735f1a..0ee4220a289a 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1366,6 +1366,45 @@ TRACE_EVENT(nfsd_cb_offload, __entry->fh_hash, __entry->count, __entry->status) ); +DECLARE_EVENT_CLASS(nfsd_cb_done_class, + TP_PROTO( + const stateid_t *stp, + const struct rpc_task *task + ), + TP_ARGS(stp, task), + TP_STRUCT__entry( + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + __field(int, status) + ), + TP_fast_assign( + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + __entry->status = task->tk_status; + ), + TP_printk("client %08x:%08x stateid %08x:%08x status=%d", + __entry->cl_boot, __entry->cl_id, __entry->si_id, + __entry->si_generation, __entry->status + ) +); + +#define DEFINE_NFSD_CB_DONE_EVENT(name) \ +DEFINE_EVENT(nfsd_cb_done_class, name, \ + TP_PROTO( \ + const stateid_t *stp, \ + const struct rpc_task *task \ + ), \ + TP_ARGS(stp, task)) + +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_recall_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_notify_lock_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_layout_done); +DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_offload_done); + #endif /* _NFSD_TRACE_H */ #undef TRACE_INCLUDE_PATH From 95dce2279c8172cab0ba5d4c23a84ddd1290d3d2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:14:00 -0400 Subject: [PATCH 0941/1565] NFSD: Add a mechanism to wait for a DELEGRETURN [ Upstream commit c035362eb935fe9381d9d1cc453bc2a37460e24c ] Subsequent patches will use this mechanism to wake up an operation that is waiting for a client to return a delegation. The new tracepoint records whether the wait timed out or was properly awoken by the expected DELEGRETURN: nfsd-1155 [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out) Suggested-by: Jeff Layton Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 30 ++++++++++++++++++++++++++++++ fs/nfsd/nfsd.h | 7 +++++++ fs/nfsd/trace.h | 23 +++++++++++++++++++++++ 3 files changed, 60 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8bbff388e4f0..2bb78ab4f6c3 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4734,6 +4734,35 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) return ret; } +static bool nfsd4_deleg_present(const struct inode *inode) +{ + struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx); + + return ctx && !list_empty_careful(&ctx->flc_lease); +} + +/** + * nfsd_wait_for_delegreturn - wait for delegations to be returned + * @rqstp: the RPC transaction being executed + * @inode: in-core inode of the file being waited for + * + * The timeout prevents deadlock if all nfsd threads happen to be + * tied up waiting for returning delegations. + * + * Return values: + * %true: delegation was returned + * %false: timed out waiting for delegreturn + */ +bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode) +{ + long __maybe_unused timeo; + + timeo = wait_var_event_timeout(inode, !nfsd4_deleg_present(inode), + NFSD_DELEGRETURN_TIMEOUT); + trace_nfsd_delegret_wakeup(rqstp, inode, timeo); + return timeo > 0; +} + static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb) { struct nfs4_delegation *dp = cb_to_delegation(cb); @@ -6811,6 +6840,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto put_stateid; + wake_up_var(d_inode(cstate->current_fh.fh_dentry)); destroy_delegation(dp); put_stateid: nfs4_put_stid(&dp->dl_stid); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 57a468ed85c3..6ab4ad41ae84 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -164,6 +164,7 @@ char * nfs4_recoverydir(void); bool nfsd4_spo_must_allow(struct svc_rqst *rqstp); int nfsd4_create_laundry_wq(void); void nfsd4_destroy_laundry_wq(void); +bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode); #else static inline int nfsd4_init_slabs(void) { return 0; } static inline void nfsd4_free_slabs(void) { } @@ -179,6 +180,11 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) } static inline int nfsd4_create_laundry_wq(void) { return 0; }; static inline void nfsd4_destroy_laundry_wq(void) {}; +static inline bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, + struct inode *inode) +{ + return false; +} #endif /* @@ -343,6 +349,7 @@ void nfsd_lockd_shutdown(void); #define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ #define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 #define NFS4_CLIENTS_PER_GB 1024 +#define NFSD_DELEGRETURN_TIMEOUT (HZ / 34) /* 30ms */ /* * The following attributes are currently not supported by the NFSv4 server: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0ee4220a289a..6803ac877ff7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -454,6 +454,29 @@ DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err); #include "filecache.h" #include "vfs.h" +TRACE_EVENT(nfsd_delegret_wakeup, + TP_PROTO( + const struct svc_rqst *rqstp, + const struct inode *inode, + long timeo + ), + TP_ARGS(rqstp, inode, timeo), + TP_STRUCT__entry( + __field(u32, xid) + __field(const void *, inode) + __field(long, timeo) + ), + TP_fast_assign( + __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->inode = inode; + __entry->timeo = timeo; + ), + TP_printk("xid=0x%08x inode=%p%s", + __entry->xid, __entry->inode, + __entry->timeo == 0 ? " (timed out)" : "" + ) +); + DECLARE_EVENT_CLASS(nfsd_stateid_class, TP_PROTO(stateid_t *stp), TP_ARGS(stp), From f326970df189b622fe73f217514847668042590a Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:14:07 -0400 Subject: [PATCH 0942/1565] NFSD: Refactor nfsd_setattr() [ Upstream commit c0aa1913db57219e91a0a8832363cbafb3a9cf8f ] Move code that will be retried (in a subsequent patch) into a helper function. Reviewed-by: Jeff Layton [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 97 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 57 insertions(+), 40 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5ec1119a8785..e32b0c807ea9 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -356,8 +356,61 @@ nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); } -/* - * Set various file attributes. After this call fhp needs an fh_put. +static int __nfsd_setattr(struct dentry *dentry, struct iattr *iap) +{ + int host_err; + + if (iap->ia_valid & ATTR_SIZE) { + /* + * RFC5661, Section 18.30.4: + * Changing the size of a file with SETATTR indirectly + * changes the time_modify and change attributes. + * + * (and similar for the older RFCs) + */ + struct iattr size_attr = { + .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, + .ia_size = iap->ia_size, + }; + + if (iap->ia_size < 0) + return -EFBIG; + + host_err = notify_change(dentry, &size_attr, NULL); + if (host_err) + return host_err; + iap->ia_valid &= ~ATTR_SIZE; + + /* + * Avoid the additional setattr call below if the only other + * attribute that the client sends is the mtime, as we update + * it as part of the size change above. + */ + if ((iap->ia_valid & ~ATTR_MTIME) == 0) + return 0; + } + + if (!iap->ia_valid) + return 0; + + iap->ia_valid |= ATTR_CTIME; + return notify_change(dentry, iap, NULL); +} + +/** + * nfsd_setattr - Set various file attributes. + * @rqstp: controlling RPC transaction + * @fhp: filehandle of target + * @attr: attributes to set + * @check_guard: set to 1 if guardtime is a valid timestamp + * @guardtime: do not act if ctime.tv_sec does not match this timestamp + * + * This call may adjust the contents of @attr (in particular, this + * call may change the bits in the na_iattr.ia_valid field). + * + * Returns nfs_ok on success, otherwise an NFS status code is + * returned. Caller must release @fhp by calling fh_put in either + * case. */ __be32 nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -370,7 +423,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err = 0; + int host_err; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); @@ -427,43 +480,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, } inode_lock(inode); - if (size_change) { - /* - * RFC5661, Section 18.30.4: - * Changing the size of a file with SETATTR indirectly - * changes the time_modify and change attributes. - * - * (and similar for the older RFCs) - */ - struct iattr size_attr = { - .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, - .ia_size = iap->ia_size, - }; - - host_err = -EFBIG; - if (iap->ia_size < 0) - goto out_unlock; - - host_err = notify_change(dentry, &size_attr, NULL); - if (host_err) - goto out_unlock; - iap->ia_valid &= ~ATTR_SIZE; - - /* - * Avoid the additional setattr call below if the only other - * attribute that the client sends is the mtime, as we update - * it as part of the size change above. - */ - if ((iap->ia_valid & ~ATTR_MTIME) == 0) - goto out_unlock; - } - - if (iap->ia_valid) { - iap->ia_valid |= ATTR_CTIME; - host_err = notify_change(dentry, iap, NULL); - } - -out_unlock: + host_err = __nfsd_setattr(dentry, iap); if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len); From b6c6c7153bdbcdbab07a08fbbe330ca71a2305af Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:14:13 -0400 Subject: [PATCH 0943/1565] NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY [ Upstream commit 34b91dda7124fc3259e4b2ae53e0c933dedfec01 ] nfsd_setattr() can kick off a CB_RECALL (via notify_change() -> break_lease()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_setattr() again, once. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index e32b0c807ea9..dc79db261d6a 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -426,6 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int host_err; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); + int retries; if (iap->ia_valid & ATTR_SIZE) { accmode |= NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE; @@ -480,7 +481,13 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, } inode_lock(inode); - host_err = __nfsd_setattr(dentry, iap); + for (retries = 1;;) { + host_err = __nfsd_setattr(dentry, iap); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, inode)) + break; + } if (attr->na_seclabel && attr->na_seclabel->len) attr->na_labelerr = security_inode_setsecctx(dentry, attr->na_seclabel->data, attr->na_seclabel->len); From 235738ccea3b6b2dd58440d91dcadd3e88ba5d7d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:14:19 -0400 Subject: [PATCH 0944/1565] NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY [ Upstream commit 68c522afd0b1936b48a03a4c8b81261e7597c62d ] nfsd_rename() can kick off a CB_RECALL (via vfs_rename() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_rename() again, once. This version of the patch handles renaming an existing file, but does not deal with renaming onto an existing file. That case will still always trigger an NFS4ERR_DELAY. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index dc79db261d6a..5684845d9511 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1711,7 +1711,15 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, .new_dir = tdir, .new_dentry = ndentry, }; - host_err = vfs_rename(&rd); + int retries; + + for (retries = 1;;) { + host_err = vfs_rename(&rd); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry))) + break; + } if (!host_err) { host_err = commit_metadata(tfhp); if (!host_err) From 1022fe63c57e4e24a57316d8ddfdf88bd293e2f8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 8 Sep 2022 18:14:25 -0400 Subject: [PATCH 0945/1565] NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY [ Upstream commit 5f5f8b6d655fd947e899b1771c2f7cb581a06764 ] nfsd_unlink() can kick off a CB_RECALL (via vfs_unlink() -> leases_conflict()) if a delegation is present. Before returning NFS4ERR_DELAY, give the client holding that delegation a chance to return it and then retry the nfsd_unlink() again, once. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354 Tested-by: Igor Mammedov Reviewed-by: Jeff Layton [ cel: backported to 5.10.y, prior to idmapped mounts ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5684845d9511..e29034b1e612 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1803,9 +1803,18 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, fh_fill_pre_attrs(fhp); if (type != S_IFDIR) { + int retries; + if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) nfsd_close_cached_files(rdentry); - host_err = vfs_unlink(dirp, rdentry, NULL); + + for (retries = 1;;) { + host_err = vfs_unlink(dirp, rdentry, NULL); + if (host_err != -EAGAIN || !retries--) + break; + if (!nfsd_wait_for_delegreturn(rqstp, rinode)) + break; + } } else { host_err = vfs_rmdir(dirp, rdentry); } From 67302ef04e5411d7d81279c5957f8bdffb800c65 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 14 Sep 2022 08:54:25 -0700 Subject: [PATCH 0946/1565] NFSD: keep track of the number of courtesy clients in the system [ Upstream commit 3a4ea23d86a317c4b68b9a69d51f7e84e1e04357 ] Add counter nfs4_courtesy_client_count to nfsd_net to keep track of the number of courtesy clients in the system. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfs4state.c | 17 ++++++++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index ffe17743cc74..55c7006d6109 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -192,6 +192,8 @@ struct nfsd_net { atomic_t nfs4_client_count; int nfs4_max_clients; + + atomic_t nfsd_courtesy_clients; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2bb78ab4f6c3..9930c5f9440c 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -160,6 +160,13 @@ static bool is_client_expired(struct nfs4_client *clp) return clp->cl_time == 0; } +static void nfsd4_dec_courtesy_client_count(struct nfsd_net *nn, + struct nfs4_client *clp) +{ + if (clp->cl_state != NFSD4_ACTIVE) + atomic_add_unless(&nn->nfsd_courtesy_clients, -1, 0); +} + static __be32 get_client_locked(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); @@ -169,6 +176,7 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); + nfsd4_dec_courtesy_client_count(nn, clp); clp->cl_state = NFSD4_ACTIVE; return nfs_ok; } @@ -190,6 +198,7 @@ renew_client_locked(struct nfs4_client *clp) list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); + nfsd4_dec_courtesy_client_count(nn, clp); clp->cl_state = NFSD4_ACTIVE; } @@ -2248,6 +2257,7 @@ __destroy_client(struct nfs4_client *clp) if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); atomic_add_unless(&nn->nfs4_client_count, -1, 0); + nfsd4_dec_courtesy_client_count(nn, clp); free_client(clp); wake_up_all(&expiry_wq); } @@ -4375,6 +4385,8 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); max_clients *= NFS4_CLIENTS_PER_GB; nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); + + atomic_set(&nn->nfsd_courtesy_clients, 0); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5928,8 +5940,11 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, goto exp_client; if (!state_expired(lt, clp->cl_time)) break; - if (!atomic_read(&clp->cl_rpc_users)) + if (!atomic_read(&clp->cl_rpc_users)) { + if (clp->cl_state == NFSD4_ACTIVE) + atomic_inc(&nn->nfsd_courtesy_clients); clp->cl_state = NFSD4_COURTESY; + } if (!client_has_state(clp)) goto exp_client; if (!nfs4_anylock_blockers(clp)) From 6e50de3b3a289c3166d20620abef1885156ceaa8 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 14 Sep 2022 08:54:26 -0700 Subject: [PATCH 0947/1565] NFSD: add shrinker to reap courtesy clients on low memory condition [ Upstream commit 7746b32f467b3813fb61faaab3258de35806a7ac ] Add courtesy_client_reaper to react to low memory condition triggered by the system memory shrinker. The delayed_work for the courtesy_client_reaper is scheduled on the shrinker's count callback using the laundry_wq. The shrinker's scan callback is not used for expiring the courtesy clients due to potential deadlocks. Signed-off-by: Dai Ngo [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 2 + fs/nfsd/nfs4state.c | 94 +++++++++++++++++++++++++++++++++++++++++---- fs/nfsd/nfsctl.c | 6 ++- fs/nfsd/nfsd.h | 6 ++- 4 files changed, 96 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 55c7006d6109..8c854ba3285b 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -194,6 +194,8 @@ struct nfsd_net { int nfs4_max_clients; atomic_t nfsd_courtesy_clients; + struct shrinker nfsd_client_shrinker; + struct delayed_work nfsd_shrinker_work; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9930c5f9440c..d2468a408328 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4366,7 +4366,27 @@ nfsd4_init_slabs(void) return -ENOMEM; } -void nfsd4_init_leases_net(struct nfsd_net *nn) +static unsigned long +nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) +{ + int cnt; + struct nfsd_net *nn = container_of(shrink, + struct nfsd_net, nfsd_client_shrinker); + + cnt = atomic_read(&nn->nfsd_courtesy_clients); + if (cnt > 0) + mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + return (unsigned long)cnt; +} + +static unsigned long +nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) +{ + return SHRINK_STOP; +} + +int +nfsd4_init_leases_net(struct nfsd_net *nn) { struct sysinfo si; u64 max_clients; @@ -4387,6 +4407,16 @@ void nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); atomic_set(&nn->nfsd_courtesy_clients, 0); + nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; + nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; + return register_shrinker(&nn->nfsd_client_shrinker); +} + +void +nfsd4_leases_net_shutdown(struct nfsd_net *nn) +{ + unregister_shrinker(&nn->nfsd_client_shrinker); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -5959,10 +5989,49 @@ nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, spin_unlock(&nn->client_lock); } +static void +nfs4_get_courtesy_client_reaplist(struct nfsd_net *nn, + struct list_head *reaplist) +{ + unsigned int maxreap = 0, reapcnt = 0; + struct list_head *pos, *next; + struct nfs4_client *clp; + + maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; + INIT_LIST_HEAD(reaplist); + + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state == NFSD4_ACTIVE) + break; + if (reapcnt >= maxreap) + break; + if (!mark_client_expired_locked(clp)) { + list_add(&clp->cl_lru, reaplist); + reapcnt++; + } + } + spin_unlock(&nn->client_lock); +} + +static void +nfs4_process_client_reaplist(struct list_head *reaplist) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + + list_for_each_safe(pos, next, reaplist) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + trace_nfsd_clid_purged(&clp->cl_clientid); + list_del_init(&clp->cl_lru); + expire_client(clp); + } +} + static time64_t nfs4_laundromat(struct nfsd_net *nn) { - struct nfs4_client *clp; struct nfs4_openowner *oo; struct nfs4_delegation *dp; struct nfs4_ol_stateid *stp; @@ -5991,12 +6060,8 @@ nfs4_laundromat(struct nfsd_net *nn) } spin_unlock(&nn->s2s_cp_lock); nfs4_get_client_reaplist(nn, &reaplist, <); - list_for_each_safe(pos, next, &reaplist) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - trace_nfsd_clid_purged(&clp->cl_clientid); - list_del_init(&clp->cl_lru); - expire_client(clp); - } + nfs4_process_client_reaplist(&reaplist); + spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); @@ -6079,6 +6144,18 @@ laundromat_main(struct work_struct *laundry) queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ); } +static void +courtesy_client_reaper(struct work_struct *reaper) +{ + struct list_head reaplist; + struct delayed_work *dwork = to_delayed_work(reaper); + struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + nfsd_shrinker_work); + + nfs4_get_courtesy_client_reaplist(nn, &reaplist); + nfs4_process_client_reaplist(&reaplist); +} + static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) { if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle)) @@ -7911,6 +7988,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru); INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); + INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, courtesy_client_reaper); get_net(net); return 0; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 917fa1892fd2..597a26ad4183 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1481,11 +1481,12 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; + retval = nfsd4_init_leases_net(nn); + if (retval) + goto out_drc_error; retval = nfsd_reply_cache_init(nn); if (retval) goto out_drc_error; - nfsd4_init_leases_net(nn); - get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); @@ -1507,6 +1508,7 @@ static __net_exit void nfsd_exit_net(struct net *net) nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); + nfsd4_leases_net_shutdown(nn); } static struct pernet_operations nfsd_net_ops = { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 6ab4ad41ae84..09726c5b9a31 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -505,7 +505,8 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif -extern void nfsd4_init_leases_net(struct nfsd_net *nn); +extern int nfsd4_init_leases_net(struct nfsd_net *nn); +extern void nfsd4_leases_net_shutdown(struct nfsd_net *nn); #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) @@ -513,7 +514,8 @@ static inline int nfsd4_is_junction(struct dentry *dentry) return 0; } -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) {}; +static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; +static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) {}; #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0) From 5ed252489368015326022adf541f08f2f073382d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:22:38 -0400 Subject: [PATCH 0948/1565] SUNRPC: Parametrize how much of argsize should be zeroed [ Upstream commit 103cc1fafee48adb91fca0e19deb869fd23e46ab ] Currently, SUNRPC clears the whole of .pc_argsize before processing each incoming RPC transaction. Add an extra parameter to struct svc_procedure to enable upper layers to reduce the amount of each operation's argument structure that is zeroed by SUNRPC. The size of struct nfsd4_compoundargs, in particular, is a lot to clear on each incoming RPC Call. A subsequent patch will cut this down to something closer to what NFSv2 and NFSv3 uses. This patch should cause no behavior changes. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 24 ++++++++++++++++++++++++ fs/lockd/svcproc.c | 24 ++++++++++++++++++++++++ fs/nfs/callback_xdr.c | 1 + fs/nfsd/nfs2acl.c | 5 +++++ fs/nfsd/nfs3acl.c | 3 +++ fs/nfsd/nfs3proc.c | 22 ++++++++++++++++++++++ fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/nfsproc.c | 18 ++++++++++++++++++ include/linux/sunrpc/svc.h | 1 + net/sunrpc/svc.c | 2 +- 10 files changed, 101 insertions(+), 1 deletion(-) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index bf274f23969b..284b019cb652 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -521,6 +521,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "NULL", @@ -530,6 +531,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_testres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, .pc_name = "TEST", @@ -539,6 +541,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "LOCK", @@ -548,6 +551,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "CANCEL", @@ -557,6 +561,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "UNLOCK", @@ -566,6 +571,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "GRANTED", @@ -575,6 +581,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_MSG", @@ -584,6 +591,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_MSG", @@ -593,6 +601,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_MSG", @@ -602,6 +611,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_MSG", @@ -611,6 +621,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_MSG", @@ -620,6 +631,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_RES", @@ -629,6 +641,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_RES", @@ -638,6 +651,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_RES", @@ -647,6 +661,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_RES", @@ -656,6 +671,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_res, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_RES", @@ -665,6 +681,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_reboot, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), + .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "SM_NOTIFY", @@ -674,6 +691,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -683,6 +701,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -692,6 +711,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "UNUSED", @@ -701,6 +721,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "SHARE", @@ -710,6 +731,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "UNSHARE", @@ -719,6 +741,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "NM_LOCK", @@ -728,6 +751,7 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_notify, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "FREE_ALL", diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index b09ca35b527c..e35c05e27806 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -555,6 +555,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "NULL", @@ -564,6 +565,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_testres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, .pc_name = "TEST", @@ -573,6 +575,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "LOCK", @@ -582,6 +585,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "CANCEL", @@ -591,6 +595,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "UNLOCK", @@ -600,6 +605,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "GRANTED", @@ -609,6 +615,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_MSG", @@ -618,6 +625,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_MSG", @@ -627,6 +635,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_MSG", @@ -636,6 +645,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_MSG", @@ -645,6 +655,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_MSG", @@ -654,6 +665,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "TEST_RES", @@ -663,6 +675,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "LOCK_RES", @@ -672,6 +685,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "CANCEL_RES", @@ -681,6 +695,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNLOCK_RES", @@ -690,6 +705,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_res, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), + .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "GRANTED_RES", @@ -699,6 +715,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_reboot, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), + .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "SM_NOTIFY", @@ -708,6 +725,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -717,6 +735,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -726,6 +745,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), + .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, .pc_name = "UNUSED", @@ -735,6 +755,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "SHARE", @@ -744,6 +765,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, .pc_name = "UNSHARE", @@ -753,6 +775,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, .pc_name = "NM_LOCK", @@ -762,6 +785,7 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_notify, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), + .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, .pc_name = "FREE_ALL", diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index 1b21f88a9808..db69fc267c9a 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -1070,6 +1070,7 @@ static const struct svc_procedure nfs4_callback_procedures1[] = { .pc_func = nfs4_callback_compound, .pc_encode = nfs4_encode_void, .pc_argsize = 256, + .pc_argzero = 256, .pc_ressize = 256, .pc_xdrressize = NFS4_CALLBACK_BUFSIZE, .pc_name = "COMPOUND", diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index f7d166f056af..f8179fc8c9bd 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -329,6 +329,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -340,6 +341,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_getaclres, .pc_release = nfsaclsvc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), + .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), @@ -351,6 +353,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd3_setaclargs), + .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -362,6 +365,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -373,6 +377,7 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_accessres, .pc_release = nfsaclsvc_release_access, .pc_argsize = sizeof(struct nfsd3_accessargs), + .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1, diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 15bee0339c76..2c451fcd5a3c 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -250,6 +250,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -261,6 +262,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_getaclres, .pc_release = nfs3svc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), + .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), @@ -272,6 +274,7 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_setaclres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_setaclargs), + .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd3_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 8679c4174602..5c2e2b5e5945 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -809,6 +809,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, @@ -820,6 +821,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_getattrres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -831,6 +833,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_sattrargs), + .pc_argzero = sizeof(struct nfsd3_sattrargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -842,6 +845,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_lookupres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+pAT+pAT, @@ -853,6 +857,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_accessres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_accessargs), + .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1, @@ -864,6 +869,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readlinkres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, @@ -875,6 +881,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readargs), + .pc_argzero = sizeof(struct nfsd3_readargs), .pc_ressize = sizeof(struct nfsd3_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4, @@ -886,6 +893,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_writeres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_writeargs), + .pc_argzero = sizeof(struct nfsd3_writeargs), .pc_ressize = sizeof(struct nfsd3_writeres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+4, @@ -897,6 +905,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_createargs), + .pc_argzero = sizeof(struct nfsd3_createargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -908,6 +917,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mkdirargs), + .pc_argzero = sizeof(struct nfsd3_mkdirargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -919,6 +929,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_symlinkargs), + .pc_argzero = sizeof(struct nfsd3_symlinkargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -930,6 +941,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mknodargs), + .pc_argzero = sizeof(struct nfsd3_mknodargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, @@ -941,6 +953,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -952,6 +965,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), + .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, @@ -963,6 +977,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_renameres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_renameargs), + .pc_argzero = sizeof(struct nfsd3_renameargs), .pc_ressize = sizeof(struct nfsd3_renameres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+WC, @@ -974,6 +989,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_linkres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_linkargs), + .pc_argzero = sizeof(struct nfsd3_linkargs), .pc_ressize = sizeof(struct nfsd3_linkres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+pAT+WC, @@ -985,6 +1001,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirargs), + .pc_argzero = sizeof(struct nfsd3_readdirargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIR", @@ -995,6 +1012,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirplusargs), + .pc_argzero = sizeof(struct nfsd3_readdirplusargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIRPLUS", @@ -1004,6 +1022,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsstatres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+2*6+1, @@ -1014,6 +1033,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsinfores, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsinfores), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+12, @@ -1024,6 +1044,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_pathconfres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), + .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_pathconfres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+6, @@ -1035,6 +1056,7 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_commitres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_commitargs), + .pc_argzero = sizeof(struct nfsd3_commitargs), .pc_ressize = sizeof(struct nfsd3_commitres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+WC+2, diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 38d60964d8b2..f6ea7445073f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3577,6 +3577,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, @@ -3587,6 +3588,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfs4svc_decode_compoundargs, .pc_encode = nfs4svc_encode_compoundres, .pc_argsize = sizeof(struct nfsd4_compoundargs), + .pc_argzero = sizeof(struct nfsd4_compoundargs), .pc_ressize = sizeof(struct nfsd4_compoundres), .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 7ed03ac6bdab..3509331a6650 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -645,6 +645,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -656,6 +657,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, @@ -667,6 +669,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_sattrargs), + .pc_argzero = sizeof(struct nfsd_sattrargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, @@ -677,6 +680,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -688,6 +692,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+AT, @@ -698,6 +703,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_readlinkres, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, @@ -709,6 +715,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_readres, .pc_release = nfssvc_release_readres, .pc_argsize = sizeof(struct nfsd_readargs), + .pc_argzero = sizeof(struct nfsd_readargs), .pc_ressize = sizeof(struct nfsd_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4, @@ -719,6 +726,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_voidarg, .pc_encode = nfssvc_encode_voidres, .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_argzero = sizeof(struct nfsd_voidargs), .pc_ressize = sizeof(struct nfsd_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, @@ -730,6 +738,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_attrstatres, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_writeargs), + .pc_argzero = sizeof(struct nfsd_writeargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, @@ -741,6 +750,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), + .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, @@ -751,6 +761,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_diropargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -761,6 +772,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_renameargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_renameargs), + .pc_argzero = sizeof(struct nfsd_renameargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -771,6 +783,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_linkargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_linkargs), + .pc_argzero = sizeof(struct nfsd_linkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -781,6 +794,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_symlinkargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_symlinkargs), + .pc_argzero = sizeof(struct nfsd_symlinkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -792,6 +806,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), + .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, @@ -802,6 +817,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_diropargs, .pc_encode = nfssvc_encode_statres, .pc_argsize = sizeof(struct nfsd_diropargs), + .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, @@ -812,6 +828,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_readdirargs, .pc_encode = nfssvc_encode_readdirres, .pc_argsize = sizeof(struct nfsd_readdirargs), + .pc_argzero = sizeof(struct nfsd_readdirargs), .pc_ressize = sizeof(struct nfsd_readdirres), .pc_cachetype = RC_NOCACHE, .pc_name = "READDIR", @@ -821,6 +838,7 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_decode = nfssvc_decode_fhandleargs, .pc_encode = nfssvc_encode_statfsres, .pc_argsize = sizeof(struct nfsd_fhandle), + .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_statfsres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+5, diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 3c908ffbbf45..5cf6543d3c8e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -476,6 +476,7 @@ struct svc_procedure { /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */ + unsigned int pc_argzero; /* how much of argument to clear */ unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 9caaed1c143e..8c2918179649 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1237,7 +1237,7 @@ svc_generic_init_request(struct svc_rqst *rqstp, rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc]; /* Initialize storage for argp and resp */ - memset(rqstp->rq_argp, 0, procp->pc_argsize); + memset(rqstp->rq_argp, 0, procp->pc_argzero); memset(rqstp->rq_resp, 0, procp->pc_ressize); /* Bump per-procedure stats counter */ From 5e76b25d7cc82c148d391c0c43b884e6427cb302 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:22:44 -0400 Subject: [PATCH 0949/1565] NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing [ Upstream commit 3fdc546462348b8a497c72bc894e0cde9f10fc40 ] Have SunRPC clear everything except for the iops array. Then have each NFSv4 XDR decoder clear it's own argument before decoding. Now individual operations may have a large argument struct while not penalizing the vast majority of operations with a small struct. And, clearing the argument structure occurs as the argument fields are initialized, enabling the CPU to do write combining on that memory. In some cases, clearing is not even necessary because all of the fields in the argument structure are initialized by the decoder. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 +- fs/nfsd/nfs4xdr.c | 61 +++++++++++++++++++++++++++++++++++++--------- 2 files changed, 51 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index f6ea7445073f..79f1990d40c4 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -3588,7 +3588,7 @@ static const struct svc_procedure nfsd_procedures4[2] = { .pc_decode = nfs4svc_decode_compoundargs, .pc_encode = nfs4svc_encode_compoundres, .pc_argsize = sizeof(struct nfsd4_compoundargs), - .pc_argzero = sizeof(struct nfsd4_compoundargs), + .pc_argzero = offsetof(struct nfsd4_compoundargs, iops), .pc_ressize = sizeof(struct nfsd4_compoundres), .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index b5ca83045d6e..04699198eace 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -793,6 +793,7 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) return nfserr_bad_xdr; + memset(&commit->co_verf, 0, sizeof(commit->co_verf)); return nfs_ok; } @@ -801,6 +802,7 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create { __be32 *p, status; + memset(create, 0, sizeof(*create)); if (xdr_stream_decode_u32(argp->xdr, &create->cr_type) < 0) return nfserr_bad_xdr; switch (create->cr_type) { @@ -850,6 +852,7 @@ nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegretu static inline __be32 nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) { + memset(getattr, 0, sizeof(*getattr)); return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, ARRAY_SIZE(getattr->ga_bmval)); } @@ -857,6 +860,7 @@ nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *geta static __be32 nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) { + memset(link, 0, sizeof(*link)); return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); } @@ -905,6 +909,7 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { + memset(lock, 0, sizeof(*lock)); if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) return nfserr_bad_xdr; if ((lock->lk_type < NFS4_READ_LT) || (lock->lk_type > NFS4_WRITEW_LT)) @@ -921,6 +926,7 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) static __be32 nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) { + memset(lockt, 0, sizeof(*lockt)); if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) return nfserr_bad_xdr; if ((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) @@ -1142,11 +1148,8 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) __be32 status; u32 dummy; - memset(open->op_bmval, 0, sizeof(open->op_bmval)); - open->op_iattr.ia_valid = 0; - open->op_openowner = NULL; + memset(open, 0, sizeof(*open)); - open->op_xdr_error = 0; if (xdr_stream_decode_u32(argp->xdr, &open->op_seqid) < 0) return nfserr_bad_xdr; /* deleg_want is ignored */ @@ -1181,6 +1184,8 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con if (xdr_stream_decode_u32(argp->xdr, &open_conf->oc_seqid) < 0) return nfserr_bad_xdr; + memset(&open_conf->oc_resp_stateid, 0, + sizeof(open_conf->oc_resp_stateid)); return nfs_ok; } @@ -1189,6 +1194,7 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d { __be32 status; + memset(open_down, 0, sizeof(*open_down)); status = nfsd4_decode_stateid4(argp, &open_down->od_stateid); if (status) return status; @@ -1218,6 +1224,7 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) if (!putfh->pf_fhval) return nfserr_jukebox; + putfh->no_verify = false; return nfs_ok; } @@ -1234,6 +1241,7 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { __be32 status; + memset(read, 0, sizeof(*read)); status = nfsd4_decode_stateid4(argp, &read->rd_stateid); if (status) return status; @@ -1250,6 +1258,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read { __be32 status; + memset(readdir, 0, sizeof(*readdir)); if (xdr_stream_decode_u64(argp->xdr, &readdir->rd_cookie) < 0) return nfserr_bad_xdr; status = nfsd4_decode_verifier4(argp, &readdir->rd_verf); @@ -1269,6 +1278,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read static __be32 nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) { + memset(&remove->rm_cinfo, 0, sizeof(remove->rm_cinfo)); return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); } @@ -1277,6 +1287,7 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename { __be32 status; + memset(rename, 0, sizeof(*rename)); status = nfsd4_decode_component4(argp, &rename->rn_sname, &rename->rn_snamelen); if (status) return status; @@ -1293,6 +1304,7 @@ static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, struct nfsd4_secinfo *secinfo) { + secinfo->si_exp = NULL; return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); } @@ -1301,6 +1313,7 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta { __be32 status; + memset(setattr, 0, sizeof(*setattr)); status = nfsd4_decode_stateid4(argp, &setattr->sa_stateid); if (status) return status; @@ -1315,6 +1328,8 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient { __be32 *p, status; + memset(setclientid, 0, sizeof(*setclientid)); + if (argp->minorversion >= 1) return nfserr_notsupp; @@ -1371,6 +1386,8 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify { __be32 *p, status; + memset(verify, 0, sizeof(*verify)); + status = nfsd4_decode_bitmap4(argp, verify->ve_bmval, ARRAY_SIZE(verify->ve_bmval)); if (status) @@ -1410,6 +1427,9 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) return nfserr_bad_xdr; + write->wr_bytes_written = 0; + write->wr_how_written = 0; + memset(&write->wr_verifier, 0, sizeof(write->wr_verifier)); return nfs_ok; } @@ -1434,6 +1454,7 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) { + memset(bc, 0, sizeof(*bc)); if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) return nfserr_bad_xdr; return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); @@ -1444,6 +1465,7 @@ static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, u32 use_conn_in_rdma_mode; __be32 status; + memset(bcts, 0, sizeof(*bcts)); status = nfsd4_decode_sessionid4(argp, &bcts->sessionid); if (status) return status; @@ -1585,6 +1607,7 @@ nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, { __be32 status; + memset(exid, 0, sizeof(*exid)); status = nfsd4_decode_verifier4(argp, &exid->verifier); if (status) return status; @@ -1637,6 +1660,7 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, { __be32 status; + memset(sess, 0, sizeof(*sess)); status = nfsd4_decode_clientid4(argp, &sess->clientid); if (status) return status; @@ -1652,11 +1676,7 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, return status; if (xdr_stream_decode_u32(argp->xdr, &sess->callback_prog) < 0) return nfserr_bad_xdr; - status = nfsd4_decode_cb_sec(argp, &sess->cb_sec); - if (status) - return status; - - return nfs_ok; + return nfsd4_decode_cb_sec(argp, &sess->cb_sec); } static __be32 @@ -1680,6 +1700,7 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, { __be32 status; + memset(gdev, 0, sizeof(*gdev)); status = nfsd4_decode_deviceid4(argp, &gdev->gd_devid); if (status) return status; @@ -1700,6 +1721,7 @@ nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, { __be32 *p, status; + memset(lcp, 0, sizeof(*lcp)); if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.offset) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.length) < 0) @@ -1735,6 +1757,7 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, { __be32 status; + memset(lgp, 0, sizeof(*lgp)); if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_signal) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_layout_type) < 0) @@ -1760,6 +1783,7 @@ static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, struct nfsd4_layoutreturn *lrp) { + memset(lrp, 0, sizeof(*lrp)); if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_layout_type) < 0) @@ -1775,6 +1799,8 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, { if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) return nfserr_bad_xdr; + + sin->sin_exp = NULL; return nfs_ok; } @@ -1795,6 +1821,7 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, seq->maxslots = be32_to_cpup(p++); seq->cachethis = be32_to_cpup(p); + seq->status_flags = 0; return nfs_ok; } @@ -1805,6 +1832,7 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta __be32 status; u32 i; + memset(test_stateid, 0, sizeof(*test_stateid)); if (xdr_stream_decode_u32(argp->xdr, &test_stateid->ts_num_ids) < 0) return nfserr_bad_xdr; @@ -1902,6 +1930,7 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) struct nl4_server *ns_dummy; __be32 status; + memset(copy, 0, sizeof(*copy)); status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); if (status) return status; @@ -1957,6 +1986,7 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, { __be32 status; + memset(cn, 0, sizeof(*cn)); cn->cpn_src = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_src)); if (cn->cpn_src == NULL) return nfserr_jukebox; @@ -1974,6 +2004,8 @@ static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, struct nfsd4_offload_status *os) { + os->count = 0; + os->status = 0; return nfsd4_decode_stateid4(argp, &os->stateid); } @@ -1990,6 +2022,8 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) if (xdr_stream_decode_u32(argp->xdr, &seek->seek_whence) < 0) return nfserr_bad_xdr; + seek->seek_eof = 0; + seek->seek_pos = 0; return nfs_ok; } @@ -2125,6 +2159,7 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, __be32 status; u32 maxcount; + memset(getxattr, 0, sizeof(*getxattr)); status = nfsd4_decode_xattr_name(argp, &getxattr->getxa_name); if (status) return status; @@ -2133,8 +2168,7 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount); getxattr->getxa_len = maxcount; - - return status; + return nfs_ok; } static __be32 @@ -2144,6 +2178,8 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, u32 flags, maxcount, size; __be32 status; + memset(setxattr, 0, sizeof(*setxattr)); + if (xdr_stream_decode_u32(argp->xdr, &flags) < 0) return nfserr_bad_xdr; @@ -2182,6 +2218,8 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, { u32 maxcount; + memset(listxattrs, 0, sizeof(*listxattrs)); + if (xdr_stream_decode_u64(argp->xdr, &listxattrs->lsxa_cookie) < 0) return nfserr_bad_xdr; @@ -2209,6 +2247,7 @@ static __be32 nfsd4_decode_removexattr(struct nfsd4_compoundargs *argp, struct nfsd4_removexattr *removexattr) { + memset(removexattr, 0, sizeof(*removexattr)); return nfsd4_decode_xattr_name(argp, &removexattr->rmxa_name); } From d08acee648f1e1237a3b045a7af382d434a779fb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:22:56 -0400 Subject: [PATCH 0950/1565] NFSD: Refactor common code out of dirlist helpers [ Upstream commit 98124f5bd6c76699d514fbe491dd95265369cc99 ] The dust has settled a bit and it's become obvious what code is totally common between nfsd_init_dirlist_pages() and nfsd3_init_dirlist_pages(). Move that common code to SUNRPC. The new helper brackets the existing xdr_init_decode_pages() API. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 10 +--------- fs/nfsd/nfsproc.c | 10 +--------- include/linux/sunrpc/xdr.h | 2 ++ net/sunrpc/xdr.c | 22 ++++++++++++++++++++++ 4 files changed, 26 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 5c2e2b5e5945..a9bf455aee82 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -574,15 +574,7 @@ static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, buf->pages = rqstp->rq_next_page; rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT; - /* This is xdr_init_encode(), but it assumes that - * the head kvec has already been consumed. */ - xdr_set_scratch_buffer(xdr, NULL, 0); - xdr->buf = buf; - xdr->page_ptr = buf->pages; - xdr->iov = NULL; - xdr->p = page_address(*buf->pages); - xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); - xdr->rqst = NULL; + xdr_init_encode_pages(xdr, buf, buf->pages, NULL); } /* diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3509331a6650..78fa5a9edf27 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -575,15 +575,7 @@ static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, buf->pages = rqstp->rq_next_page; rqstp->rq_next_page++; - /* This is xdr_init_encode(), but it assumes that - * the head kvec has already been consumed. */ - xdr_set_scratch_buffer(xdr, NULL, 0); - xdr->buf = buf; - xdr->page_ptr = buf->pages; - xdr->iov = NULL; - xdr->p = page_address(*buf->pages); - xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); - xdr->rqst = NULL; + xdr_init_encode_pages(xdr, buf, buf->pages, NULL); } /* diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 59d99ff31c1a..c1c50eaae472 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -239,6 +239,8 @@ typedef int (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr, extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); +extern void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, + struct page **pages, struct rpc_rqst *rqst); extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index f66e2de7cd27..e2bd0cd39114 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -690,6 +690,28 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, } EXPORT_SYMBOL_GPL(xdr_init_encode); +/** + * xdr_init_encode_pages - Initialize an xdr_stream for encoding into pages + * @xdr: pointer to xdr_stream struct + * @buf: pointer to XDR buffer into which to encode data + * @pages: list of pages to decode into + * @rqst: pointer to controlling rpc_rqst, for debugging + * + */ +void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, + struct page **pages, struct rpc_rqst *rqst) +{ + xdr_reset_scratch_buffer(xdr); + + xdr->buf = buf; + xdr->page_ptr = pages; + xdr->iov = NULL; + xdr->p = page_address(*pages); + xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); + xdr->rqst = rqst; +} +EXPORT_SYMBOL_GPL(xdr_init_encode_pages); + /** * __xdr_commit_encode - Ensure all data is written to buffer * @xdr: pointer to xdr_stream From c92e8b295ae86a79f1c17d8ae54043db47953271 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:02 -0400 Subject: [PATCH 0951/1565] NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks [ Upstream commit c3d2a04f05c590303c125a176e6e43df4a436fdb ] Replace the check for buffer over/underflow with a helper that is commonly used for this purpose. The helper also sets xdr->nwords correctly after successfully linearizing the symlink argument into the stream's scratch buffer. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 0293b8d65f10..71e32cf28885 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -616,8 +616,6 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_symlinkargs *args = rqstp->rq_argp; struct kvec *head = rqstp->rq_arg.head; - struct kvec *tail = rqstp->rq_arg.tail; - size_t remaining; if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) return false; @@ -626,16 +624,10 @@ nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) return false; - /* request sanity */ - remaining = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len; - remaining -= xdr_stream_pos(xdr); - if (remaining < xdr_align_size(args->tlen)) - return false; - - args->first.iov_base = xdr->p; + /* symlink_data */ args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - - return true; + args->first.iov_base = xdr_inline_decode(xdr, args->tlen); + return args->first.iov_base != NULL; } bool From 918054d2d8ac981314af3f00ae1cbf5545d01cd0 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:07 -0400 Subject: [PATCH 0952/1565] NFSD: Clean up WRITE arg decoders [ Upstream commit d4da5baa533215b14625458e645056baf646bb2e ] xdr_stream_subsegment() already returns a boolean value. Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3xdr.c | 4 +--- fs/nfsd/nfsxdr.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 71e32cf28885..3308dd671ef0 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -571,10 +571,8 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) args->count = max_blocksize; args->len = max_blocksize; } - if (!xdr_stream_subsegment(xdr, &args->payload, args->count)) - return false; - return true; + return xdr_stream_subsegment(xdr, &args->payload, args->count); } bool diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index aba8520b4b8b..caf6355b18fa 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -338,10 +338,8 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) return false; if (args->len > NFSSVC_MAXBLKSIZE_V2) return false; - if (!xdr_stream_subsegment(xdr, &args->payload, args->len)) - return false; - return true; + return xdr_stream_subsegment(xdr, &args->payload, args->len); } bool From f533a01b0982fdd292858ad68e18986f8773fc96 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:19 -0400 Subject: [PATCH 0953/1565] NFSD: Clean up nfs4svc_encode_compoundres() [ Upstream commit 9993a66317fc9951322483a9edbfae95a640b210 ] In today's Linux NFS server implementation, the NFS dispatcher initializes each XDR result stream, and the NFSv4 .pc_func and .pc_encode methods all use xdr_stream-based encoding. This keeps rq_res.len automatically updated. There is no longer a need for the WARN_ON_ONCE() check in nfs4svc_encode_compoundres(). Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 04699198eace..fc587381cd08 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5467,12 +5467,8 @@ bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_buf *buf = xdr->buf; __be32 *p; - WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + - buf->tail[0].iov_len); - /* * Send buffer space for the following items is reserved * at the top of nfsd4_proc_compound(). From 17bb698078678c71d2a98a060e3d1c2dab8f6f1b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:25 -0400 Subject: [PATCH 0954/1565] NFSD: Remove "inline" directives on op_rsize_bop helpers [ Upstream commit 6604148cf961b57fc735e4204f8996536da9253c ] These helpers are always invoked indirectly, so the compiler can't inline these anyway. While we're updating the synopses of these helpers, defensively convert their parameters to const pointers. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 121 ++++++++++++++++++++++++++++----------------- fs/nfsd/xdr4.h | 3 +- 2 files changed, 77 insertions(+), 47 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 79f1990d40c4..1bb0fb917cf0 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2763,28 +2763,33 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) #define op_encode_channel_attrs_maxsz (6 + 1 + 1) -static inline u32 nfsd4_only_status_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_only_status_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size) * sizeof(__be32); } -static inline u32 nfsd4_status_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_status_stateid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz)* sizeof(__be32); } -static inline u32 nfsd4_access_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_access_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { /* ac_supported, ac_resp_access */ return (op_encode_hdr_size + 2)* sizeof(__be32); } -static inline u32 nfsd4_commit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_commit_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_verifier_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_create_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); @@ -2795,10 +2800,10 @@ static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op * the op prematurely if the estimate is too large. We may turn off splice * reads unnecessarily. */ -static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { - u32 *bmap = op->u.getattr.ga_bmval; + const u32 *bmap = op->u.getattr.ga_bmval; u32 bmap0 = bmap[0], bmap1 = bmap[1], bmap2 = bmap[2]; u32 ret = 0; @@ -2833,24 +2838,28 @@ static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, return ret; } -static inline u32 nfsd4_getfh_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_getfh_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + NFS4_FHSIZE; } -static inline u32 nfsd4_link_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_link_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_lock_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_lock_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_lock_denied_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_open_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz + op_encode_change_info_maxsz + 1 @@ -2858,7 +2867,8 @@ static inline u32 nfsd4_open_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) + op_encode_delegation_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0; @@ -2868,7 +2878,8 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = svc_max_payload(rqstp); u32 rlen = min(op->u.read.rd_length, maxcount); @@ -2882,7 +2893,8 @@ static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_readdir_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0; @@ -2893,59 +2905,68 @@ static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *o XDR_QUADLEN(rlen)) * sizeof(__be32); } -static inline u32 nfsd4_readlink_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_readlink_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + PAGE_SIZE; } -static inline u32 nfsd4_remove_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_remove_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_rename_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_rename_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + op_encode_change_info_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_sequence_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_sequence_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5) * sizeof(__be32); } -static inline u32 nfsd4_test_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_test_stateid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 + op->u.test_stateid.ts_num_ids) * sizeof(__be32); } -static inline u32 nfsd4_setattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_setattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_secinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_secinfo_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + RPC_AUTH_MAXFLAVOR * (4 + XDR_QUADLEN(GSS_OID_MAX_LEN))) * sizeof(__be32); } -static inline u32 nfsd4_setclientid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_setclientid_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + XDR_QUADLEN(NFS4_VERIFIER_SIZE)) * sizeof(__be32); } -static inline u32 nfsd4_write_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_write_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + op_encode_verifier_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_exchange_id_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + 1 + /* eir_clientid, eir_sequenceid */\ 1 + 1 + /* eir_flags, spr_how */\ @@ -2959,14 +2980,16 @@ static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_o 0 /* ignored eir_server_impl_id contents */) * sizeof(__be32); } -static inline u32 nfsd4_bind_conn_to_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_bind_conn_to_session_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* bctsr_sessid */\ 2 /* bctsr_dir, use_conn_in_rdma_mode */) * sizeof(__be32); } -static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_create_session_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* sessionid */\ @@ -2975,7 +2998,8 @@ static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd op_encode_channel_attrs_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_copy_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_copy_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* wr_callback */ + @@ -2987,16 +3011,16 @@ static inline u32 nfsd4_copy_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) 1 /* cr_synchronous */) * sizeof(__be32); } -static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_offload_status_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 2 /* osr_count */ + 1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32); } -static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 3 /* cnr_lease_time */ + @@ -3011,7 +3035,8 @@ static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp, } #ifdef CONFIG_NFSD_PNFS -static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount = 0, rlen = 0; @@ -3029,7 +3054,8 @@ static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4 * so we need to define an arbitrary upper bound here. */ #define MAX_LAYOUT_SIZE 128 -static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutget_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* logr_return_on_close */ + @@ -3038,14 +3064,16 @@ static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op MAX_LAYOUT_SIZE) * sizeof(__be32); } -static inline u32 nfsd4_layoutcommit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutcommit_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* locr_newsize */ + 2 /* ns_size */) * sizeof(__be32); } -static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_layoutreturn_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* lrs_stateid */ + @@ -3054,13 +3082,14 @@ static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_ #endif /* CONFIG_NFSD_PNFS */ -static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) +static u32 nfsd4_seek_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + 3) * sizeof(__be32); } -static inline u32 nfsd4_getxattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_getxattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount, rlen; @@ -3070,14 +3099,14 @@ static inline u32 nfsd4_getxattr_rsize(struct svc_rqst *rqstp, return (op_encode_hdr_size + 1 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static inline u32 nfsd4_setxattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_setxattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static inline u32 nfsd4_listxattrs_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_listxattrs_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { u32 maxcount, rlen; @@ -3087,8 +3116,8 @@ static inline u32 nfsd4_listxattrs_rsize(struct svc_rqst *rqstp, return (op_encode_hdr_size + 4 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static inline u32 nfsd4_removexattr_rsize(struct svc_rqst *rqstp, - struct nfsd4_op *op) +static u32 nfsd4_removexattr_rsize(const struct svc_rqst *rqstp, + const struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 2a699e8ca1ba..1e83fda7601a 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -889,7 +889,8 @@ struct nfsd4_operation { u32 op_flags; char *op_name; /* Try to get response size before operation */ - u32 (*op_rsize_bop)(struct svc_rqst *, struct nfsd4_op *); + u32 (*op_rsize_bop)(const struct svc_rqst *rqstp, + const struct nfsd4_op *op); void (*op_get_currentstateid)(struct nfsd4_compound_state *, union nfsd4_op_u *); void (*op_set_currentstateid)(struct nfsd4_compound_state *, From a54822e64d3a0b7af015bb6e4eca5635078493a2 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:30 -0400 Subject: [PATCH 0955/1565] NFSD: Remove unused nfsd4_compoundargs::cachetype field [ Upstream commit 77e378cf2a595d8e39cddf28a31efe6afd9394a0 ] This field was added by commit 1091006c5eb1 ("nfsd: turn on reply cache for NFSv4") but was never put to use. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/xdr4.h | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 1e83fda7601a..624a19ec3ad1 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -724,7 +724,6 @@ struct nfsd4_compoundargs { u32 opcnt; struct nfsd4_op *ops; struct nfsd4_op iops[8]; - int cachetype; }; struct nfsd4_compoundres { From cda3e9b8cd5eafe922b2d253fe75cbc3f646e186 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:23:36 -0400 Subject: [PATCH 0956/1565] NFSD: Pack struct nfsd4_compoundres [ Upstream commit 9f553e61bd36c1048543ac2f6945103dd2f742be ] Remove a couple of 4-byte holes on platforms with 64-bit pointers. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/xdr4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 624a19ec3ad1..8f323d9071f0 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -732,8 +732,8 @@ struct nfsd4_compoundres { struct svc_rqst * rqstp; __be32 *statusp; - u32 taglen; char * tag; + u32 taglen; u32 opcnt; struct nfsd4_compound_state cstate; From a063abefc6a5253d4f2bcdc46f80e560530f536e Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Fri, 23 Sep 2022 00:31:52 +0800 Subject: [PATCH 0957/1565] nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops [ Upstream commit 0cfb0c4228a5c8e2ed2b58f8309b660b187cef02 ] Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/stats.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index a8c5a02a84f0..777e24e5da33 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -32,7 +32,7 @@ struct svc_stat nfsd_svcstats = { .program = &nfsd_program, }; -static int nfsd_proc_show(struct seq_file *seq, void *v) +static int nfsd_show(struct seq_file *seq, void *v) { int i; @@ -72,17 +72,7 @@ static int nfsd_proc_show(struct seq_file *seq, void *v) return 0; } -static int nfsd_proc_open(struct inode *inode, struct file *file) -{ - return single_open(file, nfsd_proc_show, NULL); -} - -static const struct proc_ops nfsd_proc_ops = { - .proc_open = nfsd_proc_open, - .proc_read = seq_read, - .proc_lseek = seq_lseek, - .proc_release = single_release, -}; +DEFINE_PROC_SHOW_ATTRIBUTE(nfsd); int nfsd_percpu_counters_init(struct percpu_counter counters[], int num) { From 615d761a6b99d126d733d13803cd53dd5735676b Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Fri, 23 Sep 2022 00:31:53 +0800 Subject: [PATCH 0958/1565] nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops [ Upstream commit 9beeaab8e05d353d709103cafa1941714b4d5d94 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong [ cel: reduce line length ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 29 +++++------------------------ 1 file changed, 5 insertions(+), 24 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 597a26ad4183..3ed0cfdb0c0b 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -185,17 +185,7 @@ static int export_features_show(struct seq_file *m, void *v) return 0; } -static int export_features_open(struct inode *inode, struct file *file) -{ - return single_open(file, export_features_show, NULL); -} - -static const struct file_operations export_features_operations = { - .open = export_features_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(export_features); #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) static int supported_enctypes_show(struct seq_file *m, void *v) @@ -204,17 +194,7 @@ static int supported_enctypes_show(struct seq_file *m, void *v) return 0; } -static int supported_enctypes_open(struct inode *inode, struct file *file) -{ - return single_open(file, supported_enctypes_show, NULL); -} - -static const struct file_operations supported_enctypes_ops = { - .open = supported_enctypes_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(supported_enctypes); #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ static const struct file_operations pool_stats_operations = { @@ -1365,7 +1345,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) /* Per-export io stats use same ops as exports file */ [NFSD_Export_Stats] = {"export_stats", &exports_nfsd_operations, S_IRUGO}, [NFSD_Export_features] = {"export_features", - &export_features_operations, S_IRUGO}, + &export_features_fops, S_IRUGO}, [NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_FO_UnlockFS] = {"unlock_filesystem", @@ -1381,7 +1361,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) - [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO}, + [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", + &supported_enctypes_fops, S_IRUGO}, #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ #ifdef CONFIG_NFSD_V4 [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR}, From 443e6484259f134c8acad4ae50955ac357cca3ea Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Fri, 23 Sep 2022 00:31:54 +0800 Subject: [PATCH 0959/1565] nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops [ Upstream commit 1d7f6b302b75ff7acb9eb3cab0c631b10cfa7542 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. inode is converted from seq_file->file instead of seq_file->private in client_info_show(). Signed-off-by: ChenXiaoSong Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d2468a408328..fce62a4388a2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2503,7 +2503,7 @@ static const char *cb_state2str(int state) static int client_info_show(struct seq_file *m, void *v) { - struct inode *inode = m->private; + struct inode *inode = file_inode(m->file); struct nfs4_client *clp; u64 clid; @@ -2543,17 +2543,7 @@ static int client_info_show(struct seq_file *m, void *v) return 0; } -static int client_info_open(struct inode *inode, struct file *file) -{ - return single_open(file, client_info_show, inode); -} - -static const struct file_operations client_info_fops = { - .open = client_info_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(client_info); static void *states_start(struct seq_file *s, loff_t *pos) __acquires(&clp->cl_lock) From 21e18dd5eba48855b1800b24af9d3dc32b1b8788 Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Fri, 23 Sep 2022 00:31:55 +0800 Subject: [PATCH 0960/1565] nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops [ Upstream commit 64776611a06322b99386f8dfe3b3ba1aa0347a38 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show(). Signed-off-by: ChenXiaoSong [ cel: reduce line length ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/cache.h | 2 +- fs/nfsd/nfscache.c | 13 +++---------- fs/nfsd/nfsctl.c | 10 +++------- 3 files changed, 7 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h index 65c331f75e9c..f21259ead64b 100644 --- a/fs/nfsd/cache.h +++ b/fs/nfsd/cache.h @@ -84,6 +84,6 @@ int nfsd_reply_cache_init(struct nfsd_net *); void nfsd_reply_cache_shutdown(struct nfsd_net *); int nfsd_cache_lookup(struct svc_rqst *); void nfsd_cache_update(struct svc_rqst *, int, __be32 *); -int nfsd_reply_cache_stats_open(struct inode *, struct file *); +int nfsd_reply_cache_stats_show(struct seq_file *m, void *v); #endif /* NFSCACHE_H */ diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 7da88bdc0d6c..2b5417e06d80 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -603,9 +603,10 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data) * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) +int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) { - struct nfsd_net *nn = m->private; + struct nfsd_net *nn = net_generic(file_inode(m->file)->i_sb->s_fs_info, + nfsd_net_id); seq_printf(m, "max entries: %u\n", nn->max_drc_entries); seq_printf(m, "num entries: %u\n", @@ -625,11 +626,3 @@ static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize); return 0; } - -int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file) -{ - struct nfsd_net *nn = net_generic(file_inode(file)->i_sb->s_fs_info, - nfsd_net_id); - - return single_open(file, nfsd_reply_cache_stats_show, nn); -} diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 3ed0cfdb0c0b..1983f4f2908d 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -204,12 +204,7 @@ static const struct file_operations pool_stats_operations = { .release = nfsd_pool_stats_release, }; -static const struct file_operations reply_cache_stats_operations = { - .open = nfsd_reply_cache_stats_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(nfsd_reply_cache_stats); static const struct file_operations filecache_ops = { .open = nfsd_file_cache_stats_open, @@ -1354,7 +1349,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO}, - [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO}, + [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", + &nfsd_reply_cache_stats_fops, S_IRUGO}, [NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, From b7aea45a67e94bacaa7ce3d76fc629b1380a681e Mon Sep 17 00:00:00 2001 From: ChenXiaoSong Date: Fri, 23 Sep 2022 00:31:56 +0800 Subject: [PATCH 0961/1565] nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops [ Upstream commit 1342f9dd3fc219089deeb2620f6790f19b4129b1 ] Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 7 +------ fs/nfsd/filecache.h | 2 +- fs/nfsd/nfsctl.c | 9 ++------- 3 files changed, 4 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 55478d411e5a..fa8e1546e020 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1211,7 +1211,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) +int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { unsigned long releases = 0, pages_flushed = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; @@ -1258,8 +1258,3 @@ static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; } - -int nfsd_file_cache_stats_open(struct inode *inode, struct file *file) -{ - return single_open(file, nfsd_file_cache_stats_show, NULL); -} diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 8e8c0c47d67d..357832bac736 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -60,5 +60,5 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); -int nfsd_file_cache_stats_open(struct inode *, struct file *); +int nfsd_file_cache_stats_show(struct seq_file *m, void *v); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 1983f4f2908d..6a29bcfc9390 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -206,12 +206,7 @@ static const struct file_operations pool_stats_operations = { DEFINE_SHOW_ATTRIBUTE(nfsd_reply_cache_stats); -static const struct file_operations filecache_ops = { - .open = nfsd_file_cache_stats_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; +DEFINE_SHOW_ATTRIBUTE(nfsd_file_cache_stats); /*----------------------------------------------------------------------------*/ /* @@ -1355,7 +1350,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, - [NFSD_Filecache] = {"filecache", &filecache_ops, S_IRUGO}, + [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_fops, S_IRUGO}, From 60b46564e0b6e9e5fade655df3552f01cd1a6b4f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 22 Sep 2022 13:10:35 -0400 Subject: [PATCH 0962/1565] NFSD: Rename the fields in copy_stateid_t [ Upstream commit 781fde1a2ba2391f31142f46f964cf1148ca1791 ] Code maintenance: The name of the copy_stateid_t::sc_count field collides with the sc_count field in struct nfs4_stid, making the latter difficult to grep for when auditing stateid reference counting. No behavior change expected. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 6 +++--- fs/nfsd/nfs4state.c | 30 +++++++++++++++--------------- fs/nfsd/state.h | 6 +++--- 3 files changed, 21 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 1bb0fb917cf0..e1aa48d496b9 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1818,7 +1818,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!nfs4_init_copy_state(nn, copy)) goto out_err; refcount_set(&async_copy->refcount, 1); - memcpy(©->cp_res.cb_stateid, ©->cp_stateid.stid, + memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, sizeof(copy->cp_res.cb_stateid)); dup_copy_fields(copy, async_copy); async_copy->copy_task = kthread_create(nfsd4_do_async_copy, @@ -1856,7 +1856,7 @@ find_async_copy(struct nfs4_client *clp, stateid_t *stateid) spin_lock(&clp->async_lock); list_for_each_entry(copy, &clp->async_copies, copies) { - if (memcmp(©->cp_stateid.stid, stateid, NFS4_STATEID_SIZE)) + if (memcmp(©->cp_stateid.cs_stid, stateid, NFS4_STATEID_SIZE)) continue; refcount_inc(©->refcount); spin_unlock(&clp->async_lock); @@ -1910,7 +1910,7 @@ nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, cps = nfs4_alloc_init_cpntf_state(nn, stid); if (!cps) goto out; - memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.stid, sizeof(stateid_t)); + memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.cs_stid, sizeof(stateid_t)); memcpy(&cps->cp_p_stateid, &stid->sc_stateid, sizeof(stateid_t)); memcpy(&cps->cp_p_clid, &clp->cl_clientid, sizeof(clientid_t)); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fce62a4388a2..f207c73ae1b5 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -985,19 +985,19 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla * Create a unique stateid_t to represent each COPY. */ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, - unsigned char sc_type) + unsigned char cs_type) { int new_id; - stid->stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; - stid->stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; - stid->sc_type = sc_type; + stid->cs_stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; + stid->cs_stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; + stid->cs_type = cs_type; idr_preload(GFP_KERNEL); spin_lock(&nn->s2s_cp_lock); new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, stid, 0, 0, GFP_NOWAIT); - stid->stid.si_opaque.so_id = new_id; - stid->stid.si_generation = 1; + stid->cs_stid.si_opaque.so_id = new_id; + stid->cs_stid.si_generation = 1; spin_unlock(&nn->s2s_cp_lock); idr_preload_end(); if (new_id < 0) @@ -1019,7 +1019,7 @@ struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn, if (!cps) return NULL; cps->cpntf_time = ktime_get_boottime_seconds(); - refcount_set(&cps->cp_stateid.sc_count, 1); + refcount_set(&cps->cp_stateid.cs_count, 1); if (!nfs4_init_cp_state(nn, &cps->cp_stateid, NFS4_COPYNOTIFY_STID)) goto out_free; spin_lock(&nn->s2s_cp_lock); @@ -1035,11 +1035,11 @@ void nfs4_free_copy_state(struct nfsd4_copy *copy) { struct nfsd_net *nn; - WARN_ON_ONCE(copy->cp_stateid.sc_type != NFS4_COPY_STID); + WARN_ON_ONCE(copy->cp_stateid.cs_type != NFS4_COPY_STID); nn = net_generic(copy->cp_clp->net, nfsd_net_id); spin_lock(&nn->s2s_cp_lock); idr_remove(&nn->s2s_cp_stateids, - copy->cp_stateid.stid.si_opaque.so_id); + copy->cp_stateid.cs_stid.si_opaque.so_id); spin_unlock(&nn->s2s_cp_lock); } @@ -6044,7 +6044,7 @@ nfs4_laundromat(struct nfsd_net *nn) spin_lock(&nn->s2s_cp_lock); idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { cps = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (cps->cp_stateid.sc_type == NFS4_COPYNOTIFY_STID && + if (cps->cp_stateid.cs_type == NFS4_COPYNOTIFY_STID && state_expired(<, cps->cpntf_time)) _free_cpntf_state_locked(nn, cps); } @@ -6384,12 +6384,12 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s, static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps) { - WARN_ON_ONCE(cps->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID); - if (!refcount_dec_and_test(&cps->cp_stateid.sc_count)) + WARN_ON_ONCE(cps->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID); + if (!refcount_dec_and_test(&cps->cp_stateid.cs_count)) return; list_del(&cps->cp_list); idr_remove(&nn->s2s_cp_stateids, - cps->cp_stateid.stid.si_opaque.so_id); + cps->cp_stateid.cs_stid.si_opaque.so_id); kfree(cps); } /* @@ -6411,12 +6411,12 @@ __be32 manage_cpntf_state(struct nfsd_net *nn, stateid_t *st, if (cps_t) { state = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (state->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID) { + if (state->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID) { state = NULL; goto unlock; } if (!clp) - refcount_inc(&state->cp_stateid.sc_count); + refcount_inc(&state->cp_stateid.cs_count); else _free_cpntf_state_locked(nn, state); } diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index 4155be65d806..b3477087a9fc 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -57,11 +57,11 @@ typedef struct { } stateid_t; typedef struct { - stateid_t stid; + stateid_t cs_stid; #define NFS4_COPY_STID 1 #define NFS4_COPYNOTIFY_STID 2 - unsigned char sc_type; - refcount_t sc_count; + unsigned char cs_type; + refcount_t cs_count; } copy_stateid_t; struct nfsd4_callback { From 31b16e6b0b7841bfecb489a08f957d79586b30e8 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:29:55 -0400 Subject: [PATCH 0963/1565] NFSD: Cap rsize_bop result based on send buffer size [ Upstream commit 76ce4dcec0dc08a032db916841ddc4e3998be317 ] Since before the git era, NFSD has conserved the number of pages held by each nfsd thread by combining the RPC receive and send buffers into a single array of pages. This works because there are no cases where an operation needs a large RPC Call message and a large RPC Reply at the same time. Once an RPC Call has been received, svc_process() updates svc_rqst::rq_res to describe the part of rq_pages that can be used for constructing the Reply. This means that the send buffer (rq_res) shrinks when the received RPC record containing the RPC Call is large. Add an NFSv4 helper that computes the size of the send buffer. It replaces svc_max_payload() in spots where svc_max_payload() returns a value that might be larger than the remaining send buffer space. Callers who need to know the transport's actual maximum payload size will continue to use svc_max_payload(). Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 48 +++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index e1aa48d496b9..50fd4ba04a3e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2763,6 +2763,22 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) #define op_encode_channel_attrs_maxsz (6 + 1 + 1) +/* + * The _rsize() helpers are invoked by the NFSv4 COMPOUND decoder, which + * is called before sunrpc sets rq_res.buflen. Thus we have to compute + * the maximum payload size here, based on transport limits and the size + * of the remaining space in the rq_pages array. + */ +static u32 nfsd4_max_payload(const struct svc_rqst *rqstp) +{ + u32 buflen; + + buflen = (rqstp->rq_page_end - rqstp->rq_next_page) * PAGE_SIZE; + buflen -= rqstp->rq_auth_slack; + buflen -= rqstp->rq_res.head[0].iov_len; + return min_t(u32, buflen, svc_max_payload(rqstp)); +} + static u32 nfsd4_only_status_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { @@ -2808,9 +2824,9 @@ static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, u32 ret = 0; if (bmap0 & FATTR4_WORD0_ACL) - return svc_max_payload(rqstp); + return nfsd4_max_payload(rqstp); if (bmap0 & FATTR4_WORD0_FS_LOCATIONS) - return svc_max_payload(rqstp); + return nfsd4_max_payload(rqstp); if (bmap1 & FATTR4_WORD1_OWNER) { ret += IDMAP_NAMESZ + 4; @@ -2870,10 +2886,7 @@ static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.read.rd_length, maxcount); + u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp)); return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32); } @@ -2881,8 +2894,7 @@ static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = svc_max_payload(rqstp); - u32 rlen = min(op->u.read.rd_length, maxcount); + u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp)); /* * If we detect that the file changed during hole encoding, then we * recover by encoding the remaining reply as data. This means we need @@ -2896,10 +2908,7 @@ static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_readdir_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.readdir.rd_maxcount, maxcount); + u32 rlen = min(op->u.readdir.rd_maxcount, nfsd4_max_payload(rqstp)); return (op_encode_hdr_size + op_encode_verifier_maxsz + XDR_QUADLEN(rlen)) * sizeof(__be32); @@ -3038,10 +3047,7 @@ static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount = 0, rlen = 0; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.getdeviceinfo.gd_maxcount, maxcount); + u32 rlen = min(op->u.getdeviceinfo.gd_maxcount, nfsd4_max_payload(rqstp)); return (op_encode_hdr_size + 1 /* gd_layout_type*/ + @@ -3091,10 +3097,7 @@ static u32 nfsd4_seek_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_getxattr_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount, rlen; - - maxcount = svc_max_payload(rqstp); - rlen = min_t(u32, XATTR_SIZE_MAX, maxcount); + u32 rlen = min_t(u32, XATTR_SIZE_MAX, nfsd4_max_payload(rqstp)); return (op_encode_hdr_size + 1 + XDR_QUADLEN(rlen)) * sizeof(__be32); } @@ -3108,10 +3111,7 @@ static u32 nfsd4_setxattr_rsize(const struct svc_rqst *rqstp, static u32 nfsd4_listxattrs_rsize(const struct svc_rqst *rqstp, const struct nfsd4_op *op) { - u32 maxcount, rlen; - - maxcount = svc_max_payload(rqstp); - rlen = min(op->u.listxattrs.lsxa_maxcount, maxcount); + u32 rlen = min(op->u.listxattrs.lsxa_maxcount, nfsd4_max_payload(rqstp)); return (op_encode_hdr_size + 4 + XDR_QUADLEN(rlen)) * sizeof(__be32); } From 89b63627049083f63c1113910e422775c22b213c Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 26 Sep 2022 12:38:44 -0400 Subject: [PATCH 0964/1565] nfsd: only fill out return pointer on success in nfsd4_lookup_stateid [ Upstream commit 4d01416ab41540bb13ec4a39ac4e6c4aa5934bc9 ] In the case of a revoked delegation, we still fill out the pointer even when returning an error, which is bad form. Only overwrite the pointer on success. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f207c73ae1b5..1dc3823f3d12 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6289,6 +6289,7 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, struct nfs4_stid **s, struct nfsd_net *nn) { __be32 status; + struct nfs4_stid *stid; bool return_revoked = false; /* @@ -6311,15 +6312,16 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, } if (status) return status; - *s = find_stateid_by_type(cstate->clp, stateid, typemask); - if (!*s) + stid = find_stateid_by_type(cstate->clp, stateid, typemask); + if (!stid) return nfserr_bad_stateid; - if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { - nfs4_put_stid(*s); + if ((stid->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { + nfs4_put_stid(stid); if (cstate->minorversion) return nfserr_deleg_revoked; return nfserr_bad_stateid; } + *s = stid; return nfs_ok; } From d7f2774d8c59df6497de0751458f76319c75970d Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 26 Sep 2022 12:38:45 -0400 Subject: [PATCH 0965/1565] nfsd: fix comments about spinlock handling with delegations [ Upstream commit 25fbe1fca14142beae6c882f7906510363d42bff ] Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 1dc3823f3d12..e9fc5a357fc4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4870,14 +4870,14 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp) * We're assuming the state code never drops its reference * without first removing the lease. Since we're in this lease * callback (and since the lease code is serialized by the - * i_lock) we know the server hasn't removed the lease yet, and + * flc_lock) we know the server hasn't removed the lease yet, and * we know it's safe to take a reference. */ refcount_inc(&dp->dl_stid.sc_count); nfsd4_run_cb(&dp->dl_recall); } -/* Called from break_lease() with i_lock held. */ +/* Called from break_lease() with flc_lock held. */ static bool nfsd_break_deleg_cb(struct file_lock *fl) { From 345e3bb5e82a922ffc81cec9746290a8ef0930a9 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 26 Sep 2022 14:41:01 -0400 Subject: [PATCH 0966/1565] nfsd: make nfsd4_run_cb a bool return function [ Upstream commit b95239ca4954a0d48b19c09ce7e8f31b453b4216 ] queue_work can return false and not queue anything, if the work is already queued. If that happens in the case of a CB_RECALL, we'll have taken an extra reference to the stid that will never be put. Ensure we throw a warning in that case. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 14 ++++++++++++-- fs/nfsd/nfs4state.c | 5 ++--- fs/nfsd/state.h | 2 +- 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index face8908a40b..39989c14c8a1 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -1373,11 +1373,21 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, cb->cb_holds_slot = false; } -void nfsd4_run_cb(struct nfsd4_callback *cb) +/** + * nfsd4_run_cb - queue up a callback job to run + * @cb: callback to queue + * + * Kick off a callback to do its thing. Returns false if it was already + * on a queue, true otherwise. + */ +bool nfsd4_run_cb(struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; + bool queued; nfsd41_cb_inflight_begin(clp); - if (!nfsd4_queue_cb(cb)) + queued = nfsd4_queue_cb(cb); + if (!queued) nfsd41_cb_inflight_end(clp); + return queued; } diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e9fc5a357fc4..fc6188d70796 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4874,14 +4874,13 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp) * we know it's safe to take a reference. */ refcount_inc(&dp->dl_stid.sc_count); - nfsd4_run_cb(&dp->dl_recall); + WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall)); } /* Called from break_lease() with flc_lock held. */ static bool nfsd_break_deleg_cb(struct file_lock *fl) { - bool ret = false; struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; struct nfs4_client *clp = dp->dl_stid.sc_client; @@ -4907,7 +4906,7 @@ nfsd_break_deleg_cb(struct file_lock *fl) fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); spin_unlock(&fp->fi_lock); - return ret; + return false; } /** diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index b3477087a9fc..e2daef3cc003 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -692,7 +692,7 @@ extern void nfsd4_probe_callback_sync(struct nfs4_client *clp); extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *); extern void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, const struct nfsd4_callback_ops *ops, enum nfsd4_cb_op op); -extern void nfsd4_run_cb(struct nfsd4_callback *cb); +extern bool nfsd4_run_cb(struct nfsd4_callback *cb); extern int nfsd4_create_callback_queue(void); extern void nfsd4_destroy_callback_queue(void); extern void nfsd4_shutdown_callback(struct nfs4_client *); From 241685bab2774f54d347525889f06e265a0a133c Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 26 Sep 2022 14:41:02 -0400 Subject: [PATCH 0967/1565] nfsd: extra checks when freeing delegation stateids [ Upstream commit 895ddf5ed4c54ea9e3533606d7a8b4e4f27f95ef ] We've had some reports of problems in the refcounting for delegation stateids that we've yet to track down. Add some extra checks to ensure that we've removed the object from various lists before freeing it. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2127067 Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index fc6188d70796..948ef1717815 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1071,7 +1071,12 @@ static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp) static void nfs4_free_deleg(struct nfs4_stid *stid) { - WARN_ON(!list_empty(&stid->sc_cp_list)); + struct nfs4_delegation *dp = delegstateid(stid); + + WARN_ON_ONCE(!list_empty(&stid->sc_cp_list)); + WARN_ON_ONCE(!list_empty(&dp->dl_perfile)); + WARN_ON_ONCE(!list_empty(&dp->dl_perclnt)); + WARN_ON_ONCE(!list_empty(&dp->dl_recall_lru)); kmem_cache_free(deleg_slab, stid); atomic_long_dec(&num_delegations); } From a2d440dce60311cc33d3a1b85933287155179bd6 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 4 Aug 2022 12:57:38 -0400 Subject: [PATCH 0968/1565] fs/notify: constify path [ Upstream commit d5bf88895f24686641c39420ee6df716dc1d95d8 ] Reviewed-by: Matthew Bobrowski Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.c | 2 +- fs/notify/fanotify/fanotify.h | 2 +- fs/notify/fanotify/fanotify_user.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index cd7d09a569ff..a2a15bc4df28 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -18,7 +18,7 @@ #include "fanotify.h" -static bool fanotify_path_equal(struct path *p1, struct path *p2) +static bool fanotify_path_equal(const struct path *p1, const struct path *p2) { return p1->mnt == p2->mnt && p1->dentry == p2->dentry; } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 1d9f11255c64..bf6d4d38afa0 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -458,7 +458,7 @@ static inline bool fanotify_event_has_path(struct fanotify_event *event) event->type == FANOTIFY_EVENT_TYPE_PATH_PERM; } -static inline struct path *fanotify_event_path(struct fanotify_event *event) +static inline const struct path *fanotify_event_path(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_PATH) return &FANOTIFY_PE(event)->path; diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 72dd446606a7..5302313f28be 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -237,7 +237,7 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group, return event; } -static int create_fd(struct fsnotify_group *group, struct path *path, +static int create_fd(struct fsnotify_group *group, const struct path *path, struct file **file) { int client_fd; @@ -607,7 +607,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, char __user *buf, size_t count) { struct fanotify_event_metadata metadata; - struct path *path = fanotify_event_path(event); + const struct path *path = fanotify_event_path(event); struct fanotify_info *info = fanotify_event_info(event); unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; @@ -1541,7 +1541,7 @@ static int fanotify_test_fid(struct dentry *dentry) } static int fanotify_events_supported(struct fsnotify_group *group, - struct path *path, __u64 mask, + const struct path *path, __u64 mask, unsigned int flags) { unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; From 229e73a0f4071ba699cd55836e95e7953b4c45bb Mon Sep 17 00:00:00 2001 From: Gaosheng Cui Date: Fri, 9 Sep 2022 11:38:28 +0800 Subject: [PATCH 0969/1565] fsnotify: remove unused declaration [ Upstream commit f847c74d6e89f10926db58649a05b99237258691 ] fsnotify_alloc_event_holder() and fsnotify_destroy_event_holder() has been removed since commit 7053aee26a35 ("fsnotify: do not share events between notification groups"), so remove it. Reviewed-by: Ritesh Harjani (IBM) Signed-off-by: Gaosheng Cui Signed-off-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fsnotify.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index 87d8a50ee803..fde74eb333cc 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -76,10 +76,6 @@ static inline void fsnotify_clear_marks_by_sb(struct super_block *sb) */ extern void __fsnotify_update_child_dentry_flags(struct inode *inode); -/* allocate and destroy and event holder to attach events to notification/access queues */ -extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void); -extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder); - extern struct kmem_cache *fsnotify_mark_connector_cachep; #endif /* __FS_NOTIFY_FSNOTIFY_H_ */ From 683fb922e7b59dcd56a73316eacddd797f6965a9 Mon Sep 17 00:00:00 2001 From: Gaosheng Cui Date: Mon, 26 Sep 2022 10:30:18 +0800 Subject: [PATCH 0970/1565] fanotify: Remove obsoleted fanotify_event_has_path() [ Upstream commit 7a80bf902d2bc722b4477442ee772e8574603185 ] All uses of fanotify_event_has_path() have been removed since commit 9c61f3b560f5 ("fanotify: break up fanotify_alloc_event()"), now it is useless, so remove it. Link: https://lore.kernel.org/r/20220926023018.1505270-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui Signed-off-by: Jan Kara [ cel: resolved merge conflict ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/notify/fanotify/fanotify.h | 6 ------ 1 file changed, 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index bf6d4d38afa0..57f51a9a3015 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -452,12 +452,6 @@ static inline bool fanotify_is_error_event(u32 mask) return mask & FAN_FS_ERROR; } -static inline bool fanotify_event_has_path(struct fanotify_event *event) -{ - return event->type == FANOTIFY_EVENT_TYPE_PATH || - event->type == FANOTIFY_EVENT_TYPE_PATH_PERM; -} - static inline const struct path *fanotify_event_path(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_PATH) From 2db3e73f9afd256f4d45640324100f7cd61ea375 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 30 Sep 2022 16:56:02 -0400 Subject: [PATCH 0971/1565] nfsd: fix nfsd_file_unhash_and_dispose [ Upstream commit 8d0d254b15cc5b7d46d85fb7ab8ecede9575e672 ] nfsd_file_unhash_and_dispose() is called for two reasons: We're either shutting down and purging the filecache, or we've gotten a notification about a file delete, so we want to go ahead and unhash it so that it'll get cleaned up when we close. We're either walking the hashtable or doing a lookup in it and we don't take a reference in either case. What we want to do in both cases is to try and unhash the object and put it on the dispose list if that was successful. If it's no longer hashed, then we don't want to touch it, with the assumption being that something else is already cleaning up the sentinel reference. Instead of trying to selectively decrement the refcount in this function, just unhash it, and if that was successful, move it to the dispose list. Then, the disposal routine will just clean that up as usual. Also, just make this a void function, drop the WARN_ON_ONCE, and the comments about deadlocking since the nature of the purported deadlock is no longer clear. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 36 +++++++----------------------------- 1 file changed, 7 insertions(+), 29 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fa8e1546e020..a0d93e797cdc 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -404,22 +404,15 @@ nfsd_file_unhash(struct nfsd_file *nf) return false; } -/* - * Return true if the file was unhashed. - */ -static bool +static void nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) { trace_nfsd_file_unhash_and_dispose(nf); - if (!nfsd_file_unhash(nf)) - return false; - /* keep final reference for nfsd_file_lru_dispose */ - if (refcount_dec_not_one(&nf->nf_ref)) - return true; - - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, dispose); - return true; + if (nfsd_file_unhash(nf)) { + /* caller must call nfsd_file_dispose_list() later */ + nfsd_file_lru_remove(nf); + list_add(&nf->nf_lru, dispose); + } } static void @@ -561,8 +554,6 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose) * @lock: LRU list lock (unused) * @arg: dispose list * - * Note this can deadlock with nfsd_file_cache_purge. - * * Return values: * %LRU_REMOVED: @item was removed from the LRU * %LRU_ROTATE: @item is to be moved to the LRU tail @@ -747,8 +738,6 @@ nfsd_file_close_inode(struct inode *inode) * * Walk the LRU list and close any entries that have not been used since * the last scan. - * - * Note this can deadlock with nfsd_file_cache_purge. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -890,16 +879,12 @@ nfsd_file_cache_init(void) goto out; } -/* - * Note this can deadlock with nfsd_file_lru_cb. - */ static void __nfsd_file_cache_purge(struct net *net) { struct rhashtable_iter iter; struct nfsd_file *nf; LIST_HEAD(dispose); - bool del; rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); do { @@ -909,14 +894,7 @@ __nfsd_file_cache_purge(struct net *net) while (!IS_ERR_OR_NULL(nf)) { if (net && nf->nf_net != net) continue; - del = nfsd_file_unhash_and_dispose(nf, &dispose); - - /* - * Deadlock detected! Something marked this entry as - * unhased, but hasn't removed it from the hash list. - */ - WARN_ON_ONCE(!del); - + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); } From 1bb33492578c18a837632d64de0e19b7e77c6afe Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 4 Oct 2022 15:41:10 -0400 Subject: [PATCH 0972/1565] nfsd: rework hashtable handling in nfsd_do_file_acquire [ Upstream commit 243a5263014a30436c93ed3f1f864c1da845455e ] nfsd_file is RCU-freed, so we need to hold the rcu_read_lock long enough to get a reference after finding it in the hash. Take the rcu_read_lock() and call rhashtable_lookup directly. Switch to using rhashtable_lookup_insert_key as well, and use the usual retry mechanism if we hit an -EEXIST. Rename the "retry" bool to open_retry, and eliminiate the insert_err goto target. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 52 +++++++++++++++++++-------------------------- 1 file changed, 22 insertions(+), 30 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index a0d93e797cdc..0b19eb015c6c 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1041,9 +1041,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, .need = may_flags & NFSD_FILE_MAY_MASK, .net = SVC_NET(rqstp), }; - struct nfsd_file *nf, *new; - bool retry = true; + bool open_retry = true; + struct nfsd_file *nf; __be32 status; + int ret; status = fh_verify(rqstp, fhp, S_IFREG, may_flags|NFSD_MAY_OWNER_OVERRIDE); @@ -1053,35 +1054,33 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, key.cred = get_current_cred(); retry: - /* Avoid allocation if the item is already in cache */ - nf = rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); + rcu_read_lock(); + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, + nfsd_file_rhash_params); if (nf) nf = nfsd_file_get(nf); + rcu_read_unlock(); if (nf) goto wait_for_construction; - new = nfsd_file_alloc(&key, may_flags); - if (!new) { + nf = nfsd_file_alloc(&key, may_flags); + if (!nf) { status = nfserr_jukebox; goto out_status; } - nf = rhashtable_lookup_get_insert_key(&nfsd_file_rhash_tbl, - &key, &new->nf_rhash, - nfsd_file_rhash_params); - if (!nf) { - nf = new; + ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, + &key, &nf->nf_rhash, + nfsd_file_rhash_params); + if (likely(ret == 0)) goto open_file; - } - if (IS_ERR(nf)) - goto insert_err; - nf = nfsd_file_get(nf); - if (nf == NULL) { - nf = new; - goto open_file; - } - nfsd_file_slab_free(&new->nf_rcu); + + nfsd_file_slab_free(&nf->nf_rcu); + if (ret == -EEXIST) + goto retry; + trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); + status = nfserr_jukebox; + goto out_status; wait_for_construction: wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE); @@ -1089,11 +1088,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, /* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); - if (!retry) { + if (!open_retry) { status = nfserr_jukebox; goto out; } - retry = false; + open_retry = false; nfsd_file_put_noref(nf); goto retry; } @@ -1141,13 +1140,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; - -insert_err: - nfsd_file_slab_free(&new->nf_rcu); - trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, PTR_ERR(nf)); - nf = NULL; - status = nfserr_jukebox; - goto out_status; } /** From 5428383c6fb3184c8be00e8b874678ae89129f23 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 10 Oct 2022 14:59:02 +0900 Subject: [PATCH 0973/1565] NFSD: unregister shrinker when nfsd_init_net() fails [ Upstream commit bd86c69dae65de30f6d47249418ba7889809e31a ] syzbot is reporting UAF read at register_shrinker_prepared() [1], for commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") missed that nfsd4_leases_net_shutdown() from nfsd_exit_net() is called only when nfsd_init_net() succeeded. If nfsd_init_net() fails due to nfsd_reply_cache_init() failure, register_shrinker() from nfsd4_init_leases_net() has to be undone before nfsd_init_net() returns. Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1] Reported-by: syzbot Signed-off-by: Tetsuo Handa Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 6a29bcfc9390..dc74a947a440 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1458,12 +1458,14 @@ static __net_init int nfsd_init_net(struct net *net) goto out_drc_error; retval = nfsd_reply_cache_init(nn); if (retval) - goto out_drc_error; + goto out_cache_error; get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); return 0; +out_cache_error: + nfsd4_leases_net_shutdown(nn); out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error: From 31268eb4572b155f69c527cb3274a0721dc1482e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 31 Oct 2022 11:49:21 -0400 Subject: [PATCH 0974/1565] nfsd: fix net-namespace logic in __nfsd_file_cache_purge [ Upstream commit d3aefd2b29ff5ffdeb5c06a7d3191a027a18cdb8 ] If the namespace doesn't match the one in "net", then we'll continue, but that doesn't cause another rhashtable_walk_next call, so it will loop infinitely. Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable") Reported-by: Petr Vorel Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/ Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 0b19eb015c6c..024adcbe67e9 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -892,9 +892,8 @@ __nfsd_file_cache_purge(struct net *net) nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (net && nf->nf_net != net) - continue; - nfsd_file_unhash_and_dispose(nf, &dispose); + if (!net || nf->nf_net == net) + nfsd_file_unhash_and_dispose(nf, &dispose); nf = rhashtable_walk_next(&iter); } From 551f17db6508b45eb3d984bfcec61b1c3dba4806 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Sat, 5 Nov 2022 09:49:26 -0400 Subject: [PATCH 0975/1565] nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint [ Upstream commit bdd6b5624c62d0acd350d07564f1c82fe649235f ] When we fail to insert into the hashtable with a non-retryable error, we'll free the object and then goto out_status. If the tracepoint is enabled, it'll end up accessing the freed object when it tries to grab the fields out of it. Set nf to NULL after freeing it to avoid the issue. Fixes: 243a5263014a ("nfsd: rework hashtable handling in nfsd_do_file_acquire") Reported-by: kernel test robot Reported-by: Dan Carpenter Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 024adcbe67e9..dceb522f5cee 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1075,6 +1075,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto open_file; nfsd_file_slab_free(&nf->nf_rcu); + nf = NULL; if (ret == -EEXIST) goto retry; trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); From 7d867c6c30e1c5abd7ef01418108af9911163a67 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 8 Nov 2022 11:23:11 -0500 Subject: [PATCH 0976/1565] nfsd: put the export reference in nfsd4_verify_deleg_dentry [ Upstream commit 50256e4793a5e5ab77703c82a47344ad2e774a59 ] nfsd_lookup_dentry returns an export reference in addition to the dentry ref. Ensure that we put it too. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2138866 Fixes: 876c553cb410 ("NFSD: verify the opened dentry after setting a delegation") Reported-by: Yongcheng Yang Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 948ef1717815..10915c72c781 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5397,6 +5397,7 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, if (err) return -EAGAIN; + exp_put(exp); dput(child); if (child != file_dentry(fp->fi_deleg_file->nf_file)) return -EAGAIN; From 7ea635fc47af797fea5250fcba22acc4c159e4de Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 23 Nov 2022 14:14:32 -0500 Subject: [PATCH 0977/1565] NFSD: Fix reads with a non-zero offset that don't end on a page boundary MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ac8db824ead0de2e9111337c401409d010fba2f0 ] This was found when virtual machines with nfs-mounted qcow2 disks failed to boot properly. Reported-by: Anders Blomdell Suggested-by: Al Viro Link: https://bugzilla.redhat.com/show_bug.cgi?id=2142132 Fixes: bfbfb6182ad1 ("nfsd_splice_actor(): handle compound pages") [ cel: "‘for’ loop initial declarations are only allowed in C99 or C11 mode" ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index e29034b1e612..4ff626c912cc 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -888,11 +888,11 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct svc_rqst *rqstp = sd->u.data; struct page *page = buf->page; // may be a compound one unsigned offset = buf->offset; - int i; + struct page *last_page; - page += offset / PAGE_SIZE; - for (i = sd->len; i > 0; i -= PAGE_SIZE) - svc_rqst_replace_page(rqstp, page++); + last_page = page + (offset + sd->len - 1) / PAGE_SIZE; + for (page += offset / PAGE_SIZE; page <= last_page; page++) + svc_rqst_replace_page(rqstp, page); if (rqstp->rq_res.page_len == 0) // first call rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len; From 70ffaa7896d9002687eaba43caf437737b424704 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 16 Nov 2022 09:02:30 -0500 Subject: [PATCH 0978/1565] filelock: add a new locks_inode_context accessor function [ Upstream commit 401a8b8fd5acd51582b15238d72a8d0edd580e9f ] There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/locks.c | 24 ++++++++++++------------ include/linux/fs.h | 14 ++++++++++++++ 2 files changed, 26 insertions(+), 12 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 13a3ba97b73d..b0753c8871fb 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -251,7 +251,7 @@ locks_get_lock_context(struct inode *inode, int type) struct file_lock_context *ctx; /* paired with cmpxchg() below */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (likely(ctx) || type == F_UNLCK) goto out; @@ -270,7 +270,7 @@ locks_get_lock_context(struct inode *inode, int type) */ if (cmpxchg(&inode->i_flctx, NULL, ctx)) { kmem_cache_free(flctx_cache, ctx); - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); } out: trace_locks_get_lock_context(inode, type, ctx); @@ -323,7 +323,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list, void locks_free_lock_context(struct inode *inode) { - struct file_lock_context *ctx = inode->i_flctx; + struct file_lock_context *ctx = locks_inode_context(inode); if (unlikely(ctx)) { locks_check_ctx_lists(inode); @@ -985,7 +985,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl) void *owner; void (*func)(void); - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx || list_empty_careful(&ctx->flc_posix)) { fl->fl_type = F_UNLCK; return; @@ -1674,7 +1674,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) new_fl->fl_flags = type; /* typically we will check that ctx is non-NULL before calling */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) { WARN_ON_ONCE(1); goto free_lock; @@ -1779,7 +1779,7 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time) struct file_lock_context *ctx; struct file_lock *fl; - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (ctx && !list_empty_careful(&ctx->flc_lease)) { spin_lock(&ctx->flc_lock); fl = list_first_entry_or_null(&ctx->flc_lease, @@ -1825,7 +1825,7 @@ int fcntl_getlease(struct file *filp) int type = F_UNLCK; LIST_HEAD(dispose); - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (ctx && !list_empty_careful(&ctx->flc_lease)) { percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); @@ -2014,7 +2014,7 @@ static int generic_delete_lease(struct file *filp, void *owner) struct file_lock_context *ctx; LIST_HEAD(dispose); - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) { trace_generic_delete_lease(inode, NULL); return error; @@ -2765,7 +2765,7 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner) * posix_lock_file(). Another process could be setting a lock on this * file at the same time, but we wouldn't remove that lock anyway. */ - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx || list_empty(&ctx->flc_posix)) return; @@ -2838,7 +2838,7 @@ void locks_remove_file(struct file *filp) { struct file_lock_context *ctx; - ctx = smp_load_acquire(&locks_inode(filp)->i_flctx); + ctx = locks_inode_context(locks_inode(filp)); if (!ctx) return; @@ -2885,7 +2885,7 @@ bool vfs_inode_has_locks(struct inode *inode) struct file_lock_context *ctx; bool ret; - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) return false; @@ -3030,7 +3030,7 @@ void show_fd_locks(struct seq_file *f, struct file_lock_context *ctx; int id = 0; - ctx = smp_load_acquire(&inode->i_flctx); + ctx = locks_inode_context(inode); if (!ctx) return; diff --git a/include/linux/fs.h b/include/linux/fs.h index 7e7098bf2c57..6a26ef54ac25 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1168,6 +1168,13 @@ extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); extern bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner); + +static inline struct file_lock_context * +locks_inode_context(const struct inode *inode) +{ + return smp_load_acquire(&inode->i_flctx); +} + #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1313,6 +1320,13 @@ static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, { return false; } + +static inline struct file_lock_context * +locks_inode_context(const struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_FILE_LOCKING */ static inline struct inode *file_inode(const struct file *f) From 32c59062f86837668cfc0feee968b083fdb2afb6 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 16 Nov 2022 09:19:43 -0500 Subject: [PATCH 0979/1565] lockd: use locks_inode_context helper [ Upstream commit 98b41ffe0afdfeaa1439a5d6bd2db4a94277e31b ] lockd currently doesn't access i_flctx safely. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Cc: Trond Myklebust Cc: Anna Schumaker Cc: Chuck Lever Reviewed-by: Christoph Hellwig Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index e1c4617de771..720684345817 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -207,7 +207,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = inode->i_flctx; + struct file_lock_context *flctx = locks_inode_context(inode); struct nlm_host *lockhost; if (!flctx || list_empty_careful(&flctx->flc_posix)) @@ -262,7 +262,7 @@ nlm_file_inuse(struct nlm_file *file) { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = inode->i_flctx; + struct file_lock_context *flctx = locks_inode_context(inode); if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares) return 1; From 10727ce312c6a23614afe7ae24978aeb00d53e29 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 16 Nov 2022 09:36:07 -0500 Subject: [PATCH 0980/1565] nfsd: use locks_inode_context helper [ Upstream commit 77c67530e1f95ac25c7075635f32f04367380894 ] nfsd currently doesn't access i_flctx safely everywhere. This requires a smp_load_acquire, as the pointer is set via cmpxchg (a release operation). Acked-by: Chuck Lever Reviewed-by: Christoph Hellwig Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 10915c72c781..2d09977fee83 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4773,7 +4773,7 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) static bool nfsd4_deleg_present(const struct inode *inode) { - struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx); + struct file_lock_context *ctx = locks_inode_context(inode); return ctx && !list_empty_careful(&ctx->flc_lease); } @@ -5912,7 +5912,7 @@ nfs4_lockowner_has_blockers(struct nfs4_lockowner *lo) list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { nf = stp->st_stid.sc_file; - ctx = nf->fi_inode->i_flctx; + ctx = locks_inode_context(nf->fi_inode); if (!ctx) continue; if (locks_owner_has_blockers(ctx, lo)) @@ -7740,7 +7740,7 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) } inode = locks_inode(nf->nf_file); - flctx = inode->i_flctx; + flctx = locks_inode_context(inode); if (flctx && !list_empty_careful(&flctx->flc_posix)) { spin_lock(&flctx->flc_lock); From 48a237cb5e52106e111a3afbe61d2e7bb5c1f49f Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Tue, 13 Sep 2022 14:01:51 -0400 Subject: [PATCH 0981/1565] NFSD: Simplify READ_PLUS [ Upstream commit eeadcb75794516839078c28b3730132aeb700ce6 ] Chuck had suggested reverting READ_PLUS so it returns a single DATA segment covering the requested read range. This prepares the server for a future "sparse read" function so support can easily be added without needing to rip out the old READ_PLUS code at the same time. Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 139 +++++++++++----------------------------------- 1 file changed, 32 insertions(+), 107 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index fc587381cd08..27b38ce6f8c6 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4775,79 +4775,37 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, - struct nfsd4_read *read, - unsigned long *maxcount, u32 *eof, - loff_t *pos) + struct nfsd4_read *read) { - struct xdr_stream *xdr = resp->xdr; + bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); struct file *file = read->rd_nf->nf_file; - int starting_len = xdr->buf->len; - loff_t hole_pos; - __be32 nfserr; - __be32 *p, tmp; - __be64 tmp64; - - hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE); - if (hole_pos > read->rd_offset) - *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset); - *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len)); + struct xdr_stream *xdr = resp->xdr; + unsigned long maxcount; + __be32 nfserr, *p; /* Content type, offset, byte count */ p = xdr_reserve_space(xdr, 4 + 8 + 4); if (!p) - return nfserr_resource; + return nfserr_io; + if (resp->xdr->buf->page_len && splice_ok) { + WARN_ON_ONCE(splice_ok); + return nfserr_serverfault; + } - read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount); - if (read->rd_vlen < 0) - return nfserr_resource; + maxcount = min_t(unsigned long, read->rd_length, + (xdr->buf->buflen - xdr->buf->len)); - nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, - resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof); + if (file->f_op->splice_read && splice_ok) + nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); + else + nfserr = nfsd4_encode_readv(resp, read, file, maxcount); if (nfserr) return nfserr; - xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount)); - tmp = htonl(NFS4_CONTENT_DATA); - write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); - tmp64 = cpu_to_be64(read->rd_offset); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp64, 8); - tmp = htonl(*maxcount); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp, 4); - - tmp = xdr_zero; - write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp, - xdr_pad_size(*maxcount)); - return nfs_ok; -} - -static __be32 -nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, - struct nfsd4_read *read, - unsigned long *maxcount, u32 *eof) -{ - struct file *file = read->rd_nf->nf_file; - loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA); - loff_t f_size = i_size_read(file_inode(file)); - unsigned long count; - __be32 *p; - - if (data_pos == -ENXIO) - data_pos = f_size; - else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); - count = data_pos - read->rd_offset; - - /* Content type, offset, byte count */ - p = xdr_reserve_space(resp->xdr, 4 + 8 + 8); - if (!p) - return nfserr_resource; - - *p++ = htonl(NFS4_CONTENT_HOLE); + *p++ = cpu_to_be32(NFS4_CONTENT_DATA); p = xdr_encode_hyper(p, read->rd_offset); - p = xdr_encode_hyper(p, count); + *p = cpu_to_be32(read->rd_length); - *eof = (read->rd_offset + count) >= f_size; - *maxcount = min_t(unsigned long, count, *maxcount); return nfs_ok; } @@ -4855,69 +4813,36 @@ static __be32 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_read *read) { - unsigned long maxcount, count; + struct file *file = read->rd_nf->nf_file; struct xdr_stream *xdr = resp->xdr; - struct file *file; int starting_len = xdr->buf->len; - int last_segment = xdr->buf->len; - int segments = 0; - __be32 *p, tmp; - bool is_data; - loff_t pos; - u32 eof; + u32 segments = 0; + __be32 *p; if (nfserr) return nfserr; - file = read->rd_nf->nf_file; /* eof flag, segment count */ p = xdr_reserve_space(xdr, 4 + 4); if (!p) - return nfserr_resource; + return nfserr_io; xdr_commit_encode(xdr); - maxcount = min_t(unsigned long, read->rd_length, - (xdr->buf->buflen - xdr->buf->len)); - count = maxcount; - - eof = read->rd_offset >= i_size_read(file_inode(file)); - if (eof) + read->rd_eof = read->rd_offset >= i_size_read(file_inode(file)); + if (read->rd_eof) goto out; - pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE); - is_data = pos > read->rd_offset; - - while (count > 0 && !eof) { - maxcount = count; - if (is_data) - nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof, - segments == 0 ? &pos : NULL); - else - nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof); - if (nfserr) - goto out; - count -= maxcount; - read->rd_offset += maxcount; - is_data = !is_data; - last_segment = xdr->buf->len; - segments++; + nfserr = nfsd4_encode_read_plus_data(resp, read); + if (nfserr) { + xdr_truncate_encode(xdr, starting_len); + return nfserr; } + segments++; + out: - if (nfserr && segments == 0) - xdr_truncate_encode(xdr, starting_len); - else { - if (nfserr) { - xdr_truncate_encode(xdr, last_segment); - nfserr = nfs_ok; - eof = 0; - } - tmp = htonl(eof); - write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); - tmp = htonl(segments); - write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); - } - + p = xdr_encode_bool(p, read->rd_eof); + *p = cpu_to_be32(segments); return nfserr; } From 1dd04600f62953390e6f35827e49b0662ecd2e65 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Mon, 10 Oct 2022 21:24:23 +0100 Subject: [PATCH 0982/1565] NFSD: Remove redundant assignment to variable host_err [ Upstream commit 69eed23baf877bbb1f14d7f4df54f89807c9ee2a ] Variable host_err is assigned a value that is never read, it is being re-assigned a value in every different execution path in the following switch statement. The assignment is redundant and can be removed. Cleans up clang-scan warning: warning: Value stored to 'host_err' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 4ff626c912cc..aae81c5cecb9 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1322,7 +1322,6 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, iap->ia_mode &= ~current_umask(); err = 0; - host_err = 0; switch (type) { case S_IFREG: host_err = vfs_create(dirp, dchild, iap->ia_mode, true); From a20b0abab966a189a79aba6ebf41f59024a3224d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 16 Oct 2022 11:47:02 -0400 Subject: [PATCH 0983/1565] NFSD: Finish converting the NFSv2 GETACL result encoder [ Upstream commit ea5021e911d3479346a75ac9b7d9dcd751b0fb99 ] The xdr_stream conversion inadvertently left some code that set the page_len of the send buffer. The XDR stream encoders should handle this automatically now. This oversight adds garbage past the end of the Reply message. Clients typically ignore the garbage, but NFSD does not need to send it, as it leaks stale memory contents onto the wire. Fixes: f8cba47344f7 ("NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs2acl.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index f8179fc8c9bd..9adf672dedbd 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -244,7 +244,6 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; - int w; if (!svcxdr_encode_stat(xdr, resp->status)) return false; @@ -258,15 +257,6 @@ nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return false; - rqstp->rq_res.page_len = w = nfsacl_size( - (resp->mask & NFS_ACL) ? resp->acl_access : NULL, - (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); - while (w > 0) { - if (!*(rqstp->rq_next_page++)) - return true; - w -= PAGE_SIZE; - } - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, resp->mask & NFS_ACL, 0)) return false; From 81714ef8e3ef8052df358fbccc3fe5125bf0ac67 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sun, 16 Oct 2022 11:47:08 -0400 Subject: [PATCH 0984/1565] NFSD: Finish converting the NFSv3 GETACL result encoder [ Upstream commit 841fd0a3cb490eae5dfd262eccb8c8b11d57f8b8 ] For some reason, the NFSv2 GETACL result encoder was fully converted to use the new nfs_stream_encode_acl(), but the NFSv3 equivalent was not similarly converted. Fixes: 20798dfe249a ("NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3acl.c | 30 ++++++------------------------ 1 file changed, 6 insertions(+), 24 deletions(-) diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 2c451fcd5a3c..161f831b3a1b 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -169,11 +169,7 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) { struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; - struct kvec *head = rqstp->rq_res.head; struct inode *inode; - unsigned int base; - int n; - int w; if (!svcxdr_encode_nfsstat3(xdr, resp->status)) return false; @@ -185,26 +181,12 @@ nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) if (xdr_stream_encode_u32(xdr, resp->mask) < 0) return false; - base = (char *)xdr->p - (char *)head->iov_base; - - rqstp->rq_res.page_len = w = nfsacl_size( - (resp->mask & NFS_ACL) ? resp->acl_access : NULL, - (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); - while (w > 0) { - if (!*(rqstp->rq_next_page++)) - return false; - w -= PAGE_SIZE; - } - - n = nfsacl_encode(&rqstp->rq_res, base, inode, - resp->acl_access, - resp->mask & NFS_ACL, 0); - if (n > 0) - n = nfsacl_encode(&rqstp->rq_res, base + n, inode, - resp->acl_default, - resp->mask & NFS_DFACL, - NFS_ACL_DEFAULT); - if (n <= 0) + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, + resp->mask & NFS_ACL, 0)) + return false; + if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, + resp->mask & NFS_DFACL, + NFS_ACL_DEFAULT)) return false; break; default: From abbd1215c3f91545b8f244672ec054d149e46f89 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 18 Oct 2022 07:47:54 -0400 Subject: [PATCH 0985/1565] nfsd: ignore requests to disable unsupported versions [ Upstream commit 8e823bafff2308753d430566256c83d8085952da ] The kernel currently errors out if you attempt to enable or disable a version that it doesn't recognize. Change it to ignore attempts to disable an unrecognized version. If we don't support it, then there is no harm in doing so. Signed-off-by: Jeff Layton Reviewed-by: Tom Talpey Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index dc74a947a440..68ed42fd29fc 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -601,7 +601,9 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) } break; default: - return -EINVAL; + /* Ignore requests to disable non-existent versions */ + if (cmd == NFSD_SET) + return -EINVAL; } vers += len + 1; } while ((len = qword_get(&mesg, vers, size)) > 0); From 68f7bd7f29a014e0994ae66747803d434ee11eb0 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 18 Oct 2022 07:47:55 -0400 Subject: [PATCH 0986/1565] nfsd: move nfserrno() to vfs.c [ Upstream commit cb12fae1c34b1fa7eaae92c5aadc72d86d7fae19 ] nfserrno() is common to all nfs versions, but nfsproc.c is specifically for NFSv2. Move it to vfs.c, and the prototype to vfs.h. While we're in here, remove the #ifdef EDQUOT check in this function. It's apparently a holdover from the initial merge of the nfsd code in 1997. No other place in the kernel checks that that symbol is defined before using it, so I think we can dispense with it here. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/blocklayout.c | 1 + fs/nfsd/blocklayoutxdr.c | 1 + fs/nfsd/export.h | 1 - fs/nfsd/flexfilelayout.c | 1 + fs/nfsd/nfs4idmap.c | 1 + fs/nfsd/nfsproc.c | 62 --------------------------------------- fs/nfsd/vfs.c | 63 ++++++++++++++++++++++++++++++++++++++++ fs/nfsd/vfs.h | 1 + 8 files changed, 68 insertions(+), 63 deletions(-) diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c index a07c39c94bbd..d91a686d2f31 100644 --- a/fs/nfsd/blocklayout.c +++ b/fs/nfsd/blocklayout.c @@ -16,6 +16,7 @@ #include "blocklayoutxdr.h" #include "pnfs.h" #include "filecache.h" +#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS diff --git a/fs/nfsd/blocklayoutxdr.c b/fs/nfsd/blocklayoutxdr.c index 2455dc8be18a..1ed2f691ebb9 100644 --- a/fs/nfsd/blocklayoutxdr.c +++ b/fs/nfsd/blocklayoutxdr.c @@ -9,6 +9,7 @@ #include "nfsd.h" #include "blocklayoutxdr.h" +#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index ee0e3aba4a6e..d03f7f6a8642 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -115,7 +115,6 @@ struct svc_export * rqst_find_fsidzero_export(struct svc_rqst *); int exp_rootfh(struct net *, struct auth_domain *, char *path, struct knfsd_fh *, int maxsize); __be32 exp_pseudoroot(struct svc_rqst *, struct svc_fh *); -__be32 nfserrno(int errno); static inline void exp_put(struct svc_export *exp) { diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c index 2e2f1d5e9f62..fabc21ed68ce 100644 --- a/fs/nfsd/flexfilelayout.c +++ b/fs/nfsd/flexfilelayout.c @@ -15,6 +15,7 @@ #include "flexfilelayoutxdr.h" #include "pnfs.h" +#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index e70a1a2999b7..5e9809aff37e 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -41,6 +41,7 @@ #include "idmap.h" #include "nfsd.h" #include "netns.h" +#include "vfs.h" /* * Turn off idmapping when using AUTH_SYS. diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 78fa5a9edf27..3777b5be4253 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -848,65 +848,3 @@ const struct svc_version nfsd_version2 = { .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS2_SVC_XDRSIZE, }; - -/* - * Map errnos to NFS errnos. - */ -__be32 -nfserrno (int errno) -{ - static struct { - __be32 nfserr; - int syserr; - } nfs_errtbl[] = { - { nfs_ok, 0 }, - { nfserr_perm, -EPERM }, - { nfserr_noent, -ENOENT }, - { nfserr_io, -EIO }, - { nfserr_nxio, -ENXIO }, - { nfserr_fbig, -E2BIG }, - { nfserr_stale, -EBADF }, - { nfserr_acces, -EACCES }, - { nfserr_exist, -EEXIST }, - { nfserr_xdev, -EXDEV }, - { nfserr_mlink, -EMLINK }, - { nfserr_nodev, -ENODEV }, - { nfserr_notdir, -ENOTDIR }, - { nfserr_isdir, -EISDIR }, - { nfserr_inval, -EINVAL }, - { nfserr_fbig, -EFBIG }, - { nfserr_nospc, -ENOSPC }, - { nfserr_rofs, -EROFS }, - { nfserr_mlink, -EMLINK }, - { nfserr_nametoolong, -ENAMETOOLONG }, - { nfserr_notempty, -ENOTEMPTY }, -#ifdef EDQUOT - { nfserr_dquot, -EDQUOT }, -#endif - { nfserr_stale, -ESTALE }, - { nfserr_jukebox, -ETIMEDOUT }, - { nfserr_jukebox, -ERESTARTSYS }, - { nfserr_jukebox, -EAGAIN }, - { nfserr_jukebox, -EWOULDBLOCK }, - { nfserr_jukebox, -ENOMEM }, - { nfserr_io, -ETXTBSY }, - { nfserr_notsupp, -EOPNOTSUPP }, - { nfserr_toosmall, -ETOOSMALL }, - { nfserr_serverfault, -ESERVERFAULT }, - { nfserr_serverfault, -ENFILE }, - { nfserr_io, -EREMOTEIO }, - { nfserr_stale, -EOPENSTALE }, - { nfserr_io, -EUCLEAN }, - { nfserr_perm, -ENOKEY }, - { nfserr_no_grace, -ENOGRACE}, - }; - int i; - - for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { - if (nfs_errtbl[i].syserr == errno) - return nfs_errtbl[i].nfserr; - } - WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); - return nfserr_io; -} - diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index aae81c5cecb9..9bda56fe303c 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -48,6 +48,69 @@ #define NFSDDBG_FACILITY NFSDDBG_FILEOP +/** + * nfserrno - Map Linux errnos to NFS errnos + * @errno: POSIX(-ish) error code to be mapped + * + * Returns the appropriate (net-endian) nfserr_* (or nfs_ok if errno is 0). If + * it's an error we don't expect, log it once and return nfserr_io. + */ +__be32 +nfserrno (int errno) +{ + static struct { + __be32 nfserr; + int syserr; + } nfs_errtbl[] = { + { nfs_ok, 0 }, + { nfserr_perm, -EPERM }, + { nfserr_noent, -ENOENT }, + { nfserr_io, -EIO }, + { nfserr_nxio, -ENXIO }, + { nfserr_fbig, -E2BIG }, + { nfserr_stale, -EBADF }, + { nfserr_acces, -EACCES }, + { nfserr_exist, -EEXIST }, + { nfserr_xdev, -EXDEV }, + { nfserr_mlink, -EMLINK }, + { nfserr_nodev, -ENODEV }, + { nfserr_notdir, -ENOTDIR }, + { nfserr_isdir, -EISDIR }, + { nfserr_inval, -EINVAL }, + { nfserr_fbig, -EFBIG }, + { nfserr_nospc, -ENOSPC }, + { nfserr_rofs, -EROFS }, + { nfserr_mlink, -EMLINK }, + { nfserr_nametoolong, -ENAMETOOLONG }, + { nfserr_notempty, -ENOTEMPTY }, + { nfserr_dquot, -EDQUOT }, + { nfserr_stale, -ESTALE }, + { nfserr_jukebox, -ETIMEDOUT }, + { nfserr_jukebox, -ERESTARTSYS }, + { nfserr_jukebox, -EAGAIN }, + { nfserr_jukebox, -EWOULDBLOCK }, + { nfserr_jukebox, -ENOMEM }, + { nfserr_io, -ETXTBSY }, + { nfserr_notsupp, -EOPNOTSUPP }, + { nfserr_toosmall, -ETOOSMALL }, + { nfserr_serverfault, -ESERVERFAULT }, + { nfserr_serverfault, -ENFILE }, + { nfserr_io, -EREMOTEIO }, + { nfserr_stale, -EOPENSTALE }, + { nfserr_io, -EUCLEAN }, + { nfserr_perm, -ENOKEY }, + { nfserr_no_grace, -ENOGRACE}, + }; + int i; + + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { + if (nfs_errtbl[i].syserr == errno) + return nfs_errtbl[i].nfserr; + } + WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); + return nfserr_io; +} + /* * Called from nfsd_lookup and encode_dirent. Check if we have crossed * a mount point. diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 120521bc7b24..8ddd687f8359 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -60,6 +60,7 @@ static inline void nfsd_attrs_free(struct nfsd_attrs *attrs) posix_acl_release(attrs->na_dpacl); } +__be32 nfserrno (int errno); int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *, From 4ab1211c28f10b22b01200f07e600921ebf191af Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 18 Oct 2022 07:47:56 -0400 Subject: [PATCH 0987/1565] nfsd: allow disabling NFSv2 at compile time [ Upstream commit 2f3a4b2ac2f28b9be78ad21f401f31e263845214 ] rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical step toward deprecating it and allow NFSv2 support to be compiled out. Add a new CONFIG_NFSD_V2 option that can be turned off and rework the CONFIG_NFSD_V?_ACL option dependencies. Add a description that discourages enabling it. Also, change the description of CONFIG_NFSD to state that the always-on version is now 3 instead of 2. Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2 is disabled at compile time, this should make the kernel ignore attempts to disable it at runtime, but still error out when trying to enable it. Signed-off-by: Jeff Layton Reviewed-by: Tom Talpey Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/Kconfig | 19 +++++++++++++++---- fs/nfsd/Makefile | 5 +++-- fs/nfsd/nfsctl.c | 2 ++ fs/nfsd/nfsd.h | 3 +-- fs/nfsd/nfssvc.c | 6 ++++++ 5 files changed, 27 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 887af7966b03..6d2d498a5957 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -8,6 +8,7 @@ config NFSD select SUNRPC select EXPORTFS select NFS_ACL_SUPPORT if NFSD_V2_ACL + select NFS_ACL_SUPPORT if NFSD_V3_ACL depends on MULTIUSER help Choose Y here if you want to allow other computers to access @@ -26,19 +27,29 @@ config NFSD Below you can choose which versions of the NFS protocol are available to clients mounting the NFS server on this system. - Support for NFS version 2 (RFC 1094) is always available when + Support for NFS version 3 (RFC 1813) is always available when CONFIG_NFSD is selected. If unsure, say N. -config NFSD_V2_ACL - bool +config NFSD_V2 + bool "NFS server support for NFS version 2 (DEPRECATED)" depends on NFSD + default n + help + NFSv2 (RFC 1094) was the first publicly-released version of NFS. + Unless you are hosting ancient (1990's era) NFS clients, you don't + need this. + + If unsure, say N. + +config NFSD_V2_ACL + bool "NFS server support for the NFSv2 ACL protocol extension" + depends on NFSD_V2 config NFSD_V3_ACL bool "NFS server support for the NFSv3 ACL protocol extension" depends on NFSD - select NFSD_V2_ACL help Solaris NFS servers support an auxiliary NFSv3 ACL protocol that never became an official part of the NFS version 3 protocol. diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index 805c06d5f1b4..6fffc8f03f74 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -10,9 +10,10 @@ obj-$(CONFIG_NFSD) += nfsd.o # this one should be compiled first, as the tracing macros can easily blow up nfsd-y += trace.o -nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \ - export.o auth.o lockd.o nfscache.o nfsxdr.o \ +nfsd-y += nfssvc.o nfsctl.o nfsfh.o vfs.o \ + export.o auth.o lockd.o nfscache.o \ stats.o filecache.o nfs3proc.o nfs3xdr.o +nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 68ed42fd29fc..d1e581a60480 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -581,7 +581,9 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) cmd = sign == '-' ? NFSD_CLEAR : NFSD_SET; switch(num) { +#ifdef CONFIG_NFSD_V2 case 2: +#endif case 3: nfsd_vers(nn, num, cmd); break; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 09726c5b9a31..93b42ef9ed91 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -64,8 +64,7 @@ struct readdir_cd { extern struct svc_program nfsd_program; -extern const struct svc_version nfsd_version2, nfsd_version3, - nfsd_version4; +extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4; extern struct mutex nfsd_mutex; extern spinlock_t nfsd_drc_lock; extern unsigned long nfsd_drc_max_mem; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8b1afde19211..429f38c98628 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -91,8 +91,12 @@ unsigned long nfsd_drc_mem_used; #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static struct svc_stat nfsd_acl_svcstats; static const struct svc_version *nfsd_acl_version[] = { +# if defined(CONFIG_NFSD_V2_ACL) [2] = &nfsd_acl_version2, +# endif +# if defined(CONFIG_NFSD_V3_ACL) [3] = &nfsd_acl_version3, +# endif }; #define NFSD_ACL_MINVERS 2 @@ -116,7 +120,9 @@ static struct svc_stat nfsd_acl_svcstats = { #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ static const struct svc_version *nfsd_version[] = { +#if defined(CONFIG_NFSD_V2) [2] = &nfsd_version2, +#endif [3] = &nfsd_version3, #if defined(CONFIG_NFSD_V4) [4] = &nfsd_version4, From 599d5c22912f8891e3e50f7f4c8ac4ae943cf13e Mon Sep 17 00:00:00 2001 From: David Disseldorp Date: Fri, 21 Oct 2022 14:24:14 +0200 Subject: [PATCH 0988/1565] exportfs: use pr_debug for unreachable debug statements [ Upstream commit 427505ffeaa464f683faba945a88d3e3248f6979 ] expfs.c has a bunch of dprintk statements which are unusable due to: #define dprintk(fmt, args...) do{}while(0) Use pr_debug so that they can be enabled dynamically. Also make some minor changes to the debug statements to fix some incorrect types, and remove __func__ which can be handled by dynamic debug separately. Signed-off-by: David Disseldorp Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/exportfs/expfs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 0106eba46d5a..8c28bd1c9ed9 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -18,7 +18,7 @@ #include #include -#define dprintk(fmt, args...) do{}while(0) +#define dprintk(fmt, args...) pr_debug(fmt, ##args) static int get_name(const struct path *path, char *name, struct dentry *child); @@ -132,8 +132,8 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, inode_unlock(dentry->d_inode); if (IS_ERR(parent)) { - dprintk("%s: get_parent of %ld failed, err %d\n", - __func__, dentry->d_inode->i_ino, PTR_ERR(parent)); + dprintk("get_parent of %lu failed, err %ld\n", + dentry->d_inode->i_ino, PTR_ERR(parent)); return parent; } @@ -147,7 +147,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, dprintk("%s: found name: %s\n", __func__, nbuf); tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf)); if (IS_ERR(tmp)) { - dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp)); + dprintk("lookup failed: %ld\n", PTR_ERR(tmp)); err = PTR_ERR(tmp); goto out_err; } From caa627020132b571f79e3c65b8d85ad4f23b1a84 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:46:38 -0400 Subject: [PATCH 0989/1565] NFSD: Pass the target nfsd_file to nfsd_commit() [ Upstream commit c252849082ff525af18b4f253b3c9ece94e951ed ] In a moment I'm going to introduce separate nfsd_file types, one of which is garbage-collected; the other, not. The garbage-collected variety is to be used by NFSv2 and v3, and the non-garbage-collected variety is to be used by NFSv4. nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want nfsd_commit() to find and use the correct variety of cached nfsd_file object for the NFS version that is in use. Signed-off-by: Chuck Lever Tested-by: Jeff Layton Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs3proc.c | 10 +++++++++- fs/nfsd/nfs4proc.c | 11 ++++++++++- fs/nfsd/vfs.c | 15 ++++----------- fs/nfsd/vfs.h | 3 ++- 4 files changed, 25 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a9bf455aee82..a3a55b2be4f6 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -13,6 +13,7 @@ #include "cache.h" #include "xdr3.h" #include "vfs.h" +#include "filecache.h" #define NFSDDBG_FACILITY NFSDDBG_PROC @@ -763,6 +764,7 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) { struct nfsd3_commitargs *argp = rqstp->rq_argp; struct nfsd3_commitres *resp = rqstp->rq_resp; + struct nfsd_file *nf; dprintk("nfsd: COMMIT(3) %s %u@%Lu\n", SVCFH_fmt(&argp->fh), @@ -770,8 +772,14 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) (unsigned long long) argp->offset); fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset, + resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); + if (resp->status) + goto out; + resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset, argp->count, resp->verf); + nfsd_file_put(nf); +out: return rpc_success; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 50fd4ba04a3e..3fe1966ed735 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -731,10 +731,19 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_commit *commit = &u->commit; + struct nfsd_file *nf; + __be32 status; - return nfsd_commit(rqstp, &cstate->current_fh, commit->co_offset, + status = nfsd_file_acquire(rqstp, &cstate->current_fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); + if (status != nfs_ok) + return status; + + status = nfsd_commit(rqstp, &cstate->current_fh, nf, commit->co_offset, commit->co_count, (__be32 *)commit->co_verf.data); + nfsd_file_put(nf); + return status; } static __be32 diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 9bda56fe303c..3278dddd11ba 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1213,6 +1213,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, * nfsd_commit - Commit pending writes to stable storage * @rqstp: RPC request being processed * @fhp: NFS filehandle + * @nf: target file * @offset: raw offset from beginning of file * @count: raw count of bytes to sync * @verf: filled in with the server's current write verifier @@ -1229,19 +1230,13 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, * An nfsstat value in network byte order. */ __be32 -nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, - u32 count, __be32 *verf) +nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, + u64 offset, u32 count, __be32 *verf) { + __be32 err = nfs_ok; u64 maxbytes; loff_t start, end; struct nfsd_net *nn; - struct nfsd_file *nf; - __be32 err; - - err = nfsd_file_acquire(rqstp, fhp, - NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); - if (err) - goto out; /* * Convert the client-provided (offset, count) range to a @@ -1282,8 +1277,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset, } else nfsd_copy_write_verifier(verf, nn); - nfsd_file_put(nf); -out: return err; } diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 8ddd687f8359..dbdfef7ae85b 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -89,7 +89,8 @@ __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); __be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct svc_fh *resfhp, struct nfsd_attrs *iap); __be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, - u64 offset, u32 count, __be32 *verf); + struct nfsd_file *nf, u64 offset, u32 count, + __be32 *verf); #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, void **bufp, int *lenp); From c09b456a81d23a508fb017d7814e875399fb7dca Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:46:44 -0400 Subject: [PATCH 0990/1565] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately" [ Upstream commit dcf3f80965ca787c70def402cdf1553c93c75529 ] This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2. That commit attempted to make files available to other users as soon as all NFSv4 clients were done with them, rather than waiting until the filecache LRU had garbage collected them. It gets the reference counting wrong, for one thing. But it also misses that DELEGRETURN should release a file in the same fashion. In fact, any nfsd_file_put() on an file held open by an NFSv4 client needs potentially to release the file immediately... Clear the way for implementing that idea. Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 18 ------------------ fs/nfsd/filecache.h | 1 - fs/nfsd/nfs4state.c | 4 ++-- 3 files changed, 2 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index dceb522f5cee..e429fce89431 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -443,24 +443,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); } -/** - * nfsd_file_close - Close an nfsd_file - * @nf: nfsd_file to close - * - * If this is the final reference for @nf, free it immediately. - * This reflects an on-the-wire CLOSE or DELEGRETURN into the - * VFS and exported filesystem. - */ -void nfsd_file_close(struct nfsd_file *nf) -{ - nfsd_file_put(nf); - if (refcount_dec_if_one(&nf->nf_ref)) { - nfsd_file_unhash(nf); - nfsd_file_lru_remove(nf); - nfsd_file_free(nf); - } -} - struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 357832bac736..6b012ea4bd9d 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -52,7 +52,6 @@ void nfsd_file_cache_shutdown(void); int nfsd_file_cache_start_net(struct net *net); void nfsd_file_cache_shutdown_net(struct net *net); void nfsd_file_put(struct nfsd_file *nf); -void nfsd_file_close(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2d09977fee83..80d8f40d1f12 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -842,9 +842,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag) swap(f2, fp->fi_fds[O_RDWR]); spin_unlock(&fp->fi_lock); if (f1) - nfsd_file_close(f1); + nfsd_file_put(f1); if (f2) - nfsd_file_close(f2); + nfsd_file_put(f2); } } From 5f866f5a8611bacc40c1f16831f96af6e937eff3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:46:51 -0400 Subject: [PATCH 0991/1565] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection [ Upstream commit 4d1ea8455716ca070e3cd85767e6f6a562a58b1b ] NFSv4 operations manage the lifetime of nfsd_file items they use by means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be garbage collected. Introduce a mechanism to enable garbage collection for nfsd_file items used only by NFSv2/3 callers. Note that the change in nfsd_file_put() ensures that both CLOSE and DELEGRETURN will actually close out and free an nfsd_file on last reference of a non-garbage-collected file. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394 Suggested-by: Trond Myklebust Signed-off-by: Chuck Lever Tested-by: Jeff Layton Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 63 +++++++++++++++++++++++++++++++++++++++------ fs/nfsd/filecache.h | 3 +++ fs/nfsd/nfs3proc.c | 4 +-- fs/nfsd/trace.h | 3 ++- fs/nfsd/vfs.c | 4 +-- 5 files changed, 64 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index e429fce89431..13a25503b80e 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -62,6 +62,7 @@ struct nfsd_file_lookup_key { struct net *net; const struct cred *cred; unsigned char need; + bool gc; enum nfsd_file_lookup_type type; }; @@ -161,6 +162,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, return 1; if (!nfsd_match_cred(nf->nf_cred, key->cred)) return 1; + if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + return 1; if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) return 1; break; @@ -296,6 +299,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) nf->nf_flags = 0; __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); + if (key->gc) + __set_bit(NFSD_FILE_GC, &nf->nf_flags); nf->nf_inode = key->inode; /* nf_ref is pre-incremented for hash table */ refcount_set(&nf->nf_ref, 2); @@ -427,16 +432,27 @@ nfsd_file_put_noref(struct nfsd_file *nf) } } +static void +nfsd_file_unhash_and_put(struct nfsd_file *nf) +{ + if (nfsd_file_unhash(nf)) + nfsd_file_put_noref(nf); +} + void nfsd_file_put(struct nfsd_file *nf) { might_sleep(); - nfsd_file_lru_add(nf); - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { + if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) + nfsd_file_lru_add(nf); + else if (refcount_read(&nf->nf_ref) == 2) + nfsd_file_unhash_and_put(nf); + + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { nfsd_file_flush(nf); nfsd_file_put_noref(nf); - } else if (nf->nf_file) { + } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { nfsd_file_put_noref(nf); nfsd_file_schedule_laundrette(); } else @@ -1015,12 +1031,14 @@ nfsd_file_is_cached(struct inode *inode) static __be32 nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf, bool open) + unsigned int may_flags, struct nfsd_file **pnf, + bool open, bool want_gc) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_FULL, .need = may_flags & NFSD_FILE_MAY_MASK, .net = SVC_NET(rqstp), + .gc = want_gc, }; bool open_retry = true; struct nfsd_file *nf; @@ -1116,14 +1134,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * then unhash. */ if (status != nfs_ok || key.inode->i_nlink == 0) - if (nfsd_file_unhash(nf)) - nfsd_file_put_noref(nf); + nfsd_file_unhash_and_put(nf); clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; } +/** + * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file + * @rqstp: the RPC transaction being executed + * @fhp: the NFS filehandle of the file to be opened + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * The nfsd_file object returned by this API is reference-counted + * and garbage-collected. The object is retained for a few + * seconds after the final nfsd_file_put() in case the caller + * wants to re-use it. + * + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in + * network byte order is returned. + */ +__be32 +nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true); +} + /** * nfsd_file_acquire - Get a struct nfsd_file with an open file * @rqstp: the RPC transaction being executed @@ -1131,6 +1170,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * @may_flags: NFSD_MAY_ settings for the file * @pnf: OUT: new or found "struct nfsd_file" object * + * The nfsd_file_object returned by this API is reference-counted + * but not garbage-collected. The object is unhashed after the + * final nfsd_file_put(). + * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ @@ -1138,7 +1181,7 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false); } /** @@ -1148,6 +1191,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * @may_flags: NFSD_MAY_ settings for the file * @pnf: OUT: new or found "struct nfsd_file" object * + * The nfsd_file_object returned by this API is reference-counted + * but not garbage-collected. The object is released immediately + * one RCU grace period after the final nfsd_file_put(). + * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ @@ -1155,7 +1202,7 @@ __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false); } /* diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 6b012ea4bd9d..b7efb2c3ddb1 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -38,6 +38,7 @@ struct nfsd_file { #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) #define NFSD_FILE_REFERENCED (2) +#define NFSD_FILE_GC (3) unsigned long nf_flags; struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; @@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); +__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index a3a55b2be4f6..19cf583096d9 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -772,8 +772,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) (unsigned long long) argp->offset); fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE | - NFSD_MAY_NOT_BREAK_LEASE, &nf); + resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE | + NFSD_MAY_NOT_BREAK_LEASE, &nf); if (resp->status) goto out; resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset, diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 6803ac877ff7..03722414d6db 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -730,7 +730,8 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) + { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \ + { 1 << NFSD_FILE_GC, "GC"}) DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3278dddd11ba..3d67dd7eab4b 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1165,7 +1165,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 err; trace_nfsd_read_start(rqstp, fhp, offset, *count); - err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; @@ -1197,7 +1197,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, trace_nfsd_write_start(rqstp, fhp, offset, *cnt); - err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf); + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf); if (err) goto out; From d9991b0b9dd5ad53701d14ea194d59f4833ec829 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 1 Nov 2022 13:30:46 -0400 Subject: [PATCH 0992/1565] NFSD: Flesh out a documenting comment for filecache.c [ Upstream commit b3276c1f5b268ff56622e9e125b792b4c3dc03ac ] Record what we've learned recently about the NFSD filecache in a documenting comment so our future selves don't forget what all this is for. Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 13a25503b80e..d681faf48cf8 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -2,6 +2,30 @@ * Open file cache. * * (c) 2015 - Jeff Layton + * + * An nfsd_file object is a per-file collection of open state that binds + * together: + * - a struct file * + * - a user credential + * - a network namespace + * - a read-ahead context + * - monitoring for writeback errors + * + * nfsd_file objects are reference-counted. Consumers acquire a new + * object via the nfsd_file_acquire API. They manage their interest in + * the acquired object, and hence the object's reference count, via + * nfsd_file_get and nfsd_file_put. There are two varieties of nfsd_file + * object: + * + * * non-garbage-collected: When a consumer wants to precisely control + * the lifetime of a file's open state, it acquires a non-garbage- + * collected nfsd_file. The final nfsd_file_put releases the open + * state immediately. + * + * * garbage-collected: When a consumer does not control the lifetime + * of open state, it acquires a garbage-collected nfsd_file. The + * final nfsd_file_put allows the open state to linger for a period + * during which it may be re-used. */ #include From 14c9c091f2a6a2c5e196b9ae832136e2e15c94ab Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:46:57 -0400 Subject: [PATCH 0993/1565] NFSD: Clean up nfs4_preprocess_stateid_op() call sites [ Upstream commit eeff73f7c1c583f79a401284f46c619294859310 ] Remove the lame-duck dprintk()s around nfs4_preprocess_stateid_op() call sites. Signed-off-by: Chuck Lever Tested-by: Jeff Layton Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 31 +++++++------------------------ 1 file changed, 7 insertions(+), 24 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 3fe1966ed735..32fccca7de18 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -943,12 +943,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &read->rd_stateid, RD_STATE, &read->rd_nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_read: couldn't process stateid!\n"); - goto out; - } - status = nfs_ok; -out: + read->rd_rqstp = rqstp; read->rd_fhp = &cstate->current_fh; return status; @@ -1117,10 +1112,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL, NULL); - if (status) { - dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n"); + if (status) return status; - } } err = fh_want_write(&cstate->current_fh); if (err) @@ -1168,10 +1161,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_offset, cnt); status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, stateid, WR_STATE, &nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_write: couldn't process stateid!\n"); + if (status) return status; - } write->wr_how_written = write->wr_stable_how; @@ -1202,17 +1193,13 @@ nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh, src_stateid, RD_STATE, src, NULL); - if (status) { - dprintk("NFSD: %s: couldn't process src stateid!\n", __func__); + if (status) goto out; - } status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, dst_stateid, WR_STATE, dst, NULL); - if (status) { - dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__); + if (status) goto out_put_src; - } /* fix up for NFS-specific error code */ if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) || @@ -1949,10 +1936,8 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &fallocate->falloc_stateid, WR_STATE, &nf, NULL); - if (status != nfs_ok) { - dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n"); + if (status != nfs_ok) return status; - } status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file, fallocate->falloc_offset, @@ -2008,10 +1993,8 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &seek->seek_stateid, RD_STATE, &nf, NULL); - if (status) { - dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n"); + if (status) return status; - } switch (seek->seek_whence) { case NFS4_CONTENT_DATA: From 88cf6a1e76aa40e4e71df2b4f6b93eff0845790f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:03 -0400 Subject: [PATCH 0994/1565] NFSD: Trace stateids returned via DELEGRETURN [ Upstream commit 20eee313ff4b8a7e71ae9560f5c4ba27cd763005 ] Handing out a delegation stateid is recorded with the nfsd_deleg_read tracepoint, but there isn't a matching tracepoint for recording when the stateid is returned. Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 1 + fs/nfsd/trace.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 80d8f40d1f12..a40a9a836fb1 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -6929,6 +6929,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto put_stateid; + trace_nfsd_deleg_return(stateid); wake_up_var(d_inode(cstate->current_fh.fh_dentry)); destroy_delegation(dp); put_stateid: diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 03722414d6db..fe76d3b2c928 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -517,6 +517,7 @@ DEFINE_STATEID_EVENT(layout_recall_release); DEFINE_STATEID_EVENT(open); DEFINE_STATEID_EVENT(deleg_read); +DEFINE_STATEID_EVENT(deleg_return); DEFINE_STATEID_EVENT(deleg_recall); DECLARE_EVENT_CLASS(nfsd_stateseqid_class, From a10d111fd09fb5090269f0f856a64a0f0ba8d9cc Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:09 -0400 Subject: [PATCH 0995/1565] NFSD: Trace delegation revocations [ Upstream commit a1c74569bbde91299f24535abf711be5c84df9de ] Delegation revocation is an exceptional event that is not otherwise visible externally (eg, no network traffic is emitted). Generate a trace record when it occurs so that revocation can be observed or other activity can be triggered. Example: nfsd-1104 [005] 1912.002544: nfsd_stid_revoke: client 633c9343:4e82788d stateid 00000003:00000001 ref=2 type=DELEG Trace infrastructure is provided for subsequent additional tracing related to nfs4_stid activity. Signed-off-by: Chuck Lever Tested-by: Jeff Layton Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 ++ fs/nfsd/trace.h | 55 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index a40a9a836fb1..dcbf777dc58d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1366,6 +1366,8 @@ static void revoke_delegation(struct nfs4_delegation *dp) WARN_ON(!list_empty(&dp->dl_recall_lru)); + trace_nfsd_stid_revoke(&dp->dl_stid); + if (clp->cl_minorversion) { spin_lock(&clp->cl_lock); dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID; diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index fe76d3b2c928..191b206379b7 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -550,6 +550,61 @@ DEFINE_EVENT(nfsd_stateseqid_class, nfsd_##name, \ DEFINE_STATESEQID_EVENT(preprocess); DEFINE_STATESEQID_EVENT(open_confirm); +TRACE_DEFINE_ENUM(NFS4_OPEN_STID); +TRACE_DEFINE_ENUM(NFS4_LOCK_STID); +TRACE_DEFINE_ENUM(NFS4_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_CLOSED_STID); +TRACE_DEFINE_ENUM(NFS4_REVOKED_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_CLOSED_DELEG_STID); +TRACE_DEFINE_ENUM(NFS4_LAYOUT_STID); + +#define show_stid_type(x) \ + __print_flags(x, "|", \ + { NFS4_OPEN_STID, "OPEN" }, \ + { NFS4_LOCK_STID, "LOCK" }, \ + { NFS4_DELEG_STID, "DELEG" }, \ + { NFS4_CLOSED_STID, "CLOSED" }, \ + { NFS4_REVOKED_DELEG_STID, "REVOKED" }, \ + { NFS4_CLOSED_DELEG_STID, "CLOSED_DELEG" }, \ + { NFS4_LAYOUT_STID, "LAYOUT" }) + +DECLARE_EVENT_CLASS(nfsd_stid_class, + TP_PROTO( + const struct nfs4_stid *stid + ), + TP_ARGS(stid), + TP_STRUCT__entry( + __field(unsigned long, sc_type) + __field(int, sc_count) + __field(u32, cl_boot) + __field(u32, cl_id) + __field(u32, si_id) + __field(u32, si_generation) + ), + TP_fast_assign( + const stateid_t *stp = &stid->sc_stateid; + + __entry->sc_type = stid->sc_type; + __entry->sc_count = refcount_read(&stid->sc_count); + __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; + __entry->cl_id = stp->si_opaque.so_clid.cl_id; + __entry->si_id = stp->si_opaque.so_id; + __entry->si_generation = stp->si_generation; + ), + TP_printk("client %08x:%08x stateid %08x:%08x ref=%d type=%s", + __entry->cl_boot, __entry->cl_id, + __entry->si_id, __entry->si_generation, + __entry->sc_count, show_stid_type(__entry->sc_type) + ) +); + +#define DEFINE_STID_EVENT(name) \ +DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name, \ + TP_PROTO(const struct nfs4_stid *stid), \ + TP_ARGS(stid)) + +DEFINE_STID_EVENT(revoke); + DECLARE_EVENT_CLASS(nfsd_clientid_class, TP_PROTO(const clientid_t *clid), TP_ARGS(clid), From b0952d49483a08ecdd04fea6a09d013cb8956171 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:16 -0400 Subject: [PATCH 0996/1565] NFSD: Use const pointers as parameters to fh_ helpers [ Upstream commit b48f8056c034f28dd54668399f1d22be421b0bef ] Enable callers to use const pointers where they are able to. Signed-off-by: Chuck Lever Tested-by: Jeff Layton Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index c3ae6414fc5c..513e028b0bbe 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -220,7 +220,7 @@ __be32 fh_update(struct svc_fh *); void fh_put(struct svc_fh *); static __inline__ struct svc_fh * -fh_copy(struct svc_fh *dst, struct svc_fh *src) +fh_copy(struct svc_fh *dst, const struct svc_fh *src) { WARN_ON(src->fh_dentry); @@ -229,7 +229,7 @@ fh_copy(struct svc_fh *dst, struct svc_fh *src) } static inline void -fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src) +fh_copy_shallow(struct knfsd_fh *dst, const struct knfsd_fh *src) { dst->fh_size = src->fh_size; memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size); @@ -243,7 +243,8 @@ fh_init(struct svc_fh *fhp, int maxsize) return fhp; } -static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) +static inline bool fh_match(const struct knfsd_fh *fh1, + const struct knfsd_fh *fh2) { if (fh1->fh_size != fh2->fh_size) return false; @@ -252,7 +253,8 @@ static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) return true; } -static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) +static inline bool fh_fsid_match(const struct knfsd_fh *fh1, + const struct knfsd_fh *fh2) { if (fh1->fh_fsid_type != fh2->fh_fsid_type) return false; From c152e4ffb9e85b3788a002a79184f0674aa8b62f Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:22 -0400 Subject: [PATCH 0997/1565] NFSD: Update file_hashtbl() helpers [ Upstream commit 3fe828caddd81e68e9d29353c6e9285a658ca056 ] Enable callers to use const pointers for type safety. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index dcbf777dc58d..90505095d7e0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -721,7 +721,7 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) #define FILE_HASH_BITS 8 #define FILE_HASH_SIZE (1 << FILE_HASH_BITS) -static unsigned int file_hashval(struct svc_fh *fh) +static unsigned int file_hashval(const struct svc_fh *fh) { struct inode *inode = d_inode(fh->fh_dentry); @@ -4686,7 +4686,7 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) /* search file_hashtbl[] for file */ static struct nfs4_file * -find_file_locked(struct svc_fh *fh, unsigned int hashval) +find_file_locked(const struct svc_fh *fh, unsigned int hashval) { struct nfs4_file *fp; From e77b1d63c02e6d14aa8fd4c95a4010933efb5ffd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:28 -0400 Subject: [PATCH 0998/1565] NFSD: Clean up nfsd4_init_file() [ Upstream commit 81a21fa3e7fdecb3c5b97014f0fc5a17d5806cae ] Name this function more consistently. I'm going to use nfsd4_file_ and nfsd4_file_hash_ for these helpers. Change the @fh parameter to be const pointer for better type safety. Finally, move the hash insertion operation to the caller. This is typical for most other "init_object" type helpers, and it is where most of the other nfs4_file hash table operations are located. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 90505095d7e0..d6e0052b3f68 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4277,11 +4277,9 @@ static struct nfs4_file *nfsd4_alloc_file(void) } /* OPEN Share state helper functions */ -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, - struct nfs4_file *fp) -{ - lockdep_assert_held(&state_lock); +static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp) +{ refcount_set(&fp->fi_ref, 1); spin_lock_init(&fp->fi_lock); INIT_LIST_HEAD(&fp->fi_stateids); @@ -4299,7 +4297,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval, INIT_LIST_HEAD(&fp->fi_lo_states); atomic_set(&fp->fi_lo_recalls, 0); #endif - hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]); } void @@ -4717,7 +4714,8 @@ static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, fp->fi_aliased = alias_found = true; } if (likely(ret == NULL)) { - nfsd4_init_file(fh, hashval, new); + nfsd4_file_init(fh, new); + hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); new->fi_aliased = alias_found; ret = new; } From 5aa2c4a1fe2806f081997dfbc12992c5d50ec33d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:34 -0400 Subject: [PATCH 0999/1565] NFSD: Add a nfsd4_file_hash_remove() helper [ Upstream commit 3341678f2fd6106055cead09e513fad6950a0d19 ] Refactor to relocate hash deletion operation to a helper function that is close to most other nfs4_file data structure operations. The "noinline" annotation will become useful in a moment when the hlist_del_rcu() is replaced with a more complex rhash remove operation. It also guarantees that hash remove operations can be traced with "-p function -l remove_nfs4_file_locked". This also simplifies the organization of forward declarations: the to-be-added rhashtable and its param structure will be defined /after/ put_nfs4_file(). Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d6e0052b3f68..c464403c23a2 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -84,6 +84,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) static void nfs4_free_ol_stateid(struct nfs4_stid *stid); void nfsd4_end_grace(struct nfsd_net *nn); static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps); +static void nfsd4_file_hash_remove(struct nfs4_file *fi); /* Locking: */ @@ -591,7 +592,7 @@ put_nfs4_file(struct nfs4_file *fi) might_lock(&state_lock); if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { - hlist_del_rcu(&fi->fi_hash); + nfsd4_file_hash_remove(fi); spin_unlock(&state_lock); WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); @@ -4749,6 +4750,11 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) return insert_file(new, fh, hashval); } +static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) +{ + hlist_del_rcu(&fi->fi_hash); +} + /* * Called to check deny when READ with all zero stateid or * WRITE with all zero or all one stateid From 5e92a168495cd7c8a789004c78313583176e8816 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:41 -0400 Subject: [PATCH 1000/1565] NFSD: Clean up find_or_add_file() [ Upstream commit 9270fc514ba7d415636b23bcb937573a1ce54f6a ] Remove the call to find_file_locked() in insert_nfs4_file(). Tracing shows that over 99% of these calls return NULL. Thus it is not worth the expense of the extra bucket list traversal. insert_file() already deals correctly with the case where the item is already in the hash bucket. Since nfsd4_file_hash_insert() is now just a wrapper around insert_file(), move the meat of insert_file() into nfsd4_file_hash_insert() and get rid of it. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 68 ++++++++++++++++++++------------------------- 1 file changed, 30 insertions(+), 38 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c464403c23a2..d2664aa4bde0 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4698,32 +4698,6 @@ find_file_locked(const struct svc_fh *fh, unsigned int hashval) return NULL; } -static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh, - unsigned int hashval) -{ - struct nfs4_file *fp; - struct nfs4_file *ret = NULL; - bool alias_found = false; - - spin_lock(&state_lock); - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { - if (refcount_inc_not_zero(&fp->fi_ref)) - ret = fp; - } else if (d_inode(fh->fh_dentry) == fp->fi_inode) - fp->fi_aliased = alias_found = true; - } - if (likely(ret == NULL)) { - nfsd4_file_init(fh, new); - hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); - new->fi_aliased = alias_found; - ret = new; - } - spin_unlock(&state_lock); - return ret; -} - static struct nfs4_file * find_file(struct svc_fh *fh) { struct nfs4_file *fp; @@ -4735,19 +4709,37 @@ static struct nfs4_file * find_file(struct svc_fh *fh) return fp; } -static struct nfs4_file * -find_or_add_file(struct nfs4_file *new, struct svc_fh *fh) +/* + * On hash insertion, identify entries with the same inode but + * distinct filehandles. They will all be in the same hash bucket + * because nfs4_file's are hashed by the address in the fi_inode + * field. + */ +static noinline_for_stack struct nfs4_file * +nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp) { - struct nfs4_file *fp; - unsigned int hashval = file_hashval(fh); + unsigned int hashval = file_hashval(fhp); + struct nfs4_file *ret = NULL; + bool alias_found = false; + struct nfs4_file *fi; - rcu_read_lock(); - fp = find_file_locked(fh, hashval); - rcu_read_unlock(); - if (fp) - return fp; - - return insert_file(new, fh, hashval); + spin_lock(&state_lock); + hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { + if (refcount_inc_not_zero(&fi->fi_ref)) + ret = fi; + } else if (d_inode(fhp->fh_dentry) == fi->fi_inode) + fi->fi_aliased = alias_found = true; + } + if (likely(ret == NULL)) { + nfsd4_file_init(fhp, new); + hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); + new->fi_aliased = alias_found; + ret = new; + } + spin_unlock(&state_lock); + return ret; } static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) @@ -5641,7 +5633,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * and check for delegations in the process of being recalled. * If not found, create the nfs4_file struct */ - fp = find_or_add_file(open->op_file, current_fh); + fp = nfsd4_file_hash_insert(open->op_file, current_fh); if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) From 8fdef896122f53a5cbf5871cfa6c40070e83d676 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:47 -0400 Subject: [PATCH 1001/1565] NFSD: Refactor find_file() [ Upstream commit 15424748001a9b5ea62b3e6ad45f0a8b27f01df9 ] find_file() is now the only caller of find_file_locked(), so just fold these two together. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 38 ++++++++++++++++---------------------- 1 file changed, 16 insertions(+), 22 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d2664aa4bde0..69329dc159f6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4682,31 +4682,24 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) nfs4_put_stid(&last->st_stid); } -/* search file_hashtbl[] for file */ -static struct nfs4_file * -find_file_locked(const struct svc_fh *fh, unsigned int hashval) +static noinline_for_stack struct nfs4_file * +nfsd4_file_hash_lookup(const struct svc_fh *fhp) { - struct nfs4_file *fp; - - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) { - if (refcount_inc_not_zero(&fp->fi_ref)) - return fp; - } - } - return NULL; -} - -static struct nfs4_file * find_file(struct svc_fh *fh) -{ - struct nfs4_file *fp; - unsigned int hashval = file_hashval(fh); + unsigned int hashval = file_hashval(fhp); + struct nfs4_file *fi; rcu_read_lock(); - fp = find_file_locked(fh, hashval); + hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { + if (refcount_inc_not_zero(&fi->fi_ref)) { + rcu_read_unlock(); + return fi; + } + } + } rcu_read_unlock(); - return fp; + return NULL; } /* @@ -4757,9 +4750,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) struct nfs4_file *fp; __be32 ret = nfs_ok; - fp = find_file(current_fh); + fp = nfsd4_file_hash_lookup(current_fh); if (!fp) return ret; + /* Check for conflicting share reservations */ spin_lock(&fp->fi_lock); if (fp->fi_share_deny & deny_type) From 56eedeaf71b0dd30b607e5f05a92c18bd8bf2fcd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 28 Oct 2022 10:47:53 -0400 Subject: [PATCH 1002/1565] NFSD: Use rhashtable for managing nfs4_file objects [ Upstream commit d47b295e8d76a4d69f0e2ea0cd8a79c9d3488280 ] fh_match() is costly, especially when filehandles are large (as is the case for NFSv4). It needs to be used sparingly when searching data structures. Unfortunately, with common workloads, I see multiple thousands of objects stored in file_hashtbl[], which has just 256 buckets, making its bucket hash chains quite lengthy. Walking long hash chains with the state_lock held blocks other activity that needs that lock. Sizable hash chains are a common occurrance once the server has handed out some delegations, for example -- IIUC, each delegated file is held open on the server by an nfs4_file object. To help mitigate the cost of searching with fh_match(), replace the nfs4_file hash table with an rhashtable, which can dynamically resize its bucket array to minimize hash chain length. The result of this modification is an improvement in the latency of NFSv4 operations, and the reduction of nfsd CPU utilization due to eliminating the cost of multiple calls to fh_match() and reducing the CPU cache misses incurred while walking long hash chains in the nfs4_file hash table. Signed-off-by: Chuck Lever Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 97 +++++++++++++++++++++++++++++---------------- fs/nfsd/state.h | 5 +-- 2 files changed, 63 insertions(+), 39 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 69329dc159f6..d490b19d95dd 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -44,7 +44,9 @@ #include #include #include +#include #include + #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -589,11 +591,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) void put_nfs4_file(struct nfs4_file *fi) { - might_lock(&state_lock); - - if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { + if (refcount_dec_and_test(&fi->fi_ref)) { nfsd4_file_hash_remove(fi); - spin_unlock(&state_lock); WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu); @@ -718,19 +717,20 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) return ret & OWNER_HASH_MASK; } -/* hash table for nfs4_file */ -#define FILE_HASH_BITS 8 -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS) +static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp; -static unsigned int file_hashval(const struct svc_fh *fh) -{ - struct inode *inode = d_inode(fh->fh_dentry); +static const struct rhashtable_params nfs4_file_rhash_params = { + .key_len = sizeof_field(struct nfs4_file, fi_inode), + .key_offset = offsetof(struct nfs4_file, fi_inode), + .head_offset = offsetof(struct nfs4_file, fi_rlist), - /* XXX: why not (here & in file cache) use inode? */ - return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS); -} - -static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; + /* + * Start with a single page hash table to reduce resizing churn + * on light workloads. + */ + .min_size = 256, + .automatic_shrinking = true, +}; /* * Check if courtesy clients have conflicting access and resolve it if possible @@ -4685,12 +4685,14 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) static noinline_for_stack struct nfs4_file * nfsd4_file_hash_lookup(const struct svc_fh *fhp) { - unsigned int hashval = file_hashval(fhp); + struct inode *inode = d_inode(fhp->fh_dentry); + struct rhlist_head *tmp, *list; struct nfs4_file *fi; rcu_read_lock(); - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { + list = rhltable_lookup(&nfs4_file_rhltable, &inode, + nfs4_file_rhash_params); + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { if (refcount_inc_not_zero(&fi->fi_ref)) { rcu_read_unlock(); @@ -4704,40 +4706,56 @@ nfsd4_file_hash_lookup(const struct svc_fh *fhp) /* * On hash insertion, identify entries with the same inode but - * distinct filehandles. They will all be in the same hash bucket - * because nfs4_file's are hashed by the address in the fi_inode - * field. + * distinct filehandles. They will all be on the list returned + * by rhltable_lookup(). + * + * inode->i_lock prevents racing insertions from adding an entry + * for the same inode/fhp pair twice. */ static noinline_for_stack struct nfs4_file * nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp) { - unsigned int hashval = file_hashval(fhp); + struct inode *inode = d_inode(fhp->fh_dentry); + struct rhlist_head *tmp, *list; struct nfs4_file *ret = NULL; bool alias_found = false; struct nfs4_file *fi; + int err; - spin_lock(&state_lock); - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash, - lockdep_is_held(&state_lock)) { + rcu_read_lock(); + spin_lock(&inode->i_lock); + + list = rhltable_lookup(&nfs4_file_rhltable, &inode, + nfs4_file_rhash_params); + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { if (refcount_inc_not_zero(&fi->fi_ref)) ret = fi; - } else if (d_inode(fhp->fh_dentry) == fi->fi_inode) + } else fi->fi_aliased = alias_found = true; } - if (likely(ret == NULL)) { - nfsd4_file_init(fhp, new); - hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]); - new->fi_aliased = alias_found; - ret = new; - } - spin_unlock(&state_lock); + if (ret) + goto out_unlock; + + nfsd4_file_init(fhp, new); + err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist, + nfs4_file_rhash_params); + if (err) + goto out_unlock; + + new->fi_aliased = alias_found; + ret = new; + +out_unlock: + spin_unlock(&inode->i_lock); + rcu_read_unlock(); return ret; } static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) { - hlist_del_rcu(&fi->fi_hash); + rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist, + nfs4_file_rhash_params); } /* @@ -5628,6 +5646,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * If not found, create the nfs4_file struct */ fp = nfsd4_file_hash_insert(open->op_file, current_fh); + if (unlikely(!fp)) + return nfserr_jukebox; if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) @@ -8054,10 +8074,16 @@ nfs4_state_start(void) { int ret; - ret = nfsd4_create_callback_queue(); + ret = rhltable_init(&nfs4_file_rhltable, &nfs4_file_rhash_params); if (ret) return ret; + ret = nfsd4_create_callback_queue(); + if (ret) { + rhltable_destroy(&nfs4_file_rhltable); + return ret; + } + set_max_delegations(); return 0; } @@ -8088,6 +8114,7 @@ nfs4_state_shutdown_net(struct net *net) nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); + rhltable_destroy(&nfs4_file_rhltable); #ifdef CONFIG_NFSD_V4_2_INTER_SSC nfsd4_ssc_shutdown_umount(nn); #endif diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e2daef3cc003..eadd7f465bf5 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -536,16 +536,13 @@ struct nfs4_clnt_odstate { * inode can have multiple filehandles associated with it, so there is * (potentially) a many to one relationship between this struct and struct * inode. - * - * These are hashed by filehandle in the file_hashtbl, which is protected by - * the global state_lock spinlock. */ struct nfs4_file { refcount_t fi_ref; struct inode * fi_inode; bool fi_aliased; spinlock_t fi_lock; - struct hlist_node fi_hash; /* hash on fi_fhandle */ + struct rhlist_head fi_rlist; struct list_head fi_stateids; union { struct list_head fi_delegations; From 384b23f13672aebbf0bd56339b749ea4e162b11e Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 31 Oct 2022 09:53:26 -0400 Subject: [PATCH 1003/1565] NFSD: Fix licensing header in filecache.c [ Upstream commit 3f054211b29c0fa06dfdcab402c795fd7e906be1 ] Add a missing SPDX header. Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d681faf48cf8..b43d2d7ac595 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0 /* - * Open file cache. + * The NFSD open file cache. * * (c) 2015 - Jeff Layton * From 605a5acd6f42033f490cb71ff9594678929a40ca Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 2 Nov 2022 14:44:47 -0400 Subject: [PATCH 1004/1565] nfsd: remove the pages_flushed statistic from filecache [ Upstream commit 1f696e230ea5198e393368b319eb55651828d687 ] We're counting mapping->nrpages, but not all of those are necessarily dirty. We don't really have a simple way to count just the dirty pages, so just remove this stat since it's not accurate. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b43d2d7ac595..b95b1be5b2e4 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -57,7 +57,6 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); -static DEFINE_PER_CPU(unsigned long, nfsd_file_pages_flushed); static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions); struct nfsd_fcache_disposal { @@ -395,7 +394,6 @@ nfsd_file_flush(struct nfsd_file *nf) if (!file || !(file->f_mode & FMODE_WRITE)) return; - this_cpu_add(nfsd_file_pages_flushed, file->f_mapping->nrpages); if (vfs_fsync(file, 1) != 0) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } @@ -1022,7 +1020,6 @@ nfsd_file_cache_shutdown(void) per_cpu(nfsd_file_acquisitions, i) = 0; per_cpu(nfsd_file_releases, i) = 0; per_cpu(nfsd_file_total_age, i) = 0; - per_cpu(nfsd_file_pages_flushed, i) = 0; per_cpu(nfsd_file_evictions, i) = 0; } } @@ -1237,7 +1234,7 @@ nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long releases = 0, pages_flushed = 0, evictions = 0; + unsigned long releases = 0, evictions = 0; unsigned long hits = 0, acquisitions = 0; unsigned int i, count = 0, buckets = 0; unsigned long lru = 0, total_age = 0; @@ -1265,7 +1262,6 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) releases += per_cpu(nfsd_file_releases, i); total_age += per_cpu(nfsd_file_total_age, i); evictions += per_cpu(nfsd_file_evictions, i); - pages_flushed += per_cpu(nfsd_file_pages_flushed, i); } seq_printf(m, "total entries: %u\n", count); @@ -1279,6 +1275,5 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) seq_printf(m, "mean age (ms): %ld\n", total_age / releases); else seq_printf(m, "mean age (ms): -\n"); - seq_printf(m, "pages flushed: %lu\n", pages_flushed); return 0; } From 3d479899f4fe0ce70ab12ef212e438cc3621b3c4 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 2 Nov 2022 14:44:48 -0400 Subject: [PATCH 1005/1565] nfsd: reorganize filecache.c [ Upstream commit 8214118589881b2d390284410c5ff275e7a5e03c ] In a coming patch, we're going to rework how the filecache refcounting works. Move some code around in the function to reduce the churn in the later patches, and rename some of the functions with (hopefully) clearer names: nfsd_file_flush becomes nfsd_file_fsync, and nfsd_file_unhash_and_dispose is renamed to nfsd_file_unhash_and_queue. Also, the nfsd_file_put_final tracepoint is renamed to nfsd_file_free, to better match the name of the function from which it's called. Signed-off-by: Jeff Layton Reviewed-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 111 ++++++++++++++++++++++---------------------- fs/nfsd/trace.h | 4 +- 2 files changed, 58 insertions(+), 57 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index b95b1be5b2e4..fb7ada3f7410 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -334,16 +334,59 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) return nf; } +static void +nfsd_file_fsync(struct nfsd_file *nf) +{ + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return; + if (vfs_fsync(file, 1) != 0) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); +} + +static int +nfsd_file_check_write_error(struct nfsd_file *nf) +{ + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return 0; + return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); +} + +static void +nfsd_file_hash_remove(struct nfsd_file *nf) +{ + trace_nfsd_file_unhash(nf); + + if (nfsd_file_check_write_error(nf)) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); + rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, + nfsd_file_rhash_params); +} + +static bool +nfsd_file_unhash(struct nfsd_file *nf) +{ + if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + nfsd_file_hash_remove(nf); + return true; + } + return false; +} + static bool nfsd_file_free(struct nfsd_file *nf) { s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); bool flush = false; + trace_nfsd_file_free(nf); + this_cpu_inc(nfsd_file_releases); this_cpu_add(nfsd_file_total_age, age); - trace_nfsd_file_put_final(nf); if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { @@ -377,27 +420,6 @@ nfsd_file_check_writeback(struct nfsd_file *nf) mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); } -static int -nfsd_file_check_write_error(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return 0; - return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); -} - -static void -nfsd_file_flush(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return; - if (vfs_fsync(file, 1) != 0) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - static void nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); @@ -411,31 +433,18 @@ static void nfsd_file_lru_remove(struct nfsd_file *nf) trace_nfsd_file_lru_del(nf); } -static void -nfsd_file_hash_remove(struct nfsd_file *nf) +struct nfsd_file * +nfsd_file_get(struct nfsd_file *nf) { - trace_nfsd_file_unhash(nf); - - if (nfsd_file_check_write_error(nf)) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); - rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, - nfsd_file_rhash_params); -} - -static bool -nfsd_file_unhash(struct nfsd_file *nf) -{ - if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_hash_remove(nf); - return true; - } - return false; + if (likely(refcount_inc_not_zero(&nf->nf_ref))) + return nf; + return NULL; } static void -nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) +nfsd_file_unhash_and_queue(struct nfsd_file *nf, struct list_head *dispose) { - trace_nfsd_file_unhash_and_dispose(nf); + trace_nfsd_file_unhash_and_queue(nf); if (nfsd_file_unhash(nf)) { /* caller must call nfsd_file_dispose_list() later */ nfsd_file_lru_remove(nf); @@ -473,7 +482,7 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_unhash_and_put(nf); if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_flush(nf); + nfsd_file_fsync(nf); nfsd_file_put_noref(nf); } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { nfsd_file_put_noref(nf); @@ -482,14 +491,6 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_put_noref(nf); } -struct nfsd_file * -nfsd_file_get(struct nfsd_file *nf) -{ - if (likely(refcount_inc_not_zero(&nf->nf_ref))) - return nf; - return NULL; -} - static void nfsd_file_dispose_list(struct list_head *dispose) { @@ -498,7 +499,7 @@ nfsd_file_dispose_list(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_flush(nf); + nfsd_file_fsync(nf); nfsd_file_put_noref(nf); } } @@ -512,7 +513,7 @@ nfsd_file_dispose_list_sync(struct list_head *dispose) while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_flush(nf); + nfsd_file_fsync(nf); if (!refcount_dec_and_test(&nf->nf_ref)) continue; if (nfsd_file_free(nf)) @@ -712,7 +713,7 @@ __nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) nfsd_file_rhash_params); if (!nf) break; - nfsd_file_unhash_and_dispose(nf, dispose); + nfsd_file_unhash_and_queue(nf, dispose); count++; } while (1); rcu_read_unlock(); @@ -914,7 +915,7 @@ __nfsd_file_cache_purge(struct net *net) nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { if (!net || nf->nf_net == net) - nfsd_file_unhash_and_dispose(nf, &dispose); + nfsd_file_unhash_and_queue(nf, &dispose); nf = rhashtable_walk_next(&iter); } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 191b206379b7..5faec08ac7cf 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -819,10 +819,10 @@ DEFINE_EVENT(nfsd_file_class, name, \ TP_PROTO(struct nfsd_file *nf), \ TP_ARGS(nf)) -DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); +DEFINE_NFSD_FILE_EVENT(nfsd_file_free); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); -DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_dispose); +DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_queue); TRACE_EVENT(nfsd_file_alloc, TP_PROTO( From 4453e0c1bbabd660b6de0a790dbf907131635fda Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 2 Nov 2022 14:44:50 -0400 Subject: [PATCH 1006/1565] nfsd: fix up the filecache laundrette scheduling [ Upstream commit 22ae4c114f77b55a4c5036e8f70409a0799a08f8 ] We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU. Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index fb7ada3f7410..522e900a8860 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -210,12 +210,9 @@ static const struct rhashtable_params nfsd_file_rhash_params = { static void nfsd_file_schedule_laundrette(void) { - if ((atomic_read(&nfsd_file_rhash_tbl.nelems) == 0) || - test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) - return; - - queue_delayed_work(system_wq, &nfsd_filecache_laundrette, - NFSD_LAUNDRETTE_DELAY); + if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags)) + queue_delayed_work(system_wq, &nfsd_filecache_laundrette, + NFSD_LAUNDRETTE_DELAY); } static void @@ -665,7 +662,8 @@ static void nfsd_file_gc_worker(struct work_struct *work) { nfsd_file_gc(); - nfsd_file_schedule_laundrette(); + if (list_lru_count(&nfsd_file_lru)) + nfsd_file_schedule_laundrette(); } static unsigned long From 739202b2b9cfee0941bd22c3838f545a6c7f278c Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 3 Nov 2022 16:22:48 -0400 Subject: [PATCH 1007/1565] NFSD: Add an nfsd_file_fsync tracepoint [ Upstream commit d7064eaf688cfe454c50db9f59298463d80d403c ] Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset. Examples: <...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0 Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 ++++- fs/nfsd/trace.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 522e900a8860..6b8873b0c2c3 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -335,10 +335,13 @@ static void nfsd_file_fsync(struct nfsd_file *nf) { struct file *file = nf->nf_file; + int ret; if (!file || !(file->f_mode & FMODE_WRITE)) return; - if (vfs_fsync(file, 1) != 0) + ret = vfs_fsync(file, 1); + trace_nfsd_file_fsync(nf, ret); + if (ret) nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 5faec08ac7cf..235ea38da864 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1151,6 +1151,37 @@ DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed); +TRACE_EVENT(nfsd_file_fsync, + TP_PROTO( + const struct nfsd_file *nf, + int ret + ), + TP_ARGS(nf, ret), + TP_STRUCT__entry( + __field(void *, nf_inode) + __field(int, nf_ref) + __field(int, ret) + __field(unsigned long, nf_flags) + __field(unsigned char, nf_may) + __field(struct file *, nf_file) + ), + TP_fast_assign( + __entry->nf_inode = nf->nf_inode; + __entry->nf_ref = refcount_read(&nf->nf_ref); + __entry->ret = ret; + __entry->nf_flags = nf->nf_flags; + __entry->nf_may = nf->nf_may; + __entry->nf_file = nf->nf_file; + ), + TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p ret=%d", + __entry->nf_inode, + __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nfsd_may_flags(__entry->nf_may), + __entry->nf_file, __entry->ret + ) +); + #include "cache.h" TRACE_DEFINE_ENUM(RC_DROPIT); From 31c93ee5f1e4dc278b562e20f3c3274ac34997f3 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sun, 6 Nov 2022 14:02:39 -0500 Subject: [PATCH 1008/1565] lockd: set other missing fields when unlocking files [ Upstream commit 18ebd35b61b4693a0ddc270b6d4f18def232e770 ] vfs_lock_file() expects the struct file_lock to be fully initialised by the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field is NULL. Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") Signed-off-by: Trond Myklebust Reviewed-by: Jeff Layton Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582 Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svcsubs.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index 720684345817..e3b6229e7ae5 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -176,7 +176,7 @@ nlm_delete_file(struct nlm_file *file) } } -static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) +static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl) { struct file_lock lock; @@ -184,12 +184,15 @@ static int nlm_unlock_files(struct nlm_file *file, fl_owner_t owner) lock.fl_type = F_UNLCK; lock.fl_start = 0; lock.fl_end = OFFSET_MAX; - lock.fl_owner = owner; - if (file->f_file[O_RDONLY] && - vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, &lock, NULL)) + lock.fl_owner = fl->fl_owner; + lock.fl_pid = fl->fl_pid; + lock.fl_flags = FL_POSIX; + + lock.fl_file = file->f_file[O_RDONLY]; + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) goto out_err; - if (file->f_file[O_WRONLY] && - vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, &lock, NULL)) + lock.fl_file = file->f_file[O_WRONLY]; + if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) goto out_err; return 0; out_err: @@ -226,7 +229,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, if (match(lockhost, host)) { spin_unlock(&flctx->flc_lock); - if (nlm_unlock_files(file, fl->fl_owner)) + if (nlm_unlock_files(file, fl)) return 1; goto again; } From e2505cb851641451273f9f8913e4c3ff5f5c6621 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 7 Nov 2022 06:58:41 -0500 Subject: [PATCH 1009/1565] nfsd: return error if nfs4_setacl fails [ Upstream commit 01d53a88c08951f88f2a42f1f1e6568928e0590e ] With the addition of POSIX ACLs to struct nfsd_attrs, we no longer return an error if setting the ACL fails. Ensure we return the na_aclerr error on SETATTR if there is one. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Neil Brown Reported-by: Yongcheng Yang Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 32fccca7de18..86ab9b8ac2da 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1135,6 +1135,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, 0, (time64_t)0); if (!status) status = nfserrno(attrs.na_labelerr); + if (!status) + status = nfserrno(attrs.na_aclerr); out: nfsd_attrs_free(&attrs); fh_drop_write(&cstate->current_fh); From ae8f2bb3dd3440be214b1f59c6aef4b3a8c2631c Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Fri, 11 Nov 2022 17:18:35 +0800 Subject: [PATCH 1010/1565] NFSD: Use struct_size() helper in alloc_session() [ Upstream commit 85a0d0c9a58002ef7d1bf5e3ea630f4fbd42a4f0 ] Use struct_size() helper to simplify the code, no functional changes. Signed-off-by: Xiu Jianfeng Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d490b19d95dd..9a8038bfaa0d 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1833,13 +1833,12 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs, int numslots = fattrs->maxreqs; int slotsize = slot_bytes(fattrs); struct nfsd4_session *new; - int mem, i; + int i; - BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot *) - + sizeof(struct nfsd4_session) > PAGE_SIZE); - mem = numslots * sizeof(struct nfsd4_slot *); + BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION) + > PAGE_SIZE); - new = kzalloc(sizeof(*new) + mem, GFP_KERNEL); + new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL); if (!new) return NULL; /* allocate each struct nfsd4_slot and data cache in one piece */ From 781c3f3d18125ab852c038e938f74f2c7f55f46e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 11 Nov 2022 14:36:36 -0500 Subject: [PATCH 1011/1565] lockd: set missing fl_flags field when retrieving args [ Upstream commit 75c7940d2a86d3f1b60a0a265478cb8fc887b970 ] Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc4proc.c | 1 + fs/lockd/svcproc.c | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index 284b019cb652..b72023a6b4c1 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -52,6 +52,7 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, *filp = file; /* Set up the missing parts of the file_lock structure */ + lock->fl.fl_flags = FL_POSIX; lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_start = (loff_t)lock->lock_start; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index e35c05e27806..32784f508c81 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -77,6 +77,7 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Set up the missing parts of the file_lock structure */ mode = lock_to_openmode(&lock->fl); + lock->fl.fl_flags = FL_POSIX; lock->fl.fl_file = file->f_file[mode]; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; From 7ecaa9aff9f5b2f9b2b634c9f52ec726749374e4 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 11 Nov 2022 14:36:37 -0500 Subject: [PATCH 1012/1565] lockd: ensure we use the correct file descriptor when unlocking [ Upstream commit 69efce009f7df888e1fede3cb2913690eb829f52 ] Shared locks are set on O_RDONLY descriptors and exclusive locks are set on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once for each descriptor, but it doesn't reset fl_file. Ensure that it does. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svclock.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 9c1aa75441e1..9eae99e08e69 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -659,11 +659,13 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) nlmsvc_cancel_blocked(net, file, lock); lock->fl.fl_type = F_UNLCK; - if (file->f_file[O_RDONLY]) - error = vfs_lock_file(file->f_file[O_RDONLY], F_SETLK, + lock->fl.fl_file = file->f_file[O_RDONLY]; + if (lock->fl.fl_file) + error = vfs_lock_file(lock->fl.fl_file, F_SETLK, &lock->fl, NULL); - if (file->f_file[O_WRONLY]) - error = vfs_lock_file(file->f_file[O_WRONLY], F_SETLK, + lock->fl.fl_file = file->f_file[O_WRONLY]; + if (lock->fl.fl_file) + error |= vfs_lock_file(lock->fl.fl_file, F_SETLK, &lock->fl, NULL); return (error < 0)? nlm_lck_denied_nolocks : nlm_granted; From 0d5f3de2b422c43f968f0048e1ad62be78a3e0af Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 11 Nov 2022 14:36:38 -0500 Subject: [PATCH 1013/1565] lockd: fix file selection in nlmsvc_cancel_blocked [ Upstream commit 9f27783b4dd235ef3c8dbf69fc6322777450323c ] We currently do a lock_to_openmode call based on the arguments from the NLM_UNLOCK call, but that will always set the fl_type of the lock to F_UNLCK, and the O_RDONLY descriptor is always chosen. Fix it to use the file_lock from the block instead. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svclock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 9eae99e08e69..4e30f3c50970 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -699,9 +699,10 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l block = nlmsvc_lookup_block(file, lock); mutex_unlock(&file->f_mutex); if (block != NULL) { - mode = lock_to_openmode(&lock->fl); - vfs_cancel_lock(block->b_file->f_file[mode], - &block->b_call->a_args.lock.fl); + struct file_lock *fl = &block->b_call->a_args.lock.fl; + + mode = lock_to_openmode(fl); + vfs_cancel_lock(block->b_file->f_file[mode], fl); status = nlmsvc_unlink_block(block); nlmsvc_release_block(block); } From 35a48412f6a4d6ca8473c91dc853d0b5f3953fe0 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Wed, 16 Nov 2022 10:28:36 -0500 Subject: [PATCH 1014/1565] NFSD: pass range end to vfs_fsync_range() instead of count [ Upstream commit 79a1d88a36f77374c77fd41a4386d8c2736b8704 ] _nfsd_copy_file_range() calls vfs_fsync_range() with an offset and count (bytes written), but the former wants the start and end bytes of the range to sync. Fix it up. Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done") Signed-off-by: Brian Foster Tested-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 86ab9b8ac2da..cf558d951ec6 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1632,6 +1632,7 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; int status; + loff_t end; /* See RFC 7862 p.67: */ if (bytes_total == 0) @@ -1651,8 +1652,8 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, /* for a non-zero asynchronous copy do a commit of data */ if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) { since = READ_ONCE(dst->f_wb_err); - status = vfs_fsync_range(dst, copy->cp_dst_pos, - copy->cp_res.wr_bytes_written, 0); + end = copy->cp_dst_pos + copy->cp_res.wr_bytes_written - 1; + status = vfs_fsync_range(dst, copy->cp_dst_pos, end, 0); if (!status) status = filemap_check_wb_err(dst->f_mapping, since); if (!status) From 87098b663f42f4f0866fa876afdfab502a10b4de Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 16 Nov 2022 19:44:45 -0800 Subject: [PATCH 1015/1565] NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker [ Upstream commit a1049eb47f20b9eabf9afb218578fff16b4baca6 ] Refactoring courtesy_client_reaper to generic low memory shrinker so it can be used for other purposes. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9a8038bfaa0d..8fdf5ab5b9e4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4361,7 +4361,7 @@ nfsd4_init_slabs(void) } static unsigned long -nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) +nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { int cnt; struct nfsd_net *nn = container_of(shrink, @@ -4374,7 +4374,7 @@ nfsd_courtesy_client_count(struct shrinker *shrink, struct shrink_control *sc) } static unsigned long -nfsd_courtesy_client_scan(struct shrinker *shrink, struct shrink_control *sc) +nfsd4_state_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) { return SHRINK_STOP; } @@ -4401,8 +4401,8 @@ nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); atomic_set(&nn->nfsd_courtesy_clients, 0); - nn->nfsd_client_shrinker.scan_objects = nfsd_courtesy_client_scan; - nn->nfsd_client_shrinker.count_objects = nfsd_courtesy_client_count; + nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; + nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; return register_shrinker(&nn->nfsd_client_shrinker); } @@ -6151,17 +6151,24 @@ laundromat_main(struct work_struct *laundry) } static void -courtesy_client_reaper(struct work_struct *reaper) +courtesy_client_reaper(struct nfsd_net *nn) { struct list_head reaplist; - struct delayed_work *dwork = to_delayed_work(reaper); - struct nfsd_net *nn = container_of(dwork, struct nfsd_net, - nfsd_shrinker_work); nfs4_get_courtesy_client_reaplist(nn, &reaplist); nfs4_process_client_reaplist(&reaplist); } +static void +nfsd4_state_shrinker_worker(struct work_struct *work) +{ + struct delayed_work *dwork = to_delayed_work(work); + struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + nfsd_shrinker_work); + + courtesy_client_reaper(nn); +} + static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) { if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle)) @@ -7997,7 +8004,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru); INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); - INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, courtesy_client_reaper); + INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net); return 0; From 80a81db01ab0521c9355739a9276836264141da6 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 16 Nov 2022 19:44:46 -0800 Subject: [PATCH 1016/1565] NFSD: add support for sending CB_RECALL_ANY [ Upstream commit 3959066b697b5dfbb7141124ae9665337d4bc638 ] Add XDR encode and decode function for CB_RECALL_ANY. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4callback.c | 72 ++++++++++++++++++++++++++++++++++++++++++ fs/nfsd/state.h | 1 + fs/nfsd/xdr4.h | 5 +++ fs/nfsd/xdr4cb.h | 6 ++++ 4 files changed, 84 insertions(+) diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 39989c14c8a1..4eae2c5af2ed 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -76,6 +76,17 @@ static __be32 *xdr_encode_empty_array(__be32 *p) * 1 Protocol" */ +static void encode_uint32(struct xdr_stream *xdr, u32 n) +{ + WARN_ON_ONCE(xdr_stream_encode_u32(xdr, n) < 0); +} + +static void encode_bitmap4(struct xdr_stream *xdr, const __u32 *bitmap, + size_t len) +{ + WARN_ON_ONCE(xdr_stream_encode_uint32_array(xdr, bitmap, len) < 0); +} + /* * nfs_cb_opnum4 * @@ -328,6 +339,24 @@ static void encode_cb_recall4args(struct xdr_stream *xdr, hdr->nops++; } +/* + * CB_RECALLANY4args + * + * struct CB_RECALLANY4args { + * uint32_t craa_objects_to_keep; + * bitmap4 craa_type_mask; + * }; + */ +static void +encode_cb_recallany4args(struct xdr_stream *xdr, + struct nfs4_cb_compound_hdr *hdr, struct nfsd4_cb_recall_any *ra) +{ + encode_nfs_cb_opnum4(xdr, OP_CB_RECALL_ANY); + encode_uint32(xdr, ra->ra_keep); + encode_bitmap4(xdr, ra->ra_bmval, ARRAY_SIZE(ra->ra_bmval)); + hdr->nops++; +} + /* * CB_SEQUENCE4args * @@ -482,6 +511,26 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr, encode_cb_nops(&hdr); } +/* + * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects + */ +static void +nfs4_xdr_enc_cb_recall_any(struct rpc_rqst *req, + struct xdr_stream *xdr, const void *data) +{ + const struct nfsd4_callback *cb = data; + struct nfsd4_cb_recall_any *ra; + struct nfs4_cb_compound_hdr hdr = { + .ident = cb->cb_clp->cl_cb_ident, + .minorversion = cb->cb_clp->cl_minorversion, + }; + + ra = container_of(cb, struct nfsd4_cb_recall_any, ra_cb); + encode_cb_compound4args(xdr, &hdr); + encode_cb_sequence4args(xdr, cb, &hdr); + encode_cb_recallany4args(xdr, &hdr, ra); + encode_cb_nops(&hdr); +} /* * NFSv4.0 and NFSv4.1 XDR decode functions @@ -520,6 +569,28 @@ static int nfs4_xdr_dec_cb_recall(struct rpc_rqst *rqstp, return decode_cb_op_status(xdr, OP_CB_RECALL, &cb->cb_status); } +/* + * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects + */ +static int +nfs4_xdr_dec_cb_recall_any(struct rpc_rqst *rqstp, + struct xdr_stream *xdr, + void *data) +{ + struct nfsd4_callback *cb = data; + struct nfs4_cb_compound_hdr hdr; + int status; + + status = decode_cb_compound4res(xdr, &hdr); + if (unlikely(status)) + return status; + status = decode_cb_sequence4res(xdr, cb); + if (unlikely(status || cb->cb_seq_status)) + return status; + status = decode_cb_op_status(xdr, OP_CB_RECALL_ANY, &cb->cb_status); + return status; +} + #ifdef CONFIG_NFSD_PNFS /* * CB_LAYOUTRECALL4args @@ -783,6 +854,7 @@ static const struct rpc_procinfo nfs4_cb_procedures[] = { #endif PROC(CB_NOTIFY_LOCK, COMPOUND, cb_notify_lock, cb_notify_lock), PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload), + PROC(CB_RECALL_ANY, COMPOUND, cb_recall_any, cb_recall_any), }; static unsigned int nfs4_cb_counts[ARRAY_SIZE(nfs4_cb_procedures)]; diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index eadd7f465bf5..e30882f8b851 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -636,6 +636,7 @@ enum nfsd4_cb_op { NFSPROC4_CLNT_CB_OFFLOAD, NFSPROC4_CLNT_CB_SEQUENCE, NFSPROC4_CLNT_CB_NOTIFY_LOCK, + NFSPROC4_CLNT_CB_RECALL_ANY, }; /* Returns true iff a is later than b: */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 8f323d9071f0..24934cf90a84 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -896,6 +896,11 @@ struct nfsd4_operation { union nfsd4_op_u *); }; +struct nfsd4_cb_recall_any { + struct nfsd4_callback ra_cb; + u32 ra_keep; + u32 ra_bmval[1]; +}; #endif diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h index 547cf07cf4e0..0d39af1b00a0 100644 --- a/fs/nfsd/xdr4cb.h +++ b/fs/nfsd/xdr4cb.h @@ -48,3 +48,9 @@ #define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \ cb_sequence_dec_sz + \ op_dec_sz) +#define NFS4_enc_cb_recall_any_sz (cb_compound_enc_hdr_sz + \ + cb_sequence_enc_sz + \ + 1 + 1 + 1) +#define NFS4_dec_cb_recall_any_sz (cb_compound_dec_hdr_sz + \ + cb_sequence_dec_sz + \ + op_dec_sz) From 71a98737cdcfda6dee009522db1196bbc4a215f6 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 16 Nov 2022 19:44:47 -0800 Subject: [PATCH 1017/1565] NFSD: add delegation reaper to react to low memory condition [ Upstream commit 44df6f439a1790a5f602e3842879efa88f346672 ] The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations. To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds. Signed-off-by: Dai Ngo [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 88 ++++++++++++++++++++++++++++++++++++++++++-- fs/nfsd/state.h | 5 +++ include/linux/nfs4.h | 13 +++++++ 3 files changed, 102 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 8fdf5ab5b9e4..4e05ad774c86 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2144,6 +2144,7 @@ static void __free_client(struct kref *k) kfree(clp->cl_nii_domain.data); kfree(clp->cl_nii_name.data); idr_destroy(&clp->cl_stateids); + kfree(clp->cl_ra); kmem_cache_free(client_slab, clp); } @@ -2871,6 +2872,36 @@ static const struct tree_descr client_files[] = { [3] = {""}, }; +static int +nfsd4_cb_recall_any_done(struct nfsd4_callback *cb, + struct rpc_task *task) +{ + switch (task->tk_status) { + case -NFS4ERR_DELAY: + rpc_delay(task, 2 * HZ); + return 0; + default: + return 1; + } +} + +static void +nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) +{ + struct nfs4_client *clp = cb->cb_clp; + struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); + + spin_lock(&nn->client_lock); + clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); + put_client_renew_locked(clp); + spin_unlock(&nn->client_lock); +} + +static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { + .done = nfsd4_cb_recall_any_done, + .release = nfsd4_cb_recall_any_release, +}; + static struct nfs4_client *create_client(struct xdr_netobj name, struct svc_rqst *rqstp, nfs4_verifier *verf) { @@ -2908,6 +2939,14 @@ static struct nfs4_client *create_client(struct xdr_netobj name, free_client(clp); return NULL; } + clp->cl_ra = kzalloc(sizeof(*clp->cl_ra), GFP_KERNEL); + if (!clp->cl_ra) { + free_client(clp); + return NULL; + } + clp->cl_ra_time = 0; + nfsd4_init_cb(&clp->cl_ra->ra_cb, clp, &nfsd4_cb_recall_any_ops, + NFSPROC4_CLNT_CB_RECALL_ANY); return clp; } @@ -4363,14 +4402,16 @@ nfsd4_init_slabs(void) static unsigned long nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) { - int cnt; + int count; struct nfsd_net *nn = container_of(shrink, struct nfsd_net, nfsd_client_shrinker); - cnt = atomic_read(&nn->nfsd_courtesy_clients); - if (cnt > 0) + count = atomic_read(&nn->nfsd_courtesy_clients); + if (!count) + count = atomic_long_read(&num_delegations); + if (count) mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); - return (unsigned long)cnt; + return (unsigned long)count; } static unsigned long @@ -6159,6 +6200,44 @@ courtesy_client_reaper(struct nfsd_net *nn) nfs4_process_client_reaplist(&reaplist); } +static void +deleg_reaper(struct nfsd_net *nn) +{ + struct list_head *pos, *next; + struct nfs4_client *clp; + struct list_head cblist; + + INIT_LIST_HEAD(&cblist); + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_state != NFSD4_ACTIVE || + list_empty(&clp->cl_delegations) || + atomic_read(&clp->cl_delegs_in_recall) || + test_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags) || + (ktime_get_boottime_seconds() - + clp->cl_ra_time < 5)) { + continue; + } + list_add(&clp->cl_ra_cblist, &cblist); + + /* release in nfsd4_cb_recall_any_release */ + atomic_inc(&clp->cl_rpc_users); + set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); + clp->cl_ra_time = ktime_get_boottime_seconds(); + } + spin_unlock(&nn->client_lock); + + while (!list_empty(&cblist)) { + clp = list_first_entry(&cblist, struct nfs4_client, + cl_ra_cblist); + list_del_init(&clp->cl_ra_cblist); + clp->cl_ra->ra_keep = 0; + clp->cl_ra->ra_bmval[0] = BIT(RCA4_TYPE_MASK_RDATA_DLG); + nfsd4_run_cb(&clp->cl_ra->ra_cb); + } +} + static void nfsd4_state_shrinker_worker(struct work_struct *work) { @@ -6167,6 +6246,7 @@ nfsd4_state_shrinker_worker(struct work_struct *work) nfsd_shrinker_work); courtesy_client_reaper(nn); + deleg_reaper(nn); } static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e30882f8b851..e94634d30591 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -368,6 +368,7 @@ struct nfs4_client { #define NFSD4_CLIENT_UPCALL_LOCK (5) /* upcall serialization */ #define NFSD4_CLIENT_CB_FLAG_MASK (1 << NFSD4_CLIENT_CB_UPDATE | \ 1 << NFSD4_CLIENT_CB_KILL) +#define NFSD4_CLIENT_CB_RECALL_ANY (6) unsigned long cl_flags; const struct cred *cl_cb_cred; struct rpc_clnt *cl_cb_client; @@ -411,6 +412,10 @@ struct nfs4_client { unsigned int cl_state; atomic_t cl_delegs_in_recall; + + struct nfsd4_cb_recall_any *cl_ra; + time64_t cl_ra_time; + struct list_head cl_ra_cblist; }; /* struct nfs4_client_reset diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index 5b4c67c91f56..ea88d0f462c9 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -717,4 +717,17 @@ enum nfs4_setxattr_options { SETXATTR4_CREATE = 1, SETXATTR4_REPLACE = 2, }; + +enum { + RCA4_TYPE_MASK_RDATA_DLG = 0, + RCA4_TYPE_MASK_WDATA_DLG = 1, + RCA4_TYPE_MASK_DIR_DLG = 2, + RCA4_TYPE_MASK_FILE_LAYOUT = 3, + RCA4_TYPE_MASK_BLK_LAYOUT = 4, + RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8, + RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9, + RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12, + RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15, +}; + #endif From ea468098605ef6a02d5a6ff54a8e45fdf0c08d93 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 26 Nov 2022 15:55:30 -0500 Subject: [PATCH 1018/1565] NFSD: Use only RQ_DROPME to signal the need to drop a reply [ Upstream commit 9315564747cb6a570e99196b3a4880fb817635fd ] Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch(). Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 4 ++-- fs/nfsd/nfssvc.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 3777b5be4253..c32a871f0e27 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -211,7 +211,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - return rpc_drop_reply; + __set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; } @@ -246,7 +246,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - return rpc_drop_reply; + __set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 429f38c98628..325d3d3f1211 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1060,7 +1060,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) svcxdr_init_encode(rqstp); *statp = proc->pc_func(rqstp); - if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) + if (test_bit(RQ_DROPME, &rqstp->rq_flags)) goto out_update_drop; if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream)) From dfbf3066d9739e58f24ddc98e4a660af9b525315 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 2 Dec 2022 12:48:59 -0800 Subject: [PATCH 1019/1565] NFSD: Avoid clashing function prototypes [ Upstream commit e78e274eb22d966258a3845acc71d3c5b8ee2ea8 ] When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. There were 97 warnings produced by NFS. For example: fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The enc/dec callbacks were defined as passing "void *" as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences. Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name: @find@ identifier func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = (T) func, }; @already_void@ identifier find.func; identifier name; @@ func(..., -void +union nfsd4_op_u *name) { ... } @proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@ func@p(..., T name ) { ... } @script:python get_member@ type_name << proto.T; member; @@ coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0]) @convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@ func@p(..., - T name + union nfsd4_op_u *u ) { + T name = &u->member; ... } @cast@ identifier find.func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = - (T) func, }; Cc: Chuck Lever Cc: Jeff Layton Cc: Gustavo A. R. Silva Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 632 +++++++++++++++++++++++++++------------------- 1 file changed, 377 insertions(+), 255 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 27b38ce6f8c6..8bbf85da1de0 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -770,16 +770,18 @@ nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) static __be32 nfsd4_decode_access(struct nfsd4_compoundargs *argp, - struct nfsd4_access *access) + union nfsd4_op_u *u) { + struct nfsd4_access *access = &u->access; if (xdr_stream_decode_u32(argp->xdr, &access->ac_req_access) < 0) return nfserr_bad_xdr; return nfs_ok; } static __be32 -nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) +nfsd4_decode_close(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_close *close = &u->close; if (xdr_stream_decode_u32(argp->xdr, &close->cl_seqid) < 0) return nfserr_bad_xdr; return nfsd4_decode_stateid4(argp, &close->cl_stateid); @@ -787,8 +789,9 @@ nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) static __be32 -nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit) +nfsd4_decode_commit(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_commit *commit = &u->commit; if (xdr_stream_decode_u64(argp->xdr, &commit->co_offset) < 0) return nfserr_bad_xdr; if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) @@ -798,8 +801,9 @@ nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit } static __be32 -nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) +nfsd4_decode_create(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_create *create = &u->create; __be32 *p, status; memset(create, 0, sizeof(*create)); @@ -844,22 +848,25 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create } static inline __be32 -nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegreturn *dr) +nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_delegreturn *dr = &u->delegreturn; return nfsd4_decode_stateid4(argp, &dr->dr_stateid); } static inline __be32 -nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) +nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_getattr *getattr = &u->getattr; memset(getattr, 0, sizeof(*getattr)); return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, ARRAY_SIZE(getattr->ga_bmval)); } static __be32 -nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) +nfsd4_decode_link(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_link *link = &u->link; memset(link, 0, sizeof(*link)); return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); } @@ -907,8 +914,9 @@ nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) } static __be32 -nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) +nfsd4_decode_lock(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lock *lock = &u->lock; memset(lock, 0, sizeof(*lock)); if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) return nfserr_bad_xdr; @@ -924,8 +932,9 @@ nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) } static __be32 -nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) +nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lockt *lockt = &u->lockt; memset(lockt, 0, sizeof(*lockt)); if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) return nfserr_bad_xdr; @@ -940,8 +949,9 @@ nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) } static __be32 -nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) +nfsd4_decode_locku(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_locku *locku = &u->locku; __be32 status; if (xdr_stream_decode_u32(argp->xdr, &locku->lu_type) < 0) @@ -962,8 +972,9 @@ nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) } static __be32 -nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup) +nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_lookup *lookup = &u->lookup; return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); } @@ -1143,8 +1154,9 @@ nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, } static __be32 -nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) +nfsd4_decode_open(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_open *open = &u->open; __be32 status; u32 dummy; @@ -1171,8 +1183,10 @@ nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) } static __be32 -nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_confirm *open_conf) +nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_open_confirm *open_conf = &u->open_confirm; __be32 status; if (argp->minorversion >= 1) @@ -1190,8 +1204,10 @@ nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_con } static __be32 -nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_downgrade *open_down) +nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_open_downgrade *open_down = &u->open_downgrade; __be32 status; memset(open_down, 0, sizeof(*open_down)); @@ -1209,8 +1225,9 @@ nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_d } static __be32 -nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) +nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_putfh *putfh = &u->putfh; __be32 *p; if (xdr_stream_decode_u32(argp->xdr, &putfh->pf_fhlen) < 0) @@ -1229,7 +1246,7 @@ nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) } static __be32 -nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { if (argp->minorversion == 0) return nfs_ok; @@ -1237,8 +1254,9 @@ nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) } static __be32 -nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) +nfsd4_decode_read(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; __be32 status; memset(read, 0, sizeof(*read)); @@ -1254,8 +1272,9 @@ nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) } static __be32 -nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *readdir) +nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_readdir *readdir = &u->readdir; __be32 status; memset(readdir, 0, sizeof(*readdir)); @@ -1276,15 +1295,17 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read } static __be32 -nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) +nfsd4_decode_remove(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_remove *remove = &u->remove; memset(&remove->rm_cinfo, 0, sizeof(remove->rm_cinfo)); return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); } static __be32 -nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename) +nfsd4_decode_rename(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_rename *rename = &u->rename; __be32 status; memset(rename, 0, sizeof(*rename)); @@ -1295,22 +1316,25 @@ nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename } static __be32 -nfsd4_decode_renew(struct nfsd4_compoundargs *argp, clientid_t *clientid) +nfsd4_decode_renew(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + clientid_t *clientid = &u->renew; return nfsd4_decode_clientid4(argp, clientid); } static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo *secinfo = &u->secinfo; secinfo->si_exp = NULL; return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); } static __be32 -nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *setattr) +nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_setattr *setattr = &u->setattr; __be32 status; memset(setattr, 0, sizeof(*setattr)); @@ -1324,8 +1348,9 @@ nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *seta } static __be32 -nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid *setclientid) +nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_setclientid *setclientid = &u->setclientid; __be32 *p, status; memset(setclientid, 0, sizeof(*setclientid)); @@ -1367,8 +1392,10 @@ nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclient } static __be32 -nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid_confirm *scd_c) +nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_setclientid_confirm *scd_c = &u->setclientid_confirm; __be32 status; if (argp->minorversion >= 1) @@ -1382,8 +1409,9 @@ nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_s /* Also used for NVERIFY */ static __be32 -nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify) +nfsd4_decode_verify(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_verify *verify = &u->verify; __be32 *p, status; memset(verify, 0, sizeof(*verify)); @@ -1409,8 +1437,9 @@ nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify } static __be32 -nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) +nfsd4_decode_write(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_write *write = &u->write; __be32 status; status = nfsd4_decode_stateid4(argp, &write->wr_stateid); @@ -1434,8 +1463,10 @@ nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) } static __be32 -nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_release_lockowner *rlockowner) +nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; __be32 status; if (argp->minorversion >= 1) @@ -1452,16 +1483,20 @@ nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_rel return nfs_ok; } -static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) +static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_backchannel_ctl *bc = &u->backchannel_ctl; memset(bc, 0, sizeof(*bc)); if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) return nfserr_bad_xdr; return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); } -static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) +static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; u32 use_conn_in_rdma_mode; __be32 status; @@ -1603,8 +1638,9 @@ nfsd4_decode_nfs_impl_id4(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) + union nfsd4_op_u *u) { + struct nfsd4_exchange_id *exid = &u->exchange_id; __be32 status; memset(exid, 0, sizeof(*exid)); @@ -1656,8 +1692,9 @@ nfsd4_decode_channel_attrs4(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, - struct nfsd4_create_session *sess) + union nfsd4_op_u *u) { + struct nfsd4_create_session *sess = &u->create_session; __be32 status; memset(sess, 0, sizeof(*sess)); @@ -1681,23 +1718,26 @@ nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_destroy_session(struct nfsd4_compoundargs *argp, - struct nfsd4_destroy_session *destroy_session) + union nfsd4_op_u *u) { + struct nfsd4_destroy_session *destroy_session = &u->destroy_session; return nfsd4_decode_sessionid4(argp, &destroy_session->sessionid); } static __be32 nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, - struct nfsd4_free_stateid *free_stateid) + union nfsd4_op_u *u) { + struct nfsd4_free_stateid *free_stateid = &u->free_stateid; return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); } #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, - struct nfsd4_getdeviceinfo *gdev) + union nfsd4_op_u *u) { + struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; __be32 status; memset(gdev, 0, sizeof(*gdev)); @@ -1717,8 +1757,9 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutcommit *lcp) + union nfsd4_op_u *u) { + struct nfsd4_layoutcommit *lcp = &u->layoutcommit; __be32 *p, status; memset(lcp, 0, sizeof(*lcp)); @@ -1753,8 +1794,9 @@ nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutget *lgp) + union nfsd4_op_u *u) { + struct nfsd4_layoutget *lgp = &u->layoutget; __be32 status; memset(lgp, 0, sizeof(*lgp)); @@ -1781,8 +1823,9 @@ nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutreturn *lrp) + union nfsd4_op_u *u) { + struct nfsd4_layoutreturn *lrp = &u->layoutreturn; memset(lrp, 0, sizeof(*lrp)); if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) return nfserr_bad_xdr; @@ -1795,8 +1838,9 @@ nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, #endif /* CONFIG_NFSD_PNFS */ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, - struct nfsd4_secinfo_no_name *sin) + union nfsd4_op_u *u) { + struct nfsd4_secinfo_no_name *sin = &u->secinfo_no_name; if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) return nfserr_bad_xdr; @@ -1806,8 +1850,9 @@ static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, - struct nfsd4_sequence *seq) + union nfsd4_op_u *u) { + struct nfsd4_sequence *seq = &u->sequence; __be32 *p, status; status = nfsd4_decode_sessionid4(argp, &seq->sessionid); @@ -1826,8 +1871,10 @@ nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, } static __be32 -nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) +nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, + union nfsd4_op_u *u) { + struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct nfsd4_test_stateid_id *stateid; __be32 status; u32 i; @@ -1852,14 +1899,16 @@ nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_sta } static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, - struct nfsd4_destroy_clientid *dc) + union nfsd4_op_u *u) { + struct nfsd4_destroy_clientid *dc = &u->destroy_clientid; return nfsd4_decode_clientid4(argp, &dc->clientid); } static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, - struct nfsd4_reclaim_complete *rc) + union nfsd4_op_u *u) { + struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; if (xdr_stream_decode_bool(argp->xdr, &rc->rca_one_fs) < 0) return nfserr_bad_xdr; return nfs_ok; @@ -1867,8 +1916,9 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, - struct nfsd4_fallocate *fallocate) + union nfsd4_op_u *u) { + struct nfsd4_fallocate *fallocate = &u->allocate; __be32 status; status = nfsd4_decode_stateid4(argp, &fallocate->falloc_stateid); @@ -1924,8 +1974,9 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, } static __be32 -nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) +nfsd4_decode_copy(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_copy *copy = &u->copy; u32 consecutive, i, count, sync; struct nl4_server *ns_dummy; __be32 status; @@ -1982,8 +2033,9 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) static __be32 nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, - struct nfsd4_copy_notify *cn) + union nfsd4_op_u *u) { + struct nfsd4_copy_notify *cn = &u->copy_notify; __be32 status; memset(cn, 0, sizeof(*cn)); @@ -2002,16 +2054,18 @@ nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, - struct nfsd4_offload_status *os) + union nfsd4_op_u *u) { + struct nfsd4_offload_status *os = &u->offload_status; os->count = 0; os->status = 0; return nfsd4_decode_stateid4(argp, &os->stateid); } static __be32 -nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) +nfsd4_decode_seek(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_seek *seek = &u->seek; __be32 status; status = nfsd4_decode_stateid4(argp, &seek->seek_stateid); @@ -2028,8 +2082,9 @@ nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) } static __be32 -nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) { + struct nfsd4_clone *clone = &u->clone; __be32 status; status = nfsd4_decode_stateid4(argp, &clone->cl_src_stateid); @@ -2154,8 +2209,9 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) */ static __be32 nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, - struct nfsd4_getxattr *getxattr) + union nfsd4_op_u *u) { + struct nfsd4_getxattr *getxattr = &u->getxattr; __be32 status; u32 maxcount; @@ -2173,8 +2229,9 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, - struct nfsd4_setxattr *setxattr) + union nfsd4_op_u *u) { + struct nfsd4_setxattr *setxattr = &u->setxattr; u32 flags, maxcount, size; __be32 status; @@ -2214,8 +2271,9 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, - struct nfsd4_listxattrs *listxattrs) + union nfsd4_op_u *u) { + struct nfsd4_listxattrs *listxattrs = &u->listxattrs; u32 maxcount; memset(listxattrs, 0, sizeof(*listxattrs)); @@ -2245,113 +2303,114 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, static __be32 nfsd4_decode_removexattr(struct nfsd4_compoundargs *argp, - struct nfsd4_removexattr *removexattr) + union nfsd4_op_u *u) { + struct nfsd4_removexattr *removexattr = &u->removexattr; memset(removexattr, 0, sizeof(*removexattr)); return nfsd4_decode_xattr_name(argp, &removexattr->rmxa_name); } static __be32 -nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_noop(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { return nfs_ok; } static __be32 -nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, void *p) +nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) { return nfserr_notsupp; } -typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, void *); +typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u); static const nfsd4_dec nfsd4_dec_ops[] = { - [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, - [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close, - [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit, - [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create, - [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn, - [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr, - [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_LINK] = (nfsd4_dec)nfsd4_decode_link, - [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock, - [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt, - [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku, - [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup, - [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop, - [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify, - [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open, - [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_open_confirm, - [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade, - [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh, - [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_putpubfh, - [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_READ] = (nfsd4_dec)nfsd4_decode_read, - [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir, - [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop, - [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove, - [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename, - [OP_RENEW] = (nfsd4_dec)nfsd4_decode_renew, - [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop, - [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo, - [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr, - [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_setclientid, - [OP_SETCLIENTID_CONFIRM] = (nfsd4_dec)nfsd4_decode_setclientid_confirm, - [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify, - [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write, - [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_release_lockowner, + [OP_ACCESS] = nfsd4_decode_access, + [OP_CLOSE] = nfsd4_decode_close, + [OP_COMMIT] = nfsd4_decode_commit, + [OP_CREATE] = nfsd4_decode_create, + [OP_DELEGPURGE] = nfsd4_decode_notsupp, + [OP_DELEGRETURN] = nfsd4_decode_delegreturn, + [OP_GETATTR] = nfsd4_decode_getattr, + [OP_GETFH] = nfsd4_decode_noop, + [OP_LINK] = nfsd4_decode_link, + [OP_LOCK] = nfsd4_decode_lock, + [OP_LOCKT] = nfsd4_decode_lockt, + [OP_LOCKU] = nfsd4_decode_locku, + [OP_LOOKUP] = nfsd4_decode_lookup, + [OP_LOOKUPP] = nfsd4_decode_noop, + [OP_NVERIFY] = nfsd4_decode_verify, + [OP_OPEN] = nfsd4_decode_open, + [OP_OPENATTR] = nfsd4_decode_notsupp, + [OP_OPEN_CONFIRM] = nfsd4_decode_open_confirm, + [OP_OPEN_DOWNGRADE] = nfsd4_decode_open_downgrade, + [OP_PUTFH] = nfsd4_decode_putfh, + [OP_PUTPUBFH] = nfsd4_decode_putpubfh, + [OP_PUTROOTFH] = nfsd4_decode_noop, + [OP_READ] = nfsd4_decode_read, + [OP_READDIR] = nfsd4_decode_readdir, + [OP_READLINK] = nfsd4_decode_noop, + [OP_REMOVE] = nfsd4_decode_remove, + [OP_RENAME] = nfsd4_decode_rename, + [OP_RENEW] = nfsd4_decode_renew, + [OP_RESTOREFH] = nfsd4_decode_noop, + [OP_SAVEFH] = nfsd4_decode_noop, + [OP_SECINFO] = nfsd4_decode_secinfo, + [OP_SETATTR] = nfsd4_decode_setattr, + [OP_SETCLIENTID] = nfsd4_decode_setclientid, + [OP_SETCLIENTID_CONFIRM] = nfsd4_decode_setclientid_confirm, + [OP_VERIFY] = nfsd4_decode_verify, + [OP_WRITE] = nfsd4_decode_write, + [OP_RELEASE_LOCKOWNER] = nfsd4_decode_release_lockowner, /* new operations for NFSv4.1 */ - [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl, - [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session, - [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id, - [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session, - [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session, - [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid, - [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_BACKCHANNEL_CTL] = nfsd4_decode_backchannel_ctl, + [OP_BIND_CONN_TO_SESSION] = nfsd4_decode_bind_conn_to_session, + [OP_EXCHANGE_ID] = nfsd4_decode_exchange_id, + [OP_CREATE_SESSION] = nfsd4_decode_create_session, + [OP_DESTROY_SESSION] = nfsd4_decode_destroy_session, + [OP_FREE_STATEID] = nfsd4_decode_free_stateid, + [OP_GET_DIR_DELEGATION] = nfsd4_decode_notsupp, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_getdeviceinfo, - [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_layoutcommit, - [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_layoutget, - [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_layoutreturn, + [OP_GETDEVICEINFO] = nfsd4_decode_getdeviceinfo, + [OP_GETDEVICELIST] = nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = nfsd4_decode_layoutcommit, + [OP_LAYOUTGET] = nfsd4_decode_layoutget, + [OP_LAYOUTRETURN] = nfsd4_decode_layoutreturn, #else - [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_GETDEVICEINFO] = nfsd4_decode_notsupp, + [OP_GETDEVICELIST] = nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = nfsd4_decode_notsupp, + [OP_LAYOUTGET] = nfsd4_decode_notsupp, + [OP_LAYOUTRETURN] = nfsd4_decode_notsupp, #endif - [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name, - [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence, - [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid, - [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid, - [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete, + [OP_SECINFO_NO_NAME] = nfsd4_decode_secinfo_no_name, + [OP_SEQUENCE] = nfsd4_decode_sequence, + [OP_SET_SSV] = nfsd4_decode_notsupp, + [OP_TEST_STATEID] = nfsd4_decode_test_stateid, + [OP_WANT_DELEGATION] = nfsd4_decode_notsupp, + [OP_DESTROY_CLIENTID] = nfsd4_decode_destroy_clientid, + [OP_RECLAIM_COMPLETE] = nfsd4_decode_reclaim_complete, /* new operations for NFSv4.2 */ - [OP_ALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, - [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy, - [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_copy_notify, - [OP_DEALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, - [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTERROR] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_LAYOUTSTATS] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_OFFLOAD_CANCEL] = (nfsd4_dec)nfsd4_decode_offload_status, - [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_offload_status, - [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read, - [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek, - [OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp, - [OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone, + [OP_ALLOCATE] = nfsd4_decode_fallocate, + [OP_COPY] = nfsd4_decode_copy, + [OP_COPY_NOTIFY] = nfsd4_decode_copy_notify, + [OP_DEALLOCATE] = nfsd4_decode_fallocate, + [OP_IO_ADVISE] = nfsd4_decode_notsupp, + [OP_LAYOUTERROR] = nfsd4_decode_notsupp, + [OP_LAYOUTSTATS] = nfsd4_decode_notsupp, + [OP_OFFLOAD_CANCEL] = nfsd4_decode_offload_status, + [OP_OFFLOAD_STATUS] = nfsd4_decode_offload_status, + [OP_READ_PLUS] = nfsd4_decode_read, + [OP_SEEK] = nfsd4_decode_seek, + [OP_WRITE_SAME] = nfsd4_decode_notsupp, + [OP_CLONE] = nfsd4_decode_clone, /* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = (nfsd4_dec)nfsd4_decode_getxattr, - [OP_SETXATTR] = (nfsd4_dec)nfsd4_decode_setxattr, - [OP_LISTXATTRS] = (nfsd4_dec)nfsd4_decode_listxattrs, - [OP_REMOVEXATTR] = (nfsd4_dec)nfsd4_decode_removexattr, + [OP_GETXATTR] = nfsd4_decode_getxattr, + [OP_SETXATTR] = nfsd4_decode_setxattr, + [OP_LISTXATTRS] = nfsd4_decode_listxattrs, + [OP_REMOVEXATTR] = nfsd4_decode_removexattr, }; static inline bool @@ -3641,8 +3700,10 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid) } static __be32 -nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access) +nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_access *access = &u->access; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3654,8 +3715,10 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ return 0; } -static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts) +static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3671,8 +3734,10 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, } static __be32 -nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close) +nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_close *close = &u->close; struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &close->cl_stateid); @@ -3680,8 +3745,10 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c static __be32 -nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit) +nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_commit *commit = &u->commit; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3694,8 +3761,10 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ } static __be32 -nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create) +nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_create *create = &u->create; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3708,8 +3777,10 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ } static __be32 -nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr) +nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_getattr *getattr = &u->getattr; struct svc_fh *fhp = getattr->ga_fhp; struct xdr_stream *xdr = resp->xdr; @@ -3718,8 +3789,10 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 } static __be32 -nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp) +nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct svc_fh **fhpp = &u->getfh; struct xdr_stream *xdr = resp->xdr; struct svc_fh *fhp = *fhpp; unsigned int len; @@ -3773,8 +3846,10 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld) } static __be32 -nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock) +nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_lock *lock = &u->lock; struct xdr_stream *xdr = resp->xdr; if (!nfserr) @@ -3786,8 +3861,10 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo } static __be32 -nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt) +nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_lockt *lockt = &u->lockt; struct xdr_stream *xdr = resp->xdr; if (nfserr == nfserr_denied) @@ -3796,8 +3873,10 @@ nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l } static __be32 -nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku) +nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_locku *locku = &u->locku; struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &locku->lu_stateid); @@ -3805,8 +3884,10 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l static __be32 -nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link) +nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_link *link = &u->link; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3819,8 +3900,10 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li static __be32 -nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open) +nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open *open = &u->open; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -3913,16 +3996,20 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op } static __be32 -nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc) +nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open_confirm *oc = &u->open_confirm; struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid); } static __be32 -nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od) +nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_open_downgrade *od = &u->open_downgrade; struct xdr_stream *xdr = resp->xdr; return nfsd4_encode_stateid(xdr, &od->od_stateid); @@ -4021,8 +4108,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, static __be32 nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_read *read) + union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); unsigned long maxcount; struct xdr_stream *xdr = resp->xdr; @@ -4063,8 +4151,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink) +nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_readlink *readlink = &u->readlink; __be32 *p, *maxcount_p, zero = xdr_zero; struct xdr_stream *xdr = resp->xdr; int length_offset = xdr->buf->len; @@ -4108,8 +4198,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd } static __be32 -nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readdir *readdir) +nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_readdir *readdir = &u->readdir; int maxcount; int bytes_left; loff_t offset; @@ -4199,8 +4291,10 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 } static __be32 -nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove) +nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_remove *remove = &u->remove; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4212,8 +4306,10 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_ } static __be32 -nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename) +nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_rename *rename = &u->rename; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4295,8 +4391,9 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr, struct svc_export *exp) static __be32 nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_secinfo *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo *secinfo = &u->secinfo; struct xdr_stream *xdr = resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->si_exp); @@ -4304,8 +4401,9 @@ nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_secinfo_no_name *secinfo) + union nfsd4_op_u *u) { + struct nfsd4_secinfo_no_name *secinfo = &u->secinfo_no_name; struct xdr_stream *xdr = resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->sin_exp); @@ -4316,8 +4414,10 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, * regardless of the error status. */ static __be32 -nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr) +nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_setattr *setattr = &u->setattr; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4340,8 +4440,10 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4 } static __be32 -nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd) +nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_setclientid *scd = &u->setclientid; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4364,8 +4466,10 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n } static __be32 -nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write) +nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *u) { + struct nfsd4_write *write = &u->write; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4381,8 +4485,9 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w static __be32 nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_exchange_id *exid) + union nfsd4_op_u *u) { + struct nfsd4_exchange_id *exid = &u->exchange_id; struct xdr_stream *xdr = resp->xdr; __be32 *p; char *major_id; @@ -4459,8 +4564,9 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_create_session *sess) + union nfsd4_op_u *u) { + struct nfsd4_create_session *sess = &u->create_session; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4512,8 +4618,9 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_sequence *seq) + union nfsd4_op_u *u) { + struct nfsd4_sequence *seq = &u->sequence; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4535,8 +4642,9 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_test_stateid *test_stateid) + union nfsd4_op_u *u) { + struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct xdr_stream *xdr = resp->xdr; struct nfsd4_test_stateid_id *stateid, *next; __be32 *p; @@ -4556,8 +4664,9 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_getdeviceinfo *gdev) + union nfsd4_op_u *u) { + struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; u32 starting_len = xdr->buf->len, needed_len; @@ -4609,8 +4718,9 @@ nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutget *lgp) + union nfsd4_op_u *u) { + struct nfsd4_layoutget *lgp = &u->layoutget; struct xdr_stream *xdr = resp->xdr; const struct nfsd4_layout_ops *ops; __be32 *p; @@ -4636,8 +4746,9 @@ nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutcommit *lcp) + union nfsd4_op_u *u) { + struct nfsd4_layoutcommit *lcp = &u->layoutcommit; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4657,8 +4768,9 @@ nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_layoutreturn *lrp) + union nfsd4_op_u *u) { + struct nfsd4_layoutreturn *lrp = &u->layoutreturn; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4743,8 +4855,9 @@ nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns) static __be32 nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_copy *copy) + union nfsd4_op_u *u) { + struct nfsd4_copy *copy = &u->copy; __be32 *p; nfserr = nfsd42_encode_write_res(resp, ©->cp_res, @@ -4760,8 +4873,9 @@ nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_offload_status *os) + union nfsd4_op_u *u) { + struct nfsd4_offload_status *os = &u->offload_status; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4811,8 +4925,9 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, static __be32 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_read *read) + union nfsd4_op_u *u) { + struct nfsd4_read *read = &u->read; struct file *file = read->rd_nf->nf_file; struct xdr_stream *xdr = resp->xdr; int starting_len = xdr->buf->len; @@ -4848,8 +4963,9 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_copy_notify *cn) + union nfsd4_op_u *u) { + struct nfsd4_copy_notify *cn = &u->copy_notify; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -4883,8 +4999,9 @@ nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_seek *seek) + union nfsd4_op_u *u) { + struct nfsd4_seek *seek = &u->seek; __be32 *p; p = xdr_reserve_space(resp->xdr, 4 + 8); @@ -4895,7 +5012,8 @@ nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p) +nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, + union nfsd4_op_u *p) { return nfserr; } @@ -4946,8 +5064,9 @@ nfsd4_vbuf_to_stream(struct xdr_stream *xdr, char *buf, u32 buflen) static __be32 nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_getxattr *getxattr) + union nfsd4_op_u *u) { + struct nfsd4_getxattr *getxattr = &u->getxattr; struct xdr_stream *xdr = resp->xdr; __be32 *p, err; @@ -4970,8 +5089,9 @@ nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_setxattr *setxattr) + union nfsd4_op_u *u) { + struct nfsd4_setxattr *setxattr = &u->setxattr; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -5011,8 +5131,9 @@ nfsd4_listxattr_validate_cookie(struct nfsd4_listxattrs *listxattrs, static __be32 nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_listxattrs *listxattrs) + union nfsd4_op_u *u) { + struct nfsd4_listxattrs *listxattrs = &u->listxattrs; struct xdr_stream *xdr = resp->xdr; u32 cookie_offset, count_offset, eof; u32 left, xdrleft, slen, count; @@ -5122,8 +5243,9 @@ nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, - struct nfsd4_removexattr *removexattr) + union nfsd4_op_u *u) { + struct nfsd4_removexattr *removexattr = &u->removexattr; struct xdr_stream *xdr = resp->xdr; __be32 *p; @@ -5135,7 +5257,7 @@ nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, return 0; } -typedef __be32(* nfsd4_enc)(struct nfsd4_compoundres *, __be32, void *); +typedef __be32(*nfsd4_enc)(struct nfsd4_compoundres *, __be32, union nfsd4_op_u *u); /* * Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 @@ -5143,93 +5265,93 @@ typedef __be32(* nfsd4_enc)(struct nfsd4_compoundres *, __be32, void *); * done in the decoding phase. */ static const nfsd4_enc nfsd4_enc_ops[] = { - [OP_ACCESS] = (nfsd4_enc)nfsd4_encode_access, - [OP_CLOSE] = (nfsd4_enc)nfsd4_encode_close, - [OP_COMMIT] = (nfsd4_enc)nfsd4_encode_commit, - [OP_CREATE] = (nfsd4_enc)nfsd4_encode_create, - [OP_DELEGPURGE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_DELEGRETURN] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GETATTR] = (nfsd4_enc)nfsd4_encode_getattr, - [OP_GETFH] = (nfsd4_enc)nfsd4_encode_getfh, - [OP_LINK] = (nfsd4_enc)nfsd4_encode_link, - [OP_LOCK] = (nfsd4_enc)nfsd4_encode_lock, - [OP_LOCKT] = (nfsd4_enc)nfsd4_encode_lockt, - [OP_LOCKU] = (nfsd4_enc)nfsd4_encode_locku, - [OP_LOOKUP] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LOOKUPP] = (nfsd4_enc)nfsd4_encode_noop, - [OP_NVERIFY] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OPEN] = (nfsd4_enc)nfsd4_encode_open, - [OP_OPENATTR] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OPEN_CONFIRM] = (nfsd4_enc)nfsd4_encode_open_confirm, - [OP_OPEN_DOWNGRADE] = (nfsd4_enc)nfsd4_encode_open_downgrade, - [OP_PUTFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_PUTPUBFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_PUTROOTFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_READ] = (nfsd4_enc)nfsd4_encode_read, - [OP_READDIR] = (nfsd4_enc)nfsd4_encode_readdir, - [OP_READLINK] = (nfsd4_enc)nfsd4_encode_readlink, - [OP_REMOVE] = (nfsd4_enc)nfsd4_encode_remove, - [OP_RENAME] = (nfsd4_enc)nfsd4_encode_rename, - [OP_RENEW] = (nfsd4_enc)nfsd4_encode_noop, - [OP_RESTOREFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_SAVEFH] = (nfsd4_enc)nfsd4_encode_noop, - [OP_SECINFO] = (nfsd4_enc)nfsd4_encode_secinfo, - [OP_SETATTR] = (nfsd4_enc)nfsd4_encode_setattr, - [OP_SETCLIENTID] = (nfsd4_enc)nfsd4_encode_setclientid, - [OP_SETCLIENTID_CONFIRM] = (nfsd4_enc)nfsd4_encode_noop, - [OP_VERIFY] = (nfsd4_enc)nfsd4_encode_noop, - [OP_WRITE] = (nfsd4_enc)nfsd4_encode_write, - [OP_RELEASE_LOCKOWNER] = (nfsd4_enc)nfsd4_encode_noop, + [OP_ACCESS] = nfsd4_encode_access, + [OP_CLOSE] = nfsd4_encode_close, + [OP_COMMIT] = nfsd4_encode_commit, + [OP_CREATE] = nfsd4_encode_create, + [OP_DELEGPURGE] = nfsd4_encode_noop, + [OP_DELEGRETURN] = nfsd4_encode_noop, + [OP_GETATTR] = nfsd4_encode_getattr, + [OP_GETFH] = nfsd4_encode_getfh, + [OP_LINK] = nfsd4_encode_link, + [OP_LOCK] = nfsd4_encode_lock, + [OP_LOCKT] = nfsd4_encode_lockt, + [OP_LOCKU] = nfsd4_encode_locku, + [OP_LOOKUP] = nfsd4_encode_noop, + [OP_LOOKUPP] = nfsd4_encode_noop, + [OP_NVERIFY] = nfsd4_encode_noop, + [OP_OPEN] = nfsd4_encode_open, + [OP_OPENATTR] = nfsd4_encode_noop, + [OP_OPEN_CONFIRM] = nfsd4_encode_open_confirm, + [OP_OPEN_DOWNGRADE] = nfsd4_encode_open_downgrade, + [OP_PUTFH] = nfsd4_encode_noop, + [OP_PUTPUBFH] = nfsd4_encode_noop, + [OP_PUTROOTFH] = nfsd4_encode_noop, + [OP_READ] = nfsd4_encode_read, + [OP_READDIR] = nfsd4_encode_readdir, + [OP_READLINK] = nfsd4_encode_readlink, + [OP_REMOVE] = nfsd4_encode_remove, + [OP_RENAME] = nfsd4_encode_rename, + [OP_RENEW] = nfsd4_encode_noop, + [OP_RESTOREFH] = nfsd4_encode_noop, + [OP_SAVEFH] = nfsd4_encode_noop, + [OP_SECINFO] = nfsd4_encode_secinfo, + [OP_SETATTR] = nfsd4_encode_setattr, + [OP_SETCLIENTID] = nfsd4_encode_setclientid, + [OP_SETCLIENTID_CONFIRM] = nfsd4_encode_noop, + [OP_VERIFY] = nfsd4_encode_noop, + [OP_WRITE] = nfsd4_encode_write, + [OP_RELEASE_LOCKOWNER] = nfsd4_encode_noop, /* NFSv4.1 operations */ - [OP_BACKCHANNEL_CTL] = (nfsd4_enc)nfsd4_encode_noop, - [OP_BIND_CONN_TO_SESSION] = (nfsd4_enc)nfsd4_encode_bind_conn_to_session, - [OP_EXCHANGE_ID] = (nfsd4_enc)nfsd4_encode_exchange_id, - [OP_CREATE_SESSION] = (nfsd4_enc)nfsd4_encode_create_session, - [OP_DESTROY_SESSION] = (nfsd4_enc)nfsd4_encode_noop, - [OP_FREE_STATEID] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GET_DIR_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, + [OP_BACKCHANNEL_CTL] = nfsd4_encode_noop, + [OP_BIND_CONN_TO_SESSION] = nfsd4_encode_bind_conn_to_session, + [OP_EXCHANGE_ID] = nfsd4_encode_exchange_id, + [OP_CREATE_SESSION] = nfsd4_encode_create_session, + [OP_DESTROY_SESSION] = nfsd4_encode_noop, + [OP_FREE_STATEID] = nfsd4_encode_noop, + [OP_GET_DIR_DELEGATION] = nfsd4_encode_noop, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_getdeviceinfo, - [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_layoutcommit, - [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_layoutget, - [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_layoutreturn, + [OP_GETDEVICEINFO] = nfsd4_encode_getdeviceinfo, + [OP_GETDEVICELIST] = nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = nfsd4_encode_layoutcommit, + [OP_LAYOUTGET] = nfsd4_encode_layoutget, + [OP_LAYOUTRETURN] = nfsd4_encode_layoutreturn, #else - [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop, - [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop, + [OP_GETDEVICEINFO] = nfsd4_encode_noop, + [OP_GETDEVICELIST] = nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = nfsd4_encode_noop, + [OP_LAYOUTGET] = nfsd4_encode_noop, + [OP_LAYOUTRETURN] = nfsd4_encode_noop, #endif - [OP_SECINFO_NO_NAME] = (nfsd4_enc)nfsd4_encode_secinfo_no_name, - [OP_SEQUENCE] = (nfsd4_enc)nfsd4_encode_sequence, - [OP_SET_SSV] = (nfsd4_enc)nfsd4_encode_noop, - [OP_TEST_STATEID] = (nfsd4_enc)nfsd4_encode_test_stateid, - [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, - [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop, - [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_SECINFO_NO_NAME] = nfsd4_encode_secinfo_no_name, + [OP_SEQUENCE] = nfsd4_encode_sequence, + [OP_SET_SSV] = nfsd4_encode_noop, + [OP_TEST_STATEID] = nfsd4_encode_test_stateid, + [OP_WANT_DELEGATION] = nfsd4_encode_noop, + [OP_DESTROY_CLIENTID] = nfsd4_encode_noop, + [OP_RECLAIM_COMPLETE] = nfsd4_encode_noop, /* NFSv4.2 operations */ - [OP_ALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy, - [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_copy_notify, - [OP_DEALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTERROR] = (nfsd4_enc)nfsd4_encode_noop, - [OP_LAYOUTSTATS] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OFFLOAD_CANCEL] = (nfsd4_enc)nfsd4_encode_noop, - [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_offload_status, - [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus, - [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek, - [OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop, - [OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_ALLOCATE] = nfsd4_encode_noop, + [OP_COPY] = nfsd4_encode_copy, + [OP_COPY_NOTIFY] = nfsd4_encode_copy_notify, + [OP_DEALLOCATE] = nfsd4_encode_noop, + [OP_IO_ADVISE] = nfsd4_encode_noop, + [OP_LAYOUTERROR] = nfsd4_encode_noop, + [OP_LAYOUTSTATS] = nfsd4_encode_noop, + [OP_OFFLOAD_CANCEL] = nfsd4_encode_noop, + [OP_OFFLOAD_STATUS] = nfsd4_encode_offload_status, + [OP_READ_PLUS] = nfsd4_encode_read_plus, + [OP_SEEK] = nfsd4_encode_seek, + [OP_WRITE_SAME] = nfsd4_encode_noop, + [OP_CLONE] = nfsd4_encode_noop, /* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = (nfsd4_enc)nfsd4_encode_getxattr, - [OP_SETXATTR] = (nfsd4_enc)nfsd4_encode_setxattr, - [OP_LISTXATTRS] = (nfsd4_enc)nfsd4_encode_listxattrs, - [OP_REMOVEXATTR] = (nfsd4_enc)nfsd4_encode_removexattr, + [OP_GETXATTR] = nfsd4_encode_getxattr, + [OP_SETXATTR] = nfsd4_encode_setxattr, + [OP_LISTXATTRS] = nfsd4_encode_listxattrs, + [OP_REMOVEXATTR] = nfsd4_encode_removexattr, }; /* From f31bc0bc12f3062feba2bb56c2c4a9bb27bae580 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Sun, 11 Dec 2022 06:19:33 -0500 Subject: [PATCH 1020/1565] nfsd: rework refcounting in filecache [ Upstream commit ac3a2585f018f10039b4a856dcb122da88c1c1c9 ] The filecache refcounting is a bit non-standard for something searchable by RCU, in that we maintain a sentinel reference while it's hashed. This in turn requires that we have to do things differently in the "put" depending on whether its hashed, which we believe to have led to races. There are other problems in here too. nfsd_file_close_inode_sync can end up freeing an nfsd_file while there are still outstanding references to it, and there are a number of subtle ToC/ToU races. Rework the code so that the refcount is what drives the lifecycle. When the refcount goes to zero, then unhash and rcu free the object. A task searching for a nfsd_file is allowed to bump its refcount, but only if it's not already 0. Ensure that we don't make any other changes to it until a reference is held. With this change, the LRU carries a reference. Take special care to deal with it when removing an entry from the list, and ensure that we only repurpose the nf_lru list_head when the refcount is 0 to ensure exclusive access to it. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 328 +++++++++++++++++++++++--------------------- fs/nfsd/trace.h | 51 +++---- 2 files changed, 194 insertions(+), 185 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6b8873b0c2c3..140094a44cc4 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -323,8 +323,7 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) if (key->gc) __set_bit(NFSD_FILE_GC, &nf->nf_flags); nf->nf_inode = key->inode; - /* nf_ref is pre-incremented for hash table */ - refcount_set(&nf->nf_ref, 2); + refcount_set(&nf->nf_ref, 1); nf->nf_may = key->need; nf->nf_mark = NULL; } @@ -376,24 +375,35 @@ nfsd_file_unhash(struct nfsd_file *nf) return false; } -static bool +static void nfsd_file_free(struct nfsd_file *nf) { s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); - bool flush = false; trace_nfsd_file_free(nf); this_cpu_inc(nfsd_file_releases); this_cpu_add(nfsd_file_total_age, age); + nfsd_file_unhash(nf); + + /* + * We call fsync here in order to catch writeback errors. It's not + * strictly required by the protocol, but an nfsd_file could get + * evicted from the cache before a COMMIT comes in. If another + * task were to open that file in the interim and scrape the error, + * then the client may never see it. By calling fsync here, we ensure + * that writeback happens before the entry is freed, and that any + * errors reported result in the write verifier changing. + */ + nfsd_file_fsync(nf); + if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { get_file(nf->nf_file); filp_close(nf->nf_file, NULL); fput(nf->nf_file); - flush = true; } /* @@ -401,10 +411,9 @@ nfsd_file_free(struct nfsd_file *nf) * WARN and leak it to preserve system stability. */ if (WARN_ON_ONCE(!list_empty(&nf->nf_lru))) - return flush; + return; call_rcu(&nf->nf_rcu, nfsd_file_slab_free); - return flush; } static bool @@ -420,17 +429,23 @@ nfsd_file_check_writeback(struct nfsd_file *nf) mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); } -static void nfsd_file_lru_add(struct nfsd_file *nf) +static bool nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); - if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) + if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) { trace_nfsd_file_lru_add(nf); + return true; + } + return false; } -static void nfsd_file_lru_remove(struct nfsd_file *nf) +static bool nfsd_file_lru_remove(struct nfsd_file *nf) { - if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) + if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) { trace_nfsd_file_lru_del(nf); + return true; + } + return false; } struct nfsd_file * @@ -441,54 +456,48 @@ nfsd_file_get(struct nfsd_file *nf) return NULL; } -static void -nfsd_file_unhash_and_queue(struct nfsd_file *nf, struct list_head *dispose) -{ - trace_nfsd_file_unhash_and_queue(nf); - if (nfsd_file_unhash(nf)) { - /* caller must call nfsd_file_dispose_list() later */ - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, dispose); - } -} - -static void -nfsd_file_put_noref(struct nfsd_file *nf) -{ - trace_nfsd_file_put(nf); - - if (refcount_dec_and_test(&nf->nf_ref)) { - WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); - nfsd_file_lru_remove(nf); - nfsd_file_free(nf); - } -} - -static void -nfsd_file_unhash_and_put(struct nfsd_file *nf) -{ - if (nfsd_file_unhash(nf)) - nfsd_file_put_noref(nf); -} - +/** + * nfsd_file_put - put the reference to a nfsd_file + * @nf: nfsd_file of which to put the reference + * + * Put a reference to a nfsd_file. In the non-GC case, we just put the + * reference immediately. In the GC case, if the reference would be + * the last one, the put it on the LRU instead to be cleaned up later. + */ void nfsd_file_put(struct nfsd_file *nf) { might_sleep(); + trace_nfsd_file_put(nf); - if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) - nfsd_file_lru_add(nf); - else if (refcount_read(&nf->nf_ref) == 2) - nfsd_file_unhash_and_put(nf); + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) && + test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + /* + * If this is the last reference (nf_ref == 1), then try to + * transfer it to the LRU. + */ + if (refcount_dec_not_one(&nf->nf_ref)) + return; - if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_fsync(nf); - nfsd_file_put_noref(nf); - } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) { - nfsd_file_put_noref(nf); - nfsd_file_schedule_laundrette(); - } else - nfsd_file_put_noref(nf); + /* Try to add it to the LRU. If that fails, decrement. */ + if (nfsd_file_lru_add(nf)) { + /* If it's still hashed, we're done */ + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + nfsd_file_schedule_laundrette(); + return; + } + + /* + * We're racing with unhashing, so try to remove it from + * the LRU. If removal fails, then someone else already + * has our reference. + */ + if (!nfsd_file_lru_remove(nf)) + return; + } + } + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); } static void @@ -496,33 +505,13 @@ nfsd_file_dispose_list(struct list_head *dispose) { struct nfsd_file *nf; - while(!list_empty(dispose)) { + while (!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); list_del_init(&nf->nf_lru); - nfsd_file_fsync(nf); - nfsd_file_put_noref(nf); + nfsd_file_free(nf); } } -static void -nfsd_file_dispose_list_sync(struct list_head *dispose) -{ - bool flush = false; - struct nfsd_file *nf; - - while(!list_empty(dispose)) { - nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del_init(&nf->nf_lru); - nfsd_file_fsync(nf); - if (!refcount_dec_and_test(&nf->nf_ref)) - continue; - if (nfsd_file_free(nf)) - flush = true; - } - if (flush) - flush_delayed_fput(); -} - static void nfsd_file_list_remove_disposal(struct list_head *dst, struct nfsd_fcache_disposal *l) @@ -590,21 +579,8 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, struct list_head *head = arg; struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru); - /* - * Do a lockless refcount check. The hashtable holds one reference, so - * we look to see if anything else has a reference, or if any have - * been put since the shrinker last ran. Those don't get unhashed and - * released. - * - * Note that in the put path, we set the flag and then decrement the - * counter. Here we check the counter and then test and clear the flag. - * That order is deliberate to ensure that we can do this locklessly. - */ - if (refcount_read(&nf->nf_ref) > 1) { - list_lru_isolate(lru, &nf->nf_lru); - trace_nfsd_file_gc_in_use(nf); - return LRU_REMOVED; - } + /* We should only be dealing with GC entries here */ + WARN_ON_ONCE(!test_bit(NFSD_FILE_GC, &nf->nf_flags)); /* * Don't throw out files that are still undergoing I/O or @@ -615,40 +591,30 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, return LRU_SKIP; } + /* If it was recently added to the list, skip it */ if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { trace_nfsd_file_gc_referenced(nf); return LRU_ROTATE; } - if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - trace_nfsd_file_gc_hashed(nf); - return LRU_SKIP; + /* + * Put the reference held on behalf of the LRU. If it wasn't the last + * one, then just remove it from the LRU and ignore it. + */ + if (!refcount_dec_and_test(&nf->nf_ref)) { + trace_nfsd_file_gc_in_use(nf); + list_lru_isolate(lru, &nf->nf_lru); + return LRU_REMOVED; } + /* Refcount went to zero. Unhash it and queue it to the dispose list */ + nfsd_file_unhash(nf); list_lru_isolate_move(lru, &nf->nf_lru, head); this_cpu_inc(nfsd_file_evictions); trace_nfsd_file_gc_disposed(nf); return LRU_REMOVED; } -/* - * Unhash items on @dispose immediately, then queue them on the - * disposal workqueue to finish releasing them in the background. - * - * cel: Note that between the time list_lru_shrink_walk runs and - * now, these items are in the hash table but marked unhashed. - * Why release these outside of lru_cb ? There's no lock ordering - * problem since lru_cb currently takes no lock. - */ -static void nfsd_file_gc_dispose_list(struct list_head *dispose) -{ - struct nfsd_file *nf; - - list_for_each_entry(nf, dispose, nf_lru) - nfsd_file_hash_remove(nf); - nfsd_file_dispose_list_delayed(dispose); -} - static void nfsd_file_gc(void) { @@ -658,7 +624,7 @@ nfsd_file_gc(void) ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, &dispose, list_lru_count(&nfsd_file_lru)); trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_gc_dispose_list(&dispose); + nfsd_file_dispose_list_delayed(&dispose); } static void @@ -684,7 +650,7 @@ nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) ret = list_lru_shrink_walk(&nfsd_file_lru, sc, nfsd_file_lru_cb, &dispose); trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_gc_dispose_list(&dispose); + nfsd_file_dispose_list_delayed(&dispose); return ret; } @@ -694,72 +660,111 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, }; -/* - * Find all cache items across all net namespaces that match @inode and - * move them to @dispose. The lookup is atomic wrt nfsd_file_acquire(). +/** + * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode + * @inode: inode on which to close out nfsd_files + * @dispose: list on which to gather nfsd_files to close out + * + * An nfsd_file represents a struct file being held open on behalf of nfsd. An + * open file however can block other activity (such as leases), or cause + * undesirable behavior (e.g. spurious silly-renames when reexporting NFS). + * + * This function is intended to find open nfsd_files when this sort of + * conflicting access occurs and then attempt to close those files out. + * + * Populates the dispose list with entries that have already had their + * refcounts go to zero. The actual free of an nfsd_file can be expensive, + * so we leave it up to the caller whether it wants to wait or not. */ -static unsigned int -__nfsd_file_close_inode(struct inode *inode, struct list_head *dispose) +static void +nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, }; - unsigned int count = 0; struct nfsd_file *nf; rcu_read_lock(); do { + int decrement = 1; + nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); if (!nf) break; - nfsd_file_unhash_and_queue(nf, dispose); - count++; + + /* If we raced with someone else unhashing, ignore it */ + if (!nfsd_file_unhash(nf)) + continue; + + /* If we can't get a reference, ignore it */ + if (!nfsd_file_get(nf)) + continue; + + /* Extra decrement if we remove from the LRU */ + if (nfsd_file_lru_remove(nf)) + ++decrement; + + /* If refcount goes to 0, then put on the dispose list */ + if (refcount_sub_and_test(decrement, &nf->nf_ref)) { + list_add(&nf->nf_lru, dispose); + trace_nfsd_file_closing(nf); + } } while (1); rcu_read_unlock(); - return count; -} - -/** - * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file - * @inode: inode of the file to attempt to remove - * - * Unhash and put, then flush and fput all cache items associated with @inode. - */ -void -nfsd_file_close_inode_sync(struct inode *inode) -{ - LIST_HEAD(dispose); - unsigned int count; - - count = __nfsd_file_close_inode(inode, &dispose); - trace_nfsd_file_close_inode_sync(inode, count); - nfsd_file_dispose_list_sync(&dispose); } /** * nfsd_file_close_inode - attempt a delayed close of a nfsd_file * @inode: inode of the file to attempt to remove * - * Unhash and put all cache item associated with @inode. + * Close out any open nfsd_files that can be reaped for @inode. The + * actual freeing is deferred to the dispose_list_delayed infrastructure. + * + * This is used by the fsnotify callbacks and setlease notifier. */ static void nfsd_file_close_inode(struct inode *inode) { LIST_HEAD(dispose); - unsigned int count; - count = __nfsd_file_close_inode(inode, &dispose); - trace_nfsd_file_close_inode(inode, count); + nfsd_file_queue_for_close(inode, &dispose); nfsd_file_dispose_list_delayed(&dispose); } +/** + * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file + * @inode: inode of the file to attempt to remove + * + * Close out any open nfsd_files that can be reaped for @inode. The + * nfsd_files are closed out synchronously. + * + * This is called from nfsd_rename and nfsd_unlink to avoid silly-renames + * when reexporting NFS. + */ +void +nfsd_file_close_inode_sync(struct inode *inode) +{ + struct nfsd_file *nf; + LIST_HEAD(dispose); + + trace_nfsd_file_close(inode); + + nfsd_file_queue_for_close(inode, &dispose); + while (!list_empty(&dispose)) { + nf = list_first_entry(&dispose, struct nfsd_file, nf_lru); + list_del_init(&nf->nf_lru); + nfsd_file_free(nf); + } + flush_delayed_fput(); +} + /** * nfsd_file_delayed_close - close unused nfsd_files * @work: dummy * - * Walk the LRU list and close any entries that have not been used since + * Walk the LRU list and destroy any entries that have not been used since * the last scan. */ static void @@ -781,7 +786,7 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg, /* Only close files for F_SETLEASE leases */ if (fl->fl_flags & FL_LEASE) - nfsd_file_close_inode_sync(file_inode(fl->fl_file)); + nfsd_file_close_inode(file_inode(fl->fl_file)); return 0; } @@ -902,6 +907,13 @@ nfsd_file_cache_init(void) goto out; } +/** + * __nfsd_file_cache_purge: clean out the cache for shutdown + * @net: net-namespace to shut down the cache (may be NULL) + * + * Walk the nfsd_file cache and close out any that match @net. If @net is NULL, + * then close out everything. Called when an nfsd instance is being shut down. + */ static void __nfsd_file_cache_purge(struct net *net) { @@ -915,8 +927,11 @@ __nfsd_file_cache_purge(struct net *net) nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (!net || nf->nf_net == net) - nfsd_file_unhash_and_queue(nf, &dispose); + if (!net || nf->nf_net == net) { + nfsd_file_unhash(nf); + nfsd_file_lru_remove(nf); + list_add(&nf->nf_lru, &dispose); + } nf = rhashtable_walk_next(&iter); } @@ -1083,8 +1098,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (nf) nf = nfsd_file_get(nf); rcu_read_unlock(); - if (nf) + + if (nf) { + if (nfsd_file_lru_remove(nf)) + WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref)); goto wait_for_construction; + } nf = nfsd_file_alloc(&key, may_flags); if (!nf) { @@ -1117,11 +1136,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; } open_retry = false; - nfsd_file_put_noref(nf); + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); goto retry; } - nfsd_file_lru_remove(nf); this_cpu_inc(nfsd_file_cache_hits); status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); @@ -1131,7 +1150,8 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { - nfsd_file_put(nf); + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); nf = NULL; } @@ -1157,8 +1177,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status != nfs_ok || key.inode->i_nlink == 0) - nfsd_file_unhash_and_put(nf); + if (status == nfs_ok && key.inode->i_nlink == 0) + status = nfserr_jukebox; + if (status != nfs_ok) + nfsd_file_unhash(nf); clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); smp_mb__after_atomic(); wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 235ea38da864..8db7eecde9e3 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -786,8 +786,8 @@ DEFINE_CLID_EVENT(confirmed_r); __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \ - { 1 << NFSD_FILE_GC, "GC"}) + { 1 << NFSD_FILE_REFERENCED, "REFERENCED" }, \ + { 1 << NFSD_FILE_GC, "GC" }) DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), @@ -822,6 +822,7 @@ DEFINE_EVENT(nfsd_file_class, name, \ DEFINE_NFSD_FILE_EVENT(nfsd_file_free); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); +DEFINE_NFSD_FILE_EVENT(nfsd_file_closing); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_queue); TRACE_EVENT(nfsd_file_alloc, @@ -1013,35 +1014,6 @@ TRACE_EVENT(nfsd_file_open, __entry->nf_file) ) -DECLARE_EVENT_CLASS(nfsd_file_search_class, - TP_PROTO( - const struct inode *inode, - unsigned int count - ), - TP_ARGS(inode, count), - TP_STRUCT__entry( - __field(const struct inode *, inode) - __field(unsigned int, count) - ), - TP_fast_assign( - __entry->inode = inode; - __entry->count = count; - ), - TP_printk("inode=%p count=%u", - __entry->inode, __entry->count) -); - -#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ -DEFINE_EVENT(nfsd_file_search_class, name, \ - TP_PROTO( \ - const struct inode *inode, \ - unsigned int count \ - ), \ - TP_ARGS(inode, count)) - -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); -DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); - TRACE_EVENT(nfsd_file_is_cached, TP_PROTO( const struct inode *inode, @@ -1119,7 +1091,6 @@ DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_hashed); DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed); DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, @@ -1151,6 +1122,22 @@ DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed); +TRACE_EVENT(nfsd_file_close, + TP_PROTO( + const struct inode *inode + ), + TP_ARGS(inode), + TP_STRUCT__entry( + __field(const void *, inode) + ), + TP_fast_assign( + __entry->inode = inode; + ), + TP_printk("inode=%p", + __entry->inode + ) +); + TRACE_EVENT(nfsd_file_fsync, TP_PROTO( const struct nfsd_file *nf, From 45c08a752982116f3287afcd1bd9c50f4fab0c28 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 5 Jan 2023 14:55:56 -0500 Subject: [PATCH 1021/1565] nfsd: fix handling of cached open files in nfsd4_open codepath [ Upstream commit 0b3a551fa58b4da941efeb209b3770868e2eddd7 ] Commit fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") added the ability to cache an open fd over a compound. There are a couple of problems with the way this currently works: It's racy, as a newly-created nfsd_file can end up with its PENDING bit cleared while the nf is hashed, and the nf_file pointer is still zeroed out. Other tasks can find it in this state and they expect to see a valid nf_file, and can oops if nf_file is NULL. Also, there is no guarantee that we'll end up creating a new nfsd_file if one is already in the hash. If an extant entry is in the hash with a valid nf_file, nfs4_get_vfs_file will clobber its nf_file pointer with the value of op_file and the old nf_file will leak. Fix both issues by making a new nfsd_file_acquirei_opened variant that takes an optional file pointer. If one is present when this is called, we'll take a new reference to it instead of trying to open the file. If the nfsd_file already has a valid nf_file, we'll just ignore the optional file and pass the nfsd_file back as-is. Also rework the tracepoints a bit to allow for an "opened" variant and don't try to avoid counting acquisitions in the case where we already have a cached open file. Fixes: fb70bf124b05 ("NFSD: Instantiate a struct file when creating a regular NFSv4 file") Cc: Trond Myklebust Reported-by: Stanislav Saner Reported-and-Tested-by: Ruben Vestergaard Reported-and-Tested-by: Torkil Svensgaard Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 40 ++++++++++++++++++---------------- fs/nfsd/filecache.h | 5 +++-- fs/nfsd/nfs4state.c | 16 ++++---------- fs/nfsd/trace.h | 52 ++++++++++++--------------------------------- 4 files changed, 42 insertions(+), 71 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 140094a44cc4..6a62d95d5ce6 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1070,8 +1070,8 @@ nfsd_file_is_cached(struct inode *inode) static __be32 nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf, - bool open, bool want_gc) + unsigned int may_flags, struct file *file, + struct nfsd_file **pnf, bool want_gc) { struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_FULL, @@ -1146,8 +1146,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); out: if (status == nfs_ok) { - if (open) - this_cpu_inc(nfsd_file_acquisitions); + this_cpu_inc(nfsd_file_acquisitions); *pnf = nf; } else { if (refcount_dec_and_test(&nf->nf_ref)) @@ -1157,20 +1156,23 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, out_status: put_cred(key.cred); - if (open) - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status; open_file: trace_nfsd_file_alloc(nf); nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); if (nf->nf_mark) { - if (open) { + if (file) { + get_file(file); + nf->nf_file = file; + status = nfs_ok; + trace_nfsd_file_opened(nf, status); + } else { status = nfsd_open_verified(rqstp, fhp, may_flags, &nf->nf_file); trace_nfsd_file_open(nf, status); - } else - status = nfs_ok; + } } else status = nfserr_jukebox; /* @@ -1206,7 +1208,7 @@ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true); } /** @@ -1227,28 +1229,30 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false); } /** - * nfsd_file_create - Get a struct nfsd_file, do not open + * nfsd_file_acquire_opened - Get a struct nfsd_file using existing open file * @rqstp: the RPC transaction being executed * @fhp: the NFS filehandle of the file just created * @may_flags: NFSD_MAY_ settings for the file + * @file: cached, already-open file (may be NULL) * @pnf: OUT: new or found "struct nfsd_file" object * - * The nfsd_file_object returned by this API is reference-counted - * but not garbage-collected. The object is released immediately - * one RCU grace period after the final nfsd_file_put(). + * Acquire a nfsd_file object that is not GC'ed. If one doesn't already exist, + * and @file is non-NULL, use it to instantiate a new nfsd_file instead of + * opening a new one. * * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in * network byte order is returned. */ __be32 -nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) +nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct file *file, + struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false); + return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false); } /* diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index b7efb2c3ddb1..41516a4263ea 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -60,7 +60,8 @@ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); -__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **nfp); +__be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct file *file, + struct nfsd_file **nfp); int nfsd_file_cache_stats_show(struct seq_file *m, void *v); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 4e05ad774c86..5eb3e055fde4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5261,18 +5261,10 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, if (!fp->fi_fds[oflag]) { spin_unlock(&fp->fi_lock); - if (!open->op_filp) { - status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); - if (status != nfs_ok) - goto out_put_access; - } else { - status = nfsd_file_create(rqstp, cur_fh, access, &nf); - if (status != nfs_ok) - goto out_put_access; - nf->nf_file = open->op_filp; - open->op_filp = NULL; - trace_nfsd_file_create(rqstp, access, nf); - } + status = nfsd_file_acquire_opened(rqstp, cur_fh, access, + open->op_filp, &nf); + if (status != nfs_ok) + goto out_put_access; spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 8db7eecde9e3..0f674982785c 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -891,43 +891,6 @@ TRACE_EVENT(nfsd_file_acquire, ) ); -TRACE_EVENT(nfsd_file_create, - TP_PROTO( - const struct svc_rqst *rqstp, - unsigned int may_flags, - const struct nfsd_file *nf - ), - - TP_ARGS(rqstp, may_flags, nf), - - TP_STRUCT__entry( - __field(const void *, nf_inode) - __field(const void *, nf_file) - __field(unsigned long, may_flags) - __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(unsigned int, nf_ref) - __field(u32, xid) - ), - - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_file = nf->nf_file; - __entry->may_flags = may_flags; - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->xid = be32_to_cpu(rqstp->rq_xid); - ), - - TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", - __entry->xid, __entry->nf_inode, - show_nfsd_may_flags(__entry->may_flags), - __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), __entry->nf_file - ) -); - TRACE_EVENT(nfsd_file_insert_err, TP_PROTO( const struct svc_rqst *rqstp, @@ -989,8 +952,8 @@ TRACE_EVENT(nfsd_file_cons_err, ) ); -TRACE_EVENT(nfsd_file_open, - TP_PROTO(struct nfsd_file *nf, __be32 status), +DECLARE_EVENT_CLASS(nfsd_file_open_class, + TP_PROTO(const struct nfsd_file *nf, __be32 status), TP_ARGS(nf, status), TP_STRUCT__entry( __field(void *, nf_inode) /* cannot be dereferenced */ @@ -1014,6 +977,17 @@ TRACE_EVENT(nfsd_file_open, __entry->nf_file) ) +#define DEFINE_NFSD_FILE_OPEN_EVENT(name) \ +DEFINE_EVENT(nfsd_file_open_class, name, \ + TP_PROTO( \ + const struct nfsd_file *nf, \ + __be32 status \ + ), \ + TP_ARGS(nf, status)) + +DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_open); +DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_opened); + TRACE_EVENT(nfsd_file_is_cached, TP_PROTO( const struct inode *inode, From 115b58b56f8875569cbea90cca607d56382de614 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 6 Jan 2023 12:43:37 -0500 Subject: [PATCH 1022/1565] Revert "SUNRPC: Use RMW bitops in single-threaded hot paths" [ Upstream commit 7827c81f0248e3c2f40d438b020f3d222f002171 ] The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Found via KCSAN. Fixes: 28df0988815f ("SUNRPC: Use RMW bitops in single-threaded hot paths") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 7 +++---- fs/nfsd/nfs4xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 4 ++-- net/sunrpc/svc.c | 6 +++--- net/sunrpc/svc_xprt.c | 2 +- net/sunrpc/svcsock.c | 8 ++++---- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 7 files changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index cf558d951ec6..a89f98fa3a9d 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -937,7 +937,7 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * the client wants us to do more in this compound: */ if (!nfsd4_last_compound_op(rqstp)) - __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* check stateid */ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, @@ -2609,12 +2609,11 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) cstate->minorversion = args->minorversion; fh_init(current_fh, NFS4_FHSIZE); fh_init(save_fh, NFS4_FHSIZE); - /* * Don't use the deferral mechanism for NFSv4; compounds make it * too hard to avoid non-idempotency problems. */ - __clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + clear_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); /* * According to RFC3010, this takes precedence over all other errors. @@ -2736,7 +2735,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) out: cstate->status = status; /* Reset deferral mechanism for RPC deferrals */ - __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); return rpc_success; } diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 8bbf85da1de0..d0b9fbf189ac 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2523,7 +2523,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE; if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) - __clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); return true; } diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index abaa952ae751..329eac782cc5 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -898,7 +898,7 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct g * rejecting the server-computed MIC in this somewhat rare case, * do not use splice with the GSS integrity service. */ - __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Did we already verify the signature on the original pass through? */ if (rqstp->rq_deferred) @@ -970,7 +970,7 @@ unwrap_priv_data(struct svc_rqst *rqstp, struct xdr_buf *buf, u32 seq, struct gs int pad, remaining_len, offset; u32 rseqno; - __clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags); priv_len = svc_getnl(&buf->head[0]); if (rqstp->rq_deferred) { diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 8c2918179649..26d972c54a59 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1276,10 +1276,10 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto err_short_len; /* Will be turned off by GSS integrity and privacy services */ - __set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); + set_bit(RQ_SPLICE_OK, &rqstp->rq_flags); /* Will be turned off only when NFSv4 Sessions are used */ - __set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); - __clear_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_USEDEFERRAL, &rqstp->rq_flags); + clear_bit(RQ_DROPME, &rqstp->rq_flags); svc_putu32(resv, rqstp->rq_xid); diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index ea65036dc506..000b737784bd 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -1228,7 +1228,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) trace_svc_defer(rqstp); svc_xprt_get(rqstp->rq_xprt); dr->xprt = rqstp->rq_xprt; - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags); dr->handle.revisit = svc_revisit; return &dr->handle; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index ff8dcebbdfc4..90f6231d6ed6 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -309,9 +309,9 @@ static void svc_sock_setbufsize(struct svc_sock *svsk, unsigned int nreqs) static void svc_sock_secure_port(struct svc_rqst *rqstp) { if (svc_port_is_privileged(svc_addr(rqstp))) - __set_bit(RQ_SECURE, &rqstp->rq_flags); + set_bit(RQ_SECURE, &rqstp->rq_flags); else - __clear_bit(RQ_SECURE, &rqstp->rq_flags); + clear_bit(RQ_SECURE, &rqstp->rq_flags); } /* @@ -1014,9 +1014,9 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp) rqstp->rq_xprt_ctxt = NULL; rqstp->rq_prot = IPPROTO_TCP; if (test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags)) - __set_bit(RQ_LOCAL, &rqstp->rq_flags); + set_bit(RQ_LOCAL, &rqstp->rq_flags); else - __clear_bit(RQ_LOCAL, &rqstp->rq_flags); + clear_bit(RQ_LOCAL, &rqstp->rq_flags); p = (__be32 *)rqstp->rq_arg.head[0].iov_base; calldir = p[1]; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index e72e36989985..c895f80df659 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -606,7 +606,7 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt) static void svc_rdma_secure_port(struct svc_rqst *rqstp) { - __set_bit(RQ_SECURE, &rqstp->rq_flags); + set_bit(RQ_SECURE, &rqstp->rq_flags); } static void svc_rdma_kill_temp_xprt(struct svc_xprt *xprt) From 2ecc439931ef44ec37a220185a4f0718e9159c98 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 7 Jan 2023 10:15:35 -0500 Subject: [PATCH 1023/1565] NFSD: Use set_bit(RQ_DROPME) [ Upstream commit 5304930dbae82d259bcf7e5611db7c81e7a42eff ] The premise that "Once an svc thread is scheduled and executing an RPC, no other processes will touch svc_rqst::rq_flags" is false. svc_xprt_enqueue() examines the RQ_BUSY flag in scheduled nfsd threads when determining which thread to wake up next. Fixes: 9315564747cb ("NFSD: Use only RQ_DROPME to signal the need to drop a reply") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsproc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index c32a871f0e27..96426dea7d41 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -211,7 +211,7 @@ nfsd_proc_read(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; } @@ -246,7 +246,7 @@ nfsd_proc_write(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - __set_bit(RQ_DROPME, &rqstp->rq_flags); + set_bit(RQ_DROPME, &rqstp->rq_flags); return rpc_success; } From 6ac4c383c39f8f2f955f868d1ad9365c2363e80b Mon Sep 17 00:00:00 2001 From: Xingyuan Mo Date: Thu, 12 Jan 2023 00:24:53 +0800 Subject: [PATCH 1024/1565] NFSD: fix use-after-free in nfsd4_ssc_setup_dul() [ Upstream commit e6cf91b7b47ff82b624bdfe2fdcde32bb52e71dd ] If signal_pending() returns true, schedule_timeout() will not be executed, causing the waiting task to remain in the wait queue. Fixed by adding a call to finish_wait(), which ensures that the waiting task will always be removed from the wait queue. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Xingyuan Mo Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index a89f98fa3a9d..4ab063a2ac84 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1320,6 +1320,7 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, /* allow 20secs for mount/unmount for now - revisit */ if (signal_pending(current) || (schedule_timeout(20*HZ) == 0)) { + finish_wait(&nn->nfsd_ssc_waitq, &wait); kfree(work); return nfserr_eagain; } From 438ef64bbfe44f99f0ffe890af44b687c12d089f Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 11 Jan 2023 12:17:09 -0800 Subject: [PATCH 1025/1565] NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time [ Upstream commit f385f7d244134246f984975ed34cd75f77de479f ] Currently the nfsd-client shrinker is registered and unregistered at the time the nfsd module is loaded and unloaded. The problem with this is the shrinker is being registered before all of the relevant fields in nfsd_net are initialized when nfsd is started. This can lead to an oops when memory is low and the shrinker is called while nfsd is not running. This patch moves the register/unregister of nfsd-client shrinker from module load/unload time to nfsd startup/shutdown time. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith Signed-off-by: Dai Ngo [ cel: adjusted to apply without e33c267ab70d ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 22 +++++++++++----------- fs/nfsd/nfsctl.c | 7 +------ fs/nfsd/nfsd.h | 6 ++---- 3 files changed, 14 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 5eb3e055fde4..f2647cbc108e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4420,7 +4420,7 @@ nfsd4_state_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) return SHRINK_STOP; } -int +void nfsd4_init_leases_net(struct nfsd_net *nn) { struct sysinfo si; @@ -4442,16 +4442,6 @@ nfsd4_init_leases_net(struct nfsd_net *nn) nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); atomic_set(&nn->nfsd_courtesy_clients, 0); - nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; - nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; - nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; - return register_shrinker(&nn->nfsd_client_shrinker); -} - -void -nfsd4_leases_net_shutdown(struct nfsd_net *nn) -{ - unregister_shrinker(&nn->nfsd_client_shrinker); } static void init_nfs4_replay(struct nfs4_replay *rp) @@ -8079,8 +8069,17 @@ static int nfs4_state_create_net(struct net *net) INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net); + nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; + nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; + nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; + + if (register_shrinker(&nn->nfsd_client_shrinker)) + goto err_shrinker; return 0; +err_shrinker: + put_net(net); + kfree(nn->sessionid_hashtbl); err_sessionid: kfree(nn->unconf_id_hashtbl); err_unconf_id: @@ -8173,6 +8172,7 @@ nfs4_state_shutdown_net(struct net *net) struct list_head *pos, *next, reaplist; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + unregister_shrinker(&nn->nfsd_client_shrinker); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager); diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index d1e581a60480..c2577ee7ffb2 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1457,9 +1457,7 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; - retval = nfsd4_init_leases_net(nn); - if (retval) - goto out_drc_error; + nfsd4_init_leases_net(nn); retval = nfsd_reply_cache_init(nn); if (retval) goto out_cache_error; @@ -1469,8 +1467,6 @@ static __net_init int nfsd_init_net(struct net *net) return 0; out_cache_error: - nfsd4_leases_net_shutdown(nn); -out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error: nfsd_export_shutdown(net); @@ -1486,7 +1482,6 @@ static __net_exit void nfsd_exit_net(struct net *net) nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); nfsd_netns_free_versions(net_generic(net, nfsd_net_id)); - nfsd4_leases_net_shutdown(nn); } static struct pernet_operations nfsd_net_ops = { diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 93b42ef9ed91..fa0144a74267 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -504,8 +504,7 @@ extern void unregister_cld_notifier(void); extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); #endif -extern int nfsd4_init_leases_net(struct nfsd_net *nn); -extern void nfsd4_leases_net_shutdown(struct nfsd_net *nn); +extern void nfsd4_init_leases_net(struct nfsd_net *nn); #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) @@ -513,8 +512,7 @@ static inline int nfsd4_is_junction(struct dentry *dentry) return 0; } -static inline int nfsd4_init_leases_net(struct nfsd_net *nn) { return 0; }; -static inline void nfsd4_leases_net_shutdown(struct nfsd_net *nn) {}; +static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { }; #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0) From 2bbf10861d51dae76c6da7113516d0071c782653 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 11 Jan 2023 16:06:51 -0800 Subject: [PATCH 1026/1565] NFSD: replace delayed_work with work_struct for nfsd_client_shrinker [ Upstream commit 7c24fa225081f31bc6da6a355c1ba801889ab29a ] Since nfsd4_state_shrinker_count always calls mod_delayed_work with 0 delay, we can replace delayed_work with work_struct to save some space and overhead. Also add the call to cancel_work after unregister the shrinker in nfs4_state_shutdown_net. Signed-off-by: Dai Ngo Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/netns.h | 2 +- fs/nfsd/nfs4state.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 8c854ba3285b..51a4b7885cae 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -195,7 +195,7 @@ struct nfsd_net { atomic_t nfsd_courtesy_clients; struct shrinker nfsd_client_shrinker; - struct delayed_work nfsd_shrinker_work; + struct work_struct nfsd_shrinker_work; }; /* Simple check to find out if a given net was properly initialized */ diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f2647cbc108e..3dd64caf0615 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4410,7 +4410,7 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) if (!count) count = atomic_long_read(&num_delegations); if (count) - mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + queue_work(laundry_wq, &nn->nfsd_shrinker_work); return (unsigned long)count; } @@ -6223,8 +6223,7 @@ deleg_reaper(struct nfsd_net *nn) static void nfsd4_state_shrinker_worker(struct work_struct *work) { - struct delayed_work *dwork = to_delayed_work(work); - struct nfsd_net *nn = container_of(dwork, struct nfsd_net, + struct nfsd_net *nn = container_of(work, struct nfsd_net, nfsd_shrinker_work); courtesy_client_reaper(nn); @@ -8066,7 +8065,7 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru); INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); - INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); + INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net); nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; @@ -8173,6 +8172,7 @@ nfs4_state_shutdown_net(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id); unregister_shrinker(&nn->nfsd_client_shrinker); + cancel_work(&nn->nfsd_shrinker_work); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager); From e997a230d854c65626a5463ea447c02ac3710747 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 20 Jan 2023 14:52:14 -0500 Subject: [PATCH 1027/1565] nfsd: don't free files unconditionally in __nfsd_file_cache_purge [ Upstream commit 4bdbba54e9b1c769da8ded9abd209d765715e1d6 ] nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard Reported-by: Torkil Svensgaard Reported-by: Shachar Kagan Signed-off-by: Jeff Layton Tested-by: Shachar Kagan Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 61 ++++++++++++++++++++++++++------------------- 1 file changed, 36 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6a62d95d5ce6..68c7c82f8b3b 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -660,6 +660,39 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, }; +/** + * nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file + * @nf: nfsd_file to attempt to queue + * @dispose: private list to queue successfully-put objects + * + * Unhash an nfsd_file, try to get a reference to it, and then put that + * reference. If it's the last reference, queue it to the dispose list. + */ +static void +nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) + __must_hold(RCU) +{ + int decrement = 1; + + /* If we raced with someone else unhashing, ignore it */ + if (!nfsd_file_unhash(nf)) + return; + + /* If we can't get a reference, ignore it */ + if (!nfsd_file_get(nf)) + return; + + /* Extra decrement if we remove from the LRU */ + if (nfsd_file_lru_remove(nf)) + ++decrement; + + /* If refcount goes to 0, then put on the dispose list */ + if (refcount_sub_and_test(decrement, &nf->nf_ref)) { + list_add(&nf->nf_lru, dispose); + trace_nfsd_file_closing(nf); + } +} + /** * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode * @inode: inode on which to close out nfsd_files @@ -687,30 +720,11 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) rcu_read_lock(); do { - int decrement = 1; - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); if (!nf) break; - - /* If we raced with someone else unhashing, ignore it */ - if (!nfsd_file_unhash(nf)) - continue; - - /* If we can't get a reference, ignore it */ - if (!nfsd_file_get(nf)) - continue; - - /* Extra decrement if we remove from the LRU */ - if (nfsd_file_lru_remove(nf)) - ++decrement; - - /* If refcount goes to 0, then put on the dispose list */ - if (refcount_sub_and_test(decrement, &nf->nf_ref)) { - list_add(&nf->nf_lru, dispose); - trace_nfsd_file_closing(nf); - } + nfsd_file_cond_queue(nf, dispose); } while (1); rcu_read_unlock(); } @@ -927,11 +941,8 @@ __nfsd_file_cache_purge(struct net *net) nf = rhashtable_walk_next(&iter); while (!IS_ERR_OR_NULL(nf)) { - if (!net || nf->nf_net == net) { - nfsd_file_unhash(nf); - nfsd_file_lru_remove(nf); - list_add(&nf->nf_lru, &dispose); - } + if (!net || nf->nf_net == net) + nfsd_file_cond_queue(nf, &dispose); nf = rhashtable_walk_next(&iter); } From 6856f1385d62ce7cc88f6300326df1038df8ec86 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Sat, 11 Feb 2023 07:50:08 -0500 Subject: [PATCH 1028/1565] nfsd: don't destroy global nfs4_file table in per-net shutdown [ Upstream commit 4102db175b5d884d133270fdbd0e59111ce688fc ] The nfs4_file table is global, so shutting it down when a containerized nfsd is shut down is wrong and can lead to double-frees. Tear down the nfs4_file_rhltable in nfs4_state_shutdown instead of nfs4_state_shutdown_net. Fixes: d47b295e8d76 ("NFSD: Use rhashtable for managing nfs4_file objects") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169017 Reported-by: JianHong Yin Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 3dd64caf0615..6c11f2701af8 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -8192,7 +8192,6 @@ nfs4_state_shutdown_net(struct net *net) nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); - rhltable_destroy(&nfs4_file_rhltable); #ifdef CONFIG_NFSD_V4_2_INTER_SSC nfsd4_ssc_shutdown_umount(nn); #endif @@ -8202,6 +8201,7 @@ void nfs4_state_shutdown(void) { nfsd4_destroy_callback_queue(); + rhltable_destroy(&nfs4_file_rhltable); } static void From 9d7608dc4bd1ab1593ce84f802e12b3bb6b27fd4 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Sun, 18 Dec 2022 16:55:53 -0800 Subject: [PATCH 1029/1565] NFSD: enhance inter-server copy cleanup [ Upstream commit df24ac7a2e3a9d0bc68f1756a880e50bfe4b4522 ] Currently nfsd4_setup_inter_ssc returns the vfsmount of the source server's export when the mount completes. After the copy is done nfsd4_cleanup_inter_ssc is called with the vfsmount of the source server and it searches nfsd_ssc_mount_list for a matching entry to do the clean up. The problems with this approach are (1) the need to search the nfsd_ssc_mount_list and (2) the code has to handle the case where the matching entry is not found which looks ugly. The enhancement is instead of nfsd4_setup_inter_ssc returning the vfsmount, it returns the nfsd4_ssc_umount_item which has the vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called it's passed with the nfsd4_ssc_umount_item directly to do the clean up so no searching is needed and there is no need to handle the 'not found' case. Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever [ cel: adjusted whitespace and variable/function names ] Reviewed-by: Olga Kornievskaia Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 111 ++++++++++++++++------------------------ fs/nfsd/xdr4.h | 2 +- include/linux/nfs_ssc.h | 2 +- 3 files changed, 46 insertions(+), 69 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 4ab063a2ac84..5e175133b7bc 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1295,15 +1295,15 @@ extern void nfs_sb_deactive(struct super_block *sb); * setup a work entry in the ssc delayed unmount list. */ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, - struct nfsd4_ssc_umount_item **retwork, struct vfsmount **ss_mnt) + struct nfsd4_ssc_umount_item **nsui) { struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd4_ssc_umount_item *work = NULL; struct nfsd4_ssc_umount_item *tmp; DEFINE_WAIT(wait); + __be32 status = 0; - *ss_mnt = NULL; - *retwork = NULL; + *nsui = NULL; work = kzalloc(sizeof(*work), GFP_KERNEL); try_again: spin_lock(&nn->nfsd_ssc_lock); @@ -1327,12 +1327,12 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, finish_wait(&nn->nfsd_ssc_waitq, &wait); goto try_again; } - *ss_mnt = ni->nsui_vfsmount; + *nsui = ni; refcount_inc(&ni->nsui_refcnt); spin_unlock(&nn->nfsd_ssc_lock); kfree(work); - /* return vfsmount in ss_mnt */ + /* return vfsmount in (*nsui)->nsui_vfsmount */ return 0; } if (work) { @@ -1340,31 +1340,32 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, refcount_set(&work->nsui_refcnt, 2); work->nsui_busy = true; list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); - *retwork = work; - } + *nsui = work; + } else + status = nfserr_resource; spin_unlock(&nn->nfsd_ssc_lock); - return 0; + return status; } -static void nfsd4_ssc_update_dul_work(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *work, struct vfsmount *ss_mnt) +static void nfsd4_ssc_update_dul(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *nsui, + struct vfsmount *ss_mnt) { - /* set nsui_vfsmount, clear busy flag and wakeup waiters */ spin_lock(&nn->nfsd_ssc_lock); - work->nsui_vfsmount = ss_mnt; - work->nsui_busy = false; + nsui->nsui_vfsmount = ss_mnt; + nsui->nsui_busy = false; wake_up_all(&nn->nfsd_ssc_waitq); spin_unlock(&nn->nfsd_ssc_lock); } -static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *work) +static void nfsd4_ssc_cancel_dul(struct nfsd_net *nn, + struct nfsd4_ssc_umount_item *nsui) { spin_lock(&nn->nfsd_ssc_lock); - list_del(&work->nsui_list); + list_del(&nsui->nsui_list); wake_up_all(&nn->nfsd_ssc_waitq); spin_unlock(&nn->nfsd_ssc_lock); - kfree(work); + kfree(nsui); } /* @@ -1372,7 +1373,7 @@ static void nfsd4_ssc_cancel_dul_work(struct nfsd_net *nn, */ static __be32 nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, - struct vfsmount **mount) + struct nfsd4_ssc_umount_item **nsui) { struct file_system_type *type; struct vfsmount *ss_mnt; @@ -1383,7 +1384,6 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, char *ipaddr, *dev_name, *raw_data; int len, raw_len; __be32 status = nfserr_inval; - struct nfsd4_ssc_umount_item *work = NULL; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); naddr = &nss->u.nl4_addr; @@ -1391,6 +1391,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, naddr->addr_len, (struct sockaddr *)&tmp_addr, sizeof(tmp_addr)); + *nsui = NULL; if (tmp_addrlen == 0) goto out_err; @@ -1433,10 +1434,10 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, goto out_free_rawdata; snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep); - status = nfsd4_ssc_setup_dul(nn, ipaddr, &work, &ss_mnt); + status = nfsd4_ssc_setup_dul(nn, ipaddr, nsui); if (status) goto out_free_devname; - if (ss_mnt) + if ((*nsui)->nsui_vfsmount) goto out_done; /* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */ @@ -1444,15 +1445,12 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, module_put(type->owner); if (IS_ERR(ss_mnt)) { status = nfserr_nodev; - if (work) - nfsd4_ssc_cancel_dul_work(nn, work); + nfsd4_ssc_cancel_dul(nn, *nsui); goto out_free_devname; } - if (work) - nfsd4_ssc_update_dul_work(nn, work, ss_mnt); + nfsd4_ssc_update_dul(nn, *nsui, ss_mnt); out_done: status = 0; - *mount = ss_mnt; out_free_devname: kfree(dev_name); @@ -1476,7 +1474,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy, struct vfsmount **mount) + struct nfsd4_copy *copy) { struct svc_fh *s_fh = NULL; stateid_t *s_stid = ©->cp_src_stateid; @@ -1489,7 +1487,7 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, if (status) goto out; - status = nfsd4_interssc_connect(copy->cp_src, rqstp, mount); + status = nfsd4_interssc_connect(copy->cp_src, rqstp, ©->ss_nsui); if (status) goto out; @@ -1507,45 +1505,27 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, } static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, +nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, struct nfsd_file *dst) { - bool found = false; - long timeout; - struct nfsd4_ssc_umount_item *tmp; - struct nfsd4_ssc_umount_item *ni = NULL; struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); + long timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); nfs42_ssc_close(filp); nfsd_file_put(dst); fput(filp); - if (!nn) { - mntput(ss_mnt); - return; - } spin_lock(&nn->nfsd_ssc_lock); - timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); - list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { - if (ni->nsui_vfsmount->mnt_sb == ss_mnt->mnt_sb) { - list_del(&ni->nsui_list); - /* - * vfsmount can be shared by multiple exports, - * decrement refcnt. If the count drops to 1 it - * will be unmounted when nsui_expire expires. - */ - refcount_dec(&ni->nsui_refcnt); - ni->nsui_expire = jiffies + timeout; - list_add_tail(&ni->nsui_list, &nn->nfsd_ssc_mount_list); - found = true; - break; - } - } + list_del(&nsui->nsui_list); + /* + * vfsmount can be shared by multiple exports, + * decrement refcnt. If the count drops to 1 it + * will be unmounted when nsui_expire expires. + */ + refcount_dec(&nsui->nsui_refcnt); + nsui->nsui_expire = jiffies + timeout; + list_add_tail(&nsui->nsui_list, &nn->nfsd_ssc_mount_list); spin_unlock(&nn->nfsd_ssc_lock); - if (!found) { - mntput(ss_mnt); - return; - } } #else /* CONFIG_NFSD_V4_2_INTER_SSC */ @@ -1553,15 +1533,13 @@ nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy, - struct vfsmount **mount) + struct nfsd4_copy *copy) { - *mount = NULL; return nfserr_inval; } static void -nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *filp, +nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, struct nfsd_file *dst) { } @@ -1702,7 +1680,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) memcpy(dst->cp_src, src->cp_src, sizeof(struct nl4_server)); memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid)); memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh)); - dst->ss_mnt = src->ss_mnt; + dst->ss_nsui = src->ss_nsui; } static void cleanup_async_copy(struct nfsd4_copy *copy) @@ -1751,8 +1729,8 @@ static int nfsd4_do_async_copy(void *data) if (nfsd4_ssc_is_inter(copy)) { struct file *filp; - filp = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, - ©->stateid); + filp = nfs42_ssc_open(copy->ss_nsui->nsui_vfsmount, + ©->c_fh, ©->stateid); if (IS_ERR(filp)) { switch (PTR_ERR(filp)) { case -EBADF: @@ -1766,7 +1744,7 @@ static int nfsd4_do_async_copy(void *data) } nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, false); - nfsd4_cleanup_inter_ssc(copy->ss_mnt, filp, copy->nf_dst); + nfsd4_cleanup_inter_ssc(copy->ss_nsui, filp, copy->nf_dst); } else { nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); @@ -1792,8 +1770,7 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfserr_notsupp; goto out; } - status = nfsd4_setup_inter_ssc(rqstp, cstate, copy, - ©->ss_mnt); + status = nfsd4_setup_inter_ssc(rqstp, cstate, copy); if (status) return nfserr_offload_denied; } else { diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index 24934cf90a84..a034b9b62137 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -571,7 +571,7 @@ struct nfsd4_copy { struct task_struct *copy_task; refcount_t refcount; - struct vfsmount *ss_mnt; + struct nfsd4_ssc_umount_item *ss_nsui; struct nfs_fh c_fh; nfs4_stateid stateid; }; diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 75843c00f326..22265b1ff080 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -53,6 +53,7 @@ static inline void nfs42_ssc_close(struct file *filep) if (nfs_ssc_client_tbl.ssc_nfs4_ops) (*nfs_ssc_client_tbl.ssc_nfs4_ops->sco_close)(filep); } +#endif struct nfsd4_ssc_umount_item { struct list_head nsui_list; @@ -66,7 +67,6 @@ struct nfsd4_ssc_umount_item { struct vfsmount *nsui_vfsmount; char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; }; -#endif /* * NFS_FS From 3c7b9b3487c06e3ffe1b82caf34c1af39fb1e326 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 6 Jan 2023 10:33:47 -0500 Subject: [PATCH 1030/1565] nfsd: allow nfsd_file_get to sanely handle a NULL pointer [ Upstream commit 70f62231cdfd52357836733dd31db787e0412ab2 ] ...and remove some now-useless NULL pointer checks in its callers. Suggested-by: NeilBrown Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 ++--- fs/nfsd/nfs4state.c | 4 +--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 68c7c82f8b3b..206742bbbd68 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -451,7 +451,7 @@ static bool nfsd_file_lru_remove(struct nfsd_file *nf) struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { - if (likely(refcount_inc_not_zero(&nf->nf_ref))) + if (nf && refcount_inc_not_zero(&nf->nf_ref)) return nf; return NULL; } @@ -1106,8 +1106,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, rcu_read_lock(); nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, nfsd_file_rhash_params); - if (nf) - nf = nfsd_file_get(nf); + nf = nfsd_file_get(nf); rcu_read_unlock(); if (nf) { diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 6c11f2701af8..2733eb33d5df 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -602,9 +602,7 @@ put_nfs4_file(struct nfs4_file *fi) static struct nfsd_file * __nfs4_get_fd(struct nfs4_file *f, int oflag) { - if (f->fi_fds[oflag]) - return nfsd_file_get(f->fi_fds[oflag]); - return NULL; + return nfsd_file_get(f->fi_fds[oflag]); } static struct nfsd_file * From fd63299db8090307eae66f2aef17c8f00aafa0a9 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 17 Jan 2023 14:38:31 -0500 Subject: [PATCH 1031/1565] nfsd: clean up potential nfsd_file refcount leaks in COPY codepath [ Upstream commit 6ba434cb1a8d403ea9aad1b667c3ea3ad8b3191f ] There are two different flavors of the nfsd4_copy struct. One is embedded in the compound and is used directly in synchronous copies. The other is dynamically allocated, refcounted and tracked in the client struture. For the embedded one, the cleanup just involves releasing any nfsd_files held on its behalf. For the async one, the cleanup is a bit more involved, and we need to dequeue it from lists, unhash it, etc. There is at least one potential refcount leak in this code now. If the kthread_create call fails, then both the src and dst nfsd_files in the original nfsd4_copy object are leaked. The cleanup in this codepath is also sort of weird. In the async copy case, we'll have up to four nfsd_file references (src and dst for both flavors of copy structure). They are both put at the end of nfsd4_do_async_copy, even though the ones held on behalf of the embedded one outlive that structure. Change it so that we always clean up the nfsd_file refs held by the embedded copy structure before nfsd4_copy returns. Rework cleanup_async_copy to handle both inter and intra copies. Eliminate nfsd4_cleanup_intra_ssc since it now becomes a no-op. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 5e175133b7bc..0fe00d6d385a 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1512,7 +1512,6 @@ nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, long timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); nfs42_ssc_close(filp); - nfsd_file_put(dst); fput(filp); spin_lock(&nn->nfsd_ssc_lock); @@ -1562,13 +1561,6 @@ nfsd4_setup_intra_ssc(struct svc_rqst *rqstp, ©->nf_dst); } -static void -nfsd4_cleanup_intra_ssc(struct nfsd_file *src, struct nfsd_file *dst) -{ - nfsd_file_put(src); - nfsd_file_put(dst); -} - static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) { struct nfsd4_cb_offload *cbo = @@ -1683,12 +1675,18 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->ss_nsui = src->ss_nsui; } +static void release_copy_files(struct nfsd4_copy *copy) +{ + if (copy->nf_src) + nfsd_file_put(copy->nf_src); + if (copy->nf_dst) + nfsd_file_put(copy->nf_dst); +} + static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); - nfsd_file_put(copy->nf_dst); - if (!nfsd4_ssc_is_inter(copy)) - nfsd_file_put(copy->nf_src); + release_copy_files(copy); spin_lock(©->cp_clp->async_lock); list_del(©->copies); spin_unlock(©->cp_clp->async_lock); @@ -1748,7 +1746,6 @@ static int nfsd4_do_async_copy(void *data) } else { nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, false); - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } do_callback: @@ -1811,9 +1808,9 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } else { status = nfsd4_do_copy(copy, copy->nf_src->nf_file, copy->nf_dst->nf_file, true); - nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); } out: + release_copy_files(copy); return status; out_err: if (async_copy) From 2da50149981d05955e51c28e982e9ac29bd73417 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 23 Jan 2023 21:34:13 -0800 Subject: [PATCH 1032/1565] NFSD: fix leaked reference count of nfsd4_ssc_umount_item [ Upstream commit 34e8f9ec4c9ac235f917747b23a200a5e0ec857b ] The reference count of nfsd4_ssc_umount_item is not decremented on error conditions. This prevents the laundromat from unmounting the vfsmount of the source file. This patch decrements the reference count of nfsd4_ssc_umount_item on error. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Dai Ngo Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0fe00d6d385a..fe20fd093335 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1813,13 +1813,17 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, release_copy_files(copy); return status; out_err: + if (nfsd4_ssc_is_inter(copy)) { + /* + * Source's vfsmount of inter-copy will be unmounted + * by the laundromat. Use copy instead of async_copy + * since async_copy->ss_nsui might not be set yet. + */ + refcount_dec(©->ss_nsui->nsui_refcnt); + } if (async_copy) cleanup_async_copy(async_copy); status = nfserrno(-ENOMEM); - /* - * source's vfsmount of inter-copy will be unmounted - * by the laundromat - */ goto out; } From e5e1dc828499735e7bd525174cd3141a316da97a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 27 Jan 2023 07:09:33 -0500 Subject: [PATCH 1033/1565] nfsd: don't hand out delegation on setuid files being opened for write [ Upstream commit 826b67e6376c2a788e3a62c4860dcd79500a27d5 ] We had a bug report that xfstest generic/355 was failing on NFSv4.0. This test sets various combinations of setuid/setgid modes and tests whether DIO writes will cause them to be stripped. What I found was that the server did properly strip those bits, but the client didn't notice because it held a delegation that was not recalled. The recall didn't occur because the client itself was the one generating the activity and we avoid recalls in that case. Clearing setuid bits is an "implicit" activity. The client didn't specifically request that we do that, so we need the server to issue a CB_RECALL, or avoid the situation entirely by not issuing a delegation. The easiest fix here is to simply not give out a delegation if the file is being opened for write, and the mode has the setuid and/or setgid bit set. Note that there is a potential race between the mode and lease being set, so we test for this condition both before and after setting the lease. This patch fixes generic/355, generic/683 and generic/684 for me. (Note that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and fail). Reported-by: Boyang Xue Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 2733eb33d5df..f57137ef306b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5435,6 +5435,23 @@ nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, return 0; } +/* + * We avoid breaking delegations held by a client due to its own activity, but + * clearing setuid/setgid bits on a write is an implicit activity and the client + * may not notice and continue using the old mode. Avoid giving out a delegation + * on setuid/setgid files when the client is requesting an open for write. + */ +static int +nfsd4_verify_setuid_write(struct nfsd4_open *open, struct nfsd_file *nf) +{ + struct inode *inode = file_inode(nf->nf_file); + + if ((open->op_share_access & NFS4_SHARE_ACCESS_WRITE) && + (inode->i_mode & (S_ISUID|S_ISGID))) + return -EAGAIN; + return 0; +} + static struct nfs4_delegation * nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, struct svc_fh *parent) @@ -5468,6 +5485,8 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, spin_lock(&fp->fi_lock); if (nfs4_delegation_exists(clp, fp)) status = -EAGAIN; + else if (nfsd4_verify_setuid_write(open, nf)) + status = -EAGAIN; else if (!fp->fi_deleg_file) { fp->fi_deleg_file = nf; /* increment early to prevent fi_deleg_file from being @@ -5508,6 +5527,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (status) goto out_unlock; + /* + * Now that the deleg is set, check again to ensure that nothing + * raced in and changed the mode while we weren't lookng. + */ + status = nfsd4_verify_setuid_write(open, fp->fi_deleg_file); + if (status) + goto out_unlock; + spin_lock(&state_lock); spin_lock(&fp->fi_lock); if (fp->fi_had_conflict) From 3db6c79de9232f04c62b8a13c9b951fd42d1aeb0 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Tue, 31 Jan 2023 11:12:29 -0800 Subject: [PATCH 1034/1565] NFSD: fix problems with cleanup on errors in nfsd4_copy [ Upstream commit 81e722978ad21072470b73d8f6a50ad62c7d5b7d ] When nfsd4_copy fails to allocate memory for async_copy->cp_src, or nfs4_init_copy_state fails, it calls cleanup_async_copy to do the cleanup for the async_copy which causes page fault since async_copy is not yet initialized. This patche rearranges the order of initializing the fields in async_copy and adds checks in cleanup_async_copy to skip un-initialized fields. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy") Signed-off-by: Dai Ngo Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 12 ++++++++---- fs/nfsd/nfs4state.c | 5 +++-- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index fe20fd093335..9c51c10bcf08 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1687,9 +1687,12 @@ static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); release_copy_files(copy); - spin_lock(©->cp_clp->async_lock); - list_del(©->copies); - spin_unlock(©->cp_clp->async_lock); + if (copy->cp_clp) { + spin_lock(©->cp_clp->async_lock); + if (!list_empty(©->copies)) + list_del_init(©->copies); + spin_unlock(©->cp_clp->async_lock); + } nfs4_put_copy(copy); } @@ -1786,12 +1789,13 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; + INIT_LIST_HEAD(&async_copy->copies); + refcount_set(&async_copy->refcount, 1); async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); if (!async_copy->cp_src) goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; - refcount_set(&async_copy->refcount, 1); memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, sizeof(copy->cp_res.cb_stateid)); dup_copy_fields(copy, async_copy); diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f57137ef306b..507dd2b11856 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -990,7 +990,6 @@ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, stid->cs_stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; stid->cs_stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; - stid->cs_type = cs_type; idr_preload(GFP_KERNEL); spin_lock(&nn->s2s_cp_lock); @@ -1001,6 +1000,7 @@ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, idr_preload_end(); if (new_id < 0) return 0; + stid->cs_type = cs_type; return 1; } @@ -1034,7 +1034,8 @@ void nfs4_free_copy_state(struct nfsd4_copy *copy) { struct nfsd_net *nn; - WARN_ON_ONCE(copy->cp_stateid.cs_type != NFS4_COPY_STID); + if (copy->cp_stateid.cs_type != NFS4_COPY_STID) + return; nn = net_generic(copy->cp_clp->net, nfsd_net_id); spin_lock(&nn->s2s_cp_lock); idr_remove(&nn->s2s_cp_stateids, From 1178547637a2b05aeff2d65e90c145826e5d865c Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 3 Feb 2023 13:18:34 -0500 Subject: [PATCH 1035/1565] nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit dcd779dc46540e174a6ac8d52fbed23593407317 ] The nested if statements here make no sense, as you can never reach "else" branch in the nested statement. Fix the error handling for when there is a courtesy client that holds a conflicting deny mode. Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server") Reported-by: 張智諺 Signed-off-by: Jeff Layton Reviewed-by: Dai Ngo Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 507dd2b11856..e8e642e5ec8f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -5296,16 +5296,17 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, /* test and set deny mode */ spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); - if (status == nfs_ok) { - if (status != nfserr_share_denied) { - set_deny(open->op_share_deny, stp); - fp->fi_share_deny |= - (open->op_share_deny & NFS4_SHARE_DENY_BOTH); - } else { - if (nfs4_resolve_deny_conflicts_locked(fp, false, - stp, open->op_share_deny, false)) - status = nfserr_jukebox; - } + switch (status) { + case nfs_ok: + set_deny(open->op_share_deny, stp); + fp->fi_share_deny |= + (open->op_share_deny & NFS4_SHARE_DENY_BOTH); + break; + case nfserr_share_denied: + if (nfs4_resolve_deny_conflicts_locked(fp, false, + stp, open->op_share_deny, false)) + status = nfserr_jukebox; + break; } spin_unlock(&fp->fi_lock); From dd7d50c695a6194bdd26ecbd832aefdc36c6d77a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 7 Feb 2023 12:02:46 -0500 Subject: [PATCH 1036/1565] nfsd: don't fsync nfsd_files on last close [ Upstream commit 4c475eee02375ade6e864f1db16976ba0d96a0a2 ] Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block. We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT). I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed. If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error). The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 44 ++++++++++++-------------------------------- fs/nfsd/trace.h | 31 ------------------------------- 2 files changed, 12 insertions(+), 63 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 206742bbbd68..4a3796c6bd95 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -330,37 +330,27 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) return nf; } +/** + * nfsd_file_check_write_error - check for writeback errors on a file + * @nf: nfsd_file to check for writeback errors + * + * Check whether a nfsd_file has an unseen error. Reset the write + * verifier if so. + */ static void -nfsd_file_fsync(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - int ret; - - if (!file || !(file->f_mode & FMODE_WRITE)) - return; - ret = vfs_fsync(file, 1); - trace_nfsd_file_fsync(nf, ret); - if (ret) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - -static int nfsd_file_check_write_error(struct nfsd_file *nf) { struct file *file = nf->nf_file; - if (!file || !(file->f_mode & FMODE_WRITE)) - return 0; - return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); + if ((file->f_mode & FMODE_WRITE) && + filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err))) + nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); } static void nfsd_file_hash_remove(struct nfsd_file *nf) { trace_nfsd_file_unhash(nf); - - if (nfsd_file_check_write_error(nf)) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, nfsd_file_rhash_params); } @@ -386,23 +376,12 @@ nfsd_file_free(struct nfsd_file *nf) this_cpu_add(nfsd_file_total_age, age); nfsd_file_unhash(nf); - - /* - * We call fsync here in order to catch writeback errors. It's not - * strictly required by the protocol, but an nfsd_file could get - * evicted from the cache before a COMMIT comes in. If another - * task were to open that file in the interim and scrape the error, - * then the client may never see it. By calling fsync here, we ensure - * that writeback happens before the entry is freed, and that any - * errors reported result in the write verifier changing. - */ - nfsd_file_fsync(nf); - if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { get_file(nf->nf_file); filp_close(nf->nf_file, NULL); + nfsd_file_check_write_error(nf); fput(nf->nf_file); } @@ -1157,6 +1136,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, out: if (status == nfs_ok) { this_cpu_inc(nfsd_file_acquisitions); + nfsd_file_check_write_error(nf); *pnf = nf; } else { if (refcount_dec_and_test(&nf->nf_ref)) diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 0f674982785c..445d00f00eab 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -1112,37 +1112,6 @@ TRACE_EVENT(nfsd_file_close, ) ); -TRACE_EVENT(nfsd_file_fsync, - TP_PROTO( - const struct nfsd_file *nf, - int ret - ), - TP_ARGS(nf, ret), - TP_STRUCT__entry( - __field(void *, nf_inode) - __field(int, nf_ref) - __field(int, ret) - __field(unsigned long, nf_flags) - __field(unsigned char, nf_may) - __field(struct file *, nf_file) - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->ret = ret; - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_file = nf->nf_file; - ), - TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p ret=%d", - __entry->nf_inode, - __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->ret - ) -); - #include "cache.h" TRACE_DEFINE_ENUM(RC_DROPIT); From 37cd49faaa944d1c14fb7adc30550fe27f08e7e6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 14 Feb 2023 10:07:59 -0500 Subject: [PATCH 1037/1565] NFSD: copy the whole verifier in nfsd_copy_write_verifier [ Upstream commit 90d2175572470ba7f55da8447c72ddd4942923c4 ] Currently, we're only memcpy'ing the first __be32. Ensure we copy into both words. Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field") Reported-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 325d3d3f1211..a0ecec54d3d7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -363,7 +363,7 @@ void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn) do { read_seqbegin_or_lock(&nn->writeverf_lock, &seq); - memcpy(verf, nn->writeverf, sizeof(*verf)); + memcpy(verf, nn->writeverf, sizeof(nn->writeverf)); } while (need_seqretry(&nn->writeverf_lock, seq)); done_seqretry(&nn->writeverf_lock, seq); } From b0f33732796b45d7833257ea30d56de42bf80403 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 6 Mar 2023 10:43:47 -0500 Subject: [PATCH 1038/1565] NFSD: Protect against filesystem freezing [ Upstream commit fd9a2e1d513823e840960cb3bc26d8b7749d4ac2 ] Flole observes this WARNING on occasion: [1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0 Reported-by: Suggested-by: Jan Kara Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123 Fixes: 73da852e3831 ("nfsd: use vfs_iter_read/write") Reviewed-by: Jeff Layton Reviewed-by: Jan Kara Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 3d67dd7eab4b..ddf424d76d41 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1117,7 +1117,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, since = READ_ONCE(file->f_wb_err); if (verf) nfsd_copy_write_verifier(verf, nn); + file_start_write(file); host_err = vfs_iter_write(file, &iter, &pos, flags); + file_end_write(file); if (host_err < 0) { nfsd_reset_write_verifier(nn); trace_nfsd_writeverf_reset(nn, rqstp, host_err); From 37b34eb5677073ec6972f395dfe650cf44b421eb Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 14 Mar 2023 06:20:58 -0400 Subject: [PATCH 1039/1565] lockd: set file_lock start and end when decoding nlm4 testargs [ Upstream commit 7ff84910c66c9144cc0de9d9deed9fb84c03aff0 ] Commit 6930bcbfb6ce dropped the setting of the file_lock range when decoding a nlm_lock off the wire. This causes the client side grant callback to miss matching blocks and reject the lock, only to rerequest it 30s later. Add a helper function to set the file_lock range from the start and end values that the protocol uses, and have the nlm_lock decoder call that to set up the file_lock args properly. Fixes: 6930bcbfb6ce ("lockd: detect and reject lock arguments that overflow") Reported-by: Amir Goldstein Signed-off-by: Jeff Layton Tested-by: Amir Goldstein Cc: stable@vger.kernel.org #6.0 Signed-off-by: Anna Schumaker Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/clnt4xdr.c | 9 +-------- fs/lockd/xdr4.c | 13 ++++++++++++- include/linux/lockd/xdr4.h | 1 + 3 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c index 7df6324ccb8a..8161667c976f 100644 --- a/fs/lockd/clnt4xdr.c +++ b/fs/lockd/clnt4xdr.c @@ -261,7 +261,6 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) u32 exclusive; int error; __be32 *p; - s32 end; memset(lock, 0, sizeof(*lock)); locks_init_lock(fl); @@ -285,13 +284,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) fl->fl_type = exclusive != 0 ? F_WRLCK : F_RDLCK; p = xdr_decode_hyper(p, &l_offset); xdr_decode_hyper(p, &l_len); - end = l_offset + l_len - 1; - - fl->fl_start = (loff_t)l_offset; - if (l_len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = (loff_t)end; + nlm4svc_set_file_lock_range(fl, l_offset, l_len); error = 0; out: return error; diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 712fdfeb8ef0..5fcbf30cd275 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -33,6 +33,17 @@ loff_t_to_s64(loff_t offset) return res; } +void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len) +{ + s64 end = off + len - 1; + + fl->fl_start = off; + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = end; +} + /* * NLM file handles are defined by specification to be a variable-length * XDR opaque no longer than 1024 bytes. However, this implementation @@ -80,7 +91,7 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) locks_init_lock(fl); fl->fl_flags = FL_POSIX; fl->fl_type = F_RDLCK; - + nlm4svc_set_file_lock_range(fl, lock->lock_start, lock->lock_len); return true; } diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 9a6b55da8fd6..72831e35dca3 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,6 +22,7 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED) +void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len); bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); From 8235cd619db6e67f1d7d26c55f1f3e4e575c947d Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 17 Mar 2023 13:13:08 -0400 Subject: [PATCH 1040/1565] nfsd: don't replace page in rq_pages if it's a continuation of last page [ Upstream commit 27c934dd8832dd40fd34776f916dc201e18b319b ] The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page. nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves. If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check for it. Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro Reported-by: Dario Lesca Tested-by: David Critch Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index ddf424d76d41..abc682854507 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -954,8 +954,15 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct page *last_page; last_page = page + (offset + sd->len - 1) / PAGE_SIZE; - for (page += offset / PAGE_SIZE; page <= last_page; page++) + for (page += offset / PAGE_SIZE; page <= last_page; page++) { + /* + * Skip page replacement when extending the contents + * of the current page. + */ + if (page == *(rqstp->rq_next_page - 1)) + continue; svc_rqst_replace_page(rqstp, page); + } if (rqstp->rq_res.page_len == 0) // first call rqstp->rq_res.page_base = offset % PAGE_SIZE; rqstp->rq_res.page_len += sd->len; From 50827896c365e0f6c8b55ed56d444dafd87c92c5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 31 Mar 2023 16:31:19 -0400 Subject: [PATCH 1041/1565] NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL [ Upstream commit 804d8e0a6e54427268790472781e03bc243f4ee3 ] OPDESC() simply indexes into nfsd4_ops[] by the op's operation number, without range checking that value. It assumes callers are careful to avoid calling it with an out-of-bounds opnum value. nfsd4_decode_compound() is not so careful, and can invoke OPDESC() with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end of nfsd4_ops[]. Reported-by: Jeff Layton Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d0b9fbf189ac..e1f2f26ba93f 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2476,10 +2476,12 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) for (i = 0; i < argp->opcnt; i++) { op = &argp->ops[i]; op->replay = NULL; + op->opdesc = NULL; if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) return false; if (nfsd4_opnum_in_range(argp, op)) { + op->opdesc = OPDESC(op); op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); if (op->status != nfs_ok) trace_nfsd_compound_decode_err(argp->rqstp, @@ -2490,7 +2492,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) op->opnum = OP_ILLEGAL; op->status = nfserr_op_illegal; } - op->opdesc = OPDESC(op); + /* * We'll try to cache the result in the DRC if any one * op in the compound wants to be cached: From 65a33135e91e6dd661ecdf1194b9d90c49ae3570 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 27 Mar 2023 06:21:37 -0400 Subject: [PATCH 1042/1565] nfsd: call op_release, even when op_func returns an error [ Upstream commit 15a8b55dbb1ba154d82627547c5761cac884d810 ] For ops with "trivial" replies, nfsd4_encode_operation will shortcut most of the encoding work and skip to just marshalling up the status. One of the things it skips is calling op_release. This could cause a memory leak in the layoutget codepath if there is an error at an inopportune time. Have the compound processing engine always call op_release, even when op_func sets an error in op->status. With this change, we also need nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL on error to avoid a double free. Reported-by: Zhi Li Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403 Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index e1f2f26ba93f..d62382dfc135 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -5397,10 +5397,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) __be32 *p; p = xdr_reserve_space(xdr, 8); - if (!p) { - WARN_ON_ONCE(1); - return; - } + if (!p) + goto release; *p++ = cpu_to_be32(op->opnum); post_err_offset = xdr->buf->len; @@ -5415,8 +5413,6 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) op->status = encoder(resp, op->status, &op->u); if (op->status) trace_nfsd_compound_encode_err(rqstp, op->opnum, op->status); - if (opdesc && opdesc->op_release) - opdesc->op_release(&op->u); xdr_commit_encode(xdr); /* nfsd4_check_resp_size guarantees enough room for error status */ @@ -5457,6 +5453,9 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) } status: *p = op->status; +release: + if (opdesc && opdesc->op_release) + opdesc->op_release(&op->u); } /* From 9c599dee875463ba3888e9c05d1755a3896eba53 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 5 Jan 2023 07:15:09 -0500 Subject: [PATCH 1043/1565] nfsd: don't open-code clear_and_wake_up_bit [ Upstream commit b8bea9f6cdd7236c7c2238d022145e9b2f8aac22 ] Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 4a3796c6bd95..677a8d935ccc 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1173,9 +1173,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, status = nfserr_jukebox; if (status != nfs_ok) nfsd_file_unhash(nf); - clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); - smp_mb__after_atomic(); - wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); + clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); goto out; } From e378da83577f6131a37eb79df12cb4276e657dfb Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 6 Jan 2023 10:39:00 -0500 Subject: [PATCH 1044/1565] nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries [ Upstream commit 6c31e4c98853a4ba47355ea151b36a77c42b7734 ] Since v4 files are expected to be long-lived, there's little value in closing them out of the cache when there is conflicting access. Change the comparator to also match the gc value in the key. Change both of the current users of that key to set the gc value in the key to "true". Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 677a8d935ccc..4ddc82b84f7c 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -174,6 +174,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, switch (key->type) { case NFSD_FILE_KEY_INODE: + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + return 1; if (nf->nf_inode != key->inode) return 1; break; @@ -694,6 +696,7 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, + .gc = true, }; struct nfsd_file *nf; @@ -1048,6 +1051,7 @@ nfsd_file_is_cached(struct inode *inode) struct nfsd_file_lookup_key key = { .type = NFSD_FILE_KEY_INODE, .inode = inode, + .gc = true, }; bool ret = false; From 2c95ad0a0cb9e624f79c89b697c6725692c7beee Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 6 Jan 2023 10:39:01 -0500 Subject: [PATCH 1045/1565] nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator [ Upstream commit d69b8dbfd0866abc5ec84652cc1c10fc3d4d91ef ] test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!". Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 4ddc82b84f7c..d61c8223082a 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -188,7 +188,7 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, return 1; if (!nfsd_match_cred(nf->nf_cred, key->cred)) return 1; - if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) return 1; if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) return 1; From c9e8ed6efabe6580dbf6c6d6477d5b2dec4a4ec8 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 5 Jan 2023 07:15:11 -0500 Subject: [PATCH 1046/1565] nfsd: don't kill nfsd_files because of lease break error [ Upstream commit c6593366c0bf222be9c7561354dfb921c611745e ] An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index d61c8223082a..43bb2fd47cf5 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1101,7 +1101,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nf = nfsd_file_alloc(&key, may_flags); if (!nf) { status = nfserr_jukebox; - goto out_status; + goto out; } ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, @@ -1110,13 +1110,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (likely(ret == 0)) goto open_file; - nfsd_file_slab_free(&nf->nf_rcu); - nf = NULL; if (ret == -EEXIST) goto retry; trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); status = nfserr_jukebox; - goto out_status; + goto construction_err; wait_for_construction: wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE); @@ -1126,29 +1124,25 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); if (!open_retry) { status = nfserr_jukebox; - goto out; + goto construction_err; } open_retry = false; - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); goto retry; } - this_cpu_inc(nfsd_file_cache_hits); status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); + if (status != nfs_ok) { + nfsd_file_put(nf); + nf = NULL; + } + out: if (status == nfs_ok) { this_cpu_inc(nfsd_file_acquisitions); nfsd_file_check_write_error(nf); *pnf = nf; - } else { - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); - nf = NULL; } - -out_status: put_cred(key.cred); trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); return status; @@ -1178,6 +1172,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, if (status != nfs_ok) nfsd_file_unhash(nf); clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); + if (status == nfs_ok) + goto out; + +construction_err: + if (refcount_dec_and_test(&nf->nf_ref)) + nfsd_file_free(nf); + nf = NULL; goto out; } From c3677c14b3d4fc7e537d8a2dfc2f6c96732fd8a7 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 5 Jan 2023 07:15:12 -0500 Subject: [PATCH 1047/1565] nfsd: add some comments to nfsd_file_do_acquire [ Upstream commit b680cb9b737331aad271feebbedafb865504e234 ] David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify. Reported-by: David Howells Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 43bb2fd47cf5..faa0c7d0253e 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1093,6 +1093,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, rcu_read_unlock(); if (nf) { + /* + * If the nf is on the LRU then it holds an extra reference + * that must be put if it's removed. It had better not be + * the last one however, since we should hold another. + */ if (nfsd_file_lru_remove(nf)) WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref)); goto wait_for_construction; From f593ea1423c66aa6fb3ab0f2243a1c4ec9f8a1af Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 18 Jan 2023 12:31:37 -0500 Subject: [PATCH 1048/1565] nfsd: don't take/put an extra reference when putting a file [ Upstream commit b2ff1bd71db2a1b193a6dde0845adcd69cbcf75e ] The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index faa0c7d0253e..786e06cf107f 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -381,10 +381,8 @@ nfsd_file_free(struct nfsd_file *nf) if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { - get_file(nf->nf_file); - filp_close(nf->nf_file, NULL); nfsd_file_check_write_error(nf); - fput(nf->nf_file); + filp_close(nf->nf_file, NULL); } /* From f7b157737c64adeac414f31262ef3ba4f322d5e4 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 26 Jan 2023 12:21:16 -0500 Subject: [PATCH 1049/1565] nfsd: update comment over __nfsd_file_cache_purge [ Upstream commit 972cc0e0924598cb293b919d39c848dc038b2c28 ] Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 786e06cf107f..1d4c0387c419 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -906,7 +906,8 @@ nfsd_file_cache_init(void) * @net: net-namespace to shut down the cache (may be NULL) * * Walk the nfsd_file cache and close out any that match @net. If @net is NULL, - * then close out everything. Called when an nfsd instance is being shut down. + * then close out everything. Called when an nfsd instance is being shut down, + * and when the exports table is flushed. */ static void __nfsd_file_cache_purge(struct net *net) From 5a132ffa76bdd48a2ecfe7ff543a406c883e42f4 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 15 Feb 2023 06:53:54 -0500 Subject: [PATCH 1050/1565] nfsd: allow reaping files still under writeback [ Upstream commit dcb779fcd4ed5984ad15991d574943d12a8693d1 ] On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback. The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems. Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions. Signed-off-by: Jeff Layton Acked-by: Anna Schumaker [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/export.c | 9 ++++++--- fs/nfsd/filecache.c | 12 +++++++++++- include/linux/exportfs.h | 1 + 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index b347e3ce0cc8..993be63ab301 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -182,7 +182,10 @@ const struct export_operations nfs_export_ops = { .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, .fetch_iversion = nfs_fetch_iversion, - .flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK| - EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS| - EXPORT_OP_NOATOMIC_ATTR, + .flags = EXPORT_OP_NOWCC | + EXPORT_OP_NOSUBTREECHK | + EXPORT_OP_CLOSE_BEFORE_UNLINK | + EXPORT_OP_REMOTE_FS | + EXPORT_OP_NOATOMIC_ATTR | + EXPORT_OP_FLUSH_ON_CLOSE, }; diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 1d4c0387c419..080d79654785 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -401,13 +401,23 @@ nfsd_file_check_writeback(struct nfsd_file *nf) struct file *file = nf->nf_file; struct address_space *mapping; - if (!file || !(file->f_mode & FMODE_WRITE)) + /* File not open for write? */ + if (!(file->f_mode & FMODE_WRITE)) return false; + + /* + * Some filesystems (e.g. NFS) flush all dirty data on close. + * On others, there is no need to wait for writeback. + */ + if (!(file_inode(file)->i_sb->s_export_op->flags & EXPORT_OP_FLUSH_ON_CLOSE)) + return false; + mapping = file->f_mapping; return mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) || mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); } + static bool nfsd_file_lru_add(struct nfsd_file *nf) { set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index fe848901fcc3..218fc5c54e90 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -221,6 +221,7 @@ struct export_operations { #define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply atomic attribute updates */ +#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */ unsigned long flags; }; From 56b36b8960e5d0a6650d5e6081fd4e994ca40d34 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 24 Nov 2022 15:09:04 -0500 Subject: [PATCH 1051/1565] NFSD: Convert filecache to rhltable [ Upstream commit c4c649ab413ba6a785b25f0edbb12f617c87db2a ] While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion. A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation). Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered. Light testing with fstests generic/531 show no regressions. Suggested-by: Neil Brown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 311 ++++++++++++++++++-------------------------- fs/nfsd/filecache.h | 9 +- 2 files changed, 133 insertions(+), 187 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 080d79654785..52e67ec26796 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -73,70 +73,9 @@ static struct list_lru nfsd_file_lru; static unsigned long nfsd_file_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; static struct delayed_work nfsd_filecache_laundrette; -static struct rhashtable nfsd_file_rhash_tbl +static struct rhltable nfsd_file_rhltable ____cacheline_aligned_in_smp; -enum nfsd_file_lookup_type { - NFSD_FILE_KEY_INODE, - NFSD_FILE_KEY_FULL, -}; - -struct nfsd_file_lookup_key { - struct inode *inode; - struct net *net; - const struct cred *cred; - unsigned char need; - bool gc; - enum nfsd_file_lookup_type type; -}; - -/* - * The returned hash value is based solely on the address of an in-code - * inode, a pointer to a slab-allocated object. The entropy in such a - * pointer is concentrated in its middle bits. - */ -static u32 nfsd_file_inode_hash(const struct inode *inode, u32 seed) -{ - unsigned long ptr = (unsigned long)inode; - u32 k; - - k = ptr >> L1_CACHE_SHIFT; - k &= 0x00ffffff; - return jhash2(&k, 1, seed); -} - -/** - * nfsd_file_key_hashfn - Compute the hash value of a lookup key - * @data: key on which to compute the hash value - * @len: rhash table's key_len parameter (unused) - * @seed: rhash table's random seed of the day - * - * Return value: - * Computed 32-bit hash value - */ -static u32 nfsd_file_key_hashfn(const void *data, u32 len, u32 seed) -{ - const struct nfsd_file_lookup_key *key = data; - - return nfsd_file_inode_hash(key->inode, seed); -} - -/** - * nfsd_file_obj_hashfn - Compute the hash value of an nfsd_file - * @data: object on which to compute the hash value - * @len: rhash table's key_len parameter (unused) - * @seed: rhash table's random seed of the day - * - * Return value: - * Computed 32-bit hash value - */ -static u32 nfsd_file_obj_hashfn(const void *data, u32 len, u32 seed) -{ - const struct nfsd_file *nf = data; - - return nfsd_file_inode_hash(nf->nf_inode, seed); -} - static bool nfsd_match_cred(const struct cred *c1, const struct cred *c2) { @@ -157,55 +96,16 @@ nfsd_match_cred(const struct cred *c1, const struct cred *c2) return true; } -/** - * nfsd_file_obj_cmpfn - Match a cache item against search criteria - * @arg: search criteria - * @ptr: cache item to check - * - * Return values: - * %0 - Item matches search criteria - * %1 - Item does not match search criteria - */ -static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg, - const void *ptr) -{ - const struct nfsd_file_lookup_key *key = arg->key; - const struct nfsd_file *nf = ptr; - - switch (key->type) { - case NFSD_FILE_KEY_INODE: - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) - return 1; - if (nf->nf_inode != key->inode) - return 1; - break; - case NFSD_FILE_KEY_FULL: - if (nf->nf_inode != key->inode) - return 1; - if (nf->nf_may != key->need) - return 1; - if (nf->nf_net != key->net) - return 1; - if (!nfsd_match_cred(nf->nf_cred, key->cred)) - return 1; - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc) - return 1; - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) - return 1; - break; - } - return 0; -} - static const struct rhashtable_params nfsd_file_rhash_params = { .key_len = sizeof_field(struct nfsd_file, nf_inode), .key_offset = offsetof(struct nfsd_file, nf_inode), - .head_offset = offsetof(struct nfsd_file, nf_rhash), - .hashfn = nfsd_file_key_hashfn, - .obj_hashfn = nfsd_file_obj_hashfn, - .obj_cmpfn = nfsd_file_obj_cmpfn, - /* Reduce resizing churn on light workloads */ - .min_size = 512, /* buckets */ + .head_offset = offsetof(struct nfsd_file, nf_rlist), + + /* + * Start with a single page hash table to reduce resizing churn + * on light workloads. + */ + .min_size = 256, .automatic_shrinking = true, }; @@ -308,27 +208,27 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) } static struct nfsd_file * -nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) +nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need, + bool want_gc) { struct nfsd_file *nf; nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL); - if (nf) { - INIT_LIST_HEAD(&nf->nf_lru); - nf->nf_birthtime = ktime_get(); - nf->nf_file = NULL; - nf->nf_cred = get_current_cred(); - nf->nf_net = key->net; - nf->nf_flags = 0; - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); - __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); - if (key->gc) - __set_bit(NFSD_FILE_GC, &nf->nf_flags); - nf->nf_inode = key->inode; - refcount_set(&nf->nf_ref, 1); - nf->nf_may = key->need; - nf->nf_mark = NULL; - } + if (unlikely(!nf)) + return NULL; + + INIT_LIST_HEAD(&nf->nf_lru); + nf->nf_birthtime = ktime_get(); + nf->nf_file = NULL; + nf->nf_cred = get_current_cred(); + nf->nf_net = net; + nf->nf_flags = want_gc ? + BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING) | BIT(NFSD_FILE_GC) : + BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING); + nf->nf_inode = inode; + refcount_set(&nf->nf_ref, 1); + nf->nf_may = need; + nf->nf_mark = NULL; return nf; } @@ -353,8 +253,8 @@ static void nfsd_file_hash_remove(struct nfsd_file *nf) { trace_nfsd_file_unhash(nf); - rhashtable_remove_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash, - nfsd_file_rhash_params); + rhltable_remove(&nfsd_file_rhltable, &nf->nf_rlist, + nfsd_file_rhash_params); } static bool @@ -687,8 +587,8 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) * @inode: inode on which to close out nfsd_files * @dispose: list on which to gather nfsd_files to close out * - * An nfsd_file represents a struct file being held open on behalf of nfsd. An - * open file however can block other activity (such as leases), or cause + * An nfsd_file represents a struct file being held open on behalf of nfsd. + * An open file however can block other activity (such as leases), or cause * undesirable behavior (e.g. spurious silly-renames when reexporting NFS). * * This function is intended to find open nfsd_files when this sort of @@ -701,21 +601,17 @@ nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) static void nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_INODE, - .inode = inode, - .gc = true, - }; + struct rhlist_head *tmp, *list; struct nfsd_file *nf; rcu_read_lock(); - do { - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); - if (!nf) - break; + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { + if (!test_bit(NFSD_FILE_GC, &nf->nf_flags)) + continue; nfsd_file_cond_queue(nf, dispose); - } while (1); + } rcu_read_unlock(); } @@ -839,7 +735,7 @@ nfsd_file_cache_init(void) if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) return 0; - ret = rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params); + ret = rhltable_init(&nfsd_file_rhltable, &nfsd_file_rhash_params); if (ret) return ret; @@ -907,7 +803,7 @@ nfsd_file_cache_init(void) nfsd_file_mark_slab = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhashtable_destroy(&nfsd_file_rhash_tbl); + rhltable_destroy(&nfsd_file_rhltable); goto out; } @@ -926,7 +822,7 @@ __nfsd_file_cache_purge(struct net *net) struct nfsd_file *nf; LIST_HEAD(dispose); - rhashtable_walk_enter(&nfsd_file_rhash_tbl, &iter); + rhltable_walk_enter(&nfsd_file_rhltable, &iter); do { rhashtable_walk_start(&iter); @@ -1032,7 +928,7 @@ nfsd_file_cache_shutdown(void) nfsd_file_mark_slab = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhashtable_destroy(&nfsd_file_rhash_tbl); + rhltable_destroy(&nfsd_file_rhltable); for_each_possible_cpu(i) { per_cpu(nfsd_file_cache_hits, i) = 0; @@ -1043,6 +939,35 @@ nfsd_file_cache_shutdown(void) } } +static struct nfsd_file * +nfsd_file_lookup_locked(const struct net *net, const struct cred *cred, + struct inode *inode, unsigned char need, + bool want_gc) +{ + struct rhlist_head *tmp, *list; + struct nfsd_file *nf; + + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { + if (nf->nf_may != need) + continue; + if (nf->nf_net != net) + continue; + if (!nfsd_match_cred(nf->nf_cred, cred)) + continue; + if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != want_gc) + continue; + if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) + continue; + + if (!nfsd_file_get(nf)) + continue; + return nf; + } + return NULL; +} + /** * nfsd_file_is_cached - are there any cached open files for this inode? * @inode: inode to check @@ -1057,16 +982,20 @@ nfsd_file_cache_shutdown(void) bool nfsd_file_is_cached(struct inode *inode) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_INODE, - .inode = inode, - .gc = true, - }; + struct rhlist_head *tmp, *list; + struct nfsd_file *nf; bool ret = false; - if (rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params) != NULL) - ret = true; + rcu_read_lock(); + list = rhltable_lookup(&nfsd_file_rhltable, &inode, + nfsd_file_rhash_params); + rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) + if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) { + ret = true; + break; + } + rcu_read_unlock(); + trace_nfsd_file_is_cached(inode, (int)ret); return ret; } @@ -1076,14 +1005,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct file *file, struct nfsd_file **pnf, bool want_gc) { - struct nfsd_file_lookup_key key = { - .type = NFSD_FILE_KEY_FULL, - .need = may_flags & NFSD_FILE_MAY_MASK, - .net = SVC_NET(rqstp), - .gc = want_gc, - }; + unsigned char need = may_flags & NFSD_FILE_MAY_MASK; + struct net *net = SVC_NET(rqstp); + struct nfsd_file *new, *nf; + const struct cred *cred; bool open_retry = true; - struct nfsd_file *nf; + struct inode *inode; __be32 status; int ret; @@ -1091,14 +1018,12 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, may_flags|NFSD_MAY_OWNER_OVERRIDE); if (status != nfs_ok) return status; - key.inode = d_inode(fhp->fh_dentry); - key.cred = get_current_cred(); + inode = d_inode(fhp->fh_dentry); + cred = get_current_cred(); retry: rcu_read_lock(); - nf = rhashtable_lookup(&nfsd_file_rhash_tbl, &key, - nfsd_file_rhash_params); - nf = nfsd_file_get(nf); + nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); rcu_read_unlock(); if (nf) { @@ -1112,21 +1037,32 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, goto wait_for_construction; } - nf = nfsd_file_alloc(&key, may_flags); - if (!nf) { + new = nfsd_file_alloc(net, inode, need, want_gc); + if (!new) { status = nfserr_jukebox; goto out; } - ret = rhashtable_lookup_insert_key(&nfsd_file_rhash_tbl, - &key, &nf->nf_rhash, - nfsd_file_rhash_params); + rcu_read_lock(); + spin_lock(&inode->i_lock); + nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); + if (unlikely(nf)) { + spin_unlock(&inode->i_lock); + rcu_read_unlock(); + nfsd_file_slab_free(&new->nf_rcu); + goto wait_for_construction; + } + nf = new; + ret = rhltable_insert(&nfsd_file_rhltable, &nf->nf_rlist, + nfsd_file_rhash_params); + spin_unlock(&inode->i_lock); + rcu_read_unlock(); if (likely(ret == 0)) goto open_file; if (ret == -EEXIST) goto retry; - trace_nfsd_file_insert_err(rqstp, key.inode, may_flags, ret); + trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret); status = nfserr_jukebox; goto construction_err; @@ -1135,7 +1071,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, /* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - trace_nfsd_file_cons_err(rqstp, key.inode, may_flags, nf); + trace_nfsd_file_cons_err(rqstp, inode, may_flags, nf); if (!open_retry) { status = nfserr_jukebox; goto construction_err; @@ -1157,13 +1093,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, nfsd_file_check_write_error(nf); *pnf = nf; } - put_cred(key.cred); - trace_nfsd_file_acquire(rqstp, key.inode, may_flags, nf, status); + put_cred(cred); + trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); return status; open_file: trace_nfsd_file_alloc(nf); - nf->nf_mark = nfsd_file_mark_find_or_create(nf, key.inode); + nf->nf_mark = nfsd_file_mark_find_or_create(nf, inode); if (nf->nf_mark) { if (file) { get_file(file); @@ -1181,7 +1117,7 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status == nfs_ok && key.inode->i_nlink == 0) + if (status != nfs_ok || inode->i_nlink == 0) status = nfserr_jukebox; if (status != nfs_ok) nfsd_file_unhash(nf); @@ -1208,8 +1144,11 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * seconds after the final nfsd_file_put() in case the caller * wants to re-use it. * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1229,8 +1168,11 @@ nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, * but not garbage-collected. The object is unhashed after the * final nfsd_file_put(). * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1251,8 +1193,11 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * and @file is non-NULL, use it to instantiate a new nfsd_file instead of * opening a new one. * - * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in - * network byte order is returned. + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. */ __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -1283,7 +1228,7 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) lru = list_lru_count(&nfsd_file_lru); rcu_read_lock(); - ht = &nfsd_file_rhash_tbl; + ht = &nfsd_file_rhltable.ht; count = atomic_read(&ht->nelems); tbl = rht_dereference_rcu(ht->tbl, ht); buckets = tbl->size; @@ -1299,7 +1244,7 @@ int nfsd_file_cache_stats_show(struct seq_file *m, void *v) evictions += per_cpu(nfsd_file_evictions, i); } - seq_printf(m, "total entries: %u\n", count); + seq_printf(m, "total inodes: %u\n", count); seq_printf(m, "hash buckets: %u\n", buckets); seq_printf(m, "lru entries: %lu\n", lru); seq_printf(m, "cache hits: %lu\n", hits); diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 41516a4263ea..e54165a3224f 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -29,9 +29,8 @@ struct nfsd_file_mark { * never be dereferenced, only used for comparison. */ struct nfsd_file { - struct rhash_head nf_rhash; - struct list_head nf_lru; - struct rcu_head nf_rcu; + struct rhlist_head nf_rlist; + void *nf_inode; struct file *nf_file; const struct cred *nf_cred; struct net *nf_net; @@ -40,10 +39,12 @@ struct nfsd_file { #define NFSD_FILE_REFERENCED (2) #define NFSD_FILE_GC (3) unsigned long nf_flags; - struct inode *nf_inode; /* don't deref */ refcount_t nf_ref; unsigned char nf_may; + struct nfsd_file_mark *nf_mark; + struct list_head nf_lru; + struct rcu_head nf_rcu; ktime_t nf_birthtime; }; From 6c05d25ca8990158aeea90f731a7a1ee5bef4ffa Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 14 Apr 2023 17:31:44 -0400 Subject: [PATCH 1052/1565] nfsd: simplify the delayed disposal list code [ Upstream commit 92e4a6733f922f0fef1d0995f7b2d0eaff86c7ea ] When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list. Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments. Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 64 ++++++++++++++++----------------------------- 1 file changed, 22 insertions(+), 42 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 52e67ec26796..6b8706f23eaf 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -401,49 +401,26 @@ nfsd_file_dispose_list(struct list_head *dispose) } } -static void -nfsd_file_list_remove_disposal(struct list_head *dst, - struct nfsd_fcache_disposal *l) -{ - spin_lock(&l->lock); - list_splice_init(&l->freeme, dst); - spin_unlock(&l->lock); -} - -static void -nfsd_file_list_add_disposal(struct list_head *files, struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct nfsd_fcache_disposal *l = nn->fcache_disposal; - - spin_lock(&l->lock); - list_splice_tail_init(files, &l->freeme); - spin_unlock(&l->lock); - queue_work(nfsd_filecache_wq, &l->work); -} - -static void -nfsd_file_list_add_pernet(struct list_head *dst, struct list_head *src, - struct net *net) -{ - struct nfsd_file *nf, *tmp; - - list_for_each_entry_safe(nf, tmp, src, nf_lru) { - if (nf->nf_net == net) - list_move_tail(&nf->nf_lru, dst); - } -} - +/** + * nfsd_file_dispose_list_delayed - move list of dead files to net's freeme list + * @dispose: list of nfsd_files to be disposed + * + * Transfers each file to the "freeme" list for its nfsd_net, to eventually + * be disposed of by the per-net garbage collector. + */ static void nfsd_file_dispose_list_delayed(struct list_head *dispose) { - LIST_HEAD(list); - struct nfsd_file *nf; - while(!list_empty(dispose)) { - nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - nfsd_file_list_add_pernet(&list, dispose, nf->nf_net); - nfsd_file_list_add_disposal(&list, nf->nf_net); + struct nfsd_file *nf = list_first_entry(dispose, + struct nfsd_file, nf_lru); + struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id); + struct nfsd_fcache_disposal *l = nn->fcache_disposal; + + spin_lock(&l->lock); + list_move_tail(&nf->nf_lru, &l->freeme); + spin_unlock(&l->lock); + queue_work(nfsd_filecache_wq, &l->work); } } @@ -664,8 +641,8 @@ nfsd_file_close_inode_sync(struct inode *inode) * nfsd_file_delayed_close - close unused nfsd_files * @work: dummy * - * Walk the LRU list and destroy any entries that have not been used since - * the last scan. + * Scrape the freeme list for this nfsd_net, and then dispose of them + * all. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -674,7 +651,10 @@ nfsd_file_delayed_close(struct work_struct *work) struct nfsd_fcache_disposal *l = container_of(work, struct nfsd_fcache_disposal, work); - nfsd_file_list_remove_disposal(&head, l); + spin_lock(&l->lock); + list_splice_init(&l->freeme, &head); + spin_unlock(&l->lock); + nfsd_file_dispose_list(&head); } From 05f45f3981d392f9f3ced9bd5302ad981bb55499 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Wed, 19 Apr 2023 10:53:18 -0700 Subject: [PATCH 1053/1565] NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop [ Upstream commit 147abcacee33781e75588869e944ddb07528a897 ] The following request sequence to the same file causes the NFS client and server getting into an infinite loop with COMMIT and NFS4ERR_DELAY: OPEN REMOVE WRITE COMMIT Problem reported by recall11, recall12, recall14, recall20, recall22, recall40, recall42, recall48, recall50 of nfstest suite. This patch restores the handling of race condition in nfsd_file_do_acquire with unlink to that prior of the regression. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Signed-off-by: Dai Ngo Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/filecache.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 6b8706f23eaf..615ea8324911 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1098,8 +1098,6 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, * then unhash. */ if (status != nfs_ok || inode->i_nlink == 0) - status = nfserr_jukebox; - if (status != nfs_ok) nfsd_file_unhash(nf); clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); if (status == nfs_ok) From 8157832461bdae946485a17f7701c1215b85fd77 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 17 May 2023 12:26:44 -0400 Subject: [PATCH 1054/1565] nfsd: make a copy of struct iattr before calling notify_change [ Upstream commit d53d70084d27f56bcdf5074328f2c9ec861be596 ] notify_change can modify the iattr structure. In particular it can end up setting ATTR_MODE when ATTR_KILL_SUID is already set, causing a BUG() if the same iattr is passed to notify_change more than once. Make a copy of the struct iattr before calling notify_change. Reported-by: Zhi Li Link: https://bugzilla.redhat.com/show_bug.cgi?id=2207969 Tested-by: Zhi Li Fixes: 34b91dda7124 ("NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY") Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/vfs.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index abc682854507..542a1adbbf2a 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -545,7 +545,15 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, inode_lock(inode); for (retries = 1;;) { - host_err = __nfsd_setattr(dentry, iap); + struct iattr attrs; + + /* + * notify_change() can alter its iattr argument, making + * @iap unsuitable for submission multiple times. Make a + * copy for every loop iteration. + */ + attrs = *iap; + host_err = __nfsd_setattr(dentry, &attrs); if (host_err != -EAGAIN || !retries--) break; if (!nfsd_wait_for_delegreturn(rqstp, inode)) From b28b5c726e49e8020d7143b5beede75850639aa3 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 29 May 2023 14:35:55 +0300 Subject: [PATCH 1055/1565] nfsd: fix double fget() bug in __write_ports_addfd() [ Upstream commit c034203b6a9dae6751ef4371c18cb77983e30c28 ] The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter Reviewed-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 7 +------ include/linux/sunrpc/svcsock.h | 7 +++---- net/sunrpc/svcsock.c | 24 ++++++------------------ 3 files changed, 10 insertions(+), 28 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index c2577ee7ffb2..76a60e7a7509 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -714,16 +714,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred if (err != 0 || fd < 0) return -EINVAL; - if (svc_alien_sock(net, fd)) { - printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__); - return -EINVAL; - } - err = nfsd_create_serv(net); if (err != 0) return err; - err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); if (err >= 0 && !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index b7ac7fe68306..a366d3eb0531 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -57,10 +57,9 @@ int svc_recv(struct svc_rqst *, long); int svc_send(struct svc_rqst *); void svc_drop(struct svc_rqst *); void svc_sock_update_bufs(struct svc_serv *serv); -bool svc_alien_sock(struct net *net, int fd); -int svc_addsock(struct svc_serv *serv, const int fd, - char *name_return, const size_t len, - const struct cred *cred); +int svc_addsock(struct svc_serv *serv, struct net *net, + const int fd, char *name_return, const size_t len, + const struct cred *cred); void svc_init_xprt_sock(void); void svc_cleanup_xprt_sock(void); struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot); diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 90f6231d6ed6..cb0cfcd8a814 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1342,25 +1342,10 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, return svsk; } -bool svc_alien_sock(struct net *net, int fd) -{ - int err; - struct socket *sock = sockfd_lookup(fd, &err); - bool ret = false; - - if (!sock) - goto out; - if (sock_net(sock->sk) != net) - ret = true; - sockfd_put(sock); -out: - return ret; -} -EXPORT_SYMBOL_GPL(svc_alien_sock); - /** * svc_addsock - add a listener socket to an RPC service * @serv: pointer to RPC service to which to add a new listener + * @net: caller's network namespace * @fd: file descriptor of the new listener * @name_return: pointer to buffer to fill in with name of listener * @len: size of the buffer @@ -1370,8 +1355,8 @@ EXPORT_SYMBOL_GPL(svc_alien_sock); * Name is terminated with '\n'. On error, returns a negative errno * value. */ -int svc_addsock(struct svc_serv *serv, const int fd, char *name_return, - const size_t len, const struct cred *cred) +int svc_addsock(struct svc_serv *serv, struct net *net, const int fd, + char *name_return, const size_t len, const struct cred *cred) { int err = 0; struct socket *so = sockfd_lookup(fd, &err); @@ -1382,6 +1367,9 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return, if (!so) return err; + err = -EINVAL; + if (sock_net(so->sk) != net) + goto out; err = -EAFNOSUPPORT; if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6)) goto out; From b4417c53d4f90c6bd25b7778b1e956179fd36f0c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Sat, 3 Jun 2023 07:14:14 +1000 Subject: [PATCH 1056/1565] lockd: drop inappropriate svc_get() from locked_get() [ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ] The below-mentioned patch was intended to simplify refcounting on the svc_serv used by locked. The goal was to only ever have a single reference from the single thread. To that end we dropped a call to lockd_start_svc() (except when creating thread) which would take a reference, and dropped the svc_put(serv) that would drop that reference. Unfortunately we didn't also remove the svc_get() from lockd_create_svc() in the case where the svc_serv already existed. So after the patch: - on the first call the svc_serv was allocated and the one reference was given to the thread, so there are no extra references - on subsequent calls svc_get() was called so there is now an extra reference. This is clearly not consistent. The inconsistency is also clear in the current code in lockd_get() takes *two* references, one on nlmsvc_serv and one by incrementing nlmsvc_users. This clearly does not match lockd_put(). So: drop that svc_get() from lockd_get() (which used to be in lockd_create_svc(). Reported-by: Ido Schimmel Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()") Signed-off-by: NeilBrown Tested-by: Ido Schimmel Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/lockd/svc.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 59ef8a1f843f..5579e67da17d 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -355,7 +355,6 @@ static int lockd_get(void) int error; if (nlmsvc_serv) { - svc_get(nlmsvc_serv); nlmsvc_users++; return 0; } From 72f28b5ad0b5f81208df16e75d241f13924d0d43 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Jun 2023 10:13:39 -0400 Subject: [PATCH 1057/1565] NFSD: Add an nfsd4_encode_nfstime4() helper [ Upstream commit 262176798b18b12fd8ab84c94cfece0a6a652476 ] Clean up: de-duplicate some common code. Reviewed-by: Jeff Layton Acked-by: Tom Talpey Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index d62382dfc135..a81938c1e3ef 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -2541,6 +2541,20 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, return p; } +static __be32 nfsd4_encode_nfstime4(struct xdr_stream *xdr, + struct timespec64 *tv) +{ + __be32 *p; + + p = xdr_reserve_space(xdr, XDR_UNIT * 3); + if (!p) + return nfserr_resource; + + p = xdr_encode_hyper(p, (s64)tv->tv_sec); + *p = cpu_to_be32(tv->tv_nsec); + return nfs_ok; +} + /* * ctime (in NFSv4, time_metadata) is not writeable, and the client * doesn't really care what resolution could theoretically be stored by @@ -3346,11 +3360,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_encode_hyper(p, dummy64); } if (bmval1 & FATTR4_WORD1_TIME_ACCESS) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec); - *p++ = cpu_to_be32(stat.atime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.atime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_DELTA) { p = xdr_reserve_space(xdr, 12); @@ -3359,25 +3371,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = encode_time_delta(p, d_inode(dentry)); } if (bmval1 & FATTR4_WORD1_TIME_METADATA) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec); - *p++ = cpu_to_be32(stat.ctime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.ctime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_MODIFY) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec); - *p++ = cpu_to_be32(stat.mtime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.mtime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_TIME_CREATE) { - p = xdr_reserve_space(xdr, 12); - if (!p) - goto out_resource; - p = xdr_encode_hyper(p, (s64)stat.btime.tv_sec); - *p++ = cpu_to_be32(stat.btime.tv_nsec); + status = nfsd4_encode_nfstime4(xdr, &stat.btime); + if (status) + goto out; } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { u64 ino = stat.ino; From 8ef87fe6e87f754daa9d1b9f2cff71008f07da44 Mon Sep 17 00:00:00 2001 From: Tavian Barnes Date: Fri, 23 Jun 2023 17:09:06 -0400 Subject: [PATCH 1058/1565] nfsd: Fix creation time serialization order [ Upstream commit d7dbed457c2ef83709a2a2723a2d58de43623449 ] In nfsd4_encode_fattr(), TIME_CREATE was being written out after all other times. However, they should be written out in an order that matches the bit flags in bmval1, which in this case are #define FATTR4_WORD1_TIME_ACCESS (1UL << 15) #define FATTR4_WORD1_TIME_CREATE (1UL << 18) #define FATTR4_WORD1_TIME_DELTA (1UL << 19) #define FATTR4_WORD1_TIME_METADATA (1UL << 20) #define FATTR4_WORD1_TIME_MODIFY (1UL << 21) so TIME_CREATE should come second. I noticed this on a FreeBSD NFSv4.2 client, which supports creation times. On this client, file times were weirdly permuted. With this patch applied on the server, times looked normal on the client. Fixes: e377a3e698fb ("nfsd: Add support for the birth time attribute") Link: https://unix.stackexchange.com/q/749605/56202 Signed-off-by: Tavian Barnes Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4xdr.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index a81938c1e3ef..5a68c6286492 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3364,6 +3364,11 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, if (status) goto out; } + if (bmval1 & FATTR4_WORD1_TIME_CREATE) { + status = nfsd4_encode_nfstime4(xdr, &stat.btime); + if (status) + goto out; + } if (bmval1 & FATTR4_WORD1_TIME_DELTA) { p = xdr_reserve_space(xdr, 12); if (!p) @@ -3380,11 +3385,6 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, if (status) goto out; } - if (bmval1 & FATTR4_WORD1_TIME_CREATE) { - status = nfsd4_encode_nfstime4(xdr, &stat.btime); - if (status) - goto out; - } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { u64 ino = stat.ino; From 7229200f68662660bb4d55f19247eaf3c79a4217 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 3 Jun 2024 10:35:02 -0400 Subject: [PATCH 1059/1565] nfsd: don't allow nfsd threads to be signalled. [ Upstream commit 3903902401451b1cd9d797a8c79769eb26ac7fe5 ] The original implementation of nfsd used signals to stop threads during shutdown. In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads internally it if was asked to run "0" threads. After this user-space transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to threads was no longer an important part of the API. In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the use of signals for stopping threads, using kthread_stop() instead. This patch makes the "obvious" next step and removes the ability to signal nfsd threads - or any svc threads. nfsd stops allowing signals and we don't check for their delivery any more. This will allow for some simplification in later patches. A change worth noting is in nfsd4_ssc_setup_dul(). There was previously a signal_pending() check which would only succeed when the thread was being shut down. It should really have tested kthread_should_stop() as well. Now it just does the latter, not the former. Signed-off-by: NeilBrown Reviewed-by: Jeff Layton [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfs/callback.c | 11 ++--------- fs/nfsd/nfs4proc.c | 8 ++++---- fs/nfsd/nfssvc.c | 12 ------------ net/sunrpc/svc_xprt.c | 20 ++++++++------------ 4 files changed, 14 insertions(+), 37 deletions(-) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 456af7d230cf..8fe143cad4a2 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -80,9 +80,6 @@ nfs4_callback_svc(void *vrqstp) set_freezable(); while (!kthread_freezable_should_stop(NULL)) { - - if (signal_pending(current)) - flush_signals(current); /* * Listen for a request on the socket */ @@ -112,11 +109,7 @@ nfs41_callback_svc(void *vrqstp) set_freezable(); while (!kthread_freezable_should_stop(NULL)) { - - if (signal_pending(current)) - flush_signals(current); - - prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE); + prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_IDLE); spin_lock_bh(&serv->sv_cb_lock); if (!list_empty(&serv->sv_cb_list)) { req = list_first_entry(&serv->sv_cb_list, @@ -131,7 +124,7 @@ nfs41_callback_svc(void *vrqstp) } else { spin_unlock_bh(&serv->sv_cb_lock); if (!kthread_should_stop()) - schedule(); + freezable_schedule(); finish_wait(&serv->sv_cb_waitq, &wq); } } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 9c51c10bcf08..8560a11daa47 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include @@ -1313,13 +1314,12 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, /* found a match */ if (ni->nsui_busy) { /* wait - and try again */ - prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, - TASK_INTERRUPTIBLE); + prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, TASK_IDLE); spin_unlock(&nn->nfsd_ssc_lock); /* allow 20secs for mount/unmount for now - revisit */ - if (signal_pending(current) || - (schedule_timeout(20*HZ) == 0)) { + if (kthread_should_stop() || + (freezable_schedule_timeout(20*HZ) == 0)) { finish_wait(&nn->nfsd_ssc_waitq, &wait); kfree(work); return nfserr_eagain; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index a0ecec54d3d7..8063fab2c027 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -952,15 +952,6 @@ nfsd(void *vrqstp) current->fs->umask = 0; - /* - * thread is spawned with all signals set to SIG_IGN, re-enable - * the ones that will bring down the thread - */ - allow_signal(SIGKILL); - allow_signal(SIGHUP); - allow_signal(SIGINT); - allow_signal(SIGQUIT); - atomic_inc(&nfsdstats.th_cnt); set_freezable(); @@ -985,9 +976,6 @@ nfsd(void *vrqstp) validate_process_creds(); } - /* Clear signals before calling svc_exit_thread() */ - flush_signals(current); - atomic_dec(&nfsdstats.th_cnt); out: diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 000b737784bd..d1eacf3358b8 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -674,13 +674,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) while (rqstp->rq_pages[i] == NULL) { struct page *p = alloc_page(GFP_KERNEL); if (!p) { - set_current_state(TASK_INTERRUPTIBLE); - if (signalled() || kthread_should_stop()) { + set_current_state(TASK_IDLE); + if (kthread_should_stop()) { set_current_state(TASK_RUNNING); return -EINTR; } - schedule_timeout(msecs_to_jiffies(500)); } + freezable_schedule_timeout(msecs_to_jiffies(500)); rqstp->rq_pages[i] = p; } rqstp->rq_page_end = &rqstp->rq_pages[i]; @@ -713,7 +713,7 @@ rqst_should_sleep(struct svc_rqst *rqstp) return false; /* are we shutting down? */ - if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) return false; /* are we freezing? */ @@ -735,18 +735,14 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (rqstp->rq_xprt) goto out_found; - /* - * We have to be able to interrupt this wait - * to bring down the daemons ... - */ - set_current_state(TASK_INTERRUPTIBLE); + set_current_state(TASK_IDLE); smp_mb__before_atomic(); clear_bit(SP_CONGESTED, &pool->sp_flags); clear_bit(RQ_BUSY, &rqstp->rq_flags); smp_mb__after_atomic(); if (likely(rqst_should_sleep(rqstp))) - time_left = schedule_timeout(timeout); + time_left = freezable_schedule_timeout(timeout); else __set_current_state(TASK_RUNNING); @@ -761,7 +757,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (!time_left) atomic_long_inc(&pool->sp_stats.threads_timedout); - if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) return ERR_PTR(-EINTR); return ERR_PTR(-EAGAIN); out_found: @@ -860,7 +856,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) try_to_freeze(); cond_resched(); err = -EINTR; - if (signalled() || kthread_should_stop()) + if (kthread_should_stop()) goto out; xprt = svc_get_next_xprt(rqstp, timeout); From 987c0e102874a27f40c06f4c94853e9979c9ee78 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 31 Jul 2023 16:48:31 +1000 Subject: [PATCH 1060/1565] nfsd: Simplify code around svc_exit_thread() call in nfsd() [ Upstream commit 18e4cf915543257eae2925671934937163f5639b ] Previously a thread could exit asynchronously (due to a signal) so some care was needed to hold nfsd_mutex over the last svc_put() call. Now a thread can only exit when svc_set_num_threads() is called, and this is always called under nfsd_mutex. So no care is needed. Not only is the mutex held when a thread exits now, but the svc refcount is elevated, so the svc_put() in svc_exit_thread() will never be a final put, so the mutex isn't even needed at this point in the code. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 23 ----------------------- include/linux/sunrpc/svc.h | 13 ------------- 2 files changed, 36 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8063fab2c027..8907dba22c3f 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -979,31 +979,8 @@ nfsd(void *vrqstp) atomic_dec(&nfsdstats.th_cnt); out: - /* Take an extra ref so that the svc_put in svc_exit_thread() - * doesn't call svc_destroy() - */ - svc_get(nn->nfsd_serv); - /* Release the thread */ svc_exit_thread(rqstp); - - /* We need to drop a ref, but may not drop the last reference - * without holding nfsd_mutex, and we cannot wait for nfsd_mutex as that - * could deadlock with nfsd_shutdown_threads() waiting for us. - * So three options are: - * - drop a non-final reference, - * - get the mutex without waiting - * - sleep briefly andd try the above again - */ - while (!svc_put_not_last(nn->nfsd_serv)) { - if (mutex_trylock(&nfsd_mutex)) { - nfsd_put(net); - mutex_unlock(&nfsd_mutex); - break; - } - msleep(20); - } - return 0; } diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 5cf6543d3c8e..1cf7a7799cc0 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -123,19 +123,6 @@ static inline void svc_put(struct svc_serv *serv) kref_put(&serv->sv_refcnt, svc_destroy); } -/** - * svc_put_not_last - decrement non-final reference count on SUNRPC serv - * @serv: the svc_serv to have count decremented - * - * Returns: %true is refcount was decremented. - * - * If the refcount is 1, it is not decremented and instead failure is reported. - */ -static inline bool svc_put_not_last(struct svc_serv *serv) -{ - return refcount_dec_not_one(&serv->sv_refcnt.refcount); -} - /* * Maximum payload size supported by a kernel RPC server. * This is use to determine the max number of pages nfsd is From d31cd25f5501d28dfe654856647253bb6eedd724 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 31 Jul 2023 16:48:32 +1000 Subject: [PATCH 1061/1565] nfsd: separate nfsd_last_thread() from nfsd_put() [ Upstream commit 9f28a971ee9fdf1bf8ce8c88b103f483be610277 ] Now that the last nfsd thread is stopped by an explicit act of calling svc_set_num_threads() with a count of zero, we only have a limited number of places that can happen, and don't need to call nfsd_last_thread() in nfsd_put() So separate that out and call it at the two places where the number of threads is set to zero. Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all() into nfsd_last_thread(), as they are really part of the same action. nfsd_put() is now a thin wrapper around svc_put(), so make it a static inline. nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of places we have to use svc_put() instead. Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsd.h | 7 ++++++- fs/nfsd/nfssvc.c | 52 ++++++++++++++++++------------------------------ 2 files changed, 25 insertions(+), 34 deletions(-) diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index fa0144a74267..867dcfd64d42 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -96,7 +96,12 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net); -void nfsd_put(struct net *net); +static inline void nfsd_put(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + svc_put(nn->nfsd_serv); +} bool i_am_nfsd(void); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 8907dba22c3f..ee5713fca187 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -529,9 +529,14 @@ static struct notifier_block nfsd_inet6addr_notifier = { /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0); -static void nfsd_last_thread(struct svc_serv *serv, struct net *net) +static void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv = nn->nfsd_serv; + + spin_lock(&nfsd_notifier_lock); + nn->nfsd_serv = NULL; + spin_unlock(&nfsd_notifier_lock); /* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { @@ -541,6 +546,8 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) #endif } + svc_xprt_destroy_all(serv, net); + /* * write_ports can create the server without actually starting * any threads--if we get shut down before any threads are @@ -631,7 +638,8 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ svc_set_num_threads(serv, NULL, 0); - nfsd_put(net); + nfsd_last_thread(net); + svc_put(serv); mutex_unlock(&nfsd_mutex); } @@ -661,9 +669,6 @@ int nfsd_create_serv(struct net *net) serv->sv_maxconn = nn->max_connections; error = svc_bind(serv, net); if (error < 0) { - /* NOT nfsd_put() as notifiers (see below) haven't - * been set up yet. - */ svc_put(serv); return error; } @@ -706,29 +711,6 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; } -/* This is the callback for kref_put() below. - * There is no code here as the first thing to be done is - * call svc_shutdown_net(), but we cannot get the 'net' from - * the kref. So do all the work when kref_put returns true. - */ -static void nfsd_noop(struct kref *ref) -{ -} - -void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - if (kref_put(&nn->nfsd_serv->sv_refcnt, nfsd_noop)) { - svc_xprt_destroy_all(nn->nfsd_serv, net); - nfsd_last_thread(nn->nfsd_serv, net); - svc_destroy(&nn->nfsd_serv->sv_refcnt); - spin_lock(&nfsd_notifier_lock); - nn->nfsd_serv = NULL; - spin_unlock(&nfsd_notifier_lock); - } -} - int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) { int i = 0; @@ -779,7 +761,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (err) break; } - nfsd_put(net); + svc_put(nn->nfsd_serv); return err; } @@ -794,6 +776,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) int error; bool nfsd_up_before; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv; mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); @@ -813,22 +796,25 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) goto out; nfsd_up_before = nn->nfsd_net_up; + serv = nn->nfsd_serv; error = nfsd_startup_net(net, cred); if (error) goto out_put; - error = svc_set_num_threads(nn->nfsd_serv, NULL, nrservs); + error = svc_set_num_threads(serv, NULL, nrservs); if (error) goto out_shutdown; - error = nn->nfsd_serv->sv_nrthreads; + error = serv->sv_nrthreads; + if (error == 0) + nfsd_last_thread(net); out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); out_put: /* Threads now hold service active */ if (xchg(&nn->keep_active, 0)) - nfsd_put(net); - nfsd_put(net); + svc_put(serv); + svc_put(serv); out: mutex_unlock(&nfsd_mutex); return error; From 3add01e067483833c302c1815b5df02491832fb9 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 25 Aug 2023 15:04:23 -0400 Subject: [PATCH 1062/1565] Documentation: Add missing documentation for EXPORT_OP flags [ Upstream commit b38a6023da6a12b561f0421c6a5a1f7624a1529c ] The commits that introduced these flags neglected to update the Documentation/filesystems/nfs/exporting.rst file. Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- Documentation/filesystems/nfs/exporting.rst | 26 +++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 0e98edd353b5..6f59a364f84c 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -215,3 +215,29 @@ following flags are defined: This flag causes nfsd to close any open files for this inode _before_ calling into the vfs to do an unlink or a rename that would replace an existing file. + + EXPORT_OP_REMOTE_FS - Backing storage for this filesystem is remote + PF_LOCAL_THROTTLE exists for loopback NFSD, where a thread needs to + write to one bdi (the final bdi) in order to free up writes queued + to another bdi (the client bdi). Such threads get a private balance + of dirty pages so that dirty pages for the client bdi do not imact + the daemon writing to the final bdi. For filesystems whose durable + storage is not local (such as exported NFS filesystems), this + constraint has negative consequences. EXPORT_OP_REMOTE_FS enables + an export to disable writeback throttling. + + EXPORT_OP_NOATOMIC_ATTR - Filesystem does not update attributes atomically + EXPORT_OP_NOATOMIC_ATTR indicates that the exported filesystem + cannot provide the semantics required by the "atomic" boolean in + NFSv4's change_info4. This boolean indicates to a client whether the + returned before and after change attributes were obtained atomically + with the respect to the requested metadata operation (UNLINK, + OPEN/CREATE, MKDIR, etc). + + EXPORT_OP_FLUSH_ON_CLOSE - Filesystem flushes file data on close(2) + On most filesystems, inodes can remain under writeback after the + file is closed. NFSD relies on client activity or local flusher + threads to handle writeback. Certain filesystems, such as NFS, flush + all of an inode's dirty data on last close. Exports that behave this + way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip + waiting for writeback when closing such files. From e35cb663a462918a36183cec73cf55dc8963ac0c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 12 Sep 2023 11:25:00 +1000 Subject: [PATCH 1063/1565] NFSD: fix possible oops when nfsd/pool_stats is closed. [ Upstream commit 88956eabfdea7d01d550535af120d4ef265b1d02 ] If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then when the file is closed a NULL pointer is dereferenced. This is because nfsd_pool_stats_release() assumes that the pointer to the svc_serv cannot become NULL while a reference is held. This used to be the case but a recent patch split nfsd_last_thread() out from nfsd_put(), and clearing the pointer is done in nfsd_last_thread(). This is easily reproduced by running rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats Fortunately nfsd_pool_stats_release() has easy access to the svc_serv pointer, and so can call svc_put() on it directly. Fixes: 9f28a971ee9f ("nfsd: separate nfsd_last_thread() from nfsd_put()") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfssvc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index ee5713fca187..2a1dd580dfb9 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -1084,11 +1084,12 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file) int nfsd_pool_stats_release(struct inode *inode, struct file *file) { + struct seq_file *seq = file->private_data; + struct svc_serv *serv = seq->private; int ret = seq_release(inode, file); - struct net *net = inode->i_sb->s_fs_info; mutex_lock(&nfsd_mutex); - nfsd_put(net); + svc_put(serv); mutex_unlock(&nfsd_mutex); return ret; } From 838a602db75d017c891af8db2e2a0a438f7bd70e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 15 Dec 2023 11:56:31 +1100 Subject: [PATCH 1064/1565] nfsd: call nfsd_last_thread() before final nfsd_put() [ Upstream commit 2a501f55cd641eb4d3c16a2eab0d678693fac663 ] If write_ports_addfd or write_ports_addxprt fail, they call nfsd_put() without calling nfsd_last_thread(). This leaves nn->nfsd_serv pointing to a structure that has been freed. So remove 'static' from nfsd_last_thread() and call it when the nfsd_serv is about to be destroyed. Fixes: ec52361df99b ("SUNRPC: stop using ->sv_nrthreads as a refcount") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Cc: Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 9 +++++++-- fs/nfsd/nfsd.h | 1 + fs/nfsd/nfssvc.c | 2 +- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 76a60e7a7509..eec442edb655 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -720,8 +720,10 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + else if (err >= 0 && + !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) svc_get(nn->nfsd_serv); nfsd_put(net); @@ -771,6 +773,9 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr svc_xprt_put(xprt); } out_err: + if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + nfsd_last_thread(net); + nfsd_put(net); return err; } diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 867dcfd64d42..3796015dc765 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -138,6 +138,7 @@ int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change); int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change); void nfsd_reset_versions(struct nfsd_net *nn); int nfsd_create_serv(struct net *net); +void nfsd_last_thread(struct net *net); extern int nfsd_max_blksize; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 2a1dd580dfb9..3d4fd40c987b 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -529,7 +529,7 @@ static struct notifier_block nfsd_inet6addr_notifier = { /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0); -static void nfsd_last_thread(struct net *net) +void nfsd_last_thread(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv = nn->nfsd_serv; From ca791e1a31cfa36c51f5c895258828f890edbc2e Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 3 Jan 2024 08:36:52 -0500 Subject: [PATCH 1065/1565] nfsd: drop the nfsd_put helper [ Upstream commit 64e6304169f1e1f078e7f0798033f80a7fb0ea46 ] It's not safe to call nfsd_put once nfsd_last_thread has been called, as that function will zero out the nn->nfsd_serv pointer. Drop the nfsd_put helper altogether and open-code the svc_put in its callers instead. That allows us to not be reliant on the value of that pointer when handling an error. Fixes: 2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()") Reported-by: Zhi Li Cc: NeilBrown Signed-off-by: Jeffrey Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsctl.c | 31 +++++++++++++++++-------------- fs/nfsd/nfsd.h | 7 ------- 2 files changed, 17 insertions(+), 21 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index eec442edb655..f77f00c93172 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -709,6 +709,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred char *mesg = buf; int fd, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv; err = get_int(&mesg, &fd); if (err != 0 || fd < 0) @@ -718,15 +719,15 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred if (err != 0) return err; - err = svc_addsock(nn->nfsd_serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + serv = nn->nfsd_serv; + err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); - if (err < 0 && !nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (err < 0 && !serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net); - else if (err >= 0 && - !nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + else if (err >= 0 && !serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv); - nfsd_put(net); + svc_put(serv); return err; } @@ -740,6 +741,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr struct svc_xprt *xprt; int port, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct svc_serv *serv; if (sscanf(buf, "%15s %5u", transport, &port) != 2) return -EINVAL; @@ -751,32 +753,33 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err != 0) return err; - err = svc_xprt_create(nn->nfsd_serv, transport, net, + serv = nn->nfsd_serv; + err = svc_xprt_create(serv, transport, net, PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err; - err = svc_xprt_create(nn->nfsd_serv, transport, net, + err = svc_xprt_create(serv, transport, net, PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close; - if (!nn->nfsd_serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(nn->nfsd_serv); + if (!serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) + svc_get(serv); - nfsd_put(net); + svc_put(serv); return 0; out_close: - xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); + xprt = svc_find_xprt(serv, transport, net, PF_INET, port); if (xprt != NULL) { svc_xprt_close(xprt); svc_xprt_put(xprt); } out_err: - if (!nn->nfsd_serv->sv_nrthreads && !nn->keep_active) + if (!serv->sv_nrthreads && !nn->keep_active) nfsd_last_thread(net); - nfsd_put(net); + svc_put(serv); return err; } diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 3796015dc765..013bfa24ced2 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -96,13 +96,6 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net); -static inline void nfsd_put(struct net *net) -{ - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - svc_put(nn->nfsd_serv); -} - bool i_am_nfsd(void); struct nfsdfs_client { From 99fb654d01dc3f08b5905c663ad6c89a9d83302f Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 22 Jan 2024 14:58:16 +1100 Subject: [PATCH 1066/1565] nfsd: fix RELEASE_LOCKOWNER [ Upstream commit edcf9725150e42beeca42d085149f4c88fa97afd ] The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep. First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour. If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error. The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request. When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID. So clearly it is harmful to allow a false positive, which testing so_count allows. The test is nonsense because ... well... it doesn't mean anything. so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks. When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be. In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail. In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success. The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe. So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'. Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e8e642e5ec8f..c073cc23c528 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7836,14 +7836,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) { struct file_lock *fl; int status = false; - struct nfsd_file *nf = find_any_file(fp); + struct nfsd_file *nf; struct inode *inode; struct file_lock_context *flctx; + spin_lock(&fp->fi_lock); + nf = find_any_file_locked(fp); if (!nf) { /* Any valid lock stateid should have some sort of access */ WARN_ON_ONCE(1); - return status; + goto out; } inode = locks_inode(nf->nf_file); @@ -7859,7 +7861,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) } spin_unlock(&flctx->flc_lock); } - nfsd_file_put(nf); +out: + spin_unlock(&fp->fi_lock); return status; } @@ -7869,10 +7872,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) * @cstate: NFSv4 COMPOUND state * @u: RELEASE_LOCKOWNER arguments * - * The lockowner's so_count is bumped when a lock record is added - * or when copying a conflicting lock. The latter case is brief, - * but can lead to fleeting false positives when looking for - * locks-in-use. + * Check if theree are any locks still held and if not - free the lockowner + * and any lock state that is owned. * * Return values: * %nfs_ok: lockowner released or not found @@ -7908,10 +7909,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, spin_unlock(&clp->cl_lock); return nfs_ok; } - if (atomic_read(&lo->lo_owner.so_count) != 2) { - spin_unlock(&clp->cl_lock); - nfs4_put_stateowner(&lo->lo_owner); - return nfserr_locks_held; + + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { + if (check_for_locks(stp->st_stid.sc_file, lo)) { + spin_unlock(&clp->cl_lock); + nfs4_put_stateowner(&lo->lo_owner); + return nfserr_locks_held; + } } unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) { From feb3352af74273424cd4ae68650196a73a5a9d54 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 5 Feb 2024 13:22:39 +1100 Subject: [PATCH 1067/1565] nfsd: don't take fi_lock in nfsd_break_deleg_cb() [ Upstream commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 ] A recent change to check_for_locks() changed it to take ->flc_lock while holding ->fi_lock. This creates a lock inversion (reported by lockdep) because there is a case where ->fi_lock is taken while holding ->flc_lock. ->flc_lock is held across ->fl_lmops callbacks, and nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However it doesn't need to. Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list and so needed the lock. Since then it doesn't walk the list and doesn't need the lock. Two actions are performed under the lock. One is to call nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on the nfs4_file at all, so don't need the lock. The other is to set ->fi_had_conflict which is in the nfs4_file. This field is only ever set here (except when initialised to false) so there is no possible problem will multiple threads racing when setting it. The field is tested twice in nfs4_set_delegation(). The first test does not hold a lock and is documented as an opportunistic optimisation, so it doesn't impose any need to hold ->fi_lock while setting ->fi_had_conflict. The second test in nfs4_set_delegation() *is* make under ->fi_lock, so removing the locking when ->fi_had_conflict is set could make a change. The change could only be interesting if ->fi_had_conflict tested as false even though nfsd_break_one_deleg() ran before ->fi_lock was unlocked. i.e. while hash_delegation_locked() was running. As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb() there can be no importance to this interaction. So this patch removes the locking from nfsd_break_one_deleg() and moves the final test on ->fi_had_conflict out of the locked region to make it clear that locking isn't important to the test. It is still tested *after* vfs_setlease() has succeeded. This might be significant and as vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called under ->flc_lock this "after" is a true ordering provided by a spinlock. Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER") Signed-off-by: NeilBrown Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c073cc23c528..165acd8138ab 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4946,10 +4946,8 @@ nfsd_break_deleg_cb(struct file_lock *fl) */ fl->fl_break_time = 0; - spin_lock(&fp->fi_lock); fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); - spin_unlock(&fp->fi_lock); return false; } @@ -5537,12 +5535,13 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (status) goto out_unlock; + status = -EAGAIN; + if (fp->fi_had_conflict) + goto out_unlock; + spin_lock(&state_lock); spin_lock(&fp->fi_lock); - if (fp->fi_had_conflict) - status = -EAGAIN; - else - status = hash_delegation_locked(dp, fp); + status = hash_delegation_locked(dp, fp); spin_unlock(&fp->fi_lock); spin_unlock(&state_lock); From a1a153fc73cc9184bcdfcd3a9f619705e35e53b7 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 31 Jan 2024 11:17:40 +1100 Subject: [PATCH 1068/1565] nfsd: don't call locks_release_private() twice concurrently [ Upstream commit 05eda6e75773592760285e10ac86c56d683be17f ] It is possible for free_blocked_lock() to be called twice concurrently, once from nfsd4_lock() and once from nfsd4_release_lockowner() calling remove_blocked_locks(). This is why a kref was added. It is perfectly safe for locks_delete_block() and kref_put() to be called in parallel as they use locking or atomicity respectively as protection. However locks_release_private() has no locking. It is safe for it to be called twice sequentially, but not concurrently. This patch moves that call from free_blocked_lock() where it could race with itself, to free_nbl() where it cannot. This will slightly delay the freeing of private info or release of the owner - but not by much. It is arguably more natural for this freeing to happen in free_nbl() where the structure itself is freed. This bug was found by code inspection - it has not been seen in practice. Fixes: 47446d74f170 ("nfsd4: add refcount for nfsd4_blocked_lock") Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 165acd8138ab..228560f3fd0e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -318,6 +318,7 @@ free_nbl(struct kref *kref) struct nfsd4_blocked_lock *nbl; nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); + locks_release_private(&nbl->nbl_lock); kfree(nbl); } @@ -325,7 +326,6 @@ static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); - locks_release_private(&nbl->nbl_lock); kref_put(&nbl->nbl_kref, free_nbl); } From 9444ce5cd48802188b73315dfde9e341fc3f75a2 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 15 Feb 2024 20:24:50 -0500 Subject: [PATCH 1069/1565] nfsd: Fix a regression in nfsd_setattr() [ Upstream commit 6412e44c40aaf8f1d7320b2099c5bdd6cb9126ac ] Commit bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") broke the NFSv3 pre/post op attributes behaviour when doing a SETATTR rpc call by stripping out the calls to fh_fill_pre_attrs() and fh_fill_post_attrs(). Fixes: bb4d53d66e4b ("NFSD: use (un)lock_inode instead of fh_(un)lock for file operations") Signed-off-by: Trond Myklebust Reviewed-by: Jeff Layton Reviewed-by: NeilBrown Message-ID: <20240216012451.22725-1-trondmy@kernel.org> [ cel: adjusted to apply to v5.10.y ] Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4proc.c | 4 ++++ fs/nfsd/vfs.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 8560a11daa47..2c0de247083a 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -1107,6 +1107,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, }; struct inode *inode; __be32 status = nfs_ok; + bool save_no_wcc; int err; if (setattr->sa_iattr.ia_valid & ATTR_SIZE) { @@ -1132,8 +1133,11 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; + save_no_wcc = cstate->current_fh.fh_no_wcc; + cstate->current_fh.fh_no_wcc = true; status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, 0, (time64_t)0); + cstate->current_fh.fh_no_wcc = save_no_wcc; if (!status) status = nfserrno(attrs.na_labelerr); if (!status) diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 542a1adbbf2a..0ea05ddff0d0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -486,7 +486,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err; + int host_err = 0; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); int retries; @@ -544,6 +544,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, } inode_lock(inode); + fh_fill_pre_attrs(fhp); for (retries = 1;;) { struct iattr attrs; @@ -569,13 +570,14 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, attr->na_dpacl); + fh_fill_post_attrs(fhp); inode_unlock(inode); if (size_change) put_write_access(inode); out: if (!host_err) host_err = commit_metadata(fhp); - return nfserrno(host_err); + return err != 0 ? err : nfserrno(host_err); } #if defined(CONFIG_NFSD_V4) From 3a3877de44342d0a09216dfbe674a404e8f5e96f Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 21 Jun 2024 14:54:16 +0200 Subject: [PATCH 1070/1565] Linux 5.10.220 Link: https://lore.kernel.org/r/20240618123407.280171066@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Pavel Machek (CIP) Tested-by: Dominique Martinet Tested-by: Florian Fainelli Tested-by: Linux Kernel Functional Testing Tested-by: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 3b36b77589f2..9304408d8ace 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 219 +SUBLEVEL = 220 EXTRAVERSION = NAME = Dare mighty things From 8f48a7f8b929b346b6a23212dec5a0e14d5999fe Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Mon, 20 May 2024 20:57:37 -0400 Subject: [PATCH 1071/1565] tracing/selftests: Fix kprobe event name test for .isra. functions commit 23a4b108accc29a6125ed14de4a044689ffeda78 upstream. The kprobe_eventname.tc test checks if a function with .isra. can have a kprobe attached to it. It loops through the kallsyms file for all the functions that have the .isra. name, and checks if it exists in the available_filter_functions file, and if it does, it uses it to attach a kprobe to it. The issue is that kprobes can not attach to functions that are listed more than once in available_filter_functions. With the latest kernel, the function that is found is: rapl_event_update.isra.0 # grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions rapl_event_update.isra.0 rapl_event_update.isra.0 It is listed twice. This causes the attached kprobe to it to fail which in turn fails the test. Instead of just picking the function function that is found in available_filter_functions, pick the first one that is listed only once in available_filter_functions. Cc: stable@vger.kernel.org Fixes: 604e3548236d ("selftests/ftrace: Select an existing function in kprobe_eventname test") Signed-off-by: Steven Rostedt (Google) Acked-by: Masami Hiramatsu (Google) Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- .../testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc index 1f6981ef7afa..ba19b81cef39 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc @@ -30,7 +30,8 @@ find_dot_func() { fi grep " [tT] .*\.isra\..*" /proc/kallsyms | cut -f 3 -d " " | while read f; do - if grep -s $f available_filter_functions; then + cnt=`grep -s $f available_filter_functions | wc -l`; + if [ $cnt -eq 1 ]; then echo $f break fi From f64d566f4332f782e3e060d377a93d8b34591522 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Tue, 28 May 2024 15:28:52 +0900 Subject: [PATCH 1072/1565] null_blk: Print correct max open zones limit in null_init_zoned_dev() commit 233e27b4d21c3e44eb863f03e566d3a22e81a7ae upstream. When changing the maximum number of open zones, print that number instead of the total number of zones. Fixes: dc4d137ee3b7 ("null_blk: add support for max open/active zone limit for zoned devices") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Reviewed-by: Niklas Cassel Link: https://lore.kernel.org/r/20240528062852.437599-1-dlemoal@kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/block/null_blk/zoned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c index 41220ce59659..2bad11424734 100644 --- a/drivers/block/null_blk/zoned.c +++ b/drivers/block/null_blk/zoned.c @@ -81,7 +81,7 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q) if (dev->zone_max_active && dev->zone_max_open > dev->zone_max_active) { dev->zone_max_open = dev->zone_max_active; pr_info("changed the maximum number of open zones to %u\n", - dev->nr_zones); + dev->zone_max_open); } else if (dev->zone_max_open >= dev->nr_zones - dev->zone_nr_conv) { dev->zone_max_open = 0; pr_info("zone_max_open limit disabled, limit >= zone count\n"); From 7518e20a189f8659b8b83969db4d33a4068fcfc3 Mon Sep 17 00:00:00 2001 From: Nicolas Escande Date: Tue, 28 May 2024 16:26:05 +0200 Subject: [PATCH 1073/1565] wifi: mac80211: mesh: Fix leak of mesh_preq_queue objects [ Upstream commit b7d7f11a291830fdf69d3301075dd0fb347ced84 ] The hwmp code use objects of type mesh_preq_queue, added to a list in ieee80211_if_mesh, to keep track of mpath we need to resolve. If the mpath gets deleted, ex mesh interface is removed, the entries in that list will never get cleaned. Fix this by flushing all corresponding items of the preq_queue in mesh_path_flush_pending(). This should take care of KASAN reports like this: unreferenced object 0xffff00000668d800 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419552 (age 1836.444s) hex dump (first 32 bytes): 00 1f 05 09 00 00 ff ff 00 d5 68 06 00 00 ff ff ..........h..... 8e 97 ea eb 3e b8 01 00 00 00 00 00 00 00 00 00 ....>........... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20 unreferenced object 0xffff000009051f00 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419553 (age 1836.440s) hex dump (first 32 bytes): 90 d6 92 0d 00 00 ff ff 00 d8 68 06 00 00 ff ff ..........h..... 36 27 92 e4 02 e0 01 00 00 58 79 06 00 00 ff ff 6'.......Xy..... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20 Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Signed-off-by: Nicolas Escande Link: https://msgid.link/20240528142605.1060566-1-nico.escande@gmail.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/mesh_pathtbl.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c index d936ef0c17a3..72ecce377d17 100644 --- a/net/mac80211/mesh_pathtbl.c +++ b/net/mac80211/mesh_pathtbl.c @@ -723,10 +723,23 @@ void mesh_path_discard_frame(struct ieee80211_sub_if_data *sdata, */ void mesh_path_flush_pending(struct mesh_path *mpath) { + struct ieee80211_sub_if_data *sdata = mpath->sdata; + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct mesh_preq_queue *preq, *tmp; struct sk_buff *skb; while ((skb = skb_dequeue(&mpath->frame_queue)) != NULL) mesh_path_discard_frame(mpath->sdata, skb); + + spin_lock_bh(&ifmsh->mesh_preq_queue_lock); + list_for_each_entry_safe(preq, tmp, &ifmsh->preq_queue.list, list) { + if (ether_addr_equal(mpath->dst, preq->dst)) { + list_del(&preq->list); + kfree(preq); + --ifmsh->preq_queue_len; + } + } + spin_unlock_bh(&ifmsh->mesh_preq_queue_lock); } /** From e7e916d693dcb5a297f40312600a82475f2e63bc Mon Sep 17 00:00:00 2001 From: Remi Pommarel Date: Wed, 29 May 2024 08:57:53 +0200 Subject: [PATCH 1074/1565] wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup() [ Upstream commit 44c06bbde6443de206b30f513100b5670b23fc5e ] The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from softirq context. However using only spin_lock() to get sta->ps_lock in ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to take this same lock ending in deadlock. Below is an example of rcu stall that arises in such situation. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-....: (42413413 ticks this GP) idle=b154/1/0x4000000000000000 softirq=1763/1765 fqs=21206996 rcu: (t=42586894 jiffies g=2057 q=362405 ncpus=4) CPU: 2 PID: 719 Comm: wpa_supplicant Tainted: G W 6.4.0-02158-g1b062f552873 #742 Hardware name: RPT (r1) (DT) pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : queued_spin_lock_slowpath+0x58/0x2d0 lr : invoke_tx_handlers_early+0x5b4/0x5c0 sp : ffff00001ef64660 x29: ffff00001ef64660 x28: ffff000009bc1070 x27: ffff000009bc0ad8 x26: ffff000009bc0900 x25: ffff00001ef647a8 x24: 0000000000000000 x23: ffff000009bc0900 x22: ffff000009bc0900 x21: ffff00000ac0e000 x20: ffff00000a279e00 x19: ffff00001ef646e8 x18: 0000000000000000 x17: ffff800016468000 x16: ffff00001ef608c0 x15: 0010533c93f64f80 x14: 0010395c9faa3946 x13: 0000000000000000 x12: 00000000fa83b2da x11: 000000012edeceea x10: ffff0000010fbe00 x9 : 0000000000895440 x8 : 000000000010533c x7 : ffff00000ad8b740 x6 : ffff00000c350880 x5 : 0000000000000007 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff00000ac0e0e8 Call trace: queued_spin_lock_slowpath+0x58/0x2d0 ieee80211_tx+0x80/0x12c ieee80211_tx_pending+0x110/0x278 tasklet_action_common.constprop.0+0x10c/0x144 tasklet_action+0x20/0x28 _stext+0x11c/0x284 ____do_softirq+0xc/0x14 call_on_irq_stack+0x24/0x34 do_softirq_own_stack+0x18/0x20 do_softirq+0x74/0x7c __local_bh_enable_ip+0xa0/0xa4 _ieee80211_wake_txqs+0x3b0/0x4b8 __ieee80211_wake_queue+0x12c/0x168 ieee80211_add_pending_skbs+0xec/0x138 ieee80211_sta_ps_deliver_wakeup+0x2a4/0x480 ieee80211_mps_sta_status_update.part.0+0xd8/0x11c ieee80211_mps_sta_status_update+0x18/0x24 sta_apply_parameters+0x3bc/0x4c0 ieee80211_change_station+0x1b8/0x2dc nl80211_set_station+0x444/0x49c genl_family_rcv_msg_doit.isra.0+0xa4/0xfc genl_rcv_msg+0x1b0/0x244 netlink_rcv_skb+0x38/0x10c genl_rcv+0x34/0x48 netlink_unicast+0x254/0x2bc netlink_sendmsg+0x190/0x3b4 ____sys_sendmsg+0x1e8/0x218 ___sys_sendmsg+0x68/0x8c __sys_sendmsg+0x44/0x84 __arm64_sys_sendmsg+0x20/0x28 do_el0_svc+0x6c/0xe8 el0_svc+0x14/0x48 el0t_64_sync_handler+0xb0/0xb4 el0t_64_sync+0x14c/0x150 Using spin_lock_bh()/spin_unlock_bh() instead prevents softirq to raise on the same CPU that is holding the lock. Fixes: 1d147bfa6429 ("mac80211: fix AP powersave TX vs. wakeup race") Signed-off-by: Remi Pommarel Link: https://msgid.link/8e36fe07d0fbc146f89196cd47a53c8a0afe84aa.1716910344.git.repk@triplefau.lt Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/sta_info.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 44bd03c6b847..f7637176d719 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1343,7 +1343,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) skb_queue_head_init(&pending); /* sync with ieee80211_tx_h_unicast_ps_buf */ - spin_lock(&sta->ps_lock); + spin_lock_bh(&sta->ps_lock); /* Send all buffered frames to the station */ for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { int count = skb_queue_len(&pending), tmp; @@ -1372,7 +1372,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) */ clear_sta_flag(sta, WLAN_STA_PSPOLL); clear_sta_flag(sta, WLAN_STA_UAPSD); - spin_unlock(&sta->ps_lock); + spin_unlock_bh(&sta->ps_lock); atomic_dec(&ps->num_sta_ps); From 8d5c7d7bfd72efc7b9aa1dd9ceb6647480580e99 Mon Sep 17 00:00:00 2001 From: Lin Ma Date: Tue, 21 May 2024 15:50:59 +0800 Subject: [PATCH 1075/1565] wifi: cfg80211: pmsr: use correct nla_get_uX functions [ Upstream commit ab904521f4de52fef4f179d2dfc1877645ef5f5c ] The commit 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") defines four attributes NL80211_PMSR_FTM_REQ_ATTR_ {NUM_BURSTS_EXP}/{BURST_PERIOD}/{BURST_DURATION}/{FTMS_PER_BURST} in following ways. static const struct nla_policy nl80211_pmsr_ftm_req_attr_policy[NL80211_PMSR_FTM_REQ_ATTR_MAX + 1] = { ... [NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD] = { .type = NLA_U16 }, [NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST] = NLA_POLICY_MAX(NLA_U8, 31), ... }; That is, those attributes are expected to be NLA_U8 and NLA_U16 types. However, the consumers of these attributes in `pmsr_parse_ftm` blindly all use `nla_get_u32`, which is incorrect and causes functionality issues on little-endian platforms. Hence, fix them with the correct `nla_get_u8` and `nla_get_u16` functions. Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") Signed-off-by: Lin Ma Link: https://msgid.link/20240521075059.47999-1-linma@zju.edu.cn Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/wireless/pmsr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/wireless/pmsr.c b/net/wireless/pmsr.c index a817d8e3e4b3..7503c7dd71ab 100644 --- a/net/wireless/pmsr.c +++ b/net/wireless/pmsr.c @@ -58,7 +58,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.burst_period = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]) out->ftm.burst_period = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]); + nla_get_u16(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]); out->ftm.asap = !!tb[NL80211_PMSR_FTM_REQ_ATTR_ASAP]; if (out->ftm.asap && !capa->ftm.asap) { @@ -77,7 +77,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.num_bursts_exp = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]) out->ftm.num_bursts_exp = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]); if (capa->ftm.max_bursts_exponent >= 0 && out->ftm.num_bursts_exp > capa->ftm.max_bursts_exponent) { @@ -90,7 +90,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.burst_duration = 15; if (tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]) out->ftm.burst_duration = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]); out->ftm.ftms_per_burst = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST]) @@ -109,7 +109,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.ftmr_retries = 3; if (tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]) out->ftm.ftmr_retries = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]); out->ftm.request_lci = !!tb[NL80211_PMSR_FTM_REQ_ATTR_REQUEST_LCI]; if (out->ftm.request_lci && !capa->ftm.request_lci) { From 99c4903dcee30a760ba4328d3907e03bfcd0c7f2 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 10 May 2024 17:06:33 +0300 Subject: [PATCH 1076/1565] wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64 [ Upstream commit 4a7aace2899711592327463c1a29ffee44fcc66e ] We don't actually support >64 even for HE devices, so revert back to 64. This fixes an issue where the session is refused because the queue is configured differently from the actual session later. Fixes: 514c30696fbc ("iwlwifi: add support for IEEE802.11ax") Signed-off-by: Johannes Berg Reviewed-by: Liad Kaufman Reviewed-by: Luciano Coelho Signed-off-by: Miri Korenblit Link: https://msgid.link/20240510170500.52f7b4cf83aa.If47e43adddf7fe250ed7f5571fbb35d8221c7c47@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- drivers/net/wireless/intel/iwlwifi/mvm/rs.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h index 32104c9f8f5e..d59a47637d12 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h @@ -133,13 +133,8 @@ enum { #define LINK_QUAL_AGG_FRAME_LIMIT_DEF (63) #define LINK_QUAL_AGG_FRAME_LIMIT_MAX (63) -/* - * FIXME - various places in firmware API still use u8, - * e.g. LQ command and SCD config command. - * This should be 256 instead. - */ -#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_DEF (255) -#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_MAX (255) +#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_DEF (64) +#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_MAX (64) #define LINK_QUAL_AGG_FRAME_LIMIT_MIN (0) #define LQ_SIZE 2 /* 2 mode tables: "Active" and "Search" */ From 2c80bd07c11ca3bb9c4c150f573d123d96fab9e7 Mon Sep 17 00:00:00 2001 From: Shahar S Matityahu Date: Fri, 10 May 2024 17:06:39 +0300 Subject: [PATCH 1077/1565] wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef [ Upstream commit 87821b67dea87addbc4ab093ba752753b002176a ] The driver should call iwl_dbg_tlv_free even if debugfs is not defined since ini mode does not depend on debugfs ifdef. Fixes: 68f6f492c4fa ("iwlwifi: trans: support loading ini TLVs from external file") Signed-off-by: Shahar S Matityahu Reviewed-by: Luciano Coelho Signed-off-by: Miri Korenblit Link: https://msgid.link/20240510170500.c8e3723f55b0.I5e805732b0be31ee6b83c642ec652a34e974ff10@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c index ab84ac3f8f03..bf00c2fede74 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c @@ -1699,8 +1699,8 @@ struct iwl_drv *iwl_drv_start(struct iwl_trans *trans) err_fw: #ifdef CONFIG_IWLWIFI_DEBUGFS debugfs_remove_recursive(drv->dbgfs_drv); - iwl_dbg_tlv_free(drv->trans); #endif + iwl_dbg_tlv_free(drv->trans); kfree(drv); err: return ERR_PTR(ret); From 3c4771091ea8016c8601399078916f722dd8833b Mon Sep 17 00:00:00 2001 From: Miri Korenblit Date: Mon, 13 May 2024 13:27:12 +0300 Subject: [PATCH 1078/1565] wifi: iwlwifi: mvm: check n_ssids before accessing the ssids [ Upstream commit 60d62757df30b74bf397a2847a6db7385c6ee281 ] In some versions of cfg80211, the ssids poinet might be a valid one even though n_ssids is 0. Accessing the pointer in this case will cuase an out-of-bound access. Fix this by checking n_ssids first. Fixes: c1a7515393e4 ("iwlwifi: mvm: add adaptive dwell support") Signed-off-by: Miri Korenblit Reviewed-by: Ilan Peer Reviewed-by: Johannes Berg Link: https://msgid.link/20240513132416.6e4d1762bf0d.I5a0e6cc8f02050a766db704d15594c61fe583d45@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 17b992526694..a9df48c75155 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1354,7 +1354,7 @@ static void iwl_mvm_scan_umac_dwell(struct iwl_mvm *mvm, if (IWL_MVM_ADWELL_MAX_BUDGET) cmd->v7.adwell_max_budget = cpu_to_le16(IWL_MVM_ADWELL_MAX_BUDGET); - else if (params->ssids && params->ssids[0].ssid_len) + else if (params->n_ssids && params->ssids[0].ssid_len) cmd->v7.adwell_max_budget = cpu_to_le16(IWL_SCAN_ADWELL_MAX_BUDGET_DIRECTED_SCAN); else @@ -1456,7 +1456,7 @@ iwl_mvm_scan_umac_dwell_v10(struct iwl_mvm *mvm, if (IWL_MVM_ADWELL_MAX_BUDGET) general_params->adwell_max_budget = cpu_to_le16(IWL_MVM_ADWELL_MAX_BUDGET); - else if (params->ssids && params->ssids[0].ssid_len) + else if (params->n_ssids && params->ssids[0].ssid_len) general_params->adwell_max_budget = cpu_to_le16(IWL_SCAN_ADWELL_MAX_BUDGET_DIRECTED_SCAN); else From 46c59a25337049a2a230ce7f7c3b9f21d0aaaad7 Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Mon, 13 May 2024 13:27:14 +0300 Subject: [PATCH 1079/1565] wifi: iwlwifi: mvm: don't read past the mfuart notifcation [ Upstream commit 4bb95f4535489ed830cf9b34b0a891e384d1aee4 ] In case the firmware sends a notification that claims it has more data than it has, we will read past that was allocated for the notification. Remove the print of the buffer, we won't see it by default. If needed, we can see the content with tracing. This was reported by KFENCE. Fixes: bdccdb854f2f ("iwlwifi: mvm: support MFUART dump in case of MFUART assert") Signed-off-by: Emmanuel Grumbach Reviewed-by: Johannes Berg Signed-off-by: Miri Korenblit Link: https://msgid.link/20240513132416.ba82a01a559e.Ia91dd20f5e1ca1ad380b95e68aebf2794f553d9b@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c index 54b28f0932e2..793208d99b5f 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c @@ -196,20 +196,10 @@ void iwl_mvm_mfu_assert_dump_notif(struct iwl_mvm *mvm, { struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_mfu_assert_dump_notif *mfu_dump_notif = (void *)pkt->data; - __le32 *dump_data = mfu_dump_notif->data; - int n_words = le32_to_cpu(mfu_dump_notif->data_size) / sizeof(__le32); - int i; if (mfu_dump_notif->index_num == 0) IWL_INFO(mvm, "MFUART assert id 0x%x occurred\n", le32_to_cpu(mfu_dump_notif->assert_id)); - - for (i = 0; i < n_words; i++) - IWL_DEBUG_INFO(mvm, - "MFUART assert dump, dword %u: 0x%08x\n", - le16_to_cpu(mfu_dump_notif->index_num) * - n_words + i, - le32_to_cpu(dump_data[i])); } static bool iwl_alive_fn(struct iwl_notif_wait_data *notif_wait, From 7c9b9f822eaa2689a8f41887e6be56434e3f0f9d Mon Sep 17 00:00:00 2001 From: Lingbo Kong Date: Thu, 16 May 2024 10:18:54 +0800 Subject: [PATCH 1080/1565] wifi: mac80211: correctly parse Spatial Reuse Parameter Set element [ Upstream commit a26d8dc5227f449a54518a8b40733a54c6600a8b ] Currently, the way of parsing Spatial Reuse Parameter Set element is incorrect and some members of struct ieee80211_he_obss_pd are not assigned. To address this issue, it must be parsed in the order of the elements of Spatial Reuse Parameter Set defined in the IEEE Std 802.11ax specification. The diagram of the Spatial Reuse Parameter Set element (IEEE Std 802.11ax -2021-9.4.2.252). ------------------------------------------------------------------------- | | | | |Non-SRG| SRG | SRG | SRG | SRG | |Element|Length| Element | SR |OBSS PD|OBSS PD|OBSS PD| BSS |Partial| | ID | | ID |Control| Max | Min | Max |Color | BSSID | | | |Extension| | Offset| Offset|Offset |Bitmap|Bitmap | ------------------------------------------------------------------------- Fixes: 1ced169cc1c2 ("mac80211: allow setting spatial reuse parameters from bss_conf") Signed-off-by: Lingbo Kong Link: https://msgid.link/20240516021854.5682-3-quic_lingbok@quicinc.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/he.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/mac80211/he.c b/net/mac80211/he.c index cc26f239838b..41413a4588db 100644 --- a/net/mac80211/he.c +++ b/net/mac80211/he.c @@ -127,15 +127,21 @@ ieee80211_he_spr_ie_to_bss_conf(struct ieee80211_vif *vif, if (!he_spr_ie_elem) return; + + he_obss_pd->sr_ctrl = he_spr_ie_elem->he_sr_control; data = he_spr_ie_elem->optional; if (he_spr_ie_elem->he_sr_control & IEEE80211_HE_SPR_NON_SRG_OFFSET_PRESENT) - data++; + he_obss_pd->non_srg_max_offset = *data++; + if (he_spr_ie_elem->he_sr_control & IEEE80211_HE_SPR_SRG_INFORMATION_PRESENT) { - he_obss_pd->max_offset = *data++; he_obss_pd->min_offset = *data++; + he_obss_pd->max_offset = *data++; + memcpy(he_obss_pd->bss_color_bitmap, data, 8); + data += 8; + memcpy(he_obss_pd->partial_bssid_bitmap, data, 8); he_obss_pd->enable = true; } } From f40cac4e708365aeb15585cb50c57961b51a5d1c Mon Sep 17 00:00:00 2001 From: Ivan Mikhaylov Date: Thu, 8 Jul 2021 15:27:53 +0300 Subject: [PATCH 1081/1565] net/ncsi: add NCSI Intel OEM command to keep PHY up [ Upstream commit abd2fddc94a619b96bf41c60429d4c32bd118e17 ] This allows to keep PHY link up and prevents any channel resets during the host load. It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which block PHY reset and power state changes. Signed-off-by: Ivan Mikhaylov Signed-off-by: David S. Miller Stable-dep-of: e85e271dec02 ("net/ncsi: Fix the multi thread manner of NCSI driver") Signed-off-by: Sasha Levin --- net/ncsi/Kconfig | 6 ++++++ net/ncsi/internal.h | 5 +++++ net/ncsi/ncsi-manage.c | 45 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 56 insertions(+) diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig index 93309081f5a4..ea1dd32b6b1f 100644 --- a/net/ncsi/Kconfig +++ b/net/ncsi/Kconfig @@ -17,3 +17,9 @@ config NCSI_OEM_CMD_GET_MAC help This allows to get MAC address from NCSI firmware and set them back to controller. +config NCSI_OEM_CMD_KEEP_PHY + bool "Keep PHY Link up" + depends on NET_NCSI + help + This allows to keep PHY link up and prevents any channel resets during + the host load. diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h index ec765f2a7569..1e3e587d04d3 100644 --- a/net/ncsi/internal.h +++ b/net/ncsi/internal.h @@ -78,6 +78,9 @@ enum { /* OEM Vendor Manufacture ID */ #define NCSI_OEM_MFR_MLX_ID 0x8119 #define NCSI_OEM_MFR_BCM_ID 0x113d +#define NCSI_OEM_MFR_INTEL_ID 0x157 +/* Intel specific OEM command */ +#define NCSI_OEM_INTEL_CMD_KEEP_PHY 0x20 /* CMD ID for Keep PHY up */ /* Broadcom specific OEM Command */ #define NCSI_OEM_BCM_CMD_GMA 0x01 /* CMD ID for Get MAC */ /* Mellanox specific OEM Command */ @@ -86,6 +89,7 @@ enum { #define NCSI_OEM_MLX_CMD_SMAF 0x01 /* CMD ID for Set MC Affinity */ #define NCSI_OEM_MLX_CMD_SMAF_PARAM 0x07 /* Parameter for SMAF */ /* OEM Command payload lengths*/ +#define NCSI_OEM_INTEL_CMD_KEEP_PHY_LEN 7 #define NCSI_OEM_BCM_CMD_GMA_LEN 12 #define NCSI_OEM_MLX_CMD_GMA_LEN 8 #define NCSI_OEM_MLX_CMD_SMAF_LEN 60 @@ -274,6 +278,7 @@ enum { ncsi_dev_state_probe_mlx_gma, ncsi_dev_state_probe_mlx_smaf, ncsi_dev_state_probe_cis, + ncsi_dev_state_probe_keep_phy, ncsi_dev_state_probe_gvi, ncsi_dev_state_probe_gc, ncsi_dev_state_probe_gls, diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index ffff8da707b8..06ad7db553fb 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -689,6 +689,35 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc, return 0; } +#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) + +static int ncsi_oem_keep_phy_intel(struct ncsi_cmd_arg *nca) +{ + unsigned char data[NCSI_OEM_INTEL_CMD_KEEP_PHY_LEN]; + int ret = 0; + + nca->payload = NCSI_OEM_INTEL_CMD_KEEP_PHY_LEN; + + memset(data, 0, NCSI_OEM_INTEL_CMD_KEEP_PHY_LEN); + *(unsigned int *)data = ntohl((__force __be32)NCSI_OEM_MFR_INTEL_ID); + + data[4] = NCSI_OEM_INTEL_CMD_KEEP_PHY; + + /* PHY Link up attribute */ + data[6] = 0x1; + + nca->data = data; + + ret = ncsi_xmit_cmd(nca); + if (ret) + netdev_err(nca->ndp->ndev.dev, + "NCSI: Failed to transmit cmd 0x%x during configure\n", + nca->type); + return ret; +} + +#endif + #if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC) /* NCSI OEM Command APIs */ @@ -1392,7 +1421,23 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) } nd->state = ncsi_dev_state_probe_gvi; + if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY)) + nd->state = ncsi_dev_state_probe_keep_phy; break; +#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) + case ncsi_dev_state_probe_keep_phy: + ndp->pending_req_num = 1; + + nca.type = NCSI_PKT_CMD_OEM; + nca.package = ndp->active_package->id; + nca.channel = 0; + ret = ncsi_oem_keep_phy_intel(&nca); + if (ret) + goto error; + + nd->state = ncsi_dev_state_probe_gvi; + break; +#endif /* CONFIG_NCSI_OEM_CMD_KEEP_PHY */ case ncsi_dev_state_probe_gvi: case ncsi_dev_state_probe_gc: case ncsi_dev_state_probe_gls: From 7329bc66b4a03e63d8c191eeb7c342a1c13f96fa Mon Sep 17 00:00:00 2001 From: Peter Delevoryas Date: Tue, 14 Nov 2023 10:07:33 -0600 Subject: [PATCH 1082/1565] net/ncsi: Simplify Kconfig/dts control flow [ Upstream commit c797ce168930ce3d62a9b7fc4d7040963ee6a01e ] Background: 1. CONFIG_NCSI_OEM_CMD_KEEP_PHY If this is enabled, we send an extra OEM Intel command in the probe sequence immediately after discovering a channel (e.g. after "Clear Initial State"). 2. CONFIG_NCSI_OEM_CMD_GET_MAC If this is enabled, we send one of 3 OEM "Get MAC Address" commands from Broadcom, Mellanox (Nvidida), and Intel in the *configuration* sequence for a channel. 3. mellanox,multi-host (or mlx,multi-host) Introduced by this patch: https://lore.kernel.org/all/20200108234341.2590674-1-vijaykhemka@fb.com/ Which was actually originally from cosmo.chou@quantatw.com: https://github.com/facebook/openbmc-linux/commit/9f132a10ec48db84613519258cd8a317fb9c8f1b Cosmo claimed that the Nvidia ConnectX-4 and ConnectX-6 NIC's don't respond to Get Version ID, et. al in the probe sequence unless you send the Set MC Affinity command first. Problem Statement: We've been using a combination of #ifdef code blocks and IS_ENABLED() conditions to conditionally send these OEM commands. It makes adding any new code around these commands hard to understand. Solution: In this patch, I just want to remove the conditionally compiled blocks of code, and always use IS_ENABLED(...) to do dynamic control flow. I don't think the small amount of code this adds to non-users of the OEM Kconfigs is a big deal. Signed-off-by: Peter Delevoryas Signed-off-by: David S. Miller Stable-dep-of: e85e271dec02 ("net/ncsi: Fix the multi thread manner of NCSI driver") Signed-off-by: Sasha Levin --- net/ncsi/ncsi-manage.c | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-) diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index 06ad7db553fb..d2f8155af53e 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -689,8 +689,6 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc, return 0; } -#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) - static int ncsi_oem_keep_phy_intel(struct ncsi_cmd_arg *nca) { unsigned char data[NCSI_OEM_INTEL_CMD_KEEP_PHY_LEN]; @@ -716,10 +714,6 @@ static int ncsi_oem_keep_phy_intel(struct ncsi_cmd_arg *nca) return ret; } -#endif - -#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC) - /* NCSI OEM Command APIs */ static int ncsi_oem_gma_handler_bcm(struct ncsi_cmd_arg *nca) { @@ -833,8 +827,6 @@ static int ncsi_gma_handler(struct ncsi_cmd_arg *nca, unsigned int mf_id) return nch->handler(nca); } -#endif /* CONFIG_NCSI_OEM_CMD_GET_MAC */ - /* Determine if a given channel from the channel_queue should be used for Tx */ static bool ncsi_channel_is_tx(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc) @@ -1016,20 +1008,18 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp) goto error; } - nd->state = ncsi_dev_state_config_oem_gma; + nd->state = IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC) + ? ncsi_dev_state_config_oem_gma + : ncsi_dev_state_config_clear_vids; break; case ncsi_dev_state_config_oem_gma: nd->state = ncsi_dev_state_config_clear_vids; - ret = -1; -#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC) nca.type = NCSI_PKT_CMD_OEM; nca.package = np->id; nca.channel = nc->id; ndp->pending_req_num = 1; ret = ncsi_gma_handler(&nca, nc->version.mf_id); -#endif /* CONFIG_NCSI_OEM_CMD_GET_MAC */ - if (ret < 0) schedule_work(&ndp->work); @@ -1381,7 +1371,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) schedule_work(&ndp->work); break; -#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_GET_MAC) case ncsi_dev_state_probe_mlx_gma: ndp->pending_req_num = 1; @@ -1406,7 +1395,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_cis; break; -#endif /* CONFIG_NCSI_OEM_CMD_GET_MAC */ case ncsi_dev_state_probe_cis: ndp->pending_req_num = NCSI_RESERVED_CHANNEL; @@ -1424,7 +1412,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY)) nd->state = ncsi_dev_state_probe_keep_phy; break; -#if IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) case ncsi_dev_state_probe_keep_phy: ndp->pending_req_num = 1; @@ -1437,7 +1424,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_gvi; break; -#endif /* CONFIG_NCSI_OEM_CMD_KEEP_PHY */ case ncsi_dev_state_probe_gvi: case ncsi_dev_state_probe_gc: case ncsi_dev_state_probe_gls: From 3cbb2ba0a0d90a97ff55560d37e27d3a6d922369 Mon Sep 17 00:00:00 2001 From: DelphineCCChiu Date: Wed, 29 May 2024 14:58:55 +0800 Subject: [PATCH 1083/1565] net/ncsi: Fix the multi thread manner of NCSI driver [ Upstream commit e85e271dec0270982afed84f70dc37703fcc1d52 ] Currently NCSI driver will send several NCSI commands back to back without waiting the response of previous NCSI command or timeout in some state when NIC have multi channel. This operation against the single thread manner defined by NCSI SPEC(section 6.3.2.3 in DSP0222_1.1.1) According to NCSI SPEC(section 6.2.13.1 in DSP0222_1.1.1), we should probe one channel at a time by sending NCSI commands (Clear initial state, Get version ID, Get capabilities...), than repeat this steps until the max number of channels which we got from NCSI command (Get capabilities) has been probed. Fixes: e6f44ed6d04d ("net/ncsi: Package and channel management") Signed-off-by: DelphineCCChiu Link: https://lore.kernel.org/r/20240529065856.825241-1-delphine_cc_chiu@wiwynn.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ncsi/internal.h | 2 ++ net/ncsi/ncsi-manage.c | 75 +++++++++++++++++++++--------------------- net/ncsi/ncsi-rsp.c | 4 ++- 3 files changed, 42 insertions(+), 39 deletions(-) diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h index 1e3e587d04d3..dea60e25e860 100644 --- a/net/ncsi/internal.h +++ b/net/ncsi/internal.h @@ -322,6 +322,7 @@ struct ncsi_dev_priv { spinlock_t lock; /* Protect the NCSI device */ unsigned int package_probe_id;/* Current ID during probe */ unsigned int package_num; /* Number of packages */ + unsigned int channel_probe_id;/* Current cahnnel ID during probe */ struct list_head packages; /* List of packages */ struct ncsi_channel *hot_channel; /* Channel was ever active */ struct ncsi_request requests[256]; /* Request table */ @@ -340,6 +341,7 @@ struct ncsi_dev_priv { bool multi_package; /* Enable multiple packages */ bool mlx_multi_host; /* Enable multi host Mellanox */ u32 package_whitelist; /* Packages to configure */ + unsigned char channel_count; /* Num of channels to probe */ }; struct ncsi_cmd_arg { diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index d2f8155af53e..bb3248214746 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -510,17 +510,19 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp) break; case ncsi_dev_state_suspend_gls: - ndp->pending_req_num = np->channel_num; + ndp->pending_req_num = 1; nca.type = NCSI_PKT_CMD_GLS; nca.package = np->id; + nca.channel = ndp->channel_probe_id; + ret = ncsi_xmit_cmd(&nca); + if (ret) + goto error; + ndp->channel_probe_id++; - nd->state = ncsi_dev_state_suspend_dcnt; - NCSI_FOR_EACH_CHANNEL(np, nc) { - nca.channel = nc->id; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; + if (ndp->channel_probe_id == ndp->channel_count) { + ndp->channel_probe_id = 0; + nd->state = ncsi_dev_state_suspend_dcnt; } break; @@ -1317,7 +1319,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) { struct ncsi_dev *nd = &ndp->ndev; struct ncsi_package *np; - struct ncsi_channel *nc; struct ncsi_cmd_arg nca; unsigned char index; int ret; @@ -1395,23 +1396,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_cis; break; - case ncsi_dev_state_probe_cis: - ndp->pending_req_num = NCSI_RESERVED_CHANNEL; - - /* Clear initial state */ - nca.type = NCSI_PKT_CMD_CIS; - nca.package = ndp->active_package->id; - for (index = 0; index < NCSI_RESERVED_CHANNEL; index++) { - nca.channel = index; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; - } - - nd->state = ncsi_dev_state_probe_gvi; - if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY)) - nd->state = ncsi_dev_state_probe_keep_phy; - break; case ncsi_dev_state_probe_keep_phy: ndp->pending_req_num = 1; @@ -1424,14 +1408,17 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nd->state = ncsi_dev_state_probe_gvi; break; + case ncsi_dev_state_probe_cis: case ncsi_dev_state_probe_gvi: case ncsi_dev_state_probe_gc: case ncsi_dev_state_probe_gls: np = ndp->active_package; - ndp->pending_req_num = np->channel_num; + ndp->pending_req_num = 1; - /* Retrieve version, capability or link status */ - if (nd->state == ncsi_dev_state_probe_gvi) + /* Clear initial state Retrieve version, capability or link status */ + if (nd->state == ncsi_dev_state_probe_cis) + nca.type = NCSI_PKT_CMD_CIS; + else if (nd->state == ncsi_dev_state_probe_gvi) nca.type = NCSI_PKT_CMD_GVI; else if (nd->state == ncsi_dev_state_probe_gc) nca.type = NCSI_PKT_CMD_GC; @@ -1439,19 +1426,29 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nca.type = NCSI_PKT_CMD_GLS; nca.package = np->id; - NCSI_FOR_EACH_CHANNEL(np, nc) { - nca.channel = nc->id; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; + nca.channel = ndp->channel_probe_id; + + ret = ncsi_xmit_cmd(&nca); + if (ret) + goto error; + + if (nd->state == ncsi_dev_state_probe_cis) { + nd->state = ncsi_dev_state_probe_gvi; + if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) && ndp->channel_probe_id == 0) + nd->state = ncsi_dev_state_probe_keep_phy; + } else if (nd->state == ncsi_dev_state_probe_gvi) { + nd->state = ncsi_dev_state_probe_gc; + } else if (nd->state == ncsi_dev_state_probe_gc) { + nd->state = ncsi_dev_state_probe_gls; + } else { + nd->state = ncsi_dev_state_probe_cis; + ndp->channel_probe_id++; } - if (nd->state == ncsi_dev_state_probe_gvi) - nd->state = ncsi_dev_state_probe_gc; - else if (nd->state == ncsi_dev_state_probe_gc) - nd->state = ncsi_dev_state_probe_gls; - else + if (ndp->channel_probe_id == ndp->channel_count) { + ndp->channel_probe_id = 0; nd->state = ncsi_dev_state_probe_dp; + } break; case ncsi_dev_state_probe_dp: ndp->pending_req_num = 1; @@ -1752,6 +1749,7 @@ struct ncsi_dev *ncsi_register_dev(struct net_device *dev, ndp->requests[i].ndp = ndp; timer_setup(&ndp->requests[i].timer, ncsi_request_timeout, 0); } + ndp->channel_count = NCSI_RESERVED_CHANNEL; spin_lock_irqsave(&ncsi_dev_lock, flags); list_add_tail_rcu(&ndp->node, &ncsi_dev_list); @@ -1784,6 +1782,7 @@ int ncsi_start_dev(struct ncsi_dev *nd) if (!(ndp->flags & NCSI_DEV_PROBED)) { ndp->package_probe_id = 0; + ndp->channel_probe_id = 0; nd->state = ncsi_dev_state_probe; schedule_work(&ndp->work); return 0; diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c index 6a4638811660..960e2cfc1fd2 100644 --- a/net/ncsi/ncsi-rsp.c +++ b/net/ncsi/ncsi-rsp.c @@ -795,12 +795,13 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr) struct ncsi_rsp_gc_pkt *rsp; struct ncsi_dev_priv *ndp = nr->ndp; struct ncsi_channel *nc; + struct ncsi_package *np; size_t size; /* Find the channel */ rsp = (struct ncsi_rsp_gc_pkt *)skb_network_header(nr->rsp); ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel, - NULL, &nc); + &np, &nc); if (!nc) return -ENODEV; @@ -835,6 +836,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr) */ nc->vlan_filter.bitmap = U64_MAX; nc->vlan_filter.n_vids = rsp->vlan_cnt; + np->ndp->channel_count = rsp->channel_cnt; return 0; } From f9f69e3f698916370e0271b2c326957acaf5e9dc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 31 May 2024 13:26:34 +0000 Subject: [PATCH 1084/1565] ipv6: sr: block BH in seg6_output_core() and seg6_input_core() [ Upstream commit c0b98ac1cc104f48763cdb27b1e9ac25fd81fc90 ] As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in seg6_output_core() is not good enough, because seg6_output_core() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter seg6_output_core() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in seg6_input_core(). Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled") Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Eric Dumazet Cc: David Lebrun Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/seg6_iptunnel.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index 40ac23242c37..a73840da34ed 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -325,9 +325,8 @@ static int seg6_input(struct sk_buff *skb) slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate); - preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&slwt->cache); - preempt_enable(); skb_dst_drop(skb); @@ -335,14 +334,13 @@ static int seg6_input(struct sk_buff *skb) ip6_route_input(skb); dst = skb_dst(skb); if (!dst->error) { - preempt_disable(); dst_cache_set_ip6(&slwt->cache, dst, &ipv6_hdr(skb)->saddr); - preempt_enable(); } } else { skb_dst_set(skb, dst); } + local_bh_enable(); err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); if (unlikely(err)) @@ -364,9 +362,9 @@ static int seg6_output(struct net *net, struct sock *sk, struct sk_buff *skb) slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate); - preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&slwt->cache); - preempt_enable(); + local_bh_enable(); if (unlikely(!dst)) { struct ipv6hdr *hdr = ipv6_hdr(skb); @@ -386,9 +384,9 @@ static int seg6_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop; } - preempt_disable(); + local_bh_disable(); dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr); - preempt_enable(); + local_bh_enable(); } skb_dst_drop(skb); From 52b1aa07cda6a199cd6754d3798c7759023bc70f Mon Sep 17 00:00:00 2001 From: Hangyu Hua Date: Mon, 3 Jun 2024 15:13:03 +0800 Subject: [PATCH 1085/1565] net: sched: sch_multiq: fix possible OOB write in multiq_tune() [ Upstream commit affc18fdc694190ca7575b9a86632a73b9fe043d ] q->bands will be assigned to qopt->bands to execute subsequent code logic after kmalloc. So the old q->bands should not be used in kmalloc. Otherwise, an out-of-bounds write will occur. Fixes: c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock") Signed-off-by: Hangyu Hua Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/sched/sch_multiq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c index 1c6dbcfa89b8..77fd7be3a9cd 100644 --- a/net/sched/sch_multiq.c +++ b/net/sched/sch_multiq.c @@ -185,7 +185,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt, qopt->bands = qdisc_dev(sch)->real_num_tx_queues; - removed = kmalloc(sizeof(*removed) * (q->max_bands - q->bands), + removed = kmalloc(sizeof(*removed) * (q->max_bands - qopt->bands), GFP_KERNEL); if (!removed) return -ENOMEM; From dd254cde571580ba0822be571660be4c17dba903 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 3 Jun 2024 10:59:26 +0200 Subject: [PATCH 1086/1565] vxlan: Fix regression when dropping packets due to invalid src addresses [ Upstream commit 1cd4bc987abb2823836cbb8f887026011ccddc8a ] Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") has recently been added to vxlan mainly in the context of source address snooping/learning so that when it is enabled, an entry in the FDB is not being created for an invalid address for the corresponding tunnel endpoint. Before commit f58f45c1e5b9 vxlan was similarly behaving as geneve in that it passed through whichever macs were set in the L2 header. It turns out that this change in behavior breaks setups, for example, Cilium with netkit in L3 mode for Pods as well as tunnel mode has been passing before the change in f58f45c1e5b9 for both vxlan and geneve. After mentioned change it is only passing for geneve as in case of vxlan packets are dropped due to vxlan_set_mac() returning false as source and destination macs are zero which for E/W traffic via tunnel is totally fine. Fix it by only opting into the is_valid_ether_addr() check in vxlan_set_mac() when in fact source address snooping/learning is actually enabled in vxlan. This is done by moving the check into vxlan_snoop(). With this change, the Cilium connectivity test suite passes again for both tunnel flavors. Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") Signed-off-by: Daniel Borkmann Cc: David Bauer Cc: Ido Schimmel Cc: Nikolay Aleksandrov Cc: Martin KaFai Lau Reviewed-by: Ido Schimmel Reviewed-by: Nikolay Aleksandrov Reviewed-by: David Bauer Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/vxlan/vxlan_core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 3096769e718e..ec67d2eb05ec 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -1492,6 +1492,10 @@ static bool vxlan_snoop(struct net_device *dev, struct vxlan_fdb *f; u32 ifindex = 0; + /* Ignore packets from invalid src-address */ + if (!is_valid_ether_addr(src_mac)) + return true; + #if IS_ENABLED(CONFIG_IPV6) if (src_ip->sa.sa_family == AF_INET6 && (ipv6_addr_type(&src_ip->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)) From d8c79ae03ee12137ef9404833e0dd893e2364ed4 Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Tue, 4 Jun 2024 01:02:16 +0800 Subject: [PATCH 1087/1565] tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB [ Upstream commit a46d0ea5c94205f40ecf912d1bb7806a8a64704f ] According to RFC 1213, we should also take CLOSE-WAIT sockets into consideration: "tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT." After this, CurrEstab counter will display the total number of ESTABLISHED and CLOSE-WAIT sockets. The logic of counting When we increment the counter? a) if we change the state to ESTABLISHED. b) if we change the state from SYN-RECEIVED to CLOSE-WAIT. When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK. Please note: there are two chances that old state of socket can be changed to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED. So we have to take care of the former case. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason Xing Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/tcp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4ed0d303791a..0a495b6edbc4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2420,6 +2420,10 @@ void tcp_set_state(struct sock *sk, int state) if (oldstate != TCP_ESTABLISHED) TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); break; + case TCP_CLOSE_WAIT: + if (oldstate == TCP_SYN_RECV) + TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); + break; case TCP_CLOSE: if (oldstate == TCP_CLOSE_WAIT || oldstate == TCP_ESTABLISHED) @@ -2431,7 +2435,7 @@ void tcp_set_state(struct sock *sk, int state) inet_put_port(sk); fallthrough; default: - if (oldstate == TCP_ESTABLISHED) + if (oldstate == TCP_ESTABLISHED || oldstate == TCP_CLOSE_WAIT) TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); } From 6db4af09987cc5d5f0136bd46148b0e0460dae5b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 4 Jun 2024 18:15:11 +0000 Subject: [PATCH 1088/1565] net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP [ Upstream commit f921a58ae20852d188f70842431ce6519c4fdc36 ] If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided, taprio_parse_mqprio_opt() must validate it, or userspace can inject arbitrary data to the kernel, the second time taprio_change() is called. First call (with valid attributes) sets dev->num_tc to a non zero value. Second call (with arbitrary mqprio attributes) returns early from taprio_parse_mqprio_opt() and bad things can happen. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: Noam Rathaus Signed-off-by: Eric Dumazet Acked-by: Vinicius Costa Gomes Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/sched/sch_taprio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 2d842f31ec5a..ec6b24edf5f9 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -925,16 +925,13 @@ static int taprio_parse_mqprio_opt(struct net_device *dev, { int i, j; - if (!qopt && !dev->num_tc) { - NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary"); - return -EINVAL; - } - - /* If num_tc is already set, it means that the user already - * configured the mqprio part - */ - if (dev->num_tc) + if (!qopt) { + if (!dev->num_tc) { + NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary"); + return -EINVAL; + } return 0; + } /* Verify num_tc is not out of max range */ if (qopt->num_tc > TC_MAX_QUEUE) { From 13f61e503ec1fd6d031ea42f0805c434cff9895f Mon Sep 17 00:00:00 2001 From: Karol Kolacinski Date: Tue, 4 Jun 2024 14:05:27 +0200 Subject: [PATCH 1089/1565] ptp: Fix error message on failed pin verification [ Upstream commit 323a359f9b077f382f4483023d096a4d316fd135 ] On failed verification of PTP clock pin, error message prints channel number instead of pin index after "pin", which is incorrect. Fix error message by adding channel number to the message and printing pin number instead of channel number. Fixes: 6092315dfdec ("ptp: introduce programmable pins.") Signed-off-by: Karol Kolacinski Acked-by: Richard Cochran Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/ptp/ptp_chardev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index 9311f3d09c8f..8eb902fe73a9 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -84,7 +84,8 @@ int ptp_set_pinfunc(struct ptp_clock *ptp, unsigned int pin, } if (info->verify(info, pin, func, chan)) { - pr_err("driver cannot use function %u on pin %u\n", func, chan); + pr_err("driver cannot use function %u and channel %u on pin %u\n", + func, chan, pin); return -EOPNOTSUPP; } From d2c53bedeb96b5a39bdce267462401e9910f350d Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:29 -0700 Subject: [PATCH 1090/1565] af_unix: Annotate data-race of sk->sk_state in unix_inq_len(). [ Upstream commit 3a0f38eb285c8c2eead4b3230c7ac2983707599d ] ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first and returns -EINVAL if it's TCP_LISTEN. Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of bytes in recvq. However, unix_inq_len() does not hold unix_state_lock(), and the concurrent listen() might change the state after checking sk->sk_state. If the race occurs, 0 is returned for the listener, instead of -EINVAL, because the length of skb with embryo is 0. We could hold unix_state_lock() in unix_inq_len(), but it's overkill given the result is true for pre-listen() TCP_CLOSE state. So, let's use READ_ONCE() for sk->sk_state in unix_inq_len(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 405bf3e6eb79..1636bd712b0a 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2614,7 +2614,7 @@ long unix_inq_len(struct sock *sk) struct sk_buff *skb; long amount = 0; - if (sk->sk_state == TCP_LISTEN) + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) return -EINVAL; spin_lock(&sk->sk_receive_queue.lock); From cc5d123ce4aeed76554b831204420c1577709901 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:30 -0700 Subject: [PATCH 1091/1565] af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll(). [ Upstream commit eb0718fb3e97ad0d6f4529b810103451c90adf94 ] unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and calls unix_writable() which also reads sk->sk_state without holding unix_state_lock(). Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass it to unix_writable(). While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as that state does not exist for AF_UNIX socket since the code was added. Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners") Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 1636bd712b0a..59685d6d083a 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -447,9 +447,9 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) return 0; } -static int unix_writable(const struct sock *sk) +static int unix_writable(const struct sock *sk, unsigned char state) { - return sk->sk_state != TCP_LISTEN && + return state != TCP_LISTEN && (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; } @@ -458,7 +458,7 @@ static void unix_write_space(struct sock *sk) struct socket_wq *wq; rcu_read_lock(); - if (unix_writable(sk)) { + if (unix_writable(sk, READ_ONCE(sk->sk_state))) { wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) wake_up_interruptible_sync_poll(&wq->wait, @@ -2713,12 +2713,14 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait) { struct sock *sk = sock->sk; + unsigned char state; __poll_t mask; u8 shutdown; sock_poll_wait(file, sock, wait); mask = 0; shutdown = READ_ONCE(sk->sk_shutdown); + state = READ_ONCE(sk->sk_state); /* exceptional events? */ if (sk->sk_err) @@ -2734,14 +2736,14 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa /* Connection-based need to check for termination and startup */ if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) && - sk->sk_state == TCP_CLOSE) + state == TCP_CLOSE) mask |= EPOLLHUP; /* * we set writable also when the other side has shut down the * connection. This prevents stuck sockets. */ - if (unix_writable(sk)) + if (unix_writable(sk, state)) mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND; return mask; @@ -2752,12 +2754,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, { struct sock *sk = sock->sk, *other; unsigned int writable; + unsigned char state; __poll_t mask; u8 shutdown; sock_poll_wait(file, sock, wait); mask = 0; shutdown = READ_ONCE(sk->sk_shutdown); + state = READ_ONCE(sk->sk_state); /* exceptional events? */ if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue)) @@ -2774,19 +2778,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, mask |= EPOLLIN | EPOLLRDNORM; /* Connection-based need to check for termination and startup */ - if (sk->sk_type == SOCK_SEQPACKET) { - if (sk->sk_state == TCP_CLOSE) - mask |= EPOLLHUP; - /* connection hasn't started yet? */ - if (sk->sk_state == TCP_SYN_SENT) - return mask; - } + if (sk->sk_type == SOCK_SEQPACKET && state == TCP_CLOSE) + mask |= EPOLLHUP; /* No write status requested, avoid expensive OUT tests. */ if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT))) return mask; - writable = unix_writable(sk); + writable = unix_writable(sk, state); if (writable) { unix_state_lock(sk); From b5a6507c6196ec33bfc2f679a1e4fe0f06c93aed Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:33 -0700 Subject: [PATCH 1092/1565] af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg(). [ Upstream commit 8a34d4e8d9742a24f74998f45a6a98edd923319b ] The following functions read sk->sk_state locklessly and proceed only if the state is TCP_ESTABLISHED. * unix_stream_sendmsg * unix_stream_read_generic * unix_seqpacket_sendmsg * unix_seqpacket_recvmsg Let's use READ_ONCE() there. Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 59685d6d083a..a88333b30cc3 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1909,7 +1909,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, goto out_err; if (msg->msg_namelen) { - err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP; + err = READ_ONCE(sk->sk_state) == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP; goto out_err; } else { err = -ENOTCONN; @@ -2112,7 +2112,7 @@ static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg, if (err) return err; - if (sk->sk_state != TCP_ESTABLISHED) + if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED) return -ENOTCONN; if (msg->msg_namelen) @@ -2126,7 +2126,7 @@ static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg, { struct sock *sk = sock->sk; - if (sk->sk_state != TCP_ESTABLISHED) + if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED) return -ENOTCONN; return unix_dgram_recvmsg(sock, msg, size, flags); @@ -2326,7 +2326,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state, size_t size = state->size; unsigned int last_len; - if (unlikely(sk->sk_state != TCP_ESTABLISHED)) { + if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)) { err = -EINVAL; goto out; } From 44a2437c60b151c53a5fcabac4f873fe8d9a0a16 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:35 -0700 Subject: [PATCH 1093/1565] af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG. [ Upstream commit 0aa3be7b3e1f8f997312cc4705f8165e02806f8f ] While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read locklessly. Let's use READ_ONCE() there. Note that the result could be inconsistent if the socket is dumped during the state change. This is common for other SOCK_DIAG and similar interfaces. Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/diag.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/unix/diag.c b/net/unix/diag.c index 2975e7a061d0..4666fabb0493 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -64,7 +64,7 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb) u32 *buf; int i; - if (sk->sk_state == TCP_LISTEN) { + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { spin_lock(&sk->sk_receive_queue.lock); attr = nla_reserve(nlskb, UNIX_DIAG_ICONS, @@ -102,7 +102,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) { struct unix_diag_rqlen rql; - if (sk->sk_state == TCP_LISTEN) { + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { rql.udiag_rqueue = sk->sk_receive_queue.qlen; rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { @@ -135,7 +135,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r rep = nlmsg_data(nlh); rep->udiag_family = AF_UNIX; rep->udiag_type = sk->sk_type; - rep->udiag_state = sk->sk_state; + rep->udiag_state = READ_ONCE(sk->sk_state); rep->pad = 0; rep->udiag_ino = sk_ino; sock_diag_save_cookie(sk, rep->udiag_cookie); @@ -218,7 +218,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) continue; if (num < s_num) goto next; - if (!(req->udiag_states & (1 << sk->sk_state))) + if (!(req->udiag_states & (1 << READ_ONCE(sk->sk_state)))) goto next; if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk), NETLINK_CB(cb->skb).portid, From f70ef84b821e485617b77f29e462c265fe890d28 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:37 -0700 Subject: [PATCH 1094/1565] af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen. [ Upstream commit bd9f2d05731f6a112d0c7391a0d537bfc588dbe6 ] net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be changed concurrently. Let's use READ_ONCE() in unix_create1(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index a88333b30cc3..8e640ca2e4af 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -815,7 +815,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern) sk->sk_allocation = GFP_KERNEL_ACCOUNT; sk->sk_write_space = unix_write_space; - sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen; + sk->sk_max_ack_backlog = READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); u->inflight = 0; From a64e4b8f9bf6ff21beb111df36e4a70529cd52aa Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:38 -0700 Subject: [PATCH 1095/1565] af_unix: Use unix_recvq_full_lockless() in unix_stream_connect(). [ Upstream commit 45d872f0e65593176d880ec148f41ad7c02e40a7 ] Once sk->sk_state is changed to TCP_LISTEN, it never changes. unix_accept() takes advantage of this characteristics; it does not hold the listener's unix_state_lock() and only acquires recvq lock to pop one skb. It means unix_state_lock() does not prevent the queue length from changing in unix_stream_connect(). Thus, we need to use unix_recvq_full_lockless() to avoid data-race. Now we remove unix_recvq_full() as no one uses it. Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in unix_recvq_full_lockless() because of the following reasons: (1) For SOCK_DGRAM, it is a written-once field in unix_create1() (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the listener's unix_state_lock() in unix_listen(), and we hold the lock in unix_stream_connect() Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/af_unix.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 8e640ca2e4af..e2ff610d2776 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -189,15 +189,9 @@ static inline int unix_may_send(struct sock *sk, struct sock *osk) return unix_peer(osk) == NULL || unix_our_peer(sk, osk); } -static inline int unix_recvq_full(const struct sock *sk) -{ - return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; -} - static inline int unix_recvq_full_lockless(const struct sock *sk) { - return skb_queue_len_lockless(&sk->sk_receive_queue) > - READ_ONCE(sk->sk_max_ack_backlog); + return skb_queue_len_lockless(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; } struct sock *unix_peer_get(struct sock *s) @@ -1310,7 +1304,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, if (other->sk_shutdown & RCV_SHUTDOWN) goto out_unlock; - if (unix_recvq_full(other)) { + if (unix_recvq_full_lockless(other)) { err = -EAGAIN; if (!timeo) goto out_unlock; From 35a69f9e5db848de37569be6ac674339a0089f91 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:40 -0700 Subject: [PATCH 1096/1565] af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen(). [ Upstream commit 5d915e584d8408211d4567c22685aae8820bfc55 ] We can dump the socket queue length via UNIX_DIAG by specifying UDIAG_SHOW_RQLEN. If sk->sk_state is TCP_LISTEN, we return the recv queue length, but here we do not hold recvq lock. Let's use skb_queue_len_lockless() in sk_diag_show_rqlen(). Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/diag.c b/net/unix/diag.c index 4666fabb0493..5bc5cb83cc6e 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -103,7 +103,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) struct unix_diag_rqlen rql; if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { - rql.udiag_rqueue = sk->sk_receive_queue.qlen; + rql.udiag_rqueue = skb_queue_len_lockless(&sk->sk_receive_queue); rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { rql.udiag_rqueue = (u32) unix_inq_len(sk); From 74c97c80034f649b14e142287a2e51bc4aae8686 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 4 Jun 2024 09:52:41 -0700 Subject: [PATCH 1097/1565] af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill(). [ Upstream commit efaf24e30ec39ebbea9112227485805a48b0ceb1 ] While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock(). Let's use READ_ONCE() to read sk->sk_shutdown. Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/unix/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/unix/diag.c b/net/unix/diag.c index 5bc5cb83cc6e..7066a3623410 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -164,7 +164,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r sock_diag_put_meminfo(sk, skb, UNIX_DIAG_MEMINFO)) goto out_nlmsg_trim; - if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown)) + if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, READ_ONCE(sk->sk_shutdown))) goto out_nlmsg_trim; if ((req->udiag_show & UDIAG_SHOW_UID) && From c693698787660c97950bc1f93a8dd19d8307153d Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 4 Jun 2024 19:35:49 +0000 Subject: [PATCH 1098/1565] ipv6: fix possible race in __fib6_drop_pcpu_from() [ Upstream commit b01e1c030770ff3b4fe37fc7cc6bca03f594133f ] syzbot found a race in __fib6_drop_pcpu_from() [1] If compiler reads more than once (*ppcpu_rt), second read could read NULL, if another cpu clears the value in rt6_get_pcpu_route(). Add a READ_ONCE() to prevent this race. Also add rcu_read_lock()/rcu_read_unlock() because we rely on RCU protection while dereferencing pcpu_rt. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: netns cleanup_net RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984 Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48 RSP: 0018:ffffc900040df070 EFLAGS: 00010206 RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16 RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091 RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8 R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline] fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline] fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038 fib6_del_route net/ipv6/ip6_fib.c:1998 [inline] fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043 fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205 fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175 fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255 __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271 rt6_sync_down_dev net/ipv6/route.c:4906 [inline] rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911 addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855 addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992 call_netdevice_notifiers_extack net/core/dev.c:2030 [inline] call_netdevice_notifiers net/core/dev.c:2044 [inline] dev_close_many+0x333/0x6a0 net/core/dev.c:1585 unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193 unregister_netdevice_many net/core/dev.c:11276 [inline] default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Eric Dumazet Cc: Martin KaFai Lau Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ipv6/ip6_fib.c | 6 +++++- net/ipv6/route.c | 1 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index b79e571e5a86..83f15e930b57 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -959,6 +959,7 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, if (!fib6_nh->rt6i_pcpu) return; + rcu_read_lock(); /* release the reference to this fib entry from * all of its cached pcpu routes */ @@ -967,7 +968,9 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, struct rt6_info *pcpu_rt; ppcpu_rt = per_cpu_ptr(fib6_nh->rt6i_pcpu, cpu); - pcpu_rt = *ppcpu_rt; + + /* Paired with xchg() in rt6_get_pcpu_route() */ + pcpu_rt = READ_ONCE(*ppcpu_rt); /* only dropping the 'from' reference if the cached route * is using 'match'. The cached pcpu_rt->from only changes @@ -981,6 +984,7 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, fib6_info_release(from); } } + rcu_read_unlock(); } struct fib6_nh_pcpu_arg { diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 88f96241ca97..62eaaf9532e1 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1396,6 +1396,7 @@ static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res) struct rt6_info *prev, **p; p = this_cpu_ptr(res->nh->rt6i_pcpu); + /* Paired with READ_ONCE() in __fib6_drop_pcpu_from() */ prev = xchg(p, NULL); if (prev) { dst_dev_put(&prev->dst); From e500b1c4e29ad0bd1c1332a1eaea2913627a92dd Mon Sep 17 00:00:00 2001 From: Wesley Cheng Date: Mon, 8 Apr 2024 18:40:59 -0700 Subject: [PATCH 1099/1565] usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete [ Upstream commit 24729b307eefcd7c476065cd7351c1a018082c19 ] FFS based applications can utilize the aio_cancel() callback to dequeue pending USB requests submitted to the UDC. There is a scenario where the FFS application issues an AIO cancel call, while the UDC is handling a soft disconnect. For a DWC3 based implementation, the callstack looks like the following: DWC3 Gadget FFS Application dwc3_gadget_soft_disconnect() ... --> dwc3_stop_active_transfers() --> dwc3_gadget_giveback(-ESHUTDOWN) --> ffs_epfile_async_io_complete() ffs_aio_cancel() --> usb_ep_free_request() --> usb_ep_dequeue() There is currently no locking implemented between the AIO completion handler and AIO cancel, so the issue occurs if the completion routine is running in parallel to an AIO cancel call coming from the FFS application. As the completion call frees the USB request (io_data->req) the FFS application is also referencing it for the usb_ep_dequeue() call. This can lead to accessing a stale/hanging pointer. commit b566d38857fc ("usb: gadget: f_fs: use io_data->status consistently") relocated the usb_ep_free_request() into ffs_epfile_async_io_complete(). However, in order to properly implement locking to mitigate this issue, the spinlock can't be added to ffs_epfile_async_io_complete(), as usb_ep_dequeue() (if successfully dequeuing a USB request) will call the function driver's completion handler in the same context. Hence, leading into a deadlock. Fix this issue by moving the usb_ep_free_request() back to ffs_user_copy_worker(), and ensuring that it explicitly sets io_data->req to NULL after freeing it within the ffs->eps_lock. This resolves the race condition above, as the ffs_aio_cancel() routine will not continue attempting to dequeue a request that has already been freed, or the ffs_user_copy_work() not freeing the USB request until the AIO cancel is done referencing it. This fix depends on commit b566d38857fc ("usb: gadget: f_fs: use io_data->status consistently") Fixes: 2e4c7553cd6f ("usb: gadget: f_fs: add aio support") Cc: stable # b566d38857fc ("usb: gadget: f_fs: use io_data->status consistently") Signed-off-by: Wesley Cheng Link: https://lore.kernel.org/r/20240409014059.6740-1-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/usb/gadget/function/f_fs.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index ad7df99f09a4..592c79a04d64 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -827,6 +827,7 @@ static void ffs_user_copy_worker(struct work_struct *work) int ret = io_data->req->status ? io_data->req->status : io_data->req->actual; bool kiocb_has_eventfd = io_data->kiocb->ki_flags & IOCB_EVENTFD; + unsigned long flags; if (io_data->read && ret > 0) { kthread_use_mm(io_data->mm); @@ -839,7 +840,10 @@ static void ffs_user_copy_worker(struct work_struct *work) if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd) eventfd_signal(io_data->ffs->ffs_eventfd, 1); + spin_lock_irqsave(&io_data->ffs->eps_lock, flags); usb_ep_free_request(io_data->ep, io_data->req); + io_data->req = NULL; + spin_unlock_irqrestore(&io_data->ffs->eps_lock, flags); if (io_data->read) kfree(io_data->to_free); From e8b8054f5ef4d626f8904ea901f81b9d9fd2360b Mon Sep 17 00:00:00 2001 From: George Shen Date: Thu, 16 Sep 2021 19:55:39 -0400 Subject: [PATCH 1100/1565] drm/amd/display: Handle Y carry-over in VCP X.Y calculation [ Upstream commit 3626a6aebe62ce7067cdc460c0c644e9445386bb ] [Why/How] Theoretically rare corner case where ceil(Y) results in rounding up to an integer. If this happens, the 1 should be carried over to the X value. Reviewed-by: Wenjing Liu Acked-by: Anson Jacob Signed-off-by: George Shen Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c index f70fcadf1ee5..a4a6b99bddba 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_stream_encoder.c @@ -633,6 +633,12 @@ void enc1_stream_encoder_set_throttled_vcp_size( x), 26)); + // If y rounds up to integer, carry it over to x. + if (y >> 26) { + x += 1; + y = 0; + } + REG_SET_2(DP_MSE_RATE_CNTL, 0, DP_MSE_RATE_X, x, DP_MSE_RATE_Y, y); From 9a2e0aa9a80941b450a78ff074ee08cbb58abf53 Mon Sep 17 00:00:00 2001 From: Hugo Villeneuve Date: Thu, 21 Dec 2023 18:18:19 -0500 Subject: [PATCH 1101/1565] serial: sc16is7xx: replace hardcoded divisor value with BIT() macro [ Upstream commit 2e57cefc4477659527f7adab1f87cdbf60ef1ae6 ] To better show why the limit is what it is, since we have only 16 bits for the divisor. Reviewed-by: Andy Shevchenko Suggested-by: Andy Shevchenko Signed-off-by: Hugo Villeneuve Link: https://lore.kernel.org/r/20231221231823.2327894-13-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman Stable-dep-of: 8492bd91aa05 ("serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler") Signed-off-by: Sasha Levin --- drivers/tty/serial/sc16is7xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index d751f8ce5cf6..88b84d43c2d6 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -497,7 +497,7 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud) u8 prescaler = 0; unsigned long clk = port->uartclk, div = clk / 16 / baud; - if (div > 0xffff) { + if (div >= BIT(16)) { prescaler = SC16IS7XX_MCR_CLKSEL_BIT; div /= 4; } From b5a2a6908109c54b5f4b20eed29bcdffb8854c70 Mon Sep 17 00:00:00 2001 From: Hugo Villeneuve Date: Tue, 30 Apr 2024 16:04:30 -0400 Subject: [PATCH 1102/1565] serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler [ Upstream commit 8492bd91aa055907c67ef04f2b56f6dadd1f44bf ] When using a high speed clock with a low baud rate, the 4x prescaler is automatically selected if required. In that case, sc16is7xx_set_baud() properly configures the chip registers, but returns an incorrect baud rate by not taking into account the prescaler value. This incorrect baud rate is then fed to uart_update_timeout(). For example, with an input clock of 80MHz, and a selected baud rate of 50, sc16is7xx_set_baud() will return 200 instead of 50. Fix this by first changing the prescaler variable to hold the selected prescaler value instead of the MCR bitfield. Then properly take into account the selected prescaler value in the return value computation. Also add better documentation about the divisor value computation. Fixes: dfeae619d781 ("serial: sc16is7xx") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve Reviewed-by: Jiri Slaby Link: https://lore.kernel.org/r/20240430200431.4102923-1-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/sc16is7xx.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 88b84d43c2d6..4ea52426acf9 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -490,16 +490,28 @@ static bool sc16is7xx_regmap_noinc(struct device *dev, unsigned int reg) return reg == SC16IS7XX_RHR_REG; } +/* + * Configure programmable baud rate generator (divisor) according to the + * desired baud rate. + * + * From the datasheet, the divisor is computed according to: + * + * XTAL1 input frequency + * ----------------------- + * prescaler + * divisor = --------------------------- + * baud-rate x sampling-rate + */ static int sc16is7xx_set_baud(struct uart_port *port, int baud) { struct sc16is7xx_port *s = dev_get_drvdata(port->dev); u8 lcr; - u8 prescaler = 0; + unsigned int prescaler = 1; unsigned long clk = port->uartclk, div = clk / 16 / baud; if (div >= BIT(16)) { - prescaler = SC16IS7XX_MCR_CLKSEL_BIT; - div /= 4; + prescaler = 4; + div /= prescaler; } /* In an amazing feat of design, the Enhanced Features Register shares @@ -534,9 +546,10 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud) mutex_unlock(&s->efr_lock); + /* If bit MCR_CLKSEL is set, the divide by 4 prescaler is activated. */ sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_CLKSEL_BIT, - prescaler); + prescaler == 1 ? 0 : SC16IS7XX_MCR_CLKSEL_BIT); /* Open the LCR divisors for configuration */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, @@ -551,7 +564,7 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud) /* Put LCR back to the normal mode */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr); - return DIV_ROUND_CLOSEST(clk / 16, div); + return DIV_ROUND_CLOSEST((clk / prescaler) / 16, div); } static void sc16is7xx_handle_rx(struct uart_port *port, unsigned int rxlen, From 6ff7cfa02baabec907f6f29ea76634e6256d2ec4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Sun, 24 Mar 2024 12:40:17 +0100 Subject: [PATCH 1103/1565] mmc: davinci: Don't strip remove function when driver is builtin MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 55c421b364482b61c4c45313a535e61ed5ae4ea3 ] Using __exit for the remove function results in the remove callback being discarded with CONFIG_MMC_DAVINCI=y. When such a device gets unbound (e.g. using sysfs or hotplug), the driver is just removed without the cleanup being performed. This results in resource leaks. Fix it by compiling in the remove callback unconditionally. This also fixes a W=1 modpost warning: WARNING: modpost: drivers/mmc/host/davinci_mmc: section mismatch in reference: davinci_mmcsd_driver+0x10 (section: .data) -> davinci_mmcsd_remove (section: .exit.text) Fixes: b4cff4549b7a ("DaVinci: MMC: MMC/SD controller driver for DaVinci family") Signed-off-by: Uwe Kleine-König Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240324114017.231936-2-u.kleine-koenig@pengutronix.de Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/mmc/host/davinci_mmc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c index 647928ab00a3..77258948440f 100644 --- a/drivers/mmc/host/davinci_mmc.c +++ b/drivers/mmc/host/davinci_mmc.c @@ -1347,7 +1347,7 @@ static int davinci_mmcsd_probe(struct platform_device *pdev) return ret; } -static int __exit davinci_mmcsd_remove(struct platform_device *pdev) +static int davinci_mmcsd_remove(struct platform_device *pdev) { struct mmc_davinci_host *host = platform_get_drvdata(pdev); @@ -1404,7 +1404,7 @@ static struct platform_driver davinci_mmcsd_driver = { .of_match_table = davinci_mmc_dt_ids, }, .probe = davinci_mmcsd_probe, - .remove = __exit_p(davinci_mmcsd_remove), + .remove = davinci_mmcsd_remove, .id_table = davinci_mmc_devtype, }; From 9d3886a1604b84e780d56c01367a4a536b937114 Mon Sep 17 00:00:00 2001 From: Dev Jain Date: Tue, 21 May 2024 13:13:57 +0530 Subject: [PATCH 1104/1565] selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages [ Upstream commit 9ad665ef55eaad1ead1406a58a34f615a7c18b5e ] Currently, the test tries to set nr_hugepages to zero, but that is not actually done because the file offset is not reset after read(). Fix that using lseek(). Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain Cc: Cc: Anshuman Khandual Cc: Shuah Khan Cc: Sri Jayaramappa Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin --- tools/testing/selftests/vm/compaction_test.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/vm/compaction_test.c b/tools/testing/selftests/vm/compaction_test.c index 9b420140ba2b..55dec92e1e58 100644 --- a/tools/testing/selftests/vm/compaction_test.c +++ b/tools/testing/selftests/vm/compaction_test.c @@ -103,6 +103,8 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) goto close_fd; } + lseek(fd, 0, SEEK_SET); + /* Start with the initial condition of 0 huge pages*/ if (write(fd, "0", sizeof(char)) != sizeof(char)) { perror("Failed to write 0 to /proc/sys/vm/nr_hugepages\n"); From 7c1cc0a5d43f0ace5fded23ddc2d01ae5bffedca Mon Sep 17 00:00:00 2001 From: Muhammad Usama Anjum Date: Mon, 1 Jan 2024 13:36:12 +0500 Subject: [PATCH 1105/1565] selftests/mm: conform test to TAP format output [ Upstream commit 9a21701edc41465de56f97914741bfb7bfc2517d ] Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240101083614.1076768-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum Cc: Shuah Khan Signed-off-by: Andrew Morton Stable-dep-of: d4202e66a4b1 ("selftests/mm: compaction_test: fix bogus test success on Aarch64") Signed-off-by: Sasha Levin --- tools/testing/selftests/vm/compaction_test.c | 91 ++++++++++---------- 1 file changed, 44 insertions(+), 47 deletions(-) diff --git a/tools/testing/selftests/vm/compaction_test.c b/tools/testing/selftests/vm/compaction_test.c index 55dec92e1e58..f81931c1f838 100644 --- a/tools/testing/selftests/vm/compaction_test.c +++ b/tools/testing/selftests/vm/compaction_test.c @@ -33,7 +33,7 @@ int read_memory_info(unsigned long *memfree, unsigned long *hugepagesize) FILE *cmdfile = popen(cmd, "r"); if (!(fgets(buffer, sizeof(buffer), cmdfile))) { - perror("Failed to read meminfo\n"); + ksft_print_msg("Failed to read meminfo: %s\n", strerror(errno)); return -1; } @@ -44,7 +44,7 @@ int read_memory_info(unsigned long *memfree, unsigned long *hugepagesize) cmdfile = popen(cmd, "r"); if (!(fgets(buffer, sizeof(buffer), cmdfile))) { - perror("Failed to read meminfo\n"); + ksft_print_msg("Failed to read meminfo: %s\n", strerror(errno)); return -1; } @@ -62,14 +62,14 @@ int prereq(void) fd = open("/proc/sys/vm/compact_unevictable_allowed", O_RDONLY | O_NONBLOCK); if (fd < 0) { - perror("Failed to open\n" - "/proc/sys/vm/compact_unevictable_allowed\n"); + ksft_print_msg("Failed to open /proc/sys/vm/compact_unevictable_allowed: %s\n", + strerror(errno)); return -1; } if (read(fd, &allowed, sizeof(char)) != sizeof(char)) { - perror("Failed to read from\n" - "/proc/sys/vm/compact_unevictable_allowed\n"); + ksft_print_msg("Failed to read from /proc/sys/vm/compact_unevictable_allowed: %s\n", + strerror(errno)); close(fd); return -1; } @@ -78,12 +78,13 @@ int prereq(void) if (allowed == '1') return 0; + ksft_print_msg("Compaction isn't allowed\n"); return -1; } int check_compaction(unsigned long mem_free, unsigned int hugepage_size) { - int fd; + int fd, ret = -1; int compaction_index = 0; char initial_nr_hugepages[10] = {0}; char nr_hugepages[10] = {0}; @@ -94,12 +95,14 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK); if (fd < 0) { - perror("Failed to open /proc/sys/vm/nr_hugepages"); + ksft_test_result_fail("Failed to open /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); return -1; } if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) { - perror("Failed to read from /proc/sys/vm/nr_hugepages"); + ksft_test_result_fail("Failed to read from /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); goto close_fd; } @@ -107,7 +110,8 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) /* Start with the initial condition of 0 huge pages*/ if (write(fd, "0", sizeof(char)) != sizeof(char)) { - perror("Failed to write 0 to /proc/sys/vm/nr_hugepages\n"); + ksft_test_result_fail("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); goto close_fd; } @@ -116,14 +120,16 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) /* Request a large number of huge pages. The Kernel will allocate as much as it can */ if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) { - perror("Failed to write 100000 to /proc/sys/vm/nr_hugepages\n"); + ksft_test_result_fail("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); goto close_fd; } lseek(fd, 0, SEEK_SET); if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) { - perror("Failed to re-read from /proc/sys/vm/nr_hugepages\n"); + ksft_test_result_fail("Failed to re-read from /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); goto close_fd; } @@ -131,67 +137,58 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) huge pages */ compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size); - if (compaction_index > 3) { - printf("No of huge pages allocated = %d\n", - (atoi(nr_hugepages))); - fprintf(stderr, "ERROR: Less that 1/%d of memory is available\n" - "as huge pages\n", compaction_index); - goto close_fd; - } - - printf("No of huge pages allocated = %d\n", - (atoi(nr_hugepages))); - lseek(fd, 0, SEEK_SET); if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages)) != strlen(initial_nr_hugepages)) { - perror("Failed to write value to /proc/sys/vm/nr_hugepages\n"); + ksft_test_result_fail("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); goto close_fd; } - close(fd); - return 0; + if (compaction_index > 3) { + ksft_print_msg("ERROR: Less that 1/%d of memory is available\n" + "as huge pages\n", compaction_index); + ksft_test_result_fail("No of huge pages allocated = %d\n", (atoi(nr_hugepages))); + goto close_fd; + } + + ksft_test_result_pass("Memory compaction succeeded. No of huge pages allocated = %d\n", + (atoi(nr_hugepages))); + ret = 0; close_fd: close(fd); - printf("Not OK. Compaction test failed."); - return -1; + return ret; } int main(int argc, char **argv) { struct rlimit lim; - struct map_list *list, *entry; + struct map_list *list = NULL, *entry; size_t page_size, i; void *map = NULL; unsigned long mem_free = 0; unsigned long hugepage_size = 0; long mem_fragmentable_MB = 0; - if (prereq() != 0) { - printf("Either the sysctl compact_unevictable_allowed is not\n" - "set to 1 or couldn't read the proc file.\n" - "Skipping the test\n"); - return KSFT_SKIP; - } + ksft_print_header(); + + if (prereq() != 0) + return ksft_exit_pass(); + + ksft_set_plan(1); lim.rlim_cur = RLIM_INFINITY; lim.rlim_max = RLIM_INFINITY; - if (setrlimit(RLIMIT_MEMLOCK, &lim)) { - perror("Failed to set rlimit:\n"); - return -1; - } + if (setrlimit(RLIMIT_MEMLOCK, &lim)) + ksft_exit_fail_msg("Failed to set rlimit: %s\n", strerror(errno)); page_size = getpagesize(); - list = NULL; - - if (read_memory_info(&mem_free, &hugepage_size) != 0) { - printf("ERROR: Cannot read meminfo\n"); - return -1; - } + if (read_memory_info(&mem_free, &hugepage_size) != 0) + ksft_exit_fail_msg("Failed to get meminfo\n"); mem_fragmentable_MB = mem_free * 0.8 / 1024; @@ -227,7 +224,7 @@ int main(int argc, char **argv) } if (check_compaction(mem_free, hugepage_size) == 0) - return 0; + return ksft_exit_pass(); - return -1; + return ksft_exit_fail(); } From 79bf1ea0d52233a1e80e8f2bcebeca7f9bc7218c Mon Sep 17 00:00:00 2001 From: Dev Jain Date: Tue, 21 May 2024 13:13:56 +0530 Subject: [PATCH 1106/1565] selftests/mm: compaction_test: fix bogus test success on Aarch64 [ Upstream commit d4202e66a4b1fe6968f17f9f09bbc30d08f028a1 ] Patch series "Fixes for compaction_test", v2. The compaction_test memory selftest introduces fragmentation in memory and then tries to allocate as many hugepages as possible. This series addresses some problems. On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since compaction_index becomes 0, which is less than 3, due to no division by zero exception being raised. We fix that by checking for division by zero. Secondly, correctly set the number of hugepages to zero before trying to set a large number of them. Now, consider a situation in which, at the start of the test, a non-zero number of hugepages have been already set (while running the entire selftests/mm suite, or manually by the admin). The test operates on 80% of memory to avoid OOM-killer invocation, and because some memory is already blocked by hugepages, it would increase the chance of OOM-killing. Also, since mem_free used in check_compaction() is the value before we set nr_hugepages to zero, the chance that the compaction_index will be small is very high if the preset nr_hugepages was high, leading to a bogus test success. This patch (of 3): Currently, if at runtime we are not able to allocate a huge page, the test will trivially pass on Aarch64 due to no exception being raised on division by zero while computing compaction_index. Fix that by checking for nr_hugepages == 0. Anyways, in general, avoid a division by zero by exiting the program beforehand. While at it, fix a typo, and handle the case where the number of hugepages may overflow an integer. Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain Cc: Anshuman Khandual Cc: Shuah Khan Cc: Sri Jayaramappa Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/vm/compaction_test.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/vm/compaction_test.c b/tools/testing/selftests/vm/compaction_test.c index f81931c1f838..7c260060a1a6 100644 --- a/tools/testing/selftests/vm/compaction_test.c +++ b/tools/testing/selftests/vm/compaction_test.c @@ -82,12 +82,13 @@ int prereq(void) return -1; } -int check_compaction(unsigned long mem_free, unsigned int hugepage_size) +int check_compaction(unsigned long mem_free, unsigned long hugepage_size) { + unsigned long nr_hugepages_ul; int fd, ret = -1; int compaction_index = 0; - char initial_nr_hugepages[10] = {0}; - char nr_hugepages[10] = {0}; + char initial_nr_hugepages[20] = {0}; + char nr_hugepages[20] = {0}; /* We want to test with 80% of available memory. Else, OOM killer comes in to play */ @@ -135,7 +136,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) /* We should have been able to request at least 1/3 rd of the memory in huge pages */ - compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size); + nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10); + if (!nr_hugepages_ul) { + ksft_print_msg("ERROR: No memory is available as huge pages\n"); + goto close_fd; + } + compaction_index = mem_free/(nr_hugepages_ul * hugepage_size); lseek(fd, 0, SEEK_SET); @@ -147,7 +153,7 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size) } if (compaction_index > 3) { - ksft_print_msg("ERROR: Less that 1/%d of memory is available\n" + ksft_print_msg("ERROR: Less than 1/%d of memory is available\n" "as huge pages\n", compaction_index); ksft_test_result_fail("No of huge pages allocated = %d\n", (atoi(nr_hugepages))); goto close_fd; From 05544fd3f18a66e6d34fc1d4b580de59861a6f8b Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Mon, 3 Jun 2024 12:49:08 +0100 Subject: [PATCH 1107/1565] btrfs: fix leak of qgroup extent records after transaction abort [ Upstream commit fb33eb2ef0d88e75564983ef057b44c5b7e4fded ] Qgroup extent records are created when delayed ref heads are created and then released after accounting extents at btrfs_qgroup_account_extents(), called during the transaction commit path. If a transaction is aborted we free the qgroup records by calling btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(), unless we don't have delayed references. We are incorrectly assuming that no delayed references means we don't have qgroup extents records. We can currently have no delayed references because we ran them all during a transaction commit and the transaction was aborted after that due to some error in the commit path. So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs() even if we don't have any delayed references. Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/ Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin --- fs/btrfs/disk-io.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 019f0925fa73..c484c145c5d0 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4442,19 +4442,11 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info) { struct rb_node *node; - struct btrfs_delayed_ref_root *delayed_refs; + struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs; struct btrfs_delayed_ref_node *ref; int ret = 0; - delayed_refs = &trans->delayed_refs; - spin_lock(&delayed_refs->lock); - if (atomic_read(&delayed_refs->num_entries) == 0) { - spin_unlock(&delayed_refs->lock); - btrfs_debug(fs_info, "delayed_refs has NO entry"); - return ret; - } - while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) { struct btrfs_delayed_ref_head *head; struct rb_node *n; From 8b56df81b369eb4eb3ded759a96fa2bff55605c2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 17 May 2022 18:12:25 -0400 Subject: [PATCH 1108/1565] nilfs2: Remove check for PageError [ Upstream commit 79ea65563ad8aaab309d61eeb4d5019dd6cf5fa0 ] If read_mapping_page() encounters an error, it returns an errno, not a page with PageError set, so this test is not needed. Signed-off-by: Matthew Wilcox (Oracle) Stable-dep-of: 7373a51e7998 ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by: Sasha Levin --- fs/nilfs2/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index eb7de9e2a384..24cfe9db66e0 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -194,7 +194,7 @@ static struct page *nilfs_get_page(struct inode *dir, unsigned long n) if (!IS_ERR(page)) { kmap(page); if (unlikely(!PageChecked(page))) { - if (PageError(page) || !nilfs_check_page(page)) + if (!nilfs_check_page(page)) goto fail; } } From adf1b931d50b929c197dcbda47caa7469f6b0a26 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 27 Nov 2023 23:30:25 +0900 Subject: [PATCH 1109/1565] nilfs2: return the mapped address from nilfs_get_page() [ Upstream commit 09a46acb3697e50548bb265afa1d79163659dd85 ] In prepartion for switching from kmap() to kmap_local(), return the kmap address from nilfs_get_page() instead of having the caller look up page_address(). [konishi.ryusuke: fixed a missing blank line after declaration] Link: https://lkml.kernel.org/r/20231127143036.2425-7-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Ryusuke Konishi Signed-off-by: Andrew Morton Stable-dep-of: 7373a51e7998 ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by: Sasha Levin --- fs/nilfs2/dir.c | 57 +++++++++++++++++++++++-------------------------- 1 file changed, 27 insertions(+), 30 deletions(-) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 24cfe9db66e0..22f1f75a90c1 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -186,19 +186,24 @@ static bool nilfs_check_page(struct page *page) return false; } -static struct page *nilfs_get_page(struct inode *dir, unsigned long n) +static void *nilfs_get_page(struct inode *dir, unsigned long n, + struct page **pagep) { struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); + void *kaddr; - if (!IS_ERR(page)) { - kmap(page); - if (unlikely(!PageChecked(page))) { - if (!nilfs_check_page(page)) - goto fail; - } + if (IS_ERR(page)) + return page; + + kaddr = kmap(page); + if (unlikely(!PageChecked(page))) { + if (!nilfs_check_page(page)) + goto fail; } - return page; + + *pagep = page; + return kaddr; fail: nilfs_put_page(page); @@ -275,14 +280,14 @@ static int nilfs_readdir(struct file *file, struct dir_context *ctx) for ( ; n < npages; n++, offset = 0) { char *kaddr, *limit; struct nilfs_dir_entry *de; - struct page *page = nilfs_get_page(inode, n); + struct page *page; - if (IS_ERR(page)) { + kaddr = nilfs_get_page(inode, n, &page); + if (IS_ERR(kaddr)) { nilfs_error(sb, "bad page in #%lu", inode->i_ino); ctx->pos += PAGE_SIZE - offset; return -EIO; } - kaddr = page_address(page); de = (struct nilfs_dir_entry *)(kaddr + offset); limit = kaddr + nilfs_last_byte(inode, n) - NILFS_DIR_REC_LEN(1); @@ -345,11 +350,9 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr, start = 0; n = start; do { - char *kaddr; + char *kaddr = nilfs_get_page(dir, n, &page); - page = nilfs_get_page(dir, n); - if (!IS_ERR(page)) { - kaddr = page_address(page); + if (!IS_ERR(kaddr)) { de = (struct nilfs_dir_entry *)kaddr; kaddr += nilfs_last_byte(dir, n) - reclen; while ((char *) de <= kaddr) { @@ -387,15 +390,11 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr, struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct page **p) { - struct page *page = nilfs_get_page(dir, 0); - struct nilfs_dir_entry *de = NULL; + struct nilfs_dir_entry *de = nilfs_get_page(dir, 0, p); - if (!IS_ERR(page)) { - de = nilfs_next_entry( - (struct nilfs_dir_entry *)page_address(page)); - *p = page; - } - return de; + if (IS_ERR(de)) + return NULL; + return nilfs_next_entry(de); } ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr) @@ -459,12 +458,11 @@ int nilfs_add_link(struct dentry *dentry, struct inode *inode) for (n = 0; n <= npages; n++) { char *dir_end; - page = nilfs_get_page(dir, n); - err = PTR_ERR(page); - if (IS_ERR(page)) + kaddr = nilfs_get_page(dir, n, &page); + err = PTR_ERR(kaddr); + if (IS_ERR(kaddr)) goto out; lock_page(page); - kaddr = page_address(page); dir_end = kaddr + nilfs_last_byte(dir, n); de = (struct nilfs_dir_entry *)kaddr; kaddr += PAGE_SIZE - reclen; @@ -627,11 +625,10 @@ int nilfs_empty_dir(struct inode *inode) char *kaddr; struct nilfs_dir_entry *de; - page = nilfs_get_page(inode, i); - if (IS_ERR(page)) + kaddr = nilfs_get_page(inode, i, &page); + if (IS_ERR(kaddr)) continue; - kaddr = page_address(page); de = (struct nilfs_dir_entry *)kaddr; kaddr += nilfs_last_byte(inode, i) - NILFS_DIR_REC_LEN(1); From c77ad608df6c091fe64ecb91f41ef7cb465587f1 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Tue, 4 Jun 2024 22:42:55 +0900 Subject: [PATCH 1110/1565] nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors [ Upstream commit 7373a51e7998b508af7136530f3a997b286ce81c ] The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system. In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot. Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page. Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Reported-by: syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin --- fs/nilfs2/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 22f1f75a90c1..552234ef22fe 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -627,7 +627,7 @@ int nilfs_empty_dir(struct inode *inode) kaddr = nilfs_get_page(inode, i, &page); if (IS_ERR(kaddr)) - continue; + return 0; de = (struct nilfs_dir_entry *)kaddr; kaddr += nilfs_last_byte(inode, i) - NILFS_DIR_REC_LEN(1); From c0747d76eb05542b5d49f67069b64ef5ff732c6c Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Thu, 13 Jun 2024 21:30:43 -0400 Subject: [PATCH 1111/1565] USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages commit 22f00812862564b314784167a89f27b444f82a46 upstream. The syzbot fuzzer found that the interrupt-URB completion callback in the cdc-wdm driver was taking too long, and the driver's immediate resubmission of interrupt URBs with -EPROTO status combined with the dummy-hcd emulation to cause a CPU lockup: cdc_wdm 1-1:1.0: nonzero urb status received: -71 cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625] CPU#0 Utilization every 4s during lockup: #1: 98% system, 0% softirq, 3% hardirq, 0% idle #2: 98% system, 0% softirq, 3% hardirq, 0% idle #3: 98% system, 0% softirq, 3% hardirq, 0% idle #4: 98% system, 0% softirq, 3% hardirq, 0% idle #5: 98% system, 1% softirq, 3% hardirq, 0% idle Modules linked in: irq event stamp: 73096 hardirqs last enabled at (73095): [] console_emit_next_record kernel/printk/printk.c:2935 [inline] hardirqs last enabled at (73095): [] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994 hardirqs last disabled at (73096): [] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (73096): [] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (73048): [] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (73048): [] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582 softirqs last disabled at (73043): [] __do_softirq+0x14/0x20 kernel/softirq.c:588 CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G W 6.10.0-rc2-syzkaller-g8867bbd4a056 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Testing showed that the problem did not occur if the two error messages -- the first two lines above -- were removed; apparently adding material to the kernel log takes a surprisingly large amount of time. In any case, the best approach for preventing these lockups and to avoid spamming the log with thousands of error messages per second is to ratelimit the two dev_err() calls. Therefore we replace them with dev_err_ratelimited(). Signed-off-by: Alan Stern Suggested-by: Greg KH Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/ Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/ Fixes: 9908a32e94de ("USB: remove err() macro from usb class drivers") Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowland.harvard.edu/ Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- drivers/usb/class/cdc-wdm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index 80332b6a1963..aa91d561a0ac 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -252,14 +252,14 @@ static void wdm_int_callback(struct urb *urb) dev_err(&desc->intf->dev, "Stall on int endpoint\n"); goto sw; /* halt is cleared in work */ default: - dev_err(&desc->intf->dev, + dev_err_ratelimited(&desc->intf->dev, "nonzero urb status received: %d\n", status); break; } } if (urb->actual_length < sizeof(struct usb_cdc_notification)) { - dev_err(&desc->intf->dev, "wdm_int_callback - %d bytes\n", + dev_err_ratelimited(&desc->intf->dev, "wdm_int_callback - %d bytes\n", urb->actual_length); goto exit; } From 498ff29800a6e1a6d11e49251ae122354691afff Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Tue, 4 Jun 2024 12:07:28 +0300 Subject: [PATCH 1112/1565] mei: me: release irq in mei_me_pci_resume error path commit 283cb234ef95d94c61f59e1cd070cd9499b51292 upstream. The mei_me_pci_resume doesn't release irq on the error path, in case mei_start() fails. Cc: Fixes: 33ec08263147 ("mei: revamp mei reset state machine") Signed-off-by: Tomas Winkler Link: https://lore.kernel.org/r/20240604090728.1027307-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/mei/pci-me.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c index 188d847662ff..7b8e92bcaa98 100644 --- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -400,8 +400,10 @@ static int mei_me_pci_resume(struct device *device) } err = mei_restart(dev); - if (err) + if (err) { + free_irq(pdev->irq, dev); return err; + } /* Start timer if stopped in suspend */ schedule_delayed_work(&dev->timer_work, HZ); From fc745f6e83cb650f9a5f2c864158e3a5ea76dad0 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 14 May 2024 12:06:34 +0200 Subject: [PATCH 1113/1565] jfs: xattr: fix buffer overflow for invalid xattr commit 7c55b78818cfb732680c4a72ab270cc2d2ee3d0f upstream. When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer. Fix this all up by properly restricting the size of the debug hex dump in the kernel log. Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com Cc: Dave Kleikamp Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh Signed-off-by: Greg Kroah-Hartman --- fs/jfs/xattr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c index db41e7803163..7ae54f78a5b0 100644 --- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -557,9 +557,11 @@ static int ea_get(struct inode *inode, struct ea_buffer *ea_buf, int min_size) size_check: if (EALIST_SIZE(ea_buf->xattr) != ea_size) { + int size = min_t(int, EALIST_SIZE(ea_buf->xattr), ea_size); + printk(KERN_ERR "ea_get: invalid extended attribute\n"); print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, - ea_buf->xattr, ea_size, 1); + ea_buf->xattr, size, 1); ea_release(inode, ea_buf); rc = -EIO; goto clean_up; From 22de7c9cba6fd1ddbe967a1992477f184af3321d Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Tue, 11 Jun 2024 15:06:07 +0300 Subject: [PATCH 1114/1565] xhci: Set correct transferred length for cancelled bulk transfers commit f0260589b439e2637ad54a2b25f00a516ef28a57 upstream. The transferred length is set incorrectly for cancelled bulk transfer TDs in case the bulk transfer ring stops on the last transfer block with a 'Stop - Length Invalid' completion code. length essentially ends up being set to the requested length: urb->actual_length = urb->transfer_buffer_length Length for 'Stop - Length Invalid' cases should be the sum of all TRB transfer block lengths up to the one the ring stopped on, _excluding_ the one stopped on. Fix this by always summing up TRB lengths for 'Stop - Length Invalid' bulk cases. This issue was discovered by Alan Stern while debugging https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not solve that bug. Issue is older than 4.10 kernel but fix won't apply to those due to major reworks in that area. Tested-by: Pierre Tomon Cc: stable@vger.kernel.org # v4.10+ Cc: Alan Stern Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4fa387e447f0..fbb7a5b51ef4 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2386,9 +2386,8 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, goto finish_td; case COMP_STOPPED_LENGTH_INVALID: /* stopped on ep trb with invalid length, exclude it */ - ep_trb_len = 0; - remaining = 0; - break; + td->urb->actual_length = sum_trb_lengths(xhci, ep_ring, ep_trb); + goto finish_td; case COMP_USB_TRANSACTION_ERROR: if (xhci->quirks & XHCI_NO_SOFT_RETRY || (ep->err_count++ > MAX_SOFT_RETRY) || From 0b05b12e2d03c617e8581d95cad36777b8572df0 Mon Sep 17 00:00:00 2001 From: Kuangyi Chiang Date: Tue, 11 Jun 2024 15:06:08 +0300 Subject: [PATCH 1115/1565] xhci: Apply reset resume quirk to Etron EJ188 xHCI host commit 17bd54555c2aaecfdb38e2734149f684a73fa584 upstream. As described in commit c877b3b2ad5c ("xhci: Add reset on resume quirk for asrock p67 host"), EJ188 have the same issue as EJ168, where completely dies on resume. So apply XHCI_RESET_ON_RESUME quirk to EJ188 as well. Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 53f832787350..df92d54f4669 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -35,6 +35,7 @@ #define PCI_VENDOR_ID_ETRON 0x1b6f #define PCI_DEVICE_ID_EJ168 0x7023 +#define PCI_DEVICE_ID_EJ188 0x7052 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31 @@ -270,6 +271,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_TRUST_TX_LENGTH; xhci->quirks |= XHCI_BROKEN_STREAMS; } + if (pdev->vendor == PCI_VENDOR_ID_ETRON && + pdev->device == PCI_DEVICE_ID_EJ188) + xhci->quirks |= XHCI_RESET_ON_RESUME; + if (pdev->vendor == PCI_VENDOR_ID_RENESAS && pdev->device == 0x0014) { xhci->quirks |= XHCI_TRUST_TX_LENGTH; From baeae72258ad68674aca94e8b84478ca169014df Mon Sep 17 00:00:00 2001 From: Kuangyi Chiang Date: Tue, 11 Jun 2024 15:06:09 +0300 Subject: [PATCH 1116/1565] xhci: Apply broken streams quirk to Etron EJ188 xHCI host commit 91f7a1524a92c70ffe264db8bdfa075f15bbbeb9 upstream. As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188 as well. Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20240611120610.3264502-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index df92d54f4669..88f223b975d3 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -272,8 +272,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_BROKEN_STREAMS; } if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ188) + pdev->device == PCI_DEVICE_ID_EJ188) { xhci->quirks |= XHCI_RESET_ON_RESUME; + xhci->quirks |= XHCI_BROKEN_STREAMS; + } if (pdev->vendor == PCI_VENDOR_ID_RENESAS && pdev->device == 0x0014) { From 0081d2b3ae0a17a86b8cc0fa3c8bdc54e233ba16 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Wed, 5 Jun 2024 01:55:29 -0700 Subject: [PATCH 1117/1565] scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory commit 4254dfeda82f20844299dca6c38cbffcfd499f41 upstream. There is a potential out-of-bounds access when using test_bit() on a single word. The test_bit() and set_bit() functions operate on long values, and when testing or setting a single word, they can exceed the word boundary. KASAN detects this issue and produces a dump: BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965 For full log, please look at [1]. Make the allocation at least the size of sizeof(unsigned long) so that set_bit() and test_bit() have sufficient room for read/write operations without overwriting unallocated memory. [1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/ Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path") Cc: stable@vger.kernel.org Suggested-by: Keith Busch Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org Reviewed-by: Keith Busch Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/mpt3sas/mpt3sas_base.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 105d781d0cac..2803b475dae6 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -7440,6 +7440,12 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc) ioc->pd_handles_sz = (ioc->facts.MaxDevHandle / 8); if (ioc->facts.MaxDevHandle % 8) ioc->pd_handles_sz++; + /* + * pd_handles_sz should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory touch may occur. + */ + ioc->pd_handles_sz = ALIGN(ioc->pd_handles_sz, sizeof(unsigned long)); + ioc->pd_handles = kzalloc(ioc->pd_handles_sz, GFP_KERNEL); if (!ioc->pd_handles) { @@ -7457,6 +7463,13 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc) ioc->pend_os_device_add_sz = (ioc->facts.MaxDevHandle / 8); if (ioc->facts.MaxDevHandle % 8) ioc->pend_os_device_add_sz++; + + /* + * pend_os_device_add_sz should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory may occur. + */ + ioc->pend_os_device_add_sz = ALIGN(ioc->pend_os_device_add_sz, + sizeof(unsigned long)); ioc->pend_os_device_add = kzalloc(ioc->pend_os_device_add_sz, GFP_KERNEL); if (!ioc->pend_os_device_add) { @@ -7747,6 +7760,12 @@ _base_check_ioc_facts_changes(struct MPT3SAS_ADAPTER *ioc) if (ioc->facts.MaxDevHandle % 8) pd_handles_sz++; + /* + * pd_handles should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory touch may + * occur. + */ + pd_handles_sz = ALIGN(pd_handles_sz, sizeof(unsigned long)); pd_handles = krealloc(ioc->pd_handles, pd_handles_sz, GFP_KERNEL); if (!pd_handles) { From 20b3f435b7c1bb74f2d68ed67c37cbe39fcb06d3 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Wed, 29 May 2024 22:30:28 +1000 Subject: [PATCH 1118/1565] powerpc/uaccess: Fix build errors seen with GCC 13/14 commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream. Building ppc64le_defconfig with GCC 14 fails with assembler errors: CC fs/readdir.o /tmp/ccdQn0mD.s: Assembler messages: /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4) /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4) ... [6 lines] /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4) A snippet of the asm shows: # ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end); ld 9,0(29) # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1] # 210 "../fs/readdir.c" 1 1: std 9,18(8) # put_user # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1] The 'std' instruction requires a 4-byte aligned displacement because it is a DS-form instruction, and as the assembler says, 18 is not a multiple of 4. A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y. The fix is to change the constraint on the memory operand to put_user(), from "m" which is a general memory reference to "YZ". The "Z" constraint is documented in the GCC manual PowerPC machine constraints, and specifies a "memory operand accessed with indexed or indirect addressing". "Y" is not documented in the manual but specifies a "memory operand for a DS-form instruction". Using both allows the compiler to generate a DS-form "std" or X-form "stdx" as appropriate. The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because the "Y" constraint does not guarantee 4-byte alignment when prefixed instructions are enabled. Unfortunately clang doesn't support the "Y" constraint so that has to be behind an ifdef. Although the build error is only seen with GCC 13/14, that appears to just be luck. The constraint has been incorrect since it was first added. Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Cc: stable@vger.kernel.org # v5.10+ Suggested-by: Kewen Lin Signed-off-by: Michael Ellerman Link: https://msgid.link/20240529123029.146953-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/uaccess.h | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 6b808bcdecd5..6df110c1254e 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -186,9 +186,20 @@ do { \ : \ : label) +#ifdef CONFIG_CC_IS_CLANG +#define DS_FORM_CONSTRAINT "Z<>" +#else +#define DS_FORM_CONSTRAINT "YZ<>" +#endif + #ifdef __powerpc64__ -#define __put_user_asm2_goto(x, ptr, label) \ - __put_user_asm_goto(x, ptr, label, "std") +#define __put_user_asm2_goto(x, addr, label) \ + asm goto ("1: std%U1%X1 %0,%1 # put_user\n" \ + EX_TABLE(1b, %l2) \ + : \ + : "r" (x), DS_FORM_CONSTRAINT (*addr) \ + : \ + : label) #else /* __powerpc64__ */ #define __put_user_asm2_goto(x, addr, label) \ asm_volatile_goto( \ From b6a204f937e6372bfbff681f992bc226d5db5169 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Mon, 29 Apr 2024 14:50:41 -0700 Subject: [PATCH 1119/1565] Input: try trimming too long modalias strings commit 0774d19038c496f0c3602fb505c43e1b2d8eed85 upstream. If an input device declares too many capability bits then modalias string for such device may become too long and not fit into uevent buffer, resulting in failure of sending said uevent. This, in turn, may prevent userspace from recognizing existence of such devices. This is typically not a concern for real hardware devices as they have limited number of keys, but happen with synthetic devices such as ones created by xen-kbdfront driver, which creates devices as being capable of delivering all possible keys, since it doesn't know what keys the backend may produce. To deal with such devices input core will attempt to trim key data, in the hope that the rest of modalias string will fit in the given buffer. When trimming key data it will indicate that it is not complete by placing "+," sign, resulting in conversions like this: old: k71,72,73,74,78,7A,7B,7C,7D,8E,9E,A4,AD,E0,E1,E4,F8,174, new: k71,72,73,74,78,7A,7B,7C,+, This should allow existing udev rules continue to work with existing devices, and will also allow writing more complex rules that would recognize trimmed modalias and check input device characteristics by other means (for example by parsing KEY= data in uevent or parsing input device sysfs attributes). Note that the driver core may try adding more uevent environment variables once input core is done adding its own, so when forming modalias we can not use the entire available buffer, so we reduce it by somewhat an arbitrary amount (96 bytes). Reported-by: Jason Andryuk Reviewed-by: Peter Hutterer Tested-by: Jason Andryuk Link: https://lore.kernel.org/r/ZjAWMQCJdrxZkvkB@google.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman Signed-off-by: Jason Andryuk --- drivers/input/input.c | 105 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 90 insertions(+), 15 deletions(-) diff --git a/drivers/input/input.c b/drivers/input/input.c index 49504dcd5dc6..9190aa18263e 100644 --- a/drivers/input/input.c +++ b/drivers/input/input.c @@ -1356,19 +1356,19 @@ static int input_print_modalias_bits(char *buf, int size, char name, unsigned long *bm, unsigned int min_bit, unsigned int max_bit) { - int len = 0, i; + int bit = min_bit; + int len = 0; len += snprintf(buf, max(size, 0), "%c", name); - for (i = min_bit; i < max_bit; i++) - if (bm[BIT_WORD(i)] & BIT_MASK(i)) - len += snprintf(buf + len, max(size - len, 0), "%X,", i); + for_each_set_bit_from(bit, bm, max_bit) + len += snprintf(buf + len, max(size - len, 0), "%X,", bit); return len; } -static int input_print_modalias(char *buf, int size, struct input_dev *id, - int add_cr) +static int input_print_modalias_parts(char *buf, int size, int full_len, + struct input_dev *id) { - int len; + int len, klen, remainder, space; len = snprintf(buf, max(size, 0), "input:b%04Xv%04Xp%04Xe%04X-", @@ -1377,8 +1377,49 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id, len += input_print_modalias_bits(buf + len, size - len, 'e', id->evbit, 0, EV_MAX); - len += input_print_modalias_bits(buf + len, size - len, + + /* + * Calculate the remaining space in the buffer making sure we + * have place for the terminating 0. + */ + space = max(size - (len + 1), 0); + + klen = input_print_modalias_bits(buf + len, size - len, 'k', id->keybit, KEY_MIN_INTERESTING, KEY_MAX); + len += klen; + + /* + * If we have more data than we can fit in the buffer, check + * if we can trim key data to fit in the rest. We will indicate + * that key data is incomplete by adding "+" sign at the end, like + * this: * "k1,2,3,45,+,". + * + * Note that we shortest key info (if present) is "k+," so we + * can only try to trim if key data is longer than that. + */ + if (full_len && size < full_len + 1 && klen > 3) { + remainder = full_len - len; + /* + * We can only trim if we have space for the remainder + * and also for at least "k+," which is 3 more characters. + */ + if (remainder <= space - 3) { + int i; + /* + * We are guaranteed to have 'k' in the buffer, so + * we need at least 3 additional bytes for storing + * "+," in addition to the remainder. + */ + for (i = size - 1 - remainder - 3; i >= 0; i--) { + if (buf[i] == 'k' || buf[i] == ',') { + strcpy(buf + i + 1, "+,"); + len = i + 3; /* Not counting '\0' */ + break; + } + } + } + } + len += input_print_modalias_bits(buf + len, size - len, 'r', id->relbit, 0, REL_MAX); len += input_print_modalias_bits(buf + len, size - len, @@ -1394,12 +1435,25 @@ static int input_print_modalias(char *buf, int size, struct input_dev *id, len += input_print_modalias_bits(buf + len, size - len, 'w', id->swbit, 0, SW_MAX); - if (add_cr) - len += snprintf(buf + len, max(size - len, 0), "\n"); - return len; } +static int input_print_modalias(char *buf, int size, struct input_dev *id) +{ + int full_len; + + /* + * Printing is done in 2 passes: first one figures out total length + * needed for the modalias string, second one will try to trim key + * data in case when buffer is too small for the entire modalias. + * If the buffer is too small regardless, it will fill as much as it + * can (without trimming key data) into the buffer and leave it to + * the caller to figure out what to do with the result. + */ + full_len = input_print_modalias_parts(NULL, 0, 0, id); + return input_print_modalias_parts(buf, size, full_len, id); +} + static ssize_t input_dev_show_modalias(struct device *dev, struct device_attribute *attr, char *buf) @@ -1407,7 +1461,9 @@ static ssize_t input_dev_show_modalias(struct device *dev, struct input_dev *id = to_input_dev(dev); ssize_t len; - len = input_print_modalias(buf, PAGE_SIZE, id, 1); + len = input_print_modalias(buf, PAGE_SIZE, id); + if (len < PAGE_SIZE - 2) + len += snprintf(buf + len, PAGE_SIZE - len, "\n"); return min_t(int, len, PAGE_SIZE); } @@ -1582,6 +1638,23 @@ static int input_add_uevent_bm_var(struct kobj_uevent_env *env, return 0; } +/* + * This is a pretty gross hack. When building uevent data the driver core + * may try adding more environment variables to kobj_uevent_env without + * telling us, so we have no idea how much of the buffer we can use to + * avoid overflows/-ENOMEM elsewhere. To work around this let's artificially + * reduce amount of memory we will use for the modalias environment variable. + * + * The potential additions are: + * + * SEQNUM=18446744073709551615 - (%llu - 28 bytes) + * HOME=/ (6 bytes) + * PATH=/sbin:/bin:/usr/sbin:/usr/bin (34 bytes) + * + * 68 bytes total. Allow extra buffer - 96 bytes + */ +#define UEVENT_ENV_EXTRA_LEN 96 + static int input_add_uevent_modalias_var(struct kobj_uevent_env *env, struct input_dev *dev) { @@ -1591,9 +1664,11 @@ static int input_add_uevent_modalias_var(struct kobj_uevent_env *env, return -ENOMEM; len = input_print_modalias(&env->buf[env->buflen - 1], - sizeof(env->buf) - env->buflen, - dev, 0); - if (len >= (sizeof(env->buf) - env->buflen)) + (int)sizeof(env->buf) - env->buflen - + UEVENT_ENV_EXTRA_LEN, + dev); + if (len >= ((int)sizeof(env->buf) - env->buflen - + UEVENT_ENV_EXTRA_LEN)) return -ENOMEM; env->buflen += len; From 66c79c5acc5c6cc9fdb1e2977c03cdb7905badae Mon Sep 17 00:00:00 2001 From: Chen Hanxiao Date: Thu, 23 May 2024 16:47:16 +0800 Subject: [PATCH 1120/1565] SUNRPC: return proper error from gss_wrap_req_priv [ Upstream commit 33c94d7e3cb84f6d130678d6d59ba475a6c489cf ] don't return 0 if snd_buf->len really greater than snd_buf->buflen Signed-off-by: Chen Hanxiao Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in rpc_auth_gss.ko") Reviewed-by: Benjamin Coddington Reviewed-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Sasha Levin --- net/sunrpc/auth_gss/auth_gss.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 2ff66a6a7e54..7ce4a6b7cfae 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1855,8 +1855,10 @@ gss_wrap_req_priv(struct rpc_cred *cred, struct gss_cl_ctx *ctx, offset = (u8 *)p - (u8 *)snd_buf->head[0].iov_base; maj_stat = gss_wrap(ctx->gc_gss_ctx, offset, snd_buf, inpages); /* slack space should prevent this ever happening: */ - if (unlikely(snd_buf->len > snd_buf->buflen)) + if (unlikely(snd_buf->len > snd_buf->buflen)) { + status = -EIO; goto wrap_failed; + } /* We're assuming that when GSS_S_CONTEXT_EXPIRED, the encryption was * done anyway, so it's safe to put the request on the wire: */ if (maj_stat == GSS_S_CONTEXT_EXPIRED) From 544015b94589fd8a3cfe0870ed9b11539cf354a0 Mon Sep 17 00:00:00 2001 From: Gregor Herburger Date: Thu, 30 May 2024 12:19:59 +0200 Subject: [PATCH 1121/1565] gpio: tqmx86: fix typo in Kconfig label [ Upstream commit 8c219e52ca4d9a67cd6a7074e91bf29b55edc075 ] Fix description for GPIO_TQMX86 from QTMX86 to TQMx86. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Gregor Herburger Signed-off-by: Matthias Schiffer Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/e0e38c9944ad6d281d9a662a45d289b88edc808e.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin --- drivers/gpio/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index 39f3e1366409..b7811dbe0ec2 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -1335,7 +1335,7 @@ config GPIO_TPS68470 drivers are loaded. config GPIO_TQMX86 - tristate "TQ-Systems QTMX86 GPIO" + tristate "TQ-Systems TQMx86 GPIO" depends on MFD_TQMX86 || COMPILE_TEST depends on HAS_IOPORT_MAP select GPIOLIB_IRQCHIP From 33f6832798dd3297317901cc1db556ac3ae80c24 Mon Sep 17 00:00:00 2001 From: Nikita Zhandarovich Date: Fri, 17 May 2024 07:19:14 -0700 Subject: [PATCH 1122/1565] HID: core: remove unnecessary WARN_ON() in implement() [ Upstream commit 4aa2dcfbad538adf7becd0034a3754e1bd01b2b5 ] Syzkaller hit a warning [1] in a call to implement() when trying to write a value into a field of smaller size in an output report. Since implement() already has a warn message printed out with the help of hid_warn() and value in question gets trimmed with: ... value &= m; ... WARN_ON may be considered superfluous. Remove it to suppress future syzkaller triggers. [1] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 implement drivers/hid/hid-core.c:1451 [inline] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 Modules linked in: CPU: 0 PID: 5084 Comm: syz-executor424 Not tainted 6.9.0-rc7-syzkaller-00183-gcf87f46fd34d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:implement drivers/hid/hid-core.c:1451 [inline] RIP: 0010:hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 ... Call Trace: __usbhid_submit_report drivers/hid/usbhid/hid-core.c:591 [inline] usbhid_submit_report+0x43d/0x9e0 drivers/hid/usbhid/hid-core.c:636 hiddev_ioctl+0x138b/0x1f00 drivers/hid/usbhid/hiddev.c:726 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... Fixes: 95d1c8951e5b ("HID: simplify implement() a bit") Reported-by: Suggested-by: Alan Stern Signed-off-by: Nikita Zhandarovich Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- drivers/hid/hid-core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index 476967ab6294..5281d693b32d 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1446,7 +1446,6 @@ static void implement(const struct hid_device *hid, u8 *report, hid_warn(hid, "%s() called with too large value %d (n: %d)! (%s)\n", __func__, value, n, current->comm); - WARN_ON(1); value &= m; } } From a843c0e9da326482f079285f58a727db9a43d5b5 Mon Sep 17 00:00:00 2001 From: Matthias Schiffer Date: Thu, 30 May 2024 12:20:01 +0200 Subject: [PATCH 1123/1565] gpio: tqmx86: store IRQ trigger type and unmask status separately commit 08af509efdf8dad08e972b48de0e2c2a7919ea8b upstream. irq_set_type() should not implicitly unmask the IRQ. All accesses to the interrupt configuration register are moved to a new helper tqmx86_gpio_irq_config(). We also introduce the new rule that accessing irq_type must happen while locked, which will become significant for fixing EDGE_BOTH handling. Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Matthias Schiffer Link: https://lore.kernel.org/r/6aa4f207f77cb58ef64ffb947e91949b0f753ccd.1717063994.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpio-tqmx86.c | 46 +++++++++++++++++++++----------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/drivers/gpio/gpio-tqmx86.c b/drivers/gpio/gpio-tqmx86.c index 0f5d17f343f1..70c52544ec80 100644 --- a/drivers/gpio/gpio-tqmx86.c +++ b/drivers/gpio/gpio-tqmx86.c @@ -27,16 +27,20 @@ #define TQMX86_GPIIC 3 /* GPI Interrupt Configuration Register */ #define TQMX86_GPIIS 4 /* GPI Interrupt Status Register */ +#define TQMX86_GPII_NONE 0 #define TQMX86_GPII_FALLING BIT(0) #define TQMX86_GPII_RISING BIT(1) #define TQMX86_GPII_MASK (BIT(0) | BIT(1)) #define TQMX86_GPII_BITS 2 +/* Stored in irq_type with GPII bits */ +#define TQMX86_INT_UNMASKED BIT(2) struct tqmx86_gpio_data { struct gpio_chip chip; struct irq_chip irq_chip; void __iomem *io_base; int irq; + /* Lock must be held for accessing output and irq_type fields */ raw_spinlock_t spinlock; u8 irq_type[TQMX86_NGPI]; }; @@ -107,20 +111,30 @@ static int tqmx86_gpio_get_direction(struct gpio_chip *chip, return GPIO_LINE_DIRECTION_OUT; } +static void tqmx86_gpio_irq_config(struct tqmx86_gpio_data *gpio, int offset) + __must_hold(&gpio->spinlock) +{ + u8 type = TQMX86_GPII_NONE, gpiic; + + if (gpio->irq_type[offset] & TQMX86_INT_UNMASKED) + type = gpio->irq_type[offset] & TQMX86_GPII_MASK; + + gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); + gpiic &= ~(TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS)); + gpiic |= type << (offset * TQMX86_GPII_BITS); + tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); +} + static void tqmx86_gpio_irq_mask(struct irq_data *data) { unsigned int offset = (data->hwirq - TQMX86_NGPO); struct tqmx86_gpio_data *gpio = gpiochip_get_data( irq_data_get_irq_chip_data(data)); unsigned long flags; - u8 gpiic, mask; - - mask = TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS); raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~mask; - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] &= ~TQMX86_INT_UNMASKED; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); } @@ -130,15 +144,10 @@ static void tqmx86_gpio_irq_unmask(struct irq_data *data) struct tqmx86_gpio_data *gpio = gpiochip_get_data( irq_data_get_irq_chip_data(data)); unsigned long flags; - u8 gpiic, mask; - - mask = TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS); raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~mask; - gpiic |= gpio->irq_type[offset] << (offset * TQMX86_GPII_BITS); - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] |= TQMX86_INT_UNMASKED; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); } @@ -149,7 +158,7 @@ static int tqmx86_gpio_irq_set_type(struct irq_data *data, unsigned int type) unsigned int offset = (data->hwirq - TQMX86_NGPO); unsigned int edge_type = type & IRQF_TRIGGER_MASK; unsigned long flags; - u8 new_type, gpiic; + u8 new_type; switch (edge_type) { case IRQ_TYPE_EDGE_RISING: @@ -165,13 +174,10 @@ static int tqmx86_gpio_irq_set_type(struct irq_data *data, unsigned int type) return -EINVAL; /* not supported */ } - gpio->irq_type[offset] = new_type; - raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~((TQMX86_GPII_MASK) << (offset * TQMX86_GPII_BITS)); - gpiic |= new_type << (offset * TQMX86_GPII_BITS); - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] &= ~TQMX86_GPII_MASK; + gpio->irq_type[offset] |= new_type; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); return 0; From c0f1bd317b3a539546739aef47fa5d384fa88d83 Mon Sep 17 00:00:00 2001 From: Vasant Hegde Date: Wed, 6 Jul 2022 17:07:52 +0530 Subject: [PATCH 1124/1565] iommu/amd: Introduce pci segment structure [ Upstream commit 404ec4e4c169fb64da6b2a38b471c13ac0897c76 ] Newer AMD systems can support multiple PCI segments, where each segment contains one or more IOMMU instances. However, an IOMMU instance can only support a single PCI segment. Current code assumes that system contains only one pci segment (segment 0) and creates global data structures such as device table, rlookup table, etc. Introducing per PCI segment data structure, which contains segment specific data structures. This will eventually replace the global data structures. Also update `amd_iommu->pci_seg` variable to point to PCI segment structure instead of PCI segment ID. Co-developed-by: Suravee Suthikulpanit Signed-off-by: Suravee Suthikulpanit Signed-off-by: Vasant Hegde Link: https://lore.kernel.org/r/20220706113825.25582-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel Stable-dep-of: a295ec52c862 ("iommu/amd: Fix sysfs leak in iommu init") Signed-off-by: Sasha Levin --- drivers/iommu/amd/amd_iommu_types.h | 24 ++++++++++++++- drivers/iommu/amd/init.c | 46 ++++++++++++++++++++++++++++- 2 files changed, 68 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h index 4a8791e037b8..c4b1a652c2c7 100644 --- a/drivers/iommu/amd/amd_iommu_types.h +++ b/drivers/iommu/amd/amd_iommu_types.h @@ -435,6 +435,11 @@ extern bool amd_iommu_irq_remap; /* kmem_cache to get tables with 128 byte alignement */ extern struct kmem_cache *amd_iommu_irq_cache; +/* Make iterating over all pci segment easier */ +#define for_each_pci_segment(pci_seg) \ + list_for_each_entry((pci_seg), &amd_iommu_pci_seg_list, list) +#define for_each_pci_segment_safe(pci_seg, next) \ + list_for_each_entry_safe((pci_seg), (next), &amd_iommu_pci_seg_list, list) /* * Make iterating over all IOMMUs easier */ @@ -494,6 +499,17 @@ struct domain_pgtable { u64 *root; }; +/* + * This structure contains information about one PCI segment in the system. + */ +struct amd_iommu_pci_seg { + /* List with all PCI segments in the system */ + struct list_head list; + + /* PCI segment number */ + u16 id; +}; + /* * Structure where we save information about one hardware AMD IOMMU in the * system. @@ -545,7 +561,7 @@ struct amd_iommu { u16 cap_ptr; /* pci domain of this IOMMU */ - u16 pci_seg; + struct amd_iommu_pci_seg *pci_seg; /* start of exclusion range of that IOMMU */ u64 exclusion_start; @@ -676,6 +692,12 @@ extern struct list_head ioapic_map; extern struct list_head hpet_map; extern struct list_head acpihid_map; +/* + * List with all PCI segments in the system. This list is not locked because + * it is only written at driver initialization time + */ +extern struct list_head amd_iommu_pci_seg_list; + /* * List with all IOMMUs in the system. This list is not locked because it is * only written and read at driver initialization or suspend time diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 91cc3a5643ca..22d28dbe092e 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -165,6 +165,7 @@ LIST_HEAD(amd_iommu_unity_map); /* a list of required unity mappings we find in ACPI */ bool amd_iommu_unmap_flush; /* if true, flush on every unmap */ +LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */ LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the system */ @@ -1456,6 +1457,43 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu, return 0; } +/* Allocate PCI segment data structure */ +static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id) +{ + struct amd_iommu_pci_seg *pci_seg; + + pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL); + if (pci_seg == NULL) + return NULL; + + pci_seg->id = id; + list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list); + + return pci_seg; +} + +static struct amd_iommu_pci_seg *__init get_pci_segment(u16 id) +{ + struct amd_iommu_pci_seg *pci_seg; + + for_each_pci_segment(pci_seg) { + if (pci_seg->id == id) + return pci_seg; + } + + return alloc_pci_segment(id); +} + +static void __init free_pci_segments(void) +{ + struct amd_iommu_pci_seg *pci_seg, *next; + + for_each_pci_segment_safe(pci_seg, next) { + list_del(&pci_seg->list); + kfree(pci_seg); + } +} + static void __init free_iommu_one(struct amd_iommu *iommu) { free_cwwb_sem(iommu); @@ -1542,8 +1580,14 @@ static void amd_iommu_ats_write_check_workaround(struct amd_iommu *iommu) */ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) { + struct amd_iommu_pci_seg *pci_seg; int ret; + pci_seg = get_pci_segment(h->pci_seg); + if (pci_seg == NULL) + return -ENOMEM; + iommu->pci_seg = pci_seg; + raw_spin_lock_init(&iommu->lock); iommu->cmd_sem_val = 0; @@ -1564,7 +1608,6 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h) */ iommu->devid = h->devid; iommu->cap_ptr = h->cap_ptr; - iommu->pci_seg = h->pci_seg; iommu->mmio_phys = h->mmio_phys; switch (h->type) { @@ -2511,6 +2554,7 @@ static void __init free_iommu_resources(void) amd_iommu_dev_table = NULL; free_iommu_all(); + free_pci_segments(); } /* SB IOAPIC is always on this device in AMD systems */ From d4673a34d8fd7254b16e6e707a0bf9e866b69ac7 Mon Sep 17 00:00:00 2001 From: "Kun(llfl)" Date: Thu, 9 May 2024 08:42:20 +0800 Subject: [PATCH 1125/1565] iommu/amd: Fix sysfs leak in iommu init [ Upstream commit a295ec52c8624883885396fde7b4df1a179627c3 ] During the iommu initialization, iommu_init_pci() adds sysfs nodes. However, these nodes aren't remove in free_iommu_resources() subsequently. Fixes: 39ab9555c241 ("iommu: Add sysfs bindings for struct iommu_device") Signed-off-by: Kun(llfl) Reviewed-by: Suravee Suthikulpanit Link: https://lore.kernel.org/r/c8e0d11c6ab1ee48299c288009cf9c5dae07b42d.1715215003.git.llfl@linux.alibaba.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- drivers/iommu/amd/init.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 22d28dbe092e..917ee5a67e78 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1494,8 +1494,17 @@ static void __init free_pci_segments(void) } } +static void __init free_sysfs(struct amd_iommu *iommu) +{ + if (iommu->iommu.dev) { + iommu_device_unregister(&iommu->iommu); + iommu_device_sysfs_remove(&iommu->iommu); + } +} + static void __init free_iommu_one(struct amd_iommu *iommu) { + free_sysfs(iommu); free_cwwb_sem(iommu); free_command_buffer(iommu); free_event_buffer(iommu); From cf34f8f66982a36e5cba0d05781b21ec9606b91e Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 28 May 2024 12:25:28 +0800 Subject: [PATCH 1126/1565] iommu: Return right value in iommu_sva_bind_device() [ Upstream commit 89e8a2366e3bce584b6c01549d5019c5cda1205e ] iommu_sva_bind_device() should return either a sva bond handle or an ERR_PTR value in error cases. Existing drivers (idxd and uacce) only check the return value with IS_ERR(). This could potentially lead to a kernel NULL pointer dereference issue if the function returns NULL instead of an error pointer. In reality, this doesn't cause any problems because iommu_sva_bind_device() only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA. In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will return an error, and the device drivers won't call iommu_sva_bind_device() at all. Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices") Signed-off-by: Lu Baolu Reviewed-by: Jean-Philippe Brucker Reviewed-by: Kevin Tian Reviewed-by: Vasant Hegde Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e90c267e7f3e..2698dd231298 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1026,7 +1026,7 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev) static inline struct iommu_sva * iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) { - return NULL; + return ERR_PTR(-ENODEV); } static inline void iommu_sva_unbind_device(struct iommu_sva *handle) From caa9c9acb93db7ad7b74b157cf101579bac9596d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jos=C3=A9=20Exp=C3=B3sito?= Date: Fri, 24 May 2024 15:05:39 +0200 Subject: [PATCH 1127/1565] HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ce3af2ee95170b7d9e15fff6e500d67deab1e7b3 ] Fix a memory leak on logi_dj_recv_send_report() error path. Fixes: 6f20d3261265 ("HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()") Signed-off-by: José Expósito Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- drivers/hid/hid-logitech-dj.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index f4d79ec82679..bda9150aa372 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -1218,8 +1218,10 @@ static int logi_dj_recv_switch_to_dj_mode(struct dj_receiver_dev *djrcv_dev, */ msleep(50); - if (retval) + if (retval) { + kfree(dj_report); return retval; + } } /* From abc55e738b4337f26d16ec70159fb8288767fd5e Mon Sep 17 00:00:00 2001 From: Ian Forbes Date: Tue, 21 May 2024 13:47:18 -0500 Subject: [PATCH 1128/1565] drm/vmwgfx: 3D disabled should not effect STDU memory limits [ Upstream commit fb5e19d2dd03eb995ccd468d599b2337f7f66555 ] This limit became a hard cap starting with the change referenced below. Surface creation on the device will fail if the requested size is larger than this limit so altering the value arbitrarily will expose modes that are too large for the device's hard limits. Fixes: 7ebb47c9f9ab ("drm/vmwgfx: Read new register for GB memory when available") Signed-off-by: Ian Forbes Signed-off-by: Zack Rusin Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index bdb7a5e96560..25a9c72cca80 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -752,13 +752,6 @@ static int vmw_driver_load(struct drm_device *dev, unsigned long chipset) vmw_read(dev_priv, SVGA_REG_SUGGESTED_GBOBJECT_MEM_SIZE_KB); - /* - * Workaround for low memory 2D VMs to compensate for the - * allocation taken by fbdev - */ - if (!(dev_priv->capabilities & SVGA_CAP_3D)) - mem_size *= 3; - dev_priv->max_mob_pages = mem_size * 1024 / PAGE_SIZE; dev_priv->prim_bb_mem = vmw_read(dev_priv, From bd8e1e6af6d9e29edaeaa86ab51c41123760ebc2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cs=C3=B3k=C3=A1s=2C=20Bence?= Date: Wed, 5 Jun 2024 10:42:51 +0200 Subject: [PATCH 1129/1565] net: sfp: Always call `sfp_sm_mod_remove()` on remove MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e96b2933152fd87b6a41765b2f58b158fde855b6 ] If the module is in SFP_MOD_ERROR, `sfp_sm_mod_remove()` will not be run. As a consequence, `sfp_hwmon_remove()` is not getting run either, leaving a stale `hwmon` device behind. `sfp_sm_mod_remove()` itself checks `sfp->sm_mod_state` anyways, so this check was not really needed in the first place. Fixes: d2e816c0293f ("net: sfp: handle module remove outside state machine") Signed-off-by: "Csókás, Bence" Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20240605084251.63502-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/phy/sfp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index 6a5f40f11db3..d08990437f3e 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -1952,8 +1952,7 @@ static void sfp_sm_module(struct sfp *sfp, unsigned int event) /* Handle remove event globally, it resets this state machine */ if (event == SFP_E_REMOVE) { - if (sfp->sm_mod_state > SFP_MOD_PROBE) - sfp_sm_mod_remove(sfp); + sfp_sm_mod_remove(sfp); sfp_sm_mod_next(sfp, SFP_MOD_EMPTY, 0); return; } From 187e293c82607c08c719f050385776fb9b565d85 Mon Sep 17 00:00:00 2001 From: Jie Wang Date: Wed, 5 Jun 2024 15:20:58 +0800 Subject: [PATCH 1130/1565] net: hns3: add cond_resched() to hns3 ring buffer init process [ Upstream commit 968fde83841a8c23558dfbd0a0c69d636db52b55 ] Currently hns3 ring buffer init process would hold cpu too long with big Tx/Rx ring depth. This could cause soft lockup. So this patch adds cond_resched() to the process. Then cpu can break to run other tasks instead of busy looping. Fixes: a723fb8efe29 ("net: hns3: refine for set ring parameters") Signed-off-by: Jie Wang Signed-off-by: Jijie Shao Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++++ drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index a4ab3e7efa5e..f8275534205a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -2513,6 +2513,9 @@ static int hns3_alloc_ring_buffers(struct hns3_enet_ring *ring) ret = hns3_alloc_and_attach_buffer(ring, i); if (ret) goto out_buffer_fail; + + if (!(i % HNS3_RESCHED_BD_NUM)) + cond_resched(); } return 0; @@ -3946,6 +3949,7 @@ int hns3_init_all_ring(struct hns3_nic_priv *priv) } u64_stats_init(&priv->ring[i].syncp); + cond_resched(); } return 0; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h index 54d02ea4aaa7..669cd30b9871 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h @@ -182,6 +182,8 @@ enum hns3_nic_state { #define HNS3_RING_EN_B 0 +#define HNS3_RESCHED_BD_NUM 1024 + enum hns3_pkt_l2t_type { HNS3_L2_TYPE_UNICAST, HNS3_L2_TYPE_MULTICAST, From cbf18d8128a753cb632bef39470d19befd9c7347 Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Wed, 5 Jun 2024 13:11:35 +0300 Subject: [PATCH 1131/1565] liquidio: Adjust a NULL pointer handling path in lio_vf_rep_copy_packet [ Upstream commit c44711b78608c98a3e6b49ce91678cd0917d5349 ] In lio_vf_rep_copy_packet() pg_info->page is compared to a NULL value, but then it is unconditionally passed to skb_add_rx_frag() which looks strange and could lead to null pointer dereference. lio_vf_rep_copy_packet() call trace looks like: octeon_droq_process_packets octeon_droq_fast_process_packets octeon_droq_dispatch_pkt octeon_create_recv_info ...search in the dispatch_list... ->disp_fn(rdisp->rinfo, ...) lio_vf_rep_pkt_recv(struct octeon_recv_info *recv_info, ...) In this path there is no code which sets pg_info->page to NULL. So this check looks unneeded and doesn't solve potential problem. But I guess the author had reason to add a check and I have no such card and can't do real test. In addition, the code in the function liquidio_push_packet() in liquidio/lio_core.c does exactly the same. Based on this, I consider the most acceptable compromise solution to adjust this issue by moving skb_add_rx_frag() into conditional scope. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1f233f327913 ("liquidio: switchdev support for LiquidIO NIC") Signed-off-by: Aleksandr Mishin Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c index 600de587d7a9..e70b9ccca380 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c @@ -272,13 +272,12 @@ lio_vf_rep_copy_packet(struct octeon_device *oct, pg_info->page_offset; memcpy(skb->data, va, MIN_SKB_SIZE); skb_put(skb, MIN_SKB_SIZE); + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, + pg_info->page, + pg_info->page_offset + MIN_SKB_SIZE, + len - MIN_SKB_SIZE, + LIO_RXBUFFER_SZ); } - - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - pg_info->page, - pg_info->page_offset + MIN_SKB_SIZE, - len - MIN_SKB_SIZE, - LIO_RXBUFFER_SZ); } else { struct octeon_skb_page_info *pg_info = ((struct octeon_skb_page_info *)(skb->cb)); From bda7cdaeebf57e46c1a488ae7a15f6f264691f59 Mon Sep 17 00:00:00 2001 From: Amjad Ouled-Ameur Date: Mon, 10 Jun 2024 11:20:56 +0100 Subject: [PATCH 1132/1565] drm/komeda: check for error-valued pointer [ Upstream commit b880018edd3a577e50366338194dee9b899947e0 ] komeda_pipeline_get_state() may return an error-valued pointer, thus check the pointer for negative or null value before dereferencing. Fixes: 502932a03fce ("drm/komeda: Add the initial scaler support for CORE") Signed-off-by: Amjad Ouled-Ameur Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/20240610102056.40406-1-amjad.ouled-ameur@arm.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c index 1e922703e26b..7cc891c091f8 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c @@ -259,7 +259,7 @@ komeda_component_get_avail_scaler(struct komeda_component *c, u32 avail_scalers; pipe_st = komeda_pipeline_get_state(c->pipeline, state); - if (!pipe_st) + if (IS_ERR_OR_NULL(pipe_st)) return NULL; avail_scalers = (pipe_st->active_comps & KOMEDA_PIPELINE_SCALERS) ^ From 9b164605c115d953a7b44c12504a0f797f7da2d6 Mon Sep 17 00:00:00 2001 From: Adam Miotk Date: Mon, 10 Jun 2024 11:27:39 +0100 Subject: [PATCH 1133/1565] drm/bridge/panel: Fix runtime warning on panel bridge release [ Upstream commit ce62600c4dbee8d43b02277669dd91785a9b81d9 ] Device managed panel bridge wrappers are created by calling to drm_panel_bridge_add_typed() and registering a release handler for clean-up when the device gets unbound. Since the memory for this bridge is also managed and linked to the panel device, the release function should not try to free that memory. Moreover, the call to devm_kfree() inside drm_panel_bridge_remove() will fail in this case and emit a warning because the panel bridge resource is no longer on the device resources list (it has been removed from there before the call to release handlers). Fixes: 67022227ffb1 ("drm/bridge: Add a devm_ allocator for panel bridge.") Signed-off-by: Adam Miotk Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/20240610102739.139852-1-adam.miotk@arm.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/bridge/panel.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/bridge/panel.c b/drivers/gpu/drm/bridge/panel.c index c916f4b8907e..35a6d9c4e081 100644 --- a/drivers/gpu/drm/bridge/panel.c +++ b/drivers/gpu/drm/bridge/panel.c @@ -252,9 +252,12 @@ EXPORT_SYMBOL(drm_panel_bridge_remove); static void devm_drm_panel_bridge_release(struct device *dev, void *res) { - struct drm_bridge **bridge = res; + struct drm_bridge *bridge = *(struct drm_bridge **)res; - drm_panel_bridge_remove(*bridge); + if (!bridge) + return; + + drm_bridge_remove(bridge); } /** From 330c8661c993a2e59179ab22c65fd2650c47c8c7 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 6 Jun 2024 15:46:51 +0000 Subject: [PATCH 1134/1565] tcp: fix race in tcp_v6_syn_recv_sock() [ Upstream commit d37fe4255abe8e7b419b90c5847e8ec2b8debb08 ] tcp_v6_syn_recv_sock() calls ip6_dst_store() before inet_sk(newsk)->pinet6 has been set up. This means ip6_dst_store() writes over the parent (listener) np->dst_cookie. This is racy because multiple threads could share the same parent and their final np->dst_cookie could be wrong. Move ip6_dst_store() call after inet_sk(newsk)->pinet6 has been changed and after the copy of parent ipv6_pinfo. Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Signed-off-by: Eric Dumazet Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv6/tcp_ipv6.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 79d6f6ea3c54..003221d6f52e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1311,7 +1311,6 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * */ newsk->sk_gso_type = SKB_GSO_TCPV6; - ip6_dst_store(newsk, dst, NULL, NULL); inet6_sk_rx_dst_set(newsk, skb); inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk); @@ -1322,6 +1321,8 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * memcpy(newnp, np, sizeof(struct ipv6_pinfo)); + ip6_dst_store(newsk, dst, NULL, NULL); + newsk->sk_v6_daddr = ireq->ir_v6_rmt_addr; newnp->saddr = ireq->ir_v6_loc_addr; newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr; From dfd7f4670723fa79542e67fc6222ef94780d77f0 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Thu, 6 Jun 2024 23:32:49 +0300 Subject: [PATCH 1135/1565] net/mlx5e: Fix features validation check for tunneled UDP (non-VXLAN) packets [ Upstream commit 791b4089e326271424b78f2fae778b20e53d071b ] Move the vxlan_features_check() call to after we verified the packet is a tunneled VXLAN packet. Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might wrongly not get offloaded. In some cases, it worked by chance as GENEVE header is the same size as VXLAN, but it is obviously incorrect. Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Gal Pressman Reviewed-by: Dragos Tatulea Signed-off-by: Tariq Toukan Reviewed-by: Wojciech Drewek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f1834853872d..aeb8bb3c549a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4393,7 +4393,7 @@ static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv, /* Verify if UDP port is being offloaded by HW */ if (mlx5_vxlan_lookup_port(priv->mdev->vxlan, port)) - return features; + return vxlan_features_check(skb, features); #if IS_ENABLED(CONFIG_GENEVE) /* Support Geneve offload for default UDP port */ @@ -4414,7 +4414,6 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb, struct mlx5e_priv *priv = netdev_priv(netdev); features = vlan_features_check(skb, features); - features = vxlan_features_check(skb, features); #ifdef CONFIG_MLX5_EN_IPSEC if (mlx5e_ipsec_feature_check(skb, netdev, features)) From ea1a98c9a3678148cf1484d8c0534fa3e5693bc0 Mon Sep 17 00:00:00 2001 From: Luiz Augusto von Dentz Date: Mon, 20 May 2024 16:03:07 -0400 Subject: [PATCH 1136/1565] Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ [ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ] This removes the bogus check for max > hcon->le_conn_max_interval since the later is just the initial maximum conn interval not the maximum the stack could support which is really 3200=4000ms. In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values of the following fields in IXIT that would cause hci_check_conn_params to fail: TSPX_conn_update_int_min TSPX_conn_update_int_max TSPX_conn_update_peripheral_latency TSPX_conn_update_supervision_timeout Link: https://github.com/bluez/bluez/issues/847 Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval") Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin --- include/net/bluetooth/hci_core.h | 36 ++++++++++++++++++++++++++++---- net/bluetooth/l2cap_core.c | 8 +------ 2 files changed, 33 insertions(+), 11 deletions(-) diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 33873266b2bc..9128c0db11f8 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -1625,18 +1625,46 @@ static inline int hci_check_conn_params(u16 min, u16 max, u16 latency, { u16 max_latency; - if (min > max || min < 6 || max > 3200) + if (min > max) { + BT_WARN("min %d > max %d", min, max); return -EINVAL; + } - if (to_multiplier < 10 || to_multiplier > 3200) + if (min < 6) { + BT_WARN("min %d < 6", min); return -EINVAL; + } - if (max >= to_multiplier * 8) + if (max > 3200) { + BT_WARN("max %d > 3200", max); return -EINVAL; + } + + if (to_multiplier < 10) { + BT_WARN("to_multiplier %d < 10", to_multiplier); + return -EINVAL; + } + + if (to_multiplier > 3200) { + BT_WARN("to_multiplier %d > 3200", to_multiplier); + return -EINVAL; + } + + if (max >= to_multiplier * 8) { + BT_WARN("max %d >= to_multiplier %d * 8", max, to_multiplier); + return -EINVAL; + } max_latency = (to_multiplier * 4 / max) - 1; - if (latency > 499 || latency > max_latency) + if (latency > 499) { + BT_WARN("latency %d > 499", latency); return -EINVAL; + } + + if (latency > max_latency) { + BT_WARN("latency %d > max_latency %d", latency, max_latency); + return -EINVAL; + } return 0; } diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index da03ca6dd922..9cc034e6074c 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -5612,13 +5612,7 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn, memset(&rsp, 0, sizeof(rsp)); - if (max > hcon->le_conn_max_interval) { - BT_DBG("requested connection interval exceeds current bounds."); - err = -EINVAL; - } else { - err = hci_check_conn_params(min, max, latency, to_multiplier); - } - + err = hci_check_conn_params(min, max, latency, to_multiplier); if (err) rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED); else From 93b53c202b51a69e42ca57f5a183f7e008e19f83 Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Tue, 4 Jun 2024 15:58:03 +0200 Subject: [PATCH 1137/1565] netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type [ Upstream commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10 ] Lion Ackermann reported that there is a race condition between namespace cleanup in ipset and the garbage collection of the list:set type. The namespace cleanup can destroy the list:set type of sets while the gc of the set type is waiting to run in rcu cleanup. The latter uses data from the destroyed set which thus leads use after free. The patch contains the following parts: - When destroying all sets, first remove the garbage collectors, then wait if needed and then destroy the sets. - Fix the badly ordered "wait then remove gc" for the destroy a single set case. - Fix the missing rcu locking in the list:set type in the userspace test case. - Use proper RCU list handlings in the list:set type. The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc). Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation) Reported-by: Lion Ackermann Tested-by: Lion Ackermann Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/ipset/ip_set_core.c | 93 +++++++++++++++------------ net/netfilter/ipset/ip_set_list_set.c | 30 ++++----- 2 files changed, 66 insertions(+), 57 deletions(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index cc04c4d7956c..ecd693fca224 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1176,23 +1176,50 @@ ip_set_setname_policy[IPSET_ATTR_CMD_MAX + 1] = { .len = IPSET_MAXNAMELEN - 1 }, }; -static void -ip_set_destroy_set(struct ip_set *set) -{ - pr_debug("set: %s\n", set->name); - - /* Must call it without holding any lock */ - set->variant->destroy(set); - module_put(set->type->me); - kfree(set); -} +/* In order to return quickly when destroying a single set, it is split + * into two stages: + * - Cancel garbage collector + * - Destroy the set itself via call_rcu() + */ static void ip_set_destroy_set_rcu(struct rcu_head *head) { struct ip_set *set = container_of(head, struct ip_set, rcu); - ip_set_destroy_set(set); + set->variant->destroy(set); + module_put(set->type->me); + kfree(set); +} + +static void +_destroy_all_sets(struct ip_set_net *inst) +{ + struct ip_set *set; + ip_set_id_t i; + bool need_wait = false; + + /* First cancel gc's: set:list sets are flushed as well */ + for (i = 0; i < inst->ip_set_max; i++) { + set = ip_set(inst, i); + if (set) { + set->variant->cancel_gc(set); + if (set->type->features & IPSET_TYPE_NAME) + need_wait = true; + } + } + /* Must wait for flush to be really finished */ + if (need_wait) + rcu_barrier(); + for (i = 0; i < inst->ip_set_max; i++) { + set = ip_set(inst, i); + if (set) { + ip_set(inst, i) = NULL; + set->variant->destroy(set); + module_put(set->type->me); + kfree(set); + } + } } static int ip_set_destroy(struct net *net, struct sock *ctnl, @@ -1208,11 +1235,10 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl, if (unlikely(protocol_min_failed(attr))) return -IPSET_ERR_PROTOCOL; - /* Commands are serialized and references are * protected by the ip_set_ref_lock. * External systems (i.e. xt_set) must call - * ip_set_put|get_nfnl_* functions, that way we + * ip_set_nfnl_get_* functions, that way we * can safely check references here. * * list:set timer can only decrement the reference @@ -1220,8 +1246,6 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl, * without holding the lock. */ if (!attr[IPSET_ATTR_SETNAME]) { - /* Must wait for flush to be really finished in list:set */ - rcu_barrier(); read_lock_bh(&ip_set_ref_lock); for (i = 0; i < inst->ip_set_max; i++) { s = ip_set(inst, i); @@ -1232,15 +1256,7 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl, } inst->is_destroyed = true; read_unlock_bh(&ip_set_ref_lock); - for (i = 0; i < inst->ip_set_max; i++) { - s = ip_set(inst, i); - if (s) { - ip_set(inst, i) = NULL; - /* Must cancel garbage collectors */ - s->variant->cancel_gc(s); - ip_set_destroy_set(s); - } - } + _destroy_all_sets(inst); /* Modified by ip_set_destroy() only, which is serialized */ inst->is_destroyed = false; } else { @@ -1259,12 +1275,12 @@ static int ip_set_destroy(struct net *net, struct sock *ctnl, features = s->type->features; ip_set(inst, i) = NULL; read_unlock_bh(&ip_set_ref_lock); + /* Must cancel garbage collectors */ + s->variant->cancel_gc(s); if (features & IPSET_TYPE_NAME) { /* Must wait for flush to be really finished */ rcu_barrier(); } - /* Must cancel garbage collectors */ - s->variant->cancel_gc(s); call_rcu(&s->rcu, ip_set_destroy_set_rcu); } return 0; @@ -2399,31 +2415,26 @@ ip_set_net_init(struct net *net) return 0; } +static void __net_exit +ip_set_net_pre_exit(struct net *net) +{ + struct ip_set_net *inst = ip_set_pernet(net); + + inst->is_deleted = true; /* flag for ip_set_nfnl_put */ +} + static void __net_exit ip_set_net_exit(struct net *net) { struct ip_set_net *inst = ip_set_pernet(net); - struct ip_set *set = NULL; - ip_set_id_t i; - - inst->is_deleted = true; /* flag for ip_set_nfnl_put */ - - nfnl_lock(NFNL_SUBSYS_IPSET); - for (i = 0; i < inst->ip_set_max; i++) { - set = ip_set(inst, i); - if (set) { - ip_set(inst, i) = NULL; - set->variant->cancel_gc(set); - ip_set_destroy_set(set); - } - } - nfnl_unlock(NFNL_SUBSYS_IPSET); + _destroy_all_sets(inst); kvfree(rcu_dereference_protected(inst->ip_set_list, 1)); } static struct pernet_operations ip_set_net_ops = { .init = ip_set_net_init, + .pre_exit = ip_set_net_pre_exit, .exit = ip_set_net_exit, .id = &ip_set_net_id, .size = sizeof(struct ip_set_net), diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c index 6bc7019982b0..e839c356bcb5 100644 --- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -79,7 +79,7 @@ list_set_kadd(struct ip_set *set, const struct sk_buff *skb, struct set_elem *e; int ret; - list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -99,7 +99,7 @@ list_set_kdel(struct ip_set *set, const struct sk_buff *skb, struct set_elem *e; int ret; - list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -188,9 +188,10 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, struct list_set *map = set->data; struct set_adt_elem *d = value; struct set_elem *e, *next, *prev = NULL; - int ret; + int ret = 0; - list_for_each_entry(e, &map->members, list) { + rcu_read_lock(); + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -201,6 +202,7 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, if (d->before == 0) { ret = 1; + goto out; } else if (d->before > 0) { next = list_next_entry(e, list); ret = !list_is_last(&e->list, &map->members) && @@ -208,9 +210,11 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, } else { ret = prev && prev->id == d->refid; } - return ret; + goto out; } - return 0; +out: + rcu_read_unlock(); + return ret; } static void @@ -239,7 +243,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext, /* Find where to add the new entry */ n = prev = next = NULL; - list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -316,9 +320,9 @@ list_set_udel(struct ip_set *set, void *value, const struct ip_set_ext *ext, { struct list_set *map = set->data; struct set_adt_elem *d = value; - struct set_elem *e, *next, *prev = NULL; + struct set_elem *e, *n, *next, *prev = NULL; - list_for_each_entry(e, &map->members, list) { + list_for_each_entry_safe(e, n, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -424,14 +428,8 @@ static void list_set_destroy(struct ip_set *set) { struct list_set *map = set->data; - struct set_elem *e, *n; - list_for_each_entry_safe(e, n, &map->members, list) { - list_del(&e->list); - ip_set_put_byindex(map->net, e->id); - ip_set_ext_destroy(set, e); - kfree(e); - } + WARN_ON_ONCE(!list_empty(&map->members)); kfree(map); set->data = NULL; From 01ce5bdfdf8426a83bae16014c6545a2f3a8e5a6 Mon Sep 17 00:00:00 2001 From: Xiaolei Wang Date: Sat, 8 Jun 2024 22:35:24 +0800 Subject: [PATCH 1138/1565] net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters [ Upstream commit be27b896529787e23a35ae4befb6337ce73fcca0 ] The current cbs parameter depends on speed after uplinking, which is not needed and will report a configuration error if the port is not initially connected. The UAPI exposed by tc-cbs requires userspace to recalculate the send slope anyway, because the formula depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. Therefore, we use offload->sendslope and offload->idleslope to derive the original port_transmit_rate from the CBS formula. Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Signed-off-by: Xiaolei Wang Reviewed-by: Wojciech Drewek Reviewed-by: Vladimir Oltean Link: https://lore.kernel.org/r/20240608143524.2065736-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- .../net/ethernet/stmicro/stmmac/stmmac_tc.c | 25 ++++++++----------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c index 43165c662740..6b93a7614ad9 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c @@ -310,10 +310,11 @@ static int tc_setup_cbs(struct stmmac_priv *priv, struct tc_cbs_qopt_offload *qopt) { u32 tx_queues_count = priv->plat->tx_queues_to_use; + s64 port_transmit_rate_kbps; u32 queue = qopt->queue; - u32 ptr, speed_div; u32 mode_to_use; u64 value; + u32 ptr; int ret; /* Queue 0 is not AVB capable */ @@ -322,30 +323,26 @@ static int tc_setup_cbs(struct stmmac_priv *priv, if (!priv->dma_cap.av) return -EOPNOTSUPP; + port_transmit_rate_kbps = qopt->idleslope - qopt->sendslope; + /* Port Transmit Rate and Speed Divider */ - switch (priv->speed) { + switch (div_s64(port_transmit_rate_kbps, 1000)) { case SPEED_10000: - ptr = 32; - speed_div = 10000000; - break; case SPEED_5000: ptr = 32; - speed_div = 5000000; break; case SPEED_2500: - ptr = 8; - speed_div = 2500000; - break; case SPEED_1000: ptr = 8; - speed_div = 1000000; break; case SPEED_100: ptr = 4; - speed_div = 100000; break; default: - return -EOPNOTSUPP; + netdev_err(priv->dev, + "Invalid portTransmitRate %lld (idleSlope - sendSlope)\n", + port_transmit_rate_kbps); + return -EINVAL; } mode_to_use = priv->plat->tx_queues_cfg[queue].mode_to_use; @@ -365,10 +362,10 @@ static int tc_setup_cbs(struct stmmac_priv *priv, } /* Final adjustments for HW */ - value = div_s64(qopt->idleslope * 1024ll * ptr, speed_div); + value = div_s64(qopt->idleslope * 1024ll * ptr, port_transmit_rate_kbps); priv->plat->tx_queues_cfg[queue].idle_slope = value & GENMASK(31, 0); - value = div_s64(-qopt->sendslope * 1024ll * ptr, speed_div); + value = div_s64(-qopt->sendslope * 1024ll * ptr, port_transmit_rate_kbps); priv->plat->tx_queues_cfg[queue].send_slope = value & GENMASK(31, 0); value = qopt->hicredit * 1024ll * 8; From b278f9b458faef200dba5466b719cf14b641df74 Mon Sep 17 00:00:00 2001 From: Petr Pavlu Date: Fri, 7 Jun 2024 13:28:28 +0200 Subject: [PATCH 1139/1565] net/ipv6: Fix the RT cache flush via sysctl using a previous delay [ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ] The net.ipv6.route.flush system parameter takes a value which specifies a delay used during the flush operation for aging exception routes. The written value is however not used in the currently requested flush and instead utilized only in the next one. A problem is that ipv6_sysctl_rtcache_flush() first reads the old value of net->ipv6.sysctl.flush_delay into a local delay variable and then calls proc_dointvec() which actually updates the sysctl based on the provided input. Fix the problem by switching the order of the two operations. Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.") Signed-off-by: Petr Pavlu Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/route.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 62eaaf9532e1..00e3daaafed8 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -6181,12 +6181,12 @@ static int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write, if (!write) return -EINVAL; - net = (struct net *)ctl->extra1; - delay = net->ipv6.sysctl.flush_delay; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); if (ret) return ret; + net = (struct net *)ctl->extra1; + delay = net->ipv6.sysctl.flush_delay; fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0); return 0; } From ff9c2a9426ecf5b9631e9fd74993b357262387d6 Mon Sep 17 00:00:00 2001 From: Taehee Yoo Date: Wed, 12 Jun 2024 06:04:46 +0000 Subject: [PATCH 1140/1565] ionic: fix use after netif_napi_del() [ Upstream commit 79f18a41dd056115d685f3b0a419c7cd40055e13 ] When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del(). Reproducer: ethtool -L rx 1 tx 1 combined 0 ethtool -L rx 0 tx 0 combined 1 ethtool -L rx 0 tx 0 combined 4 Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo Reviewed-by: Brett Creeley Reviewed-by: Shannon Nelson Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index a37ca4b1e566..324ef6990e9a 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -272,10 +272,8 @@ static int ionic_qcq_enable(struct ionic_qcq *qcq) if (ret) return ret; - if (qcq->napi.poll) - napi_enable(&qcq->napi); - if (qcq->flags & IONIC_QCQ_F_INTR) { + napi_enable(&qcq->napi); irq_set_affinity_hint(qcq->intr.vector, &qcq->intr.affinity_mask); ionic_intr_mask(idev->intr_ctrl, qcq->intr.index, From e4ce76890e5e7accccd9eebcb1a56b3c3c3eb4a1 Mon Sep 17 00:00:00 2001 From: David Lechner Date: Fri, 3 May 2024 14:45:05 -0500 Subject: [PATCH 1141/1565] iio: adc: ad9467: fix scan type sign commit 8a01ef749b0a632f0e1f4ead0f08b3310d99fcb1 upstream. According to the IIO documentation, the sign in the scan type should be lower case. The ad9467 driver was incorrectly using upper case. Fix by changing to lower case. Fixes: 4606d0f4b05f ("iio: adc: ad9467: add support for AD9434 high-speed ADC") Fixes: ad6797120238 ("iio: adc: ad9467: add support AD9467 ADC") Signed-off-by: David Lechner Link: https://lore.kernel.org/r/20240503-ad9467-fix-scan-type-sign-v1-1-c7a1a066ebb9@baylibre.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/ad9467.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/iio/adc/ad9467.c b/drivers/iio/adc/ad9467.c index 6b9627bfebb0..4fd74e2847f0 100644 --- a/drivers/iio/adc/ad9467.c +++ b/drivers/iio/adc/ad9467.c @@ -223,11 +223,11 @@ static void __ad9467_get_scale(struct adi_axi_adc_conv *conv, int index, } static const struct iio_chan_spec ad9434_channels[] = { - AD9467_CHAN(0, 0, 12, 'S'), + AD9467_CHAN(0, 0, 12, 's'), }; static const struct iio_chan_spec ad9467_channels[] = { - AD9467_CHAN(0, 0, 16, 'S'), + AD9467_CHAN(0, 0, 16, 's'), }; static const struct ad9467_chip_info ad9467_chip_tbl[] = { From 9d4dce5870816ba7a85ba590d22ff60d1dcc6595 Mon Sep 17 00:00:00 2001 From: Marc Ferland Date: Wed, 1 May 2024 11:05:54 -0400 Subject: [PATCH 1142/1565] iio: dac: ad5592r: fix temperature channel scaling value commit 279428df888319bf68f2686934897301a250bb84 upstream. The scale value for the temperature channel is (assuming Vref=2.5 and the datasheet): 376.7897513 When calculating both val and val2 for the temperature scale we use (3767897513/25) and multiply it by Vref (here I assume 2500mV) to obtain: 2500 * (3767897513/25) ==> 376789751300 Finally we divide with remainder by 10^9 to get: val = 376 val2 = 789751300 However, we return IIO_VAL_INT_PLUS_MICRO (should have been NANO) as the scale type. So when converting the raw temperature value to the 'processed' temperature value we will get (assuming raw=810, offset=-753): processed = (raw + offset) * scale_val = (810 + -753) * 376 = 21432 processed += div((raw + offset) * scale_val2, 10^6) += div((810 + -753) * 789751300, 10^6) += 45015 ==> 66447 ==> 66.4 Celcius instead of the expected 21.5 Celsius. Fix this issue by changing IIO_VAL_INT_PLUS_MICRO to IIO_VAL_INT_PLUS_NANO. Fixes: 56ca9db862bf ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs") Signed-off-by: Marc Ferland Link: https://lore.kernel.org/r/20240501150554.1871390-1-marc.ferland@sonatest.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/dac/ad5592r-base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/dac/ad5592r-base.c b/drivers/iio/dac/ad5592r-base.c index 987264410278..520a3748befa 100644 --- a/drivers/iio/dac/ad5592r-base.c +++ b/drivers/iio/dac/ad5592r-base.c @@ -411,7 +411,7 @@ static int ad5592r_read_raw(struct iio_dev *iio_dev, s64 tmp = *val * (3767897513LL / 25LL); *val = div_s64_rem(tmp, 1000000000LL, val2); - return IIO_VAL_INT_PLUS_MICRO; + return IIO_VAL_INT_PLUS_NANO; } mutex_lock(&st->lock); From fd45d6f1949481b6a6f50d1d505ba8dd2579f0e6 Mon Sep 17 00:00:00 2001 From: Jean-Baptiste Maneyrol Date: Mon, 27 May 2024 21:00:08 +0000 Subject: [PATCH 1143/1565] iio: imu: inv_icm42600: delete unneeded update watermark call commit 245f3b149e6cc3ac6ee612cdb7042263bfc9e73c upstream. Update watermark will be done inside the hwfifo_set_watermark callback just after the update_scan_mode. It is useless to do it here. Fixes: 7f85e42a6c54 ("iio: imu: inv_icm42600: add buffer support in iio devices") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol Link: https://lore.kernel.org/r/20240527210008.612932-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c | 4 ---- drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c | 4 ---- 2 files changed, 8 deletions(-) diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c index 3441b0d61c5d..cef76c1c40f3 100644 --- a/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c @@ -128,10 +128,6 @@ static int inv_icm42600_accel_update_scan_mode(struct iio_dev *indio_dev, /* update data FIFO write */ inv_icm42600_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); - if (ret) - goto out_unlock; - - ret = inv_icm42600_buffer_update_watermark(st); out_unlock: mutex_unlock(&st->lock); diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c index aee7b9ff4bf4..608d07115a36 100644 --- a/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c @@ -128,10 +128,6 @@ static int inv_icm42600_gyro_update_scan_mode(struct iio_dev *indio_dev, /* update data FIFO write */ inv_icm42600_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); - if (ret) - goto out_unlock; - - ret = inv_icm42600_buffer_update_watermark(st); out_unlock: mutex_unlock(&st->lock); From 760603e30bf19d7b4c28e9d81f18b54fa3b745ad Mon Sep 17 00:00:00 2001 From: Dirk Behme Date: Mon, 13 May 2024 07:06:34 +0200 Subject: [PATCH 1144/1565] drivers: core: synchronize really_probe() and dev_uevent() commit c0a40097f0bc81deafc15f9195d1fb54595cd6d0 upstream. Synchronize the dev->driver usage in really_probe() and dev_uevent(). These can run in different threads, what can result in the following race condition for dev->driver uninitialization: Thread #1: ========== really_probe() { ... probe_failed: ... device_unbind_cleanup(dev) { ... dev->driver = NULL; // <= Failed probe sets dev->driver to NULL ... } ... } Thread #2: ========== dev_uevent() { ... if (dev->driver) // If dev->driver is NULLed from really_probe() from here on, // after above check, the system crashes add_uevent_var(env, "DRIVER=%s", dev->driver->name); ... } really_probe() holds the lock, already. So nothing needs to be done there. dev_uevent() is called with lock held, often, too. But not always. What implies that we can't add any locking in dev_uevent() itself. So fix this race by adding the lock to the non-protected path. This is the path where above race is observed: dev_uevent+0x235/0x380 uevent_show+0x10c/0x1f0 <= Add lock here dev_attr_show+0x3a/0xa0 sysfs_kf_seq_show+0x17c/0x250 kernfs_seq_show+0x7c/0x90 seq_read_iter+0x2d7/0x940 kernfs_fop_read_iter+0xc6/0x310 vfs_read+0x5bc/0x6b0 ksys_read+0xeb/0x1b0 __x64_sys_read+0x42/0x50 x64_sys_call+0x27ad/0x2d30 do_syscall_64+0xcd/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Similar cases are reported by syzkaller in https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a But these are regarding the *initialization* of dev->driver dev->driver = drv; As this switches dev->driver to non-NULL these reports can be considered to be false-positives (which should be "fixed" by this commit, as well, though). The same issue was reported and tried to be fixed back in 2015 in https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@samsung.com/ already. Fixes: 239378f16aa1 ("Driver core: add uevent vars for devices of a class") Cc: stable Cc: syzbot+ffa8143439596313a85a@syzkaller.appspotmail.com Cc: Ashish Sangwan Cc: Namjae Jeon Signed-off-by: Dirk Behme Link: https://lore.kernel.org/r/20240513050634.3964461-1-dirk.behme@de.bosch.com Signed-off-by: Greg Kroah-Hartman --- drivers/base/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/base/core.c b/drivers/base/core.c index 2c978941b488..b13a60de5a86 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2008,8 +2008,11 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr, if (!env) return -ENOMEM; + /* Synchronize with really_probe() */ + device_lock(dev); /* let the kset specific function add its keys */ retval = kset->uevent_ops->uevent(kset, &dev->kobj, env); + device_unlock(dev); if (retval) goto out; From 0acc356da8546b5c55aabfc2e2c5caa0ac9b0003 Mon Sep 17 00:00:00 2001 From: Jani Nikula Date: Thu, 30 May 2024 13:01:51 +0300 Subject: [PATCH 1145/1565] drm/exynos/vidi: fix memory leak in .get_modes() commit 38e3825631b1f314b21e3ade00b5a4d737eb054e upstream. The duplicated EDID is never freed. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula Signed-off-by: Inki Dae Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_vidi.c b/drivers/gpu/drm/exynos/exynos_drm_vidi.c index e1ffe8a28b64..d87ab8ecb023 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_vidi.c +++ b/drivers/gpu/drm/exynos/exynos_drm_vidi.c @@ -308,6 +308,7 @@ static int vidi_get_modes(struct drm_connector *connector) struct vidi_context *ctx = ctx_from_connector(connector); struct edid *edid; int edid_len; + int count; /* * the edid data comes from user side and it would be set @@ -327,7 +328,11 @@ static int vidi_get_modes(struct drm_connector *connector) drm_connector_update_edid_property(connector, edid); - return drm_add_edid_modes(connector, edid); + count = drm_add_edid_modes(connector, edid); + + kfree(edid); + + return count; } static const struct drm_connector_helper_funcs vidi_connector_helper_funcs = { From 4dfffb50316c761c59386c9b002a10ac6d7bb6c9 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Thu, 25 Apr 2024 11:48:51 +0200 Subject: [PATCH 1146/1565] drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found commit 799d4b392417ed6889030a5b2335ccb6dcf030ab upstream. When reading EDID fails and driver reports no modes available, the DRM core adds an artificial 1024x786 mode to the connector. Unfortunately some variants of the Exynos HDMI (like the one in Exynos4 SoCs) are not able to drive such mode, so report a safe 640x480 mode instead of nothing in case of the EDID reading failure. This fixes the following issue observed on Trats2 board since commit 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()"): [drm] Exynos DRM: using 11c00000.fimd device for DMA mapping operations exynos-drm exynos-drm: bound 11c00000.fimd (ops fimd_component_ops) exynos-drm exynos-drm: bound 12c10000.mixer (ops mixer_component_ops) exynos-dsi 11c80000.dsi: [drm:samsung_dsim_host_attach] Attached s6e8aa0 device (lanes:4 bpp:24 mode-flags:0x10b) exynos-drm exynos-drm: bound 11c80000.dsi (ops exynos_dsi_component_ops) exynos-drm exynos-drm: bound 12d00000.hdmi (ops hdmi_component_ops) [drm] Initialized exynos 1.1.0 20180330 for exynos-drm on minor 1 exynos-hdmi 12d00000.hdmi: [drm:hdmiphy_enable.part.0] *ERROR* PLL could not reach steady state panel-samsung-s6e8aa0 11c80000.dsi.0: ID: 0xa2, 0x20, 0x8c exynos-mixer 12c10000.mixer: timeout waiting for VSYNC ------------[ cut here ]------------ WARNING: CPU: 1 PID: 11 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 [CRTC:70:crtc-1] vblank wait timed out Modules linked in: CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.9.0-rc5-next-20240424 #14913 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_unbound deferred_probe_work_func Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x7c/0x1c4 __warn from warn_slowpath_fmt+0x11c/0x1a8 warn_slowpath_fmt from drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 drm_atomic_helper_wait_for_vblanks.part.0 from drm_atomic_helper_commit_tail_rpm+0x7c/0x8c drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x184 commit_tail from drm_atomic_helper_commit+0x168/0x190 drm_atomic_helper_commit from drm_atomic_commit+0xb4/0xe0 drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x27c drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1cc drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40 drm_client_modeset_commit from __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0xc4 __drm_fb_helper_restore_fbdev_mode_unlocked from drm_fb_helper_set_par+0x2c/0x3c drm_fb_helper_set_par from fbcon_init+0x3d8/0x550 fbcon_init from visual_init+0xc0/0x108 visual_init from do_bind_con_driver+0x1b8/0x3a4 do_bind_con_driver from do_take_over_console+0x140/0x1ec do_take_over_console from do_fbcon_takeover+0x70/0xd0 do_fbcon_takeover from fbcon_fb_registered+0x19c/0x1ac fbcon_fb_registered from register_framebuffer+0x190/0x21c register_framebuffer from __drm_fb_helper_initial_config_and_unlock+0x350/0x574 __drm_fb_helper_initial_config_and_unlock from exynos_drm_fbdev_client_hotplug+0x6c/0xb0 exynos_drm_fbdev_client_hotplug from drm_client_register+0x58/0x94 drm_client_register from exynos_drm_bind+0x160/0x190 exynos_drm_bind from try_to_bring_up_aggregate_device+0x200/0x2d8 try_to_bring_up_aggregate_device from __component_add+0xb0/0x170 __component_add from mixer_probe+0x74/0xcc mixer_probe from platform_probe+0x5c/0xb8 platform_probe from really_probe+0xe0/0x3d8 really_probe from __driver_probe_device+0x9c/0x1e4 __driver_probe_device from driver_probe_device+0x30/0xc0 driver_probe_device from __device_attach_driver+0xa8/0x120 __device_attach_driver from bus_for_each_drv+0x80/0xcc bus_for_each_drv from __device_attach+0xac/0x1fc __device_attach from bus_probe_device+0x8c/0x90 bus_probe_device from deferred_probe_work_func+0x98/0xe0 deferred_probe_work_func from process_one_work+0x240/0x6d0 process_one_work from worker_thread+0x1a0/0x3f4 worker_thread from kthread+0x104/0x138 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf0895fb0 to 0xf0895ff8) ... irq event stamp: 82357 hardirqs last enabled at (82363): [] vprintk_emit+0x308/0x33c hardirqs last disabled at (82368): [] vprintk_emit+0x2bc/0x33c softirqs last enabled at (81614): [] __do_softirq+0x320/0x500 softirqs last disabled at (81609): [] __irq_exit_rcu+0x130/0x184 ---[ end trace 0000000000000000 ]--- exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CRTC:70:crtc-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CONNECTOR:74:HDMI-A-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [PLANE:56:plane-5] commit wait timed out exynos-mixer 12c10000.mixer: timeout waiting for VSYNC Cc: stable@vger.kernel.org Fixes: 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()") Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos_hdmi.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_hdmi.c b/drivers/gpu/drm/exynos/exynos_hdmi.c index 576fcf180716..d8579c7c15fd 100644 --- a/drivers/gpu/drm/exynos/exynos_hdmi.c +++ b/drivers/gpu/drm/exynos/exynos_hdmi.c @@ -878,11 +878,11 @@ static int hdmi_get_modes(struct drm_connector *connector) int ret; if (!hdata->ddc_adpt) - return 0; + goto no_edid; edid = drm_get_edid(connector, hdata->ddc_adpt); if (!edid) - return 0; + goto no_edid; hdata->dvi_mode = !drm_detect_hdmi_monitor(edid); DRM_DEV_DEBUG_KMS(hdata->dev, "%s : width[%d] x height[%d]\n", @@ -897,6 +897,9 @@ static int hdmi_get_modes(struct drm_connector *connector) kfree(edid); return ret; + +no_edid: + return drm_add_modes_noedid(connector, 640, 480); } static int hdmi_find_phy_conf(struct hdmi_context *hdata, u32 pixel_clock) From f70ff737346744633e7b655c1fb23e1578491ff3 Mon Sep 17 00:00:00 2001 From: Hagar Gamal Halim Hemdan Date: Tue, 30 Apr 2024 08:59:16 +0000 Subject: [PATCH 1147/1565] vmci: prevent speculation leaks by sanitizing event in event_deliver() commit 8003f00d895310d409b2bf9ef907c56b42a4e0f4 upstream. Coverity spotted that event_msg is controlled by user-space, event_msg->event_data.event is passed to event_deliver() and used as an index without sanitization. This change ensures that the event index is sanitized to mitigate any possibility of speculative information leaks. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Only compile tested, no access to HW. Fixes: 1d990201f9bb ("VMCI: event handling implementation.") Cc: stable Signed-off-by: Hagar Gamal Halim Hemdan Link: https://lore.kernel.org/stable/20231127193533.46174-1-hagarhem%40amazon.com Link: https://lore.kernel.org/r/20240430085916.4753-1-hagarhem@amazon.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- drivers/misc/vmw_vmci/vmci_event.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/misc/vmw_vmci/vmci_event.c b/drivers/misc/vmw_vmci/vmci_event.c index e3436abf39f4..5b7ef47f4c11 100644 --- a/drivers/misc/vmw_vmci/vmci_event.c +++ b/drivers/misc/vmw_vmci/vmci_event.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -86,9 +87,12 @@ static void event_deliver(struct vmci_event_msg *event_msg) { struct vmci_subscription *cur; struct list_head *subscriber_list; + u32 sanitized_event, max_vmci_event; rcu_read_lock(); - subscriber_list = &subscriber_array[event_msg->event_data.event]; + max_vmci_event = ARRAY_SIZE(subscriber_array); + sanitized_event = array_index_nospec(event_msg->event_data.event, max_vmci_event); + subscriber_list = &subscriber_array[sanitized_event]; list_for_each_entry_rcu(cur, subscriber_list, node) { cur->callback(cur->id, &event_msg->event_data, cur->callback_data); From 70c1835e776c8447c1aca87ddb38cfe764fe756a Mon Sep 17 00:00:00 2001 From: Rik van Riel Date: Tue, 7 May 2024 09:18:58 -0400 Subject: [PATCH 1148/1565] fs/proc: fix softlockup in __read_vmcore commit 5cbcb62dddf5346077feb82b7b0c9254222d3445 upstream. While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear. While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible. Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched. Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel Acked-by: Baoquan He Cc: Dave Young Cc: Vivek Goyal Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/proc/vmcore.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index 0e4278d4a769..0833676da5f4 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -373,6 +373,8 @@ static ssize_t __read_vmcore(char *buffer, size_t buflen, loff_t *fpos, /* leave now if filled buffer already */ if (buflen == 0) return acc; + + cond_resched(); } list_for_each_entry(m, &vmcore_list, list) { From 11a075a1c8c75f7bb7a05139d64decab2420dca1 Mon Sep 17 00:00:00 2001 From: Su Yue Date: Mon, 8 Apr 2024 16:20:41 +0800 Subject: [PATCH 1149/1565] ocfs2: use coarse time for new created files commit b8cb324277ee16f3eca3055b96fce4735a5a41c6 upstream. The default atime related mount option is '-o realtime' which means file atime should be updated if atime <= ctime or atime <= mtime. atime should be updated in the following scenario, but it is not: ========================================================== $ rm /mnt/testfile; $ echo test > /mnt/testfile $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 $ sleep 5 $ cat /mnt/testfile > /dev/null $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 ========================================================== And the reason the atime in the test is not updated is that ocfs2 calls ktime_get_real_ts64() in __ocfs2_mknod_locked during file creation. Then inode_set_ctime_current() is called in inode_set_ctime_current() calls ktime_get_coarse_real_ts64() to get current time. ktime_get_real_ts64() is more accurate than ktime_get_coarse_real_ts64(). In my test box, I saw ctime set by ktime_get_coarse_real_ts64() is less than ktime_get_real_ts64() even ctime is set later. The ctime of the new inode is smaller than atime. The call trace is like: ocfs2_create ocfs2_mknod __ocfs2_mknod_locked .... ktime_get_real_ts64 <------- set atime,ctime,mtime, more accurate ocfs2_populate_inode ... ocfs2_init_acl ocfs2_acl_set_mode inode_set_ctime_current current_time ktime_get_coarse_real_ts64 <-------less accurate ocfs2_file_read_iter ocfs2_inode_lock_atime ocfs2_should_update_atime atime <= ctime ? <-------- false, ctime < atime due to accuracy So here call ktime_get_coarse_real_ts64 to set inode time coarser while creating new files. It may lower the accuracy of file times. But it's not a big deal since we already use coarse time in other places like ocfs2_update_inode_atime and inode_set_ctime_current. Link: https://lkml.kernel.org/r/20240408082041.20925-5-glass.su@suse.com Fixes: c62c38f6b91b ("ocfs2: replace CURRENT_TIME macro") Signed-off-by: Su Yue Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 5c98813b3dca..7bdda635ca80 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -566,7 +566,7 @@ static int __ocfs2_mknod_locked(struct inode *dir, fe->i_last_eb_blk = 0; strcpy(fe->i_signature, OCFS2_INODE_SIGNATURE); fe->i_flags |= cpu_to_le32(OCFS2_VALID_FL); - ktime_get_real_ts64(&ts); + ktime_get_coarse_real_ts64(&ts); fe->i_atime = fe->i_ctime = fe->i_mtime = cpu_to_le64(ts.tv_sec); fe->i_mtime_nsec = fe->i_ctime_nsec = fe->i_atime_nsec = From 050ce8af6838c71e872e982b50d3f1bec21da40e Mon Sep 17 00:00:00 2001 From: Su Yue Date: Mon, 8 Apr 2024 16:20:39 +0800 Subject: [PATCH 1150/1565] ocfs2: fix races between hole punching and AIO+DIO commit 952b023f06a24b2ad6ba67304c4c84d45bea2f18 upstream. After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block", fstests/generic/300 become from always failed to sometimes failed: ======================================================================== [ 473.293420 ] run fstests generic/300 [ 475.296983 ] JBD2: Ignoring recovery information on journal [ 475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0) with ordered data mode. [ 494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag: Owner 5668 has an extent at cpos 78723 which can no longer be found [ 494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. [ 494.292018 ] OCFS2: File system is now read-only. [ 494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272 ERROR: status = -30 [ 494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374 ERROR: status = -3 fio: io_u error on file /mnt/scratch/racer: Read-only file system: write offset=460849152, buflen=131072 ========================================================================= In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add unwritten extents to a list. extents are also inserted into extent tree in ocfs2_write_begin_nolock. Then another thread call fallocate to puch a hole at one of the unwritten extent. The extent at cpos was removed by ocfs2_remove_extent(). At end io worker thread, ocfs2_search_extent_list found there is no such extent at the cpos. T1 T2 T3 inode lock ... insert extents ... inode unlock ocfs2_fallocate __ocfs2_change_file_space inode lock lock ip_alloc_sem ocfs2_remove_inode_range inode ocfs2_remove_btree_range ocfs2_remove_extent ^---remove the extent at cpos 78723 ... unlock ip_alloc_sem inode unlock ocfs2_dio_end_io ocfs2_dio_end_io_write lock ip_alloc_sem ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_search_extent_list ^---failed to find extent ... unlock ip_alloc_sem In most filesystems, fallocate is not compatible with racing with AIO+DIO, so fix it by adding to wait for all dio before fallocate/punch_hole like ext4. Link: https://lkml.kernel.org/r/20240408082041.20925-3-glass.su@suse.com Fixes: b25801038da5 ("ocfs2: Support xfs style space reservation ioctls") Signed-off-by: Su Yue Reviewed-by: Joseph Qi Cc: Changwei Ge Cc: Gang He Cc: Joel Becker Cc: Jun Piao Cc: Junxiao Bi Cc: Mark Fasheh Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index df36d84aedc4..5fd565a6228f 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1940,6 +1940,8 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, inode_lock(inode); + /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); /* * This prevents concurrent writes on other nodes */ From 42ed6bfc2ddb2b7a956c1d70e907a4353f32291b Mon Sep 17 00:00:00 2001 From: Rick Wertenbroek Date: Wed, 3 Apr 2024 16:45:08 +0200 Subject: [PATCH 1151/1565] PCI: rockchip-ep: Remove wrong mask on subsys_vendor_id MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2dba285caba53f309d6060fca911b43d63f41697 upstream. Remove wrong mask on subsys_vendor_id. Both the Vendor ID and Subsystem Vendor ID are u16 variables and are written to a u32 register of the controller. The Subsystem Vendor ID was always 0 because the u16 value was masked incorrectly with GENMASK(31,16) resulting in all lower 16 bits being set to 0 prior to the shift. Remove both masks as they are unnecessary and set the register correctly i.e., the lower 16-bits are the Vendor ID and the upper 16-bits are the Subsystem Vendor ID. This is documented in the RK3399 TRM section 17.6.7.1.17 [kwilczynski: removed unnecesary newline] Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/linux-pci/20240403144508.489835-1-rick.wertenbroek@gmail.com Signed-off-by: Rick Wertenbroek Signed-off-by: Krzysztof Wilczyński Signed-off-by: Bjorn Helgaas Reviewed-by: Damien Le Moal Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/pci/controller/pcie-rockchip-ep.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c index dba8bdc3fc94..d1b72b704c31 100644 --- a/drivers/pci/controller/pcie-rockchip-ep.c +++ b/drivers/pci/controller/pcie-rockchip-ep.c @@ -131,10 +131,8 @@ static int rockchip_pcie_ep_write_header(struct pci_epc *epc, u8 fn, /* All functions share the same vendor ID with function 0 */ if (fn == 0) { - u32 vid_regs = (hdr->vendorid & GENMASK(15, 0)) | - (hdr->subsys_vendor_id & GENMASK(31, 16)) << 16; - - rockchip_pcie_write(rockchip, vid_regs, + rockchip_pcie_write(rockchip, + hdr->vendorid | hdr->subsys_vendor_id << 16, PCIE_CORE_CONFIG_VENDOR); } From fe5b53c60217a585933c050da6c9db6ed27fdedd Mon Sep 17 00:00:00 2001 From: Nuno Sa Date: Thu, 28 Mar 2024 14:58:50 +0100 Subject: [PATCH 1152/1565] dmaengine: axi-dmac: fix possible race in remove() commit 1bc31444209c8efae98cb78818131950d9a6f4d6 upstream. We need to first free the IRQ before calling of_dma_controller_free(). Otherwise we could get an interrupt and schedule a tasklet while removing the DMA controller. Fixes: 0e3b67b348b8 ("dmaengine: Add support for the Analog Devices AXI-DMAC DMA controller") Cc: stable@kernel.org Signed-off-by: Nuno Sa Link: https://lore.kernel.org/r/20240328-axi-dmac-devm-probe-v3-1-523c0176df70@analog.com Signed-off-by: Vinod Koul Signed-off-by: Greg Kroah-Hartman --- drivers/dma/dma-axi-dmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c index 5161b73c30c4..e91aeec71c81 100644 --- a/drivers/dma/dma-axi-dmac.c +++ b/drivers/dma/dma-axi-dmac.c @@ -1020,8 +1020,8 @@ static int axi_dmac_remove(struct platform_device *pdev) { struct axi_dmac *dmac = platform_get_drvdata(pdev); - of_dma_controller_free(pdev->dev.of_node); free_irq(dmac->irq, dmac); + of_dma_controller_free(pdev->dev.of_node); tasklet_kill(&dmac->chan.vchan.task); dma_async_device_unregister(&dmac->dma_dev); clk_disable_unprepare(dmac->clk); From 02d3b5e48d247d3daad381e761eb4ce7add7d561 Mon Sep 17 00:00:00 2001 From: Beleswar Padhi Date: Tue, 30 Apr 2024 16:23:07 +0530 Subject: [PATCH 1153/1565] remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs commit 3c8a9066d584f5010b6f4ba03bf6b19d28973d52 upstream. PSC controller has a limitation that it can only power-up the second core when the first core is in ON state. Power-state for core0 should be equal to or higher than core1. Therefore, prevent core1 from powering up before core0 during the start process from sysfs. Similarly, prevent core0 from shutting down before core1 has been shut down from sysfs. Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem") Signed-off-by: Beleswar Padhi Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240430105307.1190615-3-b-padhi@ti.com Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c index f92a18c06d80..b2e6911311a4 100644 --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -429,7 +429,7 @@ static int k3_r5_rproc_start(struct rproc *rproc) struct k3_r5_cluster *cluster = kproc->cluster; struct mbox_client *client = &kproc->client; struct device *dev = kproc->dev; - struct k3_r5_core *core; + struct k3_r5_core *core0, *core; u32 boot_addr; int ret; @@ -478,6 +478,15 @@ static int k3_r5_rproc_start(struct rproc *rproc) goto unroll_core_run; } } else { + /* do not allow core 1 to start before core 0 */ + core0 = list_first_entry(&cluster->cores, struct k3_r5_core, + elem); + if (core != core0 && core0->rproc->state == RPROC_OFFLINE) { + dev_err(dev, "%s: can not start core 1 before core 0\n", + __func__); + return -EPERM; + } + ret = k3_r5_core_run(core); if (ret) goto put_mbox; @@ -518,7 +527,8 @@ static int k3_r5_rproc_stop(struct rproc *rproc) { struct k3_r5_rproc *kproc = rproc->priv; struct k3_r5_cluster *cluster = kproc->cluster; - struct k3_r5_core *core = kproc->core; + struct device *dev = kproc->dev; + struct k3_r5_core *core1, *core = kproc->core; int ret; /* halt all applicable cores */ @@ -531,6 +541,15 @@ static int k3_r5_rproc_stop(struct rproc *rproc) } } } else { + /* do not allow core 0 to stop before core 1 */ + core1 = list_last_entry(&cluster->cores, struct k3_r5_core, + elem); + if (core != core1 && core1->rproc->state != RPROC_OFFLINE) { + dev_err(dev, "%s: can not stop core 0 before core 1\n", + __func__); + return -EPERM; + } + ret = k3_r5_core_halt(core); if (ret) goto out; From 78a41b1614c3168778ffe9c87c4b415a9ca5cb38 Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:14 +0300 Subject: [PATCH 1154/1565] intel_th: pci: Add Granite Rapids support commit e44937889bdf4ecd1f0c25762b7226406b9b7a69 upstream. Add support for the Trace Hub in Granite Rapids. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-11-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index 7f13687267fd..b2ee6202867d 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -304,6 +304,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa76f), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Granite Rapids */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0963), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), From dbfe50b50eb9993f50826092ce308a80def98ac2 Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:15 +0300 Subject: [PATCH 1155/1565] intel_th: pci: Add Granite Rapids SOC support commit 854afe461b009801a171b3a49c5f75ea43e4c04c upstream. Add support for the Trace Hub in Granite Rapids SOC. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-12-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index b2ee6202867d..b674af7aa908 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -309,6 +309,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0963), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Granite Rapids SOC */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3256), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), From c02003a97a888d1452fdb3824a8c4eeed5b18907 Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:16 +0300 Subject: [PATCH 1156/1565] intel_th: pci: Add Sapphire Rapids SOC support commit 2e1da7efabe05cb0cf0b358883b2bc89080ed0eb upstream. Add support for the Trace Hub in Sapphire Rapids SOC. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-13-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index b674af7aa908..8560a6b1c3d0 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -314,6 +314,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3256), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Sapphire Rapids SOC */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3456), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), From 31f3136fd6fcd83b2a7ab35ceefb6083b31e71af Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:17 +0300 Subject: [PATCH 1157/1565] intel_th: pci: Add Meteor Lake-S support commit c4a30def564d75e84718b059d1a62cc79b137cf9 upstream. Add support for the Trace Hub in Meteor Lake-S. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-14-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index 8560a6b1c3d0..9ad0dd9636e1 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -294,6 +294,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xae24), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Meteor Lake-S */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7f26), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Raptor Lake-S */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7a26), From f699f9f8b2ea7c67f85812d8812e71d45a6345b9 Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 29 Apr 2024 16:01:19 +0300 Subject: [PATCH 1158/1565] intel_th: pci: Add Lunar Lake support commit f866b65322bfbc8fcca13c25f49e1a5c5a93ae4d upstream. Add support for the Trace Hub in Lunar Lake. Signed-off-by: Alexander Shishkin Reviewed-by: Andy Shevchenko Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-16-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/hwtracing/intel_th/pci.c b/drivers/hwtracing/intel_th/pci.c index 9ad0dd9636e1..eba0dd541fa2 100644 --- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -324,6 +324,11 @@ static const struct pci_device_id intel_th_pci_id_table[] = { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3456), .driver_data = (kernel_ulong_t)&intel_th_2x, }, + { + /* Lunar Lake */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa824), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), From 0ecfe3a92869a59668d27228dabbd7965e83567f Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Thu, 30 May 2024 23:15:56 +0900 Subject: [PATCH 1159/1565] nilfs2: fix potential kernel bug due to lack of writeback flag waiting commit a4ca369ca221bb7e06c725792ac107f0e48e82e7 upstream. Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/segment.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 2eeae67ad298..02407c524382 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1697,6 +1697,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) if (bh->b_page != bd_page) { if (bd_page) { lock_page(bd_page); + wait_on_page_writeback(bd_page); clear_page_dirty_for_io(bd_page); set_page_writeback(bd_page); unlock_page(bd_page); @@ -1710,6 +1711,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) if (bh == segbuf->sb_super_root) { if (bh->b_page != bd_page) { lock_page(bd_page); + wait_on_page_writeback(bd_page); clear_page_dirty_for_io(bd_page); set_page_writeback(bd_page); unlock_page(bd_page); @@ -1726,6 +1728,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) } if (bd_page) { lock_page(bd_page); + wait_on_page_writeback(bd_page); clear_page_dirty_for_io(bd_page); set_page_writeback(bd_page); unlock_page(bd_page); From 33eae51f656976895552bfdcb611131591e5abed Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Tue, 28 May 2024 14:20:19 +0200 Subject: [PATCH 1160/1565] tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() commit 07c54cc5988f19c9642fd463c2dbdac7fc52f777 upstream. After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash when the boot CPU is nohz_full") the kernel no longer crashes, but there is another problem. In this case tick_setup_device() calls tick_take_do_timer_from_boot() to update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled) in smp_call_function_single(). Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new comment explains why this is safe (thanks Thomas!). Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com Signed-off-by: Greg Kroah-Hartman --- kernel/time/tick-common.c | 42 +++++++++++++-------------------------- 1 file changed, 14 insertions(+), 28 deletions(-) diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index e883d12dcb0d..b13b3e3f6c9f 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -177,26 +177,6 @@ void tick_setup_periodic(struct clock_event_device *dev, int broadcast) } } -#ifdef CONFIG_NO_HZ_FULL -static void giveup_do_timer(void *info) -{ - int cpu = *(unsigned int *)info; - - WARN_ON(tick_do_timer_cpu != smp_processor_id()); - - tick_do_timer_cpu = cpu; -} - -static void tick_take_do_timer_from_boot(void) -{ - int cpu = smp_processor_id(); - int from = tick_do_timer_boot_cpu; - - if (from >= 0 && from != cpu) - smp_call_function_single(from, giveup_do_timer, &cpu, 1); -} -#endif - /* * Setup the tick device */ @@ -220,19 +200,25 @@ static void tick_setup_device(struct tick_device *td, tick_next_period = ktime_get(); #ifdef CONFIG_NO_HZ_FULL /* - * The boot CPU may be nohz_full, in which case set - * tick_do_timer_boot_cpu so the first housekeeping - * secondary that comes up will take do_timer from - * us. + * The boot CPU may be nohz_full, in which case the + * first housekeeping secondary will take do_timer() + * from it. */ if (tick_nohz_full_cpu(cpu)) tick_do_timer_boot_cpu = cpu; - } else if (tick_do_timer_boot_cpu != -1 && - !tick_nohz_full_cpu(cpu)) { - tick_take_do_timer_from_boot(); + } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) { tick_do_timer_boot_cpu = -1; - WARN_ON(tick_do_timer_cpu != cpu); + /* + * The boot CPU will stay in periodic (NOHZ disabled) + * mode until clocksource_done_booting() called after + * smp_init() selects a high resolution clocksource and + * timekeeping_notify() kicks the NOHZ stuff alive. + * + * So this WRITE_ONCE can only race with the READ_ONCE + * check in tick_periodic() but this race is harmless. + */ + WRITE_ONCE(tick_do_timer_cpu, cpu); #endif } From 0047568dbd9c4ef0c3d5e483d70e55e8f743e680 Mon Sep 17 00:00:00 2001 From: Doug Brown Date: Sun, 19 May 2024 12:19:30 -0700 Subject: [PATCH 1161/1565] serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level commit 5208e7ced520a813b4f4774451fbac4e517e78b2 upstream. The FIFO is 64 bytes, but the FCR is configured to fire the TX interrupt when the FIFO is half empty (bit 3 = 0). Thus, we should only write 32 bytes when a TX interrupt occurs. This fixes a problem observed on the PXA168 that dropped a bunch of TX bytes during large transmissions. Fixes: ab28f51c77cd ("serial: rewrite pxa2xx-uart to use 8250_core") Signed-off-by: Doug Brown Link: https://lore.kernel.org/r/20240519191929.122202-1-doug@schmorgal.com Cc: stable Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_pxa.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/tty/serial/8250/8250_pxa.c b/drivers/tty/serial/8250/8250_pxa.c index 33ca98bfa5b3..0780f5d7be62 100644 --- a/drivers/tty/serial/8250/8250_pxa.c +++ b/drivers/tty/serial/8250/8250_pxa.c @@ -125,6 +125,7 @@ static int serial_pxa_probe(struct platform_device *pdev) uart.port.regshift = 2; uart.port.irq = irq; uart.port.fifosize = 64; + uart.tx_loadsz = 32; uart.port.flags = UPF_IOREMAP | UPF_SKIP_TEST | UPF_FIXED_TYPE; uart.port.dev = &pdev->dev; uart.port.uartclk = clk_get_rate(data->clk); From 73014c77ec2aabb68b0660d0bf3c461fd8addc88 Mon Sep 17 00:00:00 2001 From: Matthias Goergens Date: Mon, 5 Sep 2022 11:19:04 +0800 Subject: [PATCH 1162/1565] hugetlb_encode.h: fix undefined behaviour (34 << 26) commit 710bb68c2e3a24512e2d2bae470960d7488e97b1 upstream. Left-shifting past the size of your datatype is undefined behaviour in C. The literal 34 gets the type `int`, and that one is not big enough to be left shifted by 26 bits. An `unsigned` is long enough (on any machine that has at least 32 bits for their ints.) For uniformity, we mark all the literals as unsigned. But it's only really needed for HUGETLB_FLAG_ENCODE_16GB. Thanks to Randy Dunlap for an initial review and suggestion. Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com Signed-off-by: Matthias Goergens Acked-by: Randy Dunlap Cc: Mike Kravetz Cc: Muchun Song Signed-off-by: Andrew Morton [cmllamas: fix trivial conflict due to missing page encondigs] Signed-off-by: Carlos Llamas Signed-off-by: Greg Kroah-Hartman --- include/uapi/asm-generic/hugetlb_encode.h | 26 +++++++++++----------- tools/include/asm-generic/hugetlb_encode.h | 20 ++++++++--------- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/include/uapi/asm-generic/hugetlb_encode.h b/include/uapi/asm-generic/hugetlb_encode.h index 4f3d5aaa11f5..de687009bfe5 100644 --- a/include/uapi/asm-generic/hugetlb_encode.h +++ b/include/uapi/asm-generic/hugetlb_encode.h @@ -20,18 +20,18 @@ #define HUGETLB_FLAG_ENCODE_SHIFT 26 #define HUGETLB_FLAG_ENCODE_MASK 0x3f -#define HUGETLB_FLAG_ENCODE_16KB (14 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_64KB (16 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_512KB (19 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_1MB (20 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_2MB (21 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_8MB (23 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_16MB (24 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_32MB (25 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_256MB (28 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_512MB (29 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_1GB (30 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_2GB (31 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_16GB (34 << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_16KB (14U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_64KB (16U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_512KB (19U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_1MB (20U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_2MB (21U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_8MB (23U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_16MB (24U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_32MB (25U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_256MB (28U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_512MB (29U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_1GB (30U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_2GB (31U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_16GB (34U << HUGETLB_FLAG_ENCODE_SHIFT) #endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */ diff --git a/tools/include/asm-generic/hugetlb_encode.h b/tools/include/asm-generic/hugetlb_encode.h index e4732d3c2998..9d279fa4c36f 100644 --- a/tools/include/asm-generic/hugetlb_encode.h +++ b/tools/include/asm-generic/hugetlb_encode.h @@ -20,15 +20,15 @@ #define HUGETLB_FLAG_ENCODE_SHIFT 26 #define HUGETLB_FLAG_ENCODE_MASK 0x3f -#define HUGETLB_FLAG_ENCODE_64KB (16 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_512KB (19 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_1MB (20 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_2MB (21 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_8MB (23 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_16MB (24 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_256MB (28 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_1GB (30 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_2GB (31 << HUGETLB_FLAG_ENCODE_SHIFT) -#define HUGETLB_FLAG_ENCODE_16GB (34 << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_64KB (16U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_512KB (19U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_1MB (20U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_2MB (21U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_8MB (23U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_16MB (24U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_256MB (28U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_1GB (30U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_2GB (31U << HUGETLB_FLAG_ENCODE_SHIFT) +#define HUGETLB_FLAG_ENCODE_16GB (34U << HUGETLB_FLAG_ENCODE_SHIFT) #endif /* _ASM_GENERIC_HUGETLB_ENCODE_H_ */ From 208cd22ef5e57f82d38ec11c1a1703f9401d6dde Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 18 Jun 2024 14:24:45 +0200 Subject: [PATCH 1163/1565] mptcp: ensure snd_una is properly initialized on connect commit 8031b58c3a9b1db3ef68b3bd749fbee2e1e1aaa3 upstream. This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt is properly initialized on connect"). It turns out that syzkaller can trigger the retransmit after fallback and before processing any other incoming packet - so that snd_una is still left uninitialized. Address the issue explicitly initializing snd_una together with snd_nxt and write_seq. Suggested-by: Mat Martineau Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski [ Conflicts in protocol.c, similar to the ones from commit 99951b62bf20 ("mptcp: ensure snd_nxt is properly initialized on connect"), with the same resolution. Note that in this version, 'snd_una' is an atomic64 type, so use atomic64_set() instead, as it is done everywhere else. ] Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: Greg Kroah-Hartman --- net/mptcp/protocol.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 36fa456f42ba..a36493bbf895 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2646,6 +2646,7 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, mptcp_subflow_early_fallback(msk, subflow); WRITE_ONCE(msk->write_seq, subflow->idsn); + atomic64_set(&msk->snd_una, msk->write_seq); do_connect: err = ssock->ops->connect(ssock, uaddr, addr_len, flags); From 5a4efafcf843ed56686406b79765358b75401db2 Mon Sep 17 00:00:00 2001 From: YonglongLi Date: Tue, 18 Jun 2024 14:25:12 +0200 Subject: [PATCH 1164/1565] mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID commit 6a09788c1a66e3d8b04b3b3e7618cc817bb60ae9 upstream. The RmAddr MIB counter is supposed to be incremented once when a valid RM_ADDR has been received. Before this patch, it could have been incremented as many times as the number of subflows connected to the linked address ID, so it could have been 0, 1 or more than 1. The "RmSubflow" is incremented after a local operation. In this case, it is normal to tied it with the number of subflows that have been actually removed. The "remove invalid addresses" MP Join subtest has been modified to validate this case. A broadcast IP address is now used instead: the client will not be able to create a subflow to this address. The consequence is that when receiving the RM_ADDR with the ID attached to this broadcast IP address, no subflow linked to this ID will be found. Fixes: 7a7e52e38a40 ("mptcp: add RM_ADDR related mibs") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: YonglongLi Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski [ Conflicts in pm_netlink.c because the commit 9f12e97bf16c ("mptcp: unify RM_ADDR and RM_SUBFLOW receiving"), and commit d0b698ca9a27 ("mptcp: remove multi addresses in PM") are not in this version. To fix the issue, the incrementation should be done outside the loop: the same resolution has been applied here. The selftest modification has been dropped, because the modified test is not in this version. That's fine, we can test with selftests from a newer version. ] Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: Greg Kroah-Hartman --- net/mptcp/pm_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 0d6f3d912891..7472473605e3 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -427,10 +427,10 @@ void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk) msk->pm.subflows--; WRITE_ONCE(msk->pm.accept_addr, true); - __MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RMADDR); - break; } + + __MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RMADDR); } void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, u8 rm_id) From 990d0710108df3cea4321cbb32fe0aabc5cc320f Mon Sep 17 00:00:00 2001 From: YonglongLi Date: Tue, 18 Jun 2024 14:25:32 +0200 Subject: [PATCH 1165/1565] mptcp: pm: update add_addr counters after connect commit 40eec1795cc27b076d49236649a29507c7ed8c2d upstream. The creation of new subflows can fail for different reasons. If no subflow have been created using the received ADD_ADDR, the related counters should not be updated, otherwise they will never be decremented for events related to this ID later on. For the moment, the number of accepted ADD_ADDR is only decremented upon the reception of a related RM_ADDR, and only if the remote address ID is currently being used by at least one subflow. In other words, if no subflow can be created with the received address, the counter will not be decremented. In this case, it is then important not to increment pm.add_addr_accepted counter, and not to modify pm.accept_addr bit. Note that this patch does not modify the behaviour in case of failures later on, e.g. if the MP Join is dropped or rejected. The "remove invalid addresses" MP Join subtest has been modified to validate this case. The broadcast IP address is added before the "valid" address that will be used to successfully create a subflow, and the limit is decreased by one: without this patch, it was not possible to create the last subflow, because: - the broadcast address would have been accepted even if it was not usable: the creation of a subflow to this address results in an error, - the limit of 2 accepted ADD_ADDR would have then been reached. Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: YonglongLi Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org Signed-off-by: Jakub Kicinski [ Conflicts in pm_netlink.c because commit 1a0d6136c5f0 ("mptcp: local addresses fullmesh") is not present in this version (+ others changing the context). To resolve the conflicts, the same block is moved later, and under the condition that the call to __mptcp_subflow_connect() didn't fail. The selftest modification has been dropped, because the modified test is not in this version. That's fine, we can test with selftests from a newer version. ] Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: Greg Kroah-Hartman --- net/mptcp/pm_netlink.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 7472473605e3..452c7e21befd 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -371,15 +371,12 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) struct sock *sk = (struct sock *)msk; struct mptcp_addr_info remote; struct mptcp_addr_info local; + int err; pr_debug("accepted %d:%d remote family %d", msk->pm.add_addr_accepted, msk->pm.add_addr_accept_max, msk->pm.remote.family); - msk->pm.add_addr_accepted++; msk->pm.subflows++; - if (msk->pm.add_addr_accepted >= msk->pm.add_addr_accept_max || - msk->pm.subflows >= msk->pm.subflows_max) - WRITE_ONCE(msk->pm.accept_addr, false); /* connect to the specified remote address, using whatever * local address the routing configuration will pick. @@ -391,9 +388,16 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) local.family = remote.family; spin_unlock_bh(&msk->pm.lock); - __mptcp_subflow_connect((struct sock *)msk, &local, &remote); + err = __mptcp_subflow_connect((struct sock *)msk, &local, &remote); spin_lock_bh(&msk->pm.lock); + if (!err) { + msk->pm.add_addr_accepted++; + if (msk->pm.add_addr_accepted >= msk->pm.add_addr_accept_max || + msk->pm.subflows >= msk->pm.subflows_max) + WRITE_ONCE(msk->pm.accept_addr, false); + } + mptcp_pm_announce_addr(msk, &remote, true); } From d6c26a59e633b5cc81bad7db231ba54a0a1cca1d Mon Sep 17 00:00:00 2001 From: Beleswar Padhi Date: Mon, 6 May 2024 19:48:49 +0530 Subject: [PATCH 1166/1565] remoteproc: k3-r5: Jump to error handling labels in start/stop errors commit 1dc7242f6ee0c99852cb90676d7fe201cf5de422 upstream. In case of errors during core start operation from sysfs, the driver directly returns with the -EPERM error code. Fix this to ensure that mailbox channels are freed on error before returning by jumping to the 'put_mbox' error handling label. Similarly, jump to the 'out' error handling label to return with required -EPERM error code during the core stop operation from sysfs. Fixes: 3c8a9066d584 ("remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs") Signed-off-by: Beleswar Padhi Link: https://lore.kernel.org/r/20240506141849.1735679-1-b-padhi@ti.com Signed-off-by: Mathieu Poirier Signed-off-by: Greg Kroah-Hartman --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c index b2e6911311a4..48b45fd94437 100644 --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -484,7 +484,8 @@ static int k3_r5_rproc_start(struct rproc *rproc) if (core != core0 && core0->rproc->state == RPROC_OFFLINE) { dev_err(dev, "%s: can not start core 1 before core 0\n", __func__); - return -EPERM; + ret = -EPERM; + goto put_mbox; } ret = k3_r5_core_run(core); @@ -547,7 +548,8 @@ static int k3_r5_rproc_stop(struct rproc *rproc) if (core != core1 && core1->rproc->state != RPROC_OFFLINE) { dev_err(dev, "%s: can not stop core 0 before core 1\n", __func__); - return -EPERM; + ret = -EPERM; + goto out; } ret = k3_r5_core_halt(core); From 2b6bb0b4abfd79b8698ee161bb73c0936a2aaf83 Mon Sep 17 00:00:00 2001 From: Sicong Huang Date: Tue, 16 Apr 2024 16:03:13 +0800 Subject: [PATCH 1167/1565] greybus: Fix use-after-free bug in gb_interface_release due to race condition. commit 5c9c5d7f26acc2c669c1dcf57d1bb43ee99220ce upstream. In gb_interface_create, &intf->mode_switch_completion is bound with gb_interface_mode_switch_work. Then it will be started by gb_interface_request_mode_switch. Here is the relevant code. if (!queue_work(system_long_wq, &intf->mode_switch_work)) { ... } If we call gb_interface_release to make cleanup, there may be an unfinished work. This function will call kfree to free the object "intf". However, if gb_interface_mode_switch_work is scheduled to run after kfree, it may cause use-after-free error as gb_interface_mode_switch_work will use the object "intf". The possible execution flow that may lead to the issue is as follows: CPU0 CPU1 | gb_interface_create | gb_interface_request_mode_switch gb_interface_release | kfree(intf) (free) | | gb_interface_mode_switch_work | mutex_lock(&intf->mutex) (use) Fix it by canceling the work before kfree. Signed-off-by: Sicong Huang Link: https://lore.kernel.org/r/20240416080313.92306-1-congei42@163.com Cc: Ronnie Sahlberg Signed-off-by: Greg Kroah-Hartman --- drivers/greybus/interface.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/greybus/interface.c b/drivers/greybus/interface.c index 9ec949a438ef..52ef6be9d449 100644 --- a/drivers/greybus/interface.c +++ b/drivers/greybus/interface.c @@ -694,6 +694,7 @@ static void gb_interface_release(struct device *dev) trace_gb_interface_release(intf); + cancel_work_sync(&intf->mode_switch_work); kfree(intf); } From f68820f1256b21466ff094dd97f243b7e708f9c1 Mon Sep 17 00:00:00 2001 From: Shichao Lai Date: Sun, 26 May 2024 09:27:45 +0800 Subject: [PATCH 1168/1565] usb-storage: alauda: Check whether the media is initialized [ Upstream commit 16637fea001ab3c8df528a8995b3211906165a30 ] The member "uzonesize" of struct alauda_info will remain 0 if alauda_init_media() fails, potentially causing divide errors in alauda_read_data() and alauda_write_lba(). - Add a member "media_initialized" to struct alauda_info. - Change a condition in alauda_check_media() to ensure the first initialization. - Add an error check for the return value of alauda_init_media(). Fixes: e80b0fade09e ("[PATCH] USB Storage: add alauda support") Reported-by: xingwei lee Reported-by: yue sun Reviewed-by: Alan Stern Signed-off-by: Shichao Lai Link: https://lore.kernel.org/r/20240526012745.2852061-1-shichaorai@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/usb/storage/alauda.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/usb/storage/alauda.c b/drivers/usb/storage/alauda.c index dcc4778d1ae9..17fe35083f04 100644 --- a/drivers/usb/storage/alauda.c +++ b/drivers/usb/storage/alauda.c @@ -105,6 +105,8 @@ struct alauda_info { unsigned char sense_key; unsigned long sense_asc; /* additional sense code */ unsigned long sense_ascq; /* additional sense code qualifier */ + + bool media_initialized; }; #define short_pack(lsb,msb) ( ((u16)(lsb)) | ( ((u16)(msb))<<8 ) ) @@ -476,11 +478,12 @@ static int alauda_check_media(struct us_data *us) } /* Check for media change */ - if (status[0] & 0x08) { + if (status[0] & 0x08 || !info->media_initialized) { usb_stor_dbg(us, "Media change detected\n"); alauda_free_maps(&MEDIA_INFO(us)); - alauda_init_media(us); - + rc = alauda_init_media(us); + if (rc == USB_STOR_TRANSPORT_GOOD) + info->media_initialized = true; info->sense_key = UNIT_ATTENTION; info->sense_asc = 0x28; info->sense_ascq = 0x00; From 0f37d22a6215c37a5280315a82ac2907e1ac298b Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 31 May 2024 11:19:14 +0200 Subject: [PATCH 1169/1565] i2c: at91: Fix the functionality flags of the slave-only interface [ Upstream commit d6d5645e5fc1233a7ba950de4a72981c394a2557 ] When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 9d3ca54b550c ("i2c: at91: added slave mode support") Signed-off-by: Jean Delvare Cc: Juergen Fitschen Cc: Ludovic Desroches Cc: Codrin Ciubotariu Cc: Andi Shyti Cc: Nicolas Ferre Cc: Alexandre Belloni Cc: Claudiu Beznea Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-at91-slave.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/i2c/busses/i2c-at91-slave.c b/drivers/i2c/busses/i2c-at91-slave.c index d6eeea5166c0..131a67d9d4a6 100644 --- a/drivers/i2c/busses/i2c-at91-slave.c +++ b/drivers/i2c/busses/i2c-at91-slave.c @@ -106,8 +106,7 @@ static int at91_unreg_slave(struct i2c_client *slave) static u32 at91_twi_func(struct i2c_adapter *adapter) { - return I2C_FUNC_SLAVE | I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL - | I2C_FUNC_SMBUS_READ_BLOCK_DATA; + return I2C_FUNC_SLAVE; } static const struct i2c_algorithm at91_twi_algorithm_slave = { From 864963d2692ecd4357e9da3d27f5e5bb2acf2212 Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Fri, 31 May 2024 11:17:48 +0200 Subject: [PATCH 1170/1565] i2c: designware: Fix the functionality flags of the slave-only interface [ Upstream commit cbf3fb5b29e99e3689d63a88c3cddbffa1b8de99 ] When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 5b6d721b266a ("i2c: designware: enable SLAVE in platform module") Signed-off-by: Jean Delvare Cc: Luis Oliveira Cc: Jarkko Nikula Cc: Andy Shevchenko Cc: Mika Westerberg Cc: Jan Dabros Cc: Andi Shyti Reviewed-by: Andy Shevchenko Acked-by: Jarkko Nikula Tested-by: Jarkko Nikula Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-designware-slave.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-designware-slave.c b/drivers/i2c/busses/i2c-designware-slave.c index 0d15f4c1e9f7..5b54a9b9ed1a 100644 --- a/drivers/i2c/busses/i2c-designware-slave.c +++ b/drivers/i2c/busses/i2c-designware-slave.c @@ -232,7 +232,7 @@ static const struct i2c_algorithm i2c_dw_algo = { void i2c_dw_configure_slave(struct dw_i2c_dev *dev) { - dev->functionality = I2C_FUNC_SLAVE | DW_IC_DEFAULT_FUNCTIONALITY; + dev->functionality = I2C_FUNC_SLAVE; dev->slave_cfg = DW_IC_CON_RX_FIFO_FULL_HLD_CTRL | DW_IC_CON_RESTART_EN | DW_IC_CON_STOP_DET_IFADDRESSED; From 82c7acf9a12c9b4e4b0aba531de82b4e39a345c8 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Sat, 8 Jun 2024 14:06:16 +0200 Subject: [PATCH 1171/1565] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING [ Upstream commit 7fea700e04bd3f424c2d836e98425782f97b494e ] kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true. That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop. Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov Reported-by: Rachel Menge Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/ Reviewed-by: Boqun Feng Tested-by: Wei Fu Reviewed-by: Jens Axboe Cc: Allen Pais Cc: Christian Brauner Cc: Frederic Weisbecker Cc: Joel Fernandes (Google) Cc: Joel Granados Cc: Josh Triplett Cc: Lai Jiangshan Cc: Mateusz Guzik Cc: Mathieu Desnoyers Cc: Mike Christie Cc: Neeraj Upadhyay Cc: Paul E. McKenney Cc: Steven Rostedt (Google) Cc: Zqiang Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin --- kernel/pid_namespace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 20243682e605..e032b1ce7964 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -221,6 +221,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) */ do { clear_thread_flag(TIF_SIGPENDING); + clear_thread_flag(TIF_NOTIFY_SIGNAL); rc = kernel_wait4(-1, NULL, __WALL, NULL); } while (rc != -ECHILD); From ec58e6ff29b78616ced062e46b9635b97407ffb4 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Wed, 3 Apr 2024 17:36:18 +0800 Subject: [PATCH 1172/1565] padata: Disable BH when taking works lock on MT path [ Upstream commit 58329c4312031603bb1786b44265c26d5065fe72 ] As the old padata code can execute in softirq context, disable softirqs for the new padata_do_mutithreaded code too as otherwise lockdep will get antsy. Reported-by: syzbot+0cb5bb0f4bf9e79db3b3@syzkaller.appspotmail.com Signed-off-by: Herbert Xu Acked-by: Daniel Jordan Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- kernel/padata.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/padata.c b/kernel/padata.c index fdcd78302cd7..471ccbc44541 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -111,7 +111,7 @@ static int __init padata_work_alloc_mt(int nworks, void *data, { int i; - spin_lock(&padata_works_lock); + spin_lock_bh(&padata_works_lock); /* Start at 1 because the current task participates in the job. */ for (i = 1; i < nworks; ++i) { struct padata_work *pw = padata_work_alloc(); @@ -121,7 +121,7 @@ static int __init padata_work_alloc_mt(int nworks, void *data, padata_work_init(pw, padata_mt_helper, data, 0); list_add(&pw->pw_list, head); } - spin_unlock(&padata_works_lock); + spin_unlock_bh(&padata_works_lock); return i; } @@ -139,12 +139,12 @@ static void __init padata_works_free(struct list_head *works) if (list_empty(works)) return; - spin_lock(&padata_works_lock); + spin_lock_bh(&padata_works_lock); list_for_each_entry_safe(cur, next, works, pw_list) { list_del(&cur->pw_list); padata_work_free(cur); } - spin_unlock(&padata_works_lock); + spin_unlock_bh(&padata_works_lock); } static void padata_parallel_worker(struct work_struct *parallel_work) From dd2cb39afc72d16daa13124be28691ff97baf050 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 6 Mar 2024 19:21:47 -0800 Subject: [PATCH 1173/1565] rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment [ Upstream commit 8b9b443fa860276822b25057cb3ff3b28734dec0 ] The "pipe_count > RCU_TORTURE_PIPE_LEN" check has a comment saying "Should not happen, but...". This is only true when testing an RCU whose grace periods are always long enough. This commit therefore fixes this comment. Reported-by: Linus Torvalds Closes: https://lore.kernel.org/lkml/CAHk-=wi7rJ-eGq+xaxVfzFEgbL9tdf6Kc8Z89rCpfcQOKm74Tw@mail.gmail.com/ Signed-off-by: Paul E. McKenney Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Sasha Levin --- kernel/rcu/rcutorture.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 6c1aea48a79a..c413eb4a9566 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -1407,7 +1407,8 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) preempt_disable(); pipe_count = READ_ONCE(p->rtort_pipe_count); if (pipe_count > RCU_TORTURE_PIPE_LEN) { - /* Should not happen, but... */ + // Should not happen in a correct RCU implementation, + // happens quite often for torture_type=busted. pipe_count = RCU_TORTURE_PIPE_LEN; } completed = cur_ops->get_gp_seq(); From c15df6f49867f90b41b8e6043d8eb87943b05a07 Mon Sep 17 00:00:00 2001 From: Zqiang Date: Mon, 25 Mar 2024 15:52:19 +0800 Subject: [PATCH 1174/1565] rcutorture: Fix invalid context warning when enable srcu barrier testing [ Upstream commit 668c0406d887467d53f8fe79261dda1d22d5b671 ] When the torture_type is set srcu or srcud and cb_barrier is non-zero, running the rcutorture test will trigger the following warning: [ 163.910989][ C1] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 163.910994][ C1] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 [ 163.910999][ C1] preempt_count: 10001, expected: 0 [ 163.911002][ C1] RCU nest depth: 0, expected: 0 [ 163.911005][ C1] INFO: lockdep is turned off. [ 163.911007][ C1] irq event stamp: 30964 [ 163.911010][ C1] hardirqs last enabled at (30963): [] do_idle+0x362/0x500 [ 163.911018][ C1] hardirqs last disabled at (30964): [] sysvec_call_function_single+0xf/0xd0 [ 163.911025][ C1] softirqs last enabled at (0): [] copy_process+0x16ff/0x6580 [ 163.911033][ C1] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 163.911038][ C1] Preemption disabled at: [ 163.911039][ C1] [] stack_depot_save_flags+0x24b/0x6c0 [ 163.911063][ C1] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 6.8.0-rc4-rt4-yocto-preempt-rt+ #3 1e39aa9a737dd024a3275c4f835a872f673a7d3a [ 163.911071][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [ 163.911075][ C1] Call Trace: [ 163.911078][ C1] [ 163.911080][ C1] dump_stack_lvl+0x88/0xd0 [ 163.911089][ C1] dump_stack+0x10/0x20 [ 163.911095][ C1] __might_resched+0x36f/0x530 [ 163.911105][ C1] rt_spin_lock+0x82/0x1c0 [ 163.911112][ C1] spin_lock_irqsave_ssp_contention+0xb8/0x100 [ 163.911121][ C1] srcu_gp_start_if_needed+0x782/0xf00 [ 163.911128][ C1] ? _raw_spin_unlock_irqrestore+0x46/0x70 [ 163.911136][ C1] ? debug_object_active_state+0x336/0x470 [ 163.911148][ C1] ? __pfx_srcu_gp_start_if_needed+0x10/0x10 [ 163.911156][ C1] ? __pfx_lock_release+0x10/0x10 [ 163.911165][ C1] ? __pfx_rcu_torture_barrier_cbf+0x10/0x10 [ 163.911188][ C1] __call_srcu+0x9f/0xe0 [ 163.911196][ C1] call_srcu+0x13/0x20 [ 163.911201][ C1] srcu_torture_call+0x1b/0x30 [ 163.911224][ C1] rcu_torture_barrier1cb+0x4a/0x60 [ 163.911247][ C1] __flush_smp_call_function_queue+0x267/0xca0 [ 163.911256][ C1] ? __pfx_rcu_torture_barrier1cb+0x10/0x10 [ 163.911281][ C1] generic_smp_call_function_single_interrupt+0x13/0x20 [ 163.911288][ C1] __sysvec_call_function_single+0x7d/0x280 [ 163.911295][ C1] sysvec_call_function_single+0x93/0xd0 [ 163.911302][ C1] [ 163.911304][ C1] [ 163.911308][ C1] asm_sysvec_call_function_single+0x1b/0x20 [ 163.911313][ C1] RIP: 0010:default_idle+0x17/0x20 [ 163.911326][ C1] RSP: 0018:ffff888001997dc8 EFLAGS: 00000246 [ 163.911333][ C1] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: ffffffffae618b51 [ 163.911337][ C1] RDX: 0000000000000000 RSI: ffffffffaea80920 RDI: ffffffffaec2de80 [ 163.911342][ C1] RBP: ffff888001997dc8 R08: 0000000000000001 R09: ffffed100d740cad [ 163.911346][ C1] R10: ffffed100d740cac R11: ffff88806ba06563 R12: 0000000000000001 [ 163.911350][ C1] R13: ffffffffafe460c0 R14: ffffffffafe460c0 R15: 0000000000000000 [ 163.911358][ C1] ? ct_kernel_exit.constprop.3+0x121/0x160 [ 163.911369][ C1] ? lockdep_hardirqs_on+0xc4/0x150 [ 163.911376][ C1] arch_cpu_idle+0x9/0x10 [ 163.911383][ C1] default_idle_call+0x7a/0xb0 [ 163.911390][ C1] do_idle+0x362/0x500 [ 163.911398][ C1] ? __pfx_do_idle+0x10/0x10 [ 163.911404][ C1] ? complete_with_flags+0x8b/0xb0 [ 163.911416][ C1] cpu_startup_entry+0x58/0x70 [ 163.911423][ C1] start_secondary+0x221/0x280 [ 163.911430][ C1] ? __pfx_start_secondary+0x10/0x10 [ 163.911440][ C1] secondary_startup_64_no_verify+0x17f/0x18b [ 163.911455][ C1] This commit therefore use smp_call_on_cpu() instead of smp_call_function_single(), make rcu_torture_barrier1cb() invoked happens on task-context. Signed-off-by: Zqiang Signed-off-by: Paul E. McKenney Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Sasha Levin --- kernel/rcu/rcutorture.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index c413eb4a9566..9f505688291e 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -2210,11 +2210,12 @@ static void rcu_torture_barrier_cbf(struct rcu_head *rcu) } /* IPI handler to get callback posted on desired CPU, if online. */ -static void rcu_torture_barrier1cb(void *rcu_void) +static int rcu_torture_barrier1cb(void *rcu_void) { struct rcu_head *rhp = rcu_void; cur_ops->call(rhp, rcu_torture_barrier_cbf); + return 0; } /* kthread function to register callbacks used to test RCU barriers. */ @@ -2240,11 +2241,9 @@ static int rcu_torture_barrier_cbs(void *arg) * The above smp_load_acquire() ensures barrier_phase load * is ordered before the following ->call(). */ - if (smp_call_function_single(myid, rcu_torture_barrier1cb, - &rcu, 1)) { - // IPI failed, so use direct call from current CPU. + if (smp_call_on_cpu(myid, rcu_torture_barrier1cb, &rcu, 1)) cur_ops->call(&rcu, rcu_torture_barrier_cbf); - } + if (atomic_dec_and_test(&barrier_cbs_count)) wake_up(&barrier_wq); } while (!torture_must_stop()); From 58706e482bf45c4db48b0c53aba2468c97adda24 Mon Sep 17 00:00:00 2001 From: Justin Stitt Date: Tue, 7 May 2024 03:53:49 +0000 Subject: [PATCH 1175/1565] block/ioctl: prefer different overflow check [ Upstream commit ccb326b5f9e623eb7f130fbbf2505ec0e2dcaff9 ] Running syzkaller with the newly reintroduced signed integer overflow sanitizer shows this report: [ 62.982337] ------------[ cut here ]------------ [ 62.985692] cgroup: Invalid name [ 62.986211] UBSAN: signed-integer-overflow in ../block/ioctl.c:36:46 [ 62.989370] 9pnet_fd: p9_fd_create_tcp (7343): problem connecting socket to 127.0.0.1 [ 62.992992] 9223372036854775807 + 4095 cannot be represented in type 'long long' [ 62.997827] 9pnet_fd: p9_fd_create_tcp (7345): problem connecting socket to 127.0.0.1 [ 62.999369] random: crng reseeded on system resumption [ 63.000634] GUP no longer grows the stack in syz-executor.2 (7353): 20002000-20003000 (20001000) [ 63.000668] CPU: 0 PID: 7353 Comm: syz-executor.2 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1 [ 63.000677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 63.000682] Call Trace: [ 63.000686] [ 63.000731] dump_stack_lvl+0x93/0xd0 [ 63.000919] __get_user_pages+0x903/0xd30 [ 63.001030] __gup_longterm_locked+0x153e/0x1ba0 [ 63.001041] ? _raw_read_unlock_irqrestore+0x17/0x50 [ 63.001072] ? try_get_folio+0x29c/0x2d0 [ 63.001083] internal_get_user_pages_fast+0x1119/0x1530 [ 63.001109] iov_iter_extract_pages+0x23b/0x580 [ 63.001206] bio_iov_iter_get_pages+0x4de/0x1220 [ 63.001235] iomap_dio_bio_iter+0x9b6/0x1410 [ 63.001297] __iomap_dio_rw+0xab4/0x1810 [ 63.001316] iomap_dio_rw+0x45/0xa0 [ 63.001328] ext4_file_write_iter+0xdde/0x1390 [ 63.001372] vfs_write+0x599/0xbd0 [ 63.001394] ksys_write+0xc8/0x190 [ 63.001403] do_syscall_64+0xd4/0x1b0 [ 63.001421] ? arch_exit_to_user_mode_prepare+0x3a/0x60 [ 63.001479] entry_SYSCALL_64_after_hwframe+0x6f/0x77 [ 63.001535] RIP: 0033:0x7f7fd3ebf539 [ 63.001551] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 [ 63.001562] RSP: 002b:00007f7fd32570c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 63.001584] RAX: ffffffffffffffda RBX: 00007f7fd3ff3f80 RCX: 00007f7fd3ebf539 [ 63.001590] RDX: 4db6d1e4f7e43360 RSI: 0000000020000000 RDI: 0000000000000004 [ 63.001595] RBP: 00007f7fd3f1e496 R08: 0000000000000000 R09: 0000000000000000 [ 63.001599] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 63.001604] R13: 0000000000000006 R14: 00007f7fd3ff3f80 R15: 00007ffd415ad2b8 ... [ 63.018142] ---[ end trace ]--- Historically, the signed integer overflow sanitizer did not work in the kernel due to its interaction with `-fwrapv` but this has since been changed [1] in the newest version of Clang; It was re-enabled in the kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow sanitizer"). Let's rework this overflow checking logic to not actually perform an overflow during the check itself, thus avoiding the UBSAN splat. [1]: https://github.com/llvm/llvm-project/pull/82432 Signed-off-by: Justin Stitt Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20240507-b4-sio-block-ioctl-v3-1-ba0c2b32275e@google.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- block/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/ioctl.c b/block/ioctl.c index bc97698e0e8a..11e692741f17 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -32,7 +32,7 @@ static int blkpg_do_ioctl(struct block_device *bdev, if (op == BLKPG_DEL_PARTITION) return bdev_del_partition(bdev, p.pno); - if (p.start < 0 || p.length <= 0 || p.start + p.length < 0) + if (p.start < 0 || p.length <= 0 || LLONG_MAX - p.length < p.start) return -EINVAL; /* Check that the partition is aligned to the block size */ if (!IS_ALIGNED(p.start | p.length, bdev_logical_block_size(bdev))) From 973b32034ce1e10dc750a441f986ca52aed9bef6 Mon Sep 17 00:00:00 2001 From: "Alessandro Carminati (Red Hat)" Date: Thu, 14 Mar 2024 10:59:11 +0000 Subject: [PATCH 1176/1565] selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh [ Upstream commit f803bcf9208a2540acb4c32bdc3616673169f490 ] In some systems, the netcat server can incur in delay to start listening. When this happens, the test can randomly fail in various points. This is an example error message: # ip gre none gso # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000 # test basic connectivity # Ncat: Connection refused. The issue stems from a race condition between the netcat client and server. The test author had addressed this problem by implementing a sleep, which I have removed in this patch. This patch introduces a function capable of sleeping for up to two seconds. However, it can terminate the waiting period early if the port is reported to be listening. Signed-off-by: Alessandro Carminati (Red Hat) Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gmail.com Signed-off-by: Sasha Levin --- tools/testing/selftests/bpf/test_tc_tunnel.sh | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh index 7c76b841b17b..21bde60c9523 100755 --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh @@ -71,7 +71,6 @@ cleanup() { server_listen() { ip netns exec "${ns2}" nc "${netcat_opt}" -l -p "${port}" > "${outfile}" & server_pid=$! - sleep 0.2 } client_connect() { @@ -92,6 +91,16 @@ verify_data() { fi } +wait_for_port() { + for i in $(seq 20); do + if ip netns exec "${ns2}" ss ${2:--4}OHntl | grep -q "$1"; then + return 0 + fi + sleep 0.1 + done + return 1 +} + set -e # no arguments: automated test, run all @@ -183,6 +192,7 @@ setup # basic communication works echo "test basic connectivity" server_listen +wait_for_port ${port} ${netcat_opt} client_connect verify_data @@ -194,6 +204,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \ section "encap_${tuntype}_${mac}" echo "test bpf encap without decap (expect failure)" server_listen +wait_for_port ${port} ${netcat_opt} ! client_connect if [[ "$tuntype" =~ "udp" ]]; then From e1c3f5fb1be86304c253f189b8f2577708bfeda1 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 21 Mar 2024 23:13:53 -0700 Subject: [PATCH 1177/1565] selftests/bpf: Fix flaky test btf_map_in_map/lookup_update [ Upstream commit 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5 ] Recently, I frequently hit the following test failure: [root@arch-fb-vm1 bpf]# ./test_progs -n 33/1 test_lookup_update:PASS:skel_open 0 nsec [...] test_lookup_update:PASS:sync_rcu 0 nsec test_lookup_update:FAIL:map1_leak inner_map1 leaked! #33/1 btf_map_in_map/lookup_update:FAIL #33 btf_map_in_map:FAIL In the test, after map is closed and then after two rcu grace periods, it is assumed that map_id is not available to user space. But the above assumption cannot be guaranteed. After zero or one or two rcu grace periods in different siturations, the actual freeing-map-work is put into a workqueue. Later on, when the work is dequeued, the map will be actually freed. See bpf_map_put() in kernel/bpf/syscall.c. By using workqueue, there is no ganrantee that map will be actually freed after a couple of rcu grace periods. This patch removed such map leak detection and then the test can pass consistently. Signed-off-by: Yonghong Song Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20240322061353.632136-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin --- .../selftests/bpf/prog_tests/btf_map_in_map.c | 26 +------------------ 1 file changed, 1 insertion(+), 25 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c index 76ebe4c250f1..a434828bc7ab 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_map_in_map.c @@ -58,7 +58,7 @@ static void test_lookup_update(void) int map1_fd, map2_fd, map3_fd, map4_fd, map5_fd, map1_id, map2_id; int outer_arr_fd, outer_hash_fd, outer_arr_dyn_fd; struct test_btf_map_in_map *skel; - int err, key = 0, val, i, fd; + int err, key = 0, val, i; skel = test_btf_map_in_map__open_and_load(); if (CHECK(!skel, "skel_open", "failed to open&load skeleton\n")) @@ -135,30 +135,6 @@ static void test_lookup_update(void) CHECK(map1_id == 0, "map1_id", "failed to get ID 1\n"); CHECK(map2_id == 0, "map2_id", "failed to get ID 2\n"); - test_btf_map_in_map__destroy(skel); - skel = NULL; - - /* we need to either wait for or force synchronize_rcu(), before - * checking for "still exists" condition, otherwise map could still be - * resolvable by ID, causing false positives. - * - * Older kernels (5.8 and earlier) freed map only after two - * synchronize_rcu()s, so trigger two, to be entirely sure. - */ - CHECK(kern_sync_rcu(), "sync_rcu", "failed\n"); - CHECK(kern_sync_rcu(), "sync_rcu", "failed\n"); - - fd = bpf_map_get_fd_by_id(map1_id); - if (CHECK(fd >= 0, "map1_leak", "inner_map1 leaked!\n")) { - close(fd); - goto cleanup; - } - fd = bpf_map_get_fd_by_id(map2_id); - if (CHECK(fd >= 0, "map2_leak", "inner_map2 leaked!\n")) { - close(fd); - goto cleanup; - } - cleanup: test_btf_map_in_map__destroy(skel); } From 82cdea8f3af1e36543c937df963d108c60bea030 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 30 Mar 2024 15:54:38 +0000 Subject: [PATCH 1178/1565] batman-adv: bypass empty buckets in batadv_purge_orig_ref() [ Upstream commit 40dc8ab605894acae1473e434944924a22cfaaa0 ] Many syzbot reports are pointing to soft lockups in batadv_purge_orig_ref() [1] Root cause is unknown, but we can avoid spending too much time there and perhaps get more interesting reports. [1] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:6:621] Modules linked in: irq event stamp: 6182794 hardirqs last enabled at (6182793): [] __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386 hardirqs last disabled at (6182794): [] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (6182794): [] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (6182792): [] spin_unlock_bh include/linux/spinlock.h:396 [inline] softirqs last enabled at (6182792): [] batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287 softirqs last disabled at (6182790): [] spin_lock_bh include/linux/spinlock.h:356 [inline] softirqs last disabled at (6182790): [] batadv_purge_orig_ref+0x164/0x1228 net/batman-adv/originator.c:1271 CPU: 0 PID: 621 Comm: kworker/u4:6 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Workqueue: bat_events batadv_purge_orig pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : should_resched arch/arm64/include/asm/preempt.h:79 [inline] pc : __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:388 lr : __local_bh_enable_ip+0x224/0x44c kernel/softirq.c:386 sp : ffff800099007970 x29: ffff800099007980 x28: 1fffe00018fce1bd x27: dfff800000000000 x26: ffff0000d2620008 x25: ffff0000c7e70de8 x24: 0000000000000001 x23: 1fffe00018e57781 x22: dfff800000000000 x21: ffff80008aab71c4 x20: ffff0001b40136c0 x19: ffff0000c72bbc08 x18: 1fffe0001a817bb0 x17: ffff800125414000 x16: ffff80008032116c x15: 0000000000000001 x14: 1fffe0001ee9d610 x13: 0000000000000000 x12: 0000000000000003 x11: 0000000000000000 x10: 0000000000ff0100 x9 : 0000000000000000 x8 : 00000000005e5789 x7 : ffff80008aab61dc x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000006 x1 : 0000000000000080 x0 : ffff800125414000 Call trace: __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:49 [inline] __local_bh_enable_ip+0x228/0x44c kernel/softirq.c:386 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline] _raw_spin_unlock_bh+0x3c/0x4c kernel/locking/spinlock.c:210 spin_unlock_bh include/linux/spinlock.h:396 [inline] batadv_purge_orig_ref+0x114c/0x1228 net/batman-adv/originator.c:1287 batadv_purge_orig+0x20/0x70 net/batman-adv/originator.c:1300 process_one_work+0x694/0x1204 kernel/workqueue.c:2633 process_scheduled_works kernel/workqueue.c:2706 [inline] worker_thread+0x938/0xef4 kernel/workqueue.c:2787 kthread+0x288/0x310 kernel/kthread.c:388 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:51 lr : default_idle_call+0xf8/0x128 kernel/sched/idle.c:103 sp : ffff800093a17d30 x29: ffff800093a17d30 x28: dfff800000000000 x27: 1ffff00012742fb4 x26: ffff80008ec9d000 x25: 0000000000000000 x24: 0000000000000002 x23: 1ffff00011d93a74 x22: ffff80008ec9d3a0 x21: 0000000000000000 x20: ffff0000c19dbc00 x19: ffff8000802d0fd8 x18: 1fffe00036804396 x17: ffff80008ec9d000 x16: ffff8000802d089c x15: 0000000000000001 x14: 1fffe00036805f10 x13: 0000000000000000 x12: 0000000000000003 x11: 0000000000000001 x10: 0000000000000003 x9 : 0000000000000000 x8 : 00000000000ce8d1 x7 : ffff8000804609e4 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008ad6aac0 x2 : 0000000000000000 x1 : ffff80008aedea60 x0 : ffff800125436000 Call trace: __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:27 [inline] arch_local_irq_enable+0x8/0xc arch/arm64/include/asm/irqflags.h:49 cpuidle_idle_call kernel/sched/idle.c:170 [inline] do_idle+0x1f0/0x4e8 kernel/sched/idle.c:312 cpu_startup_entry+0x5c/0x74 kernel/sched/idle.c:410 secondary_start_kernel+0x198/0x1c0 arch/arm64/kernel/smp.c:272 __secondary_switched+0xb8/0xbc arch/arm64/kernel/head.S:404 Signed-off-by: Eric Dumazet Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich Signed-off-by: Sasha Levin --- net/batman-adv/originator.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 2d38a09459bb..3b2a3759efd6 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -1285,6 +1285,8 @@ void batadv_purge_orig_ref(struct batadv_priv *bat_priv) /* for all origins... */ for (i = 0; i < hash->size; i++) { head = &hash->table[i]; + if (hlist_empty(head)) + continue; list_lock = &hash->list_locks[i]; spin_lock_bh(list_lock); From b5a53d14dd83c174413da09f9a13c4c1cdf884d3 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 4 Apr 2024 09:35:59 +0300 Subject: [PATCH 1179/1565] wifi: ath9k: work around memset overflow warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 61752ac69b69ed2e04444d090f6917c77ab36d42 ] gcc-9 and some other older versions produce a false-positive warning for zeroing two fields In file included from include/linux/string.h:369, from drivers/net/wireless/ath/ath9k/main.c:18: In function 'fortify_memset_chk', inlined from 'ath9k_ps_wakeup' at drivers/net/wireless/ath/ath9k/main.c:140:3: include/linux/fortify-string.h:462:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 462 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Using a struct_group seems to reliably avoid the warning and not make the code much uglier. The combined memset() should even save a couple of cpu cycles. Signed-off-by: Arnd Bergmann Acked-by: Toke Høiland-Jørgensen Reviewed-by: Kees Cook Signed-off-by: Kalle Valo Link: https://msgid.link/20240328135509.3755090-3-arnd@kernel.org Signed-off-by: Sasha Levin --- drivers/net/wireless/ath/ath.h | 6 ++++-- drivers/net/wireless/ath/ath9k/main.c | 3 +-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/ath/ath.h b/drivers/net/wireless/ath/ath.h index f02a308a9ffc..34654f710d8a 100644 --- a/drivers/net/wireless/ath/ath.h +++ b/drivers/net/wireless/ath/ath.h @@ -171,8 +171,10 @@ struct ath_common { unsigned int clockrate; spinlock_t cc_lock; - struct ath_cycle_counters cc_ani; - struct ath_cycle_counters cc_survey; + struct_group(cc, + struct ath_cycle_counters cc_ani; + struct ath_cycle_counters cc_survey; + ); struct ath_regulatory regulatory; struct ath_regulatory reg_world_copy; diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index b2cfc483515c..c5904d81d000 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -135,8 +135,7 @@ void ath9k_ps_wakeup(struct ath_softc *sc) if (power_mode != ATH9K_PM_AWAKE) { spin_lock(&common->cc_lock); ath_hw_cycle_counters_update(common); - memset(&common->cc_survey, 0, sizeof(common->cc_survey)); - memset(&common->cc_ani, 0, sizeof(common->cc_ani)); + memset(&common->cc, 0, sizeof(common->cc)); spin_unlock(&common->cc_lock); } From a720d71dd494320b4452b94221d05f498fe12d04 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 5 Apr 2024 11:49:39 +0000 Subject: [PATCH 1180/1565] af_packet: avoid a false positive warning in packet_setsockopt() [ Upstream commit 86d43e2bf93ccac88ef71cee36a23282ebd9e427 ] Although the code is correct, the following line copy_from_sockptr(&req_u.req, optval, len)); triggers this warning : memcpy: detected field-spanning write (size 28) of single field "dst" at include/linux/sockptr.h:49 (size 16) Refactor the code to be more explicit. Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Kees Cook Cc: Willem de Bruijn Reviewed-by: Kees Cook Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/packet/af_packet.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8e52f0949305..9bec88fe3505 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3761,28 +3761,30 @@ packet_setsockopt(struct socket *sock, int level, int optname, sockptr_t optval, case PACKET_TX_RING: { union tpacket_req_u req_u; - int len; + ret = -EINVAL; lock_sock(sk); switch (po->tp_version) { case TPACKET_V1: case TPACKET_V2: - len = sizeof(req_u.req); + if (optlen < sizeof(req_u.req)) + break; + ret = copy_from_sockptr(&req_u.req, optval, + sizeof(req_u.req)) ? + -EINVAL : 0; break; case TPACKET_V3: default: - len = sizeof(req_u.req3); + if (optlen < sizeof(req_u.req3)) + break; + ret = copy_from_sockptr(&req_u.req3, optval, + sizeof(req_u.req3)) ? + -EINVAL : 0; break; } - if (optlen < len) { - ret = -EINVAL; - } else { - if (copy_from_sockptr(&req_u.req, optval, len)) - ret = -EFAULT; - else - ret = packet_set_ring(sk, &req_u, 0, - optname == PACKET_TX_RING); - } + if (!ret) + ret = packet_set_ring(sk, &req_u, 0, + optname == PACKET_TX_RING); release_sock(sk); return ret; } From 96941f29ebcc1e9cbf570dc903f30374909562f5 Mon Sep 17 00:00:00 2001 From: Wander Lairson Costa Date: Thu, 11 Apr 2024 11:13:46 -0300 Subject: [PATCH 1181/1565] drop_monitor: replace spin_lock by raw_spin_lock [ Upstream commit f1e197a665c2148ebc25fe09c53689e60afea195 ] trace_drop_common() is called with preemption disabled, and it acquires a spin_lock. This is problematic for RT kernels because spin_locks are sleeping locks in this configuration, which causes the following splat: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47 preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [] _raw_spin_unlock_irqrestore+0x63/0x80 hardirqs last disabled at (139909): [] trace_drop_common.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [] __local_bh_enable_ip+0x103/0x170 softirqs last disabled at (139898): [] rcu_cpu_kthread+0x93/0x1f0 Preemption disabled at: [] rt_mutex_slowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: dump_stack_lvl+0x8c/0xd0 dump_stack+0x14/0x20 __might_resched+0x21e/0x2f0 rt_spin_lock+0x5e/0x130 ? trace_drop_common.constprop.0+0xb5/0x290 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_drop_common.constprop.0+0xb5/0x290 ? preempt_count_sub+0x1c/0xd0 ? _raw_spin_unlock_irqrestore+0x4a/0x80 ? __pfx_trace_drop_common.constprop.0+0x10/0x10 ? rt_mutex_slowunlock+0x26a/0x2e0 ? skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_rt_mutex_slowunlock+0x10/0x10 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_kfree_skb_hit+0x15/0x20 trace_kfree_skb+0xe9/0x150 kfree_skb_reason+0x7b/0x110 skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10 ? mark_lock.part.0+0x8a/0x520 ... trace_drop_common() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock. Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic context. Signed-off-by: Wander Lairson Costa Reported-by: Hu Chunyu Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/core/drop_monitor.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index 7742ee689141..009b9e22c4e7 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -73,7 +73,7 @@ struct net_dm_hw_entries { }; struct per_cpu_dm_data { - spinlock_t lock; /* Protects 'skb', 'hw_entries' and + raw_spinlock_t lock; /* Protects 'skb', 'hw_entries' and * 'send_timer' */ union { @@ -168,9 +168,9 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data) err: mod_timer(&data->send_timer, jiffies + HZ / 10); out: - spin_lock_irqsave(&data->lock, flags); + raw_spin_lock_irqsave(&data->lock, flags); swap(data->skb, skb); - spin_unlock_irqrestore(&data->lock, flags); + raw_spin_unlock_irqrestore(&data->lock, flags); if (skb) { struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data; @@ -225,7 +225,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) local_irq_save(flags); data = this_cpu_ptr(&dm_cpu_data); - spin_lock(&data->lock); + raw_spin_lock(&data->lock); dskb = data->skb; if (!dskb) @@ -259,7 +259,7 @@ static void trace_drop_common(struct sk_buff *skb, void *location) } out: - spin_unlock_irqrestore(&data->lock, flags); + raw_spin_unlock_irqrestore(&data->lock, flags); } static void trace_kfree_skb_hit(void *ignore, struct sk_buff *skb, void *location) @@ -318,9 +318,9 @@ net_dm_hw_reset_per_cpu_data(struct per_cpu_dm_data *hw_data) mod_timer(&hw_data->send_timer, jiffies + HZ / 10); } - spin_lock_irqsave(&hw_data->lock, flags); + raw_spin_lock_irqsave(&hw_data->lock, flags); swap(hw_data->hw_entries, hw_entries); - spin_unlock_irqrestore(&hw_data->lock, flags); + raw_spin_unlock_irqrestore(&hw_data->lock, flags); return hw_entries; } @@ -452,7 +452,7 @@ net_dm_hw_trap_summary_probe(void *ignore, const struct devlink *devlink, return; hw_data = this_cpu_ptr(&dm_hw_cpu_data); - spin_lock_irqsave(&hw_data->lock, flags); + raw_spin_lock_irqsave(&hw_data->lock, flags); hw_entries = hw_data->hw_entries; if (!hw_entries) @@ -481,7 +481,7 @@ net_dm_hw_trap_summary_probe(void *ignore, const struct devlink *devlink, } out: - spin_unlock_irqrestore(&hw_data->lock, flags); + raw_spin_unlock_irqrestore(&hw_data->lock, flags); } static const struct net_dm_alert_ops net_dm_alert_summary_ops = { @@ -1669,7 +1669,7 @@ static struct notifier_block dropmon_net_notifier = { static void __net_dm_cpu_data_init(struct per_cpu_dm_data *data) { - spin_lock_init(&data->lock); + raw_spin_lock_init(&data->lock); skb_queue_head_init(&data->drop_queue); u64_stats_init(&data->stats.syncp); } From 144d76a676b630e321556965011b00e2de0b40a7 Mon Sep 17 00:00:00 2001 From: Manish Rangankar Date: Mon, 15 Apr 2024 12:51:55 +0530 Subject: [PATCH 1182/1565] scsi: qedi: Fix crash while reading debugfs attribute [ Upstream commit 28027ec8e32ecbadcd67623edb290dad61e735b5 ] The qedi_dbg_do_not_recover_cmd_read() function invokes sprintf() directly on a __user pointer, which results into the crash. To fix this issue, use a small local stack buffer for sprintf() and then call simple_read_from_buffer(), which in turns make the copy_to_user() call. BUG: unable to handle page fault for address: 00007f4801111000 PGD 8000000864df6067 P4D 8000000864df6067 PUD 864df7067 PMD 846028067 PTE 0 Oops: 0002 [#1] PREEMPT SMP PTI Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2023 RIP: 0010:memcpy_orig+0xcd/0x130 RSP: 0018:ffffb7a18c3ffc40 EFLAGS: 00010202 RAX: 00007f4801111000 RBX: 00007f4801111000 RCX: 000000000000000f RDX: 000000000000000f RSI: ffffffffc0bfd7a0 RDI: 00007f4801111000 RBP: ffffffffc0bfd7a0 R08: 725f746f6e5f6f64 R09: 3d7265766f636572 R10: ffffb7a18c3ffd08 R11: 0000000000000000 R12: 00007f4881110fff R13: 000000007fffffff R14: ffffb7a18c3ffca0 R15: ffffffffc0bfd7af FS: 00007f480118a740(0000) GS:ffff98e38af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4801111000 CR3: 0000000864b8e001 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? __die_body+0x1a/0x60 ? page_fault_oops+0x183/0x510 ? exc_page_fault+0x69/0x150 ? asm_exc_page_fault+0x22/0x30 ? memcpy_orig+0xcd/0x130 vsnprintf+0x102/0x4c0 sprintf+0x51/0x80 qedi_dbg_do_not_recover_cmd_read+0x2f/0x50 [qedi 6bcfdeeecdea037da47069eca2ba717c84a77324] full_proxy_read+0x50/0x80 vfs_read+0xa5/0x2e0 ? folio_add_new_anon_rmap+0x44/0xa0 ? set_pte_at+0x15/0x30 ? do_pte_missing+0x426/0x7f0 ksys_read+0xa5/0xe0 do_syscall_64+0x58/0x80 ? __count_memcg_events+0x46/0x90 ? count_memcg_event_mm+0x3d/0x60 ? handle_mm_fault+0x196/0x2f0 ? do_user_addr_fault+0x267/0x890 ? exc_page_fault+0x69/0x150 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f4800f20b4d Tested-by: Martin Hoyer Reviewed-by: John Meneghini Signed-off-by: Manish Rangankar Link: https://lore.kernel.org/r/20240415072155.30840-1-mrangankar@marvell.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/qedi/qedi_debugfs.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/qedi/qedi_debugfs.c b/drivers/scsi/qedi/qedi_debugfs.c index 42f5afb60055..6e724f47ab9e 100644 --- a/drivers/scsi/qedi/qedi_debugfs.c +++ b/drivers/scsi/qedi/qedi_debugfs.c @@ -120,15 +120,11 @@ static ssize_t qedi_dbg_do_not_recover_cmd_read(struct file *filp, char __user *buffer, size_t count, loff_t *ppos) { - size_t cnt = 0; + char buf[64]; + int len; - if (*ppos) - return 0; - - cnt = sprintf(buffer, "do_not_recover=%d\n", qedi_do_not_recover); - cnt = min_t(int, count, cnt - *ppos); - *ppos += cnt; - return cnt; + len = sprintf(buf, "do_not_recover=%d\n", qedi_do_not_recover); + return simple_read_from_buffer(buffer, count, ppos, buf, len); } static int From 1b577bb1cbe765afcc3840b3dcc8cce7dc8e27e9 Mon Sep 17 00:00:00 2001 From: Kunwu Chan Date: Tue, 23 Apr 2024 16:21:02 +0800 Subject: [PATCH 1183/1565] kselftest: arm64: Add a null pointer check [ Upstream commit 80164282b3620a3cb73de6ffda5592743e448d0e ] There is a 'malloc' call, which can be unsuccessful. This patch will add the malloc failure checking to avoid possible null dereference and give more information about test fail reasons. Signed-off-by: Kunwu Chan Reviewed-by: Muhammad Usama Anjum Link: https://lore.kernel.org/r/20240423082102.2018886-1-chentao@kylinos.cn Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- tools/testing/selftests/arm64/tags/tags_test.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/arm64/tags/tags_test.c b/tools/testing/selftests/arm64/tags/tags_test.c index 5701163460ef..955f87c1170d 100644 --- a/tools/testing/selftests/arm64/tags/tags_test.c +++ b/tools/testing/selftests/arm64/tags/tags_test.c @@ -6,6 +6,7 @@ #include #include #include +#include "../../kselftest.h" #define SHIFT_TAG(tag) ((uint64_t)(tag) << 56) #define SET_TAG(ptr, tag) (((uint64_t)(ptr) & ~SHIFT_TAG(0xff)) | \ @@ -21,6 +22,9 @@ int main(void) if (prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0) == 0) tbi_enabled = 1; ptr = (struct utsname *)malloc(sizeof(*ptr)); + if (!ptr) + ksft_exit_fail_msg("Failed to allocate utsname buffer\n"); + if (tbi_enabled) tag = 0x42; ptr = (struct utsname *)SET_TAG(ptr, tag); From 43c0ca793a18578a0f5b305dd77fcf7ed99f1265 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Mon, 29 Apr 2024 03:04:33 -0700 Subject: [PATCH 1184/1565] netpoll: Fix race condition in netpoll_owner_active [ Upstream commit c2e6a872bde9912f1a7579639c5ca3adf1003916 ] KCSAN detected a race condition in netpoll: BUG: KCSAN: data-race in net_rx_action / netpoll_send_skb write (marked) to 0xffff8881164168b0 of 4 bytes by interrupt on cpu 10: net_rx_action (./include/linux/netpoll.h:90 net/core/dev.c:6712 net/core/dev.c:6822) read to 0xffff8881164168b0 of 4 bytes by task 1 on cpu 2: netpoll_send_skb (net/core/netpoll.c:319 net/core/netpoll.c:345 net/core/netpoll.c:393) netpoll_send_udp (net/core/netpoll.c:?) value changed: 0x0000000a -> 0xffffffff This happens because netpoll_owner_active() needs to check if the current CPU is the owner of the lock, touching napi->poll_owner non atomically. The ->poll_owner field contains the current CPU holding the lock. Use an atomic read to check if the poll owner is the current CPU. Signed-off-by: Breno Leitao Link: https://lore.kernel.org/r/20240429100437.3487432-1-leitao@debian.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/core/netpoll.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 2ad22511b9c6..f76afab9fd8b 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -316,7 +316,7 @@ static int netpoll_owner_active(struct net_device *dev) struct napi_struct *napi; list_for_each_entry_rcu(napi, &dev->napi_list, dev_list) { - if (napi->poll_owner == smp_processor_id()) + if (READ_ONCE(napi->poll_owner) == smp_processor_id()) return 1; } return 0; From 6f6cb07482430c1e7ebdb2ebbc0ededbef5f6872 Mon Sep 17 00:00:00 2001 From: Sean O'Brien Date: Mon, 29 Apr 2024 18:08:05 +0000 Subject: [PATCH 1185/1565] HID: Add quirk for Logitech Casa touchpad [ Upstream commit dd2c345a94cfa3873cc20db87387ee509c345c1b ] This device sometimes doesn't send touch release signals when moving from >=4 fingers to <4 fingers. Using MT_QUIRK_NOT_SEEN_MEANS_UP instead of MT_QUIRK_ALWAYS_VALID makes sure that no touches become stuck. MT_QUIRK_FORCE_MULTI_INPUT is not necessary for this device, but does no harm. Signed-off-by: Sean O'Brien Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-multitouch.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 0732fe6c7a85..9f3f7588fe46 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -765,6 +765,7 @@ #define USB_DEVICE_ID_LOGITECH_AUDIOHUB 0x0a0e #define USB_DEVICE_ID_LOGITECH_T651 0xb00c #define USB_DEVICE_ID_LOGITECH_DINOVO_EDGE_KBD 0xb309 +#define USB_DEVICE_ID_LOGITECH_CASA_TOUCHPAD 0xbb00 #define USB_DEVICE_ID_LOGITECH_C007 0xc007 #define USB_DEVICE_ID_LOGITECH_C077 0xc077 #define USB_DEVICE_ID_LOGITECH_RECEIVER 0xc101 diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c index 8dcd636daf27..e7b047421f3d 100644 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -1998,6 +1998,12 @@ static const struct hid_device_id mt_devices[] = { USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X12_TAB) }, + /* Logitech devices */ + { .driver_data = MT_CLS_NSMU, + HID_DEVICE(BUS_BLUETOOTH, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_LOGITECH, + USB_DEVICE_ID_LOGITECH_CASA_TOUCHPAD) }, + /* MosArt panels */ { .driver_data = MT_CLS_CONFIDENCE_MINUS_ONE, MT_USB_DEVICE(USB_VENDOR_ID_ASUS, From 6fdc98bcc66eafddff91d23d626f6a210242eda5 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 6 May 2024 16:08:50 +0200 Subject: [PATCH 1186/1565] ACPI: video: Add backlight=native quirk for Lenovo Slim 7 16ARH7 [ Upstream commit c901f63dc142c48326931f164f787dfff69273d9 ] Lenovo Slim 7 16ARH7 is a machine with switchable graphics between AMD and Nvidia, and the backlight can't be adjusted properly unless acpi_backlight=native is passed. Although nvidia-wmi-backlight is present and loaded, this doesn't work as expected at all. For making it working as default, add the corresponding quirk entry with a DMI matching "LENOVO" "82UX". Link: https://bugzilla.suse.com/show_bug.cgi?id=1217750 Signed-off-by: Takashi Iwai Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/video_detect.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/acpi/video_detect.c b/drivers/acpi/video_detect.c index a5cb9e1d48bc..338e1f44906a 100644 --- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -341,6 +341,14 @@ static const struct dmi_system_id video_detect_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "82BK"), }, }, + { + .callback = video_detect_force_native, + /* Lenovo Slim 7 16ARH7 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82UX"), + }, + }, { .callback = video_detect_force_native, /* Lenovo ThinkPad X131e (3371 AMD version) */ From 52d4cfa56b5fc79345394a0f5ca4ba5f9b2e5e8c Mon Sep 17 00:00:00 2001 From: Uri Arev Date: Sat, 6 Apr 2024 00:42:24 +0300 Subject: [PATCH 1187/1565] Bluetooth: ath3k: Fix multiple issues reported by checkpatch.pl [ Upstream commit 68aa21054ec3a1a313af90a5f95ade16c3326d20 ] This fixes some CHECKs reported by the checkpatch script. Issues reported in ath3k.c: ------- ath3k.c ------- CHECK: Please don't use multiple blank lines + + CHECK: Blank lines aren't necessary after an open brace '{' +static const struct usb_device_id ath3k_blist_tbl[] = { + CHECK: Alignment should match open parenthesis +static int ath3k_load_firmware(struct usb_device *udev, + const struct firmware *firmware) CHECK: Alignment should match open parenthesis + err = usb_bulk_msg(udev, pipe, send_buf, size, + &len, 3000); CHECK: Unnecessary parentheses around 'len != size' + if (err || (len != size)) { CHECK: Alignment should match open parenthesis +static int ath3k_get_version(struct usb_device *udev, + struct ath3k_version *version) CHECK: Alignment should match open parenthesis +static int ath3k_load_fwfile(struct usb_device *udev, + const struct firmware *firmware) CHECK: Alignment should match open parenthesis + err = usb_bulk_msg(udev, pipe, send_buf, size, + &len, 3000); CHECK: Unnecessary parentheses around 'len != size' + if (err || (len != size)) { CHECK: Blank lines aren't necessary after an open brace '{' + switch (fw_version.ref_clock) { + CHECK: Alignment should match open parenthesis + snprintf(filename, ATH3K_NAME_LEN, "ar3k/ramps_0x%08x_%d%s", + le32_to_cpu(fw_version.rom_version), clk_value, ".dfu"); CHECK: Alignment should match open parenthesis +static int ath3k_probe(struct usb_interface *intf, + const struct usb_device_id *id) CHECK: Alignment should match open parenthesis + BT_ERR("Firmware file \"%s\" not found", + ATH3K_FIRMWARE); CHECK: Alignment should match open parenthesis + BT_ERR("Firmware file \"%s\" request failed (err=%d)", + ATH3K_FIRMWARE, ret); total: 0 errors, 0 warnings, 14 checks, 540 lines checked Signed-off-by: Uri Arev Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin --- drivers/bluetooth/ath3k.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c index 759d7828931d..56ada48104f0 100644 --- a/drivers/bluetooth/ath3k.c +++ b/drivers/bluetooth/ath3k.c @@ -3,7 +3,6 @@ * Copyright (c) 2008-2009 Atheros Communications Inc. */ - #include #include #include @@ -129,7 +128,6 @@ MODULE_DEVICE_TABLE(usb, ath3k_table); * for AR3012 */ static const struct usb_device_id ath3k_blist_tbl[] = { - /* Atheros AR3012 with sflash firmware*/ { USB_DEVICE(0x0489, 0xe04e), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe04d), .driver_info = BTUSB_ATH3012 }, @@ -203,7 +201,7 @@ static inline void ath3k_log_failed_loading(int err, int len, int size, #define TIMEGAP_USEC_MAX 100 static int ath3k_load_firmware(struct usb_device *udev, - const struct firmware *firmware) + const struct firmware *firmware) { u8 *send_buf; int len = 0; @@ -238,9 +236,9 @@ static int ath3k_load_firmware(struct usb_device *udev, memcpy(send_buf, firmware->data + sent, size); err = usb_bulk_msg(udev, pipe, send_buf, size, - &len, 3000); + &len, 3000); - if (err || (len != size)) { + if (err || len != size) { ath3k_log_failed_loading(err, len, size, count); goto error; } @@ -263,7 +261,7 @@ static int ath3k_get_state(struct usb_device *udev, unsigned char *state) } static int ath3k_get_version(struct usb_device *udev, - struct ath3k_version *version) + struct ath3k_version *version) { return usb_control_msg_recv(udev, 0, ATH3K_GETVERSION, USB_TYPE_VENDOR | USB_DIR_IN, 0, 0, @@ -272,7 +270,7 @@ static int ath3k_get_version(struct usb_device *udev, } static int ath3k_load_fwfile(struct usb_device *udev, - const struct firmware *firmware) + const struct firmware *firmware) { u8 *send_buf; int len = 0; @@ -311,8 +309,8 @@ static int ath3k_load_fwfile(struct usb_device *udev, memcpy(send_buf, firmware->data + sent, size); err = usb_bulk_msg(udev, pipe, send_buf, size, - &len, 3000); - if (err || (len != size)) { + &len, 3000); + if (err || len != size) { ath3k_log_failed_loading(err, len, size, count); kfree(send_buf); return err; @@ -426,7 +424,6 @@ static int ath3k_load_syscfg(struct usb_device *udev) } switch (fw_version.ref_clock) { - case ATH3K_XTAL_FREQ_26M: clk_value = 26; break; @@ -442,7 +439,7 @@ static int ath3k_load_syscfg(struct usb_device *udev) } snprintf(filename, ATH3K_NAME_LEN, "ar3k/ramps_0x%08x_%d%s", - le32_to_cpu(fw_version.rom_version), clk_value, ".dfu"); + le32_to_cpu(fw_version.rom_version), clk_value, ".dfu"); ret = request_firmware(&firmware, filename, &udev->dev); if (ret < 0) { @@ -457,7 +454,7 @@ static int ath3k_load_syscfg(struct usb_device *udev) } static int ath3k_probe(struct usb_interface *intf, - const struct usb_device_id *id) + const struct usb_device_id *id) { const struct firmware *firmware; struct usb_device *udev = interface_to_usbdev(intf); @@ -506,10 +503,10 @@ static int ath3k_probe(struct usb_interface *intf, if (ret < 0) { if (ret == -ENOENT) BT_ERR("Firmware file \"%s\" not found", - ATH3K_FIRMWARE); + ATH3K_FIRMWARE); else BT_ERR("Firmware file \"%s\" request failed (err=%d)", - ATH3K_FIRMWARE, ret); + ATH3K_FIRMWARE, ret); return ret; } From b4291f58a9cfd232d378034e8207889b35d5ad99 Mon Sep 17 00:00:00 2001 From: Nicholas Kazlauskas Date: Mon, 12 Feb 2024 16:51:59 -0500 Subject: [PATCH 1188/1565] drm/amd/display: Exit idle optimizations before HDCP execution [ Upstream commit f30a3bea92bdab398531129d187629fb1d28f598 ] [WHY] PSP can access DCN registers during command submission and we need to ensure that DCN is not in PG before doing so. [HOW] Add a callback to DM to lock and notify DC for idle optimization exit. It can't be DC directly because of a potential race condition with the link protection thread and the rest of DM operation. Cc: Mario Limonciello Cc: Alex Deucher Reviewed-by: Charlene Liu Acked-by: Alex Hung Signed-off-by: Nicholas Kazlauskas Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c | 10 ++++++++++ drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h | 8 ++++++++ 2 files changed, 18 insertions(+) diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c index fa8aeec304ef..c39cb4b6767c 100644 --- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c +++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c @@ -86,6 +86,14 @@ static uint8_t is_cp_desired_hdcp2(struct mod_hdcp *hdcp) !hdcp->connection.is_hdcp2_revoked; } +static void exit_idle_optimizations(struct mod_hdcp *hdcp) +{ + struct mod_hdcp_dm *dm = &hdcp->config.dm; + + if (dm->funcs.exit_idle_optimizations) + dm->funcs.exit_idle_optimizations(dm->handle); +} + static enum mod_hdcp_status execution(struct mod_hdcp *hdcp, struct mod_hdcp_event_context *event_ctx, union mod_hdcp_transition_input *input) @@ -448,6 +456,8 @@ enum mod_hdcp_status mod_hdcp_process_event(struct mod_hdcp *hdcp, memset(&event_ctx, 0, sizeof(struct mod_hdcp_event_context)); event_ctx.event = event; + exit_idle_optimizations(hdcp); + /* execute and transition */ exec_status = execution(hdcp, &event_ctx, &hdcp->auth.trans_input); trans_status = transition( diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h b/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h index eed560eecbab..fb195276fb70 100644 --- a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h +++ b/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h @@ -143,6 +143,13 @@ struct mod_hdcp_ddc { } funcs; }; +struct mod_hdcp_dm { + void *handle; + struct { + void (*exit_idle_optimizations)(void *handle); + } funcs; +}; + struct mod_hdcp_psp { void *handle; void *funcs; @@ -252,6 +259,7 @@ struct mod_hdcp_display_query { struct mod_hdcp_config { struct mod_hdcp_psp psp; struct mod_hdcp_ddc ddc; + struct mod_hdcp_dm dm; uint8_t index; }; From 2db63bf7d87c88bc6863c640c96811c44551b481 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Thu, 11 Apr 2024 17:03:38 -0500 Subject: [PATCH 1189/1565] ASoC: Intel: sof_sdw: add JD2 quirk for HP Omen 14 [ Upstream commit 4fee07fbf47d2a5f1065d985459e5ce7bf7969f0 ] The default JD1 does not seem to work, use JD2 instead. Signed-off-by: Pierre-Louis Bossart Link: https://lore.kernel.org/r/20240411220347.131267-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/intel/boards/sof_sdw.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index f36a0fda1b6a..25bf73a7e7bf 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -216,6 +216,15 @@ static const struct dmi_system_id sof_sdw_quirk_table[] = { }, .driver_data = (void *)(SOF_SDW_PCH_DMIC), }, + { + .callback = sof_sdw_quirk_cb, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "HP"), + DMI_MATCH(DMI_PRODUCT_NAME, "OMEN Transcend Gaming Laptop"), + }, + .driver_data = (void *)(RT711_JD2), + }, + /* LunarLake devices */ { .callback = sof_sdw_quirk_cb, From e12c363cf5fdd0decef467ee2c0069dcad267679 Mon Sep 17 00:00:00 2001 From: Erico Nunes Date: Fri, 5 Apr 2024 17:29:49 +0200 Subject: [PATCH 1190/1565] drm/lima: add mask irq callback to gp and pp [ Upstream commit 49c13b4d2dd4a831225746e758893673f6ae961c ] This is needed because we want to reset those devices in device-agnostic code such as lima_sched. In particular, masking irqs will be useful before a hard reset to prevent race conditions. Signed-off-by: Erico Nunes Signed-off-by: Qiang Yu Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-2-nunes.erico@gmail.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++ drivers/gpu/drm/lima/lima_bcast.h | 3 +++ drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++ drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++ drivers/gpu/drm/lima/lima_sched.h | 1 + 5 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/lima/lima_bcast.c b/drivers/gpu/drm/lima/lima_bcast.c index fbc43f243c54..6d000504e1a4 100644 --- a/drivers/gpu/drm/lima/lima_bcast.c +++ b/drivers/gpu/drm/lima/lima_bcast.c @@ -43,6 +43,18 @@ void lima_bcast_suspend(struct lima_ip *ip) } +int lima_bcast_mask_irq(struct lima_ip *ip) +{ + bcast_write(LIMA_BCAST_BROADCAST_MASK, 0); + bcast_write(LIMA_BCAST_INTERRUPT_MASK, 0); + return 0; +} + +int lima_bcast_reset(struct lima_ip *ip) +{ + return lima_bcast_hw_init(ip); +} + int lima_bcast_init(struct lima_ip *ip) { int i; diff --git a/drivers/gpu/drm/lima/lima_bcast.h b/drivers/gpu/drm/lima/lima_bcast.h index 465ee587bceb..cd08841e4787 100644 --- a/drivers/gpu/drm/lima/lima_bcast.h +++ b/drivers/gpu/drm/lima/lima_bcast.h @@ -13,4 +13,7 @@ void lima_bcast_fini(struct lima_ip *ip); void lima_bcast_enable(struct lima_device *dev, int num_pp); +int lima_bcast_mask_irq(struct lima_ip *ip); +int lima_bcast_reset(struct lima_ip *ip); + #endif diff --git a/drivers/gpu/drm/lima/lima_gp.c b/drivers/gpu/drm/lima/lima_gp.c index 8dd501b7a3d0..6cf46b653e81 100644 --- a/drivers/gpu/drm/lima/lima_gp.c +++ b/drivers/gpu/drm/lima/lima_gp.c @@ -212,6 +212,13 @@ static void lima_gp_task_mmu_error(struct lima_sched_pipe *pipe) lima_sched_pipe_task_done(pipe); } +static void lima_gp_task_mask_irq(struct lima_sched_pipe *pipe) +{ + struct lima_ip *ip = pipe->processor[0]; + + gp_write(LIMA_GP_INT_MASK, 0); +} + static int lima_gp_task_recover(struct lima_sched_pipe *pipe) { struct lima_ip *ip = pipe->processor[0]; @@ -344,6 +351,7 @@ int lima_gp_pipe_init(struct lima_device *dev) pipe->task_error = lima_gp_task_error; pipe->task_mmu_error = lima_gp_task_mmu_error; pipe->task_recover = lima_gp_task_recover; + pipe->task_mask_irq = lima_gp_task_mask_irq; return 0; } diff --git a/drivers/gpu/drm/lima/lima_pp.c b/drivers/gpu/drm/lima/lima_pp.c index a5c95bed08c0..54b208a4a768 100644 --- a/drivers/gpu/drm/lima/lima_pp.c +++ b/drivers/gpu/drm/lima/lima_pp.c @@ -408,6 +408,9 @@ static void lima_pp_task_error(struct lima_sched_pipe *pipe) lima_pp_hard_reset(ip); } + + if (pipe->bcast_processor) + lima_bcast_reset(pipe->bcast_processor); } static void lima_pp_task_mmu_error(struct lima_sched_pipe *pipe) @@ -416,6 +419,20 @@ static void lima_pp_task_mmu_error(struct lima_sched_pipe *pipe) lima_sched_pipe_task_done(pipe); } +static void lima_pp_task_mask_irq(struct lima_sched_pipe *pipe) +{ + int i; + + for (i = 0; i < pipe->num_processor; i++) { + struct lima_ip *ip = pipe->processor[i]; + + pp_write(LIMA_PP_INT_MASK, 0); + } + + if (pipe->bcast_processor) + lima_bcast_mask_irq(pipe->bcast_processor); +} + static struct kmem_cache *lima_pp_task_slab; static int lima_pp_task_slab_refcnt; @@ -447,6 +464,7 @@ int lima_pp_pipe_init(struct lima_device *dev) pipe->task_fini = lima_pp_task_fini; pipe->task_error = lima_pp_task_error; pipe->task_mmu_error = lima_pp_task_mmu_error; + pipe->task_mask_irq = lima_pp_task_mask_irq; return 0; } diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h index 90f03c48ef4a..f8bbfa69baea 100644 --- a/drivers/gpu/drm/lima/lima_sched.h +++ b/drivers/gpu/drm/lima/lima_sched.h @@ -83,6 +83,7 @@ struct lima_sched_pipe { void (*task_error)(struct lima_sched_pipe *pipe); void (*task_mmu_error)(struct lima_sched_pipe *pipe); int (*task_recover)(struct lima_sched_pipe *pipe); + void (*task_mask_irq)(struct lima_sched_pipe *pipe); struct work_struct recover_work; }; From 03e7b2f7ae4c0ae5fb8e4e2454ba4008877f196a Mon Sep 17 00:00:00 2001 From: Erico Nunes Date: Fri, 5 Apr 2024 17:29:51 +0200 Subject: [PATCH 1191/1565] drm/lima: mask irqs in timeout path before hard reset [ Upstream commit a421cc7a6a001b70415aa4f66024fa6178885a14 ] There is a race condition in which a rendering job might take just long enough to trigger the drm sched job timeout handler but also still complete before the hard reset is done by the timeout handler. This runs into race conditions not expected by the timeout handler. In some very specific cases it currently may result in a refcount imbalance on lima_pm_idle, with a stack dump such as: [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0 ... [10136.669628] Call trace: [10136.669634] lima_devfreq_record_idle+0xa0/0xb0 [10136.669646] lima_sched_pipe_task_done+0x5c/0xb0 [10136.669656] lima_gp_irq_handler+0xa8/0x120 [10136.669666] __handle_irq_event_percpu+0x48/0x160 [10136.669679] handle_irq_event+0x4c/0xc0 We can prevent that race condition entirely by masking the irqs at the beginning of the timeout handler, at which point we give up on waiting for that job entirely. The irqs will be enabled again at the next hard reset which is already done as a recovery by the timeout handler. Signed-off-by: Erico Nunes Reviewed-by: Qiang Yu Signed-off-by: Qiang Yu Link: https://patchwork.freedesktop.org/patch/msgid/20240405152951.1531555-4-nunes.erico@gmail.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/lima/lima_sched.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index f6e7a88a56f1..290f875c2859 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -419,6 +419,13 @@ static void lima_sched_timedout_job(struct drm_sched_job *job) struct lima_sched_task *task = to_lima_task(job); struct lima_device *ldev = pipe->ldev; + /* + * The task might still finish while this timeout handler runs. + * To prevent a race condition on its completion, mask all irqs + * on the running core until the next hard reset completes. + */ + pipe->task_mask_irq(pipe); + if (!pipe->error) DRM_ERROR("lima job timeout\n"); From a8c988d752b3d98d5cc1e3929c519a55ef55426c Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Mon, 8 Apr 2024 09:08:31 -0500 Subject: [PATCH 1192/1565] powerpc/pseries: Enforce hcall result buffer validity and size [ Upstream commit ff2e185cf73df480ec69675936c4ee75a445c3e4 ] plpar_hcall(), plpar_hcall9(), and related functions expect callers to provide valid result buffers of certain minimum size. Currently this is communicated only through comments in the code and the compiler has no idea. For example, if I write a bug like this: long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...); This compiles with no diagnostics emitted, but likely results in stack corruption at runtime when plpar_hcall9() stores results past the end of the array. (To be clear this is a contrived example and I have not found a real instance yet.) To make this class of error less likely, we can use explicitly-sized array parameters instead of pointers in the declarations for the hcall APIs. When compiled with -Warray-bounds[1], the code above now provokes a diagnostic like this: error: array argument is too small; is of size 32, callee requires at least 72 [-Werror,-Warray-bounds] 60 | plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, | ^ ~~~~~~ [1] Enabled for LLVM builds but not GCC for now. See commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and related changes. Signed-off-by: Nathan Lynch Signed-off-by: Michael Ellerman Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/include/asm/hvcall.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 1a60188f74ad..f3b46a0fa708 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -453,7 +453,7 @@ long plpar_hcall_norets_notrace(unsigned long opcode, ...); * Used for all but the craziest of phyp interfaces (see plpar_hcall9) */ #define PLPAR_HCALL_BUFSIZE 4 -long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...); /** * plpar_hcall_raw: - Make a hypervisor call without calculating hcall stats @@ -467,7 +467,7 @@ long plpar_hcall(unsigned long opcode, unsigned long *retbuf, ...); * plpar_hcall, but plpar_hcall_raw works in real mode and does not * calculate hypervisor call statistics. */ -long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL_BUFSIZE], ...); /** * plpar_hcall9: - Make a pseries hypervisor call with up to 9 return arguments @@ -478,8 +478,8 @@ long plpar_hcall_raw(unsigned long opcode, unsigned long *retbuf, ...); * PLPAR_HCALL9_BUFSIZE to size the return argument buffer. */ #define PLPAR_HCALL9_BUFSIZE 9 -long plpar_hcall9(unsigned long opcode, unsigned long *retbuf, ...); -long plpar_hcall9_raw(unsigned long opcode, unsigned long *retbuf, ...); +long plpar_hcall9(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...); +long plpar_hcall9_raw(unsigned long opcode, unsigned long retbuf[static PLPAR_HCALL9_BUFSIZE], ...); struct hvcall_mpp_data { unsigned long entitled_mem; From 1939648b3aca1c81a955eec1fcafc533fddedee8 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 3 May 2024 17:56:18 +1000 Subject: [PATCH 1193/1565] powerpc/io: Avoid clang null pointer arithmetic warnings [ Upstream commit 03c0f2c2b2220fc9cf8785cd7b61d3e71e24a366 ] With -Wextra clang warns about pointer arithmetic using a null pointer. When building with CONFIG_PCI=n, that triggers a warning in the IO accessors, eg: In file included from linux/arch/powerpc/include/asm/io.h:672: linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] 23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb' 591 | #define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port); | ~~~~~~~~~~~~~~~~~~~~~ ^ That is because when CONFIG_PCI=n, _IO_BASE is defined as 0. Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts it to void * before the addition with port happens. Instead the addition can be done first, and then the cast. The resulting value will be the same, but avoids the warning, and also avoids void pointer arithmetic which is apparently non-standard. Reported-by: Naresh Kamboju Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaaNsA@mail.gmail.com Signed-off-by: Michael Ellerman Link: https://msgid.link/20240503075619.394467-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin --- arch/powerpc/include/asm/io.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 0182b291248a..058d21f493fa 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -541,12 +541,12 @@ __do_out_asm(_rec_outl, "stwbrx") #define __do_inw(port) _rec_inw(port) #define __do_inl(port) _rec_inl(port) #else /* CONFIG_PPC32 */ -#define __do_outb(val, port) writeb(val,(PCI_IO_ADDR)_IO_BASE+port); -#define __do_outw(val, port) writew(val,(PCI_IO_ADDR)_IO_BASE+port); -#define __do_outl(val, port) writel(val,(PCI_IO_ADDR)_IO_BASE+port); -#define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port); -#define __do_inw(port) readw((PCI_IO_ADDR)_IO_BASE + port); -#define __do_inl(port) readl((PCI_IO_ADDR)_IO_BASE + port); +#define __do_outb(val, port) writeb(val,(PCI_IO_ADDR)(_IO_BASE+port)); +#define __do_outw(val, port) writew(val,(PCI_IO_ADDR)(_IO_BASE+port)); +#define __do_outl(val, port) writel(val,(PCI_IO_ADDR)(_IO_BASE+port)); +#define __do_inb(port) readb((PCI_IO_ADDR)(_IO_BASE + port)); +#define __do_inw(port) readw((PCI_IO_ADDR)(_IO_BASE + port)); +#define __do_inl(port) readl((PCI_IO_ADDR)(_IO_BASE + port)); #endif /* !CONFIG_PPC32 */ #ifdef CONFIG_EEH @@ -562,12 +562,12 @@ __do_out_asm(_rec_outl, "stwbrx") #define __do_writesw(a, b, n) _outsw(PCI_FIX_ADDR(a),(b),(n)) #define __do_writesl(a, b, n) _outsl(PCI_FIX_ADDR(a),(b),(n)) -#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) -#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) -#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) -#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) -#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) -#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) +#define __do_insb(p, b, n) readsb((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n)) +#define __do_insw(p, b, n) readsw((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n)) +#define __do_insl(p, b, n) readsl((PCI_IO_ADDR)(_IO_BASE+(p)), (b), (n)) +#define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n)) +#define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n)) +#define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)(_IO_BASE+(p)),(b),(n)) #define __do_memset_io(addr, c, n) \ _memset_io(PCI_FIX_ADDR(addr), c, n) From 449d55871cae6e8479408849fd8eb814e6ec8e82 Mon Sep 17 00:00:00 2001 From: Tzung-Bi Shih Date: Mon, 1 Apr 2024 11:00:49 +0800 Subject: [PATCH 1194/1565] power: supply: cros_usbpd: provide ID table for avoiding fallback match [ Upstream commit 0f8678c34cbfdc63569a9b0ede1fe235ec6ec693 ] Instead of using fallback driver name match, provide ID table[1] for the primary match. [1]: https://elixir.bootlin.com/linux/v6.8/source/drivers/base/platform.c#L1353 Reviewed-by: Benson Leung Reviewed-by: Prashant Malani Reviewed-by: Krzysztof Kozlowski Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20240401030052.2887845-4-tzungbi@kernel.org Signed-off-by: Sebastian Reichel Signed-off-by: Sasha Levin --- drivers/power/supply/cros_usbpd-charger.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/power/supply/cros_usbpd-charger.c b/drivers/power/supply/cros_usbpd-charger.c index 0a4f02e4ae7b..d7ee1eb9ca88 100644 --- a/drivers/power/supply/cros_usbpd-charger.c +++ b/drivers/power/supply/cros_usbpd-charger.c @@ -5,6 +5,7 @@ * Copyright (c) 2014 - 2018 Google, Inc */ +#include #include #include #include @@ -711,16 +712,22 @@ static int cros_usbpd_charger_resume(struct device *dev) static SIMPLE_DEV_PM_OPS(cros_usbpd_charger_pm_ops, NULL, cros_usbpd_charger_resume); +static const struct platform_device_id cros_usbpd_charger_id[] = { + { DRV_NAME, 0 }, + {} +}; +MODULE_DEVICE_TABLE(platform, cros_usbpd_charger_id); + static struct platform_driver cros_usbpd_charger_driver = { .driver = { .name = DRV_NAME, .pm = &cros_usbpd_charger_pm_ops, }, - .probe = cros_usbpd_charger_probe + .probe = cros_usbpd_charger_probe, + .id_table = cros_usbpd_charger_id, }; module_platform_driver(cros_usbpd_charger_driver); MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("ChromeOS EC USBPD charger"); -MODULE_ALIAS("platform:" DRV_NAME); From d8481016c295be548c43f77f0d4fbbd64c75191e Mon Sep 17 00:00:00 2001 From: Aleksandr Aprelkov Date: Wed, 3 Apr 2024 12:37:59 +0700 Subject: [PATCH 1195/1565] iommu/arm-smmu-v3: Free MSIs in case of ENOMEM [ Upstream commit 80fea979dd9d48d67c5b48d2f690c5da3e543ebd ] If devm_add_action() returns -ENOMEM, then MSIs are allocated but not not freed on teardown. Use devm_add_action_or_reset() instead to keep the static analyser happy. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Aleksandr Aprelkov Link: https://lore.kernel.org/r/20240403053759.643164-1-aaprelkov@usergate.com [will: Tweak commit message, remove warning message] Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 982c42c87310..9ac7b37290eb 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2925,7 +2925,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) } /* Add callback to free MSIs on teardown */ - devm_add_action(dev, arm_smmu_free_msis, dev); + devm_add_action_or_reset(dev, arm_smmu_free_msis, dev); } static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) From 38a82c8d00638bb642bef787eb1d5e0e4d3b7d71 Mon Sep 17 00:00:00 2001 From: Yunlei He Date: Tue, 26 Mar 2024 14:10:43 +0800 Subject: [PATCH 1196/1565] f2fs: remove clear SB_INLINECRYPT flag in default_options [ Upstream commit ac5eecf481c29942eb9a862e758c0c8b68090c33 ] In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set. If create new file or open file during this gap, these files will not use inlinecrypt. Worse case, it may lead to data corruption if wrappedkey_v0 is enable. Thread A: Thread B: -f2fs_remount -f2fs_file_open or f2fs_new_inode -default_options <- clear SB_INLINECRYPT flag -fscrypt_select_encryption_impl -parse_options <- set SB_INLINECRYPT again Signed-off-by: Yunlei He Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin --- fs/f2fs/super.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 1281b59da6a2..9fed42e7bb1d 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1745,8 +1745,6 @@ static void default_options(struct f2fs_sb_info *sbi) F2FS_OPTION(sbi).compress_mode = COMPR_MODE_FS; F2FS_OPTION(sbi).bggc_mode = BGGC_MODE_ON; - sbi->sb->s_flags &= ~SB_INLINECRYPT; - set_opt(sbi, INLINE_XATTR); set_opt(sbi, INLINE_DATA); set_opt(sbi, INLINE_DENTRY); From 04736c1bc32197f7d859e01a96ae80a16659931d Mon Sep 17 00:00:00 2001 From: Alex Henrie Date: Tue, 26 Mar 2024 09:07:11 -0600 Subject: [PATCH 1197/1565] usb: misc: uss720: check for incompatible versions of the Belkin F5U002 [ Upstream commit 3295f1b866bfbcabd625511968e8a5c541f9ab32 ] The incompatible device in my possession has a sticker that says "F5U002 Rev 2" and "P80453-B", and lsusb identifies it as "050d:0002 Belkin Components IEEE-1284 Controller". There is a bug report from 2007 from Michael Trausch who was seeing the exact same errors that I saw in 2024 trying to use this cable. Link: https://lore.kernel.org/all/46DE5830.9060401@trausch.us/ Signed-off-by: Alex Henrie Link: https://lore.kernel.org/r/20240326150723.99939-5-alexhenrie24@gmail.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/usb/misc/uss720.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/usb/misc/uss720.c b/drivers/usb/misc/uss720.c index 0be8efcda15d..d972c0962939 100644 --- a/drivers/usb/misc/uss720.c +++ b/drivers/usb/misc/uss720.c @@ -677,7 +677,7 @@ static int uss720_probe(struct usb_interface *intf, struct parport_uss720_private *priv; struct parport *pp; unsigned char reg; - int i; + int ret; dev_dbg(&intf->dev, "probe: vendor id 0x%x, device id 0x%x\n", le16_to_cpu(usbdev->descriptor.idVendor), @@ -688,8 +688,8 @@ static int uss720_probe(struct usb_interface *intf, usb_put_dev(usbdev); return -ENODEV; } - i = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2); - dev_dbg(&intf->dev, "set interface result %d\n", i); + ret = usb_set_interface(usbdev, intf->altsetting->desc.bInterfaceNumber, 2); + dev_dbg(&intf->dev, "set interface result %d\n", ret); interface = intf->cur_altsetting; @@ -725,12 +725,18 @@ static int uss720_probe(struct usb_interface *intf, set_1284_register(pp, 7, 0x00, GFP_KERNEL); set_1284_register(pp, 6, 0x30, GFP_KERNEL); /* PS/2 mode */ set_1284_register(pp, 2, 0x0c, GFP_KERNEL); - /* debugging */ - get_1284_register(pp, 0, ®, GFP_KERNEL); - dev_dbg(&intf->dev, "reg: %7ph\n", priv->reg); - i = usb_find_last_int_in_endpoint(interface, &epd); - if (!i) { + /* The Belkin F5U002 Rev 2 P80453-B USB parallel port adapter shares the + * device ID 050d:0002 with some other device that works with this + * driver, but it itself does not. Detect and handle the bad cable + * here. */ + ret = get_1284_register(pp, 0, ®, GFP_KERNEL); + dev_dbg(&intf->dev, "reg: %7ph\n", priv->reg); + if (ret < 0) + return ret; + + ret = usb_find_last_int_in_endpoint(interface, &epd); + if (!ret) { dev_dbg(&intf->dev, "epaddr %d interval %d\n", epd->bEndpointAddress, epd->bInterval); } From bffff80d103c37d93916ff2ee616c4e17f9d79b5 Mon Sep 17 00:00:00 2001 From: Roman Smirnov Date: Wed, 27 Mar 2024 16:27:55 +0300 Subject: [PATCH 1198/1565] udf: udftime: prevent overflow in udf_disk_stamp_to_time() [ Upstream commit 3b84adf460381169c085e4bc09e7b57e9e16db0a ] An overflow can occur in a situation where src.centiseconds takes the value of 255. This situation is unlikely, but there is no validation check anywere in the code. Found by Linux Verification Center (linuxtesting.org) with Svace. Suggested-by: Jan Kara Signed-off-by: Roman Smirnov Reviewed-by: Sergey Shtylyov Signed-off-by: Jan Kara Message-Id: <20240327132755.13945-1-r.smirnov@omp.ru> Signed-off-by: Sasha Levin --- fs/udf/udftime.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/udf/udftime.c b/fs/udf/udftime.c index fce4ad976c8c..26169b1f482c 100644 --- a/fs/udf/udftime.c +++ b/fs/udf/udftime.c @@ -60,13 +60,18 @@ udf_disk_stamp_to_time(struct timespec64 *dest, struct timestamp src) dest->tv_sec = mktime64(year, src.month, src.day, src.hour, src.minute, src.second); dest->tv_sec -= offset * 60; - dest->tv_nsec = 1000 * (src.centiseconds * 10000 + - src.hundredsOfMicroseconds * 100 + src.microseconds); + /* * Sanitize nanosecond field since reportedly some filesystems are * recorded with bogus sub-second values. */ - dest->tv_nsec %= NSEC_PER_SEC; + if (src.centiseconds < 100 && src.hundredsOfMicroseconds < 100 && + src.microseconds < 100) { + dest->tv_nsec = 1000 * (src.centiseconds * 10000 + + src.hundredsOfMicroseconds * 100 + src.microseconds); + } else { + dest->tv_nsec = 0; + } } void From c59f79e2b47763060abae208485b983f9eb52e0f Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Thu, 7 Mar 2024 10:37:09 -0600 Subject: [PATCH 1199/1565] PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports [ Upstream commit 256df20c590bf0e4d63ac69330cf23faddac3e08 ] Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge system with a muxless AMD Radeon dGPU. Attempting to use the dGPU fails with the following sequence: ACPI Error: Aborting method \AMD3._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) radeon 0000:01:00.0: not ready 1023ms after resume; waiting radeon 0000:01:00.0: not ready 2047ms after resume; waiting radeon 0000:01:00.0: not ready 4095ms after resume; waiting radeon 0000:01:00.0: not ready 8191ms after resume; waiting radeon 0000:01:00.0: not ready 16383ms after resume; waiting radeon 0000:01:00.0: not ready 32767ms after resume; waiting radeon 0000:01:00.0: not ready 65535ms after resume; giving up radeon 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible The issue is that the Root Port the dGPU is connected to can't handle the transition from D3cold to D0 so the dGPU can't properly exit runtime PM. The existing logic in pci_bridge_d3_possible() checks for systems that are newer than 2015 to decide that D3 is safe. This would nominally work for an Ivy Bridge system (which was discontinued in 2015), but this system appears to have continued to receive BIOS updates until 2017 and so this existing logic doesn't appropriately capture it. Add the system to bridge_d3_blacklist to prevent D3cold from being used. Link: https://lore.kernel.org/r/20240307163709.323-1-mario.limonciello@amd.com Reported-by: Eric Heintzmann Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3229 Signed-off-by: Mario Limonciello Signed-off-by: Bjorn Helgaas Tested-by: Eric Heintzmann Signed-off-by: Sasha Levin --- drivers/pci/pci.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d1631109b142..530ced8f7abd 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2840,6 +2840,18 @@ static const struct dmi_system_id bridge_d3_blacklist[] = { DMI_MATCH(DMI_BOARD_VERSION, "Continental Z2"), }, }, + { + /* + * Changing power state of root port dGPU is connected fails + * https://gitlab.freedesktop.org/drm/amd/-/issues/3229 + */ + .ident = "Hewlett-Packard HP Pavilion 17 Notebook PC/1972", + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "Hewlett-Packard"), + DMI_MATCH(DMI_BOARD_NAME, "1972"), + DMI_MATCH(DMI_BOARD_VERSION, "95.33"), + }, + }, #endif { } }; From 6c1b9fe148a4e03bbfa234267ebb89f35285814a Mon Sep 17 00:00:00 2001 From: Songyang Li Date: Wed, 20 Mar 2024 23:22:00 +0800 Subject: [PATCH 1200/1565] MIPS: Octeon: Add PCIe link status check [ Upstream commit 29b83a64df3b42c88c0338696feb6fdcd7f1f3b7 ] The standard PCIe configuration read-write interface is used to access the configuration space of the peripheral PCIe devices of the mips processor after the PCIe link surprise down, it can generate kernel panic caused by "Data bus error". So it is necessary to add PCIe link status check for system protection. When the PCIe link is down or in training, assigning a value of 0 to the configuration address can prevent read-write behavior to the configuration space of peripheral PCIe devices, thereby preventing kernel panic. Signed-off-by: Songyang Li Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/pci/pcie-octeon.c | 6 ++++++ 1 file changed, 6 insertions(+) mode change 100644 => 100755 arch/mips/pci/pcie-octeon.c diff --git a/arch/mips/pci/pcie-octeon.c b/arch/mips/pci/pcie-octeon.c old mode 100644 new mode 100755 index d919a0d813a1..38de2a9c3cf1 --- a/arch/mips/pci/pcie-octeon.c +++ b/arch/mips/pci/pcie-octeon.c @@ -230,12 +230,18 @@ static inline uint64_t __cvmx_pcie_build_config_addr(int pcie_port, int bus, { union cvmx_pcie_address pcie_addr; union cvmx_pciercx_cfg006 pciercx_cfg006; + union cvmx_pciercx_cfg032 pciercx_cfg032; pciercx_cfg006.u32 = cvmx_pcie_cfgx_read(pcie_port, CVMX_PCIERCX_CFG006(pcie_port)); if ((bus <= pciercx_cfg006.s.pbnum) && (dev != 0)) return 0; + pciercx_cfg032.u32 = + cvmx_pcie_cfgx_read(pcie_port, CVMX_PCIERCX_CFG032(pcie_port)); + if ((pciercx_cfg032.s.dlla == 0) || (pciercx_cfg032.s.lt == 1)) + return 0; + pcie_addr.u64 = 0; pcie_addr.config.upper = 2; pcie_addr.config.io = 1; From 15c8b2e1d6fc7ea66710c0431292b20630a24d6c Mon Sep 17 00:00:00 2001 From: Parker Newman Date: Tue, 16 Apr 2024 08:55:28 -0400 Subject: [PATCH 1201/1565] serial: exar: adding missing CTI and Exar PCI ids [ Upstream commit b86ae40ffcf5a16b9569b1016da4a08c4f352ca2 ] - Added Connect Tech and Exar IDs not already in pci_ids.h Signed-off-by: Parker Newman Link: https://lore.kernel.org/r/7c3d8e795a864dd9b0a00353b722060dc27c4e09.1713270624.git.pnewman@connecttech.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/tty/serial/8250/8250_exar.c | 42 +++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/drivers/tty/serial/8250/8250_exar.c b/drivers/tty/serial/8250/8250_exar.c index 7c28d2752a4c..3d09f8f30e02 100644 --- a/drivers/tty/serial/8250/8250_exar.c +++ b/drivers/tty/serial/8250/8250_exar.c @@ -41,8 +41,50 @@ #define PCI_DEVICE_ID_COMMTECH_4228PCIE 0x0021 #define PCI_DEVICE_ID_COMMTECH_4222PCIE 0x0022 +#define PCI_VENDOR_ID_CONNECT_TECH 0x12c4 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_SP_OPTO 0x0340 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_SP_OPTO_A 0x0341 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_SP_OPTO_B 0x0342 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_XPRS 0x0350 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_XPRS_A 0x0351 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_XPRS_B 0x0352 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_XPRS 0x0353 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_16_XPRS_A 0x0354 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_16_XPRS_B 0x0355 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_XPRS_OPTO 0x0360 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_XPRS_OPTO_A 0x0361 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_XPRS_OPTO_B 0x0362 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_SP 0x0370 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_SP_232 0x0371 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_SP_485 0x0372 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_4_SP 0x0373 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_6_2_SP 0x0374 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_6_SP 0x0375 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_SP_232_NS 0x0376 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_XP_OPTO_LEFT 0x0380 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_2_XP_OPTO_RIGHT 0x0381 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_XP_OPTO 0x0382 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_4_4_XPRS_OPTO 0x0392 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_XPRS_LP 0x03A0 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_XPRS_LP_232 0x03A1 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_XPRS_LP_485 0x03A2 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCI_UART_8_XPRS_LP_232_NS 0x03A3 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XEG001 0x0602 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_BASE 0x1000 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_2 0x1002 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_4 0x1004 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_8 0x1008 +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_12 0x100C +#define PCI_SUBDEVICE_ID_CONNECT_TECH_PCIE_XR35X_16 0x1010 +#define PCI_DEVICE_ID_CONNECT_TECH_PCI_XR79X_12_XIG00X 0x110c +#define PCI_DEVICE_ID_CONNECT_TECH_PCI_XR79X_12_XIG01X 0x110d +#define PCI_DEVICE_ID_CONNECT_TECH_PCI_XR79X_16 0x1110 + #define PCI_DEVICE_ID_EXAR_XR17V4358 0x4358 #define PCI_DEVICE_ID_EXAR_XR17V8358 0x8358 +#define PCI_DEVICE_ID_EXAR_XR17V252 0x0252 +#define PCI_DEVICE_ID_EXAR_XR17V254 0x0254 +#define PCI_DEVICE_ID_EXAR_XR17V258 0x0258 #define PCI_SUBDEVICE_ID_USR_2980 0x0128 #define PCI_SUBDEVICE_ID_USR_2981 0x0129 From 7117969bff94b41a822c4d58f651bf73803d6dad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Wed, 8 May 2024 15:07:00 +0300 Subject: [PATCH 1202/1565] MIPS: Routerboard 532: Fix vendor retry check code MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ae9daffd9028f2500c9ac1517e46d4f2b57efb80 ] read_config_dword() contains strange condition checking ret for a number of values. The ret variable, however, is always zero because config_access() never returns anything else. Thus, the retry is always taken until number of tries is exceeded. The code looks like it wants to check *val instead of ret to see if the read gave an error response. Fixes: 73b4390fb234 ("[MIPS] Routerboard 532: Support for base system") Signed-off-by: Ilpo Järvinen Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/pci/ops-rc32434.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/mips/pci/ops-rc32434.c b/arch/mips/pci/ops-rc32434.c index 874ed6df9768..34b9323bdabb 100644 --- a/arch/mips/pci/ops-rc32434.c +++ b/arch/mips/pci/ops-rc32434.c @@ -112,8 +112,8 @@ static int read_config_dword(struct pci_bus *bus, unsigned int devfn, * gives them time to settle */ if (where == PCI_VENDOR_ID) { - if (ret == 0xffffffff || ret == 0x00000000 || - ret == 0x0000ffff || ret == 0xffff0000) { + if (*val == 0xffffffff || *val == 0x00000000 || + *val == 0x0000ffff || *val == 0xffff0000) { if (delay > 4) return 0; delay *= 2; From 36d771ce6028b886e18a4a8956a5d23688e4e13d Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Tue, 11 Jun 2024 13:35:33 +0200 Subject: [PATCH 1203/1565] mips: bmips: BCM6358: make sure CBR is correctly set [ Upstream commit ce5cdd3b05216b704a704f466fb4c2dff3778caf ] It was discovered that some device have CBR address set to 0 causing kernel panic when arch_sync_dma_for_cpu_all is called. This was notice in situation where the system is booted from TP1 and BMIPS_GET_CBR() returns 0 instead of a valid address and !!(read_c0_brcm_cmt_local() & (1 << 31)); not failing. The current check whether RAC flush should be disabled or not are not enough hence lets check if CBR is a valid address or not. Fixes: ab327f8acdf8 ("mips: bmips: BCM6358: disable RAC flush for TP1") Signed-off-by: Christian Marangi Acked-by: Florian Fainelli Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/bmips/setup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c index 16063081d61e..ac75797739d2 100644 --- a/arch/mips/bmips/setup.c +++ b/arch/mips/bmips/setup.c @@ -110,7 +110,8 @@ static void bcm6358_quirks(void) * RAC flush causes kernel panics on BCM6358 when booting from TP1 * because the bootloader is not initializing it properly. */ - bmips_rac_flush_disable = !!(read_c0_brcm_cmt_local() & (1 << 31)); + bmips_rac_flush_disable = !!(read_c0_brcm_cmt_local() & (1 << 31)) || + !!BMIPS_GET_CBR(); } static void bcm6368_quirks(void) From a85bae262ccecc52a40c466ec067f6c915e0839d Mon Sep 17 00:00:00 2001 From: "Masami Hiramatsu (Google)" Date: Tue, 11 Jun 2024 22:30:37 +0900 Subject: [PATCH 1204/1565] tracing: Build event generation tests only as modules [ Upstream commit 3572bd5689b0812b161b40279e39ca5b66d73e88 ] The kprobes and synth event generation test modules add events and lock (get a reference) those event file reference in module init function, and unlock and delete it in module exit function. This is because those are designed for playing as modules. If we make those modules as built-in, those events are left locked in the kernel, and never be removed. This causes kprobe event self-test failure as below. [ 97.349708] ------------[ cut here ]------------ [ 97.353453] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:2133 kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.357106] Modules linked in: [ 97.358488] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-g699646734ab5-dirty #14 [ 97.361556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 97.363880] RIP: 0010:kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.365538] Code: a8 24 08 82 e9 ae fd ff ff 90 0f 0b 90 48 c7 c7 e5 aa 0b 82 e9 ee fc ff ff 90 0f 0b 90 48 c7 c7 2d 61 06 82 e9 8e fd ff ff 90 <0f> 0b 90 48 c7 c7 33 0b 0c 82 89 c6 e8 6e 03 1f ff 41 ff c7 e9 90 [ 97.370429] RSP: 0000:ffffc90000013b50 EFLAGS: 00010286 [ 97.371852] RAX: 00000000fffffff0 RBX: ffff888005919c00 RCX: 0000000000000000 [ 97.373829] RDX: ffff888003f40000 RSI: ffffffff8236a598 RDI: ffff888003f40a68 [ 97.375715] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 97.377675] R10: ffffffff811c9ae5 R11: ffffffff8120c4e0 R12: 0000000000000000 [ 97.379591] R13: 0000000000000001 R14: 0000000000000015 R15: 0000000000000000 [ 97.381536] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000 [ 97.383813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 97.385449] CR2: 0000000000000000 CR3: 0000000002244000 CR4: 00000000000006b0 [ 97.387347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 97.389277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 97.391196] Call Trace: [ 97.391967] [ 97.392647] ? __warn+0xcc/0x180 [ 97.393640] ? kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.395181] ? report_bug+0xbd/0x150 [ 97.396234] ? handle_bug+0x3e/0x60 [ 97.397311] ? exc_invalid_op+0x1a/0x50 [ 97.398434] ? asm_exc_invalid_op+0x1a/0x20 [ 97.399652] ? trace_kprobe_is_busy+0x20/0x20 [ 97.400904] ? tracing_reset_all_online_cpus+0x15/0x90 [ 97.402304] ? kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.403773] ? init_kprobe_trace+0x50/0x50 [ 97.404972] do_one_initcall+0x112/0x240 [ 97.406113] do_initcall_level+0x95/0xb0 [ 97.407286] ? kernel_init+0x1a/0x1a0 [ 97.408401] do_initcalls+0x3f/0x70 [ 97.409452] kernel_init_freeable+0x16f/0x1e0 [ 97.410662] ? rest_init+0x1f0/0x1f0 [ 97.411738] kernel_init+0x1a/0x1a0 [ 97.412788] ret_from_fork+0x39/0x50 [ 97.413817] ? rest_init+0x1f0/0x1f0 [ 97.414844] ret_from_fork_asm+0x11/0x20 [ 97.416285] [ 97.417134] irq event stamp: 13437323 [ 97.418376] hardirqs last enabled at (13437337): [] console_unlock+0x11c/0x150 [ 97.421285] hardirqs last disabled at (13437370): [] console_unlock+0x101/0x150 [ 97.423838] softirqs last enabled at (13437366): [] handle_softirqs+0x23f/0x2a0 [ 97.426450] softirqs last disabled at (13437393): [] __irq_exit_rcu+0x66/0xd0 [ 97.428850] ---[ end trace 0000000000000000 ]--- And also, since we can not cleanup dynamic_event file, ftracetest are failed too. To avoid these issues, build these tests only as modules. Link: https://lore.kernel.org/all/171811263754.85078.5877446624311852525.stgit@devnote2/ Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module") Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Masami Hiramatsu (Google) Reviewed-by: Steven Rostedt (Google) Signed-off-by: Sasha Levin --- kernel/trace/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig index 29db703f6880..467975300ddd 100644 --- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -824,7 +824,7 @@ config PREEMPTIRQ_DELAY_TEST config SYNTH_EVENT_GEN_TEST tristate "Test module for in-kernel synthetic event generation" - depends on SYNTH_EVENTS + depends on SYNTH_EVENTS && m help This option creates a test module to check the base functionality of in-kernel synthetic event definition and @@ -837,7 +837,7 @@ config SYNTH_EVENT_GEN_TEST config KPROBE_EVENT_GEN_TEST tristate "Test module for in-kernel kprobe event generation" - depends on KPROBE_EVENTS + depends on KPROBE_EVENTS && m help This option creates a test module to check the base functionality of in-kernel kprobe event definition. From 1aabe0f850ad446763383106d9b677094db9f930 Mon Sep 17 00:00:00 2001 From: Ondrej Mosnacek Date: Fri, 7 Jun 2024 18:07:52 +0200 Subject: [PATCH 1205/1565] cipso: fix total option length computation [ Upstream commit 9f36169912331fa035d7b73a91252d7c2512eb1a ] As evident from the definition of ip_options_get(), the IP option IPOPT_END is used to pad the IP option data array, not IPOPT_NOP. Yet the loop that walks the IP options to determine the total IP options length in cipso_v4_delopt() doesn't take IPOPT_END into account. Fix it by recognizing the IPOPT_END value as the end of actual options. Fixes: 014ab19a69c3 ("selinux: Set socket NetLabel based on connection endpoint") Signed-off-by: Ondrej Mosnacek Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/cipso_ipv4.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c index d4a4160159a9..c5da958e3bbd 100644 --- a/net/ipv4/cipso_ipv4.c +++ b/net/ipv4/cipso_ipv4.c @@ -2016,12 +2016,16 @@ static int cipso_v4_delopt(struct ip_options_rcu __rcu **opt_ptr) * from there we can determine the new total option length */ iter = 0; optlen_new = 0; - while (iter < opt->opt.optlen) - if (opt->opt.__data[iter] != IPOPT_NOP) { + while (iter < opt->opt.optlen) { + if (opt->opt.__data[iter] == IPOPT_END) { + break; + } else if (opt->opt.__data[iter] == IPOPT_NOP) { + iter++; + } else { iter += opt->opt.__data[iter + 1]; optlen_new = iter; - } else - iter++; + } + } hdr_delta = opt->opt.optlen; opt->opt.optlen = (optlen_new + 3) & ~3; hdr_delta -= opt->opt.optlen; From 5391f9db2cab5ef1cb411be1ab7dbec728078fba Mon Sep 17 00:00:00 2001 From: Gavrilov Ilia Date: Thu, 13 Jun 2024 08:23:00 +0000 Subject: [PATCH 1206/1565] netrom: Fix a memory leak in nr_heartbeat_expiry() [ Upstream commit 0b9130247f3b6a1122478471ff0e014ea96bb735 ] syzbot reported a memory leak in nr_create() [0]. Commit 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") added sock_hold() to the nr_heartbeat_expiry() function, where a) a socket has a SOCK_DESTROY flag or b) a listening socket has a SOCK_DEAD flag. But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor has already been closed and the nr_release() function has been called. So it makes no sense to hold the reference count because no one will call another nr_destroy_socket() and put it as in the case "b." nr_connect nr_establish_data_link nr_start_heartbeat nr_release switch (nr->state) case NR_STATE_3 nr->state = NR_STATE_2 sock_set_flag(sk, SOCK_DESTROY); nr_rx_frame nr_process_rx_frame switch (nr->state) case NR_STATE_2 nr_state2_machine() nr_disconnect() nr_sk(sk)->state = NR_STATE_0 sock_set_flag(sk, SOCK_DEAD) nr_heartbeat_expiry switch (nr->state) case NR_STATE_0 if (sock_flag(sk, SOCK_DESTROY) || (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD))) sock_hold() // ( !!! ) nr_destroy_socket() To fix the memory leak, let's call sock_hold() only for a listening socket. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller. [0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Fixes: 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") Signed-off-by: Gavrilov Ilia Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/netrom/nr_timer.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netrom/nr_timer.c b/net/netrom/nr_timer.c index 4e7c968cde2d..5e3ca068f04e 100644 --- a/net/netrom/nr_timer.c +++ b/net/netrom/nr_timer.c @@ -121,7 +121,8 @@ static void nr_heartbeat_expiry(struct timer_list *t) is accepted() it isn't 'dead' so doesn't get removed. */ if (sock_flag(sk, SOCK_DESTROY) || (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD))) { - sock_hold(sk); + if (sk->sk_state == TCP_LISTEN) + sock_hold(sk); bh_unlock_sock(sk); nr_destroy_socket(sk); goto out; From de5ad4d45cd0128a2a37555f48ab69aa19d78adc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 14 Jun 2024 08:20:02 +0000 Subject: [PATCH 1207/1565] ipv6: prevent possible NULL deref in fib6_nh_init() [ Upstream commit 2eab4543a2204092c3a7af81d7d6c506e59a03a6 ] syzbot reminds us that in6_dev_get() can return NULL. fib6_nh_init() ip6_validate_gw( &idev ) ip6_route_check_nh( idev ) *idev = in6_dev_get(dev); // can be NULL Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606 Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b RSP: 0018:ffffc900032775a0 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8 RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000 R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8 R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000 FS: 00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809 ip6_route_add+0x28/0x160 net/ipv6/route.c:3853 ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483 inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f940f07cea9 Fixes: 428604fb118f ("ipv6: do not set routes if disable_ipv6 has been enabled") Reported-by: syzbot Signed-off-by: Eric Dumazet Acked-by: Lorenzo Bianconi Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240614082002.26407-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 00e3daaafed8..9ea865f22c79 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3488,7 +3488,7 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh, if (!dev) goto out; - if (idev->cnf.disable_ipv6) { + if (!idev || idev->cnf.disable_ipv6) { NL_SET_ERR_MSG(extack, "IPv6 is disabled on nexthop device"); err = -EACCES; goto out; From 1ed9849fdf9a1a617129346b11d2094ca26828dc Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2024 15:14:54 +0000 Subject: [PATCH 1208/1565] ipv6: prevent possible NULL dereference in rt6_probe() [ Upstream commit b86762dbe19a62e785c189f313cda5b989931f37 ] syzbot caught a NULL dereference in rt6_probe() [1] Bail out if __in6_dev_get() returns NULL. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f] CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline] RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758 Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19 RSP: 0018:ffffc900034af070 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000 RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000 FS: 00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784 nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496 __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825 find_rr_leaf net/ipv6/route.c:853 [inline] rt6_select net/ipv6/route.c:897 [inline] fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195 ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231 pol_lookup_func include/net/ip6_fib.h:616 [inline] fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline] ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651 ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147 ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250 rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898 inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x4b8/0x5c0 net/socket.c:1160 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x6b6/0x1140 fs/read_write.c:590 ksys_write+0x1f8/0x260 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 52e1635631b3 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.") Signed-off-by: Eric Dumazet Reviewed-by: Jason Xing Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/route.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 9ea865f22c79..799779475c7d 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -632,6 +632,8 @@ static void rt6_probe(struct fib6_nh *fib6_nh) rcu_read_lock_bh(); last_probe = READ_ONCE(fib6_nh->last_probe); idev = __in6_dev_get(dev); + if (!idev) + goto out; neigh = __ipv6_neigh_lookup_noref(dev, nh_gw); if (neigh) { if (neigh->nud_state & NUD_VALID) From 20427b85781aca0ad072851f6907a3d4b2fed8d1 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 15 Jun 2024 15:42:31 +0000 Subject: [PATCH 1209/1565] xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr() [ Upstream commit d46401052c2d5614da8efea5788532f0401cb164 ] ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64 Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00 RSP: 0018:ffffc90000117378 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7 RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98 RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000 R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline] xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline] xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541 xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835 xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline] xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201 xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline] xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309 ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256 send6+0x611/0xd20 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40 wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/xfrm6_policy.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c index 4c3aa97f23fa..7c903e0e446c 100644 --- a/net/ipv6/xfrm6_policy.c +++ b/net/ipv6/xfrm6_policy.c @@ -57,12 +57,18 @@ static int xfrm6_get_saddr(struct net *net, int oif, { struct dst_entry *dst; struct net_device *dev; + struct inet6_dev *idev; dst = xfrm6_dst_lookup(net, 0, oif, NULL, daddr, mark); if (IS_ERR(dst)) return -EHOSTUNREACH; - dev = ip6_dst_idev(dst)->dev; + idev = ip6_dst_idev(dst); + if (!idev) { + dst_release(dst); + return -EHOSTUNREACH; + } + dev = idev->dev; ipv6_dev_get_saddr(dev_net(dev), dev, &daddr->in6, 0, &saddr->in6); dst_release(dst); return 0; From 2b82028a1f5ee3a8e04090776b10c534144ae77b Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Fri, 14 Jun 2024 21:13:02 +0800 Subject: [PATCH 1210/1565] netns: Make get_net_ns() handle zero refcount net [ Upstream commit ff960f9d3edbe08a736b5a224d91a305ccc946b0 ] Syzkaller hit a warning: refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0 Modules linked in: CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xdf/0x1d0 Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1 RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001 RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139 R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4 R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040 FS: 00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? show_regs+0xa3/0xc0 ? __warn+0xa5/0x1c0 ? refcount_warn_saturate+0xdf/0x1d0 ? report_bug+0x1fc/0x2d0 ? refcount_warn_saturate+0xdf/0x1d0 ? handle_bug+0xa1/0x110 ? exc_invalid_op+0x3c/0xb0 ? asm_exc_invalid_op+0x1f/0x30 ? __warn_printk+0xcc/0x140 ? __warn_printk+0xd5/0x140 ? refcount_warn_saturate+0xdf/0x1d0 get_net_ns+0xa4/0xc0 ? __pfx_get_net_ns+0x10/0x10 open_related_ns+0x5a/0x130 __tun_chr_ioctl+0x1616/0x2370 ? __sanitizer_cov_trace_switch+0x58/0xa0 ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30 ? __pfx_tun_chr_ioctl+0x10/0x10 tun_chr_ioctl+0x2f/0x40 __x64_sys_ioctl+0x11b/0x160 x64_sys_call+0x1211/0x20d0 do_syscall_64+0x9e/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5b28f165d7 Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8 RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7 RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003 RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0 R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730 R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000 Kernel panic - not syncing: kernel: panic_on_warn set ... This is trigger as below: ns0 ns1 tun_set_iff() //dev is tun0 tun->dev = dev //ip link set tun0 netns ns1 put_net() //ref is 0 __tun_chr_ioctl() //TUNGETDEVNETNS net = dev_net(tun->dev); open_related_ns(&net->ns, get_net_ns); //ns1 get_net_ns() get_net() //addition on 0 Use maybe_get_net() in get_net_ns in case net's ref is zero to fix this Fixes: 0c3e0e3bb623 ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device") Signed-off-by: Yue Haibing Link: https://lore.kernel.org/r/20240614131302.2698509-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/core/net_namespace.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 72cfe5248b76..6192a05ebcce 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -670,11 +670,16 @@ EXPORT_SYMBOL_GPL(__put_net); * get_net_ns - increment the refcount of the network namespace * @ns: common namespace (net) * - * Returns the net's common namespace. + * Returns the net's common namespace or ERR_PTR() if ref is zero. */ struct ns_common *get_net_ns(struct ns_common *ns) { - return &get_net(container_of(ns, struct net, ns))->ns; + struct net *net; + + net = maybe_get_net(container_of(ns, struct net, ns)); + if (net) + return &net->ns; + return ERR_PTR(-EINVAL); } EXPORT_SYMBOL_GPL(get_net_ns); From fb910ac2d3dafa750bdcec699b088dc99b8a0c17 Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Fri, 14 Jun 2024 16:50:30 +0200 Subject: [PATCH 1211/1565] qca_spi: Make interrupt remembering atomic [ Upstream commit 2d7198278ece01818cd95a3beffbdf8b2a353fa0 ] The whole mechanism to remember occurred SPI interrupts is not atomic, which could lead to unexpected behavior. So fix this by using atomic bit operations instead. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren Link: https://lore.kernel.org/r/20240614145030.7781-1-wahrenst@gmx.net Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ethernet/qualcomm/qca_debug.c | 6 ++---- drivers/net/ethernet/qualcomm/qca_spi.c | 16 ++++++++-------- drivers/net/ethernet/qualcomm/qca_spi.h | 3 +-- 3 files changed, 11 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/qca_debug.c b/drivers/net/ethernet/qualcomm/qca_debug.c index 66229b300c5a..c205d3f03696 100644 --- a/drivers/net/ethernet/qualcomm/qca_debug.c +++ b/drivers/net/ethernet/qualcomm/qca_debug.c @@ -110,10 +110,8 @@ qcaspi_info_show(struct seq_file *s, void *what) seq_printf(s, "IRQ : %d\n", qca->spi_dev->irq); - seq_printf(s, "INTR REQ : %u\n", - qca->intr_req); - seq_printf(s, "INTR SVC : %u\n", - qca->intr_svc); + seq_printf(s, "INTR : %lx\n", + qca->intr); seq_printf(s, "SPI max speed : %lu\n", (unsigned long)qca->spi_dev->max_speed_hz); diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c index ffa1846f5b4c..f6bc5a273477 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.c +++ b/drivers/net/ethernet/qualcomm/qca_spi.c @@ -49,6 +49,8 @@ #define MAX_DMA_BURST_LEN 5000 +#define SPI_INTR 0 + /* Modules parameters */ #define QCASPI_CLK_SPEED_MIN 1000000 #define QCASPI_CLK_SPEED_MAX 16000000 @@ -585,14 +587,14 @@ qcaspi_spi_thread(void *data) continue; } - if ((qca->intr_req == qca->intr_svc) && + if (!test_bit(SPI_INTR, &qca->intr) && !qca->txr.skb[qca->txr.head]) schedule(); set_current_state(TASK_RUNNING); - netdev_dbg(qca->net_dev, "have work to do. int: %d, tx_skb: %p\n", - qca->intr_req - qca->intr_svc, + netdev_dbg(qca->net_dev, "have work to do. int: %lu, tx_skb: %p\n", + qca->intr, qca->txr.skb[qca->txr.head]); qcaspi_qca7k_sync(qca, QCASPI_EVENT_UPDATE); @@ -606,8 +608,7 @@ qcaspi_spi_thread(void *data) msleep(QCASPI_QCA7K_REBOOT_TIME_MS); } - if (qca->intr_svc != qca->intr_req) { - qca->intr_svc = qca->intr_req; + if (test_and_clear_bit(SPI_INTR, &qca->intr)) { start_spi_intr_handling(qca, &intr_cause); if (intr_cause & SPI_INT_CPU_ON) { @@ -669,7 +670,7 @@ qcaspi_intr_handler(int irq, void *data) { struct qcaspi *qca = data; - qca->intr_req++; + set_bit(SPI_INTR, &qca->intr); if (qca->spi_thread && qca->spi_thread->state != TASK_RUNNING) wake_up_process(qca->spi_thread); @@ -686,8 +687,7 @@ qcaspi_netdev_open(struct net_device *dev) if (!qca) return -EINVAL; - qca->intr_req = 1; - qca->intr_svc = 0; + set_bit(SPI_INTR, &qca->intr); qca->sync = QCASPI_SYNC_UNKNOWN; qcafrm_fsm_init_spi(&qca->frm_handle); diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h index d13a67e20d65..8d4767e9b914 100644 --- a/drivers/net/ethernet/qualcomm/qca_spi.h +++ b/drivers/net/ethernet/qualcomm/qca_spi.h @@ -92,8 +92,7 @@ struct qcaspi { struct qcafrm_handle frm_handle; struct sk_buff *rx_skb; - unsigned int intr_req; - unsigned int intr_svc; + unsigned long intr; u16 reset_count; #ifdef CONFIG_DEBUG_FS From 66c7aa157a38d93947b32c0d309889369a27e8e2 Mon Sep 17 00:00:00 2001 From: Pedro Tammela Date: Mon, 11 Dec 2023 15:18:06 -0300 Subject: [PATCH 1212/1565] net/sched: act_api: rely on rcu in tcf_idr_check_alloc [ Upstream commit 4b55e86736d5b492cf689125da2600f59c7d2c39 ] Instead of relying only on the idrinfo->lock mutex for bind/alloc logic, rely on a combination of rcu + mutex + atomics to better scale the case where multiple rtnl-less filters are binding to the same action object. Action binding happens when an action index is specified explicitly and an action exists which such index exists. Example: tc actions add action drop index 1 tc filter add ... matchall action drop index 1 tc filter add ... matchall action drop index 1 tc filter add ... matchall action drop index 1 tc filter ls ... filter protocol all pref 49150 matchall chain 0 filter protocol all pref 49150 matchall chain 0 handle 0x1 not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 4 bind 3 filter protocol all pref 49151 matchall chain 0 filter protocol all pref 49151 matchall chain 0 handle 0x1 not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 4 bind 3 filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1 not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 4 bind 3 When no index is specified, as before, grab the mutex and allocate in the idr the next available id. In this version, as opposed to before, it's simplified to store the -EBUSY pointer instead of the previous alloc + replace combination. When an index is specified, rely on rcu to find if there's an object in such index. If there's none, fallback to the above, serializing on the mutex and reserving the specified id. If there's one, it can be an -EBUSY pointer, in which case we just try again until it's an action, or an action. Given the rcu guarantees, the action found could be dead and therefore we need to bump the refcount if it's not 0, handling the case it's in fact 0. As bind and the action refcount are already atomics, these increments can happen without the mutex protection while many tcf_idr_check_alloc race to bind to the same action instance. In case binding encounters a parallel delete or add, it will return -EAGAIN in order to try again. Both filter and action apis already have the retry machinery in-place. In case it's an unlocked filter it retries under the rtnl lock. Signed-off-by: Pedro Tammela Acked-by: Jamal Hadi Salim Reviewed-by: Vlad Buslov Link: https://lore.kernel.org/r/20231211181807.96028-2-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski Stable-dep-of: d864319871b0 ("net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()") Signed-off-by: Sasha Levin --- net/sched/act_api.c | 65 ++++++++++++++++++++++++++++++--------------- 1 file changed, 43 insertions(+), 22 deletions(-) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 4ab9c2a6f650..eb6bb90ddd33 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -507,6 +507,9 @@ EXPORT_SYMBOL(tcf_idr_cleanup); * its reference and bind counters, and return 1. Otherwise insert temporary * error pointer (to prevent concurrent users from inserting actions with same * index) and return 0. + * + * May return -EAGAIN for binding actions in case of a parallel add/delete on + * the requested index. */ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, @@ -515,43 +518,61 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, struct tcf_idrinfo *idrinfo = tn->idrinfo; struct tc_action *p; int ret; + u32 max; -again: - mutex_lock(&idrinfo->lock); if (*index) { +again: + rcu_read_lock(); p = idr_find(&idrinfo->action_idr, *index); + if (IS_ERR(p)) { /* This means that another process allocated * index but did not assign the pointer yet. */ - mutex_unlock(&idrinfo->lock); + rcu_read_unlock(); goto again; } - if (p) { - refcount_inc(&p->tcfa_refcnt); - if (bind) - atomic_inc(&p->tcfa_bindcnt); - *a = p; - ret = 1; - } else { - *a = NULL; - ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index, - *index, GFP_KERNEL); - if (!ret) - idr_replace(&idrinfo->action_idr, - ERR_PTR(-EBUSY), *index); + if (!p) { + /* Empty slot, try to allocate it */ + max = *index; + rcu_read_unlock(); + goto new; } + + if (!refcount_inc_not_zero(&p->tcfa_refcnt)) { + /* Action was deleted in parallel */ + rcu_read_unlock(); + return -EAGAIN; + } + + if (bind) + atomic_inc(&p->tcfa_bindcnt); + *a = p; + + rcu_read_unlock(); + + return 1; } else { + /* Find a slot */ *index = 1; - *a = NULL; - ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index, - UINT_MAX, GFP_KERNEL); - if (!ret) - idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY), - *index); + max = UINT_MAX; } + +new: + *a = NULL; + + mutex_lock(&idrinfo->lock); + ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max, + GFP_KERNEL); mutex_unlock(&idrinfo->lock); + + /* N binds raced for action allocation, + * retry for all the ones that failed. + */ + if (ret == -ENOSPC && *index == max) + ret = -EAGAIN; + return ret; } EXPORT_SYMBOL(tcf_idr_check_alloc); From c6a7da65a296745535a964be1019ec7691b0cb90 Mon Sep 17 00:00:00 2001 From: David Ruth Date: Fri, 14 Jun 2024 19:03:26 +0000 Subject: [PATCH 1213/1565] net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc() [ Upstream commit d864319871b05fadd153e0aede4811ca7008f5d6 ] syzbot found hanging tasks waiting on rtnl_lock [1] A reproducer is available in the syzbot bug. When a request to add multiple actions with the same index is sent, the second request will block forever on the first request. This holds rtnl_lock, and causes tasks to hang. Return -EAGAIN to prevent infinite looping, while keeping documented behavior. [1] INFO: task kworker/1:0:5088 blocked for more than 143 seconds. Not tainted 6.9.0-rc4-syzkaller-00173-g3cdb45594619 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/1:0 state:D stack:23744 pid:5088 tgid:5088 ppid:2 flags:0x00004000 Workqueue: events_power_efficient reg_check_chans_work Call Trace: context_switch kernel/sched/core.c:5409 [inline] __schedule+0xf15/0x5d00 kernel/sched/core.c:6746 __schedule_loop kernel/sched/core.c:6823 [inline] schedule+0xe7/0x350 kernel/sched/core.c:6838 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6895 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x5b8/0x9c0 kernel/locking/mutex.c:752 wiphy_lock include/net/cfg80211.h:5953 [inline] reg_leave_invalid_chans net/wireless/reg.c:2466 [inline] reg_check_chans_work+0x10a/0x10e0 net/wireless/reg.c:2481 Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: syzbot+b87c222546179f4513a7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b87c222546179f4513a7 Signed-off-by: David Ruth Reviewed-by: Jamal Hadi Salim Link: https://lore.kernel.org/r/20240614190326.1349786-1-druth@chromium.org Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/sched/act_api.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index eb6bb90ddd33..bf98bb602a9d 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -521,7 +521,6 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, u32 max; if (*index) { -again: rcu_read_lock(); p = idr_find(&idrinfo->action_idr, *index); @@ -530,7 +529,7 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index, * index but did not assign the pointer yet. */ rcu_read_unlock(); - goto again; + return -EAGAIN; } if (!p) { From 3eb1b39627892c4e26cb0162b75725aa5fcc60c8 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 15 Jun 2024 14:27:20 -0400 Subject: [PATCH 1214/1565] tipc: force a dst refcount before doing decryption [ Upstream commit 2ebe8f840c7450ecbfca9d18ac92e9ce9155e269 ] As it says in commit 3bc07321ccc2 ("xfrm: Force a dst refcount before entering the xfrm type handlers"): "Crypto requests might return asynchronous. In this case we leave the rcu protected region, so force a refcount on the skb's destination entry before we enter the xfrm type input/output handlers." On TIPC decryption path it has the same problem, and skb_dst_force() should be called before doing decryption to avoid a possible crash. Shuang reported this issue when this warning is triggered: [] WARNING: include/net/dst.h:337 tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Kdump: loaded Tainted: G W --------- - - 4.18.0-496.el8.x86_64+debug [] Workqueue: crypto cryptd_queue_worker [] RIP: 0010:tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Call Trace: [] tipc_sk_mcast_rcv+0x548/0xea0 [tipc] [] tipc_rcv+0xcf5/0x1060 [tipc] [] tipc_aead_decrypt_done+0x215/0x2e0 [tipc] [] cryptd_aead_crypt+0xdb/0x190 [] cryptd_queue_worker+0xed/0x190 [] process_one_work+0x93d/0x17e0 Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Shuang Li Signed-off-by: Xin Long Link: https://lore.kernel.org/r/fbe3195fad6997a4eec62d9bf076b2ad03ac336b.1718476040.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/tipc/node.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/node.c b/net/tipc/node.c index 9e3cfeb82a23..5f6866407be5 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -2076,6 +2076,7 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) } else { n = tipc_node_find_by_id(net, ehdr->id); } + skb_dst_force(skb); tipc_crypto_rcv(net, (n) ? n->crypto_rx : NULL, &skb, b); if (!skb) return; From b4899d75b8432d499ed9d1c3d0013f98e8f4f3c9 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Wed, 15 Jun 2022 12:43:54 +0200 Subject: [PATCH 1215/1565] net/sched: act_ct: set 'net' pointer when creating new nf_flow_table [ Upstream commit fc54d9065f90dd25063883f404e6ff9a76913e73 ] Following patches in series use the pointer to access flow table offload debug variables. Signed-off-by: Vlad Buslov Signed-off-by: Oz Shlomo Signed-off-by: Pablo Neira Ayuso Stable-dep-of: 88c67aeb1407 ("sched: act_ct: add netns into the key of tcf_ct_flow_table") Signed-off-by: Sasha Levin --- net/sched/act_ct.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 2d41d866de3e..155426d5a48f 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -275,7 +275,7 @@ static struct nf_flowtable_type flowtable_ct = { .owner = THIS_MODULE, }; -static int tcf_ct_flow_table_get(struct tcf_ct_params *params) +static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) { struct tcf_ct_flow_table *ct_ft; int err = -ENOMEM; @@ -300,6 +300,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params) err = nf_flow_table_init(&ct_ft->nf_ft); if (err) goto err_init; + write_pnet(&ct_ft->nf_ft.net, net); __module_get(THIS_MODULE); out_unlock: @@ -1291,7 +1292,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla, if (err) goto cleanup; - err = tcf_ct_flow_table_get(params); + err = tcf_ct_flow_table_get(net, params); if (err) goto cleanup_params; From 03f625505e27f709390a86c9b78d3707f4c23df8 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sat, 15 Jun 2024 17:47:30 -0400 Subject: [PATCH 1216/1565] sched: act_ct: add netns into the key of tcf_ct_flow_table [ Upstream commit 88c67aeb14070bab61d3dd8be96c8b42ebcaf53a ] zones_ht is a global hashtable for flow_table with zone as key. However, it does not consider netns when getting a flow_table from zones_ht in tcf_ct_init(), and it means an act_ct action in netns A may get a flow_table that belongs to netns B if it has the same zone value. In Shuang's test with the TOPO: tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table, which caused their ct entries entering unexpected states and the TCP connection not able to end normally. This patch fixes the issue simply by adding netns into the key of tcf_ct_flow_table so that an act_ct action gets a flow_table that belongs to its own netns in tcf_ct_init(). Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net, as the ct_ft is initialized after inserting it to the hashtable in tcf_ct_flow_table_get() and also it requires to implement several functions in rhashtable_params including hashfn, obj_hashfn and obj_cmpfn. Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table") Reported-by: Shuang Li Signed-off-by: Xin Long Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/sched/act_ct.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 155426d5a48f..c6d6a6fe9602 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -38,21 +38,26 @@ static struct workqueue_struct *act_ct_wq; static struct rhashtable zones_ht; static DEFINE_MUTEX(zones_mutex); +struct zones_ht_key { + struct net *net; + u16 zone; +}; + struct tcf_ct_flow_table { struct rhash_head node; /* In zones tables */ struct rcu_work rwork; struct nf_flowtable nf_ft; refcount_t ref; - u16 zone; + struct zones_ht_key key; bool dying; }; static const struct rhashtable_params zones_params = { .head_offset = offsetof(struct tcf_ct_flow_table, node), - .key_offset = offsetof(struct tcf_ct_flow_table, zone), - .key_len = sizeof_field(struct tcf_ct_flow_table, zone), + .key_offset = offsetof(struct tcf_ct_flow_table, key), + .key_len = sizeof_field(struct tcf_ct_flow_table, key), .automatic_shrinking = true, }; @@ -277,11 +282,12 @@ static struct nf_flowtable_type flowtable_ct = { static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) { + struct zones_ht_key key = { .net = net, .zone = params->zone }; struct tcf_ct_flow_table *ct_ft; int err = -ENOMEM; mutex_lock(&zones_mutex); - ct_ft = rhashtable_lookup_fast(&zones_ht, ¶ms->zone, zones_params); + ct_ft = rhashtable_lookup_fast(&zones_ht, &key, zones_params); if (ct_ft && refcount_inc_not_zero(&ct_ft->ref)) goto out_unlock; @@ -290,7 +296,7 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) goto err_alloc; refcount_set(&ct_ft->ref, 1); - ct_ft->zone = params->zone; + ct_ft->key = key; err = rhashtable_insert_fast(&zones_ht, &ct_ft->node, zones_params); if (err) goto err_insert; From b4bca4722fda928810d024350493990de39f1e40 Mon Sep 17 00:00:00 2001 From: Xiaolei Wang Date: Mon, 17 Jun 2024 09:39:22 +0800 Subject: [PATCH 1217/1565] net: stmmac: No need to calculate speed divider when offload is disabled [ Upstream commit b8c43360f6e424131fa81d3ba8792ad8ff25a09e ] commit be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") introduced a problem. When deleting, it prompts "Invalid portTransmitRate 0 (idleSlope - sendSlope)" and exits. Add judgment on cbs.enable. Only when offload is enabled, speed divider needs to be calculated. Fixes: be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") Signed-off-by: Xiaolei Wang Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20240617013922.1035854-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- .../net/ethernet/stmicro/stmmac/stmmac_tc.c | 40 ++++++++++--------- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c index 6b93a7614ad9..4da1a80de722 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c @@ -325,24 +325,28 @@ static int tc_setup_cbs(struct stmmac_priv *priv, port_transmit_rate_kbps = qopt->idleslope - qopt->sendslope; - /* Port Transmit Rate and Speed Divider */ - switch (div_s64(port_transmit_rate_kbps, 1000)) { - case SPEED_10000: - case SPEED_5000: - ptr = 32; - break; - case SPEED_2500: - case SPEED_1000: - ptr = 8; - break; - case SPEED_100: - ptr = 4; - break; - default: - netdev_err(priv->dev, - "Invalid portTransmitRate %lld (idleSlope - sendSlope)\n", - port_transmit_rate_kbps); - return -EINVAL; + if (qopt->enable) { + /* Port Transmit Rate and Speed Divider */ + switch (div_s64(port_transmit_rate_kbps, 1000)) { + case SPEED_10000: + case SPEED_5000: + ptr = 32; + break; + case SPEED_2500: + case SPEED_1000: + ptr = 8; + break; + case SPEED_100: + ptr = 4; + break; + default: + netdev_err(priv->dev, + "Invalid portTransmitRate %lld (idleSlope - sendSlope)\n", + port_transmit_rate_kbps); + return -EINVAL; + } + } else { + ptr = 0; } mode_to_use = priv->plat->tx_queues_cfg[queue].mode_to_use; From 333c0a1f7d5bb506d169d56005299e2b1c07ff8e Mon Sep 17 00:00:00 2001 From: Heng Qi Date: Mon, 17 Jun 2024 21:15:23 +0800 Subject: [PATCH 1218/1565] virtio_net: checksum offloading handling fix [ Upstream commit 604141c036e1b636e2a71cf6e1aa09d1e45f40c2 ] In virtio spec 0.95, VIRTIO_NET_F_GUEST_CSUM was designed to handle partially checksummed packets, and the validation of fully checksummed packets by the device is independent of VIRTIO_NET_F_GUEST_CSUM negotiation. However, the specification erroneously stated: "If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device MUST set flags to zero and SHOULD supply a fully checksummed packet to the driver." This statement is inaccurate because even without VIRTIO_NET_F_GUEST_CSUM negotiation, the device can still set the VIRTIO_NET_HDR_F_DATA_VALID flag. Essentially, the device can facilitate the validation of these packets' checksums - a process known as RX checksum offloading - removing the need for the driver to do so. This scenario is currently not implemented in the driver and requires correction. The necessary specification correction[1] has been made and approved in the virtio TC vote. [1] https://lists.oasis-open.org/archives/virtio-comment/202401/msg00011.html Fixes: 4f49129be6fa ("virtio-net: Set RXCSUM feature if GUEST_CSUM is available") Signed-off-by: Heng Qi Reviewed-by: Jiri Pirko Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/virtio_net.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 4029c56dfcf0..f7ed99561c19 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3112,8 +3112,16 @@ static int virtnet_probe(struct virtio_device *vdev) dev->features |= dev->hw_features & NETIF_F_ALL_TSO; /* (!csum && gso) case will be fixed by register_netdev() */ } - if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM)) - dev->features |= NETIF_F_RXCSUM; + + /* 1. With VIRTIO_NET_F_GUEST_CSUM negotiation, the driver doesn't + * need to calculate checksums for partially checksummed packets, + * as they're considered valid by the upper layer. + * 2. Without VIRTIO_NET_F_GUEST_CSUM negotiation, the driver only + * receives fully checksummed packets. The device may assist in + * validating these packets' checksums, so the driver won't have to. + */ + dev->features |= NETIF_F_RXCSUM; + if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) || virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) dev->features |= NETIF_F_GRO_HW; From 72d9611968867cc4c5509e7708b1507d692b797a Mon Sep 17 00:00:00 2001 From: Jozsef Kadlecsik Date: Mon, 17 Jun 2024 11:18:15 +0200 Subject: [PATCH 1219/1565] netfilter: ipset: Fix suspicious rcu_dereference_protected() [ Upstream commit 8ecd06277a7664f4ef018abae3abd3451d64e7a6 ] When destroying all sets, we are either in pernet exit phase or are executing a "destroy all sets command" from userspace. The latter was taken into account in ip_set_dereference() (nfnetlink mutex is held), but the former was not. The patch adds the required check to rcu_dereference_protected() in ip_set_dereference(). Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type") Reported-by: syzbot+b62c37cdd58103293a5a@syzkaller.appspotmail.com Reported-by: syzbot+cfbe1da5fdfc39efc293@syzkaller.appspotmail.com Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com Signed-off-by: Jozsef Kadlecsik Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/ipset/ip_set_core.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index ecd693fca224..bac92369a543 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -53,12 +53,13 @@ MODULE_DESCRIPTION("core IP set support"); MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET); /* When the nfnl mutex or ip_set_ref_lock is held: */ -#define ip_set_dereference(p) \ - rcu_dereference_protected(p, \ +#define ip_set_dereference(inst) \ + rcu_dereference_protected((inst)->ip_set_list, \ lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \ - lockdep_is_held(&ip_set_ref_lock)) + lockdep_is_held(&ip_set_ref_lock) || \ + (inst)->is_deleted) #define ip_set(inst, id) \ - ip_set_dereference((inst)->ip_set_list)[id] + ip_set_dereference(inst)[id] #define ip_set_ref_netlink(inst,id) \ rcu_dereference_raw((inst)->ip_set_list)[id] #define ip_set_dereference_nfnl(p) \ @@ -1137,7 +1138,7 @@ static int ip_set_create(struct net *net, struct sock *ctnl, if (!list) goto cleanup; /* nfnl mutex is held, both lists are valid */ - tmp = ip_set_dereference(inst->ip_set_list); + tmp = ip_set_dereference(inst); memcpy(list, tmp, sizeof(struct ip_set *) * inst->ip_set_max); rcu_assign_pointer(inst->ip_set_list, list); /* Make sure all current packets have passed through */ From 6a0f5d540f0f9635b5ac6fc4bdb78d40302de32d Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Wed, 19 Jun 2024 15:28:03 +0200 Subject: [PATCH 1220/1565] net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings [ Upstream commit fba383985354e83474f95f36d7c65feb75dba19d ] This functions retrieves values by passing a pointer. As the function that retrieves them can fail before touching the pointers, the variables must be initialized. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+5186630949e3c55f0799@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum Link: https://lore.kernel.org/r/20240619132816.11526-1-oneukum@suse.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/rtl8150.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/rtl8150.c b/drivers/net/usb/rtl8150.c index bf8a60533f3e..d128b4ac7c9f 100644 --- a/drivers/net/usb/rtl8150.c +++ b/drivers/net/usb/rtl8150.c @@ -778,7 +778,8 @@ static int rtl8150_get_link_ksettings(struct net_device *netdev, struct ethtool_link_ksettings *ecmd) { rtl8150_t *dev = netdev_priv(netdev); - short lpa, bmcr; + short lpa = 0; + short bmcr = 0; u32 supported; supported = (SUPPORTED_10baseT_Half | From 97509608b7e49298717112db2388d8452ebb2128 Mon Sep 17 00:00:00 2001 From: Biju Das Date: Mon, 10 Jun 2024 20:55:32 +0100 Subject: [PATCH 1221/1565] regulator: core: Fix modpost error "regulator_get_regmap" undefined [ Upstream commit 3f60497c658d2072714d097a177612d34b34aa3d ] Fix the modpost error "regulator_get_regmap" undefined by adding export symbol. Fixes: 04eca28cde52 ("regulator: Add helpers for low-level register access") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202406110117.mk5UR3VZ-lkp@intel.com Signed-off-by: Biju Das Link: https://lore.kernel.org/r/20240610195532.175942-1-biju.das.jz@bp.renesas.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/regulator/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c index 2d1a23b9eae3..7082cffdd10e 100644 --- a/drivers/regulator/core.c +++ b/drivers/regulator/core.c @@ -3185,6 +3185,7 @@ struct regmap *regulator_get_regmap(struct regulator *regulator) return map ? map : ERR_PTR(-EOPNOTSUPP); } +EXPORT_SYMBOL_GPL(regulator_get_regmap); /** * regulator_get_hardware_vsel_register - get the HW voltage selector register From d293db11cb9c177abe13b6fce306aa762d454796 Mon Sep 17 00:00:00 2001 From: Qing Wang Date: Thu, 7 Oct 2021 20:28:28 -0700 Subject: [PATCH 1222/1565] dmaengine: ioat: switch from 'pci_' to 'dma_' API [ Upstream commit 0c5afef7bf1fbda7e7883dc4b93f64f90003706f ] The wrappers in include/linux/pci-dma-compat.h should go away. pci_set_dma_mask()/pci_set_consistent_dma_mask() should be replaced with dma_set_mask()/dma_set_coherent_mask(), and use dma_set_mask_and_coherent() for both. Signed-off-by: Qing Wang Link: https://lore.kernel.org/r/1633663733-47199-3-git-send-email-wangqing@vivo.com Signed-off-by: Vinod Koul Stable-dep-of: 1b11b4ef6bd6 ("dmaengine: ioatdma: Fix leaking on version mismatch") Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index 191b59279007..373b8dac6c9b 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -1363,15 +1363,9 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (!iomap) return -ENOMEM; - err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64)); + err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); if (err) - err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32)); - if (err) - return err; - - err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); - if (err) - err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); + err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)); if (err) return err; From f99b00ed9b928a61144abdad2730de495c63a66f Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Tue, 7 Mar 2023 13:26:54 -0600 Subject: [PATCH 1223/1565] dmaengine: ioat: Drop redundant pci_enable_pcie_error_reporting() [ Upstream commit e32622f84ae289dc7a04e9f01cd62cb914fdc5c6 ] pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver. Signed-off-by: Bjorn Helgaas Acked-by: Dave Jiang Link: https://lore.kernel.org/r/20230307192655.874008-2-helgaas@kernel.org Signed-off-by: Vinod Koul Stable-dep-of: 1b11b4ef6bd6 ("dmaengine: ioatdma: Fix leaking on version mismatch") Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index 373b8dac6c9b..783d4e740f11 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -15,7 +15,6 @@ #include #include #include -#include #include #include "dma.h" #include "registers.h" @@ -1382,15 +1381,11 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (is_skx_ioat(pdev)) device->version = IOAT_VER_3_2; err = ioat3_dma_probe(device, ioat_dca_enabled); - - if (device->version >= IOAT_VER_3_3) - pci_enable_pcie_error_reporting(pdev); } else return -ENODEV; if (err) { dev_err(dev, "Intel(R) I/OAT DMA Engine init failed\n"); - pci_disable_pcie_error_reporting(pdev); return -ENODEV; } @@ -1413,7 +1408,6 @@ static void ioat_remove(struct pci_dev *pdev) device->dca = NULL; } - pci_disable_pcie_error_reporting(pdev); ioat_dma_remove(device); } From c017a8e3e30c663054e5d087e2d87ef68747bda6 Mon Sep 17 00:00:00 2001 From: Nikita Shubin Date: Tue, 28 May 2024 09:09:23 +0300 Subject: [PATCH 1224/1565] dmaengine: ioatdma: Fix leaking on version mismatch [ Upstream commit 1b11b4ef6bd68591dcaf8423c7d05e794e6aec6f ] Fix leaking ioatdma_device if I/OAT version is less than IOAT_VER_3_0. Fixes: bf453a0a18b2 ("dmaengine: ioat: Support in-use unbind") Signed-off-by: Nikita Shubin Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/20240528-ioatdma-fixes-v2-1-a9f2fbe26ab1@yadro.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index 783d4e740f11..6c27980a5ec8 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -1349,6 +1349,7 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) void __iomem * const *iomap; struct device *dev = &pdev->dev; struct ioatdma_device *device; + u8 version; int err; err = pcim_enable_device(pdev); @@ -1362,6 +1363,10 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) if (!iomap) return -ENOMEM; + version = readb(iomap[IOAT_MMIO_BAR] + IOAT_VER_OFFSET); + if (version < IOAT_VER_3_0) + return -ENODEV; + err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); if (err) err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)); @@ -1374,16 +1379,14 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_master(pdev); pci_set_drvdata(pdev, device); - device->version = readb(device->reg_base + IOAT_VER_OFFSET); + device->version = version; if (device->version >= IOAT_VER_3_4) ioat_dca_enabled = 0; - if (device->version >= IOAT_VER_3_0) { - if (is_skx_ioat(pdev)) - device->version = IOAT_VER_3_2; - err = ioat3_dma_probe(device, ioat_dca_enabled); - } else - return -ENODEV; + if (is_skx_ioat(pdev)) + device->version = IOAT_VER_3_2; + + err = ioat3_dma_probe(device, ioat_dca_enabled); if (err) { dev_err(dev, "Intel(R) I/OAT DMA Engine init failed\n"); return -ENODEV; From a486fca282a9fb3b95e46002acb2bdc2744e9bdf Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Tue, 7 Mar 2023 15:46:15 -0600 Subject: [PATCH 1225/1565] dmaengine: ioat: use PCI core macros for PCIe Capability [ Upstream commit 8f6707d0773be31972768abd6e0bf7b8515b5b1a ] The PCIe Capability is defined by the PCIe spec, so use the PCI_EXP_DEVCTL macros defined by the PCI core instead of defining copies in IOAT. This makes it easier to find all uses of the PCIe Device Control register. No functional change intended. Signed-off-by: Bjorn Helgaas Acked-by: Dave Jiang Link: https://lore.kernel.org/r/20230307214615.887354-1-helgaas@kernel.org Signed-off-by: Vinod Koul Stable-dep-of: f0dc9fda2e0e ("dmaengine: ioatdma: Fix error path in ioat3_dma_probe()") Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 6 +++--- drivers/dma/ioat/registers.h | 7 ------- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index 6c27980a5ec8..ed4910e3bc2a 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -1190,13 +1190,13 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca) ioat_dma->dca = ioat_dca_init(pdev, ioat_dma->reg_base); /* disable relaxed ordering */ - err = pcie_capability_read_word(pdev, IOAT_DEVCTRL_OFFSET, &val16); + err = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &val16); if (err) return pcibios_err_to_errno(err); /* clear relaxed ordering enable */ - val16 &= ~IOAT_DEVCTRL_ROE; - err = pcie_capability_write_word(pdev, IOAT_DEVCTRL_OFFSET, val16); + val16 &= ~PCI_EXP_DEVCTL_RELAX_EN; + err = pcie_capability_write_word(pdev, PCI_EXP_DEVCTL, val16); if (err) return pcibios_err_to_errno(err); diff --git a/drivers/dma/ioat/registers.h b/drivers/dma/ioat/registers.h index f55a5f92f185..54cf0ad39887 100644 --- a/drivers/dma/ioat/registers.h +++ b/drivers/dma/ioat/registers.h @@ -14,13 +14,6 @@ #define IOAT_PCI_CHANERR_INT_OFFSET 0x180 #define IOAT_PCI_CHANERRMASK_INT_OFFSET 0x184 -/* PCIe config registers */ - -/* EXPCAPID + N */ -#define IOAT_DEVCTRL_OFFSET 0x8 -/* relaxed ordering enable */ -#define IOAT_DEVCTRL_ROE 0x10 - /* MMIO Device Registers */ #define IOAT_CHANCNT_OFFSET 0x00 /* 8-bit */ From 768ae5e02551b38bc604bb0aca4bfba90aeaa9dc Mon Sep 17 00:00:00 2001 From: Nikita Shubin Date: Tue, 28 May 2024 09:09:24 +0300 Subject: [PATCH 1226/1565] dmaengine: ioatdma: Fix error path in ioat3_dma_probe() [ Upstream commit f0dc9fda2e0ee9e01496c2f5aca3a831131fad79 ] Make sure we are disabling interrupts and destroying DMA pool if pcie_capability_read/write_word() call failed. Fixes: 511deae0261c ("dmaengine: ioatdma: disable relaxed ordering for ioatdma") Signed-off-by: Nikita Shubin Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/20240528-ioatdma-fixes-v2-2-a9f2fbe26ab1@yadro.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 33 +++++++++++++++------------------ 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index ed4910e3bc2a..ceba1b4083a9 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -534,18 +534,6 @@ static int ioat_probe(struct ioatdma_device *ioat_dma) return err; } -static int ioat_register(struct ioatdma_device *ioat_dma) -{ - int err = dma_async_device_register(&ioat_dma->dma_dev); - - if (err) { - ioat_disable_interrupts(ioat_dma); - dma_pool_destroy(ioat_dma->completion_pool); - } - - return err; -} - static void ioat_dma_remove(struct ioatdma_device *ioat_dma) { struct dma_device *dma = &ioat_dma->dma_dev; @@ -1180,9 +1168,9 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca) ioat_chan->reg_base + IOAT_DCACTRL_OFFSET); } - err = ioat_register(ioat_dma); + err = dma_async_device_register(&ioat_dma->dma_dev); if (err) - return err; + goto err_disable_interrupts; ioat_kobject_add(ioat_dma, &ioat_ktype); @@ -1191,20 +1179,29 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca) /* disable relaxed ordering */ err = pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &val16); - if (err) - return pcibios_err_to_errno(err); + if (err) { + err = pcibios_err_to_errno(err); + goto err_disable_interrupts; + } /* clear relaxed ordering enable */ val16 &= ~PCI_EXP_DEVCTL_RELAX_EN; err = pcie_capability_write_word(pdev, PCI_EXP_DEVCTL, val16); - if (err) - return pcibios_err_to_errno(err); + if (err) { + err = pcibios_err_to_errno(err); + goto err_disable_interrupts; + } if (ioat_dma->cap & IOAT_CAP_DPS) writeb(ioat_pending_level + 1, ioat_dma->reg_base + IOAT_PREFETCH_LIMIT_OFFSET); return 0; + +err_disable_interrupts: + ioat_disable_interrupts(ioat_dma); + dma_pool_destroy(ioat_dma->completion_pool); + return err; } static void ioat_shutdown(struct pci_dev *pdev) From 34cc20a5441dd0b6a94f045d346a53bbc2bbf32a Mon Sep 17 00:00:00 2001 From: Nikita Shubin Date: Tue, 28 May 2024 09:09:25 +0300 Subject: [PATCH 1227/1565] dmaengine: ioatdma: Fix kmemleak in ioat_pci_probe() [ Upstream commit 29b7cd255f3628e0d65be33a939d8b5bba10aa62 ] If probing fails we end up with leaking ioatdma_device and each allocated channel. Following kmemleak easy to reproduce by injecting an error in ioat_alloc_chan_resources() when doing ioat_dma_self_test(). unreferenced object 0xffff888014ad5800 (size 1024): [..] [] kmemleak_alloc+0x4a/0x80 [] kmalloc_trace+0x270/0x2f0 [] ioat_pci_probe+0xc1/0x1c0 [ioatdma] [..] repeated for each ioatdma channel: unreferenced object 0xffff8880148e5c00 (size 512): [..] [] kmemleak_alloc+0x4a/0x80 [] kmalloc_trace+0x270/0x2f0 [] ioat_enumerate_channels+0x101/0x2d0 [ioatdma] [] ioat3_dma_probe+0x4d6/0x970 [ioatdma] [] ioat_pci_probe+0x181/0x1c0 [ioatdma] [..] Fixes: bf453a0a18b2 ("dmaengine: ioat: Support in-use unbind") Signed-off-by: Nikita Shubin Reviewed-by: Dave Jiang Link: https://lore.kernel.org/r/20240528-ioatdma-fixes-v2-3-a9f2fbe26ab1@yadro.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index ceba1b4083a9..8a115bafab6a 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -1346,6 +1346,7 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) void __iomem * const *iomap; struct device *dev = &pdev->dev; struct ioatdma_device *device; + unsigned int i; u8 version; int err; @@ -1385,6 +1386,9 @@ static int ioat_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) err = ioat3_dma_probe(device, ioat_dca_enabled); if (err) { + for (i = 0; i < IOAT_MAX_CHANS; i++) + kfree(device->idx[i]); + kfree(device); dev_err(dev, "Intel(R) I/OAT DMA Engine init failed\n"); return -ENODEV; } From f3d17826d6b6bba58c3c86cf81312e2af6d4df66 Mon Sep 17 00:00:00 2001 From: Nikita Shubin Date: Tue, 14 May 2024 13:52:31 +0300 Subject: [PATCH 1228/1565] dmaengine: ioatdma: Fix missing kmem_cache_destroy() [ Upstream commit 5422145d0b749ad554ada772133b9b20f9fb0ec8 ] Fix missing kmem_cache_destroy() for ioat_sed_cache in ioat_exit_module(). Noticed via: ``` modprobe ioatdma rmmod ioatdma modprobe ioatdma debugfs: Directory 'ioat_sed_ent' with parent 'slab' already present! ``` Fixes: c0f28ce66ecf ("dmaengine: ioatdma: move all the init routines") Signed-off-by: Nikita Shubin Acked-by: Dave Jiang Link: https://lore.kernel.org/r/20240514-ioatdma_fixes-v1-1-2776a0913254@yadro.com Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin --- drivers/dma/ioat/init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c index 8a115bafab6a..2a47b9053bad 100644 --- a/drivers/dma/ioat/init.c +++ b/drivers/dma/ioat/init.c @@ -1450,6 +1450,7 @@ module_init(ioat_init_module); static void __exit ioat_exit_module(void) { pci_unregister_driver(&ioat_pci_driver); + kmem_cache_destroy(ioat_sed_cache); kmem_cache_destroy(ioat_cache); } module_exit(ioat_exit_module); From 6eca23100e9030725f69c1babacd58803f29ec8d Mon Sep 17 00:00:00 2001 From: Raju Rangoju Date: Fri, 14 Jun 2024 19:31:49 +0530 Subject: [PATCH 1229/1565] ACPICA: Revert "ACPICA: avoid Info: mapping multiple BARs. Your kernel is fine." [ Upstream commit a83e1385b780d41307433ddbc86e3c528db031f0 ] Undo the modifications made in commit d410ee5109a1 ("ACPICA: avoid "Info: mapping multiple BARs. Your kernel is fine.""). The initial purpose of this commit was to stop memory mappings for operation regions from overlapping page boundaries, as it can trigger warnings if different page attributes are present. However, it was found that when this situation arises, mapping continues until the boundary's end, but there is still an attempt to read/write the entire length of the map, leading to a NULL pointer deference. For example, if a four-byte mapping request is made but only one byte is mapped because it hits the current page boundary's end, a four-byte read/write attempt is still made, resulting in a NULL pointer deference. Instead, map the entire length, as the ACPI specification does not mandate that it must be within the same page boundary. It is permissible for it to be mapped across different regions. Link: https://github.com/acpica/acpica/pull/954 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218849 Fixes: d410ee5109a1 ("ACPICA: avoid "Info: mapping multiple BARs. Your kernel is fine."") Co-developed-by: Sanath S Signed-off-by: Sanath S Signed-off-by: Raju Rangoju Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/acpica/exregion.c | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/acpi/acpica/exregion.c b/drivers/acpi/acpica/exregion.c index 4914dbc44517..5bbbd015de5a 100644 --- a/drivers/acpi/acpica/exregion.c +++ b/drivers/acpi/acpica/exregion.c @@ -44,7 +44,6 @@ acpi_ex_system_memory_space_handler(u32 function, struct acpi_mem_mapping *mm = mem_info->cur_mm; u32 length; acpi_size map_length; - acpi_size page_boundary_map_length; #ifdef ACPI_MISALIGNMENT_NOT_SUPPORTED u32 remainder; #endif @@ -138,26 +137,8 @@ acpi_ex_system_memory_space_handler(u32 function, map_length = (acpi_size) ((mem_info->address + mem_info->length) - address); - /* - * If mapping the entire remaining portion of the region will cross - * a page boundary, just map up to the page boundary, do not cross. - * On some systems, crossing a page boundary while mapping regions - * can cause warnings if the pages have different attributes - * due to resource management. - * - * This has the added benefit of constraining a single mapping to - * one page, which is similar to the original code that used a 4k - * maximum window. - */ - page_boundary_map_length = (acpi_size) - (ACPI_ROUND_UP(address, ACPI_DEFAULT_PAGE_SIZE) - address); - if (page_boundary_map_length == 0) { - page_boundary_map_length = ACPI_DEFAULT_PAGE_SIZE; - } - - if (map_length > page_boundary_map_length) { - map_length = page_boundary_map_length; - } + if (map_length > ACPI_DEFAULT_PAGE_SIZE) + map_length = ACPI_DEFAULT_PAGE_SIZE; /* Create a new mapping starting at the address given */ From 7186b81c1f15e39069b1af172c6a951728ed3511 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Tue, 28 May 2024 15:52:56 +0300 Subject: [PATCH 1230/1565] RDMA/mlx5: Add check for srq max_sge attribute [ Upstream commit 36ab7ada64caf08f10ee5a114d39964d1f91e81d ] max_sge attribute is passed by the user, and is inserted and used unchecked, so verify that the value doesn't exceed maximum allowed value before using it. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad Link: https://lore.kernel.org/r/277ccc29e8d57bfd53ddeb2ac633f2760cf8cdd0.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/hw/mlx5/srq.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index e2f720eec1e1..56ce0fea00e9 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -225,12 +225,15 @@ int mlx5_ib_create_srq(struct ib_srq *ib_srq, int err; struct mlx5_srq_attr in = {}; __u32 max_srq_wqes = 1 << MLX5_CAP_GEN(dev->mdev, log_max_srq_sz); + __u32 max_sge_sz = MLX5_CAP_GEN(dev->mdev, max_wqe_sz_rq) / + sizeof(struct mlx5_wqe_data_seg); - /* Sanity check SRQ size before proceeding */ - if (init_attr->attr.max_wr >= max_srq_wqes) { - mlx5_ib_dbg(dev, "max_wr %d, cap %d\n", - init_attr->attr.max_wr, - max_srq_wqes); + /* Sanity check SRQ and sge size before proceeding */ + if (init_attr->attr.max_wr >= max_srq_wqes || + init_attr->attr.max_sge > max_sge_sz) { + mlx5_ib_dbg(dev, "max_wr %d,wr_cap %d,max_sge %d, sge_cap:%d\n", + init_attr->attr.max_wr, max_srq_wqes, + init_attr->attr.max_sge, max_sge_sz); return -EINVAL; } From 71bea3e64879f762260f20b2b153d0da3d78b89c Mon Sep 17 00:00:00 2001 From: Edson Juliano Drosdeck Date: Wed, 5 Jun 2024 12:39:23 -0300 Subject: [PATCH 1231/1565] ALSA: hda/realtek: Limit mic boost on N14AP7 commit 86a433862912f52597263aa224a9ed82bcd533bf upstream. The internal mic boost on the N14AP7 is too high. Fix this by applying the ALC269_FIXUP_LIMIT_INT_MIC_BOOST fixup to the machine to limit the gain. Signed-off-by: Edson Juliano Drosdeck Cc: Link: https://lore.kernel.org/r/20240605153923.2837-1-edson.drosdeck@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 4af809493805..5c5a144e707f 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9380,6 +9380,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1b7d, 0xa831, "Ordissimo EVE2 ", ALC269VB_FIXUP_ORDISSIMO_EVE2), /* Also known as Malata PC-B1303 */ SND_PCI_QUIRK(0x1c06, 0x2013, "Lemote A1802", ALC269_FIXUP_LEMOTE_A1802), SND_PCI_QUIRK(0x1c06, 0x2015, "Lemote A190X", ALC269_FIXUP_LEMOTE_A190X), + SND_PCI_QUIRK(0x1c6c, 0x122a, "Positivo N14AP7", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x1c6c, 0x1251, "Positivo N14KP6-TG", ALC288_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1d05, 0x1132, "TongFang PHxTxX1", ALC256_FIXUP_SET_COEF_DEFAULTS), SND_PCI_QUIRK(0x1d05, 0x1096, "TongFang GMxMRxx", ALC269_FIXUP_NO_SHUTUP), From febe794b83693257f21a23d2e03ea695a62449c8 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 20 May 2024 09:11:45 -0400 Subject: [PATCH 1232/1565] drm/radeon: fix UBSAN warning in kv_dpm.c commit a498df5421fd737d11bfd152428ba6b1c8538321 upstream. Adds bounds check for sumo_vid_mapping_entry. Reviewed-by: Mario Limonciello Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/radeon/sumo_dpm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/radeon/sumo_dpm.c b/drivers/gpu/drm/radeon/sumo_dpm.c index 45d04996adf5..a80e2edb7c0f 100644 --- a/drivers/gpu/drm/radeon/sumo_dpm.c +++ b/drivers/gpu/drm/radeon/sumo_dpm.c @@ -1621,6 +1621,8 @@ void sumo_construct_vid_mapping_table(struct radeon_device *rdev, for (i = 0; i < SUMO_MAX_HARDWARE_POWERLEVELS; i++) { if (table[i].ulSupportedSCLK != 0) { + if (table[i].usVoltageIndex >= SUMO_MAX_NUMBER_VOLTAGES) + continue; vid_mapping_table->entries[table[i].usVoltageIndex].vid_7bit = table[i].usVoltageID; vid_mapping_table->entries[table[i].usVoltageIndex].vid_2bit = From b1684798a300ecc1cfcae4c642799ce5ed3cec2e Mon Sep 17 00:00:00 2001 From: Peter Oberparleiter Date: Mon, 10 Jun 2024 11:27:43 +0200 Subject: [PATCH 1233/1565] gcov: add support for GCC 14 commit c1558bc57b8e5b4da5d821537cd30e2e660861d8 upstream. Using gcov on kernels compiled with GCC 14 results in truncated 16-byte long .gcda files with no usable data. To fix this, update GCOV_COUNTERS to match the value defined by GCC 14. Tested with GCC versions 14.1.0 and 13.2.0. Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com Signed-off-by: Peter Oberparleiter Reported-by: Allison Henderson Reported-by: Chuck Lever III Tested-by: Chuck Lever Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- kernel/gcov/gcc_4_7.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/gcov/gcc_4_7.c b/kernel/gcov/gcc_4_7.c index 04880d8fba25..42de79cd1a67 100644 --- a/kernel/gcov/gcc_4_7.c +++ b/kernel/gcov/gcc_4_7.c @@ -19,7 +19,9 @@ #include #include "gcov.h" -#if (__GNUC__ >= 10) +#if (__GNUC__ >= 14) +#define GCOV_COUNTERS 9 +#elif (__GNUC__ >= 10) #define GCOV_COUNTERS 8 #elif (__GNUC__ >= 7) #define GCOV_COUNTERS 9 From 7879b54f0b9046226f1b52e3aa9a06724b54b413 Mon Sep 17 00:00:00 2001 From: Aleksandr Nogikh Date: Tue, 11 Jun 2024 15:32:29 +0200 Subject: [PATCH 1234/1565] kcov: don't lose track of remote references during softirqs commit 01c8f9806bde438ca1c8cbbc439f0a14a6694f6c upstream. In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV metadata of the current task into a per-CPU variable. However, the kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote KCOV objects. If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens to get interrupted and kcov_remote_start() is called, it ultimately leads to kcov_remote_stop() NOT restoring the original KCOV reference. So when the task exits, all registered remote KCOV handles remain active forever. The most uncomfortable effect (at least for syzkaller) is that the bug prevents the reuse of the same /sys/kernel/debug/kcov descriptor. If we obtain it in the parent process and then e.g. drop some capabilities and continuously fork to execute individual programs, at some point current->kcov of the forked process is lost, kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls calls from subsequent forks fail. And, yes, the efficiency is also affected if we keep on losing remote kcov objects. a) kcov_remote_map keeps on growing forever. b) (If I'm not mistaken), we're also not freeing the memory referenced by kcov->area. Fix it by introducing a special kcov_mode that is assigned to the task that owns a KCOV remote object. It makes kcov_mode_enabled() return true and yet does not trigger coverage collection in __sanitizer_cov_trace_pc() and write_comp_data(). [nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment] Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts") Signed-off-by: Aleksandr Nogikh Reviewed-by: Dmitry Vyukov Reviewed-by: Andrey Konovalov Tested-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Arnd Bergmann Cc: Marco Elver Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/kcov.h | 2 ++ kernel/kcov.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/kcov.h b/include/linux/kcov.h index b48128b717f1..84535b168762 100644 --- a/include/linux/kcov.h +++ b/include/linux/kcov.h @@ -21,6 +21,8 @@ enum kcov_mode { KCOV_MODE_TRACE_PC = 2, /* Collecting comparison operands mode. */ KCOV_MODE_TRACE_CMP = 3, + /* The process owns a KCOV remote reference. */ + KCOV_MODE_REMOTE = 4, }; #define KCOV_IN_CTXSW (1 << 30) diff --git a/kernel/kcov.c b/kernel/kcov.c index 6b8368be89c8..ec2fd698ebf5 100644 --- a/kernel/kcov.c +++ b/kernel/kcov.c @@ -635,6 +635,7 @@ static int kcov_ioctl_locked(struct kcov *kcov, unsigned int cmd, return -EINVAL; kcov->mode = mode; t->kcov = kcov; + t->kcov_mode = KCOV_MODE_REMOTE; kcov->t = t; kcov->remote = true; kcov->remote_size = remote_arg->area_size; From 7884f4afeccb501ac581329bfda1eb2a2c65bc4f Mon Sep 17 00:00:00 2001 From: Grygorii Tertychnyi Date: Mon, 20 May 2024 17:39:32 +0200 Subject: [PATCH 1235/1565] i2c: ocores: set IACK bit after core is enabled commit 5a72477273066b5b357801ab2d315ef14949d402 upstream. Setting IACK bit when core is disabled does not clear the "Interrupt Flag" bit in the status register, and the interrupt remains pending. Sometimes it causes failure for the very first message transfer, that is usually a device probe. Hence, set IACK bit after core is enabled to clear pending interrupt. Fixes: 18f98b1e3147 ("[PATCH] i2c: New bus driver for the OpenCores I2C controller") Signed-off-by: Grygorii Tertychnyi Acked-by: Peter Korsgaard Cc: stable@vger.kernel.org Signed-off-by: Andi Shyti Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-ocores.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c index 71e26aa6bd8f..5e2e8bd2d055 100644 --- a/drivers/i2c/busses/i2c-ocores.c +++ b/drivers/i2c/busses/i2c-ocores.c @@ -443,8 +443,8 @@ static int ocores_init(struct device *dev, struct ocores_i2c *i2c) oc_setreg(i2c, OCI2C_PREHIGH, prescale >> 8); /* Init the device */ - oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_IACK); oc_setreg(i2c, OCI2C_CONTROL, ctrl | OCI2C_CTRL_EN); + oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_IACK); return 0; } From 5d7fef7522b1f7617d92744c38ecaf9cd0a68417 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Thu, 20 Jun 2024 13:34:50 +0200 Subject: [PATCH 1236/1565] dt-bindings: i2c: google,cros-ec-i2c-tunnel: correct path to i2c-controller schema commit 5c8cfd592bb7632200b4edac8f2c7ec892ed9d81 upstream. The referenced i2c-controller.yaml schema is provided by dtschema package (outside of Linux kernel), so use full path to reference it. Cc: stable@vger.kernel.org Fixes: 1acd4577a66f ("dt-bindings: i2c: convert i2c-cros-ec-tunnel to json-schema") Signed-off-by: Krzysztof Kozlowski Reviewed-by: Conor Dooley Signed-off-by: Andi Shyti Signed-off-by: Greg Kroah-Hartman --- .../devicetree/bindings/i2c/google,cros-ec-i2c-tunnel.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/i2c/google,cros-ec-i2c-tunnel.yaml b/Documentation/devicetree/bindings/i2c/google,cros-ec-i2c-tunnel.yaml index b386e4128a79..d3e01d8c043c 100644 --- a/Documentation/devicetree/bindings/i2c/google,cros-ec-i2c-tunnel.yaml +++ b/Documentation/devicetree/bindings/i2c/google,cros-ec-i2c-tunnel.yaml @@ -22,7 +22,7 @@ description: | google,cros-ec-spi or google,cros-ec-i2c. allOf: - - $ref: i2c-controller.yaml# + - $ref: /schemas/i2c/i2c-controller.yaml# properties: compatible: From cc255080c1c59f1749c7bedd4281bd809d49ee3a Mon Sep 17 00:00:00 2001 From: Martin Leung Date: Mon, 26 Feb 2024 13:20:08 -0500 Subject: [PATCH 1237/1565] drm/amd/display: revert Exit idle optimizations before HDCP execution commit f2703a3596a279b0be6eeed4c500bdbaa8dc3ce4 upstream. why and how: causes black screen on PNP on DCN 3.5 This reverts commit f30a3bea92bd ("drm/amd/display: Exit idle optimizations before HDCP execution") Cc: Mario Limonciello Cc: Alex Deucher Reviewed-by: Nicholas Kazlauskas Acked-by: Wayne Lin Signed-off-by: Martin Leung Tested-by: Daniel Wheeler Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c | 10 ---------- drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h | 8 -------- 2 files changed, 18 deletions(-) diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c index c39cb4b6767c..fa8aeec304ef 100644 --- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c +++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp.c @@ -86,14 +86,6 @@ static uint8_t is_cp_desired_hdcp2(struct mod_hdcp *hdcp) !hdcp->connection.is_hdcp2_revoked; } -static void exit_idle_optimizations(struct mod_hdcp *hdcp) -{ - struct mod_hdcp_dm *dm = &hdcp->config.dm; - - if (dm->funcs.exit_idle_optimizations) - dm->funcs.exit_idle_optimizations(dm->handle); -} - static enum mod_hdcp_status execution(struct mod_hdcp *hdcp, struct mod_hdcp_event_context *event_ctx, union mod_hdcp_transition_input *input) @@ -456,8 +448,6 @@ enum mod_hdcp_status mod_hdcp_process_event(struct mod_hdcp *hdcp, memset(&event_ctx, 0, sizeof(struct mod_hdcp_event_context)); event_ctx.event = event; - exit_idle_optimizations(hdcp); - /* execute and transition */ exec_status = execution(hdcp, &event_ctx, &hdcp->auth.trans_input); trans_status = transition( diff --git a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h b/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h index fb195276fb70..eed560eecbab 100644 --- a/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h +++ b/drivers/gpu/drm/amd/display/modules/inc/mod_hdcp.h @@ -143,13 +143,6 @@ struct mod_hdcp_ddc { } funcs; }; -struct mod_hdcp_dm { - void *handle; - struct { - void (*exit_idle_optimizations)(void *handle); - } funcs; -}; - struct mod_hdcp_psp { void *handle; void *funcs; @@ -259,7 +252,6 @@ struct mod_hdcp_display_query { struct mod_hdcp_config { struct mod_hdcp_psp psp; struct mod_hdcp_ddc ddc; - struct mod_hdcp_dm dm; uint8_t index; }; From b263a895d8a178eeb0987c062edfee45eb65135e Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 12 Mar 2024 19:31:02 +0100 Subject: [PATCH 1238/1565] ARM: dts: samsung: smdkv310: fix keypad no-autorepeat [ Upstream commit 87d8e522d6f5a004f0aa06c0def302df65aff296 ] Although the Samsung SoC keypad binding defined linux,keypad-no-autorepeat property, Linux driver never implemented it and always used linux,input-no-autorepeat. Correct the DTS to use property actually implemented. This also fixes dtbs_check errors like: exynos4210-smdkv310.dtb: keypad@100a0000: 'linux,keypad-no-autorepeat' does not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+' Cc: Fixes: 0561ceabd0f1 ("ARM: dts: Add intial dts file for EXYNOS4210 SoC, SMDKV310 and ORIGEN") Link: https://lore.kernel.org/r/20240312183105.715735-1-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski Signed-off-by: Sasha Levin --- arch/arm/boot/dts/exynos4210-smdkv310.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/exynos4210-smdkv310.dts b/arch/arm/boot/dts/exynos4210-smdkv310.dts index c5609afa6101..a6622a0d9b58 100644 --- a/arch/arm/boot/dts/exynos4210-smdkv310.dts +++ b/arch/arm/boot/dts/exynos4210-smdkv310.dts @@ -84,7 +84,7 @@ eeprom@52 { &keypad { samsung,keypad-num-rows = <2>; samsung,keypad-num-columns = <8>; - linux,keypad-no-autorepeat; + linux,input-no-autorepeat; wakeup-source; pinctrl-names = "default"; pinctrl-0 = <&keypad_rows &keypad_cols>; From 1fdaecc326f0cf8628474c87df47cf2b73f66728 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 12 Mar 2024 19:31:03 +0100 Subject: [PATCH 1239/1565] ARM: dts: samsung: exynos4412-origen: fix keypad no-autorepeat [ Upstream commit 88208d3cd79821117fd3fb80d9bcab618467d37b ] Although the Samsung SoC keypad binding defined linux,keypad-no-autorepeat property, Linux driver never implemented it and always used linux,input-no-autorepeat. Correct the DTS to use property actually implemented. This also fixes dtbs_check errors like: exynos4412-origen.dtb: keypad@100a0000: 'linux,keypad-no-autorepeat' does not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+' Cc: Fixes: bd08f6277e44 ("ARM: dts: Add keypad entries to Exynos4412 based Origen") Link: https://lore.kernel.org/r/20240312183105.715735-2-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski Signed-off-by: Sasha Levin --- arch/arm/boot/dts/exynos4412-origen.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/exynos4412-origen.dts b/arch/arm/boot/dts/exynos4412-origen.dts index e2d76ea4404e..7138bf291437 100644 --- a/arch/arm/boot/dts/exynos4412-origen.dts +++ b/arch/arm/boot/dts/exynos4412-origen.dts @@ -447,7 +447,7 @@ buck9_reg: BUCK9 { &keypad { samsung,keypad-num-rows = <3>; samsung,keypad-num-columns = <2>; - linux,keypad-no-autorepeat; + linux,input-no-autorepeat; wakeup-source; pinctrl-0 = <&keypad_rows &keypad_cols>; pinctrl-names = "default"; From 77942a027231ac7670788f26543a133d65b18e9c Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 12 Mar 2024 19:31:04 +0100 Subject: [PATCH 1240/1565] ARM: dts: samsung: smdk4412: fix keypad no-autorepeat [ Upstream commit 4ac4c1d794e7ff454d191bbdab7585ed8dbf3758 ] Although the Samsung SoC keypad binding defined linux,keypad-no-autorepeat property, Linux driver never implemented it and always used linux,input-no-autorepeat. Correct the DTS to use property actually implemented. This also fixes dtbs_check errors like: exynos4412-smdk4412.dtb: keypad@100a0000: 'key-A', 'key-B', 'key-C', 'key-D', 'key-E', 'linux,keypad-no-autorepeat' do not match any of the regexes: '^key-[0-9a-z]+$', 'pinctrl-[0-9]+' Cc: Fixes: c9b92dd70107 ("ARM: dts: Add keypad entries to SMDK4412") Link: https://lore.kernel.org/r/20240312183105.715735-3-krzysztof.kozlowski@linaro.org Signed-off-by: Krzysztof Kozlowski Signed-off-by: Sasha Levin --- arch/arm/boot/dts/exynos4412-smdk4412.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/exynos4412-smdk4412.dts b/arch/arm/boot/dts/exynos4412-smdk4412.dts index 49971203a8aa..6e6bef907a72 100644 --- a/arch/arm/boot/dts/exynos4412-smdk4412.dts +++ b/arch/arm/boot/dts/exynos4412-smdk4412.dts @@ -65,7 +65,7 @@ cooling_map1: map1 { &keypad { samsung,keypad-num-rows = <3>; samsung,keypad-num-columns = <8>; - linux,keypad-no-autorepeat; + linux,input-no-autorepeat; wakeup-source; pinctrl-0 = <&keypad_rows &keypad_cols>; pinctrl-names = "default"; From 33628b6ed3ccb6a9dffd4e2b7f76f9b13a657cfe Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 19 Nov 2021 11:22:33 -0800 Subject: [PATCH 1241/1565] rtlwifi: rtl8192de: Style clean-ups [ Upstream commit 69831173fcbbfebb7aa2d76523deaf0b87b8eddd ] Clean up some style issues: - Use ARRAY_SIZE() even though it's a u8 array. - Remove redundant CHANNEL_MAX_NUMBER_2G define. Additionally fix some dead code WARNs. Acked-by: Ping-Ke Shih Link: https://lore.kernel.org/lkml/57d0d1b6064342309f680f692192556c@realtek.com/ Signed-off-by: Kees Cook Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20211119192233.1021063-1-keescook@chromium.org Stable-dep-of: de4d4be4fa64 ("wifi: rtlwifi: rtl8192de: Fix 5 GHz TX power") Signed-off-by: Sasha Levin --- .../wireless/realtek/rtlwifi/rtl8192de/phy.c | 17 +++++++---------- drivers/net/wireless/realtek/rtlwifi/wifi.h | 1 - 2 files changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c index af0c7d74b3f5..da9acebbe237 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c @@ -892,7 +892,7 @@ static u8 _rtl92c_phy_get_rightchnlplace(u8 chnl) u8 place = chnl; if (chnl > 14) { - for (place = 14; place < sizeof(channel5g); place++) { + for (place = 14; place < ARRAY_SIZE(channel5g); place++) { if (channel5g[place] == chnl) { place++; break; @@ -1359,7 +1359,7 @@ u8 rtl92d_get_rightchnlplace_for_iqk(u8 chnl) u8 place = chnl; if (chnl > 14) { - for (place = 14; place < sizeof(channel_all); place++) { + for (place = 14; place < ARRAY_SIZE(channel_all); place++) { if (channel_all[place] == chnl) return place - 13; } @@ -2417,7 +2417,7 @@ static bool _rtl92d_is_legal_5g_channel(struct ieee80211_hw *hw, u8 channel) int i; - for (i = 0; i < sizeof(channel5g); i++) + for (i = 0; i < ARRAY_SIZE(channel5g); i++) if (channel == channel5g[i]) return true; return false; @@ -2681,9 +2681,8 @@ void rtl92d_phy_reset_iqk_result(struct ieee80211_hw *hw) u8 i; rtl_dbg(rtlpriv, COMP_INIT, DBG_LOUD, - "settings regs %d default regs %d\n", - (int)(sizeof(rtlphy->iqk_matrix) / - sizeof(struct iqk_matrix_regs)), + "settings regs %zu default regs %d\n", + ARRAY_SIZE(rtlphy->iqk_matrix), IQK_MATRIX_REG_NUM); /* 0xe94, 0xe9c, 0xea4, 0xeac, 0xeb4, 0xebc, 0xec4, 0xecc */ for (i = 0; i < IQK_MATRIX_SETTINGS_NUM; i++) { @@ -2850,16 +2849,14 @@ u8 rtl92d_phy_sw_chnl(struct ieee80211_hw *hw) case BAND_ON_5G: /* Get first channel error when change between * 5G and 2.4G band. */ - if (channel <= 14) + if (WARN_ONCE(channel <= 14, "rtl8192de: 5G but channel<=14\n")) return 0; - WARN_ONCE((channel <= 14), "rtl8192de: 5G but channel<=14\n"); break; case BAND_ON_2_4G: /* Get first channel error when change between * 5G and 2.4G band. */ - if (channel > 14) + if (WARN_ONCE(channel > 14, "rtl8192de: 2G but channel>14\n")) return 0; - WARN_ONCE((channel > 14), "rtl8192de: 2G but channel>14\n"); break; default: WARN_ONCE(true, "rtl8192de: Invalid WirelessMode(%#x)!!\n", diff --git a/drivers/net/wireless/realtek/rtlwifi/wifi.h b/drivers/net/wireless/realtek/rtlwifi/wifi.h index a89e232d6963..c997d8bfda97 100644 --- a/drivers/net/wireless/realtek/rtlwifi/wifi.h +++ b/drivers/net/wireless/realtek/rtlwifi/wifi.h @@ -108,7 +108,6 @@ #define CHANNEL_GROUP_IDX_5GM 6 #define CHANNEL_GROUP_IDX_5GH 9 #define CHANNEL_GROUP_MAX_5G 9 -#define CHANNEL_MAX_NUMBER_2G 14 #define AVG_THERMAL_NUM 8 #define AVG_THERMAL_NUM_88E 4 #define AVG_THERMAL_NUM_8723BE 4 From 5fe1b2c72e9edaa5bb64f50dca11bafd62bdc8cd Mon Sep 17 00:00:00 2001 From: Bitterblue Smith Date: Thu, 25 Apr 2024 21:09:21 +0300 Subject: [PATCH 1242/1565] wifi: rtlwifi: rtl8192de: Fix 5 GHz TX power [ Upstream commit de4d4be4fa64ed7b4aa1c613061015bd8fa98b24 ] Different channels have different TX power settings. rtl8192de is using the TX power setting from the wrong channel in the 5 GHz band because _rtl92c_phy_get_rightchnlplace expects an array which includes all the channel numbers, but it's using an array which includes only the 5 GHz channel numbers. Use the array channel_all (defined in rtl8192de/phy.c) instead of the incorrect channel5g (defined in core.c). Tested only with rtl8192du, which will use the same TX power code. Cc: stable@vger.kernel.org Signed-off-by: Bitterblue Smith Signed-off-by: Ping-Ke Shih Link: https://msgid.link/c7653517-cf88-4f57-b79a-8edb0a8b32f0@gmail.com Signed-off-by: Sasha Levin --- drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c index da9acebbe237..ffe6d243c764 100644 --- a/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c +++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192de/phy.c @@ -892,8 +892,8 @@ static u8 _rtl92c_phy_get_rightchnlplace(u8 chnl) u8 place = chnl; if (chnl > 14) { - for (place = 14; place < ARRAY_SIZE(channel5g); place++) { - if (channel5g[place] == chnl) { + for (place = 14; place < ARRAY_SIZE(channel_all); place++) { + if (channel_all[place] == chnl) { place++; break; } From f77c8a2ce21e6a11658be63995fad0171ff425a4 Mon Sep 17 00:00:00 2001 From: Tomi Valkeinen Date: Mon, 15 Apr 2024 19:00:23 +0300 Subject: [PATCH 1243/1565] pmdomain: ti-sci: Fix duplicate PD referrals [ Upstream commit 670c900f69645db394efb38934b3344d8804171a ] When the dts file has multiple referrers to a single PD (e.g. simple-framebuffer and dss nodes both point to the DSS power-domain) the ti-sci driver will create two power domains, both with the same ID, and that will cause problems as one of the power domains will hide the other one. Fix this checking if a PD with the ID has already been created, and only create a PD for new IDs. Fixes: efa5c01cd7ee ("soc: ti: ti_sci_pm_domains: switch to use multiple genpds instead of one") Signed-off-by: Tomi Valkeinen Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240415-ti-sci-pd-v1-1-a0e56b8ad897@ideasonboard.com Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin --- drivers/soc/ti/ti_sci_pm_domains.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/soc/ti/ti_sci_pm_domains.c b/drivers/soc/ti/ti_sci_pm_domains.c index a33ec7eaf23d..17984a7bffba 100644 --- a/drivers/soc/ti/ti_sci_pm_domains.c +++ b/drivers/soc/ti/ti_sci_pm_domains.c @@ -114,6 +114,18 @@ static const struct of_device_id ti_sci_pm_domain_matches[] = { }; MODULE_DEVICE_TABLE(of, ti_sci_pm_domain_matches); +static bool ti_sci_pm_idx_exists(struct ti_sci_genpd_provider *pd_provider, u32 idx) +{ + struct ti_sci_pm_domain *pd; + + list_for_each_entry(pd, &pd_provider->pd_list, node) { + if (pd->idx == idx) + return true; + } + + return false; +} + static int ti_sci_pm_domain_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -153,8 +165,14 @@ static int ti_sci_pm_domain_probe(struct platform_device *pdev) break; if (args.args_count >= 1 && args.np == dev->of_node) { - if (args.args[0] > max_id) + if (args.args[0] > max_id) { max_id = args.args[0]; + } else { + if (ti_sci_pm_idx_exists(pd_provider, args.args[0])) { + index++; + continue; + } + } pd = devm_kzalloc(dev, sizeof(*pd), GFP_KERNEL); if (!pd) From 90551062fd6978c7bc9c9d363b6cea4c9cf2129d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 6 May 2024 12:30:04 -0400 Subject: [PATCH 1244/1565] knfsd: LOOKUP can return an illegal error value [ Upstream commit e221c45da3770962418fb30c27d941bbc70d595a ] The 'NFS error' NFSERR_OPNOTSUPP is not described by any of the official NFS related RFCs, but appears to have snuck into some older .x files for NFSv2. Either way, it is not in RFC1094, RFC1813 or any of the NFSv4 RFCs, so should not be returned by the knfsd server, and particularly not by the "LOOKUP" operation. Instead, let's return NFSERR_STALE, which is more appropriate if the filesystem encodes the filehandle as FILEID_INVALID. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfsfh.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index db8d62632a5b..ae3323e0708d 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -573,7 +573,7 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, _fh_update(fhp, exp, dentry); if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { fh_put(fhp); - return nfserr_opnotsupp; + return nfserr_stale; } return 0; @@ -599,7 +599,7 @@ fh_update(struct svc_fh *fhp) _fh_update(fhp, fhp->fh_export, dentry); if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) - return nfserr_opnotsupp; + return nfserr_stale; return 0; out_bad: printk(KERN_ERR "fh_update: fh not verified!\n"); From 6337072467296defa7d1b0160c03de3d39b1d318 Mon Sep 17 00:00:00 2001 From: Vamshi Gajjela Date: Tue, 7 May 2024 14:07:41 -0700 Subject: [PATCH 1245/1565] spmi: hisi-spmi-controller: Do not override device identifier [ Upstream commit eda4923d78d634482227c0b189d9b7ca18824146 ] 'nr' member of struct spmi_controller, which serves as an identifier for the controller/bus. This value is a dynamic ID assigned in spmi_controller_alloc, and overriding it from the driver results in an ida_free error "ida_free called for id=xx which is not allocated". Signed-off-by: Vamshi Gajjela Fixes: 70f59c90c819 ("staging: spmi: add Hikey 970 SPMI controller driver") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240228185116.1269-1-vamshigajjela@google.com Signed-off-by: Stephen Boyd Link: https://lore.kernel.org/r/20240507210809.3479953-5-sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/staging/hikey9xx/hisi-spmi-controller.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/staging/hikey9xx/hisi-spmi-controller.c b/drivers/staging/hikey9xx/hisi-spmi-controller.c index 29f226503668..eee3dcf96ee7 100644 --- a/drivers/staging/hikey9xx/hisi-spmi-controller.c +++ b/drivers/staging/hikey9xx/hisi-spmi-controller.c @@ -303,7 +303,6 @@ static int spmi_controller_probe(struct platform_device *pdev) spin_lock_init(&spmi_controller->lock); - ctrl->nr = spmi_controller->channel; ctrl->dev.parent = pdev->dev.parent; ctrl->dev.of_node = of_node_get(pdev->dev.of_node); From 2c3d7b03b658dc8bfa6112b194b67b92a87e081b Mon Sep 17 00:00:00 2001 From: Matthew Mirvish Date: Thu, 9 May 2024 09:11:17 +0800 Subject: [PATCH 1246/1565] bcache: fix variable length array abuse in btree_iter [ Upstream commit 3a861560ccb35f2a4f0a4b8207fa7c2a35fc7f31 ] btree_iter is used in two ways: either allocated on the stack with a fixed size MAX_BSETS, or from a mempool with a dynamic size based on the specific cache set. Previously, the struct had a fixed-length array of size MAX_BSETS which was indexed out-of-bounds for the dynamically-sized iterators, which causes UBSAN to complain. This patch uses the same approach as in bcachefs's sort_iter and splits the iterator into a btree_iter with a flexible array member and a btree_iter_stack which embeds a btree_iter as well as a fixed-length data array. Cc: stable@vger.kernel.org Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368 Signed-off-by: Matthew Mirvish Signed-off-by: Coly Li Link: https://lore.kernel.org/r/20240509011117.2697-3-colyli@suse.de Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/md/bcache/bset.c | 44 +++++++++++++++++------------------ drivers/md/bcache/bset.h | 28 ++++++++++++++-------- drivers/md/bcache/btree.c | 40 ++++++++++++++++--------------- drivers/md/bcache/super.c | 5 ++-- drivers/md/bcache/sysfs.c | 2 +- drivers/md/bcache/writeback.c | 10 ++++---- 6 files changed, 70 insertions(+), 59 deletions(-) diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c index 67a2c47f4201..f006c780dd4b 100644 --- a/drivers/md/bcache/bset.c +++ b/drivers/md/bcache/bset.c @@ -54,7 +54,7 @@ void bch_dump_bucket(struct btree_keys *b) int __bch_count_data(struct btree_keys *b) { unsigned int ret = 0; - struct btree_iter iter; + struct btree_iter_stack iter; struct bkey *k; if (b->ops->is_extents) @@ -67,7 +67,7 @@ void __bch_check_keys(struct btree_keys *b, const char *fmt, ...) { va_list args; struct bkey *k, *p = NULL; - struct btree_iter iter; + struct btree_iter_stack iter; const char *err; for_each_key(b, k, &iter) { @@ -877,7 +877,7 @@ unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k, unsigned int status = BTREE_INSERT_STATUS_NO_INSERT; struct bset *i = bset_tree_last(b)->data; struct bkey *m, *prev = NULL; - struct btree_iter iter; + struct btree_iter_stack iter; struct bkey preceding_key_on_stack = ZERO_KEY; struct bkey *preceding_key_p = &preceding_key_on_stack; @@ -893,9 +893,9 @@ unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k, else preceding_key(k, &preceding_key_p); - m = bch_btree_iter_init(b, &iter, preceding_key_p); + m = bch_btree_iter_stack_init(b, &iter, preceding_key_p); - if (b->ops->insert_fixup(b, k, &iter, replace_key)) + if (b->ops->insert_fixup(b, k, &iter.iter, replace_key)) return status; status = BTREE_INSERT_STATUS_INSERT; @@ -1096,33 +1096,33 @@ void bch_btree_iter_push(struct btree_iter *iter, struct bkey *k, btree_iter_cmp)); } -static struct bkey *__bch_btree_iter_init(struct btree_keys *b, - struct btree_iter *iter, - struct bkey *search, - struct bset_tree *start) +static struct bkey *__bch_btree_iter_stack_init(struct btree_keys *b, + struct btree_iter_stack *iter, + struct bkey *search, + struct bset_tree *start) { struct bkey *ret = NULL; - iter->size = ARRAY_SIZE(iter->data); - iter->used = 0; + iter->iter.size = ARRAY_SIZE(iter->stack_data); + iter->iter.used = 0; #ifdef CONFIG_BCACHE_DEBUG - iter->b = b; + iter->iter.b = b; #endif for (; start <= bset_tree_last(b); start++) { ret = bch_bset_search(b, start, search); - bch_btree_iter_push(iter, ret, bset_bkey_last(start->data)); + bch_btree_iter_push(&iter->iter, ret, bset_bkey_last(start->data)); } return ret; } -struct bkey *bch_btree_iter_init(struct btree_keys *b, - struct btree_iter *iter, +struct bkey *bch_btree_iter_stack_init(struct btree_keys *b, + struct btree_iter_stack *iter, struct bkey *search) { - return __bch_btree_iter_init(b, iter, search, b->set); + return __bch_btree_iter_stack_init(b, iter, search, b->set); } static inline struct bkey *__bch_btree_iter_next(struct btree_iter *iter, @@ -1289,10 +1289,10 @@ void bch_btree_sort_partial(struct btree_keys *b, unsigned int start, struct bset_sort_state *state) { size_t order = b->page_order, keys = 0; - struct btree_iter iter; + struct btree_iter_stack iter; int oldsize = bch_count_data(b); - __bch_btree_iter_init(b, &iter, NULL, &b->set[start]); + __bch_btree_iter_stack_init(b, &iter, NULL, &b->set[start]); if (start) { unsigned int i; @@ -1303,7 +1303,7 @@ void bch_btree_sort_partial(struct btree_keys *b, unsigned int start, order = get_order(__set_bytes(b->set->data, keys)); } - __btree_sort(b, &iter, start, order, false, state); + __btree_sort(b, &iter.iter, start, order, false, state); EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize); } @@ -1319,11 +1319,11 @@ void bch_btree_sort_into(struct btree_keys *b, struct btree_keys *new, struct bset_sort_state *state) { uint64_t start_time = local_clock(); - struct btree_iter iter; + struct btree_iter_stack iter; - bch_btree_iter_init(b, &iter, NULL); + bch_btree_iter_stack_init(b, &iter, NULL); - btree_mergesort(b, new->set->data, &iter, false, true); + btree_mergesort(b, new->set->data, &iter.iter, false, true); bch_time_stats_update(&state->time, start_time); diff --git a/drivers/md/bcache/bset.h b/drivers/md/bcache/bset.h index a50dcfda656f..2ed6dbd35d6e 100644 --- a/drivers/md/bcache/bset.h +++ b/drivers/md/bcache/bset.h @@ -321,7 +321,14 @@ struct btree_iter { #endif struct btree_iter_set { struct bkey *k, *end; - } data[MAX_BSETS]; + } data[]; +}; + +/* Fixed-size btree_iter that can be allocated on the stack */ + +struct btree_iter_stack { + struct btree_iter iter; + struct btree_iter_set stack_data[MAX_BSETS]; }; typedef bool (*ptr_filter_fn)(struct btree_keys *b, const struct bkey *k); @@ -333,9 +340,9 @@ struct bkey *bch_btree_iter_next_filter(struct btree_iter *iter, void bch_btree_iter_push(struct btree_iter *iter, struct bkey *k, struct bkey *end); -struct bkey *bch_btree_iter_init(struct btree_keys *b, - struct btree_iter *iter, - struct bkey *search); +struct bkey *bch_btree_iter_stack_init(struct btree_keys *b, + struct btree_iter_stack *iter, + struct bkey *search); struct bkey *__bch_bset_search(struct btree_keys *b, struct bset_tree *t, const struct bkey *search); @@ -350,13 +357,14 @@ static inline struct bkey *bch_bset_search(struct btree_keys *b, return search ? __bch_bset_search(b, t, search) : t->data->start; } -#define for_each_key_filter(b, k, iter, filter) \ - for (bch_btree_iter_init((b), (iter), NULL); \ - ((k) = bch_btree_iter_next_filter((iter), (b), filter));) +#define for_each_key_filter(b, k, stack_iter, filter) \ + for (bch_btree_iter_stack_init((b), (stack_iter), NULL); \ + ((k) = bch_btree_iter_next_filter(&((stack_iter)->iter), (b), \ + filter));) -#define for_each_key(b, k, iter) \ - for (bch_btree_iter_init((b), (iter), NULL); \ - ((k) = bch_btree_iter_next(iter));) +#define for_each_key(b, k, stack_iter) \ + for (bch_btree_iter_stack_init((b), (stack_iter), NULL); \ + ((k) = bch_btree_iter_next(&((stack_iter)->iter)));) /* Sorting */ diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 1a1a9554474a..2768b4b4302d 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1283,7 +1283,7 @@ static bool btree_gc_mark_node(struct btree *b, struct gc_stat *gc) uint8_t stale = 0; unsigned int keys = 0, good_keys = 0; struct bkey *k; - struct btree_iter iter; + struct btree_iter_stack iter; struct bset_tree *t; gc->nodes++; @@ -1544,7 +1544,7 @@ static int btree_gc_rewrite_node(struct btree *b, struct btree_op *op, static unsigned int btree_gc_count_keys(struct btree *b) { struct bkey *k; - struct btree_iter iter; + struct btree_iter_stack iter; unsigned int ret = 0; for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad) @@ -1585,17 +1585,18 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op, int ret = 0; bool should_rewrite; struct bkey *k; - struct btree_iter iter; + struct btree_iter_stack iter; struct gc_merge_info r[GC_MERGE_NODES]; struct gc_merge_info *i, *last = r + ARRAY_SIZE(r) - 1; - bch_btree_iter_init(&b->keys, &iter, &b->c->gc_done); + bch_btree_iter_stack_init(&b->keys, &iter, &b->c->gc_done); for (i = r; i < r + ARRAY_SIZE(r); i++) i->b = ERR_PTR(-EINTR); while (1) { - k = bch_btree_iter_next_filter(&iter, &b->keys, bch_ptr_bad); + k = bch_btree_iter_next_filter(&iter.iter, &b->keys, + bch_ptr_bad); if (k) { r->b = bch_btree_node_get(b->c, op, k, b->level - 1, true, b); @@ -1885,7 +1886,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op) { int ret = 0; struct bkey *k, *p = NULL; - struct btree_iter iter; + struct btree_iter_stack iter; for_each_key_filter(&b->keys, k, &iter, bch_ptr_invalid) bch_initial_mark_key(b->c, b->level, k); @@ -1893,10 +1894,10 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op) bch_initial_mark_key(b->c, b->level + 1, &b->key); if (b->level) { - bch_btree_iter_init(&b->keys, &iter, NULL); + bch_btree_iter_stack_init(&b->keys, &iter, NULL); do { - k = bch_btree_iter_next_filter(&iter, &b->keys, + k = bch_btree_iter_next_filter(&iter.iter, &b->keys, bch_ptr_bad); if (k) { btree_node_prefetch(b, k); @@ -1924,7 +1925,7 @@ static int bch_btree_check_thread(void *arg) struct btree_check_info *info = arg; struct btree_check_state *check_state = info->state; struct cache_set *c = check_state->c; - struct btree_iter iter; + struct btree_iter_stack iter; struct bkey *k, *p; int cur_idx, prev_idx, skip_nr; @@ -1933,8 +1934,8 @@ static int bch_btree_check_thread(void *arg) ret = 0; /* root node keys are checked before thread created */ - bch_btree_iter_init(&c->root->keys, &iter, NULL); - k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad); + bch_btree_iter_stack_init(&c->root->keys, &iter, NULL); + k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad); BUG_ON(!k); p = k; @@ -1952,7 +1953,7 @@ static int bch_btree_check_thread(void *arg) skip_nr = cur_idx - prev_idx; while (skip_nr) { - k = bch_btree_iter_next_filter(&iter, + k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad); if (k) @@ -2025,7 +2026,7 @@ int bch_btree_check(struct cache_set *c) int ret = 0; int i; struct bkey *k = NULL; - struct btree_iter iter; + struct btree_iter_stack iter; struct btree_check_state check_state; /* check and mark root node keys */ @@ -2521,11 +2522,11 @@ static int bch_btree_map_nodes_recurse(struct btree *b, struct btree_op *op, if (b->level) { struct bkey *k; - struct btree_iter iter; + struct btree_iter_stack iter; - bch_btree_iter_init(&b->keys, &iter, from); + bch_btree_iter_stack_init(&b->keys, &iter, from); - while ((k = bch_btree_iter_next_filter(&iter, &b->keys, + while ((k = bch_btree_iter_next_filter(&iter.iter, &b->keys, bch_ptr_bad))) { ret = bcache_btree(map_nodes_recurse, k, b, op, from, fn, flags); @@ -2554,11 +2555,12 @@ int bch_btree_map_keys_recurse(struct btree *b, struct btree_op *op, { int ret = MAP_CONTINUE; struct bkey *k; - struct btree_iter iter; + struct btree_iter_stack iter; - bch_btree_iter_init(&b->keys, &iter, from); + bch_btree_iter_stack_init(&b->keys, &iter, from); - while ((k = bch_btree_iter_next_filter(&iter, &b->keys, bch_ptr_bad))) { + while ((k = bch_btree_iter_next_filter(&iter.iter, &b->keys, + bch_ptr_bad))) { ret = !b->level ? fn(op, b, k) : bcache_btree(map_keys_recurse, k, diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index 04ddaa4bbd77..14336fd54102 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1939,8 +1939,9 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb) INIT_LIST_HEAD(&c->btree_cache_freed); INIT_LIST_HEAD(&c->data_buckets); - iter_size = ((meta_bucket_pages(sb) * PAGE_SECTORS) / sb->block_size + 1) * - sizeof(struct btree_iter_set); + iter_size = sizeof(struct btree_iter) + + ((meta_bucket_pages(sb) * PAGE_SECTORS) / sb->block_size) * + sizeof(struct btree_iter_set); c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL); if (!c->devices) diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index ca3e2f000cd4..a31108625f46 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -639,7 +639,7 @@ static unsigned int bch_root_usage(struct cache_set *c) unsigned int bytes = 0; struct bkey *k; struct btree *b; - struct btree_iter iter; + struct btree_iter_stack iter; goto lock_root; diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 8e3f5f004c39..9a2aac59f6bc 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -852,15 +852,15 @@ static int bch_dirty_init_thread(void *arg) struct dirty_init_thrd_info *info = arg; struct bch_dirty_init_state *state = info->state; struct cache_set *c = state->c; - struct btree_iter iter; + struct btree_iter_stack iter; struct bkey *k, *p; int cur_idx, prev_idx, skip_nr; k = p = NULL; prev_idx = 0; - bch_btree_iter_init(&c->root->keys, &iter, NULL); - k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad); + bch_btree_iter_stack_init(&c->root->keys, &iter, NULL); + k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad); BUG_ON(!k); p = k; @@ -874,7 +874,7 @@ static int bch_dirty_init_thread(void *arg) skip_nr = cur_idx - prev_idx; while (skip_nr) { - k = bch_btree_iter_next_filter(&iter, + k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad); if (k) @@ -923,7 +923,7 @@ void bch_sectors_dirty_init(struct bcache_device *d) int i; struct btree *b = NULL; struct bkey *k = NULL; - struct btree_iter iter; + struct btree_iter_stack iter; struct sectors_dirty_init op; struct cache_set *c = d->c; struct bch_dirty_init_state state; From 0e84701753acc8332a3da54f3d2522b73afe8c02 Mon Sep 17 00:00:00 2001 From: Jeff Johnson Date: Sat, 18 May 2024 15:54:49 -0700 Subject: [PATCH 1247/1565] tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test [ Upstream commit 23748e3e0fbfe471eff5ce439921629f6a427828 ] Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Link: https://lore.kernel.org/linux-trace-kernel/20240518-md-preemptirq_delay_test-v1-1-387d11b30d85@quicinc.com Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Masami Hiramatsu (Google) Signed-off-by: Jeff Johnson Signed-off-by: Steven Rostedt (Google) Signed-off-by: Sasha Levin --- kernel/trace/preemptirq_delay_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 312d1a0ca3b6..1a4f2f424996 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -201,4 +201,5 @@ static void __exit preemptirq_delay_exit(void) module_init(preemptirq_delay_init) module_exit(preemptirq_delay_exit) +MODULE_DESCRIPTION("Preempt / IRQ disable delay thread to test latency tracers"); MODULE_LICENSE("GPL v2"); From 50b1b4e4f3a65813ab6edf487b90252672245466 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Tue, 16 Apr 2024 14:19:04 -0700 Subject: [PATCH 1248/1565] x86/cpu/vfm: Add new macros to work with (vendor/family/model) values [ Upstream commit e6dfdc2e89a0adedf455814c91b977d6a584cc88 ] To avoid adding a slew of new macros for each new Intel CPU family switch over from providing CPU model number #defines to a new scheme that encodes vendor, family, and model in a single number. [ bp: s/casted/cast/g ] Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov (AMD) Reviewed-by: Thomas Gleixner Link: https://lore.kernel.org/r/20240416211941.9369-3-tony.luck@intel.com Stable-dep-of: 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL") Signed-off-by: Sasha Levin --- arch/x86/include/asm/cpu_device_id.h | 93 ++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) diff --git a/arch/x86/include/asm/cpu_device_id.h b/arch/x86/include/asm/cpu_device_id.h index eb8fcede9e3b..dd7b9463696f 100644 --- a/arch/x86/include/asm/cpu_device_id.h +++ b/arch/x86/include/asm/cpu_device_id.h @@ -2,6 +2,39 @@ #ifndef _ASM_X86_CPU_DEVICE_ID #define _ASM_X86_CPU_DEVICE_ID +/* + * Can't use because it generates expressions that + * cannot be used in structure initializers. Bitfield construction + * here must match the union in struct cpuinfo_86: + * union { + * struct { + * __u8 x86_model; + * __u8 x86; + * __u8 x86_vendor; + * __u8 x86_reserved; + * }; + * __u32 x86_vfm; + * }; + */ +#define VFM_MODEL_BIT 0 +#define VFM_FAMILY_BIT 8 +#define VFM_VENDOR_BIT 16 +#define VFM_RSVD_BIT 24 + +#define VFM_MODEL_MASK GENMASK(VFM_FAMILY_BIT - 1, VFM_MODEL_BIT) +#define VFM_FAMILY_MASK GENMASK(VFM_VENDOR_BIT - 1, VFM_FAMILY_BIT) +#define VFM_VENDOR_MASK GENMASK(VFM_RSVD_BIT - 1, VFM_VENDOR_BIT) + +#define VFM_MODEL(vfm) (((vfm) & VFM_MODEL_MASK) >> VFM_MODEL_BIT) +#define VFM_FAMILY(vfm) (((vfm) & VFM_FAMILY_MASK) >> VFM_FAMILY_BIT) +#define VFM_VENDOR(vfm) (((vfm) & VFM_VENDOR_MASK) >> VFM_VENDOR_BIT) + +#define VFM_MAKE(_vendor, _family, _model) ( \ + ((_model) << VFM_MODEL_BIT) | \ + ((_family) << VFM_FAMILY_BIT) | \ + ((_vendor) << VFM_VENDOR_BIT) \ +) + /* * Declare drivers belonging to specific x86 CPUs * Similar in spirit to pci_device_id and related PCI functions @@ -49,6 +82,16 @@ .driver_data = (unsigned long) _data \ } +#define X86_MATCH_VENDORID_FAM_MODEL_STEPPINGS_FEATURE(_vendor, _family, _model, \ + _steppings, _feature, _data) { \ + .vendor = _vendor, \ + .family = _family, \ + .model = _model, \ + .steppings = _steppings, \ + .feature = _feature, \ + .driver_data = (unsigned long) _data \ +} + /** * X86_MATCH_VENDOR_FAM_MODEL_FEATURE - Macro for CPU matching * @_vendor: The vendor name, e.g. INTEL, AMD, HYGON, ..., ANY @@ -164,6 +207,56 @@ X86_MATCH_VENDOR_FAM_MODEL_STEPPINGS_FEATURE(INTEL, 6, INTEL_FAM6_##model, \ steppings, X86_FEATURE_ANY, data) +/** + * X86_MATCH_VFM - Match encoded vendor/family/model + * @vfm: Encoded 8-bits each for vendor, family, model + * @data: Driver specific data or NULL. The internal storage + * format is unsigned long. The supplied value, pointer + * etc. is cast to unsigned long internally. + * + * Stepping and feature are set to wildcards + */ +#define X86_MATCH_VFM(vfm, data) \ + X86_MATCH_VENDORID_FAM_MODEL_STEPPINGS_FEATURE( \ + VFM_VENDOR(vfm), \ + VFM_FAMILY(vfm), \ + VFM_MODEL(vfm), \ + X86_STEPPING_ANY, X86_FEATURE_ANY, data) + +/** + * X86_MATCH_VFM_STEPPINGS - Match encoded vendor/family/model/stepping + * @vfm: Encoded 8-bits each for vendor, family, model + * @steppings: Bitmask of steppings to match + * @data: Driver specific data or NULL. The internal storage + * format is unsigned long. The supplied value, pointer + * etc. is cast to unsigned long internally. + * + * feature is set to wildcard + */ +#define X86_MATCH_VFM_STEPPINGS(vfm, steppings, data) \ + X86_MATCH_VENDORID_FAM_MODEL_STEPPINGS_FEATURE( \ + VFM_VENDOR(vfm), \ + VFM_FAMILY(vfm), \ + VFM_MODEL(vfm), \ + steppings, X86_FEATURE_ANY, data) + +/** + * X86_MATCH_VFM_FEATURE - Match encoded vendor/family/model/feature + * @vfm: Encoded 8-bits each for vendor, family, model + * @feature: A X86_FEATURE bit + * @data: Driver specific data or NULL. The internal storage + * format is unsigned long. The supplied value, pointer + * etc. is cast to unsigned long internally. + * + * Steppings is set to wildcard + */ +#define X86_MATCH_VFM_FEATURE(vfm, feature, data) \ + X86_MATCH_VENDORID_FAM_MODEL_STEPPINGS_FEATURE( \ + VFM_VENDOR(vfm), \ + VFM_FAMILY(vfm), \ + VFM_MODEL(vfm), \ + X86_STEPPING_ANY, feature, data) + /* * Match specific microcode revisions. * From 40a697e34517fdbb3a31bfd50776575bd1886bc0 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Mon, 20 May 2024 15:45:33 -0700 Subject: [PATCH 1249/1565] x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL [ Upstream commit 93022482b2948a9a7e9b5a2bb685f2e1cb4c3348 ] Code in v6.9 arch/x86/kernel/smpboot.c was changed by commit 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines") from: static const struct x86_cpu_id intel_cod_cpu[] = { X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0), /* COD */ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0), /* COD */ X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */ <--- 443 {} }; static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) { const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu); to: static const struct x86_cpu_id intel_cod_cpu[] = { X86_MATCH_VFM(INTEL_HASWELL_X, 0), /* COD */ X86_MATCH_VFM(INTEL_BROADWELL_X, 0), /* COD */ X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */ {} }; static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) { const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu); On an Intel CPU with SNC enabled this code previously matched the rule on line 443 to avoid printing messages about insane cache configuration. The new code did not match any rules. Expanding the macros for the intel_cod_cpu[] array shows that the old is equivalent to: static const struct x86_cpu_id intel_cod_cpu[] = { [0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 }, [1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 }, [2] = { .vendor = 0, .family = 6, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 }, [3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 } } while the new code expands to: static const struct x86_cpu_id intel_cod_cpu[] = { [0] = { .vendor = 0, .family = 6, .model = 0x3F, .steppings = 0, .feature = 0, .driver_data = 0 }, [1] = { .vendor = 0, .family = 6, .model = 0x4F, .steppings = 0, .feature = 0, .driver_data = 0 }, [2] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 1 }, [3] = { .vendor = 0, .family = 0, .model = 0x00, .steppings = 0, .feature = 0, .driver_data = 0 } } Looking at the code for x86_match_cpu(): const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match) { const struct x86_cpu_id *m; struct cpuinfo_x86 *c = &boot_cpu_data; for (m = match; m->vendor | m->family | m->model | m->steppings | m->feature; m++) { ... } return NULL; it is clear that there was no match because the ANY entry in the table (array index 2) is now the loop termination condition (all of vendor, family, model, steppings, and feature are zero). So this code was working before because the "ANY" check was looking for any Intel CPU in family 6. But fails now because the family is a wild card. So the root cause is that x86_match_cpu() has never been able to match on a rule with just X86_VENDOR_INTEL and all other fields set to wildcards. Add a new flags field to struct x86_cpu_id that has a bit set to indicate that this entry in the array is valid. Update X86_MATCH*() macros to set that bit. Change the end-marker check in x86_match_cpu() to just check the flags field for this bit. Backporter notes: The commit in Fixes is really the one that is broken: you can't have m->vendor as part of the loop termination conditional in x86_match_cpu() because it can happen - as it has happened above - that that whole conditional is 0 albeit vendor == 0 is a valid case - X86_VENDOR_INTEL is 0. However, the only case where the above happens is the SNC check added by 4db64279bc2b1 so you only need this fix if you have backported that other commit 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines") Fixes: 644e9cbbe3fc ("Add driver auto probing for x86 features v4") Suggested-by: Thomas Gleixner Suggested-by: Borislav Petkov Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov (AMD) Cc: # see above Link: https://lore.kernel.org/r/20240517144312.GBZkdtAOuJZCvxhFbJ@fat_crate.local Signed-off-by: Sasha Levin --- arch/x86/include/asm/cpu_device_id.h | 5 +++++ arch/x86/kernel/cpu/match.c | 4 +--- include/linux/mod_devicetable.h | 2 ++ 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/cpu_device_id.h b/arch/x86/include/asm/cpu_device_id.h index dd7b9463696f..e8e3dbe7f173 100644 --- a/arch/x86/include/asm/cpu_device_id.h +++ b/arch/x86/include/asm/cpu_device_id.h @@ -53,6 +53,9 @@ #define X86_CENTAUR_FAM6_C7_D 0xd #define X86_CENTAUR_FAM6_NANO 0xf +/* x86_cpu_id::flags */ +#define X86_CPU_ID_FLAG_ENTRY_VALID BIT(0) + #define X86_STEPPINGS(mins, maxs) GENMASK(maxs, mins) /** * X86_MATCH_VENDOR_FAM_MODEL_STEPPINGS_FEATURE - Base macro for CPU matching @@ -79,6 +82,7 @@ .model = _model, \ .steppings = _steppings, \ .feature = _feature, \ + .flags = X86_CPU_ID_FLAG_ENTRY_VALID, \ .driver_data = (unsigned long) _data \ } @@ -89,6 +93,7 @@ .model = _model, \ .steppings = _steppings, \ .feature = _feature, \ + .flags = X86_CPU_ID_FLAG_ENTRY_VALID, \ .driver_data = (unsigned long) _data \ } diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c index ad6776081e60..ae71b8ef909c 100644 --- a/arch/x86/kernel/cpu/match.c +++ b/arch/x86/kernel/cpu/match.c @@ -39,9 +39,7 @@ const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match) const struct x86_cpu_id *m; struct cpuinfo_x86 *c = &boot_cpu_data; - for (m = match; - m->vendor | m->family | m->model | m->steppings | m->feature; - m++) { + for (m = match; m->flags & X86_CPU_ID_FLAG_ENTRY_VALID; m++) { if (m->vendor != X86_VENDOR_ANY && c->x86_vendor != m->vendor) continue; if (m->family != X86_FAMILY_ANY && c->x86 != m->family) diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index 5b08a473cdba..18be4459aaf7 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -669,6 +669,8 @@ struct x86_cpu_id { __u16 model; __u16 steppings; __u16 feature; /* bit index */ + /* Solely for kernel-internal use: DO NOT EXPORT to userspace! */ + __u16 flags; kernel_ulong_t driver_data; }; From 6d3eb1658be6a404b8c3e1b9f6cf6a1ea2be401a Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 29 Oct 2020 18:56:06 +0100 Subject: [PATCH 1250/1565] r8169: remove unneeded memory barrier in rtl_tx [ Upstream commit 3a689e34973e8717cd57991c6fcf527dc56062b5 ] tp->dirty_tx isn't changed outside rtl_tx(). Therefore I see no need to guarantee a specific order of reading tp->dirty_tx and tp->cur_tx. Having said that we can remove the memory barrier. In addition use READ_ONCE() when reading tp->cur_tx because it can change in parallel to rtl_tx(). Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/2264563a-fa9e-11b0-2c42-31bc6b8e2790@gmail.com Signed-off-by: Jakub Kicinski Stable-dep-of: c71e3a5cffd5 ("r8169: Fix possible ring buffer corruption on fragmented Tx packets.") Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index c29d43c5f450..84c8362a65cd 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4472,9 +4472,8 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, unsigned int dirty_tx, tx_left, bytes_compl = 0, pkts_compl = 0; dirty_tx = tp->dirty_tx; - smp_rmb(); - for (tx_left = tp->cur_tx - dirty_tx; tx_left > 0; tx_left--) { + for (tx_left = READ_ONCE(tp->cur_tx) - dirty_tx; tx_left; tx_left--) { unsigned int entry = dirty_tx % NUM_TX_DESC; struct sk_buff *skb = tp->tx_skb[entry].skb; u32 status; From 04f9d0cd3974554afc1b9b6ce13990f95aa83bfd Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Wed, 11 Nov 2020 22:56:29 +0100 Subject: [PATCH 1251/1565] r8169: improve rtl_tx [ Upstream commit ca1ab89cd2d654661f559bd83ad9fc7323cb6c86 ] We can simplify the for() condition and eliminate variable tx_left. The change also considers that tp->cur_tx may be incremented by a racing rtl8169_start_xmit(). In addition replace the write to tp->dirty_tx and the following smp_mb() with an equivalent call to smp_store_mb(). This implicitly adds a WRITE_ONCE() to the write. Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/c2e19e5e-3d3f-d663-af32-13c3374f5def@gmail.com Signed-off-by: Jakub Kicinski Stable-dep-of: c71e3a5cffd5 ("r8169: Fix possible ring buffer corruption on fragmented Tx packets.") Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 84c8362a65cd..0847b64fbb2f 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4469,11 +4469,11 @@ static void rtl8169_pcierr_interrupt(struct net_device *dev) static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, int budget) { - unsigned int dirty_tx, tx_left, bytes_compl = 0, pkts_compl = 0; + unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0; dirty_tx = tp->dirty_tx; - for (tx_left = READ_ONCE(tp->cur_tx) - dirty_tx; tx_left; tx_left--) { + while (READ_ONCE(tp->cur_tx) != dirty_tx) { unsigned int entry = dirty_tx % NUM_TX_DESC; struct sk_buff *skb = tp->tx_skb[entry].skb; u32 status; @@ -4497,7 +4497,6 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, rtl_inc_priv_stats(&tp->tx_stats, pkts_compl, bytes_compl); - tp->dirty_tx = dirty_tx; /* Sync with rtl8169_start_xmit: * - publish dirty_tx ring index (write barrier) * - refresh cur_tx ring index and queue status (read barrier) @@ -4505,7 +4504,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, * a racing xmit thread can only have a right view of the * ring status. */ - smp_mb(); + smp_store_mb(tp->dirty_tx, dirty_tx); if (netif_queue_stopped(dev) && rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) { netif_wake_queue(dev); From 41eeb13459b2ede1e2094436036ff8f680b7af7f Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sat, 14 Nov 2020 21:49:53 +0100 Subject: [PATCH 1252/1565] r8169: improve rtl8169_start_xmit [ Upstream commit 41294e6a434d4f19e957c55b275ea0324f275009 ] Improve the following in rtl8169_start_xmit: - tp->cur_tx can be accessed in parallel by rtl_tx(), therefore annotate the race by using WRITE_ONCE - avoid checking stop_queue a second time by moving the doorbell check - netif_stop_queue() uses atomic operation set_bit() that includes a full memory barrier on some platforms, therefore use smp_mb__after_atomic to avoid overhead Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/80085451-3eaf-507a-c7c0-08d607c46fbc@gmail.com Signed-off-by: Jakub Kicinski Stable-dep-of: c71e3a5cffd5 ("r8169: Fix possible ring buffer corruption on fragmented Tx packets.") Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 0847b64fbb2f..b678fd1436a4 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4331,7 +4331,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */ smp_wmb(); - tp->cur_tx += frags + 1; + WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1); stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS); if (unlikely(stop_queue)) { @@ -4340,13 +4340,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, */ smp_wmb(); netif_stop_queue(dev); - door_bell = true; - } - - if (door_bell) - rtl8169_doorbell(tp); - - if (unlikely(stop_queue)) { /* Sync with rtl_tx: * - publish queue status and cur_tx ring index (write barrier) * - refresh dirty_tx ring index (read barrier). @@ -4354,11 +4347,15 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, * status and forget to wake up queue, a racing rtl_tx thread * can't. */ - smp_mb(); + smp_mb__after_atomic(); if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) netif_start_queue(dev); + door_bell = true; } + if (door_bell) + rtl8169_doorbell(tp); + return NETDEV_TX_OK; err_dma_1: From 48833226fb0832627f54d215314c7cefb7d3a624 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 16 Nov 2020 17:03:14 +0100 Subject: [PATCH 1253/1565] r8169: remove nr_frags argument from rtl_tx_slots_avail [ Upstream commit 83c317d7b36bb3858cf1cb86d2635ec3f3bd6ea3 ] The only time when nr_frags isn't SKB_MAX_FRAGS is when entering rtl8169_start_xmit(). However we can use SKB_MAX_FRAGS also here because when queue isn't stopped there should always be room for MAX_SKB_FRAGS + 1 descriptors. Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/3d1f2ad7-31d5-2cac-4f4a-394f8a3cab63@gmail.com Signed-off-by: Jakub Kicinski Stable-dep-of: c71e3a5cffd5 ("r8169: Fix possible ring buffer corruption on fragmented Tx packets.") Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index b678fd1436a4..dbf885f4dd01 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4247,13 +4247,12 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp, return true; } -static bool rtl_tx_slots_avail(struct rtl8169_private *tp, - unsigned int nr_frags) +static bool rtl_tx_slots_avail(struct rtl8169_private *tp) { unsigned int slots_avail = tp->dirty_tx + NUM_TX_DESC - tp->cur_tx; /* A skbuff with nr_frags needs nr_frags+1 entries in the tx queue */ - return slots_avail > nr_frags; + return slots_avail > MAX_SKB_FRAGS; } /* Versions RTL8102e and from RTL8168c onwards support csum_v2 */ @@ -4288,7 +4287,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, txd_first = tp->TxDescArray + entry; - if (unlikely(!rtl_tx_slots_avail(tp, frags))) { + if (unlikely(!rtl_tx_slots_avail(tp))) { if (net_ratelimit()) netdev_err(dev, "BUG! Tx Ring full when queue awake!\n"); goto err_stop_0; @@ -4333,7 +4332,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1); - stop_queue = !rtl_tx_slots_avail(tp, MAX_SKB_FRAGS); + stop_queue = !rtl_tx_slots_avail(tp); if (unlikely(stop_queue)) { /* Avoid wrongly optimistic queue wake-up: rtl_tx thread must * not miss a ring update when it notices a stopped queue. @@ -4348,7 +4347,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, * can't. */ smp_mb__after_atomic(); - if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) + if (rtl_tx_slots_avail(tp)) netif_start_queue(dev); door_bell = true; } @@ -4502,10 +4501,8 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp, * ring status. */ smp_store_mb(tp->dirty_tx, dirty_tx); - if (netif_queue_stopped(dev) && - rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) { + if (netif_queue_stopped(dev) && rtl_tx_slots_avail(tp)) netif_wake_queue(dev); - } /* * 8168 hack: TxPoll requests are lost when the Tx packets are * too close. Let's kick an extra TxPoll request when a burst From 5c88f4f6341cfc7cc65e08d8e6e38b3df8004f4c Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Tue, 17 Nov 2020 21:34:09 +0100 Subject: [PATCH 1254/1565] r8169: remove not needed check in rtl8169_start_xmit [ Upstream commit bd4bdeb4f29027199c68104fbdfa07ad45390cc1 ] In rtl_tx() the released descriptors are zero'ed by rtl8169_unmap_tx_skb(). And in the beginning of rtl8169_start_xmit() we check that enough descriptors are free, therefore there's no way the DescOwn bit can be set here. Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/6965d665-6c50-90c5-70e6-0bb335d4ea47@gmail.com Signed-off-by: Jakub Kicinski Stable-dep-of: c71e3a5cffd5 ("r8169: Fix possible ring buffer corruption on fragmented Tx packets.") Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index dbf885f4dd01..dd4404efdb8d 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4285,17 +4285,12 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, bool stop_queue, door_bell; u32 opts[2]; - txd_first = tp->TxDescArray + entry; - if (unlikely(!rtl_tx_slots_avail(tp))) { if (net_ratelimit()) netdev_err(dev, "BUG! Tx Ring full when queue awake!\n"); goto err_stop_0; } - if (unlikely(le32_to_cpu(txd_first->opts1) & DescOwn)) - goto err_stop_0; - opts[1] = rtl8169_tx_vlan_tag(skb); opts[0] = 0; @@ -4308,6 +4303,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, entry, false))) goto err_dma_0; + txd_first = tp->TxDescArray + entry; + if (frags) { if (rtl8169_xmit_frags(tp, skb, opts, entry)) goto err_dma_1; From 61c1c98e2607120ce9c3fa1bf75e6da909712b27 Mon Sep 17 00:00:00 2001 From: Ken Milmore Date: Tue, 21 May 2024 23:45:50 +0100 Subject: [PATCH 1255/1565] r8169: Fix possible ring buffer corruption on fragmented Tx packets. [ Upstream commit c71e3a5cffd5309d7f84444df03d5b72600cc417 ] An issue was found on the RTL8125b when transmitting small fragmented packets, whereby invalid entries were inserted into the transmit ring buffer, subsequently leading to calls to dma_unmap_single() with a null address. This was caused by rtl8169_start_xmit() not noticing changes to nr_frags which may occur when small packets are padded (to work around hardware quirks) in rtl8169_tso_csum_v2(). To fix this, postpone inspecting nr_frags until after any padding has been applied. Fixes: 9020845fb5d6 ("r8169: improve rtl8169_start_xmit") Cc: stable@vger.kernel.org Signed-off-by: Ken Milmore Reviewed-by: Heiner Kallweit Link: https://lore.kernel.org/r/27ead18b-c23d-4f49-a020-1fc482c5ac95@gmail.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ethernet/realtek/r8169_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index dd4404efdb8d..d24eb5ee152a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4278,11 +4278,11 @@ static void rtl8169_doorbell(struct rtl8169_private *tp) static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev) { - unsigned int frags = skb_shinfo(skb)->nr_frags; struct rtl8169_private *tp = netdev_priv(dev); unsigned int entry = tp->cur_tx % NUM_TX_DESC; struct TxDesc *txd_first, *txd_last; bool stop_queue, door_bell; + unsigned int frags; u32 opts[2]; if (unlikely(!rtl_tx_slots_avail(tp))) { @@ -4305,6 +4305,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, txd_first = tp->TxDescArray + entry; + frags = skb_shinfo(skb)->nr_frags; if (frags) { if (rtl8169_xmit_frags(tp, skb, opts, entry)) goto err_dma_1; From 247c3f8958ab847fb86c024413763361f0556f80 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 21 May 2023 22:23:35 +0900 Subject: [PATCH 1256/1565] Revert "kheaders: substituting --sort in archive creation" [ Upstream commit 49c386ebbb43394ff4773ce24f726f6afc4c30c8 ] This reverts commit 700dea5a0bea9f64eba89fae7cb2540326fdfdc1. The reason for that commit was --sort=ORDER introduced in tar 1.28 (2014). More than 3 years have passed since then. Requiring GNU tar 1.28 should be fine now because we require GCC 5.1 (2015). Signed-off-by: Masahiro Yamada Reviewed-by: Nicolas Schier Stable-dep-of: 3bd27a847a3a ("kheaders: explicitly define file modes for archived headers") Signed-off-by: Sasha Levin --- kernel/gen_kheaders.sh | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh index c1510f0ab3ea..d7e827c6cd2d 100755 --- a/kernel/gen_kheaders.sh +++ b/kernel/gen_kheaders.sh @@ -83,12 +83,9 @@ find $cpio_dir -type f -print0 | xargs -0 -P8 -n1 perl -pi -e 'BEGIN {undef $/;}; s/\/\*((?!SPDX).)*?\*\///smg;' # Create archive and try to normalize metadata for reproducibility. -# For compatibility with older versions of tar, files are fed to tar -# pre-sorted, as --sort=name might not be available. -find $cpio_dir -printf "./%P\n" | LC_ALL=C sort | \ - tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \ - --owner=0 --group=0 --numeric-owner --no-recursion \ - -I $XZ -cf $tarfile -C $cpio_dir/ -T - > /dev/null +tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \ + --owner=0 --group=0 --sort=name --numeric-owner \ + -I $XZ -cf $tarfile -C $cpio_dir/ . > /dev/null echo $headers_md5 > kernel/kheaders.md5 echo "$this_file_md5" >> kernel/kheaders.md5 From 695f20c6785d3e1ce42aa9a1fb06a1fd2828c89b Mon Sep 17 00:00:00 2001 From: Matthias Maennich Date: Tue, 28 May 2024 11:32:43 +0000 Subject: [PATCH 1257/1565] kheaders: explicitly define file modes for archived headers [ Upstream commit 3bd27a847a3a4827a948387cc8f0dbc9fa5931d5 ] Build environments might be running with different umask settings resulting in indeterministic file modes for the files contained in kheaders.tar.xz. The file itself is served with 444, i.e. world readable. Archive the files explicitly with 744,a+X to improve reproducibility across build environments. --mode=0444 is not suitable as directories need to be executable. Also, 444 makes it hard to delete all the readonly files after extraction. Cc: stable@vger.kernel.org Signed-off-by: Matthias Maennich Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin --- kernel/gen_kheaders.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/gen_kheaders.sh b/kernel/gen_kheaders.sh index d7e827c6cd2d..206ab3d41ee7 100755 --- a/kernel/gen_kheaders.sh +++ b/kernel/gen_kheaders.sh @@ -84,7 +84,7 @@ find $cpio_dir -type f -print0 | # Create archive and try to normalize metadata for reproducibility. tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \ - --owner=0 --group=0 --sort=name --numeric-owner \ + --owner=0 --group=0 --sort=name --numeric-owner --mode=u=rw,go=r,a+X \ -I $XZ -cf $tarfile -C $cpio_dir/ . > /dev/null echo $headers_md5 > kernel/kheaders.md5 From a335ad77bda22a5fff43078be7e7bc2a37d2ffab Mon Sep 17 00:00:00 2001 From: Haifeng Xu Date: Mon, 13 May 2024 10:39:48 +0000 Subject: [PATCH 1258/1565] perf/core: Fix missing wakeup when waiting for context reference [ Upstream commit 74751ef5c1912ebd3e65c3b65f45587e05ce5d36 ] In our production environment, we found many hung tasks which are blocked for more than 18 hours. Their call traces are like this: [346278.191038] __schedule+0x2d8/0x890 [346278.191046] schedule+0x4e/0xb0 [346278.191049] perf_event_free_task+0x220/0x270 [346278.191056] ? init_wait_var_entry+0x50/0x50 [346278.191060] copy_process+0x663/0x18d0 [346278.191068] kernel_clone+0x9d/0x3d0 [346278.191072] __do_sys_clone+0x5d/0x80 [346278.191076] __x64_sys_clone+0x25/0x30 [346278.191079] do_syscall_64+0x5c/0xc0 [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50 [346278.191086] ? do_syscall_64+0x69/0xc0 [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20 [346278.191092] ? irqentry_exit+0x19/0x30 [346278.191095] ? exc_page_fault+0x89/0x160 [346278.191097] ? asm_exc_page_fault+0x8/0x30 [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae The task was waiting for the refcount become to 1, but from the vmcore, we found the refcount has already been 1. It seems that the task didn't get woken up by perf_event_release_kernel() and got stuck forever. The below scenario may cause the problem. Thread A Thread B ... ... perf_event_free_task perf_event_release_kernel ... acquire event->child_mutex ... get_ctx ... release event->child_mutex acquire ctx->mutex ... perf_free_event (acquire/release event->child_mutex) ... release ctx->mutex wait_var_event acquire ctx->mutex acquire event->child_mutex # move existing events to free_list release event->child_mutex release ctx->mutex put_ctx ... ... In this case, all events of the ctx have been freed, so we couldn't find the ctx in free_list and Thread A will miss the wakeup. It's thus necessary to add a wakeup after dropping the reference. Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()") Signed-off-by: Haifeng Xu Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Frederic Weisbecker Acked-by: Mark Rutland Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com Signed-off-by: Sasha Levin --- kernel/events/core.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/kernel/events/core.c b/kernel/events/core.c index e0b47bed8675..3a191bec69ac 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5128,6 +5128,7 @@ int perf_event_release_kernel(struct perf_event *event) again: mutex_lock(&event->child_mutex); list_for_each_entry(child, &event->child_list, child_list) { + void *var = NULL; /* * Cannot change, child events are not migrated, see the @@ -5168,11 +5169,23 @@ int perf_event_release_kernel(struct perf_event *event) * this can't be the last reference. */ put_event(event); + } else { + var = &ctx->refcount; } mutex_unlock(&event->child_mutex); mutex_unlock(&ctx->mutex); put_ctx(ctx); + + if (var) { + /* + * If perf_event_free_task() has deleted all events from the + * ctx while the child_mutex got released above, make sure to + * notify about the preceding put_ctx(). + */ + smp_mb(); /* pairs with wait_var_event() */ + wake_up_var(var); + } goto again; } mutex_unlock(&event->child_mutex); From 0caf70a8e8168bfcedfb85d09e7f3d76d69f5bf1 Mon Sep 17 00:00:00 2001 From: Naveen Naidu Date: Thu, 18 Nov 2021 19:33:11 +0530 Subject: [PATCH 1259/1565] PCI: Add PCI_ERROR_RESPONSE and related definitions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 57bdeef4716689d9b0e3571034d65cf420f6efcd ] A config or MMIO read from a PCI device that doesn't exist or doesn't respond causes a PCI error. There's no real data to return to satisfy the CPU read, so most hardware fabricates ~0 data. Add a PCI_ERROR_RESPONSE definition for that and use it where appropriate to make these checks consistent and easier to find. Also add helper definitions PCI_SET_ERROR_RESPONSE() and PCI_POSSIBLE_ERROR() to make the code more readable. Suggested-by: Bjorn Helgaas Link: https://lore.kernel.org/r/55563bf4dfc5d3fdc96695373c659d099bf175b1.1637243717.git.naveennaidu479@gmail.com Signed-off-by: Naveen Naidu Signed-off-by: Bjorn Helgaas Reviewed-by: Pali Rohár Stable-dep-of: c625dabbf1c4 ("x86/amd_nb: Check for invalid SMN reads") Signed-off-by: Sasha Levin --- include/linux/pci.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 5b24a6fbfa0b..30bc462fb196 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -148,6 +148,15 @@ enum pci_interrupt_pin { /* The number of legacy PCI INTx interrupts */ #define PCI_NUM_INTX 4 +/* + * Reading from a device that doesn't respond typically returns ~0. A + * successful read from a device may also return ~0, so you need additional + * information to reliably identify errors. + */ +#define PCI_ERROR_RESPONSE (~0ULL) +#define PCI_SET_ERROR_RESPONSE(val) (*(val) = ((typeof(*(val))) PCI_ERROR_RESPONSE)) +#define PCI_POSSIBLE_ERROR(val) ((val) == ((typeof(val)) PCI_ERROR_RESPONSE)) + /* * pci_power_t values must match the bits in the Capabilities PME_Support * and Control/Status PowerState fields in the Power Management capability. From b03555a8fa054f7310965aec481f89964d3b149f Mon Sep 17 00:00:00 2001 From: Yazen Ghannam Date: Mon, 3 Apr 2023 16:42:44 +0000 Subject: [PATCH 1260/1565] x86/amd_nb: Check for invalid SMN reads [ Upstream commit c625dabbf1c4a8e77e4734014f2fde7aa9071a1f ] AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers. SMN accesses are done indirectly through an index/data pair in PCI config space. The PCI config access may fail and return an error code. This would prevent the "read" value from being updated. However, the PCI config access may succeed, but the return value may be invalid. This is in similar fashion to PCI bad reads, i.e. return all bits set. Most systems will return 0 for SMN addresses that are not accessible. This is in line with AMD convention that unavailable registers are Read-as-Zero/Writes-Ignored. However, some systems will return a "PCI Error Response" instead. This value, along with an error code of 0 from the PCI config access, will confuse callers of the amd_smn_read() function. Check for this condition, clear the return value, and set a proper error code. Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h") Signed-off-by: Yazen Ghannam Signed-off-by: Borislav Petkov (AMD) Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com Signed-off-by: Sasha Levin --- arch/x86/kernel/amd_nb.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c index 18f6b7c4bd79..16cd56627574 100644 --- a/arch/x86/kernel/amd_nb.c +++ b/arch/x86/kernel/amd_nb.c @@ -164,7 +164,14 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write) int amd_smn_read(u16 node, u32 address, u32 *value) { - return __amd_smn_rw(node, address, value, false); + int err = __amd_smn_rw(node, address, value, false); + + if (PCI_POSSIBLE_ERROR(*value)) { + err = -ENODEV; + *value = 0; + } + + return err; } EXPORT_SYMBOL_GPL(amd_smn_read); From 78ebec450ef4f0720c592638d92bad679d75d7ce Mon Sep 17 00:00:00 2001 From: Shyam Prasad N Date: Sun, 23 May 2021 16:54:42 +0000 Subject: [PATCH 1261/1565] cifs: missed ref-counting smb session in find [ Upstream commit e695a9ad0305af6e8b0cbc24a54976ac2120cbb3 ] When we lookup an smb session based on session id, we did not up the ref-count for the session. This can potentially cause issues if the session is freed from under us. Signed-off-by: Shyam Prasad N Reviewed-by: Aurelien Aptel Reviewed-by: Paulo Alcantara (SUSE) Signed-off-by: Steve French Stable-dep-of: 02c418774f76 ("smb: client: fix deadlock in smb2_find_smb_tcon()") Signed-off-by: Sasha Levin --- fs/cifs/smb2transport.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c index d659eb70df76..f40b8de2aeeb 100644 --- a/fs/cifs/smb2transport.c +++ b/fs/cifs/smb2transport.c @@ -154,6 +154,7 @@ smb2_find_smb_ses_unlocked(struct TCP_Server_Info *server, __u64 ses_id) list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { if (ses->Suid != ses_id) continue; + ++ses->ses_count; return ses; } @@ -205,7 +206,14 @@ smb2_find_smb_tcon(struct TCP_Server_Info *server, __u64 ses_id, __u32 tid) return NULL; } tcon = smb2_find_smb_sess_tcon_unlocked(ses, tid); + if (!tcon) { + cifs_put_smb_ses(ses); + spin_unlock(&cifs_tcp_ses_lock); + return NULL; + } spin_unlock(&cifs_tcp_ses_lock); + /* tcon already has a ref to ses, so we don't need ses anymore */ + cifs_put_smb_ses(ses); return tcon; } @@ -239,7 +247,7 @@ smb2_calc_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server, if (rc) { cifs_server_dbg(VFS, "%s: sha256 alloc failed\n", __func__); - return rc; + goto out; } shash = &sdesc->shash; } else { @@ -290,6 +298,8 @@ smb2_calc_signature(struct smb_rqst *rqst, struct TCP_Server_Info *server, out: if (allocate_crypto) cifs_free_hash(&hash, &sdesc); + if (ses) + cifs_put_smb_ses(ses); return rc; } From b055752675cd1d1db4ac9c2750db3dc3e89ea261 Mon Sep 17 00:00:00 2001 From: Enzo Matsumiya Date: Thu, 6 Jun 2024 13:13:13 -0300 Subject: [PATCH 1262/1565] smb: client: fix deadlock in smb2_find_smb_tcon() [ Upstream commit 02c418774f76a0a36a6195c9dbf8971eb4130a15 ] Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such deadlock. Cc: stable@vger.kernel.org Signed-off-by: Enzo Matsumiya Reviewed-by: Shyam Prasad N Reviewed-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French Signed-off-by: Sasha Levin --- fs/cifs/smb2transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/cifs/smb2transport.c b/fs/cifs/smb2transport.c index f40b8de2aeeb..adb324234b44 100644 --- a/fs/cifs/smb2transport.c +++ b/fs/cifs/smb2transport.c @@ -207,8 +207,8 @@ smb2_find_smb_tcon(struct TCP_Server_Info *server, __u64 ses_id, __u32 tid) } tcon = smb2_find_smb_sess_tcon_unlocked(ses, tid); if (!tcon) { - cifs_put_smb_ses(ses); spin_unlock(&cifs_tcp_ses_lock); + cifs_put_smb_ses(ses); return NULL; } spin_unlock(&cifs_tcp_ses_lock); From 83f65222100523e818348f2d243745a1376b4dcf Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Wed, 9 Jun 2021 13:40:18 -0500 Subject: [PATCH 1263/1565] ACPI: Add quirks for AMD Renoir/Lucienne CPUs to force the D3 hint [ Upstream commit 6485fc18faa01e8845b1e5bb55118e633f84d1f2 ] AMD systems from Renoir and Lucienne require that the NVME controller is put into D3 over a Modern Standby / suspend-to-idle cycle. This is "typically" accomplished using the `StorageD3Enable` property in the _DSD, but this property was introduced after many of these systems launched and most OEM systems don't have it in their BIOS. On AMD Renoir without these drives going into D3 over suspend-to-idle the resume will fail with the NVME controller being reset and a trace like this in the kernel logs: ``` [ 83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting [ 83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting [ 83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting [ 83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting [ 95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller [ 95.332843] nvme nvme0: Abort status: 0x371 [ 95.332852] nvme nvme0: Abort status: 0x371 [ 95.332856] nvme nvme0: Abort status: 0x371 [ 95.332859] nvme nvme0: Abort status: 0x371 [ 95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16 [ 95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16 ``` The Microsoft documentation for StorageD3Enable mentioned that Windows has a hardcoded allowlist for D3 support, which was used for these platforms. Introduce quirks to hardcode them for Linux as well. As this property is now "standardized", OEM systems using AMD Cezanne and newer APU's have adopted this property, and quirks like this should not be necessary. CC: Shyam-sundar S-k CC: Alexander Deucher CC: Prike Liang Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Signed-off-by: Mario Limonciello Acked-by: Rafael J. Wysocki Tested-by: Julian Sikorski Signed-off-by: Christoph Hellwig Stable-dep-of: e79a10652bbd ("ACPI: x86: Force StorageD3Enable on more products") Signed-off-by: Sasha Levin --- drivers/acpi/device_pm.c | 3 +++ drivers/acpi/internal.h | 9 +++++++++ drivers/acpi/x86/utils.c | 25 +++++++++++++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c index 66e53df75865..4c38846fea96 100644 --- a/drivers/acpi/device_pm.c +++ b/drivers/acpi/device_pm.c @@ -1346,6 +1346,9 @@ bool acpi_storage_d3(struct device *dev) struct acpi_device *adev = ACPI_COMPANION(dev); u8 val; + if (force_storage_d3()) + return true; + if (!adev) return false; if (fwnode_property_read_u8(acpi_fwnode_handle(adev), "StorageD3Enable", diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h index 125e4901c9b4..f6c929787c9e 100644 --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -237,6 +237,15 @@ static inline int suspend_nvs_save(void) { return 0; } static inline void suspend_nvs_restore(void) {} #endif +#ifdef CONFIG_X86 +bool force_storage_d3(void); +#else +static inline bool force_storage_d3(void) +{ + return false; +} +#endif + /*-------------------------------------------------------------------------- Device properties -------------------------------------------------------------------------- */ diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index 3f9a162be84e..b3fb428461c6 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -177,3 +177,28 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s return ret; } + +/* + * AMD systems from Renoir and Lucienne *require* that the NVME controller + * is put into D3 over a Modern Standby / suspend-to-idle cycle. + * + * This is "typically" accomplished using the `StorageD3Enable` + * property in the _DSD that is checked via the `acpi_storage_d3` function + * but this property was introduced after many of these systems launched + * and most OEM systems don't have it in their BIOS. + * + * The Microsoft documentation for StorageD3Enable mentioned that Windows has + * a hardcoded allowlist for D3 support, which was used for these platforms. + * + * This allows quirking on Linux in a similar fashion. + */ +static const struct x86_cpu_id storage_d3_cpu_ids[] = { + X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */ + X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */ + {} +}; + +bool force_storage_d3(void) +{ + return x86_match_cpu(storage_d3_cpu_ids); +} From fe73b1d0804de57556ae35728947350b530d15a7 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Thu, 15 Sep 2022 13:23:14 -0500 Subject: [PATCH 1264/1565] ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable [ Upstream commit 018d6711c26e4bd26e20a819fcc7f8ab902608f3 ] Dell Inspiron 14 2-in-1 has two ACPI nodes under GPP1 both with _ADR of 0, both without _HID. It's ambiguous which the kernel should take, but it seems to take "DEV0". Unfortunately "DEV0" is missing the device property `StorageD3Enable` which is present on "NVME". To avoid this causing problems for suspend, add a quirk for this system to behave like `StorageD3Enable` property was found. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216440 Reported-and-tested-by: Luya Tshimbalanga Signed-off-by: Mario Limonciello Reviewed-by: Hans de Goede Signed-off-by: Rafael J. Wysocki Stable-dep-of: e79a10652bbd ("ACPI: x86: Force StorageD3Enable on more products") Signed-off-by: Sasha Levin --- drivers/acpi/x86/utils.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index b3fb428461c6..3a3f09b6cbfc 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -198,7 +198,24 @@ static const struct x86_cpu_id storage_d3_cpu_ids[] = { {} }; +static const struct dmi_system_id force_storage_d3_dmi[] = { + { + /* + * _ADR is ambiguous between GPP1.DEV0 and GPP1.NVME + * but .NVME is needed to get StorageD3Enable node + * https://bugzilla.kernel.org/show_bug.cgi?id=216440 + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 14 7425 2-in-1"), + } + }, + {} +}; + bool force_storage_d3(void) { - return x86_match_cpu(storage_d3_cpu_ids); + const struct dmi_system_id *dmi_id = dmi_first_match(force_storage_d3_dmi); + + return dmi_id || x86_match_cpu(storage_d3_cpu_ids); } From c2a6ab506fd27243841fd5d66bc96d323b2c9039 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 14 Oct 2022 07:11:36 -0500 Subject: [PATCH 1265/1565] ACPI: x86: Add another system to quirk list for forcing StorageD3Enable [ Upstream commit 2124becad797245d49252d2d733aee0322233d7e ] commit 018d6711c26e4 ("ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable") introduced a quirk to allow a system with ambiguous use of _ADR 0 to force StorageD3Enable. Julius Brockmann reports that Inspiron 16 5625 suffers that same symptoms. Add this other system to the list as well. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216440 Reported-and-tested-by: Julius Brockmann Signed-off-by: Mario Limonciello Signed-off-by: Rafael J. Wysocki Stable-dep-of: e79a10652bbd ("ACPI: x86: Force StorageD3Enable on more products") Signed-off-by: Sasha Levin --- drivers/acpi/x86/utils.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index 3a3f09b6cbfc..222b951ff56a 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -210,6 +210,12 @@ static const struct dmi_system_id force_storage_d3_dmi[] = { DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 14 7425 2-in-1"), } }, + { + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 16 5625"), + } + }, {} }; From bb1758cc4af868c8b253b83816a09fd6a90d171d Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 28 Feb 2023 16:11:28 -0600 Subject: [PATCH 1266/1565] ACPI: x86: utils: Add Cezanne to the list for forcing StorageD3Enable [ Upstream commit e2a56364485e7789e7b8f342637c7f3a219f7ede ] commit 018d6711c26e4 ("ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable") introduced a quirk to allow a system with ambiguous use of _ADR 0 to force StorageD3Enable. It was reported that several more Dell systems suffered the same symptoms. As the list is continuing to grow but these are all Cezanne systems, instead add Cezanne to the CPU list to apply the StorageD3Enable property and remove the whole list. It was also reported that an HP system only has StorageD3Enable on the ACPI device for the first NVME disk, not the second. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217003 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216773 Reported-by: David Alvarez Lombardi Reported-by: dbilios@stdio.gr Reported-and-tested-by: Elvis Angelaccio Tested-by: victor.bonnelle@proton.me Tested-by: hurricanepootis@protonmail.com Signed-off-by: Mario Limonciello Signed-off-by: Rafael J. Wysocki Stable-dep-of: e79a10652bbd ("ACPI: x86: Force StorageD3Enable on more products") Signed-off-by: Sasha Levin --- drivers/acpi/x86/utils.c | 37 +++++++++++++------------------------ 1 file changed, 13 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index 222b951ff56a..f1dd086d0b87 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -191,37 +191,26 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s * a hardcoded allowlist for D3 support, which was used for these platforms. * * This allows quirking on Linux in a similar fashion. + * + * Cezanne systems shouldn't *normally* need this as the BIOS includes + * StorageD3Enable. But for two reasons we have added it. + * 1) The BIOS on a number of Dell systems have ambiguity + * between the same value used for _ADR on ACPI nodes GPP1.DEV0 and GPP1.NVME. + * GPP1.NVME is needed to get StorageD3Enable node set properly. + * https://bugzilla.kernel.org/show_bug.cgi?id=216440 + * https://bugzilla.kernel.org/show_bug.cgi?id=216773 + * https://bugzilla.kernel.org/show_bug.cgi?id=217003 + * 2) On at least one HP system StorageD3Enable is missing on the second NVME + disk in the system. */ static const struct x86_cpu_id storage_d3_cpu_ids[] = { X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */ X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */ - {} -}; - -static const struct dmi_system_id force_storage_d3_dmi[] = { - { - /* - * _ADR is ambiguous between GPP1.DEV0 and GPP1.NVME - * but .NVME is needed to get StorageD3Enable node - * https://bugzilla.kernel.org/show_bug.cgi?id=216440 - */ - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 14 7425 2-in-1"), - } - }, - { - .matches = { - DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), - DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 16 5625"), - } - }, + X86_MATCH_VENDOR_FAM_MODEL(AMD, 25, 80, NULL), /* Cezanne */ {} }; bool force_storage_d3(void) { - const struct dmi_system_id *dmi_id = dmi_first_match(force_storage_d3_dmi); - - return dmi_id || x86_match_cpu(storage_d3_cpu_ids); + return x86_match_cpu(storage_d3_cpu_ids); } From 3f830c24840077d7de0e1ee05a37a09a33da999d Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Fri, 31 Mar 2023 11:08:42 -0500 Subject: [PATCH 1267/1565] ACPI: x86: utils: Add Picasso to the list for forcing StorageD3Enable [ Upstream commit 10b6b4a8ac6120ec36555fd286eed577f7632e3b ] Picasso was the first APU that introduced s2idle support from AMD, and it was predating before vendors started to use `StorageD3Enable` in their firmware. Windows doesn't have problems with this hardware and NVME so it was likely on the list of hardcoded CPUs to use this behavior in Windows. Add it to the list for Linux to avoid NVME resume issues. Reported-by: Stuart Axon Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2449 Signed-off-by: Mario Limonciello Signed-off-by: Rafael J. Wysocki Stable-dep-of: e79a10652bbd ("ACPI: x86: Force StorageD3Enable on more products") Signed-off-by: Sasha Levin --- drivers/acpi/x86/utils.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index f1dd086d0b87..7d6083d40bf6 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -204,6 +204,7 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s disk in the system. */ static const struct x86_cpu_id storage_d3_cpu_ids[] = { + X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 24, NULL), /* Picasso */ X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */ X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */ X86_MATCH_VENDOR_FAM_MODEL(AMD, 25, 80, NULL), /* Cezanne */ From f0ef5ca85f432a7e90e103d0de4bf9d4bd140794 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Thu, 9 May 2024 13:45:02 -0500 Subject: [PATCH 1268/1565] ACPI: x86: Force StorageD3Enable on more products [ Upstream commit e79a10652bbd320649da705ca1ea0c04351af403 ] A Rembrandt-based HP thin client is reported to have problems where the NVME disk isn't present after resume from s2idle. This is because the NVME disk wasn't put into D3 at suspend, and that happened because the StorageD3Enable _DSD was missing in the BIOS. As AMD's architecture requires that the NVME is in D3 for s2idle, adjust the criteria for force_storage_d3 to match *all* Zen SoCs when the FADT advertises low power idle support. This will ensure that any future products with this BIOS deficiency don't need to be added to the allow list of overrides. Cc: All applicable Signed-off-by: Mario Limonciello Acked-by: Hans de Goede Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/x86/utils.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c index 7d6083d40bf6..aa4f233373af 100644 --- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -179,16 +179,16 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s } /* - * AMD systems from Renoir and Lucienne *require* that the NVME controller + * AMD systems from Renoir onwards *require* that the NVME controller * is put into D3 over a Modern Standby / suspend-to-idle cycle. * * This is "typically" accomplished using the `StorageD3Enable` * property in the _DSD that is checked via the `acpi_storage_d3` function - * but this property was introduced after many of these systems launched - * and most OEM systems don't have it in their BIOS. + * but some OEM systems still don't have it in their BIOS. * * The Microsoft documentation for StorageD3Enable mentioned that Windows has - * a hardcoded allowlist for D3 support, which was used for these platforms. + * a hardcoded allowlist for D3 support as well as a registry key to override + * the BIOS, which has been used for these cases. * * This allows quirking on Linux in a similar fashion. * @@ -201,17 +201,13 @@ bool acpi_device_override_status(struct acpi_device *adev, unsigned long long *s * https://bugzilla.kernel.org/show_bug.cgi?id=216773 * https://bugzilla.kernel.org/show_bug.cgi?id=217003 * 2) On at least one HP system StorageD3Enable is missing on the second NVME - disk in the system. + * disk in the system. + * 3) On at least one HP Rembrandt system StorageD3Enable is missing on the only + * NVME device. */ -static const struct x86_cpu_id storage_d3_cpu_ids[] = { - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 24, NULL), /* Picasso */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 25, 80, NULL), /* Cezanne */ - {} -}; - bool force_storage_d3(void) { - return x86_match_cpu(storage_d3_cpu_ids); + if (!cpu_feature_enabled(X86_FEATURE_ZEN)) + return false; + return acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0; } From b6be2b025c724fbf2829fc00b37662fc6ec4bd0c Mon Sep 17 00:00:00 2001 From: John Keeping Date: Thu, 23 May 2024 09:56:24 +0100 Subject: [PATCH 1269/1565] Input: ili210x - fix ili251x_read_touch_data() return value [ Upstream commit 9f0fad0382124e7e23b3c730fa78818c22c89c0a ] The caller of this function treats all non-zero values as an error, so the return value of i2c_master_recv() cannot be returned directly. This fixes touch reporting when there are more than 6 active touches. Fixes: ef536abd3afd1 ("Input: ili210x - define and use chip operations structure") Signed-off-by: John Keeping Link: https://lore.kernel.org/r/20240523085624.2295988-1-jkeeping@inmusicbrands.com Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/touchscreen/ili210x.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/input/touchscreen/ili210x.c b/drivers/input/touchscreen/ili210x.c index f437eefec94a..9452a12ddb09 100644 --- a/drivers/input/touchscreen/ili210x.c +++ b/drivers/input/touchscreen/ili210x.c @@ -231,8 +231,8 @@ static int ili251x_read_touch_data(struct i2c_client *client, u8 *data) if (!error && data[0] == 2) { error = i2c_master_recv(client, data + ILI251X_DATA_SIZE1, ILI251X_DATA_SIZE2); - if (error >= 0 && error != ILI251X_DATA_SIZE2) - error = -EIO; + if (error >= 0) + error = error == ILI251X_DATA_SIZE2 ? 0 : -EIO; } return error; From b813e3fd102a959c5b208ed68afe27e0137a561b Mon Sep 17 00:00:00 2001 From: Hagar Hemdan Date: Tue, 4 Jun 2024 08:58:38 +0000 Subject: [PATCH 1270/1565] pinctrl: fix deadlock in create_pinctrl() when handling -EPROBE_DEFER [ Upstream commit adec57ff8e66aee632f3dd1f93787c13d112b7a1 ] In create_pinctrl(), pinctrl_maps_mutex is acquired before calling add_setting(). If add_setting() returns -EPROBE_DEFER, create_pinctrl() calls pinctrl_free(). However, pinctrl_free() attempts to acquire pinctrl_maps_mutex, which is already held by create_pinctrl(), leading to a potential deadlock. This patch resolves the issue by releasing pinctrl_maps_mutex before calling pinctrl_free(), preventing the deadlock. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct") Suggested-by: Maximilian Heyne Signed-off-by: Hagar Hemdan Link: https://lore.kernel.org/r/20240604085838.3344-1-hagarhem@amazon.com Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c index ee99dc56c544..3d44d6f48cc4 100644 --- a/drivers/pinctrl/core.c +++ b/drivers/pinctrl/core.c @@ -1092,8 +1092,8 @@ static struct pinctrl *create_pinctrl(struct device *dev, * an -EPROBE_DEFER later, as that is the worst case. */ if (ret == -EPROBE_DEFER) { - pinctrl_free(p, false); mutex_unlock(&pinctrl_maps_mutex); + pinctrl_free(p, false); return ERR_PTR(ret); } } From 6ff152b2be889eb423ae4d34044dcf020b3365bf Mon Sep 17 00:00:00 2001 From: Huang-Huang Bao Date: Thu, 6 Jun 2024 20:57:52 +0800 Subject: [PATCH 1271/1565] pinctrl: rockchip: fix pinmux bits for RK3328 GPIO2-B pins [ Upstream commit e8448a6c817c2aa6c6af785b1d45678bd5977e8d ] The pinmux bits for GPIO2-B0 to GPIO2-B6 actually have 2 bits width, correct the bank flag for GPIO2-B. The pinmux bits for GPIO2-B7 is recalculated so it remain unchanged. The pinmux bits for those pins are not explicitly specified in RK3328 TRM, however we can get hint from pad name and its correspinding IOMUX setting for pins in interface descriptions. The correspinding IOMIX settings for GPIO2-B0 to GPIO2-B6 can be found in the same row next to occurrences of following pad names in RK3328 TRM. GPIO2-B0: IO_SPIclkm0_GPIO2B0vccio5 GPIO2-B1: IO_SPItxdm0_GPIO2B1vccio5 GPIO2-B2: IO_SPIrxdm0_GPIO2B2vccio5 GPIO2-B3: IO_SPIcsn0m0_GPIO2B3vccio5 GPIO2-B4: IO_SPIcsn1m0_FLASHvol_sel_GPIO2B4vccio5 GPIO2-B5: IO_ I2C2sda_TSADCshut_GPIO2B5vccio5 GPIO2-B6: IO_ I2C2scl_GPIO2B6vccio5 This fix has been tested on NanoPi R2S for fixing confliting pinmux bits between GPIO2-B7 with GPIO2-B5. Signed-off-by: Huang-Huang Bao Reviewed-by: Heiko Stuebner Fixes: 3818e4a7678e ("pinctrl: rockchip: Add rk3328 pinctrl support") Link: https://lore.kernel.org/r/20240606125755.53778-2-i@eh5.me Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/pinctrl-rockchip.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index 2a454098eaaa..afa705511437 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -800,12 +800,6 @@ static struct rockchip_mux_recalced_data rk3308_mux_recalced_data[] = { static struct rockchip_mux_recalced_data rk3328_mux_recalced_data[] = { { - .num = 2, - .pin = 12, - .reg = 0x24, - .bit = 8, - .mask = 0x3 - }, { .num = 2, .pin = 15, .reg = 0x28, @@ -3882,7 +3876,7 @@ static struct rockchip_pin_bank rk3328_pin_banks[] = { PIN_BANK_IOMUX_FLAGS(0, 32, "gpio0", 0, 0, 0, 0), PIN_BANK_IOMUX_FLAGS(1, 32, "gpio1", 0, 0, 0, 0), PIN_BANK_IOMUX_FLAGS(2, 32, "gpio2", 0, - IOMUX_WIDTH_3BIT, + 0, IOMUX_WIDTH_3BIT, 0), PIN_BANK_IOMUX_FLAGS(3, 32, "gpio3", From cfa2527ac80a618c19deff9f6478b9a7e85134cc Mon Sep 17 00:00:00 2001 From: Huang-Huang Bao Date: Thu, 6 Jun 2024 20:57:53 +0800 Subject: [PATCH 1272/1565] pinctrl: rockchip: fix pinmux bits for RK3328 GPIO3-B pins [ Upstream commit 5ef6914e0bf578357b4c906ffe6b26e7eedb8ccf ] The pinmux bits for GPIO3-B1 to GPIO3-B6 pins are not explicitly specified in RK3328 TRM, however we can get hint from pad name and its correspinding IOMUX setting for pins in interface descriptions. The correspinding IOMIX settings for these pins can be found in the same row next to occurrences of following pad names in RK3328 TRM. GPIO3-B1: IO_TSPd5m0_CIFdata5m0_GPIO3B1vccio6 GPIO3-B2: IO_TSPd6m0_CIFdata6m0_GPIO3B2vccio6 GPIO3-B3: IO_TSPd7m0_CIFdata7m0_GPIO3B3vccio6 GPIO3-B4: IO_CARDclkm0_GPIO3B4vccio6 GPIO3-B5: IO_CARDrstm0_GPIO3B5vccio6 GPIO3-B6: IO_CARDdetm0_GPIO3B6vccio6 Add pinmux data to rk3328_mux_recalced_data as mux register offset for these pins does not follow rockchip convention. Signed-off-by: Huang-Huang Bao Reviewed-by: Heiko Stuebner Fixes: 3818e4a7678e ("pinctrl: rockchip: Add rk3328 pinctrl support") Link: https://lore.kernel.org/r/20240606125755.53778-3-i@eh5.me Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/pinctrl-rockchip.c | 51 ++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index afa705511437..ad51f84d5e81 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -800,17 +800,68 @@ static struct rockchip_mux_recalced_data rk3308_mux_recalced_data[] = { static struct rockchip_mux_recalced_data rk3328_mux_recalced_data[] = { { + /* gpio2_b7_sel */ .num = 2, .pin = 15, .reg = 0x28, .bit = 0, .mask = 0x7 }, { + /* gpio2_c7_sel */ .num = 2, .pin = 23, .reg = 0x30, .bit = 14, .mask = 0x3 + }, { + /* gpio3_b1_sel */ + .num = 3, + .pin = 9, + .reg = 0x44, + .bit = 2, + .mask = 0x3 + }, { + /* gpio3_b2_sel */ + .num = 3, + .pin = 10, + .reg = 0x44, + .bit = 4, + .mask = 0x3 + }, { + /* gpio3_b3_sel */ + .num = 3, + .pin = 11, + .reg = 0x44, + .bit = 6, + .mask = 0x3 + }, { + /* gpio3_b4_sel */ + .num = 3, + .pin = 12, + .reg = 0x44, + .bit = 8, + .mask = 0x3 + }, { + /* gpio3_b5_sel */ + .num = 3, + .pin = 13, + .reg = 0x44, + .bit = 10, + .mask = 0x3 + }, { + /* gpio3_b6_sel */ + .num = 3, + .pin = 14, + .reg = 0x44, + .bit = 12, + .mask = 0x3 + }, { + /* gpio3_b7_sel */ + .num = 3, + .pin = 15, + .reg = 0x44, + .bit = 14, + .mask = 0x3 }, }; From 926cb583b9ef8463ee346ee03b4fce48cd81a6f3 Mon Sep 17 00:00:00 2001 From: Jianqun Xu Date: Mon, 16 Aug 2021 09:19:41 +0800 Subject: [PATCH 1273/1565] pinctrl/rockchip: separate struct rockchip_pin_bank to a head file [ Upstream commit e1450694e94657458395af886d2467d6ac3355af ] Separate struct rockchip_pin_bank to pinctrl-rockchip.h file, which will be used by gpio-rockchip driver in the future. Signed-off-by: Jianqun Xu Link: https://lore.kernel.org/r/20210816011948.1118959-3-jay.xu@rock-chips.com Signed-off-by: Linus Walleij Stable-dep-of: 01b4b1d1cec4 ("pinctrl: rockchip: use dedicated pinctrl type for RK3328") Signed-off-by: Sasha Levin --- drivers/pinctrl/pinctrl-rockchip.c | 226 +------------------------- drivers/pinctrl/pinctrl-rockchip.h | 245 +++++++++++++++++++++++++++++ 2 files changed, 246 insertions(+), 225 deletions(-) create mode 100644 drivers/pinctrl/pinctrl-rockchip.h diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index ad51f84d5e81..25716bdf5bbe 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -35,6 +35,7 @@ #include "core.h" #include "pinconf.h" +#include "pinctrl-rockchip.h" /* GPIO control registers */ #define GPIO_SWPORT_DR 0x00 @@ -50,21 +51,6 @@ #define GPIO_EXT_PORT 0x50 #define GPIO_LS_SYNC 0x60 -enum rockchip_pinctrl_type { - PX30, - RV1108, - RK2928, - RK3066B, - RK3128, - RK3188, - RK3288, - RK3308, - RK3368, - RK3399, - RK3568, -}; - - /** * Generate a bitmask for setting a value (v) with a write mask bit in hiword * register 31:16 area. @@ -82,103 +68,6 @@ enum rockchip_pinctrl_type { #define IOMUX_WIDTH_3BIT BIT(4) #define IOMUX_WIDTH_2BIT BIT(5) -/** - * struct rockchip_iomux - * @type: iomux variant using IOMUX_* constants - * @offset: if initialized to -1 it will be autocalculated, by specifying - * an initial offset value the relevant source offset can be reset - * to a new value for autocalculating the following iomux registers. - */ -struct rockchip_iomux { - int type; - int offset; -}; - -/* - * enum type index corresponding to rockchip_perpin_drv_list arrays index. - */ -enum rockchip_pin_drv_type { - DRV_TYPE_IO_DEFAULT = 0, - DRV_TYPE_IO_1V8_OR_3V0, - DRV_TYPE_IO_1V8_ONLY, - DRV_TYPE_IO_1V8_3V0_AUTO, - DRV_TYPE_IO_3V3_ONLY, - DRV_TYPE_MAX -}; - -/* - * enum type index corresponding to rockchip_pull_list arrays index. - */ -enum rockchip_pin_pull_type { - PULL_TYPE_IO_DEFAULT = 0, - PULL_TYPE_IO_1V8_ONLY, - PULL_TYPE_MAX -}; - -/** - * struct rockchip_drv - * @drv_type: drive strength variant using rockchip_perpin_drv_type - * @offset: if initialized to -1 it will be autocalculated, by specifying - * an initial offset value the relevant source offset can be reset - * to a new value for autocalculating the following drive strength - * registers. if used chips own cal_drv func instead to calculate - * registers offset, the variant could be ignored. - */ -struct rockchip_drv { - enum rockchip_pin_drv_type drv_type; - int offset; -}; - -/** - * struct rockchip_pin_bank - * @reg_base: register base of the gpio bank - * @regmap_pull: optional separate register for additional pull settings - * @clk: clock of the gpio bank - * @irq: interrupt of the gpio bank - * @saved_masks: Saved content of GPIO_INTEN at suspend time. - * @pin_base: first pin number - * @nr_pins: number of pins in this bank - * @name: name of the bank - * @bank_num: number of the bank, to account for holes - * @iomux: array describing the 4 iomux sources of the bank - * @drv: array describing the 4 drive strength sources of the bank - * @pull_type: array describing the 4 pull type sources of the bank - * @valid: is all necessary information present - * @of_node: dt node of this bank - * @drvdata: common pinctrl basedata - * @domain: irqdomain of the gpio bank - * @gpio_chip: gpiolib chip - * @grange: gpio range - * @slock: spinlock for the gpio bank - * @toggle_edge_mode: bit mask to toggle (falling/rising) edge mode - * @recalced_mask: bit mask to indicate a need to recalulate the mask - * @route_mask: bits describing the routing pins of per bank - */ -struct rockchip_pin_bank { - void __iomem *reg_base; - struct regmap *regmap_pull; - struct clk *clk; - int irq; - u32 saved_masks; - u32 pin_base; - u8 nr_pins; - char *name; - u8 bank_num; - struct rockchip_iomux iomux[4]; - struct rockchip_drv drv[4]; - enum rockchip_pin_pull_type pull_type[4]; - bool valid; - struct device_node *of_node; - struct rockchip_pinctrl *drvdata; - struct irq_domain *domain; - struct gpio_chip gpio_chip; - struct pinctrl_gpio_range grange; - raw_spinlock_t slock; - u32 toggle_edge_mode; - u32 recalced_mask; - u32 route_mask; -}; - #define PIN_BANK(id, pins, label) \ { \ .bank_num = id, \ @@ -318,119 +207,6 @@ struct rockchip_pin_bank { #define RK_MUXROUTE_PMU(ID, PIN, FUNC, REG, VAL) \ PIN_BANK_MUX_ROUTE_FLAGS(ID, PIN, FUNC, REG, VAL, ROCKCHIP_ROUTE_PMU) -/** - * struct rockchip_mux_recalced_data: represent a pin iomux data. - * @num: bank number. - * @pin: pin number. - * @bit: index at register. - * @reg: register offset. - * @mask: mask bit - */ -struct rockchip_mux_recalced_data { - u8 num; - u8 pin; - u32 reg; - u8 bit; - u8 mask; -}; - -enum rockchip_mux_route_location { - ROCKCHIP_ROUTE_SAME = 0, - ROCKCHIP_ROUTE_PMU, - ROCKCHIP_ROUTE_GRF, -}; - -/** - * struct rockchip_mux_recalced_data: represent a pin iomux data. - * @bank_num: bank number. - * @pin: index at register or used to calc index. - * @func: the min pin. - * @route_location: the mux route location (same, pmu, grf). - * @route_offset: the max pin. - * @route_val: the register offset. - */ -struct rockchip_mux_route_data { - u8 bank_num; - u8 pin; - u8 func; - enum rockchip_mux_route_location route_location; - u32 route_offset; - u32 route_val; -}; - -struct rockchip_pin_ctrl { - struct rockchip_pin_bank *pin_banks; - u32 nr_banks; - u32 nr_pins; - char *label; - enum rockchip_pinctrl_type type; - int grf_mux_offset; - int pmu_mux_offset; - int grf_drv_offset; - int pmu_drv_offset; - struct rockchip_mux_recalced_data *iomux_recalced; - u32 niomux_recalced; - struct rockchip_mux_route_data *iomux_routes; - u32 niomux_routes; - - void (*pull_calc_reg)(struct rockchip_pin_bank *bank, - int pin_num, struct regmap **regmap, - int *reg, u8 *bit); - void (*drv_calc_reg)(struct rockchip_pin_bank *bank, - int pin_num, struct regmap **regmap, - int *reg, u8 *bit); - int (*schmitt_calc_reg)(struct rockchip_pin_bank *bank, - int pin_num, struct regmap **regmap, - int *reg, u8 *bit); -}; - -struct rockchip_pin_config { - unsigned int func; - unsigned long *configs; - unsigned int nconfigs; -}; - -/** - * struct rockchip_pin_group: represent group of pins of a pinmux function. - * @name: name of the pin group, used to lookup the group. - * @pins: the pins included in this group. - * @npins: number of pins included in this group. - * @data: local pin configuration - */ -struct rockchip_pin_group { - const char *name; - unsigned int npins; - unsigned int *pins; - struct rockchip_pin_config *data; -}; - -/** - * struct rockchip_pmx_func: represent a pin function. - * @name: name of the pin function, used to lookup the function. - * @groups: one or more names of pin groups that provide this function. - * @ngroups: number of groups included in @groups. - */ -struct rockchip_pmx_func { - const char *name; - const char **groups; - u8 ngroups; -}; - -struct rockchip_pinctrl { - struct regmap *regmap_base; - int reg_size; - struct regmap *regmap_pull; - struct regmap *regmap_pmu; - struct device *dev; - struct rockchip_pin_ctrl *ctrl; - struct pinctrl_desc pctl; - struct pinctrl_dev *pctl_dev; - struct rockchip_pin_group *groups; - unsigned int ngroups; - struct rockchip_pmx_func *functions; - unsigned int nfunctions; -}; - static struct regmap_config rockchip_regmap_config = { .reg_bits = 32, .val_bits = 32, diff --git a/drivers/pinctrl/pinctrl-rockchip.h b/drivers/pinctrl/pinctrl-rockchip.h new file mode 100644 index 000000000000..dba9e9540633 --- /dev/null +++ b/drivers/pinctrl/pinctrl-rockchip.h @@ -0,0 +1,245 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2020-2021 Rockchip Electronics Co. Ltd. + * + * Copyright (c) 2013 MundoReader S.L. + * Author: Heiko Stuebner + * + * With some ideas taken from pinctrl-samsung: + * Copyright (c) 2012 Samsung Electronics Co., Ltd. + * http://www.samsung.com + * Copyright (c) 2012 Linaro Ltd + * https://www.linaro.org + * + * and pinctrl-at91: + * Copyright (C) 2011-2012 Jean-Christophe PLAGNIOL-VILLARD + */ + +#ifndef _PINCTRL_ROCKCHIP_H +#define _PINCTRL_ROCKCHIP_H + +enum rockchip_pinctrl_type { + PX30, + RV1108, + RK2928, + RK3066B, + RK3128, + RK3188, + RK3288, + RK3308, + RK3368, + RK3399, + RK3568, +}; + +/** + * struct rockchip_iomux + * @type: iomux variant using IOMUX_* constants + * @offset: if initialized to -1 it will be autocalculated, by specifying + * an initial offset value the relevant source offset can be reset + * to a new value for autocalculating the following iomux registers. + */ +struct rockchip_iomux { + int type; + int offset; +}; + +/* + * enum type index corresponding to rockchip_perpin_drv_list arrays index. + */ +enum rockchip_pin_drv_type { + DRV_TYPE_IO_DEFAULT = 0, + DRV_TYPE_IO_1V8_OR_3V0, + DRV_TYPE_IO_1V8_ONLY, + DRV_TYPE_IO_1V8_3V0_AUTO, + DRV_TYPE_IO_3V3_ONLY, + DRV_TYPE_MAX +}; + +/* + * enum type index corresponding to rockchip_pull_list arrays index. + */ +enum rockchip_pin_pull_type { + PULL_TYPE_IO_DEFAULT = 0, + PULL_TYPE_IO_1V8_ONLY, + PULL_TYPE_MAX +}; + +/** + * struct rockchip_drv + * @drv_type: drive strength variant using rockchip_perpin_drv_type + * @offset: if initialized to -1 it will be autocalculated, by specifying + * an initial offset value the relevant source offset can be reset + * to a new value for autocalculating the following drive strength + * registers. if used chips own cal_drv func instead to calculate + * registers offset, the variant could be ignored. + */ +struct rockchip_drv { + enum rockchip_pin_drv_type drv_type; + int offset; +}; + +/** + * struct rockchip_pin_bank + * @reg_base: register base of the gpio bank + * @regmap_pull: optional separate register for additional pull settings + * @clk: clock of the gpio bank + * @irq: interrupt of the gpio bank + * @saved_masks: Saved content of GPIO_INTEN at suspend time. + * @pin_base: first pin number + * @nr_pins: number of pins in this bank + * @name: name of the bank + * @bank_num: number of the bank, to account for holes + * @iomux: array describing the 4 iomux sources of the bank + * @drv: array describing the 4 drive strength sources of the bank + * @pull_type: array describing the 4 pull type sources of the bank + * @valid: is all necessary information present + * @of_node: dt node of this bank + * @drvdata: common pinctrl basedata + * @domain: irqdomain of the gpio bank + * @gpio_chip: gpiolib chip + * @grange: gpio range + * @slock: spinlock for the gpio bank + * @toggle_edge_mode: bit mask to toggle (falling/rising) edge mode + * @recalced_mask: bit mask to indicate a need to recalulate the mask + * @route_mask: bits describing the routing pins of per bank + */ +struct rockchip_pin_bank { + void __iomem *reg_base; + struct regmap *regmap_pull; + struct clk *clk; + int irq; + u32 saved_masks; + u32 pin_base; + u8 nr_pins; + char *name; + u8 bank_num; + struct rockchip_iomux iomux[4]; + struct rockchip_drv drv[4]; + enum rockchip_pin_pull_type pull_type[4]; + bool valid; + struct device_node *of_node; + struct rockchip_pinctrl *drvdata; + struct irq_domain *domain; + struct gpio_chip gpio_chip; + struct pinctrl_gpio_range grange; + raw_spinlock_t slock; + u32 toggle_edge_mode; + u32 recalced_mask; + u32 route_mask; +}; + +/** + * struct rockchip_mux_recalced_data: represent a pin iomux data. + * @num: bank number. + * @pin: pin number. + * @bit: index at register. + * @reg: register offset. + * @mask: mask bit + */ +struct rockchip_mux_recalced_data { + u8 num; + u8 pin; + u32 reg; + u8 bit; + u8 mask; +}; + +enum rockchip_mux_route_location { + ROCKCHIP_ROUTE_SAME = 0, + ROCKCHIP_ROUTE_PMU, + ROCKCHIP_ROUTE_GRF, +}; + +/** + * struct rockchip_mux_recalced_data: represent a pin iomux data. + * @bank_num: bank number. + * @pin: index at register or used to calc index. + * @func: the min pin. + * @route_location: the mux route location (same, pmu, grf). + * @route_offset: the max pin. + * @route_val: the register offset. + */ +struct rockchip_mux_route_data { + u8 bank_num; + u8 pin; + u8 func; + enum rockchip_mux_route_location route_location; + u32 route_offset; + u32 route_val; +}; + +struct rockchip_pin_ctrl { + struct rockchip_pin_bank *pin_banks; + u32 nr_banks; + u32 nr_pins; + char *label; + enum rockchip_pinctrl_type type; + int grf_mux_offset; + int pmu_mux_offset; + int grf_drv_offset; + int pmu_drv_offset; + struct rockchip_mux_recalced_data *iomux_recalced; + u32 niomux_recalced; + struct rockchip_mux_route_data *iomux_routes; + u32 niomux_routes; + + void (*pull_calc_reg)(struct rockchip_pin_bank *bank, + int pin_num, struct regmap **regmap, + int *reg, u8 *bit); + void (*drv_calc_reg)(struct rockchip_pin_bank *bank, + int pin_num, struct regmap **regmap, + int *reg, u8 *bit); + int (*schmitt_calc_reg)(struct rockchip_pin_bank *bank, + int pin_num, struct regmap **regmap, + int *reg, u8 *bit); +}; + +struct rockchip_pin_config { + unsigned int func; + unsigned long *configs; + unsigned int nconfigs; +}; + +/** + * struct rockchip_pin_group: represent group of pins of a pinmux function. + * @name: name of the pin group, used to lookup the group. + * @pins: the pins included in this group. + * @npins: number of pins included in this group. + * @data: local pin configuration + */ +struct rockchip_pin_group { + const char *name; + unsigned int npins; + unsigned int *pins; + struct rockchip_pin_config *data; +}; + +/** + * struct rockchip_pmx_func: represent a pin function. + * @name: name of the pin function, used to lookup the function. + * @groups: one or more names of pin groups that provide this function. + * @ngroups: number of groups included in @groups. + */ +struct rockchip_pmx_func { + const char *name; + const char **groups; + u8 ngroups; +}; + +struct rockchip_pinctrl { + struct regmap *regmap_base; + int reg_size; + struct regmap *regmap_pull; + struct regmap *regmap_pmu; + struct device *dev; + struct rockchip_pin_ctrl *ctrl; + struct pinctrl_desc pctl; + struct pinctrl_dev *pctl_dev; + struct rockchip_pin_group *groups; + unsigned int ngroups; + struct rockchip_pmx_func *functions; + unsigned int nfunctions; +}; + +#endif From 6531f8c6663cb890250dcc5973444c5c9c07edda Mon Sep 17 00:00:00 2001 From: Huang-Huang Bao Date: Thu, 6 Jun 2024 20:57:54 +0800 Subject: [PATCH 1274/1565] pinctrl: rockchip: use dedicated pinctrl type for RK3328 [ Upstream commit 01b4b1d1cec48ef4c26616c2fc4600b2c9fec05a ] rk3328_pin_ctrl uses type of RK3288 which has a hack in rockchip_pinctrl_suspend and rockchip_pinctrl_resume to restore GPIO6-C6 at assume, the hack is not applicable to RK3328 as GPIO6 is not even exist in it. So use a dedicated pinctrl type to skip this hack. Fixes: 3818e4a7678e ("pinctrl: rockchip: Add rk3328 pinctrl support") Reviewed-by: Heiko Stuebner Signed-off-by: Huang-Huang Bao Link: https://lore.kernel.org/r/20240606125755.53778-4-i@eh5.me Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/pinctrl-rockchip.c | 5 ++++- drivers/pinctrl/pinctrl-rockchip.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index 25716bdf5bbe..7637b25c6edf 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -1864,6 +1864,7 @@ static int rockchip_get_pull(struct rockchip_pin_bank *bank, int pin_num) case RK3188: case RK3288: case RK3308: + case RK3328: case RK3368: case RK3399: case RK3568: @@ -1918,6 +1919,7 @@ static int rockchip_set_pull(struct rockchip_pin_bank *bank, case RK3188: case RK3288: case RK3308: + case RK3328: case RK3368: case RK3399: case RK3568: @@ -2239,6 +2241,7 @@ static bool rockchip_pinconf_pull_valid(struct rockchip_pin_ctrl *ctrl, case RK3188: case RK3288: case RK3308: + case RK3328: case RK3368: case RK3399: case RK3568: @@ -3717,7 +3720,7 @@ static struct rockchip_pin_ctrl rk3328_pin_ctrl = { .pin_banks = rk3328_pin_banks, .nr_banks = ARRAY_SIZE(rk3328_pin_banks), .label = "RK3328-GPIO", - .type = RK3288, + .type = RK3328, .grf_mux_offset = 0x0, .iomux_recalced = rk3328_mux_recalced_data, .niomux_recalced = ARRAY_SIZE(rk3328_mux_recalced_data), diff --git a/drivers/pinctrl/pinctrl-rockchip.h b/drivers/pinctrl/pinctrl-rockchip.h index dba9e9540633..7263db68d0ef 100644 --- a/drivers/pinctrl/pinctrl-rockchip.h +++ b/drivers/pinctrl/pinctrl-rockchip.h @@ -27,6 +27,7 @@ enum rockchip_pinctrl_type { RK3188, RK3288, RK3308, + RK3328, RK3368, RK3399, RK3568, From 52af94393dd608ee0f8b375dfbcc3139737e501c Mon Sep 17 00:00:00 2001 From: Huang-Huang Bao Date: Thu, 6 Jun 2024 20:57:55 +0800 Subject: [PATCH 1275/1565] pinctrl: rockchip: fix pinmux reset in rockchip_pmx_set [ Upstream commit 4ea4d4808e342ddf89ba24b93ffa2057005aaced ] rockchip_pmx_set reset all pinmuxs in group to 0 in the case of error, add missing bank data retrieval in that code to avoid setting mux on unexpected pins. Fixes: 14797189b35e ("pinctrl: rockchip: add return value to rockchip_set_mux") Reviewed-by: Heiko Stuebner Signed-off-by: Huang-Huang Bao Link: https://lore.kernel.org/r/20240606125755.53778-5-i@eh5.me Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin --- drivers/pinctrl/pinctrl-rockchip.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/pinctrl-rockchip.c b/drivers/pinctrl/pinctrl-rockchip.c index 7637b25c6edf..02b41f1bafe7 100644 --- a/drivers/pinctrl/pinctrl-rockchip.c +++ b/drivers/pinctrl/pinctrl-rockchip.c @@ -2131,8 +2131,10 @@ static int rockchip_pmx_set(struct pinctrl_dev *pctldev, unsigned selector, if (ret) { /* revert the already done pin settings */ - for (cnt--; cnt >= 0; cnt--) + for (cnt--; cnt >= 0; cnt--) { + bank = pin_to_bank(info, pins[cnt]); rockchip_set_mux(bank, pins[cnt] - bank->pin_base, 0); + } return ret; } From d8a04a6bfa75251ba7bcc3651ed211e82f13f388 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 20 May 2024 09:05:21 -0400 Subject: [PATCH 1276/1565] drm/amdgpu: fix UBSAN warning in kv_dpm.c [ Upstream commit f0d576f840153392d04b2d52cf3adab8f62e8cb6 ] Adds bounds check for sumo_vid_mapping_entry. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3392 Reviewed-by: Mario Limonciello Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c index 6eb6f05c1136..56e15f5bc822 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c +++ b/drivers/gpu/drm/amd/pm/powerplay/kv_dpm.c @@ -163,6 +163,8 @@ static void sumo_construct_vid_mapping_table(struct amdgpu_device *adev, for (i = 0; i < SUMO_MAX_HARDWARE_POWERLEVELS; i++) { if (table[i].ulSupportedSCLK != 0) { + if (table[i].usVoltageIndex >= SUMO_MAX_NUMBER_VOLTAGES) + continue; vid_mapping_table->entries[table[i].usVoltageIndex].vid_7bit = table[i].usVoltageID; vid_mapping_table->entries[table[i].usVoltageIndex].vid_2bit = From 310dee723530a8c85915d058c12277de58c3f611 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Thu, 27 Jun 2024 02:41:13 +0200 Subject: [PATCH 1277/1565] netfilter: nf_tables: validate family when identifying table via handle [ Upstream commit f6e1532a2697b81da00bfb184e99d15e01e9d98c ] Validate table family when looking up for it via NFTA_TABLE_HANDLE. Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle") Reported-by: Xingyuan Mo Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- net/netfilter/nf_tables_api.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f3cb5c920276..754278b85706 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -713,7 +713,7 @@ static struct nft_table *nft_table_lookup(const struct net *net, static struct nft_table *nft_table_lookup_byhandle(const struct net *net, const struct nlattr *nla, - u8 genmask) + int family, u8 genmask) { struct nftables_pernet *nft_net; struct nft_table *table; @@ -721,6 +721,7 @@ static struct nft_table *nft_table_lookup_byhandle(const struct net *net, nft_net = net_generic(net, nf_tables_net_id); list_for_each_entry(table, &nft_net->tables, list) { if (be64_to_cpu(nla_get_be64(nla)) == table->handle && + table->family == family && nft_active_genmask(table, genmask)) return table; } @@ -1440,7 +1441,7 @@ static int nf_tables_deltable(struct net *net, struct sock *nlsk, if (nla[NFTA_TABLE_HANDLE]) { attr = nla[NFTA_TABLE_HANDLE]; - table = nft_table_lookup_byhandle(net, attr, genmask); + table = nft_table_lookup_byhandle(net, attr, family, genmask); } else { attr = nla[NFTA_TABLE_NAME]; table = nft_table_lookup(net, attr, family, genmask); From 3de81c1e84bf84803308da3272a829a7655c5336 Mon Sep 17 00:00:00 2001 From: Yunjian Wang Date: Wed, 26 Jun 2024 14:27:41 -0400 Subject: [PATCH 1278/1565] SUNRPC: Fix null pointer dereference in svc_rqst_free() [ Upstream commit b9f83ffaa0c096b4c832a43964fe6bff3acffe10 ] When alloc_pages_node() returns null in svc_rqst_alloc(), the null rq_scratch_page pointer will be dereferenced when calling put_page() in svc_rqst_free(). Fix it by adding a null check. Addresses-Coverity: ("Dereference after null check") Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") Signed-off-by: Yunjian Wang Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- net/sunrpc/svc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 26d972c54a59..ac7b3a93d992 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -845,7 +845,8 @@ void svc_rqst_free(struct svc_rqst *rqstp) { svc_release_buffer(rqstp); - put_page(rqstp->rq_scratch_page); + if (rqstp->rq_scratch_page) + put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data); From be20af24585df70d58198d7d3eb02d26f1e5a6c7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jun 2024 14:27:42 -0400 Subject: [PATCH 1279/1565] SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency() [ Upstream commit 5c11720767f70d34357d00a15ba5a0ad052c40fe ] Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff87494c6e ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 1 + include/trace/events/sunrpc.h | 8 ++++---- net/sunrpc/svc.c | 15 +++++++++++++++ 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 1cf7a7799cc0..8583825c4aea 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -498,6 +498,7 @@ void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); char * svc_print_addr(struct svc_rqst *, char *, size_t); +const char * svc_proc_name(const struct svc_rqst *rqstp); int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, unsigned int length); diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 56e4a57d2538..5d34deca0f30 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -1578,7 +1578,7 @@ TRACE_EVENT(svc_process, __field(u32, vers) __field(u32, proc) __string(service, name) - __string(procedure, rqst->rq_procinfo->pc_name) + __string(procedure, svc_proc_name(rqst)) __string(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)") ), @@ -1588,7 +1588,7 @@ TRACE_EVENT(svc_process, __entry->vers = rqst->rq_vers; __entry->proc = rqst->rq_proc; __assign_str(service, name); - __assign_str(procedure, rqst->rq_procinfo->pc_name); + __assign_str(procedure, svc_proc_name(rqst)); __assign_str(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)"); ), @@ -1854,7 +1854,7 @@ TRACE_EVENT(svc_stats_latency, TP_STRUCT__entry( __field(u32, xid) __field(unsigned long, execute) - __string(procedure, rqst->rq_procinfo->pc_name) + __string(procedure, svc_proc_name(rqst)) __string(addr, rqst->rq_xprt->xpt_remotebuf) ), @@ -1862,7 +1862,7 @@ TRACE_EVENT(svc_stats_latency, __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->execute = ktime_to_us(ktime_sub(ktime_get(), rqst->rq_stime)); - __assign_str(procedure, rqst->rq_procinfo->pc_name); + __assign_str(procedure, svc_proc_name(rqst)); __assign_str(addr, rqst->rq_xprt->xpt_remotebuf); ), diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index ac7b3a93d992..f8815ae776e6 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1612,6 +1612,21 @@ u32 svc_max_payload(const struct svc_rqst *rqstp) } EXPORT_SYMBOL_GPL(svc_max_payload); +/** + * svc_proc_name - Return RPC procedure name in string form + * @rqstp: svc_rqst to operate on + * + * Return value: + * Pointer to a NUL-terminated string + */ +const char *svc_proc_name(const struct svc_rqst *rqstp) +{ + if (rqstp && rqstp->rq_procinfo) + return rqstp->rq_procinfo->pc_name; + return "unknown"; +} + + /** * svc_encode_result_payload - mark a range of bytes as a result payload * @rqstp: svc_rqst to operate on From f1ef3dc758c7631246b0da8fe8d216886a05a209 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jun 2024 14:27:43 -0400 Subject: [PATCH 1280/1565] SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation [ Upstream commit 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 ] Ensure that stream-based argument decoding can't go past the actual end of the receive buffer. xdr_init_decode's calculation of the value of xdr->end over-estimates the end of the buffer because the Linux kernel RPC server code does not remove the size of the RPC header from rqstp->rq_arg before calling the upper layer's dispatcher. The server-side still uses the svc_getnl() macros to decode the RPC call header. These macros reduce the length of the head iov but do not update the total length of the message in the buffer (buf->len). A proper fix for this would be to replace the use of svc_getnl() and friends in the RPC header decoder, but that would be a large and invasive change that would be difficult to backport. Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 8583825c4aea..f0e09427070c 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -536,16 +536,27 @@ static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space) } /** - * svcxdr_init_decode - Prepare an xdr_stream for svc Call decoding + * svcxdr_init_decode - Prepare an xdr_stream for Call decoding * @rqstp: controlling server RPC transaction context * + * This function currently assumes the RPC header in rq_arg has + * already been decoded. Upon return, xdr->p points to the + * location of the upper layer header. */ static inline void svcxdr_init_decode(struct svc_rqst *rqstp) { struct xdr_stream *xdr = &rqstp->rq_arg_stream; - struct kvec *argv = rqstp->rq_arg.head; + struct xdr_buf *buf = &rqstp->rq_arg; + struct kvec *argv = buf->head; - xdr_init_decode(xdr, &rqstp->rq_arg, argv->iov_base, NULL); + /* + * svc_getnl() and friends do not keep the xdr_buf's ::len + * field up to date. Refresh that field before initializing + * the argument decoding stream. + */ + buf->len = buf->head->iov_len + buf->page_len + buf->tail->iov_len; + + xdr_init_decode(xdr, buf, argv->iov_base, NULL); xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); } From a4f3907ab50b62304f6341f21cda094e1cf2766d Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jun 2024 14:27:44 -0400 Subject: [PATCH 1281/1565] SUNRPC: Fix svcxdr_init_encode's buflen calculation [ Upstream commit 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 ] Commit 2825a7f90753 ("nfsd4: allow encoding across page boundaries") added an explicit computation of the remaining length in the rq_res XDR buffer. The computation appears to suffer from an "off-by-one" bug. Because buflen is too large by one page, XDR encoding can run off the end of the send buffer by eventually trying to use the struct page address in rq_page_end, which always contains NULL. Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- include/linux/sunrpc/svc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f0e09427070c..00303c636a89 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -579,7 +579,7 @@ static inline void svcxdr_init_encode(struct svc_rqst *rqstp) xdr->end = resv->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; buf->len = resv->iov_len; xdr->page_ptr = buf->pages - 1; - buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages); + buf->buflen = PAGE_SIZE * (rqstp->rq_page_end - buf->pages); buf->buflen -= rqstp->rq_auth_slack; xdr->rqst = NULL; } From 229e145a810d326de2357c7f9211cf7a90d389c3 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 26 Jun 2024 14:27:45 -0400 Subject: [PATCH 1282/1565] nfsd: hold a lighter-weight client reference over CB_RECALL_ANY [ Upstream commit 10396f4df8b75ff6ab0aa2cd74296565466f2c8d ] Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes. If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation. The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Vladimir Benes Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin --- fs/nfsd/nfs4state.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 228560f3fd0e..8e84ddccce4b 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -2888,12 +2888,9 @@ static void nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); - spin_lock(&nn->client_lock); clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); - put_client_renew_locked(clp); - spin_unlock(&nn->client_lock); + drop_client(clp); } static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { @@ -6230,7 +6227,7 @@ deleg_reaper(struct nfsd_net *nn) list_add(&clp->cl_ra_cblist, &cblist); /* release in nfsd4_cb_recall_any_release */ - atomic_inc(&clp->cl_rpc_users); + kref_get(&clp->cl_nfsdfs.cl_ref); set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); clp->cl_ra_time = ktime_get_boottime_seconds(); } From 3662eb2170e59b58ad479982dc1084889ba757b9 Mon Sep 17 00:00:00 2001 From: Elinor Montmasson Date: Thu, 20 Jun 2024 15:25:03 +0200 Subject: [PATCH 1283/1565] ASoC: fsl-asoc-card: set priv->pdev before using it [ Upstream commit 90f3feb24172185f1832636264943e8b5e289245 ] priv->pdev pointer was set after being used in fsl_asoc_card_audmux_init(). Move this assignment at the start of the probe function, so sub-functions can correctly use pdev through priv. fsl_asoc_card_audmux_init() dereferences priv->pdev to get access to the dev struct, used with dev_err macros. As priv is zero-initialised, there would be a NULL pointer dereference. Note that if priv->dev is dereferenced before assignment but never used, for example if there is no error to be printed, the driver won't crash probably due to compiler optimisations. Fixes: 708b4351f08c ("ASoC: fsl: Add Freescale Generic ASoC Sound Card with ASRC support") Signed-off-by: Elinor Montmasson Link: https://patch.msgid.link/20240620132511.4291-2-elinor.montmasson@savoirfairelinux.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/fsl/fsl-asoc-card.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c index 9a756d0a6032..c876f111d8b0 100644 --- a/sound/soc/fsl/fsl-asoc-card.c +++ b/sound/soc/fsl/fsl-asoc-card.c @@ -538,6 +538,8 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) if (!priv) return -ENOMEM; + priv->pdev = pdev; + cpu_np = of_parse_phandle(np, "audio-cpu", 0); /* Give a chance to old DT binding */ if (!cpu_np) @@ -718,7 +720,6 @@ static int fsl_asoc_card_probe(struct platform_device *pdev) } /* Initialize sound card */ - priv->pdev = pdev; priv->card.dev = &pdev->dev; priv->card.owner = THIS_MODULE; ret = snd_soc_of_parse_card_name(&priv->card, "model"); From 65a9383389db7bea170eacd9e619805e79660c2f Mon Sep 17 00:00:00 2001 From: Tristram Ha Date: Tue, 18 Jun 2024 17:16:42 -0700 Subject: [PATCH 1284/1565] net: dsa: microchip: fix initial port flush problem [ Upstream commit ad53f5f54f351e967128edbc431f0f26427172cf ] The very first flush in any port will flush all learned addresses in all ports. This can be observed by unplugging the cable from one port while additional ports are connected and dumping the fdb entries. This problem is caused by the initially wrong value programmed to the REG_SW_LUE_CTRL_1 register. Setting SW_FLUSH_STP_TABLE and SW_FLUSH_MSTP_TABLE bits does not have an immediate effect. It is when ksz9477_flush_dyn_mac_table() is called then the SW_FLUSH_STP_TABLE bit takes effect and flushes all learned entries. After that call both bits are reset and so the next port flush will not cause such problem again. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Tristram Ha Link: https://patch.msgid.link/1718756202-2731-1-git-send-email-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/dsa/microchip/ksz9477.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c index f42f2f4e4b60..535b64155320 100644 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@ -206,10 +206,8 @@ static int ksz9477_reset_switch(struct ksz_device *dev) SPI_AUTO_EDGE_DETECTION, 0); /* default configuration */ - ksz_read8(dev, REG_SW_LUE_CTRL_1, &data8); - data8 = SW_AGING_ENABLE | SW_LINK_AUTO_AGING | - SW_SRC_ADDR_FILTER | SW_FLUSH_STP_TABLE | SW_FLUSH_MSTP_TABLE; - ksz_write8(dev, REG_SW_LUE_CTRL_1, data8); + ksz_write8(dev, REG_SW_LUE_CTRL_1, + SW_AGING_ENABLE | SW_LINK_AUTO_AGING | SW_SRC_ADDR_FILTER); /* disable interrupts */ ksz_write32(dev, REG_SW_INT_MASK__4, SWITCH_INT_MASK); From 0427f74a79538df6c91d4de296ce5e87fe9c9e8b Mon Sep 17 00:00:00 2001 From: Enguerrand de Ribaucourt Date: Fri, 21 Jun 2024 16:43:20 +0200 Subject: [PATCH 1285/1565] net: phy: micrel: add Microchip KSZ 9477 to the device table [ Upstream commit 54a4e5c16382e871c01dd82b47e930fdce30406b ] PHY_ID_KSZ9477 was supported but not added to the device table passed to MODULE_DEVICE_TABLE. Fixes: fc3973a1fa09 ("phy: micrel: add Microchip KSZ 9477 Switch PHY support") Signed-off-by: Enguerrand de Ribaucourt Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/phy/micrel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 2b7616f161d6..b68682006c06 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1410,6 +1410,7 @@ static struct mdio_device_id __maybe_unused micrel_tbl[] = { { PHY_ID_KSZ8081, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ8873MLL, MICREL_PHY_ID_MASK }, { PHY_ID_KSZ886X, MICREL_PHY_ID_MASK }, + { PHY_ID_KSZ9477, MICREL_PHY_ID_MASK }, { PHY_ID_LAN8814, MICREL_PHY_ID_MASK }, { } }; From f4aa8268d774210e6bbbeb5e9b1570b2d2962d77 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 25 Jun 2021 15:16:12 -0700 Subject: [PATCH 1286/1565] xdp: Move the rxq_info.mem clearing to unreg_mem_model() [ Upstream commit a78cae2476812cecaa4a33d0086bbb53986906bc ] xdp_rxq_info_unreg() implicitly calls xdp_rxq_info_unreg_mem_model(). This may well be confusing to the driver authors, and lead to double free if they call xdp_rxq_info_unreg_mem_model() before xdp_rxq_info_unreg() (when mem model type == MEM_TYPE_PAGE_POOL). In fact error path of mvpp2_rxq_init() seems to currently do exactly that. The double free will result in refcount underflow in page_pool_destroy(). Make the interface a little more programmer friendly by clearing type and id so that xdp_rxq_info_unreg_mem_model() can be called multiple times. Signed-off-by: Jakub Kicinski Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20210625221612.2637086-1-kuba@kernel.org Stable-dep-of: 7e9f79428372 ("xdp: Remove WARN() from __xdp_reg_mem_model()") Signed-off-by: Sasha Levin --- net/core/xdp.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/core/xdp.c b/net/core/xdp.c index b8d7fa47d293..0f0b65981614 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -113,8 +113,13 @@ static void mem_allocator_disconnect(void *allocator) void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) { struct xdp_mem_allocator *xa; + int type = xdp_rxq->mem.type; int id = xdp_rxq->mem.id; + /* Reset mem info to defaults */ + xdp_rxq->mem.id = 0; + xdp_rxq->mem.type = 0; + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { WARN(1, "Missing register, driver bug"); return; @@ -123,7 +128,7 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) if (id == 0) return; - if (xdp_rxq->mem.type == MEM_TYPE_PAGE_POOL) { + if (type == MEM_TYPE_PAGE_POOL) { rcu_read_lock(); xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params); page_pool_destroy(xa->page_pool); @@ -144,10 +149,6 @@ void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) xdp_rxq->reg_state = REG_STATE_UNREGISTERED; xdp_rxq->dev = NULL; - - /* Reset mem info to defaults */ - xdp_rxq->mem.id = 0; - xdp_rxq->mem.type = 0; } EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); From 5a3035306a0becc121023c5ef363bdf165f4f1ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= Date: Mon, 3 Jan 2022 16:08:06 +0100 Subject: [PATCH 1287/1565] xdp: Allow registering memory model without rxq reference MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 4a48ef70b93b8c7ed5190adfca18849e76387b80 ] The functions that register an XDP memory model take a struct xdp_rxq as parameter, but the RXQ is not actually used for anything other than pulling out the struct xdp_mem_info that it embeds. So refactor the register functions and export variants that just take a pointer to the xdp_mem_info. This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a page_pool instance that is not connected to any network device. Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com Stable-dep-of: 7e9f79428372 ("xdp: Remove WARN() from __xdp_reg_mem_model()") Signed-off-by: Sasha Levin --- include/net/xdp.h | 3 ++ net/core/xdp.c | 92 +++++++++++++++++++++++++++++++---------------- 2 files changed, 65 insertions(+), 30 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 9dab2bc6f187..9e6c10b323b8 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -218,6 +218,9 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq); int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, enum xdp_mem_type type, void *allocator); void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq); +int xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, void *allocator); +void xdp_unreg_mem_model(struct xdp_mem_info *mem); /* Drivers not supporting XDP metadata can use this helper, which * rejects any room expansion for metadata as a result. diff --git a/net/core/xdp.c b/net/core/xdp.c index 0f0b65981614..6e6b89d5f77e 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -110,20 +110,15 @@ static void mem_allocator_disconnect(void *allocator) mutex_unlock(&mem_id_lock); } -void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) +void xdp_unreg_mem_model(struct xdp_mem_info *mem) { struct xdp_mem_allocator *xa; - int type = xdp_rxq->mem.type; - int id = xdp_rxq->mem.id; + int type = mem->type; + int id = mem->id; /* Reset mem info to defaults */ - xdp_rxq->mem.id = 0; - xdp_rxq->mem.type = 0; - - if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { - WARN(1, "Missing register, driver bug"); - return; - } + mem->id = 0; + mem->type = 0; if (id == 0) return; @@ -135,6 +130,17 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) rcu_read_unlock(); } } +EXPORT_SYMBOL_GPL(xdp_unreg_mem_model); + +void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) +{ + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { + WARN(1, "Missing register, driver bug"); + return; + } + + xdp_unreg_mem_model(&xdp_rxq->mem); +} EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg_mem_model); void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) @@ -260,28 +266,24 @@ static bool __is_supported_mem_type(enum xdp_mem_type type) return true; } -int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, - enum xdp_mem_type type, void *allocator) +static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, + void *allocator) { struct xdp_mem_allocator *xdp_alloc; gfp_t gfp = GFP_KERNEL; int id, errno, ret; void *ptr; - if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { - WARN(1, "Missing register, driver bug"); - return -EFAULT; - } - if (!__is_supported_mem_type(type)) - return -EOPNOTSUPP; + return ERR_PTR(-EOPNOTSUPP); - xdp_rxq->mem.type = type; + mem->type = type; if (!allocator) { if (type == MEM_TYPE_PAGE_POOL) - return -EINVAL; /* Setup time check page_pool req */ - return 0; + return ERR_PTR(-EINVAL); /* Setup time check page_pool req */ + return NULL; } /* Delay init of rhashtable to save memory if feature isn't used */ @@ -291,13 +293,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, mutex_unlock(&mem_id_lock); if (ret < 0) { WARN_ON(1); - return ret; + return ERR_PTR(ret); } } xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp); if (!xdp_alloc) - return -ENOMEM; + return ERR_PTR(-ENOMEM); mutex_lock(&mem_id_lock); id = __mem_id_cyclic_get(gfp); @@ -305,15 +307,15 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, errno = id; goto err; } - xdp_rxq->mem.id = id; - xdp_alloc->mem = xdp_rxq->mem; + mem->id = id; + xdp_alloc->mem = *mem; xdp_alloc->allocator = allocator; /* Insert allocator into ID lookup table */ ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node); if (IS_ERR(ptr)) { - ida_simple_remove(&mem_id_pool, xdp_rxq->mem.id); - xdp_rxq->mem.id = 0; + ida_simple_remove(&mem_id_pool, mem->id); + mem->id = 0; errno = PTR_ERR(ptr); goto err; } @@ -323,13 +325,43 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, mutex_unlock(&mem_id_lock); - trace_mem_connect(xdp_alloc, xdp_rxq); - return 0; + return xdp_alloc; err: mutex_unlock(&mem_id_lock); kfree(xdp_alloc); - return errno; + return ERR_PTR(errno); } + +int xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, void *allocator) +{ + struct xdp_mem_allocator *xdp_alloc; + + xdp_alloc = __xdp_reg_mem_model(mem, type, allocator); + if (IS_ERR(xdp_alloc)) + return PTR_ERR(xdp_alloc); + return 0; +} +EXPORT_SYMBOL_GPL(xdp_reg_mem_model); + +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, + enum xdp_mem_type type, void *allocator) +{ + struct xdp_mem_allocator *xdp_alloc; + + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { + WARN(1, "Missing register, driver bug"); + return -EFAULT; + } + + xdp_alloc = __xdp_reg_mem_model(&xdp_rxq->mem, type, allocator); + if (IS_ERR(xdp_alloc)) + return PTR_ERR(xdp_alloc); + + trace_mem_connect(xdp_alloc, xdp_rxq); + return 0; +} + EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); /* XDP RX runs under NAPI protection, and in different delivery error From 1095b8efbb13a6a5fa583ed373ee1ccab29da2d0 Mon Sep 17 00:00:00 2001 From: Daniil Dulov Date: Mon, 24 Jun 2024 11:07:47 +0300 Subject: [PATCH 1288/1565] xdp: Remove WARN() from __xdp_reg_mem_model() [ Upstream commit 7e9f79428372c6eab92271390851be34ab26bfb4 ] syzkaller reports a warning in __xdp_reg_mem_model(). The warning occurs only if __mem_id_init_hash_table() returns an error. It returns the error in two cases: 1. memory allocation fails; 2. rhashtable_init() fails when some fields of rhashtable_params struct are not initialized properly. The second case cannot happen since there is a static const rhashtable_params struct with valid fields. So, warning is only triggered when there is a problem with memory allocation. Thus, there is no sense in using WARN() to handle this error and it can be safely removed. WARNING: CPU: 0 PID: 5065 at net/core/xdp.c:299 __xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299 CPU: 0 PID: 5065 Comm: syz-executor883 Not tainted 6.8.0-syzkaller-05271-gf99c5f563c17 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:__xdp_reg_mem_model+0x2d9/0x650 net/core/xdp.c:299 Call Trace: xdp_reg_mem_model+0x22/0x40 net/core/xdp.c:344 xdp_test_run_setup net/bpf/test_run.c:188 [inline] bpf_test_run_xdp_live+0x365/0x1e90 net/bpf/test_run.c:377 bpf_prog_test_run_xdp+0x813/0x11b0 net/bpf/test_run.c:1267 bpf_prog_test_run+0x33a/0x3b0 kernel/bpf/syscall.c:4240 __sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5649 __do_sys_bpf kernel/bpf/syscall.c:5738 [inline] __se_sys_bpf kernel/bpf/syscall.c:5736 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Found by Linux Verification Center (linuxtesting.org) with syzkaller. Fixes: 8d5d88527587 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Daniil Dulov Signed-off-by: Daniel Borkmann Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/all/20240617162708.492159-1-d.dulov@aladdin.ru Link: https://lore.kernel.org/bpf/20240624080747.36858-1-d.dulov@aladdin.ru Signed-off-by: Sasha Levin --- net/core/xdp.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/core/xdp.c b/net/core/xdp.c index 6e6b89d5f77e..eb5045924efa 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -291,10 +291,8 @@ static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem, mutex_lock(&mem_id_lock); ret = __mem_id_init_hash_table(); mutex_unlock(&mem_id_lock); - if (ret < 0) { - WARN_ON(1); + if (ret < 0) return ERR_PTR(ret); - } } xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp); From 2a700b8de527372ca26a99660fdb68c3db01c85d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jun 2024 14:07:30 +0200 Subject: [PATCH 1289/1565] sparc: fix old compat_sys_select() [ Upstream commit bae6428a9fffb2023191b0723e276cf1377a7c9f ] sparc has two identical select syscalls at numbers 93 and 230, respectively. During the conversion to the modern syscall.tbl format, the older one of the two broke in compat mode, and now refers to the native 64-bit syscall. Restore the correct behavior. This has very little effect, as glibc has been using the newer number anyway. Fixes: 6ff645dd683a ("sparc: add system call table generation support") Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/sparc/kernel/syscalls/syscall.tbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 78160260991b..6869c7c71eba 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -117,7 +117,7 @@ 90 common dup2 sys_dup2 91 32 setfsuid32 sys_setfsuid 92 common fcntl sys_fcntl compat_sys_fcntl -93 common select sys_select +93 common select sys_select compat_sys_select 94 32 setfsgid32 sys_setfsgid 95 common fsync sys_fsync 96 common setpriority sys_setpriority From 7620738513f742da68a0f048dad0a8aff1c5040b Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jun 2024 12:49:39 +0200 Subject: [PATCH 1290/1565] sparc: fix compat recv/recvfrom syscalls [ Upstream commit d6fbd26fb872ec518d25433a12e8ce8163e20909 ] sparc has the wrong compat version of recv() and recvfrom() for both the direct syscalls and socketcall(). The direct syscalls just need to use the compat version. For socketcall, the same thing could be done, but it seems better to completely remove the custom assembler code for it and just use the same implementation that everyone else has. Fixes: 1dacc76d0014 ("net/compat/wext: send different messages to compat tasks") Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/sparc/kernel/sys32.S | 221 ------------------------- arch/sparc/kernel/syscalls/syscall.tbl | 4 +- 2 files changed, 2 insertions(+), 223 deletions(-) diff --git a/arch/sparc/kernel/sys32.S b/arch/sparc/kernel/sys32.S index a45f0f31fe51..a3d308f2043e 100644 --- a/arch/sparc/kernel/sys32.S +++ b/arch/sparc/kernel/sys32.S @@ -18,224 +18,3 @@ sys32_mmap2: sethi %hi(sys_mmap), %g1 jmpl %g1 + %lo(sys_mmap), %g0 sllx %o5, 12, %o5 - - .align 32 - .globl sys32_socketcall -sys32_socketcall: /* %o0=call, %o1=args */ - cmp %o0, 1 - bl,pn %xcc, do_einval - cmp %o0, 18 - bg,pn %xcc, do_einval - sub %o0, 1, %o0 - sllx %o0, 5, %o0 - sethi %hi(__socketcall_table_begin), %g2 - or %g2, %lo(__socketcall_table_begin), %g2 - jmpl %g2 + %o0, %g0 - nop -do_einval: - retl - mov -EINVAL, %o0 - - .align 32 -__socketcall_table_begin: - - /* Each entry is exactly 32 bytes. */ -do_sys_socket: /* sys_socket(int, int, int) */ -1: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_socket), %g1 -2: ldswa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_socket), %g0 -3: ldswa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_bind: /* sys_bind(int fd, struct sockaddr *, int) */ -4: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_bind), %g1 -5: ldswa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_bind), %g0 -6: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_connect: /* sys_connect(int, struct sockaddr *, int) */ -7: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_connect), %g1 -8: ldswa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_connect), %g0 -9: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_listen: /* sys_listen(int, int) */ -10: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_listen), %g1 - jmpl %g1 + %lo(sys_listen), %g0 -11: ldswa [%o1 + 0x4] %asi, %o1 - nop - nop - nop - nop -do_sys_accept: /* sys_accept(int, struct sockaddr *, int *) */ -12: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_accept), %g1 -13: lduwa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_accept), %g0 -14: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_getsockname: /* sys_getsockname(int, struct sockaddr *, int *) */ -15: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_getsockname), %g1 -16: lduwa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_getsockname), %g0 -17: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_getpeername: /* sys_getpeername(int, struct sockaddr *, int *) */ -18: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_getpeername), %g1 -19: lduwa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(sys_getpeername), %g0 -20: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_socketpair: /* sys_socketpair(int, int, int, int *) */ -21: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_socketpair), %g1 -22: ldswa [%o1 + 0x8] %asi, %o2 -23: lduwa [%o1 + 0xc] %asi, %o3 - jmpl %g1 + %lo(sys_socketpair), %g0 -24: ldswa [%o1 + 0x4] %asi, %o1 - nop - nop -do_sys_send: /* sys_send(int, void *, size_t, unsigned int) */ -25: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_send), %g1 -26: lduwa [%o1 + 0x8] %asi, %o2 -27: lduwa [%o1 + 0xc] %asi, %o3 - jmpl %g1 + %lo(sys_send), %g0 -28: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop -do_sys_recv: /* sys_recv(int, void *, size_t, unsigned int) */ -29: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_recv), %g1 -30: lduwa [%o1 + 0x8] %asi, %o2 -31: lduwa [%o1 + 0xc] %asi, %o3 - jmpl %g1 + %lo(sys_recv), %g0 -32: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop -do_sys_sendto: /* sys_sendto(int, u32, compat_size_t, unsigned int, u32, int) */ -33: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_sendto), %g1 -34: lduwa [%o1 + 0x8] %asi, %o2 -35: lduwa [%o1 + 0xc] %asi, %o3 -36: lduwa [%o1 + 0x10] %asi, %o4 -37: ldswa [%o1 + 0x14] %asi, %o5 - jmpl %g1 + %lo(sys_sendto), %g0 -38: lduwa [%o1 + 0x4] %asi, %o1 -do_sys_recvfrom: /* sys_recvfrom(int, u32, compat_size_t, unsigned int, u32, u32) */ -39: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_recvfrom), %g1 -40: lduwa [%o1 + 0x8] %asi, %o2 -41: lduwa [%o1 + 0xc] %asi, %o3 -42: lduwa [%o1 + 0x10] %asi, %o4 -43: lduwa [%o1 + 0x14] %asi, %o5 - jmpl %g1 + %lo(sys_recvfrom), %g0 -44: lduwa [%o1 + 0x4] %asi, %o1 -do_sys_shutdown: /* sys_shutdown(int, int) */ -45: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_shutdown), %g1 - jmpl %g1 + %lo(sys_shutdown), %g0 -46: ldswa [%o1 + 0x4] %asi, %o1 - nop - nop - nop - nop -do_sys_setsockopt: /* sys_setsockopt(int, int, int, char *, int) */ -47: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_setsockopt), %g1 -48: ldswa [%o1 + 0x8] %asi, %o2 -49: lduwa [%o1 + 0xc] %asi, %o3 -50: ldswa [%o1 + 0x10] %asi, %o4 - jmpl %g1 + %lo(sys_setsockopt), %g0 -51: ldswa [%o1 + 0x4] %asi, %o1 - nop -do_sys_getsockopt: /* sys_getsockopt(int, int, int, u32, u32) */ -52: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_getsockopt), %g1 -53: ldswa [%o1 + 0x8] %asi, %o2 -54: lduwa [%o1 + 0xc] %asi, %o3 -55: lduwa [%o1 + 0x10] %asi, %o4 - jmpl %g1 + %lo(sys_getsockopt), %g0 -56: ldswa [%o1 + 0x4] %asi, %o1 - nop -do_sys_sendmsg: /* compat_sys_sendmsg(int, struct compat_msghdr *, unsigned int) */ -57: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(compat_sys_sendmsg), %g1 -58: lduwa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(compat_sys_sendmsg), %g0 -59: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_recvmsg: /* compat_sys_recvmsg(int, struct compat_msghdr *, unsigned int) */ -60: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(compat_sys_recvmsg), %g1 -61: lduwa [%o1 + 0x8] %asi, %o2 - jmpl %g1 + %lo(compat_sys_recvmsg), %g0 -62: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - nop -do_sys_accept4: /* sys_accept4(int, struct sockaddr *, int *, int) */ -63: ldswa [%o1 + 0x0] %asi, %o0 - sethi %hi(sys_accept4), %g1 -64: lduwa [%o1 + 0x8] %asi, %o2 -65: ldswa [%o1 + 0xc] %asi, %o3 - jmpl %g1 + %lo(sys_accept4), %g0 -66: lduwa [%o1 + 0x4] %asi, %o1 - nop - nop - - .section __ex_table,"a" - .align 4 - .word 1b, __retl_efault, 2b, __retl_efault - .word 3b, __retl_efault, 4b, __retl_efault - .word 5b, __retl_efault, 6b, __retl_efault - .word 7b, __retl_efault, 8b, __retl_efault - .word 9b, __retl_efault, 10b, __retl_efault - .word 11b, __retl_efault, 12b, __retl_efault - .word 13b, __retl_efault, 14b, __retl_efault - .word 15b, __retl_efault, 16b, __retl_efault - .word 17b, __retl_efault, 18b, __retl_efault - .word 19b, __retl_efault, 20b, __retl_efault - .word 21b, __retl_efault, 22b, __retl_efault - .word 23b, __retl_efault, 24b, __retl_efault - .word 25b, __retl_efault, 26b, __retl_efault - .word 27b, __retl_efault, 28b, __retl_efault - .word 29b, __retl_efault, 30b, __retl_efault - .word 31b, __retl_efault, 32b, __retl_efault - .word 33b, __retl_efault, 34b, __retl_efault - .word 35b, __retl_efault, 36b, __retl_efault - .word 37b, __retl_efault, 38b, __retl_efault - .word 39b, __retl_efault, 40b, __retl_efault - .word 41b, __retl_efault, 42b, __retl_efault - .word 43b, __retl_efault, 44b, __retl_efault - .word 45b, __retl_efault, 46b, __retl_efault - .word 47b, __retl_efault, 48b, __retl_efault - .word 49b, __retl_efault, 50b, __retl_efault - .word 51b, __retl_efault, 52b, __retl_efault - .word 53b, __retl_efault, 54b, __retl_efault - .word 55b, __retl_efault, 56b, __retl_efault - .word 57b, __retl_efault, 58b, __retl_efault - .word 59b, __retl_efault, 60b, __retl_efault - .word 61b, __retl_efault, 62b, __retl_efault - .word 63b, __retl_efault, 64b, __retl_efault - .word 65b, __retl_efault, 66b, __retl_efault - .previous diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 6869c7c71eba..5c76ebe07e9c 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -155,7 +155,7 @@ 123 32 fchown sys_fchown16 123 64 fchown sys_fchown 124 common fchmod sys_fchmod -125 common recvfrom sys_recvfrom +125 common recvfrom sys_recvfrom compat_sys_recvfrom 126 32 setreuid sys_setreuid16 126 64 setreuid sys_setreuid 127 32 setregid sys_setregid16 @@ -247,7 +247,7 @@ 204 32 readdir sys_old_readdir compat_sys_old_readdir 204 64 readdir sys_nis_syscall 205 common readahead sys_readahead compat_sys_readahead -206 common socketcall sys_socketcall sys32_socketcall +206 common socketcall sys_socketcall compat_sys_socketcall 207 common syslog sys_syslog 208 common lookup_dcookie sys_lookup_dcookie compat_sys_lookup_dcookie 209 common fadvise64 sys_fadvise64 compat_sys_fadvise64 From 4e6367fe3210807912b361ac1e6fa26ac7fddd3a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jun 2024 14:27:55 +0200 Subject: [PATCH 1291/1565] parisc: use correct compat recv/recvfrom syscalls [ Upstream commit 20a50787349fadf66ac5c48f62e58d753878d2bb ] Johannes missed parisc back when he introduced the compat version of these syscalls, so receiving cmsg messages that require a compat conversion is still broken. Use the correct calls like the other architectures do. Fixes: 1dacc76d0014 ("net/compat/wext: send different messages to compat tasks") Acked-by: Helge Deller Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/parisc/kernel/syscalls/syscall.tbl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index dfe9254ee74b..e9c70f9a2f50 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -108,7 +108,7 @@ 95 common fchown sys_fchown 96 common getpriority sys_getpriority 97 common setpriority sys_setpriority -98 common recv sys_recv +98 common recv sys_recv compat_sys_recv 99 common statfs sys_statfs compat_sys_statfs 100 common fstatfs sys_fstatfs compat_sys_fstatfs 101 common stat64 sys_stat64 @@ -135,7 +135,7 @@ 120 common clone sys_clone_wrapper 121 common setdomainname sys_setdomainname 122 common sendfile sys_sendfile compat_sys_sendfile -123 common recvfrom sys_recvfrom +123 common recvfrom sys_recvfrom compat_sys_recvfrom 124 32 adjtimex sys_adjtimex_time32 124 64 adjtimex sys_adjtimex 125 common mprotect sys_mprotect From 5d43d789b57943720dca4181a05f6477362b94cf Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 26 Jun 2024 23:15:38 +0200 Subject: [PATCH 1292/1565] netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers [ Upstream commit 7931d32955e09d0a11b1fe0b6aac1bfa061c005c ] register store validation for NFT_DATA_VALUE is conditional, however, the datatype is always either NFT_DATA_VALUE or NFT_DATA_VERDICT. This only requires a new helper function to infer the register type from the set datatype so this conditional check can be removed. Otherwise, pointer to chain object can be leaked through the registers. Fixes: 96518518cc41 ("netfilter: add nftables") Reported-by: Linus Torvalds Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin --- include/net/netfilter/nf_tables.h | 5 +++++ net/netfilter/nf_tables_api.c | 8 ++++---- net/netfilter/nft_lookup.c | 3 ++- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index ab8d84775ca8..2b99ee1303d9 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -490,6 +490,11 @@ static inline void *nft_set_priv(const struct nft_set *set) return (void *)set->data; } +static inline enum nft_data_types nft_set_datatype(const struct nft_set *set) +{ + return set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE; +} + static inline bool nft_set_gc_is_pending(const struct nft_set *s) { return refcount_read(&s->refs) != 1; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 754278b85706..f4bbddfbbc24 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4990,8 +4990,7 @@ static int nf_tables_fill_setelem(struct sk_buff *skb, if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) && nft_data_dump(skb, NFTA_SET_ELEM_DATA, nft_set_ext_data(ext), - set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE, - set->dlen) < 0) + nft_set_datatype(set), set->dlen) < 0) goto nla_put_failure; if (nft_set_ext_exists(ext, NFT_SET_EXT_EXPR) && @@ -9337,6 +9336,9 @@ static int nft_validate_register_store(const struct nft_ctx *ctx, return 0; default: + if (type != NFT_DATA_VALUE) + return -EINVAL; + if (reg < NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE) return -EINVAL; if (len == 0) @@ -9345,8 +9347,6 @@ static int nft_validate_register_store(const struct nft_ctx *ctx, sizeof_field(struct nft_regs, data)) return -ERANGE; - if (data != NULL && type != NFT_DATA_VALUE) - return -EINVAL; return 0; } } diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c index 8bc008ff00cb..d2f8131edaf1 100644 --- a/net/netfilter/nft_lookup.c +++ b/net/netfilter/nft_lookup.c @@ -101,7 +101,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx, return -EINVAL; err = nft_parse_register_store(ctx, tb[NFTA_LOOKUP_DREG], - &priv->dreg, NULL, set->dtype, + &priv->dreg, NULL, + nft_set_datatype(set), set->dlen); if (err < 0) return err; From cae52f61fda0f5d2949dc177f984c9e187d4c6a0 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Sun, 17 Mar 2024 17:48:39 +0200 Subject: [PATCH 1293/1565] drm/panel: ilitek-ili9881c: Fix warning with GPIO controllers that sleep [ Upstream commit ee7860cd8b5763017f8dc785c2851fecb7a0c565 ] The ilitek-ili9881c controls the reset GPIO using the non-sleeping gpiod_set_value() function. This complains loudly when the GPIO controller needs to sleep. As the caller can sleep, use gpiod_set_value_cansleep() to fix the issue. Signed-off-by: Laurent Pinchart Reviewed-by: Neil Armstrong Link: https://lore.kernel.org/r/20240317154839.21260-1-laurent.pinchart@ideasonboard.com Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20240317154839.21260-1-laurent.pinchart@ideasonboard.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/panel/panel-ilitek-ili9881c.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/panel/panel-ilitek-ili9881c.c b/drivers/gpu/drm/panel/panel-ilitek-ili9881c.c index 534dd7414d42..917cb322bab1 100644 --- a/drivers/gpu/drm/panel/panel-ilitek-ili9881c.c +++ b/drivers/gpu/drm/panel/panel-ilitek-ili9881c.c @@ -506,10 +506,10 @@ static int ili9881c_prepare(struct drm_panel *panel) msleep(5); /* And reset it */ - gpiod_set_value(ctx->reset, 1); + gpiod_set_value_cansleep(ctx->reset, 1); msleep(20); - gpiod_set_value(ctx->reset, 0); + gpiod_set_value_cansleep(ctx->reset, 0); msleep(20); for (i = 0; i < ctx->desc->init_length; i++) { @@ -564,7 +564,7 @@ static int ili9881c_unprepare(struct drm_panel *panel) mipi_dsi_dcs_enter_sleep_mode(ctx->dsi); regulator_disable(ctx->power); - gpiod_set_value(ctx->reset, 1); + gpiod_set_value_cansleep(ctx->reset, 1); return 0; } From 37f646c6040f3c9f8c28101bd6559fb07cb12be0 Mon Sep 17 00:00:00 2001 From: Denis Arefev Date: Fri, 15 Mar 2024 12:37:58 +0300 Subject: [PATCH 1294/1565] mtd: partitions: redboot: Added conversion of operands to a larger type [ Upstream commit 1162bc2f8f5de7da23d18aa4b7fbd4e93c369c50 ] The value of an arithmetic expression directory * master->erasesize is subject to overflow due to a failure to cast operands to a larger data type before perfroming arithmetic Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Denis Arefev Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20240315093758.20790-1-arefev@swemel.ru Signed-off-by: Sasha Levin --- drivers/mtd/parsers/redboot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mtd/parsers/redboot.c b/drivers/mtd/parsers/redboot.c index 4f3bcc59a638..3351be651473 100644 --- a/drivers/mtd/parsers/redboot.c +++ b/drivers/mtd/parsers/redboot.c @@ -102,7 +102,7 @@ static int parse_redboot_partitions(struct mtd_info *master, offset -= master->erasesize; } } else { - offset = directory * master->erasesize; + offset = (unsigned long) directory * master->erasesize; while (mtd_block_isbad(master, offset)) { offset += master->erasesize; if (offset == master->size) From 3d6432f20f0040837885ddab2e0a7fdb7f790035 Mon Sep 17 00:00:00 2001 From: Anton Protopopov Date: Tue, 26 Mar 2024 10:17:42 +0000 Subject: [PATCH 1295/1565] bpf: Add a check for struct bpf_fib_lookup size [ Upstream commit 59b418c7063d30e0a3e1f592d47df096db83185c ] The struct bpf_fib_lookup should not grow outside of its 64 bytes. Add a static assert to validate this. Suggested-by: David Ahern Signed-off-by: Anton Protopopov Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20240326101742.17421-4-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov Signed-off-by: Sasha Levin --- net/core/filter.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index 49e4d1535cc8..a3101cdfd47b 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -78,6 +78,9 @@ #include #include +/* Keep the struct bpf_fib_lookup small so that it fits into a cacheline */ +static_assert(sizeof(struct bpf_fib_lookup) == 64, "struct bpf_fib_lookup size check"); + static const struct bpf_func_proto * bpf_sk_base_func_proto(enum bpf_func_id func_id); From 9dadab0db7d904413ea1cdaa13f127da05c31e71 Mon Sep 17 00:00:00 2001 From: Dawei Li Date: Sun, 31 Mar 2024 13:34:40 +0800 Subject: [PATCH 1296/1565] net/iucv: Avoid explicit cpumask var allocation on stack [ Upstream commit be4e1304419c99a164b4c0e101c7c2a756b635b9 ] For CONFIG_CPUMASK_OFFSTACK=y kernel, explicit allocation of cpumask variable on stack is not recommended since it can cause potential stack overflow. Instead, kernel code should always use *cpumask_var API(s) to allocate cpumask var in config-neutral way, leaving allocation strategy to CONFIG_CPUMASK_OFFSTACK. Use *cpumask_var API(s) to address it. Signed-off-by: Dawei Li Reviewed-by: Alexandra Winter Link: https://lore.kernel.org/r/20240331053441.1276826-2-dawei.li@shingroup.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/iucv/iucv.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c index ed0dbdbba4d9..06770b77e5d2 100644 --- a/net/iucv/iucv.c +++ b/net/iucv/iucv.c @@ -517,7 +517,7 @@ static void iucv_setmask_mp(void) */ static void iucv_setmask_up(void) { - cpumask_t cpumask; + static cpumask_t cpumask; int cpu; /* Disable all cpu but the first in cpu_irq_cpumask. */ @@ -625,23 +625,33 @@ static int iucv_cpu_online(unsigned int cpu) static int iucv_cpu_down_prep(unsigned int cpu) { - cpumask_t cpumask; + cpumask_var_t cpumask; + int ret = 0; if (!iucv_path_table) return 0; - cpumask_copy(&cpumask, &iucv_buffer_cpumask); - cpumask_clear_cpu(cpu, &cpumask); - if (cpumask_empty(&cpumask)) + if (!alloc_cpumask_var(&cpumask, GFP_KERNEL)) + return -ENOMEM; + + cpumask_copy(cpumask, &iucv_buffer_cpumask); + cpumask_clear_cpu(cpu, cpumask); + if (cpumask_empty(cpumask)) { /* Can't offline last IUCV enabled cpu. */ - return -EINVAL; + ret = -EINVAL; + goto __free_cpumask; + } iucv_retrieve_cpu(NULL); if (!cpumask_empty(&iucv_irq_cpumask)) - return 0; + goto __free_cpumask; + smp_call_function_single(cpumask_first(&iucv_buffer_cpumask), iucv_allow_cpu, NULL, 1); - return 0; + +__free_cpumask: + free_cpumask_var(cpumask); + return ret; } /** From 763896ab62a672d728f5eb10ac90d98c607a8509 Mon Sep 17 00:00:00 2001 From: Dawei Li Date: Sun, 31 Mar 2024 13:34:41 +0800 Subject: [PATCH 1297/1565] net/dpaa2: Avoid explicit cpumask var allocation on stack [ Upstream commit d33fe1714a44ff540629b149d8fab4ac6967585c ] For CONFIG_CPUMASK_OFFSTACK=y kernel, explicit allocation of cpumask variable on stack is not recommended since it can cause potential stack overflow. Instead, kernel code should always use *cpumask_var API(s) to allocate cpumask var in config-neutral way, leaving allocation strategy to CONFIG_CPUMASK_OFFSTACK. Use *cpumask_var API(s) to address it. Signed-off-by: Dawei Li Link: https://lore.kernel.org/r/20240331053441.1276826-3-dawei.li@shingroup.cn Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c index 07ba0438f965..fa202fea537f 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c @@ -2383,11 +2383,14 @@ static int dpaa2_eth_xdp_xmit(struct net_device *net_dev, int n, static int update_xps(struct dpaa2_eth_priv *priv) { struct net_device *net_dev = priv->net_dev; - struct cpumask xps_mask; - struct dpaa2_eth_fq *fq; int i, num_queues, netdev_queues; + struct dpaa2_eth_fq *fq; + cpumask_var_t xps_mask; int err = 0; + if (!alloc_cpumask_var(&xps_mask, GFP_KERNEL)) + return -ENOMEM; + num_queues = dpaa2_eth_queue_count(priv); netdev_queues = (net_dev->num_tc ? : 1) * num_queues; @@ -2397,16 +2400,17 @@ static int update_xps(struct dpaa2_eth_priv *priv) for (i = 0; i < netdev_queues; i++) { fq = &priv->fq[i % num_queues]; - cpumask_clear(&xps_mask); - cpumask_set_cpu(fq->target_cpu, &xps_mask); + cpumask_clear(xps_mask); + cpumask_set_cpu(fq->target_cpu, xps_mask); - err = netif_set_xps_queue(net_dev, &xps_mask, i); + err = netif_set_xps_queue(net_dev, xps_mask, i); if (err) { netdev_warn_once(net_dev, "Error setting XPS queue\n"); break; } } + free_cpumask_var(xps_mask); return err; } From d23982ea9aa438f35a8c8a6305943e98a8db90f6 Mon Sep 17 00:00:00 2001 From: Oswald Buddenhagen Date: Sat, 6 Apr 2024 08:48:20 +0200 Subject: [PATCH 1298/1565] ALSA: emux: improve patch ioctl data validation [ Upstream commit 89b32ccb12ae67e630c6453d778ec30a592a212f ] In load_data(), make the validation of and skipping over the main info block match that in load_guspatch(). In load_guspatch(), add checking that the specified patch length matches the actually supplied data, like load_data() already did. Signed-off-by: Oswald Buddenhagen Message-ID: <20240406064830.1029573-8-oswald.buddenhagen@gmx.de> Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/synth/emux/soundfont.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/sound/synth/emux/soundfont.c b/sound/synth/emux/soundfont.c index 9ebc711afa6b..18b2d8492656 100644 --- a/sound/synth/emux/soundfont.c +++ b/sound/synth/emux/soundfont.c @@ -697,7 +697,6 @@ load_data(struct snd_sf_list *sflist, const void __user *data, long count) struct snd_soundfont *sf; struct soundfont_sample_info sample_info; struct snd_sf_sample *sp; - long off; /* patch must be opened */ if ((sf = sflist->currsf) == NULL) @@ -706,12 +705,16 @@ load_data(struct snd_sf_list *sflist, const void __user *data, long count) if (is_special_type(sf->type)) return -EINVAL; + if (count < (long)sizeof(sample_info)) { + return -EINVAL; + } if (copy_from_user(&sample_info, data, sizeof(sample_info))) return -EFAULT; + data += sizeof(sample_info); + count -= sizeof(sample_info); - off = sizeof(sample_info); - - if (sample_info.size != (count-off)/2) + // SoundFont uses S16LE samples. + if (sample_info.size * 2 != count) return -EINVAL; /* Check for dup */ @@ -738,7 +741,7 @@ load_data(struct snd_sf_list *sflist, const void __user *data, long count) int rc; rc = sflist->callback.sample_new (sflist->callback.private_data, sp, sflist->memhdr, - data + off, count - off); + data, count); if (rc < 0) { sf_sample_delete(sflist, sf, sp); return rc; @@ -951,10 +954,12 @@ load_guspatch(struct snd_sf_list *sflist, const char __user *data, } if (copy_from_user(&patch, data, sizeof(patch))) return -EFAULT; - count -= sizeof(patch); data += sizeof(patch); + if ((patch.len << (patch.mode & WAVE_16_BITS ? 1 : 0)) != count) + return -EINVAL; + sf = newsf(sflist, SNDRV_SFNT_PAT_TYPE_GUS|SNDRV_SFNT_PAT_SHARED, NULL); if (sf == NULL) return -ENOMEM; From e47d3babaa472d8d8b0a41c69b45288b994ff4dc Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Mon, 25 Mar 2024 14:50:25 +0000 Subject: [PATCH 1299/1565] media: dvbdev: Initialize sbuf MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 17d1316de0d7dc1bdc5d6e3ad4efd30a9bf1a381 ] Because the size passed to copy_from_user() cannot be known beforehand, it needs to be checked during runtime with check_object_size. That makes gcc believe that the content of sbuf can be used before init. Fix: ./include/linux/thread_info.h:215:17: warning: ‘sbuf’ may be used uninitialized [-Wmaybe-uninitialized] Signed-off-by: Ricardo Ribalda Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/dvb-core/dvbdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c index 23a0c209744d..661588fc64f6 100644 --- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -974,7 +974,7 @@ int dvb_usercopy(struct file *file, int (*func)(struct file *file, unsigned int cmd, void *arg)) { - char sbuf[128]; + char sbuf[128] = {}; void *mbuf = NULL; void *parg = NULL; int err = -EINVAL; From a45c45767bfe8d7c1ff5a1de931cfe0a85d523f7 Mon Sep 17 00:00:00 2001 From: Andrew Davis Date: Mon, 25 Mar 2024 11:55:07 -0500 Subject: [PATCH 1300/1565] soc: ti: wkup_m3_ipc: Send NULL dummy message instead of pointer message [ Upstream commit ddbf3204f600a4d1f153498f618369fca352ae00 ] mbox_send_message() sends a u32 bit message, not a pointer to a message. We only convert to a pointer type as a generic type. If we want to send a dummy message of 0, then simply send 0 (NULL). Signed-off-by: Andrew Davis Link: https://lore.kernel.org/r/20240325165507.30323-1-afd@ti.com Signed-off-by: Nishanth Menon Signed-off-by: Sasha Levin --- drivers/soc/ti/wkup_m3_ipc.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/soc/ti/wkup_m3_ipc.c b/drivers/soc/ti/wkup_m3_ipc.c index ef3f95fefab5..6634709e646c 100644 --- a/drivers/soc/ti/wkup_m3_ipc.c +++ b/drivers/soc/ti/wkup_m3_ipc.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include #include @@ -151,7 +150,6 @@ static irqreturn_t wkup_m3_txev_handler(int irq, void *ipc_data) static int wkup_m3_ping(struct wkup_m3_ipc *m3_ipc) { struct device *dev = m3_ipc->dev; - mbox_msg_t dummy_msg = 0; int ret; if (!m3_ipc->mbox) { @@ -167,7 +165,7 @@ static int wkup_m3_ping(struct wkup_m3_ipc *m3_ipc) * the RX callback to avoid multiple interrupts being received * by the CM3. */ - ret = mbox_send_message(m3_ipc->mbox, &dummy_msg); + ret = mbox_send_message(m3_ipc->mbox, NULL); if (ret < 0) { dev_err(dev, "%s: mbox_send_message() failed: %d\n", __func__, ret); @@ -189,7 +187,6 @@ static int wkup_m3_ping(struct wkup_m3_ipc *m3_ipc) static int wkup_m3_ping_noirq(struct wkup_m3_ipc *m3_ipc) { struct device *dev = m3_ipc->dev; - mbox_msg_t dummy_msg = 0; int ret; if (!m3_ipc->mbox) { @@ -198,7 +195,7 @@ static int wkup_m3_ping_noirq(struct wkup_m3_ipc *m3_ipc) return -EIO; } - ret = mbox_send_message(m3_ipc->mbox, &dummy_msg); + ret = mbox_send_message(m3_ipc->mbox, NULL); if (ret < 0) { dev_err(dev, "%s: mbox_send_message() failed: %d\n", __func__, ret); From 1a7a494184cf64fac81b8a72da53002652327926 Mon Sep 17 00:00:00 2001 From: Erick Archer Date: Sat, 30 Mar 2024 17:34:47 +0100 Subject: [PATCH 1301/1565] drm/radeon/radeon_display: Decrease the size of allocated memory MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ae6a233092747e9652eb793d92f79d0820e01c6a ] This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1] [2]. In this case, the memory allocated to store RADEONFB_CONN_LIMIT pointers to "drm_connector" structures can be avoided. This is because this memory area is never accessed. Also, in the kzalloc function, it is preferred to use sizeof(*pointer) instead of sizeof(type) due to the type of the variable can change and one needs not change the former (unlike the latter). At the same time take advantage to remove the "#if 0" block, the code where the removed memory area was accessed, and the RADEONFB_CONN_LIMIT constant due to now is never used. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Acked-by: Christian König Signed-off-by: Erick Archer Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/radeon/radeon.h | 1 - drivers/gpu/drm/radeon/radeon_display.c | 8 +------- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index a813c00f109b..26fd45ea14cd 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -132,7 +132,6 @@ extern int radeon_cik_support; /* RADEON_IB_POOL_SIZE must be a power of 2 */ #define RADEON_IB_POOL_SIZE 16 #define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT 4 #define RADEON_BIOS_NUM_SCRATCH 8 /* internal ring indices */ diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 07d23a1e62a0..8643eebe9e24 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -685,7 +685,7 @@ static void radeon_crtc_init(struct drm_device *dev, int index) struct radeon_device *rdev = dev->dev_private; struct radeon_crtc *radeon_crtc; - radeon_crtc = kzalloc(sizeof(struct radeon_crtc) + (RADEONFB_CONN_LIMIT * sizeof(struct drm_connector *)), GFP_KERNEL); + radeon_crtc = kzalloc(sizeof(*radeon_crtc), GFP_KERNEL); if (radeon_crtc == NULL) return; @@ -711,12 +711,6 @@ static void radeon_crtc_init(struct drm_device *dev, int index) dev->mode_config.cursor_width = radeon_crtc->max_cursor_width; dev->mode_config.cursor_height = radeon_crtc->max_cursor_height; -#if 0 - radeon_crtc->mode_set.crtc = &radeon_crtc->base; - radeon_crtc->mode_set.connectors = (struct drm_connector **)(radeon_crtc + 1); - radeon_crtc->mode_set.num_connectors = 0; -#endif - if (rdev->is_atom_bios && (ASIC_IS_AVIVO(rdev) || radeon_r4xx_atom)) radeon_atombios_init_crtc(dev, radeon_crtc); else From cb4e7a8f3965e200447ac2c59211f07464b64be1 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 17 Jun 2024 09:27:27 +0200 Subject: [PATCH 1302/1565] nvme: fixup comment for nvme RDMA Provider Type [ Upstream commit f80a55fa90fa76d01e3fffaa5d0413e522ab9a00 ] PRTYPE is the provider type, not the QP service type. Fixes: eb793e2c9286 ("nvme.h: add NVMe over Fabrics definitions") Signed-off-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- include/linux/nvme.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/nvme.h b/include/linux/nvme.h index f454dd100334..ddf9ae37a2cc 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -71,8 +71,8 @@ enum { NVMF_RDMA_QPTYPE_DATAGRAM = 2, /* Reliable Datagram */ }; -/* RDMA QP Service Type codes for Discovery Log Page entry TSAS - * RDMA_QPTYPE field +/* RDMA Provider Type codes for Discovery Log Page entry TSAS + * RDMA_PRTYPE field */ enum { NVMF_RDMA_PRTYPE_NOT_SPECIFIED = 1, /* No Provider Specified */ From 98eae65cb5e3bf4aaf658b428f2608739ee1638e Mon Sep 17 00:00:00 2001 From: Liu Ying Date: Mon, 24 Jun 2024 09:56:12 +0800 Subject: [PATCH 1303/1565] drm/panel: simple: Add missing display timing flags for KOE TX26D202VM0BWA [ Upstream commit 37ce99b77762256ec9fda58d58fd613230151456 ] KOE TX26D202VM0BWA panel spec indicates the DE signal is active high in timing chart, so add DISPLAY_FLAGS_DE_HIGH flag in display timing flags. This aligns display_timing with panel_desc. Fixes: 8a07052440c2 ("drm/panel: simple: Add support for KOE TX26D202VM0BWA panel") Signed-off-by: Liu Ying Reviewed-by: Neil Armstrong Link: https://lore.kernel.org/r/20240624015612.341983-1-victor.liu@nxp.com Signed-off-by: Neil Armstrong Link: https://patchwork.freedesktop.org/patch/msgid/20240624015612.341983-1-victor.liu@nxp.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/panel/panel-simple.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c index 7797aad592a1..07b59021008e 100644 --- a/drivers/gpu/drm/panel/panel-simple.c +++ b/drivers/gpu/drm/panel/panel-simple.c @@ -2438,6 +2438,7 @@ static const struct display_timing koe_tx26d202vm0bwa_timing = { .vfront_porch = { 3, 5, 10 }, .vback_porch = { 2, 5, 10 }, .vsync_len = { 5, 5, 5 }, + .flags = DISPLAY_FLAGS_DE_HIGH, }; static const struct panel_desc koe_tx26d202vm0bwa = { From e44a83bf15c4db053ac6dfe96a23af184c9136d9 Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Tue, 18 Jun 2024 17:43:44 +0300 Subject: [PATCH 1304/1565] gpio: davinci: Validate the obtained number of IRQs [ Upstream commit 7aa9b96e9a73e4ec1771492d0527bd5fc5ef9164 ] Value of pdata->gpio_unbanked is taken from Device Tree. In case of broken DT due to any error this value can be any. Without this value validation there can be out of chips->irqs array boundaries access in davinci_gpio_probe(). Validate the obtained nirq value so that it won't exceed the maximum number of IRQs per bank. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: eb3744a2dd01 ("gpio: davinci: Do not assume continuous IRQ numbering") Signed-off-by: Aleksandr Mishin Link: https://lore.kernel.org/r/20240618144344.16943-1-amishin@t-argos.ru Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin --- drivers/gpio/gpio-davinci.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/gpio/gpio-davinci.c b/drivers/gpio/gpio-davinci.c index 80597e90de9c..33623bcfc886 100644 --- a/drivers/gpio/gpio-davinci.c +++ b/drivers/gpio/gpio-davinci.c @@ -227,6 +227,11 @@ static int davinci_gpio_probe(struct platform_device *pdev) else nirq = DIV_ROUND_UP(ngpio, 16); + if (nirq > MAX_INT_PER_BANK) { + dev_err(dev, "Too many IRQs!\n"); + return -EINVAL; + } + chips = devm_kzalloc(dev, sizeof(*chips), GFP_KERNEL); if (!chips) return -ENOMEM; From 38ce307939467def07b2004c6fccc5a4addb36d4 Mon Sep 17 00:00:00 2001 From: Kent Gibson Date: Wed, 26 Jun 2024 13:29:22 +0800 Subject: [PATCH 1305/1565] gpiolib: cdev: Disallow reconfiguration without direction (uAPI v1) [ Upstream commit 9919cce62f68e6ab68dc2a975b5dc670f8ca7d40 ] linehandle_set_config() behaves badly when direction is not set. The configuration validation is borrowed from linehandle_create(), where, to verify the intent of the user, the direction must be set to in order to effect a change to the electrical configuration of a line. But, when applied to reconfiguration, that validation does not allow for the unset direction case, making it possible to clear flags set previously without specifying the line direction. Adding to the inconsistency, those changes are not immediately applied by linehandle_set_config(), but will take effect when the line value is next get or set. For example, by requesting a configuration with no flags set, an output line with GPIOHANDLE_REQUEST_ACTIVE_LOW and GPIOHANDLE_REQUEST_OPEN_DRAIN requested could have those flags cleared, inverting the sense of the line and changing the line drive to push-pull on the next line value set. Ensure the intent of the user by disallowing configurations which do not have direction set, returning an error to userspace to indicate that the configuration is invalid. And, for clarity, use lflags, a local copy of gcnf.flags, throughout when dealing with the requested flags, rather than a mixture of both. Fixes: e588bb1eae31 ("gpio: add new SET_CONFIG ioctl() to gpio chardev") Signed-off-by: Kent Gibson Link: https://lore.kernel.org/r/20240626052925.174272-2-warthog618@gmail.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Sasha Levin --- drivers/gpio/gpiolib-cdev.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c index 40d0196d8bdc..95861916deff 100644 --- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -83,6 +83,10 @@ struct linehandle_state { GPIOHANDLE_REQUEST_OPEN_DRAIN | \ GPIOHANDLE_REQUEST_OPEN_SOURCE) +#define GPIOHANDLE_REQUEST_DIRECTION_FLAGS \ + (GPIOHANDLE_REQUEST_INPUT | \ + GPIOHANDLE_REQUEST_OUTPUT) + static int linehandle_validate_flags(u32 flags) { /* Return an error if an unknown flag is set */ @@ -163,21 +167,21 @@ static long linehandle_set_config(struct linehandle_state *lh, if (ret) return ret; + /* Lines must be reconfigured explicitly as input or output. */ + if (!(lflags & GPIOHANDLE_REQUEST_DIRECTION_FLAGS)) + return -EINVAL; + for (i = 0; i < lh->num_descs; i++) { desc = lh->descs[i]; - linehandle_flags_to_desc_flags(gcnf.flags, &desc->flags); + linehandle_flags_to_desc_flags(lflags, &desc->flags); - /* - * Lines have to be requested explicitly for input - * or output, else the line will be treated "as is". - */ if (lflags & GPIOHANDLE_REQUEST_OUTPUT) { int val = !!gcnf.default_values[i]; ret = gpiod_direction_output(desc, val); if (ret) return ret; - } else if (lflags & GPIOHANDLE_REQUEST_INPUT) { + } else { ret = gpiod_direction_input(desc); if (ret) return ret; From 49c09ca35a5f521d7fa18caf62fdf378f15e8aa4 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 28 Jun 2024 14:27:22 -0700 Subject: [PATCH 1306/1565] x86: stop playing stack games in profile_pc() [ Upstream commit 093d9603b60093a9aaae942db56107f6432a5dca ] The 'profile_pc()' function is used for timer-based profiling, which isn't really all that relevant any more to begin with, but it also ends up making assumptions based on the stack layout that aren't necessarily valid. Basically, the code tries to account the time spent in spinlocks to the caller rather than the spinlock, and while I support that as a concept, it's not worth the code complexity or the KASAN warnings when no serious profiling is done using timers anyway these days. And the code really does depend on stack layout that is only true in the simplest of cases. We've lost the comment at some point (I think when the 32-bit and 64-bit code was unified), but it used to say: Assume the lock function has either no stack frame or a copy of eflags from PUSHF. which explains why it just blindly loads a word or two straight off the stack pointer and then takes a minimal look at the values to just check if they might be eflags or the return pc: Eflags always has bits 22 and up cleared unlike kernel addresses but that basic stack layout assumption assumes that there isn't any lock debugging etc going on that would complicate the code and cause a stack frame. It causes KASAN unhappiness reported for years by syzkaller [1] and others [2]. With no real practical reason for this any more, just remove the code. Just for historical interest, here's some background commits relating to this code from 2006: 0cb91a229364 ("i386: Account spinlocks to the caller during profiling for !FP kernels") 31679f38d886 ("Simplify profile_pc on x86-64") and a code unification from 2009: ef4512882dbe ("x86: time_32/64.c unify profile_pc") but the basics of this thing actually goes back to before the git tree. Link: https://syzkaller.appspot.com/bug?extid=84fe685c02cd112a2ac3 [1] Link: https://lore.kernel.org/all/CAK55_s7Xyq=nh97=K=G1sxueOFrJDAvPOJAL4TPTCAYvmxO9_A@mail.gmail.com/ [2] Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- arch/x86/kernel/time.c | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-) diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c index e42faa792c07..52e1f3f0b361 100644 --- a/arch/x86/kernel/time.c +++ b/arch/x86/kernel/time.c @@ -27,25 +27,7 @@ unsigned long profile_pc(struct pt_regs *regs) { - unsigned long pc = instruction_pointer(regs); - - if (!user_mode(regs) && in_lock_functions(pc)) { -#ifdef CONFIG_FRAME_POINTER - return *(unsigned long *)(regs->bp + sizeof(long)); -#else - unsigned long *sp = (unsigned long *)regs->sp; - /* - * Return address is either directly at stack pointer - * or above a saved flags. Eflags has bits 22-31 zero, - * kernel addresses don't. - */ - if (sp[0] >> 22) - return sp[0]; - if (sp[1] >> 22) - return sp[1]; -#endif - } - return pc; + return instruction_pointer(regs); } EXPORT_SYMBOL(profile_pc); From a68b896aa56e435506453ec8835bc991ec3ae687 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 14 Jun 2024 16:52:43 +0200 Subject: [PATCH 1307/1565] ocfs2: fix DIO failure due to insufficient transaction credits commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara Reviewed-by: Joseph Qi Reviewed-by: Heming Zhao Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/aops.c | 5 +++++ fs/ocfs2/journal.c | 17 +++++++++++++++++ fs/ocfs2/journal.h | 2 ++ fs/ocfs2/ocfs2_trace.h | 2 ++ 4 files changed, 26 insertions(+) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 9b23e74036eb..1a5f23e79f5e 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -2373,6 +2373,11 @@ static int ocfs2_dio_end_io_write(struct inode *inode, } list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { + ret = ocfs2_assure_trans_credits(handle, credits); + if (ret < 0) { + mlog_errno(ret); + break; + } ret = ocfs2_mark_extent_written(inode, &et, handle, ue->ue_cpos, 1, ue->ue_phys, diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 0534800a472a..dfa6ff2756fb 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -449,6 +449,23 @@ int ocfs2_extend_trans(handle_t *handle, int nblocks) return status; } +/* + * Make sure handle has at least 'nblocks' credits available. If it does not + * have that many credits available, we will try to extend the handle to have + * enough credits. If that fails, we will restart transaction to have enough + * credits. Similar notes regarding data consistency and locking implications + * as for ocfs2_extend_trans() apply here. + */ +int ocfs2_assure_trans_credits(handle_t *handle, int nblocks) +{ + int old_nblks = jbd2_handle_buffer_credits(handle); + + trace_ocfs2_assure_trans_credits(old_nblks); + if (old_nblks >= nblocks) + return 0; + return ocfs2_extend_trans(handle, nblocks - old_nblks); +} + /* * If we have fewer than thresh credits, extend by OCFS2_MAX_TRANS_DATA. * If that fails, restart the transaction & regain write access for the diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index eb7a21bac71e..bc5d77cb3c50 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -244,6 +244,8 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int ocfs2_commit_trans(struct ocfs2_super *osb, handle_t *handle); int ocfs2_extend_trans(handle_t *handle, int nblocks); +int ocfs2_assure_trans_credits(handle_t *handle, + int nblocks); int ocfs2_allocate_extend_trans(handle_t *handle, int thresh); diff --git a/fs/ocfs2/ocfs2_trace.h b/fs/ocfs2/ocfs2_trace.h index dc4bce1649c1..7a9cfd61145a 100644 --- a/fs/ocfs2/ocfs2_trace.h +++ b/fs/ocfs2/ocfs2_trace.h @@ -2578,6 +2578,8 @@ DEFINE_OCFS2_ULL_UINT_EVENT(ocfs2_commit_cache_end); DEFINE_OCFS2_INT_INT_EVENT(ocfs2_extend_trans); +DEFINE_OCFS2_INT_EVENT(ocfs2_assure_trans_credits); + DEFINE_OCFS2_INT_EVENT(ocfs2_extend_trans_restart); DEFINE_OCFS2_INT_INT_EVENT(ocfs2_allocate_extend_trans); From 5048a44a257e40b9c252c8c982cf61338b1d68fe Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Mon, 27 May 2024 16:24:41 +0300 Subject: [PATCH 1308/1565] mmc: sdhci-pci: Convert PCIBIOS_* return codes to errnos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ebc4fc34eae8ddfbef49f2bdaced1bf4167ef80d upstream. jmicron_pmos() and sdhci_pci_probe() use pci_{read,write}_config_byte() that return PCIBIOS_* codes. The return code is then returned as is by jmicron_probe() and sdhci_pci_probe(). Similarly, the return code is also returned as is from jmicron_resume(). Both probe and resume functions should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning them the fix these issues. Fixes: 7582041ff3d4 ("mmc: sdhci-pci: fix simple_return.cocci warnings") Fixes: 45211e215984 ("sdhci: toggle JMicron PMOS setting") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/20240527132443.14038-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-pci-core.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c index 8b02fe3916d1..7e5dab385518 100644 --- a/drivers/mmc/host/sdhci-pci-core.c +++ b/drivers/mmc/host/sdhci-pci-core.c @@ -1380,7 +1380,7 @@ static int jmicron_pmos(struct sdhci_pci_chip *chip, int on) ret = pci_read_config_byte(chip->pdev, 0xAE, &scratch); if (ret) - return ret; + goto fail; /* * Turn PMOS on [bit 0], set over current detection to 2.4 V @@ -1391,7 +1391,10 @@ static int jmicron_pmos(struct sdhci_pci_chip *chip, int on) else scratch &= ~0x47; - return pci_write_config_byte(chip->pdev, 0xAE, scratch); + ret = pci_write_config_byte(chip->pdev, 0xAE, scratch); + +fail: + return pcibios_err_to_errno(ret); } static int jmicron_probe(struct sdhci_pci_chip *chip) @@ -2308,7 +2311,7 @@ static int sdhci_pci_probe(struct pci_dev *pdev, ret = pci_read_config_byte(pdev, PCI_SLOT_INFO, &slots); if (ret) - return ret; + return pcibios_err_to_errno(ret); slots = PCI_SLOT_INFO_SLOTS(slots) + 1; dev_dbg(&pdev->dev, "found %d slot(s)\n", slots); @@ -2317,7 +2320,7 @@ static int sdhci_pci_probe(struct pci_dev *pdev, ret = pci_read_config_byte(pdev, PCI_SLOT_INFO, &first_bar); if (ret) - return ret; + return pcibios_err_to_errno(ret); first_bar &= PCI_SLOT_INFO_FIRST_BAR_MASK; From 803835fda351c8aeb0546e784e80999b75c5fc3c Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 14 Jun 2024 11:00:49 +0300 Subject: [PATCH 1309/1565] mmc: sdhci: Do not invert write-protect twice commit fbd64f902b93fe9658b855b9892ae59ef6ea22b9 upstream. mmc_of_parse() reads device property "wp-inverted" and sets MMC_CAP2_RO_ACTIVE_HIGH if it is true. MMC_CAP2_RO_ACTIVE_HIGH is used to invert a write-protect (AKA read-only) GPIO value. sdhci_get_property() also reads "wp-inverted" and sets SDHCI_QUIRK_INVERTED_WRITE_PROTECT which is used to invert the write-protect value as well but also acts upon a value read out from the SDHCI_PRESENT_STATE register. Many drivers call both mmc_of_parse() and sdhci_get_property(), so that both MMC_CAP2_RO_ACTIVE_HIGH and SDHCI_QUIRK_INVERTED_WRITE_PROTECT will be set if the controller has device property "wp-inverted". Amend the logic in sdhci_check_ro() to allow for that possibility, so that the write-protect value is not inverted twice. Also do not invert the value if it is a negative error value. Note that callers treat an error the same as not-write-protected, so the result is functionally the same in that case. Also do not invert the value if sdhci host operation ->get_ro() is used. None of the users of that callback set SDHCI_QUIRK_INVERTED_WRITE_PROTECT directly or indirectly, but two do call mmc_gpio_get_ro(), so leave it to them to deal with that if they ever set SDHCI_QUIRK_INVERTED_WRITE_PROTECT in the future. Fixes: 6d5cd068ee59 ("mmc: sdhci: use WP GPIO in sdhci_check_ro()") Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240614080051.4005-2-adrian.hunter@intel.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index bad01cc6823f..ce1ecb579614 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -2488,26 +2488,34 @@ static int sdhci_get_cd(struct mmc_host *mmc) static int sdhci_check_ro(struct sdhci_host *host) { + bool allow_invert = false; unsigned long flags; int is_readonly; spin_lock_irqsave(&host->lock, flags); - if (host->flags & SDHCI_DEVICE_DEAD) + if (host->flags & SDHCI_DEVICE_DEAD) { is_readonly = 0; - else if (host->ops->get_ro) + } else if (host->ops->get_ro) { is_readonly = host->ops->get_ro(host); - else if (mmc_can_gpio_ro(host->mmc)) + } else if (mmc_can_gpio_ro(host->mmc)) { is_readonly = mmc_gpio_get_ro(host->mmc); - else + /* Do not invert twice */ + allow_invert = !(host->mmc->caps2 & MMC_CAP2_RO_ACTIVE_HIGH); + } else { is_readonly = !(sdhci_readl(host, SDHCI_PRESENT_STATE) & SDHCI_WRITE_PROTECT); + allow_invert = true; + } spin_unlock_irqrestore(&host->lock, flags); - /* This quirk needs to be replaced by a callback-function later */ - return host->quirks & SDHCI_QUIRK_INVERTED_WRITE_PROTECT ? - !is_readonly : is_readonly; + if (is_readonly >= 0 && + allow_invert && + (host->quirks & SDHCI_QUIRK_INVERTED_WRITE_PROTECT)) + is_readonly = !is_readonly; + + return is_readonly; } #define SAMPLE_COUNT 5 From 76da476a4c60762716ee2d561b6e67a832189fb7 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Fri, 14 Jun 2024 11:00:50 +0300 Subject: [PATCH 1310/1565] mmc: sdhci: Do not lock spinlock around mmc_gpio_get_ro() commit ab069ce125965a5e282f7b53b86aee76ab32975c upstream. sdhci_check_ro() can call mmc_gpio_get_ro() while holding the sdhci host->lock spinlock. That would be a problem if the GPIO access done by mmc_gpio_get_ro() needed to sleep. However, host->lock is not needed anyway. The mmc core ensures that host operations do not race with each other, and asynchronous callbacks like the interrupt handler, software timeouts, completion work etc, cannot affect sdhci_check_ro(). So remove the locking. Fixes: 6d5cd068ee59 ("mmc: sdhci: use WP GPIO in sdhci_check_ro()") Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240614080051.4005-3-adrian.hunter@intel.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index ce1ecb579614..9091930f5859 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -2489,11 +2489,8 @@ static int sdhci_get_cd(struct mmc_host *mmc) static int sdhci_check_ro(struct sdhci_host *host) { bool allow_invert = false; - unsigned long flags; int is_readonly; - spin_lock_irqsave(&host->lock, flags); - if (host->flags & SDHCI_DEVICE_DEAD) { is_readonly = 0; } else if (host->ops->get_ro) { @@ -2508,8 +2505,6 @@ static int sdhci_check_ro(struct sdhci_host *host) allow_invert = true; } - spin_unlock_irqrestore(&host->lock, flags); - if (is_readonly >= 0 && allow_invert && (host->quirks & SDHCI_QUIRK_INVERTED_WRITE_PROTECT)) From 78ece307f82397629be958bc46d7573b4fe3cc86 Mon Sep 17 00:00:00 2001 From: David Lechner Date: Fri, 21 Jun 2024 17:22:40 -0500 Subject: [PATCH 1311/1565] counter: ti-eqep: enable clock at probe [ Upstream commit 0cf81c73e4c6a4861128a8f27861176ec312af4e ] The TI eQEP clock is both a functional and interface clock. Since it is required for the device to function, we should be enabling it at probe. Up to now, we've just been lucky that the clock was enabled by something else on the system already. Fixes: f213729f6796 ("counter: new TI eQEP driver") Reviewed-by: Judith Mendez Signed-off-by: David Lechner Link: https://lore.kernel.org/r/20240621-ti-eqep-enable-clock-v2-1-edd3421b54d4@baylibre.com Signed-off-by: William Breathitt Gray Signed-off-by: Sasha Levin --- drivers/counter/ti-eqep.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/counter/ti-eqep.c b/drivers/counter/ti-eqep.c index 65df9ef5b5bc..634399881929 100644 --- a/drivers/counter/ti-eqep.c +++ b/drivers/counter/ti-eqep.c @@ -6,6 +6,7 @@ */ #include +#include #include #include #include @@ -349,6 +350,7 @@ static int ti_eqep_probe(struct platform_device *pdev) struct device *dev = &pdev->dev; struct ti_eqep_cnt *priv; void __iomem *base; + struct clk *clk; int err; priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); @@ -388,6 +390,10 @@ static int ti_eqep_probe(struct platform_device *pdev) pm_runtime_enable(dev); pm_runtime_get_sync(dev); + clk = devm_clk_get_enabled(dev, NULL); + if (IS_ERR(clk)) + return dev_err_probe(dev, PTR_ERR(clk), "failed to enable clock\n"); + err = counter_register(&priv->counter); if (err < 0) { pm_runtime_put_sync(dev); From 28f6d0b5ff9f4a1638f8e2840359deb296716a88 Mon Sep 17 00:00:00 2001 From: Fernando Yang Date: Mon, 3 Jun 2024 15:07:54 -0300 Subject: [PATCH 1312/1565] iio: adc: ad7266: Fix variable checking bug commit a2b86132955268b2a1703082fbc2d4832fc001b8 upstream. The ret variable was not checked after iio_device_release_direct_mode(), which could possibly cause errors Fixes: c70df20e3159 ("iio: adc: ad7266: claim direct mode during sensor read") Signed-off-by: Fernando Yang Link: https://lore.kernel.org/r/20240603180757.8560-1-hagisf@usp.br Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/ad7266.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iio/adc/ad7266.c b/drivers/iio/adc/ad7266.c index a8ec3efd659e..90b9ac589e90 100644 --- a/drivers/iio/adc/ad7266.c +++ b/drivers/iio/adc/ad7266.c @@ -157,6 +157,8 @@ static int ad7266_read_raw(struct iio_dev *indio_dev, ret = ad7266_read_single(st, val, chan->address); iio_device_release_direct_mode(indio_dev); + if (ret < 0) + return ret; *val = (*val >> 2) & 0xfff; if (chan->scan_type.sign == 's') *val = sign_extend32(*val, 11); From 44f04b1a88d63d6c4df6fe82954c9cdb98447a61 Mon Sep 17 00:00:00 2001 From: Vasileios Amoiridis Date: Thu, 6 Jun 2024 23:22:53 +0200 Subject: [PATCH 1313/1565] iio: chemical: bme680: Fix pressure value output commit ae1f7b93b52095be6776d0f34957b4f35dda44d9 upstream. The IIO standard units are measured in kPa while the driver is using hPa. Apart from checking the userspace value itself, it is mentioned also in the Bosch API [1] that the pressure value is in Pascal. [1]: https://github.com/boschsensortec/BME68x_SensorAPI/blob/v4.4.8/bme68x_defs.h#L742 Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor") Signed-off-by: Vasileios Amoiridis Link: https://lore.kernel.org/r/20240606212313.207550-2-vassilisamir@gmail.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index 6ea99e4cbf92..79f68659770b 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -678,7 +678,7 @@ static int bme680_read_press(struct bme680_data *data, } *val = bme680_compensate_press(data, adc_press); - *val2 = 100; + *val2 = 1000; return IIO_VAL_FRACTIONAL; } From 3d78fc351bee7228a6e625d92d718e86e67a2a3c Mon Sep 17 00:00:00 2001 From: Vasileios Amoiridis Date: Thu, 6 Jun 2024 23:22:54 +0200 Subject: [PATCH 1314/1565] iio: chemical: bme680: Fix calibration data variable commit b47c0fee73a810c4503c4a94ea34858a1d865bba upstream. According to the BME68x Sensor API [1], the h6 calibration data variable should be an unsigned integer of size 8. [1]: https://github.com/boschsensortec/BME68x_SensorAPI/blob/v4.4.8/bme68x_defs.h#L789 Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor") Signed-off-by: Vasileios Amoiridis Link: https://lore.kernel.org/r/20240606212313.207550-3-vassilisamir@gmail.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index 79f68659770b..08087faf93a2 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -38,7 +38,7 @@ struct bme680_calib { s8 par_h3; s8 par_h4; s8 par_h5; - s8 par_h6; + u8 par_h6; s8 par_h7; s8 par_gh1; s16 par_gh2; From c326551e99f5416986074ce78bef94f6a404b517 Mon Sep 17 00:00:00 2001 From: Vasileios Amoiridis Date: Thu, 6 Jun 2024 23:22:55 +0200 Subject: [PATCH 1315/1565] iio: chemical: bme680: Fix overflows in compensate() functions commit fdd478c3ae98c3f13628e110dce9b6cfb0d9b3c8 upstream. There are cases in the compensate functions of the driver that there could be overflows of variables due to bit shifting ops. These implications were initially discussed here [1] and they were mentioned in log message of Commit 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor"). [1]: https://lore.kernel.org/linux-iio/20180728114028.3c1bbe81@archlinux/ Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor") Signed-off-by: Vasileios Amoiridis Link: https://lore.kernel.org/r/20240606212313.207550-4-vassilisamir@gmail.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680_core.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index 08087faf93a2..282f3ea91e4f 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -342,10 +342,10 @@ static s16 bme680_compensate_temp(struct bme680_data *data, if (!calib->par_t2) bme680_read_calib(data, calib); - var1 = (adc_temp >> 3) - (calib->par_t1 << 1); + var1 = (adc_temp >> 3) - ((s32)calib->par_t1 << 1); var2 = (var1 * calib->par_t2) >> 11; var3 = ((var1 >> 1) * (var1 >> 1)) >> 12; - var3 = (var3 * (calib->par_t3 << 4)) >> 14; + var3 = (var3 * ((s32)calib->par_t3 << 4)) >> 14; data->t_fine = var2 + var3; calc_temp = (data->t_fine * 5 + 128) >> 8; @@ -368,9 +368,9 @@ static u32 bme680_compensate_press(struct bme680_data *data, var1 = (data->t_fine >> 1) - 64000; var2 = ((((var1 >> 2) * (var1 >> 2)) >> 11) * calib->par_p6) >> 2; var2 = var2 + (var1 * calib->par_p5 << 1); - var2 = (var2 >> 2) + (calib->par_p4 << 16); + var2 = (var2 >> 2) + ((s32)calib->par_p4 << 16); var1 = (((((var1 >> 2) * (var1 >> 2)) >> 13) * - (calib->par_p3 << 5)) >> 3) + + ((s32)calib->par_p3 << 5)) >> 3) + ((calib->par_p2 * var1) >> 1); var1 = var1 >> 18; var1 = ((32768 + var1) * calib->par_p1) >> 15; @@ -388,7 +388,7 @@ static u32 bme680_compensate_press(struct bme680_data *data, var3 = ((press_comp >> 8) * (press_comp >> 8) * (press_comp >> 8) * calib->par_p10) >> 17; - press_comp += (var1 + var2 + var3 + (calib->par_p7 << 7)) >> 4; + press_comp += (var1 + var2 + var3 + ((s32)calib->par_p7 << 7)) >> 4; return press_comp; } @@ -414,7 +414,7 @@ static u32 bme680_compensate_humid(struct bme680_data *data, (((temp_scaled * ((temp_scaled * calib->par_h5) / 100)) >> 6) / 100) + (1 << 14))) >> 10; var3 = var1 * var2; - var4 = calib->par_h6 << 7; + var4 = (s32)calib->par_h6 << 7; var4 = (var4 + ((temp_scaled * calib->par_h7) / 100)) >> 4; var5 = ((var3 >> 14) * (var3 >> 14)) >> 10; var6 = (var4 * var5) >> 1; From 59a84bcf1cc791c16ccf86cc4ae9637ba3338af3 Mon Sep 17 00:00:00 2001 From: Vasileios Amoiridis Date: Thu, 6 Jun 2024 23:22:56 +0200 Subject: [PATCH 1316/1565] iio: chemical: bme680: Fix sensor data read operation commit 4241665e6ea063a9c1d734de790121a71db763fc upstream. A read operation is happening as follows: a) Set sensor to forced mode b) Sensor measures values and update data registers and sleeps again c) Read data registers In the current implementation the read operation happens immediately after the sensor is set to forced mode so the sensor does not have the time to update properly the registers. This leads to the following 2 problems: 1) The first ever value which is read by the register is always wrong 2) Every read operation, puts the register into forced mode and reads the data that were calculated in the previous conversion. This behaviour was tested in 2 ways: 1) The internal meas_status_0 register was read before and after every read operation in order to verify that the data were ready even before the register was set to forced mode and also to check that after the forced mode was set the new data were not yet ready. 2) Physically changing the temperature and measuring the temperature This commit adds the waiting time in between the set of the forced mode and the read of the data. The function is taken from the Bosch BME68x Sensor API [1]. [1]: https://github.com/boschsensortec/BME68x_SensorAPI/blob/v4.4.8/bme68x.c#L490 Fixes: 1b3bd8592780 ("iio: chemical: Add support for Bosch BME680 sensor") Signed-off-by: Vasileios Amoiridis Link: https://lore.kernel.org/r/20240606212313.207550-5-vassilisamir@gmail.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/chemical/bme680.h | 2 ++ drivers/iio/chemical/bme680_core.c | 46 ++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+) diff --git a/drivers/iio/chemical/bme680.h b/drivers/iio/chemical/bme680.h index 4edc5d21cb9f..f959252a4fe6 100644 --- a/drivers/iio/chemical/bme680.h +++ b/drivers/iio/chemical/bme680.h @@ -54,7 +54,9 @@ #define BME680_NB_CONV_MASK GENMASK(3, 0) #define BME680_REG_MEAS_STAT_0 0x1D +#define BME680_NEW_DATA_BIT BIT(7) #define BME680_GAS_MEAS_BIT BIT(6) +#define BME680_MEAS_BIT BIT(5) /* Calibration Parameters */ #define BME680_T2_LSB_REG 0x8A diff --git a/drivers/iio/chemical/bme680_core.c b/drivers/iio/chemical/bme680_core.c index 282f3ea91e4f..2216577b1005 100644 --- a/drivers/iio/chemical/bme680_core.c +++ b/drivers/iio/chemical/bme680_core.c @@ -10,6 +10,7 @@ */ #include #include +#include #include #include #include @@ -532,6 +533,43 @@ static u8 bme680_oversampling_to_reg(u8 val) return ilog2(val) + 1; } +/* + * Taken from Bosch BME680 API: + * https://github.com/boschsensortec/BME68x_SensorAPI/blob/v4.4.8/bme68x.c#L490 + */ +static int bme680_wait_for_eoc(struct bme680_data *data) +{ + struct device *dev = regmap_get_device(data->regmap); + unsigned int check; + int ret; + /* + * (Sum of oversampling ratios * time per oversampling) + + * TPH measurement + gas measurement + wait transition from forced mode + * + heater duration + */ + int wait_eoc_us = ((data->oversampling_temp + data->oversampling_press + + data->oversampling_humid) * 1936) + (477 * 4) + + (477 * 5) + 1000 + (data->heater_dur * 1000); + + usleep_range(wait_eoc_us, wait_eoc_us + 100); + + ret = regmap_read(data->regmap, BME680_REG_MEAS_STAT_0, &check); + if (ret) { + dev_err(dev, "failed to read measurement status register.\n"); + return ret; + } + if (check & BME680_MEAS_BIT) { + dev_err(dev, "Device measurement cycle incomplete.\n"); + return -EBUSY; + } + if (!(check & BME680_NEW_DATA_BIT)) { + dev_err(dev, "No new data available from the device.\n"); + return -ENODATA; + } + + return 0; +} + static int bme680_chip_config(struct bme680_data *data) { struct device *dev = regmap_get_device(data->regmap); @@ -622,6 +660,10 @@ static int bme680_read_temp(struct bme680_data *data, int *val) if (ret < 0) return ret; + ret = bme680_wait_for_eoc(data); + if (ret) + return ret; + ret = regmap_bulk_read(data->regmap, BME680_REG_TEMP_MSB, &tmp, 3); if (ret < 0) { @@ -738,6 +780,10 @@ static int bme680_read_gas(struct bme680_data *data, if (ret < 0) return ret; + ret = bme680_wait_for_eoc(data); + if (ret) + return ret; + ret = regmap_read(data->regmap, BME680_REG_MEAS_STAT_0, &check); if (check & BME680_GAS_MEAS_BIT) { dev_err(dev, "gas measurement incomplete\n"); From ee88636607e1e58d7a43aa1ced8f92a20ac2b607 Mon Sep 17 00:00:00 2001 From: Jose Ignacio Tornos Martinez Date: Thu, 20 Jun 2024 15:34:31 +0200 Subject: [PATCH 1317/1565] net: usb: ax88179_178a: improve link status logs commit 058722ee350c0bdd664e467156feb2bf5d9cc271 upstream. Avoid spurious link status logs that may ultimately be wrong; for example, if the link is set to down with the cable plugged, then the cable is unplugged and after this the link is set to up, the last new log that is appearing is incorrectly telling that the link is up. In order to avoid errors, show link status logs after link_reset processing, and in order to avoid spurious as much as possible, only show the link loss when some link status change is detected. cc: stable@vger.kernel.org Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Signed-off-by: Jose Ignacio Tornos Martinez Reviewed-by: Simon Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/ax88179_178a.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c index da4a2427b005..29a04b059c11 100644 --- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -346,7 +346,8 @@ static void ax88179_status(struct usbnet *dev, struct urb *urb) if (netif_carrier_ok(dev->net) != link) { usbnet_link_change(dev, link, 1); - netdev_info(dev->net, "ax88179 - Link status is: %d\n", link); + if (!link) + netdev_info(dev->net, "ax88179 - Link status is: 0\n"); } } @@ -1638,6 +1639,7 @@ static int ax88179_link_reset(struct usbnet *dev) GMII_PHY_PHYSR, 2, &tmp16); if (!(tmp16 & GMII_PHY_PHYSR_LINK)) { + netdev_info(dev->net, "ax88179 - Link status is: 0\n"); return 0; } else if (GMII_PHY_PHYSR_GIGA == (tmp16 & GMII_PHY_PHYSR_SMASK)) { mode |= AX_MEDIUM_GIGAMODE | AX_MEDIUM_EN_125MHZ; @@ -1675,6 +1677,8 @@ static int ax88179_link_reset(struct usbnet *dev) netif_carrier_on(dev->net); + netdev_info(dev->net, "ax88179 - Link status is: 1\n"); + return 0; } From 84ca47192f977cf3adea3ba62a2f179e4a6ed02d Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 20 Jun 2024 11:37:39 +0200 Subject: [PATCH 1318/1565] usb: gadget: printer: SS+ support commit fd80731e5e9d1402cb2f85022a6abf9b1982ec5f upstream. We need to treat super speed plus as super speed, not the default, which is full speed. Signed-off-by: Oliver Neukum Cc: stable Link: https://lore.kernel.org/r/20240620093800.28901-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_printer.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/gadget/function/f_printer.c b/drivers/usb/gadget/function/f_printer.c index c13bb29a160e..cac251a6d7ec 100644 --- a/drivers/usb/gadget/function/f_printer.c +++ b/drivers/usb/gadget/function/f_printer.c @@ -208,6 +208,7 @@ static inline struct usb_endpoint_descriptor *ep_desc(struct usb_gadget *gadget, struct usb_endpoint_descriptor *ss) { switch (gadget->speed) { + case USB_SPEED_SUPER_PLUS: case USB_SPEED_SUPER: return ss; case USB_SPEED_HIGH: From 2798fc1560716c7945587ad43bef8f31f78c00af Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 20 Jun 2024 13:40:26 +0200 Subject: [PATCH 1319/1565] usb: gadget: printer: fix races against disable commit e587a7633dfee8987a999cf253f7c52a8e09276c upstream. printer_read() and printer_write() guard against the race against disable() by checking the dev->interface flag, which in turn is guarded by a spinlock. These functions, however, drop the lock on multiple occasions. This means that the test has to be redone after reacquiring the lock and before doing IO. Add the tests. This also addresses CVE-2024-25741 Fixes: 7f2ca14d2f9b9 ("usb: gadget: function: printer: Interface is disabled and returns error") Cc: stable Signed-off-by: Oliver Neukum Link: https://lore.kernel.org/r/20240620114039.5767-1-oneukum@suse.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_printer.c | 39 ++++++++++++++++++------- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/drivers/usb/gadget/function/f_printer.c b/drivers/usb/gadget/function/f_printer.c index cac251a6d7ec..a31f2fe5d984 100644 --- a/drivers/usb/gadget/function/f_printer.c +++ b/drivers/usb/gadget/function/f_printer.c @@ -446,11 +446,8 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr) mutex_lock(&dev->lock_printer_io); spin_lock_irqsave(&dev->lock, flags); - if (dev->interface < 0) { - spin_unlock_irqrestore(&dev->lock, flags); - mutex_unlock(&dev->lock_printer_io); - return -ENODEV; - } + if (dev->interface < 0) + goto out_disabled; /* We will use this flag later to check if a printer reset happened * after we turn interrupts back on. @@ -458,6 +455,9 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr) dev->reset_printer = 0; setup_rx_reqs(dev); + /* this dropped the lock - need to retest */ + if (dev->interface < 0) + goto out_disabled; bytes_copied = 0; current_rx_req = dev->current_rx_req; @@ -491,6 +491,8 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr) wait_event_interruptible(dev->rx_wait, (likely(!list_empty(&dev->rx_buffers)))); spin_lock_irqsave(&dev->lock, flags); + if (dev->interface < 0) + goto out_disabled; } /* We have data to return then copy it to the caller's buffer.*/ @@ -534,6 +536,9 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr) return -EAGAIN; } + if (dev->interface < 0) + goto out_disabled; + /* If we not returning all the data left in this RX request * buffer then adjust the amount of data left in the buffer. * Othewise if we are done with this RX request buffer then @@ -563,6 +568,11 @@ printer_read(struct file *fd, char __user *buf, size_t len, loff_t *ptr) return bytes_copied; else return -EAGAIN; + +out_disabled: + spin_unlock_irqrestore(&dev->lock, flags); + mutex_unlock(&dev->lock_printer_io); + return -ENODEV; } static ssize_t @@ -583,11 +593,8 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr) mutex_lock(&dev->lock_printer_io); spin_lock_irqsave(&dev->lock, flags); - if (dev->interface < 0) { - spin_unlock_irqrestore(&dev->lock, flags); - mutex_unlock(&dev->lock_printer_io); - return -ENODEV; - } + if (dev->interface < 0) + goto out_disabled; /* Check if a printer reset happens while we have interrupts on */ dev->reset_printer = 0; @@ -610,6 +617,8 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr) wait_event_interruptible(dev->tx_wait, (likely(!list_empty(&dev->tx_reqs)))); spin_lock_irqsave(&dev->lock, flags); + if (dev->interface < 0) + goto out_disabled; } while (likely(!list_empty(&dev->tx_reqs)) && len) { @@ -659,6 +668,9 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr) return -EAGAIN; } + if (dev->interface < 0) + goto out_disabled; + list_add(&req->list, &dev->tx_reqs_active); /* here, we unlock, and only unlock, to avoid deadlock. */ @@ -672,6 +684,8 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr) mutex_unlock(&dev->lock_printer_io); return -EAGAIN; } + if (dev->interface < 0) + goto out_disabled; } spin_unlock_irqrestore(&dev->lock, flags); @@ -683,6 +697,11 @@ printer_write(struct file *fd, const char __user *buf, size_t len, loff_t *ptr) return bytes_copied; else return -EAGAIN; + +out_disabled: + spin_unlock_irqrestore(&dev->lock, flags); + mutex_unlock(&dev->lock_printer_io); + return -ENODEV; } static int From 621e90201c84b8fc427c433738a11495d05422d0 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 17 Jun 2024 12:31:30 +0300 Subject: [PATCH 1320/1565] usb: musb: da8xx: fix a resource leak in probe() commit de644a4a86be04ed8a43ef8267d0f7d021941c5e upstream. Call usb_phy_generic_unregister() if of_platform_populate() fails. Fixes: d6299b6efbf6 ("usb: musb: Add support of CPPI 4.1 DMA controller to DA8xx") Cc: stable Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/69af1b1d-d3f4-492b-bcea-359ca5949f30@moroto.mountain Signed-off-by: Greg Kroah-Hartman --- drivers/usb/musb/da8xx.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/usb/musb/da8xx.c b/drivers/usb/musb/da8xx.c index 1c023c0091c4..ad336daa9344 100644 --- a/drivers/usb/musb/da8xx.c +++ b/drivers/usb/musb/da8xx.c @@ -556,7 +556,7 @@ static int da8xx_probe(struct platform_device *pdev) ret = of_platform_populate(pdev->dev.of_node, NULL, da8xx_auxdata_lookup, &pdev->dev); if (ret) - return ret; + goto err_unregister_phy; memset(musb_resources, 0x00, sizeof(*musb_resources) * ARRAY_SIZE(musb_resources)); @@ -582,9 +582,13 @@ static int da8xx_probe(struct platform_device *pdev) ret = PTR_ERR_OR_ZERO(glue->musb); if (ret) { dev_err(&pdev->dev, "failed to register musb device: %d\n", ret); - usb_phy_generic_unregister(glue->usb_phy); + goto err_unregister_phy; } + return 0; + +err_unregister_phy: + usb_phy_generic_unregister(glue->usb_phy); return ret; } From 75ddbf776dd04a09fb9e5267ead5d0c989f84506 Mon Sep 17 00:00:00 2001 From: Nikita Zhandarovich Date: Sun, 9 Jun 2024 06:15:46 -0700 Subject: [PATCH 1321/1565] usb: atm: cxacru: fix endpoint checking in cxacru_bind() commit 2eabb655a968b862bc0c31629a09f0fbf3c80d51 upstream. Syzbot is still reporting quite an old issue [1] that occurs due to incomplete checking of present usb endpoints. As such, wrong endpoints types may be used at urb sumbitting stage which in turn triggers a warning in usb_submit_urb(). Fix the issue by verifying that required endpoint types are present for both in and out endpoints, taking into account cmd endpoint type. Unfortunately, this patch has not been tested on real hardware. [1] Syzbot report: usb 1-1: BOGUS urb xfer, pipe 1 != type 3 WARNING: CPU: 0 PID: 8667 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502 Modules linked in: CPU: 0 PID: 8667 Comm: kworker/0:4 Not tainted 5.14.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: usb_hub_wq hub_event RIP: 0010:usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502 ... Call Trace: cxacru_cm+0x3c0/0x8e0 drivers/usb/atm/cxacru.c:649 cxacru_card_status+0x22/0xd0 drivers/usb/atm/cxacru.c:760 cxacru_bind+0x7ac/0x11a0 drivers/usb/atm/cxacru.c:1209 usbatm_usb_probe+0x321/0x1ae0 drivers/usb/atm/usbatm.c:1055 cxacru_usb_probe+0xdf/0x1e0 drivers/usb/atm/cxacru.c:1363 usb_probe_interface+0x315/0x7f0 drivers/usb/core/driver.c:396 call_driver_probe drivers/base/dd.c:517 [inline] really_probe+0x23c/0xcd0 drivers/base/dd.c:595 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427 __device_attach+0x228/0x4a0 drivers/base/dd.c:965 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487 device_add+0xc2f/0x2180 drivers/base/core.c:3354 usb_set_configuration+0x113a/0x1910 drivers/usb/core/message.c:2170 usb_generic_driver_probe+0xba/0x100 drivers/usb/core/generic.c:238 usb_probe_device+0xd9/0x2c0 drivers/usb/core/driver.c:293 Reported-and-tested-by: syzbot+00c18ee8497dd3be6ade@syzkaller.appspotmail.com Fixes: 902ffc3c707c ("USB: cxacru: Use a bulk/int URB to access the command endpoint") Cc: stable Signed-off-by: Nikita Zhandarovich Link: https://lore.kernel.org/r/20240609131546.3932-1-n.zhandarovich@fintech.ru Signed-off-by: Greg Kroah-Hartman --- drivers/usb/atm/cxacru.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/usb/atm/cxacru.c b/drivers/usb/atm/cxacru.c index e62a770a5d3b..1d2c736dbf6a 100644 --- a/drivers/usb/atm/cxacru.c +++ b/drivers/usb/atm/cxacru.c @@ -1134,6 +1134,7 @@ static int cxacru_bind(struct usbatm_data *usbatm_instance, struct cxacru_data *instance; struct usb_device *usb_dev = interface_to_usbdev(intf); struct usb_host_endpoint *cmd_ep = usb_dev->ep_in[CXACRU_EP_CMD]; + struct usb_endpoint_descriptor *in, *out; int ret; /* instance init */ @@ -1180,6 +1181,19 @@ static int cxacru_bind(struct usbatm_data *usbatm_instance, goto fail; } + if (usb_endpoint_xfer_int(&cmd_ep->desc)) + ret = usb_find_common_endpoints(intf->cur_altsetting, + NULL, NULL, &in, &out); + else + ret = usb_find_common_endpoints(intf->cur_altsetting, + &in, &out, NULL, NULL); + + if (ret) { + usb_err(usbatm_instance, "cxacru_bind: interface has incorrect endpoints\n"); + ret = -ENODEV; + goto fail; + } + if ((cmd_ep->desc.bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_INT) { usb_fill_int_urb(instance->rcv_urb, From cb879300669881970eabebe64bd509dbbe42b9de Mon Sep 17 00:00:00 2001 From: Udit Kumar Date: Wed, 19 Jun 2024 16:29:03 +0530 Subject: [PATCH 1322/1565] serial: 8250_omap: Implementation of Errata i2310 commit 9d141c1e615795eeb93cd35501ad144ee997a826 upstream. As per Errata i2310[0], Erroneous timeout can be triggered, if this Erroneous interrupt is not cleared then it may leads to storm of interrupts, therefore apply Errata i2310 solution. [0] https://www.ti.com/lit/pdf/sprz536 page 23 Fixes: b67e830d38fa ("serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs") Cc: stable@vger.kernel.org Signed-off-by: Udit Kumar Link: https://lore.kernel.org/r/20240619105903.165434-1-u-kumar1@ti.com Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_omap.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c index 25765ebb756a..955642e90ede 100644 --- a/drivers/tty/serial/8250/8250_omap.c +++ b/drivers/tty/serial/8250/8250_omap.c @@ -164,6 +164,10 @@ static void uart_write(struct omap8250_priv *priv, u32 reg, u32 val) writel(val, priv->membase + (reg << OMAP_UART_REGSHIFT)); } +/* Timeout low and High */ +#define UART_OMAP_TO_L 0x26 +#define UART_OMAP_TO_H 0x27 + /* * Called on runtime PM resume path from omap8250_restore_regs(), and * omap8250_set_mctrl(). @@ -647,13 +651,24 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id) /* * On K3 SoCs, it is observed that RX TIMEOUT is signalled after - * FIFO has been drained, in which case a dummy read of RX FIFO - * is required to clear RX TIMEOUT condition. + * FIFO has been drained or erroneously. + * So apply solution of Errata i2310 as mentioned in + * https://www.ti.com/lit/pdf/sprz536 */ if (priv->habit & UART_RX_TIMEOUT_QUIRK && - (iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT && - serial_port_in(port, UART_OMAP_RX_LVL) == 0) { - serial_port_in(port, UART_RX); + (iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT) { + unsigned char efr2, timeout_h, timeout_l; + + efr2 = serial_in(up, UART_OMAP_EFR2); + timeout_h = serial_in(up, UART_OMAP_TO_H); + timeout_l = serial_in(up, UART_OMAP_TO_L); + serial_out(up, UART_OMAP_TO_H, 0xFF); + serial_out(up, UART_OMAP_TO_L, 0xFF); + serial_out(up, UART_OMAP_EFR2, UART_OMAP_EFR2_TIMEOUT_BEHAVE); + serial_in(up, UART_IIR); + serial_out(up, UART_OMAP_EFR2, efr2); + serial_out(up, UART_OMAP_TO_H, timeout_h); + serial_out(up, UART_OMAP_TO_L, timeout_l); } /* Stop processing interrupts on input overrun */ From 44add57b5b44d08e0b2072e6a8bf58270443708b Mon Sep 17 00:00:00 2001 From: Jean-Michel Hautbois Date: Thu, 20 Jun 2024 18:29:59 +0200 Subject: [PATCH 1323/1565] tty: mcf: MCF54418 has 10 UARTS commit 7c92a8bd53f24d50c8cf4aba53bb75505b382fed upstream. Most of the colfires have up to 5 UARTs but MCF54418 has up-to 10 ! Change the maximum value authorized. Signed-off-by: Jean-Michel Hautbois Cc: stable Fixes: 2545cf6e94b4 ("m68knommu: allow 4 coldfire serial ports") Link: https://lore.kernel.org/r/20240620-upstream-uart-v1-1-a9d0d95fb19e@yoseli.org Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/mcf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/mcf.c b/drivers/tty/serial/mcf.c index 09c88c48fb7b..b90bb745277d 100644 --- a/drivers/tty/serial/mcf.c +++ b/drivers/tty/serial/mcf.c @@ -479,7 +479,7 @@ static const struct uart_ops mcf_uart_ops = { .verify_port = mcf_verify_port, }; -static struct mcf_uart mcf_ports[4]; +static struct mcf_uart mcf_ports[10]; #define MCF_MAXPORTS ARRAY_SIZE(mcf_ports) From a2a0ebff7fdeb2f66e29335adf64b9e457300dd4 Mon Sep 17 00:00:00 2001 From: Shigeru Yoshida Date: Fri, 17 May 2024 12:59:53 +0900 Subject: [PATCH 1324/1565] net: can: j1939: Initialize unused data in j1939_send_one() commit b7cdf1dd5d2a2d8200efd98d1893684db48fe134 upstream. syzbot reported kernel-infoleak in raw_recvmsg() [1]. j1939_send_one() creates full frame including unused data, but it doesn't initialize it. This causes the kernel-infoleak issue. Fix this by initializing unused data. [1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in copy_to_user_iter lib/iov_iter.c:24 [inline] BUG: KMSAN: kernel-infoleak in iterate_ubuf include/linux/iov_iter.h:29 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance2 include/linux/iov_iter.h:245 [inline] BUG: KMSAN: kernel-infoleak in iterate_and_advance include/linux/iov_iter.h:271 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 instrument_copy_to_user include/linux/instrumented.h:114 [inline] copy_to_user_iter lib/iov_iter.c:24 [inline] iterate_ubuf include/linux/iov_iter.h:29 [inline] iterate_and_advance2 include/linux/iov_iter.h:245 [inline] iterate_and_advance include/linux/iov_iter.h:271 [inline] _copy_to_iter+0x366/0x2520 lib/iov_iter.c:185 copy_to_iter include/linux/uio.h:196 [inline] memcpy_to_msg include/linux/skbuff.h:4113 [inline] raw_recvmsg+0x2b8/0x9e0 net/can/raw.c:1008 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x2c4/0x340 net/socket.c:1068 ____sys_recvmsg+0x18a/0x620 net/socket.c:2803 ___sys_recvmsg+0x223/0x840 net/socket.c:2845 do_recvmmsg+0x4fc/0xfd0 net/socket.c:2939 __sys_recvmmsg net/socket.c:3018 [inline] __do_sys_recvmmsg net/socket.c:3041 [inline] __se_sys_recvmmsg net/socket.c:3034 [inline] __x64_sys_recvmmsg+0x397/0x490 net/socket.c:3034 x64_sys_call+0xf6c/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:300 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35b/0x7a0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1313 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6504 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2795 sock_alloc_send_skb include/net/sock.h:1842 [inline] j1939_sk_alloc_skb net/can/j1939/socket.c:878 [inline] j1939_sk_send_loop net/can/j1939/socket.c:1142 [inline] j1939_sk_sendmsg+0xc0a/0x2730 net/can/j1939/socket.c:1277 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 ____sys_sendmsg+0x877/0xb60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x4a0 net/socket.c:2674 x64_sys_call+0xc4b/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Bytes 12-15 of 16 are uninitialized Memory access of size 16 starts at ffff888120969690 Data copied to user address 00000000200017c0 CPU: 1 PID: 5050 Comm: syz-executor198 Not tainted 6.9.0-rc5-syzkaller-00031-g71b1543c83d6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-and-tested-by: syzbot+5681e40d297b30f5b513@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5681e40d297b30f5b513 Acked-by: Oleksij Rempel Signed-off-by: Shigeru Yoshida Link: https://lore.kernel.org/all/20240517035953.2617090-1-syoshida@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- net/can/j1939/main.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c index 9169ef174ff0..d102efcae65a 100644 --- a/net/can/j1939/main.c +++ b/net/can/j1939/main.c @@ -30,10 +30,6 @@ MODULE_ALIAS("can-proto-" __stringify(CAN_J1939)); /* CAN_HDR: #bytes before can_frame data part */ #define J1939_CAN_HDR (offsetof(struct can_frame, data)) -/* CAN_FTR: #bytes beyond data part */ -#define J1939_CAN_FTR (sizeof(struct can_frame) - J1939_CAN_HDR - \ - sizeof(((struct can_frame *)0)->data)) - /* lowest layer */ static void j1939_can_recv(struct sk_buff *iskb, void *data) { @@ -338,7 +334,7 @@ int j1939_send_one(struct j1939_priv *priv, struct sk_buff *skb) memset(cf, 0, J1939_CAN_HDR); /* make it a full can frame again */ - skb_put(skb, J1939_CAN_FTR + (8 - dlc)); + skb_put_zero(skb, 8 - dlc); canid = CAN_EFF_FLAG | (skcb->priority << 26) | From 3f177e46c935284d83f3fe74d75693208e8fca13 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Tue, 28 May 2024 09:06:48 +0200 Subject: [PATCH 1325/1565] net: can: j1939: recover socket queue on CAN bus error during BAM transmission MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9ad1da14ab3bf23087ae45fe399d84a109ddb81a upstream. Addresses an issue where a CAN bus error during a BAM transmission could stall the socket queue, preventing further transmissions even after the bus error is resolved. The fix activates the next queued session after the error recovery, allowing communication to continue. Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org Reported-by: Alexander Hölzl Tested-by: Alexander Hölzl Signed-off-by: Oleksij Rempel Link: https://lore.kernel.org/all/20240528070648.1947203-1-o.rempel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- net/can/j1939/transport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index 5dcbb0b7d123..c36d0c2bf0ed 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1662,6 +1662,8 @@ static int j1939_xtp_rx_rts_session_active(struct j1939_session *session, j1939_session_timers_cancel(session); j1939_session_cancel(session, J1939_XTP_ABORT_BUSY); + if (session->transmission) + j1939_session_deactivate_activate_next(session); return -EBUSY; } From f6c839e717901dbd6b1c1ca807b6210222eb70f6 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 17 Nov 2023 13:49:59 +0100 Subject: [PATCH 1326/1565] net: can: j1939: enhanced error handling for tightly received RTS messages in xtp_rx_rts_session_new commit d3e2904f71ea0fe7eaff1d68a2b0363c888ea0fb upstream. This patch enhances error handling in scenarios with RTS (Request to Send) messages arriving closely. It replaces the less informative WARN_ON_ONCE backtraces with a new error handling method. This provides clearer error messages and allows for the early termination of problematic sessions. Previously, sessions were only released at the end of j1939_xtp_rx_rts(). Potentially this could be reproduced with something like: testj1939 -r vcan0:0x80 & while true; do # send first RTS cansend vcan0 18EC8090#1014000303002301; # send second RTS cansend vcan0 18EC8090#1014000303002301; # send abort cansend vcan0 18EC8090#ff00000000002301; done Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-by: syzbot+daa36413a5cedf799ae4@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel Link: https://lore.kernel.org/all/20231117124959.961171-1-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- net/can/j1939/transport.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c index c36d0c2bf0ed..478dafc73857 100644 --- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -1577,8 +1577,8 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv, struct j1939_sk_buff_cb skcb = *j1939_skb_to_cb(skb); struct j1939_session *session; const u8 *dat; + int len, ret; pgn_t pgn; - int len; netdev_dbg(priv->ndev, "%s\n", __func__); @@ -1634,7 +1634,22 @@ j1939_session *j1939_xtp_rx_rts_session_new(struct j1939_priv *priv, session->pkt.rx = 0; session->pkt.tx = 0; - WARN_ON_ONCE(j1939_session_activate(session)); + ret = j1939_session_activate(session); + if (ret) { + /* Entering this scope indicates an issue with the J1939 bus. + * Possible scenarios include: + * - A time lapse occurred, and a new session was initiated + * due to another packet being sent correctly. This could + * have been caused by too long interrupt, debugger, or being + * out-scheduled by another task. + * - The bus is receiving numerous erroneous packets, either + * from a malfunctioning device or during a test scenario. + */ + netdev_alert(priv->ndev, "%s: 0x%p: concurrent session with same addr (%02x %02x) is already active.\n", + __func__, session, skcb.addr.sa, skcb.addr.da); + j1939_session_put(session); + return NULL; + } return session; } From bf4a43c533d9c3789750192d621425403c64c312 Mon Sep 17 00:00:00 2001 From: Dragan Simic Date: Mon, 10 Jun 2024 07:21:12 +0200 Subject: [PATCH 1327/1565] kbuild: Install dtb files as 0644 in Makefile.dtbinst commit 9cc5f3bf63aa98bd7cc7ce8a8599077fde13283e upstream. The compiled dtb files aren't executable, so install them with 0644 as their permission mode, instead of defaulting to 0755 for the permission mode and installing them with the executable bits set. Some Linux distributions, including Debian, [1][2][3] already include fixes in their kernel package build recipes to change the dtb file permissions to 0644 in their kernel packages. These changes, when additionally propagated into the long-term kernel versions, will allow such distributions to remove their downstream fixes. [1] https://salsa.debian.org/kernel-team/linux/-/merge_requests/642 [2] https://salsa.debian.org/kernel-team/linux/-/merge_requests/749 [3] https://salsa.debian.org/kernel-team/linux/-/blob/debian/6.8.12-1/debian/rules.real#L193 Cc: Diederik de Haas Cc: Fixes: aefd80307a05 ("kbuild: refactor Makefile.dtbinst more") Signed-off-by: Dragan Simic Reviewed-by: Nicolas Schier Signed-off-by: Masahiro Yamada Signed-off-by: Greg Kroah-Hartman --- scripts/Makefile.dtbinst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/Makefile.dtbinst b/scripts/Makefile.dtbinst index 50d580d77ae9..e1ac7e40bbaa 100644 --- a/scripts/Makefile.dtbinst +++ b/scripts/Makefile.dtbinst @@ -24,7 +24,7 @@ __dtbs_install: $(dtbs) $(subdirs) @: quiet_cmd_dtb_install = INSTALL $@ - cmd_dtb_install = install -D $< $@ + cmd_dtb_install = install -D -m 0644 $< $@ $(dst)/%.dtb: $(obj)/%.dtb $(call cmd,dtb_install) From 690633552986ed86001f1092ff403e9a64fd386b Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 14 Jun 2024 09:54:20 +0200 Subject: [PATCH 1328/1565] csky, hexagon: fix broken sys_sync_file_range commit 3339b99ef6fe38dac43b534cba3a8a0e29fb2eff upstream. Both of these architectures require u64 function arguments to be passed in even/odd pairs of registers or stack slots, which in case of sync_file_range would result in a seven-argument system call that is not currently possible. The system call is therefore incompatible with all existing binaries. While it would be possible to implement support for seven arguments like on mips, it seems better to use a six-argument version, either with the normal argument order but misaligned as on most architectures or with the reordered sync_file_range2() calling conventions as on arm and powerpc. Cc: stable@vger.kernel.org Acked-by: Guo Ren Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- arch/csky/include/uapi/asm/unistd.h | 1 + arch/hexagon/include/uapi/asm/unistd.h | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/csky/include/uapi/asm/unistd.h b/arch/csky/include/uapi/asm/unistd.h index ba4018929733..3594062a1bba 100644 --- a/arch/csky/include/uapi/asm/unistd.h +++ b/arch/csky/include/uapi/asm/unistd.h @@ -7,6 +7,7 @@ #define __ARCH_WANT_SYS_CLONE3 #define __ARCH_WANT_SET_GET_RLIMIT #define __ARCH_WANT_TIME32_SYSCALLS +#define __ARCH_WANT_SYNC_FILE_RANGE2 #include #define __NR_set_thread_area (__NR_arch_specific_syscall + 0) diff --git a/arch/hexagon/include/uapi/asm/unistd.h b/arch/hexagon/include/uapi/asm/unistd.h index 432c4db1b623..21ae22306b5d 100644 --- a/arch/hexagon/include/uapi/asm/unistd.h +++ b/arch/hexagon/include/uapi/asm/unistd.h @@ -36,5 +36,6 @@ #define __ARCH_WANT_SYS_VFORK #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_TIME32_SYSCALLS +#define __ARCH_WANT_SYNC_FILE_RANGE2 #include From 9ec84770e4867f75608e82c6503076dab814f299 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 20 Jun 2024 15:24:11 +0200 Subject: [PATCH 1329/1565] hexagon: fix fadvise64_64 calling conventions commit 896842284c6ccba25ec9d78b7b6e62cdd507c083 upstream. fadvise64_64() has two 64-bit arguments at the wrong alignment for hexagon, which turns them into a 7-argument syscall that is not supported by Linux. The downstream musl port for hexagon actually asks for a 6-argument version the same way we do it on arm, csky, powerpc, so make the kernel do it the same way to avoid having to change both. Link: https://github.com/quic/musl/blob/hexagon/arch/hexagon/syscall_arch.h#L78 Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- arch/hexagon/include/asm/syscalls.h | 6 ++++++ arch/hexagon/kernel/syscalltab.c | 7 +++++++ 2 files changed, 13 insertions(+) create mode 100644 arch/hexagon/include/asm/syscalls.h diff --git a/arch/hexagon/include/asm/syscalls.h b/arch/hexagon/include/asm/syscalls.h new file mode 100644 index 000000000000..40f2d08bec92 --- /dev/null +++ b/arch/hexagon/include/asm/syscalls.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include + +asmlinkage long sys_hexagon_fadvise64_64(int fd, int advice, + u32 a2, u32 a3, u32 a4, u32 a5); diff --git a/arch/hexagon/kernel/syscalltab.c b/arch/hexagon/kernel/syscalltab.c index 0fadd582cfc7..5d98bdc494ec 100644 --- a/arch/hexagon/kernel/syscalltab.c +++ b/arch/hexagon/kernel/syscalltab.c @@ -14,6 +14,13 @@ #undef __SYSCALL #define __SYSCALL(nr, call) [nr] = (call), +SYSCALL_DEFINE6(hexagon_fadvise64_64, int, fd, int, advice, + SC_ARG64(offset), SC_ARG64(len)) +{ + return ksys_fadvise64_64(fd, SC_VAL64(loff_t, offset), SC_VAL64(loff_t, len), advice); +} +#define sys_fadvise64_64 sys_hexagon_fadvise64_64 + void *sys_call_table[__NR_syscalls] = { #include }; From 259549b2ccf795b7f91f7b5aba47286addcfa389 Mon Sep 17 00:00:00 2001 From: Ma Ke Date: Tue, 25 Jun 2024 16:18:28 +0800 Subject: [PATCH 1330/1565] drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_ld_modes commit 66edf3fb331b6c55439b10f9862987b0916b3726 upstream. In nv17_tv_get_ld_modes(), the return value of drm_mode_duplicate() is assigned to mode, which will lead to a possible NULL pointer dereference on failure of drm_mode_duplicate(). Add a check to avoid npd. Cc: stable@vger.kernel.org Signed-off-by: Ma Ke Signed-off-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20240625081828.2620794-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/dispnv04/tvnv17.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c b/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c index be28e7bd7490..f0251866e6dd 100644 --- a/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c +++ b/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c @@ -208,6 +208,8 @@ static int nv17_tv_get_ld_modes(struct drm_encoder *encoder, struct drm_display_mode *mode; mode = drm_mode_duplicate(encoder->dev, tv_mode); + if (!mode) + continue; mode->clock = tv_norm->tv_enc_mode.vrefresh * mode->htotal / 1000 * From f771b91f21c46ad1217328d05e72a2c7e3add535 Mon Sep 17 00:00:00 2001 From: Janusz Krzysztofik Date: Mon, 3 Jun 2024 21:54:45 +0200 Subject: [PATCH 1331/1565] drm/i915/gt: Fix potential UAF by revoke of fence registers commit 996c3412a06578e9d779a16b9e79ace18125ab50 upstream. CI has been sporadically reporting the following issue triggered by igt@i915_selftest@live@hangcheck on ADL-P and similar machines: <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence ... <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled <3> [414.070354] Unable to pin Y-tiled fence; err:-4 <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active)) ... <4>[ 609.603992] ------------[ cut here ]------------ <2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301! <4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1 <4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023 <4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915] <4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915] ... <4>[ 609.604271] Call Trace: <4>[ 609.604273] ... <4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915] <4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915] <4>[ 609.604977] force_unbind+0x24/0xa0 [i915] <4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915] <4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915] <4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915] <4>[ 609.605440] process_scheduled_works+0x351/0x690 ... In the past, there were similar failures reported by CI from other IGT tests, observed on other platforms. Before commit 63baf4f3d587 ("drm/i915/gt: Only wait for GPU activity before unbinding a GGTT fence"), i915_vma_revoke_fence() was waiting for idleness of vma->active via fence_update(). That commit introduced vma->fence->active in order for the fence_update() to be able to wait selectively on that one instead of vma->active since only idleness of fence registers was needed. But then, another commit 0d86ee35097a ("drm/i915/gt: Make fence revocation unequivocal") replaced the call to fence_update() in i915_vma_revoke_fence() with only fence_write(), and also added that GEM_BUG_ON(!i915_active_is_idle(&fence->active)) in front. No justification was provided on why we might then expect idleness of vma->fence->active without first waiting on it. The issue can be potentially caused by a race among revocation of fence registers on one side and sequential execution of signal callbacks invoked on completion of a request that was using them on the other, still processed in parallel to revocation of those fence registers. Fix it by waiting for idleness of vma->fence->active in i915_vma_revoke_fence(). Fixes: 0d86ee35097a ("drm/i915/gt: Make fence revocation unequivocal") Closes: https://gitlab.freedesktop.org/drm/intel/issues/10021 Signed-off-by: Janusz Krzysztofik Cc: stable@vger.kernel.org # v5.8+ Reviewed-by: Andi Shyti Signed-off-by: Andi Shyti Link: https://patchwork.freedesktop.org/patch/msgid/20240603195446.297690-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 24bb052d3dd499c5956abad5f7d8e4fd07da7fb1) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index cd71631bef0c..ddf76a4e4fa2 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -309,6 +309,7 @@ void i915_vma_revoke_fence(struct i915_vma *vma) return; GEM_BUG_ON(fence->vma != vma); + i915_active_wait(&fence->active); GEM_BUG_ON(!i915_active_is_idle(&fence->active)); GEM_BUG_ON(atomic_read(&fence->pin_count)); From 56fc4d3b0bdef691831cd95715a7ca3ebea98b2d Mon Sep 17 00:00:00 2001 From: Ma Ke Date: Tue, 25 Jun 2024 16:10:29 +0800 Subject: [PATCH 1332/1565] drm/nouveau/dispnv04: fix null pointer dereference in nv17_tv_get_hd_modes commit 6d411c8ccc0137a612e0044489030a194ff5c843 upstream. In nv17_tv_get_hd_modes(), the return value of drm_mode_duplicate() is assigned to mode, which will lead to a possible NULL pointer dereference on failure of drm_mode_duplicate(). The same applies to drm_cvt_mode(). Add a check to avoid null pointer dereference. Cc: stable@vger.kernel.org Signed-off-by: Ma Ke Signed-off-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20240625081029.2619437-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/dispnv04/tvnv17.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c b/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c index f0251866e6dd..1da9d1e89f91 100644 --- a/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c +++ b/drivers/gpu/drm/nouveau/dispnv04/tvnv17.c @@ -259,6 +259,8 @@ static int nv17_tv_get_hd_modes(struct drm_encoder *encoder, if (modes[i].hdisplay == output_mode->hdisplay && modes[i].vdisplay == output_mode->vdisplay) { mode = drm_mode_duplicate(encoder->dev, output_mode); + if (!mode) + continue; mode->type |= DRM_MODE_TYPE_PREFERRED; } else { @@ -266,6 +268,8 @@ static int nv17_tv_get_hd_modes(struct drm_encoder *encoder, modes[i].vdisplay, 60, false, (output_mode->flags & DRM_MODE_FLAG_INTERLACE), false); + if (!mode) + continue; } /* CVT modes are sometimes unsuitable... */ From 692858d9edb34887235d94c8f12175c631a54b56 Mon Sep 17 00:00:00 2001 From: Sven Eckelmann Date: Sat, 4 May 2024 21:57:30 +0200 Subject: [PATCH 1333/1565] batman-adv: Don't accept TT entries for out-of-spec VIDs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 537a350d14321c8cca5efbf0a33a404fec3a9f9e upstream. The internal handling of VLAN IDs in batman-adv is only specified for following encodings: * VLAN is used - bit 15 is 1 - bit 11 - bit 0 is the VLAN ID (0-4095) - remaining bits are 0 * No VLAN is used - bit 15 is 0 - remaining bits are 0 batman-adv was only preparing new translation table entries (based on its soft interface information) using this encoding format. But the receive path was never checking if entries in the roam or TT TVLVs were also following this encoding. It was therefore possible to create more than the expected maximum of 4096 + 1 entries in the originator VLAN list. Simply by setting the "remaining bits" to "random" values in corresponding TVLV. Cc: stable@vger.kernel.org Fixes: 7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific") Reported-by: Linus Lüssing Signed-off-by: Sven Eckelmann Signed-off-by: Simon Wunderlich Signed-off-by: Greg Kroah-Hartman --- net/batman-adv/originator.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 3b2a3759efd6..cbebe3aa1f60 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -132,6 +133,29 @@ batadv_orig_node_vlan_get(struct batadv_orig_node *orig_node, return vlan; } +/** + * batadv_vlan_id_valid() - check if vlan id is in valid batman-adv encoding + * @vid: the VLAN identifier + * + * Return: true when either no vlan is set or if VLAN is in correct range, + * false otherwise + */ +static bool batadv_vlan_id_valid(unsigned short vid) +{ + unsigned short non_vlan = vid & ~(BATADV_VLAN_HAS_TAG | VLAN_VID_MASK); + + if (vid == 0) + return true; + + if (!(vid & BATADV_VLAN_HAS_TAG)) + return false; + + if (non_vlan) + return false; + + return true; +} + /** * batadv_orig_node_vlan_new() - search and possibly create an orig_node_vlan * object @@ -150,6 +174,9 @@ batadv_orig_node_vlan_new(struct batadv_orig_node *orig_node, { struct batadv_orig_node_vlan *vlan; + if (!batadv_vlan_id_valid(vid)) + return NULL; + spin_lock_bh(&orig_node->vlan_list_lock); /* first look if an object for this vid already exists */ From be5016ae5a3b362dd2bafb4c5b22114a84a3c38b Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Sat, 29 Jun 2024 14:42:14 +0200 Subject: [PATCH 1334/1565] ata: ahci: Clean up sysfs file on error commit eeb25a09c5e0805d92e4ebd12c4b0ad0df1b0295 upstream. .probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however, if probe() fails after this call, we currently never call sysfs_remove_file_from_group(). (The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one()) does not help, as .remove() is not called on .probe() error.) Thus, if probe() fails after the sysfs_add_file_to_group() call, the next time we insmod the module we will get: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme' CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_add_file_mode_ns+0x11a/0x130 sysfs_add_file_to_group+0x7e/0xc0 ahci_init_one+0x31f/0xd40 [ahci] Fixes: 894fba7f434a ("ata: ahci: Add sysfs attribute to show remapped NVMe device count") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20240629124210.181537-10-cassel@kernel.org Signed-off-by: Niklas Cassel Signed-off-by: Greg Kroah-Hartman --- drivers/ata/ahci.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 55a07dbe1d8a..04b53bb7a692 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1893,8 +1893,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) n_ports = max(ahci_nr_ports(hpriv->cap), fls(hpriv->port_map)); host = ata_host_alloc_pinfo(&pdev->dev, ppi, n_ports); - if (!host) - return -ENOMEM; + if (!host) { + rc = -ENOMEM; + goto err_rm_sysfs_file; + } host->private_data = hpriv; if (ahci_init_msi(pdev, n_ports, hpriv) < 0) { @@ -1947,11 +1949,11 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) /* initialize adapter */ rc = ahci_configure_dma_masks(pdev, hpriv); if (rc) - return rc; + goto err_rm_sysfs_file; rc = ahci_pci_reset_controller(host); if (rc) - return rc; + goto err_rm_sysfs_file; ahci_pci_init_controller(host); ahci_pci_print_info(host); @@ -1960,10 +1962,15 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) rc = ahci_host_activate(host, &ahci_sht); if (rc) - return rc; + goto err_rm_sysfs_file; pm_runtime_put_noidle(&pdev->dev); return 0; + +err_rm_sysfs_file: + sysfs_remove_file_from_group(&pdev->dev.kobj, + &dev_attr_remapped_nvme.attr, NULL); + return rc; } static void ahci_shutdown_one(struct pci_dev *pdev) From 010de9acbea58fbcbda08e3793d6262086a493fe Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Sat, 29 Jun 2024 14:42:13 +0200 Subject: [PATCH 1335/1565] ata: libata-core: Fix double free on error commit ab9e0c529eb7cafebdd31fe1644524e80a48b05d upstream. If e.g. the ata_port_alloc() call in ata_host_alloc() fails, we will jump to the err_out label, which will call devres_release_group(). devres_release_group() will trigger a call to ata_host_release(). ata_host_release() calls kfree(host), so executing the kfree(host) in ata_host_alloc() will lead to a double free: kernel BUG at mm/slub.c:553! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 11 PID: 599 Comm: (udev-worker) Not tainted 6.10.0-rc5 #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:kfree+0x2cf/0x2f0 Code: 5d 41 5e 41 5f 5d e9 80 d6 ff ff 4d 89 f1 41 b8 01 00 00 00 48 89 d9 48 89 da RSP: 0018:ffffc90000f377f0 EFLAGS: 00010246 RAX: ffff888112b1f2c0 RBX: ffff888112b1f2c0 RCX: ffff888112b1f320 RDX: 000000000000400b RSI: ffffffffc02c9de5 RDI: ffff888112b1f2c0 RBP: ffffc90000f37830 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90000f37610 R11: 617461203a736b6e R12: ffffea00044ac780 R13: ffff888100046400 R14: ffffffffc02c9de5 R15: 0000000000000006 FS: 00007f2f1cabe980(0000) GS:ffff88813b380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2f1c3acf75 CR3: 0000000111724000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? kfree+0x2cf/0x2f0 ? exc_invalid_op+0x50/0x70 ? kfree+0x2cf/0x2f0 ? asm_exc_invalid_op+0x1a/0x20 ? ata_host_alloc+0xf5/0x120 [libata] ? ata_host_alloc+0xf5/0x120 [libata] ? kfree+0x2cf/0x2f0 ata_host_alloc+0xf5/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Ensure that we will not call kfree(host) twice, by performing the kfree() only if the devres_open_group() call failed. Fixes: dafd6c496381 ("libata: ensure host is free'd on error exit paths") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20240629124210.181537-9-cassel@kernel.org Signed-off-by: Niklas Cassel Signed-off-by: Greg Kroah-Hartman --- drivers/ata/libata-core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 702b8e061b36..3717ed6fcccc 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5420,8 +5420,10 @@ struct ata_host *ata_host_alloc(struct device *dev, int max_ports) if (!host) return NULL; - if (!devres_open_group(dev, NULL, GFP_KERNEL)) - goto err_free; + if (!devres_open_group(dev, NULL, GFP_KERNEL)) { + kfree(host); + return NULL; + } dr = devres_alloc(ata_devres_release, 0, GFP_KERNEL); if (!dr) @@ -5453,8 +5455,6 @@ struct ata_host *ata_host_alloc(struct device *dev, int max_ports) err_out: devres_release_group(dev, NULL); - err_free: - kfree(host); return NULL; } EXPORT_SYMBOL_GPL(ata_host_alloc); From 84bf6b64a1a0dfc6de7e1b1c776d58d608e7865a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 19 Jun 2024 11:34:09 +0200 Subject: [PATCH 1336/1565] ftruncate: pass a signed offset commit 4b8e88e563b5f666446d002ad0dc1e6e8e7102b0 upstream. The old ftruncate() syscall, using the 32-bit off_t misses a sign extension when called in compat mode on 64-bit architectures. As a result, passing a negative length accidentally succeeds in truncating to file size between 2GiB and 4GiB. Changing the type of the compat syscall to the signed compat_off_t changes the behavior so it instead returns -EINVAL. The native entry point, the truncate() syscall and the corresponding loff_t based variants are all correct already and do not suffer from this mistake. Fixes: 3f6d078d4acc ("fix compat truncate/ftruncate") Reviewed-by: Christian Brauner Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- fs/open.c | 4 ++-- include/linux/compat.h | 2 +- include/linux/syscalls.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/open.c b/fs/open.c index d69312a2d434..694110929519 100644 --- a/fs/open.c +++ b/fs/open.c @@ -200,13 +200,13 @@ long do_sys_ftruncate(unsigned int fd, loff_t length, int small) return error; } -SYSCALL_DEFINE2(ftruncate, unsigned int, fd, unsigned long, length) +SYSCALL_DEFINE2(ftruncate, unsigned int, fd, off_t, length) { return do_sys_ftruncate(fd, length, 1); } #ifdef CONFIG_COMPAT -COMPAT_SYSCALL_DEFINE2(ftruncate, unsigned int, fd, compat_ulong_t, length) +COMPAT_SYSCALL_DEFINE2(ftruncate, unsigned int, fd, compat_off_t, length) { return do_sys_ftruncate(fd, length, 1); } diff --git a/include/linux/compat.h b/include/linux/compat.h index 14d514233e1d..8dffffe846ce 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -527,7 +527,7 @@ asmlinkage long compat_sys_fstatfs(unsigned int fd, asmlinkage long compat_sys_fstatfs64(unsigned int fd, compat_size_t sz, struct compat_statfs64 __user *buf); asmlinkage long compat_sys_truncate(const char __user *, compat_off_t); -asmlinkage long compat_sys_ftruncate(unsigned int, compat_ulong_t); +asmlinkage long compat_sys_ftruncate(unsigned int, compat_off_t); /* No generic prototype for truncate64, ftruncate64, fallocate */ asmlinkage long compat_sys_openat(int dfd, const char __user *filename, int flags, umode_t mode); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 1ea422b1a9f1..a96e924c7b45 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -445,7 +445,7 @@ asmlinkage long sys_fstatfs(unsigned int fd, struct statfs __user *buf); asmlinkage long sys_fstatfs64(unsigned int fd, size_t sz, struct statfs64 __user *buf); asmlinkage long sys_truncate(const char __user *path, long length); -asmlinkage long sys_ftruncate(unsigned int fd, unsigned long length); +asmlinkage long sys_ftruncate(unsigned int fd, off_t length); #if BITS_PER_LONG == 32 asmlinkage long sys_truncate64(const char __user *path, loff_t length); asmlinkage long sys_ftruncate64(unsigned int fd, loff_t length); From d5f75f01994e399500264a07585935adab948ee5 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 20 Jun 2024 14:16:37 +0200 Subject: [PATCH 1337/1565] syscalls: fix compat_sys_io_pgetevents_time64 usage commit d3882564a77c21eb746ba5364f3fa89b88de3d61 upstream. Using sys_io_pgetevents() as the entry point for compat mode tasks works almost correctly, but misses the sign extension for the min_nr and nr arguments. This was addressed on parisc by switching to compat_sys_io_pgetevents_time64() in commit 6431e92fc827 ("parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode"), as well as by using more sophisticated system call wrappers on x86 and s390. However, arm64, mips, powerpc, sparc and riscv still have the same bug. Change all of them over to use compat_sys_io_pgetevents_time64() like parisc already does. This was clearly the intention when the function was originally added, but it got hooked up incorrectly in the tables. Cc: stable@vger.kernel.org Fixes: 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures") Acked-by: Heiko Carstens # s390 Signed-off-by: Arnd Bergmann Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/unistd32.h | 2 +- arch/mips/kernel/syscalls/syscall_n32.tbl | 2 +- arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +- arch/powerpc/kernel/syscalls/syscall.tbl | 2 +- arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/kernel/syscalls/syscall.tbl | 2 +- arch/x86/entry/syscalls/syscall_32.tbl | 2 +- include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 2 +- 9 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 107f08e03b9f..2f120f254e26 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -840,7 +840,7 @@ __SYSCALL(__NR_pselect6_time64, compat_sys_pselect6_time64) #define __NR_ppoll_time64 414 __SYSCALL(__NR_ppoll_time64, compat_sys_ppoll_time64) #define __NR_io_pgetevents_time64 416 -__SYSCALL(__NR_io_pgetevents_time64, sys_io_pgetevents) +__SYSCALL(__NR_io_pgetevents_time64, compat_sys_io_pgetevents_time64) #define __NR_recvmmsg_time64 417 __SYSCALL(__NR_recvmmsg_time64, compat_sys_recvmmsg_time64) #define __NR_mq_timedsend_time64 418 diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 32817c954435..67fc2d9b7cb1 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -354,7 +354,7 @@ 412 n32 utimensat_time64 sys_utimensat 413 n32 pselect6_time64 compat_sys_pselect6_time64 414 n32 ppoll_time64 compat_sys_ppoll_time64 -416 n32 io_pgetevents_time64 sys_io_pgetevents +416 n32 io_pgetevents_time64 compat_sys_io_pgetevents_time64 417 n32 recvmmsg_time64 compat_sys_recvmmsg_time64 418 n32 mq_timedsend_time64 sys_mq_timedsend 419 n32 mq_timedreceive_time64 sys_mq_timedreceive diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 29f5f28cf5ce..6036af4f30e2 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -403,7 +403,7 @@ 412 o32 utimensat_time64 sys_utimensat sys_utimensat 413 o32 pselect6_time64 sys_pselect6 compat_sys_pselect6_time64 414 o32 ppoll_time64 sys_ppoll compat_sys_ppoll_time64 -416 o32 io_pgetevents_time64 sys_io_pgetevents sys_io_pgetevents +416 o32 io_pgetevents_time64 sys_io_pgetevents compat_sys_io_pgetevents_time64 417 o32 recvmmsg_time64 sys_recvmmsg compat_sys_recvmmsg_time64 418 o32 mq_timedsend_time64 sys_mq_timedsend sys_mq_timedsend 419 o32 mq_timedreceive_time64 sys_mq_timedreceive sys_mq_timedreceive diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 1275daec7fec..59f550062bb7 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -503,7 +503,7 @@ 412 32 utimensat_time64 sys_utimensat sys_utimensat 413 32 pselect6_time64 sys_pselect6 compat_sys_pselect6_time64 414 32 ppoll_time64 sys_ppoll compat_sys_ppoll_time64 -416 32 io_pgetevents_time64 sys_io_pgetevents sys_io_pgetevents +416 32 io_pgetevents_time64 sys_io_pgetevents compat_sys_io_pgetevents_time64 417 32 recvmmsg_time64 sys_recvmmsg compat_sys_recvmmsg_time64 418 32 mq_timedsend_time64 sys_mq_timedsend sys_mq_timedsend 419 32 mq_timedreceive_time64 sys_mq_timedreceive sys_mq_timedreceive diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 28c168000483..b846233b81dc 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -418,7 +418,7 @@ 412 32 utimensat_time64 - sys_utimensat 413 32 pselect6_time64 - compat_sys_pselect6_time64 414 32 ppoll_time64 - compat_sys_ppoll_time64 -416 32 io_pgetevents_time64 - sys_io_pgetevents +416 32 io_pgetevents_time64 - compat_sys_io_pgetevents_time64 417 32 recvmmsg_time64 - compat_sys_recvmmsg_time64 418 32 mq_timedsend_time64 - sys_mq_timedsend 419 32 mq_timedreceive_time64 - sys_mq_timedreceive diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 5c76ebe07e9c..e38a5773d85e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -461,7 +461,7 @@ 412 32 utimensat_time64 sys_utimensat sys_utimensat 413 32 pselect6_time64 sys_pselect6 compat_sys_pselect6_time64 414 32 ppoll_time64 sys_ppoll compat_sys_ppoll_time64 -416 32 io_pgetevents_time64 sys_io_pgetevents sys_io_pgetevents +416 32 io_pgetevents_time64 sys_io_pgetevents compat_sys_io_pgetevents_time64 417 32 recvmmsg_time64 sys_recvmmsg compat_sys_recvmmsg_time64 418 32 mq_timedsend_time64 sys_mq_timedsend sys_mq_timedsend 419 32 mq_timedreceive_time64 sys_mq_timedreceive sys_mq_timedreceive diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 0d0667a9fbd7..bad983312060 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -420,7 +420,7 @@ 412 i386 utimensat_time64 sys_utimensat 413 i386 pselect6_time64 sys_pselect6 compat_sys_pselect6_time64 414 i386 ppoll_time64 sys_ppoll compat_sys_ppoll_time64 -416 i386 io_pgetevents_time64 sys_io_pgetevents +416 i386 io_pgetevents_time64 sys_io_pgetevents compat_sys_io_pgetevents_time64 417 i386 recvmmsg_time64 sys_recvmmsg compat_sys_recvmmsg_time64 418 i386 mq_timedsend_time64 sys_mq_timedsend 419 i386 mq_timedreceive_time64 sys_mq_timedreceive diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 2056318988f7..1ff9589ab611 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -805,7 +805,7 @@ __SC_COMP(__NR_pselect6_time64, sys_pselect6, compat_sys_pselect6_time64) #define __NR_ppoll_time64 414 __SC_COMP(__NR_ppoll_time64, sys_ppoll, compat_sys_ppoll_time64) #define __NR_io_pgetevents_time64 416 -__SYSCALL(__NR_io_pgetevents_time64, sys_io_pgetevents) +__SC_COMP(__NR_io_pgetevents_time64, sys_io_pgetevents, compat_sys_io_pgetevents_time64) #define __NR_recvmmsg_time64 417 __SC_COMP(__NR_recvmmsg_time64, sys_recvmmsg, compat_sys_recvmmsg_time64) #define __NR_mq_timedsend_time64 418 diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index cdecd47e5580..9019299069b1 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -46,8 +46,8 @@ COND_SYSCALL(io_getevents_time32); COND_SYSCALL(io_getevents); COND_SYSCALL(io_pgetevents_time32); COND_SYSCALL(io_pgetevents); -COND_SYSCALL_COMPAT(io_pgetevents_time32); COND_SYSCALL_COMPAT(io_pgetevents); +COND_SYSCALL_COMPAT(io_pgetevents_time64); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); From 2c43adf36475b2ed60241820f5095e3816d14d26 Mon Sep 17 00:00:00 2001 From: Jaime Liao Date: Thu, 20 May 2021 09:45:08 +0800 Subject: [PATCH 1338/1565] mtd: spinand: macronix: Add support for serial NAND flash commit c374839f9b4475173e536d1eaddff45cb481dbdf upstream. Macronix NAND Flash devices are available in different configurations and densities. MX"35" means SPI NAND MX35"LF"/"UF" , LF means 3V and UF meands 1.8V MX35LF"2G" , 2G means 2Gbits MX35LF2G"E4"/"24"/"14", E4 means internal ECC and Quad I/O(x4) 24 means 8-bit ecc requirement and Quad I/O(x4) 14 means 4-bit ecc requirement and Quad I/O(x4) MX35LF2G14AC is 3V 2Gbit serial NAND flash device (without on-die ECC) https://www.mxic.com.tw/Lists/Datasheet/Attachments/7926/MX35LF2G14AC,%203V,%202Gb,%20v1.1.pdf MX35UF4G24AD is 1.8V 4Gbit serial NAND flash device (without on-die ECC) https://www.mxic.com.tw/Lists/Datasheet/Attachments/7980/MX35UF4G24AD,%201.8V,%204Gb,%20v0.00.pdf MX35UF4GE4AD/MX35UF2GE4AD are 1.8V 4G/2Gbit serial NAND flash device with 8-bit on-die ECC https://www.mxic.com.tw/Lists/Datasheet/Attachments/7983/MX35UF4GE4AD,%201.8V,%204Gb,%20v0.00.pdf MX35UF2GE4AC/MX35UF1GE4AC are 1.8V 2G/1Gbit serial NAND flash device with 8-bit on-die ECC https://www.mxic.com.tw/Lists/Datasheet/Attachments/7974/MX35UF2GE4AC,%201.8V,%202Gb,%20v1.0.pdf MX35UF2G14AC/MX35UF1G14AC are 1.8V 2G/1Gbit serial NAND flash device (without on-die ECC) https://www.mxic.com.tw/Lists/Datasheet/Attachments/7931/MX35UF2G14AC,%201.8V,%202Gb,%20v1.1.pdf Validated via normal(default) and QUAD mode by read, erase, read back, on Xilinx Zynq PicoZed FPGA board which included Macronix SPI Host(drivers/spi/spi-mxic.c). Signed-off-by: Jaime Liao Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/1621475108-22523-1-git-send-email-jaimeliao@mxic.com.tw Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/spi/macronix.c | 112 ++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) diff --git a/drivers/mtd/nand/spi/macronix.c b/drivers/mtd/nand/spi/macronix.c index 8bd3f6bf9b10..be6bdedc9b61 100644 --- a/drivers/mtd/nand/spi/macronix.c +++ b/drivers/mtd/nand/spi/macronix.c @@ -159,6 +159,118 @@ static const struct spinand_info macronix_spinand_table[] = { 0 /*SPINAND_HAS_QE_BIT*/, SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, mx35lf1ge4ab_ecc_get_status)), + + SPINAND_INFO("MX35LF2G14AC", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x20), + NAND_MEMORG(1, 2048, 64, 64, 2048, 40, 2, 1, 1), + NAND_ECCREQ(4, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF4G24AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xb5), + NAND_MEMORG(1, 4096, 256, 64, 2048, 40, 2, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF4GE4AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xb7), + NAND_MEMORG(1, 4096, 256, 64, 2048, 40, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF2G14AC", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xa0), + NAND_MEMORG(1, 2048, 64, 64, 2048, 40, 2, 1, 1), + NAND_ECCREQ(4, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF2G24AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xa4), + NAND_MEMORG(1, 2048, 128, 64, 2048, 40, 2, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF2GE4AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xa6), + NAND_MEMORG(1, 2048, 128, 64, 2048, 40, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF2GE4AC", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xa2), + NAND_MEMORG(1, 2048, 64, 64, 2048, 40, 1, 1, 1), + NAND_ECCREQ(4, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF1G14AC", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x90), + NAND_MEMORG(1, 2048, 64, 64, 1024, 20, 1, 1, 1), + NAND_ECCREQ(4, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF1G24AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x94), + NAND_MEMORG(1, 2048, 128, 64, 1024, 20, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF1GE4AD", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x96), + NAND_MEMORG(1, 2048, 128, 64, 1024, 20, 1, 1, 1), + NAND_ECCREQ(8, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + SPINAND_INFO("MX35UF1GE4AC", + SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0x92), + NAND_MEMORG(1, 2048, 64, 64, 1024, 20, 1, 1, 1), + NAND_ECCREQ(4, 512), + SPINAND_INFO_OP_VARIANTS(&read_cache_variants, + &write_cache_variants, + &update_cache_variants), + SPINAND_HAS_QE_BIT, + SPINAND_ECCINFO(&mx35lfxge4ab_ooblayout, + mx35lf1ge4ab_ecc_get_status)), + }; static const struct spinand_manufacturer_ops macronix_spinand_manuf_ops = { From 2e6bbfa1abfe1533cb242d657cca0694c59f4bab Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Fri, 21 Jun 2024 16:37:12 +0200 Subject: [PATCH 1339/1565] pwm: stm32: Refuse too small period requests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit c45fcf46ca2368dafe7e5c513a711a6f0f974308 upstream. If period_ns is small, prd might well become 0. Catch that case because otherwise with regmap_write(priv->regmap, TIM_ARR, prd - 1); a few lines down quite a big period is configured. Fixes: 7edf7369205b ("pwm: Add driver for STM32 plaftorm") Cc: stable@vger.kernel.org Reviewed-by: Trevor Gamblin Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/b86f62f099983646f97eeb6bfc0117bb2d0c340d.1718979150.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König Signed-off-by: Greg Kroah-Hartman --- drivers/pwm/pwm-stm32.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pwm/pwm-stm32.c b/drivers/pwm/pwm-stm32.c index 69b7bc604946..3e5f1b622af8 100644 --- a/drivers/pwm/pwm-stm32.c +++ b/drivers/pwm/pwm-stm32.c @@ -340,6 +340,9 @@ static int stm32_pwm_config(struct stm32_pwm *priv, int ch, prd = div; + if (!prd) + return -EINVAL; + if (prescaler > MAX_TIM_PSC) return -EINVAL; From ed07b26c54ef4fbfe67fdd348e951fdbaeed35f8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 16 May 2022 10:30:09 -0400 Subject: [PATCH 1340/1565] nfs: Leave pages in the pagecache if readpage failed commit 0b768a9610c6de9811c6d33900bebfb665192ee1 upstream. The pagecache handles readpage failing by itself; it doesn't want filesystems to remove pages from under it. Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Kuniyuki Iwashima Signed-off-by: Greg Kroah-Hartman --- fs/nfs/read.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index eb854f1f86e2..75334ba10947 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -103,12 +103,8 @@ static void nfs_readpage_release(struct nfs_page *req, int error) if (nfs_error_is_fatal_on_server(error) && error != -ETIMEDOUT) SetPageError(page); if (nfs_page_group_sync_on_bit(req, PG_UNLOCKPAGE)) { - struct address_space *mapping = page_file_mapping(page); - if (PageUptodate(page)) nfs_readpage_to_fscache(inode, page, 0); - else if (!PageError(page) && !PagePrivate(page)) - generic_error_remove_page(mapping, page); unlock_page(page); } nfs_release_request(req); From d3bf338e9ca4d70f00e44c54ea67a274c7409583 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Apr 2023 16:50:32 +0000 Subject: [PATCH 1341/1565] ipv6: annotate some data-races around sk->sk_prot commit 086d49058cd8471046ae9927524708820f5fd1c7 upstream. IPv6 has this hack changing sk->sk_prot when an IPv6 socket is 'converted' to an IPv4 one with IPV6_ADDRFORM option. This operation is only performed for TCP and UDP, knowing their 'struct proto' for the two network families are populated in the same way, and can not disappear while a reader might use and dereference sk->sk_prot. If we think about it all reads of sk->sk_prot while either socket lock or RTNL is not acquired should be using READ_ONCE(). Also note that other layers like MPTCP, XFRM, CHELSIO_TLS also write over sk->sk_prot. BUG: KCSAN: data-race in inet6_recvmsg / ipv6_setsockopt write to 0xffff8881386f7aa8 of 8 bytes by task 26932 on cpu 0: do_ipv6_setsockopt net/ipv6/ipv6_sockglue.c:492 [inline] ipv6_setsockopt+0x3758/0x3910 net/ipv6/ipv6_sockglue.c:1019 udpv6_setsockopt+0x85/0x90 net/ipv6/udp.c:1649 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3489 __sys_setsockopt+0x209/0x2a0 net/socket.c:2180 __do_sys_setsockopt net/socket.c:2191 [inline] __se_sys_setsockopt net/socket.c:2188 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff8881386f7aa8 of 8 bytes by task 26911 on cpu 1: inet6_recvmsg+0x7a/0x210 net/ipv6/af_inet6.c:659 ____sys_recvmsg+0x16c/0x320 ___sys_recvmsg net/socket.c:2674 [inline] do_recvmmsg+0x3f5/0xae0 net/socket.c:2768 __sys_recvmmsg net/socket.c:2847 [inline] __do_sys_recvmmsg net/socket.c:2870 [inline] __se_sys_recvmmsg net/socket.c:2863 [inline] __x64_sys_recvmmsg+0xde/0x160 net/socket.c:2863 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85e0e980 -> 0xffffffff85e01580 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 26911 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00316-g0457e5153e0e-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Reported-by: syzbot Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Kazunori Kobayashi Signed-off-by: Greg Kroah-Hartman --- net/ipv6/af_inet6.c | 24 ++++++++++++++++++------ net/ipv6/ipv6_sockglue.c | 6 ++++-- 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 329b3b36688a..32da2b66fa2f 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -449,11 +449,14 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct sock *sk = sock->sk; + const struct proto *prot; int err = 0; + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); /* If the socket has its own bind function then use it. */ - if (sk->sk_prot->bind) - return sk->sk_prot->bind(sk, uaddr, addr_len); + if (prot->bind) + return prot->bind(sk, uaddr, addr_len); if (addr_len < SIN6_LEN_RFC2133) return -EINVAL; @@ -566,6 +569,7 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) void __user *argp = (void __user *)arg; struct sock *sk = sock->sk; struct net *net = sock_net(sk); + const struct proto *prot; switch (cmd) { case SIOCADDRT: @@ -583,9 +587,11 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) case SIOCSIFDSTADDR: return addrconf_set_dstaddr(net, argp); default: - if (!sk->sk_prot->ioctl) + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + if (!prot->ioctl) return -ENOIOCTLCMD; - return sk->sk_prot->ioctl(sk, cmd, arg); + return prot->ioctl(sk, cmd, arg); } /*NOTREACHED*/ return 0; @@ -647,11 +653,14 @@ INDIRECT_CALLABLE_DECLARE(int udpv6_sendmsg(struct sock *, struct msghdr *, int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) { struct sock *sk = sock->sk; + const struct proto *prot; if (unlikely(inet_send_prepare(sk))) return -EAGAIN; - return INDIRECT_CALL_2(sk->sk_prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, sk, msg, size); } @@ -661,13 +670,16 @@ int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; + const struct proto *prot; int addr_len = 0; int err; if (likely(!(flags & MSG_ERRQUEUE))) sock_rps_record_flow(sk); - err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + err = INDIRECT_CALL_2(prot->recvmsg, tcp_recvmsg, udpv6_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); if (err >= 0) diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 0ac527cd5d56..2921e162925a 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -475,7 +475,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, &tcp_prot, 1); local_bh_enable(); - sk->sk_prot = &tcp_prot; + /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + WRITE_ONCE(sk->sk_prot, &tcp_prot); icsk->icsk_af_ops = &ipv4_specific; sk->sk_socket->ops = &inet_stream_ops; sk->sk_family = PF_INET; @@ -489,7 +490,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, prot, 1); local_bh_enable(); - sk->sk_prot = prot; + /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + WRITE_ONCE(sk->sk_prot, prot); sk->sk_socket->ops = &inet_dgram_ops; sk->sk_family = PF_INET; } From ea2ed3f78ab238865b9b9e19a85c1de16d570ab7 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 17 Apr 2023 16:50:33 +0000 Subject: [PATCH 1342/1565] ipv6: Fix data races around sk->sk_prot. commit 364f997b5cfe1db0d63a390fe7c801fa2b3115f6 upstream. Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot") fixed some data-races around sk->sk_prot but it was not enough. Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid load tearing. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Jakub Kicinski Signed-off-by: Kazunori Kobayashi Signed-off-by: Greg Kroah-Hartman --- net/core/sock.c | 6 ++++-- net/ipv4/af_inet.c | 23 ++++++++++++++++------- net/ipv6/ipv6_sockglue.c | 4 ++-- 3 files changed, 22 insertions(+), 11 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index b4ecd0071e22..d5818a5a86fd 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3263,7 +3263,8 @@ int sock_common_getsockopt(struct socket *sock, int level, int optname, { struct sock *sk = sock->sk; - return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen); + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + return READ_ONCE(sk->sk_prot)->getsockopt(sk, level, optname, optval, optlen); } EXPORT_SYMBOL(sock_common_getsockopt); @@ -3290,7 +3291,8 @@ int sock_common_setsockopt(struct socket *sock, int level, int optname, { struct sock *sk = sock->sk; - return sk->sk_prot->setsockopt(sk, level, optname, optval, optlen); + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + return READ_ONCE(sk->sk_prot)->setsockopt(sk, level, optname, optval, optlen); } EXPORT_SYMBOL(sock_common_setsockopt); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 5f1b334e64b3..475a19db3713 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -562,22 +562,27 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, int addr_len, int flags) { struct sock *sk = sock->sk; + const struct proto *prot; int err; if (addr_len < sizeof(uaddr->sa_family)) return -EINVAL; + + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + if (uaddr->sa_family == AF_UNSPEC) - return sk->sk_prot->disconnect(sk, flags); + return prot->disconnect(sk, flags); if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { - err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); + err = prot->pre_connect(sk, uaddr, addr_len); if (err) return err; } if (data_race(!inet_sk(sk)->inet_num) && inet_autobind(sk)) return -EAGAIN; - return sk->sk_prot->connect(sk, uaddr, addr_len); + return prot->connect(sk, uaddr, addr_len); } EXPORT_SYMBOL(inet_dgram_connect); @@ -740,10 +745,11 @@ EXPORT_SYMBOL(inet_stream_connect); int inet_accept(struct socket *sock, struct socket *newsock, int flags, bool kern) { - struct sock *sk1 = sock->sk; + struct sock *sk1 = sock->sk, *sk2; int err = -EINVAL; - struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err, kern); + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + sk2 = READ_ONCE(sk1->sk_prot)->accept(sk1, flags, &err, kern); if (!sk2) goto do_err; @@ -828,12 +834,15 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) { struct sock *sk = sock->sk; + const struct proto *prot; if (unlikely(inet_send_prepare(sk))) return -EAGAIN; - if (sk->sk_prot->sendpage) - return sk->sk_prot->sendpage(sk, page, offset, size, flags); + /* IPV6_ADDRFORM can change sk->sk_prot under us. */ + prot = READ_ONCE(sk->sk_prot); + if (prot->sendpage) + return prot->sendpage(sk, page, offset, size, flags); return sock_no_sendpage(sock, page, offset, size, flags); } EXPORT_SYMBOL(inet_sendpage); diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 2921e162925a..186de3efc7c6 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -475,7 +475,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, &tcp_prot, 1); local_bh_enable(); - /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */ WRITE_ONCE(sk->sk_prot, &tcp_prot); icsk->icsk_af_ops = &ipv4_specific; sk->sk_socket->ops = &inet_stream_ops; @@ -490,7 +490,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, sock_prot_inuse_add(net, sk->sk_prot, -1); sock_prot_inuse_add(net, prot, 1); local_bh_enable(); - /* Paired with READ_ONCE(sk->sk_prot) in net/ipv6/af_inet6.c */ + /* Paired with READ_ONCE(sk->sk_prot) in inet6_dgram_ops */ WRITE_ONCE(sk->sk_prot, prot); sk->sk_socket->ops = &inet_dgram_ops; sk->sk_family = PF_INET; From 3b32f265805a49071e2c4568a524398ba22bf93c Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 17 Apr 2023 16:50:34 +0000 Subject: [PATCH 1343/1565] tcp: Fix data races around icsk->icsk_af_ops. commit f49cd2f4d6170d27a2c61f1fecb03d8a70c91f57 upstream. setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE() for the reads and writes. Thanks to Eric Dumazet for providing the syzbot report: BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0: tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240 __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724 __sys_connect_file net/socket.c:1976 [inline] __sys_connect+0x197/0x1b0 net/socket.c:1993 __do_sys_connect net/socket.c:2003 [inline] __se_sys_connect net/socket.c:2000 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:2000 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1: tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585 __sys_setsockopt+0x212/0x2b0 net/socket.c:2252 __do_sys_setsockopt net/socket.c:2263 [inline] __se_sys_setsockopt net/socket.c:2260 [inline] __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted 6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot Reported-by: Eric Dumazet Signed-off-by: Kuniyuki Iwashima Signed-off-by: Jakub Kicinski Signed-off-by: Kazunori Kobayashi Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp.c | 10 ++++++---- net/ipv6/ipv6_sockglue.c | 3 ++- net/ipv6/tcp_ipv6.c | 6 ++++-- 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 0a495b6edbc4..24ebd51c5e0b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3501,8 +3501,9 @@ int tcp_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, const struct inet_connection_sock *icsk = inet_csk(sk); if (level != SOL_TCP) - return icsk->icsk_af_ops->setsockopt(sk, level, optname, - optval, optlen); + /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */ + return READ_ONCE(icsk->icsk_af_ops)->setsockopt(sk, level, optname, + optval, optlen); return do_tcp_setsockopt(sk, level, optname, optval, optlen); } EXPORT_SYMBOL(tcp_setsockopt); @@ -4063,8 +4064,9 @@ int tcp_getsockopt(struct sock *sk, int level, int optname, char __user *optval, struct inet_connection_sock *icsk = inet_csk(sk); if (level != SOL_TCP) - return icsk->icsk_af_ops->getsockopt(sk, level, optname, - optval, optlen); + /* Paired with WRITE_ONCE() in do_ipv6_setsockopt() and tcp_v6_connect() */ + return READ_ONCE(icsk->icsk_af_ops)->getsockopt(sk, level, optname, + optval, optlen); return do_tcp_getsockopt(sk, level, optname, optval, optlen); } EXPORT_SYMBOL(tcp_getsockopt); diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 186de3efc7c6..dbbe53260b00 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -477,7 +477,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, local_bh_enable(); /* Paired with READ_ONCE(sk->sk_prot) in inet6_stream_ops */ WRITE_ONCE(sk->sk_prot, &tcp_prot); - icsk->icsk_af_ops = &ipv4_specific; + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */ + WRITE_ONCE(icsk->icsk_af_ops, &ipv4_specific); sk->sk_socket->ops = &inet_stream_ops; sk->sk_family = PF_INET; tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 003221d6f52e..7e595585d059 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -237,7 +237,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, sin.sin_port = usin->sin6_port; sin.sin_addr.s_addr = usin->sin6_addr.s6_addr32[3]; - icsk->icsk_af_ops = &ipv6_mapped; + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */ + WRITE_ONCE(icsk->icsk_af_ops, &ipv6_mapped); if (sk_is_mptcp(sk)) mptcpv6_handle_mapped(sk, true); sk->sk_backlog_rcv = tcp_v4_do_rcv; @@ -249,7 +250,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, if (err) { icsk->icsk_ext_hdr_len = exthdrlen; - icsk->icsk_af_ops = &ipv6_specific; + /* Paired with READ_ONCE() in tcp_(get|set)sockopt() */ + WRITE_ONCE(icsk->icsk_af_ops, &ipv6_specific); if (sk_is_mptcp(sk)) mptcpv6_handle_mapped(sk, false); sk->sk_backlog_rcv = tcp_v6_do_rcv; From d204beedc82f79e1fe95c63fd70c2e9bb04da539 Mon Sep 17 00:00:00 2001 From: Zheng Zhi Yuan Date: Sat, 29 Jun 2024 17:13:58 +0200 Subject: [PATCH 1344/1565] drivers: fix typo in firmware/efi/memmap.c [ Commit 1df4d1724baafa55e9803414ebcdf1ca702bc958 upstream ] This patch fixes the spelling error in firmware/efi/memmap.c, changing it to the correct word. Signed-off-by: Zheng Zhi Yuan Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/efi/memmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 2ff1883dc788..b4070b5e4c45 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -245,7 +245,7 @@ int __init efi_memmap_install(struct efi_memory_map_data *data) * @range: Address range (start, end) to split around * * Returns the number of additional EFI memmap entries required to - * accomodate @range. + * accommodate @range. */ int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range) { From 52dc463a76b0441a6031d66d0c8f2fb243950c6a Mon Sep 17 00:00:00 2001 From: Liu Zixian Date: Sat, 29 Jun 2024 17:13:59 +0200 Subject: [PATCH 1345/1565] efi: Correct comment on efi_memmap_alloc [ Commit db01ea882bf601252dad57242655da17fd9ad2f5 upstream ] Returning zero means success now. Signed-off-by: Liu Zixian Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/efi/memmap.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index b4070b5e4c45..3aaf717aad05 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -59,8 +59,7 @@ static void __init efi_memmap_free(void) * Depending on whether mm_init() has already been invoked or not, * either memblock or "normal" page allocation is used. * - * Returns the physical address of the allocated memory map on - * success, zero on failure. + * Returns zero on success, a negative error code on failure. */ int __init efi_memmap_alloc(unsigned int num_entries, struct efi_memory_map_data *data) From 31e0721aeabde29371f624f56ce2f403508527a5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 29 Jun 2024 17:14:00 +0200 Subject: [PATCH 1346/1565] efi: memmap: Move manipulation routines into x86 arch tree [ Commit fdc6d38d64a20c542b1867ebeb8dd03b98829336 upstream ] The EFI memory map is a description of the memory layout as provided by the firmware, and only x86 manipulates it in various different ways for its own memory bookkeeping. So let's move the memmap routines that are only used by x86 into the x86 arch tree. [ardb: minor tweaks for linux-5.10.y backport] Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/efi.h | 12 ++ arch/x86/platform/efi/Makefile | 3 +- arch/x86/platform/efi/memmap.c | 236 ++++++++++++++++++++++++++++++++ drivers/firmware/efi/memmap.c | 240 ++------------------------------- include/linux/efi.h | 10 +- 5 files changed, 261 insertions(+), 240 deletions(-) create mode 100644 arch/x86/platform/efi/memmap.c diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index bc9758ef292e..01c0410ae97a 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -382,4 +382,16 @@ static inline void efi_fake_memmap_early(void) } #endif +extern int __init efi_memmap_alloc(unsigned int num_entries, + struct efi_memory_map_data *data); +extern void __efi_memmap_free(u64 phys, unsigned long size, + unsigned long flags); +#define __efi_memmap_free __efi_memmap_free + +extern int __init efi_memmap_install(struct efi_memory_map_data *data); +extern int __init efi_memmap_split_count(efi_memory_desc_t *md, + struct range *range); +extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap, + void *buf, struct efi_mem_range *mem); + #endif /* _ASM_X86_EFI_H */ diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile index 84b09c230cbd..196f61fdbf28 100644 --- a/arch/x86/platform/efi/Makefile +++ b/arch/x86/platform/efi/Makefile @@ -3,5 +3,6 @@ OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y KASAN_SANITIZE := n GCOV_PROFILE := n -obj-$(CONFIG_EFI) += quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o +obj-$(CONFIG_EFI) += memmap.o quirks.o efi.o efi_$(BITS).o \ + efi_stub_$(BITS).o obj-$(CONFIG_EFI_MIXED) += efi_thunk_$(BITS).o diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c new file mode 100644 index 000000000000..620af26b55c0 --- /dev/null +++ b/arch/x86/platform/efi/memmap.c @@ -0,0 +1,236 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Common EFI memory map functions. + */ + +#define pr_fmt(fmt) "efi: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include + +static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size) +{ + return memblock_phys_alloc(size, SMP_CACHE_BYTES); +} + +static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size) +{ + unsigned int order = get_order(size); + struct page *p = alloc_pages(GFP_KERNEL, order); + + if (!p) + return 0; + + return PFN_PHYS(page_to_pfn(p)); +} + +void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags) +{ + if (flags & EFI_MEMMAP_MEMBLOCK) { + if (slab_is_available()) + memblock_free_late(phys, size); + else + memblock_free(phys, size); + } else if (flags & EFI_MEMMAP_SLAB) { + struct page *p = pfn_to_page(PHYS_PFN(phys)); + unsigned int order = get_order(size); + + free_pages((unsigned long) page_address(p), order); + } +} + +/** + * efi_memmap_alloc - Allocate memory for the EFI memory map + * @num_entries: Number of entries in the allocated map. + * @data: efi memmap installation parameters + * + * Depending on whether mm_init() has already been invoked or not, + * either memblock or "normal" page allocation is used. + * + * Returns zero on success, a negative error code on failure. + */ +int __init efi_memmap_alloc(unsigned int num_entries, + struct efi_memory_map_data *data) +{ + /* Expect allocation parameters are zero initialized */ + WARN_ON(data->phys_map || data->size); + + data->size = num_entries * efi.memmap.desc_size; + data->desc_version = efi.memmap.desc_version; + data->desc_size = efi.memmap.desc_size; + data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK); + data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE; + + if (slab_is_available()) { + data->flags |= EFI_MEMMAP_SLAB; + data->phys_map = __efi_memmap_alloc_late(data->size); + } else { + data->flags |= EFI_MEMMAP_MEMBLOCK; + data->phys_map = __efi_memmap_alloc_early(data->size); + } + + if (!data->phys_map) + return -ENOMEM; + return 0; +} + +/** + * efi_memmap_install - Install a new EFI memory map in efi.memmap + * @ctx: map allocation parameters (address, size, flags) + * + * Unlike efi_memmap_init_*(), this function does not allow the caller + * to switch from early to late mappings. It simply uses the existing + * mapping function and installs the new memmap. + * + * Returns zero on success, a negative error code on failure. + */ +int __init efi_memmap_install(struct efi_memory_map_data *data) +{ + efi_memmap_unmap(); + + return __efi_memmap_init(data); +} + +/** + * efi_memmap_split_count - Count number of additional EFI memmap entries + * @md: EFI memory descriptor to split + * @range: Address range (start, end) to split around + * + * Returns the number of additional EFI memmap entries required to + * accommodate @range. + */ +int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range) +{ + u64 m_start, m_end; + u64 start, end; + int count = 0; + + start = md->phys_addr; + end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1; + + /* modifying range */ + m_start = range->start; + m_end = range->end; + + if (m_start <= start) { + /* split into 2 parts */ + if (start < m_end && m_end < end) + count++; + } + + if (start < m_start && m_start < end) { + /* split into 3 parts */ + if (m_end < end) + count += 2; + /* split into 2 parts */ + if (end <= m_end) + count++; + } + + return count; +} + +/** + * efi_memmap_insert - Insert a memory region in an EFI memmap + * @old_memmap: The existing EFI memory map structure + * @buf: Address of buffer to store new map + * @mem: Memory map entry to insert + * + * It is suggested that you call efi_memmap_split_count() first + * to see how large @buf needs to be. + */ +void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf, + struct efi_mem_range *mem) +{ + u64 m_start, m_end, m_attr; + efi_memory_desc_t *md; + u64 start, end; + void *old, *new; + + /* modifying range */ + m_start = mem->range.start; + m_end = mem->range.end; + m_attr = mem->attribute; + + /* + * The EFI memory map deals with regions in EFI_PAGE_SIZE + * units. Ensure that the region described by 'mem' is aligned + * correctly. + */ + if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) || + !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) { + WARN_ON(1); + return; + } + + for (old = old_memmap->map, new = buf; + old < old_memmap->map_end; + old += old_memmap->desc_size, new += old_memmap->desc_size) { + + /* copy original EFI memory descriptor */ + memcpy(new, old, old_memmap->desc_size); + md = new; + start = md->phys_addr; + end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; + + if (m_start <= start && end <= m_end) + md->attribute |= m_attr; + + if (m_start <= start && + (start < m_end && m_end < end)) { + /* first part */ + md->attribute |= m_attr; + md->num_pages = (m_end - md->phys_addr + 1) >> + EFI_PAGE_SHIFT; + /* latter part */ + new += old_memmap->desc_size; + memcpy(new, old, old_memmap->desc_size); + md = new; + md->phys_addr = m_end + 1; + md->num_pages = (end - md->phys_addr + 1) >> + EFI_PAGE_SHIFT; + } + + if ((start < m_start && m_start < end) && m_end < end) { + /* first part */ + md->num_pages = (m_start - md->phys_addr) >> + EFI_PAGE_SHIFT; + /* middle part */ + new += old_memmap->desc_size; + memcpy(new, old, old_memmap->desc_size); + md = new; + md->attribute |= m_attr; + md->phys_addr = m_start; + md->num_pages = (m_end - m_start + 1) >> + EFI_PAGE_SHIFT; + /* last part */ + new += old_memmap->desc_size; + memcpy(new, old, old_memmap->desc_size); + md = new; + md->phys_addr = m_end + 1; + md->num_pages = (end - m_end) >> + EFI_PAGE_SHIFT; + } + + if ((start < m_start && m_start < end) && + (end <= m_end)) { + /* first part */ + md->num_pages = (m_start - md->phys_addr) >> + EFI_PAGE_SHIFT; + /* latter part */ + new += old_memmap->desc_size; + memcpy(new, old, old_memmap->desc_size); + md = new; + md->phys_addr = m_start; + md->num_pages = (end - md->phys_addr + 1) >> + EFI_PAGE_SHIFT; + md->attribute |= m_attr; + } + } +} diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 3aaf717aad05..e6256c48284e 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -9,82 +9,15 @@ #include #include #include -#include #include #include -static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size) -{ - return memblock_phys_alloc(size, SMP_CACHE_BYTES); -} +#include +#include -static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size) -{ - unsigned int order = get_order(size); - struct page *p = alloc_pages(GFP_KERNEL, order); - - if (!p) - return 0; - - return PFN_PHYS(page_to_pfn(p)); -} - -void __init __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags) -{ - if (flags & EFI_MEMMAP_MEMBLOCK) { - if (slab_is_available()) - memblock_free_late(phys, size); - else - memblock_free(phys, size); - } else if (flags & EFI_MEMMAP_SLAB) { - struct page *p = pfn_to_page(PHYS_PFN(phys)); - unsigned int order = get_order(size); - - free_pages((unsigned long) page_address(p), order); - } -} - -static void __init efi_memmap_free(void) -{ - __efi_memmap_free(efi.memmap.phys_map, - efi.memmap.desc_size * efi.memmap.nr_map, - efi.memmap.flags); -} - -/** - * efi_memmap_alloc - Allocate memory for the EFI memory map - * @num_entries: Number of entries in the allocated map. - * @data: efi memmap installation parameters - * - * Depending on whether mm_init() has already been invoked or not, - * either memblock or "normal" page allocation is used. - * - * Returns zero on success, a negative error code on failure. - */ -int __init efi_memmap_alloc(unsigned int num_entries, - struct efi_memory_map_data *data) -{ - /* Expect allocation parameters are zero initialized */ - WARN_ON(data->phys_map || data->size); - - data->size = num_entries * efi.memmap.desc_size; - data->desc_version = efi.memmap.desc_version; - data->desc_size = efi.memmap.desc_size; - data->flags &= ~(EFI_MEMMAP_SLAB | EFI_MEMMAP_MEMBLOCK); - data->flags |= efi.memmap.flags & EFI_MEMMAP_LATE; - - if (slab_is_available()) { - data->flags |= EFI_MEMMAP_SLAB; - data->phys_map = __efi_memmap_alloc_late(data->size); - } else { - data->flags |= EFI_MEMMAP_MEMBLOCK; - data->phys_map = __efi_memmap_alloc_early(data->size); - } - - if (!data->phys_map) - return -ENOMEM; - return 0; -} +#ifndef __efi_memmap_free +#define __efi_memmap_free(phys, size, flags) do { } while (0) +#endif /** * __efi_memmap_init - Common code for mapping the EFI memory map @@ -101,7 +34,7 @@ int __init efi_memmap_alloc(unsigned int num_entries, * * Returns zero on success, a negative error code on failure. */ -static int __init __efi_memmap_init(struct efi_memory_map_data *data) +int __init __efi_memmap_init(struct efi_memory_map_data *data) { struct efi_memory_map map; phys_addr_t phys_map; @@ -121,8 +54,10 @@ static int __init __efi_memmap_init(struct efi_memory_map_data *data) return -ENOMEM; } - /* NOP if data->flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB) == 0 */ - efi_memmap_free(); + if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) + __efi_memmap_free(efi.memmap.phys_map, + efi.memmap.desc_size * efi.memmap.nr_map, + efi.memmap.flags); map.phys_map = data->phys_map; map.nr_map = data->size / data->desc_size; @@ -220,158 +155,3 @@ int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size) return __efi_memmap_init(&data); } - -/** - * efi_memmap_install - Install a new EFI memory map in efi.memmap - * @ctx: map allocation parameters (address, size, flags) - * - * Unlike efi_memmap_init_*(), this function does not allow the caller - * to switch from early to late mappings. It simply uses the existing - * mapping function and installs the new memmap. - * - * Returns zero on success, a negative error code on failure. - */ -int __init efi_memmap_install(struct efi_memory_map_data *data) -{ - efi_memmap_unmap(); - - return __efi_memmap_init(data); -} - -/** - * efi_memmap_split_count - Count number of additional EFI memmap entries - * @md: EFI memory descriptor to split - * @range: Address range (start, end) to split around - * - * Returns the number of additional EFI memmap entries required to - * accommodate @range. - */ -int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range) -{ - u64 m_start, m_end; - u64 start, end; - int count = 0; - - start = md->phys_addr; - end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1; - - /* modifying range */ - m_start = range->start; - m_end = range->end; - - if (m_start <= start) { - /* split into 2 parts */ - if (start < m_end && m_end < end) - count++; - } - - if (start < m_start && m_start < end) { - /* split into 3 parts */ - if (m_end < end) - count += 2; - /* split into 2 parts */ - if (end <= m_end) - count++; - } - - return count; -} - -/** - * efi_memmap_insert - Insert a memory region in an EFI memmap - * @old_memmap: The existing EFI memory map structure - * @buf: Address of buffer to store new map - * @mem: Memory map entry to insert - * - * It is suggested that you call efi_memmap_split_count() first - * to see how large @buf needs to be. - */ -void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf, - struct efi_mem_range *mem) -{ - u64 m_start, m_end, m_attr; - efi_memory_desc_t *md; - u64 start, end; - void *old, *new; - - /* modifying range */ - m_start = mem->range.start; - m_end = mem->range.end; - m_attr = mem->attribute; - - /* - * The EFI memory map deals with regions in EFI_PAGE_SIZE - * units. Ensure that the region described by 'mem' is aligned - * correctly. - */ - if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) || - !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) { - WARN_ON(1); - return; - } - - for (old = old_memmap->map, new = buf; - old < old_memmap->map_end; - old += old_memmap->desc_size, new += old_memmap->desc_size) { - - /* copy original EFI memory descriptor */ - memcpy(new, old, old_memmap->desc_size); - md = new; - start = md->phys_addr; - end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1; - - if (m_start <= start && end <= m_end) - md->attribute |= m_attr; - - if (m_start <= start && - (start < m_end && m_end < end)) { - /* first part */ - md->attribute |= m_attr; - md->num_pages = (m_end - md->phys_addr + 1) >> - EFI_PAGE_SHIFT; - /* latter part */ - new += old_memmap->desc_size; - memcpy(new, old, old_memmap->desc_size); - md = new; - md->phys_addr = m_end + 1; - md->num_pages = (end - md->phys_addr + 1) >> - EFI_PAGE_SHIFT; - } - - if ((start < m_start && m_start < end) && m_end < end) { - /* first part */ - md->num_pages = (m_start - md->phys_addr) >> - EFI_PAGE_SHIFT; - /* middle part */ - new += old_memmap->desc_size; - memcpy(new, old, old_memmap->desc_size); - md = new; - md->attribute |= m_attr; - md->phys_addr = m_start; - md->num_pages = (m_end - m_start + 1) >> - EFI_PAGE_SHIFT; - /* last part */ - new += old_memmap->desc_size; - memcpy(new, old, old_memmap->desc_size); - md = new; - md->phys_addr = m_end + 1; - md->num_pages = (end - m_end) >> - EFI_PAGE_SHIFT; - } - - if ((start < m_start && m_start < end) && - (end <= m_end)) { - /* first part */ - md->num_pages = (m_start - md->phys_addr) >> - EFI_PAGE_SHIFT; - /* latter part */ - new += old_memmap->desc_size; - memcpy(new, old, old_memmap->desc_size); - md = new; - md->phys_addr = m_start; - md->num_pages = (end - md->phys_addr + 1) >> - EFI_PAGE_SHIFT; - md->attribute |= m_attr; - } - } -} diff --git a/include/linux/efi.h b/include/linux/efi.h index 084990390420..2fd321da5c4b 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -633,18 +633,10 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, #endif extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr); -extern int __init efi_memmap_alloc(unsigned int num_entries, - struct efi_memory_map_data *data); -extern void __efi_memmap_free(u64 phys, unsigned long size, - unsigned long flags); +extern int __init __efi_memmap_init(struct efi_memory_map_data *data); extern int __init efi_memmap_init_early(struct efi_memory_map_data *data); extern int __init efi_memmap_init_late(phys_addr_t addr, unsigned long size); extern void __init efi_memmap_unmap(void); -extern int __init efi_memmap_install(struct efi_memory_map_data *data); -extern int __init efi_memmap_split_count(efi_memory_desc_t *md, - struct range *range); -extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap, - void *buf, struct efi_mem_range *mem); #ifdef CONFIG_EFI_ESRT extern void __init efi_esrt_init(void); From e5d730882d27a8556b2522300a4e80405b443067 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 29 Jun 2024 17:14:01 +0200 Subject: [PATCH 1347/1565] efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures [ Commit d85e3e34940788578eeffd94e8b7e1d28e7278e9 upstream ] Currently, the EFI_PARAVIRT flag is only used by Xen dom0 boot on x86, even though other architectures also support pseudo-EFI boot, where the core kernel is invoked directly and provided with a set of data tables that resemble the ones constructed by the EFI stub, which never actually runs in that case. Let's fix this inconsistency, and always set this flag when booting dom0 via the EFI boot path. Note that Xen on x86 does not provide the EFI memory map in this case, whereas other architectures do, so move the associated EFI_PARAVIRT check into the x86 platform code. Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- arch/x86/platform/efi/efi.c | 8 +++++--- arch/x86/platform/efi/memmap.c | 3 +++ drivers/firmware/efi/fdtparams.c | 4 ++++ drivers/firmware/efi/memmap.c | 3 --- 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 8a26e705cb06..41229bcbe0d9 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -234,9 +234,11 @@ int __init efi_memblock_x86_reserve_range(void) data.desc_size = e->efi_memdesc_size; data.desc_version = e->efi_memdesc_version; - rv = efi_memmap_init_early(&data); - if (rv) - return rv; + if (!efi_enabled(EFI_PARAVIRT)) { + rv = efi_memmap_init_early(&data); + if (rv) + return rv; + } if (add_efi_memmap || do_efi_soft_reserve()) do_add_efi_memmap(); diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c index 620af26b55c0..241464b6dd03 100644 --- a/arch/x86/platform/efi/memmap.c +++ b/arch/x86/platform/efi/memmap.c @@ -94,6 +94,9 @@ int __init efi_memmap_install(struct efi_memory_map_data *data) { efi_memmap_unmap(); + if (efi_enabled(EFI_PARAVIRT)) + return 0; + return __efi_memmap_init(data); } diff --git a/drivers/firmware/efi/fdtparams.c b/drivers/firmware/efi/fdtparams.c index e901f8564ca0..0ec83ba58097 100644 --- a/drivers/firmware/efi/fdtparams.c +++ b/drivers/firmware/efi/fdtparams.c @@ -30,11 +30,13 @@ static __initconst const char name[][22] = { static __initconst const struct { const char path[17]; + u8 paravirt; const char params[PARAMCOUNT][26]; } dt_params[] = { { #ifdef CONFIG_XEN // <-------17------> .path = "/hypervisor/uefi", + .paravirt = 1, .params = { [SYSTAB] = "xen,uefi-system-table", [MMBASE] = "xen,uefi-mmap-start", @@ -121,6 +123,8 @@ u64 __init efi_get_fdt_params(struct efi_memory_map_data *mm) pr_err("Can't find property '%s' in DT!\n", pname); return 0; } + if (dt_params[i].paravirt) + set_bit(EFI_PARAVIRT, &efi.flags); return systab; } notfound: diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index e6256c48284e..a1180461a445 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -39,9 +39,6 @@ int __init __efi_memmap_init(struct efi_memory_map_data *data) struct efi_memory_map map; phys_addr_t phys_map; - if (efi_enabled(EFI_PARAVIRT)) - return 0; - phys_map = data->phys_map; if (data->flags & EFI_MEMMAP_LATE) From 91efb15b5a3eb330b2900430c390ec2270dc67a6 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 29 Jun 2024 17:14:02 +0200 Subject: [PATCH 1348/1565] efi/x86: Free EFI memory map only when installing a new one. [ Commit 75dde792d6f6c2d0af50278bd374bf0c512fe196 upstream ] The logic in __efi_memmap_init() is shared between two different execution flows: - mapping the EFI memory map early or late into the kernel VA space, so that its entries can be accessed; - the x86 specific cloning of the EFI memory map in order to insert new entries that are created as a result of making a memory reservation via a call to efi_mem_reserve(). In the former case, the underlying memory containing the kernel's view of the EFI memory map (which may be heavily modified by the kernel itself on x86) is not modified at all, and the only thing that changes is the virtual mapping of this memory, which is different between early and late boot. In the latter case, an entirely new allocation is created that carries a new, updated version of the kernel's view of the EFI memory map. When installing this new version, the old version will no longer be referenced, and if the memory was allocated by the kernel, it will leak unless it gets freed. The logic that implements this freeing currently lives on the code path that is shared between these two use cases, but it should only apply to the latter. So move it to the correct spot. While at it, drop the dummy definition for non-x86 architectures, as that is no longer needed. Cc: Fixes: f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks") Tested-by: Ashish Kalra Link: https://lore.kernel.org/all/36ad5079-4326-45ed-85f6-928ff76483d3@amd.com Signed-off-by: Ard Biesheuvel Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/efi.h | 1 - arch/x86/platform/efi/memmap.c | 12 +++++++++++- drivers/firmware/efi/memmap.c | 9 --------- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 01c0410ae97a..7c675832712d 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -386,7 +386,6 @@ extern int __init efi_memmap_alloc(unsigned int num_entries, struct efi_memory_map_data *data); extern void __efi_memmap_free(u64 phys, unsigned long size, unsigned long flags); -#define __efi_memmap_free __efi_memmap_free extern int __init efi_memmap_install(struct efi_memory_map_data *data); extern int __init efi_memmap_split_count(efi_memory_desc_t *md, diff --git a/arch/x86/platform/efi/memmap.c b/arch/x86/platform/efi/memmap.c index 241464b6dd03..872d310c426e 100644 --- a/arch/x86/platform/efi/memmap.c +++ b/arch/x86/platform/efi/memmap.c @@ -92,12 +92,22 @@ int __init efi_memmap_alloc(unsigned int num_entries, */ int __init efi_memmap_install(struct efi_memory_map_data *data) { + unsigned long size = efi.memmap.desc_size * efi.memmap.nr_map; + unsigned long flags = efi.memmap.flags; + u64 phys = efi.memmap.phys_map; + int ret; + efi_memmap_unmap(); if (efi_enabled(EFI_PARAVIRT)) return 0; - return __efi_memmap_init(data); + ret = __efi_memmap_init(data); + if (ret) + return ret; + + __efi_memmap_free(phys, size, flags); + return 0; } /** diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index a1180461a445..77dd20f9df31 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -15,10 +15,6 @@ #include #include -#ifndef __efi_memmap_free -#define __efi_memmap_free(phys, size, flags) do { } while (0) -#endif - /** * __efi_memmap_init - Common code for mapping the EFI memory map * @data: EFI memory map data @@ -51,11 +47,6 @@ int __init __efi_memmap_init(struct efi_memory_map_data *data) return -ENOMEM; } - if (efi.memmap.flags & (EFI_MEMMAP_MEMBLOCK | EFI_MEMMAP_SLAB)) - __efi_memmap_free(efi.memmap.phys_map, - efi.memmap.desc_size * efi.memmap.nr_map, - efi.memmap.flags); - map.phys_map = data->phys_map; map.nr_map = data->size / data->desc_size; map.map_end = map.map + data->size; From b63d015b7ae9282d50e203370178837f6f19065a Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 13 Jul 2023 08:06:57 +0100 Subject: [PATCH 1349/1565] KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption commit b321c31c9b7b309dcde5e8854b741c8e6a9a05f0 upstream. Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and *not* request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen Suggested-by: Zenghui Yu Tested-by: Xiang Chen Co-developed-by: Oliver Upton Signed-off-by: Marc Zyngier Acked-by: Zenghui Yu Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton [ modified to wrangle the vCPU flags directly instead of going through the flag helper macros as they have not yet been introduced. Also doing the flag wranging in the kvm_arch_vcpu_{un}blocking() hooks as the introduction of kvm_vcpu_wfi has not yet happened. See: 6109c5a6ab7f ("KVM: arm64: Move vGIC v4 handling for WFI out arch callback hook") ] Signed-off-by: James Gowans Acked-by: Marc Zyngier Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/arm.c | 6 ++++-- arch/arm64/kvm/vgic/vgic-v3.c | 2 +- arch/arm64/kvm/vgic/vgic-v4.c | 8 ++++++-- include/kvm/arm_vgic.h | 2 +- 5 files changed, 13 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 912b83e784bb..48ee1fe3aca4 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -410,6 +410,7 @@ struct kvm_vcpu_arch { #define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */ #define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */ #define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */ +#define KVM_ARM64_VCPU_IN_WFI (1 << 8) /* WFI instruction trapped */ #define vcpu_has_sve(vcpu) (system_supports_sve() && \ ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE)) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 4d63fcd7574b..afe8be2fef88 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -332,13 +332,15 @@ void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) */ preempt_disable(); kvm_vgic_vmcr_sync(vcpu); - vgic_v4_put(vcpu, true); + vcpu->arch.flags |= KVM_ARM64_VCPU_IN_WFI; + vgic_v4_put(vcpu); preempt_enable(); } void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) { preempt_disable(); + vcpu->arch.flags &= ~KVM_ARM64_VCPU_IN_WFI; vgic_v4_load(vcpu); preempt_enable(); } @@ -649,7 +651,7 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_RELOAD_GICv4, vcpu)) { /* The distributor enable bits were changed */ preempt_disable(); - vgic_v4_put(vcpu, false); + vgic_v4_put(vcpu); vgic_v4_load(vcpu); preempt_enable(); } diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 9cdf39a94a63..29c12bf9601a 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -682,7 +682,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu) { struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3; - WARN_ON(vgic_v4_put(vcpu, false)); + WARN_ON(vgic_v4_put(vcpu)); vgic_v3_vmcr_sync(vcpu); diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c index b5fa73c9fd35..cdfaaeabbb7d 100644 --- a/arch/arm64/kvm/vgic/vgic-v4.c +++ b/arch/arm64/kvm/vgic/vgic-v4.c @@ -310,14 +310,15 @@ void vgic_v4_teardown(struct kvm *kvm) its_vm->vpes = NULL; } -int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db) +int vgic_v4_put(struct kvm_vcpu *vcpu) { struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe; if (!vgic_supports_direct_msis(vcpu->kvm) || !vpe->resident) return 0; - return its_make_vpe_non_resident(vpe, need_db); + return its_make_vpe_non_resident(vpe, + vcpu->arch.flags & KVM_ARM64_VCPU_IN_WFI); } int vgic_v4_load(struct kvm_vcpu *vcpu) @@ -328,6 +329,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu) if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident) return 0; + if (vcpu->arch.flags & KVM_ARM64_VCPU_IN_WFI) + return 0; + /* * Before making the VPE resident, make sure the redistributor * corresponding to our current CPU expects us here. See the diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index a8d8fdcd3723..92348c085c0c 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -402,6 +402,6 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq, struct kvm_kernel_irq_routing_entry *irq_entry); int vgic_v4_load(struct kvm_vcpu *vcpu); -int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db); +int vgic_v4_put(struct kvm_vcpu *vcpu); #endif /* __KVM_ARM_VGIC_H */ From e9918954e370cbe764c80dce89c402e9e792b276 Mon Sep 17 00:00:00 2001 From: Johan Jonker Date: Thu, 13 Jun 2024 20:08:10 +0200 Subject: [PATCH 1350/1565] ARM: dts: rockchip: rk3066a: add #sound-dai-cells to hdmi node [ Upstream commit cca46f811d0000c1522a5e18ea48c27a15e45c05 ] '#sound-dai-cells' is required to properly interpret the list of DAI specified in the 'sound-dai' property, so add them to the 'hdmi' node for 'rk3066a.dtsi'. Fixes: fadc78062477 ("ARM: dts: rockchip: add rk3066 hdmi nodes") Signed-off-by: Johan Jonker Link: https://lore.kernel.org/r/8b229dcc-94e4-4bbc-9efc-9d5ddd694532@gmail.com Signed-off-by: Heiko Stuebner Signed-off-by: Sasha Levin --- arch/arm/boot/dts/rk3066a.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/rk3066a.dtsi b/arch/arm/boot/dts/rk3066a.dtsi index bbc3bff50856..75ea8e03bef0 100644 --- a/arch/arm/boot/dts/rk3066a.dtsi +++ b/arch/arm/boot/dts/rk3066a.dtsi @@ -124,6 +124,7 @@ hdmi: hdmi@10116000 { pinctrl-0 = <&hdmii2c_xfer>, <&hdmi_hpd>; power-domains = <&power RK3066_PD_VIO>; rockchip,grf = <&grf>; + #sound-dai-cells = <0>; status = "disabled"; ports { From 4686892f615a005133794fc79bb6cef1c156df69 Mon Sep 17 00:00:00 2001 From: Alex Bee Date: Sun, 23 Jun 2024 11:01:15 +0200 Subject: [PATCH 1351/1565] arm64: dts: rockchip: Add sound-dai-cells for RK3368 [ Upstream commit 8d7ec44aa5d1eb94a30319074762a1740440cdc8 ] Add the missing #sound-dai-cells for RK3368's I2S and S/PDIF controllers. Fixes: f7d89dfe1e31 ("arm64: dts: rockchip: add i2s nodes support for RK3368 SoCs") Fixes: 0328d68ea76d ("arm64: dts: rockchip: add rk3368 spdif node") Signed-off-by: Alex Bee Link: https://lore.kernel.org/r/20240623090116.670607-4-knaerzche@gmail.com Signed-off-by: Heiko Stuebner Signed-off-by: Sasha Levin --- arch/arm64/boot/dts/rockchip/rk3368.dtsi | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3368.dtsi b/arch/arm64/boot/dts/rockchip/rk3368.dtsi index 3746f23dc3df..bc4df59df3f1 100644 --- a/arch/arm64/boot/dts/rockchip/rk3368.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3368.dtsi @@ -687,6 +687,7 @@ spdif: spdif@ff880000 { dma-names = "tx"; pinctrl-names = "default"; pinctrl-0 = <&spdif_tx>; + #sound-dai-cells = <0>; status = "disabled"; }; @@ -698,6 +699,7 @@ i2s_2ch: i2s-2ch@ff890000 { clocks = <&cru SCLK_I2S_2CH>, <&cru HCLK_I2S_2CH>; dmas = <&dmac_bus 6>, <&dmac_bus 7>; dma-names = "tx", "rx"; + #sound-dai-cells = <0>; status = "disabled"; }; @@ -711,6 +713,7 @@ i2s_8ch: i2s-8ch@ff898000 { dma-names = "tx", "rx"; pinctrl-names = "default"; pinctrl-0 = <&i2s_8ch_bus>; + #sound-dai-cells = <0>; status = "disabled"; }; From 1bd2dc77029458b32987fe2e4a30d78cc10a55e8 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 9 Mar 2022 23:13:45 +0100 Subject: [PATCH 1352/1565] xdp: xdp_mem_allocator can be NULL in trace_mem_connect(). MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e0ae713023a9d09d6e1b454bdc8e8c1dd32c586e upstream. Since the commit mentioned below __xdp_reg_mem_model() can return a NULL pointer. This pointer is dereferenced in trace_mem_connect() which leads to segfault. The trace points (mem_connect + mem_disconnect) were put in place to pair connect/disconnect using the IDs. The ID is only assigned if __xdp_reg_mem_model() does not return NULL. That connect trace point is of no use if there is no ID. Skip that connect trace point if xdp_alloc is NULL. [ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace point ] Fixes: 4a48ef70b93b8 ("xdp: Allow registering memory model without rxq reference") Signed-off-by: Sebastian Andrzej Siewior Acked-by: Toke Høiland-Jørgensen Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/core/xdp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/xdp.c b/net/core/xdp.c index eb5045924efa..fd98d6059007 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -356,7 +356,8 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, if (IS_ERR(xdp_alloc)) return PTR_ERR(xdp_alloc); - trace_mem_connect(xdp_alloc, xdp_rxq); + if (trace_mem_connect_enabled() && xdp_alloc) + trace_mem_connect(xdp_alloc, xdp_rxq); return 0; } From 0a95f0f6d64eb9e80bb6008626f68182c32d3976 Mon Sep 17 00:00:00 2001 From: Udit Kumar Date: Tue, 25 Jun 2024 21:37:25 +0530 Subject: [PATCH 1353/1565] serial: 8250_omap: Fix Errata i2310 with RX FIFO level check commit c128a1b0523b685c8856ddc0ac0e1caef1fdeee5 upstream. Errata i2310[0] says, Erroneous timeout can be triggered, if this Erroneous interrupt is not cleared then it may leads to storm of interrupts. Commit 9d141c1e6157 ("serial: 8250_omap: Implementation of Errata i2310") which added the workaround but missed ensuring RX FIFO is really empty before applying the errata workaround as recommended in the errata text. Fix this by adding back check for UART_OMAP_RX_LVL to be 0 for workaround to take effect. [0] https://www.ti.com/lit/pdf/sprz536 page 23 Fixes: 9d141c1e6157 ("serial: 8250_omap: Implementation of Errata i2310") Cc: stable@vger.kernel.org Reported-by: Vignesh Raghavendra Closes: https://lore.kernel.org/all/e96d0c55-0b12-4cbf-9d23-48963543de49@ti.com/ Signed-off-by: Udit Kumar Link: https://lore.kernel.org/r/20240625160725.2102194-1-u-kumar1@ti.com Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_omap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c index 955642e90ede..ff461d0a9acc 100644 --- a/drivers/tty/serial/8250/8250_omap.c +++ b/drivers/tty/serial/8250/8250_omap.c @@ -656,7 +656,8 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id) * https://www.ti.com/lit/pdf/sprz536 */ if (priv->habit & UART_RX_TIMEOUT_QUIRK && - (iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT) { + (iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT && + serial_port_in(port, UART_OMAP_RX_LVL) == 0) { unsigned char efr2, timeout_h, timeout_l; efr2 = serial_in(up, UART_OMAP_EFR2); From ec3adc2af0f119260d4721036cffe5c600734c1e Mon Sep 17 00:00:00 2001 From: Yunseong Kim Date: Tue, 25 Jun 2024 02:33:23 +0900 Subject: [PATCH 1354/1565] tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit bab4923132feb3e439ae45962979c5d9d5c7c1f1 upstream. In the TRACE_EVENT(qdisc_reset) NULL dereference occurred from qdisc->dev_queue->dev ->name This situation simulated from bunch of veths and Bluetooth disconnection and reconnection. During qdisc initialization, qdisc was being set to noop_queue. In veth_init_queue, the initial tx_num was reduced back to one, causing the qdisc reset to be called with noop, which led to the kernel panic. I've attached the GitHub gist link that C converted syz-execprogram source code and 3 log of reproduced vmcore-dmesg. https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740 Yeoreum and I use two fuzzing tool simultaneously. One process with syz-executor : https://github.com/google/syzkaller $ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \ -enable=none -collide=false log1 The other process with perf fuzzer: https://github.com/deater/perf_event_tests/tree/master/fuzzer $ perf_event_tests/fuzzer/perf_fuzzer I think this will happen on the kernel version. Linux kernel version +v6.7.10, +v6.8, +v6.9 and it could happen in v6.10. This occurred from 51270d573a8d. I think this patch is absolutely necessary. Previously, It was showing not intended string value of name. I've reproduced 3 time from my fedora 40 Debug Kernel with any other module or patched. version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug [ 5287.164555] veth0_vlan: left promiscuous mode [ 5287.164929] veth1_macvtap: left promiscuous mode [ 5287.164950] veth0_macvtap: left promiscuous mode [ 5287.164983] veth1_vlan: left promiscuous mode [ 5287.165008] veth0_vlan: left promiscuous mode [ 5287.165450] veth1_macvtap: left promiscuous mode [ 5287.165472] veth0_macvtap: left promiscuous mode [ 5287.165502] veth1_vlan: left promiscuous mode … [ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state [ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state [ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state [ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state [ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state [ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state [ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0 [ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state … [ 5298.002798] bridge_slave_0: left promiscuous mode [ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state [ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface [ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface [ 5298.320207] bond0 (unregistering): Released all slaves [ 5298.354296] hsr_slave_0: left promiscuous mode [ 5298.360750] hsr_slave_1: left promiscuous mode [ 5298.374889] veth1_macvtap: left promiscuous mode [ 5298.374931] veth0_macvtap: left promiscuous mode [ 5298.374988] veth1_vlan: left promiscuous mode [ 5298.375024] veth0_vlan: left promiscuous mode [ 5299.109741] team0 (unregistering): Port device team_slave_1 removed [ 5299.185870] team0 (unregistering): Port device team_slave_0 removed … [ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1 [ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9 [ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9 …. [ 5301.075531] team0: Port device team_slave_1 added [ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state [ 5301.085588] bridge_slave_0: entered allmulticast mode [ 5301.085800] bridge_slave_0: entered promiscuous mode [ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state [ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state … [ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.193481] hsr_slave_0: entered promiscuous mode [ 5301.204425] hsr_slave_1: entered promiscuous mode [ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.210185] Cannot create hsr debugfs directory [ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link [ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link [ 5301.255934] team0: Port device team_slave_0 added [ 5301.256480] team0: Port device team_slave_1 added [ 5301.256948] team0: Port device team_slave_0 added … [ 5301.435928] hsr_slave_0: entered promiscuous mode [ 5301.446029] hsr_slave_1: entered promiscuous mode [ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.455884] Cannot create hsr debugfs directory [ 5301.502664] hsr_slave_0: entered promiscuous mode [ 5301.513675] hsr_slave_1: entered promiscuous mode [ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.526164] Cannot create hsr debugfs directory [ 5301.563662] hsr_slave_0: entered promiscuous mode [ 5301.576129] hsr_slave_1: entered promiscuous mode [ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present! [ 5301.580270] Cannot create hsr debugfs directory [ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0 [ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137] [ 5301.595877] Mem abort info: [ 5301.595881] ESR = 0x0000000096000006 [ 5301.595885] EC = 0x25: DABT (current EL), IL = 32 bits [ 5301.595889] SET = 0, FnV = 0 [ 5301.595893] EA = 0, S1PTW = 0 [ 5301.595896] FSC = 0x06: level 2 translation fault [ 5301.595900] Data abort info: [ 5301.595903] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 5301.595907] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 5301.595911] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 5301.595915] [dfff800000000026] address between user and kernel address ranges [ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP … [ 5301.596076] CPU: 2 PID: 102769 Comm: syz-executor.3 Kdump: loaded Tainted: G W ------- --- 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1 [ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023 [ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 5301.596085] pc : strnlen+0x40/0x88 [ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596124] sp : ffff8000beef6b40 [ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001 [ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00 [ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140 [ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8 [ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006 [ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5 [ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec [ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200 [ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1 [ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130 [ 5301.596166] Call trace: [ 5301.596175] strnlen+0x40/0x88 [ 5301.596179] trace_event_get_offsets_qdisc_reset+0x6c/0x2b0 [ 5301.596182] perf_trace_qdisc_reset+0xb0/0x538 [ 5301.596184] __traceiter_qdisc_reset+0x68/0xc0 [ 5301.596188] qdisc_reset+0x43c/0x5e8 [ 5301.596190] netif_set_real_num_tx_queues+0x288/0x770 [ 5301.596194] veth_init_queues+0xfc/0x130 [veth] [ 5301.596198] veth_newlink+0x45c/0x850 [veth] [ 5301.596202] rtnl_newlink_create+0x2c8/0x798 [ 5301.596205] __rtnl_newlink+0x92c/0xb60 [ 5301.596208] rtnl_newlink+0xd8/0x130 [ 5301.596211] rtnetlink_rcv_msg+0x2e0/0x890 [ 5301.596214] netlink_rcv_skb+0x1c4/0x380 [ 5301.596225] rtnetlink_rcv+0x20/0x38 [ 5301.596227] netlink_unicast+0x3c8/0x640 [ 5301.596231] netlink_sendmsg+0x658/0xa60 [ 5301.596234] __sock_sendmsg+0xd0/0x180 [ 5301.596243] __sys_sendto+0x1c0/0x280 [ 5301.596246] __arm64_sys_sendto+0xc8/0x150 [ 5301.596249] invoke_syscall+0xdc/0x268 [ 5301.596256] el0_svc_common.constprop.0+0x16c/0x240 [ 5301.596259] do_el0_svc+0x48/0x68 [ 5301.596261] el0_svc+0x50/0x188 [ 5301.596265] el0t_64_sync_handler+0x120/0x130 [ 5301.596268] el0t_64_sync+0x194/0x198 [ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842) [ 5301.596285] SMP: stopping secondary CPUs [ 5301.597053] Starting crashdump kernel... [ 5301.597057] Bye! After applying our patch, I didn't find any kernel panic errors. We've found a simple reproducer # echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable # ip link add veth0 type veth peer name veth1 Error: Unknown device type. However, without our patch applied, I tested upstream 6.10.0-rc3 kernel using the qdisc_reset event and the ip command on my qemu virtual machine. This 2 commands makes always kernel panic. Linux version: 6.10.0-rc3 [ 0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty (paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024 Kernel panic message: [ 615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 615.237250] Dumping ftrace buffer: [ 615.237679] (ftrace buffer empty) [ 615.238097] Modules linked in: veth crct10dif_ce virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6 Jun 22 02:36:5[3 6k152.62-4sm98k4-0k]v kCePUr:n e1l :P IUDn:a b4le6 8t oC ohmma: nidpl eN oketr nteali nptaedg i6n.g1 0re.0q-urecs3t- 0at0 1v6i4r-tgu4a4le fa2d0dbraeeds0se-dir tyd f#f2f08 615.252376] Hardware name: linux,dummy-virt (DT) [ 615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 615.254433] pc : strnlen+0x6c/0xe0 [ 615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.256088] sp : ffff800080b269a0 [ 615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27: 0000000000000001 [ 615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24: ffffc070f619cf60 [ 615.259020] x23: 0000000000000128 x22: 0000000000000138 x21: dfff800000000000 [ 615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18: ffffc070f448b800 [ 615.261454] x17: 0000000000000000 x16: 0000000000000001 x15: ffffc070f4ba2a90 [ 615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12: 1ffff00010164d72 [ 615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 : ffffc070e85d6184 [ 615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 : 000000001504a6d3 [ 615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 : 0000000000000000 [ 615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 : 0000000000000000 [ 615.268747] Call trace: [ 615.269180] strnlen+0x6c/0xe0 [ 615.269767] trace_event_get_offsets_qdisc_reset+0x94/0x3d0 [ 615.270716] trace_event_raw_event_qdisc_reset+0xe8/0x4e8 [ 615.271667] __traceiter_qdisc_reset+0xa0/0x140 [ 615.272499] qdisc_reset+0x554/0x848 [ 615.273134] netif_set_real_num_tx_queues+0x360/0x9a8 [ 615.274050] veth_init_queues+0x110/0x220 [veth] [ 615.275110] veth_newlink+0x538/0xa50 [veth] [ 615.276172] __rtnl_newlink+0x11e4/0x1bc8 [ 615.276944] rtnl_newlink+0xac/0x120 [ 615.277657] rtnetlink_rcv_msg+0x4e4/0x1370 [ 615.278409] netlink_rcv_skb+0x25c/0x4f0 [ 615.279122] rtnetlink_rcv+0x48/0x70 [ 615.279769] netlink_unicast+0x5a8/0x7b8 [ 615.280462] netlink_sendmsg+0xa70/0x1190 Yeoreum and I don't know if the patch we wrote will fix the underlying cause, but we think that priority is to prevent kernel panic happening. So, we're sending this patch. Fixes: 51270d573a8d ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string") Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/ Cc: netdev@vger.kernel.org Tested-by: Yunseong Kim Signed-off-by: Yunseong Kim Signed-off-by: Yeoreum Yun Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com Signed-off-by: Paolo Abeni Signed-off-by: Greg Kroah-Hartman --- include/trace/events/qdisc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h index a50df41634c5..980cb7a74e35 100644 --- a/include/trace/events/qdisc.h +++ b/include/trace/events/qdisc.h @@ -78,14 +78,14 @@ TRACE_EVENT(qdisc_destroy, TP_ARGS(q), TP_STRUCT__entry( - __string( dev, qdisc_dev(q)->name ) + __string( dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)" ) __string( kind, q->ops->id ) __field( u32, parent ) __field( u32, handle ) ), TP_fast_assign( - __assign_str(dev, qdisc_dev(q)->name); + __assign_str(dev, qdisc_dev(q) ? qdisc_dev(q)->name : "(null)"); __assign_str(kind, q->ops->id); __entry->parent = q->parent; __entry->handle = q->handle; From 6ab8b697d7d1ff0a127a7774489311d878dad04c Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 5 Jul 2024 09:12:57 +0200 Subject: [PATCH 1355/1565] Linux 5.10.221 Link: https://lore.kernel.org/r/20240703102904.170852981@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Pavel Machek (CIP) Link: https://lore.kernel.org/r/20240704094505.095988824@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Florian Fainelli Tested-by: Linux Kernel Functional Testing Tested-by: Salvatore Bonaccorso Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 9304408d8ace..b0e22161cd55 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 220 +SUBLEVEL = 221 EXTRAVERSION = NAME = Dare mighty things From 054258ff896ea6f63f2ca32b476a14bdcb553b86 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 1 Jun 2024 09:33:51 +0000 Subject: [PATCH 1356/1565] ANDROID: ABI fixup for abi break in struct dst_ops In commit 92f1655aa2b2 ("net: fix __dst_negative_advice() race") the struct dst_ops callback negative_advice is callback changes function parameters. But as this pointer is part of a structure that is tracked in the ABI checker, the tool triggers when this is changed. However, the callback pointer is internal to the networking stack, so changing the function type is safe, so needing to preserve this is not required. To do so, switch the function pointer type back to the old one so that the checking tools pass, AND then do a hard cast of the function pointer to the new type when assigning and calling the function. Bug: 343727534 Fixes: 92f1655aa2b2 ("net: fix __dst_negative_advice() race") Change-Id: I48d4ab4bbd29f8edc8fbd7923828b7f78a23e12e Signed-off-by: Greg Kroah-Hartman --- include/net/dst_ops.h | 12 +++++++++++- include/net/sock.h | 12 ++++++++++-- net/ipv4/route.c | 2 +- net/ipv6/route.c | 2 +- net/xfrm/xfrm_policy.c | 2 +- 5 files changed, 24 insertions(+), 6 deletions(-) diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h index dd7c0b37da38..382af5f36e7f 100644 --- a/include/net/dst_ops.h +++ b/include/net/dst_ops.h @@ -12,6 +12,16 @@ struct sk_buff; struct sock; struct net; +/* *** ANDROID FIXUP *** + * These typedefs are used to help fixup the ABI break caused by commit + * 92f1655aa2b2 ("net: fix __dst_negative_advice() race") where the + * negative_advice callback changed function signatures. + * See b/343727534 for more details. + * *** ANDROID FIXUP *** + */ +typedef void (*android_dst_ops_negative_advice_new_t)(struct sock *sk, struct dst_entry *); +typedef struct dst_entry * (*android_dst_ops_negative_advice_old_t)(struct dst_entry *); + struct dst_ops { unsigned short family; unsigned int gc_thresh; @@ -24,7 +34,7 @@ struct dst_ops { void (*destroy)(struct dst_entry *); void (*ifdown)(struct dst_entry *, struct net_device *dev, int how); - void (*negative_advice)(struct sock *sk, struct dst_entry *); + struct dst_entry * (*negative_advice)(struct dst_entry *); void (*link_failure)(struct sk_buff *); void (*update_pmtu)(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, diff --git a/include/net/sock.h b/include/net/sock.h index 5abdd262b79d..e995fa74b813 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2027,10 +2027,18 @@ sk_dst_get(struct sock *sk) static inline void __dst_negative_advice(struct sock *sk) { + /* *** ANDROID FIXUP *** + * See b/343727534 for more details why this typedef is needed here. + * *** ANDROID FIXUP *** + */ + android_dst_ops_negative_advice_new_t negative_advice; + struct dst_entry *dst = __sk_dst_get(sk); - if (dst && dst->ops->negative_advice) - dst->ops->negative_advice(sk, dst); + if (dst && dst->ops->negative_advice) { + negative_advice = (android_dst_ops_negative_advice_new_t)dst->ops->negative_advice; + negative_advice(sk, dst); + } } static inline void dst_negative_advice(struct sock *sk) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index b5acc9723761..a8fbbd85b694 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -165,7 +165,7 @@ static struct dst_ops ipv4_dst_ops = { .mtu = ipv4_mtu, .cow_metrics = ipv4_cow_metrics, .destroy = ipv4_dst_destroy, - .negative_advice = ipv4_negative_advice, + .negative_advice = (android_dst_ops_negative_advice_old_t)ipv4_negative_advice, .link_failure = ipv4_link_failure, .update_pmtu = ip_rt_update_pmtu, .redirect = ip_do_redirect, diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 2a26b7491b4b..c96a0a9b331f 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -251,7 +251,7 @@ static struct dst_ops ip6_dst_ops_template = { .cow_metrics = dst_cow_metrics_generic, .destroy = ip6_dst_destroy, .ifdown = ip6_dst_ifdown, - .negative_advice = ip6_negative_advice, + .negative_advice = (android_dst_ops_negative_advice_old_t)ip6_negative_advice, .link_failure = ip6_link_failure, .update_pmtu = ip6_rt_update_pmtu, .redirect = rt6_do_redirect, diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 1b078dc28d39..ac18cf1a03ec 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3975,7 +3975,7 @@ int xfrm_policy_register_afinfo(const struct xfrm_policy_afinfo *afinfo, int fam if (likely(dst_ops->mtu == NULL)) dst_ops->mtu = xfrm_mtu; if (likely(dst_ops->negative_advice == NULL)) - dst_ops->negative_advice = xfrm_negative_advice; + dst_ops->negative_advice = (android_dst_ops_negative_advice_old_t)xfrm_negative_advice; if (likely(dst_ops->link_failure == NULL)) dst_ops->link_failure = xfrm_link_failure; if (likely(dst_ops->neigh_lookup == NULL)) From 0546f6a05d674fcac79fc3167b85346cf19cff38 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 14:30:51 +0000 Subject: [PATCH 1357/1565] Revert "media: cec: core: add adap_nb_transmit_canceled() callback" This reverts commit 0cf6693d3f8ed5b733c6df1989fc43cfc07bce05 which is commit da53c36ddd3f118a525a04faa8c47ca471e6c467 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I1ea895e4c9b1dc05aa46673722aa81ede779df65 Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 4 ++-- include/media/cec.h | 6 ++---- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 73cc2b0fe185..a5853c82d5e4 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -397,8 +397,8 @@ static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) cec_queue_msg_monitor(adap, &data->msg, 1); if (!data->blocking && data->msg.sequence) - /* Allow drivers to react to a canceled transmit */ - call_void_op(adap, adap_nb_transmit_canceled, &data->msg); + /* Allow drivers to process the message first */ + call_op(adap, received, &data->msg); cec_data_completed(data); } diff --git a/include/media/cec.h b/include/media/cec.h index 38eb9334d854..23202bf439b4 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -120,16 +120,14 @@ struct cec_adap_ops { int (*adap_log_addr)(struct cec_adapter *adap, u8 logical_addr); int (*adap_transmit)(struct cec_adapter *adap, u8 attempts, u32 signal_free_time, struct cec_msg *msg); - void (*adap_nb_transmit_canceled)(struct cec_adapter *adap, - const struct cec_msg *msg); void (*adap_status)(struct cec_adapter *adap, struct seq_file *file); void (*adap_free)(struct cec_adapter *adap); - /* Error injection callbacks, called without adap->lock held */ + /* Error injection callbacks */ int (*error_inj_show)(struct cec_adapter *adap, struct seq_file *sf); bool (*error_inj_parse_line)(struct cec_adapter *adap, char *line); - /* High-level CEC message callback, called without adap->lock held */ + /* High-level CEC message callback */ int (*received)(struct cec_adapter *adap, struct cec_msg *msg); }; From 8047831dc66a58889cd1edb50f1793c682007a2d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:17:39 +0000 Subject: [PATCH 1358/1565] Revert "media: cec: core: avoid recursive cec_claim_log_addrs" This reverts commit 5103090f4e55252cb1b1a6084da53d63434dc24b which is commit 47c82aac10a6954d68f29f10d9758d016e8e5af1 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I58fa015fd62c79ab2bdfb07bdb4cd1ca10e8e4c4 Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 6 +----- drivers/media/cec/core/cec-api.c | 2 +- include/media/cec.h | 1 - 3 files changed, 2 insertions(+), 7 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index a5853c82d5e4..09f242bc0783 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1541,12 +1541,9 @@ static int cec_config_thread_func(void *arg) */ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) { - if (WARN_ON(adap->is_claiming_log_addrs || - adap->is_configuring || adap->is_configured)) + if (WARN_ON(adap->is_configuring || adap->is_configured)) return; - adap->is_claiming_log_addrs = true; - init_completion(&adap->config_completion); /* Ready to kick off the thread */ @@ -1561,7 +1558,6 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) wait_for_completion(&adap->config_completion); mutex_lock(&adap->lock); } - adap->is_claiming_log_addrs = false; } /* diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 8bdf58abdf96..feaf100a44c2 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -178,7 +178,7 @@ static long cec_adap_s_log_addrs(struct cec_adapter *adap, struct cec_fh *fh, CEC_LOG_ADDRS_FL_ALLOW_RC_PASSTHRU | CEC_LOG_ADDRS_FL_CDC_ONLY; mutex_lock(&adap->lock); - if (!adap->is_claiming_log_addrs && !adap->is_configuring && + if (!adap->is_configuring && (!log_addrs.num_log_addrs || !adap->is_configured) && !cec_is_busy(adap, fh)) { err = __cec_s_log_addrs(adap, &log_addrs, block); diff --git a/include/media/cec.h b/include/media/cec.h index 23202bf439b4..df3e8738d512 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -239,7 +239,6 @@ struct cec_adapter { u16 phys_addr; bool needs_hpd; bool is_enabled; - bool is_claiming_log_addrs; bool is_configuring; bool is_configured; bool cec_pin_is_high; From c0342019d8cfdd38fecdd803ee85386aa103a007 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 16 Jul 2024 08:19:55 +0000 Subject: [PATCH 1359/1565] Revert "media: cec: core: avoid confusing "transmit timed out" message" This reverts commit 3166b2dffaee89b8c68f55d2c825bb3163a3ea55 which is commit cbe499977bc36fedae89f0a0d7deb4ccde9798fe upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I5f48a6ef27c0de8d19ad979949724dcc39e37a3a Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 09f242bc0783..920c108b84aa 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -502,15 +502,6 @@ int cec_thread_func(void *_adap) goto unlock; } - if (adap->transmit_in_progress && - adap->transmit_in_progress_aborted) { - if (adap->transmitting) - cec_data_cancel(adap->transmitting, - CEC_TX_STATUS_ABORTED, 0); - adap->transmit_in_progress = false; - adap->transmit_in_progress_aborted = false; - goto unlock; - } if (adap->transmit_in_progress && timeout) { /* * If we timeout, then log that. Normally this does @@ -764,7 +755,6 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, { struct cec_data *data; bool is_raw = msg_is_raw(msg); - int err; if (adap->devnode.unregistered) return -ENODEV; @@ -927,13 +917,10 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, * Release the lock and wait, retake the lock afterwards. */ mutex_unlock(&adap->lock); - err = wait_for_completion_killable(&data->c); + wait_for_completion_killable(&data->c); cancel_delayed_work_sync(&data->work); mutex_lock(&adap->lock); - if (err) - adap->transmit_in_progress_aborted = true; - /* Cancel the transmit if it was interrupted */ if (!data->completed) { if (data->msg.tx_status & CEC_TX_STATUS_OK) From 590dc9d34f83ad30f104ed913402dadb845eec5c Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:18:37 +0000 Subject: [PATCH 1360/1565] Revert "media: cec-adap.c: drop activate_cnt, use state info instead" This reverts commit 3e938b7d40fb3920619c3a9909cdd7902bcebae8 which is commit f9222f8ca18bcb1d55dd749b493b29fd8092fb82 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I16b7093a0a644af04a0c62aed235653bbeaae070 Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 152 ++++++++++++++++++------------ include/media/cec.h | 4 +- 2 files changed, 95 insertions(+), 61 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 920c108b84aa..07ff4b0c8461 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1548,59 +1548,47 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) } /* - * Helper function to enable/disable the CEC adapter. + * Helper functions to enable/disable the CEC adapter. * - * This function is called with adap->lock held. + * These functions are called with adap->lock held. */ -static int cec_adap_enable(struct cec_adapter *adap) +static int cec_activate_cnt_inc(struct cec_adapter *adap) { - bool enable; - int ret = 0; + int ret; - enable = adap->monitor_all_cnt || adap->monitor_pin_cnt || - adap->log_addrs.num_log_addrs; - if (adap->needs_hpd) - enable = enable && adap->phys_addr != CEC_PHYS_ADDR_INVALID; - - if (enable == adap->is_enabled) + if (adap->activate_cnt++) return 0; /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); - if (enable) { - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - ret = adap->ops->adap_enable(adap, true); - if (!ret) { - /* - * Enable monitor-all/pin modes if needed. We warn, but - * continue if this fails as this is not a critical error. - */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, true)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); - } - } else { - /* Disable monitor-all/pin modes if needed (needs_hpd == 1) */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - WARN_ON(adap->ops->adap_enable(adap, false)); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - adap->transmit_in_progress_aborted = false; - if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); - } - if (!ret) - adap->is_enabled = enable; - wake_up_interruptible(&adap->kthread_waitq); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + ret = call_op(adap, adap_enable, true); + if (ret) + adap->activate_cnt--; mutex_unlock(&adap->devnode.lock); return ret; } +static void cec_activate_cnt_dec(struct cec_adapter *adap) +{ + if (WARN_ON(!adap->activate_cnt)) + return; + + if (--adap->activate_cnt) + return; + + /* serialize adap_enable */ + mutex_lock(&adap->devnode.lock); + WARN_ON(call_op(adap, adap_enable, false)); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + adap->transmit_in_progress_aborted = false; + if (adap->transmitting) + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); + mutex_unlock(&adap->devnode.lock); +} + /* Set a new physical address and send an event notifying userspace of this. * * This function is called with adap->lock held. @@ -1621,16 +1609,33 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) adap->phys_addr = CEC_PHYS_ADDR_INVALID; cec_post_state_event(adap); cec_adap_unconfigure(adap); - if (becomes_invalid) { - cec_adap_enable(adap); - return; + if (becomes_invalid && adap->needs_hpd) { + /* Disable monitor-all/pin modes if needed */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); + cec_activate_cnt_dec(adap); + wake_up_interruptible(&adap->kthread_waitq); } + if (becomes_invalid) + return; + } + + if (is_invalid && adap->needs_hpd) { + if (cec_activate_cnt_inc(adap)) + return; + /* + * Re-enable monitor-all/pin modes if needed. We warn, but + * continue if this fails as this is not a critical error. + */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, true)); + if (adap->monitor_pin_cnt) + WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); } adap->phys_addr = phys_addr; - if (is_invalid) - cec_adap_enable(adap); - cec_post_state_event(adap); if (adap->log_addrs.num_log_addrs) cec_claim_log_addrs(adap, block); @@ -1687,7 +1692,6 @@ int __cec_s_log_addrs(struct cec_adapter *adap, struct cec_log_addrs *log_addrs, bool block) { u16 type_mask = 0; - int err; int i; if (adap->devnode.unregistered) @@ -1703,7 +1707,8 @@ int __cec_s_log_addrs(struct cec_adapter *adap, adap->log_addrs.osd_name[0] = '\0'; adap->log_addrs.vendor_id = CEC_VENDOR_ID_NONE; adap->log_addrs.cec_version = CEC_OP_CEC_VERSION_2_0; - cec_adap_enable(adap); + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); return 0; } @@ -1837,12 +1842,17 @@ int __cec_s_log_addrs(struct cec_adapter *adap, sizeof(log_addrs->features[i])); } + if (!adap->needs_hpd && !adap->is_configuring && !adap->is_configured) { + int ret = cec_activate_cnt_inc(adap); + + if (ret) + return ret; + } log_addrs->log_addr_mask = adap->log_addrs.log_addr_mask; adap->log_addrs = *log_addrs; - err = cec_adap_enable(adap); - if (!err && adap->phys_addr != CEC_PHYS_ADDR_INVALID) + if (adap->phys_addr != CEC_PHYS_ADDR_INVALID) cec_claim_log_addrs(adap, block); - return err; + return 0; } int cec_s_log_addrs(struct cec_adapter *adap, @@ -2145,9 +2155,20 @@ int cec_monitor_all_cnt_inc(struct cec_adapter *adap) if (adap->monitor_all_cnt++) return 0; - ret = cec_adap_enable(adap); - if (ret) + if (!adap->needs_hpd) { + ret = cec_activate_cnt_inc(adap); + if (ret) { + adap->monitor_all_cnt--; + return ret; + } + } + + ret = call_op(adap, adap_monitor_all_enable, true); + if (ret) { adap->monitor_all_cnt--; + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); + } return ret; } @@ -2158,7 +2179,8 @@ void cec_monitor_all_cnt_dec(struct cec_adapter *adap) if (--adap->monitor_all_cnt) return; WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - cec_adap_enable(adap); + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); } /* @@ -2173,9 +2195,20 @@ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap) if (adap->monitor_pin_cnt++) return 0; - ret = cec_adap_enable(adap); - if (ret) + if (!adap->needs_hpd) { + ret = cec_activate_cnt_inc(adap); + if (ret) { + adap->monitor_pin_cnt--; + return ret; + } + } + + ret = call_op(adap, adap_monitor_pin_enable, true); + if (ret) { adap->monitor_pin_cnt--; + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); + } return ret; } @@ -2186,7 +2219,8 @@ void cec_monitor_pin_cnt_dec(struct cec_adapter *adap) if (--adap->monitor_pin_cnt) return; WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - cec_adap_enable(adap); + if (!adap->needs_hpd) + cec_activate_cnt_dec(adap); } #ifdef CONFIG_DEBUG_FS @@ -2200,7 +2234,7 @@ int cec_adap_status(struct seq_file *file, void *priv) struct cec_data *data; mutex_lock(&adap->lock); - seq_printf(file, "enabled: %d\n", adap->is_enabled); + seq_printf(file, "activation count: %u\n", adap->activate_cnt); seq_printf(file, "configured: %d\n", adap->is_configured); seq_printf(file, "configuring: %d\n", adap->is_configuring); seq_printf(file, "phys_addr: %x.%x.%x.%x\n", diff --git a/include/media/cec.h b/include/media/cec.h index df3e8738d512..31d704f36707 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -180,7 +180,6 @@ struct cec_adap_ops { * @needs_hpd: if true, then the HDMI HotPlug Detect pin must be high * in order to transmit or receive CEC messages. This is usually a HW * limitation. - * @is_enabled: the CEC adapter is enabled * @is_configuring: the CEC adapter is configuring (i.e. claiming LAs) * @is_configured: the CEC adapter is configured (i.e. has claimed LAs) * @cec_pin_is_high: if true then the CEC pin is high. Only used with the @@ -191,6 +190,7 @@ struct cec_adap_ops { * Drivers that need this can set this field to true after the * cec_allocate_adapter() call. * @last_initiator: the initiator of the last transmitted message. + * @activate_cnt: number of times that CEC is activated * @monitor_all_cnt: number of filehandles monitoring all msgs * @monitor_pin_cnt: number of filehandles monitoring pin changes * @follower_cnt: number of filehandles in follower mode @@ -238,12 +238,12 @@ struct cec_adapter { u16 phys_addr; bool needs_hpd; - bool is_enabled; bool is_configuring; bool is_configured; bool cec_pin_is_high; bool adap_controls_phys_addr; u8 last_initiator; + u32 activate_cnt; u32 monitor_all_cnt; u32 monitor_pin_cnt; u32 follower_cnt; From 802e36bc55c4c4f41168e324edc7f83cdbebdee3 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:18:58 +0000 Subject: [PATCH 1361/1565] Revert "media: cec: use call_op and check for !unregistered" This reverts commit 73ef9ae980edef7ceeb2aa6e9ef38bee1fd8afd8 which is commit e2ed5024ac2bc27d4bfc99fd58f5ab54de8fa965 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I6c14452de87e35943c042c3a388c976e59e587a9 Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 37 +++++++++++++++++---------- drivers/media/cec/core/cec-api.c | 6 ++--- drivers/media/cec/core/cec-core.c | 4 +-- drivers/media/cec/core/cec-pin-priv.h | 11 -------- drivers/media/cec/core/cec-pin.c | 23 +++++++++-------- drivers/media/cec/core/cec-priv.h | 10 -------- 6 files changed, 40 insertions(+), 51 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 07ff4b0c8461..41f4def8a31c 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -39,6 +39,15 @@ static void cec_fill_msg_report_features(struct cec_adapter *adap, */ #define CEC_XFER_TIMEOUT_MS (5 * 400 + 100) +#define call_op(adap, op, arg...) \ + (adap->ops->op ? adap->ops->op(adap, ## arg) : 0) + +#define call_void_op(adap, op, arg...) \ + do { \ + if (adap->ops->op) \ + adap->ops->op(adap, ## arg); \ + } while (0) + static int cec_log_addr2idx(const struct cec_adapter *adap, u8 log_addr) { int i; @@ -396,9 +405,9 @@ static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) /* Queue transmitted message for monitoring purposes */ cec_queue_msg_monitor(adap, &data->msg, 1); - if (!data->blocking && data->msg.sequence) + if (!data->blocking && data->msg.sequence && adap->ops->received) /* Allow drivers to process the message first */ - call_op(adap, received, &data->msg); + adap->ops->received(adap, &data->msg); cec_data_completed(data); } @@ -575,8 +584,8 @@ int cec_thread_func(void *_adap) adap->transmit_in_progress_aborted = false; /* Tell the adapter to transmit, cancel on error */ - if (call_op(adap, adap_transmit, data->attempts, - signal_free_time, &data->msg)) + if (adap->ops->adap_transmit(adap, data->attempts, + signal_free_time, &data->msg)) cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); else adap->transmit_in_progress = true; @@ -1302,7 +1311,7 @@ static int cec_config_log_addr(struct cec_adapter *adap, * Message not acknowledged, so this logical * address is free to use. */ - err = call_op(adap, adap_log_addr, log_addr); + err = adap->ops->adap_log_addr(adap, log_addr); if (err) return err; @@ -1319,8 +1328,9 @@ static int cec_config_log_addr(struct cec_adapter *adap, */ static void cec_adap_unconfigure(struct cec_adapter *adap) { - if (!adap->needs_hpd || adap->phys_addr != CEC_PHYS_ADDR_INVALID) - WARN_ON(call_op(adap, adap_log_addr, CEC_LOG_ADDR_INVALID)); + if (!adap->needs_hpd || + adap->phys_addr != CEC_PHYS_ADDR_INVALID) + WARN_ON(adap->ops->adap_log_addr(adap, CEC_LOG_ADDR_INVALID)); adap->log_addrs.log_addr_mask = 0; adap->is_configured = false; cec_flush(adap); @@ -1563,7 +1573,7 @@ static int cec_activate_cnt_inc(struct cec_adapter *adap) mutex_lock(&adap->devnode.lock); adap->last_initiator = 0xff; adap->transmit_in_progress = false; - ret = call_op(adap, adap_enable, true); + ret = adap->ops->adap_enable(adap, true); if (ret) adap->activate_cnt--; mutex_unlock(&adap->devnode.lock); @@ -1580,7 +1590,7 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); - WARN_ON(call_op(adap, adap_enable, false)); + WARN_ON(adap->ops->adap_enable(adap, false)); adap->last_initiator = 0xff; adap->transmit_in_progress = false; adap->transmit_in_progress_aborted = false; @@ -1954,10 +1964,11 @@ static int cec_receive_notify(struct cec_adapter *adap, struct cec_msg *msg, msg->msg[1] != CEC_MSG_CDC_MESSAGE) return 0; - /* Allow drivers to process the message first */ - if (adap->ops->received && !adap->devnode.unregistered && - adap->ops->received(adap, msg) != -ENOMSG) - return 0; + if (adap->ops->received) { + /* Allow drivers to process the message first */ + if (adap->ops->received(adap, msg) != -ENOMSG) + return 0; + } /* * REPORT_PHYSICAL_ADDR, CEC_MSG_USER_CONTROL_PRESSED and diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index feaf100a44c2..0be4e822211e 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -595,8 +595,7 @@ static int cec_open(struct inode *inode, struct file *filp) adap->conn_info.type != CEC_CONNECTOR_TYPE_NO_CONNECTOR; cec_queue_event_fh(fh, &ev, 0); #ifdef CONFIG_CEC_PIN - if (adap->pin && adap->pin->ops->read_hpd && - !adap->devnode.unregistered) { + if (adap->pin && adap->pin->ops->read_hpd) { err = adap->pin->ops->read_hpd(adap); if (err >= 0) { ev.event = err ? CEC_EVENT_PIN_HPD_HIGH : @@ -604,8 +603,7 @@ static int cec_open(struct inode *inode, struct file *filp) cec_queue_event_fh(fh, &ev, 0); } } - if (adap->pin && adap->pin->ops->read_5v && - !adap->devnode.unregistered) { + if (adap->pin && adap->pin->ops->read_5v) { err = adap->pin->ops->read_5v(adap); if (err >= 0) { ev.event = err ? CEC_EVENT_PIN_5V_HIGH : diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index 0a3345e1a0f3..61139b0a0869 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -204,7 +204,7 @@ static ssize_t cec_error_inj_write(struct file *file, line = strsep(&p, "\n"); if (!*line || *line == '#') continue; - if (!call_op(adap, error_inj_parse_line, line)) { + if (!adap->ops->error_inj_parse_line(adap, line)) { kfree(buf); return -EINVAL; } @@ -217,7 +217,7 @@ static int cec_error_inj_show(struct seq_file *sf, void *unused) { struct cec_adapter *adap = sf->private; - return call_op(adap, error_inj_show, sf); + return adap->ops->error_inj_show(adap, sf); } static int cec_error_inj_open(struct inode *inode, struct file *file) diff --git a/drivers/media/cec/core/cec-pin-priv.h b/drivers/media/cec/core/cec-pin-priv.h index 5577c62e4131..f423db8855d9 100644 --- a/drivers/media/cec/core/cec-pin-priv.h +++ b/drivers/media/cec/core/cec-pin-priv.h @@ -12,17 +12,6 @@ #include #include -#define call_pin_op(pin, op, arg...) \ - ((pin && pin->ops->op && !pin->adap->devnode.unregistered) ? \ - pin->ops->op(pin->adap, ## arg) : 0) - -#define call_void_pin_op(pin, op, arg...) \ - do { \ - if (pin && pin->ops->op && \ - !pin->adap->devnode.unregistered) \ - pin->ops->op(pin->adap, ## arg); \ - } while (0) - enum cec_pin_state { /* CEC is off */ CEC_ST_OFF, diff --git a/drivers/media/cec/core/cec-pin.c b/drivers/media/cec/core/cec-pin.c index 447fdada20e9..f8452a1f9fc6 100644 --- a/drivers/media/cec/core/cec-pin.c +++ b/drivers/media/cec/core/cec-pin.c @@ -135,7 +135,7 @@ static void cec_pin_update(struct cec_pin *pin, bool v, bool force) static bool cec_pin_read(struct cec_pin *pin) { - bool v = call_pin_op(pin, read); + bool v = pin->ops->read(pin->adap); cec_pin_update(pin, v, false); return v; @@ -143,13 +143,13 @@ static bool cec_pin_read(struct cec_pin *pin) static void cec_pin_low(struct cec_pin *pin) { - call_void_pin_op(pin, low); + pin->ops->low(pin->adap); cec_pin_update(pin, false, false); } static bool cec_pin_high(struct cec_pin *pin) { - call_void_pin_op(pin, high); + pin->ops->high(pin->adap); return cec_pin_read(pin); } @@ -1086,7 +1086,7 @@ static int cec_pin_thread_func(void *_adap) CEC_PIN_IRQ_UNCHANGED)) { case CEC_PIN_IRQ_DISABLE: if (irq_enabled) { - call_void_pin_op(pin, disable_irq); + pin->ops->disable_irq(adap); irq_enabled = false; } cec_pin_high(pin); @@ -1097,7 +1097,7 @@ static int cec_pin_thread_func(void *_adap) case CEC_PIN_IRQ_ENABLE: if (irq_enabled) break; - pin->enable_irq_failed = !call_pin_op(pin, enable_irq); + pin->enable_irq_failed = !pin->ops->enable_irq(adap); if (pin->enable_irq_failed) { cec_pin_to_idle(pin); hrtimer_start(&pin->timer, ns_to_ktime(0), @@ -1112,8 +1112,8 @@ static int cec_pin_thread_func(void *_adap) if (kthread_should_stop()) break; } - if (irq_enabled) - call_void_pin_op(pin, disable_irq); + if (pin->ops->disable_irq && irq_enabled) + pin->ops->disable_irq(adap); hrtimer_cancel(&pin->timer); cec_pin_read(pin); cec_pin_to_idle(pin); @@ -1208,7 +1208,7 @@ static void cec_pin_adap_status(struct cec_adapter *adap, seq_printf(file, "state: %s\n", states[pin->state].name); seq_printf(file, "tx_bit: %d\n", pin->tx_bit); seq_printf(file, "rx_bit: %d\n", pin->rx_bit); - seq_printf(file, "cec pin: %d\n", call_pin_op(pin, read)); + seq_printf(file, "cec pin: %d\n", pin->ops->read(adap)); seq_printf(file, "cec pin events dropped: %u\n", pin->work_pin_events_dropped_cnt); seq_printf(file, "irq failed: %d\n", pin->enable_irq_failed); @@ -1261,7 +1261,8 @@ static void cec_pin_adap_status(struct cec_adapter *adap, pin->rx_data_bit_too_long_cnt = 0; pin->rx_low_drive_cnt = 0; pin->tx_low_drive_cnt = 0; - call_void_pin_op(pin, status, file); + if (pin->ops->status) + pin->ops->status(adap, file); } static int cec_pin_adap_monitor_all_enable(struct cec_adapter *adap, @@ -1277,7 +1278,7 @@ static void cec_pin_adap_free(struct cec_adapter *adap) { struct cec_pin *pin = adap->pin; - if (pin && pin->ops->free) + if (pin->ops->free) pin->ops->free(adap); adap->pin = NULL; kfree(pin); @@ -1287,7 +1288,7 @@ static int cec_pin_received(struct cec_adapter *adap, struct cec_msg *msg) { struct cec_pin *pin = adap->pin; - if (pin->ops->received && !adap->devnode.unregistered) + if (pin->ops->received) return pin->ops->received(adap, msg); return -ENOMSG; } diff --git a/drivers/media/cec/core/cec-priv.h b/drivers/media/cec/core/cec-priv.h index b78df931aa74..9bbd05053d42 100644 --- a/drivers/media/cec/core/cec-priv.h +++ b/drivers/media/cec/core/cec-priv.h @@ -17,16 +17,6 @@ pr_info("cec-%s: " fmt, adap->name, ## arg); \ } while (0) -#define call_op(adap, op, arg...) \ - ((adap->ops->op && !adap->devnode.unregistered) ? \ - adap->ops->op(adap, ## arg) : 0) - -#define call_void_op(adap, op, arg...) \ - do { \ - if (adap->ops->op && !adap->devnode.unregistered) \ - adap->ops->op(adap, ## arg); \ - } while (0) - /* devnode to cec_adapter */ #define to_cec_adapter(node) container_of(node, struct cec_adapter, devnode) From baa6c4164b55b748923017a0ca63b732fcd86513 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:19:18 +0000 Subject: [PATCH 1362/1565] Revert "media: cec: correctly pass on reply results" This reverts commit 8fa7e4896fddb62dfe7262ebc0dca411c24de09f which is commit f9d0ecbf56f4b90745a6adc5b59281ad8f70ab54 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I46d225aebe60f71253ff07b1af5771553c11297d Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 48 ++++++++++++------------------- 1 file changed, 18 insertions(+), 30 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 41f4def8a31c..fd4af157f4ce 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -366,48 +366,38 @@ static void cec_data_completed(struct cec_data *data) /* * A pending CEC transmit needs to be cancelled, either because the CEC * adapter is disabled or the transmit takes an impossibly long time to - * finish, or the reply timed out. + * finish. * * This function is called with adap->lock held. */ -static void cec_data_cancel(struct cec_data *data, u8 tx_status, u8 rx_status) +static void cec_data_cancel(struct cec_data *data, u8 tx_status) { - struct cec_adapter *adap = data->adap; - /* * It's either the current transmit, or it is a pending * transmit. Take the appropriate action to clear it. */ - if (adap->transmitting == data) { - adap->transmitting = NULL; + if (data->adap->transmitting == data) { + data->adap->transmitting = NULL; } else { list_del_init(&data->list); if (!(data->msg.tx_status & CEC_TX_STATUS_OK)) - if (!WARN_ON(!adap->transmit_queue_sz)) - adap->transmit_queue_sz--; + if (!WARN_ON(!data->adap->transmit_queue_sz)) + data->adap->transmit_queue_sz--; } if (data->msg.tx_status & CEC_TX_STATUS_OK) { data->msg.rx_ts = ktime_get_ns(); - data->msg.rx_status = rx_status; - if (!data->blocking) - data->msg.tx_status = 0; + data->msg.rx_status = CEC_RX_STATUS_ABORTED; } else { data->msg.tx_ts = ktime_get_ns(); data->msg.tx_status |= tx_status | CEC_TX_STATUS_MAX_RETRIES; data->msg.tx_error_cnt++; data->attempts = 0; - if (!data->blocking) - data->msg.rx_status = 0; } /* Queue transmitted message for monitoring purposes */ - cec_queue_msg_monitor(adap, &data->msg, 1); - - if (!data->blocking && data->msg.sequence && adap->ops->received) - /* Allow drivers to process the message first */ - adap->ops->received(adap, &data->msg); + cec_queue_msg_monitor(data->adap, &data->msg, 1); cec_data_completed(data); } @@ -428,7 +418,7 @@ static void cec_flush(struct cec_adapter *adap) while (!list_empty(&adap->transmit_queue)) { data = list_first_entry(&adap->transmit_queue, struct cec_data, list); - cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED); } if (adap->transmitting) adap->transmit_in_progress_aborted = true; @@ -436,7 +426,7 @@ static void cec_flush(struct cec_adapter *adap) /* Cancel the pending timeout work. */ list_for_each_entry_safe(data, n, &adap->wait_queue, list) { if (cancel_delayed_work(&data->work)) - cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_ABORTED); + cec_data_cancel(data, CEC_TX_STATUS_OK); /* * If cancel_delayed_work returned false, then * the cec_wait_timeout function is running, @@ -526,7 +516,7 @@ int cec_thread_func(void *_adap) adap->transmitting->msg.msg); /* Just give up on this. */ cec_data_cancel(adap->transmitting, - CEC_TX_STATUS_TIMEOUT, 0); + CEC_TX_STATUS_TIMEOUT); } else { pr_warn("cec-%s: transmit timed out\n", adap->name); } @@ -586,7 +576,7 @@ int cec_thread_func(void *_adap) /* Tell the adapter to transmit, cancel on error */ if (adap->ops->adap_transmit(adap, data->attempts, signal_free_time, &data->msg)) - cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); + cec_data_cancel(data, CEC_TX_STATUS_ABORTED); else adap->transmit_in_progress = true; @@ -748,7 +738,9 @@ static void cec_wait_timeout(struct work_struct *work) /* Mark the message as timed out */ list_del_init(&data->list); - cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_TIMEOUT); + data->msg.rx_ts = ktime_get_ns(); + data->msg.rx_status = CEC_RX_STATUS_TIMEOUT; + cec_data_completed(data); unlock: mutex_unlock(&adap->lock); } @@ -931,12 +923,8 @@ int cec_transmit_msg_fh(struct cec_adapter *adap, struct cec_msg *msg, mutex_lock(&adap->lock); /* Cancel the transmit if it was interrupted */ - if (!data->completed) { - if (data->msg.tx_status & CEC_TX_STATUS_OK) - cec_data_cancel(data, CEC_TX_STATUS_OK, CEC_RX_STATUS_ABORTED); - else - cec_data_cancel(data, CEC_TX_STATUS_ABORTED, 0); - } + if (!data->completed) + cec_data_cancel(data, CEC_TX_STATUS_ABORTED); /* The transmit completed (possibly with an error) */ *msg = data->msg; @@ -1595,7 +1583,7 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) adap->transmit_in_progress = false; adap->transmit_in_progress_aborted = false; if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED, 0); + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); mutex_unlock(&adap->devnode.lock); } From f257da513d63bcc67c9ff192d0c2ae3166eeda6a Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:19:35 +0000 Subject: [PATCH 1363/1565] Revert "media: cec: abort if the current transmit was canceled" This reverts commit b64cb24a9e97f05473eb2b9b0573d25cfde50446 which is commit 590a8e564c6eff7e77a84e728612f1269e3c0685 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I5e4cdc4d23e850367ecef30627faec5ff4bf79ca Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 14 +++----------- include/media/cec.h | 6 ------ 2 files changed, 3 insertions(+), 17 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index fd4af157f4ce..6415a80c9040 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -421,7 +421,7 @@ static void cec_flush(struct cec_adapter *adap) cec_data_cancel(data, CEC_TX_STATUS_ABORTED); } if (adap->transmitting) - adap->transmit_in_progress_aborted = true; + cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); /* Cancel the pending timeout work. */ list_for_each_entry_safe(data, n, &adap->wait_queue, list) { @@ -572,7 +572,6 @@ int cec_thread_func(void *_adap) if (data->attempts == 0) data->attempts = attempts; - adap->transmit_in_progress_aborted = false; /* Tell the adapter to transmit, cancel on error */ if (adap->ops->adap_transmit(adap, data->attempts, signal_free_time, &data->msg)) @@ -600,8 +599,6 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, struct cec_msg *msg; unsigned int attempts_made = arb_lost_cnt + nack_cnt + low_drive_cnt + error_cnt; - bool done = status & (CEC_TX_STATUS_MAX_RETRIES | CEC_TX_STATUS_OK); - bool aborted = adap->transmit_in_progress_aborted; dprintk(2, "%s: status 0x%02x\n", __func__, status); if (attempts_made < 1) @@ -622,7 +619,6 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, goto wake_thread; } adap->transmit_in_progress = false; - adap->transmit_in_progress_aborted = false; msg = &data->msg; @@ -643,7 +639,8 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, * the hardware didn't signal that it retried itself (by setting * CEC_TX_STATUS_MAX_RETRIES), then we will retry ourselves. */ - if (!aborted && data->attempts > attempts_made && !done) { + if (data->attempts > attempts_made && + !(status & (CEC_TX_STATUS_MAX_RETRIES | CEC_TX_STATUS_OK))) { /* Retry this message */ data->attempts -= attempts_made; if (msg->timeout) @@ -658,8 +655,6 @@ void cec_transmit_done_ts(struct cec_adapter *adap, u8 status, goto wake_thread; } - if (aborted && !done) - status |= CEC_TX_STATUS_ABORTED; data->attempts = 0; /* Always set CEC_TX_STATUS_MAX_RETRIES on error */ @@ -1581,9 +1576,6 @@ static void cec_activate_cnt_dec(struct cec_adapter *adap) WARN_ON(adap->ops->adap_enable(adap, false)); adap->last_initiator = 0xff; adap->transmit_in_progress = false; - adap->transmit_in_progress_aborted = false; - if (adap->transmitting) - cec_data_cancel(adap->transmitting, CEC_TX_STATUS_ABORTED); mutex_unlock(&adap->devnode.lock); } diff --git a/include/media/cec.h b/include/media/cec.h index 31d704f36707..97c5f5bfcbd0 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -163,11 +163,6 @@ struct cec_adap_ops { * @wait_queue: queue of transmits waiting for a reply * @transmitting: CEC messages currently being transmitted * @transmit_in_progress: true if a transmit is in progress - * @transmit_in_progress_aborted: true if a transmit is in progress is to be - * aborted. This happens if the logical address is - * invalidated while the transmit is ongoing. In that - * case the transmit will finish, but will not retransmit - * and be marked as ABORTED. * @kthread_config: kthread used to configure a CEC adapter * @config_completion: used to signal completion of the config kthread * @kthread: main CEC processing thread @@ -223,7 +218,6 @@ struct cec_adapter { struct list_head wait_queue; struct cec_data *transmitting; bool transmit_in_progress; - bool transmit_in_progress_aborted; struct task_struct *kthread_config; struct completion config_completion; From 12d97237e4f4bc1eec81e8947e07cd614658cc5f Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:19:57 +0000 Subject: [PATCH 1364/1565] Revert "media: cec: call enable_adap on s_log_addrs" This reverts commit 2c67f3634f827b06b73bc1c1d3750f3bfed072b8 which is commit 3813c932ed970dd4f413498ccecb03c73c4f1784 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I042a2e45c1b5e48db8fe5aab647d18c0a2d28632 Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 174 ++++++++---------------------- drivers/media/cec/core/cec-api.c | 18 +++- include/media/cec.h | 2 - 3 files changed, 64 insertions(+), 130 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index 6415a80c9040..d2a6fd8b6501 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -1532,7 +1532,6 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) "ceccfg-%s", adap->name); if (IS_ERR(adap->kthread_config)) { adap->kthread_config = NULL; - adap->is_configuring = false; } else if (block) { mutex_unlock(&adap->lock); wait_for_completion(&adap->config_completion); @@ -1540,91 +1539,60 @@ static void cec_claim_log_addrs(struct cec_adapter *adap, bool block) } } -/* - * Helper functions to enable/disable the CEC adapter. - * - * These functions are called with adap->lock held. - */ -static int cec_activate_cnt_inc(struct cec_adapter *adap) -{ - int ret; - - if (adap->activate_cnt++) - return 0; - - /* serialize adap_enable */ - mutex_lock(&adap->devnode.lock); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - ret = adap->ops->adap_enable(adap, true); - if (ret) - adap->activate_cnt--; - mutex_unlock(&adap->devnode.lock); - return ret; -} - -static void cec_activate_cnt_dec(struct cec_adapter *adap) -{ - if (WARN_ON(!adap->activate_cnt)) - return; - - if (--adap->activate_cnt) - return; - - /* serialize adap_enable */ - mutex_lock(&adap->devnode.lock); - WARN_ON(adap->ops->adap_enable(adap, false)); - adap->last_initiator = 0xff; - adap->transmit_in_progress = false; - mutex_unlock(&adap->devnode.lock); -} - /* Set a new physical address and send an event notifying userspace of this. * * This function is called with adap->lock held. */ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) { - bool becomes_invalid = phys_addr == CEC_PHYS_ADDR_INVALID; - bool is_invalid = adap->phys_addr == CEC_PHYS_ADDR_INVALID; - if (phys_addr == adap->phys_addr) return; - if (!becomes_invalid && adap->devnode.unregistered) + if (phys_addr != CEC_PHYS_ADDR_INVALID && adap->devnode.unregistered) return; dprintk(1, "new physical address %x.%x.%x.%x\n", cec_phys_addr_exp(phys_addr)); - if (becomes_invalid || !is_invalid) { + if (phys_addr == CEC_PHYS_ADDR_INVALID || + adap->phys_addr != CEC_PHYS_ADDR_INVALID) { adap->phys_addr = CEC_PHYS_ADDR_INVALID; cec_post_state_event(adap); cec_adap_unconfigure(adap); - if (becomes_invalid && adap->needs_hpd) { - /* Disable monitor-all/pin modes if needed */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - cec_activate_cnt_dec(adap); + /* Disabling monitor all mode should always succeed */ + if (adap->monitor_all_cnt) + WARN_ON(call_op(adap, adap_monitor_all_enable, false)); + /* serialize adap_enable */ + mutex_lock(&adap->devnode.lock); + if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { + WARN_ON(adap->ops->adap_enable(adap, false)); + adap->transmit_in_progress = false; wake_up_interruptible(&adap->kthread_waitq); } - if (becomes_invalid) + mutex_unlock(&adap->devnode.lock); + if (phys_addr == CEC_PHYS_ADDR_INVALID) return; } - if (is_invalid && adap->needs_hpd) { - if (cec_activate_cnt_inc(adap)) + /* serialize adap_enable */ + mutex_lock(&adap->devnode.lock); + adap->last_initiator = 0xff; + adap->transmit_in_progress = false; + + if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { + if (adap->ops->adap_enable(adap, true)) { + mutex_unlock(&adap->devnode.lock); return; - /* - * Re-enable monitor-all/pin modes if needed. We warn, but - * continue if this fails as this is not a critical error. - */ - if (adap->monitor_all_cnt) - WARN_ON(call_op(adap, adap_monitor_all_enable, true)); - if (adap->monitor_pin_cnt) - WARN_ON(call_op(adap, adap_monitor_pin_enable, true)); + } } + if (adap->monitor_all_cnt && + call_op(adap, adap_monitor_all_enable, true)) { + if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) + WARN_ON(adap->ops->adap_enable(adap, false)); + mutex_unlock(&adap->devnode.lock); + return; + } + mutex_unlock(&adap->devnode.lock); + adap->phys_addr = phys_addr; cec_post_state_event(adap); if (adap->log_addrs.num_log_addrs) @@ -1688,8 +1656,6 @@ int __cec_s_log_addrs(struct cec_adapter *adap, return -ENODEV; if (!log_addrs || log_addrs->num_log_addrs == 0) { - if (!adap->is_configuring && !adap->is_configured) - return 0; cec_adap_unconfigure(adap); adap->log_addrs.num_log_addrs = 0; for (i = 0; i < CEC_MAX_LOG_ADDRS; i++) @@ -1697,8 +1663,6 @@ int __cec_s_log_addrs(struct cec_adapter *adap, adap->log_addrs.osd_name[0] = '\0'; adap->log_addrs.vendor_id = CEC_VENDOR_ID_NONE; adap->log_addrs.cec_version = CEC_OP_CEC_VERSION_2_0; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); return 0; } @@ -1832,12 +1796,6 @@ int __cec_s_log_addrs(struct cec_adapter *adap, sizeof(log_addrs->features[i])); } - if (!adap->needs_hpd && !adap->is_configuring && !adap->is_configured) { - int ret = cec_activate_cnt_inc(adap); - - if (ret) - return ret; - } log_addrs->log_addr_mask = adap->log_addrs.log_addr_mask; adap->log_addrs = *log_addrs; if (adap->phys_addr != CEC_PHYS_ADDR_INVALID) @@ -2141,37 +2099,20 @@ static int cec_receive_notify(struct cec_adapter *adap, struct cec_msg *msg, */ int cec_monitor_all_cnt_inc(struct cec_adapter *adap) { - int ret; + int ret = 0; - if (adap->monitor_all_cnt++) - return 0; - - if (!adap->needs_hpd) { - ret = cec_activate_cnt_inc(adap); - if (ret) { - adap->monitor_all_cnt--; - return ret; - } - } - - ret = call_op(adap, adap_monitor_all_enable, true); - if (ret) { - adap->monitor_all_cnt--; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); - } + if (adap->monitor_all_cnt == 0) + ret = call_op(adap, adap_monitor_all_enable, 1); + if (ret == 0) + adap->monitor_all_cnt++; return ret; } void cec_monitor_all_cnt_dec(struct cec_adapter *adap) { - if (WARN_ON(!adap->monitor_all_cnt)) - return; - if (--adap->monitor_all_cnt) - return; - WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); + adap->monitor_all_cnt--; + if (adap->monitor_all_cnt == 0) + WARN_ON(call_op(adap, adap_monitor_all_enable, 0)); } /* @@ -2181,37 +2122,20 @@ void cec_monitor_all_cnt_dec(struct cec_adapter *adap) */ int cec_monitor_pin_cnt_inc(struct cec_adapter *adap) { - int ret; + int ret = 0; - if (adap->monitor_pin_cnt++) - return 0; - - if (!adap->needs_hpd) { - ret = cec_activate_cnt_inc(adap); - if (ret) { - adap->monitor_pin_cnt--; - return ret; - } - } - - ret = call_op(adap, adap_monitor_pin_enable, true); - if (ret) { - adap->monitor_pin_cnt--; - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); - } + if (adap->monitor_pin_cnt == 0) + ret = call_op(adap, adap_monitor_pin_enable, 1); + if (ret == 0) + adap->monitor_pin_cnt++; return ret; } void cec_monitor_pin_cnt_dec(struct cec_adapter *adap) { - if (WARN_ON(!adap->monitor_pin_cnt)) - return; - if (--adap->monitor_pin_cnt) - return; - WARN_ON(call_op(adap, adap_monitor_pin_enable, false)); - if (!adap->needs_hpd) - cec_activate_cnt_dec(adap); + adap->monitor_pin_cnt--; + if (adap->monitor_pin_cnt == 0) + WARN_ON(call_op(adap, adap_monitor_pin_enable, 0)); } #ifdef CONFIG_DEBUG_FS @@ -2225,7 +2149,6 @@ int cec_adap_status(struct seq_file *file, void *priv) struct cec_data *data; mutex_lock(&adap->lock); - seq_printf(file, "activation count: %u\n", adap->activate_cnt); seq_printf(file, "configured: %d\n", adap->is_configured); seq_printf(file, "configuring: %d\n", adap->is_configuring); seq_printf(file, "phys_addr: %x.%x.%x.%x\n", @@ -2240,9 +2163,6 @@ int cec_adap_status(struct seq_file *file, void *priv) if (adap->monitor_all_cnt) seq_printf(file, "file handles in Monitor All mode: %u\n", adap->monitor_all_cnt); - if (adap->monitor_pin_cnt) - seq_printf(file, "file handles in Monitor Pin mode: %u\n", - adap->monitor_pin_cnt); if (adap->tx_timeouts) { seq_printf(file, "transmit timeouts: %u\n", adap->tx_timeouts); diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 0be4e822211e..899017a0e514 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -586,6 +586,18 @@ static int cec_open(struct inode *inode, struct file *filp) return err; } + /* serialize adap_enable */ + mutex_lock(&devnode->lock); + if (list_empty(&devnode->fhs) && + !adap->needs_hpd && + adap->phys_addr == CEC_PHYS_ADDR_INVALID) { + err = adap->ops->adap_enable(adap, true); + if (err) { + mutex_unlock(&devnode->lock); + kfree(fh); + return err; + } + } filp->private_data = fh; /* Queue up initial state events */ @@ -613,7 +625,6 @@ static int cec_open(struct inode *inode, struct file *filp) } #endif - mutex_lock(&devnode->lock); mutex_lock(&devnode->lock_fhs); list_add(&fh->list, &devnode->fhs); mutex_unlock(&devnode->lock_fhs); @@ -645,10 +656,15 @@ static int cec_release(struct inode *inode, struct file *filp) cec_monitor_all_cnt_dec(adap); mutex_unlock(&adap->lock); + /* serialize adap_enable */ mutex_lock(&devnode->lock); mutex_lock(&devnode->lock_fhs); list_del(&fh->list); mutex_unlock(&devnode->lock_fhs); + if (cec_is_registered(adap) && list_empty(&devnode->fhs) && + !adap->needs_hpd && adap->phys_addr == CEC_PHYS_ADDR_INVALID) { + WARN_ON(adap->ops->adap_enable(adap, false)); + } mutex_unlock(&devnode->lock); /* Unhook pending transmits from this filehandle. */ diff --git a/include/media/cec.h b/include/media/cec.h index 97c5f5bfcbd0..77346f757036 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -185,7 +185,6 @@ struct cec_adap_ops { * Drivers that need this can set this field to true after the * cec_allocate_adapter() call. * @last_initiator: the initiator of the last transmitted message. - * @activate_cnt: number of times that CEC is activated * @monitor_all_cnt: number of filehandles monitoring all msgs * @monitor_pin_cnt: number of filehandles monitoring pin changes * @follower_cnt: number of filehandles in follower mode @@ -237,7 +236,6 @@ struct cec_adapter { bool cec_pin_is_high; bool adap_controls_phys_addr; u8 last_initiator; - u32 activate_cnt; u32 monitor_all_cnt; u32 monitor_pin_cnt; u32 follower_cnt; From e4f3376872f3992cc31115d7ad832d86dfb34733 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 15 Jul 2024 17:20:18 +0000 Subject: [PATCH 1365/1565] Revert "media: cec: fix a deadlock situation" This reverts commit 0ab74ae99f86d0bc06b10d8c41980a91ba853f45 which is commit a9e6107616bb8108aa4fc22584a05e69761a91f7 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I8abbf69a8f0812b07813ba1b94b5457d4fa27f7b Signed-off-by: Greg Kroah-Hartman --- drivers/media/cec/core/cec-adap.c | 38 ++++++++++++++----------------- drivers/media/cec/core/cec-api.c | 6 ----- drivers/media/cec/core/cec-core.c | 3 --- include/media/cec.h | 11 ++------- 4 files changed, 19 insertions(+), 39 deletions(-) diff --git a/drivers/media/cec/core/cec-adap.c b/drivers/media/cec/core/cec-adap.c index d2a6fd8b6501..78027f0310ac 100644 --- a/drivers/media/cec/core/cec-adap.c +++ b/drivers/media/cec/core/cec-adap.c @@ -161,10 +161,10 @@ static void cec_queue_event(struct cec_adapter *adap, u64 ts = ktime_get_ns(); struct cec_fh *fh; - mutex_lock(&adap->devnode.lock_fhs); + mutex_lock(&adap->devnode.lock); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, ev, ts); - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } /* Notify userspace that the CEC pin changed state at the given time. */ @@ -178,12 +178,11 @@ void cec_queue_pin_cec_event(struct cec_adapter *adap, bool is_high, }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock_fhs); - list_for_each_entry(fh, &adap->devnode.fhs, list) { + mutex_lock(&adap->devnode.lock); + list_for_each_entry(fh, &adap->devnode.fhs, list) if (fh->mode_follower == CEC_MODE_MONITOR_PIN) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - } - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } EXPORT_SYMBOL_GPL(cec_queue_pin_cec_event); @@ -196,10 +195,10 @@ void cec_queue_pin_hpd_event(struct cec_adapter *adap, bool is_high, ktime_t ts) }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock_fhs); + mutex_lock(&adap->devnode.lock); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } EXPORT_SYMBOL_GPL(cec_queue_pin_hpd_event); @@ -212,10 +211,10 @@ void cec_queue_pin_5v_event(struct cec_adapter *adap, bool is_high, ktime_t ts) }; struct cec_fh *fh; - mutex_lock(&adap->devnode.lock_fhs); + mutex_lock(&adap->devnode.lock); list_for_each_entry(fh, &adap->devnode.fhs, list) cec_queue_event_fh(fh, &ev, ktime_to_ns(ts)); - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } EXPORT_SYMBOL_GPL(cec_queue_pin_5v_event); @@ -287,12 +286,12 @@ static void cec_queue_msg_monitor(struct cec_adapter *adap, u32 monitor_mode = valid_la ? CEC_MODE_MONITOR : CEC_MODE_MONITOR_ALL; - mutex_lock(&adap->devnode.lock_fhs); + mutex_lock(&adap->devnode.lock); list_for_each_entry(fh, &adap->devnode.fhs, list) { if (fh->mode_follower >= monitor_mode) cec_queue_msg_fh(fh, msg); } - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } /* @@ -303,12 +302,12 @@ static void cec_queue_msg_followers(struct cec_adapter *adap, { struct cec_fh *fh; - mutex_lock(&adap->devnode.lock_fhs); + mutex_lock(&adap->devnode.lock); list_for_each_entry(fh, &adap->devnode.fhs, list) { if (fh->mode_follower == CEC_MODE_FOLLOWER) cec_queue_msg_fh(fh, msg); } - mutex_unlock(&adap->devnode.lock_fhs); + mutex_unlock(&adap->devnode.lock); } /* Notify userspace of an adapter state change. */ @@ -1560,7 +1559,6 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) /* Disabling monitor all mode should always succeed */ if (adap->monitor_all_cnt) WARN_ON(call_op(adap, adap_monitor_all_enable, false)); - /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { WARN_ON(adap->ops->adap_enable(adap, false)); @@ -1572,16 +1570,14 @@ void __cec_s_phys_addr(struct cec_adapter *adap, u16 phys_addr, bool block) return; } - /* serialize adap_enable */ mutex_lock(&adap->devnode.lock); adap->last_initiator = 0xff; adap->transmit_in_progress = false; - if (adap->needs_hpd || list_empty(&adap->devnode.fhs)) { - if (adap->ops->adap_enable(adap, true)) { - mutex_unlock(&adap->devnode.lock); - return; - } + if ((adap->needs_hpd || list_empty(&adap->devnode.fhs)) && + adap->ops->adap_enable(adap, true)) { + mutex_unlock(&adap->devnode.lock); + return; } if (adap->monitor_all_cnt && diff --git a/drivers/media/cec/core/cec-api.c b/drivers/media/cec/core/cec-api.c index 899017a0e514..893641ebc644 100644 --- a/drivers/media/cec/core/cec-api.c +++ b/drivers/media/cec/core/cec-api.c @@ -586,7 +586,6 @@ static int cec_open(struct inode *inode, struct file *filp) return err; } - /* serialize adap_enable */ mutex_lock(&devnode->lock); if (list_empty(&devnode->fhs) && !adap->needs_hpd && @@ -625,9 +624,7 @@ static int cec_open(struct inode *inode, struct file *filp) } #endif - mutex_lock(&devnode->lock_fhs); list_add(&fh->list, &devnode->fhs); - mutex_unlock(&devnode->lock_fhs); mutex_unlock(&devnode->lock); return 0; @@ -656,11 +653,8 @@ static int cec_release(struct inode *inode, struct file *filp) cec_monitor_all_cnt_dec(adap); mutex_unlock(&adap->lock); - /* serialize adap_enable */ mutex_lock(&devnode->lock); - mutex_lock(&devnode->lock_fhs); list_del(&fh->list); - mutex_unlock(&devnode->lock_fhs); if (cec_is_registered(adap) && list_empty(&devnode->fhs) && !adap->needs_hpd && adap->phys_addr == CEC_PHYS_ADDR_INVALID) { WARN_ON(adap->ops->adap_enable(adap, false)); diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index 61139b0a0869..ece236291f35 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -167,10 +167,8 @@ static void cec_devnode_unregister(struct cec_adapter *adap) return; } - mutex_lock(&devnode->lock_fhs); list_for_each_entry(fh, &devnode->fhs, list) wake_up_interruptible(&fh->wait); - mutex_unlock(&devnode->lock_fhs); devnode->registered = false; devnode->unregistered = true; @@ -274,7 +272,6 @@ struct cec_adapter *cec_allocate_adapter(const struct cec_adap_ops *ops, /* adap->devnode initialization */ INIT_LIST_HEAD(&adap->devnode.fhs); - mutex_init(&adap->devnode.lock_fhs); mutex_init(&adap->devnode.lock); adap->kthread = kthread_run(cec_thread_func, adap, "cec-%s", name); diff --git a/include/media/cec.h b/include/media/cec.h index 77346f757036..208c9613c07e 100644 --- a/include/media/cec.h +++ b/include/media/cec.h @@ -26,17 +26,13 @@ * @dev: cec device * @cdev: cec character device * @minor: device node minor number - * @lock: lock to serialize open/release and registration * @registered: the device was correctly registered * @unregistered: the device was unregistered - * @lock_fhs: lock to control access to @fhs * @fhs: the list of open filehandles (cec_fh) + * @lock: lock to control access to this structure * * This structure represents a cec-related device node. * - * To add or remove filehandles from @fhs the @lock must be taken first, - * followed by @lock_fhs. It is safe to access @fhs if either lock is held. - * * The @parent is a physical device. It must be set by core or device drivers * before registering the node. */ @@ -47,13 +43,10 @@ struct cec_devnode { /* device info */ int minor; - /* serialize open/release and registration */ - struct mutex lock; bool registered; bool unregistered; - /* protect access to fhs */ - struct mutex lock_fhs; struct list_head fhs; + struct mutex lock; }; struct cec_adapter; From 640645c85ba551dc98a3cd56f51be40c707e10fb Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 16 Jul 2024 10:25:39 +0000 Subject: [PATCH 1366/1565] Revert "drm/mipi-dsi: use correct return type for the DSC functions" This reverts commit eb9635b4a94fb0685a7f01b73ad085d23e126c9e which is commit de1c705c50326acaceaf1f02bc5bf6f267c572bd upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: Ifeda5fca1bf9a45e7264ab123d0a69210bb506cc Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_mipi_dsi.c | 6 +++--- include/drm/drm_mipi_dsi.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/drm_mipi_dsi.c b/drivers/gpu/drm/drm_mipi_dsi.c index fb701491f497..fddc041aa865 100644 --- a/drivers/gpu/drm/drm_mipi_dsi.c +++ b/drivers/gpu/drm/drm_mipi_dsi.c @@ -560,7 +560,7 @@ EXPORT_SYMBOL(mipi_dsi_set_maximum_return_packet_size); * * Return: 0 on success or a negative error code on failure. */ -int mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable) +ssize_t mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable) { /* Note: Needs updating for non-default PPS or algorithm */ u8 tx[2] = { enable << 0, 0 }; @@ -585,8 +585,8 @@ EXPORT_SYMBOL(mipi_dsi_compression_mode); * * Return: 0 on success or a negative error code on failure. */ -int mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, - const struct drm_dsc_picture_parameter_set *pps) +ssize_t mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, + const struct drm_dsc_picture_parameter_set *pps) { struct mipi_dsi_msg msg = { .channel = dsi->channel, diff --git a/include/drm/drm_mipi_dsi.h b/include/drm/drm_mipi_dsi.h index 6e367e79625c..05592c17da0c 100644 --- a/include/drm/drm_mipi_dsi.h +++ b/include/drm/drm_mipi_dsi.h @@ -241,9 +241,9 @@ int mipi_dsi_shutdown_peripheral(struct mipi_dsi_device *dsi); int mipi_dsi_turn_on_peripheral(struct mipi_dsi_device *dsi); int mipi_dsi_set_maximum_return_packet_size(struct mipi_dsi_device *dsi, u16 value); -int mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable); -int mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, - const struct drm_dsc_picture_parameter_set *pps); +ssize_t mipi_dsi_compression_mode(struct mipi_dsi_device *dsi, bool enable); +ssize_t mipi_dsi_picture_parameter_set(struct mipi_dsi_device *dsi, + const struct drm_dsc_picture_parameter_set *pps); ssize_t mipi_dsi_generic_write(struct mipi_dsi_device *dsi, const void *payload, size_t size); From 88eb084d18c6124acccfe06edfc161dfa11bb34b Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 16 Jul 2024 16:32:09 +0000 Subject: [PATCH 1367/1565] Revert "Merge 5.10.220 into android12-5.10-lts" This reverts commit 87a7f35a248737adec8257a65ec4cb6ee9523f0b, reversing changes made to 640645c85ba551dc98a3cd56f51be40c707e10fb. 5.10.220 is a bunch of vfs and nfs changes that are not needed in Android systems, so revert the whole lot all at once, except for the version number bump. Change-Id: If28dc2231f27d326d3730716f23545dd0a2cdc75 Signed-off-by: Greg Kroah-Hartman --- Documentation/filesystems/files.rst | 8 +- Documentation/filesystems/locking.rst | 10 +- Documentation/filesystems/nfs/exporting.rst | 78 - arch/powerpc/platforms/cell/spufs/coredump.c | 2 +- crypto/algboss.c | 4 +- fs/Kconfig | 6 +- fs/autofs/dev-ioctl.c | 5 +- fs/cachefiles/namei.c | 9 +- fs/cifs/connect.c | 2 +- fs/coredump.c | 5 +- fs/ecryptfs/inode.c | 10 +- fs/exec.c | 29 +- fs/exportfs/expfs.c | 40 +- fs/file.c | 177 +- fs/init.c | 6 +- fs/lockd/clnt4xdr.c | 9 +- fs/lockd/clntproc.c | 3 + fs/lockd/host.c | 4 +- fs/lockd/svc.c | 260 +- fs/lockd/svc4proc.c | 70 +- fs/lockd/svclock.c | 67 +- fs/lockd/svcproc.c | 62 +- fs/lockd/svcsubs.c | 123 +- fs/lockd/svcxdr.h | 142 - fs/lockd/xdr.c | 448 ++- fs/lockd/xdr4.c | 462 +-- fs/locks.c | 102 +- fs/namei.c | 21 +- fs/nfs/blocklayout/blocklayout.c | 2 +- fs/nfs/blocklayout/dev.c | 2 +- fs/nfs/callback.c | 111 +- fs/nfs/callback_xdr.c | 33 +- fs/nfs/dir.c | 2 +- fs/nfs/export.c | 17 - fs/nfs/file.c | 3 - fs/nfs/filelayout/filelayout.c | 4 +- fs/nfs/filelayout/filelayoutdev.c | 2 +- fs/nfs/flexfilelayout/flexfilelayout.c | 4 +- fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +- fs/nfs/nfs42xdr.c | 2 +- fs/nfs/nfs4state.c | 2 +- fs/nfs/nfs4xdr.c | 6 +- fs/nfs/pagelist.c | 3 + fs/nfs/super.c | 8 - fs/nfs/write.c | 3 + fs/nfs_common/Makefile | 2 +- fs/nfs_common/nfs_ssc.c | 2 + fs/nfs_common/nfsacl.c | 123 - fs/nfsd/Kconfig | 40 +- fs/nfsd/Makefile | 8 +- fs/nfsd/acl.h | 6 +- fs/nfsd/blocklayout.c | 1 - fs/nfsd/blocklayoutxdr.c | 1 - fs/nfsd/cache.h | 2 +- fs/nfsd/export.c | 74 +- fs/nfsd/export.h | 16 +- fs/nfsd/filecache.c | 1217 +++--- fs/nfsd/filecache.h | 23 +- fs/nfsd/flexfilelayout.c | 3 +- fs/nfsd/lockd.c | 10 +- fs/nfsd/netns.h | 63 +- fs/nfsd/nfs2acl.c | 214 +- fs/nfsd/nfs3acl.c | 140 +- fs/nfsd/nfs3proc.c | 402 +- fs/nfsd/nfs3xdr.c | 1801 ++++----- fs/nfsd/nfs4acl.c | 45 +- fs/nfsd/nfs4callback.c | 168 +- fs/nfsd/nfs4idmap.c | 9 +- fs/nfsd/nfs4layouts.c | 4 +- fs/nfsd/nfs4proc.c | 1111 ++---- fs/nfsd/nfs4recover.c | 20 +- fs/nfsd/nfs4state.c | 1715 +++----- fs/nfsd/nfs4xdr.c | 3771 +++++++++--------- fs/nfsd/nfscache.c | 115 +- fs/nfsd/nfsctl.c | 169 +- fs/nfsd/nfsd.h | 50 +- fs/nfsd/nfsfh.c | 291 +- fs/nfsd/nfsfh.h | 179 +- fs/nfsd/nfsproc.c | 262 +- fs/nfsd/nfssvc.c | 356 +- fs/nfsd/nfsxdr.c | 842 ++-- fs/nfsd/state.h | 69 +- fs/nfsd/stats.c | 124 +- fs/nfsd/stats.h | 98 +- fs/nfsd/trace.c | 1 - fs/nfsd/trace.h | 898 +---- fs/nfsd/vfs.c | 933 +++-- fs/nfsd/vfs.h | 62 +- fs/nfsd/xdr.h | 68 +- fs/nfsd/xdr3.h | 116 +- fs/nfsd/xdr4.h | 127 +- fs/nfsd/xdr4cb.h | 6 - fs/notify/dnotify/dnotify.c | 17 +- fs/notify/fanotify/fanotify.c | 487 +-- fs/notify/fanotify/fanotify.h | 252 +- fs/notify/fanotify/fanotify_user.c | 886 +--- fs/notify/fdinfo.c | 19 +- fs/notify/fsnotify.c | 183 +- fs/notify/fsnotify.h | 19 +- fs/notify/group.c | 38 +- fs/notify/inotify/inotify.h | 11 +- fs/notify/inotify/inotify_fsnotify.c | 12 +- fs/notify/inotify/inotify_user.c | 87 +- fs/notify/mark.c | 172 +- fs/notify/notification.c | 72 +- fs/open.c | 49 +- fs/overlayfs/overlayfs.h | 9 +- fs/proc/fd.c | 48 +- fs/udf/file.c | 2 +- fs/verity/enable.c | 2 +- include/linux/dnotify.h | 2 +- include/linux/errno.h | 1 - include/linux/exportfs.h | 15 - include/linux/fanotify.h | 74 +- include/linux/fdtable.h | 37 +- include/linux/fs.h | 54 +- include/linux/fsnotify.h | 77 +- include/linux/fsnotify_backend.h | 372 +- include/linux/iversion.h | 13 - include/linux/kallsyms.h | 17 +- include/linux/kthread.h | 1 - include/linux/lockd/bind.h | 3 +- include/linux/lockd/lockd.h | 17 +- include/linux/lockd/xdr.h | 35 +- include/linux/lockd/xdr4.h | 33 +- include/linux/module.h | 24 +- include/linux/nfs.h | 8 + include/linux/nfs4.h | 21 +- include/linux/nfs_ssc.h | 14 - include/linux/nfsacl.h | 6 - include/linux/pid.h | 1 - include/linux/sched/user.h | 3 + include/linux/sunrpc/msg_prot.h | 3 + include/linux/sunrpc/svc.h | 151 +- include/linux/sunrpc/svc_rdma.h | 4 +- include/linux/sunrpc/svc_xprt.h | 16 +- include/linux/sunrpc/svcauth.h | 4 +- include/linux/sunrpc/svcsock.h | 7 +- include/linux/sunrpc/xdr.h | 153 +- include/linux/syscalls.h | 12 + include/linux/sysctl.h | 2 - include/linux/user_namespace.h | 4 - include/trace/events/sunrpc.h | 26 +- include/uapi/linux/fanotify.h | 42 - include/uapi/linux/nfs3.h | 6 - include/uapi/linux/nfsd/nfsfh.h | 105 + kernel/audit_fsnotify.c | 8 +- kernel/audit_tree.c | 2 +- kernel/audit_watch.c | 5 +- kernel/bpf/inode.c | 2 +- kernel/bpf/syscall.c | 20 +- kernel/bpf/task_iter.c | 2 +- kernel/fork.c | 12 +- kernel/kallsyms.c | 8 +- kernel/kcmp.c | 29 +- kernel/kthread.c | 23 +- kernel/livepatch/core.c | 7 +- kernel/module.c | 24 +- kernel/pid.c | 15 +- kernel/sys.c | 2 +- kernel/sysctl.c | 54 +- kernel/trace/trace_kprobe.c | 4 +- kernel/ucount.c | 4 - mm/madvise.c | 2 +- mm/memcontrol.c | 2 +- mm/mincore.c | 2 +- net/bluetooth/bnep/core.c | 2 +- net/bluetooth/cmtp/core.c | 2 +- net/bluetooth/hidp/core.c | 2 +- net/sunrpc/auth_gss/gss_rpc_xdr.c | 2 +- net/sunrpc/auth_gss/svcauth_gss.c | 47 +- net/sunrpc/sched.c | 1 - net/sunrpc/svc.c | 314 +- net/sunrpc/svc_xprt.c | 104 +- net/sunrpc/svcauth.c | 8 +- net/sunrpc/svcauth_unix.c | 18 +- net/sunrpc/svcsock.c | 32 +- net/sunrpc/xdr.c | 112 +- net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +- net/sunrpc/xprtrdma/svc_rdma_sendto.c | 32 +- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- net/unix/af_unix.c | 2 +- tools/objtool/check.c | 3 +- 183 files changed, 8839 insertions(+), 13928 deletions(-) delete mode 100644 fs/lockd/svcxdr.h create mode 100644 include/uapi/linux/nfsd/nfsfh.h diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst index bcf84459917f..cbf8e57376bf 100644 --- a/Documentation/filesystems/files.rst +++ b/Documentation/filesystems/files.rst @@ -62,7 +62,7 @@ the fdtable structure - be held. 4. To look up the file structure given an fd, a reader - must use either lookup_fd_rcu() or files_lookup_fd_rcu() APIs. These + must use either fcheck() or fcheck_files() APIs. These take care of barrier requirements due to lock-free lookup. An example:: @@ -70,7 +70,7 @@ the fdtable structure - struct file *file; rcu_read_lock(); - file = lookup_fd_rcu(fd); + file = fcheck(fd); if (file) { ... } @@ -84,7 +84,7 @@ the fdtable structure - on ->f_count:: rcu_read_lock(); - file = files_lookup_fd_rcu(files, fd); + file = fcheck_files(files, fd); if (file) { if (atomic_long_inc_not_zero(&file->f_count)) *fput_needed = 1; @@ -104,7 +104,7 @@ the fdtable structure - lock-free, they must be installed using rcu_assign_pointer() API. If they are looked up lock-free, rcu_dereference() must be used. However it is advisable to use files_fdtable() - and lookup_fd_rcu()/files_lookup_fd_rcu() which take care of these issues. + and fcheck()/fcheck_files() which take care of these issues. 7. While updating, the fdtable pointer must be looked up while holding files->file_lock. If ->file_lock is dropped, then diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 5db6dec0b423..18d93fc7dc46 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -433,21 +433,17 @@ prototypes:: void (*lm_break)(struct file_lock *); /* break_lease callback */ int (*lm_change)(struct file_lock **, int); bool (*lm_breaker_owns_lease)(struct file_lock *); - bool (*lm_lock_expirable)(struct file_lock *); - void (*lm_expire_lock)(void); locking rules: ====================== ============= ================= ========= -ops flc_lock blocked_lock_lock may block +ops inode->i_lock blocked_lock_lock may block ====================== ============= ================= ========= -lm_notify: no yes no +lm_notify: yes yes no lm_grant: no no no lm_break: yes no no lm_change yes no no -lm_breaker_owns_lease: yes no no -lm_lock_expirable yes no no -lm_expire_lock no no yes +lm_breaker_owns_lease: no no no ====================== ============= ================= ========= buffer_head diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst index 6f59a364f84c..33d588a01ace 100644 --- a/Documentation/filesystems/nfs/exporting.rst +++ b/Documentation/filesystems/nfs/exporting.rst @@ -154,11 +154,6 @@ struct which has the following members: to find potential names, and matches inode numbers to find the correct match. - flags - Some filesystems may need to be handled differently than others. The - export_operations struct also includes a flags field that allows the - filesystem to communicate such information to nfsd. See the Export - Operations Flags section below for more explanation. A filehandle fragment consists of an array of 1 or more 4byte words, together with a one byte "type". @@ -168,76 +163,3 @@ generated by encode_fh, in which case it will have been padded with nuls. Rather, the encode_fh routine should choose a "type" which indicates the decode_fh how much of the filehandle is valid, and how it should be interpreted. - -Export Operations Flags ------------------------ -In addition to the operation vector pointers, struct export_operations also -contains a "flags" field that allows the filesystem to communicate to nfsd -that it may want to do things differently when dealing with it. The -following flags are defined: - - EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem - RFC 1813 recommends that servers always send weak cache consistency - (WCC) data to the client after each operation. The server should - atomically collect attributes about the inode, do an operation on it, - and then collect the attributes afterward. This allows the client to - skip issuing GETATTRs in some situations but means that the server - is calling vfs_getattr for almost all RPCs. On some filesystems - (particularly those that are clustered or networked) this is expensive - and atomicity is difficult to guarantee. This flag indicates to nfsd - that it should skip providing WCC attributes to the client in NFSv3 - replies when doing operations on this filesystem. Consider enabling - this on filesystems that have an expensive ->getattr inode operation, - or when atomicity between pre and post operation attribute collection - is impossible to guarantee. - - EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs - Many NFS operations deal with filehandles, which the server must then - vet to ensure that they live inside of an exported tree. When the - export consists of an entire filesystem, this is trivial. nfsd can just - ensure that the filehandle live on the filesystem. When only part of a - filesystem is exported however, then nfsd must walk the ancestors of the - inode to ensure that it's within an exported subtree. This is an - expensive operation and not all filesystems can support it properly. - This flag exempts the filesystem from subtree checking and causes - exportfs to get back an error if it tries to enable subtree checking - on it. - - EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking - On some exportable filesystems (such as NFS) unlinking a file that - is still open can cause a fair bit of extra work. For instance, - the NFS client will do a "sillyrename" to ensure that the file - sticks around while it's still open. When reexporting, that open - file is held by nfsd so we usually end up doing a sillyrename, and - then immediately deleting the sillyrenamed file just afterward when - the link count actually goes to zero. Sometimes this delete can race - with other operations (for instance an rmdir of the parent directory). - This flag causes nfsd to close any open files for this inode _before_ - calling into the vfs to do an unlink or a rename that would replace - an existing file. - - EXPORT_OP_REMOTE_FS - Backing storage for this filesystem is remote - PF_LOCAL_THROTTLE exists for loopback NFSD, where a thread needs to - write to one bdi (the final bdi) in order to free up writes queued - to another bdi (the client bdi). Such threads get a private balance - of dirty pages so that dirty pages for the client bdi do not imact - the daemon writing to the final bdi. For filesystems whose durable - storage is not local (such as exported NFS filesystems), this - constraint has negative consequences. EXPORT_OP_REMOTE_FS enables - an export to disable writeback throttling. - - EXPORT_OP_NOATOMIC_ATTR - Filesystem does not update attributes atomically - EXPORT_OP_NOATOMIC_ATTR indicates that the exported filesystem - cannot provide the semantics required by the "atomic" boolean in - NFSv4's change_info4. This boolean indicates to a client whether the - returned before and after change attributes were obtained atomically - with the respect to the requested metadata operation (UNLINK, - OPEN/CREATE, MKDIR, etc). - - EXPORT_OP_FLUSH_ON_CLOSE - Filesystem flushes file data on close(2) - On most filesystems, inodes can remain under writeback after the - file is closed. NFSD relies on client activity or local flusher - threads to handle writeback. Certain filesystems, such as NFS, flush - all of an inode's dirty data on last close. Exports that behave this - way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip - waiting for writeback when closing such files. diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 60b5583e9eaf..026c181a98c5 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -74,7 +74,7 @@ static struct spu_context *coredump_next_context(int *fd) *fd = n - 1; rcu_read_lock(); - file = lookup_fd_rcu(*fd); + file = fcheck(*fd); ctx = SPUFS_I(file_inode(file))->i_ctx; get_spu_context(ctx); rcu_read_unlock(); diff --git a/crypto/algboss.c b/crypto/algboss.c index b87f907bb142..5ebccbd6b74e 100644 --- a/crypto/algboss.c +++ b/crypto/algboss.c @@ -74,7 +74,7 @@ static int cryptomgr_probe(void *data) complete_all(¶m->larval->completion); crypto_alg_put(¶m->larval->alg); kfree(param); - module_put_and_kthread_exit(0); + module_put_and_exit(0); } static int cryptomgr_schedule_probe(struct crypto_larval *larval) @@ -209,7 +209,7 @@ static int cryptomgr_test(void *data) crypto_alg_tested(param->driver, err); kfree(param); - module_put_and_kthread_exit(0); + module_put_and_exit(0); } static int cryptomgr_schedule_test(struct crypto_alg *alg) diff --git a/fs/Kconfig b/fs/Kconfig index d34b8227c772..a6a721108d1c 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -321,7 +321,7 @@ config LOCKD config LOCKD_V4 bool - depends on NFSD || NFS_V3 + depends on NFSD_V3 || NFS_V3 depends on FILE_LOCKING default y @@ -334,10 +334,6 @@ config NFS_COMMON depends on NFSD || NFS_FS || LOCKD default y -config NFS_V4_2_SSC_HELPER - bool - default y if NFS_V4_2 - source "net/sunrpc/Kconfig" source "fs/ceph/Kconfig" source "fs/cifs/Kconfig" diff --git a/fs/autofs/dev-ioctl.c b/fs/autofs/dev-ioctl.c index 5bf781ea6d67..322b7dfb4ea0 100644 --- a/fs/autofs/dev-ioctl.c +++ b/fs/autofs/dev-ioctl.c @@ -4,10 +4,9 @@ * Copyright 2008 Ian Kent */ -#include #include #include -#include +#include #include #include @@ -290,7 +289,7 @@ static int autofs_dev_ioctl_closemount(struct file *fp, struct autofs_sb_info *sbi, struct autofs_dev_ioctl *param) { - return close_fd(param->ioctlfd); + return ksys_close(param->ioctlfd); } /* diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 7b987de0babe..ecc8ecbbfa5a 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -412,14 +412,9 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache, if (ret < 0) { cachefiles_io_error(cache, "Rename security error %d", ret); } else { - struct renamedata rd = { - .old_dir = d_inode(dir), - .old_dentry = rep, - .new_dir = d_inode(cache->graveyard), - .new_dentry = grave, - }; trace_cachefiles_rename(object, rep, grave, why); - ret = vfs_rename(&rd); + ret = vfs_rename(d_inode(dir), rep, + d_inode(cache->graveyard), grave, NULL, 0); if (ret != 0 && ret != -ENOMEM) cachefiles_io_error(cache, "Rename failed with error %d", ret); diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index a3c0e6a4e484..164b98540716 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1242,7 +1242,7 @@ cifs_demultiplex_thread(void *p) } memalloc_noreclaim_restore(noreclaim_flag); - module_put_and_kthread_exit(0); + module_put_and_exit(0); } /* extract the host portion of the UNC string */ diff --git a/fs/coredump.c b/fs/coredump.c index ca4802d14158..7c5edadf5208 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -587,6 +587,7 @@ void do_coredump(const kernel_siginfo_t *siginfo) int ispipe; size_t *argv = NULL; int argc = 0; + struct files_struct *displaced; /* require nonrelative corefile path and be extra careful */ bool need_suid_safe = false; bool core_dumped = false; @@ -792,9 +793,11 @@ void do_coredump(const kernel_siginfo_t *siginfo) } /* get us an unshared descriptor table; almost always a no-op */ - retval = unshare_files(); + retval = unshare_files(&displaced); if (retval) goto close_fail; + if (displaced) + put_files_struct(displaced); if (!dump_interrupted()) { /* * umh disabled with CONFIG_STATIC_USERMODEHELPER_PATH="" would diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index cd1a60a319b8..7777bb6f66d2 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -598,7 +598,6 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct dentry *lower_new_dir_dentry; struct dentry *trap; struct inode *target_inode; - struct renamedata rd = {}; if (flags) return -EINVAL; @@ -628,12 +627,9 @@ ecryptfs_rename(struct inode *old_dir, struct dentry *old_dentry, rc = -ENOTEMPTY; goto out_lock; } - - rd.old_dir = d_inode(lower_old_dir_dentry); - rd.old_dentry = lower_old_dentry; - rd.new_dir = d_inode(lower_new_dir_dentry); - rd.new_dentry = lower_new_dentry; - rc = vfs_rename(&rd); + rc = vfs_rename(d_inode(lower_old_dir_dentry), lower_old_dentry, + d_inode(lower_new_dir_dentry), lower_new_dentry, + NULL, 0); if (rc) goto out_lock; if (target_inode) diff --git a/fs/exec.c b/fs/exec.c index 398ccf06d799..4edc932a7dce 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1266,11 +1266,6 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) goto out; - /* Ensure the files table is not shared. */ - retval = unshare_files(); - if (retval) - goto out; - /* * Must be called _before_ exec_mmap() as bprm->mm is * not visibile until then. This also enables the update @@ -1796,6 +1791,7 @@ static int bprm_execve(struct linux_binprm *bprm, int fd, struct filename *filename, int flags) { struct file *file; + struct files_struct *displaced; int retval; /* @@ -1803,10 +1799,14 @@ static int bprm_execve(struct linux_binprm *bprm, */ io_uring_task_cancel(); - retval = prepare_bprm_creds(bprm); + retval = unshare_files(&displaced); if (retval) return retval; + retval = prepare_bprm_creds(bprm); + if (retval) + goto out_files; + check_unsafe_exec(bprm); current->in_execve = 1; @@ -1820,14 +1820,11 @@ static int bprm_execve(struct linux_binprm *bprm, bprm->file = file; /* * Record that a name derived from an O_CLOEXEC fd will be - * inaccessible after exec. This allows the code in exec to - * choose to fail when the executable is not mmaped into the - * interpreter and an open file descriptor is not passed to - * the interpreter. This makes for a better user experience - * than having the interpreter start and then immediately fail - * when it finds the executable is inaccessible. + * inaccessible after exec. Relies on having exclusive access to + * current->files (due to unshare_files above). */ - if (bprm->fdpath && get_close_on_exec(fd)) + if (bprm->fdpath && + close_on_exec(fd, rcu_dereference_raw(current->files->fdt))) bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; /* Set the unchanging part of bprm->cred */ @@ -1845,6 +1842,8 @@ static int bprm_execve(struct linux_binprm *bprm, rseq_execve(current); acct_update_integrals(current); task_numa_free(current, false); + if (displaced) + put_files_struct(displaced); return retval; out: @@ -1861,6 +1860,10 @@ static int bprm_execve(struct linux_binprm *bprm, current->fs->in_exec = 0; current->in_execve = 0; +out_files: + if (displaced) + reset_files_struct(displaced); + return retval; } diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 8c28bd1c9ed9..2dd55b172d57 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -18,7 +18,7 @@ #include #include -#define dprintk(fmt, args...) pr_debug(fmt, ##args) +#define dprintk(fmt, args...) do{}while(0) static int get_name(const struct path *path, char *name, struct dentry *child); @@ -132,8 +132,8 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, inode_unlock(dentry->d_inode); if (IS_ERR(parent)) { - dprintk("get_parent of %lu failed, err %ld\n", - dentry->d_inode->i_ino, PTR_ERR(parent)); + dprintk("%s: get_parent of %ld failed, err %d\n", + __func__, dentry->d_inode->i_ino, PTR_ERR(parent)); return parent; } @@ -147,7 +147,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt, dprintk("%s: found name: %s\n", __func__, nbuf); tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf)); if (IS_ERR(tmp)) { - dprintk("lookup failed: %ld\n", PTR_ERR(tmp)); + dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp)); err = PTR_ERR(tmp); goto out_err; } @@ -417,11 +417,9 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, } EXPORT_SYMBOL_GPL(exportfs_encode_fh); -struct dentry * -exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, - int fileid_type, - int (*acceptable)(void *, struct dentry *), - void *context) +struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, + int fh_len, int fileid_type, + int (*acceptable)(void *, struct dentry *), void *context) { const struct export_operations *nop = mnt->mnt_sb->s_export_op; struct dentry *result, *alias; @@ -434,8 +432,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, if (!nop || !nop->fh_to_dentry) return ERR_PTR(-ESTALE); result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); + if (PTR_ERR(result) == -ENOMEM) + return ERR_CAST(result); if (IS_ERR_OR_NULL(result)) - return result; + return ERR_PTR(-ESTALE); /* * If no acceptance criteria was specified by caller, a disconnected @@ -561,26 +561,10 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, err_result: dput(result); + if (err != -ENOMEM) + err = -ESTALE; return ERR_PTR(err); } -EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw); - -struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, - int fh_len, int fileid_type, - int (*acceptable)(void *, struct dentry *), - void *context) -{ - struct dentry *ret; - - ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type, - acceptable, context); - if (IS_ERR_OR_NULL(ret)) { - if (ret == ERR_PTR(-ENOMEM)) - return ret; - return ERR_PTR(-ESTALE); - } - return ret; -} EXPORT_SYMBOL_GPL(exportfs_decode_fh); MODULE_LICENSE("GPL"); diff --git a/fs/file.c b/fs/file.c index fdb84a64724b..d6bc73960e4a 100644 --- a/fs/file.c +++ b/fs/file.c @@ -175,7 +175,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) spin_unlock(&files->file_lock); new_fdt = alloc_fdtable(nr); - /* make sure all fd_install() have seen resize_in_progress + /* make sure all __fd_install() have seen resize_in_progress * or have finished their rcu_read_lock_sched() section. */ if (atomic_read(&files->count) > 1) @@ -198,7 +198,7 @@ static int expand_fdtable(struct files_struct *files, unsigned int nr) rcu_assign_pointer(files->fdt, new_fdt); if (cur_fdt != &files->fdtab) call_rcu(&cur_fdt->rcu, free_fdtable_rcu); - /* coupled with smp_rmb() in fd_install() */ + /* coupled with smp_rmb() in __fd_install() */ smp_wmb(); return 1; } @@ -466,6 +466,18 @@ void put_files_struct(struct files_struct *files) } } +void reset_files_struct(struct files_struct *files) +{ + struct task_struct *tsk = current; + struct files_struct *old; + + old = tsk->files; + task_lock(tsk); + tsk->files = files; + task_unlock(tsk); + put_files_struct(old); +} + void exit_files(struct task_struct *tsk) { struct files_struct * files = tsk->files; @@ -509,9 +521,9 @@ static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) /* * allocate a file descriptor, mark it busy. */ -static int alloc_fd(unsigned start, unsigned end, unsigned flags) +int __alloc_fd(struct files_struct *files, + unsigned start, unsigned end, unsigned flags) { - struct files_struct *files = current->files; unsigned int fd; int error; struct fdtable *fdt; @@ -567,9 +579,14 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) return error; } +static int alloc_fd(unsigned start, unsigned flags) +{ + return __alloc_fd(current->files, start, rlimit(RLIMIT_NOFILE), flags); +} + int __get_unused_fd_flags(unsigned flags, unsigned long nofile) { - return alloc_fd(0, nofile, flags); + return __alloc_fd(current->files, 0, nofile, flags); } int get_unused_fd_flags(unsigned flags) @@ -608,13 +625,17 @@ EXPORT_SYMBOL(put_unused_fd); * It should never happen - if we allow dup2() do it, _really_ bad things * will follow. * - * This consumes the "file" refcount, so callers should treat it - * as if they had called fput(file). + * NOTE: __fd_install() variant is really, really low-level; don't + * use it unless you are forced to by truly lousy API shoved down + * your throat. 'files' *MUST* be either current->files or obtained + * by get_files_struct(current) done by whoever had given it to you, + * or really bad things will happen. Normally you want to use + * fd_install() instead. */ -void fd_install(unsigned int fd, struct file *file) +void __fd_install(struct files_struct *files, unsigned int fd, + struct file *file) { - struct files_struct *files = current->files; struct fdtable *fdt; rcu_read_lock_sched(); @@ -636,6 +657,15 @@ void fd_install(unsigned int fd, struct file *file) rcu_read_unlock_sched(); } +/* + * This consumes the "file" refcount, so callers should treat it + * as if they had called fput(file). + */ +void fd_install(unsigned int fd, struct file *file) +{ + __fd_install(current->files, fd, file); +} + EXPORT_SYMBOL(fd_install); static struct file *pick_file(struct files_struct *files, unsigned fd) @@ -659,9 +689,11 @@ static struct file *pick_file(struct files_struct *files, unsigned fd) return file; } -int close_fd(unsigned fd) +/* + * The same warnings as for __alloc_fd()/__fd_install() apply here... + */ +int __close_fd(struct files_struct *files, unsigned fd) { - struct files_struct *files = current->files; struct file *file; file = pick_file(files, fd); @@ -670,7 +702,7 @@ int close_fd(unsigned fd) return filp_close(file, files); } -EXPORT_SYMBOL(close_fd); /* for ksys_close() */ +EXPORT_SYMBOL(__close_fd); /* for ksys_close() */ /** * __close_range() - Close all file descriptors in a given range. @@ -829,28 +861,68 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } +static inline struct file *__fget_files_rcu(struct files_struct *files, + unsigned int fd, fmode_t mask, unsigned int refs) +{ + for (;;) { + struct file *file; + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + struct file __rcu **fdentry; + + if (unlikely(fd >= fdt->max_fds)) + return NULL; + + fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); + file = rcu_dereference_raw(*fdentry); + if (unlikely(!file)) + return NULL; + + if (unlikely(file->f_mode & mask)) + return NULL; + + /* + * Ok, we have a file pointer. However, because we do + * this all locklessly under RCU, we may be racing with + * that file being closed. + * + * Such a race can take two forms: + * + * (a) the file ref already went down to zero, + * and get_file_rcu_many() fails. Just try + * again: + */ + if (unlikely(!get_file_rcu_many(file, refs))) + continue; + + /* + * (b) the file table entry has changed under us. + * Note that we don't need to re-check the 'fdt->fd' + * pointer having changed, because it always goes + * hand-in-hand with 'fdt'. + * + * If so, we need to put our refs and try again. + */ + if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || + unlikely(rcu_dereference_raw(*fdentry) != file)) { + fput_many(file, refs); + continue; + } + + /* + * Ok, we have a ref to the file, and checked that it + * still exists. + */ + return file; + } +} + static struct file *__fget_files(struct files_struct *files, unsigned int fd, fmode_t mask, unsigned int refs) { struct file *file; rcu_read_lock(); -loop: - file = files_lookup_fd_rcu(files, fd); - if (file) { - /* File object ref couldn't be taken. - * dup2() atomicity guarantee is the reason - * we loop to catch the new file (or NULL pointer) - */ - if (file->f_mode & mask) - file = NULL; - else if (!get_file_rcu_many(file, refs)) - goto loop; - else if (files_lookup_fd_raw(files, fd) != file) { - fput_many(file, refs); - goto loop; - } - } + file = __fget_files_rcu(files, fd, mask, refs); rcu_read_unlock(); return file; @@ -891,42 +963,6 @@ struct file *fget_task(struct task_struct *task, unsigned int fd) return file; } -struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd) -{ - /* Must be called with rcu_read_lock held */ - struct files_struct *files; - struct file *file = NULL; - - task_lock(task); - files = task->files; - if (files) - file = files_lookup_fd_rcu(files, fd); - task_unlock(task); - - return file; -} - -struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *ret_fd) -{ - /* Must be called with rcu_read_lock held */ - struct files_struct *files; - unsigned int fd = *ret_fd; - struct file *file = NULL; - - task_lock(task); - files = task->files; - if (files) { - for (; fd < files_fdtable(files)->max_fds; fd++) { - file = files_lookup_fd_rcu(files, fd); - if (file) - break; - } - } - task_unlock(task); - *ret_fd = fd; - return file; -} - /* * Lightweight file lookup - no refcnt increment if fd table isn't shared. * @@ -949,7 +985,7 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) struct file *file; if (atomic_read(&files->count) == 1) { - file = files_lookup_fd_raw(files, fd); + file = __fcheck_files(files, fd); if (!file || unlikely(file->f_mode & mask)) return 0; return (unsigned long)file; @@ -1085,7 +1121,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) struct files_struct *files = current->files; if (!file) - return close_fd(fd); + return __close_fd(files, fd); if (fd >= rlimit(RLIMIT_NOFILE)) return -EBADF; @@ -1174,7 +1210,7 @@ static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) spin_lock(&files->file_lock); err = expand_files(files, newfd); - file = files_lookup_fd_locked(files, oldfd); + file = fcheck(oldfd); if (unlikely(!file)) goto Ebadf; if (unlikely(err < 0)) { @@ -1203,7 +1239,7 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldfd, unsigned int, newfd) int retval = oldfd; rcu_read_lock(); - if (!files_lookup_fd_rcu(files, oldfd)) + if (!fcheck_files(files, oldfd)) retval = -EBADF; rcu_read_unlock(); return retval; @@ -1228,11 +1264,10 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes) int f_dupfd(unsigned int from, struct file *file, unsigned flags) { - unsigned long nofile = rlimit(RLIMIT_NOFILE); int err; - if (from >= nofile) + if (from >= rlimit(RLIMIT_NOFILE)) return -EINVAL; - err = alloc_fd(from, nofile, flags); + err = alloc_fd(from, flags); if (err >= 0) { get_file(file); fd_install(err, file); diff --git a/fs/init.c b/fs/init.c index 02723bea8499..e9c320a48cf1 100644 --- a/fs/init.c +++ b/fs/init.c @@ -49,7 +49,7 @@ int __init init_chdir(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = path_permission(&path, MAY_EXEC | MAY_CHDIR); + error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &path); path_put(&path); @@ -64,7 +64,7 @@ int __init init_chroot(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &path); if (error) return error; - error = path_permission(&path, MAY_EXEC | MAY_CHDIR); + error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; error = -EPERM; @@ -118,7 +118,7 @@ int __init init_eaccess(const char *filename) error = kern_path(filename, LOOKUP_FOLLOW, &path); if (error) return error; - error = path_permission(&path, MAY_ACCESS); + error = inode_permission(d_inode(path.dentry), MAY_ACCESS); path_put(&path); return error; } diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c index 8161667c976f..7df6324ccb8a 100644 --- a/fs/lockd/clnt4xdr.c +++ b/fs/lockd/clnt4xdr.c @@ -261,6 +261,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) u32 exclusive; int error; __be32 *p; + s32 end; memset(lock, 0, sizeof(*lock)); locks_init_lock(fl); @@ -284,7 +285,13 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result) fl->fl_type = exclusive != 0 ? F_WRLCK : F_RDLCK; p = xdr_decode_hyper(p, &l_offset); xdr_decode_hyper(p, &l_len); - nlm4svc_set_file_lock_range(fl, l_offset, l_len); + end = l_offset + l_len - 1; + + fl->fl_start = (loff_t)l_offset; + if (l_len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; + else + fl->fl_end = (loff_t)end; error = 0; out: return error; diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c index 99fffc9cb958..b11f2afa84f1 100644 --- a/fs/lockd/clntproc.c +++ b/fs/lockd/clntproc.c @@ -794,6 +794,9 @@ static void nlmclnt_cancel_callback(struct rpc_task *task, void *data) goto retry_cancel; } + dprintk("lockd: cancel status %u (task %u)\n", + status, task->tk_pid); + switch (status) { case NLM_LCK_GRANTED: case NLM_LCK_DENIED_GRACE_PERIOD: diff --git a/fs/lockd/host.c b/fs/lockd/host.c index cdc8e12cdac4..771c289f6df7 100644 --- a/fs/lockd/host.c +++ b/fs/lockd/host.c @@ -163,8 +163,8 @@ static struct nlm_host *nlm_alloc_host(struct nlm_lookup_host_info *ni, host->h_nsmhandle = nsm; host->h_addrbuf = nsm->sm_addrbuf; host->net = ni->net; - host->h_cred = get_cred(ni->cred); - strscpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); + host->h_cred = get_cred(ni->cred), + strlcpy(host->nodename, utsname()->nodename, sizeof(host->nodename)); out: return host; diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index 5579e67da17d..1a639e34847d 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -54,9 +54,13 @@ EXPORT_SYMBOL_GPL(nlmsvc_ops); static DEFINE_MUTEX(nlmsvc_mutex); static unsigned int nlmsvc_users; -static struct svc_serv *nlmsvc_serv; +static struct task_struct *nlmsvc_task; +static struct svc_rqst *nlmsvc_rqst; unsigned long nlmsvc_timeout; +static atomic_t nlm_ntf_refcnt = ATOMIC_INIT(0); +static DECLARE_WAIT_QUEUE_HEAD(nlm_ntf_wq); + unsigned int lockd_net_id; /* @@ -180,10 +184,6 @@ lockd(void *vrqstp) nlm_shutdown_hosts(); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); - - dprintk("lockd_down: service stopped\n"); - - svc_exit_thread(rqstp); return 0; } @@ -196,8 +196,8 @@ static int create_lockd_listener(struct svc_serv *serv, const char *name, xprt = svc_find_xprt(serv, name, net, family, 0); if (xprt == NULL) - return svc_xprt_create(serv, name, net, family, port, - SVC_SOCK_DEFAULTS, cred); + return svc_create_xprt(serv, name, net, family, port, + SVC_SOCK_DEFAULTS, cred); svc_xprt_put(xprt); return 0; } @@ -247,8 +247,7 @@ static int make_socks(struct svc_serv *serv, struct net *net, if (warned++ == 0) printk(KERN_WARNING "lockd_up: makesock failed, error=%d\n", err); - svc_xprt_destroy_all(serv, net); - svc_rpcb_cleanup(serv, net); + svc_shutdown_net(serv, net); return err; } @@ -286,12 +285,13 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) nlm_shutdown_hosts_net(net); cancel_delayed_work_sync(&ln->grace_period_end); locks_end_grace(&ln->lockd_manager); - svc_xprt_destroy_all(serv, net); - svc_rpcb_cleanup(serv, net); + svc_shutdown_net(serv, net); + dprintk("%s: per-net data destroyed; net=%x\n", + __func__, net->ns.inum); } } else { - pr_err("%s: no users! net=%x\n", - __func__, net->ns.inum); + pr_err("%s: no users! task=%p, net=%x\n", + __func__, nlmsvc_task, net->ns.inum); BUG(); } } @@ -302,16 +302,20 @@ static int lockd_inetaddr_event(struct notifier_block *this, struct in_ifaddr *ifa = (struct in_ifaddr *)ptr; struct sockaddr_in sin; - if (event != NETDEV_DOWN) + if ((event != NETDEV_DOWN) || + !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out; - if (nlmsvc_serv) { + if (nlmsvc_rqst) { dprintk("lockd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; - svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin); + svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, + (struct sockaddr *)&sin); } + atomic_dec(&nlm_ntf_refcnt); + wake_up(&nlm_ntf_wq); out: return NOTIFY_DONE; @@ -328,17 +332,21 @@ static int lockd_inet6addr_event(struct notifier_block *this, struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr; struct sockaddr_in6 sin6; - if (event != NETDEV_DOWN) + if ((event != NETDEV_DOWN) || + !atomic_inc_not_zero(&nlm_ntf_refcnt)) goto out; - if (nlmsvc_serv) { + if (nlmsvc_rqst) { dprintk("lockd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; sin6.sin6_addr = ifa->addr; if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) sin6.sin6_scope_id = ifa->idev->dev->ifindex; - svc_age_temp_xprts_now(nlmsvc_serv, (struct sockaddr *)&sin6); + svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, + (struct sockaddr *)&sin6); } + atomic_dec(&nlm_ntf_refcnt); + wake_up(&nlm_ntf_wq); out: return NOTIFY_DONE; @@ -349,14 +357,86 @@ static struct notifier_block lockd_inet6addr_notifier = { }; #endif -static int lockd_get(void) +static void lockd_unregister_notifiers(void) +{ + unregister_inetaddr_notifier(&lockd_inetaddr_notifier); +#if IS_ENABLED(CONFIG_IPV6) + unregister_inet6addr_notifier(&lockd_inet6addr_notifier); +#endif + wait_event(nlm_ntf_wq, atomic_read(&nlm_ntf_refcnt) == 0); +} + +static void lockd_svc_exit_thread(void) +{ + atomic_dec(&nlm_ntf_refcnt); + lockd_unregister_notifiers(); + svc_exit_thread(nlmsvc_rqst); +} + +static int lockd_start_svc(struct svc_serv *serv) { - struct svc_serv *serv; int error; - if (nlmsvc_serv) { - nlmsvc_users++; + if (nlmsvc_rqst) return 0; + + /* + * Create the kernel thread and wait for it to start. + */ + nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE); + if (IS_ERR(nlmsvc_rqst)) { + error = PTR_ERR(nlmsvc_rqst); + printk(KERN_WARNING + "lockd_up: svc_rqst allocation failed, error=%d\n", + error); + lockd_unregister_notifiers(); + goto out_rqst; + } + + atomic_inc(&nlm_ntf_refcnt); + svc_sock_update_bufs(serv); + serv->sv_maxconn = nlm_max_connections; + + nlmsvc_task = kthread_create(lockd, nlmsvc_rqst, "%s", serv->sv_name); + if (IS_ERR(nlmsvc_task)) { + error = PTR_ERR(nlmsvc_task); + printk(KERN_WARNING + "lockd_up: kthread_run failed, error=%d\n", error); + goto out_task; + } + nlmsvc_rqst->rq_task = nlmsvc_task; + wake_up_process(nlmsvc_task); + + dprintk("lockd_up: service started\n"); + return 0; + +out_task: + lockd_svc_exit_thread(); + nlmsvc_task = NULL; +out_rqst: + nlmsvc_rqst = NULL; + return error; +} + +static const struct svc_serv_ops lockd_sv_ops = { + .svo_shutdown = svc_rpcb_cleanup, + .svo_enqueue_xprt = svc_xprt_do_enqueue, +}; + +static struct svc_serv *lockd_create_svc(void) +{ + struct svc_serv *serv; + + /* + * Check whether we're already up and running. + */ + if (nlmsvc_rqst) { + /* + * Note: increase service usage, because later in case of error + * svc_destroy() will be called. + */ + svc_get(nlmsvc_rqst->rq_server); + return nlmsvc_rqst->rq_server; } /* @@ -371,44 +451,17 @@ static int lockd_get(void) nlm_timeout = LOCKD_DFLT_TIMEO; nlmsvc_timeout = nlm_timeout * HZ; - serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, lockd); + serv = svc_create(&nlmsvc_program, LOCKD_BUFSIZE, &lockd_sv_ops); if (!serv) { printk(KERN_WARNING "lockd_up: create service failed\n"); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } - - serv->sv_maxconn = nlm_max_connections; - error = svc_set_num_threads(serv, NULL, 1); - /* The thread now holds the only reference */ - svc_put(serv); - if (error < 0) - return error; - - nlmsvc_serv = serv; register_inetaddr_notifier(&lockd_inetaddr_notifier); #if IS_ENABLED(CONFIG_IPV6) register_inet6addr_notifier(&lockd_inet6addr_notifier); #endif dprintk("lockd_up: service created\n"); - nlmsvc_users++; - return 0; -} - -static void lockd_put(void) -{ - if (WARN(nlmsvc_users <= 0, "lockd_down: no users!\n")) - return; - if (--nlmsvc_users) - return; - - unregister_inetaddr_notifier(&lockd_inetaddr_notifier); -#if IS_ENABLED(CONFIG_IPV6) - unregister_inet6addr_notifier(&lockd_inet6addr_notifier); -#endif - - svc_set_num_threads(nlmsvc_serv, NULL, 0); - nlmsvc_serv = NULL; - dprintk("lockd_down: service destroyed\n"); + return serv; } /* @@ -416,21 +469,36 @@ static void lockd_put(void) */ int lockd_up(struct net *net, const struct cred *cred) { + struct svc_serv *serv; int error; mutex_lock(&nlmsvc_mutex); - error = lockd_get(); - if (error) - goto err; - - error = lockd_up_net(nlmsvc_serv, net, cred); - if (error < 0) { - lockd_put(); - goto err; + serv = lockd_create_svc(); + if (IS_ERR(serv)) { + error = PTR_ERR(serv); + goto err_create; } -err: + error = lockd_up_net(serv, net, cred); + if (error < 0) { + lockd_unregister_notifiers(); + goto err_put; + } + + error = lockd_start_svc(serv); + if (error < 0) { + lockd_down_net(serv, net); + goto err_put; + } + nlmsvc_users++; + /* + * Note: svc_serv structures have an initial use count of 1, + * so we exit through here on both success and failure. + */ +err_put: + svc_destroy(serv); +err_create: mutex_unlock(&nlmsvc_mutex); return error; } @@ -443,8 +511,27 @@ void lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); - lockd_down_net(nlmsvc_serv, net); - lockd_put(); + lockd_down_net(nlmsvc_rqst->rq_server, net); + if (nlmsvc_users) { + if (--nlmsvc_users) + goto out; + } else { + printk(KERN_ERR "lockd_down: no users! task=%p\n", + nlmsvc_task); + BUG(); + } + + if (!nlmsvc_task) { + printk(KERN_ERR "lockd_down: no lockd running.\n"); + BUG(); + } + kthread_stop(nlmsvc_task); + dprintk("lockd_down: service stopped\n"); + lockd_svc_exit_thread(); + dprintk("lockd_down: service destroyed\n"); + nlmsvc_task = NULL; + nlmsvc_rqst = NULL; +out: mutex_unlock(&nlmsvc_mutex); } EXPORT_SYMBOL_GPL(lockd_down); @@ -497,7 +584,7 @@ static struct ctl_table nlm_sysctls[] = { .data = &nsm_use_hostnames, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dobool, + .proc_handler = proc_dointvec, }, { .procname = "nsm_local_state", @@ -562,7 +649,6 @@ static int lockd_authenticate(struct svc_rqst *rqstp) switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: case RPC_AUTH_UNIX: - rqstp->rq_auth_stat = rpc_auth_ok; if (rqstp->rq_proc == 0) return SVC_OK; if (is_callback(rqstp->rq_proc)) { @@ -573,7 +659,6 @@ static int lockd_authenticate(struct svc_rqst *rqstp) } return svc_set_client(rqstp); } - rqstp->rq_auth_stat = rpc_autherr_badcred; return SVC_DENIED; } @@ -681,44 +766,6 @@ static void __exit exit_nlm(void) module_init(init_nlm); module_exit(exit_nlm); -/** - * nlmsvc_dispatch - Process an NLM Request - * @rqstp: incoming request - * @statp: pointer to location of accept_stat field in RPC Reply buffer - * - * Return values: - * %0: Processing complete; do not send a Reply - * %1: Processing complete; send Reply in rqstp->rq_res - */ -static int nlmsvc_dispatch(struct svc_rqst *rqstp, __be32 *statp) -{ - const struct svc_procedure *procp = rqstp->rq_procinfo; - - svcxdr_init_decode(rqstp); - if (!procp->pc_decode(rqstp, &rqstp->rq_arg_stream)) - goto out_decode_err; - - *statp = procp->pc_func(rqstp); - if (*statp == rpc_drop_reply) - return 0; - if (*statp != rpc_success) - return 1; - - svcxdr_init_encode(rqstp); - if (!procp->pc_encode(rqstp, &rqstp->rq_res_stream)) - goto out_encode_err; - - return 1; - -out_decode_err: - *statp = rpc_garbage_args; - return 1; - -out_encode_err: - *statp = rpc_system_err; - return 1; -} - /* * Define NLM program and procedures */ @@ -728,7 +775,6 @@ static const struct svc_version nlmsvc_version1 = { .vs_nproc = 17, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version1_count, - .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; static unsigned int nlmsvc_version3_count[24]; @@ -737,7 +783,6 @@ static const struct svc_version nlmsvc_version3 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures, .vs_count = nlmsvc_version3_count, - .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #ifdef CONFIG_LOCKD_V4 @@ -747,7 +792,6 @@ static const struct svc_version nlmsvc_version4 = { .vs_nproc = 24, .vs_proc = nlmsvc_procedures4, .vs_count = nlmsvc_version4_count, - .vs_dispatch = nlmsvc_dispatch, .vs_xdrsize = NLMSVC_XDRSIZE, }; #endif diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c index b72023a6b4c1..fa41dda39925 100644 --- a/fs/lockd/svc4proc.c +++ b/fs/lockd/svc4proc.c @@ -32,10 +32,6 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, if (!nlmsvc_ops) return nlm_lck_denied_nolocks; - if (lock->lock_start > OFFSET_MAX || - (lock->lock_len && ((lock->lock_len - 1) > (OFFSET_MAX - lock->lock_start)))) - return nlm4_fbig; - /* Obtain host handle */ if (!(host = nlmsvc_lookup_host(rqstp, lock->caller, lock->len)) || (argp->monitor && nsm_monitor(host) < 0)) @@ -44,21 +40,13 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - int mode = lock_to_openmode(&lock->fl); - - error = nlm_lookup_file(rqstp, &file, lock); - if (error) + if ((error = nlm_lookup_file(rqstp, &file, &lock->fh)) != 0) goto no_locks; *filp = file; /* Set up the missing parts of the file_lock structure */ - lock->fl.fl_flags = FL_POSIX; - lock->fl.fl_file = file->f_file[mode]; + lock->fl.fl_file = file->f_file; lock->fl.fl_pid = current->tgid; - lock->fl.fl_start = (loff_t)lock->lock_start; - lock->fl.fl_end = lock->lock_len ? - (loff_t)(lock->lock_start + lock->lock_len - 1) : - OFFSET_MAX; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); if (!lock->fl.fl_owner) { @@ -96,7 +84,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; - struct nlm_lockowner *test_owner; __be32 rc = rpc_success; dprintk("lockd: TEST4 called\n"); @@ -106,7 +93,6 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success; - test_owner = argp->lock.fl.fl_owner; /* Now check for conflicting locks */ resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie); if (resp->status == nlm_drop_reply) @@ -114,7 +100,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) else dprintk("lockd: TEST4 status %d\n", ntohl(resp->status)); - nlmsvc_put_lockowner(test_owner); + nlmsvc_release_lockowner(&argp->lock); nlmsvc_release_host(host); nlm_release_file(file); return rc; @@ -280,6 +266,8 @@ nlm4svc_proc_granted(struct svc_rqst *rqstp) */ static void nlm4svc_callback_exit(struct rpc_task *task, void *data) { + dprintk("lockd: %5u callback returned %d\n", task->tk_pid, + -task->tk_status); } static void nlm4svc_callback_release(void *data) @@ -522,239 +510,191 @@ const struct svc_procedure nlmsvc_procedures4[24] = { .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlm4svc_proc_test, .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_testres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, - .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlm4svc_proc_lock, .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlm4svc_proc_cancel, .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlm4svc_proc_unlock, .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlm4svc_proc_granted, .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlm4svc_proc_test_msg, .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlm4svc_proc_lock_msg, .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlm4svc_proc_cancel_msg, .pc_decode = nlm4svc_decode_cancargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlm4svc_proc_unlock_msg, .pc_decode = nlm4svc_decode_unlockargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlm4svc_proc_granted_msg, .pc_decode = nlm4svc_decode_testargs, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlm4svc_proc_null, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlm4svc_proc_null, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlm4svc_proc_null, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlm4svc_proc_null, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlm4svc_proc_granted_res, .pc_decode = nlm4svc_decode_res, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlm4svc_proc_sm_notify, .pc_decode = nlm4svc_decode_reboot, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), - .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlm4svc_proc_unused, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, - .pc_name = "UNUSED", }, [18] = { .pc_func = nlm4svc_proc_unused, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, - .pc_name = "UNUSED", }, [19] = { .pc_func = nlm4svc_proc_unused, .pc_decode = nlm4svc_decode_void, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, - .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlm4svc_proc_share, .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, - .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlm4svc_proc_unshare, .pc_decode = nlm4svc_decode_shareargs, .pc_encode = nlm4svc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, - .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlm4svc_proc_nm_lock, .pc_decode = nlm4svc_decode_lockargs, .pc_encode = nlm4svc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlm4svc_proc_free_all, .pc_decode = nlm4svc_decode_notify, .pc_encode = nlm4svc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "FREE_ALL", }, }; diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c index 4e30f3c50970..273a81971ed5 100644 --- a/fs/lockd/svclock.c +++ b/fs/lockd/svclock.c @@ -31,7 +31,6 @@ #include #include #include -#include #define NLMDBG_FACILITY NLMDBG_SVCLOCK @@ -340,7 +339,7 @@ nlmsvc_get_lockowner(struct nlm_lockowner *lockowner) return lockowner; } -void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) +static void nlmsvc_put_lockowner(struct nlm_lockowner *lockowner) { if (!refcount_dec_and_lock(&lockowner->count, &lockowner->host->h_lock)) return; @@ -470,27 +469,18 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_host *host, struct nlm_lock *lock, int wait, struct nlm_cookie *cookie, int reclaim) { -#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) - struct inode *inode = nlmsvc_file_inode(file); -#endif struct nlm_block *block = NULL; int error; - int mode; - int async_block = 0; __be32 ret; dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n", - inode->i_sb->s_id, inode->i_ino, + locks_inode(file->f_file)->i_sb->s_id, + locks_inode(file->f_file)->i_ino, lock->fl.fl_type, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end, wait); - if (nlmsvc_file_file(file)->f_op->lock) { - async_block = wait; - wait = 0; - } - /* Lock file against concurrent access */ mutex_lock(&file->f_mutex); /* Get existing block (in case client is busy-waiting) @@ -534,8 +524,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, if (!wait) lock->fl.fl_flags &= ~FL_SLEEP; - mode = lock_to_openmode(&lock->fl); - error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); + error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP; dprintk("lockd: vfs_lock_file returned %d\n", error); @@ -551,7 +540,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file, */ if (wait) break; - ret = async_block ? nlm_lck_blocked : nlm_lck_denied; + ret = nlm_lck_denied; goto out; case FILE_LOCK_DEFERRED: if (wait) @@ -588,12 +577,12 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, struct nlm_lock *conflock, struct nlm_cookie *cookie) { int error; - int mode; __be32 ret; + struct nlm_lockowner *test_owner; dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n", - nlmsvc_file_inode(file)->i_sb->s_id, - nlmsvc_file_inode(file)->i_ino, + locks_inode(file->f_file)->i_sb->s_id, + locks_inode(file->f_file)->i_ino, lock->fl.fl_type, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -603,8 +592,10 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, goto out; } - mode = lock_to_openmode(&lock->fl); - error = vfs_test_lock(file->f_file[mode], &lock->fl); + /* If there's a conflicting lock, remember to clean up the test lock */ + test_owner = (struct nlm_lockowner *)lock->fl.fl_owner; + + error = vfs_test_lock(file->f_file, &lock->fl); if (error) { /* We can't currently deal with deferred test requests */ if (error == FILE_LOCK_DEFERRED) @@ -631,6 +622,10 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, conflock->fl.fl_end = lock->fl.fl_end; locks_release_private(&lock->fl); + /* Clean up the test lock */ + lock->fl.fl_owner = NULL; + nlmsvc_put_lockowner(test_owner); + ret = nlm_lck_denied; out: return ret; @@ -646,11 +641,11 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file, __be32 nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) { - int error = 0; + int error; dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n", - nlmsvc_file_inode(file)->i_sb->s_id, - nlmsvc_file_inode(file)->i_ino, + locks_inode(file->f_file)->i_sb->s_id, + locks_inode(file->f_file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -659,14 +654,7 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock) nlmsvc_cancel_blocked(net, file, lock); lock->fl.fl_type = F_UNLCK; - lock->fl.fl_file = file->f_file[O_RDONLY]; - if (lock->fl.fl_file) - error = vfs_lock_file(lock->fl.fl_file, F_SETLK, - &lock->fl, NULL); - lock->fl.fl_file = file->f_file[O_WRONLY]; - if (lock->fl.fl_file) - error |= vfs_lock_file(lock->fl.fl_file, F_SETLK, - &lock->fl, NULL); + error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); return (error < 0)? nlm_lck_denied_nolocks : nlm_granted; } @@ -683,11 +671,10 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l { struct nlm_block *block; int status = 0; - int mode; dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n", - nlmsvc_file_inode(file)->i_sb->s_id, - nlmsvc_file_inode(file)->i_ino, + locks_inode(file->f_file)->i_sb->s_id, + locks_inode(file->f_file)->i_ino, lock->fl.fl_pid, (long long)lock->fl.fl_start, (long long)lock->fl.fl_end); @@ -699,10 +686,8 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l block = nlmsvc_lookup_block(file, lock); mutex_unlock(&file->f_mutex); if (block != NULL) { - struct file_lock *fl = &block->b_call->a_args.lock.fl; - - mode = lock_to_openmode(fl); - vfs_cancel_lock(block->b_file->f_file[mode], fl); + vfs_cancel_lock(block->b_file->f_file, + &block->b_call->a_args.lock.fl); status = nlmsvc_unlink_block(block); nlmsvc_release_block(block); } @@ -818,7 +803,6 @@ nlmsvc_grant_blocked(struct nlm_block *block) { struct nlm_file *file = block->b_file; struct nlm_lock *lock = &block->b_call->a_args.lock; - int mode; int error; loff_t fl_start, fl_end; @@ -844,8 +828,7 @@ nlmsvc_grant_blocked(struct nlm_block *block) lock->fl.fl_flags |= FL_SLEEP; fl_start = lock->fl.fl_start; fl_end = lock->fl.fl_end; - mode = lock_to_openmode(&lock->fl); - error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL); + error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL); lock->fl.fl_flags &= ~FL_SLEEP; lock->fl.fl_start = fl_start; lock->fl.fl_end = fl_end; diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c index 32784f508c81..50855f2c1f4b 100644 --- a/fs/lockd/svcproc.c +++ b/fs/lockd/svcproc.c @@ -55,7 +55,6 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, struct nlm_host *host = NULL; struct nlm_file *file = NULL; struct nlm_lock *lock = &argp->lock; - int mode; __be32 error = 0; /* nfsd callbacks must have been installed for this procedure */ @@ -70,15 +69,13 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp, /* Obtain file pointer. Not used by FREE_ALL call. */ if (filp != NULL) { - error = cast_status(nlm_lookup_file(rqstp, &file, lock)); + error = cast_status(nlm_lookup_file(rqstp, &file, &lock->fh)); if (error != 0) goto no_locks; *filp = file; /* Set up the missing parts of the file_lock structure */ - mode = lock_to_openmode(&lock->fl); - lock->fl.fl_flags = FL_POSIX; - lock->fl.fl_file = file->f_file[mode]; + lock->fl.fl_file = file->f_file; lock->fl.fl_pid = current->tgid; lock->fl.fl_lmops = &nlmsvc_lock_operations; nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid); @@ -117,7 +114,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) struct nlm_args *argp = rqstp->rq_argp; struct nlm_host *host; struct nlm_file *file; - struct nlm_lockowner *test_owner; __be32 rc = rpc_success; dprintk("lockd: TEST called\n"); @@ -127,8 +123,6 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file))) return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success; - test_owner = argp->lock.fl.fl_owner; - /* Now check for conflicting locks */ resp->status = cast_status(nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie)); if (resp->status == nlm_drop_reply) @@ -137,7 +131,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp) dprintk("lockd: TEST status %d vers %d\n", ntohl(resp->status), rqstp->rq_vers); - nlmsvc_put_lockowner(test_owner); + nlmsvc_release_lockowner(&argp->lock); nlmsvc_release_host(host); nlm_release_file(file); return rc; @@ -305,6 +299,8 @@ nlmsvc_proc_granted(struct svc_rqst *rqstp) */ static void nlmsvc_callback_exit(struct rpc_task *task, void *data) { + dprintk("lockd: %5u callback returned %d\n", task->tk_pid, + -task->tk_status); } void nlmsvc_release_call(struct nlm_rqst *call) @@ -556,239 +552,191 @@ const struct svc_procedure nlmsvc_procedures[24] = { .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "NULL", }, [NLMPROC_TEST] = { .pc_func = nlmsvc_proc_test, .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_testres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+2+No+Rg, - .pc_name = "TEST", }, [NLMPROC_LOCK] = { .pc_func = nlmsvc_proc_lock, .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "LOCK", }, [NLMPROC_CANCEL] = { .pc_func = nlmsvc_proc_cancel, .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "CANCEL", }, [NLMPROC_UNLOCK] = { .pc_func = nlmsvc_proc_unlock, .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "UNLOCK", }, [NLMPROC_GRANTED] = { .pc_func = nlmsvc_proc_granted, .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "GRANTED", }, [NLMPROC_TEST_MSG] = { .pc_func = nlmsvc_proc_test_msg, .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "TEST_MSG", }, [NLMPROC_LOCK_MSG] = { .pc_func = nlmsvc_proc_lock_msg, .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "LOCK_MSG", }, [NLMPROC_CANCEL_MSG] = { .pc_func = nlmsvc_proc_cancel_msg, .pc_decode = nlmsvc_decode_cancargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "CANCEL_MSG", }, [NLMPROC_UNLOCK_MSG] = { .pc_func = nlmsvc_proc_unlock_msg, .pc_decode = nlmsvc_decode_unlockargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNLOCK_MSG", }, [NLMPROC_GRANTED_MSG] = { .pc_func = nlmsvc_proc_granted_msg, .pc_decode = nlmsvc_decode_testargs, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "GRANTED_MSG", }, [NLMPROC_TEST_RES] = { .pc_func = nlmsvc_proc_null, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "TEST_RES", }, [NLMPROC_LOCK_RES] = { .pc_func = nlmsvc_proc_null, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "LOCK_RES", }, [NLMPROC_CANCEL_RES] = { .pc_func = nlmsvc_proc_null, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "CANCEL_RES", }, [NLMPROC_UNLOCK_RES] = { .pc_func = nlmsvc_proc_null, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNLOCK_RES", }, [NLMPROC_GRANTED_RES] = { .pc_func = nlmsvc_proc_granted_res, .pc_decode = nlmsvc_decode_res, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_res), - .pc_argzero = sizeof(struct nlm_res), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "GRANTED_RES", }, [NLMPROC_NSM_NOTIFY] = { .pc_func = nlmsvc_proc_sm_notify, .pc_decode = nlmsvc_decode_reboot, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_reboot), - .pc_argzero = sizeof(struct nlm_reboot), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "SM_NOTIFY", }, [17] = { .pc_func = nlmsvc_proc_unused, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNUSED", }, [18] = { .pc_func = nlmsvc_proc_unused, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNUSED", }, [19] = { .pc_func = nlmsvc_proc_unused, .pc_decode = nlmsvc_decode_void, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_void), - .pc_argzero = sizeof(struct nlm_void), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = St, - .pc_name = "UNUSED", }, [NLMPROC_SHARE] = { .pc_func = nlmsvc_proc_share, .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, - .pc_name = "SHARE", }, [NLMPROC_UNSHARE] = { .pc_func = nlmsvc_proc_unshare, .pc_decode = nlmsvc_decode_shareargs, .pc_encode = nlmsvc_encode_shareres, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St+1, - .pc_name = "UNSHARE", }, [NLMPROC_NM_LOCK] = { .pc_func = nlmsvc_proc_nm_lock, .pc_decode = nlmsvc_decode_lockargs, .pc_encode = nlmsvc_encode_res, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_res), .pc_xdrressize = Ck+St, - .pc_name = "NM_LOCK", }, [NLMPROC_FREE_ALL] = { .pc_func = nlmsvc_proc_free_all, .pc_decode = nlmsvc_decode_notify, .pc_encode = nlmsvc_encode_void, .pc_argsize = sizeof(struct nlm_args), - .pc_argzero = sizeof(struct nlm_args), .pc_ressize = sizeof(struct nlm_void), .pc_xdrressize = 0, - .pc_name = "FREE_ALL", }, }; diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c index e3b6229e7ae5..028fc152da22 100644 --- a/fs/lockd/svcsubs.c +++ b/fs/lockd/svcsubs.c @@ -45,7 +45,7 @@ static inline void nlm_debug_print_fh(char *msg, struct nfs_fh *f) static inline void nlm_debug_print_file(char *msg, struct nlm_file *file) { - struct inode *inode = nlmsvc_file_inode(file); + struct inode *inode = locks_inode(file->f_file); dprintk("lockd: %s %s/%ld\n", msg, inode->i_sb->s_id, inode->i_ino); @@ -71,75 +71,56 @@ static inline unsigned int file_hash(struct nfs_fh *f) return tmp & (FILE_NRHASH - 1); } -int lock_to_openmode(struct file_lock *lock) -{ - return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY; -} - -/* - * Open the file. Note that if we're reexporting, for example, - * this could block the lockd thread for a while. - * - * We have to make sure we have the right credential to open - * the file. - */ -static __be32 nlm_do_fopen(struct svc_rqst *rqstp, - struct nlm_file *file, int mode) -{ - struct file **fp = &file->f_file[mode]; - __be32 nfserr; - - if (*fp) - return 0; - nfserr = nlmsvc_ops->fopen(rqstp, &file->f_handle, fp, mode); - if (nfserr) - dprintk("lockd: open failed (error %d)\n", nfserr); - return nfserr; -} - /* * Lookup file info. If it doesn't exist, create a file info struct * and open a (VFS) file for the given inode. + * + * FIXME: + * Note that we open the file O_RDONLY even when creating write locks. + * This is not quite right, but for now, we assume the client performs + * the proper R/W checking. */ __be32 nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, - struct nlm_lock *lock) + struct nfs_fh *f) { struct nlm_file *file; unsigned int hash; __be32 nfserr; - int mode; - nlm_debug_print_fh("nlm_lookup_file", &lock->fh); + nlm_debug_print_fh("nlm_lookup_file", f); - hash = file_hash(&lock->fh); - mode = lock_to_openmode(&lock->fl); + hash = file_hash(f); /* Lock file table */ mutex_lock(&nlm_file_mutex); hlist_for_each_entry(file, &nlm_files[hash], f_list) - if (!nfs_compare_fh(&file->f_handle, &lock->fh)) { - mutex_lock(&file->f_mutex); - nfserr = nlm_do_fopen(rqstp, file, mode); - mutex_unlock(&file->f_mutex); + if (!nfs_compare_fh(&file->f_handle, f)) goto found; - } - nlm_debug_print_fh("creating file for", &lock->fh); + + nlm_debug_print_fh("creating file for", f); nfserr = nlm_lck_denied_nolocks; file = kzalloc(sizeof(*file), GFP_KERNEL); if (!file) - goto out_free; + goto out_unlock; - memcpy(&file->f_handle, &lock->fh, sizeof(struct nfs_fh)); + memcpy(&file->f_handle, f, sizeof(struct nfs_fh)); mutex_init(&file->f_mutex); INIT_HLIST_NODE(&file->f_list); INIT_LIST_HEAD(&file->f_blocks); - nfserr = nlm_do_fopen(rqstp, file, mode); - if (nfserr) - goto out_unlock; + /* Open the file. Note that this must not sleep for too long, else + * we would lock up lockd:-) So no NFS re-exports, folks. + * + * We have to make sure we have the right credential to open + * the file. + */ + if ((nfserr = nlmsvc_ops->fopen(rqstp, f, &file->f_file)) != 0) { + dprintk("lockd: open failed (error %d)\n", nfserr); + goto out_free; + } hlist_add_head(&file->f_list, &nlm_files[hash]); @@ -147,6 +128,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result, dprintk("lockd: found file %p (count %d)\n", file, file->f_count); *result = file; file->f_count++; + nfserr = 0; out_unlock: mutex_unlock(&nlm_file_mutex); @@ -166,40 +148,13 @@ nlm_delete_file(struct nlm_file *file) nlm_debug_print_file("closing file", file); if (!hlist_unhashed(&file->f_list)) { hlist_del(&file->f_list); - if (file->f_file[O_RDONLY]) - nlmsvc_ops->fclose(file->f_file[O_RDONLY]); - if (file->f_file[O_WRONLY]) - nlmsvc_ops->fclose(file->f_file[O_WRONLY]); + nlmsvc_ops->fclose(file->f_file); kfree(file); } else { printk(KERN_WARNING "lockd: attempt to release unknown file!\n"); } } -static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl) -{ - struct file_lock lock; - - locks_init_lock(&lock); - lock.fl_type = F_UNLCK; - lock.fl_start = 0; - lock.fl_end = OFFSET_MAX; - lock.fl_owner = fl->fl_owner; - lock.fl_pid = fl->fl_pid; - lock.fl_flags = FL_POSIX; - - lock.fl_file = file->f_file[O_RDONLY]; - if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) - goto out_err; - lock.fl_file = file->f_file[O_WRONLY]; - if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL)) - goto out_err; - return 0; -out_err: - pr_warn("lockd: unlock failure in %s:%d\n", __FILE__, __LINE__); - return 1; -} - /* * Loop over all locks on the given file and perform the specified * action. @@ -210,7 +165,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = locks_inode_context(inode); + struct file_lock_context *flctx = inode->i_flctx; struct nlm_host *lockhost; if (!flctx || list_empty_careful(&flctx->flc_posix)) @@ -227,10 +182,17 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file, lockhost = ((struct nlm_lockowner *)fl->fl_owner)->host; if (match(lockhost, host)) { + struct file_lock lock = *fl; spin_unlock(&flctx->flc_lock); - if (nlm_unlock_files(file, fl)) + lock.fl_type = F_UNLCK; + lock.fl_start = 0; + lock.fl_end = OFFSET_MAX; + if (vfs_lock_file(file->f_file, F_SETLK, &lock, NULL) < 0) { + printk("lockd: unlock failure in %s:%d\n", + __FILE__, __LINE__); return 1; + } goto again; } } @@ -265,7 +227,7 @@ nlm_file_inuse(struct nlm_file *file) { struct inode *inode = nlmsvc_file_inode(file); struct file_lock *fl; - struct file_lock_context *flctx = locks_inode_context(inode); + struct file_lock_context *flctx = inode->i_flctx; if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares) return 1; @@ -284,14 +246,6 @@ nlm_file_inuse(struct nlm_file *file) return 0; } -static void nlm_close_files(struct nlm_file *file) -{ - if (file->f_file[O_RDONLY]) - nlmsvc_ops->fclose(file->f_file[O_RDONLY]); - if (file->f_file[O_WRONLY]) - nlmsvc_ops->fclose(file->f_file[O_WRONLY]); -} - /* * Loop over all files in the file table. */ @@ -322,7 +276,7 @@ nlm_traverse_files(void *data, nlm_host_match_fn_t match, if (list_empty(&file->f_blocks) && !file->f_locks && !file->f_shares && !file->f_count) { hlist_del(&file->f_list); - nlm_close_files(file); + nlmsvc_ops->fclose(file->f_file); kfree(file); } } @@ -456,13 +410,12 @@ nlmsvc_invalidate_all(void) nlm_traverse_files(NULL, nlmsvc_is_client, NULL); } - static int nlmsvc_match_sb(void *datap, struct nlm_file *file) { struct super_block *sb = datap; - return sb == nlmsvc_file_inode(file)->i_sb; + return sb == locks_inode(file->f_file)->i_sb; } /** diff --git a/fs/lockd/svcxdr.h b/fs/lockd/svcxdr.h deleted file mode 100644 index 4f1a451da5ba..000000000000 --- a/fs/lockd/svcxdr.h +++ /dev/null @@ -1,142 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Encode/decode NLM basic data types - * - * Basic NLMv3 XDR data types are not defined in an IETF standards - * document. X/Open has a description of these data types that - * is useful. See Chapter 10 of "Protocols for Interworking: - * XNFS, Version 3W". - * - * Basic NLMv4 XDR data types are defined in Appendix II.1.4 of - * RFC 1813: "NFS Version 3 Protocol Specification". - * - * Author: Chuck Lever - * - * Copyright (c) 2020, Oracle and/or its affiliates. - */ - -#ifndef _LOCKD_SVCXDR_H_ -#define _LOCKD_SVCXDR_H_ - -static inline bool -svcxdr_decode_stats(struct xdr_stream *xdr, __be32 *status) -{ - __be32 *p; - - p = xdr_inline_decode(xdr, XDR_UNIT); - if (!p) - return false; - *status = *p; - - return true; -} - -static inline bool -svcxdr_encode_stats(struct xdr_stream *xdr, __be32 status) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT); - if (!p) - return false; - *p = status; - - return true; -} - -static inline bool -svcxdr_decode_string(struct xdr_stream *xdr, char **data, unsigned int *data_len) -{ - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > NLM_MAXSTRLEN) - return false; - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - *data_len = len; - *data = (char *)p; - - return true; -} - -/* - * NLM cookies are defined by specification to be a variable-length - * XDR opaque no longer than 1024 bytes. However, this implementation - * limits their length to 32 bytes, and treats zero-length cookies - * specially. - */ -static inline bool -svcxdr_decode_cookie(struct xdr_stream *xdr, struct nlm_cookie *cookie) -{ - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > NLM_MAXCOOKIELEN) - return false; - if (!len) - goto out_hpux; - - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - cookie->len = len; - memcpy(cookie->data, p, len); - - return true; - - /* apparently HPUX can return empty cookies */ -out_hpux: - cookie->len = 4; - memset(cookie->data, 0, 4); - return true; -} - -static inline bool -svcxdr_encode_cookie(struct xdr_stream *xdr, const struct nlm_cookie *cookie) -{ - __be32 *p; - - if (xdr_stream_encode_u32(xdr, cookie->len) < 0) - return false; - p = xdr_reserve_space(xdr, cookie->len); - if (!p) - return false; - memcpy(p, cookie->data, cookie->len); - - return true; -} - -static inline bool -svcxdr_decode_owner(struct xdr_stream *xdr, struct xdr_netobj *obj) -{ - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > XDR_MAX_NETOBJ) - return false; - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - obj->len = len; - obj->data = (u8 *)p; - - return true; -} - -static inline bool -svcxdr_encode_owner(struct xdr_stream *xdr, const struct xdr_netobj *obj) -{ - if (obj->len > XDR_MAX_NETOBJ) - return false; - return xdr_stream_encode_opaque(xdr, obj->data, obj->len) > 0; -} - -#endif /* _LOCKD_SVCXDR_H_ */ diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c index 2fb5748dae0c..982629f7b120 100644 --- a/fs/lockd/xdr.c +++ b/fs/lockd/xdr.c @@ -19,7 +19,7 @@ #include -#include "svcxdr.h" +#define NLMDBG_FACILITY NLMDBG_XDR static inline loff_t @@ -42,313 +42,311 @@ loff_t_to_s32(loff_t offset) } /* - * NLM file handles are defined by specification to be a variable-length - * XDR opaque no longer than 1024 bytes. However, this implementation - * constrains their length to exactly the length of an NFSv2 file - * handle. + * XDR functions for basic NLM types */ -static bool -svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +static __be32 *nlm_decode_cookie(__be32 *p, struct nlm_cookie *c) { - __be32 *p; - u32 len; + unsigned int len; - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len != NFS2_FHSIZE) - return false; - - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - fh->size = NFS2_FHSIZE; - memcpy(fh->data, p, len); - memset(fh->data + NFS2_FHSIZE, 0, sizeof(fh->data) - NFS2_FHSIZE); - - return true; + len = ntohl(*p++); + + if(len==0) + { + c->len=4; + memset(c->data, 0, 4); /* hockeypux brain damage */ + } + else if(len<=NLM_MAXCOOKIELEN) + { + c->len=len; + memcpy(c->data, p, len); + p+=XDR_QUADLEN(len); + } + else + { + dprintk("lockd: bad cookie size %d (only cookies under " + "%d bytes are supported.)\n", + len, NLM_MAXCOOKIELEN); + return NULL; + } + return p; } -static bool -svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +static inline __be32 * +nlm_encode_cookie(__be32 *p, struct nlm_cookie *c) { - struct file_lock *fl = &lock->fl; - s32 start, len, end; + *p++ = htonl(c->len); + memcpy(p, c->data, c->len); + p+=XDR_QUADLEN(c->len); + return p; +} - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return false; - if (!svcxdr_decode_owner(xdr, &lock->oh)) - return false; - if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &start) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; +static __be32 * +nlm_decode_fh(__be32 *p, struct nfs_fh *f) +{ + unsigned int len; + + if ((len = ntohl(*p++)) != NFS2_FHSIZE) { + dprintk("lockd: bad fhandle size %d (should be %d)\n", + len, NFS2_FHSIZE); + return NULL; + } + f->size = NFS2_FHSIZE; + memset(f->data, 0, sizeof(f->data)); + memcpy(f->data, p, NFS2_FHSIZE); + return p + XDR_QUADLEN(NFS2_FHSIZE); +} + +/* + * Encode and decode owner handle + */ +static inline __be32 * +nlm_decode_oh(__be32 *p, struct xdr_netobj *oh) +{ + return xdr_decode_netobj(p, oh); +} + +static inline __be32 * +nlm_encode_oh(__be32 *p, struct xdr_netobj *oh) +{ + return xdr_encode_netobj(p, oh); +} + +static __be32 * +nlm_decode_lock(__be32 *p, struct nlm_lock *lock) +{ + struct file_lock *fl = &lock->fl; + s32 start, len, end; + + if (!(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, + NLM_MAXSTRLEN)) + || !(p = nlm_decode_fh(p, &lock->fh)) + || !(p = nlm_decode_oh(p, &lock->oh))) + return NULL; + lock->svid = ntohl(*p++); locks_init_lock(fl); fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; + fl->fl_type = F_RDLCK; /* as good as anything else */ + start = ntohl(*p++); + len = ntohl(*p++); end = start + len - 1; + fl->fl_start = s32_to_loff_t(start); + if (len == 0 || end < 0) fl->fl_end = OFFSET_MAX; else fl->fl_end = s32_to_loff_t(end); - - return true; + return p; } -static bool -svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) +/* + * Encode result of a TEST/TEST_MSG call + */ +static __be32 * +nlm_encode_testres(__be32 *p, struct nlm_res *resp) { - const struct file_lock *fl = &lock->fl; - s32 start, len; + s32 start, len; - /* exclusive */ - if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) - return false; - if (xdr_stream_encode_u32(xdr, lock->svid) < 0) - return false; - if (!svcxdr_encode_owner(xdr, &lock->oh)) - return false; - start = loff_t_to_s32(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; - else - len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); - if (xdr_stream_encode_u32(xdr, start) < 0) - return false; - if (xdr_stream_encode_u32(xdr, len) < 0) - return false; + if (!(p = nlm_encode_cookie(p, &resp->cookie))) + return NULL; + *p++ = resp->status; - return true; -} + if (resp->status == nlm_lck_denied) { + struct file_lock *fl = &resp->lock.fl; -static bool -svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) -{ - if (!svcxdr_encode_stats(xdr, resp->status)) - return false; - switch (resp->status) { - case nlm_lck_denied: - if (!svcxdr_encode_holder(xdr, &resp->lock)) - return false; + *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; + *p++ = htonl(resp->lock.svid); + + /* Encode owner handle. */ + if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) + return NULL; + + start = loff_t_to_s32(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s32(fl->fl_end - fl->fl_start + 1); + + *p++ = htonl(start); + *p++ = htonl(len); } - return true; + return p; } /* - * Decode Call arguments + * First, the server side XDR functions */ - -bool -nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - return true; -} - -bool -nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm_decode_cookie(p, &argp->cookie))) + return 0; + + exclusive = ntohl(*p++); + if (!(p = nlm_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); +} + +int +nlmsvc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm_decode_cookie(p, &argp->cookie))) + return 0; + argp->block = ntohl(*p++); + exclusive = ntohl(*p++); + if (!(p = nlm_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; + argp->reclaim = ntohl(*p++); + argp->state = ntohl(*p++); argp->monitor = 1; /* monitor client by default */ - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm_decode_cookie(p, &argp->cookie))) + return 0; + argp->block = ntohl(*p++); + exclusive = ntohl(*p++); + if (!(p = nlm_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm_decode_cookie(p, &argp->cookie)) + || !(p = nlm_decode_lock(p, &argp->lock))) + return 0; argp->lock.fl.fl_type = F_UNLCK; - - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return false; - if (!svcxdr_decode_stats(xdr, &resp->status)) - return false; - - return true; -} - -bool -nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > SM_MAXSTRLEN) - return false; - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - argp->len = len; - argp->mon = (char *)p; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; - p = xdr_inline_decode(xdr, SM_PRIV_SIZE); - if (!p) - return false; - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - - return true; -} - -bool -nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); - lock->svid = ~(u32)0; + lock->svid = ~(u32) 0; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return false; - if (!svcxdr_decode_owner(xdr, &lock->oh)) - return false; - /* XXX: Range checks are missing in the original code */ - if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return false; - - return true; + if (!(p = nlm_decode_cookie(p, &argp->cookie)) + || !(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, NLM_MAXSTRLEN)) + || !(p = nlm_decode_fh(p, &lock->fh)) + || !(p = nlm_decode_oh(p, &lock->oh))) + return 0; + argp->fsm_mode = ntohl(*p++); + argp->fsm_access = ntohl(*p++); + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm_encode_cookie(p, &resp->cookie))) + return 0; + *p++ = resp->status; + *p++ = xdr_zero; /* sequence argument */ + return xdr_ressize_check(rqstp, p); +} + +int +nlmsvc_encode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm_encode_cookie(p, &resp->cookie))) + return 0; + *p++ = resp->status; + return xdr_ressize_check(rqstp, p); +} + +int +nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; - - return true; + if (!(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, NLM_MAXSTRLEN))) + return 0; + argp->state = ntohl(*p++); + return xdr_argsize_check(rqstp, p); } - -/* - * Encode Reply results - */ - -bool -nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) { - return true; + struct nlm_reboot *argp = rqstp->rq_argp; + + if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) + return 0; + argp->state = ntohl(*p++); + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + p += XDR_QUADLEN(SM_PRIV_SIZE); + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_res(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct nlm_res *resp = rqstp->rq_argp; - return svcxdr_encode_cookie(xdr, &resp->cookie) && - svcxdr_encode_testrply(xdr, resp); + if (!(p = nlm_decode_cookie(p, &resp->cookie))) + return 0; + resp->status = *p++; + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_decode_void(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; - - return svcxdr_encode_cookie(xdr, &resp->cookie) && - svcxdr_encode_stats(xdr, resp->status); + return xdr_argsize_check(rqstp, p); } -bool -nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlmsvc_encode_void(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; - - if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return false; - if (!svcxdr_encode_stats(xdr, resp->status)) - return false; - /* sequence */ - if (xdr_stream_encode_u32(xdr, 0) < 0) - return false; - - return true; + return xdr_ressize_check(rqstp, p); } diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c index 5fcbf30cd275..5fa9f48a9dba 100644 --- a/fs/lockd/xdr4.c +++ b/fs/lockd/xdr4.c @@ -18,7 +18,14 @@ #include #include -#include "svcxdr.h" +#define NLMDBG_FACILITY NLMDBG_XDR + +static inline loff_t +s64_to_loff_t(__s64 offset) +{ + return (loff_t)offset; +} + static inline s64 loff_t_to_s64(loff_t offset) @@ -33,317 +40,310 @@ loff_t_to_s64(loff_t offset) return res; } -void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len) +/* + * XDR functions for basic NLM types + */ +static __be32 * +nlm4_decode_cookie(__be32 *p, struct nlm_cookie *c) { - s64 end = off + len - 1; + unsigned int len; - fl->fl_start = off; - if (len == 0 || end < 0) - fl->fl_end = OFFSET_MAX; - else - fl->fl_end = end; + len = ntohl(*p++); + + if(len==0) + { + c->len=4; + memset(c->data, 0, 4); /* hockeypux brain damage */ + } + else if(len<=NLM_MAXCOOKIELEN) + { + c->len=len; + memcpy(c->data, p, len); + p+=XDR_QUADLEN(len); + } + else + { + dprintk("lockd: bad cookie size %d (only cookies under " + "%d bytes are supported.)\n", + len, NLM_MAXCOOKIELEN); + return NULL; + } + return p; +} + +static __be32 * +nlm4_encode_cookie(__be32 *p, struct nlm_cookie *c) +{ + *p++ = htonl(c->len); + memcpy(p, c->data, c->len); + p+=XDR_QUADLEN(c->len); + return p; +} + +static __be32 * +nlm4_decode_fh(__be32 *p, struct nfs_fh *f) +{ + memset(f->data, 0, sizeof(f->data)); + f->size = ntohl(*p++); + if (f->size > NFS_MAXFHSIZE) { + dprintk("lockd: bad fhandle size %d (should be <=%d)\n", + f->size, NFS_MAXFHSIZE); + return NULL; + } + memcpy(f->data, p, f->size); + return p + XDR_QUADLEN(f->size); } /* - * NLM file handles are defined by specification to be a variable-length - * XDR opaque no longer than 1024 bytes. However, this implementation - * limits their length to the size of an NFSv3 file handle. + * Encode and decode owner handle */ -static bool -svcxdr_decode_fhandle(struct xdr_stream *xdr, struct nfs_fh *fh) +static __be32 * +nlm4_decode_oh(__be32 *p, struct xdr_netobj *oh) { - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > NFS_MAXFHSIZE) - return false; - - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - fh->size = len; - memcpy(fh->data, p, len); - memset(fh->data + len, 0, sizeof(fh->data) - len); - - return true; + return xdr_decode_netobj(p, oh); } -static bool -svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock) +static __be32 * +nlm4_decode_lock(__be32 *p, struct nlm_lock *lock) { - struct file_lock *fl = &lock->fl; + struct file_lock *fl = &lock->fl; + __u64 len, start; + __s64 end; - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return false; - if (!svcxdr_decode_owner(xdr, &lock->oh)) - return false; - if (xdr_stream_decode_u32(xdr, &lock->svid) < 0) - return false; - if (xdr_stream_decode_u64(xdr, &lock->lock_start) < 0) - return false; - if (xdr_stream_decode_u64(xdr, &lock->lock_len) < 0) - return false; + if (!(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, NLM_MAXSTRLEN)) + || !(p = nlm4_decode_fh(p, &lock->fh)) + || !(p = nlm4_decode_oh(p, &lock->oh))) + return NULL; + lock->svid = ntohl(*p++); locks_init_lock(fl); fl->fl_flags = FL_POSIX; - fl->fl_type = F_RDLCK; - nlm4svc_set_file_lock_range(fl, lock->lock_start, lock->lock_len); - return true; -} + fl->fl_type = F_RDLCK; /* as good as anything else */ + p = xdr_decode_hyper(p, &start); + p = xdr_decode_hyper(p, &len); + end = start + len - 1; -static bool -svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock) -{ - const struct file_lock *fl = &lock->fl; - s64 start, len; + fl->fl_start = s64_to_loff_t(start); - /* exclusive */ - if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0) - return false; - if (xdr_stream_encode_u32(xdr, lock->svid) < 0) - return false; - if (!svcxdr_encode_owner(xdr, &lock->oh)) - return false; - start = loff_t_to_s64(fl->fl_start); - if (fl->fl_end == OFFSET_MAX) - len = 0; + if (len == 0 || end < 0) + fl->fl_end = OFFSET_MAX; else - len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); - if (xdr_stream_encode_u64(xdr, start) < 0) - return false; - if (xdr_stream_encode_u64(xdr, len) < 0) - return false; - - return true; + fl->fl_end = s64_to_loff_t(end); + return p; } -static bool -svcxdr_encode_testrply(struct xdr_stream *xdr, const struct nlm_res *resp) +/* + * Encode result of a TEST/TEST_MSG call + */ +static __be32 * +nlm4_encode_testres(__be32 *p, struct nlm_res *resp) { - if (!svcxdr_encode_stats(xdr, resp->status)) - return false; - switch (resp->status) { - case nlm_lck_denied: - if (!svcxdr_encode_holder(xdr, &resp->lock)) - return false; + s64 start, len; + + dprintk("xdr: before encode_testres (p %p resp %p)\n", p, resp); + if (!(p = nlm4_encode_cookie(p, &resp->cookie))) + return NULL; + *p++ = resp->status; + + if (resp->status == nlm_lck_denied) { + struct file_lock *fl = &resp->lock.fl; + + *p++ = (fl->fl_type == F_RDLCK)? xdr_zero : xdr_one; + *p++ = htonl(resp->lock.svid); + + /* Encode owner handle. */ + if (!(p = xdr_encode_netobj(p, &resp->lock.oh))) + return NULL; + + start = loff_t_to_s64(fl->fl_start); + if (fl->fl_end == OFFSET_MAX) + len = 0; + else + len = loff_t_to_s64(fl->fl_end - fl->fl_start + 1); + + p = xdr_encode_hyper(p, start); + p = xdr_encode_hyper(p, len); + dprintk("xdr: encode_testres (status %u pid %d type %d start %Ld end %Ld)\n", + resp->status, (int)resp->lock.svid, fl->fl_type, + (long long)fl->fl_start, (long long)fl->fl_end); } - return true; + dprintk("xdr: after encode_testres (p %p resp %p)\n", p, resp); + return p; } /* - * Decode Call arguments + * First, the server side XDR functions */ - -bool -nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - return true; -} - -bool -nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_testargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + return 0; + + exclusive = ntohl(*p++); + if (!(p = nlm4_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_encode_testres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm4_encode_testres(p, resp))) + return 0; + return xdr_ressize_check(rqstp, p); +} + +int +nlm4svc_decode_lockargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + return 0; + argp->block = ntohl(*p++); + exclusive = ntohl(*p++); + if (!(p = nlm4_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; + argp->reclaim = ntohl(*p++); + argp->state = ntohl(*p++); argp->monitor = 1; /* monitor client by default */ - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_cancargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - u32 exclusive; + u32 exclusive; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (xdr_stream_decode_bool(xdr, &argp->block) < 0) - return false; - if (xdr_stream_decode_bool(xdr, &exclusive) < 0) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm4_decode_cookie(p, &argp->cookie))) + return 0; + argp->block = ntohl(*p++); + exclusive = ntohl(*p++); + if (!(p = nlm4_decode_lock(p, &argp->lock))) + return 0; if (exclusive) argp->lock.fl.fl_type = F_WRLCK; - - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (!svcxdr_decode_lock(xdr, &argp->lock)) - return false; + if (!(p = nlm4_decode_cookie(p, &argp->cookie)) + || !(p = nlm4_decode_lock(p, &argp->lock))) + return 0; argp->lock.fl.fl_type = F_UNLCK; - - return true; + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - struct nlm_res *resp = rqstp->rq_argp; - - if (!svcxdr_decode_cookie(xdr, &resp->cookie)) - return false; - if (!svcxdr_decode_stats(xdr, &resp->status)) - return false; - - return true; -} - -bool -nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - struct nlm_reboot *argp = rqstp->rq_argp; - __be32 *p; - u32 len; - - if (xdr_stream_decode_u32(xdr, &len) < 0) - return false; - if (len > SM_MAXSTRLEN) - return false; - p = xdr_inline_decode(xdr, len); - if (!p) - return false; - argp->len = len; - argp->mon = (char *)p; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; - p = xdr_inline_decode(xdr, SM_PRIV_SIZE); - if (!p) - return false; - memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); - - return true; -} - -bool -nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_shareargs(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; memset(lock, 0, sizeof(*lock)); locks_init_lock(&lock->fl); - lock->svid = ~(u32)0; + lock->svid = ~(u32) 0; - if (!svcxdr_decode_cookie(xdr, &argp->cookie)) - return false; - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (!svcxdr_decode_fhandle(xdr, &lock->fh)) - return false; - if (!svcxdr_decode_owner(xdr, &lock->oh)) - return false; - /* XXX: Range checks are missing in the original code */ - if (xdr_stream_decode_u32(xdr, &argp->fsm_mode) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &argp->fsm_access) < 0) - return false; - - return true; + if (!(p = nlm4_decode_cookie(p, &argp->cookie)) + || !(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, NLM_MAXSTRLEN)) + || !(p = nlm4_decode_fh(p, &lock->fh)) + || !(p = nlm4_decode_oh(p, &lock->oh))) + return 0; + argp->fsm_mode = ntohl(*p++); + argp->fsm_access = ntohl(*p++); + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_encode_shareres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm4_encode_cookie(p, &resp->cookie))) + return 0; + *p++ = resp->status; + *p++ = xdr_zero; /* sequence argument */ + return xdr_ressize_check(rqstp, p); +} + +int +nlm4svc_encode_res(struct svc_rqst *rqstp, __be32 *p) +{ + struct nlm_res *resp = rqstp->rq_resp; + + if (!(p = nlm4_encode_cookie(p, &resp->cookie))) + return 0; + *p++ = resp->status; + return xdr_ressize_check(rqstp, p); +} + +int +nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p) { struct nlm_args *argp = rqstp->rq_argp; struct nlm_lock *lock = &argp->lock; - if (!svcxdr_decode_string(xdr, &lock->caller, &lock->len)) - return false; - if (xdr_stream_decode_u32(xdr, &argp->state) < 0) - return false; - - return true; + if (!(p = xdr_decode_string_inplace(p, &lock->caller, + &lock->len, NLM_MAXSTRLEN))) + return 0; + argp->state = ntohl(*p++); + return xdr_argsize_check(rqstp, p); } - -/* - * Encode Reply results - */ - -bool -nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p) { - return true; + struct nlm_reboot *argp = rqstp->rq_argp; + + if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN))) + return 0; + argp->state = ntohl(*p++); + memcpy(&argp->priv.data, p, sizeof(argp->priv.data)); + p += XDR_QUADLEN(SM_PRIV_SIZE); + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_res(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; + struct nlm_res *resp = rqstp->rq_argp; - return svcxdr_encode_cookie(xdr, &resp->cookie) && - svcxdr_encode_testrply(xdr, resp); + if (!(p = nlm4_decode_cookie(p, &resp->cookie))) + return 0; + resp->status = *p++; + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_decode_void(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; - - return svcxdr_encode_cookie(xdr, &resp->cookie) && - svcxdr_encode_stats(xdr, resp->status); + return xdr_argsize_check(rqstp, p); } -bool -nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nlm4svc_encode_void(struct svc_rqst *rqstp, __be32 *p) { - struct nlm_res *resp = rqstp->rq_resp; - - if (!svcxdr_encode_cookie(xdr, &resp->cookie)) - return false; - if (!svcxdr_encode_stats(xdr, resp->status)) - return false; - /* sequence */ - if (xdr_stream_encode_u32(xdr, 0) < 0) - return false; - - return true; + return xdr_ressize_check(rqstp, p); } diff --git a/fs/locks.c b/fs/locks.c index b0753c8871fb..cbb5701ce9f3 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -251,7 +251,7 @@ locks_get_lock_context(struct inode *inode, int type) struct file_lock_context *ctx; /* paired with cmpxchg() below */ - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (likely(ctx) || type == F_UNLCK) goto out; @@ -270,7 +270,7 @@ locks_get_lock_context(struct inode *inode, int type) */ if (cmpxchg(&inode->i_flctx, NULL, ctx)) { kmem_cache_free(flctx_cache, ctx); - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); } out: trace_locks_get_lock_context(inode, type, ctx); @@ -323,7 +323,7 @@ locks_check_ctx_file_list(struct file *filp, struct list_head *list, void locks_free_lock_context(struct inode *inode) { - struct file_lock_context *ctx = locks_inode_context(inode); + struct file_lock_context *ctx = inode->i_flctx; if (unlikely(ctx)) { locks_check_ctx_lists(inode); @@ -376,34 +376,6 @@ void locks_release_private(struct file_lock *fl) } EXPORT_SYMBOL_GPL(locks_release_private); -/** - * locks_owner_has_blockers - Check for blocking lock requests - * @flctx: file lock context - * @owner: lock owner - * - * Return values: - * %true: @owner has at least one blocker - * %false: @owner has no blockers - */ -bool locks_owner_has_blockers(struct file_lock_context *flctx, - fl_owner_t owner) -{ - struct file_lock *fl; - - spin_lock(&flctx->flc_lock); - list_for_each_entry(fl, &flctx->flc_posix, fl_list) { - if (fl->fl_owner != owner) - continue; - if (!list_empty(&fl->fl_blocked_requests)) { - spin_unlock(&flctx->flc_lock); - return true; - } - } - spin_unlock(&flctx->flc_lock); - return false; -} -EXPORT_SYMBOL_GPL(locks_owner_has_blockers); - /* Free a lock which is not in use. */ void locks_free_lock(struct file_lock *fl) { @@ -982,32 +954,19 @@ posix_test_lock(struct file *filp, struct file_lock *fl) struct file_lock *cfl; struct file_lock_context *ctx; struct inode *inode = locks_inode(filp); - void *owner; - void (*func)(void); - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx || list_empty_careful(&ctx->flc_posix)) { fl->fl_type = F_UNLCK; return; } -retry: spin_lock(&ctx->flc_lock); list_for_each_entry(cfl, &ctx->flc_posix, fl_list) { - if (!posix_locks_conflict(fl, cfl)) - continue; - if (cfl->fl_lmops && cfl->fl_lmops->lm_lock_expirable - && (*cfl->fl_lmops->lm_lock_expirable)(cfl)) { - owner = cfl->fl_lmops->lm_mod_owner; - func = cfl->fl_lmops->lm_expire_lock; - __module_get(owner); - spin_unlock(&ctx->flc_lock); - (*func)(); - module_put(owner); - goto retry; + if (posix_locks_conflict(fl, cfl)) { + locks_copy_conflock(fl, cfl); + goto out; } - locks_copy_conflock(fl, cfl); - goto out; } fl->fl_type = F_UNLCK; out: @@ -1181,8 +1140,6 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, int error; bool added = false; LIST_HEAD(dispose); - void *owner; - void (*func)(void); ctx = locks_get_lock_context(inode, request->fl_type); if (!ctx) @@ -1201,7 +1158,6 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, new_fl2 = locks_alloc_lock(); } -retry: percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); /* @@ -1213,17 +1169,6 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, list_for_each_entry(fl, &ctx->flc_posix, fl_list) { if (!posix_locks_conflict(request, fl)) continue; - if (fl->fl_lmops && fl->fl_lmops->lm_lock_expirable - && (*fl->fl_lmops->lm_lock_expirable)(fl)) { - owner = fl->fl_lmops->lm_mod_owner; - func = fl->fl_lmops->lm_expire_lock; - __module_get(owner); - spin_unlock(&ctx->flc_lock); - percpu_up_read(&file_rwsem); - (*func)(); - module_put(owner); - goto retry; - } if (conflock) locks_copy_conflock(conflock, fl); error = -EAGAIN; @@ -1674,7 +1619,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type) new_fl->fl_flags = type; /* typically we will check that ctx is non-NULL before calling */ - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx) { WARN_ON_ONCE(1); goto free_lock; @@ -1779,7 +1724,7 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time) struct file_lock_context *ctx; struct file_lock *fl; - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (ctx && !list_empty_careful(&ctx->flc_lease)) { spin_lock(&ctx->flc_lock); fl = list_first_entry_or_null(&ctx->flc_lease, @@ -1825,7 +1770,7 @@ int fcntl_getlease(struct file *filp) int type = F_UNLCK; LIST_HEAD(dispose); - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (ctx && !list_empty_careful(&ctx->flc_lease)) { percpu_down_read(&file_rwsem); spin_lock(&ctx->flc_lock); @@ -1863,9 +1808,6 @@ check_conflicting_open(struct file *filp, const long arg, int flags) if (flags & FL_LAYOUT) return 0; - if (flags & FL_DELEG) - /* We leave these checks to the caller */ - return 0; if (arg == F_RDLCK) return inode_is_open_for_write(inode) ? -EAGAIN : 0; @@ -2014,7 +1956,7 @@ static int generic_delete_lease(struct file *filp, void *owner) struct file_lock_context *ctx; LIST_HEAD(dispose); - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx) { trace_generic_delete_lease(inode, NULL); return error; @@ -2594,15 +2536,14 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { - struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(&files->file_lock); - f = files_lookup_fd_locked(files, fd); - spin_unlock(&files->file_lock); + spin_lock(¤t->files->file_lock); + f = fcheck(fd); + spin_unlock(¤t->files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); @@ -2726,15 +2667,14 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, */ if (!error && file_lock->fl_type != F_UNLCK && !(file_lock->fl_flags & FL_OFDLCK)) { - struct files_struct *files = current->files; /* * We need that spin_lock here - it prevents reordering between * update of i_flctx->flc_posix and check for it done in * close(). rcu_read_lock() wouldn't do. */ - spin_lock(&files->file_lock); - f = files_lookup_fd_locked(files, fd); - spin_unlock(&files->file_lock); + spin_lock(¤t->files->file_lock); + f = fcheck(fd); + spin_unlock(¤t->files->file_lock); if (f != filp) { file_lock->fl_type = F_UNLCK; error = do_lock_file_wait(filp, cmd, file_lock); @@ -2765,7 +2705,7 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner) * posix_lock_file(). Another process could be setting a lock on this * file at the same time, but we wouldn't remove that lock anyway. */ - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx || list_empty(&ctx->flc_posix)) return; @@ -2838,7 +2778,7 @@ void locks_remove_file(struct file *filp) { struct file_lock_context *ctx; - ctx = locks_inode_context(locks_inode(filp)); + ctx = smp_load_acquire(&locks_inode(filp)->i_flctx); if (!ctx) return; @@ -2885,7 +2825,7 @@ bool vfs_inode_has_locks(struct inode *inode) struct file_lock_context *ctx; bool ret; - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx) return false; @@ -3030,7 +2970,7 @@ void show_fd_locks(struct seq_file *f, struct file_lock_context *ctx; int id = 0; - ctx = locks_inode_context(inode); + ctx = smp_load_acquire(&inode->i_flctx); if (!ctx) return; diff --git a/fs/namei.c b/fs/namei.c index 6b85ad8a1555..8cea84ecbf56 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4361,14 +4361,11 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname * ->i_mutex on parents, which works but leads to some truly excessive * locking]. */ -int vfs_rename(struct renamedata *rd) +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry, + struct inode **delegated_inode, unsigned int flags) { int error; - struct inode *old_dir = rd->old_dir, *new_dir = rd->new_dir; - struct dentry *old_dentry = rd->old_dentry; - struct dentry *new_dentry = rd->new_dentry; - struct inode **delegated_inode = rd->delegated_inode; - unsigned int flags = rd->flags; bool is_dir = d_is_dir(old_dentry); struct inode *source = old_dentry->d_inode; struct inode *target = new_dentry->d_inode; @@ -4516,7 +4513,6 @@ EXPORT_SYMBOL_NS(vfs_rename, ANDROID_GKI_VFS_EXPORT_ONLY); int do_renameat2(int olddfd, struct filename *from, int newdfd, struct filename *to, unsigned int flags) { - struct renamedata rd; struct dentry *old_dentry, *new_dentry; struct dentry *trap; struct path old_path, new_path; @@ -4620,14 +4616,9 @@ int do_renameat2(int olddfd, struct filename *from, int newdfd, &new_path, new_dentry, flags); if (error) goto exit5; - - rd.old_dir = old_path.dentry->d_inode; - rd.old_dentry = old_dentry; - rd.new_dir = new_path.dentry->d_inode; - rd.new_dentry = new_dentry; - rd.delegated_inode = &delegated_inode; - rd.flags = flags; - error = vfs_rename(&rd); + error = vfs_rename(old_path.dentry->d_inode, old_dentry, + new_path.dentry->d_inode, new_dentry, + &delegated_inode, flags); exit5: dput(new_dentry); exit4: diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index a9e563145e0c..73000aa2d220 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -699,7 +699,7 @@ bl_alloc_lseg(struct pnfs_layout_hdr *lo, struct nfs4_layoutget_res *lgr, xdr_init_decode_pages(&xdr, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_page(&xdr, scratch); + xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); status = -EIO; p = xdr_inline_decode(&xdr, 4); diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c index 16412d6636e8..6e3a14fdff9c 100644 --- a/fs/nfs/blocklayout/dev.c +++ b/fs/nfs/blocklayout/dev.c @@ -510,7 +510,7 @@ bl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out; xdr_init_decode_pages(&xdr, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_page(&xdr, scratch); + xdr_set_scratch_buffer(&xdr, page_address(scratch), PAGE_SIZE); p = xdr_inline_decode(&xdr, sizeof(__be32)); if (!p) diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c index 8fe143cad4a2..7817ad94a6ba 100644 --- a/fs/nfs/callback.c +++ b/fs/nfs/callback.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -44,18 +45,18 @@ static int nfs4_callback_up_net(struct svc_serv *serv, struct net *net) int ret; struct nfs_net *nn = net_generic(net, nfs_net_id); - ret = svc_xprt_create(serv, "tcp", net, PF_INET, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_create_xprt(serv, "tcp", net, PF_INET, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret <= 0) goto out_err; nn->nfs_callback_tcpport = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", nn->nfs_callback_tcpport, PF_INET, net->ns.inum); - ret = svc_xprt_create(serv, "tcp", net, PF_INET6, - nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, - cred); + ret = svc_create_xprt(serv, "tcp", net, PF_INET6, + nfs_callback_set_tcpport, SVC_SOCK_ANONYMOUS, + cred); if (ret > 0) { nn->nfs_callback_tcpport6 = ret; dprintk("NFS: Callback listener port = %u (af %u, net %x)\n", @@ -80,6 +81,9 @@ nfs4_callback_svc(void *vrqstp) set_freezable(); while (!kthread_freezable_should_stop(NULL)) { + + if (signal_pending(current)) + flush_signals(current); /* * Listen for a request on the socket */ @@ -88,8 +92,8 @@ nfs4_callback_svc(void *vrqstp) continue; svc_process(rqstp); } - svc_exit_thread(rqstp); + module_put_and_exit(0); return 0; } @@ -109,7 +113,11 @@ nfs41_callback_svc(void *vrqstp) set_freezable(); while (!kthread_freezable_should_stop(NULL)) { - prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_IDLE); + + if (signal_pending(current)) + flush_signals(current); + + prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE); spin_lock_bh(&serv->sv_cb_lock); if (!list_empty(&serv->sv_cb_list)) { req = list_first_entry(&serv->sv_cb_list, @@ -124,12 +132,12 @@ nfs41_callback_svc(void *vrqstp) } else { spin_unlock_bh(&serv->sv_cb_lock); if (!kthread_should_stop()) - freezable_schedule(); + schedule(); finish_wait(&serv->sv_cb_waitq, &wq); } } - svc_exit_thread(rqstp); + module_put_and_exit(0); return 0; } @@ -161,12 +169,12 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt, if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS) nrservs = NFS4_MIN_NR_CALLBACK_THREADS; - if (serv->sv_nrthreads == nrservs) + if (serv->sv_nrthreads-1 == nrservs) return 0; - ret = svc_set_num_threads(serv, NULL, nrservs); + ret = serv->sv_ops->svo_setup(serv, NULL, nrservs); if (ret) { - svc_set_num_threads(serv, NULL, 0); + serv->sv_ops->svo_setup(serv, NULL, 0); return ret; } dprintk("nfs_callback_up: service started\n"); @@ -181,7 +189,7 @@ static void nfs_callback_down_net(u32 minorversion, struct svc_serv *serv, struc return; dprintk("NFS: destroy per-net callback data; net=%x\n", net->ns.inum); - svc_xprt_destroy_all(serv, net); + svc_shutdown_net(serv, net); } static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, @@ -224,17 +232,59 @@ static int nfs_callback_up_net(int minorversion, struct svc_serv *serv, return ret; } +static const struct svc_serv_ops nfs40_cb_sv_ops = { + .svo_function = nfs4_callback_svc, + .svo_enqueue_xprt = svc_xprt_do_enqueue, + .svo_setup = svc_set_num_threads_sync, + .svo_module = THIS_MODULE, +}; +#if defined(CONFIG_NFS_V4_1) +static const struct svc_serv_ops nfs41_cb_sv_ops = { + .svo_function = nfs41_callback_svc, + .svo_enqueue_xprt = svc_xprt_do_enqueue, + .svo_setup = svc_set_num_threads_sync, + .svo_module = THIS_MODULE, +}; + +static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { + [0] = &nfs40_cb_sv_ops, + [1] = &nfs41_cb_sv_ops, +}; +#else +static const struct svc_serv_ops *nfs4_cb_sv_ops[] = { + [0] = &nfs40_cb_sv_ops, + [1] = NULL, +}; +#endif + static struct svc_serv *nfs_callback_create_svc(int minorversion) { struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion]; - int (*threadfn)(void *data); + const struct svc_serv_ops *sv_ops; struct svc_serv *serv; /* * Check whether we're already up and running. */ - if (cb_info->serv) - return svc_get(cb_info->serv); + if (cb_info->serv) { + /* + * Note: increase service usage, because later in case of error + * svc_destroy() will be called. + */ + svc_get(cb_info->serv); + return cb_info->serv; + } + + switch (minorversion) { + case 0: + sv_ops = nfs4_cb_sv_ops[0]; + break; + default: + sv_ops = nfs4_cb_sv_ops[1]; + } + + if (sv_ops == NULL) + return ERR_PTR(-ENOTSUPP); /* * Sanity check: if there's no task, @@ -244,16 +294,7 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion) printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n", cb_info->users); - threadfn = nfs4_callback_svc; -#if defined(CONFIG_NFS_V4_1) - if (minorversion) - threadfn = nfs41_callback_svc; -#else - if (minorversion) - return ERR_PTR(-ENOTSUPP); -#endif - serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, - threadfn); + serv = svc_create_pooled(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops); if (!serv) { printk(KERN_ERR "nfs_callback_create_svc: create service failed\n"); return ERR_PTR(-ENOMEM); @@ -294,10 +335,16 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt) goto err_start; cb_info->users++; + /* + * svc_create creates the svc_serv with sv_nrthreads == 1, and then + * svc_prepare_thread increments that. So we need to call svc_destroy + * on both success and failure so that the refcount is 1 when the + * thread exits. + */ err_net: if (!cb_info->users) cb_info->serv = NULL; - svc_put(serv); + svc_destroy(serv); err_create: mutex_unlock(&nfs_callback_mutex); return ret; @@ -322,8 +369,8 @@ void nfs_callback_down(int minorversion, struct net *net) cb_info->users--; if (cb_info->users == 0) { svc_get(serv); - svc_set_num_threads(serv, NULL, 0); - svc_put(serv); + serv->sv_ops->svo_setup(serv, NULL, 0); + svc_destroy(serv); dprintk("nfs_callback_down: service destroyed\n"); cb_info->serv = NULL; } @@ -382,8 +429,6 @@ check_gss_callback_principal(struct nfs_client *clp, struct svc_rqst *rqstp) */ static int nfs_callback_authenticate(struct svc_rqst *rqstp) { - rqstp->rq_auth_stat = rpc_autherr_badcred; - switch (rqstp->rq_authop->flavour) { case RPC_AUTH_NULL: if (rqstp->rq_proc != CB_NULL) @@ -394,8 +439,6 @@ static int nfs_callback_authenticate(struct svc_rqst *rqstp) if (svc_is_backchannel(rqstp)) return SVC_DENIED; } - - rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c index db69fc267c9a..ca8a4aa351dc 100644 --- a/fs/nfs/callback_xdr.c +++ b/fs/nfs/callback_xdr.c @@ -63,13 +63,14 @@ static __be32 nfs4_callback_null(struct svc_rqst *rqstp) return htonl(NFS4_OK); } -/* - * svc_process_common() looks for an XDR encoder to know when - * not to drop a Reply. - */ -static bool nfs4_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfs4_decode_void(struct svc_rqst *rqstp, __be32 *p) { - return true; + return xdr_argsize_check(rqstp, p); +} + +static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); } static __be32 decode_string(struct xdr_stream *xdr, unsigned int *len, @@ -983,17 +984,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp) out_invalidcred: pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n"); - rqstp->rq_auth_stat = rpc_autherr_badcred; - return rpc_success; -} - -static int -nfs_callback_dispatch(struct svc_rqst *rqstp, __be32 *statp) -{ - const struct svc_procedure *procp = rqstp->rq_procinfo; - - *statp = procp->pc_func(rqstp); - return 1; + return svc_return_autherr(rqstp, rpc_autherr_badcred); } /* @@ -1062,18 +1053,16 @@ static struct callback_op callback_ops[] = { static const struct svc_procedure nfs4_callback_procedures1[] = { [CB_NULL] = { .pc_func = nfs4_callback_null, + .pc_decode = nfs4_decode_void, .pc_encode = nfs4_encode_void, .pc_xdrressize = 1, - .pc_name = "NULL", }, [CB_COMPOUND] = { .pc_func = nfs4_callback_compound, .pc_encode = nfs4_encode_void, .pc_argsize = 256, - .pc_argzero = 256, .pc_ressize = 256, .pc_xdrressize = NFS4_CALLBACK_BUFSIZE, - .pc_name = "COMPOUND", } }; @@ -1084,7 +1073,7 @@ const struct svc_version nfs4_callback_version1 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count1, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = nfs_callback_dispatch, + .vs_dispatch = NULL, .vs_hidden = true, .vs_need_cong_ctrl = true, }; @@ -1096,7 +1085,7 @@ const struct svc_version nfs4_callback_version4 = { .vs_proc = nfs4_callback_procedures1, .vs_count = nfs4_callback_count4, .vs_xdrsize = NFS4_CALLBACK_XDRSIZE, - .vs_dispatch = nfs_callback_dispatch, + .vs_dispatch = NULL, .vs_hidden = true, .vs_need_cong_ctrl = true, }; diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 935029632d5f..9f88ca7b2001 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -576,7 +576,7 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en goto out_nopages; xdr_init_decode_pages(&stream, &buf, xdr_pages, buflen); - xdr_set_scratch_page(&stream, scratch); + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); do { if (entry->label) diff --git a/fs/nfs/export.c b/fs/nfs/export.c index 993be63ab301..3430d6891e89 100644 --- a/fs/nfs/export.c +++ b/fs/nfs/export.c @@ -167,25 +167,8 @@ nfs_get_parent(struct dentry *dentry) return parent; } -static u64 nfs_fetch_iversion(struct inode *inode) -{ - struct nfs_server *server = NFS_SERVER(inode); - - if (nfs_check_cache_invalid(inode, NFS_INO_INVALID_CHANGE | - NFS_INO_REVAL_PAGECACHE)) - __nfs_revalidate_inode(server, inode); - return inode_peek_iversion_raw(inode); -} - const struct export_operations nfs_export_ops = { .encode_fh = nfs_encode_fh, .fh_to_dentry = nfs_fh_to_dentry, .get_parent = nfs_get_parent, - .fetch_iversion = nfs_fetch_iversion, - .flags = EXPORT_OP_NOWCC | - EXPORT_OP_NOSUBTREECHK | - EXPORT_OP_CLOSE_BEFORE_UNLINK | - EXPORT_OP_REMOTE_FS | - EXPORT_OP_NOATOMIC_ATTR | - EXPORT_OP_FLUSH_ON_CLOSE, }; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index d35aae47b062..7be1a7f7fcb2 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -798,9 +798,6 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl) nfs_inc_stats(inode, NFSIOS_VFSLOCK); - if (fl->fl_flags & FL_RECLAIM) - return -ENOGRACE; - /* No mandatory locks over NFS */ if (__mandatory_lock(inode) && fl->fl_type != F_UNLCK) goto out_err; diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 2ed8b6885b09..deecfb50dd7e 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -293,6 +293,8 @@ static void filelayout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data; + dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs41_sequence_done(task, &hdr->res.seq_res); @@ -664,7 +666,7 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo, return -ENOMEM; xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_page(&stream, scratch); + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); /* 20 = ufl_util (4), first_stripe_index (4), pattern_offset (8), * num_fh (4) */ diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c index 86c3f7e69ec4..d913e818858f 100644 --- a/fs/nfs/filelayout/filelayoutdev.c +++ b/fs/nfs/filelayout/filelayoutdev.c @@ -82,7 +82,7 @@ nfs4_fl_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, goto out_err; xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_page(&stream, scratch); + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); /* Get the stripe count (number of stripe index) */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index a263bfec4244..e4f2820ba5a5 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -378,7 +378,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh, xdr_init_decode_pages(&stream, &buf, lgr->layoutp->pages, lgr->layoutp->len); - xdr_set_scratch_page(&stream, scratch); + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); /* stripe unit and mirror_array_cnt */ rc = -EIO; @@ -1419,6 +1419,8 @@ static void ff_layout_read_call_done(struct rpc_task *task, void *data) { struct nfs_pgio_header *hdr = data; + dprintk("--> %s task->tk_status %d\n", __func__, task->tk_status); + if (test_bit(NFS_IOHDR_REDO, &hdr->flags) && task->tk_status == 0) { nfs4_sequence_done(task, &hdr->res.seq_res); diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c index bfa7202ca7be..1f12297109b4 100644 --- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c @@ -69,7 +69,7 @@ nfs4_ff_alloc_deviceid_node(struct nfs_server *server, struct pnfs_device *pdev, INIT_LIST_HEAD(&dsaddrs); xdr_init_decode_pages(&stream, &buf, pdev->pages, pdev->pglen); - xdr_set_scratch_page(&stream, scratch); + xdr_set_scratch_buffer(&stream, page_address(scratch), PAGE_SIZE); /* multipath count */ p = xdr_inline_decode(&stream, 4); diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c index df5bee2f505c..f2248d9d4db5 100644 --- a/fs/nfs/nfs42xdr.c +++ b/fs/nfs/nfs42xdr.c @@ -1536,7 +1536,7 @@ static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp, struct compound_hdr hdr; int status; - xdr_set_scratch_page(xdr, res->scratch); + xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE); status = decode_compound_hdr(xdr, &hdr); if (status) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index d8fc5d72a161..afb617a4a7e4 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2757,7 +2757,7 @@ static int nfs4_run_state_manager(void *ptr) goto again; nfs_put_client(clp); - module_put_and_kthread_exit(0); + module_put_and_exit(0); return 0; } diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index 4e5c6cb770ad..f1e599553f2b 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -6404,8 +6404,10 @@ nfs4_xdr_dec_getacl(struct rpc_rqst *rqstp, struct xdr_stream *xdr, struct compound_hdr hdr; int status; - if (res->acl_scratch != NULL) - xdr_set_scratch_page(xdr, res->acl_scratch); + if (res->acl_scratch != NULL) { + void *p = page_address(res->acl_scratch); + xdr_set_scratch_buffer(xdr, p, PAGE_SIZE); + } status = decode_compound_hdr(xdr, &hdr); if (status) goto out; diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index d79a3b6cb070..17fef6eb490c 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -870,6 +870,9 @@ static void nfs_pgio_result(struct rpc_task *task, void *calldata) struct nfs_pgio_header *hdr = calldata; struct inode *inode = hdr->inode; + dprintk("NFS: %s: %5u, (status %d)\n", __func__, + task->tk_pid, task->tk_status); + if (hdr->rw_ops->rw_done(task, hdr, inode) != 0) return; if (task->tk_status < 0) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 1ffce9076060..b3fcc27b9564 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -86,11 +86,9 @@ const struct super_operations nfs_sops = { }; EXPORT_SYMBOL_GPL(nfs_sops); -#ifdef CONFIG_NFS_V4_2 static const struct nfs_ssc_client_ops nfs_ssc_clnt_ops_tbl = { .sco_sb_deactive = nfs_sb_deactive, }; -#endif #if IS_ENABLED(CONFIG_NFS_V4) static int __init register_nfs4_fs(void) @@ -113,7 +111,6 @@ static void unregister_nfs4_fs(void) } #endif -#ifdef CONFIG_NFS_V4_2 static void nfs_ssc_register_ops(void) { nfs_ssc_register(&nfs_ssc_clnt_ops_tbl); @@ -123,7 +120,6 @@ static void nfs_ssc_unregister_ops(void) { nfs_ssc_unregister(&nfs_ssc_clnt_ops_tbl); } -#endif /* CONFIG_NFS_V4_2 */ static struct shrinker acl_shrinker = { .count_objects = nfs_access_cache_count, @@ -152,9 +148,7 @@ int __init register_nfs_fs(void) ret = register_shrinker(&acl_shrinker); if (ret < 0) goto error_3; -#ifdef CONFIG_NFS_V4_2 nfs_ssc_register_ops(); -#endif return 0; error_3: nfs_unregister_sysctl(); @@ -174,9 +168,7 @@ void __exit unregister_nfs_fs(void) unregister_shrinker(&acl_shrinker); nfs_unregister_sysctl(); unregister_nfs4_fs(); -#ifdef CONFIG_NFS_V4_2 nfs_ssc_unregister_ops(); -#endif unregister_filesystem(&nfs_fs_type); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 2bde35921f2b..4cf060691979 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1809,6 +1809,9 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata) { struct nfs_commit_data *data = calldata; + dprintk("NFS: %5u nfs_commit_done (status %d)\n", + task->tk_pid, task->tk_status); + /* Call the NFS version-specific code */ NFS_PROTO(data->inode)->commit_done(task, data); trace_nfs_commit_done(task, data); diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile index 119c75ab9fd0..fa82f5aaa6d9 100644 --- a/fs/nfs_common/Makefile +++ b/fs/nfs_common/Makefile @@ -7,4 +7,4 @@ obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o nfs_acl-objs := nfsacl.o obj-$(CONFIG_GRACE_PERIOD) += grace.o -obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o +obj-$(CONFIG_GRACE_PERIOD) += nfs_ssc.o diff --git a/fs/nfs_common/nfs_ssc.c b/fs/nfs_common/nfs_ssc.c index 7c1509e968c8..f43bbb373913 100644 --- a/fs/nfs_common/nfs_ssc.c +++ b/fs/nfs_common/nfs_ssc.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0-only /* + * fs/nfs_common/nfs_ssc_comm.c + * * Helper for knfsd's SSC to access ops in NFS client modules * * Author: Dai Ngo diff --git a/fs/nfs_common/nfsacl.c b/fs/nfs_common/nfsacl.c index 5a5bd85d08f8..d056ad2fdefd 100644 --- a/fs/nfs_common/nfsacl.c +++ b/fs/nfs_common/nfsacl.c @@ -136,77 +136,6 @@ int nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, } EXPORT_SYMBOL_GPL(nfsacl_encode); -/** - * nfs_stream_encode_acl - Encode an NFSv3 ACL - * - * @xdr: an xdr_stream positioned to receive an encoded ACL - * @inode: inode of file whose ACL this is - * @acl: posix_acl to encode - * @encode_entries: whether to encode ACEs as well - * @typeflag: ACL type: NFS_ACL_DEFAULT or zero - * - * Return values: - * %false: The ACL could not be encoded - * %true: @xdr is advanced to the next available position - */ -bool nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, - struct posix_acl *acl, int encode_entries, - int typeflag) -{ - const size_t elem_size = XDR_UNIT * 3; - u32 entries = (acl && acl->a_count) ? max_t(int, acl->a_count, 4) : 0; - struct nfsacl_encode_desc nfsacl_desc = { - .desc = { - .elem_size = elem_size, - .array_len = encode_entries ? entries : 0, - .xcode = xdr_nfsace_encode, - }, - .acl = acl, - .typeflag = typeflag, - .uid = inode->i_uid, - .gid = inode->i_gid, - }; - struct nfsacl_simple_acl aclbuf; - unsigned int base; - int err; - - if (entries > NFS_ACL_MAX_ENTRIES) - return false; - if (xdr_stream_encode_u32(xdr, entries) < 0) - return false; - - if (encode_entries && acl && acl->a_count == 3) { - struct posix_acl *acl2 = &aclbuf.acl; - - /* Avoid the use of posix_acl_alloc(). nfsacl_encode() is - * invoked in contexts where a memory allocation failure is - * fatal. Fortunately this fake ACL is small enough to - * construct on the stack. */ - posix_acl_init(acl2, 4); - - /* Insert entries in canonical order: other orders seem - to confuse Solaris VxFS. */ - acl2->a_entries[0] = acl->a_entries[0]; /* ACL_USER_OBJ */ - acl2->a_entries[1] = acl->a_entries[1]; /* ACL_GROUP_OBJ */ - acl2->a_entries[2] = acl->a_entries[1]; /* ACL_MASK */ - acl2->a_entries[2].e_tag = ACL_MASK; - acl2->a_entries[3] = acl->a_entries[2]; /* ACL_OTHER */ - nfsacl_desc.acl = acl2; - } - - base = xdr_stream_pos(xdr); - if (!xdr_reserve_space(xdr, XDR_UNIT + - elem_size * nfsacl_desc.desc.array_len)) - return false; - err = xdr_encode_array2(xdr->buf, base, &nfsacl_desc.desc); - if (err) - return false; - - return true; -} -EXPORT_SYMBOL_GPL(nfs_stream_encode_acl); - - struct nfsacl_decode_desc { struct xdr_array2_desc desc; unsigned int count; @@ -366,55 +295,3 @@ int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, nfsacl_desc.desc.array_len; } EXPORT_SYMBOL_GPL(nfsacl_decode); - -/** - * nfs_stream_decode_acl - Decode an NFSv3 ACL - * - * @xdr: an xdr_stream positioned at an encoded ACL - * @aclcnt: OUT: count of ACEs in decoded posix_acl - * @pacl: OUT: a dynamically-allocated buffer containing the decoded posix_acl - * - * Return values: - * %false: The encoded ACL is not valid - * %true: @pacl contains a decoded ACL, and @xdr is advanced - * - * On a successful return, caller must release *pacl using posix_acl_release(). - */ -bool nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, - struct posix_acl **pacl) -{ - const size_t elem_size = XDR_UNIT * 3; - struct nfsacl_decode_desc nfsacl_desc = { - .desc = { - .elem_size = elem_size, - .xcode = pacl ? xdr_nfsace_decode : NULL, - }, - }; - unsigned int base; - u32 entries; - - if (xdr_stream_decode_u32(xdr, &entries) < 0) - return false; - if (entries > NFS_ACL_MAX_ENTRIES) - return false; - - base = xdr_stream_pos(xdr); - if (!xdr_inline_decode(xdr, XDR_UNIT + elem_size * entries)) - return false; - nfsacl_desc.desc.array_maxlen = entries; - if (xdr_decode_array2(xdr->buf, base, &nfsacl_desc.desc)) - return false; - - if (pacl) { - if (entries != nfsacl_desc.desc.array_len || - posix_acl_from_nfsacl(nfsacl_desc.acl) != 0) { - posix_acl_release(nfsacl_desc.acl); - return false; - } - *pacl = nfsacl_desc.acl; - } - if (aclcnt) - *aclcnt = entries; - return true; -} -EXPORT_SYMBOL_GPL(nfs_stream_decode_acl); diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index 6d2d498a5957..248f1459c039 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -8,7 +8,6 @@ config NFSD select SUNRPC select EXPORTFS select NFS_ACL_SUPPORT if NFSD_V2_ACL - select NFS_ACL_SUPPORT if NFSD_V3_ACL depends on MULTIUSER help Choose Y here if you want to allow other computers to access @@ -27,29 +26,28 @@ config NFSD Below you can choose which versions of the NFS protocol are available to clients mounting the NFS server on this system. - Support for NFS version 3 (RFC 1813) is always available when + Support for NFS version 2 (RFC 1094) is always available when CONFIG_NFSD is selected. If unsure, say N. -config NFSD_V2 - bool "NFS server support for NFS version 2 (DEPRECATED)" - depends on NFSD - default n - help - NFSv2 (RFC 1094) was the first publicly-released version of NFS. - Unless you are hosting ancient (1990's era) NFS clients, you don't - need this. - - If unsure, say N. - config NFSD_V2_ACL - bool "NFS server support for the NFSv2 ACL protocol extension" - depends on NFSD_V2 + bool + depends on NFSD + +config NFSD_V3 + bool "NFS server support for NFS version 3" + depends on NFSD + help + This option enables support in your system's NFS server for + version 3 of the NFS protocol (RFC 1813). + + If unsure, say Y. config NFSD_V3_ACL bool "NFS server support for the NFSv3 ACL protocol extension" - depends on NFSD + depends on NFSD_V3 + select NFSD_V2_ACL help Solaris NFS servers support an auxiliary NFSv3 ACL protocol that never became an official part of the NFS version 3 protocol. @@ -72,13 +70,13 @@ config NFSD_V3_ACL config NFSD_V4 bool "NFS server support for NFS version 4" depends on NFSD && PROC_FS + select NFSD_V3 select FS_POSIX_ACL select SUNRPC_GSS select CRYPTO select CRYPTO_MD5 select CRYPTO_SHA256 select GRACE_PERIOD - select NFS_V4_2_SSC_HELPER if NFS_V4_2 help This option enables support in your system's NFS server for version 4 of the NFS protocol (RFC 3530). @@ -100,7 +98,7 @@ config NFSD_BLOCKLAYOUT help This option enables support for the exporting pNFS block layouts in the kernel's NFS server. The pNFS block layout enables NFS - clients to directly perform I/O to block devices accessible to both + clients to directly perform I/O to block devices accesible to both the server and the clients. See RFC 5663 for more details. If unsure, say N. @@ -114,7 +112,7 @@ config NFSD_SCSILAYOUT help This option enables support for the exporting pNFS SCSI layouts in the kernel's NFS server. The pNFS SCSI layout enables NFS - clients to directly perform I/O to SCSI devices accessible to both + clients to directly perform I/O to SCSI devices accesible to both the server and the clients. See draft-ietf-nfsv4-scsi-layout for more details. @@ -128,7 +126,7 @@ config NFSD_FLEXFILELAYOUT This option enables support for the exporting pNFS Flex File layouts in the kernel's NFS server. The pNFS Flex File layout enables NFS clients to directly perform I/O to NFSv3 devices - accessible to both the server and the clients. See + accesible to both the server and the clients. See draft-ietf-nfsv4-flex-files for more details. Warning, this server implements the bare minimum functionality @@ -139,7 +137,7 @@ config NFSD_FLEXFILELAYOUT config NFSD_V4_2_INTER_SSC bool "NFSv4.2 inter server to server COPY" - depends on NFSD_V4 && NFS_V4_2 + depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2 help This option enables support for NFSv4.2 inter server to server copy where the destination server calls the NFSv4.2 diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index 6fffc8f03f74..3f0983e93a99 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -10,11 +10,11 @@ obj-$(CONFIG_NFSD) += nfsd.o # this one should be compiled first, as the tracing macros can easily blow up nfsd-y += trace.o -nfsd-y += nfssvc.o nfsctl.o nfsfh.o vfs.o \ - export.o auth.o lockd.o nfscache.o \ - stats.o filecache.o nfs3proc.o nfs3xdr.o -nfsd-$(CONFIG_NFSD_V2) += nfsproc.o nfsxdr.o +nfsd-y += nfssvc.o nfsctl.o nfsproc.o nfsfh.o vfs.o \ + export.o auth.o lockd.o nfscache.o nfsxdr.o \ + stats.o filecache.o nfsd-$(CONFIG_NFSD_V2_ACL) += nfs2acl.o +nfsd-$(CONFIG_NFSD_V3) += nfs3proc.o nfs3xdr.o nfsd-$(CONFIG_NFSD_V3_ACL) += nfs3acl.o nfsd-$(CONFIG_NFSD_V4) += nfs4proc.o nfs4xdr.o nfs4state.o nfs4idmap.o \ nfs4acl.o nfs4callback.o nfs4recover.o diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h index 4b7324458a94..ba14d2f4b64f 100644 --- a/fs/nfsd/acl.h +++ b/fs/nfsd/acl.h @@ -38,8 +38,6 @@ struct nfs4_acl; struct svc_fh; struct svc_rqst; -struct nfsd_attrs; -enum nfs_ftype4; int nfs4_acl_bytes(int entries); int nfs4_acl_get_whotype(char *, u32); @@ -47,7 +45,7 @@ __be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who); int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry, struct nfs4_acl **acl); -__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, - struct nfsd_attrs *attr); +__be32 nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct nfs4_acl *acl); #endif /* LINUX_NFS4_ACL_H */ diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c index d91a686d2f31..a07c39c94bbd 100644 --- a/fs/nfsd/blocklayout.c +++ b/fs/nfsd/blocklayout.c @@ -16,7 +16,6 @@ #include "blocklayoutxdr.h" #include "pnfs.h" #include "filecache.h" -#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS diff --git a/fs/nfsd/blocklayoutxdr.c b/fs/nfsd/blocklayoutxdr.c index 1ed2f691ebb9..2455dc8be18a 100644 --- a/fs/nfsd/blocklayoutxdr.c +++ b/fs/nfsd/blocklayoutxdr.c @@ -9,7 +9,6 @@ #include "nfsd.h" #include "blocklayoutxdr.h" -#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS diff --git a/fs/nfsd/cache.h b/fs/nfsd/cache.h index f21259ead64b..65c331f75e9c 100644 --- a/fs/nfsd/cache.h +++ b/fs/nfsd/cache.h @@ -84,6 +84,6 @@ int nfsd_reply_cache_init(struct nfsd_net *); void nfsd_reply_cache_shutdown(struct nfsd_net *); int nfsd_cache_lookup(struct svc_rqst *); void nfsd_cache_update(struct svc_rqst *, int, __be32 *); -int nfsd_reply_cache_stats_show(struct seq_file *m, void *v); +int nfsd_reply_cache_stats_open(struct inode *, struct file *); #endif /* NFSCACHE_H */ diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 7c863f2c21e0..21e404e7cb68 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -331,29 +331,12 @@ static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc) fsloc->locations = NULL; } -static int export_stats_init(struct export_stats *stats) -{ - stats->start_time = ktime_get_seconds(); - return nfsd_percpu_counters_init(stats->counter, EXP_STATS_COUNTERS_NUM); -} - -static void export_stats_reset(struct export_stats *stats) -{ - nfsd_percpu_counters_reset(stats->counter, EXP_STATS_COUNTERS_NUM); -} - -static void export_stats_destroy(struct export_stats *stats) -{ - nfsd_percpu_counters_destroy(stats->counter, EXP_STATS_COUNTERS_NUM); -} - static void svc_export_put(struct kref *ref) { struct svc_export *exp = container_of(ref, struct svc_export, h.ref); path_put(&exp->ex_path); auth_domain_put(exp->ex_client); nfsd4_fslocs_free(&exp->ex_fslocs); - export_stats_destroy(&exp->ex_stats); kfree(exp->ex_uuid); kfree_rcu(exp, ex_rcu); } @@ -425,12 +408,6 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid) return -EINVAL; } - if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK && - !(*flags & NFSEXP_NOSUBTREECHECK)) { - dprintk("%s: %s does not support subtree checking!\n", - __func__, inode->i_sb->s_type->name); - return -EINVAL; - } return 0; } @@ -709,47 +686,22 @@ static void exp_flags(struct seq_file *m, int flag, int fsid, kuid_t anonu, kgid_t anong, struct nfsd4_fs_locations *fslocs); static void show_secinfo(struct seq_file *m, struct svc_export *exp); -static int is_export_stats_file(struct seq_file *m) -{ - /* - * The export_stats file uses the same ops as the exports file. - * We use the file's name to determine the reported info per export. - * There is no rename in nsfdfs, so d_name.name is stable. - */ - return !strcmp(m->file->f_path.dentry->d_name.name, "export_stats"); -} - static int svc_export_show(struct seq_file *m, struct cache_detail *cd, struct cache_head *h) { - struct svc_export *exp; - bool export_stats = is_export_stats_file(m); + struct svc_export *exp ; - if (h == NULL) { - if (export_stats) - seq_puts(m, "#path domain start-time\n#\tstats\n"); - else - seq_puts(m, "#path domain(flags)\n"); + if (h ==NULL) { + seq_puts(m, "#path domain(flags)\n"); return 0; } exp = container_of(h, struct svc_export, h); seq_path(m, &exp->ex_path, " \t\n\\"); seq_putc(m, '\t'); seq_escape(m, exp->ex_client->name, " \t\n\\"); - if (export_stats) { - seq_printf(m, "\t%lld\n", exp->ex_stats.start_time); - seq_printf(m, "\tfh_stale: %lld\n", - percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_FH_STALE])); - seq_printf(m, "\tio_read: %lld\n", - percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_READ])); - seq_printf(m, "\tio_write: %lld\n", - percpu_counter_sum_positive(&exp->ex_stats.counter[EXP_STATS_IO_WRITE])); - seq_putc(m, '\n'); - return 0; - } seq_putc(m, '('); - if (test_bit(CACHE_VALID, &h->flags) && + if (test_bit(CACHE_VALID, &h->flags) && !test_bit(CACHE_NEGATIVE, &h->flags)) { exp_flags(m, exp->ex_flags, exp->ex_fsid, exp->ex_anon_uid, exp->ex_anon_gid, &exp->ex_fslocs); @@ -790,7 +742,6 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem) new->ex_layout_types = 0; new->ex_uuid = NULL; new->cd = item->cd; - export_stats_reset(&new->ex_stats); } static void export_update(struct cache_head *cnew, struct cache_head *citem) @@ -823,15 +774,10 @@ static void export_update(struct cache_head *cnew, struct cache_head *citem) static struct cache_head *svc_export_alloc(void) { struct svc_export *i = kmalloc(sizeof(*i), GFP_KERNEL); - if (!i) + if (i) + return &i->h; + else return NULL; - - if (export_stats_init(&i->ex_stats)) { - kfree(i); - return NULL; - } - - return &i->h; } static const struct cache_detail svc_export_cache_template = { @@ -1293,14 +1239,10 @@ static int e_show(struct seq_file *m, void *p) struct cache_head *cp = p; struct svc_export *exp = container_of(cp, struct svc_export, h); struct cache_detail *cd = m->private; - bool export_stats = is_export_stats_file(m); if (p == SEQ_START_TOKEN) { seq_puts(m, "# Version 1.1\n"); - if (export_stats) - seq_puts(m, "# Path Client Start-time\n#\tStats\n"); - else - seq_puts(m, "# Path Client(Flags) # IPs\n"); + seq_puts(m, "# Path Client(Flags) # IPs\n"); return 0; } diff --git a/fs/nfsd/export.h b/fs/nfsd/export.h index d03f7f6a8642..e7daa1f246f0 100644 --- a/fs/nfsd/export.h +++ b/fs/nfsd/export.h @@ -6,7 +6,6 @@ #define NFSD_EXPORT_H #include -#include #include #include @@ -47,19 +46,6 @@ struct exp_flavor_info { u32 flags; }; -/* Per-export stats */ -enum { - EXP_STATS_FH_STALE, - EXP_STATS_IO_READ, - EXP_STATS_IO_WRITE, - EXP_STATS_COUNTERS_NUM -}; - -struct export_stats { - time64_t start_time; - struct percpu_counter counter[EXP_STATS_COUNTERS_NUM]; -}; - struct svc_export { struct cache_head h; struct auth_domain * ex_client; @@ -76,7 +62,6 @@ struct svc_export { struct nfsd4_deviceid_map *ex_devid_map; struct cache_detail *cd; struct rcu_head ex_rcu; - struct export_stats ex_stats; }; /* an "export key" (expkey) maps a filehandlefragement to an @@ -115,6 +100,7 @@ struct svc_export * rqst_find_fsidzero_export(struct svc_rqst *); int exp_rootfh(struct net *, struct auth_domain *, char *path, struct knfsd_fh *, int maxsize); __be32 exp_pseudoroot(struct svc_rqst *, struct svc_fh *); +__be32 nfserrno(int errno); static inline void exp_put(struct svc_export *exp) { diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 615ea8324911..e30e1ddc1ace 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -1,32 +1,7 @@ -// SPDX-License-Identifier: GPL-2.0 /* - * The NFSD open file cache. + * Open file cache. * * (c) 2015 - Jeff Layton - * - * An nfsd_file object is a per-file collection of open state that binds - * together: - * - a struct file * - * - a user credential - * - a network namespace - * - a read-ahead context - * - monitoring for writeback errors - * - * nfsd_file objects are reference-counted. Consumers acquire a new - * object via the nfsd_file_acquire API. They manage their interest in - * the acquired object, and hence the object's reference count, via - * nfsd_file_get and nfsd_file_put. There are two varieties of nfsd_file - * object: - * - * * non-garbage-collected: When a consumer wants to precisely control - * the lifetime of a file's open state, it acquires a non-garbage- - * collected nfsd_file. The final nfsd_file_put releases the open - * state immediately. - * - * * garbage-collected: When a consumer does not control the lifetime - * of open state, it acquires a garbage-collected nfsd_file. The - * final nfsd_file_put allows the open state to linger for a period - * during which it may be re-used. */ #include @@ -37,7 +12,6 @@ #include #include #include -#include #include "vfs.h" #include "nfsd.h" @@ -46,75 +20,63 @@ #include "filecache.h" #include "trace.h" +#define NFSDDBG_FACILITY NFSDDBG_FH + +/* FIXME: dynamically size this for the machine somehow? */ +#define NFSD_FILE_HASH_BITS 12 +#define NFSD_FILE_HASH_SIZE (1 << NFSD_FILE_HASH_BITS) #define NFSD_LAUNDRETTE_DELAY (2 * HZ) -#define NFSD_FILE_CACHE_UP (0) +#define NFSD_FILE_SHUTDOWN (1) +#define NFSD_FILE_LRU_THRESHOLD (4096UL) +#define NFSD_FILE_LRU_LIMIT (NFSD_FILE_LRU_THRESHOLD << 2) /* We only care about NFSD_MAY_READ/WRITE for this cache */ #define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) +struct nfsd_fcache_bucket { + struct hlist_head nfb_head; + spinlock_t nfb_lock; + unsigned int nfb_count; + unsigned int nfb_maxcount; +}; + static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); -static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); -static DEFINE_PER_CPU(unsigned long, nfsd_file_releases); -static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age); -static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions); struct nfsd_fcache_disposal { + struct list_head list; struct work_struct work; + struct net *net; spinlock_t lock; struct list_head freeme; + struct rcu_head rcu; }; static struct workqueue_struct *nfsd_filecache_wq __read_mostly; static struct kmem_cache *nfsd_file_slab; static struct kmem_cache *nfsd_file_mark_slab; +static struct nfsd_fcache_bucket *nfsd_file_hashtbl; static struct list_lru nfsd_file_lru; -static unsigned long nfsd_file_flags; +static long nfsd_file_lru_flags; static struct fsnotify_group *nfsd_file_fsnotify_group; +static atomic_long_t nfsd_filecache_count; static struct delayed_work nfsd_filecache_laundrette; -static struct rhltable nfsd_file_rhltable - ____cacheline_aligned_in_smp; +static DEFINE_SPINLOCK(laundrette_lock); +static LIST_HEAD(laundrettes); -static bool -nfsd_match_cred(const struct cred *c1, const struct cred *c2) -{ - int i; - - if (!uid_eq(c1->fsuid, c2->fsuid)) - return false; - if (!gid_eq(c1->fsgid, c2->fsgid)) - return false; - if (c1->group_info == NULL || c2->group_info == NULL) - return c1->group_info == c2->group_info; - if (c1->group_info->ngroups != c2->group_info->ngroups) - return false; - for (i = 0; i < c1->group_info->ngroups; i++) { - if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) - return false; - } - return true; -} - -static const struct rhashtable_params nfsd_file_rhash_params = { - .key_len = sizeof_field(struct nfsd_file, nf_inode), - .key_offset = offsetof(struct nfsd_file, nf_inode), - .head_offset = offsetof(struct nfsd_file, nf_rlist), - - /* - * Start with a single page hash table to reduce resizing churn - * on light workloads. - */ - .min_size = 256, - .automatic_shrinking = true, -}; +static void nfsd_file_gc(void); static void nfsd_file_schedule_laundrette(void) { - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags)) - queue_delayed_work(system_wq, &nfsd_filecache_laundrette, - NFSD_LAUNDRETTE_DELAY); + long count = atomic_long_read(&nfsd_filecache_count); + + if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags)) + return; + + queue_delayed_work(system_wq, &nfsd_filecache_laundrette, + NFSD_LAUNDRETTE_DELAY); } static void @@ -153,21 +115,22 @@ nfsd_file_mark_put(struct nfsd_file_mark *nfm) } static struct nfsd_file_mark * -nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) +nfsd_file_mark_find_or_create(struct nfsd_file *nf) { int err; struct fsnotify_mark *mark; struct nfsd_file_mark *nfm = NULL, *new; + struct inode *inode = nf->nf_inode; do { - fsnotify_group_lock(nfsd_file_fsnotify_group); + mutex_lock(&nfsd_file_fsnotify_group->mark_mutex); mark = fsnotify_find_mark(&inode->i_fsnotify_marks, - nfsd_file_fsnotify_group); + nfsd_file_fsnotify_group); if (mark) { nfm = nfsd_file_mark_get(container_of(mark, struct nfsd_file_mark, nfm_mark)); - fsnotify_group_unlock(nfsd_file_fsnotify_group); + mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); if (nfm) { fsnotify_put_mark(mark); break; @@ -175,9 +138,8 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) /* Avoid soft lockup race with nfsd_file_mark_put() */ fsnotify_destroy_mark(mark, nfsd_file_fsnotify_group); fsnotify_put_mark(mark); - } else { - fsnotify_group_unlock(nfsd_file_fsnotify_group); - } + } else + mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex); /* allocate a new nfm */ new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL); @@ -208,91 +170,51 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf, struct inode *inode) } static struct nfsd_file * -nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need, - bool want_gc) +nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval, + struct net *net) { struct nfsd_file *nf; nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL); - if (unlikely(!nf)) - return NULL; - - INIT_LIST_HEAD(&nf->nf_lru); - nf->nf_birthtime = ktime_get(); - nf->nf_file = NULL; - nf->nf_cred = get_current_cred(); - nf->nf_net = net; - nf->nf_flags = want_gc ? - BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING) | BIT(NFSD_FILE_GC) : - BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING); - nf->nf_inode = inode; - refcount_set(&nf->nf_ref, 1); - nf->nf_may = need; - nf->nf_mark = NULL; + if (nf) { + INIT_HLIST_NODE(&nf->nf_node); + INIT_LIST_HEAD(&nf->nf_lru); + nf->nf_file = NULL; + nf->nf_cred = get_current_cred(); + nf->nf_net = net; + nf->nf_flags = 0; + nf->nf_inode = inode; + nf->nf_hashval = hashval; + refcount_set(&nf->nf_ref, 1); + nf->nf_may = may & NFSD_FILE_MAY_MASK; + if (may & NFSD_MAY_NOT_BREAK_LEASE) { + if (may & NFSD_MAY_WRITE) + __set_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags); + if (may & NFSD_MAY_READ) + __set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); + } + nf->nf_mark = NULL; + trace_nfsd_file_alloc(nf); + } return nf; } -/** - * nfsd_file_check_write_error - check for writeback errors on a file - * @nf: nfsd_file to check for writeback errors - * - * Check whether a nfsd_file has an unseen error. Reset the write - * verifier if so. - */ -static void -nfsd_file_check_write_error(struct nfsd_file *nf) -{ - struct file *file = nf->nf_file; - - if ((file->f_mode & FMODE_WRITE) && - filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err))) - nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); -} - -static void -nfsd_file_hash_remove(struct nfsd_file *nf) -{ - trace_nfsd_file_unhash(nf); - rhltable_remove(&nfsd_file_rhltable, &nf->nf_rlist, - nfsd_file_rhash_params); -} - static bool -nfsd_file_unhash(struct nfsd_file *nf) -{ - if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_hash_remove(nf); - return true; - } - return false; -} - -static void nfsd_file_free(struct nfsd_file *nf) { - s64 age = ktime_to_ms(ktime_sub(ktime_get(), nf->nf_birthtime)); + bool flush = false; - trace_nfsd_file_free(nf); - - this_cpu_inc(nfsd_file_releases); - this_cpu_add(nfsd_file_total_age, age); - - nfsd_file_unhash(nf); + trace_nfsd_file_put_final(nf); if (nf->nf_mark) nfsd_file_mark_put(nf->nf_mark); if (nf->nf_file) { - nfsd_file_check_write_error(nf); + get_file(nf->nf_file); filp_close(nf->nf_file, NULL); + fput(nf->nf_file); + flush = true; } - - /* - * If this item is still linked via nf_lru, that's a bug. - * WARN and leak it to preserve system stability. - */ - if (WARN_ON_ONCE(!list_empty(&nf->nf_lru))) - return; - call_rcu(&nf->nf_rcu, nfsd_file_slab_free); + return flush; } static bool @@ -301,140 +223,191 @@ nfsd_file_check_writeback(struct nfsd_file *nf) struct file *file = nf->nf_file; struct address_space *mapping; - /* File not open for write? */ - if (!(file->f_mode & FMODE_WRITE)) + if (!file || !(file->f_mode & FMODE_WRITE)) return false; - - /* - * Some filesystems (e.g. NFS) flush all dirty data on close. - * On others, there is no need to wait for writeback. - */ - if (!(file_inode(file)->i_sb->s_export_op->flags & EXPORT_OP_FLUSH_ON_CLOSE)) - return false; - mapping = file->f_mapping; return mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) || mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK); } - -static bool nfsd_file_lru_add(struct nfsd_file *nf) +static int +nfsd_file_check_write_error(struct nfsd_file *nf) { - set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); - if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) { - trace_nfsd_file_lru_add(nf); + struct file *file = nf->nf_file; + + if (!file || !(file->f_mode & FMODE_WRITE)) + return 0; + return filemap_check_wb_err(file->f_mapping, READ_ONCE(file->f_wb_err)); +} + +static void +nfsd_file_do_unhash(struct nfsd_file *nf) +{ + lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + + trace_nfsd_file_unhash(nf); + + if (nfsd_file_check_write_error(nf)) + nfsd_reset_boot_verifier(net_generic(nf->nf_net, nfsd_net_id)); + --nfsd_file_hashtbl[nf->nf_hashval].nfb_count; + hlist_del_rcu(&nf->nf_node); + atomic_long_dec(&nfsd_filecache_count); +} + +static bool +nfsd_file_unhash(struct nfsd_file *nf) +{ + if (test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { + nfsd_file_do_unhash(nf); + if (!list_empty(&nf->nf_lru)) + list_lru_del(&nfsd_file_lru, &nf->nf_lru); return true; } return false; } -static bool nfsd_file_lru_remove(struct nfsd_file *nf) +/* + * Return true if the file was unhashed. + */ +static bool +nfsd_file_unhash_and_release_locked(struct nfsd_file *nf, struct list_head *dispose) { - if (list_lru_del(&nfsd_file_lru, &nf->nf_lru)) { - trace_nfsd_file_lru_del(nf); + lockdep_assert_held(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + + trace_nfsd_file_unhash_and_release_locked(nf); + if (!nfsd_file_unhash(nf)) + return false; + /* keep final reference for nfsd_file_lru_dispose */ + if (refcount_dec_not_one(&nf->nf_ref)) return true; + + list_add(&nf->nf_lru, dispose); + return true; +} + +static void +nfsd_file_put_noref(struct nfsd_file *nf) +{ + trace_nfsd_file_put(nf); + + if (refcount_dec_and_test(&nf->nf_ref)) { + WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); + nfsd_file_free(nf); } - return false; +} + +void +nfsd_file_put(struct nfsd_file *nf) +{ + bool is_hashed; + + set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); + if (refcount_read(&nf->nf_ref) > 2 || !nf->nf_file) { + nfsd_file_put_noref(nf); + return; + } + + filemap_flush(nf->nf_file->f_mapping); + is_hashed = test_bit(NFSD_FILE_HASHED, &nf->nf_flags) != 0; + nfsd_file_put_noref(nf); + if (is_hashed) + nfsd_file_schedule_laundrette(); + if (atomic_long_read(&nfsd_filecache_count) >= NFSD_FILE_LRU_LIMIT) + nfsd_file_gc(); } struct nfsd_file * nfsd_file_get(struct nfsd_file *nf) { - if (nf && refcount_inc_not_zero(&nf->nf_ref)) + if (likely(refcount_inc_not_zero(&nf->nf_ref))) return nf; return NULL; } -/** - * nfsd_file_put - put the reference to a nfsd_file - * @nf: nfsd_file of which to put the reference - * - * Put a reference to a nfsd_file. In the non-GC case, we just put the - * reference immediately. In the GC case, if the reference would be - * the last one, the put it on the LRU instead to be cleaned up later. - */ -void -nfsd_file_put(struct nfsd_file *nf) -{ - might_sleep(); - trace_nfsd_file_put(nf); - - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) && - test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - /* - * If this is the last reference (nf_ref == 1), then try to - * transfer it to the LRU. - */ - if (refcount_dec_not_one(&nf->nf_ref)) - return; - - /* Try to add it to the LRU. If that fails, decrement. */ - if (nfsd_file_lru_add(nf)) { - /* If it's still hashed, we're done */ - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - nfsd_file_schedule_laundrette(); - return; - } - - /* - * We're racing with unhashing, so try to remove it from - * the LRU. If removal fails, then someone else already - * has our reference. - */ - if (!nfsd_file_lru_remove(nf)) - return; - } - } - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); -} - static void nfsd_file_dispose_list(struct list_head *dispose) { struct nfsd_file *nf; - while (!list_empty(dispose)) { + while(!list_empty(dispose)) { nf = list_first_entry(dispose, struct nfsd_file, nf_lru); - list_del_init(&nf->nf_lru); - nfsd_file_free(nf); + list_del(&nf->nf_lru); + nfsd_file_put_noref(nf); + } +} + +static void +nfsd_file_dispose_list_sync(struct list_head *dispose) +{ + bool flush = false; + struct nfsd_file *nf; + + while(!list_empty(dispose)) { + nf = list_first_entry(dispose, struct nfsd_file, nf_lru); + list_del(&nf->nf_lru); + if (!refcount_dec_and_test(&nf->nf_ref)) + continue; + if (nfsd_file_free(nf)) + flush = true; + } + if (flush) + flush_delayed_fput(); +} + +static void +nfsd_file_list_remove_disposal(struct list_head *dst, + struct nfsd_fcache_disposal *l) +{ + spin_lock(&l->lock); + list_splice_init(&l->freeme, dst); + spin_unlock(&l->lock); +} + +static void +nfsd_file_list_add_disposal(struct list_head *files, struct net *net) +{ + struct nfsd_fcache_disposal *l; + + rcu_read_lock(); + list_for_each_entry_rcu(l, &laundrettes, list) { + if (l->net == net) { + spin_lock(&l->lock); + list_splice_tail_init(files, &l->freeme); + spin_unlock(&l->lock); + queue_work(nfsd_filecache_wq, &l->work); + break; + } + } + rcu_read_unlock(); +} + +static void +nfsd_file_list_add_pernet(struct list_head *dst, struct list_head *src, + struct net *net) +{ + struct nfsd_file *nf, *tmp; + + list_for_each_entry_safe(nf, tmp, src, nf_lru) { + if (nf->nf_net == net) + list_move_tail(&nf->nf_lru, dst); } } -/** - * nfsd_file_dispose_list_delayed - move list of dead files to net's freeme list - * @dispose: list of nfsd_files to be disposed - * - * Transfers each file to the "freeme" list for its nfsd_net, to eventually - * be disposed of by the per-net garbage collector. - */ static void nfsd_file_dispose_list_delayed(struct list_head *dispose) { - while(!list_empty(dispose)) { - struct nfsd_file *nf = list_first_entry(dispose, - struct nfsd_file, nf_lru); - struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id); - struct nfsd_fcache_disposal *l = nn->fcache_disposal; + LIST_HEAD(list); + struct nfsd_file *nf; - spin_lock(&l->lock); - list_move_tail(&nf->nf_lru, &l->freeme); - spin_unlock(&l->lock); - queue_work(nfsd_filecache_wq, &l->work); + while(!list_empty(dispose)) { + nf = list_first_entry(dispose, struct nfsd_file, nf_lru); + nfsd_file_list_add_pernet(&list, dispose, nf->nf_net); + nfsd_file_list_add_disposal(&list, nf->nf_net); } } -/** - * nfsd_file_lru_cb - Examine an entry on the LRU list - * @item: LRU entry to examine - * @lru: controlling LRU - * @lock: LRU list lock (unused) - * @arg: dispose list - * - * Return values: - * %LRU_REMOVED: @item was removed from the LRU - * %LRU_ROTATE: @item is to be moved to the LRU tail - * %LRU_SKIP: @item cannot be evicted +/* + * Note this can deadlock with nfsd_file_cache_purge. */ static enum lru_status nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, @@ -445,60 +418,72 @@ nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru, struct list_head *head = arg; struct nfsd_file *nf = list_entry(item, struct nfsd_file, nf_lru); - /* We should only be dealing with GC entries here */ - WARN_ON_ONCE(!test_bit(NFSD_FILE_GC, &nf->nf_flags)); + /* + * Do a lockless refcount check. The hashtable holds one reference, so + * we look to see if anything else has a reference, or if any have + * been put since the shrinker last ran. Those don't get unhashed and + * released. + * + * Note that in the put path, we set the flag and then decrement the + * counter. Here we check the counter and then test and clear the flag. + * That order is deliberate to ensure that we can do this locklessly. + */ + if (refcount_read(&nf->nf_ref) > 1) + goto out_skip; /* * Don't throw out files that are still undergoing I/O or * that have uncleared errors pending. */ - if (nfsd_file_check_writeback(nf)) { - trace_nfsd_file_gc_writeback(nf); - return LRU_SKIP; - } + if (nfsd_file_check_writeback(nf)) + goto out_skip; - /* If it was recently added to the list, skip it */ - if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) { - trace_nfsd_file_gc_referenced(nf); - return LRU_ROTATE; - } + if (test_and_clear_bit(NFSD_FILE_REFERENCED, &nf->nf_flags)) + goto out_skip; - /* - * Put the reference held on behalf of the LRU. If it wasn't the last - * one, then just remove it from the LRU and ignore it. - */ - if (!refcount_dec_and_test(&nf->nf_ref)) { - trace_nfsd_file_gc_in_use(nf); - list_lru_isolate(lru, &nf->nf_lru); - return LRU_REMOVED; - } + if (!test_and_clear_bit(NFSD_FILE_HASHED, &nf->nf_flags)) + goto out_skip; - /* Refcount went to zero. Unhash it and queue it to the dispose list */ - nfsd_file_unhash(nf); list_lru_isolate_move(lru, &nf->nf_lru, head); - this_cpu_inc(nfsd_file_evictions); - trace_nfsd_file_gc_disposed(nf); return LRU_REMOVED; +out_skip: + return LRU_SKIP; +} + +static unsigned long +nfsd_file_lru_walk_list(struct shrink_control *sc) +{ + LIST_HEAD(head); + struct nfsd_file *nf; + unsigned long ret; + + if (sc) + ret = list_lru_shrink_walk(&nfsd_file_lru, sc, + nfsd_file_lru_cb, &head); + else + ret = list_lru_walk(&nfsd_file_lru, + nfsd_file_lru_cb, + &head, LONG_MAX); + list_for_each_entry(nf, &head, nf_lru) { + spin_lock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + nfsd_file_do_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[nf->nf_hashval].nfb_lock); + } + nfsd_file_dispose_list_delayed(&head); + return ret; } static void nfsd_file_gc(void) { - LIST_HEAD(dispose); - unsigned long ret; - - ret = list_lru_walk(&nfsd_file_lru, nfsd_file_lru_cb, - &dispose, list_lru_count(&nfsd_file_lru)); - trace_nfsd_file_gc_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_dispose_list_delayed(&dispose); + nfsd_file_lru_walk_list(NULL); } static void nfsd_file_gc_worker(struct work_struct *work) { nfsd_file_gc(); - if (list_lru_count(&nfsd_file_lru)) - nfsd_file_schedule_laundrette(); + nfsd_file_schedule_laundrette(); } static unsigned long @@ -510,14 +495,7 @@ nfsd_file_lru_count(struct shrinker *s, struct shrink_control *sc) static unsigned long nfsd_file_lru_scan(struct shrinker *s, struct shrink_control *sc) { - LIST_HEAD(dispose); - unsigned long ret; - - ret = list_lru_shrink_walk(&nfsd_file_lru, sc, - nfsd_file_lru_cb, &dispose); - trace_nfsd_file_shrinker_removed(ret, list_lru_count(&nfsd_file_lru)); - nfsd_file_dispose_list_delayed(&dispose); - return ret; + return nfsd_file_lru_walk_list(sc); } static struct shrinker nfsd_file_shrinker = { @@ -526,123 +504,70 @@ static struct shrinker nfsd_file_shrinker = { .seeks = 1, }; -/** - * nfsd_file_cond_queue - conditionally unhash and queue a nfsd_file - * @nf: nfsd_file to attempt to queue - * @dispose: private list to queue successfully-put objects - * - * Unhash an nfsd_file, try to get a reference to it, and then put that - * reference. If it's the last reference, queue it to the dispose list. - */ static void -nfsd_file_cond_queue(struct nfsd_file *nf, struct list_head *dispose) - __must_hold(RCU) +__nfsd_file_close_inode(struct inode *inode, unsigned int hashval, + struct list_head *dispose) { - int decrement = 1; + struct nfsd_file *nf; + struct hlist_node *tmp; - /* If we raced with someone else unhashing, ignore it */ - if (!nfsd_file_unhash(nf)) - return; - - /* If we can't get a reference, ignore it */ - if (!nfsd_file_get(nf)) - return; - - /* Extra decrement if we remove from the LRU */ - if (nfsd_file_lru_remove(nf)) - ++decrement; - - /* If refcount goes to 0, then put on the dispose list */ - if (refcount_sub_and_test(decrement, &nf->nf_ref)) { - list_add(&nf->nf_lru, dispose); - trace_nfsd_file_closing(nf); + spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); + hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) { + if (inode == nf->nf_inode) + nfsd_file_unhash_and_release_locked(nf, dispose); } -} - -/** - * nfsd_file_queue_for_close: try to close out any open nfsd_files for an inode - * @inode: inode on which to close out nfsd_files - * @dispose: list on which to gather nfsd_files to close out - * - * An nfsd_file represents a struct file being held open on behalf of nfsd. - * An open file however can block other activity (such as leases), or cause - * undesirable behavior (e.g. spurious silly-renames when reexporting NFS). - * - * This function is intended to find open nfsd_files when this sort of - * conflicting access occurs and then attempt to close those files out. - * - * Populates the dispose list with entries that have already had their - * refcounts go to zero. The actual free of an nfsd_file can be expensive, - * so we leave it up to the caller whether it wants to wait or not. - */ -static void -nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose) -{ - struct rhlist_head *tmp, *list; - struct nfsd_file *nf; - - rcu_read_lock(); - list = rhltable_lookup(&nfsd_file_rhltable, &inode, - nfsd_file_rhash_params); - rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { - if (!test_bit(NFSD_FILE_GC, &nf->nf_flags)) - continue; - nfsd_file_cond_queue(nf, dispose); - } - rcu_read_unlock(); -} - -/** - * nfsd_file_close_inode - attempt a delayed close of a nfsd_file - * @inode: inode of the file to attempt to remove - * - * Close out any open nfsd_files that can be reaped for @inode. The - * actual freeing is deferred to the dispose_list_delayed infrastructure. - * - * This is used by the fsnotify callbacks and setlease notifier. - */ -static void -nfsd_file_close_inode(struct inode *inode) -{ - LIST_HEAD(dispose); - - nfsd_file_queue_for_close(inode, &dispose); - nfsd_file_dispose_list_delayed(&dispose); + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); } /** * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file * @inode: inode of the file to attempt to remove * - * Close out any open nfsd_files that can be reaped for @inode. The - * nfsd_files are closed out synchronously. - * - * This is called from nfsd_rename and nfsd_unlink to avoid silly-renames - * when reexporting NFS. + * Walk the whole hash bucket, looking for any files that correspond to "inode". + * If any do, then unhash them and put the hashtable reference to them and + * destroy any that had their last reference put. Also ensure that any of the + * fputs also have their final __fput done as well. */ void nfsd_file_close_inode_sync(struct inode *inode) { - struct nfsd_file *nf; + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); LIST_HEAD(dispose); - trace_nfsd_file_close(inode); + __nfsd_file_close_inode(inode, hashval, &dispose); + trace_nfsd_file_close_inode_sync(inode, hashval, !list_empty(&dispose)); + nfsd_file_dispose_list_sync(&dispose); +} - nfsd_file_queue_for_close(inode, &dispose); - while (!list_empty(&dispose)) { - nf = list_first_entry(&dispose, struct nfsd_file, nf_lru); - list_del_init(&nf->nf_lru); - nfsd_file_free(nf); - } - flush_delayed_fput(); +/** + * nfsd_file_close_inode_sync - attempt to forcibly close a nfsd_file + * @inode: inode of the file to attempt to remove + * + * Walk the whole hash bucket, looking for any files that correspond to "inode". + * If any do, then unhash them and put the hashtable reference to them and + * destroy any that had their last reference put. + */ +static void +nfsd_file_close_inode(struct inode *inode) +{ + unsigned int hashval = (unsigned int)hash_long(inode->i_ino, + NFSD_FILE_HASH_BITS); + LIST_HEAD(dispose); + + __nfsd_file_close_inode(inode, hashval, &dispose); + trace_nfsd_file_close_inode(inode, hashval, !list_empty(&dispose)); + nfsd_file_dispose_list_delayed(&dispose); } /** * nfsd_file_delayed_close - close unused nfsd_files * @work: dummy * - * Scrape the freeme list for this nfsd_net, and then dispose of them - * all. + * Walk the LRU list and close any entries that have not been used since + * the last scan. + * + * Note this can deadlock with nfsd_file_cache_purge. */ static void nfsd_file_delayed_close(struct work_struct *work) @@ -651,10 +576,7 @@ nfsd_file_delayed_close(struct work_struct *work) struct nfsd_fcache_disposal *l = container_of(work, struct nfsd_fcache_disposal, work); - spin_lock(&l->lock); - list_splice_init(&l->freeme, &head); - spin_unlock(&l->lock); - + nfsd_file_list_remove_disposal(&head, l); nfsd_file_dispose_list(&head); } @@ -666,7 +588,7 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg, /* Only close files for F_SETLEASE leases */ if (fl->fl_flags & FL_LEASE) - nfsd_file_close_inode(file_inode(fl->fl_file)); + nfsd_file_close_inode_sync(file_inode(fl->fl_file)); return 0; } @@ -679,9 +601,6 @@ nfsd_file_fsnotify_handle_event(struct fsnotify_mark *mark, u32 mask, struct inode *inode, struct inode *dir, const struct qstr *name, u32 cookie) { - if (WARN_ON_ONCE(!inode)) - return 0; - trace_nfsd_file_fsnotify_handle_event(inode, mask); /* Should be no marks on non-regular files */ @@ -709,21 +628,25 @@ static const struct fsnotify_ops nfsd_file_fsnotify_ops = { int nfsd_file_cache_init(void) { - int ret; + int ret = -ENOMEM; + unsigned int i; - lockdep_assert_held(&nfsd_mutex); - if (test_and_set_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) + clear_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); + + if (nfsd_file_hashtbl) return 0; - ret = rhltable_init(&nfsd_file_rhltable, &nfsd_file_rhash_params); - if (ret) - return ret; - - ret = -ENOMEM; nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0); if (!nfsd_filecache_wq) goto out; + nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE, + sizeof(*nfsd_file_hashtbl), GFP_KERNEL); + if (!nfsd_file_hashtbl) { + pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n"); + goto out_err; + } + nfsd_file_slab = kmem_cache_create("nfsd_file", sizeof(struct nfsd_file), 0, 0, NULL); if (!nfsd_file_slab) { @@ -757,16 +680,19 @@ nfsd_file_cache_init(void) goto out_shrinker; } - nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops, - FSNOTIFY_GROUP_NOFS); + nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops); if (IS_ERR(nfsd_file_fsnotify_group)) { pr_err("nfsd: unable to create fsnotify group: %ld\n", PTR_ERR(nfsd_file_fsnotify_group)); - ret = PTR_ERR(nfsd_file_fsnotify_group); nfsd_file_fsnotify_group = NULL; goto out_notifier; } + for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { + INIT_HLIST_HEAD(&nfsd_file_hashtbl[i].nfb_head); + spin_lock_init(&nfsd_file_hashtbl[i].nfb_lock); + } + INIT_DELAYED_WORK(&nfsd_filecache_laundrette, nfsd_file_gc_worker); out: return ret; @@ -781,47 +707,50 @@ nfsd_file_cache_init(void) nfsd_file_slab = NULL; kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; + kvfree(nfsd_file_hashtbl); + nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhltable_destroy(&nfsd_file_rhltable); goto out; } -/** - * __nfsd_file_cache_purge: clean out the cache for shutdown - * @net: net-namespace to shut down the cache (may be NULL) - * - * Walk the nfsd_file cache and close out any that match @net. If @net is NULL, - * then close out everything. Called when an nfsd instance is being shut down, - * and when the exports table is flushed. +/* + * Note this can deadlock with nfsd_file_lru_cb. */ -static void -__nfsd_file_cache_purge(struct net *net) +void +nfsd_file_cache_purge(struct net *net) { - struct rhashtable_iter iter; - struct nfsd_file *nf; + unsigned int i; + struct nfsd_file *nf; + struct hlist_node *next; LIST_HEAD(dispose); + bool del; - rhltable_walk_enter(&nfsd_file_rhltable, &iter); - do { - rhashtable_walk_start(&iter); + if (!nfsd_file_hashtbl) + return; - nf = rhashtable_walk_next(&iter); - while (!IS_ERR_OR_NULL(nf)) { - if (!net || nf->nf_net == net) - nfsd_file_cond_queue(nf, &dispose); - nf = rhashtable_walk_next(&iter); + for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { + struct nfsd_fcache_bucket *nfb = &nfsd_file_hashtbl[i]; + + spin_lock(&nfb->nfb_lock); + hlist_for_each_entry_safe(nf, next, &nfb->nfb_head, nf_node) { + if (net && nf->nf_net != net) + continue; + del = nfsd_file_unhash_and_release_locked(nf, &dispose); + + /* + * Deadlock detected! Something marked this entry as + * unhased, but hasn't removed it from the hash list. + */ + WARN_ON_ONCE(!del); } - - rhashtable_walk_stop(&iter); - } while (nf == ERR_PTR(-EAGAIN)); - rhashtable_walk_exit(&iter); - - nfsd_file_dispose_list(&dispose); + spin_unlock(&nfb->nfb_lock); + nfsd_file_dispose_list(&dispose); + } } static struct nfsd_fcache_disposal * -nfsd_alloc_fcache_disposal(void) +nfsd_alloc_fcache_disposal(struct net *net) { struct nfsd_fcache_disposal *l; @@ -829,6 +758,7 @@ nfsd_alloc_fcache_disposal(void) if (!l) return NULL; INIT_WORK(&l->work, nfsd_file_delayed_close); + l->net = net; spin_lock_init(&l->lock); INIT_LIST_HEAD(&l->freeme); return l; @@ -837,40 +767,61 @@ nfsd_alloc_fcache_disposal(void) static void nfsd_free_fcache_disposal(struct nfsd_fcache_disposal *l) { + rcu_assign_pointer(l->net, NULL); cancel_work_sync(&l->work); nfsd_file_dispose_list(&l->freeme); - kfree(l); + kfree_rcu(l, rcu); +} + +static void +nfsd_add_fcache_disposal(struct nfsd_fcache_disposal *l) +{ + spin_lock(&laundrette_lock); + list_add_tail_rcu(&l->list, &laundrettes); + spin_unlock(&laundrette_lock); +} + +static void +nfsd_del_fcache_disposal(struct nfsd_fcache_disposal *l) +{ + spin_lock(&laundrette_lock); + list_del_rcu(&l->list); + spin_unlock(&laundrette_lock); +} + +static int +nfsd_alloc_fcache_disposal_net(struct net *net) +{ + struct nfsd_fcache_disposal *l; + + l = nfsd_alloc_fcache_disposal(net); + if (!l) + return -ENOMEM; + nfsd_add_fcache_disposal(l); + return 0; } static void nfsd_free_fcache_disposal_net(struct net *net) { - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct nfsd_fcache_disposal *l = nn->fcache_disposal; + struct nfsd_fcache_disposal *l; - nfsd_free_fcache_disposal(l); + rcu_read_lock(); + list_for_each_entry_rcu(l, &laundrettes, list) { + if (l->net != net) + continue; + nfsd_del_fcache_disposal(l); + rcu_read_unlock(); + nfsd_free_fcache_disposal(l); + return; + } + rcu_read_unlock(); } int nfsd_file_cache_start_net(struct net *net) { - struct nfsd_net *nn = net_generic(net, nfsd_net_id); - - nn->fcache_disposal = nfsd_alloc_fcache_disposal(); - return nn->fcache_disposal ? 0 : -ENOMEM; -} - -/** - * nfsd_file_cache_purge - Remove all cache items associated with @net - * @net: target net namespace - * - */ -void -nfsd_file_cache_purge(struct net *net) -{ - lockdep_assert_held(&nfsd_mutex); - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) - __nfsd_file_cache_purge(net); + return nfsd_alloc_fcache_disposal_net(net); } void @@ -883,11 +834,7 @@ nfsd_file_cache_shutdown_net(struct net *net) void nfsd_file_cache_shutdown(void) { - int i; - - lockdep_assert_held(&nfsd_mutex); - if (test_and_clear_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 0) - return; + set_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags); lease_unregister_notifier(&nfsd_file_lease_notifier); unregister_shrinker(&nfsd_file_shrinker); @@ -896,7 +843,7 @@ nfsd_file_cache_shutdown(void) * calling nfsd_file_cache_purge */ cancel_delayed_work_sync(&nfsd_filecache_laundrette); - __nfsd_file_cache_purge(NULL); + nfsd_file_cache_purge(NULL); list_lru_destroy(&nfsd_file_lru); rcu_barrier(); fsnotify_put_group(nfsd_file_fsnotify_group); @@ -906,332 +853,240 @@ nfsd_file_cache_shutdown(void) fsnotify_wait_marks_destroyed(); kmem_cache_destroy(nfsd_file_mark_slab); nfsd_file_mark_slab = NULL; + kvfree(nfsd_file_hashtbl); + nfsd_file_hashtbl = NULL; destroy_workqueue(nfsd_filecache_wq); nfsd_filecache_wq = NULL; - rhltable_destroy(&nfsd_file_rhltable); +} - for_each_possible_cpu(i) { - per_cpu(nfsd_file_cache_hits, i) = 0; - per_cpu(nfsd_file_acquisitions, i) = 0; - per_cpu(nfsd_file_releases, i) = 0; - per_cpu(nfsd_file_total_age, i) = 0; - per_cpu(nfsd_file_evictions, i) = 0; +static bool +nfsd_match_cred(const struct cred *c1, const struct cred *c2) +{ + int i; + + if (!uid_eq(c1->fsuid, c2->fsuid)) + return false; + if (!gid_eq(c1->fsgid, c2->fsgid)) + return false; + if (c1->group_info == NULL || c2->group_info == NULL) + return c1->group_info == c2->group_info; + if (c1->group_info->ngroups != c2->group_info->ngroups) + return false; + for (i = 0; i < c1->group_info->ngroups; i++) { + if (!gid_eq(c1->group_info->gid[i], c2->group_info->gid[i])) + return false; } + return true; } static struct nfsd_file * -nfsd_file_lookup_locked(const struct net *net, const struct cred *cred, - struct inode *inode, unsigned char need, - bool want_gc) +nfsd_file_find_locked(struct inode *inode, unsigned int may_flags, + unsigned int hashval, struct net *net) { - struct rhlist_head *tmp, *list; struct nfsd_file *nf; + unsigned char need = may_flags & NFSD_FILE_MAY_MASK; - list = rhltable_lookup(&nfsd_file_rhltable, &inode, - nfsd_file_rhash_params); - rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) { + hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, + nf_node, lockdep_is_held(&nfsd_file_hashtbl[hashval].nfb_lock)) { if (nf->nf_may != need) continue; + if (nf->nf_inode != inode) + continue; if (nf->nf_net != net) continue; - if (!nfsd_match_cred(nf->nf_cred, cred)) + if (!nfsd_match_cred(nf->nf_cred, current_cred())) continue; - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) != want_gc) + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) continue; - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) - continue; - - if (!nfsd_file_get(nf)) - continue; - return nf; + if (nfsd_file_get(nf) != NULL) + return nf; } return NULL; } /** - * nfsd_file_is_cached - are there any cached open files for this inode? - * @inode: inode to check + * nfsd_file_is_cached - are there any cached open files for this fh? + * @inode: inode of the file to check * - * The lookup matches inodes in all net namespaces and is atomic wrt - * nfsd_file_acquire(). - * - * Return values: - * %true: filecache contains at least one file matching this inode - * %false: filecache contains no files matching this inode + * Scan the hashtable for open files that match this fh. Returns true if there + * are any, and false if not. */ bool nfsd_file_is_cached(struct inode *inode) { - struct rhlist_head *tmp, *list; - struct nfsd_file *nf; - bool ret = false; + bool ret = false; + struct nfsd_file *nf; + unsigned int hashval; + + hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); rcu_read_lock(); - list = rhltable_lookup(&nfsd_file_rhltable, &inode, - nfsd_file_rhash_params); - rhl_for_each_entry_rcu(nf, tmp, list, nf_rlist) - if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) { + hlist_for_each_entry_rcu(nf, &nfsd_file_hashtbl[hashval].nfb_head, + nf_node) { + if (inode == nf->nf_inode) { ret = true; break; } + } rcu_read_unlock(); - - trace_nfsd_file_is_cached(inode, (int)ret); + trace_nfsd_file_is_cached(inode, hashval, (int)ret); return ret; } -static __be32 -nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct file *file, - struct nfsd_file **pnf, bool want_gc) +__be32 +nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) { - unsigned char need = may_flags & NFSD_FILE_MAY_MASK; + __be32 status; struct net *net = SVC_NET(rqstp); - struct nfsd_file *new, *nf; - const struct cred *cred; - bool open_retry = true; + struct nfsd_file *nf, *new; struct inode *inode; - __be32 status; - int ret; + unsigned int hashval; + bool retry = true; + /* FIXME: skip this if fh_dentry is already set? */ status = fh_verify(rqstp, fhp, S_IFREG, may_flags|NFSD_MAY_OWNER_OVERRIDE); if (status != nfs_ok) return status; - inode = d_inode(fhp->fh_dentry); - cred = get_current_cred(); + inode = d_inode(fhp->fh_dentry); + hashval = (unsigned int)hash_long(inode->i_ino, NFSD_FILE_HASH_BITS); retry: rcu_read_lock(); - nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); + nf = nfsd_file_find_locked(inode, may_flags, hashval, net); rcu_read_unlock(); - - if (nf) { - /* - * If the nf is on the LRU then it holds an extra reference - * that must be put if it's removed. It had better not be - * the last one however, since we should hold another. - */ - if (nfsd_file_lru_remove(nf)) - WARN_ON_ONCE(refcount_dec_and_test(&nf->nf_ref)); + if (nf) goto wait_for_construction; - } - new = nfsd_file_alloc(net, inode, need, want_gc); + new = nfsd_file_alloc(inode, may_flags, hashval, net); if (!new) { - status = nfserr_jukebox; - goto out; + trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, + NULL, nfserr_jukebox); + return nfserr_jukebox; } - rcu_read_lock(); - spin_lock(&inode->i_lock); - nf = nfsd_file_lookup_locked(net, cred, inode, need, want_gc); - if (unlikely(nf)) { - spin_unlock(&inode->i_lock); - rcu_read_unlock(); - nfsd_file_slab_free(&new->nf_rcu); - goto wait_for_construction; - } - nf = new; - ret = rhltable_insert(&nfsd_file_rhltable, &nf->nf_rlist, - nfsd_file_rhash_params); - spin_unlock(&inode->i_lock); - rcu_read_unlock(); - if (likely(ret == 0)) + spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); + nf = nfsd_file_find_locked(inode, may_flags, hashval, net); + if (nf == NULL) goto open_file; - - if (ret == -EEXIST) - goto retry; - trace_nfsd_file_insert_err(rqstp, inode, may_flags, ret); - status = nfserr_jukebox; - goto construction_err; + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + nfsd_file_slab_free(&new->nf_rcu); wait_for_construction: wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE); /* Did construction of this file fail? */ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { - trace_nfsd_file_cons_err(rqstp, inode, may_flags, nf); - if (!open_retry) { + if (!retry) { status = nfserr_jukebox; - goto construction_err; + goto out; } - open_retry = false; + retry = false; + nfsd_file_put_noref(nf); goto retry; } + this_cpu_inc(nfsd_file_cache_hits); - status = nfserrno(nfsd_open_break_lease(file_inode(nf->nf_file), may_flags)); - if (status != nfs_ok) { + if (!(may_flags & NFSD_MAY_NOT_BREAK_LEASE)) { + bool write = (may_flags & NFSD_MAY_WRITE); + + if (test_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags) || + (test_bit(NFSD_FILE_BREAK_WRITE, &nf->nf_flags) && write)) { + status = nfserrno(nfsd_open_break_lease( + file_inode(nf->nf_file), may_flags)); + if (status == nfs_ok) { + clear_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags); + if (write) + clear_bit(NFSD_FILE_BREAK_WRITE, + &nf->nf_flags); + } + } + } +out: + if (status == nfs_ok) { + *pnf = nf; + } else { nfsd_file_put(nf); nf = NULL; } -out: - if (status == nfs_ok) { - this_cpu_inc(nfsd_file_acquisitions); - nfsd_file_check_write_error(nf); - *pnf = nf; - } - put_cred(cred); - trace_nfsd_file_acquire(rqstp, inode, may_flags, nf, status); + trace_nfsd_file_acquire(rqstp, hashval, inode, may_flags, nf, status); return status; - open_file: - trace_nfsd_file_alloc(nf); - nf->nf_mark = nfsd_file_mark_find_or_create(nf, inode); - if (nf->nf_mark) { - if (file) { - get_file(file); - nf->nf_file = file; - status = nfs_ok; - trace_nfsd_file_opened(nf, status); - } else { - status = nfsd_open_verified(rqstp, fhp, may_flags, - &nf->nf_file); - trace_nfsd_file_open(nf, status); - } - } else + nf = new; + /* Take reference for the hashtable */ + refcount_inc(&nf->nf_ref); + __set_bit(NFSD_FILE_HASHED, &nf->nf_flags); + __set_bit(NFSD_FILE_PENDING, &nf->nf_flags); + list_lru_add(&nfsd_file_lru, &nf->nf_lru); + hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head); + ++nfsd_file_hashtbl[hashval].nfb_count; + nfsd_file_hashtbl[hashval].nfb_maxcount = max(nfsd_file_hashtbl[hashval].nfb_maxcount, + nfsd_file_hashtbl[hashval].nfb_count); + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + if (atomic_long_inc_return(&nfsd_filecache_count) >= NFSD_FILE_LRU_THRESHOLD) + nfsd_file_gc(); + + nf->nf_mark = nfsd_file_mark_find_or_create(nf); + if (nf->nf_mark) + status = nfsd_open_verified(rqstp, fhp, S_IFREG, + may_flags, &nf->nf_file); + else status = nfserr_jukebox; /* * If construction failed, or we raced with a call to unlink() * then unhash. */ - if (status != nfs_ok || inode->i_nlink == 0) - nfsd_file_unhash(nf); - clear_and_wake_up_bit(NFSD_FILE_PENDING, &nf->nf_flags); - if (status == nfs_ok) - goto out; - -construction_err: - if (refcount_dec_and_test(&nf->nf_ref)) - nfsd_file_free(nf); - nf = NULL; + if (status != nfs_ok || inode->i_nlink == 0) { + bool do_free; + spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock); + do_free = nfsd_file_unhash(nf); + spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock); + if (do_free) + nfsd_file_put_noref(nf); + } + clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags); + smp_mb__after_atomic(); + wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING); goto out; } -/** - * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file - * @rqstp: the RPC transaction being executed - * @fhp: the NFS filehandle of the file to be opened - * @may_flags: NFSD_MAY_ settings for the file - * @pnf: OUT: new or found "struct nfsd_file" object - * - * The nfsd_file object returned by this API is reference-counted - * and garbage-collected. The object is retained for a few - * seconds after the final nfsd_file_put() in case the caller - * wants to re-use it. - * - * Return values: - * %nfs_ok - @pnf points to an nfsd_file with its reference - * count boosted. - * - * On error, an nfsstat value in network byte order is returned. - */ -__be32 -nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) -{ - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true); -} - -/** - * nfsd_file_acquire - Get a struct nfsd_file with an open file - * @rqstp: the RPC transaction being executed - * @fhp: the NFS filehandle of the file to be opened - * @may_flags: NFSD_MAY_ settings for the file - * @pnf: OUT: new or found "struct nfsd_file" object - * - * The nfsd_file_object returned by this API is reference-counted - * but not garbage-collected. The object is unhashed after the - * final nfsd_file_put(). - * - * Return values: - * %nfs_ok - @pnf points to an nfsd_file with its reference - * count boosted. - * - * On error, an nfsstat value in network byte order is returned. - */ -__be32 -nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **pnf) -{ - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false); -} - -/** - * nfsd_file_acquire_opened - Get a struct nfsd_file using existing open file - * @rqstp: the RPC transaction being executed - * @fhp: the NFS filehandle of the file just created - * @may_flags: NFSD_MAY_ settings for the file - * @file: cached, already-open file (may be NULL) - * @pnf: OUT: new or found "struct nfsd_file" object - * - * Acquire a nfsd_file object that is not GC'ed. If one doesn't already exist, - * and @file is non-NULL, use it to instantiate a new nfsd_file instead of - * opening a new one. - * - * Return values: - * %nfs_ok - @pnf points to an nfsd_file with its reference - * count boosted. - * - * On error, an nfsstat value in network byte order is returned. - */ -__be32 -nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct file *file, - struct nfsd_file **pnf) -{ - return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false); -} - /* * Note that fields may be added, removed or reordered in the future. Programs * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -int nfsd_file_cache_stats_show(struct seq_file *m, void *v) +static int nfsd_file_cache_stats_show(struct seq_file *m, void *v) { - unsigned long releases = 0, evictions = 0; - unsigned long hits = 0, acquisitions = 0; - unsigned int i, count = 0, buckets = 0; - unsigned long lru = 0, total_age = 0; + unsigned int i, count = 0, longest = 0; + unsigned long hits = 0; - /* Serialize with server shutdown */ + /* + * No need for spinlocks here since we're not terribly interested in + * accuracy. We do take the nfsd_mutex simply to ensure that we + * don't end up racing with server shutdown + */ mutex_lock(&nfsd_mutex); - if (test_bit(NFSD_FILE_CACHE_UP, &nfsd_file_flags) == 1) { - struct bucket_table *tbl; - struct rhashtable *ht; - - lru = list_lru_count(&nfsd_file_lru); - - rcu_read_lock(); - ht = &nfsd_file_rhltable.ht; - count = atomic_read(&ht->nelems); - tbl = rht_dereference_rcu(ht->tbl, ht); - buckets = tbl->size; - rcu_read_unlock(); + if (nfsd_file_hashtbl) { + for (i = 0; i < NFSD_FILE_HASH_SIZE; i++) { + count += nfsd_file_hashtbl[i].nfb_count; + longest = max(longest, nfsd_file_hashtbl[i].nfb_count); + } } mutex_unlock(&nfsd_mutex); - for_each_possible_cpu(i) { + for_each_possible_cpu(i) hits += per_cpu(nfsd_file_cache_hits, i); - acquisitions += per_cpu(nfsd_file_acquisitions, i); - releases += per_cpu(nfsd_file_releases, i); - total_age += per_cpu(nfsd_file_total_age, i); - evictions += per_cpu(nfsd_file_evictions, i); - } - seq_printf(m, "total inodes: %u\n", count); - seq_printf(m, "hash buckets: %u\n", buckets); - seq_printf(m, "lru entries: %lu\n", lru); + seq_printf(m, "total entries: %u\n", count); + seq_printf(m, "longest chain: %u\n", longest); seq_printf(m, "cache hits: %lu\n", hits); - seq_printf(m, "acquisitions: %lu\n", acquisitions); - seq_printf(m, "releases: %lu\n", releases); - seq_printf(m, "evictions: %lu\n", evictions); - if (releases) - seq_printf(m, "mean age (ms): %ld\n", total_age / releases); - else - seq_printf(m, "mean age (ms): -\n"); return 0; } + +int nfsd_file_cache_stats_open(struct inode *inode, struct file *file) +{ + return single_open(file, nfsd_file_cache_stats_show, NULL); +} diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index e54165a3224f..435ceab27897 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -29,23 +29,23 @@ struct nfsd_file_mark { * never be dereferenced, only used for comparison. */ struct nfsd_file { - struct rhlist_head nf_rlist; - void *nf_inode; + struct hlist_node nf_node; + struct list_head nf_lru; + struct rcu_head nf_rcu; struct file *nf_file; const struct cred *nf_cred; struct net *nf_net; #define NFSD_FILE_HASHED (0) #define NFSD_FILE_PENDING (1) -#define NFSD_FILE_REFERENCED (2) -#define NFSD_FILE_GC (3) +#define NFSD_FILE_BREAK_READ (2) +#define NFSD_FILE_BREAK_WRITE (3) +#define NFSD_FILE_REFERENCED (4) unsigned long nf_flags; + struct inode *nf_inode; + unsigned int nf_hashval; refcount_t nf_ref; unsigned char nf_may; - struct nfsd_file_mark *nf_mark; - struct list_head nf_lru; - struct rcu_head nf_rcu; - ktime_t nf_birthtime; }; int nfsd_file_cache_init(void); @@ -57,12 +57,7 @@ void nfsd_file_put(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); bool nfsd_file_is_cached(struct inode *inode); -__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct nfsd_file **nfp); __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **nfp); -__be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, - unsigned int may_flags, struct file *file, - struct nfsd_file **nfp); -int nfsd_file_cache_stats_show(struct seq_file *m, void *v); +int nfsd_file_cache_stats_open(struct inode *, struct file *); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/flexfilelayout.c b/fs/nfsd/flexfilelayout.c index fabc21ed68ce..db7ef07ae50c 100644 --- a/fs/nfsd/flexfilelayout.c +++ b/fs/nfsd/flexfilelayout.c @@ -15,7 +15,6 @@ #include "flexfilelayoutxdr.h" #include "pnfs.h" -#include "vfs.h" #define NFSDDBG_FACILITY NFSDDBG_PNFS @@ -62,7 +61,7 @@ nfsd4_ff_proc_layoutget(struct inode *inode, const struct svc_fh *fhp, goto out_error; fl->fh.size = fhp->fh_handle.fh_size; - memcpy(fl->fh.data, &fhp->fh_handle.fh_raw, fl->fh.size); + memcpy(fl->fh.data, &fhp->fh_handle.fh_base, fl->fh.size); /* Give whole file layout segments */ seg->offset = 0; diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c index 46a7f9b813e5..3f5b3d7b62b7 100644 --- a/fs/nfsd/lockd.c +++ b/fs/nfsd/lockd.c @@ -25,22 +25,18 @@ * Note: we hold the dentry use count while the file is open. */ static __be32 -nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp, - int mode) +nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp) { __be32 nfserr; - int access; struct svc_fh fh; /* must initialize before using! but maxsize doesn't matter */ fh_init(&fh,0); fh.fh_handle.fh_size = f->size; - memcpy(&fh.fh_handle.fh_raw, f->data, f->size); + memcpy((char*)&fh.fh_handle.fh_base, f->data, f->size); fh.fh_export = NULL; - access = (mode == O_WRONLY) ? NFSD_MAY_WRITE : NFSD_MAY_READ; - access |= NFSD_MAY_LOCK; - nfserr = nfsd_open(rqstp, &fh, S_IFREG, access, filp); + nfserr = nfsd_open(rqstp, &fh, S_IFREG, NFSD_MAY_LOCK, filp); fh_put(&fh); /* We return nlm error codes as nlm doesn't know * about nfsd, but nfsd does know about nlm.. diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 51a4b7885cae..02d3d2f0e616 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -10,8 +10,6 @@ #include #include -#include -#include /* Hash tables for nfs4_clientid state */ #define CLIENT_HASH_BITS 4 @@ -23,14 +21,6 @@ struct cld_net; struct nfsd4_client_tracking_ops; -enum { - /* cache misses due only to checksum comparison failures */ - NFSD_NET_PAYLOAD_MISSES, - /* amount of memory (in bytes) currently consumed by the DRC */ - NFSD_NET_DRC_MEM_USAGE, - NFSD_NET_COUNTERS_NUM -}; - /* * Represents a nfsd "container". With respect to nfsv4 state tracking, the * fields of interest are the *_id_hashtbls and the *_name_tree. These track @@ -109,8 +99,9 @@ struct nfsd_net { bool nfsd_net_up; bool lockd_up; - seqlock_t writeverf_lock; - unsigned char writeverf[8]; + /* Time of server startup */ + struct timespec64 nfssvc_boot; + seqlock_t boot_lock; /* * Max number of connections this nfsd container will allow. Defaults @@ -123,13 +114,12 @@ struct nfsd_net { u32 clverifier_counter; struct svc_serv *nfsd_serv; - /* When a listening socket is added to nfsd, keep_active is set - * and this justifies a reference on nfsd_serv. This stops - * nfsd_serv from being freed. When the number of threads is - * set, keep_active is cleared and the reference is dropped. So - * when the last thread exits, the service will be destroyed. - */ - int keep_active; + + wait_queue_head_t ntf_wq; + atomic_t ntf_refcnt; + + /* Allow umount to wait for nfsd state cleanup */ + struct completion nfsd_shutdown_complete; /* * clientid and stateid data for construction of net unique COPY @@ -159,16 +149,20 @@ struct nfsd_net { /* * Stats and other tracking of on the duplicate reply cache. - * The longest_chain* fields are modified with only the per-bucket - * cache lock, which isn't really safe and should be fixed if we want - * these statistics to be completely accurate. + * These fields and the "rc" fields in nfsdstats are modified + * with only the per-bucket cache lock, which isn't really safe + * and should be fixed if we want the statistics to be + * completely accurate. */ /* total number of entries */ atomic_t num_drc_entries; - /* Per-netns stats counters */ - struct percpu_counter counter[NFSD_NET_COUNTERS_NUM]; + /* cache misses due only to checksum comparison failures */ + unsigned int payload_misses; + + /* amount of memory (in bytes) currently consumed by the DRC */ + unsigned int drc_mem_usage; /* longest hash chain seen */ unsigned int longest_chain; @@ -177,25 +171,8 @@ struct nfsd_net { unsigned int longest_chain_cachesize; struct shrinker nfsd_reply_cache_shrinker; - - /* tracking server-to-server copy mounts */ - spinlock_t nfsd_ssc_lock; - struct list_head nfsd_ssc_mount_list; - wait_queue_head_t nfsd_ssc_waitq; - /* utsname taken from the process that starts the server */ char nfsd_name[UNX_MAXNODENAME+1]; - - struct nfsd_fcache_disposal *fcache_disposal; - - siphash_key_t siphash_key; - - atomic_t nfs4_client_count; - int nfs4_max_clients; - - atomic_t nfsd_courtesy_clients; - struct shrinker nfsd_client_shrinker; - struct work_struct nfsd_shrinker_work; }; /* Simple check to find out if a given net was properly initialized */ @@ -205,6 +182,6 @@ extern void nfsd_netns_free_versions(struct nfsd_net *nn); extern unsigned int nfsd_net_id; -void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); -void nfsd_reset_write_verifier(struct nfsd_net *nn); +void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn); +void nfsd_reset_boot_verifier(struct nfsd_net *nn); #endif /* __NFSD_NETNS_H__ */ diff --git a/fs/nfsd/nfs2acl.c b/fs/nfsd/nfs2acl.c index 9adf672dedbd..6a900f770dd2 100644 --- a/fs/nfsd/nfs2acl.c +++ b/fs/nfsd/nfs2acl.c @@ -111,7 +111,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno; - inode_lock(inode); + fh_lock(fh); error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -120,7 +120,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_drop_lock; - inode_unlock(inode); + fh_unlock(fh); fh_drop_write(fh); @@ -134,7 +134,7 @@ static __be32 nfsacld_proc_setacl(struct svc_rqst *rqstp) return rpc_success; out_drop_lock: - inode_unlock(inode); + fh_unlock(fh); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); @@ -185,106 +185,161 @@ static __be32 nfsacld_proc_access(struct svc_rqst *rqstp) /* * XDR decode functions */ +static int nfsaclsvc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} -static bool -nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfsaclsvc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_getaclargs *argp = rqstp->rq_argp; - if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return false; + p = nfs2svc_decode_fh(p, &argp->fh); + if (!p) + return 0; + argp->mask = ntohl(*p); p++; - return true; + return xdr_argsize_check(rqstp, p); } -static bool -nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) + +static int nfsaclsvc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_setaclargs *argp = rqstp->rq_argp; + struct kvec *head = rqstp->rq_arg.head; + unsigned int base; + int n; - if (!svcxdr_decode_fhandle(xdr, &argp->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return false; - if (argp->mask & ~NFS_ACL_MASK) - return false; - if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? - &argp->acl_access : NULL)) - return false; - if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? - &argp->acl_default : NULL)) - return false; + p = nfs2svc_decode_fh(p, &argp->fh); + if (!p) + return 0; + argp->mask = ntohl(*p++); + if (argp->mask & ~NFS_ACL_MASK || + !xdr_argsize_check(rqstp, p)) + return 0; - return true; + base = (char *)p - (char *)head->iov_base; + n = nfsacl_decode(&rqstp->rq_arg, base, NULL, + (argp->mask & NFS_ACL) ? + &argp->acl_access : NULL); + if (n > 0) + n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, + (argp->mask & NFS_DFACL) ? + &argp->acl_default : NULL); + return (n > 0); } -static bool -nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfsaclsvc_decode_fhandleargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_accessargs *args = rqstp->rq_argp; + struct nfsd_fhandle *argp = rqstp->rq_argp; - if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return false; + p = nfs2svc_decode_fh(p, &argp->fh); + if (!p) + return 0; + return xdr_argsize_check(rqstp, p); +} - return true; +static int nfsaclsvc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct nfsd3_accessargs *argp = rqstp->rq_argp; + + p = nfs2svc_decode_fh(p, &argp->fh); + if (!p) + return 0; + argp->access = ntohl(*p++); + + return xdr_argsize_check(rqstp, p); } /* * XDR encode functions */ +/* + * There must be an encoding function for void results so svc_process + * will work properly. + */ +static int nfsaclsvc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} + /* GETACL */ -static bool -nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfsaclsvc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; struct inode *inode; + struct kvec *head = rqstp->rq_res.head; + unsigned int base; + int n; + int w; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; + *p++ = resp->status; + if (resp->status != nfs_ok) + return xdr_ressize_check(rqstp, p); + /* + * Since this is version 2, the check for nfserr in + * nfsd_dispatch actually ensures the following cannot happen. + * However, it seems fragile to depend on that. + */ if (dentry == NULL || d_really_is_negative(dentry)) - return true; + return 0; inode = d_inode(dentry); - if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return false; + p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); + *p++ = htonl(resp->mask); + if (!xdr_ressize_check(rqstp, p)) + return 0; + base = (char *)p - (char *)head->iov_base; - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, - resp->mask & NFS_ACL, 0)) - return false; - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, - resp->mask & NFS_DFACL, NFS_ACL_DEFAULT)) - return false; + rqstp->rq_res.page_len = w = nfsacl_size( + (resp->mask & NFS_ACL) ? resp->acl_access : NULL, + (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); + while (w > 0) { + if (!*(rqstp->rq_next_page++)) + return 0; + w -= PAGE_SIZE; + } - return true; + n = nfsacl_encode(&rqstp->rq_res, base, inode, + resp->acl_access, + resp->mask & NFS_ACL, 0); + if (n > 0) + n = nfsacl_encode(&rqstp->rq_res, base + n, inode, + resp->acl_default, + resp->mask & NFS_DFACL, + NFS_ACL_DEFAULT); + return (n > 0); +} + +static int nfsaclsvc_encode_attrstatres(struct svc_rqst *rqstp, __be32 *p) +{ + struct nfsd_attrstat *resp = rqstp->rq_resp; + + *p++ = resp->status; + if (resp->status != nfs_ok) + goto out; + + p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); +out: + return xdr_ressize_check(rqstp, p); } /* ACCESS */ -static bool -nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfsaclsvc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_accessres *resp = rqstp->rq_resp; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return false; - break; - } + *p++ = resp->status; + if (resp->status != nfs_ok) + goto out; - return true; + p = nfs2svc_encode_fattr(rqstp, p, &resp->fh, &resp->stat); + *p++ = htonl(resp->access); +out: + return xdr_ressize_check(rqstp, p); } /* @@ -299,6 +354,13 @@ static void nfsaclsvc_release_getacl(struct svc_rqst *rqstp) posix_acl_release(resp->acl_default); } +static void nfsaclsvc_release_attrstat(struct svc_rqst *rqstp) +{ + struct nfsd_attrstat *resp = rqstp->rq_resp; + + fh_put(&resp->fh); +} + static void nfsaclsvc_release_access(struct svc_rqst *rqstp) { struct nfsd3_accessres *resp = rqstp->rq_resp; @@ -316,14 +378,12 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures2[5] = { [ACLPROC2_NULL] = { .pc_func = nfsacld_proc_null, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfsaclsvc_decode_voidarg, + .pc_encode = nfsaclsvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd3_voidargs), + .pc_ressize = sizeof(struct nfsd3_voidargs), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, - .pc_name = "NULL", }, [ACLPROC2_GETACL] = { .pc_func = nfsacld_proc_getacl, @@ -331,35 +391,29 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_getaclres, .pc_release = nfsaclsvc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), - .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), - .pc_name = "GETACL", }, [ACLPROC2_SETACL] = { .pc_func = nfsacld_proc_setacl, .pc_decode = nfsaclsvc_decode_setaclargs, - .pc_encode = nfssvc_encode_attrstatres, - .pc_release = nfssvc_release_attrstat, + .pc_encode = nfsaclsvc_encode_attrstatres, + .pc_release = nfsaclsvc_release_attrstat, .pc_argsize = sizeof(struct nfsd3_setaclargs), - .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, - .pc_name = "SETACL", }, [ACLPROC2_GETATTR] = { .pc_func = nfsacld_proc_getattr, - .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfssvc_encode_attrstatres, - .pc_release = nfssvc_release_attrstat, + .pc_decode = nfsaclsvc_decode_fhandleargs, + .pc_encode = nfsaclsvc_encode_attrstatres, + .pc_release = nfsaclsvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, - .pc_name = "GETATTR", }, [ACLPROC2_ACCESS] = { .pc_func = nfsacld_proc_access, @@ -367,11 +421,9 @@ static const struct svc_procedure nfsd_acl_procedures2[5] = { .pc_encode = nfsaclsvc_encode_accessres, .pc_release = nfsaclsvc_release_access, .pc_argsize = sizeof(struct nfsd3_accessargs), - .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1, - .pc_name = "SETATTR", }, }; diff --git a/fs/nfsd/nfs3acl.c b/fs/nfsd/nfs3acl.c index 161f831b3a1b..34a394e50e1d 100644 --- a/fs/nfsd/nfs3acl.c +++ b/fs/nfsd/nfs3acl.c @@ -101,7 +101,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) if (error) goto out_errno; - inode_lock(inode); + fh_lock(fh); error = set_posix_acl(inode, ACL_TYPE_ACCESS, argp->acl_access); if (error) @@ -109,7 +109,7 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) error = set_posix_acl(inode, ACL_TYPE_DEFAULT, argp->acl_default); out_drop_lock: - inode_unlock(inode); + fh_unlock(fh); fh_drop_write(fh); out_errno: resp->status = nfserrno(error); @@ -124,39 +124,43 @@ static __be32 nfsd3_proc_setacl(struct svc_rqst *rqstp) /* * XDR decode functions */ - -static bool -nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfs3svc_decode_getaclargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_getaclargs *args = rqstp->rq_argp; - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &args->mask) < 0) - return false; + p = nfs3svc_decode_fh(p, &args->fh); + if (!p) + return 0; + args->mask = ntohl(*p); p++; - return true; + return xdr_argsize_check(rqstp, p); } -static bool -nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) + +static int nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, __be32 *p) { - struct nfsd3_setaclargs *argp = rqstp->rq_argp; + struct nfsd3_setaclargs *args = rqstp->rq_argp; + struct kvec *head = rqstp->rq_arg.head; + unsigned int base; + int n; - if (!svcxdr_decode_nfs_fh3(xdr, &argp->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &argp->mask) < 0) - return false; - if (argp->mask & ~NFS_ACL_MASK) - return false; - if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_ACL) ? - &argp->acl_access : NULL)) - return false; - if (!nfs_stream_decode_acl(xdr, NULL, (argp->mask & NFS_DFACL) ? - &argp->acl_default : NULL)) - return false; + p = nfs3svc_decode_fh(p, &args->fh); + if (!p) + return 0; + args->mask = ntohl(*p++); + if (args->mask & ~NFS_ACL_MASK || + !xdr_argsize_check(rqstp, p)) + return 0; - return true; + base = (char *)p - (char *)head->iov_base; + n = nfsacl_decode(&rqstp->rq_arg, base, NULL, + (args->mask & NFS_ACL) ? + &args->acl_access : NULL); + if (n > 0) + n = nfsacl_decode(&rqstp->rq_arg, base + n, NULL, + (args->mask & NFS_DFACL) ? + &args->acl_default : NULL); + return (n > 0); } /* @@ -164,47 +168,59 @@ nfs3svc_decode_setaclargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) */ /* GETACL */ -static bool -nfs3svc_encode_getaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfs3svc_encode_getaclres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_getaclres *resp = rqstp->rq_resp; struct dentry *dentry = resp->fh.fh_dentry; - struct inode *inode; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - inode = d_inode(dentry); - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (xdr_stream_encode_u32(xdr, resp->mask) < 0) - return false; + *p++ = resp->status; + p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); + if (resp->status == 0 && dentry && d_really_is_positive(dentry)) { + struct inode *inode = d_inode(dentry); + struct kvec *head = rqstp->rq_res.head; + unsigned int base; + int n; + int w; - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_access, - resp->mask & NFS_ACL, 0)) - return false; - if (!nfs_stream_encode_acl(xdr, inode, resp->acl_default, - resp->mask & NFS_DFACL, - NFS_ACL_DEFAULT)) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - } + *p++ = htonl(resp->mask); + if (!xdr_ressize_check(rqstp, p)) + return 0; + base = (char *)p - (char *)head->iov_base; - return true; + rqstp->rq_res.page_len = w = nfsacl_size( + (resp->mask & NFS_ACL) ? resp->acl_access : NULL, + (resp->mask & NFS_DFACL) ? resp->acl_default : NULL); + while (w > 0) { + if (!*(rqstp->rq_next_page++)) + return 0; + w -= PAGE_SIZE; + } + + n = nfsacl_encode(&rqstp->rq_res, base, inode, + resp->acl_access, + resp->mask & NFS_ACL, 0); + if (n > 0) + n = nfsacl_encode(&rqstp->rq_res, base + n, inode, + resp->acl_default, + resp->mask & NFS_DFACL, + NFS_ACL_DEFAULT); + if (n <= 0) + return 0; + } else + if (!xdr_ressize_check(rqstp, p)) + return 0; + + return 1; } /* SETACL */ -static bool -nfs3svc_encode_setaclres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +static int nfs3svc_encode_setaclres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_attrstat *resp = rqstp->rq_resp; - return svcxdr_encode_nfsstat3(xdr, resp->status) && - svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh); + *p++ = resp->status; + p = nfs3svc_encode_post_op_attr(rqstp, p, &resp->fh); + return xdr_ressize_check(rqstp, p); } /* @@ -229,14 +245,12 @@ struct nfsd3_voidargs { int dummy; }; static const struct svc_procedure nfsd_acl_procedures3[3] = { [ACLPROC3_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfs3svc_decode_voidarg, + .pc_encode = nfs3svc_encode_voidres, + .pc_argsize = sizeof(struct nfsd3_voidargs), + .pc_ressize = sizeof(struct nfsd3_voidargs), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, - .pc_name = "NULL", }, [ACLPROC3_GETACL] = { .pc_func = nfsd3_proc_getacl, @@ -244,11 +258,9 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_getaclres, .pc_release = nfs3svc_release_getacl, .pc_argsize = sizeof(struct nfsd3_getaclargs), - .pc_argzero = sizeof(struct nfsd3_getaclargs), .pc_ressize = sizeof(struct nfsd3_getaclres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+2*(1+ACL), - .pc_name = "GETACL", }, [ACLPROC3_SETACL] = { .pc_func = nfsd3_proc_setacl, @@ -256,11 +268,9 @@ static const struct svc_procedure nfsd_acl_procedures3[3] = { .pc_encode = nfs3svc_encode_setaclres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_setaclargs), - .pc_argzero = sizeof(struct nfsd3_setaclargs), .pc_ressize = sizeof(struct nfsd3_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT, - .pc_name = "SETACL", }, }; diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index 19cf583096d9..981a4e4c9a3c 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -8,12 +8,10 @@ #include #include #include -#include #include "cache.h" #include "xdr3.h" #include "vfs.h" -#include "filecache.h" #define NFSDDBG_FACILITY NFSDDBG_PROC @@ -68,15 +66,12 @@ nfsd3_proc_setattr(struct svc_rqst *rqstp) { struct nfsd3_sattrargs *argp = rqstp->rq_argp; struct nfsd3_attrstat *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; dprintk("nfsd: SETATTR(3) %s\n", SVCFH_fmt(&argp->fh)); fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_setattr(rqstp, &resp->fh, &attrs, + resp->status = nfsd_setattr(rqstp, &resp->fh, &argp->attrs, argp->check_guard, argp->guardtime); return rpc_success; } @@ -129,7 +124,7 @@ nfsd3_proc_access(struct svc_rqst *rqstp) static __be32 nfsd3_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd_fhandle *argp = rqstp->rq_argp; + struct nfsd3_readlinkargs *argp = rqstp->rq_argp; struct nfsd3_readlinkres *resp = rqstp->rq_resp; dprintk("nfsd: READLINK(3) %s\n", SVCFH_fmt(&argp->fh)); @@ -137,9 +132,7 @@ nfsd3_proc_readlink(struct svc_rqst *rqstp) /* Read the symlink. */ fh_copy(&resp->fh, &argp->fh); resp->len = NFS3_MAXPATHLEN; - resp->pages = rqstp->rq_next_page++; - resp->status = nfsd_readlink(rqstp, &resp->fh, - page_address(*resp->pages), &resp->len); + resp->status = nfsd_readlink(rqstp, &resp->fh, argp->buffer, &resp->len); return rpc_success; } @@ -151,43 +144,25 @@ nfsd3_proc_read(struct svc_rqst *rqstp) { struct nfsd3_readargs *argp = rqstp->rq_argp; struct nfsd3_readres *resp = rqstp->rq_resp; - unsigned int len; - int v; + u32 max_blocksize = svc_max_payload(rqstp); + unsigned long cnt = min(argp->count, max_blocksize); dprintk("nfsd: READ(3) %s %lu bytes at %Lu\n", SVCFH_fmt(&argp->fh), (unsigned long) argp->count, (unsigned long long) argp->offset); - argp->count = min_t(u32, argp->count, svc_max_payload(rqstp)); - argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen); - if (argp->offset > (u64)OFFSET_MAX) - argp->offset = (u64)OFFSET_MAX; - if (argp->offset + argp->count > (u64)OFFSET_MAX) - argp->count = (u64)OFFSET_MAX - argp->offset; - - v = 0; - len = argp->count; - resp->pages = rqstp->rq_next_page; - while (len > 0) { - struct page *page = *(rqstp->rq_next_page++); - - rqstp->rq_vec[v].iov_base = page_address(page); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - /* Obtain buffer pointer for payload. * 1 (status) + 22 (post_op_attr) + 1 (count) + 1 (eof) * + 1 (xdr opaque byte count) = 26 */ - resp->count = argp->count; + resp->count = cnt; svc_reserve_auth(rqstp, ((1 + NFS3_POST_OP_ATTR_WORDS + 3)<<2) + resp->count +4); fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, - rqstp->rq_vec, v, &resp->count, &resp->eof); + rqstp->rq_vec, argp->vlen, &resp->count, + &resp->eof); return rpc_success; } @@ -215,147 +190,32 @@ nfsd3_proc_write(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->committed = argp->stable; - nvecs = svc_fill_write_vector(rqstp, &argp->payload); - + nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, + &argp->first, cnt); + if (!nvecs) { + resp->status = nfserr_io; + goto out; + } resp->status = nfsd_write(rqstp, &resp->fh, argp->offset, rqstp->rq_vec, nvecs, &cnt, resp->committed, resp->verf); resp->count = cnt; +out: return rpc_success; } /* - * Implement NFSv3's unchecked, guarded, and exclusive CREATE - * semantics for regular files. Except for the created file, - * this operation is stateless on the server. - * - * Upon return, caller must release @fhp and @resfhp. + * With NFSv3, CREATE processing is a lot easier than with NFSv2. + * At least in theory; we'll see how it fares in practice when the + * first reports about SunOS compatibility problems start to pour in... */ -static __be32 -nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct nfsd3_createargs *argp) -{ - struct iattr *iap = &argp->attrs; - struct dentry *parent, *child; - struct nfsd_attrs attrs = { - .na_iattr = iap, - }; - __u32 v_mtime, v_atime; - struct inode *inode; - __be32 status; - int host_err; - - if (isdotent(argp->name, argp->len)) - return nfserr_exist; - if (!(iap->ia_valid & ATTR_MODE)) - iap->ia_mode = 0; - - status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); - if (status != nfs_ok) - return status; - - parent = fhp->fh_dentry; - inode = d_inode(parent); - - host_err = fh_want_write(fhp); - if (host_err) - return nfserrno(host_err); - - inode_lock_nested(inode, I_MUTEX_PARENT); - - child = lookup_one_len(argp->name, parent, argp->len); - if (IS_ERR(child)) { - status = nfserrno(PTR_ERR(child)); - goto out; - } - - if (d_really_is_negative(child)) { - status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); - if (status != nfs_ok) - goto out; - } - - status = fh_compose(resfhp, fhp->fh_export, child, fhp); - if (status != nfs_ok) - goto out; - - v_mtime = 0; - v_atime = 0; - if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { - u32 *verifier = (u32 *)argp->verf; - - /* - * Solaris 7 gets confused (bugid 4218508) if these have - * the high bit set, as do xfs filesystems without the - * "bigtime" feature. So just clear the high bits. - */ - v_mtime = verifier[0] & 0x7fffffff; - v_atime = verifier[1] & 0x7fffffff; - } - - if (d_really_is_positive(child)) { - status = nfs_ok; - - switch (argp->createmode) { - case NFS3_CREATE_UNCHECKED: - if (!d_is_reg(child)) - break; - iap->ia_valid &= ATTR_SIZE; - goto set_attr; - case NFS3_CREATE_GUARDED: - status = nfserr_exist; - break; - case NFS3_CREATE_EXCLUSIVE: - if (d_inode(child)->i_mtime.tv_sec == v_mtime && - d_inode(child)->i_atime.tv_sec == v_atime && - d_inode(child)->i_size == 0) { - break; - } - status = nfserr_exist; - } - goto out; - } - - if (!IS_POSIXACL(inode)) - iap->ia_mode &= ~current_umask(); - - fh_fill_pre_attrs(fhp); - host_err = vfs_create(inode, child, iap->ia_mode, true); - if (host_err < 0) { - status = nfserrno(host_err); - goto out; - } - fh_fill_post_attrs(fhp); - - /* A newly created file already has a file size of zero. */ - if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) - iap->ia_valid &= ~ATTR_SIZE; - if (argp->createmode == NFS3_CREATE_EXCLUSIVE) { - iap->ia_valid = ATTR_MTIME | ATTR_ATIME | - ATTR_MTIME_SET | ATTR_ATIME_SET; - iap->ia_mtime.tv_sec = v_mtime; - iap->ia_atime.tv_sec = v_atime; - iap->ia_mtime.tv_nsec = 0; - iap->ia_atime.tv_nsec = 0; - } - -set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); - -out: - inode_unlock(inode); - if (child && !IS_ERR(child)) - dput(child); - fh_drop_write(fhp); - return status; -} - static __be32 nfsd3_proc_create(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - svc_fh *dirfhp, *newfhp; + svc_fh *dirfhp, *newfhp = NULL; + struct iattr *attr; dprintk("nfsd: CREATE(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -364,8 +224,21 @@ nfsd3_proc_create(struct svc_rqst *rqstp) dirfhp = fh_copy(&resp->dirfh, &argp->fh); newfhp = fh_init(&resp->fh, NFS3_FHSIZE); + attr = &argp->attrs; - resp->status = nfsd3_create_file(rqstp, dirfhp, newfhp, argp); + /* Unfudge the mode bits */ + attr->ia_mode &= ~S_IFMT; + if (!(attr->ia_valid & ATTR_MODE)) { + attr->ia_valid |= ATTR_MODE; + attr->ia_mode = S_IFREG; + } else { + attr->ia_mode = (attr->ia_mode & ~S_IFMT) | S_IFREG; + } + + /* Now create the file and set attributes */ + resp->status = do_nfsd_create(rqstp, dirfhp, argp->name, argp->len, + attr, newfhp, argp->createmode, + (u32 *)argp->verf, NULL, NULL); return rpc_success; } @@ -377,9 +250,6 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd3_createargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; dprintk("nfsd: MKDIR(3) %s %.*s\n", SVCFH_fmt(&argp->fh), @@ -390,7 +260,8 @@ nfsd3_proc_mkdir(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->fh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &attrs, S_IFDIR, 0, &resp->fh); + &argp->attrs, S_IFDIR, 0, &resp->fh); + fh_unlock(&resp->dirfh); return rpc_success; } @@ -399,9 +270,6 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) { struct nfsd3_symlinkargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; if (argp->tlen == 0) { resp->status = nfserr_inval; @@ -428,7 +296,7 @@ nfsd3_proc_symlink(struct svc_rqst *rqstp) fh_copy(&resp->dirfh, &argp->ffh); fh_init(&resp->fh, NFS3_FHSIZE); resp->status = nfsd_symlink(rqstp, &resp->dirfh, argp->fname, - argp->flen, argp->tname, &attrs, &resp->fh); + argp->flen, argp->tname, &resp->fh); kfree(argp->tname); out: return rpc_success; @@ -442,9 +310,6 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) { struct nfsd3_mknodargs *argp = rqstp->rq_argp; struct nfsd3_diropres *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; int type; dev_t rdev = 0; @@ -470,7 +335,8 @@ nfsd3_proc_mknod(struct svc_rqst *rqstp) type = nfs3_ftypes[argp->ftype]; resp->status = nfsd_create(rqstp, &resp->dirfh, argp->name, argp->len, - &attrs, type, rdev, &resp->fh); + &argp->attrs, type, rdev, &resp->fh); + fh_unlock(&resp->dirfh); out: return rpc_success; } @@ -493,6 +359,7 @@ nfsd3_proc_remove(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, -S_IFDIR, argp->name, argp->len); + fh_unlock(&resp->fh); return rpc_success; } @@ -513,6 +380,7 @@ nfsd3_proc_rmdir(struct svc_rqst *rqstp) fh_copy(&resp->fh, &argp->fh); resp->status = nfsd_unlink(rqstp, &resp->fh, S_IFDIR, argp->name, argp->len); + fh_unlock(&resp->fh); return rpc_success; } @@ -558,26 +426,6 @@ nfsd3_proc_link(struct svc_rqst *rqstp) return rpc_success; } -static void nfsd3_init_dirlist_pages(struct svc_rqst *rqstp, - struct nfsd3_readdirres *resp, - u32 count) -{ - struct xdr_buf *buf = &resp->dirlist; - struct xdr_stream *xdr = &resp->xdr; - unsigned int sendbuf = min_t(unsigned int, rqstp->rq_res.buflen, - svc_max_payload(rqstp)); - - memset(buf, 0, sizeof(*buf)); - - /* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), sendbuf); - buf->buflen -= XDR_UNIT * 2; - buf->pages = rqstp->rq_next_page; - rqstp->rq_next_page += (buf->buflen + PAGE_SIZE - 1) >> PAGE_SHIFT; - - xdr_init_encode_pages(xdr, buf, buf->pages, NULL); -} - /* * Read a portion of a directory. */ @@ -586,26 +434,53 @@ nfsd3_proc_readdir(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; - loff_t offset; + int count = 0; + struct page **p; + caddr_t page_addr = NULL; dprintk("nfsd: READDIR(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie); - nfsd3_init_dirlist_pages(rqstp, resp, argp->count); + /* Make sure we've room for the NULL ptr & eof flag, and shrink to + * client read size */ + count = (argp->count >> 2) - 2; + /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); - resp->common.err = nfs_ok; - resp->cookie_offset = 0; - resp->rqstp = rqstp; - offset = argp->cookie; - resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entry3); - memcpy(resp->verf, argp->verf, 8); - nfs3svc_encode_cookie3(resp, offset); - /* Recycle only pages that were part of the reply */ - rqstp->rq_next_page = resp->xdr.page_ptr + 1; + resp->buflen = count; + resp->common.err = nfs_ok; + resp->buffer = argp->buffer; + resp->rqstp = rqstp; + resp->status = nfsd_readdir(rqstp, &resp->fh, (loff_t *)&argp->cookie, + &resp->common, nfs3svc_encode_entry); + memcpy(resp->verf, argp->verf, 8); + count = 0; + for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { + page_addr = page_address(*p); + + if (((caddr_t)resp->buffer >= page_addr) && + ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { + count += (caddr_t)resp->buffer - page_addr; + break; + } + count += PAGE_SIZE; + } + resp->count = count >> 2; + if (resp->offset) { + loff_t offset = argp->cookie; + + if (unlikely(resp->offset1)) { + /* we ended up with offset on a page boundary */ + *resp->offset = htonl(offset >> 32); + *resp->offset1 = htonl(offset & 0xffffffff); + resp->offset1 = NULL; + } else { + xdr_encode_hyper(resp->offset, offset); + } + resp->offset = NULL; + } return rpc_success; } @@ -619,17 +494,25 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) { struct nfsd3_readdirargs *argp = rqstp->rq_argp; struct nfsd3_readdirres *resp = rqstp->rq_resp; + int count = 0; loff_t offset; + struct page **p; + caddr_t page_addr = NULL; dprintk("nfsd: READDIR+(3) %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, (u32) argp->cookie); - nfsd3_init_dirlist_pages(rqstp, resp, argp->count); + /* Convert byte count to number of words (i.e. >> 2), + * and reserve room for the NULL ptr & eof flag (-2 words) */ + resp->count = (argp->count >> 2) - 2; + /* Read directory and encode entries on the fly */ fh_copy(&resp->fh, &argp->fh); + resp->common.err = nfs_ok; - resp->cookie_offset = 0; + resp->buffer = argp->buffer; + resp->buflen = resp->count; resp->rqstp = rqstp; offset = argp->cookie; @@ -643,12 +526,30 @@ nfsd3_proc_readdirplus(struct svc_rqst *rqstp) } resp->status = nfsd_readdir(rqstp, &resp->fh, &offset, - &resp->common, nfs3svc_encode_entryplus3); + &resp->common, nfs3svc_encode_entry_plus); memcpy(resp->verf, argp->verf, 8); - nfs3svc_encode_cookie3(resp, offset); + for (p = rqstp->rq_respages + 1; p < rqstp->rq_next_page; p++) { + page_addr = page_address(*p); - /* Recycle only pages that were part of the reply */ - rqstp->rq_next_page = resp->xdr.page_ptr + 1; + if (((caddr_t)resp->buffer >= page_addr) && + ((caddr_t)resp->buffer < page_addr + PAGE_SIZE)) { + count += (caddr_t)resp->buffer - page_addr; + break; + } + count += PAGE_SIZE; + } + resp->count = count >> 2; + if (resp->offset) { + if (unlikely(resp->offset1)) { + /* we ended up with offset on a page boundary */ + *resp->offset = htonl(offset >> 32); + *resp->offset1 = htonl(offset & 0xffffffff); + resp->offset1 = NULL; + } else { + xdr_encode_hyper(resp->offset, offset); + } + resp->offset = NULL; + } out: return rpc_success; @@ -764,21 +665,20 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) { struct nfsd3_commitargs *argp = rqstp->rq_argp; struct nfsd3_commitres *resp = rqstp->rq_resp; - struct nfsd_file *nf; dprintk("nfsd: COMMIT(3) %s %u@%Lu\n", SVCFH_fmt(&argp->fh), argp->count, (unsigned long long) argp->offset); - fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE | - NFSD_MAY_NOT_BREAK_LEASE, &nf); - if (resp->status) + if (argp->offset > NFS_OFFSET_MAX) { + resp->status = nfserr_inval; goto out; - resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset, + } + + fh_copy(&resp->fh, &argp->fh); + resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset, argp->count, resp->verf); - nfsd_file_put(nf); out: return rpc_success; } @@ -788,14 +688,18 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) * NFSv3 Server procedures. * Only the results of non-idempotent operations are cached. */ +#define nfs3svc_decode_fhandleargs nfs3svc_decode_fhandle #define nfs3svc_encode_attrstatres nfs3svc_encode_attrstat #define nfs3svc_encode_wccstatres nfs3svc_encode_wccstat #define nfsd3_mkdirargs nfsd3_createargs #define nfsd3_readdirplusargs nfsd3_readdirargs #define nfsd3_fhandleargs nfsd_fhandle +#define nfsd3_fhandleres nfsd3_attrstat #define nfsd3_attrstatres nfsd3_attrstat #define nfsd3_wccstatres nfsd3_attrstat #define nfsd3_createres nfsd3_diropres +#define nfsd3_voidres nfsd3_voidargs +struct nfsd3_voidargs { int dummy; }; #define ST 1 /* status*/ #define FH 17 /* filehandle with length */ @@ -806,26 +710,22 @@ nfsd3_proc_commit(struct svc_rqst *rqstp) static const struct svc_procedure nfsd_procedures3[22] = { [NFS3PROC_NULL] = { .pc_func = nfsd3_proc_null, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfs3svc_decode_voidarg, + .pc_encode = nfs3svc_encode_voidres, + .pc_argsize = sizeof(struct nfsd3_voidargs), + .pc_ressize = sizeof(struct nfsd3_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST, - .pc_name = "NULL", }, [NFS3PROC_GETATTR] = { .pc_func = nfsd3_proc_getattr, .pc_decode = nfs3svc_decode_fhandleargs, - .pc_encode = nfs3svc_encode_getattrres, + .pc_encode = nfs3svc_encode_attrstatres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), + .pc_argsize = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_attrstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, - .pc_name = "GETATTR", }, [NFS3PROC_SETATTR] = { .pc_func = nfsd3_proc_setattr, @@ -833,23 +733,19 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_sattrargs), - .pc_argzero = sizeof(struct nfsd3_sattrargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, - .pc_name = "SETATTR", }, [NFS3PROC_LOOKUP] = { .pc_func = nfsd3_proc_lookup, .pc_decode = nfs3svc_decode_diropargs, - .pc_encode = nfs3svc_encode_lookupres, + .pc_encode = nfs3svc_encode_diropres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_diropargs), - .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+pAT+pAT, - .pc_name = "LOOKUP", }, [NFS3PROC_ACCESS] = { .pc_func = nfsd3_proc_access, @@ -857,23 +753,19 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_accessres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_accessargs), - .pc_argzero = sizeof(struct nfsd3_accessargs), .pc_ressize = sizeof(struct nfsd3_accessres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1, - .pc_name = "ACCESS", }, [NFS3PROC_READLINK] = { .pc_func = nfsd3_proc_readlink, - .pc_decode = nfs3svc_decode_fhandleargs, + .pc_decode = nfs3svc_decode_readlinkargs, .pc_encode = nfs3svc_encode_readlinkres, .pc_release = nfs3svc_release_fhandle, - .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), + .pc_argsize = sizeof(struct nfsd3_readlinkargs), .pc_ressize = sizeof(struct nfsd3_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+1+NFS3_MAXPATHLEN/4, - .pc_name = "READLINK", }, [NFS3PROC_READ] = { .pc_func = nfsd3_proc_read, @@ -881,11 +773,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readargs), - .pc_argzero = sizeof(struct nfsd3_readargs), .pc_ressize = sizeof(struct nfsd3_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+4+NFSSVC_MAXBLKSIZE/4, - .pc_name = "READ", }, [NFS3PROC_WRITE] = { .pc_func = nfsd3_proc_write, @@ -893,11 +783,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_writeres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_writeargs), - .pc_argzero = sizeof(struct nfsd3_writeargs), .pc_ressize = sizeof(struct nfsd3_writeres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+4, - .pc_name = "WRITE", }, [NFS3PROC_CREATE] = { .pc_func = nfsd3_proc_create, @@ -905,11 +793,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_createargs), - .pc_argzero = sizeof(struct nfsd3_createargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, - .pc_name = "CREATE", }, [NFS3PROC_MKDIR] = { .pc_func = nfsd3_proc_mkdir, @@ -917,11 +803,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mkdirargs), - .pc_argzero = sizeof(struct nfsd3_mkdirargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, - .pc_name = "MKDIR", }, [NFS3PROC_SYMLINK] = { .pc_func = nfsd3_proc_symlink, @@ -929,11 +813,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_symlinkargs), - .pc_argzero = sizeof(struct nfsd3_symlinkargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, - .pc_name = "SYMLINK", }, [NFS3PROC_MKNOD] = { .pc_func = nfsd3_proc_mknod, @@ -941,11 +823,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_createres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_mknodargs), - .pc_argzero = sizeof(struct nfsd3_mknodargs), .pc_ressize = sizeof(struct nfsd3_createres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+(1+FH+pAT)+WC, - .pc_name = "MKNOD", }, [NFS3PROC_REMOVE] = { .pc_func = nfsd3_proc_remove, @@ -953,11 +833,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), - .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, - .pc_name = "REMOVE", }, [NFS3PROC_RMDIR] = { .pc_func = nfsd3_proc_rmdir, @@ -965,11 +843,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_wccstatres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_diropargs), - .pc_argzero = sizeof(struct nfsd3_diropargs), .pc_ressize = sizeof(struct nfsd3_wccstatres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC, - .pc_name = "RMDIR", }, [NFS3PROC_RENAME] = { .pc_func = nfsd3_proc_rename, @@ -977,11 +853,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_renameres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_renameargs), - .pc_argzero = sizeof(struct nfsd3_renameargs), .pc_ressize = sizeof(struct nfsd3_renameres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+WC+WC, - .pc_name = "RENAME", }, [NFS3PROC_LINK] = { .pc_func = nfsd3_proc_link, @@ -989,11 +863,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_linkres, .pc_release = nfs3svc_release_fhandle2, .pc_argsize = sizeof(struct nfsd3_linkargs), - .pc_argzero = sizeof(struct nfsd3_linkargs), .pc_ressize = sizeof(struct nfsd3_linkres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+pAT+WC, - .pc_name = "LINK", }, [NFS3PROC_READDIR] = { .pc_func = nfsd3_proc_readdir, @@ -1001,10 +873,8 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirargs), - .pc_argzero = sizeof(struct nfsd3_readdirargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, - .pc_name = "READDIR", }, [NFS3PROC_READDIRPLUS] = { .pc_func = nfsd3_proc_readdirplus, @@ -1012,43 +882,35 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_readdirres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_readdirplusargs), - .pc_argzero = sizeof(struct nfsd3_readdirplusargs), .pc_ressize = sizeof(struct nfsd3_readdirres), .pc_cachetype = RC_NOCACHE, - .pc_name = "READDIRPLUS", }, [NFS3PROC_FSSTAT] = { .pc_func = nfsd3_proc_fsstat, .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsstatres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), - .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsstatres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+2*6+1, - .pc_name = "FSSTAT", }, [NFS3PROC_FSINFO] = { .pc_func = nfsd3_proc_fsinfo, .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_fsinfores, .pc_argsize = sizeof(struct nfsd3_fhandleargs), - .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_fsinfores), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+12, - .pc_name = "FSINFO", }, [NFS3PROC_PATHCONF] = { .pc_func = nfsd3_proc_pathconf, .pc_decode = nfs3svc_decode_fhandleargs, .pc_encode = nfs3svc_encode_pathconfres, .pc_argsize = sizeof(struct nfsd3_fhandleargs), - .pc_argzero = sizeof(struct nfsd3_fhandleargs), .pc_ressize = sizeof(struct nfsd3_pathconfres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+pAT+6, - .pc_name = "PATHCONF", }, [NFS3PROC_COMMIT] = { .pc_func = nfsd3_proc_commit, @@ -1056,11 +918,9 @@ static const struct svc_procedure nfsd_procedures3[22] = { .pc_encode = nfs3svc_encode_commitres, .pc_release = nfs3svc_release_fhandle, .pc_argsize = sizeof(struct nfsd3_commitargs), - .pc_argzero = sizeof(struct nfsd3_commitargs), .pc_ressize = sizeof(struct nfsd3_commitres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+WC+2, - .pc_name = "COMMIT", }, }; diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c index 3308dd671ef0..716566da400e 100644 --- a/fs/nfsd/nfs3xdr.c +++ b/fs/nfsd/nfs3xdr.c @@ -14,26 +14,13 @@ #include "netns.h" #include "vfs.h" -/* - * Force construction of an empty post-op attr - */ -static const struct svc_fh nfs3svc_null_fh = { - .fh_no_wcc = true, -}; +#define NFSDDBG_FACILITY NFSDDBG_XDR -/* - * time_delta. {1, 0} means the server is accurate only - * to the nearest second. - */ -static const struct timespec64 nfs3svc_time_delta = { - .tv_sec = 1, - .tv_nsec = 0, -}; /* * Mapping of S_IF* types to NFS file types */ -static const u32 nfs3_ftypes[] = { +static u32 nfs3_ftypes[] = { NF3NON, NF3FIFO, NF3CHR, NF3BAD, NF3DIR, NF3BAD, NF3BLK, NF3BAD, NF3REG, NF3BAD, NF3LNK, NF3BAD, @@ -42,938 +29,824 @@ static const u32 nfs3_ftypes[] = { /* - * Basic NFSv3 data types (RFC 1813 Sections 2.5 and 2.6) + * XDR functions for basic NFS types */ +static __be32 * +encode_time3(__be32 *p, struct timespec64 *time) +{ + *p++ = htonl((u32) time->tv_sec); *p++ = htonl(time->tv_nsec); + return p; +} static __be32 * -encode_nfstime3(__be32 *p, const struct timespec64 *time) +decode_time3(__be32 *p, struct timespec64 *time) { - *p++ = cpu_to_be32((u32)time->tv_sec); - *p++ = cpu_to_be32(time->tv_nsec); + time->tv_sec = ntohl(*p++); + time->tv_nsec = ntohl(*p++); + return p; +} + +static __be32 * +decode_fh(__be32 *p, struct svc_fh *fhp) +{ + unsigned int size; + fh_init(fhp, NFS3_FHSIZE); + size = ntohl(*p++); + if (size > NFS3_FHSIZE) + return NULL; + + memcpy(&fhp->fh_handle.fh_base, p, size); + fhp->fh_handle.fh_size = size; + return p + XDR_QUADLEN(size); +} + +/* Helper function for NFSv3 ACL code */ +__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp) +{ + return decode_fh(p, fhp); +} + +static __be32 * +encode_fh(__be32 *p, struct svc_fh *fhp) +{ + unsigned int size = fhp->fh_handle.fh_size; + *p++ = htonl(size); + if (size) p[XDR_QUADLEN(size)-1]=0; + memcpy(p, &fhp->fh_handle.fh_base, size); + return p + XDR_QUADLEN(size); +} + +/* + * Decode a file name and make sure that the path contains + * no slashes or null bytes. + */ +static __be32 * +decode_filename(__be32 *p, char **namp, unsigned int *lenp) +{ + char *name; + unsigned int i; + + if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS3_MAXNAMLEN)) != NULL) { + for (i = 0, name = *namp; i < *lenp; i++, name++) { + if (*name == '\0' || *name == '/') + return NULL; + } + } return p; } -static bool -svcxdr_decode_nfstime3(struct xdr_stream *xdr, struct timespec64 *timep) +static __be32 * +decode_sattr3(__be32 *p, struct iattr *iap, struct user_namespace *userns) { - __be32 *p; - - p = xdr_inline_decode(xdr, XDR_UNIT * 2); - if (!p) - return false; - timep->tv_sec = be32_to_cpup(p++); - timep->tv_nsec = be32_to_cpup(p); - - return true; -} - -/** - * svcxdr_decode_nfs_fh3 - Decode an NFSv3 file handle - * @xdr: XDR stream positioned at an undecoded NFSv3 FH - * @fhp: OUT: filled-in server file handle - * - * Return values: - * %false: The encoded file handle was not valid - * %true: @fhp has been initialized - */ -bool -svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp) -{ - __be32 *p; - u32 size; - - if (xdr_stream_decode_u32(xdr, &size) < 0) - return false; - if (size == 0 || size > NFS3_FHSIZE) - return false; - p = xdr_inline_decode(xdr, size); - if (!p) - return false; - fh_init(fhp, NFS3_FHSIZE); - fhp->fh_handle.fh_size = size; - memcpy(&fhp->fh_handle.fh_raw, p, size); - - return true; -} - -/** - * svcxdr_encode_nfsstat3 - Encode an NFSv3 status code - * @xdr: XDR stream - * @status: status value to encode - * - * Return values: - * %false: Send buffer space was exhausted - * %true: Success - */ -bool -svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, sizeof(status)); - if (!p) - return false; - *p = status; - - return true; -} - -static bool -svcxdr_encode_nfs_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) -{ - u32 size = fhp->fh_handle.fh_size; - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT + size); - if (!p) - return false; - *p++ = cpu_to_be32(size); - if (size) - p[XDR_QUADLEN(size) - 1] = 0; - memcpy(p, &fhp->fh_handle.fh_raw, size); - - return true; -} - -static bool -svcxdr_encode_post_op_fh3(struct xdr_stream *xdr, const struct svc_fh *fhp) -{ - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - if (!svcxdr_encode_nfs_fh3(xdr, fhp)) - return false; - - return true; -} - -static bool -svcxdr_encode_cookieverf3(struct xdr_stream *xdr, const __be32 *verf) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, NFS3_COOKIEVERFSIZE); - if (!p) - return false; - memcpy(p, verf, NFS3_COOKIEVERFSIZE); - - return true; -} - -static bool -svcxdr_encode_writeverf3(struct xdr_stream *xdr, const __be32 *verf) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, NFS3_WRITEVERFSIZE); - if (!p) - return false; - memcpy(p, verf, NFS3_WRITEVERFSIZE); - - return true; -} - -static bool -svcxdr_decode_filename3(struct xdr_stream *xdr, char **name, unsigned int *len) -{ - u32 size, i; - __be32 *p; - char *c; - - if (xdr_stream_decode_u32(xdr, &size) < 0) - return false; - if (size == 0 || size > NFS3_MAXNAMLEN) - return false; - p = xdr_inline_decode(xdr, size); - if (!p) - return false; - - *len = size; - *name = (char *)p; - for (i = 0, c = *name; i < size; i++, c++) { - if (*c == '\0' || *c == '/') - return false; - } - - return true; -} - -static bool -svcxdr_decode_diropargs3(struct xdr_stream *xdr, struct svc_fh *fhp, - char **name, unsigned int *len) -{ - return svcxdr_decode_nfs_fh3(xdr, fhp) && - svcxdr_decode_filename3(xdr, name, len); -} - -static bool -svcxdr_decode_sattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, - struct iattr *iap) -{ - u32 set_it; + u32 tmp; iap->ia_valid = 0; - if (xdr_stream_decode_bool(xdr, &set_it) < 0) - return false; - if (set_it) { - u32 mode; - - if (xdr_stream_decode_u32(xdr, &mode) < 0) - return false; + if (*p++) { iap->ia_valid |= ATTR_MODE; - iap->ia_mode = mode; + iap->ia_mode = ntohl(*p++); } - if (xdr_stream_decode_bool(xdr, &set_it) < 0) - return false; - if (set_it) { - u32 uid; - - if (xdr_stream_decode_u32(xdr, &uid) < 0) - return false; - iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), uid); + if (*p++) { + iap->ia_uid = make_kuid(userns, ntohl(*p++)); if (uid_valid(iap->ia_uid)) iap->ia_valid |= ATTR_UID; } - if (xdr_stream_decode_bool(xdr, &set_it) < 0) - return false; - if (set_it) { - u32 gid; - - if (xdr_stream_decode_u32(xdr, &gid) < 0) - return false; - iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), gid); + if (*p++) { + iap->ia_gid = make_kgid(userns, ntohl(*p++)); if (gid_valid(iap->ia_gid)) iap->ia_valid |= ATTR_GID; } - if (xdr_stream_decode_bool(xdr, &set_it) < 0) - return false; - if (set_it) { - u64 newsize; + if (*p++) { + u64 newsize; - if (xdr_stream_decode_u64(xdr, &newsize) < 0) - return false; iap->ia_valid |= ATTR_SIZE; - iap->ia_size = newsize; + p = xdr_decode_hyper(p, &newsize); + iap->ia_size = min_t(u64, newsize, NFS_OFFSET_MAX); } - if (xdr_stream_decode_u32(xdr, &set_it) < 0) - return false; - switch (set_it) { - case DONT_CHANGE: - break; - case SET_TO_SERVER_TIME: + if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ iap->ia_valid |= ATTR_ATIME; - break; - case SET_TO_CLIENT_TIME: - if (!svcxdr_decode_nfstime3(xdr, &iap->ia_atime)) - return false; + } else if (tmp == 2) { /* set to client time */ iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - break; - default: - return false; + iap->ia_atime.tv_sec = ntohl(*p++); + iap->ia_atime.tv_nsec = ntohl(*p++); } - if (xdr_stream_decode_u32(xdr, &set_it) < 0) - return false; - switch (set_it) { - case DONT_CHANGE: - break; - case SET_TO_SERVER_TIME: + if ((tmp = ntohl(*p++)) == 1) { /* set to server time */ iap->ia_valid |= ATTR_MTIME; - break; - case SET_TO_CLIENT_TIME: - if (!svcxdr_decode_nfstime3(xdr, &iap->ia_mtime)) - return false; + } else if (tmp == 2) { /* set to client time */ iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - break; - default: - return false; + iap->ia_mtime.tv_sec = ntohl(*p++); + iap->ia_mtime.tv_nsec = ntohl(*p++); } - - return true; + return p; } -static bool -svcxdr_decode_sattrguard3(struct xdr_stream *xdr, struct nfsd3_sattrargs *args) +static __be32 *encode_fsid(__be32 *p, struct svc_fh *fhp) { - __be32 *p; - u32 check; - - if (xdr_stream_decode_bool(xdr, &check) < 0) - return false; - if (check) { - p = xdr_inline_decode(xdr, XDR_UNIT * 2); - if (!p) - return false; - args->check_guard = 1; - args->guardtime = be32_to_cpup(p); - } else - args->check_guard = 0; - - return true; -} - -static bool -svcxdr_decode_specdata3(struct xdr_stream *xdr, struct nfsd3_mknodargs *args) -{ - __be32 *p; - - p = xdr_inline_decode(xdr, XDR_UNIT * 2); - if (!p) - return false; - args->major = be32_to_cpup(p++); - args->minor = be32_to_cpup(p); - - return true; -} - -static bool -svcxdr_decode_devicedata3(struct svc_rqst *rqstp, struct xdr_stream *xdr, - struct nfsd3_mknodargs *args) -{ - return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && - svcxdr_decode_specdata3(xdr, args); -} - -static bool -svcxdr_encode_fattr3(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp, const struct kstat *stat) -{ - struct user_namespace *userns = nfsd_user_namespace(rqstp); - __be32 *p; - u64 fsid; - - p = xdr_reserve_space(xdr, XDR_UNIT * 21); - if (!p) - return false; - - *p++ = cpu_to_be32(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); - *p++ = cpu_to_be32((u32)(stat->mode & S_IALLUGO)); - *p++ = cpu_to_be32((u32)stat->nlink); - *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); - *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); - if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) - p = xdr_encode_hyper(p, (u64)NFS3_MAXPATHLEN); - else - p = xdr_encode_hyper(p, (u64)stat->size); - - /* used */ - p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); - - /* rdev */ - *p++ = cpu_to_be32((u32)MAJOR(stat->rdev)); - *p++ = cpu_to_be32((u32)MINOR(stat->rdev)); - + u64 f; switch(fsid_source(fhp)) { + default: + case FSIDSOURCE_DEV: + p = xdr_encode_hyper(p, (u64)huge_encode_dev + (fhp->fh_dentry->d_sb->s_dev)); + break; case FSIDSOURCE_FSID: - fsid = (u64)fhp->fh_export->ex_fsid; + p = xdr_encode_hyper(p, (u64) fhp->fh_export->ex_fsid); break; case FSIDSOURCE_UUID: - fsid = ((u64 *)fhp->fh_export->ex_uuid)[0]; - fsid ^= ((u64 *)fhp->fh_export->ex_uuid)[1]; + f = ((u64*)fhp->fh_export->ex_uuid)[0]; + f ^= ((u64*)fhp->fh_export->ex_uuid)[1]; + p = xdr_encode_hyper(p, f); break; - default: - fsid = (u64)huge_encode_dev(fhp->fh_dentry->d_sb->s_dev); } - p = xdr_encode_hyper(p, fsid); + return p; +} - /* fileid */ +static __be32 * +encode_fattr3(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, + struct kstat *stat) +{ + struct user_namespace *userns = nfsd_user_namespace(rqstp); + *p++ = htonl(nfs3_ftypes[(stat->mode & S_IFMT) >> 12]); + *p++ = htonl((u32) (stat->mode & S_IALLUGO)); + *p++ = htonl((u32) stat->nlink); + *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); + *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); + if (S_ISLNK(stat->mode) && stat->size > NFS3_MAXPATHLEN) { + p = xdr_encode_hyper(p, (u64) NFS3_MAXPATHLEN); + } else { + p = xdr_encode_hyper(p, (u64) stat->size); + } + p = xdr_encode_hyper(p, ((u64)stat->blocks) << 9); + *p++ = htonl((u32) MAJOR(stat->rdev)); + *p++ = htonl((u32) MINOR(stat->rdev)); + p = encode_fsid(p, fhp); p = xdr_encode_hyper(p, stat->ino); + p = encode_time3(p, &stat->atime); + p = encode_time3(p, &stat->mtime); + p = encode_time3(p, &stat->ctime); - p = encode_nfstime3(p, &stat->atime); - p = encode_nfstime3(p, &stat->mtime); - encode_nfstime3(p, &stat->ctime); - - return true; + return p; } -static bool -svcxdr_encode_wcc_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) +static __be32 * +encode_saved_post_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT * 6); - if (!p) - return false; - p = xdr_encode_hyper(p, (u64)fhp->fh_pre_size); - p = encode_nfstime3(p, &fhp->fh_pre_mtime); - encode_nfstime3(p, &fhp->fh_pre_ctime); - - return true; -} - -static bool -svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp) -{ - if (!fhp->fh_pre_saved) { - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - return true; - } - - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - return svcxdr_encode_wcc_attr(xdr, fhp); -} - -/** - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes - * @rqstp: Context of a completed RPC transaction - * @xdr: XDR stream - * @fhp: File handle to encode - * - * Return values: - * %false: Send buffer space was exhausted - * %true: Success - */ -bool -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp) -{ - struct dentry *dentry = fhp->fh_dentry; - struct kstat stat; - - /* - * The inode may be NULL if the call failed because of a - * stale file handle. In this case, no attributes are - * returned. - */ - if (fhp->fh_no_wcc || !dentry || !d_really_is_positive(dentry)) - goto no_post_op_attrs; - if (fh_getattr(fhp, &stat) != nfs_ok) - goto no_post_op_attrs; - - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - lease_get_mtime(d_inode(dentry), &stat.mtime); - if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &stat)) - return false; - - return true; - -no_post_op_attrs: - return xdr_stream_encode_item_absent(xdr) > 0; + /* Attributes to follow */ + *p++ = xdr_one; + return encode_fattr3(rqstp, p, fhp, &fhp->fh_post_attr); } /* - * Encode weak cache consistency data + * Encode post-operation attributes. + * The inode may be NULL if the call failed because of a stale file + * handle. In this case, no attributes are returned. */ -static bool -svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp) +static __be32 * +encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) { struct dentry *dentry = fhp->fh_dentry; + if (dentry && d_really_is_positive(dentry)) { + __be32 err; + struct kstat stat; - if (!dentry || !d_really_is_positive(dentry) || !fhp->fh_post_saved) - goto neither; + err = fh_getattr(fhp, &stat); + if (!err) { + *p++ = xdr_one; /* attributes follow */ + lease_get_mtime(d_inode(dentry), &stat.mtime); + return encode_fattr3(rqstp, p, fhp, &stat); + } + } + *p++ = xdr_zero; + return p; +} - /* before */ - if (!svcxdr_encode_pre_op_attr(xdr, fhp)) - return false; +/* Helper for NFSv3 ACLs */ +__be32 * +nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) +{ + return encode_post_op_attr(rqstp, p, fhp); +} - /* after */ - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - if (!svcxdr_encode_fattr3(rqstp, xdr, fhp, &fhp->fh_post_attr)) - return false; +/* + * Enocde weak cache consistency data + */ +static __be32 * +encode_wcc_data(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp) +{ + struct dentry *dentry = fhp->fh_dentry; - return true; + if (dentry && d_really_is_positive(dentry) && fhp->fh_post_saved) { + if (fhp->fh_pre_saved) { + *p++ = xdr_one; + p = xdr_encode_hyper(p, (u64) fhp->fh_pre_size); + p = encode_time3(p, &fhp->fh_pre_mtime); + p = encode_time3(p, &fhp->fh_pre_ctime); + } else { + *p++ = xdr_zero; + } + return encode_saved_post_attr(rqstp, p, fhp); + } + /* no pre- or post-attrs */ + *p++ = xdr_zero; + return encode_post_op_attr(rqstp, p, fhp); +} -neither: - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp)) - return false; +/* + * Fill in the pre_op attr for the wcc data + */ +void fill_pre_wcc(struct svc_fh *fhp) +{ + struct inode *inode; + struct kstat stat; + __be32 err; - return true; + if (fhp->fh_pre_saved) + return; + + inode = d_inode(fhp->fh_dentry); + err = fh_getattr(fhp, &stat); + if (err) { + /* Grab the times from inode anyway */ + stat.mtime = inode->i_mtime; + stat.ctime = inode->i_ctime; + stat.size = inode->i_size; + } + + fhp->fh_pre_mtime = stat.mtime; + fhp->fh_pre_ctime = stat.ctime; + fhp->fh_pre_size = stat.size; + fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); + fhp->fh_pre_saved = true; +} + +/* + * Fill in the post_op attr for the wcc data + */ +void fill_post_wcc(struct svc_fh *fhp) +{ + __be32 err; + + if (fhp->fh_post_saved) + printk("nfsd: inode locked twice during operation.\n"); + + err = fh_getattr(fhp, &fhp->fh_post_attr); + fhp->fh_post_change = nfsd4_change_attribute(&fhp->fh_post_attr, + d_inode(fhp->fh_dentry)); + if (err) { + fhp->fh_post_saved = false; + /* Grab the ctime anyway - set_change_info might use it */ + fhp->fh_post_attr.ctime = d_inode(fhp->fh_dentry)->i_ctime; + } else + fhp->fh_post_saved = true; } /* * XDR decode functions */ +int +nfs3svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} -bool -nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_fhandle *args = rqstp->rq_argp; - return svcxdr_decode_nfs_fh3(xdr, &args->fh); + p = decode_fh(p, &args->fh); + if (!p) + return 0; + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_sattrargs *args = rqstp->rq_argp; - return svcxdr_decode_nfs_fh3(xdr, &args->fh) && - svcxdr_decode_sattr3(rqstp, xdr, &args->attrs) && - svcxdr_decode_sattrguard3(xdr, args); + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); + + if ((args->check_guard = ntohl(*p++)) != 0) { + struct timespec64 time; + p = decode_time3(p, &time); + args->guardtime = time.tv_sec; + } + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_diropargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len); + if (!(p = decode_fh(p, &args->fh)) + || !(p = decode_filename(p, &args->name, &args->len))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_accessargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_accessargs *args = rqstp->rq_argp; - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &args->access) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + args->access = ntohl(*p++); - return true; + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readargs *args = rqstp->rq_argp; - - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; - - return true; -} - -bool -nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - struct nfsd3_writeargs *args = rqstp->rq_argp; + unsigned int len; + int v; u32 max_blocksize = svc_max_payload(rqstp); - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->stable) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = xdr_decode_hyper(p, &args->offset); - /* opaque data */ - if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return false; + args->count = ntohl(*p++); + len = min(args->count, max_blocksize); - /* request sanity */ + /* set up the kvec */ + v=0; + while (len > 0) { + struct page *p = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(p); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + args->vlen = v; + return xdr_argsize_check(rqstp, p); +} + +int +nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct nfsd3_writeargs *args = rqstp->rq_argp; + unsigned int len, hdr, dlen; + u32 max_blocksize = svc_max_payload(rqstp); + struct kvec *head = rqstp->rq_arg.head; + struct kvec *tail = rqstp->rq_arg.tail; + + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = xdr_decode_hyper(p, &args->offset); + + args->count = ntohl(*p++); + args->stable = ntohl(*p++); + len = args->len = ntohl(*p++); + if ((void *)p > head->iov_base + head->iov_len) + return 0; + /* + * The count must equal the amount of data passed. + */ if (args->count != args->len) - return false; + return 0; + + /* + * Check to make sure that we got the right number of + * bytes. + */ + hdr = (void*)p - head->iov_base; + dlen = head->iov_len + rqstp->rq_arg.page_len + tail->iov_len - hdr; + /* + * Round the length of the data which was specified up to + * the next multiple of XDR units and then compare that + * against the length which was actually received. + * Note that when RPCSEC/GSS (for example) is used, the + * data buffer can be padded so dlen might be larger + * than required. It must never be smaller. + */ + if (dlen < XDR_QUADLEN(len)*4) + return 0; + if (args->count > max_blocksize) { args->count = max_blocksize; - args->len = max_blocksize; + len = args->len = max_blocksize; } - return xdr_stream_subsegment(xdr, &args->payload, args->count); + args->first.iov_base = (void *)p; + args->first.iov_len = head->iov_len - hdr; + return 1; } -bool -nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_createargs *args = rqstp->rq_argp; - if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return false; - if (xdr_stream_decode_u32(xdr, &args->createmode) < 0) - return false; - switch (args->createmode) { + if (!(p = decode_fh(p, &args->fh)) + || !(p = decode_filename(p, &args->name, &args->len))) + return 0; + + switch (args->createmode = ntohl(*p++)) { case NFS3_CREATE_UNCHECKED: case NFS3_CREATE_GUARDED: - return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); + p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); + break; case NFS3_CREATE_EXCLUSIVE: - args->verf = xdr_inline_decode(xdr, NFS3_CREATEVERFSIZE); - if (!args->verf) - return false; + args->verf = p; + p += 2; break; default: - return false; + return 0; } - return true; + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_createargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs3(xdr, &args->fh, - &args->name, &args->len) && - svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); + if (!(p = decode_fh(p, &args->fh)) || + !(p = decode_filename(p, &args->name, &args->len))) + return 0; + p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_symlinkargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; + char *base = (char *)p; + size_t dlen; - if (!svcxdr_decode_diropargs3(xdr, &args->ffh, &args->fname, &args->flen)) - return false; - if (!svcxdr_decode_sattr3(rqstp, xdr, &args->attrs)) - return false; - if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return false; + if (!(p = decode_fh(p, &args->ffh)) || + !(p = decode_filename(p, &args->fname, &args->flen))) + return 0; + p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); - /* symlink_data */ - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - args->first.iov_base = xdr_inline_decode(xdr, args->tlen); - return args->first.iov_base != NULL; + args->tlen = ntohl(*p++); + + args->first.iov_base = p; + args->first.iov_len = rqstp->rq_arg.head[0].iov_len; + args->first.iov_len -= (char *)p - base; + + dlen = args->first.iov_len + rqstp->rq_arg.page_len + + rqstp->rq_arg.tail[0].iov_len; + if (dlen < XDR_QUADLEN(args->tlen) << 2) + return 0; + return 1; } -bool -nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_mknodargs *args = rqstp->rq_argp; - if (!svcxdr_decode_diropargs3(xdr, &args->fh, &args->name, &args->len)) - return false; - if (xdr_stream_decode_u32(xdr, &args->ftype) < 0) - return false; - switch (args->ftype) { - case NF3CHR: - case NF3BLK: - return svcxdr_decode_devicedata3(rqstp, xdr, args); - case NF3SOCK: - case NF3FIFO: - return svcxdr_decode_sattr3(rqstp, xdr, &args->attrs); - case NF3REG: - case NF3DIR: - case NF3LNK: - /* Valid XDR but illegal file types */ - break; - default: - return false; + if (!(p = decode_fh(p, &args->fh)) + || !(p = decode_filename(p, &args->name, &args->len))) + return 0; + + args->ftype = ntohl(*p++); + + if (args->ftype == NF3BLK || args->ftype == NF3CHR + || args->ftype == NF3SOCK || args->ftype == NF3FIFO) + p = decode_sattr3(p, &args->attrs, nfsd_user_namespace(rqstp)); + + if (args->ftype == NF3BLK || args->ftype == NF3CHR) { + args->major = ntohl(*p++); + args->minor = ntohl(*p++); } - return true; + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_renameargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs3(xdr, &args->ffh, - &args->fname, &args->flen) && - svcxdr_decode_diropargs3(xdr, &args->tfh, - &args->tname, &args->tlen); + if (!(p = decode_fh(p, &args->ffh)) + || !(p = decode_filename(p, &args->fname, &args->flen)) + || !(p = decode_fh(p, &args->tfh)) + || !(p = decode_filename(p, &args->tname, &args->tlen))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct nfsd3_readlinkargs *args = rqstp->rq_argp; + + p = decode_fh(p, &args->fh); + if (!p) + return 0; + args->buffer = page_address(*(rqstp->rq_next_page++)); + + return xdr_argsize_check(rqstp, p); +} + +int +nfs3svc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_linkargs *args = rqstp->rq_argp; - return svcxdr_decode_nfs_fh3(xdr, &args->ffh) && - svcxdr_decode_diropargs3(xdr, &args->tfh, - &args->tname, &args->tlen); + if (!(p = decode_fh(p, &args->ffh)) + || !(p = decode_fh(p, &args->tfh)) + || !(p = decode_filename(p, &args->tname, &args->tlen))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; + int len; + u32 max_blocksize = svc_max_payload(rqstp); - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return false; - args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); - if (!args->verf) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = xdr_decode_hyper(p, &args->cookie); + args->verf = p; p += 2; + args->dircount = ~0; + args->count = ntohl(*p++); + len = args->count = min_t(u32, args->count, max_blocksize); - return true; + while (len > 0) { + struct page *p = *(rqstp->rq_next_page++); + if (!args->buffer) + args->buffer = page_address(p); + len -= PAGE_SIZE; + } + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirargs *args = rqstp->rq_argp; - u32 dircount; + int len; + u32 max_blocksize = svc_max_payload(rqstp); - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u64(xdr, &args->cookie) < 0) - return false; - args->verf = xdr_inline_decode(xdr, NFS3_COOKIEVERFSIZE); - if (!args->verf) - return false; - /* dircount is ignored */ - if (xdr_stream_decode_u32(xdr, &dircount) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = xdr_decode_hyper(p, &args->cookie); + args->verf = p; p += 2; + args->dircount = ntohl(*p++); + args->count = ntohl(*p++); - return true; + len = args->count = min(args->count, max_blocksize); + while (len > 0) { + struct page *p = *(rqstp->rq_next_page++); + if (!args->buffer) + args->buffer = page_address(p); + len -= PAGE_SIZE; + } + + return xdr_argsize_check(rqstp, p); } -bool -nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_decode_commitargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_commitargs *args = rqstp->rq_argp; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = xdr_decode_hyper(p, &args->offset); + args->count = ntohl(*p++); - if (!svcxdr_decode_nfs_fh3(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u64(xdr, &args->offset) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; - - return true; + return xdr_argsize_check(rqstp, p); } /* * XDR encode functions */ +int +nfs3svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} + /* GETATTR */ -bool -nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_attrstat *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - lease_get_mtime(d_inode(resp->fh.fh_dentry), &resp->stat.mtime); - if (!svcxdr_encode_fattr3(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - break; + *p++ = resp->status; + if (resp->status == 0) { + lease_get_mtime(d_inode(resp->fh.fh_dentry), + &resp->stat.mtime); + p = encode_fattr3(rqstp, p, &resp->fh, &resp->stat); } - - return true; + return xdr_ressize_check(rqstp, p); } /* SETATTR, REMOVE, RMDIR */ -bool -nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_wccstat(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_attrstat *resp = rqstp->rq_resp; - return svcxdr_encode_nfsstat3(xdr, resp->status) && - svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh); + *p++ = resp->status; + p = encode_wcc_data(rqstp, p, &resp->fh); + return xdr_ressize_check(rqstp, p); } /* LOOKUP */ -bool -nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_diropres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh)) - return false; - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh)) - return false; + *p++ = resp->status; + if (resp->status == 0) { + p = encode_fh(p, &resp->fh); + p = encode_post_op_attr(rqstp, p, &resp->fh); } - - return true; + p = encode_post_op_attr(rqstp, p, &resp->dirfh); + return xdr_ressize_check(rqstp, p); } /* ACCESS */ -bool -nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_accessres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_accessres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (xdr_stream_encode_u32(xdr, resp->access) < 0) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - } - - return true; + *p++ = resp->status; + p = encode_post_op_attr(rqstp, p, &resp->fh); + if (resp->status == 0) + *p++ = htonl(resp->access); + return xdr_ressize_check(rqstp, p); } /* READLINK */ -bool -nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readlinkres *resp = rqstp->rq_resp; - struct kvec *head = rqstp->rq_res.head; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return false; - xdr_write_pages(xdr, resp->pages, 0, resp->len); - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - } - - return true; + *p++ = resp->status; + p = encode_post_op_attr(rqstp, p, &resp->fh); + if (resp->status == 0) { + *p++ = htonl(resp->len); + xdr_ressize_check(rqstp, p); + rqstp->rq_res.page_len = resp->len; + if (resp->len & 3) { + /* need to pad the tail */ + rqstp->rq_res.tail[0].iov_base = p; + *p = 0; + rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); + } + return 1; + } else + return xdr_ressize_check(rqstp, p); } /* READ */ -bool -nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readres *resp = rqstp->rq_resp; - struct kvec *head = rqstp->rq_res.head; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return false; - if (xdr_stream_encode_bool(xdr, resp->eof) < 0) - return false; - if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return false; - xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, - resp->count); - if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - } - - return true; + *p++ = resp->status; + p = encode_post_op_attr(rqstp, p, &resp->fh); + if (resp->status == 0) { + *p++ = htonl(resp->count); + *p++ = htonl(resp->eof); + *p++ = htonl(resp->count); /* xdr opaque count */ + xdr_ressize_check(rqstp, p); + /* now update rqstp->rq_res to reflect data as well */ + rqstp->rq_res.page_len = resp->count; + if (resp->count & 3) { + /* need to pad the tail */ + rqstp->rq_res.tail[0].iov_base = p; + *p = 0; + rqstp->rq_res.tail[0].iov_len = 4 - (resp->count & 3); + } + return 1; + } else + return xdr_ressize_check(rqstp, p); } /* WRITE */ -bool -nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_writeres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_writeres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return false; - if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return false; - if (xdr_stream_encode_u32(xdr, resp->committed) < 0) - return false; - if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return false; - break; - default: - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return false; + *p++ = resp->status; + p = encode_wcc_data(rqstp, p, &resp->fh); + if (resp->status == 0) { + *p++ = htonl(resp->count); + *p++ = htonl(resp->committed); + *p++ = resp->verf[0]; + *p++ = resp->verf[1]; } - - return true; + return xdr_ressize_check(rqstp, p); } /* CREATE, MKDIR, SYMLINK, MKNOD */ -bool -nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_createres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_diropres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_fh3(xdr, &resp->fh)) - return false; - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return false; - break; - default: - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->dirfh)) - return false; + *p++ = resp->status; + if (resp->status == 0) { + *p++ = xdr_one; + p = encode_fh(p, &resp->fh); + p = encode_post_op_attr(rqstp, p, &resp->fh); } - - return true; + p = encode_wcc_data(rqstp, p, &resp->dirfh); + return xdr_ressize_check(rqstp, p); } /* RENAME */ -bool -nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_renameres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_renameres *resp = rqstp->rq_resp; - return svcxdr_encode_nfsstat3(xdr, resp->status) && - svcxdr_encode_wcc_data(rqstp, xdr, &resp->ffh) && - svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); + *p++ = resp->status; + p = encode_wcc_data(rqstp, p, &resp->ffh); + p = encode_wcc_data(rqstp, p, &resp->tfh); + return xdr_ressize_check(rqstp, p); } /* LINK */ -bool -nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_linkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_linkres *resp = rqstp->rq_resp; - return svcxdr_encode_nfsstat3(xdr, resp->status) && - svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh) && - svcxdr_encode_wcc_data(rqstp, xdr, &resp->tfh); + *p++ = resp->status; + p = encode_post_op_attr(rqstp, p, &resp->fh); + p = encode_wcc_data(rqstp, p, &resp->tfh); + return xdr_ressize_check(rqstp, p); } /* READDIR */ -bool -nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_readdirres *resp = rqstp->rq_resp; - struct xdr_buf *dirlist = &resp->dirlist; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - if (!svcxdr_encode_cookieverf3(xdr, resp->verf)) - return false; - xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); - /* no more entries */ - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh)) - return false; - } + *p++ = resp->status; + p = encode_post_op_attr(rqstp, p, &resp->fh); - return true; + if (resp->status == 0) { + /* stupid readdir cookie */ + memcpy(p, resp->verf, 8); p += 2; + xdr_ressize_check(rqstp, p); + if (rqstp->rq_res.head[0].iov_len + (2<<2) > PAGE_SIZE) + return 1; /*No room for trailer */ + rqstp->rq_res.page_len = (resp->count) << 2; + + /* add the 'tail' to the end of the 'head' page - page 0. */ + rqstp->rq_res.tail[0].iov_base = p; + *p++ = 0; /* no more entries */ + *p++ = htonl(resp->common.err == nfserr_eof); + rqstp->rq_res.tail[0].iov_len = 2<<2; + return 1; + } else + return xdr_ressize_check(rqstp, p); +} + +static __be32 * +encode_entry_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, + int namlen, u64 ino) +{ + *p++ = xdr_one; /* mark entry present */ + p = xdr_encode_hyper(p, ino); /* file id */ + p = xdr_encode_array(p, name, namlen);/* name length & name */ + + cd->offset = p; /* remember pointer */ + p = xdr_encode_hyper(p, NFS_OFFSET_MAX);/* offset of next entry */ + + return p; } static __be32 @@ -1014,323 +887,267 @@ compose_entry_fh(struct nfsd3_readdirres *cd, struct svc_fh *fhp, return rv; } -/** - * nfs3svc_encode_cookie3 - Encode a directory offset cookie - * @resp: readdir result context - * @offset: offset cookie to encode - * - * The buffer space for the offset cookie has already been reserved - * by svcxdr_encode_entry3_common(). - */ -void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset) +static __be32 *encode_entryplus_baggage(struct nfsd3_readdirres *cd, __be32 *p, const char *name, int namlen, u64 ino) { - __be64 cookie = cpu_to_be64(offset); + struct svc_fh *fh = &cd->scratch; + __be32 err; - if (!resp->cookie_offset) - return; - write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, - sizeof(cookie)); - resp->cookie_offset = 0; -} - -static bool -svcxdr_encode_entry3_common(struct nfsd3_readdirres *resp, const char *name, - int namlen, loff_t offset, u64 ino) -{ - struct xdr_buf *dirlist = &resp->dirlist; - struct xdr_stream *xdr = &resp->xdr; - - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - /* fileid */ - if (xdr_stream_encode_u64(xdr, ino) < 0) - return false; - /* name */ - if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS3_MAXNAMLEN)) < 0) - return false; - /* cookie */ - resp->cookie_offset = dirlist->len; - if (xdr_stream_encode_u64(xdr, OFFSET_MAX) < 0) - return false; - - return true; -} - -/** - * nfs3svc_encode_entry3 - encode one NFSv3 READDIR entry - * @data: directory context - * @name: name of the object to be encoded - * @namlen: length of that name, in bytes - * @offset: the offset of the previous entry - * @ino: the fileid of this entry - * @d_type: unused - * - * Return values: - * %0: Entry was successfully encoded. - * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err - * - * On exit, the following fields are updated: - * - resp->xdr - * - resp->common.err - * - resp->cookie_offset - */ -int nfs3svc_encode_entry3(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type) -{ - struct readdir_cd *ccd = data; - struct nfsd3_readdirres *resp = container_of(ccd, - struct nfsd3_readdirres, - common); - unsigned int starting_length = resp->dirlist.len; - - /* The offset cookie for the previous entry */ - nfs3svc_encode_cookie3(resp, offset); - - if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) - goto out_toosmall; - - xdr_commit_encode(&resp->xdr); - resp->common.err = nfs_ok; - return 0; - -out_toosmall: - resp->cookie_offset = 0; - resp->common.err = nfserr_toosmall; - resp->dirlist.len = starting_length; - return -EINVAL; -} - -static bool -svcxdr_encode_entry3_plus(struct nfsd3_readdirres *resp, const char *name, - int namlen, u64 ino) -{ - struct xdr_stream *xdr = &resp->xdr; - struct svc_fh *fhp = &resp->scratch; - bool result; - - result = false; - fh_init(fhp, NFS3_FHSIZE); - if (compose_entry_fh(resp, fhp, name, namlen, ino) != nfs_ok) - goto out_noattrs; - - if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp)) + fh_init(fh, NFS3_FHSIZE); + err = compose_entry_fh(cd, fh, name, namlen, ino); + if (err) { + *p++ = 0; + *p++ = 0; goto out; - if (!svcxdr_encode_post_op_fh3(xdr, fhp)) - goto out; - result = true; - + } + p = encode_post_op_attr(cd->rqstp, p, fh); + *p++ = xdr_one; /* yes, a file handle follows */ + p = encode_fh(p, fh); out: - fh_put(fhp); - return result; - -out_noattrs: - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - return true; + fh_put(fh); + return p; } -/** - * nfs3svc_encode_entryplus3 - encode one NFSv3 READDIRPLUS entry - * @data: directory context - * @name: name of the object to be encoded - * @namlen: length of that name, in bytes - * @offset: the offset of the previous entry - * @ino: the fileid of this entry - * @d_type: unused - * - * Return values: - * %0: Entry was successfully encoded. - * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err - * - * On exit, the following fields are updated: - * - resp->xdr - * - resp->common.err - * - resp->cookie_offset +/* + * Encode a directory entry. This one works for both normal readdir + * and readdirplus. + * The normal readdir reply requires 2 (fileid) + 1 (stringlen) + * + string + 2 (cookie) + 1 (next) words, i.e. 6 + strlen. + * + * The readdirplus baggage is 1+21 words for post_op_attr, plus the + * file handle. */ -int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type) + +#define NFS3_ENTRY_BAGGAGE (2 + 1 + 2 + 1) +#define NFS3_ENTRYPLUS_BAGGAGE (1 + 21 + 1 + (NFS3_FHSIZE >> 2)) +static int +encode_entry(struct readdir_cd *ccd, const char *name, int namlen, + loff_t offset, u64 ino, unsigned int d_type, int plus) { - struct readdir_cd *ccd = data; - struct nfsd3_readdirres *resp = container_of(ccd, - struct nfsd3_readdirres, - common); - unsigned int starting_length = resp->dirlist.len; + struct nfsd3_readdirres *cd = container_of(ccd, struct nfsd3_readdirres, + common); + __be32 *p = cd->buffer; + caddr_t curr_page_addr = NULL; + struct page ** page; + int slen; /* string (name) length */ + int elen; /* estimated entry length in words */ + int num_entry_words = 0; /* actual number of words */ - /* The offset cookie for the previous entry */ - nfs3svc_encode_cookie3(resp, offset); + if (cd->offset) { + u64 offset64 = offset; - if (!svcxdr_encode_entry3_common(resp, name, namlen, offset, ino)) - goto out_toosmall; - if (!svcxdr_encode_entry3_plus(resp, name, namlen, ino)) - goto out_toosmall; + if (unlikely(cd->offset1)) { + /* we ended up with offset on a page boundary */ + *cd->offset = htonl(offset64 >> 32); + *cd->offset1 = htonl(offset64 & 0xffffffff); + cd->offset1 = NULL; + } else { + xdr_encode_hyper(cd->offset, offset64); + } + cd->offset = NULL; + } - xdr_commit_encode(&resp->xdr); - resp->common.err = nfs_ok; + /* + dprintk("encode_entry(%.*s @%ld%s)\n", + namlen, name, (long) offset, plus? " plus" : ""); + */ + + /* truncate filename if too long */ + namlen = min(namlen, NFS3_MAXNAMLEN); + + slen = XDR_QUADLEN(namlen); + elen = slen + NFS3_ENTRY_BAGGAGE + + (plus? NFS3_ENTRYPLUS_BAGGAGE : 0); + + if (cd->buflen < elen) { + cd->common.err = nfserr_toosmall; + return -EINVAL; + } + + /* determine which page in rq_respages[] we are currently filling */ + for (page = cd->rqstp->rq_respages + 1; + page < cd->rqstp->rq_next_page; page++) { + curr_page_addr = page_address(*page); + + if (((caddr_t)cd->buffer >= curr_page_addr) && + ((caddr_t)cd->buffer < curr_page_addr + PAGE_SIZE)) + break; + } + + if ((caddr_t)(cd->buffer + elen) < (curr_page_addr + PAGE_SIZE)) { + /* encode entry in current page */ + + p = encode_entry_baggage(cd, p, name, namlen, ino); + + if (plus) + p = encode_entryplus_baggage(cd, p, name, namlen, ino); + num_entry_words = p - cd->buffer; + } else if (*(page+1) != NULL) { + /* temporarily encode entry into next page, then move back to + * current and next page in rq_respages[] */ + __be32 *p1, *tmp; + int len1, len2; + + /* grab next page for temporary storage of entry */ + p1 = tmp = page_address(*(page+1)); + + p1 = encode_entry_baggage(cd, p1, name, namlen, ino); + + if (plus) + p1 = encode_entryplus_baggage(cd, p1, name, namlen, ino); + + /* determine entry word length and lengths to go in pages */ + num_entry_words = p1 - tmp; + len1 = curr_page_addr + PAGE_SIZE - (caddr_t)cd->buffer; + if ((num_entry_words << 2) < len1) { + /* the actual number of words in the entry is less + * than elen and can still fit in the current page + */ + memmove(p, tmp, num_entry_words << 2); + p += num_entry_words; + + /* update offset */ + cd->offset = cd->buffer + (cd->offset - tmp); + } else { + unsigned int offset_r = (cd->offset - tmp) << 2; + + /* update pointer to offset location. + * This is a 64bit quantity, so we need to + * deal with 3 cases: + * - entirely in first page + * - entirely in second page + * - 4 bytes in each page + */ + if (offset_r + 8 <= len1) { + cd->offset = p + (cd->offset - tmp); + } else if (offset_r >= len1) { + cd->offset -= len1 >> 2; + } else { + /* sitting on the fence */ + BUG_ON(offset_r != len1 - 4); + cd->offset = p + (cd->offset - tmp); + cd->offset1 = tmp; + } + + len2 = (num_entry_words << 2) - len1; + + /* move from temp page to current and next pages */ + memmove(p, tmp, len1); + memmove(tmp, (caddr_t)tmp+len1, len2); + + p = tmp + (len2 >> 2); + } + } + else { + cd->common.err = nfserr_toosmall; + return -EINVAL; + } + + cd->buflen -= num_entry_words; + cd->buffer = p; + cd->common.err = nfs_ok; return 0; -out_toosmall: - resp->cookie_offset = 0; - resp->common.err = nfserr_toosmall; - resp->dirlist.len = starting_length; - return -EINVAL; } -static bool -svcxdr_encode_fsstat3resok(struct xdr_stream *xdr, - const struct nfsd3_fsstatres *resp) +int +nfs3svc_encode_entry(void *cd, const char *name, + int namlen, loff_t offset, u64 ino, unsigned int d_type) { - const struct kstatfs *s = &resp->stats; - u64 bs = s->f_bsize; - __be32 *p; + return encode_entry(cd, name, namlen, offset, ino, d_type, 0); +} - p = xdr_reserve_space(xdr, XDR_UNIT * 13); - if (!p) - return false; - p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ - p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ - p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ - p = xdr_encode_hyper(p, s->f_files); /* total inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ - p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ - *p = cpu_to_be32(resp->invarsec); /* mean unchanged time */ - - return true; +int +nfs3svc_encode_entry_plus(void *cd, const char *name, + int namlen, loff_t offset, u64 ino, + unsigned int d_type) +{ + return encode_entry(cd, name, namlen, offset, ino, d_type, 1); } /* FSSTAT */ -bool -nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_fsstatres *resp = rqstp->rq_resp; + struct kstatfs *s = &resp->stats; + u64 bs = s->f_bsize; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; - if (!svcxdr_encode_fsstat3resok(xdr, resp)) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; + *p++ = resp->status; + *p++ = xdr_zero; /* no post_op_attr */ + + if (resp->status == 0) { + p = xdr_encode_hyper(p, bs * s->f_blocks); /* total bytes */ + p = xdr_encode_hyper(p, bs * s->f_bfree); /* free bytes */ + p = xdr_encode_hyper(p, bs * s->f_bavail); /* user available bytes */ + p = xdr_encode_hyper(p, s->f_files); /* total inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* free inodes */ + p = xdr_encode_hyper(p, s->f_ffree); /* user available inodes */ + *p++ = htonl(resp->invarsec); /* mean unchanged time */ } - - return true; -} - -static bool -svcxdr_encode_fsinfo3resok(struct xdr_stream *xdr, - const struct nfsd3_fsinfores *resp) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT * 12); - if (!p) - return false; - *p++ = cpu_to_be32(resp->f_rtmax); - *p++ = cpu_to_be32(resp->f_rtpref); - *p++ = cpu_to_be32(resp->f_rtmult); - *p++ = cpu_to_be32(resp->f_wtmax); - *p++ = cpu_to_be32(resp->f_wtpref); - *p++ = cpu_to_be32(resp->f_wtmult); - *p++ = cpu_to_be32(resp->f_dtpref); - p = xdr_encode_hyper(p, resp->f_maxfilesize); - p = encode_nfstime3(p, &nfs3svc_time_delta); - *p = cpu_to_be32(resp->f_properties); - - return true; + return xdr_ressize_check(rqstp, p); } /* FSINFO */ -bool -nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_fsinfores *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; - if (!svcxdr_encode_fsinfo3resok(xdr, resp)) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; + *p++ = resp->status; + *p++ = xdr_zero; /* no post_op_attr */ + + if (resp->status == 0) { + *p++ = htonl(resp->f_rtmax); + *p++ = htonl(resp->f_rtpref); + *p++ = htonl(resp->f_rtmult); + *p++ = htonl(resp->f_wtmax); + *p++ = htonl(resp->f_wtpref); + *p++ = htonl(resp->f_wtmult); + *p++ = htonl(resp->f_dtpref); + p = xdr_encode_hyper(p, resp->f_maxfilesize); + *p++ = xdr_one; + *p++ = xdr_zero; + *p++ = htonl(resp->f_properties); } - return true; -} - -static bool -svcxdr_encode_pathconf3resok(struct xdr_stream *xdr, - const struct nfsd3_pathconfres *resp) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT * 6); - if (!p) - return false; - *p++ = cpu_to_be32(resp->p_link_max); - *p++ = cpu_to_be32(resp->p_name_max); - p = xdr_encode_bool(p, resp->p_no_trunc); - p = xdr_encode_bool(p, resp->p_chown_restricted); - p = xdr_encode_bool(p, resp->p_case_insensitive); - xdr_encode_bool(p, resp->p_case_preserving); - - return true; + return xdr_ressize_check(rqstp, p); } /* PATHCONF */ -bool -nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_pathconfres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; - if (!svcxdr_encode_pathconf3resok(xdr, resp)) - return false; - break; - default: - if (!svcxdr_encode_post_op_attr(rqstp, xdr, &nfs3svc_null_fh)) - return false; + *p++ = resp->status; + *p++ = xdr_zero; /* no post_op_attr */ + + if (resp->status == 0) { + *p++ = htonl(resp->p_link_max); + *p++ = htonl(resp->p_name_max); + *p++ = htonl(resp->p_no_trunc); + *p++ = htonl(resp->p_chown_restricted); + *p++ = htonl(resp->p_case_insensitive); + *p++ = htonl(resp->p_case_preserving); } - return true; + return xdr_ressize_check(rqstp, p); } /* COMMIT */ -bool -nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs3svc_encode_commitres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd3_commitres *resp = rqstp->rq_resp; - if (!svcxdr_encode_nfsstat3(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return false; - if (!svcxdr_encode_writeverf3(xdr, resp->verf)) - return false; - break; - default: - if (!svcxdr_encode_wcc_data(rqstp, xdr, &resp->fh)) - return false; + *p++ = resp->status; + p = encode_wcc_data(rqstp, p, &resp->fh); + /* Write verifier */ + if (resp->status == 0) { + *p++ = resp->verf[0]; + *p++ = resp->verf[1]; } - - return true; + return xdr_ressize_check(rqstp, p); } /* diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c index bb8e2f6d7d03..71292a0d6f09 100644 --- a/fs/nfsd/nfs4acl.c +++ b/fs/nfsd/nfs4acl.c @@ -751,26 +751,57 @@ static int nfs4_acl_nfsv4_to_posix(struct nfs4_acl *acl, return ret; } -__be32 nfsd4_acl_to_attr(enum nfs_ftype4 type, struct nfs4_acl *acl, - struct nfsd_attrs *attr) +__be32 +nfsd4_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct nfs4_acl *acl) { + __be32 error; int host_error; + struct dentry *dentry; + struct inode *inode; + struct posix_acl *pacl = NULL, *dpacl = NULL; unsigned int flags = 0; - if (!acl) - return nfs_ok; + /* Get inode */ + error = fh_verify(rqstp, fhp, 0, NFSD_MAY_SATTR); + if (error) + return error; - if (type == NF4DIR) + dentry = fhp->fh_dentry; + inode = d_inode(dentry); + + if (S_ISDIR(inode->i_mode)) flags = NFS4_ACL_DIR; - host_error = nfs4_acl_nfsv4_to_posix(acl, &attr->na_pacl, - &attr->na_dpacl, flags); + host_error = nfs4_acl_nfsv4_to_posix(acl, &pacl, &dpacl, flags); if (host_error == -EINVAL) return nfserr_attrnotsupp; + if (host_error < 0) + goto out_nfserr; + + fh_lock(fhp); + + host_error = set_posix_acl(inode, ACL_TYPE_ACCESS, pacl); + if (host_error < 0) + goto out_drop_lock; + + if (S_ISDIR(inode->i_mode)) { + host_error = set_posix_acl(inode, ACL_TYPE_DEFAULT, dpacl); + } + +out_drop_lock: + fh_unlock(fhp); + + posix_acl_release(pacl); + posix_acl_release(dpacl); +out_nfserr: + if (host_error == -EOPNOTSUPP) + return nfserr_attrnotsupp; else return nfserrno(host_error); } + static short ace2type(struct nfs4_ace *ace) { diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index 4eae2c5af2ed..f5b7ad0847f2 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -76,17 +76,6 @@ static __be32 *xdr_encode_empty_array(__be32 *p) * 1 Protocol" */ -static void encode_uint32(struct xdr_stream *xdr, u32 n) -{ - WARN_ON_ONCE(xdr_stream_encode_u32(xdr, n) < 0); -} - -static void encode_bitmap4(struct xdr_stream *xdr, const __u32 *bitmap, - size_t len) -{ - WARN_ON_ONCE(xdr_stream_encode_uint32_array(xdr, bitmap, len) < 0); -} - /* * nfs_cb_opnum4 * @@ -132,7 +121,7 @@ static void encode_nfs_fh4(struct xdr_stream *xdr, const struct knfsd_fh *fh) BUG_ON(length > NFS4_FHSIZE); p = xdr_reserve_space(xdr, 4 + length); - xdr_encode_opaque(p, &fh->fh_raw, length); + xdr_encode_opaque(p, &fh->fh_base, length); } /* @@ -339,24 +328,6 @@ static void encode_cb_recall4args(struct xdr_stream *xdr, hdr->nops++; } -/* - * CB_RECALLANY4args - * - * struct CB_RECALLANY4args { - * uint32_t craa_objects_to_keep; - * bitmap4 craa_type_mask; - * }; - */ -static void -encode_cb_recallany4args(struct xdr_stream *xdr, - struct nfs4_cb_compound_hdr *hdr, struct nfsd4_cb_recall_any *ra) -{ - encode_nfs_cb_opnum4(xdr, OP_CB_RECALL_ANY); - encode_uint32(xdr, ra->ra_keep); - encode_bitmap4(xdr, ra->ra_bmval, ARRAY_SIZE(ra->ra_bmval)); - hdr->nops++; -} - /* * CB_SEQUENCE4args * @@ -511,26 +482,6 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr, encode_cb_nops(&hdr); } -/* - * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects - */ -static void -nfs4_xdr_enc_cb_recall_any(struct rpc_rqst *req, - struct xdr_stream *xdr, const void *data) -{ - const struct nfsd4_callback *cb = data; - struct nfsd4_cb_recall_any *ra; - struct nfs4_cb_compound_hdr hdr = { - .ident = cb->cb_clp->cl_cb_ident, - .minorversion = cb->cb_clp->cl_minorversion, - }; - - ra = container_of(cb, struct nfsd4_cb_recall_any, ra_cb); - encode_cb_compound4args(xdr, &hdr); - encode_cb_sequence4args(xdr, cb, &hdr); - encode_cb_recallany4args(xdr, &hdr, ra); - encode_cb_nops(&hdr); -} /* * NFSv4.0 and NFSv4.1 XDR decode functions @@ -569,28 +520,6 @@ static int nfs4_xdr_dec_cb_recall(struct rpc_rqst *rqstp, return decode_cb_op_status(xdr, OP_CB_RECALL, &cb->cb_status); } -/* - * 20.6. Operation 8: CB_RECALL_ANY - Keep Any N Recallable Objects - */ -static int -nfs4_xdr_dec_cb_recall_any(struct rpc_rqst *rqstp, - struct xdr_stream *xdr, - void *data) -{ - struct nfsd4_callback *cb = data; - struct nfs4_cb_compound_hdr hdr; - int status; - - status = decode_cb_compound4res(xdr, &hdr); - if (unlikely(status)) - return status; - status = decode_cb_sequence4res(xdr, cb); - if (unlikely(status || cb->cb_seq_status)) - return status; - status = decode_cb_op_status(xdr, OP_CB_RECALL_ANY, &cb->cb_status); - return status; -} - #ifdef CONFIG_NFSD_PNFS /* * CB_LAYOUTRECALL4args @@ -750,7 +679,7 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * case NFS4_OK: * write_response4 coa_resok4; * default: - * length4 coa_bytes_copied; + * length4 coa_bytes_copied; * }; * struct CB_OFFLOAD4args { * nfs_fh4 coa_fh; @@ -759,22 +688,21 @@ static int nfs4_xdr_dec_cb_notify_lock(struct rpc_rqst *rqstp, * }; */ static void encode_offload_info4(struct xdr_stream *xdr, - const struct nfsd4_cb_offload *cbo) + __be32 nfserr, + const struct nfsd4_copy *cp) { __be32 *p; p = xdr_reserve_space(xdr, 4); - *p = cbo->co_nfserr; - switch (cbo->co_nfserr) { - case nfs_ok: + *p++ = nfserr; + if (!nfserr) { p = xdr_reserve_space(xdr, 4 + 8 + 4 + NFS4_VERIFIER_SIZE); p = xdr_encode_empty_array(p); - p = xdr_encode_hyper(p, cbo->co_res.wr_bytes_written); - *p++ = cpu_to_be32(cbo->co_res.wr_stable_how); - p = xdr_encode_opaque_fixed(p, cbo->co_res.wr_verifier.data, + p = xdr_encode_hyper(p, cp->cp_res.wr_bytes_written); + *p++ = cpu_to_be32(cp->cp_res.wr_stable_how); + p = xdr_encode_opaque_fixed(p, cp->cp_res.wr_verifier.data, NFS4_VERIFIER_SIZE); - break; - default: + } else { p = xdr_reserve_space(xdr, 8); /* We always return success if bytes were written */ p = xdr_encode_hyper(p, 0); @@ -782,16 +710,18 @@ static void encode_offload_info4(struct xdr_stream *xdr, } static void encode_cb_offload4args(struct xdr_stream *xdr, - const struct nfsd4_cb_offload *cbo, + __be32 nfserr, + const struct knfsd_fh *fh, + const struct nfsd4_copy *cp, struct nfs4_cb_compound_hdr *hdr) { __be32 *p; p = xdr_reserve_space(xdr, 4); - *p = cpu_to_be32(OP_CB_OFFLOAD); - encode_nfs_fh4(xdr, &cbo->co_fh); - encode_stateid4(xdr, &cbo->co_res.cb_stateid); - encode_offload_info4(xdr, cbo); + *p++ = cpu_to_be32(OP_CB_OFFLOAD); + encode_nfs_fh4(xdr, fh); + encode_stateid4(xdr, &cp->cp_res.cb_stateid); + encode_offload_info4(xdr, nfserr, cp); hdr->nops++; } @@ -801,8 +731,8 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, const void *data) { const struct nfsd4_callback *cb = data; - const struct nfsd4_cb_offload *cbo = - container_of(cb, struct nfsd4_cb_offload, co_cb); + const struct nfsd4_copy *cp = + container_of(cb, struct nfsd4_copy, cp_cb); struct nfs4_cb_compound_hdr hdr = { .ident = 0, .minorversion = cb->cb_clp->cl_minorversion, @@ -810,7 +740,7 @@ static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, encode_cb_compound4args(xdr, &hdr); encode_cb_sequence4args(xdr, cb, &hdr); - encode_cb_offload4args(xdr, cbo, &hdr); + encode_cb_offload4args(xdr, cp->nfserr, &cp->fh, cp, &hdr); encode_cb_nops(&hdr); } @@ -854,7 +784,6 @@ static const struct rpc_procinfo nfs4_cb_procedures[] = { #endif PROC(CB_NOTIFY_LOCK, COMPOUND, cb_notify_lock, cb_notify_lock), PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload), - PROC(CB_RECALL_ANY, COMPOUND, cb_recall_any, cb_recall_any), }; static unsigned int nfs4_cb_counts[ARRAY_SIZE(nfs4_cb_procedures)]; @@ -1012,43 +941,37 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c clp->cl_cb_conn.cb_xprt = conn->cb_xprt; clp->cl_cb_client = client; clp->cl_cb_cred = cred; - rcu_read_lock(); - trace_nfsd_cb_setup(clp, rpc_peeraddr2str(client, RPC_DISPLAY_NETID), - args.authflavor); - rcu_read_unlock(); + trace_nfsd_cb_setup(clp); return 0; } -static void nfsd4_mark_cb_state(struct nfs4_client *clp, int newstate) -{ - if (clp->cl_cb_state != newstate) { - clp->cl_cb_state = newstate; - trace_nfsd_cb_state(clp); - } -} - static void nfsd4_mark_cb_down(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - nfsd4_mark_cb_state(clp, NFSD4_CB_DOWN); + clp->cl_cb_state = NFSD4_CB_DOWN; + trace_nfsd_cb_state(clp); } static void nfsd4_mark_cb_fault(struct nfs4_client *clp, int reason) { if (test_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags)) return; - nfsd4_mark_cb_state(clp, NFSD4_CB_FAULT); + clp->cl_cb_state = NFSD4_CB_FAULT; + trace_nfsd_cb_state(clp); } static void nfsd4_cb_probe_done(struct rpc_task *task, void *calldata) { struct nfs4_client *clp = container_of(calldata, struct nfs4_client, cl_cb_null); + trace_nfsd_cb_done(clp, task->tk_status); if (task->tk_status) nfsd4_mark_cb_down(clp, task->tk_status); - else - nfsd4_mark_cb_state(clp, NFSD4_CB_UP); + else { + clp->cl_cb_state = NFSD4_CB_UP; + trace_nfsd_cb_state(clp); + } } static void nfsd4_cb_probe_release(void *calldata) @@ -1072,8 +995,8 @@ static const struct rpc_call_ops nfsd4_cb_probe_ops = { */ void nfsd4_probe_callback(struct nfs4_client *clp) { - trace_nfsd_cb_probe(clp); - nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); + clp->cl_cb_state = NFSD4_CB_UNKNOWN; + trace_nfsd_cb_state(clp); set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags); nfsd4_run_cb(&clp->cl_cb_null); } @@ -1086,10 +1009,11 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp) void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn) { - nfsd4_mark_cb_state(clp, NFSD4_CB_UNKNOWN); + clp->cl_cb_state = NFSD4_CB_UNKNOWN; spin_lock(&clp->cl_lock); memcpy(&clp->cl_cb_conn, conn, sizeof(struct nfs4_cb_conn)); spin_unlock(&clp->cl_lock); + trace_nfsd_cb_state(clp); } /* @@ -1246,6 +1170,8 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata) struct nfsd4_callback *cb = calldata; struct nfs4_client *clp = cb->cb_clp; + trace_nfsd_cb_done(clp, task->tk_status); + if (!nfsd4_cb_sequence_done(task, cb)) return; @@ -1305,9 +1231,6 @@ void nfsd4_destroy_callback_queue(void) /* must be called under the state lock */ void nfsd4_shutdown_callback(struct nfs4_client *clp) { - if (clp->cl_cb_state != NFSD4_CB_UNKNOWN) - trace_nfsd_cb_shutdown(clp); - set_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags); /* * Note this won't actually result in a null callback; @@ -1353,6 +1276,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb) * kill the old client: */ if (clp->cl_cb_client) { + trace_nfsd_cb_shutdown(clp); rpc_shutdown_client(clp->cl_cb_client); clp->cl_cb_client = NULL; put_cred(clp->cl_cb_cred); @@ -1398,6 +1322,8 @@ nfsd4_run_cb_work(struct work_struct *work) struct rpc_clnt *clnt; int flags; + trace_nfsd_cb_work(clp, cb->cb_msg.rpc_proc->p_name); + if (cb->cb_need_restart) { cb->cb_need_restart = false; } else { @@ -1419,7 +1345,7 @@ nfsd4_run_cb_work(struct work_struct *work) * Don't send probe messages for 4.1 or later. */ if (!cb->cb_ops && clp->cl_minorversion) { - nfsd4_mark_cb_state(clp, NFSD4_CB_UP); + clp->cl_cb_state = NFSD4_CB_UP; nfsd41_destroy_cb(cb); return; } @@ -1445,21 +1371,11 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, cb->cb_holds_slot = false; } -/** - * nfsd4_run_cb - queue up a callback job to run - * @cb: callback to queue - * - * Kick off a callback to do its thing. Returns false if it was already - * on a queue, true otherwise. - */ -bool nfsd4_run_cb(struct nfsd4_callback *cb) +void nfsd4_run_cb(struct nfsd4_callback *cb) { struct nfs4_client *clp = cb->cb_clp; - bool queued; nfsd41_cb_inflight_begin(clp); - queued = nfsd4_queue_cb(cb); - if (!queued) + if (!nfsd4_queue_cb(cb)) nfsd41_cb_inflight_end(clp); - return queued; } diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c index 5e9809aff37e..f92161ce1f97 100644 --- a/fs/nfsd/nfs4idmap.c +++ b/fs/nfsd/nfs4idmap.c @@ -41,7 +41,6 @@ #include "idmap.h" #include "nfsd.h" #include "netns.h" -#include "vfs.h" /* * Turn off idmapping when using AUTH_SYS. @@ -83,8 +82,8 @@ ent_init(struct cache_head *cnew, struct cache_head *citm) new->id = itm->id; new->type = itm->type; - strscpy(new->name, itm->name, sizeof(new->name)); - strscpy(new->authname, itm->authname, sizeof(new->authname)); + strlcpy(new->name, itm->name, sizeof(new->name)); + strlcpy(new->authname, itm->authname, sizeof(new->authname)); } static void @@ -549,7 +548,7 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen return nfserr_badowner; memcpy(key.name, name, namelen); key.name[namelen] = '\0'; - strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, nametoid_lookup, &key, nn->nametoid_cache, &item); if (ret == -ENOENT) return nfserr_badowner; @@ -585,7 +584,7 @@ static __be32 idmap_id_to_name(struct xdr_stream *xdr, int ret; struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); - strscpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); + strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname)); ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item); if (ret == -ENOENT) return encode_ascii_id(xdr, id); diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c index e4e23b2a3e65..2673019d30ec 100644 --- a/fs/nfsd/nfs4layouts.c +++ b/fs/nfsd/nfs4layouts.c @@ -421,7 +421,7 @@ nfsd4_insert_layout(struct nfsd4_layoutget *lgp, struct nfs4_layout_stateid *ls) new = kmem_cache_alloc(nfs4_layout_cache, GFP_KERNEL); if (!new) return nfserr_jukebox; - memcpy(&new->lo_seg, seg, sizeof(new->lo_seg)); + memcpy(&new->lo_seg, seg, sizeof(lp->lo_seg)); new->lo_state = ls; spin_lock(&fp->fi_lock); @@ -657,7 +657,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task) ktime_t now, cutoff; const struct nfsd4_layout_ops *ops; - trace_nfsd_cb_layout_done(&ls->ls_stid.sc_stateid, task); + switch (task->tk_status) { case 0: case -NFS4ERR_DELAY: diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 2c0de247083a..e84996c3867c 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -37,9 +37,6 @@ #include #include #include -#include -#include - #include #include @@ -53,16 +50,34 @@ #include "pnfs.h" #include "trace.h" -static bool inter_copy_offload_enable; -module_param(inter_copy_offload_enable, bool, 0644); -MODULE_PARM_DESC(inter_copy_offload_enable, - "Enable inter server to server copy offload. Default: false"); +#ifdef CONFIG_NFSD_V4_SECURITY_LABEL +#include -#ifdef CONFIG_NFSD_V4_2_INTER_SSC -static int nfsd4_ssc_umount_timeout = 900000; /* default to 15 mins */ -module_param(nfsd4_ssc_umount_timeout, int, 0644); -MODULE_PARM_DESC(nfsd4_ssc_umount_timeout, - "idle msecs before unmount export from source server"); +static inline void +nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) +{ + struct inode *inode = d_inode(resfh->fh_dentry); + int status; + + inode_lock(inode); + status = security_inode_setsecctx(resfh->fh_dentry, + label->data, label->len); + inode_unlock(inode); + + if (status) + /* + * XXX: We should really fail the whole open, but we may + * already have created a new file, so it may be too + * late. For now this seems the least of evils: + */ + bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; + + return; +} +#else +static inline void +nfsd4_security_inode_setsecctx(struct svc_fh *resfh, struct xdr_netobj *label, u32 *bmval) +{ } #endif #define NFSDDBG_FACILITY NFSDDBG_PROC @@ -129,6 +144,26 @@ is_create_with_attrs(struct nfsd4_open *open) || open->op_createmode == NFS4_CREATE_EXCLUSIVE4_1); } +/* + * if error occurs when setting the acl, just clear the acl bit + * in the returned attr bitmap. + */ +static void +do_set_nfs4_acl(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct nfs4_acl *acl, u32 *bmval) +{ + __be32 status; + + status = nfsd4_set_nfs4_acl(rqstp, fhp, acl); + if (status) + /* + * We should probably fail the whole open at this point, + * but we've already created the file, so it's too late; + * So this seems the least of evils: + */ + bmval[0] &= ~FATTR4_WORD0_ACL; +} + static inline void fh_dup2(struct svc_fh *dst, struct svc_fh *src) { @@ -142,6 +177,7 @@ fh_dup2(struct svc_fh *dst, struct svc_fh *src) static __be32 do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open, int accmode) { + __be32 status; if (open->op_truncate && !(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) @@ -156,7 +192,9 @@ do_open_permission(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfs if (open->op_share_deny & NFS4_SHARE_DENY_READ) accmode |= NFSD_MAY_WRITE; - return fh_verify(rqstp, current_fh, S_IFREG, accmode); + status = fh_verify(rqstp, current_fh, S_IFREG, accmode); + + return status; } static __be32 nfsd_check_obj_isreg(struct svc_fh *fh) @@ -185,202 +223,6 @@ static void nfsd4_set_open_owner_reply_cache(struct nfsd4_compound_state *cstate &resfh->fh_handle); } -static inline bool nfsd4_create_is_exclusive(int createmode) -{ - return createmode == NFS4_CREATE_EXCLUSIVE || - createmode == NFS4_CREATE_EXCLUSIVE4_1; -} - -static __be32 -nfsd4_vfs_create(struct svc_fh *fhp, struct dentry *child, - struct nfsd4_open *open) -{ - struct file *filp; - struct path path; - int oflags; - - oflags = O_CREAT | O_LARGEFILE; - switch (open->op_share_access & NFS4_SHARE_ACCESS_BOTH) { - case NFS4_SHARE_ACCESS_WRITE: - oflags |= O_WRONLY; - break; - case NFS4_SHARE_ACCESS_BOTH: - oflags |= O_RDWR; - break; - default: - oflags |= O_RDONLY; - } - - path.mnt = fhp->fh_export->ex_path.mnt; - path.dentry = child; - filp = dentry_create(&path, oflags, open->op_iattr.ia_mode, - current_cred()); - if (IS_ERR(filp)) - return nfserrno(PTR_ERR(filp)); - - open->op_filp = filp; - return nfs_ok; -} - -/* - * Implement NFSv4's unchecked, guarded, and exclusive create - * semantics for regular files. Open state for this new file is - * subsequently fabricated in nfsd4_process_open2(). - * - * Upon return, caller must release @fhp and @resfhp. - */ -static __be32 -nfsd4_create_file(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct nfsd4_open *open) -{ - struct iattr *iap = &open->op_iattr; - struct nfsd_attrs attrs = { - .na_iattr = iap, - .na_seclabel = &open->op_label, - }; - struct dentry *parent, *child; - __u32 v_mtime, v_atime; - struct inode *inode; - __be32 status; - int host_err; - - if (isdotent(open->op_fname, open->op_fnamelen)) - return nfserr_exist; - if (!(iap->ia_valid & ATTR_MODE)) - iap->ia_mode = 0; - - status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); - if (status != nfs_ok) - return status; - parent = fhp->fh_dentry; - inode = d_inode(parent); - - host_err = fh_want_write(fhp); - if (host_err) - return nfserrno(host_err); - - if (is_create_with_attrs(open)) - nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs); - - inode_lock_nested(inode, I_MUTEX_PARENT); - - child = lookup_one_len(open->op_fname, parent, open->op_fnamelen); - if (IS_ERR(child)) { - status = nfserrno(PTR_ERR(child)); - goto out; - } - - if (d_really_is_negative(child)) { - status = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); - if (status != nfs_ok) - goto out; - } - - status = fh_compose(resfhp, fhp->fh_export, child, fhp); - if (status != nfs_ok) - goto out; - - v_mtime = 0; - v_atime = 0; - if (nfsd4_create_is_exclusive(open->op_createmode)) { - u32 *verifier = (u32 *)open->op_verf.data; - - /* - * Solaris 7 gets confused (bugid 4218508) if these have - * the high bit set, as do xfs filesystems without the - * "bigtime" feature. So just clear the high bits. If this - * is ever changed to use different attrs for storing the - * verifier, then do_open_lookup() will also need to be - * fixed accordingly. - */ - v_mtime = verifier[0] & 0x7fffffff; - v_atime = verifier[1] & 0x7fffffff; - } - - if (d_really_is_positive(child)) { - status = nfs_ok; - - /* NFSv4 protocol requires change attributes even though - * no change happened. - */ - fh_fill_both_attrs(fhp); - - switch (open->op_createmode) { - case NFS4_CREATE_UNCHECKED: - if (!d_is_reg(child)) - break; - - /* - * In NFSv4, we don't want to truncate the file - * now. This would be wrong if the OPEN fails for - * some other reason. Furthermore, if the size is - * nonzero, we should ignore it according to spec! - */ - open->op_truncate = (iap->ia_valid & ATTR_SIZE) && - !iap->ia_size; - break; - case NFS4_CREATE_GUARDED: - status = nfserr_exist; - break; - case NFS4_CREATE_EXCLUSIVE: - if (d_inode(child)->i_mtime.tv_sec == v_mtime && - d_inode(child)->i_atime.tv_sec == v_atime && - d_inode(child)->i_size == 0) { - open->op_created = true; - break; /* subtle */ - } - status = nfserr_exist; - break; - case NFS4_CREATE_EXCLUSIVE4_1: - if (d_inode(child)->i_mtime.tv_sec == v_mtime && - d_inode(child)->i_atime.tv_sec == v_atime && - d_inode(child)->i_size == 0) { - open->op_created = true; - goto set_attr; /* subtle */ - } - status = nfserr_exist; - } - goto out; - } - - if (!IS_POSIXACL(inode)) - iap->ia_mode &= ~current_umask(); - - fh_fill_pre_attrs(fhp); - status = nfsd4_vfs_create(fhp, child, open); - if (status != nfs_ok) - goto out; - open->op_created = true; - fh_fill_post_attrs(fhp); - - /* A newly created file already has a file size of zero. */ - if ((iap->ia_valid & ATTR_SIZE) && (iap->ia_size == 0)) - iap->ia_valid &= ~ATTR_SIZE; - if (nfsd4_create_is_exclusive(open->op_createmode)) { - iap->ia_valid = ATTR_MTIME | ATTR_ATIME | - ATTR_MTIME_SET|ATTR_ATIME_SET; - iap->ia_mtime.tv_sec = v_mtime; - iap->ia_atime.tv_sec = v_atime; - iap->ia_mtime.tv_nsec = 0; - iap->ia_atime.tv_nsec = 0; - } - -set_attr: - status = nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); - - if (attrs.na_labelerr) - open->op_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - if (attrs.na_aclerr) - open->op_bmval[0] &= ~FATTR4_WORD0_ACL; -out: - inode_unlock(inode); - nfsd_attrs_free(&attrs); - if (child && !IS_ERR(child)) - dput(child); - fh_drop_write(fhp); - return status; -} - static __be32 do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open, struct svc_fh **resfh) { @@ -410,33 +252,47 @@ do_open_lookup(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stru * yes | yes | GUARDED4 | GUARDED4 */ + /* + * Note: create modes (UNCHECKED,GUARDED...) are the same + * in NFSv4 as in v3 except EXCLUSIVE4_1. + */ current->fs->umask = open->op_umask; - status = nfsd4_create_file(rqstp, current_fh, *resfh, open); + status = do_nfsd_create(rqstp, current_fh, open->op_fname.data, + open->op_fname.len, &open->op_iattr, + *resfh, open->op_createmode, + (u32 *)open->op_verf.data, + &open->op_truncate, &open->op_created); current->fs->umask = 0; + if (!status && open->op_label.len) + nfsd4_security_inode_setsecctx(*resfh, &open->op_label, open->op_bmval); + /* * Following rfc 3530 14.2.16, and rfc 5661 18.16.4 * use the returned bitmask to indicate which attributes * we used to store the verifier: */ - if (nfsd4_create_is_exclusive(open->op_createmode) && status == 0) + if (nfsd_create_is_exclusive(open->op_createmode) && status == 0) open->op_bmval[1] |= (FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_MODIFY); - } else { + } else + /* + * Note this may exit with the parent still locked. + * We will hold the lock until nfsd4_open's final + * lookup, to prevent renames or unlinks until we've had + * a chance to an acquire a delegation if appropriate. + */ status = nfsd_lookup(rqstp, current_fh, - open->op_fname, open->op_fnamelen, *resfh); - if (!status) - /* NFSv4 protocol requires change attributes even though - * no change happened. - */ - fh_fill_both_attrs(current_fh); - } + open->op_fname.data, open->op_fname.len, *resfh); if (status) goto out; status = nfsd_check_obj_isreg(*resfh); if (status) goto out; + if (is_create_with_attrs(open) && open->op_acl != NULL) + do_set_nfs4_acl(rqstp, *resfh, open->op_acl, open->op_bmval); + nfsd4_set_open_owner_reply_cache(cstate, open, *resfh); accmode = NFSD_MAY_NOP; if (open->op_created || @@ -452,6 +308,7 @@ static __be32 do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd4_open *open) { struct svc_fh *current_fh = &cstate->current_fh; + __be32 status; int accmode = 0; /* We don't know the target directory, and therefore can not @@ -476,7 +333,9 @@ do_open_fhandle(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, str if (open->op_claim_type == NFS4_OPEN_CLAIM_DELEG_CUR_FH) accmode = NFSD_MAY_OWNER_OVERRIDE; - return do_open_permission(rqstp, current_fh, open, accmode); + status = do_open_permission(rqstp, current_fh, open, accmode); + + return status; } static void @@ -501,12 +360,9 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, bool reclaim = false; dprintk("NFSD: nfsd4_open filename %.*s op_openowner %p\n", - (int)open->op_fnamelen, open->op_fname, + (int)open->op_fname.len, open->op_fname.data, open->op_openowner); - open->op_filp = NULL; - open->op_rqstp = rqstp; - /* This check required by spec. */ if (open->op_create && open->op_claim_type != NFS4_OPEN_CLAIM_NULL) return nfserr_inval; @@ -517,7 +373,8 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, * Before RECLAIM_COMPLETE done, server should deny new lock */ if (nfsd4_has_session(cstate) && - !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags) && + !test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, + &cstate->session->se_client->cl_flags) && open->op_claim_type != NFS4_OPEN_CLAIM_PREVIOUS) return nfserr_grace; @@ -559,46 +416,51 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; switch (open->op_claim_type) { - case NFS4_OPEN_CLAIM_DELEGATE_CUR: - case NFS4_OPEN_CLAIM_NULL: - status = do_open_lookup(rqstp, cstate, open, &resfh); - if (status) + case NFS4_OPEN_CLAIM_DELEGATE_CUR: + case NFS4_OPEN_CLAIM_NULL: + status = do_open_lookup(rqstp, cstate, open, &resfh); + if (status) + goto out; + break; + case NFS4_OPEN_CLAIM_PREVIOUS: + status = nfs4_check_open_reclaim(&open->op_clientid, + cstate, nn); + if (status) + goto out; + open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; + reclaim = true; + fallthrough; + case NFS4_OPEN_CLAIM_FH: + case NFS4_OPEN_CLAIM_DELEG_CUR_FH: + status = do_open_fhandle(rqstp, cstate, open); + if (status) + goto out; + resfh = &cstate->current_fh; + break; + case NFS4_OPEN_CLAIM_DELEG_PREV_FH: + case NFS4_OPEN_CLAIM_DELEGATE_PREV: + dprintk("NFSD: unsupported OPEN claim type %d\n", + open->op_claim_type); + status = nfserr_notsupp; goto out; - break; - case NFS4_OPEN_CLAIM_PREVIOUS: - status = nfs4_check_open_reclaim(cstate->clp); - if (status) + default: + dprintk("NFSD: Invalid OPEN claim type %d\n", + open->op_claim_type); + status = nfserr_inval; goto out; - open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED; - reclaim = true; - fallthrough; - case NFS4_OPEN_CLAIM_FH: - case NFS4_OPEN_CLAIM_DELEG_CUR_FH: - status = do_open_fhandle(rqstp, cstate, open); - if (status) - goto out; - resfh = &cstate->current_fh; - break; - case NFS4_OPEN_CLAIM_DELEG_PREV_FH: - case NFS4_OPEN_CLAIM_DELEGATE_PREV: - status = nfserr_notsupp; - goto out; - default: - status = nfserr_inval; - goto out; } - + /* + * nfsd4_process_open2() does the actual opening of the file. If + * successful, it (1) truncates the file if open->op_truncate was + * set, (2) sets open->op_stateid, (3) sets open->op_delegation. + */ status = nfsd4_process_open2(rqstp, resfh, open); - if (status && open->op_created) - pr_warn("nfsd4_process_open2 failed to open newly-created file: status=%u\n", - be32_to_cpu(status)); + WARN(status && open->op_created, + "nfsd4_process_open2 failed to open newly-created file! status=%u\n", + be32_to_cpu(status)); if (reclaim && !status) nn->somebody_reclaimed = true; out: - if (open->op_filp) { - fput(open->op_filp); - open->op_filp = NULL; - } if (resfh && resfh != &cstate->current_fh) { fh_dup2(&cstate->current_fh, resfh); fh_put(resfh); @@ -647,7 +509,7 @@ nfsd4_putfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, fh_put(&cstate->current_fh); cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen; - memcpy(&cstate->current_fh.fh_handle.fh_raw, putfh->pf_fhval, + memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval, putfh->pf_fhlen); ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS); #ifdef CONFIG_NFSD_V4_2_INTER_SSC @@ -663,9 +525,11 @@ static __be32 nfsd4_putrootfh(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { - fh_put(&cstate->current_fh); + __be32 status; - return exp_pseudoroot(rqstp, &cstate->current_fh); + fh_put(&cstate->current_fh); + status = exp_pseudoroot(rqstp, &cstate->current_fh); + return status; } static __be32 @@ -724,7 +588,7 @@ static void gen_boot_verifier(nfs4_verifier *verifier, struct net *net) BUILD_BUG_ON(2*sizeof(*verf) != sizeof(verifier->data)); - nfsd_copy_write_verifier(verf, net_generic(net, nfsd_net_id)); + nfsd_copy_boot_verifier(verf, net_generic(net, nfsd_net_id)); } static __be32 @@ -732,19 +596,10 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_commit *commit = &u->commit; - struct nfsd_file *nf; - __be32 status; - status = nfsd_file_acquire(rqstp, &cstate->current_fh, NFSD_MAY_WRITE | - NFSD_MAY_NOT_BREAK_LEASE, &nf); - if (status != nfs_ok) - return status; - - status = nfsd_commit(rqstp, &cstate->current_fh, nf, commit->co_offset, + return nfsd_commit(rqstp, &cstate->current_fh, commit->co_offset, commit->co_count, (__be32 *)commit->co_verf.data); - nfsd_file_put(nf); - return status; } static __be32 @@ -752,10 +607,6 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_create *create = &u->create; - struct nfsd_attrs attrs = { - .na_iattr = &create->cr_iattr, - .na_seclabel = &create->cr_label, - }; struct svc_fh resfh; __be32 status; dev_t rdev; @@ -771,13 +622,12 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) return status; - status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl, &attrs); current->fs->umask = create->cr_umask; switch (create->cr_type) { case NF4LNK: status = nfsd_symlink(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - create->cr_data, &attrs, &resfh); + create->cr_data, &resfh); break; case NF4BLK: @@ -788,7 +638,7 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &attrs, S_IFBLK, rdev, &resfh); + &create->cr_iattr, S_IFBLK, rdev, &resfh); break; case NF4CHR: @@ -799,26 +649,26 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_umask; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &attrs, S_IFCHR, rdev, &resfh); + &create->cr_iattr,S_IFCHR, rdev, &resfh); break; case NF4SOCK: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &attrs, S_IFSOCK, 0, &resfh); + &create->cr_iattr, S_IFSOCK, 0, &resfh); break; case NF4FIFO: status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &attrs, S_IFIFO, 0, &resfh); + &create->cr_iattr, S_IFIFO, 0, &resfh); break; case NF4DIR: create->cr_iattr.ia_valid &= ~ATTR_SIZE; status = nfsd_create(rqstp, &cstate->current_fh, create->cr_name, create->cr_namelen, - &attrs, S_IFDIR, 0, &resfh); + &create->cr_iattr, S_IFDIR, 0, &resfh); break; default: @@ -828,17 +678,20 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - if (attrs.na_labelerr) - create->cr_bmval[2] &= ~FATTR4_WORD2_SECURITY_LABEL; - if (attrs.na_aclerr) - create->cr_bmval[0] &= ~FATTR4_WORD0_ACL; + if (create->cr_label.len) + nfsd4_security_inode_setsecctx(&resfh, &create->cr_label, create->cr_bmval); + + if (create->cr_acl != NULL) + do_set_nfs4_acl(rqstp, &resfh, create->cr_acl, + create->cr_bmval); + + fh_unlock(&cstate->current_fh); set_change_info(&create->cr_cinfo, &cstate->current_fh); fh_dup2(&cstate->current_fh, &resfh); out: fh_put(&resfh); out_umask: current->fs->umask = 0; - nfsd_attrs_free(&attrs); return status; } @@ -919,16 +772,12 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status; read->rd_nf = NULL; + if (read->rd_offset >= OFFSET_MAX) + return nfserr_inval; trace_nfsd_read_start(rqstp, &cstate->current_fh, read->rd_offset, read->rd_length); - read->rd_length = min_t(u32, read->rd_length, svc_max_payload(rqstp)); - if (read->rd_offset > (u64)OFFSET_MAX) - read->rd_offset = (u64)OFFSET_MAX; - if (read->rd_offset + read->rd_length > (u64)OFFSET_MAX) - read->rd_length = (u64)OFFSET_MAX - read->rd_offset; - /* * If we do a zero copy read, then a client will see read data * that reflects the state of the file *after* performing the @@ -944,7 +793,12 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &read->rd_stateid, RD_STATE, &read->rd_nf, NULL); - + if (status) { + dprintk("NFSD: nfsd4_read: couldn't process stateid!\n"); + goto out; + } + status = nfs_ok; +out: read->rd_rqstp = rqstp; read->rd_fhp = &cstate->current_fh; return status; @@ -1006,8 +860,10 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_grace; status = nfsd_unlink(rqstp, &cstate->current_fh, 0, remove->rm_name, remove->rm_namelen); - if (!status) + if (!status) { + fh_unlock(&cstate->current_fh); set_change_info(&remove->rm_cinfo, &cstate->current_fh); + } return status; } @@ -1047,6 +903,7 @@ nfsd4_secinfo(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, &exp, &dentry); if (err) return err; + fh_unlock(&cstate->current_fh); if (d_really_is_negative(dentry)) { exp_put(exp); err = nfserr_noent; @@ -1101,21 +958,17 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_setattr *setattr = &u->setattr; - struct nfsd_attrs attrs = { - .na_iattr = &setattr->sa_iattr, - .na_seclabel = &setattr->sa_label, - }; - struct inode *inode; __be32 status = nfs_ok; - bool save_no_wcc; int err; if (setattr->sa_iattr.ia_valid & ATTR_SIZE) { status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL, NULL); - if (status) + if (status) { + dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n"); return status; + } } err = fh_want_write(&cstate->current_fh); if (err) @@ -1127,23 +980,19 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - inode = cstate->current_fh.fh_dentry->d_inode; - status = nfsd4_acl_to_attr(S_ISDIR(inode->i_mode) ? NF4DIR : NF4REG, - setattr->sa_acl, &attrs); - + if (setattr->sa_acl != NULL) + status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh, + setattr->sa_acl); if (status) goto out; - save_no_wcc = cstate->current_fh.fh_no_wcc; - cstate->current_fh.fh_no_wcc = true; - status = nfsd_setattr(rqstp, &cstate->current_fh, &attrs, + if (setattr->sa_label.len) + status = nfsd4_set_nfs4_label(rqstp, &cstate->current_fh, + &setattr->sa_label); + if (status) + goto out; + status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr, 0, (time64_t)0); - cstate->current_fh.fh_no_wcc = save_no_wcc; - if (!status) - status = nfserrno(attrs.na_labelerr); - if (!status) - status = nfserrno(attrs.na_aclerr); out: - nfsd_attrs_free(&attrs); fh_drop_write(&cstate->current_fh); return status; } @@ -1168,12 +1017,15 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, write->wr_offset, cnt); status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, stateid, WR_STATE, &nf, NULL); - if (status) + if (status) { + dprintk("NFSD: nfsd4_write: couldn't process stateid!\n"); return status; + } write->wr_how_written = write->wr_stable_how; - nvecs = svc_fill_write_vector(rqstp, &write->wr_payload); + nvecs = svc_fill_write_vector(rqstp, write->wr_pagelist, + &write->wr_head, write->wr_buflen); WARN_ON_ONCE(nvecs > ARRAY_SIZE(rqstp->rq_vec)); status = nfsd_vfs_write(rqstp, &cstate->current_fh, nf, @@ -1200,13 +1052,17 @@ nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh, src_stateid, RD_STATE, src, NULL); - if (status) + if (status) { + dprintk("NFSD: %s: couldn't process src stateid!\n", __func__); goto out; + } status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, dst_stateid, WR_STATE, dst, NULL); - if (status) + if (status) { + dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__); goto out_put_src; + } /* fix up for NFS-specific error code */ if (!S_ISREG(file_inode((*src)->nf_file)->i_mode) || @@ -1239,7 +1095,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto out; - status = nfsd4_clone_file_range(rqstp, src, clone->cl_src_pos, + status = nfsd4_clone_file_range(src, clone->cl_src_pos, dst, clone->cl_dst_pos, clone->cl_count, EX_ISSYNC(cstate->current_fh.fh_export)); @@ -1249,17 +1105,30 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; } -static void nfs4_put_copy(struct nfsd4_copy *copy) +void nfs4_put_copy(struct nfsd4_copy *copy) { if (!refcount_dec_and_test(©->refcount)) return; - kfree(copy->cp_src); kfree(copy); } +static bool +check_and_set_stop_copy(struct nfsd4_copy *copy) +{ + bool value; + + spin_lock(©->cp_clp->async_lock); + value = copy->stopped; + if (!copy->stopped) + copy->stopped = true; + spin_unlock(©->cp_clp->async_lock); + return value; +} + static void nfsd4_stop_copy(struct nfsd4_copy *copy) { - if (!test_and_set_bit(NFSD4_COPY_F_STOPPED, ©->cp_flags)) + /* only 1 thread should stop the copy */ + if (!check_and_set_stop_copy(copy)) kthread_stop(copy->copy_task); nfs4_put_copy(copy); } @@ -1296,88 +1165,12 @@ extern void nfs_sb_deactive(struct super_block *sb); #define NFSD42_INTERSSC_MOUNTOPS "vers=4.2,addr=%s,sec=sys" -/* - * setup a work entry in the ssc delayed unmount list. - */ -static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr, - struct nfsd4_ssc_umount_item **nsui) -{ - struct nfsd4_ssc_umount_item *ni = NULL; - struct nfsd4_ssc_umount_item *work = NULL; - struct nfsd4_ssc_umount_item *tmp; - DEFINE_WAIT(wait); - __be32 status = 0; - - *nsui = NULL; - work = kzalloc(sizeof(*work), GFP_KERNEL); -try_again: - spin_lock(&nn->nfsd_ssc_lock); - list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { - if (strncmp(ni->nsui_ipaddr, ipaddr, sizeof(ni->nsui_ipaddr))) - continue; - /* found a match */ - if (ni->nsui_busy) { - /* wait - and try again */ - prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, TASK_IDLE); - spin_unlock(&nn->nfsd_ssc_lock); - - /* allow 20secs for mount/unmount for now - revisit */ - if (kthread_should_stop() || - (freezable_schedule_timeout(20*HZ) == 0)) { - finish_wait(&nn->nfsd_ssc_waitq, &wait); - kfree(work); - return nfserr_eagain; - } - finish_wait(&nn->nfsd_ssc_waitq, &wait); - goto try_again; - } - *nsui = ni; - refcount_inc(&ni->nsui_refcnt); - spin_unlock(&nn->nfsd_ssc_lock); - kfree(work); - - /* return vfsmount in (*nsui)->nsui_vfsmount */ - return 0; - } - if (work) { - strscpy(work->nsui_ipaddr, ipaddr, sizeof(work->nsui_ipaddr) - 1); - refcount_set(&work->nsui_refcnt, 2); - work->nsui_busy = true; - list_add_tail(&work->nsui_list, &nn->nfsd_ssc_mount_list); - *nsui = work; - } else - status = nfserr_resource; - spin_unlock(&nn->nfsd_ssc_lock); - return status; -} - -static void nfsd4_ssc_update_dul(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *nsui, - struct vfsmount *ss_mnt) -{ - spin_lock(&nn->nfsd_ssc_lock); - nsui->nsui_vfsmount = ss_mnt; - nsui->nsui_busy = false; - wake_up_all(&nn->nfsd_ssc_waitq); - spin_unlock(&nn->nfsd_ssc_lock); -} - -static void nfsd4_ssc_cancel_dul(struct nfsd_net *nn, - struct nfsd4_ssc_umount_item *nsui) -{ - spin_lock(&nn->nfsd_ssc_lock); - list_del(&nsui->nsui_list); - wake_up_all(&nn->nfsd_ssc_waitq); - spin_unlock(&nn->nfsd_ssc_lock); - kfree(nsui); -} - /* * Support one copy source server for now. */ static __be32 nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, - struct nfsd4_ssc_umount_item **nsui) + struct vfsmount **mount) { struct file_system_type *type; struct vfsmount *ss_mnt; @@ -1388,14 +1181,12 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, char *ipaddr, *dev_name, *raw_data; int len, raw_len; __be32 status = nfserr_inval; - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); naddr = &nss->u.nl4_addr; tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr, naddr->addr_len, (struct sockaddr *)&tmp_addr, sizeof(tmp_addr)); - *nsui = NULL; if (tmp_addrlen == 0) goto out_err; @@ -1438,23 +1229,14 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, goto out_free_rawdata; snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep); - status = nfsd4_ssc_setup_dul(nn, ipaddr, nsui); - if (status) - goto out_free_devname; - if ((*nsui)->nsui_vfsmount) - goto out_done; - /* Use an 'internal' mount: SB_KERNMOUNT -> MNT_INTERNAL */ ss_mnt = vfs_kern_mount(type, SB_KERNMOUNT, dev_name, raw_data); module_put(type->owner); - if (IS_ERR(ss_mnt)) { - status = nfserr_nodev; - nfsd4_ssc_cancel_dul(nn, *nsui); + if (IS_ERR(ss_mnt)) goto out_free_devname; - } - nfsd4_ssc_update_dul(nn, *nsui, ss_mnt); -out_done: + status = 0; + *mount = ss_mnt; out_free_devname: kfree(dev_name); @@ -1478,7 +1260,7 @@ nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy) + struct nfsd4_copy *copy, struct vfsmount **mount) { struct svc_fh *s_fh = NULL; stateid_t *s_stid = ©->cp_src_stateid; @@ -1491,14 +1273,14 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, if (status) goto out; - status = nfsd4_interssc_connect(copy->cp_src, rqstp, ©->ss_nsui); + status = nfsd4_interssc_connect(©->cp_src, rqstp, mount); if (status) goto out; s_fh = &cstate->save_fh; copy->c_fh.size = s_fh->fh_handle.fh_size; - memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_raw, copy->c_fh.size); + memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size); copy->stateid.seqid = cpu_to_be32(s_stid->si_generation); memcpy(copy->stateid.other, (void *)&s_stid->si_opaque, sizeof(stateid_opaque_t)); @@ -1509,26 +1291,13 @@ nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, } static void -nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd_file *dst) { - struct nfsd_net *nn = net_generic(dst->nf_net, nfsd_net_id); - long timeout = msecs_to_jiffies(nfsd4_ssc_umount_timeout); - - nfs42_ssc_close(filp); - fput(filp); - - spin_lock(&nn->nfsd_ssc_lock); - list_del(&nsui->nsui_list); - /* - * vfsmount can be shared by multiple exports, - * decrement refcnt. If the count drops to 1 it - * will be unmounted when nsui_expire expires. - */ - refcount_dec(&nsui->nsui_refcnt); - nsui->nsui_expire = jiffies + timeout; - list_add_tail(&nsui->nsui_list, &nn->nfsd_ssc_mount_list); - spin_unlock(&nn->nfsd_ssc_lock); + nfs42_ssc_close(src->nf_file); + fput(src->nf_file); + nfsd_file_put(dst); + mntput(ss_mnt); } #else /* CONFIG_NFSD_V4_2_INTER_SSC */ @@ -1536,13 +1305,15 @@ nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, static __be32 nfsd4_setup_inter_ssc(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, - struct nfsd4_copy *copy) + struct nfsd4_copy *copy, + struct vfsmount **mount) { + *mount = NULL; return nfserr_inval; } static void -nfsd4_cleanup_inter_ssc(struct nfsd4_ssc_umount_item *nsui, struct file *filp, +nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct nfsd_file *src, struct nfsd_file *dst) { } @@ -1565,21 +1336,23 @@ nfsd4_setup_intra_ssc(struct svc_rqst *rqstp, ©->nf_dst); } +static void +nfsd4_cleanup_intra_ssc(struct nfsd_file *src, struct nfsd_file *dst) +{ + nfsd_file_put(src); + nfsd_file_put(dst); +} + static void nfsd4_cb_offload_release(struct nfsd4_callback *cb) { - struct nfsd4_cb_offload *cbo = - container_of(cb, struct nfsd4_cb_offload, co_cb); + struct nfsd4_copy *copy = container_of(cb, struct nfsd4_copy, cp_cb); - kfree(cbo); + nfs4_put_copy(copy); } static int nfsd4_cb_offload_done(struct nfsd4_callback *cb, struct rpc_task *task) { - struct nfsd4_cb_offload *cbo = - container_of(cb, struct nfsd4_cb_offload, co_cb); - - trace_nfsd_cb_offload_done(&cbo->co_res.cb_stateid, task); return 1; } @@ -1590,28 +1363,20 @@ static const struct nfsd4_callback_ops nfsd4_cb_offload_ops = { static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync) { - copy->cp_res.wr_stable_how = - test_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags) ? - NFS_FILE_SYNC : NFS_UNSTABLE; - nfsd4_copy_set_sync(copy, sync); + copy->cp_res.wr_stable_how = NFS_UNSTABLE; + copy->cp_synchronous = sync; gen_boot_verifier(©->cp_res.wr_verifier, copy->cp_clp->net); } -static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, - struct file *dst, - struct file *src) +static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy) { - errseq_t since; + struct file *dst = copy->nf_dst->nf_file; + struct file *src = copy->nf_src->nf_file; ssize_t bytes_copied = 0; - u64 bytes_total = copy->cp_count; + size_t bytes_total = copy->cp_count; u64 src_pos = copy->cp_src_pos; u64 dst_pos = copy->cp_dst_pos; - int status; - loff_t end; - /* See RFC 7862 p.67: */ - if (bytes_total == 0) - bytes_total = ULLONG_MAX; do { if (kthread_should_stop()) break; @@ -1623,29 +1388,16 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy, copy->cp_res.wr_bytes_written += bytes_copied; src_pos += bytes_copied; dst_pos += bytes_copied; - } while (bytes_total > 0 && nfsd4_copy_is_async(copy)); - /* for a non-zero asynchronous copy do a commit of data */ - if (nfsd4_copy_is_async(copy) && copy->cp_res.wr_bytes_written > 0) { - since = READ_ONCE(dst->f_wb_err); - end = copy->cp_dst_pos + copy->cp_res.wr_bytes_written - 1; - status = vfs_fsync_range(dst, copy->cp_dst_pos, end, 0); - if (!status) - status = filemap_check_wb_err(dst->f_mapping, since); - if (!status) - set_bit(NFSD4_COPY_F_COMMITTED, ©->cp_flags); - } + } while (bytes_total > 0 && !copy->cp_synchronous); return bytes_copied; } -static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, - struct file *src, struct file *dst, - bool sync) +static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync) { __be32 status; ssize_t bytes; - bytes = _nfsd_copy_file_range(copy, dst, src); - + bytes = _nfsd_copy_file_range(copy); /* for async copy, we ignore the error, client can always retry * to get the error */ @@ -1655,6 +1407,13 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, nfsd4_init_copy_res(copy, sync); status = nfs_ok; } + + if (!copy->cp_intra) /* Inter server SSC */ + nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->nf_src, + copy->nf_dst); + else + nfsd4_cleanup_intra_ssc(copy->nf_src, copy->nf_dst); + return status; } @@ -1663,100 +1422,71 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst) dst->cp_src_pos = src->cp_src_pos; dst->cp_dst_pos = src->cp_dst_pos; dst->cp_count = src->cp_count; - dst->cp_flags = src->cp_flags; + dst->cp_synchronous = src->cp_synchronous; memcpy(&dst->cp_res, &src->cp_res, sizeof(src->cp_res)); memcpy(&dst->fh, &src->fh, sizeof(src->fh)); dst->cp_clp = src->cp_clp; dst->nf_dst = nfsd_file_get(src->nf_dst); - /* for inter, nf_src doesn't exist yet */ - if (!nfsd4_ssc_is_inter(src)) + dst->cp_intra = src->cp_intra; + if (src->cp_intra) /* for inter, file_src doesn't exist yet */ dst->nf_src = nfsd_file_get(src->nf_src); memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid)); - memcpy(dst->cp_src, src->cp_src, sizeof(struct nl4_server)); + memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server)); memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid)); memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh)); - dst->ss_nsui = src->ss_nsui; -} - -static void release_copy_files(struct nfsd4_copy *copy) -{ - if (copy->nf_src) - nfsd_file_put(copy->nf_src); - if (copy->nf_dst) - nfsd_file_put(copy->nf_dst); + dst->ss_mnt = src->ss_mnt; } static void cleanup_async_copy(struct nfsd4_copy *copy) { nfs4_free_copy_state(copy); - release_copy_files(copy); - if (copy->cp_clp) { - spin_lock(©->cp_clp->async_lock); - if (!list_empty(©->copies)) - list_del_init(©->copies); - spin_unlock(©->cp_clp->async_lock); - } + nfsd_file_put(copy->nf_dst); + if (copy->cp_intra) + nfsd_file_put(copy->nf_src); + spin_lock(©->cp_clp->async_lock); + list_del(©->copies); + spin_unlock(©->cp_clp->async_lock); nfs4_put_copy(copy); } -static void nfsd4_send_cb_offload(struct nfsd4_copy *copy, __be32 nfserr) -{ - struct nfsd4_cb_offload *cbo; - - cbo = kzalloc(sizeof(*cbo), GFP_KERNEL); - if (!cbo) - return; - - memcpy(&cbo->co_res, ©->cp_res, sizeof(copy->cp_res)); - memcpy(&cbo->co_fh, ©->fh, sizeof(copy->fh)); - cbo->co_nfserr = nfserr; - - nfsd4_init_cb(&cbo->co_cb, copy->cp_clp, &nfsd4_cb_offload_ops, - NFSPROC4_CLNT_CB_OFFLOAD); - trace_nfsd_cb_offload(copy->cp_clp, &cbo->co_res.cb_stateid, - &cbo->co_fh, copy->cp_count, nfserr); - nfsd4_run_cb(&cbo->co_cb); -} - -/** - * nfsd4_do_async_copy - kthread function for background server-side COPY - * @data: arguments for COPY operation - * - * Return values: - * %0: Copy operation is done. - */ static int nfsd4_do_async_copy(void *data) { struct nfsd4_copy *copy = (struct nfsd4_copy *)data; - __be32 nfserr; + struct nfsd4_copy *cb_copy; - if (nfsd4_ssc_is_inter(copy)) { - struct file *filp; - - filp = nfs42_ssc_open(copy->ss_nsui->nsui_vfsmount, - ©->c_fh, ©->stateid); - if (IS_ERR(filp)) { - switch (PTR_ERR(filp)) { - case -EBADF: - nfserr = nfserr_wrong_type; - break; - default: - nfserr = nfserr_offload_denied; - } + if (!copy->cp_intra) { /* Inter server SSC */ + copy->nf_src = kzalloc(sizeof(struct nfsd_file), GFP_KERNEL); + if (!copy->nf_src) { + copy->nfserr = nfserr_serverfault; + /* ss_mnt will be unmounted by the laundromat */ + goto do_callback; + } + copy->nf_src->nf_file = nfs42_ssc_open(copy->ss_mnt, ©->c_fh, + ©->stateid); + if (IS_ERR(copy->nf_src->nf_file)) { + copy->nfserr = nfserr_offload_denied; /* ss_mnt will be unmounted by the laundromat */ goto do_callback; } - nfserr = nfsd4_do_copy(copy, filp, copy->nf_dst->nf_file, - false); - nfsd4_cleanup_inter_ssc(copy->ss_nsui, filp, copy->nf_dst); - } else { - nfserr = nfsd4_do_copy(copy, copy->nf_src->nf_file, - copy->nf_dst->nf_file, false); } + copy->nfserr = nfsd4_do_copy(copy, 0); do_callback: - nfsd4_send_cb_offload(copy, nfserr); + cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); + if (!cb_copy) + goto out; + refcount_set(&cb_copy->refcount, 1); + memcpy(&cb_copy->cp_res, ©->cp_res, sizeof(copy->cp_res)); + cb_copy->cp_clp = copy->cp_clp; + cb_copy->nfserr = copy->nfserr; + memcpy(&cb_copy->fh, ©->fh, sizeof(copy->fh)); + nfsd4_init_cb(&cb_copy->cp_cb, cb_copy->cp_clp, + &nfsd4_cb_offload_ops, NFSPROC4_CLNT_CB_OFFLOAD); + nfsd4_run_cb(&cb_copy->cp_cb); +out: + if (!copy->cp_intra) + kfree(copy->nf_src); cleanup_async_copy(copy); return 0; } @@ -1769,12 +1499,13 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, __be32 status; struct nfsd4_copy *async_copy = NULL; - if (nfsd4_ssc_is_inter(copy)) { - if (!inter_copy_offload_enable || nfsd4_copy_is_sync(copy)) { + if (!copy->cp_intra) { /* Inter server SSC */ + if (!inter_copy_offload_enable || copy->cp_synchronous) { status = nfserr_notsupp; goto out; } - status = nfsd4_setup_inter_ssc(rqstp, cstate, copy); + status = nfsd4_setup_inter_ssc(rqstp, cstate, copy, + ©->ss_mnt); if (status) return nfserr_offload_denied; } else { @@ -1786,21 +1517,17 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, copy->cp_clp = cstate->clp; memcpy(©->fh, &cstate->current_fh.fh_handle, sizeof(struct knfsd_fh)); - if (nfsd4_copy_is_async(copy)) { + if (!copy->cp_synchronous) { struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); status = nfserrno(-ENOMEM); async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL); if (!async_copy) goto out_err; - INIT_LIST_HEAD(&async_copy->copies); - refcount_set(&async_copy->refcount, 1); - async_copy->cp_src = kmalloc(sizeof(*async_copy->cp_src), GFP_KERNEL); - if (!async_copy->cp_src) - goto out_err; if (!nfs4_init_copy_state(nn, copy)) goto out_err; - memcpy(©->cp_res.cb_stateid, ©->cp_stateid.cs_stid, + refcount_set(&async_copy->refcount, 1); + memcpy(©->cp_res.cb_stateid, ©->cp_stateid.stid, sizeof(copy->cp_res.cb_stateid)); dup_copy_fields(copy, async_copy); async_copy->copy_task = kthread_create(nfsd4_do_async_copy, @@ -1814,24 +1541,18 @@ nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, wake_up_process(async_copy->copy_task); status = nfs_ok; } else { - status = nfsd4_do_copy(copy, copy->nf_src->nf_file, - copy->nf_dst->nf_file, true); + status = nfsd4_do_copy(copy, 1); } out: - release_copy_files(copy); return status; out_err: - if (nfsd4_ssc_is_inter(copy)) { - /* - * Source's vfsmount of inter-copy will be unmounted - * by the laundromat. Use copy instead of async_copy - * since async_copy->ss_nsui might not be set yet. - */ - refcount_dec(©->ss_nsui->nsui_refcnt); - } if (async_copy) cleanup_async_copy(async_copy); status = nfserrno(-ENOMEM); + /* + * source's vfsmount of inter-copy will be unmounted + * by the laundromat + */ goto out; } @@ -1842,7 +1563,7 @@ find_async_copy(struct nfs4_client *clp, stateid_t *stateid) spin_lock(&clp->async_lock); list_for_each_entry(copy, &clp->async_copies, copies) { - if (memcmp(©->cp_stateid.cs_stid, stateid, NFS4_STATEID_SIZE)) + if (memcmp(©->cp_stateid.stid, stateid, NFS4_STATEID_SIZE)) continue; refcount_inc(©->refcount); spin_unlock(&clp->async_lock); @@ -1896,16 +1617,16 @@ nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, cps = nfs4_alloc_init_cpntf_state(nn, stid); if (!cps) goto out; - memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.cs_stid, sizeof(stateid_t)); + memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid.stid, sizeof(stateid_t)); memcpy(&cps->cp_p_stateid, &stid->sc_stateid, sizeof(stateid_t)); memcpy(&cps->cp_p_clid, &clp->cl_clientid, sizeof(clientid_t)); /* For now, only return one server address in cpn_src, the * address used by the client to connect to this server. */ - cn->cpn_src->nl4_type = NL4_NETADDR; + cn->cpn_src.nl4_type = NL4_NETADDR; status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr, - &cn->cpn_src->u.nl4_addr); + &cn->cpn_src.u.nl4_addr); WARN_ON_ONCE(status); if (status) { nfs4_put_cpntf_state(nn, cps); @@ -1926,8 +1647,10 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &fallocate->falloc_stateid, WR_STATE, &nf, NULL); - if (status != nfs_ok) + if (status != nfs_ok) { + dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n"); return status; + } status = nfsd4_vfs_fallocate(rqstp, &cstate->current_fh, nf->nf_file, fallocate->falloc_offset, @@ -1983,8 +1706,10 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh, &seek->seek_stateid, RD_STATE, &nf, NULL); - if (status) + if (status) { + dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n"); return status; + } switch (seek->seek_whence) { case NFS4_CONTENT_DATA: @@ -2152,7 +1877,7 @@ nfsd4_getdeviceinfo(struct svc_rqst *rqstp, nfserr = nfs_ok; if (gdp->gd_maxcount != 0) { nfserr = ops->proc_getdeviceinfo(exp->ex_path.mnt->mnt_sb, - rqstp, cstate->clp, gdp); + rqstp, cstate->session->se_client, gdp); } gdp->gd_notify_types &= ops->notify_types; @@ -2438,7 +2163,7 @@ nfsd4_proc_null(struct svc_rqst *rqstp) static inline void nfsd4_increment_op_stats(u32 opnum) { if (opnum >= FIRST_NFS4_OP && opnum <= LAST_NFS4_OP) - percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_NFS4_OP(opnum)]); + nfsdstats.nfs4_opcount[opnum]++; } static const struct nfsd4_operation nfsd4_ops[]; @@ -2528,6 +2253,25 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp) return !(nextd->op_flags & OP_HANDLES_WRONGSEC); } +static void svcxdr_init_encode(struct svc_rqst *rqstp, + struct nfsd4_compoundres *resp) +{ + struct xdr_stream *xdr = &resp->xdr; + struct xdr_buf *buf = &rqstp->rq_res; + struct kvec *head = buf->head; + + xdr->buf = buf; + xdr->iov = head; + xdr->p = head->iov_base + head->iov_len; + xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; + /* Tail and page_len should be zero at this point: */ + buf->len = buf->head[0].iov_len; + xdr->scratch.iov_len = 0; + xdr->page_ptr = buf->pages - 1; + buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages) + - rqstp->rq_auth_slack; +} + #ifdef CONFIG_NFSD_V4_2_INTER_SSC static void check_if_stalefh_allowed(struct nfsd4_compoundargs *args) @@ -2555,7 +2299,7 @@ check_if_stalefh_allowed(struct nfsd4_compoundargs *args) return; } putfh = (struct nfsd4_putfh *)&saved_op->u; - if (nfsd4_ssc_is_inter(copy)) + if (!copy->cp_intra) putfh->no_verify = true; } } @@ -2582,14 +2326,10 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); __be32 status; - resp->xdr = &rqstp->rq_res_stream; - resp->statusp = resp->xdr->p; - - /* reserve space for: NFS status code */ - xdr_reserve_space(resp->xdr, XDR_UNIT); - + svcxdr_init_encode(rqstp, resp); + resp->tagp = resp->xdr.p; /* reserve space for: taglen, tag, and opcnt */ - xdr_reserve_space(resp->xdr, XDR_UNIT * 2 + args->taglen); + xdr_reserve_space(&resp->xdr, 8 + args->taglen); resp->taglen = args->taglen; resp->tag = args->tag; resp->rqstp = rqstp; @@ -2608,6 +2348,9 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) status = nfserr_minor_vers_mismatch; if (nfsd_minorversion(nn, args->minorversion, NFSD_TEST) <= 0) goto out; + status = nfserr_resource; + if (args->opcnt > NFSD_MAX_OPS_PER_COMPOUND) + goto out; status = nfs41_check_op_ordering(args); if (status) { @@ -2620,20 +2363,10 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) rqstp->rq_lease_breaker = (void **)&cstate->clp; - trace_nfsd_compound(rqstp, args->client_opcnt); + trace_nfsd_compound(rqstp, args->opcnt); while (!status && resp->opcnt < args->opcnt) { op = &args->ops[resp->opcnt++]; - if (unlikely(resp->opcnt == NFSD_MAX_OPS_PER_COMPOUND)) { - /* If there are still more operations to process, - * stop here and report NFS4ERR_RESOURCE. */ - if (cstate->minorversion == 0 && - args->client_opcnt > resp->opcnt) { - op->status = nfserr_resource; - goto encode_op; - } - } - /* * The XDR decode routines may have pre-set op->status; * for example, if there is a miscellaneous XDR error @@ -2657,13 +2390,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) goto encode_op; } - fh_clear_pre_post_attrs(current_fh); + fh_clear_wcc(current_fh); /* If op is non-idempotent */ if (op->opdesc->op_flags & OP_MODIFIES_SOMETHING) { /* * Don't execute this op if we couldn't encode a - * successful reply: + * succesful reply: */ u32 plen = op->opdesc->op_rsize_bop(rqstp, op); /* @@ -2702,15 +2435,15 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) encode_op: if (op->status == nfserr_replay_me) { op->replay = &cstate->replay_owner->so_replay; - nfsd4_encode_replay(resp->xdr, op); + nfsd4_encode_replay(&resp->xdr, op); status = op->status = op->replay->rp_status; } else { nfsd4_encode_operation(resp, op); status = op->status; } - trace_nfsd_compound_status(args->client_opcnt, resp->opcnt, - status, nfsd4_op_name(op->opnum)); + trace_nfsd_compound_status(args->opcnt, resp->opcnt, status, + nfsd4_op_name(op->opnum)); nfsd4_cstate_clear_replay(cstate); nfsd4_increment_op_stats(op->opnum); @@ -2744,49 +2477,28 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) #define op_encode_channel_attrs_maxsz (6 + 1 + 1) -/* - * The _rsize() helpers are invoked by the NFSv4 COMPOUND decoder, which - * is called before sunrpc sets rq_res.buflen. Thus we have to compute - * the maximum payload size here, based on transport limits and the size - * of the remaining space in the rq_pages array. - */ -static u32 nfsd4_max_payload(const struct svc_rqst *rqstp) -{ - u32 buflen; - - buflen = (rqstp->rq_page_end - rqstp->rq_next_page) * PAGE_SIZE; - buflen -= rqstp->rq_auth_slack; - buflen -= rqstp->rq_res.head[0].iov_len; - return min_t(u32, buflen, svc_max_payload(rqstp)); -} - -static u32 nfsd4_only_status_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_only_status_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size) * sizeof(__be32); } -static u32 nfsd4_status_stateid_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_status_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz)* sizeof(__be32); } -static u32 nfsd4_access_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_access_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { /* ac_supported, ac_resp_access */ return (op_encode_hdr_size + 2)* sizeof(__be32); } -static u32 nfsd4_commit_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_commit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_verifier_maxsz) * sizeof(__be32); } -static u32 nfsd4_create_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); @@ -2797,17 +2509,17 @@ static u32 nfsd4_create_rsize(const struct svc_rqst *rqstp, * the op prematurely if the estimate is too large. We may turn off splice * reads unnecessarily. */ -static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { - const u32 *bmap = op->u.getattr.ga_bmval; + u32 *bmap = op->u.getattr.ga_bmval; u32 bmap0 = bmap[0], bmap1 = bmap[1], bmap2 = bmap[2]; u32 ret = 0; if (bmap0 & FATTR4_WORD0_ACL) - return nfsd4_max_payload(rqstp); + return svc_max_payload(rqstp); if (bmap0 & FATTR4_WORD0_FS_LOCATIONS) - return nfsd4_max_payload(rqstp); + return svc_max_payload(rqstp); if (bmap1 & FATTR4_WORD1_OWNER) { ret += IDMAP_NAMESZ + 4; @@ -2835,28 +2547,24 @@ static u32 nfsd4_getattr_rsize(const struct svc_rqst *rqstp, return ret; } -static u32 nfsd4_getfh_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_getfh_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + NFS4_FHSIZE; } -static u32 nfsd4_link_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_link_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static u32 nfsd4_lock_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_lock_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_lock_denied_maxsz) * sizeof(__be32); } -static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_open_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_stateid_maxsz + op_encode_change_info_maxsz + 1 @@ -2864,18 +2572,20 @@ static u32 nfsd4_open_rsize(const struct svc_rqst *rqstp, + op_encode_delegation_maxsz) * sizeof(__be32); } -static u32 nfsd4_read_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { - u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp)); + u32 maxcount = 0, rlen = 0; + + maxcount = svc_max_payload(rqstp); + rlen = min(op->u.read.rd_length, maxcount); return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { - u32 rlen = min(op->u.read.rd_length, nfsd4_max_payload(rqstp)); + u32 maxcount = svc_max_payload(rqstp); + u32 rlen = min(op->u.read.rd_length, maxcount); /* * If we detect that the file changed during hole encoding, then we * recover by encoding the remaining reply as data. This means we need @@ -2886,77 +2596,70 @@ static u32 nfsd4_read_plus_rsize(const struct svc_rqst *rqstp, return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static u32 nfsd4_readdir_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { - u32 rlen = min(op->u.readdir.rd_maxcount, nfsd4_max_payload(rqstp)); + u32 maxcount = 0, rlen = 0; + + maxcount = svc_max_payload(rqstp); + rlen = min(op->u.readdir.rd_maxcount, maxcount); return (op_encode_hdr_size + op_encode_verifier_maxsz + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static u32 nfsd4_readlink_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_readlink_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1) * sizeof(__be32) + PAGE_SIZE; } -static u32 nfsd4_remove_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_remove_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static u32 nfsd4_rename_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_rename_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz + op_encode_change_info_maxsz) * sizeof(__be32); } -static u32 nfsd4_sequence_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_sequence_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { return (op_encode_hdr_size + XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + 5) * sizeof(__be32); } -static u32 nfsd4_test_stateid_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_test_stateid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1 + op->u.test_stateid.ts_num_ids) * sizeof(__be32); } -static u32 nfsd4_setattr_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_setattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + nfs4_fattr_bitmap_maxsz) * sizeof(__be32); } -static u32 nfsd4_secinfo_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_secinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + RPC_AUTH_MAXFLAVOR * (4 + XDR_QUADLEN(GSS_OID_MAX_LEN))) * sizeof(__be32); } -static u32 nfsd4_setclientid_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_setclientid_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + XDR_QUADLEN(NFS4_VERIFIER_SIZE)) * sizeof(__be32); } -static u32 nfsd4_write_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_write_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + op_encode_verifier_maxsz) * sizeof(__be32); } -static u32 nfsd4_exchange_id_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 2 + 1 + /* eir_clientid, eir_sequenceid */\ 1 + 1 + /* eir_flags, spr_how */\ @@ -2970,16 +2673,14 @@ static u32 nfsd4_exchange_id_rsize(const struct svc_rqst *rqstp, 0 /* ignored eir_server_impl_id contents */) * sizeof(__be32); } -static u32 nfsd4_bind_conn_to_session_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_bind_conn_to_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* bctsr_sessid */\ 2 /* bctsr_dir, use_conn_in_rdma_mode */) * sizeof(__be32); } -static u32 nfsd4_create_session_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_create_session_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + \ XDR_QUADLEN(NFS4_MAX_SESSIONID_LEN) + /* sessionid */\ @@ -2988,8 +2689,7 @@ static u32 nfsd4_create_session_rsize(const struct svc_rqst *rqstp, op_encode_channel_attrs_maxsz) * sizeof(__be32); } -static u32 nfsd4_copy_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_copy_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* wr_callback */ + @@ -3001,16 +2701,16 @@ static u32 nfsd4_copy_rsize(const struct svc_rqst *rqstp, 1 /* cr_synchronous */) * sizeof(__be32); } -static u32 nfsd4_offload_status_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { return (op_encode_hdr_size + 2 /* osr_count */ + 1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32); } -static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { return (op_encode_hdr_size + 3 /* cnr_lease_time */ + @@ -3025,10 +2725,12 @@ static u32 nfsd4_copy_notify_rsize(const struct svc_rqst *rqstp, } #ifdef CONFIG_NFSD_PNFS -static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { - u32 rlen = min(op->u.getdeviceinfo.gd_maxcount, nfsd4_max_payload(rqstp)); + u32 maxcount = 0, rlen = 0; + + maxcount = svc_max_payload(rqstp); + rlen = min(op->u.getdeviceinfo.gd_maxcount, maxcount); return (op_encode_hdr_size + 1 /* gd_layout_type*/ + @@ -3041,8 +2743,7 @@ static u32 nfsd4_getdeviceinfo_rsize(const struct svc_rqst *rqstp, * so we need to define an arbitrary upper bound here. */ #define MAX_LAYOUT_SIZE 128 -static u32 nfsd4_layoutget_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_layoutget_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* logr_return_on_close */ + @@ -3051,16 +2752,14 @@ static u32 nfsd4_layoutget_rsize(const struct svc_rqst *rqstp, MAX_LAYOUT_SIZE) * sizeof(__be32); } -static u32 nfsd4_layoutcommit_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_layoutcommit_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* locr_newsize */ + 2 /* ns_size */) * sizeof(__be32); } -static u32 nfsd4_layoutreturn_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_layoutreturn_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 1 /* lrs_stateid */ + @@ -3069,36 +2768,41 @@ static u32 nfsd4_layoutreturn_rsize(const struct svc_rqst *rqstp, #endif /* CONFIG_NFSD_PNFS */ -static u32 nfsd4_seek_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op) { return (op_encode_hdr_size + 3) * sizeof(__be32); } -static u32 nfsd4_getxattr_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_getxattr_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { - u32 rlen = min_t(u32, XATTR_SIZE_MAX, nfsd4_max_payload(rqstp)); + u32 maxcount, rlen; + + maxcount = svc_max_payload(rqstp); + rlen = min_t(u32, XATTR_SIZE_MAX, maxcount); return (op_encode_hdr_size + 1 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static u32 nfsd4_setxattr_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_setxattr_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); } -static u32 nfsd4_listxattrs_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_listxattrs_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { - u32 rlen = min(op->u.listxattrs.lsxa_maxcount, nfsd4_max_payload(rqstp)); + u32 maxcount, rlen; + + maxcount = svc_max_payload(rqstp); + rlen = min(op->u.listxattrs.lsxa_maxcount, maxcount); return (op_encode_hdr_size + 4 + XDR_QUADLEN(rlen)) * sizeof(__be32); } -static u32 nfsd4_removexattr_rsize(const struct svc_rqst *rqstp, - const struct nfsd4_op *op) +static inline u32 nfsd4_removexattr_rsize(struct svc_rqst *rqstp, + struct nfsd4_op *op) { return (op_encode_hdr_size + op_encode_change_info_maxsz) * sizeof(__be32); @@ -3531,7 +3235,7 @@ bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { struct nfsd4_compoundres *resp = rqstp->rq_resp; struct nfsd4_compoundargs *argp = rqstp->rq_argp; - struct nfsd4_op *this; + struct nfsd4_op *this = &argp->ops[resp->opcnt - 1]; struct nfsd4_compound_state *cstate = &resp->cstate; struct nfs4_op_map *allow = &cstate->clp->cl_spo_must_allow; u32 opiter; @@ -3568,7 +3272,7 @@ int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op) void warn_on_nonidempotent_op(struct nfsd4_op *op) { if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) { - pr_err("unable to encode reply to nonidempotent op %u (%s)\n", + pr_err("unable to encode reply to nonidempotent op %d (%s)\n", op->opnum, nfsd4_op_name(op->opnum)); WARN_ON_ONCE(1); } @@ -3581,29 +3285,28 @@ static const char *nfsd4_op_name(unsigned opnum) return "unknown_operation"; } +#define nfsd4_voidres nfsd4_voidargs +struct nfsd4_voidargs { int dummy; }; + static const struct svc_procedure nfsd_procedures4[2] = { [NFSPROC4_NULL] = { .pc_func = nfsd4_proc_null, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfs4svc_decode_voidarg, + .pc_encode = nfs4svc_encode_voidres, + .pc_argsize = sizeof(struct nfsd4_voidargs), + .pc_ressize = sizeof(struct nfsd4_voidres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 1, - .pc_name = "NULL", }, [NFSPROC4_COMPOUND] = { .pc_func = nfsd4_proc_compound, .pc_decode = nfs4svc_decode_compoundargs, .pc_encode = nfs4svc_encode_compoundres, .pc_argsize = sizeof(struct nfsd4_compoundargs), - .pc_argzero = offsetof(struct nfsd4_compoundargs, iops), .pc_ressize = sizeof(struct nfsd4_compoundres), .pc_release = nfsd4_release_compoundargs, .pc_cachetype = RC_NOCACHE, .pc_xdrressize = NFSD_BUFSIZE/4, - .pc_name = "COMPOUND", }, }; diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 189c622dde61..83c4e6883953 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -626,7 +626,7 @@ nfsd4_legacy_tracking_init(struct net *net) status = nfsd4_load_reboot_recovery_data(net); if (status) goto err; - pr_info("NFSD: Using legacy client tracking operations.\n"); + printk("NFSD: Using legacy client tracking operations.\n"); return 0; err: @@ -807,17 +807,17 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &ci->cc_name.cn_len)) return -EFAULT; name.data = memdup_user(&ci->cc_name.cn_id, namelen); - if (IS_ERR(name.data)) - return PTR_ERR(name.data); + if (IS_ERR_OR_NULL(name.data)) + return -EFAULT; name.len = namelen; get_user(princhashlen, &ci->cc_princhash.cp_len); if (princhashlen > 0) { princhash.data = memdup_user( &ci->cc_princhash.cp_data, princhashlen); - if (IS_ERR(princhash.data)) { + if (IS_ERR_OR_NULL(princhash.data)) { kfree(name.data); - return PTR_ERR(princhash.data); + return -EFAULT; } princhash.len = princhashlen; } else @@ -829,8 +829,8 @@ __cld_pipe_inprogress_downcall(const struct cld_msg_v2 __user *cmsg, if (get_user(namelen, &cnm->cn_len)) return -EFAULT; name.data = memdup_user(&cnm->cn_id, namelen); - if (IS_ERR(name.data)) - return PTR_ERR(name.data); + if (IS_ERR_OR_NULL(name.data)) + return -EFAULT; name.len = namelen; } if (name.len > 5 && memcmp(name.data, "hash:", 5) == 0) { @@ -1030,7 +1030,7 @@ nfsd4_init_cld_pipe(struct net *net) status = __nfsd4_init_cld_pipe(net); if (!status) - pr_info("NFSD: Using old nfsdcld client tracking operations.\n"); + printk("NFSD: Using old nfsdcld client tracking operations.\n"); return status; } @@ -1607,7 +1607,7 @@ nfsd4_cld_tracking_init(struct net *net) nfs4_release_reclaim(nn); goto err_remove; } else - pr_info("NFSD: Using nfsdcld client tracking operations.\n"); + printk("NFSD: Using nfsdcld client tracking operations.\n"); return 0; err_remove: @@ -1866,7 +1866,7 @@ nfsd4_umh_cltrack_init(struct net *net) ret = nfsd4_umh_cltrack_upcall("init", NULL, grace_start, NULL); kfree(grace_start); if (!ret) - pr_info("NFSD: Using UMH upcall client tracking operations.\n"); + printk("NFSD: Using UMH upcall client tracking operations.\n"); return ret; } diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 228560f3fd0e..d402ca0b535f 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -43,10 +43,6 @@ #include #include #include -#include -#include -#include - #include "xdr4.h" #include "xdr4cb.h" #include "vfs.h" @@ -86,7 +82,6 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) static void nfs4_free_ol_stateid(struct nfs4_stid *stid); void nfsd4_end_grace(struct nfsd_net *nn); static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps); -static void nfsd4_file_hash_remove(struct nfs4_file *fi); /* Locking: */ @@ -128,23 +123,6 @@ static void free_session(struct nfsd4_session *); static const struct nfsd4_callback_ops nfsd4_cb_recall_ops; static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops; -static struct workqueue_struct *laundry_wq; - -int nfsd4_create_laundry_wq(void) -{ - int rc = 0; - - laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); - if (laundry_wq == NULL) - rc = -ENOMEM; - return rc; -} - -void nfsd4_destroy_laundry_wq(void) -{ - destroy_workqueue(laundry_wq); -} - static bool is_session_dead(struct nfsd4_session *ses) { return ses->se_flags & NFS4_SESSION_DEAD; @@ -163,13 +141,6 @@ static bool is_client_expired(struct nfs4_client *clp) return clp->cl_time == 0; } -static void nfsd4_dec_courtesy_client_count(struct nfsd_net *nn, - struct nfs4_client *clp) -{ - if (clp->cl_state != NFSD4_ACTIVE) - atomic_add_unless(&nn->nfsd_courtesy_clients, -1, 0); -} - static __be32 get_client_locked(struct nfs4_client *clp) { struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); @@ -179,8 +150,6 @@ static __be32 get_client_locked(struct nfs4_client *clp) if (is_client_expired(clp)) return nfserr_expired; atomic_inc(&clp->cl_rpc_users); - nfsd4_dec_courtesy_client_count(nn, clp); - clp->cl_state = NFSD4_ACTIVE; return nfs_ok; } @@ -201,8 +170,6 @@ renew_client_locked(struct nfs4_client *clp) list_move_tail(&clp->cl_lru, &nn->client_lru); clp->cl_time = ktime_get_boottime_seconds(); - nfsd4_dec_courtesy_client_count(nn, clp); - clp->cl_state = NFSD4_ACTIVE; } static void put_client_renew_locked(struct nfs4_client *clp) @@ -277,7 +244,6 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh, list_for_each_entry(cur, &lo->lo_blocked, nbl_list) { if (fh_match(fh, &cur->nbl_fh)) { list_del_init(&cur->nbl_list); - WARN_ON(list_empty(&cur->nbl_lru)); list_del_init(&cur->nbl_lru); found = cur; break; @@ -303,7 +269,6 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, INIT_LIST_HEAD(&nbl->nbl_lru); fh_copy_shallow(&nbl->nbl_fh, fh); locks_init_lock(&nbl->nbl_lock); - kref_init(&nbl->nbl_kref); nfsd4_init_cb(&nbl->nbl_cb, lo->lo_owner.so_client, &nfsd4_cb_notify_lock_ops, NFSPROC4_CLNT_CB_NOTIFY_LOCK); @@ -312,21 +277,12 @@ find_or_allocate_block(struct nfs4_lockowner *lo, struct knfsd_fh *fh, return nbl; } -static void -free_nbl(struct kref *kref) -{ - struct nfsd4_blocked_lock *nbl; - - nbl = container_of(kref, struct nfsd4_blocked_lock, nbl_kref); - locks_release_private(&nbl->nbl_lock); - kfree(nbl); -} - static void free_blocked_lock(struct nfsd4_blocked_lock *nbl) { locks_delete_block(&nbl->nbl_lock); - kref_put(&nbl->nbl_kref, free_nbl); + locks_release_private(&nbl->nbl_lock); + kfree(nbl); } static void @@ -344,7 +300,6 @@ remove_blocked_locks(struct nfs4_lockowner *lo) struct nfsd4_blocked_lock, nbl_list); list_del_init(&nbl->nbl_list); - WARN_ON(list_empty(&nbl->nbl_lru)); list_move(&nbl->nbl_lru, &reaplist); } spin_unlock(&nn->blocked_locks_lock); @@ -369,8 +324,6 @@ nfsd4_cb_notify_lock_prepare(struct nfsd4_callback *cb) static int nfsd4_cb_notify_lock_done(struct nfsd4_callback *cb, struct rpc_task *task) { - trace_nfsd_cb_notify_lock_done(&zero_stateid, task); - /* * Since this is just an optimization, we don't try very hard if it * turns out not to succeed. We'll requeue it on NFS4ERR_DELAY, and @@ -400,130 +353,6 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops = { .release = nfsd4_cb_notify_lock_release, }; -/* - * We store the NONE, READ, WRITE, and BOTH bits separately in the - * st_{access,deny}_bmap field of the stateid, in order to track not - * only what share bits are currently in force, but also what - * combinations of share bits previous opens have used. This allows us - * to enforce the recommendation in - * https://datatracker.ietf.org/doc/html/rfc7530#section-16.19.4 that - * the server return an error if the client attempt to downgrade to a - * combination of share bits not explicable by closing some of its - * previous opens. - * - * This enforcement is arguably incomplete, since we don't keep - * track of access/deny bit combinations; so, e.g., we allow: - * - * OPEN allow read, deny write - * OPEN allow both, deny none - * DOWNGRADE allow read, deny none - * - * which we should reject. - * - * But you could also argue that our current code is already overkill, - * since it only exists to return NFS4ERR_INVAL on incorrect client - * behavior. - */ -static unsigned int -bmap_to_share_mode(unsigned long bmap) -{ - int i; - unsigned int access = 0; - - for (i = 1; i < 4; i++) { - if (test_bit(i, &bmap)) - access |= i; - } - return access; -} - -/* set share access for a given stateid */ -static inline void -set_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap |= mask; -} - -/* clear share access for a given stateid */ -static inline void -clear_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); - stp->st_access_bmap &= ~mask; -} - -/* test whether a given stateid has access */ -static inline bool -test_access(u32 access, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << access; - - return (bool)(stp->st_access_bmap & mask); -} - -/* set share deny for a given stateid */ -static inline void -set_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap |= mask; -} - -/* clear share deny for a given stateid */ -static inline void -clear_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); - stp->st_deny_bmap &= ~mask; -} - -/* test whether a given stateid is denying specific access */ -static inline bool -test_deny(u32 deny, struct nfs4_ol_stateid *stp) -{ - unsigned char mask = 1 << deny; - - return (bool)(stp->st_deny_bmap & mask); -} - -static int nfs4_access_to_omode(u32 access) -{ - switch (access & NFS4_SHARE_ACCESS_BOTH) { - case NFS4_SHARE_ACCESS_READ: - return O_RDONLY; - case NFS4_SHARE_ACCESS_WRITE: - return O_WRONLY; - case NFS4_SHARE_ACCESS_BOTH: - return O_RDWR; - } - WARN_ON_ONCE(1); - return O_RDONLY; -} - -static inline int -access_permit_read(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_READ, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp) || - test_access(NFS4_SHARE_ACCESS_WRITE, stp); -} - -static inline int -access_permit_write(struct nfs4_ol_stateid *stp) -{ - return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || - test_access(NFS4_SHARE_ACCESS_BOTH, stp); -} - static inline struct nfs4_stateowner * nfs4_get_stateowner(struct nfs4_stateowner *sop) { @@ -591,8 +420,11 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu) void put_nfs4_file(struct nfs4_file *fi) { - if (refcount_dec_and_test(&fi->fi_ref)) { - nfsd4_file_hash_remove(fi); + might_lock(&state_lock); + + if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) { + hlist_del_rcu(&fi->fi_hash); + spin_unlock(&state_lock); WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate)); WARN_ON_ONCE(!list_empty(&fi->fi_delegations)); call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu); @@ -602,7 +434,9 @@ put_nfs4_file(struct nfs4_file *fi) static struct nfsd_file * __nfs4_get_fd(struct nfs4_file *f, int oflag) { - return nfsd_file_get(f->fi_fds[oflag]); + if (f->fi_fds[oflag]) + return nfsd_file_get(f->fi_fds[oflag]); + return NULL; } static struct nfsd_file * @@ -715,72 +549,22 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername) return ret & OWNER_HASH_MASK; } -static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp; +/* hash table for nfs4_file */ +#define FILE_HASH_BITS 8 +#define FILE_HASH_SIZE (1 << FILE_HASH_BITS) -static const struct rhashtable_params nfs4_file_rhash_params = { - .key_len = sizeof_field(struct nfs4_file, fi_inode), - .key_offset = offsetof(struct nfs4_file, fi_inode), - .head_offset = offsetof(struct nfs4_file, fi_rlist), - - /* - * Start with a single page hash table to reduce resizing churn - * on light workloads. - */ - .min_size = 256, - .automatic_shrinking = true, -}; - -/* - * Check if courtesy clients have conflicting access and resolve it if possible - * - * access: is op_share_access if share_access is true. - * Check if access mode, op_share_access, would conflict with - * the current deny mode of the file 'fp'. - * access: is op_share_deny if share_access is false. - * Check if the deny mode, op_share_deny, would conflict with - * current access of the file 'fp'. - * stp: skip checking this entry. - * new_stp: normal open, not open upgrade. - * - * Function returns: - * false - access/deny mode conflict with normal client. - * true - no conflict or conflict with courtesy client(s) is resolved. - */ -static bool -nfs4_resolve_deny_conflicts_locked(struct nfs4_file *fp, bool new_stp, - struct nfs4_ol_stateid *stp, u32 access, bool share_access) +static unsigned int nfsd_fh_hashval(struct knfsd_fh *fh) { - struct nfs4_ol_stateid *st; - bool resolvable = true; - unsigned char bmap; - struct nfsd_net *nn; - struct nfs4_client *clp; - - lockdep_assert_held(&fp->fi_lock); - list_for_each_entry(st, &fp->fi_stateids, st_perfile) { - /* ignore lock stateid */ - if (st->st_openstp) - continue; - if (st == stp && new_stp) - continue; - /* check file access against deny mode or vice versa */ - bmap = share_access ? st->st_deny_bmap : st->st_access_bmap; - if (!(access & bmap_to_share_mode(bmap))) - continue; - clp = st->st_stid.sc_client; - if (try_to_expire_client(clp)) - continue; - resolvable = false; - break; - } - if (resolvable) { - clp = stp->st_stid.sc_client; - nn = net_generic(clp->net, nfsd_net_id); - mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); - } - return resolvable; + return jhash2(fh->fh_base.fh_pad, XDR_QUADLEN(fh->fh_size), 0); } +static unsigned int file_hashval(struct knfsd_fh *fh) +{ + return nfsd_fh_hashval(fh) & (FILE_HASH_SIZE - 1); +} + +static struct hlist_head file_hashtbl[FILE_HASH_SIZE]; + static void __nfs4_file_get_access(struct nfs4_file *fp, u32 access) { @@ -984,23 +768,23 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla * Create a unique stateid_t to represent each COPY. */ static int nfs4_init_cp_state(struct nfsd_net *nn, copy_stateid_t *stid, - unsigned char cs_type) + unsigned char sc_type) { int new_id; - stid->cs_stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; - stid->cs_stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; + stid->stid.si_opaque.so_clid.cl_boot = (u32)nn->boot_time; + stid->stid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id; + stid->sc_type = sc_type; idr_preload(GFP_KERNEL); spin_lock(&nn->s2s_cp_lock); new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, stid, 0, 0, GFP_NOWAIT); - stid->cs_stid.si_opaque.so_id = new_id; - stid->cs_stid.si_generation = 1; + stid->stid.si_opaque.so_id = new_id; + stid->stid.si_generation = 1; spin_unlock(&nn->s2s_cp_lock); idr_preload_end(); if (new_id < 0) return 0; - stid->cs_type = cs_type; return 1; } @@ -1018,7 +802,7 @@ struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn, if (!cps) return NULL; cps->cpntf_time = ktime_get_boottime_seconds(); - refcount_set(&cps->cp_stateid.cs_count, 1); + refcount_set(&cps->cp_stateid.sc_count, 1); if (!nfs4_init_cp_state(nn, &cps->cp_stateid, NFS4_COPYNOTIFY_STID)) goto out_free; spin_lock(&nn->s2s_cp_lock); @@ -1034,12 +818,11 @@ void nfs4_free_copy_state(struct nfsd4_copy *copy) { struct nfsd_net *nn; - if (copy->cp_stateid.cs_type != NFS4_COPY_STID) - return; + WARN_ON_ONCE(copy->cp_stateid.sc_type != NFS4_COPY_STID); nn = net_generic(copy->cp_clp->net, nfsd_net_id); spin_lock(&nn->s2s_cp_lock); idr_remove(&nn->s2s_cp_stateids, - copy->cp_stateid.cs_stid.si_opaque.so_id); + copy->cp_stateid.stid.si_opaque.so_id); spin_unlock(&nn->s2s_cp_lock); } @@ -1071,12 +854,7 @@ static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp) static void nfs4_free_deleg(struct nfs4_stid *stid) { - struct nfs4_delegation *dp = delegstateid(stid); - - WARN_ON_ONCE(!list_empty(&stid->sc_cp_list)); - WARN_ON_ONCE(!list_empty(&dp->dl_perfile)); - WARN_ON_ONCE(!list_empty(&dp->dl_perclnt)); - WARN_ON_ONCE(!list_empty(&dp->dl_recall_lru)); + WARN_ON(!list_empty(&stid->sc_cp_list)); kmem_cache_free(deleg_slab, stid); atomic_long_dec(&num_delegations); } @@ -1126,7 +904,7 @@ static int delegation_blocked(struct knfsd_fh *fh) } spin_unlock(&blocked_delegations_lock); } - hash = jhash(&fh->fh_raw, fh->fh_size, 0); + hash = jhash(&fh->fh_base, fh->fh_size, 0); if (test_bit(hash&255, bd->set[0]) && test_bit((hash>>8)&255, bd->set[0]) && test_bit((hash>>16)&255, bd->set[0])) @@ -1145,7 +923,7 @@ static void block_delegations(struct knfsd_fh *fh) u32 hash; struct bloom_pair *bd = &blocked_delegations; - hash = jhash(&fh->fh_raw, fh->fh_size, 0); + hash = jhash(&fh->fh_base, fh->fh_size, 0); spin_lock(&blocked_delegations_lock); __set_bit(hash&255, bd->set[bd->new]); @@ -1159,6 +937,7 @@ static void block_delegations(struct knfsd_fh *fh) static struct nfs4_delegation * alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, + struct svc_fh *current_fh, struct nfs4_clnt_odstate *odstate) { struct nfs4_delegation *dp; @@ -1168,7 +947,7 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, n = atomic_long_inc_return(&num_delegations); if (n < 0 || n > max_delegations) goto out_dec; - if (delegation_blocked(&fp->fi_fhandle)) + if (delegation_blocked(¤t_fh->fh_handle)) goto out_dec; dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab, nfs4_free_deleg)); if (dp == NULL) @@ -1187,7 +966,6 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_file *fp, get_clnt_odstate(odstate); dp->dl_type = NFS4_OPEN_DELEGATE_READ; dp->dl_retries = 1; - dp->dl_recalled = false; nfsd4_init_cb(&dp->dl_recall, dp->dl_stid.sc_client, &nfsd4_cb_recall_ops, NFSPROC4_CLNT_CB_RECALL); get_nfs4_file(fp); @@ -1366,8 +1144,6 @@ static void revoke_delegation(struct nfs4_delegation *dp) WARN_ON(!list_empty(&dp->dl_recall_lru)); - trace_nfsd_stid_revoke(&dp->dl_stid); - if (clp->cl_minorversion) { spin_lock(&clp->cl_lock); dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID; @@ -1392,6 +1168,108 @@ static unsigned int clientstr_hashval(struct xdr_netobj name) return opaque_hashval(name.data, 8) & CLIENT_HASH_MASK; } +/* + * We store the NONE, READ, WRITE, and BOTH bits separately in the + * st_{access,deny}_bmap field of the stateid, in order to track not + * only what share bits are currently in force, but also what + * combinations of share bits previous opens have used. This allows us + * to enforce the recommendation of rfc 3530 14.2.19 that the server + * return an error if the client attempt to downgrade to a combination + * of share bits not explicable by closing some of its previous opens. + * + * XXX: This enforcement is actually incomplete, since we don't keep + * track of access/deny bit combinations; so, e.g., we allow: + * + * OPEN allow read, deny write + * OPEN allow both, deny none + * DOWNGRADE allow read, deny none + * + * which we should reject. + */ +static unsigned int +bmap_to_share_mode(unsigned long bmap) { + int i; + unsigned int access = 0; + + for (i = 1; i < 4; i++) { + if (test_bit(i, &bmap)) + access |= i; + } + return access; +} + +/* set share access for a given stateid */ +static inline void +set_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap |= mask; +} + +/* clear share access for a given stateid */ +static inline void +clear_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + WARN_ON_ONCE(access > NFS4_SHARE_ACCESS_BOTH); + stp->st_access_bmap &= ~mask; +} + +/* test whether a given stateid has access */ +static inline bool +test_access(u32 access, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << access; + + return (bool)(stp->st_access_bmap & mask); +} + +/* set share deny for a given stateid */ +static inline void +set_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap |= mask; +} + +/* clear share deny for a given stateid */ +static inline void +clear_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + WARN_ON_ONCE(deny > NFS4_SHARE_DENY_BOTH); + stp->st_deny_bmap &= ~mask; +} + +/* test whether a given stateid is denying specific access */ +static inline bool +test_deny(u32 deny, struct nfs4_ol_stateid *stp) +{ + unsigned char mask = 1 << deny; + + return (bool)(stp->st_deny_bmap & mask); +} + +static int nfs4_access_to_omode(u32 access) +{ + switch (access & NFS4_SHARE_ACCESS_BOTH) { + case NFS4_SHARE_ACCESS_READ: + return O_RDONLY; + case NFS4_SHARE_ACCESS_WRITE: + return O_WRONLY; + case NFS4_SHARE_ACCESS_BOTH: + return O_RDWR; + } + WARN_ON_ONCE(1); + return O_RDONLY; +} + /* * A stateid that had a deny mode associated with it is being released * or downgraded. Recalculate the deny mode on the file. @@ -1832,12 +1710,13 @@ static struct nfsd4_session *alloc_session(struct nfsd4_channel_attrs *fattrs, int numslots = fattrs->maxreqs; int slotsize = slot_bytes(fattrs); struct nfsd4_session *new; - int i; + int mem, i; - BUILD_BUG_ON(struct_size(new, se_slots, NFSD_MAX_SLOTS_PER_SESSION) - > PAGE_SIZE); + BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot *) + + sizeof(struct nfsd4_session) > PAGE_SIZE); + mem = numslots * sizeof(struct nfsd4_slot *); - new = kzalloc(struct_size(new, se_slots, numslots), GFP_KERNEL); + new = kzalloc(sizeof(*new) + mem, GFP_KERNEL); if (!new) return NULL; /* allocate each struct nfsd4_slot and data cache in one piece */ @@ -1869,8 +1748,6 @@ static void nfsd4_conn_lost(struct svc_xpt_user *u) struct nfsd4_conn *c = container_of(u, struct nfsd4_conn, cn_xpt_user); struct nfs4_client *clp = c->cn_session->se_client; - trace_nfsd_cb_lost(clp); - spin_lock(&clp->cl_lock); if (!list_empty(&c->cn_persession)) { list_del(&c->cn_persession); @@ -2082,16 +1959,11 @@ STALE_CLIENTID(clientid_t *clid, struct nfsd_net *nn) * This type of memory management is somewhat inefficient, but we use it * anyway since SETCLIENTID is not a common operation. */ -static struct nfs4_client *alloc_client(struct xdr_netobj name, - struct nfsd_net *nn) +static struct nfs4_client *alloc_client(struct xdr_netobj name) { struct nfs4_client *clp; int i; - if (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) { - mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); - return NULL; - } clp = kmem_cache_zalloc(client_slab, GFP_KERNEL); if (clp == NULL) return NULL; @@ -2109,9 +1981,6 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name, idr_init(&clp->cl_stateids); atomic_set(&clp->cl_rpc_users, 0); clp->cl_cb_state = NFSD4_CB_UNKNOWN; - clp->cl_state = NFSD4_ACTIVE; - atomic_inc(&nn->nfs4_client_count); - atomic_set(&clp->cl_delegs_in_recall, 0); INIT_LIST_HEAD(&clp->cl_idhash); INIT_LIST_HEAD(&clp->cl_openowners); INIT_LIST_HEAD(&clp->cl_delegations); @@ -2143,7 +2012,6 @@ static void __free_client(struct kref *k) kfree(clp->cl_nii_domain.data); kfree(clp->cl_nii_name.data); idr_destroy(&clp->cl_stateids); - kfree(clp->cl_ra); kmem_cache_free(client_slab, clp); } @@ -2219,7 +2087,6 @@ static __be32 mark_client_expired_locked(struct nfs4_client *clp) static void __destroy_client(struct nfs4_client *clp) { - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); int i; struct nfs4_openowner *oo; struct nfs4_delegation *dp; @@ -2263,8 +2130,6 @@ __destroy_client(struct nfs4_client *clp) nfsd4_shutdown_callback(clp); if (clp->cl_cb_conn.cb_xprt) svc_xprt_put(clp->cl_cb_conn.cb_xprt); - atomic_add_unless(&nn->nfs4_client_count, -1, 0); - nfsd4_dec_courtesy_client_count(nn, clp); free_client(clp); wake_up_all(&expiry_wq); } @@ -2493,24 +2358,9 @@ static void seq_quote_mem(struct seq_file *m, char *data, int len) seq_printf(m, "\""); } -static const char *cb_state2str(int state) -{ - switch (state) { - case NFSD4_CB_UP: - return "UP"; - case NFSD4_CB_UNKNOWN: - return "UNKNOWN"; - case NFSD4_CB_DOWN: - return "DOWN"; - case NFSD4_CB_FAULT: - return "FAULT"; - } - return "UNDEFINED"; -} - static int client_info_show(struct seq_file *m, void *v) { - struct inode *inode = file_inode(m->file); + struct inode *inode = m->private; struct nfs4_client *clp; u64 clid; @@ -2520,17 +2370,6 @@ static int client_info_show(struct seq_file *m, void *v) memcpy(&clid, &clp->cl_clientid, sizeof(clid)); seq_printf(m, "clientid: 0x%llx\n", clid); seq_printf(m, "address: \"%pISpc\"\n", (struct sockaddr *)&clp->cl_addr); - - if (clp->cl_state == NFSD4_COURTESY) - seq_puts(m, "status: courtesy\n"); - else if (clp->cl_state == NFSD4_EXPIRABLE) - seq_puts(m, "status: expirable\n"); - else if (test_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags)) - seq_puts(m, "status: confirmed\n"); - else - seq_puts(m, "status: unconfirmed\n"); - seq_printf(m, "seconds from last renew: %lld\n", - ktime_get_boottime_seconds() - clp->cl_time); seq_printf(m, "name: "); seq_quote_mem(m, clp->cl_name.data, clp->cl_name.len); seq_printf(m, "\nminor version: %d\n", clp->cl_minorversion); @@ -2543,14 +2382,22 @@ static int client_info_show(struct seq_file *m, void *v) seq_printf(m, "\nImplementation time: [%lld, %ld]\n", clp->cl_nii_time.tv_sec, clp->cl_nii_time.tv_nsec); } - seq_printf(m, "callback state: %s\n", cb_state2str(clp->cl_cb_state)); - seq_printf(m, "callback address: %pISpc\n", &clp->cl_cb_conn.cb_addr); drop_client(clp); return 0; } -DEFINE_SHOW_ATTRIBUTE(client_info); +static int client_info_open(struct inode *inode, struct file *file) +{ + return single_open(file, client_info_show, inode); +} + +static const struct file_operations client_info_fops = { + .open = client_info_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; static void *states_start(struct seq_file *s, loff_t *pos) __acquires(&clp->cl_lock) @@ -2593,7 +2440,7 @@ static void nfs4_show_fname(struct seq_file *s, struct nfsd_file *f) static void nfs4_show_superblock(struct seq_file *s, struct nfsd_file *f) { - struct inode *inode = file_inode(f->nf_file); + struct inode *inode = f->nf_inode; seq_printf(s, "superblock: \"%02x:%02x:%ld\"", MAJOR(inode->i_sb->s_dev), @@ -2821,8 +2668,6 @@ static void force_expire_client(struct nfs4_client *clp) struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); bool already_expired; - trace_nfsd_clid_admin_expired(&clp->cl_clientid); - spin_lock(&nn->client_lock); clp->cl_time = 0; spin_unlock(&nn->client_lock); @@ -2871,36 +2716,6 @@ static const struct tree_descr client_files[] = { [3] = {""}, }; -static int -nfsd4_cb_recall_any_done(struct nfsd4_callback *cb, - struct rpc_task *task) -{ - switch (task->tk_status) { - case -NFS4ERR_DELAY: - rpc_delay(task, 2 * HZ); - return 0; - default: - return 1; - } -} - -static void -nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) -{ - struct nfs4_client *clp = cb->cb_clp; - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); - - spin_lock(&nn->client_lock); - clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); - put_client_renew_locked(clp); - spin_unlock(&nn->client_lock); -} - -static const struct nfsd4_callback_ops nfsd4_cb_recall_any_ops = { - .done = nfsd4_cb_recall_any_done, - .release = nfsd4_cb_recall_any_release, -}; - static struct nfs4_client *create_client(struct xdr_netobj name, struct svc_rqst *rqstp, nfs4_verifier *verf) { @@ -2909,9 +2724,8 @@ static struct nfs4_client *create_client(struct xdr_netobj name, int ret; struct net *net = SVC_NET(rqstp); struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct dentry *dentries[ARRAY_SIZE(client_files)]; - clp = alloc_client(name, nn); + clp = alloc_client(name); if (clp == NULL) return NULL; @@ -2929,23 +2743,13 @@ static struct nfs4_client *create_client(struct xdr_netobj name, memcpy(&clp->cl_addr, sa, sizeof(struct sockaddr_storage)); clp->cl_cb_session = NULL; clp->net = net; - clp->cl_nfsd_dentry = nfsd_client_mkdir( - nn, &clp->cl_nfsdfs, - clp->cl_clientid.cl_id - nn->clientid_base, - client_files, dentries); - clp->cl_nfsd_info_dentry = dentries[0]; + clp->cl_nfsd_dentry = nfsd_client_mkdir(nn, &clp->cl_nfsdfs, + clp->cl_clientid.cl_id - nn->clientid_base, + client_files); if (!clp->cl_nfsd_dentry) { free_client(clp); return NULL; } - clp->cl_ra = kzalloc(sizeof(*clp->cl_ra), GFP_KERNEL); - if (!clp->cl_ra) { - free_client(clp); - return NULL; - } - clp->cl_ra_time = 0; - nfsd4_init_cb(&clp->cl_ra->ra_cb, clp, &nfsd4_cb_recall_any_ops, - NFSPROC4_CLNT_CB_RECALL_ANY); return clp; } @@ -3012,11 +2816,11 @@ move_to_confirmed(struct nfs4_client *clp) lockdep_assert_held(&nn->client_lock); + dprintk("NFSD: move_to_confirm nfs4_client %p\n", clp); list_move(&clp->cl_idhash, &nn->conf_id_hashtbl[idhashval]); rb_erase(&clp->cl_namenode, &nn->unconf_name_tree); add_clp_to_name_tree(clp, &nn->conf_name_tree); set_bit(NFSD4_CLIENT_CONFIRMED, &clp->cl_flags); - trace_nfsd_clid_confirmed(&clp->cl_clientid); renew_client_locked(clp); } @@ -3121,7 +2925,7 @@ gen_callback(struct nfs4_client *clp, struct nfsd4_setclientid *se, struct svc_r static void nfsd4_store_cache_entry(struct nfsd4_compoundres *resp) { - struct xdr_buf *buf = resp->xdr->buf; + struct xdr_buf *buf = resp->xdr.buf; struct nfsd4_slot *slot = resp->cstate.slot; unsigned int base; @@ -3191,7 +2995,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp, struct nfsd4_sequence *seq) { struct nfsd4_slot *slot = resp->cstate.slot; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; __be32 status; @@ -3285,7 +3089,7 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, rpc_ntop(sa, addr_str, sizeof(addr_str)); dprintk("%s rqstp=%p exid=%p clname.len=%u clname.data=%p " - "ip_addr=%s flags %x, spa_how %u\n", + "ip_addr=%s flags %x, spa_how %d\n", __func__, rqstp, exid, exid->clname.len, exid->clname.data, addr_str, exid->flags, exid->spa_how); @@ -3332,7 +3136,6 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out_nolock; } new->cl_mach_cred = true; - break; case SP4_NONE: break; default: /* checked by xdr code */ @@ -3369,24 +3172,20 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, } /* case 6 */ exid->flags |= EXCHGID4_FLAG_CONFIRMED_R; - trace_nfsd_clid_confirmed_r(conf); goto out_copy; } if (!creds_match) { /* case 3 */ if (client_has_state(conf)) { status = nfserr_clid_inuse; - trace_nfsd_clid_cred_mismatch(conf, rqstp); goto out; } goto out_new; } if (verfs_match) { /* case 2 */ conf->cl_exchange_flags |= EXCHGID4_FLAG_CONFIRMED_R; - trace_nfsd_clid_confirmed_r(conf); goto out_copy; } /* case 5, client reboot */ - trace_nfsd_clid_verf_mismatch(conf, rqstp, &verf); conf = NULL; goto out_new; } @@ -3396,19 +3195,16 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; } - unconf = find_unconfirmed_client_by_name(&exid->clname, nn); + unconf = find_unconfirmed_client_by_name(&exid->clname, nn); if (unconf) /* case 4, possible retry or client restart */ unhash_client_locked(unconf); - /* case 1, new owner ID */ - trace_nfsd_clid_fresh(new); - + /* case 1 (normal case) */ out_new: if (conf) { status = mark_client_expired_locked(conf); if (status) goto out; - trace_nfsd_clid_replaced(&conf->cl_clientid); } new->cl_minorversion = cstate->minorversion; new->cl_spo_must_allow.u.words[0] = exid->spo_must_allow[0]; @@ -3432,10 +3228,8 @@ nfsd4_exchange_id(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, out_nolock: if (new) expire_client(new); - if (unconf) { - trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); + if (unconf) expire_client(unconf); - } return status; } @@ -3627,10 +3421,9 @@ nfsd4_create_session(struct svc_rqst *rqstp, goto out_free_conn; } } else if (unconf) { - status = nfserr_clid_inuse; if (!same_creds(&unconf->cl_cred, &rqstp->rq_cred) || !rpc_cmp_addr(sa, (struct sockaddr *) &unconf->cl_addr)) { - trace_nfsd_clid_cred_mismatch(unconf, rqstp); + status = nfserr_clid_inuse; goto out_free_conn; } status = nfserr_wrong_cred; @@ -3650,7 +3443,6 @@ nfsd4_create_session(struct svc_rqst *rqstp, old = NULL; goto out_free_conn; } - trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; @@ -3675,8 +3467,6 @@ nfsd4_create_session(struct svc_rqst *rqstp, /* cache solo and embedded create sessions under the client_lock */ nfsd4_cache_create_session(cr_ses, cs_slot, status); spin_unlock(&nn->client_lock); - if (conf == unconf) - fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); /* init connection and backchannel */ nfsd4_init_conn(rqstp, conn, new); nfsd4_put_session(new); @@ -3950,7 +3740,7 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_sequence *seq = &u->sequence; struct nfsd4_compoundres *resp = rqstp->rq_resp; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct nfsd4_session *session; struct nfs4_client *clp; struct nfsd4_slot *slot; @@ -4120,7 +3910,6 @@ nfsd4_destroy_clientid(struct svc_rqst *rqstp, status = nfserr_wrong_cred; goto out; } - trace_nfsd_clid_destroyed(&clp->cl_clientid); unhash_client_locked(clp); out: spin_unlock(&nn->client_lock); @@ -4134,7 +3923,6 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; - struct nfs4_client *clp = cstate->clp; __be32 status = 0; if (rc->rca_one_fs) { @@ -4148,11 +3936,12 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, } status = nfserr_complete_already; - if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) + if (test_and_set_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, + &cstate->session->se_client->cl_flags)) goto out; status = nfserr_stale_clientid; - if (is_client_expired(clp)) + if (is_client_expired(cstate->session->se_client)) /* * The following error isn't really legal. * But we only get here if the client just explicitly @@ -4163,9 +3952,8 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, goto out; status = nfs_ok; - trace_nfsd_clid_reclaim_complete(&clp->cl_clientid); - nfsd4_client_record_create(clp); - inc_reclaim_complete(clp); + nfsd4_client_record_create(cstate->session->se_client); + inc_reclaim_complete(cstate->session->se_client); out: return status; } @@ -4185,29 +3973,27 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, new = create_client(clname, rqstp, &clverifier); if (new == NULL) return nfserr_jukebox; + /* Cases below refer to rfc 3530 section 14.2.33: */ spin_lock(&nn->client_lock); conf = find_confirmed_client_by_name(&clname, nn); if (conf && client_has_state(conf)) { + /* case 0: */ status = nfserr_clid_inuse; if (clp_used_exchangeid(conf)) goto out; if (!same_creds(&conf->cl_cred, &rqstp->rq_cred)) { - trace_nfsd_clid_cred_mismatch(conf, rqstp); + trace_nfsd_clid_inuse_err(conf); goto out; } } unconf = find_unconfirmed_client_by_name(&clname, nn); if (unconf) unhash_client_locked(unconf); - if (conf) { - if (same_verf(&conf->cl_verifier, &clverifier)) { - copy_clid(new, conf); - gen_confirm(new, nn); - } else - trace_nfsd_clid_verf_mismatch(conf, rqstp, - &clverifier); - } else - trace_nfsd_clid_fresh(new); + /* We need to handle only case 1: probable callback update */ + if (conf && same_verf(&conf->cl_verifier, &clverifier)) { + copy_clid(new, conf); + gen_confirm(new, nn); + } new->cl_minorversion = 0; gen_callback(new, setclid, rqstp); add_to_unconfirmed(new); @@ -4220,13 +4006,12 @@ nfsd4_setclientid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_unlock(&nn->client_lock); if (new) free_client(new); - if (unconf) { - trace_nfsd_clid_expire_unconf(&unconf->cl_clientid); + if (unconf) expire_client(unconf); - } return status; } + __be32 nfsd4_setclientid_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, @@ -4255,27 +4040,25 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, * Nevertheless, RFC 7530 recommends INUSE for this case: */ status = nfserr_clid_inuse; - if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) { - trace_nfsd_clid_cred_mismatch(unconf, rqstp); + if (unconf && !same_creds(&unconf->cl_cred, &rqstp->rq_cred)) goto out; - } - if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) { - trace_nfsd_clid_cred_mismatch(conf, rqstp); + if (conf && !same_creds(&conf->cl_cred, &rqstp->rq_cred)) goto out; - } + /* cases below refer to rfc 3530 section 14.2.34: */ if (!unconf || !same_verf(&confirm, &unconf->cl_confirm)) { if (conf && same_verf(&confirm, &conf->cl_confirm)) { + /* case 2: probable retransmit */ status = nfs_ok; - } else + } else /* case 4: client hasn't noticed we rebooted yet? */ status = nfserr_stale_clientid; goto out; } status = nfs_ok; - if (conf) { + if (conf) { /* case 1: callback update */ old = unconf; unhash_client_locked(old); nfsd4_change_callback(conf, &unconf->cl_cb_conn); - } else { + } else { /* case 3: normal case; new or rebooted client */ old = find_confirmed_client_by_name(&unconf->cl_name, nn); if (old) { status = nfserr_clid_inuse; @@ -4290,15 +4073,12 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp, old = NULL; goto out; } - trace_nfsd_clid_replaced(&old->cl_clientid); } move_to_confirmed(unconf); conf = unconf; } get_client_locked(conf); spin_unlock(&nn->client_lock); - if (conf == unconf) - fsnotify_dentry(conf->cl_nfsd_info_dentry, FS_MODIFY); nfsd4_probe_callback(conf); spin_lock(&nn->client_lock); put_client_renew_locked(conf); @@ -4315,26 +4095,27 @@ static struct nfs4_file *nfsd4_alloc_file(void) } /* OPEN Share state helper functions */ - -static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp) +static void nfsd4_init_file(struct knfsd_fh *fh, unsigned int hashval, + struct nfs4_file *fp) { + lockdep_assert_held(&state_lock); + refcount_set(&fp->fi_ref, 1); spin_lock_init(&fp->fi_lock); INIT_LIST_HEAD(&fp->fi_stateids); INIT_LIST_HEAD(&fp->fi_delegations); INIT_LIST_HEAD(&fp->fi_clnt_odstate); - fh_copy_shallow(&fp->fi_fhandle, &fh->fh_handle); + fh_copy_shallow(&fp->fi_fhandle, fh); fp->fi_deleg_file = NULL; fp->fi_had_conflict = false; fp->fi_share_deny = 0; memset(fp->fi_fds, 0, sizeof(fp->fi_fds)); memset(fp->fi_access, 0, sizeof(fp->fi_access)); - fp->fi_aliased = false; - fp->fi_inode = d_inode(fh->fh_dentry); #ifdef CONFIG_NFSD_PNFS INIT_LIST_HEAD(&fp->fi_lo_states); atomic_set(&fp->fi_lo_recalls, 0); #endif + hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]); } void @@ -4398,51 +4179,6 @@ nfsd4_init_slabs(void) return -ENOMEM; } -static unsigned long -nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) -{ - int count; - struct nfsd_net *nn = container_of(shrink, - struct nfsd_net, nfsd_client_shrinker); - - count = atomic_read(&nn->nfsd_courtesy_clients); - if (!count) - count = atomic_long_read(&num_delegations); - if (count) - queue_work(laundry_wq, &nn->nfsd_shrinker_work); - return (unsigned long)count; -} - -static unsigned long -nfsd4_state_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc) -{ - return SHRINK_STOP; -} - -void -nfsd4_init_leases_net(struct nfsd_net *nn) -{ - struct sysinfo si; - u64 max_clients; - - nn->nfsd4_lease = 90; /* default lease time */ - nn->nfsd4_grace = 90; - nn->somebody_reclaimed = false; - nn->track_reclaim_completes = false; - nn->clverifier_counter = prandom_u32(); - nn->clientid_base = prandom_u32(); - nn->clientid_counter = nn->clientid_base + 1; - nn->s2s_cp_cl_id = nn->clientid_counter++; - - atomic_set(&nn->nfs4_client_count, 0); - si_meminfo(&si); - max_clients = (u64)si.totalram * si.mem_unit / (1024 * 1024 * 1024); - max_clients *= NFS4_CLIENTS_PER_GB; - nn->nfs4_max_clients = max_t(int, max_clients, NFS4_CLIENTS_PER_GB); - - atomic_set(&nn->nfsd_courtesy_clients, 0); -} - static void init_nfs4_replay(struct nfs4_replay *rp) { rp->rp_status = nfserr_serverfault; @@ -4711,80 +4447,55 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net) nfs4_put_stid(&last->st_stid); } -static noinline_for_stack struct nfs4_file * -nfsd4_file_hash_lookup(const struct svc_fh *fhp) +/* search file_hashtbl[] for file */ +static struct nfs4_file * +find_file_locked(struct knfsd_fh *fh, unsigned int hashval) { - struct inode *inode = d_inode(fhp->fh_dentry); - struct rhlist_head *tmp, *list; - struct nfs4_file *fi; + struct nfs4_file *fp; - rcu_read_lock(); - list = rhltable_lookup(&nfs4_file_rhltable, &inode, - nfs4_file_rhash_params); - rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { - if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { - if (refcount_inc_not_zero(&fi->fi_ref)) { - rcu_read_unlock(); - return fi; - } + hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash, + lockdep_is_held(&state_lock)) { + if (fh_match(&fp->fi_fhandle, fh)) { + if (refcount_inc_not_zero(&fp->fi_ref)) + return fp; } } - rcu_read_unlock(); return NULL; } -/* - * On hash insertion, identify entries with the same inode but - * distinct filehandles. They will all be on the list returned - * by rhltable_lookup(). - * - * inode->i_lock prevents racing insertions from adding an entry - * for the same inode/fhp pair twice. - */ -static noinline_for_stack struct nfs4_file * -nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp) +struct nfs4_file * +find_file(struct knfsd_fh *fh) { - struct inode *inode = d_inode(fhp->fh_dentry); - struct rhlist_head *tmp, *list; - struct nfs4_file *ret = NULL; - bool alias_found = false; - struct nfs4_file *fi; - int err; + struct nfs4_file *fp; + unsigned int hashval = file_hashval(fh); rcu_read_lock(); - spin_lock(&inode->i_lock); - - list = rhltable_lookup(&nfs4_file_rhltable, &inode, - nfs4_file_rhash_params); - rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) { - if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) { - if (refcount_inc_not_zero(&fi->fi_ref)) - ret = fi; - } else - fi->fi_aliased = alias_found = true; - } - if (ret) - goto out_unlock; - - nfsd4_file_init(fhp, new); - err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist, - nfs4_file_rhash_params); - if (err) - goto out_unlock; - - new->fi_aliased = alias_found; - ret = new; - -out_unlock: - spin_unlock(&inode->i_lock); + fp = find_file_locked(fh, hashval); rcu_read_unlock(); - return ret; + return fp; } -static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi) +static struct nfs4_file * +find_or_add_file(struct nfs4_file *new, struct knfsd_fh *fh) { - rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist, - nfs4_file_rhash_params); + struct nfs4_file *fp; + unsigned int hashval = file_hashval(fh); + + rcu_read_lock(); + fp = find_file_locked(fh, hashval); + rcu_read_unlock(); + if (fp) + return fp; + + spin_lock(&state_lock); + fp = find_file_locked(fh, hashval); + if (likely(fp == NULL)) { + nfsd4_init_file(fh, hashval, new); + fp = new; + } + spin_unlock(&state_lock); + + return fp; } /* @@ -4797,10 +4508,9 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) struct nfs4_file *fp; __be32 ret = nfs_ok; - fp = nfsd4_file_hash_lookup(current_fh); + fp = find_file(¤t_fh->fh_handle); if (!fp) return ret; - /* Check for conflicting share reservations */ spin_lock(&fp->fi_lock); if (fp->fi_share_deny & deny_type) @@ -4810,35 +4520,6 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type) return ret; } -static bool nfsd4_deleg_present(const struct inode *inode) -{ - struct file_lock_context *ctx = locks_inode_context(inode); - - return ctx && !list_empty_careful(&ctx->flc_lease); -} - -/** - * nfsd_wait_for_delegreturn - wait for delegations to be returned - * @rqstp: the RPC transaction being executed - * @inode: in-core inode of the file being waited for - * - * The timeout prevents deadlock if all nfsd threads happen to be - * tied up waiting for returning delegations. - * - * Return values: - * %true: delegation was returned - * %false: timed out waiting for delegreturn - */ -bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode) -{ - long __maybe_unused timeo; - - timeo = wait_var_event_timeout(inode, !nfsd4_deleg_present(inode), - NFSD_DELEGRETURN_TIMEOUT); - trace_nfsd_delegret_wakeup(rqstp, inode, timeo); - return timeo > 0; -} - static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb) { struct nfs4_delegation *dp = cb_to_delegation(cb); @@ -4867,8 +4548,6 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb, { struct nfs4_delegation *dp = cb_to_delegation(cb); - trace_nfsd_cb_recall_done(&dp->dl_stid.sc_stateid, task); - if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID || dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID) return 1; @@ -4914,30 +4593,22 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp) * We're assuming the state code never drops its reference * without first removing the lease. Since we're in this lease * callback (and since the lease code is serialized by the - * flc_lock) we know the server hasn't removed the lease yet, and + * i_lock) we know the server hasn't removed the lease yet, and * we know it's safe to take a reference. */ refcount_inc(&dp->dl_stid.sc_count); - WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall)); + nfsd4_run_cb(&dp->dl_recall); } -/* Called from break_lease() with flc_lock held. */ +/* Called from break_lease() with i_lock held. */ static bool nfsd_break_deleg_cb(struct file_lock *fl) { + bool ret = false; struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner; struct nfs4_file *fp = dp->dl_stid.sc_file; - struct nfs4_client *clp = dp->dl_stid.sc_client; - struct nfsd_net *nn; - trace_nfsd_cb_recall(&dp->dl_stid); - - dp->dl_recalled = true; - atomic_inc(&clp->cl_delegs_in_recall); - if (try_to_expire_client(clp)) { - nn = net_generic(clp->net, nfsd_net_id); - mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); - } + trace_nfsd_deleg_break(&dp->dl_stid.sc_stateid); /* * We don't want the locks code to timeout the lease for us; @@ -4946,9 +4617,11 @@ nfsd_break_deleg_cb(struct file_lock *fl) */ fl->fl_break_time = 0; + spin_lock(&fp->fi_lock); fp->fi_had_conflict = true; nfsd_break_one_deleg(dp); - return false; + spin_unlock(&fp->fi_lock); + return ret; } /** @@ -4979,14 +4652,9 @@ static int nfsd_change_deleg_cb(struct file_lock *onlist, int arg, struct list_head *dispose) { - struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner; - struct nfs4_client *clp = dp->dl_stid.sc_client; - - if (arg & F_UNLCK) { - if (dp->dl_recalled) - atomic_dec(&clp->cl_delegs_in_recall); + if (arg & F_UNLCK) return lease_modify(onlist, arg, dispose); - } else + else return -EAGAIN; } @@ -5007,37 +4675,40 @@ static __be32 nfsd4_check_seqid(struct nfsd4_compound_state *cstate, struct nfs4 return nfserr_bad_seqid; } -static struct nfs4_client *lookup_clientid(clientid_t *clid, bool sessions, - struct nfsd_net *nn) +static __be32 lookup_clientid(clientid_t *clid, + struct nfsd4_compound_state *cstate, + struct nfsd_net *nn, + bool sessions) { struct nfs4_client *found; - spin_lock(&nn->client_lock); - found = find_confirmed_client(clid, sessions, nn); - if (found) - atomic_inc(&found->cl_rpc_users); - spin_unlock(&nn->client_lock); - return found; -} - -static __be32 set_client(clientid_t *clid, - struct nfsd4_compound_state *cstate, - struct nfsd_net *nn) -{ if (cstate->clp) { - if (!same_clid(&cstate->clp->cl_clientid, clid)) + found = cstate->clp; + if (!same_clid(&found->cl_clientid, clid)) return nfserr_stale_clientid; return nfs_ok; } + if (STALE_CLIENTID(clid, nn)) return nfserr_stale_clientid; + /* - * We're in the 4.0 case (otherwise the SEQUENCE op would have - * set cstate->clp), so session = false: + * For v4.1+ we get the client in the SEQUENCE op. If we don't have one + * cached already then we know this is for is for v4.0 and "sessions" + * will be false. */ - cstate->clp = lookup_clientid(clid, false, nn); - if (!cstate->clp) + WARN_ON_ONCE(cstate->session); + spin_lock(&nn->client_lock); + found = find_confirmed_client(clid, sessions, nn); + if (!found) { + spin_unlock(&nn->client_lock); return nfserr_expired; + } + atomic_inc(&found->cl_rpc_users); + spin_unlock(&nn->client_lock); + + /* Cache the nfs4_client in cstate! */ + cstate->clp = found; return nfs_ok; } @@ -5051,6 +4722,8 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, struct nfs4_openowner *oo = NULL; __be32 status; + if (STALE_CLIENTID(&open->op_clientid, nn)) + return nfserr_stale_clientid; /* * In case we need it later, after we've already created the * file and don't want to risk a further failure: @@ -5059,7 +4732,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate, if (open->op_file == NULL) return nfserr_jukebox; - status = set_client(clientid, cstate, nn); + status = lookup_clientid(clientid, cstate, nn, false); if (status) return status; clp = cstate->clp; @@ -5183,19 +4856,16 @@ nfsd4_truncate(struct svc_rqst *rqstp, struct svc_fh *fh, .ia_valid = ATTR_SIZE, .ia_size = 0, }; - struct nfsd_attrs attrs = { - .na_iattr = &iattr, - }; if (!open->op_truncate) return 0; if (!(open->op_share_access & NFS4_SHARE_ACCESS_WRITE)) return nfserr_inval; - return nfsd_setattr(rqstp, fh, &attrs, 0, (time64_t)0); + return nfsd_setattr(rqstp, fh, &iattr, 0, (time64_t)0); } static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, - struct nfsd4_open *open, bool new_stp) + struct nfsd4_open *open) { struct nfsd_file *nf = NULL; __be32 status; @@ -5211,13 +4881,6 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, */ status = nfs4_file_check_deny(fp, open->op_share_deny); if (status != nfs_ok) { - if (status != nfserr_share_denied) { - spin_unlock(&fp->fi_lock); - goto out; - } - if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, - stp, open->op_share_deny, false)) - status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5225,13 +4888,6 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, /* set access to the file */ status = nfs4_file_get_access(fp, open->op_share_access); if (status != nfs_ok) { - if (status != nfserr_share_denied) { - spin_unlock(&fp->fi_lock); - goto out; - } - if (nfs4_resolve_deny_conflicts_locked(fp, new_stp, - stp, open->op_share_access, true)) - status = nfserr_jukebox; spin_unlock(&fp->fi_lock); goto out; } @@ -5247,12 +4903,9 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, if (!fp->fi_fds[oflag]) { spin_unlock(&fp->fi_lock); - - status = nfsd_file_acquire_opened(rqstp, cur_fh, access, - open->op_filp, &nf); - if (status != nfs_ok) + status = nfsd_file_acquire(rqstp, cur_fh, access, &nf); + if (status) goto out_put_access; - spin_lock(&fp->fi_lock); if (!fp->fi_fds[oflag]) { fp->fi_fds[oflag] = nf; @@ -5281,30 +4934,21 @@ static __be32 nfs4_get_vfs_file(struct svc_rqst *rqstp, struct nfs4_file *fp, } static __be32 -nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, - struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, - struct nfsd4_open *open) +nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp, struct nfsd4_open *open) { __be32 status; unsigned char old_deny_bmap = stp->st_deny_bmap; if (!test_access(open->op_share_access, stp)) - return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open, false); + return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open); /* test and set deny mode */ spin_lock(&fp->fi_lock); status = nfs4_file_check_deny(fp, open->op_share_deny); - switch (status) { - case nfs_ok: + if (status == nfs_ok) { set_deny(open->op_share_deny, stp); fp->fi_share_deny |= - (open->op_share_deny & NFS4_SHARE_DENY_BOTH); - break; - case nfserr_share_denied: - if (nfs4_resolve_deny_conflicts_locked(fp, false, - stp, open->op_share_deny, false)) - status = nfserr_jukebox; - break; + (open->op_share_deny & NFS4_SHARE_DENY_BOTH); } spin_unlock(&fp->fi_lock); @@ -5348,118 +4992,11 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, return fl; } -static int nfsd4_check_conflicting_opens(struct nfs4_client *clp, - struct nfs4_file *fp) -{ - struct nfs4_ol_stateid *st; - struct file *f = fp->fi_deleg_file->nf_file; - struct inode *ino = locks_inode(f); - int writes; - - writes = atomic_read(&ino->i_writecount); - if (!writes) - return 0; - /* - * There could be multiple filehandles (hence multiple - * nfs4_files) referencing this file, but that's not too - * common; let's just give up in that case rather than - * trying to go look up all the clients using that other - * nfs4_file as well: - */ - if (fp->fi_aliased) - return -EAGAIN; - /* - * If there's a close in progress, make sure that we see it - * clear any fi_fds[] entries before we see it decrement - * i_writecount: - */ - smp_mb__after_atomic(); - - if (fp->fi_fds[O_WRONLY]) - writes--; - if (fp->fi_fds[O_RDWR]) - writes--; - if (writes > 0) - return -EAGAIN; /* There may be non-NFSv4 writers */ - /* - * It's possible there are non-NFSv4 write opens in progress, - * but if they haven't incremented i_writecount yet then they - * also haven't called break lease yet; so, they'll break this - * lease soon enough. So, all that's left to check for is NFSv4 - * opens: - */ - spin_lock(&fp->fi_lock); - list_for_each_entry(st, &fp->fi_stateids, st_perfile) { - if (st->st_openstp == NULL /* it's an open */ && - access_permit_write(st) && - st->st_stid.sc_client != clp) { - spin_unlock(&fp->fi_lock); - return -EAGAIN; - } - } - spin_unlock(&fp->fi_lock); - /* - * There's a small chance that we could be racing with another - * NFSv4 open. However, any open that hasn't added itself to - * the fi_stateids list also hasn't called break_lease yet; so, - * they'll break this lease soon enough. - */ - return 0; -} - -/* - * It's possible that between opening the dentry and setting the delegation, - * that it has been renamed or unlinked. Redo the lookup to verify that this - * hasn't happened. - */ -static int -nfsd4_verify_deleg_dentry(struct nfsd4_open *open, struct nfs4_file *fp, - struct svc_fh *parent) -{ - struct svc_export *exp; - struct dentry *child; - __be32 err; - - err = nfsd_lookup_dentry(open->op_rqstp, parent, - open->op_fname, open->op_fnamelen, - &exp, &child); - - if (err) - return -EAGAIN; - - exp_put(exp); - dput(child); - if (child != file_dentry(fp->fi_deleg_file->nf_file)) - return -EAGAIN; - - return 0; -} - -/* - * We avoid breaking delegations held by a client due to its own activity, but - * clearing setuid/setgid bits on a write is an implicit activity and the client - * may not notice and continue using the old mode. Avoid giving out a delegation - * on setuid/setgid files when the client is requesting an open for write. - */ -static int -nfsd4_verify_setuid_write(struct nfsd4_open *open, struct nfsd_file *nf) -{ - struct inode *inode = file_inode(nf->nf_file); - - if ((open->op_share_access & NFS4_SHARE_ACCESS_WRITE) && - (inode->i_mode & (S_ISUID|S_ISGID))) - return -EAGAIN; - return 0; -} - static struct nfs4_delegation * -nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, - struct svc_fh *parent) +nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh, + struct nfs4_file *fp, struct nfs4_clnt_odstate *odstate) { int status = 0; - struct nfs4_client *clp = stp->st_stid.sc_client; - struct nfs4_file *fp = stp->st_stid.sc_file; - struct nfs4_clnt_odstate *odstate = stp->st_clnt_odstate; struct nfs4_delegation *dp; struct nfsd_file *nf; struct file_lock *fl; @@ -5474,19 +5011,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, nf = find_readable_file(fp); if (!nf) { - /* - * We probably could attempt another open and get a read - * delegation, but for now, don't bother until the - * client actually sends us one. - */ - return ERR_PTR(-EAGAIN); + /* We should always have a readable file here */ + WARN_ON_ONCE(1); + return ERR_PTR(-EBADF); } spin_lock(&state_lock); spin_lock(&fp->fi_lock); if (nfs4_delegation_exists(clp, fp)) status = -EAGAIN; - else if (nfsd4_verify_setuid_write(open, nf)) - status = -EAGAIN; else if (!fp->fi_deleg_file) { fp->fi_deleg_file = nf; /* increment early to prevent fi_deleg_file from being @@ -5503,7 +5035,7 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, return ERR_PTR(status); status = -ENOMEM; - dp = alloc_init_deleg(clp, fp, odstate); + dp = alloc_init_deleg(clp, fp, fh, odstate); if (!dp) goto out_delegees; @@ -5517,31 +5049,12 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, if (status) goto out_clnt_odstate; - if (parent) { - status = nfsd4_verify_deleg_dentry(open, fp, parent); - if (status) - goto out_unlock; - } - - status = nfsd4_check_conflicting_opens(clp, fp); - if (status) - goto out_unlock; - - /* - * Now that the deleg is set, check again to ensure that nothing - * raced in and changed the mode while we weren't lookng. - */ - status = nfsd4_verify_setuid_write(open, fp->fi_deleg_file); - if (status) - goto out_unlock; - - status = -EAGAIN; - if (fp->fi_had_conflict) - goto out_unlock; - spin_lock(&state_lock); spin_lock(&fp->fi_lock); - status = hash_delegation_locked(dp, fp); + if (fp->fi_had_conflict) + status = -EAGAIN; + else + status = hash_delegation_locked(dp, fp); spin_unlock(&fp->fi_lock); spin_unlock(&state_lock); @@ -5587,13 +5100,12 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status) * proper support for them. */ static void -nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, - struct svc_fh *currentfh) +nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, + struct nfs4_ol_stateid *stp) { struct nfs4_delegation *dp; struct nfs4_openowner *oo = openowner(stp->st_stateowner); struct nfs4_client *clp = stp->st_stid.sc_client; - struct svc_fh *parent = NULL; int cb_up; int status = 0; @@ -5607,8 +5119,6 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, goto out_no_deleg; break; case NFS4_OPEN_CLAIM_NULL: - parent = currentfh; - fallthrough; case NFS4_OPEN_CLAIM_FH: /* * Let's not give out any delegations till everyone's @@ -5619,11 +5129,22 @@ nfs4_open_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp, goto out_no_deleg; if (!cb_up || !(oo->oo_flags & NFS4_OO_CONFIRMED)) goto out_no_deleg; + /* + * Also, if the file was opened for write or + * create, there's a good chance the client's + * about to write to it, resulting in an + * immediate recall (since we don't support + * write delegations): + */ + if (open->op_share_access & NFS4_SHARE_ACCESS_WRITE) + goto out_no_deleg; + if (open->op_create == NFS4_OPEN_CREATE) + goto out_no_deleg; break; default: goto out_no_deleg; } - dp = nfs4_set_delegation(open, stp, parent); + dp = nfs4_set_delegation(clp, fh, stp->st_stid.sc_file, stp->st_clnt_odstate); if (IS_ERR(dp)) goto out_no_deleg; @@ -5665,18 +5186,6 @@ static void nfsd4_deleg_xgrade_none_ext(struct nfsd4_open *open, */ } -/** - * nfsd4_process_open2 - finish open processing - * @rqstp: the RPC transaction being executed - * @current_fh: NFSv4 COMPOUND's current filehandle - * @open: OPEN arguments - * - * If successful, (1) truncate the file if open->op_truncate was - * set, (2) set open->op_stateid, (3) set open->op_delegation. - * - * Returns %nfs_ok on success; otherwise an nfs4stat value in - * network byte order is returned. - */ __be32 nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nfsd4_open *open) { @@ -5693,9 +5202,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * and check for delegations in the process of being recalled. * If not found, create the nfs4_file struct */ - fp = nfsd4_file_hash_insert(open->op_file, current_fh); - if (unlikely(!fp)) - return nfserr_jukebox; + fp = find_or_add_file(open->op_file, ¤t_fh->fh_handle); if (fp != open->op_file) { status = nfs4_check_deleg(cl, open, &dp); if (status) @@ -5728,7 +5235,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open, true); + status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { stp->st_stid.sc_type = NFS4_CLOSED_STID; release_open_stateid(stp); @@ -5757,7 +5264,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf * Attempt to hand out a delegation. No error return, because the * OPEN succeeds even if we fail. */ - nfs4_open_delegation(open, stp, &resp->cstate.current_fh); + nfs4_open_delegation(current_fh, open, stp); nodeleg: status = nfs_ok; trace_nfsd_open(&stp->st_stid.sc_stateid); @@ -5815,14 +5322,17 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); trace_nfsd_clid_renew(clid); - status = set_client(clid, cstate, nn); + status = lookup_clientid(clid, cstate, nn, false); if (status) - return status; + goto out; clp = cstate->clp; + status = nfserr_cb_path_down; if (!list_empty(&clp->cl_delegations) && clp->cl_cb_state != NFSD4_CB_UP) - return nfserr_cb_path_down; - return nfs_ok; + goto out; + status = nfs_ok; +out: + return status; } void @@ -5883,245 +5393,66 @@ static bool clients_still_reclaiming(struct nfsd_net *nn) return true; } -struct laundry_time { - time64_t cutoff; - time64_t new_timeo; -}; - -static bool state_expired(struct laundry_time *lt, time64_t last_refresh) -{ - time64_t time_remaining; - - if (last_refresh < lt->cutoff) - return true; - time_remaining = last_refresh - lt->cutoff; - lt->new_timeo = min(lt->new_timeo, time_remaining); - return false; -} - -#ifdef CONFIG_NFSD_V4_2_INTER_SSC -void nfsd4_ssc_init_umount_work(struct nfsd_net *nn) -{ - spin_lock_init(&nn->nfsd_ssc_lock); - INIT_LIST_HEAD(&nn->nfsd_ssc_mount_list); - init_waitqueue_head(&nn->nfsd_ssc_waitq); -} -EXPORT_SYMBOL_GPL(nfsd4_ssc_init_umount_work); - -/* - * This is called when nfsd is being shutdown, after all inter_ssc - * cleanup were done, to destroy the ssc delayed unmount list. - */ -static void nfsd4_ssc_shutdown_umount(struct nfsd_net *nn) -{ - struct nfsd4_ssc_umount_item *ni = NULL; - struct nfsd4_ssc_umount_item *tmp; - - spin_lock(&nn->nfsd_ssc_lock); - list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { - list_del(&ni->nsui_list); - spin_unlock(&nn->nfsd_ssc_lock); - mntput(ni->nsui_vfsmount); - kfree(ni); - spin_lock(&nn->nfsd_ssc_lock); - } - spin_unlock(&nn->nfsd_ssc_lock); -} - -static void nfsd4_ssc_expire_umount(struct nfsd_net *nn) -{ - bool do_wakeup = false; - struct nfsd4_ssc_umount_item *ni = NULL; - struct nfsd4_ssc_umount_item *tmp; - - spin_lock(&nn->nfsd_ssc_lock); - list_for_each_entry_safe(ni, tmp, &nn->nfsd_ssc_mount_list, nsui_list) { - if (time_after(jiffies, ni->nsui_expire)) { - if (refcount_read(&ni->nsui_refcnt) > 1) - continue; - - /* mark being unmount */ - ni->nsui_busy = true; - spin_unlock(&nn->nfsd_ssc_lock); - mntput(ni->nsui_vfsmount); - spin_lock(&nn->nfsd_ssc_lock); - - /* waiters need to start from begin of list */ - list_del(&ni->nsui_list); - kfree(ni); - - /* wakeup ssc_connect waiters */ - do_wakeup = true; - continue; - } - break; - } - if (do_wakeup) - wake_up_all(&nn->nfsd_ssc_waitq); - spin_unlock(&nn->nfsd_ssc_lock); -} -#endif - -/* Check if any lock belonging to this lockowner has any blockers */ -static bool -nfs4_lockowner_has_blockers(struct nfs4_lockowner *lo) -{ - struct file_lock_context *ctx; - struct nfs4_ol_stateid *stp; - struct nfs4_file *nf; - - list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { - nf = stp->st_stid.sc_file; - ctx = locks_inode_context(nf->fi_inode); - if (!ctx) - continue; - if (locks_owner_has_blockers(ctx, lo)) - return true; - } - return false; -} - -static bool -nfs4_anylock_blockers(struct nfs4_client *clp) -{ - int i; - struct nfs4_stateowner *so; - struct nfs4_lockowner *lo; - - if (atomic_read(&clp->cl_delegs_in_recall)) - return true; - spin_lock(&clp->cl_lock); - for (i = 0; i < OWNER_HASH_SIZE; i++) { - list_for_each_entry(so, &clp->cl_ownerstr_hashtbl[i], - so_strhash) { - if (so->so_is_open_owner) - continue; - lo = lockowner(so); - if (nfs4_lockowner_has_blockers(lo)) { - spin_unlock(&clp->cl_lock); - return true; - } - } - } - spin_unlock(&clp->cl_lock); - return false; -} - -static void -nfs4_get_client_reaplist(struct nfsd_net *nn, struct list_head *reaplist, - struct laundry_time *lt) -{ - unsigned int maxreap, reapcnt = 0; - struct list_head *pos, *next; - struct nfs4_client *clp; - - maxreap = (atomic_read(&nn->nfs4_client_count) >= nn->nfs4_max_clients) ? - NFSD_CLIENT_MAX_TRIM_PER_RUN : 0; - INIT_LIST_HEAD(reaplist); - spin_lock(&nn->client_lock); - list_for_each_safe(pos, next, &nn->client_lru) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - if (clp->cl_state == NFSD4_EXPIRABLE) - goto exp_client; - if (!state_expired(lt, clp->cl_time)) - break; - if (!atomic_read(&clp->cl_rpc_users)) { - if (clp->cl_state == NFSD4_ACTIVE) - atomic_inc(&nn->nfsd_courtesy_clients); - clp->cl_state = NFSD4_COURTESY; - } - if (!client_has_state(clp)) - goto exp_client; - if (!nfs4_anylock_blockers(clp)) - if (reapcnt >= maxreap) - continue; -exp_client: - if (!mark_client_expired_locked(clp)) { - list_add(&clp->cl_lru, reaplist); - reapcnt++; - } - } - spin_unlock(&nn->client_lock); -} - -static void -nfs4_get_courtesy_client_reaplist(struct nfsd_net *nn, - struct list_head *reaplist) -{ - unsigned int maxreap = 0, reapcnt = 0; - struct list_head *pos, *next; - struct nfs4_client *clp; - - maxreap = NFSD_CLIENT_MAX_TRIM_PER_RUN; - INIT_LIST_HEAD(reaplist); - - spin_lock(&nn->client_lock); - list_for_each_safe(pos, next, &nn->client_lru) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - if (clp->cl_state == NFSD4_ACTIVE) - break; - if (reapcnt >= maxreap) - break; - if (!mark_client_expired_locked(clp)) { - list_add(&clp->cl_lru, reaplist); - reapcnt++; - } - } - spin_unlock(&nn->client_lock); -} - -static void -nfs4_process_client_reaplist(struct list_head *reaplist) -{ - struct list_head *pos, *next; - struct nfs4_client *clp; - - list_for_each_safe(pos, next, reaplist) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - trace_nfsd_clid_purged(&clp->cl_clientid); - list_del_init(&clp->cl_lru); - expire_client(clp); - } -} - static time64_t nfs4_laundromat(struct nfsd_net *nn) { + struct nfs4_client *clp; struct nfs4_openowner *oo; struct nfs4_delegation *dp; struct nfs4_ol_stateid *stp; struct nfsd4_blocked_lock *nbl; struct list_head *pos, *next, reaplist; - struct laundry_time lt = { - .cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease, - .new_timeo = nn->nfsd4_lease - }; + time64_t cutoff = ktime_get_boottime_seconds() - nn->nfsd4_lease; + time64_t t, new_timeo = nn->nfsd4_lease; struct nfs4_cpntf_state *cps; copy_stateid_t *cps_t; int i; if (clients_still_reclaiming(nn)) { - lt.new_timeo = 0; + new_timeo = 0; goto out; } nfsd4_end_grace(nn); + INIT_LIST_HEAD(&reaplist); spin_lock(&nn->s2s_cp_lock); idr_for_each_entry(&nn->s2s_cp_stateids, cps_t, i) { cps = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (cps->cp_stateid.cs_type == NFS4_COPYNOTIFY_STID && - state_expired(<, cps->cpntf_time)) + if (cps->cp_stateid.sc_type == NFS4_COPYNOTIFY_STID && + cps->cpntf_time < cutoff) _free_cpntf_state_locked(nn, cps); } spin_unlock(&nn->s2s_cp_lock); - nfs4_get_client_reaplist(nn, &reaplist, <); - nfs4_process_client_reaplist(&reaplist); + spin_lock(&nn->client_lock); + list_for_each_safe(pos, next, &nn->client_lru) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + if (clp->cl_time > cutoff) { + t = clp->cl_time - cutoff; + new_timeo = min(new_timeo, t); + break; + } + if (mark_client_expired_locked(clp)) { + trace_nfsd_clid_expired(&clp->cl_clientid); + continue; + } + list_add(&clp->cl_lru, &reaplist); + } + spin_unlock(&nn->client_lock); + list_for_each_safe(pos, next, &reaplist) { + clp = list_entry(pos, struct nfs4_client, cl_lru); + trace_nfsd_clid_purged(&clp->cl_clientid); + list_del_init(&clp->cl_lru); + expire_client(clp); + } spin_lock(&state_lock); list_for_each_safe(pos, next, &nn->del_recall_lru) { dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru); - if (!state_expired(<, dp->dl_time)) + if (dp->dl_time > cutoff) { + t = dp->dl_time - cutoff; + new_timeo = min(new_timeo, t); break; + } WARN_ON(!unhash_delegation_locked(dp)); list_add(&dp->dl_recall_lru, &reaplist); } @@ -6137,8 +5468,11 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->close_lru)) { oo = list_first_entry(&nn->close_lru, struct nfs4_openowner, oo_close_lru); - if (!state_expired(<, oo->oo_time)) + if (oo->oo_time > cutoff) { + t = oo->oo_time - cutoff; + new_timeo = min(new_timeo, t); break; + } list_del_init(&oo->oo_close_lru); stp = oo->oo_last_closed_stid; oo->oo_last_closed_stid = NULL; @@ -6164,8 +5498,11 @@ nfs4_laundromat(struct nfsd_net *nn) while (!list_empty(&nn->blocked_locks_lru)) { nbl = list_first_entry(&nn->blocked_locks_lru, struct nfsd4_blocked_lock, nbl_lru); - if (!state_expired(<, nbl->nbl_time)) + if (nbl->nbl_time > cutoff) { + t = nbl->nbl_time - cutoff; + new_timeo = min(new_timeo, t); break; + } list_move(&nbl->nbl_lru, &reaplist); list_del_init(&nbl->nbl_list); } @@ -6177,14 +5514,12 @@ nfs4_laundromat(struct nfsd_net *nn) list_del_init(&nbl->nbl_lru); free_blocked_lock(nbl); } -#ifdef CONFIG_NFSD_V4_2_INTER_SSC - /* service the server-to-server copy delayed unmount list */ - nfsd4_ssc_expire_umount(nn); -#endif out: - return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); + new_timeo = max_t(time64_t, new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT); + return new_timeo; } +static struct workqueue_struct *laundry_wq; static void laundromat_main(struct work_struct *); static void @@ -6199,63 +5534,6 @@ laundromat_main(struct work_struct *laundry) queue_delayed_work(laundry_wq, &nn->laundromat_work, t*HZ); } -static void -courtesy_client_reaper(struct nfsd_net *nn) -{ - struct list_head reaplist; - - nfs4_get_courtesy_client_reaplist(nn, &reaplist); - nfs4_process_client_reaplist(&reaplist); -} - -static void -deleg_reaper(struct nfsd_net *nn) -{ - struct list_head *pos, *next; - struct nfs4_client *clp; - struct list_head cblist; - - INIT_LIST_HEAD(&cblist); - spin_lock(&nn->client_lock); - list_for_each_safe(pos, next, &nn->client_lru) { - clp = list_entry(pos, struct nfs4_client, cl_lru); - if (clp->cl_state != NFSD4_ACTIVE || - list_empty(&clp->cl_delegations) || - atomic_read(&clp->cl_delegs_in_recall) || - test_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags) || - (ktime_get_boottime_seconds() - - clp->cl_ra_time < 5)) { - continue; - } - list_add(&clp->cl_ra_cblist, &cblist); - - /* release in nfsd4_cb_recall_any_release */ - atomic_inc(&clp->cl_rpc_users); - set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); - clp->cl_ra_time = ktime_get_boottime_seconds(); - } - spin_unlock(&nn->client_lock); - - while (!list_empty(&cblist)) { - clp = list_first_entry(&cblist, struct nfs4_client, - cl_ra_cblist); - list_del_init(&clp->cl_ra_cblist); - clp->cl_ra->ra_keep = 0; - clp->cl_ra->ra_bmval[0] = BIT(RCA4_TYPE_MASK_RDATA_DLG); - nfsd4_run_cb(&clp->cl_ra->ra_cb); - } -} - -static void -nfsd4_state_shrinker_worker(struct work_struct *work) -{ - struct nfsd_net *nn = container_of(work, struct nfsd_net, - nfsd_shrinker_work); - - courtesy_client_reaper(nn); - deleg_reaper(nn); -} - static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) { if (!fh_match(&fhp->fh_handle, &stp->sc_file->fi_fhandle)) @@ -6263,6 +5541,21 @@ static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp) return nfs_ok; } +static inline int +access_permit_read(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_READ, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp) || + test_access(NFS4_SHARE_ACCESS_WRITE, stp); +} + +static inline int +access_permit_write(struct nfs4_ol_stateid *stp) +{ + return test_access(NFS4_SHARE_ACCESS_WRITE, stp) || + test_access(NFS4_SHARE_ACCESS_BOTH, stp); +} + static __be32 nfs4_check_openmode(struct nfs4_ol_stateid *stp, int flags) { @@ -6399,7 +5692,6 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, struct nfs4_stid **s, struct nfsd_net *nn) { __be32 status; - struct nfs4_stid *stid; bool return_revoked = false; /* @@ -6414,7 +5706,8 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, if (ZERO_STATEID(stateid) || ONE_STATEID(stateid) || CLOSE_STATEID(stateid)) return nfserr_bad_stateid; - status = set_client(&stateid->si_opaque.so_clid, cstate, nn); + status = lookup_clientid(&stateid->si_opaque.so_clid, cstate, nn, + false); if (status == nfserr_stale_clientid) { if (cstate->session) return nfserr_bad_stateid; @@ -6422,16 +5715,15 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate, } if (status) return status; - stid = find_stateid_by_type(cstate->clp, stateid, typemask); - if (!stid) + *s = find_stateid_by_type(cstate->clp, stateid, typemask); + if (!*s) return nfserr_bad_stateid; - if ((stid->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { - nfs4_put_stid(stid); + if (((*s)->sc_type == NFS4_REVOKED_DELEG_STID) && !return_revoked) { + nfs4_put_stid(*s); if (cstate->minorversion) return nfserr_deleg_revoked; return nfserr_bad_stateid; } - *s = stid; return nfs_ok; } @@ -6496,12 +5788,12 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s, static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps) { - WARN_ON_ONCE(cps->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID); - if (!refcount_dec_and_test(&cps->cp_stateid.cs_count)) + WARN_ON_ONCE(cps->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID); + if (!refcount_dec_and_test(&cps->cp_stateid.sc_count)) return; list_del(&cps->cp_list); idr_remove(&nn->s2s_cp_stateids, - cps->cp_stateid.cs_stid.si_opaque.so_id); + cps->cp_stateid.stid.si_opaque.so_id); kfree(cps); } /* @@ -6523,12 +5815,12 @@ __be32 manage_cpntf_state(struct nfsd_net *nn, stateid_t *st, if (cps_t) { state = container_of(cps_t, struct nfs4_cpntf_state, cp_stateid); - if (state->cp_stateid.cs_type != NFS4_COPYNOTIFY_STID) { + if (state->cp_stateid.sc_type != NFS4_COPYNOTIFY_STID) { state = NULL; goto unlock; } if (!clp) - refcount_inc(&state->cp_stateid.cs_count); + refcount_inc(&state->cp_stateid.sc_count); else _free_cpntf_state_locked(nn, state); } @@ -6546,27 +5838,21 @@ static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st, { __be32 status; struct nfs4_cpntf_state *cps = NULL; - struct nfs4_client *found; + struct nfsd4_compound_state cstate; status = manage_cpntf_state(nn, st, NULL, &cps); if (status) return status; cps->cpntf_time = ktime_get_boottime_seconds(); - - status = nfserr_expired; - found = lookup_clientid(&cps->cp_p_clid, true, nn); - if (!found) + memset(&cstate, 0, sizeof(cstate)); + status = lookup_clientid(&cps->cp_p_clid, &cstate, nn, true); + if (status) goto out; - - *stid = find_stateid_by_type(found, &cps->cp_p_stateid, - NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID); - if (*stid) - status = nfs_ok; - else - status = nfserr_bad_stateid; - - put_client_renew(found); + status = nfsd4_lookup_stateid(&cstate, &cps->cp_p_stateid, + NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID, + stid, nn); + put_client_renew(cstate.clp); out: nfs4_put_cpntf_state(nn, cps); return status; @@ -6601,11 +5887,7 @@ nfs4_preprocess_stateid_op(struct svc_rqst *rqstp, return nfserr_grace; if (ZERO_STATEID(stateid) || ONE_STATEID(stateid)) { - if (cstid) - status = nfserr_bad_stateid; - else - status = check_special_stateids(net, fhp, stateid, - flags); + status = check_special_stateids(net, fhp, stateid, flags); goto done; } @@ -6659,7 +5941,7 @@ nfsd4_test_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, { struct nfsd4_test_stateid *test_stateid = &u->test_stateid; struct nfsd4_test_stateid_id *stateid; - struct nfs4_client *cl = cstate->clp; + struct nfs4_client *cl = cstate->session->se_client; list_for_each_entry(stateid, &test_stateid->ts_stateid_list, ts_id_list) stateid->ts_id_status = @@ -6705,7 +5987,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, stateid_t *stateid = &free_stateid->fr_stateid; struct nfs4_stid *s; struct nfs4_delegation *dp; - struct nfs4_client *cl = cstate->clp; + struct nfs4_client *cl = cstate->session->se_client; __be32 ret = nfserr_bad_stateid; spin_lock(&cl->cl_lock); @@ -7034,8 +6316,6 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (status) goto put_stateid; - trace_nfsd_deleg_return(stateid); - wake_up_var(d_inode(cstate->current_fh.fh_dentry)); destroy_delegation(dp); put_stateid: nfs4_put_stid(&dp->dl_stid); @@ -7043,6 +6323,15 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return status; } +static inline u64 +end_offset(u64 start, u64 len) +{ + u64 end; + + end = start + len; + return end >= start ? end: NFS4_MAX_UINT64; +} + /* last octet in a range */ static inline u64 last_byte_offset(u64 start, u64 len) @@ -7072,7 +6361,7 @@ nfs4_transform_lock_offset(struct file_lock *lock) } static fl_owner_t -nfsd4_lm_get_owner(fl_owner_t owner) +nfsd4_fl_get_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner; @@ -7081,7 +6370,7 @@ nfsd4_lm_get_owner(fl_owner_t owner) } static void -nfsd4_lm_put_owner(fl_owner_t owner) +nfsd4_fl_put_owner(fl_owner_t owner) { struct nfs4_lockowner *lo = (struct nfs4_lockowner *)owner; @@ -7089,29 +6378,6 @@ nfsd4_lm_put_owner(fl_owner_t owner) nfs4_put_stateowner(&lo->lo_owner); } -/* return pointer to struct nfs4_client if client is expirable */ -static bool -nfsd4_lm_lock_expirable(struct file_lock *cfl) -{ - struct nfs4_lockowner *lo = (struct nfs4_lockowner *)cfl->fl_owner; - struct nfs4_client *clp = lo->lo_owner.so_client; - struct nfsd_net *nn; - - if (try_to_expire_client(clp)) { - nn = net_generic(clp->net, nfsd_net_id); - mod_delayed_work(laundry_wq, &nn->laundromat_work, 0); - return true; - } - return false; -} - -/* schedule laundromat to run immediately and wait for it to complete */ -static void -nfsd4_lm_expire_lock(void) -{ - flush_workqueue(laundry_wq); -} - static void nfsd4_lm_notify(struct file_lock *fl) { @@ -7131,19 +6397,14 @@ nfsd4_lm_notify(struct file_lock *fl) } spin_unlock(&nn->blocked_locks_lock); - if (queue) { - trace_nfsd_cb_notify_lock(lo, nbl); + if (queue) nfsd4_run_cb(&nbl->nbl_cb); - } } static const struct lock_manager_operations nfsd_posix_mng_ops = { - .lm_mod_owner = THIS_MODULE, .lm_notify = nfsd4_lm_notify, - .lm_get_owner = nfsd4_lm_get_owner, - .lm_put_owner = nfsd4_lm_put_owner, - .lm_lock_expirable = nfsd4_lm_lock_expirable, - .lm_expire_lock = nfsd4_lm_expire_lock, + .lm_get_owner = nfsd4_fl_get_owner, + .lm_put_owner = nfsd4_fl_put_owner, }; static inline void @@ -7458,9 +6719,13 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (nfsd4_has_session(cstate)) /* See rfc 5661 18.10.3: given clientid is ignored: */ memcpy(&lock->lk_new_clientid, - &cstate->clp->cl_clientid, + &cstate->session->se_client->cl_clientid, sizeof(clientid_t)); + status = nfserr_stale_clientid; + if (STALE_CLIENTID(&lock->lk_new_clientid, nn)) + goto out; + /* validate and update open stateid and open seqid */ status = nfs4_preprocess_confirmed_seqid_op(cstate, lock->lk_new_open_seqid, @@ -7498,9 +6763,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (!locks_in_grace(net) && lock->lk_reclaim) goto out; - if (lock->lk_reclaim) - fl_flags |= FL_RECLAIM; - fp = lock_stp->st_stid.sc_file; switch (lock->lk_type) { case NFS4_READW_LT: @@ -7537,16 +6799,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, goto out; } - /* - * Most filesystems with their own ->lock operations will block - * the nfsd thread waiting to acquire the lock. That leads to - * deadlocks (we don't want every nfsd thread tied up waiting - * for file locks), so don't attempt blocking lock notifications - * on those filesystems: - */ - if (nf->nf_file->f_op->lock) - fl_flags &= ~FL_SLEEP; - nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn); if (!nbl) { dprintk("NFSD: %s: unable to allocate block!\n", __func__); @@ -7577,7 +6829,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, spin_lock(&nn->blocked_locks_lock); list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked); list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru); - kref_get(&nbl->nbl_kref); spin_unlock(&nn->blocked_locks_lock); } @@ -7590,7 +6841,6 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, nn->somebody_reclaimed = true; break; case FILE_LOCK_DEFERRED: - kref_put(&nbl->nbl_kref, free_nbl); nbl = NULL; fallthrough; case -EAGAIN: /* conflock holds conflicting lock */ @@ -7611,13 +6861,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, /* dequeue it if we queued it before */ if (fl_flags & FL_SLEEP) { spin_lock(&nn->blocked_locks_lock); - if (!list_empty(&nbl->nbl_list) && - !list_empty(&nbl->nbl_lru)) { - list_del_init(&nbl->nbl_list); - list_del_init(&nbl->nbl_lru); - kref_put(&nbl->nbl_kref, free_nbl); - } - /* nbl can use one of lists to be linked to reaplist */ + list_del_init(&nbl->nbl_list); + list_del_init(&nbl->nbl_lru); spin_unlock(&nn->blocked_locks_lock); } free_blocked_lock(nbl); @@ -7658,22 +6903,21 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file_lock *lock) { struct nfsd_file *nf; - struct inode *inode; __be32 err; err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; - inode = fhp->fh_dentry->d_inode; - inode_lock(inode); /* to block new leases till after test_lock: */ - err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ)); + fh_lock(fhp); /* to block new leases till after test_lock: */ + err = nfserrno(nfsd_open_break_lease(fhp->fh_dentry->d_inode, + NFSD_MAY_READ)); if (err) goto out; lock->fl_file = nf->nf_file; err = nfserrno(vfs_test_lock(nf->nf_file, lock)); lock->fl_file = NULL; out: - inode_unlock(inode); + fh_unlock(fhp); nfsd_file_put(nf); return err; } @@ -7698,7 +6942,8 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, return nfserr_inval; if (!nfsd4_has_session(cstate)) { - status = set_client(&lockt->lt_clientid, cstate, nn); + status = lookup_clientid(&lockt->lt_clientid, cstate, nn, + false); if (status) goto out; } @@ -7835,20 +7080,18 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) { struct file_lock *fl; int status = false; - struct nfsd_file *nf; + struct nfsd_file *nf = find_any_file(fp); struct inode *inode; struct file_lock_context *flctx; - spin_lock(&fp->fi_lock); - nf = find_any_file_locked(fp); if (!nf) { /* Any valid lock stateid should have some sort of access */ WARN_ON_ONCE(1); - goto out; + return status; } inode = locks_inode(nf->nf_file); - flctx = locks_inode_context(inode); + flctx = inode->i_flctx; if (flctx && !list_empty_careful(&flctx->flc_posix)) { spin_lock(&flctx->flc_lock); @@ -7860,62 +7103,57 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner) } spin_unlock(&flctx->flc_lock); } -out: - spin_unlock(&fp->fi_lock); + nfsd_file_put(nf); return status; } -/** - * nfsd4_release_lockowner - process NFSv4.0 RELEASE_LOCKOWNER operations - * @rqstp: RPC transaction - * @cstate: NFSv4 COMPOUND state - * @u: RELEASE_LOCKOWNER arguments - * - * Check if theree are any locks still held and if not - free the lockowner - * and any lock state that is owned. - * - * Return values: - * %nfs_ok: lockowner released or not found - * %nfserr_locks_held: lockowner still in use - * %nfserr_stale_clientid: clientid no longer active - * %nfserr_expired: clientid not recognized - */ __be32 nfsd4_release_lockowner(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, union nfsd4_op_u *u) { struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); clientid_t *clid = &rlockowner->rl_clientid; + struct nfs4_stateowner *sop; + struct nfs4_lockowner *lo = NULL; struct nfs4_ol_stateid *stp; - struct nfs4_lockowner *lo; - struct nfs4_client *clp; - LIST_HEAD(reaplist); + struct xdr_netobj *owner = &rlockowner->rl_owner; + unsigned int hashval = ownerstr_hashval(owner); __be32 status; + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfs4_client *clp; + LIST_HEAD (reaplist); dprintk("nfsd4_release_lockowner clientid: (%08x/%08x):\n", clid->cl_boot, clid->cl_id); - status = set_client(clid, cstate, nn); + status = lookup_clientid(clid, cstate, nn, false); if (status) return status; + clp = cstate->clp; - + /* Find the matching lock stateowner */ spin_lock(&clp->cl_lock); - lo = find_lockowner_str_locked(clp, &rlockowner->rl_owner); - if (!lo) { - spin_unlock(&clp->cl_lock); - return nfs_ok; - } + list_for_each_entry(sop, &clp->cl_ownerstr_hashtbl[hashval], + so_strhash) { - list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { - if (check_for_locks(stp->st_stid.sc_file, lo)) { + if (sop->so_is_open_owner || !same_owner_str(sop, owner)) + continue; + + if (atomic_read(&sop->so_count) != 1) { spin_unlock(&clp->cl_lock); - nfs4_put_stateowner(&lo->lo_owner); return nfserr_locks_held; } + + lo = lockowner(sop); + nfs4_get_stateowner(sop); + break; } + if (!lo) { + spin_unlock(&clp->cl_lock); + return status; + } + unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) { stp = list_first_entry(&lo->lo_owner.so_stateids, @@ -7925,11 +7163,11 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp, put_ol_stateid_locked(stp, &reaplist); } spin_unlock(&clp->cl_lock); - free_ol_stateid_reaplist(&reaplist); remove_blocked_locks(lo); nfs4_put_stateowner(&lo->lo_owner); - return nfs_ok; + + return status; } static inline struct nfs4_client_reclaim * @@ -8018,13 +7256,25 @@ nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn) return NULL; } +/* +* Called from OPEN. Look for clientid in reclaim list. +*/ __be32 -nfs4_check_open_reclaim(struct nfs4_client *clp) +nfs4_check_open_reclaim(clientid_t *clid, + struct nfsd4_compound_state *cstate, + struct nfsd_net *nn) { - if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &clp->cl_flags)) + __be32 status; + + /* find clientid in conf_id_hashtbl */ + status = lookup_clientid(clid, cstate, nn, false); + if (status) + return nfserr_reclaim_bad; + + if (test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags)) return nfserr_no_grace; - if (nfsd4_client_record_check(clp)) + if (nfsd4_client_record_check(cstate->clp)) return nfserr_reclaim_bad; return nfs_ok; @@ -8095,20 +7345,10 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->blocked_locks_lru); INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); - INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker); get_net(net); - nn->nfsd_client_shrinker.scan_objects = nfsd4_state_shrinker_scan; - nn->nfsd_client_shrinker.count_objects = nfsd4_state_shrinker_count; - nn->nfsd_client_shrinker.seeks = DEFAULT_SEEKS; - - if (register_shrinker(&nn->nfsd_client_shrinker)) - goto err_shrinker; return 0; -err_shrinker: - put_net(net); - kfree(nn->sessionid_hashtbl); err_sessionid: kfree(nn->unconf_id_hashtbl); err_unconf_id: @@ -8180,18 +7420,22 @@ nfs4_state_start(void) { int ret; - ret = rhltable_init(&nfs4_file_rhltable, &nfs4_file_rhash_params); - if (ret) - return ret; - - ret = nfsd4_create_callback_queue(); - if (ret) { - rhltable_destroy(&nfs4_file_rhltable); - return ret; + laundry_wq = alloc_workqueue("%s", WQ_UNBOUND, 0, "nfsd4"); + if (laundry_wq == NULL) { + ret = -ENOMEM; + goto out; } + ret = nfsd4_create_callback_queue(); + if (ret) + goto out_free_laundry; set_max_delegations(); return 0; + +out_free_laundry: + destroy_workqueue(laundry_wq); +out: + return ret; } void @@ -8201,8 +7445,6 @@ nfs4_state_shutdown_net(struct net *net) struct list_head *pos, *next, reaplist; struct nfsd_net *nn = net_generic(net, nfsd_net_id); - unregister_shrinker(&nn->nfsd_client_shrinker); - cancel_work(&nn->nfsd_shrinker_work); cancel_delayed_work_sync(&nn->laundromat_work); locks_end_grace(&nn->nfsd4_manager); @@ -8222,16 +7464,13 @@ nfs4_state_shutdown_net(struct net *net) nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); -#ifdef CONFIG_NFSD_V4_2_INTER_SSC - nfsd4_ssc_shutdown_umount(nn); -#endif } void nfs4_state_shutdown(void) { + destroy_workqueue(laundry_wq); nfsd4_destroy_callback_queue(); - rhltable_destroy(&nfs4_file_rhltable); } static void diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 5a68c6286492..dbfa24cf3390 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -42,8 +42,6 @@ #include #include #include -#include - #include #include "idmap.h" @@ -56,8 +54,6 @@ #include "pnfs.h" #include "filecache.h" -#include "trace.h" - #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #include #endif @@ -94,8 +90,6 @@ check_filename(char *str, int len) if (len == 0) return nfserr_inval; - if (len > NFS4_MAXNAMLEN) - return nfserr_nametoolong; if (isdotent(str, len)) return nfserr_badname; for (i = 0; i < len; i++) @@ -104,6 +98,122 @@ check_filename(char *str, int len) return 0; } +#define DECODE_HEAD \ + __be32 *p; \ + __be32 status +#define DECODE_TAIL \ + status = 0; \ +out: \ + return status; \ +xdr_error: \ + dprintk("NFSD: xdr error (%s:%d)\n", \ + __FILE__, __LINE__); \ + status = nfserr_bad_xdr; \ + goto out + +#define READMEM(x,nbytes) do { \ + x = (char *)p; \ + p += XDR_QUADLEN(nbytes); \ +} while (0) +#define SAVEMEM(x,nbytes) do { \ + if (!(x = (p==argp->tmp || p == argp->tmpp) ? \ + savemem(argp, p, nbytes) : \ + (char *)p)) { \ + dprintk("NFSD: xdr error (%s:%d)\n", \ + __FILE__, __LINE__); \ + goto xdr_error; \ + } \ + p += XDR_QUADLEN(nbytes); \ +} while (0) +#define COPYMEM(x,nbytes) do { \ + memcpy((x), p, nbytes); \ + p += XDR_QUADLEN(nbytes); \ +} while (0) + +/* READ_BUF, read_buf(): nbytes must be <= PAGE_SIZE */ +#define READ_BUF(nbytes) do { \ + if (nbytes <= (u32)((char *)argp->end - (char *)argp->p)) { \ + p = argp->p; \ + argp->p += XDR_QUADLEN(nbytes); \ + } else if (!(p = read_buf(argp, nbytes))) { \ + dprintk("NFSD: xdr error (%s:%d)\n", \ + __FILE__, __LINE__); \ + goto xdr_error; \ + } \ +} while (0) + +static void next_decode_page(struct nfsd4_compoundargs *argp) +{ + argp->p = page_address(argp->pagelist[0]); + argp->pagelist++; + if (argp->pagelen < PAGE_SIZE) { + argp->end = argp->p + XDR_QUADLEN(argp->pagelen); + argp->pagelen = 0; + } else { + argp->end = argp->p + (PAGE_SIZE>>2); + argp->pagelen -= PAGE_SIZE; + } +} + +static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes) +{ + /* We want more bytes than seem to be available. + * Maybe we need a new page, maybe we have just run out + */ + unsigned int avail = (char *)argp->end - (char *)argp->p; + __be32 *p; + + if (argp->pagelen == 0) { + struct kvec *vec = &argp->rqstp->rq_arg.tail[0]; + + if (!argp->tail) { + argp->tail = true; + avail = vec->iov_len; + argp->p = vec->iov_base; + argp->end = vec->iov_base + avail; + } + + if (avail < nbytes) + return NULL; + + p = argp->p; + argp->p += XDR_QUADLEN(nbytes); + return p; + } + + if (avail + argp->pagelen < nbytes) + return NULL; + if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */ + return NULL; + /* ok, we can do it with the current plus the next page */ + if (nbytes <= sizeof(argp->tmp)) + p = argp->tmp; + else { + kfree(argp->tmpp); + p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL); + if (!p) + return NULL; + + } + /* + * The following memcpy is safe because read_buf is always + * called with nbytes > avail, and the two cases above both + * guarantee p points to at least nbytes bytes. + */ + memcpy(p, argp->p, avail); + next_decode_page(argp); + memcpy(((char*)p)+avail, argp->p, (nbytes - avail)); + argp->p += XDR_QUADLEN(nbytes - avail); + return p; +} + +static unsigned int compoundargs_bytes_left(struct nfsd4_compoundargs *argp) +{ + unsigned int this = (char *)argp->end - (char *)argp->p; + + return this + argp->pagelen; +} + static int zero_clientid(clientid_t *clid) { return (clid->cl_boot == 0) && (clid->cl_id == 0); @@ -149,246 +259,118 @@ svcxdr_dupstr(struct nfsd4_compoundargs *argp, void *buf, u32 len) return p; } -static void * -svcxdr_savemem(struct nfsd4_compoundargs *argp, __be32 *p, u32 len) +static __be32 +svcxdr_construct_vector(struct nfsd4_compoundargs *argp, struct kvec *head, + struct page ***pagelist, u32 buflen) { - __be32 *tmp; + int avail; + int len; + int pages; - /* - * The location of the decoded data item is stable, - * so @p is OK to use. This is the common case. + /* Sorry .. no magic macros for this.. * + * READ_BUF(write->wr_buflen); + * SAVEMEM(write->wr_buf, write->wr_buflen); */ - if (p != argp->xdr->scratch.iov_base) - return p; + avail = (char *)argp->end - (char *)argp->p; + if (avail + argp->pagelen < buflen) { + dprintk("NFSD: xdr error (%s:%d)\n", + __FILE__, __LINE__); + return nfserr_bad_xdr; + } + head->iov_base = argp->p; + head->iov_len = avail; + *pagelist = argp->pagelist; - tmp = svcxdr_tmpalloc(argp, len); - if (!tmp) + len = XDR_QUADLEN(buflen) << 2; + if (len >= avail) { + len -= avail; + + pages = len >> PAGE_SHIFT; + argp->pagelist += pages; + argp->pagelen -= pages * PAGE_SIZE; + len -= pages * PAGE_SIZE; + + next_decode_page(argp); + } + argp->p += XDR_QUADLEN(len); + + return 0; +} + +/** + * savemem - duplicate a chunk of memory for later processing + * @argp: NFSv4 compound argument structure to be freed with + * @p: pointer to be duplicated + * @nbytes: length to be duplicated + * + * Returns a pointer to a copy of @nbytes bytes of memory at @p + * that are preserved until processing of the NFSv4 compound + * operation described by @argp finishes. + */ +static char *savemem(struct nfsd4_compoundargs *argp, __be32 *p, int nbytes) +{ + void *ret; + + ret = svcxdr_tmpalloc(argp, nbytes); + if (!ret) return NULL; - memcpy(tmp, p, len); - return tmp; -} - -/* - * NFSv4 basic data type decoders - */ - -/* - * This helper handles variable-length opaques which belong to protocol - * elements that this implementation does not support. - */ -static __be32 -nfsd4_decode_ignored_string(struct nfsd4_compoundargs *argp, u32 maxlen) -{ - u32 len; - - if (xdr_stream_decode_u32(argp->xdr, &len) < 0) - return nfserr_bad_xdr; - if (maxlen && len > maxlen) - return nfserr_bad_xdr; - if (!xdr_inline_decode(argp->xdr, len)) - return nfserr_bad_xdr; - - return nfs_ok; + memcpy(ret, p, nbytes); + return ret; } static __be32 -nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) +nfsd4_decode_time(struct nfsd4_compoundargs *argp, struct timespec64 *tv) { - __be32 *p; - u32 len; + DECODE_HEAD; - if (xdr_stream_decode_u32(argp->xdr, &len) < 0) - return nfserr_bad_xdr; - if (len == 0 || len > NFS4_OPAQUE_LIMIT) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, len); - if (!p) - return nfserr_bad_xdr; - o->data = svcxdr_savemem(argp, p, len); - if (!o->data) - return nfserr_jukebox; - o->len = len; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_component4(struct nfsd4_compoundargs *argp, char **namp, u32 *lenp) -{ - __be32 *p, status; - - if (xdr_stream_decode_u32(argp->xdr, lenp) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, *lenp); - if (!p) - return nfserr_bad_xdr; - status = check_filename((char *)p, *lenp); - if (status) - return status; - *namp = svcxdr_savemem(argp, p, *lenp); - if (!*namp) - return nfserr_jukebox; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_nfstime4(struct nfsd4_compoundargs *argp, struct timespec64 *tv) -{ - __be32 *p; - - p = xdr_inline_decode(argp->xdr, XDR_UNIT * 3); - if (!p) - return nfserr_bad_xdr; + READ_BUF(12); p = xdr_decode_hyper(p, &tv->tv_sec); tv->tv_nsec = be32_to_cpup(p++); if (tv->tv_nsec >= (u32)1000000000) return nfserr_inval; - return nfs_ok; + + DECODE_TAIL; } static __be32 -nfsd4_decode_verifier4(struct nfsd4_compoundargs *argp, nfs4_verifier *verf) +nfsd4_decode_bitmap(struct nfsd4_compoundargs *argp, u32 *bmval) { - __be32 *p; + u32 bmlen; + DECODE_HEAD; - p = xdr_inline_decode(argp->xdr, NFS4_VERIFIER_SIZE); - if (!p) - return nfserr_bad_xdr; - memcpy(verf->data, p, sizeof(verf->data)); - return nfs_ok; -} + bmval[0] = 0; + bmval[1] = 0; + bmval[2] = 0; -/** - * nfsd4_decode_bitmap4 - Decode an NFSv4 bitmap4 - * @argp: NFSv4 compound argument structure - * @bmval: pointer to an array of u32's to decode into - * @bmlen: size of the @bmval array - * - * The server needs to return nfs_ok rather than nfserr_bad_xdr when - * encountering bitmaps containing bits it does not recognize. This - * includes bits in bitmap words past WORDn, where WORDn is the last - * bitmap WORD the implementation currently supports. Thus we are - * careful here to simply ignore bits in bitmap words that this - * implementation has yet to support explicitly. - * - * Return values: - * %nfs_ok: @bmval populated successfully - * %nfserr_bad_xdr: the encoded bitmap was invalid - */ -static __be32 -nfsd4_decode_bitmap4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen) -{ - ssize_t status; + READ_BUF(4); + bmlen = be32_to_cpup(p++); + if (bmlen > 1000) + goto xdr_error; - status = xdr_stream_decode_uint32_array(argp->xdr, bmval, bmlen); - return status == -EBADMSG ? nfserr_bad_xdr : nfs_ok; + READ_BUF(bmlen << 2); + if (bmlen > 0) + bmval[0] = be32_to_cpup(p++); + if (bmlen > 1) + bmval[1] = be32_to_cpup(p++); + if (bmlen > 2) + bmval[2] = be32_to_cpup(p++); + + DECODE_TAIL; } static __be32 -nfsd4_decode_nfsace4(struct nfsd4_compoundargs *argp, struct nfs4_ace *ace) +nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval, + struct iattr *iattr, struct nfs4_acl **acl, + struct xdr_netobj *label, int *umask) { - __be32 *p, status; - u32 length; - - if (xdr_stream_decode_u32(argp->xdr, &ace->type) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &ace->flag) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &ace->access_mask) < 0) - return nfserr_bad_xdr; - - if (xdr_stream_decode_u32(argp->xdr, &length) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, length); - if (!p) - return nfserr_bad_xdr; - ace->whotype = nfs4_acl_get_whotype((char *)p, length); - if (ace->whotype != NFS4_ACL_WHO_NAMED) - status = nfs_ok; - else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) - status = nfsd_map_name_to_gid(argp->rqstp, - (char *)p, length, &ace->who_gid); - else - status = nfsd_map_name_to_uid(argp->rqstp, - (char *)p, length, &ace->who_uid); - - return status; -} - -/* A counted array of nfsace4's */ -static noinline __be32 -nfsd4_decode_acl(struct nfsd4_compoundargs *argp, struct nfs4_acl **acl) -{ - struct nfs4_ace *ace; - __be32 status; - u32 count; - - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - - if (count > xdr_stream_remaining(argp->xdr) / 20) - /* - * Even with 4-byte names there wouldn't be - * space for that many aces; something fishy is - * going on: - */ - return nfserr_fbig; - - *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(count)); - if (*acl == NULL) - return nfserr_jukebox; - - (*acl)->naces = count; - for (ace = (*acl)->aces; ace < (*acl)->aces + count; ace++) { - status = nfsd4_decode_nfsace4(argp, ace); - if (status) - return status; - } - - return nfs_ok; -} - -static noinline __be32 -nfsd4_decode_security_label(struct nfsd4_compoundargs *argp, - struct xdr_netobj *label) -{ - u32 lfs, pi, length; - __be32 *p; - - if (xdr_stream_decode_u32(argp->xdr, &lfs) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &pi) < 0) - return nfserr_bad_xdr; - - if (xdr_stream_decode_u32(argp->xdr, &length) < 0) - return nfserr_bad_xdr; - if (length > NFS4_MAXLABELLEN) - return nfserr_badlabel; - p = xdr_inline_decode(argp->xdr, length); - if (!p) - return nfserr_bad_xdr; - label->len = length; - label->data = svcxdr_dupstr(argp, p, length); - if (!label->data) - return nfserr_jukebox; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, - struct iattr *iattr, struct nfs4_acl **acl, - struct xdr_netobj *label, int *umask) -{ - unsigned int starting_pos; - u32 attrlist4_count; - __be32 *p, status; + int expected_len, len = 0; + u32 dummy32; + char *buf; + DECODE_HEAD; iattr->ia_valid = 0; - status = nfsd4_decode_bitmap4(argp, bmval, bmlen); - if (status) - return nfserr_bad_xdr; + if ((status = nfsd4_decode_bitmap(argp, bmval))) + return status; if (bmval[0] & ~NFSD_WRITEABLE_ATTRS_WORD0 || bmval[1] & ~NFSD_WRITEABLE_ATTRS_WORD1 @@ -398,69 +380,96 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, return nfserr_attrnotsupp; } - if (xdr_stream_decode_u32(argp->xdr, &attrlist4_count) < 0) - return nfserr_bad_xdr; - starting_pos = xdr_stream_pos(argp->xdr); + READ_BUF(4); + expected_len = be32_to_cpup(p++); if (bmval[0] & FATTR4_WORD0_SIZE) { - u64 size; - - if (xdr_stream_decode_u64(argp->xdr, &size) < 0) - return nfserr_bad_xdr; - iattr->ia_size = size; + READ_BUF(8); + len += 8; + p = xdr_decode_hyper(p, &iattr->ia_size); iattr->ia_valid |= ATTR_SIZE; } if (bmval[0] & FATTR4_WORD0_ACL) { - status = nfsd4_decode_acl(argp, acl); - if (status) - return status; + u32 nace; + struct nfs4_ace *ace; + + READ_BUF(4); len += 4; + nace = be32_to_cpup(p++); + + if (nace > compoundargs_bytes_left(argp)/20) + /* + * Even with 4-byte names there wouldn't be + * space for that many aces; something fishy is + * going on: + */ + return nfserr_fbig; + + *acl = svcxdr_tmpalloc(argp, nfs4_acl_bytes(nace)); + if (*acl == NULL) + return nfserr_jukebox; + + (*acl)->naces = nace; + for (ace = (*acl)->aces; ace < (*acl)->aces + nace; ace++) { + READ_BUF(16); len += 16; + ace->type = be32_to_cpup(p++); + ace->flag = be32_to_cpup(p++); + ace->access_mask = be32_to_cpup(p++); + dummy32 = be32_to_cpup(p++); + READ_BUF(dummy32); + len += XDR_QUADLEN(dummy32) << 2; + READMEM(buf, dummy32); + ace->whotype = nfs4_acl_get_whotype(buf, dummy32); + status = nfs_ok; + if (ace->whotype != NFS4_ACL_WHO_NAMED) + ; + else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP) + status = nfsd_map_name_to_gid(argp->rqstp, + buf, dummy32, &ace->who_gid); + else + status = nfsd_map_name_to_uid(argp->rqstp, + buf, dummy32, &ace->who_uid); + if (status) + return status; + } } else *acl = NULL; if (bmval[1] & FATTR4_WORD1_MODE) { - u32 mode; - - if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) - return nfserr_bad_xdr; - iattr->ia_mode = mode; + READ_BUF(4); + len += 4; + iattr->ia_mode = be32_to_cpup(p++); iattr->ia_mode &= (S_IFMT | S_IALLUGO); iattr->ia_valid |= ATTR_MODE; } if (bmval[1] & FATTR4_WORD1_OWNER) { - u32 length; - - if (xdr_stream_decode_u32(argp->xdr, &length) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, length); - if (!p) - return nfserr_bad_xdr; - status = nfsd_map_name_to_uid(argp->rqstp, (char *)p, length, - &iattr->ia_uid); - if (status) + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); + READ_BUF(dummy32); + len += (XDR_QUADLEN(dummy32) << 2); + READMEM(buf, dummy32); + if ((status = nfsd_map_name_to_uid(argp->rqstp, buf, dummy32, &iattr->ia_uid))) return status; iattr->ia_valid |= ATTR_UID; } if (bmval[1] & FATTR4_WORD1_OWNER_GROUP) { - u32 length; - - if (xdr_stream_decode_u32(argp->xdr, &length) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, length); - if (!p) - return nfserr_bad_xdr; - status = nfsd_map_name_to_gid(argp->rqstp, (char *)p, length, - &iattr->ia_gid); - if (status) + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); + READ_BUF(dummy32); + len += (XDR_QUADLEN(dummy32) << 2); + READMEM(buf, dummy32); + if ((status = nfsd_map_name_to_gid(argp->rqstp, buf, dummy32, &iattr->ia_gid))) return status; iattr->ia_valid |= ATTR_GID; } if (bmval[1] & FATTR4_WORD1_TIME_ACCESS_SET) { - u32 set_it; - - if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) - return nfserr_bad_xdr; - switch (set_it) { + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); + switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_nfstime4(argp, &iattr->ia_atime); + len += 12; + status = nfsd4_decode_time(argp, &iattr->ia_atime); if (status) return status; iattr->ia_valid |= (ATTR_ATIME | ATTR_ATIME_SET); @@ -469,26 +478,17 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, iattr->ia_valid |= ATTR_ATIME; break; default: - return nfserr_bad_xdr; + goto xdr_error; } } - if (bmval[1] & FATTR4_WORD1_TIME_CREATE) { - struct timespec64 ts; - - /* No Linux filesystem supports setting this attribute. */ - bmval[1] &= ~FATTR4_WORD1_TIME_CREATE; - status = nfsd4_decode_nfstime4(argp, &ts); - if (status) - return status; - } if (bmval[1] & FATTR4_WORD1_TIME_MODIFY_SET) { - u32 set_it; - - if (xdr_stream_decode_u32(argp->xdr, &set_it) < 0) - return nfserr_bad_xdr; - switch (set_it) { + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); + switch (dummy32) { case NFS4_SET_TO_CLIENT_TIME: - status = nfsd4_decode_nfstime4(argp, &iattr->ia_mtime); + len += 12; + status = nfsd4_decode_time(argp, &iattr->ia_mtime); if (status) return status; iattr->ia_valid |= (ATTR_MTIME | ATTR_MTIME_SET); @@ -497,335 +497,222 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen, iattr->ia_valid |= ATTR_MTIME; break; default: - return nfserr_bad_xdr; + goto xdr_error; } } + label->len = 0; if (IS_ENABLED(CONFIG_NFSD_V4_SECURITY_LABEL) && bmval[2] & FATTR4_WORD2_SECURITY_LABEL) { - status = nfsd4_decode_security_label(argp, label); - if (status) - return status; + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); /* lfs: we don't use it */ + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); /* pi: we don't use it either */ + READ_BUF(4); + len += 4; + dummy32 = be32_to_cpup(p++); + READ_BUF(dummy32); + if (dummy32 > NFS4_MAXLABELLEN) + return nfserr_badlabel; + len += (XDR_QUADLEN(dummy32) << 2); + READMEM(buf, dummy32); + label->len = dummy32; + label->data = svcxdr_dupstr(argp, buf, dummy32); + if (!label->data) + return nfserr_jukebox; } if (bmval[2] & FATTR4_WORD2_MODE_UMASK) { - u32 mode, mask; - if (!umask) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &mode) < 0) - return nfserr_bad_xdr; - iattr->ia_mode = mode & (S_IFMT | S_IALLUGO); - if (xdr_stream_decode_u32(argp->xdr, &mask) < 0) - return nfserr_bad_xdr; - *umask = mask & S_IRWXUGO; + goto xdr_error; + READ_BUF(8); + len += 8; + dummy32 = be32_to_cpup(p++); + iattr->ia_mode = dummy32 & (S_IFMT | S_IALLUGO); + dummy32 = be32_to_cpup(p++); + *umask = dummy32 & S_IRWXUGO; iattr->ia_valid |= ATTR_MODE; } + if (len != expected_len) + goto xdr_error; - /* request sanity: did attrlist4 contain the expected number of words? */ - if (attrlist4_count != xdr_stream_pos(argp->xdr) - starting_pos) - return nfserr_bad_xdr; - - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_stateid4(struct nfsd4_compoundargs *argp, stateid_t *sid) +nfsd4_decode_stateid(struct nfsd4_compoundargs *argp, stateid_t *sid) { - __be32 *p; + DECODE_HEAD; - p = xdr_inline_decode(argp->xdr, NFS4_STATEID_SIZE); - if (!p) - return nfserr_bad_xdr; + READ_BUF(sizeof(stateid_t)); sid->si_generation = be32_to_cpup(p++); - memcpy(&sid->si_opaque, p, sizeof(sid->si_opaque)); - return nfs_ok; + COPYMEM(&sid->si_opaque, sizeof(stateid_opaque_t)); + + DECODE_TAIL; } static __be32 -nfsd4_decode_clientid4(struct nfsd4_compoundargs *argp, clientid_t *clientid) +nfsd4_decode_access(struct nfsd4_compoundargs *argp, struct nfsd4_access *access) { - __be32 *p; + DECODE_HEAD; - p = xdr_inline_decode(argp->xdr, sizeof(__be64)); - if (!p) - return nfserr_bad_xdr; - memcpy(clientid, p, sizeof(*clientid)); - return nfs_ok; + READ_BUF(4); + access->ac_req_access = be32_to_cpup(p++); + + DECODE_TAIL; } -static __be32 -nfsd4_decode_state_owner4(struct nfsd4_compoundargs *argp, - clientid_t *clientid, struct xdr_netobj *owner) +static __be32 nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) { - __be32 status; - - status = nfsd4_decode_clientid4(argp, clientid); - if (status) - return status; - return nfsd4_decode_opaque(argp, owner); -} - -#ifdef CONFIG_NFSD_PNFS -static __be32 -nfsd4_decode_deviceid4(struct nfsd4_compoundargs *argp, - struct nfsd4_deviceid *devid) -{ - __be32 *p; - - p = xdr_inline_decode(argp->xdr, NFS4_DEVICEID4_SIZE); - if (!p) - return nfserr_bad_xdr; - memcpy(devid, p, sizeof(*devid)); - return nfs_ok; -} - -static __be32 -nfsd4_decode_layoutupdate4(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutcommit *lcp) -{ - if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_layout_type) < 0) - return nfserr_bad_xdr; - if (lcp->lc_layout_type < LAYOUT_NFSV4_1_FILES) - return nfserr_bad_xdr; - if (lcp->lc_layout_type >= LAYOUT_TYPE_MAX) - return nfserr_bad_xdr; - - if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_up_len) < 0) - return nfserr_bad_xdr; - if (lcp->lc_up_len > 0) { - lcp->lc_up_layout = xdr_inline_decode(argp->xdr, lcp->lc_up_len); - if (!lcp->lc_up_layout) - return nfserr_bad_xdr; - } - - return nfs_ok; -} - -static __be32 -nfsd4_decode_layoutreturn4(struct nfsd4_compoundargs *argp, - struct nfsd4_layoutreturn *lrp) -{ - __be32 status; - - if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_return_type) < 0) - return nfserr_bad_xdr; - switch (lrp->lr_return_type) { - case RETURN_FILE: - if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lrp->lr_seg.length) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &lrp->lr_sid); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &lrp->lrf_body_len) < 0) - return nfserr_bad_xdr; - if (lrp->lrf_body_len > 0) { - lrp->lrf_body = xdr_inline_decode(argp->xdr, lrp->lrf_body_len); - if (!lrp->lrf_body) - return nfserr_bad_xdr; - } - break; - case RETURN_FSID: - case RETURN_ALL: - lrp->lr_seg.offset = 0; - lrp->lr_seg.length = NFS4_MAX_UINT64; - break; - default: - return nfserr_bad_xdr; - } - - return nfs_ok; -} - -#endif /* CONFIG_NFSD_PNFS */ - -static __be32 -nfsd4_decode_sessionid4(struct nfsd4_compoundargs *argp, - struct nfs4_sessionid *sessionid) -{ - __be32 *p; - - p = xdr_inline_decode(argp->xdr, NFS4_MAX_SESSIONID_LEN); - if (!p) - return nfserr_bad_xdr; - memcpy(sessionid->data, p, sizeof(sessionid->data)); - return nfs_ok; -} - -/* Defined in Appendix A of RFC 5531 */ -static __be32 -nfsd4_decode_authsys_parms(struct nfsd4_compoundargs *argp, - struct nfsd4_cb_sec *cbs) -{ - u32 stamp, gidcount, uid, gid; - __be32 *p, status; - - if (xdr_stream_decode_u32(argp->xdr, &stamp) < 0) - return nfserr_bad_xdr; - /* machine name */ - status = nfsd4_decode_ignored_string(argp, 255); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &uid) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &gid) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &gidcount) < 0) - return nfserr_bad_xdr; - if (gidcount > 16) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, gidcount << 2); - if (!p) - return nfserr_bad_xdr; - if (cbs->flavor == (u32)(-1)) { - struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); - - kuid_t kuid = make_kuid(userns, uid); - kgid_t kgid = make_kgid(userns, gid); - if (uid_valid(kuid) && gid_valid(kgid)) { - cbs->uid = kuid; - cbs->gid = kgid; - cbs->flavor = RPC_AUTH_UNIX; - } else { - dprintk("RPC_AUTH_UNIX with invalid uid or gid, ignoring!\n"); - } - } - - return nfs_ok; -} - -static __be32 -nfsd4_decode_gss_cb_handles4(struct nfsd4_compoundargs *argp, - struct nfsd4_cb_sec *cbs) -{ - __be32 status; - u32 service; - - dprintk("RPC_AUTH_GSS callback secflavor not supported!\n"); - - if (xdr_stream_decode_u32(argp->xdr, &service) < 0) - return nfserr_bad_xdr; - if (service < RPC_GSS_SVC_NONE || service > RPC_GSS_SVC_PRIVACY) - return nfserr_bad_xdr; - /* gcbp_handle_from_server */ - status = nfsd4_decode_ignored_string(argp, 0); - if (status) - return status; - /* gcbp_handle_from_client */ - status = nfsd4_decode_ignored_string(argp, 0); - if (status) - return status; - - return nfs_ok; -} - -/* a counted array of callback_sec_parms4 items */ -static __be32 -nfsd4_decode_cb_sec(struct nfsd4_compoundargs *argp, struct nfsd4_cb_sec *cbs) -{ - u32 i, secflavor, nr_secflavs; - __be32 status; + DECODE_HEAD; + struct user_namespace *userns = nfsd_user_namespace(argp->rqstp); + u32 dummy, uid, gid; + char *machine_name; + int i; + int nr_secflavs; /* callback_sec_params4 */ - if (xdr_stream_decode_u32(argp->xdr, &nr_secflavs) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + nr_secflavs = be32_to_cpup(p++); if (nr_secflavs) cbs->flavor = (u32)(-1); else /* Is this legal? Be generous, take it to mean AUTH_NONE: */ cbs->flavor = 0; - for (i = 0; i < nr_secflavs; ++i) { - if (xdr_stream_decode_u32(argp->xdr, &secflavor) < 0) - return nfserr_bad_xdr; - switch (secflavor) { + READ_BUF(4); + dummy = be32_to_cpup(p++); + switch (dummy) { case RPC_AUTH_NULL: - /* void */ + /* Nothing to read */ if (cbs->flavor == (u32)(-1)) cbs->flavor = RPC_AUTH_NULL; break; case RPC_AUTH_UNIX: - status = nfsd4_decode_authsys_parms(argp, cbs); - if (status) - return status; + READ_BUF(8); + /* stamp */ + dummy = be32_to_cpup(p++); + + /* machine name */ + dummy = be32_to_cpup(p++); + READ_BUF(dummy); + SAVEMEM(machine_name, dummy); + + /* uid, gid */ + READ_BUF(8); + uid = be32_to_cpup(p++); + gid = be32_to_cpup(p++); + + /* more gids */ + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy * 4); + if (cbs->flavor == (u32)(-1)) { + kuid_t kuid = make_kuid(userns, uid); + kgid_t kgid = make_kgid(userns, gid); + if (uid_valid(kuid) && gid_valid(kgid)) { + cbs->uid = kuid; + cbs->gid = kgid; + cbs->flavor = RPC_AUTH_UNIX; + } else { + dprintk("RPC_AUTH_UNIX with invalid" + "uid or gid ignoring!\n"); + } + } break; case RPC_AUTH_GSS: - status = nfsd4_decode_gss_cb_handles4(argp, cbs); - if (status) - return status; + dprintk("RPC_AUTH_GSS callback secflavor " + "not supported!\n"); + READ_BUF(8); + /* gcbp_service */ + dummy = be32_to_cpup(p++); + /* gcbp_handle_from_server */ + dummy = be32_to_cpup(p++); + READ_BUF(dummy); + p += XDR_QUADLEN(dummy); + /* gcbp_handle_from_client */ + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy); break; default: + dprintk("Illegal callback secflavor\n"); return nfserr_inval; } } - - return nfs_ok; + DECODE_TAIL; } - -/* - * NFSv4 operation argument decoders - */ - -static __be32 -nfsd4_decode_access(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) +static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, struct nfsd4_backchannel_ctl *bc) { - struct nfsd4_access *access = &u->access; - if (xdr_stream_decode_u32(argp->xdr, &access->ac_req_access) < 0) - return nfserr_bad_xdr; - return nfs_ok; + DECODE_HEAD; + + READ_BUF(4); + bc->bc_cb_program = be32_to_cpup(p++); + nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); + + DECODE_TAIL; +} + +static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, struct nfsd4_bind_conn_to_session *bcts) +{ + DECODE_HEAD; + + READ_BUF(NFS4_MAX_SESSIONID_LEN + 8); + COPYMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN); + bcts->dir = be32_to_cpup(p++); + /* XXX: skipping ctsa_use_conn_in_rdma_mode. Perhaps Tom Tucker + * could help us figure out we should be using it. */ + DECODE_TAIL; } static __be32 -nfsd4_decode_close(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_close(struct nfsd4_compoundargs *argp, struct nfsd4_close *close) { - struct nfsd4_close *close = &u->close; - if (xdr_stream_decode_u32(argp->xdr, &close->cl_seqid) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_stateid4(argp, &close->cl_stateid); + DECODE_HEAD; + + READ_BUF(4); + close->cl_seqid = be32_to_cpup(p++); + return nfsd4_decode_stateid(argp, &close->cl_stateid); + + DECODE_TAIL; } static __be32 -nfsd4_decode_commit(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_commit(struct nfsd4_compoundargs *argp, struct nfsd4_commit *commit) { - struct nfsd4_commit *commit = &u->commit; - if (xdr_stream_decode_u64(argp->xdr, &commit->co_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &commit->co_count) < 0) - return nfserr_bad_xdr; - memset(&commit->co_verf, 0, sizeof(commit->co_verf)); - return nfs_ok; + DECODE_HEAD; + + READ_BUF(12); + p = xdr_decode_hyper(p, &commit->co_offset); + commit->co_count = be32_to_cpup(p++); + + DECODE_TAIL; } static __be32 -nfsd4_decode_create(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_create(struct nfsd4_compoundargs *argp, struct nfsd4_create *create) { - struct nfsd4_create *create = &u->create; - __be32 *p, status; + DECODE_HEAD; - memset(create, 0, sizeof(*create)); - if (xdr_stream_decode_u32(argp->xdr, &create->cr_type) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + create->cr_type = be32_to_cpup(p++); switch (create->cr_type) { case NF4LNK: - if (xdr_stream_decode_u32(argp->xdr, &create->cr_datalen) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, create->cr_datalen); - if (!p) - return nfserr_bad_xdr; + READ_BUF(4); + create->cr_datalen = be32_to_cpup(p++); + READ_BUF(create->cr_datalen); create->cr_data = svcxdr_dupstr(argp, p, create->cr_datalen); if (!create->cr_data) return nfserr_jukebox; break; case NF4BLK: case NF4CHR: - if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata1) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &create->cr_specdata2) < 0) - return nfserr_bad_xdr; + READ_BUF(8); + create->cr_specdata1 = be32_to_cpup(p++); + create->cr_specdata2 = be32_to_cpup(p++); break; case NF4SOCK: case NF4FIFO: @@ -833,221 +720,151 @@ nfsd4_decode_create(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) default: break; } - status = nfsd4_decode_component4(argp, &create->cr_name, - &create->cr_namelen); - if (status) - return status; - status = nfsd4_decode_fattr4(argp, create->cr_bmval, - ARRAY_SIZE(create->cr_bmval), - &create->cr_iattr, &create->cr_acl, - &create->cr_label, &create->cr_umask); - if (status) + + READ_BUF(4); + create->cr_namelen = be32_to_cpup(p++); + READ_BUF(create->cr_namelen); + SAVEMEM(create->cr_name, create->cr_namelen); + if ((status = check_filename(create->cr_name, create->cr_namelen))) return status; - return nfs_ok; + status = nfsd4_decode_fattr(argp, create->cr_bmval, &create->cr_iattr, + &create->cr_acl, &create->cr_label, + &create->cr_umask); + if (status) + goto out; + + DECODE_TAIL; } static inline __be32 -nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_delegreturn(struct nfsd4_compoundargs *argp, struct nfsd4_delegreturn *dr) { - struct nfsd4_delegreturn *dr = &u->delegreturn; - return nfsd4_decode_stateid4(argp, &dr->dr_stateid); + return nfsd4_decode_stateid(argp, &dr->dr_stateid); } static inline __be32 -nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_getattr(struct nfsd4_compoundargs *argp, struct nfsd4_getattr *getattr) { - struct nfsd4_getattr *getattr = &u->getattr; - memset(getattr, 0, sizeof(*getattr)); - return nfsd4_decode_bitmap4(argp, getattr->ga_bmval, - ARRAY_SIZE(getattr->ga_bmval)); + return nfsd4_decode_bitmap(argp, getattr->ga_bmval); } static __be32 -nfsd4_decode_link(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_link(struct nfsd4_compoundargs *argp, struct nfsd4_link *link) { - struct nfsd4_link *link = &u->link; - memset(link, 0, sizeof(*link)); - return nfsd4_decode_component4(argp, &link->li_name, &link->li_namelen); -} + DECODE_HEAD; -static __be32 -nfsd4_decode_open_to_lock_owner4(struct nfsd4_compoundargs *argp, - struct nfsd4_lock *lock) -{ - __be32 status; - - if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_open_seqid) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &lock->lk_new_open_stateid); - if (status) + READ_BUF(4); + link->li_namelen = be32_to_cpup(p++); + READ_BUF(link->li_namelen); + SAVEMEM(link->li_name, link->li_namelen); + if ((status = check_filename(link->li_name, link->li_namelen))) return status; - if (xdr_stream_decode_u32(argp->xdr, &lock->lk_new_lock_seqid) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_state_owner4(argp, &lock->lk_new_clientid, - &lock->lk_new_owner); + + DECODE_TAIL; } static __be32 -nfsd4_decode_exist_lock_owner4(struct nfsd4_compoundargs *argp, - struct nfsd4_lock *lock) +nfsd4_decode_lock(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) { - __be32 status; + DECODE_HEAD; - status = nfsd4_decode_stateid4(argp, &lock->lk_old_lock_stateid); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &lock->lk_old_lock_seqid) < 0) - return nfserr_bad_xdr; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_locker4(struct nfsd4_compoundargs *argp, struct nfsd4_lock *lock) -{ - if (xdr_stream_decode_bool(argp->xdr, &lock->lk_is_new) < 0) - return nfserr_bad_xdr; - if (lock->lk_is_new) - return nfsd4_decode_open_to_lock_owner4(argp, lock); - return nfsd4_decode_exist_lock_owner4(argp, lock); -} - -static __be32 -nfsd4_decode_lock(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) -{ - struct nfsd4_lock *lock = &u->lock; - memset(lock, 0, sizeof(*lock)); - if (xdr_stream_decode_u32(argp->xdr, &lock->lk_type) < 0) - return nfserr_bad_xdr; + /* + * type, reclaim(boolean), offset, length, new_lock_owner(boolean) + */ + READ_BUF(28); + lock->lk_type = be32_to_cpup(p++); if ((lock->lk_type < NFS4_READ_LT) || (lock->lk_type > NFS4_WRITEW_LT)) - return nfserr_bad_xdr; - if (xdr_stream_decode_bool(argp->xdr, &lock->lk_reclaim) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lock->lk_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lock->lk_length) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_locker4(argp, lock); + goto xdr_error; + lock->lk_reclaim = be32_to_cpup(p++); + p = xdr_decode_hyper(p, &lock->lk_offset); + p = xdr_decode_hyper(p, &lock->lk_length); + lock->lk_is_new = be32_to_cpup(p++); + + if (lock->lk_is_new) { + READ_BUF(4); + lock->lk_new_open_seqid = be32_to_cpup(p++); + status = nfsd4_decode_stateid(argp, &lock->lk_new_open_stateid); + if (status) + return status; + READ_BUF(8 + sizeof(clientid_t)); + lock->lk_new_lock_seqid = be32_to_cpup(p++); + COPYMEM(&lock->lk_new_clientid, sizeof(clientid_t)); + lock->lk_new_owner.len = be32_to_cpup(p++); + READ_BUF(lock->lk_new_owner.len); + READMEM(lock->lk_new_owner.data, lock->lk_new_owner.len); + } else { + status = nfsd4_decode_stateid(argp, &lock->lk_old_lock_stateid); + if (status) + return status; + READ_BUF(4); + lock->lk_old_lock_seqid = be32_to_cpup(p++); + } + + DECODE_TAIL; } static __be32 -nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_lockt(struct nfsd4_compoundargs *argp, struct nfsd4_lockt *lockt) { - struct nfsd4_lockt *lockt = &u->lockt; - memset(lockt, 0, sizeof(*lockt)); - if (xdr_stream_decode_u32(argp->xdr, &lockt->lt_type) < 0) - return nfserr_bad_xdr; - if ((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lockt->lt_length) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_state_owner4(argp, &lockt->lt_clientid, - &lockt->lt_owner); + DECODE_HEAD; + + READ_BUF(32); + lockt->lt_type = be32_to_cpup(p++); + if((lockt->lt_type < NFS4_READ_LT) || (lockt->lt_type > NFS4_WRITEW_LT)) + goto xdr_error; + p = xdr_decode_hyper(p, &lockt->lt_offset); + p = xdr_decode_hyper(p, &lockt->lt_length); + COPYMEM(&lockt->lt_clientid, 8); + lockt->lt_owner.len = be32_to_cpup(p++); + READ_BUF(lockt->lt_owner.len); + READMEM(lockt->lt_owner.data, lockt->lt_owner.len); + + DECODE_TAIL; } static __be32 -nfsd4_decode_locku(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_locku(struct nfsd4_compoundargs *argp, struct nfsd4_locku *locku) { - struct nfsd4_locku *locku = &u->locku; - __be32 status; + DECODE_HEAD; - if (xdr_stream_decode_u32(argp->xdr, &locku->lu_type) < 0) - return nfserr_bad_xdr; + READ_BUF(8); + locku->lu_type = be32_to_cpup(p++); if ((locku->lu_type < NFS4_READ_LT) || (locku->lu_type > NFS4_WRITEW_LT)) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &locku->lu_seqid) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &locku->lu_stateid); + goto xdr_error; + locku->lu_seqid = be32_to_cpup(p++); + status = nfsd4_decode_stateid(argp, &locku->lu_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, &locku->lu_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &locku->lu_length) < 0) - return nfserr_bad_xdr; + READ_BUF(16); + p = xdr_decode_hyper(p, &locku->lu_offset); + p = xdr_decode_hyper(p, &locku->lu_length); - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_lookup(struct nfsd4_compoundargs *argp, struct nfsd4_lookup *lookup) { - struct nfsd4_lookup *lookup = &u->lookup; - return nfsd4_decode_component4(argp, &lookup->lo_name, &lookup->lo_len); -} + DECODE_HEAD; -static __be32 -nfsd4_decode_createhow4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) -{ - __be32 status; + READ_BUF(4); + lookup->lo_len = be32_to_cpup(p++); + READ_BUF(lookup->lo_len); + SAVEMEM(lookup->lo_name, lookup->lo_len); + if ((status = check_filename(lookup->lo_name, lookup->lo_len))) + return status; - if (xdr_stream_decode_u32(argp->xdr, &open->op_createmode) < 0) - return nfserr_bad_xdr; - switch (open->op_createmode) { - case NFS4_CREATE_UNCHECKED: - case NFS4_CREATE_GUARDED: - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - return status; - break; - case NFS4_CREATE_EXCLUSIVE: - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - break; - case NFS4_CREATE_EXCLUSIVE4_1: - if (argp->minorversion < 1) - return nfserr_bad_xdr; - status = nfsd4_decode_verifier4(argp, &open->op_verf); - if (status) - return status; - status = nfsd4_decode_fattr4(argp, open->op_bmval, - ARRAY_SIZE(open->op_bmval), - &open->op_iattr, &open->op_acl, - &open->op_label, &open->op_umask); - if (status) - return status; - break; - default: - return nfserr_bad_xdr; - } - - return nfs_ok; -} - -static __be32 -nfsd4_decode_openflag4(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) -{ - __be32 status; - - if (xdr_stream_decode_u32(argp->xdr, &open->op_create) < 0) - return nfserr_bad_xdr; - switch (open->op_create) { - case NFS4_OPEN_NOCREATE: - break; - case NFS4_OPEN_CREATE: - status = nfsd4_decode_createhow4(argp, open); - if (status) - return status; - break; - default: - return nfserr_bad_xdr; - } - - return nfs_ok; + DECODE_TAIL; } static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *share_access, u32 *deleg_want, u32 *deleg_when) { + __be32 *p; u32 w; - if (xdr_stream_decode_u32(argp->xdr, &w) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + w = be32_to_cpup(p++); *share_access = w & NFS4_SHARE_ACCESS_MASK; *deleg_want = w & NFS4_SHARE_WANT_MASK; if (deleg_when) @@ -1090,163 +907,210 @@ static __be32 nfsd4_decode_share_access(struct nfsd4_compoundargs *argp, u32 *sh NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED): return nfs_ok; } +xdr_error: return nfserr_bad_xdr; } static __be32 nfsd4_decode_share_deny(struct nfsd4_compoundargs *argp, u32 *x) { - if (xdr_stream_decode_u32(argp->xdr, x) < 0) - return nfserr_bad_xdr; - /* Note: unlike access bits, deny bits may be zero. */ + __be32 *p; + + READ_BUF(4); + *x = be32_to_cpup(p++); + /* Note: unlinke access bits, deny bits may be zero. */ if (*x & ~NFS4_SHARE_DENY_BOTH) return nfserr_bad_xdr; - return nfs_ok; +xdr_error: + return nfserr_bad_xdr; +} + +static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_netobj *o) +{ + __be32 *p; + + READ_BUF(4); + o->len = be32_to_cpup(p++); + + if (o->len == 0 || o->len > NFS4_OPAQUE_LIMIT) + return nfserr_bad_xdr; + + READ_BUF(o->len); + SAVEMEM(o->data, o->len); + return nfs_ok; +xdr_error: + return nfserr_bad_xdr; } static __be32 -nfsd4_decode_open_claim4(struct nfsd4_compoundargs *argp, - struct nfsd4_open *open) +nfsd4_decode_open(struct nfsd4_compoundargs *argp, struct nfsd4_open *open) { - __be32 status; + DECODE_HEAD; + u32 dummy; - if (xdr_stream_decode_u32(argp->xdr, &open->op_claim_type) < 0) - return nfserr_bad_xdr; + memset(open->op_bmval, 0, sizeof(open->op_bmval)); + open->op_iattr.ia_valid = 0; + open->op_openowner = NULL; + + open->op_xdr_error = 0; + /* seqid, share_access, share_deny, clientid, ownerlen */ + READ_BUF(4); + open->op_seqid = be32_to_cpup(p++); + /* decode, yet ignore deleg_when until supported */ + status = nfsd4_decode_share_access(argp, &open->op_share_access, + &open->op_deleg_want, &dummy); + if (status) + goto xdr_error; + status = nfsd4_decode_share_deny(argp, &open->op_share_deny); + if (status) + goto xdr_error; + READ_BUF(sizeof(clientid_t)); + COPYMEM(&open->op_clientid, sizeof(clientid_t)); + status = nfsd4_decode_opaque(argp, &open->op_owner); + if (status) + goto xdr_error; + READ_BUF(4); + open->op_create = be32_to_cpup(p++); + switch (open->op_create) { + case NFS4_OPEN_NOCREATE: + break; + case NFS4_OPEN_CREATE: + READ_BUF(4); + open->op_createmode = be32_to_cpup(p++); + switch (open->op_createmode) { + case NFS4_CREATE_UNCHECKED: + case NFS4_CREATE_GUARDED: + status = nfsd4_decode_fattr(argp, open->op_bmval, + &open->op_iattr, &open->op_acl, &open->op_label, + &open->op_umask); + if (status) + goto out; + break; + case NFS4_CREATE_EXCLUSIVE: + READ_BUF(NFS4_VERIFIER_SIZE); + COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + break; + case NFS4_CREATE_EXCLUSIVE4_1: + if (argp->minorversion < 1) + goto xdr_error; + READ_BUF(NFS4_VERIFIER_SIZE); + COPYMEM(open->op_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_fattr(argp, open->op_bmval, + &open->op_iattr, &open->op_acl, &open->op_label, + &open->op_umask); + if (status) + goto out; + break; + default: + goto xdr_error; + } + break; + default: + goto xdr_error; + } + + /* open_claim */ + READ_BUF(4); + open->op_claim_type = be32_to_cpup(p++); switch (open->op_claim_type) { case NFS4_OPEN_CLAIM_NULL: case NFS4_OPEN_CLAIM_DELEGATE_PREV: - status = nfsd4_decode_component4(argp, &open->op_fname, - &open->op_fnamelen); - if (status) + READ_BUF(4); + open->op_fname.len = be32_to_cpup(p++); + READ_BUF(open->op_fname.len); + SAVEMEM(open->op_fname.data, open->op_fname.len); + if ((status = check_filename(open->op_fname.data, open->op_fname.len))) return status; break; case NFS4_OPEN_CLAIM_PREVIOUS: - if (xdr_stream_decode_u32(argp->xdr, &open->op_delegate_type) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + open->op_delegate_type = be32_to_cpup(p++); break; case NFS4_OPEN_CLAIM_DELEGATE_CUR: - status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); if (status) return status; - status = nfsd4_decode_component4(argp, &open->op_fname, - &open->op_fnamelen); - if (status) + READ_BUF(4); + open->op_fname.len = be32_to_cpup(p++); + READ_BUF(open->op_fname.len); + SAVEMEM(open->op_fname.data, open->op_fname.len); + if ((status = check_filename(open->op_fname.data, open->op_fname.len))) return status; break; case NFS4_OPEN_CLAIM_FH: case NFS4_OPEN_CLAIM_DELEG_PREV_FH: if (argp->minorversion < 1) - return nfserr_bad_xdr; + goto xdr_error; /* void */ break; case NFS4_OPEN_CLAIM_DELEG_CUR_FH: if (argp->minorversion < 1) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &open->op_delegate_stateid); + goto xdr_error; + status = nfsd4_decode_stateid(argp, &open->op_delegate_stateid); if (status) return status; break; default: - return nfserr_bad_xdr; + goto xdr_error; } - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_open(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_open_confirm *open_conf) { - struct nfsd4_open *open = &u->open; - __be32 status; - u32 dummy; - - memset(open, 0, sizeof(*open)); - - if (xdr_stream_decode_u32(argp->xdr, &open->op_seqid) < 0) - return nfserr_bad_xdr; - /* deleg_want is ignored */ - status = nfsd4_decode_share_access(argp, &open->op_share_access, - &open->op_deleg_want, &dummy); - if (status) - return status; - status = nfsd4_decode_share_deny(argp, &open->op_share_deny); - if (status) - return status; - status = nfsd4_decode_state_owner4(argp, &open->op_clientid, - &open->op_owner); - if (status) - return status; - status = nfsd4_decode_openflag4(argp, open); - if (status) - return status; - return nfsd4_decode_open_claim4(argp, open); -} - -static __be32 -nfsd4_decode_open_confirm(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_open_confirm *open_conf = &u->open_confirm; - __be32 status; + DECODE_HEAD; if (argp->minorversion >= 1) return nfserr_notsupp; - status = nfsd4_decode_stateid4(argp, &open_conf->oc_req_stateid); + status = nfsd4_decode_stateid(argp, &open_conf->oc_req_stateid); if (status) return status; - if (xdr_stream_decode_u32(argp->xdr, &open_conf->oc_seqid) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + open_conf->oc_seqid = be32_to_cpup(p++); - memset(&open_conf->oc_resp_stateid, 0, - sizeof(open_conf->oc_resp_stateid)); - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) +nfsd4_decode_open_downgrade(struct nfsd4_compoundargs *argp, struct nfsd4_open_downgrade *open_down) { - struct nfsd4_open_downgrade *open_down = &u->open_downgrade; - __be32 status; - - memset(open_down, 0, sizeof(*open_down)); - status = nfsd4_decode_stateid4(argp, &open_down->od_stateid); + DECODE_HEAD; + + status = nfsd4_decode_stateid(argp, &open_down->od_stateid); if (status) return status; - if (xdr_stream_decode_u32(argp->xdr, &open_down->od_seqid) < 0) - return nfserr_bad_xdr; - /* deleg_want is ignored */ + READ_BUF(4); + open_down->od_seqid = be32_to_cpup(p++); status = nfsd4_decode_share_access(argp, &open_down->od_share_access, &open_down->od_deleg_want, NULL); if (status) return status; - return nfsd4_decode_share_deny(argp, &open_down->od_share_deny); + status = nfsd4_decode_share_deny(argp, &open_down->od_share_deny); + if (status) + return status; + DECODE_TAIL; } static __be32 -nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_putfh(struct nfsd4_compoundargs *argp, struct nfsd4_putfh *putfh) { - struct nfsd4_putfh *putfh = &u->putfh; - __be32 *p; + DECODE_HEAD; - if (xdr_stream_decode_u32(argp->xdr, &putfh->pf_fhlen) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + putfh->pf_fhlen = be32_to_cpup(p++); if (putfh->pf_fhlen > NFS4_FHSIZE) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, putfh->pf_fhlen); - if (!p) - return nfserr_bad_xdr; - putfh->pf_fhval = svcxdr_savemem(argp, p, putfh->pf_fhlen); - if (!putfh->pf_fhval) - return nfserr_jukebox; + goto xdr_error; + READ_BUF(putfh->pf_fhlen); + SAVEMEM(putfh->pf_fhval, putfh->pf_fhlen); - putfh->no_verify = false; - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) +nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, void *p) { if (argp->minorversion == 0) return nfs_ok; @@ -1254,771 +1118,719 @@ nfsd4_decode_putpubfh(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) } static __be32 -nfsd4_decode_read(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_read(struct nfsd4_compoundargs *argp, struct nfsd4_read *read) { - struct nfsd4_read *read = &u->read; - __be32 status; + DECODE_HEAD; - memset(read, 0, sizeof(*read)); - status = nfsd4_decode_stateid4(argp, &read->rd_stateid); + status = nfsd4_decode_stateid(argp, &read->rd_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, &read->rd_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &read->rd_length) < 0) - return nfserr_bad_xdr; + READ_BUF(12); + p = xdr_decode_hyper(p, &read->rd_offset); + read->rd_length = be32_to_cpup(p++); - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *readdir) { - struct nfsd4_readdir *readdir = &u->readdir; - __be32 status; + DECODE_HEAD; - memset(readdir, 0, sizeof(*readdir)); - if (xdr_stream_decode_u64(argp->xdr, &readdir->rd_cookie) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_verifier4(argp, &readdir->rd_verf); - if (status) + READ_BUF(24); + p = xdr_decode_hyper(p, &readdir->rd_cookie); + COPYMEM(readdir->rd_verf.data, sizeof(readdir->rd_verf.data)); + readdir->rd_dircount = be32_to_cpup(p++); + readdir->rd_maxcount = be32_to_cpup(p++); + if ((status = nfsd4_decode_bitmap(argp, readdir->rd_bmval))) + goto out; + + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_remove(struct nfsd4_compoundargs *argp, struct nfsd4_remove *remove) +{ + DECODE_HEAD; + + READ_BUF(4); + remove->rm_namelen = be32_to_cpup(p++); + READ_BUF(remove->rm_namelen); + SAVEMEM(remove->rm_name, remove->rm_namelen); + if ((status = check_filename(remove->rm_name, remove->rm_namelen))) return status; - if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_dircount) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &readdir->rd_maxcount) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_uint32_array(argp->xdr, readdir->rd_bmval, - ARRAY_SIZE(readdir->rd_bmval)) < 0) - return nfserr_bad_xdr; - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_remove(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_rename(struct nfsd4_compoundargs *argp, struct nfsd4_rename *rename) { - struct nfsd4_remove *remove = &u->remove; - memset(&remove->rm_cinfo, 0, sizeof(remove->rm_cinfo)); - return nfsd4_decode_component4(argp, &remove->rm_name, &remove->rm_namelen); -} + DECODE_HEAD; -static __be32 -nfsd4_decode_rename(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) -{ - struct nfsd4_rename *rename = &u->rename; - __be32 status; - - memset(rename, 0, sizeof(*rename)); - status = nfsd4_decode_component4(argp, &rename->rn_sname, &rename->rn_snamelen); - if (status) + READ_BUF(4); + rename->rn_snamelen = be32_to_cpup(p++); + READ_BUF(rename->rn_snamelen); + SAVEMEM(rename->rn_sname, rename->rn_snamelen); + READ_BUF(4); + rename->rn_tnamelen = be32_to_cpup(p++); + READ_BUF(rename->rn_tnamelen); + SAVEMEM(rename->rn_tname, rename->rn_tnamelen); + if ((status = check_filename(rename->rn_sname, rename->rn_snamelen))) return status; - return nfsd4_decode_component4(argp, &rename->rn_tname, &rename->rn_tnamelen); + if ((status = check_filename(rename->rn_tname, rename->rn_tnamelen))) + return status; + + DECODE_TAIL; } static __be32 -nfsd4_decode_renew(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_renew(struct nfsd4_compoundargs *argp, clientid_t *clientid) { - clientid_t *clientid = &u->renew; - return nfsd4_decode_clientid4(argp, clientid); + DECODE_HEAD; + + if (argp->minorversion >= 1) + return nfserr_notsupp; + + READ_BUF(sizeof(clientid_t)); + COPYMEM(clientid, sizeof(clientid_t)); + + DECODE_TAIL; } static __be32 nfsd4_decode_secinfo(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_secinfo *secinfo) { - struct nfsd4_secinfo *secinfo = &u->secinfo; - secinfo->si_exp = NULL; - return nfsd4_decode_component4(argp, &secinfo->si_name, &secinfo->si_namelen); -} + DECODE_HEAD; -static __be32 -nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) -{ - struct nfsd4_setattr *setattr = &u->setattr; - __be32 status; - - memset(setattr, 0, sizeof(*setattr)); - status = nfsd4_decode_stateid4(argp, &setattr->sa_stateid); + READ_BUF(4); + secinfo->si_namelen = be32_to_cpup(p++); + READ_BUF(secinfo->si_namelen); + SAVEMEM(secinfo->si_name, secinfo->si_namelen); + status = check_filename(secinfo->si_name, secinfo->si_namelen); if (status) return status; - return nfsd4_decode_fattr4(argp, setattr->sa_bmval, - ARRAY_SIZE(setattr->sa_bmval), - &setattr->sa_iattr, &setattr->sa_acl, - &setattr->sa_label, NULL); + DECODE_TAIL; } static __be32 -nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, + struct nfsd4_secinfo_no_name *sin) { - struct nfsd4_setclientid *setclientid = &u->setclientid; - __be32 *p, status; + DECODE_HEAD; - memset(setclientid, 0, sizeof(*setclientid)); + READ_BUF(4); + sin->sin_style = be32_to_cpup(p++); + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_setattr(struct nfsd4_compoundargs *argp, struct nfsd4_setattr *setattr) +{ + __be32 status; + + status = nfsd4_decode_stateid(argp, &setattr->sa_stateid); + if (status) + return status; + return nfsd4_decode_fattr(argp, setattr->sa_bmval, &setattr->sa_iattr, + &setattr->sa_acl, &setattr->sa_label, NULL); +} + +static __be32 +nfsd4_decode_setclientid(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid *setclientid) +{ + DECODE_HEAD; if (argp->minorversion >= 1) return nfserr_notsupp; - status = nfsd4_decode_verifier4(argp, &setclientid->se_verf); - if (status) - return status; + READ_BUF(NFS4_VERIFIER_SIZE); + COPYMEM(setclientid->se_verf.data, NFS4_VERIFIER_SIZE); + status = nfsd4_decode_opaque(argp, &setclientid->se_name); if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_prog) < 0) return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_netid_len) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, setclientid->se_callback_netid_len); - if (!p) - return nfserr_bad_xdr; - setclientid->se_callback_netid_val = svcxdr_savemem(argp, p, - setclientid->se_callback_netid_len); - if (!setclientid->se_callback_netid_val) - return nfserr_jukebox; + READ_BUF(8); + setclientid->se_callback_prog = be32_to_cpup(p++); + setclientid->se_callback_netid_len = be32_to_cpup(p++); + READ_BUF(setclientid->se_callback_netid_len); + SAVEMEM(setclientid->se_callback_netid_val, setclientid->se_callback_netid_len); + READ_BUF(4); + setclientid->se_callback_addr_len = be32_to_cpup(p++); - if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_addr_len) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, setclientid->se_callback_addr_len); - if (!p) - return nfserr_bad_xdr; - setclientid->se_callback_addr_val = svcxdr_savemem(argp, p, - setclientid->se_callback_addr_len); - if (!setclientid->se_callback_addr_val) - return nfserr_jukebox; - if (xdr_stream_decode_u32(argp->xdr, &setclientid->se_callback_ident) < 0) - return nfserr_bad_xdr; + READ_BUF(setclientid->se_callback_addr_len); + SAVEMEM(setclientid->se_callback_addr_val, setclientid->se_callback_addr_len); + READ_BUF(4); + setclientid->se_callback_ident = be32_to_cpup(p++); - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) +nfsd4_decode_setclientid_confirm(struct nfsd4_compoundargs *argp, struct nfsd4_setclientid_confirm *scd_c) { - struct nfsd4_setclientid_confirm *scd_c = &u->setclientid_confirm; - __be32 status; + DECODE_HEAD; if (argp->minorversion >= 1) return nfserr_notsupp; - status = nfsd4_decode_clientid4(argp, &scd_c->sc_clientid); - if (status) - return status; - return nfsd4_decode_verifier4(argp, &scd_c->sc_confirm); + READ_BUF(8 + NFS4_VERIFIER_SIZE); + COPYMEM(&scd_c->sc_clientid, 8); + COPYMEM(&scd_c->sc_confirm, NFS4_VERIFIER_SIZE); + + DECODE_TAIL; } /* Also used for NVERIFY */ static __be32 -nfsd4_decode_verify(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_verify(struct nfsd4_compoundargs *argp, struct nfsd4_verify *verify) { - struct nfsd4_verify *verify = &u->verify; - __be32 *p, status; + DECODE_HEAD; - memset(verify, 0, sizeof(*verify)); - - status = nfsd4_decode_bitmap4(argp, verify->ve_bmval, - ARRAY_SIZE(verify->ve_bmval)); - if (status) - return status; + if ((status = nfsd4_decode_bitmap(argp, verify->ve_bmval))) + goto out; /* For convenience's sake, we compare raw xdr'd attributes in * nfsd4_proc_verify */ - if (xdr_stream_decode_u32(argp->xdr, &verify->ve_attrlen) < 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, verify->ve_attrlen); - if (!p) - return nfserr_bad_xdr; - verify->ve_attrval = svcxdr_savemem(argp, p, verify->ve_attrlen); - if (!verify->ve_attrval) - return nfserr_jukebox; + READ_BUF(4); + verify->ve_attrlen = be32_to_cpup(p++); + READ_BUF(verify->ve_attrlen); + SAVEMEM(verify->ve_attrval, verify->ve_attrlen); - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_write(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_write(struct nfsd4_compoundargs *argp, struct nfsd4_write *write) { - struct nfsd4_write *write = &u->write; - __be32 status; + DECODE_HEAD; - status = nfsd4_decode_stateid4(argp, &write->wr_stateid); + status = nfsd4_decode_stateid(argp, &write->wr_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, &write->wr_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &write->wr_stable_how) < 0) - return nfserr_bad_xdr; + READ_BUF(16); + p = xdr_decode_hyper(p, &write->wr_offset); + write->wr_stable_how = be32_to_cpup(p++); if (write->wr_stable_how > NFS_FILE_SYNC) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &write->wr_buflen) < 0) - return nfserr_bad_xdr; - if (!xdr_stream_subsegment(argp->xdr, &write->wr_payload, write->wr_buflen)) - return nfserr_bad_xdr; + goto xdr_error; + write->wr_buflen = be32_to_cpup(p++); - write->wr_bytes_written = 0; - write->wr_how_written = 0; - memset(&write->wr_verifier, 0, sizeof(write->wr_verifier)); - return nfs_ok; + status = svcxdr_construct_vector(argp, &write->wr_head, + &write->wr_pagelist, write->wr_buflen); + if (status) + return status; + + DECODE_TAIL; } static __be32 -nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) +nfsd4_decode_release_lockowner(struct nfsd4_compoundargs *argp, struct nfsd4_release_lockowner *rlockowner) { - struct nfsd4_release_lockowner *rlockowner = &u->release_lockowner; - __be32 status; + DECODE_HEAD; if (argp->minorversion >= 1) return nfserr_notsupp; - status = nfsd4_decode_state_owner4(argp, &rlockowner->rl_clientid, - &rlockowner->rl_owner); - if (status) - return status; + READ_BUF(12); + COPYMEM(&rlockowner->rl_clientid, sizeof(clientid_t)); + rlockowner->rl_owner.len = be32_to_cpup(p++); + READ_BUF(rlockowner->rl_owner.len); + READMEM(rlockowner->rl_owner.data, rlockowner->rl_owner.len); if (argp->minorversion && !zero_clientid(&rlockowner->rl_clientid)) return nfserr_inval; - - return nfs_ok; -} - -static __be32 nfsd4_decode_backchannel_ctl(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_backchannel_ctl *bc = &u->backchannel_ctl; - memset(bc, 0, sizeof(*bc)); - if (xdr_stream_decode_u32(argp->xdr, &bc->bc_cb_program) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_cb_sec(argp, &bc->bc_cb_sec); -} - -static __be32 nfsd4_decode_bind_conn_to_session(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; - u32 use_conn_in_rdma_mode; - __be32 status; - - memset(bcts, 0, sizeof(*bcts)); - status = nfsd4_decode_sessionid4(argp, &bcts->sessionid); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &bcts->dir) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &use_conn_in_rdma_mode) < 0) - return nfserr_bad_xdr; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_state_protect_ops(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) -{ - __be32 status; - - status = nfsd4_decode_bitmap4(argp, exid->spo_must_enforce, - ARRAY_SIZE(exid->spo_must_enforce)); - if (status) - return nfserr_bad_xdr; - status = nfsd4_decode_bitmap4(argp, exid->spo_must_allow, - ARRAY_SIZE(exid->spo_must_allow)); - if (status) - return nfserr_bad_xdr; - - return nfs_ok; -} - -/* - * This implementation currently does not support SP4_SSV. - * This decoder simply skips over these arguments. - */ -static noinline __be32 -nfsd4_decode_ssv_sp_parms(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) -{ - u32 count, window, num_gss_handles; - __be32 status; - - /* ssp_ops */ - status = nfsd4_decode_state_protect_ops(argp, exid); - if (status) - return status; - - /* ssp_hash_algs<> */ - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - while (count--) { - status = nfsd4_decode_ignored_string(argp, 0); - if (status) - return status; - } - - /* ssp_encr_algs<> */ - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - while (count--) { - status = nfsd4_decode_ignored_string(argp, 0); - if (status) - return status; - } - - if (xdr_stream_decode_u32(argp->xdr, &window) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &num_gss_handles) < 0) - return nfserr_bad_xdr; - - return nfs_ok; -} - -static __be32 -nfsd4_decode_state_protect4_a(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) -{ - __be32 status; - - if (xdr_stream_decode_u32(argp->xdr, &exid->spa_how) < 0) - return nfserr_bad_xdr; - switch (exid->spa_how) { - case SP4_NONE: - break; - case SP4_MACH_CRED: - status = nfsd4_decode_state_protect_ops(argp, exid); - if (status) - return status; - break; - case SP4_SSV: - status = nfsd4_decode_ssv_sp_parms(argp, exid); - if (status) - return status; - break; - default: - return nfserr_bad_xdr; - } - - return nfs_ok; -} - -static __be32 -nfsd4_decode_nfs_impl_id4(struct nfsd4_compoundargs *argp, - struct nfsd4_exchange_id *exid) -{ - __be32 status; - u32 count; - - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - switch (count) { - case 0: - break; - case 1: - /* Note that RFC 8881 places no length limit on - * nii_domain, but this implementation permits no - * more than NFS4_OPAQUE_LIMIT bytes */ - status = nfsd4_decode_opaque(argp, &exid->nii_domain); - if (status) - return status; - /* Note that RFC 8881 places no length limit on - * nii_name, but this implementation permits no - * more than NFS4_OPAQUE_LIMIT bytes */ - status = nfsd4_decode_opaque(argp, &exid->nii_name); - if (status) - return status; - status = nfsd4_decode_nfstime4(argp, &exid->nii_time); - if (status) - return status; - break; - default: - return nfserr_bad_xdr; - } - - return nfs_ok; + DECODE_TAIL; } static __be32 nfsd4_decode_exchange_id(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_exchange_id *exid) { - struct nfsd4_exchange_id *exid = &u->exchange_id; - __be32 status; + int dummy, tmp; + DECODE_HEAD; + + READ_BUF(NFS4_VERIFIER_SIZE); + COPYMEM(exid->verifier.data, NFS4_VERIFIER_SIZE); - memset(exid, 0, sizeof(*exid)); - status = nfsd4_decode_verifier4(argp, &exid->verifier); - if (status) - return status; status = nfsd4_decode_opaque(argp, &exid->clname); if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &exid->flags) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_state_protect4_a(argp, exid); - if (status) - return status; - return nfsd4_decode_nfs_impl_id4(argp, exid); -} - -static __be32 -nfsd4_decode_channel_attrs4(struct nfsd4_compoundargs *argp, - struct nfsd4_channel_attrs *ca) -{ - __be32 *p; - - p = xdr_inline_decode(argp->xdr, XDR_UNIT * 7); - if (!p) return nfserr_bad_xdr; - /* headerpadsz is ignored */ - p++; - ca->maxreq_sz = be32_to_cpup(p++); - ca->maxresp_sz = be32_to_cpup(p++); - ca->maxresp_cached = be32_to_cpup(p++); - ca->maxops = be32_to_cpup(p++); - ca->maxreqs = be32_to_cpup(p++); - ca->nr_rdma_attrs = be32_to_cpup(p); - switch (ca->nr_rdma_attrs) { - case 0: + READ_BUF(4); + exid->flags = be32_to_cpup(p++); + + /* Ignore state_protect4_a */ + READ_BUF(4); + exid->spa_how = be32_to_cpup(p++); + switch (exid->spa_how) { + case SP4_NONE: break; - case 1: - if (xdr_stream_decode_u32(argp->xdr, &ca->rdma_attrs) < 0) - return nfserr_bad_xdr; + case SP4_MACH_CRED: + /* spo_must_enforce */ + status = nfsd4_decode_bitmap(argp, + exid->spo_must_enforce); + if (status) + goto out; + /* spo_must_allow */ + status = nfsd4_decode_bitmap(argp, exid->spo_must_allow); + if (status) + goto out; + break; + case SP4_SSV: + /* ssp_ops */ + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy * 4); + p += dummy; + + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy * 4); + p += dummy; + + /* ssp_hash_algs<> */ + READ_BUF(4); + tmp = be32_to_cpup(p++); + while (tmp--) { + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy); + p += XDR_QUADLEN(dummy); + } + + /* ssp_encr_algs<> */ + READ_BUF(4); + tmp = be32_to_cpup(p++); + while (tmp--) { + READ_BUF(4); + dummy = be32_to_cpup(p++); + READ_BUF(dummy); + p += XDR_QUADLEN(dummy); + } + + /* ignore ssp_window and ssp_num_gss_handles: */ + READ_BUF(8); break; default: - return nfserr_bad_xdr; + goto xdr_error; } - return nfs_ok; + READ_BUF(4); /* nfs_impl_id4 array length */ + dummy = be32_to_cpup(p++); + + if (dummy > 1) + goto xdr_error; + + if (dummy == 1) { + status = nfsd4_decode_opaque(argp, &exid->nii_domain); + if (status) + goto xdr_error; + + /* nii_name */ + status = nfsd4_decode_opaque(argp, &exid->nii_name); + if (status) + goto xdr_error; + + /* nii_date */ + status = nfsd4_decode_time(argp, &exid->nii_time); + if (status) + goto xdr_error; + } + DECODE_TAIL; } static __be32 nfsd4_decode_create_session(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_create_session *sess) { - struct nfsd4_create_session *sess = &u->create_session; - __be32 status; + DECODE_HEAD; - memset(sess, 0, sizeof(*sess)); - status = nfsd4_decode_clientid4(argp, &sess->clientid); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &sess->seqid) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &sess->flags) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_channel_attrs4(argp, &sess->fore_channel); - if (status) - return status; - status = nfsd4_decode_channel_attrs4(argp, &sess->back_channel); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &sess->callback_prog) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_cb_sec(argp, &sess->cb_sec); + READ_BUF(16); + COPYMEM(&sess->clientid, 8); + sess->seqid = be32_to_cpup(p++); + sess->flags = be32_to_cpup(p++); + + /* Fore channel attrs */ + READ_BUF(28); + p++; /* headerpadsz is always 0 */ + sess->fore_channel.maxreq_sz = be32_to_cpup(p++); + sess->fore_channel.maxresp_sz = be32_to_cpup(p++); + sess->fore_channel.maxresp_cached = be32_to_cpup(p++); + sess->fore_channel.maxops = be32_to_cpup(p++); + sess->fore_channel.maxreqs = be32_to_cpup(p++); + sess->fore_channel.nr_rdma_attrs = be32_to_cpup(p++); + if (sess->fore_channel.nr_rdma_attrs == 1) { + READ_BUF(4); + sess->fore_channel.rdma_attrs = be32_to_cpup(p++); + } else if (sess->fore_channel.nr_rdma_attrs > 1) { + dprintk("Too many fore channel attr bitmaps!\n"); + goto xdr_error; + } + + /* Back channel attrs */ + READ_BUF(28); + p++; /* headerpadsz is always 0 */ + sess->back_channel.maxreq_sz = be32_to_cpup(p++); + sess->back_channel.maxresp_sz = be32_to_cpup(p++); + sess->back_channel.maxresp_cached = be32_to_cpup(p++); + sess->back_channel.maxops = be32_to_cpup(p++); + sess->back_channel.maxreqs = be32_to_cpup(p++); + sess->back_channel.nr_rdma_attrs = be32_to_cpup(p++); + if (sess->back_channel.nr_rdma_attrs == 1) { + READ_BUF(4); + sess->back_channel.rdma_attrs = be32_to_cpup(p++); + } else if (sess->back_channel.nr_rdma_attrs > 1) { + dprintk("Too many back channel attr bitmaps!\n"); + goto xdr_error; + } + + READ_BUF(4); + sess->callback_prog = be32_to_cpup(p++); + nfsd4_decode_cb_sec(argp, &sess->cb_sec); + DECODE_TAIL; } static __be32 nfsd4_decode_destroy_session(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_destroy_session *destroy_session) { - struct nfsd4_destroy_session *destroy_session = &u->destroy_session; - return nfsd4_decode_sessionid4(argp, &destroy_session->sessionid); + DECODE_HEAD; + READ_BUF(NFS4_MAX_SESSIONID_LEN); + COPYMEM(destroy_session->sessionid.data, NFS4_MAX_SESSIONID_LEN); + + DECODE_TAIL; } static __be32 nfsd4_decode_free_stateid(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_free_stateid *free_stateid) { - struct nfsd4_free_stateid *free_stateid = &u->free_stateid; - return nfsd4_decode_stateid4(argp, &free_stateid->fr_stateid); + DECODE_HEAD; + + READ_BUF(sizeof(stateid_t)); + free_stateid->fr_stateid.si_generation = be32_to_cpup(p++); + COPYMEM(&free_stateid->fr_stateid.si_opaque, sizeof(stateid_opaque_t)); + + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, + struct nfsd4_sequence *seq) +{ + DECODE_HEAD; + + READ_BUF(NFS4_MAX_SESSIONID_LEN + 16); + COPYMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN); + seq->seqid = be32_to_cpup(p++); + seq->slotid = be32_to_cpup(p++); + seq->maxslots = be32_to_cpup(p++); + seq->cachethis = be32_to_cpup(p++); + + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, struct nfsd4_test_stateid *test_stateid) +{ + int i; + __be32 *p, status; + struct nfsd4_test_stateid_id *stateid; + + READ_BUF(4); + test_stateid->ts_num_ids = ntohl(*p++); + + INIT_LIST_HEAD(&test_stateid->ts_stateid_list); + + for (i = 0; i < test_stateid->ts_num_ids; i++) { + stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); + if (!stateid) { + status = nfserrno(-ENOMEM); + goto out; + } + + INIT_LIST_HEAD(&stateid->ts_id_list); + list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); + + status = nfsd4_decode_stateid(argp, &stateid->ts_id_stateid); + if (status) + goto out; + } + + status = 0; +out: + return status; +xdr_error: + dprintk("NFSD: xdr error (%s:%d)\n", __FILE__, __LINE__); + status = nfserr_bad_xdr; + goto out; +} + +static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, struct nfsd4_destroy_clientid *dc) +{ + DECODE_HEAD; + + READ_BUF(8); + COPYMEM(&dc->clientid, 8); + + DECODE_TAIL; +} + +static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, struct nfsd4_reclaim_complete *rc) +{ + DECODE_HEAD; + + READ_BUF(4); + rc->rca_one_fs = be32_to_cpup(p++); + + DECODE_TAIL; } #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_getdeviceinfo *gdev) { - struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; - __be32 status; + DECODE_HEAD; + u32 num, i; - memset(gdev, 0, sizeof(*gdev)); - status = nfsd4_decode_deviceid4(argp, &gdev->gd_devid); + READ_BUF(sizeof(struct nfsd4_deviceid) + 3 * 4); + COPYMEM(&gdev->gd_devid, sizeof(struct nfsd4_deviceid)); + gdev->gd_layout_type = be32_to_cpup(p++); + gdev->gd_maxcount = be32_to_cpup(p++); + num = be32_to_cpup(p++); + if (num) { + if (num > 1000) + goto xdr_error; + READ_BUF(4 * num); + gdev->gd_notify_types = be32_to_cpup(p++); + for (i = 1; i < num; i++) { + if (be32_to_cpup(p++)) { + status = nfserr_inval; + goto out; + } + } + } + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, + struct nfsd4_layoutget *lgp) +{ + DECODE_HEAD; + + READ_BUF(36); + lgp->lg_signal = be32_to_cpup(p++); + lgp->lg_layout_type = be32_to_cpup(p++); + lgp->lg_seg.iomode = be32_to_cpup(p++); + p = xdr_decode_hyper(p, &lgp->lg_seg.offset); + p = xdr_decode_hyper(p, &lgp->lg_seg.length); + p = xdr_decode_hyper(p, &lgp->lg_minlength); + + status = nfsd4_decode_stateid(argp, &lgp->lg_sid); if (status) return status; - if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_layout_type) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &gdev->gd_maxcount) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_uint32_array(argp->xdr, - &gdev->gd_notify_types, 1) < 0) - return nfserr_bad_xdr; - return nfs_ok; + READ_BUF(4); + lgp->lg_maxcount = be32_to_cpup(p++); + + DECODE_TAIL; } static __be32 nfsd4_decode_layoutcommit(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_layoutcommit *lcp) { - struct nfsd4_layoutcommit *lcp = &u->layoutcommit; - __be32 *p, status; + DECODE_HEAD; + u32 timechange; - memset(lcp, 0, sizeof(*lcp)); - if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_seg.length) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_bool(argp->xdr, &lcp->lc_reclaim) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &lcp->lc_sid); + READ_BUF(20); + p = xdr_decode_hyper(p, &lcp->lc_seg.offset); + p = xdr_decode_hyper(p, &lcp->lc_seg.length); + lcp->lc_reclaim = be32_to_cpup(p++); + + status = nfsd4_decode_stateid(argp, &lcp->lc_sid); if (status) return status; - if (xdr_stream_decode_u32(argp->xdr, &lcp->lc_newoffset) < 0) - return nfserr_bad_xdr; + + READ_BUF(4); + lcp->lc_newoffset = be32_to_cpup(p++); if (lcp->lc_newoffset) { - if (xdr_stream_decode_u64(argp->xdr, &lcp->lc_last_wr) < 0) - return nfserr_bad_xdr; + READ_BUF(8); + p = xdr_decode_hyper(p, &lcp->lc_last_wr); } else lcp->lc_last_wr = 0; - p = xdr_inline_decode(argp->xdr, XDR_UNIT); - if (!p) - return nfserr_bad_xdr; - if (xdr_item_is_present(p)) { - status = nfsd4_decode_nfstime4(argp, &lcp->lc_mtime); + READ_BUF(4); + timechange = be32_to_cpup(p++); + if (timechange) { + status = nfsd4_decode_time(argp, &lcp->lc_mtime); if (status) return status; } else { lcp->lc_mtime.tv_nsec = UTIME_NOW; } - return nfsd4_decode_layoutupdate4(argp, lcp); -} + READ_BUF(8); + lcp->lc_layout_type = be32_to_cpup(p++); -static __be32 -nfsd4_decode_layoutget(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_layoutget *lgp = &u->layoutget; - __be32 status; + /* + * Save the layout update in XDR format and let the layout driver deal + * with it later. + */ + lcp->lc_up_len = be32_to_cpup(p++); + if (lcp->lc_up_len > 0) { + READ_BUF(lcp->lc_up_len); + READMEM(lcp->lc_up_layout, lcp->lc_up_len); + } - memset(lgp, 0, sizeof(*lgp)); - if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_signal) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_layout_type) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_seg.iomode) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_seg.length) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &lgp->lg_minlength) < 0) - return nfserr_bad_xdr; - status = nfsd4_decode_stateid4(argp, &lgp->lg_sid); - if (status) - return status; - if (xdr_stream_decode_u32(argp->xdr, &lgp->lg_maxcount) < 0) - return nfserr_bad_xdr; - - return nfs_ok; + DECODE_TAIL; } static __be32 nfsd4_decode_layoutreturn(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_layoutreturn *lrp) { - struct nfsd4_layoutreturn *lrp = &u->layoutreturn; - memset(lrp, 0, sizeof(*lrp)); - if (xdr_stream_decode_bool(argp->xdr, &lrp->lr_reclaim) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_layout_type) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &lrp->lr_seg.iomode) < 0) - return nfserr_bad_xdr; - return nfsd4_decode_layoutreturn4(argp, lrp); + DECODE_HEAD; + + READ_BUF(16); + lrp->lr_reclaim = be32_to_cpup(p++); + lrp->lr_layout_type = be32_to_cpup(p++); + lrp->lr_seg.iomode = be32_to_cpup(p++); + lrp->lr_return_type = be32_to_cpup(p++); + if (lrp->lr_return_type == RETURN_FILE) { + READ_BUF(16); + p = xdr_decode_hyper(p, &lrp->lr_seg.offset); + p = xdr_decode_hyper(p, &lrp->lr_seg.length); + + status = nfsd4_decode_stateid(argp, &lrp->lr_sid); + if (status) + return status; + + READ_BUF(4); + lrp->lrf_body_len = be32_to_cpup(p++); + if (lrp->lrf_body_len > 0) { + READ_BUF(lrp->lrf_body_len); + READMEM(lrp->lrf_body, lrp->lrf_body_len); + } + } else { + lrp->lr_seg.offset = 0; + lrp->lr_seg.length = NFS4_MAX_UINT64; + } + + DECODE_TAIL; } #endif /* CONFIG_NFSD_PNFS */ -static __be32 nfsd4_decode_secinfo_no_name(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_secinfo_no_name *sin = &u->secinfo_no_name; - if (xdr_stream_decode_u32(argp->xdr, &sin->sin_style) < 0) - return nfserr_bad_xdr; - - sin->sin_exp = NULL; - return nfs_ok; -} - -static __be32 -nfsd4_decode_sequence(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_sequence *seq = &u->sequence; - __be32 *p, status; - - status = nfsd4_decode_sessionid4(argp, &seq->sessionid); - if (status) - return status; - p = xdr_inline_decode(argp->xdr, XDR_UNIT * 4); - if (!p) - return nfserr_bad_xdr; - seq->seqid = be32_to_cpup(p++); - seq->slotid = be32_to_cpup(p++); - seq->maxslots = be32_to_cpup(p++); - seq->cachethis = be32_to_cpup(p); - - seq->status_flags = 0; - return nfs_ok; -} - -static __be32 -nfsd4_decode_test_stateid(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_test_stateid *test_stateid = &u->test_stateid; - struct nfsd4_test_stateid_id *stateid; - __be32 status; - u32 i; - - memset(test_stateid, 0, sizeof(*test_stateid)); - if (xdr_stream_decode_u32(argp->xdr, &test_stateid->ts_num_ids) < 0) - return nfserr_bad_xdr; - - INIT_LIST_HEAD(&test_stateid->ts_stateid_list); - for (i = 0; i < test_stateid->ts_num_ids; i++) { - stateid = svcxdr_tmpalloc(argp, sizeof(*stateid)); - if (!stateid) - return nfserr_jukebox; - INIT_LIST_HEAD(&stateid->ts_id_list); - list_add_tail(&stateid->ts_id_list, &test_stateid->ts_stateid_list); - status = nfsd4_decode_stateid4(argp, &stateid->ts_id_stateid); - if (status) - return status; - } - - return nfs_ok; -} - -static __be32 nfsd4_decode_destroy_clientid(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_destroy_clientid *dc = &u->destroy_clientid; - return nfsd4_decode_clientid4(argp, &dc->clientid); -} - -static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_reclaim_complete *rc = &u->reclaim_complete; - if (xdr_stream_decode_bool(argp->xdr, &rc->rca_one_fs) < 0) - return nfserr_bad_xdr; - return nfs_ok; -} - static __be32 nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_fallocate *fallocate) { - struct nfsd4_fallocate *fallocate = &u->allocate; - __be32 status; + DECODE_HEAD; - status = nfsd4_decode_stateid4(argp, &fallocate->falloc_stateid); + status = nfsd4_decode_stateid(argp, &fallocate->falloc_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &fallocate->falloc_length) < 0) - return nfserr_bad_xdr; - return nfs_ok; + READ_BUF(16); + p = xdr_decode_hyper(p, &fallocate->falloc_offset); + xdr_decode_hyper(p, &fallocate->falloc_length); + + DECODE_TAIL; +} + +static __be32 +nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone) +{ + DECODE_HEAD; + + status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid); + if (status) + return status; + status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid); + if (status) + return status; + + READ_BUF(8 + 8 + 8); + p = xdr_decode_hyper(p, &clone->cl_src_pos); + p = xdr_decode_hyper(p, &clone->cl_dst_pos); + p = xdr_decode_hyper(p, &clone->cl_count); + DECODE_TAIL; } static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp, struct nl4_server *ns) { + DECODE_HEAD; struct nfs42_netaddr *naddr; - __be32 *p; - if (xdr_stream_decode_u32(argp->xdr, &ns->nl4_type) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + ns->nl4_type = be32_to_cpup(p++); /* currently support for 1 inter-server source server */ switch (ns->nl4_type) { case NL4_NETADDR: naddr = &ns->u.nl4_addr; - if (xdr_stream_decode_u32(argp->xdr, &naddr->netid_len) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + naddr->netid_len = be32_to_cpup(p++); if (naddr->netid_len > RPCBIND_MAXNETIDLEN) - return nfserr_bad_xdr; + goto xdr_error; - p = xdr_inline_decode(argp->xdr, naddr->netid_len); - if (!p) - return nfserr_bad_xdr; - memcpy(naddr->netid, p, naddr->netid_len); + READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */ + COPYMEM(naddr->netid, naddr->netid_len); - if (xdr_stream_decode_u32(argp->xdr, &naddr->addr_len) < 0) - return nfserr_bad_xdr; + naddr->addr_len = be32_to_cpup(p++); if (naddr->addr_len > RPCBIND_MAXUADDRLEN) - return nfserr_bad_xdr; + goto xdr_error; - p = xdr_inline_decode(argp->xdr, naddr->addr_len); - if (!p) - return nfserr_bad_xdr; - memcpy(naddr->addr, p, naddr->addr_len); + READ_BUF(naddr->addr_len); + COPYMEM(naddr->addr, naddr->addr_len); break; default: - return nfserr_bad_xdr; + goto xdr_error; } - - return nfs_ok; + DECODE_TAIL; } static __be32 -nfsd4_decode_copy(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy) { - struct nfsd4_copy *copy = &u->copy; - u32 consecutive, i, count, sync; + DECODE_HEAD; struct nl4_server *ns_dummy; - __be32 status; + int i, count; - memset(copy, 0, sizeof(*copy)); - status = nfsd4_decode_stateid4(argp, ©->cp_src_stateid); + status = nfsd4_decode_stateid(argp, ©->cp_src_stateid); if (status) return status; - status = nfsd4_decode_stateid4(argp, ©->cp_dst_stateid); + status = nfsd4_decode_stateid(argp, ©->cp_dst_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, ©->cp_src_pos) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, ©->cp_dst_pos) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, ©->cp_count) < 0) - return nfserr_bad_xdr; - /* ca_consecutive: we always do consecutive copies */ - if (xdr_stream_decode_u32(argp->xdr, &consecutive) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_bool(argp->xdr, &sync) < 0) - return nfserr_bad_xdr; - nfsd4_copy_set_sync(copy, sync); - if (xdr_stream_decode_u32(argp->xdr, &count) < 0) - return nfserr_bad_xdr; - copy->cp_src = svcxdr_tmpalloc(argp, sizeof(*copy->cp_src)); - if (copy->cp_src == NULL) - return nfserr_jukebox; + READ_BUF(8 + 8 + 8 + 4 + 4 + 4); + p = xdr_decode_hyper(p, ©->cp_src_pos); + p = xdr_decode_hyper(p, ©->cp_dst_pos); + p = xdr_decode_hyper(p, ©->cp_count); + p++; /* ca_consecutive: we always do consecutive copies */ + copy->cp_synchronous = be32_to_cpup(p++); + + count = be32_to_cpup(p++); + + copy->cp_intra = false; if (count == 0) { /* intra-server copy */ - __set_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); - return nfs_ok; + copy->cp_intra = true; + goto intra; } - /* decode all the supplied server addresses but use only the first */ - status = nfsd4_decode_nl4_server(argp, copy->cp_src); + /* decode all the supplied server addresses but use first */ + status = nfsd4_decode_nl4_server(argp, ©->cp_src); if (status) return status; ns_dummy = kmalloc(sizeof(struct nl4_server), GFP_KERNEL); if (ns_dummy == NULL) - return nfserr_jukebox; + return nfserrno(-ENOMEM); for (i = 0; i < count - 1; i++) { status = nfsd4_decode_nl4_server(argp, ns_dummy); if (status) { @@ -2027,80 +1839,44 @@ nfsd4_decode_copy(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) } } kfree(ns_dummy); +intra: - return nfs_ok; -} - -static __be32 -nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) -{ - struct nfsd4_copy_notify *cn = &u->copy_notify; - __be32 status; - - memset(cn, 0, sizeof(*cn)); - cn->cpn_src = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_src)); - if (cn->cpn_src == NULL) - return nfserr_jukebox; - cn->cpn_dst = svcxdr_tmpalloc(argp, sizeof(*cn->cpn_dst)); - if (cn->cpn_dst == NULL) - return nfserr_jukebox; - - status = nfsd4_decode_stateid4(argp, &cn->cpn_src_stateid); - if (status) - return status; - return nfsd4_decode_nl4_server(argp, cn->cpn_dst); + DECODE_TAIL; } static __be32 nfsd4_decode_offload_status(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_offload_status *os) { - struct nfsd4_offload_status *os = &u->offload_status; - os->count = 0; - os->status = 0; - return nfsd4_decode_stateid4(argp, &os->stateid); + return nfsd4_decode_stateid(argp, &os->stateid); } static __be32 -nfsd4_decode_seek(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp, + struct nfsd4_copy_notify *cn) { - struct nfsd4_seek *seek = &u->seek; __be32 status; - status = nfsd4_decode_stateid4(argp, &seek->seek_stateid); + status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid); if (status) return status; - if (xdr_stream_decode_u64(argp->xdr, &seek->seek_offset) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u32(argp->xdr, &seek->seek_whence) < 0) - return nfserr_bad_xdr; - - seek->seek_eof = 0; - seek->seek_pos = 0; - return nfs_ok; + return nfsd4_decode_nl4_server(argp, &cn->cpn_dst); } static __be32 -nfsd4_decode_clone(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) +nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek) { - struct nfsd4_clone *clone = &u->clone; - __be32 status; + DECODE_HEAD; - status = nfsd4_decode_stateid4(argp, &clone->cl_src_stateid); + status = nfsd4_decode_stateid(argp, &seek->seek_stateid); if (status) return status; - status = nfsd4_decode_stateid4(argp, &clone->cl_dst_stateid); - if (status) - return status; - if (xdr_stream_decode_u64(argp->xdr, &clone->cl_src_pos) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &clone->cl_dst_pos) < 0) - return nfserr_bad_xdr; - if (xdr_stream_decode_u64(argp->xdr, &clone->cl_count) < 0) - return nfserr_bad_xdr; - return nfs_ok; + READ_BUF(8 + 4); + p = xdr_decode_hyper(p, &seek->seek_offset); + seek->seek_whence = be32_to_cpup(p); + + DECODE_TAIL; } /* @@ -2113,14 +1889,13 @@ nfsd4_decode_clone(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u) */ /* - * Decode data into buffer. + * Decode data into buffer. Uses head and pages constructed by + * svcxdr_construct_vector. */ static __be32 -nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, - char **bufp, u32 buflen) +nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct kvec *head, + struct page **pages, char **bufp, u32 buflen) { - struct page **pages = xdr->pages; - struct kvec *head = xdr->head; char *tmp, *dp; u32 len; @@ -2163,22 +1938,25 @@ nfsd4_vbuf_from_vector(struct nfsd4_compoundargs *argp, struct xdr_buf *xdr, static __be32 nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) { + DECODE_HEAD; char *name, *sp, *dp; u32 namelen, cnt; - __be32 *p; - if (xdr_stream_decode_u32(argp->xdr, &namelen) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + namelen = be32_to_cpup(p++); + if (namelen > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) return nfserr_nametoolong; + if (namelen == 0) - return nfserr_bad_xdr; - p = xdr_inline_decode(argp->xdr, namelen); - if (!p) - return nfserr_bad_xdr; + goto xdr_error; + + READ_BUF(namelen); + name = svcxdr_tmpalloc(argp, namelen + XATTR_USER_PREFIX_LEN + 1); if (!name) return nfserr_jukebox; + memcpy(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN); /* @@ -2191,14 +1969,14 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) while (cnt-- > 0) { if (*sp == '\0') - return nfserr_bad_xdr; + goto xdr_error; *dp++ = *sp++; } *dp = '\0'; *namep = name; - return nfs_ok; + DECODE_TAIL; } /* @@ -2209,13 +1987,11 @@ nfsd4_decode_xattr_name(struct nfsd4_compoundargs *argp, char **namep) */ static __be32 nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_getxattr *getxattr) { - struct nfsd4_getxattr *getxattr = &u->getxattr; __be32 status; u32 maxcount; - memset(getxattr, 0, sizeof(*getxattr)); status = nfsd4_decode_xattr_name(argp, &getxattr->getxa_name); if (status) return status; @@ -2224,21 +2000,21 @@ nfsd4_decode_getxattr(struct nfsd4_compoundargs *argp, maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount); getxattr->getxa_len = maxcount; - return nfs_ok; + + return status; } static __be32 nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_setxattr *setxattr) { - struct nfsd4_setxattr *setxattr = &u->setxattr; + DECODE_HEAD; u32 flags, maxcount, size; - __be32 status; + struct kvec head; + struct page **pagelist; - memset(setxattr, 0, sizeof(*setxattr)); - - if (xdr_stream_decode_u32(argp->xdr, &flags) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + flags = be32_to_cpup(p++); if (flags > SETXATTR4_REPLACE) return nfserr_inval; @@ -2251,35 +2027,33 @@ nfsd4_decode_setxattr(struct nfsd4_compoundargs *argp, maxcount = svc_max_payload(argp->rqstp); maxcount = min_t(u32, XATTR_SIZE_MAX, maxcount); - if (xdr_stream_decode_u32(argp->xdr, &size) < 0) - return nfserr_bad_xdr; + READ_BUF(4); + size = be32_to_cpup(p++); if (size > maxcount) return nfserr_xattr2big; setxattr->setxa_len = size; if (size > 0) { - struct xdr_buf payload; + status = svcxdr_construct_vector(argp, &head, &pagelist, size); + if (status) + return status; - if (!xdr_stream_subsegment(argp->xdr, &payload, size)) - return nfserr_bad_xdr; - status = nfsd4_vbuf_from_vector(argp, &payload, - &setxattr->setxa_buf, size); + status = nfsd4_vbuf_from_vector(argp, &head, pagelist, + &setxattr->setxa_buf, size); } - return nfs_ok; + DECODE_TAIL; } static __be32 nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_listxattrs *listxattrs) { - struct nfsd4_listxattrs *listxattrs = &u->listxattrs; + DECODE_HEAD; u32 maxcount; - memset(listxattrs, 0, sizeof(*listxattrs)); - - if (xdr_stream_decode_u64(argp->xdr, &listxattrs->lsxa_cookie) < 0) - return nfserr_bad_xdr; + READ_BUF(12); + p = xdr_decode_hyper(p, &listxattrs->lsxa_cookie); /* * If the cookie is too large to have even one user.x attribute @@ -2289,8 +2063,7 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, (XATTR_LIST_MAX / (XATTR_USER_PREFIX_LEN + 2))) return nfserr_badcookie; - if (xdr_stream_decode_u32(argp->xdr, &maxcount) < 0) - return nfserr_bad_xdr; + maxcount = be32_to_cpup(p++); if (maxcount < 8) /* Always need at least 2 words (length and one character) */ return nfserr_inval; @@ -2298,119 +2071,117 @@ nfsd4_decode_listxattrs(struct nfsd4_compoundargs *argp, maxcount = min(maxcount, svc_max_payload(argp->rqstp)); listxattrs->lsxa_maxcount = maxcount; - return nfs_ok; + DECODE_TAIL; } static __be32 nfsd4_decode_removexattr(struct nfsd4_compoundargs *argp, - union nfsd4_op_u *u) + struct nfsd4_removexattr *removexattr) { - struct nfsd4_removexattr *removexattr = &u->removexattr; - memset(removexattr, 0, sizeof(*removexattr)); return nfsd4_decode_xattr_name(argp, &removexattr->rmxa_name); } static __be32 -nfsd4_decode_noop(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) +nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p) { return nfs_ok; } static __be32 -nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, union nfsd4_op_u *p) +nfsd4_decode_notsupp(struct nfsd4_compoundargs *argp, void *p) { return nfserr_notsupp; } -typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, union nfsd4_op_u *u); +typedef __be32(*nfsd4_dec)(struct nfsd4_compoundargs *argp, void *); static const nfsd4_dec nfsd4_dec_ops[] = { - [OP_ACCESS] = nfsd4_decode_access, - [OP_CLOSE] = nfsd4_decode_close, - [OP_COMMIT] = nfsd4_decode_commit, - [OP_CREATE] = nfsd4_decode_create, - [OP_DELEGPURGE] = nfsd4_decode_notsupp, - [OP_DELEGRETURN] = nfsd4_decode_delegreturn, - [OP_GETATTR] = nfsd4_decode_getattr, - [OP_GETFH] = nfsd4_decode_noop, - [OP_LINK] = nfsd4_decode_link, - [OP_LOCK] = nfsd4_decode_lock, - [OP_LOCKT] = nfsd4_decode_lockt, - [OP_LOCKU] = nfsd4_decode_locku, - [OP_LOOKUP] = nfsd4_decode_lookup, - [OP_LOOKUPP] = nfsd4_decode_noop, - [OP_NVERIFY] = nfsd4_decode_verify, - [OP_OPEN] = nfsd4_decode_open, - [OP_OPENATTR] = nfsd4_decode_notsupp, - [OP_OPEN_CONFIRM] = nfsd4_decode_open_confirm, - [OP_OPEN_DOWNGRADE] = nfsd4_decode_open_downgrade, - [OP_PUTFH] = nfsd4_decode_putfh, - [OP_PUTPUBFH] = nfsd4_decode_putpubfh, - [OP_PUTROOTFH] = nfsd4_decode_noop, - [OP_READ] = nfsd4_decode_read, - [OP_READDIR] = nfsd4_decode_readdir, - [OP_READLINK] = nfsd4_decode_noop, - [OP_REMOVE] = nfsd4_decode_remove, - [OP_RENAME] = nfsd4_decode_rename, - [OP_RENEW] = nfsd4_decode_renew, - [OP_RESTOREFH] = nfsd4_decode_noop, - [OP_SAVEFH] = nfsd4_decode_noop, - [OP_SECINFO] = nfsd4_decode_secinfo, - [OP_SETATTR] = nfsd4_decode_setattr, - [OP_SETCLIENTID] = nfsd4_decode_setclientid, - [OP_SETCLIENTID_CONFIRM] = nfsd4_decode_setclientid_confirm, - [OP_VERIFY] = nfsd4_decode_verify, - [OP_WRITE] = nfsd4_decode_write, - [OP_RELEASE_LOCKOWNER] = nfsd4_decode_release_lockowner, + [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, + [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close, + [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit, + [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create, + [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn, + [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr, + [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop, + [OP_LINK] = (nfsd4_dec)nfsd4_decode_link, + [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock, + [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt, + [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku, + [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup, + [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop, + [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify, + [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open, + [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_open_confirm, + [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade, + [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh, + [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_putpubfh, + [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop, + [OP_READ] = (nfsd4_dec)nfsd4_decode_read, + [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir, + [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop, + [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove, + [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename, + [OP_RENEW] = (nfsd4_dec)nfsd4_decode_renew, + [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop, + [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop, + [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo, + [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr, + [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_setclientid, + [OP_SETCLIENTID_CONFIRM] = (nfsd4_dec)nfsd4_decode_setclientid_confirm, + [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify, + [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write, + [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_release_lockowner, /* new operations for NFSv4.1 */ - [OP_BACKCHANNEL_CTL] = nfsd4_decode_backchannel_ctl, - [OP_BIND_CONN_TO_SESSION] = nfsd4_decode_bind_conn_to_session, - [OP_EXCHANGE_ID] = nfsd4_decode_exchange_id, - [OP_CREATE_SESSION] = nfsd4_decode_create_session, - [OP_DESTROY_SESSION] = nfsd4_decode_destroy_session, - [OP_FREE_STATEID] = nfsd4_decode_free_stateid, - [OP_GET_DIR_DELEGATION] = nfsd4_decode_notsupp, + [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl, + [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session, + [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id, + [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session, + [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session, + [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid, + [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = nfsd4_decode_getdeviceinfo, - [OP_GETDEVICELIST] = nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = nfsd4_decode_layoutcommit, - [OP_LAYOUTGET] = nfsd4_decode_layoutget, - [OP_LAYOUTRETURN] = nfsd4_decode_layoutreturn, + [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_getdeviceinfo, + [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_layoutcommit, + [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_layoutget, + [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_layoutreturn, #else - [OP_GETDEVICEINFO] = nfsd4_decode_notsupp, - [OP_GETDEVICELIST] = nfsd4_decode_notsupp, - [OP_LAYOUTCOMMIT] = nfsd4_decode_notsupp, - [OP_LAYOUTGET] = nfsd4_decode_notsupp, - [OP_LAYOUTRETURN] = nfsd4_decode_notsupp, + [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp, #endif - [OP_SECINFO_NO_NAME] = nfsd4_decode_secinfo_no_name, - [OP_SEQUENCE] = nfsd4_decode_sequence, - [OP_SET_SSV] = nfsd4_decode_notsupp, - [OP_TEST_STATEID] = nfsd4_decode_test_stateid, - [OP_WANT_DELEGATION] = nfsd4_decode_notsupp, - [OP_DESTROY_CLIENTID] = nfsd4_decode_destroy_clientid, - [OP_RECLAIM_COMPLETE] = nfsd4_decode_reclaim_complete, + [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name, + [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence, + [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid, + [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid, + [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete, /* new operations for NFSv4.2 */ - [OP_ALLOCATE] = nfsd4_decode_fallocate, - [OP_COPY] = nfsd4_decode_copy, - [OP_COPY_NOTIFY] = nfsd4_decode_copy_notify, - [OP_DEALLOCATE] = nfsd4_decode_fallocate, - [OP_IO_ADVISE] = nfsd4_decode_notsupp, - [OP_LAYOUTERROR] = nfsd4_decode_notsupp, - [OP_LAYOUTSTATS] = nfsd4_decode_notsupp, - [OP_OFFLOAD_CANCEL] = nfsd4_decode_offload_status, - [OP_OFFLOAD_STATUS] = nfsd4_decode_offload_status, - [OP_READ_PLUS] = nfsd4_decode_read, - [OP_SEEK] = nfsd4_decode_seek, - [OP_WRITE_SAME] = nfsd4_decode_notsupp, - [OP_CLONE] = nfsd4_decode_clone, + [OP_ALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, + [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy, + [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_copy_notify, + [OP_DEALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate, + [OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTERROR] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_LAYOUTSTATS] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_OFFLOAD_CANCEL] = (nfsd4_dec)nfsd4_decode_offload_status, + [OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_offload_status, + [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read, + [OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek, + [OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp, + [OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone, /* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = nfsd4_decode_getxattr, - [OP_SETXATTR] = nfsd4_decode_setxattr, - [OP_LISTXATTRS] = nfsd4_decode_listxattrs, - [OP_REMOVEXATTR] = nfsd4_decode_removexattr, + [OP_GETXATTR] = (nfsd4_dec)nfsd4_decode_getxattr, + [OP_SETXATTR] = (nfsd4_dec)nfsd4_decode_setxattr, + [OP_LISTXATTRS] = (nfsd4_dec)nfsd4_decode_listxattrs, + [OP_REMOVEXATTR] = (nfsd4_dec)nfsd4_decode_removexattr, }; static inline bool @@ -2427,46 +2198,43 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op) return true; } -static bool +static __be32 nfsd4_decode_compound(struct nfsd4_compoundargs *argp) { + DECODE_HEAD; struct nfsd4_op *op; bool cachethis = false; int auth_slack= argp->rqstp->rq_auth_slack; int max_reply = auth_slack + 8; /* opcnt, status */ int readcount = 0; int readbytes = 0; - __be32 *p; int i; - if (xdr_stream_decode_u32(argp->xdr, &argp->taglen) < 0) - return false; - max_reply += XDR_UNIT; - argp->tag = NULL; - if (unlikely(argp->taglen)) { - if (argp->taglen > NFSD4_MAX_TAGLEN) - return false; - p = xdr_inline_decode(argp->xdr, argp->taglen); - if (!p) - return false; - argp->tag = svcxdr_savemem(argp, p, argp->taglen); - if (!argp->tag) - return false; - max_reply += xdr_align_size(argp->taglen); - } + READ_BUF(4); + argp->taglen = be32_to_cpup(p++); + READ_BUF(argp->taglen); + SAVEMEM(argp->tag, argp->taglen); + READ_BUF(8); + argp->minorversion = be32_to_cpup(p++); + argp->opcnt = be32_to_cpup(p++); + max_reply += 4 + (XDR_QUADLEN(argp->taglen) << 2); - if (xdr_stream_decode_u32(argp->xdr, &argp->minorversion) < 0) - return false; - if (xdr_stream_decode_u32(argp->xdr, &argp->client_opcnt) < 0) - return false; - argp->opcnt = min_t(u32, argp->client_opcnt, - NFSD_MAX_OPS_PER_COMPOUND); + if (argp->taglen > NFSD4_MAX_TAGLEN) + goto xdr_error; + /* + * NFS4ERR_RESOURCE is a more helpful error than GARBAGE_ARGS + * here, so we return success at the xdr level so that + * nfsd4_proc can handle this is an NFS-level error. + */ + if (argp->opcnt > NFSD_MAX_OPS_PER_COMPOUND) + return 0; if (argp->opcnt > ARRAY_SIZE(argp->iops)) { - argp->ops = vcalloc(argp->opcnt, sizeof(*argp->ops)); + argp->ops = kzalloc(argp->opcnt * sizeof(*argp->ops), GFP_KERNEL); if (!argp->ops) { argp->ops = argp->iops; - return false; + dprintk("nfsd: couldn't allocate room for COMPOUND\n"); + goto xdr_error; } } @@ -2476,23 +2244,17 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) for (i = 0; i < argp->opcnt; i++) { op = &argp->ops[i]; op->replay = NULL; - op->opdesc = NULL; - if (xdr_stream_decode_u32(argp->xdr, &op->opnum) < 0) - return false; - if (nfsd4_opnum_in_range(argp, op)) { - op->opdesc = OPDESC(op); + READ_BUF(4); + op->opnum = be32_to_cpup(p++); + + if (nfsd4_opnum_in_range(argp, op)) op->status = nfsd4_dec_ops[op->opnum](argp, &op->u); - if (op->status != nfs_ok) - trace_nfsd_compound_decode_err(argp->rqstp, - argp->opcnt, i, - op->opnum, - op->status); - } else { + else { op->opnum = OP_ILLEGAL; op->status = nfserr_op_illegal; } - + op->opdesc = OPDESC(op); /* * We'll try to cache the result in the DRC if any one * op in the compound wants to be cached: @@ -2527,7 +2289,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp) if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack) clear_bit(RQ_SPLICE_OK, &argp->rqstp->rq_flags); - return true; + DECODE_TAIL; } static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, @@ -2536,25 +2298,15 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode, if (exp->ex_flags & NFSEXP_V4ROOT) { *p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time)); *p++ = 0; - } else + } else if (IS_I_VERSION(inode)) { p = xdr_encode_hyper(p, nfsd4_change_attribute(stat, inode)); + } else { + *p++ = cpu_to_be32(stat->ctime.tv_sec); + *p++ = cpu_to_be32(stat->ctime.tv_nsec); + } return p; } -static __be32 nfsd4_encode_nfstime4(struct xdr_stream *xdr, - struct timespec64 *tv) -{ - __be32 *p; - - p = xdr_reserve_space(xdr, XDR_UNIT * 3); - if (!p) - return nfserr_resource; - - p = xdr_encode_hyper(p, (s64)tv->tv_sec); - *p = cpu_to_be32(tv->tv_nsec); - return nfs_ok; -} - /* * ctime (in NFSv4, time_metadata) is not writeable, and the client * doesn't really care what resolution could theoretically be stored by @@ -2583,8 +2335,15 @@ static __be32 *encode_time_delta(__be32 *p, struct inode *inode) static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c) { *p++ = cpu_to_be32(c->atomic); - p = xdr_encode_hyper(p, c->before_change); - p = xdr_encode_hyper(p, c->after_change); + if (c->change_supported) { + p = xdr_encode_hyper(p, c->before_change); + p = xdr_encode_hyper(p, c->after_change); + } else { + *p++ = cpu_to_be32(c->before_ctime_sec); + *p++ = cpu_to_be32(c->before_ctime_nsec); + *p++ = cpu_to_be32(c->after_ctime_sec); + *p++ = cpu_to_be32(c->after_ctime_nsec); + } return p; } @@ -2799,7 +2558,7 @@ static u32 nfs4_file_type(umode_t mode) case S_IFREG: return NF4REG; case S_IFSOCK: return NF4SOCK; default: return NF4BAD; - } + }; } static inline __be32 @@ -2883,10 +2642,9 @@ static __be32 fattr_handle_absent_fs(u32 *bmval0, u32 *bmval1, u32 *bmval2, u32 } -static int nfsd4_get_mounted_on_ino(struct svc_export *exp, u64 *pino) +static int get_parent_attributes(struct svc_export *exp, struct kstat *stat) { struct path path = exp->ex_path; - struct kstat stat; int err; path_get(&path); @@ -2894,10 +2652,8 @@ static int nfsd4_get_mounted_on_ino(struct svc_export *exp, u64 *pino) if (path.dentry != path.mnt->mnt_root) break; } - err = vfs_getattr(&path, &stat, STATX_INO, AT_STATX_SYNC_AS_STAT); + err = vfs_getattr(&path, stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); path_put(&path); - if (!err) - *pino = stat.ino; return err; } @@ -2950,9 +2706,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct kstat stat; struct svc_fh *tempfh = NULL; struct kstatfs statfs; - __be32 *p, *attrlen_p; + __be32 *p; int starting_len = xdr->buf->len; int attrlen_offset; + __be32 attrlen; u32 dummy; u64 dummy64; u32 rdattr_err = 0; @@ -2984,9 +2741,6 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); if (err) goto out_nfserr; - if (!(stat.result_mask & STATX_BTIME)) - /* underlying FS does not offer btime so we can't share it */ - bmval1 &= ~FATTR4_WORD1_TIME_CREATE; if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE | FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_MAXNAME)) || (bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | @@ -3040,9 +2794,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out; attrlen_offset = xdr->buf->len; - attrlen_p = xdr_reserve_space(xdr, XDR_UNIT); - if (!attrlen_p) + p = xdr_reserve_space(xdr, 4); + if (!p) goto out_resource; + p++; /* to be backfilled later */ if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) { u32 supp[3]; @@ -3228,7 +2983,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4); if (!p) goto out_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, fhp->fh_handle.fh_size); } if (bmval0 & FATTR4_WORD0_FILEID) { @@ -3360,14 +3115,11 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = xdr_encode_hyper(p, dummy64); } if (bmval1 & FATTR4_WORD1_TIME_ACCESS) { - status = nfsd4_encode_nfstime4(xdr, &stat.atime); - if (status) - goto out; - } - if (bmval1 & FATTR4_WORD1_TIME_CREATE) { - status = nfsd4_encode_nfstime4(xdr, &stat.btime); - if (status) - goto out; + p = xdr_reserve_space(xdr, 12); + if (!p) + goto out_resource; + p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec); + *p++ = cpu_to_be32(stat.atime.tv_nsec); } if (bmval1 & FATTR4_WORD1_TIME_DELTA) { p = xdr_reserve_space(xdr, 12); @@ -3376,31 +3128,36 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, p = encode_time_delta(p, d_inode(dentry)); } if (bmval1 & FATTR4_WORD1_TIME_METADATA) { - status = nfsd4_encode_nfstime4(xdr, &stat.ctime); - if (status) - goto out; + p = xdr_reserve_space(xdr, 12); + if (!p) + goto out_resource; + p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec); + *p++ = cpu_to_be32(stat.ctime.tv_nsec); } if (bmval1 & FATTR4_WORD1_TIME_MODIFY) { - status = nfsd4_encode_nfstime4(xdr, &stat.mtime); - if (status) - goto out; + p = xdr_reserve_space(xdr, 12); + if (!p) + goto out_resource; + p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec); + *p++ = cpu_to_be32(stat.mtime.tv_nsec); } if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) { + struct kstat parent_stat; u64 ino = stat.ino; p = xdr_reserve_space(xdr, 8); if (!p) goto out_resource; /* - * Get ino of mountpoint in parent filesystem, if not ignoring - * crossmount and this is the root of a cross-mounted - * filesystem. + * Get parent's attributes if not ignoring crossmount + * and this is the root of a cross-mounted filesystem. */ if (ignore_crossmnt == 0 && dentry == exp->ex_path.mnt->mnt_root) { - err = nfsd4_get_mounted_on_ino(exp, &ino); + err = get_parent_attributes(exp, &parent_stat); if (err) goto out_nfserr; + ino = parent_stat.ino; } p = xdr_encode_hyper(p, ino); } @@ -3437,6 +3194,16 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, goto out; } + if (bmval2 & FATTR4_WORD2_CHANGE_ATTR_TYPE) { + p = xdr_reserve_space(xdr, 4); + if (!p) + goto out_resource; + if (IS_I_VERSION(d_inode(dentry))) + *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR); + else + *p++ = cpu_to_be32(NFS4_CHANGE_TYPE_IS_TIME_METADATA); + } + #ifdef CONFIG_NFSD_V4_SECURITY_LABEL if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) { status = nfsd4_encode_security_label(xdr, rqstp, context, @@ -3455,7 +3222,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, *p++ = cpu_to_be32(err == 0); } - *attrlen_p = cpu_to_be32(xdr->buf->len - attrlen_offset - XDR_UNIT); + attrlen = htonl(xdr->buf->len - attrlen_offset - 4); + write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, 4); status = nfs_ok; out: @@ -3624,7 +3392,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen, p = xdr_reserve_space(xdr, 3*4 + namlen); if (!p) goto fail; - p = xdr_encode_hyper(p, OFFSET_MAX); /* offset of next entry */ + p = xdr_encode_hyper(p, NFS_OFFSET_MAX); /* offset of next entry */ p = xdr_encode_array(p, name, namlen); /* name length & name */ nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen); @@ -3708,11 +3476,9 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid) } static __be32 -nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access) { - struct nfsd4_access *access = &u->access; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 8); @@ -3723,11 +3489,9 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, return 0; } -static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts) { - struct nfsd4_bind_conn_to_session *bcts = &u->bind_conn_to_session; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8); @@ -3742,22 +3506,18 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, } static __be32 -nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close) { - struct nfsd4_close *close = &u->close; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_encode_stateid(xdr, &close->cl_stateid); } static __be32 -nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit) { - struct nfsd4_commit *commit = &u->commit; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE); @@ -3769,11 +3529,9 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create) { - struct nfsd4_create *create = &u->create; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -3785,23 +3543,19 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr) { - struct nfsd4_getattr *getattr = &u->getattr; struct svc_fh *fhp = getattr->ga_fhp; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry, getattr->ga_bmval, resp->rqstp, 0); } static __be32 -nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp) { - struct svc_fh **fhpp = &u->getfh; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct svc_fh *fhp = *fhpp; unsigned int len; __be32 *p; @@ -3810,7 +3564,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, p = xdr_reserve_space(xdr, len + 4); if (!p) return nfserr_resource; - p = xdr_encode_opaque(p, &fhp->fh_handle.fh_raw, len); + p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, len); return 0; } @@ -3854,11 +3608,9 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld) } static __be32 -nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock) { - struct nfsd4_lock *lock = &u->lock; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; if (!nfserr) nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid); @@ -3869,11 +3621,9 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt) { - struct nfsd4_lockt *lockt = &u->lockt; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; if (nfserr == nfserr_denied) nfsd4_encode_lock_denied(xdr, &lockt->lt_denied); @@ -3881,22 +3631,18 @@ nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku) { - struct nfsd4_locku *locku = &u->locku; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_encode_stateid(xdr, &locku->lu_stateid); } static __be32 -nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link) { - struct nfsd4_link *link = &u->link; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -3908,11 +3654,9 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 -nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open) { - struct nfsd4_open *open = &u->open; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid); @@ -4004,21 +3748,17 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc) { - struct nfsd4_open_confirm *oc = &u->open_confirm; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid); } static __be32 -nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od) { - struct nfsd4_open_downgrade *od = &u->open_downgrade; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_encode_stateid(xdr, &od->od_stateid); } @@ -4028,28 +3768,33 @@ static __be32 nfsd4_encode_splice_read( struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct xdr_buf *buf = xdr->buf; - int status, space_left; + u32 eof; + int space_left; __be32 nfserr; + __be32 *p = xdr->p - 2; /* Make sure there will be room for padding if needed */ if (xdr->end - xdr->p < 1) return nfserr_resource; nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp, - file, read->rd_offset, &maxcount, - &read->rd_eof); + file, read->rd_offset, &maxcount, &eof); read->rd_length = maxcount; - if (nfserr) - goto out_err; - status = svc_encode_result_payload(read->rd_rqstp, - buf->head[0].iov_len, maxcount); - if (status) { - nfserr = nfserrno(status); - goto out_err; + if (nfserr) { + /* + * nfsd_splice_actor may have already messed with the + * page length; reset it so as not to confuse + * xdr_truncate_encode: + */ + buf->page_len = 0; + return nfserr; } + *(p++) = htonl(eof); + *(p++) = htonl(maxcount); + buf->page_len = maxcount; buf->len += maxcount; xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1) @@ -4075,25 +3820,18 @@ static __be32 nfsd4_encode_splice_read( xdr->end = (__be32 *)((void *)xdr->end + space_left); return 0; - -out_err: - /* - * nfsd_splice_actor may have already messed with the - * page length; reset it so as not to confuse - * xdr_truncate_encode in our caller. - */ - buf->page_len = 0; - return nfserr; } static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, struct nfsd4_read *read, struct file *file, unsigned long maxcount) { - struct xdr_stream *xdr = resp->xdr; - unsigned int starting_len = xdr->buf->len; - __be32 zero = xdr_zero; + struct xdr_stream *xdr = &resp->xdr; + u32 eof; + int starting_len = xdr->buf->len - 8; __be32 nfserr; + __be32 tmp; + int pad; read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount); if (read->rd_vlen < 0) @@ -4101,27 +3839,33 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp, nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen, &maxcount, - &read->rd_eof); + &eof); read->rd_length = maxcount; if (nfserr) return nfserr; - if (svc_encode_result_payload(resp->rqstp, starting_len, maxcount)) + if (svc_encode_read_payload(resp->rqstp, starting_len + 8, maxcount)) return nfserr_io; - xdr_truncate_encode(xdr, starting_len + xdr_align_size(maxcount)); + xdr_truncate_encode(xdr, starting_len + 8 + xdr_align_size(maxcount)); + + tmp = htonl(eof); + write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4); + tmp = htonl(maxcount); + write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); + + tmp = xdr_zero; + pad = (maxcount&3) ? 4 - (maxcount&3) : 0; + write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount, + &tmp, pad); + return 0; - write_bytes_to_xdr_buf(xdr->buf, starting_len + maxcount, &zero, - xdr_pad_size(maxcount)); - return nfs_ok; } static __be32 nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_read *read) { - struct nfsd4_read *read = &u->read; - bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); unsigned long maxcount; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct file *file; int starting_len = xdr->buf->len; __be32 *p; @@ -4132,44 +3876,45 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */ if (!p) { - WARN_ON_ONCE(splice_ok); + WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)); return nfserr_resource; } - if (resp->xdr->buf->page_len && splice_ok) { + if (resp->xdr.buf->page_len && + test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { WARN_ON_ONCE(1); return nfserr_serverfault; } xdr_commit_encode(xdr); - maxcount = min_t(unsigned long, read->rd_length, + maxcount = svc_max_payload(resp->rqstp); + maxcount = min_t(unsigned long, maxcount, (xdr->buf->buflen - xdr->buf->len)); + maxcount = min_t(unsigned long, maxcount, read->rd_length); - if (file->f_op->splice_read && splice_ok) + if (file->f_op->splice_read && + test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); else nfserr = nfsd4_encode_readv(resp, read, file, maxcount); - if (nfserr) { - xdr_truncate_encode(xdr, starting_len); - return nfserr; - } - p = xdr_encode_bool(p, read->rd_eof); - *p = cpu_to_be32(read->rd_length); - return nfs_ok; + if (nfserr) + xdr_truncate_encode(xdr, starting_len); + + return nfserr; } static __be32 -nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink) { - struct nfsd4_readlink *readlink = &u->readlink; - __be32 *p, *maxcount_p, zero = xdr_zero; - struct xdr_stream *xdr = resp->xdr; + int maxcount; + __be32 wire_count; + int zero = 0; + struct xdr_stream *xdr = &resp->xdr; int length_offset = xdr->buf->len; - int maxcount, status; + __be32 *p; - maxcount_p = xdr_reserve_space(xdr, XDR_UNIT); - if (!maxcount_p) + p = xdr_reserve_space(xdr, 4); + if (!p) return nfserr_resource; maxcount = PAGE_SIZE; @@ -4186,35 +3931,28 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, (char *)p, &maxcount); if (nfserr == nfserr_isdir) nfserr = nfserr_inval; - if (nfserr) - goto out_err; - status = svc_encode_result_payload(readlink->rl_rqstp, length_offset, - maxcount); - if (status) { - nfserr = nfserrno(status); - goto out_err; + if (nfserr) { + xdr_truncate_encode(xdr, length_offset); + return nfserr; } - *maxcount_p = cpu_to_be32(maxcount); - xdr_truncate_encode(xdr, length_offset + 4 + xdr_align_size(maxcount)); - write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, &zero, - xdr_pad_size(maxcount)); - return nfs_ok; -out_err: - xdr_truncate_encode(xdr, length_offset); - return nfserr; + wire_count = htonl(maxcount); + write_bytes_to_xdr_buf(xdr->buf, length_offset, &wire_count, 4); + xdr_truncate_encode(xdr, length_offset + 4 + ALIGN(maxcount, 4)); + if (maxcount & 3) + write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount, + &zero, 4 - (maxcount&3)); + return 0; } static __be32 -nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readdir *readdir) { - struct nfsd4_readdir *readdir = &u->readdir; int maxcount; int bytes_left; loff_t offset; __be64 wire_offset; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; int starting_len = xdr->buf->len; __be32 *p; @@ -4225,8 +3963,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, /* XXX: Following NFSv3, we ignore the READDIR verifier for now. */ *p++ = cpu_to_be32(0); *p++ = cpu_to_be32(0); - xdr->buf->head[0].iov_len = (char *)xdr->p - - (char *)xdr->buf->head[0].iov_base; + resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p) + - (char *)resp->xdr.buf->head[0].iov_base; /* * Number of bytes left for directory entries allowing for the @@ -4299,11 +4037,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove) { - struct nfsd4_remove *remove = &u->remove; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -4314,11 +4050,9 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename) { - struct nfsd4_rename *rename = &u->rename; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 40); @@ -4399,20 +4133,18 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr, struct svc_export *exp) static __be32 nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_secinfo *secinfo) { - struct nfsd4_secinfo *secinfo = &u->secinfo; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->si_exp); } static __be32 nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_secinfo_no_name *secinfo) { - struct nfsd4_secinfo_no_name *secinfo = &u->secinfo_no_name; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; return nfsd4_do_encode_secinfo(xdr, secinfo->sin_exp); } @@ -4422,11 +4154,9 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr, * regardless of the error status. */ static __be32 -nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr) { - struct nfsd4_setattr *setattr = &u->setattr; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 16); @@ -4448,11 +4178,9 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd) { - struct nfsd4_setclientid *scd = &u->setclientid; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; if (!nfserr) { @@ -4474,11 +4202,9 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) +nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write) { - struct nfsd4_write *write = &u->write; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 16); @@ -4493,10 +4219,9 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_exchange_id *exid) { - struct nfsd4_exchange_id *exid = &u->exchange_id; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; char *major_id; char *server_scope; @@ -4572,10 +4297,9 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_create_session *sess) { - struct nfsd4_create_session *sess = &u->create_session; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 24); @@ -4626,10 +4350,9 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_sequence *seq) { - struct nfsd4_sequence *seq = &u->sequence; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20); @@ -4650,10 +4373,9 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_test_stateid *test_stateid) { - struct nfsd4_test_stateid *test_stateid = &u->test_stateid; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct nfsd4_test_stateid_id *stateid, *next; __be32 *p; @@ -4672,10 +4394,9 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr, #ifdef CONFIG_NFSD_PNFS static __be32 nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_getdeviceinfo *gdev) { - struct nfsd4_getdeviceinfo *gdev = &u->getdeviceinfo; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; const struct nfsd4_layout_ops *ops; u32 starting_len = xdr->buf->len, needed_len; __be32 *p; @@ -4726,10 +4447,9 @@ nfsd4_encode_getdeviceinfo(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_layoutget *lgp) { - struct nfsd4_layoutget *lgp = &u->layoutget; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; const struct nfsd4_layout_ops *ops; __be32 *p; @@ -4754,10 +4474,9 @@ nfsd4_encode_layoutget(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_layoutcommit *lcp) { - struct nfsd4_layoutcommit *lcp = &u->layoutcommit; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 4); @@ -4776,10 +4495,9 @@ nfsd4_encode_layoutcommit(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_layoutreturn(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_layoutreturn *lrp) { - struct nfsd4_layoutreturn *lrp = &u->layoutreturn; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 4); @@ -4797,7 +4515,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write, bool sync) { __be32 *p; - p = xdr_reserve_space(resp->xdr, 4); + p = xdr_reserve_space(&resp->xdr, 4); if (!p) return nfserr_resource; @@ -4806,11 +4524,11 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, else { __be32 nfserr; *p++ = cpu_to_be32(1); - nfserr = nfsd4_encode_stateid(resp->xdr, &write->cb_stateid); + nfserr = nfsd4_encode_stateid(&resp->xdr, &write->cb_stateid); if (nfserr) return nfserr; } - p = xdr_reserve_space(resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); + p = xdr_reserve_space(&resp->xdr, 8 + 4 + NFS4_VERIFIER_SIZE); if (!p) return nfserr_resource; @@ -4824,7 +4542,7 @@ nfsd42_encode_write_res(struct nfsd4_compoundres *resp, static __be32 nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns) { - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct nfs42_netaddr *addr; __be32 *p; @@ -4863,28 +4581,26 @@ nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns) static __be32 nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_copy *copy) { - struct nfsd4_copy *copy = &u->copy; __be32 *p; nfserr = nfsd42_encode_write_res(resp, ©->cp_res, - nfsd4_copy_is_sync(copy)); + copy->cp_synchronous); if (nfserr) return nfserr; - p = xdr_reserve_space(resp->xdr, 4 + 4); + p = xdr_reserve_space(&resp->xdr, 4 + 4); *p++ = xdr_one; /* cr_consecutive */ - *p = nfsd4_copy_is_sync(copy) ? xdr_one : xdr_zero; + *p++ = cpu_to_be32(copy->cp_synchronous); return 0; } static __be32 nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_offload_status *os) { - struct nfsd4_offload_status *os = &u->offload_status; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 8 + 4); @@ -4897,84 +4613,159 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp, - struct nfsd4_read *read) + struct nfsd4_read *read, + unsigned long *maxcount, u32 *eof, + loff_t *pos) { - bool splice_ok = test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags); + struct xdr_stream *xdr = &resp->xdr; struct file *file = read->rd_nf->nf_file; - struct xdr_stream *xdr = resp->xdr; - unsigned long maxcount; - __be32 nfserr, *p; + int starting_len = xdr->buf->len; + loff_t hole_pos; + __be32 nfserr; + __be32 *p, tmp; + __be64 tmp64; + + hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE); + if (hole_pos > read->rd_offset) + *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset); + *maxcount = min_t(unsigned long, *maxcount, (xdr->buf->buflen - xdr->buf->len)); /* Content type, offset, byte count */ p = xdr_reserve_space(xdr, 4 + 8 + 4); if (!p) - return nfserr_io; - if (resp->xdr->buf->page_len && splice_ok) { - WARN_ON_ONCE(splice_ok); - return nfserr_serverfault; - } + return nfserr_resource; - maxcount = min_t(unsigned long, read->rd_length, - (xdr->buf->buflen - xdr->buf->len)); + read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount); + if (read->rd_vlen < 0) + return nfserr_resource; - if (file->f_op->splice_read && splice_ok) - nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount); - else - nfserr = nfsd4_encode_readv(resp, read, file, maxcount); + nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset, + resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof); if (nfserr) return nfserr; + xdr_truncate_encode(xdr, starting_len + 16 + xdr_align_size(*maxcount)); - *p++ = cpu_to_be32(NFS4_CONTENT_DATA); - p = xdr_encode_hyper(p, read->rd_offset); - *p = cpu_to_be32(read->rd_length); + tmp = htonl(NFS4_CONTENT_DATA); + write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); + tmp64 = cpu_to_be64(read->rd_offset); + write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp64, 8); + tmp = htonl(*maxcount); + write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp, 4); + tmp = xdr_zero; + write_bytes_to_xdr_buf(xdr->buf, starting_len + 16 + *maxcount, &tmp, + xdr_pad_size(*maxcount)); + return nfs_ok; +} + +static __be32 +nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, + struct nfsd4_read *read, + unsigned long *maxcount, u32 *eof) +{ + struct file *file = read->rd_nf->nf_file; + loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA); + loff_t f_size = i_size_read(file_inode(file)); + unsigned long count; + __be32 *p; + + if (data_pos == -ENXIO) + data_pos = f_size; + else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) + return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); + count = data_pos - read->rd_offset; + + /* Content type, offset, byte count */ + p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8); + if (!p) + return nfserr_resource; + + *p++ = htonl(NFS4_CONTENT_HOLE); + p = xdr_encode_hyper(p, read->rd_offset); + p = xdr_encode_hyper(p, count); + + *eof = (read->rd_offset + count) >= f_size; + *maxcount = min_t(unsigned long, count, *maxcount); return nfs_ok; } static __be32 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_read *read) { - struct nfsd4_read *read = &u->read; - struct file *file = read->rd_nf->nf_file; - struct xdr_stream *xdr = resp->xdr; + unsigned long maxcount, count; + struct xdr_stream *xdr = &resp->xdr; + struct file *file; int starting_len = xdr->buf->len; - u32 segments = 0; - __be32 *p; + int last_segment = xdr->buf->len; + int segments = 0; + __be32 *p, tmp; + bool is_data; + loff_t pos; + u32 eof; if (nfserr) return nfserr; + file = read->rd_nf->nf_file; /* eof flag, segment count */ p = xdr_reserve_space(xdr, 4 + 4); if (!p) - return nfserr_io; + return nfserr_resource; xdr_commit_encode(xdr); - read->rd_eof = read->rd_offset >= i_size_read(file_inode(file)); - if (read->rd_eof) + maxcount = svc_max_payload(resp->rqstp); + maxcount = min_t(unsigned long, maxcount, + (xdr->buf->buflen - xdr->buf->len)); + maxcount = min_t(unsigned long, maxcount, read->rd_length); + count = maxcount; + + eof = read->rd_offset >= i_size_read(file_inode(file)); + if (eof) goto out; - nfserr = nfsd4_encode_read_plus_data(resp, read); - if (nfserr) { - xdr_truncate_encode(xdr, starting_len); - return nfserr; + pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE); + is_data = pos > read->rd_offset; + + while (count > 0 && !eof) { + maxcount = count; + if (is_data) + nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof, + segments == 0 ? &pos : NULL); + else + nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof); + if (nfserr) + goto out; + count -= maxcount; + read->rd_offset += maxcount; + is_data = !is_data; + last_segment = xdr->buf->len; + segments++; } - segments++; - out: - p = xdr_encode_bool(p, read->rd_eof); - *p = cpu_to_be32(segments); + if (nfserr && segments == 0) + xdr_truncate_encode(xdr, starting_len); + else { + if (nfserr) { + xdr_truncate_encode(xdr, last_segment); + nfserr = nfs_ok; + eof = 0; + } + tmp = htonl(eof); + write_bytes_to_xdr_buf(xdr->buf, starting_len, &tmp, 4); + tmp = htonl(segments); + write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4); + } + return nfserr; } static __be32 nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_copy_notify *cn) { - struct nfsd4_copy_notify *cn = &u->copy_notify; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; if (nfserr) @@ -5001,18 +4792,16 @@ nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr, *p++ = cpu_to_be32(1); - nfserr = nfsd42_encode_nl4_server(resp, cn->cpn_src); - return nfserr; + return nfsd42_encode_nl4_server(resp, &cn->cpn_src); } static __be32 nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_seek *seek) { - struct nfsd4_seek *seek = &u->seek; __be32 *p; - p = xdr_reserve_space(resp->xdr, 4 + 8); + p = xdr_reserve_space(&resp->xdr, 4 + 8); *p++ = cpu_to_be32(seek->seek_eof); p = xdr_encode_hyper(p, seek->seek_pos); @@ -5020,8 +4809,7 @@ nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr, } static __be32 -nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *p) +nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p) { return nfserr; } @@ -5072,10 +4860,9 @@ nfsd4_vbuf_to_stream(struct xdr_stream *xdr, char *buf, u32 buflen) static __be32 nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_getxattr *getxattr) { - struct nfsd4_getxattr *getxattr = &u->getxattr; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p, err; p = xdr_reserve_space(xdr, 4); @@ -5097,10 +4884,9 @@ nfsd4_encode_getxattr(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_setxattr(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_setxattr *setxattr) { - struct nfsd4_setxattr *setxattr = &u->setxattr; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -5139,10 +4925,9 @@ nfsd4_listxattr_validate_cookie(struct nfsd4_listxattrs *listxattrs, static __be32 nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_listxattrs *listxattrs) { - struct nfsd4_listxattrs *listxattrs = &u->listxattrs; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; u32 cookie_offset, count_offset, eof; u32 left, xdrleft, slen, count; u32 xdrlen, offset; @@ -5251,10 +5036,9 @@ nfsd4_encode_listxattrs(struct nfsd4_compoundres *resp, __be32 nfserr, static __be32 nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, - union nfsd4_op_u *u) + struct nfsd4_removexattr *removexattr) { - struct nfsd4_removexattr *removexattr = &u->removexattr; - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; __be32 *p; p = xdr_reserve_space(xdr, 20); @@ -5265,7 +5049,7 @@ nfsd4_encode_removexattr(struct nfsd4_compoundres *resp, __be32 nfserr, return 0; } -typedef __be32(*nfsd4_enc)(struct nfsd4_compoundres *, __be32, union nfsd4_op_u *u); +typedef __be32(* nfsd4_enc)(struct nfsd4_compoundres *, __be32, void *); /* * Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 @@ -5273,93 +5057,93 @@ typedef __be32(*nfsd4_enc)(struct nfsd4_compoundres *, __be32, union nfsd4_op_u * done in the decoding phase. */ static const nfsd4_enc nfsd4_enc_ops[] = { - [OP_ACCESS] = nfsd4_encode_access, - [OP_CLOSE] = nfsd4_encode_close, - [OP_COMMIT] = nfsd4_encode_commit, - [OP_CREATE] = nfsd4_encode_create, - [OP_DELEGPURGE] = nfsd4_encode_noop, - [OP_DELEGRETURN] = nfsd4_encode_noop, - [OP_GETATTR] = nfsd4_encode_getattr, - [OP_GETFH] = nfsd4_encode_getfh, - [OP_LINK] = nfsd4_encode_link, - [OP_LOCK] = nfsd4_encode_lock, - [OP_LOCKT] = nfsd4_encode_lockt, - [OP_LOCKU] = nfsd4_encode_locku, - [OP_LOOKUP] = nfsd4_encode_noop, - [OP_LOOKUPP] = nfsd4_encode_noop, - [OP_NVERIFY] = nfsd4_encode_noop, - [OP_OPEN] = nfsd4_encode_open, - [OP_OPENATTR] = nfsd4_encode_noop, - [OP_OPEN_CONFIRM] = nfsd4_encode_open_confirm, - [OP_OPEN_DOWNGRADE] = nfsd4_encode_open_downgrade, - [OP_PUTFH] = nfsd4_encode_noop, - [OP_PUTPUBFH] = nfsd4_encode_noop, - [OP_PUTROOTFH] = nfsd4_encode_noop, - [OP_READ] = nfsd4_encode_read, - [OP_READDIR] = nfsd4_encode_readdir, - [OP_READLINK] = nfsd4_encode_readlink, - [OP_REMOVE] = nfsd4_encode_remove, - [OP_RENAME] = nfsd4_encode_rename, - [OP_RENEW] = nfsd4_encode_noop, - [OP_RESTOREFH] = nfsd4_encode_noop, - [OP_SAVEFH] = nfsd4_encode_noop, - [OP_SECINFO] = nfsd4_encode_secinfo, - [OP_SETATTR] = nfsd4_encode_setattr, - [OP_SETCLIENTID] = nfsd4_encode_setclientid, - [OP_SETCLIENTID_CONFIRM] = nfsd4_encode_noop, - [OP_VERIFY] = nfsd4_encode_noop, - [OP_WRITE] = nfsd4_encode_write, - [OP_RELEASE_LOCKOWNER] = nfsd4_encode_noop, + [OP_ACCESS] = (nfsd4_enc)nfsd4_encode_access, + [OP_CLOSE] = (nfsd4_enc)nfsd4_encode_close, + [OP_COMMIT] = (nfsd4_enc)nfsd4_encode_commit, + [OP_CREATE] = (nfsd4_enc)nfsd4_encode_create, + [OP_DELEGPURGE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_DELEGRETURN] = (nfsd4_enc)nfsd4_encode_noop, + [OP_GETATTR] = (nfsd4_enc)nfsd4_encode_getattr, + [OP_GETFH] = (nfsd4_enc)nfsd4_encode_getfh, + [OP_LINK] = (nfsd4_enc)nfsd4_encode_link, + [OP_LOCK] = (nfsd4_enc)nfsd4_encode_lock, + [OP_LOCKT] = (nfsd4_enc)nfsd4_encode_lockt, + [OP_LOCKU] = (nfsd4_enc)nfsd4_encode_locku, + [OP_LOOKUP] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LOOKUPP] = (nfsd4_enc)nfsd4_encode_noop, + [OP_NVERIFY] = (nfsd4_enc)nfsd4_encode_noop, + [OP_OPEN] = (nfsd4_enc)nfsd4_encode_open, + [OP_OPENATTR] = (nfsd4_enc)nfsd4_encode_noop, + [OP_OPEN_CONFIRM] = (nfsd4_enc)nfsd4_encode_open_confirm, + [OP_OPEN_DOWNGRADE] = (nfsd4_enc)nfsd4_encode_open_downgrade, + [OP_PUTFH] = (nfsd4_enc)nfsd4_encode_noop, + [OP_PUTPUBFH] = (nfsd4_enc)nfsd4_encode_noop, + [OP_PUTROOTFH] = (nfsd4_enc)nfsd4_encode_noop, + [OP_READ] = (nfsd4_enc)nfsd4_encode_read, + [OP_READDIR] = (nfsd4_enc)nfsd4_encode_readdir, + [OP_READLINK] = (nfsd4_enc)nfsd4_encode_readlink, + [OP_REMOVE] = (nfsd4_enc)nfsd4_encode_remove, + [OP_RENAME] = (nfsd4_enc)nfsd4_encode_rename, + [OP_RENEW] = (nfsd4_enc)nfsd4_encode_noop, + [OP_RESTOREFH] = (nfsd4_enc)nfsd4_encode_noop, + [OP_SAVEFH] = (nfsd4_enc)nfsd4_encode_noop, + [OP_SECINFO] = (nfsd4_enc)nfsd4_encode_secinfo, + [OP_SETATTR] = (nfsd4_enc)nfsd4_encode_setattr, + [OP_SETCLIENTID] = (nfsd4_enc)nfsd4_encode_setclientid, + [OP_SETCLIENTID_CONFIRM] = (nfsd4_enc)nfsd4_encode_noop, + [OP_VERIFY] = (nfsd4_enc)nfsd4_encode_noop, + [OP_WRITE] = (nfsd4_enc)nfsd4_encode_write, + [OP_RELEASE_LOCKOWNER] = (nfsd4_enc)nfsd4_encode_noop, /* NFSv4.1 operations */ - [OP_BACKCHANNEL_CTL] = nfsd4_encode_noop, - [OP_BIND_CONN_TO_SESSION] = nfsd4_encode_bind_conn_to_session, - [OP_EXCHANGE_ID] = nfsd4_encode_exchange_id, - [OP_CREATE_SESSION] = nfsd4_encode_create_session, - [OP_DESTROY_SESSION] = nfsd4_encode_noop, - [OP_FREE_STATEID] = nfsd4_encode_noop, - [OP_GET_DIR_DELEGATION] = nfsd4_encode_noop, + [OP_BACKCHANNEL_CTL] = (nfsd4_enc)nfsd4_encode_noop, + [OP_BIND_CONN_TO_SESSION] = (nfsd4_enc)nfsd4_encode_bind_conn_to_session, + [OP_EXCHANGE_ID] = (nfsd4_enc)nfsd4_encode_exchange_id, + [OP_CREATE_SESSION] = (nfsd4_enc)nfsd4_encode_create_session, + [OP_DESTROY_SESSION] = (nfsd4_enc)nfsd4_encode_noop, + [OP_FREE_STATEID] = (nfsd4_enc)nfsd4_encode_noop, + [OP_GET_DIR_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, #ifdef CONFIG_NFSD_PNFS - [OP_GETDEVICEINFO] = nfsd4_encode_getdeviceinfo, - [OP_GETDEVICELIST] = nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = nfsd4_encode_layoutcommit, - [OP_LAYOUTGET] = nfsd4_encode_layoutget, - [OP_LAYOUTRETURN] = nfsd4_encode_layoutreturn, + [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_getdeviceinfo, + [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_layoutcommit, + [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_layoutget, + [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_layoutreturn, #else - [OP_GETDEVICEINFO] = nfsd4_encode_noop, - [OP_GETDEVICELIST] = nfsd4_encode_noop, - [OP_LAYOUTCOMMIT] = nfsd4_encode_noop, - [OP_LAYOUTGET] = nfsd4_encode_noop, - [OP_LAYOUTRETURN] = nfsd4_encode_noop, + [OP_GETDEVICEINFO] = (nfsd4_enc)nfsd4_encode_noop, + [OP_GETDEVICELIST] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTCOMMIT] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTGET] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTRETURN] = (nfsd4_enc)nfsd4_encode_noop, #endif - [OP_SECINFO_NO_NAME] = nfsd4_encode_secinfo_no_name, - [OP_SEQUENCE] = nfsd4_encode_sequence, - [OP_SET_SSV] = nfsd4_encode_noop, - [OP_TEST_STATEID] = nfsd4_encode_test_stateid, - [OP_WANT_DELEGATION] = nfsd4_encode_noop, - [OP_DESTROY_CLIENTID] = nfsd4_encode_noop, - [OP_RECLAIM_COMPLETE] = nfsd4_encode_noop, + [OP_SECINFO_NO_NAME] = (nfsd4_enc)nfsd4_encode_secinfo_no_name, + [OP_SEQUENCE] = (nfsd4_enc)nfsd4_encode_sequence, + [OP_SET_SSV] = (nfsd4_enc)nfsd4_encode_noop, + [OP_TEST_STATEID] = (nfsd4_enc)nfsd4_encode_test_stateid, + [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop, + [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop, + [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop, /* NFSv4.2 operations */ - [OP_ALLOCATE] = nfsd4_encode_noop, - [OP_COPY] = nfsd4_encode_copy, - [OP_COPY_NOTIFY] = nfsd4_encode_copy_notify, - [OP_DEALLOCATE] = nfsd4_encode_noop, - [OP_IO_ADVISE] = nfsd4_encode_noop, - [OP_LAYOUTERROR] = nfsd4_encode_noop, - [OP_LAYOUTSTATS] = nfsd4_encode_noop, - [OP_OFFLOAD_CANCEL] = nfsd4_encode_noop, - [OP_OFFLOAD_STATUS] = nfsd4_encode_offload_status, - [OP_READ_PLUS] = nfsd4_encode_read_plus, - [OP_SEEK] = nfsd4_encode_seek, - [OP_WRITE_SAME] = nfsd4_encode_noop, - [OP_CLONE] = nfsd4_encode_noop, + [OP_ALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy, + [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_copy_notify, + [OP_DEALLOCATE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTERROR] = (nfsd4_enc)nfsd4_encode_noop, + [OP_LAYOUTSTATS] = (nfsd4_enc)nfsd4_encode_noop, + [OP_OFFLOAD_CANCEL] = (nfsd4_enc)nfsd4_encode_noop, + [OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_offload_status, + [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus, + [OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek, + [OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop, + [OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop, /* RFC 8276 extended atributes operations */ - [OP_GETXATTR] = nfsd4_encode_getxattr, - [OP_SETXATTR] = nfsd4_encode_setxattr, - [OP_LISTXATTRS] = nfsd4_encode_listxattrs, - [OP_REMOVEXATTR] = nfsd4_encode_removexattr, + [OP_GETXATTR] = (nfsd4_enc)nfsd4_encode_getxattr, + [OP_SETXATTR] = (nfsd4_enc)nfsd4_encode_setxattr, + [OP_LISTXATTRS] = (nfsd4_enc)nfsd4_encode_listxattrs, + [OP_REMOVEXATTR] = (nfsd4_enc)nfsd4_encode_removexattr, }; /* @@ -5394,7 +5178,7 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize) void nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) { - struct xdr_stream *xdr = resp->xdr; + struct xdr_stream *xdr = &resp->xdr; struct nfs4_stateowner *so = resp->cstate.replay_owner; struct svc_rqst *rqstp = resp->rqstp; const struct nfsd4_operation *opdesc = op->opdesc; @@ -5403,8 +5187,10 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) __be32 *p; p = xdr_reserve_space(xdr, 8); - if (!p) - goto release; + if (!p) { + WARN_ON_ONCE(1); + return; + } *p++ = cpu_to_be32(op->opnum); post_err_offset = xdr->buf->len; @@ -5413,12 +5199,12 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) if (op->status && opdesc && !(opdesc->op_flags & OP_NONTRIVIAL_ERROR_ENCODE)) goto status; - BUG_ON(op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || + BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) || !nfsd4_enc_ops[op->opnum]); encoder = nfsd4_enc_ops[op->opnum]; op->status = encoder(resp, op->status, &op->u); - if (op->status) - trace_nfsd_compound_encode_err(rqstp, op->opnum, op->status); + if (opdesc && opdesc->op_release) + opdesc->op_release(&op->u); xdr_commit_encode(xdr); /* nfsd4_check_resp_size guarantees enough room for error status */ @@ -5458,10 +5244,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) so->so_replay.rp_buf, len); } status: - *p = op->status; -release: - if (opdesc && opdesc->op_release) - opdesc->op_release(&op->u); + /* Note that op->status is already in network byte order: */ + write_bytes_to_xdr_buf(xdr->buf, post_err_offset - 4, &op->status, 4); } /* @@ -5487,14 +5271,22 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op) p = xdr_encode_opaque_fixed(p, rp->rp_buf, rp->rp_buflen); } +int +nfs4svc_encode_voidres(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} + void nfsd4_release_compoundargs(struct svc_rqst *rqstp) { struct nfsd4_compoundargs *args = rqstp->rq_argp; if (args->ops != args->iops) { - vfree(args->ops); + kfree(args->ops); args->ops = args->iops; } + kfree(args->tmpp); + args->tmpp = NULL; while (args->to_free) { struct svcxdr_tmpbuf *tb = args->to_free; args->to_free = tb->next; @@ -5502,44 +5294,57 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp) } } -bool -nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs4svc_decode_voidarg(struct svc_rqst *rqstp, __be32 *p) +{ + return 1; +} + +int +nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundargs *args = rqstp->rq_argp; - /* svcxdr_tmp_alloc */ + if (rqstp->rq_arg.head[0].iov_len % 4) { + /* client is nuts */ + dprintk("%s: compound not properly padded! (peeraddr=%pISc xid=0x%x)", + __func__, svc_addr(rqstp), be32_to_cpu(rqstp->rq_xid)); + return 0; + } + args->p = p; + args->end = rqstp->rq_arg.head[0].iov_base + rqstp->rq_arg.head[0].iov_len; + args->pagelist = rqstp->rq_arg.pages; + args->pagelen = rqstp->rq_arg.page_len; + args->tail = false; + args->tmpp = NULL; args->to_free = NULL; - - args->xdr = xdr; args->ops = args->iops; args->rqstp = rqstp; - return nfsd4_decode_compound(args); + return !nfsd4_decode_compound(args); } -bool -nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd4_compoundres *resp = rqstp->rq_resp; - __be32 *p; + struct xdr_buf *buf = resp->xdr.buf; - /* - * Send buffer space for the following items is reserved - * at the top of nfsd4_proc_compound(). - */ - p = resp->statusp; + WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len + + buf->tail[0].iov_len); - *p++ = resp->cstate.status; + *p = resp->cstate.status; - rqstp->rq_next_page = xdr->page_ptr + 1; + rqstp->rq_next_page = resp->xdr.page_ptr + 1; + p = resp->tagp; *p++ = htonl(resp->taglen); memcpy(p, resp->tag, resp->taglen); p += XDR_QUADLEN(resp->taglen); *p++ = htonl(resp->opcnt); nfsd4_sequence_done(resp); - return true; + return 1; } /* diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c index 2b5417e06d80..80c90fc231a5 100644 --- a/fs/nfsd/nfscache.c +++ b/fs/nfsd/nfscache.c @@ -84,6 +84,12 @@ nfsd_hashsize(unsigned int limit) return roundup_pow_of_two(limit / TARGET_BUCKET_SIZE); } +static u32 +nfsd_cache_hash(__be32 xid, struct nfsd_net *nn) +{ + return hash_32(be32_to_cpu(xid), nn->maskbits); +} + static struct svc_cacherep * nfsd_reply_cache_alloc(struct svc_rqst *rqstp, __wsum csum, struct nfsd_net *nn) @@ -115,14 +121,14 @@ nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct svc_cacherep *rp, struct nfsd_net *nn) { if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) { - nfsd_stats_drc_mem_usage_sub(nn, rp->c_replvec.iov_len); + nn->drc_mem_usage -= rp->c_replvec.iov_len; kfree(rp->c_replvec.iov_base); } if (rp->c_state != RC_UNUSED) { rb_erase(&rp->c_node, &b->rb_head); list_del(&rp->c_lru); atomic_dec(&nn->num_drc_entries); - nfsd_stats_drc_mem_usage_sub(nn, sizeof(*rp)); + nn->drc_mem_usage -= sizeof(*rp); } kmem_cache_free(drc_slab, rp); } @@ -148,16 +154,6 @@ void nfsd_drc_slab_free(void) kmem_cache_destroy(drc_slab); } -static int nfsd_reply_cache_stats_init(struct nfsd_net *nn) -{ - return nfsd_percpu_counters_init(nn->counter, NFSD_NET_COUNTERS_NUM); -} - -static void nfsd_reply_cache_stats_destroy(struct nfsd_net *nn) -{ - nfsd_percpu_counters_destroy(nn->counter, NFSD_NET_COUNTERS_NUM); -} - int nfsd_reply_cache_init(struct nfsd_net *nn) { unsigned int hashsize; @@ -169,16 +165,12 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) hashsize = nfsd_hashsize(nn->max_drc_entries); nn->maskbits = ilog2(hashsize); - status = nfsd_reply_cache_stats_init(nn); - if (status) - goto out_nomem; - nn->nfsd_reply_cache_shrinker.scan_objects = nfsd_reply_cache_scan; nn->nfsd_reply_cache_shrinker.count_objects = nfsd_reply_cache_count; nn->nfsd_reply_cache_shrinker.seeks = 1; status = register_shrinker(&nn->nfsd_reply_cache_shrinker); if (status) - goto out_stats_destroy; + goto out_nomem; nn->drc_hashtbl = kvzalloc(array_size(hashsize, sizeof(*nn->drc_hashtbl)), GFP_KERNEL); @@ -194,8 +186,6 @@ int nfsd_reply_cache_init(struct nfsd_net *nn) return 0; out_shrinker: unregister_shrinker(&nn->nfsd_reply_cache_shrinker); -out_stats_destroy: - nfsd_reply_cache_stats_destroy(nn); out_nomem: printk(KERN_ERR "nfsd: failed to allocate reply cache\n"); return -ENOMEM; @@ -216,7 +206,6 @@ void nfsd_reply_cache_shutdown(struct nfsd_net *nn) rp, nn); } } - nfsd_reply_cache_stats_destroy(nn); kvfree(nn->drc_hashtbl); nn->drc_hashtbl = NULL; @@ -235,16 +224,8 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp) list_move_tail(&rp->c_lru, &b->lru_head); } -static noinline struct nfsd_drc_bucket * -nfsd_cache_bucket_find(__be32 xid, struct nfsd_net *nn) -{ - unsigned int hash = hash_32((__force u32)xid, nn->maskbits); - - return &nn->drc_hashtbl[hash]; -} - -static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, - unsigned int max) +static long +prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) { struct svc_cacherep *rp, *tmp; long freed = 0; @@ -260,17 +241,11 @@ static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn, time_before(jiffies, rp->c_timestamp + RC_EXPIRE)) break; nfsd_reply_cache_free_locked(b, rp, nn); - if (max && freed++ > max) - break; + freed++; } return freed; } -static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn) -{ - return prune_bucket(b, nn, 3); -} - /* * Walk the LRU list and prune off entries that are older than RC_EXPIRE. * Also prune the oldest ones when the total exceeds the max number of entries. @@ -287,7 +262,7 @@ prune_cache_entries(struct nfsd_net *nn) if (list_empty(&b->lru_head)) continue; spin_lock(&b->cache_lock); - freed += prune_bucket(b, nn, 0); + freed += prune_bucket(b, nn); spin_unlock(&b->cache_lock); } return freed; @@ -349,7 +324,7 @@ nfsd_cache_key_cmp(const struct svc_cacherep *key, { if (key->c_key.k_xid == rp->c_key.k_xid && key->c_key.k_csum != rp->c_key.k_csum) { - nfsd_stats_payload_misses_inc(nn); + ++nn->payload_misses; trace_nfsd_drc_mismatch(nn, key, rp); } @@ -421,16 +396,18 @@ nfsd_cache_insert(struct nfsd_drc_bucket *b, struct svc_cacherep *key, */ int nfsd_cache_lookup(struct svc_rqst *rqstp) { - struct nfsd_net *nn; + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp, *found; + __be32 xid = rqstp->rq_xid; __wsum csum; - struct nfsd_drc_bucket *b; + u32 hash = nfsd_cache_hash(xid, nn); + struct nfsd_drc_bucket *b = &nn->drc_hashtbl[hash]; int type = rqstp->rq_cachetype; int rtn = RC_DOIT; rqstp->rq_cacherep = NULL; if (type == RC_NOCACHE) { - nfsd_stats_rc_nocache_inc(); + nfsdstats.rcnocache++; goto out; } @@ -440,25 +417,27 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) * Since the common case is a cache miss followed by an insert, * preallocate an entry. */ - nn = net_generic(SVC_NET(rqstp), nfsd_net_id); rp = nfsd_reply_cache_alloc(rqstp, csum, nn); if (!rp) goto out; - b = nfsd_cache_bucket_find(rqstp->rq_xid, nn); spin_lock(&b->cache_lock); found = nfsd_cache_insert(b, rp, nn); - if (found != rp) + if (found != rp) { + nfsd_reply_cache_free_locked(NULL, rp, nn); + rp = found; goto found_entry; + } - nfsd_stats_rc_misses_inc(); + nfsdstats.rcmisses++; rqstp->rq_cacherep = rp; rp->c_state = RC_INPROG; atomic_inc(&nn->num_drc_entries); - nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp)); + nn->drc_mem_usage += sizeof(*rp); - nfsd_prune_bucket(b, nn); + /* go ahead and prune the cache */ + prune_bucket(b, nn); out_unlock: spin_unlock(&b->cache_lock); @@ -467,10 +446,8 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp) found_entry: /* We found a matching entry which is either in progress or done. */ - nfsd_reply_cache_free_locked(NULL, rp, nn); - nfsd_stats_rc_hits_inc(); + nfsdstats.rchits++; rtn = RC_DROPIT; - rp = found; /* Request being processed */ if (rp->c_state == RC_INPROG) @@ -529,6 +506,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct svc_cacherep *rp = rqstp->rq_cacherep; struct kvec *resv = &rqstp->rq_res.head[0], *cachv; + u32 hash; struct nfsd_drc_bucket *b; int len; size_t bufsize = 0; @@ -536,7 +514,8 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) if (!rp) return; - b = nfsd_cache_bucket_find(rp->c_key.k_xid, nn); + hash = nfsd_cache_hash(rp->c_key.k_xid, nn); + b = &nn->drc_hashtbl[hash]; len = resv->iov_len - ((char*)statp - (char*)resv->iov_base); len >>= 2; @@ -569,7 +548,7 @@ void nfsd_cache_update(struct svc_rqst *rqstp, int cachetype, __be32 *statp) return; } spin_lock(&b->cache_lock); - nfsd_stats_drc_mem_usage_add(nn, bufsize); + nn->drc_mem_usage += bufsize; lru_put_end(b, rp); rp->c_secure = test_bit(RQ_SECURE, &rqstp->rq_flags); rp->c_type = cachetype; @@ -603,26 +582,28 @@ nfsd_cache_append(struct svc_rqst *rqstp, struct kvec *data) * scraping this file for info should test the labels to ensure they're * getting the correct field. */ -int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) +static int nfsd_reply_cache_stats_show(struct seq_file *m, void *v) { - struct nfsd_net *nn = net_generic(file_inode(m->file)->i_sb->s_fs_info, - nfsd_net_id); + struct nfsd_net *nn = m->private; seq_printf(m, "max entries: %u\n", nn->max_drc_entries); seq_printf(m, "num entries: %u\n", - atomic_read(&nn->num_drc_entries)); + atomic_read(&nn->num_drc_entries)); seq_printf(m, "hash buckets: %u\n", 1 << nn->maskbits); - seq_printf(m, "mem usage: %lld\n", - percpu_counter_sum_positive(&nn->counter[NFSD_NET_DRC_MEM_USAGE])); - seq_printf(m, "cache hits: %lld\n", - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS])); - seq_printf(m, "cache misses: %lld\n", - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES])); - seq_printf(m, "not cached: %lld\n", - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE])); - seq_printf(m, "payload misses: %lld\n", - percpu_counter_sum_positive(&nn->counter[NFSD_NET_PAYLOAD_MISSES])); + seq_printf(m, "mem usage: %u\n", nn->drc_mem_usage); + seq_printf(m, "cache hits: %u\n", nfsdstats.rchits); + seq_printf(m, "cache misses: %u\n", nfsdstats.rcmisses); + seq_printf(m, "not cached: %u\n", nfsdstats.rcnocache); + seq_printf(m, "payload misses: %u\n", nn->payload_misses); seq_printf(m, "longest chain len: %u\n", nn->longest_chain); seq_printf(m, "cachesize at longest: %u\n", nn->longest_chain_cachesize); return 0; } + +int nfsd_reply_cache_stats_open(struct inode *inode, struct file *file) +{ + struct nfsd_net *nn = net_generic(file_inode(file)->i_sb->s_fs_info, + nfsd_net_id); + + return single_open(file, nfsd_reply_cache_stats_show, nn); +} diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 682f5226e79a..7c36634598d3 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -25,7 +25,6 @@ #include "state.h" #include "netns.h" #include "pnfs.h" -#include "filecache.h" /* * We have a single directory with several nodes in it. @@ -33,7 +32,6 @@ enum { NFSD_Root = 1, NFSD_List, - NFSD_Export_Stats, NFSD_Export_features, NFSD_Fh, NFSD_FO_UnlockIP, @@ -46,7 +44,6 @@ enum { NFSD_Ports, NFSD_MaxBlkSize, NFSD_MaxConnections, - NFSD_Filecache, NFSD_SupportedEnctypes, /* * The below MUST come last. Otherwise we leave a hole in nfsd_files[] @@ -185,7 +182,17 @@ static int export_features_show(struct seq_file *m, void *v) return 0; } -DEFINE_SHOW_ATTRIBUTE(export_features); +static int export_features_open(struct inode *inode, struct file *file) +{ + return single_open(file, export_features_show, NULL); +} + +static const struct file_operations export_features_operations = { + .open = export_features_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) static int supported_enctypes_show(struct seq_file *m, void *v) @@ -194,7 +201,17 @@ static int supported_enctypes_show(struct seq_file *m, void *v) return 0; } -DEFINE_SHOW_ATTRIBUTE(supported_enctypes); +static int supported_enctypes_open(struct inode *inode, struct file *file) +{ + return single_open(file, supported_enctypes_show, NULL); +} + +static const struct file_operations supported_enctypes_ops = { + .open = supported_enctypes_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ static const struct file_operations pool_stats_operations = { @@ -204,9 +221,12 @@ static const struct file_operations pool_stats_operations = { .release = nfsd_pool_stats_release, }; -DEFINE_SHOW_ATTRIBUTE(nfsd_reply_cache_stats); - -DEFINE_SHOW_ATTRIBUTE(nfsd_file_cache_stats); +static const struct file_operations reply_cache_stats_operations = { + .open = nfsd_reply_cache_stats_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; /*----------------------------------------------------------------------------*/ /* @@ -374,12 +394,12 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size) auth_domain_put(dom); if (len) return len; - + mesg = buf; len = SIMPLE_TRANSACTION_LIMIT; - qword_addhex(&mesg, &len, fh.fh_raw, fh.fh_size); + qword_addhex(&mesg, &len, (char*)&fh.fh_base, fh.fh_size); mesg[-1] = '\n'; - return mesg - buf; + return mesg - buf; } /* @@ -581,9 +601,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) cmd = sign == '-' ? NFSD_CLEAR : NFSD_SET; switch(num) { -#ifdef CONFIG_NFSD_V2 case 2: -#endif case 3: nfsd_vers(nn, num, cmd); break; @@ -603,9 +621,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) } break; default: - /* Ignore requests to disable non-existent versions */ - if (cmd == NFSD_SET) - return -EINVAL; + return -EINVAL; } vers += len + 1; } while ((len = qword_get(&mesg, vers, size)) > 0); @@ -616,6 +632,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size) } /* Now write current state into reply buffer */ + len = 0; sep = ""; remaining = SIMPLE_TRANSACTION_LIMIT; for (num=2 ; num <= 4 ; num++) { @@ -709,25 +726,28 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred char *mesg = buf; int fd, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct svc_serv *serv; err = get_int(&mesg, &fd); if (err != 0 || fd < 0) return -EINVAL; + if (svc_alien_sock(net, fd)) { + printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__); + return -EINVAL; + } + err = nfsd_create_serv(net); if (err != 0) return err; - serv = nn->nfsd_serv; - err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred); + if (err < 0) { + nfsd_destroy(net); + return err; + } - if (err < 0 && !serv->sv_nrthreads && !nn->keep_active) - nfsd_last_thread(net); - else if (err >= 0 && !serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(serv); - - svc_put(serv); + /* Decrease the count, but don't shut down the service */ + nn->nfsd_serv->sv_nrthreads--; return err; } @@ -741,7 +761,6 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr struct svc_xprt *xprt; int port, err; struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct svc_serv *serv; if (sscanf(buf, "%15s %5u", transport, &port) != 2) return -EINVAL; @@ -753,33 +772,30 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr if (err != 0) return err; - serv = nn->nfsd_serv; - err = svc_xprt_create(serv, transport, net, - PF_INET, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_create_xprt(nn->nfsd_serv, transport, net, + PF_INET, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0) goto out_err; - err = svc_xprt_create(serv, transport, net, - PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); + err = svc_create_xprt(nn->nfsd_serv, transport, net, + PF_INET6, port, SVC_SOCK_ANONYMOUS, cred); if (err < 0 && err != -EAFNOSUPPORT) goto out_close; - if (!serv->sv_nrthreads && !xchg(&nn->keep_active, 1)) - svc_get(serv); - - svc_put(serv); + /* Decrease the count, but don't shut down the service */ + nn->nfsd_serv->sv_nrthreads--; return 0; out_close: - xprt = svc_find_xprt(serv, transport, net, PF_INET, port); + xprt = svc_find_xprt(nn->nfsd_serv, transport, net, PF_INET, port); if (xprt != NULL) { - svc_xprt_close(xprt); + svc_close_xprt(xprt); svc_xprt_put(xprt); } out_err: - if (!serv->sv_nrthreads && !nn->keep_active) - nfsd_last_thread(net); - - svc_put(serv); + if (!list_empty(&nn->nfsd_serv->sv_permsocks)) + nn->nfsd_serv->sv_nrthreads--; + else + nfsd_destroy(net); return err; } @@ -1152,7 +1168,6 @@ static struct inode *nfsd_get_inode(struct super_block *sb, umode_t mode) inode->i_fop = &simple_dir_operations; inode->i_op = &simple_dir_inode_operations; inc_nlink(inode); - break; default: break; } @@ -1254,8 +1269,7 @@ static void nfsdfs_remove_files(struct dentry *root) /* XXX: cut'n'paste from simple_fill_super; figure out if we could share * code instead. */ static int nfsdfs_create_files(struct dentry *root, - const struct tree_descr *files, - struct dentry **fdentries) + const struct tree_descr *files) { struct inode *dir = d_inode(root); struct inode *inode; @@ -1264,6 +1278,8 @@ static int nfsdfs_create_files(struct dentry *root, inode_lock(dir); for (i = 0; files->name && files->name[0]; i++, files++) { + if (!files->name) + continue; dentry = d_alloc_name(root, files->name); if (!dentry) goto out; @@ -1277,8 +1293,6 @@ static int nfsdfs_create_files(struct dentry *root, inode->i_private = __get_nfsdfs_client(dir); d_add(dentry, inode); fsnotify_create(dir, dentry); - if (fdentries) - fdentries[i] = dentry; } inode_unlock(dir); return 0; @@ -1290,9 +1304,8 @@ static int nfsdfs_create_files(struct dentry *root, /* on success, returns positive number unique to that client. */ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, - const struct tree_descr *files, - struct dentry **fdentries) + struct nfsdfs_client *ncl, u32 id, + const struct tree_descr *files) { struct dentry *dentry; char name[11]; @@ -1303,7 +1316,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, dentry = nfsd_mkdir(nn->nfsd_client_dir, ncl, name); if (IS_ERR(dentry)) /* XXX: tossing errors? */ return NULL; - ret = nfsdfs_create_files(dentry, files, fdentries); + ret = nfsdfs_create_files(dentry, files); if (ret) { nfsd_client_rmdir(dentry); return NULL; @@ -1339,10 +1352,8 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) static const struct tree_descr nfsd_files[] = { [NFSD_List] = {"exports", &exports_nfsd_operations, S_IRUGO}, - /* Per-export io stats use same ops as exports file */ - [NFSD_Export_Stats] = {"export_stats", &exports_nfsd_operations, S_IRUGO}, [NFSD_Export_features] = {"export_features", - &export_features_fops, S_IRUGO}, + &export_features_operations, S_IRUGO}, [NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_FO_UnlockFS] = {"unlock_filesystem", @@ -1351,16 +1362,13 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc) [NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Pool_Stats] = {"pool_stats", &pool_stats_operations, S_IRUGO}, - [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", - &nfsd_reply_cache_stats_fops, S_IRUGO}, + [NFSD_Reply_Cache_Stats] = {"reply_cache_stats", &reply_cache_stats_operations, S_IRUGO}, [NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR}, [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO}, [NFSD_MaxConnections] = {"max_connections", &transaction_ops, S_IWUSR|S_IRUGO}, - [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO}, #if defined(CONFIG_SUNRPC_GSS) || defined(CONFIG_SUNRPC_GSS_MODULE) - [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", - &supported_enctypes_fops, S_IRUGO}, + [NFSD_SupportedEnctypes] = {"supported_krb5_enctypes", &supported_enctypes_ops, S_IRUGO}, #endif /* CONFIG_SUNRPC_GSS or CONFIG_SUNRPC_GSS_MODULE */ #ifdef CONFIG_NFSD_V4 [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR}, @@ -1460,16 +1468,25 @@ static __net_init int nfsd_init_net(struct net *net) goto out_idmap_error; nn->nfsd_versions = NULL; nn->nfsd4_minorversions = NULL; - nfsd4_init_leases_net(nn); retval = nfsd_reply_cache_init(nn); if (retval) - goto out_cache_error; - get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); - seqlock_init(&nn->writeverf_lock); + goto out_drc_error; + nn->nfsd4_lease = 90; /* default lease time */ + nn->nfsd4_grace = 90; + nn->somebody_reclaimed = false; + nn->track_reclaim_completes = false; + nn->clverifier_counter = prandom_u32(); + nn->clientid_base = prandom_u32(); + nn->clientid_counter = nn->clientid_base + 1; + nn->s2s_cp_cl_id = nn->clientid_counter++; + + atomic_set(&nn->ntf_refcnt, 0); + init_waitqueue_head(&nn->ntf_wq); + seqlock_init(&nn->boot_lock); return 0; -out_cache_error: +out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error: nfsd_export_shutdown(net); @@ -1497,6 +1514,7 @@ static struct pernet_operations nfsd_net_ops = { static int __init init_nfsd(void) { int retval; + printk(KERN_INFO "Installing knfsd (copyright (C) 1996 okir@monad.swb.de).\n"); retval = nfsd4_init_slabs(); if (retval) @@ -1504,9 +1522,7 @@ static int __init init_nfsd(void) retval = nfsd4_init_pnfs(); if (retval) goto out_free_slabs; - retval = nfsd_stat_init(); /* Statistics */ - if (retval) - goto out_free_pnfs; + nfsd_stat_init(); /* Statistics */ retval = nfsd_drc_slab_create(); if (retval) goto out_free_stat; @@ -1514,25 +1530,20 @@ static int __init init_nfsd(void) retval = create_proc_exports_entry(); if (retval) goto out_free_lockd; + retval = register_filesystem(&nfsd_fs_type); + if (retval) + goto out_free_exports; retval = register_pernet_subsys(&nfsd_net_ops); if (retval < 0) - goto out_free_exports; + goto out_free_filesystem; retval = register_cld_notifier(); - if (retval) - goto out_free_subsys; - retval = nfsd4_create_laundry_wq(); - if (retval) - goto out_free_cld; - retval = register_filesystem(&nfsd_fs_type); if (retval) goto out_free_all; return 0; out_free_all: - nfsd4_destroy_laundry_wq(); -out_free_cld: - unregister_cld_notifier(); -out_free_subsys: unregister_pernet_subsys(&nfsd_net_ops); +out_free_filesystem: + unregister_filesystem(&nfsd_fs_type); out_free_exports: remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); @@ -1541,7 +1552,6 @@ static int __init init_nfsd(void) nfsd_drc_slab_free(); out_free_stat: nfsd_stat_shutdown(); -out_free_pnfs: nfsd4_exit_pnfs(); out_free_slabs: nfsd4_free_slabs(); @@ -1550,8 +1560,6 @@ static int __init init_nfsd(void) static void __exit exit_nfsd(void) { - unregister_filesystem(&nfsd_fs_type); - nfsd4_destroy_laundry_wq(); unregister_cld_notifier(); unregister_pernet_subsys(&nfsd_net_ops); nfsd_drc_slab_free(); @@ -1561,6 +1569,7 @@ static void __exit exit_nfsd(void) nfsd_lockd_shutdown(); nfsd4_free_slabs(); nfsd4_exit_pnfs(); + unregister_filesystem(&nfsd_fs_type); } MODULE_AUTHOR("Olaf Kirch "); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 013bfa24ced2..4362d295ed34 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -24,8 +24,8 @@ #include #include "netns.h" -#include "export.h" #include "stats.h" +#include "export.h" #undef ifdebug #ifdef CONFIG_SUNRPC_DEBUG @@ -64,7 +64,8 @@ struct readdir_cd { extern struct svc_program nfsd_program; -extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4; +extern const struct svc_version nfsd_version2, nfsd_version3, + nfsd_version4; extern struct mutex nfsd_mutex; extern spinlock_t nfsd_drc_lock; extern unsigned long nfsd_drc_max_mem; @@ -72,16 +73,6 @@ extern unsigned long nfsd_drc_mem_used; extern const struct seq_operations nfs_exports_op; -/* - * Common void argument and result helpers - */ -struct nfsd_voidargs { }; -struct nfsd_voidres { }; -bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, - struct xdr_stream *xdr); -bool nfssvc_encode_voidres(struct svc_rqst *rqstp, - struct xdr_stream *xdr); - /* * Function prototypes. */ @@ -96,6 +87,8 @@ int nfsd_pool_stats_open(struct inode *, struct file *); int nfsd_pool_stats_release(struct inode *, struct file *); void nfsd_shutdown_threads(struct net *net); +void nfsd_destroy(struct net *net); + bool i_am_nfsd(void); struct nfsdfs_client { @@ -105,9 +98,7 @@ struct nfsdfs_client { struct nfsdfs_client *get_nfsdfs_client(struct inode *); struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, - struct nfsdfs_client *ncl, u32 id, - const struct tree_descr *, - struct dentry **fdentries); + struct nfsdfs_client *ncl, u32 id, const struct tree_descr *); void nfsd_client_rmdir(struct dentry *dentry); @@ -131,7 +122,6 @@ int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change); int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change); void nfsd_reset_versions(struct nfsd_net *nn); int nfsd_create_serv(struct net *net); -void nfsd_last_thread(struct net *net); extern int nfsd_max_blksize; @@ -160,9 +150,6 @@ void nfs4_state_shutdown_net(struct net *net); int nfs4_reset_recoverydir(char *recdir); char * nfs4_recoverydir(void); bool nfsd4_spo_must_allow(struct svc_rqst *rqstp); -int nfsd4_create_laundry_wq(void); -void nfsd4_destroy_laundry_wq(void); -bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, struct inode *inode); #else static inline int nfsd4_init_slabs(void) { return 0; } static inline void nfsd4_free_slabs(void) { } @@ -176,13 +163,6 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp) { return false; } -static inline int nfsd4_create_laundry_wq(void) { return 0; }; -static inline void nfsd4_destroy_laundry_wq(void) {}; -static inline bool nfsd_wait_for_delegreturn(struct svc_rqst *rqstp, - struct inode *inode) -{ - return false; -} #endif /* @@ -344,10 +324,6 @@ void nfsd_lockd_shutdown(void); #define COMPOUND_ERR_SLACK_SPACE 16 /* OP_SETATTR */ #define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */ -#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */ -#define NFSD_CLIENT_MAX_TRIM_PER_RUN 128 -#define NFS4_CLIENTS_PER_GB 1024 -#define NFSD_DELEGRETURN_TIMEOUT (HZ / 34) /* 30ms */ /* * The following attributes are currently not supported by the NFSv4 server: @@ -376,7 +352,7 @@ void nfsd_lockd_shutdown(void); | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP | FATTR4_WORD1_RAWDEV \ | FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE | FATTR4_WORD1_SPACE_TOTAL \ | FATTR4_WORD1_SPACE_USED | FATTR4_WORD1_TIME_ACCESS | FATTR4_WORD1_TIME_ACCESS_SET \ - | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA | FATTR4_WORD1_TIME_CREATE \ + | FATTR4_WORD1_TIME_DELTA | FATTR4_WORD1_TIME_METADATA \ | FATTR4_WORD1_TIME_MODIFY | FATTR4_WORD1_TIME_MODIFY_SET | FATTR4_WORD1_MOUNTED_ON_FILEID) #define NFSD4_SUPPORTED_ATTRS_WORD2 0 @@ -410,6 +386,7 @@ void nfsd_lockd_shutdown(void); #define NFSD4_2_SUPPORTED_ATTRS_WORD2 \ (NFSD4_1_SUPPORTED_ATTRS_WORD2 | \ + FATTR4_WORD2_CHANGE_ATTR_TYPE | \ FATTR4_WORD2_MODE_UMASK | \ NFSD4_2_SECURITY_ATTRS | \ FATTR4_WORD2_XATTR_SUPPORT) @@ -472,8 +449,7 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) (FATTR4_WORD0_SIZE | FATTR4_WORD0_ACL) #define NFSD_WRITEABLE_ATTRS_WORD1 \ (FATTR4_WORD1_MODE | FATTR4_WORD1_OWNER | FATTR4_WORD1_OWNER_GROUP \ - | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_CREATE \ - | FATTR4_WORD1_TIME_MODIFY_SET) + | FATTR4_WORD1_TIME_ACCESS_SET | FATTR4_WORD1_TIME_MODIFY_SET) #ifdef CONFIG_NFSD_V4_SECURITY_LABEL #define MAYBE_FATTR4_WORD2_SECURITY_LABEL \ FATTR4_WORD2_SECURITY_LABEL @@ -499,20 +475,12 @@ static inline bool nfsd_attrs_supported(u32 minorversion, const u32 *bmval) extern int nfsd4_is_junction(struct dentry *dentry); extern int register_cld_notifier(void); extern void unregister_cld_notifier(void); -#ifdef CONFIG_NFSD_V4_2_INTER_SSC -extern void nfsd4_ssc_init_umount_work(struct nfsd_net *nn); -#endif - -extern void nfsd4_init_leases_net(struct nfsd_net *nn); - #else /* CONFIG_NFSD_V4 */ static inline int nfsd4_is_junction(struct dentry *dentry) { return 0; } -static inline void nfsd4_init_leases_net(struct nfsd_net *nn) { }; - #define register_cld_notifier() 0 #define unregister_cld_notifier() do { } while(0) diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index db8d62632a5b..c81dbbad8792 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -153,12 +153,11 @@ static inline __be32 check_pseudo_root(struct svc_rqst *rqstp, static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; - struct fid *fid = NULL; + struct fid *fid = NULL, sfid; struct svc_export *exp; struct dentry *dentry; int fileid_type; int data_left = fh->fh_size/4; - int len; __be32 error; error = nfserr_stale; @@ -167,35 +166,48 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers == 4 && fh->fh_size == 0) return nfserr_nofilehandle; - if (fh->fh_version != 1) - return error; + if (fh->fh_version == 1) { + int len; - if (--data_left < 0) - return error; - if (fh->fh_auth_type != 0) - return error; - len = key_len(fh->fh_fsid_type) / 4; - if (len == 0) - return error; - if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { - /* deprecated, convert to type 3 */ - len = key_len(FSID_ENCODE_DEV)/4; - fh->fh_fsid_type = FSID_ENCODE_DEV; - /* - * struct knfsd_fh uses host-endian fields, which are - * sometimes used to hold net-endian values. This - * confuses sparse, so we must use __force here to - * keep it from complaining. - */ - fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), - ntohl((__force __be32)fh->fh_fsid[1]))); - fh->fh_fsid[1] = fh->fh_fsid[2]; + if (--data_left < 0) + return error; + if (fh->fh_auth_type != 0) + return error; + len = key_len(fh->fh_fsid_type) / 4; + if (len == 0) + return error; + if (fh->fh_fsid_type == FSID_MAJOR_MINOR) { + /* deprecated, convert to type 3 */ + len = key_len(FSID_ENCODE_DEV)/4; + fh->fh_fsid_type = FSID_ENCODE_DEV; + /* + * struct knfsd_fh uses host-endian fields, which are + * sometimes used to hold net-endian values. This + * confuses sparse, so we must use __force here to + * keep it from complaining. + */ + fh->fh_fsid[0] = new_encode_dev(MKDEV(ntohl((__force __be32)fh->fh_fsid[0]), + ntohl((__force __be32)fh->fh_fsid[1]))); + fh->fh_fsid[1] = fh->fh_fsid[2]; + } + data_left -= len; + if (data_left < 0) + return error; + exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); + fid = (struct fid *)(fh->fh_fsid + len); + } else { + __u32 tfh[2]; + dev_t xdev; + ino_t xino; + + if (fh->fh_size != NFS_FHSIZE) + return error; + /* assume old filehandle format */ + xdev = old_decode_dev(fh->ofh_xdev); + xino = u32_to_ino_t(fh->ofh_xino); + mk_fsid(FSID_DEV, tfh, xdev, xino, 0, NULL); + exp = rqst_exp_find(rqstp, FSID_DEV, tfh); } - data_left -= len; - if (data_left < 0) - return error; - exp = rqst_exp_find(rqstp, fh->fh_fsid_type, fh->fh_fsid); - fid = (struct fid *)(fh->fh_fsid + len); error = nfserr_stale; if (IS_ERR(exp)) { @@ -240,25 +252,28 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) if (rqstp->rq_vers > 2) error = nfserr_badhandle; - fileid_type = fh->fh_fileid_type; + if (fh->fh_version != 1) { + sfid.i32.ino = fh->ofh_ino; + sfid.i32.gen = fh->ofh_generation; + sfid.i32.parent_ino = fh->ofh_dirino; + fid = &sfid; + data_left = 3; + if (fh->ofh_dirino == 0) + fileid_type = FILEID_INO32_GEN; + else + fileid_type = FILEID_INO32_GEN_PARENT; + } else + fileid_type = fh->fh_fileid_type; if (fileid_type == FILEID_ROOT) dentry = dget(exp->ex_path.dentry); else { - dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid, - data_left, fileid_type, - nfsd_acceptable, exp); - if (IS_ERR_OR_NULL(dentry)) { + dentry = exportfs_decode_fh(exp->ex_path.mnt, fid, + data_left, fileid_type, + nfsd_acceptable, exp); + if (IS_ERR_OR_NULL(dentry)) trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp, dentry ? PTR_ERR(dentry) : -ESTALE); - switch (PTR_ERR(dentry)) { - case -ENOMEM: - case -ETIMEDOUT: - break; - default: - dentry = ERR_PTR(-ESTALE); - } - } } if (dentry == NULL) goto out; @@ -276,20 +291,6 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_dentry = dentry; fhp->fh_export = exp; - - switch (rqstp->rq_vers) { - case 4: - if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) - fhp->fh_no_atomic_attr = true; - break; - case 3: - if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) - fhp->fh_no_wcc = true; - break; - case 2: - fhp->fh_no_wcc = true; - } - return 0; out: exp_put(exp); @@ -326,7 +327,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) __be32 fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) { - struct svc_export *exp = NULL; + struct svc_export *exp; struct dentry *dentry; __be32 error; @@ -399,7 +400,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) } out: if (error == nfserr_stale) - nfsd_stats_fh_stale_inc(exp); + nfsdstats.fh_stale++; return error; } @@ -428,6 +429,20 @@ static void _fh_update(struct svc_fh *fhp, struct svc_export *exp, } } +/* + * for composing old style file handles + */ +static inline void _fh_update_old(struct dentry *dentry, + struct svc_export *exp, + struct knfsd_fh *fh) +{ + fh->ofh_ino = ino_t_to_u32(d_inode(dentry)->i_ino); + fh->ofh_generation = d_inode(dentry)->i_generation; + if (d_is_dir(dentry) || + (exp->ex_flags & NFSEXP_NOSUBTREECHECK)) + fh->ofh_dirino = 0; +} + static bool is_root_export(struct svc_export *exp) { return exp->ex_path.dentry == exp->ex_path.dentry->d_sb->s_root; @@ -524,6 +539,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, /* ref_fh is a reference file handle. * if it is non-null and for the same filesystem, then we should compose * a filehandle which is of the same version, where possible. + * Currently, that means that if ref_fh->fh_handle.fh_version == 0xca + * Then create a 32byte filehandle using nfs_fhbase_old + * */ struct inode * inode = d_inode(dentry); @@ -541,13 +559,10 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, */ set_version_and_fsid_type(fhp, exp, ref_fh); - /* If we have a ref_fh, then copy the fh_no_wcc setting from it. */ - fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false; - if (ref_fh == fhp) fh_put(ref_fh); - if (fhp->fh_dentry) { + if (fhp->fh_locked || fhp->fh_dentry) { printk(KERN_ERR "fh_compose: fh %pd2 not initialized!\n", dentry); } @@ -559,21 +574,35 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry, fhp->fh_dentry = dget(dentry); /* our internal copy */ fhp->fh_export = exp_get(exp); - fhp->fh_handle.fh_size = - key_len(fhp->fh_handle.fh_fsid_type) + 4; - fhp->fh_handle.fh_auth_type = 0; + if (fhp->fh_handle.fh_version == 0xca) { + /* old style filehandle please */ + memset(&fhp->fh_handle.fh_base, 0, NFS_FHSIZE); + fhp->fh_handle.fh_size = NFS_FHSIZE; + fhp->fh_handle.ofh_dcookie = 0xfeebbaca; + fhp->fh_handle.ofh_dev = old_encode_dev(ex_dev); + fhp->fh_handle.ofh_xdev = fhp->fh_handle.ofh_dev; + fhp->fh_handle.ofh_xino = + ino_t_to_u32(d_inode(exp->ex_path.dentry)->i_ino); + fhp->fh_handle.ofh_dirino = ino_t_to_u32(parent_ino(dentry)); + if (inode) + _fh_update_old(dentry, exp, &fhp->fh_handle); + } else { + fhp->fh_handle.fh_size = + key_len(fhp->fh_handle.fh_fsid_type) + 4; + fhp->fh_handle.fh_auth_type = 0; - mk_fsid(fhp->fh_handle.fh_fsid_type, - fhp->fh_handle.fh_fsid, - ex_dev, - d_inode(exp->ex_path.dentry)->i_ino, - exp->ex_fsid, exp->ex_uuid); + mk_fsid(fhp->fh_handle.fh_fsid_type, + fhp->fh_handle.fh_fsid, + ex_dev, + d_inode(exp->ex_path.dentry)->i_ino, + exp->ex_fsid, exp->ex_uuid); - if (inode) - _fh_update(fhp, exp, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { - fh_put(fhp); - return nfserr_opnotsupp; + if (inode) + _fh_update(fhp, exp, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { + fh_put(fhp); + return nfserr_opnotsupp; + } } return 0; @@ -594,12 +623,16 @@ fh_update(struct svc_fh *fhp) dentry = fhp->fh_dentry; if (d_really_is_negative(dentry)) goto out_negative; - if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) - return 0; + if (fhp->fh_handle.fh_version != 1) { + _fh_update_old(dentry, fhp->fh_export, &fhp->fh_handle); + } else { + if (fhp->fh_handle.fh_fileid_type != FILEID_ROOT) + return 0; - _fh_update(fhp, fhp->fh_export, dentry); - if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) - return nfserr_opnotsupp; + _fh_update(fhp, fhp->fh_export, dentry); + if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) + return nfserr_opnotsupp; + } return 0; out_bad: printk(KERN_ERR "fh_update: fh not verified!\n"); @@ -610,85 +643,6 @@ fh_update(struct svc_fh *fhp) return nfserr_serverfault; } -/** - * fh_fill_pre_attrs - Fill in pre-op attributes - * @fhp: file handle to be updated - * - */ -void fh_fill_pre_attrs(struct svc_fh *fhp) -{ - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - struct inode *inode; - struct kstat stat; - __be32 err; - - if (fhp->fh_no_wcc || fhp->fh_pre_saved) - return; - - inode = d_inode(fhp->fh_dentry); - err = fh_getattr(fhp, &stat); - if (err) { - /* Grab the times from inode anyway */ - stat.mtime = inode->i_mtime; - stat.ctime = inode->i_ctime; - stat.size = inode->i_size; - } - if (v4) - fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode); - - fhp->fh_pre_mtime = stat.mtime; - fhp->fh_pre_ctime = stat.ctime; - fhp->fh_pre_size = stat.size; - fhp->fh_pre_saved = true; -} - -/** - * fh_fill_post_attrs - Fill in post-op attributes - * @fhp: file handle to be updated - * - */ -void fh_fill_post_attrs(struct svc_fh *fhp) -{ - bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE); - struct inode *inode = d_inode(fhp->fh_dentry); - __be32 err; - - if (fhp->fh_no_wcc) - return; - - if (fhp->fh_post_saved) - printk("nfsd: inode locked twice during operation.\n"); - - err = fh_getattr(fhp, &fhp->fh_post_attr); - if (err) { - fhp->fh_post_saved = false; - fhp->fh_post_attr.ctime = inode->i_ctime; - } else - fhp->fh_post_saved = true; - if (v4) - fhp->fh_post_change = - nfsd4_change_attribute(&fhp->fh_post_attr, inode); -} - -/** - * fh_fill_both_attrs - Fill pre-op and post-op attributes - * @fhp: file handle to be updated - * - * This is used when the directory wasn't changed, but wcc attributes - * are needed anyway. - */ -void fh_fill_both_attrs(struct svc_fh *fhp) -{ - fh_fill_post_attrs(fhp); - if (!fhp->fh_post_saved) - return; - fhp->fh_pre_change = fhp->fh_post_change; - fhp->fh_pre_mtime = fhp->fh_post_attr.mtime; - fhp->fh_pre_ctime = fhp->fh_post_attr.ctime; - fhp->fh_pre_size = fhp->fh_post_attr.size; - fhp->fh_pre_saved = true; -} - /* * Release a file handle. */ @@ -698,16 +652,16 @@ fh_put(struct svc_fh *fhp) struct dentry * dentry = fhp->fh_dentry; struct svc_export * exp = fhp->fh_export; if (dentry) { + fh_unlock(fhp); fhp->fh_dentry = NULL; dput(dentry); - fh_clear_pre_post_attrs(fhp); + fh_clear_wcc(fhp); } fh_drop_write(fhp); if (exp) { exp_put(exp); fhp->fh_export = NULL; } - fhp->fh_no_wcc = false; return; } @@ -717,15 +671,20 @@ fh_put(struct svc_fh *fhp) char * SVCFH_fmt(struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; - static char buf[2+1+1+64*3+1]; - if (fh->fh_size < 0 || fh->fh_size> 64) - return "bad-fh"; - sprintf(buf, "%d: %*ph", fh->fh_size, fh->fh_size, fh->fh_raw); + static char buf[80]; + sprintf(buf, "%d: %08x %08x %08x %08x %08x %08x", + fh->fh_size, + fh->fh_base.fh_pad[0], + fh->fh_base.fh_pad[1], + fh->fh_base.fh_pad[2], + fh->fh_base.fh_pad[3], + fh->fh_base.fh_pad[4], + fh->fh_base.fh_pad[5]); return buf; } -enum fsid_source fsid_source(const struct svc_fh *fhp) +enum fsid_source fsid_source(struct svc_fh *fhp) { if (fhp->fh_handle.fh_version != 1) return FSIDSOURCE_DEV; diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 513e028b0bbe..56cfbc361561 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -10,56 +10,8 @@ #include #include +#include #include -#include -#include - -/* - * The file handle starts with a sequence of four-byte words. - * The first word contains a version number (1) and three descriptor bytes - * that tell how the remaining 3 variable length fields should be handled. - * These three bytes are auth_type, fsid_type and fileid_type. - * - * All four-byte values are in host-byte-order. - * - * The auth_type field is deprecated and must be set to 0. - * - * The fsid_type identifies how the filesystem (or export point) is - * encoded. - * Current values: - * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number - * NOTE: we cannot use the kdev_t device id value, because kdev_t.h - * says we mustn't. We must break it up and reassemble. - * 1 - 4 byte user specified identifier - * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED - * 3 - 4 byte device id, encoded for user-space, 4 byte inode number - * 4 - 4 byte inode number and 4 byte uuid - * 5 - 8 byte uuid - * 6 - 16 byte uuid - * 7 - 8 byte inode number and 16 byte uuid - * - * The fileid_type identifies how the file within the filesystem is encoded. - * The values for this field are filesystem specific, exccept that - * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' - * in include/linux/exportfs.h for currently registered values. - */ - -struct knfsd_fh { - unsigned int fh_size; /* - * Points to the current size while - * building a new file handle. - */ - union { - char fh_raw[NFS4_FHSIZE]; - struct { - u8 fh_version; /* == 1 */ - u8 fh_auth_type; /* deprecated */ - u8 fh_fsid_type; - u8 fh_fileid_type; - u32 fh_fsid[]; /* flexible-array member */ - }; - }; -}; static inline __u32 ino_t_to_u32(ino_t ino) { @@ -81,18 +33,14 @@ typedef struct svc_fh { struct dentry * fh_dentry; /* validated dentry */ struct svc_export * fh_export; /* export pointer */ + bool fh_locked; /* inode locked by us */ bool fh_want_write; /* remount protection taken */ - bool fh_no_wcc; /* no wcc data needed */ - bool fh_no_atomic_attr; - /* - * wcc data is not atomic with - * operation - */ int fh_flags; /* FH flags */ +#ifdef CONFIG_NFSD_V3 bool fh_post_saved; /* post-op attrs saved */ bool fh_pre_saved; /* pre-op attrs saved */ - /* Pre-op attributes saved when inode is locked */ + /* Pre-op attributes saved during fh_lock */ __u64 fh_pre_size; /* size before operation */ struct timespec64 fh_pre_mtime; /* mtime before oper */ struct timespec64 fh_pre_ctime; /* ctime before oper */ @@ -102,9 +50,11 @@ typedef struct svc_fh { */ u64 fh_pre_change; - /* Post-op attributes saved in fh_fill_post_attrs() */ + /* Post-op attributes saved in fh_unlock */ struct kstat fh_post_attr; /* full attrs after operation */ u64 fh_post_change; /* nfsv4 change; see above */ +#endif /* CONFIG_NFSD_V3 */ + } svc_fh; #define NFSD4_FH_FOREIGN (1<<0) #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f)) @@ -126,7 +76,7 @@ enum fsid_source { FSIDSOURCE_FSID, FSIDSOURCE_UUID, }; -extern enum fsid_source fsid_source(const struct svc_fh *fhp); +extern enum fsid_source fsid_source(struct svc_fh *fhp); /* @@ -220,19 +170,19 @@ __be32 fh_update(struct svc_fh *); void fh_put(struct svc_fh *); static __inline__ struct svc_fh * -fh_copy(struct svc_fh *dst, const struct svc_fh *src) +fh_copy(struct svc_fh *dst, struct svc_fh *src) { - WARN_ON(src->fh_dentry); - + WARN_ON(src->fh_dentry || src->fh_locked); + *dst = *src; return dst; } static inline void -fh_copy_shallow(struct knfsd_fh *dst, const struct knfsd_fh *src) +fh_copy_shallow(struct knfsd_fh *dst, struct knfsd_fh *src) { dst->fh_size = src->fh_size; - memcpy(&dst->fh_raw, &src->fh_raw, src->fh_size); + memcpy(&dst->fh_base, &src->fh_base, src->fh_size); } static __inline__ struct svc_fh * @@ -243,18 +193,16 @@ fh_init(struct svc_fh *fhp, int maxsize) return fhp; } -static inline bool fh_match(const struct knfsd_fh *fh1, - const struct knfsd_fh *fh2) +static inline bool fh_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) { if (fh1->fh_size != fh2->fh_size) return false; - if (memcmp(fh1->fh_raw, fh2->fh_raw, fh1->fh_size) != 0) + if (memcmp(fh1->fh_base.fh_pad, fh2->fh_base.fh_pad, fh1->fh_size) != 0) return false; return true; } -static inline bool fh_fsid_match(const struct knfsd_fh *fh1, - const struct knfsd_fh *fh2) +static inline bool fh_fsid_match(struct knfsd_fh *fh1, struct knfsd_fh *fh2) { if (fh1->fh_fsid_type != fh2->fh_fsid_type) return false; @@ -271,23 +219,27 @@ static inline bool fh_fsid_match(const struct knfsd_fh *fh1, * returns a crc32 hash for the filehandle that is compatible with * the one displayed by "wireshark". */ -static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) + +static inline u32 +knfsd_fh_hash(struct knfsd_fh *fh) { - return ~crc32_le(0xFFFFFFFF, fh->fh_raw, fh->fh_size); + return ~crc32_le(0xFFFFFFFF, (unsigned char *)&fh->fh_base, fh->fh_size); } #else -static inline u32 knfsd_fh_hash(const struct knfsd_fh *fh) +static inline u32 +knfsd_fh_hash(struct knfsd_fh *fh) { return 0; } #endif -/** - * fh_clear_pre_post_attrs - Reset pre/post attributes - * @fhp: file handle to be updated - * +#ifdef CONFIG_NFSD_V3 +/* + * The wcc data stored in current_fh should be cleared + * between compound ops. */ -static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) +static inline void +fh_clear_wcc(struct svc_fh *fhp) { fhp->fh_post_saved = false; fhp->fh_pre_saved = false; @@ -307,21 +259,68 @@ static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp) static inline u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode) { - if (inode->i_sb->s_export_op->fetch_iversion) - return inode->i_sb->s_export_op->fetch_iversion(inode); - else if (IS_I_VERSION(inode)) { - u64 chattr; + u64 chattr; - chattr = stat->ctime.tv_sec; - chattr <<= 30; - chattr += stat->ctime.tv_nsec; - chattr += inode_query_iversion(inode); - return chattr; - } else - return time_to_chattr(&stat->ctime); + chattr = stat->ctime.tv_sec; + chattr <<= 30; + chattr += stat->ctime.tv_nsec; + chattr += inode_query_iversion(inode); + return chattr; +} + +extern void fill_pre_wcc(struct svc_fh *fhp); +extern void fill_post_wcc(struct svc_fh *fhp); +#else +#define fh_clear_wcc(ignored) +#define fill_pre_wcc(ignored) +#define fill_post_wcc(notused) +#endif /* CONFIG_NFSD_V3 */ + + +/* + * Lock a file handle/inode + * NOTE: both fh_lock and fh_unlock are done "by hand" in + * vfs.c:nfsd_rename as it needs to grab 2 i_mutex's at once + * so, any changes here should be reflected there. + */ + +static inline void +fh_lock_nested(struct svc_fh *fhp, unsigned int subclass) +{ + struct dentry *dentry = fhp->fh_dentry; + struct inode *inode; + + BUG_ON(!dentry); + + if (fhp->fh_locked) { + printk(KERN_WARNING "fh_lock: %pd2 already locked!\n", + dentry); + return; + } + + inode = d_inode(dentry); + inode_lock_nested(inode, subclass); + fill_pre_wcc(fhp); + fhp->fh_locked = true; +} + +static inline void +fh_lock(struct svc_fh *fhp) +{ + fh_lock_nested(fhp, I_MUTEX_NORMAL); +} + +/* + * Unlock a file handle/inode + */ +static inline void +fh_unlock(struct svc_fh *fhp) +{ + if (fhp->fh_locked) { + fill_post_wcc(fhp); + inode_unlock(d_inode(fhp->fh_dentry)); + fhp->fh_locked = false; + } } -extern void fh_fill_pre_attrs(struct svc_fh *fhp); -extern void fh_fill_post_attrs(struct svc_fh *fhp); -extern void fh_fill_both_attrs(struct svc_fh *fhp); #endif /* _LINUX_NFSD_NFSFH_H */ diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 96426dea7d41..bbd01e8397f6 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -51,9 +51,6 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) struct nfsd_sattrargs *argp = rqstp->rq_argp; struct nfsd_attrstat *resp = rqstp->rq_resp; struct iattr *iap = &argp->attrs; - struct nfsd_attrs attrs = { - .na_iattr = iap, - }; struct svc_fh *fhp; dprintk("nfsd: SETATTR %s, valid=%x, size=%ld\n", @@ -103,7 +100,7 @@ nfsd_proc_setattr(struct svc_rqst *rqstp) } } - resp->status = nfsd_setattr(rqstp, fhp, &attrs, 0, (time64_t)0); + resp->status = nfsd_setattr(rqstp, fhp, iap, 0, (time64_t)0); if (resp->status != nfs_ok) goto out; @@ -152,16 +149,14 @@ nfsd_proc_lookup(struct svc_rqst *rqstp) static __be32 nfsd_proc_readlink(struct svc_rqst *rqstp) { - struct nfsd_fhandle *argp = rqstp->rq_argp; + struct nfsd_readlinkargs *argp = rqstp->rq_argp; struct nfsd_readlinkres *resp = rqstp->rq_resp; dprintk("nfsd: READLINK %s\n", SVCFH_fmt(&argp->fh)); /* Read the symlink. */ resp->len = NFS_MAXPATHLEN; - resp->page = *(rqstp->rq_next_page++); - resp->status = nfsd_readlink(rqstp, &argp->fh, - page_address(resp->page), &resp->len); + resp->status = nfsd_readlink(rqstp, &argp->fh, argp->buffer, &resp->len); fh_put(&argp->fh); return rpc_success; @@ -176,42 +171,36 @@ nfsd_proc_read(struct svc_rqst *rqstp) { struct nfsd_readargs *argp = rqstp->rq_argp; struct nfsd_readres *resp = rqstp->rq_resp; - unsigned int len; u32 eof; - int v; dprintk("nfsd: READ %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->offset); - argp->count = min_t(u32, argp->count, NFSSVC_MAXBLKSIZE_V2); - argp->count = min_t(u32, argp->count, rqstp->rq_res.buflen); - - v = 0; - len = argp->count; - resp->pages = rqstp->rq_next_page; - while (len > 0) { - struct page *page = *(rqstp->rq_next_page++); - - rqstp->rq_vec[v].iov_base = page_address(page); - rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); - len -= rqstp->rq_vec[v].iov_len; - v++; - } - /* Obtain buffer pointer for payload. 19 is 1 word for * status, 17 words for fattr, and 1 word for the byte count. */ + + if (NFSSVC_MAXBLKSIZE_V2 < argp->count) { + char buf[RPC_MAX_ADDRBUFLEN]; + printk(KERN_NOTICE + "oversized read request from %s (%d bytes)\n", + svc_print_addr(rqstp, buf, sizeof(buf)), + argp->count); + argp->count = NFSSVC_MAXBLKSIZE_V2; + } svc_reserve_auth(rqstp, (19<<2) + argp->count + 4); resp->count = argp->count; - fh_copy(&resp->fh, &argp->fh); - resp->status = nfsd_read(rqstp, &resp->fh, argp->offset, - rqstp->rq_vec, v, &resp->count, &eof); + resp->status = nfsd_read(rqstp, fh_copy(&resp->fh, &argp->fh), + argp->offset, + rqstp->rq_vec, argp->vlen, + &resp->count, + &eof); if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - set_bit(RQ_DROPME, &rqstp->rq_flags); + return rpc_drop_reply; return rpc_success; } @@ -238,7 +227,12 @@ nfsd_proc_write(struct svc_rqst *rqstp) SVCFH_fmt(&argp->fh), argp->len, argp->offset); - nvecs = svc_fill_write_vector(rqstp, &argp->payload); + nvecs = svc_fill_write_vector(rqstp, rqstp->rq_arg.pages, + &argp->first, cnt); + if (!nvecs) { + resp->status = nfserr_io; + goto out; + } resp->status = nfsd_write(rqstp, fh_copy(&resp->fh, &argp->fh), argp->offset, rqstp->rq_vec, nvecs, @@ -246,7 +240,8 @@ nfsd_proc_write(struct svc_rqst *rqstp) if (resp->status == nfs_ok) resp->status = fh_getattr(&resp->fh, &resp->stat); else if (resp->status == nfserr_jukebox) - set_bit(RQ_DROPME, &rqstp->rq_flags); + return rpc_drop_reply; +out: return rpc_success; } @@ -264,9 +259,6 @@ nfsd_proc_create(struct svc_rqst *rqstp) svc_fh *dirfhp = &argp->fh; svc_fh *newfhp = &resp->fh; struct iattr *attr = &argp->attrs; - struct nfsd_attrs attrs = { - .na_iattr = attr, - }; struct inode *inode; struct dentry *dchild; int type, mode; @@ -292,7 +284,7 @@ nfsd_proc_create(struct svc_rqst *rqstp) goto done; } - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT); + fh_lock_nested(dirfhp, I_MUTEX_PARENT); dchild = lookup_one_len(argp->name, dirfhp->fh_dentry, argp->len); if (IS_ERR(dchild)) { resp->status = nfserrno(PTR_ERR(dchild)); @@ -391,8 +383,9 @@ nfsd_proc_create(struct svc_rqst *rqstp) resp->status = nfs_ok; if (!inode) { /* File doesn't exist. Create it and set attrs */ - resp->status = nfsd_create_locked(rqstp, dirfhp, &attrs, type, - rdev, newfhp); + resp->status = nfsd_create_locked(rqstp, dirfhp, argp->name, + argp->len, attr, type, rdev, + newfhp); } else if (type == S_IFREG) { dprintk("nfsd: existing %s, valid=%x, size=%ld\n", argp->name, attr->ia_valid, (long) attr->ia_size); @@ -402,12 +395,13 @@ nfsd_proc_create(struct svc_rqst *rqstp) */ attr->ia_valid &= ATTR_SIZE; if (attr->ia_valid) - resp->status = nfsd_setattr(rqstp, newfhp, &attrs, 0, + resp->status = nfsd_setattr(rqstp, newfhp, attr, 0, (time64_t)0); } out_unlock: - inode_unlock(dirfhp->fh_dentry->d_inode); + /* We don't really need to unlock, as fh_put does it. */ + fh_unlock(dirfhp); fh_drop_write(dirfhp); done: fh_put(dirfhp); @@ -477,9 +471,6 @@ nfsd_proc_symlink(struct svc_rqst *rqstp) { struct nfsd_symlinkargs *argp = rqstp->rq_argp; struct nfsd_stat *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; struct svc_fh newfh; if (argp->tlen > NFS_MAXPATHLEN) { @@ -501,7 +492,7 @@ nfsd_proc_symlink(struct svc_rqst *rqstp) fh_init(&newfh, NFS_FHSIZE); resp->status = nfsd_symlink(rqstp, &argp->ffh, argp->fname, argp->flen, - argp->tname, &attrs, &newfh); + argp->tname, &newfh); kfree(argp->tname); fh_put(&argp->ffh); @@ -519,9 +510,6 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) { struct nfsd_createargs *argp = rqstp->rq_argp; struct nfsd_diropres *resp = rqstp->rq_resp; - struct nfsd_attrs attrs = { - .na_iattr = &argp->attrs, - }; dprintk("nfsd: MKDIR %s %.*s\n", SVCFH_fmt(&argp->fh), argp->len, argp->name); @@ -533,7 +521,7 @@ nfsd_proc_mkdir(struct svc_rqst *rqstp) argp->attrs.ia_valid &= ~ATTR_SIZE; fh_init(&resp->fh, NFS_FHSIZE); resp->status = nfsd_create(rqstp, &argp->fh, argp->name, argp->len, - &attrs, S_IFDIR, 0, &resp->fh); + &argp->attrs, S_IFDIR, 0, &resp->fh); fh_put(&argp->fh); if (resp->status != nfs_ok) goto out; @@ -560,24 +548,6 @@ nfsd_proc_rmdir(struct svc_rqst *rqstp) return rpc_success; } -static void nfsd_init_dirlist_pages(struct svc_rqst *rqstp, - struct nfsd_readdirres *resp, - u32 count) -{ - struct xdr_buf *buf = &resp->dirlist; - struct xdr_stream *xdr = &resp->xdr; - - memset(buf, 0, sizeof(*buf)); - - /* Reserve room for the NULL ptr & eof flag (-2 words) */ - buf->buflen = clamp(count, (u32)(XDR_UNIT * 2), (u32)PAGE_SIZE); - buf->buflen -= XDR_UNIT * 2; - buf->pages = rqstp->rq_next_page; - rqstp->rq_next_page++; - - xdr_init_encode_pages(xdr, buf, buf->pages, NULL); -} - /* * Read a portion of a directory. */ @@ -586,20 +556,33 @@ nfsd_proc_readdir(struct svc_rqst *rqstp) { struct nfsd_readdirargs *argp = rqstp->rq_argp; struct nfsd_readdirres *resp = rqstp->rq_resp; + int count; loff_t offset; dprintk("nfsd: READDIR %s %d bytes at %d\n", SVCFH_fmt(&argp->fh), argp->count, argp->cookie); - nfsd_init_dirlist_pages(rqstp, resp, argp->count); + /* Shrink to the client read size */ + count = (argp->count >> 2) - 2; + /* Make sure we've room for the NULL ptr & eof flag */ + count -= 2; + if (count < 0) + count = 0; + + resp->buffer = argp->buffer; + resp->offset = NULL; + resp->buflen = count; resp->common.err = nfs_ok; - resp->cookie_offset = 0; + /* Read directory and encode entries on the fly */ offset = argp->cookie; resp->status = nfsd_readdir(rqstp, &argp->fh, &offset, &resp->common, nfssvc_encode_entry); - nfssvc_encode_nfscookie(resp, offset); + + resp->count = resp->buffer - argp->buffer; + if (resp->offset) + *resp->offset = htonl(offset); fh_put(&argp->fh); return rpc_success; @@ -626,6 +609,7 @@ nfsd_proc_statfs(struct svc_rqst *rqstp) * NFSv2 Server procedures. * Only the results of non-idempotent operations are cached. */ +struct nfsd_void { int dummy; }; #define ST 1 /* status */ #define FH 8 /* filehandle */ @@ -634,49 +618,41 @@ nfsd_proc_statfs(struct svc_rqst *rqstp) static const struct svc_procedure nfsd_procedures2[18] = { [NFSPROC_NULL] = { .pc_func = nfsd_proc_null, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfssvc_decode_void, + .pc_encode = nfssvc_encode_void, + .pc_argsize = sizeof(struct nfsd_void), + .pc_ressize = sizeof(struct nfsd_void), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, - .pc_name = "NULL", }, [NFSPROC_GETATTR] = { .pc_func = nfsd_proc_getattr, - .pc_decode = nfssvc_decode_fhandleargs, - .pc_encode = nfssvc_encode_attrstatres, + .pc_decode = nfssvc_decode_fhandle, + .pc_encode = nfssvc_encode_attrstat, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT, - .pc_name = "GETATTR", }, [NFSPROC_SETATTR] = { .pc_func = nfsd_proc_setattr, .pc_decode = nfssvc_decode_sattrargs, - .pc_encode = nfssvc_encode_attrstatres, + .pc_encode = nfssvc_encode_attrstat, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_sattrargs), - .pc_argzero = sizeof(struct nfsd_sattrargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, - .pc_name = "SETATTR", }, [NFSPROC_ROOT] = { .pc_func = nfsd_proc_root, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfssvc_decode_void, + .pc_encode = nfssvc_encode_void, + .pc_argsize = sizeof(struct nfsd_void), + .pc_ressize = sizeof(struct nfsd_void), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, - .pc_name = "ROOT", }, [NFSPROC_LOOKUP] = { .pc_func = nfsd_proc_lookup, @@ -684,22 +660,18 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_diropargs), - .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+FH+AT, - .pc_name = "LOOKUP", }, [NFSPROC_READLINK] = { .pc_func = nfsd_proc_readlink, - .pc_decode = nfssvc_decode_fhandleargs, + .pc_decode = nfssvc_decode_readlinkargs, .pc_encode = nfssvc_encode_readlinkres, - .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), + .pc_argsize = sizeof(struct nfsd_readlinkargs), .pc_ressize = sizeof(struct nfsd_readlinkres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+1+NFS_MAXPATHLEN/4, - .pc_name = "READLINK", }, [NFSPROC_READ] = { .pc_func = nfsd_proc_read, @@ -707,34 +679,28 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_readres, .pc_release = nfssvc_release_readres, .pc_argsize = sizeof(struct nfsd_readargs), - .pc_argzero = sizeof(struct nfsd_readargs), .pc_ressize = sizeof(struct nfsd_readres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+AT+1+NFSSVC_MAXBLKSIZE_V2/4, - .pc_name = "READ", }, [NFSPROC_WRITECACHE] = { .pc_func = nfsd_proc_writecache, - .pc_decode = nfssvc_decode_voidarg, - .pc_encode = nfssvc_encode_voidres, - .pc_argsize = sizeof(struct nfsd_voidargs), - .pc_argzero = sizeof(struct nfsd_voidargs), - .pc_ressize = sizeof(struct nfsd_voidres), + .pc_decode = nfssvc_decode_void, + .pc_encode = nfssvc_encode_void, + .pc_argsize = sizeof(struct nfsd_void), + .pc_ressize = sizeof(struct nfsd_void), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = 0, - .pc_name = "WRITECACHE", }, [NFSPROC_WRITE] = { .pc_func = nfsd_proc_write, .pc_decode = nfssvc_decode_writeargs, - .pc_encode = nfssvc_encode_attrstatres, + .pc_encode = nfssvc_encode_attrstat, .pc_release = nfssvc_release_attrstat, .pc_argsize = sizeof(struct nfsd_writeargs), - .pc_argzero = sizeof(struct nfsd_writeargs), .pc_ressize = sizeof(struct nfsd_attrstat), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+AT, - .pc_name = "WRITE", }, [NFSPROC_CREATE] = { .pc_func = nfsd_proc_create, @@ -742,55 +708,45 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), - .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, - .pc_name = "CREATE", }, [NFSPROC_REMOVE] = { .pc_func = nfsd_proc_remove, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_statres, + .pc_encode = nfssvc_encode_stat, .pc_argsize = sizeof(struct nfsd_diropargs), - .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, - .pc_name = "REMOVE", }, [NFSPROC_RENAME] = { .pc_func = nfsd_proc_rename, .pc_decode = nfssvc_decode_renameargs, - .pc_encode = nfssvc_encode_statres, + .pc_encode = nfssvc_encode_stat, .pc_argsize = sizeof(struct nfsd_renameargs), - .pc_argzero = sizeof(struct nfsd_renameargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, - .pc_name = "RENAME", }, [NFSPROC_LINK] = { .pc_func = nfsd_proc_link, .pc_decode = nfssvc_decode_linkargs, - .pc_encode = nfssvc_encode_statres, + .pc_encode = nfssvc_encode_stat, .pc_argsize = sizeof(struct nfsd_linkargs), - .pc_argzero = sizeof(struct nfsd_linkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, - .pc_name = "LINK", }, [NFSPROC_SYMLINK] = { .pc_func = nfsd_proc_symlink, .pc_decode = nfssvc_decode_symlinkargs, - .pc_encode = nfssvc_encode_statres, + .pc_encode = nfssvc_encode_stat, .pc_argsize = sizeof(struct nfsd_symlinkargs), - .pc_argzero = sizeof(struct nfsd_symlinkargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, - .pc_name = "SYMLINK", }, [NFSPROC_MKDIR] = { .pc_func = nfsd_proc_mkdir, @@ -798,43 +754,35 @@ static const struct svc_procedure nfsd_procedures2[18] = { .pc_encode = nfssvc_encode_diropres, .pc_release = nfssvc_release_diropres, .pc_argsize = sizeof(struct nfsd_createargs), - .pc_argzero = sizeof(struct nfsd_createargs), .pc_ressize = sizeof(struct nfsd_diropres), .pc_cachetype = RC_REPLBUFF, .pc_xdrressize = ST+FH+AT, - .pc_name = "MKDIR", }, [NFSPROC_RMDIR] = { .pc_func = nfsd_proc_rmdir, .pc_decode = nfssvc_decode_diropargs, - .pc_encode = nfssvc_encode_statres, + .pc_encode = nfssvc_encode_stat, .pc_argsize = sizeof(struct nfsd_diropargs), - .pc_argzero = sizeof(struct nfsd_diropargs), .pc_ressize = sizeof(struct nfsd_stat), .pc_cachetype = RC_REPLSTAT, .pc_xdrressize = ST, - .pc_name = "RMDIR", }, [NFSPROC_READDIR] = { .pc_func = nfsd_proc_readdir, .pc_decode = nfssvc_decode_readdirargs, .pc_encode = nfssvc_encode_readdirres, .pc_argsize = sizeof(struct nfsd_readdirargs), - .pc_argzero = sizeof(struct nfsd_readdirargs), .pc_ressize = sizeof(struct nfsd_readdirres), .pc_cachetype = RC_NOCACHE, - .pc_name = "READDIR", }, [NFSPROC_STATFS] = { .pc_func = nfsd_proc_statfs, - .pc_decode = nfssvc_decode_fhandleargs, + .pc_decode = nfssvc_decode_fhandle, .pc_encode = nfssvc_encode_statfsres, .pc_argsize = sizeof(struct nfsd_fhandle), - .pc_argzero = sizeof(struct nfsd_fhandle), .pc_ressize = sizeof(struct nfsd_statfsres), .pc_cachetype = RC_NOCACHE, .pc_xdrressize = ST+5, - .pc_name = "STATFS", }, }; @@ -848,3 +796,61 @@ const struct svc_version nfsd_version2 = { .vs_dispatch = nfsd_dispatch, .vs_xdrsize = NFS2_SVC_XDRSIZE, }; + +/* + * Map errnos to NFS errnos. + */ +__be32 +nfserrno (int errno) +{ + static struct { + __be32 nfserr; + int syserr; + } nfs_errtbl[] = { + { nfs_ok, 0 }, + { nfserr_perm, -EPERM }, + { nfserr_noent, -ENOENT }, + { nfserr_io, -EIO }, + { nfserr_nxio, -ENXIO }, + { nfserr_fbig, -E2BIG }, + { nfserr_acces, -EACCES }, + { nfserr_exist, -EEXIST }, + { nfserr_xdev, -EXDEV }, + { nfserr_mlink, -EMLINK }, + { nfserr_nodev, -ENODEV }, + { nfserr_notdir, -ENOTDIR }, + { nfserr_isdir, -EISDIR }, + { nfserr_inval, -EINVAL }, + { nfserr_fbig, -EFBIG }, + { nfserr_nospc, -ENOSPC }, + { nfserr_rofs, -EROFS }, + { nfserr_mlink, -EMLINK }, + { nfserr_nametoolong, -ENAMETOOLONG }, + { nfserr_notempty, -ENOTEMPTY }, +#ifdef EDQUOT + { nfserr_dquot, -EDQUOT }, +#endif + { nfserr_stale, -ESTALE }, + { nfserr_jukebox, -ETIMEDOUT }, + { nfserr_jukebox, -ERESTARTSYS }, + { nfserr_jukebox, -EAGAIN }, + { nfserr_jukebox, -EWOULDBLOCK }, + { nfserr_jukebox, -ENOMEM }, + { nfserr_io, -ETXTBSY }, + { nfserr_notsupp, -EOPNOTSUPP }, + { nfserr_toosmall, -ETOOSMALL }, + { nfserr_serverfault, -ESERVERFAULT }, + { nfserr_serverfault, -ENFILE }, + { nfserr_io, -EUCLEAN }, + { nfserr_perm, -ENOKEY }, + }; + int i; + + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { + if (nfs_errtbl[i].syserr == errno) + return nfs_errtbl[i].nfserr; + } + WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); + return nfserr_io; +} + diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 3d4fd40c987b..2e61a565cdbd 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -12,7 +12,6 @@ #include #include #include -#include #include #include @@ -30,10 +29,14 @@ #include "netns.h" #include "filecache.h" -#include "trace.h" - #define NFSDDBG_FACILITY NFSDDBG_SVC +bool inter_copy_offload_enable; +EXPORT_SYMBOL_GPL(inter_copy_offload_enable); +module_param(inter_copy_offload_enable, bool, 0644); +MODULE_PARM_DESC(inter_copy_offload_enable, + "Enable inter server to server copy offload. Default: false"); + extern struct svc_program nfsd_program; static int nfsd(void *vrqstp); #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) @@ -56,17 +59,18 @@ static __be32 nfsd_init_request(struct svc_rqst *, struct svc_process_info *); /* - * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and some members - * of the svc_serv struct such as ->sv_temp_socks and ->sv_permsocks. + * nfsd_mutex protects nn->nfsd_serv -- both the pointer itself and the members + * of the svc_serv struct. In particular, ->sv_nrthreads but also to some + * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt * * If (out side the lock) nn->nfsd_serv is non-NULL, then it must point to a - * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0 (unless - * nn->keep_active is set). That number of nfsd threads must - * exist and each must be listed in ->sp_all_threads in some entry of - * ->sv_pools[]. + * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number + * of nfsd threads must exist and each must listed in ->sp_all_threads in each + * entry of ->sv_pools[]. * - * Each active thread holds a counted reference on nn->nfsd_serv, as does - * the nn->keep_active flag and various transient calls to svc_get(). + * Transitions of the thread count between zero and non-zero are of particular + * interest since the svc_serv needs to be created and initialized at that + * point, or freed. * * Finally, the nfsd_mutex also protects some of the global variables that are * accessed when nfsd starts and that are settable via the write_* routines in @@ -84,19 +88,15 @@ DEFINE_MUTEX(nfsd_mutex); * version 4.1 DRC caches. * nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage. */ -DEFINE_SPINLOCK(nfsd_drc_lock); +spinlock_t nfsd_drc_lock; unsigned long nfsd_drc_max_mem; unsigned long nfsd_drc_mem_used; #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static struct svc_stat nfsd_acl_svcstats; static const struct svc_version *nfsd_acl_version[] = { -# if defined(CONFIG_NFSD_V2_ACL) [2] = &nfsd_acl_version2, -# endif -# if defined(CONFIG_NFSD_V3_ACL) [3] = &nfsd_acl_version3, -# endif }; #define NFSD_ACL_MINVERS 2 @@ -120,10 +120,10 @@ static struct svc_stat nfsd_acl_svcstats = { #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ static const struct svc_version *nfsd_version[] = { -#if defined(CONFIG_NFSD_V2) [2] = &nfsd_version2, -#endif +#if defined(CONFIG_NFSD_V3) [3] = &nfsd_version3, +#endif #if defined(CONFIG_NFSD_V4) [4] = &nfsd_version4, #endif @@ -297,13 +297,13 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred) if (!list_empty(&nn->nfsd_serv->sv_permsocks)) return 0; - error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_create_xprt(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error; - error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, - SVC_SOCK_DEFAULTS, cred); + error = svc_create_xprt(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT, + SVC_SOCK_DEFAULTS, cred); if (error < 0) return error; @@ -312,7 +312,7 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred) static int nfsd_users = 0; -static int nfsd_startup_generic(void) +static int nfsd_startup_generic(int nrservs) { int ret; @@ -349,60 +349,36 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn) return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST); } -/** - * nfsd_copy_write_verifier - Atomically copy a write verifier - * @verf: buffer in which to receive the verifier cookie - * @nn: NFS net namespace - * - * This function provides a wait-free mechanism for copying the - * namespace's write verifier without tearing it. - */ -void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn) +void nfsd_copy_boot_verifier(__be32 verf[2], struct nfsd_net *nn) { int seq = 0; do { - read_seqbegin_or_lock(&nn->writeverf_lock, &seq); - memcpy(verf, nn->writeverf, sizeof(nn->writeverf)); - } while (need_seqretry(&nn->writeverf_lock, seq)); - done_seqretry(&nn->writeverf_lock, seq); + read_seqbegin_or_lock(&nn->boot_lock, &seq); + /* + * This is opaque to client, so no need to byte-swap. Use + * __force to keep sparse happy. y2038 time_t overflow is + * irrelevant in this usage + */ + verf[0] = (__force __be32)nn->nfssvc_boot.tv_sec; + verf[1] = (__force __be32)nn->nfssvc_boot.tv_nsec; + } while (need_seqretry(&nn->boot_lock, seq)); + done_seqretry(&nn->boot_lock, seq); } -static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn) +static void nfsd_reset_boot_verifier_locked(struct nfsd_net *nn) { - struct timespec64 now; - u64 verf; - - /* - * Because the time value is hashed, y2038 time_t overflow - * is irrelevant in this usage. - */ - ktime_get_raw_ts64(&now); - verf = siphash_2u64(now.tv_sec, now.tv_nsec, &nn->siphash_key); - memcpy(nn->writeverf, &verf, sizeof(nn->writeverf)); + ktime_get_real_ts64(&nn->nfssvc_boot); } -/** - * nfsd_reset_write_verifier - Generate a new write verifier - * @nn: NFS net namespace - * - * This function updates the ->writeverf field of @nn. This field - * contains an opaque cookie that, according to Section 18.32.3 of - * RFC 8881, "the client can use to determine whether a server has - * changed instance state (e.g., server restart) between a call to - * WRITE and a subsequent call to either WRITE or COMMIT. This - * cookie MUST be unchanged during a single instance of the NFSv4.1 - * server and MUST be unique between instances of the NFSv4.1 - * server." - */ -void nfsd_reset_write_verifier(struct nfsd_net *nn) +void nfsd_reset_boot_verifier(struct nfsd_net *nn) { - write_seqlock(&nn->writeverf_lock); - nfsd_reset_write_verifier_locked(nn); - write_sequnlock(&nn->writeverf_lock); + write_seqlock(&nn->boot_lock); + nfsd_reset_boot_verifier_locked(nn); + write_sequnlock(&nn->boot_lock); } -static int nfsd_startup_net(struct net *net, const struct cred *cred) +static int nfsd_startup_net(int nrservs, struct net *net, const struct cred *cred) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); int ret; @@ -410,7 +386,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred) if (nn->nfsd_net_up) return 0; - ret = nfsd_startup_generic(); + ret = nfsd_startup_generic(nrservs); if (ret) return ret; ret = nfsd_init_socks(net, cred); @@ -431,9 +407,6 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred) if (ret) goto out_filecache; -#ifdef CONFIG_NFSD_V4_2_INTER_SSC - nfsd4_ssc_init_umount_work(nn); -#endif nn->nfsd_net_up = true; return 0; @@ -463,7 +436,6 @@ static void nfsd_shutdown_net(struct net *net) nfsd_shutdown_generic(); } -static DEFINE_SPINLOCK(nfsd_notifier_lock); static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, void *ptr) { @@ -473,17 +445,18 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in sin; - if (event != NETDEV_DOWN || !nn->nfsd_serv) + if ((event != NETDEV_DOWN) || + !atomic_inc_not_zero(&nn->ntf_refcnt)) goto out; - spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local); sin.sin_family = AF_INET; sin.sin_addr.s_addr = ifa->ifa_local; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin); } - spin_unlock(&nfsd_notifier_lock); + atomic_dec(&nn->ntf_refcnt); + wake_up(&nn->ntf_wq); out: return NOTIFY_DONE; @@ -503,10 +476,10 @@ static int nfsd_inet6addr_event(struct notifier_block *this, struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct sockaddr_in6 sin6; - if (event != NETDEV_DOWN || !nn->nfsd_serv) + if ((event != NETDEV_DOWN) || + !atomic_inc_not_zero(&nn->ntf_refcnt)) goto out; - spin_lock(&nfsd_notifier_lock); if (nn->nfsd_serv) { dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr); sin6.sin6_family = AF_INET6; @@ -515,8 +488,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this, sin6.sin6_scope_id = ifa->idev->dev->ifindex; svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6); } - spin_unlock(&nfsd_notifier_lock); - + atomic_dec(&nn->ntf_refcnt); + wake_up(&nn->ntf_wq); out: return NOTIFY_DONE; } @@ -529,15 +502,11 @@ static struct notifier_block nfsd_inet6addr_notifier = { /* Only used under nfsd_mutex, so this atomic may be overkill: */ static atomic_t nfsd_notifier_refcount = ATOMIC_INIT(0); -void nfsd_last_thread(struct net *net) +static void nfsd_last_thread(struct svc_serv *serv, struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct svc_serv *serv = nn->nfsd_serv; - - spin_lock(&nfsd_notifier_lock); - nn->nfsd_serv = NULL; - spin_unlock(&nfsd_notifier_lock); + atomic_dec(&nn->ntf_refcnt); /* check if the notifier still has clients */ if (atomic_dec_return(&nfsd_notifier_refcount) == 0) { unregister_inetaddr_notifier(&nfsd_inetaddr_notifier); @@ -545,8 +514,7 @@ void nfsd_last_thread(struct net *net) unregister_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - - svc_xprt_destroy_all(serv, net); + wait_event(nn->ntf_wq, atomic_read(&nn->ntf_refcnt) == 0); /* * write_ports can create the server without actually starting @@ -599,6 +567,7 @@ static void set_max_drc(void) nfsd_drc_max_mem = (nr_free_buffer_pages() >> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE; nfsd_drc_mem_used = 0; + spin_lock_init(&nfsd_drc_lock); dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem); } @@ -623,6 +592,24 @@ static int nfsd_get_default_max_blksize(void) return ret; } +static const struct svc_serv_ops nfsd_thread_sv_ops = { + .svo_shutdown = nfsd_last_thread, + .svo_function = nfsd, + .svo_enqueue_xprt = svc_xprt_do_enqueue, + .svo_setup = svc_set_num_threads, + .svo_module = THIS_MODULE, +}; + +static void nfsd_complete_shutdown(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + WARN_ON(!mutex_is_locked(&nfsd_mutex)); + + nn->nfsd_serv = NULL; + complete(&nn->nfsd_shutdown_complete); +} + void nfsd_shutdown_threads(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); @@ -637,10 +624,11 @@ void nfsd_shutdown_threads(struct net *net) svc_get(serv); /* Kill outstanding nfsd threads */ - svc_set_num_threads(serv, NULL, 0); - nfsd_last_thread(net); - svc_put(serv); + serv->sv_ops->svo_setup(serv, NULL, 0); + nfsd_destroy(net); mutex_unlock(&nfsd_mutex); + /* Wait for shutdown of nfsd_serv to complete */ + wait_for_completion(&nn->nfsd_shutdown_complete); } bool i_am_nfsd(void) @@ -652,7 +640,6 @@ int nfsd_create_serv(struct net *net) { int error; struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct svc_serv *serv; WARN_ON(!mutex_is_locked(&nfsd_mutex)); if (nn->nfsd_serv) { @@ -662,19 +649,19 @@ int nfsd_create_serv(struct net *net) if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd); - if (serv == NULL) + nn->nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, + &nfsd_thread_sv_ops); + if (nn->nfsd_serv == NULL) return -ENOMEM; + init_completion(&nn->nfsd_shutdown_complete); - serv->sv_maxconn = nn->max_connections; - error = svc_bind(serv, net); + nn->nfsd_serv->sv_maxconn = nn->max_connections; + error = svc_bind(nn->nfsd_serv, net); if (error < 0) { - svc_put(serv); + svc_destroy(nn->nfsd_serv); + nfsd_complete_shutdown(net); return error; } - spin_lock(&nfsd_notifier_lock); - nn->nfsd_serv = serv; - spin_unlock(&nfsd_notifier_lock); set_max_drc(); /* check if the notifier is already set */ @@ -684,7 +671,8 @@ int nfsd_create_serv(struct net *net) register_inet6addr_notifier(&nfsd_inet6addr_notifier); #endif } - nfsd_reset_write_verifier(nn); + atomic_inc(&nn->ntf_refcnt); + nfsd_reset_boot_verifier(nn); return 0; } @@ -711,6 +699,18 @@ int nfsd_get_nrthreads(int n, int *nthreads, struct net *net) return 0; } +void nfsd_destroy(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + int destroy = (nn->nfsd_serv->sv_nrthreads == 1); + + if (destroy) + svc_shutdown_net(nn->nfsd_serv, net); + svc_destroy(nn->nfsd_serv); + if (destroy) + nfsd_complete_shutdown(net); +} + int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) { int i = 0; @@ -735,7 +735,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) if (tot > NFSD_MAXSERVS) { /* total too large: scale down requested numbers */ for (i = 0; i < n && tot > 0; i++) { - int new = nthreads[i] * NFSD_MAXSERVS / tot; + int new = nthreads[i] * NFSD_MAXSERVS / tot; tot -= (nthreads[i] - new); nthreads[i] = new; } @@ -755,13 +755,12 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net) /* apply the new numbers */ svc_get(nn->nfsd_serv); for (i = 0; i < n; i++) { - err = svc_set_num_threads(nn->nfsd_serv, - &nn->nfsd_serv->sv_pools[i], - nthreads[i]); + err = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, + &nn->nfsd_serv->sv_pools[i], nthreads[i]); if (err) break; } - svc_put(nn->nfsd_serv); + nfsd_destroy(net); return err; } @@ -776,7 +775,6 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) int error; bool nfsd_up_before; struct nfsd_net *nn = net_generic(net, nfsd_net_id); - struct svc_serv *serv; mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); @@ -788,7 +786,7 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) if (nrservs == 0 && nn->nfsd_serv == NULL) goto out; - strscpy(nn->nfsd_name, utsname()->nodename, + strlcpy(nn->nfsd_name, utsname()->nodename, sizeof(nn->nfsd_name)); error = nfsd_create_serv(net); @@ -796,25 +794,24 @@ nfsd_svc(int nrservs, struct net *net, const struct cred *cred) goto out; nfsd_up_before = nn->nfsd_net_up; - serv = nn->nfsd_serv; - error = nfsd_startup_net(net, cred); + error = nfsd_startup_net(nrservs, net, cred); if (error) - goto out_put; - error = svc_set_num_threads(serv, NULL, nrservs); + goto out_destroy; + error = nn->nfsd_serv->sv_ops->svo_setup(nn->nfsd_serv, + NULL, nrservs); if (error) goto out_shutdown; - error = serv->sv_nrthreads; - if (error == 0) - nfsd_last_thread(net); + /* We are holding a reference to nn->nfsd_serv which + * we don't want to count in the return value, + * so subtract 1 + */ + error = nn->nfsd_serv->sv_nrthreads - 1; out_shutdown: if (error < 0 && !nfsd_up_before) nfsd_shutdown_net(net); -out_put: - /* Threads now hold service active */ - if (xchg(&nn->keep_active, 0)) - svc_put(serv); - svc_put(serv); +out_destroy: + nfsd_destroy(net); /* Release server */ out: mutex_unlock(&nfsd_mutex); return error; @@ -928,6 +925,9 @@ nfsd(void *vrqstp) struct nfsd_net *nn = net_generic(net, nfsd_net_id); int err; + /* Lock module and set up kernel thread */ + mutex_lock(&nfsd_mutex); + /* At this point, the thread shares current->fs * with the init process. We need to create files with the * umask as defined by the client instead of init's umask. */ @@ -938,7 +938,17 @@ nfsd(void *vrqstp) current->fs->umask = 0; - atomic_inc(&nfsdstats.th_cnt); + /* + * thread is spawned with all signals set to SIG_IGN, re-enable + * the ones that will bring down the thread + */ + allow_signal(SIGKILL); + allow_signal(SIGHUP); + allow_signal(SIGINT); + allow_signal(SIGQUIT); + + nfsdstats.th_cnt++; + mutex_unlock(&nfsd_mutex); set_freezable(); @@ -962,14 +972,57 @@ nfsd(void *vrqstp) validate_process_creds(); } - atomic_dec(&nfsdstats.th_cnt); + /* Clear signals before calling svc_exit_thread() */ + flush_signals(current); + + mutex_lock(&nfsd_mutex); + nfsdstats.th_cnt --; out: + rqstp->rq_server = NULL; + /* Release the thread */ svc_exit_thread(rqstp); + + nfsd_destroy(net); + + /* Release module */ + mutex_unlock(&nfsd_mutex); + module_put_and_exit(0); return 0; } +/* + * A write procedure can have a large argument, and a read procedure can + * have a large reply, but no NFSv2 or NFSv3 procedure has argument and + * reply that can both be larger than a page. The xdr code has taken + * advantage of this assumption to be a sloppy about bounds checking in + * some cases. Pending a rewrite of the NFSv2/v3 xdr code to fix that + * problem, we enforce these assumptions here: + */ +static bool nfs_request_too_big(struct svc_rqst *rqstp, + const struct svc_procedure *proc) +{ + /* + * The ACL code has more careful bounds-checking and is not + * susceptible to this problem: + */ + if (rqstp->rq_prog != NFS_PROGRAM) + return false; + /* + * Ditto NFSv4 (which can in theory have argument and reply both + * more than a page): + */ + if (rqstp->rq_vers >= 4) + return false; + /* The reply will be small, we're OK: */ + if (proc->pc_xdrressize > 0 && + proc->pc_xdrressize < XDR_QUADLEN(PAGE_SIZE)) + return false; + + return rqstp->rq_arg.len > PAGE_SIZE; +} + /** * nfsd_dispatch - Process an NFS or NFSACL Request * @rqstp: incoming request @@ -984,15 +1037,22 @@ nfsd(void *vrqstp) int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) { const struct svc_procedure *proc = rqstp->rq_procinfo; + struct kvec *argv = &rqstp->rq_arg.head[0]; + struct kvec *resv = &rqstp->rq_res.head[0]; + __be32 *p; + + dprintk("nfsd_dispatch: vers %d proc %d\n", + rqstp->rq_vers, rqstp->rq_proc); + + if (nfs_request_too_big(rqstp, proc)) + goto out_too_large; /* * Give the xdr decoder a chance to change this if it wants * (necessary in the NFSv4.0 compound case) */ rqstp->rq_cachetype = proc->pc_cachetype; - - svcxdr_init_decode(rqstp); - if (!proc->pc_decode(rqstp, &rqstp->rq_arg_stream)) + if (!proc->pc_decode(rqstp, argv->iov_base)) goto out_decode_err; switch (nfsd_cache_lookup(rqstp)) { @@ -1008,64 +1068,43 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) * Need to grab the location to store the status, as * NFSv4 does some encoding while processing */ - svcxdr_init_encode(rqstp); + p = resv->iov_base + resv->iov_len; + resv->iov_len += sizeof(__be32); *statp = proc->pc_func(rqstp); - if (test_bit(RQ_DROPME, &rqstp->rq_flags)) + if (*statp == rpc_drop_reply || test_bit(RQ_DROPME, &rqstp->rq_flags)) goto out_update_drop; - if (!proc->pc_encode(rqstp, &rqstp->rq_res_stream)) + if (!proc->pc_encode(rqstp, p)) goto out_encode_err; nfsd_cache_update(rqstp, rqstp->rq_cachetype, statp + 1); out_cached_reply: return 1; +out_too_large: + dprintk("nfsd: NFSv%d argument too large\n", rqstp->rq_vers); + *statp = rpc_garbage_args; + return 1; + out_decode_err: - trace_nfsd_garbage_args_err(rqstp); + dprintk("nfsd: failed to decode arguments!\n"); *statp = rpc_garbage_args; return 1; out_update_drop: + dprintk("nfsd: Dropping request; may be revisited later\n"); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); out_dropit: return 0; out_encode_err: - trace_nfsd_cant_encode_err(rqstp); + dprintk("nfsd: failed to encode result!\n"); nfsd_cache_update(rqstp, RC_NOCACHE, NULL); *statp = rpc_system_err; return 1; } -/** - * nfssvc_decode_voidarg - Decode void arguments - * @rqstp: Server RPC transaction context - * @xdr: XDR stream positioned at arguments to decode - * - * Return values: - * %false: Arguments were not valid - * %true: Decoding was successful - */ -bool nfssvc_decode_voidarg(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - return true; -} - -/** - * nfssvc_encode_voidres - Encode void results - * @rqstp: Server RPC transaction context - * @xdr: XDR stream into which to encode results - * - * Return values: - * %false: Local error while encoding - * %true: Encoding was successful - */ -bool nfssvc_encode_voidres(struct svc_rqst *rqstp, struct xdr_stream *xdr) -{ - return true; -} - int nfsd_pool_stats_open(struct inode *inode, struct file *file) { int ret; @@ -1076,6 +1115,7 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file) mutex_unlock(&nfsd_mutex); return -ENODEV; } + /* bump up the psudo refcount while traversing */ svc_get(nn->nfsd_serv); ret = svc_pool_stats_open(nn->nfsd_serv, file); mutex_unlock(&nfsd_mutex); @@ -1084,12 +1124,12 @@ int nfsd_pool_stats_open(struct inode *inode, struct file *file) int nfsd_pool_stats_release(struct inode *inode, struct file *file) { - struct seq_file *seq = file->private_data; - struct svc_serv *serv = seq->private; int ret = seq_release(inode, file); + struct net *net = inode->i_sb->s_fs_info; mutex_lock(&nfsd_mutex); - svc_put(serv); + /* this function really, really should have been called svc_put() */ + nfsd_destroy(net); mutex_unlock(&nfsd_mutex); return ret; } diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c index caf6355b18fa..8a288c8fcd57 100644 --- a/fs/nfsd/nfsxdr.c +++ b/fs/nfsd/nfsxdr.c @@ -9,10 +9,12 @@ #include "xdr.h" #include "auth.h" +#define NFSDDBG_FACILITY NFSDDBG_XDR + /* * Mapping of S_IF* types to NFS file types */ -static const u32 nfs_ftypes[] = { +static u32 nfs_ftypes[] = { NFNON, NFCHR, NFCHR, NFBAD, NFDIR, NFBAD, NFBLK, NFBAD, NFREG, NFBAD, NFLNK, NFBAD, @@ -21,168 +23,93 @@ static const u32 nfs_ftypes[] = { /* - * Basic NFSv2 data types (RFC 1094 Section 2.3) + * XDR functions for basic NFS types */ - -/** - * svcxdr_encode_stat - Encode an NFSv2 status code - * @xdr: XDR stream - * @status: status value to encode - * - * Return values: - * %false: Send buffer space was exhausted - * %true: Success - */ -bool -svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status) +static __be32 * +decode_fh(__be32 *p, struct svc_fh *fhp) { - __be32 *p; - - p = xdr_reserve_space(xdr, sizeof(status)); - if (!p) - return false; - *p = status; - - return true; -} - -/** - * svcxdr_decode_fhandle - Decode an NFSv2 file handle - * @xdr: XDR stream positioned at an encoded NFSv2 FH - * @fhp: OUT: filled-in server file handle - * - * Return values: - * %false: The encoded file handle was not valid - * %true: @fhp has been initialized - */ -bool -svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp) -{ - __be32 *p; - - p = xdr_inline_decode(xdr, NFS_FHSIZE); - if (!p) - return false; fh_init(fhp, NFS_FHSIZE); - memcpy(&fhp->fh_handle.fh_raw, p, NFS_FHSIZE); + memcpy(&fhp->fh_handle.fh_base, p, NFS_FHSIZE); fhp->fh_handle.fh_size = NFS_FHSIZE; - return true; + /* FIXME: Look up export pointer here and verify + * Sun Secure RPC if requested */ + return p + (NFS_FHSIZE >> 2); } -static bool -svcxdr_encode_fhandle(struct xdr_stream *xdr, const struct svc_fh *fhp) +/* Helper function for NFSv2 ACL code */ +__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp) { - __be32 *p; - - p = xdr_reserve_space(xdr, NFS_FHSIZE); - if (!p) - return false; - memcpy(p, &fhp->fh_handle.fh_raw, NFS_FHSIZE); - - return true; + return decode_fh(p, fhp); } static __be32 * -encode_timeval(__be32 *p, const struct timespec64 *time) +encode_fh(__be32 *p, struct svc_fh *fhp) { - *p++ = cpu_to_be32((u32)time->tv_sec); - if (time->tv_nsec) - *p++ = cpu_to_be32(time->tv_nsec / NSEC_PER_USEC); - else - *p++ = xdr_zero; + memcpy(p, &fhp->fh_handle.fh_base, NFS_FHSIZE); + return p + (NFS_FHSIZE>> 2); +} + +/* + * Decode a file name and make sure that the path contains + * no slashes or null bytes. + */ +static __be32 * +decode_filename(__be32 *p, char **namp, unsigned int *lenp) +{ + char *name; + unsigned int i; + + if ((p = xdr_decode_string_inplace(p, namp, lenp, NFS_MAXNAMLEN)) != NULL) { + for (i = 0, name = *namp; i < *lenp; i++, name++) { + if (*name == '\0' || *name == '/') + return NULL; + } + } + return p; } -static bool -svcxdr_decode_filename(struct xdr_stream *xdr, char **name, unsigned int *len) +static __be32 * +decode_sattr(__be32 *p, struct iattr *iap, struct user_namespace *userns) { - u32 size, i; - __be32 *p; - char *c; - - if (xdr_stream_decode_u32(xdr, &size) < 0) - return false; - if (size == 0 || size > NFS_MAXNAMLEN) - return false; - p = xdr_inline_decode(xdr, size); - if (!p) - return false; - - *len = size; - *name = (char *)p; - for (i = 0, c = *name; i < size; i++, c++) - if (*c == '\0' || *c == '/') - return false; - - return true; -} - -static bool -svcxdr_decode_diropargs(struct xdr_stream *xdr, struct svc_fh *fhp, - char **name, unsigned int *len) -{ - return svcxdr_decode_fhandle(xdr, fhp) && - svcxdr_decode_filename(xdr, name, len); -} - -static bool -svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, - struct iattr *iap) -{ - u32 tmp1, tmp2; - __be32 *p; - - p = xdr_inline_decode(xdr, XDR_UNIT * 8); - if (!p) - return false; + u32 tmp, tmp1; iap->ia_valid = 0; - /* - * Some Sun clients put 0xffff in the mode field when they - * mean 0xffffffff. + /* Sun client bug compatibility check: some sun clients seem to + * put 0xffff in the mode field when they mean 0xffffffff. + * Quoting the 4.4BSD nfs server code: Nah nah nah nah na nah. */ - tmp1 = be32_to_cpup(p++); - if (tmp1 != (u32)-1 && tmp1 != 0xffff) { + if ((tmp = ntohl(*p++)) != (u32)-1 && tmp != 0xffff) { iap->ia_valid |= ATTR_MODE; - iap->ia_mode = tmp1; + iap->ia_mode = tmp; } - - tmp1 = be32_to_cpup(p++); - if (tmp1 != (u32)-1) { - iap->ia_uid = make_kuid(nfsd_user_namespace(rqstp), tmp1); + if ((tmp = ntohl(*p++)) != (u32)-1) { + iap->ia_uid = make_kuid(userns, tmp); if (uid_valid(iap->ia_uid)) iap->ia_valid |= ATTR_UID; } - - tmp1 = be32_to_cpup(p++); - if (tmp1 != (u32)-1) { - iap->ia_gid = make_kgid(nfsd_user_namespace(rqstp), tmp1); + if ((tmp = ntohl(*p++)) != (u32)-1) { + iap->ia_gid = make_kgid(userns, tmp); if (gid_valid(iap->ia_gid)) iap->ia_valid |= ATTR_GID; } - - tmp1 = be32_to_cpup(p++); - if (tmp1 != (u32)-1) { + if ((tmp = ntohl(*p++)) != (u32)-1) { iap->ia_valid |= ATTR_SIZE; - iap->ia_size = tmp1; + iap->ia_size = tmp; } - - tmp1 = be32_to_cpup(p++); - tmp2 = be32_to_cpup(p++); - if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + tmp = ntohl(*p++); tmp1 = ntohl(*p++); + if (tmp != (u32)-1 && tmp1 != (u32)-1) { iap->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET; - iap->ia_atime.tv_sec = tmp1; - iap->ia_atime.tv_nsec = tmp2 * NSEC_PER_USEC; + iap->ia_atime.tv_sec = tmp; + iap->ia_atime.tv_nsec = tmp1 * 1000; } - - tmp1 = be32_to_cpup(p++); - tmp2 = be32_to_cpup(p++); - if (tmp1 != (u32)-1 && tmp2 != (u32)-1) { + tmp = ntohl(*p++); tmp1 = ntohl(*p++); + if (tmp != (u32)-1 && tmp1 != (u32)-1) { iap->ia_valid |= ATTR_MTIME | ATTR_MTIME_SET; - iap->ia_mtime.tv_sec = tmp1; - iap->ia_mtime.tv_nsec = tmp2 * NSEC_PER_USEC; + iap->ia_mtime.tv_sec = tmp; + iap->ia_mtime.tv_nsec = tmp1 * 1000; /* * Passing the invalid value useconds=1000000 for mtime * is a Sun convention for "set both mtime and atime to @@ -192,447 +119,476 @@ svcxdr_decode_sattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, * sattr in section 6.1 of "NFS Illustrated" by * Brent Callaghan, Addison-Wesley, ISBN 0-201-32750-5 */ - if (tmp2 == 1000000) + if (tmp1 == 1000000) iap->ia_valid &= ~(ATTR_ATIME_SET|ATTR_MTIME_SET); } - - return true; + return p; } -/** - * svcxdr_encode_fattr - Encode NFSv2 file attributes - * @rqstp: Context of a completed RPC transaction - * @xdr: XDR stream - * @fhp: File handle to encode - * @stat: Attributes to encode - * - * Return values: - * %false: Send buffer space was exhausted - * %true: Success - */ -bool -svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp, const struct kstat *stat) +static __be32 * +encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, + struct kstat *stat) { struct user_namespace *userns = nfsd_user_namespace(rqstp); - struct dentry *dentry = fhp->fh_dentry; - int type = stat->mode & S_IFMT; + struct dentry *dentry = fhp->fh_dentry; + int type; struct timespec64 time; - __be32 *p; - u32 fsid; + u32 f; - p = xdr_reserve_space(xdr, XDR_UNIT * 17); - if (!p) - return false; + type = (stat->mode & S_IFMT); - *p++ = cpu_to_be32(nfs_ftypes[type >> 12]); - *p++ = cpu_to_be32((u32)stat->mode); - *p++ = cpu_to_be32((u32)stat->nlink); - *p++ = cpu_to_be32((u32)from_kuid_munged(userns, stat->uid)); - *p++ = cpu_to_be32((u32)from_kgid_munged(userns, stat->gid)); + *p++ = htonl(nfs_ftypes[type >> 12]); + *p++ = htonl((u32) stat->mode); + *p++ = htonl((u32) stat->nlink); + *p++ = htonl((u32) from_kuid_munged(userns, stat->uid)); + *p++ = htonl((u32) from_kgid_munged(userns, stat->gid)); - if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) - *p++ = cpu_to_be32(NFS_MAXPATHLEN); - else - *p++ = cpu_to_be32((u32) stat->size); - *p++ = cpu_to_be32((u32) stat->blksize); + if (S_ISLNK(type) && stat->size > NFS_MAXPATHLEN) { + *p++ = htonl(NFS_MAXPATHLEN); + } else { + *p++ = htonl((u32) stat->size); + } + *p++ = htonl((u32) stat->blksize); if (S_ISCHR(type) || S_ISBLK(type)) - *p++ = cpu_to_be32(new_encode_dev(stat->rdev)); + *p++ = htonl(new_encode_dev(stat->rdev)); else - *p++ = cpu_to_be32(0xffffffff); - *p++ = cpu_to_be32((u32)stat->blocks); - + *p++ = htonl(0xffffffff); + *p++ = htonl((u32) stat->blocks); switch (fsid_source(fhp)) { + default: + case FSIDSOURCE_DEV: + *p++ = htonl(new_encode_dev(stat->dev)); + break; case FSIDSOURCE_FSID: - fsid = (u32)fhp->fh_export->ex_fsid; + *p++ = htonl((u32) fhp->fh_export->ex_fsid); break; case FSIDSOURCE_UUID: - fsid = ((u32 *)fhp->fh_export->ex_uuid)[0]; - fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[1]; - fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[2]; - fsid ^= ((u32 *)fhp->fh_export->ex_uuid)[3]; - break; - default: - fsid = new_encode_dev(stat->dev); + f = ((u32*)fhp->fh_export->ex_uuid)[0]; + f ^= ((u32*)fhp->fh_export->ex_uuid)[1]; + f ^= ((u32*)fhp->fh_export->ex_uuid)[2]; + f ^= ((u32*)fhp->fh_export->ex_uuid)[3]; + *p++ = htonl(f); break; } - *p++ = cpu_to_be32(fsid); - - *p++ = cpu_to_be32((u32)stat->ino); - p = encode_timeval(p, &stat->atime); + *p++ = htonl((u32) stat->ino); + *p++ = htonl((u32) stat->atime.tv_sec); + *p++ = htonl(stat->atime.tv_nsec ? stat->atime.tv_nsec / 1000 : 0); time = stat->mtime; - lease_get_mtime(d_inode(dentry), &time); - p = encode_timeval(p, &time); - encode_timeval(p, &stat->ctime); + lease_get_mtime(d_inode(dentry), &time); + *p++ = htonl((u32) time.tv_sec); + *p++ = htonl(time.tv_nsec ? time.tv_nsec / 1000 : 0); + *p++ = htonl((u32) stat->ctime.tv_sec); + *p++ = htonl(stat->ctime.tv_nsec ? stat->ctime.tv_nsec / 1000 : 0); - return true; + return p; +} + +/* Helper function for NFSv2 ACL code */ +__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat) +{ + return encode_fattr(rqstp, p, fhp, stat); } /* * XDR decode functions */ +int +nfssvc_decode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_argsize_check(rqstp, p); +} -bool -nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_fhandle(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_fhandle *args = rqstp->rq_argp; - return svcxdr_decode_fhandle(xdr, &args->fh); + p = decode_fh(p, &args->fh); + if (!p) + return 0; + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_sattrargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_sattrargs *args = rqstp->rq_argp; - return svcxdr_decode_fhandle(xdr, &args->fh) && - svcxdr_decode_sattr(rqstp, xdr, &args->attrs); + p = decode_fh(p, &args->fh); + if (!p) + return 0; + p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); + + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_diropargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_diropargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs(xdr, &args->fh, &args->name, &args->len); + if (!(p = decode_fh(p, &args->fh)) + || !(p = decode_filename(p, &args->name, &args->len))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_readargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readargs *args = rqstp->rq_argp; - u32 totalcount; + unsigned int len; + int v; + p = decode_fh(p, &args->fh); + if (!p) + return 0; - if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; - /* totalcount is ignored */ - if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return false; + args->offset = ntohl(*p++); + len = args->count = ntohl(*p++); + p++; /* totalcount - unused */ - return true; + len = min_t(unsigned int, len, NFSSVC_MAXBLKSIZE_V2); + + /* set up somewhere to store response. + * We take pages, put them on reslist and include in iovec + */ + v=0; + while (len > 0) { + struct page *p = *(rqstp->rq_next_page++); + + rqstp->rq_vec[v].iov_base = page_address(p); + rqstp->rq_vec[v].iov_len = min_t(unsigned int, len, PAGE_SIZE); + len -= rqstp->rq_vec[v].iov_len; + v++; + } + args->vlen = v; + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_writeargs *args = rqstp->rq_argp; - u32 beginoffset, totalcount; + unsigned int len, hdr, dlen; + struct kvec *head = rqstp->rq_arg.head; - if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return false; - /* beginoffset is ignored */ - if (xdr_stream_decode_u32(xdr, &beginoffset) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->offset) < 0) - return false; - /* totalcount is ignored */ - if (xdr_stream_decode_u32(xdr, &totalcount) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; - /* opaque data */ - if (xdr_stream_decode_u32(xdr, &args->len) < 0) - return false; - if (args->len > NFSSVC_MAXBLKSIZE_V2) - return false; + p++; /* beginoffset */ + args->offset = ntohl(*p++); /* offset */ + p++; /* totalcount */ + len = args->len = ntohl(*p++); + /* + * The protocol specifies a maximum of 8192 bytes. + */ + if (len > NFSSVC_MAXBLKSIZE_V2) + return 0; - return xdr_stream_subsegment(xdr, &args->payload, args->len); + /* + * Check to make sure that we got the right number of + * bytes. + */ + hdr = (void*)p - head->iov_base; + if (hdr > head->iov_len) + return 0; + dlen = head->iov_len + rqstp->rq_arg.page_len - hdr; + + /* + * Round the length of the data which was specified up to + * the next multiple of XDR units and then compare that + * against the length which was actually received. + * Note that when RPCSEC/GSS (for example) is used, the + * data buffer can be padded so dlen might be larger + * than required. It must never be smaller. + */ + if (dlen < XDR_QUADLEN(len)*4) + return 0; + + args->first.iov_base = (void *)p; + args->first.iov_len = head->iov_len - hdr; + return 1; } -bool -nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_createargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_createargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs(xdr, &args->fh, - &args->name, &args->len) && - svcxdr_decode_sattr(rqstp, xdr, &args->attrs); + if ( !(p = decode_fh(p, &args->fh)) + || !(p = decode_filename(p, &args->name, &args->len))) + return 0; + p = decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); + + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_renameargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_renameargs *args = rqstp->rq_argp; - return svcxdr_decode_diropargs(xdr, &args->ffh, - &args->fname, &args->flen) && - svcxdr_decode_diropargs(xdr, &args->tfh, - &args->tname, &args->tlen); + if (!(p = decode_fh(p, &args->ffh)) + || !(p = decode_filename(p, &args->fname, &args->flen)) + || !(p = decode_fh(p, &args->tfh)) + || !(p = decode_filename(p, &args->tname, &args->tlen))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_readlinkargs(struct svc_rqst *rqstp, __be32 *p) +{ + struct nfsd_readlinkargs *args = rqstp->rq_argp; + + p = decode_fh(p, &args->fh); + if (!p) + return 0; + args->buffer = page_address(*(rqstp->rq_next_page++)); + + return xdr_argsize_check(rqstp, p); +} + +int +nfssvc_decode_linkargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_linkargs *args = rqstp->rq_argp; - return svcxdr_decode_fhandle(xdr, &args->ffh) && - svcxdr_decode_diropargs(xdr, &args->tfh, - &args->tname, &args->tlen); + if (!(p = decode_fh(p, &args->ffh)) + || !(p = decode_fh(p, &args->tfh)) + || !(p = decode_filename(p, &args->tname, &args->tlen))) + return 0; + + return xdr_argsize_check(rqstp, p); } -bool -nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_symlinkargs *args = rqstp->rq_argp; - struct kvec *head = rqstp->rq_arg.head; + char *base = (char *)p; + size_t xdrlen; - if (!svcxdr_decode_diropargs(xdr, &args->ffh, &args->fname, &args->flen)) - return false; - if (xdr_stream_decode_u32(xdr, &args->tlen) < 0) - return false; + if ( !(p = decode_fh(p, &args->ffh)) + || !(p = decode_filename(p, &args->fname, &args->flen))) + return 0; + + args->tlen = ntohl(*p++); if (args->tlen == 0) - return false; + return 0; - args->first.iov_len = head->iov_len - xdr_stream_pos(xdr); - args->first.iov_base = xdr_inline_decode(xdr, args->tlen); - if (!args->first.iov_base) - return false; - return svcxdr_decode_sattr(rqstp, xdr, &args->attrs); + args->first.iov_base = p; + args->first.iov_len = rqstp->rq_arg.head[0].iov_len; + args->first.iov_len -= (char *)p - base; + + /* This request is never larger than a page. Therefore, + * transport will deliver either: + * 1. pathname in the pagelist -> sattr is in the tail. + * 2. everything in the head buffer -> sattr is in the head. + */ + if (rqstp->rq_arg.page_len) { + if (args->tlen != rqstp->rq_arg.page_len) + return 0; + p = rqstp->rq_arg.tail[0].iov_base; + } else { + xdrlen = XDR_QUADLEN(args->tlen); + if (xdrlen > args->first.iov_len - (8 * sizeof(__be32))) + return 0; + p += xdrlen; + } + decode_sattr(p, &args->attrs, nfsd_user_namespace(rqstp)); + + return 1; } -bool -nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_decode_readdirargs(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readdirargs *args = rqstp->rq_argp; - if (!svcxdr_decode_fhandle(xdr, &args->fh)) - return false; - if (xdr_stream_decode_u32(xdr, &args->cookie) < 0) - return false; - if (xdr_stream_decode_u32(xdr, &args->count) < 0) - return false; + p = decode_fh(p, &args->fh); + if (!p) + return 0; + args->cookie = ntohl(*p++); + args->count = ntohl(*p++); + args->count = min_t(u32, args->count, PAGE_SIZE); + args->buffer = page_address(*(rqstp->rq_next_page++)); - return true; + return xdr_argsize_check(rqstp, p); } /* * XDR encode functions */ +int +nfssvc_encode_void(struct svc_rqst *rqstp, __be32 *p) +{ + return xdr_ressize_check(rqstp, p); +} -bool -nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_stat(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_stat *resp = rqstp->rq_resp; - return svcxdr_encode_stat(xdr, resp->status); + *p++ = resp->status; + return xdr_ressize_check(rqstp, p); } -bool -nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_attrstat(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_attrstat *resp = rqstp->rq_resp; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - break; - } - - return true; + *p++ = resp->status; + if (resp->status != nfs_ok) + goto out; + p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); +out: + return xdr_ressize_check(rqstp, p); } -bool -nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_diropres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_diropres *resp = rqstp->rq_resp; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_fhandle(xdr, &resp->fh)) - return false; - if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - break; - } - - return true; + *p++ = resp->status; + if (resp->status != nfs_ok) + goto out; + p = encode_fh(p, &resp->fh); + p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); +out: + return xdr_ressize_check(rqstp, p); } -bool -nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_readlinkres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readlinkres *resp = rqstp->rq_resp; - struct kvec *head = rqstp->rq_res.head; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (xdr_stream_encode_u32(xdr, resp->len) < 0) - return false; - xdr_write_pages(xdr, &resp->page, 0, resp->len); - if (svc_encode_result_payload(rqstp, head->iov_len, resp->len) < 0) - return false; - break; + *p++ = resp->status; + if (resp->status != nfs_ok) + return xdr_ressize_check(rqstp, p); + + *p++ = htonl(resp->len); + xdr_ressize_check(rqstp, p); + rqstp->rq_res.page_len = resp->len; + if (resp->len & 3) { + /* need to pad the tail */ + rqstp->rq_res.tail[0].iov_base = p; + *p = 0; + rqstp->rq_res.tail[0].iov_len = 4 - (resp->len&3); } - - return true; + return 1; } -bool -nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_readres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readres *resp = rqstp->rq_resp; - struct kvec *head = rqstp->rq_res.head; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - if (!svcxdr_encode_fattr(rqstp, xdr, &resp->fh, &resp->stat)) - return false; - if (xdr_stream_encode_u32(xdr, resp->count) < 0) - return false; - xdr_write_pages(xdr, resp->pages, rqstp->rq_res.page_base, - resp->count); - if (svc_encode_result_payload(rqstp, head->iov_len, resp->count) < 0) - return false; - break; + *p++ = resp->status; + if (resp->status != nfs_ok) + return xdr_ressize_check(rqstp, p); + + p = encode_fattr(rqstp, p, &resp->fh, &resp->stat); + *p++ = htonl(resp->count); + xdr_ressize_check(rqstp, p); + + /* now update rqstp->rq_res to reflect data as well */ + rqstp->rq_res.page_len = resp->count; + if (resp->count & 3) { + /* need to pad the tail */ + rqstp->rq_res.tail[0].iov_base = p; + *p = 0; + rqstp->rq_res.tail[0].iov_len = 4 - (resp->count&3); } - - return true; + return 1; } -bool -nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_readdirres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_readdirres *resp = rqstp->rq_resp; - struct xdr_buf *dirlist = &resp->dirlist; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - xdr_write_pages(xdr, dirlist->pages, 0, dirlist->len); - /* no more entries */ - if (xdr_stream_encode_item_absent(xdr) < 0) - return false; - if (xdr_stream_encode_bool(xdr, resp->common.err == nfserr_eof) < 0) - return false; - break; - } + *p++ = resp->status; + if (resp->status != nfs_ok) + return xdr_ressize_check(rqstp, p); - return true; + xdr_ressize_check(rqstp, p); + p = resp->buffer; + *p++ = 0; /* no more entries */ + *p++ = htonl((resp->common.err == nfserr_eof)); + rqstp->rq_res.page_len = (((unsigned long)p-1) & ~PAGE_MASK)+1; + + return 1; } -bool -nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr) +int +nfssvc_encode_statfsres(struct svc_rqst *rqstp, __be32 *p) { struct nfsd_statfsres *resp = rqstp->rq_resp; struct kstatfs *stat = &resp->stats; - __be32 *p; - if (!svcxdr_encode_stat(xdr, resp->status)) - return false; - switch (resp->status) { - case nfs_ok: - p = xdr_reserve_space(xdr, XDR_UNIT * 5); - if (!p) - return false; - *p++ = cpu_to_be32(NFSSVC_MAXBLKSIZE_V2); - *p++ = cpu_to_be32(stat->f_bsize); - *p++ = cpu_to_be32(stat->f_blocks); - *p++ = cpu_to_be32(stat->f_bfree); - *p = cpu_to_be32(stat->f_bavail); - break; + *p++ = resp->status; + if (resp->status != nfs_ok) + return xdr_ressize_check(rqstp, p); + + *p++ = htonl(NFSSVC_MAXBLKSIZE_V2); /* max transfer size */ + *p++ = htonl(stat->f_bsize); + *p++ = htonl(stat->f_blocks); + *p++ = htonl(stat->f_bfree); + *p++ = htonl(stat->f_bavail); + return xdr_ressize_check(rqstp, p); +} + +int +nfssvc_encode_entry(void *ccdv, const char *name, + int namlen, loff_t offset, u64 ino, unsigned int d_type) +{ + struct readdir_cd *ccd = ccdv; + struct nfsd_readdirres *cd = container_of(ccd, struct nfsd_readdirres, common); + __be32 *p = cd->buffer; + int buflen, slen; + + /* + dprintk("nfsd: entry(%.*s off %ld ino %ld)\n", + namlen, name, offset, ino); + */ + + if (offset > ~((u32) 0)) { + cd->common.err = nfserr_fbig; + return -EINVAL; } + if (cd->offset) + *cd->offset = htonl(offset); - return true; -} + /* truncate filename */ + namlen = min(namlen, NFS2_MAXNAMLEN); + slen = XDR_QUADLEN(namlen); -/** - * nfssvc_encode_nfscookie - Encode a directory offset cookie - * @resp: readdir result context - * @offset: offset cookie to encode - * - * The buffer space for the offset cookie has already been reserved - * by svcxdr_encode_entry_common(). - */ -void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset) -{ - __be32 cookie = cpu_to_be32(offset); + if ((buflen = cd->buflen - slen - 4) < 0) { + cd->common.err = nfserr_toosmall; + return -EINVAL; + } + if (ino > ~((u32) 0)) { + cd->common.err = nfserr_fbig; + return -EINVAL; + } + *p++ = xdr_one; /* mark entry present */ + *p++ = htonl((u32) ino); /* file id */ + p = xdr_encode_array(p, name, namlen);/* name length & name */ + cd->offset = p; /* remember pointer */ + *p++ = htonl(~0U); /* offset of next entry */ - if (!resp->cookie_offset) - return; - - write_bytes_to_xdr_buf(&resp->dirlist, resp->cookie_offset, &cookie, - sizeof(cookie)); - resp->cookie_offset = 0; -} - -static bool -svcxdr_encode_entry_common(struct nfsd_readdirres *resp, const char *name, - int namlen, loff_t offset, u64 ino) -{ - struct xdr_buf *dirlist = &resp->dirlist; - struct xdr_stream *xdr = &resp->xdr; - - if (xdr_stream_encode_item_present(xdr) < 0) - return false; - /* fileid */ - if (xdr_stream_encode_u32(xdr, (u32)ino) < 0) - return false; - /* name */ - if (xdr_stream_encode_opaque(xdr, name, min(namlen, NFS2_MAXNAMLEN)) < 0) - return false; - /* cookie */ - resp->cookie_offset = dirlist->len; - if (xdr_stream_encode_u32(xdr, ~0U) < 0) - return false; - - return true; -} - -/** - * nfssvc_encode_entry - encode one NFSv2 READDIR entry - * @data: directory context - * @name: name of the object to be encoded - * @namlen: length of that name, in bytes - * @offset: the offset of the previous entry - * @ino: the fileid of this entry - * @d_type: unused - * - * Return values: - * %0: Entry was successfully encoded. - * %-EINVAL: An encoding problem occured, secondary status code in resp->common.err - * - * On exit, the following fields are updated: - * - resp->xdr - * - resp->common.err - * - resp->cookie_offset - */ -int nfssvc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type) -{ - struct readdir_cd *ccd = data; - struct nfsd_readdirres *resp = container_of(ccd, - struct nfsd_readdirres, - common); - unsigned int starting_length = resp->dirlist.len; - - /* The offset cookie for the previous entry */ - nfssvc_encode_nfscookie(resp, offset); - - if (!svcxdr_encode_entry_common(resp, name, namlen, offset, ino)) - goto out_toosmall; - - xdr_commit_encode(&resp->xdr); - resp->common.err = nfs_ok; + cd->buflen = buflen; + cd->buffer = p; + cd->common.err = nfs_ok; return 0; - -out_toosmall: - resp->cookie_offset = 0; - resp->common.err = nfserr_toosmall; - resp->dirlist.len = starting_length; - return -EINVAL; } /* diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h index e94634d30591..9eae11a9d21c 100644 --- a/fs/nfsd/state.h +++ b/fs/nfsd/state.h @@ -57,11 +57,11 @@ typedef struct { } stateid_t; typedef struct { - stateid_t cs_stid; + stateid_t stid; #define NFS4_COPY_STID 1 #define NFS4_COPYNOTIFY_STID 2 - unsigned char cs_type; - refcount_t cs_count; + unsigned char sc_type; + refcount_t sc_count; } copy_stateid_t; struct nfsd4_callback { @@ -149,7 +149,6 @@ struct nfs4_delegation { /* For recall: */ int dl_retries; struct nfsd4_callback dl_recall; - bool dl_recalled; }; #define cb_to_delegation(cb) \ @@ -175,7 +174,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s) /* Maximum number of slots per session. 160 is useful for long haul TCP */ #define NFSD_MAX_SLOTS_PER_SESSION 160 /* Maximum number of operations per session compound */ -#define NFSD_MAX_OPS_PER_COMPOUND 50 +#define NFSD_MAX_OPS_PER_COMPOUND 16 /* Maximum session per slot cache size */ #define NFSD_SLOT_CACHE_SIZE 2048 /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */ @@ -283,28 +282,6 @@ struct nfsd4_sessionid { #define HEXDIR_LEN 33 /* hex version of 16 byte md5 of cl_name plus '\0' */ -/* - * State Meaning Where set - * -------------------------------------------------------------------------- - * | NFSD4_ACTIVE | Confirmed, active | Default | - * |------------------- ----------------------------------------------------| - * | NFSD4_COURTESY | Courtesy state. | nfs4_get_client_reaplist | - * | | Lease/lock/share | | - * | | reservation conflict | | - * | | can cause Courtesy | | - * | | client to be expired | | - * |------------------------------------------------------------------------| - * | NFSD4_EXPIRABLE | Courtesy client to be| nfs4_laundromat | - * | | expired by Laundromat| try_to_expire_client | - * | | due to conflict | | - * |------------------------------------------------------------------------| - */ -enum { - NFSD4_ACTIVE = 0, - NFSD4_COURTESY, - NFSD4_EXPIRABLE, -}; - /* * struct nfs4_client - one per client. Clientids live here. * @@ -368,7 +345,6 @@ struct nfs4_client { #define NFSD4_CLIENT_UPCALL_LOCK (5) /* upcall serialization */ #define NFSD4_CLIENT_CB_FLAG_MASK (1 << NFSD4_CLIENT_CB_UPDATE | \ 1 << NFSD4_CLIENT_CB_KILL) -#define NFSD4_CLIENT_CB_RECALL_ANY (6) unsigned long cl_flags; const struct cred *cl_cb_cred; struct rpc_clnt *cl_cb_client; @@ -395,10 +371,6 @@ struct nfs4_client { /* debugging info directory under nfsd/clients/ : */ struct dentry *cl_nfsd_dentry; - /* 'info' file within that directory. Ref is not counted, - * but will remain valid iff cl_nfsd_dentry != NULL - */ - struct dentry *cl_nfsd_info_dentry; /* for nfs41 callbacks */ /* We currently support a single back channel with a single slot */ @@ -409,13 +381,6 @@ struct nfs4_client { struct list_head async_copies; /* list of async copies */ spinlock_t async_lock; /* lock for async copies */ atomic_t cl_cb_inflight; /* Outstanding callbacks */ - - unsigned int cl_state; - atomic_t cl_delegs_in_recall; - - struct nfsd4_cb_recall_any *cl_ra; - time64_t cl_ra_time; - struct list_head cl_ra_cblist; }; /* struct nfs4_client_reset @@ -541,13 +506,14 @@ struct nfs4_clnt_odstate { * inode can have multiple filehandles associated with it, so there is * (potentially) a many to one relationship between this struct and struct * inode. + * + * These are hashed by filehandle in the file_hashtbl, which is protected by + * the global state_lock spinlock. */ struct nfs4_file { refcount_t fi_ref; - struct inode * fi_inode; - bool fi_aliased; spinlock_t fi_lock; - struct rhlist_head fi_rlist; + struct hlist_node fi_hash; /* hash on fi_fhandle */ struct list_head fi_stateids; union { struct list_head fi_delegations; @@ -596,10 +562,6 @@ struct nfs4_ol_stateid { struct list_head st_locks; struct nfs4_stateowner *st_stateowner; struct nfs4_clnt_odstate *st_clnt_odstate; -/* - * These bitmasks use 3 separate bits for READ, ALLOW, and BOTH; see the - * comment above bmap_to_share_mode() for explanation: - */ unsigned char st_access_bmap; unsigned char st_deny_bmap; struct nfs4_ol_stateid *st_openstp; @@ -641,7 +603,6 @@ enum nfsd4_cb_op { NFSPROC4_CLNT_CB_OFFLOAD, NFSPROC4_CLNT_CB_SEQUENCE, NFSPROC4_CLNT_CB_NOTIFY_LOCK, - NFSPROC4_CLNT_CB_RECALL_ANY, }; /* Returns true iff a is later than b: */ @@ -662,7 +623,6 @@ struct nfsd4_blocked_lock { struct file_lock nbl_lock; struct knfsd_fh nbl_fh; struct nfsd4_callback nbl_cb; - struct kref nbl_kref; }; struct nfsd4_compound_state; @@ -689,22 +649,26 @@ void nfs4_remove_reclaim_record(struct nfs4_client_reclaim *, struct nfsd_net *) extern void nfs4_release_reclaim(struct nfsd_net *); extern struct nfs4_client_reclaim *nfsd4_find_reclaim_client(struct xdr_netobj name, struct nfsd_net *nn); -extern __be32 nfs4_check_open_reclaim(struct nfs4_client *); +extern __be32 nfs4_check_open_reclaim(clientid_t *clid, + struct nfsd4_compound_state *cstate, struct nfsd_net *nn); extern void nfsd4_probe_callback(struct nfs4_client *clp); extern void nfsd4_probe_callback_sync(struct nfs4_client *clp); extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *); extern void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp, const struct nfsd4_callback_ops *ops, enum nfsd4_cb_op op); -extern bool nfsd4_run_cb(struct nfsd4_callback *cb); +extern void nfsd4_run_cb(struct nfsd4_callback *cb); extern int nfsd4_create_callback_queue(void); extern void nfsd4_destroy_callback_queue(void); extern void nfsd4_shutdown_callback(struct nfs4_client *); extern void nfsd4_shutdown_copy(struct nfs4_client *clp); +extern void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp); extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(struct xdr_netobj name, struct xdr_netobj princhash, struct nfsd_net *nn); extern bool nfs4_has_reclaimed_state(struct xdr_netobj name, struct nfsd_net *nn); +struct nfs4_file *find_file(struct knfsd_fh *fh); void put_nfs4_file(struct nfs4_file *fi); +extern void nfs4_put_copy(struct nfsd4_copy *copy); extern struct nfsd4_copy * find_async_copy(struct nfs4_client *clp, stateid_t *staetid); extern void nfs4_put_cpntf_state(struct nfsd_net *nn, @@ -729,9 +693,4 @@ extern void nfsd4_client_record_remove(struct nfs4_client *clp); extern int nfsd4_client_record_check(struct nfs4_client *clp); extern void nfsd4_record_grace_done(struct nfsd_net *nn); -static inline bool try_to_expire_client(struct nfs4_client *clp) -{ - cmpxchg(&clp->cl_state, NFSD4_COURTESY, NFSD4_EXPIRABLE); - return clp->cl_state == NFSD4_EXPIRABLE; -} #endif /* NFSD4_STATE_H */ diff --git a/fs/nfsd/stats.c b/fs/nfsd/stats.c index 777e24e5da33..b1bc582b0493 100644 --- a/fs/nfsd/stats.c +++ b/fs/nfsd/stats.c @@ -7,14 +7,16 @@ * Format: * rc * Statistsics for the reply cache - * fh + * fh * statistics for filehandle lookup * io * statistics for IO throughput - * th - * number of threads - * ra - * + * th <10%-20%> <20%-30%> ... <90%-100%> <100%> + * time (seconds) when nfsd thread usage above thresholds + * and number of times that all threads were in use + * ra cache-size <10% <20% <30% ... <100% not-found + * number of times that read-ahead entry was found that deep in + * the cache. * plus generic RPC stats (see net/sunrpc/stats.c) * * Copyright (C) 1995, 1996, 1997 Olaf Kirch @@ -32,28 +34,35 @@ struct svc_stat nfsd_svcstats = { .program = &nfsd_program, }; -static int nfsd_show(struct seq_file *seq, void *v) +static int nfsd_proc_show(struct seq_file *seq, void *v) { int i; - seq_printf(seq, "rc %lld %lld %lld\nfh %lld 0 0 0 0\nio %lld %lld\n", - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_HITS]), - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_MISSES]), - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]), - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_FH_STALE]), - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_READ]), - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_IO_WRITE])); - + seq_printf(seq, "rc %u %u %u\nfh %u %u %u %u %u\nio %u %u\n", + nfsdstats.rchits, + nfsdstats.rcmisses, + nfsdstats.rcnocache, + nfsdstats.fh_stale, + nfsdstats.fh_lookup, + nfsdstats.fh_anon, + nfsdstats.fh_nocache_dir, + nfsdstats.fh_nocache_nondir, + nfsdstats.io_read, + nfsdstats.io_write); /* thread usage: */ - seq_printf(seq, "th %u 0", atomic_read(&nfsdstats.th_cnt)); - - /* deprecated thread usage histogram stats */ - for (i = 0; i < 10; i++) - seq_puts(seq, " 0.000"); - - /* deprecated ra-cache stats */ - seq_puts(seq, "\nra 0 0 0 0 0 0 0 0 0 0 0 0\n"); + seq_printf(seq, "th %u %u", nfsdstats.th_cnt, nfsdstats.th_fullcnt); + for (i=0; i<10; i++) { + unsigned int jifs = nfsdstats.th_usage[i]; + unsigned int sec = jifs / HZ, msec = (jifs % HZ)*1000/HZ; + seq_printf(seq, " %u.%03u", sec, msec); + } + /* newline and ra-cache */ + seq_printf(seq, "\nra %u", nfsdstats.ra_size); + for (i=0; i<11; i++) + seq_printf(seq, " %u", nfsdstats.ra_depth[i]); + seq_putc(seq, '\n'); + /* show my rpc info */ svc_seq_show(seq, &nfsd_svcstats); @@ -61,10 +70,8 @@ static int nfsd_show(struct seq_file *seq, void *v) /* Show count for individual nfsv4 operations */ /* Writing operation numbers 0 1 2 also for maintaining uniformity */ seq_printf(seq,"proc4ops %u", LAST_NFS4_OP + 1); - for (i = 0; i <= LAST_NFS4_OP; i++) { - seq_printf(seq, " %lld", - percpu_counter_sum_positive(&nfsdstats.counter[NFSD_STATS_NFS4_OP(i)])); - } + for (i = 0; i <= LAST_NFS4_OP; i++) + seq_printf(seq, " %u", nfsdstats.nfs4_opcount[i]); seq_putc(seq, '\n'); #endif @@ -72,65 +79,26 @@ static int nfsd_show(struct seq_file *seq, void *v) return 0; } -DEFINE_PROC_SHOW_ATTRIBUTE(nfsd); - -int nfsd_percpu_counters_init(struct percpu_counter counters[], int num) +static int nfsd_proc_open(struct inode *inode, struct file *file) { - int i, err = 0; - - for (i = 0; !err && i < num; i++) - err = percpu_counter_init(&counters[i], 0, GFP_KERNEL); - - if (!err) - return 0; - - for (; i > 0; i--) - percpu_counter_destroy(&counters[i-1]); - - return err; + return single_open(file, nfsd_proc_show, NULL); } -void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num) +static const struct proc_ops nfsd_proc_ops = { + .proc_open = nfsd_proc_open, + .proc_read = seq_read, + .proc_lseek = seq_lseek, + .proc_release = single_release, +}; + +void +nfsd_stat_init(void) { - int i; - - for (i = 0; i < num; i++) - percpu_counter_set(&counters[i], 0); -} - -void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num) -{ - int i; - - for (i = 0; i < num; i++) - percpu_counter_destroy(&counters[i]); -} - -static int nfsd_stat_counters_init(void) -{ - return nfsd_percpu_counters_init(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); -} - -static void nfsd_stat_counters_destroy(void) -{ - nfsd_percpu_counters_destroy(nfsdstats.counter, NFSD_STATS_COUNTERS_NUM); -} - -int nfsd_stat_init(void) -{ - int err; - - err = nfsd_stat_counters_init(); - if (err) - return err; - svc_proc_register(&init_net, &nfsd_svcstats, &nfsd_proc_ops); - - return 0; } -void nfsd_stat_shutdown(void) +void +nfsd_stat_shutdown(void) { - nfsd_stat_counters_destroy(); svc_proc_unregister(&init_net, "nfsd"); } diff --git a/fs/nfsd/stats.h b/fs/nfsd/stats.h index 9b43dc3d9991..b23fdac69820 100644 --- a/fs/nfsd/stats.h +++ b/fs/nfsd/stats.h @@ -8,89 +8,37 @@ #define _NFSD_STATS_H #include -#include -enum { - NFSD_STATS_RC_HITS, /* repcache hits */ - NFSD_STATS_RC_MISSES, /* repcache misses */ - NFSD_STATS_RC_NOCACHE, /* uncached reqs */ - NFSD_STATS_FH_STALE, /* FH stale error */ - NFSD_STATS_IO_READ, /* bytes returned to read requests */ - NFSD_STATS_IO_WRITE, /* bytes passed in write requests */ -#ifdef CONFIG_NFSD_V4 - NFSD_STATS_FIRST_NFS4_OP, /* count of individual nfsv4 operations */ - NFSD_STATS_LAST_NFS4_OP = NFSD_STATS_FIRST_NFS4_OP + LAST_NFS4_OP, -#define NFSD_STATS_NFS4_OP(op) (NFSD_STATS_FIRST_NFS4_OP + (op)) -#endif - NFSD_STATS_COUNTERS_NUM -}; - struct nfsd_stats { - struct percpu_counter counter[NFSD_STATS_COUNTERS_NUM]; + unsigned int rchits; /* repcache hits */ + unsigned int rcmisses; /* repcache hits */ + unsigned int rcnocache; /* uncached reqs */ + unsigned int fh_stale; /* FH stale error */ + unsigned int fh_lookup; /* dentry cached */ + unsigned int fh_anon; /* anon file dentry returned */ + unsigned int fh_nocache_dir; /* filehandle not found in dcache */ + unsigned int fh_nocache_nondir; /* filehandle not found in dcache */ + unsigned int io_read; /* bytes returned to read requests */ + unsigned int io_write; /* bytes passed in write requests */ + unsigned int th_cnt; /* number of available threads */ + unsigned int th_usage[10]; /* number of ticks during which n perdeciles + * of available threads were in use */ + unsigned int th_fullcnt; /* number of times last free thread was used */ + unsigned int ra_size; /* size of ra cache */ + unsigned int ra_depth[11]; /* number of times ra entry was found that deep + * in the cache (10percentiles). [10] = not found */ +#ifdef CONFIG_NFSD_V4 + unsigned int nfs4_opcount[LAST_NFS4_OP + 1]; /* count of individual nfsv4 operations */ +#endif - atomic_t th_cnt; /* number of available threads */ }; -extern struct nfsd_stats nfsdstats; +extern struct nfsd_stats nfsdstats; extern struct svc_stat nfsd_svcstats; -int nfsd_percpu_counters_init(struct percpu_counter counters[], int num); -void nfsd_percpu_counters_reset(struct percpu_counter counters[], int num); -void nfsd_percpu_counters_destroy(struct percpu_counter counters[], int num); -int nfsd_stat_init(void); -void nfsd_stat_shutdown(void); - -static inline void nfsd_stats_rc_hits_inc(void) -{ - percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_HITS]); -} - -static inline void nfsd_stats_rc_misses_inc(void) -{ - percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_MISSES]); -} - -static inline void nfsd_stats_rc_nocache_inc(void) -{ - percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_RC_NOCACHE]); -} - -static inline void nfsd_stats_fh_stale_inc(struct svc_export *exp) -{ - percpu_counter_inc(&nfsdstats.counter[NFSD_STATS_FH_STALE]); - if (exp) - percpu_counter_inc(&exp->ex_stats.counter[EXP_STATS_FH_STALE]); -} - -static inline void nfsd_stats_io_read_add(struct svc_export *exp, s64 amount) -{ - percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_READ], amount); - if (exp) - percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_READ], amount); -} - -static inline void nfsd_stats_io_write_add(struct svc_export *exp, s64 amount) -{ - percpu_counter_add(&nfsdstats.counter[NFSD_STATS_IO_WRITE], amount); - if (exp) - percpu_counter_add(&exp->ex_stats.counter[EXP_STATS_IO_WRITE], amount); -} - -static inline void nfsd_stats_payload_misses_inc(struct nfsd_net *nn) -{ - percpu_counter_inc(&nn->counter[NFSD_NET_PAYLOAD_MISSES]); -} - -static inline void nfsd_stats_drc_mem_usage_add(struct nfsd_net *nn, s64 amount) -{ - percpu_counter_add(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); -} - -static inline void nfsd_stats_drc_mem_usage_sub(struct nfsd_net *nn, s64 amount) -{ - percpu_counter_sub(&nn->counter[NFSD_NET_DRC_MEM_USAGE], amount); -} +void nfsd_stat_init(void); +void nfsd_stat_shutdown(void); #endif /* _NFSD_STATS_H */ diff --git a/fs/nfsd/trace.c b/fs/nfsd/trace.c index f008b95ceec2..90967466a1e5 100644 --- a/fs/nfsd/trace.c +++ b/fs/nfsd/trace.c @@ -1,4 +1,3 @@ -// SPDX-License-Identifier: GPL-2.0 #define CREATE_TRACE_POINTS #include "trace.h" diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 445d00f00eab..a952f4a9b2a6 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -12,86 +12,6 @@ #include "export.h" #include "nfsfh.h" -#define NFSD_TRACE_PROC_ARG_FIELDS \ - __field(unsigned int, netns_ino) \ - __field(u32, xid) \ - __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ - __array(unsigned char, client, sizeof(struct sockaddr_in6)) - -#define NFSD_TRACE_PROC_ARG_ASSIGNMENTS \ - do { \ - __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ - __entry->xid = be32_to_cpu(rqstp->rq_xid); \ - memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ - rqstp->rq_xprt->xpt_locallen); \ - memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ - rqstp->rq_xprt->xpt_remotelen); \ - } while (0); - -#define NFSD_TRACE_PROC_RES_FIELDS \ - __field(unsigned int, netns_ino) \ - __field(u32, xid) \ - __field(unsigned long, status) \ - __array(unsigned char, server, sizeof(struct sockaddr_in6)) \ - __array(unsigned char, client, sizeof(struct sockaddr_in6)) - -#define NFSD_TRACE_PROC_RES_ASSIGNMENTS(error) \ - do { \ - __entry->netns_ino = SVC_NET(rqstp)->ns.inum; \ - __entry->xid = be32_to_cpu(rqstp->rq_xid); \ - __entry->status = be32_to_cpu(error); \ - memcpy(__entry->server, &rqstp->rq_xprt->xpt_local, \ - rqstp->rq_xprt->xpt_locallen); \ - memcpy(__entry->client, &rqstp->rq_xprt->xpt_remote, \ - rqstp->rq_xprt->xpt_remotelen); \ - } while (0); - -DECLARE_EVENT_CLASS(nfsd_xdr_err_class, - TP_PROTO( - const struct svc_rqst *rqstp - ), - TP_ARGS(rqstp), - TP_STRUCT__entry( - NFSD_TRACE_PROC_ARG_FIELDS - - __field(u32, vers) - __field(u32, proc) - ), - TP_fast_assign( - NFSD_TRACE_PROC_ARG_ASSIGNMENTS - - __entry->vers = rqstp->rq_vers; - __entry->proc = rqstp->rq_proc; - ), - TP_printk("xid=0x%08x vers=%u proc=%u", - __entry->xid, __entry->vers, __entry->proc - ) -); - -#define DEFINE_NFSD_XDR_ERR_EVENT(name) \ -DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \ - TP_PROTO(const struct svc_rqst *rqstp), \ - TP_ARGS(rqstp)) - -DEFINE_NFSD_XDR_ERR_EVENT(garbage_args); -DEFINE_NFSD_XDR_ERR_EVENT(cant_encode); - -#define show_nfsd_may_flags(x) \ - __print_flags(x, "|", \ - { NFSD_MAY_EXEC, "EXEC" }, \ - { NFSD_MAY_WRITE, "WRITE" }, \ - { NFSD_MAY_READ, "READ" }, \ - { NFSD_MAY_SATTR, "SATTR" }, \ - { NFSD_MAY_TRUNC, "TRUNC" }, \ - { NFSD_MAY_LOCK, "LOCK" }, \ - { NFSD_MAY_OWNER_OVERRIDE, "OWNER_OVERRIDE" }, \ - { NFSD_MAY_LOCAL_ACCESS, "LOCAL_ACCESS" }, \ - { NFSD_MAY_BYPASS_GSS_ON_ROOT, "BYPASS_GSS_ON_ROOT" }, \ - { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ - { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ - { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) - TRACE_EVENT(nfsd_compound, TP_PROTO(const struct svc_rqst *rqst, u32 args_opcnt), @@ -131,56 +51,6 @@ TRACE_EVENT(nfsd_compound_status, __get_str(name), __entry->status) ) -TRACE_EVENT(nfsd_compound_decode_err, - TP_PROTO( - const struct svc_rqst *rqstp, - u32 args_opcnt, - u32 resp_opcnt, - u32 opnum, - __be32 status - ), - TP_ARGS(rqstp, args_opcnt, resp_opcnt, opnum, status), - TP_STRUCT__entry( - NFSD_TRACE_PROC_RES_FIELDS - - __field(u32, args_opcnt) - __field(u32, resp_opcnt) - __field(u32, opnum) - ), - TP_fast_assign( - NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) - - __entry->args_opcnt = args_opcnt; - __entry->resp_opcnt = resp_opcnt; - __entry->opnum = opnum; - ), - TP_printk("op=%u/%u opnum=%u status=%lu", - __entry->resp_opcnt, __entry->args_opcnt, - __entry->opnum, __entry->status) -); - -TRACE_EVENT(nfsd_compound_encode_err, - TP_PROTO( - const struct svc_rqst *rqstp, - u32 opnum, - __be32 status - ), - TP_ARGS(rqstp, opnum, status), - TP_STRUCT__entry( - NFSD_TRACE_PROC_RES_FIELDS - - __field(u32, opnum) - ), - TP_fast_assign( - NFSD_TRACE_PROC_RES_ASSIGNMENTS(status) - - __entry->opnum = opnum; - ), - TP_printk("opnum=%u status=%lu", - __entry->opnum, __entry->status) -); - - DECLARE_EVENT_CLASS(nfsd_fh_err_class, TP_PROTO(struct svc_rqst *rqstp, struct svc_fh *fhp, @@ -377,106 +247,10 @@ DEFINE_EVENT(nfsd_err_class, nfsd_##name, \ DEFINE_NFSD_ERR_EVENT(read_err); DEFINE_NFSD_ERR_EVENT(write_err); -TRACE_EVENT(nfsd_dirent, - TP_PROTO(struct svc_fh *fhp, - u64 ino, - const char *name, - int namlen), - TP_ARGS(fhp, ino, name, namlen), - TP_STRUCT__entry( - __field(u32, fh_hash) - __field(u64, ino) - __field(int, len) - __dynamic_array(unsigned char, name, namlen) - ), - TP_fast_assign( - __entry->fh_hash = fhp ? knfsd_fh_hash(&fhp->fh_handle) : 0; - __entry->ino = ino; - __entry->len = namlen; - memcpy(__get_str(name), name, namlen); - ), - TP_printk("fh_hash=0x%08x ino=%llu name=%.*s", - __entry->fh_hash, __entry->ino, - __entry->len, __get_str(name)) -) - -DECLARE_EVENT_CLASS(nfsd_copy_err_class, - TP_PROTO(struct svc_rqst *rqstp, - struct svc_fh *src_fhp, - loff_t src_offset, - struct svc_fh *dst_fhp, - loff_t dst_offset, - u64 count, - int status), - TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, count, status), - TP_STRUCT__entry( - __field(u32, xid) - __field(u32, src_fh_hash) - __field(loff_t, src_offset) - __field(u32, dst_fh_hash) - __field(loff_t, dst_offset) - __field(u64, count) - __field(int, status) - ), - TP_fast_assign( - __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->src_fh_hash = knfsd_fh_hash(&src_fhp->fh_handle); - __entry->src_offset = src_offset; - __entry->dst_fh_hash = knfsd_fh_hash(&dst_fhp->fh_handle); - __entry->dst_offset = dst_offset; - __entry->count = count; - __entry->status = status; - ), - TP_printk("xid=0x%08x src_fh_hash=0x%08x src_offset=%lld " - "dst_fh_hash=0x%08x dst_offset=%lld " - "count=%llu status=%d", - __entry->xid, __entry->src_fh_hash, __entry->src_offset, - __entry->dst_fh_hash, __entry->dst_offset, - (unsigned long long)__entry->count, - __entry->status) -) - -#define DEFINE_NFSD_COPY_ERR_EVENT(name) \ -DEFINE_EVENT(nfsd_copy_err_class, nfsd_##name, \ - TP_PROTO(struct svc_rqst *rqstp, \ - struct svc_fh *src_fhp, \ - loff_t src_offset, \ - struct svc_fh *dst_fhp, \ - loff_t dst_offset, \ - u64 count, \ - int status), \ - TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, \ - count, status)) - -DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err); - #include "state.h" #include "filecache.h" #include "vfs.h" -TRACE_EVENT(nfsd_delegret_wakeup, - TP_PROTO( - const struct svc_rqst *rqstp, - const struct inode *inode, - long timeo - ), - TP_ARGS(rqstp, inode, timeo), - TP_STRUCT__entry( - __field(u32, xid) - __field(const void *, inode) - __field(long, timeo) - ), - TP_fast_assign( - __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->inode = inode; - __entry->timeo = timeo; - ), - TP_printk("xid=0x%08x inode=%p%s", - __entry->xid, __entry->inode, - __entry->timeo == 0 ? " (timed out)" : "" - ) -); - DECLARE_EVENT_CLASS(nfsd_stateid_class, TP_PROTO(stateid_t *stp), TP_ARGS(stp), @@ -517,7 +291,7 @@ DEFINE_STATEID_EVENT(layout_recall_release); DEFINE_STATEID_EVENT(open); DEFINE_STATEID_EVENT(deleg_read); -DEFINE_STATEID_EVENT(deleg_return); +DEFINE_STATEID_EVENT(deleg_break); DEFINE_STATEID_EVENT(deleg_recall); DECLARE_EVENT_CLASS(nfsd_stateseqid_class, @@ -550,61 +324,6 @@ DEFINE_EVENT(nfsd_stateseqid_class, nfsd_##name, \ DEFINE_STATESEQID_EVENT(preprocess); DEFINE_STATESEQID_EVENT(open_confirm); -TRACE_DEFINE_ENUM(NFS4_OPEN_STID); -TRACE_DEFINE_ENUM(NFS4_LOCK_STID); -TRACE_DEFINE_ENUM(NFS4_DELEG_STID); -TRACE_DEFINE_ENUM(NFS4_CLOSED_STID); -TRACE_DEFINE_ENUM(NFS4_REVOKED_DELEG_STID); -TRACE_DEFINE_ENUM(NFS4_CLOSED_DELEG_STID); -TRACE_DEFINE_ENUM(NFS4_LAYOUT_STID); - -#define show_stid_type(x) \ - __print_flags(x, "|", \ - { NFS4_OPEN_STID, "OPEN" }, \ - { NFS4_LOCK_STID, "LOCK" }, \ - { NFS4_DELEG_STID, "DELEG" }, \ - { NFS4_CLOSED_STID, "CLOSED" }, \ - { NFS4_REVOKED_DELEG_STID, "REVOKED" }, \ - { NFS4_CLOSED_DELEG_STID, "CLOSED_DELEG" }, \ - { NFS4_LAYOUT_STID, "LAYOUT" }) - -DECLARE_EVENT_CLASS(nfsd_stid_class, - TP_PROTO( - const struct nfs4_stid *stid - ), - TP_ARGS(stid), - TP_STRUCT__entry( - __field(unsigned long, sc_type) - __field(int, sc_count) - __field(u32, cl_boot) - __field(u32, cl_id) - __field(u32, si_id) - __field(u32, si_generation) - ), - TP_fast_assign( - const stateid_t *stp = &stid->sc_stateid; - - __entry->sc_type = stid->sc_type; - __entry->sc_count = refcount_read(&stid->sc_count); - __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; - __entry->cl_id = stp->si_opaque.so_clid.cl_id; - __entry->si_id = stp->si_opaque.so_id; - __entry->si_generation = stp->si_generation; - ), - TP_printk("client %08x:%08x stateid %08x:%08x ref=%d type=%s", - __entry->cl_boot, __entry->cl_id, - __entry->si_id, __entry->si_generation, - __entry->sc_count, show_stid_type(__entry->sc_type) - ) -); - -#define DEFINE_STID_EVENT(name) \ -DEFINE_EVENT(nfsd_stid_class, nfsd_stid_##name, \ - TP_PROTO(const struct nfs4_stid *stid), \ - TP_ARGS(stid)) - -DEFINE_STID_EVENT(revoke); - DECLARE_EVENT_CLASS(nfsd_clientid_class, TP_PROTO(const clientid_t *clid), TP_ARGS(clid), @@ -624,12 +343,7 @@ DEFINE_EVENT(nfsd_clientid_class, nfsd_clid_##name, \ TP_PROTO(const clientid_t *clid), \ TP_ARGS(clid)) -DEFINE_CLIENTID_EVENT(expire_unconf); -DEFINE_CLIENTID_EVENT(reclaim_complete); -DEFINE_CLIENTID_EVENT(confirmed); -DEFINE_CLIENTID_EVENT(destroyed); -DEFINE_CLIENTID_EVENT(admin_expired); -DEFINE_CLIENTID_EVENT(replaced); +DEFINE_CLIENTID_EVENT(expired); DEFINE_CLIENTID_EVENT(purged); DEFINE_CLIENTID_EVENT(renew); DEFINE_CLIENTID_EVENT(stale); @@ -654,145 +368,56 @@ DEFINE_EVENT(nfsd_net_class, nfsd_##name, \ DEFINE_NET_EVENT(grace_start); DEFINE_NET_EVENT(grace_complete); -TRACE_EVENT(nfsd_writeverf_reset, - TP_PROTO( - const struct nfsd_net *nn, - const struct svc_rqst *rqstp, - int error - ), - TP_ARGS(nn, rqstp, error), - TP_STRUCT__entry( - __field(unsigned long long, boot_time) - __field(u32, xid) - __field(int, error) - __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) - ), - TP_fast_assign( - __entry->boot_time = nn->boot_time; - __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->error = error; - - /* avoid seqlock inside TP_fast_assign */ - memcpy(__entry->verifier, nn->writeverf, - NFS4_VERIFIER_SIZE); - ), - TP_printk("boot_time=%16llx xid=0x%08x error=%d new verifier=0x%s", - __entry->boot_time, __entry->xid, __entry->error, - __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE) - ) -); - -TRACE_EVENT(nfsd_clid_cred_mismatch, - TP_PROTO( - const struct nfs4_client *clp, - const struct svc_rqst *rqstp - ), - TP_ARGS(clp, rqstp), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(unsigned long, cl_flavor) - __field(unsigned long, new_flavor) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __entry->cl_flavor = clp->cl_cred.cr_flavor; - __entry->new_flavor = rqstp->rq_cred.cr_flavor; - memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, - sizeof(struct sockaddr_in6)); - ), - TP_printk("client %08x:%08x flavor=%s, conflict=%s from addr=%pISpc", - __entry->cl_boot, __entry->cl_id, - show_nfsd_authflavor(__entry->cl_flavor), - show_nfsd_authflavor(__entry->new_flavor), __entry->addr - ) -) - -TRACE_EVENT(nfsd_clid_verf_mismatch, - TP_PROTO( - const struct nfs4_client *clp, - const struct svc_rqst *rqstp, - const nfs4_verifier *verf - ), - TP_ARGS(clp, rqstp, verf), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __array(unsigned char, cl_verifier, NFS4_VERIFIER_SIZE) - __array(unsigned char, new_verifier, NFS4_VERIFIER_SIZE) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - memcpy(__entry->cl_verifier, (void *)&clp->cl_verifier, - NFS4_VERIFIER_SIZE); - memcpy(__entry->new_verifier, (void *)verf, - NFS4_VERIFIER_SIZE); - memcpy(__entry->addr, &rqstp->rq_xprt->xpt_remote, - sizeof(struct sockaddr_in6)); - ), - TP_printk("client %08x:%08x verf=0x%s, updated=0x%s from addr=%pISpc", - __entry->cl_boot, __entry->cl_id, - __print_hex_str(__entry->cl_verifier, NFS4_VERIFIER_SIZE), - __print_hex_str(__entry->new_verifier, NFS4_VERIFIER_SIZE), - __entry->addr - ) -); - -DECLARE_EVENT_CLASS(nfsd_clid_class, +TRACE_EVENT(nfsd_clid_inuse_err, TP_PROTO(const struct nfs4_client *clp), TP_ARGS(clp), TP_STRUCT__entry( __field(u32, cl_boot) __field(u32, cl_id) __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - __field(unsigned long, flavor) - __array(unsigned char, verifier, NFS4_VERIFIER_SIZE) - __dynamic_array(char, name, clp->cl_name.len + 1) + __field(unsigned int, namelen) + __dynamic_array(unsigned char, name, clp->cl_name.len) ), TP_fast_assign( __entry->cl_boot = clp->cl_clientid.cl_boot; __entry->cl_id = clp->cl_clientid.cl_id; memcpy(__entry->addr, &clp->cl_addr, sizeof(struct sockaddr_in6)); - __entry->flavor = clp->cl_cred.cr_flavor; - memcpy(__entry->verifier, (void *)&clp->cl_verifier, - NFS4_VERIFIER_SIZE); - memcpy(__get_str(name), clp->cl_name.data, clp->cl_name.len); - __get_str(name)[clp->cl_name.len] = '\0'; + __entry->namelen = clp->cl_name.len; + memcpy(__get_dynamic_array(name), clp->cl_name.data, + clp->cl_name.len); ), - TP_printk("addr=%pISpc name='%s' verifier=0x%s flavor=%s client=%08x:%08x", - __entry->addr, __get_str(name), - __print_hex_str(__entry->verifier, NFS4_VERIFIER_SIZE), - show_nfsd_authflavor(__entry->flavor), + TP_printk("nfs4_clientid %.*s already in use by %pISpc, client %08x:%08x", + __entry->namelen, __get_str(name), __entry->addr, __entry->cl_boot, __entry->cl_id) -); +) -#define DEFINE_CLID_EVENT(name) \ -DEFINE_EVENT(nfsd_clid_class, nfsd_clid_##name, \ - TP_PROTO(const struct nfs4_client *clp), \ - TP_ARGS(clp)) +TRACE_DEFINE_ENUM(NFSD_FILE_HASHED); +TRACE_DEFINE_ENUM(NFSD_FILE_PENDING); +TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_READ); +TRACE_DEFINE_ENUM(NFSD_FILE_BREAK_WRITE); +TRACE_DEFINE_ENUM(NFSD_FILE_REFERENCED); -DEFINE_CLID_EVENT(fresh); -DEFINE_CLID_EVENT(confirmed_r); - -/* - * from fs/nfsd/filecache.h - */ #define show_nf_flags(val) \ __print_flags(val, "|", \ { 1 << NFSD_FILE_HASHED, "HASHED" }, \ { 1 << NFSD_FILE_PENDING, "PENDING" }, \ - { 1 << NFSD_FILE_REFERENCED, "REFERENCED" }, \ - { 1 << NFSD_FILE_GC, "GC" }) + { 1 << NFSD_FILE_BREAK_READ, "BREAK_READ" }, \ + { 1 << NFSD_FILE_BREAK_WRITE, "BREAK_WRITE" }, \ + { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}) + +/* FIXME: This should probably be fleshed out in the future. */ +#define show_nf_may(val) \ + __print_flags(val, "|", \ + { NFSD_MAY_READ, "READ" }, \ + { NFSD_MAY_WRITE, "WRITE" }, \ + { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }) DECLARE_EVENT_CLASS(nfsd_file_class, TP_PROTO(struct nfsd_file *nf), TP_ARGS(nf), TP_STRUCT__entry( + __field(unsigned int, nf_hashval) __field(void *, nf_inode) __field(int, nf_ref) __field(unsigned long, nf_flags) @@ -800,17 +425,19 @@ DECLARE_EVENT_CLASS(nfsd_file_class, __field(struct file *, nf_file) ), TP_fast_assign( + __entry->nf_hashval = nf->nf_hashval; __entry->nf_inode = nf->nf_inode; __entry->nf_ref = refcount_read(&nf->nf_ref); __entry->nf_flags = nf->nf_flags; __entry->nf_may = nf->nf_may; __entry->nf_file = nf->nf_file; ), - TP_printk("inode=%p ref=%d flags=%s may=%s nf_file=%p", + TP_printk("hash=0x%x inode=0x%p ref=%d flags=%s may=%s file=%p", + __entry->nf_hashval, __entry->nf_inode, __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), + show_nf_may(__entry->nf_may), __entry->nf_file) ) @@ -819,60 +446,34 @@ DEFINE_EVENT(nfsd_file_class, name, \ TP_PROTO(struct nfsd_file *nf), \ TP_ARGS(nf)) -DEFINE_NFSD_FILE_EVENT(nfsd_file_free); +DEFINE_NFSD_FILE_EVENT(nfsd_file_alloc); +DEFINE_NFSD_FILE_EVENT(nfsd_file_put_final); DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash); DEFINE_NFSD_FILE_EVENT(nfsd_file_put); -DEFINE_NFSD_FILE_EVENT(nfsd_file_closing); -DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_queue); - -TRACE_EVENT(nfsd_file_alloc, - TP_PROTO( - const struct nfsd_file *nf - ), - TP_ARGS(nf), - TP_STRUCT__entry( - __field(const void *, nf_inode) - __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(unsigned int, nf_ref) - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_flags = nf->nf_flags; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->nf_may = nf->nf_may; - ), - TP_printk("inode=%p ref=%u flags=%s may=%s", - __entry->nf_inode, __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may) - ) -); +DEFINE_NFSD_FILE_EVENT(nfsd_file_unhash_and_release_locked); TRACE_EVENT(nfsd_file_acquire, - TP_PROTO( - const struct svc_rqst *rqstp, - const struct inode *inode, - unsigned int may_flags, - const struct nfsd_file *nf, - __be32 status - ), + TP_PROTO(struct svc_rqst *rqstp, unsigned int hash, + struct inode *inode, unsigned int may_flags, + struct nfsd_file *nf, __be32 status), - TP_ARGS(rqstp, inode, may_flags, nf, status), + TP_ARGS(rqstp, hash, inode, may_flags, nf, status), TP_STRUCT__entry( __field(u32, xid) - __field(const void *, inode) - __field(unsigned long, may_flags) - __field(unsigned int, nf_ref) + __field(unsigned int, hash) + __field(void *, inode) + __field(unsigned int, may_flags) + __field(int, nf_ref) __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(const void *, nf_file) + __field(unsigned char, nf_may) + __field(struct file *, nf_file) __field(u32, status) ), TP_fast_assign( __entry->xid = be32_to_cpu(rqstp->rq_xid); + __entry->hash = hash; __entry->inode = inode; __entry->may_flags = may_flags; __entry->nf_ref = nf ? refcount_read(&nf->nf_ref) : 0; @@ -882,132 +483,40 @@ TRACE_EVENT(nfsd_file_acquire, __entry->status = be32_to_cpu(status); ), - TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p status=%u", - __entry->xid, __entry->inode, - show_nfsd_may_flags(__entry->may_flags), - __entry->nf_ref, show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file, __entry->status - ) + TP_printk("xid=0x%x hash=0x%x inode=0x%p may_flags=%s ref=%d nf_flags=%s nf_may=%s nf_file=0x%p status=%u", + __entry->xid, __entry->hash, __entry->inode, + show_nf_may(__entry->may_flags), __entry->nf_ref, + show_nf_flags(__entry->nf_flags), + show_nf_may(__entry->nf_may), __entry->nf_file, + __entry->status) ); -TRACE_EVENT(nfsd_file_insert_err, - TP_PROTO( - const struct svc_rqst *rqstp, - const struct inode *inode, - unsigned int may_flags, - long error - ), - TP_ARGS(rqstp, inode, may_flags, error), +DECLARE_EVENT_CLASS(nfsd_file_search_class, + TP_PROTO(struct inode *inode, unsigned int hash, int found), + TP_ARGS(inode, hash, found), TP_STRUCT__entry( - __field(u32, xid) - __field(const void *, inode) - __field(unsigned long, may_flags) - __field(long, error) - ), - TP_fast_assign( - __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->inode = inode; - __entry->may_flags = may_flags; - __entry->error = error; - ), - TP_printk("xid=0x%x inode=%p may_flags=%s error=%ld", - __entry->xid, __entry->inode, - show_nfsd_may_flags(__entry->may_flags), - __entry->error - ) -); - -TRACE_EVENT(nfsd_file_cons_err, - TP_PROTO( - const struct svc_rqst *rqstp, - const struct inode *inode, - unsigned int may_flags, - const struct nfsd_file *nf - ), - TP_ARGS(rqstp, inode, may_flags, nf), - TP_STRUCT__entry( - __field(u32, xid) - __field(const void *, inode) - __field(unsigned long, may_flags) - __field(unsigned int, nf_ref) - __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(const void *, nf_file) - ), - TP_fast_assign( - __entry->xid = be32_to_cpu(rqstp->rq_xid); - __entry->inode = inode; - __entry->may_flags = may_flags; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_file = nf->nf_file; - ), - TP_printk("xid=0x%x inode=%p may_flags=%s ref=%u nf_flags=%s nf_may=%s nf_file=%p", - __entry->xid, __entry->inode, - show_nfsd_may_flags(__entry->may_flags), __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), __entry->nf_file - ) -); - -DECLARE_EVENT_CLASS(nfsd_file_open_class, - TP_PROTO(const struct nfsd_file *nf, __be32 status), - TP_ARGS(nf, status), - TP_STRUCT__entry( - __field(void *, nf_inode) /* cannot be dereferenced */ - __field(int, nf_ref) - __field(unsigned long, nf_flags) - __field(unsigned long, nf_may) - __field(void *, nf_file) /* cannot be dereferenced */ - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->nf_flags = nf->nf_flags; - __entry->nf_may = nf->nf_may; - __entry->nf_file = nf->nf_file; - ), - TP_printk("inode=%p ref=%d flags=%s may=%s file=%p", - __entry->nf_inode, - __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - show_nfsd_may_flags(__entry->nf_may), - __entry->nf_file) -) - -#define DEFINE_NFSD_FILE_OPEN_EVENT(name) \ -DEFINE_EVENT(nfsd_file_open_class, name, \ - TP_PROTO( \ - const struct nfsd_file *nf, \ - __be32 status \ - ), \ - TP_ARGS(nf, status)) - -DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_open); -DEFINE_NFSD_FILE_OPEN_EVENT(nfsd_file_opened); - -TRACE_EVENT(nfsd_file_is_cached, - TP_PROTO( - const struct inode *inode, - int found - ), - TP_ARGS(inode, found), - TP_STRUCT__entry( - __field(const struct inode *, inode) + __field(struct inode *, inode) + __field(unsigned int, hash) __field(int, found) ), TP_fast_assign( __entry->inode = inode; + __entry->hash = hash; __entry->found = found; ), - TP_printk("inode=%p is %scached", - __entry->inode, - __entry->found ? "" : "not " - ) + TP_printk("hash=0x%x inode=0x%p found=%d", __entry->hash, + __entry->inode, __entry->found) ); +#define DEFINE_NFSD_FILE_SEARCH_EVENT(name) \ +DEFINE_EVENT(nfsd_file_search_class, name, \ + TP_PROTO(struct inode *inode, unsigned int hash, int found), \ + TP_ARGS(inode, hash, found)) + +DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode_sync); +DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_close_inode); +DEFINE_NFSD_FILE_SEARCH_EVENT(nfsd_file_is_cached); + TRACE_EVENT(nfsd_file_fsnotify_handle_event, TP_PROTO(struct inode *inode, u32 mask), TP_ARGS(inode, mask), @@ -1023,95 +532,10 @@ TRACE_EVENT(nfsd_file_fsnotify_handle_event, __entry->mode = inode->i_mode; __entry->mask = mask; ), - TP_printk("inode=%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, + TP_printk("inode=0x%p nlink=%u mode=0%ho mask=0x%x", __entry->inode, __entry->nlink, __entry->mode, __entry->mask) ); -DECLARE_EVENT_CLASS(nfsd_file_gc_class, - TP_PROTO( - const struct nfsd_file *nf - ), - TP_ARGS(nf), - TP_STRUCT__entry( - __field(void *, nf_inode) - __field(void *, nf_file) - __field(int, nf_ref) - __field(unsigned long, nf_flags) - ), - TP_fast_assign( - __entry->nf_inode = nf->nf_inode; - __entry->nf_file = nf->nf_file; - __entry->nf_ref = refcount_read(&nf->nf_ref); - __entry->nf_flags = nf->nf_flags; - ), - TP_printk("inode=%p ref=%d nf_flags=%s nf_file=%p", - __entry->nf_inode, __entry->nf_ref, - show_nf_flags(__entry->nf_flags), - __entry->nf_file - ) -); - -#define DEFINE_NFSD_FILE_GC_EVENT(name) \ -DEFINE_EVENT(nfsd_file_gc_class, name, \ - TP_PROTO( \ - const struct nfsd_file *nf \ - ), \ - TP_ARGS(nf)) - -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_add_disposed); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_lru_del_disposed); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_in_use); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_writeback); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_referenced); -DEFINE_NFSD_FILE_GC_EVENT(nfsd_file_gc_disposed); - -DECLARE_EVENT_CLASS(nfsd_file_lruwalk_class, - TP_PROTO( - unsigned long removed, - unsigned long remaining - ), - TP_ARGS(removed, remaining), - TP_STRUCT__entry( - __field(unsigned long, removed) - __field(unsigned long, remaining) - ), - TP_fast_assign( - __entry->removed = removed; - __entry->remaining = remaining; - ), - TP_printk("%lu entries removed, %lu remaining", - __entry->removed, __entry->remaining) -); - -#define DEFINE_NFSD_FILE_LRUWALK_EVENT(name) \ -DEFINE_EVENT(nfsd_file_lruwalk_class, name, \ - TP_PROTO( \ - unsigned long removed, \ - unsigned long remaining \ - ), \ - TP_ARGS(removed, remaining)) - -DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_gc_removed); -DEFINE_NFSD_FILE_LRUWALK_EVENT(nfsd_file_shrinker_removed); - -TRACE_EVENT(nfsd_file_close, - TP_PROTO( - const struct inode *inode - ), - TP_ARGS(inode), - TP_STRUCT__entry( - __field(const void *, inode) - ), - TP_fast_assign( - __entry->inode = inode; - ), - TP_printk("inode=%p", - __entry->inode - ) -); - #include "cache.h" TRACE_DEFINE_ENUM(RC_DROPIT); @@ -1192,9 +616,9 @@ TRACE_EVENT(nfsd_cb_args, memcpy(__entry->addr, &conn->cb_addr, sizeof(struct sockaddr_in6)); ), - TP_printk("addr=%pISpc client %08x:%08x prog=%u ident=%u", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->prog, __entry->ident) + TP_printk("client %08x:%08x callback addr=%pISpc prog=%u ident=%u", + __entry->cl_boot, __entry->cl_id, + __entry->addr, __entry->prog, __entry->ident) ); TRACE_EVENT(nfsd_cb_nodelegs, @@ -1211,6 +635,11 @@ TRACE_EVENT(nfsd_cb_nodelegs, TP_printk("client %08x:%08x", __entry->cl_boot, __entry->cl_id) ) +TRACE_DEFINE_ENUM(NFSD4_CB_UP); +TRACE_DEFINE_ENUM(NFSD4_CB_UNKNOWN); +TRACE_DEFINE_ENUM(NFSD4_CB_DOWN); +TRACE_DEFINE_ENUM(NFSD4_CB_FAULT); + #define show_cb_state(val) \ __print_symbolic(val, \ { NFSD4_CB_UP, "UP" }, \ @@ -1244,53 +673,10 @@ DEFINE_EVENT(nfsd_cb_class, nfsd_cb_##name, \ TP_PROTO(const struct nfs4_client *clp), \ TP_ARGS(clp)) +DEFINE_NFSD_CB_EVENT(setup); DEFINE_NFSD_CB_EVENT(state); -DEFINE_NFSD_CB_EVENT(probe); -DEFINE_NFSD_CB_EVENT(lost); DEFINE_NFSD_CB_EVENT(shutdown); -TRACE_DEFINE_ENUM(RPC_AUTH_NULL); -TRACE_DEFINE_ENUM(RPC_AUTH_UNIX); -TRACE_DEFINE_ENUM(RPC_AUTH_GSS); -TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5); -TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5I); -TRACE_DEFINE_ENUM(RPC_AUTH_GSS_KRB5P); - -#define show_nfsd_authflavor(val) \ - __print_symbolic(val, \ - { RPC_AUTH_NULL, "none" }, \ - { RPC_AUTH_UNIX, "sys" }, \ - { RPC_AUTH_GSS, "gss" }, \ - { RPC_AUTH_GSS_KRB5, "krb5" }, \ - { RPC_AUTH_GSS_KRB5I, "krb5i" }, \ - { RPC_AUTH_GSS_KRB5P, "krb5p" }) - -TRACE_EVENT(nfsd_cb_setup, - TP_PROTO(const struct nfs4_client *clp, - const char *netid, - rpc_authflavor_t authflavor - ), - TP_ARGS(clp, netid, authflavor), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(unsigned long, authflavor) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - __array(unsigned char, netid, 8) - ), - TP_fast_assign( - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - strlcpy(__entry->netid, netid, sizeof(__entry->netid)); - __entry->authflavor = authflavor; - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x proto=%s flavor=%s", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->netid, show_nfsd_authflavor(__entry->authflavor)) -); - TRACE_EVENT(nfsd_cb_setup_err, TP_PROTO( const struct nfs4_client *clp, @@ -1314,138 +700,54 @@ TRACE_EVENT(nfsd_cb_setup_err, __entry->addr, __entry->cl_boot, __entry->cl_id, __entry->error) ); -TRACE_EVENT(nfsd_cb_recall, - TP_PROTO( - const struct nfs4_stid *stid - ), - TP_ARGS(stid), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(u32, si_id) - __field(u32, si_generation) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - const stateid_t *stp = &stid->sc_stateid; - const struct nfs4_client *clp = stid->sc_client; - - __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; - __entry->cl_id = stp->si_opaque.so_clid.cl_id; - __entry->si_id = stp->si_opaque.so_id; - __entry->si_generation = stp->si_generation; - if (clp) - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - else - memset(__entry->addr, 0, sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->si_id, __entry->si_generation) -); - -TRACE_EVENT(nfsd_cb_notify_lock, - TP_PROTO( - const struct nfs4_lockowner *lo, - const struct nfsd4_blocked_lock *nbl - ), - TP_ARGS(lo, nbl), - TP_STRUCT__entry( - __field(u32, cl_boot) - __field(u32, cl_id) - __field(u32, fh_hash) - __array(unsigned char, addr, sizeof(struct sockaddr_in6)) - ), - TP_fast_assign( - const struct nfs4_client *clp = lo->lo_owner.so_client; - - __entry->cl_boot = clp->cl_clientid.cl_boot; - __entry->cl_id = clp->cl_clientid.cl_id; - __entry->fh_hash = knfsd_fh_hash(&nbl->nbl_fh); - memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, - sizeof(struct sockaddr_in6)); - ), - TP_printk("addr=%pISpc client %08x:%08x fh_hash=0x%08x", - __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->fh_hash) -); - -TRACE_EVENT(nfsd_cb_offload, +TRACE_EVENT(nfsd_cb_work, TP_PROTO( const struct nfs4_client *clp, - const stateid_t *stp, - const struct knfsd_fh *fh, - u64 count, - __be32 status + const char *procedure ), - TP_ARGS(clp, stp, fh, count, status), + TP_ARGS(clp, procedure), TP_STRUCT__entry( __field(u32, cl_boot) __field(u32, cl_id) - __field(u32, si_id) - __field(u32, si_generation) - __field(u32, fh_hash) - __field(int, status) - __field(u64, count) + __string(procedure, procedure) __array(unsigned char, addr, sizeof(struct sockaddr_in6)) ), TP_fast_assign( - __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; - __entry->cl_id = stp->si_opaque.so_clid.cl_id; - __entry->si_id = stp->si_opaque.so_id; - __entry->si_generation = stp->si_generation; - __entry->fh_hash = knfsd_fh_hash(fh); - __entry->status = be32_to_cpu(status); - __entry->count = count; + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __assign_str(procedure, procedure) memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, sizeof(struct sockaddr_in6)); ), - TP_printk("addr=%pISpc client %08x:%08x stateid %08x:%08x fh_hash=0x%08x count=%llu status=%d", + TP_printk("addr=%pISpc client %08x:%08x procedure=%s", __entry->addr, __entry->cl_boot, __entry->cl_id, - __entry->si_id, __entry->si_generation, - __entry->fh_hash, __entry->count, __entry->status) + __get_str(procedure)) ); -DECLARE_EVENT_CLASS(nfsd_cb_done_class, +TRACE_EVENT(nfsd_cb_done, TP_PROTO( - const stateid_t *stp, - const struct rpc_task *task + const struct nfs4_client *clp, + int status ), - TP_ARGS(stp, task), + TP_ARGS(clp, status), TP_STRUCT__entry( __field(u32, cl_boot) __field(u32, cl_id) - __field(u32, si_id) - __field(u32, si_generation) __field(int, status) + __array(unsigned char, addr, sizeof(struct sockaddr_in6)) ), TP_fast_assign( - __entry->cl_boot = stp->si_opaque.so_clid.cl_boot; - __entry->cl_id = stp->si_opaque.so_clid.cl_id; - __entry->si_id = stp->si_opaque.so_id; - __entry->si_generation = stp->si_generation; - __entry->status = task->tk_status; + __entry->cl_boot = clp->cl_clientid.cl_boot; + __entry->cl_id = clp->cl_clientid.cl_id; + __entry->status = status; + memcpy(__entry->addr, &clp->cl_cb_conn.cb_addr, + sizeof(struct sockaddr_in6)); ), - TP_printk("client %08x:%08x stateid %08x:%08x status=%d", - __entry->cl_boot, __entry->cl_id, __entry->si_id, - __entry->si_generation, __entry->status - ) + TP_printk("addr=%pISpc client %08x:%08x status=%d", + __entry->addr, __entry->cl_boot, __entry->cl_id, + __entry->status) ); -#define DEFINE_NFSD_CB_DONE_EVENT(name) \ -DEFINE_EVENT(nfsd_cb_done_class, name, \ - TP_PROTO( \ - const stateid_t *stp, \ - const struct rpc_task *task \ - ), \ - TP_ARGS(stp, task)) - -DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_recall_done); -DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_notify_lock_done); -DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_layout_done); -DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_offload_done); - #endif /* _NFSD_TRACE_H */ #undef TRACE_INCLUDE_PATH diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 0ea05ddff0d0..31edb883afd0 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -32,13 +32,14 @@ #include #include +#ifdef CONFIG_NFSD_V3 #include "xdr3.h" +#endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 #include "../internal.h" #include "acl.h" #include "idmap.h" -#include "xdr4.h" #endif /* CONFIG_NFSD_V4 */ #include "nfsd.h" @@ -48,69 +49,6 @@ #define NFSDDBG_FACILITY NFSDDBG_FILEOP -/** - * nfserrno - Map Linux errnos to NFS errnos - * @errno: POSIX(-ish) error code to be mapped - * - * Returns the appropriate (net-endian) nfserr_* (or nfs_ok if errno is 0). If - * it's an error we don't expect, log it once and return nfserr_io. - */ -__be32 -nfserrno (int errno) -{ - static struct { - __be32 nfserr; - int syserr; - } nfs_errtbl[] = { - { nfs_ok, 0 }, - { nfserr_perm, -EPERM }, - { nfserr_noent, -ENOENT }, - { nfserr_io, -EIO }, - { nfserr_nxio, -ENXIO }, - { nfserr_fbig, -E2BIG }, - { nfserr_stale, -EBADF }, - { nfserr_acces, -EACCES }, - { nfserr_exist, -EEXIST }, - { nfserr_xdev, -EXDEV }, - { nfserr_mlink, -EMLINK }, - { nfserr_nodev, -ENODEV }, - { nfserr_notdir, -ENOTDIR }, - { nfserr_isdir, -EISDIR }, - { nfserr_inval, -EINVAL }, - { nfserr_fbig, -EFBIG }, - { nfserr_nospc, -ENOSPC }, - { nfserr_rofs, -EROFS }, - { nfserr_mlink, -EMLINK }, - { nfserr_nametoolong, -ENAMETOOLONG }, - { nfserr_notempty, -ENOTEMPTY }, - { nfserr_dquot, -EDQUOT }, - { nfserr_stale, -ESTALE }, - { nfserr_jukebox, -ETIMEDOUT }, - { nfserr_jukebox, -ERESTARTSYS }, - { nfserr_jukebox, -EAGAIN }, - { nfserr_jukebox, -EWOULDBLOCK }, - { nfserr_jukebox, -ENOMEM }, - { nfserr_io, -ETXTBSY }, - { nfserr_notsupp, -EOPNOTSUPP }, - { nfserr_toosmall, -ETOOSMALL }, - { nfserr_serverfault, -ESERVERFAULT }, - { nfserr_serverfault, -ENFILE }, - { nfserr_io, -EREMOTEIO }, - { nfserr_stale, -EOPENSTALE }, - { nfserr_io, -EUCLEAN }, - { nfserr_perm, -ENOKEY }, - { nfserr_no_grace, -ENOGRACE}, - }; - int i; - - for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) { - if (nfs_errtbl[i].syserr == errno) - return nfs_errtbl[i].nfserr; - } - WARN_ONCE(1, "nfsd: non-standard errno: %d\n", errno); - return nfserr_io; -} - /* * Called from nfsd_lookup and encode_dirent. Check if we have crossed * a mount point. @@ -261,13 +199,27 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out_nfserr; } } else { - dentry = lookup_one_len_unlocked(name, dparent, len); + /* + * In the nfsd4_open() case, this may be held across + * subsequent open and delegation acquisition which may + * need to take the child's i_mutex: + */ + fh_lock_nested(fhp, I_MUTEX_PARENT); + dentry = lookup_one_len(name, dparent, len); host_err = PTR_ERR(dentry); if (IS_ERR(dentry)) goto out_nfserr; if (nfsd_mountpoint(dentry, exp)) { - host_err = nfsd_cross_mnt(rqstp, &dentry, &exp); - if (host_err) { + /* + * We don't need the i_mutex after all. It's + * still possible we could open this (regular + * files can be mountpoints too), but the + * i_mutex is just there to prevent renames of + * something that we might be about to delegate, + * and a mountpoint won't be renamed: + */ + fh_unlock(fhp); + if ((host_err = nfsd_cross_mnt(rqstp, &dentry, &exp))) { dput(dentry); goto out_nfserr; } @@ -282,15 +234,7 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); } -/** - * nfsd_lookup - look up a single path component for nfsd - * - * @rqstp: the request context - * @fhp: the file handle of the directory - * @name: the component name, or %NULL to look up parent - * @len: length of name to examine - * @resfh: pointer to pre-initialised filehandle to hold result. - * +/* * Look up one component of a pathname. * N.B. After this call _both_ fhp and resfh need an fh_put * @@ -300,11 +244,11 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp, * returned. Otherwise the covered directory is returned. * NOTE: this mountpoint crossing is not supported properly by all * clients and is explicitly disallowed for NFSv3 - * + * NeilBrown */ __be32 nfsd_lookup(struct svc_rqst *rqstp, struct svc_fh *fhp, const char *name, - unsigned int len, struct svc_fh *resfh) + unsigned int len, struct svc_fh *resfh) { struct svc_export *exp; struct dentry *dentry; @@ -362,10 +306,6 @@ commit_metadata(struct svc_fh *fhp) static void nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap) { - /* Ignore mode updates on symlinks */ - if (S_ISLNK(inode->i_mode)) - iap->ia_valid &= ~ATTR_MODE; - /* sanitize the mode change */ if (iap->ia_valid & ATTR_MODE) { iap->ia_mode &= S_IALLUGO; @@ -419,77 +359,21 @@ nfsd_get_write_access(struct svc_rqst *rqstp, struct svc_fh *fhp, return nfserrno(host_err); } -static int __nfsd_setattr(struct dentry *dentry, struct iattr *iap) -{ - int host_err; - - if (iap->ia_valid & ATTR_SIZE) { - /* - * RFC5661, Section 18.30.4: - * Changing the size of a file with SETATTR indirectly - * changes the time_modify and change attributes. - * - * (and similar for the older RFCs) - */ - struct iattr size_attr = { - .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, - .ia_size = iap->ia_size, - }; - - if (iap->ia_size < 0) - return -EFBIG; - - host_err = notify_change(dentry, &size_attr, NULL); - if (host_err) - return host_err; - iap->ia_valid &= ~ATTR_SIZE; - - /* - * Avoid the additional setattr call below if the only other - * attribute that the client sends is the mtime, as we update - * it as part of the size change above. - */ - if ((iap->ia_valid & ~ATTR_MTIME) == 0) - return 0; - } - - if (!iap->ia_valid) - return 0; - - iap->ia_valid |= ATTR_CTIME; - return notify_change(dentry, iap, NULL); -} - -/** - * nfsd_setattr - Set various file attributes. - * @rqstp: controlling RPC transaction - * @fhp: filehandle of target - * @attr: attributes to set - * @check_guard: set to 1 if guardtime is a valid timestamp - * @guardtime: do not act if ctime.tv_sec does not match this timestamp - * - * This call may adjust the contents of @attr (in particular, this - * call may change the bits in the na_iattr.ia_valid field). - * - * Returns nfs_ok on success, otherwise an NFS status code is - * returned. Caller must release @fhp by calling fh_put in either - * case. +/* + * Set various file attributes. After this call fhp needs an fh_put. */ __be32 -nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfsd_attrs *attr, +nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap, int check_guard, time64_t guardtime) { struct dentry *dentry; struct inode *inode; - struct iattr *iap = attr->na_iattr; int accmode = NFSD_MAY_SATTR; umode_t ftype = 0; __be32 err; - int host_err = 0; + int host_err; bool get_write_count; bool size_change = (iap->ia_valid & ATTR_SIZE); - int retries; if (iap->ia_valid & ATTR_SIZE) { accmode |= NFSD_MAY_WRITE|NFSD_MAY_OWNER_OVERRIDE; @@ -525,6 +409,13 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, dentry = fhp->fh_dentry; inode = d_inode(dentry); + /* Ignore any mode updates on symlinks */ + if (S_ISLNK(inode->i_mode)) + iap->ia_valid &= ~ATTR_MODE; + + if (!iap->ia_valid) + return 0; + nfsd_sanitize_attrs(inode, iap); if (check_guard && guardtime != inode->i_ctime.tv_sec) @@ -543,41 +434,45 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, return err; } - inode_lock(inode); - fh_fill_pre_attrs(fhp); - for (retries = 1;;) { - struct iattr attrs; + fh_lock(fhp); + if (size_change) { + /* + * RFC5661, Section 18.30.4: + * Changing the size of a file with SETATTR indirectly + * changes the time_modify and change attributes. + * + * (and similar for the older RFCs) + */ + struct iattr size_attr = { + .ia_valid = ATTR_SIZE | ATTR_CTIME | ATTR_MTIME, + .ia_size = iap->ia_size, + }; + + host_err = notify_change(dentry, &size_attr, NULL); + if (host_err) + goto out_unlock; + iap->ia_valid &= ~ATTR_SIZE; /* - * notify_change() can alter its iattr argument, making - * @iap unsuitable for submission multiple times. Make a - * copy for every loop iteration. + * Avoid the additional setattr call below if the only other + * attribute that the client sends is the mtime, as we update + * it as part of the size change above. */ - attrs = *iap; - host_err = __nfsd_setattr(dentry, &attrs); - if (host_err != -EAGAIN || !retries--) - break; - if (!nfsd_wait_for_delegreturn(rqstp, inode)) - break; + if ((iap->ia_valid & ~ATTR_MTIME) == 0) + goto out_unlock; } - if (attr->na_seclabel && attr->na_seclabel->len) - attr->na_labelerr = security_inode_setsecctx(dentry, - attr->na_seclabel->data, attr->na_seclabel->len); - if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && attr->na_pacl) - attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_ACCESS, - attr->na_pacl); - if (IS_ENABLED(CONFIG_FS_POSIX_ACL) && - !attr->na_aclerr && attr->na_dpacl && S_ISDIR(inode->i_mode)) - attr->na_aclerr = set_posix_acl(inode, ACL_TYPE_DEFAULT, - attr->na_dpacl); - fh_fill_post_attrs(fhp); - inode_unlock(inode); + + iap->ia_valid |= ATTR_CTIME; + host_err = notify_change(dentry, iap, NULL); + +out_unlock: + fh_unlock(fhp); if (size_change) put_write_access(inode); out: if (!host_err) host_err = commit_metadata(fhp); - return err != 0 ? err : nfserrno(host_err); + return nfserrno(host_err); } #if defined(CONFIG_NFSD_V4) @@ -608,16 +503,35 @@ int nfsd4_is_junction(struct dentry *dentry) return 0; return 1; } - -static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp) +#ifdef CONFIG_NFSD_V4_SECURITY_LABEL +__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct xdr_netobj *label) { - return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate; -} + __be32 error; + int host_error; + struct dentry *dentry; -__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, - struct nfsd_file *nf_src, u64 src_pos, - struct nfsd_file *nf_dst, u64 dst_pos, - u64 count, bool sync) + error = fh_verify(rqstp, fhp, 0 /* S_IFREG */, NFSD_MAY_SATTR); + if (error) + return error; + + dentry = fhp->fh_dentry; + + inode_lock(d_inode(dentry)); + host_error = security_inode_setsecctx(dentry, label->data, label->len); + inode_unlock(d_inode(dentry)); + return nfserrno(host_error); +} +#else +__be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp, + struct xdr_netobj *label) +{ + return nfserr_notsupp; +} +#endif + +__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, + struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync) { struct file *src = nf_src->nf_file; struct file *dst = nf_dst->nf_file; @@ -644,17 +558,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, if (!status) status = commit_inode_metadata(file_inode(src)); if (status < 0) { - struct nfsd_net *nn = net_generic(nf_dst->nf_net, - nfsd_net_id); - - trace_nfsd_clone_file_range_err(rqstp, - &nfsd4_get_cstate(rqstp)->save_fh, - src_pos, - &nfsd4_get_cstate(rqstp)->current_fh, - dst_pos, - count, status); - nfsd_reset_write_verifier(nn); - trace_nfsd_writeverf_reset(nn, rqstp, status); + nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net, + nfsd_net_id)); ret = nfserrno(status); } } @@ -701,6 +606,7 @@ __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp, } #endif /* defined(CONFIG_NFSD_V4) */ +#ifdef CONFIG_NFSD_V3 /* * Check server access rights to a file system object */ @@ -812,6 +718,7 @@ nfsd_access(struct svc_rqst *rqstp, struct svc_fh *fhp, u32 *access, u32 *suppor out: return error; } +#endif /* CONFIG_NFSD_V3 */ int nfsd_open_break_lease(struct inode *inode, int access) { @@ -844,6 +751,9 @@ __nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, path.dentry = fhp->fh_dentry; inode = d_inode(path.dentry); + /* Disallow write access to files with the append-only bit set + * or any access when mandatory locking enabled + */ err = nfserr_perm; if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE)) goto out; @@ -898,7 +808,6 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int may_flags, struct file **filp) { __be32 err; - bool retried = false; validate_process_creds(); /* @@ -914,37 +823,21 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, */ if (type == S_IFREG) may_flags |= NFSD_MAY_OWNER_OVERRIDE; -retry: err = fh_verify(rqstp, fhp, type, may_flags); - if (!err) { + if (!err) err = __nfsd_open(rqstp, fhp, type, may_flags, filp); - if (err == nfserr_stale && !retried) { - retried = true; - fh_put(fhp); - goto retry; - } - } validate_process_creds(); return err; } -/** - * nfsd_open_verified - Open a regular file for the filecache - * @rqstp: RPC request - * @fhp: NFS filehandle of the file to open - * @may_flags: internal permission flags - * @filp: OUT: open "struct file *" - * - * Returns an nfsstat value in network byte order. - */ __be32 -nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, int may_flags, - struct file **filp) +nfsd_open_verified(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, + int may_flags, struct file **filp) { __be32 err; validate_process_creds(); - err = __nfsd_open(rqstp, fhp, S_IFREG, may_flags, filp); + err = __nfsd_open(rqstp, fhp, type, may_flags, filp); validate_process_creds(); return err; } @@ -959,24 +852,28 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd) { struct svc_rqst *rqstp = sd->u.data; - struct page *page = buf->page; // may be a compound one - unsigned offset = buf->offset; - struct page *last_page; + struct page **pp = rqstp->rq_next_page; + struct page *page = buf->page; + size_t size; - last_page = page + (offset + sd->len - 1) / PAGE_SIZE; - for (page += offset / PAGE_SIZE; page <= last_page; page++) { - /* - * Skip page replacement when extending the contents - * of the current page. - */ - if (page == *(rqstp->rq_next_page - 1)) - continue; - svc_rqst_replace_page(rqstp, page); - } - if (rqstp->rq_res.page_len == 0) // first call - rqstp->rq_res.page_base = offset % PAGE_SIZE; - rqstp->rq_res.page_len += sd->len; - return sd->len; + size = sd->len; + + if (rqstp->rq_res.page_len == 0) { + get_page(page); + put_page(*rqstp->rq_next_page); + *(rqstp->rq_next_page++) = page; + rqstp->rq_res.page_base = buf->offset; + rqstp->rq_res.page_len = size; + } else if (page != pp[-1]) { + get_page(page); + if (*rqstp->rq_next_page) + put_page(*rqstp->rq_next_page); + *(rqstp->rq_next_page++) = page; + rqstp->rq_res.page_len += size; + } else + rqstp->rq_res.page_len += size; + + return size; } static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe, @@ -1000,7 +897,7 @@ static __be32 nfsd_finish_read(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned long *count, u32 *eof, ssize_t host_err) { if (host_err >= 0) { - nfsd_stats_io_read_add(fhp->fh_export, host_err); + nfsdstats.io_read += host_err; *eof = nfsd_eof_on_read(file, offset, host_err, *count); *count = host_err; fsnotify_access(file); @@ -1088,9 +985,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, unsigned long *cnt, int stable, __be32 *verf) { - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); struct file *file = nf->nf_file; - struct super_block *sb = file_inode(file)->i_sb; struct svc_export *exp; struct iov_iter iter; errseq_t since; @@ -1098,18 +993,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, int host_err; int use_wgather; loff_t pos = offset; - unsigned long exp_op_flags = 0; unsigned int pflags = current->flags; rwf_t flags = 0; - bool restore_flags = false; trace_nfsd_write_opened(rqstp, fhp, offset, *cnt); - if (sb->s_export_op) - exp_op_flags = sb->s_export_op->flags; - - if (test_bit(RQ_LOCAL, &rqstp->rq_flags) && - !(exp_op_flags & EXPORT_OP_REMOTE_FS)) { + if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) /* * We want throttling in balance_dirty_pages() * and shrink_inactive_list() to only consider @@ -1118,8 +1007,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, * the client's dirty pages or its congested queue. */ current->flags |= PF_LOCAL_THROTTLE; - restore_flags = true; - } exp = fhp->fh_export; use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp); @@ -1132,18 +1019,29 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt); since = READ_ONCE(file->f_wb_err); - if (verf) - nfsd_copy_write_verifier(verf, nn); - file_start_write(file); - host_err = vfs_iter_write(file, &iter, &pos, flags); - file_end_write(file); + if (flags & RWF_SYNC) { + if (verf) + nfsd_copy_boot_verifier(verf, + net_generic(SVC_NET(rqstp), + nfsd_net_id)); + host_err = vfs_iter_write(file, &iter, &pos, flags); + if (host_err < 0) + nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), + nfsd_net_id)); + } else { + if (verf) + nfsd_copy_boot_verifier(verf, + net_generic(SVC_NET(rqstp), + nfsd_net_id)); + host_err = vfs_iter_write(file, &iter, &pos, flags); + } if (host_err < 0) { - nfsd_reset_write_verifier(nn); - trace_nfsd_writeverf_reset(nn, rqstp, host_err); + nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), + nfsd_net_id)); goto out_nfserr; } *cnt = host_err; - nfsd_stats_io_write_add(exp, *cnt); + nfsdstats.io_write += *cnt; fsnotify_modify(file); host_err = filemap_check_wb_err(file->f_mapping, since); if (host_err < 0) @@ -1151,10 +1049,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, if (stable && use_wgather) { host_err = wait_for_concurrent_writes(file); - if (host_err < 0) { - nfsd_reset_write_verifier(nn); - trace_nfsd_writeverf_reset(nn, rqstp, host_err); - } + if (host_err < 0) + nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp), + nfsd_net_id)); } out_nfserr: @@ -1165,7 +1062,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, trace_nfsd_write_err(rqstp, fhp, offset, host_err); nfserr = nfserrno(host_err); } - if (restore_flags) + if (test_bit(RQ_LOCAL, &rqstp->rq_flags)) current_restore_flags(pflags, PF_LOCAL_THROTTLE); return nfserr; } @@ -1184,7 +1081,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 err; trace_nfsd_read_start(rqstp, fhp, offset, *count); - err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf); + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf); if (err) return err; @@ -1216,7 +1113,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, trace_nfsd_write_start(rqstp, fhp, offset, *cnt); - err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf); + err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf); if (err) goto out; @@ -1228,59 +1125,45 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset, return err; } -/** - * nfsd_commit - Commit pending writes to stable storage - * @rqstp: RPC request being processed - * @fhp: NFS filehandle - * @nf: target file - * @offset: raw offset from beginning of file - * @count: raw count of bytes to sync - * @verf: filled in with the server's current write verifier +#ifdef CONFIG_NFSD_V3 +/* + * Commit all pending writes to stable storage. * - * Note: we guarantee that data that lies within the range specified - * by the 'offset' and 'count' parameters will be synced. The server - * is permitted to sync data that lies outside this range at the - * same time. + * Note: we only guarantee that data that lies within the range specified + * by the 'offset' and 'count' parameters will be synced. * * Unfortunately we cannot lock the file to make sure we return full WCC * data to the client, as locking happens lower down in the filesystem. - * - * Return values: - * An nfsstat value in network byte order. */ __be32 -nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, - u64 offset, u32 count, __be32 *verf) +nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, + loff_t offset, unsigned long count, __be32 *verf) { - __be32 err = nfs_ok; - u64 maxbytes; - loff_t start, end; - struct nfsd_net *nn; + struct nfsd_file *nf; + loff_t end = LLONG_MAX; + __be32 err = nfserr_inval; - /* - * Convert the client-provided (offset, count) range to a - * (start, end) range. If the client-provided range falls - * outside the maximum file size of the underlying FS, - * clamp the sync range appropriately. - */ - start = 0; - end = LLONG_MAX; - maxbytes = (u64)fhp->fh_dentry->d_sb->s_maxbytes; - if (offset < maxbytes) { - start = offset; - if (count && (offset + count - 1 < maxbytes)) - end = offset + count - 1; + if (offset < 0) + goto out; + if (count != 0) { + end = offset + (loff_t)count - 1; + if (end < offset) + goto out; } - nn = net_generic(nf->nf_net, nfsd_net_id); + err = nfsd_file_acquire(rqstp, fhp, + NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf); + if (err) + goto out; if (EX_ISSYNC(fhp->fh_export)) { errseq_t since = READ_ONCE(nf->nf_file->f_wb_err); int err2; - err2 = vfs_fsync_range(nf->nf_file, start, end, 0); + err2 = vfs_fsync_range(nf->nf_file, offset, end, 0); switch (err2) { case 0: - nfsd_copy_write_verifier(verf, nn); + nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, + nfsd_net_id)); err2 = filemap_check_wb_err(nf->nf_file->f_mapping, since); err = nfserrno(err2); @@ -1289,37 +1172,28 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf, err = nfserr_notsupp; break; default: - nfsd_reset_write_verifier(nn); - trace_nfsd_writeverf_reset(nn, rqstp, err2); + nfsd_reset_boot_verifier(net_generic(nf->nf_net, + nfsd_net_id)); err = nfserrno(err2); } } else - nfsd_copy_write_verifier(verf, nn); + nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net, + nfsd_net_id)); + nfsd_file_put(nf); +out: return err; } +#endif /* CONFIG_NFSD_V3 */ -/** - * nfsd_create_setattr - Set a created file's attributes - * @rqstp: RPC transaction being executed - * @fhp: NFS filehandle of parent directory - * @resfhp: NFS filehandle of new object - * @attrs: requested attributes of new object - * - * Returns nfs_ok on success, or an nfsstat in network byte order. - */ -__be32 -nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct nfsd_attrs *attrs) +static __be32 +nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *resfhp, + struct iattr *iap) { - struct iattr *iap = attrs->na_iattr; - __be32 status; - /* - * Mode has already been set by file creation. + * Mode has already been set earlier in create: */ iap->ia_valid &= ~ATTR_MODE; - /* * Setting uid/gid works only for root. Irix appears to * send along the gid on create when it tries to implement @@ -1327,31 +1201,10 @@ nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, */ if (!uid_eq(current_fsuid(), GLOBAL_ROOT_UID)) iap->ia_valid &= ~(ATTR_UID|ATTR_GID); - - /* - * Callers expect new file metadata to be committed even - * if the attributes have not changed. - */ if (iap->ia_valid) - status = nfsd_setattr(rqstp, resfhp, attrs, 0, (time64_t)0); - else - status = nfserrno(commit_metadata(resfhp)); - - /* - * Transactional filesystems had a chance to commit changes - * for both parent and child simultaneously making the - * following commit_metadata a noop in many cases. - */ - if (!status) - status = nfserrno(commit_metadata(fhp)); - - /* - * Update the new filehandle to pick up the new attributes. - */ - if (!status) - status = fh_update(resfhp); - - return status; + return nfsd_setattr(rqstp, resfhp, iap, 0, (time64_t)0); + /* Callers expect file metadata to be committed here */ + return nfserrno(commit_metadata(resfhp)); } /* HPUX client sometimes creates a file in mode 000, and sets size to 0. @@ -1372,19 +1225,26 @@ nfsd_check_ignore_resizing(struct iattr *iap) /* The parent directory should already be locked: */ __be32 nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct nfsd_attrs *attrs, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct iattr *iap, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild; struct inode *dirp; - struct iattr *iap = attrs->na_iattr; __be32 err; + __be32 err2; int host_err; dentry = fhp->fh_dentry; dirp = d_inode(dentry); dchild = dget(resfhp->fh_dentry); + if (!fhp->fh_locked) { + WARN_ONCE(1, "nfsd_create: parent %pd2 not locked!\n", + dentry); + err = nfserr_io; + goto out; + } + err = nfsd_permission(rqstp, fhp->fh_export, dentry, NFSD_MAY_CREATE); if (err) goto out; @@ -1397,6 +1257,7 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, iap->ia_mode &= ~current_umask(); err = 0; + host_err = 0; switch (type) { case S_IFREG: host_err = vfs_create(dirp, dchild, iap->ia_mode, true); @@ -1442,8 +1303,22 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err < 0) goto out_nfserr; - err = nfsd_create_setattr(rqstp, fhp, resfhp, attrs); + err = nfsd_create_setattr(rqstp, resfhp, iap); + /* + * nfsd_create_setattr already committed the child. Transactional + * filesystems had a chance to commit changes for both parent and + * child simultaneously making the following commit_metadata a + * noop. + */ + err2 = nfserrno(commit_metadata(fhp)); + if (err2) + err = err2; + /* + * Update the file handle to get the new inode info. + */ + if (!err) + err = fh_update(resfhp); out: dput(dchild); return err; @@ -1461,8 +1336,8 @@ nfsd_create_locked(struct svc_rqst *rqstp, struct svc_fh *fhp, */ __be32 nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, struct nfsd_attrs *attrs, - int type, dev_t rdev, struct svc_fh *resfhp) + char *fname, int flen, struct iattr *iap, + int type, dev_t rdev, struct svc_fh *resfhp) { struct dentry *dentry, *dchild = NULL; __be32 err; @@ -1481,13 +1356,11 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, if (host_err) return nfserrno(host_err); - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); + fh_lock_nested(fhp, I_MUTEX_PARENT); dchild = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(dchild); - if (IS_ERR(dchild)) { - err = nfserrno(host_err); - goto out_unlock; - } + if (IS_ERR(dchild)) + return nfserrno(host_err); err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); /* * We unconditionally drop our ref to dchild as fh_compose will have @@ -1495,15 +1368,179 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, */ dput(dchild); if (err) - goto out_unlock; - fh_fill_pre_attrs(fhp); - err = nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp); - fh_fill_post_attrs(fhp); -out_unlock: - inode_unlock(dentry->d_inode); - return err; + return err; + return nfsd_create_locked(rqstp, fhp, fname, flen, iap, type, + rdev, resfhp); } +#ifdef CONFIG_NFSD_V3 + +/* + * NFSv3 and NFSv4 version of nfsd_create + */ +__be32 +do_nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fhp, + char *fname, int flen, struct iattr *iap, + struct svc_fh *resfhp, int createmode, u32 *verifier, + bool *truncp, bool *created) +{ + struct dentry *dentry, *dchild = NULL; + struct inode *dirp; + __be32 err; + int host_err; + __u32 v_mtime=0, v_atime=0; + + err = nfserr_perm; + if (!flen) + goto out; + err = nfserr_exist; + if (isdotent(fname, flen)) + goto out; + if (!(iap->ia_valid & ATTR_MODE)) + iap->ia_mode = 0; + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_EXEC); + if (err) + goto out; + + dentry = fhp->fh_dentry; + dirp = d_inode(dentry); + + host_err = fh_want_write(fhp); + if (host_err) + goto out_nfserr; + + fh_lock_nested(fhp, I_MUTEX_PARENT); + + /* + * Compose the response file handle. + */ + dchild = lookup_one_len(fname, dentry, flen); + host_err = PTR_ERR(dchild); + if (IS_ERR(dchild)) + goto out_nfserr; + + /* If file doesn't exist, check for permissions to create one */ + if (d_really_is_negative(dchild)) { + err = fh_verify(rqstp, fhp, S_IFDIR, NFSD_MAY_CREATE); + if (err) + goto out; + } + + err = fh_compose(resfhp, fhp->fh_export, dchild, fhp); + if (err) + goto out; + + if (nfsd_create_is_exclusive(createmode)) { + /* solaris7 gets confused (bugid 4218508) if these have + * the high bit set, so just clear the high bits. If this is + * ever changed to use different attrs for storing the + * verifier, then do_open_lookup() will also need to be fixed + * accordingly. + */ + v_mtime = verifier[0]&0x7fffffff; + v_atime = verifier[1]&0x7fffffff; + } + + if (d_really_is_positive(dchild)) { + err = 0; + + switch (createmode) { + case NFS3_CREATE_UNCHECKED: + if (! d_is_reg(dchild)) + goto out; + else if (truncp) { + /* in nfsv4, we need to treat this case a little + * differently. we don't want to truncate the + * file now; this would be wrong if the OPEN + * fails for some other reason. furthermore, + * if the size is nonzero, we should ignore it + * according to spec! + */ + *truncp = (iap->ia_valid & ATTR_SIZE) && !iap->ia_size; + } + else { + iap->ia_valid &= ATTR_SIZE; + goto set_attr; + } + break; + case NFS3_CREATE_EXCLUSIVE: + if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime + && d_inode(dchild)->i_atime.tv_sec == v_atime + && d_inode(dchild)->i_size == 0 ) { + if (created) + *created = true; + break; + } + fallthrough; + case NFS4_CREATE_EXCLUSIVE4_1: + if ( d_inode(dchild)->i_mtime.tv_sec == v_mtime + && d_inode(dchild)->i_atime.tv_sec == v_atime + && d_inode(dchild)->i_size == 0 ) { + if (created) + *created = true; + goto set_attr; + } + fallthrough; + case NFS3_CREATE_GUARDED: + err = nfserr_exist; + } + fh_drop_write(fhp); + goto out; + } + + if (!IS_POSIXACL(dirp)) + iap->ia_mode &= ~current_umask(); + + host_err = vfs_create(dirp, dchild, iap->ia_mode, true); + if (host_err < 0) { + fh_drop_write(fhp); + goto out_nfserr; + } + if (created) + *created = true; + + nfsd_check_ignore_resizing(iap); + + if (nfsd_create_is_exclusive(createmode)) { + /* Cram the verifier into atime/mtime */ + iap->ia_valid = ATTR_MTIME|ATTR_ATIME + | ATTR_MTIME_SET|ATTR_ATIME_SET; + /* XXX someone who knows this better please fix it for nsec */ + iap->ia_mtime.tv_sec = v_mtime; + iap->ia_atime.tv_sec = v_atime; + iap->ia_mtime.tv_nsec = 0; + iap->ia_atime.tv_nsec = 0; + } + + set_attr: + err = nfsd_create_setattr(rqstp, resfhp, iap); + + /* + * nfsd_create_setattr already committed the child + * (and possibly also the parent). + */ + if (!err) + err = nfserrno(commit_metadata(fhp)); + + /* + * Update the filehandle to get the new inode info. + */ + if (!err) + err = fh_update(resfhp); + + out: + fh_unlock(fhp); + if (dchild && !IS_ERR(dchild)) + dput(dchild); + fh_drop_write(fhp); + return err; + + out_nfserr: + err = nfserrno(host_err); + goto out; +} +#endif /* CONFIG_NFSD_V3 */ + /* * Read a symlink. On entry, *lenp must contain the maximum path length that * fits into the buffer. On return, it contains the true length. @@ -1542,25 +1579,15 @@ nfsd_readlink(struct svc_rqst *rqstp, struct svc_fh *fhp, char *buf, int *lenp) return 0; } -/** - * nfsd_symlink - Create a symlink and look up its inode - * @rqstp: RPC transaction being executed - * @fhp: NFS filehandle of parent directory - * @fname: filename of the new symlink - * @flen: length of @fname - * @path: content of the new symlink (NUL-terminated) - * @attrs: requested attributes of new object - * @resfhp: NFS filehandle of new object - * +/* + * Create a symlink and look up its inode * N.B. After this call _both_ fhp and resfhp need an fh_put - * - * Returns nfs_ok on success, or an nfsstat in network byte order. */ __be32 nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, - char *fname, int flen, - char *path, struct nfsd_attrs *attrs, - struct svc_fh *resfhp) + char *fname, int flen, + char *path, + struct svc_fh *resfhp) { struct dentry *dentry, *dnew; __be32 err, cerr; @@ -1578,35 +1605,33 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *fhp, goto out; host_err = fh_want_write(fhp); - if (host_err) { - err = nfserrno(host_err); - goto out; - } + if (host_err) + goto out_nfserr; + fh_lock(fhp); dentry = fhp->fh_dentry; - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); dnew = lookup_one_len(fname, dentry, flen); - if (IS_ERR(dnew)) { - err = nfserrno(PTR_ERR(dnew)); - inode_unlock(dentry->d_inode); - goto out_drop_write; - } - fh_fill_pre_attrs(fhp); + host_err = PTR_ERR(dnew); + if (IS_ERR(dnew)) + goto out_nfserr; + host_err = vfs_symlink(d_inode(dentry), dnew, path); err = nfserrno(host_err); - cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); - if (!err) - nfsd_create_setattr(rqstp, fhp, resfhp, attrs); - fh_fill_post_attrs(fhp); - inode_unlock(dentry->d_inode); if (!err) err = nfserrno(commit_metadata(fhp)); + fh_unlock(fhp); + + fh_drop_write(fhp); + + cerr = fh_compose(resfhp, fhp->fh_export, dnew, fhp); dput(dnew); if (err==0) err = cerr; -out_drop_write: - fh_drop_write(fhp); out: return err; + +out_nfserr: + err = nfserrno(host_err); + goto out; } /* @@ -1644,25 +1669,21 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, goto out; } + fh_lock_nested(ffhp, I_MUTEX_PARENT); ddir = ffhp->fh_dentry; dirp = d_inode(ddir); - inode_lock_nested(dirp, I_MUTEX_PARENT); dnew = lookup_one_len(name, ddir, len); - if (IS_ERR(dnew)) { - err = nfserrno(PTR_ERR(dnew)); - goto out_unlock; - } + host_err = PTR_ERR(dnew); + if (IS_ERR(dnew)) + goto out_nfserr; dold = tfhp->fh_dentry; err = nfserr_noent; if (d_really_is_negative(dold)) goto out_dput; - fh_fill_pre_attrs(ffhp); host_err = vfs_link(dold, dirp, dnew, NULL); - fh_fill_post_attrs(ffhp); - inode_unlock(dirp); if (!host_err) { err = nfserrno(commit_metadata(ffhp)); if (!err) @@ -1673,17 +1694,17 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp, else err = nfserrno(host_err); } +out_dput: dput(dnew); -out_drop_write: +out_unlock: + fh_unlock(ffhp); fh_drop_write(tfhp); out: return err; -out_dput: - dput(dnew); -out_unlock: - inode_unlock(dirp); - goto out_drop_write; +out_nfserr: + err = nfserrno(host_err); + goto out_unlock; } static void @@ -1718,7 +1739,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, struct inode *fdir, *tdir; __be32 err; int host_err; - bool close_cached = false; + bool has_cached = false; err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE); if (err) @@ -1750,9 +1771,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, goto out; } + /* cannot use fh_lock as we need deadlock protective ordering + * so do it by hand */ trap = lock_rename(tdentry, fdentry); - fh_fill_pre_attrs(ffhp); - fh_fill_pre_attrs(tfhp); + ffhp->fh_locked = tfhp->fh_locked = true; + fill_pre_wcc(ffhp); + fill_pre_wcc(tfhp); odentry = lookup_one_len(fname, fdentry, flen); host_err = PTR_ERR(odentry); @@ -1774,26 +1798,11 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, if (ndentry == trap) goto out_dput_new; - if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) && - nfsd_has_cached_files(ndentry)) { - close_cached = true; + if (nfsd_has_cached_files(ndentry)) { + has_cached = true; goto out_dput_old; } else { - struct renamedata rd = { - .old_dir = fdir, - .old_dentry = odentry, - .new_dir = tdir, - .new_dentry = ndentry, - }; - int retries; - - for (retries = 1;;) { - host_err = vfs_rename(&rd); - if (host_err != -EAGAIN || !retries--) - break; - if (!nfsd_wait_for_delegreturn(rqstp, d_inode(odentry))) - break; - } + host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0); if (!host_err) { host_err = commit_metadata(tfhp); if (!host_err) @@ -1806,12 +1815,17 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, dput(odentry); out_nfserr: err = nfserrno(host_err); - - if (!close_cached) { - fh_fill_post_attrs(ffhp); - fh_fill_post_attrs(tfhp); + /* + * We cannot rely on fh_unlock on the two filehandles, + * as that would do the wrong thing if the two directories + * were the same, so again we do it by hand. + */ + if (!has_cached) { + fill_post_wcc(ffhp); + fill_post_wcc(tfhp); } unlock_rename(tdentry, fdentry); + ffhp->fh_locked = tfhp->fh_locked = false; fh_drop_write(ffhp); /* @@ -1820,8 +1834,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen, * shouldn't be done with locks held however, so we delay it until this * point and then reattempt the whole shebang. */ - if (close_cached) { - close_cached = false; + if (has_cached) { + has_cached = false; nfsd_close_cached_files(ndentry); dput(ndentry); goto retry; @@ -1840,7 +1854,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, { struct dentry *dentry, *rdentry; struct inode *dirp; - struct inode *rinode; __be32 err; int host_err; @@ -1855,50 +1868,34 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, if (host_err) goto out_nfserr; + fh_lock_nested(fhp, I_MUTEX_PARENT); dentry = fhp->fh_dentry; dirp = d_inode(dentry); - inode_lock_nested(dirp, I_MUTEX_PARENT); rdentry = lookup_one_len(fname, dentry, flen); host_err = PTR_ERR(rdentry); if (IS_ERR(rdentry)) - goto out_unlock; + goto out_drop_write; if (d_really_is_negative(rdentry)) { dput(rdentry); host_err = -ENOENT; - goto out_unlock; + goto out_drop_write; } - rinode = d_inode(rdentry); - ihold(rinode); if (!type) type = d_inode(rdentry)->i_mode & S_IFMT; - fh_fill_pre_attrs(fhp); if (type != S_IFDIR) { - int retries; - - if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) - nfsd_close_cached_files(rdentry); - - for (retries = 1;;) { - host_err = vfs_unlink(dirp, rdentry, NULL); - if (host_err != -EAGAIN || !retries--) - break; - if (!nfsd_wait_for_delegreturn(rqstp, rinode)) - break; - } + nfsd_close_cached_files(rdentry); + host_err = vfs_unlink(dirp, rdentry, NULL); } else { host_err = vfs_rmdir(dirp, rdentry); } - fh_fill_post_attrs(fhp); - inode_unlock(dirp); if (!host_err) host_err = commit_metadata(fhp); dput(rdentry); - iput(rinode); /* truncate the inode here */ out_drop_write: fh_drop_write(fhp); @@ -1916,9 +1913,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type, } out: return err; -out_unlock: - inode_unlock(dirp); - goto out_drop_write; } /* @@ -1968,9 +1962,8 @@ static int nfsd_buffered_filldir(struct dir_context *ctx, const char *name, return 0; } -static __be32 nfsd_buffered_readdir(struct file *file, struct svc_fh *fhp, - nfsd_filldir_t func, struct readdir_cd *cdp, - loff_t *offsetp) +static __be32 nfsd_buffered_readdir(struct file *file, nfsd_filldir_t func, + struct readdir_cd *cdp, loff_t *offsetp) { struct buffered_dirent *de; int host_err; @@ -2016,8 +2009,6 @@ static __be32 nfsd_buffered_readdir(struct file *file, struct svc_fh *fhp, if (cdp->err != nfs_ok) break; - trace_nfsd_dirent(fhp, de->ino, de->name, de->namlen); - reclen = ALIGN(sizeof(*de) + de->namlen, sizeof(u64)); size -= reclen; @@ -2065,7 +2056,7 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp, goto out_close; } - err = nfsd_buffered_readdir(file, fhp, func, cdp, offsetp); + err = nfsd_buffered_readdir(file, func, cdp, offsetp); if (err == nfserr_eof || err == nfserr_toosmall) err = nfs_ok; /* can still be found in ->err */ @@ -2272,16 +2263,13 @@ nfsd_listxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char **bufp, return err; } -/** - * nfsd_removexattr - Remove an extended attribute - * @rqstp: RPC transaction being executed - * @fhp: NFS filehandle of object with xattr to remove - * @name: name of xattr to remove (NUL-terminate) - * - * Pass in a NULL pointer for delegated_inode, and let the client deal - * with NFS4ERR_DELAY (same as with e.g. setattr and remove). - * - * Returns nfs_ok on success, or an nfsstat in network byte order. +/* + * Removexattr and setxattr need to call fh_lock to both lock the inode + * and set the change attribute. Since the top-level vfs_removexattr + * and vfs_setxattr calls already do their own inode_lock calls, call + * the _locked variant. Pass in a NULL pointer for delegated_inode, + * and let the client deal with NFS4ERR_DELAY (same as with e.g. + * setattr and remove). */ __be32 nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) @@ -2297,13 +2285,11 @@ nfsd_removexattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name) if (ret) return nfserrno(ret); - inode_lock(fhp->fh_dentry->d_inode); - fh_fill_pre_attrs(fhp); + fh_lock(fhp); ret = __vfs_removexattr_locked(fhp->fh_dentry, name, NULL); - fh_fill_post_attrs(fhp); - inode_unlock(fhp->fh_dentry->d_inode); + fh_unlock(fhp); fh_drop_write(fhp); return nfsd_xattr_errno(ret); @@ -2323,13 +2309,12 @@ nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, ret = fh_want_write(fhp); if (ret) return nfserrno(ret); - inode_lock(fhp->fh_dentry->d_inode); - fh_fill_pre_attrs(fhp); + fh_lock(fhp); ret = __vfs_setxattr_locked(fhp->fh_dentry, name, buf, len, flags, NULL); - fh_fill_post_attrs(fhp); - inode_unlock(fhp->fh_dentry->d_inode); + + fh_unlock(fhp); fh_drop_write(fhp); return nfsd_xattr_errno(ret); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index dbdfef7ae85b..a2442ebe5acf 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -6,8 +6,6 @@ #ifndef LINUX_NFSD_VFS_H #define LINUX_NFSD_VFS_H -#include -#include #include "nfsfh.h" #include "nfsd.h" @@ -44,23 +42,6 @@ struct nfsd_file; typedef int (*nfsd_filldir_t)(void *, const char *, int, loff_t, u64, unsigned); /* nfsd/vfs.c */ -struct nfsd_attrs { - struct iattr *na_iattr; /* input */ - struct xdr_netobj *na_seclabel; /* input */ - struct posix_acl *na_pacl; /* input */ - struct posix_acl *na_dpacl; /* input */ - - int na_labelerr; /* output */ - int na_aclerr; /* output */ -}; - -static inline void nfsd_attrs_free(struct nfsd_attrs *attrs) -{ - posix_acl_release(attrs->na_pacl); - posix_acl_release(attrs->na_dpacl); -} - -__be32 nfserrno (int errno); int nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp, struct svc_export **expp); __be32 nfsd_lookup(struct svc_rqst *, struct svc_fh *, @@ -69,28 +50,32 @@ __be32 nfsd_lookup_dentry(struct svc_rqst *, struct svc_fh *, const char *, unsigned int, struct svc_export **, struct dentry **); __be32 nfsd_setattr(struct svc_rqst *, struct svc_fh *, - struct nfsd_attrs *, int, time64_t); + struct iattr *, int, time64_t); int nfsd_mountpoint(struct dentry *, struct svc_export *); #ifdef CONFIG_NFSD_V4 +__be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *, + struct xdr_netobj *); __be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *, struct file *, loff_t, loff_t, int); -__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp, - struct nfsd_file *nf_src, u64 src_pos, +__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos, struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync); #endif /* CONFIG_NFSD_V4 */ __be32 nfsd_create_locked(struct svc_rqst *, struct svc_fh *, - struct nfsd_attrs *attrs, int type, dev_t rdev, - struct svc_fh *res); -__be32 nfsd_create(struct svc_rqst *, struct svc_fh *, - char *name, int len, struct nfsd_attrs *attrs, + char *name, int len, struct iattr *attrs, int type, dev_t rdev, struct svc_fh *res); +__be32 nfsd_create(struct svc_rqst *, struct svc_fh *, + char *name, int len, struct iattr *attrs, + int type, dev_t rdev, struct svc_fh *res); +#ifdef CONFIG_NFSD_V3 __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *); -__be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, - struct svc_fh *resfhp, struct nfsd_attrs *iap); -__be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp, - struct nfsd_file *nf, u64 offset, u32 count, - __be32 *verf); +__be32 do_nfsd_create(struct svc_rqst *, struct svc_fh *, + char *name, int len, struct iattr *attrs, + struct svc_fh *res, int createmode, + u32 *verifier, bool *truncp, bool *created); +__be32 nfsd_commit(struct svc_rqst *, struct svc_fh *, + loff_t, unsigned long, __be32 *verf); +#endif /* CONFIG_NFSD_V3 */ #ifdef CONFIG_NFSD_V4 __be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, char *name, void **bufp, int *lenp); @@ -104,7 +89,7 @@ __be32 nfsd_setxattr(struct svc_rqst *rqstp, struct svc_fh *fhp, int nfsd_open_break_lease(struct inode *, int); __be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t, int, struct file **); -__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, +__be32 nfsd_open_verified(struct svc_rqst *, struct svc_fh *, umode_t, int, struct file **); __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file, loff_t offset, @@ -128,9 +113,8 @@ __be32 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 nfsd_readlink(struct svc_rqst *, struct svc_fh *, char *, int *); __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *, - char *name, int len, char *path, - struct nfsd_attrs *attrs, - struct svc_fh *res); + char *name, int len, char *path, + struct svc_fh *res); __be32 nfsd_link(struct svc_rqst *, struct svc_fh *, char *, int, struct svc_fh *); ssize_t nfsd_copy_file_range(struct file *, u64, @@ -168,7 +152,7 @@ static inline void fh_drop_write(struct svc_fh *fh) } } -static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) +static inline __be32 fh_getattr(struct svc_fh *fh, struct kstat *stat) { struct path p = {.mnt = fh->fh_export->ex_path.mnt, .dentry = fh->fh_dentry}; @@ -176,4 +160,10 @@ static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat) AT_STATX_SYNC_AS_STAT)); } +static inline int nfsd_create_is_exclusive(int createmode) +{ + return createmode == NFS3_CREATE_EXCLUSIVE + || createmode == NFS4_CREATE_EXCLUSIVE4_1; +} + #endif /* LINUX_NFSD_VFS_H */ diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h index 852f71580bd0..b8cc6a4b2e0e 100644 --- a/fs/nfsd/xdr.h +++ b/fs/nfsd/xdr.h @@ -27,13 +27,14 @@ struct nfsd_readargs { struct svc_fh fh; __u32 offset; __u32 count; + int vlen; }; struct nfsd_writeargs { svc_fh fh; __u32 offset; __u32 len; - struct xdr_buf payload; + struct kvec first; }; struct nfsd_createargs { @@ -52,6 +53,11 @@ struct nfsd_renameargs { unsigned int tlen; }; +struct nfsd_readlinkargs { + struct svc_fh fh; + char * buffer; +}; + struct nfsd_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -73,6 +79,7 @@ struct nfsd_readdirargs { struct svc_fh fh; __u32 cookie; __u32 count; + __be32 * buffer; }; struct nfsd_stat { @@ -94,7 +101,6 @@ struct nfsd_diropres { struct nfsd_readlinkres { __be32 status; int len; - struct page *page; }; struct nfsd_readres { @@ -102,20 +108,17 @@ struct nfsd_readres { struct svc_fh fh; unsigned long count; struct kstat stat; - struct page **pages; }; struct nfsd_readdirres { - /* Components of the reply */ __be32 status; int count; - /* Used to encode the reply's entry list */ - struct xdr_stream xdr; - struct xdr_buf dirlist; struct readdir_cd common; - unsigned int cookie_offset; + __be32 * buffer; + int buflen; + __be32 * offset; }; struct nfsd_statfsres { @@ -141,37 +144,36 @@ union nfsd_xdrstore { #define NFS2_SVC_XDRSIZE sizeof(union nfsd_xdrstore) -bool nfssvc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfssvc_decode_void(struct svc_rqst *, __be32 *); +int nfssvc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfssvc_decode_sattrargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_diropargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_readargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_writeargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_createargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_renameargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_readlinkargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_linkargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_symlinkargs(struct svc_rqst *, __be32 *); +int nfssvc_decode_readdirargs(struct svc_rqst *, __be32 *); +int nfssvc_encode_void(struct svc_rqst *, __be32 *); +int nfssvc_encode_stat(struct svc_rqst *, __be32 *); +int nfssvc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfssvc_encode_diropres(struct svc_rqst *, __be32 *); +int nfssvc_encode_readlinkres(struct svc_rqst *, __be32 *); +int nfssvc_encode_readres(struct svc_rqst *, __be32 *); +int nfssvc_encode_statfsres(struct svc_rqst *, __be32 *); +int nfssvc_encode_readdirres(struct svc_rqst *, __be32 *); -bool nfssvc_encode_statres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_attrstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_diropres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_statfsres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfssvc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); - -void nfssvc_encode_nfscookie(struct nfsd_readdirres *resp, u32 offset); -int nfssvc_encode_entry(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type); +int nfssvc_encode_entry(void *, const char *name, + int namlen, loff_t offset, u64 ino, unsigned int); void nfssvc_release_attrstat(struct svc_rqst *rqstp); void nfssvc_release_diropres(struct svc_rqst *rqstp); void nfssvc_release_readres(struct svc_rqst *rqstp); /* Helper functions for NFSv2 ACL code */ -bool svcxdr_decode_fhandle(struct xdr_stream *xdr, struct svc_fh *fhp); -bool svcxdr_encode_stat(struct xdr_stream *xdr, __be32 status); -bool svcxdr_encode_fattr(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp, const struct kstat *stat); +__be32 *nfs2svc_encode_fattr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp, struct kstat *stat); +__be32 *nfs2svc_decode_fh(__be32 *p, struct svc_fh *fhp); #endif /* LINUX_NFSD_H */ diff --git a/fs/nfsd/xdr3.h b/fs/nfsd/xdr3.h index 03fe4e21306c..ae6fa6c9cb46 100644 --- a/fs/nfsd/xdr3.h +++ b/fs/nfsd/xdr3.h @@ -25,13 +25,14 @@ struct nfsd3_diropargs { struct nfsd3_accessargs { struct svc_fh fh; - __u32 access; + unsigned int access; }; struct nfsd3_readargs { struct svc_fh fh; __u64 offset; __u32 count; + int vlen; }; struct nfsd3_writeargs { @@ -40,7 +41,7 @@ struct nfsd3_writeargs { __u32 count; int stable; __u32 len; - struct xdr_buf payload; + struct kvec first; }; struct nfsd3_createargs { @@ -70,6 +71,11 @@ struct nfsd3_renameargs { unsigned int tlen; }; +struct nfsd3_readlinkargs { + struct svc_fh fh; + char * buffer; +}; + struct nfsd3_linkargs { struct svc_fh ffh; struct svc_fh tfh; @@ -90,8 +96,10 @@ struct nfsd3_symlinkargs { struct nfsd3_readdirargs { struct svc_fh fh; __u64 cookie; + __u32 dircount; __u32 count; __be32 * verf; + __be32 * buffer; }; struct nfsd3_commitargs { @@ -102,13 +110,13 @@ struct nfsd3_commitargs { struct nfsd3_getaclargs { struct svc_fh fh; - __u32 mask; + int mask; }; struct posix_acl; struct nfsd3_setaclargs { struct svc_fh fh; - __u32 mask; + int mask; struct posix_acl *acl_access; struct posix_acl *acl_default; }; @@ -137,7 +145,6 @@ struct nfsd3_readlinkres { __be32 status; struct svc_fh fh; __u32 len; - struct page **pages; }; struct nfsd3_readres { @@ -145,7 +152,6 @@ struct nfsd3_readres { struct svc_fh fh; unsigned long count; __u32 eof; - struct page **pages; }; struct nfsd3_writeres { @@ -169,17 +175,19 @@ struct nfsd3_linkres { }; struct nfsd3_readdirres { - /* Components of the reply */ __be32 status; struct svc_fh fh; + /* Just to save kmalloc on every readdirplus entry (svc_fh is a + * little large for the stack): */ + struct svc_fh scratch; + int count; __be32 verf[2]; - /* Used to encode the reply's entry list */ - struct xdr_stream xdr; - struct xdr_buf dirlist; - struct svc_fh scratch; struct readdir_cd common; - unsigned int cookie_offset; + __be32 * buffer; + int buflen; + __be32 * offset; + __be32 * offset1; struct svc_rqst * rqstp; }; @@ -265,50 +273,52 @@ union nfsd3_xdrstore { #define NFS3_SVC_XDRSIZE sizeof(union nfsd3_xdrstore) -bool nfs3svc_decode_fhandleargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_sattrargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_diropargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_accessargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_readargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_writeargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_createargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_mkdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_mknodargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_renameargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_linkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_symlinkargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_readdirargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_readdirplusargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_decode_commitargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); - -bool nfs3svc_encode_getattrres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_wccstat(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_writeres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_createres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_renameres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_linkres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_fsstatres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_fsinfores(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_pathconfres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs3svc_encode_commitres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs3svc_decode_voidarg(struct svc_rqst *, __be32 *); +int nfs3svc_decode_fhandle(struct svc_rqst *, __be32 *); +int nfs3svc_decode_sattrargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_diropargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_accessargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_readargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_writeargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_createargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_mkdirargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_mknodargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_renameargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_readlinkargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_linkargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_symlinkargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_readdirargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_readdirplusargs(struct svc_rqst *, __be32 *); +int nfs3svc_decode_commitargs(struct svc_rqst *, __be32 *); +int nfs3svc_encode_voidres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_attrstat(struct svc_rqst *, __be32 *); +int nfs3svc_encode_wccstat(struct svc_rqst *, __be32 *); +int nfs3svc_encode_diropres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_accessres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_readlinkres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_readres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_writeres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_createres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_renameres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_linkres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_readdirres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_fsstatres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_fsinfores(struct svc_rqst *, __be32 *); +int nfs3svc_encode_pathconfres(struct svc_rqst *, __be32 *); +int nfs3svc_encode_commitres(struct svc_rqst *, __be32 *); void nfs3svc_release_fhandle(struct svc_rqst *); void nfs3svc_release_fhandle2(struct svc_rqst *); - -void nfs3svc_encode_cookie3(struct nfsd3_readdirres *resp, u64 offset); -int nfs3svc_encode_entry3(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type); -int nfs3svc_encode_entryplus3(void *data, const char *name, int namlen, - loff_t offset, u64 ino, unsigned int d_type); +int nfs3svc_encode_entry(void *, const char *name, + int namlen, loff_t offset, u64 ino, + unsigned int); +int nfs3svc_encode_entry_plus(void *, const char *name, + int namlen, loff_t offset, u64 ino, + unsigned int); /* Helper functions for NFSv3 ACL code */ -bool svcxdr_decode_nfs_fh3(struct xdr_stream *xdr, struct svc_fh *fhp); -bool svcxdr_encode_nfsstat3(struct xdr_stream *xdr, __be32 status); -bool svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr, - const struct svc_fh *fhp); +__be32 *nfs3svc_encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, + struct svc_fh *fhp); +__be32 *nfs3svc_decode_fh(__be32 *p, struct svc_fh *fhp); + #endif /* _LINUX_NFSD_XDR3_H */ diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h index a034b9b62137..679d40af1bbb 100644 --- a/fs/nfsd/xdr4.h +++ b/fs/nfsd/xdr4.h @@ -76,7 +76,12 @@ static inline bool nfsd4_has_session(struct nfsd4_compound_state *cs) struct nfsd4_change_info { u32 atomic; + bool change_supported; + u32 before_ctime_sec; + u32 before_ctime_nsec; u64 before_change; + u32 after_ctime_sec; + u32 after_ctime_nsec; u64 after_change; }; @@ -247,8 +252,7 @@ struct nfsd4_listxattrs { struct nfsd4_open { u32 op_claim_type; /* request */ - u32 op_fnamelen; - char * op_fname; /* request - everything but CLAIM_PREV */ + struct xdr_netobj op_fname; /* request - everything but CLAIM_PREV */ u32 op_delegate_type; /* request - CLAIM_PREV only */ stateid_t op_delegate_stateid; /* request - response */ u32 op_why_no_deleg; /* response - DELEG_NONE_EXT only */ @@ -273,13 +277,11 @@ struct nfsd4_open { bool op_truncate; /* used during processing */ bool op_created; /* used during processing */ struct nfs4_openowner *op_openowner; /* used during processing */ - struct file *op_filp; /* used during processing */ struct nfs4_file *op_file; /* used during processing */ struct nfs4_ol_stateid *op_stp; /* used during processing */ struct nfs4_clnt_odstate *op_odstate; /* used during processing */ struct nfs4_acl *op_acl; struct xdr_netobj op_label; - struct svc_rqst *op_rqstp; }; struct nfsd4_open_confirm { @@ -303,10 +305,9 @@ struct nfsd4_read { u32 rd_length; /* request */ int rd_vlen; struct nfsd_file *rd_nf; - + struct svc_rqst *rd_rqstp; /* response */ - struct svc_fh *rd_fhp; /* response */ - u32 rd_eof; /* response */ + struct svc_fh *rd_fhp; /* response */ }; struct nfsd4_readdir { @@ -384,6 +385,13 @@ struct nfsd4_setclientid_confirm { nfs4_verifier sc_confirm; }; +struct nfsd4_saved_compoundargs { + __be32 *p; + __be32 *end; + int pagelen; + struct page **pagelist; +}; + struct nfsd4_test_stateid_id { __be32 ts_id_status; stateid_t ts_id_stateid; @@ -411,7 +419,8 @@ struct nfsd4_write { u64 wr_offset; /* request */ u32 wr_stable_how; /* request */ u32 wr_buflen; /* request */ - struct xdr_buf wr_payload; /* request */ + struct kvec wr_head; + struct page ** wr_pagelist; /* request */ u32 wr_bytes_written; /* response */ u32 wr_how_written; /* response */ @@ -424,7 +433,7 @@ struct nfsd4_exchange_id { u32 flags; clientid_t clientid; u32 seqid; - u32 spa_how; + int spa_how; u32 spo_must_enforce[3]; u32 spo_must_allow[3]; struct xdr_netobj nii_domain; @@ -534,13 +543,6 @@ struct nfsd42_write_res { stateid_t cb_stateid; }; -struct nfsd4_cb_offload { - struct nfsd4_callback co_cb; - struct nfsd42_write_res co_res; - __be32 co_nfserr; - struct knfsd_fh co_fh; -}; - struct nfsd4_copy { /* request */ stateid_t cp_src_stateid; @@ -548,16 +550,18 @@ struct nfsd4_copy { u64 cp_src_pos; u64 cp_dst_pos; u64 cp_count; - struct nl4_server *cp_src; + struct nl4_server cp_src; + bool cp_intra; - unsigned long cp_flags; -#define NFSD4_COPY_F_STOPPED (0) -#define NFSD4_COPY_F_INTRA (1) -#define NFSD4_COPY_F_SYNCHRONOUS (2) -#define NFSD4_COPY_F_COMMITTED (3) + /* both */ + bool cp_synchronous; /* response */ struct nfsd42_write_res cp_res; + + /* for cb_offload */ + struct nfsd4_callback cp_cb; + __be32 nfserr; struct knfsd_fh fh; struct nfs4_client *cp_clp; @@ -570,34 +574,13 @@ struct nfsd4_copy { struct list_head copies; struct task_struct *copy_task; refcount_t refcount; + bool stopped; - struct nfsd4_ssc_umount_item *ss_nsui; + struct vfsmount *ss_mnt; struct nfs_fh c_fh; nfs4_stateid stateid; }; - -static inline void nfsd4_copy_set_sync(struct nfsd4_copy *copy, bool sync) -{ - if (sync) - set_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); - else - clear_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); -} - -static inline bool nfsd4_copy_is_sync(const struct nfsd4_copy *copy) -{ - return test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); -} - -static inline bool nfsd4_copy_is_async(const struct nfsd4_copy *copy) -{ - return !test_bit(NFSD4_COPY_F_SYNCHRONOUS, ©->cp_flags); -} - -static inline bool nfsd4_ssc_is_inter(const struct nfsd4_copy *copy) -{ - return !test_bit(NFSD4_COPY_F_INTRA, ©->cp_flags); -} +extern bool inter_copy_offload_enable; struct nfsd4_seek { /* request */ @@ -622,20 +605,19 @@ struct nfsd4_offload_status { struct nfsd4_copy_notify { /* request */ stateid_t cpn_src_stateid; - struct nl4_server *cpn_dst; + struct nl4_server cpn_dst; /* response */ stateid_t cpn_cnr_stateid; u64 cpn_sec; u32 cpn_nsec; - struct nl4_server *cpn_src; + struct nl4_server cpn_src; }; struct nfsd4_op { - u32 opnum; + int opnum; + const struct nfsd4_operation * opdesc; __be32 status; - const struct nfsd4_operation *opdesc; - struct nfs4_replay *replay; union nfsd4_op_u { struct nfsd4_access access; struct nfsd4_close close; @@ -699,6 +681,7 @@ struct nfsd4_op { struct nfsd4_listxattrs listxattrs; struct nfsd4_removexattr removexattr; } u; + struct nfs4_replay * replay; }; bool nfsd4_cache_this_op(struct nfsd4_op *); @@ -713,29 +696,35 @@ struct svcxdr_tmpbuf { struct nfsd4_compoundargs { /* scratch variables for XDR decode */ - struct xdr_stream *xdr; + __be32 * p; + __be32 * end; + struct page ** pagelist; + int pagelen; + bool tail; + __be32 tmp[8]; + __be32 * tmpp; struct svcxdr_tmpbuf *to_free; + struct svc_rqst *rqstp; - char * tag; u32 taglen; + char * tag; u32 minorversion; - u32 client_opcnt; u32 opcnt; struct nfsd4_op *ops; struct nfsd4_op iops[8]; + int cachetype; }; struct nfsd4_compoundres { /* scratch variables for XDR encode */ - struct xdr_stream *xdr; + struct xdr_stream xdr; struct svc_rqst * rqstp; - __be32 *statusp; - char * tag; u32 taglen; + char * tag; u32 opcnt; - + __be32 * tagp; /* tag, opcount encode location */ struct nfsd4_compound_state cstate; }; @@ -778,16 +767,24 @@ static inline void set_change_info(struct nfsd4_change_info *cinfo, struct svc_fh *fhp) { BUG_ON(!fhp->fh_pre_saved); - cinfo->atomic = (u32)(fhp->fh_post_saved && !fhp->fh_no_atomic_attr); + cinfo->atomic = (u32)fhp->fh_post_saved; + cinfo->change_supported = IS_I_VERSION(d_inode(fhp->fh_dentry)); cinfo->before_change = fhp->fh_pre_change; cinfo->after_change = fhp->fh_post_change; + cinfo->before_ctime_sec = fhp->fh_pre_ctime.tv_sec; + cinfo->before_ctime_nsec = fhp->fh_pre_ctime.tv_nsec; + cinfo->after_ctime_sec = fhp->fh_post_attr.ctime.tv_sec; + cinfo->after_ctime_nsec = fhp->fh_post_attr.ctime.tv_nsec; + } bool nfsd4_mach_creds_match(struct nfs4_client *cl, struct svc_rqst *rqstp); -bool nfs4svc_decode_compoundargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nfs4svc_encode_compoundres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nfs4svc_decode_voidarg(struct svc_rqst *, __be32 *); +int nfs4svc_encode_voidres(struct svc_rqst *, __be32 *); +int nfs4svc_decode_compoundargs(struct svc_rqst *, __be32 *); +int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *); __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32); void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *); void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op); @@ -888,19 +885,13 @@ struct nfsd4_operation { u32 op_flags; char *op_name; /* Try to get response size before operation */ - u32 (*op_rsize_bop)(const struct svc_rqst *rqstp, - const struct nfsd4_op *op); + u32 (*op_rsize_bop)(struct svc_rqst *, struct nfsd4_op *); void (*op_get_currentstateid)(struct nfsd4_compound_state *, union nfsd4_op_u *); void (*op_set_currentstateid)(struct nfsd4_compound_state *, union nfsd4_op_u *); }; -struct nfsd4_cb_recall_any { - struct nfsd4_callback ra_cb; - u32 ra_keep; - u32 ra_bmval[1]; -}; #endif diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h index 0d39af1b00a0..547cf07cf4e0 100644 --- a/fs/nfsd/xdr4cb.h +++ b/fs/nfsd/xdr4cb.h @@ -48,9 +48,3 @@ #define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \ cb_sequence_dec_sz + \ op_dec_sz) -#define NFS4_enc_cb_recall_any_sz (cb_compound_enc_hdr_sz + \ - cb_sequence_enc_sz + \ - 1 + 1 + 1) -#define NFS4_dec_cb_recall_any_sz (cb_compound_dec_hdr_sz + \ - cb_sequence_dec_sz + \ - op_dec_sz) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index fa81c59a2ad4..e45ca6ecba95 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -150,7 +150,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) return; dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark); - fsnotify_group_lock(dnotify_group); + mutex_lock(&dnotify_group->mark_mutex); spin_lock(&fsn_mark->lock); prev = &dn_mark->dn; @@ -173,7 +173,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id) free = true; } - fsnotify_group_unlock(dnotify_group); + mutex_unlock(&dnotify_group->mark_mutex); if (free) fsnotify_free_mark(fsn_mark); @@ -196,7 +196,7 @@ static __u32 convert_arg(unsigned long arg) if (arg & DN_ATTRIB) new_mask |= FS_ATTRIB; if (arg & DN_RENAME) - new_mask |= FS_RENAME; + new_mask |= FS_DN_RENAME; if (arg & DN_CREATE) new_mask |= (FS_CREATE | FS_MOVED_TO); @@ -306,7 +306,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) new_dn_mark->dn = NULL; /* this is needed to prevent the fcntl/close race described below */ - fsnotify_group_lock(dnotify_group); + mutex_lock(&dnotify_group->mark_mutex); /* add the new_fsn_mark or find an old one. */ fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group); @@ -316,7 +316,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) } else { error = fsnotify_add_inode_mark_locked(new_fsn_mark, inode, 0); if (error) { - fsnotify_group_unlock(dnotify_group); + mutex_unlock(&dnotify_group->mark_mutex); goto out_err; } spin_lock(&new_fsn_mark->lock); @@ -327,7 +327,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) } rcu_read_lock(); - f = lookup_fd_rcu(fd); + f = fcheck(fd); rcu_read_unlock(); /* if (f != filp) means that we lost a race and another task/thread @@ -365,7 +365,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg) if (destroy) fsnotify_detach_mark(fsn_mark); - fsnotify_group_unlock(dnotify_group); + mutex_unlock(&dnotify_group->mark_mutex); if (destroy) fsnotify_free_mark(fsn_mark); fsnotify_put_mark(fsn_mark); @@ -383,8 +383,7 @@ static int __init dnotify_init(void) SLAB_PANIC|SLAB_ACCOUNT); dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); - dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, - FSNOTIFY_GROUP_NOFS); + dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); return 0; diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index a2a15bc4df28..c3af99e94f1d 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -14,33 +14,20 @@ #include #include #include -#include #include "fanotify.h" -static bool fanotify_path_equal(const struct path *p1, const struct path *p2) +static bool fanotify_path_equal(struct path *p1, struct path *p2) { return p1->mnt == p2->mnt && p1->dentry == p2->dentry; } -static unsigned int fanotify_hash_path(const struct path *path) -{ - return hash_ptr(path->dentry, FANOTIFY_EVENT_HASH_BITS) ^ - hash_ptr(path->mnt, FANOTIFY_EVENT_HASH_BITS); -} - static inline bool fanotify_fsid_equal(__kernel_fsid_t *fsid1, __kernel_fsid_t *fsid2) { return fsid1->val[0] == fsid2->val[0] && fsid1->val[1] == fsid2->val[1]; } -static unsigned int fanotify_hash_fsid(__kernel_fsid_t *fsid) -{ - return hash_32(fsid->val[0], FANOTIFY_EVENT_HASH_BITS) ^ - hash_32(fsid->val[1], FANOTIFY_EVENT_HASH_BITS); -} - static bool fanotify_fh_equal(struct fanotify_fh *fh1, struct fanotify_fh *fh2) { @@ -51,16 +38,6 @@ static bool fanotify_fh_equal(struct fanotify_fh *fh1, !memcmp(fanotify_fh_buf(fh1), fanotify_fh_buf(fh2), fh1->len); } -static unsigned int fanotify_hash_fh(struct fanotify_fh *fh) -{ - long salt = (long)fh->type | (long)fh->len << 8; - - /* - * full_name_hash() works long by long, so it handles fh buf optimally. - */ - return full_name_hash((void *)salt, fanotify_fh_buf(fh), fh->len); -} - static bool fanotify_fid_event_equal(struct fanotify_fid_event *ffe1, struct fanotify_fid_event *ffe2) { @@ -76,10 +53,8 @@ static bool fanotify_info_equal(struct fanotify_info *info1, struct fanotify_info *info2) { if (info1->dir_fh_totlen != info2->dir_fh_totlen || - info1->dir2_fh_totlen != info2->dir2_fh_totlen || info1->file_fh_totlen != info2->file_fh_totlen || - info1->name_len != info2->name_len || - info1->name2_len != info2->name2_len) + info1->name_len != info2->name_len) return false; if (info1->dir_fh_totlen && @@ -87,24 +62,14 @@ static bool fanotify_info_equal(struct fanotify_info *info1, fanotify_info_dir_fh(info2))) return false; - if (info1->dir2_fh_totlen && - !fanotify_fh_equal(fanotify_info_dir2_fh(info1), - fanotify_info_dir2_fh(info2))) - return false; - if (info1->file_fh_totlen && !fanotify_fh_equal(fanotify_info_file_fh(info1), fanotify_info_file_fh(info2))) return false; - if (info1->name_len && - memcmp(fanotify_info_name(info1), fanotify_info_name(info2), - info1->name_len)) - return false; - - return !info1->name2_len || - !memcmp(fanotify_info_name2(info1), fanotify_info_name2(info2), - info1->name2_len); + return !info1->name_len || + !memcmp(fanotify_info_name(info1), fanotify_info_name(info2), + info1->name_len); } static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, @@ -123,22 +88,16 @@ static bool fanotify_name_event_equal(struct fanotify_name_event *fne1, return fanotify_info_equal(info1, info2); } -static bool fanotify_error_event_equal(struct fanotify_error_event *fee1, - struct fanotify_error_event *fee2) +static bool fanotify_should_merge(struct fsnotify_event *old_fsn, + struct fsnotify_event *new_fsn) { - /* Error events against the same file system are always merged. */ - if (!fanotify_fsid_equal(&fee1->fsid, &fee2->fsid)) - return false; + struct fanotify_event *old, *new; - return true; -} + pr_debug("%s: old=%p new=%p\n", __func__, old_fsn, new_fsn); + old = FANOTIFY_E(old_fsn); + new = FANOTIFY_E(new_fsn); -static bool fanotify_should_merge(struct fanotify_event *old, - struct fanotify_event *new) -{ - pr_debug("%s: old=%p new=%p\n", __func__, old, new); - - if (old->hash != new->hash || + if (old_fsn->objectid != new_fsn->objectid || old->type != new->type || old->pid != new->pid) return false; @@ -153,13 +112,6 @@ static bool fanotify_should_merge(struct fanotify_event *old, if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) return false; - /* - * FAN_RENAME event is reported with special info record types, - * so we cannot merge it with other events. - */ - if ((old->mask & FAN_RENAME) != (new->mask & FAN_RENAME)) - return false; - switch (old->type) { case FANOTIFY_EVENT_TYPE_PATH: return fanotify_path_equal(fanotify_event_path(old), @@ -170,9 +122,6 @@ static bool fanotify_should_merge(struct fanotify_event *old, case FANOTIFY_EVENT_TYPE_FID_NAME: return fanotify_name_event_equal(FANOTIFY_NE(old), FANOTIFY_NE(new)); - case FANOTIFY_EVENT_TYPE_FS_ERROR: - return fanotify_error_event_equal(FANOTIFY_EE(old), - FANOTIFY_EE(new)); default: WARN_ON_ONCE(1); } @@ -184,16 +133,14 @@ static bool fanotify_should_merge(struct fanotify_event *old, #define FANOTIFY_MAX_MERGE_EVENTS 128 /* and the list better be locked by something too! */ -static int fanotify_merge(struct fsnotify_group *group, - struct fsnotify_event *event) +static int fanotify_merge(struct list_head *list, struct fsnotify_event *event) { - struct fanotify_event *old, *new = FANOTIFY_E(event); - unsigned int bucket = fanotify_event_hash_bucket(group, new); - struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; + struct fsnotify_event *test_event; + struct fanotify_event *new; int i = 0; - pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, - group, event, bucket); + pr_debug("%s: list=%p event=%p\n", __func__, list, event); + new = FANOTIFY_E(event); /* * Don't merge a permission event with any other event so that we know @@ -203,15 +150,11 @@ static int fanotify_merge(struct fsnotify_group *group, if (fanotify_is_perm_event(new->mask)) return 0; - hlist_for_each_entry(old, hlist, merge_list) { + list_for_each_entry_reverse(test_event, list, list) { if (++i > FANOTIFY_MAX_MERGE_EVENTS) break; - if (fanotify_should_merge(old, new)) { - old->mask |= new->mask; - - if (fanotify_is_error_event(old->mask)) - FANOTIFY_EE(old)->err_count++; - + if (fanotify_should_merge(test_event, event)) { + FANOTIFY_E(test_event)->mask |= new->mask; return 1; } } @@ -247,11 +190,8 @@ static int fanotify_get_response(struct fsnotify_group *group, return ret; } /* Event not yet reported? Just remove it. */ - if (event->state == FAN_EVENT_INIT) { + if (event->state == FAN_EVENT_INIT) fsnotify_remove_queued_event(group, &event->fae.fse); - /* Permission events are not supposed to be hashed */ - WARN_ON_ONCE(!hlist_unhashed(&event->fae.merge_list)); - } /* * Event may be also answered in case signal delivery raced * with wakeup. In that case we have nothing to do besides @@ -291,17 +231,15 @@ static int fanotify_get_response(struct fsnotify_group *group, */ static u32 fanotify_group_event_mask(struct fsnotify_group *group, struct fsnotify_iter_info *iter_info, - u32 *match_mask, u32 event_mask, - const void *data, int data_type, - struct inode *dir) + u32 event_mask, const void *data, + int data_type, struct inode *dir) { - __u32 marks_mask = 0, marks_ignore_mask = 0; + __u32 marks_mask = 0, marks_ignored_mask = 0; __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS | FANOTIFY_EVENT_FLAGS; const struct path *path = fsnotify_data_path(data, data_type); unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct fsnotify_mark *mark; - bool ondir = event_mask & FAN_ONDIR; int type; pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n", @@ -316,30 +254,37 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, return 0; } else if (!(fid_mode & FAN_REPORT_FID)) { /* Do we have a directory inode to report? */ - if (!dir && !ondir) + if (!dir && !(event_mask & FS_ISDIR)) return 0; } - fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - /* - * Apply ignore mask depending on event flags in ignore mask. - */ - marks_ignore_mask |= - fsnotify_effective_ignore_mask(mark, ondir, type); + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + + /* Apply ignore mask regardless of ISDIR and ON_CHILD flags */ + marks_ignored_mask |= mark->ignored_mask; /* - * Send the event depending on event flags in mark mask. + * If the event is on dir and this mark doesn't care about + * events on dir, don't send it! */ - if (!fsnotify_mask_applicable(mark->mask, ondir, type)) + if (event_mask & FS_ISDIR && !(mark->mask & FS_ISDIR)) + continue; + + /* + * If the event is on a child and this mark is on a parent not + * watching children, don't send it! + */ + if (type == FSNOTIFY_OBJ_TYPE_PARENT && + !(mark->mask & FS_EVENT_ON_CHILD)) continue; marks_mask |= mark->mask; - - /* Record the mark types of this group that matched the event */ - *match_mask |= 1U << type; } - test_mask = event_mask & marks_mask & ~marks_ignore_mask; + test_mask = event_mask & marks_mask & ~marks_ignored_mask; /* * For dirent modification events (create/delete/move) that do not carry @@ -374,23 +319,13 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group, static int fanotify_encode_fh_len(struct inode *inode) { int dwords = 0; - int fh_len; if (!inode) return 0; exportfs_encode_inode_fh(inode, NULL, &dwords, NULL); - fh_len = dwords << 2; - /* - * struct fanotify_error_event might be preallocated and is - * limited to MAX_HANDLE_SZ. This should never happen, but - * safeguard by forcing an invalid file handle. - */ - if (WARN_ON_ONCE(fh_len > MAX_HANDLE_SZ)) - return 0; - - return fh_len; + return dwords << 2; } /* @@ -400,8 +335,7 @@ static int fanotify_encode_fh_len(struct inode *inode) * Return 0 on failure to encode. */ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, - unsigned int fh_len, unsigned int *hash, - gfp_t gfp) + unsigned int fh_len, gfp_t gfp) { int dwords, type = 0; char *ext_buf = NULL; @@ -411,21 +345,15 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = FILEID_ROOT; fh->len = 0; fh->flags = 0; - - /* - * Invalid FHs are used by FAN_FS_ERROR for errors not - * linked to any inode. The f_handle won't be reported - * back to userspace. - */ if (!inode) - goto out; + return 0; /* * !gpf means preallocated variable size fh, but fh_len could * be zero in that case if encoding fh len failed. */ err = -ENOENT; - if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4) || fh_len > MAX_HANDLE_SZ) + if (fh_len < 4 || WARN_ON_ONCE(fh_len % 4)) goto out_err; /* No external buffer in a variable size allocated fh */ @@ -450,14 +378,6 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, fh->type = type; fh->len = fh_len; -out: - /* - * Mix fh into event merge key. Hash might be NULL in case of - * unhashed FID events (i.e. FAN_FS_ERROR). - */ - if (hash) - *hash ^= fanotify_hash_fh(fh); - return FANOTIFY_FH_HDR_LEN + fh_len; out_err: @@ -472,41 +392,17 @@ static int fanotify_encode_fh(struct fanotify_fh *fh, struct inode *inode, } /* - * FAN_REPORT_FID is ambiguous in that it reports the fid of the child for - * some events and the fid of the parent for create/delete/move events. - * - * With the FAN_REPORT_TARGET_FID flag, the fid of the child is reported - * also in create/delete/move events in addition to the fid of the parent - * and the name of the child. - */ -static inline bool fanotify_report_child_fid(unsigned int fid_mode, u32 mask) -{ - if (mask & ALL_FSNOTIFY_DIRENT_EVENTS) - return (fid_mode & FAN_REPORT_TARGET_FID); - - return (fid_mode & FAN_REPORT_FID) && !(mask & FAN_ONDIR); -} - -/* - * The inode to use as identifier when reporting fid depends on the event - * and the group flags. - * - * With the group flag FAN_REPORT_TARGET_FID, always report the child fid. - * - * Without the group flag FAN_REPORT_TARGET_FID, report the modified directory - * fid on dirent events and the child fid otherwise. - * + * The inode to use as identifier when reporting fid depends on the event. + * Report the modified directory inode on dirent modification events. + * Report the "victim" inode otherwise. * For example: - * FS_ATTRIB reports the child fid even if reported on a watched parent. - * FS_CREATE reports the modified dir fid without FAN_REPORT_TARGET_FID. - * and reports the created child fid with FAN_REPORT_TARGET_FID. + * FS_ATTRIB reports the child inode even if reported on a watched parent. + * FS_CREATE reports the modified dir inode and not the created inode. */ static struct inode *fanotify_fid_inode(u32 event_mask, const void *data, - int data_type, struct inode *dir, - unsigned int fid_mode) + int data_type, struct inode *dir) { - if ((event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) && - !(fid_mode & FAN_REPORT_TARGET_FID)) + if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) return dir; return fsnotify_data_inode(data, data_type); @@ -528,14 +424,13 @@ static struct inode *fanotify_dfid_inode(u32 event_mask, const void *data, if (event_mask & ALL_FSNOTIFY_DIRENT_EVENTS) return dir; - if (inode && S_ISDIR(inode->i_mode)) + if (S_ISDIR(inode->i_mode)) return inode; return dir; } static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, - unsigned int *hash, gfp_t gfp) { struct fanotify_path_event *pevent; @@ -546,7 +441,6 @@ static struct fanotify_event *fanotify_alloc_path_event(const struct path *path, pevent->fae.type = FANOTIFY_EVENT_TYPE_PATH; pevent->path = *path; - *hash ^= fanotify_hash_path(path); path_get(path); return &pevent->fae; @@ -572,7 +466,6 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path, static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, __kernel_fsid_t *fsid, - unsigned int *hash, gfp_t gfp) { struct fanotify_fid_event *ffe; @@ -583,153 +476,78 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; ffe->fsid = *fsid; - *hash ^= fanotify_hash_fsid(fsid); fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), - hash, gfp); + gfp); return &ffe->fae; } -static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir, +static struct fanotify_event *fanotify_alloc_name_event(struct inode *id, __kernel_fsid_t *fsid, - const struct qstr *name, + const struct qstr *file_name, struct inode *child, - struct dentry *moved, - unsigned int *hash, gfp_t gfp) { struct fanotify_name_event *fne; struct fanotify_info *info; struct fanotify_fh *dfh, *ffh; - struct inode *dir2 = moved ? d_inode(moved->d_parent) : NULL; - const struct qstr *name2 = moved ? &moved->d_name : NULL; - unsigned int dir_fh_len = fanotify_encode_fh_len(dir); - unsigned int dir2_fh_len = fanotify_encode_fh_len(dir2); + unsigned int dir_fh_len = fanotify_encode_fh_len(id); unsigned int child_fh_len = fanotify_encode_fh_len(child); - unsigned long name_len = name ? name->len : 0; - unsigned long name2_len = name2 ? name2->len : 0; - unsigned int len, size; + unsigned int size; - /* Reserve terminating null byte even for empty name */ - size = sizeof(*fne) + name_len + name2_len + 2; - if (dir_fh_len) - size += FANOTIFY_FH_HDR_LEN + dir_fh_len; - if (dir2_fh_len) - size += FANOTIFY_FH_HDR_LEN + dir2_fh_len; + size = sizeof(*fne) + FANOTIFY_FH_HDR_LEN + dir_fh_len; if (child_fh_len) size += FANOTIFY_FH_HDR_LEN + child_fh_len; + if (file_name) + size += file_name->len + 1; fne = kmalloc(size, gfp); if (!fne) return NULL; fne->fae.type = FANOTIFY_EVENT_TYPE_FID_NAME; fne->fsid = *fsid; - *hash ^= fanotify_hash_fsid(fsid); info = &fne->info; fanotify_info_init(info); - if (dir_fh_len) { - dfh = fanotify_info_dir_fh(info); - len = fanotify_encode_fh(dfh, dir, dir_fh_len, hash, 0); - fanotify_info_set_dir_fh(info, len); - } - if (dir2_fh_len) { - dfh = fanotify_info_dir2_fh(info); - len = fanotify_encode_fh(dfh, dir2, dir2_fh_len, hash, 0); - fanotify_info_set_dir2_fh(info, len); - } + dfh = fanotify_info_dir_fh(info); + info->dir_fh_totlen = fanotify_encode_fh(dfh, id, dir_fh_len, 0); if (child_fh_len) { ffh = fanotify_info_file_fh(info); - len = fanotify_encode_fh(ffh, child, child_fh_len, hash, 0); - fanotify_info_set_file_fh(info, len); - } - if (name_len) { - fanotify_info_copy_name(info, name); - *hash ^= full_name_hash((void *)name_len, name->name, name_len); - } - if (name2_len) { - fanotify_info_copy_name2(info, name2); - *hash ^= full_name_hash((void *)name2_len, name2->name, - name2_len); + info->file_fh_totlen = fanotify_encode_fh(ffh, child, child_fh_len, 0); } + if (file_name) + fanotify_info_copy_name(info, file_name); - pr_debug("%s: size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", - __func__, size, dir_fh_len, child_fh_len, + pr_debug("%s: ino=%lu size=%u dir_fh_len=%u child_fh_len=%u name_len=%u name='%.*s'\n", + __func__, id->i_ino, size, dir_fh_len, child_fh_len, info->name_len, info->name_len, fanotify_info_name(info)); - if (dir2_fh_len) { - pr_debug("%s: dir2_fh_len=%u name2_len=%u name2='%.*s'\n", - __func__, dir2_fh_len, info->name2_len, - info->name2_len, fanotify_info_name2(info)); - } - return &fne->fae; } -static struct fanotify_event *fanotify_alloc_error_event( - struct fsnotify_group *group, - __kernel_fsid_t *fsid, - const void *data, int data_type, - unsigned int *hash) -{ - struct fs_error_report *report = - fsnotify_data_error_report(data, data_type); - struct inode *inode; - struct fanotify_error_event *fee; - int fh_len; - - if (WARN_ON_ONCE(!report)) - return NULL; - - fee = mempool_alloc(&group->fanotify_data.error_events_pool, GFP_NOFS); - if (!fee) - return NULL; - - fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR; - fee->error = report->error; - fee->err_count = 1; - fee->fsid = *fsid; - - inode = report->inode; - fh_len = fanotify_encode_fh_len(inode); - - /* Bad fh_len. Fallback to using an invalid fh. Should never happen. */ - if (!fh_len && inode) - inode = NULL; - - fanotify_encode_fh(&fee->object_fh, inode, fh_len, NULL, 0); - - *hash ^= fanotify_hash_fsid(fsid); - - return &fee->fae; -} - -static struct fanotify_event *fanotify_alloc_event( - struct fsnotify_group *group, - u32 mask, const void *data, int data_type, - struct inode *dir, const struct qstr *file_name, - __kernel_fsid_t *fsid, u32 match_mask) +static struct fanotify_event *fanotify_alloc_event(struct fsnotify_group *group, + u32 mask, const void *data, + int data_type, struct inode *dir, + const struct qstr *file_name, + __kernel_fsid_t *fsid) { struct fanotify_event *event = NULL; gfp_t gfp = GFP_KERNEL_ACCOUNT; - unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - struct inode *id = fanotify_fid_inode(mask, data, data_type, dir, - fid_mode); + struct inode *id = fanotify_fid_inode(mask, data, data_type, dir); struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); struct mem_cgroup *old_memcg; - struct dentry *moved = NULL; struct inode *child = NULL; bool name_event = false; - unsigned int hash = 0; - bool ondir = mask & FAN_ONDIR; - struct pid *pid; if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* - * For certain events and group flags, report the child fid + * With both flags FAN_REPORT_DIR_FID and FAN_REPORT_FID, we + * report the child fid for events reported on a non-dir child * in addition to reporting the parent fid and maybe child name. */ - if (fanotify_report_child_fid(fid_mode, mask) && id != dirid) + if ((fid_mode & FAN_REPORT_FID) && + id != dirid && !(mask & FAN_ONDIR)) child = id; id = dirid; @@ -750,41 +568,10 @@ static struct fanotify_event *fanotify_alloc_event( if (!(fid_mode & FAN_REPORT_NAME)) { name_event = !!child; file_name = NULL; - } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || !ondir) { + } else if ((mask & ALL_FSNOTIFY_DIRENT_EVENTS) || + !(mask & FAN_ONDIR)) { name_event = true; } - - /* - * In the special case of FAN_RENAME event, use the match_mask - * to determine if we need to report only the old parent+name, - * only the new parent+name or both. - * 'dirid' and 'file_name' are the old parent+name and - * 'moved' has the new parent+name. - */ - if (mask & FAN_RENAME) { - bool report_old, report_new; - - if (WARN_ON_ONCE(!match_mask)) - return NULL; - - /* Report both old and new parent+name if sb watching */ - report_old = report_new = - match_mask & (1U << FSNOTIFY_ITER_TYPE_SB); - report_old |= - match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE); - report_new |= - match_mask & (1U << FSNOTIFY_ITER_TYPE_INODE2); - - if (!report_old) { - /* Do not report old parent+name */ - dirid = NULL; - file_name = NULL; - } - if (report_new) { - /* Report new parent+name */ - moved = fsnotify_data_dentry(data, data_type); - } - } } /* @@ -803,30 +590,28 @@ static struct fanotify_event *fanotify_alloc_event( if (fanotify_is_perm_event(mask)) { event = fanotify_alloc_perm_event(path, gfp); - } else if (fanotify_is_error_event(mask)) { - event = fanotify_alloc_error_event(group, fsid, data, - data_type, &hash); - } else if (name_event && (file_name || moved || child)) { - event = fanotify_alloc_name_event(dirid, fsid, file_name, child, - moved, &hash, gfp); + } else if (name_event && (file_name || child)) { + event = fanotify_alloc_name_event(id, fsid, file_name, child, + gfp); } else if (fid_mode) { - event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); + event = fanotify_alloc_fid_event(id, fsid, gfp); } else { - event = fanotify_alloc_path_event(path, &hash, gfp); + event = fanotify_alloc_path_event(path, gfp); } if (!event) goto out; + /* + * Use the victim inode instead of the watching inode as the id for + * event queue, so event reported on parent is merged with event + * reported on child when both directory and child watches exist. + */ + fanotify_init_event(event, (unsigned long)id, mask); if (FAN_GROUP_FLAG(group, FAN_REPORT_TID)) - pid = get_pid(task_pid(current)); + event->pid = get_pid(task_pid(current)); else - pid = get_pid(task_tgid(current)); - - /* Mix event info, FAN_ONDIR flag and pid into event merge key */ - hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS); - fanotify_init_event(event, hash, mask); - event->pid = pid; + event->pid = get_pid(task_tgid(current)); out: set_active_memcg(old_memcg); @@ -840,14 +625,16 @@ static struct fanotify_event *fanotify_alloc_event( */ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) { - struct fsnotify_mark *mark; int type; __kernel_fsid_t fsid = {}; - fsnotify_foreach_iter_mark_type(iter_info, mark, type) { + fsnotify_foreach_obj_type(type) { struct fsnotify_mark_connector *conn; - conn = READ_ONCE(mark->connector); + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + + conn = READ_ONCE(iter_info->marks[type]->connector); /* Mark is just getting destroyed or created? */ if (!conn) continue; @@ -864,27 +651,6 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info) return fsid; } -/* - * Add an event to hash table for faster merge. - */ -static void fanotify_insert_event(struct fsnotify_group *group, - struct fsnotify_event *fsn_event) -{ - struct fanotify_event *event = FANOTIFY_E(fsn_event); - unsigned int bucket = fanotify_event_hash_bucket(group, event); - struct hlist_head *hlist = &group->fanotify_data.merge_hash[bucket]; - - assert_spin_locked(&group->notification_lock); - - if (!fanotify_is_hashed_event(event->mask)) - return; - - pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, - group, event, bucket); - - hlist_add_head(&event->merge_list, hlist); -} - static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, const void *data, int data_type, struct inode *dir, @@ -895,7 +661,6 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, struct fanotify_event *event; struct fsnotify_event *fsn_event; __kernel_fsid_t fsid = {}; - u32 match_mask = 0; BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS); BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY); @@ -916,18 +681,15 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); - BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); - BUILD_BUG_ON(FAN_RENAME != FS_RENAME); - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19); - mask = fanotify_group_event_mask(group, iter_info, &match_mask, - mask, data, data_type, dir); + mask = fanotify_group_event_mask(group, iter_info, mask, data, + data_type, dir); if (!mask) return 0; - pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__, - group, mask, match_mask); + pr_debug("%s: group=%p mask=%x\n", __func__, group, mask); if (fanotify_is_perm_event(mask)) { /* @@ -946,7 +708,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, } event = fanotify_alloc_event(group, mask, data, data_type, dir, - file_name, &fsid, match_mask); + file_name, &fsid); ret = -ENOMEM; if (unlikely(!event)) { /* @@ -959,8 +721,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, } fsn_event = &event->fse; - ret = fsnotify_insert_event(group, fsn_event, fanotify_merge, - fanotify_insert_event); + ret = fsnotify_add_event(group, fsn_event, fanotify_merge); if (ret) { /* Permission events shouldn't be merged */ BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS); @@ -981,13 +742,11 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, static void fanotify_free_group_priv(struct fsnotify_group *group) { - kfree(group->fanotify_data.merge_hash); - if (group->fanotify_data.ucounts) - dec_ucount(group->fanotify_data.ucounts, - UCOUNT_FANOTIFY_GROUPS); + struct user_struct *user; - if (mempool_initialized(&group->fanotify_data.error_events_pool)) - mempool_exit(&group->fanotify_data.error_events_pool); + user = group->fanotify_data.user; + atomic_dec(&user->fanotify_listeners); + free_uid(user); } static void fanotify_free_path_event(struct fanotify_event *event) @@ -1016,16 +775,7 @@ static void fanotify_free_name_event(struct fanotify_event *event) kfree(FANOTIFY_NE(event)); } -static void fanotify_free_error_event(struct fsnotify_group *group, - struct fanotify_event *event) -{ - struct fanotify_error_event *fee = FANOTIFY_EE(event); - - mempool_free(fee, &group->fanotify_data.error_events_pool); -} - -static void fanotify_free_event(struct fsnotify_group *group, - struct fsnotify_event *fsn_event) +static void fanotify_free_event(struct fsnotify_event *fsn_event) { struct fanotify_event *event; @@ -1047,21 +797,11 @@ static void fanotify_free_event(struct fsnotify_group *group, case FANOTIFY_EVENT_TYPE_OVERFLOW: kfree(event); break; - case FANOTIFY_EVENT_TYPE_FS_ERROR: - fanotify_free_error_event(group, event); - break; default: WARN_ON_ONCE(1); } } -static void fanotify_freeing_mark(struct fsnotify_mark *mark, - struct fsnotify_group *group) -{ - if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) - dec_ucount(group->fanotify_data.ucounts, UCOUNT_FANOTIFY_MARKS); -} - static void fanotify_free_mark(struct fsnotify_mark *fsn_mark) { kmem_cache_free(fanotify_mark_cache, fsn_mark); @@ -1071,6 +811,5 @@ const struct fsnotify_ops fanotify_fsnotify_ops = { .handle_event = fanotify_handle_event, .free_group_priv = fanotify_free_group_priv, .free_event = fanotify_free_event, - .freeing_mark = fanotify_freeing_mark, .free_mark = fanotify_free_mark, }; diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 57f51a9a3015..896c819a1786 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -3,7 +3,6 @@ #include #include #include -#include extern struct kmem_cache *fanotify_mark_cache; extern struct kmem_cache *fanotify_fid_event_cachep; @@ -40,45 +39,15 @@ struct fanotify_fh { struct fanotify_info { /* size of dir_fh/file_fh including fanotify_fh hdr size */ u8 dir_fh_totlen; - u8 dir2_fh_totlen; u8 file_fh_totlen; u8 name_len; - u8 name2_len; - u8 pad[3]; + u8 pad; unsigned char buf[]; /* * (struct fanotify_fh) dir_fh starts at buf[0] - * (optional) dir2_fh starts at buf[dir_fh_totlen] - * (optional) file_fh starts at buf[dir_fh_totlen + dir2_fh_totlen] - * name starts at buf[dir_fh_totlen + dir2_fh_totlen + file_fh_totlen] - * ... + * (optional) file_fh starts at buf[dir_fh_totlen] + * name starts at buf[dir_fh_totlen + file_fh_totlen] */ -#define FANOTIFY_DIR_FH_SIZE(info) ((info)->dir_fh_totlen) -#define FANOTIFY_DIR2_FH_SIZE(info) ((info)->dir2_fh_totlen) -#define FANOTIFY_FILE_FH_SIZE(info) ((info)->file_fh_totlen) -#define FANOTIFY_NAME_SIZE(info) ((info)->name_len + 1) -#define FANOTIFY_NAME2_SIZE(info) ((info)->name2_len + 1) - -#define FANOTIFY_DIR_FH_OFFSET(info) 0 -#define FANOTIFY_DIR2_FH_OFFSET(info) \ - (FANOTIFY_DIR_FH_OFFSET(info) + FANOTIFY_DIR_FH_SIZE(info)) -#define FANOTIFY_FILE_FH_OFFSET(info) \ - (FANOTIFY_DIR2_FH_OFFSET(info) + FANOTIFY_DIR2_FH_SIZE(info)) -#define FANOTIFY_NAME_OFFSET(info) \ - (FANOTIFY_FILE_FH_OFFSET(info) + FANOTIFY_FILE_FH_SIZE(info)) -#define FANOTIFY_NAME2_OFFSET(info) \ - (FANOTIFY_NAME_OFFSET(info) + FANOTIFY_NAME_SIZE(info)) - -#define FANOTIFY_DIR_FH_BUF(info) \ - ((info)->buf + FANOTIFY_DIR_FH_OFFSET(info)) -#define FANOTIFY_DIR2_FH_BUF(info) \ - ((info)->buf + FANOTIFY_DIR2_FH_OFFSET(info)) -#define FANOTIFY_FILE_FH_BUF(info) \ - ((info)->buf + FANOTIFY_FILE_FH_OFFSET(info)) -#define FANOTIFY_NAME_BUF(info) \ - ((info)->buf + FANOTIFY_NAME_OFFSET(info)) -#define FANOTIFY_NAME2_BUF(info) \ - ((info)->buf + FANOTIFY_NAME2_OFFSET(info)) } __aligned(4); static inline bool fanotify_fh_has_ext_buf(struct fanotify_fh *fh) @@ -117,21 +86,7 @@ static inline struct fanotify_fh *fanotify_info_dir_fh(struct fanotify_info *inf { BUILD_BUG_ON(offsetof(struct fanotify_info, buf) % 4); - return (struct fanotify_fh *)FANOTIFY_DIR_FH_BUF(info); -} - -static inline int fanotify_info_dir2_fh_len(struct fanotify_info *info) -{ - if (!info->dir2_fh_totlen || - WARN_ON_ONCE(info->dir2_fh_totlen < FANOTIFY_FH_HDR_LEN)) - return 0; - - return info->dir2_fh_totlen - FANOTIFY_FH_HDR_LEN; -} - -static inline struct fanotify_fh *fanotify_info_dir2_fh(struct fanotify_info *info) -{ - return (struct fanotify_fh *)FANOTIFY_DIR2_FH_BUF(info); + return (struct fanotify_fh *)info->buf; } static inline int fanotify_info_file_fh_len(struct fanotify_info *info) @@ -145,90 +100,27 @@ static inline int fanotify_info_file_fh_len(struct fanotify_info *info) static inline struct fanotify_fh *fanotify_info_file_fh(struct fanotify_info *info) { - return (struct fanotify_fh *)FANOTIFY_FILE_FH_BUF(info); + return (struct fanotify_fh *)(info->buf + info->dir_fh_totlen); } -static inline char *fanotify_info_name(struct fanotify_info *info) +static inline const char *fanotify_info_name(struct fanotify_info *info) { - if (!info->name_len) - return NULL; - - return FANOTIFY_NAME_BUF(info); -} - -static inline char *fanotify_info_name2(struct fanotify_info *info) -{ - if (!info->name2_len) - return NULL; - - return FANOTIFY_NAME2_BUF(info); + return info->buf + info->dir_fh_totlen + info->file_fh_totlen; } static inline void fanotify_info_init(struct fanotify_info *info) { - BUILD_BUG_ON(FANOTIFY_FH_HDR_LEN + MAX_HANDLE_SZ > U8_MAX); - BUILD_BUG_ON(NAME_MAX > U8_MAX); - info->dir_fh_totlen = 0; - info->dir2_fh_totlen = 0; info->file_fh_totlen = 0; info->name_len = 0; - info->name2_len = 0; -} - -/* These set/copy helpers MUST be called by order */ -static inline void fanotify_info_set_dir_fh(struct fanotify_info *info, - unsigned int totlen) -{ - if (WARN_ON_ONCE(info->dir2_fh_totlen > 0) || - WARN_ON_ONCE(info->file_fh_totlen > 0) || - WARN_ON_ONCE(info->name_len > 0) || - WARN_ON_ONCE(info->name2_len > 0)) - return; - - info->dir_fh_totlen = totlen; -} - -static inline void fanotify_info_set_dir2_fh(struct fanotify_info *info, - unsigned int totlen) -{ - if (WARN_ON_ONCE(info->file_fh_totlen > 0) || - WARN_ON_ONCE(info->name_len > 0) || - WARN_ON_ONCE(info->name2_len > 0)) - return; - - info->dir2_fh_totlen = totlen; -} - -static inline void fanotify_info_set_file_fh(struct fanotify_info *info, - unsigned int totlen) -{ - if (WARN_ON_ONCE(info->name_len > 0) || - WARN_ON_ONCE(info->name2_len > 0)) - return; - - info->file_fh_totlen = totlen; } static inline void fanotify_info_copy_name(struct fanotify_info *info, const struct qstr *name) { - if (WARN_ON_ONCE(name->len > NAME_MAX) || - WARN_ON_ONCE(info->name2_len > 0)) - return; - info->name_len = name->len; - strcpy(fanotify_info_name(info), name->name); -} - -static inline void fanotify_info_copy_name2(struct fanotify_info *info, - const struct qstr *name) -{ - if (WARN_ON_ONCE(name->len > NAME_MAX)) - return; - - info->name2_len = name->len; - strcpy(fanotify_info_name2(info), name->name); + strcpy(info->buf + info->dir_fh_totlen + info->file_fh_totlen, + name->name); } /* @@ -243,48 +135,29 @@ enum fanotify_event_type { FANOTIFY_EVENT_TYPE_PATH, FANOTIFY_EVENT_TYPE_PATH_PERM, FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */ - FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */ - __FANOTIFY_EVENT_TYPE_NUM }; -#define FANOTIFY_EVENT_TYPE_BITS \ - (ilog2(__FANOTIFY_EVENT_TYPE_NUM - 1) + 1) -#define FANOTIFY_EVENT_HASH_BITS \ - (32 - FANOTIFY_EVENT_TYPE_BITS) - struct fanotify_event { struct fsnotify_event fse; - struct hlist_node merge_list; /* List for hashed merge */ u32 mask; - struct { - unsigned int type : FANOTIFY_EVENT_TYPE_BITS; - unsigned int hash : FANOTIFY_EVENT_HASH_BITS; - }; + enum fanotify_event_type type; struct pid *pid; }; static inline void fanotify_init_event(struct fanotify_event *event, - unsigned int hash, u32 mask) + unsigned long id, u32 mask) { - fsnotify_init_event(&event->fse); - INIT_HLIST_NODE(&event->merge_list); - event->hash = hash; + fsnotify_init_event(&event->fse, id); event->mask = mask; event->pid = NULL; } -#define FANOTIFY_INLINE_FH(name, size) \ -struct { \ - struct fanotify_fh (name); \ - /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \ - unsigned char _inline_fh_buf[(size)]; \ -} - struct fanotify_fid_event { struct fanotify_event fae; __kernel_fsid_t fsid; - - FANOTIFY_INLINE_FH(object_fh, FANOTIFY_INLINE_FH_LEN); + struct fanotify_fh object_fh; + /* Reserve space in object_fh.buf[] - access with fanotify_fh_buf() */ + unsigned char _inline_fh_buf[FANOTIFY_INLINE_FH_LEN]; }; static inline struct fanotify_fid_event * @@ -305,30 +178,12 @@ FANOTIFY_NE(struct fanotify_event *event) return container_of(event, struct fanotify_name_event, fae); } -struct fanotify_error_event { - struct fanotify_event fae; - s32 error; /* Error reported by the Filesystem. */ - u32 err_count; /* Suppressed errors count */ - - __kernel_fsid_t fsid; /* FSID this error refers to. */ - - FANOTIFY_INLINE_FH(object_fh, MAX_HANDLE_SZ); -}; - -static inline struct fanotify_error_event * -FANOTIFY_EE(struct fanotify_event *event) -{ - return container_of(event, struct fanotify_error_event, fae); -} - static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_FID) return &FANOTIFY_FE(event)->fsid; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return &FANOTIFY_NE(event)->fsid; - else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) - return &FANOTIFY_EE(event)->fsid; else return NULL; } @@ -340,8 +195,6 @@ static inline struct fanotify_fh *fanotify_event_object_fh( return &FANOTIFY_FE(event)->object_fh; else if (event->type == FANOTIFY_EVENT_TYPE_FID_NAME) return fanotify_info_file_fh(&FANOTIFY_NE(event)->info); - else if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) - return &FANOTIFY_EE(event)->object_fh; else return NULL; } @@ -373,37 +226,6 @@ static inline int fanotify_event_dir_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir_fh_len(info) : 0; } -static inline int fanotify_event_dir2_fh_len(struct fanotify_event *event) -{ - struct fanotify_info *info = fanotify_event_info(event); - - return info ? fanotify_info_dir2_fh_len(info) : 0; -} - -static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) -{ - /* For error events, even zeroed fh are reported. */ - if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) - return true; - return fanotify_event_object_fh_len(event) > 0; -} - -static inline bool fanotify_event_has_dir_fh(struct fanotify_event *event) -{ - return fanotify_event_dir_fh_len(event) > 0; -} - -static inline bool fanotify_event_has_dir2_fh(struct fanotify_event *event) -{ - return fanotify_event_dir2_fh_len(event) > 0; -} - -static inline bool fanotify_event_has_any_dir_fh(struct fanotify_event *event) -{ - return fanotify_event_has_dir_fh(event) || - fanotify_event_has_dir2_fh(event); -} - struct fanotify_path_event { struct fanotify_event fae; struct path path; @@ -447,12 +269,13 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse) return container_of(fse, struct fanotify_event, fse); } -static inline bool fanotify_is_error_event(u32 mask) +static inline bool fanotify_event_has_path(struct fanotify_event *event) { - return mask & FAN_FS_ERROR; + return event->type == FANOTIFY_EVENT_TYPE_PATH || + event->type == FANOTIFY_EVENT_TYPE_PATH_PERM; } -static inline const struct path *fanotify_event_path(struct fanotify_event *event) +static inline struct path *fanotify_event_path(struct fanotify_event *event) { if (event->type == FANOTIFY_EVENT_TYPE_PATH) return &FANOTIFY_PE(event)->path; @@ -461,40 +284,3 @@ static inline const struct path *fanotify_event_path(struct fanotify_event *even else return NULL; } - -/* - * Use 128 size hash table to speed up events merge. - */ -#define FANOTIFY_HTABLE_BITS (7) -#define FANOTIFY_HTABLE_SIZE (1 << FANOTIFY_HTABLE_BITS) -#define FANOTIFY_HTABLE_MASK (FANOTIFY_HTABLE_SIZE - 1) - -/* - * Permission events and overflow event do not get merged - don't hash them. - */ -static inline bool fanotify_is_hashed_event(u32 mask) -{ - return !(fanotify_is_perm_event(mask) || - fsnotify_is_overflow_event(mask)); -} - -static inline unsigned int fanotify_event_hash_bucket( - struct fsnotify_group *group, - struct fanotify_event *event) -{ - return event->hash & FANOTIFY_HTABLE_MASK; -} - -static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark) -{ - unsigned int mflags = 0; - - if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) - mflags |= FAN_MARK_IGNORED_SURV_MODIFY; - if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF) - mflags |= FAN_MARK_EVICTABLE; - if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) - mflags |= FAN_MARK_IGNORE; - - return mflags; -} diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 5302313f28be..84de9f97bbc0 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1,7 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include #include -#include #include #include #include @@ -28,62 +27,8 @@ #include "fanotify.h" #define FANOTIFY_DEFAULT_MAX_EVENTS 16384 -#define FANOTIFY_OLD_DEFAULT_MAX_MARKS 8192 -#define FANOTIFY_DEFAULT_MAX_GROUPS 128 -#define FANOTIFY_DEFAULT_FEE_POOL_SIZE 32 - -/* - * Legacy fanotify marks limits (8192) is per group and we introduced a tunable - * limit of marks per user, similar to inotify. Effectively, the legacy limit - * of fanotify marks per user is * . - * This default limit (1M) also happens to match the increased limit of inotify - * max_user_watches since v5.10. - */ -#define FANOTIFY_DEFAULT_MAX_USER_MARKS \ - (FANOTIFY_OLD_DEFAULT_MAX_MARKS * FANOTIFY_DEFAULT_MAX_GROUPS) - -/* - * Most of the memory cost of adding an inode mark is pinning the marked inode. - * The size of the filesystem inode struct is not uniform across filesystems, - * so double the size of a VFS inode is used as a conservative approximation. - */ -#define INODE_MARK_COST (2 * sizeof(struct inode)) - -/* configurable via /proc/sys/fs/fanotify/ */ -static int fanotify_max_queued_events __read_mostly; - -#ifdef CONFIG_SYSCTL - -#include - -struct ctl_table fanotify_table[] = { - { - .procname = "max_user_groups", - .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS], - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - }, - { - .procname = "max_user_marks", - .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS], - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - }, - { - .procname = "max_queued_events", - .data = &fanotify_max_queued_events, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO - }, - { } -}; -#endif /* CONFIG_SYSCTL */ +#define FANOTIFY_DEFAULT_MAX_MARKS 8192 +#define FANOTIFY_DEFAULT_MAX_LISTENERS 128 /* * All flags that may be specified in parameter event_f_flags of fanotify_init. @@ -106,12 +51,8 @@ struct kmem_cache *fanotify_path_event_cachep __read_mostly; struct kmem_cache *fanotify_perm_event_cachep __read_mostly; #define FANOTIFY_EVENT_ALIGN 4 -#define FANOTIFY_FID_INFO_HDR_LEN \ +#define FANOTIFY_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) -#define FANOTIFY_PIDFD_INFO_HDR_LEN \ - sizeof(struct fanotify_event_info_pidfd) -#define FANOTIFY_ERROR_INFO_LEN \ - (sizeof(struct fanotify_event_info_error)) static int fanotify_fid_info_len(int fh_len, int name_len) { @@ -120,45 +61,21 @@ static int fanotify_fid_info_len(int fh_len, int name_len) if (name_len) info_len += name_len + 1; - return roundup(FANOTIFY_FID_INFO_HDR_LEN + info_len, - FANOTIFY_EVENT_ALIGN); + return roundup(FANOTIFY_INFO_HDR_LEN + info_len, FANOTIFY_EVENT_ALIGN); } -/* FAN_RENAME may have one or two dir+name info records */ -static int fanotify_dir_name_info_len(struct fanotify_event *event) +static int fanotify_event_info_len(unsigned int fid_mode, + struct fanotify_event *event) { struct fanotify_info *info = fanotify_event_info(event); int dir_fh_len = fanotify_event_dir_fh_len(event); - int dir2_fh_len = fanotify_event_dir2_fh_len(event); + int fh_len = fanotify_event_object_fh_len(event); int info_len = 0; - - if (dir_fh_len) - info_len += fanotify_fid_info_len(dir_fh_len, - info->name_len); - if (dir2_fh_len) - info_len += fanotify_fid_info_len(dir2_fh_len, - info->name2_len); - - return info_len; -} - -static size_t fanotify_event_len(unsigned int info_mode, - struct fanotify_event *event) -{ - size_t event_len = FAN_EVENT_METADATA_LEN; - int fh_len; int dot_len = 0; - if (!info_mode) - return event_len; - - if (fanotify_is_error_event(event->mask)) - event_len += FANOTIFY_ERROR_INFO_LEN; - - if (fanotify_event_has_any_dir_fh(event)) { - event_len += fanotify_dir_name_info_len(event); - } else if ((info_mode & FAN_REPORT_NAME) && - (event->mask & FAN_ONDIR)) { + if (dir_fh_len) { + info_len += fanotify_fid_info_len(dir_fh_len, info->name_len); + } else if ((fid_mode & FAN_REPORT_NAME) && (event->mask & FAN_ONDIR)) { /* * With group flag FAN_REPORT_NAME, if name was not recorded in * event on a directory, we will report the name ".". @@ -166,32 +83,10 @@ static size_t fanotify_event_len(unsigned int info_mode, dot_len = 1; } - if (info_mode & FAN_REPORT_PIDFD) - event_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + if (fh_len) + info_len += fanotify_fid_info_len(fh_len, dot_len); - if (fanotify_event_has_object_fh(event)) { - fh_len = fanotify_event_object_fh_len(event); - event_len += fanotify_fid_info_len(fh_len, dot_len); - } - - return event_len; -} - -/* - * Remove an hashed event from merge hash table. - */ -static void fanotify_unhash_event(struct fsnotify_group *group, - struct fanotify_event *event) -{ - assert_spin_locked(&group->notification_lock); - - pr_debug("%s: group=%p event=%p bucket=%u\n", __func__, - group, event, fanotify_event_hash_bucket(group, event)); - - if (WARN_ON_ONCE(hlist_unhashed(&event->merge_list))) - return; - - hlist_del_init(&event->merge_list); + return info_len; } /* @@ -203,41 +98,34 @@ static void fanotify_unhash_event(struct fsnotify_group *group, static struct fanotify_event *get_one_event(struct fsnotify_group *group, size_t count) { - size_t event_size; + size_t event_size = FAN_EVENT_METADATA_LEN; struct fanotify_event *event = NULL; - struct fsnotify_event *fsn_event; - unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); pr_debug("%s: group=%p count=%zd\n", __func__, group, count); spin_lock(&group->notification_lock); - fsn_event = fsnotify_peek_first_event(group); - if (!fsn_event) + if (fsnotify_notify_queue_is_empty(group)) goto out; - event = FANOTIFY_E(fsn_event); - event_size = fanotify_event_len(info_mode, event); + if (fid_mode) { + event_size += fanotify_event_info_len(fid_mode, + FANOTIFY_E(fsnotify_peek_first_event(group))); + } if (event_size > count) { event = ERR_PTR(-EINVAL); goto out; } - - /* - * Held the notification_lock the whole time, so this is the - * same event we peeked above. - */ - fsnotify_remove_first_event(group); + event = FANOTIFY_E(fsnotify_remove_first_event(group)); if (fanotify_is_perm_event(event->mask)) FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED; - if (fanotify_is_hashed_event(event->mask)) - fanotify_unhash_event(group, event); out: spin_unlock(&group->notification_lock); return event; } -static int create_fd(struct fsnotify_group *group, const struct path *path, +static int create_fd(struct fsnotify_group *group, struct path *path, struct file **file) { int client_fd; @@ -252,7 +140,7 @@ static int create_fd(struct fsnotify_group *group, const struct path *path, * originally opened O_WRONLY. */ new_file = dentry_open(path, - group->fanotify_data.f_flags | __FMODE_NONOTIFY, + group->fanotify_data.f_flags | FMODE_NONOTIFY, current_cred()); if (IS_ERR(new_file)) { /* @@ -337,31 +225,9 @@ static int process_access_response(struct fsnotify_group *group, return -ENOENT; } -static size_t copy_error_info_to_user(struct fanotify_event *event, - char __user *buf, int count) -{ - struct fanotify_event_info_error info = { }; - struct fanotify_error_event *fee = FANOTIFY_EE(event); - - info.hdr.info_type = FAN_EVENT_INFO_TYPE_ERROR; - info.hdr.len = FANOTIFY_ERROR_INFO_LEN; - - if (WARN_ON(count < info.hdr.len)) - return -EFAULT; - - info.error = fee->error; - info.error_count = fee->err_count; - - if (copy_to_user(buf, &info, sizeof(info))) - return -EFAULT; - - return info.hdr.len; -} - -static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, - int info_type, const char *name, - size_t name_len, - char __user *buf, size_t count) +static int copy_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, + int info_type, const char *name, size_t name_len, + char __user *buf, size_t count) { struct fanotify_event_info_fid info = { }; struct file_handle handle = { }; @@ -373,6 +239,9 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, pr_debug("%s: fh_len=%zu name_len=%zu, info_len=%zu, count=%zu\n", __func__, fh_len, name_len, info_len, count); + if (!fh_len) + return 0; + if (WARN_ON_ONCE(len < sizeof(info) || len > count)) return -EFAULT; @@ -387,8 +256,6 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return -EFAULT; break; case FAN_EVENT_INFO_TYPE_DFID_NAME: - case FAN_EVENT_INFO_TYPE_OLD_DFID_NAME: - case FAN_EVENT_INFO_TYPE_NEW_DFID_NAME: if (WARN_ON_ONCE(!name || !name_len)) return -EFAULT; break; @@ -409,11 +276,6 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, handle.handle_type = fh->type; handle.handle_bytes = fh_len; - - /* Mangle handle_type for bad file_handle */ - if (!fh_len) - handle.handle_type = FILEID_INVALID; - if (copy_to_user(buf, &handle, sizeof(handle))) return -EFAULT; @@ -458,79 +320,68 @@ static int copy_fid_info_to_user(__kernel_fsid_t *fsid, struct fanotify_fh *fh, return info_len; } -static int copy_pidfd_info_to_user(int pidfd, - char __user *buf, - size_t count) +static ssize_t copy_event_to_user(struct fsnotify_group *group, + struct fanotify_event *event, + char __user *buf, size_t count) { - struct fanotify_event_info_pidfd info = { }; - size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN; + struct fanotify_event_metadata metadata; + struct path *path = fanotify_event_path(event); + struct fanotify_info *info = fanotify_event_info(event); + unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); + struct file *f = NULL; + int ret, fd = FAN_NOFD; + int info_type = 0; - if (WARN_ON_ONCE(info_len > count)) - return -EFAULT; + pr_debug("%s: group=%p event=%p\n", __func__, group, event); - info.hdr.info_type = FAN_EVENT_INFO_TYPE_PIDFD; - info.hdr.len = info_len; - info.pidfd = pidfd; + metadata.event_len = FAN_EVENT_METADATA_LEN + + fanotify_event_info_len(fid_mode, event); + metadata.metadata_len = FAN_EVENT_METADATA_LEN; + metadata.vers = FANOTIFY_METADATA_VERSION; + metadata.reserved = 0; + metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; + metadata.pid = pid_vnr(event->pid); - if (copy_to_user(buf, &info, info_len)) - return -EFAULT; - - return info_len; -} - -static int copy_info_records_to_user(struct fanotify_event *event, - struct fanotify_info *info, - unsigned int info_mode, int pidfd, - char __user *buf, size_t count) -{ - int ret, total_bytes = 0, info_type = 0; - unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; - unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; + if (path && path->mnt && path->dentry) { + fd = create_fd(group, path, &f); + if (fd < 0) + return fd; + } + metadata.fd = fd; + ret = -EFAULT; /* - * Event info records order is as follows: - * 1. dir fid + name - * 2. (optional) new dir fid + new name - * 3. (optional) child fid + * Sanity check copy size in case get_one_event() and + * event_len sizes ever get out of sync. */ - if (fanotify_event_has_dir_fh(event)) { + if (WARN_ON_ONCE(metadata.event_len > count)) + goto out_close_fd; + + if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN)) + goto out_close_fd; + + buf += FAN_EVENT_METADATA_LEN; + count -= FAN_EVENT_METADATA_LEN; + + if (fanotify_is_perm_event(event->mask)) + FANOTIFY_PERM(event)->fd = fd; + + /* Event info records order is: dir fid + name, child fid */ + if (fanotify_event_dir_fh_len(event)) { info_type = info->name_len ? FAN_EVENT_INFO_TYPE_DFID_NAME : FAN_EVENT_INFO_TYPE_DFID; - - /* FAN_RENAME uses special info types */ - if (event->mask & FAN_RENAME) - info_type = FAN_EVENT_INFO_TYPE_OLD_DFID_NAME; - - ret = copy_fid_info_to_user(fanotify_event_fsid(event), - fanotify_info_dir_fh(info), - info_type, - fanotify_info_name(info), - info->name_len, buf, count); + ret = copy_info_to_user(fanotify_event_fsid(event), + fanotify_info_dir_fh(info), + info_type, fanotify_info_name(info), + info->name_len, buf, count); if (ret < 0) - return ret; + goto out_close_fd; buf += ret; count -= ret; - total_bytes += ret; } - /* New dir fid+name may be reported in addition to old dir fid+name */ - if (fanotify_event_has_dir2_fh(event)) { - info_type = FAN_EVENT_INFO_TYPE_NEW_DFID_NAME; - ret = copy_fid_info_to_user(fanotify_event_fsid(event), - fanotify_info_dir2_fh(info), - info_type, - fanotify_info_name2(info), - info->name2_len, buf, count); - if (ret < 0) - return ret; - - buf += ret; - count -= ret; - total_bytes += ret; - } - - if (fanotify_event_has_object_fh(event)) { + if (fanotify_event_object_fh_len(event)) { const char *dot = NULL; int dot_len = 0; @@ -544,8 +395,8 @@ static int copy_info_records_to_user(struct fanotify_event *event, (event->mask & FAN_ONDIR)) { /* * With group flag FAN_REPORT_NAME, if name was not - * recorded in an event on a directory, report the name - * "." with info type DFID_NAME. + * recorded in an event on a directory, report the + * name "." with info type DFID_NAME. */ info_type = FAN_EVENT_INFO_TYPE_DFID_NAME; dot = "."; @@ -568,132 +419,14 @@ static int copy_info_records_to_user(struct fanotify_event *event, info_type = FAN_EVENT_INFO_TYPE_FID; } - ret = copy_fid_info_to_user(fanotify_event_fsid(event), - fanotify_event_object_fh(event), - info_type, dot, dot_len, - buf, count); - if (ret < 0) - return ret; - - buf += ret; - count -= ret; - total_bytes += ret; - } - - if (pidfd_mode) { - ret = copy_pidfd_info_to_user(pidfd, buf, count); - if (ret < 0) - return ret; - - buf += ret; - count -= ret; - total_bytes += ret; - } - - if (fanotify_is_error_event(event->mask)) { - ret = copy_error_info_to_user(event, buf, count); - if (ret < 0) - return ret; - buf += ret; - count -= ret; - total_bytes += ret; - } - - return total_bytes; -} - -static ssize_t copy_event_to_user(struct fsnotify_group *group, - struct fanotify_event *event, - char __user *buf, size_t count) -{ - struct fanotify_event_metadata metadata; - const struct path *path = fanotify_event_path(event); - struct fanotify_info *info = fanotify_event_info(event); - unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES); - unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; - struct file *f = NULL; - int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD; - - pr_debug("%s: group=%p event=%p\n", __func__, group, event); - - metadata.event_len = fanotify_event_len(info_mode, event); - metadata.metadata_len = FAN_EVENT_METADATA_LEN; - metadata.vers = FANOTIFY_METADATA_VERSION; - metadata.reserved = 0; - metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS; - metadata.pid = pid_vnr(event->pid); - /* - * For an unprivileged listener, event->pid can be used to identify the - * events generated by the listener process itself, without disclosing - * the pids of other processes. - */ - if (FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && - task_tgid(current) != event->pid) - metadata.pid = 0; - - /* - * For now, fid mode is required for an unprivileged listener and - * fid mode does not report fd in events. Keep this check anyway - * for safety in case fid mode requirement is relaxed in the future - * to allow unprivileged listener to get events with no fd and no fid. - */ - if (!FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV) && - path && path->mnt && path->dentry) { - fd = create_fd(group, path, &f); - if (fd < 0) - return fd; - } - metadata.fd = fd; - - if (pidfd_mode) { - /* - * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual - * exclusion is ever lifted. At the time of incoporating pidfd - * support within fanotify, the pidfd API only supported the - * creation of pidfds for thread-group leaders. - */ - WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID)); - - /* - * The PIDTYPE_TGID check for an event->pid is performed - * preemptively in an attempt to catch out cases where the event - * listener reads events after the event generating process has - * already terminated. Report FAN_NOPIDFD to the event listener - * in those cases, with all other pidfd creation errors being - * reported as FAN_EPIDFD. - */ - if (metadata.pid == 0 || - !pid_has_task(event->pid, PIDTYPE_TGID)) { - pidfd = FAN_NOPIDFD; - } else { - pidfd = pidfd_create(event->pid, 0); - if (pidfd < 0) - pidfd = FAN_EPIDFD; - } - } - - ret = -EFAULT; - /* - * Sanity check copy size in case get_one_event() and - * event_len sizes ever get out of sync. - */ - if (WARN_ON_ONCE(metadata.event_len > count)) - goto out_close_fd; - - if (copy_to_user(buf, &metadata, FAN_EVENT_METADATA_LEN)) - goto out_close_fd; - - buf += FAN_EVENT_METADATA_LEN; - count -= FAN_EVENT_METADATA_LEN; - - if (fanotify_is_perm_event(event->mask)) - FANOTIFY_PERM(event)->fd = fd; - - if (info_mode) { - ret = copy_info_records_to_user(event, info, info_mode, pidfd, - buf, count); + ret = copy_info_to_user(fanotify_event_fsid(event), + fanotify_event_object_fh(event), + info_type, dot, dot_len, buf, count); if (ret < 0) goto out_close_fd; + + buf += ret; + count -= ret; } if (f) @@ -706,10 +439,6 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, put_unused_fd(fd); fput(f); } - - if (pidfd >= 0) - close_fd(pidfd); - return ret; } @@ -844,7 +573,6 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t static int fanotify_release(struct inode *ignored, struct file *file) { struct fsnotify_group *group = file->private_data; - struct fsnotify_event *fsn_event; /* * Stop new events from arriving in the notification queue. since @@ -873,12 +601,13 @@ static int fanotify_release(struct inode *ignored, struct file *file) * dequeue them and set the response. They will be freed once the * response is consumed and fanotify_get_response() returns. */ - while ((fsn_event = fsnotify_remove_first_event(group))) { - struct fanotify_event *event = FANOTIFY_E(fsn_event); + while (!fsnotify_notify_queue_is_empty(group)) { + struct fanotify_event *event; + event = FANOTIFY_E(fsnotify_remove_first_event(group)); if (!(event->mask & FANOTIFY_PERM_EVENTS)) { spin_unlock(&group->notification_lock); - fsnotify_destroy_event(group, fsn_event); + fsnotify_destroy_event(group, &event->fse); } else { finish_permission_event(group, FANOTIFY_PERM(event), FAN_ALLOW); @@ -973,7 +702,7 @@ static int fanotify_find_path(int dfd, const char __user *filename, } /* you can only watch an inode if you have read permissions on it */ - ret = path_permission(path, MAY_READ); + ret = inode_permission(path->dentry->d_inode, MAY_READ); if (ret) { path_put(path); goto out; @@ -991,28 +720,27 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark, __u32 mask, unsigned int flags, __u32 umask, int *destroy) { - __u32 oldmask, newmask; + __u32 oldmask = 0; /* umask bits cannot be removed by user */ mask &= ~umask; spin_lock(&fsn_mark->lock); - oldmask = fsnotify_calc_mask(fsn_mark); - if (!(flags & FANOTIFY_MARK_IGNORE_BITS)) { + if (!(flags & FAN_MARK_IGNORED_MASK)) { + oldmask = fsn_mark->mask; fsn_mark->mask &= ~mask; } else { - fsn_mark->ignore_mask &= ~mask; + fsn_mark->ignored_mask &= ~mask; } - newmask = fsnotify_calc_mask(fsn_mark); /* * We need to keep the mark around even if remaining mask cannot * result in any events (e.g. mask == FAN_ONDIR) to support incremenal * changes to the mask. * Destroy mark when only umask bits remain. */ - *destroy = !((fsn_mark->mask | fsn_mark->ignore_mask) & ~umask); + *destroy = !((fsn_mark->mask | fsn_mark->ignored_mask) & ~umask); spin_unlock(&fsn_mark->lock); - return oldmask & ~newmask; + return mask & oldmask; } static int fanotify_remove_mark(struct fsnotify_group *group, @@ -1023,10 +751,10 @@ static int fanotify_remove_mark(struct fsnotify_group *group, __u32 removed; int destroy_mark; - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); return -ENOENT; } @@ -1036,7 +764,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group, fsnotify_recalc_mask(fsn_mark->connector); if (destroy_mark) fsnotify_detach_mark(fsn_mark); - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); if (destroy_mark) fsnotify_free_mark(fsn_mark); @@ -1069,199 +797,76 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group, flags, umask); } -static bool fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark, - unsigned int fan_flags) +static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, + __u32 mask, + unsigned int flags) { - bool want_iref = !(fan_flags & FAN_MARK_EVICTABLE); - unsigned int ignore = fan_flags & FANOTIFY_MARK_IGNORE_BITS; - bool recalc = false; - - /* - * When using FAN_MARK_IGNORE for the first time, mark starts using - * independent event flags in ignore mask. After that, trying to - * update the ignore mask with the old FAN_MARK_IGNORED_MASK API - * will result in EEXIST error. - */ - if (ignore == FAN_MARK_IGNORE) - fsn_mark->flags |= FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS; - - /* - * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to - * the removal of the FS_MODIFY bit in calculated mask if it was set - * because of an ignore mask that is now going to survive FS_MODIFY. - */ - if (ignore && (fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && - !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) { - fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; - if (!(fsn_mark->mask & FS_MODIFY)) - recalc = true; - } - - if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE || - want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) - return recalc; - - /* - * NO_IREF may be removed from a mark, but not added. - * When removed, fsnotify_recalc_mask() will take the inode ref. - */ - WARN_ON_ONCE(!want_iref); - fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF; - - return true; -} - -static bool fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark, - __u32 mask, unsigned int fan_flags) -{ - bool recalc; + __u32 oldmask = -1; spin_lock(&fsn_mark->lock); - if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS)) + if (!(flags & FAN_MARK_IGNORED_MASK)) { + oldmask = fsn_mark->mask; fsn_mark->mask |= mask; - else - fsn_mark->ignore_mask |= mask; - - recalc = fsnotify_calc_mask(fsn_mark) & - ~fsnotify_conn_mask(fsn_mark->connector); - - recalc |= fanotify_mark_update_flags(fsn_mark, fan_flags); + } else { + fsn_mark->ignored_mask |= mask; + if (flags & FAN_MARK_IGNORED_SURV_MODIFY) + fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY; + } spin_unlock(&fsn_mark->lock); - return recalc; + return mask & ~oldmask; } static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group, fsnotify_connp_t *connp, - unsigned int obj_type, - unsigned int fan_flags, + unsigned int type, __kernel_fsid_t *fsid) { - struct ucounts *ucounts = group->fanotify_data.ucounts; struct fsnotify_mark *mark; int ret; - /* - * Enforce per user marks limits per user in all containing user ns. - * A group with FAN_UNLIMITED_MARKS does not contribute to mark count - * in the limited groups account. - */ - if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS) && - !inc_ucount(ucounts->ns, ucounts->uid, UCOUNT_FANOTIFY_MARKS)) + if (atomic_read(&group->num_marks) > group->fanotify_data.max_marks) return ERR_PTR(-ENOSPC); mark = kmem_cache_alloc(fanotify_mark_cache, GFP_KERNEL); - if (!mark) { - ret = -ENOMEM; - goto out_dec_ucounts; - } + if (!mark) + return ERR_PTR(-ENOMEM); fsnotify_init_mark(mark, group); - if (fan_flags & FAN_MARK_EVICTABLE) - mark->flags |= FSNOTIFY_MARK_FLAG_NO_IREF; - - ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid); + ret = fsnotify_add_mark_locked(mark, connp, type, 0, fsid); if (ret) { fsnotify_put_mark(mark); - goto out_dec_ucounts; + return ERR_PTR(ret); } return mark; - -out_dec_ucounts: - if (!FAN_GROUP_FLAG(group, FAN_UNLIMITED_MARKS)) - dec_ucount(ucounts, UCOUNT_FANOTIFY_MARKS); - return ERR_PTR(ret); } -static int fanotify_group_init_error_pool(struct fsnotify_group *group) -{ - if (mempool_initialized(&group->fanotify_data.error_events_pool)) - return 0; - - return mempool_init_kmalloc_pool(&group->fanotify_data.error_events_pool, - FANOTIFY_DEFAULT_FEE_POOL_SIZE, - sizeof(struct fanotify_error_event)); -} - -static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark, - unsigned int fan_flags) -{ - /* - * Non evictable mark cannot be downgraded to evictable mark. - */ - if (fan_flags & FAN_MARK_EVICTABLE && - !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) - return -EEXIST; - - /* - * New ignore mask semantics cannot be downgraded to old semantics. - */ - if (fan_flags & FAN_MARK_IGNORED_MASK && - fsn_mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) - return -EEXIST; - - /* - * An ignore mask that survives modify could never be downgraded to not - * survive modify. With new FAN_MARK_IGNORE semantics we make that rule - * explicit and return an error when trying to update the ignore mask - * without the original FAN_MARK_IGNORED_SURV_MODIFY value. - */ - if (fan_flags & FAN_MARK_IGNORE && - !(fan_flags & FAN_MARK_IGNORED_SURV_MODIFY) && - fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) - return -EEXIST; - - return 0; -} static int fanotify_add_mark(struct fsnotify_group *group, - fsnotify_connp_t *connp, unsigned int obj_type, - __u32 mask, unsigned int fan_flags, + fsnotify_connp_t *connp, unsigned int type, + __u32 mask, unsigned int flags, __kernel_fsid_t *fsid) { struct fsnotify_mark *fsn_mark; - bool recalc; - int ret = 0; + __u32 added; - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); fsn_mark = fsnotify_find_mark(connp, group); if (!fsn_mark) { - fsn_mark = fanotify_add_new_mark(group, connp, obj_type, - fan_flags, fsid); + fsn_mark = fanotify_add_new_mark(group, connp, type, fsid); if (IS_ERR(fsn_mark)) { - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); return PTR_ERR(fsn_mark); } } - - /* - * Check if requested mark flags conflict with an existing mark flags. - */ - ret = fanotify_may_update_existing_mark(fsn_mark, fan_flags); - if (ret) - goto out; - - /* - * Error events are pre-allocated per group, only if strictly - * needed (i.e. FAN_FS_ERROR was requested). - */ - if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS) && - (mask & FAN_FS_ERROR)) { - ret = fanotify_group_init_error_pool(group); - if (ret) - goto out; - } - - recalc = fanotify_mark_add_to_mask(fsn_mark, mask, fan_flags); - if (recalc) + added = fanotify_mark_add_to_mask(fsn_mark, mask, flags); + if (added & ~fsnotify_conn_mask(fsn_mark->connector)) fsnotify_recalc_mask(fsn_mark->connector); - -out: - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); fsnotify_put_mark(fsn_mark); - return ret; + return 0; } static int fanotify_add_vfsmount_mark(struct fsnotify_group *group, @@ -1288,10 +893,10 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group, /* * If some other task has this inode open for write we should not add - * an ignore mask, unless that ignore mask is supposed to survive + * an ignored mark, unless that ignored mark is supposed to survive * modification changes anyway. */ - if ((flags & FANOTIFY_MARK_IGNORE_BITS) && + if ((flags & FAN_MARK_IGNORED_MASK) && !(flags & FAN_MARK_IGNORED_SURV_MODIFY) && inode_is_open_for_write(inode)) return 0; @@ -1314,49 +919,20 @@ static struct fsnotify_event *fanotify_alloc_overflow_event(void) return &oevent->fse; } -static struct hlist_head *fanotify_alloc_merge_hash(void) -{ - struct hlist_head *hash; - - hash = kmalloc(sizeof(struct hlist_head) << FANOTIFY_HTABLE_BITS, - GFP_KERNEL_ACCOUNT); - if (!hash) - return NULL; - - __hash_init(hash, FANOTIFY_HTABLE_SIZE); - - return hash; -} - /* fanotify syscalls */ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) { struct fsnotify_group *group; int f_flags, fd; + struct user_struct *user; unsigned int fid_mode = flags & FANOTIFY_FID_BITS; unsigned int class = flags & FANOTIFY_CLASS_BITS; - unsigned int internal_flags = 0; pr_debug("%s: flags=%x event_f_flags=%x\n", __func__, flags, event_f_flags); - if (!capable(CAP_SYS_ADMIN)) { - /* - * An unprivileged user can setup an fanotify group with - * limited functionality - an unprivileged group is limited to - * notification events with file handles and it cannot use - * unlimited queue/marks. - */ - if ((flags & FANOTIFY_ADMIN_INIT_FLAGS) || !fid_mode) - return -EPERM; - - /* - * Setting the internal flag FANOTIFY_UNPRIV on the group - * prevents setting mount/filesystem marks on this group and - * prevents reporting pid and open fd in events. - */ - internal_flags |= FANOTIFY_UNPRIV; - } + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; #ifdef CONFIG_AUDITSYSCALL if (flags & ~(FANOTIFY_INIT_FLAGS | FAN_ENABLE_AUDIT)) @@ -1365,14 +941,6 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) #endif return -EINVAL; - /* - * A pidfd can only be returned for a thread-group leader; thus - * FAN_REPORT_PIDFD and FAN_REPORT_TID need to remain mutually - * exclusive. - */ - if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID)) - return -EINVAL; - if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS) return -EINVAL; @@ -1395,46 +963,30 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) if ((fid_mode & FAN_REPORT_NAME) && !(fid_mode & FAN_REPORT_DIR_FID)) return -EINVAL; - /* - * FAN_REPORT_TARGET_FID requires FAN_REPORT_NAME and FAN_REPORT_FID - * and is used as an indication to report both dir and child fid on all - * dirent events. - */ - if ((fid_mode & FAN_REPORT_TARGET_FID) && - (!(fid_mode & FAN_REPORT_NAME) || !(fid_mode & FAN_REPORT_FID))) - return -EINVAL; + user = get_current_user(); + if (atomic_read(&user->fanotify_listeners) > FANOTIFY_DEFAULT_MAX_LISTENERS) { + free_uid(user); + return -EMFILE; + } - f_flags = O_RDWR | __FMODE_NONOTIFY; + f_flags = O_RDWR | FMODE_NONOTIFY; if (flags & FAN_CLOEXEC) f_flags |= O_CLOEXEC; if (flags & FAN_NONBLOCK) f_flags |= O_NONBLOCK; /* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */ - group = fsnotify_alloc_group(&fanotify_fsnotify_ops, - FSNOTIFY_GROUP_USER | FSNOTIFY_GROUP_NOFS); + group = fsnotify_alloc_group(&fanotify_fsnotify_ops); if (IS_ERR(group)) { + free_uid(user); return PTR_ERR(group); } - /* Enforce groups limits per user in all containing user ns */ - group->fanotify_data.ucounts = inc_ucount(current_user_ns(), - current_euid(), - UCOUNT_FANOTIFY_GROUPS); - if (!group->fanotify_data.ucounts) { - fd = -EMFILE; - goto out_destroy_group; - } - - group->fanotify_data.flags = flags | internal_flags; + group->fanotify_data.user = user; + group->fanotify_data.flags = flags; + atomic_inc(&user->fanotify_listeners); group->memcg = get_mem_cgroup_from_mm(current->mm); - group->fanotify_data.merge_hash = fanotify_alloc_merge_hash(); - if (!group->fanotify_data.merge_hash) { - fd = -ENOMEM; - goto out_destroy_group; - } - group->overflow_event = fanotify_alloc_overflow_event(); if (unlikely(!group->overflow_event)) { fd = -ENOMEM; @@ -1467,13 +1019,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) goto out_destroy_group; group->max_events = UINT_MAX; } else { - group->max_events = fanotify_max_queued_events; + group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS; } if (flags & FAN_UNLIMITED_MARKS) { fd = -EPERM; if (!capable(CAP_SYS_ADMIN)) goto out_destroy_group; + group->fanotify_data.max_marks = UINT_MAX; + } else { + group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS; } if (flags & FAN_ENABLE_AUDIT) { @@ -1493,15 +1048,16 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) return fd; } -static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid) +/* Check if filesystem can encode a unique fid */ +static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid) { __kernel_fsid_t root_fsid; int err; /* - * Make sure dentry is not of a filesystem with zero fsid (e.g. fuse). + * Make sure path is not in filesystem with zero fsid (e.g. tmpfs). */ - err = vfs_get_fsid(dentry, fsid); + err = vfs_get_fsid(path->dentry, fsid); if (err) return err; @@ -1509,10 +1065,10 @@ static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid) return -ENODEV; /* - * Make sure dentry is not of a filesystem subvolume (e.g. btrfs) + * Make sure path is not inside a filesystem subvolume (e.g. btrfs) * which uses a different fsid than sb root. */ - err = vfs_get_fsid(dentry->d_sb->s_root, &root_fsid); + err = vfs_get_fsid(path->dentry->d_sb->s_root, &root_fsid); if (err) return err; @@ -1520,12 +1076,6 @@ static int fanotify_test_fsid(struct dentry *dentry, __kernel_fsid_t *fsid) root_fsid.val[1] != fsid->val[1]) return -EXDEV; - return 0; -} - -/* Check if filesystem can encode a unique fid */ -static int fanotify_test_fid(struct dentry *dentry) -{ /* * We need to make sure that the file system supports at least * encoding a file handle so user can use name_to_handle_at() to @@ -1533,22 +1083,17 @@ static int fanotify_test_fid(struct dentry *dentry) * objects. However, name_to_handle_at() requires that the * filesystem also supports decoding file handles. */ - if (!dentry->d_sb->s_export_op || - !dentry->d_sb->s_export_op->fh_to_dentry) + if (!path->dentry->d_sb->s_export_op || + !path->dentry->d_sb->s_export_op->fh_to_dentry) return -EOPNOTSUPP; return 0; } -static int fanotify_events_supported(struct fsnotify_group *group, - const struct path *path, __u64 mask, +static int fanotify_events_supported(struct path *path, __u64 mask, unsigned int flags) { unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; - /* Strict validation of events in non-dir inode mask with v5.17+ APIs */ - bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) || - (mask & FAN_RENAME) || - (flags & FAN_MARK_IGNORE); /* * Some filesystems such as 'proc' acquire unusual locks when opening @@ -1576,15 +1121,6 @@ static int fanotify_events_supported(struct fsnotify_group *group, path->mnt->mnt_sb->s_flags & SB_NOUSER) return -EINVAL; - /* - * We shouldn't have allowed setting dirent events and the directory - * flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode, - * but because we always allowed it, error only when using new APIs. - */ - if (strict_dir_events && mark_type == FAN_MARK_INODE && - !d_is_dir(path->dentry) && (mask & FANOTIFY_DIRONLY_EVENT_BITS)) - return -ENOTDIR; - return 0; } @@ -1599,8 +1135,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __kernel_fsid_t __fsid, *fsid = NULL; u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS; unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS; - unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS; - unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS; + bool ignored = flags & FAN_MARK_IGNORED_MASK; unsigned int obj_type, fid_mode; u32 umask = 0; int ret; @@ -1609,7 +1144,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, __func__, fanotify_fd, flags, dfd, pathname, mask); /* we only use the lower 32 bits as of right now. */ - if (upper_32_bits(mask)) + if (mask & ((__u64)0xffffffff << 32)) return -EINVAL; if (flags & ~FANOTIFY_MARK_FLAGS) @@ -1629,7 +1164,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, return -EINVAL; } - switch (mark_cmd) { + switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) { case FAN_MARK_ADD: case FAN_MARK_REMOVE: if (!mask) @@ -1649,19 +1184,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (mask & ~valid_mask) return -EINVAL; - - /* We don't allow FAN_MARK_IGNORE & FAN_MARK_IGNORED_MASK together */ - if (ignore == (FAN_MARK_IGNORE | FAN_MARK_IGNORED_MASK)) - return -EINVAL; - - /* - * Event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) have no effect with - * FAN_MARK_IGNORED_MASK. - */ - if (ignore == FAN_MARK_IGNORED_MASK) { + /* Event flags (ONDIR, ON_CHILD) are meaningless in ignored mask */ + if (ignored) mask &= ~FANOTIFY_EVENT_FLAGS; - umask = FANOTIFY_EVENT_FLAGS; - } f = fdget(fanotify_fd); if (unlikely(!f.file)) @@ -1673,17 +1198,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, goto fput_and_out; group = f.file->private_data; - /* - * An unprivileged user is not allowed to setup mount nor filesystem - * marks. This also includes setting up such marks by a group that - * was initialized by an unprivileged user. - */ - ret = -EPERM; - if ((!capable(CAP_SYS_ADMIN) || - FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && - mark_type != FAN_MARK_INODE) - goto fput_and_out; - /* * group->priority == FS_PRIO_0 == FAN_CLASS_NOTIF. These are not * allowed to set permissions events. @@ -1693,39 +1207,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, group->priority == FS_PRIO_0) goto fput_and_out; - if (mask & FAN_FS_ERROR && - mark_type != FAN_MARK_FILESYSTEM) - goto fput_and_out; - /* - * Evictable is only relevant for inode marks, because only inode object - * can be evicted on memory pressure. - */ - if (flags & FAN_MARK_EVICTABLE && - mark_type != FAN_MARK_INODE) - goto fput_and_out; - - /* - * Events that do not carry enough information to report - * event->fd require a group that supports reporting fid. Those - * events are not supported on a mount mark, because they do not - * carry enough information (i.e. path) to be filtered by mount - * point. + * Events with data type inode do not carry enough information to report + * event->fd, so we do not allow setting a mask for inode events unless + * group supports reporting fid. + * inode events are not supported on a mount mark, because they do not + * carry enough information (i.e. path) to be filtered by mount point. */ fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS); - if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) && + if (mask & FANOTIFY_INODE_EVENTS && (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out; - /* - * FAN_RENAME uses special info type records to report the old and - * new parent+name. Reporting only old and new parent id is less - * useful and was not implemented. - */ - if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME)) - goto fput_and_out; - - if (mark_cmd == FAN_MARK_FLUSH) { + if (flags & FAN_MARK_FLUSH) { ret = 0; if (mark_type == FAN_MARK_MOUNT) fsnotify_clear_vfsmount_marks_by_group(group); @@ -1741,18 +1235,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, if (ret) goto fput_and_out; - if (mark_cmd == FAN_MARK_ADD) { - ret = fanotify_events_supported(group, &path, mask, flags); + if (flags & FAN_MARK_ADD) { + ret = fanotify_events_supported(&path, mask, flags); if (ret) goto path_put_and_out; } if (fid_mode) { - ret = fanotify_test_fsid(path.dentry, &__fsid); - if (ret) - goto path_put_and_out; - - ret = fanotify_test_fid(path.dentry); + ret = fanotify_test_fid(&path, &__fsid); if (ret) goto path_put_and_out; @@ -1765,13 +1255,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, else mnt = path.mnt; - ret = mnt ? -EINVAL : -EISDIR; - /* FAN_MARK_IGNORE requires SURV_MODIFY for sb/mount/dir marks */ - if (mark_cmd == FAN_MARK_ADD && ignore == FAN_MARK_IGNORE && - (mnt || S_ISDIR(inode->i_mode)) && - !(flags & FAN_MARK_IGNORED_SURV_MODIFY)) - goto path_put_and_out; - /* Mask out FAN_EVENT_ON_CHILD flag for sb/mount/non-dir marks */ if (mnt || !S_ISDIR(inode->i_mode)) { mask &= ~FAN_EVENT_ON_CHILD; @@ -1781,12 +1264,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, * events with parent/name info for non-directory. */ if ((fid_mode & FAN_REPORT_DIR_FID) && - (flags & FAN_MARK_ADD) && !ignore) + (flags & FAN_MARK_ADD) && !ignored) mask |= FAN_EVENT_ON_CHILD; } /* create/update an inode mark */ - switch (mark_cmd) { + switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) { case FAN_MARK_ADD: if (mark_type == FAN_MARK_MOUNT) ret = fanotify_add_vfsmount_mark(group, mnt, mask, @@ -1847,24 +1330,8 @@ SYSCALL32_DEFINE6(fanotify_mark, */ static int __init fanotify_user_setup(void) { - struct sysinfo si; - int max_marks; - - si_meminfo(&si); - /* - * Allow up to 1% of addressable memory to be accounted for per user - * marks limited to the range [8192, 1048576]. mount and sb marks are - * a lot cheaper than inode marks, but there is no reason for a user - * to have many of those, so calculate by the cost of inode marks. - */ - max_marks = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / - INODE_MARK_COST; - max_marks = clamp(max_marks, FANOTIFY_OLD_DEFAULT_MAX_MARKS, - FANOTIFY_DEFAULT_MAX_USER_MARKS); - - BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12); - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10); + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9); fanotify_mark_cache = KMEM_CACHE(fsnotify_mark, SLAB_PANIC|SLAB_ACCOUNT); @@ -1877,11 +1344,6 @@ static int __init fanotify_user_setup(void) KMEM_CACHE(fanotify_perm_event, SLAB_PANIC); } - fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS; - init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] = - FANOTIFY_DEFAULT_MAX_GROUPS; - init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks; - return 0; } device_initcall(fanotify_user_setup); diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 55081ae3a6ec..765b50aeadd2 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -14,7 +14,6 @@ #include #include "inotify/inotify.h" -#include "fanotify/fanotify.h" #include "fdinfo.h" #include "fsnotify.h" @@ -29,13 +28,13 @@ static void show_fdinfo(struct seq_file *m, struct file *f, struct fsnotify_group *group = f->private_data; struct fsnotify_mark *mark; - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); list_for_each_entry(mark, &group->marks_list, g_list) { show(m, mark); if (seq_has_overflowed(m)) break; } - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); } #if defined(CONFIG_EXPORTFS) @@ -104,16 +103,19 @@ void inotify_show_fdinfo(struct seq_file *m, struct file *f) static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) { - unsigned int mflags = fanotify_mark_user_flags(mark); + unsigned int mflags = 0; struct inode *inode; + if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY) + mflags |= FAN_MARK_IGNORED_SURV_MODIFY; + if (mark->connector->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = igrab(fsnotify_conn_inode(mark->connector)); if (!inode) return; seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ", inode->i_ino, inode->i_sb->s_dev, - mflags, mark->mask, mark->ignore_mask); + mflags, mark->mask, mark->ignored_mask); show_mark_fhandle(m, inode); seq_putc(m, '\n'); iput(inode); @@ -121,12 +123,12 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark) struct mount *mnt = fsnotify_conn_mount(mark->connector); seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n", - mnt->mnt_id, mflags, mark->mask, mark->ignore_mask); + mnt->mnt_id, mflags, mark->mask, mark->ignored_mask); } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_SB) { struct super_block *sb = fsnotify_conn_sb(mark->connector); seq_printf(m, "fanotify sdev:%x mflags:%x mask:%x ignored_mask:%x\n", - sb->s_dev, mflags, mark->mask, mark->ignore_mask); + sb->s_dev, mflags, mark->mask, mark->ignored_mask); } } @@ -135,8 +137,7 @@ void fanotify_show_fdinfo(struct seq_file *m, struct file *f) struct fsnotify_group *group = f->private_data; seq_printf(m, "fanotify flags:%x event-flags:%x\n", - group->fanotify_data.flags & FANOTIFY_INIT_FLAGS, - group->fanotify_data.f_flags); + group->fanotify_data.flags, group->fanotify_data.f_flags); show_fdinfo(m, f, fanotify_fdinfo); } diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 7974e91ffe13..30d422b8c0fc 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -70,7 +70,8 @@ static void fsnotify_unmount_inodes(struct super_block *sb) spin_unlock(&inode->i_lock); spin_unlock(&sb->s_inode_list_lock); - iput(iput_inode); + if (iput_inode) + iput(iput_inode); /* for each watch, send FS_UNMOUNT and then remove it */ fsnotify_inode(inode, FS_UNMOUNT); @@ -84,23 +85,24 @@ static void fsnotify_unmount_inodes(struct super_block *sb) } spin_unlock(&sb->s_inode_list_lock); - iput(iput_inode); + if (iput_inode) + iput(iput_inode); + /* Wait for outstanding inode references from connectors */ + wait_var_event(&sb->s_fsnotify_inode_refs, + !atomic_long_read(&sb->s_fsnotify_inode_refs)); } void fsnotify_sb_delete(struct super_block *sb) { fsnotify_unmount_inodes(sb); fsnotify_clear_marks_by_sb(sb); - /* Wait for outstanding object references from connectors */ - wait_var_event(&sb->s_fsnotify_connectors, - !atomic_long_read(&sb->s_fsnotify_connectors)); } /* * Given an inode, first check if we care what happens to our children. Inotify * and dnotify both tell their parents about events. If we care about any event * on a child we run all of our children and set a dentry flag saying that the - * parent cares. Thus when an event happens on a child it can quickly tell + * parent cares. Thus when an event happens on a child it can quickly tell if * if there is a need to find a parent and send the event to the parent. */ void __fsnotify_update_child_dentry_flags(struct inode *inode) @@ -250,10 +252,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group, if (WARN_ON_ONCE(!ops->handle_inode_event)) return 0; - if (WARN_ON_ONCE(!inode && !dir)) - return 0; - - if ((inode_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) && + if ((inode_mark->mask & FS_EXCL_UNLINK) && path && d_unlinked(path->dentry)) return 0; @@ -277,28 +276,23 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask, WARN_ON_ONCE(fsnotify_iter_vfsmount_mark(iter_info))) return 0; - /* - * For FS_RENAME, 'dir' is old dir and 'data' is new dentry. - * The only ->handle_inode_event() backend that supports FS_RENAME is - * dnotify, where it means file was renamed within same parent. - */ - if (mask & FS_RENAME) { - struct dentry *moved = fsnotify_data_dentry(data, data_type); - - if (dir != moved->d_parent->d_inode) + if (parent_mark) { + /* + * parent_mark indicates that the parent inode is watching + * children and interested in this event, which is an event + * possible on child. But is *this mark* watching children and + * interested in this event? + */ + if (parent_mark->mask & FS_EVENT_ON_CHILD) { + ret = fsnotify_handle_inode_event(group, parent_mark, mask, + data, data_type, dir, name, 0); + if (ret) + return ret; + } + if (!inode_mark) return 0; } - if (parent_mark) { - ret = fsnotify_handle_inode_event(group, parent_mark, mask, - data, data_type, dir, name, 0); - if (ret) - return ret; - } - - if (!inode_mark) - return 0; - if (mask & FS_EVENT_ON_CHILD) { /* * Some events can be sent on both parent dir and child marks @@ -324,36 +318,42 @@ static int send_to_group(__u32 mask, const void *data, int data_type, struct fsnotify_group *group = NULL; __u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS); __u32 marks_mask = 0; - __u32 marks_ignore_mask = 0; - bool is_dir = mask & FS_ISDIR; + __u32 marks_ignored_mask = 0; struct fsnotify_mark *mark; int type; - if (!iter_info->report_mask) + if (WARN_ON(!iter_info->report_mask)) return 0; /* clear ignored on inode modification */ if (mask & FS_MODIFY) { - fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - if (!(mark->flags & - FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - mark->ignore_mask = 0; + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + if (mark && + !(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + mark->ignored_mask = 0; } } - /* Are any of the group marks interested in this event? */ - fsnotify_foreach_iter_mark_type(iter_info, mark, type) { - group = mark->group; - marks_mask |= mark->mask; - marks_ignore_mask |= - fsnotify_effective_ignore_mask(mark, is_dir, type); + fsnotify_foreach_obj_type(type) { + if (!fsnotify_iter_should_report_type(iter_info, type)) + continue; + mark = iter_info->marks[type]; + /* does the object mark tell us to do something? */ + if (mark) { + group = mark->group; + marks_mask |= mark->mask; + marks_ignored_mask |= mark->ignored_mask; + } } - pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignore_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", - __func__, group, mask, marks_mask, marks_ignore_mask, + pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignored_mask=%x data=%p data_type=%d dir=%p cookie=%d\n", + __func__, group, mask, marks_mask, marks_ignored_mask, data, data_type, dir, cookie); - if (!(test_mask & marks_mask & ~marks_ignore_mask)) + if (!(test_mask & marks_mask & ~marks_ignored_mask)) return 0; if (group->ops->handle_event) { @@ -390,11 +390,11 @@ static struct fsnotify_mark *fsnotify_next_mark(struct fsnotify_mark *mark) /* * iter_info is a multi head priority queue of marks. - * Pick a subset of marks from queue heads, all with the same group - * and set the report_mask to a subset of the selected marks. - * Returns false if there are no more groups to iterate. + * Pick a subset of marks from queue heads, all with the + * same group and set the report_mask for selected subset. + * Returns the report_mask of the selected subset. */ -static bool fsnotify_iter_select_report_types( +static unsigned int fsnotify_iter_select_report_types( struct fsnotify_iter_info *iter_info) { struct fsnotify_group *max_prio_group = NULL; @@ -402,7 +402,7 @@ static bool fsnotify_iter_select_report_types( int type; /* Choose max prio group among groups of all queue heads */ - fsnotify_foreach_iter_type(type) { + fsnotify_foreach_obj_type(type) { mark = iter_info->marks[type]; if (mark && fsnotify_compare_groups(max_prio_group, mark->group) > 0) @@ -410,49 +410,30 @@ static bool fsnotify_iter_select_report_types( } if (!max_prio_group) - return false; + return 0; /* Set the report mask for marks from same group as max prio group */ - iter_info->current_group = max_prio_group; iter_info->report_mask = 0; - fsnotify_foreach_iter_type(type) { + fsnotify_foreach_obj_type(type) { mark = iter_info->marks[type]; - if (mark && mark->group == iter_info->current_group) { - /* - * FSNOTIFY_ITER_TYPE_PARENT indicates that this inode - * is watching children and interested in this event, - * which is an event possible on child. - * But is *this mark* watching children? - */ - if (type == FSNOTIFY_ITER_TYPE_PARENT && - !(mark->mask & FS_EVENT_ON_CHILD) && - !(fsnotify_ignore_mask(mark) & FS_EVENT_ON_CHILD)) - continue; - + if (mark && + fsnotify_compare_groups(max_prio_group, mark->group) == 0) fsnotify_iter_set_report_type(iter_info, type); - } } - return true; + return iter_info->report_mask; } /* - * Pop from iter_info multi head queue, the marks that belong to the group of + * Pop from iter_info multi head queue, the marks that were iterated in the * current iteration step. */ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) { - struct fsnotify_mark *mark; int type; - /* - * We cannot use fsnotify_foreach_iter_mark_type() here because we - * may need to advance a mark of type X that belongs to current_group - * but was not selected for reporting. - */ - fsnotify_foreach_iter_type(type) { - mark = iter_info->marks[type]; - if (mark && mark->group == iter_info->current_group) + fsnotify_foreach_obj_type(type) { + if (fsnotify_iter_should_report_type(iter_info, type)) iter_info->marks[type] = fsnotify_next_mark(iter_info->marks[type]); } @@ -474,20 +455,18 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info) * @file_name is relative to * @file_name: optional file name associated with event * @inode: optional inode associated with event - - * If @dir and @inode are both non-NULL, event may be - * reported to both. + * either @dir or @inode must be non-NULL. + * if both are non-NULL event may be reported to both. * @cookie: inotify rename cookie */ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, const struct qstr *file_name, struct inode *inode, u32 cookie) { const struct path *path = fsnotify_data_path(data, data_type); - struct super_block *sb = fsnotify_data_sb(data, data_type); struct fsnotify_iter_info iter_info = {}; + struct super_block *sb; struct mount *mnt = NULL; - struct inode *inode2 = NULL; - struct dentry *moved; - int inode2_type; + struct inode *parent = NULL; int ret = 0; __u32 test_mask, marks_mask; @@ -497,20 +476,14 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!inode) { /* Dirent event - report on TYPE_INODE to dir */ inode = dir; - /* For FS_RENAME, inode is old_dir and inode2 is new_dir */ - if (mask & FS_RENAME) { - moved = fsnotify_data_dentry(data, data_type); - inode2 = moved->d_parent->d_inode; - inode2_type = FSNOTIFY_ITER_TYPE_INODE2; - } } else if (mask & FS_EVENT_ON_CHILD) { /* * Event on child - report on TYPE_PARENT to dir if it is * watching children and on TYPE_INODE to child. */ - inode2 = dir; - inode2_type = FSNOTIFY_ITER_TYPE_PARENT; + parent = dir; } + sb = inode->i_sb; /* * Optimization: srcu_read_lock() has a memory barrier which can @@ -522,7 +495,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, if (!sb->s_fsnotify_marks && (!mnt || !mnt->mnt_fsnotify_marks) && (!inode || !inode->i_fsnotify_marks) && - (!inode2 || !inode2->i_fsnotify_marks)) + (!parent || !parent->i_fsnotify_marks)) return 0; marks_mask = sb->s_fsnotify_mask; @@ -530,35 +503,33 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir, marks_mask |= mnt->mnt_fsnotify_mask; if (inode) marks_mask |= inode->i_fsnotify_mask; - if (inode2) - marks_mask |= inode2->i_fsnotify_mask; + if (parent) + marks_mask |= parent->i_fsnotify_mask; /* - * If this is a modify event we may need to clear some ignore masks. - * In that case, the object with ignore masks will have the FS_MODIFY - * event in its mask. - * Otherwise, return if none of the marks care about this type of event. + * if this is a modify event we may need to clear the ignored masks + * otherwise return if none of the marks care about this type of event. */ test_mask = (mask & ALL_FSNOTIFY_EVENTS); - if (!(test_mask & marks_mask)) + if (!(mask & FS_MODIFY) && !(test_mask & marks_mask)) return 0; iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - iter_info.marks[FSNOTIFY_ITER_TYPE_SB] = + iter_info.marks[FSNOTIFY_OBJ_TYPE_SB] = fsnotify_first_mark(&sb->s_fsnotify_marks); if (mnt) { - iter_info.marks[FSNOTIFY_ITER_TYPE_VFSMOUNT] = + iter_info.marks[FSNOTIFY_OBJ_TYPE_VFSMOUNT] = fsnotify_first_mark(&mnt->mnt_fsnotify_marks); } if (inode) { - iter_info.marks[FSNOTIFY_ITER_TYPE_INODE] = + iter_info.marks[FSNOTIFY_OBJ_TYPE_INODE] = fsnotify_first_mark(&inode->i_fsnotify_marks); } - if (inode2) { - iter_info.marks[inode2_type] = - fsnotify_first_mark(&inode2->i_fsnotify_marks); + if (parent) { + iter_info.marks[FSNOTIFY_OBJ_TYPE_PARENT] = + fsnotify_first_mark(&parent->i_fsnotify_marks); } /* @@ -587,7 +558,7 @@ static __init int fsnotify_init(void) { int ret; - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23); + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25); ret = init_srcu_struct(&fsnotify_mark_srcu); if (ret) diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index fde74eb333cc..ff2063ec6b0f 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -27,21 +27,6 @@ static inline struct super_block *fsnotify_conn_sb( return container_of(conn->obj, struct super_block, s_fsnotify_marks); } -static inline struct super_block *fsnotify_connector_sb( - struct fsnotify_mark_connector *conn) -{ - switch (conn->type) { - case FSNOTIFY_OBJ_TYPE_INODE: - return fsnotify_conn_inode(conn)->i_sb; - case FSNOTIFY_OBJ_TYPE_VFSMOUNT: - return fsnotify_conn_mount(conn)->mnt.mnt_sb; - case FSNOTIFY_OBJ_TYPE_SB: - return fsnotify_conn_sb(conn); - default: - return NULL; - } -} - /* destroy all events sitting in this groups notification queue */ extern void fsnotify_flush_notify(struct fsnotify_group *group); @@ -76,6 +61,10 @@ static inline void fsnotify_clear_marks_by_sb(struct super_block *sb) */ extern void __fsnotify_update_child_dentry_flags(struct inode *inode); +/* allocate and destroy and event holder to attach events to notification/access queues */ +extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void); +extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder); + extern struct kmem_cache *fsnotify_mark_connector_cachep; #endif /* __FS_NOTIFY_FSNOTIFY_H_ */ diff --git a/fs/notify/group.c b/fs/notify/group.c index 1de6631a3925..a4a4b1c64d32 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -58,7 +58,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) fsnotify_group_stop_queueing(group); /* Clear all marks for this group and queue them for destruction */ - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_ANY); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_ALL_TYPES_MASK); /* * Some marks can still be pinned when waiting for response from @@ -88,7 +88,7 @@ void fsnotify_destroy_group(struct fsnotify_group *group) * that deliberately ignores overflow events. */ if (group->overflow_event) - group->ops->free_event(group, group->overflow_event); + group->ops->free_event(group->overflow_event); fsnotify_put_group(group); } @@ -111,19 +111,20 @@ void fsnotify_put_group(struct fsnotify_group *group) } EXPORT_SYMBOL_GPL(fsnotify_put_group); -static struct fsnotify_group *__fsnotify_alloc_group( - const struct fsnotify_ops *ops, - int flags, gfp_t gfp) +/* + * Create a new fsnotify_group and hold a reference for the group returned. + */ +struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) { - static struct lock_class_key nofs_marks_lock; struct fsnotify_group *group; - group = kzalloc(sizeof(struct fsnotify_group), gfp); + group = kzalloc(sizeof(struct fsnotify_group), GFP_KERNEL); if (!group) return ERR_PTR(-ENOMEM); /* set to 0 when there a no external references to this group */ refcount_set(&group->refcnt, 1); + atomic_set(&group->num_marks, 0); atomic_set(&group->user_waits, 0); spin_lock_init(&group->notification_lock); @@ -135,32 +136,9 @@ static struct fsnotify_group *__fsnotify_alloc_group( INIT_LIST_HEAD(&group->marks_list); group->ops = ops; - group->flags = flags; - /* - * For most backends, eviction of inode with a mark is not expected, - * because marks hold a refcount on the inode against eviction. - * - * Use a different lockdep class for groups that support evictable - * inode marks, because with evictable marks, mark_mutex is NOT - * fs-reclaim safe - the mutex is taken when evicting inodes. - */ - if (flags & FSNOTIFY_GROUP_NOFS) - lockdep_set_class(&group->mark_mutex, &nofs_marks_lock); return group; } - -/* - * Create a new fsnotify_group and hold a reference for the group returned. - */ -struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops, - int flags) -{ - gfp_t gfp = (flags & FSNOTIFY_GROUP_USER) ? GFP_KERNEL_ACCOUNT : - GFP_KERNEL; - - return __fsnotify_alloc_group(ops, flags, gfp); -} EXPORT_SYMBOL_GPL(fsnotify_alloc_group); int fsnotify_fasync(int fd, struct file *file, int on) diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index 7d5df7a21539..8f00151eb731 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -27,18 +27,11 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse) * userspace. There is at least one bit (FS_EVENT_ON_CHILD) which is * used only internally to the kernel. */ -#define INOTIFY_USER_MASK (IN_ALL_EVENTS) +#define INOTIFY_USER_MASK (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK) static inline __u32 inotify_mark_user_mask(struct fsnotify_mark *fsn_mark) { - __u32 mask = fsn_mark->mask & INOTIFY_USER_MASK; - - if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) - mask |= IN_EXCL_UNLINK; - if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) - mask |= IN_ONESHOT; - - return mask; + return fsn_mark->mask & INOTIFY_USER_MASK; } extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark, diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 993375f0db67..66991c7fef9e 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -46,10 +46,9 @@ static bool event_compare(struct fsnotify_event *old_fsn, return false; } -static int inotify_merge(struct fsnotify_group *group, - struct fsnotify_event *event) +static int inotify_merge(struct list_head *list, + struct fsnotify_event *event) { - struct list_head *list = &group->notification_list; struct fsnotify_event *last_event; last_event = list_entry(list->prev, struct fsnotify_event, list); @@ -115,7 +114,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, mask &= ~IN_ISDIR; fsn_event = &event->fse; - fsnotify_init_event(fsn_event); + fsnotify_init_event(fsn_event, 0); event->mask = mask; event->wd = wd; event->sync_cookie = cookie; @@ -129,7 +128,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask, fsnotify_destroy_event(group, fsn_event); } - if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT) + if (inode_mark->mask & IN_ONESHOT) fsnotify_destroy_mark(inode_mark, group); return 0; @@ -184,8 +183,7 @@ static void inotify_free_group_priv(struct fsnotify_group *group) dec_inotify_instances(group->inotify_data.ucounts); } -static void inotify_free_event(struct fsnotify_group *group, - struct fsnotify_event *fsn_event) +static void inotify_free_event(struct fsnotify_event *fsn_event) { kfree(INOTIFY_E(fsn_event)); } diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 67a9f3941f9b..9ea915e9d2a1 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -37,15 +37,6 @@ #include -/* - * An inotify watch requires allocating an inotify_inode_mark structure as - * well as pinning the watched inode. Doubling the size of a VFS inode - * should be more than enough to cover the additional filesystem inode - * size increase. - */ -#define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \ - 2 * sizeof(struct inode)) - /* configurable via /proc/sys/fs/inotify/ */ static int inotify_max_queued_events __read_mostly; @@ -89,10 +80,10 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) __u32 mask; /* - * Everything should receive events when the inode is unmounted. - * All directories care about children. + * Everything should accept their own ignored and should receive events + * when the inode is unmounted. All directories care about children. */ - mask = (FS_UNMOUNT); + mask = (FS_IN_IGNORED | FS_UNMOUNT); if (S_ISDIR(inode->i_mode)) mask |= FS_EVENT_ON_CHILD; @@ -102,28 +93,13 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) return mask; } -#define INOTIFY_MARK_FLAGS \ - (FSNOTIFY_MARK_FLAG_EXCL_UNLINK | FSNOTIFY_MARK_FLAG_IN_ONESHOT) - -static inline unsigned int inotify_arg_to_flags(u32 arg) -{ - unsigned int flags = 0; - - if (arg & IN_EXCL_UNLINK) - flags |= FSNOTIFY_MARK_FLAG_EXCL_UNLINK; - if (arg & IN_ONESHOT) - flags |= FSNOTIFY_MARK_FLAG_IN_ONESHOT; - - return flags; -} - static inline u32 inotify_mask_to_arg(__u32 mask) { return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED | IN_Q_OVERFLOW); } -/* inotify userspace file descriptor functions */ +/* intofiy userspace file descriptor functions */ static __poll_t inotify_poll(struct file *file, poll_table *wait) { struct fsnotify_group *group = file->private_data; @@ -161,10 +137,11 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group, size_t event_size = sizeof(struct inotify_event); struct fsnotify_event *event; - event = fsnotify_peek_first_event(group); - if (!event) + if (fsnotify_notify_queue_is_empty(group)) return NULL; + event = fsnotify_peek_first_event(group); + pr_debug("%s: group=%p event=%p\n", __func__, group, event); event_size += round_event_name_len(event); @@ -366,7 +343,7 @@ static int inotify_find_inode(const char __user *dirname, struct path *path, if (error) return error; /* you can only watch an inode if you have read permissions on it */ - error = path_permission(path, MAY_READ); + error = inode_permission(path->dentry->d_inode, MAY_READ); if (error) { path_put(path); return error; @@ -528,10 +505,13 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, struct fsnotify_mark *fsn_mark; struct inotify_inode_mark *i_mark; __u32 old_mask, new_mask; - int replace = !(arg & IN_MASK_ADD); + __u32 mask; + int add = (arg & IN_MASK_ADD); int create = (arg & IN_MASK_CREATE); int ret; + mask = inotify_arg_to_mask(inode, arg); + fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group); if (!fsn_mark) return -ENOENT; @@ -544,12 +524,10 @@ static int inotify_update_existing_watch(struct fsnotify_group *group, spin_lock(&fsn_mark->lock); old_mask = fsn_mark->mask; - if (replace) { - fsn_mark->mask = 0; - fsn_mark->flags &= ~INOTIFY_MARK_FLAGS; - } - fsn_mark->mask |= inotify_arg_to_mask(inode, arg); - fsn_mark->flags |= inotify_arg_to_flags(arg); + if (add) + fsn_mark->mask |= mask; + else + fsn_mark->mask = mask; new_mask = fsn_mark->mask; spin_unlock(&fsn_mark->lock); @@ -580,17 +558,19 @@ static int inotify_new_watch(struct fsnotify_group *group, u32 arg) { struct inotify_inode_mark *tmp_i_mark; + __u32 mask; int ret; struct idr *idr = &group->inotify_data.idr; spinlock_t *idr_lock = &group->inotify_data.idr_lock; + mask = inotify_arg_to_mask(inode, arg); + tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL); if (unlikely(!tmp_i_mark)) return -ENOMEM; fsnotify_init_mark(&tmp_i_mark->fsn_mark, group); - tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg); - tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg); + tmp_i_mark->fsn_mark.mask = mask; tmp_i_mark->wd = -1; ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark); @@ -627,13 +607,13 @@ static int inotify_update_watch(struct fsnotify_group *group, struct inode *inod { int ret = 0; - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); /* try to update and existing watch with the new arg */ ret = inotify_update_existing_watch(group, inode, arg); /* no mark present, try to add a new one */ if (ret == -ENOENT) ret = inotify_new_watch(group, inode, arg); - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); return ret; } @@ -643,18 +623,17 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events) struct fsnotify_group *group; struct inotify_event_info *oevent; - group = fsnotify_alloc_group(&inotify_fsnotify_ops, - FSNOTIFY_GROUP_USER); + group = fsnotify_alloc_group(&inotify_fsnotify_ops); if (IS_ERR(group)) return group; - oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL_ACCOUNT); + oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL); if (unlikely(!oevent)) { fsnotify_destroy_group(group); return ERR_PTR(-ENOMEM); } group->overflow_event = &oevent->fse; - fsnotify_init_event(group->overflow_event); + fsnotify_init_event(group->overflow_event, 0); oevent->mask = FS_Q_OVERFLOW; oevent->wd = -1; oevent->sync_cookie = 0; @@ -830,18 +809,6 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd) */ static int __init inotify_user_setup(void) { - unsigned long watches_max; - struct sysinfo si; - - si_meminfo(&si); - /* - * Allow up to 1% of addressable memory to be allocated for inotify - * watches (per user) limited to the range [8192, 1048576]. - */ - watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) / - INOTIFY_WATCH_COST; - watches_max = clamp(watches_max, 8192UL, 1048576UL); - BUILD_BUG_ON(IN_ACCESS != FS_ACCESS); BUILD_BUG_ON(IN_MODIFY != FS_MODIFY); BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB); @@ -857,7 +824,9 @@ static int __init inotify_user_setup(void) BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT); BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW); BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED); + BUILD_BUG_ON(IN_EXCL_UNLINK != FS_EXCL_UNLINK); BUILD_BUG_ON(IN_ISDIR != FS_ISDIR); + BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT); BUILD_BUG_ON(HWEIGHT32(ALL_INOTIFY_BITS) != 22); @@ -866,7 +835,7 @@ static int __init inotify_user_setup(void) inotify_max_queued_events = 16384; init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; - init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max; + init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192; return 0; } diff --git a/fs/notify/mark.c b/fs/notify/mark.c index c74ef947447d..5b44be5f93dd 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -116,64 +116,20 @@ __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn) return *fsnotify_conn_mask_p(conn); } -static void fsnotify_get_inode_ref(struct inode *inode) -{ - ihold(inode); - atomic_long_inc(&inode->i_sb->s_fsnotify_connectors); -} - -/* - * Grab or drop inode reference for the connector if needed. - * - * When it's time to drop the reference, we only clear the HAS_IREF flag and - * return the inode object. fsnotify_drop_object() will be resonsible for doing - * iput() outside of spinlocks. This happens when last mark that wanted iref is - * detached. - */ -static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn, - bool want_iref) -{ - bool has_iref = conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF; - struct inode *inode = NULL; - - if (conn->type != FSNOTIFY_OBJ_TYPE_INODE || - want_iref == has_iref) - return NULL; - - if (want_iref) { - /* Pin inode if any mark wants inode refcount held */ - fsnotify_get_inode_ref(fsnotify_conn_inode(conn)); - conn->flags |= FSNOTIFY_CONN_FLAG_HAS_IREF; - } else { - /* Unpin inode after detach of last mark that wanted iref */ - inode = fsnotify_conn_inode(conn); - conn->flags &= ~FSNOTIFY_CONN_FLAG_HAS_IREF; - } - - return inode; -} - -static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) +static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn) { u32 new_mask = 0; - bool want_iref = false; struct fsnotify_mark *mark; assert_spin_locked(&conn->lock); /* We can get detached connector here when inode is getting unlinked. */ if (!fsnotify_valid_obj_type(conn->type)) - return NULL; + return; hlist_for_each_entry(mark, &conn->list, obj_list) { - if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) - continue; - new_mask |= fsnotify_calc_mask(mark); - if (conn->type == FSNOTIFY_OBJ_TYPE_INODE && - !(mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)) - want_iref = true; + if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) + new_mask |= mark->mask; } *fsnotify_conn_mask_p(conn) = new_mask; - - return fsnotify_update_iref(conn, want_iref); } /* @@ -213,31 +169,6 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work) } } -static void fsnotify_put_inode_ref(struct inode *inode) -{ - struct super_block *sb = inode->i_sb; - - iput(inode); - if (atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) - wake_up_var(&sb->s_fsnotify_connectors); -} - -static void fsnotify_get_sb_connectors(struct fsnotify_mark_connector *conn) -{ - struct super_block *sb = fsnotify_connector_sb(conn); - - if (sb) - atomic_long_inc(&sb->s_fsnotify_connectors); -} - -static void fsnotify_put_sb_connectors(struct fsnotify_mark_connector *conn) -{ - struct super_block *sb = fsnotify_connector_sb(conn); - - if (sb && atomic_long_dec_and_test(&sb->s_fsnotify_connectors)) - wake_up_var(&sb->s_fsnotify_connectors); -} - static void *fsnotify_detach_connector_from_object( struct fsnotify_mark_connector *conn, unsigned int *type) @@ -251,17 +182,13 @@ static void *fsnotify_detach_connector_from_object( if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) { inode = fsnotify_conn_inode(conn); inode->i_fsnotify_mask = 0; - - /* Unpin inode when detaching from connector */ - if (!(conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF)) - inode = NULL; + atomic_long_inc(&inode->i_sb->s_fsnotify_inode_refs); } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0; } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) { fsnotify_conn_sb(conn)->s_fsnotify_mask = 0; } - fsnotify_put_sb_connectors(conn); rcu_assign_pointer(*(conn->obj), NULL); conn->obj = NULL; conn->type = FSNOTIFY_OBJ_TYPE_DETACHED; @@ -282,12 +209,19 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark) /* Drop object reference originally held by a connector */ static void fsnotify_drop_object(unsigned int type, void *objp) { + struct inode *inode; + struct super_block *sb; + if (!objp) return; /* Currently only inode references are passed to be dropped */ if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE)) return; - fsnotify_put_inode_ref(objp); + inode = objp; + sb = inode->i_sb; + iput(inode); + if (atomic_long_dec_and_test(&sb->s_fsnotify_inode_refs)) + wake_up_var(&sb->s_fsnotify_inode_refs); } void fsnotify_put_mark(struct fsnotify_mark *mark) @@ -316,8 +250,7 @@ void fsnotify_put_mark(struct fsnotify_mark *mark) objp = fsnotify_detach_connector_from_object(conn, &type); free_conn = true; } else { - objp = __fsnotify_recalc_mask(conn); - type = conn->type; + __fsnotify_recalc_mask(conn); } WRITE_ONCE(mark->connector, NULL); spin_unlock(&conn->lock); @@ -396,7 +329,7 @@ bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) { int type; - fsnotify_foreach_iter_type(type) { + fsnotify_foreach_obj_type(type) { /* This can fail if mark is being removed */ if (!fsnotify_get_mark_safe(iter_info->marks[type])) { __release(&fsnotify_mark_srcu); @@ -425,7 +358,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) int type; iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); - fsnotify_foreach_iter_type(type) + fsnotify_foreach_obj_type(type) fsnotify_put_mark_wake(iter_info->marks[type]); } @@ -441,7 +374,9 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) */ void fsnotify_detach_mark(struct fsnotify_mark *mark) { - fsnotify_group_assert_locked(mark->group); + struct fsnotify_group *group = mark->group; + + WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) && refcount_read(&mark->refcnt) < 1 + !!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)); @@ -456,6 +391,8 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark) list_del_init(&mark->g_list); spin_unlock(&mark->lock); + atomic_dec(&group->num_marks); + /* Drop mark reference acquired in fsnotify_add_mark_locked() */ fsnotify_put_mark(mark); } @@ -493,9 +430,9 @@ void fsnotify_free_mark(struct fsnotify_mark *mark) void fsnotify_destroy_mark(struct fsnotify_mark *mark, struct fsnotify_group *group) { - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); fsnotify_detach_mark(mark); - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); fsnotify_free_mark(mark); } EXPORT_SYMBOL_GPL(fsnotify_destroy_mark); @@ -537,9 +474,10 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b) } static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, - unsigned int obj_type, + unsigned int type, __kernel_fsid_t *fsid) { + struct inode *inode = NULL; struct fsnotify_mark_connector *conn; conn = kmem_cache_alloc(fsnotify_mark_connector_cachep, GFP_KERNEL); @@ -547,8 +485,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, return -ENOMEM; spin_lock_init(&conn->lock); INIT_HLIST_HEAD(&conn->list); - conn->flags = 0; - conn->type = obj_type; + conn->type = type; conn->obj = connp; /* Cache fsid of filesystem containing the object */ if (fsid) { @@ -558,15 +495,16 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp, conn->fsid.val[0] = conn->fsid.val[1] = 0; conn->flags = 0; } - fsnotify_get_sb_connectors(conn); - + if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) + inode = igrab(fsnotify_conn_inode(conn)); /* * cmpxchg() provides the barrier so that readers of *connp can see * only initialized structure */ if (cmpxchg(connp, NULL, conn)) { /* Someone else created list structure for us */ - fsnotify_put_sb_connectors(conn); + if (inode) + iput(inode); kmem_cache_free(fsnotify_mark_connector_cachep, conn); } @@ -607,16 +545,15 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector( * priority, highest number first, and then by the group's location in memory. */ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, - unsigned int obj_type, - int add_flags, __kernel_fsid_t *fsid) + fsnotify_connp_t *connp, unsigned int type, + int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_mark *lmark, *last = NULL; struct fsnotify_mark_connector *conn; int cmp; int err = 0; - if (WARN_ON(!fsnotify_valid_obj_type(obj_type))) + if (WARN_ON(!fsnotify_valid_obj_type(type))) return -EINVAL; /* Backend is expected to check for zero fsid (e.g. tmpfs) */ @@ -628,8 +565,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, conn = fsnotify_grab_connector(connp); if (!conn) { spin_unlock(&mark->lock); - err = fsnotify_attach_connector_to_object(connp, obj_type, - fsid); + err = fsnotify_attach_connector_to_object(connp, type, fsid); if (err) return err; goto restart; @@ -668,7 +604,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, if ((lmark->group == mark->group) && (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) && - !(mark->group->flags & FSNOTIFY_GROUP_DUPS)) { + !allow_dups) { err = -EEXIST; goto out_err; } @@ -702,13 +638,13 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, * event types should be delivered to which group. */ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int obj_type, - int add_flags, __kernel_fsid_t *fsid) + fsnotify_connp_t *connp, unsigned int type, + int allow_dups, __kernel_fsid_t *fsid) { struct fsnotify_group *group = mark->group; int ret = 0; - fsnotify_group_assert_locked(group); + BUG_ON(!mutex_is_locked(&group->mark_mutex)); /* * LOCKING ORDER!!!! @@ -720,14 +656,16 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED; list_add(&mark->g_list, &group->marks_list); + atomic_inc(&group->num_marks); fsnotify_get_mark(mark); /* for g_list */ spin_unlock(&mark->lock); - ret = fsnotify_add_mark_list(mark, connp, obj_type, add_flags, fsid); + ret = fsnotify_add_mark_list(mark, connp, type, allow_dups, fsid); if (ret) goto err; - fsnotify_recalc_mask(mark->connector); + if (mark->mask) + fsnotify_recalc_mask(mark->connector); return ret; err: @@ -736,21 +674,21 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark, FSNOTIFY_MARK_FLAG_ATTACHED); list_del_init(&mark->g_list); spin_unlock(&mark->lock); + atomic_dec(&group->num_marks); fsnotify_put_mark(mark); return ret; } int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int add_flags, - __kernel_fsid_t *fsid) + unsigned int type, int allow_dups, __kernel_fsid_t *fsid) { int ret; struct fsnotify_group *group = mark->group; - fsnotify_group_lock(group); - ret = fsnotify_add_mark_locked(mark, connp, obj_type, add_flags, fsid); - fsnotify_group_unlock(group); + mutex_lock(&group->mark_mutex); + ret = fsnotify_add_mark_locked(mark, connp, type, allow_dups, fsid); + mutex_unlock(&group->mark_mutex); return ret; } EXPORT_SYMBOL_GPL(fsnotify_add_mark); @@ -784,14 +722,14 @@ EXPORT_SYMBOL_GPL(fsnotify_find_mark); /* Clear any marks in a group with given type mask */ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, - unsigned int obj_type) + unsigned int type_mask) { struct fsnotify_mark *lmark, *mark; LIST_HEAD(to_free); struct list_head *head = &to_free; /* Skip selection step if we want to clear all marks. */ - if (obj_type == FSNOTIFY_OBJ_TYPE_ANY) { + if (type_mask == FSNOTIFY_OBJ_ALL_TYPES_MASK) { head = &group->marks_list; goto clear; } @@ -804,24 +742,24 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group, * move marks to free to to_free list in one go and then free marks in * to_free list one by one. */ - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) { - if (mark->connector->type == obj_type) + if ((1U << mark->connector->type) & type_mask) list_move(&mark->g_list, &to_free); } - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); clear: while (1) { - fsnotify_group_lock(group); + mutex_lock(&group->mark_mutex); if (list_empty(head)) { - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); break; } mark = list_first_entry(head, struct fsnotify_mark, g_list); fsnotify_get_mark(mark); fsnotify_detach_mark(mark); - fsnotify_group_unlock(group); + mutex_unlock(&group->mark_mutex); fsnotify_free_mark(mark); fsnotify_put_mark(mark); } diff --git a/fs/notify/notification.c b/fs/notify/notification.c index 9022ae650cf8..75d79d6d3ef0 100644 --- a/fs/notify/notification.c +++ b/fs/notify/notification.c @@ -47,6 +47,13 @@ u32 fsnotify_get_cookie(void) } EXPORT_SYMBOL_GPL(fsnotify_get_cookie); +/* return true if the notify queue is empty, false otherwise */ +bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) +{ + assert_spin_locked(&group->notification_lock); + return list_empty(&group->notification_list) ? true : false; +} + void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event) { @@ -64,26 +71,20 @@ void fsnotify_destroy_event(struct fsnotify_group *group, WARN_ON(!list_empty(&event->list)); spin_unlock(&group->notification_lock); } - group->ops->free_event(group, event); + group->ops->free_event(event); } /* - * Try to add an event to the notification queue. - * The group can later pull this event off the queue to deal with. - * The group can use the @merge hook to merge the event with a queued event. - * The group can use the @insert hook to insert the event into hash table. - * The function returns: - * 0 if the event was added to a queue - * 1 if the event was merged with some other queued event + * Add an event to the group notification queue. The group can later pull this + * event off the queue to deal with. The function returns 0 if the event was + * added to the queue, 1 if the event was merged with some other queued event, * 2 if the event was not queued - either the queue of events has overflown - * or the group is shutting down. + * or the group is shutting down. */ -int fsnotify_insert_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)) +int fsnotify_add_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct list_head *, + struct fsnotify_event *)) { int ret = 0; struct list_head *list = &group->notification_list; @@ -110,7 +111,7 @@ int fsnotify_insert_event(struct fsnotify_group *group, } if (!list_empty(list) && merge) { - ret = merge(group, event); + ret = merge(list, event); if (ret) { spin_unlock(&group->notification_lock); return ret; @@ -120,8 +121,6 @@ int fsnotify_insert_event(struct fsnotify_group *group, queue: group->q_len++; list_add_tail(&event->list, list); - if (insert) - insert(group, event); spin_unlock(&group->notification_lock); wake_up(&group->notification_waitq); @@ -141,39 +140,36 @@ void fsnotify_remove_queued_event(struct fsnotify_group *group, group->q_len--; } -/* - * Return the first event on the notification list without removing it. - * Returns NULL if the list is empty. - */ -struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group) -{ - assert_spin_locked(&group->notification_lock); - - if (fsnotify_notify_queue_is_empty(group)) - return NULL; - - return list_first_entry(&group->notification_list, - struct fsnotify_event, list); -} - /* * Remove and return the first event from the notification list. It is the * responsibility of the caller to destroy the obtained event */ struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group) { - struct fsnotify_event *event = fsnotify_peek_first_event(group); + struct fsnotify_event *event; - if (!event) - return NULL; + assert_spin_locked(&group->notification_lock); - pr_debug("%s: group=%p event=%p\n", __func__, group, event); + pr_debug("%s: group=%p\n", __func__, group); + event = list_first_entry(&group->notification_list, + struct fsnotify_event, list); fsnotify_remove_queued_event(group, event); - return event; } +/* + * This will not remove the event, that must be done with + * fsnotify_remove_first_event() + */ +struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group) +{ + assert_spin_locked(&group->notification_lock); + + return list_first_entry(&group->notification_list, + struct fsnotify_event, list); +} + /* * Called when a group is being torn down to clean up any outstanding * event notifications. diff --git a/fs/open.c b/fs/open.c index 3ad0c6c8f5e7..965230a0710c 100644 --- a/fs/open.c +++ b/fs/open.c @@ -493,7 +493,7 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename) if (error) goto out; - error = path_permission(&path, MAY_EXEC | MAY_CHDIR); + error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; @@ -522,7 +522,7 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) if (!d_can_lookup(f.file->f_path.dentry)) goto out_putf; - error = file_permission(f.file, MAY_EXEC | MAY_CHDIR); + error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR); if (!error) set_fs_pwd(current->fs, &f.file->f_path); out_putf: @@ -541,7 +541,7 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename) if (error) goto out; - error = path_permission(&path, MAY_EXEC | MAY_CHDIR); + error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR); if (error) goto dput_and_out; @@ -965,47 +965,6 @@ struct file *dentry_open(const struct path *path, int flags, } EXPORT_SYMBOL(dentry_open); -/** - * dentry_create - Create and open a file - * @path: path to create - * @flags: O_ flags - * @mode: mode bits for new file - * @cred: credentials to use - * - * Caller must hold the parent directory's lock, and have prepared - * a negative dentry, placed in @path->dentry, for the new file. - * - * Caller sets @path->mnt to the vfsmount of the filesystem where - * the new file is to be created. The parent directory and the - * negative dentry must reside on the same filesystem instance. - * - * On success, returns a "struct file *". Otherwise a ERR_PTR - * is returned. - */ -struct file *dentry_create(const struct path *path, int flags, umode_t mode, - const struct cred *cred) -{ - struct file *f; - int error; - - validate_creds(cred); - f = alloc_empty_file(flags, cred); - if (IS_ERR(f)) - return f; - - error = vfs_create(d_inode(path->dentry->d_parent), - path->dentry, mode, true); - if (!error) - error = vfs_open(path, f); - - if (unlikely(error)) { - fput(f); - return ERR_PTR(error); - } - return f; -} -EXPORT_SYMBOL(dentry_create); - struct file *open_with_fake_path(const struct path *path, int flags, struct inode *inode, const struct cred *cred) { @@ -1382,7 +1341,7 @@ EXPORT_SYMBOL(filp_close); */ SYSCALL_DEFINE1(close, unsigned int, fd) { - int retval = close_fd(fd); + int retval = __close_fd(current->files, fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 520a6bdaf429..a9f9923a725d 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -214,16 +214,9 @@ static inline int ovl_do_rename(struct inode *olddir, struct dentry *olddentry, unsigned int flags) { int err; - struct renamedata rd = { - .old_dir = olddir, - .old_dentry = olddentry, - .new_dir = newdir, - .new_dentry = newdentry, - .flags = flags, - }; pr_debug("rename(%pd2, %pd2, 0x%x)\n", olddentry, newdentry, flags); - err = vfs_rename(&rd); + err = vfs_rename(olddir, olddentry, newdir, newdentry, NULL, flags); if (err) { pr_debug("...rename(%pd2, %pd2, ...) = %i\n", olddentry, newdentry, err); diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 35b92009c1cc..6b634c0a9b6e 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -29,13 +29,14 @@ static int seq_show(struct seq_file *m, void *v) if (!task) return -ENOENT; - task_lock(task); - files = task->files; + files = get_files_struct(task); + put_task_struct(task); + if (files) { unsigned int fd = proc_fd(m->private); spin_lock(&files->file_lock); - file = files_lookup_fd_locked(files, fd); + file = fcheck_files(files, fd); if (file) { struct fdtable *fdt = files_fdtable(files); @@ -47,9 +48,8 @@ static int seq_show(struct seq_file *m, void *v) ret = 0; } spin_unlock(&files->file_lock); + put_files_struct(files); } - task_unlock(task); - put_task_struct(task); if (ret) return ret; @@ -59,7 +59,6 @@ static int seq_show(struct seq_file *m, void *v) real_mount(file->f_path.mnt)->mnt_id, file_inode(file)->i_ino); - /* show_fd_locks() never deferences files so a stale value is safe */ show_fd_locks(m, file, files); if (seq_has_overflowed(m)) goto out; @@ -108,13 +107,18 @@ static const struct file_operations proc_fdinfo_file_operations = { static bool tid_fd_mode(struct task_struct *task, unsigned fd, fmode_t *mode) { + struct files_struct *files = get_files_struct(task); struct file *file; + if (!files) + return false; + rcu_read_lock(); - file = task_lookup_fd_rcu(task, fd); + file = fcheck_files(files, fd); if (file) *mode = file->f_mode; rcu_read_unlock(); + put_files_struct(files); return !!file; } @@ -166,22 +170,29 @@ static const struct dentry_operations tid_fd_dentry_operations = { static int proc_fd_link(struct dentry *dentry, struct path *path) { + struct files_struct *files = NULL; struct task_struct *task; int ret = -ENOENT; task = get_proc_task(d_inode(dentry)); if (task) { + files = get_files_struct(task); + put_task_struct(task); + } + + if (files) { unsigned int fd = proc_fd(d_inode(dentry)); struct file *fd_file; - fd_file = fget_task(task, fd); + spin_lock(&files->file_lock); + fd_file = fcheck_files(files, fd); if (fd_file) { *path = fd_file->f_path; path_get(&fd_file->f_path); ret = 0; - fput(fd_file); } - put_task_struct(task); + spin_unlock(&files->file_lock); + put_files_struct(files); } return ret; @@ -242,6 +253,7 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, instantiate_t instantiate) { struct task_struct *p = get_proc_task(file_inode(file)); + struct files_struct *files; unsigned int fd; if (!p) @@ -249,18 +261,22 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!dir_emit_dots(file, ctx)) goto out; + files = get_files_struct(p); + if (!files) + goto out; rcu_read_lock(); - for (fd = ctx->pos - 2;; fd++) { + for (fd = ctx->pos - 2; + fd < files_fdtable(files)->max_fds; + fd++, ctx->pos++) { struct file *f; struct fd_data data; char name[10 + 1]; unsigned int len; - f = task_lookup_next_fd_rcu(p, &fd); - ctx->pos = fd + 2LL; + f = fcheck_files(files, fd); if (!f) - break; + continue; data.mode = f->f_mode; rcu_read_unlock(); data.fd = fd; @@ -269,11 +285,13 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, if (!proc_fill_cache(file, ctx, name, len, instantiate, p, &data)) - goto out; + goto out_fd_loop; cond_resched(); rcu_read_lock(); } rcu_read_unlock(); +out_fd_loop: + put_files_struct(files); out: put_task_struct(p); return 0; diff --git a/fs/udf/file.c b/fs/udf/file.c index 25f7c915f22b..e283a62701b8 100644 --- a/fs/udf/file.c +++ b/fs/udf/file.c @@ -181,7 +181,7 @@ long udf_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) long old_block, new_block; int result; - if (file_permission(filp, MAY_READ) != 0) { + if (inode_permission(inode, MAY_READ) != 0) { udf_debug("no permission to access inode %lu\n", inode->i_ino); return -EPERM; } diff --git a/fs/verity/enable.c b/fs/verity/enable.c index dfe8acc32df6..dbabea77efc0 100644 --- a/fs/verity/enable.c +++ b/fs/verity/enable.c @@ -369,7 +369,7 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *uarg) * has verity enabled, and to stabilize the data being hashed. */ - err = file_permission(filp, MAY_WRITE); + err = inode_permission(inode, MAY_WRITE); if (err) return err; diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h index b87c3b85a166..0aad774beaec 100644 --- a/include/linux/dnotify.h +++ b/include/linux/dnotify.h @@ -26,7 +26,7 @@ struct dnotify_struct { FS_MODIFY | FS_MODIFY_CHILD |\ FS_ACCESS | FS_ACCESS_CHILD |\ FS_ATTRIB | FS_ATTRIB_CHILD |\ - FS_CREATE | FS_RENAME |\ + FS_CREATE | FS_DN_RENAME |\ FS_MOVED_FROM | FS_MOVED_TO) extern int dir_notify_enable; diff --git a/include/linux/errno.h b/include/linux/errno.h index 8b0c754bab02..d73f597a2484 100644 --- a/include/linux/errno.h +++ b/include/linux/errno.h @@ -31,6 +31,5 @@ #define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */ #define EIOCBQUEUED 529 /* iocb queued, will get completion event */ #define ERECALLCONFLICT 530 /* conflict with recalled state */ -#define ENOGRACE 531 /* NFS file lock reclaim refused */ #endif diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index 218fc5c54e90..3ceb72b67a7a 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -213,27 +213,12 @@ struct export_operations { bool write, u32 *device_generation); int (*commit_blocks)(struct inode *inode, struct iomap *iomaps, int nr_iomaps, struct iattr *iattr); - u64 (*fetch_iversion)(struct inode *); -#define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */ -#define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */ -#define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */ -#define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */ -#define EXPORT_OP_NOATOMIC_ATTR (0x10) /* Filesystem cannot supply - atomic attribute updates - */ -#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */ - unsigned long flags; }; extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len, struct inode *parent); extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, int connectable); -extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt, - struct fid *fid, int fh_len, - int fileid_type, - int (*acceptable)(void *, struct dentry *), - void *context); extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *), void *context); diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 558844c8d259..3e9c56ee651f 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -2,11 +2,8 @@ #ifndef _LINUX_FANOTIFY_H #define _LINUX_FANOTIFY_H -#include #include -extern struct ctl_table fanotify_table[]; /* for sysctl */ - #define FAN_GROUP_FLAG(group, flag) \ ((group)->fanotify_data.flags & (flag)) @@ -18,62 +15,27 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * these constant, the programs may break if re-compiled with new uapi headers * and then run on an old kernel. */ - -/* Group classes where permission events are allowed */ -#define FANOTIFY_PERM_CLASSES (FAN_CLASS_CONTENT | \ +#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FAN_CLASS_CONTENT | \ FAN_CLASS_PRE_CONTENT) -#define FANOTIFY_CLASS_BITS (FAN_CLASS_NOTIF | FANOTIFY_PERM_CLASSES) +#define FANOTIFY_FID_BITS (FAN_REPORT_FID | FAN_REPORT_DFID_NAME) -#define FANOTIFY_FID_BITS (FAN_REPORT_DFID_NAME_TARGET) - -#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD) - -/* - * fanotify_init() flags that require CAP_SYS_ADMIN. - * We do not allow unprivileged groups to request permission events. - * We do not allow unprivileged groups to get other process pid in events. - * We do not allow unprivileged groups to use unlimited resources. - */ -#define FANOTIFY_ADMIN_INIT_FLAGS (FANOTIFY_PERM_CLASSES | \ - FAN_REPORT_TID | \ - FAN_REPORT_PIDFD | \ - FAN_UNLIMITED_QUEUE | \ - FAN_UNLIMITED_MARKS) - -/* - * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN. - * FAN_CLASS_NOTIF is the only class we allow for unprivileged group. - * We do not allow unprivileged groups to get file descriptors in events, - * so one of the flags for reporting file handles is required. - */ -#define FANOTIFY_USER_INIT_FLAGS (FAN_CLASS_NOTIF | \ - FANOTIFY_FID_BITS | \ - FAN_CLOEXEC | FAN_NONBLOCK) - -#define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \ - FANOTIFY_USER_INIT_FLAGS) - -/* Internal group flags */ -#define FANOTIFY_UNPRIV 0x80000000 -#define FANOTIFY_INTERNAL_GROUP_FLAGS (FANOTIFY_UNPRIV) +#define FANOTIFY_INIT_FLAGS (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \ + FAN_REPORT_TID | \ + FAN_CLOEXEC | FAN_NONBLOCK | \ + FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS) #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM) -#define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ - FAN_MARK_FLUSH) - -#define FANOTIFY_MARK_IGNORE_BITS (FAN_MARK_IGNORED_MASK | \ - FAN_MARK_IGNORE) - #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ - FANOTIFY_MARK_CMD_BITS | \ - FANOTIFY_MARK_IGNORE_BITS | \ + FAN_MARK_ADD | \ + FAN_MARK_REMOVE | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ + FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ - FAN_MARK_EVICTABLE) + FAN_MARK_FLUSH) /* * Events that can be reported with data type FSNOTIFY_EVENT_PATH. @@ -87,23 +49,15 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ * Directory entry modification events - reported only to directory * where entry is modified and not to a watching parent. */ -#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE | \ - FAN_RENAME) - -/* Events that can be reported with event->fd */ -#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS) +#define FANOTIFY_DIRENT_EVENTS (FAN_MOVE | FAN_CREATE | FAN_DELETE) /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */ #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \ FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF) -/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */ -#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR) - /* Events that user can request to be notified on */ #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \ - FANOTIFY_INODE_EVENTS | \ - FANOTIFY_ERROR_EVENTS) + FANOTIFY_INODE_EVENTS) /* Events that require a permission response from user */ #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \ @@ -117,10 +71,6 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */ FANOTIFY_PERM_EVENTS | \ FAN_Q_OVERFLOW | FAN_ONDIR) -/* Events and flags relevant only for directories */ -#define FANOTIFY_DIRONLY_EVENT_BITS (FANOTIFY_DIRENT_EVENTS | \ - FAN_EVENT_ON_CHILD | FAN_ONDIR) - #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \ FANOTIFY_EVENT_FLAGS) diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index 4ed3589f9294..f1a99d3e5570 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -80,7 +80,7 @@ struct dentry; /* * The caller must ensure that fd table isn't shared or hold rcu or file lock */ -static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd) +static inline struct file *__fcheck_files(struct files_struct *files, unsigned int fd) { struct fdtable *fdt = rcu_dereference_raw(files->fdt); @@ -91,40 +91,37 @@ static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsig return NULL; } -static inline struct file *files_lookup_fd_locked(struct files_struct *files, unsigned int fd) +static inline struct file *fcheck_files(struct files_struct *files, unsigned int fd) { - RCU_LOCKDEP_WARN(!lockdep_is_held(&files->file_lock), + RCU_LOCKDEP_WARN(!rcu_read_lock_held() && + !lockdep_is_held(&files->file_lock), "suspicious rcu_dereference_check() usage"); - return files_lookup_fd_raw(files, fd); + return __fcheck_files(files, fd); } -static inline struct file *files_lookup_fd_rcu(struct files_struct *files, unsigned int fd) -{ - RCU_LOCKDEP_WARN(!rcu_read_lock_held(), - "suspicious rcu_dereference_check() usage"); - return files_lookup_fd_raw(files, fd); -} - -static inline struct file *lookup_fd_rcu(unsigned int fd) -{ - return files_lookup_fd_rcu(current->files, fd); -} - -struct file *task_lookup_fd_rcu(struct task_struct *task, unsigned int fd); -struct file *task_lookup_next_fd_rcu(struct task_struct *task, unsigned int *fd); +/* + * Check whether the specified fd has an open file. + */ +#define fcheck(fd) fcheck_files(current->files, fd) struct task_struct; struct files_struct *get_files_struct(struct task_struct *); void put_files_struct(struct files_struct *fs); -int unshare_files(void); +void reset_files_struct(struct files_struct *); +int unshare_files(struct files_struct **); struct files_struct *dup_fd(struct files_struct *, unsigned, int *) __latent_entropy; void do_close_on_exec(struct files_struct *); int iterate_fd(struct files_struct *, unsigned, int (*)(const void *, struct file *, unsigned), const void *); -extern int close_fd(unsigned int fd); +extern int __alloc_fd(struct files_struct *files, + unsigned start, unsigned end, unsigned flags); +extern void __fd_install(struct files_struct *files, + unsigned int fd, struct file *file); +extern int __close_fd(struct files_struct *files, + unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); extern int close_fd_get_file(unsigned int fd, struct file **res); extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, diff --git a/include/linux/fs.h b/include/linux/fs.h index ec6de06ead4c..c61553833fb9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1020,7 +1020,6 @@ static inline struct file *get_file(struct file *f) #define FL_UNLOCK_PENDING 512 /* Lease is being broken */ #define FL_OFDLCK 1024 /* lock is "owned" by struct file */ #define FL_LAYOUT 2048 /* outstanding pNFS layout */ -#define FL_RECLAIM 4096 /* reclaiming from a reboot server */ #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE) @@ -1044,7 +1043,6 @@ struct file_lock_operations { }; struct lock_manager_operations { - void *lm_mod_owner; fl_owner_t (*lm_get_owner)(fl_owner_t); void (*lm_put_owner)(fl_owner_t); void (*lm_notify)(struct file_lock *); /* unblock callback */ @@ -1053,8 +1051,6 @@ struct lock_manager_operations { int (*lm_change)(struct file_lock *, int, struct list_head *); void (*lm_setup)(struct file_lock *, void **); bool (*lm_breaker_owns_lease)(struct file_lock *); - bool (*lm_lock_expirable)(struct file_lock *cfl); - void (*lm_expire_lock)(void); ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); @@ -1200,15 +1196,6 @@ extern void lease_unregister_notifier(struct notifier_block *); struct files_struct; extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); -extern bool locks_owner_has_blockers(struct file_lock_context *flctx, - fl_owner_t owner); - -static inline struct file_lock_context * -locks_inode_context(const struct inode *inode) -{ - return smp_load_acquire(&inode->i_flctx); -} - #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1349,18 +1336,6 @@ static inline int lease_modify(struct file_lock *fl, int arg, struct files_struct; static inline void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files) {} -static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, - fl_owner_t owner) -{ - return false; -} - -static inline struct file_lock_context * -locks_inode_context(const struct inode *inode) -{ - return NULL; -} - #endif /* !CONFIG_FILE_LOCKING */ static inline struct inode *file_inode(const struct file *f) @@ -1580,11 +1555,8 @@ struct super_block { /* Number of inodes with nlink == 0 but still referenced */ atomic_long_t s_remove_count; - /* - * Number of inode/mount/sb objects that are being watched, note that - * inodes objects are currently double-accounted. - */ - atomic_long_t s_fsnotify_connectors; + /* Pending fsnotify inode refs */ + atomic_long_t s_fsnotify_inode_refs; /* Being remounted read-only */ int s_readonly_remount; @@ -1856,17 +1828,7 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *); extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **); extern int vfs_rmdir(struct inode *, struct dentry *); extern int vfs_unlink(struct inode *, struct dentry *, struct inode **); - -struct renamedata { - struct inode *old_dir; - struct dentry *old_dentry; - struct inode *new_dir; - struct dentry *new_dentry; - struct inode **delegated_inode; - unsigned int flags; -} __randomize_layout; - -int vfs_rename(struct renamedata *); +extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int); static inline int vfs_whiteout(struct inode *dir, struct dentry *dentry) { @@ -2700,8 +2662,6 @@ extern struct file *filp_open_block(const char *, int, umode_t); extern struct file *file_open_root(struct dentry *, struct vfsmount *, const char *, int, umode_t); extern struct file * dentry_open(const struct path *, int, const struct cred *); -extern struct file *dentry_create(const struct path *path, int flags, - umode_t mode, const struct cred *cred); extern struct file * open_with_fake_path(const struct path *, int, struct inode*, const struct cred *); static inline struct file *file_clone_open(struct file *file) @@ -2932,14 +2892,6 @@ static inline int bmap(struct inode *inode, sector_t *block) extern int notify_change(struct dentry *, struct iattr *, struct inode **); extern int inode_permission(struct inode *, int); extern int generic_permission(struct inode *, int); -static inline int file_permission(struct file *file, int mask) -{ - return inode_permission(file_inode(file), mask); -} -static inline int path_permission(const struct path *path, int mask) -{ - return inode_permission(d_inode(path->dentry), mask); -} extern int __check_sticky(struct inode *dir, struct inode *inode); static inline bool execute_ok(struct inode *inode) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index bb8467cd11ae..79add91eaa04 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -26,27 +26,21 @@ * FS_EVENT_ON_CHILD mask on the parent inode and will not be reported if only * the child is interested and not the parent. */ -static inline int fsnotify_name(__u32 mask, const void *data, int data_type, - struct inode *dir, const struct qstr *name, - u32 cookie) +static inline void fsnotify_name(struct inode *dir, __u32 mask, + struct inode *child, + const struct qstr *name, u32 cookie) { - if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0) - return 0; - - return fsnotify(mask, data, data_type, dir, name, NULL, cookie); + fsnotify(mask, child, FSNOTIFY_EVENT_INODE, dir, name, NULL, cookie); } static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry, __u32 mask) { - fsnotify_name(mask, dentry, FSNOTIFY_EVENT_DENTRY, dir, &dentry->d_name, 0); + fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0); } static inline void fsnotify_inode(struct inode *inode, __u32 mask) { - if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) - return; - if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; @@ -59,9 +53,6 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, { struct inode *inode = d_inode(dentry); - if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0) - return 0; - if (S_ISDIR(inode->i_mode)) { mask |= FS_ISDIR; @@ -86,7 +77,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask, */ static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask) { - fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY); + fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE); } static inline int fsnotify_file(struct file *file, __u32 mask) @@ -144,23 +135,18 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir, u32 fs_cookie = fsnotify_get_cookie(); __u32 old_dir_mask = FS_MOVED_FROM; __u32 new_dir_mask = FS_MOVED_TO; - __u32 rename_mask = FS_RENAME; const struct qstr *new_name = &moved->d_name; + if (old_dir == new_dir) + old_dir_mask |= FS_DN_RENAME; + if (isdir) { old_dir_mask |= FS_ISDIR; new_dir_mask |= FS_ISDIR; - rename_mask |= FS_ISDIR; } - /* Event with information about both old and new parent+name */ - fsnotify_name(rename_mask, moved, FSNOTIFY_EVENT_DENTRY, - old_dir, old_name, 0); - - fsnotify_name(old_dir_mask, source, FSNOTIFY_EVENT_INODE, - old_dir, old_name, fs_cookie); - fsnotify_name(new_dir_mask, source, FSNOTIFY_EVENT_INODE, - new_dir, new_name, fs_cookie); + fsnotify_name(old_dir, old_dir_mask, source, old_name, fs_cookie); + fsnotify_name(new_dir, new_dir_mask, source, new_name, fs_cookie); if (target) fsnotify_link_count(target); @@ -195,22 +181,16 @@ static inline void fsnotify_inoderemove(struct inode *inode) /* * fsnotify_create - 'name' was linked in - * - * Caller must make sure that dentry->d_name is stable. - * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate - * ->d_inode later */ -static inline void fsnotify_create(struct inode *dir, struct dentry *dentry) +static inline void fsnotify_create(struct inode *inode, struct dentry *dentry) { - audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_dirent(dir, dentry, FS_CREATE); + fsnotify_dirent(inode, dentry, FS_CREATE); } /* * fsnotify_link - new hardlink in 'inode' directory - * - * Caller must make sure that new_dentry->d_name is stable. * Note: We have to pass also the linked inode ptr as some filesystems leave * new_dentry->d_inode NULL and instantiate inode pointer later */ @@ -220,8 +200,7 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, fsnotify_link_count(inode); audit_inode_child(dir, new_dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_name(FS_CREATE, inode, FSNOTIFY_EVENT_INODE, - dir, &new_dentry->d_name, 0); + fsnotify_name(dir, FS_CREATE, inode, &new_dentry->d_name, 0); } /* @@ -240,8 +219,7 @@ static inline void fsnotify_delete(struct inode *dir, struct inode *inode, if (S_ISDIR(inode->i_mode)) mask |= FS_ISDIR; - fsnotify_name(mask, inode, FSNOTIFY_EVENT_INODE, dir, &dentry->d_name, - 0); + fsnotify_name(dir, mask, inode, &dentry->d_name, 0); } /** @@ -276,16 +254,12 @@ static inline void fsnotify_unlink(struct inode *dir, struct dentry *dentry) /* * fsnotify_mkdir - directory 'name' was created - * - * Caller must make sure that dentry->d_name is stable. - * Note: some filesystems (e.g. kernfs) leave @dentry negative and instantiate - * ->d_inode later */ -static inline void fsnotify_mkdir(struct inode *dir, struct dentry *dentry) +static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry) { - audit_inode_child(dir, dentry, AUDIT_TYPE_CHILD_CREATE); + audit_inode_child(inode, dentry, AUDIT_TYPE_CHILD_CREATE); - fsnotify_dirent(dir, dentry, FS_CREATE | FS_ISDIR); + fsnotify_dirent(inode, dentry, FS_CREATE | FS_ISDIR); } /* @@ -379,17 +353,4 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid) fsnotify_dentry(dentry, mask); } -static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode, - int error) -{ - struct fs_error_report report = { - .error = error, - .inode = inode, - .sb = sb, - }; - - return fsnotify(FS_ERROR, &report, FSNOTIFY_EVENT_ERROR, - NULL, NULL, NULL, 0); -} - #endif /* _LINUX_FS_NOTIFY_H */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d7d96c806bff..a2e42d3cd87c 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -19,8 +19,6 @@ #include #include #include -#include -#include /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -44,18 +42,13 @@ #define FS_UNMOUNT 0x00002000 /* inode on umount fs */ #define FS_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ -#define FS_ERROR 0x00008000 /* Filesystem Error (fanotify) */ - -/* - * FS_IN_IGNORED overloads FS_ERROR. It is only used internally by inotify - * which does not support FS_ERROR. - */ #define FS_IN_IGNORED 0x00008000 /* last inotify event here */ #define FS_OPEN_PERM 0x00010000 /* open event in an permission hook */ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ +#define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ /* * Set on inode mark that cares about things that happen to its children. * Always set for dnotify and inotify. @@ -63,9 +56,10 @@ */ #define FS_EVENT_ON_CHILD 0x08000000 -#define FS_RENAME 0x10000000 /* File was renamed */ +#define FS_DN_RENAME 0x10000000 /* file renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ +#define FS_IN_ONESHOT 0x80000000 /* only send event once */ #define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO) @@ -75,7 +69,7 @@ * The watching parent may get an FS_ATTRIB|FS_EVENT_ON_CHILD event * when a directory entry inside a child subdir changes. */ -#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME) +#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE) #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \ FS_OPEN_EXEC_PERM) @@ -100,12 +94,12 @@ /* Events that can be reported to backends */ #define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \ FS_EVENTS_POSS_ON_CHILD | \ - FS_DELETE_SELF | FS_MOVE_SELF | \ - FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \ - FS_ERROR) + FS_DELETE_SELF | FS_MOVE_SELF | FS_DN_RENAME | \ + FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED) /* Extra flags that may be reported with event or control handling of events */ -#define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT) +#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ + FS_DN_MULTISHOT | FS_EVENT_ON_CHILD) #define ALL_FSNOTIFY_BITS (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS) @@ -142,7 +136,6 @@ struct mem_cgroup; * @dir: optional directory associated with event - * if @file_name is not NULL, this is the directory that * @file_name is relative to. - * Either @inode or @dir must be non-NULL. * @file_name: optional file name associated with event * @cookie: inotify rename cookie * @@ -162,7 +155,7 @@ struct fsnotify_ops { const struct qstr *file_name, u32 cookie); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); - void (*free_event)(struct fsnotify_group *group, struct fsnotify_event *event); + void (*free_event)(struct fsnotify_event *event); /* called on final put+free to free memory */ void (*free_mark)(struct fsnotify_mark *mark); }; @@ -174,6 +167,7 @@ struct fsnotify_ops { */ struct fsnotify_event { struct list_head list; + unsigned long objectid; /* identifier for queue merges */ }; /* @@ -211,14 +205,11 @@ struct fsnotify_group { unsigned int priority; bool shutdown; /* group is being shut down, don't queue more events */ -#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ -#define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ -#define FSNOTIFY_GROUP_NOFS 0x04 /* group lock is not direct reclaim safe */ - int flags; - unsigned int owner_flags; /* stored flags of mark_mutex owner */ - /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ + atomic_t num_marks; /* 1 for each mark and 1 for not being + * past the point of no return when freeing + * a group */ atomic_t user_waits; /* Number of tasks waiting for user * response */ struct list_head marks_list; /* all inode marks for this group */ @@ -243,58 +234,23 @@ struct fsnotify_group { #endif #ifdef CONFIG_FANOTIFY struct fanotify_group_private_data { - /* Hash table of events for merge */ - struct hlist_head *merge_hash; /* allows a group to block waiting for a userspace response */ struct list_head access_list; wait_queue_head_t access_waitq; int flags; /* flags from fanotify_init() */ int f_flags; /* event_f_flags from fanotify_init() */ - struct ucounts *ucounts; - mempool_t error_events_pool; + unsigned int max_marks; + struct user_struct *user; } fanotify_data; #endif /* CONFIG_FANOTIFY */ }; }; -/* - * These helpers are used to prevent deadlock when reclaiming inodes with - * evictable marks of the same group that is allocating a new mark. - */ -static inline void fsnotify_group_lock(struct fsnotify_group *group) -{ - mutex_lock(&group->mark_mutex); - if (group->flags & FSNOTIFY_GROUP_NOFS) - group->owner_flags = memalloc_nofs_save(); -} - -static inline void fsnotify_group_unlock(struct fsnotify_group *group) -{ - if (group->flags & FSNOTIFY_GROUP_NOFS) - memalloc_nofs_restore(group->owner_flags); - mutex_unlock(&group->mark_mutex); -} - -static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) -{ - WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); - if (group->flags & FSNOTIFY_GROUP_NOFS) - WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); -} - /* When calling fsnotify tell it if the data is a path or inode */ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, FSNOTIFY_EVENT_PATH, FSNOTIFY_EVENT_INODE, - FSNOTIFY_EVENT_DENTRY, - FSNOTIFY_EVENT_ERROR, -}; - -struct fs_error_report { - int error; - struct inode *inode; - struct super_block *sb; }; static inline struct inode *fsnotify_data_inode(const void *data, int data_type) @@ -302,25 +258,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type) switch (data_type) { case FSNOTIFY_EVENT_INODE: return (struct inode *)data; - case FSNOTIFY_EVENT_DENTRY: - return d_inode(data); case FSNOTIFY_EVENT_PATH: return d_inode(((const struct path *)data)->dentry); - case FSNOTIFY_EVENT_ERROR: - return ((struct fs_error_report *)data)->inode; - default: - return NULL; - } -} - -static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type) -{ - switch (data_type) { - case FSNOTIFY_EVENT_DENTRY: - /* Non const is needed for dget() */ - return (struct dentry *)data; - case FSNOTIFY_EVENT_PATH: - return ((const struct path *)data)->dentry; default: return NULL; } @@ -337,110 +276,58 @@ static inline const struct path *fsnotify_data_path(const void *data, } } -static inline struct super_block *fsnotify_data_sb(const void *data, - int data_type) -{ - switch (data_type) { - case FSNOTIFY_EVENT_INODE: - return ((struct inode *)data)->i_sb; - case FSNOTIFY_EVENT_DENTRY: - return ((struct dentry *)data)->d_sb; - case FSNOTIFY_EVENT_PATH: - return ((const struct path *)data)->dentry->d_sb; - case FSNOTIFY_EVENT_ERROR: - return ((struct fs_error_report *) data)->sb; - default: - return NULL; - } -} - -static inline struct fs_error_report *fsnotify_data_error_report( - const void *data, - int data_type) -{ - switch (data_type) { - case FSNOTIFY_EVENT_ERROR: - return (struct fs_error_report *) data; - default: - return NULL; - } -} - -/* - * Index to merged marks iterator array that correlates to a type of watch. - * The type of watched object can be deduced from the iterator type, but not - * the other way around, because an event can match different watched objects - * of the same object type. - * For example, both parent and child are watching an object of type inode. - */ -enum fsnotify_iter_type { - FSNOTIFY_ITER_TYPE_INODE, - FSNOTIFY_ITER_TYPE_VFSMOUNT, - FSNOTIFY_ITER_TYPE_SB, - FSNOTIFY_ITER_TYPE_PARENT, - FSNOTIFY_ITER_TYPE_INODE2, - FSNOTIFY_ITER_TYPE_COUNT -}; - -/* The type of object that a mark is attached to */ enum fsnotify_obj_type { - FSNOTIFY_OBJ_TYPE_ANY = -1, FSNOTIFY_OBJ_TYPE_INODE, + FSNOTIFY_OBJ_TYPE_PARENT, FSNOTIFY_OBJ_TYPE_VFSMOUNT, FSNOTIFY_OBJ_TYPE_SB, FSNOTIFY_OBJ_TYPE_COUNT, FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT }; -static inline bool fsnotify_valid_obj_type(unsigned int obj_type) +#define FSNOTIFY_OBJ_TYPE_INODE_FL (1U << FSNOTIFY_OBJ_TYPE_INODE) +#define FSNOTIFY_OBJ_TYPE_PARENT_FL (1U << FSNOTIFY_OBJ_TYPE_PARENT) +#define FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL (1U << FSNOTIFY_OBJ_TYPE_VFSMOUNT) +#define FSNOTIFY_OBJ_TYPE_SB_FL (1U << FSNOTIFY_OBJ_TYPE_SB) +#define FSNOTIFY_OBJ_ALL_TYPES_MASK ((1U << FSNOTIFY_OBJ_TYPE_COUNT) - 1) + +static inline bool fsnotify_valid_obj_type(unsigned int type) { - return (obj_type < FSNOTIFY_OBJ_TYPE_COUNT); + return (type < FSNOTIFY_OBJ_TYPE_COUNT); } struct fsnotify_iter_info { - struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; - struct fsnotify_group *current_group; + struct fsnotify_mark *marks[FSNOTIFY_OBJ_TYPE_COUNT]; unsigned int report_mask; int srcu_idx; }; static inline bool fsnotify_iter_should_report_type( - struct fsnotify_iter_info *iter_info, int iter_type) + struct fsnotify_iter_info *iter_info, int type) { - return (iter_info->report_mask & (1U << iter_type)); + return (iter_info->report_mask & (1U << type)); } static inline void fsnotify_iter_set_report_type( - struct fsnotify_iter_info *iter_info, int iter_type) + struct fsnotify_iter_info *iter_info, int type) { - iter_info->report_mask |= (1U << iter_type); + iter_info->report_mask |= (1U << type); } -static inline struct fsnotify_mark *fsnotify_iter_mark( - struct fsnotify_iter_info *iter_info, int iter_type) +static inline void fsnotify_iter_set_report_type_mark( + struct fsnotify_iter_info *iter_info, int type, + struct fsnotify_mark *mark) { - if (fsnotify_iter_should_report_type(iter_info, iter_type)) - return iter_info->marks[iter_type]; - return NULL; -} - -static inline int fsnotify_iter_step(struct fsnotify_iter_info *iter, int type, - struct fsnotify_mark **markp) -{ - while (type < FSNOTIFY_ITER_TYPE_COUNT) { - *markp = fsnotify_iter_mark(iter, type); - if (*markp) - break; - type++; - } - return type; + iter_info->marks[type] = mark; + iter_info->report_mask |= (1U << type); } #define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return fsnotify_iter_mark(iter_info, FSNOTIFY_ITER_TYPE_##NAME); \ + return (iter_info->report_mask & FSNOTIFY_OBJ_TYPE_##NAME##_FL) ? \ + iter_info->marks[FSNOTIFY_OBJ_TYPE_##NAME] : NULL; \ } FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -448,13 +335,8 @@ FSNOTIFY_ITER_FUNCS(parent, PARENT) FSNOTIFY_ITER_FUNCS(vfsmount, VFSMOUNT) FSNOTIFY_ITER_FUNCS(sb, SB) -#define fsnotify_foreach_iter_type(type) \ - for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) -#define fsnotify_foreach_iter_mark_type(iter, mark, type) \ - for (type = 0; \ - type = fsnotify_iter_step(iter, type, &mark), \ - type < FSNOTIFY_ITER_TYPE_COUNT; \ - type++) +#define fsnotify_foreach_obj_type(type) \ + for (type = 0; type < FSNOTIFY_OBJ_TYPE_COUNT; type++) /* * fsnotify_connp_t is what we embed in objects which connector can be attached @@ -473,7 +355,6 @@ struct fsnotify_mark_connector { spinlock_t lock; unsigned short type; /* Type of object [lock] */ #define FSNOTIFY_CONN_FLAG_HAS_FSID 0x01 -#define FSNOTIFY_CONN_FLAG_HAS_IREF 0x02 unsigned short flags; /* flags [lock] */ __kernel_fsid_t fsid; /* fsid of filesystem containing object */ union { @@ -518,18 +399,11 @@ struct fsnotify_mark { struct hlist_node obj_list; /* Head of list of marks for an object [mark ref] */ struct fsnotify_mark_connector *connector; - /* Events types and flags to ignore [mark->lock, group->mark_mutex] */ - __u32 ignore_mask; - /* General fsnotify mark flags */ -#define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 -#define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 - /* inotify mark flags */ -#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK 0x0010 -#define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 - /* fanotify mark flags */ -#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 -#define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 -#define FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS 0x0400 + /* Events types to ignore [mark->lock, group->mark_mutex] */ + __u32 ignored_mask; +#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x01 +#define FSNOTIFY_MARK_FLAG_ALIVE 0x02 +#define FSNOTIFY_MARK_FLAG_ATTACHED 0x04 unsigned int flags; /* flags [mark->lock] */ }; @@ -595,9 +469,7 @@ static inline void fsnotify_update_flags(struct dentry *dentry) /* called from fsnotify listeners, such as fanotify or dnotify */ /* create a new group */ -extern struct fsnotify_group *fsnotify_alloc_group( - const struct fsnotify_ops *ops, - int flags); +extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ @@ -612,39 +484,17 @@ extern int fsnotify_fasync(int fd, struct file *file, int on); extern void fsnotify_destroy_event(struct fsnotify_group *group, struct fsnotify_event *event); /* attach the event to the group notification queue */ -extern int fsnotify_insert_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *), - void (*insert)(struct fsnotify_group *, - struct fsnotify_event *)); - -static inline int fsnotify_add_event(struct fsnotify_group *group, - struct fsnotify_event *event, - int (*merge)(struct fsnotify_group *, - struct fsnotify_event *)) -{ - return fsnotify_insert_event(group, event, merge, NULL); -} - +extern int fsnotify_add_event(struct fsnotify_group *group, + struct fsnotify_event *event, + int (*merge)(struct list_head *, + struct fsnotify_event *)); /* Queue overflow event to a notification group */ static inline void fsnotify_queue_overflow(struct fsnotify_group *group) { fsnotify_add_event(group, group->overflow_event, NULL); } -static inline bool fsnotify_is_overflow_event(u32 mask) -{ - return mask & FS_Q_OVERFLOW; -} - -static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group) -{ - assert_spin_locked(&group->notification_lock); - - return list_empty(&group->notification_list); -} - +/* true if the group notification queue is empty */ extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group); /* return, but do not dequeue the first event on the notification queue */ extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group); @@ -656,101 +506,6 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, /* functions used to manipulate the marks attached to inodes */ -/* - * Canonical "ignore mask" including event flags. - * - * Note the subtle semantic difference from the legacy ->ignored_mask. - * ->ignored_mask traditionally only meant which events should be ignored, - * while ->ignore_mask also includes flags regarding the type of objects on - * which events should be ignored. - */ -static inline __u32 fsnotify_ignore_mask(struct fsnotify_mark *mark) -{ - __u32 ignore_mask = mark->ignore_mask; - - /* The event flags in ignore mask take effect */ - if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) - return ignore_mask; - - /* - * Legacy behavior: - * - Always ignore events on dir - * - Ignore events on child if parent is watching children - */ - ignore_mask |= FS_ISDIR; - ignore_mask &= ~FS_EVENT_ON_CHILD; - ignore_mask |= mark->mask & FS_EVENT_ON_CHILD; - - return ignore_mask; -} - -/* Legacy ignored_mask - only event types to ignore */ -static inline __u32 fsnotify_ignored_events(struct fsnotify_mark *mark) -{ - return mark->ignore_mask & ALL_FSNOTIFY_EVENTS; -} - -/* - * Check if mask (or ignore mask) should be applied depending if victim is a - * directory and whether it is reported to a watching parent. - */ -static inline bool fsnotify_mask_applicable(__u32 mask, bool is_dir, - int iter_type) -{ - /* Should mask be applied to a directory? */ - if (is_dir && !(mask & FS_ISDIR)) - return false; - - /* Should mask be applied to a child? */ - if (iter_type == FSNOTIFY_ITER_TYPE_PARENT && - !(mask & FS_EVENT_ON_CHILD)) - return false; - - return true; -} - -/* - * Effective ignore mask taking into account if event victim is a - * directory and whether it is reported to a watching parent. - */ -static inline __u32 fsnotify_effective_ignore_mask(struct fsnotify_mark *mark, - bool is_dir, int iter_type) -{ - __u32 ignore_mask = fsnotify_ignored_events(mark); - - if (!ignore_mask) - return 0; - - /* For non-dir and non-child, no need to consult the event flags */ - if (!is_dir && iter_type != FSNOTIFY_ITER_TYPE_PARENT) - return ignore_mask; - - ignore_mask = fsnotify_ignore_mask(mark); - if (!fsnotify_mask_applicable(ignore_mask, is_dir, iter_type)) - return 0; - - return ignore_mask & ALL_FSNOTIFY_EVENTS; -} - -/* Get mask for calculating object interest taking ignore mask into account */ -static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) -{ - __u32 mask = mark->mask; - - if (!fsnotify_ignored_events(mark)) - return mask; - - /* Interest in FS_MODIFY may be needed for clearing ignore mask */ - if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) - mask |= FS_MODIFY; - - /* - * If mark is interested in ignoring events on children, the object must - * show interest in those events for fsnotify_parent() to notice it. - */ - return mask | mark->ignore_mask; -} - /* Get mask of events for a list of marks */ extern __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn); /* Calculate mask of events for a list of marks */ @@ -765,27 +520,27 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, __kernel_fsid_t *fsid); /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, - fsnotify_connp_t *connp, unsigned int obj_type, - int add_flags, __kernel_fsid_t *fsid); + fsnotify_connp_t *connp, unsigned int type, + int allow_dups, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int add_flags, + unsigned int type, int allow_dups, __kernel_fsid_t *fsid); /* attach the mark to the inode */ static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, struct inode *inode, - int add_flags) + int allow_dups) { return fsnotify_add_mark(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); + FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL); } static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, struct inode *inode, - int add_flags) + int allow_dups) { return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, add_flags, + FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL); } @@ -798,32 +553,33 @@ extern void fsnotify_detach_mark(struct fsnotify_mark *mark); extern void fsnotify_free_mark(struct fsnotify_mark *mark); /* Wait until all marks queued for destruction are destroyed */ extern void fsnotify_wait_marks_destroyed(void); -/* Clear all of the marks of a group attached to a given object type */ -extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, - unsigned int obj_type); +/* run all the marks in a group, and clear all of the marks attached to given object type */ +extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group, unsigned int type); /* run all the marks in a group, and clear all of the vfsmount marks */ static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_VFSMOUNT_FL); } /* run all the marks in a group, and clear all of the inode marks */ static inline void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_INODE_FL); } /* run all the marks in a group, and clear all of the sn marks */ static inline void fsnotify_clear_sb_marks_by_group(struct fsnotify_group *group) { - fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB); + fsnotify_clear_marks_by_group(group, FSNOTIFY_OBJ_TYPE_SB_FL); } extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark); extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info); extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info); -static inline void fsnotify_init_event(struct fsnotify_event *event) +static inline void fsnotify_init_event(struct fsnotify_event *event, + unsigned long objectid) { INIT_LIST_HEAD(&event->list); + event->objectid = objectid; } #else diff --git a/include/linux/iversion.h b/include/linux/iversion.h index 3bfebde5a1a6..2917ef990d43 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -328,19 +328,6 @@ inode_query_iversion(struct inode *inode) return cur >> I_VERSION_QUERIED_SHIFT; } -/* - * For filesystems without any sort of change attribute, the best we can - * do is fake one up from the ctime: - */ -static inline u64 time_to_chattr(struct timespec64 *t) -{ - u64 chattr = t->tv_sec; - - chattr <<= 32; - chattr += t->tv_nsec; - return chattr; -} - /** * inode_eq_iversion_raw - check whether the raw i_version counter has changed * @inode: inode to check diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 465060acc981..481273f0c72d 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -71,14 +71,15 @@ static inline void *dereference_symbol_descriptor(void *ptr) return ptr; } -int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, - unsigned long), - void *data); - #ifdef CONFIG_KALLSYMS /* Lookup the address for a symbol. Returns 0 if not found. */ unsigned long kallsyms_lookup_name(const char *name); +/* Call a function on each kallsyms symbol in the core kernel */ +int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, + unsigned long), + void *data); + extern int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset); @@ -107,6 +108,14 @@ static inline unsigned long kallsyms_lookup_name(const char *name) return 0; } +static inline int kallsyms_on_each_symbol(int (*fn)(void *, const char *, + struct module *, + unsigned long), + void *data) +{ + return 0; +} + static inline int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, unsigned long *offset) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 9dae77a97a03..2484ed97e72f 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -68,7 +68,6 @@ void *kthread_probe_data(struct task_struct *k); int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); -void kthread_exit(long result) __noreturn; int kthreadd(void *unused); extern struct task_struct *kthreadd_task; diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h index 3bc9f7410e21..0520c0cd73f4 100644 --- a/include/linux/lockd/bind.h +++ b/include/linux/lockd/bind.h @@ -27,8 +27,7 @@ struct rpc_task; struct nlmsvc_binding { __be32 (*fopen)(struct svc_rqst *, struct nfs_fh *, - struct file **, - int mode); + struct file **); void (*fclose)(struct file *); }; diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index 70ce419e2709..666f5f310a04 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -10,8 +10,6 @@ #ifndef LINUX_LOCKD_LOCKD_H #define LINUX_LOCKD_LOCKD_H -/* XXX: a lot of this should really be under fs/lockd. */ - #include #include #include @@ -156,8 +154,7 @@ struct nlm_rqst { struct nlm_file { struct hlist_node f_list; /* linked list */ struct nfs_fh f_handle; /* NFS file handle */ - struct file * f_file[2]; /* VFS file pointers, - indexed by O_ flags */ + struct file * f_file; /* VFS file pointer */ struct nlm_share * f_shares; /* DOS shares */ struct list_head f_blocks; /* blocked locks */ unsigned int f_locks; /* guesstimate # of locks */ @@ -270,7 +267,6 @@ typedef int (*nlm_host_match_fn_t)(void *cur, struct nlm_host *ref); /* * Server-side lock handling */ -int lock_to_openmode(struct file_lock *); __be32 nlmsvc_lock(struct svc_rqst *, struct nlm_file *, struct nlm_host *, struct nlm_lock *, int, struct nlm_cookie *, int); @@ -290,9 +286,8 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); * File handling for the server personality */ __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, - struct nlm_lock *); + struct nfs_fh *); void nlm_release_file(struct nlm_file *); -void nlmsvc_put_lockowner(struct nlm_lockowner *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *); void nlmsvc_free_host_resources(struct nlm_host *); @@ -304,15 +299,9 @@ void nlmsvc_invalidate_all(void); int nlmsvc_unlock_all_by_sb(struct super_block *sb); int nlmsvc_unlock_all_by_ip(struct sockaddr *server_addr); -static inline struct file *nlmsvc_file_file(struct nlm_file *file) -{ - return file->f_file[O_RDONLY] ? - file->f_file[O_RDONLY] : file->f_file[O_WRONLY]; -} - static inline struct inode *nlmsvc_file_inode(struct nlm_file *file) { - return locks_inode(nlmsvc_file_file(file)); + return locks_inode(file->f_file); } static inline int __nlm_privileged_request4(const struct sockaddr *sap) diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 67e4a2c5500b..7ab9f264313f 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -41,8 +41,6 @@ struct nlm_lock { struct nfs_fh fh; struct xdr_netobj oh; u32 svid; - u64 lock_start; - u64 lock_len; struct file_lock fl; }; @@ -98,19 +96,24 @@ struct nlm_reboot { */ #define NLMSVC_XDRSIZE sizeof(struct nlm_args) -bool nlmsvc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); - -bool nlmsvc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlmsvc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlmsvc_decode_testargs(struct svc_rqst *, __be32 *); +int nlmsvc_encode_testres(struct svc_rqst *, __be32 *); +int nlmsvc_decode_lockargs(struct svc_rqst *, __be32 *); +int nlmsvc_decode_cancargs(struct svc_rqst *, __be32 *); +int nlmsvc_decode_unlockargs(struct svc_rqst *, __be32 *); +int nlmsvc_encode_res(struct svc_rqst *, __be32 *); +int nlmsvc_decode_res(struct svc_rqst *, __be32 *); +int nlmsvc_encode_void(struct svc_rqst *, __be32 *); +int nlmsvc_decode_void(struct svc_rqst *, __be32 *); +int nlmsvc_decode_shareargs(struct svc_rqst *, __be32 *); +int nlmsvc_encode_shareres(struct svc_rqst *, __be32 *); +int nlmsvc_decode_notify(struct svc_rqst *, __be32 *); +int nlmsvc_decode_reboot(struct svc_rqst *, __be32 *); +/* +int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); + */ #endif /* LOCKD_XDR_H */ diff --git a/include/linux/lockd/xdr4.h b/include/linux/lockd/xdr4.h index 72831e35dca3..e709fe5924f2 100644 --- a/include/linux/lockd/xdr4.h +++ b/include/linux/lockd/xdr4.h @@ -22,22 +22,27 @@ #define nlm4_fbig cpu_to_be32(NLM_FBIG) #define nlm4_failed cpu_to_be32(NLM_FAILED) -void nlm4svc_set_file_lock_range(struct file_lock *fl, u64 off, u64 len); -bool nlm4svc_decode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_reboot(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_shareargs(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_decode_notify(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_encode_testres(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_encode_res(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_encode_void(struct svc_rqst *rqstp, struct xdr_stream *xdr); -bool nlm4svc_encode_shareres(struct svc_rqst *rqstp, struct xdr_stream *xdr); +int nlm4svc_decode_testargs(struct svc_rqst *, __be32 *); +int nlm4svc_encode_testres(struct svc_rqst *, __be32 *); +int nlm4svc_decode_lockargs(struct svc_rqst *, __be32 *); +int nlm4svc_decode_cancargs(struct svc_rqst *, __be32 *); +int nlm4svc_decode_unlockargs(struct svc_rqst *, __be32 *); +int nlm4svc_encode_res(struct svc_rqst *, __be32 *); +int nlm4svc_decode_res(struct svc_rqst *, __be32 *); +int nlm4svc_encode_void(struct svc_rqst *, __be32 *); +int nlm4svc_decode_void(struct svc_rqst *, __be32 *); +int nlm4svc_decode_shareargs(struct svc_rqst *, __be32 *); +int nlm4svc_encode_shareres(struct svc_rqst *, __be32 *); +int nlm4svc_decode_notify(struct svc_rqst *, __be32 *); +int nlm4svc_decode_reboot(struct svc_rqst *, __be32 *); +/* +int nlmclt_encode_testargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_lockargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_cancargs(struct rpc_rqst *, u32 *, struct nlm_args *); +int nlmclt_encode_unlockargs(struct rpc_rqst *, u32 *, struct nlm_args *); + */ extern const struct rpc_version nlm_version4; #endif /* LOCKD_XDR4_H */ diff --git a/include/linux/module.h b/include/linux/module.h index de07fbf3a125..4cd6d889d5ba 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -599,7 +599,7 @@ static inline bool within_module(unsigned long addr, const struct module *mod) return within_module_init(addr, mod) || within_module_core(addr, mod); } -/* Search for module by name: must be in a RCU-sched critical section. */ +/* Search for module by name: must hold module_mutex. */ struct module *find_module(const char *name); struct symsearch { @@ -621,9 +621,13 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, /* Look for this name: can be of form module:name. */ unsigned long module_kallsyms_lookup_name(const char *name); -extern void __noreturn __module_put_and_kthread_exit(struct module *mod, +int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, + struct module *, unsigned long), + void *data); + +extern void __noreturn __module_put_and_exit(struct module *mod, long code); -#define module_put_and_kthread_exit(code) __module_put_and_kthread_exit(THIS_MODULE, code) +#define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code) #ifdef CONFIG_MODULE_UNLOAD int module_refcount(struct module *mod); @@ -804,6 +808,14 @@ static inline unsigned long module_kallsyms_lookup_name(const char *name) return 0; } +static inline int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, + struct module *, + unsigned long), + void *data) +{ + return 0; +} + static inline int register_module_notifier(struct notifier_block *nb) { /* no events will happen anyway, so this can always succeed */ @@ -815,7 +827,7 @@ static inline int unregister_module_notifier(struct notifier_block *nb) return 0; } -#define module_put_and_kthread_exit(code) kthread_exit(code) +#define module_put_and_exit(code) do_exit(code) static inline void print_modules(void) { @@ -892,8 +904,4 @@ static inline bool module_sig_ok(struct module *module) } #endif /* CONFIG_MODULE_SIG */ -int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, - struct module *, unsigned long), - void *data); - #endif /* _LINUX_MODULE_H */ diff --git a/include/linux/nfs.h b/include/linux/nfs.h index b06375e88e58..0dc7ad38a0da 100644 --- a/include/linux/nfs.h +++ b/include/linux/nfs.h @@ -36,6 +36,14 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc memcpy(target->data, source->data, source->size); } + +/* + * This is really a general kernel constant, but since nothing like + * this is defined in the kernel headers, I have to do it here. + */ +#define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) + + enum nfs3_stable_how { NFS_UNSTABLE = 0, NFS_DATA_SYNC = 1, diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index ea88d0f462c9..9dc7eeac924f 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -385,6 +385,13 @@ enum lock_type4 { NFS4_WRITEW_LT = 4 }; +enum change_attr_type4 { + NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR = 0, + NFS4_CHANGE_TYPE_IS_VERSION_COUNTER = 1, + NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS = 2, + NFS4_CHANGE_TYPE_IS_TIME_METADATA = 3, + NFS4_CHANGE_TYPE_IS_UNDEFINED = 4 +}; /* Mandatory Attributes */ #define FATTR4_WORD0_SUPPORTED_ATTRS (1UL << 0) @@ -452,6 +459,7 @@ enum lock_type4 { #define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1) #define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4) #define FATTR4_WORD2_CLONE_BLKSIZE (1UL << 13) +#define FATTR4_WORD2_CHANGE_ATTR_TYPE (1UL << 15) #define FATTR4_WORD2_SECURITY_LABEL (1UL << 16) #define FATTR4_WORD2_MODE_UMASK (1UL << 17) #define FATTR4_WORD2_XATTR_SUPPORT (1UL << 18) @@ -717,17 +725,4 @@ enum nfs4_setxattr_options { SETXATTR4_CREATE = 1, SETXATTR4_REPLACE = 2, }; - -enum { - RCA4_TYPE_MASK_RDATA_DLG = 0, - RCA4_TYPE_MASK_WDATA_DLG = 1, - RCA4_TYPE_MASK_DIR_DLG = 2, - RCA4_TYPE_MASK_FILE_LAYOUT = 3, - RCA4_TYPE_MASK_BLK_LAYOUT = 4, - RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8, - RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9, - RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12, - RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15, -}; - #endif diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 22265b1ff080..f5ba0fbff72f 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -8,7 +8,6 @@ */ #include -#include extern struct nfs_ssc_client_ops_tbl nfs_ssc_client_tbl; @@ -55,19 +54,6 @@ static inline void nfs42_ssc_close(struct file *filep) } #endif -struct nfsd4_ssc_umount_item { - struct list_head nsui_list; - bool nsui_busy; - /* - * nsui_refcnt inited to 2, 1 on list and 1 for consumer. Entry - * is removed when refcnt drops to 1 and nsui_expire expires. - */ - refcount_t nsui_refcnt; - unsigned long nsui_expire; - struct vfsmount *nsui_vfsmount; - char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; -}; - /* * NFS_FS */ diff --git a/include/linux/nfsacl.h b/include/linux/nfsacl.h index 8e76a79cdc6a..103d44695323 100644 --- a/include/linux/nfsacl.h +++ b/include/linux/nfsacl.h @@ -38,11 +38,5 @@ nfsacl_encode(struct xdr_buf *buf, unsigned int base, struct inode *inode, extern int nfsacl_decode(struct xdr_buf *buf, unsigned int base, unsigned int *aclcnt, struct posix_acl **pacl); -extern bool -nfs_stream_decode_acl(struct xdr_stream *xdr, unsigned int *aclcnt, - struct posix_acl **pacl); -extern bool -nfs_stream_encode_acl(struct xdr_stream *xdr, struct inode *inode, - struct posix_acl *acl, int encode_entries, int typeflag); #endif /* __LINUX_NFSACL_H */ diff --git a/include/linux/pid.h b/include/linux/pid.h index af308e15f174..fa10acb8d6a4 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -78,7 +78,6 @@ struct file; extern struct pid *pidfd_pid(const struct file *file); struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); -int pidfd_create(struct pid *pid, unsigned int flags); static inline struct pid *get_pid(struct pid *pid) { diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 52eabe6797ee..6d63a5260130 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -15,6 +15,9 @@ struct user_struct { refcount_t __count; /* reference count */ atomic_t processes; /* How many processes does this user have? */ atomic_t sigpending; /* How many pending signals does this user have? */ +#ifdef CONFIG_FANOTIFY + atomic_t fanotify_listeners; +#endif #ifdef CONFIG_EPOLL atomic_long_t epoll_watches; /* The number of file descriptors currently watched */ #endif diff --git a/include/linux/sunrpc/msg_prot.h b/include/linux/sunrpc/msg_prot.h index 938c2bf29db8..43f854487539 100644 --- a/include/linux/sunrpc/msg_prot.h +++ b/include/linux/sunrpc/msg_prot.h @@ -10,6 +10,9 @@ #define RPC_VERSION 2 +/* size of an XDR encoding unit in bytes, i.e. 32bit */ +#define XDR_UNIT (4) + /* spec defines authentication flavor as an unsigned 32 bit integer */ typedef u32 rpc_authflavor_t; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 1cf7a7799cc0..386628b36bc7 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -19,7 +19,6 @@ #include #include #include -#include /* statistics for svc_pool structures */ struct svc_pool_stats { @@ -52,6 +51,25 @@ struct svc_pool { unsigned long sp_flags; } ____cacheline_aligned_in_smp; +struct svc_serv; + +struct svc_serv_ops { + /* Callback to use when last thread exits. */ + void (*svo_shutdown)(struct svc_serv *, struct net *); + + /* function for service threads to run */ + int (*svo_function)(void *); + + /* queue up a transport for servicing */ + void (*svo_enqueue_xprt)(struct svc_xprt *); + + /* set up thread (or whatever) execution context */ + int (*svo_setup)(struct svc_serv *, struct svc_pool *, int); + + /* optional module to count when adding threads (pooled svcs only) */ + struct module *svo_module; +}; + /* * RPC service. * @@ -66,7 +84,6 @@ struct svc_serv { struct svc_program * sv_program; /* RPC program */ struct svc_stat * sv_stats; /* RPC statistics */ spinlock_t sv_lock; - struct kref sv_refcnt; unsigned int sv_nrthreads; /* # of server threads */ unsigned int sv_maxconn; /* max connections allowed or * '0' causing max to be based @@ -84,8 +101,7 @@ struct svc_serv { unsigned int sv_nrpools; /* number of thread pools */ struct svc_pool * sv_pools; /* array of thread pools */ - int (*sv_threadfn)(void *data); - + const struct svc_serv_ops *sv_ops; /* server operations */ #if defined(CONFIG_SUNRPC_BACKCHANNEL) struct list_head sv_cb_list; /* queue for callback requests * that arrive over the same @@ -97,30 +113,15 @@ struct svc_serv { #endif /* CONFIG_SUNRPC_BACKCHANNEL */ }; -/** - * svc_get() - increment reference count on a SUNRPC serv - * @serv: the svc_serv to have count incremented - * - * Returns: the svc_serv that was passed in. +/* + * We use sv_nrthreads as a reference count. svc_destroy() drops + * this refcount, so we need to bump it up around operations that + * change the number of threads. Horrible, but there it is. + * Should be called with the "service mutex" held. */ -static inline struct svc_serv *svc_get(struct svc_serv *serv) +static inline void svc_get(struct svc_serv *serv) { - kref_get(&serv->sv_refcnt); - return serv; -} - -void svc_destroy(struct kref *); - -/** - * svc_put - decrement reference count on a SUNRPC serv - * @serv: the svc_serv to have count decremented - * - * When the reference count reaches zero, svc_destroy() - * is called to clean up and free the serv. - */ -static inline void svc_put(struct svc_serv *serv) -{ - kref_put(&serv->sv_refcnt, svc_destroy); + serv->sv_nrthreads++; } /* @@ -246,16 +247,12 @@ struct svc_rqst { size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; - struct xdr_stream rq_arg_stream; - struct xdr_stream rq_res_stream; - struct page *rq_scratch_page; struct xdr_buf rq_res; struct page *rq_pages[RPCSVC_MAXPAGES + 1]; struct page * *rq_respages; /* points into rq_pages */ struct page * *rq_next_page; /* next reply page to use */ struct page * *rq_page_end; /* one past the last page */ - struct pagevec rq_pvec; struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */ struct bio_vec rq_bvec[RPCSVC_MAXPAGES]; @@ -275,13 +272,13 @@ struct svc_rqst { #define RQ_VICTIM (5) /* about to be shut down */ #define RQ_BUSY (6) /* request is busy */ #define RQ_DATA (7) /* request has data */ +#define RQ_AUTHERR (8) /* Request status is auth error */ unsigned long rq_flags; /* flags field */ ktime_t rq_qtime; /* enqueue time */ void * rq_argp; /* decoded arguments */ void * rq_resp; /* xdr'd results */ void * rq_auth_data; /* flavor-specific data */ - __be32 rq_auth_stat; /* authentication status */ int rq_auth_slack; /* extra space xdr code * should leave in head * for krb5i, krb5p. @@ -455,21 +452,40 @@ struct svc_procedure { /* process the request: */ __be32 (*pc_func)(struct svc_rqst *); /* XDR decode args: */ - bool (*pc_decode)(struct svc_rqst *rqstp, - struct xdr_stream *xdr); + int (*pc_decode)(struct svc_rqst *, __be32 *data); /* XDR encode result: */ - bool (*pc_encode)(struct svc_rqst *rqstp, - struct xdr_stream *xdr); + int (*pc_encode)(struct svc_rqst *, __be32 *data); /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */ - unsigned int pc_argzero; /* how much of argument to clear */ unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ - const char * pc_name; /* for display */ }; +/* + * Mode for mapping cpus to pools. + */ +enum { + SVC_POOL_AUTO = -1, /* choose one of the others */ + SVC_POOL_GLOBAL, /* no mapping, just a single global pool + * (legacy & UP mode) */ + SVC_POOL_PERCPU, /* one pool per cpu */ + SVC_POOL_PERNODE /* one pool per numa node */ +}; + +struct svc_pool_map { + int count; /* How many svc_servs use us */ + int mode; /* Note: int not enum to avoid + * warnings about "enumeration value + * not handled in switch" */ + unsigned int npools; + unsigned int *pool_to; /* maps pool id to cpu or node */ + unsigned int *to_pool; /* maps cpu or node to pool id */ +}; + +extern struct svc_pool_map svc_pool_map; + /* * Function prototypes. */ @@ -477,17 +493,22 @@ int svc_rpcb_setup(struct svc_serv *serv, struct net *net); void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net); int svc_bind(struct svc_serv *serv, struct net *net); struct svc_serv *svc_create(struct svc_program *, unsigned int, - int (*threadfn)(void *data)); + const struct svc_serv_ops *); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); -void svc_rqst_replace_page(struct svc_rqst *rqstp, - struct page *page); +struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, + struct svc_pool *pool, int node); void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); +unsigned int svc_pool_map_get(void); +void svc_pool_map_put(void); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, - int (*threadfn)(void *data)); + const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); +int svc_set_num_threads_sync(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); +void svc_destroy(struct svc_serv *); +void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, struct svc_rqst *); @@ -498,14 +519,16 @@ void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); char * svc_print_addr(struct svc_rqst *, char *, size_t); -int svc_encode_result_payload(struct svc_rqst *rqstp, - unsigned int offset, - unsigned int length); +int svc_encode_read_payload(struct svc_rqst *rqstp, + unsigned int offset, + unsigned int length); unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, - struct xdr_buf *payload); + struct page **pages, + struct kvec *first, size_t total); char *svc_fill_symlink_pathname(struct svc_rqst *rqstp, struct kvec *first, void *p, size_t total); +__be32 svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err); __be32 svc_generic_init_request(struct svc_rqst *rqstp, const struct svc_program *progp, struct svc_process_info *procinfo); @@ -534,42 +557,4 @@ static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space) svc_reserve(rqstp, space + rqstp->rq_auth_slack); } -/** - * svcxdr_init_decode - Prepare an xdr_stream for svc Call decoding - * @rqstp: controlling server RPC transaction context - * - */ -static inline void svcxdr_init_decode(struct svc_rqst *rqstp) -{ - struct xdr_stream *xdr = &rqstp->rq_arg_stream; - struct kvec *argv = rqstp->rq_arg.head; - - xdr_init_decode(xdr, &rqstp->rq_arg, argv->iov_base, NULL); - xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); -} - -/** - * svcxdr_init_encode - Prepare an xdr_stream for svc Reply encoding - * @rqstp: controlling server RPC transaction context - * - */ -static inline void svcxdr_init_encode(struct svc_rqst *rqstp) -{ - struct xdr_stream *xdr = &rqstp->rq_res_stream; - struct xdr_buf *buf = &rqstp->rq_res; - struct kvec *resv = buf->head; - - xdr_reset_scratch_buffer(xdr); - - xdr->buf = buf; - xdr->iov = resv; - xdr->p = resv->iov_base + resv->iov_len; - xdr->end = resv->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; - buf->len = resv->iov_len; - xdr->page_ptr = buf->pages - 1; - buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages); - buf->buflen -= rqstp->rq_auth_slack; - xdr->rqst = NULL; -} - #endif /* SUNRPC_SVC_H */ diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h index 2b870a3f391b..9dc3a3b88391 100644 --- a/include/linux/sunrpc/svc_rdma.h +++ b/include/linux/sunrpc/svc_rdma.h @@ -207,8 +207,8 @@ extern void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma, struct svc_rdma_recv_ctxt *rctxt, int status); extern int svc_rdma_sendto(struct svc_rqst *); -extern int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length); +extern int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length); /* svc_rdma_transport.c */ extern struct svc_xprt_class svc_rdma_class; diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index dbffb92511ef..aca35ab5cff2 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -21,8 +21,8 @@ struct svc_xprt_ops { int (*xpo_has_wspace)(struct svc_xprt *); int (*xpo_recvfrom)(struct svc_rqst *); int (*xpo_sendto)(struct svc_rqst *); - int (*xpo_result_payload)(struct svc_rqst *, unsigned int, - unsigned int); + int (*xpo_read_payload)(struct svc_rqst *, unsigned int, + unsigned int); void (*xpo_release_rqst)(struct svc_rqst *); void (*xpo_detach)(struct svc_xprt *); void (*xpo_free)(struct svc_xprt *); @@ -127,16 +127,14 @@ int svc_reg_xprt_class(struct svc_xprt_class *); void svc_unreg_xprt_class(struct svc_xprt_class *); void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, struct svc_serv *); -int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, - struct net *net, const int family, - const unsigned short port, int flags, - const struct cred *cred); -void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net); -void svc_xprt_received(struct svc_xprt *xprt); +int svc_create_xprt(struct svc_serv *, const char *, struct net *, + const int, const unsigned short, int, + const struct cred *); +void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); -void svc_xprt_close(struct svc_xprt *xprt); +void svc_close_xprt(struct svc_xprt *xprt); int svc_port_is_privileged(struct sockaddr *sin); int svc_print_xprts(char *buf, int maxlen); struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name, diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 6d9cc9080aca..b0003866a249 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -127,7 +127,7 @@ struct auth_ops { char * name; struct module *owner; int flavour; - int (*accept)(struct svc_rqst *rq); + int (*accept)(struct svc_rqst *rq, __be32 *authp); int (*release)(struct svc_rqst *rq); void (*domain_release)(struct auth_domain *); int (*set_client)(struct svc_rqst *rq); @@ -149,7 +149,7 @@ struct auth_ops { struct svc_xprt; -extern int svc_authenticate(struct svc_rqst *rqstp); +extern int svc_authenticate(struct svc_rqst *rqstp, __be32 *authp); extern int svc_authorise(struct svc_rqst *rqstp); extern int svc_set_client(struct svc_rqst *rqstp); extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops); diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h index a366d3eb0531..b7ac7fe68306 100644 --- a/include/linux/sunrpc/svcsock.h +++ b/include/linux/sunrpc/svcsock.h @@ -57,9 +57,10 @@ int svc_recv(struct svc_rqst *, long); int svc_send(struct svc_rqst *); void svc_drop(struct svc_rqst *); void svc_sock_update_bufs(struct svc_serv *serv); -int svc_addsock(struct svc_serv *serv, struct net *net, - const int fd, char *name_return, const size_t len, - const struct cred *cred); +bool svc_alien_sock(struct net *net, int fd); +int svc_addsock(struct svc_serv *serv, const int fd, + char *name_return, const size_t len, + const struct cred *cred); void svc_init_xprt_sock(void); void svc_cleanup_xprt_sock(void); struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot); diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index c1c50eaae472..6d9d1520612b 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -19,13 +19,6 @@ struct bio_vec; struct rpc_rqst; -/* - * Size of an XDR encoding unit in bytes, i.e. 32 bits, - * as defined in Section 3 of RFC 4506. All encoded - * XDR data items are aligned on a boundary of 32 bits. - */ -#define XDR_UNIT sizeof(__be32) - /* * Buffer adjustment */ @@ -239,12 +232,10 @@ typedef int (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr, extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); -extern void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, - struct page **pages, struct rpc_rqst *rqst); extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); -extern void __xdr_commit_encode(struct xdr_stream *xdr); +extern void xdr_commit_encode(struct xdr_stream *xdr); extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len); extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen); extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, @@ -255,71 +246,13 @@ extern void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); extern void xdr_init_decode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, struct page **pages, unsigned int len); +extern void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen); extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes); extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data); extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint32_t); extern uint64_t xdr_expand_hole(struct xdr_stream *, uint64_t, uint64_t); -extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, - unsigned int len); - -/** - * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. - * @xdr: pointer to xdr_stream struct - * @buf: pointer to an empty buffer - * @buflen: size of 'buf' - * - * The scratch buffer is used when decoding from an array of pages. - * If an xdr_inline_decode() call spans across page boundaries, then - * we copy the data into the scratch buffer in order to allow linear - * access. - */ -static inline void -xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) -{ - xdr->scratch.iov_base = buf; - xdr->scratch.iov_len = buflen; -} - -/** - * xdr_set_scratch_page - Attach a scratch buffer for decoding data - * @xdr: pointer to xdr_stream struct - * @page: an anonymous page - * - * See xdr_set_scratch_buffer(). - */ -static inline void -xdr_set_scratch_page(struct xdr_stream *xdr, struct page *page) -{ - xdr_set_scratch_buffer(xdr, page_address(page), PAGE_SIZE); -} - -/** - * xdr_reset_scratch_buffer - Clear scratch buffer information - * @xdr: pointer to xdr_stream struct - * - * See xdr_set_scratch_buffer(). - */ -static inline void -xdr_reset_scratch_buffer(struct xdr_stream *xdr) -{ - xdr_set_scratch_buffer(xdr, NULL, 0); -} - -/** - * xdr_commit_encode - Ensure all data is written to xdr->buf - * @xdr: pointer to xdr_stream - * - * Handle encoding across page boundaries by giving the caller a - * temporary location to write to, then later copying the data into - * place. __xdr_commit_encode() does that copying. - */ -static inline void xdr_commit_encode(struct xdr_stream *xdr) -{ - if (unlikely(xdr->scratch.iov_len)) - __xdr_commit_encode(xdr); -} /** * xdr_stream_remaining - Return the number of bytes remaining in the stream @@ -352,7 +285,7 @@ ssize_t xdr_stream_decode_string_dup(struct xdr_stream *xdr, char **str, static inline size_t xdr_align_size(size_t n) { - const size_t mask = XDR_UNIT - 1; + const size_t mask = sizeof(__u32) - 1; return (n + mask) & ~mask; } @@ -382,7 +315,7 @@ static inline size_t xdr_pad_size(size_t n) */ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) { - const size_t len = XDR_UNIT; + const size_t len = sizeof(__be32); __be32 *p = xdr_reserve_space(xdr, len); if (unlikely(!p)) @@ -401,7 +334,7 @@ static inline ssize_t xdr_stream_encode_item_present(struct xdr_stream *xdr) */ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) { - const size_t len = XDR_UNIT; + const size_t len = sizeof(__be32); __be32 *p = xdr_reserve_space(xdr, len); if (unlikely(!p)) @@ -410,40 +343,6 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) return len; } -/** - * xdr_encode_bool - Encode a boolean item - * @p: address in a buffer into which to encode - * @n: boolean value to encode - * - * Return value: - * Address of item following the encoded boolean - */ -static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) -{ - *p++ = n ? xdr_one : xdr_zero; - return p; -} - -/** - * xdr_stream_encode_bool - Encode a boolean item - * @xdr: pointer to xdr_stream - * @n: boolean value to encode - * - * Return values: - * On success, returns length in bytes of XDR buffer consumed - * %-EMSGSIZE on XDR buffer overflow - */ -static inline int xdr_stream_encode_bool(struct xdr_stream *xdr, __u32 n) -{ - const size_t len = XDR_UNIT; - __be32 *p = xdr_reserve_space(xdr, len); - - if (unlikely(!p)) - return -EMSGSIZE; - xdr_encode_bool(p, n); - return len; -} - /** * xdr_stream_encode_u32 - Encode a 32-bit integer * @xdr: pointer to xdr_stream @@ -605,27 +504,6 @@ static inline bool xdr_item_is_present(const __be32 *p) return *p != xdr_zero; } -/** - * xdr_stream_decode_bool - Decode a boolean - * @xdr: pointer to xdr_stream - * @ptr: pointer to a u32 in which to store the result - * - * Return values: - * %0 on success - * %-EBADMSG on XDR buffer overflow - */ -static inline ssize_t -xdr_stream_decode_bool(struct xdr_stream *xdr, __u32 *ptr) -{ - const size_t count = sizeof(*ptr); - __be32 *p = xdr_inline_decode(xdr, count); - - if (unlikely(!p)) - return -EBADMSG; - *ptr = (*p != xdr_zero); - return 0; -} - /** * xdr_stream_decode_u32 - Decode a 32-bit integer * @xdr: pointer to xdr_stream @@ -647,27 +525,6 @@ xdr_stream_decode_u32(struct xdr_stream *xdr, __u32 *ptr) return 0; } -/** - * xdr_stream_decode_u64 - Decode a 64-bit integer - * @xdr: pointer to xdr_stream - * @ptr: location to store 64-bit integer - * - * Return values: - * %0 on success - * %-EBADMSG on XDR buffer overflow - */ -static inline ssize_t -xdr_stream_decode_u64(struct xdr_stream *xdr, __u64 *ptr) -{ - const size_t count = sizeof(*ptr); - __be32 *p = xdr_inline_decode(xdr, count); - - if (unlikely(!p)) - return -EBADMSG; - xdr_decode_hyper(p, ptr); - return 0; -} - /** * xdr_stream_decode_opaque_fixed - Decode fixed length opaque xdr data * @xdr: pointer to xdr_stream diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index ec4cea5dd222..61d0315a42ab 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1321,6 +1321,18 @@ static inline long ksys_ftruncate(unsigned int fd, loff_t length) return do_sys_ftruncate(fd, length, 1); } +extern int __close_fd(struct files_struct *files, unsigned int fd); + +/* + * In contrast to sys_close(), this stub does not check whether the syscall + * should or should not be restarted, but returns the raw error codes from + * __close_fd(). + */ +static inline int ksys_close(unsigned int fd) +{ + return __close_fd(current->files, fd); +} + extern long do_sys_truncate(const char __user *pathname, loff_t length); static inline long ksys_truncate(const char __user *pathname, loff_t length) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index da861f2e34ce..6ee587d0aeaa 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -56,8 +56,6 @@ typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); int proc_dostring(struct ctl_table *, int, void *, size_t *, loff_t *); -int proc_dobool(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); int proc_dointvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_douintvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_minmax(struct ctl_table *, int, void *, size_t *, loff_t *); diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 29f55fadc362..71cc05ddaa21 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -50,10 +50,6 @@ enum ucount_type { #ifdef CONFIG_INOTIFY_USER UCOUNT_INOTIFY_INSTANCES, UCOUNT_INOTIFY_WATCHES, -#endif -#ifdef CONFIG_FANOTIFY - UCOUNT_FANOTIFY_GROUPS, - UCOUNT_FANOTIFY_MARKS, #endif UCOUNT_COUNTS, }; diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 56e4a57d2538..8220369ee610 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -394,7 +394,6 @@ DEFINE_RPC_RUNNING_EVENT(complete); DEFINE_RPC_RUNNING_EVENT(timeout); DEFINE_RPC_RUNNING_EVENT(signalled); DEFINE_RPC_RUNNING_EVENT(end); -DEFINE_RPC_RUNNING_EVENT(call_done); DECLARE_EVENT_CLASS(rpc_task_queued, @@ -1481,7 +1480,8 @@ DEFINE_SVCXDRBUF_EVENT(sendto); svc_rqst_flag(SPLICE_OK) \ svc_rqst_flag(VICTIM) \ svc_rqst_flag(BUSY) \ - svc_rqst_flag_end(DATA) + svc_rqst_flag(DATA) \ + svc_rqst_flag_end(AUTHERR) #undef svc_rqst_flag #undef svc_rqst_flag_end @@ -1547,9 +1547,9 @@ TRACE_DEFINE_ENUM(SVC_COMPLETE); { SVC_COMPLETE, "SVC_COMPLETE" }) TRACE_EVENT(svc_authenticate, - TP_PROTO(const struct svc_rqst *rqst, int auth_res), + TP_PROTO(const struct svc_rqst *rqst, int auth_res, __be32 auth_stat), - TP_ARGS(rqst, auth_res), + TP_ARGS(rqst, auth_res, auth_stat), TP_STRUCT__entry( __field(u32, xid) @@ -1560,7 +1560,7 @@ TRACE_EVENT(svc_authenticate, TP_fast_assign( __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->svc_status = auth_res; - __entry->auth_stat = be32_to_cpu(rqst->rq_auth_stat); + __entry->auth_stat = be32_to_cpu(auth_stat); ), TP_printk("xid=0x%08x auth_res=%s auth_stat=%s", @@ -1578,7 +1578,6 @@ TRACE_EVENT(svc_process, __field(u32, vers) __field(u32, proc) __string(service, name) - __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)") ), @@ -1588,16 +1587,13 @@ TRACE_EVENT(svc_process, __entry->vers = rqst->rq_vers; __entry->proc = rqst->rq_proc; __assign_str(service, name); - __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt ? rqst->rq_xprt->xpt_remotebuf : "(null)"); ), - TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%s", + TP_printk("addr=%s xid=0x%08x service=%s vers=%u proc=%u", __get_str(addr), __entry->xid, - __get_str(service), __entry->vers, - __get_str(procedure) - ) + __get_str(service), __entry->vers, __entry->proc) ); DECLARE_EVENT_CLASS(svc_rqst_event, @@ -1756,7 +1752,6 @@ DECLARE_EVENT_CLASS(svc_xprt_event, ), \ TP_ARGS(xprt)) -DEFINE_SVC_XPRT_EVENT(received); DEFINE_SVC_XPRT_EVENT(no_write_space); DEFINE_SVC_XPRT_EVENT(close); DEFINE_SVC_XPRT_EVENT(detach); @@ -1854,7 +1849,6 @@ TRACE_EVENT(svc_stats_latency, TP_STRUCT__entry( __field(u32, xid) __field(unsigned long, execute) - __string(procedure, rqst->rq_procinfo->pc_name) __string(addr, rqst->rq_xprt->xpt_remotebuf) ), @@ -1862,13 +1856,11 @@ TRACE_EVENT(svc_stats_latency, __entry->xid = be32_to_cpu(rqst->rq_xid); __entry->execute = ktime_to_us(ktime_sub(ktime_get(), rqst->rq_stime)); - __assign_str(procedure, rqst->rq_procinfo->pc_name); __assign_str(addr, rqst->rq_xprt->xpt_remotebuf); ), - TP_printk("addr=%s xid=0x%08x proc=%s execute-us=%lu", - __get_str(addr), __entry->xid, __get_str(procedure), - __entry->execute) + TP_printk("addr=%s xid=0x%08x execute-us=%lu", + __get_str(addr), __entry->xid, __entry->execute) ); DECLARE_EVENT_CLASS(svc_deferred_event, diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index d8536d77fea1..fbf9c5c7dd59 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -20,7 +20,6 @@ #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */ #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ -#define FAN_FS_ERROR 0x00008000 /* Filesystem error */ #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */ #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */ @@ -28,8 +27,6 @@ #define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */ -#define FAN_RENAME 0x10000000 /* File was renamed */ - #define FAN_ONDIR 0x40000000 /* Event occurred against dir */ /* helper events */ @@ -54,18 +51,13 @@ #define FAN_ENABLE_AUDIT 0x00000040 /* Flags to determine fanotify event format */ -#define FAN_REPORT_PIDFD 0x00000080 /* Report pidfd for event->pid */ #define FAN_REPORT_TID 0x00000100 /* event->pid is thread id */ #define FAN_REPORT_FID 0x00000200 /* Report unique file id */ #define FAN_REPORT_DIR_FID 0x00000400 /* Report unique directory id */ #define FAN_REPORT_NAME 0x00000800 /* Report events with name */ -#define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */ /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */ #define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME) -/* Convenience macro - FAN_REPORT_TARGET_FID requires all other FID flags */ -#define FAN_REPORT_DFID_NAME_TARGET (FAN_REPORT_DFID_NAME | \ - FAN_REPORT_FID | FAN_REPORT_TARGET_FID) /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_INIT_FLAGS (FAN_CLOEXEC | FAN_NONBLOCK | \ @@ -82,21 +74,12 @@ #define FAN_MARK_IGNORED_SURV_MODIFY 0x00000040 #define FAN_MARK_FLUSH 0x00000080 /* FAN_MARK_FILESYSTEM is 0x00000100 */ -#define FAN_MARK_EVICTABLE 0x00000200 -/* This bit is mutually exclusive with FAN_MARK_IGNORED_MASK bit */ -#define FAN_MARK_IGNORE 0x00000400 /* These are NOT bitwise flags. Both bits can be used togther. */ #define FAN_MARK_INODE 0x00000000 #define FAN_MARK_MOUNT 0x00000010 #define FAN_MARK_FILESYSTEM 0x00000100 -/* - * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY - * for non-inode mark types. - */ -#define FAN_MARK_IGNORE_SURV (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY) - /* Deprecated - do not use this in programs and do not add new flags here! */ #define FAN_ALL_MARK_FLAGS (FAN_MARK_ADD |\ FAN_MARK_REMOVE |\ @@ -140,14 +123,6 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_FID 1 #define FAN_EVENT_INFO_TYPE_DFID_NAME 2 #define FAN_EVENT_INFO_TYPE_DFID 3 -#define FAN_EVENT_INFO_TYPE_PIDFD 4 -#define FAN_EVENT_INFO_TYPE_ERROR 5 - -/* Special info types for FAN_RENAME */ -#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 -/* Reserved for FAN_EVENT_INFO_TYPE_OLD_DFID 11 */ -#define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME 12 -/* Reserved for FAN_EVENT_INFO_TYPE_NEW_DFID 13 */ /* Variable length info record following event metadata */ struct fanotify_event_info_header { @@ -173,21 +148,6 @@ struct fanotify_event_info_fid { unsigned char handle[0]; }; -/* - * This structure is used for info records of type FAN_EVENT_INFO_TYPE_PIDFD. - * It holds a pidfd for the pid that was responsible for generating an event. - */ -struct fanotify_event_info_pidfd { - struct fanotify_event_info_header hdr; - __s32 pidfd; -}; - -struct fanotify_event_info_error { - struct fanotify_event_info_header hdr; - __s32 error; - __u32 error_count; -}; - struct fanotify_response { __s32 fd; __u32 response; @@ -200,8 +160,6 @@ struct fanotify_response { /* No fd set in event */ #define FAN_NOFD -1 -#define FAN_NOPIDFD FAN_NOFD -#define FAN_EPIDFD -2 /* Helper functions to deal with fanotify_event_metadata buffers */ #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata)) diff --git a/include/uapi/linux/nfs3.h b/include/uapi/linux/nfs3.h index c22ab77713bd..37e4b34e6b43 100644 --- a/include/uapi/linux/nfs3.h +++ b/include/uapi/linux/nfs3.h @@ -63,12 +63,6 @@ enum nfs3_ftype { NF3BAD = 8 }; -enum nfs3_time_how { - DONT_CHANGE = 0, - SET_TO_SERVER_TIME = 1, - SET_TO_CLIENT_TIME = 2, -}; - struct nfs3_fh { unsigned short size; unsigned char data[NFS3_FHSIZE]; diff --git a/include/uapi/linux/nfsd/nfsfh.h b/include/uapi/linux/nfsd/nfsfh.h new file mode 100644 index 000000000000..ff0ca88b1c8f --- /dev/null +++ b/include/uapi/linux/nfsd/nfsfh.h @@ -0,0 +1,105 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * This file describes the layout of the file handles as passed + * over the wire. + * + * Copyright (C) 1995, 1996, 1997 Olaf Kirch + */ + +#ifndef _UAPI_LINUX_NFSD_FH_H +#define _UAPI_LINUX_NFSD_FH_H + +#include +#include +#include +#include +#include + +/* + * This is the old "dentry style" Linux NFSv2 file handle. + * + * The xino and xdev fields are currently used to transport the + * ino/dev of the exported inode. + */ +struct nfs_fhbase_old { + __u32 fb_dcookie; /* dentry cookie - always 0xfeebbaca */ + __u32 fb_ino; /* our inode number */ + __u32 fb_dirino; /* dir inode number, 0 for directories */ + __u32 fb_dev; /* our device */ + __u32 fb_xdev; + __u32 fb_xino; + __u32 fb_generation; +}; + +/* + * This is the new flexible, extensible style NFSv2/v3/v4 file handle. + * by Neil Brown - March 2000 + * + * The file handle starts with a sequence of four-byte words. + * The first word contains a version number (1) and three descriptor bytes + * that tell how the remaining 3 variable length fields should be handled. + * These three bytes are auth_type, fsid_type and fileid_type. + * + * All four-byte values are in host-byte-order. + * + * The auth_type field is deprecated and must be set to 0. + * + * The fsid_type identifies how the filesystem (or export point) is + * encoded. + * Current values: + * 0 - 4 byte device id (ms-2-bytes major, ls-2-bytes minor), 4byte inode number + * NOTE: we cannot use the kdev_t device id value, because kdev_t.h + * says we mustn't. We must break it up and reassemble. + * 1 - 4 byte user specified identifier + * 2 - 4 byte major, 4 byte minor, 4 byte inode number - DEPRECATED + * 3 - 4 byte device id, encoded for user-space, 4 byte inode number + * 4 - 4 byte inode number and 4 byte uuid + * 5 - 8 byte uuid + * 6 - 16 byte uuid + * 7 - 8 byte inode number and 16 byte uuid + * + * The fileid_type identified how the file within the filesystem is encoded. + * The values for this field are filesystem specific, exccept that + * filesystems must not use the values '0' or '0xff'. 'See enum fid_type' + * in include/linux/exportfs.h for currently registered values. + */ +struct nfs_fhbase_new { + __u8 fb_version; /* == 1, even => nfs_fhbase_old */ + __u8 fb_auth_type; + __u8 fb_fsid_type; + __u8 fb_fileid_type; + __u32 fb_auth[1]; +/* __u32 fb_fsid[0]; floating */ +/* __u32 fb_fileid[0]; floating */ +}; + +struct knfsd_fh { + unsigned int fh_size; /* significant for NFSv3. + * Points to the current size while building + * a new file handle + */ + union { + struct nfs_fhbase_old fh_old; + __u32 fh_pad[NFS4_FHSIZE/4]; + struct nfs_fhbase_new fh_new; + } fh_base; +}; + +#define ofh_dcookie fh_base.fh_old.fb_dcookie +#define ofh_ino fh_base.fh_old.fb_ino +#define ofh_dirino fh_base.fh_old.fb_dirino +#define ofh_dev fh_base.fh_old.fb_dev +#define ofh_xdev fh_base.fh_old.fb_xdev +#define ofh_xino fh_base.fh_old.fb_xino +#define ofh_generation fh_base.fh_old.fb_generation + +#define fh_version fh_base.fh_new.fb_version +#define fh_fsid_type fh_base.fh_new.fb_fsid_type +#define fh_auth_type fh_base.fh_new.fb_auth_type +#define fh_fileid_type fh_base.fh_new.fb_fileid_type +#define fh_fsid fh_base.fh_new.fb_auth + +/* Do not use, provided for userspace compatiblity. */ +#define fh_auth fh_base.fh_new.fb_auth + +#endif /* _UAPI_LINUX_NFSD_FH_H */ diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index 691f90dd09d2..b2ebacd2f309 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -100,7 +100,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa audit_update_mark(audit_mark, dentry->d_inode); audit_mark->rule = krule; - ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, 0); + ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); if (ret < 0) { audit_mark->path = NULL; fsnotify_put_mark(&audit_mark->mark); @@ -161,7 +161,8 @@ static int audit_mark_handle_event(struct fsnotify_mark *inode_mark, u32 mask, audit_mark = container_of(inode_mark, struct audit_fsnotify_mark, mark); - if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group)) + if (WARN_ON_ONCE(inode_mark->group != audit_fsnotify_group) || + WARN_ON_ONCE(!inode)) return 0; if (mask & (FS_CREATE|FS_MOVED_TO|FS_DELETE|FS_MOVED_FROM)) { @@ -182,8 +183,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = { static int __init audit_fsnotify_init(void) { - audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops, - FSNOTIFY_GROUP_DUPS); + audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops); if (IS_ERR(audit_fsnotify_group)) { audit_fsnotify_group = NULL; audit_panic("cannot create audit fsnotify group"); diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 0c35879bbf7c..39241207ec04 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -1077,7 +1077,7 @@ static int __init audit_tree_init(void) audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC); - audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0); + audit_tree_group = fsnotify_alloc_group(&audit_tree_ops); if (IS_ERR(audit_tree_group)) audit_panic("cannot initialize fsnotify group for rectree watches"); diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 5cf22fe30149..edbeffee64b8 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -472,7 +472,8 @@ static int audit_watch_handle_event(struct fsnotify_mark *inode_mark, u32 mask, parent = container_of(inode_mark, struct audit_parent, mark); - if (WARN_ON_ONCE(inode_mark->group != audit_watch_group)) + if (WARN_ON_ONCE(inode_mark->group != audit_watch_group) || + WARN_ON_ONCE(!inode)) return 0; if (mask & (FS_CREATE|FS_MOVED_TO) && inode) @@ -492,7 +493,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = { static int __init audit_watch_init(void) { - audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0); + audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops); if (IS_ERR(audit_watch_group)) { audit_watch_group = NULL; audit_panic("cannot create audit fsnotify group"); diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 5966013bc788..6b14b4c4068c 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -507,7 +507,7 @@ static void *bpf_obj_do_get(const char __user *pathname, return ERR_PTR(ret); inode = d_backing_inode(path.dentry); - ret = path_permission(&path, ACC_MODE(flags)); + ret = inode_permission(inode, ACC_MODE(flags)); if (ret) goto out; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index f3b7856eadb6..2b3b1a687d36 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -3909,6 +3909,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, pid_t pid = attr->task_fd_query.pid; u32 fd = attr->task_fd_query.fd; const struct perf_event *event; + struct files_struct *files; struct task_struct *task; struct file *file; int err; @@ -3928,11 +3929,23 @@ static int bpf_task_fd_query(const union bpf_attr *attr, if (!task) return -ENOENT; - err = 0; - file = fget_task(task, fd); + files = get_files_struct(task); put_task_struct(task); + if (!files) + return -ENOENT; + + err = 0; + spin_lock(&files->file_lock); + file = fcheck_files(files, fd); if (!file) - return -EBADF; + err = -EBADF; + else + get_file(file); + spin_unlock(&files->file_lock); + put_files_struct(files); + + if (err) + goto out; if (file->f_op == &bpf_link_fops) { struct bpf_link *link = file->private_data; @@ -3972,6 +3985,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr, err = -ENOTSUPP; put_file: fput(file); +out: return err; } diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 762b4d7c3779..f3d3a562a802 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -185,7 +185,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info) for (; curr_fd < max_fds; curr_fd++) { struct file *f; - f = files_lookup_fd_rcu(curr_files, curr_fd); + f = fcheck_files(curr_files, curr_fd); if (!f) continue; if (!get_file_rcu(f)) diff --git a/kernel/fork.c b/kernel/fork.c index 9b2428865267..c47ad81c627c 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -3138,21 +3138,21 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) * the exec layer of the kernel. */ -int unshare_files(void) +int unshare_files(struct files_struct **displaced) { struct task_struct *task = current; - struct files_struct *old, *copy = NULL; + struct files_struct *copy = NULL; int error; error = unshare_fd(CLONE_FILES, NR_OPEN_MAX, ©); - if (error || !copy) + if (error || !copy) { + *displaced = NULL; return error; - - old = task->files; + } + *displaced = task->files; task_lock(task); task->files = copy; task_unlock(task); - put_files_struct(old); return 0; } diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 17d3a704bafa..96505113b907 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -200,11 +200,6 @@ unsigned long kallsyms_lookup_name(const char *name) return module_kallsyms_lookup_name(name); } -#ifdef CONFIG_LIVEPATCH -/* - * Iterate over all symbols in vmlinux. For symbols from modules use - * module_kallsyms_on_each_symbol instead. - */ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -220,9 +215,8 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, if (ret != 0) return ret; } - return 0; + return module_kallsyms_on_each_symbol(fn, data); } -#endif /* CONFIG_LIVEPATCH */ static unsigned long get_symbol_pos(unsigned long addr, unsigned long *symbolsize, diff --git a/kernel/kcmp.c b/kernel/kcmp.c index 5353edfad8e1..c0d2ad9b4705 100644 --- a/kernel/kcmp.c +++ b/kernel/kcmp.c @@ -61,11 +61,16 @@ static int kcmp_ptr(void *v1, void *v2, enum kcmp_type type) static struct file * get_file_raw_ptr(struct task_struct *task, unsigned int idx) { - struct file *file; + struct file *file = NULL; + task_lock(task); rcu_read_lock(); - file = task_lookup_fd_rcu(task, idx); + + if (task->files) + file = fcheck_files(task->files, idx); + rcu_read_unlock(); + task_unlock(task); return file; } @@ -102,6 +107,7 @@ static int kcmp_epoll_target(struct task_struct *task1, { struct file *filp, *filp_epoll, *filp_tgt; struct kcmp_epoll_slot slot; + struct files_struct *files; if (copy_from_user(&slot, uslot, sizeof(slot))) return -EFAULT; @@ -110,12 +116,23 @@ static int kcmp_epoll_target(struct task_struct *task1, if (!filp) return -EBADF; - filp_epoll = fget_task(task2, slot.efd); - if (!filp_epoll) + files = get_files_struct(task2); + if (!files) return -EBADF; - filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); - fput(filp_epoll); + spin_lock(&files->file_lock); + filp_epoll = fcheck_files(files, slot.efd); + if (filp_epoll) + get_file(filp_epoll); + else + filp_tgt = ERR_PTR(-EBADF); + spin_unlock(&files->file_lock); + put_files_struct(files); + + if (filp_epoll) { + filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff); + fput(filp_epoll); + } if (IS_ERR(filp_tgt)) return PTR_ERR(filp_tgt); diff --git a/kernel/kthread.c b/kernel/kthread.c index ec9f61995004..9d736f57b84f 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -262,21 +262,6 @@ void kthread_parkme(void) } EXPORT_SYMBOL_GPL(kthread_parkme); -/** - * kthread_exit - Cause the current kthread return @result to kthread_stop(). - * @result: The integer value to return to kthread_stop(). - * - * While kthread_exit can be called directly, it exists so that - * functions which do some additional work in non-modular code such as - * module_put_and_kthread_exit can be implemented. - * - * Does not return. - */ -void __noreturn kthread_exit(long result) -{ - do_exit(result); -} - static int kthread(void *_create) { /* Copy data: it's on kthread's stack */ @@ -294,13 +279,13 @@ static int kthread(void *_create) done = xchg(&create->done, NULL); if (!done) { kfree(create); - kthread_exit(-EINTR); + do_exit(-EINTR); } if (!self) { create->result = ERR_PTR(-ENOMEM); complete(done); - kthread_exit(-ENOMEM); + do_exit(-ENOMEM); } self->threadfn = threadfn; @@ -327,7 +312,7 @@ static int kthread(void *_create) __kthread_parkme(self); ret = threadfn(data); } - kthread_exit(ret); + do_exit(ret); } /* called from do_fork() to get node information for about to be created task */ @@ -637,7 +622,7 @@ EXPORT_SYMBOL_GPL(kthread_park); * instead of calling wake_up_process(): the thread will exit without * calling threadfn(). * - * If threadfn() may call kthread_exit() itself, the caller must ensure + * If threadfn() may call do_exit() itself, the caller must ensure * task_struct can't go away. * * Returns the result of threadfn(), or %-EINTR if wake_up_process() diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c index 147ed154ebc7..f5faf935c2d8 100644 --- a/kernel/livepatch/core.c +++ b/kernel/livepatch/core.c @@ -19,7 +19,6 @@ #include #include #include -#include #include #include "core.h" #include "patch.h" @@ -58,7 +57,7 @@ static void klp_find_object_module(struct klp_object *obj) if (!klp_is_module(obj)) return; - rcu_read_lock_sched(); + mutex_lock(&module_mutex); /* * We do not want to block removal of patched modules and therefore * we do not take a reference here. The patches are removed by @@ -75,7 +74,7 @@ static void klp_find_object_module(struct klp_object *obj) if (mod && mod->klp_alive) obj->mod = mod; - rcu_read_unlock_sched(); + mutex_unlock(&module_mutex); } static bool klp_initialized(void) @@ -164,10 +163,12 @@ static int klp_find_object_symbol(const char *objname, const char *name, .pos = sympos, }; + mutex_lock(&module_mutex); if (objname) module_kallsyms_on_each_symbol(klp_find_callback, &args); else kallsyms_on_each_symbol(klp_find_callback, &args); + mutex_unlock(&module_mutex); /* * Ensure an address was found. If sympos is 0, ensure symbol is unique; diff --git a/kernel/module.c b/kernel/module.c index 1ea89ae7c2cc..93fade94f108 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -259,6 +259,11 @@ static void mod_update_bounds(struct module *mod) struct list_head *kdb_modules = &modules; /* kdb needs the list of modules */ #endif /* CONFIG_KGDB_KDB */ +static void module_assert_mutex(void) +{ + lockdep_assert_held(&module_mutex); +} + static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP @@ -338,14 +343,14 @@ static inline void add_taint_module(struct module *mod, unsigned flag, /* * A thread that wants to hold a reference to a module only while it - * is running can call this to safely exit. + * is running can call this to safely exit. nfsd and lockd use this. */ -void __noreturn __module_put_and_kthread_exit(struct module *mod, long code) +void __noreturn __module_put_and_exit(struct module *mod, long code) { module_put(mod); - kthread_exit(code); + do_exit(code); } -EXPORT_SYMBOL(__module_put_and_kthread_exit); +EXPORT_SYMBOL(__module_put_and_exit); /* Find a module section: 0 means not found. */ static unsigned int find_sec(const struct load_info *info, const char *name) @@ -640,6 +645,7 @@ static struct module *find_module_all(const char *name, size_t len, struct module *find_module(const char *name) { + module_assert_mutex(); return find_module_all(name, strlen(name), false); } @@ -4489,7 +4495,6 @@ unsigned long module_kallsyms_lookup_name(const char *name) return ret; } -#ifdef CONFIG_LIVEPATCH int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data) @@ -4498,7 +4503,8 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, unsigned int i; int ret; - mutex_lock(&module_mutex); + module_assert_mutex(); + list_for_each_entry(mod, &modules, list) { /* We hold module_mutex: no need for rcu_dereference_sched */ struct mod_kallsyms *kallsyms = mod->kallsyms; @@ -4514,13 +4520,11 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, ret = fn(data, kallsyms_symbol_name(kallsyms, i), mod, kallsyms_symbol_value(sym)); if (ret != 0) - break; + return ret; } } - mutex_unlock(&module_mutex); - return ret; + return 0; } -#endif /* CONFIG_LIVEPATCH */ #endif /* CONFIG_KALLSYMS */ static void cfi_init(struct module *mod) diff --git a/kernel/pid.c b/kernel/pid.c index 15bbb9ddb2bf..48babb1dd3e1 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -551,21 +551,13 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) * Note, that this function can only be called after the fd table has * been unshared to avoid leaking the pidfd to the new process. * - * This symbol should not be explicitly exported to loadable modules. - * * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -int pidfd_create(struct pid *pid, unsigned int flags) +static int pidfd_create(struct pid *pid, unsigned int flags) { int fd; - if (!pid || !pid_has_task(pid, PIDTYPE_TGID)) - return -EINVAL; - - if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC)) - return -EINVAL; - fd = anon_inode_getfd("[pidfd]", &pidfd_fops, get_pid(pid), flags | O_RDWR | O_CLOEXEC); if (fd < 0) @@ -605,7 +597,10 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) if (!p) return -ESRCH; - fd = pidfd_create(p, flags); + if (pid_has_task(p, PIDTYPE_TGID)) + fd = pidfd_create(p, flags); + else + fd = -EINVAL; put_pid(p); return fd; diff --git a/kernel/sys.c b/kernel/sys.c index f268f24a87ec..1de01fab5788 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1883,7 +1883,7 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd) if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path)) goto exit; - err = file_permission(exe.file, MAY_EXEC); + err = inode_permission(inode, MAY_EXEC); if (err) goto exit; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index abd37b81e9d8..4deacde2e3ee 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -145,9 +145,6 @@ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ); #ifdef CONFIG_INOTIFY_USER #include #endif -#ifdef CONFIG_FANOTIFY -#include -#endif #ifdef CONFIG_PROC_SYSCTL @@ -549,21 +546,6 @@ static void proc_put_char(void **buf, size_t *size, char c) } } -static int do_proc_dobool_conv(bool *negp, unsigned long *lvalp, - int *valp, - int write, void *data) -{ - if (write) { - *(bool *)valp = *lvalp; - } else { - int val = *(bool *)valp; - - *lvalp = (unsigned long)val; - *negp = false; - } - return 0; -} - static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write, void *data) @@ -826,26 +808,6 @@ static int do_proc_douintvec(struct ctl_table *table, int write, buffer, lenp, ppos, conv, data); } -/** - * proc_dobool - read/write a bool - * @table: the sysctl table - * @write: %TRUE if this is a write to the sysctl file - * @buffer: the user buffer - * @lenp: the size of the user buffer - * @ppos: file position - * - * Reads/writes up to table->maxlen/sizeof(unsigned int) integer - * values from/to the user buffer, treated as an ASCII string. - * - * Returns 0 on success. - */ -int proc_dobool(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos) -{ - return do_proc_dointvec(table, write, buffer, lenp, ppos, - do_proc_dobool_conv, NULL); -} - /** * proc_dointvec - read a vector of integers * @table: the sysctl table @@ -1682,12 +1644,6 @@ int proc_dostring(struct ctl_table *table, int write, return -ENOSYS; } -int proc_dobool(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - return -ENOSYS; -} - int proc_dointvec(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -3394,14 +3350,7 @@ static struct ctl_table fs_table[] = { .mode = 0555, .child = inotify_table, }, -#endif -#ifdef CONFIG_FANOTIFY - { - .procname = "fanotify", - .mode = 0555, - .child = fanotify_table, - }, -#endif +#endif #ifdef CONFIG_EPOLL { .procname = "epoll", @@ -3564,7 +3513,6 @@ int __init sysctl_init(void) * No sense putting this after each symbol definition, twice, * exception granted :-) */ -EXPORT_SYMBOL(proc_dobool); EXPORT_SYMBOL(proc_dointvec); EXPORT_SYMBOL(proc_douintvec); EXPORT_SYMBOL(proc_dointvec_jiffies); diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 5453af26ff76..718357289899 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -124,9 +124,9 @@ static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk) if (!p) return true; *p = '\0'; - rcu_read_lock_sched(); + mutex_lock(&module_mutex); ret = !!find_module(tk->symbol); - rcu_read_unlock_sched(); + mutex_unlock(&module_mutex); *p = ':'; return ret; diff --git a/kernel/ucount.c b/kernel/ucount.c index 8d8874f1c35e..11b1596e2542 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -73,10 +73,6 @@ static struct ctl_table user_table[] = { #ifdef CONFIG_INOTIFY_USER UCOUNT_ENTRY("max_inotify_instances"), UCOUNT_ENTRY("max_inotify_watches"), -#endif -#ifdef CONFIG_FANOTIFY - UCOUNT_ENTRY("max_fanotify_groups"), - UCOUNT_ENTRY("max_fanotify_marks"), #endif { } }; diff --git a/mm/madvise.c b/mm/madvise.c index ddecb4434ccf..410f366b5df4 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -568,7 +568,7 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma) * opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - file_permission(vma->vm_file, MAY_WRITE) == 0; + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; } static long madvise_pageout(struct vm_area_struct *vma, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 67b098ec9453..c4077f277ffc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4954,7 +4954,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of, /* the process need read permission on control file */ /* AV: shouldn't we check that it's been opened for read instead? */ - ret = file_permission(cfile.file, MAY_READ); + ret = inode_permission(file_inode(cfile.file), MAY_READ); if (ret < 0) goto out_put_cfile; diff --git a/mm/mincore.c b/mm/mincore.c index 7bdb4673f776..02db1a834021 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -167,7 +167,7 @@ static inline bool can_do_mincore(struct vm_area_struct *vma) * mappings, which opens a side channel. */ return inode_owner_or_capable(file_inode(vma->vm_file)) || - file_permission(vma->vm_file, MAY_WRITE) == 0; + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; } static const struct mm_walk_ops mincore_walk_ops = { diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c index 09b6d825124e..43c284158f63 100644 --- a/net/bluetooth/bnep/core.c +++ b/net/bluetooth/bnep/core.c @@ -535,7 +535,7 @@ static int bnep_session(void *arg) up_write(&bnep_session_sem); free_netdev(dev); - module_put_and_kthread_exit(0); + module_put_and_exit(0); return 0; } diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c index 90d130588a3e..83eb84e8e688 100644 --- a/net/bluetooth/cmtp/core.c +++ b/net/bluetooth/cmtp/core.c @@ -323,7 +323,7 @@ static int cmtp_session(void *arg) up_write(&cmtp_session_sem); kfree(session); - module_put_and_kthread_exit(0); + module_put_and_exit(0); return 0; } diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 3ff870599eb7..b946a6379433 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -1305,7 +1305,7 @@ static int hidp_session_thread(void *arg) l2cap_unregister_user(session->conn, &session->user); hidp_session_put(session); - module_put_and_kthread_exit(0); + module_put_and_exit(0); return 0; } diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c index a857fc99431c..e265b8d38aa1 100644 --- a/net/sunrpc/auth_gss/gss_rpc_xdr.c +++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c @@ -800,7 +800,7 @@ int gssx_dec_accept_sec_context(struct rpc_rqst *rqstp, scratch = alloc_page(GFP_KERNEL); if (!scratch) return -ENOMEM; - xdr_set_scratch_page(xdr, scratch); + xdr_set_scratch_buffer(xdr, page_address(scratch), PAGE_SIZE); /* res->status */ err = gssx_dec_status(xdr, &res->status); diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 329eac782cc5..784c8b24f164 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -707,11 +707,11 @@ svc_safe_putnetobj(struct kvec *resv, struct xdr_netobj *o) /* * Verify the checksum on the header and return SVC_OK on success. * Otherwise, return SVC_DROP (in the case of a bad sequence number) - * or return SVC_DENIED and indicate error in rqstp->rq_auth_stat. + * or return SVC_DENIED and indicate error in authp. */ static int gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, - __be32 *rpcstart, struct rpc_gss_wire_cred *gc) + __be32 *rpcstart, struct rpc_gss_wire_cred *gc, __be32 *authp) { struct gss_ctx *ctx_id = rsci->mechctx; struct xdr_buf rpchdr; @@ -725,7 +725,7 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, iov.iov_len = (u8 *)argv->iov_base - (u8 *)rpcstart; xdr_buf_from_iov(&iov, &rpchdr); - rqstp->rq_auth_stat = rpc_autherr_badverf; + *authp = rpc_autherr_badverf; if (argv->iov_len < 4) return SVC_DENIED; flavor = svc_getnl(argv); @@ -737,13 +737,13 @@ gss_verify_header(struct svc_rqst *rqstp, struct rsc *rsci, if (rqstp->rq_deferred) /* skip verification of revisited request */ return SVC_OK; if (gss_verify_mic(ctx_id, &rpchdr, &checksum) != GSS_S_COMPLETE) { - rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; + *authp = rpcsec_gsserr_credproblem; return SVC_DENIED; } if (gc->gc_seq > MAXSEQ) { trace_rpcgss_svc_seqno_large(rqstp, gc->gc_seq); - rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; + *authp = rpcsec_gsserr_ctxproblem; return SVC_DENIED; } if (!gss_check_seq_num(rqstp, rsci, gc->gc_seq)) @@ -1038,8 +1038,6 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) struct rpc_gss_wire_cred *gc = &svcdata->clcred; int stat; - rqstp->rq_auth_stat = rpc_autherr_badcred; - /* * A gss export can be specified either by: * export *(sec=krb5,rw) @@ -1055,8 +1053,6 @@ svcauth_gss_set_client(struct svc_rqst *rqstp) stat = svcauth_unix_set_client(rqstp); if (stat == SVC_DROP || stat == SVC_CLOSE) return stat; - - rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } @@ -1140,7 +1136,7 @@ static void gss_free_in_token_pages(struct gssp_in_token *in_token) } static int gss_read_proxy_verf(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc, + struct rpc_gss_wire_cred *gc, __be32 *authp, struct xdr_netobj *in_handle, struct gssp_in_token *in_token) { @@ -1149,7 +1145,7 @@ static int gss_read_proxy_verf(struct svc_rqst *rqstp, int pages, i, res, pgto, pgfrom; size_t inlen, to_offs, from_offs; - res = gss_read_common_verf(gc, argv, &rqstp->rq_auth_stat, in_handle); + res = gss_read_common_verf(gc, argv, authp, in_handle); if (res) return res; @@ -1230,7 +1226,7 @@ gss_write_resv(struct kvec *resv, size_t size_limit, * Otherwise, drop the request pending an answer to the upcall. */ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc) + struct rpc_gss_wire_cred *gc, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1239,7 +1235,7 @@ static int svcauth_gss_legacy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id); memset(&rsikey, 0, sizeof(rsikey)); - ret = gss_read_verf(gc, argv, &rqstp->rq_auth_stat, + ret = gss_read_verf(gc, argv, authp, &rsikey.in_handle, &rsikey.in_token); if (ret) return ret; @@ -1342,7 +1338,7 @@ static int gss_proxy_save_rsc(struct cache_detail *cd, } static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, - struct rpc_gss_wire_cred *gc) + struct rpc_gss_wire_cred *gc, __be32 *authp) { struct kvec *resv = &rqstp->rq_res.head[0]; struct xdr_netobj cli_handle; @@ -1354,7 +1350,8 @@ static int svcauth_gss_proxy_init(struct svc_rqst *rqstp, struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); memset(&ud, 0, sizeof(ud)); - ret = gss_read_proxy_verf(rqstp, gc, &ud.in_handle, &ud.in_token); + ret = gss_read_proxy_verf(rqstp, gc, authp, + &ud.in_handle, &ud.in_token); if (ret) return ret; @@ -1527,7 +1524,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {} * response here and return SVC_COMPLETE. */ static int -svcauth_gss_accept(struct svc_rqst *rqstp) +svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -1540,7 +1537,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp) int ret; struct sunrpc_net *sn = net_generic(SVC_NET(rqstp), sunrpc_net_id); - rqstp->rq_auth_stat = rpc_autherr_badcred; + *authp = rpc_autherr_badcred; if (!svcdata) svcdata = kmalloc(sizeof(*svcdata), GFP_KERNEL); if (!svcdata) @@ -1577,22 +1574,22 @@ svcauth_gss_accept(struct svc_rqst *rqstp) if ((gc->gc_proc != RPC_GSS_PROC_DATA) && (rqstp->rq_proc != 0)) goto auth_err; - rqstp->rq_auth_stat = rpc_autherr_badverf; + *authp = rpc_autherr_badverf; switch (gc->gc_proc) { case RPC_GSS_PROC_INIT: case RPC_GSS_PROC_CONTINUE_INIT: if (use_gss_proxy(SVC_NET(rqstp))) - return svcauth_gss_proxy_init(rqstp, gc); + return svcauth_gss_proxy_init(rqstp, gc, authp); else - return svcauth_gss_legacy_init(rqstp, gc); + return svcauth_gss_legacy_init(rqstp, gc, authp); case RPC_GSS_PROC_DATA: case RPC_GSS_PROC_DESTROY: /* Look up the context, and check the verifier: */ - rqstp->rq_auth_stat = rpcsec_gsserr_credproblem; + *authp = rpcsec_gsserr_credproblem; rsci = gss_svc_searchbyctx(sn->rsc_cache, &gc->gc_ctx); if (!rsci) goto auth_err; - switch (gss_verify_header(rqstp, rsci, rpcstart, gc)) { + switch (gss_verify_header(rqstp, rsci, rpcstart, gc, authp)) { case SVC_OK: break; case SVC_DENIED: @@ -1602,7 +1599,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp) } break; default: - rqstp->rq_auth_stat = rpc_autherr_rejectedcred; + *authp = rpc_autherr_rejectedcred; goto auth_err; } @@ -1618,13 +1615,13 @@ svcauth_gss_accept(struct svc_rqst *rqstp) svc_putnl(resv, RPC_SUCCESS); goto complete; case RPC_GSS_PROC_DATA: - rqstp->rq_auth_stat = rpcsec_gsserr_ctxproblem; + *authp = rpcsec_gsserr_ctxproblem; svcdata->verf_start = resv->iov_base + resv->iov_len; if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq)) goto auth_err; rqstp->rq_cred = rsci->cred; get_group_info(rsci->cred.cr_group_info); - rqstp->rq_auth_stat = rpc_autherr_badcred; + *authp = rpc_autherr_badcred; switch (gc->gc_svc) { case RPC_GSS_SVC_NONE: break; diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index a4c9d410eb8d..a00890962e11 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -821,7 +821,6 @@ void rpc_exit_task(struct rpc_task *task) else if (task->tk_client) rpc_count_iostats(task, task->tk_client->cl_metrics); if (task->tk_ops->rpc_call_done != NULL) { - trace_rpc_task_call_done(task, task->tk_ops->rpc_call_done); task->tk_ops->rpc_call_done(task, task->tk_calldata); if (task->tk_action != NULL) { /* Always release the RPC slot and buffer memory */ diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 26d972c54a59..cfe8b911ca01 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -35,37 +35,18 @@ static void svc_unregister(const struct svc_serv *serv, struct net *net); -#define SVC_POOL_DEFAULT SVC_POOL_GLOBAL +#define svc_serv_is_pooled(serv) ((serv)->sv_ops->svo_function) -/* - * Mode for mapping cpus to pools. - */ -enum { - SVC_POOL_AUTO = -1, /* choose one of the others */ - SVC_POOL_GLOBAL, /* no mapping, just a single global pool - * (legacy & UP mode) */ - SVC_POOL_PERCPU, /* one pool per cpu */ - SVC_POOL_PERNODE /* one pool per numa node */ -}; +#define SVC_POOL_DEFAULT SVC_POOL_GLOBAL /* * Structure for mapping cpus to pools and vice versa. * Setup once during sunrpc initialisation. */ - -struct svc_pool_map { - int count; /* How many svc_servs use us */ - int mode; /* Note: int not enum to avoid - * warnings about "enumeration value - * not handled in switch" */ - unsigned int npools; - unsigned int *pool_to; /* maps pool id to cpu or node */ - unsigned int *to_pool; /* maps cpu or node to pool id */ -}; - -static struct svc_pool_map svc_pool_map = { +struct svc_pool_map svc_pool_map = { .mode = SVC_POOL_DEFAULT }; +EXPORT_SYMBOL_GPL(svc_pool_map); static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */ @@ -236,12 +217,10 @@ svc_pool_map_init_pernode(struct svc_pool_map *m) /* * Add a reference to the global map of cpus to pools (and - * vice versa) if pools are in use. - * Initialise the map if we're the first user. - * Returns the number of pools. If this is '1', no reference - * was taken. + * vice versa). Initialise the map if we're the first user. + * Returns the number of pools. */ -static unsigned int +unsigned int svc_pool_map_get(void) { struct svc_pool_map *m = &svc_pool_map; @@ -251,7 +230,6 @@ svc_pool_map_get(void) if (m->count++) { mutex_unlock(&svc_pool_map_mutex); - WARN_ON_ONCE(m->npools <= 1); return m->npools; } @@ -267,36 +245,30 @@ svc_pool_map_get(void) break; } - if (npools <= 0) { + if (npools < 0) { /* default, or memory allocation failure */ npools = 1; m->mode = SVC_POOL_GLOBAL; } m->npools = npools; - if (npools == 1) - /* service is unpooled, so doesn't hold a reference */ - m->count--; - mutex_unlock(&svc_pool_map_mutex); - return npools; + return m->npools; } +EXPORT_SYMBOL_GPL(svc_pool_map_get); /* - * Drop a reference to the global map of cpus to pools, if - * pools were in use, i.e. if npools > 1. + * Drop a reference to the global map of cpus to pools. * When the last reference is dropped, the map data is * freed; this allows the sysadmin to change the pool * mode using the pool_mode module option without * rebooting or re-loading sunrpc.ko. */ -static void -svc_pool_map_put(int npools) +void +svc_pool_map_put(void) { struct svc_pool_map *m = &svc_pool_map; - if (npools <= 1) - return; mutex_lock(&svc_pool_map_mutex); if (!--m->count) { @@ -309,6 +281,7 @@ svc_pool_map_put(int npools) mutex_unlock(&svc_pool_map_mutex); } +EXPORT_SYMBOL_GPL(svc_pool_map_put); static int svc_pool_map_get_node(unsigned int pidx) { @@ -365,18 +338,21 @@ svc_pool_for_cpu(struct svc_serv *serv, int cpu) struct svc_pool_map *m = &svc_pool_map; unsigned int pidx = 0; - if (serv->sv_nrpools <= 1) - return serv->sv_pools; - - switch (m->mode) { - case SVC_POOL_PERCPU: - pidx = m->to_pool[cpu]; - break; - case SVC_POOL_PERNODE: - pidx = m->to_pool[cpu_to_node(cpu)]; - break; + /* + * An uninitialised map happens in a pure client when + * lockd is brought up, so silently treat it the + * same as SVC_POOL_GLOBAL. + */ + if (svc_serv_is_pooled(serv)) { + switch (m->mode) { + case SVC_POOL_PERCPU: + pidx = m->to_pool[cpu]; + break; + case SVC_POOL_PERNODE: + pidx = m->to_pool[cpu_to_node(cpu)]; + break; + } } - return &serv->sv_pools[pidx % serv->sv_nrpools]; } @@ -446,7 +422,7 @@ __svc_init_bc(struct svc_serv *serv) */ static struct svc_serv * __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, - int (*threadfn)(void *data)) + const struct svc_serv_ops *ops) { struct svc_serv *serv; unsigned int vers; @@ -457,13 +433,13 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return NULL; serv->sv_name = prog->pg_name; serv->sv_program = prog; - kref_init(&serv->sv_refcnt); + serv->sv_nrthreads = 1; serv->sv_stats = prog->pg_stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; serv->sv_max_payload = bufsize? bufsize : 4096; serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); - serv->sv_threadfn = threadfn; + serv->sv_ops = ops; xdrsize = 0; while (prog) { prog->pg_lovers = prog->pg_nvers-1; @@ -509,56 +485,59 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools, return serv; } -/** - * svc_create - Create an RPC service - * @prog: the RPC program the new service will handle - * @bufsize: maximum message size for @prog - * @threadfn: a function to service RPC requests for @prog - * - * Returns an instantiated struct svc_serv object or NULL. - */ -struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, - int (*threadfn)(void *data)) +struct svc_serv * +svc_create(struct svc_program *prog, unsigned int bufsize, + const struct svc_serv_ops *ops) { - return __svc_create(prog, bufsize, 1, threadfn); + return __svc_create(prog, bufsize, /*npools*/1, ops); } EXPORT_SYMBOL_GPL(svc_create); -/** - * svc_create_pooled - Create an RPC service with pooled threads - * @prog: the RPC program the new service will handle - * @bufsize: maximum message size for @prog - * @threadfn: a function to service RPC requests for @prog - * - * Returns an instantiated struct svc_serv object or NULL. - */ -struct svc_serv *svc_create_pooled(struct svc_program *prog, - unsigned int bufsize, - int (*threadfn)(void *data)) +struct svc_serv * +svc_create_pooled(struct svc_program *prog, unsigned int bufsize, + const struct svc_serv_ops *ops) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); - serv = __svc_create(prog, bufsize, npools, threadfn); + serv = __svc_create(prog, bufsize, npools, ops); if (!serv) goto out_err; return serv; out_err: - svc_pool_map_put(npools); + svc_pool_map_put(); return NULL; } EXPORT_SYMBOL_GPL(svc_create_pooled); +void svc_shutdown_net(struct svc_serv *serv, struct net *net) +{ + svc_close_net(serv, net); + + if (serv->sv_ops->svo_shutdown) + serv->sv_ops->svo_shutdown(serv, net); +} +EXPORT_SYMBOL_GPL(svc_shutdown_net); + /* * Destroy an RPC service. Should be called with appropriate locking to - * protect sv_permsocks and sv_tempsocks. + * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. */ void -svc_destroy(struct kref *ref) +svc_destroy(struct svc_serv *serv) { - struct svc_serv *serv = container_of(ref, struct svc_serv, sv_refcnt); + dprintk("svc: svc_destroy(%s, %d)\n", + serv->sv_program->pg_name, + serv->sv_nrthreads); + + if (serv->sv_nrthreads) { + if (--(serv->sv_nrthreads) != 0) { + svc_sock_update_bufs(serv); + return; + } + } else + printk("svc_destroy: no threads for serv=%p!\n", serv); - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); del_timer_sync(&serv->sv_temptimer); /* @@ -570,7 +549,8 @@ svc_destroy(struct kref *ref) cache_clean_deferred(serv); - svc_pool_map_put(serv->sv_nrpools); + if (svc_serv_is_pooled(serv)) + svc_pool_map_put(); kfree(serv->sv_pools); kfree(serv); @@ -634,10 +614,6 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) rqstp->rq_server = serv; rqstp->rq_pool = pool; - rqstp->rq_scratch_page = alloc_pages_node(node, GFP_KERNEL, 0); - if (!rqstp->rq_scratch_page) - goto out_enomem; - rqstp->rq_argp = kmalloc_node(serv->sv_xdrsize, GFP_KERNEL, node); if (!rqstp->rq_argp) goto out_enomem; @@ -656,7 +632,7 @@ svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node) } EXPORT_SYMBOL_GPL(svc_rqst_alloc); -static struct svc_rqst * +struct svc_rqst * svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) { struct svc_rqst *rqstp; @@ -665,17 +641,14 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node) if (!rqstp) return ERR_PTR(-ENOMEM); - svc_get(serv); - spin_lock_bh(&serv->sv_lock); - serv->sv_nrthreads += 1; - spin_unlock_bh(&serv->sv_lock); - + serv->sv_nrthreads++; spin_lock_bh(&pool->sp_lock); pool->sp_nrthreads++; list_add_rcu(&rqstp->rq_all, &pool->sp_all_threads); spin_unlock_bh(&pool->sp_lock); return rqstp; } +EXPORT_SYMBOL_GPL(svc_prepare_thread); /* * Choose a pool in which to create a new thread, for svc_set_num_threads @@ -749,9 +722,11 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) if (IS_ERR(rqstp)) return PTR_ERR(rqstp); - task = kthread_create_on_node(serv->sv_threadfn, rqstp, + __module_get(serv->sv_ops->svo_module); + task = kthread_create_on_node(serv->sv_ops->svo_function, rqstp, node, "%s", serv->sv_name); if (IS_ERR(task)) { + module_put(serv->sv_ops->svo_module); svc_exit_thread(rqstp); return PTR_ERR(task); } @@ -767,13 +742,59 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) return 0; } + +/* destroy old threads */ +static int +svc_signal_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) +{ + struct task_struct *task; + unsigned int state = serv->sv_nrthreads-1; + + /* destroy old threads */ + do { + task = choose_victim(serv, pool, &state); + if (task == NULL) + break; + send_sig(SIGINT, task, 1); + nrservs++; + } while (nrservs < 0); + + return 0; +} + /* * Create or destroy enough new threads to make the number * of threads the given number. If `pool' is non-NULL, applies * only to threads in that pool, otherwise round-robins between * all pools. Caller must ensure that mutual exclusion between this and * server startup or shutdown. + * + * Destroying threads relies on the service threads filling in + * rqstp->rq_task, which only the nfs ones do. Assumes the serv + * has been created using svc_create_pooled(). + * + * Based on code that used to be in nfsd_svc() but tweaked + * to be pool-aware. */ +int +svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) +{ + if (pool == NULL) { + /* The -1 assumes caller has done a svc_get() */ + nrservs -= (serv->sv_nrthreads-1); + } else { + spin_lock_bh(&pool->sp_lock); + nrservs -= pool->sp_nrthreads; + spin_unlock_bh(&pool->sp_lock); + } + + if (nrservs > 0) + return svc_start_kthreads(serv, pool, nrservs); + if (nrservs < 0) + return svc_signal_kthreads(serv, pool, nrservs); + return 0; +} +EXPORT_SYMBOL_GPL(svc_set_num_threads); /* destroy old threads */ static int @@ -798,10 +819,11 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) } int -svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) +svc_set_num_threads_sync(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { if (pool == NULL) { - nrservs -= serv->sv_nrthreads; + /* The -1 assumes caller has done a svc_get() */ + nrservs -= (serv->sv_nrthreads-1); } else { spin_lock_bh(&pool->sp_lock); nrservs -= pool->sp_nrthreads; @@ -814,28 +836,7 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) return svc_stop_kthreads(serv, pool, nrservs); return 0; } -EXPORT_SYMBOL_GPL(svc_set_num_threads); - -/** - * svc_rqst_replace_page - Replace one page in rq_pages[] - * @rqstp: svc_rqst with pages to replace - * @page: replacement page - * - * When replacing a page in rq_pages, batch the release of the - * replaced pages to avoid hammering the page allocator. - */ -void svc_rqst_replace_page(struct svc_rqst *rqstp, struct page *page) -{ - if (*rqstp->rq_next_page) { - if (!pagevec_space(&rqstp->rq_pvec)) - __pagevec_release(&rqstp->rq_pvec); - pagevec_add(&rqstp->rq_pvec, *rqstp->rq_next_page); - } - - get_page(page); - *(rqstp->rq_next_page++) = page; -} -EXPORT_SYMBOL_GPL(svc_rqst_replace_page); +EXPORT_SYMBOL_GPL(svc_set_num_threads_sync); /* * Called from a server thread as it's exiting. Caller must hold the "service @@ -845,7 +846,6 @@ void svc_rqst_free(struct svc_rqst *rqstp) { svc_release_buffer(rqstp); - put_page(rqstp->rq_scratch_page); kfree(rqstp->rq_resp); kfree(rqstp->rq_argp); kfree(rqstp->rq_auth_data); @@ -865,14 +865,11 @@ svc_exit_thread(struct svc_rqst *rqstp) list_del_rcu(&rqstp->rq_all); spin_unlock_bh(&pool->sp_lock); - spin_lock_bh(&serv->sv_lock); - serv->sv_nrthreads -= 1; - spin_unlock_bh(&serv->sv_lock); - svc_sock_update_bufs(serv); - svc_rqst_free(rqstp); - svc_put(serv); + /* Release the server */ + if (serv) + svc_destroy(serv); } EXPORT_SYMBOL_GPL(svc_exit_thread); @@ -1164,6 +1161,22 @@ void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif +__be32 +svc_return_autherr(struct svc_rqst *rqstp, __be32 auth_err) +{ + set_bit(RQ_AUTHERR, &rqstp->rq_flags); + return auth_err; +} +EXPORT_SYMBOL_GPL(svc_return_autherr); + +static __be32 +svc_get_autherr(struct svc_rqst *rqstp, __be32 *statp) +{ + if (test_and_clear_bit(RQ_AUTHERR, &rqstp->rq_flags)) + return *statp; + return rpc_auth_ok; +} + static int svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) { @@ -1187,7 +1200,7 @@ svc_generic_dispatch(struct svc_rqst *rqstp, __be32 *statp) test_bit(RQ_DROPME, &rqstp->rq_flags)) return 0; - if (rqstp->rq_auth_stat != rpc_auth_ok) + if (test_bit(RQ_AUTHERR, &rqstp->rq_flags)) return 1; if (*statp != rpc_success) @@ -1237,7 +1250,7 @@ svc_generic_init_request(struct svc_rqst *rqstp, rqstp->rq_procinfo = procp = &versp->vs_proc[rqstp->rq_proc]; /* Initialize storage for argp and resp */ - memset(rqstp->rq_argp, 0, procp->pc_argzero); + memset(rqstp->rq_argp, 0, procp->pc_argsize); memset(rqstp->rq_resp, 0, procp->pc_ressize); /* Bump per-procedure stats counter */ @@ -1266,7 +1279,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) struct svc_process_info process; __be32 *statp; u32 prog, vers; - __be32 rpc_stat; + __be32 auth_stat, rpc_stat; int auth_res; __be32 *reply_statp; @@ -1309,12 +1322,14 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) * We do this before anything else in order to get a decent * auth verifier. */ - auth_res = svc_authenticate(rqstp); + auth_res = svc_authenticate(rqstp, &auth_stat); /* Also give the program a chance to reject this call: */ - if (auth_res == SVC_OK && progp) + if (auth_res == SVC_OK && progp) { + auth_stat = rpc_autherr_badcred; auth_res = progp->pg_authenticate(rqstp); + } if (auth_res != SVC_OK) - trace_svc_authenticate(rqstp, auth_res); + trace_svc_authenticate(rqstp, auth_res, auth_stat); switch (auth_res) { case SVC_OK: break; @@ -1373,15 +1388,15 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) goto release_dropit; if (*statp == rpc_garbage_args) goto err_garbage; + auth_stat = svc_get_autherr(rqstp, statp); + if (auth_stat != rpc_auth_ok) + goto err_release_bad_auth; } else { dprintk("svc: calling dispatcher\n"); if (!process.dispatch(rqstp, statp)) goto release_dropit; /* Release reply info */ } - if (rqstp->rq_auth_stat != rpc_auth_ok) - goto err_release_bad_auth; - /* Check RPC status result */ if (*statp != rpc_success) resv->iov_len = ((void*)statp) - resv->iov_base + 4; @@ -1410,7 +1425,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) svc_authorise(rqstp); close_xprt: if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) - svc_xprt_close(rqstp->rq_xprt); + svc_close_xprt(rqstp->rq_xprt); dprintk("svc: svc_process close\n"); return 0; @@ -1431,14 +1446,13 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) if (procp->pc_release) procp->pc_release(rqstp); err_bad_auth: - dprintk("svc: authentication failed (%d)\n", - be32_to_cpu(rqstp->rq_auth_stat)); + dprintk("svc: authentication failed (%d)\n", ntohl(auth_stat)); serv->sv_stats->rpcbadauth++; /* Restore write pointer to location of accept status: */ xdr_ressize_check(rqstp, reply_statp); svc_putnl(resv, 1); /* REJECT */ svc_putnl(resv, 1); /* AUTH_ERROR */ - svc_putu32(resv, rqstp->rq_auth_stat); /* status */ + svc_putnl(resv, ntohl(auth_stat)); /* status */ goto sendit; err_bad_prog: @@ -1612,7 +1626,7 @@ u32 svc_max_payload(const struct svc_rqst *rqstp) EXPORT_SYMBOL_GPL(svc_max_payload); /** - * svc_encode_result_payload - mark a range of bytes as a result payload + * svc_encode_read_payload - mark a range of bytes as a READ payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in rqstp->rq_res * @length: size of payload, in bytes @@ -1620,28 +1634,26 @@ EXPORT_SYMBOL_GPL(svc_max_payload); * Returns zero on success, or a negative errno if a permanent * error occurred. */ -int svc_encode_result_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_encode_read_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { - return rqstp->rq_xprt->xpt_ops->xpo_result_payload(rqstp, offset, - length); + return rqstp->rq_xprt->xpt_ops->xpo_read_payload(rqstp, offset, length); } -EXPORT_SYMBOL_GPL(svc_encode_result_payload); +EXPORT_SYMBOL_GPL(svc_encode_read_payload); /** * svc_fill_write_vector - Construct data argument for VFS write call * @rqstp: svc_rqst to operate on - * @payload: xdr_buf containing only the write data payload + * @pages: list of pages containing data payload + * @first: buffer containing first section of write payload + * @total: total number of bytes of write payload * * Fills in rqstp::rq_vec, and returns the number of elements. */ -unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, - struct xdr_buf *payload) +unsigned int svc_fill_write_vector(struct svc_rqst *rqstp, struct page **pages, + struct kvec *first, size_t total) { - struct page **pages = payload->pages; - struct kvec *first = payload->head; struct kvec *vec = rqstp->rq_vec; - size_t total = payload->len; unsigned int i; /* Some types of transport can present the write payload diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index d1eacf3358b8..06e503466c32 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -233,35 +233,30 @@ static struct svc_xprt *__svc_xpo_create(struct svc_xprt_class *xcl, return xprt; } -/** - * svc_xprt_received - start next receiver thread - * @xprt: controlling transport - * - * The caller must hold the XPT_BUSY bit and must +/* + * svc_xprt_received conditionally queues the transport for processing + * by another thread. The caller must hold the XPT_BUSY bit and must * not thereafter touch transport data. * * Note: XPT_DATA only gets cleared when a read-attempt finds no (or * insufficient) data. */ -void svc_xprt_received(struct svc_xprt *xprt) +static void svc_xprt_received(struct svc_xprt *xprt) { if (!test_bit(XPT_BUSY, &xprt->xpt_flags)) { WARN_ONCE(1, "xprt=0x%p already busy!", xprt); return; } - trace_svc_xprt_received(xprt); - /* As soon as we clear busy, the xprt could be closed and - * 'put', so we need a reference to call svc_xprt_enqueue with: + * 'put', so we need a reference to call svc_enqueue_xprt with: */ svc_xprt_get(xprt); smp_mb__before_atomic(); clear_bit(XPT_BUSY, &xprt->xpt_flags); - svc_xprt_enqueue(xprt); + xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); svc_xprt_put(xprt); } -EXPORT_SYMBOL_GPL(svc_xprt_received); void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) { @@ -272,7 +267,7 @@ void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *new) svc_xprt_received(new); } -static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name, +static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) @@ -308,35 +303,21 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name, return -EPROTONOSUPPORT; } -/** - * svc_xprt_create - Add a new listener to @serv - * @serv: target RPC service - * @xprt_name: transport class name - * @net: network namespace - * @family: network address family - * @port: listener port - * @flags: SVC_SOCK flags - * @cred: credential to bind to this transport - * - * Return values: - * %0: New listener added successfully - * %-EPROTONOSUPPORT: Requested transport type not supported - */ -int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, +int svc_create_xprt(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred) { int err; - err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); + err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); if (err == -EPROTONOSUPPORT) { request_module("svc%s", xprt_name); - err = _svc_xprt_create(serv, xprt_name, net, family, port, flags, cred); + err = _svc_create_xprt(serv, xprt_name, net, family, port, flags, cred); } return err; } -EXPORT_SYMBOL_GPL(svc_xprt_create); +EXPORT_SYMBOL_GPL(svc_create_xprt); /* * Copy the local and remote xprt addresses to the rqstp structure @@ -412,8 +393,6 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) smp_rmb(); xpt_flags = READ_ONCE(xprt->xpt_flags); - if (xpt_flags & BIT(XPT_BUSY)) - return false; if (xpt_flags & (BIT(XPT_CONN) | BIT(XPT_CLOSE))) return true; if (xpt_flags & (BIT(XPT_DATA) | BIT(XPT_DEFERRED))) { @@ -426,12 +405,7 @@ static bool svc_xprt_ready(struct svc_xprt *xprt) return false; } -/** - * svc_xprt_enqueue - Queue a transport on an idle nfsd thread - * @xprt: transport with data pending - * - */ -void svc_xprt_enqueue(struct svc_xprt *xprt) +void svc_xprt_do_enqueue(struct svc_xprt *xprt) { struct svc_pool *pool; struct svc_rqst *rqstp = NULL; @@ -475,6 +449,19 @@ void svc_xprt_enqueue(struct svc_xprt *xprt) put_cpu(); trace_svc_xprt_do_enqueue(xprt, rqstp); } +EXPORT_SYMBOL_GPL(svc_xprt_do_enqueue); + +/* + * Queue up a transport with data pending. If there are idle nfsd + * processes, wake 'em up. + * + */ +void svc_xprt_enqueue(struct svc_xprt *xprt) +{ + if (test_bit(XPT_BUSY, &xprt->xpt_flags)) + return; + xprt->xpt_server->sv_ops->svo_enqueue_xprt(xprt); +} EXPORT_SYMBOL_GPL(svc_xprt_enqueue); /* @@ -533,7 +520,6 @@ static void svc_xprt_release(struct svc_rqst *rqstp) kfree(rqstp->rq_deferred); rqstp->rq_deferred = NULL; - pagevec_release(&rqstp->rq_pvec); svc_free_res_pages(rqstp); rqstp->rq_res.page_len = 0; rqstp->rq_res.page_base = 0; @@ -660,8 +646,6 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) int pages; int i; - pagevec_init(&rqstp->rq_pvec); - /* now allocate needed pages. If we get a failure, sleep briefly */ pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT; if (pages > RPCSVC_MAXPAGES) { @@ -674,13 +658,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) while (rqstp->rq_pages[i] == NULL) { struct page *p = alloc_page(GFP_KERNEL); if (!p) { - set_current_state(TASK_IDLE); - if (kthread_should_stop()) { + set_current_state(TASK_INTERRUPTIBLE); + if (signalled() || kthread_should_stop()) { set_current_state(TASK_RUNNING); return -EINTR; } + schedule_timeout(msecs_to_jiffies(500)); } - freezable_schedule_timeout(msecs_to_jiffies(500)); rqstp->rq_pages[i] = p; } rqstp->rq_page_end = &rqstp->rq_pages[i]; @@ -713,7 +697,7 @@ rqst_should_sleep(struct svc_rqst *rqstp) return false; /* are we shutting down? */ - if (kthread_should_stop()) + if (signalled() || kthread_should_stop()) return false; /* are we freezing? */ @@ -735,14 +719,18 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (rqstp->rq_xprt) goto out_found; - set_current_state(TASK_IDLE); + /* + * We have to be able to interrupt this wait + * to bring down the daemons ... + */ + set_current_state(TASK_INTERRUPTIBLE); smp_mb__before_atomic(); clear_bit(SP_CONGESTED, &pool->sp_flags); clear_bit(RQ_BUSY, &rqstp->rq_flags); smp_mb__after_atomic(); if (likely(rqst_should_sleep(rqstp))) - time_left = freezable_schedule_timeout(timeout); + time_left = schedule_timeout(timeout); else __set_current_state(TASK_RUNNING); @@ -757,7 +745,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout) if (!time_left) atomic_long_inc(&pool->sp_stats.threads_timedout); - if (kthread_should_stop()) + if (signalled() || kthread_should_stop()) return ERR_PTR(-EINTR); return ERR_PTR(-EAGAIN); out_found: @@ -856,7 +844,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout) try_to_freeze(); cond_resched(); err = -EINTR; - if (kthread_should_stop()) + if (signalled() || kthread_should_stop()) goto out; xprt = svc_get_next_xprt(rqstp, timeout); @@ -1052,12 +1040,7 @@ static void svc_delete_xprt(struct svc_xprt *xprt) svc_xprt_put(xprt); } -/** - * svc_xprt_close - Close a client connection - * @xprt: transport to disconnect - * - */ -void svc_xprt_close(struct svc_xprt *xprt) +void svc_close_xprt(struct svc_xprt *xprt) { trace_svc_xprt_close(xprt); set_bit(XPT_CLOSE, &xprt->xpt_flags); @@ -1072,7 +1055,7 @@ void svc_xprt_close(struct svc_xprt *xprt) */ svc_delete_xprt(xprt); } -EXPORT_SYMBOL_GPL(svc_xprt_close); +EXPORT_SYMBOL_GPL(svc_close_xprt); static int svc_close_list(struct svc_serv *serv, struct list_head *xprt_list, struct net *net) { @@ -1124,11 +1107,7 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) } } -/** - * svc_xprt_destroy_all - Destroy transports associated with @serv - * @serv: RPC service to be shut down - * @net: target network namespace - * +/* * Server threads may still be running (especially in the case where the * service is still running in other network namespaces). * @@ -1140,7 +1119,7 @@ static void svc_clean_up_xprts(struct svc_serv *serv, struct net *net) * threads, we may need to wait a little while and then check again to * see if they're done. */ -void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net) +void svc_close_net(struct svc_serv *serv, struct net *net) { int delay = 0; @@ -1151,7 +1130,6 @@ void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net) msleep(delay++); } } -EXPORT_SYMBOL_GPL(svc_xprt_destroy_all); /* * Handle defer and revisit of requests diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c index 5a8b8e03fdd4..998b196b6176 100644 --- a/net/sunrpc/svcauth.c +++ b/net/sunrpc/svcauth.c @@ -59,12 +59,12 @@ svc_put_auth_ops(struct auth_ops *aops) } int -svc_authenticate(struct svc_rqst *rqstp) +svc_authenticate(struct svc_rqst *rqstp, __be32 *authp) { rpc_authflavor_t flavor; struct auth_ops *aops; - rqstp->rq_auth_stat = rpc_auth_ok; + *authp = rpc_auth_ok; flavor = svc_getnl(&rqstp->rq_arg.head[0]); @@ -72,7 +72,7 @@ svc_authenticate(struct svc_rqst *rqstp) aops = svc_get_auth_ops(flavor); if (aops == NULL) { - rqstp->rq_auth_stat = rpc_autherr_badcred; + *authp = rpc_autherr_badcred; return SVC_DENIED; } @@ -80,7 +80,7 @@ svc_authenticate(struct svc_rqst *rqstp) init_svc_cred(&rqstp->rq_cred); rqstp->rq_authop = aops; - return aops->accept(rqstp); + return aops->accept(rqstp, authp); } EXPORT_SYMBOL_GPL(svc_authenticate); diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 1868596259af..60754a292589 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -699,9 +699,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) rqstp->rq_client = NULL; if (rqstp->rq_proc == 0) - goto out; + return SVC_OK; - rqstp->rq_auth_stat = rpc_autherr_badcred; ipm = ip_map_cached_get(xprt); if (ipm == NULL) ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class, @@ -738,16 +737,13 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) put_group_info(cred->cr_group_info); cred->cr_group_info = gi; } - -out: - rqstp->rq_auth_stat = rpc_auth_ok; return SVC_OK; } EXPORT_SYMBOL_GPL(svcauth_unix_set_client); static int -svcauth_null_accept(struct svc_rqst *rqstp) +svcauth_null_accept(struct svc_rqst *rqstp, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -758,12 +754,12 @@ svcauth_null_accept(struct svc_rqst *rqstp) if (svc_getu32(argv) != 0) { dprintk("svc: bad null cred\n"); - rqstp->rq_auth_stat = rpc_autherr_badcred; + *authp = rpc_autherr_badcred; return SVC_DENIED; } if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { dprintk("svc: bad null verf\n"); - rqstp->rq_auth_stat = rpc_autherr_badverf; + *authp = rpc_autherr_badverf; return SVC_DENIED; } @@ -807,7 +803,7 @@ struct auth_ops svcauth_null = { static int -svcauth_unix_accept(struct svc_rqst *rqstp) +svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) { struct kvec *argv = &rqstp->rq_arg.head[0]; struct kvec *resv = &rqstp->rq_res.head[0]; @@ -849,7 +845,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp) } groups_sort(cred->cr_group_info); if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { - rqstp->rq_auth_stat = rpc_autherr_badverf; + *authp = rpc_autherr_badverf; return SVC_DENIED; } @@ -861,7 +857,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp) return SVC_OK; badcred: - rqstp->rq_auth_stat = rpc_autherr_badcred; + *authp = rpc_autherr_badcred; return SVC_DENIED; } diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index cb0cfcd8a814..3d5ee042c501 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -181,8 +181,8 @@ static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh) } } -static int svc_sock_result_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +static int svc_sock_read_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { return 0; } @@ -635,7 +635,7 @@ static const struct svc_xprt_ops svc_udp_ops = { .xpo_create = svc_udp_create, .xpo_recvfrom = svc_udp_recvfrom, .xpo_sendto = svc_udp_sendto, - .xpo_result_payload = svc_sock_result_payload, + .xpo_read_payload = svc_sock_read_payload, .xpo_release_rqst = svc_udp_release_rqst, .xpo_detach = svc_sock_detach, .xpo_free = svc_sock_free, @@ -1209,7 +1209,7 @@ static const struct svc_xprt_ops svc_tcp_ops = { .xpo_create = svc_tcp_create, .xpo_recvfrom = svc_tcp_recvfrom, .xpo_sendto = svc_tcp_sendto, - .xpo_result_payload = svc_sock_result_payload, + .xpo_read_payload = svc_sock_read_payload, .xpo_release_rqst = svc_tcp_release_rqst, .xpo_detach = svc_tcp_sock_detach, .xpo_free = svc_sock_free, @@ -1342,10 +1342,25 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, return svsk; } +bool svc_alien_sock(struct net *net, int fd) +{ + int err; + struct socket *sock = sockfd_lookup(fd, &err); + bool ret = false; + + if (!sock) + goto out; + if (sock_net(sock->sk) != net) + ret = true; + sockfd_put(sock); +out: + return ret; +} +EXPORT_SYMBOL_GPL(svc_alien_sock); + /** * svc_addsock - add a listener socket to an RPC service * @serv: pointer to RPC service to which to add a new listener - * @net: caller's network namespace * @fd: file descriptor of the new listener * @name_return: pointer to buffer to fill in with name of listener * @len: size of the buffer @@ -1355,8 +1370,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv, * Name is terminated with '\n'. On error, returns a negative errno * value. */ -int svc_addsock(struct svc_serv *serv, struct net *net, const int fd, - char *name_return, const size_t len, const struct cred *cred) +int svc_addsock(struct svc_serv *serv, const int fd, char *name_return, + const size_t len, const struct cred *cred) { int err = 0; struct socket *so = sockfd_lookup(fd, &err); @@ -1367,9 +1382,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net, const int fd, if (!so) return err; - err = -EINVAL; - if (sock_net(so->sk) != net) - goto out; err = -EAFNOSUPPORT; if ((so->sk->sk_family != PF_INET) && (so->sk->sk_family != PF_INET6)) goto out; diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index e2bd0cd39114..d84bb5037bb5 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -669,7 +669,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct kvec *iov = buf->head; int scratch_len = buf->buflen - buf->page_len - buf->tail[0].iov_len; - xdr_reset_scratch_buffer(xdr); + xdr_set_scratch_buffer(xdr, NULL, 0); BUG_ON(scratch_len < 0); xdr->buf = buf; xdr->iov = iov; @@ -691,29 +691,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, EXPORT_SYMBOL_GPL(xdr_init_encode); /** - * xdr_init_encode_pages - Initialize an xdr_stream for encoding into pages - * @xdr: pointer to xdr_stream struct - * @buf: pointer to XDR buffer into which to encode data - * @pages: list of pages to decode into - * @rqst: pointer to controlling rpc_rqst, for debugging - * - */ -void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, - struct page **pages, struct rpc_rqst *rqst) -{ - xdr_reset_scratch_buffer(xdr); - - xdr->buf = buf; - xdr->page_ptr = pages; - xdr->iov = NULL; - xdr->p = page_address(*pages); - xdr->end = (void *)xdr->p + min_t(u32, buf->buflen, PAGE_SIZE); - xdr->rqst = rqst; -} -EXPORT_SYMBOL_GPL(xdr_init_encode_pages); - -/** - * __xdr_commit_encode - Ensure all data is written to buffer + * xdr_commit_encode - Ensure all data is written to buffer * @xdr: pointer to xdr_stream * * We handle encoding across page boundaries by giving the caller a @@ -725,25 +703,22 @@ EXPORT_SYMBOL_GPL(xdr_init_encode_pages); * required at the end of encoding, or any other time when the xdr_buf * data might be read. */ -void __xdr_commit_encode(struct xdr_stream *xdr) +inline void xdr_commit_encode(struct xdr_stream *xdr) { int shift = xdr->scratch.iov_len; void *page; + if (shift == 0) + return; page = page_address(*xdr->page_ptr); memcpy(xdr->scratch.iov_base, page, shift); memmove(page, page + shift, (void *)xdr->p - page); - xdr_reset_scratch_buffer(xdr); + xdr->scratch.iov_len = 0; } -EXPORT_SYMBOL_GPL(__xdr_commit_encode); +EXPORT_SYMBOL_GPL(xdr_commit_encode); -/* - * The buffer space to be reserved crosses the boundary between - * xdr->buf->head and xdr->buf->pages, or between two pages - * in xdr->buf->pages. - */ -static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, - size_t nbytes) +static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, + size_t nbytes) { __be32 *p; int space_left; @@ -768,7 +743,8 @@ static noinline __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, * the "scratch" iov to track any temporarily unused fragment of * space at the end of the previous buffer: */ - xdr_set_scratch_buffer(xdr, xdr->p, frag1bytes); + xdr->scratch.iov_base = xdr->p; + xdr->scratch.iov_len = frag1bytes; p = page_address(*xdr->page_ptr); /* * Note this is where the next encode will start after we've @@ -1080,7 +1056,8 @@ void xdr_init_decode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst) { xdr->buf = buf; - xdr_reset_scratch_buffer(xdr); + xdr->scratch.iov_base = NULL; + xdr->scratch.iov_len = 0; xdr->nwords = XDR_QUADLEN(buf->len); if (buf->head[0].iov_len != 0) xdr_set_iov(xdr, buf->head, buf->len); @@ -1128,6 +1105,24 @@ static __be32 * __xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes) return p; } +/** + * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. + * @xdr: pointer to xdr_stream struct + * @buf: pointer to an empty buffer + * @buflen: size of 'buf' + * + * The scratch buffer is used when decoding from an array of pages. + * If an xdr_inline_decode() call spans across page boundaries, then + * we copy the data into the scratch buffer in order to allow linear + * access. + */ +void xdr_set_scratch_buffer(struct xdr_stream *xdr, void *buf, size_t buflen) +{ + xdr->scratch.iov_base = buf; + xdr->scratch.iov_len = buflen; +} +EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer); + static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes) { __be32 *p; @@ -1437,51 +1432,6 @@ xdr_buf_subsegment(struct xdr_buf *buf, struct xdr_buf *subbuf, } EXPORT_SYMBOL_GPL(xdr_buf_subsegment); -/** - * xdr_stream_subsegment - set @subbuf to a portion of @xdr - * @xdr: an xdr_stream set up for decoding - * @subbuf: the result buffer - * @nbytes: length of @xdr to extract, in bytes - * - * Sets up @subbuf to represent a portion of @xdr. The portion - * starts at the current offset in @xdr, and extends for a length - * of @nbytes. If this is successful, @xdr is advanced to the next - * position following that portion. - * - * Return values: - * %true: @subbuf has been initialized, and @xdr has been advanced. - * %false: a bounds error has occurred - */ -bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, - unsigned int nbytes) -{ - unsigned int remaining, offset, len; - - if (xdr_buf_subsegment(xdr->buf, subbuf, xdr_stream_pos(xdr), nbytes)) - return false; - - if (subbuf->head[0].iov_len) - if (!__xdr_inline_decode(xdr, subbuf->head[0].iov_len)) - return false; - - remaining = subbuf->page_len; - offset = subbuf->page_base; - while (remaining) { - len = min_t(unsigned int, remaining, PAGE_SIZE) - offset; - - if (xdr->p == xdr->end && !xdr_set_next_buffer(xdr)) - return false; - if (!__xdr_inline_decode(xdr, len)) - return false; - - remaining -= len; - offset = 0; - } - - return true; -} -EXPORT_SYMBOL_GPL(xdr_stream_subsegment); - /** * xdr_buf_trim - lop at most "len" bytes off the end of "buf" * @buf: buf to be trimmed diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c index feac8c26fb87..c5154bc38e12 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c @@ -186,7 +186,7 @@ static int xprt_rdma_bc_send_request(struct rpc_rqst *rqst) ret = rpcrdma_bc_send_request(rdma, rqst); if (ret == -ENOTCONN) - svc_xprt_close(sxprt); + svc_close_xprt(sxprt); return ret; } diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c index d6436c13d5c4..c3d588b149aa 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c @@ -448,6 +448,7 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, * svc_rdma_encode_write_list - Encode RPC Reply's Write chunk list * @rctxt: Reply context with information about the RPC Call * @sctxt: Send context for the RPC Reply + * @length: size in bytes of the payload in the first Write chunk * * The client provides a Write chunk list in the Call message. Fill * in the segments in the first Write chunk in the Reply's transport @@ -464,12 +465,12 @@ static ssize_t svc_rdma_encode_write_chunk(__be32 *src, */ static ssize_t svc_rdma_encode_write_list(const struct svc_rdma_recv_ctxt *rctxt, - struct svc_rdma_send_ctxt *sctxt) + struct svc_rdma_send_ctxt *sctxt, + unsigned int length) { ssize_t len, ret; - ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, - rctxt->rc_read_payload_length); + ret = svc_rdma_encode_write_chunk(rctxt->rc_write_list, sctxt, length); if (ret < 0) return ret; len = ret; @@ -922,12 +923,21 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) goto err0; if (wr_lst) { /* XXX: Presume the client sent only one Write chunk */ - ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, - rctxt->rc_read_payload_offset, - rctxt->rc_read_payload_length); + unsigned long offset; + unsigned int length; + + if (rctxt->rc_read_payload_length) { + offset = rctxt->rc_read_payload_offset; + length = rctxt->rc_read_payload_length; + } else { + offset = xdr->head[0].iov_len; + length = xdr->page_len; + } + ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr, offset, + length); if (ret < 0) goto err2; - if (svc_rdma_encode_write_list(rctxt, sctxt) < 0) + if (svc_rdma_encode_write_list(rctxt, sctxt, length) < 0) goto err0; } else { if (xdr_stream_encode_item_absent(&sctxt->sc_stream) < 0) @@ -969,19 +979,19 @@ int svc_rdma_sendto(struct svc_rqst *rqstp) } /** - * svc_rdma_result_payload - special processing for a result payload + * svc_rdma_read_payload - special processing for a READ payload * @rqstp: svc_rqst to operate on * @offset: payload's byte offset in @xdr * @length: size of payload, in bytes * * Returns zero on success. * - * For the moment, just record the xdr_buf location of the result + * For the moment, just record the xdr_buf location of the READ * payload. svc_rdma_sendto will use that location later when * we actually send the payload. */ -int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset, - unsigned int length) +int svc_rdma_read_payload(struct svc_rqst *rqstp, unsigned int offset, + unsigned int length) { struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index c895f80df659..5f7e3d12523f 100644 --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -80,7 +80,7 @@ static const struct svc_xprt_ops svc_rdma_ops = { .xpo_create = svc_rdma_create, .xpo_recvfrom = svc_rdma_recvfrom, .xpo_sendto = svc_rdma_sendto, - .xpo_result_payload = svc_rdma_result_payload, + .xpo_read_payload = svc_rdma_read_payload, .xpo_release_rqst = svc_rdma_release_rqst, .xpo_detach = svc_rdma_detach, .xpo_free = svc_rdma_free, diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 61ae7acea796..dd57a411adf6 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -959,7 +959,7 @@ static struct sock *unix_find_other(struct net *net, if (err) goto fail; inode = d_backing_inode(path.dentry); - err = path_permission(&path, MAY_WRITE); + err = inode_permission(inode, MAY_WRITE); if (err) goto put_fail; diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 78a48bd0f2b9..08249f7a09d6 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -168,9 +168,8 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, "panic", "do_exit", "do_task_dead", - "kthread_exit", "make_task_dead", - "__module_put_and_kthread_exit", + "__module_put_and_exit", "complete_and_exit", "__reiserfs_panic", "lbug_with_loc", From a297eae7e6b3d6ea627b1172e463315f4a048db2 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 17 Jul 2024 15:09:06 +0000 Subject: [PATCH 1368/1565] ANDROID: fix build error in ksz9477.c In commit 65a9383389db ("net: dsa: microchip: fix initial port flush problem"), a new build warning is added which breaks the Android builds as we are treating all warnings as errors now. Fix this up by removing the offending variable that is no longer used to resolve the build. Fixes: 65a9383389db ("net: dsa: microchip: fix initial port flush problem") Change-Id: I177a0fd1a17b7099e7a284e627c15edfc181c166 Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/microchip/ksz9477.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c index 535b64155320..4d0c2be652ae 100644 --- a/drivers/net/dsa/microchip/ksz9477.c +++ b/drivers/net/dsa/microchip/ksz9477.c @@ -195,7 +195,6 @@ static int ksz9477_wait_alu_sta_ready(struct ksz_device *dev) static int ksz9477_reset_switch(struct ksz_device *dev) { - u8 data8; u32 data32; /* reset switch */ From 94ffdde326f507b1d71b42e0b6f92abeeefa8cd2 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 5 Feb 2024 16:48:43 +0100 Subject: [PATCH 1369/1565] Compiler Attributes: Add __uninitialized macro commit fd7eea27a3aed79b63b1726c00bde0d50cf207e2 upstream. With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled the kernel will be compiled with -ftrivial-auto-var-init=<...> which causes initialization of stack variables at function entry time. In order to avoid the performance impact that comes with this users can use the "uninitialized" attribute to prevent such initialization. Therefore provide the __uninitialized macro which can be used for cases where INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled, but only selected variables should not be initialized. Acked-by: Kees Cook Reviewed-by: Nathan Chancellor Link: https://lore.kernel.org/r/20240205154844.3757121-2-hca@linux.ibm.com Signed-off-by: Heiko Carstens Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler_attributes.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 08eb06301791..3982c2dc541e 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -269,6 +269,18 @@ */ #define __section(section) __attribute__((__section__(section))) +/* + * Optional: only supported since gcc >= 12 + * + * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-uninitialized-variable-attribute + * clang: https://clang.llvm.org/docs/AttributeReference.html#uninitialized + */ +#if __has_attribute(__uninitialized__) +# define __uninitialized __attribute__((__uninitialized__)) +#else +# define __uninitialized +#endif + /* * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-unused-function-attribute * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-unused-type-attribute From 25d0d9b83d855cbc5d5aa5ae3cd79d55ea0c84a8 Mon Sep 17 00:00:00 2001 From: Erico Nunes Date: Tue, 2 Apr 2024 00:43:28 +0200 Subject: [PATCH 1370/1565] drm/lima: fix shared irq handling on driver remove [ Upstream commit a6683c690bbfd1f371510cb051e8fa49507f3f5e ] lima uses a shared interrupt, so the interrupt handlers must be prepared to be called at any time. At driver removal time, the clocks are disabled early and the interrupts stay registered until the very end of the remove process due to the devm usage. This is potentially a bug as the interrupts access device registers which assumes clocks are enabled. A crash can be triggered by removing the driver in a kernel with CONFIG_DEBUG_SHIRQ enabled. This patch frees the interrupts at each lima device finishing callback so that the handlers are already unregistered by the time we fully disable clocks. Signed-off-by: Erico Nunes Signed-off-by: Qiang Yu Link: https://patchwork.freedesktop.org/patch/msgid/20240401224329.1228468-2-nunes.erico@gmail.com Signed-off-by: Sasha Levin --- drivers/gpu/drm/lima/lima_gp.c | 2 ++ drivers/gpu/drm/lima/lima_mmu.c | 5 +++++ drivers/gpu/drm/lima/lima_pp.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/drivers/gpu/drm/lima/lima_gp.c b/drivers/gpu/drm/lima/lima_gp.c index 6cf46b653e81..ca3842f71984 100644 --- a/drivers/gpu/drm/lima/lima_gp.c +++ b/drivers/gpu/drm/lima/lima_gp.c @@ -324,7 +324,9 @@ int lima_gp_init(struct lima_ip *ip) void lima_gp_fini(struct lima_ip *ip) { + struct lima_device *dev = ip->dev; + devm_free_irq(dev->dev, ip->irq, ip); } int lima_gp_pipe_init(struct lima_device *dev) diff --git a/drivers/gpu/drm/lima/lima_mmu.c b/drivers/gpu/drm/lima/lima_mmu.c index a1ae6c252dc2..8ca7047adbac 100644 --- a/drivers/gpu/drm/lima/lima_mmu.c +++ b/drivers/gpu/drm/lima/lima_mmu.c @@ -118,7 +118,12 @@ int lima_mmu_init(struct lima_ip *ip) void lima_mmu_fini(struct lima_ip *ip) { + struct lima_device *dev = ip->dev; + if (ip->id == lima_ip_ppmmu_bcast) + return; + + devm_free_irq(dev->dev, ip->irq, ip); } void lima_mmu_flush_tlb(struct lima_ip *ip) diff --git a/drivers/gpu/drm/lima/lima_pp.c b/drivers/gpu/drm/lima/lima_pp.c index 54b208a4a768..d34c9e8840f4 100644 --- a/drivers/gpu/drm/lima/lima_pp.c +++ b/drivers/gpu/drm/lima/lima_pp.c @@ -266,7 +266,9 @@ int lima_pp_init(struct lima_ip *ip) void lima_pp_fini(struct lima_ip *ip) { + struct lima_device *dev = ip->dev; + devm_free_irq(dev->dev, ip->irq, ip); } int lima_pp_bcast_resume(struct lima_ip *ip) @@ -299,7 +301,9 @@ int lima_pp_bcast_init(struct lima_ip *ip) void lima_pp_bcast_fini(struct lima_ip *ip) { + struct lima_device *dev = ip->dev; + devm_free_irq(dev->dev, ip->irq, ip); } static int lima_pp_task_validate(struct lima_sched_pipe *pipe, From 82ef3fa640f6f20142c930ec92d2638af4be8136 Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Wed, 10 Apr 2024 12:24:37 +0000 Subject: [PATCH 1371/1565] media: dvb: as102-fe: Fix as10x_register_addr packing [ Upstream commit 309422d280748c74f57f471559980268ac27732a ] This structure is embedded in multiple other structures that are packed, which conflicts with it being aligned. drivers/media/usb/as102/as10x_cmd.h:379:30: warning: field reg_addr within 'struct as10x_dump_memory::(unnamed at drivers/media/usb/as102/as10x_cmd.h:373:2)' is less aligned than 'struct as10x_register_addr' and is usually due to 'struct as10x_dump_memory::(unnamed at drivers/media/usb/as102/as10x_cmd.h:373:2)' being packed, which can lead to unaligned accesses [-Wunaligned-access] Mark it as being packed. Marking the inner struct as 'packed' does not change the layout, since the whole struct is already packed, it just silences the clang warning. See also this llvm discussion: https://github.com/llvm/llvm-project/issues/55520 Signed-off-by: Ricardo Ribalda Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/dvb-frontends/as102_fe_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/dvb-frontends/as102_fe_types.h b/drivers/media/dvb-frontends/as102_fe_types.h index 297f9520ebf9..8a4e392c8896 100644 --- a/drivers/media/dvb-frontends/as102_fe_types.h +++ b/drivers/media/dvb-frontends/as102_fe_types.h @@ -174,6 +174,6 @@ struct as10x_register_addr { uint32_t addr; /* register mode access */ uint8_t mode; -}; +} __packed; #endif From 167afd3fedaf8d0cccd6aa6ad90f850e2c2b571c Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Thu, 11 Apr 2024 21:17:56 +0000 Subject: [PATCH 1372/1565] media: dvb-usb: dib0700_devices: Add missing release_firmware() [ Upstream commit 4b267c23ee064bd24c6933df0588ad1b6e111145 ] Add missing release_firmware on the error paths. drivers/media/usb/dvb-usb/dib0700_devices.c:2415 stk9090m_frontend_attach() warn: 'state->frontend_firmware' from request_firmware() not released on lines: 2415. drivers/media/usb/dvb-usb/dib0700_devices.c:2497 nim9090md_frontend_attach() warn: 'state->frontend_firmware' from request_firmware() not released on lines: 2489,2497. Signed-off-by: Ricardo Ribalda Signed-off-by: Hans Verkuil Signed-off-by: Sasha Levin --- drivers/media/usb/dvb-usb/dib0700_devices.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dib0700_devices.c b/drivers/media/usb/dvb-usb/dib0700_devices.c index d3288c107906..afc561c1a5d6 100644 --- a/drivers/media/usb/dvb-usb/dib0700_devices.c +++ b/drivers/media/usb/dvb-usb/dib0700_devices.c @@ -2419,7 +2419,12 @@ static int stk9090m_frontend_attach(struct dvb_usb_adapter *adap) adap->fe_adap[0].fe = dvb_attach(dib9000_attach, &adap->dev->i2c_adap, 0x80, &stk9090m_config); - return adap->fe_adap[0].fe == NULL ? -ENODEV : 0; + if (!adap->fe_adap[0].fe) { + release_firmware(state->frontend_firmware); + return -ENODEV; + } + + return 0; } static int dib9090_tuner_attach(struct dvb_usb_adapter *adap) @@ -2492,8 +2497,10 @@ static int nim9090md_frontend_attach(struct dvb_usb_adapter *adap) dib9000_i2c_enumeration(&adap->dev->i2c_adap, 1, 0x20, 0x80); adap->fe_adap[0].fe = dvb_attach(dib9000_attach, &adap->dev->i2c_adap, 0x80, &nim9090md_config[0]); - if (adap->fe_adap[0].fe == NULL) + if (!adap->fe_adap[0].fe) { + release_firmware(state->frontend_firmware); return -ENODEV; + } i2c = dib9000_get_i2c_master(adap->fe_adap[0].fe, DIBX000_I2C_INTERFACE_GPIO_3_4, 0); dib9000_i2c_enumeration(i2c, 1, 0x12, 0x82); @@ -2501,7 +2508,12 @@ static int nim9090md_frontend_attach(struct dvb_usb_adapter *adap) fe_slave = dvb_attach(dib9000_attach, i2c, 0x82, &nim9090md_config[1]); dib9000_set_slave_frontend(adap->fe_adap[0].fe, fe_slave); - return fe_slave == NULL ? -ENODEV : 0; + if (!fe_slave) { + release_firmware(state->frontend_firmware); + return -ENODEV; + } + + return 0; } static int nim9090md_tuner_attach(struct dvb_usb_adapter *adap) From 62349fbf86b5e13b02721bdadf98c29afd1e7b5f Mon Sep 17 00:00:00 2001 From: Michael Guralnik Date: Tue, 16 Apr 2024 15:01:44 +0300 Subject: [PATCH 1373/1565] IB/core: Implement a limit on UMAD receive List [ Upstream commit ca0b44e20a6f3032224599f02e7c8fb49525c894 ] The existing behavior of ib_umad, which maintains received MAD packets in an unbounded list, poses a risk of uncontrolled growth. As user-space applications extract packets from this list, the rate of extraction may not match the rate of incoming packets, leading to potential list overflow. To address this, we introduce a limit to the size of the list. After considering typical scenarios, such as OpenSM processing, which can handle approximately 100k packets per second, and the 1-second retry timeout for most packets, we set the list size limit to 200k. Packets received beyond this limit are dropped, assuming they are likely timed out by the time they are handled by user-space. Notably, packets queued on the receive list due to reasons like timed-out sends are preserved even when the list is full. Signed-off-by: Michael Guralnik Reviewed-by: Mark Zhang Link: https://lore.kernel.org/r/7197cb58a7d9e78399008f25036205ceab07fbd5.1713268818.git.leon@kernel.org Signed-off-by: Leon Romanovsky Signed-off-by: Sasha Levin --- drivers/infiniband/core/user_mad.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 3bd0dcde8576..063707dd4fe3 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -63,6 +63,8 @@ MODULE_AUTHOR("Roland Dreier"); MODULE_DESCRIPTION("InfiniBand userspace MAD packet access"); MODULE_LICENSE("Dual BSD/GPL"); +#define MAX_UMAD_RECV_LIST_SIZE 200000 + enum { IB_UMAD_MAX_PORTS = RDMA_MAX_PORTS, IB_UMAD_MAX_AGENTS = 32, @@ -113,6 +115,7 @@ struct ib_umad_file { struct mutex mutex; struct ib_umad_port *port; struct list_head recv_list; + atomic_t recv_list_size; struct list_head send_list; struct list_head port_list; spinlock_t send_lock; @@ -180,24 +183,28 @@ static struct ib_mad_agent *__get_agent(struct ib_umad_file *file, int id) return file->agents_dead ? NULL : file->agent[id]; } -static int queue_packet(struct ib_umad_file *file, - struct ib_mad_agent *agent, - struct ib_umad_packet *packet) +static int queue_packet(struct ib_umad_file *file, struct ib_mad_agent *agent, + struct ib_umad_packet *packet, bool is_recv_mad) { int ret = 1; mutex_lock(&file->mutex); + if (is_recv_mad && + atomic_read(&file->recv_list_size) > MAX_UMAD_RECV_LIST_SIZE) + goto unlock; + for (packet->mad.hdr.id = 0; packet->mad.hdr.id < IB_UMAD_MAX_AGENTS; packet->mad.hdr.id++) if (agent == __get_agent(file, packet->mad.hdr.id)) { list_add_tail(&packet->list, &file->recv_list); + atomic_inc(&file->recv_list_size); wake_up_interruptible(&file->recv_wait); ret = 0; break; } - +unlock: mutex_unlock(&file->mutex); return ret; @@ -224,7 +231,7 @@ static void send_handler(struct ib_mad_agent *agent, if (send_wc->status == IB_WC_RESP_TIMEOUT_ERR) { packet->length = IB_MGMT_MAD_HDR; packet->mad.hdr.status = ETIMEDOUT; - if (!queue_packet(file, agent, packet)) + if (!queue_packet(file, agent, packet, false)) return; } kfree(packet); @@ -284,7 +291,7 @@ static void recv_handler(struct ib_mad_agent *agent, rdma_destroy_ah_attr(&ah_attr); } - if (queue_packet(file, agent, packet)) + if (queue_packet(file, agent, packet, true)) goto err2; return; @@ -409,6 +416,7 @@ static ssize_t ib_umad_read(struct file *filp, char __user *buf, packet = list_entry(file->recv_list.next, struct ib_umad_packet, list); list_del(&packet->list); + atomic_dec(&file->recv_list_size); mutex_unlock(&file->mutex); @@ -421,6 +429,7 @@ static ssize_t ib_umad_read(struct file *filp, char __user *buf, /* Requeue packet */ mutex_lock(&file->mutex); list_add(&packet->list, &file->recv_list); + atomic_inc(&file->recv_list_size); mutex_unlock(&file->mutex); } else { if (packet->recv_wc) From 5ceb40cdee721e13cbe15a0515cacf984e11236b Mon Sep 17 00:00:00 2001 From: John Meneghini Date: Wed, 3 Apr 2024 11:01:55 -0400 Subject: [PATCH 1374/1565] scsi: qedf: Make qedf_execute_tmf() non-preemptible [ Upstream commit 0d8b637c9c5eeaa1a4e3dfb336f3ff918eb64fec ] Stop calling smp_processor_id() from preemptible code in qedf_execute_tmf90. This results in BUG_ON() when running an RT kernel. [ 659.343280] BUG: using smp_processor_id() in preemptible [00000000] code: sg_reset/3646 [ 659.343282] caller is qedf_execute_tmf+0x8b/0x360 [qedf] Tested-by: Guangwu Zhang Cc: Saurav Kashyap Cc: Nilesh Javali Signed-off-by: John Meneghini Link: https://lore.kernel.org/r/20240403150155.412954-1-jmeneghi@redhat.com Acked-by: Saurav Kashyap Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/qedf/qedf_io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/qedf/qedf_io.c b/drivers/scsi/qedf/qedf_io.c index 1f8e81296beb..70f920f4b7a1 100644 --- a/drivers/scsi/qedf/qedf_io.c +++ b/drivers/scsi/qedf/qedf_io.c @@ -2351,9 +2351,6 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd, io_req->fcport = fcport; io_req->cmd_type = QEDF_TASK_MGMT_CMD; - /* Record which cpu this request is associated with */ - io_req->cpu = smp_processor_id(); - /* Set TM flags */ io_req->io_req_flags = QEDF_READ; io_req->data_xfer_len = 0; @@ -2375,6 +2372,9 @@ static int qedf_execute_tmf(struct qedf_rport *fcport, struct scsi_cmnd *sc_cmd, spin_lock_irqsave(&fcport->rport_lock, flags); + /* Record which cpu this request is associated with */ + io_req->cpu = smp_processor_id(); + sqe_idx = qedf_get_sqe_idx(fcport); sqe = &fcport->sq[sqe_idx]; memset(sqe, 0, sizeof(struct fcoe_wqe)); From 9db8c299a521813630fcb4154298cb60c37f3133 Mon Sep 17 00:00:00 2001 From: Hailey Mothershead Date: Mon, 15 Apr 2024 22:19:15 +0000 Subject: [PATCH 1375/1565] crypto: aead,cipher - zeroize key buffer after use [ Upstream commit 23e4099bdc3c8381992f9eb975c79196d6755210 ] I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding cryptographic information should be zeroized once they are no longer needed. Accomplish this by using kfree_sensitive for buffers that previously held the private key. Signed-off-by: Hailey Mothershead Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin --- crypto/aead.c | 3 +-- crypto/cipher.c | 3 +-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/crypto/aead.c b/crypto/aead.c index 16991095270d..c4ece86c45bc 100644 --- a/crypto/aead.c +++ b/crypto/aead.c @@ -35,8 +35,7 @@ static int setkey_unaligned(struct crypto_aead *tfm, const u8 *key, alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1); memcpy(alignbuffer, key, keylen); ret = crypto_aead_alg(tfm)->setkey(tfm, alignbuffer, keylen); - memset(alignbuffer, 0, keylen); - kfree(buffer); + kfree_sensitive(buffer); return ret; } diff --git a/crypto/cipher.c b/crypto/cipher.c index fd78150deb1c..72c5606cc7f8 100644 --- a/crypto/cipher.c +++ b/crypto/cipher.c @@ -33,8 +33,7 @@ static int setkey_unaligned(struct crypto_cipher *tfm, const u8 *key, alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1); memcpy(alignbuffer, key, keylen); ret = cia->cia_setkey(crypto_cipher_tfm(tfm), alignbuffer, keylen); - memset(alignbuffer, 0, keylen); - kfree(buffer); + kfree_sensitive(buffer); return ret; } From f0645c99c20e2210a3157af3f6bc1a22b8fbb115 Mon Sep 17 00:00:00 2001 From: Ma Jun Date: Mon, 22 Apr 2024 10:07:51 +0800 Subject: [PATCH 1376/1565] drm/amdgpu: Initialize timestamp for some legacy SOCs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 2e55bcf3d742a4946d862b86e39e75a95cc6f1c0 ] Initialize the interrupt timestamp for some legacy SOCs to fix the coverity issue "Uninitialized scalar variable" Signed-off-by: Ma Jun Suggested-by: Christian König Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 582055136cdb..87dcbaf540e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -413,6 +413,14 @@ void amdgpu_irq_dispatch(struct amdgpu_device *adev, int r; entry.iv_entry = (const uint32_t *)&ih->ring[ring_index]; + + /* + * timestamp is not supported on some legacy SOCs (cik, cz, iceland, + * si and tonga), so initialize timestamp and timestamp_src to 0 + */ + entry.timestamp = 0; + entry.timestamp_src = 0; + amdgpu_ih_decode_iv(adev, &entry); trace_amdgpu_iv(ih - &adev->irq.ih, &entry); From b5b8837d066cc182ff69fb5629ad32ade5484567 Mon Sep 17 00:00:00 2001 From: Alex Hung Date: Thu, 18 Apr 2024 13:27:43 -0600 Subject: [PATCH 1377/1565] drm/amd/display: Check index msg_id before read or write [ Upstream commit 59d99deb330af206a4541db0c4da8f73880fba03 ] [WHAT] msg_id is used as an array index and it cannot be a negative value, and therefore cannot be equal to MOD_HDCP_MESSAGE_ID_INVALID (-1). [HOW] Check whether msg_id is valid before reading and setting. This fixes 4 OVERRUN issues reported by Coverity. Reviewed-by: Rodrigo Siqueira Acked-by: Wayne Lin Signed-off-by: Alex Hung Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/modules/hdcp/hdcp_ddc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_ddc.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_ddc.c index f7b5583ee609..8e9caae7c955 100644 --- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_ddc.c +++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_ddc.c @@ -156,6 +156,10 @@ static enum mod_hdcp_status read(struct mod_hdcp *hdcp, uint32_t cur_size = 0; uint32_t data_offset = 0; + if (msg_id == MOD_HDCP_MESSAGE_ID_INVALID) { + return MOD_HDCP_STATUS_DDC_FAILURE; + } + if (is_dp_hdcp(hdcp)) { while (buf_len > 0) { cur_size = MIN(buf_len, HDCP_MAX_AUX_TRANSACTION_SIZE); @@ -215,6 +219,10 @@ static enum mod_hdcp_status write(struct mod_hdcp *hdcp, uint32_t cur_size = 0; uint32_t data_offset = 0; + if (msg_id == MOD_HDCP_MESSAGE_ID_INVALID) { + return MOD_HDCP_STATUS_DDC_FAILURE; + } + if (is_dp_hdcp(hdcp)) { while (buf_len > 0) { cur_size = MIN(buf_len, HDCP_MAX_AUX_TRANSACTION_SIZE); From b2e9abc95583ac7bbb2c47da4d476a798146dfd6 Mon Sep 17 00:00:00 2001 From: Alex Hung Date: Mon, 22 Apr 2024 18:07:17 -0600 Subject: [PATCH 1378/1565] drm/amd/display: Check pipe offset before setting vblank [ Upstream commit 5396a70e8cf462ec5ccf2dc8de103c79de9489e6 ] pipe_ctx has a size of MAX_PIPES so checking its index before accessing the array. This fixes an OVERRUN issue reported by Coverity. Reviewed-by: Rodrigo Siqueira Acked-by: Wayne Lin Signed-off-by: Alex Hung Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/irq/dce110/irq_service_dce110.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/irq/dce110/irq_service_dce110.c b/drivers/gpu/drm/amd/display/dc/irq/dce110/irq_service_dce110.c index 378cc11aa047..3d8b2b127f3f 100644 --- a/drivers/gpu/drm/amd/display/dc/irq/dce110/irq_service_dce110.c +++ b/drivers/gpu/drm/amd/display/dc/irq/dce110/irq_service_dce110.c @@ -211,8 +211,12 @@ bool dce110_vblank_set(struct irq_service *irq_service, info->ext_id); uint8_t pipe_offset = dal_irq_src - IRQ_TYPE_VBLANK; - struct timing_generator *tg = - dc->current_state->res_ctx.pipe_ctx[pipe_offset].stream_res.tg; + struct timing_generator *tg; + + if (pipe_offset >= MAX_PIPES) + return false; + + tg = dc->current_state->res_ctx.pipe_ctx[pipe_offset].stream_res.tg; if (enable) { if (!tg || !tg->funcs->arm_vert_intr(tg, 2)) { From ffa7bd3ca9cfa902b857d1dc9a5f46fededf86c8 Mon Sep 17 00:00:00 2001 From: Alex Hung Date: Mon, 22 Apr 2024 13:52:27 -0600 Subject: [PATCH 1379/1565] drm/amd/display: Skip finding free audio for unknown engine_id [ Upstream commit 1357b2165d9ad94faa4c4a20d5e2ce29c2ff29c3 ] [WHY] ENGINE_ID_UNKNOWN = -1 and can not be used as an array index. Plus, it also means it is uninitialized and does not need free audio. [HOW] Skip and return NULL. This fixes 2 OVERRUN issues reported by Coverity. Reviewed-by: Rodrigo Siqueira Acked-by: Wayne Lin Signed-off-by: Alex Hung Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c index f1eda1a6496d..0a13c06eea44 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c @@ -1802,6 +1802,9 @@ static struct audio *find_first_free_audio( { int i, available_audio_count; + if (id == ENGINE_ID_UNKNOWN) + return NULL; + available_audio_count = pool->audio_count; for (i = 0; i < available_audio_count; i++) { From 4dcce63a6f5c54ebb8c043b42fea524a079277b5 Mon Sep 17 00:00:00 2001 From: Michael Bunk Date: Sun, 16 Jan 2022 11:22:36 +0000 Subject: [PATCH 1380/1565] media: dw2102: Don't translate i2c read into write [ Upstream commit 0e148a522b8453115038193e19ec7bea71403e4a ] The code ignored the I2C_M_RD flag on I2C messages. Instead it assumed an i2c transaction with a single message must be a write operation and a transaction with two messages would be a read operation. Though this works for the driver code, it leads to problems once the i2c device is exposed to code not knowing this convention. For example, I did "insmod i2c-dev" and issued read requests from userspace, which were translated into write requests and destroyed the EEPROM of my device. So, just check and respect the I2C_M_READ flag, which indicates a read when set on a message. If it is absent, it is a write message. Incidentally, changing from the case statement to a while loop allows the code to lift the limitation to two i2c messages per transaction. There are 4 more *_i2c_transfer functions affected by the same behaviour and limitation that should be fixed in the same way. Link: https://lore.kernel.org/linux-media/20220116112238.74171-2-micha@freedict.org Signed-off-by: Michael Bunk Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/usb/dvb-usb/dw2102.c | 120 ++++++++++++++++++----------- 1 file changed, 73 insertions(+), 47 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dw2102.c b/drivers/media/usb/dvb-usb/dw2102.c index 2290f132a82c..5b51d0a8bec4 100644 --- a/drivers/media/usb/dvb-usb/dw2102.c +++ b/drivers/media/usb/dvb-usb/dw2102.c @@ -716,6 +716,7 @@ static int su3000_i2c_transfer(struct i2c_adapter *adap, struct i2c_msg msg[], { struct dvb_usb_device *d = i2c_get_adapdata(adap); struct dw2102_state *state; + int j; if (!d) return -ENODEV; @@ -729,11 +730,11 @@ static int su3000_i2c_transfer(struct i2c_adapter *adap, struct i2c_msg msg[], return -EAGAIN; } - switch (num) { - case 1: - switch (msg[0].addr) { + j = 0; + while (j < num) { + switch (msg[j].addr) { case SU3000_STREAM_CTRL: - state->data[0] = msg[0].buf[0] + 0x36; + state->data[0] = msg[j].buf[0] + 0x36; state->data[1] = 3; state->data[2] = 0; if (dvb_usb_generic_rw(d, state->data, 3, @@ -745,61 +746,86 @@ static int su3000_i2c_transfer(struct i2c_adapter *adap, struct i2c_msg msg[], if (dvb_usb_generic_rw(d, state->data, 1, state->data, 2, 0) < 0) err("i2c transfer failed."); - msg[0].buf[1] = state->data[0]; - msg[0].buf[0] = state->data[1]; + msg[j].buf[1] = state->data[0]; + msg[j].buf[0] = state->data[1]; break; default: - if (3 + msg[0].len > sizeof(state->data)) { - warn("i2c wr: len=%d is too big!\n", - msg[0].len); + /* if the current write msg is followed by a another + * read msg to/from the same address + */ + if ((j+1 < num) && (msg[j+1].flags & I2C_M_RD) && + (msg[j].addr == msg[j+1].addr)) { + /* join both i2c msgs to one usb read command */ + if (4 + msg[j].len > sizeof(state->data)) { + warn("i2c combined wr/rd: write len=%d is too big!\n", + msg[j].len); + num = -EOPNOTSUPP; + break; + } + if (1 + msg[j+1].len > sizeof(state->data)) { + warn("i2c combined wr/rd: read len=%d is too big!\n", + msg[j+1].len); + num = -EOPNOTSUPP; + break; + } + + state->data[0] = 0x09; + state->data[1] = msg[j].len; + state->data[2] = msg[j+1].len; + state->data[3] = msg[j].addr; + memcpy(&state->data[4], msg[j].buf, msg[j].len); + + if (dvb_usb_generic_rw(d, state->data, msg[j].len + 4, + state->data, msg[j+1].len + 1, 0) < 0) + err("i2c transfer failed."); + + memcpy(msg[j+1].buf, &state->data[1], msg[j+1].len); + j++; + break; + } + + if (msg[j].flags & I2C_M_RD) { + /* single read */ + if (1 + msg[j].len > sizeof(state->data)) { + warn("i2c rd: len=%d is too big!\n", msg[j].len); + num = -EOPNOTSUPP; + break; + } + + state->data[0] = 0x09; + state->data[1] = 0; + state->data[2] = msg[j].len; + state->data[3] = msg[j].addr; + memcpy(&state->data[4], msg[j].buf, msg[j].len); + + if (dvb_usb_generic_rw(d, state->data, 4, + state->data, msg[j].len + 1, 0) < 0) + err("i2c transfer failed."); + + memcpy(msg[j].buf, &state->data[1], msg[j].len); + break; + } + + /* single write */ + if (3 + msg[j].len > sizeof(state->data)) { + warn("i2c wr: len=%d is too big!\n", msg[j].len); num = -EOPNOTSUPP; break; } - /* always i2c write*/ state->data[0] = 0x08; - state->data[1] = msg[0].addr; - state->data[2] = msg[0].len; + state->data[1] = msg[j].addr; + state->data[2] = msg[j].len; - memcpy(&state->data[3], msg[0].buf, msg[0].len); + memcpy(&state->data[3], msg[j].buf, msg[j].len); - if (dvb_usb_generic_rw(d, state->data, msg[0].len + 3, + if (dvb_usb_generic_rw(d, state->data, msg[j].len + 3, state->data, 1, 0) < 0) err("i2c transfer failed."); + } // switch + j++; - } - break; - case 2: - /* always i2c read */ - if (4 + msg[0].len > sizeof(state->data)) { - warn("i2c rd: len=%d is too big!\n", - msg[0].len); - num = -EOPNOTSUPP; - break; - } - if (1 + msg[1].len > sizeof(state->data)) { - warn("i2c rd: len=%d is too big!\n", - msg[1].len); - num = -EOPNOTSUPP; - break; - } - - state->data[0] = 0x09; - state->data[1] = msg[0].len; - state->data[2] = msg[1].len; - state->data[3] = msg[0].addr; - memcpy(&state->data[4], msg[0].buf, msg[0].len); - - if (dvb_usb_generic_rw(d, state->data, msg[0].len + 4, - state->data, msg[1].len + 1, 0) < 0) - err("i2c transfer failed."); - - memcpy(msg[1].buf, &state->data[1], msg[1].len); - break; - default: - warn("more than 2 i2c messages at a time is not handled yet."); - break; - } + } // while mutex_unlock(&d->data_mutex); mutex_unlock(&d->i2c_mutex); return num; From 5a18ea7d864c0fb2c94c1149f5d3ddabca4123fe Mon Sep 17 00:00:00 2001 From: Erick Archer Date: Sat, 27 Apr 2024 19:23:36 +0200 Subject: [PATCH 1381/1565] sctp: prefer struct_size over open coded arithmetic [ Upstream commit e5c5f3596de224422561d48eba6ece5210d967b3 ] This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "ids" variable is a pointer to "struct sctp_assoc_ids" and this structure ends in a flexible array: struct sctp_assoc_ids { [...] sctp_assoc_t gaids_assoc_id[]; }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kmalloc() function. Also, refactor the code adding the "ids_size" variable to avoid sizing twice. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer Acked-by: Xin Long Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/PAXPR02MB724871DB78375AB06B5171C88B152@PAXPR02MB7248.eurprd02.prod.outlook.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/sctp/socket.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index bc4fe944ef85..79cf4cda2cf6 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -6994,6 +6994,7 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len, struct sctp_sock *sp = sctp_sk(sk); struct sctp_association *asoc; struct sctp_assoc_ids *ids; + size_t ids_size; u32 num = 0; if (sctp_style(sk, TCP)) @@ -7006,11 +7007,11 @@ static int sctp_getsockopt_assoc_ids(struct sock *sk, int len, num++; } - if (len < sizeof(struct sctp_assoc_ids) + sizeof(sctp_assoc_t) * num) + ids_size = struct_size(ids, gaids_assoc_id, num); + if (len < ids_size) return -EINVAL; - len = sizeof(struct sctp_assoc_ids) + sizeof(sctp_assoc_t) * num; - + len = ids_size; ids = kmalloc(len, GFP_USER | __GFP_NOWARN); if (unlikely(!ids)) return -ENOMEM; From 402825a23a0e92b8034d7c58a6335595a8d4ad1d Mon Sep 17 00:00:00 2001 From: Jean Delvare Date: Tue, 30 Apr 2024 18:29:32 +0200 Subject: [PATCH 1382/1565] firmware: dmi: Stop decoding on broken entry [ Upstream commit 0ef11f604503b1862a21597436283f158114d77e ] If a DMI table entry is shorter than 4 bytes, it is invalid. Due to how DMI table parsing works, it is impossible to safely recover from such an error, so we have to stop decoding the table. Signed-off-by: Jean Delvare Link: https://lore.kernel.org/linux-kernel/Zh2K3-HLXOesT_vZ@liuwe-devbox-debian-v2/T/ Reviewed-by: Michael Kelley Signed-off-by: Sasha Levin --- drivers/firmware/dmi_scan.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c index d51ca0428bb8..ded0878dc3b6 100644 --- a/drivers/firmware/dmi_scan.c +++ b/drivers/firmware/dmi_scan.c @@ -101,6 +101,17 @@ static void dmi_decode_table(u8 *buf, (data - buf + sizeof(struct dmi_header)) <= dmi_len) { const struct dmi_header *dm = (const struct dmi_header *)data; + /* + * If a short entry is found (less than 4 bytes), not only it + * is invalid, but we cannot reliably locate the next entry. + */ + if (dm->length < sizeof(struct dmi_header)) { + pr_warn(FW_BUG + "Corrupted DMI table, offset %zd (only %d entries processed)\n", + data - buf, i); + break; + } + /* * We want to know the total length (formatted area and * strings) before decoding to make sure we won't run off the From 8eac1cc159b32110f2e226dc6fbef81ec05e405a Mon Sep 17 00:00:00 2001 From: Erick Archer Date: Sat, 27 Apr 2024 17:05:56 +0200 Subject: [PATCH 1383/1565] Input: ff-core - prefer struct_size over open coded arithmetic [ Upstream commit a08b8f8557ad88ffdff8905e5da972afe52e3307 ] This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "ff" variable is a pointer to "struct ff_device" and this structure ends in a flexible array: struct ff_device { [...] struct file *effect_owners[] __counted_by(max_effects); }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() function. The struct_size() helper returns SIZE_MAX on overflow. So, refactor the comparison to take advantage of this. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/160 [2] Signed-off-by: Erick Archer Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/AS8PR02MB72371E646714BAE2E51A6A378B152@AS8PR02MB7237.eurprd02.prod.outlook.com Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/ff-core.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/input/ff-core.c b/drivers/input/ff-core.c index 1cf5deda06e1..a765e185c7a1 100644 --- a/drivers/input/ff-core.c +++ b/drivers/input/ff-core.c @@ -12,8 +12,10 @@ /* #define DEBUG */ #include +#include #include #include +#include #include #include @@ -318,9 +320,8 @@ int input_ff_create(struct input_dev *dev, unsigned int max_effects) return -EINVAL; } - ff_dev_size = sizeof(struct ff_device) + - max_effects * sizeof(struct file *); - if (ff_dev_size < max_effects) /* overflow */ + ff_dev_size = struct_size(ff, effect_owners, max_effects); + if (ff_dev_size == SIZE_MAX) /* overflow */ return -EINVAL; ff = kzalloc(ff_dev_size, GFP_KERNEL); From 2a2fe25a103cef73cde356e6d09da10f607e93f5 Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Tue, 30 Apr 2024 18:46:45 +0100 Subject: [PATCH 1384/1565] net: dsa: mv88e6xxx: Correct check for empty list [ Upstream commit 4c7f3950a9fd53a62b156c0fe7c3a2c43b0ba19b ] Since commit a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") mv88e6xxx_default_mdio_bus() has checked that the return value of list_first_entry() is non-NULL. This appears to be intended to guard against the list chip->mdios being empty. However, it is not the correct check as the implementation of list_first_entry is not designed to return NULL for empty lists. Instead, use list_first_entry_or_null() which does return NULL if the list is empty. Flagged by Smatch. Compile tested only. Reviewed-by: Andrew Lunn Signed-off-by: Simon Horman Link: https://lore.kernel.org/r/20240430-mv88e6xx-list_empty-v3-1-c35c69d88d2e@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/dsa/mv88e6xxx/chip.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index ac56bc175b51..6a1ae774cfe9 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -116,8 +116,8 @@ struct mii_bus *mv88e6xxx_default_mdio_bus(struct mv88e6xxx_chip *chip) { struct mv88e6xxx_mdio_bus *mdio_bus; - mdio_bus = list_first_entry(&chip->mdios, struct mv88e6xxx_mdio_bus, - list); + mdio_bus = list_first_entry_or_null(&chip->mdios, + struct mv88e6xxx_mdio_bus, list); if (!mdio_bus) return NULL; From 39e7a27813be32874a4b0487e3ed4c369b23808c Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Mon, 29 Apr 2024 16:04:47 +0100 Subject: [PATCH 1385/1565] media: dvb-frontends: tda18271c2dd: Remove casting during div [ Upstream commit e9a844632630e18ed0671a7e3467431bd719952e ] do_div() divides 64 bits by 32. We were adding a casting to the divider to 64 bits, for a number that fits perfectly in 32 bits. Remove it. Found by cocci: drivers/media/dvb-frontends/tda18271c2dd.c:355:1-7: WARNING: do_div() does a 64-by-32 division, please consider using div64_u64 instead. drivers/media/dvb-frontends/tda18271c2dd.c:331:1-7: WARNING: do_div() does a 64-by-32 division, please consider using div64_u64 instead. Link: https://lore.kernel.org/linux-media/20240429-fix-cocci-v3-8-3c4865f5a4b0@chromium.org Signed-off-by: Ricardo Ribalda Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/dvb-frontends/tda18271c2dd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/dvb-frontends/tda18271c2dd.c b/drivers/media/dvb-frontends/tda18271c2dd.c index a34834487943..fd928787207e 100644 --- a/drivers/media/dvb-frontends/tda18271c2dd.c +++ b/drivers/media/dvb-frontends/tda18271c2dd.c @@ -328,7 +328,7 @@ static int CalcMainPLL(struct tda_state *state, u32 freq) OscFreq = (u64) freq * (u64) Div; OscFreq *= (u64) 16384; - do_div(OscFreq, (u64)16000000); + do_div(OscFreq, 16000000); MainDiv = OscFreq; state->m_Regs[MPD] = PostDiv & 0x77; @@ -352,7 +352,7 @@ static int CalcCalPLL(struct tda_state *state, u32 freq) OscFreq = (u64)freq * (u64)Div; /* CalDiv = u32( OscFreq * 16384 / 16000000 ); */ OscFreq *= (u64)16384; - do_div(OscFreq, (u64)16000000); + do_div(OscFreq, 16000000); CalDiv = OscFreq; state->m_Regs[CPD] = PostDiv; From 7d2fbd822df14da4b507b61da313a9fa11182838 Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Mon, 29 Apr 2024 16:04:50 +0100 Subject: [PATCH 1386/1565] media: s2255: Use refcount_t instead of atomic_t for num_channels [ Upstream commit 6cff72f6bcee89228a662435b7c47e21a391c8d0 ] Use an API that resembles more the actual use of num_channels. Found by cocci: drivers/media/usb/s2255/s2255drv.c:2362:5-24: WARNING: atomic_dec_and_test variation before object free at line 2363. drivers/media/usb/s2255/s2255drv.c:1557:5-24: WARNING: atomic_dec_and_test variation before object free at line 1558. Link: https://lore.kernel.org/linux-media/20240429-fix-cocci-v3-11-3c4865f5a4b0@chromium.org Signed-off-by: Ricardo Ribalda Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/usb/s2255/s2255drv.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/media/usb/s2255/s2255drv.c b/drivers/media/usb/s2255/s2255drv.c index cb15eb32d2a6..50ac20c22630 100644 --- a/drivers/media/usb/s2255/s2255drv.c +++ b/drivers/media/usb/s2255/s2255drv.c @@ -247,7 +247,7 @@ struct s2255_vc { struct s2255_dev { struct s2255_vc vc[MAX_CHANNELS]; struct v4l2_device v4l2_dev; - atomic_t num_channels; + refcount_t num_channels; int frames; struct mutex lock; /* channels[].vdev.lock */ struct mutex cmdlock; /* protects cmdbuf */ @@ -1552,11 +1552,11 @@ static void s2255_video_device_release(struct video_device *vdev) container_of(vdev, struct s2255_vc, vdev); dprintk(dev, 4, "%s, chnls: %d\n", __func__, - atomic_read(&dev->num_channels)); + refcount_read(&dev->num_channels)); v4l2_ctrl_handler_free(&vc->hdl); - if (atomic_dec_and_test(&dev->num_channels)) + if (refcount_dec_and_test(&dev->num_channels)) s2255_destroy(dev); return; } @@ -1661,7 +1661,7 @@ static int s2255_probe_v4l(struct s2255_dev *dev) "failed to register video device!\n"); break; } - atomic_inc(&dev->num_channels); + refcount_inc(&dev->num_channels); v4l2_info(&dev->v4l2_dev, "V4L2 device registered as %s\n", video_device_node_name(&vc->vdev)); @@ -1669,11 +1669,11 @@ static int s2255_probe_v4l(struct s2255_dev *dev) pr_info("Sensoray 2255 V4L driver Revision: %s\n", S2255_VERSION); /* if no channels registered, return error and probe will fail*/ - if (atomic_read(&dev->num_channels) == 0) { + if (refcount_read(&dev->num_channels) == 0) { v4l2_device_unregister(&dev->v4l2_dev); return ret; } - if (atomic_read(&dev->num_channels) != MAX_CHANNELS) + if (refcount_read(&dev->num_channels) != MAX_CHANNELS) pr_warn("s2255: Not all channels available.\n"); return 0; } @@ -2222,7 +2222,7 @@ static int s2255_probe(struct usb_interface *interface, goto errorFWDATA1; } - atomic_set(&dev->num_channels, 0); + refcount_set(&dev->num_channels, 0); dev->pid = id->idProduct; dev->fw_data = kzalloc(sizeof(struct s2255_fw), GFP_KERNEL); if (!dev->fw_data) @@ -2342,12 +2342,12 @@ static void s2255_disconnect(struct usb_interface *interface) { struct s2255_dev *dev = to_s2255_dev(usb_get_intfdata(interface)); int i; - int channels = atomic_read(&dev->num_channels); + int channels = refcount_read(&dev->num_channels); mutex_lock(&dev->lock); v4l2_device_disconnect(&dev->v4l2_dev); mutex_unlock(&dev->lock); /*see comments in the uvc_driver.c usb disconnect function */ - atomic_inc(&dev->num_channels); + refcount_inc(&dev->num_channels); /* unregister each video device. */ for (i = 0; i < channels; i++) video_unregister_device(&dev->vc[i].vdev); @@ -2360,7 +2360,7 @@ static void s2255_disconnect(struct usb_interface *interface) dev->vc[i].vidstatus_ready = 1; wake_up(&dev->vc[i].wait_vidstatus); } - if (atomic_dec_and_test(&dev->num_channels)) + if (refcount_dec_and_test(&dev->num_channels)) s2255_destroy(dev); dev_info(&interface->dev, "%s\n", __func__); } From e1ba22618758e95e09c9fd30c69ccce38edf94c0 Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Date: Mon, 29 Apr 2024 16:05:04 +0100 Subject: [PATCH 1387/1565] media: dvb-frontends: tda10048: Fix integer overflow [ Upstream commit 1aa1329a67cc214c3b7bd2a14d1301a795760b07 ] state->xtal_hz can be up to 16M, so it can overflow a 32 bit integer when multiplied by pll_mfactor. Create a new 64 bit variable to hold the calculations. Link: https://lore.kernel.org/linux-media/20240429-fix-cocci-v3-25-3c4865f5a4b0@chromium.org Reported-by: Dan Carpenter Signed-off-by: Ricardo Ribalda Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin --- drivers/media/dvb-frontends/tda10048.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/media/dvb-frontends/tda10048.c b/drivers/media/dvb-frontends/tda10048.c index f1d5e77d5dcc..db829754f135 100644 --- a/drivers/media/dvb-frontends/tda10048.c +++ b/drivers/media/dvb-frontends/tda10048.c @@ -410,6 +410,7 @@ static int tda10048_set_if(struct dvb_frontend *fe, u32 bw) struct tda10048_config *config = &state->config; int i; u32 if_freq_khz; + u64 sample_freq; dprintk(1, "%s(bw = %d)\n", __func__, bw); @@ -451,9 +452,11 @@ static int tda10048_set_if(struct dvb_frontend *fe, u32 bw) dprintk(1, "- pll_pfactor = %d\n", state->pll_pfactor); /* Calculate the sample frequency */ - state->sample_freq = state->xtal_hz * (state->pll_mfactor + 45); - state->sample_freq /= (state->pll_nfactor + 1); - state->sample_freq /= (state->pll_pfactor + 4); + sample_freq = state->xtal_hz; + sample_freq *= state->pll_mfactor + 45; + do_div(sample_freq, state->pll_nfactor + 1); + do_div(sample_freq, state->pll_pfactor + 4); + state->sample_freq = sample_freq; dprintk(1, "- sample_freq = %d\n", state->sample_freq); /* Update the I/F */ From 158bcaa2e31be6e32cabd241c3a61fbcc526c93f Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 12 Apr 2024 12:21:58 +0200 Subject: [PATCH 1388/1565] i2c: i801: Annotate apanel_addr as __ro_after_init [ Upstream commit 355b1513b1e97b6cef84b786c6480325dfd3753d ] Annotate this variable as __ro_after_init to protect it from being overwritten later. Signed-off-by: Heiner Kallweit Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-i801.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c index d6b945f5b887..4baa9bce02b6 100644 --- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -1078,7 +1078,7 @@ static const struct pci_device_id i801_ids[] = { MODULE_DEVICE_TABLE(pci, i801_ids); #if defined CONFIG_X86 && defined CONFIG_DMI -static unsigned char apanel_addr; +static unsigned char apanel_addr __ro_after_init; /* Scan the system ROM for the signature "FJKEYINF" */ static __init const void __iomem *bios_signature(const void __iomem *bios) From 19cd1d96d6f8be4942f4042ad1a7c730b548e824 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 3 May 2024 17:56:19 +1000 Subject: [PATCH 1389/1565] powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n [ Upstream commit be140f1732b523947425aaafbe2e37b41b622d96 ] There is code that builds with calls to IO accessors even when CONFIG_PCI=n, but the actual calls are guarded by runtime checks. If not those calls would be faulting, because the page at virtual address zero is (usually) not mapped into the kernel. As Arnd pointed out, it is possible a large port value could cause the address to be above mmap_min_addr which would then access userspace, which would be a bug. To avoid any such issues, set _IO_BASE to POISON_POINTER_DELTA. That is a value chosen to point into unmapped space between the kernel and userspace, so any access will always fault. Note that on 32-bit POISON_POINTER_DELTA is 0, so the patch only has an effect on 64-bit. Signed-off-by: Michael Ellerman Link: https://msgid.link/20240503075619.394467-2-mpe@ellerman.id.au Signed-off-by: Sasha Levin --- arch/powerpc/include/asm/io.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 058d21f493fa..c6b56aa0334f 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -45,7 +45,7 @@ extern struct pci_dev *isa_bridge_pcidev; * define properly based on the platform */ #ifndef CONFIG_PCI -#define _IO_BASE 0 +#define _IO_BASE POISON_POINTER_DELTA #define _ISA_MEM_BASE 0 #define PCI_DRAM_OFFSET 0 #elif defined(CONFIG_PPC32) From 1617249e24bd04c8047956afb43feec4876d1715 Mon Sep 17 00:00:00 2001 From: Mike Marshall Date: Wed, 1 May 2024 16:20:36 -0400 Subject: [PATCH 1390/1565] orangefs: fix out-of-bounds fsid access [ Upstream commit 53e4efa470d5fc6a96662d2d3322cfc925818517 ] Arnd Bergmann sent a patch to fsdevel, he says: "orangefs_statfs() copies two consecutive fields of the superblock into the statfs structure, which triggers a warning from the string fortification helpers" Jan Kara suggested an alternate way to do the patch to make it more readable. I ran both ideas through xfstests and both seem fine. This patch is based on Jan Kara's suggestion. Signed-off-by: Mike Marshall Signed-off-by: Sasha Levin --- fs/orangefs/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c index 2f2e430461b2..b48aef43b51d 100644 --- a/fs/orangefs/super.c +++ b/fs/orangefs/super.c @@ -200,7 +200,8 @@ static int orangefs_statfs(struct dentry *dentry, struct kstatfs *buf) (long)new_op->downcall.resp.statfs.files_avail); buf->f_type = sb->s_magic; - memcpy(&buf->f_fsid, &ORANGEFS_SB(sb)->fs_id, sizeof(buf->f_fsid)); + buf->f_fsid.val[0] = ORANGEFS_SB(sb)->fs_id; + buf->f_fsid.val[1] = ORANGEFS_SB(sb)->id; buf->f_bsize = new_op->downcall.resp.statfs.block_size; buf->f_namelen = ORANGEFS_NAME_MAX; From 9d046f697e9a409597feb6f81a20d074f5a195f0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= Date: Mon, 8 Apr 2024 09:46:21 +0200 Subject: [PATCH 1391/1565] kunit: Fix timeout message MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 53026ff63bb07c04a0e962a74723eb10ff6f9dc7 ] The exit code is always checked, so let's properly handle the -ETIMEDOUT error code. Cc: Brendan Higgins Cc: Shuah Khan Reviewed-by: Kees Cook Reviewed-by: David Gow Reviewed-by: Rae Moar Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20240408074625.65017-4-mic@digikod.net Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- lib/kunit/try-catch.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c index 71e5c5853099..d18da926b2cd 100644 --- a/lib/kunit/try-catch.c +++ b/lib/kunit/try-catch.c @@ -76,7 +76,6 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context) time_remaining = wait_for_completion_timeout(&try_completion, kunit_test_timeout()); if (time_remaining == 0) { - kunit_err(test, "try timed out\n"); try_catch->try_result = -ETIMEDOUT; } @@ -89,6 +88,8 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context) try_catch->try_result = 0; else if (exit_code == -EINTR) kunit_err(test, "wake_up_process() was never called\n"); + else if (exit_code == -ETIMEDOUT) + kunit_err(test, "try timed out\n"); else if (exit_code) kunit_err(test, "Unknown error: %d\n", exit_code); From 6b84e9d53bc0156c129f1be05e57c4701b80ab9d Mon Sep 17 00:00:00 2001 From: Greg Kurz Date: Tue, 9 Mar 2021 19:11:10 +0100 Subject: [PATCH 1392/1565] powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 8873aab8646194a4446117bb617cc71bddda2dee ] All these commands end up peeking into the PACA using the user originated cpu id as an index. Check the cpu id is valid in order to prevent xmon to crash. Instead of printing an error, this follows the same behavior as the "lp s #" command : ignore the buggy cpu id parameter and fall back to the #-less version of the command. Signed-off-by: Greg Kurz Reviewed-by: Cédric Le Goater Signed-off-by: Michael Ellerman Link: https://msgid.link/161531347060.252863.10490063933688958044.stgit@bahia.lan Signed-off-by: Sasha Levin --- arch/powerpc/xmon/xmon.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 3de2adc0a807..a2883360d07c 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1249,7 +1249,7 @@ static int cpu_cmd(void) unsigned long cpu, first_cpu, last_cpu; int timeout; - if (!scanhex(&cpu)) { + if (!scanhex(&cpu) || cpu >= num_possible_cpus()) { /* print cpus waiting or in xmon */ printf("cpus stopped:"); last_cpu = first_cpu = NR_CPUS; @@ -2680,7 +2680,7 @@ static void dump_pacas(void) termch = c; /* Put c back, it wasn't 'a' */ - if (scanhex(&num)) + if (scanhex(&num) && num < num_possible_cpus()) dump_one_paca(num); else dump_one_paca(xmon_owner); @@ -2777,7 +2777,7 @@ static void dump_xives(void) termch = c; /* Put c back, it wasn't 'a' */ - if (scanhex(&num)) + if (scanhex(&num) && num < num_possible_cpus()) dump_one_xive(num); else dump_one_xive(xmon_owner); From b694989bb13ed5f166e633faa1eb0f21c6d261a6 Mon Sep 17 00:00:00 2001 From: "Jose E. Marchesi" Date: Wed, 8 May 2024 12:13:13 +0200 Subject: [PATCH 1393/1565] bpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD [ Upstream commit 009367099eb61a4fc2af44d4eb06b6b4de7de6db ] [Changes from V1: - Use a default branch in the switch statement to initialize `val'.] GCC warns that `val' may be used uninitialized in the BPF_CRE_READ_BITFIELD macro, defined in bpf_core_read.h as: [...] unsigned long long val; \ [...] \ switch (__CORE_RELO(s, field, BYTE_SIZE)) { \ case 1: val = *(const unsigned char *)p; break; \ case 2: val = *(const unsigned short *)p; break; \ case 4: val = *(const unsigned int *)p; break; \ case 8: val = *(const unsigned long long *)p; break; \ } \ [...] val; \ } \ This patch adds a default entry in the switch statement that sets `val' to zero in order to avoid the warning, and random values to be used in case __builtin_preserve_field_info returns unexpected values for BPF_FIELD_BYTE_SIZE. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240508101313.16662-1-jose.marchesi@oracle.com Signed-off-by: Sasha Levin --- tools/lib/bpf/bpf_core_read.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h index f05cfc082915..4303b31498d8 100644 --- a/tools/lib/bpf/bpf_core_read.h +++ b/tools/lib/bpf/bpf_core_read.h @@ -101,6 +101,7 @@ enum bpf_enum_value_kind { case 2: val = *(const unsigned short *)p; break; \ case 4: val = *(const unsigned int *)p; break; \ case 8: val = *(const unsigned long long *)p; break; \ + default: val = 0; break; \ } \ val <<= __CORE_RELO(s, field, LSHIFT_U64); \ if (__CORE_RELO(s, field, SIGNED)) \ From 6d6d94287f6365282bbf41e9a5b5281985970789 Mon Sep 17 00:00:00 2001 From: Wang Yong Date: Tue, 7 May 2024 15:00:46 +0800 Subject: [PATCH 1394/1565] jffs2: Fix potential illegal address access in jffs2_free_inode [ Upstream commit af9a8730ddb6a4b2edd779ccc0aceb994d616830 ] During the stress testing of the jffs2 file system,the following abnormal printouts were found: [ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info: [ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info: [ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace: [ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info. The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code. Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once. Signed-off-by: Wang Yong Reviewed-by: Lu Zhongjun Reviewed-by: Yang Tao Cc: Xu Xin Cc: Yang Yang Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin --- fs/jffs2/super.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c index 81ca58c10b72..40cc5e62907c 100644 --- a/fs/jffs2/super.c +++ b/fs/jffs2/super.c @@ -58,6 +58,7 @@ static void jffs2_i_init_once(void *foo) struct jffs2_inode_info *f = foo; mutex_init(&f->sem); + f->target = NULL; inode_init_once(&f->vfs_inode); } From 93c034c4314bc4c4450a3869cd5da298502346ad Mon Sep 17 00:00:00 2001 From: Holger Dengler Date: Tue, 7 May 2024 17:03:18 +0200 Subject: [PATCH 1395/1565] s390/pkey: Wipe sensitive data on failure [ Upstream commit 1d8c270de5eb74245d72325d285894a577a945d9 ] Wipe sensitive data from stack also if the copy_to_user() fails. Suggested-by: Heiko Carstens Reviewed-by: Harald Freudenberger Reviewed-by: Ingo Franzki Acked-by: Heiko Carstens Signed-off-by: Holger Dengler Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- drivers/s390/crypto/pkey_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/s390/crypto/pkey_api.c b/drivers/s390/crypto/pkey_api.c index 69882ff4db10..362c97d9bd5b 100644 --- a/drivers/s390/crypto/pkey_api.c +++ b/drivers/s390/crypto/pkey_api.c @@ -1155,7 +1155,7 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd, if (rc) break; if (copy_to_user(ucs, &kcs, sizeof(kcs))) - return -EFAULT; + rc = -EFAULT; memzero_explicit(&kcs, sizeof(kcs)); break; } @@ -1187,7 +1187,7 @@ static long pkey_unlocked_ioctl(struct file *filp, unsigned int cmd, if (rc) break; if (copy_to_user(ucp, &kcp, sizeof(kcp))) - return -EFAULT; + rc = -EFAULT; memzero_explicit(&kcp, sizeof(kcp)); break; } From febed740a31de7f515b66548fad11dc9681fe120 Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Wed, 26 Jun 2024 22:42:27 -0400 Subject: [PATCH 1396/1565] UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open() [ Upstream commit a6458ab7fd4f427d4f6f54380453ad255b7fde83 ] In some production workloads we noticed that connections could sometimes close extremely prematurely with ETIMEDOUT after transmitting only 1 TLP and RTO retransmission (when we would normally expect roughly tcp_retries2 = TCP_RETR2 = 15 RTOs before a connection closes with ETIMEDOUT). From tracing we determined that these workloads can suffer from a scenario where in fast recovery, after some retransmits, a DSACK undo can happen at a point where the scoreboard is totally clear (we have retrans_out == sacked_out == lost_out == 0). In such cases, calling tcp_try_keep_open() means that we do not execute any code path that clears tp->retrans_stamp to 0. That means that tp->retrans_stamp can remain erroneously set to the start time of the undone fast recovery, even after the fast recovery is undone. If minutes or hours elapse, and then a TLP/RTO/RTO sequence occurs, then the start_ts value in retransmits_timed_out() (which is from tp->retrans_stamp) will be erroneously ancient (left over from the fast recovery undone via DSACKs). Thus this ancient tp->retrans_stamp value can cause the connection to die very prematurely with ETIMEDOUT via tcp_write_err(). The fix: we change DSACK undo in fast recovery (TCP_CA_Recovery) to call tcp_try_to_open() instead of tcp_try_keep_open(). This ensures that if no retransmits are in flight at the time of DSACK undo in fast recovery then we properly zero retrans_stamp. Note that calling tcp_try_to_open() is more consistent with other loss recovery behavior, since normal fast recovery (CA_Recovery) and RTO recovery (CA_Loss) both normally end when tp->snd_una meets or exceeds tp->high_seq and then in tcp_fastretrans_alert() the "default" switch case executes tcp_try_to_open(). Also note that by inspection this change to call tcp_try_to_open() implies at least one other nice bug fix, where now an ECE-marked DSACK that causes an undo will properly invoke tcp_enter_cwr() rather than ignoring the ECE mark. Fixes: c7d9d6a185a7 ("tcp: undo on DSACK during recovery") Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/tcp_input.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 512f8dc051c6..604ff1b04c3e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2993,7 +2993,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una, return; if (tcp_try_undo_dsack(sk)) - tcp_try_keep_open(sk); + tcp_try_to_open(sk, flag); tcp_identify_packet_loss(sk, ack_flag); if (icsk->icsk_ca_state != TCP_CA_Recovery) { From cdffc358717e436bb67122bb82c1a2a26e050f98 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 27 Jun 2024 14:25:00 -0700 Subject: [PATCH 1397/1565] tcp_metrics: validate source addr length [ Upstream commit 66be40e622e177316ae81717aa30057ba9e61dff ] I don't see anything checking that TCP_METRICS_ATTR_SADDR_IPV4 is at least 4 bytes long, and the policy doesn't have an entry for this attribute at all (neither does it for IPv6 but v6 is manually validated). Reviewed-by: Eric Dumazet Fixes: 3e7013ddf55a ("tcp: metrics: Allow selective get/del of tcp-metrics based on src IP") Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- net/ipv4/tcp_metrics.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index f823a15b973c..95b5ac082a2f 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -619,6 +619,7 @@ static const struct nla_policy tcp_metrics_nl_policy[TCP_METRICS_ATTR_MAX + 1] = [TCP_METRICS_ATTR_ADDR_IPV4] = { .type = NLA_U32, }, [TCP_METRICS_ATTR_ADDR_IPV6] = { .type = NLA_BINARY, .len = sizeof(struct in6_addr), }, + [TCP_METRICS_ATTR_SADDR_IPV4] = { .type = NLA_U32, }, /* Following attributes are not received for GET/DEL, * we keep them for reference */ From df76fb67eaa256ac1b95e9c11a24c3cacfdcac0e Mon Sep 17 00:00:00 2001 From: Jozef Hopko Date: Mon, 1 Jul 2024 18:23:20 +0200 Subject: [PATCH 1398/1565] wifi: wilc1000: fix ies_len type in connect path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 39ab8fff623053a50951b659e5f6b72343d7d78c ] Commit 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") made sure that the IEs data was manipulated under the relevant RCU section. Unfortunately, while doing so, the commit brought a faulty implicit cast from int to u8 on the ies_len variable, making the parsing fail to be performed correctly if the IEs block is larger than 255 bytes. This failure can be observed with Access Points appending a lot of IEs TLVs in their beacon frames (reproduced with a Pixel phone acting as an Access Point, which brough 273 bytes of IE data in my testing environment). Fix IEs parsing by removing this undesired implicit cast. Fixes: 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") Signed-off-by: Jozef Hopko Signed-off-by: Alexis Lothoré Acked-by: Ajay Singh Signed-off-by: Kalle Valo Link: https://patch.msgid.link/20240701-wilc_fix_ies_data-v1-1-7486cbacf98a@bootlin.com Signed-off-by: Sasha Levin --- drivers/net/wireless/microchip/wilc1000/hif.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/microchip/wilc1000/hif.c b/drivers/net/wireless/microchip/wilc1000/hif.c index 457386f9de99..3f167bf4eef3 100644 --- a/drivers/net/wireless/microchip/wilc1000/hif.c +++ b/drivers/net/wireless/microchip/wilc1000/hif.c @@ -364,7 +364,8 @@ void *wilc_parse_join_bss_param(struct cfg80211_bss *bss, struct ieee80211_p2p_noa_attr noa_attr; const struct cfg80211_bss_ies *ies; struct wilc_join_bss_param *param; - u8 rates_len = 0, ies_len; + u8 rates_len = 0; + int ies_len; int ret; param = kzalloc(sizeof(*param), GFP_KERNEL); From 707c85ba3527ad6aa25552033576b0f1ff835d7b Mon Sep 17 00:00:00 2001 From: Sam Sun Date: Tue, 2 Jul 2024 14:55:55 +0100 Subject: [PATCH 1399/1565] bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set() [ Upstream commit e271ff53807e8f2c628758290f0e499dbe51cb3d ] In function bond_option_arp_ip_targets_set(), if newval->string is an empty string, newval->string+1 will point to the byte after the string, causing an out-of-bound read. BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418 Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107 CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0xc1/0x5e0 mm/kasan/report.c:475 kasan_report+0xbe/0xf0 mm/kasan/report.c:588 strlen+0x7d/0xa0 lib/string.c:418 __fortify_strlen include/linux/fortify-string.h:210 [inline] in4_pton+0xa3/0x3f0 net/core/utils.c:130 bond_option_arp_ip_targets_set+0xc2/0x910 drivers/net/bonding/bond_options.c:1201 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156 dev_attr_store+0x54/0x80 drivers/base/core.c:2366 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x96a/0xd80 fs/read_write.c:584 ksys_write+0x122/0x250 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b ---[ end trace ]--- Fix it by adding a check of string length before using it. Fixes: f9de11a16594 ("bonding: add ip checks when store ip target") Signed-off-by: Yue Sun Signed-off-by: Simon Horman Acked-by: Jay Vosburgh Reviewed-by: Hangbin Liu Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/bonding/bond_options.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index fe55c81608da..fa0bf77870d4 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -1100,9 +1100,9 @@ static int bond_option_arp_ip_targets_set(struct bonding *bond, __be32 target; if (newval->string) { - if (!in4_pton(newval->string+1, -1, (u8 *)&target, -1, NULL)) { - netdev_err(bond->dev, "invalid ARP target %pI4 specified\n", - &target); + if (strlen(newval->string) < 1 || + !in4_pton(newval->string + 1, -1, (u8 *)&target, -1, NULL)) { + netdev_err(bond->dev, "invalid ARP target specified\n"); return ret; } if (newval->string[0] == '+') From 1782a42ca25c0a9b8943a4d11e68e9b52a46ab5c Mon Sep 17 00:00:00 2001 From: Zijian Zhang Date: Mon, 1 Jul 2024 22:53:48 +0000 Subject: [PATCH 1400/1565] selftests: fix OOM in msg_zerocopy selftest [ Upstream commit af2b7e5b741aaae9ffbba2c660def434e07aa241 ] In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg on a socket with MSG_ZEROCOPY flag, and it will recv the notifications until the socket is not writable. Typically, it will start the receiving process after around 30+ sendmsgs. However, as the introduction of commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is always writable and does not get any chance to run recv notifications. The selftest always exits with OUT_OF_MEMORY because the memory used by opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a different value to trigger OOM on older kernels too. Thus, we introduce "cfg_notification_limit" to force sender to receive notifications after some number of sendmsgs. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang Signed-off-by: Xiaochun Lu Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- tools/testing/selftests/net/msg_zerocopy.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c index bdc03a2097e8..926556febc83 100644 --- a/tools/testing/selftests/net/msg_zerocopy.c +++ b/tools/testing/selftests/net/msg_zerocopy.c @@ -85,6 +85,7 @@ static bool cfg_rx; static int cfg_runtime_ms = 4200; static int cfg_verbose; static int cfg_waittime_ms = 500; +static int cfg_notification_limit = 32; static bool cfg_zerocopy; static socklen_t cfg_alen; @@ -95,6 +96,7 @@ static char payload[IP_MAXPACKET]; static long packets, bytes, completions, expected_completions; static int zerocopied = -1; static uint32_t next_completion; +static uint32_t sends_since_notify; static unsigned long gettimeofday_ms(void) { @@ -208,6 +210,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain) error(1, errno, "send"); if (cfg_verbose && ret != len) fprintf(stderr, "send: ret=%u != %u\n", ret, len); + sends_since_notify++; if (len) { packets++; @@ -460,6 +463,7 @@ static bool do_recv_completion(int fd, int domain) static void do_recv_completions(int fd, int domain) { while (do_recv_completion(fd, domain)) {} + sends_since_notify = 0; } /* Wait for all remaining completions on the errqueue */ @@ -549,6 +553,9 @@ static void do_tx(int domain, int type, int protocol) else do_sendmsg(fd, &msg, cfg_zerocopy, domain); + if (cfg_zerocopy && sends_since_notify >= cfg_notification_limit) + do_recv_completions(fd, domain); + while (!do_poll(fd, POLLOUT)) { if (cfg_zerocopy) do_recv_completions(fd, domain); @@ -708,7 +715,7 @@ static void parse_opts(int argc, char **argv) cfg_payload_len = max_payload_len; - while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) { + while ((c = getopt(argc, argv, "46c:C:D:i:l:mp:rs:S:t:vz")) != -1) { switch (c) { case '4': if (cfg_family != PF_UNSPEC) @@ -736,6 +743,9 @@ static void parse_opts(int argc, char **argv) if (cfg_ifindex == 0) error(1, errno, "invalid iface: %s", optarg); break; + case 'l': + cfg_notification_limit = strtoul(optarg, NULL, 0); + break; case 'm': cfg_cork_mixed = true; break; From 3908637dce2e7fb848fe4ca118f46865ec47d38a Mon Sep 17 00:00:00 2001 From: Zijian Zhang Date: Mon, 1 Jul 2024 22:53:49 +0000 Subject: [PATCH 1401/1565] selftests: make order checking verbose in msg_zerocopy selftest [ Upstream commit 7d6d8f0c8b700c9493f2839abccb6d29028b4219 ] We find that when lock debugging is on, notifications may not come in order. Thus, we have order checking outputs managed by cfg_verbose, to avoid too many outputs in this case. Fixes: 07b65c5b31ce ("test: add msg_zerocopy test") Signed-off-by: Zijian Zhang Signed-off-by: Xiaochun Lu Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- tools/testing/selftests/net/msg_zerocopy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c index 926556febc83..7ea5fb28c93d 100644 --- a/tools/testing/selftests/net/msg_zerocopy.c +++ b/tools/testing/selftests/net/msg_zerocopy.c @@ -438,7 +438,7 @@ static bool do_recv_completion(int fd, int domain) /* Detect notification gaps. These should not happen often, if at all. * Gaps can occur due to drops, reordering and retransmissions. */ - if (lo != next_completion) + if (cfg_verbose && lo != next_completion) fprintf(stderr, "gap: %u..%u does not append to %u\n", lo, hi, next_completion); next_completion = hi + 1; From 7ef519c8efde152e0d632337f2994f6921e0b7e4 Mon Sep 17 00:00:00 2001 From: Shigeru Yoshida Date: Wed, 3 Jul 2024 18:16:49 +0900 Subject: [PATCH 1402/1565] inet_diag: Initialize pad field in struct inet_diag_req_v2 [ Upstream commit 61cf1c739f08190a4cbf047b9fbb192a94d87e3f ] KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw sockets uses the pad field in struct inet_diag_req_v2 for the underlying protocol. This field corresponds to the sdiag_raw_protocol field in struct inet_diag_req_raw. inet_diag_get_exact_compat() converts inet_diag_req to inet_diag_req_v2, but leaves the pad field uninitialized. So the issue occurs when raw_lookup() accesses the sdiag_raw_protocol field. Fix this by initializing the pad field in inet_diag_get_exact_compat(). Also, do the same fix in inet_diag_dump_compat() to avoid the similar issue in the future. [1] BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline] BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_lookup net/ipv4/raw_diag.c:49 [inline] raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was stored to memory at: raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99 inet_diag_cmd_exact+0x7d9/0x980 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline] inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x332/0x3d0 net/socket.c:745 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639 __sys_sendmsg net/socket.c:2668 [inline] __do_sys_sendmsg net/socket.c:2677 [inline] __se_sys_sendmsg net/socket.c:2675 [inline] __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Local variable req.i created at: inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline] inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282 CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets") Reported-by: syzkaller Signed-off-by: Shigeru Yoshida Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ipv4/inet_diag.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 27a5a7d66d18..b9df76f6571c 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -1275,6 +1275,7 @@ static int inet_diag_dump_compat(struct sk_buff *skb, req.sdiag_family = AF_UNSPEC; /* compatibility */ req.sdiag_protocol = inet_diag_type2proto(cb->nlh->nlmsg_type); req.idiag_ext = rc->idiag_ext; + req.pad = 0; req.idiag_states = rc->idiag_states; req.id = rc->id; @@ -1290,6 +1291,7 @@ static int inet_diag_get_exact_compat(struct sk_buff *in_skb, req.sdiag_family = rc->idiag_family; req.sdiag_protocol = inet_diag_type2proto(nlh->nlmsg_type); req.idiag_ext = rc->idiag_ext; + req.pad = 0; req.idiag_states = rc->idiag_states; req.id = rc->id; From 731011ac6c37cbe97ece229fc6daa486276052c5 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sun, 23 Jun 2024 14:11:33 +0900 Subject: [PATCH 1403/1565] nilfs2: fix inode number range checks commit e2fec219a36e0993642844be0f345513507031f4 upstream. Patch series "nilfs2: fix potential issues related to reserved inodes". This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. This patch (of 3): In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Cc: Hillf Danton Cc: Jan Kara Cc: Matthew Wilcox (Oracle) Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/nilfs.h | 5 +++-- fs/nilfs2/the_nilfs.c | 6 ++++++ fs/nilfs2/the_nilfs.h | 2 +- 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index ace27a89fbb0..cefd2cfc7012 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -116,9 +116,10 @@ enum { #define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino) #define NILFS_MDT_INODE(sb, ino) \ - ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino))) + ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino))) #define NILFS_VALID_INODE(sb, ino) \ - ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino))) + ((ino) >= NILFS_FIRST_INO(sb) || \ + ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino)))) /** * struct nilfs_transaction_info: context information for synchronization diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c index a8374d89da7c..09ba8bbaf6ca 100644 --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs, } nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino); + if (nilfs->ns_first_ino < NILFS_USER_INO) { + nilfs_err(nilfs->ns_sb, + "too small lower limit for non-reserved inode numbers: %u", + nilfs->ns_first_ino); + return -EINVAL; + } nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment); if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) { diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h index 01914089e76d..c1eea04052e6 100644 --- a/fs/nilfs2/the_nilfs.h +++ b/fs/nilfs2/the_nilfs.h @@ -182,7 +182,7 @@ struct the_nilfs { unsigned long ns_nrsvsegs; unsigned long ns_first_data_block; int ns_inode_size; - int ns_first_ino; + unsigned int ns_first_ino; u32 ns_crc_seed; /* /sys/fs// */ From 2f2fa9cf7c3537958a82fbe8c8595a5eb0861ad7 Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sun, 23 Jun 2024 14:11:34 +0900 Subject: [PATCH 1404/1565] nilfs2: add missing check for inode numbers on directory entries commit bb76c6c274683c8570ad788f79d4b875bde0e458 upstream. Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by: Jan Kara Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by: Ryusuke Konishi Cc: Hillf Danton Cc: Matthew Wilcox (Oracle) Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/dir.c | 6 ++++++ fs/nilfs2/nilfs.h | 5 +++++ 2 files changed, 11 insertions(+) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 552234ef22fe..3e15a8fdac4c 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -143,6 +143,9 @@ static bool nilfs_check_page(struct page *page) goto Enamelen; if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1)) goto Espan; + if (unlikely(p->inode && + NILFS_PRIVATE_INODE(le64_to_cpu(p->inode)))) + goto Einumber; } if (offs != limit) goto Eend; @@ -168,6 +171,9 @@ static bool nilfs_check_page(struct page *page) goto bad_entry; Espan: error = "directory entry across blocks"; + goto bad_entry; +Einumber: + error = "disallowed inode number"; bad_entry: nilfs_error(sb, "bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%d, name_len=%d", diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h index cefd2cfc7012..3f3971e0292d 100644 --- a/fs/nilfs2/nilfs.h +++ b/fs/nilfs2/nilfs.h @@ -121,6 +121,11 @@ enum { ((ino) >= NILFS_FIRST_INO(sb) || \ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino)))) +#define NILFS_PRIVATE_INODE(ino) ({ \ + ino_t __ino = (ino); \ + ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \ + (__ino) != NILFS_SKETCH_INO); }) + /** * struct nilfs_transaction_info: context information for synchronization * @ti_magic: Magic number From f033241a7c2d9c36d28939cbaa3f2fff7edf34f4 Mon Sep 17 00:00:00 2001 From: Jinliang Zheng Date: Thu, 20 Jun 2024 20:21:24 +0800 Subject: [PATCH 1405/1565] mm: optimize the redundant loop of mm_update_owner_next() commit cf3f9a593dab87a032d2b6a6fb205e7f3de4f0a1 upstream. When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct. If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock. Recognize this situation in advance and exit early. Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com Signed-off-by: Jinliang Zheng Acked-by: Michal Hocko Cc: Christian Brauner Cc: Jens Axboe Cc: Mateusz Guzik Cc: Matthew Wilcox (Oracle) Cc: Oleg Nesterov Cc: Tycho Andersen Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- kernel/exit.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/exit.c b/kernel/exit.c index bacdaf980933..af9c8e794e4d 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -432,6 +432,8 @@ void mm_update_next_owner(struct mm_struct *mm) * Search through everything else, we should not get here often. */ for_each_process(g) { + if (atomic_read(&mm->mm_users) <= 1) + break; if (g->flags & PF_KTHREAD) continue; for_each_thread(g, c) { From 7a49389771ae7666f4dc3426e2a4594bf23ae290 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 21 Jun 2024 16:42:38 +0200 Subject: [PATCH 1406/1565] mm: avoid overflows in dirty throttling logic commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2 upstream. The dirty throttling logic is interspersed with assumptions that dirty limits in PAGE_SIZE units fit into 32-bit (so that various multiplications fit into 64-bits). If limits end up being larger, we will hit overflows, possible divisions by 0 etc. Fix these problems by never allowing so large dirty limits as they have dubious practical value anyway. For dirty_bytes / dirty_background_bytes interfaces we can just refuse to set so large limits. For dirty_ratio / dirty_background_ratio it isn't so simple as the dirty limit is computed from the amount of available memory which can change due to memory hotplug etc. So when converting dirty limits from ratios to numbers of pages, we just don't allow the result to exceed UINT_MAX. This is root-only triggerable problem which occurs when the operator sets dirty limits to >16 TB. Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz Signed-off-by: Jan Kara Reported-by: Zach O'Keefe Reviewed-By: Zach O'Keefe Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/page-writeback.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index e8d7d3c2bfcb..ef825aa14de0 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -432,13 +432,20 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc) else bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; - if (bg_thresh >= thresh) - bg_thresh = thresh / 2; tsk = current; if (rt_task(tsk)) { bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32; thresh += thresh / 4 + global_wb_domain.dirty_limit / 32; } + /* + * Dirty throttling logic assumes the limits in page units fit into + * 32-bits. This gives 16TB dirty limits max which is hopefully enough. + */ + if (thresh > UINT_MAX) + thresh = UINT_MAX; + /* This makes sure bg_thresh is within 32-bits as well */ + if (bg_thresh >= thresh) + bg_thresh = thresh / 2; dtc->thresh = thresh; dtc->bg_thresh = bg_thresh; @@ -488,7 +495,11 @@ static unsigned long node_dirty_limit(struct pglist_data *pgdat) if (rt_task(tsk)) dirty += dirty / 4; - return dirty; + /* + * Dirty throttling logic assumes the limits in page units fit into + * 32-bits. This gives 16TB dirty limits max which is hopefully enough. + */ + return min_t(unsigned long, dirty, UINT_MAX); } /** @@ -524,10 +535,17 @@ int dirty_background_bytes_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { int ret; + unsigned long old_bytes = dirty_background_bytes; ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos); - if (ret == 0 && write) + if (ret == 0 && write) { + if (DIV_ROUND_UP(dirty_background_bytes, PAGE_SIZE) > + UINT_MAX) { + dirty_background_bytes = old_bytes; + return -ERANGE; + } dirty_background_ratio = 0; + } return ret; } @@ -553,6 +571,10 @@ int dirty_bytes_handler(struct ctl_table *table, int write, ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos); if (ret == 0 && write && vm_dirty_bytes != old_bytes) { + if (DIV_ROUND_UP(vm_dirty_bytes, PAGE_SIZE) > UINT_MAX) { + vm_dirty_bytes = old_bytes; + return -ERANGE; + } writeback_set_ratelimit(); vm_dirty_ratio = 0; } From 215a26c2404fa34625c725d446967fa328a703eb Mon Sep 17 00:00:00 2001 From: Zijun Hu Date: Thu, 16 May 2024 21:31:34 +0800 Subject: [PATCH 1407/1565] Bluetooth: qca: Fix BT enable failure again for QCA6390 after warm reboot commit 88e72239ead9814b886db54fc4ee39ef3c2b8f26 upstream. Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev") will cause below regression issue: BT can't be enabled after below steps: cold boot -> enable BT -> disable BT -> warm reboot -> BT enable failure if property enable-gpios is not configured within DT|ACPI for QCA6390. The commit is to fix a use-after-free issue within qca_serdev_shutdown() by adding condition to avoid the serdev is flushed or wrote after closed but also introduces this regression issue regarding above steps since the VSC is not sent to reset controller during warm reboot. Fixed by sending the VSC to reset controller within qca_serdev_shutdown() once BT was ever enabled, and the use-after-free issue is also fixed by this change since the serdev is still opened before it is flushed or wrote. Verified by the reported machine Dell XPS 13 9310 laptop over below two kernel commits: commit e00fc2700a3f ("Bluetooth: btusb: Fix triggering coredump implementation for QCA") of bluetooth-next tree. commit b23d98d46d28 ("Bluetooth: btusb: Fix triggering coredump implementation for QCA") of linus mainline tree. Fixes: 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev") Cc: stable@vger.kernel.org Reported-by: Wren Turkal Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218726 Signed-off-by: Zijun Hu Tested-by: Wren Turkal Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_qca.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 6e0c0762fbab..24cfc552e6a8 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -2076,15 +2076,27 @@ static void qca_serdev_shutdown(struct device *dev) struct qca_serdev *qcadev = serdev_device_get_drvdata(serdev); struct hci_uart *hu = &qcadev->serdev_hu; struct hci_dev *hdev = hu->hdev; - struct qca_data *qca = hu->priv; const u8 ibs_wake_cmd[] = { 0xFD }; const u8 edl_reset_soc_cmd[] = { 0x01, 0x00, 0xFC, 0x01, 0x05 }; if (qcadev->btsoc_type == QCA_QCA6390) { - if (test_bit(QCA_BT_OFF, &qca->flags) || - !test_bit(HCI_RUNNING, &hdev->flags)) + /* The purpose of sending the VSC is to reset SOC into a initial + * state and the state will ensure next hdev->setup() success. + * if HCI_QUIRK_NON_PERSISTENT_SETUP is set, it means that + * hdev->setup() can do its job regardless of SoC state, so + * don't need to send the VSC. + * if HCI_SETUP is set, it means that hdev->setup() was never + * invoked and the SOC is already in the initial state, so + * don't also need to send the VSC. + */ + if (test_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks) || + hci_dev_test_flag(hdev, HCI_SETUP)) return; + /* The serdev must be in open state when conrol logic arrives + * here, so also fix the use-after-free issue caused by that + * the serdev is flushed or wrote after it is closed. + */ serdev_device_write_flush(serdev); ret = serdev_device_write_buf(serdev, ibs_wake_cmd, sizeof(ibs_wake_cmd)); From 9528e95d6eaeb8076e7646e7bffd5683fef384f5 Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Fri, 28 Jun 2024 21:45:29 +0200 Subject: [PATCH 1408/1565] can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct commit 19d5b2698c35b2132a355c67b4d429053804f8cc upstream. Explicitly set the 'family' driver_info struct member for leafimx. Previously, the correct operation relied on KVASER_LEAF being the first defined value in enum kvaser_usb_leaf_family. Fixes: e6c80e601053 ("can: kvaser_usb: kvaser_usb_leaf: fix CAN clock frequency regression") Signed-off-by: Jimmy Assarsson Link: https://lore.kernel.org/all/20240628194529.312968-1-extja@kvaser.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c index 1f015b496a47..411b3adb1d9e 100644 --- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c +++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c @@ -114,6 +114,7 @@ static const struct kvaser_usb_driver_info kvaser_usb_driver_info_leaf_err_liste static const struct kvaser_usb_driver_info kvaser_usb_driver_info_leafimx = { .quirks = 0, + .family = KVASER_LEAF, .ops = &kvaser_usb_leaf_dev_ops, }; From c9f715f1b416baf2f8d455afcbcbdd74dc29d619 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 17 Jun 2024 18:23:00 +0200 Subject: [PATCH 1409/1565] fsnotify: Do not generate events for O_PATH file descriptors commit 702eb71fd6501b3566283f8c96d7ccc6ddd662e9 upstream. Currently we will not generate FS_OPEN events for O_PATH file descriptors but we will generate FS_CLOSE events for them. This is asymmetry is confusing. Arguably no fsnotify events should be generated for O_PATH file descriptors as they cannot be used to access or modify file content, they are just convenient handles to file objects like paths. So fix the asymmetry by stopping to generate FS_CLOSE for O_PATH file descriptors. Cc: Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20240617162303.1596-1-jack@suse.cz Reviewed-by: Amir Goldstein Signed-off-by: Christian Brauner Signed-off-by: Greg Kroah-Hartman --- include/linux/fsnotify.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index bb8467cd11ae..34f242105be2 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -93,7 +93,13 @@ static inline int fsnotify_file(struct file *file, __u32 mask) { const struct path *path = &file->f_path; - if (file->f_mode & FMODE_NONOTIFY) + /* + * FMODE_NONOTIFY are fds generated by fanotify itself which should not + * generate new events. We also don't want to generate events for + * FMODE_PATH fds (involves open & close events) as they are just + * handle creation / destruction events and not "real" file events. + */ + if (file->f_mode & (FMODE_NONOTIFY | FMODE_PATH)) return 0; return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH); From 145faa3d03688cbb7bbaaecbd84c01539852942c Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 21 Jun 2024 16:42:37 +0200 Subject: [PATCH 1410/1565] Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" commit 30139c702048f1097342a31302cbd3d478f50c63 upstream. Patch series "mm: Avoid possible overflows in dirty throttling". Dirty throttling logic assumes dirty limits in page units fit into 32-bits. This patch series makes sure this is true (see patch 2/2 for more details). This patch (of 2): This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78. The commit is broken in several ways. Firstly, the removed (u64) cast from the multiplication will introduce a multiplication overflow on 32-bit archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the default settings with 4GB of RAM will trigger this). Secondly, the div64_u64() is unnecessarily expensive on 32-bit archs. We have div64_ul() in case we want to be safe & cheap. Thirdly, if dirty thresholds are larger than 1<<32 pages, then dirty balancing is going to blow up in many other spectacular ways anyway so trying to fix one possible overflow is just moot. Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again") Signed-off-by: Jan Kara Reviewed-By: Zach O'Keefe Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index ef825aa14de0..b2c916474855 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1546,7 +1546,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc) */ dtc->wb_thresh = __wb_calc_thresh(dtc); dtc->wb_bg_thresh = dtc->thresh ? - div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0; + div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0; /* * In order to avoid the stacked BDI deadlock we need From 274cba8d2d1b48c72d8bd90e76c9e2dc1aa0a81d Mon Sep 17 00:00:00 2001 From: Ma Ke Date: Thu, 27 Jun 2024 15:42:04 +0800 Subject: [PATCH 1411/1565] drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes commit 80bec6825b19d95ccdfd3393cf8ec15ff2a749b4 upstream. In nouveau_connector_get_modes(), the return value of drm_mode_duplicate() is assigned to mode, which will lead to a possible NULL pointer dereference on failure of drm_mode_duplicate(). Add a check to avoid npd. Cc: stable@vger.kernel.org Fixes: 6ee738610f41 ("drm/nouveau: Add DRM driver for NVIDIA GPUs") Signed-off-by: Ma Ke Signed-off-by: Lyude Paul Link: https://patchwork.freedesktop.org/patch/msgid/20240627074204.3023776-1-make24@iscas.ac.cn Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/nouveau/nouveau_connector.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index d2cefca120c4..6b8ad830c034 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -960,6 +960,9 @@ nouveau_connector_get_modes(struct drm_connector *connector) struct drm_display_mode *mode; mode = drm_mode_duplicate(dev, nv_connector->native_mode); + if (!mode) + return 0; + drm_mode_probed_add(connector, mode); ret = 1; } From fac2544b8c997417cf36a8aaea4fd3bd9409f8fe Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 1 Jul 2024 12:50:10 -0400 Subject: [PATCH 1412/1565] drm/amdgpu/atomfirmware: silence UBSAN warning commit d0417264437a8fa05f894cabba5a26715b32d78e upstream. This is a variable sized array. Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/110420.html Tested-by: Jeff Layton Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/include/atomfirmware.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/include/atomfirmware.h b/drivers/gpu/drm/amd/include/atomfirmware.h index 3e526c394f6c..ce7b0763ff19 100644 --- a/drivers/gpu/drm/amd/include/atomfirmware.h +++ b/drivers/gpu/drm/amd/include/atomfirmware.h @@ -690,7 +690,7 @@ struct atom_gpio_pin_lut_v2_1 { struct atom_common_table_header table_header; /*the real number of this included in the structure is calcualted by using the (whole structure size - the header size)/size of atom_gpio_pin_lut */ - struct atom_gpio_pin_assignment gpio_pin[8]; + struct atom_gpio_pin_assignment gpio_pin[]; }; From 55d6a97cf02c79d766eeea0653c7de6dd2cbe687 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 16 May 2024 15:13:20 +0200 Subject: [PATCH 1413/1565] mtd: rawnand: Bypass a couple of sanity checks during NAND identification commit 8754d9835683e8fab9a8305acdb38a3aeb9d20bd upstream. Early during NAND identification, mtd_info fields have not yet been initialized (namely, writesize and oobsize) and thus cannot be used for sanity checks yet. Of course if there is a misuse of nand_change_read_column_op() so early we won't be warned, but there is anyway no actual check to perform at this stage as we do not yet know the NAND geometry. So, if the fields are empty, especially mtd->writesize which is *always* set quite rapidly after identification, let's skip the sanity checks. nand_change_read_column_op() is subject to be used early for ONFI/JEDEC identification in the very unlikely case of: - bitflips appearing in the parameter page, - the controller driver not supporting simple DATA_IN cycles. As nand_change_read_column_op() uses nand_fill_column_cycles() the logic explaind above also applies in this secondary helper. Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers") Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers") Cc: stable@vger.kernel.org Reported-by: Alexander Dahl Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsis.com/ Reported-by: Steven Seeger Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM6PR05MB4506.namprd05.prod.outlook.com/ Signed-off-by: Miquel Raynal Tested-by: Sascha Hauer Link: https://lore.kernel.org/linux-mtd/20240516131320.579822-3-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/raw/nand_base.c | 57 ++++++++++++++++++-------------- 1 file changed, 32 insertions(+), 25 deletions(-) diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c index c41c0ff611b1..308fcbe394a5 100644 --- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -964,28 +964,32 @@ static int nand_fill_column_cycles(struct nand_chip *chip, u8 *addrs, unsigned int offset_in_page) { struct mtd_info *mtd = nand_to_mtd(chip); + bool ident_stage = !mtd->writesize; - /* Make sure the offset is less than the actual page size. */ - if (offset_in_page > mtd->writesize + mtd->oobsize) - return -EINVAL; - - /* - * On small page NANDs, there's a dedicated command to access the OOB - * area, and the column address is relative to the start of the OOB - * area, not the start of the page. Asjust the address accordingly. - */ - if (mtd->writesize <= 512 && offset_in_page >= mtd->writesize) - offset_in_page -= mtd->writesize; - - /* - * The offset in page is expressed in bytes, if the NAND bus is 16-bit - * wide, then it must be divided by 2. - */ - if (chip->options & NAND_BUSWIDTH_16) { - if (WARN_ON(offset_in_page % 2)) + /* Bypass all checks during NAND identification */ + if (likely(!ident_stage)) { + /* Make sure the offset is less than the actual page size. */ + if (offset_in_page > mtd->writesize + mtd->oobsize) return -EINVAL; - offset_in_page /= 2; + /* + * On small page NANDs, there's a dedicated command to access the OOB + * area, and the column address is relative to the start of the OOB + * area, not the start of the page. Asjust the address accordingly. + */ + if (mtd->writesize <= 512 && offset_in_page >= mtd->writesize) + offset_in_page -= mtd->writesize; + + /* + * The offset in page is expressed in bytes, if the NAND bus is 16-bit + * wide, then it must be divided by 2. + */ + if (chip->options & NAND_BUSWIDTH_16) { + if (WARN_ON(offset_in_page % 2)) + return -EINVAL; + + offset_in_page /= 2; + } } addrs[0] = offset_in_page; @@ -994,7 +998,7 @@ static int nand_fill_column_cycles(struct nand_chip *chip, u8 *addrs, * Small page NANDs use 1 cycle for the columns, while large page NANDs * need 2 */ - if (mtd->writesize <= 512) + if (!ident_stage && mtd->writesize <= 512) return 1; addrs[1] = offset_in_page >> 8; @@ -1189,16 +1193,19 @@ int nand_change_read_column_op(struct nand_chip *chip, unsigned int len, bool force_8bit) { struct mtd_info *mtd = nand_to_mtd(chip); + bool ident_stage = !mtd->writesize; if (len && !buf) return -EINVAL; - if (offset_in_page + len > mtd->writesize + mtd->oobsize) - return -EINVAL; + if (!ident_stage) { + if (offset_in_page + len > mtd->writesize + mtd->oobsize) + return -EINVAL; - /* Small page NANDs do not support column change. */ - if (mtd->writesize <= 512) - return -ENOTSUPP; + /* Small page NANDs do not support column change. */ + if (mtd->writesize <= 512) + return -ENOTSUPP; + } if (nand_has_exec_op(chip)) { const struct nand_sdr_timings *sdr = From 8b17cec33892a66bbd71f8d9a70a45e2072ae84f Mon Sep 17 00:00:00 2001 From: Ghadi Elie Rahme Date: Thu, 27 Jun 2024 14:14:05 +0300 Subject: [PATCH 1414/1565] bnx2x: Fix multiple UBSAN array-index-out-of-bounds commit 134061163ee5ca4759de5c24ca3bd71608891ba7 upstream. Fix UBSAN warnings that occur when using a system with 32 physical cpu cores or more, or when the user defines a number of Ethernet queues greater than or equal to FP_SB_MAX_E1x using the num_queues module parameter. Currently there is a read/write out of bounds that occurs on the array "struct stats_query_entry query" present inside the "bnx2x_fw_stats_req" struct in "drivers/net/ethernet/broadcom/bnx2x/bnx2x.h". Looking at the definition of the "struct stats_query_entry query" array: struct stats_query_entry query[FP_SB_MAX_E1x+ BNX2X_FIRST_QUEUE_QUERY_IDX]; FP_SB_MAX_E1x is defined as the maximum number of fast path interrupts and has a value of 16, while BNX2X_FIRST_QUEUE_QUERY_IDX has a value of 3 meaning the array has a total size of 19. Since accesses to "struct stats_query_entry query" are offset-ted by BNX2X_FIRST_QUEUE_QUERY_IDX, that means that the total number of Ethernet queues should not exceed FP_SB_MAX_E1x (16). However one of these queues is reserved for FCOE and thus the number of Ethernet queues should be set to [FP_SB_MAX_E1x -1] (15) if FCOE is enabled or [FP_SB_MAX_E1x] (16) if it is not. This is also described in a comment in the source code in drivers/net/ethernet/broadcom/bnx2x/bnx2x.h just above the Macro definition of FP_SB_MAX_E1x. Below is the part of this explanation that it important for this patch /* * The total number of L2 queues, MSIX vectors and HW contexts (CIDs) is * control by the number of fast-path status blocks supported by the * device (HW/FW). Each fast-path status block (FP-SB) aka non-default * status block represents an independent interrupts context that can * serve a regular L2 networking queue. However special L2 queues such * as the FCoE queue do not require a FP-SB and other components like * the CNIC may consume FP-SB reducing the number of possible L2 queues * * If the maximum number of FP-SB available is X then: * a. If CNIC is supported it consumes 1 FP-SB thus the max number of * regular L2 queues is Y=X-1 * b. In MF mode the actual number of L2 queues is Y= (X-1/MF_factor) * c. If the FCoE L2 queue is supported the actual number of L2 queues * is Y+1 * d. The number of irqs (MSIX vectors) is either Y+1 (one extra for * slow-path interrupts) or Y+2 if CNIC is supported (one additional * FP interrupt context for the CNIC). * e. The number of HW context (CID count) is always X or X+1 if FCoE * L2 queue is supported. The cid for the FCoE L2 queue is always X. */ However this driver also supports NICs that use the E2 controller which can handle more queues due to having more FP-SB represented by FP_SB_MAX_E2. Looking at the commits when the E2 support was added, it was originally using the E1x parameters: commit f2e0899f0f27 ("bnx2x: Add 57712 support"). Back then FP_SB_MAX_E2 was set to 16 the same as E1x. However the driver was later updated to take full advantage of the E2 instead of having it be limited to the capabilities of the E1x. But as far as we can tell, the array "stats_query_entry query" was still limited to using the FP-SB available to the E1x cards as part of an oversignt when the driver was updated to take full advantage of the E2, and now with the driver being aware of the greater queue size supported by E2 NICs, it causes the UBSAN warnings seen in the stack traces below. This patch increases the size of the "stats_query_entry query" array by replacing FP_SB_MAX_E1x with FP_SB_MAX_E2 to be large enough to handle both types of NICs. Stack traces: UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1529:11 index 20 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2e1/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c:1546:11 index 28 is out of range for type 'stats_query_entry [19]' CPU: 12 PID: 858 Comm: systemd-network Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Call Trace: dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_prep_fw_stats_req+0x2fd/0x310 [bnx2x] bnx2x_stats_init+0x156/0x320 [bnx2x] bnx2x_post_irq_nic_init+0x81/0x1a0 [bnx2x] bnx2x_nic_load+0x8e8/0x19e0 [bnx2x] bnx2x_open+0x16b/0x290 [bnx2x] __dev_open+0x10e/0x1d0 RIP: 0033:0x736223927a0a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffc0bb2ada8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000583df50f9c78 RCX: 0000736223927a0a RDX: 0000000000000020 RSI: 0000583df50ee510 RDI: 0000000000000003 RBP: 0000583df50d4940 R08: 00007ffc0bb2adb0 R09: 0000000000000080 R10: 0000000000000000 R11: 0000000000000246 R12: 0000583df5103ae0 R13: 000000000000035a R14: 0000583df50f9c30 R15: 0000583ddddddf00 ---[ end trace ]--- ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1895:8 index 29 is out of range for type 'stats_query_entry [19]' CPU: 13 PID: 163 Comm: kworker/u96:1 Not tainted 6.9.0-060900rc7-generic #202405052133 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019 Workqueue: bnx2x bnx2x_sp_task [bnx2x] Call Trace: dump_stack_lvl+0x76/0xa0 dump_stack+0x10/0x20 __ubsan_handle_out_of_bounds+0xcb/0x110 bnx2x_iov_adjust_stats_req+0x3c4/0x3d0 [bnx2x] bnx2x_storm_stats_post.part.0+0x4a/0x330 [bnx2x] ? bnx2x_hw_stats_post+0x231/0x250 [bnx2x] bnx2x_stats_start+0x44/0x70 [bnx2x] bnx2x_stats_handle+0x149/0x350 [bnx2x] bnx2x_attn_int_asserted+0x998/0x9b0 [bnx2x] bnx2x_sp_task+0x491/0x5c0 [bnx2x] process_one_work+0x18d/0x3f0 ---[ end trace ]--- Fixes: 50f0a562f8cc ("bnx2x: add fcoe statistics") Signed-off-by: Ghadi Elie Rahme Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240627111405.1037812-1-ghadi.rahme@canonical.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h index 2a61229d3f97..c1fe9bcbb955 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h @@ -1262,7 +1262,7 @@ enum { struct bnx2x_fw_stats_req { struct stats_query_header hdr; - struct stats_query_entry query[FP_SB_MAX_E1x+ + struct stats_query_entry query[FP_SB_MAX_E2 + BNX2X_FIRST_QUEUE_QUERY_IDX]; }; From 37c59198bc3b869bb7fa848078f02e73f12c1cb5 Mon Sep 17 00:00:00 2001 From: Wang Yufen Date: Tue, 24 May 2022 15:53:11 +0800 Subject: [PATCH 1415/1565] bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues commit d8616ee2affcff37c5d315310da557a694a3303d upstream. During TCP sockmap redirect pressure test, the following warning is triggered: WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0 CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9 Call Trace: inet_csk_destroy_sock+0x55/0x110 inet_csk_listen_stop+0xbb/0x380 tcp_close+0x41b/0x480 inet_release+0x42/0x80 __sock_release+0x3d/0xa0 sock_close+0x11/0x20 __fput+0x9d/0x240 task_work_run+0x62/0x90 exit_to_user_mode_prepare+0x110/0x120 syscall_exit_to_user_mode+0x27/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason we observed is that: When the listener is closing, a connection may have completed the three-way handshake but not accepted, and the client has sent some packets. The child sks in accept queue release by inet_child_forget()->inet_csk_destroy_sock(), but psocks of child sks have not released. To fix, add sock_map_destroy to release psocks. Signed-off-by: Wang Yufen Signed-off-by: Daniel Borkmann Signed-off-by: Andrii Nakryiko Acked-by: Jakub Sitnicki Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com [ Conflict in include/linux/bpf.h due to function declaration position and remove non-existed sk_psock_stop helper from sock_map_destroy. ] Signed-off-by: Wen Gu Signed-off-by: Greg Kroah-Hartman --- include/linux/bpf.h | 1 + include/linux/skmsg.h | 1 + net/core/skmsg.c | 1 + net/core/sock_map.c | 22 ++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 1 + 5 files changed, 26 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a75faf437e75..340f4fef5b5a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1800,6 +1800,7 @@ int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog); int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype); int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags); void sock_map_unhash(struct sock *sk); +void sock_map_destroy(struct sock *sk); void sock_map_close(struct sock *sk, long timeout); #else static inline int sock_map_prog_update(struct bpf_map *map, diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 1138dd3071db..e2af013ec05f 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -98,6 +98,7 @@ struct sk_psock { spinlock_t link_lock; refcount_t refcnt; void (*saved_unhash)(struct sock *sk); + void (*saved_destroy)(struct sock *sk); void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); struct proto *sk_proto; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bb4fbc60b272..51792dda1b73 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -599,6 +599,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) psock->eval = __SK_NONE; psock->sk_proto = prot; psock->saved_unhash = prot->unhash; + psock->saved_destroy = prot->destroy; psock->saved_close = prot->close; psock->saved_write_space = sk->sk_write_space; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 52e395a189df..d1d0ee2dbfaa 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1566,6 +1566,28 @@ void sock_map_unhash(struct sock *sk) saved_unhash(sk); } +void sock_map_destroy(struct sock *sk) +{ + void (*saved_destroy)(struct sock *sk); + struct sk_psock *psock; + + rcu_read_lock(); + psock = sk_psock_get(sk); + if (unlikely(!psock)) { + rcu_read_unlock(); + if (sk->sk_prot->destroy) + sk->sk_prot->destroy(sk); + return; + } + + saved_destroy = psock->saved_destroy; + sock_map_remove_links(sk, psock); + rcu_read_unlock(); + sk_psock_put(sk, psock); + saved_destroy(sk); +} +EXPORT_SYMBOL_GPL(sock_map_destroy); + void sock_map_close(struct sock *sk, long timeout) { void (*saved_close)(struct sock *sk, long timeout); diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index d0ca1fc325cd..f909e440bb22 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -582,6 +582,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS], struct proto *base) { prot[TCP_BPF_BASE] = *base; + prot[TCP_BPF_BASE].destroy = sock_map_destroy; prot[TCP_BPF_BASE].close = sock_map_close; prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg; prot[TCP_BPF_BASE].stream_memory_read = tcp_bpf_stream_read; From a6176a802c4bfb83bf7524591aa75f44a639a853 Mon Sep 17 00:00:00 2001 From: GUO Zihua Date: Tue, 7 May 2024 01:25:41 +0000 Subject: [PATCH 1416/1565] ima: Avoid blocking in RCU read-side critical section commit 9a95c5bfbf02a0a7f5983280fe284a0ff0836c34 upstream. A panic happens in ima_match_policy: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 PGD 42f873067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 5 PID: 1286325 Comm: kubeletmonit.sh Kdump: loaded Tainted: P Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:ima_match_policy+0x84/0x450 Code: 49 89 fc 41 89 cf 31 ed 89 44 24 14 eb 1c 44 39 7b 18 74 26 41 83 ff 05 74 20 48 8b 1b 48 3b 1d f2 b9 f4 00 0f 84 9c 01 00 00 <44> 85 73 10 74 ea 44 8b 6b 14 41 f6 c5 01 75 d4 41 f6 c5 02 74 0f RSP: 0018:ff71570009e07a80 EFLAGS: 00010207 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000200 RDX: ffffffffad8dc7c0 RSI: 0000000024924925 RDI: ff3e27850dea2000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffabfce739 R10: ff3e27810cc42400 R11: 0000000000000000 R12: ff3e2781825ef970 R13: 00000000ff3e2785 R14: 000000000000000c R15: 0000000000000001 FS: 00007f5195b51740(0000) GS:ff3e278b12d40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000626d24002 CR4: 0000000000361ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ima_get_action+0x22/0x30 process_measurement+0xb0/0x830 ? page_add_file_rmap+0x15/0x170 ? alloc_set_pte+0x269/0x4c0 ? prep_new_page+0x81/0x140 ? simple_xattr_get+0x75/0xa0 ? selinux_file_open+0x9d/0xf0 ima_file_check+0x64/0x90 path_openat+0x571/0x1720 do_filp_open+0x9b/0x110 ? page_counter_try_charge+0x57/0xc0 ? files_cgroup_alloc_fd+0x38/0x60 ? __alloc_fd+0xd4/0x250 ? do_sys_open+0x1bd/0x250 do_sys_open+0x1bd/0x250 do_syscall_64+0x5d/0x1d0 entry_SYSCALL_64_after_hwframe+0x65/0xca Commit c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") introduced call to ima_lsm_copy_rule within a RCU read-side critical section which contains kmalloc with GFP_KERNEL. This implies a possible sleep and violates limitations of RCU read-side critical sections on non-PREEMPT systems. Sleeping within RCU read-side critical section might cause synchronize_rcu() returning early and break RCU protection, allowing a UAF to happen. The root cause of this issue could be described as follows: | Thread A | Thread B | | |ima_match_policy | | | rcu_read_lock | |ima_lsm_update_rule | | | synchronize_rcu | | | | kmalloc(GFP_KERNEL)| | | sleep | ==> synchronize_rcu returns early | kfree(entry) | | | | entry = entry->next| ==> UAF happens and entry now becomes NULL (or could be anything). | | entry->action | ==> Accessing entry might cause panic. To fix this issue, we are converting all kmalloc that is called within RCU read-side critical section to use GFP_ATOMIC. Fixes: c7423dbdbc9e ("ima: Handle -ESTALE returned by ima_filter_rule_match()") Cc: stable@vger.kernel.org Signed-off-by: GUO Zihua Acked-by: John Johansen Reviewed-by: Mimi Zohar Reviewed-by: Casey Schaufler [PM: fixed missing comment, long lines, !CONFIG_IMA_LSM_RULES case] Signed-off-by: Paul Moore Signed-off-by: Roberto Sassu Signed-off-by: Greg Kroah-Hartman --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 5 +++-- kernel/auditfilter.c | 5 +++-- security/apparmor/audit.c | 6 +++--- security/apparmor/include/audit.h | 2 +- security/integrity/ima/ima.h | 2 +- security/integrity/ima/ima_policy.c | 15 +++++++++------ security/security.c | 6 ++++-- security/selinux/include/audit.h | 4 +++- security/selinux/ss/services.c | 5 +++-- security/smack/smack_lsm.c | 4 +++- 11 files changed, 34 insertions(+), 22 deletions(-) diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 07abcd384975..35bb13ce1faf 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -370,7 +370,7 @@ LSM_HOOK(int, 0, key_getsecurity, struct key *key, char **_buffer) #ifdef CONFIG_AUDIT LSM_HOOK(int, 0, audit_rule_init, u32 field, u32 op, char *rulestr, - void **lsmrule) + void **lsmrule, gfp_t gfp) LSM_HOOK(int, 0, audit_rule_known, struct audit_krule *krule) LSM_HOOK(int, 0, audit_rule_match, u32 secid, u32 field, u32 op, void *lsmrule) LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule) diff --git a/include/linux/security.h b/include/linux/security.h index 5b61aa19fac6..e32e040f094c 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1856,7 +1856,8 @@ static inline int security_key_getsecurity(struct key *key, char **_buffer) #ifdef CONFIG_AUDIT #ifdef CONFIG_SECURITY -int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule); +int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, + gfp_t gfp); int security_audit_rule_known(struct audit_krule *krule); int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule); void security_audit_rule_free(void *lsmrule); @@ -1864,7 +1865,7 @@ void security_audit_rule_free(void *lsmrule); #else static inline int security_audit_rule_init(u32 field, u32 op, char *rulestr, - void **lsmrule) + void **lsmrule, gfp_t gfp) { return 0; } diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c index 333b3bcfc545..a5b43f25609e 100644 --- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -521,7 +521,8 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data, entry->rule.buflen += f_val; f->lsm_str = str; err = security_audit_rule_init(f->type, f->op, str, - (void **)&f->lsm_rule); + (void **)&f->lsm_rule, + GFP_KERNEL); /* Keep currently invalid fields around in case they * become valid after a policy reload. */ if (err == -EINVAL) { @@ -790,7 +791,7 @@ static inline int audit_dupe_lsm_field(struct audit_field *df, /* our own (refreshed) copy of lsm_rule */ ret = security_audit_rule_init(df->type, df->op, df->lsm_str, - (void **)&df->lsm_rule); + (void **)&df->lsm_rule, GFP_KERNEL); /* Keep currently invalid fields around in case they * become valid after a policy reload. */ if (ret == -EINVAL) { diff --git a/security/apparmor/audit.c b/security/apparmor/audit.c index 704b0c895605..963df28584ee 100644 --- a/security/apparmor/audit.c +++ b/security/apparmor/audit.c @@ -173,7 +173,7 @@ void aa_audit_rule_free(void *vrule) } } -int aa_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule) +int aa_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule, gfp_t gfp) { struct aa_audit_rule *rule; @@ -186,14 +186,14 @@ int aa_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule) return -EINVAL; } - rule = kzalloc(sizeof(struct aa_audit_rule), GFP_KERNEL); + rule = kzalloc(sizeof(struct aa_audit_rule), gfp); if (!rule) return -ENOMEM; /* Currently rules are treated as coming from the root ns */ rule->label = aa_label_parse(&root_ns->unconfined->label, rulestr, - GFP_KERNEL, true, false); + gfp, true, false); if (IS_ERR(rule->label)) { int err = PTR_ERR(rule->label); aa_audit_rule_free(rule); diff --git a/security/apparmor/include/audit.h b/security/apparmor/include/audit.h index 18519a4eb67e..f325f1bef8d6 100644 --- a/security/apparmor/include/audit.h +++ b/security/apparmor/include/audit.h @@ -186,7 +186,7 @@ static inline int complain_error(int error) } void aa_audit_rule_free(void *vrule); -int aa_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule); +int aa_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule, gfp_t gfp); int aa_audit_rule_known(struct audit_krule *rule); int aa_audit_rule_match(u32 sid, u32 field, u32 op, void *vrule); diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h index 6ebefec616e4..14b77a71c16c 100644 --- a/security/integrity/ima/ima.h +++ b/security/integrity/ima/ima.h @@ -420,7 +420,7 @@ static inline void ima_free_modsig(struct modsig *modsig) #else static inline int ima_filter_rule_init(u32 field, u32 op, char *rulestr, - void **lsmrule) + void **lsmrule, gfp_t gfp) { return -EINVAL; } diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c index 4f5d44037081..8d69b2e27936 100644 --- a/security/integrity/ima/ima_policy.c +++ b/security/integrity/ima/ima_policy.c @@ -349,7 +349,8 @@ static void ima_free_rule(struct ima_rule_entry *entry) kfree(entry); } -static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry) +static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry, + gfp_t gfp) { struct ima_rule_entry *nentry; int i; @@ -358,7 +359,7 @@ static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry) * Immutable elements are copied over as pointers and data; only * lsm rules can change */ - nentry = kmemdup(entry, sizeof(*nentry), GFP_KERNEL); + nentry = kmemdup(entry, sizeof(*nentry), gfp); if (!nentry) return NULL; @@ -373,7 +374,8 @@ static struct ima_rule_entry *ima_lsm_copy_rule(struct ima_rule_entry *entry) ima_filter_rule_init(nentry->lsm[i].type, Audit_equal, nentry->lsm[i].args_p, - &nentry->lsm[i].rule); + &nentry->lsm[i].rule, + gfp); if (!nentry->lsm[i].rule) pr_warn("rule for LSM \'%s\' is undefined\n", nentry->lsm[i].args_p); @@ -386,7 +388,7 @@ static int ima_lsm_update_rule(struct ima_rule_entry *entry) int i; struct ima_rule_entry *nentry; - nentry = ima_lsm_copy_rule(entry); + nentry = ima_lsm_copy_rule(entry, GFP_KERNEL); if (!nentry) return -ENOMEM; @@ -573,7 +575,7 @@ static bool ima_match_rules(struct ima_rule_entry *rule, struct inode *inode, } if (rc == -ESTALE && !rule_reinitialized) { - lsm_rule = ima_lsm_copy_rule(rule); + lsm_rule = ima_lsm_copy_rule(rule, GFP_ATOMIC); if (lsm_rule) { rule_reinitialized = true; goto retry; @@ -990,7 +992,8 @@ static int ima_lsm_rule_init(struct ima_rule_entry *entry, entry->lsm[lsm_rule].type = audit_type; result = ima_filter_rule_init(entry->lsm[lsm_rule].type, Audit_equal, entry->lsm[lsm_rule].args_p, - &entry->lsm[lsm_rule].rule); + &entry->lsm[lsm_rule].rule, + GFP_KERNEL); if (!entry->lsm[lsm_rule].rule) { pr_warn("rule for LSM \'%s\' is undefined\n", entry->lsm[lsm_rule].args_p); diff --git a/security/security.c b/security/security.c index 0bbcb100ba8e..f836f292ea16 100644 --- a/security/security.c +++ b/security/security.c @@ -2545,9 +2545,11 @@ int security_key_getsecurity(struct key *key, char **_buffer) #ifdef CONFIG_AUDIT -int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule) +int security_audit_rule_init(u32 field, u32 op, char *rulestr, void **lsmrule, + gfp_t gfp) { - return call_int_hook(audit_rule_init, 0, field, op, rulestr, lsmrule); + return call_int_hook(audit_rule_init, 0, field, op, rulestr, lsmrule, + gfp); } int security_audit_rule_known(struct audit_krule *krule) diff --git a/security/selinux/include/audit.h b/security/selinux/include/audit.h index 073a3d34a0d2..72af85ff96a4 100644 --- a/security/selinux/include/audit.h +++ b/security/selinux/include/audit.h @@ -18,12 +18,14 @@ * @op: the operater the rule uses * @rulestr: the text "target" of the rule * @rule: pointer to the new rule structure returned via this + * @gfp: GFP flag used for kmalloc * * Returns 0 if successful, -errno if not. On success, the rule structure * will be allocated internally. The caller must free this structure with * selinux_audit_rule_free() after use. */ -int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **rule); +int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **rule, + gfp_t gfp); /** * selinux_audit_rule_free - free an selinux audit rule structure. diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c index 3db8bd2158d9..a01e768337cd 100644 --- a/security/selinux/ss/services.c +++ b/security/selinux/ss/services.c @@ -3542,7 +3542,8 @@ void selinux_audit_rule_free(void *vrule) } } -int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule) +int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule, + gfp_t gfp) { struct selinux_state *state = &selinux_state; struct selinux_policy *policy; @@ -3583,7 +3584,7 @@ int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule) return -EINVAL; } - tmprule = kzalloc(sizeof(struct selinux_audit_rule), GFP_KERNEL); + tmprule = kzalloc(sizeof(struct selinux_audit_rule), gfp); if (!tmprule) return -ENOMEM; diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 750f6007bbbb..8c790563b33a 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -4490,11 +4490,13 @@ static int smack_post_notification(const struct cred *w_cred, * @op: required testing operator (=, !=, >, <, ...) * @rulestr: smack label to be audited * @vrule: pointer to save our own audit rule representation + * @gfp: type of the memory for the allocation * * Prepare to audit cases where (@field @op @rulestr) is true. * The label to be audited is created if necessay. */ -static int smack_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule) +static int smack_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule, + gfp_t gfp) { struct smack_known *skp; char **rule = (char **)vrule; From 2032e5dfae5f5c092d254c7b2a871364bfa3d4d1 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Mon, 29 Apr 2024 15:15:05 +0100 Subject: [PATCH 1417/1565] media: dw2102: fix a potential buffer overflow commit 1c73d0b29d04bf4082e7beb6a508895e118ee30d upstream. As pointed by smatch: drivers/media/usb/dvb-usb/dw2102.c:802 su3000_i2c_transfer() error: __builtin_memcpy() '&state->data[4]' too small (64 vs 67) That seemss to be due to a wrong copy-and-paste. Fixes: 0e148a522b84 ("media: dw2102: Don't translate i2c read into write") Reported-by: Hans Verkuil Reviewed-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/dvb-usb/dw2102.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/media/usb/dvb-usb/dw2102.c b/drivers/media/usb/dvb-usb/dw2102.c index 5b51d0a8bec4..86e537f0316d 100644 --- a/drivers/media/usb/dvb-usb/dw2102.c +++ b/drivers/media/usb/dvb-usb/dw2102.c @@ -786,7 +786,7 @@ static int su3000_i2c_transfer(struct i2c_adapter *adap, struct i2c_msg msg[], if (msg[j].flags & I2C_M_RD) { /* single read */ - if (1 + msg[j].len > sizeof(state->data)) { + if (4 + msg[j].len > sizeof(state->data)) { warn("i2c rd: len=%d is too big!\n", msg[j].len); num = -EOPNOTSUPP; break; From 2849a1b747cf37aa5b684527104d3a53f1e296d2 Mon Sep 17 00:00:00 2001 From: Piotr Wojtaszczyk Date: Fri, 28 Jun 2024 17:25:42 +0200 Subject: [PATCH 1418/1565] i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr [ Upstream commit f63b94be6942ba82c55343e196bd09b53227618e ] When del_timer_sync() is called in an interrupt context it throws a warning because of potential deadlock. The timer is used only to exit from wait_for_completion() after a timeout so replacing the call with wait_for_completion_timeout() allows to remove the problematic timer and its related functions altogether. Fixes: 41561f28e76a ("i2c: New Philips PNX bus driver") Signed-off-by: Piotr Wojtaszczyk Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-pnx.c | 48 ++++++++---------------------------- 1 file changed, 10 insertions(+), 38 deletions(-) diff --git a/drivers/i2c/busses/i2c-pnx.c b/drivers/i2c/busses/i2c-pnx.c index 8c4ec7f13f5a..b6b5a65efcbb 100644 --- a/drivers/i2c/busses/i2c-pnx.c +++ b/drivers/i2c/busses/i2c-pnx.c @@ -15,7 +15,6 @@ #include #include #include -#include #include #include #include @@ -32,7 +31,6 @@ struct i2c_pnx_mif { int ret; /* Return value */ int mode; /* Interface mode */ struct completion complete; /* I/O completion */ - struct timer_list timer; /* Timeout */ u8 * buf; /* Data buffer */ int len; /* Length of data buffer */ int order; /* RX Bytes to order via TX */ @@ -117,24 +115,6 @@ static inline int wait_reset(struct i2c_pnx_algo_data *data) return (timeout <= 0); } -static inline void i2c_pnx_arm_timer(struct i2c_pnx_algo_data *alg_data) -{ - struct timer_list *timer = &alg_data->mif.timer; - unsigned long expires = msecs_to_jiffies(alg_data->timeout); - - if (expires <= 1) - expires = 2; - - del_timer_sync(timer); - - dev_dbg(&alg_data->adapter.dev, "Timer armed at %lu plus %lu jiffies.\n", - jiffies, expires); - - timer->expires = jiffies + expires; - - add_timer(timer); -} - /** * i2c_pnx_start - start a device * @slave_addr: slave address @@ -259,8 +239,6 @@ static int i2c_pnx_master_xmit(struct i2c_pnx_algo_data *alg_data) ~(mcntrl_afie | mcntrl_naie | mcntrl_drmie), I2C_REG_CTL(alg_data)); - del_timer_sync(&alg_data->mif.timer); - dev_dbg(&alg_data->adapter.dev, "%s(): Waking up xfer routine.\n", __func__); @@ -276,8 +254,6 @@ static int i2c_pnx_master_xmit(struct i2c_pnx_algo_data *alg_data) ~(mcntrl_afie | mcntrl_naie | mcntrl_drmie), I2C_REG_CTL(alg_data)); - /* Stop timer. */ - del_timer_sync(&alg_data->mif.timer); dev_dbg(&alg_data->adapter.dev, "%s(): Waking up xfer routine after zero-xfer.\n", __func__); @@ -364,8 +340,6 @@ static int i2c_pnx_master_rcv(struct i2c_pnx_algo_data *alg_data) mcntrl_drmie | mcntrl_daie); iowrite32(ctl, I2C_REG_CTL(alg_data)); - /* Kill timer. */ - del_timer_sync(&alg_data->mif.timer); complete(&alg_data->mif.complete); } } @@ -400,8 +374,6 @@ static irqreturn_t i2c_pnx_interrupt(int irq, void *dev_id) mcntrl_drmie); iowrite32(ctl, I2C_REG_CTL(alg_data)); - /* Stop timer, to prevent timeout. */ - del_timer_sync(&alg_data->mif.timer); complete(&alg_data->mif.complete); } else if (stat & mstatus_nai) { /* Slave did not acknowledge, generate a STOP */ @@ -419,8 +391,6 @@ static irqreturn_t i2c_pnx_interrupt(int irq, void *dev_id) /* Our return value. */ alg_data->mif.ret = -EIO; - /* Stop timer, to prevent timeout. */ - del_timer_sync(&alg_data->mif.timer); complete(&alg_data->mif.complete); } else { /* @@ -453,9 +423,8 @@ static irqreturn_t i2c_pnx_interrupt(int irq, void *dev_id) return IRQ_HANDLED; } -static void i2c_pnx_timeout(struct timer_list *t) +static void i2c_pnx_timeout(struct i2c_pnx_algo_data *alg_data) { - struct i2c_pnx_algo_data *alg_data = from_timer(alg_data, t, mif.timer); u32 ctl; dev_err(&alg_data->adapter.dev, @@ -472,7 +441,6 @@ static void i2c_pnx_timeout(struct timer_list *t) iowrite32(ctl, I2C_REG_CTL(alg_data)); wait_reset(alg_data); alg_data->mif.ret = -EIO; - complete(&alg_data->mif.complete); } static inline void bus_reset_if_active(struct i2c_pnx_algo_data *alg_data) @@ -514,6 +482,7 @@ i2c_pnx_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num) struct i2c_msg *pmsg; int rc = 0, completed = 0, i; struct i2c_pnx_algo_data *alg_data = adap->algo_data; + unsigned long time_left; u32 stat; dev_dbg(&alg_data->adapter.dev, @@ -548,7 +517,6 @@ i2c_pnx_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num) dev_dbg(&alg_data->adapter.dev, "%s(): mode %d, %d bytes\n", __func__, alg_data->mif.mode, alg_data->mif.len); - i2c_pnx_arm_timer(alg_data); /* initialize the completion var */ init_completion(&alg_data->mif.complete); @@ -564,7 +532,10 @@ i2c_pnx_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num) break; /* Wait for completion */ - wait_for_completion(&alg_data->mif.complete); + time_left = wait_for_completion_timeout(&alg_data->mif.complete, + alg_data->timeout); + if (time_left == 0) + i2c_pnx_timeout(alg_data); if (!(rc = alg_data->mif.ret)) completed++; @@ -657,7 +628,10 @@ static int i2c_pnx_probe(struct platform_device *pdev) alg_data->adapter.algo_data = alg_data; alg_data->adapter.nr = pdev->id; - alg_data->timeout = I2C_PNX_TIMEOUT_DEFAULT; + alg_data->timeout = msecs_to_jiffies(I2C_PNX_TIMEOUT_DEFAULT); + if (alg_data->timeout <= 1) + alg_data->timeout = 2; + #ifdef CONFIG_OF alg_data->adapter.dev.of_node = of_node_get(pdev->dev.of_node); if (pdev->dev.of_node) { @@ -677,8 +651,6 @@ static int i2c_pnx_probe(struct platform_device *pdev) if (IS_ERR(alg_data->clk)) return PTR_ERR(alg_data->clk); - timer_setup(&alg_data->mif.timer, i2c_pnx_timeout, 0); - snprintf(alg_data->adapter.name, sizeof(alg_data->adapter.name), "%s", pdev->name); From 97982c31064a203d57403959c0f51a7571964700 Mon Sep 17 00:00:00 2001 From: Jian-Hong Pan Date: Mon, 20 May 2024 13:50:09 +0800 Subject: [PATCH 1419/1565] ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897 [ Upstream commit 45e37f9ce28d248470bab4376df2687a215d1b22 ] JP-IK LEAP W502 laptop's headset mic is not enabled until ALC897_FIXUP_HEADSET_MIC_PIN3 quirk is applied. Here is the original pin node values: 0x11 0x40000000 0x12 0xb7a60130 0x14 0x90170110 0x15 0x411111f0 0x16 0x411111f0 0x17 0x411111f0 0x18 0x411111f0 0x19 0x411111f0 0x1a 0x411111f0 0x1b 0x03211020 0x1c 0x411111f0 0x1d 0x4026892d 0x1e 0x411111f0 0x1f 0x411111f0 Signed-off-by: Jian-Hong Pan Link: https://lore.kernel.org/r/20240520055008.7083-2-jhp@endlessos.org Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/patch_realtek.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 5c5a144e707f..c2374556c17e 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10807,6 +10807,7 @@ enum { ALC897_FIXUP_LENOVO_HEADSET_MODE, ALC897_FIXUP_HEADSET_MIC_PIN2, ALC897_FIXUP_UNIS_H3C_X500S, + ALC897_FIXUP_HEADSET_MIC_PIN3, }; static const struct hda_fixup alc662_fixups[] = { @@ -11253,10 +11254,18 @@ static const struct hda_fixup alc662_fixups[] = { {} }, }, + [ALC897_FIXUP_HEADSET_MIC_PIN3] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x19, 0x03a11050 }, /* use as headset mic */ + { } + }, + }, }; static const struct snd_pci_quirk alc662_fixup_tbl[] = { SND_PCI_QUIRK(0x1019, 0x9087, "ECS", ALC662_FIXUP_ASUS_MODE2), + SND_PCI_QUIRK(0x1019, 0x9859, "JP-IK LEAP W502", ALC897_FIXUP_HEADSET_MIC_PIN3), SND_PCI_QUIRK(0x1025, 0x022f, "Acer Aspire One", ALC662_FIXUP_INV_DMIC), SND_PCI_QUIRK(0x1025, 0x0241, "Packard Bell DOTS", ALC662_FIXUP_INV_DMIC), SND_PCI_QUIRK(0x1025, 0x0308, "Acer Aspire 8942G", ALC662_FIXUP_ASPIRE), From 2d428a07e8b206bb33869141e9f66d0db5db5282 Mon Sep 17 00:00:00 2001 From: Nilay Shroff Date: Thu, 16 May 2024 17:43:51 +0530 Subject: [PATCH 1420/1565] nvme-multipath: find NUMA path only for online numa-node [ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ] In current native multipath design when a shared namespace is created, we loop through each possible numa-node, calculate the NUMA distance of that node from each nvme controller and then cache the optimal IO path for future reference while sending IO. The issue with this design is that we may refer to the NUMA distance table for an offline node which may not be populated at the time and so we may inadvertently end up finding and caching a non-optimal path for IO. Then latter when the corresponding numa-node becomes online and hence the NUMA distance table entry for that node is created, ideally we should re-calculate the multipath node distance for the newly added node however that doesn't happen unless we rescan/reset the controller. So essentially, we may keep using non-optimal IO path for a node which is made online after namespace is created. This patch helps fix this issue ensuring that when a shared namespace is created, we calculate the multipath node distance for each online numa-node instead of each possible numa-node. Then latter when a node becomes online and we receive any IO on that newly added node, we would calculate the multipath node distance for newly added node but this time NUMA distance table would have been already populated for newly added node. Hence we would be able to correctly calculate the multipath node distance and choose the optimal path for the IO. Signed-off-by: Nilay Shroff Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- drivers/nvme/host/multipath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 9f59f93b70e2..54ca60db6547 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -420,7 +420,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns) int node, srcu_idx; srcu_idx = srcu_read_lock(&head->srcu); - for_each_node(node) + for_each_online_node(node) __nvme_find_path(head, node); srcu_read_unlock(&head->srcu, srcu_idx); } From 560eaa1af0382a0f089dd52863bd5cef8c9427d3 Mon Sep 17 00:00:00 2001 From: Kundan Kumar Date: Thu, 23 May 2024 17:01:49 +0530 Subject: [PATCH 1421/1565] nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset [ Upstream commit 1bd293fcf3af84674e82ed022c049491f3768840 ] bio_vec start offset may be relatively large particularly when large folio gets added to the bio. A bigger offset will result in avoiding the single-segment mapping optimization and end up using expensive mempool_alloc further. Rather than using absolute value, adjust bv_offset by NVME_CTRL_PAGE_SIZE while checking if segment can be fitted into one/two PRP entries. Suggested-by: Christoph Hellwig Signed-off-by: Kundan Kumar Reviewed-by: Sagi Grimberg Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- drivers/nvme/host/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 5242feda5471..a7131f4752e2 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -844,7 +844,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, struct bio_vec bv = req_bvec(req); if (!is_pci_p2pdma_page(bv.bv_page)) { - if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2) + if ((bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) + + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2) return nvme_setup_prp_simple(dev, req, &cmnd->rw, &bv); From 8802d233505fdc54fd44d7e4e597026f93522393 Mon Sep 17 00:00:00 2001 From: hmtheboy154 Date: Mon, 27 May 2024 11:14:46 +0200 Subject: [PATCH 1422/1565] platform/x86: touchscreen_dmi: Add info for GlobalSpace SolT IVW 11.6" tablet [ Upstream commit 7c8639aa41343fd7b3dbe09baf6b0791fcc407a1 ] This is a tablet created by GlobalSpace Technologies Limited which uses an Intel Atom x5-Z8300, 4GB of RAM & 64GB of storage. Link: https://web.archive.org/web/20171102141952/http://globalspace.in/11.6-device.html Signed-off-by: hmtheboy154 Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede Link: https://lore.kernel.org/r/20240527091447.248849-2-hdegoede@redhat.com Signed-off-by: Sasha Levin --- drivers/platform/x86/touchscreen_dmi.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/drivers/platform/x86/touchscreen_dmi.c b/drivers/platform/x86/touchscreen_dmi.c index fbaa61859462..8cb07f0166c2 100644 --- a/drivers/platform/x86/touchscreen_dmi.c +++ b/drivers/platform/x86/touchscreen_dmi.c @@ -857,6 +857,22 @@ static const struct ts_dmi_data schneider_sct101ctm_data = { .properties = schneider_sct101ctm_props, }; +static const struct property_entry globalspace_solt_ivw116_props[] = { + PROPERTY_ENTRY_U32("touchscreen-min-x", 7), + PROPERTY_ENTRY_U32("touchscreen-min-y", 22), + PROPERTY_ENTRY_U32("touchscreen-size-x", 1723), + PROPERTY_ENTRY_U32("touchscreen-size-y", 1077), + PROPERTY_ENTRY_STRING("firmware-name", "gsl1680-globalspace-solt-ivw116.fw"), + PROPERTY_ENTRY_U32("silead,max-fingers", 10), + PROPERTY_ENTRY_BOOL("silead,home-button"), + { } +}; + +static const struct ts_dmi_data globalspace_solt_ivw116_data = { + .acpi_name = "MSSL1680:00", + .properties = globalspace_solt_ivw116_props, +}; + static const struct property_entry techbite_arc_11_6_props[] = { PROPERTY_ENTRY_U32("touchscreen-min-x", 5), PROPERTY_ENTRY_U32("touchscreen-min-y", 7), @@ -1490,6 +1506,15 @@ const struct dmi_system_id touchscreen_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "SCT101CTM"), }, }, + { + /* GlobalSpace SoLT IVW 11.6" */ + .driver_data = (void *)&globalspace_solt_ivw116_data, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Globalspace Tech Pvt Ltd"), + DMI_MATCH(DMI_PRODUCT_NAME, "SolTIVW"), + DMI_MATCH(DMI_PRODUCT_SKU, "PN20170413488"), + }, + }, { /* Techbite Arc 11.6 */ .driver_data = (void *)&techbite_arc_11_6_data, From 1fa5c6eef4ec59b1b8d639de00b3e2307148ad23 Mon Sep 17 00:00:00 2001 From: hmtheboy154 Date: Mon, 27 May 2024 11:14:47 +0200 Subject: [PATCH 1423/1565] platform/x86: touchscreen_dmi: Add info for the EZpad 6s Pro [ Upstream commit 3050052613790e75b5e4a8536930426b0a8b0774 ] The "EZpad 6s Pro" uses the same touchscreen as the "EZpad 6 Pro B", unlike the "Ezpad 6 Pro" which has its own touchscreen. Signed-off-by: hmtheboy154 Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede Link: https://lore.kernel.org/r/20240527091447.248849-3-hdegoede@redhat.com Signed-off-by: Sasha Levin --- drivers/platform/x86/touchscreen_dmi.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/platform/x86/touchscreen_dmi.c b/drivers/platform/x86/touchscreen_dmi.c index 8cb07f0166c2..dce2d26b1d0f 100644 --- a/drivers/platform/x86/touchscreen_dmi.c +++ b/drivers/platform/x86/touchscreen_dmi.c @@ -1277,6 +1277,17 @@ const struct dmi_system_id touchscreen_dmi_table[] = { DMI_MATCH(DMI_BIOS_DATE, "04/24/2018"), }, }, + { + /* Jumper EZpad 6s Pro */ + .driver_data = (void *)&jumper_ezpad_6_pro_b_data, + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Jumper"), + DMI_MATCH(DMI_PRODUCT_NAME, "Ezpad"), + /* Above matches are too generic, add bios match */ + DMI_MATCH(DMI_BIOS_VERSION, "E.WSA116_8.E1.042.bin"), + DMI_MATCH(DMI_BIOS_DATE, "01/08/2020"), + }, + }, { /* Jumper EZpad 6 m4 */ .driver_data = (void *)&jumper_ezpad_6_m4_data, From 2f3c22b1d3d7e86712253244797a651998c141fa Mon Sep 17 00:00:00 2001 From: Sagi Grimberg Date: Mon, 27 May 2024 22:38:52 +0300 Subject: [PATCH 1424/1565] nvmet: fix a possible leak when destroy a ctrl during qp establishment [ Upstream commit c758b77d4a0a0ed3a1292b3fd7a2aeccd1a169a4 ] In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we know that a ctrl was allocated (in the admin connect request handler) and we need to release pending AERs, clear ctrl->sqs and sq->ctrl (for nvme-loop primarily), and drop the final reference on the ctrl. However, a small window is possible where nvmet_sq_destroy starts (as a result of the client giving up and disconnecting) concurrently with the nvme admin connect cmd (which may be in an early stage). But *before* kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq live reference). In this case, sq->ctrl was allocated however after it was captured in a local variable in nvmet_sq_destroy. This prevented the final reference drop on the ctrl. Solve this by re-capturing the sq->ctrl after all inflight request has completed, where for sure sq->ctrl reference is final, and move forward based on that. This issue was observed in an environment with many hosts connecting multiple ctrls simoutanuosly, creating a delay in allocating a ctrl leading up to this race window. Reported-by: Alex Turin Signed-off-by: Sagi Grimberg Reviewed-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- drivers/nvme/target/core.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 59109eb8e8e4..a04bb02c1251 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -795,6 +795,15 @@ void nvmet_sq_destroy(struct nvmet_sq *sq) wait_for_completion(&sq->free_done); percpu_ref_exit(&sq->ref); + /* + * we must reference the ctrl again after waiting for inflight IO + * to complete. Because admin connect may have sneaked in after we + * store sq->ctrl locally, but before we killed the percpu_ref. the + * admin connect allocates and assigns sq->ctrl, which now needs a + * final ref put, as this ctrl is going away. + */ + ctrl = sq->ctrl; + if (ctrl) { /* * The teardown flow may take some time, and the host may not From 3affee779bd3c947d101a54d970c8ff06cecf83b Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 20 May 2024 21:42:11 +0900 Subject: [PATCH 1425/1565] kbuild: fix short log for AS in link-vmlinux.sh [ Upstream commit 3430f65d6130ccbc86f0ff45642eeb9e2032a600 ] In convention, short logs print the output file, not the input file. Let's change the suffix for 'AS' since it assembles *.S into *.o. [Before] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.S LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.S LD vmlinux [After] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.o LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.o LD vmlinux Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin --- scripts/link-vmlinux.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 3a1ffd84eac2..bf534a323fdd 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -213,7 +213,7 @@ kallsyms_step() vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o} kallsyms ${kallsyms_vmlinux} ${kallsyms_S} - info AS ${kallsyms_S} + info AS ${kallsymso} ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ -c -o ${kallsymso} ${kallsyms_S} From 1e99ce37e96e59120253917b75eca5eaf5bade8e Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sun, 23 Jun 2024 14:11:35 +0900 Subject: [PATCH 1426/1565] nilfs2: fix incorrect inode allocation from reserved inodes commit 93aef9eda1cea9e84ab2453fcceb8addad0e46f1 upstream. If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Cc: Hillf Danton Cc: Jan Kara Cc: Matthew Wilcox (Oracle) Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/alloc.c | 18 ++++++++++++++---- fs/nilfs2/alloc.h | 4 ++-- fs/nilfs2/dat.c | 2 +- fs/nilfs2/ifile.c | 7 ++----- 4 files changed, 19 insertions(+), 12 deletions(-) diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c index 279d945d4ebe..2dc5fae6a6ee 100644 --- a/fs/nilfs2/alloc.c +++ b/fs/nilfs2/alloc.c @@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr, * @target: offset number of an entry in the group (start point) * @bsize: size in bits * @lock: spin lock protecting @bitmap + * @wrap: whether to wrap around */ static int nilfs_palloc_find_available_slot(unsigned char *bitmap, unsigned long target, unsigned int bsize, - spinlock_t *lock) + spinlock_t *lock, bool wrap) { int pos, end = bsize; @@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap, end = target; } + if (!wrap) + return -ENOSPC; /* wrap around */ for (pos = 0; pos < end; pos++) { @@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp) * nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object * @inode: inode of metadata file using this allocator * @req: nilfs_palloc_req structure exchanged for the allocation + * @wrap: whether to wrap around */ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, - struct nilfs_palloc_req *req) + struct nilfs_palloc_req *req, bool wrap) { struct buffer_head *desc_bh, *bitmap_bh; struct nilfs_palloc_group_desc *desc; @@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, entries_per_group = nilfs_palloc_entries_per_group(inode); for (i = 0; i < ngroups; i += n) { - if (group >= ngroups) { + if (group >= ngroups && wrap) { /* wrap around */ group = 0; maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr, @@ -541,7 +545,13 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, bitmap = bitmap_kaddr + bh_offset(bitmap_bh); pos = nilfs_palloc_find_available_slot( bitmap, group_offset, - entries_per_group, lock); + entries_per_group, lock, wrap); + /* + * Since the search for a free slot in the + * second and subsequent bitmap blocks always + * starts from the beginning, the wrap flag + * only has an effect on the first search. + */ if (pos >= 0) { /* found a free entry */ nilfs_palloc_group_desc_add_entries( diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h index 0303c3968cee..071fc620264e 100644 --- a/fs/nilfs2/alloc.h +++ b/fs/nilfs2/alloc.h @@ -50,8 +50,8 @@ struct nilfs_palloc_req { struct buffer_head *pr_entry_bh; }; -int nilfs_palloc_prepare_alloc_entry(struct inode *, - struct nilfs_palloc_req *); +int nilfs_palloc_prepare_alloc_entry(struct inode *inode, + struct nilfs_palloc_req *req, bool wrap); void nilfs_palloc_commit_alloc_entry(struct inode *, struct nilfs_palloc_req *); void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *); diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c index 22b1ca5c379d..9b63cc42caac 100644 --- a/fs/nilfs2/dat.c +++ b/fs/nilfs2/dat.c @@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req) { int ret; - ret = nilfs_palloc_prepare_alloc_entry(dat, req); + ret = nilfs_palloc_prepare_alloc_entry(dat, req, true); if (ret < 0) return ret; diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c index 02727ed3a7c6..9ee8d006f1a2 100644 --- a/fs/nilfs2/ifile.c +++ b/fs/nilfs2/ifile.c @@ -55,13 +55,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino, struct nilfs_palloc_req req; int ret; - req.pr_entry_nr = 0; /* - * 0 says find free inode from beginning - * of a group. dull code!! - */ + req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb); req.pr_entry_bh = NULL; - ret = nilfs_palloc_prepare_alloc_entry(ifile, &req); + ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false); if (!ret) { ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1, &req.pr_entry_bh); From 0100aeb8a12d51950418e685f879cc80cb8e5982 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Tue, 25 Jun 2024 20:16:39 -0400 Subject: [PATCH 1427/1565] mm: prevent derefencing NULL ptr in pfn_section_valid() [ Upstream commit 82f0b6f041fad768c28b4ad05a683065412c226e ] Commit 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") changed pfn_section_valid() to add a READ_ONCE() call around "ms->usage" to fix a race with section_deactivate() where ms->usage can be cleared. The READ_ONCE() call, by itself, is not enough to prevent NULL pointer dereference. We need to check its value before dereferencing it. Link: https://lkml.kernel.org/r/20240626001639.1350646-1-longman@redhat.com Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") Signed-off-by: Waiman Long Cc: Charan Teja Kalla Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin --- include/linux/mmzone.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index ffae2b330818..71150fb1cb2a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1353,8 +1353,9 @@ static inline int subsection_map_index(unsigned long pfn) static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) { int idx = subsection_map_index(pfn); + struct mem_section_usage *usage = READ_ONCE(ms->usage); - return test_bit(idx, READ_ONCE(ms->usage)->subsection_map); + return usage ? test_bit(idx, usage->subsection_map) : 0; } #else static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) From 7d4c14f4b511fd4c0dc788084ae59b4656ace58b Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 2 Jul 2024 18:44:48 -0400 Subject: [PATCH 1428/1565] filelock: fix potential use-after-free in posix_lock_inode MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 1b3ec4f7c03d4b07bad70697d7e2f4088d2cfe92 ] Light Hsieh reported a KASAN UAF warning in trace_posix_lock_inode(). The request pointer had been changed earlier to point to a lock entry that was added to the inode's list. However, before the tracepoint could fire, another task raced in and freed that lock. Fix this by moving the tracepoint inside the spinlock, which should ensure that this doesn't happen. Fixes: 74f6f5912693 ("locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock") Link: https://lore.kernel.org/linux-fsdevel/724ffb0a2962e912ea62bb0515deadf39c325112.camel@kernel.org/ Reported-by: Light Hsieh (謝明燈) Signed-off-by: Jeff Layton Link: https://lore.kernel.org/r/20240702-filelock-6-10-v1-1-96e766aadc98@kernel.org Reviewed-by: Alexander Aring Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/locks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/locks.c b/fs/locks.c index b0753c8871fb..843fa3d3375d 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1392,9 +1392,9 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request, locks_wake_up_blocks(left); } out: + trace_posix_lock_inode(inode, request, error); spin_unlock(&ctx->flc_lock); percpu_up_read(&file_rwsem); - trace_posix_lock_inode(inode, request, error); /* * Free any unused locks. */ From 7092f1e5821fe80693e206a132506bfe1e313680 Mon Sep 17 00:00:00 2001 From: linke li Date: Wed, 3 Apr 2024 10:10:08 +0800 Subject: [PATCH 1429/1565] fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading [ Upstream commit 8bfb40be31ddea0cb4664b352e1797cfe6c91976 ] Currently, the __d_clear_type_and_inode() writes the value flags to dentry->d_flags, then immediately re-reads it in order to use it in a if statement. This re-read is useless because no other update to dentry->d_flags can occur at this point. This commit therefore re-use flags in the if statement instead of re-reading dentry->d_flags. Signed-off-by: linke li Link: https://lore.kernel.org/r/tencent_5E187BD0A61BA28605E85405F15228254D0A@qq.com Reviewed-by: Jan Kara Signed-off-by: Christian Brauner Stable-dep-of: aabfe57ebaa7 ("vfs: don't mod negative dentry count when on shrinker list") Signed-off-by: Sasha Levin --- fs/dcache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/dcache.c b/fs/dcache.c index 976c7474d62a..d2ebee3a120c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -329,7 +329,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry) flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); WRITE_ONCE(dentry->d_flags, flags); dentry->d_inode = NULL; - if (dentry->d_flags & DCACHE_LRU_LIST) + if (flags & DCACHE_LRU_LIST) this_cpu_inc(nr_dentry_negative); } From 1b95de9433b39d58af63feae2017d5d1f1c85898 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Wed, 3 Jul 2024 08:13:01 -0400 Subject: [PATCH 1430/1565] vfs: don't mod negative dentry count when on shrinker list [ Upstream commit aabfe57ebaa75841db47ea59091ec3c5a06d2f52 ] The nr_dentry_negative counter is intended to only account negative dentries that are present on the superblock LRU. Therefore, the LRU add, remove and isolate helpers modify the counter based on whether the dentry is negative, but the shrinker list related helpers do not modify the counter, and the paths that change a dentry between positive and negative only do so if DCACHE_LRU_LIST is set. The problem with this is that a dentry on a shrinker list still has DCACHE_LRU_LIST set to indicate ->d_lru is in use. The additional DCACHE_SHRINK_LIST flag denotes whether the dentry is on LRU or a shrink related list. Therefore if a relevant operation (i.e. unlink) occurs while a dentry is present on a shrinker list, and the associated codepath only checks for DCACHE_LRU_LIST, then it is technically possible to modify the negative dentry count for a dentry that is off the LRU. Since the shrinker list related helpers do not modify the negative dentry count (because non-LRU dentries should not be included in the count) when the dentry is ultimately removed from the shrinker list, this can cause the negative dentry count to become permanently inaccurate. This problem can be reproduced via a heavy file create/unlink vs. drop_caches workload. On an 80xcpu system, I start 80 tasks each running a 1k file create/delete loop, and one task spinning on drop_caches. After 10 minutes or so of runtime, the idle/clean cache negative dentry count increases from somewhere in the range of 5-10 entries to several hundred (and increasingly grows beyond nr_dentry_unused). Tweak the logic in the paths that turn a dentry negative or positive to filter out the case where the dentry is present on a shrink related list. This allows the above workload to maintain an accurate negative dentry count. Fixes: af0c9af1b3f6 ("fs/dcache: Track & report number of negative dentries") Signed-off-by: Brian Foster Link: https://lore.kernel.org/r/20240703121301.247680-1-bfoster@redhat.com Acked-by: Ian Kent Reviewed-by: Josef Bacik Reviewed-by: Waiman Long Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/dcache.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index d2ebee3a120c..406a71abb1b5 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -329,7 +329,11 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry) flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU); WRITE_ONCE(dentry->d_flags, flags); dentry->d_inode = NULL; - if (flags & DCACHE_LRU_LIST) + /* + * The negative counter only tracks dentries on the LRU. Don't inc if + * d_lru is on another list. + */ + if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST) this_cpu_inc(nr_dentry_negative); } @@ -1940,9 +1944,11 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode) spin_lock(&dentry->d_lock); /* - * Decrement negative dentry count if it was in the LRU list. + * The negative counter only tracks dentries on the LRU. Don't dec if + * d_lru is on another list. */ - if (dentry->d_flags & DCACHE_LRU_LIST) + if ((dentry->d_flags & + (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST) this_cpu_dec(nr_dentry_negative); hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry); raw_write_seqcount_begin(&dentry->d_seq); From 893e140dcc021eef6eb379ec144082daacbd641a Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Wed, 3 Jul 2024 13:12:46 -0400 Subject: [PATCH 1431/1565] tcp: fix incorrect undo caused by DSACK of TLP retransmit [ Upstream commit 0ec986ed7bab6801faed1440e8839dcc710331ff ] Loss recovery undo_retrans bookkeeping had a long-standing bug where a DSACK from a spurious TLP retransmit packet could cause an erroneous undo of a fast recovery or RTO recovery that repaired a single really-lost packet (in a sequence range outside that of the TLP retransmit). Basically, because the loss recovery state machine didn't account for the fact that it sent a TLP retransmit, the DSACK for the TLP retransmit could erroneously be implicitly be interpreted as corresponding to the normal fast recovery or RTO recovery retransmit that plugged a real hole, thus resulting in an improper undo. For example, consider the following buggy scenario where there is a real packet loss but the congestion control response is improperly undone because of this bug: + send packets P1, P2, P3, P4 + P1 is really lost + send TLP retransmit of P4 + receive SACK for original P2, P3, P4 + enter fast recovery, fast-retransmit P1, increment undo_retrans to 1 + receive DSACK for TLP P4, decrement undo_retrans to 0, undo (bug!) + receive cumulative ACK for P1-P4 (fast retransmit plugged real hole) The fix: when we initialize undo machinery in tcp_init_undo(), if there is a TLP retransmit in flight, then increment tp->undo_retrans so that we make sure that we receive a DSACK corresponding to the TLP retransmit, as well as DSACKs for all later normal retransmits, before triggering a loss recovery undo. Note that we also have to move the line that clears tp->tlp_high_seq for RTO recovery, so that upon RTO we remember the tp->tlp_high_seq value until tcp_init_undo() and clear it only afterward. Also note that the bug dates back to the original 2013 TLP implementation, commit 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)"). However, this patch will only compile and work correctly with kernels that have tp->tlp_retrans, which was added only in v5.8 in 2020 in commit 76be93fc0702 ("tcp: allow at most one TLP probe per flight"). So we associate this fix with that later commit. Fixes: 76be93fc0702 ("tcp: allow at most one TLP probe per flight") Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Cc: Yuchung Cheng Cc: Kevin Yang Link: https://patch.msgid.link/20240703171246.1739561-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv4/tcp_input.c | 11 ++++++++++- net/ipv4/tcp_timer.c | 2 -- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 604ff1b04c3e..06c03b21500f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2084,8 +2084,16 @@ void tcp_clear_retrans(struct tcp_sock *tp) static inline void tcp_init_undo(struct tcp_sock *tp) { tp->undo_marker = tp->snd_una; + /* Retransmission still in flight may cause DSACKs later. */ - tp->undo_retrans = tp->retrans_out ? : -1; + /* First, account for regular retransmits in flight: */ + tp->undo_retrans = tp->retrans_out; + /* Next, account for TLP retransmits in flight: */ + if (tp->tlp_high_seq && tp->tlp_retrans) + tp->undo_retrans++; + /* Finally, avoid 0, because undo_retrans==0 means "can undo now": */ + if (!tp->undo_retrans) + tp->undo_retrans = -1; } static bool tcp_is_rack(const struct sock *sk) @@ -2164,6 +2172,7 @@ void tcp_enter_loss(struct sock *sk) tcp_set_ca_state(sk, TCP_CA_Loss); tp->high_seq = tp->snd_nxt; + tp->tlp_high_seq = 0; tcp_ecn_queue_cwr(tp); /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 5c7e10939dd9..2925b985e11c 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -492,8 +492,6 @@ void tcp_retransmit_timer(struct sock *sk) if (WARN_ON_ONCE(!skb)) return; - tp->tlp_high_seq = 0; - if (!tp->snd_wnd && !sock_flag(sk, SOCK_DEAD) && !((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))) { /* Receiver dastardly shrinks window. Our retransmits From efc05a5fdc0dd16f613ea47b93413a1b3139e7ad Mon Sep 17 00:00:00 2001 From: Aleksandr Mishin Date: Fri, 5 Jul 2024 12:53:17 +0300 Subject: [PATCH 1432/1565] octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability() [ Upstream commit 442e26af9aa8115c96541026cbfeaaa76c85d178 ] In rvu_check_rsrc_availability() in case of invalid SSOW req, an incorrect data is printed to error log. 'req->sso' value is printed instead of 'req->ssow'. Looks like "copy-paste" mistake. Fix this mistake by replacing 'req->sso' with 'req->ssow'. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 746ea74241fa ("octeontx2-af: Add RVU block LF provisioning support") Signed-off-by: Aleksandr Mishin Reviewed-by: Simon Horman Link: https://patch.msgid.link/20240705095317.12640-1-amishin@t-argos.ru Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c index 23b829f974de..e8a2552fb690 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c @@ -1357,7 +1357,7 @@ static int rvu_check_rsrc_availability(struct rvu *rvu, if (req->ssow > block->lf.max) { dev_err(&rvu->pdev->dev, "Func 0x%x: Invalid SSOW req, %d > max %d\n", - pcifunc, req->sso, block->lf.max); + pcifunc, req->ssow, block->lf.max); return -EINVAL; } mappedlfs = rvu_get_rsrc_mapcount(pfvf, block->addr); From b4ac93b0418fe79004d39dcca99604e1dee83b64 Mon Sep 17 00:00:00 2001 From: Aleksander Jan Bajkowski Date: Tue, 28 Dec 2021 23:00:31 +0100 Subject: [PATCH 1433/1565] net: lantiq_etop: add blank line after declaration [ Upstream commit 4c46625bb586a741b8d0e6bdbddbcb2549fa1d36 ] This patch adds a missing line after the declaration and fixes the checkpatch warning: WARNING: Missing a blank line after declarations + int desc; + for (desc = 0; desc < LTQ_DESC_NUM; desc++) Signed-off-by: Aleksander Jan Bajkowski Link: https://lore.kernel.org/r/20211228220031.71576-1-olek2@wp.pl Signed-off-by: Jakub Kicinski Stable-dep-of: e1533b6319ab ("net: ethernet: lantiq_etop: fix double free in detach") Signed-off-by: Sasha Levin --- drivers/net/ethernet/lantiq_etop.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c index 5ea626b1e578..62300d46d918 100644 --- a/drivers/net/ethernet/lantiq_etop.c +++ b/drivers/net/ethernet/lantiq_etop.c @@ -214,6 +214,7 @@ ltq_etop_free_channel(struct net_device *dev, struct ltq_etop_chan *ch) free_irq(ch->dma.irq, priv); if (IS_RX(ch->idx)) { int desc; + for (desc = 0; desc < LTQ_DESC_NUM; desc++) dev_kfree_skb_any(ch->skb[ch->dma.desc]); } From 22b16618a80858b3a9d607708444426948cc4ae1 Mon Sep 17 00:00:00 2001 From: Aleksander Jan Bajkowski Date: Mon, 8 Jul 2024 22:58:26 +0200 Subject: [PATCH 1434/1565] net: ethernet: lantiq_etop: fix double free in detach [ Upstream commit e1533b6319ab9c3a97dad314dd88b3783bc41b69 ] The number of the currently released descriptor is never incremented which results in the same skb being released multiple times. Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver") Reported-by: Joe Perches Closes: https://lore.kernel.org/all/fc1bf93d92bb5b2f99c6c62745507cc22f3a7b2d.camel@perches.com/ Signed-off-by: Aleksander Jan Bajkowski Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20240708205826.5176-1-olek2@wp.pl Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/ethernet/lantiq_etop.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/lantiq_etop.c b/drivers/net/ethernet/lantiq_etop.c index 62300d46d918..1d7c0b872c59 100644 --- a/drivers/net/ethernet/lantiq_etop.c +++ b/drivers/net/ethernet/lantiq_etop.c @@ -213,9 +213,9 @@ ltq_etop_free_channel(struct net_device *dev, struct ltq_etop_chan *ch) if (ch->dma.irq) free_irq(ch->dma.irq, priv); if (IS_RX(ch->idx)) { - int desc; + struct ltq_dma_channel *dma = &ch->dma; - for (desc = 0; desc < LTQ_DESC_NUM; desc++) + for (dma->desc = 0; dma->desc < LTQ_DESC_NUM; dma->desc++) dev_kfree_skb_any(ch->skb[ch->dma.desc]); } } From 3ba12c2afd933fc1bf800f6d3f6c7ec8f602ce56 Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Mon, 8 Jul 2024 14:56:15 +0300 Subject: [PATCH 1435/1565] ppp: reject claimed-as-LCP but actually malformed packets [ Upstream commit f2aeb7306a898e1cbd03963d376f4b6656ca2b55 ] Since 'ppp_async_encode()' assumes valid LCP packets (with code from 1 to 7 inclusive), add 'ppp_check_packet()' to ensure that LCP packet has an actual body beyond PPP_LCP header bytes, and reject claimed-as-LCP but actually malformed data otherwise. Reported-by: syzbot+ec0723ba9605678b14bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ec0723ba9605678b14bf Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dmitry Antipov Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- drivers/net/ppp/ppp_generic.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index b825c6a9b6dd..e2bca6fa0822 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -70,6 +70,7 @@ #define MPHDRLEN_SSN 4 /* ditto with short sequence numbers */ #define PPP_PROTO_LEN 2 +#define PPP_LCP_HDRLEN 4 /* * An instance of /dev/ppp can be associated with either a ppp @@ -489,6 +490,15 @@ static ssize_t ppp_read(struct file *file, char __user *buf, return ret; } +static bool ppp_check_packet(struct sk_buff *skb, size_t count) +{ + /* LCP packets must include LCP header which 4 bytes long: + * 1-byte code, 1-byte identifier, and 2-byte length. + */ + return get_unaligned_be16(skb->data) != PPP_LCP || + count >= PPP_PROTO_LEN + PPP_LCP_HDRLEN; +} + static ssize_t ppp_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { @@ -511,6 +521,11 @@ static ssize_t ppp_write(struct file *file, const char __user *buf, kfree_skb(skb); goto out; } + ret = -EINVAL; + if (unlikely(!ppp_check_packet(skb, count))) { + kfree_skb(skb); + goto out; + } switch (pf->kind) { case INTERFACE: From c184be30b12e59469c830ace0526f4c9f815f44f Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Tue, 9 Jul 2024 08:19:43 +0200 Subject: [PATCH 1436/1565] ethtool: netlink: do not return SQI value if link is down [ Upstream commit c184cf94e73b04ff7048d045f5413899bc664788 ] Do not attach SQI value if link is down. "SQI values are only valid if link-up condition is present" per OpenAlliance specification of 100Base-T1 Interoperability Test suite [1]. The same rule would apply for other link types. [1] https://opensig.org/automotive-ethernet-specifications/# Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Reviewed-by: Woojung Huh Link: https://patch.msgid.link/20240709061943.729381-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ethtool/linkstate.c | 41 ++++++++++++++++++++++++++++------------- 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/net/ethtool/linkstate.c b/net/ethtool/linkstate.c index fb676f349455..470582a70ccb 100644 --- a/net/ethtool/linkstate.c +++ b/net/ethtool/linkstate.c @@ -36,6 +36,8 @@ static int linkstate_get_sqi(struct net_device *dev) mutex_lock(&phydev->lock); if (!phydev->drv || !phydev->drv->get_sqi) ret = -EOPNOTSUPP; + else if (!phydev->link) + ret = -ENETDOWN; else ret = phydev->drv->get_sqi(phydev); mutex_unlock(&phydev->lock); @@ -54,6 +56,8 @@ static int linkstate_get_sqi_max(struct net_device *dev) mutex_lock(&phydev->lock); if (!phydev->drv || !phydev->drv->get_sqi_max) ret = -EOPNOTSUPP; + else if (!phydev->link) + ret = -ENETDOWN; else ret = phydev->drv->get_sqi_max(phydev); mutex_unlock(&phydev->lock); @@ -61,6 +65,17 @@ static int linkstate_get_sqi_max(struct net_device *dev) return ret; }; +static bool linkstate_sqi_critical_error(int sqi) +{ + return sqi < 0 && sqi != -EOPNOTSUPP && sqi != -ENETDOWN; +} + +static bool linkstate_sqi_valid(struct linkstate_reply_data *data) +{ + return data->sqi >= 0 && data->sqi_max >= 0 && + data->sqi <= data->sqi_max; +} + static int linkstate_get_link_ext_state(struct net_device *dev, struct linkstate_reply_data *data) { @@ -92,12 +107,12 @@ static int linkstate_prepare_data(const struct ethnl_req_info *req_base, data->link = __ethtool_get_link(dev); ret = linkstate_get_sqi(dev); - if (ret < 0 && ret != -EOPNOTSUPP) + if (linkstate_sqi_critical_error(ret)) goto out; data->sqi = ret; ret = linkstate_get_sqi_max(dev); - if (ret < 0 && ret != -EOPNOTSUPP) + if (linkstate_sqi_critical_error(ret)) goto out; data->sqi_max = ret; @@ -122,11 +137,10 @@ static int linkstate_reply_size(const struct ethnl_req_info *req_base, len = nla_total_size(sizeof(u8)) /* LINKSTATE_LINK */ + 0; - if (data->sqi != -EOPNOTSUPP) - len += nla_total_size(sizeof(u32)); - - if (data->sqi_max != -EOPNOTSUPP) - len += nla_total_size(sizeof(u32)); + if (linkstate_sqi_valid(data)) { + len += nla_total_size(sizeof(u32)); /* LINKSTATE_SQI */ + len += nla_total_size(sizeof(u32)); /* LINKSTATE_SQI_MAX */ + } if (data->link_ext_state_provided) len += nla_total_size(sizeof(u8)); /* LINKSTATE_EXT_STATE */ @@ -147,13 +161,14 @@ static int linkstate_fill_reply(struct sk_buff *skb, nla_put_u8(skb, ETHTOOL_A_LINKSTATE_LINK, !!data->link)) return -EMSGSIZE; - if (data->sqi != -EOPNOTSUPP && - nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI, data->sqi)) - return -EMSGSIZE; + if (linkstate_sqi_valid(data)) { + if (nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI, data->sqi)) + return -EMSGSIZE; - if (data->sqi_max != -EOPNOTSUPP && - nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI_MAX, data->sqi_max)) - return -EMSGSIZE; + if (nla_put_u32(skb, ETHTOOL_A_LINKSTATE_SQI_MAX, + data->sqi_max)) + return -EMSGSIZE; + } if (data->link_ext_state_provided) { if (nla_put_u8(skb, ETHTOOL_A_LINKSTATE_EXT_STATE, From 9f965684c57c3117cfd2f754dd3270383c529fba Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 9 Jul 2024 12:13:56 -0700 Subject: [PATCH 1437/1565] udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). [ Upstream commit 5c0b485a8c6116516f33925b9ce5b6104a6eadfd ] syzkaller triggered the warning [0] in udp_v4_early_demux(). In udp_v[46]_early_demux() and sk_lookup(), we do not touch the refcount of the looked-up sk and use sock_pfree() as skb->destructor, so we check SOCK_RCU_FREE to ensure that the sk is safe to access during the RCU grace period. Currently, SOCK_RCU_FREE is flagged for a bound socket after being put into the hash table. Moreover, the SOCK_RCU_FREE check is done too early in udp_v[46]_early_demux() and sk_lookup(), so there could be a small race window: CPU1 CPU2 ---- ---- udp_v4_early_demux() udp_lib_get_port() | |- hlist_add_head_rcu() |- sk = __udp4_lib_demux_lookup() | |- DEBUG_NET_WARN_ON_ONCE(sk_is_refcounted(sk)); `- sock_set_flag(sk, SOCK_RCU_FREE) We had the same bug in TCP and fixed it in commit 871019b22d1b ("net: set SOCK_RCU_FREE before inserting socket into hashtable"). Let's apply the same fix for UDP. [0]: WARNING: CPU: 0 PID: 11198 at net/ipv4/udp.c:2599 udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Modules linked in: CPU: 0 PID: 11198 Comm: syz-executor.1 Not tainted 6.9.0-g93bda33046e7 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:udp_v4_early_demux+0x481/0xb70 net/ipv4/udp.c:2599 Code: c5 7a 15 fe bb 01 00 00 00 44 89 e9 31 ff d3 e3 81 e3 bf ef ff ff 89 de e8 2c 74 15 fe 85 db 0f 85 02 06 00 00 e8 9f 7a 15 fe <0f> 0b e8 98 7a 15 fe 49 8d 7e 60 e8 4f 39 2f fe 49 c7 46 60 20 52 RSP: 0018:ffffc9000ce3fa58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8318c92c RDX: ffff888036ccde00 RSI: ffffffff8318c2f1 RDI: 0000000000000001 RBP: ffff88805a2dd6e0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0001ffffffffffff R12: ffff88805a2dd680 R13: 0000000000000007 R14: ffff88800923f900 R15: ffff88805456004e FS: 00007fc449127640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc449126e38 CR3: 000000003de4b002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 PKRU: 55555554 Call Trace: ip_rcv_finish_core.constprop.0+0xbdd/0xd20 net/ipv4/ip_input.c:349 ip_rcv_finish+0xda/0x150 net/ipv4/ip_input.c:447 NF_HOOK include/linux/netfilter.h:314 [inline] NF_HOOK include/linux/netfilter.h:308 [inline] ip_rcv+0x16c/0x180 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5624 __netif_receive_skb+0x21/0xd0 net/core/dev.c:5738 netif_receive_skb_internal net/core/dev.c:5824 [inline] netif_receive_skb+0x271/0x300 net/core/dev.c:5884 tun_rx_batched drivers/net/tun.c:1549 [inline] tun_get_user+0x24db/0x2c50 drivers/net/tun.c:2002 tun_chr_write_iter+0x107/0x1a0 drivers/net/tun.c:2048 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x76f/0x8d0 fs/read_write.c:590 ksys_write+0xbf/0x190 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x41/0x50 fs/read_write.c:652 x64_sys_call+0xe66/0x1990 arch/x86/include/generated/asm/syscalls_64.h:2 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fc44a68bc1f Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 cf f5 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c d0 f5 ff 48 RSP: 002b:00007fc449126c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007fc44a68bc1f RDX: 0000000000000032 RSI: 00000000200000c0 RDI: 00000000000000c8 RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000032 R11: 0000000000000293 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc44a5ec530 R15: 0000000000000000 Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Reported-by: syzkaller Signed-off-by: Kuniyuki Iwashima Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20240709191356.24010-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/ipv4/udp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index da9015efb45e..6ad25dc9710c 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -317,6 +317,8 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, goto fail_unlock; } + sock_set_flag(sk, SOCK_RCU_FREE); + sk_add_node_rcu(sk, &hslot->head); hslot->count++; sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1); @@ -333,7 +335,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, hslot2->count++; spin_unlock(&hslot2->lock); } - sock_set_flag(sk, SOCK_RCU_FREE); + error = 0; fail_unlock: spin_unlock_bh(&hslot->lock); From b81a523d54ea689414f67c9fb81a5b917a41ed55 Mon Sep 17 00:00:00 2001 From: Chengen Du Date: Wed, 10 Jul 2024 13:37:47 +0800 Subject: [PATCH 1438/1565] net/sched: Fix UAF when resolving a clash [ Upstream commit 26488172b0292bed837b95a006a3f3431d1898c3 ] KASAN reports the following UAF: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 asm_common_interrupt+0x27/0x40 Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 The ct may be dropped if a clash has been resolved but is still passed to the tcf_ct_flow_table_process_conn function for further usage. This issue can be fixed by retrieving ct from skb again after confirming conntrack. Fixes: 0cc254e5aa37 ("net/sched: act_ct: Offload connections with commit action") Co-developed-by: Gerald Yang Signed-off-by: Gerald Yang Signed-off-by: Chengen Du Link: https://patch.msgid.link/20240710053747.13223-1-chengen.du@canonical.com Signed-off-by: Paolo Abeni Signed-off-by: Sasha Levin --- net/sched/act_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index c6d6a6fe9602..a59a8ad38721 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -1038,6 +1038,14 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a, */ if (nf_conntrack_confirm(skb) != NF_ACCEPT) goto drop; + + /* The ct may be dropped if a clash has been resolved, + * so it's necessary to retrieve it from skb again to + * prevent UAF. + */ + ct = nf_ct_get(skb, &ctinfo); + if (!ct) + skip_add = true; } if (!skip_add) From 148d5494258b4e6f9831045a6191396604795841 Mon Sep 17 00:00:00 2001 From: Sven Schnelle Date: Tue, 30 Apr 2024 16:30:01 +0200 Subject: [PATCH 1439/1565] s390: Mark psw in __load_psw_mask() as __unitialized [ Upstream commit 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 ] Without __unitialized, the following code is generated when INIT_STACK_ALL_ZERO is enabled: 86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15) 8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15) 92: c0 10 00 00 00 08 larl %r1, 0xa2 98: e3 10 f0 a8 00 24 stg %r1, 168(%r15) 9e: b2 b2 f0 a0 lpswe 160(%r15) The xc is not adding any security because psw is fully initialized with the following instructions. Add __unitialized to the psw definitiation to avoid the superfluous clearing of psw. Reviewed-by: Heiko Carstens Signed-off-by: Sven Schnelle Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- arch/s390/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h index 0987c3fc45f5..95ed01a3536c 100644 --- a/arch/s390/include/asm/processor.h +++ b/arch/s390/include/asm/processor.h @@ -252,8 +252,8 @@ static inline void __load_psw(psw_t psw) */ static __always_inline void __load_psw_mask(unsigned long mask) { + psw_t psw __uninitialized; unsigned long addr; - psw_t psw; psw.mask = mask; From 7e0297c80fa1de20038c2fb4c78bcf3fedd59fd2 Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Wed, 10 Jul 2024 16:16:48 +0800 Subject: [PATCH 1440/1565] ARM: davinci: Convert comma to semicolon [ Upstream commit acc3815db1a02d654fbc19726ceaadca0d7dd81c ] Replace a comma between expression statements by a semicolon. Fixes: efc1bb8a6fd5 ("davinci: add power management support") Signed-off-by: Chen Ni Acked-by: Bartosz Golaszewski Signed-off-by: Arnd Bergmann Signed-off-by: Sasha Levin --- arch/arm/mach-davinci/pm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/mach-davinci/pm.c b/arch/arm/mach-davinci/pm.c index 323ee4e657c4..94d7d69b9db7 100644 --- a/arch/arm/mach-davinci/pm.c +++ b/arch/arm/mach-davinci/pm.c @@ -62,7 +62,7 @@ static void davinci_pm_suspend(void) /* Configure sleep count in deep sleep register */ val = __raw_readl(pm_config.deepsleep_reg); - val &= ~DEEPSLEEP_SLEEPCOUNT_MASK, + val &= ~DEEPSLEEP_SLEEPCOUNT_MASK; val |= pm_config.sleepcount; __raw_writel(val, pm_config.deepsleep_reg); From b4e9f8905d789ba81848f9879b75a37906b9a5eb Mon Sep 17 00:00:00 2001 From: Michal Mazur Date: Wed, 10 Jul 2024 13:21:25 +0530 Subject: [PATCH 1441/1565] octeontx2-af: fix detection of IP layer [ Upstream commit 404dc0fd6fb0bb942b18008c6f8c0320b80aca20 ] Checksum and length checks are not enabled for IPv4 header with options and IPv6 with extension headers. To fix this a change in enum npc_kpu_lc_ltype is required which will allow adjustment of LTYPE_MASK to detect all types of IP headers. Fixes: 21e6699e5cd6 ("octeontx2-af: Add NPC KPU profile") Signed-off-by: Michal Mazur Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/net/ethernet/marvell/octeontx2/af/npc.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h index dc34e564c919..46689d95254b 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h +++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h @@ -54,8 +54,13 @@ enum npc_kpu_lb_ltype { NPC_LT_LB_CUSTOM1 = 0xF, }; +/* Don't modify ltypes up to IP6_EXT, otherwise length and checksum of IP + * headers may not be checked correctly. IPv4 ltypes and IPv6 ltypes must + * differ only at bit 0 so mask 0xE can be used to detect extended headers. + */ enum npc_kpu_lc_ltype { - NPC_LT_LC_IP = 1, + NPC_LT_LC_PTP = 1, + NPC_LT_LC_IP, NPC_LT_LC_IP_OPT, NPC_LT_LC_IP6, NPC_LT_LC_IP6_EXT, @@ -63,7 +68,6 @@ enum npc_kpu_lc_ltype { NPC_LT_LC_RARP, NPC_LT_LC_MPLS, NPC_LT_LC_NSH, - NPC_LT_LC_PTP, NPC_LT_LC_FCOE, NPC_LT_LC_CUSTOM0 = 0xE, NPC_LT_LC_CUSTOM1 = 0xF, From 24b9fafe34641a6dcf8df36f5e0de17d9cdccce8 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 7 Jun 2024 12:56:52 +0000 Subject: [PATCH 1442/1565] tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() commit 36534d3c54537bf098224a32dc31397793d4594d upstream. Due to timer wheel implementation, a timer will usually fire after its schedule. For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms. For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay. We need to make sure tcp_rtx_probe0_timed_out() handles this case. Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet Cc: Menglong Dong Acked-by: Neal Cardwell Reviewed-by: Jason Xing Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_timer.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 2925b985e11c..33b00b00bc84 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -442,8 +442,13 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, { const struct tcp_sock *tp = tcp_sk(sk); const int timeout = TCP_RTO_MAX * 2; - u32 rcv_delta, rtx_delta; + u32 rtx_delta; + s32 rcv_delta; + /* Note: timer interrupt might have been delayed by at least one jiffy, + * and tp->rcv_tstamp might very well have been written recently. + * rcv_delta can thus be negative. + */ rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false; From 5d7e64d70a11d988553a08239c810a658e841982 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 10 Jul 2024 00:14:01 +0000 Subject: [PATCH 1443/1565] tcp: avoid too many retransmit packets commit 97a9063518f198ec0adb2ecb89789de342bb8283 upstream. If a TCP socket is using TCP_USER_TIMEOUT, and the other peer retracted its window to zero, tcp_retransmit_timer() can retransmit a packet every two jiffies (2 ms for HZ=1000), for about 4 minutes after TCP_USER_TIMEOUT has 'expired'. The fix is to make sure tcp_rtx_probe0_timed_out() takes icsk->icsk_user_timeout into account. Before blamed commit, the socket would not timeout after icsk->icsk_user_timeout, but would use standard exponential backoff for the retransmits. Also worth noting that before commit e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0"), the issue would last 2 minutes instead of 4. Fixes: b701a99e431d ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy") Signed-off-by: Eric Dumazet Cc: Neal Cardwell Reviewed-by: Jason Xing Reviewed-by: Jon Maxwell Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20240710001402.2758273-1-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_timer.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 33b00b00bc84..316a4ef4db37 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -440,22 +440,34 @@ static void tcp_fastopen_synack_timer(struct sock *sk, struct request_sock *req) static bool tcp_rtx_probe0_timed_out(const struct sock *sk, const struct sk_buff *skb) { + const struct inet_connection_sock *icsk = inet_csk(sk); + u32 user_timeout = READ_ONCE(icsk->icsk_user_timeout); const struct tcp_sock *tp = tcp_sk(sk); - const int timeout = TCP_RTO_MAX * 2; + int timeout = TCP_RTO_MAX * 2; u32 rtx_delta; s32 rcv_delta; + rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - + (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); + + if (user_timeout) { + /* If user application specified a TCP_USER_TIMEOUT, + * it does not want win 0 packets to 'reset the timer' + * while retransmits are not making progress. + */ + if (rtx_delta > user_timeout) + return true; + timeout = min_t(u32, timeout, msecs_to_jiffies(user_timeout)); + } + /* Note: timer interrupt might have been delayed by at least one jiffy, * and tp->rcv_tstamp might very well have been written recently. * rcv_delta can thus be negative. */ - rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; + rcv_delta = icsk->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false; - rtx_delta = (u32)msecs_to_jiffies(tcp_time_stamp(tp) - - (tp->retrans_stamp ?: tcp_skb_timestamp(skb))); - return rtx_delta > timeout; } From 26b4d6802ed7a9977d4892f16a8ab90c36fb5c4b Mon Sep 17 00:00:00 2001 From: Ronald Wahl Date: Tue, 9 Jul 2024 21:58:45 +0200 Subject: [PATCH 1444/1565] net: ks8851: Fix potential TX stall after interface reopen commit 7a99afef17af66c276c1d6e6f4dbcac223eaf6ac upstream. The amount of TX space in the hardware buffer is tracked in the tx_space variable. The initial value is currently only set during driver probing. After closing the interface and reopening it the tx_space variable has the last value it had before close. If it is smaller than the size of the first send packet after reopeing the interface the queue will be stopped. The queue is woken up after receiving a TX interrupt but this will never happen since we did not send anything. This commit moves the initialization of the tx_space variable to the ks8851_net_open function right before starting the TX queue. Also query the value from the hardware instead of using a hard coded value. Only the SPI chip variant is affected by this issue because only this driver variant actually depends on the tx_space variable in the xmit function. Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun") Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Simon Horman Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Ronald Wahl Reviewed-by: Jacob Keller Link: https://patch.msgid.link/20240709195845.9089-1-rwahl@gmx.de Signed-off-by: Paolo Abeni Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/micrel/ks8851_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 3d0ac7f3c87e..a4b4fa78ef92 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -501,6 +501,7 @@ static int ks8851_net_open(struct net_device *dev) ks8851_wrreg16(ks, KS_IER, ks->rc_ier); ks->queued_len = 0; + ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR); netif_start_queue(ks->netdev); netif_dbg(ks, ifup, ks->netdev, "network device up\n"); @@ -1057,7 +1058,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, int ret; ks->netdev = netdev; - ks->tx_space = 6144; gpio = of_get_named_gpio_flags(dev->of_node, "reset-gpios", 0, NULL); if (gpio == -EPROBE_DEFER) From 8e4e917f9d305a55073075a23bc895d5fe0fb3ad Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Thu, 30 May 2024 10:00:53 +0200 Subject: [PATCH 1445/1565] USB: serial: option: add Telit generic core-dump composition commit 4298e400dbdbf259549d69c349e060652ad53611 upstream. Add the following core-dump composition, used in different Telit modems: 0x9000: tty (sahara) T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 41 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=9000 Rev=00.00 S: Manufacturer=Telit Cinterion S: Product=FN990-dump S: SerialNumber=e815bdde C: #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=2mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=10 Driver=option E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Daniele Palmas Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 400e3e10b435..4d99b31e1951 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1433,6 +1433,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = NCTRL(2) }, { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x701b, 0xff), /* Telit LE910R1 (ECM) */ .driver_info = NCTRL(2) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x9000, 0xff), /* Telit generic core-dump device */ + .driver_info = NCTRL(0) }, { USB_DEVICE(TELIT_VENDOR_ID, 0x9010), /* Telit SBL FN980 flashing device */ .driver_info = NCTRL(0) | ZLP }, { USB_DEVICE(TELIT_VENDOR_ID, 0x9200), /* Telit LE910S1 flashing device */ From 9fb736742355f9a7a4667d4467530cc9c0155289 Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Tue, 25 Jun 2024 12:27:16 +0200 Subject: [PATCH 1446/1565] USB: serial: option: add Telit FN912 rmnet compositions commit 9a590ff283421b71560deded2110dbdcbe1f7d1d upstream. Add the following Telit FN912 compositions: 0x3000: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag) T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 8 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=3000 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN912 S: SerialNumber=92c4c4d8 C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x3001: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 7 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=3001 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN912 S: SerialNumber=92c4c4d8 C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none) E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Daniele Palmas Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 4d99b31e1951..b163eee78b76 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1425,6 +1425,10 @@ static const struct usb_device_id option_ids[] = { .driver_info = NCTRL(0) | RSVD(1) }, { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1901, 0xff), /* Telit LN940 (MBIM) */ .driver_info = NCTRL(0) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x3000, 0xff), /* Telit FN912 */ + .driver_info = RSVD(0) | NCTRL(3) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x3001, 0xff), /* Telit FN912 */ + .driver_info = RSVD(0) | NCTRL(2) | RSVD(3) | RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x7010, 0xff), /* Telit LE910-S1 (RNDIS) */ .driver_info = NCTRL(2) }, { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x7011, 0xff), /* Telit LE910-S1 (ECM) */ From c9e1030198e56f118e092e1b24e6789cf4f4a283 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Wed, 26 Jun 2024 15:32:23 +0200 Subject: [PATCH 1447/1565] USB: serial: option: add Fibocom FM350-GL MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2604e08ff251dba330e16b65e80074c9c540aad7 upstream. FM350-GL is 5G Sub-6 WWAN module which uses M.2 form factor interface. It is based on Mediatek's MTK T700 CPU. The module supports PCIe Gen3 x1 and USB 2.0 and 3.0 interfaces. The manufacturer states that USB is "for debug" but it has been confirmed to be fully functional, except for modem-control requests on some of the interfaces. USB device composition is controlled by AT+GTUSBMODE= command. Two values are currently supported for the : 40: RNDIS+AT+AP(GNSS)+META+DEBUG+NPT+ADB 41: RNDIS+AT+AP(GNSS)+META+DEBUG+NPT+ADB+AP(LOG)+AP(META) (default value) [ Note that the functions above are not ordered by interface number. ] Mode 40 corresponds to: T: Bus=03 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 22 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0e8d ProdID=7126 Rev= 0.01 S: Manufacturer=Fibocom Wireless Inc. S: Product=FM350-GL C:* #Ifs= 8 Cfg#= 1 Atr=a0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=03 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=125us I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Mode 41 corresponds to: T: Bus=03 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 7 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0e8d ProdID=7127 Rev= 0.01 S: Manufacturer=Fibocom Wireless Inc. S: Product=FM350-GL C:* #Ifs=10 Cfg#= 1 Atr=a0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=03 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=125us I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 9 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=09(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Cc: stable@vger.kernel.org Signed-off-by: Bjørn Mork Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index b163eee78b76..1fc22710069a 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2230,6 +2230,10 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_7106_2COM, 0x02, 0x02, 0x01) }, { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x02, 0x01) }, { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM2, 0xff, 0x00, 0x00) }, + { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 0x7126, 0xff, 0x00, 0x00), + .driver_info = NCTRL(2) }, + { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 0x7127, 0xff, 0x00, 0x00), + .driver_info = NCTRL(2) | NCTRL(3) | NCTRL(4) }, { USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MEN200) }, { USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MPL200), .driver_info = RSVD(1) | RSVD(4) }, From fb9ff5139625711aff348deb06a45b983c38c4bb Mon Sep 17 00:00:00 2001 From: Slark Xiao Date: Fri, 5 Jul 2024 16:17:09 +0800 Subject: [PATCH 1448/1565] USB: serial: option: add support for Foxconn T99W651 commit 3c841d54b63e4446383de3238399a3910e47d8e2 upstream. T99W651 is a RNDIS based modem device. There are 3 serial ports need to be enumerated: Diag, NMEA and AT. Test evidence as below: T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 6 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0489 ProdID=e145 Rev=05.15 S: Manufacturer=QCOM S: Product=SDXPINN-IDP _SN:93B562B2 S: SerialNumber=82e6fe26 C: #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host I: If#=0x1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option I: If#=0x5 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=70 Driver=(none) I: If#=0x6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) 0&1: RNDIS, 2:AT, 3:NMEA, 4:DIAG, 5:QDSS, 6:ADB QDSS is not a serial port. Signed-off-by: Slark Xiao Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 1fc22710069a..5c47142c9cbf 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2294,6 +2294,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(3) }, { USB_DEVICE_INTERFACE_CLASS(0x0489, 0xe0f0, 0xff), /* Foxconn T99W373 MBIM */ .driver_info = RSVD(3) }, + { USB_DEVICE_INTERFACE_CLASS(0x0489, 0xe145, 0xff), /* Foxconn T99W651 RNDIS */ + .driver_info = RSVD(5) | RSVD(6) }, { USB_DEVICE(0x1508, 0x1001), /* Fibocom NL668 (IOT version) */ .driver_info = RSVD(4) | RSVD(5) | RSVD(6) }, { USB_DEVICE(0x1782, 0x4d10) }, /* Fibocom L610 (AT mode) */ From 2dc6aad6eaca81823e608b9ff10c9ee8902002e4 Mon Sep 17 00:00:00 2001 From: Mank Wang Date: Sat, 29 Jun 2024 01:54:45 +0000 Subject: [PATCH 1449/1565] USB: serial: option: add Netprisma LCUK54 series modules commit dc6dbe3ed28795b01c712ad8f567728f9c14b01d upstream. Add support for Netprisma LCUK54 series modules. LCUK54-WRD-LWW(0x3731/0x0100): NetPrisma LCUK54-WWD for Global LCUK54-WRD-LWW(0x3731/0x0101): NetPrisma LCUK54-WRD for Global SKU LCUK54-WRD-LCN(0x3731/0x0106): NetPrisma LCUK54-WRD for China SKU LCUK54-WRD-LWW(0x3731/0x0111): NetPrisma LCUK54-WWD for SA LCUK54-WRD-LWW(0x3731/0x0112): NetPrisma LCUK54-WWD for EU LCUK54-WRD-LWW(0x3731/0x0113): NetPrisma LCUK54-WWD for NA LCUK54-WWD-LCN(0x3731/0x0115): NetPrisma LCUK54-WWD for China EDU LCUK54-WWD-LWW(0x3731/0x0116): NetPrisma LCUK54-WWD for Golbal EDU Above products use the exact same interface layout and option driver: MBIM + GNSS + DIAG + NMEA + AT + QDSS + DPL T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 5 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=3731 ProdID=0101 Rev= 5.04 S: Manufacturer=NetPrisma S: Product=LCUK54-WRD S: SerialNumber=b6250c36 C:* #Ifs= 8 Cfg#= 1 Atr=a0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=40 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=70 Driver=(none) E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none) E: Ad=8f(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Mank Wang Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 5c47142c9cbf..9e93c96a69b0 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2333,6 +2333,30 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(0x33f8, 0x0115, 0xff), /* Rolling RW135-GL (laptop MBIM) */ .driver_info = RSVD(5) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for Global */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0101, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WRD for Global SKU */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0101, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0101, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0106, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WRD for China SKU */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0106, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0106, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0111, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for SA */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0111, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0111, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0112, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for EU */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0112, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0112, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0113, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for NA */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0113, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0113, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0115, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for China EDU */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0115, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0115, 0xff, 0xff, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0116, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for Golbal EDU */ + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0116, 0xff, 0x00, 0x40) }, + { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0116, 0xff, 0xff, 0x40) }, { USB_DEVICE_AND_INTERFACE_INFO(OPPO_VENDOR_ID, OPPO_PRODUCT_R11, 0xff, 0xff, 0x30) }, { USB_DEVICE_AND_INTERFACE_INFO(SIERRA_VENDOR_ID, SIERRA_PRODUCT_EM9191, 0xff, 0xff, 0x30) }, { USB_DEVICE_AND_INTERFACE_INFO(SIERRA_VENDOR_ID, SIERRA_PRODUCT_EM9191, 0xff, 0xff, 0x40) }, From 868bc44086299afc152dd40119e3c360c66c19f3 Mon Sep 17 00:00:00 2001 From: Vanillan Wang Date: Fri, 31 May 2024 10:40:12 +0800 Subject: [PATCH 1450/1565] USB: serial: option: add Rolling RW350-GL variants commit ae420771551bd9f04347c59744dd062332bdec3e upstream. Update the USB serial option driver support for the Rolling RW350-GL - VID:PID 33f8:0802, RW350-GL are laptop M.2 cards (with MBIM interfaces for /Linux/Chrome OS) Here are the outputs of usb-devices: usbmode=63: mbim, pipe T: Bus=02 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 2 Spd=5000 MxCh= 0 D: Ver= 3.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=33f8 ProdID=0802 Rev=00.01 S: Manufacturer=Rolling Wireless S.a.r.l. S: Product=USB DATA CARD C: #Ifs= 3 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms usbmode=64: mbim, others at (If#= 5 adb) MBIM(MI0) + GNSS(MI2) + AP log(MI3) + AP META(MI4) + ADB(MI5) + MD AT(MI6) + MD META(MI7) + NPT(MI8) + Debug(MI9) T: Bus=02 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 5 Spd=5000 MxCh= 0 D: Ver= 3.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=33f8 ProdID=0802 Rev=00.01 S: Manufacturer=Rolling Wireless S.a.r.l. S: Product=USB DATA CARD C: #Ifs=10 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=32ms I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs E: Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=06(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=07(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=88(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=08(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms I: If#= 9 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=09(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=8a(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms Signed-off-by: Vanillan Wang Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 9e93c96a69b0..73d97f7bf15b 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -2333,6 +2333,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = RSVD(4) }, { USB_DEVICE_INTERFACE_CLASS(0x33f8, 0x0115, 0xff), /* Rolling RW135-GL (laptop MBIM) */ .driver_info = RSVD(5) }, + { USB_DEVICE_INTERFACE_CLASS(0x33f8, 0x0802, 0xff), /* Rolling RW350-GL (laptop MBIM) */ + .driver_info = RSVD(5) }, { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0xff, 0x30) }, /* NetPrisma LCUK54-WWD for Global */ { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0x00, 0x40) }, { USB_DEVICE_AND_INTERFACE_INFO(0x3731, 0x0100, 0xff, 0xff, 0x40) }, From 932a86a711c722b45ed47ba2103adca34d225b33 Mon Sep 17 00:00:00 2001 From: Dmitry Smirnov Date: Sat, 15 Jun 2024 01:45:56 +0300 Subject: [PATCH 1451/1565] USB: serial: mos7840: fix crash on resume commit c15a688e49987385baa8804bf65d570e362f8576 upstream. Since commit c49cfa917025 ("USB: serial: use generic method if no alternative is provided in usb serial layer"), USB serial core calls the generic resume implementation when the driver has not provided one. This can trigger a crash on resume with mos7840 since support for multiple read URBs was added back in 2011. Specifically, both port read URBs are now submitted on resume for open ports, but the context pointer of the second URB is left set to the core rather than mos7840 port structure. Fix this by implementing dedicated suspend and resume functions for mos7840. Tested with Delock 87414 USB 2.0 to 4x serial adapter. Signed-off-by: Dmitry Smirnov [ johan: analyse crash and rewrite commit message; set busy flag on resume; drop bulk-in check; drop unnecessary usb_kill_urb() ] Fixes: d83b405383c9 ("USB: serial: add support for multiple read urbs") Cc: stable@vger.kernel.org # 3.3 Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/mos7840.c | 45 ++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c index 48a9a0476a45..e568bf566ab0 100644 --- a/drivers/usb/serial/mos7840.c +++ b/drivers/usb/serial/mos7840.c @@ -1764,6 +1764,49 @@ static int mos7840_port_remove(struct usb_serial_port *port) return 0; } +static int mos7840_suspend(struct usb_serial *serial, pm_message_t message) +{ + struct moschip_port *mos7840_port; + struct usb_serial_port *port; + int i; + + for (i = 0; i < serial->num_ports; ++i) { + port = serial->port[i]; + if (!tty_port_initialized(&port->port)) + continue; + + mos7840_port = usb_get_serial_port_data(port); + + usb_kill_urb(mos7840_port->read_urb); + mos7840_port->read_urb_busy = false; + } + + return 0; +} + +static int mos7840_resume(struct usb_serial *serial) +{ + struct moschip_port *mos7840_port; + struct usb_serial_port *port; + int res; + int i; + + for (i = 0; i < serial->num_ports; ++i) { + port = serial->port[i]; + if (!tty_port_initialized(&port->port)) + continue; + + mos7840_port = usb_get_serial_port_data(port); + + mos7840_port->read_urb_busy = true; + res = usb_submit_urb(mos7840_port->read_urb, GFP_NOIO); + if (res) + mos7840_port->read_urb_busy = false; + } + + return 0; +} + static struct usb_serial_driver moschip7840_4port_device = { .driver = { .owner = THIS_MODULE, @@ -1792,6 +1835,8 @@ static struct usb_serial_driver moschip7840_4port_device = { .port_probe = mos7840_port_probe, .port_remove = mos7840_port_remove, .read_bulk_callback = mos7840_bulk_in_callback, + .suspend = mos7840_suspend, + .resume = mos7840_resume, }; static struct usb_serial_driver * const serial_drivers[] = { From 10ae6b364be746929ba9b5fbd3bd9fd7430cb5e5 Mon Sep 17 00:00:00 2001 From: WangYuli Date: Tue, 2 Jul 2024 23:44:08 +0800 Subject: [PATCH 1452/1565] USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k commit 3859e85de30815a20bce7db712ce3d94d40a682d upstream. START BP-850K is a dot matrix printer that crashes when it receives a Set-Interface request and needs USB_QUIRK_NO_SET_INTF to work properly. Cc: stable Signed-off-by: jinxiaobo Signed-off-by: WangYuli Link: https://lore.kernel.org/r/202E4B2BD0F0FEA4+20240702154408.631201-1-wangyuli@uniontech.com Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 856947620f14..3a54d76c55a3 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -504,6 +504,9 @@ static const struct usb_device_id usb_quirk_list[] = { { USB_DEVICE(0x1b1c, 0x1b38), .driver_info = USB_QUIRK_DELAY_INIT | USB_QUIRK_DELAY_CTRL_MSG }, + /* START BP-850k Printer */ + { USB_DEVICE(0x1bc3, 0x0003), .driver_info = USB_QUIRK_NO_SET_INTF }, + /* MIDI keyboard WORLDE MINI */ { USB_DEVICE(0x1c75, 0x0204), .driver_info = USB_QUIRK_CONFIG_INTF_STRINGS }, From e8474a10c535e6a2024c3b06e37e4a3a23beb490 Mon Sep 17 00:00:00 2001 From: Lee Jones Date: Fri, 5 Jul 2024 08:43:39 +0100 Subject: [PATCH 1453/1565] usb: gadget: configfs: Prevent OOB read/write in usb_string_copy() commit 6d3c721e686ea6c59e18289b400cc95c76e927e0 upstream. Userspace provided string 's' could trivially have the length zero. Left unchecked this will firstly result in an OOB read in the form `if (str[0 - 1] == '\n') followed closely by an OOB write in the form `str[0 - 1] = '\0'`. There is already a validating check to catch strings that are too long. Let's supply an additional check for invalid strings that are too short. Signed-off-by: Lee Jones Cc: stable Link: https://lore.kernel.org/r/20240705074339.633717-1-lee@kernel.org Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/configfs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c index d51ea1c052f2..6bb69a4e6470 100644 --- a/drivers/usb/gadget/configfs.c +++ b/drivers/usb/gadget/configfs.c @@ -104,9 +104,12 @@ static int usb_string_copy(const char *s, char **s_copy) int ret; char *str; char *copy = *s_copy; + ret = strlen(s); if (ret > USB_MAX_STRING_LEN) return -EOVERFLOW; + if (ret < 1) + return -EINVAL; if (copy) { str = copy; From d09dd21bb5215d583ca9a1cb1464dbc77a7e88cf Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Thu, 27 Jun 2024 15:56:18 -0400 Subject: [PATCH 1454/1565] USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor commit a368ecde8a5055b627749b09c6218ef793043e47 upstream. Syzbot has identified a bug in usbcore (see the Closes: tag below) caused by our assumption that the reserved bits in an endpoint descriptor's bEndpointAddress field will always be 0. As a result of the bug, the endpoint_is_duplicate() routine in config.c (and possibly other routines as well) may believe that two descriptors are for distinct endpoints, even though they have the same direction and endpoint number. This can lead to confusion, including the bug identified by syzbot (two descriptors with matching endpoint numbers and directions, where one was interrupt and the other was bulk). To fix the bug, we will clear the reserved bits in bEndpointAddress when we parse the descriptor. (Note that both the USB-2.0 and USB-3.1 specs say these bits are "Reserved, reset to zero".) This requires us to make a copy of the descriptor earlier in usb_parse_endpoint() and use the copy instead of the original when checking for duplicates. Signed-off-by: Alan Stern Reported-and-tested-by: syzbot+8693a0bb9c10b554272a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/0000000000003d868e061bc0f554@google.com/ Fixes: 0a8fd1346254 ("USB: fix problems with duplicate endpoint addresses") CC: Oliver Neukum CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/205a5edc-7fef-4159-b64a-80374b6b101a@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/config.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index 2f8a1225b697..1508e0f00dbc 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -291,6 +291,20 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, if (ifp->desc.bNumEndpoints >= num_ep) goto skip_to_next_endpoint_or_interface_descriptor; + /* Save a copy of the descriptor and use it instead of the original */ + endpoint = &ifp->endpoint[ifp->desc.bNumEndpoints]; + memcpy(&endpoint->desc, d, n); + d = &endpoint->desc; + + /* Clear the reserved bits in bEndpointAddress */ + i = d->bEndpointAddress & + (USB_ENDPOINT_DIR_MASK | USB_ENDPOINT_NUMBER_MASK); + if (i != d->bEndpointAddress) { + dev_notice(ddev, "config %d interface %d altsetting %d has an endpoint descriptor with address 0x%X, changing to 0x%X\n", + cfgno, inum, asnum, d->bEndpointAddress, i); + endpoint->desc.bEndpointAddress = i; + } + /* Check for duplicate endpoint addresses */ if (config_endpoint_is_duplicate(config, inum, asnum, d)) { dev_notice(ddev, "config %d interface %d altsetting %d has a duplicate endpoint with address 0x%X, skipping\n", @@ -308,10 +322,8 @@ static int usb_parse_endpoint(struct device *ddev, int cfgno, } } - endpoint = &ifp->endpoint[ifp->desc.bNumEndpoints]; + /* Accept this endpoint */ ++ifp->desc.bNumEndpoints; - - memcpy(&endpoint->desc, d, n); INIT_LIST_HEAD(&endpoint->urb_list); /* From bdb9c58e8048d4d81e7b5bcfd49bd88bf0c8be67 Mon Sep 17 00:00:00 2001 From: He Zhe Date: Thu, 6 Jun 2024 20:39:08 +0800 Subject: [PATCH 1455/1565] hpet: Support 32-bit userspace commit 4e60131d0d36af65ab9c9144f4f163fe97ae36e8 upstream. hpet_compat_ioctl and read file operations failed to handle parameters from 32-bit userspace and thus samples/timers/hpet_example.c fails as below. root@intel-x86-64:~# ./hpet_example-32.out poll /dev/hpet 1 2 -hpet: executing poll hpet_poll: HPET_IRQFREQ failed This patch fixes cmd and arg handling in hpet_compat_ioctl and adds compat handling for 32-bit userspace in hpet_read. hpet_example now shows that it works for both 64-bit and 32-bit. root@intel-x86-64:~# ./hpet_example-32.out poll /dev/hpet 1 2 -hpet: executing poll hpet_poll: info.hi_flags 0x0 hpet_poll: expired time = 0xf4298 hpet_poll: revents = 0x1 hpet_poll: data 0x1 hpet_poll: expired time = 0xf4235 hpet_poll: revents = 0x1 hpet_poll: data 0x1 root@intel-x86-64:~# ./hpet_example-64.out poll /dev/hpet 1 2 -hpet: executing poll hpet_poll: info.hi_flags 0x0 hpet_poll: expired time = 0xf42a1 hpet_poll: revents = 0x1 hpet_poll: data 0x1 hpet_poll: expired time = 0xf4232 hpet_poll: revents = 0x1 hpet_poll: data 0x1 Cc: stable@vger.kernel.org Signed-off-by: He Zhe Fixes: 54066a57c584 ("hpet: kill BKL, add compat_ioctl") Reviewed-by: Arnd Bergmann Link: https://lore.kernel.org/r/20240606123908.738733-1-zhe.he@windriver.com Signed-off-by: Greg Kroah-Hartman --- drivers/char/hpet.c | 34 +++++++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 5 deletions(-) diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c index 8b55085650ad..900d1da0075e 100644 --- a/drivers/char/hpet.c +++ b/drivers/char/hpet.c @@ -304,8 +304,13 @@ hpet_read(struct file *file, char __user *buf, size_t count, loff_t * ppos) if (!devp->hd_ireqfreq) return -EIO; - if (count < sizeof(unsigned long)) - return -EINVAL; + if (in_compat_syscall()) { + if (count < sizeof(compat_ulong_t)) + return -EINVAL; + } else { + if (count < sizeof(unsigned long)) + return -EINVAL; + } add_wait_queue(&devp->hd_waitqueue, &wait); @@ -329,9 +334,16 @@ hpet_read(struct file *file, char __user *buf, size_t count, loff_t * ppos) schedule(); } - retval = put_user(data, (unsigned long __user *)buf); - if (!retval) - retval = sizeof(unsigned long); + if (in_compat_syscall()) { + retval = put_user(data, (compat_ulong_t __user *)buf); + if (!retval) + retval = sizeof(compat_ulong_t); + } else { + retval = put_user(data, (unsigned long __user *)buf); + if (!retval) + retval = sizeof(unsigned long); + } + out: __set_current_state(TASK_RUNNING); remove_wait_queue(&devp->hd_waitqueue, &wait); @@ -686,12 +698,24 @@ struct compat_hpet_info { unsigned short hi_timer; }; +/* 32-bit types would lead to different command codes which should be + * translated into 64-bit ones before passed to hpet_ioctl_common + */ +#define COMPAT_HPET_INFO _IOR('h', 0x03, struct compat_hpet_info) +#define COMPAT_HPET_IRQFREQ _IOW('h', 0x6, compat_ulong_t) + static long hpet_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { struct hpet_info info; int err; + if (cmd == COMPAT_HPET_INFO) + cmd = HPET_INFO; + + if (cmd == COMPAT_HPET_IRQFREQ) + cmd = HPET_IRQFREQ; + mutex_lock(&hpet_mutex); err = hpet_ioctl_common(file->private_data, cmd, arg, &info); mutex_unlock(&hpet_mutex); From 781092884262cc7e87e3ab45fea8652013dc2209 Mon Sep 17 00:00:00 2001 From: Joy Chakraborty Date: Fri, 28 Jun 2024 12:37:02 +0100 Subject: [PATCH 1456/1565] nvmem: meson-efuse: Fix return value of nvmem callbacks commit 7a0a6d0a7c805f9380381f4deedffdf87b93f408 upstream. Read/write callbacks registered with nvmem core expect 0 to be returned on success and a negative value to be returned on failure. meson_efuse_read() and meson_efuse_write() call into meson_sm_call_read() and meson_sm_call_write() respectively which return the number of bytes read or written on success as per their api description. Fix to return error if meson_sm_call_read()/meson_sm_call_write() returns an error else return 0. Fixes: a29a63bdaf6f ("nvmem: meson-efuse: simplify read callback") Cc: stable@vger.kernel.org Signed-off-by: Joy Chakraborty Reviewed-by: Dan Carpenter Reviewed-by: Neil Armstrong Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20240628113704.13742-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- drivers/nvmem/meson-efuse.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/nvmem/meson-efuse.c b/drivers/nvmem/meson-efuse.c index ba2714bef8d0..cf1b249e67ca 100644 --- a/drivers/nvmem/meson-efuse.c +++ b/drivers/nvmem/meson-efuse.c @@ -18,18 +18,24 @@ static int meson_efuse_read(void *context, unsigned int offset, void *val, size_t bytes) { struct meson_sm_firmware *fw = context; + int ret; - return meson_sm_call_read(fw, (u8 *)val, bytes, SM_EFUSE_READ, offset, - bytes, 0, 0, 0); + ret = meson_sm_call_read(fw, (u8 *)val, bytes, SM_EFUSE_READ, offset, + bytes, 0, 0, 0); + + return ret < 0 ? ret : 0; } static int meson_efuse_write(void *context, unsigned int offset, void *val, size_t bytes) { struct meson_sm_firmware *fw = context; + int ret; - return meson_sm_call_write(fw, (u8 *)val, bytes, SM_EFUSE_WRITE, offset, - bytes, 0, 0, 0); + ret = meson_sm_call_write(fw, (u8 *)val, bytes, SM_EFUSE_WRITE, offset, + bytes, 0, 0, 0); + + return ret < 0 ? ret : 0; } static const struct of_device_id meson_efuse_match[] = { From 4d62aa6247219f5d791a3d3ca8fc3a2c4eca0527 Mon Sep 17 00:00:00 2001 From: Nazar Bilinskyi Date: Tue, 9 Jul 2024 11:05:46 +0300 Subject: [PATCH 1457/1565] ALSA: hda/realtek: Enable Mute LED on HP 250 G7 commit b46953029c52bd3a3306ff79f631418b75384656 upstream. HP 250 G7 has a mute LED that can be made to work using quirk ALC269_FIXUP_HP_LINE1_MIC1_LED. Enable already existing quirk. Signed-off-by: Nazar Bilinskyi Cc: Link: https://patch.msgid.link/20240709080546.18344-1-nbilinskyi@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index c2374556c17e..1907b71f6871 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9057,6 +9057,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x83b9, "HP Spectre x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x841c, "HP Pavilion 15-CK0xx", ALC269_FIXUP_HP_MUTE_LED_MIC3), SND_PCI_QUIRK(0x103c, 0x8497, "HP Envy x360", ALC269_FIXUP_HP_MUTE_LED_MIC3), + SND_PCI_QUIRK(0x103c, 0x84a6, "HP 250 G7 Notebook PC", ALC269_FIXUP_HP_LINE1_MIC1_LED), SND_PCI_QUIRK(0x103c, 0x84ae, "HP 15-db0403ng", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x84da, "HP OMEN dc0019-ur", ALC295_FIXUP_HP_OMEN), SND_PCI_QUIRK(0x103c, 0x84e7, "HP Pavilion 15", ALC269_FIXUP_HP_MUTE_LED_MIC3), From f70b51a3656255c36010242e50d63d804f8b5c5a Mon Sep 17 00:00:00 2001 From: Edson Juliano Drosdeck Date: Fri, 5 Jul 2024 11:10:12 -0300 Subject: [PATCH 1458/1565] ALSA: hda/realtek: Limit mic boost on VAIO PRO PX commit 6db03b1929e207d2c6e84e75a9cd78124b3d6c6d upstream. The internal mic boost on the VAIO models VJFE-CL and VJFE-IL is too high. Fix this by applying the ALC269_FIXUP_LIMIT_INT_MIC_BOOST fixup to the machine to limit the gain. Signed-off-by: Edson Juliano Drosdeck Cc: Link: https://patch.msgid.link/20240705141012.5368-1-edson.drosdeck@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 1907b71f6871..669937bae570 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9190,6 +9190,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x10cf, 0x1845, "Lifebook U904", ALC269_FIXUP_LIFEBOOK_EXTMIC), SND_PCI_QUIRK(0x10ec, 0x10f2, "Intel Reference board", ALC700_FIXUP_INTEL_REFERENCE), SND_PCI_QUIRK(0x10ec, 0x118c, "Medion EE4254 MD62100", ALC256_FIXUP_MEDION_HEADSET_NO_PRESENCE), + SND_PCI_QUIRK(0x10ec, 0x11bc, "VAIO VJFE-IL", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x10ec, 0x1230, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x124c, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x1252, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), @@ -9397,6 +9398,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1945, "Redmi G", ALC256_FIXUP_ASUS_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1947, "RedmiBook Air", ALC255_FIXUP_XIAOMI_HEADSET_MIC), + SND_PCI_QUIRK(0x2782, 0x0214, "VAIO VJFE-CL", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x2782, 0x0232, "CHUWI CoreBook XPro", ALC269VB_FIXUP_CHUWI_COREBOOK_XPRO), SND_PCI_QUIRK(0x2782, 0x1707, "Vaio VJFE-ADL", ALC298_FIXUP_SPK_VOLUME), SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC), From 34b76d1922e41da1fa73d43b764cddd82ac9733c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Mon, 8 Jul 2024 22:37:29 +0200 Subject: [PATCH 1459/1565] libceph: fix race between delayed_work() and ceph_monc_stop() commit 69c7b2fe4c9cc1d3b1186d1c5606627ecf0de883 upstream. The way the delayed work is handled in ceph_monc_stop() is prone to races with mon_fault() and possibly also finish_hunting(). Both of these can requeue the delayed work which wouldn't be canceled by any of the following code in case that happens after cancel_delayed_work_sync() runs -- __close_session() doesn't mess with the delayed work in order to avoid interfering with the hunting interval logic. This part was missed in commit b5d91704f53e ("libceph: behave in mon_fault() if cur_mon < 0") and use-after-free can still ensue on monc and objects that hang off of it, with monc->auth and monc->monmap being particularly susceptible to quickly being reused. To fix this: - clear monc->cur_mon and monc->hunting as part of closing the session in ceph_monc_stop() - bail from delayed_work() if monc->cur_mon is cleared, similar to how it's done in mon_fault() and finish_hunting() (based on monc->hunting) - call cancel_delayed_work_sync() after the session is closed Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/66857 Signed-off-by: Ilya Dryomov Reviewed-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman --- net/ceph/mon_client.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c index ef5c174102d5..82d3f1204ee4 100644 --- a/net/ceph/mon_client.c +++ b/net/ceph/mon_client.c @@ -1014,13 +1014,19 @@ static void delayed_work(struct work_struct *work) struct ceph_mon_client *monc = container_of(work, struct ceph_mon_client, delayed_work.work); - dout("monc delayed_work\n"); mutex_lock(&monc->mutex); + dout("%s mon%d\n", __func__, monc->cur_mon); + if (monc->cur_mon < 0) { + goto out; + } + if (monc->hunting) { dout("%s continuing hunt\n", __func__); reopen_session(monc); } else { int is_auth = ceph_auth_is_authenticated(monc->auth); + + dout("%s is_authed %d\n", __func__, is_auth); if (ceph_con_keepalive_expired(&monc->con, CEPH_MONC_PING_TIMEOUT)) { dout("monc keepalive timeout\n"); @@ -1045,6 +1051,8 @@ static void delayed_work(struct work_struct *work) } } __schedule_delayed(monc); + +out: mutex_unlock(&monc->mutex); } @@ -1157,13 +1165,15 @@ EXPORT_SYMBOL(ceph_monc_init); void ceph_monc_stop(struct ceph_mon_client *monc) { dout("stop\n"); - cancel_delayed_work_sync(&monc->delayed_work); mutex_lock(&monc->mutex); __close_session(monc); + monc->hunting = false; monc->cur_mon = -1; mutex_unlock(&monc->mutex); + cancel_delayed_work_sync(&monc->delayed_work); + /* * flush msgr queue before we destroy ourselves to ensure that: * - any work that references our embedded con is finished. From ae630de24efb123d7199a43256396d7758f4cb75 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Thu, 4 Jul 2024 17:45:15 +0200 Subject: [PATCH 1460/1565] wireguard: allowedips: avoid unaligned 64-bit memory accesses commit 948f991c62a4018fb81d85804eeab3029c6209f8 upstream. On the parisc platform, the kernel issues kernel warnings because swap_endian() tries to load a 128-bit IPv6 address from an unaligned memory location: Kernel: unaligned access to 0x55f4688c in wg_allowedips_insert_v6+0x2c/0x80 [wireguard] (iir 0xf3010df) Kernel: unaligned access to 0x55f46884 in wg_allowedips_insert_v6+0x38/0x80 [wireguard] (iir 0xf2010dc) Avoid such unaligned memory accesses by instead using the get_unaligned_be64() helper macro. Signed-off-by: Helge Deller [Jason: replace src[8] in original patch with src+8] Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld Link: https://patch.msgid.link/20240704154517.1572127-3-Jason@zx2c4.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireguard/allowedips.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c index 0ba714ca5185..4b8528206cc8 100644 --- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -15,8 +15,8 @@ static void swap_endian(u8 *dst, const u8 *src, u8 bits) if (bits == 32) { *(u32 *)dst = be32_to_cpu(*(const __be32 *)src); } else if (bits == 128) { - ((u64 *)dst)[0] = be64_to_cpu(((const __be64 *)src)[0]); - ((u64 *)dst)[1] = be64_to_cpu(((const __be64 *)src)[1]); + ((u64 *)dst)[0] = get_unaligned_be64(src); + ((u64 *)dst)[1] = get_unaligned_be64(src + 8); } } From 0bdb5a74443f18ac20cc41a54615663dd9471912 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 4 Jul 2024 17:45:16 +0200 Subject: [PATCH 1461/1565] wireguard: queueing: annotate intentional data race in cpu round robin commit 2fe3d6d2053c57f2eae5e85ca1656d185ebbe4e8 upstream. KCSAN reports a race in the CPU round robin function, which, as the comment points out, is intentional: BUG: KCSAN: data-race in wg_packet_send_staged_packets / wg_packet_send_staged_packets read to 0xffff88811254eb28 of 4 bytes by task 3160 on cpu 1: wg_cpumask_next_online drivers/net/wireguard/queueing.h:127 [inline] wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline] wg_packet_create_data drivers/net/wireguard/send.c:320 [inline] wg_packet_send_staged_packets+0x60e/0xac0 drivers/net/wireguard/send.c:388 wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:3248 [inline] process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329 worker_thread+0x526/0x720 kernel/workqueue.c:3409 kthread+0x1d1/0x210 kernel/kthread.c:389 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 write to 0xffff88811254eb28 of 4 bytes by task 3158 on cpu 0: wg_cpumask_next_online drivers/net/wireguard/queueing.h:130 [inline] wg_queue_enqueue_per_device_and_peer drivers/net/wireguard/queueing.h:173 [inline] wg_packet_create_data drivers/net/wireguard/send.c:320 [inline] wg_packet_send_staged_packets+0x6e5/0xac0 drivers/net/wireguard/send.c:388 wg_packet_send_keepalive+0xe2/0x100 drivers/net/wireguard/send.c:239 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x449/0x5f0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:3248 [inline] process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3329 worker_thread+0x526/0x720 kernel/workqueue.c:3409 kthread+0x1d1/0x210 kernel/kthread.c:389 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 value changed: 0xffffffff -> 0x00000000 Mark this race as intentional by using READ/WRITE_ONCE(). Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld Link: https://patch.msgid.link/20240704154517.1572127-4-Jason@zx2c4.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireguard/queueing.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h index a2e702f8c582..5d2480d90fdb 100644 --- a/drivers/net/wireguard/queueing.h +++ b/drivers/net/wireguard/queueing.h @@ -126,10 +126,10 @@ static inline int wg_cpumask_choose_online(int *stored_cpu, unsigned int id) */ static inline int wg_cpumask_next_online(int *last_cpu) { - int cpu = cpumask_next(*last_cpu, cpu_online_mask); + int cpu = cpumask_next(READ_ONCE(*last_cpu), cpu_online_mask); if (cpu >= nr_cpu_ids) cpu = cpumask_first(cpu_online_mask); - *last_cpu = cpu; + WRITE_ONCE(*last_cpu, cpu); return cpu; } From b6d942365dbe1dc18cd5293886980f34a59450e8 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 4 Jul 2024 17:45:17 +0200 Subject: [PATCH 1462/1565] wireguard: send: annotate intentional data race in checking empty queue commit 381a7d453fa2ac5f854a154d3c9b1bbb90c4f94f upstream. KCSAN reports a race in wg_packet_send_keepalive, which is intentional: BUG: KCSAN: data-race in wg_packet_send_keepalive / wg_packet_send_staged_packets write to 0xffff88814cd91280 of 8 bytes by task 3194 on cpu 0: __skb_queue_head_init include/linux/skbuff.h:2162 [inline] skb_queue_splice_init include/linux/skbuff.h:2248 [inline] wg_packet_send_staged_packets+0xe5/0xad0 drivers/net/wireguard/send.c:351 wg_xmit+0x5b8/0x660 drivers/net/wireguard/device.c:218 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3564 __dev_queue_xmit+0xeff/0x1d80 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x231/0x2a0 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0xa66/0xce0 net/ipv6/ip6_output.c:137 ip6_finish_output+0x1a5/0x490 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ndisc_send_skb+0x4a2/0x670 net/ipv6/ndisc.c:509 ndisc_send_rs+0x3ab/0x3e0 net/ipv6/ndisc.c:719 addrconf_dad_completed+0x640/0x8e0 net/ipv6/addrconf.c:4295 addrconf_dad_work+0x891/0xbc0 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 read to 0xffff88814cd91280 of 8 bytes by task 3202 on cpu 1: skb_queue_empty include/linux/skbuff.h:1798 [inline] wg_packet_send_keepalive+0x20/0x100 drivers/net/wireguard/send.c:225 wg_receive_handshake_packet drivers/net/wireguard/receive.c:186 [inline] wg_packet_handshake_receive_worker+0x445/0x5e0 drivers/net/wireguard/receive.c:213 process_one_work kernel/workqueue.c:2633 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2706 worker_thread+0x525/0x730 kernel/workqueue.c:2787 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242 value changed: 0xffff888148fef200 -> 0xffff88814cd91280 Mark this race as intentional by using the skb_queue_empty_lockless() function rather than skb_queue_empty(), which uses READ_ONCE() internally to annotate the race. Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld Link: https://patch.msgid.link/20240704154517.1572127-5-Jason@zx2c4.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireguard/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c index 0d48e0f4a1ba..26e09c30d596 100644 --- a/drivers/net/wireguard/send.c +++ b/drivers/net/wireguard/send.c @@ -222,7 +222,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer) { struct sk_buff *skb; - if (skb_queue_empty(&peer->staged_packet_queue)) { + if (skb_queue_empty_lockless(&peer->staged_packet_queue)) { skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, GFP_ATOMIC); if (unlikely(!skb)) From 596dedc6fa8927b63ddf983e4e3a6042d821c8ff Mon Sep 17 00:00:00 2001 From: Jim Mattson Date: Tue, 9 Jul 2024 06:20:46 -0700 Subject: [PATCH 1463/1565] x86/retpoline: Move a NOENDBR annotation to the SRSO dummy return thunk The linux-5.10-y backport of commit b377c66ae350 ("x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk") misplaced the new NOENDBR annotation, repeating the annotation on __x86_return_thunk, rather than adding the annotation to the !CONFIG_CPU_SRSO version of srso_alias_untrain_ret, as intended. Move the annotation to the right place. Fixes: 0bdc64e9e716 ("x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk") Reported-by: Greg Thelen Signed-off-by: Jim Mattson Acked-by: Borislav Petkov (AMD) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman --- arch/x86/lib/retpoline.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S index ab9b047790dd..d1902213a0d6 100644 --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -105,6 +105,7 @@ __EXPORT_THUNK(srso_alias_untrain_ret) /* dummy definition for alternatives */ SYM_START(srso_alias_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE) ANNOTATE_UNRET_SAFE + ANNOTATE_NOENDBR ret int3 SYM_FUNC_END(srso_alias_untrain_ret) @@ -258,7 +259,6 @@ SYM_CODE_START(__x86_return_thunk) UNWIND_HINT_FUNC ANNOTATE_NOENDBR ANNOTATE_UNRET_SAFE - ANNOTATE_NOENDBR ret int3 SYM_CODE_END(__x86_return_thunk) From 96c58b0966599ca9f553452510d328d49c018327 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 18 Jan 2021 13:38:42 +0100 Subject: [PATCH 1464/1565] efi: ia64: move IA64-only declarations to new asm/efi.h header commit 8ff059b8531f3b98e14f0461859fc7cdd95823e4 upstream. Move some EFI related declarations that are only referenced on IA64 to a new asm/efi.h arch header. Cc: Tony Luck Cc: Fenghua Yu Signed-off-by: Ard Biesheuvel Cc: Frank Scheiner Signed-off-by: Greg Kroah-Hartman --- arch/ia64/include/asm/efi.h | 13 +++++++++++++ arch/ia64/kernel/efi.c | 1 + arch/ia64/kernel/machine_kexec.c | 1 + arch/ia64/kernel/mca.c | 1 + arch/ia64/kernel/smpboot.c | 1 + arch/ia64/kernel/time.c | 1 + arch/ia64/kernel/uncached.c | 4 +--- arch/ia64/mm/contig.c | 1 + arch/ia64/mm/discontig.c | 1 + arch/ia64/mm/init.c | 1 + include/linux/efi.h | 6 ------ 11 files changed, 22 insertions(+), 9 deletions(-) create mode 100644 arch/ia64/include/asm/efi.h diff --git a/arch/ia64/include/asm/efi.h b/arch/ia64/include/asm/efi.h new file mode 100644 index 000000000000..6a4a50d8f19a --- /dev/null +++ b/arch/ia64/include/asm/efi.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_EFI_H +#define _ASM_EFI_H + +typedef int (*efi_freemem_callback_t) (u64 start, u64 end, void *arg); + +void *efi_get_pal_addr(void); +void efi_map_pal_code(void); +void efi_memmap_walk(efi_freemem_callback_t, void *); +void efi_memmap_walk_uc(efi_freemem_callback_t, void *); +void efi_gettimeofday(struct timespec64 *ts); + +#endif diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c index 33282f33466e..4707c5ee6692 100644 --- a/arch/ia64/kernel/efi.c +++ b/arch/ia64/kernel/efi.c @@ -34,6 +34,7 @@ #include #include +#include #include #include #include diff --git a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c index efc9b568401c..af310dc8a356 100644 --- a/arch/ia64/kernel/machine_kexec.c +++ b/arch/ia64/kernel/machine_kexec.c @@ -16,6 +16,7 @@ #include #include +#include #include #include #include diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index bd0a51dc345a..d9f51f0b33a7 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -91,6 +91,7 @@ #include #include +#include #include #include #include diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c index 0cad990385c0..d10f780c13b9 100644 --- a/arch/ia64/kernel/smpboot.c +++ b/arch/ia64/kernel/smpboot.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index 7abc5f37bfaf..b53d4688d5a7 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -26,6 +26,7 @@ #include #include +#include #include #include #include diff --git a/arch/ia64/kernel/uncached.c b/arch/ia64/kernel/uncached.c index 0750f367837d..51883a66aeb5 100644 --- a/arch/ia64/kernel/uncached.c +++ b/arch/ia64/kernel/uncached.c @@ -20,14 +20,12 @@ #include #include #include +#include #include #include #include #include - -extern void __init efi_memmap_walk_uc(efi_freemem_callback_t, void *); - struct uncached_pool { struct gen_pool *pool; struct mutex add_chunk_mutex; /* serialize adding a converted chunk */ diff --git a/arch/ia64/mm/contig.c b/arch/ia64/mm/contig.c index c638e012ad05..11c82d4d4f7c 100644 --- a/arch/ia64/mm/contig.c +++ b/arch/ia64/mm/contig.c @@ -20,6 +20,7 @@ #include #include +#include #include #include #include diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c index 4d0813419013..fc3c77fb2597 100644 --- a/arch/ia64/mm/discontig.c +++ b/arch/ia64/mm/discontig.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index f316a833b703..85e4e9ef027a 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -27,6 +27,7 @@ #include #include +#include #include #include #include diff --git a/include/linux/efi.h b/include/linux/efi.h index 2fd321da5c4b..5554d26f91d8 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -171,8 +171,6 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, size_t hdr_bytes); int __efi_capsule_setup_info(struct capsule_info *cap_info); -typedef int (*efi_freemem_callback_t) (u64 start, u64 end, void *arg); - /* * Types and defines for Time Services */ @@ -609,10 +607,6 @@ efi_guid_to_str(efi_guid_t *guid, char *out) } extern void efi_init (void); -extern void *efi_get_pal_addr (void); -extern void efi_map_pal_code (void); -extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg); -extern void efi_gettimeofday (struct timespec64 *ts); #ifdef CONFIG_EFI extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */ #else From 5edef7986495acad703270b07a1ef39e0c0465ee Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 28 Feb 2024 13:54:26 +0000 Subject: [PATCH 1465/1565] ipv6: annotate data-races around cnf.disable_ipv6 commit d289ab65b89c1d4d88417cb6c03e923f21f95fae upstream. disable_ipv6 is read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. v2: do not preload net before rtnl_trylock() in addrconf_disable_ipv6() (Jiri) Signed-off-by: Eric Dumazet Reviewed-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Sasha Levin [Ashwin: Regenerated the Patch for v5.10] Signed-off-by: Ashwin Dayanand Kamat Signed-off-by: Greg Kroah-Hartman --- net/ipv6/addrconf.c | 9 +++++---- net/ipv6/ip6_input.c | 2 +- net/ipv6/ip6_output.c | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 8a6f4cdd5a48..ac09d4543f3e 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -4107,7 +4107,7 @@ static void addrconf_dad_work(struct work_struct *w) if (!ipv6_generate_eui64(addr.s6_addr + 8, idev->dev) && ipv6_addr_equal(&ifp->addr, &addr)) { /* DAD failed for link-local based on MAC */ - idev->cnf.disable_ipv6 = 1; + WRITE_ONCE(idev->cnf.disable_ipv6, 1); pr_info("%s: IPv6 being disabled!\n", ifp->idev->dev->name); @@ -6220,7 +6220,8 @@ static void addrconf_disable_change(struct net *net, __s32 newf) idev = __in6_dev_get(dev); if (idev) { int changed = (!idev->cnf.disable_ipv6) ^ (!newf); - idev->cnf.disable_ipv6 = newf; + + WRITE_ONCE(idev->cnf.disable_ipv6, newf); if (changed) dev_disable_change(idev); } @@ -6237,7 +6238,7 @@ static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf) net = (struct net *)table->extra2; old = *p; - *p = newf; + WRITE_ONCE(*p, newf); if (p == &net->ipv6.devconf_dflt->disable_ipv6) { rtnl_unlock(); @@ -6245,7 +6246,7 @@ static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf) } if (p == &net->ipv6.devconf_all->disable_ipv6) { - net->ipv6.devconf_dflt->disable_ipv6 = newf; + WRITE_ONCE(net->ipv6.devconf_dflt->disable_ipv6, newf); addrconf_disable_change(net, newf); } else if ((!newf) ^ (!old)) dev_disable_change((struct inet6_dev *)table->extra1); diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 4eb9fbfdce33..8cf5b10eee69 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -165,7 +165,7 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev, __IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_IN, skb->len); if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL || - !idev || unlikely(idev->cnf.disable_ipv6)) { + !idev || unlikely(READ_ONCE(idev->cnf.disable_ipv6))) { __IP6_INC_STATS(net, idev, IPSTATS_MIB_INDISCARDS); goto drop; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 4126be15e0d9..2b55bf0d3e04 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -240,7 +240,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb) skb->protocol = htons(ETH_P_IPV6); skb->dev = dev; - if (unlikely(idev->cnf.disable_ipv6)) { + if (unlikely(READ_ONCE(idev->cnf.disable_ipv6))) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); kfree_skb(skb); return 0; From 9df3b2474a627994433a87cbf325a562555b17de Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 7 May 2024 16:18:42 +0000 Subject: [PATCH 1466/1565] ipv6: prevent NULL dereference in ip6_output() commit 4db783d68b9b39a411a96096c10828ff5dfada7a upstream. According to syzbot, there is a chance that ip6_dst_idev() returns NULL in ip6_output(). Most places in IPv6 stack deal with a NULL idev just fine, but not here. syzbot reported: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237 Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000 RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48 RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0 R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000 FS: 00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: NF_HOOK include/linux/netfilter.h:314 [inline] ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358 sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248 sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653 sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783 sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline] sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212 sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169 sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73 __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 778d80be5269 ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.") Reported-by: syzbot Signed-off-by: Eric Dumazet Reviewed-by: Larysa Zaremba Link: https://lore.kernel.org/r/20240507161842.773961-1-edumazet@google.com Signed-off-by: Jakub Kicinski [Ashwin: Regenerated the Patch for v5.10] Signed-off-by: Ashwin Dayanand Kamat Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 2b55bf0d3e04..32512b8ca5e7 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -240,7 +240,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb) skb->protocol = htons(ETH_P_IPV6); skb->dev = dev; - if (unlikely(READ_ONCE(idev->cnf.disable_ipv6))) { + if (unlikely(!idev || READ_ONCE(idev->cnf.disable_ipv6))) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); kfree_skb(skb); return 0; From ca42be8dd1e217273dd8e9077d4ec9117eaa0ef3 Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Sun, 19 Feb 2023 22:04:26 +0200 Subject: [PATCH 1467/1565] bpf: Allow reads from uninit stack commit 6715df8d5d24655b9fd368e904028112b54c7de1 upstream. This commits updates the following functions to allow reads from uninitialized stack locations when env->allow_uninit_stack option is enabled: - check_stack_read_fixed_off() - check_stack_range_initialized(), called from: - check_stack_read_var_off() - check_helper_mem_access() Such change allows to relax logic in stacksafe() to treat STACK_MISC and STACK_INVALID in a same way and make the following stack slot configurations equivalent: | Cached state | Current state | | stack slot | stack slot | |------------------+------------------| | STACK_INVALID or | STACK_INVALID or | | STACK_MISC | STACK_SPILL or | | | STACK_MISC or | | | STACK_ZERO or | | | STACK_DYNPTR | This leads to significant verification speed gains (see below). The idea was suggested by Andrii Nakryiko [1] and initial patch was created by Alexei Starovoitov [2]. Currently the env->allow_uninit_stack is allowed for programs loaded by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities. A number of test cases from verifier/*.c were expecting uninitialized stack access to be an error. These test cases were updated to execute in unprivileged mode (thus preserving the tests). The test progs/test_global_func10.c expected "invalid indirect read from stack" error message because of the access to uninitialized memory region. This error is no longer possible in privileged mode. The test is updated to provoke an error "invalid indirect access to stack" because of access to invalid stack address (such error is not verified by progs/test_global_func*.c series of tests). The following tests had to be removed because these can't be made unprivileged: - verifier/sock.c: - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value" BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode. - verifier/var_off.c: - "indirect variable-offset stack access, max_off+size > max_initialized" - "indirect variable-offset stack access, uninitialized" These tests verify that access to uninitialized stack values is detected when stack offset is not a constant. However, variable stack access is prohibited in unprivileged mode, thus these tests are no longer valid. * * * Here is veristat log comparing this patch with current master on a set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg and cilium BPF binaries (see [3]): $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log File Program States (A) States (B) States (DIFF) -------------------------- -------------------------- ---------- ---------- ---------------- bpf_host.o tail_handle_ipv6_from_host 349 244 -105 (-30.09%) bpf_host.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_lxc.o tail_handle_nat_fwd_ipv4 1320 895 -425 (-32.20%) bpf_sock.o cil_sock4_connect 70 48 -22 (-31.43%) bpf_sock.o cil_sock4_sendmsg 68 46 -22 (-32.35%) bpf_xdp.o tail_handle_nat_fwd_ipv4 1554 803 -751 (-48.33%) bpf_xdp.o tail_lb_ipv4 6457 2473 -3984 (-61.70%) bpf_xdp.o tail_lb_ipv6 7249 3908 -3341 (-46.09%) pyperf600_bpf_loop.bpf.o on_event 287 145 -142 (-49.48%) strobemeta.bpf.o on_event 15915 4772 -11143 (-70.02%) strobemeta_nounroll2.bpf.o on_event 17087 3820 -13267 (-77.64%) xdp_synproxy_kern.bpf.o syncookie_tc 21271 6635 -14636 (-68.81%) xdp_synproxy_kern.bpf.o syncookie_xdp 23122 6024 -17098 (-73.95%) -------------------------- -------------------------- ---------- ---------- ---------------- Note: I limited selection by states_pct<-30%. Inspection of differences in pyperf600_bpf_loop behavior shows that the following patch for the test removes almost all differences: - a/tools/testing/selftests/bpf/progs/pyperf.h + b/tools/testing/selftests/bpf/progs/pyperf.h @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx) } if (event->pthread_match || !pidData->use_tls) { - void* frame_ptr; - FrameData frame; + void* frame_ptr = 0; + FrameData frame = {}; Symbol sym = {}; int cur_cpu = bpf_get_smp_processor_id(); W/o this patch the difference comes from the following pattern (for different variables): static bool get_frame_data(... FrameData *frame ...) { ... bpf_probe_read_user(&frame->f_code, ...); if (!frame->f_code) return false; ... bpf_probe_read_user(&frame->co_name, ...); if (frame->co_name) ...; } int __on_event(struct bpf_raw_tracepoint_args *ctx) { FrameData frame; ... get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback ... } SEC("raw_tracepoint/kfree_skb") int on_event(struct bpf_raw_tracepoint_args* ctx) { ... ret |= __on_event(ctx); ret |= __on_event(ctx); ... } With regards to value `frame->co_name` the following is important: - Because of the conditional `if (!frame->f_code)` each call to __on_event() produces two states, one with `frame->co_name` marked as STACK_MISC, another with it as is (and marked STACK_INVALID on a first call). - The call to bpf_probe_read_user() does not mark stack slots corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks these slots as BPF_MISC, this happens because of the following loop in the check_helper_call(): for (i = 0; i < meta.access_size; i++) { err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B, BPF_WRITE, -1, false); if (err) return err; } Note the size of the write, it is a one byte write for each byte touched by a helper. The BPF_B write does not lead to write marks for the target stack slot. - Which means that w/o this patch when second __on_event() call is verified `if (frame->co_name)` will propagate read marks first to a stack slot with STACK_MISC marks and second to a stack slot with STACK_INVALID marks and these states would be considered different. [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/ [3] git@github.com:anakryiko/cilium.git Suggested-by: Andrii Nakryiko Co-developed-by: Alexei Starovoitov Signed-off-by: Eduard Zingerman Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov Signed-off-by: Maxim Mikityanskiy Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 11 +- .../selftests/bpf/progs/test_global_func10.c | 31 +++ tools/testing/selftests/bpf/verifier/calls.c | 13 +- .../bpf/verifier/helper_access_var_len.c | 104 ++++++--- .../testing/selftests/bpf/verifier/int_ptr.c | 9 +- .../selftests/bpf/verifier/search_pruning.c | 13 +- tools/testing/selftests/bpf/verifier/sock.c | 27 --- .../selftests/bpf/verifier/spill_fill.c | 211 ++++++++++++++++++ .../testing/selftests/bpf/verifier/var_off.c | 52 ----- 9 files changed, 342 insertions(+), 129 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_global_func10.c diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ad115ccc2fe0..60db311480d0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2807,6 +2807,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, continue; if (type == STACK_MISC) continue; + if (type == STACK_INVALID && env->allow_uninit_stack) + continue; verbose(env, "invalid read from stack off %d+%d size %d\n", off, i, size); return -EACCES; @@ -2844,6 +2846,8 @@ static int check_stack_read_fixed_off(struct bpf_verifier_env *env, continue; if (type == STACK_ZERO) continue; + if (type == STACK_INVALID && env->allow_uninit_stack) + continue; verbose(env, "invalid read from stack off %d+%d size %d\n", off, i, size); return -EACCES; @@ -4300,7 +4304,8 @@ static int check_stack_range_initialized( stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE]; if (*stype == STACK_MISC) goto mark; - if (*stype == STACK_ZERO) { + if ((*stype == STACK_ZERO) || + (*stype == STACK_INVALID && env->allow_uninit_stack)) { if (clobber) { /* helper can write anything into the stack */ *stype = STACK_MISC; @@ -9492,6 +9497,10 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, if (old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_INVALID) continue; + if (env->allow_uninit_stack && + old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC) + continue; + /* explored stack has more populated slots than current stack * and these slots were used */ diff --git a/tools/testing/selftests/bpf/progs/test_global_func10.c b/tools/testing/selftests/bpf/progs/test_global_func10.c new file mode 100644 index 000000000000..8fba3f3649e2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_global_func10.c @@ -0,0 +1,31 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include "bpf_misc.h" + +struct Small { + long x; +}; + +struct Big { + long x; + long y; +}; + +__noinline int foo(const struct Big *big) +{ + if (!big) + return 0; + + return bpf_get_prandom_u32() < big->y; +} + +SEC("cgroup_skb/ingress") +__failure __msg("invalid indirect access to stack") +int global_func10(struct __sk_buff *skb) +{ + const struct Small small = {.x = skb->len }; + + return foo((struct Big *)&small) ? 1 : 0; +} diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c index eb888c8479c3..4b0628cd2d03 100644 --- a/tools/testing/selftests/bpf/verifier/calls.c +++ b/tools/testing/selftests/bpf/verifier/calls.c @@ -1948,19 +1948,22 @@ * that fp-8 stack slot was unused in the fall-through * branch and will accept the program incorrectly */ - BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 2, 2), + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), + BPF_JMP_IMM(BPF_JGT, BPF_REG_0, 2, 2), BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_JMP_IMM(BPF_JA, 0, 0, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), + BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .fixup_map_hash_48b = { 6 }, - .errstr = "invalid indirect read from stack R2 off -8+0 size 8", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_XDP, + .fixup_map_hash_48b = { 7 }, + .errstr_unpriv = "invalid indirect read from stack R2 off -8+0 size 8", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "calls: ctx read at start of subprog", diff --git a/tools/testing/selftests/bpf/verifier/helper_access_var_len.c b/tools/testing/selftests/bpf/verifier/helper_access_var_len.c index 0ab7f1dfc97a..0e24aa11c457 100644 --- a/tools/testing/selftests/bpf/verifier/helper_access_var_len.c +++ b/tools/testing/selftests/bpf/verifier/helper_access_var_len.c @@ -29,19 +29,30 @@ { "helper access to variable memory: stack, bitwise AND, zero included", .insns = { - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), - BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128), - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128), - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64), - BPF_MOV64_IMM(BPF_REG_3, 0), - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), + /* set max stack size */ + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), + /* set r3 to a random value */ + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), + /* use bitwise AND to limit r3 range to [0, 64] */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_3, 64), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), + BPF_MOV64_IMM(BPF_REG_4, 0), + /* Call bpf_ringbuf_output(), it is one of a few helper functions with + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. + * For unpriv this should signal an error, because memory at &fp[-64] is + * not initialized. + */ + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), BPF_EXIT_INSN(), }, - .errstr = "invalid indirect read from stack R1 off -64+0 size 64", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_TRACEPOINT, + .fixup_map_ringbuf = { 4 }, + .errstr_unpriv = "invalid indirect read from stack R2 off -64+0 size 64", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "helper access to variable memory: stack, bitwise AND + JMP, wrong max", @@ -183,20 +194,31 @@ { "helper access to variable memory: stack, JMP, no min check", .insns = { - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), - BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128), - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128), - BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 64, 3), - BPF_MOV64_IMM(BPF_REG_3, 0), - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), + /* set max stack size */ + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), + /* set r3 to a random value */ + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), + /* use JMP to limit r3 range to [0, 64] */ + BPF_JMP_IMM(BPF_JGT, BPF_REG_3, 64, 6), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), + BPF_MOV64_IMM(BPF_REG_4, 0), + /* Call bpf_ringbuf_output(), it is one of a few helper functions with + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. + * For unpriv this should signal an error, because memory at &fp[-64] is + * not initialized. + */ + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "invalid indirect read from stack R1 off -64+0 size 64", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_TRACEPOINT, + .fixup_map_ringbuf = { 4 }, + .errstr_unpriv = "invalid indirect read from stack R2 off -64+0 size 64", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "helper access to variable memory: stack, JMP (signed), no min check", @@ -564,29 +586,41 @@ { "helper access to variable memory: 8 bytes leak", .insns = { - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, 8), - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64), + /* set max stack size */ + BPF_ST_MEM(BPF_DW, BPF_REG_10, -128, 0), + /* set r3 to a random value */ + BPF_EMIT_CALL(BPF_FUNC_get_prandom_u32), + BPF_MOV64_REG(BPF_REG_3, BPF_REG_0), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40), + /* Note: fp[-32] left uninitialized */ BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16), BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), - BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128), - BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128), - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 63), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 1), - BPF_MOV64_IMM(BPF_REG_3, 0), - BPF_EMIT_CALL(BPF_FUNC_probe_read_kernel), - BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16), + /* Limit r3 range to [1, 64] */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_3, 63), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, 1), + BPF_MOV64_IMM(BPF_REG_4, 0), + /* Call bpf_ringbuf_output(), it is one of a few helper functions with + * ARG_CONST_SIZE_OR_ZERO parameter allowed in unpriv mode. + * For unpriv this should signal an error, because memory region [1, 64] + * at &fp[-64] is not fully initialized. + */ + BPF_EMIT_CALL(BPF_FUNC_ringbuf_output), + BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .errstr = "invalid indirect read from stack R1 off -64+32 size 64", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_TRACEPOINT, + .fixup_map_ringbuf = { 3 }, + .errstr_unpriv = "invalid indirect read from stack R2 off -64+32 size 64", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "helper access to variable memory: 8 bytes no leak (init memory)", diff --git a/tools/testing/selftests/bpf/verifier/int_ptr.c b/tools/testing/selftests/bpf/verifier/int_ptr.c index 070893fb2900..02d9e004260b 100644 --- a/tools/testing/selftests/bpf/verifier/int_ptr.c +++ b/tools/testing/selftests/bpf/verifier/int_ptr.c @@ -54,12 +54,13 @@ /* bpf_strtoul() */ BPF_EMIT_CALL(BPF_FUNC_strtoul), - BPF_MOV64_IMM(BPF_REG_0, 1), + BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, - .result = REJECT, - .prog_type = BPF_PROG_TYPE_CGROUP_SYSCTL, - .errstr = "invalid indirect read from stack R4 off -16+4 size 8", + .result_unpriv = REJECT, + .errstr_unpriv = "invalid indirect read from stack R4 off -16+4 size 8", + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "ARG_PTR_TO_LONG misaligned", diff --git a/tools/testing/selftests/bpf/verifier/search_pruning.c b/tools/testing/selftests/bpf/verifier/search_pruning.c index 7e36078f8f48..949cbe460248 100644 --- a/tools/testing/selftests/bpf/verifier/search_pruning.c +++ b/tools/testing/selftests/bpf/verifier/search_pruning.c @@ -128,9 +128,10 @@ BPF_EXIT_INSN(), }, .fixup_map_hash_8b = { 3 }, - .errstr = "invalid read from stack off -16+0 size 8", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_TRACEPOINT, + .errstr_unpriv = "invalid read from stack off -16+0 size 8", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, { "allocated_stack", @@ -187,6 +188,8 @@ BPF_EXIT_INSN(), }, .flags = BPF_F_TEST_STATE_FREQ, - .errstr = "invalid read from stack off -8+1 size 8", - .result = REJECT, + .errstr_unpriv = "invalid read from stack off -8+1 size 8", + .result_unpriv = REJECT, + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, }, diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c index 8c224eac93df..59d976d22867 100644 --- a/tools/testing/selftests/bpf/verifier/sock.c +++ b/tools/testing/selftests/bpf/verifier/sock.c @@ -530,33 +530,6 @@ .prog_type = BPF_PROG_TYPE_SCHED_CLS, .result = ACCEPT, }, -{ - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init stack_value", - .insns = { - BPF_MOV64_IMM(BPF_REG_2, 0), - BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), - BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)), - BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2), - BPF_MOV64_IMM(BPF_REG_0, 0), - BPF_EXIT_INSN(), - BPF_EMIT_CALL(BPF_FUNC_sk_fullsock), - BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2), - BPF_MOV64_IMM(BPF_REG_0, 0), - BPF_EXIT_INSN(), - BPF_MOV64_IMM(BPF_REG_4, 1), - BPF_MOV64_REG(BPF_REG_3, BPF_REG_10), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -8), - BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), - BPF_LD_MAP_FD(BPF_REG_1, 0), - BPF_EMIT_CALL(BPF_FUNC_sk_storage_get), - BPF_MOV64_IMM(BPF_REG_0, 0), - BPF_EXIT_INSN(), - }, - .fixup_sk_storage_map = { 14 }, - .prog_type = BPF_PROG_TYPE_SCHED_CLS, - .result = REJECT, - .errstr = "invalid indirect read from stack", -}, { "bpf_map_lookup_elem(smap, &key)", .insns = { diff --git a/tools/testing/selftests/bpf/verifier/spill_fill.c b/tools/testing/selftests/bpf/verifier/spill_fill.c index 0b943897aaf6..1e76841b7bfa 100644 --- a/tools/testing/selftests/bpf/verifier/spill_fill.c +++ b/tools/testing/selftests/bpf/verifier/spill_fill.c @@ -104,3 +104,214 @@ .result = ACCEPT, .retval = POINTER_VALUE, }, +{ + "Spill and refill a u32 const scalar. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + /* r4 = 20 */ + BPF_MOV32_IMM(BPF_REG_4, 20), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = *(u32 *)(r10 -8) */ + BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_10, -8), + /* r0 = r2 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=20 */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_4), + /* if (r0 > r3) R0=pkt,off=20 R2=pkt R3=pkt_end R4=20 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1), + /* r0 = *(u32 *)r2 R0=pkt,off=20,r=20 R2=pkt,r=20 R3=pkt_end R4=20 */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill a u32 const, refill from another half of the uninit u32 from the stack", + .insns = { + /* r4 = 20 */ + BPF_MOV32_IMM(BPF_REG_4, 20), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = *(u32 *)(r10 -4) fp-8=????rrrr*/ + BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_10, -4), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result_unpriv = REJECT, + .errstr_unpriv = "invalid read from stack off -4+0 size 4", + /* in privileged mode reads from uninitialized stack locations are permitted */ + .result = ACCEPT, +}, +{ + "Spill a u32 const scalar. Refill as u16. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + /* r4 = 20 */ + BPF_MOV32_IMM(BPF_REG_4, 20), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = *(u16 *)(r10 -8) */ + BPF_LDX_MEM(BPF_H, BPF_REG_4, BPF_REG_10, -8), + /* r0 = r2 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_4), + /* if (r0 > r3) R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1), + /* r0 = *(u32 *)r2 R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=20 */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "invalid access to packet", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill u32 const scalars. Refill as u64. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + /* r6 = 0 */ + BPF_MOV32_IMM(BPF_REG_6, 0), + /* r7 = 20 */ + BPF_MOV32_IMM(BPF_REG_7, 20), + /* *(u32 *)(r10 -4) = r6 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_6, -4), + /* *(u32 *)(r10 -8) = r7 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7, -8), + /* r4 = *(u64 *)(r10 -8) */ + BPF_LDX_MEM(BPF_H, BPF_REG_4, BPF_REG_10, -8), + /* r0 = r2 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_4), + /* if (r0 > r3) R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1), + /* r0 = *(u32 *)r2 R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=20 */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "invalid access to packet", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill a u32 const scalar. Refill as u16 from fp-6. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + /* r4 = 20 */ + BPF_MOV32_IMM(BPF_REG_4, 20), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = *(u16 *)(r10 -6) */ + BPF_LDX_MEM(BPF_H, BPF_REG_4, BPF_REG_10, -6), + /* r0 = r2 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_4), + /* if (r0 > r3) R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=umax=65535 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1), + /* r0 = *(u32 *)r2 R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=20 */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "invalid access to packet", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill and refill a u32 const scalar at non 8byte aligned stack addr. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + /* r4 = 20 */ + BPF_MOV32_IMM(BPF_REG_4, 20), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* *(u32 *)(r10 -4) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -4), + /* r4 = *(u32 *)(r10 -4), */ + BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_10, -4), + /* r0 = r2 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=U32_MAX */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_4), + /* if (r0 > r3) R0=pkt,umax=U32_MAX R2=pkt R3=pkt_end R4= */ + BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1), + /* r0 = *(u32 *)r2 R0=pkt,umax=U32_MAX R2=pkt R3=pkt_end R4= */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = REJECT, + .errstr = "invalid access to packet", + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill and refill a umax=40 bounded scalar. Offset to skb->data", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, + offsetof(struct __sk_buff, data)), + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, + offsetof(struct __sk_buff, data_end)), + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_1, + offsetof(struct __sk_buff, tstamp)), + BPF_JMP_IMM(BPF_JLE, BPF_REG_4, 40, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + /* *(u32 *)(r10 -8) = r4 R4=umax=40 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = (*u32 *)(r10 - 8) */ + BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_10, -8), + /* r2 += r4 R2=pkt R4=umax=40 */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_4), + /* r0 = r2 R2=pkt,umax=40 R4=umax=40 */ + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), + /* r2 += 20 R0=pkt,umax=40 R2=pkt,umax=40 */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 20), + /* if (r2 > r3) R0=pkt,umax=40 R2=pkt,off=20,umax=40 */ + BPF_JMP_REG(BPF_JGT, BPF_REG_2, BPF_REG_3, 1), + /* r0 = *(u32 *)r0 R0=pkt,r=20,umax=40 R2=pkt,off=20,r=20,umax=40 */ + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, +{ + "Spill a u32 scalar at fp-4 and then at fp-8", + .insns = { + /* r4 = 4321 */ + BPF_MOV32_IMM(BPF_REG_4, 4321), + /* *(u32 *)(r10 -4) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -4), + /* *(u32 *)(r10 -8) = r4 */ + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_4, -8), + /* r4 = *(u64 *)(r10 -8) */ + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, +}, diff --git a/tools/testing/selftests/bpf/verifier/var_off.c b/tools/testing/selftests/bpf/verifier/var_off.c index eab1f7f56e2f..dc92a29f0d74 100644 --- a/tools/testing/selftests/bpf/verifier/var_off.c +++ b/tools/testing/selftests/bpf/verifier/var_off.c @@ -212,31 +212,6 @@ .result = REJECT, .prog_type = BPF_PROG_TYPE_LWT_IN, }, -{ - "indirect variable-offset stack access, max_off+size > max_initialized", - .insns = { - /* Fill only the second from top 8 bytes of the stack. */ - BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0), - /* Get an unknown value. */ - BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0), - /* Make it small and 4-byte aligned. */ - BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4), - BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 16), - /* Add it to fp. We now have either fp-12 or fp-16, but we don't know - * which. fp-12 size 8 is partially uninitialized stack. - */ - BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10), - /* Dereference it indirectly. */ - BPF_LD_MAP_FD(BPF_REG_1, 0), - BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem), - BPF_MOV64_IMM(BPF_REG_0, 0), - BPF_EXIT_INSN(), - }, - .fixup_map_hash_8b = { 5 }, - .errstr = "invalid indirect read from stack R2 var_off", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_LWT_IN, -}, { "indirect variable-offset stack access, min_off < min_initialized", .insns = { @@ -289,33 +264,6 @@ .result = ACCEPT, .prog_type = BPF_PROG_TYPE_CGROUP_SKB, }, -{ - "indirect variable-offset stack access, uninitialized", - .insns = { - BPF_MOV64_IMM(BPF_REG_2, 6), - BPF_MOV64_IMM(BPF_REG_3, 28), - /* Fill the top 16 bytes of the stack. */ - BPF_ST_MEM(BPF_W, BPF_REG_10, -16, 0), - BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), - /* Get an unknown value. */ - BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1, 0), - /* Make it small and 4-byte aligned. */ - BPF_ALU64_IMM(BPF_AND, BPF_REG_4, 4), - BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 16), - /* Add it to fp. We now have either fp-12 or fp-16, we don't know - * which, but either way it points to initialized stack. - */ - BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_10), - BPF_MOV64_IMM(BPF_REG_5, 8), - /* Dereference it indirectly. */ - BPF_EMIT_CALL(BPF_FUNC_getsockopt), - BPF_MOV64_IMM(BPF_REG_0, 0), - BPF_EXIT_INSN(), - }, - .errstr = "invalid indirect read from stack R4 var_off", - .result = REJECT, - .prog_type = BPF_PROG_TYPE_SOCK_OPS, -}, { "indirect variable-offset stack access, ok", .insns = { From a9a466a69b85059b341239766a10efdd3ee68a4b Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi Date: Sat, 29 Jun 2024 01:51:07 +0900 Subject: [PATCH 1468/1565] nilfs2: fix kernel bug on rename operation of broken directory commit a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 upstream. Syzbot reported that in rename directory operation on broken directory on nilfs2, __block_write_begin_int() called to prepare block write may fail BUG_ON check for access exceeding the folio/page size. This is because nilfs_dotdot(), which gets parent directory reference entry ("..") of the directory to be moved or renamed, does not check consistency enough, and may return location exceeding folio/page size for broken directories. Fix this issue by checking required directory entries ("." and "..") in the first chunk of the directory in nilfs_dotdot(). Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi Reported-by: syzbot+d3abed1ad3d367fa2627@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627 Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations") Tested-by: Ryusuke Konishi Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/nilfs2/dir.c | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c index 3e15a8fdac4c..5c0e280c83ee 100644 --- a/fs/nilfs2/dir.c +++ b/fs/nilfs2/dir.c @@ -396,11 +396,39 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr, struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct page **p) { - struct nilfs_dir_entry *de = nilfs_get_page(dir, 0, p); + struct page *page; + struct nilfs_dir_entry *de, *next_de; + size_t limit; + char *msg; + de = nilfs_get_page(dir, 0, &page); if (IS_ERR(de)) return NULL; - return nilfs_next_entry(de); + + limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */ + if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino || + !nilfs_match(1, ".", de))) { + msg = "missing '.'"; + goto fail; + } + + next_de = nilfs_next_entry(de); + /* + * If "next_de" has not reached the end of the chunk, there is + * at least one more record. Check whether it matches "..". + */ + if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) || + !nilfs_match(2, "..", next_de))) { + msg = "missing '..'"; + goto fail; + } + *p = page; + return next_de; + +fail: + nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg); + nilfs_put_page(page); + return NULL; } ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr) From 8d99f26b557a77db021ba13a0039e85bf7aeadcf Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Sun, 7 Jul 2024 10:28:46 +0200 Subject: [PATCH 1469/1565] i2c: rcar: bring hardware to known state when probing [ Upstream commit 4e36c0f20cb1c74c7bd7ea31ba432c1c4a989031 ] When probing, the hardware is not brought into a known state. This may be a problem when a hypervisor restarts Linux without resetting the hardware, leaving an old state running. Make sure the hardware gets initialized, especially interrupts should be cleared and disabled. Reported-by: Dirk Behme Reported-by: Geert Uytterhoeven Closes: https://lore.kernel.org/r/20240702045535.2000393-1-dirk.behme@de.bosch.com Fixes: 6ccbe607132b ("i2c: add Renesas R-Car I2C driver") Signed-off-by: Wolfram Sang Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 6a7a7a074a97..029d99970826 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -220,6 +220,14 @@ static void rcar_i2c_init(struct rcar_i2c_priv *priv) } +static void rcar_i2c_reset_slave(struct rcar_i2c_priv *priv) +{ + rcar_i2c_write(priv, ICSIER, 0); + rcar_i2c_write(priv, ICSSR, 0); + rcar_i2c_write(priv, ICSCR, SDBS); + rcar_i2c_write(priv, ICSAR, 0); /* Gen2: must be 0 if not using slave */ +} + static int rcar_i2c_bus_barrier(struct rcar_i2c_priv *priv) { int ret; @@ -888,11 +896,8 @@ static int rcar_unreg_slave(struct i2c_client *slave) /* ensure no irq is running before clearing ptr */ disable_irq(priv->irq); - rcar_i2c_write(priv, ICSIER, 0); - rcar_i2c_write(priv, ICSSR, 0); + rcar_i2c_reset_slave(priv); enable_irq(priv->irq); - rcar_i2c_write(priv, ICSCR, SDBS); - rcar_i2c_write(priv, ICSAR, 0); /* Gen2: must be 0 if not using slave */ priv->slave = NULL; @@ -1004,7 +1009,9 @@ static int rcar_i2c_probe(struct platform_device *pdev) goto out_pm_disable; } - rcar_i2c_write(priv, ICSAR, 0); /* Gen2: must be 0 if not using slave */ + /* Bring hardware to known state */ + rcar_i2c_init(priv); + rcar_i2c_reset_slave(priv); if (priv->devtype < I2C_RCAR_GEN3) { irqflags |= IRQF_NO_THREAD; From 785290cb16edfa27d056e9207efa3bfbefb64bfb Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Wed, 10 Jul 2024 10:55:07 +0200 Subject: [PATCH 1470/1565] i2c: mark HostNotify target address as used [ Upstream commit bd9f5348089b65612e5ca976e2ae22f005340331 ] I2C core handles the local target for receiving HostNotify alerts. There is no separate driver bound to that address. That means userspace can access it if desired, leading to further complications if controllers are not capable of reading their own local target. Bind the local target to the dummy driver so it will be marked as "handled by the kernel" if the HostNotify feature is used. That protects aginst userspace access and prevents other drivers binding to it. Fixes: 2a71593da34d ("i2c: smbus: add core function handling SMBus host-notify") Signed-off-by: Wolfram Sang Signed-off-by: Sasha Levin --- drivers/i2c/i2c-core-base.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index e8a89e18c640..6fac638e423a 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -969,6 +969,7 @@ EXPORT_SYMBOL_GPL(i2c_unregister_device); static const struct i2c_device_id dummy_id[] = { { "dummy", 0 }, + { "smbus_host_notify", 0 }, { }, }; From b4c11a53e1f8d42be6fdb7883e3a79732c64e3e2 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 3 Feb 2022 15:33:17 +0100 Subject: [PATCH 1471/1565] i2c: rcar: Add R-Car Gen4 support [ Upstream commit ea01b71b07993d5c518496692f476a3c6b5d9786 ] Add support for the I2C Bus Interface on R-Car Gen4 SoCs (e.g. R-Car S4-8) by matching on a family-specific compatible value. While I2C on R-Car Gen4 does support some extra features (Slave Clock Stretch Select), for now it is treated the same as I2C on R-Car Gen3. Signed-off-by: Geert Uytterhoeven Reviewed-by: Wolfram Sang [wsa: removed incorrect "FM+" from commit message] Signed-off-by: Wolfram Sang Stable-dep-of: ea5ea84c9d35 ("i2c: rcar: ensure Gen3+ reset does not disturb local targets") Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 029d99970826..720c3fd757ef 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -950,6 +950,7 @@ static const struct of_device_id rcar_i2c_dt_ids[] = { { .compatible = "renesas,rcar-gen1-i2c", .data = (void *)I2C_RCAR_GEN1 }, { .compatible = "renesas,rcar-gen2-i2c", .data = (void *)I2C_RCAR_GEN2 }, { .compatible = "renesas,rcar-gen3-i2c", .data = (void *)I2C_RCAR_GEN3 }, + { .compatible = "renesas,rcar-gen4-i2c", .data = (void *)I2C_RCAR_GEN3 }, {}, }; MODULE_DEVICE_TABLE(of, rcar_i2c_dt_ids); From a720e2e42fd857785f25d52e473c11ebdf7515eb Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 21 Sep 2023 14:53:49 +0200 Subject: [PATCH 1472/1565] i2c: rcar: reset controller is mandatory for Gen3+ [ Upstream commit 0e864b552b2302e40b2277629ebac79544a5c433 ] Initially, we only needed a reset controller to make sure RXDMA works at least once per transfer. Meanwhile, documentation has been updated. It now says that a reset has to be performed prior every transaction, even if it is non-DMA. So, make the reset controller a requirement instead of being optional. And bail out if resetting fails. Signed-off-by: Wolfram Sang Reviewed-by: Geert Uytterhoeven Signed-off-by: Wolfram Sang Stable-dep-of: ea5ea84c9d35 ("i2c: rcar: ensure Gen3+ reset does not disturb local targets") Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 720c3fd757ef..4b0222e5f584 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -821,12 +821,10 @@ static int rcar_i2c_master_xfer(struct i2c_adapter *adap, /* Gen3 needs a reset before allowing RXDMA once */ if (priv->devtype == I2C_RCAR_GEN3) { - priv->flags |= ID_P_NO_RXDMA; - if (!IS_ERR(priv->rstc)) { - ret = rcar_i2c_do_reset(priv); - if (ret == 0) - priv->flags &= ~ID_P_NO_RXDMA; - } + priv->flags &= ~ID_P_NO_RXDMA; + ret = rcar_i2c_do_reset(priv); + if (ret) + goto out; } rcar_i2c_init(priv); @@ -1019,15 +1017,6 @@ static int rcar_i2c_probe(struct platform_device *pdev) irqhandler = rcar_i2c_gen2_irq; } - if (priv->devtype == I2C_RCAR_GEN3) { - priv->rstc = devm_reset_control_get_exclusive(&pdev->dev, NULL); - if (!IS_ERR(priv->rstc)) { - ret = reset_control_status(priv->rstc); - if (ret < 0) - priv->rstc = ERR_PTR(-ENOTSUPP); - } - } - /* Stay always active when multi-master to keep arbitration working */ if (of_property_read_bool(dev->of_node, "multi-master")) priv->flags |= ID_P_PM_BLOCKED; @@ -1037,6 +1026,16 @@ static int rcar_i2c_probe(struct platform_device *pdev) if (of_property_read_bool(dev->of_node, "smbus")) priv->flags |= ID_P_HOST_NOTIFY; + if (priv->devtype == I2C_RCAR_GEN3) { + priv->rstc = devm_reset_control_get_exclusive(&pdev->dev, NULL); + if (IS_ERR(priv->rstc)) + goto out_pm_put; + + ret = reset_control_status(priv->rstc); + if (ret < 0) + goto out_pm_put; + } + ret = platform_get_irq(pdev, 0); if (ret < 0) goto out_pm_put; From 88046f94cc0c49c9a1f36fa43a1b3a7d24bbc3b3 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 14 Dec 2023 08:43:57 +0100 Subject: [PATCH 1473/1565] i2c: rcar: introduce Gen4 devices [ Upstream commit 2b523c46e81ebd621515ab47117f95de197dfcbf ] So far, we treated Gen4 as Gen3. But we are soon adding FM+ as a Gen4 specific feature, so prepare the code for the new devtype. Signed-off-by: Wolfram Sang Reviewed-by: Geert Uytterhoeven Reviewed-by: Andi Shyti Signed-off-by: Wolfram Sang Stable-dep-of: ea5ea84c9d35 ("i2c: rcar: ensure Gen3+ reset does not disturb local targets") Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 4b0222e5f584..41c9d15c3bc6 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -116,6 +116,7 @@ enum rcar_i2c_type { I2C_RCAR_GEN1, I2C_RCAR_GEN2, I2C_RCAR_GEN3, + I2C_RCAR_GEN4, }; struct rcar_i2c_priv { @@ -380,8 +381,8 @@ static void rcar_i2c_dma_unmap(struct rcar_i2c_priv *priv) dma_unmap_single(chan->device->dev, sg_dma_address(&priv->sg), sg_dma_len(&priv->sg), priv->dma_direction); - /* Gen3 can only do one RXDMA per transfer and we just completed it */ - if (priv->devtype == I2C_RCAR_GEN3 && + /* Gen3+ can only do one RXDMA per transfer and we just completed it */ + if (priv->devtype >= I2C_RCAR_GEN3 && priv->dma_direction == DMA_FROM_DEVICE) priv->flags |= ID_P_NO_RXDMA; @@ -819,8 +820,8 @@ static int rcar_i2c_master_xfer(struct i2c_adapter *adap, if (ret < 0) goto out; - /* Gen3 needs a reset before allowing RXDMA once */ - if (priv->devtype == I2C_RCAR_GEN3) { + /* Gen3+ needs a reset. That also allows RXDMA once */ + if (priv->devtype >= I2C_RCAR_GEN3) { priv->flags &= ~ID_P_NO_RXDMA; ret = rcar_i2c_do_reset(priv); if (ret) @@ -948,7 +949,7 @@ static const struct of_device_id rcar_i2c_dt_ids[] = { { .compatible = "renesas,rcar-gen1-i2c", .data = (void *)I2C_RCAR_GEN1 }, { .compatible = "renesas,rcar-gen2-i2c", .data = (void *)I2C_RCAR_GEN2 }, { .compatible = "renesas,rcar-gen3-i2c", .data = (void *)I2C_RCAR_GEN3 }, - { .compatible = "renesas,rcar-gen4-i2c", .data = (void *)I2C_RCAR_GEN3 }, + { .compatible = "renesas,rcar-gen4-i2c", .data = (void *)I2C_RCAR_GEN4 }, {}, }; MODULE_DEVICE_TABLE(of, rcar_i2c_dt_ids); @@ -1026,7 +1027,7 @@ static int rcar_i2c_probe(struct platform_device *pdev) if (of_property_read_bool(dev->of_node, "smbus")) priv->flags |= ID_P_HOST_NOTIFY; - if (priv->devtype == I2C_RCAR_GEN3) { + if (priv->devtype >= I2C_RCAR_GEN3) { priv->rstc = devm_reset_control_get_exclusive(&pdev->dev, NULL); if (IS_ERR(priv->rstc)) goto out_pm_put; From 41f62c95e0085b84253d5def65a17a4084bf1cd7 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 11 Jul 2024 10:30:44 +0200 Subject: [PATCH 1474/1565] i2c: rcar: ensure Gen3+ reset does not disturb local targets [ Upstream commit ea5ea84c9d3570dc06e8fc5ee2273eaa584aa3ac ] R-Car Gen3+ needs a reset before every controller transfer. That erases configuration of a potentially in parallel running local target instance. To avoid this disruption, avoid controller transfers if a local target is running. Also, disable SMBusHostNotify because it requires being a controller and local target at the same time. Fixes: 3b770017b03a ("i2c: rcar: handle RXDMA HW behaviour on Gen3") Signed-off-by: Wolfram Sang Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 41c9d15c3bc6..db3a5b80f1aa 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -796,6 +796,10 @@ static int rcar_i2c_do_reset(struct rcar_i2c_priv *priv) { int ret; + /* Don't reset if a slave instance is currently running */ + if (priv->slave) + return -EISCONN; + ret = reset_control_reset(priv->rstc); if (ret) return ret; @@ -1027,6 +1031,7 @@ static int rcar_i2c_probe(struct platform_device *pdev) if (of_property_read_bool(dev->of_node, "smbus")) priv->flags |= ID_P_HOST_NOTIFY; + /* R-Car Gen3+ needs a reset before every transfer */ if (priv->devtype >= I2C_RCAR_GEN3) { priv->rstc = devm_reset_control_get_exclusive(&pdev->dev, NULL); if (IS_ERR(priv->rstc)) @@ -1035,6 +1040,9 @@ static int rcar_i2c_probe(struct platform_device *pdev) ret = reset_control_status(priv->rstc); if (ret < 0) goto out_pm_put; + + /* hard reset disturbs HostNotify local target, so disable it */ + priv->flags &= ~ID_P_HOST_NOTIFY; } ret = platform_get_irq(pdev, 0); From 2907dd5855f69ba1847ba8220569ec77092c159f Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Wed, 10 Jul 2024 13:03:00 +0200 Subject: [PATCH 1475/1565] i2c: rcar: clear NO_RXDMA flag after resetting [ Upstream commit fea6b5ebb71a2830b042e42de7ae255017ac3ce8 ] We should allow RXDMA only if the reset was really successful, so clear the flag after the reset call. Fixes: 0e864b552b23 ("i2c: rcar: reset controller is mandatory for Gen3+") Signed-off-by: Wolfram Sang Signed-off-by: Andi Shyti Signed-off-by: Sasha Levin --- drivers/i2c/busses/i2c-rcar.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index db3a5b80f1aa..8a8c2403641b 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -826,10 +826,10 @@ static int rcar_i2c_master_xfer(struct i2c_adapter *adap, /* Gen3+ needs a reset. That also allows RXDMA once */ if (priv->devtype >= I2C_RCAR_GEN3) { - priv->flags &= ~ID_P_NO_RXDMA; ret = rcar_i2c_do_reset(priv); if (ret) goto out; + priv->flags &= ~ID_P_NO_RXDMA; } rcar_i2c_init(priv); From f52913e5d6ca9c1e6ef77d115246da48104b3075 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 27 Sep 2023 15:38:36 +0300 Subject: [PATCH 1476/1565] i2c: rcar: fix error code in probe() commit 37a672be3ae357a0f87fbc00897fa7fcb3d7d782 upstream. Return an error code if devm_reset_control_get_exclusive() fails. The current code returns success. Fixes: 0e864b552b23 ("i2c: rcar: reset controller is mandatory for Gen3+") Signed-off-by: Dan Carpenter Reviewed-by: Geert Uytterhoeven Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-rcar.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c index 8a8c2403641b..7a6bae9df568 100644 --- a/drivers/i2c/busses/i2c-rcar.c +++ b/drivers/i2c/busses/i2c-rcar.c @@ -1034,8 +1034,10 @@ static int rcar_i2c_probe(struct platform_device *pdev) /* R-Car Gen3+ needs a reset before every transfer */ if (priv->devtype >= I2C_RCAR_GEN3) { priv->rstc = devm_reset_control_get_exclusive(&pdev->dev, NULL); - if (IS_ERR(priv->rstc)) + if (IS_ERR(priv->rstc)) { + ret = PTR_ERR(priv->rstc); goto out_pm_put; + } ret = reset_control_status(priv->rstc); if (ret < 0) From 83a48a4503d034f087f7648e6edf767c0550dbca Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 18 Jul 2024 13:05:52 +0200 Subject: [PATCH 1477/1565] Linux 5.10.222 Link: https://lore.kernel.org/r/20240716152745.988603303@linuxfoundation.org Tested-by: Florian Fainelli Tested-by: Pavel Machek (CIP) Tested-by: Mark Brown Tested-by: Dominique Martinet Link: https://lore.kernel.org/r/20240717063758.061781150@linuxfoundation.org Tested-by: Dominique Martinet Tested-by: Salvatore Bonaccorso Tested-by: Pavel Machek (CIP) Tested-by: Florian Fainelli Tested-by: Jon Hunter Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index b0e22161cd55..5b6dae61250c 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 221 +SUBLEVEL = 222 EXTRAVERSION = NAME = Dare mighty things From 2e272e7d71596e8f65a3e850c60ed42003076c46 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 7 Aug 2023 09:41:19 -0700 Subject: [PATCH 1478/1565] gcc-plugins: Rename last_stmt() for GCC 14+ commit 2e3f65ccfe6b0778b261ad69c9603ae85f210334 upstream. In GCC 14, last_stmt() was renamed to last_nondebug_stmt(). Add a helper macro to handle the renaming. Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook Cc: Thomas Meyer Signed-off-by: Greg Kroah-Hartman --- scripts/gcc-plugins/gcc-common.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/scripts/gcc-plugins/gcc-common.h b/scripts/gcc-plugins/gcc-common.h index 6d4563b8a52c..0c037b884530 100644 --- a/scripts/gcc-plugins/gcc-common.h +++ b/scripts/gcc-plugins/gcc-common.h @@ -980,4 +980,8 @@ static inline void debug_gimple_stmt(const_gimple s) #define SET_DECL_MODE(decl, mode) DECL_MODE(decl) = (mode) #endif +#if BUILDING_GCC_VERSION >= 14000 +#define last_stmt(x) last_nondebug_stmt(x) +#endif + #endif From 5661b9c7ec189406c2dde00837aaa4672efb6240 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 2 Jul 2024 18:26:52 +0200 Subject: [PATCH 1479/1565] filelock: Remove locks reliably when fcntl/close race is detected commit 3cad1bc010416c6dd780643476bc59ed742436b9 upstream. When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton Signed-off-by: Christian Brauner [stable fixup: ->c.flc_type was ->fl_type in older kernels] Signed-off-by: Jann Horn Signed-off-by: Greg Kroah-Hartman --- fs/locks.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 843fa3d3375d..8a6fcdb6e886 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2588,8 +2588,9 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, error = do_lock_file_wait(filp, cmd, file_lock); /* - * Attempt to detect a close/fcntl race and recover by releasing the - * lock that was just acquired. There is no need to do that when we're + * Detect close/fcntl races and recover by zapping all POSIX locks + * associated with this file and our files_struct, just like on + * filp_flush(). There is no need to do that when we're * unlocking though, or for OFD locks. */ if (!error && file_lock->fl_type != F_UNLCK && @@ -2604,9 +2605,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, f = files_lookup_fd_locked(files, fd); spin_unlock(&files->file_lock); if (f != filp) { - file_lock->fl_type = F_UNLCK; - error = do_lock_file_wait(filp, cmd, file_lock); - WARN_ON_ONCE(error); + locks_remove_posix(filp, files); error = -EBADF; } } From cd9472c43f5e2a96141c2964df6936ceca64dfe3 Mon Sep 17 00:00:00 2001 From: Saurav Kashyap Date: Wed, 15 May 2024 14:41:01 +0530 Subject: [PATCH 1480/1565] scsi: qedf: Set qed_slowpath_params to zero before use [ Upstream commit 6c3bb589debd763dc4b94803ddf3c13b4fcca776 ] Zero qed_slowpath_params before use. Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240515091101.18754-4-skashyap@marvell.com Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/qedf/qedf_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/qedf/qedf_main.c b/drivers/scsi/qedf/qedf_main.c index 6923862be3fb..2536da96130e 100644 --- a/drivers/scsi/qedf/qedf_main.c +++ b/drivers/scsi/qedf/qedf_main.c @@ -3453,6 +3453,7 @@ static int __qedf_probe(struct pci_dev *pdev, int mode) } /* Start the Slowpath-process */ + memset(&slowpath_params, 0, sizeof(struct qed_slowpath_params)); slowpath_params.int_mode = QED_INT_MODE_MSIX; slowpath_params.drv_major = QEDF_DRIVER_MAJOR_VER; slowpath_params.drv_minor = QEDF_DRIVER_MINOR_VER; From fd57dbffd92547bd65aadce912e7d4d29438a2f9 Mon Sep 17 00:00:00 2001 From: Armin Wolf Date: Wed, 22 May 2024 23:36:48 +0200 Subject: [PATCH 1481/1565] ACPI: EC: Abort address space access upon error [ Upstream commit f6f172dc6a6d7775b2df6adfd1350700e9a847ec ] When a multi-byte address space access is requested, acpi_ec_read()/ acpi_ec_write() is being called multiple times. Abort such operations if a single call to acpi_ec_read() / acpi_ec_write() fails, as the data read from / written to the EC might be incomplete. Signed-off-by: Armin Wolf Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/ec.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 487884420fb0..60f49ee16147 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1316,10 +1316,13 @@ acpi_ec_space_handler(u32 function, acpi_physical_address address, if (ec->busy_polling || bits > 8) acpi_ec_burst_enable(ec); - for (i = 0; i < bytes; ++i, ++address, ++value) + for (i = 0; i < bytes; ++i, ++address, ++value) { result = (function == ACPI_READ) ? acpi_ec_read(ec, address, value) : acpi_ec_write(ec, address, *value); + if (result < 0) + break; + } if (ec->busy_polling || bits > 8) acpi_ec_burst_disable(ec); From 6df01b7eabc8c4fc3000ebb82fdce06bced9ca9b Mon Sep 17 00:00:00 2001 From: Armin Wolf Date: Wed, 22 May 2024 23:36:49 +0200 Subject: [PATCH 1482/1565] ACPI: EC: Avoid returning AE_OK on errors in address space handler [ Upstream commit c4bd7f1d78340e63de4d073fd3dbe5391e2996e5 ] If an error code other than EINVAL, ENODEV or ETIME is returned by acpi_ec_read() / acpi_ec_write(), then AE_OK is incorrectly returned by acpi_ec_space_handler(). Fix this by only returning AE_OK on success, and return AE_ERROR otherwise. Signed-off-by: Armin Wolf [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin --- drivers/acpi/ec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 60f49ee16147..01a6400c3234 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1334,8 +1334,10 @@ acpi_ec_space_handler(u32 function, acpi_physical_address address, return AE_NOT_FOUND; case -ETIME: return AE_TIME; - default: + case 0: return AE_OK; + default: + return AE_ERROR; } } From 60cf36f2900f58ed1083363541e2df4695f2cda4 Mon Sep 17 00:00:00 2001 From: Nicolas Escande Date: Mon, 27 May 2024 16:17:59 +0200 Subject: [PATCH 1483/1565] wifi: mac80211: mesh: init nonpeer_pm to active by default in mesh sdata [ Upstream commit 6f6291f09a322c1c1578badac8072d049363f4e6 ] With a ath9k device I can see that: iw phy phy0 interface add mesh0 type mp ip link set mesh0 up iw dev mesh0 scan Will start a scan with the Power Management bit set in the Frame Control Field. This is because we set this bit depending on the nonpeer_pm variable of the mesh iface sdata and when there are no active links on the interface it remains to NL80211_MESH_POWER_UNKNOWN. As soon as links starts to be established, it wil switch to NL80211_MESH_POWER_ACTIVE as it is the value set by befault on the per sta nonpeer_pm field. As we want no power save by default, (as expressed with the per sta ini values), lets init it to the expected default value of NL80211_MESH_POWER_ACTIVE. Also please note that we cannot change the default value from userspace prior to establishing a link as using NL80211_CMD_SET_MESH_CONFIG will not work before NL80211_CMD_JOIN_MESH has been issued. So too late for our initial scan. Signed-off-by: Nicolas Escande Link: https://msgid.link/20240527141759.299411-1-nico.escande@gmail.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/mesh.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index ce5825d6f1d1..d3a9ce1f8e53 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -1584,6 +1584,7 @@ void ieee80211_mesh_init_sdata(struct ieee80211_sub_if_data *sdata) ifmsh->last_preq = jiffies; ifmsh->next_perr = jiffies; ifmsh->csa_role = IEEE80211_MESH_CSA_ROLE_NONE; + ifmsh->nonpeer_pm = NL80211_MESH_POWER_ACTIVE; /* Allocate all mesh structures when creating the first mesh interface. */ if (!mesh_allocated) ieee80211s_init(); From bb8ace6794a1bc50eee92adc67a4857a3e8c81d7 Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Fri, 17 May 2024 18:33:32 +0300 Subject: [PATCH 1484/1565] wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan() [ Upstream commit 92ecbb3ac6f3fe8ae9edf3226c76aa17b6800699 ] When testing the previous patch with CONFIG_UBSAN_BOUNDS, I've noticed the following: UBSAN: array-index-out-of-bounds in net/mac80211/scan.c:372:4 index 0 is out of range for type 'struct ieee80211_channel *[]' CPU: 0 PID: 1435 Comm: wpa_supplicant Not tainted 6.9.0+ #1 Hardware name: LENOVO 20UN005QRT/20UN005QRT <...BIOS details...> Call Trace: dump_stack_lvl+0x2d/0x90 __ubsan_handle_out_of_bounds+0xe7/0x140 ? timerqueue_add+0x98/0xb0 ieee80211_prep_hw_scan+0x2db/0x480 [mac80211] ? __kmalloc+0xe1/0x470 __ieee80211_start_scan+0x541/0x760 [mac80211] rdev_scan+0x1f/0xe0 [cfg80211] nl80211_trigger_scan+0x9b6/0xae0 [cfg80211] ... Since '__ieee80211_start_scan()' leaves 'hw_scan_req->req.n_channels' uninitialized, actual boundaries of 'hw_scan_req->req.channels' can't be checked in 'ieee80211_prep_hw_scan()'. Although an initialization of 'hw_scan_req->req.n_channels' introduces some confusion around allocated vs. used VLA members, this shouldn't be a problem since everything is correctly adjusted soon in 'ieee80211_prep_hw_scan()'. Cleanup 'kmalloc()' math in '__ieee80211_start_scan()' by using the convenient 'struct_size()' as well. Signed-off-by: Dmitry Antipov Link: https://msgid.link/20240517153332.18271-2-dmantipov@yandex.ru [improve (imho) indentation a bit] Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/mac80211/scan.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index b241ff8c015a..be5d02c129e9 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -727,15 +727,21 @@ static int __ieee80211_start_scan(struct ieee80211_sub_if_data *sdata, local->hw_scan_ies_bufsize *= n_bands; } - local->hw_scan_req = kmalloc( - sizeof(*local->hw_scan_req) + - req->n_channels * sizeof(req->channels[0]) + - local->hw_scan_ies_bufsize, GFP_KERNEL); + local->hw_scan_req = kmalloc(struct_size(local->hw_scan_req, + req.channels, + req->n_channels) + + local->hw_scan_ies_bufsize, + GFP_KERNEL); if (!local->hw_scan_req) return -ENOMEM; local->hw_scan_req->req.ssids = req->ssids; local->hw_scan_req->req.n_ssids = req->n_ssids; + /* None of the channels are actually set + * up but let UBSAN know the boundaries. + */ + local->hw_scan_req->req.n_channels = req->n_channels; + ies = (u8 *)local->hw_scan_req + sizeof(*local->hw_scan_req) + req->n_channels * sizeof(req->channels[0]); From 42e60f3bde3c2f9a370a87c4961d0b7786d65ca5 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Tue, 21 May 2024 13:03:25 +1000 Subject: [PATCH 1485/1565] selftests/openat2: Fix build warnings on ppc64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 84b6df4c49a1cc2854a16937acd5fd3e6315d083 ] Fix warnings like: openat2_test.c: In function ‘test_openat2_flags’: openat2_test.c:303:73: warning: format ‘%llX’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Signed-off-by: Michael Ellerman Reviewed-by: Muhammad Usama Anjum Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- tools/testing/selftests/openat2/openat2_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c index 453152b58e7f..1045df1a98c0 100644 --- a/tools/testing/selftests/openat2/openat2_test.c +++ b/tools/testing/selftests/openat2/openat2_test.c @@ -5,6 +5,7 @@ */ #define _GNU_SOURCE +#define __SANE_USERSPACE_TYPES__ // Use ll64 #include #include #include From fe855e5b16270aec8fe6d953630147bcce5b965e Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sat, 25 May 2024 21:38:53 +0200 Subject: [PATCH 1486/1565] Input: silead - Always support 10 fingers [ Upstream commit 38a38f5a36da9820680d413972cb733349400532 ] When support for Silead touchscreens was orginal added some touchscreens with older firmware versions only supported 5 fingers and this was made the default requiring the setting of a "silead,max-fingers=10" uint32 device-property for all touchscreen models which do support 10 fingers. There are very few models with the old 5 finger fw, so in practice the setting of the "silead,max-fingers=10" is boilerplate which needs to be copy and pasted to every touchscreen config. Reporting that 10 fingers are supported on devices which only support 5 fingers doesn't cause any problems for userspace in practice, since at max 4 finger gestures are supported anyways. Drop the max_fingers configuration and simply always assume 10 fingers. Signed-off-by: Hans de Goede Acked-by: Dmitry Torokhov Link: https://lore.kernel.org/r/20240525193854.39130-2-hdegoede@redhat.com Signed-off-by: Sasha Levin --- drivers/input/touchscreen/silead.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/drivers/input/touchscreen/silead.c b/drivers/input/touchscreen/silead.c index e8b6c3137420..901e28bc0164 100644 --- a/drivers/input/touchscreen/silead.c +++ b/drivers/input/touchscreen/silead.c @@ -70,7 +70,6 @@ struct silead_ts_data { struct regulator_bulk_data regulators[2]; char fw_name[64]; struct touchscreen_properties prop; - u32 max_fingers; u32 chip_id; struct input_mt_pos pos[SILEAD_MAX_FINGERS]; int slots[SILEAD_MAX_FINGERS]; @@ -98,7 +97,7 @@ static int silead_ts_request_input_dev(struct silead_ts_data *data) input_set_abs_params(data->input, ABS_MT_POSITION_Y, 0, 4095, 0, 0); touchscreen_parse_properties(data->input, true, &data->prop); - input_mt_init_slots(data->input, data->max_fingers, + input_mt_init_slots(data->input, SILEAD_MAX_FINGERS, INPUT_MT_DIRECT | INPUT_MT_DROP_UNUSED | INPUT_MT_TRACK); @@ -145,10 +144,10 @@ static void silead_ts_read_data(struct i2c_client *client) return; } - if (buf[0] > data->max_fingers) { + if (buf[0] > SILEAD_MAX_FINGERS) { dev_warn(dev, "More touches reported then supported %d > %d\n", - buf[0], data->max_fingers); - buf[0] = data->max_fingers; + buf[0], SILEAD_MAX_FINGERS); + buf[0] = SILEAD_MAX_FINGERS; } touch_nr = 0; @@ -200,7 +199,6 @@ static void silead_ts_read_data(struct i2c_client *client) static int silead_ts_init(struct i2c_client *client) { - struct silead_ts_data *data = i2c_get_clientdata(client); int error; error = i2c_smbus_write_byte_data(client, SILEAD_REG_RESET, @@ -210,7 +208,7 @@ static int silead_ts_init(struct i2c_client *client) usleep_range(SILEAD_CMD_SLEEP_MIN, SILEAD_CMD_SLEEP_MAX); error = i2c_smbus_write_byte_data(client, SILEAD_REG_TOUCH_NR, - data->max_fingers); + SILEAD_MAX_FINGERS); if (error) goto i2c_write_err; usleep_range(SILEAD_CMD_SLEEP_MIN, SILEAD_CMD_SLEEP_MAX); @@ -437,13 +435,6 @@ static void silead_ts_read_props(struct i2c_client *client) const char *str; int error; - error = device_property_read_u32(dev, "silead,max-fingers", - &data->max_fingers); - if (error) { - dev_dbg(dev, "Max fingers read error %d\n", error); - data->max_fingers = 5; /* Most devices handle up-to 5 fingers */ - } - error = device_property_read_string(dev, "firmware-name", &str); if (!error) snprintf(data->fw_name, sizeof(data->fw_name), From 34eb7ab9af70464350b31280366415b3a98e6959 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 31 May 2024 13:26:33 +0000 Subject: [PATCH 1487/1565] net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input() [ Upstream commit db0090c6eb12c31246438b7fe2a8f1b833e7a653 ] As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in rpl_output() is not good enough, because rpl_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter rpl_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in rpl_input(). Signed-off-by: Eric Dumazet Cc: Alexander Aring Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/rpl_iptunnel.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c index 5fdf3ebb953f..2ba605db6976 100644 --- a/net/ipv6/rpl_iptunnel.c +++ b/net/ipv6/rpl_iptunnel.c @@ -217,9 +217,9 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) if (unlikely(err)) goto drop; - preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&rlwt->cache); - preempt_enable(); + local_bh_enable(); if (unlikely(!dst)) { struct ipv6hdr *hdr = ipv6_hdr(skb); @@ -239,9 +239,9 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop; } - preempt_disable(); + local_bh_disable(); dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr); - preempt_enable(); + local_bh_enable(); } skb_dst_drop(skb); @@ -273,9 +273,8 @@ static int rpl_input(struct sk_buff *skb) return err; } - preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&rlwt->cache); - preempt_enable(); skb_dst_drop(skb); @@ -283,14 +282,13 @@ static int rpl_input(struct sk_buff *skb) ip6_route_input(skb); dst = skb_dst(skb); if (!dst->error) { - preempt_disable(); dst_cache_set_ip6(&rlwt->cache, dst, &ipv6_hdr(skb)->saddr); - preempt_enable(); } } else { skb_dst_set(skb, dst); } + local_bh_enable(); err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); if (unlikely(err)) From a0cafb7b0b94d18e4813ee4b712a056f280e7b5a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 31 May 2024 13:26:35 +0000 Subject: [PATCH 1488/1565] ila: block BH in ila_output() [ Upstream commit cf28ff8e4c02e1ffa850755288ac954b6ff0db8c ] As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. ila_output() is called from lwtunnel_output() possibly from process context, and under rcu_read_lock(). We might be interrupted by a softirq, re-enter ila_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable(). Signed-off-by: Eric Dumazet Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- net/ipv6/ila/ila_lwt.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c index 8c1ce78956ba..9d37f7164e73 100644 --- a/net/ipv6/ila/ila_lwt.c +++ b/net/ipv6/ila/ila_lwt.c @@ -58,7 +58,9 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb) return orig_dst->lwtstate->orig_output(net, sk, skb); } + local_bh_disable(); dst = dst_cache_get(&ilwt->dst_cache); + local_bh_enable(); if (unlikely(!dst)) { struct ipv6hdr *ip6h = ipv6_hdr(skb); struct flowi6 fl6; @@ -86,8 +88,11 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop; } - if (ilwt->connected) + if (ilwt->connected) { + local_bh_disable(); dst_cache_set_ip6(&ilwt->dst_cache, dst, &fl6.saddr); + local_bh_enable(); + } } skb_dst_set(skb, dst); From 9934cda0e7fad5cd43dbdc849d3fd1369c8fd765 Mon Sep 17 00:00:00 2001 From: Wei Li Date: Tue, 23 Apr 2024 17:35:01 +0800 Subject: [PATCH 1489/1565] arm64: armv8_deprecated: Fix warning in isndep cpuhp starting process [ Upstream commit 14951beaec93696b092a906baa0f29322cf34004 ] The function run_all_insn_set_hw_mode() is registered as startup callback of 'CPUHP_AP_ARM64_ISNDEP_STARTING', it invokes set_hw_mode() methods of all emulated instructions. As the STARTING callbacks are not expected to fail, if one of the set_hw_mode() fails, e.g. due to el0 mixed-endian is not supported for 'setend', it will report a warning: ``` CPU[2] cannot support the emulation of setend CPU 2 UP state arm64/isndep:starting (136) failed (-22) CPU2: Booted secondary processor 0x0000000002 [0x414fd0c1] ``` To fix it, add a check for INSN_UNAVAILABLE status and skip the process. Signed-off-by: Wei Li Tested-by: Huisong Li Link: https://lore.kernel.org/r/20240423093501.3460764-1-liwei391@huawei.com Signed-off-by: Will Deacon Signed-off-by: Sasha Levin --- arch/arm64/kernel/armv8_deprecated.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c index f0ba854f0045..34370be75acd 100644 --- a/arch/arm64/kernel/armv8_deprecated.c +++ b/arch/arm64/kernel/armv8_deprecated.c @@ -471,6 +471,9 @@ static int run_all_insn_set_hw_mode(unsigned int cpu) for (i = 0; i < ARRAY_SIZE(insn_emulations); i++) { struct insn_emulation *insn = insn_emulations[i]; bool enable = READ_ONCE(insn->current_mode) == INSN_HW; + if (insn->status == INSN_UNAVAILABLE) + continue; + if (insn->set_hw_mode && insn->set_hw_mode(enable)) { pr_warn("CPU[%u] cannot support the emulation of %s", cpu, insn->name); From 9625afe1dd4a158a14bb50f81af9e2dac634c0b1 Mon Sep 17 00:00:00 2001 From: Andreas Hindborg Date: Mon, 3 Jun 2024 21:26:45 +0200 Subject: [PATCH 1490/1565] null_blk: fix validation of block size [ Upstream commit c462ecd659b5fce731f1d592285832fd6ad54053 ] Block size should be between 512 and PAGE_SIZE and be a power of 2. The current check does not validate this, so update the check. Without this patch, null_blk would Oops due to a null pointer deref when loaded with bs=1536 [1]. Link: https://lore.kernel.org/all/87wmn8mocd.fsf@metaspace.dk/ Signed-off-by: Andreas Hindborg Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20240603192645.977968-1-nmi@metaspace.dk [axboe: remove unnecessary braces and != 0 check] Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin --- drivers/block/null_blk/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 862a9420df52..8e24fb93324c 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -1743,8 +1743,8 @@ static int null_validate_conf(struct nullb_device *dev) return -EINVAL; } - dev->blocksize = round_down(dev->blocksize, 512); - dev->blocksize = clamp_t(unsigned int, dev->blocksize, 512, 4096); + if (blk_validate_block_size(dev->blocksize)) + return -EINVAL; if (dev->queue_mode == NULL_Q_MQ && dev->use_per_node_hctx) { if (dev->submit_queues != nr_online_nodes) From 4beba240857322b97074cc78f427fa57794e2f05 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sun, 2 Jun 2024 03:20:40 +0900 Subject: [PATCH 1491/1565] kconfig: gconf: give a proper initial state to the Save button [ Upstream commit 46edf4372e336ef3a61c3126e49518099d2e2e6d ] Currently, the initial state of the "Save" button is always active. If none of the CONFIG options are changed while loading the .config file, the "Save" button should be greyed out. This can be fixed by calling conf_read() after widget initialization. Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin --- scripts/kconfig/gconf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/kconfig/gconf.c b/scripts/kconfig/gconf.c index 5527482c3077..409799912731 100644 --- a/scripts/kconfig/gconf.c +++ b/scripts/kconfig/gconf.c @@ -1484,7 +1484,6 @@ int main(int ac, char *av[]) conf_parse(name); fixup_rootmenu(&rootmenu); - conf_read(NULL); /* Load the interface and connect signals */ init_main_window(glade_file); @@ -1492,6 +1491,8 @@ int main(int ac, char *av[]) init_left_tree(); init_right_tree(); + conf_read(NULL); + switch (view_mode) { case SINGLE_VIEW: display_tree_part(); From ffe47bf986d130ff7ce74b65f1ce00c7bee90b21 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 4 Jun 2024 01:19:04 +0900 Subject: [PATCH 1492/1565] kconfig: remove wrong expr_trans_bool() [ Upstream commit 77a92660d8fe8d29503fae768d9f5eb529c88b36 ] expr_trans_bool() performs an incorrect transformation. [Test Code] config MODULES def_bool y modules config A def_bool y select C if B != n config B def_tristate m config C tristate [Result] CONFIG_MODULES=y CONFIG_A=y CONFIG_B=m CONFIG_C=m This output is incorrect because CONFIG_C=y is expected. Documentation/kbuild/kconfig-language.rst clearly explains the function of the '!=' operator: If the values of both symbols are equal, it returns 'n', otherwise 'y'. Therefore, the statement: select C if B != n should be equivalent to: select C if y Or, more simply: select C Hence, the symbol C should be selected by the value of A, which is 'y'. However, expr_trans_bool() wrongly transforms it to: select C if B Therefore, the symbol C is selected by (A && B), which is 'm'. The comment block of expr_trans_bool() correctly explains its intention: * bool FOO!=n => FOO ^^^^ If FOO is bool, FOO!=n can be simplified into FOO. This is correct. However, the actual code performs this transformation when FOO is tristate: if (e->left.sym->type == S_TRISTATE) { ^^^^^^^^^^ While it can be fixed to S_BOOLEAN, there is no point in doing so because expr_tranform() already transforms FOO!=n to FOO when FOO is bool. (see the "case E_UNEQUAL" part) expr_trans_bool() is wrong and unnecessary. Signed-off-by: Masahiro Yamada Acked-by: Randy Dunlap Signed-off-by: Sasha Levin --- scripts/kconfig/expr.c | 29 ----------------------------- scripts/kconfig/expr.h | 1 - scripts/kconfig/menu.c | 2 -- 3 files changed, 32 deletions(-) diff --git a/scripts/kconfig/expr.c b/scripts/kconfig/expr.c index 81ebf8108ca7..81dfdf4470f7 100644 --- a/scripts/kconfig/expr.c +++ b/scripts/kconfig/expr.c @@ -396,35 +396,6 @@ static struct expr *expr_eliminate_yn(struct expr *e) return e; } -/* - * bool FOO!=n => FOO - */ -struct expr *expr_trans_bool(struct expr *e) -{ - if (!e) - return NULL; - switch (e->type) { - case E_AND: - case E_OR: - case E_NOT: - e->left.expr = expr_trans_bool(e->left.expr); - e->right.expr = expr_trans_bool(e->right.expr); - break; - case E_UNEQUAL: - // FOO!=n -> FOO - if (e->left.sym->type == S_TRISTATE) { - if (e->right.sym == &symbol_no) { - e->type = E_SYMBOL; - e->right.sym = NULL; - } - } - break; - default: - ; - } - return e; -} - /* * e1 || e2 -> ? */ diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h index 5c3443692f34..385a47daa364 100644 --- a/scripts/kconfig/expr.h +++ b/scripts/kconfig/expr.h @@ -302,7 +302,6 @@ void expr_free(struct expr *e); void expr_eliminate_eq(struct expr **ep1, struct expr **ep2); int expr_eq(struct expr *e1, struct expr *e2); tristate expr_calc_value(struct expr *e); -struct expr *expr_trans_bool(struct expr *e); struct expr *expr_eliminate_dups(struct expr *e); struct expr *expr_transform(struct expr *e); int expr_contains_symbol(struct expr *dep, struct symbol *sym); diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c index a5fbd6ccc006..e5ad6313cfa1 100644 --- a/scripts/kconfig/menu.c +++ b/scripts/kconfig/menu.c @@ -401,8 +401,6 @@ void menu_finalize(struct menu *parent) dep = expr_transform(dep); dep = expr_alloc_and(expr_copy(basedep), dep); dep = expr_eliminate_dups(dep); - if (menu->sym && menu->sym->type != S_TRISTATE) - dep = expr_trans_bool(dep); prop->visible.expr = dep; /* From 229bce543ba097d1b8ce522bfcd264cf695fcb4f Mon Sep 17 00:00:00 2001 From: Yuntao Wang Date: Thu, 30 May 2024 00:06:56 +0800 Subject: [PATCH 1493/1565] fs/file: fix the check in find_next_fd() [ Upstream commit ed8c7fbdfe117abbef81f65428ba263118ef298a ] The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG, gives the value of bitbit, which can never be greater than maxfd, it can only be equal to maxfd at most, so the following check 'if (bitbit > maxfd)' will never be true. Moreover, when bitbit equals maxfd, it indicates that there are no unused fds, and the function can directly return. Fix this check. Signed-off-by: Yuntao Wang Link: https://lore.kernel.org/r/20240529160656.209352-1-yuntao.wang@linux.dev Reviewed-by: Jan Kara Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/file.c b/fs/file.c index fdb84a64724b..913f7d897d2f 100644 --- a/fs/file.c +++ b/fs/file.c @@ -494,12 +494,12 @@ struct files_struct init_files = { static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) { - unsigned int maxfd = fdt->max_fds; + unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */ unsigned int maxbit = maxfd / BITS_PER_LONG; unsigned int bitbit = start / BITS_PER_LONG; bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG; - if (bitbit > maxfd) + if (bitbit >= maxfd) return maxfd; if (bitbit > start) start = bitbit; From 9774641b255fbfee87fb9265580abb0452cf59e5 Mon Sep 17 00:00:00 2001 From: Alexander Usyskin Date: Thu, 30 May 2024 12:14:15 +0300 Subject: [PATCH 1494/1565] mei: demote client disconnect warning on suspend to debug [ Upstream commit 1db5322b7e6b58e1b304ce69a50e9dca798ca95b ] Change level for the "not connected" client message in the write callback from error to debug. The MEI driver currently disconnects all clients upon system suspend. This behavior is by design and user-space applications with open connections before the suspend are expected to handle errors upon resume, by reopening their handles, reconnecting, and retrying their operations. However, the current driver implementation logs an error message every time a write operation is attempted on a disconnected client. Since this is a normal and expected flow after system resume logging this as an error can be misleading. Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Link: https://lore.kernel.org/r/20240530091415.725247-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin --- drivers/misc/mei/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c index 9f6682033ed7..d8311d41f0a7 100644 --- a/drivers/misc/mei/main.c +++ b/drivers/misc/mei/main.c @@ -329,7 +329,7 @@ static ssize_t mei_write(struct file *file, const char __user *ubuf, } if (!mei_cl_is_connected(cl)) { - cl_err(dev, cl, "is not connected"); + cl_dbg(dev, cl, "is not connected"); rets = -ENODEV; goto out; } From 6295bad58f988eaafcf0e6f8b198a580398acb3b Mon Sep 17 00:00:00 2001 From: Dmitry Antipov Date: Fri, 31 May 2024 06:20:10 +0300 Subject: [PATCH 1495/1565] wifi: cfg80211: wext: add extra SIOCSIWSCAN data check [ Upstream commit 6ef09cdc5ba0f93826c09d810c141a8d103a80fc ] In 'cfg80211_wext_siwscan()', add extra check whether number of channels passed via 'ioctl(sock, SIOCSIWSCAN, ...)' doesn't exceed IW_MAX_FREQUENCIES and reject invalid request with -EINVAL otherwise. Reported-by: syzbot+253cd2d2491df77c93ac@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=253cd2d2491df77c93ac Signed-off-by: Dmitry Antipov Link: https://msgid.link/20240531032010.451295-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin --- net/wireless/scan.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/wireless/scan.c b/net/wireless/scan.c index a6c289a61d30..76a27b6d45d2 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -2772,10 +2772,14 @@ int cfg80211_wext_siwscan(struct net_device *dev, wiphy = &rdev->wiphy; /* Determine number of channels, needed to allocate creq */ - if (wreq && wreq->num_channels) + if (wreq && wreq->num_channels) { + /* Passed from userspace so should be checked */ + if (unlikely(wreq->num_channels > IW_MAX_FREQUENCIES)) + return -EINVAL; n_channels = wreq->num_channels; - else + } else { n_channels = ieee80211_get_num_supported_channels(wiphy); + } creq = kzalloc(sizeof(*creq) + sizeof(struct cfg80211_ssid) + n_channels * sizeof(void *), From 4cdf6926f443c84f680213c7aafbe6f91a5fcbc0 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 14 Jun 2024 22:29:10 +1000 Subject: [PATCH 1496/1565] KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group() [ Upstream commit a986fa57fd81a1430e00b3c6cf8a325d6f894a63 ] Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group(). It looks up `stt` from tablefd, but then continues to use it after doing fdput() on the returned fd. After the fdput() the tablefd is free to be closed by another thread. The close calls kvm_spapr_tce_release() and then release_spapr_tce_table() (via call_rcu()) which frees `stt`. Although there are calls to rcu_read_lock() in kvm_spapr_tce_attach_iommu_group() they are not sufficient to prevent the UAF, because `stt` is used outside the locked regions. With an artifcial delay after the fdput() and a userspace program which triggers the race, KASAN detects the UAF: BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm] Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505 CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1 Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV Call Trace: dump_stack_lvl+0xb4/0x108 (unreliable) print_report+0x2b4/0x6ec kasan_report+0x118/0x2b0 __asan_load4+0xb8/0xd0 kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm] kvm_vfio_set_attr+0x524/0xac0 [kvm] kvm_device_ioctl+0x144/0x240 [kvm] sys_ioctl+0x62c/0x1810 system_call_exception+0x190/0x440 system_call_vectored_common+0x15c/0x2ec ... Freed by task 0: ... kfree+0xec/0x3e0 release_spapr_tce_table+0xd4/0x11c [kvm] rcu_core+0x568/0x16a0 handle_softirqs+0x23c/0x920 do_softirq_own_stack+0x6c/0x90 do_softirq_own_stack+0x58/0x90 __irq_exit_rcu+0x218/0x2d0 irq_exit+0x30/0x80 arch_local_irq_restore+0x128/0x230 arch_local_irq_enable+0x1c/0x30 cpuidle_enter_state+0x134/0x5cc cpuidle_enter+0x6c/0xb0 call_cpuidle+0x7c/0x100 do_idle+0x394/0x410 cpu_startup_entry+0x60/0x70 start_secondary+0x3fc/0x410 start_secondary_prolog+0x10/0x14 Fix it by delaying the fdput() until `stt` is no longer in use, which is effectively the entire function. To keep the patch minimal add a call to fdput() at each of the existing return paths. Future work can convert the function to goto or __cleanup style cleanup. With the fix in place the test case no longer triggers the UAF. Reported-by: Al Viro Closes: https://lore.kernel.org/all/20240610024437.GA1464458@ZenIV/ Signed-off-by: Michael Ellerman Link: https://msgid.link/20240614122910.3499489-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin --- arch/powerpc/kvm/book3s_64_vio.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index c640053ab03f..2686ba59873d 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -117,14 +117,16 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, } rcu_read_unlock(); - fdput(f); - - if (!found) + if (!found) { + fdput(f); return -EINVAL; + } table_group = iommu_group_get_iommudata(grp); - if (WARN_ON(!table_group)) + if (WARN_ON(!table_group)) { + fdput(f); return -EFAULT; + } for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) { struct iommu_table *tbltmp = table_group->tables[i]; @@ -145,8 +147,10 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, break; } } - if (!tbl) + if (!tbl) { + fdput(f); return -EINVAL; + } rcu_read_lock(); list_for_each_entry_rcu(stit, &stt->iommu_tables, next) { @@ -157,6 +161,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, /* stit is being destroyed */ iommu_tce_table_put(tbl); rcu_read_unlock(); + fdput(f); return -ENOTTY; } /* @@ -164,6 +169,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, * its KVM reference counter and can return. */ rcu_read_unlock(); + fdput(f); return 0; } rcu_read_unlock(); @@ -171,6 +177,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, stit = kzalloc(sizeof(*stit), GFP_KERNEL); if (!stit) { iommu_tce_table_put(tbl); + fdput(f); return -ENOMEM; } @@ -179,6 +186,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd, list_add_rcu(&stit->next, &stt->iommu_tables); + fdput(f); return 0; } From 134b12f0c590b1f30480bfedfba39e2c2f39a7eb Mon Sep 17 00:00:00 2001 From: Kailang Yang Date: Tue, 18 Jun 2024 14:16:04 +0800 Subject: [PATCH 1497/1565] ALSA: hda/realtek: Add more codec ID to no shutup pins list [ Upstream commit 70794b9563fe011988bcf6a081af9777e63e8d37 ] If it enter to runtime D3 state, it didn't shutup Headset MIC pin. Signed-off-by: Kailang Yang Link: https://lore.kernel.org/r/8d86f61e7d6f4a03b311e4eb4e5caaef@realtek.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/patch_realtek.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 669937bae570..fdbc76eaf233 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -578,10 +578,14 @@ static void alc_shutup_pins(struct hda_codec *codec) switch (codec->core.vendor_id) { case 0x10ec0236: case 0x10ec0256: + case 0x10ec0257: case 0x19e58326: case 0x10ec0283: + case 0x10ec0285: case 0x10ec0286: + case 0x10ec0287: case 0x10ec0288: + case 0x10ec0295: case 0x10ec0298: alc_headset_mic_no_shutup(codec); break; From cf704e7d04377e37ff377d28e9196a6c27003775 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 20 Jun 2024 18:23:04 +0200 Subject: [PATCH 1498/1565] mips: fix compat_sys_lseek syscall [ Upstream commit 0d5679a0aae2d8cda72169452c32e5cb88a7ab33 ] This is almost compatible, but passing a negative offset should result in a EINVAL error, but on mips o32 compat mode would seek to a large 32-bit byte offset. Use compat_sys_lseek() to correctly sign-extend the argument. Signed-off-by: Arnd Bergmann Signed-off-by: Thomas Bogendoerfer Signed-off-by: Sasha Levin --- arch/mips/kernel/syscalls/syscall_o32.tbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 6036af4f30e2..c262975484fa 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -27,7 +27,7 @@ 17 o32 break sys_ni_syscall # 18 was sys_stat 18 o32 unused18 sys_ni_syscall -19 o32 lseek sys_lseek +19 o32 lseek sys_lseek compat_sys_lseek 20 o32 getpid sys_getpid 21 o32 mount sys_mount 22 o32 umount sys_oldumount From 9b32a1348653e3b5cea2748894a708c28a14347f Mon Sep 17 00:00:00 2001 From: Jonathan Denose Date: Fri, 3 May 2024 16:12:07 +0000 Subject: [PATCH 1499/1565] Input: elantech - fix touchpad state on resume for Lenovo N24 [ Upstream commit a69ce592cbe0417664bc5a075205aa75c2ec1273 ] The Lenovo N24 on resume becomes stuck in a state where it sends incorrect packets, causing elantech_packet_check_v4 to fail. The only way for the device to resume sending the correct packets is for it to be disabled and then re-enabled. This change adds a dmi check to trigger this behavior on resume. Signed-off-by: Jonathan Denose Link: https://lore.kernel.org/r/20240503155020.v2.1.Ifa0e25ebf968d8f307f58d678036944141ab17e6@changeid Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/mouse/elantech.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c index 400281feb4e8..8246662fa60b 100644 --- a/drivers/input/mouse/elantech.c +++ b/drivers/input/mouse/elantech.c @@ -1476,16 +1476,47 @@ static void elantech_disconnect(struct psmouse *psmouse) psmouse->private = NULL; } +/* + * Some hw_version 4 models fail to properly activate absolute mode on + * resume without going through disable/enable cycle. + */ +static const struct dmi_system_id elantech_needs_reenable[] = { +#if defined(CONFIG_DMI) && defined(CONFIG_X86) + { + /* Lenovo N24 */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "81AF"), + }, + }, +#endif + { } +}; + /* * Put the touchpad back into absolute mode when reconnecting */ static int elantech_reconnect(struct psmouse *psmouse) { + int err; + psmouse_reset(psmouse); if (elantech_detect(psmouse, 0)) return -1; + if (dmi_check_system(elantech_needs_reenable)) { + err = ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_DISABLE); + if (err) + psmouse_warn(psmouse, "failed to deactivate mouse on %s: %d\n", + psmouse->ps2dev.serio->phys, err); + + err = ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_ENABLE); + if (err) + psmouse_warn(psmouse, "failed to reactivate mouse on %s: %d\n", + psmouse->ps2dev.serio->phys, err); + } + if (elantech_set_absolute_mode(psmouse)) { psmouse_err(psmouse, "failed to put touchpad back into absolute mode.\n"); From a87d15d1a3fe875c47ae5d7209afceea749fcd54 Mon Sep 17 00:00:00 2001 From: Tobias Jakobi Date: Fri, 31 May 2024 15:43:07 -0700 Subject: [PATCH 1500/1565] Input: i8042 - add Ayaneo Kun to i8042 quirk table [ Upstream commit 955af6355ddfe35140f9706a635838212a32513b ] See the added comment for details. Also fix a typo in the quirk's define. Signed-off-by: Tobias Jakobi Link: https://lore.kernel.org/r/20240531190100.3874731-1-tjakobi@math.uni-bielefeld.de Signed-off-by: Dmitry Torokhov Signed-off-by: Sasha Levin --- drivers/input/serio/i8042-acpipnpio.h | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/input/serio/i8042-acpipnpio.h b/drivers/input/serio/i8042-acpipnpio.h index 6804970d8f51..91edfb88a218 100644 --- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -75,7 +75,7 @@ static inline void i8042_write_command(int val) #define SERIO_QUIRK_PROBE_DEFER BIT(5) #define SERIO_QUIRK_RESET_ALWAYS BIT(6) #define SERIO_QUIRK_RESET_NEVER BIT(7) -#define SERIO_QUIRK_DIECT BIT(8) +#define SERIO_QUIRK_DIRECT BIT(8) #define SERIO_QUIRK_DUMBKBD BIT(9) #define SERIO_QUIRK_NOLOOP BIT(10) #define SERIO_QUIRK_NOTIMEOUT BIT(11) @@ -1235,6 +1235,20 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = { .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) }, + { + /* + * The Ayaneo Kun is a handheld device where some the buttons + * are handled by an AT keyboard. The keyboard is usually + * detected as raw, but sometimes, usually after a cold boot, + * it is detected as translated. Make sure that the keyboard + * is always in raw mode. + */ + .matches = { + DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "AYANEO"), + DMI_MATCH(DMI_BOARD_NAME, "KUN"), + }, + .driver_data = (void *)(SERIO_QUIRK_DIRECT) + }, { } }; @@ -1553,7 +1567,7 @@ static void __init i8042_check_quirks(void) if (quirks & SERIO_QUIRK_RESET_NEVER) i8042_reset = I8042_RESET_NEVER; } - if (quirks & SERIO_QUIRK_DIECT) + if (quirks & SERIO_QUIRK_DIRECT) i8042_direct = true; if (quirks & SERIO_QUIRK_DUMBKBD) i8042_dumbkbd = true; From 73bb3e01941382a976795cee051f7a4df9f54d2b Mon Sep 17 00:00:00 2001 From: Thomas GENTY Date: Sat, 8 Jun 2024 19:02:51 +0200 Subject: [PATCH 1501/1565] bytcr_rt5640 : inverse jack detect for Archos 101 cesium [ Upstream commit e3209a1827646daaab744aa6a5767b1f57fb5385 ] When headphones are plugged in, they appear absent; when they are removed, they appear present. Add a specific entry in bytcr_rt5640 for this device Signed-off-by: Thomas GENTY Reviewed-by: Hans de Goede Acked-by: Pierre-Louis Bossart Link: https://lore.kernel.org/r/20240608170251.99936-1-tomlohave@gmail.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/intel/boards/bytcr_rt5640.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 1d049685e707..47b581d99da6 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -468,6 +468,17 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF1 | BYT_RT5640_MCLK_EN), }, + { + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ARCHOS"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "ARCHOS 101 CESIUM"), + }, + .driver_data = (void *)(BYTCR_INPUT_DEFAULTS | + BYT_RT5640_JD_NOT_INV | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_SSP0_AIF1 | + BYT_RT5640_MCLK_EN), + }, { .matches = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ARCHOS"), From 072f6348c589150cab9b1bc53b4e82bb0e332748 Mon Sep 17 00:00:00 2001 From: Jai Luthra Date: Tue, 11 Jun 2024 18:02:55 +0530 Subject: [PATCH 1502/1565] ALSA: dmaengine: Synchronize dma channel after drop() [ Upstream commit e8343410ddf08fc36a9b9cc7c51a4e53a262d4c6 ] Sometimes the stream may be stopped due to XRUN events, in which case the userspace can call snd_pcm_drop() and snd_pcm_prepare() to stop and start the stream again. In these cases, we must wait for the DMA channel to synchronize before marking the stream as prepared for playback, as the DMA channel gets stopped by drop() without any synchronization. Make sure the ALSA core synchronizes the DMA channel by adding a sync_stop() hook. Reviewed-by: Peter Ujfalusi Signed-off-by: Jai Luthra Link: https://lore.kernel.org/r/20240611-asoc_next-v3-1-fcfd84b12164@ti.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- include/sound/dmaengine_pcm.h | 1 + sound/core/pcm_dmaengine.c | 10 ++++++++++ sound/soc/soc-generic-dmaengine-pcm.c | 8 ++++++++ 3 files changed, 19 insertions(+) diff --git a/include/sound/dmaengine_pcm.h b/include/sound/dmaengine_pcm.h index 8c5e38180fb0..618405da95b3 100644 --- a/include/sound/dmaengine_pcm.h +++ b/include/sound/dmaengine_pcm.h @@ -34,6 +34,7 @@ snd_pcm_uframes_t snd_dmaengine_pcm_pointer_no_residue(struct snd_pcm_substream int snd_dmaengine_pcm_open(struct snd_pcm_substream *substream, struct dma_chan *chan); int snd_dmaengine_pcm_close(struct snd_pcm_substream *substream); +int snd_dmaengine_pcm_sync_stop(struct snd_pcm_substream *substream); int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream, dma_filter_fn filter_fn, void *filter_data); diff --git a/sound/core/pcm_dmaengine.c b/sound/core/pcm_dmaengine.c index be58505889a3..db2229445256 100644 --- a/sound/core/pcm_dmaengine.c +++ b/sound/core/pcm_dmaengine.c @@ -342,6 +342,16 @@ int snd_dmaengine_pcm_open_request_chan(struct snd_pcm_substream *substream, } EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_open_request_chan); +int snd_dmaengine_pcm_sync_stop(struct snd_pcm_substream *substream) +{ + struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream); + + dmaengine_synchronize(prtd->dma_chan); + + return 0; +} +EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_sync_stop); + /** * snd_dmaengine_pcm_close - Close a dmaengine based PCM substream * @substream: PCM substream diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c index 9ef80a48707e..d65dc1acff43 100644 --- a/sound/soc/soc-generic-dmaengine-pcm.c +++ b/sound/soc/soc-generic-dmaengine-pcm.c @@ -326,6 +326,12 @@ static int dmaengine_copy_user(struct snd_soc_component *component, return 0; } +static int dmaengine_pcm_sync_stop(struct snd_soc_component *component, + struct snd_pcm_substream *substream) +{ + return snd_dmaengine_pcm_sync_stop(substream); +} + static const struct snd_soc_component_driver dmaengine_pcm_component = { .name = SND_DMAENGINE_PCM_DRV_NAME, .probe_order = SND_SOC_COMP_ORDER_LATE, @@ -335,6 +341,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component = { .trigger = dmaengine_pcm_trigger, .pointer = dmaengine_pcm_pointer, .pcm_construct = dmaengine_pcm_new, + .sync_stop = dmaengine_pcm_sync_stop, }; static const struct snd_soc_component_driver dmaengine_pcm_component_process = { @@ -347,6 +354,7 @@ static const struct snd_soc_component_driver dmaengine_pcm_component_process = { .pointer = dmaengine_pcm_pointer, .copy_user = dmaengine_copy_user, .pcm_construct = dmaengine_pcm_new, + .sync_stop = dmaengine_pcm_sync_stop, }; static const char * const dmaengine_pcm_dma_channel_names[] = { From 96414bf03778ed72f085967c946c8270bbeac500 Mon Sep 17 00:00:00 2001 From: Jai Luthra Date: Tue, 11 Jun 2024 18:02:56 +0530 Subject: [PATCH 1503/1565] ASoC: ti: davinci-mcasp: Set min period size using FIFO config [ Upstream commit c5dcf8ab10606e76c1d8a0ec77f27d84a392e874 ] The minimum period size was enforced to 64 as older devices integrating McASP with EDMA used an internal FIFO of 64 samples. With UDMA based platforms this internal McASP FIFO is optional, as the DMA engine internally does some buffering which is already accounted for when registering the platform. So we should read the actual FIFO configuration (txnumevt/rxnumevt) instead of hardcoding frames.min to 64. Acked-by: Peter Ujfalusi Signed-off-by: Jai Luthra Link: https://lore.kernel.org/r/20240611-asoc_next-v3-2-fcfd84b12164@ti.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/ti/davinci-mcasp.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c index a6b72ad53b43..61ea444f2018 100644 --- a/sound/soc/ti/davinci-mcasp.c +++ b/sound/soc/ti/davinci-mcasp.c @@ -1441,10 +1441,11 @@ static int davinci_mcasp_hw_rule_min_periodsize( { struct snd_interval *period_size = hw_param_interval(params, SNDRV_PCM_HW_PARAM_PERIOD_SIZE); + u8 numevt = *((u8 *)rule->private); struct snd_interval frames; snd_interval_any(&frames); - frames.min = 64; + frames.min = numevt; frames.integer = 1; return snd_interval_refine(period_size, &frames); @@ -1459,6 +1460,7 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, u32 max_channels = 0; int i, dir, ret; int tdm_slots = mcasp->tdm_slots; + u8 *numevt; /* Do not allow more then one stream per direction */ if (mcasp->substreams[substream->stream]) @@ -1558,9 +1560,12 @@ static int davinci_mcasp_startup(struct snd_pcm_substream *substream, return ret; } + numevt = (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) ? + &mcasp->txnumevt : + &mcasp->rxnumevt; snd_pcm_hw_rule_add(substream->runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, - davinci_mcasp_hw_rule_min_periodsize, NULL, + davinci_mcasp_hw_rule_min_periodsize, numevt, SNDRV_PCM_HW_PARAM_PERIOD_SIZE, -1); return 0; From 6e90cd169608e6b0a6e37ff81fb5d2fdc2c71c0b Mon Sep 17 00:00:00 2001 From: Primoz Fiser Date: Mon, 10 Jun 2024 14:58:47 +0200 Subject: [PATCH 1504/1565] ASoC: ti: omap-hdmi: Fix too long driver name [ Upstream commit 524d3f126362b6033e92cbe107ae2158d7fbff94 ] Set driver name to "HDMI". This simplifies the code and gets rid of the following error messages: ASoC: driver name too long 'HDMI 58040000.encoder' -> 'HDMI_58040000_e' Signed-off-by: Primoz Fiser Acked-by: Peter Ujfalusi Link: https://lore.kernel.org/r/20240610125847.773394-1-primoz.fiser@norik.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- sound/soc/ti/omap-hdmi.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/sound/soc/ti/omap-hdmi.c b/sound/soc/ti/omap-hdmi.c index 3328c02f93c7..1dfe439d1341 100644 --- a/sound/soc/ti/omap-hdmi.c +++ b/sound/soc/ti/omap-hdmi.c @@ -353,11 +353,7 @@ static int omap_hdmi_audio_probe(struct platform_device *pdev) if (!card) return -ENOMEM; - card->name = devm_kasprintf(dev, GFP_KERNEL, - "HDMI %s", dev_name(ad->dssdev)); - if (!card->name) - return -ENOMEM; - + card->name = "HDMI"; card->owner = THIS_MODULE; card->dai_link = devm_kzalloc(dev, sizeof(*(card->dai_link)), GFP_KERNEL); From 71db8dc6f8066178b823451aa7f105ea5d3ebb8d Mon Sep 17 00:00:00 2001 From: Chen Ni Date: Tue, 21 May 2024 12:10:20 +0800 Subject: [PATCH 1505/1565] can: kvaser_usb: fix return value for hif_usb_send_regout [ Upstream commit 0d34d8163fd87978a6abd792e2d8ad849f4c3d57 ] As the potential failure of usb_submit_urb(), it should be better to return the err variable to catch the error. Signed-off-by: Chen Ni Link: https://lore.kernel.org/all/20240521041020.1519416-1-nichen@iscas.ac.cn Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin --- drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c index 411b3adb1d9e..a96b22398407 100644 --- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c +++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c @@ -266,7 +266,7 @@ int kvaser_usb_send_cmd_async(struct kvaser_usb_net_priv *priv, void *cmd, } usb_free_urb(urb); - return 0; + return err; } int kvaser_usb_can_rx_over_error(struct net_device *netdev) From 2e51db7ab71b89dc5a17068f5e201c69f13a4c9a Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Fri, 14 Jun 2024 18:09:01 +0200 Subject: [PATCH 1506/1565] s390/sclp: Fix sclp_init() cleanup on failure [ Upstream commit 6434b33faaa063df500af355ee6c3942e0f8d982 ] If sclp_init() fails it only partially cleans up: if there are multiple failing calls to sclp_init() sclp_state_change_event will be added several times to sclp_reg_list, which results in the following warning: ------------[ cut here ]------------ list_add double add: new=000003ffe1598c10, prev=000003ffe1598bf0, next=000003ffe1598c10. WARNING: CPU: 0 PID: 1 at lib/list_debug.c:35 __list_add_valid_or_report+0xde/0xf8 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc3 Krnl PSW : 0404c00180000000 000003ffe0d6076a (__list_add_valid_or_report+0xe2/0xf8) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 ... Call Trace: [<000003ffe0d6076a>] __list_add_valid_or_report+0xe2/0xf8 ([<000003ffe0d60766>] __list_add_valid_or_report+0xde/0xf8) [<000003ffe0a8d37e>] sclp_init+0x40e/0x450 [<000003ffe00009f2>] do_one_initcall+0x42/0x1e0 [<000003ffe15b77a6>] do_initcalls+0x126/0x150 [<000003ffe15b7a0a>] kernel_init_freeable+0x1ba/0x1f8 [<000003ffe0d6650e>] kernel_init+0x2e/0x180 [<000003ffe000301c>] __ret_from_fork+0x3c/0x60 [<000003ffe0d759ca>] ret_from_fork+0xa/0x30 Fix this by removing sclp_state_change_event from sclp_reg_list when sclp_init() fails. Reviewed-by: Peter Oberparleiter Signed-off-by: Heiko Carstens Signed-off-by: Alexander Gordeev Signed-off-by: Sasha Levin --- drivers/s390/char/sclp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/s390/char/sclp.c b/drivers/s390/char/sclp.c index d2ab3f07c008..8296e6bc229e 100644 --- a/drivers/s390/char/sclp.c +++ b/drivers/s390/char/sclp.c @@ -1208,6 +1208,7 @@ sclp_init(void) fail_unregister_reboot_notifier: unregister_reboot_notifier(&sclp_reboot_notifier); fail_init_state_uninitialized: + list_del(&sclp_state_change_event.list); sclp_init_state = sclp_init_state_uninitialized; free_page((unsigned long) sclp_read_sccb); free_page((unsigned long) sclp_init_sccb); From 94818bdb00ef34a996a06aa63d11f591074cb757 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Thu, 20 Jun 2024 12:32:00 +0100 Subject: [PATCH 1507/1565] btrfs: qgroup: fix quota root leak after quota disable failure [ Upstream commit a7e4c6a3031c74078dba7fa36239d0f4fe476c53 ] If during the quota disable we fail when cleaning the quota tree or when deleting the root from the root tree, we jump to the 'out' label without ever dropping the reference on the quota root, resulting in a leak of the root since fs_info->quota_root is no longer pointing to the root (we have set it to NULL just before those steps). Fix this by always doing a btrfs_put_root() call under the 'out' label. This is a problem that exists since qgroups were first added in 2012 by commit bed92eae26cc ("Btrfs: qgroup implementation and prototypes"), but back then we missed a kfree on the quota root and free_extent_buffer() calls on its root and commit root nodes, since back then roots were not yet reference counted. Reviewed-by: Boris Burkov Reviewed-by: Qu Wenruo Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin --- fs/btrfs/qgroup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 50669ff9346c..83d17f22335b 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1197,7 +1197,7 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info) int btrfs_quota_disable(struct btrfs_fs_info *fs_info) { - struct btrfs_root *quota_root; + struct btrfs_root *quota_root = NULL; struct btrfs_trans_handle *trans = NULL; int ret = 0; @@ -1290,9 +1290,9 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) btrfs_tree_unlock(quota_root->node); btrfs_free_tree_block(trans, quota_root, quota_root->node, 0, 1); - btrfs_put_root(quota_root); out: + btrfs_put_root(quota_root); mutex_unlock(&fs_info->qgroup_ioctl_lock); if (ret && trans) btrfs_end_transaction(trans); From 2a28531dd0167a298befe90d946e34da8bb1b579 Mon Sep 17 00:00:00 2001 From: Aivaz Latypov Date: Tue, 25 Jun 2024 13:12:02 +0500 Subject: [PATCH 1508/1565] ALSA: hda/relatek: Enable Mute LED on HP Laptop 15-gw0xxx [ Upstream commit 1d091a98c399c17d0571fa1d91a7123a698446e4 ] This HP Laptop uses ALC236 codec with COEF 0x07 controlling the mute LED. Enable existing quirk for this device. Signed-off-by: Aivaz Latypov Link: https://patch.msgid.link/20240625081217.1049-1-reichaivaz@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index fdbc76eaf233..5cc158c56d43 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9096,6 +9096,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x103c, 0x8788, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED), SND_PCI_QUIRK(0x103c, 0x87b7, "HP Laptop 14-fq0xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x87c8, "HP", ALC287_FIXUP_HP_GPIO_LED), + SND_PCI_QUIRK(0x103c, 0x87d3, "HP Laptop 15-gw0xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2), SND_PCI_QUIRK(0x103c, 0x87e5, "HP ProBook 440 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87e7, "HP ProBook 450 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), SND_PCI_QUIRK(0x103c, 0x87f1, "HP ProBook 630 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED), From 8acf8801f3d9d864d0ab79a1b5f3f768bb80cc97 Mon Sep 17 00:00:00 2001 From: Shengjiu Wang Date: Thu, 20 Jun 2024 10:40:18 +0800 Subject: [PATCH 1509/1565] ALSA: dmaengine_pcm: terminate dmaengine before synchronize [ Upstream commit 6a7db25aad8ce6512b366d2ce1d0e60bac00a09d ] When dmaengine supports pause function, in suspend state, dmaengine_pause() is called instead of dmaengine_terminate_async(), In end of playback stream, the runtime->state will go to SNDRV_PCM_STATE_DRAINING, if system suspend & resume happen at this time, application will not resume playback stream, the stream will be closed directly, the dmaengine_terminate_async() will not be called before the dmaengine_synchronize(), which violates the call sequence for dmaengine_synchronize(). This behavior also happens for capture streams, but there is no SNDRV_PCM_STATE_DRAINING state for capture. So use dmaengine_tx_status() to check the DMA status if the status is DMA_PAUSED, then call dmaengine_terminate_async() to terminate dmaengine before dmaengine_synchronize(). Signed-off-by: Shengjiu Wang Link: https://patch.msgid.link/1718851218-27803-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin --- sound/core/pcm_dmaengine.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/sound/core/pcm_dmaengine.c b/sound/core/pcm_dmaengine.c index db2229445256..a7e2e6955e51 100644 --- a/sound/core/pcm_dmaengine.c +++ b/sound/core/pcm_dmaengine.c @@ -359,6 +359,12 @@ EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_sync_stop); int snd_dmaengine_pcm_close(struct snd_pcm_substream *substream) { struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream); + struct dma_tx_state state; + enum dma_status status; + + status = dmaengine_tx_status(prtd->dma_chan, prtd->cookie, &state); + if (status == DMA_PAUSED) + dmaengine_terminate_async(prtd->dma_chan); dmaengine_synchronize(prtd->dma_chan); kfree(prtd); @@ -377,6 +383,12 @@ EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_close); int snd_dmaengine_pcm_close_release_chan(struct snd_pcm_substream *substream) { struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream); + struct dma_tx_state state; + enum dma_status status; + + status = dmaengine_tx_status(prtd->dma_chan, prtd->cookie, &state); + if (status == DMA_PAUSED) + dmaengine_terminate_async(prtd->dma_chan); dmaengine_synchronize(prtd->dma_chan); dma_release_channel(prtd->dma_chan); From 909f4c2fc98761d2a15d6ee16138d020038f11fb Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Tue, 25 Jun 2024 12:22:36 +0200 Subject: [PATCH 1510/1565] net: usb: qmi_wwan: add Telit FN912 compositions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 77453e2b015b5ced5b3f45364dd5a72dfc3bdecb ] Add the following Telit FN912 compositions: 0x3000: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag) T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 8 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=3000 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN912 S: SerialNumber=92c4c4d8 C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms 0x3001: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb T: Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#= 7 Spd=480 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=3001 Rev=05.15 S: Manufacturer=Telit Cinterion S: Product=FN912 S: SerialNumber=92c4c4d8 C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none) E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Daniele Palmas Acked-by: Bjørn Mork Link: https://patch.msgid.link/20240625102236.69539-1-dnlplm@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin --- drivers/net/usb/qmi_wwan.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 4dd1a9fb4c8a..d2a8238e144a 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1312,6 +1312,8 @@ static const struct usb_device_id products[] = { {QMI_QUIRK_SET_DTR(0x1bc7, 0x1260, 2)}, /* Telit LE910Cx */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1261, 2)}, /* Telit LE910Cx */ {QMI_QUIRK_SET_DTR(0x1bc7, 0x1900, 1)}, /* Telit LN940 series */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x3000, 0)}, /* Telit FN912 series */ + {QMI_QUIRK_SET_DTR(0x1bc7, 0x3001, 0)}, /* Telit FN912 series */ {QMI_FIXED_INTF(0x1c9e, 0x9801, 3)}, /* Telewell TW-3G HSPA+ */ {QMI_FIXED_INTF(0x1c9e, 0x9803, 4)}, /* Telewell TW-3G HSPA+ */ {QMI_FIXED_INTF(0x1c9e, 0x9b01, 3)}, /* XS Stick W100-2 from 4G Systems */ From d1e4e94cb8ab5154ea0ee3a6f8bed08b8d398fb2 Mon Sep 17 00:00:00 2001 From: Yunshui Jiang Date: Fri, 31 May 2024 16:07:39 +0800 Subject: [PATCH 1511/1565] net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD() [ Upstream commit b8ec0dc3845f6c9089573cb5c2c4b05f7fc10728 ] mac802154 devices update their dev->stats fields locklessly. Therefore these counters should be updated atomically. Adopt SMP safe DEV_STATS_INC() and DEV_STATS_ADD() to achieve this. Signed-off-by: Yunshui Jiang Message-ID: <20240531080739.2608969-1-jiangyunshui@kylinos.cn> Signed-off-by: Stefan Schmidt Signed-off-by: Sasha Levin --- net/mac802154/tx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c index c829e4a75325..7cea95d0b78f 100644 --- a/net/mac802154/tx.c +++ b/net/mac802154/tx.c @@ -34,8 +34,8 @@ void ieee802154_xmit_worker(struct work_struct *work) if (res) goto err_tx; - dev->stats.tx_packets++; - dev->stats.tx_bytes += skb->len; + DEV_STATS_INC(dev, tx_packets); + DEV_STATS_ADD(dev, tx_bytes, skb->len); ieee802154_xmit_complete(&local->hw, skb, false); @@ -86,8 +86,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) goto err_tx; } - dev->stats.tx_packets++; - dev->stats.tx_bytes += len; + DEV_STATS_INC(dev, tx_packets); + DEV_STATS_ADD(dev, tx_bytes, len); } else { local->tx_skb = skb; queue_work(local->workqueue, &local->tx_work); From 6b16098148ea58a67430d90e20476be2377c3acd Mon Sep 17 00:00:00 2001 From: Anjali K Date: Fri, 14 Jun 2024 23:08:44 +0530 Subject: [PATCH 1512/1565] powerpc/pseries: Whitelist dtl slub object for copying to userspace [ Upstream commit 1a14150e1656f7a332a943154fc486504db4d586 ] Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 #85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia Signed-off-by: Vishal Chourasia Signed-off-by: Anjali K Reviewed-by: Srikar Dronamraju Signed-off-by: Michael Ellerman Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/platforms/pseries/setup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 822be2680b79..8e4a2e8aee11 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -312,8 +312,8 @@ static int alloc_dispatch_log_kmem_cache(void) { void (*ctor)(void *) = get_dtl_cache_ctor(); - dtl_cache = kmem_cache_create("dtl", DISPATCH_LOG_BYTES, - DISPATCH_LOG_BYTES, 0, ctor); + dtl_cache = kmem_cache_create_usercopy("dtl", DISPATCH_LOG_BYTES, + DISPATCH_LOG_BYTES, 0, 0, DISPATCH_LOG_BYTES, ctor); if (!dtl_cache) { pr_warn("Failed to create dispatch trace log buffer cache\n"); pr_warn("Stolen time statistics will be unreliable\n"); From 033c51dfdbb6b79ab43fb3587276fa82d0a329e1 Mon Sep 17 00:00:00 2001 From: Ganesh Goudar Date: Mon, 17 Jun 2024 19:32:40 +0530 Subject: [PATCH 1513/1565] powerpc/eeh: avoid possible crash when edev->pdev changes [ Upstream commit a1216e62d039bf63a539bbe718536ec789a853dd ] If a PCI device is removed during eeh_pe_report_edev(), edev->pdev will change and can cause a crash, hold the PCI rescan/remove lock while taking a copy of edev->pdev->bus. Signed-off-by: Ganesh Goudar Signed-off-by: Michael Ellerman Link: https://msgid.link/20240617140240.580453-1-ganeshgr@linux.ibm.com Signed-off-by: Sasha Levin --- arch/powerpc/kernel/eeh_pe.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c index 845e024321d4..a856d9ba42d2 100644 --- a/arch/powerpc/kernel/eeh_pe.c +++ b/arch/powerpc/kernel/eeh_pe.c @@ -849,6 +849,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe) { struct eeh_dev *edev; struct pci_dev *pdev; + struct pci_bus *bus = NULL; if (pe->type & EEH_PE_PHB) return pe->phb->bus; @@ -859,9 +860,11 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe) /* Retrieve the parent PCI bus of first (top) PCI device */ edev = list_first_entry_or_null(&pe->edevs, struct eeh_dev, entry); + pci_lock_rescan_remove(); pdev = eeh_dev_to_pci_dev(edev); if (pdev) - return pdev->bus; + bus = pdev->bus; + pci_unlock_rescan_remove(); - return NULL; + return bus; } From 739d8d008209e43b91eb30508177699ad1ff74d8 Mon Sep 17 00:00:00 2001 From: Xingui Yang Date: Wed, 19 Jun 2024 09:17:42 +0000 Subject: [PATCH 1514/1565] scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed [ Upstream commit ab2068a6fb84751836a84c26ca72b3beb349619d ] The expander phy will be treated as broadcast flutter in the next revalidation after the exp-attached end device probe failed, as follows: [78779.654026] sas: broadcast received: 0 [78779.654037] sas: REVALIDATING DOMAIN on port 0, pid:10 [78779.654680] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78779.662977] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78779.662986] sas: ex 500e004aaaaaaa1f phy05 new device attached [78779.663079] sas: ex 500e004aaaaaaa1f phy05:U:8 attached: 500e004aaaaaaa05 (stp) [78779.693542] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] found [78779.701155] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 [78779.707864] sas: Enter sas_scsi_recover_host busy: 0 failed: 0 ... [78835.161307] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1 [78835.171344] sas: sas_probe_sata: for exp-attached device 500e004aaaaaaa05 returned -19 [78835.180879] hisi_sas_v3_hw 0000:b4:02.0: dev[16:5] is gone [78835.187487] sas: broadcast received: 0 [78835.187504] sas: REVALIDATING DOMAIN on port 0, pid:10 [78835.188263] sas: ex 500e004aaaaaaa1f phy05 change count has changed [78835.195870] sas: ex 500e004aaaaaaa1f phy05 originated BROADCAST(CHANGE) [78835.195875] sas: ex 500e004aaaaaaa1f rediscovering phy05 [78835.196022] sas: ex 500e004aaaaaaa1f phy05:U:A attached: 500e004aaaaaaa05 (stp) [78835.196026] sas: ex 500e004aaaaaaa1f phy05 broadcast flutter [78835.197615] sas: done REVALIDATING DOMAIN on port 0, pid:10, res 0x0 The cause of the problem is that the related ex_phy's attached_sas_addr was not cleared after the end device probe failed, so reset it. Signed-off-by: Xingui Yang Link: https://lore.kernel.org/r/20240619091742.25465-1-yangxingui@huawei.com Reviewed-by: John Garry Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/libsas/sas_internal.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h index 52e09c3e2b50..3ef2fde28b8e 100644 --- a/drivers/scsi/libsas/sas_internal.h +++ b/drivers/scsi/libsas/sas_internal.h @@ -114,6 +114,20 @@ static inline void sas_fail_probe(struct domain_device *dev, const char *func, i func, dev->parent ? "exp-attached" : "direct-attached", SAS_ADDR(dev->sas_addr), err); + + /* + * If the device probe failed, the expander phy attached address + * needs to be reset so that the phy will not be treated as flutter + * in the next revalidation + */ + if (dev->parent && !dev_is_expander(dev->dev_type)) { + struct sas_phy *phy = dev->phy; + struct domain_device *parent = dev->parent; + struct ex_phy *ex_phy = &parent->ex_dev.ex_phy[phy->number]; + + memset(ex_phy->attached_sas_addr, 0, SAS_ADDR_SIZE); + } + sas_unregister_dev(dev->port, dev); } From ddeda6ca5f218b668b560d90fc31ae469adbfd92 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Mon, 10 Jun 2024 20:00:32 +0900 Subject: [PATCH 1515/1565] Bluetooth: hci_core: cancel all works upon hci_unregister_dev() [ Upstream commit 0d151a103775dd9645c78c97f77d6e2a5298d913 ] syzbot is reporting that calling hci_release_dev() from hci_error_reset() due to hci_dev_put() from hci_error_reset() can cause deadlock at destroy_workqueue(), for hci_error_reset() is called from hdev->req_workqueue which destroy_workqueue() needs to flush. We need to make sure that hdev->{rx_work,cmd_work,tx_work} which are queued into hdev->workqueue and hdev->{power_on,error_reset} which are queued into hdev->req_workqueue are no longer running by the moment destroy_workqueue(hdev->workqueue); destroy_workqueue(hdev->req_workqueue); are called from hci_release_dev(). Call cancel_work_sync() on these work items from hci_unregister_dev() as soon as hdev->list is removed from hci_dev_list. Reported-by: syzbot Closes: https://syzkaller.appspot.com/bug?extid=da0a9c9721e36db712e8 Signed-off-by: Tetsuo Handa Signed-off-by: Luiz Augusto von Dentz Signed-off-by: Sasha Levin --- net/bluetooth/hci_core.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index b9cf5bc9364c..c8c1cd55c0eb 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3839,7 +3839,11 @@ void hci_unregister_dev(struct hci_dev *hdev) list_del(&hdev->list); write_unlock(&hci_dev_list_lock); + cancel_work_sync(&hdev->rx_work); + cancel_work_sync(&hdev->cmd_work); + cancel_work_sync(&hdev->tx_work); cancel_work_sync(&hdev->power_on); + cancel_work_sync(&hdev->error_reset); if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks)) { hci_suspend_clear_tasks(hdev); From f65bffb464408cb1e7a6feb542a7234a38fde0d6 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 2 Jul 2024 21:03:26 +0200 Subject: [PATCH 1516/1565] fs: better handle deep ancestor chains in is_subdir() [ Upstream commit 391b59b045004d5b985d033263ccba3e941a7740 ] Jan reported that 'cd ..' may take a long time in deep directory hierarchies under a bind-mount. If concurrent renames happen it is possible to livelock in is_subdir() because it will keep retrying. Change is_subdir() from simply retrying over and over to retry once and then acquire the rename lock to handle deep ancestor chains better. The list of alternatives to this approach were less then pleasant. Change the scope of rcu lock to cover the whole walk while at it. A big thanks to Jan and Linus. Both Jan and Linus had proposed effectively the same thing just that one version ended up being slightly more elegant. Reported-by: Jan Kara Signed-off-by: Linus Torvalds Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/dcache.c | 31 ++++++++++++++----------------- 1 file changed, 14 insertions(+), 17 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 406a71abb1b5..5febd219fdeb 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -3092,28 +3092,25 @@ EXPORT_SYMBOL(d_splice_alias); bool is_subdir(struct dentry *new_dentry, struct dentry *old_dentry) { - bool result; + bool subdir; unsigned seq; if (new_dentry == old_dentry) return true; - do { - /* for restarting inner loop in case of seq retry */ - seq = read_seqbegin(&rename_lock); - /* - * Need rcu_readlock to protect against the d_parent trashing - * due to d_move - */ - rcu_read_lock(); - if (d_ancestor(old_dentry, new_dentry)) - result = true; - else - result = false; - rcu_read_unlock(); - } while (read_seqretry(&rename_lock, seq)); - - return result; + /* Access d_parent under rcu as d_move() may change it. */ + rcu_read_lock(); + seq = read_seqbegin(&rename_lock); + subdir = d_ancestor(old_dentry, new_dentry); + /* Try lockless once... */ + if (read_seqretry(&rename_lock, seq)) { + /* ...else acquire lock for progress even on deep chains. */ + read_seqlock_excl(&rename_lock); + subdir = d_ancestor(old_dentry, new_dentry); + read_sequnlock_excl(&rename_lock); + } + rcu_read_unlock(); + return subdir; } EXPORT_SYMBOL(is_subdir); From 38c2028bb3e4817c9265dcdf28e2ac8e7cc5fd48 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Wed, 8 May 2024 11:56:10 +0200 Subject: [PATCH 1517/1565] spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ] While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective device trees don't add DMA channels. The Reference manuals of i.MX31 and i.MX25 also don't mention the CSPI core being DMA capable. (I didn't check the others.) Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed DMA channel requests") this results in an error message spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel! during boot. However that isn't fatal and the driver gets loaded just fine, just without using DMA. Signed-off-by: Uwe Kleine-König Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/spi/spi-imx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c index 21297cc62571..8566da12d15e 100644 --- a/drivers/spi/spi-imx.c +++ b/drivers/spi/spi-imx.c @@ -1001,7 +1001,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = { .rx_available = mx31_rx_available, .reset = mx31_reset, .fifo_size = 8, - .has_dmamode = true, + .has_dmamode = false, .dynamic_burst = false, .has_slavemode = false, .devtype = IMX35_CSPI, From f236af7561379c5201da4eb6f49f07d5340c61b6 Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Fri, 5 Jul 2024 09:57:34 -1000 Subject: [PATCH 1518/1565] selftests/vDSO: fix clang build errors and warnings [ Upstream commit 73810cd45b99c6c418e1c6a487b52c1e74edb20d ] When building with clang, via: make LLVM=1 -C tools/testing/selftests ...there are several warnings, and an error. This fixes all of those and allows these tests to run and pass. 1. Fix linker error (undefined reference to memcpy) by providing a local version of memcpy. 2. clang complains about using this form: if (g = h & 0xf0000000) ...so factor out the assignment into a separate step. 3. The code is passing a signed const char* to elf_hash(), which expects a const unsigned char *. There are several callers, so fix this at the source by allowing the function to accept a signed argument, and then converting to unsigned operations, once inside the function. 4. clang doesn't have __attribute__((externally_visible)) and generates a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem to require that attribute in order to build, run and pass tests here, so remove it. Reviewed-by: Carlos Llamas Reviewed-by: Edward Liaw Reviewed-by: Muhammad Usama Anjum Tested-by: Muhammad Usama Anjum Signed-off-by: John Hubbard Signed-off-by: Shuah Khan Signed-off-by: Sasha Levin --- tools/testing/selftests/vDSO/parse_vdso.c | 16 +++++++++++----- .../selftests/vDSO/vdso_standalone_test_x86.c | 18 ++++++++++++++++-- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c index 413f75620a35..4ae417372e9e 100644 --- a/tools/testing/selftests/vDSO/parse_vdso.c +++ b/tools/testing/selftests/vDSO/parse_vdso.c @@ -55,14 +55,20 @@ static struct vdso_info ELF(Verdef) *verdef; } vdso_info; -/* Straight from the ELF specification. */ -static unsigned long elf_hash(const unsigned char *name) +/* + * Straight from the ELF specification...and then tweaked slightly, in order to + * avoid a few clang warnings. + */ +static unsigned long elf_hash(const char *name) { unsigned long h = 0, g; - while (*name) + const unsigned char *uch_name = (const unsigned char *)name; + + while (*uch_name) { - h = (h << 4) + *name++; - if (g = h & 0xf0000000) + h = (h << 4) + *uch_name++; + g = h & 0xf0000000; + if (g) h ^= g >> 24; h &= ~g; } diff --git a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c index 8a44ff973ee1..27f6fdf11969 100644 --- a/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c +++ b/tools/testing/selftests/vDSO/vdso_standalone_test_x86.c @@ -18,7 +18,7 @@ #include "parse_vdso.h" -/* We need a libc functions... */ +/* We need some libc functions... */ int strcmp(const char *a, const char *b) { /* This implementation is buggy: it never returns -1. */ @@ -34,6 +34,20 @@ int strcmp(const char *a, const char *b) return 0; } +/* + * The clang build needs this, although gcc does not. + * Stolen from lib/string.c. + */ +void *memcpy(void *dest, const void *src, size_t count) +{ + char *tmp = dest; + const char *s = src; + + while (count--) + *tmp++ = *s++; + return dest; +} + /* ...and two syscalls. This is x86-specific. */ static inline long x86_syscall3(long nr, long a0, long a1, long a2) { @@ -70,7 +84,7 @@ void to_base10(char *lastdig, time_t n) } } -__attribute__((externally_visible)) void c_main(void **stack) +void c_main(void **stack) { /* Parse the stack */ long argc = (long)*stack; From 34f8efd2743f2d961e92e8e994de4c7a2f9e74a0 Mon Sep 17 00:00:00 2001 From: Edward Adam Davis Date: Tue, 21 May 2024 13:21:46 +0800 Subject: [PATCH 1519/1565] hfsplus: fix uninit-value in copy_name [ Upstream commit 0570730c16307a72f8241df12363f76600baf57d ] [syzbot reported] BUG: KMSAN: uninit-value in sized_strscpy+0xc4/0x160 sized_strscpy+0xc4/0x160 copy_name+0x2af/0x320 fs/hfsplus/xattr.c:411 hfsplus_listxattr+0x11e9/0x1a50 fs/hfsplus/xattr.c:750 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline] __do_sys_listxattr fs/xattr.c:876 [inline] __se_sys_listxattr fs/xattr.c:873 [inline] __x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Uninit was created at: slab_post_alloc_hook mm/slub.c:3877 [inline] slab_alloc_node mm/slub.c:3918 [inline] kmalloc_trace+0x57b/0xbe0 mm/slub.c:4065 kmalloc include/linux/slab.h:628 [inline] hfsplus_listxattr+0x4cc/0x1a50 fs/hfsplus/xattr.c:699 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline] __do_sys_listxattr fs/xattr.c:876 [inline] __se_sys_listxattr fs/xattr.c:873 [inline] __x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [Fix] When allocating memory to strbuf, initialize memory to 0. Reported-and-tested-by: syzbot+efde959319469ff8d4d7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis Link: https://lore.kernel.org/r/tencent_8BBB6433BC9E1C1B7B4BDF1BF52574BA8808@qq.com Reported-and-tested-by: syzbot+01ade747b16e9c8030e0@syzkaller.appspotmail.com Signed-off-by: Christian Brauner Signed-off-by: Sasha Levin --- fs/hfsplus/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c index bb0b27d88e50..d91f76ef18d9 100644 --- a/fs/hfsplus/xattr.c +++ b/fs/hfsplus/xattr.c @@ -700,7 +700,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size) return err; } - strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + + strbuf = kzalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN + XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL); if (!strbuf) { res = -ENOMEM; From 727ed4810c8b8ef614c17c25a026850927b2de01 Mon Sep 17 00:00:00 2001 From: David Lechner Date: Mon, 8 Jul 2024 20:05:30 -0500 Subject: [PATCH 1520/1565] spi: mux: set ctlr->bits_per_word_mask [ Upstream commit c8bd922d924bb4ab6c6c488310157d1a27996f31 ] Like other SPI controller flags, bits_per_word_mask may be used by a peripheral driver, so it needs to reflect the capabilities of the underlying controller. Signed-off-by: David Lechner Link: https://patch.msgid.link/20240708-spi-mux-fix-v1-3-6c8845193128@baylibre.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin --- drivers/spi/spi-mux.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c index 9708b7827ff7..b18c5265e858 100644 --- a/drivers/spi/spi-mux.c +++ b/drivers/spi/spi-mux.c @@ -149,6 +149,7 @@ static int spi_mux_probe(struct spi_device *spi) /* supported modes are the same as our parent's */ ctlr->mode_bits = spi->controller->mode_bits; ctlr->flags = spi->controller->flags; + ctlr->bits_per_word_mask = spi->controller->bits_per_word_mask; ctlr->transfer_one_message = spi_mux_transfer_one_message; ctlr->setup = spi_mux_setup; ctlr->num_chipselect = mux_control_states(priv->mux); From c85e6b7d9ef83eadd24e0cb793f59167d7edbb3a Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 26 Sep 2023 17:09:03 +0100 Subject: [PATCH 1521/1565] ARM: 9324/1: fix get_user() broken with veneer commit 24d3ba0a7b44c1617c27f5045eecc4f34752ab03 upstream. The 32-bit ARM kernel stops working if the kernel grows to the point where veneers for __get_user_* are created. AAPCS32 [1] states, "Register r12 (IP) may be used by a linker as a scratch register between a routine and any subroutine it calls. It can also be used within a routine to hold intermediate values between subroutine calls." However, bl instructions buried within the inline asm are unpredictable for compilers; hence, "ip" must be added to the clobber list. This becomes critical when veneers for __get_user_* are created because veneers use the ip register since commit 02e541db0540 ("ARM: 8323/1: force linker to use PIC veneers"). [1]: https://github.com/ARM-software/abi-aa/blob/2023Q1/aapcs32/aapcs32.rst Signed-off-by: Masahiro Yamada Reviewed-by: Ard Biesheuvel Signed-off-by: Russell King (Oracle) Cc: John Stultz Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/uaccess.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index d9db752c51fe..effcf14e0d47 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -147,16 +147,6 @@ extern int __get_user_64t_1(void *); extern int __get_user_64t_2(void *); extern int __get_user_64t_4(void *); -#define __GUP_CLOBBER_1 "lr", "cc" -#ifdef CONFIG_CPU_USE_DOMAINS -#define __GUP_CLOBBER_2 "ip", "lr", "cc" -#else -#define __GUP_CLOBBER_2 "lr", "cc" -#endif -#define __GUP_CLOBBER_4 "lr", "cc" -#define __GUP_CLOBBER_32t_8 "lr", "cc" -#define __GUP_CLOBBER_8 "lr", "cc" - #define __get_user_x(__r2, __p, __e, __l, __s) \ __asm__ __volatile__ ( \ __asmeq("%0", "r0") __asmeq("%1", "r2") \ @@ -164,7 +154,7 @@ extern int __get_user_64t_4(void *); "bl __get_user_" #__s \ : "=&r" (__e), "=r" (__r2) \ : "0" (__p), "r" (__l) \ - : __GUP_CLOBBER_##__s) + : "ip", "lr", "cc") /* narrowing a double-word get into a single 32bit word register: */ #ifdef __ARMEB__ @@ -186,7 +176,7 @@ extern int __get_user_64t_4(void *); "bl __get_user_64t_" #__s \ : "=&r" (__e), "=r" (__r2) \ : "0" (__p), "r" (__l) \ - : __GUP_CLOBBER_##__s) + : "ip", "lr", "cc") #else #define __get_user_x_64t __get_user_x #endif From 9e2b0a5e252dc554e1437581b884b73b933008b3 Mon Sep 17 00:00:00 2001 From: Kuan-Wei Chiu Date: Tue, 2 Jul 2024 04:56:39 +0800 Subject: [PATCH 1522/1565] ACPI: processor_idle: Fix invalid comparison with insertion sort for latency commit 233323f9b9f828cd7cd5145ad811c1990b692542 upstream. The acpi_cst_latency_cmp() comparison function currently used for sorting C-state latencies does not satisfy transitivity, causing incorrect sorting results. Specifically, if there are two valid acpi_processor_cx elements A and B and one invalid element C, it may occur that A < B, A = C, and B = C. Sorting algorithms assume that if A < B and A = C, then C < B, leading to incorrect ordering. Given the small size of the array (<=8), we replace the library sort function with a simple insertion sort that properly ignores invalid elements and sorts valid ones based on latency. This change ensures correct ordering of the C-state latencies. Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered") Reported-by: Julian Sikorski Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com Signed-off-by: Kuan-Wei Chiu Tested-by: Julian Sikorski Cc: All applicable Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com Signed-off-by: Rafael J. Wysocki Signed-off-by: Kuan-Wei Chiu Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/processor_idle.c | 40 ++++++++++++++--------------------- 1 file changed, 16 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 3deeabb27394..ae07927910ca 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -16,7 +16,6 @@ #include #include #include /* need_resched() */ -#include #include #include #include @@ -390,28 +389,24 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr, return; } -static int acpi_cst_latency_cmp(const void *a, const void *b) +static void acpi_cst_latency_sort(struct acpi_processor_cx *states, size_t length) { - const struct acpi_processor_cx *x = a, *y = b; + int i, j, k; - if (!(x->valid && y->valid)) - return 0; - if (x->latency > y->latency) - return 1; - if (x->latency < y->latency) - return -1; - return 0; -} -static void acpi_cst_latency_swap(void *a, void *b, int n) -{ - struct acpi_processor_cx *x = a, *y = b; - u32 tmp; + for (i = 1; i < length; i++) { + if (!states[i].valid) + continue; - if (!(x->valid && y->valid)) - return; - tmp = x->latency; - x->latency = y->latency; - y->latency = tmp; + for (j = i - 1, k = i; j >= 0; j--) { + if (!states[j].valid) + continue; + + if (states[j].latency > states[k].latency) + swap(states[j].latency, states[k].latency); + + k = j; + } + } } static int acpi_processor_power_verify(struct acpi_processor *pr) @@ -456,10 +451,7 @@ static int acpi_processor_power_verify(struct acpi_processor *pr) if (buggy_latency) { pr_notice("FW issue: working around C-state latencies out of order\n"); - sort(&pr->power.states[1], max_cstate, - sizeof(struct acpi_processor_cx), - acpi_cst_latency_cmp, - acpi_cst_latency_swap); + acpi_cst_latency_sort(&pr->power.states[1], max_cstate); } lapic_timer_propagate_broadcast(pr); From be35504b959f2749bab280f4671e8df96dcf836f Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 21 Jun 2024 16:08:27 +0200 Subject: [PATCH 1523/1565] bpf: Fix overrunning reservations in ringbuf commit cfa1a2329a691ffd991fcf7248a57d752e712881 upstream. The BPF ring buffer internally is implemented as a power-of-2 sized circular buffer, with two logical and ever-increasing counters: consumer_pos is the consumer counter to show which logical position the consumer consumed the data, and producer_pos which is the producer counter denoting the amount of data reserved by all producers. Each time a record is reserved, the producer that "owns" the record will successfully advance producer counter. In user space each time a record is read, the consumer of the data advanced the consumer counter once it finished processing. Both counters are stored in separate pages so that from user space, the producer counter is read-only and the consumer counter is read-write. One aspect that simplifies and thus speeds up the implementation of both producers and consumers is how the data area is mapped twice contiguously back-to-back in the virtual memory, allowing to not take any special measures for samples that have to wrap around at the end of the circular buffer data area, because the next page after the last data page would be first data page again, and thus the sample will still appear completely contiguous in virtual memory. Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for book-keeping the length and offset, and is inaccessible to the BPF program. Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ` for the BPF program to use. Bing-Jhong and Muhammad reported that it is however possible to make a second allocated memory chunk overlapping with the first chunk and as a result, the BPF program is now able to edit first chunk's header. For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in [0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets allocate a chunk B with size 0x3000. This will succeed because consumer_pos was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask` check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This means that chunk B at [0x4000,0x4008] is chunk A's header. bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong page and could cause a crash. Fix it by calculating the oldest pending_pos and check whether the range from the oldest outstanding record to the newest would span beyond the ring buffer size. If that is the case, then reject the request. We've tested with the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh) before/after the fix and while it seems a bit slower on some benchmarks, it is still not significantly enough to matter. Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Bing-Jhong Billy Jheng Reported-by: Muhammad Ramdhan Co-developed-by: Bing-Jhong Billy Jheng Co-developed-by: Andrii Nakryiko Signed-off-by: Bing-Jhong Billy Jheng Signed-off-by: Andrii Nakryiko Signed-off-by: Daniel Borkmann Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net Signed-off-by: Dominique Martinet Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/ringbuf.c | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c index 1e4bf23528a3..eac0026e2fa6 100644 --- a/kernel/bpf/ringbuf.c +++ b/kernel/bpf/ringbuf.c @@ -41,9 +41,12 @@ struct bpf_ringbuf { * mapping consumer page as r/w, but restrict producer page to r/o. * This protects producer position from being modified by user-space * application and ruining in-kernel position tracking. + * Note that the pending counter is placed in the same + * page as the producer, so that it shares the same cache line. */ unsigned long consumer_pos __aligned(PAGE_SIZE); unsigned long producer_pos __aligned(PAGE_SIZE); + unsigned long pending_pos; char data[] __aligned(PAGE_SIZE); }; @@ -145,6 +148,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node) rb->mask = data_sz - 1; rb->consumer_pos = 0; rb->producer_pos = 0; + rb->pending_pos = 0; return rb; } @@ -323,9 +327,9 @@ bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr) static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size) { - unsigned long cons_pos, prod_pos, new_prod_pos, flags; - u32 len, pg_off; + unsigned long cons_pos, prod_pos, new_prod_pos, pend_pos, flags; struct bpf_ringbuf_hdr *hdr; + u32 len, pg_off, tmp_size, hdr_len; if (unlikely(size > RINGBUF_MAX_RECORD_SZ)) return NULL; @@ -343,13 +347,29 @@ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size) spin_lock_irqsave(&rb->spinlock, flags); } + pend_pos = rb->pending_pos; prod_pos = rb->producer_pos; new_prod_pos = prod_pos + len; - /* check for out of ringbuf space by ensuring producer position - * doesn't advance more than (ringbuf_size - 1) ahead + while (pend_pos < prod_pos) { + hdr = (void *)rb->data + (pend_pos & rb->mask); + hdr_len = READ_ONCE(hdr->len); + if (hdr_len & BPF_RINGBUF_BUSY_BIT) + break; + tmp_size = hdr_len & ~BPF_RINGBUF_DISCARD_BIT; + tmp_size = round_up(tmp_size + BPF_RINGBUF_HDR_SZ, 8); + pend_pos += tmp_size; + } + rb->pending_pos = pend_pos; + + /* check for out of ringbuf space: + * - by ensuring producer position doesn't advance more than + * (ringbuf_size - 1) ahead + * - by ensuring oldest not yet committed record until newest + * record does not span more than (ringbuf_size - 1) */ - if (new_prod_pos - cons_pos > rb->mask) { + if (new_prod_pos - cons_pos > rb->mask || + new_prod_pos - pend_pos > rb->mask) { spin_unlock_irqrestore(&rb->spinlock, flags); return NULL; } From c0809c128dad4c3413818384eb06a341633db973 Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Thu, 4 Apr 2024 10:10:01 +0800 Subject: [PATCH 1524/1565] bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue commit 6648e613226e18897231ab5e42ffc29e63fa3365 upstream. Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which syzbot reported [1]. [1] BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1: sk_psock_stop_verdict net/core/skmsg.c:1257 [inline] sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843 sk_psock_put include/linux/skmsg.h:459 [inline] sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648 unix_release+0x4b/0x80 net/unix/af_unix.c:1048 __sock_release net/socket.c:659 [inline] sock_close+0x68/0x150 net/socket.c:1421 __fput+0x2c1/0x660 fs/file_table.c:422 __fput_sync+0x44/0x60 fs/file_table.c:507 __do_sys_close fs/open.c:1556 [inline] __se_sys_close+0x101/0x1b0 fs/open.c:1541 __x64_sys_close+0x1f/0x30 fs/open.c:1541 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0: sk_psock_data_ready include/linux/skmsg.h:464 [inline] sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555 sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606 sk_psock_verdict_apply net/core/skmsg.c:1008 [inline] sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202 unix_read_skb net/unix/af_unix.c:2546 [inline] unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682 sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223 unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x140/0x180 net/socket.c:745 ____sys_sendmsg+0x312/0x410 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x1e9/0x280 net/socket.c:2667 __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674 do_syscall_64+0xd3/0x1d0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 value changed: 0xffffffff83d7feb0 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G W 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer similarly due to no protection of saved_data_ready. Here is another different caller causing the same issue because of the same reason. So we should protect it with sk_callback_lock read lock because the writer side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);". To avoid errors that could happen in future, I move those two pairs of lock into the sk_psock_data_ready(), which is suggested by John Fastabend. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com Signed-off-by: Jason Xing Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d Link: https://lore.kernel.org/all/20240329134037.92124-1-kerneljasonxing@gmail.com Link: https://lore.kernel.org/bpf/20240404021001.94815-1-kerneljasonxing@gmail.com Signed-off-by: Ashwin Dayanand Kamat Signed-off-by: Greg Kroah-Hartman --- include/linux/skmsg.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index e2af013ec05f..49db90cfe375 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -407,10 +407,12 @@ static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock) static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock) { + read_lock_bh(&sk->sk_callback_lock); if (psock->parser.enabled) psock->parser.saved_data_ready(sk); else sk->sk_data_ready(sk); + read_unlock_bh(&sk->sk_callback_lock); } static inline void psock_set_prog(struct bpf_prog **pprog, From 5ce8fad941233e81f2afb5b52a3fcddd3ba8732f Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 25 Aug 2022 17:26:34 -0700 Subject: [PATCH 1525/1565] scsi: core: Fix a use-after-free commit 8fe4ce5836e932f5766317cb651c1ff2a4cd0506 upstream. There are two .exit_cmd_priv implementations. Both implementations use resources associated with the SCSI host. Make sure that these resources are still available when .exit_cmd_priv is called by waiting inside scsi_remove_host() until the tag set has been freed. This commit fixes the following use-after-free: ================================================================== BUG: KASAN: use-after-free in srp_exit_cmd_priv+0x27/0xd0 [ib_srp] Read of size 8 at addr ffff888100337000 by task multipathd/16727 Call Trace: dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db kasan_report+0xab/0x120 srp_exit_cmd_priv+0x27/0xd0 [ib_srp] scsi_mq_exit_request+0x4d/0x70 blk_mq_free_rqs+0x143/0x410 __blk_mq_free_map_and_rqs+0x6e/0x100 blk_mq_free_tag_set+0x2b/0x160 scsi_host_dev_release+0xf3/0x1a0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_device_dev_release_usercontext+0x4c1/0x4e0 execute_in_process_context+0x23/0x90 device_release+0x54/0xe0 kobject_put+0xa5/0x120 scsi_disk_release+0x3f/0x50 device_release+0x54/0xe0 kobject_put+0xa5/0x120 disk_release+0x17f/0x1b0 device_release+0x54/0xe0 kobject_put+0xa5/0x120 dm_put_table_device+0xa3/0x160 [dm_mod] dm_put_device+0xd0/0x140 [dm_mod] free_priority_group+0xd8/0x110 [dm_multipath] free_multipath+0x94/0xe0 [dm_multipath] dm_table_destroy+0xa2/0x1e0 [dm_mod] __dm_destroy+0x196/0x350 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x2c2/0x590 [dm_mod] dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 dm_ctl_ioctl+0x5/0x10 [dm_mod] __x64_sys_ioctl+0xb4/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/r/20220826002635.919423-1-bvanassche@acm.org Fixes: 65ca846a5314 ("scsi: core: Introduce {init,exit}_cmd_priv()") Cc: Ming Lei Cc: Christoph Hellwig Cc: Mike Christie Cc: Hannes Reinecke Cc: John Garry Cc: Li Zhijian Reported-by: Li Zhijian Tested-by: Li Zhijian Signed-off-by: Bart Van Assche Signed-off-by: Martin K. Petersen [mheyne: fixed contextual conflicts: - drivers/scsi/hosts.c: due to missing commit 973dac8a8a14 ("scsi: core: Refine how we set tag_set NUMA node") - drivers/scsi/scsi_sysfs.c: due to missing commit 6f8191fdf41d ("block: simplify disk shutdown") - drivers/scsi/scsi_scan.c: due to missing commit 59506abe5e34 ("scsi: core: Inline scsi_mq_alloc_queue()")] Signed-off-by: Maximilian Heyne Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/hosts.c | 16 +++++++++++++--- drivers/scsi/scsi_lib.c | 6 +++++- drivers/scsi/scsi_priv.h | 2 +- drivers/scsi/scsi_scan.c | 1 + drivers/scsi/scsi_sysfs.c | 1 + include/scsi/scsi_host.h | 2 ++ 6 files changed, 23 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 297f2b412d07..17fa1cd91da6 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -182,6 +182,15 @@ void scsi_remove_host(struct Scsi_Host *shost) scsi_proc_host_rm(shost); scsi_proc_hostdir_rm(shost->hostt); + /* + * New SCSI devices cannot be attached anymore because of the SCSI host + * state so drop the tag set refcnt. Wait until the tag set refcnt drops + * to zero because .exit_cmd_priv implementations may need the host + * pointer. + */ + kref_put(&shost->tagset_refcnt, scsi_mq_free_tags); + wait_for_completion(&shost->tagset_freed); + spin_lock_irqsave(shost->host_lock, flags); if (scsi_host_set_state(shost, SHOST_DEL)) BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY)); @@ -240,6 +249,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, shost->dma_dev = dma_dev; + kref_init(&shost->tagset_refcnt); + init_completion(&shost->tagset_freed); + /* * Increase usage count temporarily here so that calling * scsi_autopm_put_host() will trigger runtime idle if there is @@ -312,6 +324,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_disable(&shost->shost_gendev); pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); + kref_put(&shost->tagset_refcnt, scsi_mq_free_tags); fail: return error; } @@ -344,9 +357,6 @@ static void scsi_host_dev_release(struct device *dev) kfree(dev_name(&shost->shost_dev)); } - if (shost->tag_set.tags) - scsi_mq_destroy_tags(shost); - kfree(shost->shost_data); ida_simple_remove(&host_index_ida, shost->host_no); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 14dec86ff749..64ae7bc2de60 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1923,9 +1923,13 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) return blk_mq_alloc_tag_set(tag_set); } -void scsi_mq_destroy_tags(struct Scsi_Host *shost) +void scsi_mq_free_tags(struct kref *kref) { + struct Scsi_Host *shost = container_of(kref, typeof(*shost), + tagset_refcnt); + blk_mq_free_tag_set(&shost->tag_set); + complete(&shost->tagset_freed); } /** diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index 1183dbed687c..3fee9af05925 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -93,7 +93,7 @@ extern void scsi_requeue_run_queue(struct work_struct *work); extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev); extern void scsi_start_queue(struct scsi_device *sdev); extern int scsi_mq_setup_tags(struct Scsi_Host *shost); -extern void scsi_mq_destroy_tags(struct Scsi_Host *shost); +extern void scsi_mq_free_tags(struct kref *kref); extern void scsi_exit_queue(void); extern void scsi_evt_thread(struct work_struct *work); struct request_queue; diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 6f7c4d41c51d..e8703b043805 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -273,6 +273,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, kfree(sdev); goto out; } + kref_get(&sdev->host->tagset_refcnt); WARN_ON_ONCE(!blk_get_queue(sdev->request_queue)); sdev->request_queue->queuedata = sdev; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 6cc4d0792e3d..bb37d698da1a 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1481,6 +1481,7 @@ void __scsi_remove_device(struct scsi_device *sdev) mutex_unlock(&sdev->state_mutex); blk_cleanup_queue(sdev->request_queue); + kref_put(&sdev->host->tagset_refcnt, scsi_mq_free_tags); cancel_work_sync(&sdev->requeue_work); if (sdev->host->hostt->slave_destroy) diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 4a9f1e6e3aac..3979e946c3fd 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -548,6 +548,8 @@ struct Scsi_Host { struct scsi_host_template *hostt; struct scsi_transport_template *transportt; + struct kref tagset_refcnt; + struct completion tagset_freed; /* Area to keep a shared tag map */ struct blk_mq_tag_set tag_set; From 88e44424a62f68554512e0390ca4fe8c0d17d130 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Tue, 26 Oct 2021 14:33:02 -0300 Subject: [PATCH 1526/1565] ext4: fix error code saved on super block during file system abort commit 124e7c61deb27d758df5ec0521c36cf08d417f7a upstream. ext4_abort will eventually call ext4_errno_to_code, which translates the errno to an EXT4_ERR specific error. This means that ext4_abort expects an errno. By using EXT4_ERR_ here, it gets misinterpreted (as an errno), and ends up saving EXT4_ERR_EBUSY on the superblock during an abort, which makes no sense. ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that instead. Signed-off-by: Gabriel Krisman Bertazi Link: https://lore.kernel.org/r/20211026173302.84000-1-krisman@collabora.com Signed-off-by: Theodore Ts'o Signed-off-by: Ajay Kaher Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0149d3c2cfd7..18a493fc6a3d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5887,7 +5887,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data) } if (ext4_test_mount_flag(sb, EXT4_MF_FS_ABORTED)) - ext4_abort(sb, EXT4_ERR_ESHUTDOWN, "Abort forced by user"); + ext4_abort(sb, ESHUTDOWN, "Abort forced by user"); sb->s_flags = (sb->s_flags & ~SB_POSIXACL) | (test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0); From 9760c6ceb2a92c8b5e9ecd9d12c86469d9217f58 Mon Sep 17 00:00:00 2001 From: Gabriel Krisman Bertazi Date: Mon, 25 Oct 2021 16:27:44 -0300 Subject: [PATCH 1527/1565] ext4: Send notifications on error commit 9a089b21f79b47eed240d4da7ea0d049de7c9b4d upstream. Send a FS_ERROR message via fsnotify to a userspace monitoring tool whenever a ext4 error condition is triggered. This follows the existing error conditions in ext4, so it is hooked to the ext4_error* functions. Link: https://lore.kernel.org/r/20211025192746.66445-30-krisman@collabora.com Signed-off-by: Gabriel Krisman Bertazi Acked-by: Theodore Ts'o Reviewed-by: Amir Goldstein Reviewed-by: Jan Kara Signed-off-by: Jan Kara [Ajay: - Modified to apply on v5.10.y - Added fsnotify for __ext4_abort()] Signed-off-by: Ajay Kaher Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 18a493fc6a3d..02236f298de9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -46,6 +46,7 @@ #include #include #include +#include #include "ext4.h" #include "ext4_extents.h" /* Needed for trace points definition */ @@ -699,6 +700,7 @@ void __ext4_error(struct super_block *sb, const char *function, sb->s_id, function, line, current->comm, &vaf); va_end(args); } + fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED); save_error_info(sb, error, 0, block, function, line); ext4_handle_error(sb); } @@ -730,6 +732,7 @@ void __ext4_error_inode(struct inode *inode, const char *function, current->comm, &vaf); va_end(args); } + fsnotify_sb_error(inode->i_sb, inode, error ? error : EFSCORRUPTED); save_error_info(inode->i_sb, error, inode->i_ino, block, function, line); ext4_handle_error(inode->i_sb); @@ -769,6 +772,7 @@ void __ext4_error_file(struct file *file, const char *function, current->comm, path, &vaf); va_end(args); } + fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED); save_error_info(inode->i_sb, EFSCORRUPTED, inode->i_ino, block, function, line); ext4_handle_error(inode->i_sb); @@ -837,7 +841,7 @@ void __ext4_std_error(struct super_block *sb, const char *function, printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n", sb->s_id, function, line, errstr); } - + fsnotify_sb_error(sb, NULL, errno ? errno : EFSCORRUPTED); save_error_info(sb, -errno, 0, 0, function, line); ext4_handle_error(sb); } @@ -861,6 +865,7 @@ void __ext4_abort(struct super_block *sb, const char *function, if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) return; + fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED); save_error_info(sb, error, 0, 0, function, line); va_start(args, fmt); vaf.fmt = fmt; From a5224e2123ce21102f346f518db80f004d5053a7 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sun, 28 Apr 2024 15:57:00 +0300 Subject: [PATCH 1528/1565] drm/amdgpu: Fix signedness bug in sdma_v4_0_process_trap_irq() commit 6769a23697f17f9bf9365ca8ed62fe37e361a05a upstream. The "instance" variable needs to be signed for the error handling to work. Fixes: 8b2faf1a4f3b ("drm/amdgpu: add error handle to avoid out-of-bounds") Reviewed-by: Bob Zhou Signed-off-by: Dan Carpenter Signed-off-by: Alex Deucher Cc: Siddh Raman Pant Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index c54834ed8751..1bc3167d3a1c 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -2069,7 +2069,7 @@ static int sdma_v4_0_process_trap_irq(struct amdgpu_device *adev, struct amdgpu_irq_src *source, struct amdgpu_iv_entry *entry) { - uint32_t instance; + int instance; DRM_DEBUG("IH: SDMA trap\n"); instance = sdma_v4_0_irq_id_to_seq(entry->client_id); From 6e03006548c66b979f4e5e9fc797aac4dad82822 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 21 May 2024 16:01:00 +0200 Subject: [PATCH 1529/1565] net: relax socket state check at accept time. commit 26afda78cda3da974fd4c287962c169e9462c495 upstream. Christoph reported the following splat: WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0 Modules linked in: CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759 Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80 RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293 RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64 R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000 R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800 FS: 000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786 do_accept+0x435/0x620 net/socket.c:1929 __sys_accept4_file net/socket.c:1969 [inline] __sys_accept4+0x9b/0x110 net/socket.c:1999 __do_sys_accept net/socket.c:2016 [inline] __se_sys_accept net/socket.c:2013 [inline] __x64_sys_accept+0x7d/0x90 net/socket.c:2013 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x4315f9 Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300 R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055 The reproducer invokes shutdown() before entering the listener status. After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets"), the above causes the child to reach the accept syscall in FIN_WAIT1 status. Eric noted we can relax the existing assertion in __inet_accept() Reported-by: Christoph Paasch Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490 Suggested-by: Eric Dumazet Fixes: 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets") Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/23ab880a44d8cfd967e84de8b93dbf48848e3d8c.1716299669.git.pabeni@redhat.com Signed-off-by: Paolo Abeni Signed-off-by: Nikolay Kuratov Signed-off-by: Greg Kroah-Hartman --- net/ipv4/af_inet.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 475a19db3713..ce42626663de 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -758,7 +758,9 @@ int inet_accept(struct socket *sock, struct socket *newsock, int flags, sock_rps_record_flow(sk2); WARN_ON(!((1 << sk2->sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_RECV | - TCPF_CLOSE_WAIT | TCPF_CLOSE))); + TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | + TCPF_CLOSING | TCPF_CLOSE_WAIT | + TCPF_CLOSE))); sock_graft(sk2, newsock); From 77495e5da5cb110a8fed27b052c77853fe282176 Mon Sep 17 00:00:00 2001 From: lei lu Date: Wed, 26 Jun 2024 18:44:33 +0800 Subject: [PATCH 1530/1565] ocfs2: add bounds checking to ocfs2_check_dir_entry() commit 255547c6bb8940a97eea94ef9d464ea5967763fb upstream. This adds sanity checks for ocfs2_dir_entry to make sure all members of ocfs2_dir_entry don't stray beyond valid memory region. Link: https://lkml.kernel.org/r/20240626104433.163270-1-llfamsec@gmail.com Signed-off-by: lei lu Reviewed-by: Heming Zhao Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- fs/ocfs2/dir.c | 46 +++++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c index bdfba9db558a..4cc29b808d18 100644 --- a/fs/ocfs2/dir.c +++ b/fs/ocfs2/dir.c @@ -296,13 +296,16 @@ static void ocfs2_dx_dir_name_hash(struct inode *dir, const char *name, int len, * bh passed here can be an inode block or a dir data block, depending * on the inode inline data flag. */ -static int ocfs2_check_dir_entry(struct inode * dir, - struct ocfs2_dir_entry * de, - struct buffer_head * bh, +static int ocfs2_check_dir_entry(struct inode *dir, + struct ocfs2_dir_entry *de, + struct buffer_head *bh, + char *buf, + unsigned int size, unsigned long offset) { const char *error_msg = NULL; const int rlen = le16_to_cpu(de->rec_len); + const unsigned long next_offset = ((char *) de - buf) + rlen; if (unlikely(rlen < OCFS2_DIR_REC_LEN(1))) error_msg = "rec_len is smaller than minimal"; @@ -310,9 +313,11 @@ static int ocfs2_check_dir_entry(struct inode * dir, error_msg = "rec_len % 4 != 0"; else if (unlikely(rlen < OCFS2_DIR_REC_LEN(de->name_len))) error_msg = "rec_len is too small for name_len"; - else if (unlikely( - ((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)) - error_msg = "directory entry across blocks"; + else if (unlikely(next_offset > size)) + error_msg = "directory entry overrun"; + else if (unlikely(next_offset > size - OCFS2_DIR_REC_LEN(1)) && + next_offset != size) + error_msg = "directory entry too close to end"; if (unlikely(error_msg != NULL)) mlog(ML_ERROR, "bad entry in directory #%llu: %s - " @@ -354,16 +359,17 @@ static inline int ocfs2_search_dirblock(struct buffer_head *bh, de_buf = first_de; dlimit = de_buf + bytes; - while (de_buf < dlimit) { + while (de_buf < dlimit - OCFS2_DIR_MEMBER_LEN) { /* this code is executed quadratically often */ /* do minimal checking `by hand' */ de = (struct ocfs2_dir_entry *) de_buf; - if (de_buf + namelen <= dlimit && + if (de->name + namelen <= dlimit && ocfs2_match(namelen, name, de)) { /* found a match - just to be sure, do a full check */ - if (!ocfs2_check_dir_entry(dir, de, bh, offset)) { + if (!ocfs2_check_dir_entry(dir, de, bh, first_de, + bytes, offset)) { ret = -1; goto bail; } @@ -1140,7 +1146,7 @@ static int __ocfs2_delete_entry(handle_t *handle, struct inode *dir, pde = NULL; de = (struct ocfs2_dir_entry *) first_de; while (i < bytes) { - if (!ocfs2_check_dir_entry(dir, de, bh, i)) { + if (!ocfs2_check_dir_entry(dir, de, bh, first_de, bytes, i)) { status = -EIO; mlog_errno(status); goto bail; @@ -1640,7 +1646,8 @@ int __ocfs2_add_entry(handle_t *handle, /* These checks should've already been passed by the * prepare function, but I guess we can leave them * here anyway. */ - if (!ocfs2_check_dir_entry(dir, de, insert_bh, offset)) { + if (!ocfs2_check_dir_entry(dir, de, insert_bh, data_start, + size, offset)) { retval = -ENOENT; goto bail; } @@ -1778,7 +1785,8 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode, } de = (struct ocfs2_dir_entry *) (data->id_data + ctx->pos); - if (!ocfs2_check_dir_entry(inode, de, di_bh, ctx->pos)) { + if (!ocfs2_check_dir_entry(inode, de, di_bh, (char *)data->id_data, + i_size_read(inode), ctx->pos)) { /* On error, skip the f_pos to the end. */ ctx->pos = i_size_read(inode); break; @@ -1871,7 +1879,8 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode, while (ctx->pos < i_size_read(inode) && offset < sb->s_blocksize) { de = (struct ocfs2_dir_entry *) (bh->b_data + offset); - if (!ocfs2_check_dir_entry(inode, de, bh, offset)) { + if (!ocfs2_check_dir_entry(inode, de, bh, bh->b_data, + sb->s_blocksize, offset)) { /* On error, skip the f_pos to the next block. */ ctx->pos = (ctx->pos | (sb->s_blocksize - 1)) + 1; @@ -3343,7 +3352,7 @@ static int ocfs2_find_dir_space_id(struct inode *dir, struct buffer_head *di_bh, struct super_block *sb = dir->i_sb; struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data; struct ocfs2_dir_entry *de, *last_de = NULL; - char *de_buf, *limit; + char *first_de, *de_buf, *limit; unsigned long offset = 0; unsigned int rec_len, new_rec_len, free_space = dir->i_sb->s_blocksize; @@ -3356,14 +3365,16 @@ static int ocfs2_find_dir_space_id(struct inode *dir, struct buffer_head *di_bh, else free_space = dir->i_sb->s_blocksize - i_size_read(dir); - de_buf = di->id2.i_data.id_data; + first_de = di->id2.i_data.id_data; + de_buf = first_de; limit = de_buf + i_size_read(dir); rec_len = OCFS2_DIR_REC_LEN(namelen); while (de_buf < limit) { de = (struct ocfs2_dir_entry *)de_buf; - if (!ocfs2_check_dir_entry(dir, de, di_bh, offset)) { + if (!ocfs2_check_dir_entry(dir, de, di_bh, first_de, + i_size_read(dir), offset)) { ret = -ENOENT; goto out; } @@ -3445,7 +3456,8 @@ static int ocfs2_find_dir_space_el(struct inode *dir, const char *name, /* move to next block */ de = (struct ocfs2_dir_entry *) bh->b_data; } - if (!ocfs2_check_dir_entry(dir, de, bh, offset)) { + if (!ocfs2_check_dir_entry(dir, de, bh, bh->b_data, blocksize, + offset)) { status = -ENOENT; goto bail; } From 6386f1b6a10e5d1ddd03db4ff6dfc55d488852ce Mon Sep 17 00:00:00 2001 From: lei lu Date: Wed, 29 May 2024 02:30:40 +0800 Subject: [PATCH 1531/1565] jfs: don't walk off the end of ealist commit d0fa70aca54c8643248e89061da23752506ec0d4 upstream. Add a check before visiting the members of ea to make sure each ea stays within the ealist. Signed-off-by: lei lu Signed-off-by: Dave Kleikamp Signed-off-by: Greg Kroah-Hartman --- fs/jfs/xattr.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/jfs/xattr.c b/fs/jfs/xattr.c index 7ae54f78a5b0..aea5531559c0 100644 --- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -797,7 +797,7 @@ ssize_t __jfs_getxattr(struct inode *inode, const char *name, void *data, size_t buf_size) { struct jfs_ea_list *ealist; - struct jfs_ea *ea; + struct jfs_ea *ea, *ealist_end; struct ea_buffer ea_buf; int xattr_size; ssize_t size; @@ -817,9 +817,16 @@ ssize_t __jfs_getxattr(struct inode *inode, const char *name, void *data, goto not_found; ealist = (struct jfs_ea_list *) ea_buf.xattr; + ealist_end = END_EALIST(ealist); /* Find the named attribute */ - for (ea = FIRST_EA(ealist); ea < END_EALIST(ealist); ea = NEXT_EA(ea)) + for (ea = FIRST_EA(ealist); ea < ealist_end; ea = NEXT_EA(ea)) { + if (unlikely(ea + 1 > ealist_end) || + unlikely(NEXT_EA(ea) > ealist_end)) { + size = -EUCLEAN; + goto release; + } + if ((namelen == ea->namelen) && memcmp(name, ea->name, namelen) == 0) { /* Found it */ @@ -834,6 +841,7 @@ ssize_t __jfs_getxattr(struct inode *inode, const char *name, void *data, memcpy(data, value, size); goto release; } + } not_found: size = -ENODATA; release: @@ -861,7 +869,7 @@ ssize_t jfs_listxattr(struct dentry * dentry, char *data, size_t buf_size) ssize_t size = 0; int xattr_size; struct jfs_ea_list *ealist; - struct jfs_ea *ea; + struct jfs_ea *ea, *ealist_end; struct ea_buffer ea_buf; down_read(&JFS_IP(inode)->xattr_sem); @@ -876,9 +884,16 @@ ssize_t jfs_listxattr(struct dentry * dentry, char *data, size_t buf_size) goto release; ealist = (struct jfs_ea_list *) ea_buf.xattr; + ealist_end = END_EALIST(ealist); /* compute required size of list */ - for (ea = FIRST_EA(ealist); ea < END_EALIST(ealist); ea = NEXT_EA(ea)) { + for (ea = FIRST_EA(ealist); ea < ealist_end; ea = NEXT_EA(ea)) { + if (unlikely(ea + 1 > ealist_end) || + unlikely(NEXT_EA(ea) > ealist_end)) { + size = -EUCLEAN; + goto release; + } + if (can_list(ea)) size += name_size(ea) + 1; } From 74c6b151a85e84901a7408885b9d0e57bc215512 Mon Sep 17 00:00:00 2001 From: Edson Juliano Drosdeck Date: Fri, 12 Jul 2024 15:06:42 -0300 Subject: [PATCH 1532/1565] ALSA: hda/realtek: Enable headset mic on Positivo SU C1400 commit 8fc1e8b230771442133d5cf5fa4313277aa2bb8b upstream. Positivo SU C1400 is equipped with ALC256, and it needs ALC269_FIXUP_ASPIRE_HEADSET_MIC quirk to make its headset mic work. Signed-off-by: Edson Juliano Drosdeck Cc: Link: https://patch.msgid.link/20240712180642.22564-1-edson.drosdeck@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 5cc158c56d43..07c092737695 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9195,6 +9195,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x10cf, 0x1845, "Lifebook U904", ALC269_FIXUP_LIFEBOOK_EXTMIC), SND_PCI_QUIRK(0x10ec, 0x10f2, "Intel Reference board", ALC700_FIXUP_INTEL_REFERENCE), SND_PCI_QUIRK(0x10ec, 0x118c, "Medion EE4254 MD62100", ALC256_FIXUP_MEDION_HEADSET_NO_PRESENCE), + SND_PCI_QUIRK(0x10ec, 0x119e, "Positivo SU C1400", ALC269_FIXUP_ASPIRE_HEADSET_MIC), SND_PCI_QUIRK(0x10ec, 0x11bc, "VAIO VJFE-IL", ALC269_FIXUP_LIMIT_INT_MIC_BOOST), SND_PCI_QUIRK(0x10ec, 0x1230, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), SND_PCI_QUIRK(0x10ec, 0x124c, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), From a7ec8a5a7ff277b1fbd8f6e23095a7c0447ab72d Mon Sep 17 00:00:00 2001 From: Seunghun Han Date: Thu, 18 Jul 2024 17:09:08 +0900 Subject: [PATCH 1533/1565] ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360 commit d7063c08738573fc2f3296da6d31a22fa8aa843a upstream. Samsung Galaxy Book Pro 360 (13" 2022 NT935QDB-KC71S) with codec SSID 144d:c1a4 requires the same workaround to enable the speaker amp as other Samsung models with the ALC298 codec. Signed-off-by: Seunghun Han Cc: Link: https://patch.msgid.link/20240718080908.8677-1-kkamagui@gmail.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 07c092737695..28dbe8cbbffd 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9208,6 +9208,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x144d, 0xc189, "Samsung Galaxy Flex Book (NT950QCG-X716)", ALC298_FIXUP_SAMSUNG_AMP), SND_PCI_QUIRK(0x144d, 0xc18a, "Samsung Galaxy Book Ion (NP930XCJ-K01US)", ALC298_FIXUP_SAMSUNG_AMP), SND_PCI_QUIRK(0x144d, 0xc1a3, "Samsung Galaxy Book Pro (NP935XDB-KC1SE)", ALC298_FIXUP_SAMSUNG_AMP), + SND_PCI_QUIRK(0x144d, 0xc1a4, "Samsung Galaxy Book Pro 360 (NT935QBD)", ALC298_FIXUP_SAMSUNG_AMP), SND_PCI_QUIRK(0x144d, 0xc1a6, "Samsung Galaxy Book Pro 360 (NP930QBD)", ALC298_FIXUP_SAMSUNG_AMP), SND_PCI_QUIRK(0x144d, 0xc740, "Samsung Ativ book 8 (NP870Z5G)", ALC269_FIXUP_ATIV_BOOK_8), SND_PCI_QUIRK(0x144d, 0xc812, "Samsung Notebook Pen S (NT950SBE-X58)", ALC298_FIXUP_SAMSUNG_AMP), From ddf0caf01295e81602e3e81ed51cd04095b5a370 Mon Sep 17 00:00:00 2001 From: Krishna Kurapati Date: Thu, 4 Jul 2024 20:58:47 +0530 Subject: [PATCH 1534/1565] arm64: dts: qcom: msm8996: Disable SS instance in Parkmode for USB commit 44ea1ae3cf95db97e10d6ce17527948121f1dd4b upstream. For Gen-1 targets like MSM8996, it is seen that stressing out the controller in host mode results in HC died error: xhci-hcd.12.auto: xHCI host not responding to stop endpoint command xhci-hcd.12.auto: xHCI host controller not responding, assume dead xhci-hcd.12.auto: HC died; cleaning up And at this instant only restarting the host mode fixes it. Disable SuperSpeed instance in park mode for MSM8996 to mitigate this issue. Cc: stable@vger.kernel.org Fixes: 1e39255ed29d ("arm64: dts: msm8996: Add device node for qcom,dwc3") Signed-off-by: Krishna Kurapati Reviewed-by: Konrad Dybcio Link: https://lore.kernel.org/r/20240704152848.3380602-8-quic_kriskura@quicinc.com Signed-off-by: Bjorn Andersson Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/qcom/msm8996.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index d766f3b5c03e..e990e727cc0f 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -1796,6 +1796,7 @@ dwc3@6a00000 { snps,dis_u2_susphy_quirk; snps,dis_enblslpm_quirk; snps,is-utmi-l1-suspend; + snps,parkmode-disable-ss-quirk; tx-fifo-resize; }; }; From 7fa9d1d2524ccbec10d24318a9f84b62067950ec Mon Sep 17 00:00:00 2001 From: Shengjiu Wang Date: Wed, 17 Jul 2024 14:44:53 +0800 Subject: [PATCH 1535/1565] ALSA: pcm_dmaengine: Don't synchronize DMA channel when DMA is paused commit 88e98af9f4b5b0d60c1fe7f7f2701b5467691e75 upstream. When suspended, the DMA channel may enter PAUSE state if dmaengine_pause() is supported by DMA. At this state, dmaengine_synchronize() should not be called, otherwise the DMA channel can't be resumed successfully. Fixes: e8343410ddf0 ("ALSA: dmaengine: Synchronize dma channel after drop()") Signed-off-by: Shengjiu Wang Cc: Link: https://patch.msgid.link/1721198693-27636-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/pcm_dmaengine.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sound/core/pcm_dmaengine.c b/sound/core/pcm_dmaengine.c index a7e2e6955e51..c46c53656393 100644 --- a/sound/core/pcm_dmaengine.c +++ b/sound/core/pcm_dmaengine.c @@ -345,8 +345,12 @@ EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_open_request_chan); int snd_dmaengine_pcm_sync_stop(struct snd_pcm_substream *substream) { struct dmaengine_pcm_runtime_data *prtd = substream_to_prtd(substream); + struct dma_tx_state state; + enum dma_status status; - dmaengine_synchronize(prtd->dma_chan); + status = dmaengine_tx_status(prtd->dma_chan, prtd->cookie, &state); + if (status != DMA_PAUSED) + dmaengine_synchronize(prtd->dma_chan); return 0; } From 911cc83e56a2de5a40758766c6a70d6998248860 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Tue, 23 Jul 2024 17:03:56 +0200 Subject: [PATCH 1536/1565] filelock: Fix fcntl/close race recovery compat path commit f8138f2ad2f745b9a1c696a05b749eabe44337ea upstream. When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner Signed-off-by: Greg Kroah-Hartman --- fs/locks.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 8a6fcdb6e886..ed8b3e318f97 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2719,8 +2719,9 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, error = do_lock_file_wait(filp, cmd, file_lock); /* - * Attempt to detect a close/fcntl race and recover by releasing the - * lock that was just acquired. There is no need to do that when we're + * Detect close/fcntl races and recover by zapping all POSIX locks + * associated with this file and our files_struct, just like on + * filp_flush(). There is no need to do that when we're * unlocking though, or for OFD locks. */ if (!error && file_lock->fl_type != F_UNLCK && @@ -2735,9 +2736,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, f = files_lookup_fd_locked(files, fd); spin_unlock(&files->file_lock); if (f != filp) { - file_lock->fl_type = F_UNLCK; - error = do_lock_file_wait(filp, cmd, file_lock); - WARN_ON_ONCE(error); + locks_remove_posix(filp, files); error = -EBADF; } } From 6100e0237204890269e3f934acfc50d35fd6f319 Mon Sep 17 00:00:00 2001 From: Dongli Zhang Date: Wed, 24 Jul 2024 10:04:52 -0700 Subject: [PATCH 1537/1565] tun: add missing verification for short frame commit 049584807f1d797fc3078b68035450a9769eb5c3 upstream. The cited commit missed to check against the validity of the frame length in the tun_xdp_one() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tun_xdp_one-->eth_type_trans() may access the Ethernet header although it can be less than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tun_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted for IFF_TAP. This is to drop any frame shorter than the Ethernet header size just like how tun_get_user() does. CVE: CVE-2024-41091 Inspired-by: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 043d222f93ab ("tuntap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang Reviewed-by: Si-Wei Liu Reviewed-by: Willem de Bruijn Reviewed-by: Paolo Abeni Reviewed-by: Jason Wang Link: https://patch.msgid.link/20240724170452.16837-3-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/tun.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 77e63e7366e7..c34c6f0d23ef 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2472,6 +2472,9 @@ static int tun_xdp_one(struct tun_struct *tun, bool skb_xdp = false; struct page *page; + if (unlikely(datasize < ETH_HLEN)) + return -EINVAL; + xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { if (gso->gso_type) { From 7431144b406ae82807eb87d8c98e518475b0450f Mon Sep 17 00:00:00 2001 From: Si-Wei Liu Date: Wed, 24 Jul 2024 10:04:51 -0700 Subject: [PATCH 1538/1565] tap: add missing verification for short frame commit ed7f2afdd0e043a397677e597ced0830b83ba0b3 upstream. The cited commit missed to check against the validity of the frame length in the tap_get_user_xdp() path, which could cause a corrupted skb to be sent downstack. Even before the skb is transmitted, the tap_get_user_xdp()-->skb_set_network_header() may assume the size is more than ETH_HLEN. Once transmitted, this could either cause out-of-bound access beyond the actual length, or confuse the underlayer with incorrect or inconsistent header length in the skb metadata. In the alternative path, tap_get_user() already prohibits short frame which has the length less than Ethernet header size from being transmitted. This is to drop any frame shorter than the Ethernet header size just like how tap_get_user() does. CVE: CVE-2024-41090 Link: https://lore.kernel.org/netdev/1717026141-25716-1-git-send-email-si-wei.liu@oracle.com/ Fixes: 0efac27791ee ("tap: accept an array of XDP buffs through sendmsg()") Cc: stable@vger.kernel.org Signed-off-by: Si-Wei Liu Signed-off-by: Dongli Zhang Reviewed-by: Willem de Bruijn Reviewed-by: Paolo Abeni Reviewed-by: Jason Wang Link: https://patch.msgid.link/20240724170452.16837-2-dongli.zhang@oracle.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/tap.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 41ee56015a45..16fa0e3e752a 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1141,6 +1141,11 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp) struct sk_buff *skb; int err, depth; + if (unlikely(xdp->data_end - xdp->data < ETH_HLEN)) { + err = -EINVAL; + goto err; + } + if (q->flags & IFF_VNET_HDR) vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz); From b15dc4170c6317731c00fa8fe7d4d99a0a04c83c Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sat, 27 Jul 2024 10:40:24 +0200 Subject: [PATCH 1539/1565] Linux 5.10.223 Link: https://lore.kernel.org/r/20240725142733.262322603@linuxfoundation.org Tested-by: ChromeOS CQ Test Tested-by: Pavel Machek (CIP) Tested-by: kernelci.org bot Tested-by: Mark Brown Tested-by: Jon Hunter Tested-by: Florian Fainelli Tested-by: Linux Kernel Functional Testing Signed-off-by: Greg Kroah-Hartman --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 5b6dae61250c..91c79c64f5a3 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 5 PATCHLEVEL = 10 -SUBLEVEL = 222 +SUBLEVEL = 223 EXTRAVERSION = NAME = Dare mighty things From eeb46257ab7638a9d0e2f6d04963b374f238dfdf Mon Sep 17 00:00:00 2001 From: liang zhang Date: Wed, 7 Aug 2024 13:57:31 +0800 Subject: [PATCH 1540/1565] ANDROID: GKI: Update symbols to symbol list Leaf changes summary: 6 artifacts changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 6 Added functions Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 0 Added variable 6 Added functions: [A] 'function int fsnotify_add_mark(fsnotify_mark*, fsnotify_connp_t*, unsigned int, int, __kernel_fsid_t*)' [A] 'function fsnotify_group* fsnotify_alloc_group(const fsnotify_ops*)' [A] 'function void fsnotify_destroy_mark(fsnotify_mark*, fsnotify_group*)' [A] 'function void fsnotify_init_mark(fsnotify_mark*, fsnotify_group*)' [A] 'function void fsnotify_put_group(fsnotify_group*)' [A] 'function void fsnotify_put_mark(fsnotify_mark*)' Bug: 357980118 Change-Id: I9c0c1a01cca0994c8e9273cea039df6547f58895 Signed-off-by: liang zhang --- android/abi_gki_aarch64.xml | 356 ++++++++++++++++++++++++++++-- android/abi_gki_aarch64_transsion | 6 + 2 files changed, 344 insertions(+), 18 deletions(-) diff --git a/android/abi_gki_aarch64.xml b/android/abi_gki_aarch64.xml index 2e96b987c556..c6bcef1a55d5 100644 --- a/android/abi_gki_aarch64.xml +++ b/android/abi_gki_aarch64.xml @@ -2766,6 +2766,12 @@ + + + + + + @@ -8069,6 +8075,10 @@ + + + + @@ -9749,6 +9759,7 @@ + @@ -11874,6 +11885,17 @@ + + + + + + + + + + + @@ -15606,6 +15628,89 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -23343,6 +23448,10 @@ + + + + @@ -27016,6 +27125,11 @@ + + + + + @@ -27844,6 +27958,7 @@ + @@ -30338,6 +30453,59 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -30937,6 +31105,7 @@ + @@ -32921,6 +33090,7 @@ + @@ -34116,6 +34286,9 @@ + + + @@ -38461,6 +38634,7 @@ + @@ -43202,6 +43376,17 @@ + + + + + + + + + + + @@ -43970,6 +44155,26 @@ + + + + + + + + + + + + + + + + + + + + @@ -50673,6 +50878,7 @@ + @@ -58251,6 +58457,15 @@ + + + + + + + + + @@ -68001,6 +68216,7 @@ + @@ -68959,7 +69175,14 @@ - + + + + + + + + @@ -69760,6 +69983,7 @@ + @@ -76690,6 +76914,7 @@ + @@ -78368,6 +78593,10 @@ + + + + @@ -82897,6 +83126,35 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -87063,6 +87321,17 @@ + + + + + + + + + + + @@ -89331,6 +89600,7 @@ + @@ -93131,6 +93401,7 @@ + @@ -94000,6 +94271,14 @@ + + + + + + + + @@ -94299,6 +94578,7 @@ + @@ -97310,6 +97590,7 @@ + @@ -104022,6 +104303,7 @@ + @@ -107561,6 +107843,14 @@ + + + + + + + + @@ -118684,9 +118974,9 @@ - - - + + + @@ -118695,10 +118985,10 @@ - - - - + + + + @@ -132837,6 +133127,36 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -132882,8 +133202,8 @@ - - + + @@ -147434,8 +147754,8 @@ - - + + @@ -147452,8 +147772,8 @@ - - + + @@ -147482,8 +147802,8 @@ - - + + @@ -147568,10 +147888,10 @@ - + - + diff --git a/android/abi_gki_aarch64_transsion b/android/abi_gki_aarch64_transsion index 3870b27f06fd..d8f080416a63 100644 --- a/android/abi_gki_aarch64_transsion +++ b/android/abi_gki_aarch64_transsion @@ -134,3 +134,9 @@ __tracepoint_android_vh_blk_mq_free_tags __tracepoint_android_vh_blk_mq_sched_insert_request zero_pfn + fsnotify_add_mark + fsnotify_alloc_group + fsnotify_destroy_mark + fsnotify_init_mark + fsnotify_put_group + fsnotify_put_mark From 306e16d49cf6da40d0bf62a45302f8b31b5c4bc4 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 11 Aug 2024 14:28:51 +0000 Subject: [PATCH 1541/1565] Revert "net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD()" This reverts commit d1e4e94cb8ab5154ea0ee3a6f8bed08b8d398fb2 which is commit b8ec0dc3845f6c9089573cb5c2c4b05f7fc10728 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I08cf60f05f5db95e255ce111e9556b0671b0cc09 Signed-off-by: Greg Kroah-Hartman --- net/mac802154/tx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c index 7cea95d0b78f..c829e4a75325 100644 --- a/net/mac802154/tx.c +++ b/net/mac802154/tx.c @@ -34,8 +34,8 @@ void ieee802154_xmit_worker(struct work_struct *work) if (res) goto err_tx; - DEV_STATS_INC(dev, tx_packets); - DEV_STATS_ADD(dev, tx_bytes, skb->len); + dev->stats.tx_packets++; + dev->stats.tx_bytes += skb->len; ieee802154_xmit_complete(&local->hw, skb, false); @@ -86,8 +86,8 @@ ieee802154_tx(struct ieee802154_local *local, struct sk_buff *skb) goto err_tx; } - DEV_STATS_INC(dev, tx_packets); - DEV_STATS_ADD(dev, tx_bytes, len); + dev->stats.tx_packets++; + dev->stats.tx_bytes += len; } else { local->tx_skb = skb; queue_work(local->workqueue, &local->tx_work); From 72f4574b8d4ceba24f522f772739a91aede08a94 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 12 Aug 2024 13:08:57 +0000 Subject: [PATCH 1542/1565] Revert "ext4: Send notifications on error" This reverts commit 9760c6ceb2a92c8b5e9ecd9d12c86469d9217f58 which is commit 9a089b21f79b47eed240d4da7ea0d049de7c9b4d upstream. It breaks the build due to other patches not being backported, so revert it as it is not needed in an Android kernel. Fixes: 9760c6ceb2a9 ("ext4: Send notifications on error") Change-Id: I95da56bf79ffde03437bc20f9b6435554d3cdb32 Signed-off-by: Greg Kroah-Hartman --- fs/ext4/super.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 8a3be3b063df..55af9897f65e 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -46,7 +46,6 @@ #include #include #include -#include #include "ext4.h" #include "ext4_extents.h" /* Needed for trace points definition */ @@ -700,7 +699,6 @@ void __ext4_error(struct super_block *sb, const char *function, sb->s_id, function, line, current->comm, &vaf); va_end(args); } - fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED); save_error_info(sb, error, 0, block, function, line); ext4_handle_error(sb); } @@ -732,7 +730,6 @@ void __ext4_error_inode(struct inode *inode, const char *function, current->comm, &vaf); va_end(args); } - fsnotify_sb_error(inode->i_sb, inode, error ? error : EFSCORRUPTED); save_error_info(inode->i_sb, error, inode->i_ino, block, function, line); ext4_handle_error(inode->i_sb); @@ -772,7 +769,6 @@ void __ext4_error_file(struct file *file, const char *function, current->comm, path, &vaf); va_end(args); } - fsnotify_sb_error(inode->i_sb, inode, EFSCORRUPTED); save_error_info(inode->i_sb, EFSCORRUPTED, inode->i_ino, block, function, line); ext4_handle_error(inode->i_sb); @@ -841,7 +837,7 @@ void __ext4_std_error(struct super_block *sb, const char *function, printk(KERN_CRIT "EXT4-fs error (device %s) in %s:%d: %s\n", sb->s_id, function, line, errstr); } - fsnotify_sb_error(sb, NULL, errno ? errno : EFSCORRUPTED); + save_error_info(sb, -errno, 0, 0, function, line); ext4_handle_error(sb); } @@ -865,7 +861,6 @@ void __ext4_abort(struct super_block *sb, const char *function, if (unlikely(ext4_forced_shutdown(EXT4_SB(sb)))) return; - fsnotify_sb_error(sb, NULL, error ? error : EFSCORRUPTED); save_error_info(sb, error, 0, 0, function, line); va_start(args, fmt); vaf.fmt = fmt; From dc67fccdbe146f0b1095f7d5b164bd09ceb9efbf Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 12 Aug 2024 13:07:34 +0000 Subject: [PATCH 1543/1565] ANDROID: properly backport filelock fix in 5.10.223 In commit 5661b9c7ec18 ("filelock: Remove locks reliably when fcntl/close race is detected"), upstream the current change was not present, but it is in our branch due to external changes, so fix it up to build and work properly here. Fixes: 5661b9c7ec18 ("filelock: Remove locks reliably when fcntl/close race is detected") Change-Id: I0ac2c1ab80cfb9f9577d09dbee45d3eb2ac19612 Signed-off-by: Greg Kroah-Hartman --- fs/locks.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 83145d4639a1..425ef3951153 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2546,7 +2546,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd, f = fcheck(fd); spin_unlock(¤t->files->file_lock); if (f != filp) { - locks_remove_posix(filp, files); + locks_remove_posix(filp, ¤t->files); error = -EBADF; } } @@ -2676,7 +2676,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd, f = fcheck(fd); spin_unlock(¤t->files->file_lock); if (f != filp) { - locks_remove_posix(filp, files); + locks_remove_posix(filp, ¤t->files); error = -EBADF; } } From c6acc5f0795ce34ecabc78e3aebbd583b909a075 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 12 Aug 2024 15:28:24 +0000 Subject: [PATCH 1544/1565] Revert "scsi: core: Fix a use-after-free" This reverts commit 5ce8fad941233e81f2afb5b52a3fcddd3ba8732f which is commit 8fe4ce5836e932f5766317cb651c1ff2a4cd0506 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I939984e199db400ce2107dedaec05c186eae11d8 Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/hosts.c | 16 +++------------- drivers/scsi/scsi_lib.c | 6 +----- drivers/scsi/scsi_priv.h | 2 +- drivers/scsi/scsi_scan.c | 1 - drivers/scsi/scsi_sysfs.c | 1 - include/scsi/scsi_host.h | 2 -- 6 files changed, 5 insertions(+), 23 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index e8963fa37f57..16e6ac792de5 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -182,15 +182,6 @@ void scsi_remove_host(struct Scsi_Host *shost) scsi_proc_host_rm(shost); scsi_proc_hostdir_rm(shost->hostt); - /* - * New SCSI devices cannot be attached anymore because of the SCSI host - * state so drop the tag set refcnt. Wait until the tag set refcnt drops - * to zero because .exit_cmd_priv implementations may need the host - * pointer. - */ - kref_put(&shost->tagset_refcnt, scsi_mq_free_tags); - wait_for_completion(&shost->tagset_freed); - spin_lock_irqsave(shost->host_lock, flags); if (scsi_host_set_state(shost, SHOST_DEL)) BUG_ON(scsi_host_set_state(shost, SHOST_DEL_RECOVERY)); @@ -249,9 +240,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, shost->dma_dev = dma_dev; - kref_init(&shost->tagset_refcnt); - init_completion(&shost->tagset_freed); - /* * Increase usage count temporarily here so that calling * scsi_autopm_put_host() will trigger runtime idle if there is @@ -324,7 +312,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, pm_runtime_disable(&shost->shost_gendev); pm_runtime_set_suspended(&shost->shost_gendev); pm_runtime_put_noidle(&shost->shost_gendev); - kref_put(&shost->tagset_refcnt, scsi_mq_free_tags); fail: return error; } @@ -357,6 +344,9 @@ static void scsi_host_dev_release(struct device *dev) kfree(dev_name(&shost->shost_dev)); } + if (shost->tag_set.tags) + scsi_mq_destroy_tags(shost); + kfree(shost->shost_data); ida_simple_remove(&host_index_ida, shost->host_no); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 91dc4d4b32df..99b90031500b 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1921,13 +1921,9 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) return blk_mq_alloc_tag_set(tag_set); } -void scsi_mq_free_tags(struct kref *kref) +void scsi_mq_destroy_tags(struct Scsi_Host *shost) { - struct Scsi_Host *shost = container_of(kref, typeof(*shost), - tagset_refcnt); - blk_mq_free_tag_set(&shost->tag_set); - complete(&shost->tagset_freed); } /** diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index 3b3c165cd8af..180636d54982 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -93,7 +93,7 @@ extern void scsi_requeue_run_queue(struct work_struct *work); extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev); extern void scsi_start_queue(struct scsi_device *sdev); extern int scsi_mq_setup_tags(struct Scsi_Host *shost); -extern void scsi_mq_free_tags(struct kref *kref); +extern void scsi_mq_destroy_tags(struct Scsi_Host *shost); extern void scsi_exit_queue(void); extern void scsi_evt_thread(struct work_struct *work); struct request_queue; diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index e8703b043805..6f7c4d41c51d 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c @@ -273,7 +273,6 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget, kfree(sdev); goto out; } - kref_get(&sdev->host->tagset_refcnt); WARN_ON_ONCE(!blk_get_queue(sdev->request_queue)); sdev->request_queue->queuedata = sdev; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index bb37d698da1a..6cc4d0792e3d 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1481,7 +1481,6 @@ void __scsi_remove_device(struct scsi_device *sdev) mutex_unlock(&sdev->state_mutex); blk_cleanup_queue(sdev->request_queue); - kref_put(&sdev->host->tagset_refcnt, scsi_mq_free_tags); cancel_work_sync(&sdev->requeue_work); if (sdev->host->hostt->slave_destroy) diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 543c61ccc4a1..8c0d37f389df 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -554,8 +554,6 @@ struct Scsi_Host { struct scsi_host_template *hostt; struct scsi_transport_template *transportt; - struct kref tagset_refcnt; - struct completion tagset_freed; /* Area to keep a shared tag map */ struct blk_mq_tag_set tag_set; From d66c08993e4a3cbf88026ebcc98f2c8e44d90e01 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= Date: Fri, 17 Sep 2021 06:49:25 -0700 Subject: [PATCH 1545/1565] ANDROID: gki - CONFIG_USB_NET_AX8817X=y MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Found an Apple Ethernet USB dongle which only works on older Pixels, but not on newer ones. Test: TreeHugger Bug: 200269356 Signed-off-by: Maciej Żenczykowski Change-Id: Id366434cc0a41f5e7e8f336dbdb854c7c117b35d --- arch/arm64/configs/gki_defconfig | 1 - arch/x86/configs/gki_defconfig | 1 - 2 files changed, 2 deletions(-) diff --git a/arch/arm64/configs/gki_defconfig b/arch/arm64/configs/gki_defconfig index 9688b5ec5372..665f5fb5c8d8 100644 --- a/arch/arm64/configs/gki_defconfig +++ b/arch/arm64/configs/gki_defconfig @@ -333,7 +333,6 @@ CONFIG_PPPOL2TP=y CONFIG_USB_RTL8150=y CONFIG_USB_RTL8152=y CONFIG_USB_USBNET=y -# CONFIG_USB_NET_AX8817X is not set # CONFIG_USB_NET_AX88179_178A is not set CONFIG_USB_NET_CDC_EEM=y # CONFIG_USB_NET_NET1080 is not set diff --git a/arch/x86/configs/gki_defconfig b/arch/x86/configs/gki_defconfig index 986af3f86e1e..ee9853544d9e 100644 --- a/arch/x86/configs/gki_defconfig +++ b/arch/x86/configs/gki_defconfig @@ -304,7 +304,6 @@ CONFIG_PPPOL2TP=y CONFIG_USB_RTL8150=y CONFIG_USB_RTL8152=y CONFIG_USB_USBNET=y -# CONFIG_USB_NET_AX8817X is not set # CONFIG_USB_NET_AX88179_178A is not set CONFIG_USB_NET_CDC_EEM=y # CONFIG_USB_NET_NET1080 is not set From 68af227b9573f25c25e17bdf11b93fae828ad1b1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Maciej=20=C5=BBenczykowski?= Date: Fri, 17 Sep 2021 15:41:58 -0700 Subject: [PATCH 1546/1565] ANDROID: gki - set CONFIG_USB_NET_AX88179_178A=y (usb gbit ethernet dongle) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per Kconfig: config USB_NET_AX88179_178A tristate "ASIX AX88179/178A USB 3.0/2.0 to Gigabit Ethernet" depends on USB_USBNET select CRC32 select PHYLIB default y help This option adds support for ASIX AX88179 based USB 3.0/2.0 to Gigabit Ethernet adapters. This driver should work with at least the following devices: * ASIX AX88179 * ASIX AX88178A * Sitcomm LN-032 This driver creates an interface named "ethX", where X depends on what other networking devices you have in use. This was already enabled on 'db845c_gki.fragment', which suggests this hardware is reasonably common (even though I don't have a dongle that requires it). Test: TreeHugger Bug: 200269356 Signed-off-by: Maciej Żenczykowski Change-Id: I9915cfb54a324f007d508a8e3d2aad1d6fc9e5de (cherry picked from commit 3ed683cb941fc0a457159dbaf736eee40f82b840) --- arch/arm64/configs/db845c_gki.fragment | 2 -- arch/arm64/configs/gki_defconfig | 1 - arch/x86/configs/gki_defconfig | 1 - 3 files changed, 4 deletions(-) diff --git a/arch/arm64/configs/db845c_gki.fragment b/arch/arm64/configs/db845c_gki.fragment index e7c8f04055e6..b67d6ee1015e 100644 --- a/arch/arm64/configs/db845c_gki.fragment +++ b/arch/arm64/configs/db845c_gki.fragment @@ -6,8 +6,6 @@ CONFIG_MAC80211=m CONFIG_QRTR=m CONFIG_QRTR_TUN=m CONFIG_SCSI_UFS_QCOM=m -CONFIG_USB_NET_AX8817X=m -CONFIG_USB_NET_AX88179_178A=m CONFIG_INPUT_PM8941_PWRKEY=m CONFIG_SERIAL_MSM=m CONFIG_SERIAL_QCOM_GENI=m diff --git a/arch/arm64/configs/gki_defconfig b/arch/arm64/configs/gki_defconfig index 665f5fb5c8d8..c79d77da33bc 100644 --- a/arch/arm64/configs/gki_defconfig +++ b/arch/arm64/configs/gki_defconfig @@ -333,7 +333,6 @@ CONFIG_PPPOL2TP=y CONFIG_USB_RTL8150=y CONFIG_USB_RTL8152=y CONFIG_USB_USBNET=y -# CONFIG_USB_NET_AX88179_178A is not set CONFIG_USB_NET_CDC_EEM=y # CONFIG_USB_NET_NET1080 is not set # CONFIG_USB_NET_CDC_SUBSET is not set diff --git a/arch/x86/configs/gki_defconfig b/arch/x86/configs/gki_defconfig index ee9853544d9e..b14ce267cf44 100644 --- a/arch/x86/configs/gki_defconfig +++ b/arch/x86/configs/gki_defconfig @@ -304,7 +304,6 @@ CONFIG_PPPOL2TP=y CONFIG_USB_RTL8150=y CONFIG_USB_RTL8152=y CONFIG_USB_USBNET=y -# CONFIG_USB_NET_AX88179_178A is not set CONFIG_USB_NET_CDC_EEM=y # CONFIG_USB_NET_NET1080 is not set # CONFIG_USB_NET_CDC_SUBSET is not set From c2201dde2a76788b5b7a75426e53a58e1490a028 Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Thu, 22 Aug 2024 18:23:52 +0000 Subject: [PATCH 1547/1565] FROMLIST: binder: fix UAF caused by offsets overwrite Binder objects are processed and copied individually into the target buffer during transactions. Any raw data in-between these objects is copied as well. However, this raw data copy lacks an out-of-bounds check. If the raw data exceeds the data section size then the copy overwrites the offsets section. This eventually triggers an error that attempts to unwind the processed objects. However, at this point the offsets used to index these objects are now corrupted. Unwinding with corrupted offsets can result in decrements of arbitrary nodes and lead to their premature release. Other users of such nodes are left with a dangling pointer triggering a use-after-free. This issue is made evident by the following KASAN report (trimmed): ================================================================== BUG: KASAN: slab-use-after-free in _raw_spin_lock+0xe4/0x19c Write of size 4 at addr ffff47fc91598f04 by task binder-util/743 CPU: 9 UID: 0 PID: 743 Comm: binder-util Not tainted 6.11.0-rc4 #1 Hardware name: linux,dummy-virt (DT) Call trace: _raw_spin_lock+0xe4/0x19c binder_free_buf+0x128/0x434 binder_thread_write+0x8a4/0x3260 binder_ioctl+0x18f0/0x258c [...] Allocated by task 743: __kmalloc_cache_noprof+0x110/0x270 binder_new_node+0x50/0x700 binder_transaction+0x413c/0x6da8 binder_thread_write+0x978/0x3260 binder_ioctl+0x18f0/0x258c [...] Freed by task 745: kfree+0xbc/0x208 binder_thread_read+0x1c5c/0x37d4 binder_ioctl+0x16d8/0x258c [...] ================================================================== To avoid this issue, let's check that the raw data copy is within the boundaries of the data section. Fixes: 6d98eb95b450 ("binder: avoid potential data leakage when copying txn") Cc: Todd Kjos Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas Bug: 352520660 Link: https://lore.kernel.org/all/20240822182353.2129600-1-cmllamas@google.com/ Change-Id: I1b2dd8403b63e5eeb58904558b7b542141c83fc2 Signed-off-by: Carlos Llamas --- drivers/android/binder.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index a09270e822ae..aae0acb084b7 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3436,6 +3436,7 @@ static void binder_transaction(struct binder_proc *proc, */ copy_size = object_offset - user_offset; if (copy_size && (user_offset > object_offset || + object_offset > tr->data_size || binder_alloc_copy_user_to_buffer( &target_proc->alloc, t->buffer, user_offset, From 134a56fc8a9c55a1c9e4459f0b1f790e5732327f Mon Sep 17 00:00:00 2001 From: Kamati Srinivas Date: Wed, 3 Jul 2024 14:39:10 +0530 Subject: [PATCH 1548/1565] ANDROID: irqchip/irq-gic-v3: Add vendor hook for gic suspend This change adds vendor hook for gic suspend syscore ops callback. And it is invoked during deepsleep and hibernation to store gic register snapshot. Bug: 340049585 Change-Id: I4e3729afa4daf18d73e00ee9601b6da72a578b4a Signed-off-by: Nagireddy Annem Signed-off-by: Kamati Srinivas --- drivers/android/vendor_hooks.c | 1 + drivers/irqchip/irq-gic-v3.c | 19 ++++++++++++++++--- include/linux/irqchip/arm-gic-v3.h | 3 +++ include/trace/hooks/gic_v3.h | 6 ++++++ 4 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/android/vendor_hooks.c b/drivers/android/vendor_hooks.c index 8c753468e1ea..5eb99ad34d0d 100644 --- a/drivers/android/vendor_hooks.c +++ b/drivers/android/vendor_hooks.c @@ -145,6 +145,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpu_idle_enter); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_cpu_idle_exit); EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_find_busiest_group); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_gic_resume); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_gic_suspend); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_wq_lockup_pool); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_ipi_stop); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_dump_throttled_rt_tasks); diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 192a9b1be185..746c46a22a3f 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -217,10 +217,11 @@ static void gic_do_wait_for_rwp(void __iomem *base, u32 bit) } /* Wait for completion of a distributor change */ -static void gic_dist_wait_for_rwp(void) +void gic_dist_wait_for_rwp(void) { gic_do_wait_for_rwp(gic_data.dist_base, GICD_CTLR_RWP); } +EXPORT_SYMBOL_GPL(gic_dist_wait_for_rwp); /* Wait for completion of a redistributor change */ static void gic_redist_wait_for_rwp(void) @@ -797,7 +798,7 @@ static bool gic_has_group0(void) return val != 0; } -static void __init gic_dist_init(void) +void gic_dist_init(void) { unsigned int i; u64 affinity; @@ -855,6 +856,7 @@ static void __init gic_dist_init(void) for (i = 0; i < GIC_ESPI_NR; i++) gic_write_irouter(affinity, base + GICD_IROUTERnE + i * 8); } +EXPORT_SYMBOL_GPL(gic_dist_init); static int gic_iterate_rdists(int (*fn)(struct redist_region *, void __iomem *)) { @@ -1136,7 +1138,7 @@ static int gic_dist_supports_lpis(void) !gicv3_nolpi); } -static void gic_cpu_init(void) +void gic_cpu_init(void) { void __iomem *rbase; int i; @@ -1163,6 +1165,7 @@ static void gic_cpu_init(void) /* initialise system registers */ gic_cpu_sys_reg_init(); } +EXPORT_SYMBOL_GPL(gic_cpu_init); #ifdef CONFIG_SMP @@ -1363,8 +1366,17 @@ void gic_resume(void) } EXPORT_SYMBOL_GPL(gic_resume); +static int gic_suspend(void) +{ + int ret = 0; + + trace_android_vh_gic_suspend(&gic_data, &ret); + return ret; +} + static struct syscore_ops gic_syscore_ops = { .resume = gic_resume, + .suspend = gic_suspend, }; static void gic_syscore_init(void) @@ -1375,6 +1387,7 @@ static void gic_syscore_init(void) #else static inline void gic_syscore_init(void) { } void gic_resume(void) { } +static int gic_suspend(void) { return 0; } #endif diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 43348c487c8e..654d2ce2af06 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -723,6 +723,9 @@ static inline bool gic_enable_sre(void) } void gic_resume(void); +void gic_dist_init(void); +void gic_cpu_init(void); +void gic_dist_wait_for_rwp(void); #endif diff --git a/include/trace/hooks/gic_v3.h b/include/trace/hooks/gic_v3.h index 6b649fd865b1..c10b159df9e0 100644 --- a/include/trace/hooks/gic_v3.h +++ b/include/trace/hooks/gic_v3.h @@ -19,6 +19,9 @@ struct irq_data; /* struct irq_data */ #include #endif /* __GENKSYMS__ */ + +struct gic_chip_data; + DECLARE_RESTRICTED_HOOK(android_rvh_gic_v3_set_affinity, TP_PROTO(struct irq_data *d, const struct cpumask *mask_val, u64 *affinity, bool force, void __iomem *base), @@ -26,6 +29,9 @@ DECLARE_RESTRICTED_HOOK(android_rvh_gic_v3_set_affinity, 1); /* macro versions of hooks are no longer required */ +DECLARE_HOOK(android_vh_gic_suspend, + TP_PROTO(struct gic_chip_data *gd, int *ret), + TP_ARGS(gd, ret)); #endif /* _TRACE_HOOK_GIC_V3_H */ /* This part must be outside protection */ From 5a3d6440f505c4bda6f2758f81e3e14645a70b72 Mon Sep 17 00:00:00 2001 From: Kamati Srinivas Date: Wed, 31 Jul 2024 10:57:27 +0530 Subject: [PATCH 1549/1565] ANDROID: Update the GKI symbol list and ABI XML Add below functions and symbols to support GIC Deepsleep and Hibernation feature. Leaf changes summary: 5 artifacts changed Changed leaf types summary: 0 leaf type changed Removed/Changed/Added functions summary: 0 Removed, 0 Changed, 4 Added functions Removed/Changed/Added variables summary: 0 Removed, 0 Changed, 1 Added variable 4 Added functions: [A] 'function int __traceiter_android_vh_gic_suspend(void*, gic_chip_data*, int*)' [A] 'function void gic_cpu_init()' [A] 'function void gic_dist_init()' [A] 'function void gic_dist_wait_for_rwp()' 1 Added variable: [A] 'tracepoint __tracepoint_android_vh_gic_suspend' Bug: 340049585 Change-Id: Ib972331e5851b02ba8f9963f642a868f9b22129a Signed-off-by: Kamati Srinivas --- android/abi_gki_aarch64.xml | 64 +++++++++++++++++++----------------- android/abi_gki_aarch64_qcom | 5 +++ 2 files changed, 38 insertions(+), 31 deletions(-) diff --git a/android/abi_gki_aarch64.xml b/android/abi_gki_aarch64.xml index c6bcef1a55d5..967f69d39267 100644 --- a/android/abi_gki_aarch64.xml +++ b/android/abi_gki_aarch64.xml @@ -564,6 +564,7 @@ + @@ -2904,6 +2905,9 @@ + + + @@ -6722,6 +6726,7 @@ + @@ -84588,9 +84593,6 @@ - - - @@ -93610,9 +93612,6 @@ - - - @@ -106967,20 +106966,7 @@ - - - - - - - - - - - - - - + @@ -117226,9 +117212,9 @@ - + - + @@ -119335,13 +119321,13 @@ - - - - - - - + + + + + + + @@ -120694,6 +120680,12 @@ + + + + + + @@ -122406,7 +122398,7 @@ - + @@ -122613,6 +122605,7 @@ + @@ -133859,8 +133852,17 @@ + + + + + + + + + - + diff --git a/android/abi_gki_aarch64_qcom b/android/abi_gki_aarch64_qcom index 3fb23d9e4f38..391e91bc04c6 100644 --- a/android/abi_gki_aarch64_qcom +++ b/android/abi_gki_aarch64_qcom @@ -984,6 +984,9 @@ get_user_pages get_zeroed_page gfp_zone + gic_cpu_init + gic_dist_init + gic_dist_wait_for_rwp gic_nonsecure_priorities gic_resume gov_attr_set_init @@ -2587,6 +2590,7 @@ __traceiter_android_vh_ftrace_oops_exit __traceiter_android_vh_ftrace_size_check __traceiter_android_vh_gic_resume + __traceiter_android_vh_gic_suspend __traceiter_android_vh_gpio_block_read __traceiter_android_vh_handle_tlb_conf __traceiter_android_vh_iommu_setup_dma_ops @@ -2716,6 +2720,7 @@ __tracepoint_android_vh_ftrace_oops_exit __tracepoint_android_vh_ftrace_size_check __tracepoint_android_vh_gic_resume + __tracepoint_android_vh_gic_suspend __tracepoint_android_vh_gpio_block_read __tracepoint_android_vh_handle_tlb_conf __tracepoint_android_vh_iommu_setup_dma_ops From 9ac177ec5cc195943ddbc9db1f4e071107a14e1c Mon Sep 17 00:00:00 2001 From: Hangyu Hua Date: Mon, 3 Jun 2024 15:13:03 +0800 Subject: [PATCH 1550/1565] UPSTREAM: net: sched: sch_multiq: fix possible OOB write in multiq_tune() [ Upstream commit affc18fdc694190ca7575b9a86632a73b9fe043d ] q->bands will be assigned to qopt->bands to execute subsequent code logic after kmalloc. So the old q->bands should not be used in kmalloc. Otherwise, an out-of-bounds write will occur. Bug: 349777785 Fixes: c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock") Signed-off-by: Hangyu Hua Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Sasha Levin (cherry picked from commit 0f208fad86631e005754606c3ec80c0d44a11882) Signed-off-by: Lee Jones Change-Id: Iec8413c39878596795420ae58bbe6974890cf2de From 89d09e01fa4ebc47727b66f543f14eee4c24b07f Mon Sep 17 00:00:00 2001 From: liliangliang Date: Sun, 7 Apr 2024 15:19:02 +0800 Subject: [PATCH 1551/1565] ANDROID: vendor_hooks: add vendor hooks for fuse request Add hooks to fuse queue request and request end so we can do boost to those background tasks which block the UX related task. Bug: 333220630 Change-Id: I9be59ed88675c5102c57ba9cbd26cf4df3d2fd7f Signed-off-by: liliangliang (cherry picked from commit e520c2932df0d1bbf83ae45c82ac01fd41655d77) (cherry picked from commit e9fd05e64568bfb9fe1fb4c319d7be17ab1c448a) --- drivers/android/vendor_hooks.c | 3 +++ fs/fuse/dev.c | 4 ++++ include/trace/hooks/fuse.h | 24 ++++++++++++++++++++++++ 3 files changed, 31 insertions(+) create mode 100644 include/trace/hooks/fuse.h diff --git a/drivers/android/vendor_hooks.c b/drivers/android/vendor_hooks.c index 5eb99ad34d0d..9df83fbfe170 100644 --- a/drivers/android/vendor_hooks.c +++ b/drivers/android/vendor_hooks.c @@ -78,6 +78,7 @@ #include #include #include +#include /* * Export tracepoints that act as a bare tracehook (ie: have no trace event @@ -493,6 +494,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_free_pages); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_set_shmem_page_flag); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_mmput); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_sched_pelt_multiplier); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_queue_request_and_unlock); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_fuse_request_end); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_alloc_pages_reclaim_bypass); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_alloc_pages_failure_bypass); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_check_page_look_around_ref); diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 47eef2496230..1c277a08b6ec 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -23,6 +23,8 @@ #include #include +#include + MODULE_ALIAS_MISCDEV(FUSE_MINOR); MODULE_ALIAS("devname:fuse"); @@ -234,6 +236,7 @@ __releases(fiq->lock) fuse_len_args(req->args->in_numargs, (struct fuse_arg *) req->args->in_args); list_add_tail(&req->list, &fiq->pending); + trace_android_vh_queue_request_and_unlock(&fiq->waitq, sync); fiq->ops->wake_pending_and_unlock(fiq, sync); } @@ -330,6 +333,7 @@ void fuse_request_end(struct fuse_req *req) } else { /* Wake up waiter sleeping in request_wait_answer() */ wake_up(&req->waitq); + trace_android_vh_fuse_request_end(current); } if (test_bit(FR_ASYNC, &req->flags)) diff --git a/include/trace/hooks/fuse.h b/include/trace/hooks/fuse.h new file mode 100644 index 000000000000..a4267b47d5c0 --- /dev/null +++ b/include/trace/hooks/fuse.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM fuse +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH trace/hooks +#if !defined(_TRACE_HOOK_FUSE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HOOK_FUSE_H +#include +/* + * Following tracepoints are not exported in tracefs and provide a + * mechanism for vendor modules to hook and extend functionality + */ + +struct wait_queue_head; +DECLARE_HOOK(android_vh_queue_request_and_unlock, + TP_PROTO(struct wait_queue_head *wq_head, bool sync), + TP_ARGS(wq_head, sync)); +DECLARE_HOOK(android_vh_fuse_request_end, + TP_PROTO(struct task_struct *self), + TP_ARGS(self)); + +#endif /* _TRACE_HOOK_FUSE_H */ +/* This part must be outside protection */ +#include From c7d7f8476dfc40cd86018bda22cb422918195cd9 Mon Sep 17 00:00:00 2001 From: Liangliang Li Date: Wed, 4 Sep 2024 17:32:22 +0800 Subject: [PATCH 1552/1565] ANDROID: GKI: Update symbol list for vivo 2 function symbol(s) added 'int __traceiter_android_vh_fuse_request_end(void*, struct task_struct*)' 'int __traceiter_android_vh_queue_request_and_unlock(void*, struct fuse_iqueue*, bool)' 2 variable symbol(s) added 'struct tracepoint __tracepoint_android_vh_fuse_request_end' 'struct tracepoint __tracepoint_android_vh_queue_request_and_unlock' Bug: 364501502 Change-Id: I9f8e645fa83fe5a0f39fd11daa6e758a63c6550c Signed-off-by: Liangliang Li --- android/abi_gki_aarch64.xml | 3772 +++++++++++++++++----------------- android/abi_gki_aarch64_vivo | 4 + 2 files changed, 1869 insertions(+), 1907 deletions(-) diff --git a/android/abi_gki_aarch64.xml b/android/abi_gki_aarch64.xml index 967f69d39267..ffea9cf2133a 100644 --- a/android/abi_gki_aarch64.xml +++ b/android/abi_gki_aarch64.xml @@ -555,6 +555,7 @@ + @@ -637,6 +638,7 @@ + @@ -6717,6 +6719,7 @@ + @@ -6799,6 +6802,7 @@ + @@ -7450,7 +7454,7 @@ - + @@ -7467,21 +7471,21 @@ - + - + - + - + - + - + @@ -7727,12 +7731,12 @@ - + - + - + @@ -8614,7 +8618,7 @@ - + @@ -10002,30 +10006,30 @@ - + - + - + - + - + - + - + - + - + @@ -10466,18 +10470,18 @@ - + - + - + - + - + @@ -10961,12 +10965,12 @@ - + - + - + @@ -12602,7 +12606,7 @@ - + @@ -12948,9 +12952,9 @@ - + - + @@ -13905,18 +13909,18 @@ - + - + - + - + - + @@ -14982,15 +14986,15 @@ - + - + - + - + @@ -15317,7 +15321,6 @@ - @@ -15633,89 +15636,6 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - @@ -15831,51 +15751,51 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -17265,24 +17185,24 @@ - + - + - + - + - + - + - + @@ -18301,15 +18221,15 @@ - + - + - + - + @@ -18665,27 +18585,27 @@ - + - + - + - + - + - + - + - + @@ -18728,12 +18648,12 @@ - + - + - + @@ -19536,36 +19456,36 @@ - + - + - + - + - + - + - + - + - + - + - + @@ -19779,24 +19699,24 @@ - + - + - + - + - + - + - + @@ -20277,7 +20197,7 @@ - + @@ -20811,18 +20731,18 @@ - + - + - + - + - + @@ -23672,12 +23592,12 @@ - + - + - + @@ -25855,12 +25775,12 @@ - + - + - + @@ -26289,57 +26209,57 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -26898,87 +26818,87 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -29327,27 +29247,27 @@ - + - + - + - + - + - + - + - + @@ -30003,7 +29923,7 @@ - + @@ -30744,12 +30664,12 @@ - + - + - + @@ -31198,9 +31118,9 @@ - + - + @@ -32129,30 +32049,30 @@ - + - + - + - + - + - + - + - + - + @@ -33489,15 +33409,15 @@ - + - + - + - + @@ -33872,7 +33792,7 @@ - + @@ -33883,24 +33803,24 @@ - + - + - + - + - + - + - + @@ -35166,12 +35086,12 @@ - + - + - + @@ -35563,7 +35483,7 @@ - + @@ -36244,15 +36164,15 @@ - + - + - + - + @@ -37971,7 +37891,7 @@ - + @@ -38015,12 +37935,6 @@ - - - - - - @@ -38438,7 +38352,7 @@ - + @@ -39290,51 +39204,51 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -39586,7 +39500,7 @@ - + @@ -41168,30 +41082,30 @@ - + - + - + - + - + - + - + - + - + @@ -42089,21 +42003,21 @@ - + - + - + - + - + - + @@ -43226,21 +43140,21 @@ - + - + - + - + - + - + @@ -43696,386 +43610,386 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -44542,45 +44456,45 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -45058,7 +44972,23 @@ - + + + + + + + + + + + + + + + + + @@ -45881,7 +45811,7 @@ - + @@ -46635,9 +46565,9 @@ - + - + @@ -48187,9 +48117,9 @@ - + - + @@ -48770,21 +48700,21 @@ - + - + - + - + - + - + @@ -49460,6 +49390,7 @@ + @@ -49602,7 +49533,7 @@ - + @@ -49644,7 +49575,7 @@ - + @@ -49820,15 +49751,15 @@ - + - + - + - + @@ -51260,12 +51191,12 @@ - + - + - + @@ -51384,7 +51315,7 @@ - + @@ -52015,21 +51946,21 @@ - + - + - + - + - + - + @@ -52348,27 +52279,27 @@ - + - + - + - + - + - + - + - + @@ -54445,6 +54376,13 @@ + + + + + + + @@ -54617,7 +54555,7 @@ - + @@ -54801,18 +54739,18 @@ - + - + - + - + - + @@ -54866,15 +54804,15 @@ - + - + - + - + @@ -56968,12 +56906,12 @@ - + - + - + @@ -57326,30 +57264,30 @@ - + - + - + - + - + - + - + - + - + @@ -57580,7 +57518,7 @@ - + @@ -58307,96 +58245,96 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -64261,7 +64199,7 @@ - + @@ -64599,30 +64537,30 @@ - + - + - + - + - + - + - + - + - + @@ -64909,7 +64847,7 @@ - + @@ -66758,7 +66696,7 @@ - + @@ -67137,6 +67075,13 @@ + + + + + + + @@ -67426,45 +67371,45 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -68783,18 +68728,18 @@ - + - + - + - + - + @@ -68830,27 +68775,27 @@ - + - + - + - + - + - + - + - + @@ -68897,9 +68842,9 @@ - + - + @@ -69180,14 +69125,7 @@ - - - - - - - - + @@ -69258,30 +69196,30 @@ - + - + - + - + - + - + - + - + - + @@ -69358,18 +69296,18 @@ - + - + - + - + - + @@ -69801,12 +69739,12 @@ - + - + - + @@ -69940,15 +69878,15 @@ - + - + - + - + @@ -70067,18 +70005,18 @@ - + - + - + - + - + @@ -70226,12 +70164,12 @@ - + - + - + @@ -70617,26 +70555,7 @@ - - - - - - - - - - - - - - - - - - - - + @@ -72345,12 +72264,12 @@ - + - + - + @@ -73296,18 +73215,18 @@ - + - + - + - + - + @@ -73503,30 +73422,30 @@ - + - + - + - + - + - + - + - + - + @@ -74210,18 +74129,18 @@ - + - + - + - + - + @@ -74567,7 +74486,7 @@ - + @@ -74697,15 +74616,15 @@ - + - + - + - + @@ -75082,7 +75001,7 @@ - + @@ -75479,30 +75398,30 @@ - + - + - + - + - + - + - + - + - + @@ -76834,12 +76753,12 @@ - + - + - + @@ -78438,18 +78357,18 @@ - + - + - + - + - + @@ -78488,9 +78407,9 @@ - + - + @@ -78975,30 +78894,30 @@ - + - + - + - + - + - + - + - + - + @@ -79745,12 +79664,12 @@ - + - + - + @@ -79894,9 +79813,9 @@ - + - + @@ -79921,15 +79840,15 @@ - + - + - + - + @@ -80464,7 +80383,7 @@ - + @@ -80990,24 +80909,24 @@ - + - + - + - + - + - + - + @@ -82533,24 +82452,24 @@ - + - + - + - + - + - + - + @@ -83116,18 +83035,18 @@ - + - + - + - + - + @@ -84558,33 +84477,33 @@ - + - + - + - + - + - + - + - + - + - + @@ -84593,6 +84512,9 @@ + + + @@ -92059,6 +91981,7 @@ + @@ -93612,6 +93535,9 @@ + + + @@ -93726,9 +93652,9 @@ - + - + @@ -94437,18 +94363,18 @@ - + - + - + - + - + @@ -94577,7 +94503,6 @@ - @@ -95007,7 +94932,7 @@ - + @@ -97430,7 +97355,7 @@ - + @@ -98388,12 +98313,12 @@ - + - + - + @@ -98691,12 +98616,12 @@ - + - + - + @@ -101115,30 +101040,30 @@ - + - + - + - + - + - + - + - + - + @@ -101807,15 +101732,15 @@ - + - + - + - + @@ -102207,12 +102132,12 @@ - + - + - + @@ -102345,12 +102270,12 @@ - + - + - + @@ -102387,42 +102312,42 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -102837,12 +102762,12 @@ - + - + - + @@ -103673,12 +103598,12 @@ - + - + - + @@ -103841,6 +103766,12 @@ + + + + + + @@ -103860,33 +103791,33 @@ - + - + - + - + - + - + - + - + - + - + @@ -104242,18 +104173,18 @@ - + - + - + - + - + @@ -104719,15 +104650,15 @@ - + - + - + - + @@ -106248,33 +106179,33 @@ - + - + - + - + - + - + - + - + - + - + @@ -106381,18 +106312,18 @@ - + - + - + - + - + @@ -106966,7 +106897,20 @@ - + + + + + + + + + + + + + + @@ -107774,15 +107718,15 @@ - + - + - + - + @@ -109239,66 +109183,66 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -109310,30 +109254,30 @@ - + - + - + - + - + - + - + - + - + @@ -111390,33 +111334,33 @@ - + - + - + - + - + - + - + - + - + - + @@ -111469,9 +111413,9 @@ - + - + @@ -111557,7 +111501,7 @@ - + @@ -112111,6 +112055,7 @@ + @@ -112130,24 +112075,24 @@ - + - + - + - + - + - + - + @@ -112462,24 +112407,24 @@ - + - + - + - + - + - + - + @@ -113498,87 +113443,87 @@ - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + - + @@ -113959,24 +113904,24 @@ - + - + - + - + - + - + - + @@ -114142,7 +114087,7 @@ - + @@ -114196,15 +114141,15 @@ - + - + - + - + @@ -115130,30 +115075,30 @@ - + - + - + - + - + - + - + - + - + @@ -115452,24 +115397,24 @@ - + - + - + - + - + - + - + @@ -115795,39 +115740,39 @@ - + - + - + - + - + - + - + - + - + - + - + - + @@ -117212,9 +117157,9 @@ - + - + @@ -117406,8 +117351,8 @@ - - + + @@ -117547,10 +117492,10 @@ - - - - + + + + @@ -117724,10 +117669,10 @@ - - - - + + + + @@ -117987,16 +117932,16 @@ - - - - + + + + - - - - + + + + @@ -118048,10 +117993,10 @@ - - - - + + + + @@ -118064,26 +118009,26 @@ - - - - - - - - - + + + + + + + + + - - - - - - - - - + + + + + + + + + @@ -118768,8 +118713,8 @@ - - + + @@ -118798,8 +118743,8 @@ - - + + @@ -118868,10 +118813,10 @@ - - - - + + + + @@ -118949,9 +118894,9 @@ - - - + + + @@ -120621,6 +120566,11 @@ + + + + + @@ -121134,6 +121084,12 @@ + + + + + + @@ -122596,6 +122552,7 @@ + @@ -122678,6 +122635,7 @@ + @@ -122897,28 +122855,28 @@ - - - - - - - - - - + + + + + + + + + + - - - - - - - - - - + + + + + + + + + + @@ -123140,45 +123098,45 @@ - - - + + + - - - - - - - - - - - - - - - - - - - - - - - - - + - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + @@ -123427,9 +123385,9 @@ - - - + + + @@ -123533,13 +123491,13 @@ - - - - - - - + + + + + + + @@ -123755,8 +123713,8 @@ - - + + @@ -124628,10 +124586,10 @@ - - - - + + + + @@ -124775,8 +124733,8 @@ - - + + @@ -125096,10 +125054,10 @@ - - - - + + + + @@ -125107,21 +125065,21 @@ - - - - + + + + - - - - + + + + - - - + + + @@ -125249,8 +125207,8 @@ - - + + @@ -125661,9 +125619,9 @@ - - - + + + @@ -125902,9 +125860,9 @@ - - - + + + @@ -125912,13 +125870,13 @@ - - - - + + + + - + @@ -125957,45 +125915,45 @@ - - + + - - + + - + - - - + + + - - + + - - + + - - + + - - + + - - - + + + @@ -126012,25 +125970,25 @@ - - + + - - + + - - - + + + - - + + - - + + @@ -126137,23 +126095,23 @@ - - + + - - + + - - - + + + - - - - + + + + @@ -126188,10 +126146,10 @@ - - - - + + + + @@ -126246,22 +126204,22 @@ - - - - + + + + - - - - + + + + - - - - + + + + @@ -126302,12 +126260,12 @@ - - - - - - + + + + + + @@ -126368,13 +126326,13 @@ - - + + - - - + + + @@ -126539,8 +126497,8 @@ - - + + @@ -126642,50 +126600,50 @@ - - - + + + - - - - + + + + - - - + + + - - + + - - + + - - - + + + - - - + + + - - + + - - + + - - + + @@ -126694,14 +126652,14 @@ - - - + + + - - - + + + @@ -127009,10 +126967,10 @@ - - - - + + + + @@ -127040,16 +126998,16 @@ - - - - + + + + - - - + + + @@ -127081,9 +127039,9 @@ - - - + + + @@ -127413,17 +127371,17 @@ - - - - + + + + - - - - + + + + @@ -127442,9 +127400,9 @@ - - - + + + @@ -127462,11 +127420,11 @@ - - - - - + + + + + @@ -127571,8 +127529,8 @@ - - + + @@ -127581,9 +127539,9 @@ - - - + + + @@ -127594,54 +127552,54 @@ - - - - - - + + + + + + - - - + + + - - - + + + - - - - - - - + + + + + + + - - + + - - - + + + - - - - + + + + - - - - + + + + @@ -127681,8 +127639,8 @@ - - + + @@ -127700,19 +127658,19 @@ - - - + + + - - - + + + - - - + + + @@ -127758,42 +127716,42 @@ - - + + - - - + + + - - - + + + - - - + + + - - - + + + - - - + + + - - - + + + @@ -127832,8 +127790,8 @@ - - + + @@ -127962,30 +127920,30 @@ - - - + + + - - - + + + - - - + + + - - - + + + - - - - + + + + @@ -128141,10 +128099,10 @@ - - - - + + + + @@ -128167,8 +128125,8 @@ - - + + @@ -128372,9 +128330,9 @@ - - - + + + @@ -128630,9 +128588,9 @@ - - - + + + @@ -129330,8 +129288,8 @@ - - + + @@ -129400,8 +129358,8 @@ - - + + @@ -131572,8 +131530,8 @@ - - + + @@ -132842,8 +132800,8 @@ - - + + @@ -133647,8 +133605,8 @@ - - + + @@ -133693,12 +133651,12 @@ - - + + - - + + @@ -134229,26 +134187,26 @@ - - - - - - + + + + + + - - - + + + - - - + + + @@ -134335,41 +134293,41 @@ - - + + - + - - + + - - + + - - + + - - - + + + - - + + - - - - - - + + + + + + @@ -134382,16 +134340,16 @@ - - - - - - + + + + + + - - + + @@ -134495,13 +134453,13 @@ - - - + + + - - + + @@ -134514,41 +134472,41 @@ - - - - + + + + - - + + - - + + - - - + + + - - + + - - - + + + - - - + + + @@ -134556,10 +134514,10 @@ - - - - + + + + @@ -134567,41 +134525,41 @@ - - - + + + - - - - - + + + + + - - - - + + + + - - + + - - - - + + + + - - - + + + @@ -134659,25 +134617,25 @@ - - - - + + + + - - - - - + + + + + - - + + @@ -135342,7 +135300,7 @@ - + @@ -135356,9 +135314,9 @@ - - - + + + @@ -135367,12 +135325,12 @@ - - + + - - + + @@ -135421,16 +135379,16 @@ - - + + - - + + - - + + @@ -135442,10 +135400,10 @@ - - - - + + + + @@ -135453,9 +135411,9 @@ - - - + + + @@ -135463,16 +135421,16 @@ - - + + - - + + - - + + @@ -136398,12 +136356,12 @@ - - + + - - + + @@ -137922,13 +137880,13 @@ - - + + - - - + + + @@ -137944,8 +137902,8 @@ - - + + @@ -138018,8 +137976,8 @@ - - + + @@ -138057,11 +138015,11 @@ - - - - - + + + + + @@ -138077,13 +138035,13 @@ - - + + - - - + + + @@ -138122,8 +138080,8 @@ - - + + @@ -138135,16 +138093,16 @@ - - + + - - + + - - + + @@ -138373,7 +138331,7 @@ - + @@ -138430,9 +138388,9 @@ - - - + + + @@ -138444,21 +138402,21 @@ - - - + + + - - - - + + + + - - - + + + @@ -138475,9 +138433,9 @@ - - - + + + @@ -138506,9 +138464,9 @@ - - - + + + @@ -138530,9 +138488,9 @@ - - - + + + @@ -139891,15 +139849,15 @@ - - - - + + + + - - - + + + @@ -139907,9 +139865,9 @@ - - - + + + @@ -139927,32 +139885,32 @@ - - + + - + - + - - - + + + - - - + + + - - - + + + @@ -140028,12 +139986,12 @@ - - + + - - + + @@ -140048,8 +140006,8 @@ - - + + @@ -140077,9 +140035,9 @@ - - - + + + @@ -140239,9 +140197,9 @@ - - - + + + @@ -140320,8 +140278,8 @@ - - + + @@ -140346,13 +140304,13 @@ - - - + + + - - + + @@ -140360,9 +140318,9 @@ - - - + + + @@ -140379,29 +140337,29 @@ - - - - + + + + - - - + + + - - - - + + + + - - + + @@ -140422,17 +140380,17 @@ - - - + + + - - + + - - + + @@ -140453,8 +140411,8 @@ - - + + @@ -140512,11 +140470,11 @@ - - - - - + + + + + @@ -140538,17 +140496,17 @@ - - + + - - + + - - - + + + @@ -140632,12 +140590,12 @@ - - - - - - + + + + + + @@ -140648,9 +140606,9 @@ - - - + + + @@ -140660,18 +140618,18 @@ - - - - + + + + - - + + @@ -140681,20 +140639,20 @@ - - - - + + + + - - - - + + + + - - + + @@ -140703,15 +140661,15 @@ - - - - - - - - - + + + + + + + + + @@ -141984,13 +141942,13 @@ - - - + + + - - + + @@ -142063,8 +142021,8 @@ - - + + @@ -142390,20 +142348,20 @@ - - + + - - + + - - + + - - + + @@ -142436,9 +142394,9 @@ - - - + + + @@ -142579,12 +142537,12 @@ - - + + - - + + @@ -142782,25 +142740,25 @@ - - - + + + - - - + + + - - - + + + - - - - + + + + @@ -142847,24 +142805,24 @@ - - + + - - + + - - + + - - + + @@ -142872,12 +142830,12 @@ - - + + - - + + @@ -142892,10 +142850,10 @@ - - - - + + + + @@ -142942,24 +142900,24 @@ - - - - + + + + - - - + + + - - - + + + @@ -142972,10 +142930,10 @@ - - - - + + + + @@ -142984,24 +142942,24 @@ - - - + + + - - - + + + - - - - + + + + @@ -143009,34 +142967,34 @@ - - - - + + + + - - - - + + + + - - - + + + - - + + - - + + - - - + + + @@ -144104,8 +144062,8 @@ - - + + @@ -144116,24 +144074,24 @@ - - + + - - - + + + - - + + - - - - - + + + + + @@ -144149,9 +144107,9 @@ - - - + + + @@ -144176,9 +144134,9 @@ - - - + + + @@ -144196,21 +144154,21 @@ - - + + - - + + - - - + + + - - + + @@ -144240,12 +144198,12 @@ - - + + - - + + @@ -144725,12 +144683,12 @@ - - + + - - + + @@ -144952,8 +144910,8 @@ - - + + @@ -145521,8 +145479,8 @@ - - + + @@ -145550,11 +145508,11 @@ - - - - - + + + + + @@ -146643,42 +146601,42 @@ - - + + - - + + - - + + - - + + - - - + + + - - + + - - + + - - + + @@ -146733,34 +146691,34 @@ - - + + - - + + - - - + + + - - - + + + - - + + - - - - - - + + + + + + @@ -147190,7 +147148,7 @@ - + @@ -147388,10 +147346,10 @@ - - - - + + + + @@ -147429,11 +147387,11 @@ - - - - - + + + + + @@ -147467,12 +147425,12 @@ - - - - - - + + + + + + @@ -148646,10 +148604,10 @@ - - - - + + + + @@ -148973,21 +148931,21 @@ - - + + - - + + - - - + + + @@ -148998,12 +148956,12 @@ - - + + - - + + @@ -150981,8 +150939,8 @@ - - + + @@ -151543,8 +151501,8 @@ - - + + @@ -151552,8 +151510,8 @@ - - + + @@ -151692,8 +151650,8 @@ - - + + @@ -151751,47 +151709,47 @@ - - + + - + - - - - + + + + - - + + - - + + - - - - + + + + - - - - + + + + - - + + - - - - + + + + @@ -151902,8 +151860,8 @@ - - + + @@ -151923,12 +151881,12 @@ - - - - - - + + + + + + @@ -151978,9 +151936,9 @@ - - - + + + diff --git a/android/abi_gki_aarch64_vivo b/android/abi_gki_aarch64_vivo index 9143afa6c2dc..13c6c8d26e23 100644 --- a/android/abi_gki_aarch64_vivo +++ b/android/abi_gki_aarch64_vivo @@ -1736,6 +1736,7 @@ __traceiter_android_vh_ftrace_oops_enter __traceiter_android_vh_ftrace_oops_exit __traceiter_android_vh_ftrace_size_check + __traceiter_android_vh_fuse_request_end __traceiter_android_vh_iommu_setup_dma_ops __traceiter_android_vh_ipi_stop __traceiter_android_vh_irqtime_account_process_tick @@ -1747,6 +1748,7 @@ __traceiter_android_vh_mmc_gpio_cd_irqt __traceiter_android_vh_mm_dirty_limits __traceiter_android_vh_printk_hotplug + __traceiter_android_vh_queue_request_and_unlock __traceiter_android_vh_scheduler_tick __traceiter_android_vh_sdhci_get_cd __traceiter_android_vh_sd_update_bus_speed_mode @@ -1849,6 +1851,7 @@ __tracepoint_android_vh_ftrace_oops_enter __tracepoint_android_vh_ftrace_oops_exit __tracepoint_android_vh_ftrace_size_check + __tracepoint_android_vh_fuse_request_end __tracepoint_android_vh_iommu_setup_dma_ops __tracepoint_android_vh_ipi_stop __tracepoint_android_vh_irqtime_account_process_tick @@ -1860,6 +1863,7 @@ __tracepoint_android_vh_mmc_gpio_cd_irqt __tracepoint_android_vh_mm_dirty_limits __tracepoint_android_vh_printk_hotplug + __tracepoint_android_vh_queue_request_and_unlock __tracepoint_android_vh_scheduler_tick __tracepoint_android_vh_sdhci_get_cd __tracepoint_android_vh_sd_update_bus_speed_mode From a5d073d697604f0101b2463509e27cab636da5ff Mon Sep 17 00:00:00 2001 From: Xiaofeng Yuan Date: Tue, 25 Jun 2024 19:51:30 +0800 Subject: [PATCH 1553/1565] ANDROID: GKI: Add pageflags for OEM Add PG_oem_reserved_x pageflags for OEM on 64-bit platform. These new flags can be used to indicate different states of page. For example, we can use these pageflags to identify anonymous pages at different ZRAM compression rates. These flags will be used in conjunction with different Android application states. Therefore, I think this is not a generic modification for all Linux devices and is not suitable for upstream submission. Bug: 336964184 Change-Id: Ie68087248866af63ac516d96e3af85222e7a0b50 Signed-off-by: Xiaofeng Yuan --- include/linux/page-flags.h | 6 ++++++ include/trace/events/mmflags.h | 13 +++++++++++++ 2 files changed, 19 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index dc636f162194..94b7b9d36d06 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -141,6 +141,12 @@ enum pageflags { #endif #ifdef CONFIG_KASAN_HW_TAGS PG_skip_kasan_poison, +#endif +#if defined(CONFIG_64BIT) && !defined(CONFIG_NUMA_BALANCING) + PG_oem_reserved_1, + PG_oem_reserved_2, + PG_oem_reserved_3, + PG_oem_reserved_4, #endif __NR_PAGEFLAGS, diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index a26dbefdf294..76e0a25f9293 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -83,8 +83,17 @@ #ifdef CONFIG_64BIT #define IF_HAVE_PG_ARCH_2(flag,string) ,{1UL << flag, string} + +/* With CONFIG_NUMA_BALANCING LAST_CPUPID_WIDTH consumes OEM-used page flags */ +#ifdef CONFIG_NUMA_BALANCING +#define IF_HAVE_PG_OEM_RESERVED(flag,string) +#else +#define IF_HAVE_PG_OEM_RESERVED(flag,string) ,{1UL << flag, string} +#endif + #else #define IF_HAVE_PG_ARCH_2(flag,string) +#define IF_HAVE_PG_OEM_RESERVED(flag,string) #endif #ifdef CONFIG_KASAN_HW_TAGS @@ -121,6 +130,10 @@ IF_HAVE_PG_HWPOISON(PG_hwpoison, "hwpoison" ) \ IF_HAVE_PG_IDLE(PG_young, "young" ) \ IF_HAVE_PG_IDLE(PG_idle, "idle" ) \ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) \ +IF_HAVE_PG_OEM_RESERVED(PG_oem_reserved_1,"oem_reserved_1") \ +IF_HAVE_PG_OEM_RESERVED(PG_oem_reserved_2,"oem_reserved_2") \ +IF_HAVE_PG_OEM_RESERVED(PG_oem_reserved_3,"oem_reserved_3") \ +IF_HAVE_PG_OEM_RESERVED(PG_oem_reserved_4,"oem_reserved_4") \ IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") #define show_page_flags(flags) \ From 284a6a930d62c110b86d5e144064536027d3b64d Mon Sep 17 00:00:00 2001 From: Xiaofeng Yuan Date: Tue, 25 Jun 2024 19:56:31 +0800 Subject: [PATCH 1554/1565] ANDROID: vendor_hooks: add hooks to modify pageflags These hooks are designed to set or clear OEM reserved pageflags when the memory state may change. Bug: 336964184 Change-Id: I9cb288ef6eef7a719d4f4748d6b71010645b7d50 Signed-off-by: Xiaofeng Yuan --- drivers/android/vendor_hooks.c | 5 +++++ include/trace/hooks/mm.h | 16 ++++++++++++++++ kernel/events/uprobes.c | 4 ++++ mm/memory.c | 6 ++++++ mm/shmem.c | 1 + 5 files changed, 32 insertions(+) diff --git a/drivers/android/vendor_hooks.c b/drivers/android/vendor_hooks.c index 9df83fbfe170..8e48ff9b8984 100644 --- a/drivers/android/vendor_hooks.c +++ b/drivers/android/vendor_hooks.c @@ -543,6 +543,11 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_blk_mq_all_tag_iter); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_blk_mq_queue_tag_busy_iter); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_blk_mq_free_tags); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_blk_mq_sched_insert_request); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_shmem_swapin_page); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_do_wp_page); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_do_swap_page); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_do_anonymous_page); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_uprobes_replace_page); /* * For type visibility */ diff --git a/include/trace/hooks/mm.h b/include/trace/hooks/mm.h index efe0b0e9b974..d502c69069f1 100644 --- a/include/trace/hooks/mm.h +++ b/include/trace/hooks/mm.h @@ -370,6 +370,22 @@ DECLARE_HOOK(android_vh_do_swap_page_spf, DECLARE_HOOK(android_vh_tune_fault_around_bytes, TP_PROTO(unsigned long *fault_around_bytes), TP_ARGS(fault_around_bytes)); +DECLARE_HOOK(android_vh_do_anonymous_page, + TP_PROTO(struct vm_area_struct *vma, struct page *page), + TP_ARGS(vma, page)); +DECLARE_HOOK(android_vh_do_swap_page, + TP_PROTO(struct page *page, pte_t *pte, struct vm_fault *vmf, + swp_entry_t entry), + TP_ARGS(page, pte, vmf, entry)); +DECLARE_HOOK(android_vh_do_wp_page, + TP_PROTO(struct page *page), + TP_ARGS(page)); +DECLARE_HOOK(android_vh_uprobes_replace_page, + TP_PROTO(struct page *new_page, struct page *old_page), + TP_ARGS(new_page, old_page)); +DECLARE_HOOK(android_vh_shmem_swapin_page, + TP_PROTO(struct page *page), + TP_ARGS(page)); #endif /* _TRACE_HOOK_MM_H */ /* This part must be outside protection */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 826a2355da1e..93325a58d68b 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -30,6 +30,9 @@ #include +#undef CREATE_TRACE_POINTS +#include + #define UINSNS_PER_PAGE (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES) #define MAX_UPROBE_XOL_SLOTS UINSNS_PER_PAGE @@ -185,6 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, get_page(new_page); page_add_new_anon_rmap(new_page, vma, addr, false); lru_cache_add_inactive_or_unevictable(new_page, vma); + trace_android_vh_uprobes_replace_page(new_page, old_page); } else /* no new page, just dec_mm_counter for old_page */ dec_mm_counter(mm, MM_ANONPAGES); diff --git a/mm/memory.c b/mm/memory.c index d41ab12a2eba..2aeeef494103 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -78,6 +78,9 @@ #include +#undef CREATE_TRACE_POINTS +#include + #include #include #include @@ -3444,6 +3447,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) * Take out anonymous pages first, anonymous shared vmas are * not dirty accountable. */ + trace_android_vh_do_wp_page(vmf->page); if (PageAnon(vmf->page)) { struct page *page = vmf->page; @@ -3815,6 +3819,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS); pte = mk_pte(page, vmf->vma_page_prot); + trace_android_vh_do_swap_page(page, &pte, vmf, entry); if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) { pte = maybe_mkwrite(pte_mkdirty(pte), vmf->vma_flags); vmf->flags &= ~FAULT_FLAG_WRITE; @@ -3971,6 +3976,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) */ __SetPageUptodate(page); + trace_android_vh_do_anonymous_page(vma, page); entry = mk_pte(page, vmf->vma_page_prot); entry = pte_sw_mkyoung(entry); if (vmf->vma_flags & VM_WRITE) diff --git a/mm/shmem.c b/mm/shmem.c index b812f26e6f8d..17863d315899 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1742,6 +1742,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, /* We have to do this with page locked to prevent races */ lock_page(page); + trace_android_vh_shmem_swapin_page(page); if (!PageSwapCache(page) || page_private(page) != swap.val || !shmem_confirm_swap(mapping, index, swap)) { error = -EEXIST; From d7881f1c8fd2200b00758728b66abff828b38f73 Mon Sep 17 00:00:00 2001 From: Xiaofeng Yuan Date: Thu, 5 Sep 2024 16:09:56 +0800 Subject: [PATCH 1555/1565] ANDROID: GKI: Add symbol to symbol list for vivo. 5 Added functions: [A] 'function int __traceiter_android_vh_do_anonymous_page(void*, vm_area_struct*, page*)' [A] 'function int __traceiter_android_vh_do_swap_page(void*, page*, pte_t*, vm_fault*, swp_entry_t)' [A] 'function int __traceiter_android_vh_do_wp_page(void*, page*)' [A] 'function int __traceiter_android_vh_shmem_swapin_page(void*, page*)' [A] 'function int __traceiter_android_vh_uprobes_replace_page(void*, page*, page*)' 5 Added variables: [A] 'tracepoint __tracepoint_android_vh_do_anonymous_page' [A] 'tracepoint __tracepoint_android_vh_do_swap_page' [A] 'tracepoint __tracepoint_android_vh_do_wp_page' [A] 'tracepoint __tracepoint_android_vh_shmem_swapin_page' [A] 'tracepoint __tracepoint_android_vh_uprobes_replace_page' Bug: 336964184 Change-Id: Ib101654dadf398f6087002fdb45c79ba715aa355 Signed-off-by: Xiaofeng Yuan --- android/abi_gki_aarch64.xml | 250 ++++++++++++++++++++--------------- android/abi_gki_aarch64_vivo | 10 ++ 2 files changed, 151 insertions(+), 109 deletions(-) diff --git a/android/abi_gki_aarch64.xml b/android/abi_gki_aarch64.xml index ffea9cf2133a..fcae78b34eec 100644 --- a/android/abi_gki_aarch64.xml +++ b/android/abi_gki_aarch64.xml @@ -520,12 +520,15 @@ + + + @@ -694,6 +697,7 @@ + @@ -755,6 +759,7 @@ + @@ -6684,12 +6689,15 @@ + + + @@ -6858,6 +6866,7 @@ + @@ -6919,6 +6928,7 @@ + @@ -15321,6 +15331,7 @@ + @@ -37935,6 +37946,12 @@ + + + + + + @@ -44972,23 +44989,7 @@ - - - - - - - - - - - - - - - - - + @@ -49390,7 +49391,6 @@ - @@ -54376,13 +54376,6 @@ - - - - - - - @@ -67075,13 +67068,6 @@ - - - - - - - @@ -70555,7 +70541,26 @@ - + + + + + + + + + + + + + + + + + + + + @@ -91981,7 +91986,6 @@ - @@ -103766,12 +103770,6 @@ - - - - - - @@ -112055,7 +112053,6 @@ - @@ -117157,9 +117154,9 @@ - + - + @@ -120340,6 +120337,12 @@ + + + + + + @@ -120362,6 +120365,14 @@ + + + + + + + + @@ -120378,6 +120389,11 @@ + + + + + @@ -121409,6 +121425,11 @@ + + + + + @@ -121801,6 +121822,12 @@ + + + + + + @@ -122517,12 +122544,15 @@ + + + @@ -122691,6 +122721,7 @@ + @@ -122752,6 +122783,7 @@ + @@ -123286,12 +123318,12 @@ - - - - - - + + + + + + @@ -132744,10 +132776,10 @@ - - - - + + + + @@ -134360,7 +134392,7 @@ - + @@ -137873,11 +137905,11 @@ - - - - - + + + + + @@ -142390,8 +142422,8 @@ - - + + @@ -143010,12 +143042,12 @@ - - - - - - + + + + + + @@ -144880,21 +144912,21 @@ - - - - + + + + - - - + + + - - - - + + + + @@ -148857,11 +148889,11 @@ - - - - - + + + + + @@ -151243,22 +151275,22 @@ - - - - + + + + - - - - + + + + - - - - + + + + @@ -151329,23 +151361,23 @@ - - - - + + + + - - - - + + + + - - - - - + + + + + @@ -151953,7 +151985,7 @@ - + diff --git a/android/abi_gki_aarch64_vivo b/android/abi_gki_aarch64_vivo index 13c6c8d26e23..e89b31bc5c7d 100644 --- a/android/abi_gki_aarch64_vivo +++ b/android/abi_gki_aarch64_vivo @@ -1728,6 +1728,9 @@ __traceiter_android_vh_blk_rq_ctx_init __traceiter_android_vh_cpu_idle_enter __traceiter_android_vh_cpu_idle_exit + __traceiter_android_vh_do_anonymous_page + __traceiter_android_vh_do_swap_page + __traceiter_android_vh_do_wp_page __traceiter_android_vh_dup_task_struct __traceiter_android_vh_filemap_fault_cache_page __traceiter_android_vh_filemap_fault_get_page @@ -1752,6 +1755,7 @@ __traceiter_android_vh_scheduler_tick __traceiter_android_vh_sdhci_get_cd __traceiter_android_vh_sd_update_bus_speed_mode + __traceiter_android_vh_shmem_swapin_page __traceiter_android_vh_show_max_freq __traceiter_android_vh_show_resume_epoch_val __traceiter_android_vh_show_suspend_epoch_val @@ -1764,6 +1768,7 @@ __traceiter_android_vh_ufs_compl_command __traceiter_android_vh_ufs_send_command __traceiter_android_vh_ufs_update_sdev + __traceiter_android_vh_uprobes_replace_page __traceiter_android_vh_vmpressure __traceiter_binder_transaction_received __traceiter_block_bio_complete @@ -1843,6 +1848,9 @@ __tracepoint_android_vh_blk_rq_ctx_init __tracepoint_android_vh_cpu_idle_enter __tracepoint_android_vh_cpu_idle_exit + __tracepoint_android_vh_do_anonymous_page + __tracepoint_android_vh_do_swap_page + __tracepoint_android_vh_do_wp_page __tracepoint_android_vh_dup_task_struct __tracepoint_android_vh_filemap_fault_cache_page __tracepoint_android_vh_filemap_fault_get_page @@ -1867,6 +1875,7 @@ __tracepoint_android_vh_scheduler_tick __tracepoint_android_vh_sdhci_get_cd __tracepoint_android_vh_sd_update_bus_speed_mode + __tracepoint_android_vh_shmem_swapin_page __tracepoint_android_vh_show_max_freq __tracepoint_android_vh_show_resume_epoch_val __tracepoint_android_vh_show_suspend_epoch_val @@ -1879,6 +1888,7 @@ __tracepoint_android_vh_ufs_compl_command __tracepoint_android_vh_ufs_send_command __tracepoint_android_vh_ufs_update_sdev + __tracepoint_android_vh_uprobes_replace_page __tracepoint_android_vh_vmpressure __tracepoint_binder_transaction_received __tracepoint_block_bio_complete From 433a83ab087ad000cea56b209e82d79e0ab80a39 Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Thu, 15 Jul 2021 03:18:03 +0000 Subject: [PATCH 1556/1565] UPSTREAM: binderfs: add support for feature files Provide userspace with a mechanism to discover features supported by the binder driver to refrain from using any unsupported ones in the first place. Starting with "oneway_spam_detection" only new features are to be listed under binderfs and all previous ones are assumed to be supported. Assuming an instance of binderfs has been mounted at /dev/binderfs, binder feature files can be found under /dev/binderfs/features/. Usage example: $ mkdir /dev/binderfs $ mount -t binder binder /dev/binderfs $ cat /dev/binderfs/features/oneway_spam_detection 1 Acked-by: Christian Brauner Signed-off-by: Carlos Llamas Link: https://lore.kernel.org/r/20210715031805.1725878-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman (cherry picked from commit fc470abf54b2bd6e539065e07905e767b443d719) Bug: 191910201 Signed-off-by: Carlos Llamas Change-Id: Ia5c03aa1881981bee26459e741134b83d5b59693 --- drivers/android/binderfs.c | 39 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c index 2e5fbb3adb09..8d4eb491d4a6 100644 --- a/drivers/android/binderfs.c +++ b/drivers/android/binderfs.c @@ -58,6 +58,10 @@ enum binderfs_stats_mode { binderfs_stats_mode_global, }; +struct binder_features { + bool oneway_spam_detection; +}; + static const struct constant_table binderfs_param_stats[] = { { "global", binderfs_stats_mode_global }, {} @@ -69,6 +73,10 @@ static const struct fs_parameter_spec binderfs_fs_parameters[] = { {} }; +static struct binder_features binder_features = { + .oneway_spam_detection = true, +}; + static inline struct binderfs_info *BINDERFS_SB(const struct super_block *sb) { return sb->s_fs_info; @@ -581,6 +589,33 @@ static struct dentry *binderfs_create_dir(struct dentry *parent, return dentry; } +static int binder_features_show(struct seq_file *m, void *unused) +{ + bool *feature = m->private; + + seq_printf(m, "%d\n", *feature); + + return 0; +} +DEFINE_SHOW_ATTRIBUTE(binder_features); + +static int init_binder_features(struct super_block *sb) +{ + struct dentry *dentry, *dir; + + dir = binderfs_create_dir(sb->s_root, "features"); + if (IS_ERR(dir)) + return PTR_ERR(dir); + + dentry = binderfs_create_file(dir, "oneway_spam_detection", + &binder_features_fops, + &binder_features.oneway_spam_detection); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + + return 0; +} + static int init_binder_logs(struct super_block *sb) { struct dentry *binder_logs_root_dir, *dentry, *proc_log_dir; @@ -694,6 +729,10 @@ static int binderfs_fill_super(struct super_block *sb, struct fs_context *fc) name++; } + ret = init_binder_features(sb); + if (ret) + return ret; + if (info->mount_opts.stats_mode == binderfs_stats_mode_global) return init_binder_logs(sb); From fe8ef2d5db97835b1cc8521e07c6acb473243f58 Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Thu, 15 Jul 2021 03:18:04 +0000 Subject: [PATCH 1557/1565] UPSTREAM: docs: binderfs: add section about feature files Document how binder feature files can be used to determine whether a feature is supported by the binder driver. "oneway_spam_detection" is used as an example as it is the first available feature file. Acked-by: Christian Brauner Signed-off-by: Carlos Llamas Link: https://lore.kernel.org/r/20210715031805.1725878-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 06e1721d2a265d1247093f5ad5ae2958ef10a604) Bug: 191910201 Signed-off-by: Carlos Llamas Change-Id: I9c4542e0ee65dd94a492fe0440ba8f1a48d8b797 --- Documentation/admin-guide/binderfs.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/Documentation/admin-guide/binderfs.rst b/Documentation/admin-guide/binderfs.rst index 8243af9b3510..9919879dcba3 100644 --- a/Documentation/admin-guide/binderfs.rst +++ b/Documentation/admin-guide/binderfs.rst @@ -72,3 +72,16 @@ that the `rm() `_ tool can be used to delete them. Note that the ``binder-control`` device cannot be deleted since this would make the binderfs instance unuseable. The ``binder-control`` device will be deleted when the binderfs instance is unmounted and all references to it have been dropped. + +Binder features +--------------- + +Assuming an instance of binderfs has been mounted at ``/dev/binderfs``, the +features supported by the binder driver can be located under +``/dev/binderfs/features/``. The presence of individual files can be tested +to determine whether a particular feature is supported by the driver. + +Example:: + + cat /dev/binderfs/features/oneway_spam_detection + 1 From 014a9ca18fa04fc48a16abfbb9dea6b59dcebd1c Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Thu, 15 Jul 2021 03:18:05 +0000 Subject: [PATCH 1558/1565] UPSTREAM: selftests/binderfs: add test for feature files Verify that feature files are created successfully after mounting a binderfs instance. Note that only "oneway_spam_detection" feature is tested with this patch as it is currently the only feature listed. Acked-by: Christian Brauner Signed-off-by: Carlos Llamas Link: https://lore.kernel.org/r/20210715031805.1725878-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman (cherry picked from commit 07e913418ce4ba5eb620dd4668bf91ec94e11136) Bug: 191910201 Signed-off-by: Carlos Llamas Change-Id: I86d7ef34b3099c8714c319e48029aaf3dbf87081 --- .../filesystems/binderfs/binderfs_test.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c index 477cbb042f5b..0315955ff0f4 100644 --- a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c +++ b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c @@ -62,6 +62,9 @@ static int __do_binderfs_test(struct __test_metadata *_metadata) struct binder_version version = { 0 }; char binderfs_mntpt[] = P_tmpdir "/binderfs_XXXXXX", device_path[sizeof(P_tmpdir "/binderfs_XXXXXX/") + BINDERFS_MAX_NAME]; + static const char * const binder_features[] = { + "oneway_spam_detection", + }; change_mountns(_metadata); @@ -150,6 +153,20 @@ static int __do_binderfs_test(struct __test_metadata *_metadata) } /* success: binder-control device removal failed as expected */ + + for (int i = 0; i < ARRAY_SIZE(binder_features); i++) { + snprintf(device_path, sizeof(device_path), "%s/features/%s", + binderfs_mntpt, binder_features[i]); + fd = open(device_path, O_CLOEXEC | O_RDONLY); + EXPECT_GE(fd, 0) { + TH_LOG("%s - Failed to open binder feature: %s", + strerror(errno), binder_features[i]); + goto umount; + } + close(fd); + } + + /* success: binder feature files found */ result = 0; umount: From 0e10c6560f938a99f20f7ca7e593a1725eabe527 Mon Sep 17 00:00:00 2001 From: Yu-Ting Tseng Date: Tue, 9 Jul 2024 00:00:47 -0700 Subject: [PATCH 1559/1565] BACKPORT: FROMGIT: binder: frozen notification Frozen processes present a significant challenge in binder transactions. When a process is frozen, it cannot, by design, accept and/or respond to binder transactions. As a result, the sender needs to adjust its behavior, such as postponing transactions until the peer process unfreezes. However, there is currently no way to subscribe to these state change events, making it impossible to implement frozen-aware behaviors efficiently. Introduce a binder API for subscribing to frozen state change events. This allows programs to react to changes in peer process state, mitigating issues related to binder transactions sent to frozen processes. Implementation details: For a given binder_ref, the state of frozen notification can be one of the followings: 1. Userspace doesn't want a notification. binder_ref->freeze is null. 2. Userspace wants a notification but none is in flight. list_empty(&binder_ref->freeze->work.entry) = true 3. A notification is in flight and waiting to be read by userspace. binder_ref_freeze.sent is false. 4. A notification was read by userspace and kernel is waiting for an ack. binder_ref_freeze.sent is true. When a notification is in flight, new state change events are coalesced into the existing binder_ref_freeze struct. If userspace hasn't picked up the notification yet, the driver simply rewrites the state. Otherwise, the notification is flagged as requiring a resend, which will be performed once userspace acks the original notification that's inflight. See https://r.android.com/3070045 for how userspace is going to use this feature. Signed-off-by: Yu-Ting Tseng Acked-by: Carlos Llamas Link: https://lore.kernel.org/r/20240709070047.4055369-4-yutingtseng@google.com Signed-off-by: Greg Kroah-Hartman Bug: 363013421 (cherry picked from commit d579b04a52a183db47dfcb7a44304d7747d551e1 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-next) Change-Id: I5dd32abba932ca7d03ae58660143e075ed778b81 [cmllamas: fix merge conflicts due to missing 0567461a7a6e] Signed-off-by: Carlos Llamas --- arch/arm64/tools/gen-hyprel | Bin 0 -> 16512 bytes drivers/android/binder.c | 283 +++++++++++++++++++++++++++- drivers/android/binder_internal.h | 21 ++- include/uapi/linux/android/binder.h | 36 ++++ 4 files changed, 337 insertions(+), 3 deletions(-) create mode 100755 arch/arm64/tools/gen-hyprel diff --git a/arch/arm64/tools/gen-hyprel b/arch/arm64/tools/gen-hyprel new file mode 100755 index 0000000000000000000000000000000000000000..83299628cbfe2637803b5e2f54190ac94f884d43 GIT binary patch literal 16512 zcmeHO4R9OBbzbmCqG*Yr6v?He*1RxcOxYwvilR&@q6|=?bo5XXqaRxxM?OIWKw<&{ zEO3aZG_k2rbc@1toK9?KGMQHOq-iFJrDCnxx9Ap2aBLJehy~!Qq)h4Eia@K%rHXu7qve&LWY<8IBJ}r_STN-XiIQEZGNKz` zD(V%I?5e4%`9j*YP~n)eKHHw5^rC=Xq6DBY&9x)dY@1Yz*`9p*fX<81{bxH8lii%Q zo6~k#N&%5xQ_d&lg#Np=zd~w2N5zmwAqwpdYP&+ZUE`Tj8&vkaYWw~(?G0+X7n+ry zLfWdtf+@H6aoFipYsC8#KFxb|d!^1pyFL$0by~%`Q1Hmc4eLU_woot}A8#A)-q^Nr zgDVzst(Of`T+{}QshztIh>8g@Lo&9f`f~hJT1mg>!ZQz@dg&$mwKu-H>?ePpJoU^M z&9&@t3`1%r-H8k$E0>Q916#TRwEC$w)6|=oj+Zy!-!y+&i4TkN27>nC6 zF%*i#{30^u4~v-X^GBl~QIJR!mZ0H?6^aab?O-G<{Nq7ejE;K8L;&t=L5^vFZWKEQ z`g^xo>s_6l+3I>%SGL}{L0J9!cUkc79}dQBe{}z@ZJ|imzu$W#L?I53M#4HoOF@eQ zX>?X%WL9SXRGp4hQbyypQhX9A;p_1`Uw8<^xJtyeEWRqZh(`T{=D#cT72-bdRQSHh z`80k{$uH^qBFpFW@6|!K(zBfTnOl{dt|2;yT%OL6>sz2y&F07vV76S!kuT0laX-$H z)3wQk#{=CbI0sVaRtWmpsZdpUtZ^%bA`SThxS^a&?%gw17yCO4dPCixhWH!o7*E3LMnxDnd+2}@%_z}{l=Q1eG zPY_R2s`RYnzehYxY0@*2|2FY7wMw6o{5OcFsZIK%!dNX)2SRko=z$ zPt%F?nB@PAc$&JT4@>@Y#M2ZlJt+BF@PP-w%i_7ibGzq&Xa9kH=JB6@38KkNg}u&9 zUT-FQ4g=XSV@`Lyi7lC(Y#DH61e@zcCX%A=squ z05tLB)Qs~_FQH>*; zkfK^Y-JDgKb6LV&;J}^n#4;8m8aD4uZ za89Mcnn|<8oVw^d(WP`-O~*MivE)G)k~;Og^TaKLIrqu}bGpZi=O(jq`_cwP_*v*v zm*1mqEb*dlCRZE*ah~yP=})|5o|{`>I%dsHPdF#HArt+NcZX=EJ=w0F+;GTsa z4DBfpIkcZ$3edv<{R4F@^{8t2YfyX6jzeT7&LLHA;?V?Fd!$urBn!V=!H*I==>+gW z0#+`$9|GVT6}~%*->vXF34iuAG}tGD_*d6M3I zmjkVWC+F#BJZEp9Ad@TJ0>JYa1|M?R_3{$p&9c((lP@In@ef>Syy3 zcvu1Z6>yaXz5rmVnqV^uwn@R76zpaVdsN;7RxF0pb9Oo6@;v5N0d)Y8z6)yE;RY4* zPj4c`Jqi&a#A!yXREQ@eqE8|Inh;-P#43e2DG>t-@gqV+8L?U+?v{uh3bAxC5Q9Jj zT+YT5kkNg%;}`vj=k78Sub7iRy*#)d=bKJz&lI|vSz)ae;;pM_`u%_|4XHU@{R)vy zZ`&*I^Rt!e`1x6snbY@!Cp88Bti*SHo5i z+dTX9Q7FlZek=2S9U&~2at6v7C}*IYfpP}Q87ODq|5yebc>8c5<{kELHkxC`7Jn#U z1;cn1wY3)Sf$%(K#r#9`w#gE%;ZWp=Hw0L7tX8;=Mxs8WUAV>~V;UlRF5gv=ug%7` zNIc{-!V%k`_gMx$G%y0ekZh<{6vYkuZG&EA#bZV!P+*|*TGWnSf)&7XaNbzVAEiD> z->#T##qcK1*xF+djRab5La$x^5nptz5%3~HpV1N;iS;yx;;lxMD|ij;?`ig}tyO^( z_0N9Vw)(B0&mXq$Y(Hf5^cd~qw*>NBw~sfEE8nVi)%geX+adkl*mvUxbeo&+tNFM# z-u24w!0*~`r}DewckOqB^t(PU3(b7mK(8kD+VS&#dFYaUJN}D)Eh~;!yB!Of{XqoXxxQ82YmJ+6|A2>U;?C(O;2wj9lhGWn6^_-FAET#S ztOAOG*E1O#^tYg20bTq?Ci7j;(B(`9kIC{I6%gV0^j_f@uXn6oys+j0$HIF0c7;AF zF@V!4CRho)5Nn_#{A@atIYq#_`WhsnS~DSTUG=e>I@d@ygr?yal;d_`|v!~K=Lb;fj6Z(YlM)tx+fRztebTNYwNlp@zk{d z0VFJX>uNk$$0(O_2Fe*IXP}&cat8jNWq`krn@TG5Q%>cuuJAW%Piwt?oDx4)uO8oVraNPVvnk^Ij-9CHyrNlR6uS=FMLnEqm0$ z$NOP_44!-uOD&dKK06YPS>E`#7>bOJh5T5QfQO<8+9MFOywRxln1!Xo=rIw9dPn`1 zFFrbY3??}omS}@^9@MgS?Dg#Gv-);#xA4jS_CMV1+10-dR@hf!^_g00Zr>}codbJ% zJpo5G53)MiRPcT~31~}0KG;voIYM?-Ze7UJ z%FPdX5+n+EwH1LkN)!O{_bOPvSi~CfhJCb;p??o7eZjC5kNJH9NyWiB5{qena`y!- zP3L)}T@`r(%fjvmPAqMkD7qJdw`t^=%6%DG33g-TZwAR~WFG09>P;Vc3gL|zS~w8+!j-$tWc!biNZ5#jP3 z3!@1|?Wodx${&rw5fR zm0e^xWvT|Uvfw&e@QwzD(0s%u&j^kNjtgf`xNs?r;tDBp_5a63G{40yk8W?WH0XH+ zQ=Z>ywZw9r=tuBRb4s@F*7i&d=yD6sGg!6}m0bIS+MX$`tK}BH-?OYAm0WwC7cn&$ zsA-}79gzP9b5xF>=j}{ew4I*LX6O0r){6?QxUoIY>zRIul8ZtyQkDD9^J|(9QJ2}C zpA(q!a{(Z-Xm`wtKSGV>;B3$HZl+Bd$?-Fe>D|!NoSu1}*E4l%`@;COSWKfrIc0l( zu3*Z~72JMd{~y-&z1kr^r!c*&+3E89~^d_vy{7BeW1Lie!&$&-6WLa_xCtAf@d`vX-jCc1#z*IM<%%?St(K&vCLF zwqt%Vj7i4vA9gFvVQoLgKuwJTeyMI>!?RkpZ_G+mg<_?WOS5BFspN4&JDku1?hhcP qcP|`2-+$@);Mk=hAG%cKkpk!3F2nF7Qq=yDTNHvmHO(yrtoU~%FhP(2 literal 0 HcmV?d00001 diff --git a/drivers/android/binder.c b/drivers/android/binder.c index b8c68c3c9eb4..219e7883f399 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -1414,6 +1414,7 @@ static void binder_free_ref(struct binder_ref *ref) if (ref->node) binder_free_node(ref->node); kfree(ref->death); + kfree(ref->freeze); kfree(ref); } @@ -3826,6 +3827,155 @@ static void binder_transaction(struct binder_proc *proc, } } +static int +binder_request_freeze_notification(struct binder_proc *proc, + struct binder_thread *thread, + struct binder_handle_cookie *handle_cookie) +{ + struct binder_ref_freeze *freeze; + struct binder_ref *ref; + bool is_frozen; + + freeze = kzalloc(sizeof(*freeze), GFP_KERNEL); + if (!freeze) + return -ENOMEM; + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, handle_cookie->handle, false); + if (!ref) { + binder_user_error("%d:%d BC_REQUEST_FREEZE_NOTIFICATION invalid ref %d\n", + proc->pid, thread->pid, handle_cookie->handle); + binder_proc_unlock(proc); + kfree(freeze); + return -EINVAL; + } + + binder_node_lock(ref->node); + + if (ref->freeze || !ref->node->proc) { + binder_user_error("%d:%d invalid BC_REQUEST_FREEZE_NOTIFICATION %s\n", + proc->pid, thread->pid, + ref->freeze ? "already set" : "dead node"); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + kfree(freeze); + return -EINVAL; + } + binder_inner_proc_lock(ref->node->proc); + is_frozen = ref->node->proc->is_frozen; + binder_inner_proc_unlock(ref->node->proc); + + binder_stats_created(BINDER_STAT_FREEZE); + INIT_LIST_HEAD(&freeze->work.entry); + freeze->cookie = handle_cookie->cookie; + freeze->work.type = BINDER_WORK_FROZEN_BINDER; + freeze->is_frozen = is_frozen; + + ref->freeze = freeze; + + binder_inner_proc_lock(proc); + binder_enqueue_work_ilocked(&ref->freeze->work, &proc->todo); + binder_wakeup_proc_ilocked(proc); + binder_inner_proc_unlock(proc); + + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + return 0; +} + +static int +binder_clear_freeze_notification(struct binder_proc *proc, + struct binder_thread *thread, + struct binder_handle_cookie *handle_cookie) +{ + struct binder_ref_freeze *freeze; + struct binder_ref *ref; + + binder_proc_lock(proc); + ref = binder_get_ref_olocked(proc, handle_cookie->handle, false); + if (!ref) { + binder_user_error("%d:%d BC_CLEAR_FREEZE_NOTIFICATION invalid ref %d\n", + proc->pid, thread->pid, handle_cookie->handle); + binder_proc_unlock(proc); + return -EINVAL; + } + + binder_node_lock(ref->node); + + if (!ref->freeze) { + binder_user_error("%d:%d BC_CLEAR_FREEZE_NOTIFICATION freeze notification not active\n", + proc->pid, thread->pid); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + return -EINVAL; + } + freeze = ref->freeze; + binder_inner_proc_lock(proc); + if (freeze->cookie != handle_cookie->cookie) { + binder_user_error("%d:%d BC_CLEAR_FREEZE_NOTIFICATION freeze notification cookie mismatch %016llx != %016llx\n", + proc->pid, thread->pid, (u64)freeze->cookie, + (u64)handle_cookie->cookie); + binder_inner_proc_unlock(proc); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + return -EINVAL; + } + ref->freeze = NULL; + /* + * Take the existing freeze object and overwrite its work type. There are three cases here: + * 1. No pending notification. In this case just add the work to the queue. + * 2. A notification was sent and is pending an ack from userspace. Once an ack arrives, we + * should resend with the new work type. + * 3. A notification is pending to be sent. Since the work is already in the queue, nothing + * needs to be done here. + */ + freeze->work.type = BINDER_WORK_CLEAR_FREEZE_NOTIFICATION; + if (list_empty(&freeze->work.entry)) { + binder_enqueue_work_ilocked(&freeze->work, &proc->todo); + binder_wakeup_proc_ilocked(proc); + } else if (freeze->sent) { + freeze->resend = true; + } + binder_inner_proc_unlock(proc); + binder_node_unlock(ref->node); + binder_proc_unlock(proc); + return 0; +} + +static int +binder_freeze_notification_done(struct binder_proc *proc, + struct binder_thread *thread, + binder_uintptr_t cookie) +{ + struct binder_ref_freeze *freeze = NULL; + struct binder_work *w; + + binder_inner_proc_lock(proc); + list_for_each_entry(w, &proc->delivered_freeze, entry) { + struct binder_ref_freeze *tmp_freeze = + container_of(w, struct binder_ref_freeze, work); + + if (tmp_freeze->cookie == cookie) { + freeze = tmp_freeze; + break; + } + } + if (!freeze) { + binder_user_error("%d:%d BC_FREEZE_NOTIFICATION_DONE %016llx not found\n", + proc->pid, thread->pid, (u64)cookie); + binder_inner_proc_unlock(proc); + return -EINVAL; + } + binder_dequeue_work_ilocked(&freeze->work); + freeze->sent = false; + if (freeze->resend) { + freeze->resend = false; + binder_enqueue_work_ilocked(&freeze->work, &proc->todo); + binder_wakeup_proc_ilocked(proc); + } + binder_inner_proc_unlock(proc); + return 0; +} + /** * binder_free_buf() - free the specified buffer * @proc: binder proc that owns buffer @@ -4311,6 +4461,44 @@ static int binder_thread_write(struct binder_proc *proc, binder_inner_proc_unlock(proc); } break; + case BC_REQUEST_FREEZE_NOTIFICATION: { + struct binder_handle_cookie handle_cookie; + int error; + + if (copy_from_user(&handle_cookie, ptr, sizeof(handle_cookie))) + return -EFAULT; + ptr += sizeof(handle_cookie); + error = binder_request_freeze_notification(proc, thread, + &handle_cookie); + if (error) + return error; + } break; + + case BC_CLEAR_FREEZE_NOTIFICATION: { + struct binder_handle_cookie handle_cookie; + int error; + + if (copy_from_user(&handle_cookie, ptr, sizeof(handle_cookie))) + return -EFAULT; + ptr += sizeof(handle_cookie); + error = binder_clear_freeze_notification(proc, thread, &handle_cookie); + if (error) + return error; + } break; + + case BC_FREEZE_NOTIFICATION_DONE: { + binder_uintptr_t cookie; + int error; + + if (get_user(cookie, (binder_uintptr_t __user *)ptr)) + return -EFAULT; + + ptr += sizeof(cookie); + error = binder_freeze_notification_done(proc, thread, cookie); + if (error) + return error; + } break; + default: pr_err("%d:%d unknown command %d\n", proc->pid, thread->pid, cmd); @@ -4713,6 +4901,46 @@ static int binder_thread_read(struct binder_proc *proc, if (cmd == BR_DEAD_BINDER) goto done; /* DEAD_BINDER notifications can cause transactions */ } break; + + case BINDER_WORK_FROZEN_BINDER: { + struct binder_ref_freeze *freeze; + struct binder_frozen_state_info info; + + memset(&info, 0, sizeof(info)); + freeze = container_of(w, struct binder_ref_freeze, work); + info.is_frozen = freeze->is_frozen; + info.cookie = freeze->cookie; + freeze->sent = true; + binder_enqueue_work_ilocked(w, &proc->delivered_freeze); + binder_inner_proc_unlock(proc); + + if (put_user(BR_FROZEN_BINDER, (uint32_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(uint32_t); + if (copy_to_user(ptr, &info, sizeof(info))) + return -EFAULT; + ptr += sizeof(info); + binder_stat_br(proc, thread, BR_FROZEN_BINDER); + goto done; /* BR_FROZEN_BINDER notifications can cause transactions */ + } break; + + case BINDER_WORK_CLEAR_FREEZE_NOTIFICATION: { + struct binder_ref_freeze *freeze = + container_of(w, struct binder_ref_freeze, work); + binder_uintptr_t cookie = freeze->cookie; + + binder_inner_proc_unlock(proc); + kfree(freeze); + binder_stats_deleted(BINDER_STAT_FREEZE); + if (put_user(BR_CLEAR_FREEZE_NOTIFICATION_DONE, (uint32_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(uint32_t); + if (put_user(cookie, (binder_uintptr_t __user *)ptr)) + return -EFAULT; + ptr += sizeof(binder_uintptr_t); + binder_stat_br(proc, thread, BR_CLEAR_FREEZE_NOTIFICATION_DONE); + } break; + default: binder_inner_proc_unlock(proc); pr_err("%d:%d: bad work type %d\n", @@ -5332,6 +5560,48 @@ static bool binder_txns_pending_ilocked(struct binder_proc *proc) return false; } +static void binder_add_freeze_work(struct binder_proc *proc, bool is_frozen) +{ + struct rb_node *n; + struct binder_ref *ref; + + binder_inner_proc_lock(proc); + for (n = rb_first(&proc->nodes); n; n = rb_next(n)) { + struct binder_node *node; + + node = rb_entry(n, struct binder_node, rb_node); + binder_inner_proc_unlock(proc); + binder_node_lock(node); + hlist_for_each_entry(ref, &node->refs, node_entry) { + /* + * Need the node lock to synchronize + * with new notification requests and the + * inner lock to synchronize with queued + * freeze notifications. + */ + binder_inner_proc_lock(ref->proc); + if (!ref->freeze) { + binder_inner_proc_unlock(ref->proc); + continue; + } + ref->freeze->work.type = BINDER_WORK_FROZEN_BINDER; + if (list_empty(&ref->freeze->work.entry)) { + ref->freeze->is_frozen = is_frozen; + binder_enqueue_work_ilocked(&ref->freeze->work, &ref->proc->todo); + binder_wakeup_proc_ilocked(ref->proc); + } else { + if (ref->freeze->sent && ref->freeze->is_frozen != is_frozen) + ref->freeze->resend = true; + ref->freeze->is_frozen = is_frozen; + } + binder_inner_proc_unlock(ref->proc); + } + binder_node_unlock(node); + binder_inner_proc_lock(proc); + } + binder_inner_proc_unlock(proc); +} + static int binder_ioctl_freeze(struct binder_freeze_info *info, struct binder_proc *target_proc) { @@ -5343,6 +5613,7 @@ static int binder_ioctl_freeze(struct binder_freeze_info *info, target_proc->async_recv = false; target_proc->is_frozen = false; binder_inner_proc_unlock(target_proc); + binder_add_freeze_work(target_proc, false); return 0; } @@ -5375,6 +5646,8 @@ static int binder_ioctl_freeze(struct binder_freeze_info *info, binder_inner_proc_lock(target_proc); target_proc->is_frozen = false; binder_inner_proc_unlock(target_proc); + } else { + binder_add_freeze_work(target_proc, true); } return ret; @@ -5742,6 +6015,7 @@ static int binder_open(struct inode *nodp, struct file *filp) binder_stats_created(BINDER_STAT_PROC); proc->pid = current->group_leader->pid; INIT_LIST_HEAD(&proc->delivered_death); + INIT_LIST_HEAD(&proc->delivered_freeze); INIT_LIST_HEAD(&proc->waiting_threads); filp->private_data = proc; @@ -6296,6 +6570,9 @@ static const char * const binder_return_strings[] = { "BR_FAILED_REPLY", "BR_FROZEN_REPLY", "BR_ONEWAY_SPAM_SUSPECT", + "UNSUPPORTED", + "BR_FROZEN_BINDER", + "BR_CLEAR_FREEZE_NOTIFICATION_DONE", }; static const char * const binder_command_strings[] = { @@ -6318,6 +6595,9 @@ static const char * const binder_command_strings[] = { "BC_DEAD_BINDER_DONE", "BC_TRANSACTION_SG", "BC_REPLY_SG", + "BC_REQUEST_FREEZE_NOTIFICATION", + "BC_CLEAR_FREEZE_NOTIFICATION", + "BC_FREEZE_NOTIFICATION_DONE", }; static const char * const binder_objstat_strings[] = { @@ -6327,7 +6607,8 @@ static const char * const binder_objstat_strings[] = { "ref", "death", "transaction", - "transaction_complete" + "transaction_complete", + "freeze", }; static void print_binder_stats(struct seq_file *m, const char *prefix, diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h index 00010c2e3e3c..b452fcba0344 100644 --- a/drivers/android/binder_internal.h +++ b/drivers/android/binder_internal.h @@ -129,12 +129,13 @@ enum binder_stat_types { BINDER_STAT_DEATH, BINDER_STAT_TRANSACTION, BINDER_STAT_TRANSACTION_COMPLETE, + BINDER_STAT_FREEZE, BINDER_STAT_COUNT }; struct binder_stats { - atomic_t br[_IOC_NR(BR_ONEWAY_SPAM_SUSPECT) + 1]; - atomic_t bc[_IOC_NR(BC_REPLY_SG) + 1]; + atomic_t br[_IOC_NR(BR_CLEAR_FREEZE_NOTIFICATION_DONE) + 1]; + atomic_t bc[_IOC_NR(BC_FREEZE_NOTIFICATION_DONE) + 1]; atomic_t obj_created[BINDER_STAT_COUNT]; atomic_t obj_deleted[BINDER_STAT_COUNT]; }; @@ -158,6 +159,8 @@ struct binder_work { BINDER_WORK_DEAD_BINDER, BINDER_WORK_DEAD_BINDER_AND_CLEAR, BINDER_WORK_CLEAR_DEATH_NOTIFICATION, + BINDER_WORK_FROZEN_BINDER, + BINDER_WORK_CLEAR_FREEZE_NOTIFICATION, } type; }; @@ -279,6 +282,14 @@ struct binder_ref_death { binder_uintptr_t cookie; }; +struct binder_ref_freeze { + struct binder_work work; + binder_uintptr_t cookie; + bool is_frozen:1; + bool sent:1; + bool resend:1; +}; + /** * struct binder_ref_data - binder_ref counts and id * @debug_id: unique ID for the ref @@ -311,6 +322,8 @@ struct binder_ref_data { * @node indicates the node must be freed * @death: pointer to death notification (ref_death) if requested * (protected by @node->lock) + * @freeze: pointer to freeze notification (ref_freeze) if requested + * (protected by @node->lock) * * Structure to track references from procA to target node (on procB). This * structure is unsafe to access without holding @proc->outer_lock. @@ -327,6 +340,7 @@ struct binder_ref { struct binder_proc *proc; struct binder_node *node; struct binder_ref_death *death; + struct binder_ref_freeze *freeze; }; /** @@ -391,6 +405,8 @@ struct binder_priority { * (atomics, no lock needed) * @delivered_death: list of delivered death notification * (protected by @inner_lock) + * @delivered_freeze: list of delivered freeze notification + * (protected by @inner_lock) * @max_threads: cap on number of binder threads * (protected by @inner_lock) * @requested_threads: number of binder threads requested but not @@ -437,6 +453,7 @@ struct binder_proc { struct list_head todo; struct binder_stats stats; struct list_head delivered_death; + struct list_head delivered_freeze; int max_threads; int requested_threads; int requested_threads_started; diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index 7badb670f24f..cd973628dc2c 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -284,6 +284,12 @@ struct binder_frozen_status_info { __u32 async_recv; }; +struct binder_frozen_state_info { + binder_uintptr_t cookie; + __u32 is_frozen; + __u32 reserved; +}; + #define BINDER_WRITE_READ _IOWR('b', 1, struct binder_write_read) #define BINDER_SET_IDLE_TIMEOUT _IOW('b', 3, __s64) #define BINDER_SET_MAX_THREADS _IOW('b', 5, __u32) @@ -492,6 +498,17 @@ enum binder_driver_return_protocol { * asynchronous transaction makes the allocated async buffer size exceed * detection threshold. No parameters. */ + + BR_FROZEN_BINDER = _IOR('r', 21, struct binder_frozen_state_info), + /* + * The cookie and a boolean (is_frozen) that indicates whether the process + * transitioned into a frozen or an unfrozen state. + */ + + BR_CLEAR_FREEZE_NOTIFICATION_DONE = _IOR('r', 22, binder_uintptr_t), + /* + * void *: cookie + */ }; enum binder_driver_command_protocol { @@ -575,6 +592,25 @@ enum binder_driver_command_protocol { /* * binder_transaction_data_sg: the sent command. */ + + BC_REQUEST_FREEZE_NOTIFICATION = + _IOW('c', 19, struct binder_handle_cookie), + /* + * int: handle + * void *: cookie + */ + + BC_CLEAR_FREEZE_NOTIFICATION = _IOW('c', 20, + struct binder_handle_cookie), + /* + * int: handle + * void *: cookie + */ + + BC_FREEZE_NOTIFICATION_DONE = _IOW('c', 21, binder_uintptr_t), + /* + * void *: cookie + */ }; #endif /* _UAPI_LINUX_BINDER_H */ From 69d87eed075cc112021b2e111ba27a4bcd1c61f3 Mon Sep 17 00:00:00 2001 From: Yu-Ting Tseng Date: Tue, 9 Jul 2024 00:00:49 -0700 Subject: [PATCH 1560/1565] BACKPORT: FROMGIT: binder: frozen notification binder_features flag Add a flag to binder_features to indicate that the freeze notification feature is available. Signed-off-by: Yu-Ting Tseng Acked-by: Carlos Llamas Link: https://lore.kernel.org/r/20240709070047.4055369-6-yutingtseng@google.com Signed-off-by: Greg Kroah-Hartman Bug: 363013421 (cherry picked from commit 30b968b002a92870325a5c9d1ce78eba0ce386e7 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-next) Change-Id: Ic26c8ae42d27c6fd8f5daed5eecabd1652e29502 [cmllamas: fix trivial conflicts due to missing extended_error] Signed-off-by: Carlos Llamas --- drivers/android/binderfs.c | 8 ++++++++ .../selftests/filesystems/binderfs/binderfs_test.c | 1 + 2 files changed, 9 insertions(+) diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c index 8d4eb491d4a6..b9a60982aba3 100644 --- a/drivers/android/binderfs.c +++ b/drivers/android/binderfs.c @@ -60,6 +60,7 @@ enum binderfs_stats_mode { struct binder_features { bool oneway_spam_detection; + bool freeze_notification; }; static const struct constant_table binderfs_param_stats[] = { @@ -75,6 +76,7 @@ static const struct fs_parameter_spec binderfs_fs_parameters[] = { static struct binder_features binder_features = { .oneway_spam_detection = true, + .freeze_notification = true, }; static inline struct binderfs_info *BINDERFS_SB(const struct super_block *sb) @@ -613,6 +615,12 @@ static int init_binder_features(struct super_block *sb) if (IS_ERR(dentry)) return PTR_ERR(dentry); + dentry = binderfs_create_file(dir, "freeze_notification", + &binder_features_fops, + &binder_features.freeze_notification); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + return 0; } diff --git a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c index 0315955ff0f4..efc30a67a40f 100644 --- a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c +++ b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c @@ -64,6 +64,7 @@ static int __do_binderfs_test(struct __test_metadata *_metadata) device_path[sizeof(P_tmpdir "/binderfs_XXXXXX/") + BINDERFS_MAX_NAME]; static const char * const binder_features[] = { "oneway_spam_detection", + "freeze_notification", }; change_mountns(_metadata); From 587d04a0706d2bbd80b44197bca201a6515b076c Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Thu, 29 Aug 2024 18:18:15 +0000 Subject: [PATCH 1561/1565] ANDROID: binder: fix KMI issues due to frozen notification The patches to support binder's frozen notification feature break the KMI. This change fixes such issues by (1) moving proc->delivered_freeze into the existing proc_wrapper struction, (2) dropping the frozen stats support and (3) amending the STG due to a harmless enum binder_work_type addition. These are the reported KMI issues fixed by this patch: function symbol 'int __traceiter_binder_transaction_received(void*, struct binder_transaction*)' changed CRC changed from 0x74e9c98b to 0xfe0f8640 type 'struct binder_proc' changed byte size changed from 584 to 632 member 'struct list_head delivered_death' changed offset changed by 256 member 'struct list_head delivered_freeze' was added 13 members ('u32 max_threads' .. 'u64 android_oem_data1') changed offset changed by 384 type 'struct binder_thread' changed byte size changed from 464 to 496 2 members ('atomic_t tmp_ref' .. 'bool is_dead') changed offset changed by 224 4 members ('struct task_struct* task' .. 'enum binder_prio_state prio_state') changed offset changed by 256 type 'struct binder_stats' changed byte size changed from 216 to 244 member changed from 'atomic_t br[21]' to 'atomic_t br[23]' type changed from 'atomic_t[21]' to 'atomic_t[23]' number of elements changed from 21 to 23 member changed from 'atomic_t bc[19]' to 'atomic_t bc[22]' offset changed from 672 to 736 type changed from 'atomic_t[19]' to 'atomic_t[22]' number of elements changed from 19 to 22 member changed from 'atomic_t obj_created[7]' to 'atomic_t obj_created[8]' offset changed from 1280 to 1440 type changed from 'atomic_t[7]' to 'atomic_t[8]' number of elements changed from 7 to 8 member changed from 'atomic_t obj_deleted[7]' to 'atomic_t obj_deleted[8]' offset changed from 1504 to 1696 type changed from 'atomic_t[7]' to 'atomic_t[8]' number of elements changed from 7 to 8 type 'enum binder_work_type' changed enumerator 'BINDER_WORK_FROZEN_BINDER' (10) was added enumerator 'BINDER_WORK_CLEAR_FREEZE_NOTIFICATION' (11) was added Bug: 363013421 Change-Id: If9f1f14a2eda215a4c9cb0823c50c8e0e8079ef1 Signed-off-by: Carlos Llamas --- drivers/android/binder.c | 15 +++------------ drivers/android/binder_internal.h | 19 +++++++++++++------ 2 files changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 219e7883f399..c5ff564d51a4 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -3864,7 +3864,6 @@ binder_request_freeze_notification(struct binder_proc *proc, is_frozen = ref->node->proc->is_frozen; binder_inner_proc_unlock(ref->node->proc); - binder_stats_created(BINDER_STAT_FREEZE); INIT_LIST_HEAD(&freeze->work.entry); freeze->cookie = handle_cookie->cookie; freeze->work.type = BINDER_WORK_FROZEN_BINDER; @@ -3950,7 +3949,7 @@ binder_freeze_notification_done(struct binder_proc *proc, struct binder_work *w; binder_inner_proc_lock(proc); - list_for_each_entry(w, &proc->delivered_freeze, entry) { + list_for_each_entry(w, &proc_wrapper(proc)->delivered_freeze, entry) { struct binder_ref_freeze *tmp_freeze = container_of(w, struct binder_ref_freeze, work); @@ -4911,7 +4910,7 @@ static int binder_thread_read(struct binder_proc *proc, info.is_frozen = freeze->is_frozen; info.cookie = freeze->cookie; freeze->sent = true; - binder_enqueue_work_ilocked(w, &proc->delivered_freeze); + binder_enqueue_work_ilocked(w, &proc_wrapper(proc)->delivered_freeze); binder_inner_proc_unlock(proc); if (put_user(BR_FROZEN_BINDER, (uint32_t __user *)ptr)) @@ -4931,7 +4930,6 @@ static int binder_thread_read(struct binder_proc *proc, binder_inner_proc_unlock(proc); kfree(freeze); - binder_stats_deleted(BINDER_STAT_FREEZE); if (put_user(BR_CLEAR_FREEZE_NOTIFICATION_DONE, (uint32_t __user *)ptr)) return -EFAULT; ptr += sizeof(uint32_t); @@ -6015,7 +6013,7 @@ static int binder_open(struct inode *nodp, struct file *filp) binder_stats_created(BINDER_STAT_PROC); proc->pid = current->group_leader->pid; INIT_LIST_HEAD(&proc->delivered_death); - INIT_LIST_HEAD(&proc->delivered_freeze); + INIT_LIST_HEAD(&proc_wrapper(proc)->delivered_freeze); INIT_LIST_HEAD(&proc->waiting_threads); filp->private_data = proc; @@ -6570,9 +6568,6 @@ static const char * const binder_return_strings[] = { "BR_FAILED_REPLY", "BR_FROZEN_REPLY", "BR_ONEWAY_SPAM_SUSPECT", - "UNSUPPORTED", - "BR_FROZEN_BINDER", - "BR_CLEAR_FREEZE_NOTIFICATION_DONE", }; static const char * const binder_command_strings[] = { @@ -6595,9 +6590,6 @@ static const char * const binder_command_strings[] = { "BC_DEAD_BINDER_DONE", "BC_TRANSACTION_SG", "BC_REPLY_SG", - "BC_REQUEST_FREEZE_NOTIFICATION", - "BC_CLEAR_FREEZE_NOTIFICATION", - "BC_FREEZE_NOTIFICATION_DONE", }; static const char * const binder_objstat_strings[] = { @@ -6608,7 +6600,6 @@ static const char * const binder_objstat_strings[] = { "death", "transaction", "transaction_complete", - "freeze", }; static void print_binder_stats(struct seq_file *m, const char *prefix, diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h index b452fcba0344..03bad17942df 100644 --- a/drivers/android/binder_internal.h +++ b/drivers/android/binder_internal.h @@ -129,13 +129,12 @@ enum binder_stat_types { BINDER_STAT_DEATH, BINDER_STAT_TRANSACTION, BINDER_STAT_TRANSACTION_COMPLETE, - BINDER_STAT_FREEZE, BINDER_STAT_COUNT }; struct binder_stats { - atomic_t br[_IOC_NR(BR_CLEAR_FREEZE_NOTIFICATION_DONE) + 1]; - atomic_t bc[_IOC_NR(BC_FREEZE_NOTIFICATION_DONE) + 1]; + atomic_t br[_IOC_NR(BR_ONEWAY_SPAM_SUSPECT) + 1]; + atomic_t bc[_IOC_NR(BC_REPLY_SG) + 1]; atomic_t obj_created[BINDER_STAT_COUNT]; atomic_t obj_deleted[BINDER_STAT_COUNT]; }; @@ -159,8 +158,10 @@ struct binder_work { BINDER_WORK_DEAD_BINDER, BINDER_WORK_DEAD_BINDER_AND_CLEAR, BINDER_WORK_CLEAR_DEATH_NOTIFICATION, +#ifndef __GENKSYMS__ BINDER_WORK_FROZEN_BINDER, BINDER_WORK_CLEAR_FREEZE_NOTIFICATION, +#endif } type; }; @@ -405,8 +406,6 @@ struct binder_priority { * (atomics, no lock needed) * @delivered_death: list of delivered death notification * (protected by @inner_lock) - * @delivered_freeze: list of delivered freeze notification - * (protected by @inner_lock) * @max_threads: cap on number of binder threads * (protected by @inner_lock) * @requested_threads: number of binder threads requested but not @@ -453,7 +452,6 @@ struct binder_proc { struct list_head todo; struct binder_stats stats; struct list_head delivered_death; - struct list_head delivered_freeze; int max_threads; int requested_threads; int requested_threads_started; @@ -474,6 +472,8 @@ struct binder_proc { * @cred struct cred associated with the `struct file` * in binder_open() * (invariant after initialized) + * @delivered_freeze: list of delivered freeze notification + * (protected by @inner_lock) * * Extended binder_proc -- needed to add the "cred" field without * changing the KMI for binder_proc. @@ -481,6 +481,7 @@ struct binder_proc { struct binder_proc_ext { struct binder_proc proc; const struct cred *cred; + struct list_head delivered_freeze; }; static inline const struct cred *binder_get_cred(struct binder_proc *proc) @@ -491,6 +492,12 @@ static inline const struct cred *binder_get_cred(struct binder_proc *proc) return eproc->cred; } +static inline +struct binder_proc_ext *proc_wrapper(struct binder_proc *proc) +{ + return container_of(proc, struct binder_proc_ext, proc); +} + /** * struct binder_thread - binder thread bookkeeping * @proc: binder process for this thread From 8e78d8ae8aeba310f723e8132a85419c7cfc4793 Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Sat, 7 Sep 2024 01:47:39 +0000 Subject: [PATCH 1562/1565] ANDROID: fix ENOMEM check of binder_proc_ext The check should be done against 'eproc' before it gets dereferenced. Fixes: d49297739550 ("BACKPORT: binder: use euid from cred instead of using task") Change-Id: Ief0c08212c4da8bdfdf628474de9dd30ee5a8db0 Signed-off-by: Carlos Llamas --- drivers/android/binder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index c5ff564d51a4..b540a477c5f8 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -5979,9 +5979,9 @@ static int binder_open(struct inode *nodp, struct file *filp) current->group_leader->pid, current->pid); eproc = kzalloc(sizeof(*eproc), GFP_KERNEL); - proc = &eproc->proc; - if (proc == NULL) + if (eproc == NULL) return -ENOMEM; + proc = &eproc->proc; spin_lock_init(&proc->inner_lock); spin_lock_init(&proc->outer_lock); get_task_struct(current->group_leader); From 2c00661c3fd09957c17ac18651c4cda4990dd174 Mon Sep 17 00:00:00 2001 From: zhujingpeng Date: Wed, 10 Jul 2024 11:22:28 +0800 Subject: [PATCH 1563/1565] ANDROID: GKI: Add initialization for mutex oem_data. Although __mutex_init() already contains a hook, but this function may be called before the mutex_init hook is registered, causing mutex's oem_data to be uninitialized and causing unpredictable errors. Bug: 352181884 Change-Id: I04378d6668fb4e7b93c11d930ac46aae484fc835 Signed-off-by: zhujingpeng (cherry picked from commit 96d66062d0767aeafb690ce014ec91785820d62b) (cherry picked from commit 4c213b2ea1f58bc640894684f65c92b6cdee522d) --- kernel/locking/mutex.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 943f6c3a7c67..6ae1244025de 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -47,6 +47,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key) #ifdef CONFIG_MUTEX_SPIN_ON_OWNER osq_lock_init(&lock->osq); #endif + android_init_oem_data(lock, 1); debug_mutex_init(lock, name, key); } From 0c025265d857ea3159aecc2c7820fb4158e12d40 Mon Sep 17 00:00:00 2001 From: zhujingpeng Date: Thu, 4 Jul 2024 22:08:43 +0800 Subject: [PATCH 1564/1565] ANDROID: GKI: Add initialization for rwsem's oem_data and vendor_data. Add initialization for rwsem's oem_data and vendor_data. The __init_rwsem() already contains a hook, but this function may be called before the rwsem_init hook is registered, causing some rwsem's oem_data to be uninitialized and causing unpredictable errors Bug: 351133539 Change-Id: I7bbb83894d200102bc7d84e91678f164529097a0 Signed-off-by: zhujingpeng (cherry picked from commit aaca6b10f1a352dec4596548396f590500f2001b) (cherry picked from commit 1a5cffcbb543099802723c420f2f538b12619a72) --- kernel/locking/rwsem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c index acd1fc78d059..3527850a60a1 100644 --- a/kernel/locking/rwsem.c +++ b/kernel/locking/rwsem.c @@ -348,6 +348,8 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name, #ifdef CONFIG_RWSEM_SPIN_ON_OWNER osq_lock_init(&sem->osq); #endif + android_init_vendor_data(sem, 1); + android_init_oem_data(sem, 1); trace_android_vh_rwsem_init(sem); } EXPORT_SYMBOL(__init_rwsem); From 5b9bc4b19816d7a5c3af9202497105313b70a6a5 Mon Sep 17 00:00:00 2001 From: Carlos Llamas Date: Mon, 9 Sep 2024 22:11:51 +0000 Subject: [PATCH 1565/1565] ANDROID: delete tool added by mistake Remove the gen-hyprel binary added accidentally while backporting a patch into an older branch. The tool gets generated in newer builds and wasn't part of the gitignore file here. Fixes: 0e10c6560f93 ("BACKPORT: FROMGIT: binder: frozen notification") Change-Id: I103358cb2ca9c5fb934f047033e44c04fe85298d Signed-off-by: Carlos Llamas --- arch/arm64/tools/gen-hyprel | Bin 16512 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100755 arch/arm64/tools/gen-hyprel diff --git a/arch/arm64/tools/gen-hyprel b/arch/arm64/tools/gen-hyprel deleted file mode 100755 index 83299628cbfe2637803b5e2f54190ac94f884d43..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 16512 zcmeHO4R9OBbzbmCqG*Yr6v?He*1RxcOxYwvilR&@q6|=?bo5XXqaRxxM?OIWKw<&{ zEO3aZG_k2rbc@1toK9?KGMQHOq-iFJrDCnxx9Ap2aBLJehy~!Qq)h4Eia@K%rHXu7qve&LWY<8IBJ}r_STN-XiIQEZGNKz` zD(V%I?5e4%`9j*YP~n)eKHHw5^rC=Xq6DBY&9x)dY@1Yz*`9p*fX<81{bxH8lii%Q zo6~k#N&%5xQ_d&lg#Np=zd~w2N5zmwAqwpdYP&+ZUE`Tj8&vkaYWw~(?G0+X7n+ry zLfWdtf+@H6aoFipYsC8#KFxb|d!^1pyFL$0by~%`Q1Hmc4eLU_woot}A8#A)-q^Nr zgDVzst(Of`T+{}QshztIh>8g@Lo&9f`f~hJT1mg>!ZQz@dg&$mwKu-H>?ePpJoU^M z&9&@t3`1%r-H8k$E0>Q916#TRwEC$w)6|=oj+Zy!-!y+&i4TkN27>nC6 zF%*i#{30^u4~v-X^GBl~QIJR!mZ0H?6^aab?O-G<{Nq7ejE;K8L;&t=L5^vFZWKEQ z`g^xo>s_6l+3I>%SGL}{L0J9!cUkc79}dQBe{}z@ZJ|imzu$W#L?I53M#4HoOF@eQ zX>?X%WL9SXRGp4hQbyypQhX9A;p_1`Uw8<^xJtyeEWRqZh(`T{=D#cT72-bdRQSHh z`80k{$uH^qBFpFW@6|!K(zBfTnOl{dt|2;yT%OL6>sz2y&F07vV76S!kuT0laX-$H z)3wQk#{=CbI0sVaRtWmpsZdpUtZ^%bA`SThxS^a&?%gw17yCO4dPCixhWH!o7*E3LMnxDnd+2}@%_z}{l=Q1eG zPY_R2s`RYnzehYxY0@*2|2FY7wMw6o{5OcFsZIK%!dNX)2SRko=z$ zPt%F?nB@PAc$&JT4@>@Y#M2ZlJt+BF@PP-w%i_7ibGzq&Xa9kH=JB6@38KkNg}u&9 zUT-FQ4g=XSV@`Lyi7lC(Y#DH61e@zcCX%A=squ z05tLB)Qs~_FQH>*; zkfK^Y-JDgKb6LV&;J}^n#4;8m8aD4uZ za89Mcnn|<8oVw^d(WP`-O~*MivE)G)k~;Og^TaKLIrqu}bGpZi=O(jq`_cwP_*v*v zm*1mqEb*dlCRZE*ah~yP=})|5o|{`>I%dsHPdF#HArt+NcZX=EJ=w0F+;GTsa z4DBfpIkcZ$3edv<{R4F@^{8t2YfyX6jzeT7&LLHA;?V?Fd!$urBn!V=!H*I==>+gW z0#+`$9|GVT6}~%*->vXF34iuAG}tGD_*d6M3I zmjkVWC+F#BJZEp9Ad@TJ0>JYa1|M?R_3{$p&9c((lP@In@ef>Syy3 zcvu1Z6>yaXz5rmVnqV^uwn@R76zpaVdsN;7RxF0pb9Oo6@;v5N0d)Y8z6)yE;RY4* zPj4c`Jqi&a#A!yXREQ@eqE8|Inh;-P#43e2DG>t-@gqV+8L?U+?v{uh3bAxC5Q9Jj zT+YT5kkNg%;}`vj=k78Sub7iRy*#)d=bKJz&lI|vSz)ae;;pM_`u%_|4XHU@{R)vy zZ`&*I^Rt!e`1x6snbY@!Cp88Bti*SHo5i z+dTX9Q7FlZek=2S9U&~2at6v7C}*IYfpP}Q87ODq|5yebc>8c5<{kELHkxC`7Jn#U z1;cn1wY3)Sf$%(K#r#9`w#gE%;ZWp=Hw0L7tX8;=Mxs8WUAV>~V;UlRF5gv=ug%7` zNIc{-!V%k`_gMx$G%y0ekZh<{6vYkuZG&EA#bZV!P+*|*TGWnSf)&7XaNbzVAEiD> z->#T##qcK1*xF+djRab5La$x^5nptz5%3~HpV1N;iS;yx;;lxMD|ij;?`ig}tyO^( z_0N9Vw)(B0&mXq$Y(Hf5^cd~qw*>NBw~sfEE8nVi)%geX+adkl*mvUxbeo&+tNFM# z-u24w!0*~`r}DewckOqB^t(PU3(b7mK(8kD+VS&#dFYaUJN}D)Eh~;!yB!Of{XqoXxxQ82YmJ+6|A2>U;?C(O;2wj9lhGWn6^_-FAET#S ztOAOG*E1O#^tYg20bTq?Ci7j;(B(`9kIC{I6%gV0^j_f@uXn6oys+j0$HIF0c7;AF zF@V!4CRho)5Nn_#{A@atIYq#_`WhsnS~DSTUG=e>I@d@ygr?yal;d_`|v!~K=Lb;fj6Z(YlM)tx+fRztebTNYwNlp@zk{d z0VFJX>uNk$$0(O_2Fe*IXP}&cat8jNWq`krn@TG5Q%>cuuJAW%Piwt?oDx4)uO8oVraNPVvnk^Ij-9CHyrNlR6uS=FMLnEqm0$ z$NOP_44!-uOD&dKK06YPS>E`#7>bOJh5T5QfQO<8+9MFOywRxln1!Xo=rIw9dPn`1 zFFrbY3??}omS}@^9@MgS?Dg#Gv-);#xA4jS_CMV1+10-dR@hf!^_g00Zr>}codbJ% zJpo5G53)MiRPcT~31~}0KG;voIYM?-Ze7UJ z%FPdX5+n+EwH1LkN)!O{_bOPvSi~CfhJCb;p??o7eZjC5kNJH9NyWiB5{qena`y!- zP3L)}T@`r(%fjvmPAqMkD7qJdw`t^=%6%DG33g-TZwAR~WFG09>P;Vc3gL|zS~w8+!j-$tWc!biNZ5#jP3 z3!@1|?Wodx${&rw5fR zm0e^xWvT|Uvfw&e@QwzD(0s%u&j^kNjtgf`xNs?r;tDBp_5a63G{40yk8W?WH0XH+ zQ=Z>ywZw9r=tuBRb4s@F*7i&d=yD6sGg!6}m0bIS+MX$`tK}BH-?OYAm0WwC7cn&$ zsA-}79gzP9b5xF>=j}{ew4I*LX6O0r){6?QxUoIY>zRIul8ZtyQkDD9^J|(9QJ2}C zpA(q!a{(Z-Xm`wtKSGV>;B3$HZl+Bd$?-Fe>D|!NoSu1}*E4l%`@;COSWKfrIc0l( zu3*Z~72JMd{~y-&z1kr^r!c*&+3E89~^d_vy{7BeW1Lie!&$&-6WLa_xCtAf@d`vX-jCc1#z*IM<%%?St(K&vCLF zwqt%Vj7i4vA9gFvVQoLgKuwJTeyMI>!?RkpZ_G+mg<_?WOS5BFspN4&JDku1?hhcP qcP|`2-+$@);Mk=hAG%cKkpk!3F2nF7Qq=yDTNHvmHO(yrtoU~%FhP(2