[ Upstream commit 096516d092d54604d590827d05b1022c8f326639 ]
The 54810 does not support c45. The mmd_phy_indirect accesses return
arbirtary values leading to odd behavior like saying it supports EEE
when it doesn't. We also see that reading/writing these non-existent
MMD registers leads to phy instability in some cases.
Fixes: b14995ac25 ("net: phy: broadcom: Add BCM54810 PHY entry")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/1691901708-28650-1-git-send-email-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 23185c6aed1ffb8fc44087880ba2767aba493779 ]
Do not allow to insert elements from datapath to objects maps.
Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5310760af1d4fbea1452bfc77db5f9a680f7ae47 ]
When two threads run proc_do_sync_threshold() in parallel,
data races could happen between the two memcpy():
Thread-1 Thread-2
memcpy(val, valp, sizeof(val));
memcpy(valp, val, sizeof(val));
This race might mess up the (struct ctl_table *) table->data,
so we add a mutex lock to serialize them.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.com/
Signed-off-by: Sishuai Gong <sishuai.system@gmail.com>
Acked-by: Simon Horman <horms@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 90e5b3462efa37b8bba82d7c4e63683856e188af ]
When flushing, individual set elements are disabled in the next
generation via the ->flush callback.
Catchall elements are not disabled. This is incorrect and may lead to
double-deactivations of catchall elements which then results in memory
leaks:
WARNING: CPU: 1 PID: 3300 at include/net/netfilter/nf_tables.h:1172 nft_map_deactivate+0x549/0x730
CPU: 1 PID: 3300 Comm: nft Not tainted 6.5.0-rc5+ #60
RIP: 0010:nft_map_deactivate+0x549/0x730
[..]
? nft_map_deactivate+0x549/0x730
nf_tables_delset+0xb66/0xeb0
(the warn is due to nft_use_dec() detecting underflow).
Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 28458c80006bb4e993a09fc094094a8578cad292 ]
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
Fixes: 862cd659a6 ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6c461e394d11a981c662cc16cebfb05b602e23ba ]
On Zynq UltraScale+ MPSoC ubuntu platform when systemctl issues suspend,
network manager bring down the interface and goes into suspend. When it
wakes up it again enables the interface.
This leads to xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller power down the entire FPD (including
SERDES) if none of the FPD devices are in use and serdes is not
initialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
macb driver is called in this way:
1. macb_close: Stop network interface. In this function, it
reset MACB IP and disables PHY and network interface.
2. macb_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into suspend state. Some time later, system is
waken up by RTC wakeup device;
4. macb_resume: It does nothing because network interface has
been disabled;
5. macb_open: It is called to enable network interface again. ethernet
interface is initialized in this API but serdes which is power-off
by PMUFW during FPD-off suspend is not initialized again and so
we hit GT PLL lock issue on open.
To resolve this PLL timeout issue always do PS GTR initialization
when ethernet device is configured as non-wakeup source.
Fixes: f22bd29ba1 ("net: macb: Fix ZynqMP SGMII non-wakeup source resume failure")
Fixes: 8b73fa3ae0 ("net: macb: Added ZynqMP-specific initialization")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1691414091-2260697-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e8470c0a7bcaa82f78ad34282d662dd7bd9630c2 ]
Commit 03e909acd9 ("drm/panel: simple: Add support for AUO G121EAN01.4
panel") added support for this panel model, but the timings it implements
are very different from what the datasheet describes. I checked both the
G121EAN01.0 datasheet from [0] and the G121EAN01.4 one from [1] and they
all have the same timings: for example the LVDS clock typical value is 74.4
MHz, not 66.7 MHz as implemented.
Replace the timings with the ones from the documentation. These timings
have been tested and the clock frequencies verified with an oscilloscope to
ensure they are correct.
Also use struct display_timing instead of struct drm_display_mode in order
to also specify the minimum and maximum values.
[0] https://embedded.avnet.com/product/g121ean01-0/
[1] https://embedded.avnet.com/product/g121ean01-4/
Fixes: 03e909acd9 ("drm/panel: simple: Add support for AUO G121EAN01.4 panel")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804151239.835216-1-luca.ceresoli@bootlin.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 855067defa36b1f9effad8c219d9a85b655cf500 ]
This test verifies whether the encapsulated packets have the correct
configured TTL. It does so by sending ICMP packets through the test
topology and mirroring them to a gretap netdevice. On a busy host
however, more than just the test ICMP packets may end up flowing
through the topology, get mirrored, and counted. This leads to
potential spurious failures as the test observes much more mirrored
packets than the sent test packets, and assumes a bug.
Fix this by tightening up the mirror action match. Change it from
matchall to a flower classifier matching on ICMP packets specifically.
Fixes: 45315673e0 ("selftests: forwarding: Test changes in mirror-to-gretap")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cc941e548bffc01b5816b4edc5cb432a137a58b3 ]
Uwe reports:
"Most PHYs signal WoL using an interrupt. So disabling interrupts [at
shutdown] breaks WoL at least on PHYs covered by the marvell driver."
Discussing with Ioana, the problem which was trying to be solved was:
"The board in question is a LS1021ATSN which has two AR8031 PHYs that
share an interrupt line. In case only one of the PHYs is probed and
there are pending interrupts on the PHY#2 an IRQ storm will happen
since there is no entity to clear the interrupt from PHY#2's registers.
PHY#1's driver will get stuck in .handle_interrupt() indefinitely."
Further confirmation that "the two AR8031 PHYs are on the same MDIO
bus."
With WoL using interrupts to wake the system, in such a case, the
system will begin booting with an asserted interrupt. Thus, we need to
cope with an interrupt asserted during boot.
Solve this instead by disabling interrupts during PHY probe. This will
ensure in Ioana's situation that both PHYs of the same type sharing an
interrupt line on a common MDIO bus will have their interrupt outputs
disabled when the driver probes the device, but before we hook in any
interrupt handlers - thus avoiding the interrupt storm.
A better fix would be for platform firmware to disable the interrupting
devices at source during boot, before control is handed to the kernel.
Fixes: e2f016cf77 ("net: phy: add a shutdown procedure")
Link: 20230804071757.383971-1-u.kleine-koenig@pengutronix.de
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 829c6524d6729d05a82575dbcc16f99be5ee843d ]
The reference of pdev->dev is taken by of_find_device_by_node, so
it should be released when not need anymore.
Fixes: 7dc54d3b8d ("net: pcs: add Renesas MII converter driver")
Signed-off-by: Xiang Yang <xiangyang3@huawei.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51b813176f098ff61bd2833f627f5319ead098a5 ]
Commit 25266128fe16 ("virtio-net: fix race between set queues and
probe") tries to fix the race between set queues and probe by calling
_virtnet_set_queues() before DRIVER_OK is set. This violates virtio
spec. Fixing this by setting queues after virtio_device_ready().
Note that rtnl needs to be held for userspace requests to change the
number of queues. So we are serialized in this way.
Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9f62d221a4b0aa6a8d2a18053a0ca349c025297c ]
In virtnet_probe(), if the device doesn't provide a MAC address the
driver assigns a random one.
As we modify the MAC address we need to notify the device to allow it
to update all the related information.
The problem can be seen with vDPA and mlx5_vdpa driver as it doesn't
assign a MAC address by default. The virtio_net device uses a random
MAC address (we can see it with "ip link"), but we can't ping a net
namespace from another one using the virtio-vdpa device because the
new MAC address has not been provided to the hardware:
RX packets are dropped since they don't go through the receive filters,
TX packets go through unaffected.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 51b813176f09 ("virtio-net: set queues after driver_ok")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e2424708da7207087934c5c75211e8584d553a0 ]
The previous commit 4e484b3e96 ("xfrm: rate limit SA mapping change
message to user space") added one additional attribute named
XFRMA_MTIMER_THRESH and described its type at compat_policy
(net/xfrm/xfrm_compat.c).
However, the author forgot to also describe the nla_policy at
xfrma_policy (net/xfrm/xfrm_user.c). Hence, this suppose NLA_U32 (4
bytes) value can be faked as empty (0 bytes) by a malicious user, which
leads to 4 bytes overflow read and heap information leak when parsing
nlattrs.
To exploit this, one malicious user can spray the SLUB objects and then
leverage this 4 bytes OOB read to leak the heap data into
x->mapping_maxage (see xfrm_update_ae_params(...)), and leak it to
userspace via copy_to_user_state_extra(...).
The above bug is assigned CVE-2023-3773. To fix it, this commit just
completes the nla_policy description for XFRMA_MTIMER_THRESH, which
enforces the length check and avoids such OOB read.
Fixes: 4e484b3e96 ("xfrm: rate limit SA mapping change message to user space")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6018a266279b1a75143c7c0804dd08a5fc4c3e0b ]
When ip_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ip_vti device sends IPv6 packets.
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fd41f1ba638938c9a1195d09bc6fa3be2712f25 ]
When ipv6_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ipv6_vti device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
vti6_tnl_xmit+0x3e6/0x1ee0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
Allocated by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
netlink_sendmsg+0x9b1/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x40
____kasan_slab_free+0x160/0x1c0
slab_free_freelist_hook+0x11b/0x220
kmem_cache_free+0xf0/0x490
skb_free_head+0x17f/0x1b0
skb_release_data+0x59c/0x850
consume_skb+0xd2/0x170
netlink_unicast+0x54f/0x7f0
netlink_sendmsg+0x926/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff88802e08ed00
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 194 bytes inside of
freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53223f2ed1ef5c90dad814daaaefea4e68a933c8 ]
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975 ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975 ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1e0e61d617ba17aa516db707aa871387566bbf7 ]
According to all consumers code of attrs[XFRMA_SEC_CTX], like
* verify_sec_ctx_len(), convert to xfrm_user_sec_ctx*
* xfrm_state_construct(), call security_xfrm_state_alloc whose prototype
is int security_xfrm_state_alloc(.., struct xfrm_user_sec_ctx *sec_ctx);
* copy_from_user_sec_ctx(), convert to xfrm_user_sec_ctx *
...
It seems that the expected parsing result for XFRMA_SEC_CTX should be
structure xfrm_user_sec_ctx, and the current xfrm_sec_ctx is confusing
and misleading (Luckily, they happen to have same size 8 bytes).
This commit amend the policy structure to xfrm_user_sec_ctx to avoid
ambiguity.
Fixes: cf5cb79f69 ("[XFRM] netlink: Establish an attribute policy")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 75065a8929069bc93181848818e23f147a73f83a ]
When running xfrm_state_walk_init(), the xfrm_address_filter being used
is okay to have a splen/dplen that equals to sizeof(xfrm_address_t)<<3.
This commit replaces >= to > to make sure the boundary checking is
correct.
Fixes: 37bd22420f ("af_key: pfkey_dump needs parameter validation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 69f035c480d76f12bf061148ccfd578e1099e5fc upstream.
In the I2C_FUNC_SMBUS_BLOCK_DATA case, the invalid length byte value
(outside of 1-32) of the SMBus block data response from the Slave device
is not correctly handled by the I2C Designware driver.
In case IC_EMPTYFIFO_HOLD_MASTER_EN==1, which cannot be detected
from the registers, the Master can be disabled only if the STOP bit
is set. Without STOP bit set, the Master remains active, holding the bus
until receiving a block data response length. This hangs the bus and
is unrecoverable.
Avoid this by issuing another dump read to reach the stop condition when
an invalid length byte is received.
Cc: stable@vger.kernel.org
Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20230726080001.337353-3-tamnguyenchi@os.amperecomputing.com
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49d4db3953cb9004ff94efc0c176e026c820af5a upstream.
Commit 0daede80f8 ("i2c: designware: Convert driver to using regmap API")
changes the logic to validate the whole 32-bit return value of
DW_IC_DATA_CMD register instead of 8-bit LSB without reason.
Later, commit f53f15ba5a ("i2c: designware: Get right data length"),
introduced partial fix but not enough because the "tmp > 0" still test
tmp as 32-bit value and is wrong in case the IC_DATA_CMD[11] is set.
Revert the logic to just before commit 0daede80f8
("i2c: designware: Convert driver to using regmap API").
Fixes: f53f15ba5a ("i2c: designware: Get right data length")
Fixes: 0daede80f8 ("i2c: designware: Convert driver to using regmap API")
Cc: stable@vger.kernel.org
Signed-off-by: Tam Nguyen <tamnguyenchi@os.amperecomputing.com>
Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Link: https://lore.kernel.org/r/20230726080001.337353-2-tamnguyenchi@os.amperecomputing.com
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29eefa6d0d07e185f7bfe9576f91e6dba98189c2 upstream.
Pausing and canceling balance can race to interrupt balance lead to BUG_ON
panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance
does not take this race scenario into account.
However, the race condition has no other side effects. We can fix that.
Reproducing it with panic trace like this:
kernel BUG at fs/btrfs/volumes.c:4618!
RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0
Call Trace:
<TASK>
? do_nanosleep+0x60/0x120
? hrtimer_nanosleep+0xb7/0x1a0
? sched_core_clone_cookie+0x70/0x70
btrfs_ioctl_balance_ctl+0x55/0x70
btrfs_ioctl+0xa46/0xd20
__x64_sys_ioctl+0x7d/0xa0
do_syscall_64+0x38/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Race scenario as follows:
> mutex_unlock(&fs_info->balance_mutex);
> --------------------
> .......issue pause and cancel req in another thread
> --------------------
> ret = __btrfs_balance(fs_info);
>
> mutex_lock(&fs_info->balance_mutex);
> if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) {
> btrfs_info(fs_info, "balance: paused");
> btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> }
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c962098ca4af146f2625ed64399926a098752c9c upstream.
In production we were seeing a variety of WARN_ON()'s in the extent_map
code, specifically in btrfs_drop_extent_map_range() when we have to call
add_extent_mapping() for our second split.
Consider the following extent map layout
PINNED
[0 16K) [32K, 48K)
and then we call btrfs_drop_extent_map_range for [0, 36K), with
skip_pinned == true. The initial loop will have
start = 0
end = 36K
len = 36K
we will find the [0, 16k) extent, but since we are pinned we will skip
it, which has this code
start = em_end;
if (end != (u64)-1)
len = start + len - em_end;
em_end here is 16K, so now the values are
start = 16K
len = 16K + 36K - 16K = 36K
len should instead be 20K. This is a problem when we find the next
extent at [32K, 48K), we need to split this extent to leave [36K, 48k),
however the code for the split looks like this
split->start = start + len;
split->len = em_end - (start + len);
In this case we have
em_end = 48K
split->start = 16K + 36K // this should be 16K + 20K
split->len = 48K - (16K + 36K) // this overflows as 16K + 36K is 52K
and now we have an invalid extent_map in the tree that potentially
overlaps other entries in the extent map. Even in the non-overlapping
case we will have split->start set improperly, which will cause problems
with any block related calculations.
We don't actually need len in this loop, we can simply use end as our
end point, and only adjust start up when we find a pinned extent we need
to skip.
Adjust the logic to do this, which keeps us from inserting an invalid
extent map.
We only skip_pinned in the relocation case, so this is relatively rare,
except in the case where you are running relocation a lot, which can
happen with auto relocation on.
Fixes: 55ef689900 ("Btrfs: Fix btrfs_drop_extent_cache for skip pinned case")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 282069845af388b08d622ad192b831dcd0549c62 upstream.
Do not read the data register to clear the error flags for lpuart32
platforms, the additional read may cause the receive FIFO underflow
since the DMA has already read the data register.
Actually all lpuart32 platforms support write 1 to clear those error
bits, let's use this method to better clear the error flags.
Fixes: 42b68768e5 ("serial: fsl_lpuart: DMA support for 32-bit variant")
Cc: stable <stable@kernel.org>
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c4f8333b582487a2d1e02171f1465531cde53e3 upstream.
In commit 9b9c8195f3f0 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF
problem is not completely fixed. There is a race condition in
gsm_cleanup_mux(), which caused this UAF.
The UAF problem is triggered by the following race:
task[5046] task[5054]
----------------------- -----------------------
gsm_cleanup_mux();
dlci = gsm->dlci[0];
mutex_lock(&gsm->mutex);
gsm_cleanup_mux();
dlci = gsm->dlci[0]; //Didn't take the lock
gsm_dlci_release(gsm->dlci[i]);
gsm->dlci[i] = NULL;
mutex_unlock(&gsm->mutex);
mutex_lock(&gsm->mutex);
dlci->dead = true; //UAF
Fix it by assigning values after mutex_lock().
Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=176188b5a80000
Cc: stable <stable@kernel.org>
Fixes: 9b9c8195f3f0 ("tty: n_gsm: fix UAF in gsm_cleanup_mux")
Fixes: aa371e96f0 ("tty: n_gsm: fix restart handling via CLD command")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com>
Link: https://lore.kernel.org/r/20230811031121.153237-1-yiyang13@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f46c1e1620c6bbc9aad5693082efd1b80822e97c upstream.
The previous patches added the missing nla policies that were required for
validation to work.
Now strict validation on netlink ops can be enabled. This patch does it.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-9-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d6ba607d6cb5c58a4ddf33381e18c83dbb4098f upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa max vqp attr to avoid
such bugs.
Fixes: ad69dd0bf2 ("vdpa: Introduce query of device config layout")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3003e1b54e057f5f3124e437b80c3bef26ed3fe upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa queue index attr to avoid
such bugs.
Fixes: 13b00b1356 ("vdpa: Add support for querying vendor statistics")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernelorg
Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79c8651587504ba263d2fd67fd4406240fb21f69 upstream.
The vdpa_nl_policy structure is used to validate the nlattr when parsing
the incoming nlmsg. It will ensure the attribute being described produces
a valid nlattr pointer in info->attrs before entering into each handler
in vdpa_nl_ops.
That is to say, the missing part in vdpa_nl_policy may lead to illegal
nlattr after parsing, which could lead to OOB read just like CVE-2023-3773.
This patch adds the missing nla_policy for vdpa features attr to avoid
such bugs.
Fixes: 90fea5a800 ("vdpa: device feature provisioning")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Cc: stable@vger.kernel.org
Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f3175979e62de3b929bfa54a0db4b87d36257a7 upstream.
With hardened usercopy enabled (CONFIG_HARDENED_USERCOPY=y), using the
/proc/powerpc/rtas/firmware_update interface to prepare a system
firmware update yields a BUG():
kernel BUG at mm/usercopy.c:102!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2232 Comm: dd Not tainted 6.5.0-rc3+ #2
Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 of:IBM,FW860.50 (SV860_146) hv:phyp pSeries
NIP: c0000000005991d0 LR: c0000000005991cc CTR: 0000000000000000
REGS: c0000000148c76a0 TRAP: 0700 Not tainted (6.5.0-rc3+)
MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002242 XER: 0000000c
CFAR: c0000000001fbd34 IRQMASK: 0
[ ... GPRs omitted ... ]
NIP usercopy_abort+0xa0/0xb0
LR usercopy_abort+0x9c/0xb0
Call Trace:
usercopy_abort+0x9c/0xb0 (unreliable)
__check_heap_object+0x1b4/0x1d0
__check_object_size+0x2d0/0x380
rtas_flash_write+0xe4/0x250
proc_reg_write+0xfc/0x160
vfs_write+0xfc/0x4e0
ksys_write+0x90/0x160
system_call_exception+0x178/0x320
system_call_common+0x160/0x2c4
The blocks of the firmware image are copied directly from user memory
to objects allocated from flash_block_cache, so flash_block_cache must
be created using kmem_cache_create_usercopy() to mark it safe for user
access.
Fixes: 6d07d1cd30 ("usercopy: Restrict non-usercopy caches to size 0")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
[mpe: Trim and indent oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230810-rtas-flash-vs-hardened-usercopy-v2-1-dcf63793a938@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0872b2c0abc0e84ac82472959c8e14e35277549c upstream.
in mmphw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.
Fixes: d63028c389 ("video: mmp display controller support")
Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27ec43c77b5db780a56fc3a6d6de6bf2f74614f7 upstream.
Tegra processors prior to Tegra186 used APB DMA for I2C requiring
CONFIG_TEGRA20_APB_DMA=y while Tegra186 and later use GPC DMA requiring
CONFIG_TEGRA186_GPC_DMA=y.
The check for if the processor uses APB DMA is inverted and so the wrong
DMA config options are checked.
This means if CONFIG_TEGRA20_APB_DMA=y but CONFIG_TEGRA186_GPC_DMA=n
with a Tegra186 or later processor the driver will incorrectly think DMA is
enabled and attempt to request DMA channels that will never be availible,
leaving the driver in a perpetual EPROBE_DEFER state.
Fixes: 48cb6356fa ("i2c: tegra: Add GPCDMA support")
Signed-off-by: Parker Newman <pnewman@connecttech.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Acked-by: Akhil R <akhilrajeev@nvidia.com>
Link: https://lore.kernel.org/r/fcfcf9b3-c8c4-9b34-2ff8-cd60a3d490bd@connecttech.com
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fff67c1b17ee093947bdcbac6f64d072e644159a upstream.
The controller may be shared with other port, for example the firmware.
Handle the interrupt from other sources will cause crash since some
data are not initialized. So only handle the interrupt of the driver's
transfer and discard others.
Fixes: d62fbdb99a ("i2c: add support for HiSilicon I2C controller")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20230801124625.63587-1-yangyicong@huawei.com
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4caf4cb1eaed469742ef719f2cc024b1ec3fa9e6 upstream.
iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both
interrupt context (e.g. bcm_iproc_i2c_isr) and process context
(e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be
disabled to avoid potential deadlock. To prevent this scenario,
use spin_lock_irqsave().
Fixes: 9a10387280 ("i2c: iproc: add NIC I2C support")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8f5f849ffce24490eb9449e98312b66c0dba76f ]
With deferred close we can have closes that race with lease breaks,
and so with the current checks for whether to send the lease response,
oplock_response(), this can mean that an unmount (kill_sb) can occur
just before we were checking if the tcon->ses is valid. See below:
[Fri Aug 4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs]
[Fri Aug 4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39
[Fri Aug 4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206
[Fri Aug 4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009
[Fri Aug 4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188
[Fri Aug 4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900
[Fri Aug 4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138
[Fri Aug 4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000
[Fri Aug 4 04:12:50 2023] FS: 0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000
[Fri Aug 4 04:12:50 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Aug 4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0
[Fri Aug 4 04:12:50 2023] Call Trace:
[Fri Aug 4 04:12:50 2023] <TASK>
[Fri Aug 4 04:12:50 2023] process_one_work+0x225/0x3d0
[Fri Aug 4 04:12:50 2023] worker_thread+0x4d/0x3e0
[Fri Aug 4 04:12:50 2023] ? process_one_work+0x3d0/0x3d0
[Fri Aug 4 04:12:50 2023] kthread+0x12a/0x150
[Fri Aug 4 04:12:50 2023] ? set_kthread_struct+0x50/0x50
[Fri Aug 4 04:12:50 2023] ret_from_fork+0x22/0x30
[Fri Aug 4 04:12:50 2023] </TASK>
To fix this change the ordering of the checks before sending the oplock_response
to first check if the openFileList is empty.
Fixes: da787d5b7498 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed")
Suggested-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ad03a0f44cdb97b46e5c84ed353dac9b8ae2c276 ]
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after
the control virtqueue ASID iotlb has been populated. The control vq
iotlb must not be cleared, since it will not be populated again.
So call the ASID aware destroy function which makes sure that the
right vq resource is destroyed.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ee811009ad8f87982b69e61d07447d12233ad01 ]
The mr->initialized flag is shared between the control vq and data vq
part of the mr init/uninit. But if the control vq and data vq get placed
in different ASIDs, it can happen that initializing the control vq will
prevent the data vq mr from being initialized.
This patch consolidates the control and data vq init parts into their
own init functions. The mr->initialized will now be used for the data vq
only. The control vq currently doesn't need a flag.
The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got
split into data and control vq functions which are now also ASID aware.
Fixes: 8fcd20c307 ("vdpa/mlx5: Support different address spaces for control and data")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ca26efb09a1543fddb29308ea3b63b66cb5d3ee ]
The IRQ injection work used spin_lock_irq() to protect the
scheduling of the softirq, but spin_lock_bh() should be
used.
With spin_lock_irq(), we noticed delay of more than 6
seconds between the time a NAPI polling work is scheduled
and the time it is executed.
Fixes: c8a6153b6c ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: xieyongji@bytedance.com
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 55c91fedd03d7b9cf0c5199b2eb12b9b8e95281a ]
vm_dev has a separate lifecycle because it has a 'struct device'
embedded. Thus, having a release callback for it is correct.
Allocating the vm_dev struct with devres totally breaks this protection,
though. Instead of waiting for the vm_dev release callback, the memory
is freed when the platform_device is removed. Resulting in a
use-after-free when finally the callback is to be called.
To easily see the problem, compile the kernel with
CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs.
The fix is easy, don't use devres in this case.
Found during my research about object lifetime problems.
Fixes: 7eb781b1bb ("virtio_mmio: add cleanup for virtio_mmio_probe")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0657b20c5a76c938612f8409735a8830d257866e ]
If a task creates a new block group and that block group becomes unused
before we finish its creation, at btrfs_create_pending_block_groups(),
then when btrfs_mark_bg_unused() is called against the block group, we
assume that the block group is currently in the list of block groups to
reclaim, and we move it out of the list of new block groups and into the
list of unused block groups. This has two consequences:
1) We move it out of the list of new block groups associated to the
current transaction. So the block group creation is not finished and
if we attempt to delete the bg because it's unused, we will not find
the block group item in the extent tree (or the new block group tree),
its device extent items in the device tree etc, resulting in the
deletion to fail due to the missing items;
2) We don't increment the reference count on the block group when we
move it to the list of unused block groups, because we assumed the
block group was on the list of block groups to reclaim, and in that
case it already has the correct reference count. However the block
group was on the list of new block groups, in which case no extra
reference was taken because it's local to the current task. This
later results in doing an extra reference count decrement when
removing the block group from the unused list, eventually leading the
reference count to 0.
This second case was caught when running generic/297 from fstests, which
produced the following assertion failure and stack trace:
[589.559] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4299
[589.559] ------------[ cut here ]------------
[589.559] kernel BUG at fs/btrfs/block-group.c:4299!
[589.560] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[589.560] CPU: 8 PID: 2819134 Comm: umount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1
[589.560] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[589.560] RIP: 0010:btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.561] Code: 68 62 da c0 (...)
[589.561] RSP: 0018:ffffa55a8c3b3d98 EFLAGS: 00010246
[589.561] RAX: 0000000000000058 RBX: ffff8f030d7f2000 RCX: 0000000000000000
[589.562] RDX: 0000000000000000 RSI: ffffffff953f0878 RDI: 00000000ffffffff
[589.562] RBP: ffff8f030d7f2088 R08: 0000000000000000 R09: ffffa55a8c3b3c50
[589.562] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8f05850b4c00
[589.562] R13: ffff8f030d7f2090 R14: ffff8f05850b4cd8 R15: dead000000000100
[589.563] FS: 00007f497fd2e840(0000) GS:ffff8f09dfc00000(0000) knlGS:0000000000000000
[589.563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[589.563] CR2: 00007f497ff8ec10 CR3: 0000000271472006 CR4: 0000000000370ee0
[589.563] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[589.564] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[589.564] Call Trace:
[589.564] <TASK>
[589.565] ? __die_body+0x1b/0x60
[589.565] ? die+0x39/0x60
[589.565] ? do_trap+0xeb/0x110
[589.565] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? do_error_trap+0x6a/0x90
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.566] ? exc_invalid_op+0x4e/0x70
[589.566] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? asm_exc_invalid_op+0x16/0x20
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] ? btrfs_free_block_groups+0x449/0x4a0 [btrfs]
[589.567] close_ctree+0x35d/0x560 [btrfs]
[589.568] ? fsnotify_sb_delete+0x13e/0x1d0
[589.568] ? dispose_list+0x3a/0x50
[589.568] ? evict_inodes+0x151/0x1a0
[589.568] generic_shutdown_super+0x73/0x1a0
[589.569] kill_anon_super+0x14/0x30
[589.569] btrfs_kill_super+0x12/0x20 [btrfs]
[589.569] deactivate_locked_super+0x2e/0x70
[589.569] cleanup_mnt+0x104/0x160
[589.570] task_work_run+0x56/0x90
[589.570] exit_to_user_mode_prepare+0x160/0x170
[589.570] syscall_exit_to_user_mode+0x22/0x50
[589.570] ? __x64_sys_umount+0x12/0x20
[589.571] do_syscall_64+0x48/0x90
[589.571] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[589.571] RIP: 0033:0x7f497ff0a567
[589.571] Code: af 98 0e (...)
[589.572] RSP: 002b:00007ffc98347358 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[589.572] RAX: 0000000000000000 RBX: 00007f49800b8264 RCX: 00007f497ff0a567
[589.572] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000557f558abfa0
[589.573] RBP: 0000557f558a6ba0 R08: 0000000000000000 R09: 00007ffc98346100
[589.573] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[589.573] R13: 0000557f558abfa0 R14: 0000557f558a6cb0 R15: 0000557f558a6dd0
[589.573] </TASK>
[589.574] Modules linked in: dm_snapshot dm_thin_pool (...)
[589.576] ---[ end trace 0000000000000000 ]---
Fix this by adding a runtime flag to the block group to tell that the
block group is still in the list of new block groups, and therefore it
should not be moved to the list of unused block groups, at
btrfs_mark_bg_unused(), until the flag is cleared, when we finish the
creation of the block group at btrfs_create_pending_block_groups().
Fixes: a9f189716cf1 ("btrfs: move out now unused BG from the reclaim list")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 961f5b8bf48a463ac5fe5b13143426d79eb41817 ]
In zoned mode the sequential status of zone can be also tracked in the
runtime flags of block group.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d7764ff58b4b45c39eb03f2c74a819c1a88fa7b ]
We already have flags in block group to track various status bits,
convert needs_free_space as well and reduce size of btrfs_block_group.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 0657b20c5a76 ("btrfs: fix use-after-free of new block group that became unused")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a9f189716cf15913c453299d72f69c51a9b0f86b ]
An unused block group is easy to remove to free up space and should be
reclaimed fast. Such block group can often already be a target of the
reclaim process. As we check list_empty(&bg->bg_list), we keep it in the
reclaim list. That block group is never reclaimed until the file system
is filled e.g. up to 75%.
Instead, we can move unused block group to the unused list and delete it
fast.
Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ae3716cfdcd286268133867f67d0803847acefc ]
Instead of calling aperture_remove_conflicting_devices() to remove the
conflicting devices, just call to aperture_detach_devices() to detach
the device that matches the same PCI BAR / aperture range. Since the
former is just a wrapper of the latter plus a sysfb_disable() call,
and now that's done in this function but only for the primary devices.
This fixes a regression introduced by commit ee7a69aa38 ("fbdev:
Disable sysfb device registration when removing conflicting FBs"),
where we remove the sysfb when loading a driver for an unrelated pci
device, resulting in the user losing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob,
because that's the only gpu driver people might install which does not
come with an fbdev driver of it's own. For everyone else the real gpu
driver will restore a working console.
Also note that in the referenced bug there's confusion that this same
bug also happens on amdgpu. But that was just another amdgpu specific
regression, which just happened to happen at roughly the same time and
with the same user-observable symptoms. That bug is fixed now, see
https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu
issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel
drivers do not use drm_aperture_remove_conflicting_framebuffers(),
so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all
this.
For the above reasons the cc: stable is just notionally, this patch
will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't
have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
v4:
- Fix commit message style (i.e., commit 1234 ("..."))
- fix Daniel's S-o-b address
v5:
- add back an S-o-b tag with Daniel's Intel address
Fixes: ee7a69aa38 ("fbdev: Disable sysfb device registration when removing conflicting FBs")
Tested-by: Aaron Plattner <aplattner@nvidia.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Aaron Plattner <aplattner@nvidia.com>
Cc: Javier Martinez Canillas <javierm@redhat.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: <stable@vger.kernel.org> # v5.19+ (if someone else does the backport)
Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-8-tzimmermann@suse.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81d2393485f0990cf6566b0c9e0697c199f68ae5 ]
Generic fbdev drivers use the apertures field in struct fb_info to
control ownership of the framebuffer memory and graphics device. Do
not set the values in hyperv-fb.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221219160516.23436-9-tzimmermann@suse.de
Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b321c31c9b7b309dcde5e8854b741c8e6a9a05f0 ]
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable@vger.kernel.org
Fixes: 8e01d9a396 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Suggested-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cdff36a0217aadf5cbc167893ad1c0da869619cb ]
[Why] hdcp are enabled for asics from raven. for old asics
which hdcp are not enabled, hdcp_workqueue are null. some
access to hdcp work queue are not guarded with pointer check.
[How] add hdcp_workqueue pointer check before access workqueue.
Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>