Commit Graph

Matthias Maennich
73185e2d4e ANDROID: Remove all but top-level OWNERS
Now that the branch is used to create production GKI
images, we need to institute ACK DrNo approval for all commits.

The DrNo approvers are in the android-mainline branch
at /OWNERS_DrNo.

Bug: 287162457
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: Id5bb83d7add5f314df6816c1c51b4bf2d8018e79
2023-06-15 09:54:33 +01:00
Matthias Maennich
1090306d3d ANDROID: Enable GKI Dr. No Enforcement
This locks down OWNERS approval to a small group to guard against
unintentional breakages.

Bug: 287162457
Signed-off-by: Matthias Maennich <maennich@google.com>
Change-Id: I58ca467b1e7786e1ad0f6ad67c7a7a5845a91ec6
2023-06-15 09:54:33 +01:00
Carlos Llamas
16c18c497d ANDROID: 6/16/2023 KMI update
Set KMI_GENERATION=10 for 6/16 KMI update

function symbol changed from 'int devm_gh_rm_register_platform_ops(struct device*, struct gh_rm_platform_ops*)' to 'int devm_gh_rm_register_platform_ops(struct device*, const struct gh_rm_platform_ops*)'
  CRC changed from 0xec193d82 to 0xe82ea1f9
  type changed from 'int(struct device*, struct gh_rm_platform_ops*)' to 'int(struct device*, const struct gh_rm_platform_ops*)'
    parameter 2 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added

function symbol changed from 'int gh_rm_register_platform_ops(struct gh_rm_platform_ops*)' to 'int gh_rm_register_platform_ops(const struct gh_rm_platform_ops*)'
  CRC changed from 0xc34a7803 to 0xfd11885c
  type changed from 'int(struct gh_rm_platform_ops*)' to 'int(const struct gh_rm_platform_ops*)'
    parameter 1 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added

function symbol changed from 'void gh_rm_unregister_platform_ops(struct gh_rm_platform_ops*)' to 'void gh_rm_unregister_platform_ops(const struct gh_rm_platform_ops*)'
  CRC changed from 0xc1f09d18 to 0x57f483b
  type changed from 'void(struct gh_rm_platform_ops*)' to 'void(const struct gh_rm_platform_ops*)'
    parameter 1 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added

function symbol 'int ___pskb_trim(struct sk_buff*, unsigned int)' changed
  CRC changed from 0xb8fdf4c6 to 0x45b20f13

function symbol 'struct sk_buff* __alloc_skb(unsigned int, gfp_t, int, int)' changed
  CRC changed from 0x34355489 to 0x19dd35ba

function symbol 'void __balance_callbacks(struct rq*)' changed
  CRC changed from 0x76a1a2f4 to 0x2af1f39a

... 886 omitted; 889 symbols have only CRC changes

type 'struct hci_dev' changed
  byte size changed from 6416 to 6464
  member 'struct mutex unregister_lock' was added
  106 members ('struct work_struct cmd_sync_cancel_work' .. 'u64 android_kabi_reserved4') changed
    offset changed by 384

type 'struct sock' changed
  member 'int sk_wait_pending' was added

type 'struct xhci_driver_overrides' changed
  byte size changed from 64 to 120
  member 'int(* address_device)(struct usb_hcd*, struct usb_device*)' was added
  member 'int(* bus_suspend)(struct usb_hcd*)' was added
  member 'int(* bus_resume)(struct usb_hcd*)' was added
  member 'u64 android_kabi_reserved1' was added
  member 'u64 android_kabi_reserved2' was added
  member 'u64 android_kabi_reserved3' was added
  member 'u64 android_kabi_reserved4' was added

type 'struct pneigh_entry' changed
  member changed from 'u8 key[0]' to 'u32 key[0]'
    offset changed from 232 to 256
    type changed from 'u8[0]' to 'u32[0]'
      element type changed from 'u8' = '__u8' = 'unsigned char' to 'u32' = '__u32' = 'unsigned int'
        resolved type changed from 'unsigned char' to 'unsigned int'

type 'struct usb_udc' changed
  byte size changed from 952 to 1048
  member 'bool allow_connect' was added
  member 'struct work_struct vbus_work' was added
  member 'struct mutex connect_lock' was added

type 'struct sk_psock' changed
  byte size changed from 392 to 472
  member changed from 'struct work_struct work' to 'struct delayed_work work'
    offset changed from 2176 to 2112
    type changed from 'struct work_struct' to 'struct delayed_work'
  member 'struct rcu_work rwork' changed
    offset changed by 640

type 'struct netns_sysctl_ipv6' changed
  member changed from 'bool skip_notify_on_dev_down' to 'int skip_notify_on_dev_down'
    type changed from 'bool' = '_Bool' to 'int'
      resolved type changed from '_Bool' to 'int'
  member 'u8 fib_notify_on_flag_change' changed
    offset changed by 24

type 'struct sk_psock_work_state' changed
  byte size changed from 16 to 8
  member 'struct sk_buff* skb' was removed
  2 members ('u32 len' .. 'u32 off') changed
    offset changed by -64

Bug: 287162457
Change-Id: I438a8aa2c6a38dd5d575493b2735fe4d4403a971
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2023-06-14 23:24:57 +00:00
Elliot Berman
fcc32be061 ANDROID: virt: gunyah: Sync with latest platform ops
Const-ify the gh_rm_platform_ops.

Syncs with the latest version of the platform ops:

https://lore.kernel.org/all/20230613172054.3959700-15-quic_eberman@quicinc.com/

INFO: function symbol changed from 'int devm_gh_rm_register_platform_ops(struct device*, struct gh_rm_platform_ops*)' to 'int devm_gh_rm_register_platform_ops(struct device*, const struct gh_rm_platform_ops*)'
  CRC changed from 0xc4b20ef4 to 0x7fe0042f
  type changed from 'int(struct device*, struct gh_rm_platform_ops*)' to 'int(struct device*, const struct gh_rm_platform_ops*)'
    parameter 2 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added

function symbol changed from 'int gh_rm_register_platform_ops(struct gh_rm_platform_ops*)' to 'int gh_rm_register_platform_ops(const struct gh_rm_platform_ops*)'
  CRC changed from 0xc34a7803 to 0xfd11885c
  type changed from 'int(struct gh_rm_platform_ops*)' to 'int(const struct gh_rm_platform_ops*)'
    parameter 1 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added

function symbol changed from 'void gh_rm_unregister_platform_ops(struct gh_rm_platform_ops*)' to 'void gh_rm_unregister_platform_ops(const struct gh_rm_platform_ops*)'
  CRC changed from 0xc1f09d18 to 0x57f483b
  type changed from 'void(struct gh_rm_platform_ops*)' to 'void(const struct gh_rm_platform_ops*)'
    parameter 1 type changed from 'struct gh_rm_platform_ops*' to 'const struct gh_rm_platform_ops*'
      pointed-to type changed from 'struct gh_rm_platform_ops' to 'const struct gh_rm_platform_ops'
        qualifier const added
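
A minimal sketch of a caller under the new const-qualified signature (the
ops table name and its members here are illustrative, not taken from this
patch):

 static const struct gh_rm_platform_ops example_plat_ops = {
	/* .pre_mem_share / .post_mem_reclaim callbacks supplied by the vendor */
 };

 static int example_probe(struct platform_device *pdev)
 {
	/* signature per the ABI report above:
	 * (struct device *, const struct gh_rm_platform_ops *) */
	return devm_gh_rm_register_platform_ops(&pdev->dev, &example_plat_ops);
 }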

Bug: 287037804
Change-Id: Iff37610b721c344ac8c6b1737830f6d1e8674d34
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
2023-06-14 23:05:19 +00:00
Badhri Jagan Sridharan
69a3ec73e4 FROMGIT: usb: gadget: udc: core: Prevent soft_connect_store() race
usb_udc_connect_control(), soft_connect_store() and
usb_gadget_deactivate() can potentially race against each other to invoke
usb_gadget_connect()/usb_gadget_disconnect(). To prevent this, guard
udc->started, gadget->allow_connect, gadget->deactivate and
gadget->connect with connect_lock so that ->pullup() is only invoked when
the gadget is bound, started and not deactivated. The routines
usb_gadget_connect_locked(), usb_gadget_disconnect_locked(),
usb_udc_connect_control_locked(), usb_gadget_udc_start_locked(),
usb_gadget_udc_stop_locked() are called with this lock held.
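
A minimal sketch of the locked connect path described above (simplified;
the allow_connect flag sits on the udc per the ABI report for this branch,
and error handling is trimmed):

 static int usb_gadget_connect_locked(struct usb_gadget *gadget)
	__must_hold(&gadget->udc->connect_lock)
 {
	struct usb_udc *udc = gadget->udc;

	/* only pull up when bound, started and not deactivated */
	if (!udc || !udc->started || !udc->allow_connect || gadget->deactivated)
		return 0;

	return gadget->ops->pullup(gadget, 1);
 }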

An earlier version of this commit was reverted due to the crash reported in
https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/.
commit 16737e78d190 ("usb: gadget: udc: core: Offload usb_udc_vbus_handler processing")
addresses the crash reported.

Cc: stable@vger.kernel.org
Fixes: 628ef0d273 ("usb: udc: add usb_udc_vbus_handler")
Change-Id: I33b56f9eee28059a7e113d6c8081ab6653a03c33
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-2-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 286d9975a838d0a54da049765fa1d1fb96b89682
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus)
2023-06-14 23:03:53 +00:00
Badhri Jagan Sridharan
18b677ffae FROMGIT: usb: gadget: udc: core: Offload usb_udc_vbus_handler processing
usb_udc_vbus_handler() can be invoked from interrupt context by the irq
handlers of gadget drivers; however, usb_udc_connect_control() has
to run in non-atomic context due to the following:
a. Some of the gadget driver implementations expect the ->pullup
   callback to be invoked in non-atomic context.
b. usb_gadget_disconnect() acquires udc_lock which is a mutex.

Hence, offload the invocation of usb_udc_connect_control()
to a workqueue.

The UDC should not be pulled up unless a gadget driver is bound. The new
flag "allow_connect" is now set by gadget_bind_driver() and cleared by
gadget_unbind_driver(). This prevents the work item from pulling up the
gadget, even if it is queued, when the gadget driver is already unbound.
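
A minimal sketch of the offload described above (simplified; the vbus_work
member matches the ABI report for this series):

 void usb_udc_vbus_handler(struct usb_gadget *gadget, bool status)
 {
	struct usb_udc *udc = gadget->udc;

	if (udc) {
		udc->vbus = status;
		/* defer connect control to process context */
		schedule_work(&udc->vbus_work);
	}
 }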

Cc: stable@vger.kernel.org
Fixes: 1016fc0c09 ("USB: gadget: Fix obscure lockdep violation for udc_mutex")
Change-Id: Idbe00846fc5394397567024c3081381ddec7cfae
Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Message-ID: <20230609010227.978661-1-badhri@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 50966da807c81c5eb3bdfd392990fe0bba94d1ee
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-linus)
2023-06-14 23:03:53 +00:00
Johan Hovold
a1741f9c45 UPSTREAM: Bluetooth: fix debugfs registration
commit fe2ccc6c29d53e14d3c8b3ddf8ad965a92e074ee upstream.

Since commit ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for
unconfigured controllers") the debugfs interface for unconfigured
controllers will be created when the controller is configured.

There is however currently nothing preventing a controller from being
configured multiple times (e.g. setting the device address using btmgmt)
which results in failed attempts to register the already registered
debugfs entries:

	debugfs: File 'features' in directory 'hci0' already present!
	debugfs: File 'manufacturer' in directory 'hci0' already present!
	debugfs: File 'hci_version' in directory 'hci0' already present!
	...
	debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present!

Add a controller flag to avoid trying to register the debugfs interface
more than once.
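
A minimal sketch of the guard, assuming the new controller flag is named
HCI_DEBUGFS_CREATED (the flag name is an assumption here):

 /* HCI_DEBUGFS_CREATED is an assumed flag name for illustration */
 if (!hci_dev_test_and_set_flag(hdev, HCI_DEBUGFS_CREATED))
	hci_debugfs_create_basic(hdev);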

Fixes: ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for unconfigured controllers")
Cc: stable@vger.kernel.org      # 4.0
Change-Id: I495feabe66fa2b294ff72fbb5dfd1bd869b1ad83
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e5ae01fd46)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:42 +00:00
Zhengping Jiang
d890debdaf UPSTREAM: Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
[ Upstream commit 1857c19941c87eb36ad47f22a406be5dfe5eff9f ]

When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix
potential race when HCI_UNREGISTER is set after the flag is tested in
hci_cmd_sync_queue.
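
A minimal sketch of the check-under-lock described above, using the
unregister_lock mutex visible in the ABI report for this update (queueing
details elided):

 mutex_lock(&hdev->unregister_lock);
 if (hci_dev_test_flag(hdev, HCI_UNREGISTER)) {
	err = -ENODEV;	/* controller is going away: do not queue new jobs */
 } else {
	/* allocate the entry and queue hdev->cmd_sync_work as before */
 }
 mutex_unlock(&hdev->unregister_lock);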

Fixes: 0b94f2651f ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set")
Change-Id: I565a2ad87dc2ce4fd62ee0d09a5d28342fec8ad3
Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 17aac12002)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:42 +00:00
Eric Dumazet
855c5479cb UPSTREAM: net/ipv6: fix bool/int mismatch for skip_notify_on_dev_down
[ Upstream commit edf2e1d2019b2730d6076dbe4c040d37d7c10bbe ]

skip_notify_on_dev_down ctl table expects this field
to be an int (4 bytes), not a bool (1 byte).

Because proc_dou8vec_minmax() was added in 5.13,
this patch converts skip_notify_on_dev_down to an int.

The following patch then converts the field to u8 and uses proc_dou8vec_minmax().
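
For reference, the field after this patch (other members elided):

 struct netns_sysctl_ipv6 {
	/* ... */
	int skip_notify_on_dev_down;	/* was bool; the ctl table reads 4 bytes */
	/* ... */
 };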

Fixes: 7c6bb7d2fa ("net/ipv6: Add knob to skip DELROUTE message on device down")
Change-Id: I99875fad5012906099456fafa88e42e7f02133cf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 76e38e6e1b)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:42 +00:00
Qingfang DENG
b0fa6dd29a UPSTREAM: neighbour: fix unaligned access to pneigh_entry
[ Upstream commit ed779fe4c9b5a20b4ab4fd6f3e19807445bb78c7 ]

After the blamed commit, the member key is no longer 4-byte aligned. On
platforms that do not support unaligned access, e.g., MIPS32R2 with
unaligned_action set to 1, this will trigger a crash when accessing
an IPv6 pneigh_entry, as the key is cast to an in6_addr pointer.

Change the type of the key to u32 to make it aligned.
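
For reference, the member after the fix (other members elided):

 struct pneigh_entry {
	/* ... */
	u32 key[0];	/* was u8 key[0]; u32 keeps the IPv6 key 4-byte aligned */
 };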

Fixes: 62dd93181a ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.")
Change-Id: I3ac6eaf9afe9210cc4d9ef2dc6181fcb0fba6d15
Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
Link: https://lore.kernel.org/r/20230601015432.159066-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8af3119388)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:42 +00:00
Eric Dumazet
1707d64dab UPSTREAM: tcp: deny tcp_disconnect() when threads are waiting
[ Upstream commit 4faeee0cf8a5d88d63cdbc3bab124fb0e6aed08c ]

Historically connect(AF_UNSPEC) has been abused by syzkaller
and other fuzzers to trigger various bugs.

A recent one triggers a divide-by-zero [1], and Paolo Abeni
was able to diagnose the issue.

tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN
and TCP REPAIR mode being not used.

Then later if socket lock is released in sk_wait_data(),
another thread can call connect(AF_UNSPEC), then make this
socket a TCP listener.

When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf()
and attempt a divide by 0 in tcp_rcv_space_adjust() [1]

This patch adds a new socket field, counting number of threads
blocked in sk_wait_event() and inet_wait_for_connect().

If this counter is not zero, tcp_disconnect() returns an error.
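
A minimal sketch of the counter described above (the field name matches the
ABI report for this update; the exact error code is an assumption):

 /* blocking wait paths, e.g. sk_wait_event() / inet_wait_for_connect(): */
 sk->sk_wait_pending++;
 /* ... sleep until data arrives or the connect completes ... */
 sk->sk_wait_pending--;

 /* tcp_disconnect(): */
 if (sk->sk_wait_pending)
	return -EBUSY;	/* refuse connect(AF_UNSPEC) while threads are waiting */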

This patch adds code in blocking socket system calls, thus should
not hurt performance of non blocking ones.

Note that we probably could revert commit 499350a5a6 ("tcp:
initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore
original tcpi_rcv_mss meaning (was 0 if no payload was ever
received on a socket)

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740
Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48
RSP: 0018:ffffc900033af660 EFLAGS: 00010206
RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001
RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8
R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344
FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616
tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681
inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670
sock_recvmsg_nosec net/socket.c:1017 [inline]
sock_recvmsg+0xe2/0x160 net/socket.c:1038
____sys_recvmsg+0x210/0x5a0 net/socket.c:2720
___sys_recvmsg+0xf2/0x180 net/socket.c:2762
do_recvmmsg+0x25e/0x6e0 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5c0108c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003
RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000
</TASK>

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
Change-Id: I63f5375d7dd7d2094b30d00c48d48bb500d54e2e
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c2251ce048)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:42 +00:00
JaeHun Jung
a7cd7a3dd7 ANDROID: sound: usb: Add vendor's hooking interface
In mobile, a co-processor can be used with USB audio to improve power
consumption.  To support this type of hardware, hooks need to be added
to the USB audio subsystem to be able to call into the hardware when
needed.

This interface can support multiple USB audio devices.
It depends on the co-processor's firmware.

The main operations of the callbacks are:
- Initialize the co-processor by transmitting data when initializing.
- Change the co-processor setting value through the interface
function.
- Configure sampling rate
- pcm open/close
- other housekeeping
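
A purely illustrative sketch of the kind of hook table this describes; the
struct and callback names below are hypothetical placeholders, not the ones
added by this patch:

 struct usb_audio_vendor_ops_example {		/* hypothetical name */
	int (*connect)(struct usb_interface *intf);	/* init/handshake with the co-processor */
	void (*disconnect)(struct usb_interface *intf);
	int (*set_rate)(struct usb_interface *intf, int iface, int rate, int alt);
	int (*pcm_open)(struct usb_interface *intf, int direction);
	int (*pcm_close)(struct usb_interface *intf, int direction);
 };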

Bug: 156315379
Signed-off-by: Oh Eomji <eomji.oh@samsung.com>
Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
[rework api to be a bit more self-contained and obvious - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I39061f6cc85be7bcae8db0e612fe58396bdedb24
2023-06-14 23:02:42 +00:00
Greg Kroah-Hartman
2c6f80378c ANDROID: GKI: USB: XHCI: add Android ABI padding to struct xhci_driver_overrides
Given that the vendors like to hook the xhci platform driver to handle
offload functionality that is not yet upstream, add some more padding to
struct xhci_driver_overrides in order to be able to handle any future
problems more easily.
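
A sketch of the padding being added; ANDROID_KABI_RESERVE() expands to the
u64 placeholders visible in the ABI report above:

 struct xhci_driver_overrides {
	/* ... existing override callbacks ... */
	ANDROID_KABI_RESERVE(1);
	ANDROID_KABI_RESERVE(2);
	ANDROID_KABI_RESERVE(3);
	ANDROID_KABI_RESERVE(4);
 };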

Bug: 151154716
Bug: 182336717
Cc: Daehwan Jung <dh10.jung@samsung.com>
Cc: JaeHun Jung <jh0801.jung@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaa59f63b0777c7671292bd0839a6eb8f57bc7a59
2023-06-14 23:02:41 +00:00
Daehwan Jung
cd3b5ff535 ANDROID: usb: host: add address_device to xhci overrides
The co-processor needs some information about the connected usb device.
It is proper to pass this information after the usb device gets its
address from the "Set Address" command.
The xhci overrides allow vendors to implement this.

There are several power scenarios depending on the vendor.
This gives vendors the flexibility to meet their power requirements:
they can override suspend and resume of the root hub.
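
A hedged sketch of how a vendor xHCI glue driver could fill in the new
overrides (member names per the ABI report above; the vendor functions are
hypothetical):

 static const struct xhci_driver_overrides example_overrides = {
	.address_device = example_xhci_address_device,	/* pass slot info to the co-processor */
	.bus_suspend    = example_xhci_bus_suspend,
	.bus_resume     = example_xhci_bus_resume,
 };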

Bug: 183761108
Change-Id: I51e4d190a6a110f987139d394621590fa40ea6a6
Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
Eric Dumazet
e3ff5d6bf0 UPSTREAM: bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
[ Upstream commit b320a45638296b63be8d9a901ca8bc43716b1ae1 ]

syzbot found sk_psock(sk) could return NULL when called
from sk_psock_verdict_data_ready().

Just make sure to handle this case.
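
A minimal sketch of the guard (simplified; the actual fix performs the
lookup under the RCU read lock):

 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
	struct sk_psock *psock = sk_psock(sk);

	if (unlikely(!psock))
		return;		/* socket already detached from the sockmap */

	/* ... run read_skb()/verdict handling as before ... */
 }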

[1]
general protection fault, probably for non-canonical address 0xdffffc000000005c: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000002e0-0x00000000000002e7]
CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.4.0-rc3-syzkaller-00588-g4781e965e655 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023
RIP: 0010:sk_psock_verdict_data_ready+0x19f/0x3c0 net/core/skmsg.c:1213
Code: 4c 89 e6 e8 63 70 5e f9 4d 85 e4 75 75 e8 19 74 5e f9 48 8d bb e0 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 07 02 00 00 48 89 ef ff 93 e0 02 00 00 e8 29 fd
RSP: 0018:ffffc90000147688 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000100
RDX: 000000000000005c RSI: ffffffff8825ceb7 RDI: 00000000000002e0
RBP: ffff888076518c40 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000008000 R15: ffff888076518c40
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f901375bab0 CR3: 000000004bf26000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcp_data_ready+0x10a/0x520 net/ipv4/tcp_input.c:5006
tcp_data_queue+0x25d3/0x4c50 net/ipv4/tcp_input.c:5080
tcp_rcv_established+0x829/0x1f90 net/ipv4/tcp_input.c:6019
tcp_v4_do_rcv+0x65a/0x9c0 net/ipv4/tcp_ipv4.c:1726
tcp_v4_rcv+0x2cbf/0x3340 net/ipv4/tcp_ipv4.c:2148
ip_protocol_deliver_rcu+0x9f/0x480 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x2ec/0x520 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:303 [inline]
NF_HOOK include/linux/netfilter.h:297 [inline]
ip_local_deliver+0x1ae/0x200 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:468 [inline]
ip_rcv_finish+0x1cf/0x2f0 net/ipv4/ip_input.c:449
NF_HOOK include/linux/netfilter.h:303 [inline]
NF_HOOK include/linux/netfilter.h:297 [inline]
ip_rcv+0xae/0xd0 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5491
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5605
process_backlog+0x101/0x670 net/core/dev.c:5933
__napi_poll+0xb7/0x6f0 net/core/dev.c:6499
napi_poll net/core/dev.c:6566 [inline]
net_rx_action+0x8a9/0xcb0 net/core/dev.c:6699
__do_softirq+0x1d4/0x905 kernel/softirq.c:571
run_ksoftirqd kernel/softirq.c:939 [inline]
run_ksoftirqd+0x31/0x60 kernel/softirq.c:931
smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164
kthread+0x344/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>

Fixes: 6df7f764cd3c ("bpf, sockmap: Wake up polling after data copy")
Reported-by: syzbot <syzkaller@googlegroups.com>
Change-Id: I7c0f888b35987f8019088e9232fbe0f0491f661b
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230530195149.68145-1-edumazet@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 898c9a0ee7)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
John Fastabend
07873e75c6 UPSTREAM: bpf, sockmap: Incorrectly handling copied_seq
[ Upstream commit e5c6de5fa025882babf89cecbed80acf49b987fa ]

The read_skb() logic is incrementing the tcp->copied_seq which is used,
among other things, for calculating how many outstanding bytes can be read
by the application. This results in application errors: if the application
does an ioctl(FIONREAD) we return zero because this is calculated from
the copied_seq value.

To fix this we move tcp->copied_seq accounting into the recv handler so
that we update these when the recvmsg() hook is called and data is in
fact copied into user buffers. This gives an accurate FIONREAD value
as expected and improves ACK handling. Before we were calling the
tcp_rcv_space_adjust() which would update 'number of bytes copied to
user in last RTT' which is wrong for programs returning SK_PASS. The
bytes are only copied to the user when recvmsg is handled.

Doing the fix for recvmsg is straightforward, but fixing redirect and
SK_DROP pkts is a bit trickier. Build a tcp_psock_eat() helper and then
call this from skmsg handlers. This fixes another issue where a broken
socket with a BPF program doing a resubmit could hang the receiver. This
happened because although read_skb() consumed the skb through sock_drop()
it did not update the copied_seq. Now if a single recv socket is
redirecting to many sockets (for example for lb) the receiver sk will be
hung even though we might expect it to continue. The hang comes from
not updating the copied_seq numbers and memory pressure resulting from
that.

We have a slight layering problem of calling tcp_eat_skb even if it's not
a TCP socket. To fix we could refactor and create per type receiver
handlers. I decided this is more work than we want in the fix and we
already have some small tweaks depending on caller that use the
helper skb_bpf_strparser(). So we extend that a bit and always set
the strparser bit when it is in use and then we can gate the
seq_copied updates on this.
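
A hedged sketch of the helper shape described above (simplified; the real
accounting also adjusts receive-buffer state): copied_seq is only advanced
for skbs that did not go through the strparser, since strparser traffic is
accounted elsewhere.

 static void tcp_psock_eat(struct sock *sk, struct sk_buff *skb)
 {
	struct tcp_sock *tp = tcp_sk(sk);

	if (skb_bpf_strparser(skb))
		return;		/* gated on the strparser bit, as described above */

	WRITE_ONCE(tp->copied_seq, tp->copied_seq + skb->len);
 }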

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Change-Id: I8dc204d02e26975f8133d7e4d777b2194e30a6aa
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit fe735073a5)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
John Fastabend
e218734b1b UPSTREAM: bpf, sockmap: Wake up polling after data copy
[ Upstream commit 6df7f764cd3cf5a03a4a47b23be47e57e41fcd85 ]

When TCP stack has data ready to read sk_data_ready() is called. Sockmap
overwrites this with its own handler to call into BPF verdict program.
But, the original TCP socket had sock_def_readable that would additionally
wake up any user space waiters with sk_wake_async().

Sockmap saved the callback when the socket was created so call the saved
data ready callback and then we can wake up any epoll() logic waiting
on the read.

Note we call on 'copied >= 0' to account for returning 0 when a FIN is
received because we need to wake up user for this as well so they
can do the recvmsg() -> 0 and detect the shutdown.
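
A minimal sketch of the wake-up described above (simplified):

 copied = sock->ops->read_skb(sk, sk_psock_verdict_recv);
 if (copied >= 0) {
	struct sk_psock *psock = sk_psock(sk);

	if (psock)
		psock->saved_data_ready(sk);	/* original callback wakes poll/epoll waiters */
 }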

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Change-Id: Idf56c7acfeb25791dc6e5f42dce2e64b09d55cf9
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-8-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit dd628fc697)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
John Fastabend
f9cc0b7f9b UPSTREAM: bpf, sockmap: TCP data stall on recv before accept
[ Upstream commit ea444185a6bf7da4dd0df1598ee953e4f7174858 ]

A common mechanism to put a TCP socket into the sockmap is to hook the
BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program
that can map the socket info to the correct BPF verdict parser. When
the user adds the socket to the map the psock is created and the new
ops are assigned to ensure the verdict program will 'see' the sk_buffs
as they arrive.

Part of this process hooks the sk_data_ready op with a BPF specific
handler to wake up the BPF verdict program when data is ready to read.
The logic is simple enough (posted here for easy reading)

 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
	struct socket *sock = sk->sk_socket;

	if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
		return;
	sock->ops->read_skb(sk, sk_psock_verdict_recv);
 }

The oversight here is sk->sk_socket is not assigned until the application
accepts() the new socket. However, it's entirely ok for the peer application
to do a connect() followed immediately by sends. The socket on the receiver
is sitting on the backlog queue of the listening socket until it's accepted
and the data is queued up. If the peer never accepts the socket or is slow
it will eventually hit data limits and rate limit the session. But,
importantly for BPF sockmap hooks, when this data is received the TCP stack
does the sk_data_ready() call but the read_skb() for this data is never
called because sk_socket is missing. The data sits on the sk_receive_queue.

Then once the socket is accepted if we never receive more data from the
peer there will be no further sk_data_ready calls and all the data
is still on the sk_receive_queue(). Then user calls recvmsg after accept()
and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler.
The handler checks for data in the sk_msg ingress queue expecting that
the BPF program has already run from the sk_data_ready hook and enqueued
the data as needed. So we are stuck.

To fix, do an unlikely check in the recvmsg handler for data on the
sk_receive_queue and, if it exists, wake up data_ready. We have the sock
locked in both read_skb and recvmsg, so we should avoid having multiple
runners.
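
A minimal sketch of the unlikely check described above, as it would sit in
the sockmap recvmsg handler (simplified):

 if (unlikely(!skb_queue_empty(&sk->sk_receive_queue)) &&
     sk_psock_queue_empty(psock))
	sk->sk_data_ready(sk);	/* rerun the verdict path on the queued data */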

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Change-Id: I82bc3eafce486a816cf8dfada1939128922ae174
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ab90b68f65)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
John Fastabend
028591f2c8 UPSTREAM: bpf, sockmap: Handle fin correctly
[ Upstream commit 901546fd8f9ca4b5c481ce00928ab425ce9aacc0 ]

The sockmap code is returning EAGAIN after a FIN packet is received and no
more data is on the receive queue. Correct behavior is to return 0 to the
user and the user can then close the socket. The EAGAIN causes many apps
to retry, which masks the problem. Eventually the socket is evicted from
the sockmap because it's released from sockmap sock free handling. The
issue creates a delay and can cause some errors on the application side.

To fix this, check on the sk_msg_recvmsg side if the length is zero and the
FIN flag is set, and then set the return value to zero. A selftest will be
added to check this condition.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Change-Id: I26d941790b9742534370c0447fd4a92cab55c32e
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-6-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3a2129ebae)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:41 +00:00
John Fastabend
e69ad7c838 UPSTREAM: bpf, sockmap: Improved check for empty queue
[ Upstream commit 405df89dd52cbcd69a3cd7d9a10d64de38f854b2 ]

We noticed some rare sk_buffs were stepping past the queue when system was
under memory pressure. The general theory is to skip enqueueing
sk_buffs when its not necessary which is the normal case with a system
that is properly provisioned for the task, no memory pressure and enough
cpu assigned.

But, if we can't allocate memory due to an ENOMEM error when enqueueing
the sk_buff into the sockmap receive queue we push it onto a delayed
workqueue to retry later. When a new sk_buff is received we then check
if that queue is empty. However, there is a problem with simply checking
the queue length. When a sk_buff is being processed from the ingress queue
but not yet on the sockmap msg receive queue, it's possible to also recv
a sk_buff through the normal path. It will check the ingress queue, which is
zero and then skip ahead of the pkt being processed.

Previously we used sock lock from both contexts which made the problem
harder to hit, but not impossible.

To fix, instead of popping the skb from the queue entirely, we peek the
skb from the queue and do the copy there. This ensures checks of the
queue length are non-zero while skb is being processed. Then finally
when the entire skb has been copied to user space queue or another
socket we pop it off the queue. This way the queue length check allows
bypassing the queue only after the list has been completely processed.
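
A minimal sketch of the peek-then-pop flow described above (the ingress
list name follows sockmap convention; error handling is trimmed and
copy_ok is a placeholder):

 while ((skb = skb_peek(&psock->ingress_skb))) {
	/* copy to the user space queue or the redirect socket ... */
	if (!copy_ok)
		break;		/* leave the skb queued so the length check stays non-zero */
	skb_unlink(skb, &psock->ingress_skb);	/* fully processed: now remove it */
	kfree_skb(skb);
 }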

To reproduce the issue we run the NGINX compliance test with sockmap and
observe some flakes in our testing that we attributed to this issue.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Change-Id: I076ae2689caf17afbae7d4093139407d60cf4d0d
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ba4fec5bd6)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:40 +00:00
John Fastabend
ecfcbe21d7 UPSTREAM: bpf, sockmap: Reschedule is now done through backlog
[ Upstream commit bce22552f92ea7c577f49839b8e8f7d29afaf880 ]

Now that the backlog manages the reschedule() logic correctly we can drop
the partial fix to reschedule from recvmsg hook.

Rescheduling on recvmsg hook was added to address a corner case where we
still had data in the backlog state but had nothing to kick it and
reschedule the backlog worker to run and finish copying data out of the
state. This had a couple of limitations: first, it required user space to
kick it, introducing an unnecessary EBUSY and retry. Second, it only
handled the ingress case and egress redirects would still be hung.

With the correct fix, pushing the reschedule logic down to where the
ENOMEM error occurs, we can drop this partial fix.

Fixes: bec217197b ("skmsg: Schedule psock work if the cached skb exists on the psock")
Change-Id: Ibf8b70dbeca5122c2ef954504dbe44724456899e
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-4-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1e4e379ccd)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:40 +00:00
John Fastabend
42fcf3b6df UPSTREAM: bpf, sockmap: Convert schedule_work into delayed_work
[ Upstream commit 29173d07f79883ac94f5570294f98af3d4287382 ]

Sk_buffs are fed into sockmap verdict programs either from a strparser
(when the user might want to decide how framing of skb is done by attaching
another parser program) or directly through tcp_read_sock. The
tcp_read_sock is the preferred method for performance when the BPF logic is
a stream parser.

The flow for Cilium's common use case with a stream parser is,

 tcp_read_sock()
  sk_psock_verdict_recv
    ret = bpf_prog_run_pin_on_cpu()
    sk_psock_verdict_apply(sock, skb, ret)
     // if system is under memory pressure or app is slow we may
     // need to queue skb. Do this queuing through ingress_skb and
     // then kick timer to wake up handler
     skb_queue_tail(ingress_skb, skb)
     schedule_work(work);

The work queue is wired up to sk_psock_backlog(). This will then walk the
ingress_skb skb list that holds our sk_buffs that could not be handled,
but should be OK to run at some later point. However, it's possible that
the workqueue doing this work still hits an error when sending the skb.
When this happens the skbuff is requeued on a temporary 'state' struct
kept with the workqueue. This is necessary because it's possible to
partially send an skbuff before hitting an error and we need to know how
and where to restart when the workqueue runs next.

Now for the trouble, we don't rekick the workqueue. This can cause a
stall where the skbuff we just cached on the state variable might never
be sent. This happens when its the last packet in a flow and no further
packets come along that would cause the system to kick the workqueue from
that side.

To fix we could do a simple schedule_work(), but while under memory pressure
it makes sense to back off some instead of continuing to retry repeatedly. So
instead, convert schedule_work to schedule_delayed_work and add backoff
logic to reschedule from the backlog queue on errors. It's not obvious
though what a good backoff is, so use '1'.
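
A minimal sketch of the conversion described above (a delay of 1, as chosen
in the text):

 INIT_DELAYED_WORK(&psock->work, sk_psock_backlog);

 /* on the enqueue/error paths: */
 schedule_delayed_work(&psock->work, 1);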

In testing we observed some flakes while running the NGINX compliance test
with sockmap; we attributed these failed tests to this bug and the
subsequent issue.

From on-list discussion. This commit

 bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock")

was intended to address a similar race, but had a couple of cases it missed.
Most obviously, it only accounted for receiving traffic on the local socket,
so if redirecting into another socket we could still get an sk_buff stuck
here. Next, it missed the case where copied=0 in the recv() handler and
then we wouldn't kick the scheduler. Also, it's sub-optimal to require
userspace to kick the internal mechanisms of sockmap to wake it up and
copy data to user. It results in an extra syscall and requires the app
to actually handle the EAGAIN correctly.

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Change-Id: I61dbe914b0abf5f0f7e16f95d246c8e4fa0f5afa
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9f4d7efb33)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:40 +00:00
John Fastabend
a59051006b UPSTREAM: bpf, sockmap: Pass skb ownership through read_skb
[ Upstream commit 78fa0d61d97a728d306b0c23d353c0e340756437 ]

The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because we incremented the users refcnt from sk_psock_verdict_recv() we
hit the BUG_ON() with refcnt > 1 and trip it.

To fix, let's simply pass ownership of the sk_buff through the read_skb
call. Then we can drop the consume from the read_skb handlers and assume
the verdict recv does any required kfree.
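
A minimal sketch of the ownership contract described above (simplified
fragment):

 /* read_skb() no longer takes a reference or consume_skb()s afterwards: */
 ret = recv_actor(sk, skb);	/* skb ownership passes to the verdict path */
 /* the verdict recv path frees the skb once it is done with it */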

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[  106.536188] ------------[ cut here ]------------
[  106.536197] kernel BUG at net/core/skbuff.c:1693!
[  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  106.542255] Call Trace:
[  106.542383]  <IRQ>
[  106.542487]  __pskb_pull_tail+0x4b/0x3e0
[  106.542681]  skb_ensure_writable+0x85/0xa0
[  106.542882]  sk_skb_pull_data+0x18/0x20
[  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[  106.543536]  ? migrate_disable+0x66/0x80
[  106.543871]  sk_psock_verdict_recv+0xe2/0x310
[  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
[  106.544561]  tcp_read_skb+0x7b/0x120
[  106.544740]  tcp_data_queue+0x904/0xee0
[  106.544931]  tcp_rcv_established+0x212/0x7c0
[  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
[  106.545326]  tcp_v4_rcv+0xe70/0xf60
[  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
[  106.545744]  ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Change-Id: I0dadf18f695e4305ba1043a7fbec7ef3f58baba7
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4ae2af3e59)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 23:02:40 +00:00
Elliot Berman
86409bb4e1 ANDROID: virt: gunyah: Sync with latest Gunyah patches
Sync changes to Gunyah stack to align with latest changes
posted to kernel.org:

https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@quicinc.com/

Bug: 287037804
Change-Id: Ia36044894860bb94ff5518cf304254cdad14aaf5
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
2023-06-14 22:02:31 +00:00
Elliot Berman
705a9b5feb ANDROID: virt: gunyah: Sync with latest documentation and sample
Sync with latest documentation and sample code from v14 of Gunyah
patches:

https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@quicinc.com/

Bug: 287037804
Change-Id: I8893922e6b8096fdd5dff1b22ebce96e72cdb7c3
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
2023-06-14 22:02:31 +00:00
Howard Yen
60662882b7 FROMLIST: usb: xhci-plat: add xhci_plat_priv_overwrite
Add a platform specific overwrite callback for setting up the
xhci_vendor_ops, allowing a vendor to store its xhci_vendor_ops and
have them applied when xhci_plat_probe is invoked.

This change depends on the commit in this patch series
("usb: host: add xhci hooks for USB offload"): the vendor needs
to invoke xhci_plat_register_vendor_ops() to register its vendor specific
vendor_ops, and those vendor_ops will overwrite the vendor_ops inside
xhci_plat_priv in xhci_vendor_init() during xhci-plat-hcd probe.
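
A hedged sketch of the vendor side described above (the ops instance and
its callbacks are placeholders):

 static struct xhci_vendor_ops example_vendor_ops = {
	.vendor_init    = example_vendor_init,		/* hypothetical callbacks */
	.vendor_cleanup = example_vendor_cleanup,
 };

 static int __init example_offload_init(void)
 {
	return xhci_plat_register_vendor_ops(&example_vendor_ops);
 }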

Change-Id: I8030fe3bd274615f5926f19014c3a3e066ca9dba
Signed-off-by: Howard Yen <howardyen@google.com>
Bug: 175358363
Link: https://lore.kernel.org/r/20210119101044.1637023-1-howardyen@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
2023-06-14 17:35:00 +00:00
Howard Yen
6496f6cfbb ANDROID: usb: host: export symbols for xhci hooks usage
Export symbols for xhci hooks usage:
xhci_ring_free
	- Allow xhci hook to free xhci_ring.
xhci_get_slot_ctx
	- Allow xhci hook to get slot_ctx from the xhci_container_ctx
	for getting the slot_ctx information to know which slot is
	offloading and compare the context in remote subsystem memory
	if needed.
xhci_get_ep_ctx
	- Allow xhci hook to get ep_ctx from the xhci_container_ctx for
	getting the ep_ctx information to know which ep is offloading and
	comparing the context in remote subsystem memory if needed.

Export below xhci symbols for vendor modules to manage additional secondary rings.
These will be used to manage the secondary ring for usb audio offload.

xhci_segment_free
	- Free a segment struct.
xhci_remove_stream_mapping
	- Remove a ring's segments from the stream mapping.
xhci_link_segments
	- Make the prev segment point to the next segment.
xhci_initialize_ring_info
	- Initialize a ring struct.
xhci_check_trb_in_td_math
	- Check TRB math for validation.
xhci_address_device
	- Issue an address device command
xhci_bus_suspend
xhci_bus_resume
	- Suspend and resume for power scenario

Change-Id: I2d99bded67024b2a7c625f934567e39ac03a6e5f
Signed-off-by: Howard Yen <howardyen@google.com>
Bug: 175358363
Bug: 183761108
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
2023-06-14 17:35:00 +00:00
Howard Yen
90ab8e7f98 ANDROID: usb: host: add xhci hooks for USB offload
To enable support for USB offload, define "offload" in the usb controller
node of the device tree. The "offload" value can be used to determine which
type of offload has been enabled in the SoC.

For example:

&usbdrd_dwc3 {
	...
		/* support usb offloading, 0: disabled, 1: audio */
		offload = <1>;
	...
};

There are several vendor_ops introduced by this patch:

struct xhci_vendor_ops - function callbacks for vendor specific operations
{
	@vendor_init:
		- called for vendor init process during xhci-plat-hcd
		  probe.
	@vendor_cleanup:
		- called for vendor cleanup process during xhci-plat-hcd
		  remove.
	@is_usb_offload_enabled:
		- called to check if usb offload is enabled.
	@queue_irq_work:
		- called to queue vendor specific irq work.
	@alloc_dcbaa:
		- called when allocating vendor specific dcbaa during
		  memory initialization.
	@free_dcbaa:
		- called to free vendor specific dcbaa when cleaning up
		  the memory.
	@alloc_transfer_ring:
		- called when vendor specific transfer ring allocation is required
	@free_transfer_ring:
		- called to free vendor specific transfer ring
	@sync_dev_ctx:
		- called when synchronization for device context is required
	@usb_offload_skip_urb:
		- skip urb control for offloading
	@alloc_container_ctx:
	@free_container_ctx:
		- called to alloc and free vendor specific container context
}

The xhci hooks wrap the ops in xhci_vendor_ops with the prefix
"xhci_vendor_". For example, the vendor_init op will be invoked by the
xhci_vendor_init() hook, the is_usb_offload_enabled op will be invoked by
xhci_vendor_is_usb_offload_enabled(), and so on.
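
A hedged sketch of the hook shape this describes (the ops lookup helper
name is an assumption):

 static int xhci_vendor_init(struct xhci_hcd *xhci)
 {
	struct xhci_vendor_ops *ops = xhci_vendor_get_ops(xhci);	/* assumed lookup helper */

	if (ops && ops->vendor_init)
		return ops->vendor_init(xhci);
	return 0;
 }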

Change-Id: Ib7f6952e6d44a2fcfe9d19a78f1d9f5093417613
Signed-off-by: Howard Yen <howardyen@google.com>
Bug: 175358363
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Signed-off-by: Puma Hsu <pumahsu@google.com>
Signed-off-by: J. Avila <elavila@google.com>
Signed-off-by: Daehwan Jung <dh10.jung@samsung.com>
Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
2023-06-14 17:35:00 +00:00
Carlos Llamas
88959a53f4 ANDROID: 6/16/2023 KMI update
Set KMI_GENERATION=9 for 6/16 KMI update

function symbol 'struct block_device* I_BDEV(struct inode*)' changed
  CRC changed from 0xb3d19fd2 to 0xc8597fa

function symbol 'void __ClearPageMovable(struct page*)' changed
  CRC changed from 0x66921e4f to 0xb4e74d22

function symbol 'void __SetPageMovable(struct page*, const struct movable_operations*)' changed
  CRC changed from 0x2b34667d to 0xe8b6d861

... 4484 omitted; 4487 symbols have only CRC changes

type 'struct request' changed
  byte size changed from 312 to 320
  member 'u64 alloc_time_ns' was added
  19 members ('u64 start_time_ns' .. 'u64 android_kabi_reserved1') changed
    offset changed by 64

type 'struct bio' changed
  byte size changed from 152 to 160
  member 'u64 bi_iocost_cost' was added
  12 members ('struct bio_crypt_ctx* bi_crypt_context' .. 'struct bio_vec bi_inline_vecs[0]') changed
    offset changed by 64

type 'enum cpuhp_state' changed
  enumerator 'CPUHP_AP_ARM_SDEI_STARTING' (116) was removed
  enumerator 'CPUHP_AP_ARM_VFP_STARTING' value changed from 117 to 116
  enumerator 'CPUHP_AP_ARM64_DEBUG_MONITORS_STARTING' value changed from 118 to 117
  enumerator 'CPUHP_AP_PERF_ARM_HW_BREAKPOINT_STARTING' value changed from 119 to 118
  enumerator 'CPUHP_AP_PERF_ARM_ACPI_STARTING' value changed from 120 to 119
  enumerator 'CPUHP_AP_PERF_ARM_STARTING' value changed from 121 to 120
  enumerator 'CPUHP_AP_PERF_RISCV_STARTING' value changed from 122 to 121
  enumerator 'CPUHP_AP_ARM_L2X0_STARTING' value changed from 123 to 122
  enumerator 'CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING' value changed from 124 to 123
  enumerator 'CPUHP_AP_ARM_ARCH_TIMER_STARTING' value changed from 125 to 124
  enumerator 'CPUHP_AP_ARM_GLOBAL_TIMER_STARTING' value changed from 126 to 125
  enumerator 'CPUHP_AP_JCORE_TIMER_STARTING' value changed from 127 to 126
  enumerator 'CPUHP_AP_ARM_TWD_STARTING' value changed from 128 to 127
  enumerator 'CPUHP_AP_QCOM_TIMER_STARTING' value changed from 129 to 128
  enumerator 'CPUHP_AP_TEGRA_TIMER_STARTING' value changed from 130 to 129
  enumerator 'CPUHP_AP_ARMADA_TIMER_STARTING' value changed from 131 to 130
  enumerator 'CPUHP_AP_MARCO_TIMER_STARTING' value changed from 132 to 131
  enumerator 'CPUHP_AP_MIPS_GIC_TIMER_STARTING' value changed from 133 to 132
  enumerator 'CPUHP_AP_ARC_TIMER_STARTING' value changed from 134 to 133
  enumerator 'CPUHP_AP_RISCV_TIMER_STARTING' value changed from 135 to 134
  enumerator 'CPUHP_AP_CLINT_TIMER_STARTING' value changed from 136 to 135
  enumerator 'CPUHP_AP_CSKY_TIMER_STARTING' value changed from 137 to 136
  enumerator 'CPUHP_AP_TI_GP_TIMER_STARTING' value changed from 138 to 137
  enumerator 'CPUHP_AP_HYPERV_TIMER_STARTING' value changed from 139 to 138
  enumerator 'CPUHP_AP_KVM_STARTING' value changed from 140 to 139
  enumerator 'CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING' value changed from 141 to 140
  enumerator 'CPUHP_AP_KVM_ARM_VGIC_STARTING' value changed from 142 to 141
  enumerator 'CPUHP_AP_KVM_ARM_TIMER_STARTING' value changed from 143 to 142
  enumerator 'CPUHP_AP_DUMMY_TIMER_STARTING' value changed from 144 to 143
  enumerator 'CPUHP_AP_ARM_XEN_STARTING' value changed from 145 to 144
  enumerator 'CPUHP_AP_ARM_CORESIGHT_STARTING' value changed from 146 to 145
  enumerator 'CPUHP_AP_ARM_CORESIGHT_CTI_STARTING' value changed from 147 to 146
  enumerator 'CPUHP_AP_ARM64_ISNDEP_STARTING' value changed from 148 to 147
  enumerator 'CPUHP_AP_SMPCFD_DYING' value changed from 149 to 148
  enumerator 'CPUHP_AP_X86_TBOOT_DYING' value changed from 150 to 149
  enumerator 'CPUHP_AP_ARM_CACHE_B15_RAC_DYING' value changed from 151 to 150
  enumerator 'CPUHP_AP_ONLINE' value changed from 152 to 151
  enumerator 'CPUHP_TEARDOWN_CPU' value changed from 153 to 152
  enumerator 'CPUHP_AP_ONLINE_IDLE' value changed from 154 to 153
  enumerator 'CPUHP_AP_SCHED_WAIT_EMPTY' value changed from 155 to 154
  enumerator 'CPUHP_AP_SMPBOOT_THREADS' value changed from 156 to 155
  enumerator 'CPUHP_AP_X86_VDSO_VMA_ONLINE' value changed from 157 to 156
  enumerator 'CPUHP_AP_IRQ_AFFINITY_ONLINE' value changed from 158 to 157
  enumerator 'CPUHP_AP_BLK_MQ_ONLINE' value changed from 159 to 158
  enumerator 'CPUHP_AP_ARM_MVEBU_SYNC_CLOCKS' value changed from 160 to 159
  enumerator 'CPUHP_AP_X86_INTEL_EPB_ONLINE' value changed from 161 to 160
  enumerator 'CPUHP_AP_PERF_ONLINE' value changed from 162 to 161
  enumerator 'CPUHP_AP_PERF_X86_ONLINE' value changed from 163 to 162
  enumerator 'CPUHP_AP_PERF_X86_UNCORE_ONLINE' value changed from 164 to 163
  enumerator 'CPUHP_AP_PERF_X86_AMD_UNCORE_ONLINE' value changed from 165 to 164
  enumerator 'CPUHP_AP_PERF_X86_AMD_POWER_ONLINE' value changed from 166 to 165
  enumerator 'CPUHP_AP_PERF_X86_RAPL_ONLINE' value changed from 167 to 166
  enumerator 'CPUHP_AP_PERF_X86_CQM_ONLINE' value changed from 168 to 167
  enumerator 'CPUHP_AP_PERF_X86_CSTATE_ONLINE' value changed from 169 to 168
  enumerator 'CPUHP_AP_PERF_X86_IDXD_ONLINE' value changed from 170 to 169
  enumerator 'CPUHP_AP_PERF_S390_CF_ONLINE' value changed from 171 to 170
  enumerator 'CPUHP_AP_PERF_S390_SF_ONLINE' value changed from 172 to 171
  enumerator 'CPUHP_AP_PERF_ARM_CCI_ONLINE' value changed from 173 to 172
  enumerator 'CPUHP_AP_PERF_ARM_CCN_ONLINE' value changed from 174 to 173
  enumerator 'CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE' value changed from 175 to 174
  enumerator 'CPUHP_AP_PERF_ARM_HISI_DDRC_ONLINE' value changed from 176 to 175
  enumerator 'CPUHP_AP_PERF_ARM_HISI_HHA_ONLINE' value changed from 177 to 176
  enumerator 'CPUHP_AP_PERF_ARM_HISI_L3_ONLINE' value changed from 178 to 177
  enumerator 'CPUHP_AP_PERF_ARM_HISI_PA_ONLINE' value changed from 179 to 178
  enumerator 'CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE' value changed from 180 to 179
  enumerator 'CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE' value changed from 181 to 180
  enumerator 'CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE' value changed from 182 to 181
  enumerator 'CPUHP_AP_PERF_ARM_L2X0_ONLINE' value changed from 183 to 182
  enumerator 'CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE' value changed from 184 to 183
  enumerator 'CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE' value changed from 185 to 184
  enumerator 'CPUHP_AP_PERF_ARM_APM_XGENE_ONLINE' value changed from 186 to 185
  enumerator 'CPUHP_AP_PERF_ARM_CAVIUM_TX2_UNCORE_ONLINE' value changed from 187 to 186
  enumerator 'CPUHP_AP_PERF_ARM_MARVELL_CN10K_DDR_ONLINE' value changed from 188 to 187
  enumerator 'CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE' value changed from 189 to 188
  enumerator 'CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE' value changed from 190 to 189
  enumerator 'CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE' value changed from 191 to 190
  enumerator 'CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE' value changed from 192 to 191
  enumerator 'CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE' value changed from 193 to 192
  enumerator 'CPUHP_AP_PERF_POWERPC_HV_GPCI_ONLINE' value changed from 194 to 193
  enumerator 'CPUHP_AP_PERF_CSKY_ONLINE' value changed from 195 to 194
  enumerator 'CPUHP_AP_WATCHDOG_ONLINE' value changed from 196 to 195
  enumerator 'CPUHP_AP_WORKQUEUE_ONLINE' value changed from 197 to 196
  enumerator 'CPUHP_AP_RANDOM_ONLINE' value changed from 198 to 197
  enumerator 'CPUHP_AP_RCUTREE_ONLINE' value changed from 199 to 198
  enumerator 'CPUHP_AP_BASE_CACHEINFO_ONLINE' value changed from 200 to 199
  enumerator 'CPUHP_AP_ONLINE_DYN' value changed from 201 to 200
  enumerator 'CPUHP_AP_ONLINE_DYN_END' value changed from 231 to 230
  enumerator 'CPUHP_AP_MM_DEMOTION_ONLINE' value changed from 232 to 231
  enumerator 'CPUHP_AP_X86_HPET_ONLINE' value changed from 233 to 232
  enumerator 'CPUHP_AP_X86_KVM_CLK_ONLINE' value changed from 234 to 233
  enumerator 'CPUHP_AP_ACTIVE' value changed from 235 to 234
  enumerator 'CPUHP_ANDROID_RESERVED_1' value changed from 236 to 235
  enumerator 'CPUHP_ANDROID_RESERVED_2' value changed from 237 to 236
  enumerator 'CPUHP_ANDROID_RESERVED_3' value changed from 238 to 237
  enumerator 'CPUHP_ANDROID_RESERVED_4' value changed from 239 to 238
  enumerator 'CPUHP_ONLINE' value changed from 240 to 239

type 'struct task_struct' changed
  byte size changed from 4736 to 4800
  104 members ('const struct cred* ptracer_cred' .. 'struct thread_struct thread') changed
    offset changed by 384

type 'struct platform_driver' changed
  byte size changed from 240 to 248
  member 'void(* remove_new)(struct platform_device*)' was added
  8 members ('void(* shutdown)(struct platform_device*)' .. 'u64 android_kabi_reserved1') changed
    offset changed by 64

type 'struct tipc_bearer' changed
  member 'u16 encap_hlen' was added

type 'struct posix_cputimers_work' changed
  byte size changed from 24 to 72
  member 'struct mutex mutex' was added
  member 'unsigned int scheduled' changed
    offset changed by 384

type 'struct binder_alloc' changed
  member 'struct vm_area_struct* vma' was added
  member 'unsigned long vma_addr' was removed

type 'struct usb_udc' changed
  byte size changed from 1000 to 952
  member 'struct mutex connect_lock' was removed

type 'enum kvm_pgtable_prot' changed
  enumerator 'KVM_PGTABLE_PROT_PXN' (32) was added
  enumerator 'KVM_PGTABLE_PROT_UXN' (64) was added

Bug: 287162457
Change-Id: Ic3aad43bd3a6083cf91e71e79ece713bef0e8172
Signed-off-by: Carlos Llamas <cmllamas@google.com>
2023-06-14 16:40:59 +00:00
Carlos Llamas
21bc72f339 UPSTREAM: binder: fix UAF of alloc->vma in race with munmap()
commit d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07 upstream.

[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix
  UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed
  in mainline after the revert of commit a43cfc87ca ("android: binder:
  stop saving a pointer to the VMA") as pointed out by Liam. The commit
  log and tags have been tweaked to reflect this. ]

In commit 720c241924 ("ANDROID: binder: change down_write to
down_read") binder assumed the mmap read lock is sufficient to protect
alloc->vma inside binder_update_page_range(). This used to be accurate
until commit dd2283f260 ("mm: mmap: zap pages with read mmap_sem in
munmap"), which now downgrades the mmap_lock after detaching the vma
from the rbtree in munmap(). Then it proceeds to tear down and free the
vma with only the read lock held.

This means that accesses to alloc->vma in binder_update_page_range() now
will race with vm_area_free() in munmap() and can cause a UAF as shown
in the following KASAN trace:

  ==================================================================
  BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
  Read of size 8 at addr ffff16204ad00600 by task server/558

  CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x2a0
   show_stack+0x18/0x2c
   dump_stack+0xf8/0x164
   print_address_description.constprop.0+0x9c/0x538
   kasan_report+0x120/0x200
   __asan_load8+0xa0/0xc4
   vm_insert_page+0x7c/0x1f0
   binder_update_page_range+0x278/0x50c
   binder_alloc_new_buf+0x3f0/0xba0
   binder_transaction+0x64c/0x3040
   binder_thread_write+0x924/0x2020
   binder_ioctl+0x1610/0x2e5c
   __arm64_sys_ioctl+0xd4/0x120
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0

  Allocated by task 559:
   kasan_save_stack+0x38/0x6c
   __kasan_kmalloc.constprop.0+0xe4/0xf0
   kasan_slab_alloc+0x18/0x2c
   kmem_cache_alloc+0x1b0/0x2d0
   vm_area_alloc+0x28/0x94
   mmap_region+0x378/0x920
   do_mmap+0x3f0/0x600
   vm_mmap_pgoff+0x150/0x17c
   ksys_mmap_pgoff+0x284/0x2dc
   __arm64_sys_mmap+0x84/0xa4
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0

  Freed by task 560:
   kasan_save_stack+0x38/0x6c
   kasan_set_track+0x28/0x40
   kasan_set_free_info+0x24/0x4c
   __kasan_slab_free+0x100/0x164
   kasan_slab_free+0x14/0x20
   kmem_cache_free+0xc4/0x34c
   vm_area_free+0x1c/0x2c
   remove_vma+0x7c/0x94
   __do_munmap+0x358/0x710
   __vm_munmap+0xbc/0x130
   __arm64_sys_munmap+0x4c/0x64
   el0_svc_common.constprop.0+0xac/0x270
   do_el0_svc+0x38/0xa0
   el0_svc+0x1c/0x2c
   el0_sync_handler+0xe8/0x114
   el0_sync+0x180/0x1c0

  [...]
  ==================================================================

To prevent the race above, revert to taking the mmap write lock inside
binder_update_page_range(). One might expect an increase in mmap lock
contention. However, binder already serializes these calls via the top
level alloc->mutex. Also, there was no performance impact shown when
running the binder benchmark tests.
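
Roughly, the locking change described above amounts to the following
sketch (not the literal diff; the alloc->mm field name follows the later
binder_alloc rework and is assumed here):

	struct mm_struct *mm = NULL;
	struct vm_area_struct *vma = NULL;

	if (mmget_not_zero(alloc->mm))		/* assumption: alloc->mm per the later rework */
		mm = alloc->mm;

	if (mm) {
		mmap_write_lock(mm);		/* was mmap_read_lock(); closes the race with munmap() */
		vma = alloc->vma;
	}

	/* ... vm_insert_page() on the buffer pages runs here ... */

	if (mm) {
		mmap_write_unlock(mm);
		mmput(mm);
	}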

Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"")
Fixes: dd2283f260 ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver
Cc: <stable@vger.kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Change-Id: I4215750a81e94bccf5340e4d79f7b26bb039c573
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 931ea1ed31)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Carlos Llamas
62c6dbdccd UPSTREAM: binder: add lockless binder_alloc_(set|get)_vma()
commit 0fa53349c3acba0239369ba4cd133740a408d246 upstream.

Bring back the original lockless design in binder_alloc to determine
whether the buffer setup has been completed by the ->mmap() handler.
However, this time use smp_load_acquire() and smp_store_release() to
wrap all the ordering in a single macro call.

Also, add comments to make it evident that binder uses alloc->vma to
determine when the binder_alloc has been fully initialized. In these
scenarios acquiring the mmap_lock is not required.
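
A sketch of the accessors this describes, assuming alloc->vma is the
struct vm_area_struct pointer shown in the ABI report above (close to,
but not necessarily identical to, the upstream helpers):

static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
					struct vm_area_struct *vma)
{
	/* pairs with smp_load_acquire in binder_alloc_get_vma() */
	smp_store_release(&alloc->vma, vma);
}

static inline struct vm_area_struct *binder_alloc_get_vma(struct binder_alloc *alloc)
{
	/* pairs with smp_store_release in binder_alloc_set_vma() */
	return smp_load_acquire(&alloc->vma);
}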

Fixes: a43cfc87ca ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Change-Id: I2a8040417790b6b82bf44e838146fd68403fdb51
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d7cee853bc)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Carlos Llamas
3cac174682 UPSTREAM: Revert "android: binder: stop saving a pointer to the VMA"
commit c0fd2101781ef761b636769b2f445351f71c3626 upstream.

This reverts commit a43cfc87ca.

This patch fixed an issue reported by syzkaller in [1]. However, this
turned out to be only a band-aid in binder. The root cause, as bisected
by syzkaller, was fixed by commit 5789151e48 ("mm/mmap: undo ->mmap()
when mas_preallocate() fails"). We no longer need the patch for binder.

Reverting such patch allows us to have a lockless access to alloc->vma
in specific cases where the mmap_lock is not required. This approach
avoids the contention that caused a performance regression.

[1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com

[cmllamas: resolved conflicts with the rework of alloc->mm and the removal
 of binder_alloc_set_vma(); also fixed the comment section]

Fixes: a43cfc87ca ("android: binder: stop saving a pointer to the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Change-Id: I208b4ebf832790eb155d52ec3115e1e6c58f6f80
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 72a94f8c14)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Carlos Llamas
dadb40b436 UPSTREAM: Revert "binder_alloc: add missing mmap_lock calls when using the VMA"
commit b15655b12ddca7ade09807f790bafb6fab61b50a upstream.

This reverts commit 44e602b4e5.

This caused a performance regression particularly when pages are getting
reclaimed. We don't need to acquire the mmap_lock to determine when the
binder buffer has been fully initialized. A subsequent patch will bring
back the lockless approach for this.

[cmllamas: resolved trivial conflicts with renaming of alloc->mm]

Fixes: 44e602b4e5 ("binder_alloc: add missing mmap_lock calls when using the VMA")
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org
Change-Id: If26447c08c59fbbc43731ecbd8b501c928ffbe2d
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20230502201220.1756319-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7e6b854854)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Xin Long
fcdbf469c5 UPSTREAM: tipc: check the bearer min mtu properly when setting it by netlink
[ Upstream commit 35a089b5d793d2bfd2cc7cfa6104545184de2ce7 ]

Checking the bearer min mtu with tipc_udp_mtu_bad() only works for the
IPv4 UDP bearer; the IPv6 UDP bearer has a different min mtu. This
patch checks against encap_hlen + TIPC_MIN_BEARER_MTU for the min mtu
instead, which works for both IPv4 and IPv6 UDP bearers.

Note that tipc_udp_mtu_bad() is still used to check media min mtu
in __tipc_nl_media_set(), as m->mtu currently is only used by the
IPv4 UDP bearer as its default mtu value.
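
A hedged sketch of the netlink-path check described above (the exact
call site, attribute handling and error reporting in the real patch may
differ):

	if (b->media->type_id == TIPC_MEDIA_TYPE_UDP &&
	    nla_get_u32(props[TIPC_NLA_PROP_MTU]) <
			b->encap_hlen + TIPC_MIN_BEARER_MTU)
		return -EINVAL;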

Fixes: 682cd3cf94 ("tipc: confgiure and apply UDP bearer MTU on running links")
Change-Id: I384afae6ffa9c43f72c1cda34ad2f1dd611fc675
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f215b62f59)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Xin Long
e48a801737 UPSTREAM: tipc: do not update mtu if msg_max is too small in mtu negotiation
[ Upstream commit 56077b56cd3fb78e1c8619e29581ba25a5c55e86 ]

When doing link mtu negotiation, a malicious peer may send an Activate
msg with a very small mtu, e.g. 4 in Shuang's testing. Without a check
for the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(),
then n->links[bearer_id].mtu is set to 4294967228, which is an overflow
of '4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss().

With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning:

 tipc: Too large msg, purging xmit list 1 5 0 40 4!
 tipc: Too large msg, purging xmit list 1 15 0 60 4!

And with tipc_link_entry.mtu 4294967228, a huge skb was allocated in
named_distribute(), and purging it in tipc_link_xmit() even caused a
crash:

  general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19
  RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0
  Call Trace:
   <IRQ>
   skb_release_data+0xf9/0x1d0
   kfree_skb_reason+0x40/0x100
   tipc_link_xmit+0x57a/0x740 [tipc]
   tipc_node_xmit+0x16c/0x5c0 [tipc]
   tipc_named_node_up+0x27f/0x2c0 [tipc]
   tipc_node_write_unlock+0x149/0x170 [tipc]
   tipc_rcv+0x608/0x740 [tipc]
   tipc_udp_recv+0xdc/0x1f0 [tipc]
   udp_queue_rcv_one_skb+0x33e/0x620
   udp_unicast_rcv_skb.isra.72+0x75/0x90
   __udp4_lib_rcv+0x56d/0xc20
   ip_protocol_deliver_rcu+0x100/0x2d0

This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(),
and not updating mtu if it is too small.
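
The guard described above amounts to something like the following in the
protocol receive path (a sketch; the variable name, surrounding control
flow and the exact tipc_bearer_min_mtu() signature are assumptions):

	/* msg_max is the peer-advertised value; only adopt it if it is sane */
	if (msg_max >= tipc_bearer_min_mtu(l->net, l->bearer_id))
		l->mtu = msg_max;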

Fixes: ed193ece26 ("tipc: simplify link mtu negotiation")
Reported-by: Shuang Li <shuali@redhat.com>
Change-Id: I95f28cbfaf6dc4899e0695ba6168c7c58737f06b
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 259683001d)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Xin Long
461038ba5c UPSTREAM: tipc: add tipc_bearer_min_mtu to calculate min mtu
[ Upstream commit 3ae6d66b605be604644d4bb5708a7ffd9cf1abe8 ]

As different media may require different min mtus, and even the same
medium with a different net family may require a different min mtu, add
tipc_bearer_min_mtu() to calculate the min mtu accordingly.

This API will be used to check the new mtu when doing the link
mtu negotiation in the next patch.
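
A minimal sketch of what such a helper could look like, assuming the
encap_hlen field added to struct tipc_bearer (see the ABI report
earlier) and an internal RCU-protected bearer_get() lookup; the real
implementation may differ:

int tipc_bearer_min_mtu(struct net *net, u32 bearer_id)
{
	int mtu = TIPC_MIN_BEARER_MTU;
	struct tipc_bearer *b;

	rcu_read_lock();
	b = bearer_get(net, bearer_id);		/* assumption: RCU-protected lookup */
	if (b)
		mtu += b->encap_hlen;		/* IPv6 UDP needs a larger minimum */
	rcu_read_unlock();

	return mtu;
}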

Change-Id: I960cf07506388294eb6028938025e1073a2c4be5
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 56077b56cd3f ("tipc: do not update mtu if msg_max is too small in mtu negotiation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 735c64ea88)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Francesco Dolcini
d0be9e79ee UPSTREAM: Revert "usb: gadget: udc: core: Invoke usb_gadget_connect only when started"
commit f22e9b67f19ccc73de1ae04375d4b30684e261f8 upstream.

This reverts commit 0db213ea8eed5534a5169e807f28103cbc9d23df.

It introduces an issue where configuring the USB gadget hangs forever,
at least on multiple Qualcomm and NXP i.MX SoCs.

Cc: stable@vger.kernel.org
Fixes: 0db213ea8eed ("usb: gadget: udc: core: Invoke usb_gadget_connect only when started")
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Reported-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://lore.kernel.org/all/ZF4BvgsOyoKxdPFF@francesco-nb.int.toradex.com/
Change-Id: I2a294aedee1ca56b293db30fc7d9258e92e61372
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://lore.kernel.org/r/20230512131435.205464-3-francesco@dolcini.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ea56ede911)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Shengjiu Wang
66a5c03404 UPSTREAM: ASoC: fsl_micfil: Fix error handler with pm_runtime_enable
[ Upstream commit 17955aba7877a4494d8093ae5498e19469b01d57 ]

There is an error message when a deferred probe happens:

fsl-micfil-dai 30ca0000.micfil: Unbalanced pm_runtime_enable!

Fix the error handler with pm_runtime_enable and add
fsl_micfil_remove() for pm_runtime_disable.
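
The pattern being described is roughly the following (a sketch; the
registration call is a stand-in, not the literal driver code):

static int fsl_micfil_probe(struct platform_device *pdev)
{
	int ret;

	pm_runtime_enable(&pdev->dev);

	ret = register_micfil_component(pdev);	/* stand-in for the real registration */
	if (ret)
		pm_runtime_disable(&pdev->dev);	/* keep enable/disable balanced on defer */

	return ret;
}

static int fsl_micfil_remove(struct platform_device *pdev)
{
	pm_runtime_disable(&pdev->dev);
	return 0;
}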

Fixes: 47a70e6fc9 ("ASoC: Add MICFIL SoC Digital Audio Interface driver.")
Change-Id: I292d01a821e595076795be3088b2b816251a700f
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://lore.kernel.org/r/1683540996-6136-1-git-send-email-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ce6c7befc2)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Uwe Kleine-König
6e721f991f UPSTREAM: platform: Provide a remove callback that returns no value
[ Upstream commit 5c5a7680e67ba6fbbb5f4d79fa41485450c1985c ]

struct platform_driver::remove returning an integer made driver authors
expect that returning an error code was proper error handling. However,
the driver core ignores the error and continues to remove the device,
because there is nothing the core could do anyhow, and reentering the
remove callback again is only calling for trouble.

So this is a source of errors, typically yielding resource leaks in the
error path.

As there are too many platform drivers to neatly convert them all to
return void in a single go, do it in several steps after this patch:

 a) Convert all drivers to implement .remove_new() returning void instead
    of .remove() returning int;
 b) Change struct platform_driver::remove() to return void and so make
    it identical to .remove_new();
 c) Change all drivers back to .remove() now with the better prototype;
 d) drop struct platform_driver::remove_new().

While this touches all drivers eventually twice, steps a) and c) can be
done one driver after another and so reduces coordination efforts
immensely and simplifies review.
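
For illustration, step a) for a hypothetical driver looks like this
("foo" is made up, not a driver touched by this patch):

#include <linux/module.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	return 0;
}

static void foo_remove(struct platform_device *pdev)
{
	/* release resources; there is no meaningful error code to return */
}

static struct platform_driver foo_driver = {
	.probe		= foo_probe,
	.remove_new	= foo_remove,
	.driver		= {
		.name	= "foo-example",
	},
};
module_platform_driver(foo_driver);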

Change-Id: I7da6828a301462bad53470cf94db94d55ac51d37
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20221209150914.3557650-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 17955aba7877 ("ASoC: fsl_micfil: Fix error handler with pm_runtime_enable")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9d3ac384cb)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Pierre Gondois
07a8c09137 UPSTREAM: firmware: arm_sdei: Fix sleep from invalid context BUG
[ Upstream commit d2c48b2387eb89e0bf2a2e06e30987cf410acad4 ]

Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra
triggers:

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0
  preempt_count: 0, expected: 0
  RCU nest depth: 0, expected: 0
  3 locks held by cpuhp/0/24:
    #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
    #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248
    #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130
  irq event stamp: 36
  hardirqs last  enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0
  hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248
  softirqs last  enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...]
  Hardware name: WIWYNN Mt.Jade Server [...]
  Call trace:
    dump_backtrace+0x114/0x120
    show_stack+0x20/0x70
    dump_stack_lvl+0x9c/0xd8
    dump_stack+0x18/0x34
    __might_resched+0x188/0x228
    rt_spin_lock+0x70/0x120
    sdei_cpuhp_up+0x3c/0x130
    cpuhp_invoke_callback+0x250/0xf08
    cpuhp_thread_fun+0x120/0x248
    smpboot_thread_fn+0x280/0x320
    kthread+0x130/0x140
    ret_from_fork+0x10/0x20

sdei_cpuhp_up() is called in the STARTING hotplug section,
which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry
instead to execute the cpuhp cb later, with preemption enabled.
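
In other words, the registration moves to a dynamic slot along these
lines (a sketch; the exact callback wiring in the driver may differ):

	int err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "SDEI",
				    sdei_cpuhp_up, sdei_cpuhp_down);
	if (err < 0)
		pr_warn("Failed to register CPU hotplug notifier: %d\n", err);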

SDEI originally got its own cpuhp slot to allow interacting
with perf. It got superseded by pNMI and this early slot is not
relevant anymore. [1]

Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the
calling CPU. It is checked that preemption is disabled for them.
_ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'.
Preemption is enabled in those threads, but their cpumask is limited
to 1 CPU.
Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb
don't trigger them.

Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call
which acts on the calling CPU.

[1]:
https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/

Suggested-by: James Morse <james.morse@arm.com>
Change-Id: I9f73aadd24096d8298b5ae8f26f955e9f6ee2b9a
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a8267bc8de)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Kevin Brodsky
b065972b7b UPSTREAM: uapi/linux/const.h: prefer ISO-friendly __typeof__
[ Upstream commit 31088f6f7906253ef4577f6a9b84e2d42447dba0 ]

typeof is (still) a GNU extension, which means that it cannot be used when
building ISO C (e.g.  -std=c99).  It should therefore be avoided in uapi
headers in favour of the ISO-friendly __typeof__.

Unfortunately this issue could not be detected by
CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in
any uapi header.

This matters from a userspace perspective, not a kernel one. uapi
headers and their contents are expected to be usable in a variety of
situations, and in particular when building ISO C applications (with
-std=c99 or similar).

This particular problem can be reproduced by trying to use the
__ALIGN_KERNEL macro directly in application code, say:

int align(int x, int a)
{
	return __ALIGN_KERNEL(x, a);
}

and trying to build that with -std=c99.
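
The fix described boils down to spelling the macro with the ISO-friendly
keyword, roughly as follows (a sketch of the relevant uapi/linux/const.h
lines; exact formatting may differ):

#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1)
#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))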

Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com
Fixes: a79ff731a1 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()")
Change-Id: I05462cdee00da59617f3dfb875c233a246f7d2f6
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ef9f854103)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Thomas Gleixner
aaf6ccb6f3 UPSTREAM: posix-cpu-timers: Implement the missing timer_wait_running callback
commit f7abf14f0001a5a47539d9f60bbdca649e43536b upstream.

For some unknown reason the introduction of the timer_wait_running callback
missed fixing up posix CPU timers, which went unnoticed for almost four years.
Marco reported recently that the WARN_ON() in timer_wait_running()
triggers with a posix CPU timer test case.

Posix CPU timers have two execution models for expiring timers depending on
CONFIG_POSIX_CPU_TIMERS_TASK_WORK:

1) If not enabled, the expiry happens in hard interrupt context so
   spin waiting on the remote CPU is reasonably time bound.

   Implement an empty stub function for that case.

2) If enabled, the expiry happens in task work before returning to user
   space or guest mode. The expired timers are marked as firing and moved
   from the timer queue to a local list head with sighand lock held. Once
   the timers are moved, sighand lock is dropped and the expiry happens in
   fully preemptible context. That means the expiring task can be scheduled
   out, migrated, interrupted etc. So spin waiting on it is more than
   suboptimal.

   The timer wheel has a timer_wait_running() mechanism for RT, which uses
   a per CPU timer-base expiry lock which is held by the expiry code and the
   task waiting for the timer function to complete blocks on that lock.

   This does not work in the same way for posix CPU timers as there is no
   timer base, and expiry for process-wide timers can run on any task
   belonging to that process; but the concept of waiting on an expiry lock
   can be used here too, in a slightly different way:

    - Add a mutex to struct posix_cputimers_work. This struct is per task
      and used to schedule the expiry task work from the timer interrupt.

    - Add a task_struct pointer to struct cpu_timer which is used to store
      the task which runs the expiry. That's filled in when the task
      moves the expired timers to the local expiry list. That's not
      affecting the size of the k_itimer union as there are bigger union
      members already.

    - Let the task take the expiry mutex around the expiry function

    - Let the waiter acquire a task reference with rcu_read_lock() held and
      block on the expiry mutex

   This avoids spin-waiting on a task which might not even be on a CPU and
   works nicely for RT too.
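
Put together, the waiting side sketched from the description above (field
and helper names are assumptions based on the text, not the literal
patch; the mutex lives in struct posix_cputimers_work as per the ABI
report earlier):

static void posix_cpu_timer_wait_running(struct k_itimer *timr)
{
	struct task_struct *tsk;

	rcu_read_lock();
	tsk = rcu_dereference(timr->it.cpu.handling);	/* task running the expiry; assumed field */
	if (!tsk) {
		rcu_read_unlock();
		return;
	}
	get_task_struct(tsk);
	rcu_read_unlock();

	/* Block until the expiry task drops its per-task expiry mutex. */
	mutex_lock(&tsk->posix_cputimers_work.mutex);
	mutex_unlock(&tsk->posix_cputimers_work.mutex);
	put_task_struct(tsk);
}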

Fixes: ec8f954a40 ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Reported-by: Marco Elver <elver@google.com>
Change-Id: Ic069585c15bc968dec3c2b99cc70256f56a70b32
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marco Elver <elver@google.com>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bccf9fe296)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-14 16:40:59 +00:00
Greg Kroah-Hartman
f3b712fcb5 ANDROID: GKI: reserve extra arm64 cpucaps for ABI preservation
Over the lifetime of the kernel, new arm64 cpucaps need to be added to
handle errata and other fun stuff.  So reserve 20 spots for us to use in
the future, as this is an ABI-stable structure that we cannot increase
over time without major problems.
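
A purely hypothetical sketch of the idea (not the actual ACK change):
pad the cpucap space with reserved entries so ARM64_NCAPS, and
everything sized by it, stays stable when later errata workarounds land:

enum {
	ARM64_EXAMPLE_EXISTING_CAP,			/* stand-in for the real cpucaps */
	ARM64_CPUCAP_RESERVED_1,
	/* ... reserved slots 2..19 elided ... */
	ARM64_CPUCAP_RESERVED_20 = ARM64_CPUCAP_RESERVED_1 + 19,
	ARM64_NCAPS
};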

Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I37bdac374e2570f61ab54919712fd62c7e541e67
2023-06-14 16:40:59 +00:00
Jindong Yue
d1c7974b1f ANDROID: arm64: errata: Add WORKAROUND_NXP_ERR050104 cpucaps
This is a placeholder to work around the NXP i.MX8QM A53 cache coherency issue.
The full patch is still under review upstream.

Since the patch adds a new cpucap, which breaks KMI, and the
KMI freeze date is coming, use a placeholder here to update
the KMI before the freeze.

According to NXP errata document[1] i.MX8QuadMax SoC suffers from
serious cache coherence issue. It was also mentioned in initial
support[2] for imx8qm mek machine.

Following is excerpt from NXP IMX8_1N94W "Mask Set Errata" document
Rev. 5, 3/2023. Just in case it gets lost somehow.

"ERR050104: Arm/A53: Cache coherency issue"

Description

Some maintenance operations exchanged between the A53 and A72
core clusters, involving some Translation Look-aside Buffer
Invalidate (TLBI) and Instruction Cache (IC) instructions can
be corrupted. The upper bits, above bit-35, of ARADDR and ACADDR
buses within in Arm A53 sub-system have been incorrectly connected.
Therefore ARADDR and ACADDR address bits above bit-35 should not
be used.

Workaround

The following software instructions are required to be downgraded
to TLBI VMALLE1IS:  TLBI ASIDE1, TLBI ASIDE1IS, TLBI VAAE1,
TLBI VAAE1IS, TLBI VAALE1, TLBI VAALE1IS, TLBI VAE1, TLBI VAE1IS,
TLBI VALE1, TLBI VALE1IS

The following software instructions are required to be downgraded
to TLBI VMALLS12E1IS: TLBI IPAS2E1IS, TLBI IPAS2LE1IS

The following software instructions are required to be downgraded
to TLBI ALLE2IS: TLBI VAE2IS, TLBI VALE2IS.

The following software instructions are required to be downgraded
to TLBI ALLE3IS: TLBI VAE3IS, TLBI VALE3IS.

The following software instructions are required to be downgraded
to TLBI VMALLE1IS when the Force Broadcast (FB) bit [9] of the
Hypervisor Configuration Register (HCR_EL2) is set:
TLBI ASIDE1, TLBI VAAE1, TLBI VAALE1, TLBI VAE1, TLBI VALE1

The following software instruction is required to be downgraded
to IC IALLUIS: IC IVAU, Xt

Specifically for the IC IVAU, Xt downgrade, setting SCTLR_EL1.UCI
to 0 will disable EL0 access to this instruction. Any attempt to
execute from EL0 will generate an EL1 trap, where the downgrade to
IC ALLUIS can be implemented.

[1] https://www.nxp.com/docs/en/errata/IMX8_1N94W.pdf
[2] commit 307fd14d4b ("arm64: dts: imx: add imx8qm mek support")

Bug: 284762900
Link: https://lore.kernel.org/linux-arm-kernel/20230420112952.28340-1-iivanov@suse.de/
Signed-off-by: Jindong Yue <jindong.yue@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8dd50b369412de73b608805d1b5bb8424ea23280
2023-06-14 16:40:59 +00:00
Quentin Perret
b489c53001 ANDROID: KVM: arm64: Allow setting {P,U}XN in stage-2 PTEs
FEAT_XNX allows specifying PXN and UXN attributes on stage-2 entries.
Make this usable from pKVM by exposing two new kvm_pgtable_prot entries,
one for each of them.

No functional changes intended.
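
Based on the values in the ABI report earlier (32 and 64, i.e. BIT(5)
and BIT(6)), the new entries amount to roughly the following sketch (the
real definition lives in the arm64 KVM pgtable header):

enum kvm_pgtable_prot {
	/* ... existing entries elided ... */
	KVM_PGTABLE_PROT_PXN	= BIT(5),	/* 32: stage-2 privileged execute-never */
	KVM_PGTABLE_PROT_UXN	= BIT(6),	/* 64: stage-2 unprivileged execute-never */
};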

Bug: 264070847
Change-Id: I47d861fa64ba511370b182f4609fe1c27695a949
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00
Quentin Perret
b7aff5c603 ANDROID: KVM: arm64: Restrict host-to-hyp MMIO donations
Nothing currently prevents the donation of an MMIO region to the
hypervisor for backing e.g. guest stage-2 page-tables, tracing buffers,
hyp vm and vcpu metadata, or any other donation to EL2. However, the
only confirmed use-case for MMIO donations are for protecting the IOMMU
registers as well as for vendor module usage.

Restrict the donation of MMIO regions to these two paths only by
introducing a new helper function.

Bug: 264070847
Change-Id: I914508fb3e3547fcfabca8557bdf7948cb796099
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00
Quentin Perret
f5f8c19f6c ANDROID: KVM: arm64: Allow state changes of MMIO pages
We've historically disallowed state changes for MMIO pages -- the host
had sole ownership of all of them. However, changing the state of those
pages has clearly become a goal both to support vendor extensions to
the hypervisor, as well as to support device assignment in the longer
term. To pave the way towards this support, let's allow certain state
transitions for MMIO pages.

Bug: 264070847
Change-Id: I9803b572c90d8a694c3d43a0ee0d7b4f4124fe4a
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00
Quentin Perret
4ddb4ed818 ANDROID: KVM: arm64: Allow MMIO perm changes from modules
We now allow donations of MMIO ranges, let's also allow modules to
change host stage-2 permissions.

Bug: 264070847
Change-Id: Ia72678bb27559d9a7963dbc5ffb5a101efcbbad2
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00
Quentin Perret
5d0225cdf0 ANDROID: KVM: arm64: Don't allocate from handle_host_mem_abort
There shouldn't be any reason to ever need to allocate from the host
stage-2 pool during mem aborts now that the base page-table structure
is pinned. To prevent future regressions in this area, introduce a new
sanity check that will warn when hyp_page_alloc() is used from the
wrong paths.

Bug: 264070847
Change-Id: I7a7c606fe01558790e4ffcd3534f8976caf48bd0
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00
Quentin Perret
5136a28ab6 ANDROID: KVM: arm64: Donate IOMMU regions to pKVM
The MMIO register space for IOMMUs controlled by the hypervisor is
currently unmapped from the host stage-2, and we rely on the host abort
path to not accidentally map them. However, this approach becomes
increasingly difficult to maintain as we introduce support for donating
MMIO regions and not just memory -- nothing prevents the host from
donating a protected MMIO register to another entity for example.

Now that MMIO donations are possible, let's use the proper
host-donate-hyp machinery to implement this. As a nice side effect, this
guarantees the host stage-2 page-table is annotated with hyp ownership
for those IOMMU regions, which ensures the core range alignment
feature in the host mem abort path will do the right thing without
requiring a second pass in the IOMMU code. This also turns the host
stage-2 PTEs into "non-default" entries, hence avoiding issues with the
coalescing code going forward.

Bug: 264070847
Change-Id: I1fad1b1be36f3b654190a912617e780141945a8f
Signed-off-by: Quentin Perret <qperret@google.com>
2023-06-14 16:40:59 +00:00