70698 Commits

Author SHA1 Message Date
0d92efdee9 Add skb drop reasons to IPv6 UDP receive path
Enumerate the skb drop reasons in the receive path for IPv6 UDP packets.

Signed-off-by: Donald Hunter <donald.hunter@redhat.com>
Link: https://lore.kernel.org/r/20220926120350.14928-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 18:05:03 -07:00
62e56ef57c net: tls: Add ARIA-GCM algorithm
RFC 6209 describes ARIA for TLS 1.2.
ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209.

This patch would offer performance increment and an opportunity for
hardware offload.

Benchmark results:
iperf-ssl are used.
CPU: intel i3-12100.

  TLS(openssl-3.0-dev)
[  3]  0.0- 1.0 sec   185 MBytes  1.55 Gbits/sec
[  3]  1.0- 2.0 sec   186 MBytes  1.56 Gbits/sec
[  3]  2.0- 3.0 sec   186 MBytes  1.56 Gbits/sec
[  3]  3.0- 4.0 sec   186 MBytes  1.56 Gbits/sec
[  3]  4.0- 5.0 sec   186 MBytes  1.56 Gbits/sec
[  3]  0.0- 5.0 sec   927 MBytes  1.56 Gbits/sec
  kTLS(aria-generic)
[  3]  0.0- 1.0 sec   198 MBytes  1.66 Gbits/sec
[  3]  1.0- 2.0 sec   194 MBytes  1.62 Gbits/sec
[  3]  2.0- 3.0 sec   194 MBytes  1.63 Gbits/sec
[  3]  3.0- 4.0 sec   194 MBytes  1.63 Gbits/sec
[  3]  4.0- 5.0 sec   194 MBytes  1.62 Gbits/sec
[  3]  0.0- 5.0 sec   974 MBytes  1.63 Gbits/sec
  kTLS(aria-avx wirh GFNI)
[  3]  0.0- 1.0 sec   632 MBytes  5.30 Gbits/sec
[  3]  1.0- 2.0 sec   657 MBytes  5.51 Gbits/sec
[  3]  2.0- 3.0 sec   657 MBytes  5.51 Gbits/sec
[  3]  3.0- 4.0 sec   656 MBytes  5.50 Gbits/sec
[  3]  4.0- 5.0 sec   656 MBytes  5.50 Gbits/sec
[  3]  0.0- 5.0 sec  3.18 GBytes  5.47 Gbits/sec

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/20220925150033.24615-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 17:29:09 -07:00
de4feb4e3d NFC: hci: Split memcpy() of struct hcp_message flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated. This
split already existed for the "firstfrag" case, so just generalize the
logic further.

[1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org/

Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220924040835.3364912-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 07:45:18 -07:00
59cd737766 net: openvswitch: allow conntrack in non-initial user namespace
Similar to the previous commit, the Netlink interface of the OVS
conntrack module was restricted to global CAP_NET_ADMIN by using
GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support
unprivileged containers in non-initial user namespace.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-27 11:31:36 +02:00
8039371847 net: openvswitch: allow metering in non-initial user namespace
The Netlink interface for metering was restricted to global CAP_NET_ADMIN
by using GENL_ADMIN_PERM. To allow metring in a non-inital user namespace,
e.g., a container, this is changed to GENL_UNS_ADMIN_PERM.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-27 11:31:36 +02:00
6627a2074d net/smc: Support SO_REUSEPORT
This enables SO_REUSEPORT [1] for clcsock when it is set on smc socket,
so that some applications which uses it can be transparently replaced
with SMC. Also, this helps improve load distribution.

Here is a simple test of NGINX + wrk with SMC. The CPU usage is collected
on NGINX (server) side as below.

Disable SO_REUSEPORT:

05:15:33 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:15:34 PM  all    7.02    0.00   11.86    0.00    2.04    8.93    0.00    0.00    0.00   70.15
05:15:34 PM    0    0.00    0.00    0.00    0.00   16.00   70.00    0.00    0.00    0.00   14.00
05:15:34 PM    1   11.58    0.00   22.11    0.00    0.00    0.00    0.00    0.00    0.00   66.32
05:15:34 PM    2    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
05:15:34 PM    3   16.84    0.00   30.53    0.00    0.00    0.00    0.00    0.00    0.00   52.63
05:15:34 PM    4   28.72    0.00   44.68    0.00    0.00    0.00    0.00    0.00    0.00   26.60
05:15:34 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:15:34 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:15:34 PM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Enable SO_REUSEPORT:

05:15:20 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:15:21 PM  all    8.56    0.00   14.40    0.00    2.20    9.86    0.00    0.00    0.00   64.98
05:15:21 PM    0    0.00    0.00    4.08    0.00   14.29   76.53    0.00    0.00    0.00    5.10
05:15:21 PM    1    9.09    0.00   16.16    0.00    1.01    0.00    0.00    0.00    0.00   73.74
05:15:21 PM    2    9.38    0.00   16.67    0.00    1.04    0.00    0.00    0.00    0.00   72.92
05:15:21 PM    3   10.42    0.00   17.71    0.00    1.04    0.00    0.00    0.00    0.00   70.83
05:15:21 PM    4    9.57    0.00   15.96    0.00    0.00    0.00    0.00    0.00    0.00   74.47
05:15:21 PM    5    9.18    0.00   15.31    0.00    0.00    1.02    0.00    0.00    0.00   74.49
05:15:21 PM    6    8.60    0.00   15.05    0.00    0.00    0.00    0.00    0.00    0.00   76.34
05:15:21 PM    7   12.37    0.00   14.43    0.00    0.00    0.00    0.00    0.00    0.00   73.20

Using SO_REUSEPORT helps the load distribution of NGINX be more
balanced.

[1] https://man7.org/linux/man-pages/man7/socket.7.html

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20220922121906.72406-1-tonylu@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-27 10:26:17 +02:00
fc4f2fd02a net/sched: taprio: simplify list iteration in taprio_dev_notifier()
taprio_dev_notifier() subscribes to netdev state changes in order to
determine whether interfaces which have a taprio root qdisc have changed
their link speed, so the internal calculations can be adapted properly.

The 'qdev' temporary variable serves no purpose, because we just use it
only once, and can just as well use qdisc_dev(q->root) directly (or the
"dev" that comes from the netdev notifier; this is because qdev is only
interesting if it was the subject of the state change, _and_ its root
qdisc belongs in the taprio list).

The 'found' variable also doesn't really serve too much of a purpose
either; we can just call taprio_set_picos_per_byte() within the loop,
and exit immediately afterwards.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20220923145921.3038904-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 12:38:39 -07:00
56378f3ccb net: dsa: make user ports return to init_net on netns deletion
As pointed out during review, currently the following set of commands
crashes the kernel:

$ ip netns add ns0
$ ip link set swp0 netns ns0
$ ip netns del ns0
WARNING: CPU: 1 PID: 27 at net/core/dev.c:10884 unregister_netdevice_many+0xaa4/0xaec
Workqueue: netns cleanup_net
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : unregister_netdevice_many+0xaa4/0xaec
lr : unregister_netdevice_many+0x700/0xaec
Call trace:
 unregister_netdevice_many+0xaa4/0xaec
 default_device_exit_batch+0x294/0x340
 ops_exit_list+0xac/0xc4
 cleanup_net+0x2e4/0x544
 process_one_work+0x4ec/0xb40
---[ end trace 0000000000000000 ]---
unregister_netdevice: waiting for swp0 to become free. Usage count = 2

This is because since DSA user ports, since they started populating
dev->rtnl_link_ops in the blamed commit, gained a different treatment
from default_device_exit_net(), which thinks these interfaces can now be
unregistered.

They can't; so set netns_refund = true to restore the behavior prior to
populating dev->rtnl_link_ops.

Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220921185428.1767001-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 11:28:57 -07:00
fb33ec016b xdp: improve page_pool xdp_return performance
During LPC2022 I meetup with my page_pool co-maintainer Ilias. When
discussing page_pool code we realised/remembered certain optimizations
had not been fully utilised.

Since commit c07aea3ef4d4 ("mm: add a signature in struct page") struct
page have a direct pointer to the page_pool object this page was
allocated from.

Thus, with this info it is possible to skip the rhashtable_lookup to
find the page_pool object in __xdp_return().

The rcu_read_lock can be removed as it was tied to xdp_mem_allocator.
The page_pool object is still safe to access as it tracks inflight pages
and (potentially) schedules final release from a work queue.

Created a micro benchmark of XDP redirecting from mlx5 into veth with
XDP_DROP bpf-prog on the peer veth device. This increased performance
6.5% from approx 8.45Mpps to 9Mpps corresponding to using 7 nanosec
(27 cycles at 3.8GHz) less per packet.

Suggested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/r/166377993287.1737053.10258297257583703949.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 11:28:19 -07:00
d6e3b27cbd af_unix: Refactor unix_read_skb()
Similar to udp_read_skb(), delete the unnecessary while loop in
unix_read_skb() for readability.  Since recv_actor() cannot return a
value greater than skb->len (see sk_psock_verdict_recv()), remove the
redundant check.

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/7009141683ad6cd3785daced3e4a80ba0eb773b5.1663909008.git.peilin.ye@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 11:00:19 -07:00
31f1fbcb34 udp: Refactor udp_read_skb()
Delete the unnecessary while loop in udp_read_skb() for readability.
Additionally, since recv_actor() cannot return a value greater than
skb->len (see sk_psock_verdict_recv()), remove the redundant check.

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/343b5d8090a3eb764068e9f1d392939e2b423747.1663909008.git.peilin.ye@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 11:00:19 -07:00
9258b8b1be ipv6: tcp: send consistent autoflowlabel in RST packets
Blamed commit added a txhash parameter to tcp_v6_send_response()
but forgot to update tcp_v6_send_reset() accordingly.

Fixes: aa51b80e1af4 ("ipv6: tcp: send consistent autoflowlabel in SYN_RECV state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220922165036.1795862-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23 21:00:25 -07:00
4dfa5f05ff linux-can-next-for-6.1-20220923
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmMtnygTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRCtfkuQ2KDTXXOLCACFQGHCZrUGdWBw83UDA9r/lHuiIRuX
 ELaSq1h9PJ2BC2+4t/4uDClJgvSYkxJkomNUVDDvbMK8gNTdVdIuHzaR12kerVWs
 ArG/g0tjRE3ed5I3JWIOZ/ukfVmsVy/Rm/cwtqtWibJNSFtZuiodN1sYtS3Bv4i3
 9sAdU6++LW/4OSAMss0+6VakC27/D5GJEZuEdYveqtf/TunrqNKGL2M5YmbenRoD
 Bnx7FIiPHeL8pYz97W5ZEyGzZH90AuKOvHmx3+cHgZVHDzyBKd0UNLgisYFwXo9w
 +9tfyZMZqzIZL2R0f3GVGPSvJGnLaX73bdgc8FAas/mY9WJze7plYWKt
 =8zkH
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-6.1-20220923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2022-09-23

The first 2 patches are by Ziyang Xuan and optimize registration and
the sending in the CAN BCM protocol a bit.

The next 8 patches target the gs_usb driver. 7 are by me and first fix
the time hardware stamping support (added during this net-next cycle),
rename a variable, convert the usb_control_msg + manual
kmalloc()/kfree() to usb_control_msg_{send,rev}(), clean up the error
handling and add switchable termination support. The patch by Rhett
Aultman and Vasanth Sadhasivan convert the driver from
usb_alloc_coherent()/usb_free_coherent() to kmalloc()/URB_FREE_BUFFER.

The last patch is by Shang XiaoJing and removes an unneeded call to
dev_err() from the ctucanfd driver.

* tag 'linux-can-next-for-6.1-20220923' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: ctucanfd: Remove redundant dev_err call
  can: gs_usb: remove dma allocations
  can: gs_usb: add switchable termination support
  can: gs_usb: gs_make_candev(): clean up error handling
  can: gs_usb: convert from usb_control_msg() to usb_control_msg_{send,recv}()
  can: gs_usb: gs_cmd_reset(): rename variable holding struct gs_can pointer to dev
  can: gs_usb: gs_can_open(): initialize time counter before starting device
  can: gs_usb: add missing lock to protect struct timecounter::cycle_last
  can: gs_usb: gs_usb_get_timestamp(): fix endpoint parameter for usb_control_msg_recv()
  can: bcm: check the result of can_send() in bcm_can_tx()
  can: bcm: registration process optimization in bcm_module_init()
====================

Link: https://lore.kernel.org/r/20220923120859.740577-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-23 07:07:55 -07:00
3fd7bfd28c can: bcm: check the result of can_send() in bcm_can_tx()
If can_send() fail, it should not update frames_abs counter
in bcm_can_tx(). Add the result check for can_send() in bcm_can_tx().

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Suggested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/9851878e74d6d37aee2f1ee76d68361a46f89458.1663206163.git.william.xuanziyang@huawei.com
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:53:10 +02:00
edd1a7e42f can: bcm: registration process optimization in bcm_module_init()
Now, register_netdevice_notifier() and register_pernet_subsys() are both
after can_proto_register(). It can create CAN_BCM socket and process socket
once can_proto_register() successfully, so it is possible missing notifier
event or proc node creation because notifier or bcm proc directory is not
registered or created yet. Although this is a low probability scenario, it
is not impossible.

Move register_pernet_subsys() and register_netdevice_notifier() to the
front of can_proto_register(). In addition, register_pernet_subsys() and
register_netdevice_notifier() may fail, check their results are necessary.

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/823cff0ebec33fa9389eeaf8b8ded3217c32cb38.1663206163.git.william.xuanziyang@huawei.com
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-23 13:53:02 +02:00
0c3e10cb44 net: phy: Add support for rate matching
This adds support for rate matching (also known as rate adaptation) to
the phy subsystem. The general idea is that the phy interface runs at
one speed, and the MAC throttles the rate at which it sends packets to
the link speed. There's a good overview of several techniques for
achieving this at [1]. This patch adds support for three: pause-frame
based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and
2BASE-TL), and open-loop-based (such as in 10GBASE-W).

This patch makes a few assumptions and a few non assumptions about the
types of rate matching available. First, it assumes that different phys
may use different forms of rate matching. Second, it assumes that phys
can use rate matching for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same
form of rate matching. Fourth, it does not assume that all phy devices
will support rate matching (even if some do). Relaxing or strengthening
these (non-)assumptions could result in a different API. For example, if
all interface modes were assumed to use the same form of rate matching,
then a bitmask of interface modes supportting rate matching would
suffice.

For some better visibility into the process, the current rate matching
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate matching can
be disabled altogether by adjusting the advertisement.

802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
"rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls
this feature "rate adaptation". I chose "rate matching" because it is
shorter, and because Russell doesn't think "adaptation" is correct in this
context.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23 11:55:35 +01:00
05cd823863 ethtool: tunnels: check the return value of nla_nest_start()
Check the return value of nla_nest_start(). When starting the entry
level nested attributes, if the tailroom of socket buffer is
insufficient to store the attribute header and payload, the return value
will be NULL.

There is, however, no real bug here since if the skb is full
nla_put_be16() will fail as well and we'll error out.

Signed-off-by: Li Zhong <floridsleeves@gmail.com>
Link: https://lore.kernel.org/r/20220921181716.1629541-1-floridsleeves@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 19:28:10 -07:00
e046fa895c net/sched: use tc_qdisc_stats_dump() in qdisc
use tc_qdisc_stats_dump() in qdisc.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 17:34:10 -07:00
a2c2a4ddc2 net/sched: taprio: remove unnecessary taprio_list_lock
The 3 functions that want access to the taprio_list:
taprio_dev_notifier(), taprio_destroy() and taprio_init() are all called
with the rtnl_mutex held, therefore implicitly serialized with respect
to each other. A spin lock serves no purpose.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20220921095632.1379251-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 17:28:51 -07:00
56e5a6d3aa net/tls: Support 256 bit keys with TX device offload
Add the missing clause for 256 bit keys in tls_set_device_offload(), and
the needed adjustments in tls_device_fallback.c.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 17:27:42 -07:00
ea7a9d88ba net/tls: Use cipher sizes structs
Use the newly introduced cipher sizes structs instead of the repeated
switch cases churn.

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 17:27:42 -07:00
2d2c5ea242 net/tls: Describe ciphers sizes by const structs
Introduce cipher sizes descriptor. It helps reducing the amount of code
duplications and repeated switch/cases that assigns the proper sizes
according to the cipher type.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 17:27:41 -07:00
0140a7168f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/freescale/fec.h
  7b15515fc1ca ("Revert "fec: Restart PPS after link state change"")
  40c79ce13b03 ("net: fec: add stop mode support for imx8 platform")
https://lore.kernel.org/all/20220921105337.62b41047@canb.auug.org.au/

drivers/pinctrl/pinctrl-ocelot.c
  c297561bc98a ("pinctrl: ocelot: Fix interrupt controller")
  181f604b33cd ("pinctrl: ocelot: add ability to be used in a non-mmio configuration")
https://lore.kernel.org/all/20220921110032.7cd28114@canb.auug.org.au/

tools/testing/selftests/drivers/net/bonding/Makefile
  bbb774d921e2 ("net: Add tests for bonding and team address list management")
  152e8ec77640 ("selftests/bonding: add a test for bonding lladdr target")
https://lore.kernel.org/all/20220921110437.5b7dbd82@canb.auug.org.au/

drivers/net/can/usb/gs_usb.c
  5440428b3da6 ("can: gs_usb: gs_can_open(): fix race dev->can.state condition")
  45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support")
https://lore.kernel.org/all/84f45a7d-92b6-4dc5-d7a1-072152fab6ff@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 13:02:10 -07:00
504c25cb76 Including fixes from wifi, netfilter and can.
A handful of awaited fixes here - revert of the FEC changes,
 bluetooth fix, fixes for iwlwifi spew.
 
 We added a warning in PHY/MDIO code which is triggering on
 a couple of platforms in a false-positive-ish way. If we can't
 iron that out over the week we'll drop it and re-add for 6.1.
 
 I've added a new "follow up fixes" section for fixes to fixes
 in 6.0-rcs but it may actually give the false impression that
 those are problematic or that more testing time would have
 caught them. So likely a one time thing.
 
 Follow up fixes:
 
  - nf_tables_addchain: fix nft_counters_enabled underflow
 
  - ebtables: fix memory leak when blob is malformed
 
  - nf_ct_ftp: fix deadlock when nat rewrite is needed
 
 Current release - regressions:
 
  - Revert "fec: Restart PPS after link state change"
  - Revert "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"
 
  - Bluetooth: fix HCIGETDEVINFO regression
 
  - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
 
  - mptcp: fix fwd memory accounting on coalesce
 
  - rwlock removal fall out:
    - ipmr: always call ip{,6}_mr_forward() from RCU read-side
      critical section
    - ipv6: fix crash when IPv6 is administratively disabled
 
  - tcp: read multiple skbs in tcp_read_skb()
 
  - mdio_bus_phy_resume state warning fallout:
    - eth: ravb: fix PHY state warning splat during system resume
    - eth: sh_eth: fix PHY state warning splat during system resume
 
 Current release - new code bugs:
 
  - wifi: iwlwifi: don't spam logs with NSS>2 messages
 
  - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC
 
 Previous releases - regressions:
 
  - bonding: fix NULL deref in bond_rr_gen_slave_id
 
  - wifi: iwlwifi: mark IWLMEI as broken
 
 Previous releases - always broken:
 
  - nf_conntrack helpers:
    - irc: tighten matching on DCC message
    - sip: fix ct_sip_walk_headers
    - osf: fix possible bogus match in nf_osf_find()
 
  - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header
 
  - core: fix flow symmetric hash
 
  - bonding, team: unsync device addresses on ndo_stop
 
  - phy: micrel: fix shared interrupt on LAN8814
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmMsj3EACgkQMUZtbf5S
 IrsUgQ//eXxuUZeGTg7cgJKPFJelrZ3iL16B1+s2qX94GPIqXRAShgC78iM7IbSe
 y3vR/7YVE7sKXm88wnLefMQVXPp0cE2p0+8++E/j4zcRZsM5sHb2+d3gW6nos2ed
 U8Ldm7LzWUNt/o1ZHDqZWBSoreFkmbFyHO6FVPCuH11tFUJqxJ/SP860mwo6tbuT
 HOoVphKis41IMEXCgybs2V0DAQewba0gejzAmySDy8epNhOj2F4Vo6aadnUCI68U
 HrIFYe2wiEi6MZDsB9zpRXc9seb6ZBKbBjgQnTK7MwfBEQCzxtR2lkNobJM1WbdL
 nYwHBOJ16yX0BnlSpUEepv6iJYY5Q7FS35Wk3Rq5Mik6DaEir6vVSBdRxHpYOkO2
 KPIyyMMAA5E8mAtqH3PcpnwDK+9c3KlZYYKXxIp2IjQm87DpOZJFynwsC3Crmbzo
 C7UTMav2nkHljoapMLUwzqyw2ip+Qo14XA043FDPUru1sXY9CY6q50XZa5GmrNKh
 xyaBdp4Ckj1kOuXUR9jz3Rq8skOZ8lNGHtiCdgPZitWhNKW1YORJihC7/9zdieCR
 1gOE7Dpz/MhVmFn2e8S5O3TkU5lXfALfPDJi4QiML5VLHXd/nCE5sHPiOBWcoo4w
 2djKbIGpLRnO6qMs4NkWNmPbG+/ouvpM+lewqn+xU4TGyn/NTbI=
 =wrep
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wifi, netfilter and can.

  A handful of awaited fixes here - revert of the FEC changes, bluetooth
  fix, fixes for iwlwifi spew.

  We added a warning in PHY/MDIO code which is triggering on a couple of
  platforms in a false-positive-ish way. If we can't iron that out over
  the week we'll drop it and re-add for 6.1.

  I've added a new "follow up fixes" section for fixes to fixes in
  6.0-rcs but it may actually give the false impression that those are
  problematic or that more testing time would have caught them. So
  likely a one time thing.

  Follow up fixes:

   - nf_tables_addchain: fix nft_counters_enabled underflow

   - ebtables: fix memory leak when blob is malformed

   - nf_ct_ftp: fix deadlock when nat rewrite is needed

  Current release - regressions:

   - Revert "fec: Restart PPS after link state change" and the related
     "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"

   - Bluetooth: fix HCIGETDEVINFO regression

   - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2

   - mptcp: fix fwd memory accounting on coalesce

   - rwlock removal fall out:
      - ipmr: always call ip{,6}_mr_forward() from RCU read-side
        critical section
      - ipv6: fix crash when IPv6 is administratively disabled

   - tcp: read multiple skbs in tcp_read_skb()

   - mdio_bus_phy_resume state warning fallout:
      - eth: ravb: fix PHY state warning splat during system resume
      - eth: sh_eth: fix PHY state warning splat during system resume

  Current release - new code bugs:

   - wifi: iwlwifi: don't spam logs with NSS>2 messages

   - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC

  Previous releases - regressions:

   - bonding: fix NULL deref in bond_rr_gen_slave_id

   - wifi: iwlwifi: mark IWLMEI as broken

  Previous releases - always broken:

   - nf_conntrack helpers:
      - irc: tighten matching on DCC message
      - sip: fix ct_sip_walk_headers
      - osf: fix possible bogus match in nf_osf_find()

   - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header

   - core: fix flow symmetric hash

   - bonding, team: unsync device addresses on ndo_stop

   - phy: micrel: fix shared interrupt on LAN8814"

* tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  selftests: forwarding: add shebang for sch_red.sh
  bnxt: prevent skb UAF after handing over to PTP worker
  net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
  net: sched: fix possible refcount leak in tc_new_tfilter()
  net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
  udp: Use WARN_ON_ONCE() in udp_read_skb()
  selftests: bonding: cause oops in bond_rr_gen_slave_id
  bonding: fix NULL deref in bond_rr_gen_slave_id
  net: phy: micrel: fix shared interrupt on LAN8814
  net/smc: Stop the CLC flow if no link to map buffers on
  ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
  net: atlantic: fix potential memory leak in aq_ndev_close()
  can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
  can: gs_usb: gs_can_open(): fix race dev->can.state condition
  can: flexcan: flexcan_mailbox_read() fix return value for drop = true
  net: sh_eth: Fix PHY state warning splat during system resume
  net: ravb: Fix PHY state warning splat during system resume
  netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
  netfilter: ebtables: fix memory leak when blob is malformed
  netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
  ...
2022-09-22 10:58:13 -07:00
c2e1cfefca net: sched: fix possible refcount leak in tc_new_tfilter()
tfilter_put need to be called to put the refount got by tp->ops->get to
avoid possible refcount leak when chain->tmplt_ops != NULL and
chain->tmplt_ops != tp->ops.

Fixes: 7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20220921092734.31700-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 07:04:47 -07:00
db39dfdc1c udp: Use WARN_ON_ONCE() in udp_read_skb()
Prevent udp_read_skb() from flooding the syslog.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/20220921005915.2697-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 06:42:57 -07:00
0227f058aa net/smc: Unbind r/w buffer size from clcsock and make them tunable
Currently, SMC uses smc->sk.sk_{rcv|snd}buf to create buffers for
send buffer and RMB. And the values of buffer size are from tcp_{w|r}mem
in clcsock.

The buffer size from TCP socket doesn't fit SMC well. Generally, buffers
are usually larger than TCP for SMC-R/-D to get higher performance, for
they are different underlay devices and paths.

So this patch unbinds buffer size from TCP, and introduces two sysctl
knobs to tune them independently. Also, these knobs are per net
namespace and work for containers.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22 12:58:21 +02:00
77eee32514 net/smc: Introduce a specific sysctl for TEST_LINK time
SMC-R tests the viability of link by sending out TEST_LINK LLC
messages over RoCE fabric when connections on link have been
idle for a time longer than keepalive interval (testlink time).

But using tcp_keepalive_time as testlink time maybe not quite
suitable because it is default no less than two hours[1], which
is too long for single link to find peer dead. The active host
will still use peer-dead link (QP) sending messages, and can't
find out until get IB_WC_RETRY_EXC_ERR error CQEs, which takes
more time than TEST_LINK timeout (SMC_LLC_WAIT_TIME) normally.

So this patch introduces a independent sysctl for SMC-R to set
link keepalive time, in order to detect link down in time. The
default value is 30 seconds.

[1] https://www.rfc-editor.org/rfc/rfc1122#page-101

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22 12:58:21 +02:00
e738455b2c net/smc: Stop the CLC flow if no link to map buffers on
There might be a potential race between SMC-R buffer map and
link group termination.

smc_smcr_terminate_all()     | smc_connect_rdma()
--------------------------------------------------------------
                             | smc_conn_create()
for links in smcibdev        |
        schedule links down  |
                             | smc_buf_create()
                             |  \- smcr_buf_map_usable_links()
                             |      \- no usable links found,
                             |         (rmb->mr = NULL)
                             |
                             | smc_clc_send_confirm()
                             |  \- access conn->rmb_desc->mr[]->rkey
                             |     (panic)

During reboot and IB device module remove, all links will be set
down and no usable links remain in link groups. In such situation
smcr_buf_map_usable_links() should return an error and stop the
CLC flow accessing to uninitialized mr.

Fixes: b9247544c1bc ("net/smc: convert static link ID instances to support multiple links")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22 12:53:53 +02:00
7a5d48c446 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:

====================
netfilter patches for net-next

Remove GPL license copypastry in uapi files, those have SPDX tags.
From Christophe Jaillet.

Remove unused variable in rpfilter, from Guillaume Nault.

Rework gc resched delay computation in conntrack, from Antoine Tenart.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: rpfilter: Remove unused variable 'ret'.
  headers: Remove some left-over license text in include/uapi/linux/netfilter/
  netfilter: conntrack: revisit the gc initial rescheduling bias
  netfilter: conntrack: fix the gc rescheduling delay
====================

Link: https://lore.kernel.org/r/20220921095000.29569-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21 18:42:54 -07:00
9f87eb4246 flow_dissector: Do not count vlan tags inside tunnel payload
We've met the problem that when there is a vlan tag inside
GRE encapsulation, the match of num_of_vlans fails.
It is caused by the vlan tag inside GRE payload has been
counted into num_of_vlans, which is not expected.

One example packet is like this:
Ethernet II, Src: Broadcom_68:56:07 (00:10:18:68:56:07)
                   Dst: Broadcom_68:56:08 (00:10:18:68:56:08)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
Internet Protocol Version 4, Src: 192.168.1.4, Dst: 192.168.1.200
Generic Routing Encapsulation (Transparent Ethernet bridging)
Ethernet II, Src: Broadcom_68:58:07 (00:10:18:68:58:07)
                   Dst: Broadcom_68:58:08 (00:10:18:68:58:08)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 200
...
It should match the (num_of_vlans 1) rule, but it matches
the (num_of_vlans 2) rule.

The vlan tags inside the GRE or other tunnel encapsulated payload
should not be taken into num_of_vlans.
The fix is to stop counting the vlan number when the encapsulation
bit is set.

Fixes: 34951fcf26c5 ("flow_dissector: Add number of vlan tags dissector")
Signed-off-by: Qingqing Yang <qingqing.yang@broadcom.com>
Reviewed-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Link: https://lore.kernel.org/r/20220919074808.136640-1-qingqing.yang@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21 18:36:57 -07:00
1d14b30b5a net: sched: remove unused tcf_result extension
Added by:
commit e5cf1baf92cb ("act_mirred: use TC_ACT_REINSERT when possible")
but no longer useful.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220919130627.3551233-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21 18:32:33 -07:00
2801f30e2c net: sched: simplify code in mall_reoffload
such expression:
	if (err)
		return err;
	return 0;
can simplify to:
	return err;

Signed-off-by: William Dean <williamsukatube@163.com>
Link: https://lore.kernel.org/r/20220917063556.2673-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21 18:22:04 -07:00
63b7c2ebcc net/af_packet: registration process optimization in packet_init()
Now, register_pernet_subsys() and register_netdevice_notifier() are both
after sock_register(). It can create PF_PACKET socket and process socket
once sock_register() successfully. It is possible PF_PACKET socket is
creating but register_pernet_subsys() and register_netdevice_notifier()
are not registered yet. Thus net->packet.sklist_lock and net->packet.sklist
will be accessed without initialization that is done in packet_net_init().
Although this is a low probability scenario.

Move register_pernet_subsys() and register_netdevice_notifier() to the
front in packet_init(). Correspondingly, adjust the unregister process
in packet_exit().

Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-21 12:59:22 +01:00
2a566f0148 net: sched: act_ct: remove redundant variable err
Return value directly from pskb_trim_rcsum() instead of
getting value from redundant variable err.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-21 12:49:32 +01:00
72f5c89804 netfilter: rpfilter: Remove unused variable 'ret'.
Commit 91a178258aea ("netfilter: rpfilter: Convert
rpfilter_lookup_reverse to new dev helper") removed the need for the
'ret' variable. This went unnoticed because of the __maybe_unused
annotation.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-21 10:44:56 +02:00
2aa1927570 netfilter: conntrack: revisit the gc initial rescheduling bias
The previous commit changed the way the rescheduling delay is computed
which has a side effect: the bias is now represented as much as the
other entries in the rescheduling delay which makes the logic to kick in
only with very large sets, as the initial interval is very large
(INT_MAX).

Revisit the GC initial bias to allow more frequent GC for smaller sets
while still avoiding wakeups when a machine is mostly idle. We're moving
from a large initial value to pretending we have 100 entries expiring at
the upper bound. This way only a few entries having a small timeout
won't impact much the rescheduling delay and non-idle machines will have
enough entries to lower the delay when needed. This also improves
readability as the initial bias is now linked to what is computed
instead of being an arbitrary large value.

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-21 10:44:56 +02:00
95eabdd207 netfilter: conntrack: fix the gc rescheduling delay
Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
changed the eviction rescheduling to the use average expiry of scanned
entries (within 1-60s) by doing:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += expires;
      next_run /= 2;
  }

The issue is the above will make the average ('next_run' here) more
dependent on the last expiration values than the firsts (for sets > 2).
Depending on the expiration values used to compute the average, the
result can be quite different than what's expected. To fix this we can
do the following:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += (expires - next_run) / ++count;
  }

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-21 10:44:56 +02:00
5508ff7cf3 net/sched: use tc_cls_stats_dump() in filter
use tc_cls_stats_dump() in filter.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 15:54:13 -07:00
d250889322 netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
We can't use ct->lock, this is already used by the seqadj internals.
When using ftp helper + nat, seqadj will attempt to acquire ct->lock
again.

Revert back to a global lock for now.

Fixes: c783a29c7e59 ("netfilter: nf_ct_ftp: prefer skb_linearize")
Reported-by: Bruno de Paula Larini <bruno.larini@riosoft.com.br>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
62ce44c4ff netfilter: ebtables: fix memory leak when blob is malformed
The bug fix was incomplete, it "replaced" crash with a memory leak.
The old code had an assignment to "ret" embedded into the conditional,
restore this.

Fixes: 7997eff82828 ("netfilter: ebtables: reject blobs that don't provide all entry points")
Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
9a4d6dd554 netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
It seems to me that percpu memory for chain stats started leaking since
commit 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to
hardware priority") when nft_chain_offload_priority() returned an error.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3bc158f8d0330f0a ("netfilter: nf_tables: map basechain priority to hardware priority")
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
921ebde3c0 netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()
syzbot is reporting underflow of nft_counters_enabled counter at
nf_tables_addchain() [1], for commit 43eb8949cfdffa76 ("netfilter:
nf_tables: do not leave chain stats enabled on error") missed that
nf_tables_chain_destroy() after nft_basechain_init() in the error path of
nf_tables_addchain() decrements the counter because nft_basechain_init()
makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.

Increment the counter immediately after returning from
nft_basechain_init().

Link:  https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1]
Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Fixes: 43eb8949cfdffa76 ("netfilter: nf_tables: do not leave chain stats enabled on error")
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
caddb4e0d6 net: make NET_(DEV|NS)_REFCNT_TRACKER depend on NET
It makes little sense to ask if networking namespace or net device refcount
tracking shall be enabled for debug kernel builds without network support.

This is similar to the commit eb0b39efb7d9 ("net: CONFIG_DEBUG_NET depends
on CONFIG_NET").

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220915124256.32512-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 14:23:56 -07:00
2c08a4f898 net/sched: taprio: replace safety precautions with comments
The WARN_ON_ONCE() checks introduced in commit 13511704f8d7 ("net:
taprio offload: enforce qdisc to netdev queue mapping") take a small
toll on performance, but otherwise, the conditions are never expected to
happen. Replace them with comments, such that the information is still
conveyed to developers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:34 -07:00
026de64d7b net/sched: taprio: add extack messages in taprio_init
Stop contributing to the proverbial user unfriendliness of tc, and tell
the user what is wrong wherever possible.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:34 -07:00
25becba629 net/sched: taprio: stop going through private ops for dequeue and peek
Since commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev
queue mapping"), taprio_dequeue_soft() and taprio_peek_soft() are de
facto the only implementations for Qdisc_ops :: dequeue and Qdisc_ops ::
peek that taprio provides.

This is because in full offload mode, __dev_queue_xmit() will select a
txq->qdisc which is never root taprio qdisc. So if nothing is enqueued
in the root qdisc, it will never be run and nothing will get dequeued
from it.

Therefore, we can remove the private indirection from taprio, and always
point Qdisc_ops :: dequeue to taprio_dequeue_soft (now simply named
taprio_dequeue) and Qdisc_ops :: peek to taprio_peek_soft (now simply
named taprio_peek).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:34 -07:00
fa65edde5e net/sched: taprio: remove redundant FULL_OFFLOAD_IS_ENABLED check in taprio_enqueue
Since commit 13511704f8d7 ("net: taprio offload: enforce qdisc to netdev
queue mapping"), __dev_queue_xmit() will select a txq->qdisc for the
full offload case of taprio which isn't the root taprio qdisc, so
qdisc enqueues will never pass through taprio_enqueue().

That commit already introduced one safety precaution check for
FULL_OFFLOAD_IS_ENABLED(); a second one is really not needed, so
simplify the conditional for entering into the GSO segmentation logic.
Also reword the comment a little, to appear more natural after the code
change.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:34 -07:00
9af23657b3 net/sched: taprio: use rtnl_dereference for oper and admin sched in taprio_destroy()
Sparse complains that taprio_destroy() dereferences q->oper_sched and
q->admin_sched without rcu_dereference(), since they are marked as __rcu
in the taprio private structure.

1671:28: warning: incorrect type in argument 1 (different address spaces)
1671:28:    expected struct callback_head *head
1671:28:    got struct callback_head [noderef] __rcu *
1674:28: warning: incorrect type in argument 1 (different address spaces)
1674:28:    expected struct callback_head *head
1674:28:    got struct callback_head [noderef] __rcu *

To silence that build warning, do actually use rtnl_dereference(), since
we know the rtnl_mutex is held at the time of q->destroy().

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:33 -07:00
18cdd2f099 net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex
Since the writer-side lock is taken here, we do not need to open an RCU
read-side critical section, instead we can use rtnl_dereference() to
tell lockdep we are serialized with concurrent writes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 13:53:33 -07:00